Can the indexer tell me the source of links it skips


  • Can the indexer tell me the source of links it skips

    The indexer reports that some links are skipped because they are not in the domain. The problem is that I need to find where these links appear on the site so I can fix them.

    Is there a way for the indexer log to record the page on which it finds these links?

  • #2
    Links are identified during the spidering phase of indexing.

    If you look at the full log, you will see entries like this:

    Code:
    Spidering for links on http://www.passmark.com/index.html
    Skipping http://www.passmark.com/images/Passmark_logo3.gif (Blocked by extensions list)
    Skipping http://www.passmark.com.au/ (External site - does not match base URL)
    In this case, the link http://www.passmark.com.au/ was found on the page http://www.passmark.com/index.html, because its "Skipping" entry follows the "Spidering for links on" entry for that page.

    Also, if you right-click on the log, you can copy the entire log to the clipboard. You can then paste it into Excel or Notepad for searching and filtering.
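
    If you would rather not filter by hand, the same lookup can be scripted. What follows is only a rough sketch, not a feature of the indexer: it assumes the log has been pasted into a string, that every entry matches the wording of the sample above, and that each "Skipping" line belongs to the most recent "Spidering for links on" line (which is most likely to hold for a single-threaded run). The function name `skipped_links_by_page` and its `reason_filter` parameter are made up for illustration.

```python
import re
from collections import defaultdict

# Patterns are based only on the three sample log lines quoted in this thread;
# other log wordings are not handled (assumption).
SPIDER_RE = re.compile(r"^\s*Spidering for links on (\S+)")
SKIP_RE = re.compile(r"^\s*Skipping (\S+) \((.*)\)\s*$")


def skipped_links_by_page(log_text, reason_filter="External site"):
    """Map each spidered page to the skipped links reported after it.

    Hypothetical helper: assumes a "Skipping" entry belongs to the most
    recent "Spidering for links on" entry above it in the log.
    """
    current_page = None
    found = defaultdict(list)
    for line in log_text.splitlines():
        spider_match = SPIDER_RE.match(line)
        if spider_match:
            current_page = spider_match.group(1)
            continue
        skip_match = SKIP_RE.match(line)
        # Keep only skips whose reason contains the filter text.
        if skip_match and reason_filter in skip_match.group(2):
            found[current_page].append(skip_match.group(1))
    return dict(found)


sample_log = """\
Spidering for links on http://www.passmark.com/index.html
Skipping http://www.passmark.com/images/Passmark_logo3.gif (Blocked by extensions list)
Skipping http://www.passmark.com.au/ (External site - does not match base URL)
"""

result = skipped_links_by_page(sample_log)
for page, links in result.items():
    for link in links:
        print(f"{link} found on {page}")
```

    Passing a different `reason_filter` (for example "Blocked by extensions list") would list the links skipped for that reason instead.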



    • #3
      If you are spidering with multiple threads, switching to single-threaded mode for this purpose may make it clearer where the links are coming from: "Configure"->"Spider options"->"Single-threaded downloading".
      --Ray
      Wrensoft Web Software
      Sydney, Australia
      Zoom Search Engine



      • #4
        Thanks, that really helps.

        Especially the single-threaded mode. I found where the badly embedded links (hard-coded to a test domain) were.
