Can't seem to get ZOOMSTOPFOLLOW to work


  • Can't seem to get ZOOMSTOPFOLLOW to work

    Hey!

    I have a massive blog with over 27,000 articles that I need to index. The indexer has been running all night but still has not finished. Watching it index, I can see that it is requesting the links in the comments (the "reply to" links) even though I have wrapped the comment section in ZOOMSTOPFOLLOW/ZOOMRESTARTFOLLOW.

    Here is the structure of the ZOOM tags:

    ** HEADER STUFF I WANT TO INDEX ON SINGLE.php **
    <!--ZOOMSTOP-->
    ** THE FLUFF BEFORE THE ARTICLE **
    <!--ZOOMRESTART-->

    ** THE ARTICLE **

    <!--ZOOMSTOP-->

    ** THE FLUFF AFTER THE ARTICLE **
    <!--ZOOMSTOPFOLLOW-->

    ** THE COMMENTS **

    <!--ZOOMRESTARTFOLLOW-->

    <!--ZOOMSTOPFOLLOW-->
    ** THE SIDE BAR LINKS **
    <!--ZOOMRESTARTFOLLOW-->

    <!--ZOOMRESTART-->
    ** THE LAST TAG TO MATCH THE VERY TOP TAG **

    But I still see the indexer requesting the comment "reply to" links. This is surely slowing the indexing down considerably. Any idea why it might still be trying to follow those links?
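For readers unfamiliar with the tags, here is a minimal sketch of the behaviour the poster expects: links that appear between ZOOMSTOPFOLLOW and ZOOMRESTARTFOLLOW comments are not collected for crawling. This is not Zoom's actual implementation; the class name, function name, and sample HTML are hypothetical and exist only for illustration.

```python
from html.parser import HTMLParser


class FollowAwareLinkExtractor(HTMLParser):
    """Collects <a href> values, ignoring links inside no-follow regions.

    Hypothetical illustration only; not how Zoom itself is implemented.
    """

    def __init__(self):
        super().__init__()
        self.follow = True   # True while links should be followed
        self.links = []      # hrefs found in followed regions

    def handle_comment(self, data):
        # HTML comments arrive here without the <!-- --> delimiters.
        tag = data.strip()
        if tag == "ZOOMSTOPFOLLOW":
            self.follow = False
        elif tag == "ZOOMRESTARTFOLLOW":
            self.follow = True

    def handle_starttag(self, tag, attrs):
        if tag == "a" and self.follow:
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_followed_links(html):
    """Return the hrefs that sit outside every STOPFOLLOW/RESTARTFOLLOW pair."""
    parser = FollowAwareLinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


sample = """
<p><a href="/article-2/">Related article</a></p>
<!--ZOOMSTOPFOLLOW-->
<a href="/article-1/?replytocom=1">Reply</a>
<!--ZOOMRESTARTFOLLOW-->
<a href="/about/">About</a>
"""
print(extract_followed_links(sample))  # ['/article-2/', '/about/']
```

Note that a link skipped in one region can still be queued if the same URL also appears somewhere outside a no-follow region, so it is worth checking the whole page template for stray copies of the link.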

    Here's a link to one article on my site:

    http://www.elephantjournal.com/2013/07/court-allows-yoga-in-public-schools-we-won-but-at-what-cost-roopa-singh/

    Hope you can help, as I can't spend 15 hours indexing the site each time! (I will try incremental indexing once this run has completed...)

  • #2
    I tried indexing that article, and stepped through the process. It did not follow any of the links within the ZOOMSTOPFOLLOW...ZOOMRESTARTFOLLOW sections as far as I can see. Can you give us some example URLs from that page which it should be skipping but isn't?

    One possible explanation is that you only recently added those tags and you are still indexing from a cached copy of the page (from before you added the tags). To ensure this isn't happening, click "Configure" -> "Spider options" and check the option "Reload all files (do not use cache)".
    --Ray
    Wrensoft Web Software
    Sydney, Australia
    Zoom Search Engine



    • #3
      Thanks for the reply. It was happening on all my posts. I was watching the log and seeing lines like:

      11|07/23/13 19:53:58|Queued URL: http://www.elephantjournal.com/2013/07/a-comprehensive-list-of-good-reasons-to-practice-hot-yoga-in-the-summertime/?replytocom=3642516

      appear. But then the log would say:

      01|07/23/13 20:33:36|Not indexing content on page: http://www.elephantjournal.com/2013/07/a-comprehensive-list-of-good-reasons-to-practice-hot-yoga-in-the-summertime/?replytocom=3642516 (meta robots "noindex" tag found)

      So the link was followed, but WordPress must have added a noindex tag automatically.
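To gauge how much of a crawl is spent on these links, the log can be tallied. The sketch below assumes every log line has the three pipe-separated fields seen in the two lines quoted above (a code, a timestamp, and a message); that layout is inferred from those samples only, and the function names are hypothetical.

```python
def parse_log_line(line):
    """Split a log line into (code, timestamp, message), or return None.

    Assumes the 'code|timestamp|message' layout of the sample lines above.
    maxsplit=2 keeps any '|' characters inside the message intact.
    """
    parts = line.rstrip("\n").split("|", 2)
    if len(parts) != 3:
        return None
    return tuple(parts)


def count_replytocom_queued(lines):
    """Count 'Queued URL' entries whose URL contains '?replytocom='."""
    count = 0
    for line in lines:
        parsed = parse_log_line(line)
        if parsed is None:
            continue  # skip lines that do not match the assumed layout
        _, _, message = parsed
        if message.startswith("Queued URL: ") and "?replytocom=" in message:
            count += 1
    return count


log_lines = [
    "11|07/23/13 19:53:58|Queued URL: http://www.elephantjournal.com/2013/07/a-comprehensive-list-of-good-reasons-to-practice-hot-yoga-in-the-summertime/?replytocom=3642516",
    '01|07/23/13 20:33:36|Not indexing content on page: http://www.elephantjournal.com/2013/07/a-comprehensive-list-of-good-reasons-to-practice-hot-yoga-in-the-summertime/?replytocom=3642516 (meta robots "noindex" tag found)',
]
print(count_replytocom_queued(log_lines))  # 1
```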

      But I have "fixed" the issue myself by adding any URL containing "?replytocom=" to the skip list. This sped up indexing greatly: the first time, I left it running overnight and it took over 10 hours. With replytocom skipped, I was able to index the entire site in 4.5 hours.
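The effect of such a skip entry can be illustrated as a plain substring test applied before a URL is queued. This is a sketch of the idea only, not Zoom's own skip-list logic; `SKIP_SUBSTRINGS` and `should_skip` are hypothetical names.

```python
# Hypothetical skip list: any URL containing one of these substrings is ignored.
SKIP_SUBSTRINGS = ["?replytocom="]


def should_skip(url, skip_substrings=SKIP_SUBSTRINGS):
    """Return True if the URL contains any skip-list substring."""
    return any(fragment in url for fragment in skip_substrings)


urls = [
    "http://www.elephantjournal.com/2013/07/court-allows-yoga-in-public-schools-we-won-but-at-what-cost-roopa-singh/",
    "http://www.elephantjournal.com/2013/07/a-comprehensive-list-of-good-reasons-to-practice-hot-yoga-in-the-summertime/?replytocom=3642516",
]
to_crawl = [u for u in urls if not should_skip(u)]
print(len(to_crawl))  # 1
```

Because the URL is rejected before it is ever requested, each skipped reply link saves a full page download, which fits the drop from over 10 hours to 4.5 hours reported above.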

      I am currently running the incremental indexer, but after 5 minutes it hasn't progressed past 1%. Is there an option to run the incremental index against the homepage only? All of our content gets displayed on the homepage, so can the software update the index for only the new links that have appeared on a certain page, without following links any deeper?



      • #4
        There are several incremental options in the indexer's menu, under Index / Incremental Indexing.

        If you select the option "Add new or updated pages", you can manually enter a few new links, e.g. just the home page. This should be very quick.

        If you don't tell the indexer which pages are new, it still has a lot of work to do to check whether each page has been updated. It also depends on how your server / WordPress works. Some sites don't return accurate update dates for pages; the check relies on a correct Last-Modified date appearing in the HTTP header.
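As a rough illustration of why the Last-Modified header matters, the sketch below compares a header value against the time a page was last indexed. It is not the indexer's actual logic: `needs_reindex` is a hypothetical helper, and treating a missing or unparseable header as "changed" is an assumption made here to show why pages without accurate dates end up being re-fetched.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def needs_reindex(last_modified_header, last_indexed):
    """Decide whether a page looks newer than the last index run.

    last_modified_header: raw Last-Modified header value, or None if absent.
    last_indexed: timezone-aware datetime of the previous index run.
    """
    if not last_modified_header:
        return True  # no date from the server: assume the page may have changed
    try:
        modified = parsedate_to_datetime(last_modified_header)
    except (TypeError, ValueError):
        return True  # malformed date: cannot trust it, so re-fetch
    if modified.tzinfo is None:
        # Treat a date without zone information as UTC (an assumption).
        modified = modified.replace(tzinfo=timezone.utc)
    return modified > last_indexed


last_run = datetime(2013, 7, 23, 12, 0, tzinfo=timezone.utc)
print(needs_reindex("Tue, 23 Jul 2013 19:53:58 GMT", last_run))  # True
print(needs_reindex("Mon, 22 Jul 2013 19:53:58 GMT", last_run))  # False
```

A site that stamps every response with the current time would make every page look changed under this kind of check, which is one way an incremental run can end up almost as slow as a full one.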
