Not sure why external links are being followed

  • Not sure why external links are being followed

    Hello. I'm using Zoom 5 Enterprise Edition.
    In "More", I have Zoom set up to index and follow all links from http://www.thewebsite.com/, with the base URL set to the same address. I also have
    http://www.thewebsite.com/softwaretemplates/stmain.php set up to index that page and follow only internal links from there, with the base URL set to http://www.thewebsite.com/softwaretemplates/.

    After indexing, the search results show that Zoom also followed the external links on the http://www.thewebsite.com/softwaretemplates/ pages. The results show external links coming from all of the http://www.thewebsite.com/softwaretemplates/ pages, even though I set them up in "More" not to be followed. I don't want any external links on http://www.thewebsite.com/softwaretemplates/ or on http://www.thewebsite.com/softwaretemplates/stmain.php to be followed. I only want internal links, on the site itself, to be followed there. I also ticked "reload all files (do not use cache)" under "Configure" > "General" to see whether that would reset anything and stop it, but it didn't help.

    Please let me know what I'm missing to stop it from following external links on some pages. Thank you very much for your help.

  • #2
    What do these external links look like? Are they just different folders on your site, or completely different domain names? If they are under the same domain, then it might just be that they were indexed from your first start point (which allows links to anything under that domain).
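    To illustrate the internal/external distinction, here is a minimal Python sketch. It assumes a link counts as "internal" when its URL begins with the start point's base URL. That prefix rule is a simplification for illustration, not Zoom's actual code, and the function name `is_internal` is hypothetical.

```python
def is_internal(link: str, base_url: str) -> bool:
    # Simplified rule (assumption): a link is "internal" to a start point
    # when its URL begins with that start point's base URL.
    return link.startswith(base_url)


if __name__ == "__main__":
    site_root = "http://www.thewebsite.com/"
    templates = "http://www.thewebsite.com/softwaretemplates/"
    other_folder = "http://www.thewebsite.com/codescript/"

    # Under the first start point (base URL = site root), every folder on
    # the domain counts as internal, so it can be crawled from there.
    print(is_internal(other_folder, site_root))  # True
    # Under the softwaretemplates start point, the same link is external.
    print(is_internal(other_folder, templates))  # False
    # A different domain is external to both.
    print(is_internal("http://another-site.example/", site_root))  # False
```

    With the site root as the base URL, a different folder on the same domain still counts as internal. That is why such pages can be picked up from the first start point.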

    It would help a lot more if you could give us actual URLs to look at. Better yet, save your index log ("File"->"Save index log to file") and send it to us via e-mail. You should also send us your .ZCFG file with your indexer configuration saved.

    Also tell us exactly which version and build of Zoom you are using (click "Help"->"About" to confirm). The final V5 release is V5.1 build 1017.
    --Ray
    Wrensoft Web Software
    Sydney, Australia
    Zoom Search Engine


    • #3
      It is beta build 15 of the version mentioned above, 5.0 Enterprise. What I mean is that it's following external links to other people's websites. I don't want the search results to include other websites linked from these pages:
      http://www.thewebsite.com/softwaretemplates/
      http://www.thewebsite.com/softwaretemplates/stmain.php

      I sent the .ZCFG file to that e-mail address. Thank you for your help!


      • #4
        Thank you very much for your help, Ray; I got your reply. I set it up that way so I wouldn't have to list a large number of pages in "More" whose external links I wanted followed. I didn't know of a better way to stop it following external links in only some areas, such as the softwaretemplates area of the site (http://www.thewebsite.com/softwaretemplates/), because I do want it to follow external links in most other areas of the site.

        Is there a way to stop it listing external links from the softwaretemplates area without adding so many extra pages to "More"?

        To sum up: I need external links (links to other sites) to be followed on most of the site, but not in a few areas, such as the http://www.thewebsite.com/softwaretemplates/ area. Is there a way to do that without listing so many extra pages in "More"?

        It sounds like you're saying I need to set the main site to follow internal links only, and then list in "More" every section whose external links should also be followed. Since I want external links followed on most of the site, that would mean adding a lot of sections to "More". Please let me know if there is another way or adjustment that avoids adding so many pages to "More". Thank you very much for your insight on this, and on the fastest way to set it up, in "More" or otherwise.


        • #5
          Okay, if you intentionally want to index external sites, then that changes the issue somewhat.

          The first thing to point out is that your first start point renders most of your subsequent start points irrelevant. That is, your very first start point is (domain name replaced for privacy reasons):

          http://www.thewebsite.com/

          This is configured with "Index page and follow internal and external links". This page links to the other parts of your website, including most (if not all) of your subsequent start points, such as these:

          http://www.thewebsite.com/codescript/
          http://www.thewebsite.com/softwaretemplates/
          ... etc.

          This means that when the Indexer crawls the first start point, it will already have indexed these pages and subfolders of the site before it ever gets to the second or third start point. So when it does proceed to those start points, it will skip them, because they have already been indexed.

          Note that the indexer goes through the start point list in a linear fashion. It does not look ahead to later start points and avoid their pages, or anything like that.

          So this is the first point: the way you have those start points set up means most of them aren't used. Most of the site is indexed as part of your first start point.
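          The ordering effect described above can be modelled with a small simulation. This is a minimal Python sketch. It assumes that one "already indexed" set is shared across all start points, and that each start point follows only links under its own base URL. The site map and the `crawl` function are hypothetical, and external links are left out to keep it short.

```python
from collections import deque

# Hypothetical link structure: page URL -> internal links found on that page.
SITE = {
    "http://www.thewebsite.com/": [
        "http://www.thewebsite.com/codescript/",
        "http://www.thewebsite.com/softwaretemplates/",
    ],
    "http://www.thewebsite.com/codescript/": [],
    "http://www.thewebsite.com/softwaretemplates/": [
        "http://www.thewebsite.com/softwaretemplates/stmain.php",
    ],
    "http://www.thewebsite.com/softwaretemplates/stmain.php": [],
}


def crawl(start_points, site):
    """Process start points strictly in list order.

    start_points: list of (start_url, base_url) pairs.
    Returns a dict mapping each indexed URL to the position of the
    start point that indexed it.
    """
    indexed_by = {}  # shared across ALL start points
    for position, (start_url, base_url) in enumerate(start_points):
        queue = deque([start_url])
        while queue:
            url = queue.popleft()
            if url in indexed_by:
                continue  # already indexed earlier, so it is skipped
            indexed_by[url] = position
            for link in site.get(url, []):
                if link.startswith(base_url):
                    queue.append(link)
    return indexed_by


if __name__ == "__main__":
    start_points = [
        ("http://www.thewebsite.com/", "http://www.thewebsite.com/"),
        ("http://www.thewebsite.com/softwaretemplates/stmain.php",
         "http://www.thewebsite.com/softwaretemplates/"),
    ]
    for url, position in crawl(start_points, SITE).items():
        print(position, url)
    # Every page reports position 0: the second start point adds nothing,
    # so its own settings never come into play.
```

          Reversing the two start points in this sketch would let the softwaretemplates page be indexed under its own settings first. In this model, the order of the list decides which settings apply.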

          The second point is this: I don't see any external links being followed from the "/softwaretemplates/" subfolder pages. The only external links being followed are those on the actual start point page, which you have marked with the "index and follow all" option.

          You may be confusing which link is found on which page, because you have multiple spider threads enabled (4). This is set on the "General" tab of the Configuration window. Threading allows several downloads to happen simultaneously. As a result, a link can look as if it came from one page when it was really found on a page several steps earlier. To get a clearer picture of what is happening, change this to "Single-threaded downloading" so that only one file is downloaded at a time.

          If, after all that, you still think external links are being picked up from that particular page, please tell us exactly which link/URL you are looking at. Chances are that URL is actually linked from another page of your site without you realizing it, but we could confirm this.
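          One way to confirm which page actually contains a given link is to parse the pages and list every page whose anchors point at that URL. Below is a minimal Python sketch using the standard-library `html.parser.HTMLParser` and `urllib.parse.urljoin`. The sample pages and the `pages_linking_to` helper are hypothetical. The sketch also assumes you already have each page's HTML; fetching it is not shown.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the absolute URL of every <a href> on one page."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page they appear on.
                self.links.add(urljoin(self.page_url, href))


def pages_linking_to(target, pages):
    """Return the sorted URLs of pages whose HTML links to `target`.

    pages: dict mapping page URL -> HTML text of that page.
    """
    found = []
    for url, html in pages.items():
        collector = LinkCollector(url)
        collector.feed(html)
        collector.close()
        if target in collector.links:
            found.append(url)
    return sorted(found)


if __name__ == "__main__":
    # Hypothetical pages, standing in for real downloaded HTML.
    pages = {
        "http://www.thewebsite.com/":
            '<a href="http://another-site.example/">Partner</a>'
            '<a href="softwaretemplates/">Templates</a>',
        "http://www.thewebsite.com/softwaretemplates/":
            '<a href="stmain.php">Main</a>',
        "http://www.thewebsite.com/softwaretemplates/stmain.php":
            '<a href="../">Home</a>',
    }
    print(pages_linking_to("http://another-site.example/", pages))
    # ['http://www.thewebsite.com/']
```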
          --Ray


          • #6
            OK, I'm trying to get to the bottom of why this is happening. There is one thing I'm still unsure about; maybe you can give me some insight. When a page is set to index and follow all internal and external links, the search finds the words that make up the links on that page. But when a page is set to index only, the search does not find the words that make up its links. For example, say there is a text link called:
            Files Land Software

            When the page is set up to index and follow all internal and external links, the search finds those words on the page.
            But when the page is set up to index only, the search does not find those words on the page.

            I thought the search should still find those words on the page, even when it's not set up to follow the external links on that page.
            After all, it's being asked to index that page only, which I thought meant "read and save all the words on the page for searching."

            If you read that through a few times, you'll probably see why I'm unsure why it isn't saving those words. But if anything I said is unclear and you want me to clarify, please let me know.

            Basically, I'm not sure why, on pages set to index only, it doesn't read and store the link text as words on the page.


            • #7
              I know what you mean, but I can't reproduce the behaviour. When I index a page with "Index single page only" (and do not follow links), the words that form the text link are still indexed. Are you sure it's not skipping the words for other reasons (e.g. a ZOOMSTOP tag that isn't closed by a matching ZOOMRESTART tag)? If the page is online, we can take a look.
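              Here is a rough model of the behaviour described here. Link text is ordinary page text, so extracting words does not depend on whether links are followed, while an unclosed ZOOMSTOP region hides everything after it. This is a minimal Python sketch with a hypothetical `extract_words` helper. It assumes the tags are written as the HTML comments `<!--ZOOMSTOP-->` and `<!--ZOOMRESTART-->`, and it is a simplification, not Zoom's indexer.

```python
from html.parser import HTMLParser


class WordExtractor(HTMLParser):
    """Collect the text of a page, including link (anchor) text.

    Simplified model (assumption): text between <!--ZOOMSTOP--> and
    <!--ZOOMRESTART--> is skipped. Whether links are followed is not an
    input here at all; that setting would only affect which URLs get queued.
    """

    def __init__(self):
        super().__init__()
        self.skipping = False
        self.words = []

    def handle_comment(self, data):
        marker = data.strip().upper()
        if marker == "ZOOMSTOP":
            self.skipping = True
        elif marker == "ZOOMRESTART":
            self.skipping = False

    def handle_data(self, data):
        # Text inside <a>...</a> arrives here like any other text.
        if not self.skipping:
            self.words.extend(data.split())


def extract_words(html):
    parser = WordExtractor()
    parser.feed(html)
    parser.close()
    return parser.words


if __name__ == "__main__":
    page = '<p>Download from <a href="http://files.example/">Files Land Software</a></p>'
    print(extract_words(page))
    # ['Download', 'from', 'Files', 'Land', 'Software']

    unclosed = '<p>Intro</p><!--ZOOMSTOP--><a href="x">Files Land Software</a>'
    print(extract_words(unclosed))
    # ['Intro']  (nothing after the unclosed ZOOMSTOP is kept)

    closed = ('<p>Intro</p><!--ZOOMSTOP--><p>Menu</p><!--ZOOMRESTART-->'
              '<a href="x">Files Land Software</a>')
    print(extract_words(closed))
    # ['Intro', 'Files', 'Land', 'Software']
```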

              I tested this with both V5.1 build 1017 (the final V5 release, mentioned above) and the latest V6 release. If you are using a beta V5 release, you should really update to the final V5.1 release at the very least. We would recommend upgrading to V6 for further improvements and for fixes to various other issues that have since been addressed.
              --Ray
