New user, can't get Indexing to work.

  • New user, can't get Indexing to work.

    I am trying to index my site on Hostmonster and everything I try in the spider
    mode results in a very quick error message "No files found to spider." "Check
    that the URL exists and satisfies the settings in the configuration." I have set
    the authentication and don't know what else to try.

    Thx

    RHans

  • #2
    I assume your site is Hostmonster.com ???

    I can see a few possible problems with this site.

    1) Your pages don't have any file extension. So you have URLs like,
    http://www.hostmonster.com/cgi/help
    instead of
    http://www.hostmonster.com/cgi/help.html
    To fix this up you need to check the box in Zoom to "Scan files with no extensions". This is in the scan options configuration window.

    2) You are using some sub-domains. Like, http://helpdesk.hostmonster.com/.
    By default Zoom will only index a single domain. If you want to index additional sub-domains then you can change the base URL to include all sub-domains, or add additional start points.
    e.g. set the base URL to,
    http://www.hostmonster.com/;http://helpdesk.hostmonster.com/

    3) You switch protocols. So some of your pages are HTTP and some are HTTPS.
    e.g.
    https://www.hostmonster.com/cgi/info/contact_us
    This is a similar problem to switching domains. If you want the encrypted pages to be indexed, then you can change the base URL to include the HTTPS addresses as well, or add additional start points.
    e.g. Add a new start point
    https://www.hostmonster.com/
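To make points 2 and 3 concrete, here is a minimal sketch of why a sub-domain or an HTTPS link falls outside a single start point. It is only an illustration of the idea: the `in_scope` function and its matching rule (same scheme, same host, path under the start point) are assumptions, not Zoom's actual logic.

```python
from urllib.parse import urlsplit


def in_scope(url, start_points):
    """Return True if url has the same scheme and host as one of the
    start points and its path lies under that start point's path.
    Hypothetical rule for illustration, not Zoom's real matching code."""
    u = urlsplit(url)
    for start in start_points:
        s = urlsplit(start)
        same_scheme = u.scheme.lower() == s.scheme.lower()
        same_host = u.netloc.lower() == s.netloc.lower()
        under_path = (u.path or "/").startswith(s.path or "/")
        if same_scheme and same_host and under_path:
            return True
    return False


single = ["http://www.hostmonster.com/"]
print(in_scope("http://www.hostmonster.com/cgi/help", single))
print(in_scope("http://helpdesk.hostmonster.com/", single))
print(in_scope("https://www.hostmonster.com/cgi/info/contact_us", single))

# Adding the extra start points brings the other two URLs into scope.
extended = single + ["http://helpdesk.hostmonster.com/",
                     "https://www.hostmonster.com/"]
print(in_scope("https://www.hostmonster.com/cgi/info/contact_us", extended))
```

With only the first start point, the helpdesk sub-domain and the HTTPS page are both rejected, which is why they need their own start points.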

    But the error you got would seem to indicate that not even a single page was indexed. Possible causes might be a firewall preventing access to the internet, typing in the wrong URL as the start point, your server being down, a robots.txt file preventing access, and many other reasons.
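Of the causes listed above, the robots.txt one is easy to check by hand. The sketch below uses Python's standard `urllib.robotparser` to test a robots.txt text against a URL. The robots.txt contents and the `ExampleSpider` user agent name are made up for illustration; they are not taken from the site or from Zoom.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt contents, for illustration only.
sample_robots = """\
User-agent: *
Disallow: /private/
"""

print(allowed_by_robots(sample_robots, "ExampleSpider",
                        "http://www.example.com/index.html"))
print(allowed_by_robots(sample_robots, "ExampleSpider",
                        "http://www.example.com/private/page.html"))
```

A robots.txt that disallows `/` for every user agent would block the start page itself, which would give a result like the one reported: nothing indexed at all.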

    Maybe you can post the log from the Zoom log window.


    • #3
      Thx for the response.

      My site is nutritionscienceinfo.com hosted by Hostmonster. Sorry for the confusion I created.

      I will try to follow what you suggest as best I understand it; I am not fully literate in all the terminology. (I think you have a wonderful piece of
      software if I can get it working with my limited knowledge and your help.)

      Since I confused you about my site, I am concerned that maybe your response was not truly what I need. I will monitor this forum in case you need to revise what I should try because I confused you about my site. Unless you post a revision to your advice I will try to do what you suggest. I think I understand the gist.

      If I still cannot get it working, I will try to post the log.

      Thx again,

      RHans


      • #4
        I had a chat with the Hostmonster help and they said that they are not blocking spiders.

        I tried another index with the box checked "scan files with no extensions" and still it does not see any files.

        I will try to post the log as long as it does not contain my access codes. I will have to check that first. Should I clear the log and post just a single index attempt?


        • #5
          I cleared the log so that I could better see what it contained. The following are my most recent tries.

          11:50:31 - Start indexing (spider mode) at Mon Nov 07 12:50:31 2011
          12:50:31 - [DOWNLOAD] Downloading file http://www.nutritionscienceinfo.com/index.html
          12:50:31 - [WARNING] Could not download file: http://www.nutritionscienceinfo.com/index.html (Invalid URL or domain name)
          12:50:33 - [ERROR] No files found to spider from http://www.nutritionscienceinfo.com/index.html
          12:50:33 - Indexing failed
          12:51:03 - Start indexing (spider mode) at Mon Nov 07 12:51:03 2011
          12:51:03 - [DOWNLOAD] Downloading file http://www.nutritionscienceinfo.com/index.html/
          12:51:03 - [WARNING] Could not download file: http://www.nutritionscienceinfo.com/index.html/ (Invalid URL or domain name)
          12:51:05 - [ERROR] No files found to spider from http://www.nutritionscienceinfo.com/index.html/
          12:51:05 - Indexing failed
          12:55:56 - Start indexing (spider mode) at Mon Nov 07 12:55:56 2011
          12:55:57 - [DOWNLOAD] Downloading file http://www.nutritionscienceinfo.com/public.html/index.html
          12:55:57 - [WARNING] Could not download file: http://www.nutritionscienceinfo.com/public.html/index.html (Invalid URL or domain name)
          12:56:00 - [ERROR] No files found to spider from http://www.nutritionscienceinfo.com/public.html/index.html
          12:56:00 - Indexing failed
          12:56:08 - Start indexing (spider mode) at Mon Nov 07 12:56:08 2011
          12:56:08 - [DOWNLOAD] Downloading file http://www.nutritionscienceinfo.com/public.html/index.html/
          12:56:08 - [WARNING] Could not download file: http://www.nutritionscienceinfo.com/public.html/index.html/ (Invalid URL or domain name)
          12:56:11 - [ERROR] No files found to spider from http://www.nutritionscienceinfo.com/public.html/index.html/
          12:56:11 - Indexing failed
          12:58:30 - Start indexing (spider mode) at Mon Nov 07 12:58:30 2011
          12:58:30 - [DOWNLOAD] Downloading file http://www.nutritionscienceinfo.com/home2/nutritl5/public.html/index.html/
          12:58:30 - [WARNING] Could not download file: http://www.nutritionscienceinfo.com/home2/nutritl5/public.html/index.html/ (Invalid URL or domain name)
          12:58:32 - [ERROR] No files found to spider from http://www.nutritionscienceinfo.com/home2/nutritl5/public.html/index.html/
          12:58:32 - Indexing failed
          12:59:25 - Start indexing (spider mode) at Mon Nov 07 12:59:25 2011
          12:59:25 - [DOWNLOAD] Downloading file http://www.nutritionscienceinfo.com/home2/nutritl5/public_html/index.html/
          12:59:25 - [WARNING] Could not download file: http://www.nutritionscienceinfo.com/home2/nutritl5/public_html/index.html/ (Invalid URL or domain name)
          12:59:27 - [ERROR] No files found to spider from http://www.nutritionscienceinfo.com/home2/nutritl5/public_html/index.html/
          12:59:27 - Indexing failed
          Last edited by Richard; Nov-07-2011, 06:25 PM.
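A log like this is easier to read once the failed URLs are pulled out on their own. The following is a rough sketch that assumes the `[WARNING] Could not download file: URL (reason)` line shape seen in the log above; the format is inferred from this thread only, not from any Zoom documentation, and `failed_downloads` is a hypothetical helper name.

```python
import re

# Line shape inferred from the log in this thread (an assumption).
WARNING_RE = re.compile(
    r"\[WARNING\] Could not download file: (\S+) \((.+)\)$"
)


def failed_downloads(log_text):
    """Return a list of (url, reason) pairs for every failed download."""
    failures = []
    for line in log_text.splitlines():
        match = WARNING_RE.search(line.strip())
        if match:
            failures.append((match.group(1), match.group(2)))
    return failures


sample_log = """\
12:50:31 - [DOWNLOAD] Downloading file http://www.nutritionscienceinfo.com/index.html
12:50:31 - [WARNING] Could not download file: http://www.nutritionscienceinfo.com/index.html (Invalid URL or domain name)
12:50:33 - [ERROR] No files found to spider from http://www.nutritionscienceinfo.com/index.html
"""

for url, reason in failed_downloads(sample_log):
    print(url, "->", reason)
```

Run over the full log, this would show that every attempt failed with the same reason, including the one URL that is valid, which points away from the URL itself as the cause.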


          • #6
            Testing http://www.nutritionscienceinfo.com/index.html in Zoom, I was able to index it successfully. The other URLs are invalid. You can test this by entering each URL into a browser; they bring up a 404 page (Page not found).

            If you would like us to investigate further, please send in your index log and your .zcfg file to 'zoom at wrensoft.com'
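The invalid URLs in the log look like server file paths (`public_html`, `/home2/nutritl5/...`) pasted into the address. On typical shared hosting the web server's document root does not appear in the public URL. The sketch below shows that mapping; the document root `/home2/nutritl5/public_html` is only read off the attempted URLs in the log and is an assumption about this account, and `path_to_url` is a hypothetical helper.

```python
from pathlib import PurePosixPath


def path_to_url(file_path, doc_root, base_url):
    """Map a file under the server's document root to its public URL.
    Raises ValueError if file_path is not inside doc_root."""
    relative = PurePosixPath(file_path).relative_to(doc_root)
    return base_url.rstrip("/") + "/" + relative.as_posix()


# Assumed document root, taken from the URLs tried in the log.
print(path_to_url("/home2/nutritl5/public_html/index.html",
                  "/home2/nutritl5/public_html",
                  "http://www.nutritionscienceinfo.com/"))
```

The directory names before and including `public_html` drop out, leaving the one URL that worked in our test.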


            • #7
              So as Richard pointed out only the first URL is valid.
              http://www.nutritionscienceinfo.com/index.html

              As we can index the site from here, and view your site in a browser, the problem is most likely not with your web site. It is more likely that the problem is on your PC.

              This error,
              Could not download file ... Invalid URL or domain name
              has several possible causes.

              A) Your internet connection isn't working reliably (e.g. a bad wireless link).
              B) You have a firewall on your PC blocking Zoom from getting to the internet.
              C) You have some other type of "security" software preventing access to the internet from Zoom.
              D) Bad local routing tables, so www.nutritionscienceinfo.com doesn't actually go to the live internet site. Maybe you had some local routing set up for testing against a local server.
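For cause D, one common way a hostname ends up pointing somewhere other than the live site is a leftover entry in the local hosts file (on Windows, `C:\Windows\System32\drivers\etc\hosts`). Whether that is what happened here is an assumption. This sketch scans hosts-file text for entries that override a given hostname; the sample contents and the `hosts_overrides` name are made up for illustration.

```python
def hosts_overrides(hosts_text, hostname):
    """Return the IP addresses that hosts_text maps hostname to."""
    wanted = hostname.lower()
    matches = []
    for raw_line in hosts_text.splitlines():
        # Drop comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], fields[1:]
        if wanted in (name.lower() for name in names):
            matches.append(ip)
    return matches


# Hypothetical hosts file contents, for illustration only.
sample_hosts = """\
# sample hosts file
127.0.0.1     localhost
192.168.1.50  www.nutritionscienceinfo.com   # left over from local testing
"""

print(hosts_overrides(sample_hosts, "www.nutritionscienceinfo.com"))
print(hosts_overrides(sample_hosts, "us.cnn.com"))
```

An empty list for the site's hostname would rule this particular cause out, leaving the connection, firewall, and security software to check.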

              As a test can you try indexing another site with Zoom.
              e.g.
              www.cnn.com


              • #8
                My internet connection seems quite fast and is very reliable.

                I changed the firewall settings to allow Zoom to go thru and rebooted also.

                I will check the security software but have not had any indication of it blocking anything I have tried to do before. (I think this or the like is the problem in any case.)

                I do not know what a local routing table is. This is a stand alone computer.

                I tried indexing www.cnn.com and get the same message about files not found.

                I have yet to do the test per your 1:35 post.

                I think I have sent you the files you have asked for, if not, let me know and
                I will try again.


                • #9
                  Turns out www.cnn.com wasn't a good test site. Sorry about that. They don't have a page at this address, their real site is at,
                  http://us.cnn.com/
                  So try that URL instead (don't forget to include the http:// part of the address).

                  Also try turning off your firewall entirely for a few minutes as an experiment.


                  • #10
                    I tried the new us.cnn address and get the same error message.

                    I have lost computers in the past to viruses, so I probably am not going
                    to be able to risk turning off the firewall completely, but I have sent a mail to my
                    son (who worked at Microsoft) and asked him about the risk of that.

                    I will continue to work with the firewall, however, which I think is probably the
                    problem in the 64 bit version of windows 7 I have.


                    • #11
                      I am running F-prot antivirus and tried turning that off and that made no
                      difference.


                      • #12
                        I also went to the control panel again and enabled all four (?) of the Zoom search indexer entries shown in the panel to go through the firewall. Is the fact
                        that Zoom is shown four times significant?

                        Still did not work, in any case.


                        • #13
                          There aren't 4 indexers. You are confused.
                          There is only the 32bit version and the 64bit version.
                          Most people (even on a 64bit O/S) will be running the 32bit release.

                          There is no problem with 64bit Windows. It has been around for many years. Most of our new customers are on 64bit now, and all our internal development machines are 64bit.

                          Your web site is fine. So it really only leaves the possibility of some software on your PC blocking internet access.

                          If your son set up the security on your PC, then maybe get him to have a look at it if you don't feel comfortable with it.


                          • #14
                            I didn't say there were four indexers. What I said was that four of what appear
                            to be the Indexer are listed in the firewall list, and I asked you what that means
                            and whether it could be a symptom of the problem.

                            I am under the impression that I am not running the 64 bit version of Zoom. Is
                            that correct and what version can I run with my 64 bit system?

                            I am running version 6 build 1027. Is that compatible with my 64 bit system?

                            I am still awaiting input from my son on the firewall.


                            • #15
                              There is no problem with 64bit Windows. It has been around for many years. Most of our new customers are on 64bit now, and all our internal development machines are 64bit. We fully support 64bit with V6 of Zoom and have done for years. IMHO the problem is unrelated to 32bit/64bit.
