Can only index start page


  • #1

    Hi

    I am trying to index a site I am just finishing. Until now I had been using Zoom 6 Pro.

    I tried to index this new site and kept having problems: Zoom always indexed only the start page. So I upgraded to Zoom version 7, but the problem is still the same. Even though the site has quite a few pages, Zoom indexes only the index.php page.

    Thank you for any suggestions that might help.

    Regards
    Jörg

  • #2
    Can you E-mail or post the index log so we can see what happened?



    • #3
      My index log reads like this:

      08:38:28 - Zoom Search Engine Indexer (Professional Edition)
      08:38:28 - Version 7.0 (Build: 100 on Windows 7
      08:38:28 - Copyright Wrensoft 2000-2014 (http://www.wrensoft.com/)
      08:38:28 - Plugin for PDF files found. PDF file support enabled.
      08:38:28 - Plugin for WPD files found. WPD file support enabled.
      08:38:28 - Plugin for SWF files found. SWF file support enabled.
      08:38:28 - Plugin for RTF files found. RTF file support enabled.
      08:38:28 - Plugin for DjVu files found. DjVu file support enabled.
      08:38:28 - Plugin for MP3 files found. MP3 file support enabled.
      08:38:28 - Plugin for DWF files found. DWF file support enabled.
      08:38:28 - Plugin for Office 2007 files found. DOCX, XLSX, and PPTX file support enabled.
      08:38:28 - Plugin for OpenDocument Text files found. ODT file support enabled.
      08:38:28 - Plugin for MHT files found. MHT file support enabled.
      08:38:28 - Plugin for image files found. Image file support enabled. (Using Exif)
      08:38:28 - Plugin for video files found. Video file support enabled.
      08:38:29 - Config file loaded: C:\Users\jmatter\Documents\Wrensoft\Zoom Search Engine Indexer\zoom.zcfg
      08:38:48 - Config file saved: C:\Users\jmatter\Documents\Wrensoft\Zoom Search Engine Indexer\zoom.zcfg
      08:38:48 - Core Engine: Version 7.0 (Build: 100 on Windows 7
      08:38:48 - Config file loaded: C:\Users\jmatter\Documents\Wrensoft\Zoom Search Engine Indexer\zoom.zcfg
      08:38:49 - Maximum number of files: 50000
      08:38:49 - Maximum file size: 2097152
      08:38:49 - Will scan files with extensions
      08:38:49 - .htm
      08:38:49 - .html
      08:38:49 - .txt
      08:38:49 - .php
      08:38:49 - .asp
      08:38:49 - .cgi
      08:38:49 - .aspx
      08:38:49 - .pl
      08:38:49 - .php3
      08:38:49 - .pdf
      08:38:49 - Spider from: http://www.krimitage.ch/dev/
      08:38:49 - Web site URL: http://www.krimitage.ch/dev/
      08:38:49 - Estimated RAM required during index process: 569233 KB
      08:38:49 - [DOWNLOAD] Downloading robots.txt file found at http://www.krimitage.ch/robots.txt
      08:38:49 - Initiating HTTP session (thread #1) ...
      08:38:49 - Initiating HTTP session (thread #2) ...
      08:38:49 - [DOWNLOAD] Downloading file http://www.krimitage.ch/dev/
      08:38:49 - Initiating HTTP session (thread #3) ...
      08:38:49 - Initiating HTTP session (thread #4) ...
      08:38:49 - Initiating HTTP session (thread #5) ...
      08:38:49 - Initiating HTTP session (thread #6) ...
      08:38:49 - [INDEXED] Indexing http://www.krimitage.ch/dev/
      08:38:50 - [FILEIO] All index files will be written to: F:\Krimitage\dev\zoom
      08:38:50 - [FILEIO] Writing index data for PHP search... (Please wait)
      08:38:50 - [FILEIO] Created pagedata data file (zoom_pagedata.zdat)
      08:38:50 - [FILEIO] Created pagetext data file (zoom_pagetext.zdat)
      08:38:50 - [FILEIO] Created pageinfo data file (zoom_pageinfo.zdat)
      08:38:50 - [FILEIO] Created dictionary data file (zoom_dictionary.zdat)
      08:38:50 - [FILEIO] Created wordmap data file (zoom_wordmap.zdat)
      08:38:50 - [FILEIO] Created script settings file (settings.php)
      08:38:50 - Indexing completed at Fri Aug 29 08:38:50 2014
      08:38:50 - INDEX SUMMARY
      08:38:50 - Files indexed: 1
      08:38:50 - Files skipped: 37
      08:38:50 - Files filtered: 0
      08:38:50 - Files downloaded: 1
      08:38:50 - Emails indexed: 0
      08:38:50 - Unique words found: 238
      08:38:50 - Variant words found: 33
      08:38:50 - Total words found: 276
      08:38:50 - Avg. unique words per page: 238.00
      08:38:50 - Avg. words per page: 276
      08:38:50 - Peak physical memory used: 35 MB
      08:38:50 - Peak virtual memory used: 129 MB
      08:38:50 - Errors: 0
      08:38:50 - URLs visited by spider: 1
      08:38:50 - URLs in spider queue: 0
      08:38:50 - Total bytes scanned/downloaded: 6885
      08:38:50 - File extensions:
      08:38:50 - .htm indexed: 0
      08:38:50 - .html indexed: 0
      08:38:50 - .txt indexed: 0
      08:38:50 - .php indexed: 0
      08:38:50 - .asp indexed: 0
      08:38:50 - .cgi indexed: 0
      08:38:50 - .aspx indexed: 0
      08:38:50 - .pl indexed: 0
      08:38:50 - .php3 indexed: 0
      08:38:50 - .pdf indexed: 0
      08:38:50 - Cleaning up memory used for index data... please wait.
      08:38:50 - Finished cleaning up memory.
      08:38:50 - [FILEIO] Copied search script to: F:\Krimitage\dev\zoom\search.php
      08:38:50 - [FILEIO] Successfully created all required files



      • #4
        Oh, sorry, I just realized that "Enable robots.txt" was checked, and our robots.txt disallows the "dev" folder for spiders.
        When I unchecked that option, Zoom indexed all the pages.
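        For anyone seeing the same symptom: a rule like the one below is enough to keep every compliant spider out of the /dev/ folder. This is a hypothetical example, since the actual krimitage.ch robots.txt was not posted. With "Enable robots.txt" checked, Zoom obeys such a rule, which fits the log above: only the start URL itself was indexed, and the links from it were presumably among the 37 skipped files.

```
# Hypothetical robots.txt served at http://www.krimitage.ch/robots.txt
User-agent: *
Disallow: /dev/
```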

        Sorry for being stupid.

        But I still have one question. When I run a search, I get the following error:

        Warning: Cannot modify header information - headers already sent by (output started at /daten/kunden/vhosts/krimitage.ch/httpdocs/dev/krimisuche.php:9) in /daten/kunden/vhosts/krimitage.ch/httpdocs/dev/zoom/search.php on line 103

        What's going wrong here?

        Regards
        Jörg



        • #5
          The search.php page is expecting to be able to send headers. But if you have "included" search.php within another script (krimisuche.php?) then it would not be able to do so. You will have to change the setting under "Configure"->"Advanced" and check the option to 'Disable charset enforcing on search script'.

          This is mentioned in the steps to include search.php within other PHP pages, here:
          http://www.wrensoft.com/zoom/support/faq_ssi.html
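          The underlying PHP rule is that header() only works before any output has been sent. Once the including page has printed something (here, from line 9 of krimisuche.php), a later header() call fails with exactly this warning. A minimal sketch of the logic, assuming the "Disable charset enforcing" option simply skips the header call; send_search_headers() and $enforceCharset are hypothetical names, not Zoom's actual code:

```php
<?php
// Hypothetical sketch: send_search_headers() and $enforceCharset are
// illustrative names, not the actual code in Zoom's search.php.
function send_search_headers($charset, $enforceCharset)
{
    if (!$enforceCharset) {
        // Assumed effect of "Disable charset enforcing on search script":
        // the script leaves HTTP headers to the page that includes it.
        return false;
    }
    if (headers_sent()) {
        // The including page (e.g. krimisuche.php) has already sent output.
        // Calling header() at this point is what produces the warning
        // "Cannot modify header information - headers already sent".
        return false;
    }
    // Nothing has been output yet, so the header can still be set.
    header('Content-Type: text/html; charset=' . $charset);
    return true;
}
```

          An included search script would run with $enforceCharset set to false, while a standalone search page could keep it true.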
          --Ray
          Wrensoft Web Software
          Sydney, Australia
          Zoom Search Engine



          • #6
            Thanks Ray

            That did the trick.

            Now I have run into another problem. Sorry, but I couldn't find an answer in the forum.

            When I enter more than one word in the search form, the following error message appears:

            Warning: parse_url() expects exactly 1 parameter, 2 given in /daten/kunden/vhosts/krimitage.ch/httpdocs/dev/zoom/search.php on line 2770

            Do you always have to wrap the terms in double quotes when searching for more than one word, not only for exact phrases?



            • #7
              No, you don't need quotes.

              I did a two word search on your site, but didn't see any errors.
              Have you already fixed the problem? Or maybe it only happens with particular words?



              • #8
                Yes: some combinations of words produce no error, others do. Sometimes even a single search term triggers it. Try, for instance, "Ngugi", "Dagmar" or "Sabine". "Zur Linde" produces over a dozen of these errors.
                Last edited by joe_ma; Aug-29-2014, 11:10 AM.



                • #9
                  OK, so I had a look at what's happening at line 2770. I found these lines:

                  if(!defined('PHP_URL_HOST')) define('PHP_URL_HOST', 2); // This define is missing in PHP4.

                  while ($arrayline < $matches && $div_res_count < $MaxDomainDiversity)
                  {
                      $pageUrlBuffer = GetUrlFromPageData($output[$arrayline][0]);
                      $domainBuffer = parse_url($pageUrlBuffer, PHP_URL_HOST);

                  Since the warning said that only one parameter is allowed, I commented out two lines and added a modified version of one of them:

                  // if(!defined('PHP_URL_HOST')) define('PHP_URL_HOST', 2); // This define is missing in PHP4.

                  while ($arrayline < $matches && $div_res_count < $MaxDomainDiversity)
                  {
                      $pageUrlBuffer = GetUrlFromPageData($output[$arrayline][0]);
                      // $domainBuffer = parse_url($pageUrlBuffer, PHP_URL_HOST);
                      $domainBuffer = parse_url($pageUrlBuffer);

                  With these modifications there are no more warning messages.

                  But please tell me if this is likely to cause other problems.
                  Last edited by joe_ma; Aug-29-2014, 03:18 PM.



                  • #10
                    What version of PHP is installed on your server?

                    It looks like it might be a pretty old version. Before PHP 5.1.2, parse_url() accepted only one parameter; the optional second parameter (such as PHP_URL_HOST) was added in 5.1.2.

                    A quick work-around for this particular error is to turn off the "Ensure domain diversity" feature in the "Search page" configuration window in Zoom. You shouldn't need to edit the code, and the edit you made would break this feature in any case.
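                    Why the edit breaks it: called with a single argument, parse_url() returns an associative array of URL parts rather than the host string, so $domainBuffer no longer holds a plain domain name for the diversity check to compare. A minimal sketch of a version-independent alternative, assuming the diversity code only needs the host name; zoom_url_host() is a hypothetical helper, not Wrensoft's actual fix:

```php
<?php
// Hypothetical helper, not Wrensoft's actual fix: returns the host part of a
// URL using only the one-argument form of parse_url(), which PHP 4 supports.
function zoom_url_host($url)
{
    // With one argument, parse_url() returns an associative array of URL
    // parts, or false if the URL is too malformed to parse.
    $parts = parse_url($url);
    if (is_array($parts) && isset($parts['host'])) {
        return $parts['host'];
    }
    // No host component, e.g. a relative URL such as "/dev/index.php".
    return '';
}
```

                    Inside the quoted loop, the line would then read $domainBuffer = zoom_url_host($pageUrlBuffer); so that $domainBuffer stays a host name string, as the original two-argument call intended.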

                    Depending on how old your version of PHP is we might update the code to make it compatible with the older PHP release.



                    • #11
                      Yes, indeed, it is a very old version of PHP: 4.3.10-22

                      Your quick work-around works nicely, so there is no need to update the code, thank you. Besides, this is the last time we are using this server.

                      Thank you very much for your precious help.

                      Kind regards
                      Jörg



                      • #12
                        This has been fixed for the next release. Thanks for bringing it to our attention.
                        --Ray

