Stats After Indexing Just One PDF..Normal?


  • Stats After Indexing Just One PDF..Normal?

    I just downloaded the latest version of ZOOM. I indexed JUST one .PDF file that has 61 pages.

    Here is the indexing status:

    13:49:40 - Start indexing (offline mode) at Sun Apr 20 13:49:40 2008
    13:49:40 - Maximum number of words: 50000
    13:49:40 - Maximum number of files: 1000
    13:49:40 - Will scan files with extensions
    13:49:40 - .htm
    13:49:40 - .html
    13:49:40 - .php
    13:49:40 - .asp
    13:49:40 - .cfm
    13:49:40 - .aspx
    13:49:40 - .php3
    13:49:40 - .php4
    13:49:40 - .txt
    13:49:40 - .pdf
    13:49:40 - Search root directory: F:\!zoom just one file
    13:49:40 - Web site URL: http://www.internet-marketing-public-library.org/
    13:49:40 - Estimated RAM required during index process: 25733 KB
    13:49:40 - [PLUGIN] Processing PDF file F:\!zoom just one file\24-hour.pdf
    13:49:41 - [INDEXED] Indexing F:\!zoom just one file\24-hour.pdf
    13:49:41 - [FILEIO] All index files will be written to: F:\!zoom_output
    13:49:41 - [FILEIO] Writing index data for PHP search... (Please wait)
    13:49:41 - [FILEIO] Created pagedata data file (zoom_pagedata.zdat)
    13:49:41 - [FILEIO] Created pagetext data file (zoom_pagetext.zdat)
    13:49:41 - [FILEIO] Created pageinfo data file (zoom_pageinfo.zdat)
    13:49:41 - [FILEIO] Created dictionary data file (zoom_dictionary.zdat)
    13:49:41 - [FILEIO] Created wordmap data file (zoom_wordmap.zdat)
    13:49:41 - [FILEIO] Created script settings file (settings.php)
    13:49:41 - Indexing completed at Sun Apr 20 13:49:41 2008
    13:49:41 - INDEX SUMMARY
    13:49:41 - Files indexed: 1
    13:49:41 - Files skipped: 2
    13:49:41 - Files filtered: 0
    13:49:41 - Files downloaded: 0
    13:49:41 - Unique words found: 2312
    13:49:41 - Total words found: 13934
    13:49:41 - Avg. unique words per page: 2312.00
    13:49:41 - Avg. words per page: 13934
    13:49:41 - Start index time: 13:49:40 (2008/04/20)
    13:49:41 - Elapsed index time: 00:00:01
    13:49:41 - Errors: 0
    13:49:41 - Total bytes scanned/downloaded: 277782
    13:49:41 - File extensions:
    13:49:41 - .htm indexed: 0
    13:49:41 - .html indexed: 0
    13:49:41 - .php indexed: 0
    13:49:41 - .asp indexed: 0
    13:49:42 - .cfm indexed: 0
    13:49:42 - .aspx indexed: 0
    13:49:42 - .php3 indexed: 0
    13:49:42 - .php4 indexed: 0
    13:49:42 - .txt indexed: 0
    13:49:42 - .pdf indexed: 1
    13:49:42 - No extensions indexed: 0
    13:49:42 - Cleaning up memory used for index data... please wait.
    13:49:42 - Finished cleaning up memory.


    How is it possible that the total word count is the same as the average words per page?

    Thanks

    Roger Pilon, Librarian
    Internet Marketing Public Library | User Friendly Repository
    Last edited by David; Apr-20-2008, 10:54 PM. Reason: spelling errors

  • #2
    Zoom calls all documents pages.
    So,
    1 HTML page is 1 page
    1 JPG file is 1 page
    1 XML file is 1 page
    1 PDF file is 1 page (regardless of the size or number of internal pages)

    We thought 'page' was the best term to use because the most common usage scenario is indexing HTML pages. So the stats are correct: with only one file indexed, there is one 'page', and the average per page equals the total.
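
To make the arithmetic concrete, here is a minimal sketch of how the two averages in the summary come out when every indexed file counts as one page. The function name `index_averages` is hypothetical and is not part of Zoom; the figures are taken from the log above.

```python
def index_averages(total_words, unique_words, pages_indexed):
    """Return (avg unique words per page, avg words per page).

    'pages_indexed' follows the convention described above: each indexed
    file counts as one page, however many internal pages it contains.
    This is an illustrative helper, not Zoom's actual code.
    """
    if pages_indexed == 0:
        return 0.0, 0.0
    return unique_words / pages_indexed, total_words / pages_indexed


# Figures from the index summary: one PDF indexed, so one "page".
avg_unique, avg_total = index_averages(
    total_words=13934, unique_words=2312, pages_indexed=1
)
print(f"Avg. unique words per page: {avg_unique:.2f}")  # 2312.00
print(f"Avg. words per page: {avg_total:.0f}")          # 13934

# Dividing by the PDF's 61 internal pages instead would give a
# per-internal-page figure, which is not what the summary reports.
_, per_internal_page = index_averages(13934, 2312, 61)
print(f"Words per internal PDF page: {per_internal_page:.2f}")
```

Because the divisor is the number of indexed files rather than the number of pages inside the PDF, a single-file index will always show averages equal to the totals.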



    • #3
      A Tiny Suggestion!

      Hi!

      Maybe using the word "document" rather than "page" would be less confusing!

      Just my $0.02!

      Roger Pilon, Librarian
      Internet Marketing Public Library | Internet Marketers Repository



      • #4
        'File' might in fact be the best term. 'Document' implies a Word or PDF document, while 'web page' is the commonly used term for HTML files. You don't hear people talking about 'web documents' when they are referring to HTML files.

        But 'File' doesn't make any sense for web pages that have no file, like all the dynamically generated pages that are created on the fly by scripts.
