Indexing issue - index size and skipping files


    I'm trying out the free version to see if it will work for my needs.

    Our site has many folders. Each folder contains between 10 and 15 files. When I run the indexer, it indexes some files in each folder but skips others. The files it skips are the same ones in each folder.

    Also, we update our site every day, and it is approximately 15,000 pages. We would have to reindex daily and upload the new index files to the server. Since this would be a daily task, I need to know approximately how large the index files would be.

    Thanks
    Rich

  • #2
    If you turn on verbose mode in Zoom, it will tell you why files are being skipped. Common causes are also listed here:
    http://www.wrensoft.com/zoom/support...s.html#skipped

    The size of your index files depends on the type of content you are indexing. If you are indexing binary documents that contain a lot of formatting, images or diagrams, you will get very good compression (an 80-90% size reduction from the original documents). If you are indexing HTML files, you'll get perhaps a 60-75% size reduction. If you are indexing plain text files with no formatting or markup, the reduction might only be 40-50%.
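To turn these percentages into a rough daily upload size, here is a small Python sketch. The 15,000-page figure comes from the question; the 20 KB average page size is a hypothetical placeholder (measure your own pages), and the 60-75% range is the rough HTML estimate above, not a guarantee of what Zoom will produce.

```python
def estimate_index_mb(pages: int, avg_page_kb: float, reduction: float) -> float:
    """Rough index size in MB, given a fractional size reduction (0.6 means 60%)."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in the range [0, 1)")
    original_kb = pages * avg_page_kb
    # The index keeps (1 - reduction) of the original bytes.
    return original_kb * (1.0 - reduction) / 1024.0


PAGES = 15_000        # page count from the question
AVG_PAGE_KB = 20.0    # hypothetical average page size; replace with a measured value

for reduction in (0.60, 0.75):  # the rough range quoted for HTML content
    size = estimate_index_mb(PAGES, AVG_PAGE_KB, reduction)
    print(f"{reduction:.0%} reduction -> about {size:.1f} MB")
# 60% reduction -> about 117.2 MB
# 75% reduction -> about 73.2 MB
```

The estimate scales linearly with the assumed page size, so doubling AVG_PAGE_KB doubles the result.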

    Do you have a Windows server? Maybe you could run Zoom directly on the server and avoid the upload?

    -----
    David


    • #3
      With verbose on, the files don't even appear in the list. I guess "skipped" is the wrong term. The files aren't even being seen.

      The files are HTML files named sports.html, sports1.html, sports2.html, etc. When indexing, it goes through the folder and indexes certain files, then moves on to the next folder as if the sports files don't exist.

      Thanks


      • #4
        Are you indexing in offline mode or Spider mode?

        In Spider mode, files need to be linked in order to be found. Are the files that are missed linked to from another file (which is in turn linked to from your start point)?
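To see why unlinked files are invisible in Spider mode, here is a generic breadth-first crawl over a hypothetical in-memory link map. This is not Zoom's actual code, just an illustration of link following: only pages reachable from the start page are ever visited, so files on disk that nothing links to are never found.

```python
from collections import deque


def spider(start: str, links: dict[str, list[str]]) -> list[str]:
    """Breadth-first crawl: return pages reachable from start, in visit order."""
    seen = {start}
    order = [start]
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                order.append(target)
                queue.append(target)
    return order


# Hypothetical site: the sports*.html files exist on disk, but no page links to them.
site_files = ["index.html", "news.html", "scores.html",
              "sports.html", "sports1.html", "sports2.html"]
links = {
    "index.html": ["news.html", "scores.html"],
    "news.html": ["index.html"],
}

found = spider("index.html", links)
missed = [f for f in site_files if f not in found]
print("indexed:", found)        # indexed: ['index.html', 'news.html', 'scores.html']
print("never seen:", missed)    # never seen: ['sports.html', 'sports1.html', 'sports2.html']
```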

        Is your site available online where we can see it?

        ------
        David


        • #5
          Thanks for your help. I'm indexing in Spider mode, and the files are not linked from the index.html file, which is why they are not being seen.

          Is there a way to set it up to index every file in a directory rather than spidering through the links on the index page?

          Thanks


          • #6
            You could look at using offline mode (as offline mode doesn't follow links).

            Alternatively, some web servers will return a directory listing as an HTML page (with a link to each file) when you request the directory's URL.

            For example, here is our image directory:
            http://www.wrensoft.com/images/
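For illustration only (not Zoom's implementation), a Python sketch of the two approaches above: walking the folder directly, the way offline mode finds files without following links, and generating a plain HTML page that links to every file, similar to a server directory listing, which a spider could then start from. The function names html_files and listing_page are hypothetical.

```python
from html import escape
from pathlib import Path
from urllib.parse import quote


def html_files(root: Path) -> list[Path]:
    """Every .html file under root, found by walking the file system (no link following)."""
    return sorted(p for p in root.rglob("*.html") if p.is_file())


def listing_page(root: Path) -> str:
    """Build a minimal HTML page with one link per .html file under root."""
    items = []
    for p in html_files(root):
        rel = p.relative_to(root).as_posix()
        # URL-encode the href (spaces etc.), HTML-escape the visible text.
        items.append(f'<li><a href="{quote(rel)}">{escape(rel)}</a></li>')
    return "<html><body><ul>\n" + "\n".join(items) + "\n</ul></body></html>\n"
```

For example, listing_page(Path("site")) returns a page linking sports.html, sports1.html and so on for a local copy of the site in a hypothetical site folder; uploading such a page and linking to it from the start page is one way to make those files reachable in Spider mode.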

            ------
            David
