Modify zoom_pagedata.zdat after indexing?


  • Modify zoom_pagedata.zdat after indexing?

    Hi,

    I'm finding that I can't create the kind of vertical search engine index I want using base URLs, skip options and filtering alone.

    It seems my last hope is to strip out certain URL patterns from the zoom_pagedata.zdat file using a text editor like TextMage. Every time I try, though, the file seems to get corrupted and the search engine stops working correctly.

    Is it possible to manipulate the zoom_pagedata.zdat after indexing in a way that doesn't corrupt the file?

    Thanks,
    Cory

  • #2
    Is it possible to manipulate the zoom_pagedata.zdat [file] after indexing
    No, it isn't.
    It isn't a flat text file; it is effectively part of a database.

    It is better to apply the filter options in the user interface.



    • #3
      Perhaps you can elaborate on what you are trying to do and where you are struggling. There are a lot of options available: support for multiple base URLs (per start point), use of multiple start points, limits per start point, etc.
      --Ray
      Wrensoft Web Software
      Sydney, Australia
      Zoom Search Engine



      • #4
        Hi Ray,

        Thanks for the quick reply. As I've discussed with your colleague over the past few weeks, I'm building a vertical search engine that indexes about 100 different event-related sites, all with different site structures.

        The core problem is that the crawler has to scan certain types of pages in order to reach all the pages I want to index. For example, I only want to index the event details pages, not the event category pages, but the details pages are found through the category pages. If I add those undesirable pages to the skip list, I won't find all the pages I want. That leaves thousands of pages that I then want to remove from the index.

        It's not practical for me to add thousands of start points, and even if I did, that wouldn't necessarily solve all my problems. For version 1 of the site, I believe I'll have to find a way to remove pages from the index after indexing. If it's not possible to edit zoom_pagedata.zdat programmatically as a text file, then I can quickly remove the unnecessary pages using the "delete from index" feature in Zoom Search.

        However, the new problem is that whenever I delete pages from within Zoom, the program creates a zoom_deletepages.inc.tmp file that never goes away. Subsequent attempts to reindex my start points result in the application hanging.

        Any ideas on why that might be happening?

        Thanks,
        Cory



        • #5
          Yes, you should be using the "Delete pages from existing index" feature in Zoom, instead of manually editing any of the .zdat files in an external program.

          We have confirmed that there is a bug that leaves the "zoom_deletepages.inc.tmp" file behind. This is just a temporary file, however, and it has no other side effects. You can simply delete it after the task or ignore it. The next build will remove the file automatically.

          We'll also look into the problem with subsequent indexing runs hanging.
          --Ray



          • #6
            Hi Ray,

            Thanks for confirming the bug. However, it appears that the bug does more than simply leave the TMP file behind. As I suspected, and have confirmed with testing, the pages I've tried to delete aren't actually deleted.

            After I delete the pages and re-upload all the Zoom files to the site, the deleted pages are still searchable. Further, when I reopen the configuration file, I can find the deleted pages again, which shows they were never deleted.

            Please advise.

            Thanks,
            Cory



            • #7
              Cory, you're right: we've confirmed that there is yet another bug that impairs the delete pages function. It is partly a race condition: if the delete operation (or the page load operation) happens too quickly, the process gets into a jumbled state.

              We're working on this now and it should be addressed for the next build. Sorry for the inconvenience.
              --Ray
