Error: access violation 4.2 build 1010


  • Error: access violation 4.2 build 1010

    I keep getting this error at the end of indexing a localhost website of about 125,000 pages

    (but in the near future I will have to index around 5,000,000 small .html files of 16k each):

    "Error: access violation at 0X7c920f29 (tried to read from 0X000000000) Program terminated."

    I am still using Zoom version 4.2 build 1010.
    Hardware: Pentium 4 (Socket 478) CPU at 3000 MHz, ASUS P4P800-X, 2GB DDR RAM, RAID 0 on two Maxtor 160 SATA disks.
    I'm indexing only .html files.

    _---____----___---___---___----_____

    Zoom Search Engine Indexer (Professional Edition)
    Version 4.2 (Build: 1010) on Windows XP
    Copyright Wrensoft 2000-2006 (http://www.wrensoft.com/)
    Config file loaded: C:\Programmi\Zoom Search Engine 4.2\zoom.zcfg
    Config file loaded: C:\Programmi\Zoom Search Engine 4.2\zoom.zcfg
    Config file loaded: D:\Italiano\zoom.zcfg
    Config file saved: D:\Italiano\zoom.zcfg
    Start indexing (offline mode)
    Maximum number of words: 65000
    Maximum number of files: 130000
    Will scan files with extensions
    .html
    ..
    .
    .
    ..
    .
    .

    Writing index data for CGI/Win32 search... (Please wait)
    Created dictionary data file (zoom_dictionary.zdat)
    Created wordmap data file (zoom_wordmap.zdat)
    Created pagetext data file (zoom_pagetext.zdat)
    Created pages data file (zoom_pages.zdat)
    Created titles data file (zoom_titles.zdat)
    Created descriptions data file (zoom_descriptions.zdat)
    Created dates data file (zoom_datetime.zdat)
    Created spelling data file (zoom_spelling.zdat)
    Created script settings file (settings.zdat)
    Indexing completed
    INDEX SUMMARY
    Files scanned: 121516
    Files skipped: 41774
    Unique words found: 225800
    Total words found: 17608138
    Avg. unique words per page: 1
    Avg. words per page: 144
    Errors: 0
    Total bytes scanned/downloaded: 3195265577
    File extensions:
    .html scanned: 121516
    Cleaning up memory used for index data... please wait.

    _---____----___---___---___----_____


    Please help me. I'm also sending you the .zcfg file via email.

  • #2
    We have received your config file, thanks. Unfortunately because we don't have access to any of your files and a large volume of data is involved it is not going to be easy to reproduce the problem.

    From your post it appears as if the crash happened immediately after the line, "Cleaning up memory used for index data", appeared in the log file. Can you confirm this?

    If this was the case, the index files are probably still intact and complete and can be used.

    Needless to say, we still need to investigate and fix the bug.

    Could you also send us a couple of example HTML files?

    I don't think you are going to be able to index 5,000,000 files in a single indexing session on your current hardware (nor with our current software). 200,000 is certainly possible. But 500,000 is probably about the limit given the RAM you have.

    Can you split the data and have several databases of 250,000 - 500,000 documents each?

    ----
    David
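
The suggestion above to split the data into several databases could be prepared with something like the following. This is only a minimal sketch: the function names `batched_paths` and `plan_databases` are hypothetical, the batch size of 250,000 is taken from the lower bound David mentions, and how each batch is then handed to the Zoom indexer is not covered here.

```python
from itertools import islice
from pathlib import Path
from typing import Iterable, Iterator, List


def batched_paths(paths: Iterable, batch_size: int) -> Iterator[List]:
    """Yield consecutive lists holding at most batch_size items each."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    iterator = iter(paths)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def plan_databases(root: Path, batch_size: int = 250_000) -> List[List[Path]]:
    """Group every .html file under root into batches, one batch per database.

    Sorting keeps the grouping stable between runs (an assumption about what
    is convenient, not a requirement of the indexer).
    """
    html_files = sorted(root.rglob("*.html"))
    return list(batched_paths(html_files, batch_size))
```

Each returned batch would then be indexed separately, so that no single indexing session has to hold more than `batch_size` documents in memory.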



    • #3
      From your post it appears as if the crash happened immediately after the line, "Cleaning up memory used for index data", appeared in the log file. Can you confirm this?

      Yes, I confirm.

      If this was the case, the index files are probably still intact and complete and can be used.
      No, I can't use them. The index is corrupted (I'll send it to you via giga email).


      Could you also send us a couple of example HTML files?
      Yes.


      I don't think you are going to be able to index 5,000,000 files in a single indexing session on your current hardware (nor with our current software). 200,000 is certainly possible. But 500,000 is probably about the limit given the RAM you have.
      Can your software use disk (tmp file) instead of RAM? Is there a workaround, tip, or anything else you know of (virtual RAM, disk cache, etc.)?


      Thanks



      • #4
        We haven't received any email or files yet.

        We did a few tests yesterday with sets of 150,000+ files but couldn't get a crash. So the problem must only occur sometimes, depending on the content being indexed and/or the configuration options.

        So if you could send us your Zoom configuration file as well, this might also help us match your scenario.

        Would you be willing to send us a copy of your HTML files if we needed them to reproduce the problem? (all 125,000 of them). We would pay for the cost of FedEx'ing a DVD-ROM if they were too large to put on the web.

        Can your software use disk (tmp file) instead of RAM?
        As physical RAM gets low, Zoom automatically starts to use space from the Windows swap file. Windows also automatically reduces the size of the disk cache to free memory. This works well for a while. As time goes on, more pieces of the index are written out to the disk. But these pieces are not a complete index, and from time to time they need to be added to as more pages are indexed.

        To add a few bytes to data in the swap file, the data needs to be read from disk into RAM, manipulated, and then put back onto the disk at a later point. This is an extremely slow operation. Eventually the indexer spends all its time reading from and writing to the disk and doing no real work. (Thrashing is the technical term.)

        Virtual RAM is always used by Windows and all Windows applications to 'virtualise' the address space. Each application always gets 2GB of usable virtual address space, even if you only have 512MB of RAM installed. It is this concept of Virtual RAM that limits all 32-bit Windows applications to using at most 2GB of RAM. So this is another problem in addition to the disk thrashing.

        So that was the bad news. The good news is,

        1) Windows 64bit will remove the Virtual RAM limit and we are willing to do a 64bit version of Zoom when someone needs it.

        2) We have some other ideas about storing index data in a different format so that more of it can be written to disk without there being a risk of needing to swap it between the disk and RAM too often.

        But let's fix the crash bug first, then worry about the RAM issue.

        ----
        David
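
To put the numbers from this thread side by side, here is a rough back-of-the-envelope sketch. It assumes the planned files average exactly 16 KiB each and takes the 2GB per-process figure from the post above. Raw corpus size is not the same as the indexer's memory footprint (the log in the first post shows 3,195,265,577 bytes scanned in a run that did write its index files), so this only illustrates scale. The function names are hypothetical.

```python
KIB = 1024
GIB = 1024 ** 3

# Usable virtual address space per 32-bit Windows process, as stated in the post.
USER_ADDRESS_SPACE = 2 * GIB


def raw_corpus_bytes(file_count: int, avg_file_kib: int) -> int:
    """Total size of the source files, ignoring any index overhead."""
    return file_count * avg_file_kib * KIB


def databases_needed(file_count: int, docs_per_database: int) -> int:
    """Number of separate databases required (ceiling division)."""
    if docs_per_database <= 0:
        raise ValueError("docs_per_database must be positive")
    return -(-file_count // docs_per_database)


planned = raw_corpus_bytes(5_000_000, 16)
print(f"Planned raw corpus: {planned / GIB:.1f} GiB")
print(f"Multiple of the 2 GiB address space: {planned / USER_ADDRESS_SPACE:.1f}x")
print("Databases at 500,000 docs each:", databases_needed(5_000_000, 500_000))
print("Databases at 250,000 docs each:", databases_needed(5_000_000, 250_000))
```

Under these assumptions the 5,000,000 files would be spread over 10 to 20 databases, which matches the 250,000 - 500,000 documents per database suggested earlier in the thread.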
