Curious problem with queueing?

rkshead

Newbie

Join Date: Apr 2011

Posts: 2
#1


May-24-2011, 03:06 AM

Using ZoomSearch Pro in Spider mode, I'm indexing a site made up of 50+ .asp pages and 200+ .doc/.pdf files (each typically a few MB in size). Because the URLs of the latter are generated dynamically when the former are executed, I created an additional page, "indexed.asp", that lists these files in the following form:
<a href=file1.pdf>x</a>
<a href=file2.pdf>x</a>
<a href=file3.pdf>x</a>
<a href=file4.doc>x</a>
etc.
I then added indexed.asp to the "List of Start Points" (follow links only), after the start point URLs for the .asp pages.
(Note: all the .pdf and .doc files have .desc files in the same directory; this may or may not be relevant.)

Indexing the .asp pages proceeds perfectly, and on reaching indexed.asp, the 200+ .doc/.pdf files are queued and then processed in turn. This works for about 98% to 99% of the files: on each run, 2 to 8 files fail to be processed, with an error of the form:

10:52:06 - [ERROR] Can not write file C:\output directory\zoom_plugin.in (Error code: 5)
10:52:06 - [ERROR] Failed to write plugin file to disk: http:// site name/filexx.pdf

The files that fail (shown as filexx.pdf in the error message above) appear to be completely random.
Re-running the indexing with identical parameters again produces between 2 and 8 failures, but for different files (the ones that failed last time are indexed successfully this time). Over a dozen or so attempts, I've never had fewer than 2 failures or more than 8.

Examining the failed files in Acrobat Pro reveals no errors, and a "List of Start Points" containing just the URLs of the failed files produces a perfect index for that limited list.
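To build that reduced start-point list after each run, or to compare which files failed across runs, the failed URLs can be pulled out of the index log. The following is a minimal sketch that assumes the log lines look exactly like the two quoted above; the function name `failed_urls` and the `example.com` URLs are hypothetical, and this is not a Zoom feature.

```python
import re

# Matches failure lines like the one quoted above, e.g.
# "10:52:06 - [ERROR] Failed to write plugin file to disk: http://.../filexx.pdf"
# (assumes the log format shown in this thread).
FAILED_RE = re.compile(r"\[ERROR\] Failed to write plugin file to disk:\s*(\S.*?)\s*$")


def failed_urls(log_text: str) -> list[str]:
    """Return the URLs reported as failed, in log order, without duplicates."""
    urls: list[str] = []
    for line in log_text.splitlines():
        match = FAILED_RE.search(line)
        if match and match.group(1) not in urls:
            urls.append(match.group(1))
    return urls


log = (
    "10:52:06 - [ERROR] Can not write file C:\\output directory\\zoom_plugin.in (Error code: 5)\n"
    "10:52:06 - [ERROR] Failed to write plugin file to disk: http://example.com/file7.pdf\n"
)
print(failed_urls(log))  # ['http://example.com/file7.pdf']
```

Writing the returned list one URL per line gives a file that can be imported as start points; intersecting the sets from two runs would show whether any file fails consistently.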

So instead of the "indexed.asp" page, I created a .txt file that simply lists the URLs of the .doc/.pdf files and imported it into the "List of Start Points" after the start point URLs for the .asp pages. This works perfectly.
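If a copy of the generated link page is at hand, converting it into that plain URL list can be scripted. This is a minimal sketch, assuming one absolute URL per line is an acceptable import format (as the .txt approach above suggests) and that relative links resolve against the page's own URL; `start_point_lines` and the `example.com` base URL are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags, e.g. <a href=file1.pdf>x</a>."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def start_point_lines(html: str, base_url: str, extensions=(".pdf", ".doc")) -> str:
    """Return one absolute URL per line for links ending in the given extensions."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    urls = [urljoin(base_url, h) for h in parser.hrefs if h.lower().endswith(extensions)]
    return "\n".join(urls) + "\n" if urls else ""


page = "<a href=file1.pdf>x</a>\n<a href=file4.doc>x</a>\n<a href=page.asp>x</a>"
print(start_point_lines(page, "http://example.com/docs/indexed.asp"), end="")
```

Python's `html.parser` tolerates the unquoted `href` values used in the listing above, so the page does not need to be cleaned up first.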

I'm curious: why does the second method work but not the first? The only difference appears to be the large queue (though it's not that large; I see from other messages that some people have queues of over 4000 entries).
Ray

Administrator

Join Date: Dec 2004

Posts: 4357
#2

May-24-2011, 06:01 AM

Error code 5 is the Windows "Access is denied" error, so it looks like something is intermittently holding a lock on files written to your output directory.

Possibilities include:
(a) You have multiple instances of Zoom running and writing to the same output folder.
(b) You have security or anti-virus software running that is interfering and holding exclusive access to new files created in the folder (e.g. while it scans the file that was last written there, it prevents the next write from overwriting it). Try temporarily disabling these services.
(c) The output directory is a shared network folder and access is failing (Wi-Fi issues, etc.). Try using a local folder.
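As a general illustration of case (b), and not a description of how Zoom itself writes zoom_plugin.in, here is a minimal sketch of a writer that treats "Access is denied" as temporary and retries with a short back-off. The name `write_with_retry` and its parameters are hypothetical; `open_fn` exists only so the behaviour can be tested without a real locked file.

```python
import time
from typing import Callable


def write_with_retry(path: str, data: bytes, attempts: int = 5, delay: float = 0.2,
                     open_fn: Callable = open) -> int:
    """Write data to path, retrying when the file is transiently locked.

    A PermissionError (how Python reports Windows error code 5, "Access is
    denied") is assumed to be temporary, e.g. a scanner still holding the
    previous copy of the file. Returns the number of attempts used and
    re-raises the error after the last failed attempt.
    """
    for attempt in range(1, attempts + 1):
        try:
            with open_fn(path, "wb") as f:
                f.write(data)
            return attempt
        except PermissionError:
            if attempt == attempts:
                raise
            time.sleep(delay * attempt)  # wait a little longer after each failure
    raise ValueError("attempts must be at least 1")
```

Retrying only hides the symptom; excluding the output folder from real-time scanning, as suggested in (b), addresses the cause.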

--Ray
Wrensoft Web Software
Sydney, Australia
Zoom Search Engine