We have Wrensoft Zoom Search Engine, Enterprise edition. Order number: WS73NF3137. Date: 22/Feb/2013.
We need to set up several start points to index part of the site. Our site is http://www.elcolombiano.com. It has no single index page that links to everything; instead it is a series of articles, each at its own URL. That is why we create a set of files to serve as start points, and we need to perform incremental indexing because of the large size of the site and the regular changes to those files.
I have put samples at http://www.elcolombiano.com/bancomedios/z1/t1.html, http://www.elcolombiano.com/bancomedios/z1/t2.html, http://www.elcolombiano.com/bancomedios/z1/t3.html and http://www.elcolombiano.com/bancomedios/z1/t4.html
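For illustration, a start-point file of this kind (a page containing nothing but links) could be produced by a small script. This is only a minimal sketch: the function name `build_start_page` and the example.com URLs are hypothetical and are not the script or the articles we actually use.

```python
from html import escape


def build_start_page(urls, title="Start point"):
    """Return an HTML page whose body contains only one <a> link per URL."""
    links = "\n".join(
        f'<a href="{escape(u, quote=True)}">{escape(u)}</a><br>' for u in urls
    )
    return (
        f"<html><head><title>{escape(title)}</title></head>\n"
        f"<body>\n{links}\n</body></html>\n"
    )


if __name__ == "__main__":
    # Hypothetical article URLs, for illustration only.
    sample_urls = [
        "http://www.example.com/articulo-1.html",
        "http://www.example.com/articulo-2.asp",
    ]
    print(build_start_page(sample_urls))
```

Escaping each URL keeps characters such as `&` in query strings valid inside the `href` attribute.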
For your review, I'm attaching at the bottom of this message the text of the configuration file we are using for these steps.
These are the steps I performed; let me know if I am doing something wrong:
NOTE: We use HTML and ASP extensions only, and we are setting this up for spider mode.
1. Open the Zoom Search Engine Indexer.
2. In the Start options, click MORE and enter the following in the dialog:
Spider URL: http://www.elcolombiano.com/bancomedios/z1/t1.html (if you view the source of this file, you will see only a list of links in <A> HTML tags)
Base URL: http://www.elcolombiano.com/
Spidering options: check "Follow all links on this page only"
3. Save configuration
4. Run the indexer
5. When it is done, do not upload anywhere.
6. From this point on we are having trouble.
7. Select Index > Incremental Indexing > Add start points (or domains) to existing index (which I think is the option we need for this setup).
8. Add the following start point with the following settings:
Spider URL: http://www.elcolombiano.com/bancomedios/z1/t2.html
Base URL: http://www.elcolombiano.com
Spidering options: check "Follow all links on this page only"
9. Click Proceed and voilà, here is the problem: no indexing occurs even though the file is correct.
Note: When we tried using both start points without the incremental option, the indexing worked correctly.
The question is: can you please let me know if I am doing something wrong?
Can you help me fix it?
We still have some doubts about incremental indexing. Let's say we use one file as a start point, but this file changes during the day to include more pages to index. Can I run the indexer several times over that one file incrementally? How can I do this?
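To make the scenario concrete: the start-point file gains new links during the day, and only those new pages should need indexing. The sketch below is not a description of how Zoom handles this. It is a minimal, standalone illustration of finding which links were added between two versions of a start-point file. The names `LinkCollector`, `extract_links` and `new_links` are hypothetical, and the URLs in the example are placeholders.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <A HREF=...> matches.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))


def extract_links(html_text, base_url):
    """Return the absolute link URLs found in html_text, in document order."""
    parser = LinkCollector(base_url)
    parser.feed(html_text)
    parser.close()
    return parser.links


def new_links(old_html, new_html, base_url):
    """Return the links present in new_html that were not in old_html."""
    seen = set(extract_links(old_html, base_url))
    return [url for url in extract_links(new_html, base_url) if url not in seen]


if __name__ == "__main__":
    # Placeholder snapshots of a start-point file taken at two times of day.
    morning = '<A HREF="/a.html">a</A>'
    afternoon = '<A HREF="/a.html">a</A><A HREF="/b.asp">b</A>'
    for url in new_links(morning, afternoon, "http://www.example.com/"):
        print(url)
```

Comparing snapshots this way only shows which pages would be new to the index; whether the indexer can pick them up on a repeated incremental run over the same start point is exactly the question above.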
Thank you. I hope we are clear; let us know if you have any questions.
///////////////////////zoom.cfg content////////////////////////////
__6_0
#STARTDIR:
#SPIDERURL:http://www.elcolombiano.com/bancomedios/zoom/t1.html
#BASEURL:http://www.elcolombiano.com/
#OUTDIR:\SitiosWeb\Sitio\buscador
#SPIDERURLTYPE:5
#SPIDERURLUSELIMIT:0
#SPIDERURLLIMIT:0
#SPIDERURLBOOST:0
#USE-CRC:1
#CURRENTMODE:1
#DLTHREADS:10
#NOCACHE:1
#BEEP-ON-FINISH:0
#THROTTLEDELAY:200
#OUTPUT:ASP
#OUTPUT_OS:0
#ISDOTNET:0
#VERBOSE:0
#LOGMODE:1
#LOGOPTIONS:INDEXED|SKIPPED|FILTERED|INIT|DOWNLOAD|UPLOAD|FILEIO|PLUGIN|INFO|ERROR|WARNING|QUEUE|SUMMARY|THREAD|BROKEN|
#LOGWRITETOFILE:0
#LOGWRITETOFILENAME:C:\Documents and Settings\All Users\Application Data\Wrensoft\Zoom Search Engine Indexer\temp\indexlog.txt
#LOGAPPENDDATETIME:1
#LOGDEBUGMODE:0
#LOGHTMLERRORS:1
#SCAN_NOEXTENSION:0
#SCAN_FILELINKS:0
#SCAN_USELOCALDESCPATH:0
#SCAN_LOCALDESCPATH:
#SCAN_ROBOTSTXT:1
#SCAN_CHECKTHUMBS:0
#PARSEJSLINKS:1
#REWRITELINKS:0
#REWRITEFIND:
#REWRITEWITH:
#INDEXOPTIONS:METADESC|CONTENT|TITLE|
#RESULTOPTIONS:TITLE|METADESC|CONTEXT|DATE|
#USE-UTF8:0
#CODEPAGE:28591
#USESTEMMING:0
#STEMALGO:2
#DIGRAPHS:0
#ZLANGFILE:Spanish.zlang
#SKIPUNDERSCORE:1
#MINWORDLEN:2
#FORMFORMAT:2
#HIGHLIGHTING:1
#GOTOHIGHLIGHT:0
#USEXML:0
#XMLTITLE:
#XMLDESC:
#XMLURL:
#XMLXSLTURL:
#XML_OPENSEARCH_DESCURL:
#XMLHIGHLIGHT:0
#LOGGING:0
#LOGGING_FILE:./logs/searchwords.log
#TIMING:0
#NOCHARSET:0
#DEFAULT_TO_AND:0
#CONTEXTSIZE:30
#EXACTPHRASE:0
#SEARCHASSUBSTRING:0
#STRIPDIACRITICS:0
#NO_TOLOWER:0
#ZOOMINFO:0
#USEDATETIME:1
#WORDJOINCHARS:.-_'
#ZOOMIMAGE:0
#SPELLING:1
#SPELLINGWHENLESSTHAN:5
#WIZARD_UPLOADREQD:0
#REPORTUSEDATES:0
#WORDWEIGHT_TITLE:3
#WORDWEIGHT_DESC:0
#WORDWEIGHT_KEYWORDS:1
#WORDWEIGHT_FILENAME:0
#WORDWEIGHT_HEADINGS:2
#WORDWEIGHT_LINKTEXT:0
#WORDWEIGHT_CONTENT:-1
#WORDWEIGHT_DENSITY:1
#WORDWEIGHT_SHORTURLS:1
#WORDWEIGHT_PROXIMITY:1
#USE-AUTH:0
#USE-COOKIES:1
#USE-COOKIELOGIN:0
#BINUSEDESC:0
#PLUGIN_DESCFILES:
#PLUGIN_USEMETA:PDF|DOC|PPT|RTF|SWF|WPD|XLS|DJVU|IMAGE|MP3|DWF|OFFICE|
#PLUGIN_USETECHNICAL:MP3|IMAGE|DWF|
#PLUGIN_TEXTONLY:
#PLUGIN_PDF_METHOD:0
#PLUGIN_PDF_HIGHLIGHT:1
#PLUGIN_IMG_MINFILESIZE:5
#PLUGIN_ZIP_EXTRACT:1
#MAXPAGES_LIMIT:65500
#MAXWORDS_LIMIT:500000
#MAXFILESIZE_LIMIT:4194304
#DESCLENGTH_LIMIT:300
#OPTIMIZE_SETTING:6
#EXTENSIONS_START
.html|FILETYPE:0
.asp|FILETYPE:0
#EXTENSIONS_END
#SKIPPAGES_START
ventanas
#SKIPPAGES_END
#SKIPWORDS_START
#SKIPWORDS_END
#USECATS:0
#USEDEFCATNAME:0
#SEARCHMULTICATS:0
#DISPLAYCATSUMMARY:1
#RECOMMENDED_MAX:3
#USEFILTER:0
#FILTER_START
#FILTER_END
#SITEMAP_TXT:0
#SITEMAP_XML:0
#SITEMAP_UPLOAD:0
#SITEMAP_UPLOADPATH:
#SITEMAP_USEPAGEBOOST:1
#SITEMAP_USEBASEURL:1
#SITEMAP_BASEURL:
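A quick way to check values such as SPIDERURL and BASEURL in this file against what was typed into the dialog is to read the file programmatically. The sketch below assumes only the layout visible in the dump above (`#KEY:value` lines plus `#NAME_START` / `#NAME_END` blocks). That layout is inferred from this file, not taken from Wrensoft documentation, and `parse_zoom_cfg` is a hypothetical helper name.

```python
def parse_zoom_cfg(text):
    """Split a zoom.cfg dump into simple settings and named list sections.

    Assumed layout, inferred from the dump only:
      #KEY:value        -> settings["KEY"] = "value"
      #NAME_START ...   -> sections["NAME"] = lines up to #NAME_END
    """
    settings = {}
    sections = {}
    current = None  # name of the section being read, if any
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#") and line.endswith("_START"):
            current = line[1:-len("_START")]
            sections[current] = []
        elif line.startswith("#") and line.endswith("_END"):
            current = None
        elif current is not None:
            sections[current].append(line)
        elif line.startswith("#") and ":" in line:
            # Split on the first colon only, so URLs and Windows paths survive.
            key, _, value = line[1:].partition(":")
            settings[key] = value
    return settings, sections


if __name__ == "__main__":
    # A few lines copied from the dump above.
    sample = "\n".join([
        "__6_0",
        "#SPIDERURL:http://www.elcolombiano.com/bancomedios/zoom/t1.html",
        "#BASEURL:http://www.elcolombiano.com/",
        "#EXTENSIONS_START",
        ".html|FILETYPE:0",
        ".asp|FILETYPE:0",
        "#EXTENSIONS_END",
    ])
    settings, sections = parse_zoom_cfg(sample)
    print(settings["SPIDERURL"])
    print(sections["EXTENSIONS"])
```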