I've come across an oddity where an exact phrase search returns no results even though there are files containing that exact text.
It happens most often with larger files (300 KB+), but I was able to whittle a file down to 2 KB and still reproduce the issue. I have narrowed the file down to a point where pretty much any change at the top of the file affects an exact phrase match at the bottom of the file. It seems to be related to how many times the words in the exact phrase occur in a document before the exact match.
It seems to kick in at around 290 or more occurrences of the phrase's words before the match. The threshold doesn't seem to be exact though, and appears to depend on the combined count of both words in the phrase occurring before the match. Maybe there is some kind of seek/search limit for performance (I tried changing the optimization setting though, with no effect)?
This is kind of a big issue for us. We have many large pages that are long lists of system definitions. It's not good that the search sometimes fails to return these pages when they may be the only files containing the exact system phrases. And who knows how many files it actually affects; it may affect 30 KB files that contain a table with a word repeated many times before other text...
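To get a rough idea of which files might be at risk, something like the following could count how often each word of a phrase occurs before the first exact match. This is only a sketch: the function name `words_before_phrase` is made up for illustration, and the tag stripping and tokenizing are my own crude assumptions, not how Zoom actually parses or indexes a page.

```python
import re
from html import unescape


def words_before_phrase(html_text, phrase):
    """Count occurrences of each phrase word before the first exact phrase match.

    Returns a dict mapping each phrase word to its count, or None when the
    exact phrase does not occur at all.
    """
    # Assumption: strip tags with a simple regex and split on word characters.
    # The real indexer almost certainly tokenizes differently.
    text = unescape(re.sub(r"<[^>]+>", " ", html_text)).lower()
    tokens = re.findall(r"\w+", text)
    phrase_tokens = re.findall(r"\w+", phrase.lower())
    n = len(phrase_tokens)

    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == phrase_tokens:
            before = tokens[:i]
            return {word: before.count(word) for word in set(phrase_tokens)}
    return None
```

Running it over a folder of pages and flagging anything where the counts add up to a few hundred would at least give a list of candidates to test by hand. The 290 figure is only what I observed, not a confirmed limit.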
Example file:
Code:
<html>
<head><title>Test Doc</title></head>
<body id="topic">
<p>day day day day day day day day day day day day day day day day day day day day day day day text two</p>
<p>day day day day day day day day day day day day day day day day day day day day day day day text two</p>
<p>day day day day day day day day day day day day day day day day day day day day day day day text two</p>
<p>day day day day day day day day day day day day day day day day day day day day day day day text two</p>
<p>day day day day day day day day day day day day day day day day day day day day day day day text two</p>
<p>day day day day day day day day day day day day day day day day day day day day day day day text two</p>
<p>day day day day day day day day day day day day day day day day day day day day day day day text two</p>
<p>day day day day day day day day day day day day day day day day day day day day day day day text two</p>
<p>day day day day day day day day day day day day day day day day day day day day day day day text two</p>
<p>day day day day day day day day day day day day day day day day day day day day day day day text two</p>
<p>day day day day day day day day day day day day day day day day day day day day day day day text two</p>
<p>day day day day day day day day day day day day day day day day day day day day day day day text two</p>
<p>[COLOR="#FF0000"]day two[/COLOR] testing</p>
</body>
</html>
Search: "day two"
Result: No results found.
Now go and remove/edit/change any of the repeated paragraphs at the top of the test file.
Re-index and search again and, bam, the exact phrase in red is now correctly matched and returned as a result.
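If anyone wants to hunt for the threshold more systematically, a small script could generate variants of the test file with different numbers of repeated paragraphs, which can then be indexed one at a time. This is a sketch only: `build_test_doc` and `write_variants` are names I made up, and the defaults (12 paragraphs of 23 "day" words each) simply mirror the example file.

```python
from pathlib import Path


def build_test_doc(paragraphs=12, days_per_paragraph=23):
    """Build a test page: repeated filler paragraphs, then the exact phrase."""
    filler = " ".join(["day"] * days_per_paragraph) + " text two"
    lines = [
        "<html>",
        "<head><title>Test Doc</title></head>",
        '<body id="topic">',
    ]
    lines += [f"<p>{filler}</p>"] * paragraphs
    # The exact phrase "day two" appears only in this last paragraph.
    lines += ["<p>day two testing</p>", "</body>", "</html>"]
    return "\n".join(lines)


def write_variants(out_dir, paragraph_counts):
    """Write one test file per paragraph count and return the created paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for n in paragraph_counts:
        path = out / f"test_{n:02d}_paragraphs.html"
        path.write_text(build_test_doc(paragraphs=n), encoding="utf-8")
        paths.append(path)
    return paths
```

For example, `write_variants("zoom_test", range(8, 16))` would produce eight files to index and search for "day two", which should show roughly where the match starts failing.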
Why would the number of occurrences of some of the words in an exact phrase query cause a non-match when the exact phrase actually appears further down in the document? I tried changing the optimization between Fast, Default, and Slow, with the same results and the same issue.
Indexed with Zoom Search Engine Indexer 7.1 build 1002
Core Engine: Version 7.1 (Build: 1002) on Windows 7
ASP.NET Server Control 32-bit build 1002b. Assembly Version: 7.0.5962.22541
Sorry for the long-winded post; the issue is a bit tricky to reproduce and somewhat random.