Hi. I'm running a Professional Edition implementation of Zoom on a reasonably large website. Generally, the results are very good thanks to the many options and settings available. However, we have encountered an issue which is now causing major problems.
There are a large number of large PDF files on the website; these are swamping the HTML results simply because of the sheer number (and repetition) of words within the PDFs.
We have given all HTML pages a +5 boost and the PDFs a -5 deboost (using desc files). We've set the Content Density adjustment to Strong, and Word Positioning likewise, and we've even given Body Content a -5 deboost.
However, we still have issues with documents taking precedence.
Note that Recommended links are not an option.
We have currently disabled body content indexing so as to make use of the reasonably comprehensive metadata used across the site, but unsurprisingly the results are now very dependent on specific keywords.
My client has asked whether it is possible to still score body content, but only on the first hit for each unique word in the HTML page/document. This would allow for reasonably accurate results and prevent the PDFs from always taking precedence.
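To make the idea concrete, here is a rough sketch in plain Python (not Zoom code, just an illustration of the scoring model we're after, with made-up page data):

```python
def raw_hit_score(page_words, query_terms):
    """Current behaviour: every occurrence of a query term adds to the score."""
    return sum(1 for w in page_words if w in query_terms)

def first_hit_score(page_words, query_terms):
    """Desired behaviour: each unique query term scores at most once per page."""
    return len(query_terms & set(page_words))

html_page = "annual report summary".split()
big_pdf   = ("report " * 500 + "appendix").split()
query     = {"report"}

print(raw_hit_score(html_page, query), raw_hit_score(big_pdf, query))      # 1 vs 500
print(first_hit_score(html_page, query), first_hit_score(big_pdf, query))  # 1 vs 1
```

Under the first-hit model, a 200-page PDF that repeats a term hundreds of times would score body content no higher than an HTML page that mentions it once.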
Is this a possibility? Is there any way of indexing like this without asking for custom development? Is there another way we can approach the problem?
Thanks in advance for any help.