Hi,
Is it possible to specify in the indexer that URLs which differ only in case are actually the same page and should not be indexed more than once? This came up when I tried to index a link that had querystring data in it. The link appeared on two different pages, and its capitalisation differed between the two. The spider indexed the page twice because it treated the two URLs as different pages when in fact they were the same. The CRC check failed to pick up the duplicate because the querystring data was displayed on the page itself, so the two copies differed in content.
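To illustrate the behaviour I am after, here is a rough sketch in Python. It is not how the indexer works internally; `url_key` and `dedupe_urls` are names I made up, and the example.com links are placeholders. It assumes the server treats URLs case-insensitively, which is not true of every server, so this would presumably need to be an option rather than the default.

```python
def url_key(url: str) -> str:
    """Hypothetical helper: the key used to decide whether two URLs are the same page."""
    # Assumption: the whole URL (path and querystring) can be compared ignoring case.
    return url.strip().lower()


def dedupe_urls(urls):
    """Keep the first spelling of each URL, dropping later ones that differ only in case."""
    seen = set()
    unique = []
    for url in urls:
        key = url_key(url)
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique


links = [
    "http://example.com/page.asp?ID=5&Date=2004",
    "http://example.com/Page.asp?id=5&date=2004",
]
print(dedupe_urls(links))
```

With the two links above, only the first spelling would be queued for indexing.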
The same goes for the skip options. Is it possible to make that check case-insensitive as well, so that links containing e.g. &Date= and &date= are both skipped without needing two separate entries?
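For the skip options, the check I have in mind would look roughly like the following Python sketch. Again, `should_skip` and the example URL are my own illustration, not anything taken from the indexer; it simply lowercases both the URL and each skip entry before doing a substring match.

```python
def should_skip(url: str, skip_entries) -> bool:
    """Hypothetical check: True if the URL contains any skip entry, ignoring case."""
    url_lower = url.lower()
    return any(entry.lower() in url_lower for entry in skip_entries)


# A single skip entry covers both spellings of the parameter.
skip_entries = ["&date="]
print(should_skip("http://example.com/page.asp?id=5&Date=2004", skip_entries))
print(should_skip("http://example.com/page.asp?id=5&date=2004", skip_entries))
```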