I love this product. The only thing I am finding a little cumbersome is managing a large number of start points. I am slowly building a small vertical search engine for our niche market (maybe I am stretching the intended scalability a bit, but I expect to eventually have about a million pages). Why use Google Co-op when I can build my own vertical engine and have 100% of my own branding? It would be much easier to work with additional start points if they could be managed like a database within the program. At the moment the import/export approach forces me to maintain an external text database just to manage them.
More often than not, because of a little bit of confusion on my part, I end up performing complete reindexes instead of updates when I add new start points.
I currently have my start pages follow external links, and then I run a small script that scans urlist.txt for unique new URLs, which I then add to the start list. I know it wouldn't take long before there's a big list of start points, but it would help to have a feature that automatically adds these to a start point or base URL database, where the user could then manually decide whether to flag each base URL for further indexing or delete it from the database.
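To show roughly what my script does, here is a minimal Python sketch. It assumes the URL list has already been read into memory as one URL per line; the function names are my own and the exact format of urlist.txt may differ, so treat this as an illustration rather than anything the product provides.

```python
from urllib.parse import urlsplit


def base_url(url):
    """Return scheme://host/ for an http(s) URL, or None for anything else."""
    parts = urlsplit(url.strip())
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return None
    return f"{parts.scheme}://{parts.netloc.lower()}/"


def new_start_points(crawled_urls, existing_start_points):
    """Return base URLs seen in the crawl that are not yet start points."""
    known = {base_url(u) for u in existing_start_points}
    found = {base_url(u) for u in crawled_urls}
    # Drop None, which stands for lines that were not usable http(s) URLs.
    return sorted(found - known - {None})
```

Calling new_start_points() with the lines of the URL list and the lines of my current start point file gives the candidates I then paste into the start list by hand.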
The database would indicate whether each URL has already been indexed, so as to avoid doubling up.
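As a rough idea of the kind of database I mean, here is a small sketch using Python's built-in sqlite3 module. The table name, the three status values and the helper functions are all hypothetical choices of mine, not a description of how the product stores anything.

```python
import sqlite3

# Hypothetical status values for a base URL in the start point database.
STATUSES = ("candidate", "flagged", "indexed")


def open_db(path=":memory:"):
    """Open (or create) the start point database."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS start_points (
               url    TEXT PRIMARY KEY,
               status TEXT NOT NULL DEFAULT 'candidate'
           )"""
    )
    return conn


def add_candidate(conn, url):
    """Add a newly discovered base URL; return False if it is already known."""
    row = conn.execute(
        "SELECT 1 FROM start_points WHERE url = ?", (url,)
    ).fetchone()
    if row is not None:
        return False
    conn.execute("INSERT INTO start_points (url) VALUES (?)", (url,))
    conn.commit()
    return True


def set_status(conn, url, status):
    """Flag a URL for indexing, or record that it has been indexed."""
    if status not in STATUSES:
        raise ValueError(f"unknown status: {status}")
    conn.execute(
        "UPDATE start_points SET status = ? WHERE url = ?", (status, url)
    )
    conn.commit()


def urls_with_status(conn, status):
    """List all URLs currently in the given status, sorted alphabetically."""
    rows = conn.execute(
        "SELECT url FROM start_points WHERE status = ? ORDER BY url", (status,)
    )
    return [r[0] for r in rows]
```

With something like this, an update run would only need the URLs returned by urls_with_status(conn, "flagged"), and anything already marked as indexed would never be added twice.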
Would be great if we could see this in a future version.