Third Party Site - Possible to skip some parts of pages?

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • #1

    Hi!

    GOAL
    Indexing pages from different real estate sites

    SITUATION
    On a big multiple listing service site, when I index the page for a single property, the navigation bar and some other properties from a kind of "latest listings" box get indexed along with it.

    SKIPPING FEATURE
    I know that when it is your own site, you can add tags to your pages so that parts of them are skipped.

    QUESTION
    Is there any way at all to tell Zoom to index certain THIRD-PARTY pages while skipping some parts of them:

    1) using HTML code
    2) by any other means

    Possible at all?

    Any advice is welcome!

    Roger

  • #2
    Indexing third-party sites is no problem. Even indexing selected pages from third-party sites is OK.

    The problem is specifying that only parts of a page should be indexed when you don't control the site. In Zoom you can control which elements of the page are indexed; for example, you can turn off indexing of the title and meta keywords. But there is no way to filter out parts of the page content if you don't control the site.

    The typical method of scraping this type of information (real estate and job ads) is to develop a custom solution for each third-party site.
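    A minimal sketch of what such a per-site solution might look like, using only Python's standard `html.parser` module. The site key `example-mls.com` and the `<div id="listing-details">` container are hypothetical: for each real site you would inspect its HTML and record where the property details actually live. The sketch keeps only the text inside that one element, so the navigation bar and any "latest listings" box are left out. It assumes reasonably well-formed markup (unclosed tags such as a bare `<p>` will throw off the depth counting), and how the extracted text is then handed to the indexer is outside its scope.

    ```python
    from html.parser import HTMLParser

    # Hypothetical per-site rules: which element holds the property details.
    SITE_RULES = {
        "example-mls.com": {"tag": "div", "id": "listing-details"},
    }

    # Elements that never have a closing tag, so they must not change the depth.
    VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


    class RegionExtractor(HTMLParser):
        """Collects the text inside the first element matching tag + id."""

        def __init__(self, tag, element_id):
            super().__init__()
            self.tag = tag
            self.element_id = element_id
            self.depth = 0      # > 0 while inside the target element
            self.done = False   # set once the target element has closed
            self.parts = []

        def handle_starttag(self, tag, attrs):
            if self.done:
                return
            if self.depth:
                if tag not in VOID_TAGS:
                    self.depth += 1          # nested element inside the target
            elif tag == self.tag and dict(attrs).get("id") == self.element_id:
                self.depth = 1               # entering the target element

        def handle_endtag(self, tag):
            if self.depth and tag not in VOID_TAGS:
                self.depth -= 1
                if self.depth == 0:
                    self.done = True         # ignore everything after the target

        def handle_data(self, data):
            if self.depth and data.strip():
                self.parts.append(data.strip())


    def extract_listing(html, site):
        """Return only the listing text for a page from a known site."""
        rule = SITE_RULES[site]
        parser = RegionExtractor(rule["tag"], rule["id"])
        parser.feed(html)
        parser.close()
        return " ".join(parser.parts)
    ```

    For example, on a page with `<nav>Home | Search</nav>`, the listing div, and a trailing `<div class="latest">Other property</div>`, only the listing's own text is returned.
    
    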


    • #3
      We're considering adding a feature in the next major release (V7) to skip HTML blocks based on <div> id=, name= or class= attributes. This may help in some cases, but ultimately, if the third-party site is not marked up well, this wouldn't work for all sites.
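      To illustrate the general idea of attribute-based skipping (this is not how Zoom implements it; no details of the planned feature are given here), below is a small Python sketch, again using only the standard `html.parser`, that drops the text of any `<div>` whose id, name or class matches a skip list. The values `nav` and `latest-listings` are hypothetical examples. As Ray notes, it only helps when the site marks up those blocks consistently, and it assumes well-formed nesting inside skipped blocks.

      ```python
      from html.parser import HTMLParser

      VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                   "link", "meta", "source", "track", "wbr"}


      class BlockSkipper(HTMLParser):
          """Keeps page text except inside <div> blocks matching a skip list."""

          def __init__(self, skip_ids=(), skip_names=(), skip_classes=()):
              super().__init__()
              self.skip_ids = set(skip_ids)
              self.skip_names = set(skip_names)
              self.skip_classes = set(skip_classes)
              self.skip_depth = 0   # > 0 while inside a skipped block
              self.kept = []

          def _matches(self, attrs):
              a = dict(attrs)
              # class may hold several space-separated names, or be valueless (None)
              classes = (a.get("class") or "").split()
              return (a.get("id") in self.skip_ids
                      or a.get("name") in self.skip_names
                      or any(c in self.skip_classes for c in classes))

          def handle_starttag(self, tag, attrs):
              if tag in VOID_TAGS:
                  return
              if self.skip_depth:
                  self.skip_depth += 1           # nested element in a skipped block
              elif tag == "div" and self._matches(attrs):
                  self.skip_depth = 1            # start skipping this block

          def handle_endtag(self, tag):
              if tag not in VOID_TAGS and self.skip_depth:
                  self.skip_depth -= 1

          def handle_data(self, data):
              if not self.skip_depth and data.strip():
                  self.kept.append(data.strip())


      def indexable_text(html, **skip):
          """Return the text that would remain after skipping matching blocks."""
          parser = BlockSkipper(**skip)
          parser.feed(html)
          parser.close()
          return " ".join(parser.kept)
      ```

      Matching on individual class names (rather than the whole `class` string) means a block with `class="sidebar latest-listings"` is still caught by a rule for `latest-listings`.
      
      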

      As noted above, there is no magic solution that can automatically determine what is skippable content on any arbitrary site.
      --Ray
      Wrensoft Web Software
      Sydney, Australia
      Zoom Search Engine
