Spidering an RSS Feed

  • Spidering an RSS Feed

    Much of our website content is now database-driven, and we syndicate it through a number of categorised RSS feeds.

    We currently use Zoom 4.1 to provide search functions on several of our PDF-based subscription services.

    Can Zoom 4.2 (or perhaps V5) spider/index RSS feeds? This would be ideal, since we could then offer RSS feed-specific search functions.

    Each feed includes a series of URLs to articles delivered from MS SQL, together with the title of each article. The underlying content of each article is held in the database.

    Any thoughts about the practicalities of this?

    Jim

    www.tutor2u.net
    Jim Riley
    Managing Director, Tutor2u Limited
    www.tutor2u.net
    UK Online Learning Resource of the Year

  • #2
    The current version of Zoom can index much of the content of an RSS/XML feed. However, it can only follow links that appear as HTML (e.g. CDATA sections containing <a href="test"> style links). We could consider adding support for following RSS links, such as those specified within a <link>...</link> tag, if there is enough user interest.
    --Ray
    Wrensoft Web Software
    Sydney, Australia
    Zoom Search Engine



    • #3
      The RSS feed has the following XML structure, with each article's URL listed in a <link> element:

      <?xml version="1.0" ?>
      <rss version="2.0">
        <channel>
          <title>Tutor2u - Economics in the News</title>
          <link>www.tutor2u.net/</link>
          <description>Daily commentary on economics in the news - with a UK focus</description>
          <item>
            <title>MPs call for higher road and air taxes</title>
            <description>A House of Commons committee on carbon emissions from transport has recommended that the government consider introducing higher taxes on aviation as a tool to curb the take off of CO emissions from the UK transport industry</description>
            <link>http://www.tutor2u.net/newsmanager/templates/?a=1552&z=1</link>
          </item>
          <item>
            <title>Zimbabwe enforces a price freeze</title>
            <description>The Zimbabwean government has decided to introduce a three week price freeze in a bid to bring some semblance of control back to their economy.</description>
            <link>http://www.tutor2u.net/newsmanager/templates/?a=1553&z=1</link>
          </item>

      Is this any good?

      I guess the alternative is to produce an HTML page with all the RSS feed items listed as links

      (e.g. http://www.tutor2u.net/economicsblog.html )

      But I'm not sure how to configure Zoom 4.1 to spider this list, given that each item's URL is in the form:

      http://www.tutor2u.net/newsmanager/t...es/?a=1536&z=1

      Any advice gratefully received

      Jim

      Jim Riley
      Managing Director, Tutor2u Limited
      www.tutor2u.net
      UK Online Learning Resource of the Year



      • #4
        Originally posted by tutor2u
        The RSS feed uses the following XML schema, with the URL listed as a <link> item:
        [...]
        Is this any good?

        As I mentioned above, <link>...</link> URLs are not currently picked up and followed by the V4.2 spider. But it's something we could consider for a future version if there is enough interest.

        Originally posted by tutor2u
        I guess the alternative is to produce an HTML page with all the RSS feed items listed as links

        (e.g. http://www.tutor2u.net/economicsblog.html )

        But I'm not sure how I configure Zoom 4.1 to spider this list given that each item's URL is in the form:
        http://www.tutor2u.net/newsmanager/t...es/?a=1536&z=1

        I don't see any problem with indexing from that HTML page of links. What problem are you having indexing those links? Turn on verbose mode to see the skipping messages if you need them.

        We also recommend upgrading to V4.2, both for its new features and for bug fixes that may be relevant.
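
For anyone taking the HTML-page route discussed above, here is a minimal Python sketch of one way to turn an RSS 2.0 feed into a plain page of <a href> links that a spider can follow. It is not part of Zoom. The function name feed_to_link_page and the sample feed are illustrative assumptions modelled on the feed posted earlier in this thread, and the sketch assumes the feed is well-formed XML (so "&" in URLs is written as "&amp;").

```python
import html
import xml.etree.ElementTree as ET

# Illustrative sample modelled on the feed posted above (descriptions omitted).
# In well-formed XML the "&" in each URL must be written as "&amp;".
SAMPLE_FEED = """<?xml version="1.0" ?>
<rss version="2.0">
  <channel>
    <title>Tutor2u - Economics in the News</title>
    <link>www.tutor2u.net/</link>
    <item>
      <title>MPs call for higher road and air taxes</title>
      <link>http://www.tutor2u.net/newsmanager/templates/?a=1552&amp;z=1</link>
    </item>
    <item>
      <title>Zimbabwe enforces a price freeze</title>
      <link>http://www.tutor2u.net/newsmanager/templates/?a=1553&amp;z=1</link>
    </item>
  </channel>
</rss>
"""


def feed_to_link_page(feed_xml: str) -> str:
    """Return an HTML page listing every <item> of an RSS 2.0 feed as a link."""
    channel = ET.fromstring(feed_xml).find("channel")
    if channel is None:
        raise ValueError("not an RSS 2.0 feed: no <channel> element")

    page_title = html.escape(channel.findtext("title", default="RSS feed"))
    rows = []
    for item in channel.findall("item"):
        url = (item.findtext("link") or "").strip()
        if not url:
            continue  # no <link>, so there is nothing for a spider to follow
        title = (item.findtext("title") or url).strip()
        # Escape both values so "&" and quotes are valid inside the HTML.
        rows.append(
            f'<li><a href="{html.escape(url, quote=True)}">'
            f"{html.escape(title)}</a></li>"
        )

    return (
        "<html><head><title>" + page_title + "</title></head>\n"
        "<body>\n<ul>\n" + "\n".join(rows) + "\n</ul>\n</body></html>\n"
    )


if __name__ == "__main__":
    print(feed_to_link_page(SAMPLE_FEED))
```

The output could be saved to a static page (or generated on request) and used as the page the spider starts from, one page per feed if feed-specific searches are wanted. How often to regenerate it is a design choice that depends on how frequently the feed changes.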
        --Ray
        Wrensoft Web Software
        Sydney, Australia
        Zoom Search Engine

