
RSS description element for images


  • RSS description element for images

    Is it possible to configure the OpenSearch RSS output for image categories so that the description element contains the actual image? It seems all the image plug-in can do is provide something like:

    <description>JPEG file, size 250x302</description>

    Whereas I'd like what Flickr does with its RSS OpenSearch output:

    <description>&lt;a href="<A href="http://www.flickr.com/photos/maximillian_millipede/333864212/"&gt;&lt;img">
    http://www.flickr.com/photos/maximillian_millipede/333864212/"&gt;&lt;img src="http://farm1.static.flickr.com/129/3...3fc7e978_t.jpg" width="90" height="100"
    alt="Wolf Spider - Pardosa species" /&gt;&lt;/a&gt;</description>

    It's far easier to aggregate search results with the latter when the img src is actually provided in the description element.
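
For illustration, here is a minimal Python sketch of the aggregation step described above, using only the standard library: it reads the <description> of an RSS <item>, lets the XML parser decode the &lt;/&gt; entities, and pulls the src and alt attributes out of any <img> tag. The sample item mirrors the Flickr example but uses placeholder example.com URLs; the class and function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collect (src, alt) for every <img> tag in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attr_map = dict(attrs)
            self.images.append((attr_map.get("src"), attr_map.get("alt")))


def images_in_description(item_xml):
    """Return the (src, alt) pairs embedded in an RSS item's <description>.

    ElementTree decodes &lt; and &gt; back into '<' and '>', so the
    description text can be handed straight to an HTML parser.
    """
    item = ET.fromstring(item_xml)
    fragment = item.findtext("description", default="")
    collector = ImgSrcCollector()
    collector.feed(fragment)
    collector.close()
    return collector.images


# Placeholder item in the style of the Flickr example (example.com URLs).
SAMPLE_ITEM = """<item>
<title>Wolf Spider</title>
<description>&lt;a href="http://example.com/photos/1/"&gt;&lt;img src="http://example.com/1_t.jpg" width="90" height="100" alt="Wolf Spider - Pardosa species" /&gt;&lt;/a&gt;</description>
</item>"""

print(images_in_description(SAMPLE_ITEM))
# [('http://example.com/1_t.jpg', 'Wolf Spider - Pardosa species')]
```

A plain description such as "JPEG file, size 250x302" returns an empty list, which is exactly the gap being described.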

  • #2
    The OpenSearch XML output that Zoom produces for an image will typically look something like this:

    <item>
    <title>Chris Absailing Palm Beach</title>
    <link>http://www.mywebsite.com/IMG_0143.JPG</link>
    <description>Chris Shiel Absailing Palm Beach NSW 2007</description>
    <zoom:context> ... Width:640, Height:480, Make:Canon, Model:Canon DIGITAL IXUS 300, FNumber:F2.7, Aperture:F2.7, ShutterSpeed: ...</zoom:context>
    <zoom:termsMatched>1</zoom:termsMatched>
    <zoom:score>30</zoom:score>
    <pubDate>Mon, 16 Oct 2006 14:13:26 GMT</pubDate>
    </item>



    This is a straight cut and paste from the output generated by Zoom after a search for the word 'Canon' was performed.

    The description comes from the image meta data, if there is any meta data available.



    • #3
      Unless one hotlinks images to themselves, the RSS link element produced will point to the page on which the image resides and not to the image itself. More often than not, images are not hyperlinked, and they are rarely linked to themselves. I tried making a few .desc files for some images (in the hope that I might be able to run a batch job to create these), such as:

      <title>My image</title>
      <meta name="description" content="&lt;a href='http://www.flickr.com/photos/maximillian_millipede/333864212/'&gt;&lt;img src='http://farm1.static.flickr.com/129/333864212_a93fc7e978_t.jpg' width='90' height='100' alt='Wolf Spider - Pardosa species' /&gt;&lt;/a&gt;">

      ...and adjusting the plug-in settings to use a .desc file, but that didn't work. It seems a .desc file doesn't work with the image plug-in, and the content is pulled from the META data in the file anyway. If I wanted something like the <description> element contents I mentioned in my first post for the RSS output, making a batch of .desc files would have been marginally better. Having to change the META data of every image I host, however, would be terribly tedious, not to mention wasted effort, because much of the presentation of my images comes from a backend database (i.e. the file titles and META data are inconsequential to me, and searching on them could lead to erroneous search results).

      So is the only alternative to hotlink the images and change the META data of ALL images? This doesn't seem practical, especially since none of what I'd want in the <description> element needs to come from the META data. It could come from the img src= and alt= mark-up on the page in which the image is embedded. Doing so would also be a step closer to the Yahoo RSS Media Module: http://search.yahoo.com/mrss.
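
A minimal Python sketch of the batch approach mentioned above. It writes one description file per image containing a <title> and a meta description whose content is escaped HTML, matching the hand-made example. The naming rule (image stem plus a .desc extension), the helper names and the sample data are assumptions; check the Zoom Users Guide for the exact .desc naming convention before relying on it.

```python
import html
import tempfile
from pathlib import Path


def desc_text(title, page_url, thumb_url, alt, width, height):
    """Body of one description file: a <title> plus a meta description
    whose content attribute holds an escaped <a><img></a> snippet."""
    snippet = (
        f'<a href="{html.escape(page_url)}">'
        f'<img src="{html.escape(thumb_url)}" width="{width}" height="{height}" '
        f'alt="{html.escape(alt)}" /></a>'
    )
    # Escape the whole snippet again so it is a valid attribute value.
    return (
        f"<title>{html.escape(title, quote=False)}</title>\n"
        f'<meta name="description" content="{html.escape(snippet)}">\n'
    )


def write_desc_files(images, folder):
    """Write one description file per image and return the paths written.

    ASSUMPTION: each file is named after the image's stem with a .desc
    extension (spider.jpg -> spider.desc); verify against the Zoom docs.
    """
    folder = Path(folder)
    written = []
    for filename, info in images.items():
        desc_path = folder / (Path(filename).stem + ".desc")
        desc_path.write_text(desc_text(**info), encoding="utf-8")
        written.append(desc_path)
    return written


# Hypothetical sample data; in practice this could come from a backend database.
IMAGES = {
    "spider.jpg": {
        "title": "Wolf Spider",
        "page_url": "http://example.com/spiders/wolf.html",
        "thumb_url": "http://example.com/thumbs/spider_t.jpg",
        "alt": "Wolf Spider - Pardosa species",
        "width": 90,
        "height": 100,
    },
}

with tempfile.TemporaryDirectory() as tmp:
    for path in write_desc_files(IMAGES, tmp):
        print(path.name)
        print(path.read_text(encoding="utf-8"))
```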

      Dave



      • #4
        We have no plans to include the resource link in the description field as well as in the link field. The Yahoo 'mrss' page didn't seem to have much to say about the description field.

        Using a .desc file should have worked with a JPG file, however. But I did a quick test and it misbehaved for me as well. Either it didn't request the .desc file for the corresponding JPG from the server, or the server refused the request.

        I need to investigate a bit more. If it turns out to be a bug it will be fixed in the next patch release.



        • #5
          I want to point out that there is a difference between having images indexed (which means you can search for image files by their meta content, filename, image attributes, ALT text or link text, etc.), and having images displayed (showing up as a thumbnail) in the search results. Both are possible in Zoom, but they are not one and the same and they require different things.

          It sounds like what you are asking for is a way to display the images in the RSS search results. Zoom does this, but I believe there is some misunderstanding involved here.

          When Zoom returns a search result that has an image or thumbnail associated with it, to be displayed alongside the link, the RSS looks like this:

          Code:
          <item>
            <zoom:imageURL>http://mywebsite.com/images/thumb/th_book.jpg</zoom:imageURL>
            <title>About my book</title> 
            <link>http://mywebsite.com/aboutmybook.html</link> 
            <description>About my latest book and my work in progress</description> 
            <zoom:context>... the long-awaited cover for my book returned from the graphic artist and I've ...</zoom:context> 
            <zoom:termsMatched>1</zoom:termsMatched> 
            <zoom:score>20</zoom:score> 
            <pubDate>Wed, 29 Mar 2006 10:50:58 GMT</pubDate> 
            <zoom:fileSize>37k</zoom:fileSize> 
            </item>
          The OpenSearch standard does not specify a tag for images or thumbnails to be associated with a search result. This is why a custom tag is required here, in the form of <zoom:imageURL>. If you are parsing the XML/RSS output, you can then use the imageURL as you need and present it in whatever manner you desire.
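
As a sketch of that post-processing, the Python below reads RSS output of this shape with the standard library and returns the title, link and zoom:imageURL of each item, then renders a simple HTML line per result. Matching on the local element name means it does not depend on the exact zoom namespace URI; the URI in the sample feed is only a placeholder, and the function names are hypothetical.

```python
import html
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the '{namespace-uri}' prefix ElementTree adds to qualified tags."""
    return tag.rsplit("}", 1)[-1]


def results_with_images(rss_xml):
    """Return (title, link, image_url) for each <item> in the RSS output.

    image_url is None when an item has no zoom:imageURL element. Matching
    on the local name avoids hard-coding the zoom namespace URI.
    """
    root = ET.fromstring(rss_xml)
    results = []
    for item in root.iter("item"):
        fields = {local_name(child.tag): (child.text or "").strip() for child in item}
        results.append((fields.get("title"), fields.get("link"), fields.get("imageURL")))
    return results


# Placeholder feed; the xmlns:zoom URI here is illustrative only.
SAMPLE = """<rss version="2.0" xmlns:zoom="http://www.example.com/zoom/schema/">
<channel>
<item>
<zoom:imageURL>http://mywebsite.com/images/thumb/th_book.jpg</zoom:imageURL>
<title>About my book</title>
<link>http://mywebsite.com/aboutmybook.html</link>
</item>
<item>
<title>Contact</title>
<link>http://mywebsite.com/contact.html</link>
</item>
</channel>
</rss>"""

for title, link, image_url in results_with_images(SAMPLE):
    # Render a thumbnail only when the item supplied one.
    thumb = f'<img src="{html.escape(image_url)}" alt=""> ' if image_url else ""
    print(f'{thumb}<a href="{html.escape(link)}">{html.escape(title)}</a>')
```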

          From your posted description, I don't think you have images set up to be displayed or thumbnails associated with search results. You need to do this before considering having images displayed in the results like Flickr, etc. You can read more about it in the thumbnails chapter of the Users Guide.
          --Ray
          Wrensoft Web Software
          Sydney, Australia
          Zoom Search Engine



          • #6
            Originally posted by wrensoft
            Using a .desc file should have worked with a JPG file however. But I did a quick test and it misbehaved for me as well.
            We have looked into this problem further and can confirm that it is a bug in the current release. The use of .desc files for images will be fixed in the next build (5.0.1002).
            --Ray



            • #7
              Originally posted by Ray
              The OpenSearch standard does not specify a tag for images or thumbnails to be associated with a search result. This is why a custom tag is required here, in the form of <zoom:imageURL>. If you are parsing the XML/RSS output, you can then use the imageURL as you need and present it in whatever manner you desire.

              From your posted description, I don't think you have images set up to be displayed or thumbnails associated with search results. You need to do this before considering having images displayed in the results like Flickr, etc. You can read more about it in the thumbnails chapter of the Users Guide.
              Right! Just noticed that. I think where things start to get muddied though is if one were to have search categories like "PDF" and specified an icon URL for the .pdf extension. Having something like:

              <zoom:imageURL>http://canadianarachnology.dyndns.org/template/pdf_icon.gif</zoom:imageURL>

              in the resultant XML/RSS doesn't really provide any value.

              Things really get muddied with ZOOMIMAGE meta tags and the creation of an "Image" category with .jpg extensions.

              I think what we really need is a dedicated "Image" category whereby the <zoom:imageURL> is constructed from the img src tag on the indexed page.

              Comments?



              • #8
                Originally posted by Ray
                You need to do this before considering having images displayed in the results like flickr, etc. You can read more about it in the thumbnails chapter of the Users Guide.
                I just stumbled across Flickr's RSS 2.0 feed, which has the Yahoo <media:> type enclosures. This would make things so much easier to parse, since these are becoming much more prevalent (e.g. http://api.flickr.com/services/feeds...ne?format=rss2). Any chance you guys would consider the media RSS 2.0 module even though it doesn't "yet" fit the OpenSearch bill?



                • #9
                  Originally posted by dps1
                  Right! Just noticed that. I think where things start to get muddied though is if one were to have search categories like "PDF" and specified an icon URL for the .pdf extension. Having something like:

                  <zoom:imageURL>http://canadianarachnology.dyndns.org/template/pdf_icon.gif</zoom:imageURL>

                  in the resultant XML/RSS doesn't really provide any value.
                  Well, it does, but it rather depends on what you are trying to achieve with the XML/RSS output. Remember that the XML output has many possible uses, and some people may want the icon image pre-determined so they don't need to do it on a per-file basis when they are post-processing the results in their own scripts/apps.

                  Originally posted by dps1
                  Things really get muddied with ZOOMIMAGE meta tags and the creation of an "Image" category with .jpg extensions.
                  Zoom will only display up to one image per search result. So if you have icon images setup, this will be the image used, and it would not achieve the objective of having an image which is representative of the result - for that, you should be using the thumbnail options.

                  If you have a ZOOMIMAGE meta tag on a page, this will override any icon or thumbnail image setting for that page.

                  Does that clear things up?

                  Originally posted by dps1
                  I think what we really need is a dedicated "Image" category whereby the <zoom:imageURL> is constructed from the img src tag on the indexed page.
                  You can achieve this for the image file results (e.g. search results for the JPG files themselves, as opposed to the HTML pages etc.) by setting your thumbnail options with the default (mostly blank) values. This tells Zoom that the thumbnail image is the same URL as the original file's URL.
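
To make the "blank values" point concrete, here is a hypothetical Python model of how a thumbnail URL could be derived from an image URL using a sub-folder, a filename prefix/postfix and a replacement extension. The option names and the derivation rule are assumptions for illustration, not Zoom's documented behaviour; the point is only that with every option left blank, the thumbnail URL equals the original image URL.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def thumbnail_url(image_url, folder="", prefix="", postfix="", extension=""):
    """Derive a thumbnail URL from an image URL.

    HYPOTHETICAL rule for illustration: insert an optional sub-folder,
    wrap the file stem in a prefix/postfix and optionally swap the
    extension. With every option blank the original URL comes back
    unchanged, i.e. the image acts as its own thumbnail.
    """
    parts = urlsplit(image_url)
    directory, filename = posixpath.split(parts.path)
    stem, ext = posixpath.splitext(filename)
    new_name = f"{prefix}{stem}{postfix}{extension or ext}"
    # posixpath.join skips an empty folder without adding a double slash.
    new_path = posixpath.join(directory, folder, new_name)
    return urlunsplit(parts._replace(path=new_path))


print(thumbnail_url("http://mywebsite.com/images/book.jpg"))
# http://mywebsite.com/images/book.jpg
print(thumbnail_url("http://mywebsite.com/images/book.jpg", folder="thumb", prefix="th_"))
# http://mywebsite.com/images/thumb/th_book.jpg
```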

                  However, if you're trying to do this with HTML web pages, then I'm not quite sure how it would be useful. First of all, in most cases, you do not want the original images for your search results - these are potentially very large, full-screen images. You would only want thumbnailed versions of images amongst your search results in most cases (for layout reasons, as well as traffic/page loading time). Second, there are many images on a page (all inserted with the img src tag). Which image would be selected?

                  Perhaps you should describe what you are actually trying to achieve, the searches you are expecting to make and the format of results you wish to produce. It would help us understand what exactly your requirements are.
                  --Ray



                  • #10
                    Raymond,

                    Thanks for replying. What it all boils down to for me is that I'm not too keen on creating hyperlinks for all the images I host, which in my mind encourages sharing these images without either a credit or a link back to the page on which the image resides. As I understand it, your spider necessarily uses <a href="...jpg"><img src="...jpg"></a> in order to effectively index images for XML/RSS output (or HTML output too, for that matter). And yet, I'm looking for a means to share these images (OpenSearch seems to be a pretty good candidate), provided that links back to the hosting page and some extra bits like a credit are in the RSS output. That's where, at this stage, I was hoping to make use of a slew of .desc files to get a credit for an image somewhere in the XML/RSS OpenSearch output.

                    Even with a specified thumbnail, the link to the hosted page is lost with the XML/RSS output. Rather, <link> points to the image itself and <zoom:imageURL> points to its thumbnail.

                    This is where the RSS 2.0 media module is looking really attractive for me as a means to effectively share images via OpenSearch because a number of parsers are already available for this module.

                    My ideal would be something like:
                    Code:
                    <?xml version="1.0" encoding="windows-1252"?>
                    <!--Zoom Search Engine Version 5.0 (1000) PRO-->
                    <rss version="2.0" xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/" xmlns:media="http://search.yahoo.com/mrss/" xmlns:zoom="http://www.wrensoft/zoom/response/5.0/schema/">
                    <channel>
                    <title>TITLE</title>
                    <description>DESCRIPTION</description>
                    <link>HOMEPAGE</link>
                    <zoom:searchquery>SEARCH TERM</zoom:searchquery>
                    <zoom:searchcategory>All</zoom:searchcategory>
                    <opensearch:totalResults>32</opensearch:totalResults>
                    <opensearch:startIndex>1</opensearch:startIndex>
                    <opensearch:itemsPerPage>10</opensearch:itemsPerPage>
                    <item>
                    <zoom:imageURL>http://...image_thumb.jpg</zoom:imageURL>
                    <title>Title</title>
                    <link>...link to the page...</link>
                    <category>...whatever...</category>
                    <media:content url="http://...image.jpg" 
                               type="image/jpeg"
                               height="480"
                               width="640"/>
                    <media:title>DSCN1093</media:title>
                    <media:thumbnail url="http://...image_thumbnail.jpg" height="75" width="75" />
                    <media:credit role="photographer">dscheffy</media:credit>
                    <zoom:context>...</zoom:context>
                    <zoom:termsMatched>1</zoom:termsMatched>
                    <zoom:score>753</zoom:score>
                    </item>
                    </channel>
                    </rss>
                    I realize the above is at this stage proprietary, but I see no way in your current spidering routine to have all of the following within any one <item> element:
                    1. A link to the hosted page
                    2. A link to the image
                    3. A link to the thumbnail
                    Have I misunderstood this?

                    Since you also have an mp3 plugin, it would seem to make sense to use the RSS 2.0 media module.
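
For reference, a minimal Python sketch of an <item> carrying all three links (hosting page in <link>, full image in media:content, thumbnail in media:thumbnail) plus a media:credit, built with the standard library's ElementTree. It assumes the Media RSS namespace URI http://search.yahoo.com/mrss/ and a JPEG image type; the function name and sample values are hypothetical, and nothing here reflects what Zoom actually emits.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: namespace URI as published in the Media RSS specification.
MEDIA_NS = "http://search.yahoo.com/mrss/"
ET.register_namespace("media", MEDIA_NS)


def media_item(title, page_url, image_url, thumb_url, credit, width=None, height=None):
    """Build an RSS 2.0 <item> holding all three links: the hosting page
    (<link>), the full image (media:content) and its thumbnail
    (media:thumbnail), plus a photographer credit (media:credit)."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = page_url
    # ASSUMPTION: every image is a JPEG in this sketch.
    content = ET.SubElement(item, f"{{{MEDIA_NS}}}content", url=image_url, type="image/jpeg")
    if width is not None and height is not None:
        content.set("width", str(width))
        content.set("height", str(height))
    ET.SubElement(item, f"{{{MEDIA_NS}}}thumbnail", url=thumb_url)
    ET.SubElement(item, f"{{{MEDIA_NS}}}credit", role="photographer").text = credit
    return item


# Hypothetical sample values.
item = media_item(
    title="Wolf Spider",
    page_url="http://example.com/spiders/wolf.html",
    image_url="http://example.com/images/spider.jpg",
    thumb_url="http://example.com/images/spider_t.jpg",
    credit="A. Photographer",
    width=640,
    height=480,
)
print(ET.tostring(item, encoding="unicode"))
```

ElementTree declares xmlns:media on the serialized <item> itself; inside a full feed the declaration would normally sit on the <rss> element instead.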



                    • #11
                      Originally posted by dps1
                      As I understand it, your spider necessarily uses <a href="...jpg"><img src="...jpg"></a> in order to effectively index images for XML/RSS output (or HTML output too for that matter)
                      No, not quite. Zoom will index image files found through several means:

                      1.) In Offline mode, Zoom will index any image files found within the start directory, provided the files match the image extensions you've added to the Extensions List (found on the Scan Options tab).

                      2.) In Spider mode, Zoom will download and index image files which are either linked via an <a href="mypic.jpg">...</a> tag, OR an inline image in the form of a stand-alone <img src="mypic.jpg"> tag. Your image tags do not need to be links for them to be indexed by the spider.
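
The two discovery routes in point 2 can be illustrated with a small Python HTML parser that records image URLs reached through <a href> links and through inline <img src> tags, resolving relative URLs against the page. This is only a sketch of the idea, not Zoom's spider; the extension list and class name are assumptions, and the two routes are kept separate because the thread treats their results differently.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# ASSUMPTION: stands in for the image extensions configured in the Extensions List.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".gif", ".png")


class ImageRefFinder(HTMLParser):
    """Record image URLs a spider could reach from one page, kept apart by route."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.linked = []  # image files reached through <a href="...">
        self.inline = []  # image files embedded through <img src="...">

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "a" and attr_map.get("href"):
            url = urljoin(self.page_url, attr_map["href"])
            # Ignore any query string when checking the file extension.
            if url.split("?", 1)[0].lower().endswith(IMAGE_EXTENSIONS):
                self.linked.append(url)
        elif tag == "img" and attr_map.get("src"):
            self.inline.append(urljoin(self.page_url, attr_map["src"]))


PAGE = (
    '<p><img src="spider.jpg" alt="Wolf spider"></p>'
    '<a href="/big/spider.jpg">full size</a> <a href="about.html">About</a>'
)
finder = ImageRefFinder("http://example.com/spiders/wolf.html")
finder.feed(PAGE)
finder.close()
print(finder.linked)  # ['http://example.com/big/spider.jpg']
print(finder.inline)  # ['http://example.com/spiders/spider.jpg']
```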

                      Originally posted by dps1
                      And yet, I'm looking for a means to share these images (OpenSearch seems to be a pretty good candidate), provided links back to the hosting page and some extra bits like a credit are in the RSS output. That's where at this stage I was hoping to make use of a slew of .desc files to get a credit for an image somewhere in the XML/RSS OpenSearch output.
                      OK, this gives me a better picture of what you're trying to do, although not quite a complete one. It sounds like you have a gallery or photo album of some sort. Can you clarify if what you actually mean is that you have individual pages for each of the image files that you wish to index? Or can a "hosting page" contain several images that you wish to index? Do you have a URL to the pages in question and the images in question?

                      The reason I ask is that if it is the former, you could avoid the need to index images at all, if all you need is to index these "hosting pages", which presumably can contain the credit and other meta details, and you will then only need to associate the image to that page via use of the ZOOMIMAGE meta tag.

                      Originally posted by dps1
                      Even with a specified thumbnail, the link to the hosted page is lost with the XML/RSS output. Rather, <link> points to the image itself and <zoom:imageURL> points to its thumbnail.
                      Actually, no, not always. Images found via an <img src="blah.jpg"> tag as opposed to an <a href="blah.jpg">...</a> link should return a URL to the page that was hosting the image.

                      However! I think I just found a bug that turns this behaviour off when you disable "ALT text" on the "Indexing options" tab. I presume you have this option off. Turn it back on and you should see the behaviour described above. I'll look into this further and confirm whether this needs to be fixed for the next build.

                      Here's another note though. If the same image is hosted on several different pages via an <img src="..."> tag, only the first page that the spider finds with this image tag will be used as the result URL.
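
A toy Python model of that "first page wins" rule, assuming the spider's crawl order is given as a list of (page URL, inline image URLs) pairs. It is not Zoom's implementation; the function name and data are hypothetical.

```python
def result_urls_for_inline_images(crawl_order):
    """Map each inline image URL to the page reported as its result URL.

    crawl_order: (page_url, [inline image URLs]) pairs in the order the
    spider visited the pages. Only the first page seen for an image is kept.
    """
    hosting_page = {}
    for page_url, image_urls in crawl_order:
        for image_url in image_urls:
            # setdefault leaves an existing entry alone, so later pages never win.
            hosting_page.setdefault(image_url, page_url)
    return hosting_page


CRAWL = [
    ("http://example.com/gallery.html", ["http://example.com/img/spider.jpg"]),
    ("http://example.com/spiders/wolf.html", [
        "http://example.com/img/spider.jpg",
        "http://example.com/img/web.jpg",
    ]),
]
print(result_urls_for_inline_images(CRAWL))
```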

                      Originally posted by dps1
                      This is where the RSS 2.0 media module is looking really attractive for me as a means to effectively share images via OpenSearch because a number of parsers are already available for this module.
                      Out of curiosity, which parsers specifically are supporting this module? Knowing this would help us to gauge the level of public acceptance a non-standard format has attained.

                      At this point, it does not seem likely that we will add support for Yahoo's proprietary, non-standard "media" namespace. As you have already noted, it is not part of the OpenSearch format, which means that it may become redundant or obsolete at any time in the future (should OpenSearch decide to implement things differently, or should Yahoo change their format, which they are free to do). You mentioned before that flickr was using this format, but that's because flickr is owned by Yahoo!

                      As always, though, support for this kind of feature depends on the level of user demand, and we'll need to hear from more users before seriously considering it at any length.

                      Originally posted by dps1
                      I realize the above is at this stage proprietary, but I see no way in your current spidering routine to have all of the following within any one <item> element:
                      1. A link to the hosted page
                      2. A link to the image
                      3. A link to the thumbnail
                      This is true, although you can get 1+2 or 1+3. The question though, is whether you really need 1+2+3? I thought you just said you didn't want links to the image directly, and only wanted links to the hosted page?
                      --Ray



                      • #12
                        Here are a few parsers that fit the bill for OpenSearch and media RSS:

                        http://www.magicparser.com/node/235 (a discussion based on the Magic Parser php app).

                        SimplePie handles enclosures and can extract file types, URLs, etc. through its array of functions (http://simplepie.org/)

                        Universal Feed Parser handles the Media namespace: http://feedparser.org/docs/namespace-handling.html

                        If you want a commercial parser for OpenSearch and media RSS: http://www.geckotribe.com/rss/carp/ (site's a bit garish, but the feature list is here: http://www.geckotribe.com/rss/carp/features.php)

                        There are bound to be others I don't know about.

                        And, it's not just Yahoo that produces media RSS. Google has it in their video feeds: http://video.google.com/videofeed?ty...=20&output=rss. Vox also spits it out as discussed here: http://www.sixapart.com/developers/p...a_profile.html for their bloggers.

                        Anyhow, that gives you a feel for what's out there in terms of the parsers that can do it. As for the rest of your questions, that was in a PM because they may be specific to my case and others like me. I'll be sure to post here if I can get a roll call for others wanting this.

                        As always, you guys are a great help & have developed a truly fantastic product.



                        • #13
                          OpenSearch: DeWitt Clinton's response on media RSS

                          Folks,

                          The primary creator of OpenSearch is keen on Yahoo's Media RSS extension and had this to say:

                          http://lists.opensearch.org/pipermai...ry/thread.html
