Link text included in search

  • Link text included in search

    Under Indexing Options, I deselected the Link text option. However, it seems that link text is still being indexed, based on the results I'm seeing.

    The links are in this format in the code:
    Code:
    <a id="h7048" class="jumptemplate" title="Contact Info tab"
    href="7038.htm" target="_self">
    Is the indexer looking for tags only in the format of
    Code:
    <a href="
    or something like that?
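As a general point, an HTML-aware parser looks attributes up by name, so it does not matter where `href` sits inside the tag. The thread never says how Zoom's indexer reads tags. The following is only a generic sketch built on Python's standard `html.parser` module, and the `HrefFinder` class and `find_hrefs` function are hypothetical names.

```python
from html.parser import HTMLParser


class HrefFinder(HTMLParser):
    """Collects the href of every <a> tag, wherever the attribute appears."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs in source order;
        # looking the name up makes attribute position irrelevant.
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.hrefs.append(href)


def find_hrefs(html):
    finder = HrefFinder()
    finder.feed(html)
    finder.close()
    return finder.hrefs


# The tag from the question: href is the fourth attribute, not the first.
tag = ('<a id="h7048" class="jumptemplate" title="Contact Info tab" '
       'href="7038.htm" target="_self">')
print(find_hrefs(tag))  # ['7038.htm']
```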

  • #2
    Do you mean that links are still being followed, or do you mean that the link text (of which there isn't any in your examples) is still being indexed?

    Link text is the text between the <a> and </a> tags, so in the following example, "Users Guide" is the link text.
    Code:
     
    <a href="../ftp/masternode.pdf">Users Guide</a>
    Deselecting the link text indexing option does not affect whether links are followed. If link text is indexed, the keywords are assigned to the destination document, not the source. So the words "Users Guide" will be associated with the masternode.pdf PDF file and not with the HTML source file.
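A minimal sketch of this behaviour, assuming a toy indexer rather than Zoom's actual code: it treats the text between `<a>` and `</a>` as link text and credits those words to the link's destination, with a relative href such as `../ftp/masternode.pdf` resolved against the source page's URL. The `LinkTextParser` class, the `link_text_by_destination` function, and the `www.example.com` source URL are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkTextParser(HTMLParser):
    """Collects (href, link text) for every <a href="...">...</a> element."""

    def __init__(self):
        super().__init__()
        self.links = []     # (href, link text) pairs
        self._href = None   # href of the <a> element we are currently inside
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def link_text_by_destination(source_url, html):
    """Return {absolute destination URL: [link-text words]} for one source page."""
    parser = LinkTextParser()
    parser.feed(html)
    parser.close()
    result = {}
    for href, text in parser.links:
        target = urljoin(source_url, href)  # resolve "../" relative to the source page
        result.setdefault(target, []).extend(text.lower().split())
    return result


source_url = "http://www.example.com/docs/index.html"  # hypothetical source page
html = '<a href="../ftp/masternode.pdf">Users Guide</a>'
print(link_text_by_destination(source_url, html))
# {'http://www.example.com/ftp/masternode.pdf': ['users', 'guide']}
```

Note that the words end up keyed by the PDF's URL, not by the page containing the link.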

    • #3
      I wish I could upload an image to show an example...

      I meant that the text that falls between the "a" tags is still being indexed.

      I'm searching for a term; for the sake of the example, say I'm searching for "apple". Some of the results returned are pages where "apple" appears only within link text. I would expect those topics not to be included in the search results at all.

      Is there another related option that is somehow overriding the fact that I deselected the Link Text option on the Indexing Options page?
      Last edited by jroosevelt; Aug-14-2007, 01:09 PM.

      • #4
        No, the text that is part of a hypertext link is always indexed for the page that it appears on.

        This is because it is considered to be an integral part of the content of a page. For example, it is common to have text such as the following:

        I have a pet dog named <a href="alfie.html">Alfie</a>, and he likes to dig around in my <a href="garden.html">garden</a>, and cause havoc throughout the house. <a href="alfie.html">Click here</a> for more info on him!
        If the link text were stripped out, it would appear as follows in your context description:

        I have a pet dog named, and he likes to dig around in my, and cause havoc throughout the house. for more info on him!
        So there is no option to change this behaviour, as stripping link text would alter the meaning of a lot of content.
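The effect of stripping link text from the source page can be reproduced with a small Python sketch. This is a hypothetical illustration built on the standard `html.parser` module, not Zoom's code. `VisibleText` and `page_text` are made-up names, and the whitespace and punctuation cleanup is an assumption about how a context description might be tidied.

```python
import re
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collects page text, optionally dropping anything inside <a>...</a>."""

    def __init__(self, keep_link_text):
        super().__init__()
        self.keep_link_text = keep_link_text
        self.in_link = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.in_link = True

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_link = False

    def handle_data(self, data):
        if self.keep_link_text or not self.in_link:
            self.parts.append(data)


def page_text(html, keep_link_text=True):
    parser = VisibleText(keep_link_text)
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    text = re.sub(r"\s+", " ", text)          # collapse runs of whitespace
    text = re.sub(r" ([,.!?])", r"\1", text)  # drop a space left before punctuation
    return text.strip()


html = ('I have a pet dog named <a href="alfie.html">Alfie</a>, and he likes to '
        'dig around in my <a href="garden.html">garden</a>, and cause havoc '
        'throughout the house. <a href="alfie.html">Click here</a> for more '
        'info on him!')

print(page_text(html))
print(page_text(html, keep_link_text=False))
```

With `keep_link_text=False`, the second call returns exactly the garbled sentence shown above.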

        As David explained above, the option you are referring to (the "Link text" checkbox on the "Indexing Options" tab of the Configuration window) controls whether link text is also indexed and associated with the destination page.

        That is, with this option enabled, if we find the following HTML on "page1.html":

        Code:
        You can find more information in our 
        <a href="http://www.wrensoft.com/zoom/usersguide.html">Users Guide</a>.
        Then, when we index "usersguide.html", we will associate the words "Users Guide" with it, because that text was used to link to the page. This is particularly useful for image files, which lack meaningful searchable text of their own.
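Putting the two behaviours together, a toy index might work as follows. It assumes, as Ray describes, that words on a page (link text included) are always indexed for that page, and it models the "Link text" checkbox as a hypothetical `index_link_text` flag that additionally credits link text to the destination. `PageParser`, `build_index`, and the placeholder content of the Users Guide page are invented for illustration and do not reflect Zoom's internals.

```python
from collections import defaultdict
from html.parser import HTMLParser


class PageParser(HTMLParser):
    """Collects every word on a page plus its (href, link text) pairs."""

    def __init__(self):
        super().__init__()
        self.words = []      # all words on the page, link text included
        self.links = []      # (href, link text) pairs
        self._href = None
        self._link_text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._link_text = []

    def handle_data(self, data):
        self.words.extend(data.lower().split())
        if self._href is not None:
            self._link_text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._link_text)))
            self._href = None


def build_index(pages, index_link_text=True):
    """pages maps URL -> HTML; returns {word: set of URLs indexed under it}."""
    index = defaultdict(set)
    for url, html in pages.items():
        parser = PageParser()
        parser.feed(html)
        parser.close()
        # Text on the page, link text included, always counts for that page.
        for word in parser.words:
            index[word].add(url)
        # Models the "Link text" checkbox: also credit link text to the target.
        if index_link_text:
            for href, text in parser.links:
                for word in text.lower().split():
                    index[word].add(href)
    return index


guide_url = "http://www.wrensoft.com/zoom/usersguide.html"
pages = {
    "page1.html": f'You can find more information in our <a href="{guide_url}">Users Guide</a>.',
    guide_url: "<p>Installing and configuring Zoom</p>",  # placeholder content
}
print(sorted(build_index(pages)["guide"]))
# ['http://www.wrensoft.com/zoom/usersguide.html', 'page1.html']
print(sorted(build_index(pages, index_link_text=False)["guide"]))
# ['page1.html']
```

Turning the flag off only removes the destination credit; "page1.html" still matches "guide", which is the behaviour described earlier in the thread.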

        We should be updating our documentation in the near future, to clarify this functionality and avoid confusion.
        Last edited by Ray; Aug-16-2007, 01:06 AM.
        --Ray
        Wrensoft Web Software
        Sydney, Australia
        Zoom Search Engine

        • #5
          Okay, that makes more sense now. Thank you.

          The issue I was running into (and the reason I was hoping you could omit link text from indexing entirely) is that I use Zoom to index an online Help system that has a lot of "See also" links. The text in those links therefore relates only loosely to the content of the page it appears on, if that makes sense.

          For example, if I had a help topic all about Apples, I might have links that said:
          See also:
          - Oranges
          - Bananas
          - Grapes

          So, when I search for "Oranges", the results include the Apples topic, even though that topic really has nothing to do with oranges. But the link text weighting should help out here, at least.

          Anyway, thanks for the clarification.

          • #6
            In your scenario, the most suitable thing would be to use the <!--ZOOMSTOP--> and <!--ZOOMRESTART--> tags to exclude your "See also" link text from being indexed.

            See this FAQ for more information:
            Q. How do I prevent parts of my webpage from being indexed (eg. exclude navigation menus, or page footers)?
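As a rough illustration of where the markers go, here is an Apples help topic with its "See also" block wrapped in `<!--ZOOMSTOP-->` and `<!--ZOOMRESTART-->`, plus a toy Python pre-processing step that drops the marked section before indexing. The regex-based `strip_skipped_sections` function and the `oranges.htm`-style file names are hypothetical. This is not how Zoom implements the tags, and the sketch does not model whether links inside the block are still followed.

```python
import re

# Hypothetical pre-indexing step: drop everything between the two markers.
# The non-greedy .*? keeps each STOP/RESTART pair separate.
SKIP_BLOCK = re.compile(r"<!--ZOOMSTOP-->.*?<!--ZOOMRESTART-->", re.DOTALL)


def strip_skipped_sections(html):
    """Remove each <!--ZOOMSTOP--> ... <!--ZOOMRESTART--> block (markers included)."""
    return SKIP_BLOCK.sub("", html)


apples_topic = """<h1>Apples</h1>
<p>Apples grow on trees and come in many varieties.</p>
<!--ZOOMSTOP-->
<p>See also:</p>
<ul>
  <li><a href="oranges.htm">Oranges</a></li>
  <li><a href="bananas.htm">Bananas</a></li>
  <li><a href="grapes.htm">Grapes</a></li>
</ul>
<!--ZOOMRESTART-->"""

indexed = strip_skipped_sections(apples_topic)
print("Oranges" in apples_topic)  # True
print("Oranges" in indexed)       # False
```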
            --Ray
            Wrensoft Web Software
            Sydney, Australia
            Zoom Search Engine
