

Space added when html tag precedes a period


  • Space added when html tag precedes a period

    I'm not sure if this might be related to my previous post about the article "A" being merged with the next word, but here it is.

    When an html tag (</i>, </a>, etc.) precedes a period (comma, semicolon, etc.), a space gets added:

    Code:
    Develop the SQL</a>.

    Search result:
    Develop the SQL .

  • #2
    It is a similar issue in that it only applies to text extracted from the page content for use as the description (when context description is disabled and no meta description tag is found on the page). However, it is a different issue.

    You should find that this doesn't actually occur with certain HTML tags like <i> ... </i> or </b>. We consider these inline tags to be non-word-breaking, so we handle them differently. However, other tags, such as </a> or </p>, will cause an extra space in this scenario. This only affects the aesthetic display of the results, and we consider it minor enough to be a non-issue. The description shown in the results is never meant to be an exact recreation of the layout and formatting of the text on the page; it is only meant to be a representation of it. It does not affect how words are indexed or searched at all. In some cases the space is not necessarily wrong: a </p> tag before the period, for example, implies that the word should terminate there before the next character.
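The distinction between non-word-breaking inline tags and other tags can be sketched with Python's standard html.parser module. This is not Zoom's actual code. The NON_BREAKING_TAGS set and the extract_description helper are hypothetical. They are chosen so that <i> and <b> join text while <a>, <p> and <span> insert a word break, which reproduces the results reported in this thread.

```python
from html.parser import HTMLParser

# Hypothetical set of tags treated as non-word-breaking (illustrative only;
# the real list used by the indexer is not given in this thread).
NON_BREAKING_TAGS = {"b", "i", "em", "strong", "u"}


class DescriptionExtractor(HTMLParser):
    """Collects page text, inserting a word break at every tag
    except the assumed non-word-breaking inline tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def _maybe_break(self, tag):
        # Any tag outside the inline set acts as a word boundary.
        if tag not in NON_BREAKING_TAGS:
            self.parts.append(" ")

    def handle_starttag(self, tag, attrs):
        self._maybe_break(tag)

    def handle_endtag(self, tag):
        self._maybe_break(tag)

    def handle_data(self, data):
        self.parts.append(data)

    def text(self):
        # Collapse runs of whitespace into single spaces.
        return " ".join("".join(self.parts).split())


def extract_description(html):
    parser = DescriptionExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text such as a final "."
    return parser.text()
```

For example, extract_description('Develop the <i>SQL</i>.') gives "Develop the SQL.", while the same text wrapped in <a href="x">...</a> gives "Develop the SQL .", because the closing </a> counts as a word break.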
    --Ray
    Wrensoft Web Software
    Sydney, Australia
    Zoom Search Engine



    • #3
      Actually, the italics are coded this way (in RoboHelp):
      ...to <span style="font-style: italic;">Y</span>.

      which produces in the result:
      ...to Y .

      And, yes, I understand that it's strictly an aesthetic thing. However, the management types who were screaming for a better search engine in our online help will home in on this issue and insist that we fix it. Sigh...

      I'm already running a few post-indexing sweeps through the zoom_pageinfo.js file (some folder renaming and lowercase issues), so maybe I can just add another one to replace "space-dot-space" with "dot-space".
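A post-indexing sweep along the lines Leon suggests might look like the following Python sketch. It is an assumption, not a Zoom feature. The tidy_description and tidy_file helpers are hypothetical. Because the thread does not describe the layout of zoom_pageinfo.js, tidy_file applies the pattern to the whole file. That also touches any JavaScript outside the description strings, so it is worth trying on a copy first.

```python
import re

# Spaces or tabs immediately before a period, comma, semicolon or colon.
SPACE_BEFORE_PUNCT = re.compile(r"[ \t]+([.,;:])")


def tidy_description(text):
    """Remove whitespace left in front of punctuation, e.g. 'SQL .' -> 'SQL.'"""
    return SPACE_BEFORE_PUNCT.sub(r"\1", text)


def tidy_file(path, encoding="utf-8"):
    # Hypothetical post-indexing sweep: rewrites the whole file in place.
    with open(path, encoding=encoding) as f:
        content = f.read()
    with open(path, "w", encoding=encoding) as f:
        f.write(tidy_description(content))
```

Rather than replacing only "space-dot-space" with "dot-space", the pattern removes whitespace before the punctuation mark regardless of what follows. This also catches a period at the very end of a description, where no trailing space exists.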


      Leon
