HTML Tag Inclusion or Exclusion


  • HTML Tag Inclusion or Exclusion

    I have just started using the free version of this great script, and so far I've been reasonably successful at getting it to work and look the way I want.

    You can see it in action at this site: RailWorks America
    http://railworksamerica.com/FileLibrary/

    I do have a couple of questions though.

    Is there a way to exclude all <p> tags without having to go through every page inserting the ZOOMSTOP & ZOOMRESTART comments?
    Or alternatively, is there a way to have it index only <h4> and <h6> tags?

    What I'm trying to achieve is for the search results page to only display the following two lines (I realize I'll have to change the <p> to <h6>):
    <h4>EMDX GP38-2 Re-Paints</h4>
    <p>Updated: April 6, 2011 - by Mike (<em>MadMike1024</em>) Calvin
    As it is, it's displaying part of these two lines (which I don't want):
    <p class="cent XVI">Page 1 &nbsp;|&nbsp; <a href="RollingStock_Page2.html">Page 2</a> &nbsp;|&nbsp; <a href="RollingStock_Page3.html">Page 3</a> &nbsp;|&nbsp; <a href="RollingStock_Page4.html">Page 4</a> &nbsp;|&nbsp; <a href="RollingStock_Page5.html">Page 5</a> &nbsp;|&nbsp; <a href="RollingStock_Page6.html">Page 6</a> &nbsp;|&nbsp; <a href="RollingStock_Page7.html">Page 7</a>
    <p><a href="link removed">Download</a> &nbsp;-&nbsp; 12.3 MB &nbsp;-&nbsp; <span class="format">RAR Format</span> &nbsp;-&nbsp; <span class="lt-blu ital"><script type="text/javascript" src="link removed"></script> Downloads</span> &nbsp;-&nbsp; <a href="../zips/rol-stk/m-c/EMDX-GP38-2_Readme.txt">Read Me</a>
    *****************

    Also, I'm having a bit of difficulty getting the indexer to exclude the index.html page found in the FileLibrary directory.
    In the Skip Options section I've tried the following:

    http://railworksamerica.com/FileLibrary/index.html
    \FileLibrary\*\index.html
    index.html
    *\index.html
    \index.html

    None of which seem to have been successful.

    If you want to test this, try doing a search for GreatNortherner.

    I was successful in getting the indexer to ignore the following page:

    http://railworksamerica.com/FileLibrary/LastMonth_Additions.html

    by adding that whole link to the Skip Options.

    Eventually I plan on buying the Professional Edition (when I get the extra money) because I would like to use some of the thumbnail options (if I can figure out whether they will work as I want). I also have another, much larger site that I would like to use this on.
    Last edited by Hawk; Apr-08-2011, 03:41 PM.
    Hawk

  • #2
    You seem to be trying to control what is displayed in the context section of the search results page.

    The context section displays the text surrounding the search word on the page in question, so the text displayed as context will vary depending on the search word. You can control the amount of text displayed as context (i.e. how many words appear before and after the search word) from the "Results layout" configuration window.
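    To illustrate why the context changes with the search word, here is a minimal Python sketch of a word-window snippet. It is not Zoom's actual algorithm; the function name `context_snippet` and the `words_each_side` parameter are hypothetical, and the sample text is taken from the lines quoted in the first post.

```python
def context_snippet(text, term, words_each_side=5):
    """Return the words around the first occurrence of term, or None if absent."""
    words = text.split()
    needle = term.lower()
    for i, word in enumerate(words):
        # Case-insensitive match of the search term inside a word.
        if needle in word.lower():
            start = max(0, i - words_each_side)
            end = min(len(words), i + words_each_side + 1)
            return " ".join(words[start:end])
    return None


# Visible text of one library entry, flattened the way an indexer might see it.
page_text = (
    "EMDX GP38-2 Re-Paints Updated: April 6, 2011 - by Mike (MadMike1024) Calvin "
    "Download - 12.3 MB - RAR Format - Downloads - Read Me"
)

# Two different search words produce two different context snippets.
print(context_snippet(page_text, "GP38-2", 2))
print(context_snippet(page_text, "RAR", 2))
```

    The window size only changes how much text is shown; which text is shown is decided by where the search word occurs.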

    As for the index.html file, the skip rules seem to be working fine as far as I can see. You aren't indexing this URL,
    http://railworksamerica.com/FileLibrary/index.html
    But there is a 2nd URL to the same page,
    http://railworksamerica.com/FileLibrary/
    and this 2nd one is harder to skip with the skip list, as that URL forms the start of all the other URLs on the site.

    I would suggest adding this tag to the page,
    <meta name="robots" content="noindex">
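    To illustrate why the second URL is hard to skip, here is a small Python sketch. It assumes that a skip-list entry matches any URL containing it as a substring; the helper `is_skipped` is hypothetical and is not part of Zoom, and the last URL in the list is a made-up location used only for illustration.

```python
def is_skipped(url, skip_list):
    """Assumed rule: skip a URL if it contains any skip-list entry."""
    return any(entry in url for entry in skip_list)


urls = [
    "http://railworksamerica.com/FileLibrary/",
    "http://railworksamerica.com/FileLibrary/index.html",
    "http://railworksamerica.com/FileLibrary/LastMonth_Additions.html",
    # Hypothetical path, for illustration only.
    "http://railworksamerica.com/FileLibrary/RollingStock_Page2.html",
]

# A full-URL entry skips exactly one page.
narrow = ["http://railworksamerica.com/FileLibrary/index.html"]
skipped_by_narrow = [u for u in urls if is_skipped(u, narrow)]

# The directory URL is contained in every other URL, so it skips everything.
broad = ["http://railworksamerica.com/FileLibrary/"]
skipped_by_broad = [u for u in urls if is_skipped(u, broad)]

print(skipped_by_narrow)
print(len(skipped_by_broad), "of", len(urls), "URLs skipped by the directory entry")
```

    Under that assumption, excluding the directory URL by skip list would exclude the whole library, which is why a per-page robots tag is the safer choice here.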


    • #3
      Thanks for your reply. What you say about the skip rule makes sense. I guess I'll just have to deal with it like it is. Not that big of a problem.

      As for the Results Layout configuration window, other than the check boxes, the only option I see is Context size. As far as I can tell, that only controls how much text is displayed, not which text appears before or after the search term. I was hoping to control what text is displayed.
      Oh well. The script still works pretty well.

      I did notice one item on the Indexing options page that I can't find any information about, either on your site or in the PDF. What does the Param tag values option control?

      Edit 1: I added the noindex tag you suggested but the results page is still displaying the index page.
      No big deal. I guess I can live with it.

      Thanks for your help.
      Hawk


      • #4
        I was hoping to control what text is displayed.
        You can for the meta description and page title. But as the context depends on the search word used by the user, there isn't any sensible way to control what is displayed for the near-infinite number of search words a user might type in.

        What does the Param tag values option control
        It controls whether the value of the HTML Param tag is indexed.
        e.g.
        <param name="Proprietary.Data" value="Serial#12344451">

        I added the noindex tag you suggested but the results page is still displaying the index page
        You'll need to re-index your site and upload a new set of index files.
        Also turn off caching on the "Spider options" configuration window.


        • #5
          Originally posted by wrensoft
          You'll need to re-index your site and upload a new set of index files.
          Also turn off caching on the "Spider options" configuration window.
          I did reindex the site and upload the new files, but I didn't try turning off the cache.
          I'll try that.

          Thanks!
          Hawk
