HTML Tag Inclusion or Exclusion


  • Hawk
    replied
    Originally posted by wrensoft
    You'll need to re-index your site and upload a new set of index files.
    Also turn off caching on the "Spider options" configuration window.
    I did reindex the site and upload the new files, but I didn't try turning off the cache.
    I'll try that.

    Thanks!



  • David
    replied
    I was hoping to control what text is displayed.
    You can for meta descriptions and the page title. But because the context depends on the search word the user enters, there isn't any sensible way to control what is displayed for the near-infinite number of search words a user might type in.

    What does the Param tag values option control?
    It controls whether the value of the HTML Param tag is indexed.
    e.g.
    <param name="Proprietary.Data" value="Serial#12344451">

    I added the noindex tag you suggested but the results page is still displaying the index page
    You'll need to re-index your site and upload a new set of index files.
    Also turn off caching on the "Spider options" configuration window.



  • Hawk
    replied
    Thanks for your reply. What you say about the skip rule makes sense. I guess I'll just have to deal with it like it is. Not that big of a problem.

    As to what can be done on the Results Layout configuration window, other than the check boxes, the only other option I see is Context size. As far as I can tell, that only controls how much text is displayed, not which text is displayed before or after the search term. I was hoping to control what text is displayed.
    Oh well. The script still works pretty good.

    I did notice one item on the Indexing options page that I can't find any information about either on your site or in the pdf. What does the Param tag values option control?

    Edit 1: I added the noindex tag you suggested but the results page is still displaying the index page.
    No big deal. I guess I can live with it.

    Thanks for your help.



  • David
    replied
    You seem to be trying to control what is displayed in the context section of the search results page.

    The context section displays the text surrounding the search word on the page in question, so the text displayed as context will vary depending on the search word. You can control the amount of text displayed as context (i.e. how many words before and after the search word). This can be done from the "Results layout" configuration window.
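To illustrate why the context changes with the search word, here is a minimal Python sketch of a "words before and after" window. It is not Zoom's actual implementation; the function name `context_snippet`, the `context_size` parameter, and the sample text (loosely based on the page quoted in this thread) are assumptions made for illustration.

```python
def context_snippet(text, search_word, context_size=3):
    """Return up to context_size words on each side of the first match."""
    words = text.split()
    target = search_word.lower()
    for i, word in enumerate(words):
        # Compare case-insensitively, ignoring surrounding punctuation.
        if word.strip(".,;:!?()\"'").lower() == target:
            start = max(0, i - context_size)
            end = min(len(words), i + context_size + 1)
            return " ".join(words[start:end])
    return ""  # search word not found on this page


# Hypothetical page text, flattened the way an indexer might see it.
page_text = (
    "EMDX GP38-2 Re-Paints Updated: April 6, 2011 - by Mike Calvin "
    "Download 12.3 MB RAR Format"
)

print(context_snippet(page_text, "Mike", 2))
print(context_snippet(page_text, "Download", 2))
```

The same page yields a different snippet for each search word, which is why only the window size can be configured, not the text itself.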

    As for the index.html file, the skip rules seem to be working fine as far as I can see. You aren't indexing this URL,
    http://railworksamerica.com/FileLibrary/index.html
    But there is a 2nd URL to the same page,
    http://railworksamerica.com/FileLibrary/
    and this 2nd one is harder to skip with the skip list, as the URL is common to all others on the site.
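The problem with the second URL can be sketched in a few lines of Python. This assumes a skip entry matches whenever it appears anywhere in the URL; that matching rule and the helper name `is_skipped` are assumptions for illustration, not a description of the indexer's internals.

```python
def is_skipped(url, skip_entries):
    """Assumed rule: a URL is skipped if any skip entry occurs inside it."""
    return any(entry in url for entry in skip_entries)


urls = [
    "http://railworksamerica.com/FileLibrary/index.html",
    "http://railworksamerica.com/FileLibrary/",
    "http://railworksamerica.com/FileLibrary/LastMonth_Additions.html",
]

# An entry naming index.html only catches the first form of the URL.
narrow = ["http://railworksamerica.com/FileLibrary/index.html"]
# An entry naming the directory catches every page underneath it.
broad = ["http://railworksamerica.com/FileLibrary/"]

print([is_skipped(u, narrow) for u in urls])
print([is_skipped(u, broad) for u in urls])
```

Under that assumption there is no entry that skips the bare directory URL without also skipping the rest of the directory.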

    I would suggest adding this tag to the page,
    <meta name="robots" content="noindex">
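Before re-indexing, it may help to confirm that the page really carries the tag. The following is a small sketch using Python's standard `html.parser` module; the class name `RobotsMetaFinder` and the function `has_noindex` are made up for this example.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Records whether a <meta name="robots"> tag contains noindex."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # Attribute values can be None, so fall back to an empty string.
        name = (attributes.get("name") or "").lower()
        content = (attributes.get("content") or "").lower()
        if name == "robots" and "noindex" in content:
            self.noindex = True


def has_noindex(html):
    finder = RobotsMetaFinder()
    finder.feed(html)
    return finder.noindex


tagged = '<html><head><meta name="robots" content="noindex"></head><body></body></html>'
untagged = "<html><head><title>File Library</title></head><body></body></html>"

print(has_noindex(tagged))
print(has_noindex(untagged))
```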



  • Hawk
    started a topic HTML Tag Inclusion or Exclusion

    HTML Tag Inclusion or Exclusion

    I have just started using the free version of this great script and so far I've been reasonably successful at getting it to do and look the way I want it to.

    You can see it in action at this site: RailWorks America
    http://railworksamerica.com/FileLibrary/

    I do have a couple of questions though.

    Is there a way to exclude all <p> tags without having to go through every page inserting the ZOOMSTOP & ZOOMRESTART comments?
    Or alternatively, is there a way to have it index only <h4> and <h6> tags?

    What I'm trying to achieve is for the search results page to only display the following two lines (I realize I'll have to change the <p> to <h6>):
    <h4>EMDX GP38-2 Re-Paints</h4>
    <p>Updated: April 6, 2011 - by Mike (<em>MadMike1024</em>) Calvin
    As it is, it's displaying part of these two lines (which I don't want):
    <p class="cent XVI">Page 1 &nbsp;|&nbsp; <a href="RollingStock_Page2.html">Page 2</a> &nbsp;|&nbsp; <a href="RollingStock_Page3.html">Page 3</a> &nbsp;|&nbsp; <a href="RollingStock_Page4.html">Page 4</a> &nbsp;|&nbsp; <a href="RollingStock_Page5.html">Page 5</a> &nbsp;|&nbsp; <a href="RollingStock_Page6.html">Page 6</a> &nbsp;|&nbsp; <a href="RollingStock_Page7.html">Page 7</a>
    <p><a href="link removed">Download</a> &nbsp;-&nbsp; 12.3 MB &nbsp;-&nbsp; <span class="format">RAR Format</span> &nbsp;-&nbsp; <span class="lt-blu ital"><script type="text/javascript" src="link removed"></script> Downloads</span> &nbsp;-&nbsp; <a href="../zips/rol-stk/m-c/EMDX-GP38-2_Readme.txt">Read Me</a>
    *****************
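One way to avoid editing every page by hand would be a small script that wraps each paragraph in the stop and restart comments. The Python sketch below assumes the comments take the form `<!--ZOOMSTOP-->` and `<!--ZOOMRESTART-->`, and that every `<p>` has a closing `</p>` (the excerpts above do not show closing tags, so the pages may need tidying first). The function name `wrap_paragraphs` is hypothetical.

```python
import re

# Assumed comment syntax; check the Zoom documentation before relying on it.
ZOOM_STOP = "<!--ZOOMSTOP-->"
ZOOM_RESTART = "<!--ZOOMRESTART-->"

# Matches a whole <p ...>...</p> element; \b keeps <param> and <pre> out.
P_ELEMENT = re.compile(r"<p\b[^>]*>.*?</p>", re.IGNORECASE | re.DOTALL)


def wrap_paragraphs(html):
    """Surround every closed <p> element with the stop/restart comments."""
    return P_ELEMENT.sub(
        lambda match: f"{ZOOM_STOP}{match.group(0)}{ZOOM_RESTART}", html
    )


sample = (
    "<h4>EMDX GP38-2 Re-Paints</h4>\n"
    '<p class="cent XVI">Page 1</p>\n'
    "<h6>Updated: April 6, 2011</h6>"
)

print(wrap_paragraphs(sample))
```

Running it twice on the same file would wrap the paragraphs twice, so it is best applied to a fresh copy of each page.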

    Also, I'm having a bit of difficulty getting the indexer to exclude the index.html page found in the FileLibrary directory.
    In the Skip Options section I've tried the following:

    http://railworksamerica.com/FileLibrary/index.html
    \FileLibrary\*\index.html
    index.html
    *\index.html
    \index.html

    None of which seem to have been successful.

    If you want to test this, try doing a search for GreatNortherner.

    I was successful in getting the indexer to ignore the following page:

    http://railworksamerica.com/FileLibrary/LastMonth_Additions.html

    by adding that whole link to the Skip Options.

    Eventually I plan on buying the Professional Edition (when I get the extra money) because I would like to use some of the thumbnail options (if I can figure out whether they will work the way I want). I also have another, much larger site that I would like to use this on.
    Last edited by Hawk; Apr-08-2011, 03:41 PM.