How can a page be scanned but not indexed?

  • How can a page be scanned but not indexed?

    index_main.html contains a lot of URLs that link to other informational web pages. The latter are the pages I would like to be scanned and indexed, while the former I don't want to be scanned but not to be indexed.

    I tried putting index_main.html in the "Page and folder skip list"; however, it seems that the informational pages linked from index_main.html were then not scanned or indexed either. Could anyone tell me how to achieve the goal described in the first paragraph?

    I'm using Zoom Search Engine Version 5.1 (Build 1017), Free Edition, for evaluation.

    Regards.
    Ivan

  • #2
    Sorry, the line in the first paragraph should read: "The latter are the pages I would like to be scanned and indexed, while the former I want to be scanned but not indexed."


    • #3
      Specify that page as an Additional Start Point. Click on the "More" button to do so. Here, you can change the Spider Option for that start point (click "Edit") to "Follow links only".

      An alternative is to use a robots meta tag set to "noindex" (make sure robots.txt support is enabled in Zoom).

      Another alternative is to wrap the content of the page that you want excluded from indexing in the ZOOMSTOP and ZOOMRESTART tags; the spider will still follow the links inside.

      Look these features up in the Users Guide for more information.
      --Ray
      Wrensoft Web Software
      Sydney, Australia
      Zoom Search Engine
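
For anyone who wants to apply the ZOOMSTOP/ZOOMRESTART option to an existing page without editing it by hand, here is a minimal Python sketch. It assumes the tags are written as HTML comments (`<!--ZOOMSTOP-->` and `<!--ZOOMRESTART-->`); please confirm the exact syntax in the Users Guide for your version. The helper name `wrap_body` is made up for this example.

```python
import re

# Assumed tag syntax (HTML comments); check the Users Guide for your version.
ZOOMSTOP = "<!--ZOOMSTOP-->"
ZOOMRESTART = "<!--ZOOMRESTART-->"


def wrap_body(html):
    """Wrap everything between <body> and </body> in the skip tags.

    The page is returned unchanged if it has no <body>...</body> pair
    or if it already contains a ZOOMSTOP tag.
    """
    match = re.search(
        r"<body\b[^>]*>(.*)</body\s*>", html, flags=re.IGNORECASE | re.DOTALL
    )
    if match is None or ZOOMSTOP in html:
        return html
    start, end = match.span(1)  # positions of the body content only
    return (
        html[:start] + "\n" + ZOOMSTOP
        + html[start:end]
        + ZOOMRESTART + "\n" + html[end:]
    )
```

The links stay in the page source, so the spider can still find them; only the text between the two tags is meant to be left out of the index.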


      • #4
        Hi Raymond,

        Thanks for the reply. In fact, there are a lot of pages that I would like to be scanned but not indexed. They all share a common property: their filenames contain "index", for example "index_catergory.html", "help_index.html" or "abc_index_xyz.html".

        Is there a way to specify a filename pattern such as "index_*.html", "*_index.html" or "*_index_*.html", so that matching files are only scanned and not indexed?

        Regards.
        Ivan


        • #5
          You can also add this tag to each of the pages in question.
          <meta name="robots" content="noindex">
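
If there are many such pages, adding the tag by hand gets tedious. The following is a rough Python sketch, not a Zoom feature, that adds the tag to every file whose name matches the three patterns from the question. The function names (`is_index_page`, `add_noindex`, `tag_site`) are invented for this example, and it assumes the files are UTF-8 encoded and contain a `<head>` tag. Try it on a copy of the site first.

```python
import fnmatch
import re
from pathlib import Path

# Filename patterns taken from the question above.
PATTERNS = ["index_*.html", "*_index.html", "*_index_*.html"]
NOINDEX_TAG = '<meta name="robots" content="noindex">'


def is_index_page(filename):
    """Return True if the filename matches one of the index-page patterns."""
    return any(fnmatch.fnmatchcase(filename, p) for p in PATTERNS)


def add_noindex(html):
    """Insert the noindex tag right after the opening <head> tag.

    The text is returned unchanged if the exact tag is already present
    or if no <head> tag is found.
    """
    if NOINDEX_TAG in html:
        return html
    return re.sub(
        r"(<head\b[^>]*>)",
        lambda m: m.group(1) + "\n" + NOINDEX_TAG,
        html,
        count=1,
        flags=re.IGNORECASE,
    )


def tag_site(root):
    """Tag every matching .html file under root; return the changed paths."""
    changed = []
    for path in sorted(Path(root).rglob("*.html")):
        if not is_index_page(path.name):
            continue
        original = path.read_text(encoding="utf-8")  # assumed encoding
        updated = add_noindex(original)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
            changed.append(str(path))
    return changed
```

Calling `tag_site("path/to/site")` (a placeholder path) would rewrite the matching files in place and return the list of files it changed.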


          • #6
            Are PDF files scanned and indexed? Thanks.


            • #7
              "noindex" means to not index any content on the page, but continue to follow any links found on the page. "nofollow" specifies not to follow or look for any links on the page.

              So if your PDF files are linked from this HTML page containing a "noindex" tag, they will be found and indexed.
              --Ray
              Wrensoft Web Software
              Sydney, Australia
              Zoom Search Engine
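
To make the "noindex" versus "nofollow" distinction concrete, here is a small Python sketch that reads a page's robots meta tag and reports what a spider following the rules described above would do. It is only an illustration of those two directives, not Zoom's actual implementation; the names `RobotsMetaParser` and `robots_policy` are made up for this example.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").strip().lower() == "robots":
            for item in attr.get("content", "").split(","):
                if item.strip():
                    self.directives.add(item.strip().lower())


def robots_policy(html):
    """Return whether the page content is indexed and its links are followed."""
    parser = RobotsMetaParser()
    parser.feed(html)
    found = parser.directives
    return {"index": "noindex" not in found, "follow": "nofollow" not in found}
```

For a page carrying only "noindex", `robots_policy` reports that the page is not indexed but its links are still followed, which is why PDF files linked from it are still found.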


              • #8
                Can PDF content be scanned?

                So, can the PDF content (not just the title or filename) be scanned? For example, if the word "Help" appears inside "customerInfo.pdf", will this file show up as a search result when the search keyword is "Help"? Thanks.


                • #9
                  Please check the FAQ:
                  Q. Does Zoom (with plugins) index all the words inside the PDF and DOC documents?

                  You need one of the registered editions (Standard, Pro or Enterprise) to index PDF files.
                  --Ray
                  Wrensoft Web Software
                  Sydney, Australia
                  Zoom Search Engine
