Index a newsletter (PDF) but not directly??

    I have a newsletter in PDF format on my site which I'd like to include in the site index. BUT I want to a) de-emphasize it in the results and b) force users to access it via a "download gateway" page rather than directly. How can I do this?

    It seems that for a) I need to apply a (negative) "boost" to one specific page as a whole (i.e. the PDF newsletter). How can I do this?

    And for b), it seems that after the index is built I need to change the URL of one specific page (the PDF newsletter) to a different one (the "download gateway"). How can I do this?

    tia!

  • #2
    a) To apply a boost value to that specific PDF document, you will need to use a .desc file containing the ZOOMPAGEBOOST meta tag, e.g.:

    <meta name="ZOOMPAGEBOOST" content="-5">

    See chapter 2.10.4 in the Users Guide for information on .desc files:
    http://www.wrensoft.com/zoom/usersguide.html

    b) If you only need to do this for one or two pages, you can open the "zoom_pages.zdat" file in a text editor, locate the URL of the relevant file, and change it accordingly. However, be careful when modifying index files, as unexpected changes could corrupt them. This method also has the disadvantage that you must repeat the change every time you re-index.
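    Because the manual edit has to be redone after every re-index, it may be worth scripting. The following is a minimal Python sketch, not a Wrensoft tool, and it rests on an assumption Ray does not confirm: that the URL is stored in zoom_pages.zdat as plain bytes that can be swapped in place (as the text-editor approach implies). If the format depends on string lengths or offsets, a replacement of a different length could corrupt the index, so the sketch saves a .bak copy before writing. The function name is hypothetical.

```python
import shutil
from pathlib import Path


def replace_url_in_index(index_path, old_url, new_url):
    """Replace every exact occurrence of old_url with new_url in an index file.

    Assumption (not confirmed by Wrensoft): the URL is stored as plain bytes
    that can be swapped in place without breaking the rest of the file.
    A copy of the original is saved as <name>.bak before anything is written.
    Returns the number of occurrences replaced (0 means the file was untouched).
    """
    path = Path(index_path)
    data = path.read_bytes()
    old = old_url.encode("utf-8")
    new = new_url.encode("utf-8")
    count = data.count(old)
    if count == 0:
        return 0  # URL not found: leave the index exactly as it was
    # Back up first so a bad edit can be rolled back by restoring the .bak file.
    shutil.copy2(path, path.with_name(path.name + ".bak"))
    path.write_bytes(data.replace(old, new))
    return count
```

    You would call it from the index folder after each re-index, for example replace_url_in_index("zoom_pages.zdat", "http://www.mysite.com/newsletter.pdf", "http://www.mysite.com/newsletter-download.html") with your own (here hypothetical) URLs, and then run a test search to confirm the index still loads.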

    A better solution to both of the above problems will be available in Version 4.1 of Zoom.

    For the first problem, Zoom 4.1 will provide a "page density" weighting factor, which will automatically scale the weighting of words on a page according to the size of the document. This helps prevent big PDF documents (which may contain hundreds of pages) from "swamping" the results simply because of the large number of words they contain, and it does not require you to create a .desc file for each document that needs de-emphasizing.
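    The post does not give the formula Zoom 4.1 uses, so the following is only a hypothetical illustration of the general idea (length normalization) in Python. The square-root damping and the 1,000-word reference length are arbitrary choices made for this sketch, not Zoom's.

```python
import math


def density_weighted_score(term_hits, total_words, reference_words=1000):
    """Damp a raw word-hit count for long documents.

    Hypothetical illustration of length normalization, NOT Zoom's formula:
    documents up to reference_words long keep their raw count; longer ones
    are divided by the square root of how many times longer they are.
    """
    if total_words <= reference_words:
        return float(term_hits)
    return term_hits / math.sqrt(total_words / reference_words)
```

    With these settings a 500-word page with 5 hits scores 5.0, and a 100,000-word PDF with 50 hits also scores 5.0 (50 / sqrt(100)), whereas by raw count the PDF would rank ten times higher.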

    For the second problem, Zoom 4.1 will be capable of indexing PDF files served from a download gateway page. This means that a page like http://www.mysite.com/dl.php?fid=123 (which returns a PDF document) will be indexed correctly, and the search result URL will point to the download gateway page.
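    Ray's example gateway is a PHP script (dl.php). What matters is that the gateway URL itself returns the PDF bytes, so whatever fetches that URL receives the document even though the URL does not end in .pdf. Below is a minimal Python WSGI sketch of such a gateway, assuming a hypothetical FILES table that maps file IDs to PDFs on disk (the file name is a placeholder). A real members-only gateway would also need an access check, which is omitted here.

```python
from pathlib import Path
from urllib.parse import parse_qs

# Hypothetical mapping from file IDs (the "fid" parameter) to PDFs on disk.
FILES = {"123": Path("newsletters/newsletter.pdf")}


def download_gateway(environ, start_response):
    """Minimal WSGI download gateway: ?fid=123 returns the PDF itself."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    fid = query.get("fid", [""])[0]
    path = FILES.get(fid)
    if path is None or not path.is_file():
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"No such file\n"]
    body = path.read_bytes()
    start_response("200 OK", [
        # Content-Type is the standard HTTP way to say "this response is a PDF"
        # when the URL itself does not end in .pdf.
        ("Content-Type", "application/pdf"),
        ("Content-Length", str(len(body))),
        ("Content-Disposition", f'inline; filename="{path.name}"'),
    ])
    return [body]
```

    To try it locally, pass download_gateway to wsgiref.simple_server.make_server("", 8000, download_gateway), call serve_forever() on the result, and request /?fid=123.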
    --Ray
    Wrensoft Web Software
    Sydney, Australia
    Zoom Search Engine

    • #3
      snipped.
      "For the second problem, Zoom 4.1 will be capable of indexing PDF files from a download gateway page. This means that a page like http://www.mysite.com/dl.php?fid=123 (which will return a PDF document), will be indexed correctly, and the search result URL will thus point to the download gateway page"
      snipped.

      Can you tell me when this release might be available? I need this feature urgently. I have just upgraded from the free version to the standard version so that I can index a number of PDF newsletters on this website (www.grasslandnsw.com.au), but access to the PDF files needs to be restricted to members only. I want to be able to index the PDF files directly so that a) I don't have to type a table of contents into an HTML page that can be indexed, and b) potential new members can see what is on offer but cannot access the whole document until they join.

      What are my options here: wait for the upgrade (though I need it urgently), or is there another solution?

      cheers
      Leah Lane

      • #4
        Leah, send us an e-mail (see our Contact Us page) if you need something urgent.

        You should also note that, because you want to index documents that have restricted user access, whether this works will depend on how your authorisation process is implemented. HTTP authentication is supported internally by Zoom, but cookie-based authentication requires you to log in beforehand with Internet Explorer (IE). See http://www.wrensoft.com/zoom/support...lems.html#auth for more information.
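        The practical difference is that with HTTP authentication the credentials travel with every request, so a crawler can fetch protected pages on its own, while a cookie-based login only issues a session cookie after someone submits a login form. The following generic Python sketch shows what HTTP Basic authentication looks like from a crawler's side. It is not how Zoom is implemented internally, and the URL and credentials are placeholders.

```python
import base64
import urllib.request


def basic_auth_header(username, password):
    """Build the value of an HTTP Basic "Authorization" header (RFC 7617)."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def make_authenticated_request(url, username, password):
    """Return a urllib Request that carries Basic credentials with it.

    No interactive login step is needed: the server checks the header
    on each request, which is why a crawler can handle this case itself.
    """
    req = urllib.request.Request(url)
    req.add_header("Authorization", basic_auth_header(username, password))
    return req
```

        Passing the returned request to urllib.request.urlopen() would fetch the protected page; a cookie-based site would instead reject it until a valid session cookie is supplied.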
        --Ray
