
Indexing only the title for some PDF files?


  • Indexing only the title for some PDF files?

    [JavaScript]

    I've added PDF files with the plug-in and have added the PDF icon on the results page. Very cool!

    However, we have several dozen PDF files that all contain many examples that each show multiple uses of identical parameters. This, of course, weights all of these files ahead of other, more informative files when the user enters one of these parameters.

    It appears that using .desc files might be the solution, but what meta tag would I use to index only the title but none of the content? All my searching in the User Guide goes around the edges of this situation without a clear direction for me.

    Is this even possible, or will I have to simply put them on the Skip List and fuhgeddaboutit?


    Thx,
    Leon
    Last edited by MergeThis; Nov-05-2007, 02:32 PM. Reason: Refinement of environment

  • #2
    Updating:

    I added a .desc file with these entries:

    <title>CRDSWISSBVV2 Template</title>
    <meta name=”description” content=”Switzerland BVV2 Rule Library Report”>
    <meta name="ZOOMPAGEBOOST" content="-2">

    Zoom is enabling the negative page boost, but is ignoring the title and description information. I even added the .desc file type to the Scan Extensions (although the Zoom help doesn't state that as a requirement), but that didn't change anything.

    This seems to be the proper solution to my initial request, but something's not clicking here. Sigh...


    Leon
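With several dozen PDFs, the three entries above could be generated rather than typed or pasted by hand. What follows is a minimal sketch in Python and is not part of Zoom: `build_desc` and `write_desc` are hypothetical helper names, the `<pdf name>.desc` naming and the HTML-style escaping of values are assumptions to check against the Zoom User Guide, and the only tags used are the ones already shown in this post.

```python
import html
from pathlib import Path


def build_desc(title, description, page_boost=None):
    """Return the text of a .desc file, quoting attributes with straight ASCII quotes."""
    lines = [
        # ASSUMPTION: values are escaped as they would be in an HTML page.
        f"<title>{html.escape(title)}</title>",
        f'<meta name="description" content="{html.escape(description, quote=True)}">',
    ]
    if page_boost is not None:
        lines.append(f'<meta name="ZOOMPAGEBOOST" content="{int(page_boost)}">')
    return "\n".join(lines) + "\n"


def write_desc(pdf_path, title, description, page_boost=None):
    """Write the .desc text next to the given PDF and return the path written."""
    # ASSUMPTION: the description file is named "<pdf name>.desc" and sits
    # beside the PDF; verify the naming rule for your Zoom version.
    desc_path = Path(str(pdf_path) + ".desc")
    desc_path.write_text(build_desc(title, description, page_boost), encoding="utf-8")
    return desc_path


if __name__ == "__main__":
    print(build_desc("CRDSWISSBVV2 Template",
                     "Switzerland BVV2 Rule Library Report",
                     page_boost=-2))
```

Generating the text in code means the attribute quotes always come from the script itself, not from whatever a word processor or help page put on the clipboard.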



    • #3
      Update #2:

      I unchecked the "Retrieve internal meta information" option for .pdf files, but to no avail.

      Darn, thought for sure that would do it!


      Leon



      • #4
        PDF files can often "swamp" results because they are typically large (one document may contain anywhere from 10 to hundreds of pages). As a result, they may always appear near the top when searching for common words (or words that appear in these PDF files a lot).

        To prevent this from happening, you can adjust the "Content density" weighting (found on the "Weightings" tab of the Configuration window), to give preference to small and medium sized files over large files (typically PDFs). Setting this to "Strong adjustment" should make a significant difference to your results, and you may find this more reasonable for your use.
        --Ray
        Wrensoft Web Software
        Sydney, Australia
        Zoom Search Engine



        • #5
          Ray, thanks for the reminder on the Content density option.

          However, in Update #1, I had indicated that "Zoom is enabling the negative page boost, but is ignoring the title and description information" (in the .desc file). Any ideas on why this is happening (or not, actually)?

          Thx again



          • #6
            Make sure you have "Use description (.desc) files" enabled for that PDF file extension.

            Remove the ".desc" file extension from the Scan Extensions list. This isn't necessary and may only confuse the results.

            If you continue to have problems, can you provide us the URLs to the PDF file and its DESC files so we can take a look? (or e-mail the files to us).
            --Ray
            Wrensoft Web Software
            Sydney, Australia
            Zoom Search Engine



            • #7
              I'll zip them up and send them, sometime today, thanks.

              Oh, and by the way, I admire your restraint in dealing with the other, more contentious thread on the highlighting script! Even though you're probably losing precious tooth enamel from clenching your jaw so much as you type your response.



              • #8
                Duhhhh....

                How do I attach a zip file, either in a reply or in email (all emails are supposedly blocked, and the Posting Rules box says I can't post attachments).



                • #9
                  Originally posted by MergeThis
                  Oh, and by the way, I admire your restraint in dealing with the other, more contentious thread on the highlighting script! Even though you're probably losing precious tooth enamel from clenching your jaw so much as you type your response.
                  Thanks, although I do believe I have lost a molar in the process.

                  Originally posted by MergeThis
                  How do I attach a zip file, either in a reply or in email (all emails are supposedly blocked, and the Posting Rules box says I can't post attachments).
                  You can find our e-mail address on the "Contact Us" page. Send it to us directly and just reference this forum thread. We'll get back to you as soon as we get a look at it.
                  --Ray
                  Wrensoft Web Software
                  Sydney, Australia
                  Zoom Search Engine



                  • #10
                    Grrrr....

                    I still don't see how to attach my zip file.



                    • #11
                      We block all attachments to forum posts. But you can link to a Zip file from a forum post, or as Ray suggested just use normal E-mail. We don't block Zips in normal E-mail.

                      Oh, and by the way, I admire your restraint
                      Yes. He might have had some reasonable technical arguments, but he resorted to name calling instead. Which is pretty unprofessional.



                      • #12
                        Just an update to this problem:

                        Originally posted by MergeThis
                        I added a .desc file with these entries:

                        <title>CRDSWISSBVV2 Template</title>
                        <meta name=”description” content=”Switzerland BVV2 Rule Library Report”>
                        <meta name="ZOOMPAGEBOOST" content="-2">

                        Zoom is enabling the negative page boost, but is ignoring the title and description information.
                        Leon has sent us the files and we have taken a closer look. It turns out the problem is in the slanted/curly quotes used in the meta description tag (which is actually evident now in the above code he pasted, now that we know what we're looking for - I've bolded them in red for emphasis).

                        This is not a valid HTML meta tag, which is why Zoom did not recognize it. It did, however, successfully retrieve the valid title and meta ZOOMPAGEBOOST tags.

                        These slanted quotes appeared in our Help file, which is why Leon inherited them when he copied and pasted from the Help page. We have since updated our Help pages, so they won't be a problem if people copy and paste the code in the future.

                        Changing the slanted quotes to straight quotes will fix this problem. That is, it should be as follows:

                        <title>CRDSWISSBVV2 Template</title>
                        <meta name="description" content="Switzerland BVV2 Rule Library Report">
                        <meta name="ZOOMPAGEBOOST" content="-2">
                        --Ray
                        Wrensoft Web Software
                        Sydney, Australia
                        Zoom Search Engine
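For anyone who wants to check a batch of .desc files for the same problem, here is a minimal sketch in Python. It is not part of Zoom; `find_curly_quotes` and `straighten_quotes` are hypothetical helper names, and the sketch only covers the four common typographic quote characters.

```python
# Map typographic ("curly") quote characters to their straight ASCII equivalents.
CURLY_TO_STRAIGHT = {
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
}


def find_curly_quotes(text):
    """Return (line_number, column, character) for every curly quote in text."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in CURLY_TO_STRAIGHT:
                hits.append((line_no, col, ch))
    return hits


def straighten_quotes(text):
    """Return text with every curly quote replaced by its straight equivalent."""
    return text.translate(str.maketrans(CURLY_TO_STRAIGHT))


if __name__ == "__main__":
    # The .desc content from this thread, with the original slanted quotes.
    sample = (
        "<title>CRDSWISSBVV2 Template</title>\n"
        "<meta name=\u201ddescription\u201d "
        "content=\u201dSwitzerland BVV2 Rule Library Report\u201d>\n"
        '<meta name="ZOOMPAGEBOOST" content="-2">\n'
    )
    for line_no, col, ch in find_curly_quotes(sample):
        print(f"line {line_no}, column {col}: U+{ord(ch):04X}")
    print(straighten_quotes(sample))
```

Reporting the positions before rewriting anything makes it easy to see which tags were affected. Note that the replacement is applied to the whole text, so a curly apostrophe inside a title or description would be straightened as well.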



                        • #13
                          Another instance of the Wrensoft team going out of its way to help its customers!


                          Thx again,
                          Leon

