filename indexing question


  • filename indexing question

    Hi,

    Some websites have URLs that don't show filenames. e.g. the file at:

    http://www.website.com/GetArticle.aspx?doi=10.1155/AFS.1999.17

    has a filename of:

    S0793029199000038.pdf

    As I see that indexing filenames is an option in Zoom, what would Zoom index as the filename in this case? S0793029199000038.pdf or AFS.1999.17?

    If it is the former, then (with the "." join character disabled), would a search for S0793029199000038 retrieve the page?

    And (with the "." join character enabled), would a search for S0793029199000038.pdf retrieve the page?

    Thanks,

    Will
    Last edited by will; Aug-31-2007, 03:14 PM.
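
A small Python sketch may make the join-character question concrete. It is not Zoom's implementation: `tokenize` and its `join_chars` parameter are hypothetical names, and the only assumption is that a join character keeps the text on either side of it together as one indexed word.

```python
import re


def tokenize(text: str, join_chars: str = "") -> list:
    """Split text into index words (hypothetical model, not Zoom's code).

    Letters and digits always belong to a word. Any character listed in
    join_chars is also treated as part of a word instead of a separator.
    """
    pattern = "[A-Za-z0-9" + re.escape(join_chars) + "]+"
    return re.findall(pattern, text)


filename = "S0793029199000038.pdf"

# "." disabled as a join character: the dot separates two words.
print(tokenize(filename))
# "." enabled as a join character: the whole filename stays one word.
print(tokenize(filename, join_chars="."))
```

Under this model, "S0793029199000038" exists as a word of its own only when the dot is not a join character, and "S0793029199000038.pdf" exists as a single word only when it is.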

  • #2
    I believe you have to use the Zoom plugins for PDF, DOC, etc.

    They're a free download from the web site.


    Good luck,
    Leon



    • #3
      The issue:

      Need to know what Zoom indexes when it processes the file from a URL which does not contain the filename (please refer to my first example earlier in the thread).

      But thanks for replying.



      • #4
        I believe we always use the 'file name' from the URL as the file name, even if that file name happens to be the name of a script (GetArticle.aspx in your example). But I would need to test it to be sure, as we also look at the internal file name and MIME type in the HTTP header to help work out what type of file we are dealing with.



        • #5
          Thanks. Makes sense since .desc files work for URLs of these types.

          Not sure how Zoom would react if asked to index the filename here though:

          http://website.com/index.php?show=this_download&id_this_name=6807

          "6807" perhaps?



          • #6
            The file name is index.php



            • #7
              Zoom will actually look at the Content-Disposition header for these situations.

              So in your original scenario, if your script at:
              http://www.website.com/GetArticle.as...55/AFS.1999.17
              actually returns a Content-Disposition header specifying the filename as "S0793029199000038.pdf", Zoom will correctly pick up the "S0793029199000038.pdf" filename for indexing. And if you have filenames enabled for indexing, and dots disabled from word joining, the page should be returned in search results when searching for "S0793029199000038".

              If a content-disposition header is not sent by the script serving the PDF file, then it will behave as described above, and use the filename of the script ("GetArticle.aspx" in your original example, and "index.php" in the last example).

              FYI, a content-disposition header is typically sent like this, in PHP:
              Code:
              <?php
              header("Content-Disposition: attachment; filename=\"myfilename.pdf\"");
              ?>
              --Ray
              Wrensoft Web Software
              Sydney, Australia
              Zoom Search Engine
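
The lookup order described in this reply can be sketched in Python. This is only an illustration of the described behaviour, not Zoom's actual code: `resolve_filename` is a hypothetical helper, and the `headers` dictionary (keyed by the exact header name) stands in for whatever the server returned.

```python
from email.message import Message
from posixpath import basename
from urllib.parse import urlsplit


def resolve_filename(url: str, headers: dict) -> str:
    """Pick the filename to index for a response (hypothetical helper)."""
    disposition = headers.get("Content-Disposition")
    if disposition:
        # Let the standard library parse the header, including quoted values.
        msg = Message()
        msg["Content-Disposition"] = disposition
        name = msg.get_filename()
        if name:
            return name
    # Fallback: last segment of the URL path; the query string is ignored.
    return basename(urlsplit(url).path)


article_url = "http://www.website.com/GetArticle.aspx?doi=10.1155/AFS.1999.17"
pdf_headers = {
    "Content-Disposition": 'attachment; filename="S0793029199000038.pdf"'
}

# Header present: the filename from the header is used.
print(resolve_filename(article_url, pdf_headers))
# Header absent: the script name from the URL path is used.
print(resolve_filename(article_url, {}))
print(resolve_filename(
    "http://website.com/index.php?show=this_download&id_this_name=6807", {}))
```

Note that the fallback never looks at the query string, which is why neither "AFS.1999.17" nor "6807" would be picked up as a filename in this sketch.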



              • #8
                Thanks. Very helpful.
