Metadata in file description .desc files


  • Metadata in file description .desc files

    I think that using custom .desc files is a very good idea for indexing files, but I believe an important option is still missing. At least, I could not find any mention of it in the documentation. In particular, I am talking about a <url> or similar metadata field that would allow any file to be opened from the search page. This can be done now using "Base URL" if the indexed file is in a folder inside the website document root, but that is not always the case. Quite often files are stored outside the document root for security reasons and can be viewed or downloaded only through a script.

    Of course, I am talking about "offline" file indexing, which is much faster than spidering a website.

    Is it possible to use a custom URL field now, or can we expect this feature in a future version?

    Thanks,
    Greg

  • #2
    It is hard to give specific advice as we don't know the full details of your situation.

    If your files can only be accessed via a script, then you should be indexing the site in spider mode. Server-side scripts that generate dynamic web pages or control access to documents are never executed in offline mode.

    Sticking with offline mode to index documents in this case is more complex. Yes, you can manipulate the base URL to transform file paths into URLs that call a script that accesses the files. Maybe you can define multiple offline start directories, each with its own unique base URL.
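
To make the base URL idea concrete, here is a minimal sketch of the path-to-URL transformation described above. It is not Zoom's code or configuration format. The directory names, the `download.php` script, its `file` parameter, and the `file_path_to_url` helper are all hypothetical, chosen only to show how each start directory could carry its own base URL.

```python
from pathlib import PurePosixPath
from urllib.parse import quote

# Hypothetical mapping of offline start directories to base URLs.
# The first entry points at an assumed access-control script rather
# than at a folder inside the document root.
START_POINTS = {
    "/srv/private/docs": "https://example.com/download.php?file=",
    "/var/www/html/public": "https://example.com/public/",
}


def file_path_to_url(file_path: str) -> str:
    """Replace the matching start directory prefix with its base URL."""
    path = PurePosixPath(file_path)
    # Check longer start directories first so nested ones take priority.
    for start_dir in sorted(START_POINTS, key=len, reverse=True):
        try:
            relative = path.relative_to(start_dir)
        except ValueError:
            continue  # the file is not under this start directory
        # Percent-encode the relative path but keep "/" separators.
        return START_POINTS[start_dir] + quote(relative.as_posix())
    raise ValueError(f"{file_path} is not under any start directory")


print(file_path_to_url("/srv/private/docs/reports/q1 2024.pdf"))
print(file_path_to_url("/var/www/html/public/manual.pdf"))
```

In this sketch the private file resolves to a script URL while the public file resolves to a plain path under the site. Whether the script then enforces a login is up to the site and is outside what the indexer sees.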

    ----
    David


    • #3
      Thanks for the reply, David. Your idea of using multiple base URLs helps in many cases, but not in all. One example is a "private" file that must be accessible to logged-in users only. You cannot place such files inside your document root. I believe this is quite a common situation. At least, most of our clients have such files.

      You are right that one can index attachments in spider mode, but that is not always enough either. For example, it falls short when (some) files are managed separately from your web site. In any case, it is much less efficient.

      Please consider supporting additional meta tags in .desc files as an idea for a future extension of Zoom. I would be happy if it appeared in the next version. Other tags may also be needed for displaying search results, for example <author>. I believe that even adding an arbitrary new tag (probably with its content indexed by default) to this file should be relatively straightforward to do. The only requirement is that the content of the tag should be accessible to the search results page template through your API.


      • #4
        Originally posted by gregr
        Thanks for the reply, David. Your idea of using multiple base URLs helps in many cases, but not in all. One example is a "private" file that must be accessible to logged-in users only. You cannot place such files inside your document root. I believe this is quite a common situation. At least, most of our clients have such files.
        It is difficult to know exactly what you mean without knowing how you are handling authentication, what you mean by logged-in users, etc. But you can have multiple start folders (by clicking on the "More" button), which allows you to index from different document roots. This means you don't need to place all your files within a common root folder.

        You are right that one can index attachments in spider mode, but that is not always enough either. For example, it falls short when (some) files are managed separately from your web site. In any case, it is much less efficient.
        You can specify multiple start points in Spider Mode as well (click on the "More" button). This allows you to index files which are on separate web sites. You can also allow a single start point to follow links across multiple domains by specifying multiple base URLs in Spider Mode.

        But yes, this would not be as fast as Offline Mode. However, as noted before, if you have any dynamic content (PHP scripts etc) then Spider Mode is recommended.

        Other tags may also be needed for displaying search results, for example <author>.
        We already index the meta author tag (you will need to turn on "Author" in the "Indexing Options" tab of the Configuration window).
        --Ray
        Wrensoft Web Software
        Sydney, Australia
        Zoom Search Engine
