Indexing properties of word documents

  • Indexing properties of word documents

    Hi,

    In a Word document, if you click File > Properties and select the 'Summary' tab, you have Title, Subject, Author, Manager, Company, Category, Keywords and Comments. Are these all indexed by Zoom, or just the title?

    It would be great if, when comments have been added to a file, a small icon appeared next to the hyperlink to indicate this, which you could hover over or click on. Or maybe typing in someone's name could find all documents where they were the author.

    Sorry if this has been asked before - I've had a look through the forum but couldn't see anything.

    Thanks very much

    Ed

  • #2
    When indexing Word documents, the following property fields from the .DOC file are indexed as meta data, in addition to the main text of the document:

    Subject
    Title
    Keywords

    Other meta data, such as 'manager' and 'hyperlink base', is ignored, as are any custom fields you include.

    Most people creating Word documents fail to fill in even this limited amount of meta data correctly, however.

    You can also use .desc files to associate extra meta data with documents if required. See the user's guide for details.


    • #3
      Fantastic news!

      Currently I have a whole load of documents in a folder tree that are indexed and I also have a database of these documents that has filepath and name, title, author and notes amongst other things.

      Presumably this means I could write something that would go through the database and write a .desc file for every single document in one hit.

      If a file has properties and also a .desc file which one 'wins' or does all the information get indexed?

      Thanks very much

      Ed


      • #4
        Here is some code which your users may find helpful if they want to automatically create .desc files from a database. The code isn't very tidy but I've tested it and it works!

        ' Create a .desc file beside every document listed in the DocsMaster table.
        Const adModeReadWrite = 3   ' ADO constant; not predefined in a plain .vbs script

        Set DB = CreateObject("ADODB.Connection")
        Set TBL = CreateObject("ADODB.Recordset")
        DB.Mode = adModeReadWrite
        DB.Open "driver={Microsoft Access Driver (*.mdb)};dbq=mydb.mdb"
        TBL.Open "SELECT DocsMaster.Title, DocsMaster.AuthorID, DocsMaster.Notes, DocsMaster.FilePath FROM DocsMaster", DB
        Set objFSO = CreateObject("Scripting.FileSystemObject")
        Q = Chr(34)   ' double-quote character
        counter = 0
        Do While Not TBL.EOF
            FilePath = TBL("FilePath").Value
            ' Skip records whose document no longer exists on disk
            If objFSO.FileExists(FilePath) Then
                Set objFile = objFSO.GetFile(FilePath)
                ' The .desc file sits next to the document and is named <document name>.desc
                DescPath = objFSO.BuildPath(objFile.ParentFolder.Path, objFile.Name & ".desc")
                ' Note: a double quote inside a field value would end the content attribute early
                strFile = "<title>" & TBL("Title").Value & "</title>" & vbCrLf
                strFile = strFile & "<meta name=" & Q & "author" & Q & " content=" & Q & TBL("AuthorID").Value & Q & ">" & vbCrLf
                strFile = strFile & "<meta name=" & Q & "description" & Q & " content=" & Q & TBL("Notes").Value & Q & ">" & vbCrLf
                ' True = overwrite an existing .desc file; the file is written as ASCII/ANSI text
                Set oFiletxt = objFSO.CreateTextFile(DescPath, True)
                oFiletxt.Write strFile
                oFiletxt.Close
                counter = counter + 1
            End If
            TBL.MoveNext
        Loop
        TBL.Close
        DB.Close
        Set TBL = Nothing
        Set DB = Nothing
        Set objFSO = Nothing
        WScript.Echo counter & " .desc files complete"


        • #5
          Hi,

          Here's an old post and I'd like to add three questions about indexing Word documents.

          1) I'm interested in the answer to Emozley's question, which seems to have been missed. It was, "If a file (Word) has properties and also a .desc file which one 'wins' or does all the information get indexed?"

          2) When creating a .desc file, the help documentation says it should be a text file. Does it matter which encoding the text file uses? For example, Notepad offers ANSI, Unicode, Unicode big endian or UTF-8. In the Languages tab, I've selected "Use Unicode (UTF-8 encoding)", which names two of the options Notepad lists. I saved my .desc text file with UTF-8 encoding, but it didn't work.

          3) For the .desc file, I'm setting a title and a meta name="description". My Word document has title and keyword properties. My results layout specifies to use title and meta description, but not context description. How does one go about matching up the .desc file and the Word properties to the options in the results layout tab?

          Thanks again,
          Michele J. Jones, PMP


          • #6
            Originally posted by mjones
            1) I'm interested in the answer to Emozley's question, which seems to have been missed. It was, "If a file (Word) has properties and also a .desc file which one 'wins' or does all the information get indexed?"
            The .desc file has priority and will override any meta properties found in the Word file.
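
As a rough illustration of that precedence rule, here is a minimal Python sketch. It is not Zoom's code: the function name `effective_metadata` and the dictionary layout are hypothetical, and it assumes the override applies field by field (the reply does not say whether a .desc file replaces the Word properties per field or wholesale).

```python
def effective_metadata(word_props, desc_props):
    """Merge two metadata dictionaries, letting .desc values win.

    Assumption (not confirmed in this thread): a field that is missing
    from the .desc file falls back to the Word property of the same name.
    """
    merged = dict(word_props)   # start from the Word document properties
    merged.update(desc_props)   # any field present in the .desc file overrides
    return merged


word = {"title": "Draft report", "keywords": "budget, 2009"}
desc = {"title": "Annual budget report", "description": "Final figures"}
print(effective_metadata(word, desc))
```

Here the title comes from the .desc data, while the keywords survive from the Word properties only because of the per-field assumption stated above.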

            Originally posted by mjones
            2) When creating a .desc file, the help documentation says it should be a text file. Does it matter which encoding the text file uses? For example, Notepad offers ANSI, Unicode, Unicode big endian or UTF-8. In the Languages tab, I've selected "Use Unicode (UTF-8 encoding)", which names two of the options Notepad lists. I saved my .desc text file with UTF-8 encoding, but it didn't work.
            The .desc file should use the same encoding that you have specified on the Languages tab of Zoom, so a file saved as UTF-8 should work. Can you elaborate on what happens when you find that it doesn't "work"? Does it not find the .desc file at all, or does it find it and index it, but the title or description specified isn't used? In which case, what title/description is used? Also, does your title or description contain any foreign characters, or are they in English?

            It may help if you send us some examples of the files in question (or if the site is online, perhaps just your ZCFG file will suffice - with a description of the problem, and what files to look out for where this problem occurs).
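
When .desc files are generated by a script rather than saved from Notepad, the encoding can be set explicitly. The following is a minimal Python sketch, not an official tool. The helper names `write_desc` and `has_utf8_bom` are hypothetical. The `<title>` and `<meta>` layout and the `<document name>.desc` naming are taken from the script posted earlier in this thread. The byte-order-mark check is included only as a diagnostic. Nothing in this thread says whether a BOM affects how Zoom reads the file.

```python
import codecs
from pathlib import Path


def write_desc(doc_path, title, description, encoding="utf-8"):
    """Write <doc_path>.desc next to the document, in the given encoding."""
    desc_path = Path(str(doc_path) + ".desc")
    text = (
        "<title>" + title + "</title>\n"
        '<meta name="description" content="' + description + '">\n'
    )
    # "utf-8" writes no byte-order mark; "utf-8-sig" would prepend one.
    desc_path.write_text(text, encoding=encoding)
    return desc_path


def has_utf8_bom(path):
    """Return True if the file starts with the UTF-8 byte-order mark."""
    return Path(path).read_bytes().startswith(codecs.BOM_UTF8)
```

For example, `write_desc("report.doc", "Résumé", "Quarterly notes")` creates `report.doc.desc` as UTF-8, and `has_utf8_bom("report.doc.desc")` reports whether an existing file begins with a BOM.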

            Originally posted by mjones
            3) For the .desc file, I'm setting a title and a meta name="description". My Word document has title and keyword properties. My results layout specifies to use title and meta description, but not context description. How does one go about matching up the .desc file and the Word properties to the options in the results layout tab?
            If you have "title" enabled in Results Layout, you will see either the Word document's title, or, if specified (and desc files are enabled for DOC files), the .desc file title.

            If you have the "meta description" option enabled in Results Layout, you will see either the internal Word "description" field, or the .desc file meta description.

            Context description is the extracted content surrounding the matched word, which can include text from the actual body of the document.
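
To make the idea of a context description concrete, here is a small Python sketch that pulls a few words from either side of the first match. It only illustrates the concept and is not Zoom's algorithm. The function name `context_snippet` and the `width` parameter are hypothetical.

```python
def context_snippet(text, term, width=5):
    """Return up to `width` words on each side of the first match of `term`."""
    words = text.split()
    # Compare case-insensitively, ignoring punctuation stuck to a word.
    normalized = [w.strip('.,;:!?"()').lower() for w in words]
    try:
        i = normalized.index(term.lower())
    except ValueError:
        return ""  # the term does not occur in the text
    start = max(0, i - width)
    return " ".join(words[start:i + width + 1])


print(context_snippet("The quick brown fox jumps over the lazy dog", "fox", width=2))
```

With `width=2` the call above prints `quick brown fox jumps over`: the matched word plus two words of body text on each side.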

            Hope that clarifies things somewhat.
            --Ray
            Wrensoft Web Software
            Sydney, Australia
            Zoom Search Engine
