
**Ray** · Oct-18-2007, 01:05 AM

Zoom supports a list of known file extensions for the file types it indexes. File extensions which it does not recognize (such as ".otmi" in this case) are indexed as "Unknown text". This means that Zoom treats the file the same as a ".txt" text file and ignores all formatting within it (so XML or HTML tags would also be indexed). You can easily test this by entering ".otmi" into your Scan Extensions list.

Note also that Zoom will index a file differently if the server specifies a Content-Type header which indicates it is of a different format (so for example, your server could return a "text/html" content-type, and Zoom will filter out the XML tags).
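To make the two rules above concrete, here is a rough sketch in Python of how an indexer could pick an indexing method. This is not Zoom's code: the function name, the lookup tables, and the precedence (a recognized Content-Type header wins over the extension, and anything else falls back to "unknown text") are assumptions based only on the description in this thread.

```python
import posixpath
from urllib.parse import urlsplit

# Hypothetical tables for illustration only; Zoom's real lists are longer
# and are not reproduced here.
KNOWN_EXTENSIONS = {".html": "html", ".htm": "html", ".php": "html", ".asp": "html", ".txt": "text"}
KNOWN_CONTENT_TYPES = {"text/html": "html"}


def resolve_index_type(url, content_type=None):
    """Pick an indexing method: recognized header first, then extension, else 'unknown text'."""
    if content_type:
        # Drop parameters such as "; charset=utf-8" and normalise case.
        mime = content_type.split(";", 1)[0].strip().lower()
        if mime in KNOWN_CONTENT_TYPES:
            return KNOWN_CONTENT_TYPES[mime]
    # Look only at the URL path, so a query string like "?objid=1336"
    # is never mistaken for a file extension.
    ext = posixpath.splitext(urlsplit(url).path)[1].lower()
    return KNOWN_EXTENSIONS.get(ext, "unknown text")
```

With these assumed tables, a ".otmi" URL with no useful header comes out as "unknown text", while the same URL served as "text/html" comes out as "html".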

One of the things we are considering for the next major release is the ability to specify the file type for such unrecognized files. For example, Zoom already has the ability to index HTML and XML files, but this is currently limited to files served with the correct content-type header, or with a file extension such as ".html", ".htm", ".php", ".asp", etc. In the future, we could allow users to select the file type that a file extension should be associated with.

**Ray** · Oct-18-2007, 01:15 AM

Also, I just noticed that we're not recognizing "text/xml" as a content-type at the moment, although for indexing purposes it is essentially the same as "text/html". We'll add this to the next public build (5.1.100...).

[Update: Version 6 of Zoom is much more flexible in this regard: you can assign any file type to any indexing method.]

**forestgreen** · Mar-04-2010, 01:05 PM

Problem with indexing .xlsx files in a database

Hello,
I have got a problem with the indexing of Excel/Office2007 files stored in a database (MS SQL DB in combination with iGrafx software). Each file is not addressed by name/path but by a link to the database containing a unique object-id. Yet, most of the files are properly retrieved and indexed by Zoom. So far so good. But not all of the plugins are able to determine the proper file type. Especially the Office2007 Plugin does not recognize .xlsx files as Excel 2007 and therefore will not be able to retrieve any useful contents from them.
I configured the .xlsx plugin to "retrieve internal meta information", but it will still not work.
Other file types like MS Word or PDF are perfectly indexed including their meta info.
Is that a bug which I detected or did I miss something?

The message in the log file goes like
Index Thread got ready buffer for http://dnde_igrafx/webcentral/BMS_approved/?objid=1336 (Content-type: Unknown text)

Would be great if anyone here could give me a hint how to solve this.

Thanks, Christian

**David** · Mar-04-2010, 07:54 PM

The example URL that you gave doesn't have a file extension. So there is no information in the URL to allow Zoom to determine the file type.

So instead Zoom will look at the HTTP headers returned from the server, in particular the "Content-Type" field, and maybe other fields such as "Location" if they are present.

As the URL you provided is a private URL, we can't test it from here, but you should check the HTTP header fields. These are probably being set by the script that accesses your database.
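One way to check those header fields from a machine inside the intranet is a few lines of Python using only the standard library. This is a minimal sketch: the function names `fetch_headers` and `media_type` are made up for this example, and it assumes the script answers HEAD requests (if it does not, change the method to "GET").

```python
import urllib.request


def fetch_headers(url, timeout=10):
    """Request `url` and return the response headers (case-insensitive mapping)."""
    # HEAD asks for the headers only, without downloading the document body.
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.headers


def media_type(headers):
    """Return the bare media type from a headers mapping, or None if it is absent."""
    value = headers.get("Content-Type")
    if value is None:
        return None
    # Strip parameters such as "; charset=utf-8" and normalise case.
    return value.split(";", 1)[0].strip().lower()
```

Calling `media_type(fetch_headers(url))` on one of the `?objid=...` links shows what the spider is being told about the file. If the result is `None` or a generic type, the script is not sending a useful Content-Type.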

**Ray** · Mar-04-2010, 11:48 PM

You might also want to refer to this page, which provides a list of recognized content-types for Office 2007 documents:
http://www.wrensoft.com/forum/showthread.php?t=2834

**forestgreen** · Mar-11-2010, 10:12 AM

Originally posted by wrensoft

The example URL that you gave doesn't have a file extension. So there is no information in the URL to allow Zoom to determine the file type...
So instead Zoom will look at the HTTP headers returned from the server, in particular the "Content-Type" field, and maybe other fields such as "Location" if they are present.

The URL is created by a PHP script which I programmed myself - it points to the file in the database. Maybe I can solve the problem, but how can I set or retrieve the contents of the "Content-Type" HTTP header?

Originally posted by wrensoft

As the URL you provided is a private URL, we can't test it from here, but you should check the HTTP header fields. These are probably being set by the script that accesses your database.

Yes, the URL is an internal one on our intranet, which is why you cannot access it from outside. How can I check the HTTP header fields?
And how can I let Zoom know the correct content-type of the file when there is no file extension and the HTTP header does not provide it?

**David** · Mar-11-2010, 06:59 PM

At the risk of stating the obvious: to set the HTTP header in PHP, use the PHP header() function.
http://php.net/manual/en/function.header.php
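The string passed to header() has to name the document's real media type. As a sketch of that lookup (written in Python for illustration, so the logic would need porting to the PHP script), the function below builds the header line from a stored file name. It assumes the database keeps the original file name for each object id. The function name and the fallback type are hypothetical. The three media types are the standard registered ones for Office 2007 files, but check the list linked earlier in this thread for which ones Zoom recognizes.

```python
import posixpath

# Standard registered media types for the Office 2007 (Open XML) formats.
OFFICE_2007_TYPES = {
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    ".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    ".pptx": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
}


def content_type_header(stored_name, default="application/octet-stream"):
    """Build the Content-Type header line for a document fetched by object id.

    `stored_name` is assumed to be the original file name kept in the database.
    """
    ext = posixpath.splitext(stored_name)[1].lower()
    return "Content-Type: " + OFFICE_2007_TYPES.get(ext, default)
```

The returned line is the kind of string the script would send before writing the file contents, so that a URL without an extension still identifies itself as an Excel 2007 document.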


any limitations on file extensions?
