Hi there again
I have a site with a lot of PDF and DOC files. The HTML files are encoded correctly, so their search results are fine, but in the results for the DOC and PDF files our characters are mangled: č is missing, there is a � instead of š, etc.
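The sketch below illustrates the general symptom. It does not model how Zoom itself reads PDF or DOC files. It is a minimal Python example that assumes the only problem is text written in one code page being decoded with another. The helper name `decode_as` is hypothetical. Windows-1250 stores š as the single byte 0x9A. That byte is not a valid UTF-8 sequence, so a UTF-8 reader shows the replacement character � instead. In the other direction, the two UTF-8 bytes for š show up as two unrelated Windows-1250 characters.

```python
def decode_as(text: str, written_as: str, read_as: str) -> str:
    """Encode `text` with one code page, then decode the bytes with another.

    Byte sequences the reading code page cannot decode become U+FFFD (the
    replacement character). Hypothetical helper, for illustrating code-page
    mismatches only; it says nothing about how any indexer handles files.
    """
    raw = text.encode(written_as)
    return raw.decode(read_as, errors="replace")


if __name__ == "__main__":
    sample = "č š ž"
    # Windows-1250 bytes read as UTF-8: each accented letter becomes the replacement character
    print(decode_as(sample, "cp1250", "utf-8"))
    # UTF-8 bytes read as Windows-1250: each accented letter becomes two wrong characters
    print(decode_as(sample, "utf-8", "cp1250"))
    # Same code page on both sides: the text survives unchanged
    print(decode_as(sample, "cp1250", "cp1250"))
```

Comparing which pattern appears in the search results (a lone � versus pairs of odd letters) can hint at which side of the pipeline is using the wrong code page.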
How does the Zoom indexer read PDF and Word files? Our Word files always use the Windows-1250 (MS1250) code page, so I have used:
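Response.Charset = "utf-8"    ' tell the browser the page is UTF-8
Response.CodePage = 1250      ' Response.Write converts strings to bytes using code page 1250
Dim WshShell, env, oExec
Set WshShell = CreateObject("WScript.Shell")
Set env = WshShell.Environment("Process")
' pass the search query to the CGI through the standard CGI environment variables
env.Item("REQUEST_METHOD") = "GET"
env.Item("QUERY_STRING") = Request.QueryString
Set oExec = WshShell.Exec(Server.MapPath("/search/search.cgi"))
oExec.StdOut.ReadLine() ' skip the HTTP header line
Response.Write(oExec.StdOut.ReadAll())
Response.CodePage = 65001     ' switch back to UTF-8 for the rest of the page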
in my ASP page, but the DOC and PDF results are still mangled. Any ideas? Is there a preference for setting the correct code page for PDF and DOC files?