Before purchase question - CGI from ASP + Accents


  • Before purchase question - CGI from ASP + Accents

    Hi there

    I'd like to purchase Zoom Search, and I am wondering whether V7 is about to be released, so that I don't end up buying old software on day one.

    I tried the demo and ran into some problems. I am using ASP on my pages and trying to include the CGI inside my ASP page. It works, but for some reason the CGI output gets encoded as MS1250, although I selected UTF-8 in the config because my pages are UTF-8. Our special characters such as č and š don't display correctly. I have to use

    Response.CodePage = 1250 instead of the Response.CodePage = 65001 that the rest of my page uses, but then the rest of the page doesn't show the proper characters.

    My next question is about "enable accent/diacritic insensitivity for". That works, but if I use č, š or ž in the search phrase, the results say I was searching for c, s or z. I think that if I enter č in the search form, the č should be preserved on the results page. It is fine to search for c as well, but don't say I searched for "pec" when I entered "peč" in the form; it looks weird.

    Yours

    Jerry

  • #2
    V7 will be a free upgrade for all users who purchased V6 within the six months prior to its release. So buying it now should give you a free upgrade when V7 comes out. We're probably just a few months away from its release.

    The codepage issue is addressed by a few extra lines of code to change (or rather, maintain) the encoding of the CGI output. It just occurred to us that we've only documented this on the ASP.NET page, which is why you wouldn't have found it. See the section entitled "UTF-8 Troubleshooting" on this page:
    http://www.wrensoft.com/zoom/support/aspdotnet.html

    We'll have to add this to the CGI/ASP support page.

    Regarding the accent insensitivity and the "Search results for: ..." heading, this was by design, because it helps the user recognize why suddenly other words were matching and being highlighted (like "pec").
    --Ray
    Wrensoft Web Software
    Sydney, Australia
    Zoom Search Engine


    • #3
      Originally posted by Ray:
      V7 will be a free upgrade for all users who purchased V6 within the six months prior to its release. So buying it now should give you a free upgrade when V7 comes out. We're probably just a few months away from its release.

      The codepage issue is addressed by a few extra lines of code to change (or rather, maintain) the encoding of the CGI output. It just occurred to us that we've only documented this on the ASP.NET page, which is why you wouldn't have found it. See the section entitled "UTF-8 Troubleshooting" on this page:
      http://www.wrensoft.com/zoom/support/aspdotnet.html

      We'll have to add this to the CGI/ASP support page.

      Regarding the accent insensitivity and the "Search results for: ..." heading, this was by design, because it helps the user recognize why suddenly other words were matching and being highlighted (like "pec").
      Thank you for the fast response.

      I would really love an ASP example, as I don't know ASP well enough when it comes to WshShell.Exec, and I have been unable to make the CGI output use UTF-8 instead of 1250.

      Is there any way to make the results heading say "peč"? 90% of people use "peč" in their searches; "pec" is only for those who are not in my home country and don't have our locale, so it's really just a bonus feature. I don't want everybody who is properly using "peč" to see "pec" and wonder why this happened.

      I once bought some software in November with a six-month free upgrade period; the free period has just run out and the new version was not released in time. Quite a bad experience, as they had told me it was 2-3 months away.

      Will the Classic ASP version be dropped for V7? I hope not.

      PS - I have selected UTF-8 in the Zoom Search preferences and my page is in UTF-8, but I still see garbled characters using CGI and ASP. It seems the CGI is producing code page 1250 characters instead of the UTF-8 I "ordered" in the config. Is this a bug?


      • #4
        Another thing: is there any way to change the indexer preferences from the web, through a web interface, so that new options could be set up from a browser? If not, is this planned for V7?


        • #5
          Originally posted by jerry2:
          Is there any way to make the results heading say "peč"? 90% of people use "peč" in their searches; "pec" is only for those who are not in my home country and don't have our locale, so it's really just a bonus feature. I don't want everybody who is properly using "peč" to see "pec" and wonder why this happened.
          At the moment, you have to disable accent insensitivity to change this behaviour. We'll consider changing it in the future, but it is a bit subjective and we haven't had many people ask us for this, and we have to prioritize changes accordingly.
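To make the request concrete, here is a minimal sketch in Python of the behaviour being asked for: match on an accent-folded form of the query, but show the query as it was typed. This is not how Zoom is implemented; `fold_accents` and `results_heading` are hypothetical names used only for illustration.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Return text with combining diacritical marks removed (e.g. č -> c)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def results_heading(query: str) -> str:
    """Build a heading that keeps the query exactly as the user typed it."""
    folded = fold_accents(query)
    if folded == query:
        return f'Search results for: "{query}"'
    # Mention the folded form only as a secondary note.
    return f'Search results for: "{query}" (also matching "{folded}")'


# The folded form is what an accent-insensitive index would be searched with.
print(fold_accents("peč"))
```

A heading built this way still explains why words like "pec" are highlighted, which was the stated reason for the current design.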

          Originally posted by jerry2:
          I once bought some software in November with a six-month free upgrade period; the free period has just run out and the new version was not released in time. Quite a bad experience, as they had told me it was 2-3 months away.
          Should this happen, just e-mail us and quote this forum thread.

          Originally posted by jerry2:
          Will the Classic ASP version be dropped for V7? I hope not.
          No, we will still have Classic ASP in V7.

          Originally posted by jerry2:
          PS - I have selected UTF-8 in the Zoom Search preferences and my page is in UTF-8, but I still see garbled characters using CGI and ASP. It seems the CGI is producing code page 1250 characters instead of the UTF-8 I "ordered" in the config. Is this a bug?
          Are you sure the CGI is producing 1250 output? Look at the actual output directly, e.g. go to the CGI URL (e.g. http://www.yoursite.com/cgi-bin/search.cgi) directly from your browser. If it really is producing 1250 output, the problem is elsewhere.
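If the raw CGI output can be saved to a file, one rough way to check it is to test whether the bytes decode as strict UTF-8. The sketch below is in Python and works on in-memory bytes; `describe_encoding` is a hypothetical helper, and the check is only a heuristic, since plain ASCII passes either way and some 1250 byte sequences happen to be valid UTF-8 as well.

```python
def describe_encoding(raw: bytes) -> str:
    """Report whether raw bytes form valid UTF-8 (a heuristic, not proof)."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return "not valid UTF-8 (possibly a single-byte code page such as 1250)"
    return "valid UTF-8"


# The same word encoded both ways, standing in for saved CGI output.
print(describe_encoding("peč".encode("utf-8")))
print(describe_encoding("peč".encode("cp1250")))
```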

          But it shouldn't be the case. The problem is that ASP automatically re-encodes everything read in via oExec.StdOut.ReadAll() to the codepage of the ASP page, which is once again VBScript/ASP trying to be too clever. So extra code is needed either to stop it from doing that or to undo the conversion it did.

          Originally posted by jerry2:
          Another thing: is there any way to change the indexer preferences from the web, through a web interface, so that new options could be set up from a browser? If not, is this planned for V7?
          You can install the Indexer on the web server itself and schedule it to run regularly. Or you can have web scripts which change the .zcfg configuration file.

          There are no plans for a web interface to the indexer at the moment, because its use would be too limited: the indexer can only run on Windows web servers, and only for those who have permission to install such a large component on their servers.
          --Ray


          • #6
            I "know" the CGI is producing this because if I make this change:

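            Response.CodePage = 1250 ' use 1250 only while the CGI output is written

            Dim WshShell, env, oExec
            Set WshShell = CreateObject("WScript.Shell")
            ' Pass the request details to the CGI through environment variables
            Set env = WshShell.Environment("Process")
            env.Item("REQUEST_METHOD") = "GET"
            env.Item("QUERY_STRING") = Request.QueryString
            ' Run the CGI and copy its output into the page
            Set oExec = WshShell.Exec(Server.MapPath("/search/search.cgi"))
            oExec.StdOut.ReadLine() ' skip the HTTP header line
            Response.Write(oExec.StdOut.ReadAll())

            Response.CodePage = 65001 ' restore UTF-8 for the rest of the page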

            everything works, but I am not sure whether this is the correct way to do things. The ASP manual says that Response.CodePage should be set only at the beginning of the page, so I am not sure this is an acceptable way to handle it.

            You refer to the ASP.NET documentation, but I am not sure which part; I found no solution there for instructing ASP to produce UTF-8 output from Response.Write.

            About the indexer: I think it is supposed to run standalone on the server, with a rescan scheduled every night. I was thinking of a web GUI so that my customers can set the synonyms list themselves, etc.

            Is the professional licence OK for 2 of my customers, or do I need a per-website licence?

            Thanx!


            • #7
              I cannot try the CGI directly; I have to embed it in my ASP script. I think this is because no CGI handler is configured on my IIS. Since I gave the CGI execute permission, ASP can execute it, but I cannot run it directly, so I cannot test what you suggested.


              • #8
                Originally posted by jerry2:
                I "know" the CGI is producing this because if I make this change:

                Response.CodePage = 1250

                Dim WshShell, env, oExec
                Set WshShell = CreateObject("WScript.Shell")
                Set env = WshShell.Environment("Process")
                env.Item("REQUEST_METHOD") = "GET"
                env.Item("QUERY_STRING") = Request.QueryString
                set oExec = WshShell.Exec(Server.MapPath("/search/search.cgi"))
                oExec.StdOut.ReadLine() ' skip the HTTP header line
                Response.Write(oExec.StdOut.ReadAll())

                Response.CodePage = 65001

                everything works, but I am not sure whether this is the correct way to do things. The ASP manual says that Response.CodePage should be set only at the beginning of the page, so I am not sure this is an acceptable way to handle it.
                This doesn't indicate what the CGI outputs. ASP/VBScript is re-encoding the data read in via oExec.StdOut.ReadAll(), not realizing it is already encoded in UTF-8. So it ends up doing a double encoding. When you set the CodePage to 1250, it does not perform this encoding, so the data is left alone.
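The double encoding described here can be reproduced outside ASP. The Python sketch below is only a model of the effect: it assumes the UTF-8 bytes get read as code page 1250 text (1250 is an assumption based on the Slovenian Windows setup mentioned in this thread), and both function names are hypothetical.

```python
def double_encode(text: str, misread_as: str = "cp1250") -> str:
    """Simulate UTF-8 bytes being mistaken for text in another code page."""
    return text.encode("utf-8").decode(misread_as)


def undo_double_encode(garbled: str, misread_as: str = "cp1250") -> str:
    """Reverse the mistake: recover the original bytes, then read them as UTF-8."""
    return garbled.encode(misread_as).decode("utf-8")


garbled = double_encode("peč")
# The two-byte UTF-8 sequence for č turns into two separate characters.
print(len("peč"), len(garbled))
print(undo_double_encode(garbled) == "peč")
```

Undoing the conversion like this only works while every byte has a mapping in the assumed code page; Python raises an error for the few byte values that cp1250 leaves undefined.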

                Originally posted by jerry2:
                You refer to the ASP.NET documentation, but I am not sure which part; I found no solution there for instructing ASP to produce UTF-8 output from Response.Write.
                We'll try to get an example up soon. It would be good to see the CGI output to make sure this is the problem though.

                Originally posted by jerry2:
                About the indexer: I think it is supposed to run standalone on the server, with a rescan scheduled every night. I was thinking of a web GUI so that my customers can set the synonyms list themselves, etc.

                Is the professional licence OK for 2 of my customers, or do I need a per-website licence?
                See this FAQ regarding licensing:
                http://www.wrensoft.com/zoom/support...s.html#license

                Regarding a web interface, this really depends on usage. There is an SDK with documentation on how to create .zcfg configuration files, and you can create scripts or a web interface to do this.
                --Ray


                • #9
                  Originally posted by Ray:
                  This doesn't indicate what the CGI outputs. ASP/VBScript is re-encoding the data read in via oExec.StdOut.ReadAll(), not realizing it is already encoded in UTF-8. So it ends up doing a double encoding. When you set the CodePage to 1250, it does not perform this encoding, so the data is left alone.

                  We'll try to get an example up soon. It would be good to see the CGI output to make sure this is the problem though.
                  Yes, this is my problem: I don't see how I can solve the encoding issue. If Response.Write is re-encoding the output, how do I tell it not to? I am waiting for the example. I'll try to get the CGI to work independently to check...


                  • #10
                    OK, I have tried as requested. I managed to add a CGI module so that the script executes on its own, without the ASP include.

                    The result: the CGI itself works great and čšž are preserved. Using the ASP "includes" from your FAQ, they are not preserved.

                    If there is any way I could make this work (I need to have my template for the search) I'll buy the program immediately.


                    • #11
                      Unfortunately I have another question. As a test, I have set up the server to reindex all files every hour. The problem is that search.cgi is rebuilt every time and placed in my website's folder without execute permission, so of course it doesn't work.

                      Is there any way to index without replacing search.cgi every time? Or is the only solution (which I don't like very much) to make the whole folder executable, so that every file the indexer builds automatically inherits execute permission from the folder?


                      • #12
                        There is an option in the FTP window to auto-set executable permissions after FTP upload. Does this help?


                        • #13
                          Maybe, but I am not using FTP upload, as Zoom is writing the files directly to the site location. Should I do it another way?

                          About the utf-8, the cgi works. Is there any solution for ASP to make response.write without reencoding?


                          • #14
                            To be honest, I really thought it would be trivial to port the re-encoding changes from the VB/ASP.NET example, but it didn't turn out that way. VBScript/ASP doesn't seem to have any straightforward way of changing the encoding of StdOut, short of doing something more drastic like manually re-encoding the data stream.

                            The underlying issue seems to be the same, with the need to re-encode or change the encoding of the output from WScript, given what is (unofficially) documented here (which is not a practical solution, as I'm not familiar with any way to change how ASP calls WSH). In my opinion, it's a really flawed design.

                            Having said that, I would think that the following (your previously quoted example) is just as fine a solution as any, if it is working for you:

                            Originally posted by jerry2:
                            Response.CodePage = 1250

                            Dim WshShell, env, oExec
                            Set WshShell = CreateObject("WScript.Shell")
                            Set env = WshShell.Environment("Process")
                            env.Item("REQUEST_METHOD") = "GET"
                            env.Item("QUERY_STRING") = Request.QueryString
                            set oExec = WshShell.Exec(Server.MapPath("/search/search.cgi"))
                            oExec.StdOut.ReadLine() ' skip the HTTP header line
                            Response.Write(oExec.StdOut.ReadAll())

                            Response.CodePage = 65001

                            everything works, but I am not sure whether this is the correct way to do things. The ASP manual says that Response.CodePage should be set only at the beginning of the page, so I am not sure this is an acceptable way to handle it.
                            This works because setting the codepage to 1250 means ASP decides that it doesn't need to do any encoding on the StdOut data, and leaves it alone. So it is a plausible workaround, if it works for you.
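One way to picture why the workaround holds is to model the two conversions as a decode followed by an encode. The Python sketch below rests on an assumption that this thread does not confirm: that the StdOut data is read using the server's ANSI code page (1250 here) and that Response.Write then encodes with Response.CodePage. `asp_pipeline` is a hypothetical name, not an ASP API.

```python
def asp_pipeline(cgi_bytes: bytes, read_codepage: str, response_codepage: str) -> bytes:
    """Model: bytes are decoded on read, then encoded again on write."""
    text = cgi_bytes.decode(read_codepage)   # assumed behaviour of StdOut.ReadAll()
    return text.encode(response_codepage)    # assumed behaviour of Response.Write


utf8_output = "peč čšž".encode("utf-8")      # what the CGI is believed to emit

# Same code page on both sides: the two conversions cancel out.
print(asp_pipeline(utf8_output, "cp1250", "cp1250") == utf8_output)

# UTF-8 on the write side: the bytes are encoded a second time and change.
print(asp_pipeline(utf8_output, "cp1250", "utf-8") == utf8_output)
```

Under this model the workaround depends on the server's ANSI code page matching the temporary Response.CodePage value, so a server with a different locale might need a different number.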

                            The documentation here is ambiguous, stating that there should only be one code page but giving an exception in a Server.Execute call, implying that it is not strictly enforced, and also that one can override the other.

                            We might have to come back to this, but I would recommend the above approach in the meantime.

                            I presume you need to use the CGI rather than the Classic ASP version because of the number of files you are indexing?
                            --Ray


                            • #15
                              Well, thank you for your honest answer. Yes, the above approach works OK, but I am not sure what the speed penalty of decoding twice is. I guess it is not much on a modern server, as the output is instantaneous. So I guess you can recommend this method in your documentation for now.

                              I am not sure why MS1250 works. Is it because my Windows uses MS1250, since we use the Slovenian code page 1250 in Windows? I found this by surprise; I did not expect it to work at all. I can see that with MS1250 the output is not re-encoded, but I don't know why (OK, I am just curious, I guess).

                              I have to tell you that this is the first time ASP has behaved so badly. In Slovenia we still have many code page problems with PHP, but with ASP, MySQL and the new MyODBC, things just work great. This is the first time I have seen a real flaw in Classic ASP, which my 12-year-old HUGE site uses (no, the search engine will not be for this site; I have my own custom database search for it).

                              I only hope changing the codepage in the middle of the page still works on Windows 2008 R2 (I have just 200.

                              I am using the CGI because there will be about 5000 files and about 60,000 different words to index. I can only guess, because the demo doesn't allow me to try this, but I am buying the program today and I'll see. I saw that the load on the server with the CGI is only 10% of that of the ASP version, and I told myself I'd use it to speed up the output by 10x. I guess ASP would work better in my case, and I could easily configure (read: hack) search.asp if I needed something to work differently (like the thing I asked about in one of my first posts, where the results say I searched for "pec" when I searched for "peč").

                              If you need a beta tester for V7, I have tested many programs so far. Bugs just love me, and our language and my configuration (Windows, ASP, MySQL) are somewhat "rare", so I find things that others have a harder time finding. If you want me, please let me know by PM.

                              Thanx!
