chemical structure search


  • chemical structure search

    Hello,
    I want users of my website to be able to search chemical structures which they could draw into a search box. Could Zoom do this sort of search?

  • #2
    Zoom is, at its core, a text search engine. It's actually a darn good one, and one of the best out there (in my experience). What it sounds like you need is something that translates a graphical chemical structure into a string of text that can be used as a search term (even if that string of text stays behind the scenes).

    In other words, something like what is already at Chmoogle (http://www.chmoogle.com/), when you click on the "Draw Structure" button.

    You have two basic options:

    1) You can do what Chmoogle did, and use the technology from ACD Labs (http://www.acdlabs.com/products/chem...ab/chemsketch/).

    2) Or, you can write your own similar version. Try doing some research into the "SMILES" text format. Bear in mind that the individual web pages themselves would have to be encoded with a "chemical keyword" in text that could be indexed by the search engine.

    I don't even think you'd need to modify Zoom at all, unless you wanted to have graphical previews (like they have on Chmoogle). I *think* that the next big version of Zoom will have the ability to show graphical previews (like for product pages)... but on that topic, I defer to the official folks at Wrensoft.
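To illustrate the "chemical keyword" idea from option 2, here is a minimal Python sketch that writes a SMILES string into a meta keywords tag for a compound page. The function name is hypothetical, and it assumes the indexer is set up to read meta keywords; this is not anything Zoom or Chmoogle is confirmed to do.

```python
from html import escape


def chemical_keyword_tag(smiles: str) -> str:
    """Build a meta keywords tag that carries a SMILES string as plain text.

    Hypothetical helper for illustration only. It assumes the search
    indexer is configured to index meta keywords.
    """
    # Escape characters that would otherwise break the HTML attribute.
    return f'<meta name="keywords" content="{escape(smiles, quote=True)}">'


# "CCO" is the SMILES string for ethanol.
print(chemical_keyword_tag("CCO"))
```

A drawing front-end would then only need to turn the sketch into the same SMILES string and pass it to the search box as an ordinary text query.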

    - JW


    • #3
      I think JW has answered that question better than we could

      As JW pointed out, Zoom does not provide any chemical drawing functionality. However, it would seem possible for you to add a front-end which would construct the necessary text query based on the drawing. We are not familiar with the requirements of searching chemical structures so we're not entirely sure what else you might have in mind.

      Thumbnails and image searching are indeed features of the next version of Zoom (V5.0). It will allow you to associate a single image with each page, which will be displayed alongside the link in the search results. So for each page dedicated to a specific chemical compound, you could potentially assign a diagram and have it appear in the search results, as in Chmoogle above.

      Zoom V5.0 is currently looking at a release sometime around July (this is just a best guess). There will be free upgrades for users who purchased the current version of Zoom in the last 6 months.
      --Ray
      Wrensoft Web Software
      Sydney, Australia
      Zoom Search Engine


      • #4
        That was very helpful! I didn't understand the concept behind a structure search, you see.

        If I can just get an 'add on' to stick onto the existing Zoom I have that would be much easier.

        If you ever want to design a structure search 'add on' give me a bell.


        • #5
          Let's say I attach an 'add on' which will convert structures to SMILES format.

          Let's also say a document has a molecular formula X in it.

          If I entered the molecular formula for "X" as a synonym of the corresponding SMILES for "X", I wouldn't need to insert the SMILES into the document in order for Zoom to find it, would I?


          • #6
            It depends on how your molecular formulas appear on your web pages currently. I think the whole point of needing the SMILES format (which I found some information on here: http://en.wikipedia.org/wiki/Chemica..._format#SMILES) is that it allows you to represent a formula in normal text - as opposed to a graphical representation, or a formula with subscript text.

            Text search engines such as Zoom would typically only index normal text, so it would not recognize the subscript formulas as presented in the URL above, nor a graphical representation. So in these two cases, Zoom would not be able to match them with the corresponding SMILES synonyms regardless.

            However, if your formulas appear on the page like this: C2H6O, then yes, Zoom will be able to search and match these formulas as they are, and you can also specify synonyms to match "C2H6O" with its SMILES equivalent: "CCO".
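As a rough illustration of the synonym idea (not Zoom's actual synonym handling or file format), the Python sketch below expands a search term into every term declared equivalent to it. The table contents and the function name are hypothetical, and matching in both directions is an assumption of the sketch.

```python
# Hypothetical synonym table: each key maps to terms treated as equivalent.
SYNONYMS = {
    "C2H6O": ["CCO"],
}


def expand_query(term, synonyms):
    """Return the term together with all terms declared equivalent to it."""
    matches = {term}
    for word, equivalents in synonyms.items():
        group = {word, *equivalents}
        # Match in both directions: formula -> SMILES and SMILES -> formula.
        if term in group:
            matches |= group
    return sorted(matches)


print(expand_query("CCO", SYNONYMS))    # ['C2H6O', 'CCO']
print(expand_query("C2H6O", SYNONYMS))  # ['C2H6O', 'CCO']
```

One thing worth checking: SMILES is case-sensitive (lowercase letters denote aromatic atoms), so a case-insensitive text search could treat two different structures as the same term.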
            --Ray
            Wrensoft Web Software
            Sydney, Australia
            Zoom Search Engine


            • #7
              Presumably, you could also get Zoom to recognise the corresponding chemical name from the SMILES through synonyms, e.g. CCO synonym: ethanol.

              Also, even though my bandwidth is not overstretched, Zoom is starting to struggle with the size of the index and it is getting quite slow. Any ideas to speed it up other than reducing the index size (not an option) or reducing results per page (already done)? A lot of the files in the index are PDFs. Does this contribute to the problem?


              • #8
                Originally posted by will
                Presumably, you could also get Zoom to recognise the corresponding chemical name from the SMILES through synonyms, e.g. CCO synonym: ethanol.
                Yes, this will work.

                Originally posted by will
                Also, even though my bandwidth is not overstretched, Zoom is starting to struggle with the size of the index and it is getting quite slow. Any ideas to speed it up other than reducing the index size (not an option) or reducing results per page (already done)? A lot of the files in the index are PDFs. Does this contribute to the problem?
                First of all, give us some idea of how many files you are indexing and how many unique words are counted at the end of indexing (or the "maximum files" and "maximum unique words" to index limits that you have specified in Zoom).

                Also, which platform are you using: PHP, ASP, JS or CGI?

                JavaScript is the most limited, and CGI is the most powerful.

                Here are some benchmarks of the different platform options:
                http://www.wrensoft.com/zoom/benchmarks.html

                More information on the available script platforms here:
                http://www.wrensoft.com/zoom/support/platforms.html
                --Ray
                Wrensoft Web Software
                Sydney, Australia
                Zoom Search Engine


                • #9
                  Last indexing
                  3507 files scanned (~3000 pdfs)
                  104,931 unique words
                  Using PHP platform

                  Hope that's helpful information.


                  • #10
                    This is not a huge amount of data. The CGI would be quicker than PHP, but PHP should return search times of maybe 2 seconds.

                    So if I were you, I would spend a bit of time investigating what hardware is in your server and how many sites you are sharing the server with. Maybe there are 100 other web sites on the same shared host, putting too much load on the hardware?

                    ------
                    David


                    • #11
                      cgi

                      OK I'll go for cgi then.

                      I need to set "execute permissions" for search.cgi and "public read permissions" for the zdats and template. I read that I can do this via my FTP client or Unix shell using the "chmod 755 <filename>" command.

                      I'm not clued up on this stuff so ummmm.. How do I do that?


                      • #12
                        Details about getting the CGI running are here,
                        http://www.wrensoft.com/zoom/support/faq_cgi.html

                        But it is not for the novice web developer. In most cases, you'll need to know how to use FTP software or Telnet to alter permissions and have some knowledge of file permissions in Unix / Linux. (assuming your server is running Linux?)

                        I would suggest you do some background reading before you start. There are many good books and thousands of pages on the web about Linux, access rights and the file system.
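As a small illustration of what "chmod 755" means, the Python sketch below applies that mode to a file and reports the result. It assumes a Unix / Linux system and the ability to run a script on the server itself; the helper name is hypothetical, and the 644 mode shown for "public read" is a common choice rather than something stated in the FAQ.

```python
import os
import stat
import tempfile


def set_mode(path, mode):
    """Apply a permission mode and return it in 'ls -l' style notation."""
    os.chmod(path, mode)
    return stat.filemode(os.stat(path).st_mode)


if __name__ == "__main__":
    # Demonstrate on a throwaway file rather than a real search.cgi.
    with tempfile.TemporaryDirectory() as folder:
        script = os.path.join(folder, "search.cgi")
        open(script, "w").close()
        # 755: owner may read/write/execute; group and others read/execute.
        print(set_mode(script, 0o755))  # -rwxr-xr-x on Linux
        # 644 (assumed here for "public read"): owner read/write, others read.
        print(set_mode(script, 0o644))  # -rw-r--r-- on Linux
```

From a shell, the same change is the "chmod 755 <filename>" command quoted earlier in the thread; many FTP clients expose the same three-digit mode in a file properties dialog.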

                        -----
                        David


                        • #13
                          OK, thanks.

                          Finally, when I switch from PHP to CGI, as Zoom uploads the files to the web server, will the search.php file (and the other files) automatically be overwritten as normal to become search.cgi etc.?

                          Or, should I remove them first?


                          • #14
                            Furthermore, is there a limit to the number of categories that you can have?

                            If you want just one URL in a particular category, can you enter the entire URL as the category's pattern?

                            e.g. a pattern of "http://www.bob.com/12345.html"

                            The reason I ask such a strange question is that I can't create pdf.desc files for my search results from other web sites. So, if each web page has its own category, I can still have the title I want showing in the search results.

                            Is this just stupid or can it work?


                            • #15
                              Files will only be overwritten if they have the same name and are in the same directory. Zoom doesn't hunt around the directories of the server looking for files to delete.

                              CGI scripts often need to be in a special cgi-bin directory (depending on your host).

                              I wouldn't delete the working PHP search function until you have the CGI running in parallel.

                              Is this just stupid or can it work?
                              It means a lot of maintenance for you, and probably a lot of confusion for the users, so I don't think it is a great idea. Have you turned on the plug-in option, "Use meta information..."? This would allow you to get the title and keywords from the PDF's meta data.

                              -----
                              David
