Zoom search & Project 7 PVII Gallery questions

    Hi,
    For the website http://info-radiologie.ch/ I use PVII software, Gallery2 shareware and Zoom Search. I'm very happy with this software. Thank you for these great products.

    Three questions:

    1) Look at this page http://info-radiologie.ch/echographie.php (for instance). I use Tab Panel Magic and a JS tweak for external triggering. If you search for the word “echographie”, Zoom Search returns 4 different pages. Does this look like duplicate content to a robot? It's not a problem for me because I can delete this .js routine. However, I thought a robot was unable to follow or understand any .js routine. I would like to know how and why your robot sees this page like that.

    2) I use Gallery2 (http://www.info-radiologie.ch/gallery2/main.php?g2_page=1) and I would rather not use its built-in search engine. What is the best way to index every image on this website, including those in Gallery2, while avoiding duplicate content? Please have a look at this part of the website to understand the problem: some pages don't have an image. On some pages I added tags like nofollow and noindex, but I'm not sure they work. Any advice would be appreciated to improve the performance of the search engine.

    3) I'm interested in upgrading to version 5.1 because of this change: “Added support for additional accent and ligature characters for the "accent insensitive" options”. I ran into some problems: I lost some features such as highlighting, etc. I'm in the process of solving these problems. However, it seems my license key is no longer recognized. Would you be so kind as to send me another one, please?
    Additional comment: Feel free to give me any ideas to improve the search on this website.

    Thank you for this excellent search engine.

    Thanks and best regards.

  • #2
    Originally posted by fxs
    1) Look at this page http://info-radiologie.ch/echographie.php (for instance). I use Tab Panel Magic and a JS tweak for external triggering. If you search for the word “echographie”, Zoom Search returns 4 different pages. Does this look like duplicate content to a robot? It's not a problem for me because I can delete this .js routine. However, I thought a robot was unable to follow or understand any .js routine. I would like to know how and why your robot sees this page like that.
    This does not seem related to any JS from what I can tell.

    The problem is that you have links to that same page with different parameters at the end of it:
    http://www.info-radiologie.ch/echographie.php
    http://www.info-radiologie.ch/echographie.php?pnl=1_1
    http://www.info-radiologie.ch/echographie.php?pnl=1_2
    http://www.info-radiologie.ch/echographie.php?pnl=1_3
    http://www.info-radiologie.ch/echographie.php?pnl=1_4

    A quick check suggests that each of these URLs returns the same content, and that your "echographie.php" script most likely ignores the parameter. I'm not sure why you have these in place, but it is something you should check. Because these different links appear elsewhere on your website (and all return the same content), you appear to have duplicated pages. Remember that each unique URL is technically a different web page; it just so happens that your script (echographie.php in this case) returns the same content regardless of the URL.

    Since these URLs have identical content, a quick and dirty fix would be to simply enable the CRC duplicate page detection option (on the "Scan Options" tab of the Configuration window). A better long-term solution (since you have many other pages linked the same way) would be to check why exactly you have these different URLs and what the parameters are for, and, if they are not necessary, whether you can remove them so that your links to each page are consistent.
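To illustrate what a CRC-based duplicate check does in general (this is a generic sketch, not Zoom's internal code), the Python below computes a CRC32 checksum of each downloaded page body and keeps only the first URL seen for each checksum. The page body used in the demo is a made-up placeholder.

```python
import zlib


def dedupe_by_crc(pages):
    """Keep the first URL seen for each distinct page body.

    pages: iterable of (url, html) pairs, in the order a spider finds them.
    Returns (kept, skipped): kept is the list of URLs to index, and skipped
    maps each duplicate URL to the earlier URL that had the same CRC.
    """
    first_url_for_crc = {}
    kept = []
    skipped = {}
    for url, html in pages:
        # CRC32 of the response body: identical HTML gives an identical CRC.
        crc = zlib.crc32(html.encode("utf-8"))
        if crc in first_url_for_crc:
            skipped[url] = first_url_for_crc[crc]
        else:
            first_url_for_crc[crc] = url
            kept.append(url)
    return kept, skipped


if __name__ == "__main__":
    base = "http://www.info-radiologie.ch/echographie.php"
    same_html = "<html><body>Echographie ...</body></html>"  # placeholder body
    pages = [(base, same_html)] + [
        (f"{base}?pnl=1_{i}", same_html) for i in range(1, 5)
    ]
    kept, skipped = dedupe_by_crc(pages)
    print(kept)          # only the parameter-free URL
    print(len(skipped))  # the four ?pnl= variants
```

Because a 32-bit CRC can in rare cases collide for different content, a cautious implementation might also compare page lengths or full bodies before discarding a page.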

    PS. You can use Zoom to find where these links are coming from. Switch to "Single thread" spider mode, and make sure you have "Spidering" information enabled on the "Index Log" tab. Then, when you index your website, the log will show which URLs are found (as "Queued http://www.soandso.com/..." style messages) while a particular page is being indexed.

    Originally posted by fxs
    2) I use Gallery2 (http://www.info-radiologie.ch/galler....php?g2_page=1) and I would rather not use its built-in search engine. What is the best way to index every image on this website, including those in Gallery2, while avoiding duplicate content? Please have a look at this part of the website to understand the problem: some pages don't have an image. On some pages I added tags like nofollow and noindex, but I'm not sure they work. Any advice would be appreciated to improve the performance of the search engine.
    Gallery2 is a complex content management script, much like message boards and similar scripts. There is more information on what to look out for and how to index such sites in this FAQ:
    Q. How should I index my site if it features a message board, forum, or calendar and other similarly complex scripts?

    Here is a previous discussion regarding indexing Gallery2 specifically:
    http://www.wrensoft.com/forum/showthread.php?t=1349

    Originally posted by fxs
    3) I'm interested in upgrading to version 5.1 because of this change: “Added support for additional accent and ligature characters for the "accent insensitive" options”. I ran into some problems: I lost some features such as highlighting, etc. I'm in the process of solving these problems. However, it seems my license key is no longer recognized. Would you be so kind as to send me another one, please?
    If you have lost your license key, you can e-mail us and we will resend it (please include the details you placed the order with, such as your name and company, and preferably e-mail us from the same address you ordered with).

    Regarding the highlighting option, note that there are known limitations with highlighting words found through accent insensitivity. This should be the same in both V5 and V5.1, though. That is, words that were matched through accent insensitivity (e.g. you search for "cliché" and it finds a page with "cliche" on it) will not be highlighted visually. However, the search result itself is still accurate, as is the extracted context description; the word just won't be coloured differently. This is something we are looking at addressing in the future.
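The gap between matching and highlighting can be pictured with a small Python sketch. It assumes, purely for illustration, that matching compares accent-folded words while highlighting looks for the literal query text; the ligature table is likewise a hypothetical example, not Zoom's actual list.

```python
import unicodedata

# Ligatures that Unicode NFD decomposition does not split apart.
# Illustrative assumption only, not Zoom's real table.
LIGATURES = str.maketrans({"œ": "oe", "æ": "ae", "ß": "ss"})


def fold(text):
    """Lowercase, expand a few ligatures, and strip accents (é -> e)."""
    text = text.lower().translate(LIGATURES)
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def accent_insensitive_find(query, words):
    """Return the page words that equal the query once both are folded."""
    target = fold(query)
    return [w for w in words if fold(w) == target]


def literal_highlight(query, text):
    """Bold exact occurrences of the query only (no accent folding)."""
    return text.replace(query, f"<b>{query}</b>")


if __name__ == "__main__":
    page_words = ["a", "cliche", "here"]
    print(accent_insensitive_find("cliché", page_words))  # the page matches
    print(literal_highlight("cliché", "a cliche here"))    # nothing is bolded
```

Highlighting folded matches would require mapping each folded match back to its original position in the page text, which is more involved than a plain string replacement.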

    Originally posted by fxs
    Additional comment: Feel free to give me any ideas to improve the search in this web site.
    It seems that you are not using the CSS for the search page as provided in the default search_template.html file (or perhaps you are using styles from an older version of Zoom?). This is why you've lost the default spacing between search results and the general appearance of elements such as Recommended Links, and why your search results lack overall formatting.

    One easy way to recover this is to create a new search template in a temporary folder, look at the original CSS it contains, and copy that CSS into your current search template.

    For more information on customizing your search results with CSS, see this FAQ:
    Q. How do I customize the appearance of my search results with CSS?

    There is also more information and a complete CSS class listing in chapter 6 of the Users Guide.
    --Ray
    Wrensoft Web Software
    Sydney, Australia
    Zoom Search Engine



    • #3
      Hi,
      Thanks a lot for your answers.
      The discussion about Gallery will help me a lot.

      Regarding Tab Panel Magic from PVII, there was a misunderstanding due to my English and a mistake on the page (which I have since fixed). Sorry about that.
      Please have a quick look at this page: http://projectseven.com/products/too...demo/index.htm
      and this tech note:
      http://projectseven.com/support/answers.asp?id=188
      I used this last tweak. To the search engine, it seems to me that each URL of the tab panel looks the same, because they all look alike. In fact, each URL relates to only one panel of the tab panel. When you search for “echographie”, you get four results: one is the whole page, and each of the others relates to one panel. However, the four results look identical, which is quite confusing.
      I would like to know how Zoom handles this kind of URL and .js routine. Also, is there any workaround?

      Thanks and best regards



      • #4
        Oh, I see. Yes, the script must have been broken before. I see now that the parameter in the URL (e.g. "?pnl=1_3") is actually parsed by the JavaScript to change the appearance of the tabs and which content is visible.

        However, the actual page content does not seem to change. That is, the HTML downloaded is not dynamically changed since this is all client-side scripting. This means that the previously mentioned solution of using the "CRC duplicate page detection" feature in Zoom will still work, and it will filter out these URLs on indexing.

        An alternative solution would be to add these parameters to the skip pages list (on the "Skip Options" tab of the Configuration window). By simply adding "?pnl=" as a skip page entry, all URLs containing this parameter will not be indexed, so then only the original page (without any tab parameters) will be indexed.
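Since skip entries are described as excluding "all URLs containing this parameter", the filtering can be sketched as a plain substring test. The Python below is an illustrative assumption about that matching rule, not Zoom's code.

```python
def is_skipped(url, skip_entries):
    """True if any skip entry occurs anywhere in the URL (substring match)."""
    return any(entry in url for entry in skip_entries)


def urls_to_index(urls, skip_entries):
    """Drop every URL that matches an entry in the skip list."""
    return [u for u in urls if not is_skipped(u, skip_entries)]


if __name__ == "__main__":
    base = "http://www.info-radiologie.ch/echographie.php"
    found = [base] + [f"{base}?pnl=1_{i}" for i in range(1, 5)]
    print(urls_to_index(found, ["?pnl="]))  # only the base URL remains
```

One consequence of substring matching: "?pnl=" only catches URLs where pnl is the first query parameter, so a URL such as echographie.php?x=1&pnl=1_2 would need an additional "&pnl=" entry.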



        • #5
          An alternative solution would be to add these parameters to the skip pages list (on the "Skip Options" tab of the Configuration window). By simply adding "?pnl=" as a skip page entry, all URLs containing this parameter will not be indexed, so then only the original page (without any tab parameters) will be indexed.
          Thanks a lot. Works like a charm. Best regards
