Link Popularity & negative exact phrase searches

  • Link Popularity & negative exact phrase searches

    Imagine two pages A & B.

    Page A has the words:
    A rose by any other name would smell as sweet

    Page B has the words:
    A rose by any other smell would be a sweet name

    Page A has 1000 external pages linking to it. Page B has 2 external pages linking to it. Page A is hence more important and should be shown first.

    I see this a lot on my own site. Some words aren't very frequent so the results aren't ranked in a way that pushes the more important pages to the top. Any ideas?

    Could you get the rank or link popularity of each page and have this be used as part of the ordering function?
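One way to picture the request above is a score that multiplies a page's text-match score by a damped function of its inbound link count. This is only a minimal sketch: `blended_score`, `link_weight`, and the sample scores are hypothetical names and values chosen for illustration, not anything Zoom provides.

```python
import math

def blended_score(text_score, inbound_links, link_weight=0.5):
    """Scale a text-match score by a log-damped inbound link count.

    The logarithm keeps a page with 1000 links from completely
    drowning out the text score. All names here are hypothetical.
    """
    return text_score * (1.0 + link_weight * math.log10(1 + inbound_links))

# Pages A and B from the post: same words, so assume an equal text score.
pages = [
    {"name": "B", "text_score": 10.0, "inbound_links": 2},
    {"name": "A", "text_score": 10.0, "inbound_links": 1000},
]

ranked = sorted(
    pages,
    key=lambda p: blended_score(p["text_score"], p["inbound_links"]),
    reverse=True,
)
print([p["name"] for p in ranked])  # ['A', 'B']
```

With equal text scores, the link count acts as the tie-breaker, which is the ordering the post asks for.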

  • #2
    Zoom doesn't use link popularity to determine a web page's rank at the moment. This is because link popularity is generally not as meaningful when you index a single site (or a few sites) and can only judge by internal links, and indexing such sites is Zoom's primary purpose.

    It only becomes more meaningful when you attempt to index many, many external sites, as an Internet-wide search engine such as Google does. It also requires a lot more storage to keep track of all the links encountered during the spider process, and the link relationships between pages.

    Originally posted by GregRaiz
    Page A has 1000 external pages linking to it. Page B has 2 external pages linking to it. Page A is hence more important and should be shown first.
    You should note that this would only be picked up if we actually indexed the 1000 external pages. Chances are these are scattered across 1000s of different websites, which we would have to index entirely to locate the links. How many external websites are you indexing? Link popularity only becomes useful once you start to index a large number of external websites. While V5 of Zoom will be better suited to this, link popularity is still not of high importance to us because it is of only minimal benefit at this point. That is not to say that we would rule it out for the future.
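The point about only seeing links from pages that were actually indexed can be illustrated with a small sketch. It assumes a hypothetical `crawled` mapping from each indexed page URL to the URLs it links to. The example hosts are made up, and this is not how Zoom stores its data.

```python
from collections import Counter
from urllib.parse import urlparse

def inbound_link_counts(crawled):
    """Count external inbound links, using only pages that were crawled.

    `crawled` maps each indexed page URL to the list of URLs it links to.
    Links from pages that were never crawled cannot be counted at all.
    """
    counts = Counter()
    for source, targets in crawled.items():
        source_host = urlparse(source).netloc
        for target in set(targets):  # count each linking page once
            if urlparse(target).netloc != source_host:  # external links only
                counts[target] += 1
    return counts

crawled = {
    "http://site-one.example/a": ["http://mysite.example/pageA",
                                  "http://site-one.example/b"],
    "http://site-two.example/x": ["http://mysite.example/pageA",
                                  "http://mysite.example/pageB"],
    "http://mysite.example/index": ["http://mysite.example/pageA"],
}

counts = inbound_link_counts(crawled)
print(counts["http://mysite.example/pageA"])  # 2
```

Even if 1000 pages link to page A on the real web, the count here is 2, because only two external linking pages were part of the crawl.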

    Originally posted by GregRaiz
    Some words aren't very frequent so the results aren't ranked in a way that pushes the more important pages to the top. Any ideas?
    Take a look at this page:
    http://www.wrensoft.com/zoom/support/faq_score.html
    --Ray
    Wrensoft Web Software
    Sydney, Australia
    Zoom Search Engine


    • #3
      I think what GregRaiz is trying to say (similar to what some people wrote in other posts, and to what I experience myself a fair number of times) is that search results unfortunately often seem to rank rather irrelevant pages highly, despite trying and testing various settings, boosting titles, headings, etc.

      For example, most of the time people who are searching are not using "keywords" but "keyphrases". They learn to search that way day after day when using Google, Yahoo, etc. - especially as those search engines get gradually better and better at "understanding" what people are really searching for.
      However, many times when using Zoom to search for a key phrase, none of the pages ranked highly even contain that key phrase - not even once - 0 - nada! They have, let's say, keyword1 and then keyword2 a few lines down, again and again, but never the actual keyphrase. You might need to dig a few pages down in the Zoom search results to finally find the first "real" result actually relevant for that particular search phrase.

      Now I know those results can be improved by using "" - most people won't do that though, they'll just quit and leave, unless they really were determined to find what they were looking for.


      I'm aware major search engines are using, as the previous poster pointed out, link anchor text and lots of other stuff to determine what is more relevant for a search query - and I know many of the factors in the "Google algorithm" could never be implemented in an "in-house" search engine - however, when running a few tests on a couple of sites using both Zoom and Google for the same searches (via a site:mydomain.com search phrase query), the results unfortunately seemed quite a bit better in Google's favor.

      At the end of the day, often not even using quotes can help because people might search for the keywords in the wrong order or excluding certain other "needed" words - and thus quotes wouldn't return any results; but then not using quotes comes up with nothing good because all those words might appear within the first ranked pages but never close together and never relevant for the actual search.

      So unfortunately sometimes the problem is not that result number 4 should rank first - that would be a "mild one" - but something more like result number 21 should be ranked first, and 1-20 maybe shouldn't be anywhere near the first pages of results.

      There must be a way for words that appear close together to get a boost in the results, compared to longer pages which mention all the words separately, multiple times, in an irrelevant context.
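As a rough illustration of the proximity idea, the sketch below finds the shortest run of words on a page that contains every query term and turns it into a multiplier. The function names and the `max_boost` parameter are hypothetical, and this is not a description of how Zoom scores pages.

```python
def smallest_window(words, terms):
    """Length (in words) of the shortest span containing every term, or None."""
    terms = set(terms)
    last_seen = {}  # term -> most recent position
    best = None
    for i, word in enumerate(words):
        if word in terms:
            last_seen[word] = i
            if len(last_seen) == len(terms):
                # Shortest span ending at i starts at the oldest last-seen term.
                span = i - min(last_seen.values()) + 1
                if best is None or span < best:
                    best = span
    return best

def proximity_boost(text, query, max_boost=2.0):
    """Multiplier between 1.0 and max_boost; highest when terms are adjacent."""
    words = text.lower().split()
    terms = query.lower().split()
    window = smallest_window(words, terms)
    if window is None:
        return 1.0  # at least one term is missing: no boost
    return 1.0 + (max_boost - 1.0) * len(set(terms)) / window

product_page = "blue widget deluxe model"
blog_page = "blue sky today and later a post about some widget"

print(proximity_boost(product_page, "blue widget"))  # 2.0
print(proximity_boost(blog_page, "blue widget"))
```

Note that this sketch ignores word order, so "widget blue" would earn the same boost as "blue widget". That matches the concern above about people typing keywords in the "wrong" order.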

      And again, these are "extreme examples", but there are of course (and naturally) plenty of "milder" situations. For example, my blog pages get ranked highly for a product search, higher than the actual product pages, despite not actually focusing on that particular product - the blog page has multiple posts on multiple topics, whereas the product page (even if shorter) has the actual product in the title, body and headings - and all this happens even after boosting headings and titles...

      So what I'm trying to say is not that Zoom is not a great product - in fact I like it very much, I'm amazed by its speed and power but still, I believe some more attention should be paid not only to power, speed and flexibility - but also to the quality of results. In the end a search engine is only as good as its search results.


      • #4
        Yes, without a doubt, there is always more that can be done to improve the relevancy of searches.

        As you mentioned, exact phrases are supported in Zoom. The "problem" however is that they require the user to enclose the search query in double quotes, and not all users are careful enough to read instructions or search tips (or to bother with a second search).

        The trouble with implicitly applying exact phrase matching to every query is that exact phrase information is expensive, both in disk space (the size of your index files and the time it takes to upload them) and in performance (the actual search time required). As such, we cannot easily perform exact phrase searches for every search term and score them accordingly.
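To see why phrase information costs extra, compare a plain word index (which pages contain a word) with a positional index (where on each page the word occurs). The sketch below is a generic illustration using the two example pages from the first post. It makes no claim about Zoom's actual index format, and all function names are hypothetical.

```python
from collections import defaultdict

def build_indexes(pages):
    """Build a plain index and a positional index from {page_id: text}."""
    plain = defaultdict(set)                             # word -> {page ids}
    positional = defaultdict(lambda: defaultdict(list))  # word -> page id -> [positions]
    for page_id, text in pages.items():
        for pos, word in enumerate(text.lower().split()):
            plain[word].add(page_id)
            positional[word][page_id].append(pos)
    return plain, positional

def plain_entries(plain):
    """One entry per distinct (word, page) pair."""
    return sum(len(ids) for ids in plain.values())

def positional_entries(positional):
    """One entry per word occurrence, so it grows with total page length."""
    return sum(len(p) for by_page in positional.values() for p in by_page.values())

def has_phrase(positional, page_id, phrase):
    """True if the words of `phrase` occur consecutively on the page."""
    words = phrase.lower().split()
    starts = positional.get(words[0], {}).get(page_id, [])
    return any(
        all(start + k in positional.get(w, {}).get(page_id, [])
            for k, w in enumerate(words))
        for start in starts
    )

pages = {
    "A": "A rose by any other name would smell as sweet",
    "B": "A rose by any other smell would be a sweet name",
}
plain, positional = build_indexes(pages)
print(plain_entries(plain), positional_entries(positional))  # 20 21
print(has_phrase(positional, "A", "other name"))  # True
print(has_phrase(positional, "B", "other name"))  # False
```

On these two short pages the difference is a single entry, but the plain index stores each word once per page while the positional index stores every occurrence, so the gap widens on long pages where words repeat. The phrase check also has to compare positions instead of just intersecting page sets, which is where the extra search time comes from.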

        It would be nice if it was easy to add things like this, but there is always a trade-off in other priorities - you say performance is not important, but few people would sit around for a search that takes over 3 seconds. Similarly, people don't want massive index files that hog up the majority of their web hosting disk space.

        That is not to say that improvements cannot be made. We will, of course, continue to look into ways to improve the relevancy of searches. I'm just hoping to explain that these are not simple features to add, and that there is a lot more to consider than it may seem.
        --Ray
        Wrensoft Web Software
        Sydney, Australia
        Zoom Search Engine


        • #5
          One problem is that even if people use the "exact phrase search", they often do not get the results they expect, because negation of a phrase is not supported. That is, if you search for A -B you get all pages with A but without B, and searching for A "B C" gets pages with A and also the phrase B C. However, A -"B C" is treated the same as A "B C", with no warning that the negation was ignored. This is not only different from most other search engines, but inconsistent within Zoom. The best solution would be to add the ability to negate phrases, and I do not understand how that would add much to the search overhead. At a minimum, the user should be told that the requested search was modified.
          -Gabe Fineman
          Washington, DC [still defranchised]


          • #6
            You're right. It was something we had on our list to do, but we didn't get around to it. As you said, we should either add a warning that the negative exact phrase is ignored or add support for negative phrase searches.
            --Ray
            Wrensoft Web Software
            Sydney, Australia
            Zoom Search Engine


            • #7
              Update: Exclusive/negative exact phrase searches will be introduced in Version 5.0. Example syntax:

              cat -"dog food"
              This will return all pages containing the word "cat" but without the exact phrase "dog food".
              --Ray
              Wrensoft Web Software
              Sydney, Australia
              Zoom Search Engine
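The semantics of a query such as cat -"dog food" can be sketched in a few lines. This is only an illustration of the behaviour described above, not Zoom's parser: `parse_query`, `contains_phrase`, and `matches` are hypothetical names, and the sketch requires every non-negated clause to be present on the page.

```python
import re

# Either a quoted phrase or a bare word, each optionally prefixed with "-".
TOKEN = re.compile(r'(-?)"([^"]+)"|(-?)(\S+)')

def parse_query(query):
    """Split a query into (negated, words) clauses; quoted text is one clause."""
    clauses = []
    for m in TOKEN.finditer(query):
        if m.group(2) is not None:
            negated, text = m.group(1) == "-", m.group(2)
        else:
            negated, text = m.group(3) == "-", m.group(4)
        clauses.append((negated, text.lower().split()))
    return clauses

def contains_phrase(words, phrase):
    """True if `phrase` (a list of words) occurs consecutively in `words`."""
    n = len(phrase)
    return any(words[i:i + n] == phrase for i in range(len(words) - n + 1))

def matches(text, query):
    """Page matches if every plain clause is present and every negated one absent."""
    words = re.findall(r"\w+", text.lower())
    for negated, phrase in parse_query(query):
        if contains_phrase(words, phrase) == negated:
            return False
    return True

query = 'cat -"dog food"'
print(parse_query(query))  # [(False, ['cat']), (True, ['dog', 'food'])]
print(matches("My cat sleeps all day", query))               # True
print(matches("The cat ate the dog food", query))            # False
print(matches("The cat likes food but not the dog", query))  # True
```

The last case shows the difference from negating the individual words: the page contains both "dog" and "food", but not the exact phrase, so it is still returned.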
