Anyone know why Google is indexing search result pages?


  • Anyone know why Google is indexing search result pages?

    Hello, I'm not complaining at all; it's great that this is happening. I'd just like to understand why, so maybe we can get it to work the way we need someday.

    Google is indexing our main search page, from the V5 search engine, many times over.

    It's not just showing a few copies of our search page, but many.

    Each listing of our search page in Google has a different query URL.

    Can anyone tell me why Google lists so many copies of the V5 search page, each with a query already loaded? Please let me know, thank you very much.

  • #2
    It is hard to know without doing a forensic examination of your server logs, the search terms being used, and the content Google is displaying.

    But it is extremely unlikely that the GoogleBot is coming up with search words on its own and entering them into your search box. (Not impossible but extremely unlikely).

    Much more likely is that somewhere on the internet there are links to your web site, e.g. in public forums, and that these links are fully formed search queries. The GoogleBot then follows these links and indexes the search results generated by Zoom.

    Of course it is good to have content indexed by Google. The minor downside is that Google places some extra load on your server each time it indexes a search result page. (If you find the load is too much, you can use a robots.txt file to stop or slow down Google.)
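
As a rough illustration of the robots.txt option, the Python sketch below holds a hypothetical robots.txt that asks compliant crawlers to stay out of the search script, then checks the rules with the standard library's `urllib.robotparser`. The host `www.website.com` and the path `/cgi/search.cgi` are assumptions borrowed from the example URL quoted in this thread, and `is_allowed` is a helper name invented here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents. The real file would sit at the site
# root (for example http://www.website.com/robots.txt). The script path
# is an assumption; adjust it to wherever your search script lives.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi/search.cgi
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    base = "http://www.website.com"
    # A search result URL: covered by the Disallow rule.
    print(is_allowed("Googlebot", base + "/cgi/search.cgi?zoom_query=helps"))
    # An ordinary page: not covered by any rule.
    print(is_allowed("Googlebot", base + "/index.html"))
```

Only crawlers that choose to honor robots.txt are affected, and blocking the search script would also give up the extra visitors those listings bring.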


    • #3
      Thank you. It sounds like no one else has told you this is happening... It seems like it would be happening to others too, since we haven't done anything different to our search other than follow the basic setup instructions for the CGI. You might be able to use this as an extra selling point for your search engine if you are interested, because it does bring more visitors to our site every week.

      Here is a real example; maybe it will give you a better clue why it might be happening. It would sure help us to know why it's happening. So far it's worth the tradeoff for any server load, and it seems like Google is carrying a lot of it because all the links are on their site. It's nice because these extra links in Google are bringing our site a lot more unexpected visitors.

      This is how the extra links look in Google, which might help a lot to explain what is happening.

      Exact same title and name of main search engine page is always here
      always shows the exact same meta description for each link as well...
      www.website.com/cgi/search.cgi?zoom_cat%5B%5D=1&zoom_query=helps - 25k -


      The URL on the bottom line, as in the example just above, is the only part of the listing that changes in Google; each listing shows a different URL, and the URL always contains a query.
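
To make the listing easier to read, here is a small Python sketch that decodes such a URL with the standard library. It assumes the parameter names are `zoom_cat[]` and `zoom_query` (the `zoom_query` spelling appears in other Zoom URLs in this thread); `search_params` is a helper name made up for the example.

```python
from urllib.parse import parse_qs, urlsplit


def search_params(url: str) -> dict:
    """Decode the query string of a search URL into {name: [values]}."""
    return parse_qs(urlsplit(url).query)


if __name__ == "__main__":
    # Assumed reconstruction of the listing quoted above.
    url = "http://www.website.com/cgi/search.cgi?zoom_cat%5B%5D=1&zoom_query=helps"
    for name, values in search_params(url).items():
        # %5B%5D is the percent-encoded form of "[]".
        print(name, values)
```

Read this way, the listing is not the blank search page but a complete request: one category ticked and the word "helps" as the query. Anything that stores that full URL as a hyperlink reproduces the same results page when followed.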

      Since the URL starts with the address of our own website, like www.website.com, and not anyone else's website, it almost seems like Googlebot is going to our main search page and finding a different query already loaded in it on each visit. That doesn't make sense, because when we go to our main search page ourselves, there is never anything already loaded in the search box. Is Googlebot reading something all on its own, for some reason, from the main search page that you might know of?

      As mentioned, it's also showing the query itself in the URL. In the above example, "helps" was the search word in the query this time. Most of the queries and search words it shows in Google are arbitrary words of any kind and not things already on our web site. We also don't see anyone doing these same searches on our own site when we look at all our page counters. Our site is not very active yet, except that it is listed in the biggest search engines. Google has, no kidding, about 35 extra pages of results just listing all these extra links to the main search page with these queries (at 10 listings per Google page). That's a considerable lot considering how inactive our site is, which is why we are curious about why it is happening.

      Do you know why the listings would look like this if they were coming from another site? Or do you have any other guesses on why it is happening? Any guesses would be fantastic coming from you professionals. Guesses are all we need to help us understand it a little bit better, because right now we are very much in the dark. Please let me know if you have any more speculation on this; it would really be appreciated. Thank you very much.
      Last edited by rianna; Jan-17-2008, 09:00 PM.


      • #4
        I still think there is a web page somewhere on the net that has one or several hyperlinks to your search page, with the query string as part of the link.

        I would bet that if you look at your server logs over several days or weeks, you will see that Google is always hitting the same URLs with the same search terms, e.g. always with the query "helps".

        I don't think Googlebot is coming to your site to find the search box magically filled out with random words and a category selected.

        We saw the same thing in our server logs. Google found links in posts in this forum and followed them to hit our search function. This site is already very well indexed by Google (5000+ pages), so we really didn't need the extra load from Google and a dozen external bots hitting our PHP search scripts.
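
The log check described above can be sketched roughly as follows. This is only an illustration, assuming Apache's "combined" log format and the `/cgi/search.cgi` path and `zoom_query` parameter seen in this thread. The function name `googlebot_queries` and the sample log lines are invented for the example and are not real traffic.

```python
import re
from collections import Counter
from urllib.parse import parse_qs, urlsplit

# Apache "combined" format: host ident user [time] "request" status bytes
# "referer" "user-agent". Other log formats need a different pattern.
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<target>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Invented sample lines, using documentation-only IP addresses.
SAMPLE_LOG = [
    '192.0.2.10 - - [17/Jan/2008:10:00:00 +0000] "GET /cgi/search.cgi?zoom_cat%5B%5D=1&zoom_query=helps HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.10 - - [18/Jan/2008:11:30:00 +0000] "GET /cgi/search.cgi?zoom_cat%5B%5D=1&zoom_query=helps HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.77 - - [18/Jan/2008:12:00:00 +0000] "GET /cgi/search.cgi?zoom_query=cane HTTP/1.1" 200 4096 "http://www.website.com/" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)"',
]


def googlebot_queries(lines, script="/cgi/search.cgi", param="zoom_query"):
    """Count (search term, referer) pairs for requests to the search script
    whose User-Agent string mentions Googlebot."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m or "googlebot" not in m["agent"].lower():
            continue
        parts = urlsplit(m["target"])
        if parts.path != script:
            continue
        for term in parse_qs(parts.query).get(param, []):
            counts[(term, m["referer"])] += 1
    return counts


if __name__ == "__main__":
    for (term, referer), hits in googlebot_queries(SAMPLE_LOG).most_common():
        print(hits, term, referer)
```

If the same handful of terms keeps recurring, that fits the idea of stored links being re-crawled rather than a bot typing random words. Crawlers often send no Referer header, in which case the log shows "-", and the User-Agent string can be forged, so treat the output as a hint rather than proof.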


        • #5
          That helps a whole lot, really appreciate that. What do you mean by looking at the server logs? I can easily look at server stats but don't know where to find what you're talking about. Do you know what the area you mean is generally called? Thanks.


          • #6
            All web server software provides the option of logging all server requests made. These are the raw log files from which your statistics are generated.

            You did not say whether you are using IIS or Apache, but if you do a search on Google for either "IIS server logs" or "Apache server logs", you will find plenty of information and documentation.

            Here's some to get started:
            Logfiles/Reporting in IIS

            Log files - Apache HTTP Server
            --Ray
            Wrensoft Web Software
            Sydney, Australia
            Zoom Search Engine


            • #7
              Thanks, I must be missing something. The only way I can track this down is if people are clicking on these links on other sites. If no one is clicking on the links, they probably won't show up in the logs, because no one is following those links to our site. Unless there is something you are saying that I am missing.


              • #8
                You could try using the Google backlinks feature. But it doesn't always work too well.


                • #9
                  Originally posted by rianna
                  Thanks, I must be missing something. The only way I can track this down is if people are clicking on these links on other sites. If no one is clicking on the links, they probably won't show up in the logs, because no one is following those links to our site. Unless there is something you are saying that I am missing.
                  When GoogleBot indexes a web page, it will follow these links, which is the same as someone clicking on them. So you will also see this in your logs, and you should be able to confirm where these links come from through the referral address.
                  --Ray


                  • #10
                    Thanks a lot, really appreciated the help. I thought it might help you to know the end result of all this for your own information. It seems strange that anyone would list almost 40 links to our search engine on their pages with the queries already loaded, so please let me know someday if you bump into a connection by adding it to this thread. I don't want to list my email in this forum. It would be a great extra Zoom feature with no downside that I can see. I looked through all the referring-page stats and don't see our search engine links on any of the referring pages, which doesn't mean they don't exist, only that I can't find them. Best wishes and thanks for Zoom.


                    • #11
                      If you can give us a link to your website, we could do a quick check to see if it's anything obvious. What is usually more common is if your own website is linking to the search engine, with recommendations for common search phrases. You can see this in action here (see "Suggested search words" links on the left).

                      But otherwise, it is not all that unusual for other sites to link to it. There are a lot of spam bots out there which harvest links in all sorts of ways. Some will even fill in text boxes with random data in an attempt to find more links. While GoogleBot won't do this, if a spam bot does, and creates spam pages with these links, GoogleBot may eventually find these pages and index them.
                      --Ray


                      • #12
                        Similar situation

                        I have noticed a similar situation. If I do a search on Google for "http://www.bnbfinder.com/search.php?zoom_query=" I find one site, which is one of those spammy web linking sites. When I got there, though, I couldn't actually find an active link to our site on the page.

                        This would indicate that Googlebot itself isn't doing searches on the site with random keywords or anything like that. What I have noticed is that in Webmaster Central, Google's tool for checking how it is indexing our site and the like, I find a list of hundreds of pages that it says produce duplicate header information. When I look at the list, they are all Zoom search result pages. Here are some examples of the pages it says it found:
                        • /search.php?zoom_and=1&zoom_page=2&zoo..._page=10&zoom_query=estonia&zoom_sort=0
                        • /search.php?zoom_and=1&zoom_per_page=10&zoom_query=Great&zoom_sort=0
                        • /search.php?zoom_and=1&zoom_per_page=10&zoom_query=Monica&zoom_sort=0
                        • /search.php?zoom_and=1&zoom_per_page=10&zoom_query=Wellington&zoom_sort=0
                        • /search.php?zoom_and=1&zoom_per_page=10&zoom_query=brilliant&zoom_sort=0
                        • /search.php?zoom_query=brewster
                        • /search.php?zoom_query=bridal
                        • /search.php?zoom_query=cambodia
                        • /search.php?zoom_query=cane
                        • /search.php?zoom_query=dekalb
                        • /search.php?zoom_query=dunes


                        It would appear that somehow Google is finding these pages even though there is nothing in Google's search index that has any of these pages listed. While I can block Google's bot from going to these pages, that doesn't really explain how it is getting there or whether there is some inherent problem.

                        This may or may not be a problem with Zoom Search.


                        • #13
                          Originally posted by BnBFinder.com
                          I have noticed a similar situation. If I do a search on Google for "http://www.bnbfinder.com/search.php?zoom_query=" I find one site, which is one of those spammy web linking sites. When I got there, though, I couldn't actually find an active link to our site on the page.
                          That's because the page has changed since Google indexed it. These spammy web sites usually change rapidly like that, or they are generated on the spot, so the results can be fairly random. If you want to see the actual page indexed by Google, click on the "Cached" link next to the search result. You will then see the actual link on the page. Here is the cached page in question and you can see the link is actually on those sites.

                          Originally posted by BnBFinder.com
                          It would appear that somehow Google is finding these pages even though there is nothing in Google's search index that has any of these pages listed. While I can block Google's bot from going to these pages, that doesn't really explain how it is getting there or whether there is some inherent problem.
                          This is caused by the same issue we've been describing above. Google follows the links it finds on these spammy pages. And these spam sites have their own scripts/bots which crawl the Internet for links, often in less than sensible ways, just to find as many links as they can to spam with.

                          And as mentioned before, someone may have linked to the search result page once, somewhere, on a forum, and it's been picked up by spiders ever since.

                          Originally posted by BnBFinder.com
                          This may or may not be a problem with Zoom Search.
                          It is not a problem with Zoom.

                          To avoid this, you can set up a robots.txt file and ask Google (and other polite bots) not to index the search page. However, there really is nothing you can do to block spambots, since it's quite likely that they would not be law-abiding (they won't follow robots.txt instructions), nor would they allow themselves to be identified (they are likely to pretend to be Internet Explorer in most cases, but may even pretend to be Googlebot!).
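
Since a bot can claim to be Googlebot, one commonly described check is a reverse DNS lookup on the client IP followed by a forward lookup on the returned hostname. The Python sketch below illustrates that idea under stated assumptions: the accepted hostname suffixes (`.googlebot.com`, `.google.com`) should be confirmed against Google's own documentation, the function name `is_real_googlebot` is invented here, and the forward lookup used covers IPv4 only.

```python
import socket

# Assumed hostname suffixes for genuine Google crawlers; verify these
# against Google's current documentation before relying on them.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_real_googlebot(ip, reverse=socket.gethostbyaddr,
                      forward=socket.gethostbyname_ex):
    """Reverse-then-forward DNS check for a client claiming to be Googlebot.

    The reverse and forward lookups are injectable so the logic can be
    exercised without touching the network.
    """
    try:
        hostname = reverse(ip)[0]            # IP -> hostname
    except OSError:
        return False
    if not hostname.lower().endswith(GOOGLE_SUFFIXES):
        return False
    try:
        addresses = forward(hostname)[2]     # hostname -> list of IPv4 addresses
    except OSError:
        return False
    # The hostname must map back to the IP that made the request.
    return ip in addresses


if __name__ == "__main__":
    # Replace with an address taken from your own server log.
    print(is_real_googlebot("192.0.2.10"))
```

A request that fails this check but carries a Googlebot User-Agent is a candidate for the spam bots described above. DNS lookups are slow, so a check like this is better run over log files afterwards than on every live request.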
                          --Ray


                          • #14
                            Originally posted by Ray
                            This is caused by the same issue we've been describing above. Google follows the links it finds on these spammy pages. And these spam sites have their own scripts/bots which crawl the Internet for links, often in less than sensible ways, just to find as many links as they can to spam with.

                            And as mentioned before, someone may have linked to the search result page once, somewhere, on a forum, and it's been picked up by spiders ever since.
                            If Google were following the links, then they should appear in the search index. They do not appear in Google's search index. At least they are not being displayed in the search index.


                            • #15
                              Originally posted by BnBFinder.com
                              If Google were following the links, then they should appear in the search index. They do not appear in Google's search index. At least they are not being displayed in the search index.
                              Google can follow the link and then decide that it is not worth indexing or that it should be disregarded. This, in fact, would make a lot of sense considering you said the following:

                              Originally posted by BnBFinder.com
                              What I have noticed is that in Webmaster Central, Google's tool for checking how it is indexing our site and the like, I find a list of hundreds of pages that it says produce duplicate header information. When I look at the list, they are all Zoom search result pages. Here are some examples of the pages it says it found:
                              • /search.php?zoom_and=1&zoom_page=2&zoo..._page=10&zoom_query=estonia&zoom_sort=0
                              • /search.php?zoom_and=1&zoom_per_page=10&zoom_query=Great&zoom_sort=0
                              • [... etc]
                              This indicates that Google did find and follow these links (otherwise it would not know they have duplicate header information). It then decided not to include them in the index.
                              --Ray