  • getting stuck

    Every so often while I am indexing, my Internet connection drops.

    Sometimes, when I reconnect, Zoom carries on indexing where it left off, but other times it doesn't and I have to begin indexing all over again.

    Pressing pause/resume indexing makes no difference.

    Other times, Zoom gets stuck even when the Internet connection does not drop.

    Is there a way to save an indexing process so that, when this "getting stuck" happens, I don't have to begin all over again?
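    This thread doesn't describe how Zoom stores its own progress, so as a generic illustration only: saving an indexing run amounts to periodically writing the crawl queue (URLs still to fetch) and the set of already-fetched URLs to disk, then reloading them on restart. Below is a minimal Python sketch under that assumption; the `fetch_links` callback and the JSON state file are hypothetical stand-ins, not part of Zoom.

```python
import json
import os
import tempfile
from collections import deque
from typing import Callable, Iterable, List, Set, Tuple

# Hypothetical generic crawler; this is NOT how Zoom stores its progress.


def save_checkpoint(path: str, queue: deque, visited: Set[str]) -> None:
    """Write crawl state atomically so a crash mid-write can't corrupt it."""
    state = {"queue": list(queue), "visited": sorted(visited)}
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)  # atomic rename within one filesystem


def load_checkpoint(path: str, start_urls: Iterable[str]) -> Tuple[deque, Set[str]]:
    """Resume from saved state if it exists, otherwise start fresh."""
    if os.path.exists(path):
        with open(path) as f:
            state = json.load(f)
        return deque(state["queue"]), set(state["visited"])
    return deque(start_urls), set()


def crawl(path: str, start_urls: Iterable[str],
          fetch_links: Callable[[str], List[str]],
          checkpoint_every: int = 10) -> Set[str]:
    """Breadth-first crawl that saves its state every `checkpoint_every` pages."""
    queue, visited = load_checkpoint(path, start_urls)
    processed = 0
    while queue:
        url = queue[0]  # peek; only dequeue after the page is fully handled
        if url not in visited:
            # fetch_links may raise (e.g. the connection drops); the last
            # checkpoint is left untouched, so a resume repeats only the
            # work done since then.
            for link in fetch_links(url):
                if link not in visited:
                    queue.append(link)
            visited.add(url)
        queue.popleft()
        processed += 1
        if processed % checkpoint_every == 0:
            save_checkpoint(path, queue, visited)
    save_checkpoint(path, queue, visited)  # final state: empty queue
    return visited
```

    If the connection drops mid-run, calling `crawl` again with the same state file repeats only the work done since the last checkpoint, rather than starting from scratch.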

  • #2
    A few bugs that could cause indexing to stall were fixed in the last few months.

    If you are not using V4.2 build 1013, you should upgrade.

    But in parallel, you should also try to fix whatever problem is causing your Internet connection to drop. In most Western countries, you can get connections that stay up for months at a time in most locations.

    -----
    David



    • #3
      Thanks for that. Your newer version works perfectly.

      I know this question doesn't really belong in this forum, but Google Scholar somehow extracts an article's title from its body text, even when the page title and meta data do not contain it.

      I don't suppose you know how they do this?



      • #4
        We don't know for sure, but we can guess. Since they're only indexing a select few sites, and they require each institution's cooperation (each institution actually needs to contact Google and give it permission to index and include its material), it's quite possible that Google simply adds dedicated parsing code to pick up the relevant information based on each site's layout. This is always possible when you are building a dedicated search engine for a select number of sites.
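        As a rough illustration of what "dedicated parsing code" per site might look like, here is a minimal Python sketch: a table mapping each hostname to the HTML tag and class where that site is assumed to put its article titles. The hostnames, tags and class names are hypothetical, and this is not a description of how Google Scholar actually works.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple
from urllib.parse import urlparse

# Hypothetical per-site rules: hostname -> (tag, CSS class) holding the title.
SITE_RULES: Dict[str, Tuple[str, str]] = {
    "journals.example.edu": ("h1", "article-title"),
    "papers.example.org": ("div", "paper-heading"),
}


class FirstElementText(HTMLParser):
    """Collects the text of the first element with the given tag and class."""

    def __init__(self, tag: str, css_class: str):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.depth = 0      # > 0 while inside the matching element
        self.done = False   # set once that element has closed
        self.parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if self.done or tag != self.tag:
            return
        if self.depth:
            self.depth += 1  # a nested element with the same tag name
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.css_class in classes:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag == self.tag:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth and not self.done:
            self.parts.append(data)


def extract_title(url: str, html: str) -> Optional[str]:
    """Return the title using the site's rule, or None if the site has no rule."""
    rule = SITE_RULES.get(urlparse(url).hostname or "")
    if rule is None:
        return None
    parser = FirstElementText(*rule)
    parser.feed(html)
    parser.close()
    title = " ".join("".join(parser.parts).split())  # collapse whitespace
    return title or None
```

        A site missing from the table simply gets no special treatment.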

        It's also most likely that they do not do this for smaller or less important sites. Their FAQ says that if you wish to improve your site's rank, you would still need to change your pages' layout and provide more suitable meta data so that it can be picked up. Google builds in support for Harvard etc., however, because it helps them promote the usefulness of Google Scholar.

        Another possibility is that some sites serve different output when a client identifies itself as the Google Scholar spider, output that differs from what we would see when accessing the page with a normal browser. This is how many sites allow searching of "subscription only" material: you see (part of) the content in Google's search results, but when you go to the page, you cannot access any of it without subscribing.
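        The "different output" idea can be sketched as a server-side branch on the client's User-Agent header. In this minimal Python sketch, the crawler token `ExampleScholarBot` and the teaser length are hypothetical; a real site would also need to verify that a request really comes from the crawler (for example by reverse DNS lookup), since any client can send whatever User-Agent string it likes.

```python
# Hypothetical crawler token and teaser length, for illustration only.
TRUSTED_CRAWLER_TOKENS = ("examplescholarbot",)
TEASER_CHARS = 200
PAYWALL_NOTICE = "[Subscribe to read the full article]"


def is_trusted_crawler(user_agent: str) -> bool:
    """Naive check on the User-Agent header alone.

    A real site would also verify the client (e.g. via reverse DNS),
    because any client can send any User-Agent string.
    """
    ua = (user_agent or "").lower()
    return any(token in ua for token in TRUSTED_CRAWLER_TOKENS)


def render_article(full_text: str, user_agent: str, is_subscriber: bool = False) -> str:
    """Serve full text to subscribers and the trusted crawler, a teaser to everyone else."""
    if is_subscriber or is_trusted_crawler(user_agent):
        return full_text
    return full_text[:TEASER_CHARS] + "\n" + PAYWALL_NOTICE
```

        Any client whose User-Agent doesn't match, whether a browser or another spider, receives only the teaser.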
        --Ray
        Wrensoft Web Software
        Sydney, Australia
        Zoom Search Engine



        • #5
          Interesting stuff.

          I suspect I won't be able to persuade other websites to alter their pages or meta data just for me, though!



          • #6
            Wouldn't uploading .pdf.desc files to the relevant website's server solve the problem?



            • #7
              Yes, but you would need to be able to upload the .pdf.desc files to the same server where the .pdf files are hosted.
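              If you do have that access, a short script could generate one description file per PDF. This minimal Python sketch assumes the .pdf.desc file holds HTML-style `<title>` and `<meta name="description">` tags; check the Zoom documentation for the exact format your version expects. The `titles` and `descriptions` dictionaries are hypothetical input you would prepare yourself.

```python
from html import escape
from pathlib import Path
from typing import Dict, List, Optional


def write_desc_files(folder: Path, titles: Dict[str, str],
                     descriptions: Optional[Dict[str, str]] = None) -> List[Path]:
    """Write <name>.pdf.desc next to each PDF that has a supplied title.

    ASSUMPTION: the .desc file uses HTML-style <title> and <meta> tags;
    confirm the exact format against the Zoom documentation.
    """
    descriptions = descriptions or {}
    written = []
    for pdf in sorted(folder.glob("*.pdf")):
        title = titles.get(pdf.name)
        if title is None:
            continue  # no metadata supplied for this PDF; leave it alone
        lines = [f"<title>{escape(title)}</title>"]
        desc = descriptions.get(pdf.name)
        if desc:
            lines.append(f'<meta name="description" content="{escape(desc)}">')
        desc_path = pdf.with_name(pdf.name + ".desc")  # paper.pdf -> paper.pdf.desc
        desc_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
        written.append(desc_path)
    return written
```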
              --Ray



              • #8
                weird hybrid

                When I use

                input.zoom_button { background: transparent url(MyButton.bmp) no-repeat center top; }

                to display an image as the search button, I get this weird hybrid of a Submit button and the image I want.

                What am I doing wrong?



                • #9
                  This is a general CSS question. Consult some online resources for using CSS on forms.

                  Google results for "css for forms":
                  http://www.google.com/search?hl=en&q=css+for+forms

                  Google results for "css image submit button":
                  http://www.google.com/search?hl=en&q...+submit+button
                  --Ray



                  • #10
                    Google Scholar Revisited

                    Revisiting the "How does Google Scholar extract the title and authors from the document's body text?" question:

                    I think your theory that sites serve "different output" to the GS spider is the most plausible. Sometimes when I try to index particular websites, the Zoom spider just indexes garbage such as "///~~~~".

                    The fact that the Zoom spider got onto the page shows that it was not password protected, and the fact that GS displays the same page perfectly is why I think the site has an alternative output exclusively for GS.

