Spider mode and "file://" links
  • Spider mode and "file://" links

    Hi,

    I have "file://" links in my pages. When I index my site, all the "file://" links are scanned, but the linked .doc, .pdf, etc. documents are not processed by the plugins.

    How can I solve this problem?

    Thanks in advance for your help.

  • #2
    Spider mode and "file://" links

    Hi!

    I had a similar problem with the plugins for other file types (.pdf, .doc, etc.) until I figured out that I needed to add those extensions to the extension list on the Scan Options tab of the Configuration window. Have you tried that yet?

    Genice



    • #3
      Re: Spider mode and "file://" links

      Yes, the extensions are in the extension list.
      With http:// links I have no problem, only with file:// links.

      Example:
      I start the spider from the page http://server/site/index.htm
      This page contains the link file://///server/site/test.doc (equivalent to \\server\site\test.doc). "site" is a shared folder on the server and also a virtual directory in IIS.
      When I index, there is a "Scanning..." line but no "Processing..." line for the .doc file.
      If I replace the file:// link with an http:// link, there's no problem.
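      For reference, file://///server/site/test.doc is one common way of writing the UNC path \\server\site\test.doc as a URL. The Python sketch below converts between the two forms; the function names are hypothetical, and it assumes only UNC paths and drive-letter paths (such as E:\...), not every file:// variant that browsers accept.

```python
from urllib.parse import quote, unquote, urlparse


def unc_to_file_url(unc_path: str) -> str:
    # Hypothetical helper: \\server\site\test.doc -> file://///server/site/test.doc
    segments = [s for s in unc_path.lstrip("\\").split("\\") if s]
    # Percent-encode each segment so spaces and similar characters are valid in a URL
    return "file://///" + "/".join(quote(s, safe="") for s in segments)


def file_url_to_unc(url: str) -> str:
    # Hypothetical helper: file://///server/site/test.doc -> \\server\site\test.doc
    parts = urlparse(url)
    if parts.scheme != "file":
        raise ValueError(f"not a file:// URL: {url}")
    path_segments = [s for s in unquote(parts.path).split("/") if s]
    # file://server/share/doc keeps the host in netloc;
    # file://///server/share/doc leaves netloc empty and puts the host in the path
    segments = ([parts.netloc] if parts.netloc else []) + path_segments
    if not segments:
        raise ValueError(f"no path in URL: {url}")
    if len(segments[0]) == 2 and segments[0][1] == ":":
        # Local drive path such as file:///E:/docs/a.pdf -> E:\docs\a.pdf
        return "\\".join(segments)
    return "\\\\" + "\\".join(segments)


print(file_url_to_unc("file://///server/site/test.doc"))  # \\server\site\test.doc
```

      Both the five-slash form and file://server/site/test.doc map to the same UNC path here; which forms a given indexer or browser accepts is a separate question.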



      • #4
        I'm doing something similar on my website. It's a large site (600+ pages) used as an online help site for a university. The spider start URL is: http://www.iupui.edu/~seshelp/newsit..._help_home.htm

        The base URL is: http://www.iupui.edu/~seshelp/newsite/

        In the Configuration window, on the Scan Options tab, I added the extensions we use (.pdf, .doc, .rtf, .xls, etc.) and selected both the "Scan files with no extensions" checkbox directly below the extension list and the 'Scan files linked via "file://" URLs in spider mode' checkbox.

        Some of my files are buried pretty deep in subfolders, like E:\HRMSSIS\SIS\SES Training\SES Online Help\New SES Online Help Website\pdf_misc_files\student_records\grades\Manual FN Process with Final Grades 8.23.05.pdf, but Zoom still finds and indexes them. When I index, I get a "Processing PDF file..." message.

        I don't know if any of that will be helpful or frustrating, but I hope you find something useful there.
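        The three settings mentioned above (the extension list, "Scan files with no extensions", and scanning file:// links in spider mode) can be thought of as a filter applied to each link the spider finds. The Python sketch below is a hypothetical model of that filter, not Zoom's actual code; the class name, field names, and default extension set are assumptions for illustration.

```python
import posixpath
from dataclasses import dataclass, field
from urllib.parse import unquote, urlparse


@dataclass
class ScanOptions:
    # Hypothetical model of the Scan Options settings discussed in this thread
    extensions: set[str] = field(
        default_factory=lambda: {".htm", ".html", ".pdf", ".doc", ".rtf", ".xls"}
    )
    scan_no_extension: bool = False  # "Scan files with no extensions"
    scan_file_urls: bool = False     # scan files linked via file:// URLs in spider mode


def should_scan(url: str, opts: ScanOptions) -> bool:
    """Decide whether a link found by the spider would be scanned under these options."""
    parts = urlparse(url)
    if parts.scheme not in ("http", "https", "file"):
        return False
    if parts.scheme == "file" and not opts.scan_file_urls:
        return False
    # Extension of the last path component only, compared case-insensitively
    ext = posixpath.splitext(unquote(parts.path))[1].lower()
    if not ext:
        return opts.scan_no_extension
    return ext in opts.extensions
```

        In this model a file:// link to a .doc file passes the filter once both the extension and the file:// option are enabled; whether the matching plugin then processes the file is a separate step, which is where the problem in this thread turned out to be.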



        • #5
          I have the same scan options configured.
          How did you format your links? Do you have an example page with a file:// link on your site?



          • #6
            Sure! Here's a URL: http://www.iupui.edu/~seshelp/newsit...t_job_aids.htm. The link for Historical Section Change points to a PDF file, and the link below it is also a PDF file. One disclaimer I have to post at this point: this site is still under construction, AND all information contained on this site is owned by Indiana University. Sorry, but IU requires that I post a notice any time someone outside the university accesses this type of info.



            • #7
              All your links are http:// links, not file:// links.
              I have no problem with http:// links.
              In fact, I want to use file:// links because documents opened through http:// links are read-only.
              My site is an intranet, and the users have to be able to modify the documents.



              • #8
                We had a look at this and have determined that this is a bug in the current version (4.2.1003). Zoom is not scanning plugin-supported files when they are linked via file:// style links and you are indexing in Spider Mode. This will be fixed in the next build (4.2.1004).
                --Ray
                Wrensoft Web Software
                Sydney, Australia
                Zoom Search Engine
