Wikipedia


  • Wikipedia

    We have wikipedia on our site and want to make it searchable. Do you have a list of directories we should skip?

    I noticed that you have also indexed wikipedia on your homepage to demonstrate the performance of your search software.

    By the way, we have purchased the software and find it very easy to install and well-documented.

    David Hart

  • #2
    First of all, note that a "Wiki" is a type of server software that lets users create a website which anyone can freely edit.
    http://wiki.org/wiki.cgi?WhatIsWiki

    "Wikipedia" is a website that uses Wiki software to provide an online encyclopedia.
    http://en.wikipedia.org/wiki/Wikipedia

    Could you clarify whether you actually have a mirror of the Wikipedia database, or just a Wiki-based website of your own?

    If it is the latter, the skip list will differ depending on the Wiki software and version being used. Basically, you want to look for the parameters and prefixes that appear in the URLs of "edit page", "login", "print", and "discussion" links, since these are pages you would not want to index.
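
    One way to find those candidates is to tally what appears in the links of your own Wiki. The following is only a rough sketch: the function name and the example.org URLs are made up for illustration, and it assumes namespace-style pages show up as "Name:" in the last path segment, which may not hold for every Wiki package.

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qsl


def candidate_skip_terms(urls):
    """Count query parameter names and 'Namespace:' prefixes seen in a list of URLs."""
    counts = Counter()
    for url in urls:
        parts = urlsplit(url)
        # Query parameters such as action=edit or printable=yes
        for name, _value in parse_qsl(parts.query, keep_blank_values=True):
            counts[name + "="] += 1
        # Namespace-style page names such as Special:Userlogin or Talk:Foo
        page = parts.path.rsplit("/", 1)[-1]
        if ":" in page:
            counts[page.split(":", 1)[0] + ":"] += 1
    return counts


# Hypothetical links collected from a Wiki-based site
links = [
    "http://example.org/wiki/Foo",
    "http://example.org/wiki/Talk:Foo",
    "http://example.org/wiki/Special:Userlogin",
    "http://example.org/w/index.php?title=Foo&action=edit",
]
for term, count in candidate_skip_terms(links).most_common():
    print(term, count)
```

    Terms that only ever occur on edit, login, print, or discussion links are the ones worth adding to the skip list. Parameters that every page carries (such as title= in the sample above) should be left out.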

    The following is the skip list we used for indexing the online Wikipedia website (www.wikipedia.org) in our example here:
    http://www.wrensoft.com/cgi-bin/wikipedia/search.cgi

    Code:
    talk:
    Talk:
    &action
    Special:
    Wikipedia:
    Help:
    User:
    Template:
    Category:
    There may be more parameters that need to be skipped depending on your start point. But for our use, this worked quite well.
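
    To check a skip list like this against a few sample URLs before running a full index, something along these lines can help. It is a sketch only: it assumes each entry is treated as a case-sensitive substring of the URL (which is why both "talk:" and "Talk:" are listed), so confirm the actual matching rules in the Zoom documentation. The function name is hypothetical.

```python
# The skip list from the post above
SKIP_LIST = [
    "talk:", "Talk:", "&action", "Special:", "Wikipedia:",
    "Help:", "User:", "Template:", "Category:",
]


def is_skipped(url, skip_list=SKIP_LIST):
    """Return True if any skip entry occurs in the URL (assumed substring match)."""
    return any(term in url for term in skip_list)


samples = [
    "http://en.wikipedia.org/wiki/Sydney",
    "http://en.wikipedia.org/wiki/Talk:Sydney",
    "http://en.wikipedia.org/wiki/User_talk:Ray",
    "http://en.wikipedia.org/w/index.php?title=Sydney&action=edit",
]
for url in samples:
    print("skip " if is_skipped(url) else "index", url)
```

    Under that assumption only the first sample would be indexed. The "User_talk:" link is caught by the lowercase "talk:" entry.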
    --Ray
    Wrensoft Web Software
    Sydney, Australia
    Zoom Search Engine



    • #3
      Thanks very much. It did the trick.

      David
