Spec Intranet Search

Intranet Search Engine


Basic Requirements (Musts)

  1. Document types - must search at least HTML, plain-text files and PDF documents
  2. Directory indexing takes priority over crawling - must be able to index a directory tree of files rather than relying on an HTTP crawler that only follows links
  3. Source code - access to the source code, so that we can modify the engine ourselves
  4. Speed - must be as fast as possible
  5. Filtering - ability to exclude specified files from indexing, ideally via regular expressions
  6. Character set - must handle ISO-Latin-1, umlauts and spaces
  7. Operators - at least AND and OR
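
To make requirements 2, 5, 6 and 7 concrete, here is a minimal Python sketch of what they ask for: walking a directory tree, skipping files that match regular expressions, reading ISO-Latin-1 text, and answering AND/OR queries from an inverted index. It is an illustration only and not how any of the evaluated engines works. The names `EXCLUDE`, `iter_files`, `build_index` and `search` and the two exclusion patterns are assumptions made for this example, and only plain-text files are handled (HTML and PDF would need a conversion step first).

```python
import os
import re
from collections import defaultdict

# Hypothetical exclusion patterns; the requirements do not name any.
EXCLUDE = [re.compile(r"\.bak$"), re.compile(r"(^|/)tmp/")]
# \w is Unicode-aware in Python 3, so umlauts count as word characters.
TOKEN = re.compile(r"\w+")


def iter_files(root, exclude=EXCLUDE):
    """Yield paths below root whose relative path matches no exclusion pattern."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root).replace(os.sep, "/")
            if not any(p.search(rel) for p in exclude):
                yield path


def build_index(root, encoding="latin-1"):
    """Map each lower-cased word to the set of files that contain it."""
    index = defaultdict(set)
    for path in iter_files(root):
        with open(path, encoding=encoding, errors="replace") as fh:
            for word in TOKEN.findall(fh.read().lower()):
                index[word].add(path)
    return index


def search(index, words, operator="AND"):
    """Combine the file sets of all words: AND is intersection, OR is union."""
    sets = [index.get(w.lower(), set()) for w in words]
    if not sets:
        return set()
    combine = set.intersection if operator == "AND" else set.union
    return combine(*sets)
```

A call such as `search(build_index("/srv/docs"), ["adapter", "netzwerk"], "AND")` would return the files containing both words; the path `/srv/docs` is a placeholder. Because the walk uses the file system directly, file names with spaces need no special handling.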

Nice to have

  1. Search option - partial-word search
  2. Fuzzy searching - searching with misspelled words
  3. Output - partial output of the documents found
  4. Licence - Open Source / GNU General Public License
  5. Score - relevancy score, with the ability to modify the formula
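
The partial-word, fuzzy and scoring items above can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the method of any product listed below: `expand_term`, `tf_idf` and `rank` are hypothetical names, fuzzy matching uses the standard library's `difflib.get_close_matches`, and the TF-IDF formula is just one possible relevancy score, passed in as a parameter so that it can be swapped out.

```python
import difflib
import math


def expand_term(term, vocabulary, fuzzy_cutoff=0.8):
    """Return vocabulary words that contain term or look like a misspelling of it."""
    term = term.lower()
    partial = {w for w in vocabulary if term in w}
    fuzzy = set(
        difflib.get_close_matches(term, list(vocabulary), n=5, cutoff=fuzzy_cutoff)
    )
    return partial | fuzzy


def tf_idf(term_count, doc_length, docs_with_term, total_docs):
    """One possible relevancy formula; any function of these arguments will do."""
    tf = term_count / doc_length
    idf = math.log((1 + total_docs) / (1 + docs_with_term)) + 1
    return tf * idf


def rank(query_terms, docs, score=tf_idf):
    """Score each document (name -> list of words) and return the best first."""
    results = []
    for name, words in docs.items():
        total = 0.0
        for term in query_terms:
            count = words.count(term)
            if count:
                holders = sum(1 for w in docs.values() if term in w)
                total += score(count, len(words), holders, len(docs))
        if total > 0:
            results.append((name, total))
    return sorted(results, key=lambda item: item[1], reverse=True)
```

Passing a different function as `score` changes the ranking without touching the rest of the code, which is the kind of adjustability the score item asks for.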

Products already known

Product: Vestris
URL: www.vestris.com
Significant feature(s):
  [ - ] Indexes only via URL spidering; cannot index file trees
  [ - ] Commercial; source code not available
Status: ---

Product: Swish-e
URL: sunsite.berkeley.edu/SWISH-E
Significant feature(s):
  [ + ] Handles file names with spaces correctly (the Perl script had to be modified, but the engine itself supports this)
  [ + ] Better matches (compare, e.g., «3COM Adapter»)
  [ + ] Well documented, GNU General Public License
Status: Installed

Product: UDMSearch
URL: mysearch.udm.net
Significant feature(s):
  [ + ] Well documented, GNU General Public License
  [ + ] Database-driven
Status: Installed

Product: ht://Dig
URL: www.htdig.org
Significant feature(s):
  [ + ] Synonym base; expands known English words to their inflected forms, e.g. contact to contacted, contacting, contacts
  [ - ] Have asked whether ht://Dig can index a directory tree of files rather than using the crawler to just follow links
Status: Installed

Product: Glimpse / Webglimpse
URL: webglimpse.org
Significant feature(s):
  [ - ] Confusingly organized documentation; $200 - $500
Status: Installed
