A Unix-based indexing and query system, well suited to indexing relatively small amounts of data. Different index types let you trade search speed against index size. It is the default search engine used in Harvest. http://glimpse.cs.arizona.edu/
SWISH-Enhanced is a fast, powerful, flexible, free, and easy to use system for indexing collections of Web pages or other text files. http://swish-e.org/
Searches all popular file types, with features including hit highlighting and natural language, fuzzy, phonic, Boolean, proximity, field, and numeric range searching. http://www.dtsearch.com/
Jakarta Lucene is a full-featured text search engine written entirely in Java. It is an open source project available for free download from Apache Jakarta. The current goals of the project are primarily to provide application and also a platform for ... http://jakarta.apache.org/lucene/
KE Texpress is an object/relational database that supports text as well as multimedia objects. Runs on a wide variety of platforms including Linux. http://www.kesoftware.com/
Cheshire II is a "Next-Generation Online Catalog and Full-Text Information Retrieval System." It features advanced IR techniques, including support for Boolean and probabilistic 'best match' ranked searching, SGML/XML as the primary data base fo... http://cheshire.berkeley.edu/
Information and selected sections of a book about indexing and compression techniques for documents and images. Also provides information about the open source IR system released with the book. http://www.cs.mu.oz.au/mg/