Build your own search engine with ht://Dig

Most Linux users know how easily they can run a Web server on their favorite distros. Unfortunately, serving pages is one thing — finding them is another. That’s when many users turn to ht://Dig.

ht://Dig is more than a simple search script for a Web site. It combines a powerful collection of command-line search utilities with an easy-to-use CGI script. Properly configured, they work together to form a robust, extensible search engine for a domain or intranet.

Like Google, ht://Dig can search PDF, PostScript, Microsoft Word, Microsoft Excel, and Microsoft PowerPoint files in addition to the expected plain text and HTML files, although the non-text formats depend on external converters. Unlike some search utilities, it does not need a separate database server: it keeps its index in local files managed by its bundled Berkeley DB library, which keeps software dependencies low.

ht://Dig is available as a set of stable binary packages for all the major distros. Most split the program into two packages: htdig, which contains the command-line utilities, and htdig-web, which contains the CGI script. Download and install both from your favorite repository, or get binaries and source code from the project’s site. As of this writing, the most recent production version is 3.1.6.

Out of the box, ht://Dig is limited to searching plain text and HTML files. Fortunately, a number of conversion utilities can expand its reach. This tutorial includes instructions for indexing PostScript, PDF, Microsoft Word, Microsoft PowerPoint, and Microsoft Excel files.
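
To make the converter idea concrete, here is a rough Python sketch of the underlying pattern: pick a text-extraction command based on a file's extension, and leave plain text and HTML alone. It is not ht://Dig's own mechanism. The mapping, the NATIVE set, and the conversion_command helper are hypothetical names invented for this illustration, and the tools listed (pdftotext, ps2ascii, catdoc, xls2csv, catppt) are common Linux utilities assumed here, not necessarily the ones this tutorial goes on to configure.

```python
from pathlib import Path

# Hypothetical mapping from file extension to a text-extraction command.
# The tools are common Linux utilities picked as an assumption for this
# sketch; "{src}" is replaced by the path of the file to convert.
CONVERTERS = {
    ".pdf": ["pdftotext", "{src}", "-"],  # "-" sends the text to stdout
    ".ps": ["ps2ascii", "{src}"],
    ".doc": ["catdoc", "{src}"],
    ".xls": ["xls2csv", "{src}"],
    ".ppt": ["catppt", "{src}"],
}

# Formats an indexer can read directly, with no conversion step.
NATIVE = {".txt", ".html", ".htm"}


def conversion_command(filename):
    """Return the argument list that would convert `filename` to text.

    Returns None when the file can be indexed as-is, and raises
    ValueError when no converter is registered for its extension.
    """
    ext = Path(filename).suffix.lower()
    if ext in NATIVE:
        return None
    try:
        template = CONVERTERS[ext]
    except KeyError:
        raise ValueError(f"no converter registered for {ext!r}")
    return [part.format(src=filename) for part in template]


if __name__ == "__main__":
    for name in ["index.html", "manual.pdf", "budget.xls"]:
        print(name, "->", conversion_command(name))
```

The function only builds the command and does not run it, so the sketch works even on a machine where none of these converters are installed.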

blog.revold.us/build-your-own-search-engine-with-ht-dig/

Uploaded on April 24, 2025