Baagle Desktop Search v2.0
==========================

Description
-----------

This is a work-alike to things like Google/Yahoo Desktop Search.

It is primarily designed for UNIX-like systems, but works passably on Windows
(it's received about 10 minutes of testing using ActivePerl and Swish-E for
Windows; YMMV).  The basic principle is that you run an indexer that indexes all
of your files, then run a simple dummy webserver which serves you the results
of searches in a convenient web environment.

Requirements
------------

This package has the following requirements:

Swish-e 2.4.x
Perl    5.6+

Additionally, the following non-standard Perl modules are required:

YAML
Tie::IxHash
LWP::UserAgent
Parallel::ForkManager
POE::Component::Server::HTTPServer

(Some of these, particularly the last, may have their own requirements.)
For all of the requirements I'd recommend trying your operating system's native
package installer first, then the Perl module installer (PPM or CPAN), and only
then installing them by hand.  So:

Windows:

1) Grab and use installers for Swish-E and ActiveState Perl
2) Run PPM to install the Perl modules

Unix:

1) Use apt-get, yum, portinstall, etc to grab and install everything for you;
   in the likely case that you can't find packages for one or more of the
   Perl modules:
2) Use the perl CPAN module to install the other modules
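
Before reaching for CPAN it helps to know which modules are actually missing.
The following is a minimal sketch, not part of this package: the function name
and the PERL override variable are made up for illustration.  It relies only on
the fact that "perl -MSome::Module -e 1" exits non-zero when the module cannot
be loaded.

```shell
#!/bin/sh
# Print every Perl module from the argument list that fails to load.
# PERL is a hypothetical override for the interpreter; it defaults to "perl".
missing_perl_modules() {
    for module in "$@"; do
        # "perl -MFoo -e 1" exits non-zero when Foo cannot be loaded.
        "${PERL:-perl}" "-M$module" -e 1 >/dev/null 2>&1 \
            || printf '%s\n' "$module"
    done
}

# The modules listed under Requirements.
BAAGLE_MODULES="YAML Tie::IxHash LWP::UserAgent Parallel::ForkManager POE::Component::Server::HTTPServer"

# Example (left unquoted on purpose so the list splits into words):
#   missing_perl_modules $BAAGLE_MODULES
```

Anything it prints can then be installed from the CPAN shell, which you can
start with: perl -MCPAN -e shell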


You may wish to install the zlib library for compressed index support, to save
on space.  Disk-space requirements are moderate; if you don't have enough disk 
to spare, you probably shouldn't be running this.

The SWISH::Filter module is now used for document conversions (only HTML, TXT,
and XML are natively supported; everything else requires a converter).  The
conversions listed below are supported.  To add support for another format,
add a module to the SWISH/Filters directory.  If you install one of the
packages below or write your own filter after you have already indexed, the
files that were skipped earlier still need to be picked up: either run a full
reindex with -F or update the modification times (see touch(1)) of those
files.  See http://swish-e.com/docs/filter.html#writing_filters for more
details.

  File Format            Requirements
  -----------            ------------
  Microsoft Word         catdoc (for basic text conversion) OR
                         wvWare (for nicer html conversion)

  Rich Text Format       rtfreader
  
  Microsoft Excel        Spreadsheet::ParseExcel perl module
  
  Adobe PDF              pdftotext and pdfinfo (part of the xpdf package)
  
  MP3 Audio              MP3::Tag perl module
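
If you add a converter after the fact and would rather not run a full reindex,
updating the modification times can be scripted.  This is only a sketch: the
function name and the example path are hypothetical, and it assumes the
incremental run selects files by modification time, as the text above implies.

```shell
#!/bin/sh
# touch_by_ext DIR EXT
# Update the modification time of every regular file under DIR whose name
# ends in .EXT, so that the next incremental index run sees it as changed.
touch_by_ext() {
    find "$1" -type f -name "*.$2" -exec touch {} +
}

# Example (hypothetical path), after installing pdftotext and pdfinfo:
#   touch_by_ext "$HOME/Documents" pdf
```

The match is case-sensitive, so run it once more with PDF if your files use
upper-case extensions.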
  
Installation
------------

1) Copy baagle.conf.sample to baagle.conf and edit it; at a minimum you will
   need to set SWISH_E, SWISH_PERL_LIB, and SEARCH_DIRS or WEB_HISTORY (or
   both), and maybe PORT to choose a different server port.  If you really
   want to power-use, you can set OPENERS to configure programs for the system
   to run for you when you click on certain files.

2) Run ./indexer (Windows: run "perl indexer")

3) (Optionally) put entries in your crontab to rerun indexer whenever you'd
   like; I'd suggest something like this:

   42 */2 * * * /path/to/this/dir/indexer >/dev/null 2>&1
   12   2 * * 0 /path/to/this/dir/indexer -F >/dev/null 2>&1
   
   That will run an incremental index (very fast) once every other hour, and
   a full index once a week.  If you have frequent changes to a small set of
   files, you may wish to increase the frequency of the first; if you have
   a large number of changes, you may need to increase the frequency of the
   second.  If you just plain have a large dataset, you may wish to decrease
   the frequency of both.

   Windows: Uh, I don't know.. some sort of Scheduled Task or something?

4) Run ./server (Windows: "perl server")

5) Point your browser at http://localhost:2986/
   (if you changed PORT from the default, use that in place of 2986).
   You probably want to bookmark this URL.

6) To kill the server, go to http://localhost:2986/quit
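
Steps 5 and 6 can also be driven from a shell, which is handy in scripts.
A minimal sketch, assuming curl is installed: BAAGLE_PORT and the function
names are invented for this example and are not read by the server, so set
BAAGLE_PORT yourself to whatever PORT is in baagle.conf.  It also assumes the
front page answers with a normal success status.

```shell
#!/bin/sh
# BAAGLE_PORT is a hypothetical environment variable used only by this
# sketch; it falls back to the default port of 2986.
baagle_url() {
    printf 'http://localhost:%s/%s\n' "${BAAGLE_PORT:-2986}" "${1:-}"
}

# Succeed if the server answers on its front page.
baagle_is_up() {
    curl -s -f -o /dev/null "$(baagle_url)"
}

# Ask the server to shut down through its /quit URL.
baagle_quit() {
    curl -s -o /dev/null "$(baagle_url quit)"
}

# Example:
#   baagle_is_up && baagle_quit
```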

Notes
-----

This is designed for single-user systems.  If you run the indexer as yourself,
information from files only readable by you will be saved in the index.  If you
run the server as yourself, anything that exploits a security problem in the
server will run as you.  If the server port is accessible by other people, and
you have LOCALHOST_ONLY set to false (not the default), other people may be able
to access your index.  If other people have local access to your machine, they
will be able to access your index, period.

The indexer will always build a full index instead of an incremental one when:

* The baagle.conf file has been modified (the most common case, I'd imagine)

* The indexer script has been modified (but you shouldn't really have to do
  this)

* The swish-e.conf file has been modified (you also shouldn't have to do this)
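
You never have to make this decision yourself, since the indexer does it for
you.  Purely to illustrate the rule, here is a sketch of the same kind of check
in shell.  It is not the indexer's actual code: the function name and the stamp
file are hypothetical, and it assumes "modified" means "newer than the last
index run".

```shell
#!/bin/sh
# needs_full_index STAMP FILE...
# Succeed (exit 0) if STAMP does not exist or any FILE is newer than STAMP.
# STAMP stands in for some record of when the last index was built.
# Note: the -nt test is supported by bash, dash and ksh, but is not strictly
# POSIX.
needs_full_index() {
    stamp="$1"
    shift
    # No record of a previous run: a full index is the only option.
    [ -e "$stamp" ] || return 0
    for f in "$@"; do
        if [ "$f" -nt "$stamp" ]; then
            return 0
        fi
    done
    return 1
}

# Example (hypothetical stamp file name):
#   if needs_full_index .last_index baagle.conf indexer swish-e.conf; then
#       echo "full index"
#   else
#       echo "incremental index"
#   fi
```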

This whole thing is Copyright (c) 2004-2005 Floating Sheep Studios

Bugs
----

There's a bug in POE::Component::Server::HTTP which results in most browsers 
producing doubled log lines for the server.  There's a note and patch here:
http://www.mail-archive.com/poe@perl.org/msg02900.html
if you care.  The next version of PCSH will probably fix this.

To Do
-----

* Clean up summary text some more

* Use SWISH::API after beating ports@freebsd.org over the head (see below)

* Build a scheduler into the server so you don't have to run the indexer out
  of cron

* Index filenames of files we don't handle, too, so at the very least you
  can find extensionless files you care about

FAQ
---

Q) Why are you calling the swish-e binary directly?  Why not use SWISH::API?

A) The FreeBSD port of swish-e does not install those modules for some reason
   and I haven't gotten around to bugging ports@freebsd.org about it (there
   doesn't appear to be a maintainer, currently).  Since _I_ don't have it
   installed without extra work, it's reasonable to assume other people won't
   either.  Calling the swish-e binary directly is plenty fast anyway; this is
   a server for running on your desktop, not serving hundreds of hits a second.

Q) This is kind of lame compared to (Google|Yahoo) Desktop Search.  Why did 
   you bother?

A) Because:
   1) Someone claimed I couldn't do it in a weekend, so of course I had to.
      (that was version 1.0)
   2) [GY]DS don't run on UNIX boxen.

Q) I have bags of money and want you to do some short-deadline demo-like project
   for me.  Who are you guys?

A) http://floatingsheep.com/
