[TriLUG] intranet search engine recommendations
Jon Carnes
jonc at nc.rr.com
Fri May 10 15:43:26 EDT 2002
I use HTDig, and it's what runs on TriLUG, but we do run a full re-index
every night. (I'm sure there is a way around that...) I've also run Namazu.
Namazu seemed to do a much better job with Excel and Word documents, but it was
also much more complex to set up. Plus, the docs are translated English, and
that makes for some head-scratching while you are doing the setup.
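One way around a nightly full re-index is to work out which files actually
changed since the last run and hand only those to an indexer that can update
an existing index. The sketch below covers only the first half: it walks a
document tree and lists files whose modification time is newer than a stored
timestamp. It assumes mtime is a reliable change signal. The helper names
(files_modified_since, load_last_run, save_last_run) and the stamp-file idea
are hypothetical and are not part of HTDig or Namazu.

```python
import os
from pathlib import Path


def files_modified_since(root, since_epoch):
    """Return sorted paths under root whose mtime is newer than since_epoch.

    Hypothetical helper: it only finds changed files; feeding them to an
    indexer that supports incremental updates is left to that tool.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.getmtime(path) > since_epoch:
                    changed.append(path)
            except OSError:
                # The file vanished between listing and stat; skip it.
                continue
    return sorted(changed)


def load_last_run(stamp_file):
    """Read the epoch time of the previous run, or 0.0 if there was none."""
    try:
        return float(Path(stamp_file).read_text().strip())
    except (FileNotFoundError, ValueError):
        return 0.0


def save_last_run(stamp_file, when):
    """Record the epoch time of this run for the next incremental pass."""
    Path(stamp_file).write_text(repr(when))
```

A nightly job could record `start = time.time()` before calling
`files_modified_since(doc_root, load_last_run(stamp))`, and then call
`save_last_run(stamp, start)` once indexing succeeds. Using the start time
means files edited during the run are picked up again next time. Note that
mtime scanning does not notice deleted files, so stale index entries would
still need separate handling.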
Mandrake defaults to using Medusa. I don't know how far along that is in
development, but if Mandrake uses it, it must be promising.
Jon
--- Original Message: Friday 10 May 2002 03:28 pm ---
> First, congratulations to the new board and thanks to those who served for
> the past year.
>
> Second, I'd like to ask for the group's recommendations on an intranet
> search (engine|tool) which runs on Linux and is suitable for a small to
> midsize intranet. I've been experimenting with htdig (distributed with Red
> Hat Linux) but have run into some apparent limitations:
>
> 1) Based on the most current information I could find, htdig cannot update
> an index for only modified files. For example, if 50 of 25000 files are
> modified in the course of a day, I'd like to be able to update the index
> for only the modified files. With htdig, I would have to reparse and
> reindex all 25000 files just to get the 50 updates.
>
> 2) htdig (and/or its external parsers) seems to have a very large memory
> footprint for xls, doc, and pdf files over a few MB in size. Setting
> max_doc_size to a small number (e.g. 500K) would cause most of our
> documents to be omitted from indexing.
>
> Any recommendations? I'm especially interested in anything that allows
> indices to be updated on modified files without reindexing unchanged
> files. I've looked at Google's product, but it is quite costly.
>
> Thanks,
> Geoff
>
>
> _______________________________________________
> TriLUG mailing list
> http://www.trilug.org/mailman/listinfo/trilug
> TriLUG Organizational FAQ:
> http://www.trilug.org/~lovelace/faq/TriLUG-faq.html