[TriLUG] The tool I'm looking for

Kristopher Kane kristopher.kane at gmail.com
Fri Mar 1 21:41:22 EST 2013


Nutch + Solr + <insert scripting language of the day to query Solr>
- "poof!" = ...

Not sure if Nutch can do anything with the .gz files, but it could traverse
the other links.
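
If Nutch turns out not to handle the compressed archives, they could be
unpacked separately. Here is a rough sketch, assuming each .gz file is a
gzipped mbox-style text file (I have not checked the TriLUG archives
against it). The function name messages_from_gz and the dict keys are made
up for illustration; only the standard library is used.

```python
import gzip
import mailbox
import os
import tempfile


def messages_from_gz(gz_bytes):
    """Decompress one gzipped mbox archive and return a list of message dicts.

    Hypothetical helper: the field names below are illustrative, not a
    schema that Nutch or Solr requires.
    """
    raw = gzip.decompress(gz_bytes)

    # mailbox.mbox reads from a path, so spill the text to a temp file.
    with tempfile.NamedTemporaryFile(suffix=".mbox", delete=False) as tmp:
        tmp.write(raw)
        path = tmp.name

    try:
        box = mailbox.mbox(path, create=False)
        docs = []
        for msg in box:
            # Multipart messages return None here; treat them as empty.
            body = msg.get_payload(decode=True) or b""
            docs.append({
                "id": str(msg.get("Message-ID", "")),
                "subject": str(msg.get("Subject", "")),
                "from": str(msg.get("From", "")),
                "date": str(msg.get("Date", "")),
                "body": body.decode("utf-8", errors="replace"),
            })
        box.close()
        return docs
    finally:
        os.remove(path)
```

Each dict could then be posted to Solr as a document, or just grepped
locally while deciding whether the full setup is worth it.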

http://wiki.apache.org/nutch/NutchTutorial
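
For the "scripting language of the day" piece, something like the following
Python might do. It is only a sketch: the host, port, and core name
(localhost:8983, collection1) are assumptions about a default local Solr
install, and the "title" and "url" fields are assumed to match whatever
schema the Nutch crawl was indexed with. The function names are mine.

```python
import json
import urllib.parse
import urllib.request

# Assumed location of a local Solr core; adjust to the actual install.
SOLR_SELECT = "http://localhost:8983/solr/collection1/select"


def build_query_url(terms, rows=10, base=SOLR_SELECT):
    """Build a Solr select URL asking for JSON output."""
    params = {"q": terms, "wt": "json", "rows": rows, "fl": "url,title"}
    return base + "?" + urllib.parse.urlencode(params)


def hits_from_response(payload):
    """Pull (title, url) pairs out of a Solr JSON response string."""
    docs = json.loads(payload)["response"]["docs"]
    return [(d.get("title", ""), d.get("url", "")) for d in docs]


def search(terms, rows=10):
    """Run the query against Solr and return (title, url) pairs."""
    with urllib.request.urlopen(build_query_url(terms, rows)) as resp:
        return hits_from_response(resp.read().decode("utf-8"))
```

Calling search("mailman archive") with Solr running would return a list of
(title, url) pairs, where each url points back at the crawled archive page,
which is the "query result points back to the archived messages" part of
the request.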


On Fri, Mar 1, 2013 at 9:30 PM, Pete Soper <pete at soper.us> wrote:

> I'd like to be able to point something at the archives of a GNU Mailman
> list and have it go "poof!" and create a local (on my computer) something
> that could be interacted with using whatever (web browser, Ruby script, I
> don't care) that would allow searching the messages.  For instance if I had
> this URL: http://www.trilug.org/pipermail/trilug/ it would be nice to have a tool that would grok the structure of that web
> page, get some or all of the .gz files, create a searchable database, and
> make available an interface such that a query result would point back to
> the archived messages. It would of course score extra bonus points if this
> mechanism could be copied to an arbitrary server so anybody on the Internet
> could enjoy the fruits of this tool's labor.
>
> Put a different way, it's the year 2013. How is it possible that the
> TriLUG email isn't searchable, or have I just not found it after all this
> time?
>
> -Pete
>


