[TriLUG] Web Site Indexing

Lance A. Brown lance at bearcircle.net
Tue Jan 4 17:04:28 EST 2005


Greetings,

A non-profit organization I volunteer with is working toward migrating 
their website to some kind of CMS platform, hopefully to make it easier to 
manage.  I've been asked whether I can provide an inventory of the 
material on their existing site to help them get a grip on the scale of the 
task.  They currently have several thousand pages.

From the requester: "The idea is that we would look at all the htm and 
html files and grab the filename, title, keywords, and all the links" and 
"... it would print out in export from something, looking like an excel 
spread sheet."
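For scale, the inventory described above can be sketched as a short script.
This is a minimal sketch, assuming a local copy of the site's files (for
example one fetched with `wget --mirror`). The script name and the
functions `inventory_rows` and `write_inventory` are hypothetical. It reads
every .htm/.html file, pulls the <title>, the meta keywords, and the href of
each <a> tag, and writes one CSV row per file, which Excel opens directly.
Putting all links in one space-separated column is also an assumption.

```python
import csv
import os
import sys
from html.parser import HTMLParser


class PageInventoryParser(HTMLParser):
    """Collects the <title> text, meta keywords, and link targets of one page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.keywords = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "keywords":
            self.keywords = (attrs.get("content") or "").strip()
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only text inside <title>...</title> is kept.
        if self._in_title:
            self.title += data


def inventory_rows(root):
    """Yield (filename, title, keywords, links) for each .htm/.html file under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.lower().endswith((".htm", ".html")):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as f:
                parser = PageInventoryParser()
                parser.feed(f.read())
                parser.close()
            yield (
                os.path.relpath(path, root),
                " ".join(parser.title.split()),  # collapse whitespace in the title
                parser.keywords,
                " ".join(parser.links),  # hypothetical choice: all links in one cell
            )


def write_inventory(root, out):
    """Write a header row plus one CSV row per page to the file object out."""
    writer = csv.writer(out)
    writer.writerow(["filename", "title", "keywords", "links"])
    for row in inventory_rows(root):
        writer.writerow(row)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python site_inventory.py /path/to/site/copy > inventory.csv
    write_inventory(sys.argv[1], sys.stdout)
```

This only sees pages present in the local copy; pages reachable solely
through scripts or server-side includes would need a real crawler.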

I could write a tool to do this, but I don't really have the time.  There 
must be tools available to crawl a website and generate these kinds of 
reports, but I'm not finding them.  F/OSS is preferred, but I'm willing to 
recommend a commercial solution if it'll do the job.

Can anyone offer a pointer?

Thanks,
   --[Lance]

-- 
  Celebrate The Circle: http://www.celebratethecircle.org/
  Carolina Spirit Quest:  http://www.carolinaspiritquest.org/
  My LiveJournal: http://www.livejournal.com/users/labrown/
  GPG Fingerprint: 409B A409 A38D 92BF 15D9 6EEE 9A82 F2AC 69AC 07B9
