[TriLUG] Website Directory Listing via HTTP?
Tanner Lovelace
clubjuggler at gmail.com
Fri Aug 26 13:15:53 EDT 2005
On 8/26/05, Shane O'Donnell <shaneodonnell at gmail.com> wrote:
> So I'm trying to come up with text file listings of everything that's
> on the server (which is 180+GB or so) without having to download it
> all. The previous "links -dump" suggestion comes close, but doesn't
> recurse. wget recurses, but it either downloads the files or spits
> out a very verbose log that I'd rather not have to parse the
> info out of. curl -l would do it, but only for an FTP server.
Have wget recurse but limit it to fetching only .html files (or,
even better, only the index.html files).  That will mirror the
directory structure, and each directory will contain a listing (in
whatever HTML format the server generates) of the files that live
there.  You can then, offline, use Perl or some similar tool to
recurse through the directories and convert those index.html files
to text.  (Heck, recursing through and running lynx -dump on each
index.html file would probably do it; see the sketch below.)
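Something along these lines might work (untested sketch; the hostname
is a placeholder and the wget options may need tweaking for your
particular server):

    # Mirror only the auto-generated index pages; the -A filter should
    # keep wget from downloading the actual data files.
    wget -r -np -l inf -A "index.html" http://your.server.example/

    # Offline, convert each fetched index.html to a plain-text listing.
    find your.server.example -name index.html | while read -r f; do
        lynx -dump "$f" > "${f%.html}.txt"
    done

wget still has to fetch the HTML index pages in order to follow the
directory links, but the accept filter should stop it from pulling
down the 180+GB of actual files.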
Cheers,
Tanner
--
Tanner Lovelace
clubjuggler at gmail dot com
http://wtl.wayfarer.org/
(fieldless) In fess two roundels in pale, a billet fesswise and an
increscent, all sable.