[TriLUG] Website Directory Listing via HTTP?
Tanner Lovelace
clubjuggler at gmail.com
Fri Aug 26 13:15:53 EDT 2005
On 8/26/05, Shane O'Donnell <shaneodonnell at gmail.com> wrote:
> So I'm trying to come up with text file listings of everything that's
> on the server (which is 180+GB or so) without having to download it
> all. The previous "links -dump" suggestion comes close, but doesn't
> recurse. wget recurses, but it either downloads the files or spits
> out a very verbose log that I'd rather not have to parse the
> info out of. curl -l would do it, but only for an FTP server.
Have wget recurse but limit it to fetching only .html files (or,
even better, only the index.html files).  That will mirror the
directory structure, and each directory will contain a listing (in
whatever HTML format the server generates) of the files that live
there.  You can then, offline, use Perl or some similar tool to
recurse through the directories and convert those index.html files
to text.  (Heck, recursing through and running lynx -dump on each
index.html file would probably do it; see the sketch below.)
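Something along these lines might work (untested sketch; the hostname
is a placeholder and the wget options may need tweaking for your
particular server):

    # Mirror only the auto-generated index pages; the -A filter should
    # keep wget from downloading the actual data files.
    wget -r -np -l inf -A "index.html" http://your.server.example/

    # Offline, convert each fetched index.html to a plain-text listing.
    find your.server.example -name index.html | while read -r f; do
        lynx -dump "$f" > "${f%.html}.txt"
    done

wget still has to fetch the HTML index pages in order to follow the
directory links, but the accept filter should stop it from pulling
down the 180+GB of actual files.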
Cheers,
Tanner
--
Tanner Lovelace
clubjuggler at gmail dot com
http://wtl.wayfarer.org/
(fieldless) In fess two roundels in pale, a billet fesswise and an
increscent, all sable.