[TriLUG] Website Directory Listing via HTTP?

Thu Aug 25 15:42:46 EDT 2005

On Thu, 2005-08-25 at 15:13 -0400, Matt Frye wrote:
> Ok, now how about the perl to extract the dir listing?

Apache lists the links on an auto-index page one per line.  So
something like this might do:

$ wget http://foo.bar/index.html
$ grep href index.html | perl -p -e 's/^.*href="//; s/".*$//;'

Grep eliminates lines that do not have links.  The first s/// deletes
everything from the beginning of the line through href=".  The second
s/// deletes everything from the closing quote onward.  Thus you get a
list of URLs.  !_This doesn't account for multiple links on a line_!
(The first .* is greedy, so only the last link on a line survives.)

This could be fed back into wget with --input-file if so desired.  Of
course, given --force-html, that option will take a raw HTML file as
well, eliminating the need for the perl.  Relative links then need
--base to be resolved.
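
Spelled out, the no-perl route might look like the sketch below.  The
function name and the listing.html file name are placeholders of mine;
--force-html tells wget to parse the input file as HTML, and --base
resolves the relative links an auto-index page uses:

```shell
# fetch_listing: download an auto-index page, then every file it links to.
# $1 is the directory URL with a trailing slash (e.g. http://foo.bar/pub/).
# (Hypothetical helper, not from the original post.)
fetch_listing() {
    wget --output-document=listing.html "$1" &&
    wget --force-html --base="$1" --input-file=listing.html
}
```

It will most likely also pull in the parent-directory and column-sort
links, since wget fetches whatever links it finds in the file.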

Tim