[TriLUG] Website Directory Listing via HTTP?
Timothy A. Chagnon
tchagnon at futeki.net
Thu Aug 25 15:42:46 EDT 2005
On Thu, 2005-08-25 at 15:13 -0400, Matt Frye wrote:
> Ok, now how about the perl to extract the dir listing?
Apache's auto-index page lists its links one per line. So something
like this might do:
$ wget http://foo.bar/index.html
$ grep href index.html |perl -p -e 's/^.*href=\"//; s/\".*$//;'
Grep eliminates lines that do not have links. The first s/// deletes
everything from the beginning of the line to href=". The second s///
deletes everything after the closing quote. Thus you get a list of
URLs. !_This doesn't account for multiple links on a line_!
This could be fed back into wget with --input-file if so desired. Of
course, with --force-html that option will take a raw html file as well
(add --base=URL if the links are relative), eliminating the need for
the perl.
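If you do go the extracted-list route, the links in an auto-index page are usually relative, so they need the directory URL put in front before wget can use them. A possible sketch, where absolutize is a made-up helper name, http://foo.bar/ is the placeholder host from above, and the input is assumed to hold only relative links, sort links starting with "?", and absolute paths starting with "/":

```shell
# Hypothetical helper: prefix each relative link on stdin with the base
# URL given as $1. Drops lines starting with "?" (column sort links) or
# "/" (absolute paths such as the parent directory). Assumes the base
# contains no "|" or "&" characters, since it is pasted into sed.
absolutize() {
    base=$1
    grep -v -e '^?' -e '^/' | sed "s|^|$base|"
}

printf 'a.txt\n?C=N;O=D\n/\nsub/\n' | absolutize http://foo.bar/
# Redirect the output to a file, e.g. urls.txt, and then:
#   wget --input-file=urls.txt
```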
Tim