[TriLUG] Web Form Data

Josh Vickery josh at vickeryj.com
Mon Oct 16 15:30:23 EDT 2006


If you go the java httpclient path, you will probably find tagsoup
(http://home.ccil.org/~cowan/XML/tagsoup/) helpful.
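As a rough illustration, here is a minimal sketch using only the JDK. The class name and the sample page are hypothetical. It is a SAX ContentHandler that collects the default name/value pairs from a page's <input> elements and encodes them as an application/x-www-form-urlencoded POST body. The JDK's own SAX parser used in main() only accepts well-formed XHTML. TagSoup's org.ccil.cowan.tagsoup.Parser is itself a SAX XMLReader, so it should slot into collect() for real-world HTML. With HttpClient you would normally pass the pairs to PostMethod.addParameter() rather than encoding the body yourself.

```java
import java.io.StringReader;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper: collects <input name=... value=...> defaults from a page. */
public class FormFieldCollector extends DefaultHandler {
    // LinkedHashMap keeps fields in document order, like a browser submission.
    private final Map<String, String> fields = new LinkedHashMap<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Namespace-aware parsers (e.g. TagSoup) fill localName; others may leave
        // it empty, so fall back to qName.
        String element = localName.isEmpty() ? qName : localName;
        if (!element.equalsIgnoreCase("input")) return;
        String name = atts.getValue("name");
        if (name == null) return; // unnamed inputs are never submitted
        String value = atts.getValue("value");
        fields.put(name, value == null ? "" : value);
    }

    /** Runs any SAX XMLReader over the markup and returns the collected fields. */
    public static Map<String, String> collect(XMLReader reader, String markup) throws Exception {
        FormFieldCollector handler = new FormFieldCollector();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(markup)));
        return handler.fields;
    }

    /** Encodes fields as an application/x-www-form-urlencoded request body. */
    public static String encode(Map<String, String> fields) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            body.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    public static void main(String[] args) throws Exception {
        // JDK parser: fine for well-formed XHTML, but it rejects sloppy HTML.
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        String page = "<form action=\"/search\">"
                + "<input type=\"hidden\" name=\"sid\" value=\"42\"/>"
                + "<input name=\"q\" value=\"linux user group\"/>"
                + "</form>";
        System.out.println(encode(collect(reader, page)));
    }
}
```

Running main() should print sid=42&q=linux+user+group. It ignores <select>, <textarea>, and checkbox/radio state, so treat it as a starting point rather than a full form emulator.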

Also, you might want to check out Open QA's Selenium
(http://www.openqa.org/selenium/).  It's intended to be used as a test
tool, but you might find it useful if faced with particularly nasty
JavaScript in the web pages you intend to scrape.

Josh

On 10/16/06, Phillip Rhodes <mindcrime at cpphacker.co.uk> wrote:
> Owen Berry wrote:
> > If you can write Perl code, take a look at LWP::UserAgent,
> > HTML::TreeBuilder and HTTP::Cookies (if the site needs cookies).
> > I've used these to bulk-retrieve information from a website (with
> > permission) using forms, cookies, etc.
> >
> Or if Java appeals to you, take a look at Jakarta HttpClient:
>
>  <http://jakarta.apache.org/commons/httpclient/>
>
>
> TTYL,
>
> Phil
