[TriLUG] Small puzzle: Fix my bad one-liner [Was: TLDP]

cristobalpalmer at gmail.com
Mon Jan 5 10:43:32 EST 2015


My last post was a bit intemperate and I regret that. As penance, I’m presenting a puzzle.

> On Jan 4, 2015, at 9:00 PM, cristobalpalmer at gmail.com wrote:
> 
> $ for i in 1 2 3; do sudo zcat 2015/01/www.tldp.org.vhost$i.access.log.20150103.gz | awk '{print $1}' | sort -u >> /tmp/tldp ; done ; sort -u /tmp/tldp | wc -l
> 34300
> 
> About 34k distinct IPv4 addresses accessed it yesterday. Presumably for documentation.

My one-liner is pretty bad. It got the job done (giving a reasonable estimate of distinct clients accessing www.tldp.org for one recent day), but I count at least four things wrong with it. Using any tools that are part of the default install of the distro of your choice from the last three years, please construct a more succinct, readable, and/or efficient one-liner that counts the distinct IPv4 addresses that accessed the www.tldp.org virtual host on January 3rd.

Things you should note:

  * There are three different log files, one for each vhost node (i.e., there are three OS instances running identical web hosting stacks, and each writes a log file to a single shared directory)
  * A typical line looks like this (mangled for privacy):
    192.168.1.50 - - [03/Jan/2015:20:55:45 -0500] "GET /LDP/Linux-Filesystem-Hierarchy/html/index.html HTTP/1.1" 200 4860 "http://ubuntuforums.org/showthread.php?t=1637306" "Mozilla/5.0"
  * Answers that involve real metrics tools and log analysis tools are cool and good, but not in the spirit of this puzzle.[0]
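
To make the shape of a possible answer concrete, here is a rough sketch, not necessarily the fix intended above. It assumes the first field of every log line is the client IPv4 address, as in the sample line. The temporary directory, the make_log helper, and the demo addresses are hypothetical stand-ins for the real logs; sudo is left out because the demo files are readable, and gzip -dc is used as the portable spelling of zcat.

```shell
# Hypothetical demo data: three tiny gzipped logs standing in for the real ones.
logdir=$(mktemp -d)
make_log() {
  node=$1; shift
  # printf repeats the format once per remaining argument (one log line per IP).
  printf '%s - - [03/Jan/2015:20:55:45 -0500] "GET / HTTP/1.1" 200 4860 "-" "Mozilla/5.0"\n' "$@" |
    gzip > "$logdir/www.tldp.org.vhost$node.access.log.20150103.gz"
}
make_log 1 192.168.1.50 192.168.1.51
make_log 2 192.168.1.50 192.168.1.52
make_log 3 192.168.1.52 192.168.1.52

# The candidate one-liner: a glob replaces the for loop, and awk counts each
# distinct first field once, so no temp file and no sort are needed.
count=$(gzip -dc "$logdir"/www.tldp.org.vhost[123].access.log.20150103.gz |
  awk '!seen[$1]++ { n++ } END { print n + 0 }')
echo "$count"   # prints 3 for the demo data above

rm -rf "$logdir"
```

Against the real logs, the same pipeline would presumably be pointed at 2015/01/ with sudo in front of the decompression step. The trade-off is that awk holds every distinct address in memory, which should be fine for tens of thousands of addresses.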

Cheers,
--
Cristóbal Palmer
Technical Director, ibiblio.org
University of North Carolina at Chapel Hill
CB #3456, Manning Hall, Chapel Hill, NC 27599-3456

[0] We (ibiblio) got out of the analytics/metrics game for our several hundred vhosts back when Google stopped sales of Urchin. We shifted analytics responsibilities to the individual vhosts, and the vast majority went with Google Analytics. Possibly we’ll revisit this when we get through more of our high-priority infrastructure changes.

