[TriLUG] Awk question

Mark Freeze mfreeze at gmail.com
Tue Aug 7 17:00:34 EDT 2007


>When you talk about fast, I assume you mean "fast to
>implement". You don't really care how fast something like
>this runs do you?

Actually I do.  We run this file every day for one of our clients.  We
originally wrote a Perl script that imported the records into a MySQL
database and then executed stored procedures to eliminate duplicates
and summarize the packeted information.

In MySQL, the deletion of duplicate records was taking around 4 hours;
'sort -u' trimmed that down to less than one second.  Admittedly, our
RAID 50 setup was dragging MySQL down by about 50%, but one second is
still a big improvement over the 2 hours that would remain without it.
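
For what it's worth, the dedup step really is a one-liner (the file
names here are made up):

  # drop exact duplicate lines from the daily export
  sort -u raw_records.txt > unique_records.txt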

When 'sort' gave us this kind of performance, we started looking for a
way to speed up the packet summarization (Did I just create a word?)
step.  MySQL currently takes around 5 hours to analyze and summarize
around 40,000 records into 15,000.  A friend of ours said he thought a
combination of sort and awk would handle this task without a hitch.
(The only hitch being that I don't know anything about awk.)
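
From what I gather, the idea is roughly the sketch below.  I'm assuming
comma-separated records with the packet key in field 1 and a numeric
value to total in field 2; the file name data.csv and the field layout
are made up, since our real format is different:

  # sort by key so identical packets end up adjacent, then have awk
  # total each group and print one summary line per key
  sort -t, -k1,1 data.csv | awk -F, '
      $1 != key { if (NR > 1) print key "," total; key = $1; total = 0 }
                { total += $2 }
      END       { if (NR > 0) print key "," total }
  '

Since awk only keeps one group in memory at a time, this should stay
fast even if the daily file grows well past 40,000 records.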

So, the long answer to a short question is yes - we are looking for a
solution that is both quick to implement and quick to run.

Thanks,
Mark.


