[TriLUG] Awk question
Steve Litt
slitt at troubleshooters.com
Tue Aug 7 23:39:00 EDT 2007
On Tuesday 07 August 2007 17:00, Mark Freeze wrote:
> >When you talk about fast, I assume you mean "fast to
> >implement". You don't really care how fast something like
> >this runs do you?
>
> Actually I do. We run this file every day for one of our clients. We
> originally wrote the script in Perl that imported the records into a
> MySQL database, then executed stored procedures to eliminate
> duplicates and summarize the packeted information.
My impression is that properly written awk runs very fast -- maybe a little
faster than Perl.
Awk is also very quick to develop in for problems of the type you're solving.
See my awk content here:
http://www.troubleshooters.com/codecorn/awk/index.htm
and here
http://www.troubleshooters.com/lpm/200704/200704.htm#_An_awk_Primer
Back to your specific problem: if your fields are *always* separated by
SpaceDashSpace (" - "), I'd pipe the file through sort into an awk program
with simple break logic (print a summary line each time the sort key changes).
I'd imagine you're talking about a ten- or twenty-line program at the most.
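
A minimal sketch of that sort-plus-break-logic pipeline follows. Mark's record layout isn't shown in this thread, so the layout here is an assumption: the first field is the key and the last field is a number to total. The function name summarize and the sample records are made up for illustration.

```shell
# Hypothetical layout: "key - description - count", one record per line.
# summarize is an illustrative name, not something from the original script.
summarize() {
    # LC_ALL=C keeps the sort order independent of the current locale.
    LC_ALL=C sort | awk -F ' - ' '
        # Break logic: when the key changes, print the finished group total.
        NR > 1 && $1 != prev { print prev " - " total; total = 0 }
        # Every record: remember its key and add its last field to the total.
        { prev = $1; total += $NF }
        # Flush the final group, if there was any input at all.
        END { if (NR > 0) print prev " - " total }
    '
}

# Made-up sample records.
printf '%s\n' 'b - x - 2' 'a - y - 1' 'b - z - 3' | summarize
```

Because sort puts equal keys next to each other, the awk program only has to remember one key and one running total at a time. If exact duplicate lines should be discarded before summarizing, sort -u in place of sort would do that.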
If the SpaceDashSpace isn't reliable, I'd first run the file through an initial
awk program that takes variants with missing spaces and the like and outputs
a reliable SpaceDashSpace, and then pipe that through sort and the
summarization awk.
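
One way such a cleanup pass might look, as a sketch only. It assumes a dash never appears inside a field (no hyphenated words, no negative numbers), since every dash is treated as a separator. The function name normalize and the sample lines are invented for illustration.

```shell
# normalize is an illustrative name. Assumption: every "-" is a separator.
normalize() {
    # Rewrite each dash, plus any spaces or tabs around it, as exactly " - ".
    awk '{ gsub(/[ \t]*-[ \t]*/, " - "); print }'
}

# Made-up sample lines with missing and doubled spaces around the dash.
printf '%s\n' 'a-x -2' 'b  -  y- 3' | normalize
```

The output uses one consistent separator, so it could then be piped through sort and a summarizing awk program. If real data can contain dashes inside a field, the pattern would need to be tightened to match only the variants that actually occur.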
HTH
SteveT