[TriLUG] Was: Awk question Now: Awk, Perl, SQL

Don Jerman djerman at pobox.com
Wed Aug 8 11:34:32 EDT 2007


On 8/8/07, Mark Freeze <mfreeze at gmail.com> wrote:
> Hi Jeremy,
> Thanks for the SQL code.  We are going to give it a go this afternoon
> to see how it improves performance.
>
> Deleting duplicates within the database is a separate step that we
> take before we summarize the packeted data. We delete the duplicate
> records and then summarize the remaining data.  The reason that we
> import the dups in the first place is because we also do some
> calculations with them before we delete them. (We are now doing these
> calculations in Perl before the SQL import.)
>
> Thanks to everyone for the ideas.  (Especially Robert who just seems
> interested in pointing out that we've written some bad code. Very
> helpful Robert, very helpful...)
>
> Also, I'd still be interested in seeing if anyone would like to throw
> us a bone with an awk script to test.
>
> Regards,
> Mark.
>
It's frequently faster to create a new table with the appropriate
constraints than to process deletes against an existing table. (YMMV;
I don't run MySQL.)

Consider benchmarking CREATE TABLE ... AS SELECT ... against your
delete routines. If it's faster, you can follow it by dropping the old
table, renaming the new one into place, and rebuilding the indexes.
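A minimal sketch of the rebuild-instead-of-delete idea, using SQLite as
a stand-in (the thread mentions MySQL, and the table/column names here
are invented for illustration, not from Mark's actual schema):

```python
# Demonstrate deduplicating by building a new table and swapping it in,
# rather than running DELETEs against the original table.
# "packets"/"acct"/"amount" are hypothetical names.
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE packets (acct TEXT, amount REAL)")
cur.executemany("INSERT INTO packets VALUES (?, ?)",
                [("A", 1.0), ("A", 1.0), ("B", 2.0), ("B", 3.0)])

# Build a new table that keeps one row per duplicate group...
cur.execute("""
    CREATE TABLE packets_dedup AS
    SELECT acct, amount
    FROM packets
    GROUP BY acct, amount
""")
# ...then drop the old table, rename the new one into place,
# and rebuild any indexes the old table carried.
cur.execute("DROP TABLE packets")
cur.execute("ALTER TABLE packets_dedup RENAME TO packets")
cur.execute("CREATE INDEX idx_packets_acct ON packets (acct)")

print(sorted(cur.execute("SELECT acct, amount FROM packets")))
# -> [('A', 1.0), ('B', 2.0), ('B', 3.0)]
```

Whether this beats an in-place DELETE depends on the engine, the
duplicate ratio, and the indexes involved, which is why benchmarking
both on your real data is the point.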
