[TriLUG] MySQL - Perl
Andrew C. Oliver
acoliver at buni.org
Mon Jul 9 18:16:06 EDT 2007
Mark Freeze wrote:
> Hi All,
> I have a question regarding an issue we are having with Perl & MySQL
> and I was wondering if anyone wanted to take a stab at it.
>
> We have a Perl script that parses text files we receive daily (around
> 1K to 25K records) into a MySQL database.
>
> The queries we issue from Perl perform the following actions: (All
> queries are limited to the records we just imported.)
>
> * Delete duplicate records within the database that have the same
> batch_id, product_code, and account_value, and retain only one record.
>
remove this step, add a unique key unless these are updates
(http://dev.mysql.com/doc/refman/5.0/en/insert.html)
> * Examine the remaining records for sets that have the same batch_id
> and product_code, total the product_cost field for the set, place that
> amount in the record with the numerically highest account_value, then
> delete all of the other records.
>
don't do this in perl, use a temporary table and do it in SQL.
> * Export all of the remaining records to a text file.
>
> * Mark the records as 'processed' within the database.
>
> The problem is this: With this exact scenario, we imported around
> 15,000 records today. After deleting duplicates and consolidating
> records, we exported around 5,000. Perl & MySQL took around 2 hrs to
> perform this task.
>
The problem is that you're doing this over a remote transport in some
kind of loop a record at a time. Let's say each query takes 2 second
for all the remote queries you're doing (not that surprising because
you're doing a connection, stuff, receiving).
> Now I could have this all wrong, however, I feel that this operation
> should have taken far, FAR, less time. And, to compound things, our
> volume on this account is about to increase by around 150%.
>
> The delete_duplicates routine takes over an hour to run, and the
> summarize_accounts and export code takes over an hour. The import is
> very quick.
>
> So, I was wondering if anyone would like to throw an idea out there as
> to what they believe the most efficient (read 'speedy') method would
> be to perform these tasks.
>
Ideally you get everything into one or two tables and execute a set of
insert into select from statements and update statements then you can
dump from there. (set logic vs cursors)
Most probably you need an index or two to boot. W/o seeing it, I can
only make generalizations but they're probably accurate.
> Best Regards,
> Mark.
--
Buni Meldware Communication Suite
http://buni.org
Multi-platform and extensible Email,
Calendaring (including freebusy),
Rich Webmail, Web-calendaring, ease
of installation/administration.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/x-pkcs7-signature
Size: 3629 bytes
Desc: S/MIME Cryptographic Signature
URL: <http://www.trilug.org/pipermail/trilug/attachments/20070709/dda0f52b/attachment.bin>
More information about the TriLUG
mailing list