[TriLUG] MySQL - Perl

Andrew C. Oliver acoliver at buni.org
Mon Jul 9 18:16:06 EDT 2007


Mark Freeze wrote:
> Hi All,
> I have a question regarding an issue we are having with Perl & MySQL
> and I was wondering if anyone wanted to take a stab at it.
> 
> We have a Perl script that parses text files we receive daily (around
> 1K to 25K records) into a MySQL database.
> 
> The queries we issue from Perl perform the following actions: (All
> queries are limited to the records we just imported.)
> 
> * Delete duplicate records within the database that have the same
> batch_id, product_code, and account_value, and retain only one record.
> 

Remove this step entirely: put a unique key on (batch_id, product_code,
account_value) and let MySQL reject the duplicates at insert time, unless
the later rows are really meant as updates, in which case look at
INSERT ... ON DUPLICATE KEY UPDATE
(http://dev.mysql.com/doc/refman/5.0/en/insert.html).
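
Something like this, as a rough sketch only -- I'm guessing the import
table is called imports, so adjust the names to your schema:

  -- One-time schema change: the server now rejects exact duplicates.
  ALTER TABLE imports
    ADD UNIQUE KEY uniq_batch_product_account
      (batch_id, product_code, account_value);

  -- Load with INSERT IGNORE so duplicate rows are silently skipped
  -- (or INSERT ... ON DUPLICATE KEY UPDATE if later rows should
  -- replace earlier ones).  The values below are just placeholders.
  INSERT IGNORE INTO imports
    (batch_id, product_code, account_value, product_cost)
  VALUES ('20070709', 'WIDGET-A', 1001, 4.25);

If you load the text file with LOAD DATA INFILE, the IGNORE keyword on
that statement does the same thing.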

> * Examine the remaining records for sets that have the same batch_id
> and product_code, total the product_cost field for the set, place that
> amount in the record with the numerically highest account_value, then
> delete all of the other records.
> 

Don't do this record-by-record in Perl; use a temporary table and do the
aggregation in SQL.
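
A sketch of what I mean, again with guessed table and column names and a
placeholder batch_id:

  -- Total each (batch_id, product_code) group once, on the server,
  -- and note which account_value should survive.
  CREATE TEMPORARY TABLE batch_totals
  SELECT batch_id, product_code,
         SUM(product_cost)  AS total_cost,
         MAX(account_value) AS keep_account_value
  FROM imports
  WHERE batch_id = '20070709'
  GROUP BY batch_id, product_code;

  -- Put the group total on the surviving record.
  UPDATE imports i
  JOIN batch_totals t
    ON  i.batch_id      = t.batch_id
    AND i.product_code  = t.product_code
    AND i.account_value = t.keep_account_value
  SET i.product_cost = t.total_cost;

  -- Delete everything else in each group.
  DELETE i
  FROM imports i
  JOIN batch_totals t
    ON  i.batch_id     = t.batch_id
    AND i.product_code = t.product_code
  WHERE i.account_value < t.keep_account_value;

  DROP TEMPORARY TABLE batch_totals;

That's a handful of statements for the whole batch instead of a query per
record.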

> * Export all of the remaining records to a text file.
> 
> * Mark the records as 'processed' within the database.
> 
> The problem is this:  With this exact scenario, we imported around
> 15,000 records today.  After deleting duplicates and consolidating
> records, we exported around 5,000.  Perl & MySQL took around 2 hrs to
> perform this task.
> 

The problem is that you're doing this over a remote transport, in some
kind of loop, one record at a time.  Every one of those queries pays
round-trip overhead: connecting, sending the statement, and waiting for
the result.  Call it a second or two per query (not surprising over the
network) and a few thousand of them in a loop is your two hours right
there, no matter how fast the SQL itself runs.
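
To make that concrete: your last step, marking records as processed,
should be one statement for the whole batch rather than one UPDATE per
record (I'm assuming a processed flag column; the batch_id value is a
placeholder):

  UPDATE imports
  SET processed = 1
  WHERE batch_id = '20070709';

One round trip instead of thousands.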

> Now I could have this all wrong, however, I feel that this operation
> should have taken far, FAR, less time.  And, to compound things, our
> volume on this account is about to increase by around 150%.
> 
> The delete_duplicates routine takes over an hour to run, and the
> summarize_accounts and export code takes over an hour.  The import is
> very quick.
> 
> So, I was wondering if anyone would like to throw an idea out there as
> to what they believe the most efficient (read 'speedy') method would
> be to perform these tasks.
> 

Ideally you get everything into one or two tables, run a handful of
INSERT ... SELECT and UPDATE statements, and dump the result from there
(set logic instead of cursors).
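
For example (table and column names are guesses on my part), copying the
surviving records into an export table and dumping them are one statement
each:

  INSERT INTO export_records
    (batch_id, product_code, account_value, product_cost)
  SELECT batch_id, product_code, account_value, product_cost
  FROM imports
  WHERE batch_id = '20070709';

  -- Note that INTO OUTFILE writes the file on the MySQL server; if you
  -- need it on the client machine, do one SELECT from Perl and write
  -- the file there instead.
  SELECT batch_id, product_code, account_value, product_cost
  INTO OUTFILE '/tmp/export_20070709.txt'
  FIELDS TERMINATED BY ','
  FROM export_records
  WHERE batch_id = '20070709';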

Most probably you need an index or two to boot.  Without seeing the
schema I can only make generalizations, but they're probably accurate.
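
For the grouping and per-batch filtering above, something like this is
the obvious first candidate (if you added the unique key on (batch_id,
product_code, account_value), its leftmost columns already cover this and
you can skip it):

  CREATE INDEX idx_batch_product ON imports (batch_id, product_code);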

> Best Regards,
> Mark.


-- 
Buni Meldware Communication Suite
http://buni.org
Multi-platform and extensible Email,
Calendaring (including freebusy),
Rich Webmail, Web-calendaring, ease
of installation/administration.