[TriLUG] MySQL - Perl

William Sutton william at trilug.org
Mon Jul 9 18:08:09 EDT 2007


May I suggest putting unique key constraints on mySQL on (batch_id, 
product_code, and account_value), and using REPLACE INTO for your record 
insertions?  Also, you may want to play with your table indeces in general 
to get a performance improvement, and it never hurts to use prepared 
queries in Perl.

-- 
William Sutton


On Mon, 9 Jul 2007, Mark Freeze wrote:

> Hi All,
> I have a question regarding an issue we are having with Perl & MySQL
> and I was wondering if anyone wanted to take a stab at it.
> 
> We have a Perl script that parses text files we receive daily (around
> 1K to 25K records) into a MySQL database.
> 
> The queries we issue from Perl perform the following actions: (All
> queries are limited to the records we just imported.)
> 
> * Delete duplicate records within the database that have the same
> batch_id, product_code, and account_value, and retain only one record.
> 
> * Examine the remaining records for sets that have the same batch_id
> and product_code, total the product_cost field for the set, place that
> amount in the record with the numerically highest account_value, then
> delete all of the other records.
> 
> * Export all of the remaining records to a text file.
> 
> * Mark the records as 'processed' within the database.
> 
> The problem is this:  With this exact scenario, we imported around
> 15,000 records today.  After deleting duplicates and consolidating
> records, we exported around 5,000.  Perl & MySQL took around 2 hrs to
> perform this task.
> 
> Now I could have this all wrong, however, I feel that this operation
> should have taken far, FAR, less time.  And, to compound things, our
> volume on this account is about to increase by around 150%.
> 
> The delete_duplicates routine takes over an hour to run, and the
> summarize_accounts and export code takes over an hour.  The import is
> very quick.
> 
> So, I was wondering if anyone would like to throw an idea out there as
> to what they believe the most efficient (read 'speedy') method would
> be to perform these tasks.
> 
> Best Regards,
> Mark.
> 



More information about the TriLUG mailing list