[TriLUG] MySQL - Perl
William Sutton
william at trilug.org
Mon Jul 9 18:08:09 EDT 2007
May I suggest putting unique key constraints on mySQL on (batch_id,
product_code, and account_value), and using REPLACE INTO for your record
insertions? Also, you may want to play with your table indeces in general
to get a performance improvement, and it never hurts to use prepared
queries in Perl.
--
William Sutton
On Mon, 9 Jul 2007, Mark Freeze wrote:
> Hi All,
> I have a question regarding an issue we are having with Perl & MySQL
> and I was wondering if anyone wanted to take a stab at it.
>
> We have a Perl script that parses text files we receive daily (around
> 1K to 25K records) into a MySQL database.
>
> The queries we issue from Perl perform the following actions: (All
> queries are limited to the records we just imported.)
>
> * Delete duplicate records within the database that have the same
> batch_id, product_code, and account_value, and retain only one record.
>
> * Examine the remaining records for sets that have the same batch_id
> and product_code, total the product_cost field for the set, place that
> amount in the record with the numerically highest account_value, then
> delete all of the other records.
>
> * Export all of the remaining records to a text file.
>
> * Mark the records as 'processed' within the database.
>
> The problem is this: With this exact scenario, we imported around
> 15,000 records today. After deleting duplicates and consolidating
> records, we exported around 5,000. Perl & MySQL took around 2 hrs to
> perform this task.
>
> Now I could have this all wrong, however, I feel that this operation
> should have taken far, FAR, less time. And, to compound things, our
> volume on this account is about to increase by around 150%.
>
> The delete_duplicates routine takes over an hour to run, and the
> summarize_accounts and export code takes over an hour. The import is
> very quick.
>
> So, I was wondering if anyone would like to throw an idea out there as
> to what they believe the most efficient (read 'speedy') method would
> be to perform these tasks.
>
> Best Regards,
> Mark.
>
More information about the TriLUG
mailing list