I have a MySQL table which contains an auto_increment primary key.

I have 500 CSV files, each about 3 GB of data, with the bulk of the data in one column.

Currently I'm loading the files into MySQL using:

#!/bin/bash
for file in /files/*.csv
do
    mysql -e "LOAD DATA LOCAL INFILE '$file' INTO TABLE myTable FIELDS TERMINATED BY ','
    ENCLOSED BY '\"' ESCAPED BY '\"' IGNORE 1 LINES" -u user -ppass
done

Are there any ways to improve performance? Maybe removing the primary key while inserting and then adding it afterwards? Or is there a way to insert in parallel instead of one file at a time?
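
For reference, the sketch below is roughly what I have in mind for "in parallel": a few concurrent LOAD DATA sessions driven by GNU xargs. The -P value and credentials are placeholders, and I don't know whether the server would actually benefit from concurrent loads:

#!/bin/bash
# Sketch: run up to 4 LOAD DATA sessions at once, one file per session.
# Requires GNU xargs; the -P value and credentials are placeholders.
printf '%s\n' /files/*.csv |
  xargs -P 4 -I {} mysql -u user -ppass -e "LOAD DATA LOCAL INFILE '{}' INTO TABLE myTable
  FIELDS TERMINATED BY ',' ENCLOSED BY '\"' ESCAPED BY '\"' IGNORE 1 LINES"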

  • You could try looping through each line and executing an insert for each line. It may or may not perform better than a one-off load, but it may be worth a go. Commented Jul 26, 2017 at 14:36
  • Take a look at stackoverflow.com/questions/2463602/… Commented Jul 26, 2017 at 16:44
  • Also, as far as inserting in parallel goes, I don't think it would help much, since the processing steps will be the same as far as the engine is concerned, i.e. the same workload going through the same channel, but I could be wrong. Commented Jul 26, 2017 at 16:47

1 Answer

The new MySQL Shell, as of version 8.0.17, has a parallel bulk loader for CSV, TSV, and JSON files.
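
As a minimal sketch of how that might be scripted with the shell's util.importTable() utility: the host, schema name, file path, dialect, and thread count below are placeholders (the dialect assumes comma-separated, double-quote enclosed fields with Unix line endings), and the server needs local_infile enabled:

#!/bin/bash
# Sketch: load one CSV with MySQL Shell's parallel table import utility.
# Host, user, schema, file path, dialect and thread count are placeholders;
# mysqlsh will prompt for the password.
mysqlsh -u user -h localhost --js -e "util.importTable('/files/file1.csv', {
    schema:   'myDatabase',   // target schema (placeholder)
    table:    'myTable',
    dialect:  'csv-unix',     // comma-separated, double-quote enclosed, LF line endings
    skipRows: 1,              // roughly the IGNORE 1 LINES of LOAD DATA
    threads:  8               // number of parallel load threads
})"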

2 Comments

  • Can you elaborate? Maybe a link to what exactly it is and how to use it?
  • See elephantdolphin.blogspot.com/2019/08/… for an example of using the new parallel bulk loader utility in the MySQL Shell.
