I am new to Python. I have 2000 files, each about 100 MB. I have to read each of them and merge them into one big matrix (or table). Can I use parallel processing to save some time? If so, how? I tried searching, but everything seems very complicated. Currently, it takes about 8 hours to do this serially. We have a really big server with one terabyte of RAM and a few hundred processors. How can I make efficient use of it?

Thank you for your help.

  • Whether you can parallelize this depends on whether the merging phase is CPU-bound or I/O-bound, in addition to your hardware. Commented Nov 16, 2011 at 18:58
  • Just saying, Python probably isn't the best tool for this. Maybe there's more information that makes it a better fit, but something like C or C++ will probably be easier and more efficient at processing large amounts of data. Commented Nov 16, 2011 at 18:59
  • Is the merge operation something trivial/segmented, like concatenation? Or something like a read/op/write (a sum, e.g.)? Commented Nov 16, 2011 at 19:00
  • @root45: It might not be as bad as you think. The asker talks about matrices, so it may be possible to use something like numpy to do the grunt work in compiled code, while controlling the process from Python. Commented Nov 16, 2011 at 19:01
  • What format are the files in? Have you tried numpy.fromfile? Commented Nov 16, 2011 at 20:15

1 Answer

You may be able to preprocess the files in separate processes using the subprocess module; however, if the final table is kept in memory, then the process that holds it will end up being your bottleneck.
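As a rough illustration of the preprocessing idea, here is a minimal sketch using only the standard library. It assumes (this is not stated in the question) that each input is a whitespace-delimited text file of numbers, and that "preprocessing" means converting it to raw float64 values on disk. The names `WORKER`, `preprocess_all`, `load_merged` and the `.bin` suffix are hypothetical. The final `load_merged` step runs in a single process, which is where the bottleneck mentioned above would show up.

```python
import subprocess
import sys
from array import array

# Hypothetical worker script, run in a separate Python process per file.
# Assumption: inputs are whitespace-delimited text files of numbers.
WORKER = """
import sys
from array import array
src, dst = sys.argv[1], sys.argv[2]
values = array("d")
with open(src) as f:
    for line in f:
        values.extend(float(tok) for tok in line.split())
with open(dst, "wb") as out:
    values.tofile(out)  # raw float64, cheap to load later
"""

def preprocess_all(sources, max_procs=8):
    """Convert every source to '<source>.bin', at most max_procs at a time."""
    outputs = [src + ".bin" for src in sources]
    pending = list(zip(sources, outputs))
    running = []
    while pending or running:
        # Top up the set of running workers.
        while pending and len(running) < max_procs:
            src, dst = pending.pop(0)
            running.append(
                subprocess.Popen([sys.executable, "-c", WORKER, src, dst])
            )
        # Wait for the oldest worker before launching more.
        proc = running.pop(0)
        if proc.wait() != 0:
            raise RuntimeError(f"worker failed: {proc.args[-2]}")
    return outputs

def load_merged(outputs):
    """Serial merge step: read the preprocessed files in order."""
    merged = array("d")
    for path in outputs:
        with open(path, "rb") as f:
            merged.frombytes(f.read())
    return merged
```

Usage would be something like `merged = load_merged(preprocess_all(paths, max_procs=64))`. Whether this actually saves time depends on whether parsing (CPU-bound) or disk reads (I/O-bound) dominate.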

There is another possible approach using shared memory with mmap objects. Each subprocess can be responsible for loading the files into a subsection of the mapped memory.
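A minimal sketch of the mmap idea follows, under several assumptions that are mine rather than the question's: the inputs are already raw binary files (for example float64 values) whose sizes are known up front, the mapping is backed by a file on disk rather than anonymous memory, and the names `WORKER` and `merge_into_mmap` are hypothetical. Each worker process maps the same backing file and writes only into its own byte range, so no data has to pass back through the parent.

```python
import mmap
import os
import subprocess
import sys

# Hypothetical worker: copy one source file into its slice of the shared map.
# It maps the whole backing file for simplicity but touches only its range.
WORKER = """
import mmap, sys
shared_path, src, offset = sys.argv[1], sys.argv[2], int(sys.argv[3])
with open(src, "rb") as f:
    data = f.read()
with open(shared_path, "r+b") as shared:
    with mmap.mmap(shared.fileno(), 0) as mm:
        mm[offset:offset + len(data)] = data
"""

def merge_into_mmap(sources, shared_path, max_procs=8):
    """Fill a file-backed mmap with the sources, one subsection per file."""
    sizes = [os.path.getsize(s) for s in sources]
    # Pre-size the backing file so every worker's slice already exists.
    with open(shared_path, "wb") as f:
        f.truncate(sum(sizes))
    # Byte offset of each file's subsection within the mapping.
    jobs, offset = [], 0
    for src, size in zip(sources, sizes):
        jobs.append((src, offset))
        offset += size
    running = []
    while jobs or running:
        while jobs and len(running) < max_procs:
            src, off = jobs.pop(0)
            running.append(subprocess.Popen(
                [sys.executable, "-c", WORKER, shared_path, src, str(off)]
            ))
        proc = running.pop(0)
        if proc.wait() != 0:
            raise RuntimeError(f"worker failed: {proc.args[-2]}")
    # Map the finished file in the parent; the mapping outlives the file object.
    with open(shared_path, "r+b") as f:
        return mmap.mmap(f.fileno(), 0)
```

The returned object can be viewed as numbers without copying, for example with `memoryview(mm).cast("d")` if the contents are float64. If the inputs need parsing first, the worker would have to know the size of its output in advance, which is the main design constraint of this approach.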
