I have a script that processes files using multiprocessing. Here's a snippet:
import multiprocessing
import os

cores = multiprocessing.cpu_count()
def f_process_file(file):
    # rename file
    # convert file
    # add metadata
    pass
files = [f for f in os.listdir(source_path) if f.endswith('.tif')]
p = multiprocessing.Pool(processes=cores)
async_result = p.map_async(f_process_file, files)
p.close()
p.join()
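For reference, a minimal self-contained version of this pattern (with a stand-in f_process_file, since the real rename/convert/metadata steps aren't shown here) looks like this; map_async returns an AsyncResult whose .get() blocks until all workers are done:

```python
import multiprocessing

def f_process_file(file):
    # stand-in for the real rename/convert/add-metadata steps
    return file.upper()

if __name__ == '__main__':
    files = ['a.tif', 'b.tif', 'c.tif']
    with multiprocessing.Pool(processes=multiprocessing.cpu_count()) as p:
        async_result = p.map_async(f_process_file, files)
        results = async_result.get()  # blocks until all tasks complete
    print(results)  # ['A.TIF', 'B.TIF', 'C.TIF']
```

Using the pool as a context manager replaces the explicit close()/join() pair.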
This runs fine, except that I now need to perform some additional actions before calling f_process_file, which also takes extra arguments. Here's the snippet:
def f_process_file(file, inventory, variety):
    if variety > 1:
        # rename file with follow-up number
        # convert file
        # add metadata
        pass
    else:
        # rename file without follow-up number
        # convert file
        # add metadata
        pass
import collections

# create file list
files = [f for f in os.listdir(source_path) if f.endswith('.tif')]
# create inventory list
inventories = [fn.split('_')[2].split('-')[0].split('.')[0] for fn in files]
# check the number of files per inventory
counter = collections.Counter(inventories)
for file in files:
    inventory = file.split('_')[2].split('-')[0].split('.')[0]
    matching = [s for s in sorted(counter.items()) if inventory in s]
    for key, variety in matching:
        f_process_file(file, inventory, variety)
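To illustrate what this loop computes (with hypothetical filenames, since the real naming scheme isn't shown), note that counter[inventory] already gives the variety for a file directly, so the inner matching list isn't strictly needed:

```python
import collections

# hypothetical filenames following the assumed scheme prefix_number_inventory[-suffix].tif
files = ['scan_001_INV12-1.tif', 'scan_002_INV12-2.tif', 'scan_003_INV34.tif']
inventories = [fn.split('_')[2].split('-')[0].split('.')[0] for fn in files]
counter = collections.Counter(inventories)
print(counter)  # Counter({'INV12': 2, 'INV34': 1})

# for a single file, the variety is simply the count for its inventory
inventory = files[0].split('_')[2].split('-')[0].split('.')[0]
variety = counter[inventory]
print(inventory, variety)  # INV12 2
```

This avoids scanning counter.items() once per file.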
I can't manage to get this to run using multiprocessing. Do you have any advice?
A possible approach: move the for file in files loop into its own method (let's call it file_processing), and then call async_result = p.map_async(file_processing, files).
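An alternative sketch (hedged; f_process_file is again a stand-in for the real steps): instead of wrapping the loop, build one (file, inventory, variety) tuple per file and hand the list to Pool.starmap_async, which unpacks each tuple into the function's arguments:

```python
import collections
import multiprocessing

def f_process_file(file, inventory, variety):
    # stand-in for the real rename/convert/add-metadata steps
    return (file, inventory, variety)

if __name__ == '__main__':
    # hypothetical filenames, matching the assumed naming scheme
    files = ['scan_001_INV12-1.tif', 'scan_002_INV12-2.tif', 'scan_003_INV34.tif']
    inventories = [fn.split('_')[2].split('-')[0].split('.')[0] for fn in files]
    counter = collections.Counter(inventories)
    # one argument tuple per file, so each worker receives all three arguments
    tasks = [(f, inv, counter[inv]) for f, inv in zip(files, inventories)]
    with multiprocessing.Pool(processes=multiprocessing.cpu_count()) as p:
        results = p.starmap_async(f_process_file, tasks).get()
    print(results)
```

The same effect can be had with functools.partial when only some arguments vary per task.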