I have 100000 images and I need to get the vectors for each image
imageVectors = []
for i in range(100000):
fileName = "Images/" + str(i) + '.jpg'
imageVectors.append(getvector(fileName).reshape((1,2048)))
cPickle.dump( imageVectors, open( 'imageVectors.pkl', "w+b" ), cPickle.HIGHEST_PROTOCOL )
getVector is a function that takes 1 image at a time and takes about 1 second to process a it. So, basically my problem reduces to
for i in range(100000):
A = callFunction(i) //a complex function that takes 1 sec for each call
The things that I have already tried are: (only the pseduo-code is given here)
1) Using numpy vectorizer:
def callFunction1(i):
return callFunction2(i)
vfunc = np.vectorize(callFunction1)
imageVectors = vfunc(list(range(100000))
2)Using python map:
def callFunction1(i):
return callFunction2(i)
imageVectors = map(callFunction1, list(range(100000))
3) Using python multiprocessing:
import multiprocessing
try:
cpus = multiprocessing.cpu_count()
except NotImplementedError:
cpus = 4 # arbitrary default
pool = multiprocessing.Pool(processes=cpus)
result = pool.map(callFunction, xrange(100000000))
4) Using multiprocessing in a different way:
from multiprocessing import Process, Queue
q = Queue()
N = 100000000
p1 = Process(target=callFunction, args=(N/4,q))
p1.start()
p2 = Process(target=callFunction, args=(N/4,q))
p2.start()
p3 = Process(target=callFunction, args=(N/4,q))
p3.start()
p4 = Process(target=callFunction, args=(N/4,q))
p4.start()
results = []
for i in range(4):
results.append(q.get(True))
p1.join()
p2.join()
p3.join()
p4.join()
All the above methods are taking immensely huge time. Is there any other way more efficient than this so that maybe I can loop through many elements simultaneously instead of sequentially or in any other faster way.
The time is mainly being taken by the getvector function itself. As a work around, I have split my data into 8 different batches and running the same program for different parts of the loop and running eight separate instances of python on a octa-core VM in google cloud. Could anyone suggest if map-reduce or taking help of GPU's using PyCuda may be a good option?
getvector(fileName)takes 1 second, and you have 100 million files. You need to use more cores. 1 gets you the result in 1157 days, 4 cores in 289. You need 1100 cores to get it down to one day.np.vectorize. If your function takes 1 sec to process a 2048 element array, then the problem is in that function. And if that is a complextensorflowprocess then the solution lies in understanding that package, not trying to make the looping process 'more parallel'. It's not the loop that's slowing you down, it doing that 'getvector' many times on many different files.