This is not easy to vectorize further (as far as I can see), unless id has some structure. Otherwise a bottleneck might be doing id == dummy over and over; the only alternative I can think of uses sorting and, due to the lack of a reduce functionality for np.max(), still requires quite a bit of Python code (Edit: there actually is a reduce function available through np.fmax). This is about 3x faster for x being 1000x1000 and id/ids lying in 0..100, but as it is rather complex, it is only worth it for larger problems with many ids:
    import numpy as np

    def max_at_ids(x, id, ids):
        # create 1D views of x and id:
        r_x = x.ravel()
        r_id = id.ravel()
        sorter = np.argsort(r_id)
        # create new sorted arrays:
        r_id = r_id[sorter]; r_x = r_x[sorter]
        # unfortunately there is no reduce functionality for np.max...
        ids = np.unique(ids)  # create a sorted, unique copy, just in case
        # w gives the places where the sorted r_id changes:
        w = np.where(r_id[:-1] != r_id[1:])[0] + 1
I originally offered this solution, which does a pure Python loop over the slices, but a shorter (and faster) version follows in the edit below:
        # the result array:
        max_x = np.empty(len(ids), dtype=r_x.dtype)
        start_idx = 0
        end_idx = w[0] if len(w) else None  # None covers the single-slice case
        i_ids = 0
        i_w = 0
        while i_ids < len(ids) and i_w < len(w) + 1:
            if ids[i_ids] == r_id[start_idx]:
                max_x[i_ids] = r_x[start_idx:end_idx].max()
                i_ids += 1
                i_w += 1
            elif ids[i_ids] > r_id[start_idx]:
                i_w += 1
            else:
                i_ids += 1
                continue  # skip updating start_idx/end_idx
            start_idx = end_idx
            # set end_idx to None for the last slice (might be faster to do differently)
            end_idx = w[i_w] if i_w < len(w) else None
        return ids, max_x
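For reference, a quick usage sketch with made-up data (not from the original; it just sanity-checks the result against the naive per-id maximum):

    x = np.random.rand(100, 100)
    id = np.random.randint(0, 10, size=x.shape)
    ids, max_x = max_at_ids(x, id, np.arange(10))
    # compare against computing x[id == i].max() for every i directly:
    assert all(max_x[i] == x[id == i].max() for i in ids)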
EDIT: improved version for the calculation of the maximum of each slice:

There is a way to remove the Python loop by using np.fmax.reduceat, which might outperform the previous version a lot if the slices are small (and is actually quite elegant):
        # prepend 0 to w so that the first slice is included
        # (or calculate the first slice by hand and use the out=... keyword
        # argument to avoid even this copy)
        w = np.concatenate(([0], w))
        max_x = np.fmax.reduceat(r_x, w)
        return ids, max_x
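To illustrate the reduceat semantics with a small standalone example (made-up numbers): np.fmax.reduceat(a, w) reduces each slice a[w[i]:w[i+1]], with the last slice running to the end of a.

    a = np.array([3., 1., 4., 1., 5., 9., 2., 6.])
    w = np.array([0, 3, 5])
    np.fmax.reduceat(a, w)  # array([4., 5., 9.]) = max of a[0:3], a[3:5], a[5:]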
There are probably some small things that could make this a little faster still. If id/ids has some structure, it should be possible to simplify the code and maybe use a different approach to achieve a much larger speedup (see the sketch below). Otherwise the speedup of this code should be large, as long as there are many (unique) ids (and the x/id arrays are not very small). Note that the code enforces np.unique(ids), which is probably a good assumption anyway.
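As a hedged sketch of the structured case (not part of the solution above): if id is known to contain only integers in 0..n-1, ufunc.at (NumPy >= 1.8) makes the whole thing a one-liner, though ufunc.at is often slow in practice, so this buys simplicity more than speed:

    def max_at_small_ids(x, id, n):
        # assumes id contains only integers in 0..n-1
        max_x = np.full(n, -np.inf)
        np.fmax.at(max_x, id.ravel(), x.ravel())
        return np.arange(n), max_x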