I am testing the following simple example (see the comments in the code below for background). I have two questions. Thanks.
- How come 'b' in bottle is not updated even though the for loop did calculate the right value?
- Is there an easier way to do this without using a for loop? I have heard that loops can take a long time to run when the data is bigger than this simple example.

    import pandas as pd

    test = pd.DataFrame(
        [[1, 5], [1, 8], [1, 9], [2, 1], [3, 1], [4, 1]],
        columns=['a', 'b']
    )  # Original df
    bottle = pd.DataFrame().reindex_like(test)  # a blank df with the same shape
    bottle['a'] = test['a']  # set 'a' in bottle to be the same as in test
    print(bottle)

       a   b
    0  1 NaN
    1  1 NaN
    2  1 NaN
    3  2 NaN
    4  3 NaN
    5  4 NaN

    for index, row in bottle.iterrows():
        row['b'] = test[test['a'] == row['a']]['b'].sum()
        print(row['a'], row['b'])

    1.0 22.0
    1.0 22.0
    1.0 22.0
    2.0 1.0
    3.0 1.0
    4.0 1.0

    # I can see the for loop is doing what I need.
    bottle

       a   b
    0  1 NaN
    1  1 NaN
    2  1 NaN
    3  2 NaN
    4  3 NaN
    5  4 NaN

    # However, 'b' in bottle is not updated by the for loop. Why? And how to fix that?
    test['c'] = bottle['b']
    # This is the end output I want to get, but it is not working due to the above.
    # Also, is there a way to achieve this without using a for loop?
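
For reference, here is a minimal sketch of one loop-free way to get the desired 'c' column. It assumes the goal is "for every row, the sum of 'b' over all rows that share the same 'a'", and it uses `groupby(...).transform('sum')` as one possible approach rather than the only one. As for the loop itself: `iterrows()` yields each row as a separate Series, and the pandas documentation warns that writing to that row may have no effect on the original DataFrame, which would explain why the printed values look right while bottle stays NaN. Assigning through the frame inside the loop, for example `bottle.loc[index, 'b'] = ...`, would write to bottle directly.

```python
import pandas as pd

# Same data as in the example above.
test = pd.DataFrame(
    [[1, 5], [1, 8], [1, 9], [2, 1], [3, 1], [4, 1]],
    columns=['a', 'b'],
)

# Sum 'b' within each group of equal 'a' values, then broadcast the
# group total back onto every row of that group (same index as test).
test['c'] = test.groupby('a')['b'].transform('sum')

print(test)
```

With this approach the intermediate bottle frame is not needed, since the result is aligned to the index of test and can be assigned to it directly.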