I am trying to replicate the following pandas logic using NumPy vectorization.

I also suspect there is a more Pythonic, less verbose way to add the Actual Available column, one that does not require first creating the two intermediate variables series_1 and series_2.

The logic behind [Actual Available] is:

  • if [Is First?] column is True then [Actual Available] = [Stock] + [Requirements] + [Receipts],
  • if [Is First?] column is False then [Actual Available] = [Prev row of Actual Available] + [Requirements] + [Receipts]

Any ideas?

import pandas as pd
import numpy as np

df = pd.DataFrame({
    "Material": ["ABC", "ABC", "ABC", "ABC", "XYZ", "XYZ", "XYZ"],
    "Plant": [2685, 2685, 2685, 2685, 2685, 2685, 2685],
    "Year": ["2020", "2020", "2020", "2020", "2020", "2020", "2020"],
    "Week": [1, 2, 3, 4, 1, 2, 3],
    "Stock": [30, 30, 30, 30, 70, 70, 70],
    "Requirements": [10, 15, 20, 25, 20, 30, 40],
    "Receipts": [1, 2, 3, 4, 11, 12, 13]
})

print(df)

# Add [Is First?] column
df["Is First?"] = np.where(
    (df["Material"] == df["Material"].shift(1)) &
    (df["Plant"] == df["Plant"].shift(1)),
    False,
    True,
)

# Add [Actual Available] column
df["Actual Available"] = (df["Stock"] + df["Requirements"] +
                          df["Receipts"]).where(df["Is First?"].eq(True))

series_1 = df["Is First?"].eq(True).cumsum()
series_2 = (df["Actual Available"].ffill() +
            (df["Receipts"] +
             df["Requirements"]).shift(-1).groupby(series_1).cumsum().shift())

df["Actual Available"] = df["Actual Available"].fillna(series_2)

print(df)
  • What do you think adding .eq(True) does? Why the False and True in numpy.where()? Commented Feb 8, 2020 at 21:45

1 Answer


Starting from your initial DataFrame, all of this logic reduces to a groupby + cumsum of 'Requirements' + 'Receipts', added to the 'Stock' column. This works because 'Stock' is already repeated throughout each group, so the first-row value is available on every row.

df["Actual Available"] = df["Stock"] + df.groupby(["Material", "Plant"])[["Requirements", "Receipts"]].cumsum().sum(axis=1)

  Material  Plant  Year  Week  Stock  Requirements  Receipts  Actual Available
0      ABC   2685  2020     1     30            10         1                41
1      ABC   2685  2020     2     30            15         2                58
2      ABC   2685  2020     3     30            20         3                81
3      ABC   2685  2020     4     30            25         4               110
4      XYZ   2685  2020     1     70            20        11               101
5      XYZ   2685  2020     2     70            30        12               143
6      XYZ   2685  2020     3     70            40        13               196
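
As a sanity check, the one-liner can be compared against a plain row-by-row loop that applies the two rules from the question literally. This is only a sketch: the loop and the `expected` and `prev_key` names are illustrative, and it assumes the rows of each (Material, Plant) group are contiguous, as they are in the sample data.

```python
import pandas as pd

df = pd.DataFrame({
    "Material": ["ABC", "ABC", "ABC", "ABC", "XYZ", "XYZ", "XYZ"],
    "Plant": [2685, 2685, 2685, 2685, 2685, 2685, 2685],
    "Year": ["2020", "2020", "2020", "2020", "2020", "2020", "2020"],
    "Week": [1, 2, 3, 4, 1, 2, 3],
    "Stock": [30, 30, 30, 30, 70, 70, 70],
    "Requirements": [10, 15, 20, 25, 20, 30, 40],
    "Receipts": [1, 2, 3, 4, 11, 12, 13],
})

# Row-by-row reference: start from Stock on the first row of a group,
# otherwise continue from the previous row's result.
expected = []
prev_key = None
for row in df.itertuples(index=False):
    key = (row.Material, row.Plant)
    base = row.Stock if key != prev_key else expected[-1]
    expected.append(base + row.Requirements + row.Receipts)
    prev_key = key

# Grouped cumulative sum from the answer.
df["Actual Available"] = df["Stock"] + (
    df.groupby(["Material", "Plant"])[["Requirements", "Receipts"]]
    .cumsum()
    .sum(axis=1)
)

assert df["Actual Available"].tolist() == expected
```

The loop is slow on large frames and is only meant to confirm that the grouped version encodes the same recurrence.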

In terms of "vectorization", pandas is built on NumPy, so the performance is already there. pandas also goes the extra mile for many operations: DataFrameGroupBy.cumsum() has a fast path implemented in Cython, so it is already heavily optimized.
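
If a NumPy-only version is still wanted, one possible sketch is a single global cumulative sum with the running total at each group start subtracted back out. It assumes the rows of each (Material, Plant) group are contiguous, which the question's own [Is First?] logic also relies on. The array names below are illustrative, and this is not claimed to be faster than the pandas one-liner.

```python
import numpy as np

material = np.array(["ABC", "ABC", "ABC", "ABC", "XYZ", "XYZ", "XYZ"])
plant = np.array([2685, 2685, 2685, 2685, 2685, 2685, 2685])
stock = np.array([30, 30, 30, 30, 70, 70, 70])
requirements = np.array([10, 15, 20, 25, 20, 30, 40])
receipts = np.array([1, 2, 3, 4, 11, 12, 13])

# A row is "first" when its (material, plant) key differs from the previous row.
is_first = np.ones(len(material), dtype=bool)
is_first[1:] = (material[1:] != material[:-1]) | (plant[1:] != plant[:-1])

# Cumulative sum over the whole array, ignoring group boundaries for now.
delta = requirements + receipts
running = np.cumsum(delta)

# Running total accumulated before each group starts, broadcast to its rows.
start_idx = np.flatnonzero(is_first)
group_id = np.cumsum(is_first) - 1
offset = (running[start_idx] - delta[start_idx])[group_id]

# Stock of the first row of each group, broadcast to the group's rows.
base_stock = stock[start_idx][group_id]

actual_available = base_stock + running - offset
print(actual_available)
```

Subtracting `offset` is what resets the cumulative sum at each group boundary. For unsorted data the rows would need to be sorted by key first, which is the bookkeeping that groupby handles automatically.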
