I have a Series such as the following:

import numpy as np
import pandas as pd

example = pd.Series([[1.0, 1209.75, 1207.25],
                     [1.0, 1211.0, 1207.5],
                     [-1.0, 1211.25, 1205.75],
                     [0, 1207.25, 1206.0],
                     [1.0, 1206.25, 1201.0],
                     [-1.0, 1205.75, 1202.75],
                     [0, 1205.5, 1203.75]])

Each cell of this Series holds a list of 3 numbers. I turn it into a DataFrame and add a new column:

example = example.to_frame(name="input")
example["result"] = np.nan

Now I would like to perform the following operation on it:

example["result"] = example["input"].apply(lambda x,y,z: y if x==1 else z if x==-1 else NaN)

When I try this, I get the following error message: missing 2 required positional arguments: 'y' and 'z'

2 Answers

apply passes each cell to the lambda as a single argument, which in this case is a list, so the lambda can only take one parameter. Simply index the list:

>>> example["result"] = example["input"].apply(lambda lst: lst[1] if lst[0]==1 else lst[2] if lst[0]==-1 else np.nan)
>>> example
                      input   result
0   [1.0, 1209.75, 1207.25]  1209.75
1     [1.0, 1211.0, 1207.5]  1211.00
2  [-1.0, 1211.25, 1205.75]  1205.75
3      [0, 1207.25, 1206.0]      NaN
4    [1.0, 1206.25, 1201.0]  1206.25
5  [-1.0, 1205.75, 1202.75]  1202.75
6      [0, 1205.5, 1203.75]      NaN
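
To see why the original three-parameter lambda fails, here is a minimal sketch. The small Series `s` and the variable names `error` and `picked` are made up for illustration; the point is only that apply hands each cell over as one positional argument.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: each cell is a 3-element list.
s = pd.Series([[1.0, 10.0, 20.0], [-1.0, 30.0, 40.0], [0, 50.0, 60.0]])

# A three-parameter lambda fails because apply passes the whole list
# as a single argument, leaving y and z unfilled.
try:
    s.apply(lambda x, y, z: y)
    error = None
except TypeError as exc:
    error = str(exc)

# A one-parameter lambda receives the list and can index into it.
picked = s.apply(
    lambda lst: lst[1] if lst[0] == 1 else lst[2] if lst[0] == -1 else np.nan
)
```

Here `error` holds the same "missing 2 required positional arguments" message as in the question, while `picked` contains the selected values.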

On a lighter note, you could refactor the nested ternary operators into a function with an if/elif/else chain, so your code is more readable:

def func(lst):
    x, y, z = lst
    if x == 1:
        return y
    elif x == -1:
        return z
    else:
        return np.nan


example["result"] = example["input"].apply(func)
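
The unpacking `x, y, z = lst` inside the function is just another way of writing `lst[0]`, `lst[1]`, `lst[2]`. As a rough check, the sketch below applies both styles to a few sample rows and compares the results; the names `by_index`, `by_unpacking` and `data` are hypothetical.

```python
import numpy as np
import pandas as pd

def by_index(lst):
    # Indexing style, as in the lambda version.
    if lst[0] == 1:
        return lst[1]
    elif lst[0] == -1:
        return lst[2]
    return np.nan

def by_unpacking(lst):
    # Unpacking style: binds the three list elements to x, y, z.
    x, y, z = lst
    if x == 1:
        return y
    elif x == -1:
        return z
    return np.nan

data = pd.Series([[1.0, 1209.75, 1207.25],
                  [-1.0, 1211.25, 1205.75],
                  [0, 1207.25, 1206.0]])

a = data.apply(by_index)
b = data.apply(by_unpacking)
same = a.equals(b)  # Series.equals treats NaNs in the same position as equal
```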

4 Comments

Yes I found it too just now... Sorry guys. But funny how very often wording my question is enough for me to find the answer... Anyway, thanks! What do you mean with your comment? What would be your suggestion?
Thanks a lot. In the case of the function, why are we using x, y, z instead of x[0], x[1], x[2] as in the lambda? Are functions and lambdas not supposed to be equivalent?
I'm passing lst as the parameter, not x. Just a change of name
Ok yes I had missed that one. Thanks

Here is a vectorized solution:

In [30]: example
Out[30]:
                      input
0   [1.0, 1209.75, 1207.25]
1     [1.0, 1211.0, 1207.5]
2  [-1.0, 1211.25, 1205.75]
3      [0, 1207.25, 1206.0]
4    [1.0, 1206.25, 1201.0]
5  [-1.0, 1205.75, 1202.75]
6      [0, 1205.5, 1203.75]

In [31]: example['result'] = np.where(np.isclose(example.input.str[0], 1),
    ...:                              example.input.str[1],
    ...:                              np.where(np.isclose(example.input.str[0], -1),
    ...:                                       example.input.str[2],
    ...:                                       np.nan))
    ...:

In [32]: example
Out[32]:
                      input   result
0   [1.0, 1209.75, 1207.25]  1209.75
1     [1.0, 1211.0, 1207.5]  1211.00
2  [-1.0, 1211.25, 1205.75]  1205.75
3      [0, 1207.25, 1206.0]      NaN
4    [1.0, 1206.25, 1201.0]  1206.25
5  [-1.0, 1205.75, 1202.75]  1202.75
6      [0, 1205.5, 1203.75]      NaN
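
As a possible variation (not part of the answer above), the lists can first be expanded into separate columns, after which `np.select` replaces the nested `np.where` calls. This is only a sketch; the column names `flag`, `high` and `low` are invented for illustration, and it assumes every list has exactly three elements.

```python
import numpy as np
import pandas as pd

example = pd.DataFrame({"input": [[1.0, 1209.75, 1207.25],
                                  [-1.0, 1211.25, 1205.75],
                                  [0, 1207.25, 1206.0]]})

# Expand each 3-element list into three numeric columns (names are illustrative).
parts = pd.DataFrame(example["input"].tolist(),
                     columns=["flag", "high", "low"],
                     index=example.index)

# The first matching condition wins; rows matching neither get NaN.
example["result"] = np.select(
    [np.isclose(parts["flag"], 1), np.isclose(parts["flag"], -1)],
    [parts["high"], parts["low"]],
    default=np.nan,
)
```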

4 Comments

This does not handle the case where example.str[0] is -1
@MaxU This is interesting except for the .isclose that is a bit unfortunate
@MosesKoledoye, thank you for pointing it out! I've corrected my answer
@jimbasquiat, when using float dtype it's better to compare values using isclose or allclose
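
A small sketch of why `isclose` is the safer habit with floats. The flags 1.0 and -1.0 in this question are exactly representable, so `==` would also work here, but values that come out of arithmetic often are not:

```python
import numpy as np

a = 0.1 + 0.2

# Exact comparison fails because of binary floating-point rounding.
exact = (a == 0.3)

# np.isclose compares within a small tolerance instead.
close = bool(np.isclose(a, 0.3))
```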
