Say I have this:

import polars as pl
import polars.selectors as cs

df = pl.from_repr("""
┌─────┬─────┬─────┐
│ j   ┆ k   ┆ l   │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ 71  ┆ 79  ┆ 67  │
│ 26  ┆ 42  ┆ 55  │
│ 12  ┆ 43  ┆ 85  │
│ 92  ┆ 96  ┆ 14  │
│ 95  ┆ 26  ┆ 62  │
│ 75  ┆ 14  ┆ 56  │
│ 61  ┆ 41  ┆ 75  │
│ 74  ┆ 97  ┆ 70  │
│ 73  ┆ 32  ┆ 10  │
│ 66  ┆ 98  ┆ 40  │
└─────┴─────┴─────┘
""")

and I want to apply the same when/then/otherwise condition on multiple columns:

df.select(
    pl.when(cs.numeric() < 50)
      .then(1)
      .otherwise(2)
)

This fails with:

DuplicateError: the name 'literal' is duplicate

How do I make this use the currently selected column as the alias? I.e. I want the equivalent of this:

df.select(
    pl.when(pl.col(c) < 50)
      .then(1)
      .otherwise(2)
      .alias(c)
    for c in df.columns
)
shape: (10, 3)
┌─────┬─────┬─────┐
│ j   ┆ k   ┆ l   │
│ --- ┆ --- ┆ --- │
│ i32 ┆ i32 ┆ i32 │
╞═════╪═════╪═════╡
│ 2   ┆ 2   ┆ 2   │
│ 1   ┆ 1   ┆ 2   │
│ 1   ┆ 1   ┆ 2   │
│ 2   ┆ 2   ┆ 1   │
│ 2   ┆ 1   ┆ 2   │
│ 2   ┆ 1   ┆ 2   │
│ 2   ┆ 1   ┆ 2   │
│ 2   ┆ 2   ┆ 2   │
│ 2   ┆ 1   ┆ 1   │
│ 2   ┆ 2   ┆ 1   │
└─────┴─────┴─────┘

1 Answer

DuplicateError

The issue is that the output name of a when/then expression is taken from the first .then() branch.

In this case 1 is parsed as pl.lit(1), which has the default name literal.

pl.when(cs.numeric() < 50).then(1)

You can think of it as there being an implicit alias() call with the name from the then branch.

pl.when(cs.numeric() < 50).then(pl.lit(1)).alias("literal")

Expression expansion then turns this into 3 separate expressions that all share the same output name, so you get a DuplicateError.

pl.when(pl.col.j < 50).then(pl.lit(1)).alias("literal"),
pl.when(pl.col.k < 50).then(pl.lit(1)).alias("literal"),
pl.when(pl.col.l < 50).then(pl.lit(1)).alias("literal")

Keep name with a 'literal' then

.name.keep() can be added to use the column name as the output name instead.

df.select(
   pl.when(cs.numeric() < 50)
     .then(1)
     .otherwise(2)
     .name.keep()
)
shape: (10, 3)
┌─────┬─────┬─────┐
│ j   ┆ k   ┆ l   │
│ --- ┆ --- ┆ --- │
│ i32 ┆ i32 ┆ i32 │
╞═════╪═════╪═════╡
│ 2   ┆ 2   ┆ 2   │
│ 1   ┆ 1   ┆ 2   │
│ 1   ┆ 1   ┆ 2   │
│ 2   ┆ 2   ┆ 1   │
│ 2   ┆ 1   ┆ 2   │
│ 2   ┆ 1   ┆ 2   │
│ 2   ┆ 1   ┆ 2   │
│ 2   ┆ 2   ┆ 2   │
│ 2   ┆ 1   ┆ 1   │
│ 2   ┆ 2   ┆ 1   │
└─────┴─────┴─────┘

Keep name with a 'column' then

As noted in the comments, using a col() inside then() still raises an error even with .name.keep(), because all of the expanded expressions end up with the output name j.

df.select(
    pl.when(cs.numeric() < 50).then(pl.col.j).otherwise(2).name.keep()
)
# DuplicateError: projections contained duplicate output name 'j'.

What you can do is nest another when/then inside your then() branch.

df.select(
    pl.when(cs.numeric() < 50)
      .then(pl.when(False).then(cs.numeric() < 50).otherwise(pl.col.j))
      .otherwise(2)
)
shape: (10, 3)
┌─────┬─────┬─────┐
│ j   ┆ k   ┆ l   │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ 2   ┆ 2   ┆ 2   │
│ 26  ┆ 26  ┆ 2   │
│ 12  ┆ 12  ┆ 2   │
│ 2   ┆ 2   ┆ 92  │
│ 2   ┆ 95  ┆ 2   │
│ 2   ┆ 75  ┆ 2   │
│ 2   ┆ 61  ┆ 2   │
│ 2   ┆ 2   ┆ 2   │
│ 2   ┆ 73  ┆ 73  │
│ 2   ┆ 2   ┆ 66  │
└─────┴─────┴─────┘

You need the "column selector" inside the then() branch in order to retain the name.

pl.when(False) can be used to "broadcast" column j values into the other columns while keeping their name.

df.select(pl.when(False).then(cs.numeric() < 50).otherwise(pl.col.j))
shape: (10, 3)
┌─────┬─────┬─────┐
│ j   ┆ k   ┆ l   │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ 71  ┆ 71  ┆ 71  │
│ 26  ┆ 26  ┆ 26  │
│ 12  ┆ 12  ┆ 12  │
│ 92  ┆ 92  ┆ 92  │
│ 95  ┆ 95  ┆ 95  │
│ 75  ┆ 75  ┆ 75  │
│ 61  ┆ 61  ┆ 61  │
│ 74  ┆ 74  ┆ 74  │
│ 73  ┆ 73  ┆ 73  │
│ 66  ┆ 66  ┆ 66  │
└─────┴─────┴─────┘

Technically you only need .then(cs.numeric()) for the "inner" when, but I've just repeated the predicate.

3 Comments

Ah, I was not aware of the .name namespace. Thanks!
Hm... This doesn't seem to work when .then uses a col. E.g. changing to .then(polars.col('j')) causes polars.exceptions.DuplicateError: the name: 'j' is duplicate. Similarly .otherwise(polars.col('k')) fails
Apologies for such a late reply. I've updated the answer to show how you can do that.
