'numpy.ndarray' object has no attribute 'groupby'

Question

I am trying to apply target encoding to categorical features using the category_encoders.TargetEncoder in Python. However, I keep getting the following error:

AttributeError: 'numpy.ndarray' object has no attribute 'groupby'

from category_encoders import TargetEncoder
from sklearn.model_selection import train_test_split

# Features for target encoding
encoding_cols = ['grade', 'sub_grade', 'home_ownership', 'verification_status', 
                 'purpose', 'application_type', 'zipcode']

# Train-Test Split
X_train_cv, X_test, y_train_cv, y_test = train_test_split(x, y, test_size=0.25, random_state=1)
X_train, X_test_cv, y_train, y_test_cv = train_test_split(X_train_cv, y_train_cv, test_size=0.25, random_state=1)

# Initialize the Target Encoder
encoder = TargetEncoder()

# Apply Target Encoding
for i in encoding_cols:
    X_train[i] = encoder.fit_transform(X_train[i], y_train)  # **Error occurs here**
    X_test_cv[i] = encoder.transform(X_test_cv[i])
    X_test[i] = encoder.transform(X_test[i])

I want to apply target encoding to the categorical columns without hitting the 'numpy.ndarray' object has no attribute 'groupby' error.

Always include the full error message, because it contains other useful information.
– furas, Commented Mar 4 at 8:19
Maybe it needs a pandas.DataFrame, because that has a groupby function.
– furas, Commented Mar 4 at 8:19
I tried running TargetEncoder with different objects (DataFrame, list, numpy.array) and it always works; I can't reproduce the problem with simple code. Maybe later I will try to run your Colab code. For now, you could use print() to check the type() of the data before fit_transform. That may explain what causes the problem.
– furas, Commented Mar 4 at 16:01
(1) Always include the full error message, because it contains other useful information. (2) The code in your Colab differs slightly from the code in your question, and that can make a difference. Always show the code that gives you the error. (3) You could add the link to the question; it will be more visible, so more people may help you.
– furas, Commented Mar 4 at 16:07
Please try to provide a minimal reproducible example. When I run most of your code I get the error you report, but when I run just the data import, split, target definition, and encoder fit (without specifying columns), it works fine.
– Ben Reiniger, Commented Mar 5 at 2:27

desertnaut · Accepted Answer · 2025-03-15 18:58:46Z

2

This is interesting. I can reproduce your error.

It is related to the dtype of y_train. To solve the issue, convert the target to a plain list and build a new Series from it, setting the name and index explicitly.

y_train = pd.Series(y_train.tolist(), name='loan_status', index=y_train.index)

This converts the target's dtype from CategoricalDtype(categories=[1, 0], ordered=False, categories_dtype=int64) to dtype('int64').
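To see the dtype change in isolation, here is a minimal sketch using only pandas. The toy labels and index values are made up for illustration; only the name 'loan_status' and the conversion line come from the answer.

```python
import pandas as pd

# Hypothetical stand-in for the asker's target: a categorical Series of 0/1 labels.
y_train = pd.Series([1, 0, 0, 1], index=[10, 11, 12, 13], name="loan_status").astype(
    pd.CategoricalDtype(categories=[1, 0])
)
print(y_train.dtype)  # category

# Round-trip through a plain Python list to drop the categorical dtype,
# keeping the original index and name.
y_fixed = pd.Series(y_train.tolist(), name="loan_status", index=y_train.index)
print(y_fixed.dtype)  # int64
```

Printing the dtype before and after like this is also a quick way to check whether your own target is affected.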

So your last cell in the Colab is now:

import category_encoders as ce
import pandas as pd

# Initialize TargetEncoder
encoder = ce.TargetEncoder(cols=encoding_cols)

# Convert the target to a list and back to a Series (drops the categorical dtype)
y_train = pd.Series(y_train.tolist(), name='loan_status', index=y_train.index)

# Fit and transform the training data
X_train = encoder.fit_transform(X_train, y_train)

and this works fine.
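For intuition about why the error message mentions groupby: target encoding replaces each category with a statistic of the target computed per category, which in pandas is naturally a groupby. The sketch below is a hand-rolled mean encoding on made-up data, not the library's implementation (TargetEncoder additionally applies smoothing, so its numbers will differ). It only shows that the operation needs a pandas object rather than a NumPy array.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the values are illustrative only.
X = pd.DataFrame({"grade": ["A", "B", "A", "C", "B", "A"]})
y = pd.Series([1, 0, 0, 1, 1, 1], name="loan_status")

# Per-category mean of the target, computed with groupby on a pandas Series.
category_means = y.groupby(X["grade"]).mean()

# Map each row's category to its mean; unseen categories fall back to the global mean.
encoded = X["grade"].map(category_means).fillna(y.mean())
print(encoded.tolist())

# A NumPy array has no groupby method, which is what the error message reports.
has_groupby = hasattr(np.asarray(y), "groupby")
print(has_groupby)  # False
```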

edited Mar 15 at 18:58

desertnaut

answered Mar 5 at 15:25

seralouk

3 Comments

Ben Reiniger Mar 5 at 22:52

Can you identify the issue, by inspecting y_train before and after the conversion? Something about dtypes maybe?

seralouk Mar 6 at 14:52

added more clarity to my answer

Ben Reiniger Mar 6 at 15:35

Nice! Made a github issue for them: github.com/scikit-learn-contrib/category_encoders/issues/453

desertnaut · Accepted Answer · 2025-03-15 21:45:30Z

1

I'm the maintainer of Category Encoders. There was a problem in the library; I've fixed it in version 2.8.1.
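If you are unsure whether your environment already has the fix, a rough check like the following may help. It is a sketch: it assumes the installed distribution is named category_encoders, and the helper names parse_version and has_fix are hypothetical. The parsing is deliberately simple and ignores pre-release suffixes.

```python
import re
from importlib.metadata import PackageNotFoundError, version

FIXED_IN = (2, 8, 1)  # version named in the answer above


def parse_version(text):
    """Turn a version string such as '2.8.1' into a tuple of leading integers."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def has_fix(installed):
    """Return True if the given version string is at least 2.8.1."""
    return parse_version(installed) >= FIXED_IN


try:
    installed = version("category_encoders")  # assumed distribution name
    print(installed, "includes the fix" if has_fix(installed) else "predates the fix")
except PackageNotFoundError:
    print("category_encoders is not installed")
```

If the installed version predates the fix, upgrading the package (for example with pip install --upgrade category_encoders) should make the list round-trip workaround unnecessary.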

edited Mar 15 at 21:45

desertnaut

answered Mar 15 at 16:18

Paul

1 Comment

Christoph Rackwitz Mar 15 at 23:01

thanks for the authoritative confirmation and the fix!
