I am trying to apply target encoding to categorical features using the category_encoders.TargetEncoder in Python. However, I keep getting the following error:

AttributeError: 'numpy.ndarray' object has no attribute 'groupby'

from category_encoders import TargetEncoder
from sklearn.model_selection import train_test_split

# Features for target encoding
encoding_cols = ['grade', 'sub_grade', 'home_ownership', 'verification_status', 
                 'purpose', 'application_type', 'zipcode']

# Train-Test Split
X_train_cv, X_test, y_train_cv, y_test = train_test_split(x, y, test_size=0.25, random_state=1)
X_train, X_test_cv, y_train, y_test_cv = train_test_split(X_train_cv, y_train_cv, test_size=0.25, random_state=1)

# Initialize the Target Encoder
encoder = TargetEncoder()

# Apply Target Encoding
for i in encoding_cols:
    X_train[i] = encoder.fit_transform(X_train[i], y_train)  # **Error occurs here**
    X_test_cv[i] = encoder.transform(X_test_cv[i])
    X_test[i] = encoder.transform(X_test[i])

I want to apply target encoding to the categorical columns without hitting the 'numpy.ndarray' object has no attribute 'groupby' error.

Comments

  • Always post the full error message, because it contains other useful information. Commented Mar 4 at 8:19
  • Maybe it needs a pandas.DataFrame, because that has a groupby method. Commented Mar 4 at 8:19
  • I tried running TargetEncoder with different objects (DataFrame, list, numpy.array) and it always works; I can't reproduce the problem with simple code. Maybe later I will try to run your Colab code. For now, you could use print() to check the type() of the data before fit_transform. That may explain what causes the problem. Commented Mar 4 at 16:01
  • (1) Always post the full error message, because it contains other useful information. (2) Your Colab code is slightly different from the code in your question, which can make a difference. Always show the code that gives you the error. (3) You could add the link to the question itself; it will be more visible there, so more people may help you. Commented Mar 4 at 16:07
  • Please try to provide a minimal reproducible example. When I run most of your code I get the error you report, but when I run just the data import, split, target definition, and encoder fit (without specifying columns), it works fine. Commented Mar 5 at 2:27

2 Answers


This is interesting. I can reproduce your error.

It is related to the dtype of y_train. To work around the issue, rebuild the Series from its list values, setting the name and index explicitly:

y_train = pd.Series(y_train.tolist(), name='loan_status', index=y_train.index)

This converts y_train from its initial dtype, CategoricalDtype(categories=[1, 0], ordered=False, categories_dtype=int64), to dtype('int64').
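To see the dtype change in isolation, here is a minimal sketch using only pandas. The data is made up for illustration (the real y_train comes from the Colab notebook), and the astype("int64") variant is an alternative I am assuming is equivalent for integer categories, not something taken from the notebook.

```python
import pandas as pd

# Hypothetical stand-in for the real target column from the Colab notebook.
y_train = pd.Series(
    [1, 0, 0, 1], index=[10, 11, 12, 13], name="loan_status"
).astype("category")
print(y_train.dtype)  # category

# The workaround from this answer: rebuild the Series from plain Python values,
# keeping the original index so it still aligns with X_train.
y_fixed = pd.Series(y_train.tolist(), name="loan_status", index=y_train.index)
print(y_fixed.dtype)  # int64

# Assumed-equivalent alternative: cast the categorical column directly.
y_alt = y_train.astype("int64")
print(y_fixed.equals(y_alt))
```

Either way, the point is that the target passed to the encoder is a plain numeric Series rather than a categorical one.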

So your last cell in the Colab is now:

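# Imports (already present earlier in the notebook)
import category_encoders as ce
import pandas as pd

# Initialize TargetEncoder
encoder = ce.TargetEncoder(cols=encoding_cols)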

# Here is the list conversion and back to series
y_train = pd.Series(y_train.tolist(), index=y_train.index)

# Fit and transform the training data
X_train = encoder.fit_transform(X_train, y_train)

and this works fine.
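For intuition about why the error mentions groupby, and about how an encoding fitted on the training split carries over to the validation and test splits, here is a simplified pandas-only sketch of mean target encoding. It is not the library's implementation (category_encoders also applies smoothing, which is omitted here), and the column values are hypothetical.

```python
import pandas as pd

# Hypothetical toy data; only the column names echo the question.
train = pd.DataFrame(
    {"grade": ["A", "A", "B", "B", "B"], "loan_status": [1, 0, 1, 1, 0]}
)
test = pd.DataFrame({"grade": ["A", "B", "C"]})

# "Fit": learn one number per category from the training split only.
global_mean = train["loan_status"].mean()
category_means = train.groupby("grade")["loan_status"].mean()

# "Transform": map categories to the learned means. Categories that were
# never seen in training ("C" here) fall back to the global mean.
train_encoded = train["grade"].map(category_means)
test_encoded = test["grade"].map(category_means).fillna(global_mean)

print(test_encoded.tolist())
```

The same pattern applies with the real encoder: call fit_transform once on the training data, then transform on X_test_cv and X_test, so that no target information from those splits leaks into the encoding.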


3 Comments

Can you identify the issue, by inspecting y_train before and after the conversion? Something about dtypes maybe?
added more clarity to my answer

I'm the maintainer of Category Encoders. There was a problem in the library; it is now fixed in version 2.8.1.
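To check whether an environment already has the fix, something like the sketch below may help. The helper names (parse_version, has_fix) are made up for illustration, the distribution name "category_encoders" is assumed to match what pip installed, and the version parsing is deliberately naive (it only looks at the first three numeric parts).

```python
import re
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Naive parse: '2.8.1' -> (2, 8, 1). Only the first three numbers count."""
    return tuple(int(n) for n in re.findall(r"\d+", text)[:3])


def has_fix(installed, fixed="2.8.1"):
    """True if the installed version is at least the release with the fix."""
    return parse_version(installed) >= parse_version(fixed)


try:
    # Assumed distribution name; adjust if your installation differs.
    installed = version("category_encoders")
except PackageNotFoundError:
    installed = None

if installed is None:
    print("category_encoders is not installed")
elif has_fix(installed):
    print(f"category_encoders {installed} includes the fix")
else:
    print(f"category_encoders {installed} predates 2.8.1; consider upgrading")
```

If the installed version is older, upgrading (for example with pip install --upgrade category_encoders) should make the dtype workaround unnecessary.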

1 Comment

thanks for the authoritative confirmation and the fix!
