Random Forest evaluation for array with floats and integers - numpy

Question

I have an array of feature values (floats) and an array of labels, which are integers (0 or 1).

Example feature values:

[[  17.99    10.38   122.8   ...,    0.147    0.242    0.079]
 [  20.57    17.77   132.9   ...,    0.07     0.181    0.057]]

When I append the labels to the array of feature values, the labels become floats. Example: feature_values with a 0 appended:
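This happens because a NumPy array holds a single dtype, so stacking integer labels next to float features promotes the whole result to float. A minimal sketch, using a truncated version of the example rows (three feature columns only, chosen for illustration):

```python
import numpy as np

# Two rows of float features (truncated version of the example above)
features = np.array([[17.99, 10.38, 122.8],
                     [20.57, 17.77, 132.9]])
# Integer class labels, one per row
labels = np.array([0, 1])

# One array = one dtype: stacking int labels onto float features
# upcasts the combined array to float64
combined = np.column_stack((features, labels))
print(combined.dtype)    # float64
print(combined[:, -1])   # [0. 1.]

# Keeping the labels in a separate array preserves both dtypes
print(features.dtype, labels.dtype)
```

Keeping features and labels as two separate arrays (X and y) is the usual way to avoid the promotion entirely.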

[[  17.99    10.38   122.8   ...,    0.242    0.079    0.   ]]

When I run the following code:

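from sklearn import model_selection
from sklearn.ensemble import RandomForestClassifier

training_set = data_features[:,0:9]   # feature columns (X)
test_set = data_features[:,9]         # intended label column (y), still float here
seed = 7
num_trees = 100
max_features = 3
# newer scikit-learn versions raise an error if random_state is set without shuffle=True
kfold = model_selection.KFold(n_splits=10, shuffle=True, random_state=seed)
model = RandomForestClassifier(n_estimators=num_trees, max_features=max_features)
results = model_selection.cross_val_score(model, training_set, test_set, cv=kfold)
print(results.mean())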

I get an error:

raise ValueError("Unknown label type: %r" % y_type)

ValueError: Unknown label type: 'continuous'

From what I've read, I see that this is happening because the labels are floats.
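For context, scikit-learn classifiers decide the target type with sklearn.utils.multiclass.type_of_target. As far as I know, a float dtype alone does not make a target "continuous"; the float array also has to contain non-integer values. A quick check (the sample values are illustrative):

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Labels stored as floats, but with whole-number values
float_labels = np.array([0.0, 1.0, 1.0, 0.0])
# Floats with fractional parts, like a feature column (0.079, 0.057, ...)
feature_column = np.array([0.079, 0.057, 0.064])

print(type_of_target(float_labels))    # binary
print(type_of_target(feature_column))  # continuous
```

So when this error appears, it is worth checking that the column passed as y really holds the 0/1 labels.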

If I change the dtype of the feature values to int, the code does work, but I need to preserve the floats.

Is there any way to have labels as integers and feature values as floats so that the code works?

test_set = data_features[:,9].astype(int) should do the trick. – Vikash Singh, Commented Mar 6, 2017 at 17:31
But my test set is 10% from my training set, which is also floats. If I do .astype(int), it makes the test set zeros. – nanachan, Commented Mar 6, 2017 at 17:36
Oh, so you only need to convert one column to int. Got it, let me check. If it's a standard example, can you share more code or a link to it? – Vikash Singh, Commented Mar 6, 2017 at 17:38
It was actually my mistake: I put the labels into a separate array, and your solution works. Thank you! – nanachan, Commented Mar 6, 2017 at 17:41
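The "zeros" symptom in the comments is consistent with converting a feature column rather than the label column, because astype(int) truncates toward zero. A small sketch with hypothetical data, assuming the label was appended as the last column:

```python
import numpy as np

# Hypothetical data: two feature columns plus the label appended last
data = np.array([[17.99, 0.079, 0.0],
                 [20.57, 0.057, 1.0],
                 [19.69, 0.060, 1.0]])

# astype(int) truncates toward zero, so a feature column with values
# below 1 becomes all zeros: a sign the wrong column was selected
print(data[:, 1].astype(int))   # [0 0 0]

# Indexing the label from the end avoids off-by-one column slicing
X = data[:, :-1]                # float features
y = data[:, -1].astype(int)     # integer labels
print(X.shape, y)               # (3, 2) [0 1 1]
```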

Vikash Singh · Accepted Answer · 2017-03-06 17:42:38Z


You need to convert the labels (y) to integers so that RandomForestClassifier can train on them.

test_set = data_features[:,9].astype(int)
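An end-to-end sketch with float features and integer labels kept in separate arrays. Since the asker's data_features is not available, it uses scikit-learn's bundled breast cancer dataset as a stand-in (its first ten columns appear to match the example values, but that is an assumption); the forest settings mirror the question:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_score

# Stand-in data: first 10 feature columns (float64) and the 0/1 target
dataset = load_breast_cancer()
X = dataset.data[:, :10]          # features stay as floats
y = dataset.target.astype(int)    # labels as integers

seed = 7
# Recent scikit-learn raises a ValueError if random_state is set
# while shuffle is False, so shuffle=True is passed explicitly
kfold = KFold(n_splits=10, shuffle=True, random_state=seed)
model = RandomForestClassifier(n_estimators=100, max_features=3,
                               random_state=seed)
results = cross_val_score(model, X, y, cv=kfold)
print(results.mean())
```

Only the label array needs an integer dtype; the feature matrix can remain float throughout.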
