I'm trying to perform a dummy classification on a dataset for a school project. The goal is to get a sense of how frequently the different political parties give speeches. I wrote the code as follows:

from sklearn.dummy import DummyClassifier
import pandas as pd
import bz2


with bz2.open("data/ch3/speeches-201718.json.bz2") as source:
    speeches_201718 = pd.read_json(source)

with bz2.open("data/ch3/speeches-201819.json.bz2") as source:
    speeches_201819 = pd.read_json(source)


training_data, test_data = speeches_201718, speeches_201819

train_parties_count = training_data['party'].value_counts()
test_parties_count = test_data['party'].value_counts()
dummy_clf = DummyClassifier(strategy="most_frequent")

X = train_parties_count
y = train_parties_count.index
dummy_clf.fit(X.values, y)
print(X)
print(y)

test_parties_count.index = pd.CategoricalIndex(test_parties_count.index, categories=train_parties_count.index, ordered=True)
X_test = test_parties_count.sort_index()
print(X_test)
pred_mfc = dummy_clf.predict(X_test.values)

print("Sample of predictions [0-4]: ", pred_mfc[:5])

I get the following output: [image: printed output of the script]

As you can see, the prediction is C when it should be S. What could be wrong?

I have tried defining the train and test data in multiple ways with no success.

1 Answer

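The dummy estimators in sklearn are not intended for real problems; they exist to give baseline measures of performance using very simple rules. With strategy="most_frequent", the estimator ignores the input features and always predicts the label that occurs most often in the y passed to fit. In your case y is train_parties_count.index, which contains each party exactly once, so every label is tied and the first class in sorted order wins. That is why the estimator always outputs "C" regardless of the input.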
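To make the tie concrete, here is a minimal sketch with made-up labels (the arrays below are hypothetical stand-ins, not the speeches data). It assumes that in the question one would fit on one label per speech, i.e. something like training_data["party"], rather than on the unique parties.

```python
import numpy as np
from sklearn.dummy import DummyClassifier

# Hypothetical per-speech party labels, one entry per speech.
# In the question this would correspond to training_data["party"].
y_per_speech = np.array(["S", "M", "S", "C", "S", "M"])

# DummyClassifier ignores the features, so a placeholder column is enough.
X_placeholder = np.zeros((len(y_per_speech), 1))

clf = DummyClassifier(strategy="most_frequent")
clf.fit(X_placeholder, y_per_speech)
per_speech_pred = clf.predict(np.zeros((3, 1)))

# Fitting on the unique labels (as the question does) makes every class
# equally frequent, so the first class in sorted order is returned.
y_unique = np.array(["S", "M", "C"])
tied = DummyClassifier(strategy="most_frequent")
tied.fit(np.zeros((len(y_unique), 1)), y_unique)
tied_pred = tied.predict(np.zeros((3, 1)))

print(per_speech_pred)  # ['S' 'S' 'S']
print(tied_pred)        # ['C' 'C' 'C']
```

The value counts themselves never need to be passed to the classifier; it derives the class frequencies from y on its own.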

RandomForestClassifier is usually a good 'off-the-shelf' estimator. I'd suggest checking the training score after fitting to verify that the model is learning something. Then you can assess its performance on data it hasn't seen (a validation set).

To get an accuracy score, you can use my_classifier.score(X_data, y_data).
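As a rough sketch of that workflow, the snippet below compares a random forest against the most-frequent baseline. It uses synthetic data from make_classification as a stand-in, since real speeches would first have to be converted into numeric features; the sizes and parameters are arbitrary choices for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): 3 classes, 20 numeric features.
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=5, n_classes=3, random_state=0
)

# Hold out a validation set the models never see during fitting.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# score() returns mean accuracy for classifiers.
train_acc = forest.score(X_train, y_train)
val_acc = forest.score(X_val, y_val)
baseline_acc = baseline.score(X_val, y_val)

print(f"forest train accuracy:      {train_acc:.3f}")
print(f"forest validation accuracy: {val_acc:.3f}")
print(f"baseline validation accuracy: {baseline_acc:.3f}")
```

A large gap between the training and validation accuracy suggests overfitting, and a validation accuracy close to the baseline suggests the model has learned little beyond the class frequencies.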
