How to create feature columns for TensorFlow classifier

Question

I have a very simple dataset for binary classification in a CSV file, which looks like this:

"feature1","feature2","label"
1,0,1
0,1,0
...

where the "label" column indicates the class (1 is positive, 0 is negative). The number of features is actually pretty big, but that doesn't matter for this question.

Here is how I read the data:

train = pandas.read_csv(TRAINING_FILE)
y_train, X_train = train['label'], train[['feature1', 'feature2']].fillna(0)

test = pandas.read_csv(TEST_FILE)
y_test, X_test = test['label'], test[['feature1', 'feature2']].fillna(0)

I want to run tensorflow.contrib.learn.LinearClassifier and tensorflow.contrib.learn.DNNClassifier on that data. For instance, I initialize DNN like this:

classifier = DNNClassifier(hidden_units=[3, 5, 3],
                           n_classes=2,
                           feature_columns=feature_columns,  # ???
                           activation_fn=nn.relu,
                           enable_centered_bias=False,
                           model_dir=MODEL_DIR_DNN)

So how exactly should I create the feature_columns when all the features are also binary (0 or 1 are the only possible values)?

Here is the model training:

classifier.fit(X_train.values,
               y_train.values,
               batch_size=dnn_batch_size,
               steps=dnn_steps)

A solution that replaces the fit() parameters with an input function would also be great.

Thanks!

P.S. I'm using TensorFlow version 1.0.1

Unrelated to your question: you're filling missing values with 0, which I don't think is appropriate given that 0 is very meaningful in your dataset. It's a ground-truth label/class, and it's also a possible feature value. That means whenever you fill NAs, you're actually creating false training examples and assigning them to your negative (0) class.
– Simon, Commented Mar 23, 2017 at 2:55
@Simon Thanks for your comment! I realized that too. But I had done some feature engineering and I'm sure there are no missing values in the dataset.
– Ilia Kopylov, Commented Mar 23, 2017 at 2:59

Salem · Accepted Answer · 2018-07-17 09:24:21Z

7

You can use tf.feature_column.numeric_column directly:

feature_columns = [tf.feature_column.numeric_column(key=key) for key in X_train.columns]
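Each numeric column is identified by its key, so the features passed to the classifier by an input function need to be a mapping that uses the same names. The following is a minimal sketch of that data layout in plain Python, with no TensorFlow dependency. The helper name load_features_and_labels and the inline CSV_TEXT sample are hypothetical and exist only for illustration.

```python
import csv
import io

# Tiny stand-in for the CSV file from the question (assumed sample data).
CSV_TEXT = '"feature1","feature2","label"\n1,0,1\n0,1,0\n'


def load_features_and_labels(csv_text, label_name="label"):
    """Return (features, labels).

    features maps each feature column name to a list of floats;
    labels is a list of ints taken from the label column.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    # Every column except the label is treated as a numeric feature.
    feature_names = [name for name in reader.fieldnames if name != label_name]
    features = {name: [] for name in feature_names}
    labels = []
    for row in reader:
        for name in feature_names:
            features[name].append(float(row[name]))
        labels.append(int(row[label_name]))
    return features, labels


features, labels = load_features_and_labels(CSV_TEXT)

# These are the names that would be passed as key=... to numeric_column.
feature_keys = list(features)
```

In TensorFlow 1.x, a dict like this (with the lists converted to NumPy arrays) is roughly the shape of input that helpers such as tf.estimator.inputs.numpy_input_fn accept. Check that against the documentation for the version you have installed.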

edited Jul 17, 2018 at 9:24

answered May 23, 2018 at 15:34


Ilia Kopylov · Accepted Answer · 2017-03-23 02:21:03Z

3

I've just found the solution and it's pretty simple:

feature_columns = tf.contrib.learn.infer_real_valued_columns_from_input(X_train)

Apparently infer_real_valued_columns_from_input() also works well with binary (0/1) categorical variables, presumably because such values can simply be treated as real-valued inputs.

answered Mar 23, 2017 at 2:21
