How to create 2 column binary numpy array from string list?

Question

Input:

A string list like this:

['a', 'a', 'a', 'b', 'b', 'a', 'b']

Output I want:

A numpy array like this:

array([[ 1,  0],
       [ 1,  0],
       [ 1,  0],
       [ 0,  1],
       [ 0,  1],
       [ 1,  0],
       [ 0,  1]])

What I tried:

Try 1 - My starting data is actually stored as a single column in a CSV file, so I tried the following:

from numpy import genfromtxt
data1 = genfromtxt('csvname.csv', delimiter=',')

I did this because I thought I could manipulate the CSV data into the form I want once it was loaded into NumPy. However, every value comes back as nan (not a number). I'm not sure how else to go about this efficiently, because I need to do this for a large data set.
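For context, the nan values most likely come from genfromtxt parsing every field as a float by default, so labels such as 'a' and 'b' cannot be converted. A minimal sketch, assuming the file holds one label per row (the file contents are simulated here with io.StringIO, and the variable names are made up for illustration):

```python
import io
import numpy as np

# Stand-in for 'csvname.csv'; the contents are an assumption (one label per row).
csv_text = "a\na\na\nb\nb\na\nb\n"

# Default dtype is float, so non-numeric fields are read as nan.
as_float = np.genfromtxt(io.StringIO(csv_text), delimiter=',')

# Requesting strings keeps the labels intact.
labels = np.genfromtxt(io.StringIO(csv_text), delimiter=',', dtype=str)

print(as_float)
print(labels)
```

Once the labels are available as strings, they can be converted to the two-column form with any of the approaches in the answers.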

Try 2 - The inefficient method I was thinking of using:

For each element of the list, append [1,0] if it is 'a' and append [0,1] if it is 'b'.

Is there a better method?

The6thSense · Accepted Answer · 2016-01-08 06:54:28Z

4

Using List comprehension

Code:

import numpy
lst = ['a', 'a', 'a', 'b', 'b', 'a', 'b']
numpy.array([[1, 0] if val == "a" else [0, 1] for val in lst])

Output:

array([[1, 0],
       [1, 0],
       [1, 0],
       [0, 1],
       [0, 1],
       [1, 0],
       [0, 1]])

Note:

Rather than appending to a list or NumPy array one element at a time, building the whole list in a single comprehension is faster.


thundergolfer · Accepted Answer · 2016-01-08 07:08:00Z

3

Building List

import numpy as np
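lst = ['a', 'a', 'a', 'b', 'b', 'a', 'b']  # avoid shadowing the built-in `list`
# Each row is [is_a, is_b] as booleans; astype(int) turns True/False into 1/0
np.array([[ch == 'a', ch == 'b'] for ch in lst]).astype(int)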

Output

array([[1, 0],
       [1, 0],
       [1, 0],
       [0, 1],
       [0, 1],
       [1, 0],
       [0, 1]])
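One difference from the if/else comprehension is worth noting: the two behave differently on a label that is neither 'a' nor 'b'. A small sketch, assuming a hypothetical third label 'c' that does not appear in the question:

```python
import numpy as np

labels = ['a', 'b', 'c']  # 'c' is a hypothetical extra label

# Comparison version: each row is [is_a, is_b] as booleans.
bools = np.array([[ch == 'a', ch == 'b'] for ch in labels])
ints = bools.astype(int)          # True/False become 1/0

# if/else version: anything that is not 'a' is treated as 'b'.
else_version = np.array([[1, 0] if ch == 'a' else [0, 1] for ch in labels])

print(ints)          # the 'c' row is [0, 0]
print(else_version)  # the 'c' row is [0, 1]
```

If the data is guaranteed to contain only 'a' and 'b', both give the same result.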

Does this solve it for you?


4 Comments

thundergolfer Over a year ago

I didn't refresh the page to see I was second. Is my answer different enough to keep? Or do I delete my post when this happens?

pr338 Over a year ago

Yes I think it is different enough to keep! Thank you for your input!! Although both answers answer my question, who knows, your method may prove to be more useful for the next person who views this question.

The6thSense Over a year ago

@thundergolfer I feel that your answer may be more efficient than mine :). So just keep it.

The6thSense Over a year ago

And answering second or last does not matter; providing a better answer is what matters.

Divakar · Accepted Answer · 2016-01-08 07:36:52Z

2

NumPythonic vectorized method using np.unique -

((np.unique(A)[:,None] == A).T).astype(int)

Sample run -

In [9]: A
Out[9]: ['a', 'a', 'a', 'b', 'b', 'a', 'b']

In [10]: ((np.unique(A)[:,None] == A).T).astype(int)
Out[10]: 
array([[1, 0],
       [1, 0],
       [1, 0],
       [0, 1],
       [0, 1],
       [1, 0],
       [0, 1]])
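The one-liner above can be unpacked step by step to show where broadcasting happens. This is only an illustrative breakdown; the intermediate names (uniq, col, mask, result) are not part of the original answer:

```python
import numpy as np

A = np.array(['a', 'a', 'a', 'b', 'b', 'a', 'b'])

uniq = np.unique(A)          # sorted unique labels, shape (2,)
col = uniq[:, None]          # add an axis, shape (2, 1)
mask = col == A              # broadcasts to shape (2, 7), one row per label
result = mask.T.astype(int)  # transpose to (7, 2), then booleans to 0/1

print(uniq.shape, col.shape, mask.shape, result.shape)
print(result)
```

Because np.unique returns the labels sorted, column 0 corresponds to 'a' and column 1 to 'b'.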


2 Comments

The6thSense Over a year ago

I have already upvoted it, but I have two doubts: 1. Since there are only two values, a and b, why do you need np.unique? Isn't that overcomplicating things? 2. Is this more efficient than thunder's answer?

Divakar Over a year ago

@The6thSense Well thanks for the up! On the questions - 1) I am assuming OP has posted a sample case in the question, so there could be more than just a and b in it. 2) On efficiency, being a vectorized approach I would think this should be pretty fast, given enough unique letters to iterate with.
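To illustrate the point about more than two labels, here is a minimal sketch with a hypothetical third label 'c'. It also shows an alternative that indexes an identity matrix using the return_inverse output of np.unique; that alternative is not from the answer above, just another common way to get the same result:

```python
import numpy as np

A = np.array(['a', 'c', 'b', 'a'])  # 'c' is a hypothetical extra label

# Broadcasting approach from the answer: one column per unique label.
by_broadcast = (np.unique(A)[:, None] == A).T.astype(int)

# Alternative: map each label to its index, then pick rows of an identity matrix.
uniq, inv = np.unique(A, return_inverse=True)
one_hot = np.eye(len(uniq), dtype=int)[inv]

print(uniq)
print(one_hot)
```

Both versions produce one column per unique label, in sorted label order.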
