I have a pandas DataFrame with two columns: participant names and reaction times (note that one participant has more RT measurements than the other).

    ID RT
0  foo  1
1  foo  2
2  bar  3
3  bar  4
4  foo  1
5  foo  2
6  bar  3
7  bar  4
8  bar  4

I would like to get a 2d array from this where every row contains the reaction times for one participant.

[[1,2,1,2],
 [3,4,3,4,4]]

If a ragged shape like that is not possible, any of the following ways of getting a regular a x b shape would be acceptable for me: fill the missing elements with NaN; truncate the longer rows to the length of the shortest row; or fill the shorter rows with repeats of their mean value.

I would go for whatever is easiest to implement.

I have tried to sort this out by using groupby, and I expected it to be very easy to do this but it all gets terribly terribly messy :(

1 Answer

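import io

import pandas as pd

# io.StringIO wraps a text string; io.BytesIO rejects a str on Python 3
data = io.StringIO("""    ID RT
0  foo  1
1  foo  2
2  bar  3
3  bar  4
4  foo  1
5  foo  2
6  bar  3
7  bar  4
8  bar  4""")

# sep=r"\s+" splits on runs of whitespace (delim_whitespace=True is deprecated)
df = pd.read_csv(data, sep=r"\s+")
# Renumber each participant's measurements from 0, then pivot them into columns
df.groupby("ID").RT.apply(pd.Series.reset_index, drop=True).unstack()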

output:

    0  1  2  3   4
ID                 
bar  3  4  3  4   4
foo  1  2  1  2 NaN
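
Since the question asks for a 2D array and lists three acceptable shapes, here is a rough sketch of how each one could be derived. It swaps in a different technique from the one above (`cumcount` plus `pivot`), and the names `trial`, `wide`, `padded`, `truncated` and `mean_filled` are made up for illustration, so treat it as one possible approach rather than the canonical one.

```python
import io

import numpy as np
import pandas as pd

raw = """ID RT
foo 1
foo 2
bar 3
bar 4
foo 1
foo 2
bar 3
bar 4
bar 4"""
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# Number each measurement within its participant: 0, 1, 2, ...
# ("trial" is a hypothetical helper column, not part of the original data)
df["trial"] = df.groupby("ID").cumcount()

# One row per participant, one column per trial; missing trials become NaN
wide = df.pivot(index="ID", columns="trial", values="RT")

# Option 1: NaN-padded 2D array (float dtype, because NaN is a float)
padded = wide.to_numpy()

# Option 2: truncate every row to the length of the shortest participant
shortest = df.groupby("ID").size().min()
truncated = wide.iloc[:, :shortest].to_numpy()

# Option 3: fill each participant's gaps with that participant's own mean RT
mean_filled = wide.apply(lambda row: row.fillna(row.mean()), axis=1).to_numpy()
```

Rows come out in sorted ID order (bar, then foo), so keep `wide.index` around if you need to map rows back to participants.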

2 Comments

awesome! this seems to do exactly what I wanted! pity that pandas doesn't offer any more direct way for this functionality... One more thing: what if my column name of interest had a space in it ('reaction times' instead of 'RT')?
try: df.groupby("ID")["reaction times"]...
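
To spell out the bracket form from the comment above: attribute access like `.RT` only works when the column name is a valid Python identifier, so a name containing a space has to be selected with `["..."]`. A minimal sketch, assuming a made-up frame with a "reaction times" column (the data and the name `wide` are hypothetical), and reusing the same apply/unstack chain as the answer:

```python
import pandas as pd

# Hypothetical data; only the column name with a space matters here
df = pd.DataFrame({
    "ID": ["foo", "foo", "bar", "bar", "bar"],
    "reaction times": [1, 2, 3, 4, 4],
})

# Bracket selection works for any column name, including ones with spaces
wide = (
    df.groupby("ID")["reaction times"]
    .apply(pd.Series.reset_index, drop=True)
    .unstack()
)
```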
