Hello, I have a dataset that looks like this:

array([['1;"Female";133;132;124;"118";"64.5";816932'],
       ['2;"Male";140;150;124;".";"72.5";1001121'],
       ['3;"Male";139;123;150;"143";"73.3";1038437'],
       ['4;"Male";133;129;128;"172";"68.8";965353'],
       ['5;"Female";137;132;134;"147";"65.0";951545'],
       ['6;"Female";99;90;110;"146";"69.0";928799'],
       ['7;"Female";138;136;131;"138";"64.5";991305']], dtype=object)

I would like to convert it to a DataFrame with these columns:

columns = ["Gender", "FSIQ", "VIQ", "PIQ", "Weight", "Height", "MRI_Count"]

Note: within each array element, the row values are separated by semicolons (;). How can I organize this into a DataFrame with the column names above and the row values from the array?

  • What have you tried so far? Please post your code. Commented Feb 26, 2020 at 9:25

2 Answers

Create a DataFrame, select its single column, and use Series.str.split with expand=True to split it into new columns:

import numpy as np
import pandas as pd

a = np.array([['1;"Female";133;132;124;"118";"64.5";816932'],
       ['2;"Male";140;150;124;".";"72.5";1001121'],
       ['3;"Male";139;123;150;"143";"73.3";1038437'],
       ['4;"Male";133;129;128;"172";"68.8";965353'],
       ['5;"Female";137;132;134;"147";"65.0";951545'],
       ['6;"Female";99;90;110;"146";"69.0";928799'],
       ['7;"Female";138;136;131;"138";"64.5";991305']], dtype=object)

df = pd.DataFrame(a)[0].str.split(';', expand=True)
df.columns = ['ID',"Gender","FSIQ","VIQ","PIQ","Weight","Height","MRI_Count"]
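
At this point every cell is still a string, and the quoted fields keep their double quotes, which is why a cleanup step is needed. A minimal sketch, assuming a shortened two-row copy of the array (the names `sample` and `parts` are only for illustration):

```python
import numpy as np
import pandas as pd

# Shortened two-row copy of the array from the question (illustration only)
sample = np.array([['1;"Female";133;132;124;"118";"64.5";816932'],
                   ['2;"Male";140;150;124;".";"72.5";1001121']], dtype=object)

# Split the single column on ';' into eight columns
parts = pd.DataFrame(sample)[0].str.split(';', expand=True)
parts.columns = ['ID', 'Gender', 'FSIQ', 'VIQ', 'PIQ', 'Weight', 'Height', 'MRI_Count']

# The values are raw strings; quoted fields still carry their quotes
print(repr(parts.loc[0, 'Gender']))
print(repr(parts.loc[1, 'Weight']))
print(type(parts.loc[0, 'FSIQ']))
```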

Finally, some data cleaning: remove the surrounding double quotes with Series.str.strip and convert the columns to numeric with pd.to_numeric via DataFrame.apply:

df['Gender'] = df['Gender'].str.strip('"')
c = ["ID", "FSIQ","VIQ","PIQ","Weight","Height","MRI_Count"]
df[c] = df[c].apply(lambda x: pd.to_numeric(x.str.strip('"'), errors='coerce'))
print(df)
  ID  Gender  FSIQ  VIQ  PIQ  Weight  Height  MRI_Count
0  1  Female   133  132  124   118.0    64.5     816932
1  2    Male   140  150  124     NaN    72.5    1001121
2  3    Male   139  123  150   143.0    73.3    1038437
3  4    Male   133  129  128   172.0    68.8     965353
4  5  Female   137  132  134   147.0    65.0     951545
5  6  Female    99   90  110   146.0    69.0     928799
6  7  Female   138  136  131   138.0    64.5     991305
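
The NaN in Weight comes from errors='coerce': a value that cannot be parsed as a number, such as the "." placeholder, becomes NaN instead of raising an error, and the column ends up as float64. A small sketch on a hypothetical Series (the name `weights` and its values are made up to mirror the Weight column after the quotes are stripped):

```python
import pandas as pd

# Hypothetical values mirroring the Weight column once the quotes are removed
weights = pd.Series(['118', '.', '143'])

# Unparseable entries are replaced by NaN rather than raising a ValueError
converted = pd.to_numeric(weights, errors='coerce')
print(converted)
```

Without errors='coerce', the default behaviour is to raise on the "." entry, so the option is a deliberate choice to treat such placeholders as missing data.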

Another potential solution would be to use io.StringIO and pandas.read_csv. Just join each element in the array with a \n character:

from io import StringIO

import numpy as np
import pandas as pd

# Setup
a = np.array([['1;"Female";133;132;124;"118";"64.5";816932'],
       ['2;"Male";140;150;124;".";"72.5";1001121'],
       ['3;"Male";139;123;150;"143";"73.3";1038437'],
       ['4;"Male";133;129;128;"172";"68.8";965353'],
       ['5;"Female";137;132;134;"147";"65.0";951545'],
       ['6;"Female";99;90;110;"146";"69.0";928799'],
       ['7;"Female";138;136;131;"138";"64.5";991305']])

columns = ["Gender", "FSIQ", "VIQ", "PIQ", "Weight", "Height", "MRI_Count"]

df = pd.read_csv(StringIO('\n'.join(a.ravel())), header=None,
                 sep=';', names=columns, na_values=['.'])

[out]

   Gender  FSIQ  VIQ  PIQ  Weight  Height  MRI_Count
1  Female   133  132  124   118.0    64.5     816932
2    Male   140  150  124     NaN    72.5    1001121
3    Male   139  123  150   143.0    73.3    1038437
4    Male   133  129  128   172.0    68.8     965353
5  Female   137  132  134   147.0    65.0     951545
6  Female    99   90  110   146.0    69.0     928799
7  Female   138  136  131   138.0    64.5     991305

pandas does a pretty good job of inferring the dtypes:

df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 7 entries, 1 to 7
Data columns (total 7 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   Gender     7 non-null      object 
 1   FSIQ       7 non-null      int64  
 2   VIQ        7 non-null      int64  
 3   PIQ        7 non-null      int64  
 4   Weight     6 non-null      float64
 5   Height     7 non-null      float64
 6   MRI_Count  7 non-null      int64  
dtypes: float64(2), int64(4), object(1)
memory usage: 448.0+ bytes
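
Because each row has eight fields but only seven names are passed, read_csv uses the first field (the ID) as the index, which is why the index above runs from 1 to 7. If the ID is wanted as a regular column instead, one option is to name all eight fields. A sketch under that assumption, using a shortened two-row array (the 'ID' name is borrowed from the other answer, and `sample` and `all_columns` are illustrative names):

```python
from io import StringIO

import numpy as np
import pandas as pd

# Shortened two-row copy of the array from the question (illustration only)
sample = np.array([['1;"Female";133;132;124;"118";"64.5";816932'],
                   ['2;"Male";140;150;124;".";"72.5";1001121']])

# Eight names for eight fields, so no field is consumed as the index
all_columns = ["ID", "Gender", "FSIQ", "VIQ", "PIQ", "Weight", "Height", "MRI_Count"]

df_with_id = pd.read_csv(StringIO('\n'.join(sample.ravel())), header=None,
                         sep=';', names=all_columns, na_values=['.'])
print(df_with_id)
```

Passing index_col='ID' together with the eight names would instead keep the ID as the index while giving it a name.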
