0

I've used pandas.read_csv to load in a file.

I've stored the file into a variable. The first column is a series of numbers separated by a comma (,) I want to split these numbers, and put each number to a new column.

I can't seem to find the write functionality for pandas.dataframe.

Side Note I would prefer a different library for loading in my file, but pandas provides some other different functionality which I need.

My Code:

Data = pandas.read_csv(pathFile,header=None)

doing: print Data gives me:

   0                          1         2          ...
0 [2014, 8, 26, 5, 30, 0.0]   0         0.25       ...

(as you can see its a date)

Question: How to split/separate each number and save it in a new array

p.s. I'm trying to achieve the same thing the matlab method datevec() does

1

2 Answers 2

3

If the CSV data looks like

"[2014, 8, 26, 5, 30, 0.0]",0,0.25    

then

import pandas as pd
import json

df = pd.read_csv('data', header=None)
dates, df = df[0], df.iloc[:, 1:]
df = pd.concat([df, dates.apply(lambda x: pd.Series(json.loads(x)))], axis=1,
               ignore_index=True)
print(df)

yields

   0     1     2  3   4  5   6  7
0  0  0.25  2014  8  26  5  30  0

with the values parsed as numeric values.


How it works:

dates, df = df[0], df.iloc[:, 1:]

peels off the first column, and reassigns df to the rest of the DataFrame:

In [217]: dates
Out[217]: 
0    [2014, 8, 26, 5, 30, 0.0]
Name: 0, dtype: object

dates contains strings:

In [218]: dates.iloc[0]
Out[218]: '[2014, 8, 26, 5, 30, 0.0]'

We can convert these to a list using json.loads:

In [219]: import json

In [220]: json.loads(dates.iloc[0])
Out[220]: [2014, 8, 26, 5, 30, 0.0]

In [221]: type(json.loads(dates.iloc[0]))
Out[221]: list

We can do this for each row of dates by using apply:

In [222]: dates.apply(lambda x: pd.Series(json.loads(x)))
Out[222]: 
      0  1   2  3   4  5
0  2014  8  26  5  30  0

By making lambda, above, return a Series, apply will return a DataFrame, with the index of the Series becoming the column index of the DataFrame.

Now we can use pd.concat to concatenate this DataFrame with df:

In [228]: df = pd.concat([df, dates.apply(lambda x: pd.Series(json.loads(x)))], axis=1, ignore_index=True)

In [229]: df
Out[229]: 
   0     1     2  3   4  5   6  7
0  0  0.25  2014  8  26  5  30  0

In [230]: df.dtypes
Out[230]: 
0      int64
1    float64
2    float64
3    float64
4    float64
5    float64
6    float64
7    float64
dtype: object
Sign up to request clarification or add additional context in comments.

1 Comment

Tnx! Great explanation. Cheers
0

How about

df
#                   datestr
#0  2014, 8, 26, 5, 30, 0.0
#1  2014, 8, 26, 5, 30, 0.0
#2  2014, 8, 26, 5, 30, 0.0
#3  2014, 8, 26, 5, 30, 0.0
#4  2014, 8, 26, 5, 30, 0.0

# each entry is a string
df.datestr[0]
#'2014, 8, 26, 5, 30, 0.0'

Then

date_order = ('year', 'month','day','hour','minute','sec') # order matters here, should match the datestr column 

for i,col in enumerate( date_order):
    df[col] = df.datestr.map( lambda x: x.split(',')[i].strip() )

#df
#                   datestr  year month day hour minute  sec
#0  2014, 8, 26, 5, 30, 0.0  2014     8  26    5     30  0.0
#1  2014, 8, 26, 5, 30, 0.0  2014     8  26    5     30  0.0
#2  2014, 8, 26, 5, 30, 0.0  2014     8  26    5     30  0.0
#3  2014, 8, 26, 5, 30, 0.0  2014     8  26    5     30  0.0
#4  2014, 8, 26, 5, 30, 0.0  2014     8  26    5     30  0.0

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.