I am importing a CSV file to a Pandas dataframe. The CSV file is something like:

Time,    Status, Variable, freq_1, freq_2, freq_3, .....
1/1/2000,  Hi,      A,      0.1,    3.3,    8.1, ....
1/1/2000,  Hi,      B,      2.4,    1.2,    1.3, ....
1/1/2000,  Lo,      A,      4.5,    6.9,    6.4, ....
1/1/2000,  Lo,      B,      7.1,    8.8,    2.3, ....
2/1/2000,  Hi,      A,      0.1,    3.3,    8.1, ....
2/1/2000,  Hi,      B,      2.4,    1.2,    1.3, ....
2/1/2000,  Lo,      A,      4.5,    6.9,    6.4, ....
2/1/2000,  Lo,      B,      7.1,    8.8,    2.3, ....
....

I read it into a dataframe with a multi-index, using Time, Status and Variable as the index levels.

I would now like to convert the dataframe to an xarray object using pandas' to_xarray or xarray's from_dataframe. However, both of these methods fail on the index, throwing the error:

TypeError: Could not convert tuple of form (dims, data[, attrs, encoding]): (0, DatetimeIndex(['2018-01-12 00:15:00', '2018-01-12 00:45:00',
               '2018-01-12 01:15:00', '2018-01-12 01:45:00',
               '2018-01-12 02:15:00', '2018-01-12 02:45:00',
               '2018-01-12 03:15:00', '2018-01-12 03:45:00',
               '2018-01-12 04:15:00', '2018-01-12 04:45:00',
               ...
               '2019-11-01 16:15:00', '2019-11-01 17:15:00',
               '2019-11-01 17:45:00', '2019-11-01 18:15:00',
               '2019-11-01 18:45:00', '2019-11-01 19:15:00',
               '2019-11-01 20:45:00', '2019-11-01 21:15:00',
               '2019-11-01 21:45:00', '2019-11-01 22:15:00'],
              dtype='datetime64[ns]', name=0, length=3100, freq=None)) to Variable.

I have also tried constructing an xarray DataArray directly:

Mytime = PD.to_datetime(df[0],infer_datetime_format=True)
Myfreq = np.arange(40)  # 0, 1, 2, ..., 39
OutDataArray = Xarray.DataArray(df[np.arange(3,43)], coords=[('time', Mytime ), ('freq', Myfreq ), ('Status', df[1]), ('variable', df[2])])

but this gave the error:

ValueError: coords is not dict-like, but it has 4 items, which does not match the 2 dimensions of the data

So, how does one import a Pandas dataframe into Xarray if the dataframe is 2D, but one of those dimensions (i.e. the rows) actually consists of multiple dimensions specified by the multi-index of the dataframe?


As requested, here is an example script that reproduces the problem. Note that you will need to set a filename for the CSV file that the example data is written to and then read back from:

import numpy as np
import pandas as PD

# create some data
dt = PD.date_range(start='01/01/2000 00:00:00', end='02/01/2000 00:00:00', freq='30min')
ldt = len(dt)
# one constant label column per fruit (np.str was removed in NumPy 1.24)
vr1 = PD.Series(['apple'] * ldt)
vr2 = PD.Series(['orange'] * ldt)
vr3 = PD.Series(['peach'] * ldt)

# combine the data to mimic my file format
a = PD.Series([1,2,3,4], index=[7,2,8,9])
b = PD.Series([5,6,7,8], index=[7,2,8,9])
df1 = PD.DataFrame({'Time': dt,'Fruit':vr1, 'N1': np.random.rand(ldt), 'N2': np.random.rand(ldt), 'N3': np.random.rand(ldt)})
df2 = PD.DataFrame({'Time': dt,'Fruit':vr2, 'N1': np.random.rand(ldt), 'N2': np.random.rand(ldt), 'N3': np.random.rand(ldt)})
df3 = PD.DataFrame({'Time': dt,'Fruit':vr3, 'N1': np.random.rand(ldt), 'N2': np.random.rand(ldt), 'N3': np.random.rand(ldt)})
df_unsorted = PD.concat([df1, df2, df3])
df = df_unsorted.sort_values(by=['Time', 'Fruit'])

# write the data to a csv file
filename = 'Give a file path/name here'
df.to_csv(filename, index=False)

# import the csv file and convert to an xarray
df2 = PD.read_csv(filename,  sep=',', skiprows=1, header=None, skipinitialspace=True, index_col=[0,1], parse_dates=True, infer_datetime_format=True )
da = df2.to_xarray()
  • Can you offer something reproducible? to_xarray generally works, so I think more detail is needed. (Commented Mar 21, 2019 at 13:36)

1 Answer

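The error seems to come from the columns and index levels of your CSV file not being named in the resulting DataFrame. With header=None, pandas labels them with integers (note the name=0 in the error message), which xarray apparently cannot use as dimension and variable names. Replacing the last two lines of your code example with: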

df2 = PD.read_csv(filename,  sep=',', skiprows=1, header=None, skipinitialspace=True, index_col=[0,1], parse_dates=True, infer_datetime_format=True )
df2.columns = ['N1', 'N2', 'N3']
df2.index.names = ['time', 'fruit']
ds = df2.to_xarray()

results in a successful conversion to an xarray Dataset:

print(ds)

<xarray.Dataset>
Dimensions:  (fruit: 3, time: 1489)
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01 2000-01-01T00:30:00 ... 2000-02-01
  * fruit    (fruit) object 'apple' 'orange' 'peach'
Data variables:
    N1       (time, fruit) float64 0.114 0.3726 0.5072 ... 0.2065 0.9082 0.7945
    N2       (time, fruit) float64 0.7534 0.1107 0.8866 ... 0.4509 0.5218 0.1472
    N3       (time, fruit) float64 0.156 0.6498 0.3521 ... 0.3742 0.5899 0.607
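
The same approach should extend to the three-level index (Time, Status, Variable) from the original question. The following is only a minimal sketch: the sample rows are copied from the question, the level and column names are assigned by hand, and the day/month/year date format is an assumption. The final step uses Dataset.to_array to stack the freq columns into one DataArray, with "freq" as a dimension name chosen here for illustration.

```python
import io

import pandas as pd
import xarray as xr  # not called directly, but to_xarray needs it installed

# Sample rows in the layout shown in the question (no header line).
csv_text = """\
1/1/2000, Hi, A, 0.1, 3.3, 8.1
1/1/2000, Hi, B, 2.4, 1.2, 1.3
1/1/2000, Lo, A, 4.5, 6.9, 6.4
1/1/2000, Lo, B, 7.1, 8.8, 2.3
2/1/2000, Hi, A, 0.1, 3.3, 8.1
2/1/2000, Hi, B, 2.4, 1.2, 1.3
2/1/2000, Lo, A, 4.5, 6.9, 6.4
2/1/2000, Lo, B, 7.1, 8.8, 2.3
"""

df = pd.read_csv(io.StringIO(csv_text), header=None, skipinitialspace=True)

# Give every column a string name before building the index.
df.columns = ["Time", "Status", "Variable", "freq_1", "freq_2", "freq_3"]

# Assumption: dates are day/month/year.
df["Time"] = pd.to_datetime(df["Time"], format="%d/%m/%Y")
df = df.set_index(["Time", "Status", "Variable"])

# Each named index level becomes a dimension of the Dataset.
ds = df.to_xarray()

# Stack the data variables along a new "freq" dimension to get one DataArray.
da = ds.to_array(dim="freq")
print(da.dims)
```

Here ds has one data variable per freq column, each with dimensions (Time, Status, Variable), and da adds "freq" as a leading fourth dimension.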

Update: you can skip manually setting the column and index names by removing the skiprows=1 and header=None arguments from PD.read_csv(), so that the names are taken from the CSV header. Your last two lines then look like:

df2 = PD.read_csv(filename,  sep=',', skipinitialspace=True, index_col=[0,1], parse_dates=True, infer_datetime_format=True )
ds = df2.to_xarray()
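
If the header line of the CSV is not usable for naming, another option is to keep skipping it and supply the names through the names argument of read_csv. This is a hedged sketch rather than a drop-in fix: the inline CSV text and its header are made up for illustration, and the names "time", "fruit", "N1", "N2", "N3" are simply the ones used above.

```python
import io

import pandas as pd
import xarray as xr  # not called directly, but to_xarray needs it installed

# Hypothetical stand-in for a file whose header row is unsuitable for naming.
csv_text = """\
Bad Header 1, Bad Header 2, x, y, z
2000-01-01 00:00:00, apple, 0.1, 0.2, 0.3
2000-01-01 00:00:00, orange, 0.4, 0.5, 0.6
2000-01-01 00:30:00, apple, 0.7, 0.8, 0.9
2000-01-01 00:30:00, orange, 1.0, 1.1, 1.2
"""

df = pd.read_csv(
    io.StringIO(csv_text),
    skiprows=1,                                 # ignore the unusable header
    header=None,
    names=["time", "fruit", "N1", "N2", "N3"],  # names supplied by hand
    skipinitialspace=True,
    parse_dates=["time"],
    index_col=["time", "fruit"],
)

# Index levels and columns now have string names, so the conversion works.
ds = df.to_xarray()
print(ds)
```

With a real file, io.StringIO(csv_text) would be replaced by the file path.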
4 Comments

So xarray cannot handle a pandas dataframe that has the default integer column labels (i.e. [0,1,2,3,...]) from a CSV file that does not have a header line?
It would appear that way. But you can make this simpler and not have to muck with manually setting the columns and index names by using the headers from the csv directly. I updated my answer.
Unfortunately, my CSV file headers are not suitable for column naming.
@Dan How can we customize an xarray.Dataset built from a dataframe with this method? Say the coords should include an additional variable that is not a dimension, and the data variable N1 should depend only on time, not on fruit.
