How to move Pandas multi-index dataframe to Xarray DataArray

Question

I am importing a CSV file to a Pandas dataframe. The CSV file is something like:

Time,    Status, Variable, freq_1, freq_2, freq_3, .....
1/1/2000,  Hi,      A,      0.1,    3.3,    8.1, ....
1/1/2000,  Hi,      B,      2.4,    1.2,    1.3, ....
1/1/2000,  Lo,      A,      4.5,    6.9,    6.4, ....
1/1/2000,  Lo,      B,      7.1,    8.8,    2.3, ....
2/1/2000,  Hi,      A,      0.1,    3.3,    8.1, ....
2/1/2000,  Hi,      B,      2.4,    1.2,    1.3, ....
2/1/2000,  Lo,      A,      4.5,    6.9,    6.4, ....
2/1/2000,  Lo,      B,      7.1,    8.8,    2.3, ....
....

I read it into a dataframe with a multi-index using Time, Status and Variable as indices.

I would now like to import the dataframe into Xarray using Pandas to_xarray or Xarray from_dataframe. However, both of these methods appear to choke on the index, throwing the error:

TypeError: Could not convert tuple of form (dims, data[, attrs, encoding]): (0, DatetimeIndex(['2018-01-12 00:15:00', '2018-01-12 00:45:00',
               '2018-01-12 01:15:00', '2018-01-12 01:45:00',
               '2018-01-12 02:15:00', '2018-01-12 02:45:00',
               '2018-01-12 03:15:00', '2018-01-12 03:45:00',
               '2018-01-12 04:15:00', '2018-01-12 04:45:00',
               ...
               '2019-11-01 16:15:00', '2019-11-01 17:15:00',
               '2019-11-01 17:45:00', '2019-11-01 18:15:00',
               '2019-11-01 18:45:00', '2019-11-01 19:15:00',
               '2019-11-01 20:45:00', '2019-11-01 21:15:00',
               '2019-11-01 21:45:00', '2019-11-01 22:15:00'],
              dtype='datetime64[ns]', name=0, length=3100, freq=None)) to Variable.

I have also tried using the Xarray.DataArray procedure:

Mytime = PD.to_datetime(df[0],infer_datetime_format=True)
Myfreq = np.array([ 0,1,2,3...39])
OutDataArray = Xarray.DataArray(df[np.arange(3,43)], coords=[('time', Mytime ), ('freq', Myfreq ), ('Status', df[1]), ('variable', df[2])])

but this gave the error:

ValueError: coords is not dict-like, but it has 4 items, which does not match the 2 dimensions of the data

So, how does one import a Pandas dataframe into Xarray if the dataframe is 2D, but one of those dimensions (i.e. the rows) actually consists of multiple dimensions specified by the multi-index of the dataframe?

As requested, here is an example script that reproduces the problem. Note you will need to set a filename for the CSV file of the example data that gets imported:

import numpy as np
import pandas as PD

# create some data
dt = PD.date_range(start='01/01/2000 00:00:00', end='02/01/2000 00:00:00', freq='30min')
ldt = len(dt)
# np.str was removed in NumPy 1.24, so build the label columns from plain lists
vr1 = PD.Series(['apple'] * ldt)
vr2 = PD.Series(['orange'] * ldt)
vr3 = PD.Series(['peach'] * ldt)

# combine the data to mimic my file format
a = PD.Series([1,2,3,4], index=[7,2,8,9])
b = PD.Series([5,6,7,8], index=[7,2,8,9])
df1 = PD.DataFrame({'Time': dt,'Fruit':vr1, 'N1': np.random.rand(ldt), 'N2': np.random.rand(ldt), 'N3': np.random.rand(ldt)})
df2 = PD.DataFrame({'Time': dt,'Fruit':vr2, 'N1': np.random.rand(ldt), 'N2': np.random.rand(ldt), 'N3': np.random.rand(ldt)})
df3 = PD.DataFrame({'Time': dt,'Fruit':vr3, 'N1': np.random.rand(ldt), 'N2': np.random.rand(ldt), 'N3': np.random.rand(ldt)})
df_unsorted = PD.concat([df1, df2, df3])
df = df_unsorted.sort_values(by=['Time', 'Fruit'])

# write the data to a csv file
filename = 'Give a file path/name here'
df.to_csv(filename, index=False)

#import the csv file and convert to an xarray
df2 = PD.read_csv(filename,  sep=',', skiprows=1, header=None, skipinitialspace=True, index_col=[0,1], parse_dates=True, infer_datetime_format=True )
da = df2.to_xarray()

Can you offer something reproducible? to_xarray generally works, so I think more detail is needed.
– Maximilian, Commented Mar 21, 2019 at 13:36

Dan · Accepted Answer · 2019-03-22 23:48:06Z


Your error seems to come from the columns and index levels of your csv file not being named in the resulting DataFrame. Replacing the last two lines of your code example with:

df2 = PD.read_csv(filename,  sep=',', skiprows=1, header=None, skipinitialspace=True, index_col=[0,1], parse_dates=True, infer_datetime_format=True )
df2.columns = ['N1', 'N2', 'N3']
df2.index.names = ['time', 'fruit']
ds = df2.to_xarray()

results in a successful conversion to an xarray Dataset:

print(ds)

<xarray.Dataset>
Dimensions:  (fruit: 3, time: 1489)
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01 2000-01-01T00:30:00 ... 2000-02-01
  * fruit    (fruit) object 'apple' 'orange' 'peach'
Data variables:
    N1       (time, fruit) float64 0.114 0.3726 0.5072 ... 0.2065 0.9082 0.7945
    N2       (time, fruit) float64 0.7534 0.1107 0.8866 ... 0.4509 0.5218 0.1472
    N3       (time, fruit) float64 0.156 0.6498 0.3521 ... 0.3742 0.5899 0.607
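As a minimal sketch of the same idea without a CSV file, the snippet below builds a small frame in memory and converts it. The data values and the names `frame`, `times` and `fruits` are made up for illustration; the point is only that each named index level becomes a dimension and each named column becomes a data variable.

```python
import numpy as np
import pandas as pd  # DataFrame.to_xarray() also requires xarray to be installed

# Hypothetical stand-in for the CSV contents: 2 times x 3 fruits, 3 value columns.
times = pd.date_range("2000-01-01", periods=2, freq="D")
fruits = ["apple", "orange", "peach"]
index = pd.MultiIndex.from_product([times, fruits], names=["time", "fruit"])
frame = pd.DataFrame(
    np.arange(18, dtype=float).reshape(6, 3),
    index=index,
    columns=["N1", "N2", "N3"],
)

# Named index levels become dimensions; named columns become data variables.
ds = frame.to_xarray()
print(ds)
```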

Update: you can skip manually setting column and index names by removing the skiprows=1 and header=None arguments in PD.read_csv(), getting the column names from the csv header. So your last two lines look like:

df2 = PD.read_csv(filename,  sep=',', skipinitialspace=True, index_col=[0,1], parse_dates=True, infer_datetime_format=True )
ds = df2.to_xarray()
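Since the question title asks for a DataArray rather than a Dataset, one possible follow-up step is sketched here. It assumes all value columns share a dtype and uses `Dataset.to_array` to stack the data variables along a new dimension; the dimension name `column` and the sample data are illustrative, not taken from the original file.

```python
import numpy as np
import pandas as pd  # DataFrame.to_xarray() also requires xarray to be installed

# Hypothetical sample frame with a named (time, fruit) MultiIndex.
times = pd.date_range("2000-01-01", periods=2, freq="D")
fruits = ["apple", "orange", "peach"]
index = pd.MultiIndex.from_product([times, fruits], names=["time", "fruit"])
frame = pd.DataFrame(
    np.arange(18, dtype=float).reshape(6, 3),
    index=index,
    columns=["N1", "N2", "N3"],
)

ds = frame.to_xarray()

# Stack the data variables N1..N3 along a new dimension (name chosen here).
da = ds.to_array(dim="column")

# to_array puts the new dimension first; reorder if another layout is preferred.
da = da.transpose("time", "fruit", "column")
print(da.dims, da.shape)
```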

edited Mar 22, 2019 at 23:48

answered Mar 22, 2019 at 21:33


4 Comments

RJCL Over a year ago

So Xarray cannot handle a Pandas dataframe that has the default column labels (i.e. [0,1,2,3,...]) for a CSV file that does not have a header line?

Dan Over a year ago

It would appear that way. But you can make this simpler and not have to muck with manually setting the columns and index names by using the headers from the csv directly. I updated my answer.

RJCL Over a year ago

Unfortunately, my CSV file headers are not suitable for column naming.
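When the header line is unsuitable (or absent), one option is to supply names at read time through the `names` argument of `read_csv`. The sketch below is only an illustration under assumptions: the CSV text, the header wording, and the chosen names `time`, `fruit`, `N1`, `N2` are all hypothetical.

```python
import io
import pandas as pd  # DataFrame.to_xarray() also requires xarray to be installed

# Hypothetical CSV text whose header line is not usable for naming.
csv_text = (
    "Time (UTC), Fruit type, value #1, value #2\n"
    "2000-01-01, apple, 0.1, 3.3\n"
    "2000-01-01, orange, 2.4, 1.2\n"
    "2000-01-02, apple, 4.5, 6.9\n"
    "2000-01-02, orange, 7.1, 8.8\n"
)

frame = pd.read_csv(
    io.StringIO(csv_text),
    skiprows=1,                            # discard the unsuitable header line
    header=None,
    names=["time", "fruit", "N1", "N2"],   # illustrative names supplied by hand
    skipinitialspace=True,
    parse_dates=["time"],
)

# Build the named MultiIndex explicitly, then convert.
frame = frame.set_index(["time", "fruit"])
ds = frame.to_xarray()
print(ds)
```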

Deepika Rao Over a year ago

@Dan How can we customize an xarray.Dataset built from a dataframe with this method? Say the coords should include an additional variable that is not a dimension, and the data variable N1 should have only time as a dimension, not fruit.
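This thread does not answer that last comment. As a rough sketch of one possible approach after `to_xarray()`: a data variable can be replaced by a reduced version of itself, and `assign_coords` can attach a coordinate that is not a dimension. The sample data, the choice of selecting the 'apple' slice, and the coordinate name `colour` are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd  # DataFrame.to_xarray() also requires xarray to be installed

# Hypothetical sample frame with a named (time, fruit) MultiIndex.
times = pd.date_range("2000-01-01", periods=2, freq="D")
fruits = ["apple", "orange", "peach"]
index = pd.MultiIndex.from_product([times, fruits], names=["time", "fruit"])
frame = pd.DataFrame(
    np.arange(18, dtype=float).reshape(6, 3),
    index=index,
    columns=["N1", "N2", "N3"],
)
ds = frame.to_xarray()

# Make N1 depend on time only, here by keeping just the 'apple' slice.
ds["N1"] = ds["N1"].sel(fruit="apple", drop=True)

# Attach an extra coordinate along 'fruit' that is not itself a dimension.
ds = ds.assign_coords(colour=("fruit", ["red", "orange", "pink"]))
print(ds)
```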
