I would like to create a Series in pandas from a DataFrame that I have.

The DataFrame has 3 columns: 'date', 'time' and 'frequ'. I would like the first two columns ('date' and 'time') to form the index of the new Series.

Unfortunately, my data contains missing values, so when I try to convert it to a Series I have trouble specifying the index. Normally, if there were no missing values, I would pass

index = pd.date_range(start = df.date[0], end = '2015/03/06 17:07:05', freq = 'S')

to the pd.Series() function.

But if I do that here, I get an error because the new index is longer than the data (the generated range has no gaps, while the data does).

So here is a small sample of my DataFrame:

Out[2]: 
          date      time   frequ
0   2015/03/06  17:06:26  50.091
1   2015/03/06  17:06:27  50.087
2   2015/03/06  17:06:29  50.084
3   2015/03/06  17:06:30  50.083
4   2015/03/06  17:06:31  50.082
..         ...       ...     ...
33  2015/03/06  17:07:03  50.079
34  2015/03/06  17:07:04  50.078
35  2015/03/06  17:07:05  50.077

(As can be seen, the row for 2015/03/06 17:06:28 is missing.)

This is roughly how the Series (ts) should look:

2015/03/06  17:06:26  50.091
2015/03/06  17:06:27  50.087
2015/03/06  17:06:29  50.084
2015/03/06  17:06:30  50.083
2015/03/06  17:06:31  50.082
...              ...     ...
2015/03/06  17:07:03  50.079
2015/03/06  17:07:04  50.078
2015/03/06  17:07:05  50.077

Again, in this output the first two columns are the index, so if I call, for example:

In[3]: ts['2015/03/06 17:06:26': '2015/03/06 17:06:29']

I'll get:

Out[3]: 
2015/03/06  17:06:26  50.091
2015/03/06  17:06:27  50.087
2015/03/06  17:06:29  50.084

Freq: S, dtype: float64

Finally, here is the code that I wrote:

import pandas as pd

data = {'frequ': sum_freq, 'time': sum_time, 'date': date_list}
df = pd.DataFrame(data, columns = ['date', 'time', 'frequ'])
ts = pd.Series(df.frequ.values, index = ???)

Does anybody have an idea how to overcome this problem?

Thanks!!!

(I use Python 2.7.6)

2 Answers

If the date column has dtype datetime64[ns] and the time column has dtype timedelta64[ns] then you can add them together to form a new column of dtype datetime64[ns]. Then you could set that column as the index and select the frequ column to obtain the desired Series:

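import pandas as pd

# 'data' is a whitespace-delimited file with date, time and frequ columns
df = pd.read_table('data', sep=r'\s+')
df['date'] = pd.to_datetime(df['date'])      # dtype datetime64[ns]
df['time'] = pd.to_timedelta(df['time'])     # dtype timedelta64[ns]
df['datetime'] = df['date'] + df['time']     # combined timestamp
ts = df.set_index('datetime')['frequ']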

yields

datetime
2015-03-06 17:06:26    50.091
2015-03-06 17:06:27    50.087
2015-03-06 17:06:29    50.084
2015-03-06 17:06:30    50.083
2015-03-06 17:06:31    50.082
2015-03-06 17:07:03    50.079
2015-03-06 17:07:04    50.078
2015-03-06 17:07:05    50.077
Name: frequ, dtype: float64
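
If the data is already in a DataFrame with string columns, as in the question, a variant of the same idea is to join the two columns and parse them in one step. The following is only a minimal sketch: the sample rows are a hypothetical stand-in for the asker's data, and the format string is an assumption based on the sample shown in the question.

```python
import pandas as pd

# Hypothetical sample mirroring the question's data (17:06:28 is missing).
df = pd.DataFrame({
    'date': ['2015/03/06'] * 5,
    'time': ['17:06:26', '17:06:27', '17:06:29', '17:06:30', '17:06:31'],
    'frequ': [50.091, 50.087, 50.084, 50.083, 50.082],
})

# Join the two string columns and parse them with an explicit format.
stamps = pd.to_datetime(df['date'] + ' ' + df['time'],
                        format='%Y/%m/%d %H:%M:%S')

# Use the parsed timestamps as the index of the new Series.
ts = pd.Series(df['frequ'].values,
               index=pd.DatetimeIndex(stamps, name='datetime'),
               name='frequ')

# Label-based slicing on a sorted DatetimeIndex includes both endpoints.
subset = ts.loc['2015/03/06 17:06:26':'2015/03/06 17:06:29']
print(subset)
```

Because the index is built from the timestamps that actually occur in the data, its length always matches the number of values, so the length mismatch described in the question should not arise.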

2 Comments

You could use ts.asfreq('S', method=None) to expand the timeseries to 1 second frequency, filling in missing values with NaN.
That works! To improve it, I would like to fill the gaps with NaN, for example:
2015-03-06 17:06:26  50.091
2015-03-06 17:06:27  50.087
2015-03-06 17:06:28     NaN
2015-03-06 17:06:29  50.084
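
The asfreq suggestion from the comments can be sketched roughly as follows. The three-row series is a hypothetical stand-in for the real data, and pd.offsets.Second() is used here in place of the 'S' string only because the spelling of that alias differs between pandas versions.

```python
import pandas as pd

# Hypothetical series with a gap at 17:06:28, as in the question.
idx = pd.to_datetime(['2015-03-06 17:06:26',
                      '2015-03-06 17:06:27',
                      '2015-03-06 17:06:29'])
ts = pd.Series([50.091, 50.087, 50.084], index=idx, name='frequ')

# Reindex onto a regular one-second grid; timestamps that are
# absent from the original series are filled with NaN.
regular = ts.asfreq(pd.offsets.Second())
print(regular)
```

After this step the series has a regular one-second frequency, so the gap is explicit in the data rather than implied by a missing row.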
Expanding on unutbu's answer, you also need to group on the index to ensure that there are no duplicates. You need to decide how you'd like to handle any such duplicates (e.g. sum them).

ts = df.groupby('datetime')['frequ'].sum()
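
As a small illustration of the duplicate handling described here, the sketch below checks for repeated timestamps and then aggregates them. The data is hypothetical, and averaging is just one possible policy chosen for the example; summing, as in the line above, or keeping the first or last reading may suit other data better.

```python
import pandas as pd

# Hypothetical frame in which 17:06:27 was recorded twice.
df = pd.DataFrame({
    'datetime': pd.to_datetime(['2015-03-06 17:06:26',
                                '2015-03-06 17:06:27',
                                '2015-03-06 17:06:27']),
    'frequ': [50.091, 50.087, 50.085],
})

# Detect repeated timestamps before choosing an aggregation.
has_duplicates = df['datetime'].duplicated().any()

# Collapse duplicates by averaging the readings for each timestamp.
ts = df.groupby('datetime')['frequ'].mean()
print(has_duplicates)
print(ts)
```

The grouped result is a Series indexed by unique timestamps, which is what operations such as asfreq require.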
