This is something that I'm confused about...

import pandas as pd

# this works fine
df1 = pd.DataFrame(columns=['A','B'])

# but let's say I have this
df2 = pd.DataFrame([])

# this doesn't work!
df2.columns = ['A','B']
# ValueError: Length mismatch: Expected axis has 0 elements, new values have 2 elements

Why doesn't this work? What can I do instead? Is the only way to do it something like this?

if len(df2.index) == 0:
    df2 = pd.DataFrame(columns=['A','B'])
else:
    df2.columns = ['A','B']

There must be a more elegant way.

Thank you for your help!

Update 4/19/2015

Someone asked why I would do this at all:

df2 = pd.DataFrame([])

The reason is that actually I'm doing something like this:

df2 = pd.DataFrame(data)

... where data could be an empty list of lists, but in most cases it is not. So yes, I could do:

if len(data) > 0:
    df2 = pd.DataFrame(data, columns=['A','B'])
else:
    df2 = pd.DataFrame(columns=['A','B'])

... but this doesn't seem very DRY (and certainly not concise).

Let me know if you have any questions. Thanks!

2 Answers

4

This looks like a bug in pandas. All of these work:

pd.DataFrame(columns=['A', 'B'])
pd.DataFrame({}, columns=['A', 'B'])
pd.DataFrame(None, columns=['A', 'B'])

but not this:

pd.DataFrame([], columns=['A', 'B'])

Until it's fixed, I suggest something like this:

if len(data) == 0: data = None
df2 = pd.DataFrame(data, columns=['A','B'])

or:

df2 = pd.DataFrame(data if len(data) > 0 else None, columns=['A', 'B'])

3

Update: as of Pandas version 0.16.1, passing data = [] works:

In [85]: df = pd.DataFrame([], columns=['a', 'b', 'c'])

In [86]: df
Out[86]: 
Empty DataFrame
Columns: [a, b, c]
Index: []

so the best solution is to update your version of Pandas.


If data is an empty list of lists, then

data = [[]]

But then len(data) would equal 1, so len(data) > 0 is not the right condition to check to see if data is an empty list of lists.
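
A quick check in plain Python makes the distinction concrete. This is only a sketch: the helper name has_cells is hypothetical, introduced here for illustration, and it assumes data is a list of rows that each support len().

```python
def has_cells(data):
    """Return True if at least one row of `data` contains a value.

    `has_cells` is a hypothetical helper, not part of pandas.
    """
    return any(len(row) > 0 for row in data)


empty_list = []
empty_list_of_lists = [[]]

# len() counts rows, so a single empty row still gives a length of 1.
print(len(empty_list))           # 0
print(len(empty_list_of_lists))  # 1

# Looking inside the rows treats both cases as empty.
print(has_cells(empty_list))           # False
print(has_cells(empty_list_of_lists))  # False
print(has_cells([[1, 2]]))             # True
```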

There are a number of values for data which could make

pd.DataFrame(data, columns=['A','B'])

raise an exception. An AssertionError or ValueError is raised if data equals [] (no data), [[]] (no columns), [[0]] (too few columns) or [[0,1,2]] (too many columns). Instead of trying to check for all of these, I think it is safer and easier to use try/except here:

columns = ['A', 'B']
try:
    df2 = pd.DataFrame(data, columns=columns)
except (AssertionError, ValueError):
    df2 = pd.DataFrame(columns=columns)

It would be nice if there were a DRY-er way to write this, but given that it's the caller's responsibility to check for this, I don't see a better way.
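
If the same pattern is needed in several places, one option is to move the try/except into a small function so each call site stays a one-liner. This is a sketch under assumptions, not a pandas feature: the name make_frame is hypothetical, and it assumes that silently falling back to an empty frame is acceptable whenever data does not fit the requested columns.

```python
import pandas as pd


def make_frame(data, columns):
    """Build a DataFrame, falling back to an empty one with `columns`.

    `make_frame` is a hypothetical helper, not part of pandas.
    """
    try:
        return pd.DataFrame(data, columns=columns)
    except (AssertionError, ValueError):
        # The shape of `data` did not match `columns` (or, on older
        # pandas versions, `data` was an empty list).
        return pd.DataFrame(columns=columns)


df_full = make_frame([[1, 2], [3, 4]], ['A', 'B'])
df_empty = make_frame([], ['A', 'B'])
df_bad = make_frame([[0, 1, 2]], ['A', 'B'])  # too many columns
```

Note that the fallback also discards rows whose width does not match, as in df_bad, so this only suits cases where dropping such data is intended.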
