Python - Modifying existing excel using Pandas and openpyxl

Question

I have an Excel file (Celebrities.xlsx) with multiple sheets and I'm trying to modify a single sheet called Relationships without modifying (or potentially erasing) other sheets. Here's what I've done.

import pandas as pd
from openpyxl import load_workbook
from openpyxl.utils.dataframe import dataframe_to_rows

# Name of the celebrity that I want to modify
celeb_name = 'Terence Stamp'

wb = load_workbook('Celebrities.xlsx')
ws = wb['Relationships']

df = pd.read_excel('Celebrities.xlsx', sheet_name='Relationships')

# This part is trivial, but basically I'm replacing every null cell in 'Link' column with the word 'empty' (of that particular celebrity)
df.loc[(df['Celebrity Name'] == celeb_name) & (df['Link'].isnull()), 'Link'] = 'empty'

for r in dataframe_to_rows(df, index=True, header=True):
    ws.append(r)

wb.save('new.xlsx')

Now the script runs without any error and new.xlsx is created successfully, but when I try to open it, it gives me this error:

Warning loading document new.xlsx: The data could not be loaded completely because the maximum number of rows per sheet was exceeded.

And nothing has been modified!

I can confirm that this part of the code works perfectly:

wb = load_workbook('Celebrities.xlsx')
ws = wb['Relationships']
wb.save('new.xlsx')

I suppose the problem is with this part of code:

for r in dataframe_to_rows(df, index=True, header=True):
    ws.append(r)

But I don't know how to fix it.

There is no point in using both load_workbook() and pandas.read_excel() on the same file.
– Charlie Clark, Commented Mar 14, 2019 at 10:19
You probably don't want to use append here but overwrite the existing cells.
– Charlie Clark, Commented Mar 14, 2019 at 11:05
@CharlieClark Yes, using both was confusing to me too. However, there's something that I use from each of them; I use DataFrame from Pandas and I modify the file with openpyxl.
– Amir Shabani, Commented Mar 14, 2019 at 11:29
@CharlieClark How can I overwrite the existing cells? That is, indeed, what I wanted to do initially.
– Amir Shabani, Commented Mar 14, 2019 at 11:29

BoarGules · Accepted Answer · 2019-03-14 09:46:56Z

3

You say in your question that nothing has been modified, but it has. Your code loops through the dataframe and calls ws.append() for each row, which adds a new row below the existing data each time through the loop. The limit is 1,048,576 rows per sheet, and the warning is telling you that the modified worksheet exceeds that limit.
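The effect can be reproduced on a tiny in-memory workbook. This is only a sketch: the three rows are made-up stand-ins for the real Relationships sheet, and plain lists replace the dataframe rows to keep it short.

```python
from openpyxl import Workbook

# Hypothetical stand-in for the 'Relationships' sheet.
wb = Workbook()
ws = wb.active
ws.title = 'Relationships'
ws.append(['Celebrity Name', 'Link'])
ws.append(['Terence Stamp', None])
ws.append(['Julie Christie', 'http://example.com'])

rows_before = ws.max_row

# Appending the (modified) rows again, as the loop in the question does,
# places them below the existing data instead of overwriting it.
modified_rows = [
    ['Celebrity Name', 'Link'],
    ['Terence Stamp', 'empty'],
    ['Julie Christie', 'http://example.com'],
]
for row in modified_rows:
    ws.append(row)

rows_after = ws.max_row
print(rows_before, rows_after)
```

The original cell B2 is still blank afterwards; the 'empty' value sits in the appended copy further down. On a sheet that already holds more than half of the row limit, the doubled sheet no longer fits.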

edited Mar 14, 2019 at 9:46

answered Mar 14, 2019 at 9:38

4 Comments

Amir Shabani Over a year ago

You're right, it has been modified. I wanted to overwrite the existing cells.

BoarGules Over a year ago

Matching the dataframe to the worksheet to do the update in place might be tricky. One solution would be to add a new empty worksheet to the workbook to hold the modified data, and populate that using ws.append(). But do you really need a dataframe at all? Why not modify the worksheet directly?
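A minimal sketch of the new-worksheet idea, assuming a small in-memory workbook in place of Celebrities.xlsx. The sheet name 'Relationships_new', the second sheet 'Other' and the sample rows are made up for illustration. The dataframe is built from ws.values, so the file does not have to be read a second time with pandas.read_excel().

```python
import pandas as pd
from openpyxl import Workbook
from openpyxl.utils.dataframe import dataframe_to_rows

# Hypothetical stand-in for load_workbook('Celebrities.xlsx').
wb = Workbook()
ws = wb.active
ws.title = 'Relationships'
ws.append(['Celebrity Name', 'Link'])
ws.append(['Terence Stamp', None])
ws.append(['Julie Christie', 'http://example.com'])
wb.create_sheet('Other')  # another sheet that must stay untouched

# Build the dataframe from the worksheet that is already loaded.
rows = list(ws.values)
df = pd.DataFrame(rows[1:], columns=list(rows[0]))

celeb_name = 'Terence Stamp'
df.loc[(df['Celebrity Name'] == celeb_name) & (df['Link'].isnull()), 'Link'] = 'empty'

# Write the modified data to a fresh sheet; index=False avoids the extra
# index column and index-name row that index=True would add.
new_ws = wb.create_sheet('Relationships_new')
for r in dataframe_to_rows(df, index=False, header=True):
    new_ws.append(r)

# wb.save('new.xlsx') would then write all sheets to disk.
```

The original Relationships sheet is left as it was, so its formatting survives, but the new sheet carries values only.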

Amir Shabani Over a year ago

Sorry for the delay! I use Pandas because of its flexibility: I can do rather complicated things with it (like the code above), and it's easy to use. I didn't know how to save the modified DataFrame while preserving formatting and leaving other contents (such as other sheets) unchanged, so I turned to openpyxl, which does this perfectly. But I don't know how to do the complicated modifications with openpyxl that I did with Pandas. Hope I explained it well!

BoarGules Over a year ago

To get your approach to work, get openpyxl to delete all the rows in the worksheet (for example with ws.delete_rows(1, ws.max_row)) before adding them all back again from pandas with ws.append(). Though the sheer inefficiency of that makes me cringe. I think you would be better off learning how to loop through the rows of ws and modifying them directly. I reckon it shouldn't be more than 4 lines of code. And with either approach, do yourself a favour: make a small abridged version of your workbook instead of testing your code using one that has more than half a million rows in it.
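Modifying the rows directly might look something like the sketch below. It assumes a small in-memory workbook with made-up rows in place of Celebrities.xlsx, and it looks up the 'Celebrity Name' and 'Link' columns from the header in row 1 rather than hard-coding their positions.

```python
from openpyxl import Workbook

# Hypothetical stand-in for load_workbook('Celebrities.xlsx')['Relationships'].
wb = Workbook()
ws = wb.active
ws.title = 'Relationships'
ws.append(['Celebrity Name', 'Link'])
ws.append(['Terence Stamp', None])
ws.append(['Terence Stamp', 'http://example.com/a'])
ws.append(['Julie Christie', None])

celeb_name = 'Terence Stamp'

# Find the positions of the two columns from the header row.
headers = [cell.value for cell in ws[1]]
name_idx = headers.index('Celebrity Name')
link_idx = headers.index('Link')

# Overwrite blank 'Link' cells for the chosen celebrity in place.
for row in ws.iter_rows(min_row=2):
    if row[name_idx].value == celeb_name and row[link_idx].value is None:
        row[link_idx].value = 'empty'

# wb.save('new.xlsx') would then write the workbook, other sheets included.
```

Only the matching cells are touched, so the row count stays the same and the other sheets and existing formatting are not rewritten.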
