How to reformat dataframe in Pandas using Python?

I'm pretty new to Pandas and Python, but I have a solid coding background. I've decided to pick this up because it will help me automate certain financial reports at work.

For background, I'm taking a PDF and using Tabula to convert it into a CSV file, which works but introduces some formatting problems. The reports come as PDF files of about 60 pages, which I export to CSV and then try to manipulate in Python using Pandas.

The issue: when I convert the data, I get a CSV file that looks something like this:

[screenshot of the CSV output]

Certain tables are shifting, and I think this is due to the number of pages and the multiple headings within them.

Would it be possible for me to reformat this data using Pandas, and basically create a set of rules for how it gets reformatted?

Basically, I would like to shift the rows that are misplaced back into their respective places, based on something like blank spaces.
Is it possible to delete rows containing certain strings, such as extra or unnecessary headers?
Can I somehow save the 'Total' data at the bottom by searching for the row containing 'Total' and placing it somewhere else?

In essence, is there a way to partition this data with a set of rules (without specifying row numbers, because these change daily) and then reposition it so that I can manipulate the data however necessary?
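For illustration, rules like "delete repeated headers" and "pull out the Total row" can be written as boolean masks on row content, so no row numbers are needed. This is only a minimal sketch: the real CSV is not shown, so the three-column layout, the column names `Account`, `Description`, `Amount`, and all values below are hypothetical.

```python
import io
import pandas as pd

# Hypothetical stand-in for the Tabula export; names and values are made up.
raw = """Account,Description,Amount
1001,Cash,500
1002,Receivables,300
Account,Description,Amount
1003,Inventory,250
Total,,1050
"""

# Read everything as text with no header, so repeated headers stay visible as rows.
df = pd.read_csv(io.StringIO(raw), header=None, dtype=str)

# Rule 1: the first row is the header; any row identical to it is a header row.
header = df.iloc[0]
is_header = df.eq(header).all(axis=1)

# Rule 2: rows whose first cell starts with "Total" are summary rows.
is_total = df[0].str.startswith("Total", na=False)

# Keep the ordinary rows as the body and name the columns from the header.
body = df[~is_header & ~is_total].copy()
body.columns = header.tolist()
body = body.reset_index(drop=True)
body["Amount"] = pd.to_numeric(body["Amount"])

# Store the summary rows separately instead of discarding them.
totals = df[is_total].copy()
totals.columns = header.tolist()
```

Because both masks are computed from cell contents, the same rules should keep working when the number of rows changes from day to day, as long as the header text and the 'Total' label stay the same.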

edited Jun 13, 2017 at 18:48 by user8155614

asked Jun 13, 2017 at 18:22 by sgerbhctim

Posting an example of the actual CSV data will help people play around with it. A minimal example that still includes the formatting pitfalls would be best, and the expected result for that example data also helps. You have multiple issues, so you might need to make multiple passes to resolve them, and some of the corrections might be done before making a DataFrame out of the data.
– wwii, Jun 13, 2017 at 19:05

@wwii what do you mean?

– sgerbhctim, Jun 13, 2017 at 19:07

If you post actual CSV data instead of an image of the data, people will be more likely to play around with it and offer solutions. The basic answer to all of your questions is yes: you can pretty much manipulate the data to make it look the way you want, and there are probably multiple ways to solve those problems.
– wwii, Jun 13, 2017 at 19:14

@wwii how should I go about posting the actual CSV data?
– sgerbhctim, Jun 13, 2017 at 19:16

Include the actual CSV text in the question and use the Code and Preformatted Text Markdown, for example: stackoverflow.com/q/35755915/2823755, stackoverflow.com/q/43638280/2823755. You can practice in the formatting sandbox: meta.stackexchange.com/questions/3122/formatting-sandbox
– wwii, Jun 13, 2017 at 19:27
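
The suggestion above that some corrections could be made before a DataFrame is built might look roughly like the following pre-pass with the standard `csv` module. It is a sketch under assumptions, not a known fix for the real file: the sample rows are hypothetical, `realign` is an illustrative helper name, and the rule assumes the first column is never legitimately blank, so leading blank cells always mean the row was pushed to the right.

```python
import csv
import io

def realign(rows, width):
    """Drop leading blank cells from each row, then pad or trim to `width` cells."""
    fixed = []
    for row in rows:
        cells = list(row)
        # Leading blank cells are what push the real values to the right.
        while cells and not cells[0].strip():
            cells.pop(0)
        if not cells:
            continue  # skip rows that are entirely blank
        # Pad short rows with empty strings and cut off any surplus cells.
        cells = (cells + [""] * width)[:width]
        fixed.append(cells)
    return fixed

# Hypothetical sample: the last two rows are shifted right by one and two cells.
raw = (
    "Account,Description,Amount\n"
    "1001,Cash,500\n"
    ",1002,Receivables,300\n"
    ",,1003,Inventory,250\n"
)

rows = list(csv.reader(io.StringIO(raw)))
clean = realign(rows, width=len(rows[0]))
```

The cleaned rows can then be handed to Pandas, for example with `pd.DataFrame(clean[1:], columns=clean[0])`, and a second pass can deal with repeated headers and the 'Total' row.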
