
I'm pretty new to Pandas and Python, but I have a solid coding background. I decided to pick this up because it will help me automate certain financial reports at work.

For background: I'm taking a PDF and using Tabula to convert it into a CSV file. The conversion works, but the output has formatting issues. The reports come as PDF files of about 60 pages each, which I export to CSV and then try to manipulate in Python using Pandas.

The issue: when I convert the data, I get a CSV file that looks something like this:

[Image: CSV Exported Data from PDF]

The problem is that certain tables are shifted, which I think is due to the number of pages and the multiple headings within them.

Would it be possible to reformat this data using Pandas by creating a set of rules for how it gets reformatted?

  • I would like to shift the misplaced rows back into their proper places based on something like blank spaces.
  • Is it possible to delete rows that contain certain strings, for example extra or unnecessary headers?
  • Can I save the 'Total' data at the bottom by searching for the row containing 'Total' and placing it somewhere else?
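
The three rules above could be sketched roughly as follows. This is only an illustration under stated assumptions: the actual export is not shown as text, so the sample rows, the column names (`Account`, `Description`, `Amount`), and the idea that a shifted row starts with a blank cell and spills one column to the right are all hypothetical.

```python
import io
import pandas as pd

# Hypothetical stand-in for the Tabula export. The real layout is not shown
# in the question, so the column names and values here are assumptions.
csv_text = """Account,Description,Amount
1001,Cash,250.00
,1002,Receivables,100.00
Account,Description,Amount
1003,Inventory,75.50
Total,,425.50
"""

# Read everything as text, with one spare column to catch shifted values.
cols = ["c0", "c1", "c2", "spill"]
df = pd.read_csv(io.StringIO(csv_text), header=None, names=cols, dtype=str)

# Rule 1: a blank first cell marks a row that slid one column to the right,
# so move its values back one column to the left.
shifted = df["c0"].isna()
df.loc[shifted, cols] = df.loc[shifted, cols].shift(-1, axis=1).to_numpy()

# Rule 2: drop every row that repeats the header text.
is_header = df["c0"].eq("Account")
df = df.loc[~is_header, cols[:3]].copy()
df.columns = ["Account", "Description", "Amount"]
df["Amount"] = pd.to_numeric(df["Amount"])

# Rule 3: set the 'Total' rows aside instead of losing them.
is_total = df["Account"].eq("Total")
totals = df[is_total].reset_index(drop=True)
body = df[~is_total].reset_index(drop=True)
```

Here `body` holds the three account rows and `totals` holds the single 'Total' row. Every rule is a boolean mask built from cell contents, so none of it depends on row numbers. A real report would probably need different tests for what counts as a shifted row or a header.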

In essence, is there a way to partition this data with a set of commands (without specifying row numbers, because these change daily) and then reposition it so that I can manipulate the data however necessary?
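
Partitioning by content rather than by row number might look like the sketch below. It assumes, hypothetically, that each table starts with a recognizable heading row (here any label beginning with "Section"). The marker text and the column names `label` and `amount` are invented for illustration.

```python
import pandas as pd

# Hypothetical rows: each table starts with a line beginning "Section".
# The marker text and column names are assumptions for illustration.
df = pd.DataFrame(
    {
        "label": ["Section A", "1001", "1002", "Section B", "2001"],
        "amount": [None, 10.0, 20.0, None, 5.0],
    }
)

# Find heading rows by content rather than by position.
starts = df["label"].str.startswith("Section")

# Tag every row with the most recent heading above it.
df["section"] = df["label"].where(starts).ffill()

# Split the non-heading rows into one DataFrame per section.
tables = {
    name: group.drop(columns="section").reset_index(drop=True)
    for name, group in df[~starts].groupby("section")
}
```

After this, `tables["Section A"]` and `tables["Section B"]` are separate DataFrames that can be cleaned or rearranged independently, however many rows each day's report puts in each section.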

  • Posting an example of the actual CSV data will help people play around with it; a minimal example that still includes the formatting pitfalls would be best. The expected result for the example data also helps. You have multiple issues, so you might need to make multiple passes to resolve them, and some of the corrections might be done before making a DataFrame out of it. Commented Jun 13, 2017 at 19:05
  • @wwii what do you mean? Commented Jun 13, 2017 at 19:07
  • If you post actual CSV data instead of an image of data, people will be more likely to play around with it and offer solutions. The basic answer to all of your questions is yes: you can pretty much manipulate the data to make it look the way you want, and there are probably multiple ways to solve those problems. Commented Jun 13, 2017 at 19:14
  • @wwii how should I go about posting the actual CSV data? Commented Jun 13, 2017 at 19:16
  • Include the actual CSV text in the question and use the Code and Preformatted Text Markdown formatting, for example: stackoverflow.com/q/35755915/2823755 , stackoverflow.com/q/43638280/2823755. You can practice in the formatting sandbox: meta.stackexchange.com/questions/3122/formatting-sandbox Commented Jun 13, 2017 at 19:27
