I'm pretty new to Pandas and Python, but have solid coding background. I've decided to pick this up because it will help me automate certain financial reports at work..
To give you a basic background of my issue, I'm taking a PDF and using Tabula to reformat it into a CSV file, which is working fine but giving me certain formatting issues. The reports come in about 60 page PDF files, which I am exporting to a CSV and then trying to manipulate the data in Python using Pandas.
The issue: when I reformat the data, I get a CSV file that looks something like this -
The issue here is that certain tables are shifting and I think it is due to the amount of pages and multiple headings within those.
Would it be possible for me to reformat this data using Pandas, and basically create a set of rules for how it gets reformatted?
- Basically, I would like to shift the rows that are misplaced back into their respective places based on something like blank spaces.
- Is it possible for me to delete rows with certain strings - deleting extra/unnecessary headers.
- Can I somehow save the 'Total' data at the bottom by searching for the row with 'Total' and placing it somewhere else?
In essence, is there a way to partition this data by a set of commands (without specifying row numbers - because this changes daily) and then reposition it accordingly so that I can manipulate the data however necessary?
