
I'm trying to read a fixed-width file using pandas.read_fwf. Here is a sample of the file:

0000123456700123  
0001234567800045  

Say characters 0-11 hold the balance (format %12.2f, i.e. two implied decimal places) and characters 11-16 hold the interest rate (format %6.2f, also two implied decimal places). So my expected output DataFrame should look like this:

     Balance  Int_Rate  
0   12345.67      1.23  
1  123456.78      0.45
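For reference, the expected values follow from reading each field as a zero-padded integer and dividing by 100. A small plain-Python sketch of that arithmetic, assuming both fields always carry exactly two implied decimal places; `decode` is a hypothetical helper, not part of pandas:

```python
# The two records from the question's sample file
records = ["0000123456700123", "0001234567800045"]

def decode(record):
    """Split one record and apply two implied decimal places (hypothetical helper)."""
    balance = int(record[0:11]) / 100    # "00001234567" -> 1234567 -> 12345.67
    int_rate = int(record[11:16]) / 100  # "00123" -> 123 -> 1.23
    return balance, int_rate

for record in records:
    print(decode(record))
```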

Here's my code for reading the file without formatting:

import pandas as pd

colspecs = [(0, 11), (11, 16)]
header = ['Balance', 'Int_Rate']
df = pd.read_fwf("dataset", colspecs=colspecs, names=header)

I've checked the documentation of pandas.read_fwf, but there seems to be no option for formatting the columns during the import. Do I have to update the formats afterwards, or is there a better way to do it?
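Updating the formats afterwards is a one-liner: with the code above, read_fwf infers both columns as integers (1234567, 123, ...), so dividing the frame by 100 restores the implied decimals. A minimal sketch, assuming every field uses two implied decimal places; `io.StringIO` stands in for the "dataset" file:

```python
import io
import pandas as pd

# Sample records from the question; pass the real file path instead of StringIO
data = io.StringIO("0000123456700123\n0001234567800045\n")

colspecs = [(0, 11), (11, 16)]
header = ["Balance", "Int_Rate"]

# Both columns are parsed as integers here, e.g. 1234567 and 123
df = pd.read_fwf(data, colspecs=colspecs, names=header)

# Shift the implied decimal point after the import
df = df / 100
print(df)
```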

  • You could use the converters and dtype parameters. Commented Sep 24, 2015 at 16:35
  • @olivecoder I figured it out! Thanks for your tip! Commented Sep 24, 2015 at 18:23
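Following the comment's suggestion, the formatting can happen during the import itself via the `converters` parameter, which receives the raw text of each field. A minimal sketch, assuming both fields have two implied decimal places; `implied_2dp` is a hypothetical helper and `io.StringIO` stands in for the real file:

```python
import io
import pandas as pd

data = io.StringIO("0000123456700123\n0001234567800045\n")

colspecs = [(0, 11), (11, 16)]
header = ["Balance", "Int_Rate"]

def implied_2dp(field):
    """Convert a zero-padded digit string with two implied decimals to float."""
    return int(field) / 100

# Each converter is called on the field's raw string before type inference
df = pd.read_fwf(
    data,
    colspecs=colspecs,
    names=header,
    converters={"Balance": implied_2dp, "Int_Rate": implied_2dp},
)
print(df)
```

Keeping the conversion in a converter also preserves the leading zeros until the helper sees them, which matters if a field ever needs string handling rather than plain integer parsing.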

1 Answer


I had the same problem a while back. I used struct to split each line into fields, then loaded the result into pandas:

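import struct
import pandas as pd

def parse_data_file(fieldwidths, fn):
    # Build a struct format string from the field widths, e.g. (10, -2, 5) -> "10s 2x 5s".
    # A positive width becomes "Ns" (an N-byte string); a negative width becomes
    # "Nx" (N pad bytes), i.e. a field that is skipped.
    # See https://docs.python.org/3.0/library/struct.html for formatting and other info
    fmtstring = ' '.join('{}{}'.format(abs(fw), 'x' if fw < 0 else 's')
                         for fw in fieldwidths)
    fieldstruct = struct.Struct(fmtstring)
    unpack = fieldstruct.unpack_from

    # Dissect each line according to the field widths and decode the bytes back to str.
    # unpack_from ignores bytes past the struct size (e.g. the newline),
    # but raises struct.error if a line is shorter than the total width.
    def parse(line):
        return tuple(s.decode() for s in unpack(line.encode()))

    rows = []
    with open(fn, 'r') as f:
        for line in f:
            rows.append(parse(line))
    return rows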

#
# test.txt file content, per below
# 6332      x102340   Darwin                                                                                              080007Darwin                                            1101
# 6332      x102342   Sydney                                                                                              200001Sydney                                            1101
file_location = "test.txt"
fieldwidths = (10, 10, 100, 4, 2, 50, 4)  # all positive here; a negative width would mark a padding field to skip

column_names = ['ID', 'LocationID', 'LocationName', 'PostCode', 'StateID', 'Address', 'CountryID']
fields = parse_data_file(fieldwidths=fieldwidths, fn=file_location)

# Pandas display options
pd.options.display.width = 500
pd.options.display.colheader_justify = 'left'

# Load the parsed rows into a DataFrame
df = pd.DataFrame(fields, columns=column_names)

print(df)

Output

    ID    LocationID  LocationName  PostCode StateID Address CountryID
    6332  x102340     Darwin        0800     07      Darwin  1101    
    6332  x102342     Sydney        2000     01      Sydney  1101   
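Applied to the question's own 16-character records, the same struct approach only splits the bytes; the implied decimals still have to be applied afterwards. A sketch under the assumption that both fields carry two implied decimal places, with the records inlined instead of read from a file:

```python
import struct
import pandas as pd

# Question's layout: an 11-character balance followed by a 5-character rate.
# A negative width would skip that many bytes (struct's "x" pad code).
fieldwidths = (11, 5)
fmtstring = " ".join(f"{abs(fw)}{'x' if fw < 0 else 's'}" for fw in fieldwidths)
unpack = struct.Struct(fmtstring).unpack_from  # fmtstring is "11s 5s"

lines = ["0000123456700123\n", "0001234567800045\n"]
rows = [tuple(field.decode() for field in unpack(line.encode())) for line in lines]

df = pd.DataFrame(rows, columns=["Balance", "Int_Rate"])
# Fields are still strings such as "00001234567"; convert, then scale by 100
df = df.apply(pd.to_numeric) / 100
print(df)
```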