8

I have a file with a number of lines formatted with the following syntax:

FIELD      POSITION  DATA TYPE
------------------------------
COOP ID       1-6    Character
LATITUDE     8-15    Real
LONGITUDE   17-25    Real
ELEVATION   27-32    Real
STATE       34-35    Character
NAME        37-66    Character
COMPONENT1  68-73    Character
COMPONENT2  75-80    Character
COMPONENT3  82-87    Character
UTC OFFSET  89-90    Integer

The data is all ASCII-formatted.

An example of a line is:

011084  31.0581  -87.0547   26.0 AL BREWTON 3 SSE                  ------ ------ ------ +6

My current thought is that I'd like to read the file in a line at a time and somehow have each line broken up into a dictionary so I can refer to the components. Is there some module that does this in Python, or some other clean way?

Thanks!

10
  • Are the columns broken up by tabs? Commented Aug 18, 2011 at 17:41
  • From what I understand, there may be no delineation between columns, so simple application of split() is not an option. Commented Aug 18, 2011 at 17:42
  • That is why the data's position in the string is listed. Commented Aug 18, 2011 at 17:43
  • If there is no delineation, how can we know that a part of a line is in the first column and not the second? (Edit: I do not understand what you mean in your second comment. Is the file you want to read different from the one in your post?) Commented Aug 18, 2011 at 17:45
  • Because, @murgatroid99, the first column is characters 1-6, the second column is characters 8-15, and so on. Commented Aug 18, 2011 at 17:46

3 Answers

16

EDIT: You can still use the struct module:

See the struct module documentation. Looks to me like you want to use struct.unpack()

What you want is probably something like:

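import struct

# One "Ns" per field; each "x" skips the single separator column between fields.
record = struct.Struct("6sx8sx9sx6sx2sx30sx6sx6sx6sx2s")

with open("filename.txt", "rb") as f:  # struct works on bytes, so read in binary mode
    for line in f:
        # Strip only the line ending: the record must stay exactly 90 bytes long.
        raw = record.unpack(line.rstrip(b"\r\n"))
        (coop_id, lat, lon, elev, state, name, c1, c2, c3, utc_offset
         ) = [value.decode("ascii").strip() for value in raw]
        (lat, lon, elev) = map(float, (lat, lon, elev))
        utc_offset = int(utc_offset)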
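
To get the dictionary the question asks for, the unpacked fields can be zipped with a list of names. This is only a minimal sketch: it assumes Python 3 and lines padded to the full 90 characters, and the names FIELD_NAMES, RECORD, CONVERTERS and parse_line are made up for illustration rather than taken from the answer.

```python
import struct

# Field names taken from the format table in the question.
FIELD_NAMES = ("coop_id", "latitude", "longitude", "elevation", "state",
               "name", "component1", "component2", "component3", "utc_offset")

# Same layout as the answer: "Ns" reads N bytes, "x" skips one separator byte.
RECORD = struct.Struct("6sx8sx9sx6sx2sx30sx6sx6sx6sx2s")

# Converters for the non-character fields (Real -> float, Integer -> int).
CONVERTERS = {"latitude": float, "longitude": float,
              "elevation": float, "utc_offset": int}


def parse_line(line):
    """Turn one 90-character text line into a dict keyed by field name."""
    # Drop only the line ending, then encode because struct works on bytes.
    raw = RECORD.unpack(line.rstrip("\r\n").encode("ascii"))
    record = {}
    for name, value in zip(FIELD_NAMES, raw):
        text = value.decode("ascii").strip()
        # Fields without a converter stay as (stripped) strings.
        record[name] = CONVERTERS.get(name, str)(text)
    return record
```

Calling parse_line on each line of the file gives one dictionary per station. A line that is not exactly 90 characters long makes RECORD.unpack raise struct.error, so short or truncated lines would need separate handling.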

8 Comments

Sorry, the data is in ASCII format. I will clarify the question.
By the way, I didn't give you the down-vote. Struct looks like it would work fine if the data had been packed in that way; alas.
Then you just have to read the text file in binary mode and unpack each line with struct, or encode each line and unpack the resulting bytes.
I tried the code a couple of times; I think it assumes the data is non-ASCII in nature.
@Richard, just changed my example to assume all data is strings, not packed values.
1

I think I understand from your question/comments what you are looking for. If we assume that Real, Character, and Integer are the only data types, then the following code should work. (I will also assume that the format file you showed is tab delimited):

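layout = {}
types = {"Real": float, "Character": str, "Integer": int}

with open("format.txt") as spec_file:
    for line in spec_file:
        parts = [part.strip() for part in line.split("\t")]
        if len(parts) != 3 or parts[2] not in types:
            continue  # skip the header, the dashed rule, and blank lines
        name, position, type_name = parts
        first, last = position.split("-")
        # Positions are 1-based and inclusive; slices are 0-based and end-exclusive.
        layout[name] = {"start": int(first) - 1, "end": int(last), "type": types[type_name]}

results = []
with open("filename.txt") as data_file:
    for line in data_file:
        result = {}
        for name, field in layout.items():
            # Strip the padding before converting to the field's type.
            text = line[field["start"]:field["end"]].strip()
            result[name] = field["type"](text)
        results.append(result)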

You should end up with results containing a list of dictionaries where each dictionary is a mapping from key names in the format file to data values in the correct data type.
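
If the format file is space-aligned as displayed in the question rather than tab-delimited, field names such as COOP ID contain spaces, so splitting from the left misreads them. Below is a minimal sketch of one way around that, assuming the last two whitespace-separated tokens on each row are always the position and the data type. The helper names read_layout and parse_record are hypothetical.

```python
TYPES = {"Real": float, "Character": str, "Integer": int}


def read_layout(spec_lines):
    """Build {field name: (start, end, converter)} from rows like 'COOP ID  1-6  Character'."""
    layout = {}
    for row in spec_lines:
        # Split from the right so a name containing spaces stays in one piece.
        parts = row.rsplit(None, 2)
        if len(parts) != 3 or parts[2] not in TYPES:
            continue  # skip the header, the dashed rule, and blank lines
        name, position, type_name = parts
        first, last = position.split("-")
        # Positions are 1-based and inclusive; slices are 0-based and end-exclusive.
        layout[name.strip()] = (int(first) - 1, int(last), TYPES[type_name])
    return layout


def parse_record(line, layout):
    """Slice one data line according to the layout and convert each field."""
    return {name: convert(line[start:end].strip())
            for name, (start, end, convert) in layout.items()}
```

With this, read_layout(open("format.txt")) would produce the layout once, and parse_record(line, layout) would be applied to every data line. A blank Real or Integer field raises ValueError in this sketch, so real data with missing values would need a guard.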

Comments

0

It seems like you could write a function using strings and slices fairly simply. string[0:6] would be the first element (characters 1-6). Does it need to be extensible, or is it likely a one-off?
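
A minimal one-off sketch of that idea, assuming the column positions from the question (1-based, inclusive) converted to Python slices (0-based, end-exclusive). The names SLICES and split_line are made up for illustration.

```python
# Column positions from the question, converted to 0-based, end-exclusive slices.
SLICES = {
    "coop_id": slice(0, 6),        # characters 1-6
    "latitude": slice(7, 15),      # characters 8-15
    "longitude": slice(16, 25),    # characters 17-25
    "elevation": slice(26, 32),    # characters 27-32
    "state": slice(33, 35),        # characters 34-35
    "name": slice(36, 66),         # characters 37-66
    "component1": slice(67, 73),   # characters 68-73
    "component2": slice(74, 80),   # characters 75-80
    "component3": slice(81, 87),   # characters 82-87
    "utc_offset": slice(88, 90),   # characters 89-90
}


def split_line(line):
    """Return the fields of one line as stripped strings, keyed by name."""
    return {name: line[where].strip() for name, where in SLICES.items()}
```

This version leaves every value as a string, so numeric fields still need float() or int() applied by the caller. For a different file format, only the SLICES table would change.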

2 Comments

There will be a number of data files from different sources with formats not unlike this. Extensibility would be desirable. It's easy enough to write as a one-off, but it also seemed like a module might exist for this purpose.
That I don't know, but I'm interested to see the answer, as well. :)
