I am using Python with numpy to read in data from a numerical model, stored in a text file with a fairly complicated format.

Numpy's genfromtxt and fromfile functions work well, but only if the data is regularly structured. My data files look something like this:

------snip

[sitename] [dimension 1 size] [dimension 2 size]
[data for dim 1]
[data for dim 2]
[date/time]
[header data]
[data (dim1 * dim2)]
[header]
[data]
...
.  
.   
[date/time]
[header]
[data]
.
.
etc...

---- snip

So, I have a mixture of text and numbers in a complicated (but repeating) layout. What is the best way to read this in using numpy?

Cheers,

Chris

    Do you need to use numpy methods only? Maybe the reading part could be done in plain python. Commented Apr 12, 2012 at 21:37

2 Answers


Numpy isn't good at generalized parsing, so you'd do well to look beyond it, and what you choose will depend mostly on how consistent the files are.

If they're unusually consistent, so that, say, you can extract numbers from known positions on known rows, then you can just read the file line by line as strings and index each line at the characters you want. (Step through the file, e.g., using file.readlines to get each line as a string.)
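As a minimal sketch of that fixed-position approach: the column layout below (site name in characters 0-7, then two 8-character numeric fields) and the names parse_fixed, temperature and pressure are invented for illustration and are not taken from the question's files.

```python
# Hypothetical fixed-width records; the column positions are an assumption.
lines = [
    "SITE_A    12.50  1013.2",
    "SITE_B     9.75   998.4",
]

def parse_fixed(line):
    """Slice a line at known character positions and convert each field."""
    name = line[0:8].strip()
    temperature = float(line[8:16])   # float() ignores surrounding spaces
    pressure = float(line[16:24])
    return name, temperature, pressure

records = [parse_fixed(line) for line in lines]
```

This only holds up if every line really does use the same columns; a single misaligned field will raise a ValueError or silently pick up the wrong digits.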

The usual case (at least in my experience) is that the files are more varied than that, but simple string operations, such as str.split (which is almost always my first step), are enough to parse each line.
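For the layout in the question, a split-based reader might look roughly like the sketch below. Several things here are assumptions, since the question does not show real data: that date/time lines match the format "%Y-%m-%d %H:%M", that headers are single lines which do not parse as a date, that each data block holds exactly dim1 * dim2 whitespace-separated numbers ending on a line boundary, and the names parse_model_output and read_values.

```python
from datetime import datetime
import numpy as np

def read_values(lines, count):
    """Consume lines from the iterator until `count` floats are collected."""
    values = []
    while len(values) < count:
        line = next(lines, None)
        if line is None:
            raise ValueError("file ended before a data block was complete")
        values.extend(float(tok) for tok in line.split())
    return values

def parse_model_output(text, time_format="%Y-%m-%d %H:%M"):
    """Parse the repeating layout; the time format is an assumed example."""
    lines = (ln.strip() for ln in text.splitlines() if ln.strip())
    site, n1, n2 = next(lines).split()
    n1, n2 = int(n1), int(n2)
    dim1 = np.array(read_values(lines, n1))
    dim2 = np.array(read_values(lines, n2))
    fields = {}          # maps (timestamp, header) -> 2-D array
    timestamp = None
    for line in lines:
        try:
            timestamp = datetime.strptime(line, time_format)
            continue
        except ValueError:
            pass         # not a date/time line, so treat it as a header
        data = read_values(lines, n1 * n2)
        fields[(timestamp, line)] = np.array(data).reshape(n1, n2)
    return site, dim1, dim2, fields

# Made-up sample in the shape described in the question.
sample = """\
SITE1 2 3
0.0 1.0
10.0 20.0 30.0
2012-04-12 00:00
temperature
1 2 3
4 5 6
salinity
7 8 9
10 11 12
2012-04-12 01:00
temperature
13 14 15
16 17 18
"""
site, dim1, dim2, fields = parse_model_output(sample)
```

Keying the result on (timestamp, header) keeps each block addressable; if every time step carries the same headers, the blocks could instead be stacked into one array per header.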

Beyond this, there are lots of parsing libraries in Python. I'm partial to pyparsing (but I don't know the others well, so it's not a fair comparison). Here's a summary of the various parsing libraries.


3 Comments

+1 for pyparsing. This is absolutely the right tool for the job here.
Thanks - I will give pyparsing a try. As it is Python based (rather than a C module like numpy.fromfile), I guess it will be noticeably slower than using numpy? Chris
Yes, pyparsing is significantly slower than numpy.fromfile, at least in my experience, though it's also doing much more. Also, although it's a good library, it takes some learning. For this reason, I'd recommend first trying basic string operations, as these usually do the trick, and moving to pyparsing only if they don't work (unless, of course, you'd like to learn pyparsing anyway).

I agree with the previous answer. The following chain of steps works best for me and is a lot easier than pyparsing or numpy.genfromtxt:

# Read the first whitespace-separated field of each line as a float.
my_list = []
with open(textfilename) as f:
    for line in f:
        item = line.split()
        if item:  # skip blank lines
            my_list.append(float(item[0]))

You can then easily convert the list into a numpy array (e.g. with numpy.array) and proceed from there.
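A small sketch of that conversion step, assuming the parsed values belong to a single block with dim1 = 2 and dim2 = 3 (the numbers are placeholders, not real model output):

```python
import numpy as np

my_list = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # stand-in for the parsed values

arr = np.array(my_list)    # 1-D float array
grid = arr.reshape(2, 3)   # assumed block shape (dim1, dim2), row-major
```

reshape raises a ValueError if the number of values does not equal dim1 * dim2, which is a cheap check that the block was read completely.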
