Using numpy loadtxt

Question

I have a text file containing 10 columns of numbers. I would like to create a dictionary in which the first three numbers of each row serve as a key for accessing two further numbers, in columns 6 and 7 of the same row. I have been trying to do this with the numpy.loadtxt function (in Python 2.7), but I am running into difficulties with the dtype argument. Is this the correct approach, or is there a simpler way? If so, what is the correct way to lay out the function call?

Many thanks and please let me know if any clarification is required

unutbu · Accepted Answer · 2012-03-02 11:43:46Z

1

Given the column-spaced format of your data,

   1   0   0      617.09        0.00        9.38 l   0.0000E+00
   2   0   0     7169.00     6978.44       94.10 o   0.1913E-05
   3   0   0      366.08      371.91       14.06 o   0.6503E-03
   4   0   0     5948.04     5586.09       52.95 o   0.2804E-05
   5   0   0     3756.34     3944.63       50.69 o   0.6960E-05
 -11   1   0      147.27       93.02       23.25 o   0.1320E-02
 -10   1   0       -2.31        5.71        9.57 o   0.2533E-02

I think it would be easiest to just use Python string manipulation tools like split to parse the file:

def to_float(item):
    try:
        return float(item)
    except ValueError:
        return item

def formatter(lines):
    for line in lines:
        if not line.strip(): continue
        yield [to_float(item) for item in line.split()]

dct = {}
with open('data') as f:
    for row in formatter(f):
        dct[tuple(row[:3])] = row[5:7]

print(dct)

yields

{(-11.0, 1.0, 0.0): [23.25, 'o'], (4.0, 0.0, 0.0): [52.95, 'o'], (1.0, 0.0, 0.0): [9.38, 'l'], (-10.0, 1.0, 0.0): [9.57, 'o'], (3.0, 0.0, 0.0): [14.06, 'o'], (5.0, 0.0, 0.0): [50.69, 'o'], (2.0, 0.0, 0.0): [94.1, 'o']}
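
The keys above come out as floats because to_float converts every numeric field. If integer keys are preferred, a variant of the same split-based idea might look like the sketch below. It assumes the first three fields are always integers and that the wanted values sit at zero-based positions 5 and 6; parse_rows and SAMPLE are names made up for this illustration, and io.StringIO stands in for the real file.

```python
import io

# Two rows copied from the sample data above; StringIO stands in for open('data').
SAMPLE = """\
   1   0   0      617.09        0.00        9.38 l   0.0000E+00
 -11   1   0      147.27       93.02       23.25 o   0.1320E-02
"""

def parse_rows(lines):
    """Yield (key, value) pairs from whitespace-separated lines."""
    for line in lines:
        fields = line.split()
        if not fields:  # skip blank lines
            continue
        # Assumption: the first three fields are integers.
        key = tuple(int(x) for x in fields[:3])
        # Assumption: field 5 is a float and field 6 is a one-letter flag.
        value = (float(fields[5]), fields[6])
        yield key, value

dct = dict(parse_rows(io.StringIO(SAMPLE)))
print(dct)
```

Integer keys avoid having to look entries up with float tuples such as (1.0, 0.0, 0.0).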

Original answer:

genfromtxt has a parameter dtype, which when set to None causes genfromtxt to try to guess the appropriate dtype:

import numpy as np
arr = np.genfromtxt('data', dtype = None)
dct = {tuple(row[:3]):row[5:7] for row in arr}

For example, with data like this:

1 2 3 4 5 6 7 8 9 10
1 2 4 4 5 6 7 8 9 10
1 2 5 4 5 6 7 8 9 10

dct gets set to

{(1, 2, 5): array([6, 7]), (1, 2, 4): array([6, 7]), (1, 2, 3): array([6, 7])}
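
One caveat: when the columns are not all of one type (for example, a letter column among the numbers), genfromtxt with dtype=None returns a structured array, and slicing a row with row[:3] typically raises an IndexError. A minimal sketch of one workaround is to convert each record to a plain tuple first. It assumes a NumPy version that accepts the encoding argument (1.14 or later); SAMPLE is a made-up name, and io.StringIO stands in for the real file.

```python
import io
import numpy as np

# Two rows copied from the sample data in this answer.
SAMPLE = """\
   1   0   0      617.09        0.00        9.38 l   0.0000E+00
   2   0   0     7169.00     6978.44       94.10 o   0.1913E-05
"""

# Mixed column types give a 1-D structured array (one record per line).
arr = np.genfromtxt(io.StringIO(SAMPLE), dtype=None, encoding=None)

dct = {}
for row in np.atleast_1d(arr):
    rec = row.item()          # record -> plain Python tuple
    dct[rec[:3]] = rec[5:7]   # tuples can be sliced; records cannot

print(dct)
```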

edited Mar 2, 2012 at 11:43

answered Mar 1, 2012 at 13:17

unutbu


2 Comments

user1171835 Over a year ago

Thanks for this. It works when I use test data as above. The problem it's throwing up with my own data is that the index is invalid. I think it's because my data are not evenly spaced, are a mixture of integers and floats, and some values have a '-' negative sign in front of them. I tried using delimiter=None but this does not seem to help. Sorry if I'm just being naive.

unutbu Over a year ago

Maybe post a sample of your data and we'll be able to suggest a different approach.

Travis Vaught · Accepted Answer · 2012-03-01 16:39:34Z

1

For clarity, a complete example of the above (correct) answer might look like:

    import numpy as np

    # Write a small sample file to read back
    with open("data.txt", "w") as f:
        f.write("1 2 3 4 5 6 7 8 9 10\n")
        f.write("1 2 4 4 5 6 7 8 9 10\n")
        f.write("1 2 5 4 5 6 7 8 9 10\n")

    arr = np.genfromtxt("data.txt", dtype=None)
    # Key: first three columns; value: zero-based columns 4 and 5
    dct = {tuple(row[:3]): row[4:6] for row in arr}

Which would result in:

    {(1, 2, 3): array([5, 6]), (1, 2, 4): array([5, 6]), (1, 2, 5): array([5, 6])}

It may be apparent, but NB: you will overwrite dictionary entries when you have identical elements in the first three columns of more than one row.
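
If rows with the same first three columns need to be kept rather than overwritten, one option is to collect the values in a list per key. This is only a sketch: the rows below are made-up test data (the second row is a hypothetical duplicate key), and grouped is a name chosen for this illustration.

```python
from collections import defaultdict

rows = [
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    [1, 2, 3, 40, 50, 60, 70, 80, 90, 100],  # hypothetical row with a repeated key
    [1, 2, 5, 4, 5, 6, 7, 8, 9, 10],
]

# Each key maps to a list, so repeated keys accumulate instead of overwriting.
grouped = defaultdict(list)
for row in rows:
    grouped[tuple(row[:3])].append(row[5:7])

print(dict(grouped))
```

Each lookup then returns a list of value pairs, one per matching row.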

answered Mar 1, 2012 at 16:39

Travis Vaught


1 Comment

user1171835 Over a year ago

Thanks for this. It works when I use test data as above. The problem it's throwing up with my own data is that the index is invalid. I think it's because my data are not evenly spaced, are a mixture of integers and floats, and some values have a '-' negative sign in front of them. I tried using delimiter=None but this does not seem to help. Sorry if I'm just being naive.
