I have a text file containing 10 columns of numbers. I would like to create a dictionary in which the first three numbers of each row serve as a key for accessing the numbers in columns 6 and 7 of the same row. I have been trying to do this with numpy.loadtxt (in Python 2.7), but I am running into difficulties with the dtype argument. Is this the correct approach, or is there a simpler way? If so, what is the correct way to call the function?

Many thanks, and please let me know if any clarification is required.

2 Answers

Given the column-spaced format of your data,

   1   0   0      617.09        0.00        9.38 l   0.0000E+00
   2   0   0     7169.00     6978.44       94.10 o   0.1913E-05
   3   0   0      366.08      371.91       14.06 o   0.6503E-03
   4   0   0     5948.04     5586.09       52.95 o   0.2804E-05
   5   0   0     3756.34     3944.63       50.69 o   0.6960E-05
 -11   1   0      147.27       93.02       23.25 o   0.1320E-02
 -10   1   0       -2.31        5.71        9.57 o   0.2533E-02

I think it would be easiest to just use Python string manipulation tools like split to parse the file:

def to_float(item):
    try:
        return float(item)
    except ValueError:
        return item

def formatter(lines):
    for line in lines:
        if not line.strip(): continue
        yield [to_float(item) for item in line.split()]

dct = {}
with open('data') as f:
    for row in formatter(f):
        dct[tuple(row[:3])] = row[5:7]

print(dct)

yields

{(-11.0, 1.0, 0.0): [23.25, 'o'], (4.0, 0.0, 0.0): [52.95, 'o'], (1.0, 0.0, 0.0): [9.38, 'l'], (-10.0, 1.0, 0.0): [9.57, 'o'], (3.0, 0.0, 0.0): [14.06, 'o'], (5.0, 0.0, 0.0): [50.69, 'o'], (2.0, 0.0, 0.0): [94.1, 'o']}
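
Note that the keys above are tuples of floats. If integer keys and an explicit choice of columns are preferred, a minimal sketch along the same lines might look like the following. The names build_lookup, parse_value, key_cols and value_cols are hypothetical, and the sketch assumes the first three columns always hold integers:

```python
def parse_value(token):
    """Return token as a float when possible, otherwise leave it as a string."""
    try:
        return float(token)
    except ValueError:
        return token


def build_lookup(lines, key_cols=(0, 1, 2), value_cols=(5, 6)):
    """Map the integer key columns of each line to the chosen value columns."""
    lookup = {}
    for line in lines:
        fields = line.split()
        if not fields:  # skip blank lines
            continue
        # Assumption: the key columns are always valid integers.
        key = tuple(int(fields[i]) for i in key_cols)
        lookup[key] = [parse_value(fields[i]) for i in value_cols]
    return lookup


sample = [
    "   1   0   0      617.09        0.00        9.38 l   0.0000E+00",
    " -11   1   0      147.27       93.02       23.25 o   0.1320E-02",
]
lookup = build_lookup(sample)
print(lookup[(-11, 1, 0)])
```

Passing an open file object instead of the sample list should work the same way, since both yield one line at a time.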

Original answer:

genfromtxt has a parameter dtype, which when set to None causes genfromtxt to try to guess the appropriate dtype:

import numpy as np
arr = np.genfromtxt('data', dtype = None)
dct = {tuple(row[:3]):row[5:7] for row in arr}

For example, with data like this:

1 2 3 4 5 6 7 8 9 10
1 2 4 4 5 6 7 8 9 10
1 2 5 4 5 6 7 8 9 10

dct gets set to

{(1, 2, 5): array([6, 7]), (1, 2, 4): array([6, 7]), (1, 2, 3): array([6, 7])}
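
One plausible reason this can fail on mixed data: when the columns do not all share a type (for example, a letter column among the numbers), genfromtxt with dtype=None returns a one-dimensional structured array, and its records cannot be sliced like a plain row. A hedged sketch of a workaround is to convert each record to an ordinary tuple first. The sample text and the encoding="utf-8" argument are assumptions made for illustration:

```python
import io

import numpy as np

# Assumed sample: a few mixed-type rows, read from memory instead of a file.
sample = io.StringIO(
    "   1   0   0      617.09        0.00        9.38 l   0.0000E+00\n"
    "   2   0   0     7169.00     6978.44       94.10 o   0.1913E-05\n"
    " -11   1   0      147.27       93.02       23.25 o   0.1320E-02\n"
)

# With mixed column types, this yields a 1-D structured array (named fields).
arr = np.genfromtxt(sample, dtype=None, encoding="utf-8")

dct = {}
for record in arr:
    fields = record.item()  # plain tuple of Python values, safe to slice
    dct[fields[:3]] = fields[5:7]

print(dct[(1, 0, 0)])
```

Here the values are tuples rather than arrays, because each record mixes numbers and strings.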

2 Comments

Thanks for this. It works when I use test data as above. The problem it's throwing up is that the index is invalid. I think it's because my data are not evenly spaced, are a mixture of integers and floats, and some values have a '-' negative sign in front of them. I tried using delimiter=None but this does not seem to help. Sorry if I'm just being naive.
Maybe post a sample of your data and we'll be able to suggest a different approach.

For clarity, a complete example of the above (correct) answer might look like:

    import numpy as np

    # Write a small test file.
    with open("data.txt", "w") as f:
        f.write("1 2 3 4 5 6 7 8 9 10\n")
        f.write("1 2 4 4 5 6 7 8 9 10\n")
        f.write("1 2 5 4 5 6 7 8 9 10\n")

    # Columns 6 and 7 are at zero-based indices 5 and 6.
    arr = np.genfromtxt("data.txt", dtype=None)
    dct = {tuple(row[:3]): row[5:7] for row in arr}

Which would result in:

    {(1, 2, 3): array([6, 7]), (1, 2, 4): array([6, 7]), (1, 2, 5): array([6, 7])}

It may be apparent, but NB: if more than one row has identical values in the first three columns, later rows will overwrite the earlier dictionary entries.
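
If rows with repeated keys need to be kept rather than overwritten, one possible approach (a sketch, not part of the answer above) is to collect the values for each key in a list using collections.defaultdict. The rows list here is made-up illustrative data:

```python
from collections import defaultdict

# Made-up rows: the first two share the key (1, 2, 3).
rows = [
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    [1, 2, 3, 4, 5, 60, 70, 8, 9, 10],
    [1, 2, 4, 4, 5, 6, 7, 8, 9, 10],
]

grouped = defaultdict(list)
for row in rows:
    # Append instead of assigning, so repeated keys accumulate.
    grouped[tuple(row[:3])].append(row[5:7])

print(grouped[(1, 2, 3)])
```

Each key then maps to a list of value pairs, one per matching row.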
