Reading numbers from a text file in Python

Question

I have a large text file, shown below, that contains both strings and numbers. I want to read only the numbers, drop the rows that have just 3 columns, and write the rest into an m-by-n matrix. Could someone tell me the best way to handle such files in Python?

My file is something like:

# Chunk-averaged data for fix Dens and group ave
# Timestep Number-of-chunks Total-count
# Chunk Coord1 Ncount density/number
4010000 14 1500
  1 4.323 138.758 0.00167105
  2 12.969 121.755 0.00146629
  3 21.615 127.7 0.00153788
  4 30.261 131.682 0.00158584
  5 38.907 127.525 0.00153578
  6 47.553 136.322 0.00164172
  7 56.199 118.014 0.00142124
  8 64.845 125.842 0.00151551
  9 73.491 120.684 0.00145339
  10 82.137 132.282 0.00159306
  11 90.783 121.567 0.00146402
  12 99.429 97.869 0.00117863
  13 108.075 0 0
  14 116.721 0 0......

Is it just that header line that only has three numbers, or do lines like that recur? If the former, just open the file, skip the first four lines, then have numpy read the rest. If the latter, just have numpy read the whole thing with nan fill and then select the lines where none of the columns are nan.
– abarnert, Commented Jun 28, 2018 at 14:53
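Both options in that comment can be sketched with NumPy. This is a minimal sketch, not the commenter's code: it assumes the data rows are whitespace-delimited with exactly 4 numeric columns, and the helper names read_skip_header and read_nan_filled are hypothetical. In the "nan fill" case the padding is done by hand, since np.genfromtxt rejects whitespace-delimited rows with too few fields rather than padding them.

```python
import io

import numpy as np

N_COLS = 4  # assumed width of a data row: Chunk, Coord1, Ncount, density/number


def read_skip_header(source, n_header=4):
    """Case 1: only the leading lines are irregular, so skip them and parse the rest."""
    # skiprows counts every line, comment lines included
    return np.loadtxt(source, skiprows=n_header, ndmin=2)


def read_nan_filled(source):
    """Case 2: short rows recur, so pad every row with nan and keep only complete rows."""
    rows = []
    for line in source:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # blank line or comment line
        values = [float(x) for x in fields[:N_COLS]]  # fields beyond N_COLS are ignored
        values += [np.nan] * (N_COLS - len(values))  # nan fill short rows
        rows.append(values)
    data = np.array(rows, dtype=float).reshape(-1, N_COLS)
    return data[~np.isnan(data).any(axis=1)]  # rows where no column is nan


SAMPLE = """# Chunk-averaged data for fix Dens and group ave
# Timestep Number-of-chunks Total-count
# Chunk Coord1 Ncount density/number
4010000 14 1500
  1 4.323 138.758 0.00167105
  2 12.969 121.755 0.00146629
"""

print(read_skip_header(io.StringIO(SAMPLE)).shape)  # (2, 4)
print(read_nan_filled(io.StringIO(SAMPLE)).shape)  # (2, 4)
```

With a real file, pass an open file object instead of io.StringIO, e.g. with open('myfile.txt') as f: read_nan_filled(f).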
Read line by line: if a line contains a non-numeric character, skip it; if not, convert it to a list. If there are only 4 elements (3 columns and one index column), skip it; otherwise add it to a dataframe.
– Hamid Mir, Commented Jun 28, 2018 at 14:58
@ᴀʀᴍᴀɴ I think regex would be vastly overkill! There are great methods in numpy :)
– Andreas Storvik Strauman, Commented Jun 28, 2018 at 22:25

R Balasubramanian · Accepted Answer · 2018-06-28 15:05:29Z


You haven't specified what exactly you meant by matrix, so here is a solution that will turn your text file into a 2d list, making each number individually accessible.

It checks that the first item in a given row is a number, and that there are 4 items in the row, in which case it will append that line as 4 separate numbers to the 2d list mat. If you want to access any number in mat, you can use mat[i][j].

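with open("test.txt") as f:
    content = f.readlines()

content = [x.strip() for x in content]
mat = []

for line in content:
    s = line.split()  # split on any run of whitespace, not just a single space
    if len(s) == 4 and s[0].isdigit():
        mat.append([float(x) for x in s])  # store the 4 values as numbers, not strings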
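Since the question asks for an m-by-n matrix, the filtered rows can be turned into a NumPy array once each row has been converted to floats. A minimal sketch of that last step, assuming NumPy is installed; the function name rows_to_matrix and the inline sample lines are hypothetical, chosen only for illustration.

```python
import numpy as np


def rows_to_matrix(lines):
    """Keep lines that start with an integer and have exactly 4 fields, as an (m, 4) array."""
    mat = []
    for line in lines:
        s = line.split()  # split on any run of whitespace
        if len(s) == 4 and s[0].isdigit():
            mat.append([float(x) for x in s])
    # reshape keeps the result 2D even when no line matched
    return np.array(mat, dtype=float).reshape(-1, 4)


lines = [
    "# Chunk Coord1 Ncount density/number",  # 4 fields, but '#' is not a digit
    "4010000 14 1500",                       # only 3 fields
    "  1 4.323 138.758 0.00167105",
    "  2 12.969 121.755 0.00146629",
]
m = rows_to_matrix(lines)
print(m.shape)  # (2, 4)
print(m[1, 2])  # 121.755
```

The comment header line also has 4 fields, which is why the isdigit() check on the first item matters as much as the length check.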


hpaulj · Accepted Answer · 2018-06-28 18:37:05Z

With your sample copied and pasted into the string txt:

In [350]: np.genfromtxt(txt.splitlines(), invalid_raise=False)
/usr/local/bin/ipython3:1: ConversionWarning: Some errors were detected !
    Line #2 (got 4 columns instead of 3)
    Line #3 (got 4 columns instead of 3)
  ....
  #!/usr/bin/python3
Out[350]: array([4.01e+06, 1.40e+01, 1.50e+03])

That read the first non-comment line and took its 3 columns as the standard, so every 4-column line after it was rejected. Skipping the header lines, I can read all the rest:

In [351]: np.genfromtxt(txt.splitlines(), invalid_raise=False,skip_header=4)
Out[351]: 
array([[1.00000e+00, 4.32300e+00, 1.38758e+02, 1.67105e-03],
       [2.00000e+00, 1.29690e+01, 1.21755e+02, 1.46629e-03],
       [3.00000e+00, 2.16150e+01, 1.27700e+02, 1.53788e-03],
       [4.00000e+00, 3.02610e+01, 1.31682e+02, 1.58584e-03],
       [5.00000e+00, 3.89070e+01, 1.27525e+02, 1.53578e-03],
       [6.00000e+00, 4.75530e+01, 1.36322e+02, 1.64172e-03],
       [7.00000e+00, 5.61990e+01, 1.18014e+02, 1.42124e-03],
       [8.00000e+00, 6.48450e+01, 1.25842e+02, 1.51551e-03],
       [9.00000e+00, 7.34910e+01, 1.20684e+02, 1.45339e-03],
       [1.00000e+01, 8.21370e+01, 1.32282e+02, 1.59306e-03],
       [1.10000e+01, 9.07830e+01, 1.21567e+02, 1.46402e-03],
       [1.20000e+01, 9.94290e+01, 9.78690e+01, 1.17863e-03],
       [1.30000e+01, 1.08075e+02, 0.00000e+00, 0.00000e+00],
       [1.40000e+01, 1.16721e+02, 0.00000e+00, 0.00000e+00]])

In this case all the remaining lines have the required 4 columns. If I truncate the last 2 lines, I get the warning, but genfromtxt still reads the other lines.

Filtering the lines before passing them to genfromtxt is another option. genfromtxt accepts any input that feeds it lines: a file, a list of strings, or a generator that reads and filters a file.
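A minimal sketch of the filtering option, assuming the data rows are exactly the non-comment lines with 4 whitespace-separated fields; the generator name numeric_rows is hypothetical. The generator is handed to np.genfromtxt in place of a file.

```python
import io

import numpy as np


def numeric_rows(lines, n_cols=4):
    """Yield only lines with exactly n_cols whitespace-separated fields, skipping comments."""
    for line in lines:
        fields = line.split()
        if len(fields) == n_cols and not fields[0].startswith("#"):
            yield line


sample = io.StringIO(
    "# Chunk Coord1 Ncount density/number\n"
    "4010000 14 1500\n"
    "  1 4.323 138.758 0.00167105\n"
    "  2 12.969 121.755 0.00146629\n"
)

# genfromtxt reads from the generator exactly as it would from a file
data = np.genfromtxt(numeric_rows(sample))
print(data.shape)  # (2, 4)
```

Because the short rows never reach genfromtxt, no ConversionWarning is raised and no skip_header count has to be known in advance.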

user6108553 · Accepted Answer · 2018-06-28 15:11:48Z

For your task you need a file iterator, str.split() and re.match:

import re  # needed to check, with a regexp, that a line contains only numbers

matrix = []  # here we'll put your numbers

with open('myfile.txt') as f:
    for line in f:  # iterate over the lines of the file one by one
        # skip the line if it contains anything other than digits, '.', signs, exponents or whitespace
        if not re.match(r'[\s0-9.eE+-]+$', line):
            continue

        list_of_items = line.split()  # split on any run of whitespace into separate strings
        if len(list_of_items) <= 3:  # don't put rows of 3 or fewer items into the matrix
            continue

        # convert the strings to floats and add them to the matrix as a new row
        matrix.append([float(an_item) for an_item in list_of_items])

I tried to make the code and comments self-explanatory; let me know if anything is unclear.
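If a regular expression feels like overkill, as one of the comments on the question suggests, the same filter can be written by simply attempting the float conversion and skipping lines that fail. This is an alternative technique, not this answer's code; the function name parse_numeric_lines and the minimum of 4 columns are assumptions based on the sample file.

```python
import io


def parse_numeric_lines(lines, min_cols=4):
    """Return rows of floats, skipping non-numeric lines and rows that are too short."""
    matrix = []
    for line in lines:
        items = line.split()
        if len(items) < min_cols:
            continue  # e.g. the "4010000 14 1500" row or a blank line
        try:
            row = [float(item) for item in items]
        except ValueError:
            continue  # the line contains text, e.g. a "# ..." comment
        matrix.append(row)
    return matrix


sample = io.StringIO(
    "# Chunk Coord1 Ncount density/number\n"
    "4010000 14 1500\n"
    "  1 4.323 138.758 0.00167105\n"
)
print(parse_numeric_lines(sample))  # [[1.0, 4.323, 138.758, 0.00167105]]
```

One caveat of this approach: float() also accepts words such as "nan" and "inf", so a line made only of those would be kept.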
