1

Hi, I have such array in my .txt file:

n|vechicle.car.characteristics[0].speed|180
n|vechicle.car.characteristics[0].weight|3
c|vechicle.car.characteristics[0].color|black
c|vechicle.car.characteristics[0].fuel|95
n|vechicle.car.characteristics[1].speed|160
n|vechicle.car.characteristics[1].weight|4
c|vechicle.car.characteristics[1].color|green
c|vechicle.car.characteristics[1].fuel|92
n|vechicle.car.characteristics[2].speed|200
n|vechicle.car.characteristics[2].weight|5
c|vechicle.car.characteristics[2].color|white
c|vechicle.car.characteristics[2].fuel|95

And I'd like to parse it into such dataFrame:

  speed weight  color fuel
0  180      3  black   95
1  160      4  green   92
2  200      5  white   95

That's, how i solved it:

import re
import pandas as pd

df_output_list  = {}
df_output_dict  = []
match_counter = 1

with open('sample_car.txt',encoding='utf-8') as file:
    line = file.readline()
    while line:

        result = re.split(r'\|',line.rstrip())
        result2 = re.findall(r'.(?<=\[)(\d+)(?=\])',result[1])

        regex = re.compile('vechicle.car.characteristics.')
        match = re.search(regex, result[1])
        if match:

            if match_counter == 1:
                ArrInd = 0
            match_counter+=1
            #print(df_output_list)
            if ArrInd == int(result2[0]):
                df_output_list[result[1].split('.')[3]] = result[2]
                ArrInd = int(result2[0])

            else:
                df_output_dict.append(df_output_list)
                df_output_list  = {}
                df_output_list[result[1].split('.')[3]] = result[2]
                ArrInd = int(result2[0])

        line = file.readline()
    df_output_dict.append(df_output_list)
#print(df_output_dict)
df_output = pd.DataFrame(df_output_dict)
print(df_output)

And i found it so complicated. Is it possible to simplify it?
Column names should be parsed automatically.

1 Answer 1

1

Read as csv file with sep='|' then get last column which contain values and then reshape in appropriate shape.

>>> columns=['speed','weight','color','fuel']
>>> s = pd.read_csv('filename.txt', sep='|', header=None).iloc[:,-1]
>>> df = pd.DataFrame(s.to_numpy().reshape(-1,4), columns=columns)
>>> df
  speed weight  color fuel
0   180      3  black   95
1   160      4  green   92
2   200      5  white   95

If you have fix row formate like n|vechicle.car.characteristics[0].speed|180 then we can do this

>>> df = pd.read_csv('d.csv', sep='|', header=None)
>>> columns = df.iloc[:,1].str.split('.').str[-1].unique()
>>> df_out = pd.DataFrame(df.iloc[:,-1].to_numpy().reshape(-1,len(columns)), columns=columns)
>>> df_out
  speed weight  color fuel
0   180      3  black   95
1   160      4  green   92
2   200      5  white   95
Sign up to request clarification or add additional context in comments.

1 Comment

Dishin H Goyani, looks good. But i need columns to be parsed automatically. For example, if i add new attribute to array , code should work without changes. Thanks.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.