s = """
ID# VALUE_1 VALUE_2
  1      0.1          300
  2   0.2             400 (11 - this text is part of C in row 2 but needs to be ignored / removed)
  3          0.9          600"""

I want to convert the string above into the dictionary shown below. The irregular spacing and the extra text are there on purpose; both should be stripped from the result.

print(my_dict)
{'1': {'VALUE_1': '0.1', 'VALUE_2': '300'}, '2': {'VALUE_1': '0.2', 'VALUE_2': '400'}, '3': {'VALUE_1': '0.9', 'VALUE_2': '600'}}

What I've tried so far:

s = """
ID# VALUE_1 VALUE_2
  1      0.1          300
  2   0.2             400 (11 - this text is part of C in row 2 but needs to be ignored / removed)
  3          0.9          600"""

# Get the columns and assign them to a variable.
columns = s.lstrip().splitlines()[0]  # first line of the string (the header)

dct = {}

rows = s.lstrip().splitlines()

for data in rows[1:]:
    row = data.split()
    dct[row[0]] = dict(zip(columns[1:], row[1:]))

print(dct)

This ends up producing a malformed dictionary:

{'1': {'D': '0.1', '#': '300'}, '2': {'D': '0.2', '#': '400', ' ': 'in', 'V': 'row', 'A': '2', 'L': 'but', 'U': 'needs', 'E': 'to', '_': 'be', '1': 'C', '2': 'ignored'}, '3': {'D': '0.9', '#': '600'}}

I haven't been able to strip out the extra spaces and the extra chunk of text on row 2 with my current loop.

2 Answers


A regex solution seems neater to me:

>>> import re
>>> from pprint import pprint
>>> pprint([{i[0]:{'VALUE_1': i[1], 'VALUE_2': i[2]}}
...     for i in re.findall(r'^\s*(\d+)\s+(\S+)\s+(\d+)', s, re.M)])
[{'1': {'VALUE_1': '0.1', 'VALUE_2': '300'}},
 {'2': {'VALUE_1': '0.2', 'VALUE_2': '400'}},
 {'3': {'VALUE_1': '0.9', 'VALUE_2': '600'}}]

See how the regex works here
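If a single dictionary keyed by ID is wanted, as in the question, the same pattern can feed a dict comprehension instead of a list. This is only a minimal sketch: the name pattern is introduced here for illustration, and it assumes every data row starts with an integer ID followed by two values.

```python
import re

s = """
ID# VALUE_1 VALUE_2
  1      0.1          300
  2   0.2             400 (11 - this text is part of C in row 2 but needs to be ignored / removed)
  3          0.9          600"""

# Three captures per data row: ID, first value, second value.
# Anything after the third capture (the note on row 2) is simply not matched.
pattern = re.compile(r'^\s*(\d+)\s+(\S+)\s+(\d+)', re.M)

# Build one dict keyed by ID rather than a list of one-item dicts.
my_dict = {
    row_id: {'VALUE_1': v1, 'VALUE_2': v2}
    for row_id, v1, v2 in pattern.findall(s)
}
print(my_dict)
```

The header line is skipped automatically because it does not begin with a digit.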


2 Comments

That's awesome! I was learning some regex the other night to see if it was a viable option. Did you rattle this off from memory or do you have a tool that helps you?
regex101.com is a great tool! Check my update. Cheers @IanSmith

There's a small bug in your code.

columns = s.lstrip().splitlines()[0]

returns a string, not a list, so columns[1:] slices off single characters instead of column names. Use:

columns = s.lstrip().splitlines()[0].split()

After this change your code should run fine. Because zip() stops at the shorter of its inputs, the extra words on row 2 are dropped automatically.

To improve on it further, you don't need to compute columns from s separately. Take it from the rows you already have: rows[0].split().
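Putting the fix together, the loop from the question might look like the sketch below. It keeps the question's variable names and takes the header from rows[0].split(); it assumes the ID is always the first whitespace-separated token on each row and that the two values always come right after it.

```python
s = """
ID# VALUE_1 VALUE_2
  1      0.1          300
  2   0.2             400 (11 - this text is part of C in row 2 but needs to be ignored / removed)
  3          0.9          600"""

# Strip and split the big string only once.
rows = s.strip().splitlines()

# Split the header into a list of names: ['ID#', 'VALUE_1', 'VALUE_2'].
columns = rows[0].split()

dct = {}
for data in rows[1:]:
    row = data.split()  # split() with no argument collapses runs of spaces
    # zip() stops after the two column names, so the trailing
    # words on row 2 never make it into the dictionary.
    dct[row[0]] = dict(zip(columns[1:], row[1:]))

print(dct)
```

Note that this relies on zip() truncating; if a row had fewer than two values, the missing column would be silently left out rather than raising an error.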

6 Comments

And they could just use rows[0]; move the columns assignment a few lines lower and avoid stripping and splitting twice.
@MartijnPieters Could you please clarify?
@IanSmith: Look at the work done for rows and for columns. You can use columns = rows[0].split() instead, avoiding stripping and splitting the larger s string.
Yes, @MartijnPieters is right. You see, columns is just rows[0] so why keep it twice?
@IanSmith: exactly like that.