Given this string input, how can I produce this output?

Question

s = """
ID# VALUE_1 VALUE_2
  1      0.1          300
  2   0.2             400 (11 - this text is part of C in row 2 but needs to be ignored / removed)
  3          0.9          600"""

I want to convert the string above into the dictionary below. The irregular spacing and the extra text are there on purpose; both should be stripped out.

print(my_dict)
{'1': {'VALUE_1': '0.1', 'VALUE_2': '300'}, '2': {'VALUE_1': '0.2', 'VALUE_2': '400'}, '3': {'VALUE_1': '0.9', 'VALUE_2': '600'}}

What I've tried so far:

s = """
ID# VALUE_1 VALUE_2
  1      0.1          300
  2   0.2             400 (11 - this text is part of C in row 2 but needs to be ignored / removed)
  3          0.9          600"""

#Get the columns and assign them to a variable.
columns = s.lstrip().splitlines()[0] #Print the first line of the string

dct = {}

rows = s.lstrip().splitlines()

for data in rows[1:]:
    row = data.split()
    dct[row[0]] = dict(zip(columns[1:], row[1:]))

print(dct)

This ends up printing a garbled dictionary:

{'1': {'D': '0.1', '#': '300'}, '2': {'D': '0.2', '#': '400', ' ': 'in', 'V': 'row', 'A': '2', 'L': 'but', 'U': 'needs', 'E': 'to', '_': 'be', '1': 'C', '2': 'ignored'}, '3': {'D': '0.9', '#': '600'}}

I've been unable to find a way to strip out the spaces and the extra chunk of data on row 2 within my current loop.

Juan Diego Godoy Robles · Accepted Answer · 2017-03-30 08:09:37Z

2

A regex solution seems neater to me:

>>> import re
>>> from pprint import pprint
>>> pprint([{i[0]:{'VALUE_1': i[1], 'VALUE_2': i[2]}}
...     for i in re.findall(r'^\s*(\d+)\s+(\S+)\s+(\d+)', s, re.M)])
[{'1': {'VALUE_1': '0.1', 'VALUE_2': '300'}},
 {'2': {'VALUE_1': '0.2', 'VALUE_2': '400'}},
 {'3': {'VALUE_1': '0.9', 'VALUE_2': '600'}}]

You can check how the regex works in a tester such as regex101.com.
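Note that the snippet above prints a list of single-key dictionaries, while the question asks for one nested dictionary. A minimal sketch of the same regex feeding a dict comprehension follows; the names `ROW` and `my_dict` are chosen here for illustration, and the pattern assumes every ID and VALUE_2 is a plain integer.

```python
import re

s = """
ID# VALUE_1 VALUE_2
  1      0.1          300
  2   0.2             400 (11 - this text is part of C in row 2 but needs to be ignored / removed)
  3          0.9          600"""

# One data row: optional indent, integer ID, VALUE_1, integer VALUE_2.
# The header never matches because it starts with "ID#" rather than digits,
# and anything after VALUE_2 on a line is simply not captured.
ROW = re.compile(r'^\s*(\d+)\s+(\S+)\s+(\d+)', re.M)

my_dict = {
    row_id: {'VALUE_1': v1, 'VALUE_2': v2}
    for row_id, v1, v2 in ROW.findall(s)
}
print(my_dict)
```

The column names are hard-coded here, as in the answer; if they can vary, they would have to be read from the header line instead.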

edited Mar 30, 2017 at 8:09

answered Mar 30, 2017 at 7:14


2 Comments

Ian Smith Over a year ago

That's awesome! I was learning some regex the other night to see if it was a viable option. Did you rattle this off by memory or do you have a tool that helps you?

Juan Diego Godoy Robles Over a year ago

regex101.com is a great tool! check my update. Cheers @IanSmith

alDiablo · Accepted Answer · 2017-03-30 07:13:31Z

1

There's a small bug in your code.

columns = s.lstrip().splitlines()[0]

does not give a list. Use:

columns = s.lstrip().splitlines()[0].split()

After making this modification, your code should run fine.

As a further improvement, you don't need to strip and split s a second time to get the columns. Take them from rows[0] instead, e.g. columns = rows[0].split().
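For reference, here is a sketch of the question's loop with that one change applied, reusing the asker's variable names. It also shows why the extra text on row 2 disappears without any special handling: `zip()` stops at the shorter of its inputs, so only the first two values after the ID are paired with column names.

```python
s = """
ID# VALUE_1 VALUE_2
  1      0.1          300
  2   0.2             400 (11 - this text is part of C in row 2 but needs to be ignored / removed)
  3          0.9          600"""

rows = s.lstrip().splitlines()

# Split the header into a list of names; without .split() it is a plain
# string, and zip() would iterate over its individual characters.
columns = rows[0].split()  # ['ID#', 'VALUE_1', 'VALUE_2']

dct = {}
for data in rows[1:]:
    row = data.split()  # split() with no argument collapses runs of whitespace
    # zip() truncates to the two column names, dropping any trailing words
    dct[row[0]] = dict(zip(columns[1:], row[1:]))

print(dct)
```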

edited Mar 30, 2017 at 7:13

answered Mar 30, 2017 at 7:04


6 Comments

Martijn Pieters Over a year ago

And they could just use rows[0]; move the columns assignment a few lines lower and avoid stripping and splitting twice.

Ian Smith Over a year ago

@MartijnPieters Could you please clarify?

Martijn Pieters Over a year ago

@IanSmith: Look at the work done for rows and for columns. You can use columns = rows[0].split() instead, avoiding stripping and splitting the larger s string.

alDiablo Over a year ago

Yes, @MartijnPieters is right. You see, columns is just rows[0] so why keep it twice?

Martijn Pieters Over a year ago

@IanSmith: exactly like that.
