Extract data from a complicated data structure in Python

Question

I have a data structure like this:

[ {'uid': 'test_subject145', 'class':'?',  'data':[  {'chunk':1, 'writing':[ ['this is exciting'],[ 'you are good' ]... ]}  ]  },
  {'uid': 'test_subject166', 'class':'?',  'data':[  {'chunk':2, 'writing':[ ['he died'],[ 'go ahead' ]... ]}  ] }, ...]

It is a list containing many dictionaries, each with three pairs: 'uid': 'test_subject145', 'class': '?', and 'data': []. The value of the last pair, 'data', is a list that in turn contains a dictionary with two pairs, 'chunk': 1 and 'writing': []. The value of 'writing' is a list that itself contains many lists. I want to extract the content of all the sentences, such as 'this is exciting' and 'you are good', and put them into a simple flat list. Its final form should be list_final = ['this is exciting', 'you are good', 'he died',... ]

Possible duplicate of python getting a list of value from list of dict
– groenhen, Commented Mar 24, 2017 at 13:19

willeM_ Van Onsem · Accepted Answer · 2017-03-24 13:24:15Z

3

Assuming your original list is named input (a name that shadows Python's built-in input function), you can simply use a list comprehension:

[elem for dic in input
      for dat in dic.get('data',())
      for writing in dat.get('writing',())
      for elem in writing]

Using .get(.., ()) means the comprehension still works when a key is missing: .get then returns the empty tuple (), so the corresponding loop performs no iterations.

Based on your sample input, we get:

>>> input = [ {'uid': 'test_subject145', 'class':'?',  'data':[  {'chunk':1, 'writing':[ ['this is exciting'],[ 'you are good' ]]}  ]  },
...       {'uid': 'test_subject166', 'class':'?',  'data':[  {'chunk':2, 'writing':[ ['he died'],[ 'go ahead' ] ]}  ] }]
>>> 
>>> [elem for dic in input
...       for dat in dic.get('data',())
...       for writing in dat.get('writing',())
...       for elem in writing]
['this is exciting', 'you are good', 'he died', 'go ahead']
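
The missing-key behavior of .get(.., ()) described above can be illustrated with a minimal sketch. The records list and its incomplete entries are made up for illustration and are not part of the question's data.

```python
# Hypothetical records: the second has no 'data' key, and the third has a
# chunk without a 'writing' key.
records = [
    {'uid': 'a', 'data': [{'chunk': 1,
                           'writing': [['this is exciting'], ['you are good']]}]},
    {'uid': 'b'},
    {'uid': 'c', 'data': [{'chunk': 3}]},
]

# .get(key, ()) falls back to an empty tuple, so the inner loops simply
# run zero times for the incomplete records.
sentences = [elem
             for dic in records
             for dat in dic.get('data', ())
             for writing in dat.get('writing', ())
             for elem in writing]

print(sentences)  # ['this is exciting', 'you are good']

# Plain indexing raises KeyError on the same input.
try:
    [elem
     for dic in records
     for dat in dic['data']
     for writing in dat['writing']
     for elem in writing]
    indexing_failed = False
except KeyError:
    indexing_failed = True

print(indexing_failed)  # True
```

Whether silently skipping incomplete records is desirable depends on the data; if a missing key signals a real problem, plain indexing makes that visible.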

edited Mar 24, 2017 at 13:24

answered Mar 24, 2017 at 13:11

willeM_ Van Onsem


1 Comment

Harvey Over a year ago

++ for the missing key idea using .get(..., ())

Harvey · Accepted Answer · 2017-03-24 13:30:19Z

tl;dr

[s for dic in data
   for data_dict in dic['data']
   for writing_sub_list in data_dict['writing']
   for s in writing_sub_list]

Just go slow and do one layer at a time. Then refactor your code to make it smaller.

data = [{'class': '?',
         'data': [{'chunk': 1,
                   'writing': [['this is exciting'], ['you are good']]}],
         'uid': 'test_subject145'},
        {'class': '?',
         'data': [{'chunk': 2,
                   'writing': [['he died'], ['go ahead']]}],
         'uid': 'test_subject166'}]

for d in data:
    print(d)
# {'class': '?', 'uid': 'test_subject145', 'data': [{'writing': [['this is exciting'], ['you are good']], 'chunk': 1}]}
# {'class': '?', 'uid': 'test_subject166', 'data': [{'writing': [['he died'], ['go ahead']], 'chunk': 2}]}

for d in data:
    data_list = d['data']
    print(data_list)
# [{'writing': [['this is exciting'], ['you are good']], 'chunk': 1}]
# [{'writing': [['he died'], ['go ahead']], 'chunk': 2}]

for d in data:
    data_list = d['data']
    for d2 in data_list:
        print(d2)
# {'writing': [['this is exciting'], ['you are good']], 'chunk': 1}
# {'writing': [['he died'], ['go ahead']], 'chunk': 2}

for d in data:
    data_list = d['data']
    for d2 in data_list:
        writing_list = d2['writing']
        print(writing_list)
# [['this is exciting'], ['you are good']]
# [['he died'], ['go ahead']]

for d in data:
    data_list = d['data']
    for d2 in data_list:
        writing_list = d2['writing']
        for writing_sub_list in writing_list:
            print(writing_sub_list)
# ['this is exciting']
# ['you are good']
# ['he died']
# ['go ahead']

for d in data:
    data_list = d['data']
    for d2 in data_list:
        writing_list = d2['writing']
        for writing_sub_list in writing_list:
            # Named s rather than str to avoid shadowing the built-in str type.
            for s in writing_sub_list:
                print(s)
# this is exciting
# you are good
# he died
# go ahead

Then to convert to something smaller (but hard to read), rewrite the above code like this. It should be easy to see how to go from one to the other:

strings = [s for d in data for d2 in d['data'] for wsl in d2['writing'] for s in wsl]
# ['this is exciting', 'you are good', 'he died', 'go ahead']

Then, make it pretty with better names like Willem's answer:

[s for dic in data
   for data_dict in dic['data']
   for writing_sub_list in data_dict['writing']
   for s in writing_sub_list]
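
As an alternative to nesting the loops, the same layer-by-layer idea can be sketched with itertools.chain.from_iterable from the standard library. This is not what either answer uses; it is only a possible variant, and the intermediate names chunks and sub_lists are chosen here for illustration.

```python
from itertools import chain

data = [
    {'uid': 'test_subject145', 'class': '?',
     'data': [{'chunk': 1, 'writing': [['this is exciting'], ['you are good']]}]},
    {'uid': 'test_subject166', 'class': '?',
     'data': [{'chunk': 2, 'writing': [['he died'], ['go ahead']]}]},
]

# Each chain.from_iterable call removes exactly one level of nesting.
chunks = chain.from_iterable(d['data'] for d in data)          # chunk dicts
sub_lists = chain.from_iterable(c['writing'] for c in chunks)  # one-item lists
strings = list(chain.from_iterable(sub_lists))                 # plain strings

print(strings)
# ['this is exciting', 'you are good', 'he died', 'go ahead']
```

Because the intermediate steps are lazy iterators, nothing is materialized until the final list call, which may matter for large inputs.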

Community · Accepted Answer · 2017-05-23 11:54:10Z

1

I believe the code below will work:

lista = [ {'uid': 'test_subject145', 'class':'?',  'data':[  {'chunk':1, 'writing':[ ['this is exciting'],[ 'you are good' ]... ]}  ]  },
          {'uid': 'test_subject166', 'class':'?',  'data':[  {'chunk':2, 'writing':[ ['he died'],[ 'go ahead' ]... ]}  ] }, ...]

list_of_final_products = []

for itema in lista:
  try:
    for data_item in itema['data']:
      for writa in data_item['writing']:
        for writa_itema in writa:
          list_of_final_products.append(writa_itema)
  except:
    pass

I believe the question referenced above is helpful for understanding this: python getting a list of value from list of dict (thank you to McGrady)

edited May 23, 2017 at 11:54

answered Mar 24, 2017 at 13:12

A. N. Other

6 Comments

willeM_ Van Onsem Over a year ago

Note that the elements in writing are also lists... So are the elements in 'data':...

A. N. Other Over a year ago

Added. Thank you - I hadn't seen that

willeM_ Van Onsem Over a year ago

I think it is still not valid, since itema['data'] itself is a list. So you need to iterate over that, not get a key.

willeM_ Van Onsem Over a year ago

I think it is valid now, although it is advisable not to use a blanket except. +1.

willeM_ Van Onsem Over a year ago

Well, in this case it will not matter. But say you do something like .append(some_function(x)) and some_function can raise some unexpected error: you do not always want to catch that (at that place). So a lot of software engineers advise never catching all exceptions, only an explicitly stated list of them.
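
The advice in the comments above, to catch only explicitly listed exceptions, might look like the following sketch. The function name extract_sentences and the malformed sample records are hypothetical, and the choice of KeyError and TypeError is an assumption about which failures malformed records would produce.

```python
def extract_sentences(records):
    """Collect every sentence, skipping records that lack the expected shape."""
    sentences = []
    for record in records:
        try:
            for chunk in record['data']:
                for sub_list in chunk['writing']:
                    sentences.extend(sub_list)
        except (KeyError, TypeError):
            # Only the failures expected from malformed records are skipped;
            # any other exception still propagates to the caller.
            continue
    return sentences


records = [
    {'uid': 'test_subject145',
     'data': [{'chunk': 1, 'writing': [['this is exciting'], ['you are good']]}]},
    {'uid': 'no_data_key'},                  # hypothetical: raises KeyError
    {'uid': 'data_is_none', 'data': None},   # hypothetical: raises TypeError
]

result = extract_sentences(records)
print(result)  # ['this is exciting', 'you are good']
```

Note that sentences collected from a record before it fails are kept; if partial results are unwanted, they could be gathered in a temporary list per record first.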
