I have a data structure like this:

[ {'uid': 'test_subject145', 'class':'?',  'data':[  {'chunk':1, 'writing':[ ['this is exciting'],[ 'you are good' ]... ]}  ]  },
  {'uid': 'test_subject166', 'class':'?',  'data':[  {'chunk':2, 'writing':[ ['he died'],[ 'go ahead' ]... ]}  ] }, ...]

It is a list containing many dictionaries, each of which has 3 pairs: 'uid': 'test_subject145', 'class':'?', 'data':[]. In the last pair, 'data', the value is a list that in turn contains a dictionary with 2 pairs: 'chunk':1, 'writing':[]. In the pair 'writing', the value is a list that again contains many lists. What I want to extract is the content of all those sentences, like 'this is exciting' and 'you are good', and put them into a simple list. Its final form should be list_final = ['this is exciting', 'you are good', 'he died',... ]

3 Answers

Assuming your original list is named input, simply use a list comprehension:

[elem for dic in input
      for dat in dic.get('data',())
      for writing in dat.get('writing',())
      for elem in writing]

Using .get(..., ()) makes this work even when a key is missing: in that case .get returns the empty tuple (), so the corresponding loop performs no iterations.

Based on your sample input, we get:

>>> input = [ {'uid': 'test_subject145', 'class':'?',  'data':[  {'chunk':1, 'writing':[ ['this is exciting'],[ 'you are good' ]]}  ]  },
...       {'uid': 'test_subject166', 'class':'?',  'data':[  {'chunk':2, 'writing':[ ['he died'],[ 'go ahead' ] ]}  ] }]
>>> 
>>> [elem for dic in input
...       for dat in dic.get('data',())
...       for writing in dat.get('writing',())
...       for elem in writing]
['this is exciting', 'you are good', 'he died', 'go ahead']
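
To illustrate the .get(..., ()) fallback described above, here is a minimal sketch in which one record has no 'data' key and one chunk has no 'writing' key. The names records and flat, and the two incomplete sample records, are assumptions made for illustration and are not part of the question's data.

```python
# Hypothetical sample: the second record lacks 'data', and the third
# record's only chunk lacks 'writing'.
records = [
    {'uid': 'test_subject145', 'class': '?',
     'data': [{'chunk': 1, 'writing': [['this is exciting'], ['you are good']]}]},
    {'uid': 'no_data_key', 'class': '?'},
    {'uid': 'no_writing_key', 'class': '?', 'data': [{'chunk': 3}]},
]

# .get(key, ()) returns an empty tuple for a missing key, so the
# corresponding loop runs zero times instead of raising KeyError.
flat = [elem for dic in records
        for dat in dic.get('data', ())
        for writing in dat.get('writing', ())
        for elem in writing]

print(flat)  # ['this is exciting', 'you are good']
```

Note that .get only covers missing keys; a key that is present but holds None would still raise a TypeError when iterated.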

1 Comment

++ for the missing key idea using .get(..., ())

tl;dr

[s for dic in data
   for data_dict in dic['data']
   for writing_sub_list in data_dict['writing']
   for s in writing_sub_list]

Just go slow and do one layer at a time. Then refactor your code to make it smaller.

data = [{'class': '?',
         'data': [{'chunk': 1,
                   'writing': [['this is exciting'], ['you are good']]}],
         'uid': 'test_subject145'},
        {'class': '?',
         'data': [{'chunk': 2,
                   'writing': [['he died'], ['go ahead']]}],
         'uid': 'test_subject166'}]

for d in data:
    print(d)
# {'class': '?', 'uid': 'test_subject145', 'data': [{'writing': [['this is exciting'], ['you are good']], 'chunk': 1}]}
# {'class': '?', 'uid': 'test_subject166', 'data': [{'writing': [['he died'], ['go ahead']], 'chunk': 2}]}

for d in data:
     data_list = d['data']
     print(data_list)
# [{'writing': [['this is exciting'], ['you are good']], 'chunk': 1}]
# [{'writing': [['he died'], ['go ahead']], 'chunk': 2}]

for d in data:
     data_list = d['data']
     for d2 in data_list:
         print(d2)
# {'writing': [['this is exciting'], ['you are good']], 'chunk': 1}
# {'writing': [['he died'], ['go ahead']], 'chunk': 2}

for d in data:
     data_list = d['data']
     for d2 in data_list:
         writing_list = d2['writing']
         print(writing_list)
# [['this is exciting'], ['you are good']]
# [['he died'], ['go ahead']]

for d in data:
     data_list = d['data']
     for d2 in data_list:
         writing_list = d2['writing']
         for writing_sub_list in writing_list:
             print(writing_sub_list)
# ['this is exciting']
# ['you are good']
# ['he died']
# ['go ahead']

for d in data:
     data_list = d['data']
     for d2 in data_list:
         writing_list = d2['writing']
         for writing_sub_list in writing_list:
             for s in writing_sub_list:
                 print(s)
# this is exciting
# you are good
# he died
# go ahead

Then to convert to something smaller (but hard to read), rewrite the above code like this. It should be easy to see how to go from one to the other:

strings = [s for d in data for d2 in d['data'] for wsl in d2['writing'] for s in wsl]
# ['this is exciting', 'you are good', 'he died', 'go ahead']

Then make it prettier with better names, as in Willem's answer:

[s for dic in data
   for data_dict in dic['data']
   for writing_sub_list in data_dict['writing']
   for s in writing_sub_list]
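
If the extraction is needed in more than one place, the same four nested loops could also be wrapped in a generator function. This is only a sketch of one possible refactoring; the function name iter_writing_strings is an assumption and does not come from the question.

```python
def iter_writing_strings(records):
    """Yield every string found under record['data'][i]['writing'][j]."""
    for record in records:
        for data_dict in record['data']:
            for writing_sub_list in data_dict['writing']:
                # Each sub-list holds one or more strings.
                yield from writing_sub_list


data = [{'class': '?',
         'data': [{'chunk': 1,
                   'writing': [['this is exciting'], ['you are good']]}],
         'uid': 'test_subject145'},
        {'class': '?',
         'data': [{'chunk': 2,
                   'writing': [['he died'], ['go ahead']]}],
         'uid': 'test_subject166'}]

strings = list(iter_writing_strings(data))
print(strings)  # ['this is exciting', 'you are good', 'he died', 'go ahead']
```

The generator keeps the one-layer-at-a-time structure of the loops above while still producing a flat list when passed to list().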

I believe the code below will work:

lista = [ {'uid': 'test_subject145', 'class':'?',  'data':[  {'chunk':1, 'writing':[ ['this is exciting'],[ 'you are good' ]... ]}  ]  },
          {'uid': 'test_subject166', 'class':'?',  'data':[  {'chunk':2, 'writing':[ ['he died'],[ 'go ahead' ]... ]}  ] }, ...]

list_of_final_products = []

for itema in lista:
  try:
    for data_item in itema['data']:
      for writa in data_item['writing']:
        for writa_itema in writa:
          list_of_final_products.append(writa_itema)
  except:
    pass

This related question is, I believe, helpful for understanding the approach: "python getting a list of value from list of dict" (thank you to McGrady).

6 Comments

Note that the elements in writing are also lists... So are the elements in 'data':...
Added. Thank you - I hadn't seen that
I think it is still not valid, since itema['data'] itself is a list. So you need to iterate over that, not get a key.
I think now it is valid, although it is advisable not to use a blanket except. +1.
Well, in this case it will not matter. But say you do something like .append(some_function(x)) and some_function can raise some weird error: you do not always want to catch that (at that place). So a lot of software engineers advise never catching all exceptions, only a list of explicitly stated ones.
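
The advice in the comment above, catching only explicitly stated exceptions, could look roughly like this for the loop in this answer. It is a sketch under assumptions: the choice of KeyError and TypeError, and the 'broken_record' entry, are illustrative and not taken from the original post.

```python
lista = [
    {'uid': 'test_subject145', 'class': '?',
     'data': [{'chunk': 1, 'writing': [['this is exciting'], ['you are good']]}]},
    {'uid': 'broken_record', 'class': '?'},  # hypothetical record without 'data'
]

list_of_final_products = []

for itema in lista:
    try:
        for data_item in itema['data']:
            for writa in data_item['writing']:
                for writa_itema in writa:
                    list_of_final_products.append(writa_itema)
    except (KeyError, TypeError):
        # Skip only records with a missing key (KeyError) or a value of
        # the wrong type (TypeError); any other exception still propagates.
        continue

print(list_of_final_products)  # ['this is exciting', 'you are good']
```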