I have a JSON file of the following form:

 {'query': {'tool': 'domainquery', 'query': 'example.org'},
 'response': {'result_count': '1',
  'total_pages': '1',
  'current_page': '1',
  'matches': [{'domain': 'example2.org',
    'created_date': '2015-07-25',
    'registrar': 'registrar_10'}]}}

I have a list of the following form:

removal_list=["example2.org","example3.org"...]

I am trying to loop through removal_list and remove every instance of each item from the JSON file. The problem is how long this takes: removal_list contains 110,000 items. I tried to speed it up with set() and isdisjoint, but that does not seem to make it any faster.

The code I currently have to do this is:

    removal_list= set(removal_list)
    for domain in removal_list:
        for i in range(len(JSON_file)):
            if int(JSON_file[i]['response']['result_count'])>0:  
                for j in range(len(JSON_file[i]['response']['matches'])):
                    for item in JSON_file[i]['response']['matches'][j]['domain']:
                        if not remove_set.isdisjoint(JSON_file[i]['response']['matches'][j]['domain']):
                            del(JSON_file[i]['response']['matches'][j]['domain'])
                        else: 
                            pass

Does anyone have any suggestions on how to speed this process up? Thanks in advance.

  • I suggest using binary chop? It might help. stackoverflow.com/questions/9501337/… Commented May 23, 2022 at 18:39
  • Try hoisting common sub-expressions; for example, save JSON_file[i]['response'] in a variable, and use it wherever you use that expression. Commented May 23, 2022 at 18:39
  • @2wen: Doesn't that require sorted lists? Commented May 23, 2022 at 18:41
  • Are you saying that any value in removal_list, when observed as a value in a dictionary anywhere in the main dictionary, has to be removed? I think you could make this clearer by showing an input data structure and the expected output structure. As it stands, your code looks remarkably convoluted and may not need to be that complex. Commented May 23, 2022 at 18:54

1 Answer

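The looping in the question is 'inverted': it walks the whole of JSON_File once for every entry in removal_list. Instead, JSON_File (which is clearly a list of dictionaries) should be enumerated once, and each dictionary in its 'matches' list checked to see whether its domain is in removal_list. With removal_list held as a set, each of those checks is a constant-time lookup on average.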

Let's have just two dictionaries in the JSON_File list and then show the code to process them.

removal_list = {"example2.org", "example3.org"}

d1 = {'query': {'tool': 'domainquery', 'query': 'example.org'},
     'response': {'result_count': '1',
                  'total_pages': '1',
                  'current_page': '1',
                  'matches': [{'domain': 'example2.org',
                               'created_date': '2015-07-25',
                               'registrar': 'registrar_10'}]}}
d2 = {'query': {'tool': 'domainquery', 'query': 'example.org'},
     'response': {'result_count': '1',
                  'total_pages': '1',
                  'current_page': '1',
                  'matches': [{'domain': 'example3.org',
                               'created_date': '2015-07-25',
                               'registrar': 'registrar_10'}]}}

JSON_File = [d1, d2]

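for j in JSON_File:
    # .get() returns None when there is no 'matches' key, so that entry is skipped
    if matches := j['response'].get('matches'):
        for match in matches:
            # removal_list is a set here, so this is a hash lookup
            if match.get('domain') in removal_list:
                del match['domain']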

print(JSON_File)
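
The question does not say whether only the 'domain' key or the whole match record should go. If it is the whole record (an assumption), a minimal sketch could filter each 'matches' list in place. The names `drop_matches` and `records` are hypothetical and do not come from the question.

```python
def drop_matches(records, removal_set):
    """Remove every match whose 'domain' is in removal_set, in place.

    Assumes each record is shaped like the dictionaries above; records
    without a 'matches' list are left untouched.
    """
    for record in records:
        matches = record.get('response', {}).get('matches')
        if matches:
            # Slice assignment keeps the same list object, so any other
            # reference to it sees the filtered result as well.
            matches[:] = [m for m in matches
                          if m.get('domain') not in removal_set]
    return records


records = [
    {'response': {'result_count': '2',
                  'matches': [{'domain': 'example2.org'},
                              {'domain': 'keep.org'}]}},
    {'response': {'result_count': '0'}},
]
drop_matches(records, {'example2.org', 'example3.org'})
print(records[0]['response']['matches'])  # [{'domain': 'keep.org'}]
```

This sketch does not update 'result_count', so that field may no longer agree with the length of 'matches' afterwards.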

Assumption:

If result_count is non-zero, there will be a non-empty 'matches' list, so there is no need to examine 'result_count' explicitly.
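
The assumption above could be checked against the real data before relying on it. The following is only a sketch: `inconsistent_records` is a hypothetical helper, and it compares zero versus non-zero rather than exact counts, since the data appears to be paginated.

```python
def inconsistent_records(records):
    """Return the indexes of records where result_count and 'matches' disagree."""
    bad = []
    for index, record in enumerate(records):
        response = record.get('response', {})
        # result_count is stored as a string in the sample data
        count = int(response.get('result_count', 0))
        found = len(response.get('matches', []))
        # A non-zero count with no matches (or the reverse) breaks the assumption
        if (count > 0) != (found > 0):
            bad.append(index)
    return bad


sample = [
    {'response': {'result_count': '1', 'matches': [{'domain': 'example2.org'}]}},
    {'response': {'result_count': '0'}},
    {'response': {'result_count': '3', 'matches': []}},
]
print(inconsistent_records(sample))  # [2]
```

An empty result means the shortcut of testing only 'matches' is safe for that data set.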

Note:

Requires Python 3.8+ (the code uses the := assignment expression).
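
On an older interpreter, the same loop can be written with an ordinary assignment in place of the assignment expression. This is a sketch of that variant; `removal_set` and `json_records` are illustrative names, and the sample data is trimmed from the example above.

```python
removal_set = {"example2.org", "example3.org"}

json_records = [
    {'response': {'result_count': '1',
                  'matches': [{'domain': 'example2.org',
                               'created_date': '2015-07-25',
                               'registrar': 'registrar_10'}]}},
    {'response': {'result_count': '0'}},
]

for record in json_records:
    # Plain assignment followed by a truth test replaces the := form
    matches = record['response'].get('matches')
    if matches:
        for match in matches:
            if match.get('domain') in removal_set:
                del match['domain']

print(json_records[0]['response']['matches'])
# [{'created_date': '2015-07-25', 'registrar': 'registrar_10'}]
```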
