
I'm using Python to hit a Foreman API to gather some facts about all the hosts that Foreman knows about. Unfortunately, there is no get-all-hosts-facts endpoint (or anything similar) in the v1 Foreman API, so I'm having to loop through all the hosts and get the information. Doing so has led me to an annoying problem. Each call for a given host returns a JSON object like so:

{
  "host1.com": {
    "apt_update_last_success": "1452187711", 
    "architecture": "amd64", 
    "augeasversion": "1.2.0", 
    "bios_release_date": "06/03/2015", 
    "bios_vendor": "Dell Inc."
   }
}

This is totally fine; the issue arises when I append the next host's information. I then get a JSON file that looks something like this:

{
  "host1.com": {
    "apt_update_last_success": "1452187711", 
    "architecture": "amd64", 
    "augeasversion": "1.2.0", 
    "bios_release_date": "06/03/2015", 
    "bios_vendor": "Dell Inc."
}
}{
"host2.com": {
    "apt_update_last_success": "1452703454", 
    "architecture": "amd64", 
    "augeasversion": "1.2.0", 
    "bios_release_date": "06/03/2015", 
    "bios_vendor": "Dell Inc."
   }
}

Here's the code that's doing this:

for i in hosts_data:
    log.info("Gathering host facts for host: {}".format(i['host']['name']))
    try:
        facts = requests.get(foreman_host+api+"hosts/{}/facts".format(i['host']['id']), auth=(username, password))
        if facts.status_code != 200:
            log.error("Unable to connect to Foreman! Got retcode '{}' and error message '{}'"
                      .format(facts.status_code, facts.text))
            sys.exit(1)
    except requests.exceptions.RequestException as e:
        log.error(e)
    facts_data = json.loads(facts.text)
    log.debug(facts_data)
    with open(results_file, 'a') as f:
        f.write(json.dumps(facts_data, sort_keys=True, indent=4))

Here's what I need the file to look like:

{
"host1.com": {
    "apt_update_last_success": "1452187711",
    "architecture": "amd64",
    "augeasversion": "1.2.0",
    "bios_release_date": "06/03/2015",
    "bios_vendor": "Dell Inc."
},
"host2.com": {
    "apt_update_last_success": "1452703454",
    "architecture": "amd64",
    "augeasversion": "1.2.0",
    "bios_release_date": "06/03/2015",
    "bios_vendor": "Dell Inc."
  }
}

3 Answers


It would be better to assemble all of your data into one dict and then write it out once at the end, instead of on every pass through the loop.

d = {}
for i in hosts_data:
    log.info("Gathering host facts for host: {}".format(i['host']['name']))
    try:
        facts = requests.get(foreman_host+api+"hosts/{}/facts".format(i['host']['id']), auth=(username, password))
        if facts.status_code != 200:
            log.error("Unable to connect to Foreman! Got retcode '{}' and error message '{}'"
                      .format(facts.status_code, facts.text))
            sys.exit(1)
    except requests.exceptions.RequestException as e:
        log.error(e)
    facts_data = json.loads(facts.text)
    log.debug(facts_data)
    d.update(facts_data)  # add this host's facts to the combined dict
# write everything at the end
with open(results_file, 'a') as f:
    f.write(json.dumps(d, sort_keys=True, indent=4))

1 Comment

Np. The only thing to be aware of is that dict.update() replaces the content if the key already exists (in your case for example, if hosts_data contained duplicates).
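
As a quick illustration of that dict.update() behaviour:

combined = {"host1.com": {"architecture": "amd64"}}
# a later record for the same hostname silently replaces the earlier one
combined.update({"host1.com": {"architecture": "i386"}})
print(combined)  # {'host1.com': {'architecture': 'i386'}}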

Instead of writing JSON inside the loop, insert the data into a dict with the correct structure. Then write that dict out as JSON when the loop is finished.

This assumes your dataset fits into memory.
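
A minimal sketch of that approach, reusing the names from the question's code (hosts_data, foreman_host, api, username, password and results_file are assumed to be defined as in the question):

all_facts = {}  # one dict keyed by hostname, built up across the loop
for i in hosts_data:
    facts = requests.get(foreman_host + api + "hosts/{}/facts".format(i['host']['id']),
                         auth=(username, password))
    all_facts.update(facts.json())  # merge this host's {"hostname": {...}} object

# single write once the loop is finished
with open(results_file, 'w') as f:
    json.dump(all_facts, f, sort_keys=True, indent=4)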

Comments


For safety/consistency, you need to load in the old data, mutate it, then write it back out.

Change the current with block and write call to:

# 'a+' creates the file if it doesn't exist, but writes in append mode always
# land at the end of the file, so do the actual rewrite through 'r+'
with open(results_file, 'a+'):
    pass  # just ensure the file exists
with open(results_file, 'r+') as f:
    try:
        combined_facts = json.load(f)
    except ValueError:  # file is still empty on the first pass
        combined_facts = {}
    combined_facts.update(facts_data)
    f.seek(0)
    json.dump(combined_facts, f, sort_keys=True, indent=4)
    f.truncate()  # In case new JSON encoding smaller, e.g. due to replaced key

Note: If possible, you want to use pault's answer to minimize unnecessary I/O; this is just how you'd do it if the data retrieval must be done piecemeal, with an immediate update for each item as it becomes available.

FYI, the unsafe way is to basically find the trailing curly brace, delete it, then write out a comma followed by the new JSON (removing the leading curly brace from its JSON representation). It's much less I/O intensive, but it's also less safe, doesn't clean out duplicates, doesn't sort the hosts, doesn't validate the input file at all, etc. So don't do it.

1 Comment

Thanks @ShadowRanger! Good information about the safety and IO implications. Much appreciated!
