Let's say I have a data collection, like a list, containing a set of objects with particular properties. Let's go with animals:
[ cow, sheep, orangutan ]
An animal has a property animal_data that contains taxonomic information, like kingdom, class, family and species. This establishes a hierarchy which says that each property is contained in the previous one, sort of like a linear many-to-one tree.
Now we want to rearrange that collection into a data structure that groups each animal under its own species, family, class and kingdom. We'd end up with something like this:
{
  "kingdoms": [
    {
      "name": "Animalia",
      "classes": [
        {
          "name": "Mammalia",
          "families": [
            {
              "name": "Bovidae",
              "species": [
                {
                  "name": "Bos taurus"
                },
                {
                  "name": "Bovis aries"
                }
              ]
            },
            {
              "name": "Hominidae",
              "species": [
                {
                  "name": "Pongo pygmaeus"
                }
              ]
            }
          ]
        }
      ]
    }
  ]
}
This would be our final data structure. It has to look exactly like this; I know it could be arranged more cleanly, but the format is fixed.
Now, being relatively new to Python, or at least to its functional potential, I tried using list comprehensions, map, groupby and lambdas to achieve that result. However, I couldn't get past the first level of nesting, because each group is dependent on the one at the level above.
So this is my solution instead:
from collections import defaultdict
from itertools import groupby

# group by kingdom
animals_dict = {kingdom: list(animals_by_kingdom) for kingdom, animals_by_kingdom in
                groupby(animals, lambda a: a.animal_data.kingdom)}
grouped_animals = defaultdict(list)
for kingdom, animals_by_kingdom in animals_dict.items():
    # group by class
    classes_dict = {animal_class: list(animals_by_class) for animal_class, animals_by_class in
                    groupby(animals_by_kingdom, lambda a: a.animal_data.animal_class)}
    classes = []
    for animal_class, animals_by_class in classes_dict.items():
        # group by family
        families_dict = {family: list(animals_by_family) for family, animals_by_family in
                         groupby(animals_by_class, lambda a: a.animal_data.family)}
        families = []
        for family, animals_by_family in families_dict.items():
            families.append(
                {"name": family, "species": [{"name": animal.animal_data.species} for animal in animals_by_family]})
        classes.append({"name": animal_class, "families": families})
    grouped_animals["kingdoms"].append({"name": kingdom, "classes": classes})
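One caveat about the snippet above: itertools.groupby only merges consecutive items that share a key, so the approach relies on the animals list already being ordered by kingdom, class and family (as the demo list happens to be). A minimal sketch of the difference, using plain strings as hypothetical stand-in data rather than the real Animal objects:

```python
from itertools import groupby

# Hypothetical stand-in data: family names in an order that is NOT grouped.
families = ["Bovidae", "Hominidae", "Bovidae"]

# Without sorting, the two "Bovidae" entries end up in separate groups.
unsorted_groups = [(key, len(list(group))) for key, group in groupby(families)]

# Sorting first makes equal keys adjacent, so each key yields a single group.
sorted_groups = [(key, len(list(group))) for key, group in groupby(sorted(families))]

print(unsorted_groups)  # [('Bovidae', 1), ('Hominidae', 1), ('Bovidae', 1)]
print(sorted_groups)    # [('Bovidae', 2), ('Hominidae', 1)]
```

With unsorted input, the dict comprehensions above would also silently keep only the last group for a repeated key, so sorting by the same key before calling groupby is probably the safer habit.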
This is the best I could do, but something tells me Python has the potential to let me do this more elegantly, concisely and clearly.
I'd really appreciate it if any of you could give me tips on how to improve my code and how to use Python's tools to do this more properly and clearly (if it can indeed be done better).
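For comparison, the dependency of each group on the level above it can be expressed with recursion instead of hand-written nested loops. The following is only a sketch of one possible direction, not a definitive solution: the names LEVELS and group_level are hypothetical, and the namedtuples are stand-ins for the Animal and AnimalData classes so that the block runs on its own.

```python
from collections import namedtuple
from itertools import groupby

# Hypothetical stand-ins for the Animal / AnimalData classes of the demo.
AnimalData = namedtuple("AnimalData", "kingdom animal_class family species")
Animal = namedtuple("Animal", "animal_data")

# Hierarchy levels, outermost first:
# (attribute on animal_data, key under which the children are stored).
LEVELS = [
    ("kingdom", "classes"),
    ("animal_class", "families"),
    ("family", "species"),
]


def group_level(animals, levels):
    """Group animals by the first level, then recurse into the remaining ones."""
    if not levels:
        # Innermost level: one entry per animal, named after its species.
        return [{"name": a.animal_data.species} for a in animals]
    attr, children_key = levels[0]
    key = lambda a: getattr(a.animal_data, attr)
    # groupby only merges adjacent items, so sort by the same key first.
    return [
        {"name": name, children_key: group_level(list(group), levels[1:])}
        for name, group in groupby(sorted(animals, key=key), key=key)
    ]


# Deliberately unordered input to show that the sort handles it.
animals = [
    Animal(AnimalData("Animalia", "Mammalia", "Bovidae", "Bos taurus")),
    Animal(AnimalData("Animalia", "Mammalia", "Hominidae", "Pongo pygmaeus")),
    Animal(AnimalData("Animalia", "Mammalia", "Bovidae", "Bovis aries")),
]
grouped_animals = {"kingdoms": group_level(animals, LEVELS)}
```

Because the levels are plain data, adding or removing a level would only mean editing LEVELS. Like the loop-based version, this sketch emits one species entry per animal and does not merge duplicates.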
Disclaimers:
- I cannot modify the initial data structure. If the animal_data property seems weird (instead of having the kingdom, class, etc. attached to the animal directly), that's just how it is.
- If you're wondering why I would rearrange the list in such a way, it is so that it can be easily consumed by an endpoint that works better with that format.
In case you need the Animal and AnimalData classes to fiddle with this demo, here they are:
class AnimalData:
    def __init__(self, kingdom, animal_class, family, species):
        # No base class to initialise; a bare super() call would fail on Python 2.
        self.kingdom = kingdom
        self.animal_class = animal_class
        self.family = family
        self.species = species

    def __str__(self, *args, **kwargs):
        return "Kingdom=%s, Class=%s, Family=%s, Species=%s" % (
            self.kingdom, self.animal_class, self.family, self.species)


class Animal:
    def __init__(self, kingdom, animal_class, family, species):
        self.animal_data = AnimalData(kingdom, animal_class, family, species)

    def __str__(self, *args, **kwargs):
        return str(self.animal_data)


cow = Animal("Animalia", "Mammalia", "Bovidae", "Bos taurus")
sheep = Animal("Animalia", "Mammalia", "Bovidae", "Bovis aries")
orangutan = Animal("Animalia", "Mammalia", "Hominidae", "Pongo pygmaeus")
animals = [cow, sheep, orangutan]
Note: The code works in both Python 2 and Python 3.