17

Dataclass example:

from dataclasses import dataclass
from typing import List


@dataclass
class StatusElement:
    status: str
    orderindex: int
    color: str
    type: str


@dataclass
class List:
    id: int
    statuses: List[StatusElement]

JSON example:

json = {
  "id": "124",
  "statuses": [
    {
      "status": "to do",
      "orderindex": 0,
      "color": "#d3d3d3",
      "type": "open"
    }]
}

I can unpack the JSON doing something like this:

object = List(**json)

But I'm not sure how I can also unpack each entry of statuses into a StatusElement object and append it to the statuses list of the List object. I'm sure I need to loop over it somehow, but I'm not sure how to combine that with unpacking.

  • I think I have a better solution. See my answer. Commented Sep 5, 2021 at 14:21
  • I voted your comment up because I think it's an equally good solution (also it is much more lightweight). Commented Sep 5, 2021 at 17:54

5 Answers

25

Python's dataclasses module is great, but one thing it unfortunately doesn't handle is parsing a JSON object into a nested dataclass structure.

A few workarounds exist for this:

  • You can roll your own JSON parsing helper method, for example a from_json which converts a JSON string to a List instance with a nested dataclass.
  • You can make use of existing JSON serialization libraries. For example, pydantic is a popular one that supports this use case.
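
To make the first option concrete, here is a minimal sketch of a hand-rolled helper. The names from_dict, _convert and TaskList are hypothetical (TaskList stands in for the List class so it doesn't clash with typing.List), and the helper assumes plain type annotations (no string/forward references) and only handles nested dataclasses, lists, and simple scalar coercion:

```python
import json
from dataclasses import dataclass, fields, is_dataclass
from typing import List, get_args, get_origin


def from_dict(cls, data):
    """Hypothetical helper: recursively build dataclass `cls` from a dict."""
    kwargs = {
        f.name: _convert(f.type, data[f.name])
        for f in fields(cls)
        if f.name in data  # missing keys fall back to field defaults
    }
    return cls(**kwargs)


def _convert(tp, value):
    # Nested dataclass: recurse into the dict.
    if is_dataclass(tp) and isinstance(value, dict):
        return from_dict(tp, value)
    # List[X] / list[X]: convert every item to X (bare List: copy as-is).
    if get_origin(tp) is list:
        args = get_args(tp)
        return [_convert(args[0], item) for item in value] if args else list(value)
    # Simple scalar coercion, e.g. "124" -> 124 for a field annotated `int`.
    if tp in (int, float, str) and not isinstance(value, tp):
        return tp(value)
    return value


@dataclass
class StatusElement:
    status: str
    orderindex: int
    color: str
    type: str


@dataclass
class TaskList:  # stands in for the question's `List` class
    id: int
    statuses: List[StatusElement]


raw = '{"id": "124", "statuses": [{"status": "to do", "orderindex": 0, "color": "#d3d3d3", "type": "open"}]}'
task_list = from_dict(TaskList, json.loads(raw))
print(task_list)
# TaskList(id=124, statuses=[StatusElement(status='to do', orderindex=0, color='#d3d3d3', type='open')])
```

This is only a sketch: it does not cover Optional, Union, dict-valued fields, or key case transforms, which is where a library starts to pay off.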

Here is an example using the dataclass-wizard library that works well enough for your use case. It's more lightweight than pydantic and coincidentally also a little faster. It also supports automatic case transforms and type conversions (for example, a str value to a field annotated as int).

Example below:

from dataclasses import dataclass
from typing import List as PyList

from dataclass_wizard import JSONWizard


@dataclass
class List(JSONWizard):
    id: int
    statuses: PyList['StatusElement']
    # on Python 3.9+ you can use the following syntax:
    #   statuses: list['StatusElement']


@dataclass
class StatusElement:
    status: str
    order_index: int
    color: str
    type: str


json = {
  "id": "124",
  "statuses": [
    {
      "status": "to do",
      "orderIndex": 0,
      "color": "#d3d3d3",
      "type": "open"
    }]
}


object = List.from_dict(json)

print(repr(object))
# List(id=124, statuses=[StatusElement(status='to do', order_index=0, color='#d3d3d3', type='open')])

Disclaimer: I am the creator (and maintainer) of this library.


You can now skip the class inheritance as of the latest release of dataclass-wizard. It's straightforward to use; the example below is the same as the one above, with the JSONWizard usage removed completely. Just remember not to import asdict from the dataclasses module, even though I guess that should coincidentally work.

Here's the modified version of the above without class inheritance:

from dataclasses import dataclass
from typing import List as PyList

from dataclass_wizard import fromdict, asdict


@dataclass
class List:
    id: int
    statuses: PyList['StatusElement']


@dataclass
class StatusElement:
    status: str
    order_index: int
    color: str
    type: str


json = {
  "id": "124",
  "statuses": [
    {
      "status": "to do",
      "orderIndex": 0,
      "color": "#d3d3d3",
      "type": "open"
    }]
}

# De-serialize the JSON dictionary into a `List` instance.
c = fromdict(List, json)

print(c)
# List(id=124, statuses=[StatusElement(status='to do', order_index=0, color='#d3d3d3', type='open')])

# Convert the instance back to a dictionary object that is JSON-serializable.
d = asdict(c)

print(d)
# {'id': 124, 'statuses': [{'status': 'to do', 'orderIndex': 0, 'color': '#d3d3d3', 'type': 'open'}]}

Also, here's a quick performance comparison with dacite. I wasn't aware of this library before, but it's also very easy to use (and there's also no need to inherit from any class). However, from my personal tests - Windows 10 Alienware PC using Python 3.9.1 - dataclass-wizard seemed to perform much better overall on the de-serialization process.

from dataclasses import dataclass
from timeit import timeit
from typing import List

from dacite import from_dict

from dataclass_wizard import JSONWizard, fromdict


data = {
    "id": 124,
    "statuses": [
        {
            "status": "to do",
            "orderindex": 0,
            "color": "#d3d3d3",
            "type": "open"
        }]
}


@dataclass
class StatusElement:
    status: str
    orderindex: int
    color: str
    type: str


@dataclass
class List:
    id: int
    statuses: List[StatusElement]


class ListWiz(List, JSONWizard):
    ...


n = 100_000

# 0.37
print('dataclass-wizard:            ', timeit('ListWiz.from_dict(data)', number=n, globals=globals()))

# 0.36
print('dataclass-wizard (fromdict): ', timeit('fromdict(List, data)', number=n, globals=globals()))

# 11.2
print('dacite:                      ', timeit('from_dict(List, data)', number=n, globals=globals()))


lst_wiz1 = ListWiz.from_dict(data)
lst_wiz2 = fromdict(List, data)
lst = from_dict(List, data)

# True
assert lst.__dict__ == lst_wiz1.__dict__ == lst_wiz2.__dict__

6 Comments

This looks really slick. I've looked at pydantic and seems a bit heavy for what I'm trying to do. I'll have to give this library a shot. Thanks!
can confirm it's really faster, almost like default one: dacite - .040896 ms, dataclass_wizard - .002921 ms, double ** extraction in a loop - .001776 ms
Considering you added the no-inheritance support over 1.5 years ago, do we still need that part of the answer?
@CamiloTerevinto good point. I'll see if I can update the answer to instead include a link to the docs with no-inheritance support.
fromdict is sick, thanks a lot man!
12

I've spent a few hours investigating options for this. There's no native Python functionality to do this, but there are a few third-party packages (writing in November 2022):

  • marshmallow_dataclass has this functionality (you need not be using marshmallow in any other capacity in your project). It gives good error messages and the package is actively maintained. I used this for a while before hitting what I believe is a bug parsing a large and complex JSON into deeply nested dataclasses, and then had to switch away.
  • dataclass-wizard is easy to use and specifically addresses this use case. It has excellent documentation. One significant disadvantage is that it won't automatically attempt to find the right fit for a given JSON, if trying to match against a union of dataclasses (see https://dataclass-wizard.readthedocs.io/en/latest/common_use_cases/dataclasses_in_union_types.html). Instead it asks you to add a "tag key" to the input JSON, which is a robust solution but may not be possible if you have no control over the input JSON.
  • dataclass-json is similar to dataclass-wizard, and again doesn't attempt to match the correct dataclass within a union.
  • dacite is the option I have settled upon for the time being. It has similar functionality to marshmallow_dataclass, at least for JSON parsing. The error messages are significantly less clear than marshmallow_dataclass, but slightly offsetting this, it's easier to figure out what's wrong if you pdb in at the point that the error occurs - the internals are quite clear and you can experiment to see what's going wrong. According to others it is rather slow, but that's not a problem in my circumstance.
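
To illustrate the union issue mentioned above, here is a rough stdlib-only sketch of the two strategies: dispatching on a "tag key" in the input versus trying to find the dataclass whose fields fit. This is not how any of the listed libraries implement it; the classes, the "kind" key, and the function names by_tag and by_field_match are all made up for the example:

```python
from dataclasses import dataclass, fields


@dataclass
class Circle:
    radius: float


@dataclass
class Rectangle:
    width: float
    height: float


def by_tag(data, registry, tag_key="kind"):
    """Pick the class from an explicit tag in the input (hypothetical key name)."""
    payload = dict(data)  # copy so the caller's dict is left untouched
    cls = registry[payload.pop(tag_key)]
    return cls(**payload)


def by_field_match(data, candidates):
    """Pick the first class whose field names exactly match the input keys."""
    for cls in candidates:
        if {f.name for f in fields(cls)} == set(data):
            return cls(**data)
    raise ValueError(f"no candidate dataclass matches keys {sorted(data)}")


registry = {"circle": Circle, "rectangle": Rectangle}

tagged = by_tag({"kind": "circle", "radius": 2.0}, registry)
matched = by_field_match({"width": 1.0, "height": 2.0}, [Circle, Rectangle])

print(tagged)   # Circle(radius=2.0)
print(matched)  # Rectangle(width=1.0, height=2.0)
```

The tag approach is unambiguous but needs control over the input JSON; the matching approach works on untouched input but, as sketched here, breaks down as soon as two classes share the same field names or fields have defaults.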

7

A "cleaner" solution (in my eyes): use dacite.

No need to inherit anything.

from dataclasses import dataclass
from typing import List
from dacite import from_dict

data = {
    "id": 124,
    "statuses": [
        {
            "status": "to do",
            "orderindex": 0,
            "color": "#d3d3d3",
            "type": "open"
        }]
}


@dataclass
class StatusElement:
    status: str
    orderindex: int
    color: str
    type: str


@dataclass
class List:
    id: int
    statuses: List[StatusElement]


lst: List = from_dict(List, data)
print(lst)

Output:

List(id=124, statuses=[StatusElement(status='to do', orderindex=0, color='#d3d3d3', type='open')])

4 Comments

This is also a very cool solution - I'll admit I hadn't tried dacite before. However, from personal tests dacite ended up being about 30x slower in the de-serialization process (I might be missing an optimization step however)
But if we absolutely need to, we can also call fromdict(data, List) without extending from any class. Where the import is generated with from dataclass_wizard.loaders import fromdict. But just a note that this is technically not public API, so it might change in a future release.
Just a note, but I took the suggestion about inheritance being unnecessary to heart - the latest version of dataclass-wizard should now support a fromdict so regular data classes should work as well. I updated my answer above.
wondering why dacite is so slow 🤔
1

One way of achieving this is to implement a classmethod in each dataclass.

from dataclasses import dataclass
import inspect

@dataclass
class StatusElement:
    status: str
    orderindex: int
    color: str
    type: str

    @classmethod
    def init(cls, json_element):
        data = {}
        for k,v in json_element.items():
            if k in inspect.signature(cls).parameters:
                data[k] = v
        return cls(**data)

@dataclass
class List:
    id: int 
    statuses: list[StatusElement]

    @classmethod
    def init(cls, json_element):
        data = {}
        for k,v in json_element.items():
            if k in inspect.signature(cls).parameters:
                if k == 'statuses':
                    data[k] = list(map(StatusElement.init,v))
                else:
                    data[k] = v
        return cls(**data)

json = {
  "id": "124",
  "statuses": [
    {
      "status": "to do",
      "orderindex": 0,
      "color": "#d3d3d3",
      "type": "open"
    },
    {
      "status": "to do next",
      "orderindex": 1,
      "color": "#d4d4d4",
      "type": "pending"
    }]
}

object = List.init(json)
print(object)
# List(id='124', statuses=[StatusElement(status='to do', orderindex=0, color='#d3d3d3', type='open'), StatusElement(status='to do next', orderindex=1, color='#d4d4d4', type='pending')])
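
A variant of the same per-class idea, as a sketch rather than part of the answer above: do the conversion in __post_init__, so that the plain List(**json) call from the question works directly. Here the class is named TaskList (a made-up name) to avoid clashing with typing.List, and the id is also coerced to int:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StatusElement:
    status: str
    orderindex: int
    color: str
    type: str


@dataclass
class TaskList:  # renamed from `List` to avoid clashing with typing.List
    id: int
    statuses: List[StatusElement]

    def __post_init__(self):
        # Coerce the id, then replace raw dicts with StatusElement instances.
        self.id = int(self.id)
        self.statuses = [
            s if isinstance(s, StatusElement) else StatusElement(**s)
            for s in self.statuses
        ]


payload = {
    "id": "124",
    "statuses": [
        {"status": "to do", "orderindex": 0, "color": "#d3d3d3", "type": "open"}
    ],
}

task_list = TaskList(**payload)
print(task_list)
# TaskList(id=124, statuses=[StatusElement(status='to do', orderindex=0, color='#d3d3d3', type='open')])
```

Unlike the classmethod version, this raises a TypeError if the JSON contains keys that are not fields of the dataclass.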

0

I am using this native method of my class, decorated with @dataclass_json and @dataclass. Correct me if I am wrong, but this works for me recursively in Python 3.9:

from dataclasses import dataclass
from dataclasses_json import dataclass_json

@dataclass_json
@dataclass
class ExampleClass:
    example_parameter: float

Then I use the schema() method:

json_string = '{"example_parameter": 0}'
parsed_obj = ExampleClass.schema().loads(json_string, many=False)

1 Comment

That's not a native method. That's the dataclass-json library mentioned in balderman's answer
