This question came up while I was saving a large number of model-inferred embeddings to plain text. To do so, I needed to convert lists of float embeddings into strings, and I found this conversion to be surprisingly time-consuming.
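For context, a minimal sketch of the conversion in question (the exact separator and layout are my assumption, not from the original setup): each embedding, a list of floats, becomes one line of space-separated decimal strings.

```python
# Hypothetical stand-ins for model-inferred embeddings.
embeddings = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]

# One text line per embedding; str() is the conversion being benchmarked below.
lines = [" ".join(map(str, vec)) for vec in embeddings]
print(lines[0])  # → 0.1 0.2 0.3
```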
Inspired by this discussion, I benchmarked four different methods for converting float arrays to strings. Surprisingly, orjson performed the best—even though it's a third-party JSON serialization library.
This got me wondering: Is there a native Python method that can achieve performance comparable to orjson for converting lists of floats to strings?
Below are the commands I used for profiling, along with the results:
$ python -m pyperf timeit --fast -s 'x = [3.141592653589793] * 100' 'str(x)'
Mean +- std dev: 4.79 us +- 0.06 us
$ python -m pyperf timeit --fast -s 'from orjson import dumps; x = [3.141592653589793] * 100' 'dumps(x)'
Mean +- std dev: 2.70 us +- 0.02 us
$ python -m pyperf timeit --fast -s 'from json import dumps; x = [3.141592653589793] * 100' 'dumps(x)'
Mean +- std dev: 8.03 us +- 0.31 us
$ python -m pyperf timeit --fast -s 'x = [3.141592653589793] * 100' '"{}".format(x)'
Mean +- std dev: 4.94 us +- 0.16 us
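If pyperf isn't installed, the same comparison of the native methods can be sketched with the standard library's timeit module (orjson omitted here since it's third-party; absolute numbers will differ from pyperf's calibrated output):

```python
# Rough stdlib-only reproduction of the benchmark above.
import json
import timeit

x = [3.141592653589793] * 100
N = 10_000

results = {}
for label, stmt, env in [
    ("str(x)", "str(x)", {"x": x}),
    ("json.dumps(x)", "dumps(x)", {"dumps": json.dumps, "x": x}),
    ('"{}".format(x)', '"{}".format(x)', {"x": x}),
]:
    # Average seconds per call for each conversion method.
    results[label] = timeit.timeit(stmt, globals=env, number=N) / N

for label, seconds in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{label}: {seconds * 1e6:.2f} us per call")
```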
Comments from the discussion:

- `str(x)` is now less than twice as slow compared to orjson; back in 2022 it was 3-4 times slower. orjson is faster because it's written in Rust and uses a different implementation. The discussion links to other fast libraries.
- "Surprisingly ... even though it's a third-party": it's quite the opposite. External libraries can be faster precisely because they can use different implementations without breaking compatibility. Converting floats to strings isn't trivial; the linked discussion is about replacing the existing C algorithm with newer, faster algorithms like Ryū and Dragonbox. orjson may be using a very different string management mechanism too: allocating and releasing memory is expensive, so fast serializers pre-allocate and reuse buffers.
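On why float-to-string conversion isn't trivial: modern Python's str/repr for floats emit the shortest decimal string that parses back to exactly the same IEEE 754 double, which is the problem algorithms like Ryū and Dragonbox solve quickly. A small illustration:

```python
# Python prints the shortest round-tripping decimal form of a double,
# not its full 17-significant-digit expansion.
x = 0.1
s = str(x)
print(s)              # → 0.1
assert float(s) == x  # the short string converts back to the identical double
```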