I have a 'processing' function and a 'serializing' function. Currently the processor returns 4 different types of data structures to be serialized in different ways.
Looking for the best practice on what to do here.
def process(input):
    ...
    return a, b, c, d

def serialize(a, b, c):
    ...
    # Different serialization patterns for each of a-c.

a, b, c, d = process(input)
serialize(a, b, c)
go_on_to_do_other_things(d)
That feels janky.
Should I instead use a class where a,b,c,d are member variables?
class VeryImportantDataProcessor:
    def process(self, input):
        self.a = ...
        self.b = ...
        ...

    def serialize(self):
        s3.write(self.a)
        convoluted_serialize(self.b)
        ...

vipd = VeryImportantDataProcessor()
vipd.process(input)
vipd.serialize()
Keen to hear your thoughts on what is best here!
Note: after processing and serializing, the code goes on to use variable d for further unrelated shenanigans. Not sure if that changes anything.
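For illustration, here is a third possible shape, sketched in the same loose pseudocode style as the snippets above (the ProcessedOutputs name is just a placeholder, not anything I've actually built): bundle the three serializable outputs into one lightweight container and keep d out of the serialization path entirely.

from typing import Any, NamedTuple

class ProcessedOutputs(NamedTuple):
    # Placeholder container grouping the three outputs that get serialized.
    a: Any
    b: Any
    c: Any

def process(input) -> tuple[ProcessedOutputs, Any]:
    ...  # same processing as before
    return ProcessedOutputs(a, b, c), d

def serialize(outputs: ProcessedOutputs) -> None:
    s3.write(outputs.a)              # same per-output patterns as before
    convoluted_serialize(outputs.b)
    ...

outputs, d = process(input)
serialize(outputs)
go_on_to_do_other_things(d)

That would keep serialize's signature honest about what it consumes, while d stays a plain return value for the downstream steps.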
Comment: Do a, b and c get used outside of process and serialize? Or is the "point" of this code to return d, with serialization of some values as a side effect, and a, b and c migrated to the API by necessity of implementation rather than by design?

Reply: a, b, and c are processed products of a raw data stream, serialized for other live APIs to pull down for use in their different tasks. The process function here is essentially the SQL-like data manipulation in Spark. After this stage we're done with Spark processing. d is another related subset of the data, but it goes on to additional steps (ML model training).