I have a main class that takes a list of sources and creates two objects for each source: one holding the required data and one analytics tool.
The Analytics-class has different methods depending on the source. The Data-class extracts data from different paths and cleans it in different ways depending on the source. The data is read with pandas read_excel(). The analytics tool outputs calculations that depend on which source the data comes from.
class Main_class():
    def __init__(self, sources = ['a','b','c']):
        self.data_sources = {}
        self.analytics = {}
        for s in sources:
            self.data_sources[s] = Data(s)
            self.analytics[s] = Analytics(s, self.data_sources[s])
Right now, my solution is to have one Data-class and one Analytics-class, each with if-statements that adapt its behaviour to the source. This is neither scalable nor otherwise a good solution. I basically have checks in both classes where I say
acceptable_sources = ['a', 'b', 'c']
if source not in acceptable_sources:
    raise ValueError(f"Only acceptable sources are: {acceptable_sources}")
Then I need more checks to set the instance variables correctly. Here's an example from the Data-class:
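# Fragment from Data.__init__; assumes `import pandas as pd` at module level.
self.data = {}
if source == 'a': # if it's a, then there are 3 files
    # One key per file, so later reads do not overwrite earlier ones.
    self.data['a_1'] = pd.read_excel('a_1.xlsx')
    self.data['a_2'] = pd.read_excel('a_2.xlsx')
    self.data['a_3'] = pd.read_excel('a_3.xlsx')  # assumed name of the third file
elif source == 'b': # if it's b, then there are 2 files
    self.data['b_1'] = pd.read_excel('b_1.xlsx')
    self.data['b_2'] = pd.read_excel('b_2.xlsx')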
This is problematic, since the number of if-statements grows with the number of sources, but it might be the best solution; I'm not sure. With the same idea in my Analytics-class, there will be a lot of unused functions for each source. Let's say there are 30 functions for source a, 25 functions for source b and 40 functions for source c. Some of these functions might be shared across sources, and some will be unique. So whichever source I use, the object carries a lot of methods it never uses, which seems like a waste.
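To make the Data part of the problem concrete, here is a rough sketch of how the per-source file lists could live in a lookup table instead of an if-chain. It is only an illustration under some assumptions: `SOURCE_FILES` and the `reader` parameter are names I made up, `a_3.xlsx` is an assumed file name, and the per-source cleaning is left out entirely.

```python
from typing import Any, Callable, Dict, List, Optional

# Hypothetical lookup table: one entry per source, listing the files to load.
# 'a_3.xlsx' is an assumed name for the third file of source 'a'.
SOURCE_FILES: Dict[str, List[str]] = {
    "a": ["a_1.xlsx", "a_2.xlsx", "a_3.xlsx"],
    "b": ["b_1.xlsx", "b_2.xlsx"],
}


class Data:
    def __init__(self, source: str,
                 reader: Optional[Callable[[str], Any]] = None):
        # The table doubles as the list of acceptable sources.
        if source not in SOURCE_FILES:
            raise ValueError(
                f"Only acceptable sources are: {sorted(SOURCE_FILES)}")
        if reader is None:
            # Import lazily so the class can be tried out without pandas.
            import pandas as pd
            reader = pd.read_excel
        self.source = source
        # One entry per file, keyed by file name, so nothing is overwritten.
        self.data = {path: reader(path) for path in SOURCE_FILES[source]}
```

With this shape, adding a source would mean adding one entry to the table rather than another `elif` branch, although it does nothing for the cleaning steps that differ per source.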
My first thought was to make Analytics and Data into abstract classes and create unique classes for each compatible source, but then I wouldn't be able to instantiate them in the for-loop in my main class. Then I thought that I could put them in a Class_holder, which checks which class I want to instantiate and, if it exists, returns an object of that class. So for example, if I have X possible sources, the Class_holder would be able to handle and return X different classes, and raise an error if the source doesn't exist. It would look something like
from analytics import A, B, C # classes I should create with correct methods

class Class_holder:
    def __init__(self, source, data):
        self.acceptable_sources = ['a', 'b', 'c']
        # Reject anything that is NOT a known source.
        if source not in self.acceptable_sources:
            raise ValueError(f"Only acceptable sources are: {self.acceptable_sources}")
        self.source = source
        self.data = data

    def return_analytics_class(self):
        if self.source == 'a':
            return A(self.data)
        elif self.source == 'b':
            return B(self.data)
        elif self.source == 'c':
            return C(self.data)
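For comparison, here is a minimal sketch of the abstract-class idea combined with a dictionary lookup instead of the if/elif chain. The classes `A`, `B` and `C` below are tiny stand-ins, and `BaseAnalytics`, `n_items`, `summary`, `ANALYTICS_CLASSES` and `make_analytics` are all hypothetical names chosen for the illustration, not existing code.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Type


class BaseAnalytics(ABC):
    """Methods shared by every source live here."""

    def __init__(self, data: Any):
        self.data = data

    def n_items(self) -> int:
        # Example of a calculation shared across sources (hypothetical).
        return len(self.data)

    @abstractmethod
    def summary(self) -> str:
        """Each source provides its own version."""


class A(BaseAnalytics):
    def summary(self) -> str:
        return f"a: {self.n_items()} items"


class B(BaseAnalytics):
    def summary(self) -> str:
        return f"b: first item is {self.data[0]}"


class C(BaseAnalytics):
    def summary(self) -> str:
        return f"c: last item is {self.data[-1]}"


# The dictionary is both the list of acceptable sources and the dispatch table.
ANALYTICS_CLASSES: Dict[str, Type[BaseAnalytics]] = {"a": A, "b": B, "c": C}


def make_analytics(source: str, data: Any) -> BaseAnalytics:
    try:
        cls = ANALYTICS_CLASSES[source]
    except KeyError:
        raise ValueError(
            f"Only acceptable sources are: {sorted(ANALYTICS_CLASSES)}"
        ) from None
    return cls(data)
```

Because classes are ordinary objects in Python, they can be looked up by key and instantiated inside a loop, so each object would only carry the shared methods plus the ones for its own source.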
And the classes I have called A, B, C could either be a combination of Data and Analytics, or I could separate it by having one Data and one Analytics-class for each source, then the Class_holder.return_class() would return a tuple with two classes. For the Class_holder-solution I would have to change my Main to something like
from a_file import Data
from another_file import Class_holder

class Main_class():
    def __init__(self, sources = ['a','b','c']):
        self.data_sources = {}
        for s in sources:
            self.data_sources[s] = Data(s)
        self.analytics = {}
        for s in sources:
            self.analytics[s] = Class_holder(s, self.data_sources[s]).return_analytics_class()
But then I'm back to my original problem: I need checks in both Data and Class_holder to see whether the source is compatible. On the other hand, this might solve the problem of each object carrying unused methods, since only the analytics functions for the given source would be instantiated.
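One way the duplicated check might be avoided, sketched under assumptions, is a single table that pairs each source with its Data class and its Analytics class, so the main class validates once. Everything here besides `Main_class` is a made-up stand-in (`DataA`, `AnalyticsA`, `REGISTRY` and so on), with hard-coded rows in place of the Excel files.

```python
from typing import Any, Dict, Iterable, Tuple


class DataA:
    rows = [1, 2, 3]  # stand-in for data read from the files of source 'a'


class DataB:
    rows = [4, 5]  # stand-in for data read from the files of source 'b'


class AnalyticsA:
    def __init__(self, data: Any):
        self.data = data

    def total(self) -> int:  # method that only source 'a' needs
        return sum(self.data.rows)


class AnalyticsB:
    def __init__(self, data: Any):
        self.data = data

    def largest(self) -> int:  # method that only source 'b' needs
        return max(self.data.rows)


# Single place that knows which sources exist and which classes serve them.
REGISTRY: Dict[str, Tuple[type, type]] = {
    "a": (DataA, AnalyticsA),
    "b": (DataB, AnalyticsB),
}


class Main_class:
    def __init__(self, sources: Iterable[str] = ("a", "b")):
        sources = list(sources)
        unknown = [s for s in sources if s not in REGISTRY]
        if unknown:
            raise ValueError(
                f"Only acceptable sources are: {sorted(REGISTRY)}")
        self.data_sources = {}
        self.analytics = {}
        for s in sources:
            data_cls, analytics_cls = REGISTRY[s]
            self.data_sources[s] = data_cls()
            self.analytics[s] = analytics_cls(self.data_sources[s])
```

In this sketch neither the Data classes nor the Analytics classes need to know the list of acceptable sources; whether that holds up with the real cleaning and loading logic is something I have not tested.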
It just doesn't feel like an optimal way of doing this kind of task, so I'm turning to Code Review for a bit of guidance. If you know of a design pattern or other solution for this kind of problem, I would greatly appreciate it.