I have some objects that are very slow to instantiate. They are representations of data loaded from external sources such as YAML files, and loading large YAML files is slow (I don't know why).
I know these objects depend on some external factors:
- The arguments passed at the object creation
- Environment variables
- Some external files
Ideally I would like a transparent, non-boilerplate way to cache these objects when the external factors are the same:
import os

@cache(depfiles=('foo',), depvars=(os.environ['FOO'],))
class Foo:
    def __init__(self, *args, **kwargs):
        with open('foo') as fd:
            self.foo = fd.read()
        self.FOO = os.environ['FOO']
        self.args = args
        self.kwargs = kwargs
The main idea is that the first time I instantiate Foo, a cache file is created with the content of the object; the next time I instantiate it (in another Python session), the cache file is used only if none of the dependencies or arguments have changed.
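For illustration, here is a minimal sketch of how such a decorator might derive its cache key from the external factors. The names cache_key, depfiles and depvars are hypothetical (this is not an existing API), and it assumes the constructor arguments are picklable:

import hashlib
import os
import pickle

def cache_key(depfiles, depvars, args, kwargs):
    """Hash everything the object depends on into a single cache key."""
    state = (
        # File dependencies: the key changes when a file is touched
        tuple((path, os.path.getmtime(path)) for path in depfiles),
        # Environment dependencies: the key changes when a tracked variable changes
        tuple((name, os.environ.get(name)) for name in depvars),
        # Constructor arguments (kwargs sorted so call order does not matter)
        args,
        tuple(sorted(kwargs.items())),
    )
    return hashlib.sha256(pickle.dumps(state)).hexdigest()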
The solution I've found so far is based on shelve:
import shelve
import time

class Foo(object):
    _cached = False

    def __new__(cls, *args, **kwargs):
        cache = shelve.open('cache')
        cache_foo = cache.get(cls.__name__)
        cache.close()
        if isinstance(cache_foo, Foo):
            # Cache hit: reuse the stored instance and skip __init__'s work
            cache_foo._cached = True
            return cache_foo
        # Cache miss: build a fresh instance (object.__new__ takes no extra arguments)
        self = super(Foo, cls).__new__(cls)
        return self

    def __init__(self, *args, **kwargs):
        if self._cached:
            return
        time.sleep(2)  # Lots of work
        self.answer = 42
        cache = shelve.open('cache')
        cache[self.__class__.__name__] = self
        cache.sync()
        cache.close()
It works perfectly as is, but it is too much boilerplate and it doesn't cover all the cases (a rough sketch of the missing checks follows the list):
- Conflicts when different classes have the same name
- No check on args and kwargs
- No check on the dependencies (environment variables, external files)
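To make those gaps concrete, here is a rough sketch of the extra bookkeeping it seems to require, reusing the hypothetical cache_key helper sketched above; DEPFILES and DEPVARS are illustrative names, not an existing API:

import shelve
import time

class Foo(object):
    _cached = False
    DEPFILES = ('foo',)   # hypothetical: files this object depends on
    DEPVARS = ('FOO',)    # hypothetical: environment variables it depends on

    def __new__(cls, *args, **kwargs):
        # Key on the fully qualified class name plus a hash of every external
        # factor, so same-named classes and changed dependencies never collide.
        key = '{}.{}:{}'.format(
            cls.__module__, cls.__qualname__,
            cache_key(cls.DEPFILES, cls.DEPVARS, args, kwargs))
        with shelve.open('cache') as cache:
            cached = cache.get(key)
        if isinstance(cached, cls):
            cached._cached = True
            return cached
        self = super(Foo, cls).__new__(cls)
        self._cache_key = key
        return self

    def __init__(self, *args, **kwargs):
        if self._cached:
            return
        time.sleep(2)  # Lots of work
        self.answer = 42
        with shelve.open('cache') as cache:
            cache[self._cache_key] = self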
Is there any native solution to achieve similar behavior in Python?
shelve uses pickle behind the scenes.