
I have some objects that are very slow to instantiate. They are representations of data loaded from external sources such as YAML files, and loading large YAML files is slow (I am not sure why).

I know these objects depend on a few external factors:

  • The arguments passed at the object creation
  • Environment variables
  • Some external files

Ideally I would like a transparent, boilerplate-free way to cache these objects whenever the external factors are the same:

import os

@cache(depfiles=('foo',), depvars=(os.environ['FOO'],))
class Foo:
    def __init__(self, *args, **kwargs):
        with open('foo') as fd:
            self.foo = fd.read()
        self.FOO = os.environ['FOO']
        self.args = args
        self.kwargs = kwargs

The main idea is that the first time I instantiate Foo, a cache file is created with the contents of the object; the next time I instantiate it (possibly in another Python session), the cache file is used only if none of the dependencies or arguments have changed.
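To make the requirement concrete, the closest I can get with plain pickle is an explicit factory helper along these lines (cached is a name I made up; it assumes the instances are picklable, and it is not the transparent decorator I am after since every instantiation has to go through it):

import hashlib
import os
import pickle


def cached(cls, *args, depfiles=(), depvars=(), **kwargs):
    """Hypothetical helper: return cls(*args, **kwargs), but pickle the
    result to disk and reuse it while the arguments, the given dependency
    values and the contents of the given files are unchanged."""
    # Build a key from everything the instance depends on.
    h = hashlib.sha256()
    h.update(pickle.dumps((args, sorted(kwargs.items()))))
    for value in depvars:
        h.update(repr(value).encode())
    for path in depfiles:
        with open(path, 'rb') as fd:
            h.update(fd.read())
    cache_file = '%s.%s.pickle' % (cls.__name__, h.hexdigest())

    if os.path.exists(cache_file):
        with open(cache_file, 'rb') as fd:
            return pickle.load(fd)      # cheap path: reuse the stored instance

    obj = cls(*args, **kwargs)          # slow path: build it for real
    with open(cache_file, 'wb') as fd:
        pickle.dump(obj, fd)
    return obj


# e.g. foo = cached(Foo, depfiles=('foo',), depvars=(os.environ['FOO'],))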

The best solution I've found so far is based on shelve:

import shelve
import time


class Foo(object):
    _cached = False

    def __new__(cls, *args, **kwargs):
        # Reuse the pickled instance from a previous session if there is one.
        cache = shelve.open('cache')
        cache_foo = cache.get(cls.__name__)
        cache.close()
        if isinstance(cache_foo, Foo):
            cache_foo._cached = True
            return cache_foo
        # object.__new__() does not accept extra arguments.
        self = super(Foo, cls).__new__(cls)
        return self

    def __init__(self, *args, **kwargs):
        if self._cached:
            return  # the instance came from the cache, skip the slow work

        time.sleep(2)  # lots of work
        self.answer = 42

        # Store the freshly built instance for the next session.
        cache = shelve.open('cache')
        cache[self.__class__.__name__] = self
        cache.close()  # close() also syncs

It works perfectly as is, but it is too much boilerplate and it doesn't cover all the cases:

  • Conflicts when different classes have the same name
  • Check for args and kwargs
  • Check for dependencies (environment vars, external files)

Is there any native solution to achieve similar behavior in Python?

Comments:

  • cachetools? I know of it, but not much about it. pypi.python.org/pypi/cachetools
  • I guess you can try pickling the objects.
  • @AlexFung shelve uses pickle behind the scenes.

1 Answer


Python 3 provides the functools.lru_cache() decorator for memoizing callables, but I think you're asking to preserve the cache across multiple runs of your application, and at that point there is such a variety of differing requirements that you're unlikely to find a one-size-fits-all solution.
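For reference, the in-memory version is a one-liner, but the cache is gone as soon as the interpreter exits (the file name below is only a placeholder):

from functools import lru_cache


@lru_cache(maxsize=None)
def load_data(path):
    # Memoized within this process only.
    with open(path) as fd:
        return fd.read()


load_data('foo.yaml')   # first call: actually reads the file
load_data('foo.yaml')   # second call: served from the in-memory cache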

If your own answer works for you, then use it. As far as 'too much boilerplate' is concerned, I would extract the caching into a separate mixin class: the first reference to Foo in __new__ probably ought to be cls in any case, and you can use the __qualname__ attribute instead of cls.__name__ to reduce the likelihood of class-name conflicts (assuming Python 3.3 or later).
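A rough sketch of what I mean, with made-up names (ShelveCacheMixin, _store) and with no checking of arguments or external dependencies:

import shelve
import time


class ShelveCacheMixin:
    """Reusable caching behaviour: the attributes of each instance are
    stored in a shelve file keyed by the subclass's qualified name."""
    _cached = False

    def __new__(cls, *args, **kwargs):
        with shelve.open('cache') as cache:
            state = cache.get(cls.__qualname__)
        obj = super().__new__(cls)
        if state is not None:
            obj.__dict__.update(state)  # restore the cached attributes
            obj._cached = True
        return obj

    def _store(self):
        with shelve.open('cache') as cache:
            cache[type(self).__qualname__] = self.__dict__


class Foo(ShelveCacheMixin):
    def __init__(self, *args, **kwargs):
        if self._cached:
            return              # restored from the cache, skip the slow work
        time.sleep(2)           # lots of work
        self.answer = 42
        self._store()

Storing the instance's __dict__ rather than the instance itself is deliberate: unpickling a whole instance would call the overridden __new__ again while the shelf is still open, whereas a plain dict unpickles without touching it.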

