
Is there an easy way to cache function results in Python based on a single identifier argument? For example, suppose my function has 3 arguments arg1, arg2 and id. Is there a simple way to cache the function result based only on the value of id? That is, whenever id takes the same value, the cached function would return the same result, regardless of arg1 and arg2.

Background: I have a time-consuming and repeatedly called function in which arg1 and arg2 are lists and dictionaries composed of large numpy arrays. Hence, functools.lru_cache doesn't work as is (the arguments aren't hashable). Yet there are only a handful of specific combinations of arg1 and arg2, hence my idea to manually specify some id that takes a unique value for each possible combination of arg1 and arg2.

1 Comment

Yes, it's easy to write your own memoization based only on one argument of a function; you just need that argument to be hashable. Commented Feb 2, 2021 at 2:02

3 Answers

def cache(fun):
    cache.cache_ = {}
    def inner(arg1, arg2, id):
        if id not in cache.cache_:
            print(f'Caching {id}')  # to check when it is cached
            cache.cache_[id] = fun(arg1, arg2, id)
        return cache.cache_[id]
    return inner

@cache
def function(arg1, arg2, id):
    print('something')

You can create your own decorator as suggested by DarrylG. You can add a print(cache.cache_) inside the if id not in cache.cache_: branch to check that it only caches for new values of id.

You can make cache_ a function attribute (PEP 232) by assigning to cache.cache_. Then, when you want to reset the cache, you can call cache.cache_.clear(). This gives you direct access to the dictionary that stores the cached results.

function(1, 2, 'a')
function(11, 22, 'b')
function(11, 22, 'a')
function([111, 11], 222, 'a')

print(f'Cache {cache.cache_}') # view previously cached results
cache.cache_.clear() # clear cache
print(f'Cache {cache.cache_}') # cache is now empty

# call some function again to populate cache
function(1, 2, 'a')
function(11, 22, 'b')
function(11, 22, 'a')
function([111, 11], 222, 'a')

Edit: Addressing a later comment by @Bob (the OP): in most cases returning a reference to the same cached object would suffice, but this use case calls for a fresh copy of the result. Since uniqueness inside the cache is defined only by id, the same cached object is handed out for every repeat call; if that object is mutable, returning the same reference would let one caller's modifications leak into later calls, which is undesired behavior. As mentioned in that comment, the return statement in the inner function should be changed from return cache.cache_[id] to return copy.deepcopy(cache.cache_[id]) (with import copy added).
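For reference, a minimal runnable sketch of that deepcopy variant (make_list and the argument values are made up for the demo):

```python
import copy

def cache(fun):
    cache.cache_ = {}
    def inner(arg1, arg2, id):
        if id not in cache.cache_:
            cache.cache_[id] = fun(arg1, arg2, id)
        # hand back a copy so callers cannot mutate the cached value
        return copy.deepcopy(cache.cache_[id])
    return inner

@cache
def make_list(arg1, arg2, id):   # hypothetical function with a mutable result
    return [arg1, arg2]

first = make_list(1, 2, 'a')
first.append(99)                 # mutating the returned copy...
second = make_list(1, 2, 'a')    # ...leaves the cached value intact
```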


5 Comments

This is great. One follow up question: How would I best clear cache_ without reloading the module that houses the definition of cache? I tried adding a keyword argument to cache def cache(fun, clear=False) with the idea of adding an if statement that clears the cache when calling cache(None, clear=True). But it seems that cache_ is always empty when I am calling cache directly.
Yes, that's perfect and I even learned some more python from it. Thanks
After continued use of this (still working great!), I think it might be better to return a deepcopy of cache.cache_[id]. Otherwise, the cached results may get inadvertently overwritten if some of the function outputs are mutable.
good to know my answer still works :D, that is a good point, I will edit this once I am on a PC, but appreciate the feedback @Bob
Yes, that's how I did it, too. I think this is better behavior for any kind of caching function, because otherwise the output of future calls will change if the output of a prior call is ever modified. E.g., my_cached_list = make_unsorted_list(id='my_id') will fail to return the original value of cache_ (say, an unsorted list) if one ever did my_cached_list.sort() after a prior call to the caching function. While perhaps desirable in some cases, it would be unexpected in most use cases. @python_user
1

I think you could move the extra arguments into a separate wrapper function (the caller), like below:

import functools

def get_and_update(a, b, c):
    return {'a': a, 'b': b, 'c': c}

# ->

@functools.lru_cache
def get_by_a(a):
    return {}

def get_and_update(a, b, c):
    res = get_by_a(a)
    res.update(a=a, b=b, c=c)
    return res

x1 = get_and_update('x', 1, 2)
x2 = get_and_update('x', 2, 3)
assert x1 is x2
print(x1, x2, sep='\n')
# prints (both names refer to the same dict):
# {'a': 'x', 'b': 2, 'c': 3}
# {'a': 'x', 'b': 2, 'c': 3}


1

The best may be just to write your own simple decorator, as @DarrylG says, e.g.

from functools import wraps 

def memoize_first(func):
    """Memoize like functools.cache, but only consider first argument.

    Adapted from https://wiki.python.org/moin/PythonDecoratorLibrary
    """
    cache = func.cache = {}

    @wraps(func)
    def memoizer(arg1, *args):
        if arg1 not in cache:
            cache[arg1] = func(arg1, *args)
        return cache[arg1]
    return memoizer

Obviously, once a result is cached, the other args won't affect it, so you need to pass the correct remaining args on the first call for each key, when the result is actually computed and stored.
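To illustrate that point, here is the decorator again with a small hypothetical demo function (combine and the argument values are made up):

```python
from functools import wraps

def memoize_first(func):
    """Memoize like functools.cache, but key only on the first argument."""
    cache = func.cache = {}

    @wraps(func)
    def memoizer(arg1, *args):
        if arg1 not in cache:
            cache[arg1] = func(arg1, *args)
        return cache[arg1]
    return memoizer

@memoize_first
def combine(id, data):            # hypothetical demo function
    return sum(data)

first = combine('a', [1, 2, 3])   # computed and cached under key 'a'
second = combine('a', [9, 9, 9])  # cache hit: data is ignored
```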

Another twist: I think you can use function attributes, as @python_user mentions, to keep arguments out of the cache key entirely, although this is less elegant since you have to set those attributes separately from the function arguments. If arg1 and arg2 are constants, then this is a fine use of (effectively) globals.

from functools import cache

@cache
def f(id):
    # compute is the expensive function; arg1/arg2 are read from attributes
    return compute(id, f.arg1, f.arg2)

f.arg1 = big_list
f.arg2 = other_big_list
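A runnable sketch of this pattern, with a stand-in compute that just sums its inputs (compute and the attribute values are made up for the demo):

```python
from functools import cache

def compute(id, arg1, arg2):       # stand-in for the real expensive function
    return (id, sum(arg1) + sum(arg2))

@cache
def f(id):
    return compute(id, f.arg1, f.arg2)

f.arg1 = [1, 2, 3]
f.arg2 = [10, 20]
result = f('x')                    # computed with the current attributes
f.arg1 = [100]                     # changing an attribute afterwards...
again = f('x')                     # ...does not change the cached result
```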

