Python list comprehension - want to avoid repeated evaluation

I have a list comprehension which approximates to:

[f(x) for x in l if f(x)]

Where l is a list and f(x) is an expensive function which returns a list.

I want to avoid evaluating f(x) twice for every non-empty occurance of f(x). Is there some way to save its output within the list comprehension?

I could remove the final condition, generate the whole list and then prune it, but that seems wasteful.

Edit:

Two basic approaches have been suggested:

An inner generator comprehension:

[y for y in (f(x) for x in l) if y]

or memoization.

I think the inner generator comprehension is elegant for the problem as stated. In actual fact I simplified the question to make it clear, I really want:

[g(x, f(x)) for x in l if f(x)]

For this more complicated situation, I think memoization produces a cleaner end result.

Answers


A solution (the best if you have repeated value of x) would be to memoize the function f, i.e. to create a wrapper function that saves the argument by which the function is called and save it, than return it if the same value is asked.

a really simple implementation is the following:

storage = {}
def memoized(value):
    if value not in storage:
        storage[value] = f(value)
    return storage[value]

[memoized(x) for x in l if memoized(x)]

and then use this function in the list comprehension. This approach is valid under two condition, one theoretical and one practical. The first one is that the function f should be deterministic, i.e. returns the same results given the same input, and the other is that the object x can be used as a dictionary keys. If the first one is not valid than you should recompute f each timeby definition, while if the second one fails it is possible to use some slightly more robust approaches.

You can find a lot of implementation of memoization around the net, and I think that the new versions of python have something included in them too.

On a side note, never use the small L as a variable name, is a bad habit as it can be confused with an i or a 1 on some terminals.

EDIT:

as commented, a possible solution using generators comprehension (to avoid creating useless duplicate temporaries) would be this expression:

[g(x, fx) for x, fx in ((x,f(x)) for x in l) if fx]

You need to weight your choice given the computational cost of f, the number of duplication in the original list and memory at you disposition. Memoization make a space-speed tradeoff, meaning that it keep tracks of each result saving it, so if you have huge lists it can became costly on the memory occupation front.


[y for y in (f(x) for x in l) if y]

Will do.


You should use a memoize decorator. Here is an interesting link.


Using memoization from the link and your 'code':

def memoize(f):
    """ Memoization decorator for functions taking one or more arguments. """
    class memodict(dict):
        def __init__(self, f):
            self.f = f
        def __call__(self, *args):
            return self[args]
        def __missing__(self, key):
            ret = self[key] = self.f(*key)
            return ret
    return memodict(f)

@memoize
def f(x):
    # your code

[f(x) for x in l if f(x)]

[y for y in [f(x) for x in l] if y]

For your updated problem, this might be useful:

[g(x,y) for x in l for y in [f(x)] if y]

Nope. There's no (clean) way to do this. There's nothing wrong with a good-old-fashioned loop:

output = []
for x in l:
    result = f(x)
    if result: 
        output.append(result)

If you find that hard to read, you can always wrap it in a function.


As the previous answers have shown, you can use a double comprehension or use memoization. For reasonably-sized problems it's a matter of taste (and I agree that memoization looks cleaner, since it hides the optimization). But if you're examining a very large list, there's a huge difference: Memoization will store every single value you've calculated, and can quickly blow out your memory. A double comprehension with a generator (round parens, not square brackets) only stores what you want to keep.

To come to your actual problem:

[g(x, f(x)) for x in series if f(x)]

To calculate the final value you need both x and f(x). No problem, pass them both like this:

[g(x, y) for (x, y) in ( (x, f(x)) for x in series ) if y ]

Again: this should be using a generator (round parens), not a list comprehension (square brackets). Otherwise you will build the whole list before you start filtering the results. This is the list comprehension version:

[g(x, y) for (x, y) in [ (x, f(x)) for x in series ] if y ] # DO NOT USE THIS

You can use memoization. It is a technique which is used in order to avoid doing the same computation twice by saving somewhere the result for each calculated value. I saw that there is already an answer that uses memoization, but I would like to propose a generic implementation, using python decorators:

def memoize(func):
    def wrapper(*args):
        if args in wrapper.d:
            return wrapper.d[args]
        ret_val = func(*args)
        wrapper.d[args] = ret_val
        return ret_val
    wrapper.d = {}
    return wrapper

@memoize
def f(x):
...

Now f is a memoized version of itself. With this implementation you can memoize any function using the @memoize decorator.


There have been a lot of answers regarding memoizing. The Python 3 standard library now has a lru_cache, which is a Last Recently Used Cache. So you can:

from functools import lru_cache

@lru_cache()
def f(x):
    # function body here

This way your function will only be called once. You can also specify the size of the lru_cache, by default this is 128. The problem with the memoize decorators shown above is that the size of the lists can grow well out of hand.


Use map() !!

comp = [x for x in map(f, l) if x]

f is the function f(X), l is the list

map() will return the result of f(x) for each x in the list.


Starting Python 3.8, and the introduction of assignment expressions (PEP 572) (:= operator), it's possible to use a local variable within a list comprehension in order to avoid calling twice the same function:

In our case, we can name the evaluation of f(x) as a variable y while using the result of the expression to filter the list but also as the mapped value:

[y for x in l if (y := f(x))]

Here is my solution:

filter(None, [f(x) for x in l])

How about defining:

def truths(L):
    """Return the elements of L that test true"""
    return [x for x in L if x]

So that, for example

> [wife.children for wife in henry8.wives]
[[Mary1], [Elizabeth1], [Edward6], [], [], []]

> truths(wife.children for wife in henry8.wives) 
[[Mary1], [Elizabeth1], [Edward6]]

Need Your Help

Removing old indices in elasticsearch

elasticsearch logstash elasticsearch-plugin

I have the many of my logs indexed in logstash-Year-Week format. That is if i want to delete indices older than a few weeks, how can I achieve that in elasticsearch. Is there an easy, seamless way ...

Is it possible to define a C macro in a makefile?

c makefile compilation

Is it possible to put the equivalent of #define VAR (in a C program) into a makefile, so that one can control which part of the program should be compiled?