Python: iterate over a sublist

Generally, when you want to iterate over a portion of a list in Python, the easiest thing to do is just slice the list.

# Iterate over everything except the first item in a list
#
items = [1,2,3,4]
iterrange = (x for x in items[1:])

But the slice operator creates a new list, which is not even necessary to do in many cases. Ideally, I'd like some kind of slicing function that creates generators, as opposed to new list objects. Something similar to this could be accomplished by creating a generator expression that uses a range to return only certain portions of the list:

# Create a generator expression that returns everything except 
# the first item in the list
#
iterrange = (x for x, idx in zip(items, range(0, len(items))) if idx != 0)

But this is sort of cumbersome. I'm wondering if there is a better, more elegant way to do this. So, what's the easiest way to slice a list so that a generator expression is created instead of a new list object?

Answers


Use itertools.islice:

import itertools

l = range(20)

for i in itertools.islice(l,10,15):
    print i

10
11
12
13
14

From the doc:

Make an iterator that returns selected elements from the iterable


Before I start, to be clear, the correct order of selecting between slicing approaches is usually:

  1. Use regular slicing (the cost of copying all but the longest of inputs is usually not meaningful, and the code is much simpler). If the input might not be a sliceable sequence type, convert it to one, then slice, e.g. allbutone = list(someiterable)[1:]. This is simpler, and for most cases, typically faster, than any other approach.
  2. If regular slicing isn't viable (the input isn't guaranteed to be a sequence and converting to a sequence before slicing might cause memory issues, or it's huge and the slice covers most of it, e.g. skipping the first 1000 and last 1000 elements of a 10M element list, so memory might be a concern), itertools.islice is usually the correct solution as it's simple enough, and the performance cost is usually unimportant.
  3. If, and only if, islice's performance is unacceptably slow (it adds some overhead to producing every item, though admittedly it's quite a small amount) and the amount of data to be skipped is small, while the data to be included is huge (e.g. the OP's scenario of skipping a single element and keeping the rest), keep reading

If you find yourself in case #3, you're in a scenario where islice's ability to bypass initial elements (relatively) quickly isn't enough to make up for the incremental cost to produce the rest of the elements. In that case, you can improve performance by reversing your problem from selecting all elements after n to discarding all elements before n.

For this approach, you manually convert your input to an iterator, then explicitly pull out and discard n values, then iterate what's left in the iterator (but without the per-element overhead of islice). For example, for an input of myinput = list(range(1, 10000)), your options for selecting elements 1 through the end are:

# Approach 1, OP's approach, simple slice:
for x in myinput[1:]:

# Approach 2, Sebastian's approach, using itertools.islice:
for x in islice(myinput, 1, None):

# Approach 3 (my approach)
myiter = iter(myinput)  # Explicitly create iterator from input (looping does this already)
next(myiter, None) # Throw away one element, providing None default to avoid StopIteration error
for x in myiter:  # Iterate unwrapped iterator

If the number of elements to discard is larger, it's probably best to borrow the consume recipe from the itertools docs:

def consume(iterator, n=None):
    "Advance the iterator n-steps ahead. If n is None, consume entirely."
    # Use functions that consume iterators at C speed.
    if n is None:
        # feed the entire iterator into a zero-length deque
        collections.deque(iterator, maxlen=0)
    else:
        # advance to the empty slice starting at position n
        next(islice(iterator, n, n), None)

which makes the approaches generalize for skipping n elements to:

# Approach 1, OP's approach, simple slice:
for x in myinput[n:]:

# Approach 2, Sebastian's approach, using itertools.islice:
for x in islice(myinput, n, None):

# Approach 3 (my approach)
myiter = iter(myinput)  # Explicitly create iterator from input (looping does this already)
consume(myiter, n)      # Throw away n elements
# Or inlined consume as next(islice(myiter, n, n), None)
for x in myiter:        # Iterate unwrapped iterator

Performance-wise, this wins by a meaningful amount for most large inputs (exception: range itself on Python 3 is already optimized for plain slicing; plain slicing can't be beat on actual range objects). ipython3 microbenchmarks (on CPython 3.6, 64 bit Linux build) illustrate this (the definition of slurp in the setup is just a way to make the lowest overhead approach to running out an iterable so we minimize the impact of the stuff we're not interested in):

>>> from itertools import islice
>>> from collections import deque
>>> %%timeit -r5 slurp = deque(maxlen=0).extend; r = list(range(10000))
... slurp(r[1:])
...
65.8 μs ± 109 ns per loop (mean ± std. dev. of 5 runs, 10000 loops each)

>>> %%timeit -r5 slurp = deque(maxlen=0).extend; r = list(range(10000))
... slurp(islice(r, 1, None))
...
70.7 μs ± 104 ns per loop (mean ± std. dev. of 5 runs, 10000 loops each)

>>> %%timeit -r5 slurp = deque(maxlen=0).extend; r = list(range(10000))
... ir = iter(r)
... next(islice(ir, 1, 1), None)  # Inlined consume for simplicity, but with islice wrapping to show generalized usage
... slurp(ir)
...
30.3 μs ± 64.1 ns per loop (mean ± std. dev. of 5 runs, 10000 loops each)

Obviously, the extra complexity of my solution isn't usually going to be worth it, but for moderate sized inputs (10K elements in this case), the performance benefit is clear; islice was the worst performer (by a small amount), plain slicing was slightly better (which reinforces my point about plain slicing almost always being the best solution when you have an actual sequence), and the "convert to iterator, discard initial, use rest" approach won by a huge amount, relatively speaking (well under half the time of either of the under solutions).

That benefit won't show up for tiny inputs, because the fixed overhead of loading/calling iter/next, and especially islice, will outweigh the savings:

>>> %%timeit -r5 slurp = deque(maxlen=0).extend; r = list(range(10))
... slurp(r[1:])
...
207 ns ± 1.86 ns per loop (mean ± std. dev. of 5 runs, 1000000 loops each)

>>> %%timeit -r5 slurp = deque(maxlen=0).extend; r = list(range(10))
... slurp(islice(r, 1, None))
...
307 ns ± 1.71 ns per loop (mean ± std. dev. of 5 runs, 1000000 loops each)

>>> %%timeit -r5 slurp = deque(maxlen=0).extend; r = list(range(10))
... ir = iter(r)
... next(islice(ir, 1, 1), None)  # Inlined consume for simplicity, but with islice wrapping to show generalized usage
... slurp(ir)
...
518 ns ± 4.5 ns per loop (mean ± std. dev. of 5 runs, 1000000 loops each)

>>> %%timeit -r5 slurp = deque(maxlen=0).extend; r = list(range(10))
... ir = iter(r)
... next(ir, None)  # To show fixed overhead of islice, use next without it
... slurp(ir)
...
341 ns ± 0.947 ns per loop (mean ± std. dev. of 5 runs, 1000000 loops each)

but as you can see, even for 10 elements the islice-free approach isn't much worse; by 100 elements, the islice-free approach is faster than all competitors, and by 200 elements, the generalized next+islice beats all competitors (obviously it doesn't beat islice-free given the 180 ns overhead of islice, but that's made up for by generalizing to skipping n elements as a single step, rather than needing to call next repeatedly for skipping more than one element). Plain islice rarely wins in the "skip a few, keep a lot" case due to the per element overhead the wrapper exacts (it didn't clearly beat eager slicing in the microbenchmarks until around 100K elements; it's memory efficient, but CPU inefficient), and it will do even worse (relative to eager slicing) in the "skip a lot, keep a few" case.


Try itertools.islice:

http://docs.python.org/library/itertools.html#itertools.islice

iterrange = itertools.islice(items, 1, None)

Need Your Help

How do I pass 2 parameters to Nant script?

.net ant build-automation nant

I have to write an Nant script that will accept 2 parameters on the command line. The first is just an option and has the following format: -myOption. The second one needs to be wrapped in quotes: ...