Iterable objects¶
In Python, we have objects that can be iterated by default: lists, tuples, sets and dictionaries. However, the built-in iterable objects are not the only kind that we can have in a for loop. We could also create our own iterable, with the logic we define for iteration.
In order to achieve this, we rely on magic methods. Iteration works in Python by its own protocol (namely the iteration
protocol). When you try to iterate an object in the form for e in myobject:
…, what Python checks at a very high
level are the following two things, in order:
If the object contains one of the iterator methods
__next__
or__iter__
If the object is a sequence and has
__len__
and__getitem__
Therefore, as a fallback mechanism, sequences can be iterated, and so there are two ways of customizing our objects to be able to work on for loops.
1. Creating iterable objects¶
When we try to iterate an object, Python will call the iter()
function over it. One of the first things this
function checks for is the presence of the __iter__
method on that object, which, if present, will be executed.
__iter__
should return the iterator itself.
The following code creates an object that allows iterating over a range of dates, producing one day at a time on every round of the loop:
from datetime import timedelta
class DateRangeIterable:
"""An iterable that contains its own iterator object."""
def __init__(self, start_date, end_date):
self.start_date = start_date
self.end_date = end_date
self._present_day = start_date
def __iter__(self):
return self
def __next__(self):
if self._present_day >= self.end_date:
raise StopIteration
today = self._present_day
self._present_day += timedelta(days=1)
return today
This object is designed to be created with a pair of dates, and when iterated, it will produce each day in the interval of specified dates, which is shown in the following code:
>>> for day in DateRangeIterable(date(2018, 1, 1), date(2018, 1, 5)):
... print(day)
2018-01-01
2018-01-02
2018-01-03
2018-01-04
Here, the for loop is starting a new iteration over our object. At this point, Python will call the iter()
function
on it, which in turn will call the __iter__
magic method. On this method, it is defined to return self
,
indicating that the object is an iterable itself, so at that point every step of the loop will call the next()
function on that object, which delegates to the __next__
method. In this method, we decide how to produce the
elements and return one at a time. When there is nothing else to produce, we have to signal this to Python by raising
the StopIteration exception.
This means that what is actually happening is similar to Python calling next()
every time on our object until there
is a StopIteration exception, on which it knows it has to stop the for loop:
This example works, but it has a small problem—once exhausted, the iterable will continue to be empty, hence raising StopIteration. This means that if we use this on two or more consecutive for loops, only the first one will work, while the second one will be empty:
>>> r1 = DateRangeIterable(date(2018, 1, 1), date(2018, 1, 5))
>>> ", ".join(map(str, r1))
'2018-01-01, 2018-01-02, 2018-01-03, 2018-01-04'
>>> max(r1)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: max() arg is an empty sequence
This is because of the way the iteration protocol works: an iterable constructs an iterator, and this one is the one
being iterated over. In our example, __iter__
just returned self, but we can make it create a new iterator every
time it is called. One way of fixing this would be to create new instances of DateRangeIterable, which is not a
terrible issue, but we can make __iter__
use a generator (which are iterator objects), which is being created
every time:
class DateRangeContainerIterable:
def __init__(self, start_date, end_date):
self.start_date = start_date
self.end_date = end_date
def __iter__(self):
current_day = self.start_date
while current_day < self.end_date:
yield current_day
current_day += timedelta(days=1)
And this time, it works:
>>> r1 = DateRangeContainerIterable(date(2018, 1, 1), date(2018, 1, 5))
>>> ", ".join(map(str, r1))
'2018-01-01, 2018-01-02, 2018-01-03, 2018-01-04'
>>> max(r1)
datetime.date(2018, 1, 4)
The difference is that each for loop is calling __iter__
again, and each one of those is creating the generator
again. This is called a container iterable.
..note:: In general, it is a good idea to work with container iterables when dealing with generators.
2. Creating sequences¶
Maybe our object does not define the __iter__()
method, but we still want to be able to iterate over it. If
__iter__
is not defined on the object, the iter()
function will look for the presence of __getitem__
, and if
this is not found, it will raise TypeError.
A sequence is an object that implements __len__
and __getitem__
and expects to be able to get the elements it
contains, one at a time, in order, starting at zero as the first index. This means that you should be careful in the
logic so that you correctly implement __getitem__
to expect this type of index, or the iteration will not work.
The example from the previous section had the advantage that it uses less memory. This means that is only holding one date at a time, and knows how to produce the days one by one. However, it has the drawback that if we want to get the n-th element, we have no way to do so but iterate n-times until we reach it. This is a typical trade-off in computer science between memory and CPU usage.
The implementation with an iterable will use less memory, but it takes up to O(n) to get an element, whereas implementing a sequence will use more memory (because we have to hold everything at once), but supports indexing in constant time, O(1).
This is what the new implementation might look like:
class DateRangeSequence:
def __init__(self, start_date, end_date):
self.start_date = start_date
self.end_date = end_date
self._range = self._create_range()
def _create_range(self):
days = []
current_day = self.start_date
while current_day < self.end_date:
days.append(current_day)
current_day += timedelta(days=1)
return days
def __getitem__(self, day_no):
return self._range[day_no]
def __len__(self):
return len(self._range)
Here is how the object behaves:
>>> s1 = DateRangeSequence(date(2018, 1, 1), date(2018, 1, 5))
>>> for day in s1:
... print(day)
2018-01-01
2018-01-02
2018-01-03
2018-01-04
>>> s1[0]
datetime.date(2018, 1, 1)
>>> s1[3]
datetime.date(2018, 1, 4)
>>> s1[-1]
datetime.date(2018, 1, 4)
In the preceding code, we can see that negative indices also work. This is because the DateRangeSequence object delegates all of the operations to its wrapped object (a list), which is the best way to maintain compatibility and a consistent behavior.
Evaluate the trade-off between memory and CPU usage when deciding which one of the two possible implementations to use. In general, the iteration is preferable (and generators even more), but keep in mind the requirements of every case.