This article is the first in what will likely be a series on programming patterns that should be avoided or used with caution in Python.
Consider this general pattern:
    some_iterable = [4, 5, 6]
    for x in some_iterable:
        # Do something
        pass
    # Do something with final 'x' here, outside the for block
    print(x)  # Output: 6
To those used to C-style scoping, the above example would seem a little off because x, though first appearing in a for block, is still in scope after the block exits.
In Python, blocks such as for do not create local scopes. Thus, it may be tempting to keep using a variable that was first assigned in such a block in code beyond the block. However, this should be avoided.
Consider what happens with this pattern if the iterable is empty:

    some_iterable = []

    for x in some_iterable:
        pass

    print(x)
Output (portion of traceback omitted):
    line 6, in <module>
        print(x)
    NameError: name 'x' is not defined
The above example is what makes this pattern dangerous. If the iterable in the for statement turns out to be empty, then x is never assigned, so the name is not even defined (as the traceback shows).
If not considered, this could lead to an unexpected bug down the road.
The fix is to explicitly set a fallback value in the "outer scope":
    some_iterable = []
    x = None
    for x in some_iterable:
        pass
    print(x)  # Output: None
This fix, though very simple, is easy to forget. If x is not accessed or modified outside of the for block, there is no need for the fallback value, of course.
In more general terms, I think a good rule of thumb is to act as though for/while blocks do create a local scope, because in the case that they are never entered, they effectively do. In a scenario where you wish to use a variable both inside and outside of a for/while block, always instantiate it at the outermost level in which it is used.
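The same rule of thumb can be sketched with a while block. The function name and its argument here are made up for illustration; the only point is that last_item is assigned before the loop, so it exists even when the loop body never runs.

```python
def drain(pending):
    """Remove items from the front of a list; return the last one removed."""
    last_item = None  # Fallback, set at the outermost level where it is used
    while pending:
        last_item = pending.pop(0)
    return last_item


print(drain([1, 2, 3]))  # Output: 3
print(drain([]))         # Output: None
```

Without the fallback assignment, the second call would fail in the same way as the for example above.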
I tripped myself up with this error in making this very blog application.
First, some background to the example. Articles for this site are drafted in Markdown. When they are saved (via the admin site) their resulting HTML markup is cached, as there is no need to (*shudder*) re-parse the Markdown on every page load. Part of rendering the Markdown is auto-generating a table of contents from the content and structure of the headings in the document (<h3>, <h4>, etc.). However, if an article is short enough, the table of contents is not included; article length is estimated from the number of headings. By default, if an article has fewer than 5 total headings, the table of contents is not inserted into the HTML.
The actual function is needlessly complex (lots of BeautifulSoup nonsense) for the purpose of this discussion, so it's easier to discuss with the irrelevant portions stripped:
    import re

    def make_table_of_contents(html, min_headings=5):
        # Resulting HTML for the table of contents, actually a BeautifulSoup object
        toc = ''

        # Iterate through each <h*> tag in the document (where * is a digit)
        for i, heading_tag in enumerate(html.find_all(re.compile(r'h\d$'))):
            # Build toc tree as needed...
            pass

        # Return toc, or empty string if heading quota not met
        return toc if i + 1 >= min_headings else ''
In this case, it made sense to build the whole tree even if it would not be used, since either way the whole document has to be parsed to find the number of headings. Since i already tracked the number of headings encountered (less one, hence the i + 1), I thoughtlessly threw it in the return statement.
Let's get this out of the way now - my biggest problem was I didn't immediately write tests for this function. Boo.
The function worked fine for a while, until I threw together a tiny test article that had no headings. The admin site then kerploded, raising an UnboundLocalError (a subclass of NameError, raised when a function's local variable is read before it has been assigned). The problem was exactly what is described in the preceding sections. If there are no headings to iterate through, i is not defined, so the return statement fails.
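The failure is easy to reproduce without BeautifulSoup. In this sketch, meets_heading_quota_buggy is a hypothetical stand-in for the real function, and a plain list of strings stands in for the parsed heading tags; only the loop-and-return shape is taken from the function above.

```python
def meets_heading_quota_buggy(headings, min_headings=5):
    """Hypothetical stand-in with the same loop/return shape as the real function."""
    for i, heading in enumerate(headings):
        pass  # Build toc tree as needed...
    # 'i' is only bound if the loop body ran at least once
    return i + 1 >= min_headings


print(meets_heading_quota_buggy(['a', 'b', 'c', 'd', 'e']))  # Output: True

try:
    meets_heading_quota_buggy([])  # No headings: the loop never runs
except UnboundLocalError as exc:
    print(type(exc).__name__)  # Output: UnboundLocalError
```

A test along these lines, with an empty list as one of its inputs, would likely have caught the problem before the admin site did.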
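Adding i = 0 up at the top of the function fixes the crash (strictly speaking, i = -1 would keep i + 1 equal to the true heading count when there are no headings). However, I'd argue the more readable (thus pythonic) solution would be to define a separate heading_count variable (that can still be updated using i in the loop).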
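A minimal sketch of the heading_count approach might look like the following. It is not the site's actual code: the function name is hypothetical, a list of heading strings stands in for the BeautifulSoup tags, the table of contents is built as a plain-text list, and starting enumerate at 1 is my own choice so that i is already the running count.

```python
def make_table_of_contents_sketch(headings, min_headings=5):
    """Simplified, hypothetical stand-in for make_table_of_contents."""
    toc = ''
    heading_count = 0  # Defined up front, so it exists even with no headings

    for i, heading in enumerate(headings, start=1):
        toc += '- ' + heading + '\n'  # Build toc as needed...
        heading_count = i

    # Return toc, or empty string if heading quota not met
    return toc if heading_count >= min_headings else ''


print(repr(make_table_of_contents_sketch([])))          # Output: ''
print(repr(make_table_of_contents_sketch(['Intro'])))   # Output: ''
```

Because heading_count is assigned before the loop, the return statement no longer depends on the loop having run, and the name says what the value means.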