This article is the first in what will likely be a series on programming patterns that should be avoided or used with caution in Python.
The Pattern
Consider this general pattern:
    some_iterable = [4, 5, 6]

    for x in some_iterable:
        # Do something
        pass

    # Do something with final 'x' here, outside the for block
    print(x)  # Output: 6
To those used to C-style scoping, the above example may look a little off because x, though first appearing in a for block, is still in scope after the block exits.
In Python, blocks such as for do not create local scopes. Thus, it may be tempting to keep using a variable first assigned in such a block in code beyond the block. However, this should be avoided.
The Problem
Consider what happens with this pattern if the iterable turns out to be empty:
    some_iterable = []

    for x in some_iterable:
        pass

    print(x)
Output (portion of traceback omitted):
    line 6, in <module>
      print(x)
    NameError: name 'x' is not defined
The above example is what makes this pattern dangerous. If the iterable in the for statement is empty, the loop body never runs, so x is never assigned and the name does not exist at all (as the traceback shows).
If not considered, this could lead to an unexpected bug down the road.
The Fix
The fix is to explicitly set a fallback value in the "outer scope":
    some_iterable = []
    x = None

    for x in some_iterable:
        pass

    print(x)
This fix, though very simple, is easy to forget. If x is not accessed or modified outside of the for block, there is no need for the fallback value, of course.
In more general terms, I think a good rule of thumb is to act as though for/while blocks do create a local scope, because in the case that they are never entered, they effectively do. In a scenario where you wish to use a variable both inside and outside of a for/while block, always instantiate it at the outermost level in which it is used.
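One wrinkle with a None fallback is that it cannot be told apart from an iterable whose last element really is None. A minimal sketch of one way around that, assuming a hypothetical helper named last_item and a private sentinel object (neither comes from the blog's code):

```python
_MISSING = object()  # Hypothetical sentinel: distinct from every real value, including None


def last_item(iterable, default=None):
    """Return the final element of iterable, or default if it is empty."""
    last = _MISSING  # Bound at the outermost level where it is used
    for last in iterable:
        pass
    # If the loop body never ran, 'last' is still the sentinel
    return default if last is _MISSING else last


print(last_item([4, 5, 6]))          # 6
print(last_item([]))                 # None
print(last_item([], default=0))      # 0
print(last_item([None], default=0))  # None (a real element, not the fallback)
```

The sentinel is only worth the extra line when None is a legitimate value in the data; otherwise the plain x = None fallback is enough.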
Real-World Example
I tripped myself up with this error in making this very blog application.
First, some background to the example. Articles for this site are drafted in Markdown. When they are saved (via the admin site) their resulting HTML markup is cached, as there is no need to (*shudder*) re-parse the Markdown on every page load. Part of this process of marking up the Markdown is auto-generating a table of contents from the content and structure of the headings in the document (<h3>, <h4>, etc.). However, if an article is short enough, the table of contents is not included; article length is estimated from the number of headings. By default, if an article has fewer than 5 total headings, the table of contents is not inserted into the HTML.
The actual function is needlessly complex (lots of BeautifulSoup nonsense) for the purpose of this discussion, so it's easier to discuss with the irrelevant portions stripped:
    import re

    def make_table_of_contents(html, min_headings=5):
        toc = ''  # Resulting HTML for the table of contents, actually a BeautifulSoup object

        # Iterate through each <h*> tag in the document (where * is a digit)
        for i, heading_tag in enumerate(html.find_all(re.compile(r'h\d$'))):
            # Build toc tree as needed...
            pass

        # Return toc, or empty string if heading quota not met
        # (this is the buggy line: i is unbound if the loop never ran)
        return toc if i + 1 >= min_headings else ''
In this case, it made sense to build the whole tree even if it would not be used, since either way the whole document has to be parsed to find the number of headings. Since i already tracked how many headings had been encountered (i + 1 once the loop finishes), I thoughtlessly threw it in the return statement.
Let's get this out of the way now - my biggest problem was I didn't immediately write tests for this function. Boo.
The function worked fine for a while, until I threw together a tiny test article that had no headings. The admin site then kerploded, raising an UnboundLocalError (the subclass of NameError that Python raises when the unbound name is a local variable inside a function). The problem was exactly what is described in the preceding sections. If there are no headings to iterate through, i is never assigned, so the return statement fails.
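To see the same failure in isolation, here is a small sketch that reproduces it without BeautifulSoup. The function name final_loop_value is made up for illustration; it simply returns the loop variable after the loop, the way the return statement above relies on i.

```python
def final_loop_value(iterable):
    # Hypothetical helper that mirrors the buggy pattern:
    # the loop variable is used after the loop without a fallback.
    for value in iterable:
        pass
    return value  # 'value' is unbound if the loop body never ran


print(final_loop_value([4, 5, 6]))  # 6

caught = None
try:
    final_loop_value([])
except UnboundLocalError as exc:
    caught = exc  # Keep a reference; 'exc' itself is cleared after the except block
    print(type(exc).__name__)  # UnboundLocalError

# Inside a function the error is UnboundLocalError; at module level it is NameError
print(issubclass(UnboundLocalError, NameError))  # True
```

The exact wording of the error message differs between Python versions, so the sketch only prints the exception's type.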
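Sticking an i = 0 up at the top of the function stops the crash. (Strictly, i = -1 is the correct fallback, since i + 1 should be 0 when there are no headings; i = 0 only works while min_headings is greater than 1.) However, I'd argue the more readable (thus Pythonic) solution would be to define a separate heading_count variable (that can still be updated using i in the loop).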
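A minimal sketch of the heading_count approach, with the BeautifulSoup parts swapped out for something self-contained. It assumes the headings arrive as a plain list of strings, and the numbered-line output is invented for illustration; the real function works on parsed HTML and builds a tag tree instead.

```python
def make_table_of_contents(headings, min_headings=5):
    """Simplified stand-in: 'headings' is a list of heading strings (an assumption)."""
    toc_lines = []
    heading_count = 0  # Defined before the loop, so it exists even with no headings

    for i, heading in enumerate(headings):
        heading_count = i + 1  # Still updated from i, but the name says what it means
        toc_lines.append(f"{heading_count}. {heading}")

    # Return toc, or empty string if heading quota not met
    return "\n".join(toc_lines) if heading_count >= min_headings else ""


print(repr(make_table_of_contents([])))  # ''
print(make_table_of_contents(["Intro", "Setup"], min_headings=2))
```

With no headings, heading_count stays at 0, so the comparison in the return statement is well defined for any min_headings, including 1.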