Generators
Generators produce values one at a time, on demand, instead of building the whole collection in memory. This makes them essential for working with large files, streams, and infinite sequences.
The Problem: Memory
If you want to process a million items, building a list of all million items first wastes memory. A generator produces each item only when asked.
import sys
# List — builds ALL items in memory immediately
big_list = [x**2 for x in range(1_000_000)]
print(f"List size: {sys.getsizeof(big_list):,} bytes")
# Generator — builds nothing upfront, produces one item at a time
big_gen = (x**2 for x in range(1_000_000))
print(f"Generator size: {sys.getsizeof(big_gen):,} bytes")
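# Both produce the same values when iterated.
# The generator object stays at a couple of hundred bytes at most (the exact
# figure varies by Python version), while the list needs ~8 MB for its
# references alone; sys.getsizeof does not count the integers it points to.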
# You can still use a generator in for loops and sum()
total = sum(x**2 for x in range(100))
print(f"Sum of squares 0-99: {total}")
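Because sys.getsizeof only measures the container object itself, a rough way to compare the memory actually used during processing is the standard-library tracemalloc module. The sketch below is illustrative only: peak_bytes is a hypothetical helper name, N is an arbitrary size chosen to keep the run fast, and the exact byte counts will vary by Python version and platform.

```python
import tracemalloc

def peak_bytes(make_iterable):
    """Hypothetical helper: sum an iterable and report peak traced memory."""
    tracemalloc.start()
    total = sum(make_iterable())
    _, peak = tracemalloc.get_traced_memory()  # returns (current, peak)
    tracemalloc.stop()
    return total, peak

N = 200_000  # arbitrary size for the comparison

# The list comprehension materializes every square before sum() starts.
list_total, list_peak = peak_bytes(lambda: [x**2 for x in range(N)])

# The generator expression hands sum() one square at a time.
gen_total, gen_peak = peak_bytes(lambda: (x**2 for x in range(N)))

print(f"List peak:      {list_peak:,} bytes")
print(f"Generator peak: {gen_peak:,} bytes")
print(f"Same result:    {list_total == gen_total}")
```

Both versions compute the same total; only the peak memory differs, since the generator never holds more than one square at a time.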
Generator Functions with yield
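A generator function uses yield instead of return. Calling it runs none of its body; it returns a generator object. The first next() on that object runs the body up to the first yield, and each later next() resumes right after the yield where execution paused.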
def countdown(n):
    print("Starting countdown!")
    while n > 0:
        yield n  # pause here, hand back n
        n -= 1
    print("Done!")
gen = countdown(3)
print(next(gen)) # prints "Starting countdown!" then yields 3
print(next(gen)) # resumes, yields 2
print(next(gen)) # resumes, yields 1
# A fourth next(gen) would print "Done!" and then raise StopIteration
# Usually you just use it in a for loop
for value in countdown(5):
    print(value, end=" ")
# Infinite generator: perfectly fine because it's lazy
def integers_from(n):
    while True:
        yield n
        n += 1
gen = integers_from(10)
print([next(gen) for _ in range(5)]) # [10, 11, 12, 13, 14]
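Two related idioms may help here, shown as a minimal sketch rather than the only way to do it. itertools.islice takes a fixed number of items from an infinite generator without calling next() by hand, and the drain function (a hypothetical helper written for illustration) approximates what a for loop does behind the scenes: it calls next() until StopIteration is raised. The generator functions are repeated so the block runs on its own.

```python
from itertools import islice

def integers_from(n):
    """Infinite counter, as in the example above."""
    while True:
        yield n
        n += 1

def countdown(n):
    """Finite countdown, without the print calls."""
    while n > 0:
        yield n
        n -= 1

# islice(iterable, stop) lazily yields at most `stop` items.
first_five = list(islice(integers_from(10), 5))
print(first_five)  # [10, 11, 12, 13, 14]

def drain(iterator):
    """Hypothetical helper: roughly what a for loop does internally."""
    collected = []
    while True:
        try:
            value = next(iterator)
        except StopIteration:  # the generator function has returned
            break
        collected.append(value)
    return collected

drained = drain(countdown(3))
print(drained)  # [3, 2, 1]
```

A for loop handles StopIteration for you, which is why manual next() calls are rarely needed outside of demonstrations.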
Real-World Use Cases
# Fibonacci: infinite sequence using almost no memory
def fibonacci():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b
# Take first 10
fib = fibonacci()
first_10 = [next(fib) for _ in range(10)]
print(first_10) # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
# Process a (simulated) large CSV line-by-line
def read_large_file(lines):
    for line in lines:
        yield line.strip()
fake_csv = ["alice,25", "bob,30", "carol,22"]
for row in read_large_file(fake_csv):
    name, age = row.split(",")
    print(f"{name} is {age} years old")
# Generator pipeline — each stage is lazy
data = range(1, 11)
doubled = (x * 2 for x in data)
filtered = (x for x in doubled if x > 10)
result = list(filtered)
print(result) # [12, 14, 16, 18, 20]
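The file example above works on an in-memory list, so here is a sketch of how the same idea might look against an actual file, combined with the pipeline style. Everything in it is an assumption for illustration: the function names read_lines, parse_rows, and older_than are hypothetical, and a small temporary file stands in for a large CSV.

```python
import os
import tempfile

def read_lines(path):
    """Yield stripped lines from a file, one at a time."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:  # file objects are iterated lazily, line by line
            yield line.strip()

def parse_rows(lines):
    """Yield (name, age) tuples from 'name,age' lines."""
    for line in lines:
        name, age = line.split(",")
        yield name, int(age)

def older_than(rows, minimum):
    """Yield only the rows whose age exceeds `minimum`."""
    for name, age in rows:
        if age > minimum:
            yield name, age

# Write a tiny sample file (a stand-in for a genuinely large CSV).
with tempfile.NamedTemporaryFile(
    "w", suffix=".csv", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("alice,25\nbob,30\ncarol,22\n")
    sample_path = tmp.name

try:
    # Nothing is read until list() starts pulling items through the stages.
    pipeline = older_than(parse_rows(read_lines(sample_path)), 23)
    matches = list(pipeline)
finally:
    os.remove(sample_path)

print(matches)  # [('alice', 25), ('bob', 30)]
```

Each stage pulls one line from the stage before it, so only a single row is in flight at any moment regardless of the file's size.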