Late initialization with mypy

Sunday 12 February 2023

Let’s say you have a complex class with a number of attributes. The class is used in a few different ways, so sometimes the attributes are available, but sometimes they haven’t been initialized yet. Because of global knowledge about how the class is used, we know which paths are certain to have the attributes, and which might not have them.

[UPDATE: I’ve changed my mind: Late initialization, reconsidered]

(If you are interested, the real code I’m thinking about is from coverage.py, but this post has toy examples for clarity.)

Before static type checking, I’d initialize these attributes to None. In the certain-to-exist code paths, I’d just use the attributes. In the uncertain code paths, I’d check if an attribute was None before using it:

# Original untyped code.
class Complicated:
    def __init__(self):
        self.other = None

    def make_other(self):
        self.other = OtherThing()

    def certain_path(self):
        self.other.do_something()

    def uncertain_path(self):
        if self.other is not None:
            self.other.do_something()

How should I add type annotations to a situation like this? The most obvious approach is to declare the attribute as Optional. But that means adding asserts to the certain paths. Without them, the type checker will warn us that the attribute might be None. Type checkers don’t have the global understanding that makes us certain about them being available on those paths. Now we need extra code for both certain and uncertain paths: asserts for one and run-time checks for the other:

# Simple Optional typing.
from typing import Optional

class Complicated:
    def __init__(self):
        self.other: Optional[OtherThing] = None

    def make_other(self):
        self.other = OtherThing()

    def certain_path(self):
        assert self.other is not None
        self.other.do_something()

    def uncertain_path(self):
        if self.other is not None:
            self.other.do_something()

This is a pain if there are many certain paths, or many of these attributes to deal with. It just adds clutter.

A second option is to have the attribute exist or not exist rather than be None or not None. We can type these ghostly attributes as definitely not None, but then we have to check if it exists in the uncertain paths:

# Ghost: attribute exists or doesn't exist.
class Complicated:
    def __init__(self):
        # declared but not defined:
        self.other: OtherThing

    def make_other(self):
        self.other = OtherThing()

    def certain_path(self):
        self.other.do_something()

    def uncertain_path(self):
        if hasattr(self, "other"):
            self.other.do_something()

This is strange: you don’t often see a class that doesn’t know in its own code whether attributes exist or not. This is how I first adjusted the coverage.py code with type annotations: six attributes declared but not defined. But it didn’t sit right with me, so I kept experimenting.
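To make "declared but not defined" concrete: an annotation on an attribute with no assigned value only informs the type checker. At run time, no attribute exists until something assigns it. Here is a minimal sketch that uses a hypothetical `Demo` class with an `int` attribute in place of the real `OtherThing`:

```python
class Demo:
    """Hypothetical toy class, not from coverage.py."""

    def __init__(self) -> None:
        # Annotation only: mypy learns the type, but nothing is assigned,
        # so the instance has no "other" attribute yet.
        self.other: int

    def make_other(self) -> None:
        self.other = 42


d = Demo()
print(hasattr(d, "other"))  # False
print(vars(d))              # {}

d.make_other()
print(hasattr(d, "other"))  # True
```

Reading `d.other` before `make_other()` raises `AttributeError`. With ghost attributes, that is how a mistakenly "certain" path shows up.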

A third option is to use two attributes for the same value: one is typed Optional and one is not. This lets us avoid asserts on the certain paths, but is really weird and confusing:

# Two attributes for the same value.
from typing import Optional

class Complicated:
    def __init__(self):
        self.other: OtherThing
        self.other_maybe: Optional[OtherThing] = None

    def make_other(self):
        self.other = self.other_maybe = OtherThing()

    def certain_path(self):
        self.other.do_something()

    def uncertain_path(self):
        if self.other_maybe is not None:
            self.other_maybe.do_something()

But if we’re going to use two attributes in the place of one, why not make it the value and a boolean?

# Value and boolean.
class Complicated:
    def __init__(self):
        self.other: OtherThing
        self.other_exists: bool = False

    def make_other(self):
        self.other = OtherThing()
        self.other_exists = True

    def certain_path(self):
        self.other.do_something()

    def uncertain_path(self):
        if self.other_exists:
            self.other.do_something()

This is about the same as “exists or doesn’t exist”, but with a second nearly-useless attribute, so what’s the point?

Another option: the attribute always exists, and is never None, but is sometimes a placebo implementation that does nothing for those times when we don’t want it:

# Placebo

class OtherPlacebo(OtherThing):
    def do_something(self):
        pass

class Complicated:
    def __init__(self):
        self.other: OtherThing = OtherPlacebo()

    def make_other(self):
        self.other = OtherThing()

    def certain_path(self):
        self.other.do_something()

    def uncertain_path(self):
        self.other.do_something()

A philosophical quandary about placebos: should they implement all the base class methods, or only those that we know will be invoked in the uncertain code paths? Type checkers are fine with either, and run-time is of course fine with only the subset.
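The bare-bones choice has one consequence worth seeing. Because the placebo subclasses the real class, any method it does not override falls back to the real implementation, and neither the type checker nor the runtime will flag this. Here is a minimal sketch with a hypothetical `OtherThing` that records its calls. The `flush` method is invented for illustration:

```python
class OtherThing:
    """Hypothetical stand-in for the real collaborator."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    def do_something(self) -> None:
        self.calls.append("do_something")

    def flush(self) -> None:
        self.calls.append("flush")


class OtherPlacebo(OtherThing):
    # Bare-bones placebo: overrides only the method the uncertain paths call.
    def do_something(self) -> None:
        pass


p = OtherPlacebo()
p.do_something()  # no-op, as intended
p.flush()         # not overridden: runs the real OtherThing.flush
print(p.calls)    # ['flush']
```

Implementing every base-class method in the placebo closes that gap at the cost of more code. Implementing only the subset relies on the same global knowledge about which methods the uncertain paths actually invoke.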

In the end, I liked the placebo strategy best: it removes the need for any checking or asserts. I implemented the placebos as bare-bones with only the needed methods. It can make the logic a bit harder to understand at a glance, but I think I mostly don’t need to know whether it’s a placebo or not in any given spot. Maybe six months from now I’ll be confused by the switcheroos happening, but it looks good right now.

Comments

Anon 4:59 PM on 12 Feb 2023

It seems to me that it would be simplest to just use Optional and check for is not None even if it is slightly tedious.

Adding placebo objects seems to overcomplicate things in order to evade typing as well as being nonstandard.

If there’s a lot of these checks, you could add a maybe_do_with_other(self, action) method that does the checking in a single place, although I’m not sure off the top of my head how that would interact with typing.
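Such a helper can be typed so that the None check happens in one place and callers never handle an Optional. This is a minimal sketch. It assumes the action is a callable that takes the non-None `OtherThing`, and that a hypothetical `OtherThing.do_something` returns a string:

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class OtherThing:
    """Hypothetical stand-in for the real collaborator."""

    def do_something(self) -> str:
        return "done"


class Complicated:
    def __init__(self) -> None:
        self.other: Optional[OtherThing] = None

    def make_other(self) -> None:
        self.other = OtherThing()

    def maybe_do_with_other(
        self, action: Callable[[OtherThing], T]
    ) -> Optional[T]:
        # The only None check; mypy narrows self.other to OtherThing below it.
        if self.other is None:
            return None
        return action(self.other)

    def uncertain_path(self) -> Optional[str]:
        return self.maybe_do_with_other(lambda other: other.do_something())


c = Complicated()
print(c.uncertain_path())  # None
c.make_other()
print(c.uncertain_path())  # done
```

Because the helper is generic in the action's return type `T`, mypy can infer `Optional[str]` for `uncertain_path` without any asserts.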

Another thought that comes to mind is that using a placebo like this is similarish to the strategy pattern, so maybe you could pass in other as a strategy instance up front rather than having a make_other method? But this also seems overly complex just to avoid some None checks.
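The strategy-style idea might look something like the code below, where the caller chooses the collaborator at construction time and a placebo is the default. This is only a sketch. `OtherThing`, `OtherPlacebo`, and the `count` attribute are hypothetical. The sketch also assumes the real `OtherThing` is available when the object is built, which late initialization by definition does not guarantee.

```python
from typing import Optional


class OtherThing:
    """Hypothetical stand-in for the real collaborator."""

    def __init__(self) -> None:
        self.count = 0

    def do_something(self) -> None:
        self.count += 1


class OtherPlacebo(OtherThing):
    def do_something(self) -> None:
        pass  # no-op when the real thing isn't wanted


class Complicated:
    def __init__(self, other: Optional[OtherThing] = None) -> None:
        # The strategy is fixed once, at construction; no make_other() needed.
        self.other: OtherThing = other if other is not None else OtherPlacebo()

    def certain_path(self) -> None:
        self.other.do_something()

    def uncertain_path(self) -> None:
        self.other.do_something()


real = OtherThing()
Complicated(real).certain_path()
print(real.count)  # 1
Complicated().uncertain_path()  # runs the placebo, does nothing
```

Defaulting the parameter to `None` rather than to `OtherPlacebo()` avoids sharing one placebo instance across every object.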

Alex 5:34 PM on 12 Feb 2023

Are six lines of assert ... is not None at the top of various functions really so bad?

The Placebo solution is totally bespoke and doesn’t even work as advertised – you had to create CoverageData._real. This feels like adding magic and cognitive overhead for marginal aesthetic reasons.

Nick 6:10 PM on 12 Feb 2023

I’m not a fan of the placebo pattern—it neutralizes the benefits of strict optional type checking, which throws the baby out with the bathwater.

In the “Simple optional typing” approach, every time you access the attribute, the type checker forces you to consider whether you should treat the path as certain or uncertain. This is something you need to be thinking about as a programmer because the type checker can’t do it for you. Once the code is written, you can also see at the callsite which mode you are in, and if you get an AssertionError, you know there is a bug in your program.

One possibility would be to wrap the optional testing in a helper method, forcing you to declare at the callsite whether you are on the certain or uncertain path.

from typing import Any

class Complicated:

    def do_other_something(self, *, strict: bool) -> Any:
        if self.other is None:
            if strict:
                raise AssertionError("you forgot to call make_other()!")
            return None
        return self.other.do_something()

However, this can be unwieldy if there are a lot of possible methods on other that you want to call and don’t want to wrap them all. Also, you may have calls to do_other_something() that don’t do something, which can make it more difficult to reason about the code.

If you don’t need to support Python versions before 3.8, you can combine overload and Literal from typing with assignment expressions to make what I think is a very nice pattern. The callsites are pretty clean and you never have a do_something() call that doesn’t do something.

from typing import Literal, Optional, overload

class Complicated:

    @overload
    def get_other(*, strict: Literal[False]) -> Optional[OtherThing]:
        ...

    @overload
    def get_other(*, strict: Literal[True]) -> OtherThing:
        ...

    def get_other(*, strict: bool) -> Optional[OtherThing]:
        if strict and self.other is None:
            raise AssertionError("you should have initialized")
        return self.other
        ...

    def certain_path(self):
        self.get_other(strict=True).do_something()

    def uncertain_path(self):
        if (other := self.get_other(strict=False)):
            other.do_something()

Nick 6:15 PM on 12 Feb 2023

Oops, I’m missing the self arguments on the get_other() definitions above and there is a trailing ..., but you get the idea.
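With those fixes applied, the `get_other()` approach runs as-is. This sketch assumes Python 3.8+ and a hypothetical `OtherThing` whose `do_something()` returns a string. It also compares the walrus result with `is not None` rather than relying on truthiness:

```python
from typing import Literal, Optional, overload


class OtherThing:
    """Hypothetical stand-in for the real collaborator."""

    def do_something(self) -> str:
        return "done"


class Complicated:
    def __init__(self) -> None:
        self.other: Optional[OtherThing] = None

    def make_other(self) -> None:
        self.other = OtherThing()

    # The overloads tell mypy the return type depends on the literal flag.
    @overload
    def get_other(self, *, strict: Literal[True]) -> OtherThing: ...
    @overload
    def get_other(self, *, strict: Literal[False]) -> Optional[OtherThing]: ...

    def get_other(self, *, strict: bool) -> Optional[OtherThing]:
        if strict and self.other is None:
            raise AssertionError("you should have initialized")
        return self.other

    def certain_path(self) -> str:
        return self.get_other(strict=True).do_something()

    def uncertain_path(self) -> Optional[str]:
        if (other := self.get_other(strict=False)) is not None:
            return other.do_something()
        return None


c = Complicated()
print(c.uncertain_path())  # None
c.make_other()
print(c.certain_path())    # done
```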

Nick 6:32 PM on 12 Feb 2023

Or maybe dispense with all the cleverness and just have a generic unwrap() method for the certain paths.

from typing import Optional, TypeVar

T = TypeVar("T")

def unwrap(value: Optional[T]) -> T:
    if value is None:
        raise AssertionError("expected value to not be None!")
    return value

class Complicated:
    ...

    def certain_path(self):
        unwrap(self.other).do_something()

    def uncertain_path(self):
        if self.other is not None:
            self.other.do_something()