Changelog automation

Sunday 29 September 2024

I have two main approaches for producing changelogs, but both are based on the same principles: make it convenient for the author to create them, then make it possible to use the information automatically to benefit the readers.

The first way is with a tool such as scriv, which I wrote, but which was inspired by previous similar tools like towncrier and CPython’s blurb. They let you write your changelog one entry at a time in the same pull request as the product change itself. The entries are individual uniquely named files that are collected together when a release is made. This avoids the merge conflicts that would happen if a number of developers all had to edit the same changelog file.
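For example, a scriv changelog fragment is a small file under a changelog.d/ directory that travels with the pull request; running scriv collect at release time gathers the fragments into the changelog. The file name and entry below are made up:

changelog.d/20240929_120000_nedbat.rst

Fixed
-----

- Reports no longer crash when a measured file has been deleted.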

The second way I maintain a changelog is how I do it for coverage.py. This predates scriv, and is more custom-coded, so I’ll walk through the steps. Maybe you will be inspired to add bits to other tooling.

I hand-edit a CHANGES.rst file. An entry there might look like this:

CHANGES.rst

- Fix: we failed calling
  :func:`runpy.run_path <python:runpy.run_path>`, as described
  in `issue 1234`_.  This is now fixed, thanks to `Debbie Developer
  <pull 2345_>`_.  Details are on the :ref:`configuration page
  <config_report_format>`.

.. _issue 1234: https://github.com/nedbat/coveragepy/issues/1234
.. _pull 2345: https://github.com/nedbat/coveragepy/pull/2345

This lets me use semantic linking mechanisms. GitHub displays .rst files, but unfortunately doesn’t understand the :ref:-style of links.

The changelog is part of the docs for the project, pulled into the docs/ tree with a Sphinx directive. The :end-before: option lets me have end-of-page content in CHANGES.rst that doesn’t appear in the docs:

doc/changes.rst

.. include:: ../CHANGES.rst
    :end-before: scriv-end-here
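The marker itself is nothing more than a line of text near the bottom of CHANGES.rst (an rst comment works well); everything after it still shows on GitHub but stays out of the built docs. Simplified, it looks like this:

CHANGES.rst

.. scriv-end-here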

It’s great when researching a bug fix in other projects to see an issue closed with a comment about the commit that fixed it. Even better is when the issue mentions what release first had the fix. I automate that process for coverage.py.

To do that and a few other things, I have some custom tooling. It’s a bit baroque because it grew over time, but it suits my purposes. First I need to get the changelog into a more easily understood form. Sphinx has a little-known feature to produce .rst files as output. It sounds paradoxical, but the benefit is that all links are reduced to their simplest form. The entry above becomes:

tmp/changes.rst

*  Fix: we failed calling
   https://docs.python.org/3/library/runpy.html#runpy.run_path, as
   described in `issue 1234
   <https://github.com/nedbat/coveragepy/issues/1234>`_.  This is now
   fixed, thanks to `Debbie Developer
   <https://github.com/nedbat/coveragepy/pull/2345>`_.  Details are on
   the `configuration page <config.rst#config-report-format>`_.

Then pandoc converts it to Markdown and my parse_relnotes.py creates a JSON file to make it easy to find entries for each version:

[
    {
        "version": "7.6.1",
        "text": "-   Fix: coverage used to fail when measuring code using ...",
        "prerelease": false,
        "when": "2024-08-04"
    },
    ...

Finally(!) comment_on_fixes.py gets the latest release from the JSON file, regexes it for GitHub URLs in the text, and adds a comment to closed issues and merged pull requests:

This is now released as part of [coverage 7.x.y](https://pypi.org/project/coverage/7.x.y).
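The shape of that script, hedged and simplified (the JSON file name, the token handling, and the hard-coded repo are assumptions here, and the real comment_on_fixes.py also checks that the issues are closed and the pull requests merged):

import json
import os
import re

import requests

with open("tmp/relnotes.json") as f:
    releases = json.load(f)
latest = releases[0]        # assuming entries are ordered newest-first

comment = (
    f"This is now released as part of [coverage {latest['version']}]"
    f"(https://pypi.org/project/coverage/{latest['version']})."
)

# Find the issues and pull requests mentioned in the release text.
url_rx = re.compile(r"https://github\.com/nedbat/coveragepy/(issues|pull)/(\d+)")
for _kind, number in url_rx.findall(latest["text"]):
    # Comments on both issues and pull requests go through the issues API.
    resp = requests.post(
        f"https://api.github.com/repos/nedbat/coveragepy/issues/{number}/comments",
        headers={"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"},
        json={"body": comment},
    )
    resp.raise_for_status()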

The other automated output from my CHANGES.rst file is a GitHub release. GitHub releases are both convenient and problematic. I don’t like the idea of authoring content on GitHub that is only available on GitHub. The history of my project is an important part of my project, so I want the source of truth to be a version-controlled text file in my source distribution. But people want to see GitHub releases. So I author in CHANGES.rst, but publish to GitHub releases.

Using github_releases.py I automatically generate a GitHub release from the JSON file. This was useful enough that I added a github-release command to scriv to do a similar thing, but coverage.py still has the custom code to take advantage of the rst link simplifications I showed above.
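The core of that is one call to the GitHub releases API. A hedged sketch, assuming the same JSON file as above and a git tag named after the version:

import json
import os

import requests

with open("tmp/relnotes.json") as f:
    latest = json.load(f)[0]    # assuming entries are ordered newest-first

resp = requests.post(
    "https://api.github.com/repos/nedbat/coveragepy/releases",
    headers={"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"},
    json={
        "tag_name": latest["version"],
        "name": latest["version"],
        "body": latest["text"],              # the Markdown entry for this release
        "prerelease": latest["prerelease"],
    },
)
resp.raise_for_status()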

One of the things I don’t like about GitHub releases is that they always have “Assets” appended to the end, with links to .zip and .tar.gz snapshots of the repo. Those aren’t the right way to get the package, so I include the link to the PyPI page and the correct command to install the package.

Describing all this, it sounds complicated, and I guess it is. I like being able to publish information to people who want it, and this automation accomplishes that.

Changelog philosophy

Saturday 28 September 2024

I playfully quipped about changelogs, and Sumana Harihareswara thoughtfully responded with Changelogs and Release Notes. I agree with her on some things, and disagree on others.

[Image: a Simpsons “don’t make me tap the sign” meme saying, “a list of commits is not a changelog”]

My point with the meme was that people should put effort into a hand-crafted description of what has changed in each release of their product. It should be focused on what users need to know, and not include internal changes, which can be found in the git commits or pull requests. It’s easy to publish a list of commits or pull requests and call it a changelog, but it’s not that helpful to your users trying to understand what has changed for them. That was the point of the meme.

But Sumana raised the stakes, explaining why projects should produce two hand-crafted descriptions. The first is a changelog, which mentions every non-trivial change. The second is release notes, which should be user-focused with more detail.

I liked the reasons Sumana gave:

  • Release notes can include project-level information that doesn’t correspond to a particular change in a release. Maybe you started a new discussion forum, or there’s a shift in maintainer attention, plans for upcoming work, and so on.
  • If the release notes are user-focused, then the changelog can be more comprehensive, giving people a fuller picture of the work that goes into producing the project. This can pull back the curtain, helping people understand the inner workings of the project and perhaps find a way to help out.

My problem with separating the changelog and release notes is that I have limited energy to produce them, and perhaps more importantly, people have limited attention to read them. For my projects, I opt instead for a middle ground: my changelogs lean more toward Sumana’s ideal of release notes. They are hand-written, focused on what users of the project need to know, and do not include things like build changes and refactorings.

For large projects like Python and Linux, there are many maintainers and many types of information, so it makes sense to have multiple views of “what’s changed.” For single-maintainer projects, it feels like too much. I applaud people who can do it, but I don’t think I can, and I won’t expect it from others.

Ultimately, each project has to decide for themselves how to balance the effort and the benefit. They know their audience(s), and what resources they have to do the work. Open source is already difficult; the last thing I want to do is add a giant SHOULD to a project.

There’s an inexact nested ratio at work in projects: most users (say 90%) will only consume; you will never hear from them. You hear from the remaining 10%, but only 10% of those will do something you consider a contribution. For widely used projects like coverage.py, I think the ratio might be more like 1% of 1% instead of 10% of 10%. How does this affect your communication approach? You could look at it two ways: either write for the audience you have (focus on the 90%), or write for the audience you want (focus on the 10%).

In my changelogs now, for fixes I try to describe the bad thing that used to happen and any important changes in behavior. For features, I link to the new docs. I include links to issues and pull requests, and I name the contributors who helped.

So I guess my approach is to write changelogs for the 90%. But I like Sumana’s idea of making the full picture of maintenance more visible to people, so I’m thinking about how to add that without changing the essential character of my changelog. Perhaps something at the end summarizing the changes that aren’t yet mentioned, with a link to the git history? I’m not sure I can automate collecting that information, but I’ll have to play with it.

Cleaning up a messy branch

Saturday 21 September 2024

Let’s say you have a long-lived git branch. Most of the changes should be merged back to main, but some of the changes were already cherry-picked from main, and some of the changes shouldn’t be put onto main at all. How do you review the branch and merge it?

Here’s a diagram of a simple example. The main branch at the top has seven commits. Beneath that is our work branch with three commits, of the three different kinds: W is important work we need to end up on main, M is a commit we cherry-picked from main, and X is a temporary tweak that we don’t want to end up on main:

[Diagram: the main branch on top with commits A, B, M, C, D, E, F; beneath it, the work branch with commits W, M, X.]

If we make a pull request from our work branch, GitHub will show a diff that includes all three commits W, M, and X. It was a surprise to me that M was included: it’s not a change that will happen if we merge the work branch, because M is already on main. GitHub doesn’t show you a diff between your branch and main; it shows the diff since your branch diverged from main, that is, all of the commits on your branch. This makes it hard to assess what a merge will do if the branch has cherry-picked commits.

And of course the pull request diff includes X, since that would be a change to main if we merge the work branch. But we don’t want X in the merge, and we don’t want to be distracted by M when reviewing the pull request. What should we do?

The answer is to use the “git revert” command to add commits to the branch that undo M and undo X. We show those as -M and -X:

[Diagram: the work branch now has two more commits, the reverts -M and -X.]
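On the command line that’s a single revert naming both commits (the SHAs here are placeholders); git creates one new revert commit for each commit you give it:

git switch work
git revert --no-edit <sha-of-M> <sha-of-X>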

Now the diff will show only W, great! The -X commit is perfect: it will prevent X from merging to main. But what about -M? What will happen when we merge that? I was concerned that it would undo the M commit on main. But it doesn’t.

A git merge compares two snapshots of the repo and combines them. In this case, the changes from M are on the main branch, and no trace of them is on the work branch, so M is fine, and remains on main after the merge. The merge does just what we want. It brings the W changes onto main, and I’ve named the merge commit wM to indicate that:

[Diagram: main now ends with a merge commit wM that brings the W changes in from the work branch.]

Some other points here:

  • Why not just merge the branch after the W commit? This is a simplified example for illustration. The real branch that sent me down this path has dozens of commits intermixed.
  • GitHub has three different ways to finish a pull request (merge, squash, rebase). This technique of using reverts to hide cherry-picked changes and avoid unwanted changes applies to all of them.
  • Although our merge only adds the W changes to main, the history will show the complete work branch, including our revert commits. If you wanted it a little cleaner, you could leave out the -M reverts before merging. The result will be the same with or without them.
  • If you want, you can also make a new branch for the revert commits, to keep the work branch pristine:

    [Diagram: the same branches, with the -M and -X revert commits on a separate branch made from the work branch.]
  • Finally, the way to get the cleanest history is to create a new branch and rebase the commits we want before merging. This could be a lot of work, and some people will object to misrepresenting the actual history of commits. Git gives you plenty of tools to do it as you prefer.

Cogged GitHub profile

Saturday 14 September 2024

Cog is my tool for using bits of Python to generate content inside an otherwise static file. I used it in extreme ways to generate my GitHub profile page.

If you haven’t seen it before, you can customize your GitHub profile by creating a README.md in a repo named the same as your username. So my profile is rendered from nedbat/nedbat/README.md.

My profile has a bit of static text, but much of it is badges, blog posts, links to PyPI projects, and so on. The README.md is literally a Markdown file that can be displayed by GitHub, but it’s full of HTML comments containing Python code that generates the content. The generation happens once a day in a GitHub action.

There are three kinds of lines in a file run through cog: static content, code that will generate content, and generated content. My README.md is lop-sided: it has 225 lines of code, 38 of static content, and 43 of generated content.
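A miniature example (not from my actual README.md) shows all three kinds at once. The cog code hides inside an HTML comment so GitHub doesn’t render it, and the line before the [[[end]]] marker is whatever the code printed on its last run:

Some static Markdown that never changes.

<!-- [[[cog
import datetime
print(f"_Regenerated on {datetime.date.today():%d %B %Y}._")
]]] -->
_Regenerated on 14 September 2024._
<!-- [[[end]]] -->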

The badges are made with shields.io image URLs. To make this easier, there are Python functions for Markdown image syntax, for building shields.io badge URLs, and so on.

I can’t walk through all of the code, but I can show a few simplified versions to convey the idea. Read the file itself if you are interested in the full details.

This makes a shields.io URL:

from urllib.parse import quote, urlencode

def shields_url(
    label=None,
    message=None,
    color=None,
    label_color=None,
    logo=None,
    logo_color=None,
):
    # Build a static shields.io badge URL: /badge/<label>-<message>-<color>
    params = {"style": "flat"}
    url = "".join([
        "/badge/",
        quote(label or ""),
        "-",
        quote(message),
        "-",
        color,
        ])
    url = "https://img.shields.io" + url
    if label_color:
        params["labelColor"] = label_color
    if logo:
        params["logo"] = logo
    if logo_color:
        params["logoColor"] = logo_color
    return url + "?" + urlencode(params)

This makes a Markdown image:

def md_image(image_url, text, link):
    return f'[![{text}]({image_url} "{text}")]({link})'

Now we can make a Markdown badge:

def badge(text=None, link=None, **kwargs):
    return md_image(image_url=shields_url(**kwargs), text=text, link=link)

Anything print’ed will become part of the generated portions of the file. We can add a badge to the page with:

print(badge(
    logo="discord", logo_color="white", label_color="7289da",
    message="Discord", color="ffe97c",
    text="Python Discord", link="https://discord.gg/python",
))

There are other functions built on top of these to make Mastodon badges, Stack Overflow badges, a row of badges for a PyPI project, and so on.

Building the page ends up pulling data from 10 URLs, including a JSON summary of my blog for including blog posts. It’s satisfying to be able to have this update automatically instead of having to copy data around.

The result is a convenient mix of static and generated, and it was a fun exercise in light-touch automation.

Coverage branches instead of arcs

Monday 26 August 2024

As I mentioned in a few recent posts, I’ve been doing some significant work in coverage.py to take advantage of new capabilities in Python.

Mark Shannon has been improving the sys.monitoring API so that branch coverage can be done with low overhead. I want to take advantage of that in coverage.py, but I needed to do some refactoring work first. The tests were focused on mapping the complete set of code pathways (which I called arcs), but using low-overhead branch monitoring won’t provide those complete pathways. If the tests continued to focus on them, they would fail with sys.monitoring.

But the complete pathways aren’t actually needed. The useful information is where the branches are, and which branches were taken. That can be measured with sys.monitoring. So a first step was to refactor the tests to focus on branches instead of arcs. That took a while, but is now done.

Not needing all those arcs also meant I could simplify the AST-based parser that found the arcs, removing about 150 lines. I suspect there’s more that could be removed; maybe that will happen over time. Also, the new code.co_branches() method might eventually make it all obsolete.

If you read Coverage at a crossroads on this blog, I talked about using ideas from SlipCover like inserting fake lines with an import hook. Those exotic ideas were appealing in their way, but are no longer needed, and they would have brought a bunch of complexity. With the two new sys.monitoring events, we can get the branch information directly without advanced shenanigans.
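The registration pattern looks roughly like this. The sketch below uses the single BRANCH event that sys.monitoring has had since Python 3.12, not the newer pair of low-overhead events referred to above, but the shape is similar:

import sys

mon = sys.monitoring
TOOL = mon.COVERAGE_ID          # the tool id reserved for coverage tools
branches = set()

def branch_handler(code, instruction_offset, destination_offset):
    # Record which branch in which file jumped where.
    branches.add((code.co_filename, instruction_offset, destination_offset))

mon.use_tool_id(TOOL, "branch-sketch")
mon.register_callback(TOOL, mon.events.BRANCH, branch_handler)
mon.set_events(TOOL, mon.events.BRANCH)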

There’s more work to do, including attending to incoming bug reports. If you’d like to help, or learn more about any of this, we have a #coverage-py channel in the Python Discord.

Cherish this time

Sunday 11 August 2024

I’ve been talking lately with a friend who has a four-month-old baby. He mentioned the well-worn dynamic that older parents tell new parents to cherish their early days with their newborn, that they will grow up faster than you expect.

I agree with the general sentiment, but I don’t think it’s a good thing to tell new parents. First, let’s be honest: there’s a lot of time with a four-month-old that is not easy. Many of the days (and nights!) are very difficult. If you tell someone “cherish this time” and they feel burdened, overworked, confused, tired, or stressed, then they can easily feel like they are doing it wrong. They can feel like they are failing to cherish the time, an important thing they aren’t doing right. New parents already have enough conflicting advice and nearly impossible things to do. Don’t add cherishing to their list.

Besides, “cherish” sounds like a needlepoint on sale at a greeting-card store. It’s saccharine and simplistic. I think what those older parents mean is, “I cherish my memories from that time,” which I can totally relate to.

I’d say this to a new parent:

This time is difficult, but there are also good things about it. It will get easier. There are things about now that you will miss when they are gone. I don’t know what you should do with that information, but you should at least know it.

I’ve also heard this said as, “the days are long and the years are short,” which is also very true.

One of the things I value now about my time as a new parent is the focus it brought. The night before our second son was born, Susan had bad pain from the baby pressing against her back. The best solution we had was for me to press a tennis ball into the small of her back. We were up most of the night dealing with that difficulty. It was a hard night, but we look back on it fondly as Tennis Ball Night. We were focused together on an immediate problem. It reduced our diameter of concerns and we supported each other to get through it.

The early days of the pandemic had a similar effect: many of our usual duties were dropped or deferred while we figured out what to do. It also was a parenting challenge, since our 30-year-old disabled son came back to live with us. We had to focus on what we needed and what he needed: activities and exercise, and how to stay safe. Other usual concerns could wait.

Whatever happens, parenting will be the “most” thing you ever do. It’s a life-long project, and you have little control over much of it. It’s impossible to do it all correctly, it’s impossible to avoid mistakes. There’s no rulebook to ensure everything goes well.

I’m a long way past the four-month-old stage. But parenting is still a big part of my life, and not an easy part. Not as big or as not-easy as when my sons were four months old, but it’s still something I do, and it requires care and attention.

I won’t say you should cherish parenting. Do your best, it will be fine, and enjoy it in your own way if you can.
