A(n Open Source) GitHub Action for Ruff

Published in Chartboost Engineering

May 11, 2023

Software and data engineers get to work on cool stuff. This is the story of a GitHub Action for Ruff that we created, use ourselves, and have shared with the world. To make sense of it, we first need to explain what Ruff is, why we use it, and how we use it.

The GitHub Action can be found at: https://github.com/chartboost/ruff-action.

Ruff

Ruff is a great emerging project for improving Python code. It is often compared with Flake8 (and Flake8’s plugin ecosystem), though it also takes on functionality from other Python tools (like isort). The real selling point is that Ruff is blazingly FAST. It runs so quickly that it’s hard to notice it running at all, which removes the friction and annoyance that slower tools add while they process.

Plenty has been written and shared on Ruff, so rather than repeat it, I’ll direct you elsewhere. For any new codebase, or for getting started with linting/testing, Ruff makes sense. For mature codebases already using Flake8, there are still some tradeoffs to consider: namely, whether the features you rely on are available in Ruff, weighed against its speed.

A decent overview/writeup is available here: https://blog.jerrycodes.com/ruff-the-python-linter/

Clean Repositories

In addition to helping us in our local development environments, Ruff’s primary importance is protecting the integrity of our codebases and laying the foundation for more and more (continuous) testing; plenty of other sources cover the benefits of CI (and CD). At Chartboost, we believe testing is one of the marks of a great engineer. Therefore, we strive to run tests on every commit, which helps us move increasingly fast (and hopefully without breaking too many things!). Ruff’s speed makes that more doable.

GitHub Actions

Since much of our code is kept in GitHub, and GitHub Actions are lightweight, easy to use, and well integrated, we’ve been using them more and more. Beyond Ruff for Python code, we have adopted Black and Pyright, among others. Our engineers are now writing more tests and automation. The actions live in the repositories as part of the codebase and run without anyone having to go to a different tool, which has helped us immensely.

Our First Use Case

Our data organization did not have any automation around coding standards, so we needed to establish some as a step toward continuous integration. We have lots of Python code, so we started with Black. Black is fantastic off the shelf, needs no configuration, and alleviates many concerns; in particular, it minimizes the need for humans to argue about what the code looks like, which makes group interactions more pleasant. The next obvious step was Ruff (alternatively, it could have been Flake8). Since we were not replacing an existing tool, and considering Ruff’s speed plus the valuable functionality it adds, it was deemed the wise move. After going through the exercise of determining our Ruff rules, we added Python type checking. A future post could cover the reasoning behind our decision to use Pyright.

Airflow/Cloud Composer

One tool used widely these days within Chartboost, as well as across data engineering at large, is Apache Airflow. Chartboost is largely a GCP shop, and we prefer our engineers to focus on creating business value wherever the tradeoffs are reasonable, so we rely on GCP’s Cloud Composer (managed Airflow) rather than self-hosting. Airflow orchestrates LOTS of our data processing jobs, especially those involving BigQuery, Dataflow, and Bigtable (and many other services as well), and gets data to where it needs to be for our internal and external customers.

Airflow users (e.g., data engineers) generally create Python files that define workflows (DAGs), each of which often includes many tasks defining where and how to read, write, and transform data. We have lots of these processes in a large codebase. This could pose a challenge for getting started with Ruff since, as one might imagine, the codebase could be far from compliant with our desired ruleset. Fortunately, Ruff offers a great way to address this with --add-noqa, making it straightforward to always evaluate the entire codebase.

#NOQA

Ruff supports a #noqa directive that tells the tool to ignore something it would otherwise complain about. While useful, it can be overused by engineers who don’t want to fix the underlying problem. However, it is fantastically helpful for getting started with the tool. Not only can these directives be added manually as warranted, but Ruff can also add them everywhere they are needed via the --add-noqa argument, as in:

ruff check --add-noqa .

Running that command annotates the codebase to ignore known problems, so future runs of the following pass (the GitHub Action also runs this check for you, without any special arguments):

ruff check .

The trick is to subsequently remove instances of #noqa rather than continuing to add them; Ruff’s documentation covers #noqa in detail. Not only do we avoid adding new #noqa directives, but we also remove every #noqa whenever a file is touched.
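To make the “remove #noqa when a file is touched” policy concrete, here is a minimal Python sketch. The helpers `find_noqa` and `strip_noqa` are hypothetical and are not part of Ruff. The sketch assumes directives take the forms `# noqa` or `# noqa: CODE[, CODE...]`, matched case-insensitively. It uses the standard library’s `tokenize` module so that only real comments are inspected and a `# noqa` inside a string literal is left alone. `find_noqa` lists the suppressions remaining in a file, which is handy for tracking burn-down after `--add-noqa`. `strip_noqa` removes them so that the next `ruff check .` surfaces the underlying violations.

```python
import io
import re
import tokenize

# Hypothetical helpers, not a Ruff API. Assumed directive forms:
# "# noqa", "#noqa", "# NOQA", "# noqa: F401", "# noqa: E501, F401".
NOQA_RE = re.compile(
    r"\s*#\s*noqa(?::\s*(?P<codes>[A-Z]+[0-9]+(?:[\s,]+[A-Z]+[0-9]+)*))?",
    re.IGNORECASE,
)


def _comments(source: str):
    """Yield only COMMENT tokens, so "# noqa" inside string literals is ignored."""
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            yield tok


def find_noqa(source: str) -> list[tuple[int, list[str]]]:
    """Return (line number, rule codes) per noqa comment; [] means a blanket noqa."""
    found = []
    for tok in _comments(source):
        match = NOQA_RE.search(tok.string)
        if match:
            codes = match.group("codes")
            found.append((tok.start[0], re.split(r"[\s,]+", codes.upper()) if codes else []))
    return found


def strip_noqa(source: str) -> str:
    """Remove noqa directives from comments, keeping any other comment text."""
    lines = source.splitlines(keepends=True)
    for tok in _comments(source):
        if not NOQA_RE.search(tok.string):
            continue  # ordinary comment, leave it untouched
        rest = NOQA_RE.sub("", tok.string).strip()
        if rest and not rest.startswith("#"):
            rest = "# " + rest  # directive came first; keep the leftover text a comment
        (row, start), (_, end) = tok.start, tok.end
        line = lines[row - 1]
        before, after = line[:start], line[end:]
        # Drop the whole comment (and the spaces before it) if nothing else remains.
        lines[row - 1] = (before + rest if rest else before.rstrip()) + after
    return "".join(lines)
```

For example, `strip_noqa` turns `import os  # noqa: F401` into `import os`, and `y = 2  # type: ignore # noqa: E501` into `y = 2  # type: ignore`. After that, `ruff check .` would report those violations again so they can be fixed properly.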

With the GitHub Action in place, we merge only PRs that pass our Ruff check. That makes it more straightforward to ensure our codebase is up to our standards. We can also focus our attention on reviewing what the code is doing rather than what it looks like (driving more business value!).

The discussion would be incomplete without mentioning pre-commit. We rely on pre-commit to get quicker (local) feedback and/or fix the code before committing it to a branch and opening a PR. Ruff already had a pre-commit hook, so that was easy to adopt!
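As an illustration of enforcing the “no new #noqa” rule locally, here is a hypothetical helper, `added_noqa_lines`. It is not part of Ruff, pre-commit, or the ruff-action, but it could back a local pre-commit hook or a CI step. It assumes it is given a unified diff, such as the output of `git diff --cached --unified=0`. It reports added lines in Python files that carry a `# noqa` directive.

```python
import re

# Hypothetical guard, not a Ruff or pre-commit API.
# Plain-text match for "# noqa" / "#noqa" / "# NOQA: ..." (case-insensitive).
NOQA_RE = re.compile(r"#\s*noqa\b", re.IGNORECASE)


def added_noqa_lines(diff_text: str) -> list[tuple[str, str]]:
    """Return (file, line) pairs for added lines in .py files that contain noqa.

    Expects unified diff text. Because this is a plain-text match, a "# noqa"
    inside a string literal on an added line is also flagged.
    """
    hits = []
    current_file = None
    for line in diff_text.splitlines():
        if line.startswith("+++ "):
            # File header, e.g. "+++ b/dags/etl.py" (or "+++ /dev/null" for deletions).
            path = line[4:]
            current_file = path[2:] if path.startswith("b/") else path
        elif line.startswith("+") and current_file and current_file.endswith(".py"):
            added = line[1:]
            if NOQA_RE.search(added):
                hits.append((current_file, added.strip()))
    return hits
```

A hook could pass that git command’s output to the function and exit non-zero whenever the result is non-empty. A re-added line that already had a directive (for example, after re-indenting) also counts as new. That nudges engineers toward removing the directive, consistent with the policy of cleaning up #noqa whenever a file is touched.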

Suggestions/Takeaways

To sum it up:

Ruff is a good, fast tool that was core to cleaning up our Airflow/Composer codebases, among others.
It is not too late to add new group coding standards, even on large existing codebases.
Data tooling often can be improved.
Increasing testing and automation can have compounding effects. After adding Ruff, our engineers encountered fewer deployment problems because they caught issues sooner. This led to additional development velocity: creating more, sooner, and with fewer errors. These benefits have led to continued investment in our CI/CD tooling and automation.