Understanding dependency resolution

This chapter covers the way dependency resolution works inside Gradle. After covering how you can declare repositories and dependencies, it makes sense to explain how these declarations come together during dependency resolution.

Dependency resolution is a process that consists of two phases, which are repeated until the dependency graph is complete:

When a new dependency is added to the graph, perform conflict resolution to determine which version should be added to the graph.
When a specific dependency, that is a module with a version, is identified as part of the graph, retrieve its metadata so that its dependencies can be added in turn.

The following section will describe what Gradle identifies as conflict and how it can resolve them automatically. After that, the retrieval of metadata will be covered, explaining how Gradle can follow dependency links.

How Gradle handles conflicts?

When doing dependency resolution, Gradle handles two types of conflicts:

Version conflicts: That is when two or more dependencies require a given dependency but with different versions.
Implementation conflicts: That is when the dependency graph contains module that provide the same implementation, or capability in Gradle terminology.

The following sections will explain in detail how Gradle attempts to resolve these conflicts.

The dependency resolution process is highly customizable to meet enterprise requirements. For more information, see the chapter on Controlling transitive dependencies.

Version conflict resolution

A version conflict occurs when two components:

Depend on the same module, let’s say com.google.guava:guava
But on different versions, let’s say 20.0 and 25.1-android
- Our project itself depends on com.google.guava:guava:20.0
- Our project also depends on com.google.inject:guice:4.2.2 which itself depends on com.google.guava:guava:25.1-android

Resolution strategy

Given the conflict above, there exist multiple ways to handle it, either by selecting a version or failing the resolution. Different tools that handle dependency management have different ways of handling these type of conflicts.

Apache Maven uses a nearest first strategy.

Maven will take the shortest path to a dependency and use that version. In case there are multiple paths of the same length, the first one wins.

This means that in the example above, the version of guava will be 20.0 because the direct dependency is closer than the guice dependency.

The main drawback of this method is that it is ordering dependent. Keeping order in a very large graph can be a challenge. For example, what if the new version of a dependency ends up having its own dependency declarations in a different order than the previous version?

With Maven, this could have unwanted impact on resolved versions.

Apache Ivy is a very flexible dependency management tooling. It offers the possibility to customize dependency resolution, including conflict resolution.

This flexibility comes with the price of making it hard to reason about.

Gradle will consider all requested versions, wherever they appear in the dependency graph. Out of these versions, it will select the highest one.

As you have seen, Gradle supports a concept of rich version declaration, so what is the highest version depends on the way versions were declared:

If no ranges are involved, then the highest version that is not rejected will be selected.
- If a strictly is lower than that version, selection will fail.
If ranges are involved:
- If there is a non range version that falls within the specified ranges or is higher than their upper bound, it will be selected.
- If there are only ranges, the highest existing version of the range with the highest upper bound will be selected.
- If a strictly is lower than that version, selection will fail.

Note that in the case where ranges come into play, Gradle requires metadata to determine which versions do exist for the considered range. This causes an intermediate lookup for metadata, as described in How Gradle retrieves dependency metadata?.

Implementation conflict resolution

Gradle uses variants and capabilities to identify what a module provides.

This is a unique feature that deserves its own chapter to understand what it means and enables.

A conflict occurs the moment two modules either:

Attempt to select incompatible variants,
Declare the same capability

Learn more about handling these type of conflicts in Selecting between candidates.

How Gradle retrieves dependency metadata?

Gradle requires metadata about the modules included in your dependency graph. That information is required for two main points:

Determine the existing versions of a module when the declared version is dynamic.
Determine the dependencies of the module for a given version.

Discovering versions

Faced with a dynamic version, Gradle needs to identify the concrete matching versions:

Each repository is inspected, Gradle does not stop on the first one returning some metadata. When multiple are defined, they are inspected in the order they were added.
For Maven repositories, Gradle will use the maven-metadata.xml which provides information about the available versions.
For Ivy repositories, Gradle will resort to directory listing.

This process results in a list of candidate versions that are then matched to the dynamic version expressed. At this point, version conflict resolution is resumed.

Note that Gradle caches the version information, more information can be found in the section Controlling dynamic version caching.

Obtaining module metadata

Given a required dependency, with a version, Gradle attempts to resolve the dependency by searching for the module the dependency points at.

Each repository is inspected in order.
- Depending on the type of repository, Gradle looks for metadata files describing the module (.module, .pom or ivy.xml file) or directly for artifact files.
- Modules that have a module metadata file (.module, .pom or ivy.xml file) are preferred over modules that have an artifact file only.
- Once a repository returns a metadata result, following repositories are ignored.
Metadata for the dependency is retrieved and parsed, if found
- If the module metadata is a POM file that has a parent POM declared, Gradle will recursively attempt to resolve each of the parent modules for the POM.
All of the artifacts for the module are then requested from the same repository that was chosen in the process above.
All of that data, including the repository source and potential misses are then stored in the The Dependency Cache.

The last point above is what can make the integration with Maven Local problematic. As it is a cache for Maven, it will sometimes miss some artifacts of a given module. If Gradle is sourcing such a module from Maven Local, it will consider the missing artifacts to be missing altogether.

Repository blacklisting

When Gradle fails to retrieve information from a repository, it will blacklist it for the duration of the build and fail all dependency resolution.

That last point is important for reproducibility. If the build was allowed to continue, ignoring the faulty repository, subsequent builds could have a different result once the repository is back online.

HTTP Retries

Gradle will make several attempts to connect to a given repository before blacklisting it. If connection fails, Gradle will retry on certain errors which have a chance of being transient, increasing the amount of time waiting between each retry.

Blacklisting happens when the repository cannot be contacted, either because of a permanent error or because the maximum retries was reached.

The Dependency Cache

Gradle contains a highly sophisticated dependency caching mechanism, which seeks to minimise the number of remote requests made in dependency resolution, while striving to guarantee that the results of dependency resolution are correct and reproducible.

The Gradle dependency cache consists of two storage types located under GRADLE_USER_HOME/caches:

A file-based store of downloaded artifacts, including binaries like jars as well as raw downloaded meta-data like POM files and Ivy files. The storage path for a downloaded artifact includes the SHA1 checksum, meaning that 2 artifacts with the same name but different content can easily be cached.
A binary store of resolved module metadata, including the results of resolving dynamic versions, module descriptors, and artifacts.

The Gradle cache does not allow the local cache to hide problems and create other mysterious and difficult to debug behavior. Gradle enables reliable and reproducible enterprise builds with a focus on bandwidth and storage efficiency.

Separate metadata cache

Gradle keeps a record of various aspects of dependency resolution in binary format in the metadata cache. The information stored in the metadata cache includes:

The result of resolving a dynamic version (e.g. 1.+) to a concrete version (e.g. 1.2).
The resolved module metadata for a particular module, including module artifacts and module dependencies.
The resolved artifact metadata for a particular artifact, including a pointer to the downloaded artifact file.
The absence of a particular module or artifact in a particular repository, eliminating repeated attempts to access a resource that does not exist.

Every entry in the metadata cache includes a record of the repository that provided the information as well as a timestamp that can be used for cache expiry.

Repository caches are independent

As described above, for each repository there is a separate metadata cache. A repository is identified by its URL, type and layout. If a module or artifact has not been previously resolved from this repository, Gradle will attempt to resolve the module against the repository. This will always involve a remote lookup on the repository, however in many cases no download will be required.

Dependency resolution will fail if the required artifacts are not available in any repository specified by the build, even if the local cache has a copy of this artifact which was retrieved from a different repository. Repository independence allows builds to be isolated from each other in an advanced way that no build tool has done before. This is a key feature to create builds that are reliable and reproducible in any environment.

Artifact reuse

Before downloading an artifact, Gradle tries to determine the checksum of the required artifact by downloading the sha file associated with that artifact. If the checksum can be retrieved, an artifact is not downloaded if an artifact already exists with the same id and checksum. If the checksum cannot be retrieved from the remote server, the artifact will be downloaded (and ignored if it matches an existing artifact).

As well as considering artifacts downloaded from a different repository, Gradle will also attempt to reuse artifacts found in the local Maven Repository. If a candidate artifact has been downloaded by Maven, Gradle will use this artifact if it can be verified to match the checksum declared by the remote server.

Checksum based storage

It is possible for different repositories to provide a different binary artifact in response to the same artifact identifier. This is often the case with Maven SNAPSHOT artifacts, but can also be true for any artifact which is republished without changing its identifier. By caching artifacts based on their SHA1 checksum, Gradle is able to maintain multiple versions of the same artifact. This means that when resolving against one repository Gradle will never overwrite the cached artifact file from a different repository. This is done without requiring a separate artifact file store per repository.

Cache Locking

The Gradle dependency cache uses file-based locking to ensure that it can safely be used by multiple Gradle processes concurrently. The lock is held whenever the binary metadata store is being read or written, but is released for slow operations such as downloading remote artifacts.

This concurrent access is only supported if the different Gradle processes can communicate together. This is usually not the case for containerized builds.

Cache Cleanup

Gradle keeps track of which artifacts in the dependency cache are accessed. Using this information, the cache is periodically (at most every 24 hours) scanned for artifacts that have not been used for more than 30 days. Obsolete artifacts are then deleted to ensure the cache does not grow indefinitely.

Copy and reuse the cache

As of Gradle 6.1, the dependency cache, both the file and metadata parts, are fully encoded using relative path. This means that it is perfectly possible to copy a cache around and see Gradle benefit from it.

The path that can be copied is $GRADLE_HOME/caches/modules-<version>. The only constraint is placing it using the same structure at the destination, where the value of GRADLE_HOME can be different.

Note that creating the cache and consuming it should be done using compatible Gradle version, as shown in the table below. Otherwise the build might still require some interactions with remote repositories to complete missing information, which might be available in a different version. If multiple incompatible Gradle versions are in play, all should be used when seeding the cache.

Table 1. Dependency cache compatibility
Module cache version	File cache version	Metadata cache version	Gradle version(s)
`modules-2`	`files-2.1`	`metadata-2.90`	Gradle 6.1

Accessing the resolution result programmatically

While most users only need access to a "flat list" of files, there are cases where it can be interesting to reason on a graph and get more information about the resolution result:

for tooling integration, where a model of the dependency graph is required
for tasks generating a visual representation (image, .dot file, …) of a dependency graph
for tasks providing diagnostics (similar to the dependencyInsight task)
for tasks which need to perform dependency resolution at execution time (e.g, download files on demand)

For those use cases, Gradle provides lazy, thread-safe APIs, accessible by calling the Configuration.getIncoming() method:

the ResolutionResult API gives access to a resolved dependency graph, whether the resolution was successful or not.
the artifacts API provides a simple access to the resolved artifacts, untransformed, but with lazy download of artifacts (they would only be downloaded on demand).
the artifact view API provides an advanced, filtered view of artifacts, possibly transformed.