
Dependency Graph Design

Elliotte Rusty Harold edited this page Feb 4, 2020 · 16 revisions

Work in Progress

Definitions

Artifact

An artifact is a resource in the Maven repository system that has Maven coordinates. An artifact has a name, coordinates, and a sequence of bytes. It is often instantiated as a file or a Web resource. However, the absolute and relative paths to the file, the created and modified times of the file, and other file system metadata are not part of the artifact. Furthermore, the same artifact can exist in many files and many file systems at the same time. Two copies of a resource with the same Maven coordinates in two different repositories are the same artifact. An artifact does not have dependencies.

The name is usually derived from the Maven coordinates and is fixed. The byte sequence is usually fixed once the artifact is first published. The notable exception is snapshot artifacts, whose bytes can change when they are republished. Two artifacts are considered the same if they have the same coordinates, even if the name or the bytes have changed.
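To make the identity rule concrete, here is a minimal Java sketch. The `Artifact` class is hypothetical (it is not Aether's Artifact) and assumes the coordinates have already been normalized to a single string. It stores no file path or timestamps, and its equality and hashing look only at the coordinates, so a republished snapshot with different bytes still compares equal.

```java
import java.util.Objects;

public final class ArtifactIdentitySketch {

    /**
     * Hypothetical artifact. It holds coordinates, a name, and bytes, but no file path or
     * timestamps; equality and hashing use the coordinates only.
     */
    static final class Artifact {
        private final String coordinates;
        private final String name;
        private final byte[] bytes;

        Artifact(String coordinates, String name, byte[] bytes) {
            this.coordinates = Objects.requireNonNull(coordinates);
            this.name = name;
            this.bytes = bytes.clone();
        }

        String coordinates() { return coordinates; }
        String name() { return name; }
        byte[] bytes() { return bytes.clone(); }

        @Override
        public boolean equals(Object o) {
            return o instanceof Artifact other && coordinates.equals(other.coordinates);
        }

        @Override
        public int hashCode() {
            return coordinates.hashCode();
        }
    }

    public static void main(String[] args) {
        // A snapshot republished with new bytes is still the same artifact.
        Artifact before = new Artifact("g:a:1.0-SNAPSHOT", "a-1.0-SNAPSHOT.jar", new byte[] {1, 2});
        Artifact after = new Artifact("g:a:1.0-SNAPSHOT", "a-1.0-SNAPSHOT.jar", new byte[] {3, 4, 5});
        System.out.println(before.equals(after)); // true
    }
}
```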

TBD: do we even need to consider names here?

Maven Coordinates

Maven coordinates are a colon-separated string that uniquely identifies a Maven artifact. The exact syntax of the coordinates is defined by the Maven Project's POM reference. Maven coordinates contain up to five colon-separated parts:

  • Group ID
  • Artifact ID
  • Version
  • Classifier
  • Packaging

Two strings that are character-by-character identical identify the same artifact. Furthermore, if the classifier and packaging are omitted, they default to the empty string and "jar" respectively.
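A minimal Java sketch of how the defaults might be applied when parsing. It assumes the parts appear in the order listed above (group ID, artifact ID, version, classifier, packaging); other tools may order them differently, so treat the format and the hypothetical `MavenCoordinates` record as illustrative only. Once the defaults are filled in, two strings that are not character-by-character identical can still identify the same artifact.

```java
public final class CoordinatesSketch {

    /** Hypothetical value type for parsed coordinates; not part of Maven or Aether. */
    record MavenCoordinates(String groupId, String artifactId, String version,
                            String classifier, String packaging) {}

    /**
     * Parses "groupId:artifactId:version[:classifier[:packaging]]".
     * This part order is an assumption for illustration.
     */
    static MavenCoordinates parse(String coordinates) {
        // The -1 limit keeps empty parts, e.g. an empty classifier in "g:a:1::jar".
        String[] parts = coordinates.split(":", -1);
        if (parts.length < 3 || parts.length > 5) {
            throw new IllegalArgumentException("Expected 3 to 5 parts: " + coordinates);
        }
        String classifier = parts.length > 3 ? parts[3] : "";    // default: empty string
        String packaging = parts.length > 4 ? parts[4] : "jar";  // default: "jar"
        return new MavenCoordinates(parts[0], parts[1], parts[2], classifier, packaging);
    }

    public static void main(String[] args) {
        MavenCoordinates shortForm = parse("com.google.guava:guava:28.2-jre");
        MavenCoordinates longForm = parse("com.google.guava:guava:28.2-jre::jar");
        // Different strings, same artifact once the defaults are applied.
        System.out.println(shortForm.equals(longForm)); // true
    }
}
```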

There is no guarantee that an artifact identified by syntactically correct Maven coordinates exists in, or can be located in, any particular repository.

Maven Projects

A Maven project is identified by a group ID and an artifact ID. A Maven artifact belongs to the project that has the same group ID and artifact ID as the artifact.

Most projects have more than one version, and many projects have more than one artifact in each version. Each version of a project is defined in a pom.xml file, which has a single group ID, artifact ID, and version. The pom.xml lists zero or more dependencies of the project, each identified by a dependency element in its dependencies section. It also specifies one or more artifacts. Most project versions produce a single artifact; however, a project can use different classifiers and packaging types to produce multiple artifacts.
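One way to model the project/artifact relationship is sketched below in Java. All names are hypothetical, and dependencies are left out to keep it short. The project key is just the group ID and artifact ID, and a project version rejects artifacts whose group ID, artifact ID, or version do not match it.

```java
import java.util.List;

public final class ProjectModelSketch {

    /** A project is identified by group ID and artifact ID only. */
    record ProjectKey(String groupId, String artifactId) {}

    /** Coordinates of one artifact; classifier and packaging distinguish a version's artifacts. */
    record ArtifactCoordinates(String groupId, String artifactId, String version,
                               String classifier, String packaging) {
        ProjectKey project() {
            return new ProjectKey(groupId, artifactId);
        }
    }

    /** One version of a project (one pom.xml), producing one or more artifacts. */
    record ProjectVersion(ProjectKey project, String version, List<ArtifactCoordinates> artifacts) {
        ProjectVersion {
            if (artifacts.isEmpty()) {
                throw new IllegalArgumentException("A project version specifies at least one artifact");
            }
            for (ArtifactCoordinates a : artifacts) {
                if (!a.project().equals(project) || !a.version().equals(version)) {
                    throw new IllegalArgumentException("Artifact does not belong to this version: " + a);
                }
            }
            artifacts = List.copyOf(artifacts);
        }
    }

    /** An artifact belongs to the project with the same group ID and artifact ID. */
    static boolean belongsTo(ArtifactCoordinates artifact, ProjectKey project) {
        return artifact.project().equals(project);
    }

    public static void main(String[] args) {
        ProjectKey guava = new ProjectKey("com.google.guava", "guava");
        ArtifactCoordinates jar =
            new ArtifactCoordinates("com.google.guava", "guava", "28.2-jre", "", "jar");
        ArtifactCoordinates sources =
            new ArtifactCoordinates("com.google.guava", "guava", "28.2-jre", "sources", "jar");
        ProjectVersion version = new ProjectVersion(guava, "28.2-jre", List.of(jar, sources));
        System.out.println(version.artifacts().size()); // 2
        System.out.println(belongsTo(sources, guava));  // true
    }
}
```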

Projects also have various other metadata such as organization, copyright, issue tracker URL, developers, and more that we do not need to consider or model.

Projects may contain subprojects, called modules, that are built at the same time. However, this is a compile-time-only distinction. For our purposes, all projects are co-equal.

Dependency

A dependency belongs to a specific version of a project. It is defined by a dependency element in that version's pom.xml. A dependency contains:

  • The group ID of a Maven project
  • The artifact ID of a Maven project
  • A version range
  • A scope: compile (the default), runtime, provided, test, or system
  • A type (a string; Maven's default is "jar")
  • An optional flag (a boolean; defaults to false)

Our current code handles only single-element version ranges, though this could be expanded in the future.

There is also a systemPath element, which is rarely used and which we ignore.

The dependencies of a project are ordered, and the order of dependencies can have significant effects on both the compile time and runtime classpaths.
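A dependency element could be modeled as a small immutable value, as in this hypothetical Java sketch. The version range is kept as an uninterpreted string, `systemPath` is omitted, and the "jar" default for the type comes from Maven's POM reference. Holding the dependencies in a `List` rather than a `Set` preserves their declaration order, since that order affects classpaths.

```java
import java.util.List;

public final class DependencySketch {

    /** The five scopes listed above; COMPILE is the default. */
    enum Scope { COMPILE, RUNTIME, PROVIDED, TEST, SYSTEM }

    /** One dependency element. The version range is an uninterpreted string; systemPath is omitted. */
    record Dependency(String groupId, String artifactId, String versionRange,
                      Scope scope, String type, boolean optional) {

        /** Applies the defaults for omitted elements: compile scope, type "jar", not optional. */
        static Dependency of(String groupId, String artifactId, String versionRange) {
            return new Dependency(groupId, artifactId, versionRange, Scope.COMPILE, "jar", false);
        }
    }

    public static void main(String[] args) {
        // A List keeps the declaration order from pom.xml.
        List<Dependency> dependencies = List.of(
            Dependency.of("com.google.guava", "guava", "28.2-jre"),
            new Dependency("junit", "junit", "4.13", Scope.TEST, "jar", false));
        for (Dependency d : dependencies) {
            System.out.println(d.groupId() + ":" + d.artifactId() + " " + d.scope());
        }
    }
}
```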

TBD: effective POM?

pom.xml

Dependency Graph

A dependency graph is built by starting with a specific Maven artifact identified by Maven coordinates. This starting artifact is called the root node.

This is a directed graph. It contains no self-links unless a pom.xml depends on itself. That would be an error in the POM, but it can happen.

The edges of each node (the dependencies) are ordered. That is, the edges from a node form a list, not a set. Technically this means it's not a graph.

Parallel links (more than one edge from the same node to the same node) are merged.
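The graph rules above (ordered edges, merged parallel links, self-links kept when a POM declares them) can be sketched as a breadth-first build from the root. This is a hypothetical Java illustration, not the planned DependencyGraph class: nodes are plain coordinate strings, and the `directDependencies` function is an assumed stand-in for reading each pom.xml and returning its dependencies in declaration order.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

public final class DependencyGraphSketch {

    private final String root;
    // Node -> ordered successor list; LinkedHashMap keeps nodes in discovery order.
    private final Map<String, List<String>> edges = new LinkedHashMap<>();

    private DependencyGraphSketch(String root) {
        this.root = root;
        edges.put(root, new ArrayList<>());
    }

    /** Appends an edge in declaration order; a parallel edge is merged into the existing one. */
    private void addEdge(String from, String to) {
        List<String> successors = edges.computeIfAbsent(from, k -> new ArrayList<>());
        if (!successors.contains(to)) {
            successors.add(to);
        }
        edges.computeIfAbsent(to, k -> new ArrayList<>());
    }

    /** Builds the graph breadth-first from the root using the hypothetical directDependencies lookup. */
    static DependencyGraphSketch build(String root, Function<String, List<String>> directDependencies) {
        DependencyGraphSketch graph = new DependencyGraphSketch(root);
        Set<String> seen = new HashSet<>(List.of(root));
        Deque<String> queue = new ArrayDeque<>(List.of(root));
        while (!queue.isEmpty()) {
            String node = queue.removeFirst();
            for (String dependency : directDependencies.apply(node)) {
                graph.addEdge(node, dependency);  // a self-link is kept if a POM lists itself
                if (seen.add(dependency)) {
                    queue.addLast(dependency);    // expand each node once, so cycles terminate
                }
            }
        }
        return graph;
    }

    String root() {
        return root;
    }

    List<String> successors(String node) {
        return List.copyOf(edges.getOrDefault(node, List.of()));
    }

    public static void main(String[] args) {
        // Hypothetical POM contents: a lists b twice (merged) and c; c depends back on a.
        Map<String, List<String>> poms = Map.of(
            "g:a:1", List.of("g:b:1", "g:c:1", "g:b:1"),
            "g:b:1", List.of(),
            "g:c:1", List.of("g:a:1"));
        DependencyGraphSketch graph = build("g:a:1", k -> poms.getOrDefault(k, List.of()));
        System.out.println(graph.successors("g:a:1")); // [g:b:1, g:c:1]
        System.out.println(graph.successors("g:c:1")); // [g:a:1]
    }
}
```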

The full graph incl

TBD: profiles?

Dependency Tree

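There is no such thing. A Maven dependency graph is not a tree. It can, and more often than not does, contain cycles. Even without cycles, the same artifact can be reached along more than one path (for example, when two dependencies both depend on a third), which a tree cannot represent.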
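Because a dependency graph can contain cycles and can reach the same node along several paths, code that walks it should not assume a tree. The hypothetical Java sketch below uses a standard three-color depth-first search to detect cycles in an adjacency map; the "diamond" in `main` is acyclic but still not a tree. The recursion could overflow the stack on very deep graphs, so an iterative version would be safer in practice.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public final class CycleCheckSketch {

    private enum Color { WHITE, GRAY, BLACK }

    /** Returns true if the directed graph contains a cycle; self-links count as cycles. */
    static boolean hasCycle(Map<String, List<String>> graph) {
        Map<String, Color> color = new HashMap<>();
        for (String node : graph.keySet()) {
            if (color.getOrDefault(node, Color.WHITE) == Color.WHITE && visit(node, graph, color)) {
                return true;
            }
        }
        return false;
    }

    private static boolean visit(String node, Map<String, List<String>> graph, Map<String, Color> color) {
        color.put(node, Color.GRAY);  // node is on the current DFS path
        for (String next : graph.getOrDefault(node, List.of())) {
            Color c = color.getOrDefault(next, Color.WHITE);
            if (c == Color.GRAY) {
                return true;          // back edge: cycle found
            }
            if (c == Color.WHITE && visit(next, graph, color)) {
                return true;
            }
        }
        color.put(node, Color.BLACK); // node and its descendants fully explored
        return false;
    }

    public static void main(String[] args) {
        // Diamond: d is reachable via b and via c. Not a tree, yet acyclic.
        Map<String, List<String>> diamond = Map.of(
            "a", List.of("b", "c"), "b", List.of("d"), "c", List.of("d"), "d", List.of());
        // Adding d -> a closes a cycle.
        Map<String, List<String>> cyclic = Map.of(
            "a", List.of("b", "c"), "b", List.of("d"), "c", List.of("d"), "d", List.of("a"));
        System.out.println(hasCycle(diamond)); // false
        System.out.println(hasCycle(cyclic));  // true
    }
}
```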

Classpath

A classpath is an ordered list of jar files, zip files, and directories, each of which contains Java class files. Build tools based on Maven repositories, such as Maven, Gradle, and Ivy, use different algorithms to convert a dependency graph into a classpath. That is, a dependency graph does not determine a unique classpath. javac and the Java virtual machine read only the classpath; they do not consider the dependency graph.
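To illustrate why one graph can yield different classpaths, here is a simplified Java sketch of one possible strategy: breadth-first, nearest-first selection, loosely modeled on Maven's "nearest definition wins" mediation. It is not the exact algorithm of Maven, Gradle, or Ivy (Gradle, for example, prefers the highest requested version by default), and it ignores scopes, optional flags, exclusions, and the mapping from coordinates to jar files. Coordinates are assumed to be in the three-part `groupId:artifactId:version` form.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public final class ClasspathSketch {

    /** "groupId:artifactId:version" -> "groupId:artifactId". Assumes the three-part form. */
    private static String projectOf(String coordinates) {
        return coordinates.substring(0, coordinates.lastIndexOf(':'));
    }

    /**
     * Nearest-first selection: the first version of each project reached breadth-first wins,
     * ties at equal depth go to the earlier-declared edge, and losing versions are not expanded.
     */
    static List<String> classpath(String root, Map<String, List<String>> graph) {
        List<String> result = new ArrayList<>();
        Set<String> chosenProjects = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>(List.of(root));
        while (!queue.isEmpty()) {
            String node = queue.removeFirst();
            if (!chosenProjects.add(projectOf(node))) {
                continue;  // a nearer version already won; this also stops cycles
            }
            result.add(node);
            queue.addAll(graph.getOrDefault(node, List.of()));
        }
        return result;
    }

    public static void main(String[] args) {
        // app needs lib-y 1.0 (which needs lib-x 2.0) and lib-x 1.0 directly.
        Map<String, List<String>> graph = Map.of(
            "g:app:1", List.of("g:lib-y:1.0", "g:lib-x:1.0"),
            "g:lib-y:1.0", List.of("g:lib-x:2.0"));
        System.out.println(classpath("g:app:1", graph)); // [g:app:1, g:lib-y:1.0, g:lib-x:1.0]
    }
}
```

A highest-version strategy applied to the same graph would put `g:lib-x:2.0` on the classpath instead, which is exactly the kind of divergence between tools described above.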

Library Design (not yet implemented)

This is how we model the above concepts.

Artifacts are represented by the Aether Artifact class.

Dependencies are represented by the Dependency class.

Dependency graphs are represented by the DependencyGraph class.

Instead of producing a plain classpath, we produce an annotated classpath. This is an ordered list of jar files, like the classpath Maven or Gradle would produce. However, in our data structure, each entry in the list is annotated with its Maven coordinates and with a pointer to the corresponding Dependency node in the graph.
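The annotated classpath could be represented along these lines. `DependencyNode` and `ClasspathEntry` are hypothetical placeholders (the real design would point at the Dependency and DependencyGraph classes named above), and jar paths are assumed to have been resolved already. Dropping the annotations recovers the plain classpath string that javac and the JVM actually read.

```java
import java.io.File;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

public final class AnnotatedClasspathSketch {

    /** Hypothetical stand-in for a node in the dependency graph. */
    record DependencyNode(String coordinates, List<DependencyNode> children) {}

    /** One classpath element: the jar, its coordinates, and the graph node it came from. */
    record ClasspathEntry(Path jar, String coordinates, DependencyNode node) {}

    /** Strips the annotations, producing the plain classpath string that javac or java would see. */
    static String toClasspathString(List<ClasspathEntry> entries) {
        return entries.stream()
            .map(entry -> entry.jar().toString())
            .collect(Collectors.joining(File.pathSeparator));
    }

    public static void main(String[] args) {
        DependencyNode guava = new DependencyNode("com.google.guava:guava:28.2-jre", List.of());
        List<ClasspathEntry> classpath = List.of(
            new ClasspathEntry(Path.of("repo", "guava-28.2-jre.jar"), guava.coordinates(), guava));
        // Each entry can be traced back to the graph node that put it on the classpath.
        System.out.println(classpath.get(0).node().coordinates()); // com.google.guava:guava:28.2-jre
        System.out.println(toClasspathString(classpath));
    }
}
```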