\documentclass[12pt]{amsart}
\usepackage{geometry} % see geometry.pdf on how to lay out the page. There's lots.
\geometry{a4paper} % or letter or a5paper or ... etc
% \geometry{landscape} % rotated page geometry
% See the ``Article customise'' template for some common customisations
\title{Maven Project Builder Specification}
\author{Shane Isbell}
\begin{document}
\maketitle
\tableofcontents
\section{Model Transformations}
\subsection{Canonical Data Format}
Maven supports a canonical data format for the pom that includes 465 model properties, where a model property is an ordered pair of a URI and a value. Example URIs are:
\begin{verbatim}
http://apache.org/maven/project/modelVersion
http://apache.org/maven/project/groupId
http://apache.org/maven/project/artifactId
\end{verbatim}
So a valid set would contain ordered pairs:
\begin{math}
\mathcal{A}= \{\langle \texttt{"http://apache.org/maven/project/groupId"}, \texttt{"org.apache.maven"}\rangle,
\langle \texttt{"http://apache.org/maven/project/artifactId"}, \texttt{"mavencore"}\rangle, \ldots\}.
\end{math}
Technically \begin{math}\mathcal{A}\end{math} is also ordered.
Anyone is free to create a transformer from any other format (YAML, .NET project files, etc.) to this canonical data format, which gives that format all the benefits of project inheritance and interpolation that Maven uses in building a project.
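The following Python sketch illustrates one way such a transformer could flatten a pom into an ordered list of model properties. It is not the Maven implementation: the function \texttt{to\_model\_properties} is hypothetical, the base URI is inferred from the example URIs above, and collection and set markers are omitted for brevity.
\begin{verbatim}
import xml.etree.ElementTree as ET

# Assumption: prefix inferred from the example URIs in this section.
BASE_URI = "http://apache.org/maven"

def to_model_properties(pom_xml):
    """Flatten pom XML into an ordered list of (uri, value) pairs."""
    properties = []

    def visit(element, parent_uri):
        uri = parent_uri + "/" + element.tag
        text = (element.text or "").strip()
        # Container nodes carry no text, so their value is None.
        properties.append((uri, text or None))
        for child in element:
            visit(child, uri)

    visit(ET.fromstring(pom_xml), BASE_URI)
    return properties

pom = """<project>
  <groupId>org.apache.maven</groupId>
  <artifactId>mavencore</artifactId>
</project>"""
model_properties = to_model_properties(pom)
\end{verbatim}
For this pom the result starts with the pair for \emph{http://apache.org/maven/project} (with a null value), followed by the groupId and artifactId pairs in document order.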
\subsubsection{Collections}
A model property may be specified as a collection, which allows specialized join rules for adding model properties. Collections of collections are allowed.
\begin{verbatim}
http://apache.org/maven/project/build/plugins#collection
http://apache.org/maven/project/build/plugins#collection/plugin/executions#collection
http://apache.org/maven/project/profiles#collection
\end{verbatim}
There are 31 collections within the canonical data format.
\subsubsection{Sets}
A model property may be specified as a set, which means that model properties are not duplicated. Generally sets are only used on configuration properties of the plugins.
\begin{verbatim}
http://...pluginManagement/plugins#collection/plugin/configuration#set
http://...plugins#collection/plugin/configuration#set
\end{verbatim}
\subsubsection{Singletons}
Any model property not defined as a collection or set is a singleton. This means that only one entry containing the model property's URI is allowed in the transformed list.
\section{General Inheritance Rules}
General inheritance rules are those rules applied to a list of model properties, independent of the domain context. The framework delegates domain specific inheritance rules to ModelTransformers provided to it by the invoking application. These will be covered in the next section, under \emph{Maven Project Inheritance Rules}.
\subsection{Constructing}
Basic construction rules are as follows:
\begin{enumerate}
\item Let there be a collection of domain models (poms) denoted by set \begin{math}\mathcal{D}_{i}\end{math}, where for some \begin{math} n \in \mathbb{N} \end{math} the collection is ordered from most specific to least specific \begin{math}\mathcal{C} = \{\mathcal{D}_{0}, \mathcal{D}_{1},...,\mathcal{D}_{n} \} \end{math}. \begin{math}\mathcal{D}_{n}\end{math} is the SuperPom and must be contained within \begin{math}\mathcal{C}\end{math}, even if it is the only element.
\item Let \begin{math}p_j\end{math} be an ordered pair (or model property). In the case of the pom, \begin{math}j\end{math} indexes the nodes and properties of the pom. Define \begin{math}t\end{math} as a function operating on elements of \begin{math}\mathcal{C}\end{math} that transforms each element to a set of model properties: \begin{math}\mathcal{D}'_{i}=t(\mathcal{D}_{i})=\{p_0,p_1,\ldots,p_m\}\end{math}. We end up with a transformed collection of domain models: \begin{math}\mathcal{C'} = \{\mathcal{D'}_{0}, \mathcal{D'}_{1},\ldots,\mathcal{D'}_{n} \} \end{math}.
\item Next, domain specific rules are applied (see section 3). Let \begin{math}tr\end{math} be a domain transform rule: \begin{math}
\forall_j \forall_i \mathcal{A}_{i,j} \subseteq \mathcal{D'}_{i} \end{math} such that \begin{math} \mathcal{A'}_{i,j} = \{tr(p_0), tr(p_1),\ldots,tr(p_n)\} \end{math}. \begin{math}tr\end{math} may change either the model property URI or value, may transform the property to a null value (remove it), or may add additional model properties not from the original set. We are left with \begin{math}
\mathcal{C''} = \forall_j \forall_i \bigcup_{i,j}(\mathcal{D'}_{i} \cup \mathcal{A'}_{i,j} - (\mathcal{D'}_{i} \cap\mathcal{A}_{i,j})) \end{math}. Thus \begin{math}\mathcal{C''} \end{math} is just a set of transformed and untransformed model properties from all of the domain models \begin{math}
\mathcal{D}_{i} \end{math}. These model properties are still ordered from most specialized model to least.
\item Model properties are now sorted (see section 2.2). Collections and sets are left in reverse order.
\item Model container rules are applied for collections and sets. The general sorting in the previous step doesn't know how to handle collections and sets and needs to delegate back to the domain specific model containers (Sections 3.4 and 3.5).
\item Model properties are sorted (see section 2.2) again. This is to maintain the original order of collections.
\item Model properties are interpolated (Section 5).
\item Dependency management rules are applied.
\item Plugin management rules are applied.
\end{enumerate}
\subsection{Sorting}
Let \begin{math}\mathcal{C''}\end{math} be the original set of model properties and let \begin{math}\mathcal{C'''}\end{math} be the new set, which is initially empty. Now iterate over each element (model property) of \begin{math}\mathcal{C''}\end{math} placing the model property into set \begin{math}\mathcal{C'''}\end{math} according to the following rules:
\begin{enumerate}
\item Let \begin{math}u_{i}\end{math} be the uri from model property \begin{math}\langle uri, value\rangle_{i} \end{math}. If \begin{math}u_{i}\end{math} = baseUri, then it is placed first in the list. In the case of Maven, \emph{http://apache.org/maven/project} is the baseUri and defines the top node name.
\item If \begin{math}u_{i}\end{math} is not within any of the model properties contained within \begin{math}\mathcal{C'''}\end{math} then place the model property into \begin{math}\mathcal{C'''}\end{math}. This rule allows only one instance of a singleton, such as \emph{http://.../project/groupId}, into the set, and since \begin{math}\mathcal{C''}\end{math} is sorted in order of most specialized to least specialized, only the most specialized pom values will be maintained.
\item If \begin{math}u_{i}\end{math} contains \#collection or \#set but does not end with \#collection or \#set, then place the model property into \begin{math}\mathcal{C'''}\end{math}, at the position of its first (and only) parent. For example, \emph{http://.../project/build/plugins\#collection} would have been added in the previous step (because it was not contained in \begin{math}\mathcal{C'''}\end{math}) but this step would exclude any additional model properties containing \emph{http://.../project/build/plugins\#collection}. However, all model properties containing uri \emph{http://apache.org/maven/project/build/plugins\#collection/plugin} would be added just below its collection. Only one node of a collection type is maintained but multiple children within that collection are allowed.
\end{enumerate}
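The three rules can be sketched in Python as follows. This is an illustrative reading of the rules, not the framework's code: the name \texttt{sort\_properties} is hypothetical, and placing a new entry directly below its parent (or at the front when the parent is absent) is an assumption for the second rule, which does not state a position.
\begin{verbatim}
MARKERS = ("#collection", "#set")

def sort_properties(properties, base_uri):
    """Sort (uri, value) pairs, given most specialized first."""
    result, uris = [], []          # uris mirrors the order of result
    for uri, value in properties:
        parent = uri.rsplit("/", 1)[0]
        member = (any(m in uri for m in MARKERS)
                  and not uri.endswith(MARKERS))
        if uri == base_uri and uri not in uris:
            position = 0           # rule 1: base URI goes first
        elif uri not in uris or member:
            # Rule 2 keeps the first occurrence of a URI; rule 3
            # keeps every member of a collection or set.
            # Assumption: an entry goes directly below its parent.
            if parent in uris:
                position = uris.index(parent) + 1
            else:
                position = 0
        else:
            continue               # duplicate from a less specific pom
        result.insert(position, (uri, value))
        uris.insert(position, uri)
    return result
\end{verbatim}
Because each member is inserted directly below its parent, members of a collection come out in reverse order, which matches the note in the construction rules that collections and sets are left in reverse order after the first sort.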
\subsection{Mixins and Multiple Inheritance}
Currently, Maven 3.0 supports linearized inheritance, making mixins and multiple inheritance easy. Support for multiple inheritance would require an addition to the pom, within the parents section.
\begin{verbatim}
<parents>
<parent>
<groupId>org.apache.maven.shared</groupId>
<artifactId>maven-shared-components</artifactId>
<version>9</version>
</parent>
<parent>
<artifactId>maven</artifactId>
<groupId>org.apache.maven</groupId>
<version>3.0-SNAPSHOT</version>
</parent>
</parents>
\end{verbatim}
In the case above, the child pom's model properties would be first in the set, followed by the model properties of \emph{maven-shared-components}, then by the \emph{maven} project's model properties, and finally by the SuperPom's model properties. So from the framework's perspective there is little difference between multiple inheritance and single inheritance.
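A minimal Python sketch of this linearization, assuming each pom has already been transformed to a list of model properties (the function name \texttt{linearize} is hypothetical):
\begin{verbatim}
def linearize(child, parents, super_pom):
    """Concatenate model property lists, most specialized first."""
    ordered = list(child)
    for parent in parents:       # parents in their declared order
        ordered.extend(parent)
    ordered.extend(super_pom)    # the SuperPom is always last
    return ordered
\end{verbatim}
With a single parent the same function applies unchanged, which is why the framework sees little difference between the two cases.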
Mixins would function the same as multiple/single inheritance:
\begin{verbatim}
<mixins>
<mixin>
<groupId>org.apache.maven</groupId>
<artifactId>dependency-mixin</artifactId>
<version>1</version>
</mixin>
<mixin>
<groupId>org.apache.maven</groupId>
<artifactId>repository-mixin</artifactId>
<version>2</version>
</mixin>
</mixins>
\end{verbatim}
The only difference between a parent project and a mixin is that the mixin is abstract (not a complete model).
\section{Maven Project Inheritance Rules}
The rules outlined in this section are provided in the PomClassicTransformer class. The shared-model framework will delegate to this transformer for the processing of the Maven specific domain model rules.
\subsection{Inheriting Version and Group Ids}
If \emph{project.version} is not specified within the child pom, the child pom will use the \emph{project.parent.version} as its own version. Similarly, if \emph{project.groupId} is not within the child pom, the child pom will use the \emph{project.parent.groupId} as its own \emph{project.groupId}.
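As a sketch only, the rule can be expressed in Python over plain dictionaries. The function name and the dictionary representation are assumptions made for illustration; \texttt{project} holds the child's own values and \texttt{parent\_ref} holds the values under \emph{project.parent}.
\begin{verbatim}
def inherit_coordinates(project, parent_ref):
    """Fill a missing groupId or version from project.parent."""
    resolved = dict(project)
    for key in ("groupId", "version"):
        if not resolved.get(key):
            resolved[key] = parent_ref[key]
    return resolved

parent_ref = {"groupId": "org.apache.maven",
              "artifactId": "maven",
              "version": "3.0-SNAPSHOT"}
child = inherit_coordinates({"artifactId": "maven-core"}, parent_ref)
\end{verbatim}
Here the child declares only an artifactId, so it takes both its groupId and its version from the parent reference. The artifactId is never inherited.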
\subsection{Properties Excluded From Inheritance}
A child project does not inherit the following properties from its specified parent project. All other properties are inherited, unless otherwise noted below.
\begin{enumerate}
\item project.parent
\item project.modules
\item project.name
\item project.packaging
\item project.profiles
\item project.version
\item project.groupId
\item project.build.resources
\item project.build.testResources
\item project.pluginRepositories
\end{enumerate}
A parent project can set the \emph{inherited} property to false within the \emph{project.build.plugins.plugin} or \emph{project.build.plugins.plugin.executions.execution} section of the pom to mark the member as private, thus preventing inheritance.
\begin{verbatim}
<plugin>
<groupId>org.apache.maven</groupId>
<artifactId>sample</artifactId>
<version>1.0</version>
<inherited>false</inherited>
</plugin>
\end{verbatim}
\begin{verbatim}
<plugin>
<groupId>org.apache.maven</groupId>
<artifactId>sample</artifactId>
<version>1.0</version>
<executions>
<execution>
<inherited>false</inherited>
</execution>
</executions>
</plugin>
\end{verbatim}
\subsection{Super Pom Rules}
The super pom is implicitly the parent of every project and functions as a normal parent project except that it allows the child project to inherit the following properties, which are normally excluded. If the child project defines the property, it overrides all elements of the respective super pom property. There is no join of these three superPom-child collections.
\begin{itemize}
\item project.build.resources
\item project.build.testResources
\item project.pluginRepositories
\end{itemize}
\subsection{Artifact Inheritance (Model Container)}
\subsubsection{Defined Nodes}
Within the project there are a number of nodes which contain artifactId, groupId and version. These nodes may be inherited or even joined.
\begin{enumerate}
\item project.dependencies.dependency
\item project.build.plugins.plugin
\item project.build.plugins.plugin.dependencies.dependency
\item project.build.plugins.plugin.dependencies.dependency.exclusions.exclusion
\item project.dependencyManagement.dependencies.dependency
\item project.build.pluginManagement.plugins.plugin
\item project.build.pluginManagement.plugins.plugin.dependencies.dependency
\end{enumerate}
\subsubsection{Rules}
Let the parent project be \begin{math}\mathcal{A}\end{math} and the child project be \begin{math}\mathcal{B}\end{math}. Let both \begin{math}\alpha_i \subset \mathcal{A}\end{math} and \begin{math}\beta_i \subset \mathcal{B}\end{math} be one of the elements listed above. For example, \begin{math}\alpha_1\end{math} would contain all the elements of a project dependency within the parent project.
Both \begin{math}\alpha_i \subset \mathcal{A}\end{math} and \begin{math}\beta_i \subset \mathcal{B}\end{math} contain at least the elements groupId and artifactId, with an implied version value of null if version is not specified within the pom. Also note that only (1), (3), (5) and (7) have an element of type; otherwise the type value is considered null. More precisely we have:
\begin{math} \forall_i \forall_j \alpha_i = \{\langle groupId, value_j \rangle_i, \langle artifactId, value_{j+1}\rangle_i, \langle version, value_{j+2}\rangle_i, \ldots\}.\end{math}
The inheritance rules are:
\begin{enumerate}
\item
\begin{math}
groupId(value)^{\alpha_i} \neq groupId(value)^{\beta_i} \vee artifactId(value)^{\alpha_i} \neq artifactId(value)^{\beta_i} \Rightarrow \mathcal{B}_{new} = \mathcal{B} \cup \alpha_i
\end{math}
\item
\begin{math}
groupId(value)^{\alpha_i} = groupId(value)^{\beta_i} \wedge artifactId(value)^{\alpha_i} = artifactId(value)^{\beta_i} \wedge version(value)^{\alpha_i} \neq version(value)^{\beta_i} \wedge type(value)^{\alpha_i} = type(value)^{\beta_i} \Rightarrow \mathcal{B}_{new} = \mathcal{B} - \alpha_i
\end{math}
\item
\begin{math}
groupId(value)^{\alpha_i} = groupId(value)^{\beta_i} \wedge artifactId(value)^{\alpha_i} = artifactId(value)^{\beta_i} \wedge version(value)^{\alpha_i} \neq version(value)^{\beta_i} \wedge type(value)^{\alpha_i} \neq type(value)^{\beta_i} \Rightarrow \mathcal{B}_{new} = \mathcal{B} \cup \alpha_i
\end{math}
\item
\begin{math}
groupId(value)^{\alpha_i} = groupId(value)^{\beta_i} \wedge artifactId(value)^{\alpha_i} = artifactId(value)^{\beta_i} \wedge version(value)^{\alpha_i} = version(value)^{\beta_i} \wedge type(value)^{\alpha_i} = type(value)^{\beta_i} \Rightarrow \mathcal{B}_{new} = \mathcal{B} \cup \alpha_i - (\alpha_i \cap \beta_i)
\end{math}
\item
\begin{math}
groupId(value)^{\alpha_i} = groupId(value)^{\beta_i} \wedge artifactId(value)^{\alpha_i} = artifactId(value)^{\beta_i} \wedge version(value)^{\alpha_i} = version(value)^{\beta_i} \wedge type(value)^{\alpha_i} \neq type(value)^{\beta_i} \Rightarrow \mathcal{B}_{new} = \mathcal{B} \cup \alpha_i
\end{math}
\end{enumerate}
Or more simply:
\begin{enumerate}
\item if either groupId or artifactId is different, then inherit the node.
\item if groupId, artifactId and type match but the version doesn't, then don't inherit the node.
\item if groupId and artifactId match but the version and types don't, then inherit the node.
\item if groupId, artifactId, version and type match, then inherit and join the node.
\item if groupId, artifactId and version match but type doesn't, then inherit the node.
\end{enumerate}
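The five rules collapse into three comparisons, as the following Python sketch shows. The function name and the dictionary representation of a node are hypothetical. A missing version or type is read as null, as described above.
\begin{verbatim}
def artifact_rule(parent, child):
    """Return 'inherit', 'skip' or 'join' for a parent node."""
    def same(key):
        # A missing version or type is treated as null (None).
        return parent.get(key) == child.get(key)

    if not (same("groupId") and same("artifactId")):
        return "inherit"   # rule 1
    if not same("type"):
        return "inherit"   # rules 3 and 5
    if not same("version"):
        return "skip"      # rule 2
    return "join"          # rule 4
\end{verbatim}
Checking type before version is what lets rules 3 and 5 share a branch: once the types differ, the node is inherited whether or not the versions match.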
\subsection{Id Inheritance (Model Container)}
\subsubsection{Defined Nodes}
Within the project there are a number of nodes which contain id. These nodes may be inherited or even joined.
\begin{enumerate}
\item project.pluginRepositories.pluginRepository
\item project.repositories.repository
\item project.reporting.plugins.plugin.reportSets.reportSet
\item project.build.plugins.plugin.executions.execution
\end{enumerate}
\subsubsection{Rules}
If an id exists in both the parent and child pom and the ids are equal, then join the nodes, otherwise inherit the node.
\subsection{Plugin Configuration Inheritance}
Plugin configuration nodes are treated as a set. If a child pom contains the same element as a parent pom, then the parent pom element will not be inherited/joined unless the child element carries the attribute \texttt{combine.children="append"}. In this case, the element is treated as a collection.
\begin{verbatim}
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<configuration>
<testExcludes combine.children="append">
<testExclude implementation="java.lang.String">
**/PersonThreeTest.java
</testExclude>
</testExcludes>
</configuration>
</plugin>
\end{verbatim}
If the parent pom contains an element that the child pom does not have, the element will be inherited.
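The following Python sketch illustrates these configuration rules under simplifying assumptions. A configuration is modelled as a dictionary from element name to a pair of attributes and children, the function name is hypothetical, and placing the child's entries before the parent's when appending is an assumption that follows the most-specialized-first ordering used elsewhere in this document.
\begin{verbatim}
def merge_configuration(parent, child):
    """Merge dicts of element name -> (attributes, children)."""
    merged = dict(parent)   # parent-only elements are inherited
    for name, (attrs, children) in child.items():
        append = attrs.get("combine.children") == "append"
        if name in parent and append:
            # Treated as a collection: keep both sets of children.
            merged[name] = (attrs, children + parent[name][1])
        else:
            # Treated as a set: the child element wins.
            merged[name] = (attrs, children)
    return merged
\end{verbatim}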
\section{Management Rules}
\subsection{Dependency/Plugin Management}
Dependency and plugin management are treated the same, so we will only cover dependency management. Our initial set has already been processed for inheritance and interpolated by the time these rules are applied.
Let \begin{math}\mathcal{A}\end{math} be the set of \emph{project.dependencies.dependency} model containers (model containers are themselves sets of model properties).
Let \begin{math}\mathcal{B}\end{math} be the set of
\emph{project.dependencyManagement.dependencies.dependency} model containers. \begin{math}\mathcal{B}\end{math} is processed such that each dependencyManagement reference within its uris is removed. Thus the uris exactly match those contained within \begin{math}\mathcal{A}\end{math}. Call this transformed set \begin{math} \mathcal{B'}\end{math}.
Now we can apply the same artifact container rules between each \begin{math} \mathcal{B'}_{i} \end{math} and \begin{math} \mathcal{A}_{j}\end{math} as those defined in section 3.4.
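The uri rewrite that produces \begin{math}\mathcal{B'}\end{math} can be sketched in Python as below. The function name is hypothetical and the uri in the example is illustrative, since the exact canonical uris for dependency management are not listed in this document.
\begin{verbatim}
def strip_management(properties, segment="/dependencyManagement"):
    """Remove the management segment from each (uri, value) pair."""
    return [(uri.replace(segment, "", 1), value)
            for uri, value in properties]

managed = [("http://apache.org/maven/project/dependencyManagement"
            "/dependencies#collection/dependency/version", "3.8.1")]
rewritten = strip_management(managed)
\end{verbatim}
Passing \texttt{"/pluginManagement"} as the segment gives the corresponding rewrite for plugin management.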
\section{Interpolation Rules}
\subsection{Type of Properties}
There are four types of properties in the following order of precedence: maven properties, system (user) properties, project properties, environment (execution) variables.
\subsubsection{Maven Properties}
There are two maven specific properties that can be used: \$\{basedir\} (or \$\{pom.basedir\} or \$\{project.basedir\}) and \$\{build.timestamp\}. \emph{basedir} denotes the project directory of the executing pom, while \emph{build.timestamp} denotes the time that the build started.
\begin{verbatim}
<build>
<directory>${project.basedir}/target</directory>
<sourceDirectory>${project.basedir}/src/main/java</sourceDirectory>
</build>
\end{verbatim}
\subsubsection{System Properties}
These properties are defined on the command line through the -D option. For instance, \emph{-DjunitVersion=3.8}. These property values take precedence over project and environment properties and will override them.
\subsubsection{Project Properties} These properties are derived directly from the pom itself: \$\{project.version\}, \$\{project.artifactId\}... So in the code snippet below, \emph{project.build.finalName} will be resolved to \emph{maven-3.0-SNAPSHOT}.
Note that \emph{pom} is an alias for \emph{project}, so you can also reference the properties through \$\{pom.version\}, \$\{pom.artifactId\}..., although \emph{project} is preferred.
These types of properties also include special rules for the \emph{project.properties} section of the pom. The elements under the properties section can be referenced directly, by name, from within other elements of the pom. For example, the \emph{project.properties} section defines \emph{junitVersion}, allowing
\emph{project.dependencies.dependency.version} to reference the value by inserting \$\{junitVersion\}.
\begin{verbatim}
<project>
<groupId>org.apache.maven</groupId>
<artifactId>maven</artifactId>
<version>3.0-SNAPSHOT</version>
<build>
<finalName>${project.artifactId}-${project.version}</finalName>
</build>
<dependencies>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>${junitVersion}</version>
<scope>test</scope>
</dependency>
</dependencies>
<properties>
<junitVersion>3.8.1</junitVersion>
</properties>
</project>
\end{verbatim}
Keep in mind that if you set \emph{-DjunitVersion=3.8} on the command line, then this value would be used for interpolation, not the pom specified one.
\subsubsection{Environment Properties}
The properties are taken from the environment and hold the lowest level of precedence.
\subsection{Processing Rules}
The pom XML is flattened to a list of model properties (this is part of the inheritance processing). The interpolator property list will be referred to as interpolator properties.
\subsubsection{Preprocessing}
The initial interpolator property list is constructed and sorted in order of maven properties, system properties, project properties and environment properties. Being a list, it may contain duplicate property keys that reference different values. A common example occurs when overriding a pom property through the command line -D option. All duplicate keys of lower precedence are therefore eliminated, resulting in a set of interpolator properties, where order does not matter.
The maven property \$\{project.basedir\} is only added to the initial list if the pom being interpolated is within the build (not a dependency within the repository).
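A minimal Python sketch of this preprocessing step, assuming each property type is supplied as a list of key/value pairs and the lists are given in order of precedence (the function name is hypothetical):
\begin{verbatim}
def to_interpolator_set(property_lists):
    """Collapse lists of (key, value), highest precedence first."""
    resolved = {}
    for properties in property_lists:
        for key, value in properties:
            # Keep only the first, highest-precedence value.
            resolved.setdefault(key, value)
    return resolved

system = [("junitVersion", "3.8")]          # from -DjunitVersion=3.8
project = [("junitVersion", "3.8.1"),
           ("project.version", "3.0-SNAPSHOT")]
interpolators = to_interpolator_set([system, project])
\end{verbatim}
In this example the command line value 3.8 is kept for \emph{junitVersion} and the pom value 3.8.1 is discarded.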
\subsubsection{Pass 1 - withhold using build directories as interpolator properties} In this pass, the list is preprocessed into a set that excludes the build directories. In other words, the build directories can be interpolated but they can't be used to interpolate other properties. Interpolating is simply the iteration of all interpolator properties over model properties.
\subsubsection{Pass 2 - set absolute paths on build directories} At this point, the build directories are completely interpolated but they may or may not contain absolute paths. So each build model property is checked and if it contains a relative path, the absolute path is set based on the \$\{project.basedir\} location.
\subsubsection{Pass 3 - use build directories as interpolator properties} In this pass, all model properties that contain a build directory reference are interpolated with the build directory interpolator properties, which were withheld from pass 1. Now all directory paths within the pom contain absolute references.
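The three passes can be sketched in Python as follows. This is a simplified illustration rather than the interpolator itself: the model is a dictionary keyed by dotted property names instead of uris, the function names are hypothetical, POSIX paths are assumed, and taking the pass 3 values from the model after pass 2 is an assumption.
\begin{verbatim}
import posixpath

def interpolate(model, interpolators):
    """Replace each ${key} reference in the model values."""
    result = {}
    for name, value in model.items():
        for key, replacement in interpolators.items():
            value = value.replace("${" + key + "}", replacement)
        result[name] = value
    return result

def interpolate_in_passes(model, interpolators, build_dirs, basedir):
    # Pass 1: build directories are interpolated but are not used
    # as interpolator properties.
    usable = {k: v for k, v in interpolators.items()
              if k not in build_dirs}
    model = interpolate(model, usable)
    # Pass 2: make relative build directories absolute.
    for name in build_dirs:
        if name in model and not posixpath.isabs(model[name]):
            model[name] = posixpath.join(basedir, model[name])
    # Pass 3: the withheld build directories become interpolators.
    withheld = {n: model[n] for n in build_dirs if n in model}
    return interpolate(model, withheld)
\end{verbatim}
Withholding the build directories in pass 1 ensures that other properties only ever receive the absolute paths computed in pass 2.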
\subsection{Interpolation and Profiles}
Active profiles are applied prior to interpolation so that any \emph{project.profiles.profile.properties} defined within an active profile can be used as an interpolation property. [Still to be implemented]
\end{document}