----
|
|
Metadata Content Model
|
|
----
|
|
|
|
Metadata Content Model
|
|
|
|
The metadata repository stores all known information about a repository in a common format that other plugins can
|
|
understand, and that eventually external applications will be able to query.
|
|
|
|
The content model is designed such that it models the most likely structure of the data both for storage and
|
|
retrieval. For example, audit logs are stored by the time they occur, not grouped under an action.
|
|
|
|
* Content Model Structure
|
|
|
|
The following is a sample tree that represents the content model:
|
|
|
|
----
.
`-- repositories/
    `-- central/
        |-- config/
        |   |-- name=
        |   |-- storageUrl=
        |   `-- uri=
        |-- content/
        |   `-- org/
        |       `-- apache/
        |           |-- archiva/
        |           |   `-- platform/
        |           |       |-- scanner/
        |           |       |   |-- 1.0-SNAPSHOT/
        |           |       |   |   |-- scanner-1.0-20091120.012345-1.pom/
        |           |       |   |   |   |-- asc=
        |           |       |   |   |   |-- created=
        |           |       |   |   |   |-- fileCreated=
        |           |       |   |   |   |-- fileLastModified=
        |           |       |   |   |   |-- maven:buildNumber=
        |           |       |   |   |   |-- maven:classifier
        |           |       |   |   |   |-- maven:timestamp=
        |           |       |   |   |   |-- maven:type=
        |           |       |   |   |   |-- md5=
        |           |       |   |   |   |-- sha1=
        |           |       |   |   |   |-- size=
        |           |       |   |   |   |-- updated=
        |           |       |   |   |   `-- version=
        |           |       |   |   |-- ciManagement.system=
        |           |       |   |   |-- ciManagement.url=
        |           |       |   |   |-- created=
        |           |       |   |   |-- dependencies.0.artifactId=
        |           |       |   |   |-- dependencies.0.classifier=
        |           |       |   |   |-- dependencies.0.groupId=
        |           |       |   |   |-- dependencies.0.optional=
        |           |       |   |   |-- dependencies.0.scope=
        |           |       |   |   |-- dependencies.0.systemPath=
        |           |       |   |   |-- dependencies.0.type=
        |           |       |   |   |-- dependencies.0.version=
        |           |       |   |   |-- description=
        |           |       |   |   |-- individuals.0.email=
        |           |       |   |   |-- individuals.0.name=
        |           |       |   |   |-- individuals.0.properties.scmId=
        |           |       |   |   |-- individuals.0.roles.0=
        |           |       |   |   |-- individuals.0.timezone=
        |           |       |   |   |-- issueManagement.system=
        |           |       |   |   |-- issueManagement.url=
        |           |       |   |   |-- licenses.0.name=
        |           |       |   |   |-- licenses.0.url=
        |           |       |   |   |-- mailingLists.0.mainArchiveUrl=
        |           |       |   |   |-- mailingLists.0.name=
        |           |       |   |   |-- mailingLists.0.otherArchives.0=
        |           |       |   |   |-- mailingLists.0.postAddress=
        |           |       |   |   |-- mailingLists.0.subscribeAddress=
        |           |       |   |   |-- mailingLists.0.unsubscribeAddress=
        |           |       |   |   |-- maven:buildExtensions.0.artifactId=
        |           |       |   |   |-- maven:buildExtensions.0.groupId=
        |           |       |   |   |-- maven:buildExtensions.0.version=
        |           |       |   |   |-- maven:packaging=
        |           |       |   |   |-- maven:parent.artifactId=
        |           |       |   |   |-- maven:parent.groupId=
        |           |       |   |   |-- maven:parent.version=
        |           |       |   |   |-- maven:plugins.0.artifactId=
        |           |       |   |   |-- maven:plugins.0.groupId=
        |           |       |   |   |-- maven:plugins.0.reporting=
        |           |       |   |   |-- maven:plugins.0.version=
        |           |       |   |   |-- maven:properties.mavenVersion=
        |           |       |   |   |-- maven:repositories.0.id=
        |           |       |   |   |-- maven:repositories.0.layout=
        |           |       |   |   |-- maven:repositories.0.name=
        |           |       |   |   |-- maven:repositories.0.plugins=
        |           |       |   |   |-- maven:repositories.0.releases=
        |           |       |   |   |-- maven:repositories.0.snapshots=
        |           |       |   |   |-- maven:repositories.0.url=
        |           |       |   |   |-- name=
        |           |       |   |   |-- organization.favicon=
        |           |       |   |   |-- organization.logo=
        |           |       |   |   |-- organization.name=
        |           |       |   |   |-- organization.url=
        |           |       |   |   |-- relocatedTo.namespace=
        |           |       |   |   |-- relocatedTo.project=
        |           |       |   |   |-- relocatedTo.projectVersion=
        |           |       |   |   |-- scm.connection=
        |           |       |   |   |-- scm.developerConnection=
        |           |       |   |   |-- scm.url=
        |           |       |   |   |-- updated=
        |           |       |   |   `-- url=
        |           |       |   `-- maven:artifactId=
        |           |       `-- maven:groupId=
        |           `-- maven/
        |               `-- plugins/
        |                   |-- maven:groupId=
        |                   |-- maven:plugins.compiler.artifactId=
        |                   `-- maven:plugins.compiler.name=
        |-- facets/
        |   |-- org.apache.archiva.audit/
        |   |   `-- 2010/
        |   |       `-- 01/
        |   |           `-- 19/
        |   |               `-- 093600.000/
        |   |                   |-- action=
        |   |                   |-- artifact.id=
        |   |                   |-- artifact.namespace=
        |   |                   |-- artifact.projectId=
        |   |                   |-- artifact.version=
        |   |                   |-- remoteIP=
        |   |                   `-- user=
        |   |-- org.apache.archiva.metadata.repository.stats/
        |   |   `-- 2009/
        |   |       `-- 12/
        |   |           `-- 03/
        |   |               `-- 090000.000/
        |   |                   |-- scanEndTime=
        |   |                   |-- scanStartTime=
        |   |                   |-- totalArtifactCount=
        |   |                   |-- totalArtifactFileSize=
        |   |                   |-- totalFileCount=
        |   |                   |-- totalGroupCount=
        |   |                   `-- totalProjectCount=
        |   `-- org.apache.archiva.reports/
        `-- references/
            `-- org/
                `-- apache/
                    `-- archiva/
                        |-- parent/
                        |   `-- 1/
                        |       `-- references/
                        |           `-- org/
                        |               `-- apache/
                        |                   `-- archiva/
                        |                       |-- platform/
                        |                       |   `-- scanner/
                        |                       |       `-- 1.0-SNAPSHOT/
                        |                       |           `-- referenceType=parent
                        |                       `-- web/
                        |                           `-- webapp/
                        |                               `-- 1.0-SNAPSHOT/
                        |                                   `-- referenceType=parent
                        `-- platform/
                            `-- scanner/
                                `-- 1.0-SNAPSHOT/
                                    `-- references/
                                        `-- org/
                                            `-- apache/
                                                `-- archiva/
                                                    `-- web/
                                                        `-- webapp/
                                                            `-- 1.0-SNAPSHOT/
                                                                `-- referenceType=dependency
----
|
|
|
|
~~ To update - run "tree --dirsfirst -F" on the unpacked content-model.zip from the sandbox
|
|
|
|
This uses a typical content repository structure, where a path leads to a particular node (the directories in
the structure above), and nodes can have properties and values (shown as <<<property=value>>> above).
|
|
|
|
Properties with '.' may be nested in other representations such as Java models or XML, if appropriate - this is
|
|
the decision of the content repository persistence implementation.
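
As an illustration of that nesting, the following is a minimal Python sketch, not part of Archiva. It assumes that
purely numeric name segments (as in <<<dependencies.0.artifactId>>>) are list indices and that no property name is
both a value and a prefix of another name. The function names are hypothetical.

```python
def nest_properties(flat):
    """Expand dotted property names into nested dicts and lists.

    Assumption: a purely numeric segment is a list index, and no name is
    both a plain value and a prefix of another name.
    """
    root = {}
    for name, value in flat.items():
        parts = name.split(".")
        node = root
        # Walk (and create) intermediate nodes for every segment but the last.
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return _to_lists(root)


def _to_lists(node):
    """Turn dicts whose keys are all numeric into lists ordered by index."""
    if not isinstance(node, dict):
        return node
    converted = {key: _to_lists(child) for key, child in node.items()}
    if converted and all(key.isdigit() for key in converted):
        return [converted[key] for key in sorted(converted, key=int)]
    return converted
```

A persistence implementation that stores flat properties would simply skip this step; the flat and nested forms
carry the same information.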
|
|
|
|
Additionally, while some information is stored at the most generic level in the metadata repository (e.g.
<<<maven:groupId>>>, <<<maven:artifactId>>>), for convenience it may all be pushed into the project version's
information when loaded by the implementation. The metadata repository implementation can decide how best to
store and retrieve the information.
|
|
|
|
<Note:> Some of the properties have been put in place temporarily but need to be revisited - for example, the use
of index counters for the lists of Maven POM information is not ideal, and some Maven-specific aspects of
the dependencies should become faceted content.
|
|
|
|
The following sections walk through parts of the tree.
|
|
|
|
** Configuration section
|
|
|
|
<Note:> The configuration section is not currently implemented in the code. It should be shadowed to a file on the
file system for easy editing and pre-configuration outside the server. One possible implementation is to access
the configuration through the same storage and resolution mechanism, which would make this possible and would
also allow the configuration to be loaded on the fly.
|
|
|
|
It is desirable to be able to access and modify all configuration through the same interfaces, so it is also stored
|
|
in the content repository.
|
|
|
|
Each repository will have its own metadata, but there also needs to be a server-level configuration for other parts
of the system.
|
|
|
|
** Content section
|
|
|
|
The content section houses the information directly about the artifacts in the repository. As described in the
|
|
{{{./terminology.html} Terminology}} document, artifacts are described by the following coordinates (with the values
|
|
shown from the example above):
|
|
|
|
* Namespace (<<<org.apache.archiva.platform>>>)
|
|
|
|
* Project ID (<<<scanner>>>)
|
|
|
|
* Project version (<<<1.0-SNAPSHOT>>>)
|
|
|
|
* Artifact ID (<<<scanner-1.0-20091120.012345-1.pom>>>)
|
|
|
|
[]
|
|
|
|
Namespaces are of arbitrary depth, and are project namespaces, not to be confused with JCR's item/node namespaces.
|
|
A separate namespace and project identifier are retained to allow '.' in the project identifier without splitting,
|
|
while still allowing splitting on '.' in the namespace, when determining the most appropriate path for an artifact
|
|
in the content repository. The namespace may be null if there isn't one.
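
The splitting rule can be sketched as follows. This is an illustrative Python helper, not Archiva code; the function
name and the <<<content>>> root segment (taken from the sample tree) are assumptions.

```python
def content_path(namespace, project_id, project_version=None, artifact_id=None):
    """Build a content repository path from artifact coordinates.

    The namespace is split on '.', while the project ID is kept whole so
    that it may itself contain '.' characters.
    """
    if artifact_id is not None and project_version is None:
        raise ValueError("an artifact ID requires a project version")
    segments = ["content"]
    if namespace:  # the namespace may be None (or empty) if there isn't one
        segments.extend(namespace.split("."))
    segments.append(project_id)
    if project_version is not None:
        segments.append(project_version)
    if artifact_id is not None:
        segments.append(artifact_id)
    return "/".join(segments)
```

Keeping the two identifiers separate is what lets a project ID such as <<<my.project>>> remain a single path segment.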
|
|
|
|
Projects are very simple entities. They do not have subprojects - if such modeling needs to be done, then we
|
|
would create a "products" tree (or similar) that will map what "Archiva 1.0" contains as a collection of project
|
|
version nodes, for example.
|
|
|
|
Each artifact in the repository will have an entry, though not necessarily every file. For example, in a Maven
repository it is known that the <<<.md5>>>, <<<.sha1>>> and <<<.asc>>> files represent metadata about the artifact
of the same name, so that information is attached to the artifact's node instead.
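
A rough Python sketch of that grouping is shown below. It is not the Archiva scanner; the function name is
hypothetical, and for simplicity it records the companion file's name under the property, whereas a real
implementation would more likely store the checksum or signature value itself.

```python
SIDECAR_SUFFIXES = (".md5", ".sha1", ".asc")


def group_sidecars(filenames):
    """Map each artifact file to the checksum/signature files that describe it.

    A file ending in .md5, .sha1 or .asc becomes a property of the artifact
    with the same base name, if that artifact is present; otherwise it is
    kept as an entry of its own.
    """
    names = set(filenames)
    nodes = {}
    for name in sorted(names):
        for suffix in SIDECAR_SUFFIXES:
            base = name[: -len(suffix)]
            if name.endswith(suffix) and base in names:
                # Attach to the artifact node, using the suffix as property name.
                nodes.setdefault(base, {})[suffix[1:]] = name
                break
        else:
            nodes.setdefault(name, {})
    return nodes
```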
|
|
|
|
Metadata is stored at the level most appropriate to that piece of information. This means that in a Maven
repository, while both the POM and other artifact(s) are considered to be separate artifacts, they all share the
information in the POM that is stored at the project version or even project level. We only keep one set of project
information for a version - this differs from Maven's storage of one POM per snapshot. The Maven 2 module will take
the latest snapshot data and use that. Those that need Maven's behaviour should retrieve the POM directly.
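
To make "latest snapshot" concrete, here is a small Python sketch that picks the most recent timestamped POM from a
list of artifact IDs. It is only an illustration: the function name is hypothetical, and ordering by timestamp and
then build number is an assumption rather than a description of the Maven 2 module's actual logic.

```python
import re

# Matches the "-yyyyMMdd.HHmmss-buildNumber.pom" tail of a timestamped snapshot POM.
SNAPSHOT_POM = re.compile(r"-(\d{8}\.\d{6})-(\d+)\.pom$")


def latest_snapshot_pom(artifact_ids):
    """Return the artifact ID of the newest timestamped snapshot POM, or None."""
    best_key, best_id = None, None
    for artifact_id in artifact_ids:
        match = SNAPSHOT_POM.search(artifact_id)
        if match is None:
            continue  # not a timestamped POM
        # Fixed-width timestamps sort correctly as strings.
        key = (match.group(1), int(match.group(2)))
        if best_key is None or key > best_key:
            best_key, best_id = key, artifact_id
    return best_id
```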
|
|
|
|
Note that artifact data is not stored in the metadata repository (there is no data= property on the file). The
|
|
information here is enough to locate the file in the original storage when it is requested.
|
|
|
|
The following describes some of the metadata at each level. Note that the Maven extensions are covered here - these
|
|
are optional, and they wouldn't be present on a non-Maven storage repository. Likewise, plugins may store
|
|
additional metadata for each artifact.
|
|
|
|
*** Namespace Metadata
|
|
|
|
* <<<maven:groupId>>> - the Maven group ID
|
|
|
|
*** Project Metadata
|
|
|
|
* <<<maven:artifactId>>> - the Maven artifact ID
|
|
|
|
*** Project Version Metadata
|
|
|
|
* <<<created>>> - when the metadata was added to the repository (see [1] below)
|
|
|
|
* <<<updated>>> - when the metadata was last updated (see [1] below)
|
|
|
|
* <<<name>>> - human-readable project name
|
|
|
|
* <<<description>>> - a human-readable description of this project
|
|
|
|
* <<<url>>> - a URL to the project's documentation or other information
|
|
|
|
* <<<organization.*>>> - information about the organization
|
|
|
|
* <<<licenses[].*>>> - the licenses under which the project source code is available
|
|
|
|
* <<<issueManagement.*>>> - the issue tracker used by the project
|
|
|
|
* <<<ciManagement.*>>> - continuous integration server information
|
|
|
|
* <<<dependencies[].*>>> - other projects that this project version depends on. Note that currently this contains
|
|
Maven specifics that are expected to be abstracted out into the Maven extensions
|
|
|
|
* <<<individuals[].*>>> - participants in the development of, or otherwise associated with, the project
|
|
|
|
* <<<scm.*>>> - information about the SCM used to store the project source code
|
|
|
|
* <<<relocatedTo.*>>> - co-ordinates that this artifact has been relocated to
|
|
|
|
* <<<maven:packaging>>> - the packaging value in the Maven POM
|
|
|
|
* <<<maven:parent.*>>> - a reference to the Maven parent POM
|
|
|
|
* <<<maven:plugins[].*>>> - references to build plugins in a Maven POM
|
|
|
|
* <<<maven:repositories[].*>>> - references to other repositories in a Maven POM
|
|
|
|
* <<<maven:buildExtensions[].*>>> - references to build extensions in a Maven POM
|
|
|
|
* <<<maven:properties.*>>> - properties stored in a Maven POM
|
|
|
|
[]
|
|
|
|
Footnotes:
|
|
|
|
[[1]] created/updated timestamps may be maintained by the metadata repository implementation for the metadata
itself. Timestamps for individual files are stored as additional properties (<<<fileCreated>>>,
<<<fileLastModified>>>). It may make sense to add a "discovered" timestamp if an artifact is known to have been
created at a different time from when it was added to the metadata repository.
|
|
|
|
** Facets Section
|
|
|
|
The facets section allows storage of other repository metadata for specific plugins. Each is named by the plugin's
|
|
unique identifier.
|
|
|
|
*** Audit Logs (<<<org.apache.archiva.audit>>>)
|
|
|
|
Audit logs are stored hierarchically by date, breaking the date down into year, month and day nodes until reaching
the timestamp of a particular event. The event details are stored as properties of that node. At present, filtering
by an action or other field would require querying the content repository.
|
|
|
|
* <<<action>>> - the action that was taken, such as uploading an artifact
|
|
|
|
* <<<artifact.*>>> - the co-ordinates of the artifact affected
|
|
|
|
* <<<remoteIP>>> - the IP address of the person executing the action, if applicable
|
|
|
|
* <<<user>>> - the user performing the action, if applicable
|
|
|
|
[]
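
The date-based layout of the audit facet can be sketched in Python as follows. The path format is taken from the
sample tree (<<<2010/01/19/093600.000>>>); the function names are hypothetical and this is not Archiva code.

```python
from datetime import datetime

AUDIT_FACET = "facets/org.apache.archiva.audit"


def audit_node_path(when):
    """Return the node path for an audit event that occurred at `when`."""
    millis = when.microsecond // 1000
    return "%s/%s.%03d" % (AUDIT_FACET, when.strftime("%Y/%m/%d/%H%M%S"), millis)


def events_on_day(paths, year, month, day):
    """Select the event nodes for one day by matching on the path prefix."""
    prefix = "%s/%04d/%02d/%02d/" % (AUDIT_FACET, year, month, day)
    return [path for path in paths if path.startswith(prefix)]
```

Selecting events by date only needs the path, whereas selecting by <<<action>>> or <<<user>>> means reading the
properties of each node, which is why such filtering requires a query.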
|
|
|
|
A future possibility is to store audit metadata on the artifacts themselves (who uploaded them, when, and how, or
whether they were discovered by scanning). While this duplicates some information, it would reduce the need to
query by a certain artifact ID, and the nodes could be linked referentially.
|
|
|
|
Audit metadata may also need to be extended to other nodes such as configuration. In this case, it may make sense
|
|
to alter the artifact reference to a content repository path instead, or to utilise a native mechanism of the
|
|
content repository.
|
|
|
|
*** Repository Statistics (<<<org.apache.archiva.metadata.repository.stats>>>)
|
|
|
|
Like audit logs, repository statistics are stored by timestamp, marking the time a scan started. The results are
|
|
stored as properties of the scan:
|
|
|
|
* <<<scanStartTime>>>, <<<scanEndTime>>> - when the scan started and finished
|
|
|
|
* <<<total*>>> - the statistics gathered about certain totals in the repository
|
|
|
|
[]
|
|
|
|
The current approach of tying statistics to the scanning process is not optimal, as it cannot be 'live'. We may
|
|
later determine if any of the stats can be derived by functions of the content repository rather than storing and
|
|
trying to keep them up to date. Historical data might be retained by versioning and taking a snapshot at a given
|
|
point in time.
|
|
|
|
*** Problem Reports (<<<org.apache.archiva.reports>>>)
|
|
|
|
While its contents are not shown above, the problem reporting plugin similarly stores a facet of information,
recording particular issues noticed in the repository, such as invalid Maven POMs.
|
|
|
|
** References Section
|
|
|
|
The references section contains information about references to a given artifact. It is the inverse of the
|
|
dependency relationship.
|
|
|
|
References are stored outside the main model so that their creation doesn't imply a "stub" model - whether a
project exists is known independently of whether a reference to it has been created. References need not imply
referential integrity.
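
The inverse relationship can be illustrated with a short Python sketch that derives the references from a set of
outgoing relationships. The data shapes and the function name are assumptions made for the example; the
coordinates and reference types are those of the sample tree.

```python
def build_references(outgoing):
    """Invert outgoing relationships into a references index.

    `outgoing` maps a source coordinate to a list of (target, reference_type)
    pairs. The result maps each target to the sources that refer to it.
    Targets need not exist as projects, so no integrity check is made.
    """
    references = {}
    for source, targets in outgoing.items():
        for target, reference_type in targets:
            references.setdefault(target, {})[source] = reference_type
    return references
```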
|
|
|