mirror of https://github.com/apache/archiva.git
[MRM-127] design docs
git-svn-id: https://svn.apache.org/repos/asf/maven/repository-manager/trunk@425308 13f79535-47bb-0310-9956-ffa450edef68
parent
e741c40549
commit
87b0eda46c
@ -0,0 +1,77 @@
-----
Indexer Design
-----
Brett Porter
-----
25 July 2006
-----
Indexer Design
<<Note: The current indexer design is under review. This document will grow into what it should be, and the code and
tests will be refactored to match.>>
* Standard Artifact Index
We currently want to index these elements from the repository:
* for each artifact file: the artifact ID, version, group ID, classifier, type (extension), filename,
checksums (md5, sha1) and size
* for each artifact POM: the packaging, licenses, dependencies, build plugins, reporting plugins
* plugin prefix from the repository metadata (in the future, more may be indexed)
* the identifier of the source repository
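
As a rough illustration of the list above, the fields could be grouped by where they come from. This is only a sketch: the class name <<<StandardIndexRecord>>> and every field name are hypothetical and are not taken from the indexer code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical holder for one index record; names are illustrative only. */
final class StandardIndexRecord {
    // Read from the artifact file itself
    String groupId;
    String artifactId;
    String version;
    String classifier;   // null when the artifact has no classifier
    String type;         // the file extension
    String filename;
    String md5;
    String sha1;
    long size;

    // Read from the artifact's POM (may arrive later than the file)
    String packaging;
    final List<String> licenses = new ArrayList<>();
    final List<String> dependencies = new ArrayList<>();
    final List<String> buildPlugins = new ArrayList<>();
    final List<String> reportingPlugins = new ArrayList<>();

    // Read from the repository metadata (plugins only)
    String pluginPrefix;

    // Identifier of the repository the artifact was found in
    String repositoryId;
}
```

The fields are left mutable because a record may be filled in from several files over time.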
Each record in the index refers to an artifact. Since the content for a record can come from various sources, the
record may need to be updated when different files related to the same artifact are discovered (i.e., the POM or,
for plugins, the metadata that contains their prefix).
Records in the index are generally keyed by their dependency conflict ID (i.e., a combination of group, artifact,
version, type and classifier). The exception to this rule is the POM: if an entry already exists with a different
type but the same group, artifact and version and no classifier, then a POM entry is not added and the model fields
are applied to the existing entry. Conversely, if a POM is added first and an artifact with the same group, artifact
and version and no classifier is added later, it overwrites the record of the POM.
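
The keying rule and its POM exception can be sketched with an in-memory map. This is a minimal sketch, not the indexer's implementation: <<<KeyedRecord>>>, <<<KeyedIndex>>> and their methods are hypothetical names, a single <<<packaging>>> field stands in for all model fields, and carrying that field over when an artifact overwrites a POM record is an assumption rather than something this document specifies.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical index entry, reduced to what the keying rule needs. */
final class KeyedRecord {
    final String groupId;
    final String artifactId;
    final String version;
    final String type;
    final String classifier; // null when the artifact has no classifier
    String packaging;        // stands in for the model fields read from the POM

    KeyedRecord(String groupId, String artifactId, String version, String type, String classifier) {
        this.groupId = groupId;
        this.artifactId = artifactId;
        this.version = version;
        this.type = type;
        this.classifier = classifier;
    }

    /** Key built from group, artifact, version, type and classifier, as described in the text. */
    String key() {
        return groupId + ":" + artifactId + ":" + version + ":" + type + ":"
                + (classifier == null ? "" : classifier);
    }
}

/** Hypothetical in-memory index that applies the POM exception. */
final class KeyedIndex {
    private final Map<String, KeyedRecord> records = new HashMap<>();

    /** Adds a non-POM artifact; a main artifact replaces a POM-only record for the same coordinates. */
    void addArtifact(KeyedRecord artifact) {
        if (artifact.classifier == null) {
            KeyedRecord pom = records.remove(
                    new KeyedRecord(artifact.groupId, artifact.artifactId, artifact.version, "pom", null).key());
            if (pom != null) {
                // Assumption: keep the model fields that were already read from the POM.
                artifact.packaging = pom.packaging;
            }
        }
        records.put(artifact.key(), artifact);
    }

    /** Adds POM information; applied to an existing main artifact if there is one. */
    void addPom(String groupId, String artifactId, String version, String packaging) {
        for (KeyedRecord existing : records.values()) {
            boolean sameCoordinates = existing.groupId.equals(groupId)
                    && existing.artifactId.equals(artifactId)
                    && existing.version.equals(version);
            if (sameCoordinates && existing.classifier == null && !"pom".equals(existing.type)) {
                existing.packaging = packaging; // no separate POM entry is created
                return;
            }
        }
        KeyedRecord pom = new KeyedRecord(groupId, artifactId, version, "pom", null);
        pom.packaging = packaging;
        records.put(pom.key(), pom);
    }

    KeyedRecord get(String key) {
        return records.get(key);
    }

    int size() {
        return records.size();
    }
}
```

Whichever of the POM and the main artifact arrives first, the index ends up with a single record for the pair, while artifacts with a classifier keep their own records.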
The above process, especially the handling of the POM, would be much simpler if the discoverer were able to
associate a POM with its artifact instead of feeding them in separately as it does at present.
While some of the information stored is specific to a particular type of file, it is all maintained in a single index
for simplicity. In the future, if the content of the various documents diverges greatly, the index may be split into
separate indexes. In that case, we may consider using Lucene's
{{{http://wiki.apache.org/jakarta-lucene/LuceneFAQ#head-b11296f9e7b2a5e7496d67118d0a5898f2fd9823} multiple index
searching capabilities}}.
Currently, the discoverer returns POMs as separate artifact entries to the actual artifact, and any derived artifacts
in the repository. To accommodate this, when indexed
Note that archetypes currently don't have a packaging associated with them in Maven, so it is not recorded in the POM.
However, to be able to search by this type, the indexer will look for a <<<META-INF/maven/archetype.xml>>> file and,
if it is found, set the packaging to <<<maven-archetype>>>. In the future, this handling will be deprecated as the
POMs can start using the appropriate packaging.
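
The archetype check described above might look like the following. This is a sketch under assumptions: the class <<<ArchetypeDetector>>> and its method are hypothetical names, and the artifact is assumed to be a JAR (zip) file that can be opened with the standard <<<java.util.zip.ZipFile>>> class.

```java
import java.io.File;
import java.io.IOException;
import java.util.zip.ZipFile;

/** Hypothetical helper; not part of the indexer API. */
final class ArchetypeDetector {
    static final String DESCRIPTOR = "META-INF/maven/archetype.xml";
    static final String ARCHETYPE_PACKAGING = "maven-archetype";

    /**
     * Returns "maven-archetype" when the archive contains the archetype
     * descriptor, otherwise the packaging that was read from the POM.
     */
    static String effectivePackaging(File artifactFile, String pomPackaging) throws IOException {
        try (ZipFile zip = new ZipFile(artifactFile)) {
            return zip.getEntry(DESCRIPTOR) != null ? ARCHETYPE_PACKAGING : pomPackaging;
        }
    }
}
```

Passing the POM packaging in as a fallback keeps the override in one place, which should make it easy to drop once POMs declare the packaging themselves.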
The index is shared among multiple repositories. The source repository is recorded in the index record. To avoid
duplicates, the indexer should complain if a later update to an artifact comes from a different repository. Ideally,
the discovery/conversion mechanisms would deal with this before reaching the indexer.
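
One way the indexer could "complain" is to remember which repository first supplied each record and reject later updates from another one. The sketch below assumes this; <<<RepositoryGuard>>> is a hypothetical name, and throwing an <<<IllegalStateException>>> is only one possible way of reporting the conflict.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical guard that tracks the source repository of each record key. */
final class RepositoryGuard {
    private final Map<String, String> repositoryByKey = new HashMap<>();

    /**
     * Registers an update for the given record key. The first repository seen
     * for a key owns it; an update from any other repository is rejected.
     */
    void recordUpdate(String recordKey, String repositoryId) {
        String owner = repositoryByKey.putIfAbsent(recordKey, repositoryId);
        if (owner != null && !owner.equals(repositoryId)) {
            throw new IllegalStateException("Record " + recordKey + " was indexed from repository '"
                    + owner + "' and cannot be updated from '" + repositoryId + "'");
        }
    }
}
```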
When indexing metadata from a POM, the POM should be loaded using the Maven project builder so that inheritance and
interpolation are performed. This ensures that the record is as complete as possible, and that searching by
inherited fields will return both the parent and the children in the search results.
* Reduced Size Index
An additional index is maintained by the repository manager in the
{{{../apidocs/org/apache/maven/repository/indexing/MinimalIndex.html} MinimalIndex}} class. It indexes all of the
same artifacts as the first index, but stores them with shorter field names and less information to keep its size
small. This index is appropriate for clients that need fast searching, such as IDE integrations. For a fuller
interface to the repository information, the integration should use the XMLRPC interface.
~~TODO: finish!
* Limitations
Currently, because the POM and artifacts are fed in separately, there is no way to associate an artifact that has a
classifier with its POM, meaning there is less information about it in the index. It may be best that this occurs by
design: while it is desirable to search by classifier, you probably only want to find the main artifact when
browsing and see the derived artifacts listed under it. How this evolves should be carefully considered.
@ -0,0 +1,24 @@
<?xml version="1.0" encoding="ISO-8859-1"?>
<!--
  ~ Copyright 2005-2006 The Apache Software Foundation.
  ~
  ~ Licensed under the Apache License, Version 2.0 (the "License");
  ~ you may not use this file except in compliance with the License.
  ~ You may obtain a copy of the License at
  ~
  ~      http://www.apache.org/licenses/LICENSE-2.0
  ~
  ~ Unless required by applicable law or agreed to in writing, software
  ~ distributed under the License is distributed on an "AS IS" BASIS,
  ~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  ~ See the License for the specific language governing permissions and
  ~ limitations under the License.
  -->

<project>
  <body>
    <menu name="Design Documentation">
      <item name="Indexing Design" href="/design.html"/>
    </menu>
  </body>
</project>