[MRM-127] design docs

git-svn-id: https://svn.apache.org/repos/asf/maven/repository-manager/trunk@425308 13f79535-47bb-0310-9956-ffa450edef68
Brett Porter 2006-07-25 06:41:13 +00:00
parent e741c40549
commit 87b0eda46c
2 changed files with 101 additions and 0 deletions


@@ -0,0 +1,77 @@
-----
Indexer Design
-----
Brett Porter
-----
25 July 2006
-----
Indexer Design
<<Note: The current indexer design is under review. This document will grow into what it should be, and the code and
tests will be refactored to match.>>
* Standard Artifact Index
We currently want to index these elements from the repository:
* for each artifact file: the artifact ID, version, group ID, classifier, type (extension), filename,
checksums (md5, sha1) and size
* for each artifact POM: the packaging, licenses, dependencies, build plugins, reporting plugins
* plugin prefix from the repository metadata (in the future, more may be indexed)
* the identifier of the source repository
Each record in the index refers to an artifact. Since the content for a record can come from various sources, the
record may need to be updated when different files that are related to the same artifact are discovered (i.e., the
POM or, for plugins, the metadata that contains their prefix).
Records in the index are generally keyed by their dependency conflict ID (i.e., a combination of group, artifact,
version, type and classifier). The exception to this rule is the POM: if an entry already exists with a different
type but the same group, artifact, version and no classifier, then a POM entry is not added and the model fields are
applied to the existing entry. Conversely, if a POM is added first and an artifact with the same group, artifact,
version and no classifier is later added then it overwrites the record of the POM.
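The keying and POM rules above can be sketched roughly as follows. This is a minimal in-memory model, not the repository manager's actual code: the class <<<ConflictIdIndexSketch>>>, its <<<Record>>> type and the key layout are all hypothetical, and carrying the POM's model fields over when an artifact overwrites a POM record is an assumption the text does not settle.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical in-memory model of the keying rules; not the repository manager's actual classes. */
public class ConflictIdIndexSketch {

    /** One index record; only a few of the indexed fields are modelled. */
    public static final class Record {
        public final String groupId;
        public final String artifactId;
        public final String version;
        public final String type;
        public final String classifier; // null when the artifact has no classifier
        public String packaging;        // model field from the POM; null until a POM is seen

        public Record(String groupId, String artifactId, String version, String type, String classifier) {
            this.groupId = groupId;
            this.artifactId = artifactId;
            this.version = version;
            this.type = type;
            this.classifier = classifier;
        }

        /** Key built from group, artifact, version, type and classifier. */
        public String conflictId() {
            return gav() + ":" + type + ":" + (classifier == null ? "" : classifier);
        }

        String gav() {
            return groupId + ":" + artifactId + ":" + version;
        }

        boolean isPom() {
            return "pom".equals(type) && classifier == null;
        }

        boolean isMainArtifact() {
            return !"pom".equals(type) && classifier == null;
        }
    }

    private final Map<String, Record> records = new LinkedHashMap<>();

    public void add(Record incoming) {
        if (incoming.isPom()) {
            // POM rule: apply the model fields to an existing main artifact instead of adding an entry.
            for (Record existing : records.values()) {
                if (existing.isMainArtifact() && existing.gav().equals(incoming.gav())) {
                    existing.packaging = incoming.packaging;
                    return;
                }
            }
        } else if (incoming.isMainArtifact()) {
            // Converse rule: the artifact overwrites a POM record that was added earlier.
            Record pom = records.remove(incoming.gav() + ":pom:");
            if (pom != null && incoming.packaging == null) {
                incoming.packaging = pom.packaging; // assumption: keep the model fields already read
            }
        }
        // All other cases: add or replace the record under its own conflict ID.
        records.put(incoming.conflictId(), incoming);
    }

    public Record get(String conflictId) {
        return records.get(conflictId);
    }

    public int size() {
        return records.size();
    }
}
```

In this sketch, adding a jar and then its POM, or the POM and then the jar, ends with a single record in both cases, while an artifact with a classifier always gets its own record.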
The above process, especially the handling of the POM, would be much simpler if the discoverer were
able to associate a POM with its artifact instead of feeding them in separately as it does at present.
While some of the information stored is specific to a particular type of file, it is all maintained in a single index
for simplicity. In the future, if the content of the various documents diverges greatly, it may be split into separate
indexes. In that case, we may consider using Lucene's
{{{http://wiki.apache.org/jakarta-lucene/LuceneFAQ#head-b11296f9e7b2a5e7496d67118d0a5898f2fd9823} multiple index
searching capabilities}}.
Currently, the discoverer returns POMs as separate artifact entries to the actual artifact, and any derived artifacts
in the repository. To accommodate this, when indexed
Note that archetypes currently don't have a packaging associated with them in Maven, so it is not recorded in the POM.
However, to be able to search by this type, the indexer will look for a <<<META-INF/maven/archetype.xml>>> file, and
if found set its packaging to <<<maven-archetype>>>. In the future, this handling will be deprecated as the POMs
can start using the appropriate packaging.
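The archetype check described above might look something like this. It is a sketch only: the class and method names are hypothetical, and it assumes the artifact is a regular jar (zip) file on disk.

```java
import java.io.File;
import java.io.IOException;
import java.util.zip.ZipFile;

/** Hypothetical helper; the class and method names are illustrative only. */
public class ArchetypePackagingSketch {

    public static final String ARCHETYPE_DESCRIPTOR = "META-INF/maven/archetype.xml";

    public static final String ARCHETYPE_PACKAGING = "maven-archetype";

    /**
     * Returns "maven-archetype" when the jar contains the archetype descriptor,
     * otherwise the packaging that was read from the POM.
     */
    public static String resolvePackaging(File artifactFile, String pomPackaging) throws IOException {
        try (ZipFile zip = new ZipFile(artifactFile)) {
            return zip.getEntry(ARCHETYPE_DESCRIPTOR) != null ? ARCHETYPE_PACKAGING : pomPackaging;
        }
    }
}
```

Keeping the POM packaging as the fallback means this special case can be dropped without touching callers once archetype POMs declare the packaging themselves.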
The index is shared among multiple repositories. The source repository is recorded in the index record. To avoid
duplicates, the indexer should complain if an attempt is later made to update an artifact from a different
repository. Ideally, the discovery/conversion mechanisms would deal with this before reaching the indexer.
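One possible shape for that check is sketched below. The names are hypothetical, and treating "complain" as throwing an exception is an assumption; logging a warning and skipping the update would fit the text just as well.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical guard that remembers which repository first supplied each record. */
public class SourceRepositoryGuardSketch {

    private final Map<String, String> repositoryByRecordKey = new HashMap<>();

    /**
     * Records the source repository for a record key, and complains (here: throws, by assumption)
     * when the same record is later updated from a different repository.
     */
    public void recordSource(String recordKey, String repositoryId) {
        String known = repositoryByRecordKey.putIfAbsent(recordKey, repositoryId);
        if (known != null && !known.equals(repositoryId)) {
            throw new IllegalStateException("Record " + recordKey + " was indexed from repository '" + known
                + "' and cannot be updated from '" + repositoryId + "'");
        }
    }
}
```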
When indexing metadata from a POM, the POM should be loaded using the Maven project builder so that inheritance and
interpolation are performed. This ensures that the record is as complete as possible, and that searching by
fields that are inherited will reveal both the parent and the children in the search results.
* Reduced Size Index
An additional index is maintained by the repository manager in the
{{{../apidocs/org/apache/maven/repository/indexing/MinimalIndex.html} MinimalIndex}} class. This indexes all of the
same artifacts as the first index, but stores them with shorter field names and less information to maintain a smaller
size. This index is appropriate for clients that need fast searching, such as IDE integrations. For a fuller
interface to the repository information, the integration should use the XML-RPC interface.
~~TODO: finish!
* Limitations
Currently, because the POM and artifacts are fed in separately, there is no way to associate an artifact that has a
classifier with its POM, meaning there is less information about it in the index. This may be best left as it is by
design: while it is desirable to search by classifier, when browsing you probably only want to find the main artifact
and see the derived artifacts listed under it. How this evolves should be carefully considered.


@@ -0,0 +1,24 @@
<?xml version="1.0" encoding="ISO-8859-1"?>
<!--
~ Copyright 2005-2006 The Apache Software Foundation.
~
~ Licensed under the Apache License, Version 2.0 (the "License");
~ you may not use this file except in compliance with the License.
~ You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing, software
~ distributed under the License is distributed on an "AS IS" BASIS,
~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~ See the License for the specific language governing permissions and
~ limitations under the License.
-->
<project>
<body>
<menu name="Design Documentation">
<item name="Indexing Design" href="/design.html"/>
</menu>
</body>
</project>