From 87b0eda46c10f686f11f9cc8672fe5b348030f72 Mon Sep 17 00:00:00 2001
From: Brett Porter
Date: Tue, 25 Jul 2006 06:41:13 +0000
Subject: [PATCH] [MRM-127] design docs

git-svn-id: https://svn.apache.org/repos/asf/maven/repository-manager/trunk@425308 13f79535-47bb-0310-9956-ffa450edef68
---
 .../src/site/apt/design.apt                | 77 +++++++++++++++++++
 maven-repository-indexer/src/site/site.xml | 24 ++++++
 2 files changed, 101 insertions(+)
 create mode 100644 maven-repository-indexer/src/site/apt/design.apt
 create mode 100644 maven-repository-indexer/src/site/site.xml

diff --git a/maven-repository-indexer/src/site/apt/design.apt b/maven-repository-indexer/src/site/apt/design.apt
new file mode 100644
index 000000000..0eebb6035
--- /dev/null
+++ b/maven-repository-indexer/src/site/apt/design.apt
@@ -0,0 +1,77 @@
+ -----
+ Indexer Design
+ -----
+ Brett Porter
+ -----
+ 25 July 2006
+ -----
+
+Indexer Design
+
+  <>
+
+* Standard Artifact Index
+
+  We currently want to index these elements from the repository:
+
+    * for each artifact file: the artifact ID, version, group ID, classifier, type (extension), filename,
+      checksums (md5, sha1) and size
+
+    * for each artifact POM: the packaging, licenses, dependencies, build plugins, reporting plugins
+
+    * plugin prefix from the repository metadata (in the future, more may be indexed)
+
+    * the identifier of the source repository
+
+  Each record in the index refers to an artifact. Since the content for a record can come from various sources, the
+  record may need to be updated when different files that are related to the same artifact are discovered (ie, the
+  POM, or for plugins the metadata that contains their prefix).
+
+  Records in the index are generally keyed by their dependency conflict ID (ie, a combination of group, artifact,
+  version, type and classifier).
+  The exception to this rule is the POM: if an entry already exists with a different
+  type but the same group, artifact, version and no classifier, then a POM entry is not added and the model fields are
+  applied to the existing entry. Conversely, if a POM is added first and an artifact with the same group, artifact,
+  version and no classifier is later added, then it overwrites the record of the POM.
+
+  The above process, especially with regard to the handling of the POM, would be much simpler if the discoverer were
+  able to associate a POM with the artifact instead of feeding them in separately, as it does at present.
+
+  While some of the information stored is specific to a particular type of file, it is all maintained in a single index
+  for simplicity. In the future, if the content of the various documents diverges greatly, it may be split into separate
+  indexes. In that case, we may consider using Lucene's
+  {{{http://wiki.apache.org/jakarta-lucene/LuceneFAQ#head-b11296f9e7b2a5e7496d67118d0a5898f2fd9823} multiple index
+  searching capabilities}}.
+
+  Currently, the discoverer returns POMs as artifact entries separate from the actual artifact and any derived
+  artifacts in the repository. To accommodate this, when indexed
+
+  Note that archetypes currently don't have a packaging associated with them in Maven, so it is not recorded in the POM.
+  However, to be able to search by this type, the indexer will look for a <<>> file, and
+  if found set its packaging to <<>>. In the future, this handling will be deprecated as the POMs
+  can start using the appropriate packaging.
+
+  The index is shared among multiple repositories. The source repository is recorded in the index record. To avoid
+  duplicates, the indexer should complain if a later update to an artifact is attempted from a different repository.
+  Ideally, the discovery/conversion mechanisms would deal with this before reaching the indexer.
+
+  When indexing metadata from a POM, the POM should be loaded using the Maven project builder so that inheritance and
+  interpolation are performed. This ensures that the record is as complete as possible, and that searching by
+  fields that are inherited will reveal both the parent and the children in the search results.
+
+* Reduced Size Index
+
+  An additional index is maintained by the repository manager in the
+  {{{../apidocs/org/apache/maven/repository/indexing/MinimalIndex.html} MinimalIndex}} class. This indexes all of the
+  same artifacts as the first index, but stores them with shorter field names and less information to maintain a smaller
+  size. This index is appropriate for use by certain clients such as IDE integration for fast searching. For a fuller
+  interface to the repository information, the integration should use the XMLRPC interface.
+
+  ~~TODO: finish!
+
+* Limitations
+
+  Currently, because the POM and artifacts are fed in separately, there is no way to associate an artifact with a
+  classifier to its POM, meaning there is less information about it in the index. It may be best that this occurs by
+  design - it seems that while it is desirable to search by classifier you only want to find the main artifact for
+  browsing and see the derived artifact listed under that. How this evolves should be carefully considered.
diff --git a/maven-repository-indexer/src/site/site.xml b/maven-repository-indexer/src/site/site.xml
new file mode 100644
index 000000000..565d18fd0
--- /dev/null
+++ b/maven-repository-indexer/src/site/site.xml
@@ -0,0 +1,24 @@
+
+
+
+
+
+
+
+
+
+
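
Note on the record-keying rule in design.apt: the way a POM and its main artifact share one record can be
sketched roughly as below. This is only an illustration under stated assumptions, not the indexer's actual code.
The class IndexSketch, its Entry type and all method names are hypothetical, the index is a plain in-memory map
rather than Lucene, the ordering of fields inside the key string is arbitrary, and carrying the packaging over
when an artifact replaces a POM record is an assumption that the design text does not spell out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for the index; not the real indexer API. */
final class IndexSketch {

    /** One index record; the field names are illustrative only. */
    static final class Entry {
        final String groupId;
        final String artifactId;
        final String version;
        final String type;
        final String classifier; // null when the artifact has no classifier
        String packaging;        // model field read from the POM, null until known

        Entry(String groupId, String artifactId, String version, String type, String classifier) {
            this.groupId = groupId;
            this.artifactId = artifactId;
            this.version = version;
            this.type = type;
            this.classifier = classifier;
        }
    }

    private final Map<String, Entry> records = new LinkedHashMap<>();

    /** Key built from group, artifact, version, type and classifier. */
    static String key(String groupId, String artifactId, String version, String type, String classifier) {
        return groupId + ":" + artifactId + ":" + type + ":"
                + (classifier == null ? "" : classifier) + ":" + version;
    }

    /** Adds an artifact file (anything that is not a POM). */
    void addArtifact(Entry entry) {
        if (entry.classifier == null) {
            // A POM record added earlier for the same group/artifact/version is overwritten.
            Entry pom = records.remove(key(entry.groupId, entry.artifactId, entry.version, "pom", null));
            if (pom != null && entry.packaging == null) {
                entry.packaging = pom.packaging; // assumption: keep the model fields already read
            }
        }
        records.put(key(entry.groupId, entry.artifactId, entry.version, entry.type, entry.classifier), entry);
    }

    /** Adds a POM, represented here by a single model field (its packaging). */
    void addPom(String groupId, String artifactId, String version, String packaging) {
        for (Entry existing : records.values()) {
            boolean sameCoordinates = existing.groupId.equals(groupId)
                    && existing.artifactId.equals(artifactId)
                    && existing.version.equals(version);
            if (sameCoordinates && existing.classifier == null && !"pom".equals(existing.type)) {
                // Main artifact already indexed: apply the model fields, add no POM entry.
                existing.packaging = packaging;
                return;
            }
        }
        Entry pom = new Entry(groupId, artifactId, version, "pom", null);
        pom.packaging = packaging;
        records.put(key(groupId, artifactId, version, "pom", null), pom);
    }

    Entry get(String key) {
        return records.get(key);
    }

    int size() {
        return records.size();
    }
}
```

In this sketch, indexing a POM and then its jar (or the jar and then its POM) leaves a single record for the
main artifact, while an artifact with a classifier always gets a record of its own and receives no model fields,
which matches the limitation described at the end of design.apt.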