mirror of https://github.com/apache/archiva.git
[MRM-141] remove irrelevant documentation
git-svn-id: https://svn.apache.org/repos/asf/maven/archiva/trunk@591958 13f79535-47bb-0310-9956-ffa450edef68
parent
7474f81764
commit
ed87e673fe
|
@ -1,153 +0,0 @@
|
|||
-----
|
||||
Indexer Design
|
||||
-----
|
||||
Brett Porter
|
||||
-----
|
||||
25 July 2006
|
||||
-----
|
||||
|
||||
~~ Copyright 2006 The Apache Software Foundation.
|
||||
~~
|
||||
~~ Licensed under the Apache License, Version 2.0 (the "License");
|
||||
~~ you may not use this file except in compliance with the License.
|
||||
~~ You may obtain a copy of the License at
|
||||
~~
|
||||
~~ http://www.apache.org/licenses/LICENSE-2.0
|
||||
~~
|
||||
~~ Unless required by applicable law or agreed to in writing, software
|
||||
~~ distributed under the License is distributed on an "AS IS" BASIS,
|
||||
~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
~~ See the License for the specific language governing permissions and
|
||||
~~ limitations under the License.
|
||||
|
||||
~~ NOTE: For help with the syntax of this file, see:
|
||||
~~ http://maven.apache.org/guides/mini/guide-apt-format.html
|
||||
|
||||
Indexer Design
|
||||
|
||||
<<Note: The current indexer design is under review. This document will grow into what it should be, and the code and
|
||||
tests refactored to match>>
|
||||
|
||||
~~TODO: separate API design from Lucene implementation design
|
||||
|
||||
* Standard Artifact Index
|
||||
|
||||
We currently want to index these elements from the repository:
|
||||
|
||||
* for each artifact file: the artifact ID, version, group ID, classifier, type (extension), filename (including path
|
||||
from the repository base), checksums (md5, sha1) and size
|
||||
|
||||
* for each artifact POM: the packaging, licenses, dependencies, build plugins, reporting plugins
|
||||
|
||||
* plugin prefix
|
||||
|
||||
* Java classes within a JAR artifact (delimited by \n)
|
||||
|
||||
* filenames within an archive (delimited by \n)
|
||||
|
||||
* the identifier of the source repository
|
||||
|
||||
Each record in the index refers to an artifact. Since the content for a record can come from various sources, the
|
||||
record may need to be updated when different files that are related to the same artifact are discovered (i.e., the
|
||||
POM, or for plugins the metadata that contains their prefix).
|
||||
|
||||
To simplify this, the process for discovery is as follows:
|
||||
|
||||
* Discovered artifacts will read the related POM and metadata from the repository to index, rather than relying on
|
||||
it being discovered. This ensures that partial discovery still yields correct results in all cases, and it is
|
||||
possible to construct the entire record without having to read back from the index.
|
||||
|
||||
* POMs that do not have a packaging of <<<pom>>> are not sent to the indexer on their own; they are read when the related artifact is discovered.
|
||||
|
||||
A consequence of this process is that an update to a POM or to repository metadata that is not accompanied by an
update to the corresponding artifact(s) will not update the index. As POMs should not be modified, this is not
expected to be a major concern. Likewise, updates to metadata should only accompany updates to the artifact itself,
so they will not cause a problem.
|
||||
|
||||
The above approach may run into a problem if discovery happens in the middle of a deployment made outside of the
repository manager (where the artifact is present, but the metadata or POM is not yet). To avoid such cases, the
discoverer should only detect changes that are more than a minute old (this blackout period should be configurable).
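The blackout rule can be sketched as a simple filter on the last modified time. This is only an illustration under stated assumptions: the class <<<BlackoutFilter>>> and its methods are hypothetical names and are not part of the Archiva code base.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not an Archiva class): skips files changed within the blackout period. */
final class BlackoutFilter {
    private final long blackoutMillis;

    BlackoutFilter(long blackoutMillis) {
        this.blackoutMillis = blackoutMillis;
    }

    /** True when the file was last modified more than blackoutMillis before 'now'. */
    boolean isStable(File file, long now) {
        return now - file.lastModified() > blackoutMillis;
    }

    /** Returns only the candidates that are old enough to be indexed safely. */
    List<File> stableFiles(List<File> candidates, long now) {
        List<File> result = new ArrayList<>();
        for (File candidate : candidates) {
            if (isStable(candidate, now)) {
                result.add(candidate);
            }
        }
        return result;
    }
}
```

A discoverer could construct the filter with <<<60000>>> milliseconds for the one minute default and pass <<<System.currentTimeMillis()>>> as <<<now>>>; taking the period as a constructor argument keeps it configurable.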
|
||||
|
||||
Other techniques were considered:
|
||||
|
||||
* Processing each artifact file individually, updating each record as needed. This would result in having to read
|
||||
back each index record before writing. This is quite costly in Lucene as it would be "read, delete, add". You
|
||||
must have a reader and writer open for that process, and it greatly complicates the code.
|
||||
|
||||
* Have three indices, one for each source (artifact file, POM and repository metadata). This would complicate searching (and may affect ranking of results, though this
|
||||
was not analysed). While Lucene is
|
||||
{{{http://wiki.apache.org/jakarta-lucene/LuceneFAQ#head-b11296f9e7b2a5e7496d67118d0a5898f2fd9823} capable of
|
||||
searching multiple indices}}, it is expected that the results would be in the form of a list of separate records
|
||||
rather than the "table join" this effectively is. A similar derivative of this technique would be to store
|
||||
everything in one index, using a field (previously, doctype) to identify each record.
|
||||
|
||||
Records in the index are keyed by their path from the repository root. While this is longer than the
dependency conflict ID, Lucene cannot delete by a combination of terms, so keying on the conflict ID would require
storing an additional field in the index, whereas the file path is already stored.
|
||||
|
||||
The plugin prefix could be found either from inside the plugin JAR (<<<META-INF/maven/plugin.xml>>>), or from the
|
||||
repository metadata for the plugin's group. For simplicity, the first approach will be used. This means at present
|
||||
there is no need to index the repository metadata; however, that may be considered in the future.
|
||||
|
||||
Note that archetypes currently don't have a packaging associated with them in Maven, so it is not recorded in the POM.
|
||||
However, to be able to search by this type, the indexer will look for a <<<META-INF/maven/archetype.xml>>> file, and
|
||||
if found set its packaging to <<<maven-archetype>>>. In the future, this handling will be deprecated as the POMs
|
||||
can start using the appropriate packaging.
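The archetype check described above could look roughly like the following. It is a minimal sketch that uses only the JDK zip classes; <<<PackagingDetector>>> and <<<effectivePackaging>>> are hypothetical names, not the indexer's actual code.

```java
import java.io.File;
import java.io.IOException;
import java.util.zip.ZipFile;

/** Hypothetical helper (not an Archiva class): decides the packaging to record for an archive. */
final class PackagingDetector {
    static final String ARCHETYPE_DESCRIPTOR = "META-INF/maven/archetype.xml";

    /**
     * Returns "maven-archetype" when the archive contains the archetype descriptor,
     * otherwise the packaging that was read from the POM.
     */
    static String effectivePackaging(File archive, String pomPackaging) throws IOException {
        try (ZipFile zip = new ZipFile(archive)) {
            return zip.getEntry(ARCHETYPE_DESCRIPTOR) != null ? "maven-archetype" : pomPackaging;
        }
    }
}
```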
|
||||
|
||||
The index is shared among multiple repositories. The source repository is recorded in the index record. The
|
||||
discovery/conversion/reporting mechanisms are expected to deal with duplicates before reaching the indexer, so if the
|
||||
indexer encounters an artifact from a different repository than the one it was previously added from, it will simply replace the
|
||||
record.
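The "keyed by path, replace on re-add" behaviour can be illustrated with an in-memory map. This is deliberately not Lucene code: <<<PathKeyedIndex>>> and the <<<repository>>> field name are assumptions made for the sketch, standing in for the real index and its field names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** In-memory stand-in for the index (an assumption for illustration, not the Lucene implementation). */
final class PathKeyedIndex {
    /** One record per path from the repository root. */
    private final Map<String, Map<String, String>> records = new LinkedHashMap<>();

    /** Adds the record, replacing any record already stored under the same path. */
    void addOrReplace(String path, String repositoryId, Map<String, String> fields) {
        Map<String, String> record = new LinkedHashMap<>(fields);
        record.put("repository", repositoryId); // hypothetical field name for the source repository
        records.put(path, record);
    }

    /** Returns the source repository recorded for the path, or null if the path is not indexed. */
    String repositoryOf(String path) {
        Map<String, String> record = records.get(path);
        return record == null ? null : record.get("repository");
    }

    int size() {
        return records.size();
    }
}
```

Adding the same path first from one repository and then from another leaves a single record that names the second repository.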
|
||||
|
||||
When indexing metadata from a POM, the POM should be loaded using the Maven project builder so that inheritance and
|
||||
interpolation are performed. This ensures that the record is as complete as possible, and that searching by
|
||||
fields that are inherited will reveal both the parent and the children in the search results.
|
||||
|
||||
* Reduced Size Index
|
||||
|
||||
An additional index is maintained by the repository manager in the
|
||||
{{{../apidocs/org/apache/maven/archiva/indexing/MinimalArtifactIndexRecord.html} MinimalIndex}} class. This
|
||||
indexes all of the same artifacts as the first index, but stores them with shorter field names and less information to
|
||||
maintain a smaller size. This index is appropriate for use by certain clients such as IDE integration for fast
|
||||
searching. For a fuller interface to the repository information, the integration should use the XML-RPC interface.
|
||||
|
||||
The following fields are in the reduced index:
|
||||
|
||||
* <<<j>>>: The JAR filename
|
||||
|
||||
* <<<s>>>: The JAR size
|
||||
|
||||
* <<<d>>>: The last modified timestamp
|
||||
|
||||
* <<<c>>>: A list of classes in the JAR (\n delimited)
|
||||
|
||||
* <<<m>>>: md5 checksum of the JAR
|
||||
|
||||
* <<<pk>>>: the primary key of the artifact
|
||||
|
||||
Only JARs are indexed at present. The JAR filename is used as the key for later deleting entries.
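As an illustration, a reduced record for one JAR could be assembled as below. This is a sketch under assumptions: <<<MinimalRecordSketch>>> is a hypothetical class, the timestamp and size are stored as decimal strings, the <<<j>>> field holds the bare filename, and <<<pk>>> is left out because its format is not described here.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Enumeration;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/** Hypothetical sketch (not the Archiva implementation) of building a reduced index record. */
final class MinimalRecordSketch {
    static Map<String, String> build(File jar) throws IOException, NoSuchAlgorithmException {
        Map<String, String> record = new LinkedHashMap<>();
        record.put("j", jar.getName());                     // JAR filename (assumed: without path)
        record.put("s", Long.toString(jar.length()));       // JAR size in bytes
        record.put("d", Long.toString(jar.lastModified())); // last modified (assumed: epoch millis)
        record.put("c", classNames(jar));                   // classes, \n delimited
        record.put("m", md5Hex(jar));                       // md5 checksum as lowercase hex
        return record;
    }

    /** Lists the classes in the JAR as dotted names, one per line. */
    static String classNames(File jar) throws IOException {
        StringBuilder names = new StringBuilder();
        try (ZipFile zip = new ZipFile(jar)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                if (name.endsWith(".class")) {
                    if (names.length() > 0) {
                        names.append('\n');
                    }
                    names.append(name.substring(0, name.length() - ".class".length()).replace('/', '.'));
                }
            }
        }
        return names.toString();
    }

    /** Computes the md5 checksum of the file by streaming its content. */
    static String md5Hex(File file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("MD5");
        try (InputStream in = Files.newInputStream(file.toPath())) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }
}
```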
|
||||
|
||||
* Searching
|
||||
|
||||
Searching will be reasonably flexible, though the general use case will be to enter a single parsed query that is
|
||||
applied to all fields in the index.
|
||||
|
||||
Some features that will be available:
|
||||
|
||||
* <Search through most fields for a particular keyword>: the general case described above.
|
||||
|
||||
* <Search by a particular field (exact match)>: This would be needed for search by checksum.
|
||||
|
||||
* <Search in a range of field values>: This would be needed for searching based on update time. Note that in
|
||||
Lucene it may be better to search by other fields (or return all), and then filter the results by dates rather
|
||||
than making dates part of a search query.
|
||||
|
||||
* <Limit search to particular fields>: It will be useful to only search Java classes and packages, for example
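The suggestion to filter by date after the search, rather than inside the query, might be sketched as follows. The <<<Hit>>> type here is a hypothetical stand-in for whatever a search returns; it is not a Lucene or Archiva class.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical post-search filter (not an Archiva class): keeps hits updated within a time range. */
final class DateRangeFilter {
    /** Minimal stand-in for a search result: the record path and its last modified time in millis. */
    static final class Hit {
        final String path;
        final long lastModified;

        Hit(String path, long lastModified) {
            this.path = path;
            this.lastModified = lastModified;
        }
    }

    /** Returns the hits whose timestamp t satisfies fromInclusive <= t < toExclusive, in order. */
    static List<Hit> modifiedBetween(List<Hit> hits, long fromInclusive, long toExclusive) {
        List<Hit> kept = new ArrayList<>();
        for (Hit hit : hits) {
            if (hit.lastModified >= fromInclusive && hit.lastModified < toExclusive) {
                kept.add(hit);
            }
        }
        return kept;
    }
}
```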
|
||||
|
||||
Another thing to note is that, for performance reasons, it should be possible to compose the search results entirely
from the index. Searching should not have to read any metadata files, or properties of files such as size and checksum, from the disk.
|
||||
This enables searching a repository remotely without having the physical repository available, which is useful for
|
||||
IDE integration among other things.
|
||||
|
||||
Note that to be able to do an exact match search, a field must be stored untokenized. For fields where it makes sense
|
||||
to search both tokenized and untokenized, they will be stored twice. This currently includes: artifact ID, group ID,
|
||||
and version.
|
|
@ -1,90 +0,0 @@
|
|||
~~ Copyright 2006 The Apache Software Foundation.
|
||||
~~
|
||||
~~ Licensed under the Apache License, Version 2.0 (the "License");
|
||||
~~ you may not use this file except in compliance with the License.
|
||||
~~ You may obtain a copy of the License at
|
||||
~~
|
||||
~~ http://www.apache.org/licenses/LICENSE-2.0
|
||||
~~
|
||||
~~ Unless required by applicable law or agreed to in writing, software
|
||||
~~ distributed under the License is distributed on an "AS IS" BASIS,
|
||||
~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
~~ See the License for the specific language governing permissions and
|
||||
~~ limitations under the License.
|
||||
|
||||
~~ NOTE: For help with the syntax of this file, see:
|
||||
~~ http://maven.apache.org/guides/mini/guide-apt-format.html
|
||||
|
||||
ProxyManager
|
||||
|
||||
The ProxyManager is designed to be used as a simple object or bean by
a command-line application or web application.
|
||||
|
||||
Configuration
|
||||
|
||||
An instance of a ProxyManager requires a configuration object, called
ProxyConfiguration, that defines its behavior. The ProxyConfiguration is a
Plexus component and can be looked up to get an instance of it. Below is a sample
Plexus lookup statement:
|
||||
|
||||
----------
|
||||
ProxyConfiguration config = (ProxyConfiguration) container.lookup( ProxyConfiguration.ROLE );
|
||||
----------
|
||||
|
||||
Currently, a ProxyConfiguration lookup returns an empty instance of
ProxyConfiguration, which means it does not yet have any default definitions of
how the ProxyManager should behave. So the next step is to define its behavior
explicitly.
|
||||
|
||||
----------
|
||||
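// Look up an (initially empty) ProxyConfiguration from the Plexus container
ProxyConfiguration config = (ProxyConfiguration) container.lookup( ProxyConfiguration.ROLE );

// Directory in which the proxy stores its cached copies
config.setRepositoryCachePath( "/user/proxy-cache" );

// Standard Maven 2 repository layout
ArtifactRepositoryLayout defLayout = new DefaultRepositoryLayout();

// Local test directory from the original sample; it is not used by the configuration below
File repo1File = new File( "src/test/remote-repo1" );

// Remote repository with the id "central"
ProxyRepository repo1 = new ProxyRepository( "central", "http://www.ibiblio.org/maven2", defLayout );

config.addRepository( repo1 );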
|
||||
----------
|
||||
|
||||
The above statements set up the ProxyConfiguration to use the directory
<<</user/proxy-cache>>> as the location of the proxy's repository cache.
They then create a ProxyRepository instance with an id of <<<central>>> that
looks for remote files at ibiblio.org, and add it to the configuration.
|
||||
|
||||
Instantiation
|
||||
|
||||
To create or retrieve an instance of a ProxyManager, one will need to use the
|
||||
ProxyManagerFactory.
|
||||
|
||||
----------
|
||||
ProxyManagerFactory factory = (ProxyManagerFactory) container.lookup( ProxyManagerFactory.ROLE );
|
||||
ProxyManager proxy = factory.getProxyManager( "default", config );
|
||||
----------
|
||||
|
||||
The factory method requires two parameters. The first is the proxy_type
that you want to use, and the second is the ProxyConfiguration
created above. The proxy_type defines the client that the
ProxyManager is expected to service. Currently, only the <<<default>>>
ProxyManager type is available, and it is defined to be for Maven 2.x clients.
|
||||
|
||||
Usage
|
||||
|
||||
* The get() method
|
||||
|
||||
The ProxyManager get( target ) method is used to retrieve a file by its path. This
method first checks whether the target exists in the cache. If it does not,
the ProxyManager searches all the ProxyRepositories present in its
ProxyConfiguration. When the target path is found, the ProxyManager creates
a copy of it in its cache and returns a File instance of the cached copy.
|
||||
|
||||
* The getRemoteFile() method
|
||||
|
||||
The ProxyManager getRemoteFile( path ) method is used to force the
|
||||
ProxyManager to ignore the contents of its cache and search all the
|
||||
ProxyRepository objects for the specified path and retrieve it when
|
||||
available. When successful, the ProxyManager creates a copy of the remote
|
||||
file in its cache and then returns a File instance of the cached copy.
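The difference between the two methods can be sketched with a small stand-in class. This is not the ProxyManager implementation: <<<CachingProxySketch>>> is a hypothetical name, local directories play the role of the remote repositories, and throwing <<<FileNotFoundException>>> when nothing is found is an assumption of the sketch.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;
import java.util.List;

/** Hypothetical stand-in for ProxyManager; "remote" repositories are plain local directories here. */
final class CachingProxySketch {
    private final File cacheDir;
    private final List<File> repositories;

    CachingProxySketch(File cacheDir, List<File> repositories) {
        this.cacheDir = cacheDir;
        this.repositories = repositories;
    }

    /** Serves the cached copy when one exists; otherwise falls back to the repositories. */
    File get(String path) throws IOException {
        File cached = new File(cacheDir, path);
        return cached.isFile() ? cached : getRemoteFile(path);
    }

    /** Ignores the cache: searches each repository in order and caches the first match. */
    File getRemoteFile(String path) throws IOException {
        for (File repository : repositories) {
            File source = new File(repository, path);
            if (source.isFile()) {
                File cached = new File(cacheDir, path);
                File parent = cached.getParentFile();
                if (parent != null) {
                    Files.createDirectories(parent.toPath());
                }
                Files.copy(source.toPath(), cached.toPath(), StandardCopyOption.REPLACE_EXISTING);
                return cached;
            }
        }
        // Assumption of this sketch: a missing file is reported with an exception.
        throw new FileNotFoundException(path + " not found in any repository");
    }
}
```

In this sketch, a repeated get() keeps returning the cached copy even after the repository content changes, while getRemoteFile() refreshes the cache from the repository.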
|