From ed87e673fef461df6890e606f5e0058f95266277 Mon Sep 17 00:00:00 2001
From: Brett Porter
Date: Mon, 5 Nov 2007 11:05:08 +0000
Subject: [PATCH] [MRM-141] remove irrelevant documentation

git-svn-id: https://svn.apache.org/repos/asf/maven/archiva/trunk@591958 13f79535-47bb-0310-9956-ffa450edef68
---
 .../archiva-indexer/src/site/apt/design.apt | 153 ------------------
 .../archiva-proxy/src/site/apt/how-to.apt | 90 -----------
 2 files changed, 243 deletions(-)
 delete mode 100644 archiva-base/archiva-indexer/src/site/apt/design.apt
 delete mode 100644 archiva-base/archiva-proxy/src/site/apt/how-to.apt

diff --git a/archiva-base/archiva-indexer/src/site/apt/design.apt b/archiva-base/archiva-indexer/src/site/apt/design.apt
deleted file mode 100644
index aea3f9634..000000000
--- a/archiva-base/archiva-indexer/src/site/apt/design.apt
+++ /dev/null
@@ -1,153 +0,0 @@
- -----
- Indexer Design
- -----
- Brett Porter
- -----
- 25 July 2006
- -----
-
-~~ Copyright 2006 The Apache Software Foundation.
-~~
-~~ Licensed under the Apache License, Version 2.0 (the "License");
-~~ you may not use this file except in compliance with the License.
-~~ You may obtain a copy of the License at
-~~
-~~ http://www.apache.org/licenses/LICENSE-2.0
-~~
-~~ Unless required by applicable law or agreed to in writing, software
-~~ distributed under the License is distributed on an "AS IS" BASIS,
-~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-~~ See the License for the specific language governing permissions and
-~~ limitations under the License.
-
-~~ NOTE: For help with the syntax of this file, see:
-~~ http://maven.apache.org/guides/mini/guide-apt-format.html
-
-Indexer Design
-
- <<>>
-
- ~~TODO: separate API design from Lucene implementation design
-
-* Standard Artifact Index
-
- We currently want to index these elements from the repository:
-
- * for each artifact file: the artifact ID, version, group ID, classifier, type (extension), filename (including path
- from the repository base), checksums (md5, sha1) and size
-
- * for each artifact POM: the packaging, licenses, dependencies, build plugins, reporting plugins
-
- * plugin prefix
-
- * Java classes within a JAR artifact (delimited by \n)
-
- * filenames within an archive (delimited by \n)
-
- * the identifier of the source repository
-
- Each record in the index refers to an artifact. Since the content for a record can come from various sources, the
- record may need to be updated when different files that are related to the same artifact are discovered (ie, the
- POM, or for plugins the metadata that contains their prefix).
-
- To simplify this, the process for discovery is as follows:
-
- * Discovered artifacts will read the related POM and metadata from the repository to index, rather than relying on
- it being discovered. This ensures that partial discovery still yields correct results in all cases, and it is
- possible to construct the entire record without having to read back from the index.
-
- * POMs that do not have a packaging of POM are not sent to the indexer.
-
- The result of this process is that updates to a POM or repository metadata and not the corresponding artifact(s) will
- not update the index. As POMs should not be modified, this will not be a major concern. Likewise, updates to metadata
- will only accompany updates to the artifact itself, so will not cause a problem.
-
- The above case may have a problem if the discovery happens during the middle of a deployment outside of the
- repository manager (where the artifact is present, but the metadata or POM is not). To avoid such cases, the
- discoverer should only detect changes more than a minute old (this blackout should be configurable).
-
- Other techniques were considered:
-
- * Processing each artifact file individually, updating each record as needed. This would result in having to read
- back each index record before writing. This is quite costly in Lucene as it would be "read, delete, add". You
- must have a reader and writer open for that process, and it greatly complicates the code.
-
- * Have three indices, one for each. This would complicate searching (and may affect ranking of results, though this
- was not analysed). While Lucene is
- {{{http://wiki.apache.org/jakarta-lucene/LuceneFAQ#head-b11296f9e7b2a5e7496d67118d0a5898f2fd9823} capable of
- searching multiple indices}}, it is expected that the results would be in the form of a list of separate records
- rather than the "table join" this effectively is. A similar derivative of this technique would be to store
- everything in one index, using a field (previously, doctype) to identify each record.
-
- Records in the index are keyed by their path from the repository root. While this is longer than using the
- dependency conflict ID, Lucene cannot delete by a combination of terms, so would require storing an additional
- field in the index where the file already exists.
-
- The plugin prefix could be found either from inside the plugin JAR (<<>>), or from the
- repository metadata for the plugin's group. For simplicity, the first approach will be used. This means at present
- there is no need to index the repository metadata, however that may be considered in future.
-
- Note that archetypes currently don't have a packaging associated with them in Maven, so it is not recorded in the POM.
- However, to be able to search by this type, the indexer will look for a <<>> file, and
- if found set its packaging to <<>>. In the future, this handling will be deprecated as the POMs
- can start using the appropriate packaging.
-
- The index is shared among multiple repositories. The source repository is recorded in the index record. The
- discovery/conversion/reporting mechanisms are expected to deal with duplicates before reaching the indexer, so if the
- indexer encounters an artifact from a different repository than it was already added, it will simply replace the
- record.
-
- When indexing metadata from a POM, the POM should be loaded using the Maven project builder so that inheritance and
- interpolation are performed. This ensures that the record is as complete as possible, and that searching by
- fields that are inherited will reveal both the parent and the children in the search results.
-
-* Reduced Size Index
-
- An additional index is maintained by the repository manager in the
- {{{../apidocs/org/apache/maven/archiva/indexing/MinimalArtifactIndexRecord.html} MinimalIndex}} class. This
- indexes all of the same artifacts as the first index, but stores them with shorter field names and less information to
- maintain a smaller size. This index is appropriate for use by certain clients such as IDE integration for fast
- searching. For a fuller interface to the repository information, the integration should use the XMLRPC interface.
-
- The following fields are in the reduced index:
-
- * <<>>: The JAR filename
-
- * <<>>: The JAR size
-
- * <<>>: The last modified timestamp
-
- * <<>>: A list of classes in the JAR (\n delimited)
-
- * <<>>: md5 checksum of the JAR
-
- * <<>>: the primary key of the artifact
-
- Only JARs are indexed at present. The JAR filename is used as the key for later deleting entries.
-
-* Searching
-
- Searching will be reasonably flexible, though the general use case will be to enter a single parsed query that is
- applied to all fields in the index.
-
- Some features that will be available:
-
- * : the general case described above.
-
- * : This would be needed for search by checksum.
-
- * : This would be needed for searching based on update time. Note that in
- Lucene it may be better to search by other fields (or return all), and then filter the results by dates rather
- than making dates part of a search query.
-
- * : It will be useful to only search Java classes and packages, for example
-
- Another thing to note is that the search results should be able to be composed entirely from the index for performance
- reasons. It should not have to read any metadata files or properties of files such as size and checksum from the disk.
- This enables searching a repository remotely without having the physical repository available, which is useful for
- IDE integration among other things.
-
- Note that to be able to do an exact match search, a field must be stored untokenized. For fields where it makes sense
- to search both tokenized and untokenized, they will be stored twice. This currently includes: artifact ID, group ID,
- and version.
diff --git a/archiva-base/archiva-proxy/src/site/apt/how-to.apt b/archiva-base/archiva-proxy/src/site/apt/how-to.apt
deleted file mode 100644
index 4f7d7eb51..000000000
--- a/archiva-base/archiva-proxy/src/site/apt/how-to.apt
+++ /dev/null
@@ -1,90 +0,0 @@
-~~ Copyright 2006 The Apache Software Foundation.
-~~
-~~ Licensed under the Apache License, Version 2.0 (the "License");
-~~ you may not use this file except in compliance with the License.
-~~ You may obtain a copy of the License at
-~~
-~~ http://www.apache.org/licenses/LICENSE-2.0
-~~
-~~ Unless required by applicable law or agreed to in writing, software
-~~ distributed under the License is distributed on an "AS IS" BASIS,
-~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-~~ See the License for the specific language governing permissions and
-~~ limitations under the License.
-
-~~ NOTE: For help with the syntax of this file, see:
-~~ http://maven.apache.org/guides/mini/guide-apt-format.html
-
-ProxyManager
-
- The ProxyManager is designed to be used as a simple object or bean for use by
- a command-line application or web application.
-
-Configuration
-
- An instance of a ProxyManager requires a configuration object that will
- define its behavior called ProxyConfiguration. The ProxyConfiguration is a
- plexus component and can be looked up to get an instance of it. Below is a sample
- plexus lookup statement:
-
------------
- ProxyConfiguration config = (ProxyConfiguration) container.lookup( ProxyConfiguration.ROLE );
------------
-
- Currently, a ProxyConfiguration lookup will return an empty instance of the
- ProxyConfiguration which means it doesn't have any default definitions yet on
- how the ProxyManager should behave. So the next step is to explicitly define
- its behavior.
-
------------
- ProxyConfiguration config = (ProxyConfiguration) container.lookup( ProxyConfiguration.ROLE );
-
- config.setRepositoryCachePath( "/user/proxy-cache" );
-
- ArtifactRepositoryLayout defLayout = new DefaultRepositoryLayout();
-
- File repo1File = new File( "src/test/remote-repo1" );
-
- ProxyRepository repo1 = new ProxyRepository( "central", "http://www.ibiblio.org/maven2", defLayout );
-
- config.addRepository( repo1 );
------------
-
- The above statements sets up the ProxyConfiguration to use the directory
- <<>> as the location of the proxy's repository cache.
- Then it creates a ProxyRepository instance with an id of <<>> to
- look for remote files in ibiblio.org.
-
-Instantiation
-
- To create or retrieve an instance of a ProxyManager, one will need to use the
- ProxyManagerFactory.
-
------------
- ProxyManagerFactory factory = (ProxyManagerFactory) container.lookup( ProxyManagerFactory.ROLE );
- proxy = factory.getProxyManager( "default", config );
------------
-
- The factory requires two parameters. The first parameter is the proxy_type
- that you will want to use. And the second parameter is the ProxyConfiguration
- which we already did above. The proxy_type defines the client that the
- ProxyManager is expected to service. Currently, only <<>>
- ProxyManager type is available and is defined to be for Maven 2.x clients.
-
-Usage
-
-* The get() method
-
- The ProxyManager get( target ) method is used to retrieve a path file. This
- method first looks into the cache if the target exists. If it does not, then
- the ProxyManager will search all the ProxyRepositories present in its
- ProxyConfiguration. When the target path is found, the ProxyManager creates
- a copy of it in its cache and returns a File instance of the cached copy.
-
-* The getRemoteFile() method
-
- The ProxyManager getRemoteFile( path ) method is used to force the
- ProxyManager to ignore the contents of its cache and search all the
- ProxyRepository objects for the specified path and retrieve it when
- available. When successful, the ProxyManager creates a copy of the remote
- file in its cache and then returns a File instance of the cached copy.
\ No newline at end of file