Apache Lucene
Copyright 2001-2022 The Apache Software Foundation

This product includes software developed at
The Apache Software Foundation (http://www.apache.org/).

Includes software from other Apache Software Foundation projects,
including, but not limited to:
- Apache Jakarta Regexp
- Apache Commons
- Apache Xerces

ICU4J (under analysis/icu) is licensed under an MIT-style license
and is Copyright (c) 1995-2008 International Business Machines Corporation and others.

Some data files (under analysis/icu/src/data) are derived from Unicode data such
as the Unicode Character Database. See http://unicode.org/copyright.html for more
details.

Brics Automaton (under core/src/java/org/apache/lucene/util/automaton) is
BSD-licensed, created by Anders Møller. See http://www.brics.dk/automaton/

The Levenshtein automata tables (under core/src/java/org/apache/lucene/util/automaton) were
automatically generated with the moman/finenight FSA library, created by
Jean-Philippe Barrette-LaPierre. This library is available under an MIT license,
see http://sites.google.com/site/rrettesite/moman and
http://bitbucket.org/jpbarrette/moman/overview/

The class org.apache.lucene.util.WeakIdentityMap was derived from
the Apache CXF project and is Apache License 2.0.

The class org.apache.lucene.util.compress.LZ4 is a Java rewrite of the LZ4
compression library (https://github.com/lz4/lz4/tree/dev/lib) that is licensed
under the 2-clause BSD license.
(https://opensource.org/licenses/bsd-license.php)

The Google Code Prettify is Apache License 2.0.
See http://code.google.com/p/google-code-prettify/

This product includes code (JaspellTernarySearchTrie) from Java Spelling Checking
Package (jaspell): http://jaspell.sourceforge.net/
License: The BSD License (http://www.opensource.org/licenses/bsd-license.php)

The snowball stemmers in
analysis/common/src/java/net/sf/snowball
were developed by Martin Porter and Richard Boulton.
The snowball stopword lists in
analysis/common/src/resources/org/apache/lucene/analysis/snowball
were developed by Martin Porter and Richard Boulton.
The full snowball package is available from
https://snowballstem.org/

The KStem stemmer in
analysis/common/src/org/apache/lucene/analysis/en
was developed by Bob Krovetz and Sergio Guzman-Lara (CIIR-UMass Amherst)
under the BSD license.

The Arabic, Persian, Romanian, Bulgarian, Hindi and Bengali analyzers (common) come with a default
BSD-licensed stopword list created by Jacques Savoy. These files reside in:
analysis/common/src/resources/org/apache/lucene/analysis/ar/stopwords.txt,
analysis/common/src/resources/org/apache/lucene/analysis/fa/stopwords.txt,
analysis/common/src/resources/org/apache/lucene/analysis/ro/stopwords.txt,
analysis/common/src/resources/org/apache/lucene/analysis/bg/stopwords.txt,
analysis/common/src/resources/org/apache/lucene/analysis/hi/stopwords.txt,
analysis/common/src/resources/org/apache/lucene/analysis/bn/stopwords.txt
See http://members.unine.ch/jacques.savoy/clef/index.html.

The German, Spanish, Finnish, French, Hungarian, Italian, Portuguese, Russian and Swedish light stemmers
(common) are based on BSD-licensed reference implementations created by Jacques Savoy and
Ljiljana Dolamic. These files reside in:
analysis/common/src/java/org/apache/lucene/analysis/de/GermanLightStemmer.java
analysis/common/src/java/org/apache/lucene/analysis/de/GermanMinimalStemmer.java
analysis/common/src/java/org/apache/lucene/analysis/es/SpanishLightStemmer.java
analysis/common/src/java/org/apache/lucene/analysis/fi/FinnishLightStemmer.java
analysis/common/src/java/org/apache/lucene/analysis/fr/FrenchLightStemmer.java
analysis/common/src/java/org/apache/lucene/analysis/fr/FrenchMinimalStemmer.java
analysis/common/src/java/org/apache/lucene/analysis/hu/HungarianLightStemmer.java
analysis/common/src/java/org/apache/lucene/analysis/it/ItalianLightStemmer.java
analysis/common/src/java/org/apache/lucene/analysis/pt/PortugueseLightStemmer.java
analysis/common/src/java/org/apache/lucene/analysis/ru/RussianLightStemmer.java
analysis/common/src/java/org/apache/lucene/analysis/sv/SwedishLightStemmer.java

The Stempel analyzer (stempel) includes BSD-licensed software developed
by the Egothor project http://egothor.sf.net/, created by Leo Galambos, Martin Kvapil,
and Edmond Nolan.

The Polish analyzer (stempel) comes with a default
BSD-licensed stopword list created by the Carrot2 project. The file resides
in stempel/src/resources/org/apache/lucene/analysis/pl/stopwords.txt.
See https://github.com/carrot2/carrot2.

The SmartChineseAnalyzer source code (smartcn) was
provided by Xiaoping Gao and is copyright 2009 by www.imdict.net.

WordBreakTestUnicode_*.java (under modules/analysis/common/src/test/)
is derived from Unicode data such as the Unicode Character Database.
See http://unicode.org/copyright.html for more details.

The Morfologik analyzer (morfologik) includes BSD-licensed software
developed by Dawid Weiss and Marcin Miłkowski
(https://github.com/morfologik/morfologik-stemming) and uses
data from the BSD-licensed dictionary of Polish (SGJP, http://sgjp.pl/morfeusz/).

===========================================================================
Kuromoji Japanese Morphological Analyzer - Apache Lucene Integration
===========================================================================

This software includes a binary and/or source version of data from

mecab-ipadic-2.7.0-20070801

which can be obtained from

http://atilika.com/releases/mecab-ipadic/mecab-ipadic-2.7.0-20070801.tar.gz

or

http://jaist.dl.sourceforge.net/project/mecab/mecab-ipadic/2.7.0-20070801/mecab-ipadic-2.7.0-20070801.tar.gz

===========================================================================
mecab-ipadic-2.7.0-20070801 Notice
===========================================================================

Nara Institute of Science and Technology (NAIST),
the copyright holders, disclaims all warranties with regard to this
software, including all implied warranties of merchantability and
fitness, in no event shall NAIST be liable for
any special, indirect or consequential damages or any damages
whatsoever resulting from loss of use, data or profits, whether in an
action of contract, negligence or other tortuous action, arising out
of or in connection with the use or performance of this software.

A large portion of the dictionary entries
originate from ICOT Free Software. The following conditions for ICOT
Free Software applies to the current dictionary as well.

Each User may also freely distribute the Program, whether in its
original form or modified, to any third party or parties, PROVIDED
that the provisions of Section 3 ("NO WARRANTY") will ALWAYS appear
on, or be attached to, the Program, which is distributed substantially
in the same form as set out herein and that such intended
distribution, if actually made, will neither violate or otherwise
contravene any of the laws and regulations of the countries having
jurisdiction over the User or the intended distribution itself.

NO WARRANTY

The program was produced on an experimental basis in the course of the
research and development conducted during the project and is provided
to users as so produced on an experimental basis. Accordingly, the
program is provided without any warranty whatsoever, whether express,
implied, statutory or otherwise. The term "warranty" used herein
includes, but is not limited to, any warranty of the quality,
performance, merchantability and fitness for a particular purpose of
the program and the nonexistence of any infringement or violation of
any right of any third party.

Each user of the program will agree and understand, and be deemed to
have agreed and understood, that there is no warranty whatsoever for
the program and, accordingly, the entire risk arising from or
otherwise connected with the program is assumed by the user.

Therefore, neither ICOT, the copyright holder, or any other
organization that participated in or was otherwise related to the
development of the program and their respective officials, directors,
officers and other employees shall be held liable for any and all
damages, including, without limitation, general, special, incidental
and consequential damages, arising out of or otherwise in connection
with the use or inability to use the program or any product, material
or result produced or otherwise obtained by using the program,
regardless of whether they have been advised of, or otherwise had
knowledge of, the possibility of such damages at any time during the
project or thereafter. Each user will be deemed to have agreed to the
foregoing by his or her commencement of use of the program. The term
"use" as used herein includes, but is not limited to, the use,
modification, copying and distribution of the program and the
production of secondary products from the program.

In the case where the program, whether in its original form or
modified, was distributed or delivered to or received by a user from
any person, organization or entity other than ICOT, unless it makes or
grants independently of ICOT any specific warranty to the user in
writing, such person, organization or entity, will also be exempted
from and not be held liable to the user for any such damages as noted
above as far as the program is concerned.

===========================================================================
Nori Korean Morphological Analyzer - Apache Lucene Integration
===========================================================================

This software includes a binary and/or source version of data from

mecab-ko-dic-2.1.1-20180720

which can be obtained from

https://bitbucket.org/eunjeon/mecab-ko-dic/downloads/mecab-ko-dic-2.1.1-20180720.tar.gz