Bloom filter documentation (#128)

* Added initial bloom filter code.  Changed lang3 dependency scope from
test to compile in pom.xml

* added tests + made recommended changes.

* Updated documentation

* refactored ProtoBloomFilter added tests.

* Cleaned up code and added tests

* Added CountingBloomFilter

* Fixed CountingBloomFilter issues

Fixed checkstyle and bug report issues

* Initial bloom filter collections checkin

* Added unit tests

* fixed test cases

* Extract BloomFilter as an interface

* added missing license info

* fixed Jacoco errors

* fixed names so build picks up tests

* cleaned up Jacoco report for BloomNestedCollection

* removed unused code

* cleaned up and reformatted

* added javadoc

fixed issue with BloomNestedCollection detecting duplicates in an edge
case.

* fixed candidate testing bug

* Cleaned up niggling report issues.

* fixed javadoc errors

* fixed javadoc for java 13 issue

* Second set of fixes.


* "package private for testing" for methods and properties.
* In "Builder":
** Field "hashes" made "final"
* removes some "Serializable" implementations.
* "StandardBloomFilter" made non "final" fields final and changed
"final protected" to "final private".
* removed transient fields
* made Package name singular
* added javadocs for private and protected fields and methods.
* Occurrences of "bloom" replaced with "Bloom"

* removed checkstyle and findbugs exclusions

* Fixed method and class names

* Documentation updates

* Fixed checkstyle issues

Added BloomFilterConfiguration functions for estimation.

* added .checkstyle to eclipse ignore section.

* renamed test classes to match main class names

* Updated the documentation.

* Implemented requested changes.  Part of COLLECTIONS-728

Changed remaining "get" comments to "gets" etc.
Added final where possible and reasonable.
renamed enum Change to CHANGE
fixed missing javadoc links and missed name changes.
fixed ProtoBloomFilter hashCode
renamed CollectionStatistics to BloomCollectionStatistics
renamed CollectionConfiguration to BloomCollectionConfiguration
renamed BloomCollectionStatistics.getTxnCount() to getTransactionCount()

* Added final set of constructors and tests for them.

Cleaned up issues from Gilles Sadowski review

* fixes for Gilles Sadowski issues in BloomCollectionStatistics

* Update javadoc

* renamed match() -> matches() and inverseMatch() -> inverseMatches()

This follows the pattern set with the Object.equals() method name.

* added isFull() method to check if a bloom filter is full.

* Changed gate from StandardBloomFilter to BloomFilter

* renamed BloomCollectionX -> BloomFilterGatedX

specifically:
BloomCollectionConfiguration -> BloomFilterGatedConfiguration
BloomCollectionStatistics -> BloomFilterGatedStatistics

* Made the StandardBloomFilter(BitSet) constructor public

* removed extraneous build() methods from ProtoBloomFilter.Factory

* Added Use cases

* Initial cut

* changes for interface

* Changed to Hasher implementation

* Added missing files and removed Shape from some BloomFilter calls

* Added  @since 4.5 tags

* fixed javadoc

* fixed PMD errors

* Added tests and fixed sign extension issues

* changed to Byte constant

* made BloomFilter.verify*() non final

* Added remove(Hasher) for completeness

* Replaced private implementation of MurmurHash3 with commons-codec

* fixed typo

* Removed Hasher.Factory added HashFunction interface

* removed Usage.md

* made commons-codec dependency optional

* Improved performance of Iterator.

* renamed instance variable "md" as messageDigest.

* updated javadoc

* renamed Iter to Iterator and removed unused imports

* removed unused imports

* Made instance variables final.

Also fixed MD5 constructor to throw IllegalStateException if MD5 algo
can not be found.

* removed unused imports

* Updated javadoc.

* Added HashFunctionIdentity to replace HashFunctionName

Added test cases, updated java doc.
Renamed function implementations to reflect actual function.
Added comparators for HashFunctionIdentity

* fixed naming issues

* Updated javadoc

* fixed checkstyle issue

* Removed link that was causing problems in java 11+ javadoc

* changed HashFunctionIdentity.getProcess() to getProcessType()

* changed HashFunctionIdentity.getProcess() to getProcessType()

* Added package documentation

* Added BloomFilter interface and removed unnecessary methods

* updated tests and fixed issues

* Moved set operations to separate class and updated tests

* fixed FindBugs, PMD and Checkstyle errors

* fixed javadocs

* Added SetOperations and tests

* Added javadocs indicating optional commons-codec required

* Added another cosine test

* Updated to commons-codec 1.14

* fixed typos

* moved Hasher to o.a.c.c.b.hasher package

* extracted Shape.java and moved to o.a.c.c.b.hasher package

* Added javadoc and removed unused imports in testing code

* Added isEmpty() method to Hasher

* initial documentation

* updated to latest mathjax

* Fixed typographical issues
This commit is contained in:
Claude Warren 2020-01-26 17:16:54 +00:00 committed by Gary Gregory
parent d61b83be16
commit 5639a5d790
4 changed files with 1561 additions and 0 deletions


@@ -0,0 +1,399 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*======================================================================
Core CSS for LaTeXML documents converted to (X)HTML */
/* Generic Page layout */
.ltx_page_header,
.ltx_page_footer { font-size:0.8em; }
.ltx_page_header *[rel~="prev"],
.ltx_page_footer *[rel~="prev"] { float:left; }
.ltx_page_header *[rel~="up"],
.ltx_page_footer *[rel~="up"] { display:block; text-align:center; }
.ltx_page_header *[rel~="next"],
.ltx_page_footer *[rel~="next"] { float:right; }
/* What was I trying for here; need more selective rule!
.ltx_page_header .ltx_ref,
.ltx_page_footer .ltx_ref {
margin:0 1em; }
*/
.ltx_page_header li {
padding:0.1em 0.2em 0.1em 1em;}
/* Main content */
.ltx_page_content { clear:both; }
.ltx_page_header { border-bottom:1px solid; margin-bottom:5px; }
.ltx_page_footer { clear:both; border-top:1px solid; margin-top:5px; }
.ltx_page_header:after,
.ltx_page_footer:after,
.ltx_page_content:after {
content:"."; display:block; height:0; clear:both; visibility:hidden; }
.ltx_page_footer:before {
content:"."; display:block; height:0; clear:both; visibility:hidden; }
.ltx_page_logo { font-size:80%; margin-top: 5px; clear:both; float:right; }
.ltx_page_logo a { font-variant: small-caps; }
.ltx_page_logo img { vertical-align:-3px; }
/* if shown */
.ltx_page_navbar li { white-space:nowrap; display:block; overflow:hidden; }
/* If ref got turned into span, it's "this section"*/
.ltx_page_navbar li span.ltx_ref { white-space:normal; overflow:visible; }
/* Ought to be easily removable/overridable? */
.ltx_pagination.ltx_role_newpage { height:2em; }
/*======================================================================
Document Structure; Titles & Frontmatter */
/* undo bold here to remove the browser's native h# styling,
and let all other styles override it (with more specific rules)*/
.ltx_title { font-size:100%; font-weight:normal; }
/* Hack to simulate run-in! put class="ltx_runin" on a title or tag
for it to run-into the following text. */
.ltx_runin { display:inline; }
.ltx_runin:after { content:" "; }
.ltx_runin + .ltx_para,
.ltx_runin + .ltx_para p,
.ltx_runin + p { display:inline; }
.ltx_outdent { margin-left: -2em; }
/* .ltx_chapter_title, etc should be in ltx-article.css etc.
*/
.ltx_page_main { margin:0px; padding:1em 3em 1em 2em; }
.ltx_tocentry { list-style-type:none; }
/* support for common author block layouts.*/
/* add class ltx_authors_1line to get authors in single line
with pop-up affiliation, etc. */
.ltx_authors_1line .ltx_creator,
.ltx_authors_1line .ltx_author_before,
.ltx_authors_1line .ltx_author_after { display:inline;}
.ltx_authors_1line .ltx_author_notes { display:inline-block; }
.ltx_authors_1line .ltx_author_notes:before { content:"*"; color:blue;}
.ltx_authors_1line .ltx_author_notes span { display:none; }
.ltx_authors_1line .ltx_author_notes:hover span {
display:block; position:absolute; z-index:10;
background:white; text-align:left;
border: 1px solid black; border-radius: 0 5px 5px 5px; box-shadow: 5px 5px 10px gray; }
/* add class=ltx_authors_multiline to get authors & affiliations on separate lines*/
.ltx_authors_multiline .ltx_creator,
.ltx_authors_multiline .ltx_author_before,
.ltx_authors_multiline .ltx_author_after,
.ltx_authors_multiline .ltx_author_notes,
.ltx_authors_multiline .ltx_author_notes .ltx_contact {
display:block; }
/*======================================================================
Para level */
.ltx_float {
margin: 1ex 3em 1ex 3em; }
td.ltx_subfigure,
td.ltx_subtable,
td.ltx_subfloat { width:50%; }
/* theorems, figure, tables, floats captions.. */
/*======================================================================
Blocks, Lists, Floats */
.ltx_p,
.ltx_quote,
.ltx_block,
.ltx_para {
display: block; }
/* alignment within blocks */
.ltx_align_left { text-align:left; }
.ltx_align_right { text-align:right; }
.ltx_align_center { text-align:center; }
.ltx_align_justify { text-align:justify; }
.ltx_align_top { vertical-align:top; }
.ltx_align_bottom { vertical-align:bottom; }
.ltx_align_middle { vertical-align:middle; }
.ltx_align_baseline { vertical-align:baseline; }
.ltx_align_floatleft { float:left; }
.ltx_align_floatright { float:right; }
.ltx_td.ltx_align_left, .ltx_th.ltx_align_left,
.ltx_td.ltx_align_right, .ltx_th.ltx_align_right,
.ltx_td.ltx_align_center, .ltx_th.ltx_align_center { white-space:nowrap; }
.ltx_td.ltx_align_left.ltx_wrap, .ltx_th.ltx_align_left.ltx_wrap,
.ltx_td.ltx_align_right.ltx_wrap, .ltx_th.ltx_align_right.ltx_wrap,
.ltx_td.ltx_align_center.ltx_wrap, .ltx_th.ltx_align_center.ltx_wrap,
.ltx_td.ltx_align_justify, .ltx_th.ltx_align_justify { white-space:normal; }
.ltx_tabular .ltx_tabular { width:100%; }
.ltx_inline-block { display:inline-block; }
/* equations in non-aligned mode (not normally used) */
.ltx_eqn_div { display:block; width:95%; text-align:center; }
/* equations in aligned mode (aligning tags, etc as well as equations) */
.ltx_eqn_table { display:table; width:100%; border-collapse:collapse; }
.ltx_eqn_row { display:table-row; }
.ltx_eqn_cell { display:table-cell; width:auto; }
/* Padding between column pairs in ams align */
table.ltx_eqn_align tr.ltx_equation td.ltx_align_left + td.ltx_align_right,
table.ltx_eqn_align tr.ltx_equation td.ltx_align_left + td.ltx_align_center,
table.ltx_eqn_align tr.ltx_equation td.ltx_align_center + td.ltx_align_right,
table.ltx_eqn_align tr.ltx_equation td.ltx_align_center + td.ltx_align_center { padding-left:3em; }
table.ltx_eqn_eqnarray tr.ltx_eqn_lefteqn + tr td.ltx_align_right { min-width:2em; }
.ltx_eqn_eqno { max-width:0em; overflow:visible; white-space: nowrap; }
.ltx_eqn_eqno.ltx_align_right .ltx_tag { float:right; }
.ltx_eqn_center_padleft,
.ltx_eqn_center_padright { width:50%; min-width:2em;}
.ltx_eqn_left_padleft,
.ltx_eqn_right_padright { min-width:2em; }
.ltx_eqn_left_padright,
.ltx_eqn_right_padleft { width:100%; }
/* Various lists */
.ltx_itemize,
.ltx_enumerate,
.ltx_description {
display:block; }
.ltx_itemize .ltx_item,
.ltx_enumerate .ltx_item {
display: list-item; }
/* Position the tag to look like a normal item bullet. */
li.ltx_item > .ltx_tag {
display:inline-block; margin-left:-1.5em; min-width:1.5em;
text-align:right; }
.ltx_item .ltx_tag + .ltx_para,
.ltx_item .ltx_tag + .ltx_para .ltx_p { display:inline; }
/* NOTE: Need to try harder to get runin appearance? */
dl.ltx_description dt { margin-right:0.5em; float:left;
font-weight:bold; font-size:95%; }
dl.ltx_description dd { margin-left:5em; }
dl.ltx_description dl.ltx_description dd { margin-left:3em; }
/* Theorems */
.ltx_theorem {margin:1em 0em 1em 0em; }
.ltx_title_theorem { font-size:100%; }
/* Bibliographies */
.ltx_bibliography dt { margin-right:0.5em; float:left; }
.ltx_bibliography dd { margin-left:3em; }
/*.ltx_biblist { list-style-type:none; }*/
.ltx_bibitem { list-style-type:none; }
.ltx_bibitem .ltx_tag { font-weight:bold; margin-left:-2em; width:3em; }
/*.bibitem-tag + div { display:inline; }*/
.ltx_bib_title { font-style:italic; }
.ltx_bib_article .bib-title { font-style:normal !important; }
.ltx_bib_journal { font-style:italic; }
.ltx_bib_volume { font-weight:bold; }
/* Indices */
.ltx_indexlist li { list-style-type:none; }
.ltx_indexlist { margin-left:1em; padding-left:1em;}
/* Listings */
.ltx_listing {
display:block;
margin: 1ex 3em 1ex 0em;
overflow-x:auto;
text-align: left; }
.ltx_float .ltx_listing {
margin: 0; }
.ltx_listingline { white-space:nowrap; min-height:1em; }
.ltx_lst_numbers_left .ltx_listingline .ltx_tag {
background-color:transparent;
margin-left:-3em; width:2.5em;
position:absolute;
text-align:right; }
.ltx_lst_numbers_right .ltx_listingline .ltx_tag {
background-color:transparent;
width:2.5em;
position:absolute; right:3em;
text-align:right; }
/*
position:absolute; left:0em;
max-width:0em; text-align:right; }
*/
.ltx_parbox {text-indent:0em; }
/* NOTE that it is CRITICAL to put position:relative outside & absolute inside!!
I wish I understood why!
Outer box establishes resulting size, neutralizes any outer positioning, etc;
inner establishes position of stuff to be rotated */
.ltx_transformed_outer {
position:relative; bottom:0pt;left:0pt;
overflow:visible; }
.ltx_transformed_inner {
display:block;
position:absolute;bottom:0pt;left:0pt; }
.ltx_transformed_inner > .ltx_p {text-indent:0em; margin:0; padding:0; }
/* If simulating a table (html5), try to get rowspan to work...sorta? */
span.ltx_rowspan { position:absolute; top:0; bottom:0; }
/* by default, p doesn't indent */
.ltx_p { text-indent:0em; white-space:normal; }
/* explicit control of indentation (on ltx_para) */
.ltx_indent > .ltx_p:first-child { text-indent:2em!important; }
.ltx_noindent > .ltx_p:first-child { text-indent:0em!important; }
/*======================================================================
Columns */
.ltx_page_column1 {
width:44%; float:left; } /* IE uses % of wrong container*/
.ltx_page_column2 {
width:44%; float:right; }
.ltx_page_columns > .ltx_page_column1 {
width:48%; float:left; }
.ltx_page_columns > .ltx_page_column2 {
width:48%; float:right; }
.ltx_page_columns:after {
content:"."; display:block; height:0; clear:both; visibility:hidden; }
/*======================================================================
Borders and such */
.ltx_tabular { display:inline-table; border-collapse:collapse; }
.ltx_tabular.ltx_centering { display:table; }
.ltx_thead,
.ltx_tfoot,
.ltx_tbody { display:table-row-group; }
.ltx_tr { display:table-row; }
.ltx_td,
.ltx_th { display:table-cell; }
.ltx_framed { border:1px solid black;}
.ltx_tabular .ltx_td,
.ltx_tabular .ltx_th { padding:0.1em 0.5em; }
/* regular lines */
.ltx_border_t { border-top:1px solid black; }
.ltx_border_r { border-right:1px solid black; }
.ltx_border_b { border-bottom:1px solid black; }
.ltx_border_l { border-left:1px solid black; }
/* double lines */
.ltx_border_tt { border-top:3px double black; }
.ltx_border_rr { border-right:3px double black; }
.ltx_border_bb { border-bottom:3px double black; }
.ltx_border_ll { border-left:3px double black; }
/* Light lines */
.ltx_border_T { border-top:1px solid gray; }
.ltx_border_R { border-right:1px solid gray; }
.ltx_border_B { border-bottom:1px solid gray; }
.ltx_border_L { border-left:1px solid gray; }
/* Framing */
.ltx_framed_rectangle { border-style:solid; border-width:1px; }
.ltx_framed_top { border-top-style:solid; border-top-width:1px; }
.ltx_framed_left { border-left-style:solid; border-left-width:1px; }
.ltx_framed_right { border-right-style:solid; border-right-width:1px; }
.ltx_framed_bottom,
.ltx_framed_underline { border-bottom-style:solid; border-bottom-width:1px; }
.ltx_framed_topbottom { border-top-style:solid; border-top-width:1px;
border-bottom-style:solid; border-bottom-width:1px; }
.ltx_framed_leftright { border-left-style:solid; border-left-width:1px;
border-right-style:solid; border-right-width:1px; }
/*======================================================================
Misc */
/* .ltx_verbatim*/
.ltx_verbatim { text-align:left; }
/*======================================================================
Meta stuff, footnotes */
.ltx_note_content { display:none; }
/*right:5%; */
.ltx_note_content {
max-width: 70%; font-size:90%; left:15%;
text-align:left;
background-color: white;
padding: 0.5em 1em 0.5em 1.5em;
border: 1px solid black; border-radius: 0 5px 5px 5px; box-shadow: 5px 5px 10px gray; }
.ltx_note_mark { color:blue; }
.ltx_note_type { font-weight: bold; }
.ltx_note { display:inline-block; text-indent:0; } /* So we establish containing block */
.ltx_note_content .ltx_note_mark { position:absolute; left:0.2em; top:-0.1em; }
.ltx_note:hover .ltx_note_content,
.ltx_note .ltx_note_content:hover {
display:block; position:absolute; z-index:10; }
.ltx_ERROR { color:red; }
.ltx_rdf { display:none; }
.ltx_missing { color:red;}
.ltx_nounicode { color:red; }
/*======================================================================
SVG (pgf/tikz ?) basics */
/* Stuff appearing in svg:foreignObject */
.ltx_svg_fog foreignObject { margin:0; padding:0; overflow:visible; }
.ltx_svg_fog foreignObject > p { margin:0; padding:0; display:block; }
/*.ltx_svg_fog foreignObject > p { margin:0; padding:0; display:block; white-space:nowrap; }*/
/*======================================================================
Low-level Basics */
/* Note that LaTeX(ML)'s font model doesn't map quite exactly to CSS's */
/* Font Families => font-family */
.ltx_font_serif { font-family: serif; }
.ltx_font_sansserif { font-family: sans-serif; }
.ltx_font_typewriter { font-family: monospace; }
/* dingbats should be converted to unicode? */
/* Math font families handled within math: script, symbol, fraktur, blackboard ? */
/* Font Series => font-weight */
.ltx_font_bold { font-weight: bold; }
.ltx_font_medium { font-weight: normal; }
/* Font Shapes => font-style or font-variant */
.ltx_font_italic { font-style: italic; font-variant:normal; }
.ltx_font_upright { font-style: normal; font-variant:normal; }
.ltx_font_slanted { font-style: oblique; font-variant:normal; }
.ltx_font_smallcaps { font-variant: small-caps; font-style:normal; }
.ltx_font_oldstyle { font-variant: oldstyle-nums; /* experimental css3 ? Doesn't seem to work!*/
font-style:normal;
-moz-font-feature-settings: "onum";
-ms-font-feature-settings: "onum";
-webkit-font-feature-settings: "onum";
font-variant-numeric: oldstyle-nums; }
.ltx_font_mathcaligraphic { font-family: "Lucida Calligraphy", "Zapf Chancery","URW Chancery L"; }
/*
.ltx_font_mathscript { ? }
*/
cite { font-style: normal; }
.ltx_red { color:red; }
/*.ltx_centering { text-align:center; margin:auto; }*/
/*.ltx_inline-block.ltx_centering,*/
/* Hmm.... is this right in general? */
.ltx_centering { display:block; margin:auto; text-align:center; }
/* Dubious stuff */
.ltx_hflipped {
display:inline-block;
-moz-transform: scaleX(-1);
-o-transform: scaleX(-1);
-webkit-transform: scaleX(-1);
transform: scaleX(-1);
filter: FlipH;
-ms-filter: "FlipH"; }
.ltx_vflipped {
display:inline-block;
-moz-transform: scaleY(-1);
-o-transform: scaleY(-1);
-webkit-transform: scaleY(-1);
transform: scaleY(-1);
filter: FlipV;
-ms-filter: "FlipV"; }
/* .ltx_phantom handled in xslt */


@@ -0,0 +1,24 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
div.references { display: table }
div.references div { display: table-row}
span.refName { float:left; font-weight:bold}
span.refBody { display:table-cell; padding-left: 1%;}

File diff suppressed because it is too large


@@ -40,6 +40,7 @@ This document highlights some key features to get you started.
</ul>
</li>
<li><a href='#Bags'>Bags</a></li>
<li><a href="bloomFilters.html">Bloom filters</a></li>
</ul>
<subsection name='Note On Synchronization'>
<p>