3363 Commits

Author SHA1 Message Date
aherbert
32a730d964 Remove Shape getNumberOfBytes
This method only applies to a Bloom filter using an uncompressed byte
representation. It is trivially derived from the number of bits.
2020-03-13 15:13:00 +00:00
aherbert
7b22b4ddc6 Update javadoc for Shape.
Update documented exception conditions.

Update javadoc for the shape properties to drop AKA abbreviation.

Change Probability of collision to Probability of False positives.

Update the getProbability method to document it applies to a filter full
to the intended capacity.
2020-03-13 15:10:16 +00:00
Alex Herbert
3a981a01b7 Update BloomFilterIndex comments and added tests for negative index. 2020-03-12 23:39:51 +00:00
aherbert
391d91e353 Improved documentation of Murmur3 hash functions.
Added references to Commons Codec and SMHasher.
2020-03-12 17:03:02 +00:00
aherbert
8fb518e6a1 Standardise computation of signatures. 2020-03-12 16:42:32 +00:00
aherbert
33d6ddc7f9 Correct javadoc of the hash function signature. 2020-03-12 15:17:39 +00:00
aherbert
9f2271334d Update the hash function tests to use a base class.
The base class performs the standard signature test that all hash
functions should pass.
2020-03-12 15:13:54 +00:00
aherbert
a51c96520a Remove javadocs in overridden methods that are duplicates.
An exact copy of the javadoc is redundant. It also means updates to the
parent get lost by those inheriting. It is better to use {@inheritDoc}
and add extra information.
2020-03-12 14:22:18 +00:00
aherbert
2a9bdc0098 Improve comment in BloomFilterIndexer. 2020-03-12 13:59:19 +00:00
aherbert
eda601dd04 Update package info for Bloom filter sub-packages. 2020-03-12 13:55:28 +00:00
Alex Herbert
cb967680c3 Standardise the Bloom filter shape equations.
Equations match those in:

https://hur.st/bloomfilter/

Fixed documentation of the approximate value of the denominator. Compute
using a re-arrangement.
2020-03-10 06:46:27 +00:00
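As an illustration of the equations this commit refers to, here is a minimal Java sketch of the formulas published at https://hur.st/bloomfilter/, where n is the number of items, p the false-positive probability, m the number of bits and k the number of hash functions. The class and method names are hypothetical and are not the Shape API. The denominator ln(1 / 2^ln 2) is written here as -(ln 2)^2, which is one possible re-arrangement; the commit does not state which one was used.

```java
/** Hypothetical sketch of the Bloom filter shape equations; not the library's Shape class. */
public final class ShapeEquationsSketch {
    /** ln(1 / 2^ln(2)) re-arranged as -(ln 2)^2, approximately -0.4805. */
    private static final double DENOMINATOR = -Math.log(2) * Math.log(2);

    private ShapeEquationsSketch() {}

    /** m = ceil(n * ln(p) / ln(1 / 2^ln(2))). */
    static int numberOfBits(int n, double p) {
        return (int) Math.ceil(n * Math.log(p) / DENOMINATOR);
    }

    /** k = round((m / n) * ln(2)). */
    static int numberOfHashFunctions(int n, int m) {
        return (int) Math.round((double) m / n * Math.log(2));
    }

    /** p = (1 - exp(-k * n / m))^k, for a filter filled with n items. */
    static double probability(int n, int m, int k) {
        return Math.pow(1.0 - Math.exp(-1.0 * k * n / m), k);
    }

    public static void main(String[] args) {
        int n = 4000;
        int m = numberOfBits(n, 1e-7);
        int k = numberOfHashFunctions(n, m);
        System.out.println("m=" + m + " k=" + k + " p=" + probability(n, m, k));
    }
}
```

No range checking is done in this sketch; a real implementation would need to guard against int overflow and invalid arguments.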
Alex Herbert
03543e5f9b Ensure hashCode hashes the same properties as the equality.
Since HashFunctionIdentity is an interface there is no control over what
is hashed. Add a hash function to the HashFunctionValidator to ensure
the hash code is the same if two hash functions are equal according to
the hashFunctionIdentity.

Note: Since Shape is final we use the properties directly and not through the get methods.
2020-03-10 01:11:52 +00:00
Alex Herbert
0964d5bf19 Standardise Shape constructor validations.
Standardise the constructor assertions to functions.

Ensure Shape catches NaN probability in the constructor.

Previously NaN would result in a NaN computation for the number of bits.
When cast to int it would be zero. This change improves the error
message in the exception.

Clean-up javadocs.

Ensure Shape is final. If not final then the rest of the Bloom filter
API cannot assume that a Shape is valid as it may be extended and the
computations changed.
2020-03-10 00:29:04 +00:00
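The NaN behaviour described in this commit follows from Java's narrowing conversion rules: casting NaN to int yields zero. The sketch below shows that cast together with one way a constructor check could reject NaN up front. The class name, method names, the accepted range (0, 1) and the message text are assumptions for illustration, not the actual Shape code.

```java
/** Hypothetical sketch of a probability check that also rejects NaN. */
public final class ProbabilityCheckSketch {
    private ProbabilityCheckSketch() {}

    /** Narrowing cast: NaN becomes 0, which hides the real cause of the error. */
    static int castToInt(double value) {
        return (int) value;
    }

    /**
     * Rejects values outside the open interval (0, 1).
     * The negated form also rejects NaN, because every comparison with NaN is false.
     */
    static void checkProbability(double probability) {
        if (!(probability > 0.0 && probability < 1.0)) {
            throw new IllegalArgumentException(
                "Probability must be greater than 0 and less than 1: " + probability);
        }
    }

    public static void main(String[] args) {
        System.out.println(castToInt(Double.NaN));
        checkProbability(0.001);
    }
}
```

Writing the condition as `probability <= 0.0 || probability >= 1.0` would let NaN through, which is why the negated form is used here.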
Alex Herbert
cb88c4ed01 Achieve 100% test coverage for BitSetBloomFilter.
This is done by duplicating the and/or/xor cardinality tests and merge
tests in the AbstractBloomFilterTest using the current filter type
(provided via abstract methods) and a generic BloomFilter
implementation.
2020-03-09 22:49:47 +00:00
Alex Herbert
90ed5343bb Remove toString() method from BitSetBloomFilter.
If the BloomFilter is large and reaching capacity the toString() method
may overflow the length of a char[] array.
2020-03-09 22:19:21 +00:00
Alex Herbert
c18cd7b86e Increase HasherBloomFilter test coverage.
Coverage cannot reach 100% because assert statements have been included
that test assumptions. These asserts are unreachable if the StaticHasher
functions as expected and returns an iterator with at least 1 value when
it reports a non-zero size.
2020-03-09 22:07:09 +00:00
Alex Herbert
9831773447 Update travis to run japicmp in the main script and fix coveralls.
Anything that fails in after_success is ignored in travis reporting. The
checks must be done in the main script.

The japicmp was failing due to the lack of a jar to compare and so
coveralls was then not submitting. No coverage reports have been logged
by coveralls since June 2017.
2020-03-04 09:09:46 +00:00
Alex Herbert
8b4ecbc084 Compute the bit index into a Bloom filter using bit shifts.
Removes the use of Math.floorMod and integer division.

Operations have been put into a class for reuse among filters.
2020-02-25 00:30:43 +00:00
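For readers unfamiliar with the technique, a minimal sketch of bit-shift indexing into a long[] follows. It assumes the filter stores its bits in 64-bit words; the class and method names are hypothetical and do not reproduce the class added by this commit.

```java
/** Hypothetical sketch of locating a bit in a long[] using shifts only. */
public final class BitIndexSketch {
    private BitIndexSketch() {}

    /** Index of the 64-bit word holding the bit; equivalent to floor division by 64. */
    static int longIndex(int bitIndex) {
        return bitIndex >> 6;
    }

    /**
     * Mask selecting the bit within its word. Java uses only the lowest 6 bits
     * of the shift distance for a long, so no explicit modulus is needed.
     */
    static long bitMask(int bitIndex) {
        return 1L << bitIndex;
    }

    public static void main(String[] args) {
        long[] bits = new long[4];
        int bitIndex = 130;
        bits[longIndex(bitIndex)] |= bitMask(bitIndex);
        System.out.println(Long.toBinaryString(bits[2]));
    }
}
```

The arithmetic shift and the masked shift distance agree with Math.floorDiv and Math.floorMod by 64, but a negative bit index still produces a negative array index, so callers are expected to pass non-negative values.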
Alex Herbert
24ad759b31 Document the HashFunctionIdentity Signedness 2020-02-24 23:27:55 +00:00
Alex Herbert
9a496dc61c Update the AbstractBloomFilter to not use BitSet for cardinality.
The cardinality can be performed without memory allocation using
Long.bitCount.
2020-02-24 23:01:39 +00:00
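The allocation-free cardinality mentioned in this commit can be sketched as a loop over the filter's 64-bit words. This assumes the bits are available as a long[]; the class and method names are hypothetical.

```java
/** Hypothetical sketch of counting set bits without building a BitSet. */
public final class CardinalitySketch {
    private CardinalitySketch() {}

    /** Sums the population count of each 64-bit word. */
    static int cardinality(long[] bits) {
        int count = 0;
        for (long word : bits) {
            count += Long.bitCount(word);
        }
        return count;
    }

    /** Cardinality of the bitwise AND of two arrays, over their common length. */
    static int andCardinality(long[] a, long[] b) {
        int count = 0;
        int limit = Math.min(a.length, b.length);
        for (int i = 0; i < limit; i++) {
            count += Long.bitCount(a[i] & b[i]);
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(cardinality(new long[] {0b1011L, -1L, 0L}));
    }
}
```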
Alex Herbert
e08f0be55b Revert CountingBloomFilter to ignore counts from another filter.
Partially removes changes made in commit:
6ad69bedd3436a75883e66bd130c77df884be98b

The class requires a revision to handle add/subtract of another
CountingBloomFilter. Restore the tests to check that Counting filters
are merged as if they were another non-counting filter type.

Remove the javadoc from the CountingBloomFilter methods stating that the
counts are used when merge/remove are called with a CountingBloomFilter,
as this is not what the functionality currently performs.

Fix the merge with a hasher to only increment the count by 1 even if the
hasher contains duplicates. Add test to verify this works as documented.
This matches the remove functionality which removes duplicates before
subtraction.
2020-02-24 22:42:20 +00:00
Alex Herbert
ad04fff90d Set scope of method comments to protected.
This was the original level before update of checkstyle config.
2020-02-24 21:22:06 +00:00
aherbert
3aa817726b Add indentation check to checkstyle.
Fixed all code. Case statements have an indent of zero as recommended by
Oracle. Previously the code was using either 0 or 4 as the indent. Using
zero for the check has fewer violations that require fixing.
2020-02-24 21:22:06 +00:00
Alex Herbert
9bc4d0bc61 Fixed checkstyle in tests.
Changed rules to be more lenient on tests.
2020-02-18 23:25:52 +00:00
Alex Herbert
de46979712 Add checkstyle:check to defaultGoal 2020-02-18 23:07:37 +00:00
Alex Herbert
4797acefba Fixed checkstyle.
Add missing newlines at end of files.

Remove redundant modifiers.

Fix incorrect Apache license header.

Fix whitespace around elements.

Remove tab characters.

Fix right-curly location.

Correct modifier order.
2020-02-18 23:07:19 +00:00
Alex Herbert
72f45156d3 Update checkstyle configuration.
Configuration has been added based on commons-lang.

The Apache licence header has been added to the checkstyle config.
2020-02-18 21:38:53 +00:00
aherbert
6ad69bedd3 Increase coverage in CountingBloomFilter test.
The counting functionality appears to be broken. Annotations have been
added to the code at locations that are incorrect.

Tests that currently fail have been updated and disabled to allow the
build to pass.
2020-02-18 16:34:57 +00:00
aherbert
0f78a9c9e8 Test edge case in SetOperations when shapes are different. 2020-02-18 13:57:15 +00:00
aherbert
4ecffb5fee Test getProvider() is Apache Commons Collections. 2020-02-18 13:57:15 +00:00
aherbert
6215948227 Hit all edge cases in the Shape.equals method. 2020-02-18 13:57:15 +00:00
aherbert
55cb720ccf Remove HashFunctionIdentity comparators.
The comparators are never used to perform ordering of functions. The
only current use is to determine that two hash functions are
functionally equivalent. A replacement utility class has been added to
test for equality.
2020-02-18 13:35:58 +00:00
aherbert
66b418f3e9 Update DynamicHasher to have a specialised iterator when empty.
Add tests to exercise the empty and non-empty iterators and their
expected exceptions when no more elements.
2020-02-18 12:39:11 +00:00
aherbert
39f0955920 Removed spurious javadoc tag. 2020-02-17 15:22:13 +00:00
aherbert
d31ebdd0e4 Javadoc clean-up. 2020-02-17 14:10:10 +00:00
aherbert
7aaf396c83 Correct test javadoc headers. 2020-02-17 13:44:14 +00:00
aherbert
1f17189d53 Remove unthrown exception. 2020-02-17 13:34:56 +00:00
aherbert
4033ff63c5 Test code clean-up.
Make methods static if possible.

Correct formatting of braces.
2020-02-17 13:34:14 +00:00
aherbert
373a241752 Removed invalid javadoc. 2020-02-17 13:30:40 +00:00
aherbert
fa028268a8 Remove unthrown exception from test setup(). 2020-02-17 13:26:16 +00:00
aherbert
b377f59613 Remove extra lines. 2020-02-17 13:24:03 +00:00
aherbert
5f70948570 Remove whitespace around parentheses. 2020-02-17 13:17:11 +00:00
aherbert
82273e966e Added orCardinality to BitSetBloomFilter. 2020-02-17 13:06:52 +00:00
aherbert
2a0e867744 Remove javadoc from override method.
The javadoc incorrectly refers to BitSetBloomFilter as the argument.
2020-02-17 13:00:22 +00:00
aherbert
28b381008e Eliminate extra lines. 2020-02-17 12:56:53 +00:00
aherbert
a3e2ea2443 Remove methods from the javadoc that are not implemented.
These methods have been moved from AbstractBloomFilter to SetOperations.
2020-02-17 11:44:28 +00:00
Gary Gregory
d5bf76870f [COLLECTIONS-748] Let
org.apache.commons.collections4.properties.[Sorted]PropertiesFactory
accept XML input.
2020-02-16 20:42:27 -05:00
Gary Gregory
87497d0fa1 Use the stock JRE Objects.requireNonNull() for parameter validation.
Formatting. Javadoc.
2020-02-16 15:39:57 -05:00
Gary Gregory
a1ce1c2121 Formatting. 2020-02-16 15:22:43 -05:00
Gary Gregory
7d06bd77e9 [COLLECTIONS-747] MultiKey.getKeys class cast exception. 2020-02-16 15:18:27 -05:00