Commit Graph

3337 Commits

Author SHA1 Message Date
Alex Herbert bbee9fbd9b Update Hasher.Builder.
Add default methods to add a CharSequence.

Make it clear each object added to the Builder should represent an
entire item.

Document that build() should reset the builder for future use.
2020-03-18 10:49:15 +00:00
Alex Herbert a34da7bcf5 DefaultBloomFilterMethodsTest: Correct javadoc for internal test class 2020-03-18 09:24:55 +00:00
Alex Herbert f157196e00 Merge branch 'dota17-fixtypoForBloomfilterTest' 2020-03-18 09:22:49 +00:00
dota17 00408690a2 Fix typo for BloomfilterTest 2020-03-18 14:49:08 +08:00
Alex Herbert 2cbac58f7e Remove empty line. 2020-03-17 12:57:57 +00:00
Alex Herbert 70947b1767 Add link to Hasher in the HashFunction javadoc header 2020-03-17 12:43:20 +00:00
Alex Herbert ac2c7f2206 Improve documentation of Hasher. 2020-03-17 12:41:38 +00:00
Alex Herbert 0feeab0820 Change Hasher.getBits() to iterator() 2020-03-17 12:27:43 +00:00
Alex Herbert 976d645835 Remove Hasher isEmpty() 2020-03-17 12:16:09 +00:00
Alex Herbert f00daff8c8 Fix typo in Shape.checkNumberOfBits 2020-03-17 07:39:20 +00:00
Alex Herbert d6eeceb018 Optimise ObjectsHashIterative hash function.
Avoid using Arrays.deepHashCode. The array passed to deepHashCode is
always length 2. So we can unroll the same computation for the fixed 2
iterations.
2020-03-17 00:59:00 +00:00
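As a rough illustration of the unrolling described in d6eeceb018: `Arrays.deepHashCode` starts from 1 and applies `result = 31 * result + elementHash` once per element, so a fixed two-element array reduces to a single expression. The class name, method names, and the choice of a `long` plus a `byte[]` as the two elements are assumptions made for this sketch; the log does not show the actual fields.

```java
import java.util.Arrays;

/** Sketch only: unrolling Arrays.deepHashCode for a fixed two-element array. */
final class UnrolledDeepHash {

    /** Reference form: allocates an Object[] and boxes the long on every call. */
    static int viaDeepHashCode(long seed, byte[] buffer) {
        return Arrays.deepHashCode(new Object[] {seed, buffer});
    }

    /** Unrolled form: the same arithmetic without the temporary array or boxing. */
    static int unrolled(long seed, byte[] buffer) {
        // Iteration 1: 31 * 1 + hash(seed); iteration 2: 31 * previous + hash(buffer).
        return 31 * (31 + Long.hashCode(seed)) + Arrays.hashCode(buffer);
    }
}
```

Both methods should return the same value for any input, including a null buffer, since `Arrays.hashCode((byte[]) null)` is 0 just as `deepHashCode` treats a null element as 0.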
aherbert a699c8b9ba Update Hasher javadoc.
Remove trailing periods from params and returns.

Remove the specification in the Hasher.Builder to convert the String to
bytes using the UTF-8 charset. This is an implementation detail. It has
been moved to the DynamicHasher implementation.

Update exception message for getBits to be less specific. The reference
to getName() is now obsolete.
2020-03-16 17:14:28 +00:00
Alex Herbert 142d53a6a5 Remove trailing whitespace 2020-03-15 23:43:12 +00:00
Alex Herbert 7b15598da0 Update javadoc for ArrayCountingBloomFilter.
Document that no exception is raised when the filter state transitions
to invalid.
2020-03-15 23:26:40 +00:00
Alex Herbert 9de28a7b62 Updated the BloomFilter javadoc.
Remove trailing periods on parameters and arguments.

Remove reference to LongBuffer. Clarify what the long[] represents in
'long[] getBits()'.

Clarify cardinality using (number of enabled bits).

Rearrange BloomFilter interface methods to functional order. The order
is:

- Query operations
- Modification operations
- Counting operations

Improve javadoc for BloomFilter contains with additional information for
what 'contains' means.

Update exception message for contains/merge/add/subtract to be
consistent.
2020-03-15 23:17:43 +00:00
Alex Herbert 86bac5e602 Change BloomFilter merge return type from void to boolean.
This is to support the extension to a counting Bloom filter which can
return true/false if the state is valid.

Drops redundant abstract methods from the AbstractBloomFilter that are
overrides of the BloomFilter interface.
2020-03-15 21:36:21 +00:00
Alex Herbert 22d161a25b Delete MapCountingBloomFilter.
This is obsolete given the ArrayCountingBloomFilter.
2020-03-14 14:25:28 +00:00
Alex Herbert fe88827643 Move the unique filtering of the Hasher indexes to a separate class. 2020-03-14 14:22:09 +00:00
Alex Herbert fb358a5c80 Added CountingBloomFilter interface and ArrayCountingBloomFilter. 2020-03-14 14:22:09 +00:00
Alex Herbert 9f4953f4cb Rename CountingBloomFilter to MapCountingBloomFilter 2020-03-14 14:22:09 +00:00
Alex Herbert 90f705e732 Change log to ln in Shape javadoc 2020-03-14 08:12:48 +00:00
Alex Herbert e3484deb51 Fix ShapeTest typos 2020-03-14 07:59:12 +00:00
Alex Herbert a1dd122342 Consolidate @throws clauses for Shape 2020-03-14 07:35:39 +00:00
Alex Herbert 34a5a6f0c5 Change minimum number of bits from 8 to 1 2020-03-14 07:27:30 +00:00
aherbert 32a730d964 Remove Shape getNumberOfBytes
This method only applies to a Bloom filter using an uncompressed byte
representation. It is trivially derived from the number of bits.
2020-03-13 15:13:00 +00:00
aherbert 7b22b4ddc6 Update javadoc for Shape.
Update documented exception conditions.

Update javadoc for the shape properties to drop AKA abbreviation.

Change Probability of collision to Probability of False positives.

Update the getProbability method to document it applies to a filter full
to the intended capacity.
2020-03-13 15:10:16 +00:00
Alex Herbert 3a981a01b7 Update BloomFilterIndex comments and add tests for negative index. 2020-03-12 23:39:51 +00:00
aherbert 391d91e353 Improved documentation of Murmur3 hash functions.
Added references to Commons Codec and SMHasher.
2020-03-12 17:03:02 +00:00
aherbert 8fb518e6a1 Standardise computation of signatures. 2020-03-12 16:42:32 +00:00
aherbert 33d6ddc7f9 Correct javadoc of the hash function signature. 2020-03-12 15:17:39 +00:00
aherbert 9f2271334d Update the hash function tests to use a base class.
The base class performs the standard signature test that all hash
functions should pass.
2020-03-12 15:13:54 +00:00
aherbert a51c96520a Remove javadocs in overridden methods that are duplicates.
An exact copy of the javadoc is redundant. It also means updates to the
parent get lost by those inheriting. It is better to use {@inheritDoc}
and add extra information.
2020-03-12 14:22:18 +00:00
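The pattern recommended in a51c96520a can be sketched as follows. The `Filter` and `EmptyFilter` types are hypothetical and exist only to show the javadoc layout: the override inherits the parent text and adds only what is specific to the implementation.

```java
/** Hypothetical parent type carrying the authoritative documentation. */
interface Filter {
    /**
     * Gets the number of enabled bits.
     *
     * @return the cardinality
     */
    int cardinality();
}

/** Hypothetical implementation that extends the inherited documentation. */
final class EmptyFilter implements Filter {
    /**
     * {@inheritDoc}
     *
     * <p>This implementation always returns zero.</p>
     */
    @Override
    public int cardinality() {
        return 0;
    }
}
```

With this layout, a later edit to the parent description shows up in the generated documentation of the override without touching the subclass.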
aherbert 2a9bdc0098 Improve comment in BloomFilterIndexer. 2020-03-12 13:59:19 +00:00
aherbert eda601dd04 Update package info for Bloom filter sub-packages. 2020-03-12 13:55:28 +00:00
Alex Herbert cb967680c3 Standardise the Bloom filter shape equations.
Equations match those in:

https://hur.st/bloomfilter/

Fixed documentation of the approximate value of the denominator. Compute
using a re-arrangement.
2020-03-10 06:46:27 +00:00
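For reference, the equations published at the linked page can be sketched in Java as below, with n the number of items, m the number of bits, k the number of hash functions and p the false-positive probability. The class and method names are assumptions for this sketch, and the log does not say which re-arrangement the commit uses; the identity ln(1 / 2^ln(2)) = -(ln 2)^2 used here is one possible way to evaluate the denominator.

```java
/** Sketch only: Bloom filter shape equations in the form given at hur.st/bloomfilter. */
final class ShapeEquations {

    /** ln(1 / 2^ln(2)), which equals -(ln 2)^2 (roughly -0.4805). */
    static final double DENOMINATOR = -Math.log(2) * Math.log(2);

    /** m = ceil(n * ln(p) / ln(1 / 2^ln(2))) */
    static long numberOfBits(long n, double p) {
        return (long) Math.ceil(n * Math.log(p) / DENOMINATOR);
    }

    /** k = round((m / n) * ln(2)) */
    static long numberOfHashFunctions(long m, long n) {
        return Math.round((double) m / n * Math.log(2));
    }

    /** p = (1 - exp(-k / (m / n)))^k */
    static double falsePositiveProbability(long m, long n, long k) {
        return Math.pow(1 - Math.exp(-(double) k * n / m), k);
    }
}
```

The sketch performs no argument validation; inputs are assumed to be positive, with p strictly between 0 and 1.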
Alex Herbert 03543e5f9b Ensure hashCode hashes the same properties as the equality.
Since HashFunctionIdentity is an interface there is no control over what
is hashed. Add a hash function to the HashFunctionValidator to ensure
the hash code is the same if two hash functions are equal according to
the hashFunctionIdentity.

Note: Since Shape is final we use the properties directly and not through the get methods.
2020-03-10 01:11:52 +00:00
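The contract described in 03543e5f9b (equal identities must produce equal hash codes) can be illustrated by deriving both operations from the same properties. Everything in this sketch is hypothetical: the `Identity` interface, its two properties, and the helper names stand in for types the log does not show.

```java
import java.util.Objects;

/** Sketch only: equality and hash code computed from the same properties. */
final class IdentityEqualitySketch {

    /** Hypothetical identity; the properties of the real interface are not listed in the log. */
    interface Identity {
        String name();
        long signature();
    }

    /** Convenience factory for the sketch. */
    static Identity of(String name, long signature) {
        return new Identity() {
            @Override public String name() { return name; }
            @Override public long signature() { return signature; }
        };
    }

    /** Equality over exactly the properties that hash(...) uses. */
    static boolean areEqual(Identity a, Identity b) {
        return a.signature() == b.signature() && a.name().equals(b.name());
    }

    /** Hash over exactly the properties that areEqual(...) compares. */
    static int hash(Identity a) {
        return Objects.hash(a.signature(), a.name());
    }
}
```

Because the implementing class is outside the validator's control, callers use `hash(identity)` rather than `identity.hashCode()` wherever the hash must agree with `areEqual`.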
Alex Herbert 0964d5bf19 Standardise Shape constructor validations.
Standardise the constructor assertions to functions.

Ensure Shape catches NaN probability in the constructor.

Previously NaN would result in a NaN computation for the number of bits.
When cast to int it would be zero. This change improves the error
message in the exception.

Clean-up javadocs.

Ensure Shape is final. If not final then the rest of the Bloom filter
API cannot assume that a Shape is valid as it may be extended and the
computations changed.
2020-03-10 00:29:04 +00:00
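The NaN behaviour noted in 0964d5bf19 follows from the Java language rules: a narrowing cast of NaN to `int` yields 0, and every ordered comparison with NaN is false. A minimal sketch of a check that rejects NaN is shown below; the method name, the exclusive (0, 1) range and the message text are assumptions, not the commit's code.

```java
/** Sketch only: a probability check that also rejects NaN. */
final class ProbabilityCheck {

    /** Shows the silent failure mode: the cast turns NaN into 0. */
    static int castNaN() {
        return (int) Double.NaN;
    }

    /** Throws if the probability is NaN or outside the assumed range (0, 1). */
    static void checkProbability(double probability) {
        // Negating the accepted range means NaN, for which both comparisons
        // are false, fails the check instead of slipping through.
        if (!(probability > 0.0 && probability < 1.0)) {
            throw new IllegalArgumentException(
                "Probability must be greater than 0 and less than 1: " + probability);
        }
    }
}
```

Writing the test as `probability <= 0.0 || probability >= 1.0` would let NaN pass, which is the kind of gap the commit message describes.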
Alex Herbert cb88c4ed01 Achieve 100% test coverage for BitSetBloomFilter.
This is done by duplicating the and/or/xor cardinality tests and merge
tests in the AbstractBloomFilterTest using the current filter type
(provided via abstract methods) and a generic BloomFilter
implementation.
2020-03-09 22:49:47 +00:00
Alex Herbert 90ed5343bb Remove toString() method from BitSetBloomFilter.
If the BloomFilter is large and reaching capacity the toString() method
may overflow the length of a char[] array.
2020-03-09 22:19:21 +00:00
Alex Herbert c18cd7b86e Increase HasherBloomFilter test coverage.
Coverage cannot reach 100% because assert statements have been included
that test assumptions. These asserts are unreachable if the StaticHasher
functions as expected and returns an iterator with at least 1 value when
it reports a non-zero size.
2020-03-09 22:07:09 +00:00
Alex Herbert 9831773447 Update travis to run japicmp in the main script and fix coveralls.
Anything that fails in after_success is ignored in travis reporting. The
checks must be done in the main script.

The japicmp was failing due to the lack of a jar to compare and so
coveralls was then not submitting. No coverage reports have been logged
by coveralls since June 2017.
2020-03-04 09:09:46 +00:00
Alex Herbert 8b4ecbc084 Compute the bit index into a Bloom filter using bit shifts.
Removes the use of Math.floorMod and integer division.

Operations have been put into a class for reuse among filters.
2020-02-25 00:30:43 +00:00
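One way to express the replacement described in 8b4ecbc084, assuming the filter stores its bits in a `long[]` with 64 bits per element. The class and method names are hypothetical; the division and `Math.floorMod` variants are included only for comparison.

```java
/** Sketch only: locating a bit in a long[] using shifts. */
final class BitIndexSketch {

    /** Index of the long holding the bit: equivalent to bitIndex / 64 for bitIndex >= 0. */
    static int longIndex(int bitIndex) {
        return bitIndex >> 6;
    }

    /** Mask selecting the bit within its long. */
    static long bitMask(int bitIndex) {
        // For a long operand Java uses only the low 6 bits of the shift distance,
        // so this is equivalent to 1L << Math.floorMod(bitIndex, 64).
        return 1L << bitIndex;
    }

    /** Comparison form using integer division. */
    static int longIndexByDivision(int bitIndex) {
        return bitIndex / 64;
    }

    /** Comparison form using Math.floorMod. */
    static long bitMaskByFloorMod(int bitIndex) {
        return 1L << Math.floorMod(bitIndex, 64);
    }
}
```

For negative input `bitIndex >> 6` and `bitIndex / 64` differ (the shift rounds towards negative infinity), so the equivalence of the index computation holds for non-negative bit indexes only.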
Alex Herbert 24ad759b31 Document the HashFunctionIdentity Signedness 2020-02-24 23:27:55 +00:00
Alex Herbert 9a496dc61c Update the AbstractBloomFilter to not use BitSet for cardinality.
The cardinality can be performed without memory allocation using
Long.bitCount.
2020-02-24 23:01:39 +00:00
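The allocation-free cardinality mentioned in 9a496dc61c amounts to summing `Long.bitCount` over the backing array. This is a sketch with a hypothetical class name, assuming the filter state is available as a `long[]`.

```java
/** Sketch only: cardinality of a bit vector stored as long[]. */
final class CardinalitySketch {

    /** Counts the enabled bits without creating a BitSet. */
    static int cardinality(long[] bits) {
        int count = 0;
        for (long word : bits) {
            count += Long.bitCount(word);
        }
        return count;
    }
}
```

The result should match `BitSet.valueOf(bits).cardinality()`, which copies the array into a new object before counting.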
Alex Herbert e08f0be55b Revert CountingBloomFilter to ignore counts from another filter.
Partially removes changes made in commit:
6ad69bedd3

The class requires a revision to handle add/subtract of another
CountingBloomFilter. Restore the tests to check that Counting filters
are merged as if another non-counting filter type.

Remove the javadoc from the CountingBloomFilter methods that state it
uses the counts when merge/remove are called with a CountingBloomFilter
as this is not what the functionality currently performs.

Fix the merge with a hasher to only increment the count by 1 even if the
hasher contains duplicates. Add test to verify this works as documented.
This matches the remove functionality which removes duplicates before
subtraction.
2020-02-24 22:42:20 +00:00
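The duplicate handling described in the last paragraph of e08f0be55b can be sketched as follows: each distinct index produced by the hasher increments its counter once, however many times it occurs. The class name, the `int[]` counter array and the use of a `BitSet` to track seen indexes are assumptions for illustration, not the filter's actual implementation.

```java
import java.util.BitSet;

/** Sketch only: merging hasher indexes into counters, counting duplicates once. */
final class CountingMergeSketch {

    /** Increments counts[i] by 1 for each distinct index i in hasherIndexes. */
    static void merge(int[] counts, int[] hasherIndexes) {
        BitSet seen = new BitSet(counts.length);
        for (int index : hasherIndexes) {
            if (!seen.get(index)) {
                seen.set(index);
                counts[index]++;
            }
        }
    }
}
```

A matching remove operation would filter duplicates the same way before decrementing, so that a merge followed by a remove of the same hasher restores the original counts.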
Alex Herbert ad04fff90d Set scope of method comments to protected.
This was the original level before update of checkstyle config.
2020-02-24 21:22:06 +00:00
aherbert 3aa817726b Add indentation check to checkstyle.
Fixed all code. Case statements have an indent of zero as recommended by
Oracle. Previously the code was using either 0 or 4 as the indent. Using
zero for the check has fewer violations that require fixing.
2020-02-24 21:22:06 +00:00
Alex Herbert 9bc4d0bc61 Fixed checkstyle in tests.
Changed rules to be more lenient on tests.
2020-02-18 23:25:52 +00:00
Alex Herbert de46979712 Add checkstyle:check to defaultGoal 2020-02-18 23:07:37 +00:00
Alex Herbert 4797acefba Fixed checkstyle.
Add missing newlines at end of files.

Remove redundant modifiers.

Fix incorrect Apache license header.

Fix whitespace around elements.

Remove tab characters.

Fix right-curly location.

Correct modifier order.
2020-02-18 23:07:19 +00:00