2981 Commits

Author SHA1 Message Date
dota17
4a662289d3 Modified the error in javadoc of BulkTest 2020-03-28 17:57:57 +13:00
Gary Gregory
0a9dccb7bc Format. 2020-03-27 14:08:14 -04:00
Gary Gregory
f9fb07955d Update tests from Apache Commons Lang 3.9 to 3.10. 2020-03-27 14:01:45 -04:00
dota17
092959ddcb Fix typo for the bloomFilter 2020-03-25 19:26:58 +08:00
dota17
e70a21d7cd Import the fail 2020-03-19 14:35:35 +08:00
dota17
514c2eddfc add a testcase for DynamicHasher.NoValuesIterator.nextInt() 2020-03-19 11:33:44 +08:00
dota17
a7973b8d30 Fixed Murmur128x64Cyclic 2020-03-19 11:04:48 +08:00
Alex Herbert
39aef59785 Optimise DynamicHasher iterator. 2020-03-18 11:18:40 +00:00
Alex Herbert
bbee9fbd9b Update Hasher.Builder.
Add default methods to add a CharSequence.

Make it clear each object added to the Builder should represent an
entire item.

Document that build() should reset the builder for future use.
2020-03-18 10:49:15 +00:00
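The commit does not list the new default method signatures. The sketch below shows the pattern it describes under stated assumptions. A default interface method encodes an entire CharSequence as one item and delegates to a byte[] method. build() then resets the builder so it can be reused. ItemBuilder and ListItemBuilder are hypothetical names, not the Hasher.Builder API.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

// Hypothetical builder; not the Commons Collections Hasher.Builder interface.
interface ItemBuilder {
    // Add one complete item as raw bytes.
    ItemBuilder with(byte[] item);

    // Default method: encode the whole CharSequence with the given charset
    // and add it as a single item.
    default ItemBuilder with(CharSequence item, Charset charset) {
        return with(item.toString().getBytes(charset));
    }

    // Return the collected items and reset the builder for future use.
    List<byte[]> build();
}

class ListItemBuilder implements ItemBuilder {
    private List<byte[]> items = new ArrayList<>();

    @Override
    public ItemBuilder with(byte[] item) {
        items.add(item.clone()); // defensive copy of the caller's array
        return this;
    }

    @Override
    public List<byte[]> build() {
        List<byte[]> result = items;
        items = new ArrayList<>(); // reset so the next build() starts empty
        return result;
    }
}
```

With this approach, an implementation only has to handle byte[]. Every CharSequence overload comes for free.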
Alex Herbert
a34da7bcf5 DefaultBloomFilterMethodsTest: Correct javadoc for internal test class 2020-03-18 09:24:55 +00:00
dota17
00408690a2 Fix typo for BloomfilterTest 2020-03-18 14:49:08 +08:00
Alex Herbert
2cbac58f7e Remove empty line. 2020-03-17 12:57:57 +00:00
Alex Herbert
70947b1767 Add link to Hasher in the HashFunction javadoc header 2020-03-17 12:43:20 +00:00
Alex Herbert
ac2c7f2206 Improve documentation of Hasher. 2020-03-17 12:41:38 +00:00
Alex Herbert
0feeab0820 Change Hasher.getBits() to iterator() 2020-03-17 12:27:43 +00:00
Alex Herbert
976d645835 Remove Hasher isEmpty() 2020-03-17 12:16:09 +00:00
Alex Herbert
f00daff8c8 Fix typo in Shape.checkNumberOfBits 2020-03-17 07:39:20 +00:00
Alex Herbert
d6eeceb018 Optimise ObjectsHashIterative hash function.
Avoid using Arrays.deepHashCode. The array passed to deepHashCode is
always length 2. So we can unroll the same computation for the fixed 2
iterations.
2020-03-17 00:59:00 +00:00
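The unrolling works because Arrays.deepHashCode on a two-element array starts from 1 and folds in each element with a multiplier of 31. The commit does not say which two objects were hashed. This sketch assumes an int and a byte[] purely for illustration. For a byte[] element, deepHashCode delegates to Arrays.hashCode(byte[]).

```java
import java.util.Arrays;

// Hypothetical helper; the (int, byte[]) element types are an assumption.
final class UnrolledHash {
    private UnrolledHash() {}

    // Same value as Arrays.deepHashCode(new Object[] {last, buffer}),
    // without allocating the Object[] or boxing the int.
    static int hash(int last, byte[] buffer) {
        int result = 1;
        result = 31 * result + Integer.hashCode(last);  // element 0
        result = 31 * result + Arrays.hashCode(buffer); // element 1 (0 if null)
        return result;
    }
}
```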
aherbert
a699c8b9ba Update Hasher javadoc.
Remove trailing periods from params and returns.

Remove the specification in the Hasher.Builder to convert the String to
bytes using the UTF-8 charset. This is an implementation detail. It has
been moved to the DynamicHasher implementation.

Update exception message for getBits to be less specific. The reference
to getName() is now obsolete.
2020-03-16 17:14:28 +00:00
Alex Herbert
142d53a6a5 Remove trailing whitespace 2020-03-15 23:43:12 +00:00
Alex Herbert
7b15598da0 Update javadoc for ArrayCountingBloomFilter.
Document that no exception is raised when the filter state transitions
to invalid.
2020-03-15 23:26:40 +00:00
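One way to let a counting filter become invalid without throwing is to OR every updated count into a single state field. The sign bit then stays set after any count goes negative. That covers subtracting below zero and incrementing past Integer.MAX_VALUE. This is a hypothetical sketch of the idea, not the ArrayCountingBloomFilter source. It also shows how a merge-style method can report validity as a boolean.

```java
// Hypothetical counting filter; names and layout are assumptions.
final class CountingSketch {
    private final int[] counts;
    // OR of every count written; negative once any count has gone negative.
    private int state;

    CountingSketch(int numberOfBits) {
        counts = new int[numberOfBits];
    }

    // Increment each index; overflow wraps negative and is recorded, not thrown.
    boolean add(int... indexes) {
        for (int i : indexes) {
            state |= ++counts[i];
        }
        return isValid();
    }

    // Decrement each index; going below zero is recorded, not thrown.
    boolean subtract(int... indexes) {
        for (int i : indexes) {
            state |= --counts[i];
        }
        return isValid();
    }

    boolean isValid() {
        return state >= 0;
    }

    int count(int index) {
        return counts[index];
    }
}
```

The invalid state is sticky. Once set, later operations cannot clear it, so callers can check the result only when they need to.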
Alex Herbert
9de28a7b62 Updated the BloomFilter javadoc.
Remove trailing periods on parameters and arguments.

Remove reference to LongBuffer. Clarify what the long[] represents in
'long[] getBits()'.

Clarify cardinality using (number of enabled bits).

Rearrange BloomFilter interface methods to functional order. The order
is:

- Query operations
- Modification operations
- Counting operations

Improve javadoc for BloomFilter contains with additional information for
what 'contains' means.

Update exception message for contains/merge/add/subtract to be
consistent.
2020-03-15 23:17:43 +00:00
Alex Herbert
86bac5e602 Change BloomFilter merge return type from void to boolean.
This is to support the extension to a counting Bloom filter which can
return true/false if the state is valid.

Drops redundant abstract methods from the AbstractBloomFilter that are
overrides of the BloomFilter interface.
2020-03-15 21:36:21 +00:00
Alex Herbert
22d161a25b Delete MapCountingBloomFilter.
This is obsolete given the ArrayCountingBloomFilter.
2020-03-14 14:25:28 +00:00
Alex Herbert
fe88827643 Move the unique filtering of the Hasher indexes to a separate class. 2020-03-14 14:22:09 +00:00
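The commit does not show the new class. As an illustration only, the unique filtering can be done by wrapping the index iterator. The wrapper remembers which indexes it has already returned. UniqueIndexIterator is a hypothetical name. The BitSet bookkeeping assumes non-negative indexes.

```java
import java.util.BitSet;
import java.util.NoSuchElementException;
import java.util.PrimitiveIterator;

// Hypothetical wrapper: yields each index from the source at most once.
final class UniqueIndexIterator implements PrimitiveIterator.OfInt {
    private final PrimitiveIterator.OfInt source;
    private final BitSet seen = new BitSet();
    private int nextIndex;
    private boolean hasNext;

    UniqueIndexIterator(PrimitiveIterator.OfInt source) {
        this.source = source;
        advance();
    }

    // Move to the next index that has not been returned yet.
    private void advance() {
        hasNext = false;
        while (source.hasNext()) {
            int candidate = source.nextInt();
            if (!seen.get(candidate)) {
                seen.set(candidate);
                nextIndex = candidate;
                hasNext = true;
                return;
            }
        }
    }

    @Override
    public boolean hasNext() {
        return hasNext;
    }

    @Override
    public int nextInt() {
        if (!hasNext) {
            throw new NoSuchElementException();
        }
        int result = nextIndex;
        advance();
        return result;
    }
}
```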
Alex Herbert
fb358a5c80 Added CountingBloomFilter interface and ArrayCountingBloomFilter. 2020-03-14 14:22:09 +00:00
Alex Herbert
9f4953f4cb Rename CountingBloomFilter to MapCountingBloomFilter 2020-03-14 14:22:09 +00:00
Alex Herbert
90f705e732 Change log to ln in Shape javadoc 2020-03-14 08:12:48 +00:00
Alex Herbert
e3484deb51 Fix ShapeTest typos 2020-03-14 07:59:12 +00:00
Alex Herbert
a1dd122342 Consolidate @throws clauses for Shape 2020-03-14 07:35:39 +00:00
Alex Herbert
34a5a6f0c5 Change minimum number of bits from 8 to 1 2020-03-14 07:27:30 +00:00
aherbert
32a730d964 Remove Shape getNumberOfBytes
This method only applies to a Bloom filter using an uncompressed byte
representation. It is trivially derived from the number of bits.
2020-03-13 15:13:00 +00:00
aherbert
7b22b4ddc6 Update javadoc for Shape.
Update documented exception conditions.

Update javadoc for the shape properties to drop AKA abbreviation.

Change Probability of collision to Probability of False positives.

Update the getProbability method to document it applies to a filter full
to the intended capacity.
2020-03-13 15:10:16 +00:00
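The probability documented here is the standard estimate for a filter holding n items in m bits with k hash functions: p = (1 - e^(-kn/m))^k. The sketch below evaluates it. FalsePositive is a hypothetical helper, not the Shape API. With n = 1000, m = 9586 and k = 7, the result is close to 0.01.

```java
// Hypothetical helper evaluating p = (1 - exp(-k * n / m))^k.
final class FalsePositive {
    private FalsePositive() {}

    static double probability(int numberOfItems, int numberOfBits, int numberOfHashFunctions) {
        double k = numberOfHashFunctions; // use double to avoid integer division
        return Math.pow(1.0 - Math.exp(-k * numberOfItems / numberOfBits), k);
    }
}
```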
Alex Herbert
3a981a01b7 Update BloomFilterIndex comments and added tests for negative index. 2020-03-12 23:39:51 +00:00
aherbert
391d91e353 Improved documentation of Murmur3 hash functions.
Added references to Commons Codec and SMHasher.
2020-03-12 17:03:02 +00:00
aherbert
8fb518e6a1 Standardise computation of signatures. 2020-03-12 16:42:32 +00:00
aherbert
33d6ddc7f9 Correct javadoc of the hash function signature. 2020-03-12 15:17:39 +00:00
aherbert
9f2271334d Update the hash function tests to use a base class.
The base class performs the standard signature test that all hash
functions should pass.
2020-03-12 15:13:54 +00:00
aherbert
a51c96520a Remove javadocs in overridden methods that are duplicates.
An exact copy of the javadoc is redundant. It also means updates to the
parent get lost by those inheriting. It is better to use {@inheritDoc}
and add extra information.
2020-03-12 14:22:18 +00:00
aherbert
2a9bdc0098 Improve comment in BloomFilterIndexer. 2020-03-12 13:59:19 +00:00
aherbert
eda601dd04 Update package info for Bloom filter sub-packages. 2020-03-12 13:55:28 +00:00
Alex Herbert
cb967680c3 Standardise the Bloom filter shape equations.
Equations match those in:

https://hur.st/bloomfilter/

Fixed documentation of the approximate value of the denominator. Compute
using a re-arrangement.
2020-03-10 06:46:27 +00:00
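For reference, the equations published by the calculator linked above are given below in LaTeX (n items, m bits, k hash functions, false-positive probability p). They are transcribed here as understood, not copied from the Shape source. The last line shows that the denominator in the m equation simplifies to -(ln 2)^2.

```latex
\begin{aligned}
n &= \left\lceil \frac{m}{-k \,/\, \ln\!\left(1 - e^{\ln(p)/k}\right)} \right\rceil \\
p &= \left(1 - e^{-k/(m/n)}\right)^{k} \\
m &= \left\lceil \frac{n \ln p}{\ln\!\left(1/2^{\ln 2}\right)} \right\rceil \\
k &= \operatorname{round}\!\left(\frac{m}{n}\ln 2\right) \\
\ln\!\left(1/2^{\ln 2}\right) &= -(\ln 2)^2 \approx -0.480453
\end{aligned}
```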
Alex Herbert
03543e5f9b Ensure hashCode hashes the same properties as the equality.
Since HashFunctionIdentity is an interface there is no control over what
is hashed. Add a hash function to the HashFunctionValidator to ensure
the hash code is the same if two hash functions are equal according to
the hashFunctionIdentity.

Note: Since Shape is final we use the properties directly and not through the get methods.
2020-03-10 01:11:52 +00:00
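The rule behind this change is the Object.hashCode contract: identities that compare equal must produce the same hash. The simplest way to guarantee that is to derive both from the same normalised properties. The property set and case-insensitive name matching in this sketch are assumptions for illustration. IdentitySketch is hypothetical, not HashFunctionValidator.

```java
import java.util.Locale;
import java.util.Objects;

// Hypothetical identity; the compared properties are assumed for illustration.
final class IdentitySketch {
    enum Signedness { SIGNED, UNSIGNED }
    enum ProcessType { CYCLIC, ITERATIVE }

    final String name;
    final Signedness signedness;
    final ProcessType processType;

    IdentitySketch(String name, Signedness signedness, ProcessType processType) {
        this.name = name;
        this.signedness = signedness;
        this.processType = processType;
    }

    // One normalisation used by both equality and hashing keeps them consistent.
    private static String key(String name) {
        return name.toUpperCase(Locale.ROOT);
    }

    static boolean areEqual(IdentitySketch a, IdentitySketch b) {
        return key(a.name).equals(key(b.name))
            && a.signedness == b.signedness
            && a.processType == b.processType;
    }

    // Hashes exactly the properties that areEqual compares.
    static int hash(IdentitySketch a) {
        return Objects.hash(key(a.name), a.signedness, a.processType);
    }
}
```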
Alex Herbert
0964d5bf19 Standardise Shape constructor validations.
Standardise the constructor assertions to functions.

Ensure Shape catches NaN probability in the constructor.

Previously NaN would result in a NaN computation for the number of bits.
When cast to int it would be zero. This change improves the error
message in the exception.

Clean-up javadocs.

Ensure Shape is final. If not final then the rest of the Bloom filter
API cannot assume that a Shape is valid as it may be extended and the
computations changed.
2020-03-10 00:29:04 +00:00
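NaN slips past a check written as `p <= 0.0 || p >= 1.0` because every comparison with NaN is false. As the commit notes, `(int) Double.NaN` is 0. A negated range test rejects NaN automatically. The open interval (0, 1) and the helper name below are assumptions for illustration.

```java
// Hypothetical validation helper; the accepted range (0, 1) is assumed.
final class ProbabilityCheck {
    private ProbabilityCheck() {}

    static double checkProbability(double p) {
        // Every comparison with NaN is false, so NaN fails the range test
        // and is rejected along with out-of-range values.
        if (!(p > 0.0 && p < 1.0)) {
            throw new IllegalArgumentException(
                "Probability must be greater than 0 and less than 1: " + p);
        }
        return p;
    }
}
```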
Alex Herbert
cb88c4ed01 Achieve 100% test coverage for BitSetBloomFilter.
This is done by duplicating the and/or/xor cardinality tests and merge
tests in the AbstractBloomFilterTest using the current filter type
(provided via abstract methods) and a generic BloomFilter
implementation.
2020-03-09 22:49:47 +00:00
Alex Herbert
90ed5343bb Remove toString() method from BitSetBloomFilter.
If the BloomFilter is large and reaching capacity the toString() method
may overflow the length of a char[] array.
2020-03-09 22:19:21 +00:00
Alex Herbert
c18cd7b86e Increase HasherBloomFilter test coverage.
Coverage cannot reach 100% because assert statements have been included
that test assumptions. These asserts are unreachable if the StaticHasher
functions as expected and returns an iterator with at least 1 value when
it reports a non-zero size.
2020-03-09 22:07:09 +00:00
Alex Herbert
8b4ecbc084 Compute the bit index into a Bloom filter using bit shifts.
Removes the use of Math.floorMod and integer division.

Operations have been put into a class for reuse among filters.
2020-02-25 00:30:43 +00:00
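The shift-based mapping relies on two Java rules. For a non-negative index, `bitIndex >> 6` equals `bitIndex / 64`. A shift of a long uses only the low 6 bits of the distance, so `1L << bitIndex` already selects the bit within the word. A negative index stays negative after `>>`, so array access fails instead of silently wrapping. BitIndex is a hypothetical name for this sketch.

```java
// Hypothetical helper mapping a bit index onto a long[] without division or floorMod.
final class BitIndex {
    private BitIndex() {}

    // Word index: bitIndex / 64 for non-negative input; negative input stays negative.
    static int getLongIndex(int bitIndex) {
        return bitIndex >> 6;
    }

    // Bit mask: the shift distance is taken modulo 64 by the language.
    static long getLongBit(int bitIndex) {
        return 1L << bitIndex;
    }

    static void set(long[] bits, int bitIndex) {
        bits[getLongIndex(bitIndex)] |= getLongBit(bitIndex);
    }

    static boolean get(long[] bits, int bitIndex) {
        return (bits[getLongIndex(bitIndex)] & getLongBit(bitIndex)) != 0;
    }
}
```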
Alex Herbert
24ad759b31 Document the HashFunctionIdentity Signedness 2020-02-24 23:27:55 +00:00
Alex Herbert
9a496dc61c Update the AbstractBloomFilter to not use BitSet for cardinality.
The cardinality can be performed without memory allocation using
Long.bitCount.
2020-02-24 23:01:39 +00:00
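Cardinality can be computed directly from the long[] representation with Long.bitCount, one word at a time, with no temporary BitSet. The same loop extends to the and/or/xor cardinalities by combining words before counting. The sketch below shows the plain and the AND cases. Cardinality is a hypothetical helper name.

```java
// Hypothetical helper counting enabled bits without allocating.
final class Cardinality {
    private Cardinality() {}

    static int cardinality(long[] bits) {
        int count = 0;
        for (long word : bits) {
            count += Long.bitCount(word);
        }
        return count;
    }

    // Words beyond the shorter array AND to zero, so they can be skipped.
    static int andCardinality(long[] a, long[] b) {
        int count = 0;
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            count += Long.bitCount(a[i] & b[i]);
        }
        return count;
    }
}
```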