Adds delimiter support to the local blob store. The existing
directory implementation is preserved and is not compatible with the
delimiter option, as it assumes the use of filesystem directories.
The patch adds delimiter option support to the generic blob store
interface. A live integration test is added to verify that jclouds
correctly lists objects separated by a delimiter.
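For illustration, a minimal sketch of how a caller exercises the new
option (the container name is hypothetical):

    import org.jclouds.blobstore.BlobStore;
    import org.jclouds.blobstore.domain.PageSet;
    import org.jclouds.blobstore.domain.StorageMetadata;
    import org.jclouds.blobstore.options.ListContainerOptions;

    static void listTopLevel(BlobStore blobStore) {
       // Group keys sharing a common prefix up to "/" instead of
       // listing recursively.
       PageSet<? extends StorageMetadata> results = blobStore.list(
             "container", ListContainerOptions.Builder.delimiter("/"));
       for (StorageMetadata entry : results) {
          // Blobs come back as BLOB entries; common prefixes as
          // RELATIVE_PATH entries.
          System.out.println(entry.getType() + " " + entry.getName());
       }
    }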
When the delimiter option is specified, the local blob store should
behave similarly to Swift and return the directory marker blobs
(directory blobs) and the common prefix. This undoes the filtering we
were previously doing.
When listing blobs, Swift returns an array of "subdir" entries for
every common prefix. The openstack-swift provider needs to process
such entries and add them to the result set. The subdir array is an
array of strings, so additional metadata needs to be added to the
objects (bogus ETag, bogus LastModifiedDate, and so on).
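As a rough sketch of that processing (method and variable names are
illustrative, not the provider's actual code), folding the subdir
strings into the listing might look like:

    import java.util.Date;
    import java.util.List;
    import java.util.Set;
    import org.jclouds.blobstore.domain.MutableStorageMetadata;
    import org.jclouds.blobstore.domain.StorageMetadata;
    import org.jclouds.blobstore.domain.StorageType;
    import org.jclouds.blobstore.domain.internal.MutableStorageMetadataImpl;

    static void addSubdirs(List<String> subdirs, Set<StorageMetadata> results) {
       for (String subdir : subdirs) {
          MutableStorageMetadata md = new MutableStorageMetadataImpl();
          md.setName(subdir);
          md.setType(StorageType.RELATIVE_PATH);
          md.setETag("");                   // bogus ETag: Swift returns only the string
          md.setLastModified(new Date(0));  // bogus LastModifiedDate
          results.add(md);
       }
    }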
When directory marker blobs are used, potentially _two_ entries are
generated for every directory if:
1. the delimiter is set and matches the directory blob (e.g. dir/ and
delimiter /)
2. there are objects under the directory name (e.g. dir/blob), which
causes the listing to include a common prefix (subdir)
In the above example, we should expect two results: "dir" and "dir/"
representing the directory marker blob and the common prefix,
respectively.
This is caught in the testDirectory integration test.
The patch changes the behavior of the Swift provider to correctly
handle the results in the subdir stanza and changes the test to expect
the directory marker to be returned in the list.
When filtering results, we have to consider that a result may have
been added to the common prefixes list in its entirety (i.e. it is a
directory). Such results should be excluded from the delimiter test.
An example that demonstrates this problem: one creates a directory
"foo" and an object within it called "file". The listing results will
include the directory object "foo" and the object "foo/file". During
a non-recursive listing, we create a list of common prefixes ("foo").
Subsequently, jclouds should remove all objects that include the
delimiter ("/"); however, that would not apply to "foo". With the
change to include the delimiter in the listing, we need to be careful
not to return the two values "foo" and "foo/".
A unit test for the local blob store to highlight this problem is
included. An integration test "testDirectory" also catches this issue.
Relaxes the membership check in prefix testing. The check is no
longer sensitive to the ordering of results, as different providers
may append, prepend, or insert the relative prefixes in sorted order.
We should not append a "/" to the marker when returning list results
in the case of directories (RELATIVE_PATH), as the names will already
include the delimiter.
Added a jclouds test to catch such regressions in the local store in
the future.
Implements prefix support for the local blob store. The patch
correctly parses prefixes that do not terminate with a delimiter
(e.g. prefix foo with delimiter "/" and key foobar/key should yield
foobar/ as the common prefix) as well as ones that do (e.g. foo/).
NOTE: there is a small change in behavior in this patch.
LocalBlobStore used to return the common prefixes without the
delimiter character ("/"). However, other providers do include the
delimiter (I checked S3 and Google Cloud Storage) and LocalBlobStore
should include it as well.
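A sketch of the parsing rule (the helper name is mine, not the
patch's):

    // Derive the common prefix for a key given a prefix that may or
    // may not end with the delimiter; returns null when the key is
    // not rolled up into a common prefix.
    static String commonPrefix(String key, String prefix, String delimiter) {
       if (!key.startsWith(prefix))
          return null;                      // key not under this prefix
       int idx = key.indexOf(delimiter, prefix.length());
       if (idx == -1)
          return null;                      // plain blob: no delimiter past the prefix
       // Include the trailing delimiter, matching S3 and Google Cloud
       // Storage: prefix "foo" with key "foobar/key" yields "foobar/".
       return key.substring(0, idx + delimiter.length());
    }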
jclouds should issue the requests to create all of the containers in
the container pool before sleeping. The patch modifies the
createContainerAndEnsureEmpty() method to take an additional
parameter, which specifies whether awaitConsistency() should be called
or not.
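A sketch of the reworked helper (parameter name and placement are
illustrative), assuming awaitConsistency() performs the sleep and the
method lives in the test base class alongside blobStore:

    protected void createContainerAndEnsureEmpty(String container, boolean await)
          throws InterruptedException {
       blobStore.createContainerInLocation(null, container);
       blobStore.clearContainer(container);
       if (await) {
          awaitConsistency();   // skipped while pre-creating the container pool
       }
    }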
Map HTTP 416 into IllegalArgumentException. Providers yield different
messages, so remove the message parsing from the test. Also narrow
the exception handling scope.
For a range get in a LocalBlobStore, metadata.getSize() holds the
original blob size and contentMetadata.getContentLength() holds the
actual payload size. Other blob stores report the same size in both.
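A sketch showing where each size lives after a range get (container
and key names are hypothetical):

    import org.jclouds.blobstore.BlobStore;
    import org.jclouds.blobstore.domain.Blob;
    import org.jclouds.blobstore.options.GetOptions;

    static void showSizes(BlobStore blobStore) {
       // Fetch only the first ten bytes of the blob.
       Blob blob = blobStore.getBlob("container", "key",
             GetOptions.Builder.range(0, 9));
       Long original = blob.getMetadata().getSize();       // full blob size
       Long payload = blob.getPayload()
             .getContentMetadata().getContentLength();     // 10, the range length
       System.out.println(original + " vs " + payload);
    }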
Container names may not be valid hostnames, and the current blob URI
code can fail on them. There isn't really a way to use URIs with the
transient blobstore anyway, and removing the code is the easier
option. For reference, the filesystem blobstore does not do this
either.
Some object stores have additive semantics for metadata. As currently
implemented, Azure, S3, and Swift consistently replace user metadata
but have different behavior for system metadata, e.g., Content-Type.
Java on OS X does not support extended attributes, which the
filesystem blobstore uses to implement user metadata. This disables
the relevant test on OS X.
Some providers, e.g., GCS and Swift, create multi-part blobs with a
manifest pointing to other blobs. Thus a container will have
additional objects in its listing.
This allows callers to read the content length during container
listing. Tested against: atmosonline, aws-s3, azureblob, filesystem,
and transient. Intentionally not implemented for legacy swift
provider.
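For example, a caller can now do something like this (container name
hypothetical):

    import org.jclouds.blobstore.BlobStore;
    import org.jclouds.blobstore.domain.StorageMetadata;

    static void printSizes(BlobStore blobStore) {
       // getSize() on listing entries is populated during list(), so
       // no per-blob HEAD request is needed.
       for (StorageMetadata md : blobStore.list("container")) {
          System.out.println(md.getName() + ": " + md.getSize());
       }
    }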
Make the filesystem blob store distinguish between a/ and a. a/ must
be a directory blob with no content and only metadata. On listing,
only directories with an md5 attribute are considered blobs and
returned.
Prefer ByteSourcePayload which offers a superset of its functionality.
Note that ByteArrayPayload implicitly sets the contentLength, while
users of ByteSourcePayload must do so explicitly.
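A sketch of the difference (the data and helper are illustrative):

    import com.google.common.io.ByteSource;
    import org.jclouds.io.Payload;
    import org.jclouds.io.Payloads;

    static Payload payloadFor(byte[] data) {
       Payload payload = Payloads.newByteSourcePayload(ByteSource.wrap(data));
       // ByteArrayPayload set this implicitly; with ByteSourcePayload
       // the caller must set the length explicitly:
       payload.getContentMetadata().setContentLength((long) data.length);
       return payload;
    }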
This matches how most blobstores operate: delete container is a single
operation, not a compound operation which recursively deletes blobs.
Azure is the only provider which allows deleting a non-empty
container.
This commit replaces file resource-based test inputs with in-memory
equivalents. This is more consistent and efficient than the previous
approach. Also resized some test inputs to be partSize + 1 instead of
2 * partSize. Tested against aws-s3, blobstore, core, cloudfiles-us,
and filesystem.
Also remove a bogus delete workaround. Previously an unclosed
FileInputStream caused test failures on Windows, which cannot delete
open files. Found with Kohsuke's file-leak-detector.
Callers should instead explicitly set contentMD5, usually with the
results from Guava Hashing.md5(). This narrows the API and removes a
strange IOException from callers. Further, it removes a dangerous
rebuffering of arbitrarily large, non-repeatable Payloads.
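A sketch of the replacement pattern (the helper is mine, for
illustration):

    import com.google.common.hash.Hashing;
    import org.jclouds.io.Payload;

    static void attachMd5(Payload payload, byte[] data) {
       // Compute the MD5 up front with Guava instead of letting the
       // removed helper rebuffer an arbitrarily large payload.
       payload.getContentMetadata().setContentMD5(
             Hashing.md5().hashBytes(data).asBytes());
    }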
This method informs callers of the expected blobstore behavior,
avoiding unnecessary work for providers which support strict
consistency (Azure, mock blobstores).
The only users of this seem to be
org.jclouds.atmos.blobstore.strategy.FindMD5InUserMetadata and
org.jclouds.azureblob.blobstore.strategy.FindMD5InBlobProperties which
are themselves unused.
The existing approach for deleting objects in a container suffers
from a head-of-line blocking problem. This commit implements a better
scheme which does not have that problem. The new scheme uses a
counting semaphore to ensure that only a bounded number of futures
are in flight at once. As each future completes, it releases one
permit back to the semaphore.
Added unit tests for testing this new scheme.
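A minimal sketch of the counting-semaphore idea (method and variable
names are mine; error handling elided):

    import java.util.concurrent.Semaphore;
    import com.google.common.util.concurrent.FutureCallback;
    import com.google.common.util.concurrent.Futures;
    import com.google.common.util.concurrent.MoreExecutors;
    import org.jclouds.blobstore.AsyncBlobStore;

    static void deleteAll(AsyncBlobStore store, String container,
          Iterable<String> keys, int maxParallel) throws InterruptedException {
       final Semaphore permits = new Semaphore(maxParallel);
       for (String key : keys) {
          permits.acquire();                // blocks once maxParallel are in flight
          Futures.addCallback(store.removeBlob(container, key),
                new FutureCallback<Void>() {
                   @Override public void onSuccess(Void result) { permits.release(); }
                   @Override public void onFailure(Throwable t) { permits.release(); }
                }, MoreExecutors.directExecutor());
       }
       permits.acquire(maxParallel);        // drain: all outstanding deletes done
    }

A slow delete now holds only one permit rather than stalling the
whole batch, which is what removes the head-of-line blocking.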
Callers should instead call BlobBuilder.contentMD5, usually with the
results from Guava Hashing.md5(). This narrows the API and removes a
strange IOException from callers.
Also deprecate byte[], File, InputSupplier<InputStream>, and String
Payloads. Callers should instead provide a ByteSource via
ByteSource.wrap(byte[]) and Files.asByteSource(File).
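Combined with the contentMD5 change above, the preferred pattern
looks roughly like this (blob and container names hypothetical):

    import com.google.common.hash.Hashing;
    import com.google.common.io.ByteSource;
    import org.jclouds.blobstore.BlobStore;
    import org.jclouds.blobstore.domain.Blob;

    static Blob buildBlob(BlobStore blobStore, byte[] data) {
       return blobStore.blobBuilder("key")
             .payload(ByteSource.wrap(data))   // instead of the deprecated payload(byte[])
             .contentLength(data.length)
             .contentMD5(Hashing.md5().hashBytes(data).asBytes())
             .build();
    }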
Previously this test downloaded 45 MB in 30 seconds which failed on
slower connections, causing spurious test failures. The larger input
size provides no benefit.
Previously this test uploaded ~10 MB in 30 seconds which failed on
slower connections, causing spurious test failures. The larger input
size provides no benefit.
Previously we reused the prior container name, appending to it each
time until it exceeded S3's limits during integration tests, e.g.:
Caused by: java.lang.IllegalArgumentException: Object 'gaul-blobstore27-4348532805015069992-1155797412711190448-1465016400636204031' doesn't match S3 bucket virtual host naming convention. Reason: Can't be null or empty. Length must be 3 to 63 symbols.. For more info, please refer to http://docs.amazonwebservices.com/AmazonS3/latest/index.html?BucketRestrictions.html
Also:
- Make SwiftBlobIntegrationLiveTest.testMultipartChunkedFileStream more realistic by uploading a file large enough to be split into parts.
- JavaDoc fixes for SwiftBlobStore: don't reference nonexistent methods.
This method breaks the contract of an InputSupplier since every call
to getInput returns the same InputStream instance. This is
particularly dangerous when one of the callers mutates or closes the
InputStream, which causes all other callers to fail.
The latter is more concise, faster, and more compact. .class sizes in
bytes before:
LocalAsyncBlobStore$5.class 1883
LocalAsyncBlobStore.class 20743
and after:
LocalAsyncBlobStore.class 21041
BaseAsyncBlobStore.clearContainer returns a Future which we previously
did not manage correctly, hanging when deleting non-existent
containers. Call BlobUtils.clearContainer directly to address this.