[test] Add standalone runner
It can sometimes be useful to have a standalone runner to see exactly how Tika extracts content from a given file.

You can run the `StandaloneRunner` class using:

* `-u file://URL/TO/YOUR/DOC`
* `--size` set extracted size (defaults to the mapper attachment size)
* `BASE64` encoded binary

Example:

```sh
StandaloneRunner BASE64Text
StandaloneRunner -u /tmp/mydoc.pdf
StandaloneRunner -u /tmp/mydoc.pdf --size 1000000
```

It produces something like:

```
## Extracted text
--------------------- BEGIN -----------------------
This is the extracted text
---------------------- END ------------------------
## Metadata
- author: null
- content_length: null
- content_type: application/pdf
- date: null
- keywords: null
- language: null
- name: null
- title: null
```

Closes #99.

(cherry picked from commit 720b3bf)
(cherry picked from commit 990fa15)
parent c353936b58
commit 931be57da9
36
README.md
@@ -311,6 +311,42 @@ It gives back:
}
```

Stand alone runner
------------------

If you want to run some tests within your IDE, you can use the `StandaloneRunner` class.
It accepts the following arguments:

* `-u file://URL/TO/YOUR/DOC`
* `--size` set extracted size (defaults to the mapper attachment size)
* `BASE64` encoded binary

Example:

```sh
StandaloneRunner BASE64Text
StandaloneRunner -u /tmp/mydoc.pdf
StandaloneRunner -u /tmp/mydoc.pdf --size 1000000
```

It produces something like:

```
## Extracted text
--------------------- BEGIN -----------------------
This is the extracted text
---------------------- END ------------------------
## Metadata
- author: null
- content_length: null
- content_type: application/pdf
- date: null
- keywords: null
- language: null
- name: null
- title: null
```
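The `BASE64` argument has to be produced from the document somehow. A minimal sketch, assuming a POSIX shell with the standard `base64` utility available: the helper name `encode_doc` is hypothetical and not part of this project, and the way `StandaloneRunner` is launched is left commented out because it depends on your setup. Command lines have a length limit, so for large documents the `-u` form is probably more practical.

```sh
# Hypothetical helper (not part of the plugin): print a file as single-line Base64.
# `tr -d '\n'` removes the line wrapping that some `base64` implementations add.
encode_doc() {
  base64 < "$1" | tr -d '\n'
}

# Assumed usage: pass the encoded text as the BASE64 argument.
# StandaloneRunner "$(encode_doc /tmp/mydoc.pdf)"
```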

License
-------