[test] Add standalone runner
It can sometimes be useful to have a standalone runner to see exactly how Tika extracts content from a given file.

You can run the `StandaloneRunner` class using:

* `-u file://URL/TO/YOUR/DOC`
* `--size` set extracted size (default to mapper attachment size)
* `BASE64` encoded binary

Example:

```sh
StandaloneRunner BASE64Text
StandaloneRunner -u /tmp/mydoc.pdf
StandaloneRunner -u /tmp/mydoc.pdf --size 1000000
```

It produces something like:

```
## Extracted text
--------------------- BEGIN -----------------------
This is the extracted text
---------------------- END ------------------------
## Metadata
- author: null
- content_length: null
- content_type: application/pdf
- date: null
- keywords: null
- language: null
- name: null
- title: null
```

Closes #99.

(cherry picked from commit 720b3bf)
(cherry picked from commit 990fa15)
parent c353936b58
commit 931be57da9
README.md | 36 ++++++++++++++++++++++++++++++++++++
@@ -311,6 +311,42 @@ It gives back:
 }
 ```
 
+Stand alone runner
+------------------
+
+If you want to run some tests within your IDE, you can use `StandaloneRunner` class.
+It accepts arguments:
+
+* `-u file://URL/TO/YOUR/DOC`
+* `--size` set extracted size (default to mapper attachment size)
+* `BASE64` encoded binary
+
+Example:
+
+```sh
+StandaloneRunner BASE64Text
+StandaloneRunner -u /tmp/mydoc.pdf
+StandaloneRunner -u /tmp/mydoc.pdf --size 1000000
+```
+
+It produces something like:
+
+```
+## Extracted text
+--------------------- BEGIN -----------------------
+This is the extracted text
+---------------------- END ------------------------
+## Metadata
+- author: null
+- content_length: null
+- content_type: application/pdf
+- date: null
+- keywords: null
+- language: null
+- name: null
+- title: null
+```
+
 License
 -------
 
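The three argument forms described in this change (`-u`, `--size`, and a bare BASE64 string) could be handled roughly as in the sketch below. This is not the code from the commit: `RunnerArgs` is a hypothetical class, the `-1` sentinel standing in for "default to mapper attachment size" is an assumption, and the sketch only parses the command line without calling Tika.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical holder for the parsed command line; not the plugin's real class. */
final class RunnerArgs {
    final String url;     // value given after -u, or null when absent
    final byte[] content; // decoded BASE64 payload, or null when absent
    final int size;       // extraction limit; -1 is an assumed marker for "use the default"

    private RunnerArgs(String url, byte[] content, int size) {
        this.url = url;
        this.content = content;
        this.size = size;
    }

    /** Parses "-u URL", "--size N", or a bare BASE64 string. */
    static RunnerArgs parse(String[] args) {
        String url = null;
        byte[] content = null;
        int size = -1;
        for (int i = 0; i < args.length; i++) {
            String arg = args[i];
            if ("-u".equals(arg) || "--size".equals(arg)) {
                if (i + 1 >= args.length) {
                    throw new IllegalArgumentException(arg + " requires a value");
                }
                String value = args[++i];
                if ("-u".equals(arg)) {
                    url = value;
                } else {
                    size = Integer.parseInt(value);
                }
            } else {
                // Anything that is not an option is treated as BASE64 text.
                content = Base64.getDecoder().decode(arg);
            }
        }
        if (url == null && content == null) {
            throw new IllegalArgumentException("expected -u URL or a BASE64 string");
        }
        return new RunnerArgs(url, content, size);
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.out.println("usage: RunnerArgs [-u URL] [--size N] [BASE64]");
            return;
        }
        RunnerArgs parsed = parse(args);
        System.out.println("url: " + parsed.url);
        System.out.println("size: " + parsed.size);
        if (parsed.content != null) {
            System.out.println("content: " + new String(parsed.content, StandardCharsets.UTF_8));
        }
    }
}
```

With these assumptions, `-u /tmp/mydoc.pdf --size 1000000` yields a URL and an explicit limit, while a single bare argument is decoded as the document bytes. How the real runner treats a command line that supplies both a URL and BASE64 text is not stated in the commit, so the sketch simply keeps both.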