From 931be57da91cdf5befc0cfdf664ce8f5473f3508 Mon Sep 17 00:00:00 2001
From: David Pilato
Date: Mon, 9 Feb 2015 17:43:59 +0100
Subject: [PATCH] [test] Add standalone runner

It can sometimes be useful to have a standalone runner to see exactly how Tika extracts content from a given file.

You can run the `StandaloneRunner` class with:

* `-u file://URL/TO/YOUR/DOC`
* `--size`: sets the extracted size (defaults to the mapper attachment size)
* `BASE64` encoded binary

Example:

```sh
StandaloneRunner BASE64Text
StandaloneRunner -u /tmp/mydoc.pdf
StandaloneRunner -u /tmp/mydoc.pdf --size 1000000
```

It produces something like:

```
## Extracted text
--------------------- BEGIN -----------------------
This is the extracted text
---------------------- END ------------------------
## Metadata
- author: null
- content_length: null
- content_type: application/pdf
- date: null
- keywords: null
- language: null
- name: null
- title: null
```

Closes #99.
(cherry picked from commit 720b3bf)
(cherry picked from commit 990fa15)
---
 README.md | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/README.md b/README.md
index cdb7ae69f9d..81378b2ab51 100644
--- a/README.md
+++ b/README.md
@@ -311,6 +311,42 @@ It gives back:
 }
 ```
 
+Standalone runner
+-----------------
+
+If you want to run some tests within your IDE, you can use the `StandaloneRunner` class.
+It accepts the following arguments:
+
+* `-u file://URL/TO/YOUR/DOC`
+* `--size`: sets the extracted size (defaults to the mapper attachment size)
+* `BASE64` encoded binary
+
+Example:
+
+```sh
+StandaloneRunner BASE64Text
+StandaloneRunner -u /tmp/mydoc.pdf
+StandaloneRunner -u /tmp/mydoc.pdf --size 1000000
+```
+
+It produces something like:
+
+```
+## Extracted text
+--------------------- BEGIN -----------------------
+This is the extracted text
+---------------------- END ------------------------
+## Metadata
+- author: null
+- content_length: null
+- content_type: application/pdf
+- date: null
+- keywords: null
+- language: null
+- name: null
+- title: null
+```
+
 License
 -------