Tika may "hang up" client application
Tika may sometimes crash while parsing certain files. When it does, it can throw a runtime error (some other Throwable) rather than a TikaException, but AttachmentMapper.java has no catch clause for Throwable:

    String parsedContent;
    try {
        // Set the maximum length of strings returned by the parseToString method, -1 sets no limit
        parsedContent = tika().parseToString(new FastByteArrayInputStream(content), metadata, indexedChars);
    } catch (TikaException e) {
        throw new MapperParsingException("Failed to extract [" + indexedChars + "] characters of text for [" + name + "]", e);
    }

As a result, tika() may "hang up" the whole application. (We have some PDF files that "hang up" the Elastic client if you try to parse them using the mapper-attachment plugin.)

We propose the following fix:

    String parsedContent;
    try {
        // Set the maximum length of strings returned by the parseToString method, -1 sets no limit
        parsedContent = tika().parseToString(new FastByteArrayInputStream(content), metadata, indexedChars);
    } catch (Throwable e) {
        throw new MapperParsingException("Failed to extract [" + indexedChars + "] characters of text for [" + name + "]", e);
    }

(Just replace "TikaException" with "Throwable"; it works for our cases.)

Thank you!

Closes #21.
parent 0fff26f2bf
commit d6aa2f0615
@@ -19,7 +19,6 @@
 package org.elasticsearch.index.mapper.attachment;
 
-import org.apache.tika.exception.TikaException;
 import org.apache.tika.metadata.Metadata;
 import org.elasticsearch.common.io.stream.BytesStreamInput;
 import org.elasticsearch.common.logging.ESLogger;
 
@@ -356,7 +355,7 @@ public class AttachmentMapper implements Mapper {
         try {
             // Set the maximum length of strings returned by the parseToString method, -1 sets no limit
             parsedContent = tika().parseToString(new BytesStreamInput(content, false), metadata, indexedChars);
-        } catch (TikaException e) {
+        } catch (Throwable e) {
             throw new MapperParsingException("Failed to extract [" + indexedChars + "] characters of text for [" + name + "]", e);
         }
 
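The effect of widening the catch clause can be illustrated with a small standalone sketch. It does not use Tika or Elasticsearch: TikaException, MapperParsingException, TextExtractor and the CRASHING parser below are hypothetical stand-ins defined only for this example, and the simulated failure is a plain java.lang.Error, not an error observed from a real file.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.function.Supplier;

class CatchThrowableSketch {

    /** Hypothetical stand-in for Tika's checked TikaException. */
    static class TikaException extends Exception {
        TikaException(String message) { super(message); }
    }

    /** Hypothetical stand-in for the mapper's unchecked MapperParsingException. */
    static class MapperParsingException extends RuntimeException {
        MapperParsingException(String message, Throwable cause) { super(message, cause); }
    }

    /** Hypothetical stand-in for the tika() facade used by the mapper. */
    interface TextExtractor {
        String parseToString(InputStream in) throws TikaException;
    }

    /** Simulated parser that fails with an Error instead of a TikaException. */
    static final TextExtractor CRASHING = in -> { throw new Error("simulated parser crash"); };

    /** Old behaviour: only TikaException is translated; anything else propagates as-is. */
    static String parseBefore(TextExtractor tika, byte[] content, String name) {
        try {
            return tika.parseToString(new ByteArrayInputStream(content));
        } catch (TikaException e) {
            throw new MapperParsingException("Failed to extract text for [" + name + "]", e);
        }
    }

    /** New behaviour: every Throwable is wrapped in a MapperParsingException. */
    static String parseAfter(TextExtractor tika, byte[] content, String name) {
        try {
            return tika.parseToString(new ByteArrayInputStream(content));
        } catch (Throwable e) {
            throw new MapperParsingException("Failed to extract text for [" + name + "]", e);
        }
    }

    /** Runs the call and reports "ok" or the simple class name of whatever escaped. */
    static String outcome(Supplier<String> call) {
        try {
            call.get();
            return "ok";
        } catch (Throwable t) {
            return t.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        byte[] content = new byte[0];
        System.out.println("before: " + outcome(() -> parseBefore(CRASHING, content, "a.pdf")));
        System.out.println("after:  " + outcome(() -> parseAfter(CRASHING, content, "a.pdf")));
    }
}
```

With the narrow catch, the raw Error escapes to the caller; with catch (Throwable e), the caller sees a MapperParsingException whose cause is the original error. Note that catching Throwable also wraps errors such as OutOfMemoryError, which is a trade-off rather than a free improvement.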