mirror of https://github.com/apache/poi.git
Add a quick guide to using the text extractor and friends, since that's a common use
git-svn-id: https://svn.apache.org/repos/asf/jakarta/poi/trunk@409632 13f79535-47bb-0310-9956-ffa450edef68
This commit is contained in:
parent
b3d7315beb
commit
30541423a2
|
@ -7,6 +7,7 @@
|
|||
</menu>
|
||||
<menu label="HWPF">
|
||||
<menu-item label="Overview" href="index.html"/>
|
||||
<menu-item label="Quick Guide" href="quick-guide.html"/>
|
||||
<menu-item label="HWPF Format" href="docoverview.html"/>
|
||||
<menu-item label="HWPF Project plan" href="projectplan.html"/>
|
||||
</menu>
|
||||
|
|
|
@ -0,0 +1,45 @@
|
|||
<?xml version="1.0" encoding="UTF-8"?>
|
||||
<!-- Copyright (C) 2004 The Apache Software Foundation. All rights reserved. -->
|
||||
<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V1.1//EN" "../dtd/document-v11.dtd">
|
||||
|
||||
<document>
|
||||
<header>
|
||||
<title>POI-HWPF - A Quick Guide</title>
|
||||
<subtitle>Overview</subtitle>
|
||||
<authors>
|
||||
<person name="Nick Burch" email="nick at torchbox dot com"/>
|
||||
</authors>
|
||||
</header>
|
||||
|
||||
<body>
|
||||
<section><title>Basic Text Extraction</title>
|
||||
<p>For basic text extraction, make use of
|
||||
<code>org.apache.poi.hwpf.extractor.WordExtractor</code>. It accepts an input
|
||||
stream or a <code>HWPFDocument</code>. The <code>getText()</code>
|
||||
method can be used to
|
||||
get the text from all the paragraphs, or <code>getParagraphText()</code>
|
||||
can be used to fetch the text from each paragraph in turn. The other
|
||||
option is <code>getTextFromPieces()</code>, which is very fast, but
|
||||
tends to return things that aren't text from the page. YMMV.
|
||||
</p>
|
||||
</section>
|
||||
|
||||
<section><title>Specific Text Extraction</title>
|
||||
<p>To get specific bits of text, first create a
|
||||
<code>org.apache.poi.hwpf.HWPFDocument</code>. Fetch the range
|
||||
with <code>getRange()</code>, then get paragraphs from that. You
|
||||
can then get text and other properties.
|
||||
</p>
|
||||
</section>
|
||||
|
||||
<section><title>Changing Text</title>
|
||||
<p>It is possible to change the text via
|
||||
<code>insertBefore()</code> and <code>insertAfter()</code>
|
||||
on a <code>Range</code> object (either a <code>Range</code>,
|
||||
<code>Paragraph</code> or <code>CharacterRun</code>).
|
||||
It is also possible to delete a <code>Range</code>, but this
|
||||
code is know to have bugs in it.
|
||||
</p>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
Loading…
Reference in New Issue