Add documentation for the HMEF (TNEF/winmail.dat) support so far.

Also add a little bit to the HPBF docs, and tweak build.xml to check the right files when deciding if the docs are up to date.


git-svn-id: https://svn.apache.org/repos/asf/poi/trunk@1077891 13f79535-47bb-0310-9956-ffa450edef68
Nick Burch 2011-03-04 11:59:23 +00:00
parent e9f5fbd58d
commit ce77707b83
4 changed files with 209 additions and 15 deletions


@ -748,7 +748,7 @@ under the License.
<target name="-check-docs">
<uptodate property="main.docs.notRequired" targetfile="${build.site}/index.html">
<srcfiles dir="${build.site.src}"/>
<srcfiles dir="${main.documentation}" />
</uptodate>
</target>


@ -35,19 +35,15 @@
<p>HMEF is the POI Project's pure Java implementation of the
TNEF (Transport Neutral Encoding Format), aka winmail.dat,
which is used by Outlook and Exchange in some situations.</p>
<p>Currently, HMEF provides a low-level, read-only api for
accessing core TNEF attributes. It is able to provide access
to both TNEF and MAPI attributes, and low level access to
attachments. Compressed RTF is not yet fully supported, and
user-facing access to common attributes and attachment contents
is not yet present.</p>
<p>HMEF is currently very much a work-in-progress, and we hope
to add a text extractor and attachment extractor in the not
too distant future.</p>
<p>To get a feel for the contents of a file, and to track down
where data of interest is stored, HMEF comes with
<link href="http://svn.apache.org/repos/asf/poi/trunk/src/scratchpad/src/org/apache/poi/hmef/dev/">HMEFDumper</link>
to print out the contents of the file.</p>
<p>Currently, HMEF provides a read-only API for accessing common
message and attachment attributes, including the message body
and attachment files. In addition, it provides read-only access
to all of the underlying TNEF and MAPI attributes of the message
and attachments.</p>
<p>HMEF also provides a command line tool for extracting
the message body and attachment files from a TNEF (winmail.dat)
file.</p>
<note>
This code currently lives in the
<link href="http://svn.apache.org/viewcvs.cgi/poi/trunk/src/scratchpad/">scratchpad area</link>
@ -55,7 +51,167 @@
Ensure that you have the scratchpad jar or the scratchpad
build area in your classpath before experimenting with this code.
</note>
<note>
This code is a new POI feature, and the first release that will
contain it will be POI 3.8 beta 2. Until then, you will need to
build your own jars from a <link href="../subversion.html">svn
checkout</link>.
</note>
</section>
<section>
<title>Using HMEF to access TNEF (winmail.dat) files</title>
<section>
<title>Easy extraction of message body and attachment files</title>
<p>The class <em>org.apache.poi.hmef.extractor.HMEFContentsExtractor</em>
provides both command line and Java extraction. It allows the
saving of the message body (an RTF file), and all of the
attachment files, to a single directory as specified.</p>
<p>From the command line, simply call the class specifying the
TNEF file to extract, and the directory to place the extracted
files into, eg:</p>
<source>
java -classpath poi-3.8-FINAL.jar:poi-scratchpad-3.8-FINAL.jar org.apache.poi.hmef.extractor.HMEFContentsExtractor winmail.dat /tmp/extracted/
</source>
<p>From Java, there are two method calls on the class, one to
extract the message body RTF to a file, and the other to extract
all the attachments to a directory. A typical use would be:</p>
<source>
import java.io.File;
import java.io.FileNotFoundException;
import org.apache.poi.hmef.extractor.HMEFContentsExtractor;

public void extract(String winmailFilename, String directoryName) throws Exception {
    // Fail early if the output directory is missing
    File dir = new File(directoryName);
    if (!dir.exists()) {
        throw new FileNotFoundException("Output directory " + dir.getName() + " not found");
    }

    HMEFContentsExtractor ext = new HMEFContentsExtractor(new File(winmailFilename));
    File rtf = new File(dir, "message.rtf");

    System.out.println("Extracting...");
    ext.extractMessageBody(rtf);   // the message body, as an RTF file
    ext.extractAttachments(dir);   // all of the attachment files
    System.out.println("Extraction completed");
}
</source>
</section>
<section>
<title>Attachment attributes and contents</title>
<p>To get at your attachments, simply call the
<em>getAttachments()</em> method on a <em>HMEFMessage</em>
instance, and you'll receive a list of all the attachments.</p>
<p>When you have a <em>org.apache.poi.hmef.Attachment</em> object,
there are several helper methods available. These will all
return the value of the appropriate underlying attachment
attributes, or null if for some reason the attribute isn't
present in your file.</p>
<ul>
<li><em>getFilename()</em> - returns the name of the attachment
file, possibly in 8.3 format</li>
<li><em>getLongFilename()</em> - returns the full name of the
attachment file</li>
<li><em>getExtension()</em> - returns the extension of the
attachment file, including the "."</li>
<li><em>getModifiedDate()</em> - returns the date that the
attachment file was last edited on</li>
<li><em>getContents()</em> - returns a byte array of the contents
of the attached file</li>
<li><em>getRenderedMetaFile()</em> - returns a byte array of
a windows meta file representation of the attached file</li>
</ul>
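<p>Because any of these helpers may return null, code that saves
attachments to disk usually needs a fallback file name. The sketch
below is illustrative only: <em>AttachmentNames</em> and its
<em>choose</em> method are hypothetical and not part of POI. It
assumes the caller passes in the values returned by
<em>getLongFilename()</em> and <em>getFilename()</em>, along with
the position of the attachment in the list.</p>
<source>
```java
// Hypothetical helper, not part of POI: picks a file name for an
// extracted attachment when some of the name attributes are missing.
class AttachmentNames {

    // Prefers the long name, then the (possibly 8.3) short name, and
    // finally falls back to a generated name based on the list position.
    static String choose(String longName, String shortName, int index) {
        if (isUsable(longName)) {
            return longName.trim();
        }
        if (isUsable(shortName)) {
            return shortName.trim();
        }
        return "attachment-" + index + ".bin";
    }

    // A name is usable if it is neither null nor blank.
    private static boolean isUsable(String name) {
        if (name == null) {
            return false;
        }
        return name.trim().length() != 0;
    }
}
```
</source>
<p>With a helper along these lines, an attachment that only carries
an 8.3 name is saved under that name, and one with no name attributes
at all still gets a predictable file name.</p>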
</section>
<section>
<title>Message attributes and message body</title>
<p>A <em>org.apache.poi.hmef.HMEFMessage</em> instance is created
from an <em>InputStream</em> of the underlying TNEF (winmail.dat)
file.</p>
<p>From a <em>HMEFMessage</em>, there are three main methods of
interest to call:</p>
<ul>
<li><em>getBody()</em> - returns a String containing the RTF
contents of the message body.
<em>Note - see limitations</em></li>
<li><em>getSubject()</em> - returns the message subject</li>
<li><em>getAttachments()</em> - returns the list of
<em>Attachment</em> objects for the message</li>
</ul>
</section>
<section>
<title>Low level attribute access</title>
<p>Both Messages and Attachments contain two kinds of attributes.
These are <em>TNEFAttribute</em> and <em>MAPIAttribute</em>.</p>
<p>TNEFAttribute is specific to TNEF files in terms of the
available types and properties. In general, Attachments have a
few more useful ones of these than Messages.</p>
<p>MAPIAttributes hold standard MAPI properties and values, and
work in a similar way to those in <link href="../hsmf/">HSMF
(Outlook)</link>. There are typically many of these on both
Messages and Attachments. <em>Note - see limitations</em></p>
<p>Both <em>HMEFMessage</em> and <em>Attachment</em> support
two different ways of getting to attributes of interest.
Firstly, they support list getters, to return all attributes
(either TNEF or MAPI). Secondly, they support specific getters by
TNEF or MAPI property.</p>
<source>
HMEFMessage msg = new HMEFMessage(new FileInputStream(file));

// List getters: all TNEF and MAPI attributes of the message
for(TNEFAttribute attr : msg.getMessageAttributes()) {
    System.out.println("TNEF : " + attr);
}
for(MAPIAttribute attr : msg.getMessageMAPIAttributes()) {
    System.out.println("MAPI : " + attr);
}
// Specific getter, by MAPI property
System.out.println("Subject is " + msg.getMessageMAPIAttribute(MAPIProperty.CONVERSATION_TOPIC));

for(Attachment attach : msg.getAttachments()) {
    for(TNEFAttribute attr : attach.getAttributes()) {
        System.out.println("A.TNEF : " + attr);
    }
    for(MAPIAttribute attr : attach.getMAPIAttributes()) {
        System.out.println("A.MAPI : " + attr);
    }
    // Specific getters, by TNEF property and by MAPI property
    System.out.println("Filename is " + attach.getAttribute(TNEFProperty.CID_ATTACHTITLE));
    System.out.println("Extension is " + attach.getMAPIAttribute(MAPIProperty.ATTACH_EXTENSION));
}
</source>
</section>
</section>
<section>
<title>Investigating a TNEF file</title>
<p>To get a feel for the contents of a file, and to track down
where data of interest is stored, HMEF comes with
<link href="http://svn.apache.org/repos/asf/poi/trunk/src/scratchpad/src/org/apache/poi/hmef/dev/">HMEFDumper</link>
to print out the contents of the file.</p>
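<p>Before dumping a file, it can help to confirm that it really is
TNEF. The sketch below is not part of HMEF: the class
<em>TnefSignatureCheck</em> is a hypothetical name, and it uses only
the standard Java library. It relies on the fact that a TNEF stream
starts with the 32-bit signature 0x223E9F78, stored in little-endian
byte order.</p>
<source>
```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical pre-check, not part of HMEF.
class TnefSignatureCheck {
    // A TNEF stream begins with this value, stored little-endian.
    static final int TNEF_SIGNATURE = 0x223E9F78;

    // Returns true if the first four bytes hold the TNEF signature.
    static boolean looksLikeTnef(byte[] header) {
        if (header == null || 4 > header.length) {
            return false;
        }
        int first = ByteBuffer.wrap(header, 0, 4)
                .order(ByteOrder.LITTLE_ENDIAN)
                .getInt();
        return first == TNEF_SIGNATURE;
    }

    // Reads the first four bytes of a stream and checks them.
    static boolean looksLikeTnef(InputStream in) throws IOException {
        byte[] header = new byte[4];
        try {
            new DataInputStream(in).readFully(header);
        } catch (EOFException e) {
            return false; // the stream is shorter than four bytes
        }
        return looksLikeTnef(header);
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("Usage: TnefSignatureCheck file");
            return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.println(args[0] + " looks like TNEF: " + looksLikeTnef(in));
        }
    }
}
```
</source>
<p>A file that fails this check is probably not a winmail.dat file
at all, so there is little point in running HMEFDumper on it.</p>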
</section>
<section>
<title>Limitations</title>
<p>HMEF is currently a work-in-progress, and not everything
works yet. The current limitations are:</p>
<ul>
<li>Compressed RTF Message Bodies are not correctly
decompressed. This means that a call to
<em>HMEFMessage.getBody()</em> is unlikely to return the
correct RTF.</li>
<li>Non-standard MAPI properties from the range 0x8000 to 0x8fff
may not be turned into attributes quite correctly.
The values show up, but the name and type may not always
be correct.</li>
<li>All testing so far has been performed on a small number of
English documents. We think we're correctly turning bytes into
Java Unicode strings, but we need a few non-English sample
files in the test suite to verify this!</li>
</ul>
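<p>Code that walks the MAPI attributes may want to flag the ones
affected by the second limitation. The sketch below is illustrative
only: <em>MapiPropertyRanges</em> is a hypothetical class, not part
of POI, and the range it tests is simply the one quoted in the list
above. It assumes the caller already has the numeric property id.</p>
<source>
```java
// Hypothetical helper, not part of POI.
class MapiPropertyRanges {
    // Range quoted in the limitations list above (both ends inclusive).
    static final int RANGE_START = 0x8000;
    static final int RANGE_END = 0x8fff;

    // Returns true if the id falls in the range where the attribute
    // name and type reported by HMEF may not always be correct.
    static boolean nameAndTypeMayBeUnreliable(int propertyId) {
        if (RANGE_START > propertyId) {
            return false;
        }
        return RANGE_END >= propertyId;
    }
}
```
</source>
<p>Values for ids in this range can still be read, so a check like
this is only useful for deciding how much to trust the reported
name and type.</p>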
</section>
</body>
</document>


@ -0,0 +1,35 @@
<?xml version="1.0"?>
<!--
====================================================================
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
====================================================================
-->
<!DOCTYPE book PUBLIC "-//APACHE//DTD Cocoon Documentation Book V1.0//EN" "../dtd/book-cocoon-v10.dtd">
<book software="POI Project"
title="HPBF"
copyright="@year@ POI Project">
<menu label="Apache POI">
<menu-item label="Top" href="../index.html"/>
</menu>
<menu label="HPBF">
<menu-item label="Overview" href="index.html"/>
<menu-item label="File Format" href="file-format.xml"/>
</menu>
</book>


@ -45,7 +45,10 @@
the document (partly supported). Additional low level
code to process the file format may follow, if there
is demand and developer interest warrant it.</p>
<p>At this time, there is no <em>usermodel</em> api or similar.
<p>Text Extraction is available via the
<em>org.apache.poi.hpbf.extractor.PublisherTextExtractor</em>
class.</p>
<p>At this time, there is no <em>usermodel</em> api or similar.
There is only low level support for certain parts of
the file, but by no means all of it.</p>
<p>Our current understanding of the file format is documented