- Started to document the reading of general property set streams.

- Minor documentation fixes.


git-svn-id: https://svn.apache.org/repos/asf/jakarta/poi/trunk@352993 13f79535-47bb-0310-9956-ffa450edef68
Rainer Klute 2003-01-30 17:13:15 +00:00
parent 40915fced6
commit 240263f685
3 changed files with 176 additions and 128 deletions


@@ -7,13 +7,13 @@
<header> <header>
<title>HPSF HOW-TO</title> <title>HPSF HOW-TO</title>
<authors> <authors>
<person name="Rainer Klute" email="klute@rainer-klute.de"/> <person name="Rainer Klute" email="klute@apache.org"/>
</authors> </authors>
</header> </header>
<body> <body>
<section title="How To Use the HPSF APIs"> <section title="How To Use the HPSF APIs">
<p>This HOW-TO is organized in three section. You should read them <p>This HOW-TO is organized in three sections. You should read them
sequentially because the later sections build upon the earlier ones.</p> sequentially because the later sections build upon the earlier ones.</p>
<ol> <ol>
@@ -40,12 +40,9 @@
</li> </li>
</ol> </ol>
<p>Please note that there is separate document on <link
href="thumbnails.html">thumbnails</link>!</p>
<anchor id="sec1"/>
<anchor id="sec1" />
<section title="Reading Standard Properties"> <section title="Reading Standard Properties">
<note>This section explains how to read <note>This section explains how to read
@@ -56,19 +53,20 @@
<p>The first thing you should understand is that properties are stored in <p>The first thing you should understand is that properties are stored in
separate documents inside the POI filesystem. (If you don't know what a separate documents inside the POI filesystem. (If you don't know what a
POI filesystem is, read its <link POI filesystem is, read the <link href="../poifs/index.html">POIFS
href="../poifs/index.html">documentation</link>.) A document in a POI documentation</link>.) A document in a POI filesystem is also called a
filesystem is also called a <strong>stream</strong>.</p> <strong>stream</strong>.</p>
<p>The following example shows how to read a POI filesystem's <p>The following example shows how to read a POI filesystem's
"title" property. Reading other properties is similar. Consider the API "title" property. Reading other properties is similar. Consider the API
documentation of <code>org.apache.poi.hpsf.SummaryInformation</code>.</p> documentation of <code>org.apache.poi.hpsf.SummaryInformation</code> to
learn which methods are available!</p>
<p>The standard properties this section focusses on can be <p>The standard properties this section focusses on can be found in a
found in a document called <em>\005SummaryInformation</em> in the root of document called <em>\005SummaryInformation</em> located in the root of the
the POI filesystem. The notation <em>\005</em> in the document's name POI filesystem. The notation <em>\005</em> in the document's name means
means the character with the decimal value of 5. In order to read the the character with the decimal value of 5. In order to read the title, an
title, an application has to perform the following steps:</p> application has to perform the following steps:</p>
<ol> <ol>
<li> <li>
@@ -76,9 +74,8 @@
of the POI filesystem.</p> of the POI filesystem.</p>
</li> </li>
<li> <li>
<p>Create an instance of the class <p>Create an instance of the class <code>SummaryInformation</code> from
<code>SummaryInformation</code> from that that document.</p>
document.</p>
</li> </li>
<li> <li>
<p>Call the <code>SummaryInformation</code> instance's <p>Call the <code>SummaryInformation</code> instance's
@@ -96,7 +93,10 @@
(POIFS) proceeds as shown by the following code fragment. (The full (POIFS) proceeds as shown by the following code fragment. (The full
source code of the sample application is available in the source code of the sample application is available in the
<em>examples</em> section of the POI source tree as <em>examples</em> section of the POI source tree as
<em>ReadTitle.java</em>.)</p> <em>ReadTitle.java</em>.</p>
<fixme>I just found out that <em>ReadTitle.java</em> is no longer there! I
shall look it up in the CVS and try to restore it.</fixme>
<source> <source>
import java.io.*; import java.io.*;
@@ -141,7 +141,7 @@ r.registerListener(new MyPOIFSReaderListener(),
<code>processPOIFSReaderEvent</code> method. The eventing POI filesystem <code>processPOIFSReaderEvent</code> method. The eventing POI filesystem
calls this method when it finds the <em>\005SummaryInformation</em> calls this method when it finds the <em>\005SummaryInformation</em>
document. In the sample application <code>MyPOIFSReaderListener</code> is document. In the sample application <code>MyPOIFSReaderListener</code> is
a static class in the <em>ReadTitle.java</em> source file.)</p> a static class in the <em>ReadTitle.java</em> source file.</p>
<p>Now everything is prepared and reading the POI filesystem can <p>Now everything is prepared and reading the POI filesystem can
start:</p> start:</p>
@@ -209,10 +209,10 @@ static class MyPOIFSReaderListener implements POIFSReaderListener
case that the POI filesystem does not have a title.</p> case that the POI filesystem does not have a title.</p>
<source>final String title = si.getTitle(); <source>final String title = si.getTitle();
if (title != null) if (title != null)
System.out.println("Title: \"" + title + "\""); System.out.println("Title: \"" + title + "\"");
else else
System.out.println("Document has no title.");</source> System.out.println("Document has no title.");</source>
<p>Please note that a Microsoft Office document does not necessarily <p>Please note that a Microsoft Office document does not necessarily
contain the <em>\005SummaryInformation</em> stream. The documents created contain the <em>\005SummaryInformation</em> stream. The documents created
@@ -249,7 +249,7 @@ static class MyPOIFSReaderListener implements POIFSReaderListener
<p>And of course you cannot call <code>getTitle()</code> because <p>And of course you cannot call <code>getTitle()</code> because
<code>DocumentSummaryInformation</code> has different query methods. See <code>DocumentSummaryInformation</code> has different query methods. See
the API documentation for the details!</p> the Javadoc API documentation for the details!</p>
<p>In the previous section the application simply caught all <p>In the previous section the application simply caught all
<strong>exceptions</strong> and was in no way interested in any <strong>exceptions</strong> and was in no way interested in any
@@ -259,17 +259,19 @@ static class MyPOIFSReaderListener implements POIFSReaderListener
<dl> <dl>
<dt><code>NoPropertySetStreamException</code>:</dt> <dt><code>NoPropertySetStreamException</code>:</dt>
<dd><p>This exception is thrown if the application tries to create a <dd>
<code>PropertySet</code> or one of its subclasses <p>This exception is thrown if the application tries to create a
<code>SummaryInformation</code> and <code>PropertySet</code> instance from a stream that is not a
<code>DocumentSummaryInformation</code> from a stream that is not a property set stream. (<code>SummaryInformation</code> and
property set stream. A faulty property set stream counts as not being a <code>DocumentSummaryInformation</code> are subclasses of
property set stream at all. An application should be prepared to deal <code>PropertySet</code>.) A faulty property set stream counts as not
with this case even if opens streams named being a property set stream at all. An application should be prepared to
deal with this case even if it opens streams named
<em>\005SummaryInformation</em> or <em>\005SummaryInformation</em> or
<em>\005DocumentSummaryInformation</em> only. These are just names. A <em>\005DocumentSummaryInformation</em> only. These are just names. A
stream's name by itself does not ensure that the stream contains the stream's name by itself does not ensure that the stream contains the
expected contents and that this contents is correct.</p></dd> expected contents and that this contents is correct.</p>
</dd>
<dt><code>UnexpectedPropertySetTypeException</code></dt> <dt><code>UnexpectedPropertySetTypeException</code></dt>
<dd><p>This exception is thrown if a certain type of property set is <dd><p>This exception is thrown if a certain type of property set is
@@ -292,7 +294,7 @@ static class MyPOIFSReaderListener implements POIFSReaderListener
document. Embedded objects may have property sets of their own. An document. Embedded objects may have property sets of their own. An
application can open these property set streams as described above. The application can open these property set streams as described above. The
only difference is that they are not located in the POI filesystem's root only difference is that they are not located in the POI filesystem's root
but in a nested directory instead. Just register a but in a <strong>nested directory</strong> instead. Just register a
<code>POIFSReaderListener</code> for the property set streams you are <code>POIFSReaderListener</code> for the property set streams you are
interested in. For example, the <em>POIBrowser</em> application in the interested in. For example, the <em>POIBrowser</em> application in the
contrib section tries to open each and every document in a POI filesystem contrib section tries to open each and every document in a POI filesystem
@@ -303,12 +305,49 @@ static class MyPOIFSReaderListener implements POIFSReaderListener
<anchor id="sec3"/> <anchor id="sec3"/>
<section title="Reading Non-Standard Properties"> <section title="Reading Non-Standard Properties">
<note>This section tells how to read <note>This section tells how to read non-standard properties. Non-standard
non-standard properties. Non-standard properties are application-specific properties are application-specific name/type/value triples.</note>
name/value/type triples.</note>
<fixme author="Rainer Klute">Write this section!</fixme> <p>Now comes the really hardcode stuff. As mentioned above,
<code>SummaryInformation</code> and
<code>DocumentSummaryInformation</code> are just special cases of the
general concept of a property set. The general concept says that a
property set consists of <strong>properties</strong>. Each property is an
entity that has a <strong>name</strong>, a <strong>type</strong>, and a
<strong>value</strong>.</p>
<p>Okay, that was still rather easy. However, to make things more
complicated Microsoft in its infinite wisdom decided that a property set
shalt be broken into <strong>sections</strong>. Each section holds a bunch
of properties. But since that's still not complicated enough: a section
can optionally have a dictionary that maps property IDs to property
names - we'll explain later what that means.</p>
<note>[To be continued.]</note>
<fixme>Let's consider a Java application that wants to read a stream
containing a general property set. It is modelled by the class
<code>PropertySet</code> in the <code>org.apache.poi.hpsf</code>
package.</fixme>
</section> </section>
</section> </section>
</body> </body>
</document> </document>
<!-- Keep this comment at the end of the file
Local variables:
mode: xml
sgml-omittag:nil
sgml-shorttag:nil
sgml-namecase-general:nil
sgml-general-insert-case:lower
sgml-minimize-attributes:nil
sgml-always-quote-attributes:t
sgml-indent-step:1
sgml-indent-data:t
sgml-parent-document:nil
sgml-exposed-tags:nil
sgml-local-catalogs:nil
sgml-local-ecat-files:nil
End:
-->
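
A note for readers of the how-to text in this diff: the `\005` notation in the stream name stands for one character with the decimal value 5, not for the four characters backslash, 0, 0, 5. The following is a minimal Java sketch of building and checking such a name. The class and method names are hypothetical and are not part of HPSF.

```java
/** Illustration only: hypothetical helper, not part of the HPSF API. */
final class StreamNames {

    /**
     * Name of the summary information stream. In a Java string literal,
     * "\005" is an octal escape for the single character with value 5.
     */
    static final String SUMMARY_INFORMATION = "\005SummaryInformation";

    /** Returns true only if the name starts with the control character 5. */
    static boolean isSummaryInformation(String streamName) {
        return SUMMARY_INFORMATION.equals(streamName);
    }

    public static void main(String[] args) {
        // The first character is the control character with value 5.
        System.out.println((int) SUMMARY_INFORMATION.charAt(0)); // prints 5

        // A name without the leading control character does not match.
        System.out.println(isSummaryInformation("SummaryInformation")); // prints false
    }
}
```

Comparing against the full name, control character included, avoids accidentally matching an ordinary stream that merely happens to be called "SummaryInformation".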


@@ -7,27 +7,30 @@
<title>HPSF (Horrible Property Set Format)</title> <title>HPSF (Horrible Property Set Format)</title>
<subtitle>Overview</subtitle> <subtitle>Overview</subtitle>
<authors> <authors>
<person name="Rainer Klute" email="klute@rainer-klute.de"/> <person name="Rainer Klute" email="klute@apache.org"/>
</authors> </authors>
</header> </header>
<body> <body>
<section title="Overview"> <section title="Overview">
<p>Microsoft applications like "Word" or "Excel" let the user describe his <p>Microsoft applications like "Word", "Excel" or "Powerpoint" let the user
document by properties like "title", "category" and so on. The application describe his document by properties like "title", "category" and so on. The
itself adds further information: last author, creation date etc. These application itself adds further information: last author, creation date
properties are stored in so-called <strong>property set streams</strong>. A etc. These document properties are stored in so-called <strong>property set
property set stream is a separate document within a <link streams</strong>. A property set stream is a separate document within a
href="../poifs/index.html">POI filesystem</link>. HPSF is POI's pure-Java <link href="../poifs/index.html">POI filesystem</link>. We'll call property
implementation to read (and in future to write) property set streams.</p> set streams mostly just "property sets". HPSF is POI's pure-Java
implementation to read (and in future to write) property sets.</p>
<p>The <link href="how-to.html">HPSF HOWTO</link> describes what a Java <p>The <link href="how-to.html">HPSF HOWTO</link> describes what a Java
application should do to read a property set using HPSF and to retrieve the application should do to read a property set using HPSF and to retrieve the
information it needs.</p> information it needs.</p>
<p>HPSF supports OLE2 property set streams in general, not only the special <p>HPSF supports OLE2 property set streams in general, and is not limited to
case of document properties mentioned above. The <link the special case of document properties in the Microsoft Office files
href="internals.html">HPSF description</link> describes the internal mentioned above. The <link href="internals.html">HPSF description</link>
structure of property set streams.</p> describes the internal structure of property set streams. A separate
document explains the internal of <link href="thumbnails.html">thumbnail
images</link>.</p>
</section> </section>
</body> </body>
</document> </document>


@@ -13,21 +13,17 @@
<body> <body>
<section title="The VT_CF Format"> <section title="The VT_CF Format">
<p> <p>Thumbnail information is stored as a VT_CF, or Thumbnail Variant. The
Thumbnail information is stored as a VT_CF, or Thumbnail Variant. Thumbnail Variant is used to store various types of information in a
The Thumbnail Variant is used to store various types of information clipboard. The VT_CF can store information in formats for the Macintosh or
in a clipboard. The VT_CF can store information in formats for the Windows clipboard.</p>
Macintosh or Windows clipboard.
</p> <p>There are many types of data that can be copied to the clipboard, but the
only types of information needed for thumbnail manipulation are the image
formats.</p>
<p>
There are many types of data that can be copied to the clipboard,
but the only types of information needed for thumbnail manipulation are
the image formats.
</p>
<p>The <code>VT_CF</code> structure looks like this:</p> <p>The <code>VT_CF</code> structure looks like this:</p>
<table> <table>
<tr> <tr>
<th>Element:</th> <th>Element:</th>
@@ -43,11 +39,9 @@
</tr> </tr>
</table> </table>
<p> <p>The Clipboard Size refers to the size (in bytes) of Clipboard Data
The Clipboard Size refers to the size (in bytes) of Clipboard Data (variable size) plus the Clipboard Format (four bytes).</p>
(variable size) plus the Clipboard Format (four bytes).
</p>
<p>Clipboard Format Tag has four possible values:</p> <p>Clipboard Format Tag has four possible values:</p>
<table> <table>
@@ -83,16 +77,14 @@
<section title="Windows Clipboard Data"> <section title="Windows Clipboard Data">
<p> <p>Windows clipboard data has four image formats for thumbnails:</p>
Windows clipboard data has four image formats for thumbnails:
</p>
<table> <table>
<tr> <tr>
<th>Value</th> <th>Value</th>
<th>Identifier</th> <th>Identifier</th>
<th>Description</th> <th>Description</th>
</tr> </tr>
<tr> <tr>
<td>3</td> <td>3</td>
<td><code>CF_METAFILEPICT</code></td> <td><code>CF_METAFILEPICT</code></td>
@@ -102,75 +94,89 @@
<td>8</td> <td>8</td>
<td><code>CF_DIB</code></td> <td><code>CF_DIB</code></td>
<td>Device Independent Bitmap</td> <td>Device Independent Bitmap</td>
</tr> </tr>
<tr> <tr>
<td>14</td> <td>14</td>
<td><code>CF_ENHMETAFILE</code></td> <td><code>CF_ENHMETAFILE</code></td>
<td>Enhanced Windows metafile format</td> <td>Enhanced Windows metafile format</td>
</tr> </tr>
<tr> <tr>
<td>2</td> <td>2</td>
<td><code>CF_BITMAP</code></td> <td><code>CF_BITMAP</code></td>
<td>Bitmap - Obsolete - Use <code>CF_DIB</code> instead</td> <td>Bitmap - Obsolete - Use <code>CF_DIB</code> instead</td>
</tr> </tr>
</table> </table>
<section title="Windows Metafile Format">
<p>
The most common format for thumbnails on the Windows platform
is the Windows metafile format. The Clipboard places and extra
header in front of a the standard Windows Metafile Format data.
</p>
<p>
The Clipboard Data byte array looks like this when an image is
stored in Windows' Clipboard WMF format.
</p>
<table>
<tr>
<th>Identifier</th>
<td>CF_METAFILEPICT</td>
<td>mm</td>
<td>width</td>
<td>height</td>
<td>handle</td>
<td>WMF data</td>
</tr>
<tr>
<th>Size</th>
<td>32 bit unsigned int</td>
<td>16 bit unsigned(?) int</td>
<td>16 bit unsigned(?) int</td>
<td>16 bit unsigned(?) int</td>
<td>16 bit unsigned(?) int</td>
<td>byte array - variable length</td>
</tr>
<tr>
<th>Description</th>
<td>Clipboard WMF</td>
<td>Mapping Mode</td>
<td>Image Width</td>
<td>Image Height</td>
<td>handle to the WMF data array in memory, or 0</td>
<td>standard WMF byte stream</td>
</tr>
</table>
</section>
<section title="Device Independent Bitmap">
<p><strong>FIXME:</strong> Document Device Independent Bitmap format</p>
</section>
</section> </section>
<section title="Windows Metafile Format">
<p>The most common format for thumbnails on the Windows platform is the
Windows metafile format. The Clipboard places and extra header in front of
a the standard Windows Metafile Format data.</p>
<p>The Clipboard Data byte array looks like this when an image is stored in
Windows' Clipboard WMF format.</p>
<table>
<tr>
<th>Identifier</th>
<td>CF_METAFILEPICT</td>
<td>mm</td>
<td>width</td>
<td>height</td>
<td>handle</td>
<td>WMF data</td>
</tr>
<tr>
<th>Size</th>
<td>32 bit unsigned int</td>
<td>16 bit unsigned(?) int</td>
<td>16 bit unsigned(?) int</td>
<td>16 bit unsigned(?) int</td>
<td>16 bit unsigned(?) int</td>
<td>byte array - variable length</td>
</tr>
<tr>
<th>Description</th>
<td>Clipboard WMF</td>
<td>Mapping Mode</td>
<td>Image Width</td>
<td>Image Height</td>
<td>handle to the WMF data array in memory, or 0</td>
<td>standard WMF byte stream</td>
</tr>
</table>
</section>
<section title="Device Independent Bitmap">
<p><strong>FIXME:</strong> Describe the Device Independent Bitmap
format!</p>
</section>
<section title="Macintosh Clipboard Data"> <section title="Macintosh Clipboard Data">
<p><strong>FIXME:</strong> Document Macintosh clipboard formats.</p> <p><strong>FIXME:</strong> Describe the Macintosh clipboard formats!</p>
</section> </section>
</body> </body>
</document> </document>
<!-- Keep this comment at the end of the file
Local variables:
mode: xml
sgml-omittag:nil
sgml-shorttag:nil
sgml-namecase-general:nil
sgml-general-insert-case:lower
sgml-minimize-attributes:nil
sgml-always-quote-attributes:t
sgml-indent-step:1
sgml-indent-data:t
sgml-parent-document:nil
sgml-exposed-tags:nil
sgml-local-catalogs:nil
sgml-local-ecat-files:nil
End:
-->
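
The Clipboard WMF layout tabulated in the thumbnails diff above (identifier, mapping mode, width, height, handle, WMF data) can be sketched as a small parser. This is an illustration only, not HPSF code. The class and field names are hypothetical, little-endian byte order is assumed (as is usual for OLE2 data), and the 16-bit fields are read as unsigned although the table itself marks their signedness as uncertain.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Illustration only: hypothetical parser for the Clipboard WMF layout. */
final class ClipboardWmf {

    /** Clipboard format value for CF_METAFILEPICT, taken from the table. */
    static final long CF_METAFILEPICT = 3L;

    /** One 32-bit field plus four 16-bit fields precede the WMF data. */
    static final int HEADER_SIZE = 4 + 4 * 2;

    final long identifier;
    final int mappingMode;
    final int width;
    final int height;
    final int handle;
    final byte[] wmfData;

    private ClipboardWmf(long identifier, int mappingMode, int width,
                         int height, int handle, byte[] wmfData) {
        this.identifier = identifier;
        this.mappingMode = mappingMode;
        this.width = width;
        this.height = height;
        this.handle = handle;
        this.wmfData = wmfData;
    }

    /** Splits the clipboard data into the header fields and the WMF bytes. */
    static ClipboardWmf parse(byte[] clipboardData) {
        if (clipboardData.length < HEADER_SIZE) {
            throw new IllegalArgumentException(
                "Need at least " + HEADER_SIZE + " bytes, got " + clipboardData.length);
        }
        // Assumption: fields are stored in little-endian order.
        ByteBuffer buf = ByteBuffer.wrap(clipboardData).order(ByteOrder.LITTLE_ENDIAN);
        long identifier = buf.getInt() & 0xFFFFFFFFL; // 32 bit unsigned int
        int mappingMode = buf.getShort() & 0xFFFF;    // 16 bit, read as unsigned
        int width = buf.getShort() & 0xFFFF;
        int height = buf.getShort() & 0xFFFF;
        int handle = buf.getShort() & 0xFFFF;
        byte[] wmfData = new byte[buf.remaining()];   // standard WMF byte stream
        buf.get(wmfData);
        return new ClipboardWmf(identifier, mappingMode, width, height, handle, wmfData);
    }

    public static void main(String[] args) {
        // Made-up sample: identifier 3, mapping mode 8, 100 x 50, handle 0, two data bytes.
        byte[] sample = {3, 0, 0, 0, 8, 0, 100, 0, 50, 0, 0, 0, 1, 2};
        ClipboardWmf wmf = parse(sample);
        System.out.println(wmf.width + "x" + wmf.height); // prints 100x50
    }
}
```

A caller would normally check that `identifier` equals `CF_METAFILEPICT` before treating `wmfData` as a standard WMF byte stream.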