From 641974525ab8040e733a8694d4fa8190357cce80 Mon Sep 17 00:00:00 2001 From: Rainer Klute Date: Wed, 6 Mar 2002 09:03:53 +0000 Subject: [PATCH] - Added first sections to HPSF HOW-TO. git-svn-id: https://svn.apache.org/repos/asf/jakarta/poi/trunk@352153 13f79535-47bb-0310-9956-ffa450edef68 --- build/jakarta-poi/docs/changes.html | 2 +- build/jakarta-poi/docs/hpsf/how-to.html | 500 +++++++++++++++++++++++- build/jakarta-poi/docs/who.html | 3 +- src/documentation/xdocs/hpsf/how-to.xml | 314 ++++++++++++++- 4 files changed, 800 insertions(+), 19 deletions(-) diff --git a/build/jakarta-poi/docs/changes.html b/build/jakarta-poi/docs/changes.html index 29992d85b0..e3c5344868 100644 --- a/build/jakarta-poi/docs/changes.html +++ b/build/jakarta-poi/docs/changes.html @@ -143,7 +143,7 @@
- +
1.1-dev (March 3 2002) 1.1-dev (March 6 2002)
diff --git a/build/jakarta-poi/docs/hpsf/how-to.html b/build/jakarta-poi/docs/hpsf/how-to.html index 2987064335..7e6bc47384 100644 --- a/build/jakarta-poi/docs/hpsf/how-to.html +++ b/build/jakarta-poi/docs/hpsf/how-to.html @@ -73,11 +73,503 @@

- -

TODO: This documentation is still to be written. For the - time being, please see the API documentation (javadocs) of the - org.apache.poi.hpsf package.

+ + +

This HOW-TO is organized in three sections. You should read them + sequentially because the later sections build upon the earlier ones.

+ + +
    +
  1. + +

    The first section explains how to read + the most important standard properties of a Microsoft Office + document. Standard properties are things like title, author, creation + date etc. It is quite likely that you will find here what you need and + don't have to read the other sections.

    + +
  2. + + +
  3. + +

    The second section goes a small step + further and focusses on reading additional standard properties. It also + talks about exceptions that may be thrown when dealing with HPSF and + shows how you can read properties of embedded objects.

    + +
  4. + + +
  5. + +

    The third section tells how to read + non-standard properties. Non-standard properties are application-specific + name/value/type triples.

    + +
  6. + +
+ + + + +
+ + + + + + + +
Reading Standard Properties
+
+ + +This section explains how to read + the most important standard properties of a Microsoft Office + document. Standard properties are things like title, author, creation + date etc. Chances are that you will find here what you need and + don't have to read the other sections. + + +

The first thing you should understand is that properties are stored in + separate documents inside the POI filesystem. (If you don't know what a + POI filesystem is, read its documentation.) A document in a POI + filesystem is also called a stream.

+ + +

The following example shows how to read a POI filesystem's + "title" property. Reading other properties is similar. Consult the API + documentation of org.apache.poi.hpsf.SummaryInformation.

+ + +

The standard properties this section focusses on can be + found in a document called \005SummaryInformation in the root of + the POI filesystem. The notation \005 in the document's name + means the character with the decimal value of 5. In order to read the + title, an application has to perform the following steps:

+ + +
    + +
  1. + +

    Open the document \005SummaryInformation located in the root + of the POI filesystem.

    + +
  2. + +
  3. + +

    Create an instance of the class + SummaryInformation from that + document.

    + +
  4. + +
  5. + +

    Call the SummaryInformation instance's + getTitle() method.

    + +
  6. + +
+ + +
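The \005 notation in the stream name can be checked with plain Java, where "\005" in a string literal is an octal escape for the character with the value 5. The following is a minimal sketch that needs only the JDK; the class name StreamNames and the constant SUMMARY_INFORMATION are made up for this illustration and are not part of the POI API.

```java
class StreamNames {
    // Hypothetical constant for this sketch: the stream name as it would be
    // written in Java source. "\005" is an octal escape (octal 5 = decimal 5).
    static final String SUMMARY_INFORMATION = "\005SummaryInformation";

    /** Returns the numeric value of the first character of a stream name. */
    static int firstCharValue(String name) {
        return name.charAt(0);
    }

    public static void main(String[] args) {
        // The first character is the control character with value 5 ...
        System.out.println(firstCharValue(SUMMARY_INFORMATION));
        // ... and the rest of the name is ordinary text.
        System.out.println(SUMMARY_INFORMATION.substring(1));
    }
}
```

The leading character is not printable, so a stream name typed without it will not match the stream.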

Sounds easy, doesn't it? Here are the steps in detail.

+ + + +
+ + + + + + + +
Open the document \005SummaryInformation in the root of the POI filesystem
+
+ + +

An application that wants to open a document in a POI filesystem + (POIFS) proceeds as shown by the following code fragment. (The full + source code of the sample application is available in the + examples section of the POI source tree as + ReadTitle.java.)

+ + +
+ + + + +
+
+import java.io.*;
+import org.apache.poi.hpsf.*;
+import org.apache.poi.poifs.eventfilesystem.*;
+
+// ...
+
+public static void main(String[] args)
+    throws IOException
+{
+    final String filename = args[0];
+    POIFSReader r = new POIFSReader();
+    r.registerListener(new MyPOIFSReaderListener(),
+                       "\005SummaryInformation");
+    r.read(new FileInputStream(filename));
+}
+
+
+ + +

The first interesting statement is

+ + +
+ + + + +
+
POIFSReader r = new POIFSReader();
+
+
+ + +

It creates a + org.apache.poi.poifs.eventfilesystem.POIFSReader instance + which we shall need to read the POI filesystem. Before the application + actually opens the POI filesystem we have to tell the + POIFSReader which documents we are interested in. In this + case the application should do something with the document + \005SummaryInformation.

+ + +
+ + + + +
+
+r.registerListener(new MyPOIFSReaderListener(),
+                   "\005SummaryInformation");
+
+
+ + +

This method call registers a + org.apache.poi.poifs.eventfilesystem.POIFSReaderListener + with the POIFSReader. The POIFSReaderListener + interface specifies the method processPOIFSReaderEvent + which processes a document. The class + MyPOIFSReaderListener implements the + POIFSReaderListener interface and thus the + processPOIFSReaderEvent method. The eventing POI filesystem + calls this method when it finds the \005SummaryInformation + document. (In the sample application MyPOIFSReaderListener is + a static class in the ReadTitle.java source file.)

+ + +

Now everything is prepared and reading the POI filesystem can + start:

+ + +
+ + + + +
+
r.read(new FileInputStream(filename));
+
+
+ + +

The following source code fragment shows the + MyPOIFSReaderListener class and how it retrieves the + title.

+ + +
+ + + + +
+
+static class MyPOIFSReaderListener implements POIFSReaderListener
+{
+    public void processPOIFSReaderEvent(POIFSReaderEvent e)
+    {
+        SummaryInformation si = null;
+        try
+        {
+            si = (SummaryInformation)
+                 PropertySetFactory.create(e.getStream());
+        }
+        catch (Exception ex)
+        {
+            throw new RuntimeException
+                ("Property set stream \"" +
+                 e.getPath() + e.getName() + "\": " + ex);
+        }
+        final String title = si.getTitle();
+        if (title != null)
+            System.out.println("Title: \"" + title + "\"");
+        else
+            System.out.println("Document has no title.");
+    }
+}
+
+
+
+ + +

The line

+ + +
+ + + + +
+
SummaryInformation si = null;
+
+
+ + +

declares a SummaryInformation variable and initializes it + with null. We need an instance of this class to access the + title. The instance is created in a try block:

+ + +
+ + + + +
+
si = (SummaryInformation)
+                 PropertySetFactory.create(e.getStream());
+
+
+ + +

The expression e.getStream() returns the input stream + containing the bytes of the property set stream named + \005SummaryInformation. This stream is passed into the + create method of the factory class + org.apache.poi.hpsf.PropertySetFactory which returns + a org.apache.poi.hpsf.PropertySet instance. It is more or + less safe to cast this result to SummaryInformation, a + convenience class with methods like getTitle(), + getAuthor() etc.

+ + +

The PropertySetFactory.create method may throw all sorts + of exceptions. We'll deal with them in the next sections. For now we just + catch all exceptions and throw a RuntimeException + containing the message text of the original exception.

+ + +

If all goes well, the sample application retrieves the title and prints + it to the standard output. As you can see you must be prepared for the + case that the POI filesystem does not have a title.

+ + +
+ + + + +
+
final String title = si.getTitle();
+    if (title != null)
+        System.out.println("Title: \"" + title + "\"");
+    else
+        System.out.println("Document has no title.");
+
+
+ + +

Please note that a Microsoft Office document does not necessarily + contain the \005SummaryInformation stream. The documents created + by the Microsoft Office suite have one, as far as I know. However, an + Excel spreadsheet exported from StarOffice 5.2 won't have a + \005SummaryInformation stream. In this case the application + won't throw an exception but simply does not call the + processPOIFSReaderEvent method. You have been warned!
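Because a missing stream produces no callback at all, an application that needs to notice this can record whether its listener ever ran. The sketch below shows the idea with a toy event reader; ToyEventReader and summaryStreamSeen are hypothetical stand-ins invented for this illustration and are not the POIFSReader API.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MissingStreamDemo {
    /** Hypothetical stand-in for an event-driven reader; NOT the POI API. */
    static class ToyEventReader {
        private final Map<String, Runnable> listeners = new HashMap<>();

        void registerListener(Runnable listener, String streamName) {
            listeners.put(streamName, listener);
        }

        /** Calls the listener of every registered stream name that is present. */
        void read(List<String> streamsInFile) {
            for (String name : streamsInFile) {
                Runnable listener = listeners.get(name);
                if (listener != null) {
                    listener.run();
                }
            }
        }
    }

    /** Returns true if the listener for the summary stream was called. */
    static boolean summaryStreamSeen(List<String> streamsInFile) {
        final boolean[] seen = {false};   // set by the listener if it runs
        ToyEventReader r = new ToyEventReader();
        r.registerListener(() -> seen[0] = true, "\005SummaryInformation");
        r.read(streamsInFile);
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(summaryStreamSeen(
            Arrays.asList("Workbook", "\005SummaryInformation")));
        System.out.println(summaryStreamSeen(Arrays.asList("Workbook")));
    }
}
```

With a real POIFSReaderListener the same flag could be set inside processPOIFSReaderEvent and checked after the read call returns.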

+ +
+
+
+ +
+
+
+ + + + +
+ + + + + + + +
Additional Standard Properties, Exceptions And Embedded Objects
+
+ + +This section focusses on reading additional standard properties. It + also talks about exceptions that may be thrown when dealing with HPSF and + shows how you can read properties of embedded objects. + + +

A couple of additional standard properties are not + contained in the \005SummaryInformation stream explained above, + for example a document's category or the number of multimedia clips in a + PowerPoint presentation. Microsoft has invented an additional stream named + \005DocumentSummaryInformation to hold these properties. With two + minor exceptions you can proceed exactly as described above to read the + properties stored in \005DocumentSummaryInformation:

+ + +
    + +
  • +

    Instead of \005SummaryInformation use + \005DocumentSummaryInformation as the stream's name.

    +
  • + +
  • +

    Replace all occurrences of the class + SummaryInformation by + DocumentSummaryInformation.

    +
  • + +
+ + +

And of course you cannot call getTitle() because + DocumentSummaryInformation has different query methods. See + the API documentation for the details!

+ + +

In the previous section the application simply caught all + exceptions and was in no way interested in any + details. However, a real application will likely want to know what went + wrong and act appropriately. Besides any IO exceptions there are three + exceptions specific to HPSF or POI that you should know about:

+ + +
+ +
+NoPropertySetStreamException
+ +
+

This exception is thrown if the application tries to create a + PropertySet or one of its subclasses + SummaryInformation and + DocumentSummaryInformation from a stream that is not a + property set stream. A faulty property set stream counts as not being a + property set stream at all. An application should be prepared to deal + with this case even if it opens streams named + \005SummaryInformation or + \005DocumentSummaryInformation only. These are just names. A + stream's name by itself does not ensure that the stream contains the + expected contents and that these contents are correct.

+
+ + +
+UnexpectedPropertySetTypeException +
+ +
+

This exception is thrown if a certain type of property set is + expected somewhere (e.g. a SummaryInformation or + DocumentSummaryInformation) but the provided property + set is not of that type.

+
+ + +
+MarkUnsupportedException +
+ +
+

This exception is thrown if an input stream that is to be parsed + into a property set does not support the + InputStream.mark(int) operation. The POI filesystem uses + the DocumentInputStream class which does support this + operation, so you are safe here. However, if you read a property set + stream from another kind of input stream things may be + different.

+
+ +
+ + +
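Whether a given input stream supports mark can be checked with the standard InputStream.markSupported() method, and java.io.BufferedInputStream adds mark support to a stream that lacks it. The following is a minimal JDK-only sketch; the class MarkSupport and the helper ensureMarkSupported are names made up for this illustration, and it is an assumption (not stated in this HOW-TO) that a buffered stream is acceptable wherever a property set stream is parsed.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.InputStream;

class MarkSupport {
    /**
     * Returns the stream itself if it already supports mark/reset,
     * otherwise a BufferedInputStream wrapped around it.
     */
    static InputStream ensureMarkSupported(InputStream in) {
        return in.markSupported() ? in : new BufferedInputStream(in);
    }

    public static void main(String[] args) {
        // ByteArrayInputStream supports mark, so it is returned unchanged.
        InputStream bytes = new ByteArrayInputStream(new byte[] {1, 2, 3});
        System.out.println(ensureMarkSupported(bytes) == bytes);
    }
}
```

A java.io.FileInputStream, for example, does not support mark on its own, so it would be wrapped by this helper.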

Many Microsoft Office documents contain embedded + objects, for example an Excel sheet on a page in a Word + document. Embedded objects may have property sets of their own. An + application can open these property set streams as described above. The + only difference is that they are not located in the POI filesystem's root + but in a nested directory instead. Just register a + POIFSReaderListener for the property set streams you are + interested in. For example, the POIBrowser application in the + contrib section tries to open each and every document in a POI filesystem + as a property set stream. If this operation is successful it displays the + properties.

+ +
+
+
+ + + + +
+ + + + + + + +
Reading Non-Standard Properties
+
+ + +This section tells how to read + non-standard properties. Non-standard properties are application-specific + name/value/type triples. + + +
+ + + + +
Write this section!
+
+ +
+
+
+
diff --git a/build/jakarta-poi/docs/who.html b/build/jakarta-poi/docs/who.html index 91c5a94d43..bbb24f4fc7 100644 --- a/build/jakarta-poi/docs/who.html +++ b/build/jakarta-poi/docs/who.html @@ -212,7 +212,8 @@
  • Glen Stampoultzis (glens at apache.org)
  • -
  • Rainer Klute (klute at rainer-klute dot de)
  • +
  • +Rainer Klute (klute at apache dot org)
  • diff --git a/src/documentation/xdocs/hpsf/how-to.xml b/src/documentation/xdocs/hpsf/how-to.xml index 4436b14bd7..4fa55b474d 100644 --- a/src/documentation/xdocs/hpsf/how-to.xml +++ b/src/documentation/xdocs/hpsf/how-to.xml @@ -1,17 +1,305 @@ -
    - HPSF HOW-TO - - - -
    - - -

    TODO: This documentation is still to be written. For the - time being, please see the API documentation (javadocs) of the - org.apache.poi.hpsf package.

    -
    - +
    + HPSF HOW-TO + + + +
    + + + +

    This HOW-TO is organized in three sections. You should read them + sequentially because the later sections build upon the earlier ones.

    + +
      +
    1. +

      The first section explains how to read + the most important standard properties of a Microsoft Office + document. Standard properties are things like title, author, creation + date etc. It is quite likely that you will find here what you need and + don't have to read the other sections.

      +
    2. + +
    3. +

      The second section goes a small step + further and focusses on reading additional standard properties. It also + talks about exceptions that may be thrown when dealing with HPSF and + shows how you can read properties of embedded objects.

      +
    4. + +
    5. +

      The third section tells how to read + non-standard properties. Non-standard properties are application-specific + name/value/type triples.

      +
    6. +
    + + + + + This section explains how to read + the most important standard properties of a Microsoft Office + document. Standard properties are things like title, author, creation + date etc. Chances are that you will find here what you need and + don't have to read the other sections. + +

    The first thing you should understand is that properties are stored in + separate documents inside the POI filesystem. (If you don't know what a + POI filesystem is, read its documentation.) A document in a POI + filesystem is also called a stream.

    + +

    The following example shows how to read a POI filesystem's + "title" property. Reading other properties is similar. Consult the API + documentation of org.apache.poi.hpsf.SummaryInformation.

    + +

    The standard properties this section focusses on can be + found in a document called \005SummaryInformation in the root of + the POI filesystem. The notation \005 in the document's name + means the character with the decimal value of 5. In order to read the + title, an application has to perform the following steps:

    + +
      +
    1. +

      Open the document \005SummaryInformation located in the root + of the POI filesystem.

      +
    2. +
    3. +

      Create an instance of the class + SummaryInformation from that + document.

      +
    4. +
    5. +

      Call the SummaryInformation instance's + getTitle() method.

      +
    6. +
    + +

    Sounds easy, doesn't it? Here are the steps in detail.

    + + + + +

    An application that wants to open a document in a POI filesystem + (POIFS) proceeds as shown by the following code fragment. (The full + source code of the sample application is available in the + examples section of the POI source tree as + ReadTitle.java.)

    + + +import java.io.*; +import org.apache.poi.hpsf.*; +import org.apache.poi.poifs.eventfilesystem.*; + +// ... + +public static void main(String[] args) + throws IOException +{ + final String filename = args[0]; + POIFSReader r = new POIFSReader(); + r.registerListener(new MyPOIFSReaderListener(), + "\005SummaryInformation"); + r.read(new FileInputStream(filename)); +} + +

    The first interesting statement is

    + + POIFSReader r = new POIFSReader(); + +

    It creates a + org.apache.poi.poifs.eventfilesystem.POIFSReader instance + which we shall need to read the POI filesystem. Before the application + actually opens the POI filesystem we have to tell the + POIFSReader which documents we are interested in. In this + case the application should do something with the document + \005SummaryInformation.

    + + +r.registerListener(new MyPOIFSReaderListener(), + "\005SummaryInformation"); + +

    This method call registers a + org.apache.poi.poifs.eventfilesystem.POIFSReaderListener + with the POIFSReader. The POIFSReaderListener + interface specifies the method processPOIFSReaderEvent + which processes a document. The class + MyPOIFSReaderListener implements the + POIFSReaderListener interface and thus the + processPOIFSReaderEvent method. The eventing POI filesystem + calls this method when it finds the \005SummaryInformation + document. (In the sample application MyPOIFSReaderListener is + a static class in the ReadTitle.java source file.)

    + +

    Now everything is prepared and reading the POI filesystem can + start:

    + + r.read(new FileInputStream(filename)); + +

    The following source code fragment shows the + MyPOIFSReaderListener class and how it retrieves the + title.

    + + +static class MyPOIFSReaderListener implements POIFSReaderListener +{ + public void processPOIFSReaderEvent(POIFSReaderEvent e) + { + SummaryInformation si = null; + try + { + si = (SummaryInformation) + PropertySetFactory.create(e.getStream()); + } + catch (Exception ex) + { + throw new RuntimeException + ("Property set stream \"" + + e.getPath() + e.getName() + "\": " + ex); + } + final String title = si.getTitle(); + if (title != null) + System.out.println("Title: \"" + title + "\""); + else + System.out.println("Document has no title."); + } +} + + +

    The line

    + + SummaryInformation si = null; + +

    declares a SummaryInformation variable and initializes it + with null. We need an instance of this class to access the + title. The instance is created in a try block:

    + + si = (SummaryInformation) + PropertySetFactory.create(e.getStream()); + +

    The expression e.getStream() returns the input stream + containing the bytes of the property set stream named + \005SummaryInformation. This stream is passed into the + create method of the factory class + org.apache.poi.hpsf.PropertySetFactory which returns + a org.apache.poi.hpsf.PropertySet instance. It is more or + less safe to cast this result to SummaryInformation, a + convenience class with methods like getTitle(), + getAuthor() etc.

    + +

    The PropertySetFactory.create method may throw all sorts + of exceptions. We'll deal with them in the next sections. For now we just + catch all exceptions and throw a RuntimeException + containing the message text of the original exception.

    + +

    If all goes well, the sample application retrieves the title and prints + it to the standard output. As you can see you must be prepared for the + case that the POI filesystem does not have a title.

    + + final String title = si.getTitle(); + if (title != null) + System.out.println("Title: \"" + title + "\""); + else + System.out.println("Document has no title."); + +

    Please note that a Microsoft Office document does not necessarily + contain the \005SummaryInformation stream. The documents created + by the Microsoft Office suite have one, as far as I know. However, an + Excel spreadsheet exported from StarOffice 5.2 won't have a + \005SummaryInformation stream. In this case the application + won't throw an exception but simply does not call the + processPOIFSReaderEvent method. You have been warned!

    +
    +
    + + + + + This section focusses on reading additional standard properties. It + also talks about exceptions that may be thrown when dealing with HPSF and + shows how you can read properties of embedded objects. + +

    A couple of additional standard properties are not + contained in the \005SummaryInformation stream explained above, + for example a document's category or the number of multimedia clips in a + PowerPoint presentation. Microsoft has invented an additional stream named + \005DocumentSummaryInformation to hold these properties. With two + minor exceptions you can proceed exactly as described above to read the + properties stored in \005DocumentSummaryInformation:

    + +
      +
    • Instead of \005SummaryInformation use + \005DocumentSummaryInformation as the stream's name.

    • +
    • Replace all occurrences of the class + SummaryInformation by + DocumentSummaryInformation.

    • +
    + +

    And of course you cannot call getTitle() because + DocumentSummaryInformation has different query methods. See + the API documentation for the details!

    + +

    In the previous section the application simply caught all + exceptions and was in no way interested in any + details. However, a real application will likely want to know what went + wrong and act appropriately. Besides any IO exceptions there are three + exceptions specific to HPSF or POI that you should know about:

    + +
    +
    NoPropertySetStreamException
    +

    This exception is thrown if the application tries to create a + PropertySet or one of its subclasses + SummaryInformation and + DocumentSummaryInformation from a stream that is not a + property set stream. A faulty property set stream counts as not being a + property set stream at all. An application should be prepared to deal + with this case even if it opens streams named + \005SummaryInformation or + \005DocumentSummaryInformation only. These are just names. A + stream's name by itself does not ensure that the stream contains the + expected contents and that these contents are correct.

    + +
    UnexpectedPropertySetTypeException
    +

    This exception is thrown if a certain type of property set is + expected somewhere (e.g. a SummaryInformation or + DocumentSummaryInformation) but the provided property + set is not of that type.

    + +
    MarkUnsupportedException
    +

    This exception is thrown if an input stream that is to be parsed + into a property set does not support the + InputStream.mark(int) operation. The POI filesystem uses + the DocumentInputStream class which does support this + operation, so you are safe here. However, if you read a property set + stream from another kind of input stream things may be + different.

    +
    + +

    Many Microsoft Office documents contain embedded + objects, for example an Excel sheet on a page in a Word + document. Embedded objects may have property sets of their own. An + application can open these property set streams as described above. The + only difference is that they are not located in the POI filesystem's root + but in a nested directory instead. Just register a + POIFSReaderListener for the property set streams you are + interested in. For example, the POIBrowser application in the + contrib section tries to open each and every document in a POI filesystem + as a property set stream. If this operation is successful it displays the + properties.

    +
    + + + + + This section tells how to read + non-standard properties. Non-standard properties are application-specific + name/value/type triples. + + Write this section! + +
    +