diff --git a/build/jakarta-poi/docs/changes.html b/build/jakarta-poi/docs/changes.html
index 29992d85b0..e3c5344868 100644
--- a/build/jakarta-poi/docs/changes.html
+++ b/build/jakarta-poi/docs/changes.html
@@ -143,7 +143,7 @@
-1.1-dev (March 3 2002)
+1.1-dev (March 6 2002)
diff --git a/build/jakarta-poi/docs/hpsf/how-to.html b/build/jakarta-poi/docs/hpsf/how-to.html
index 2987064335..7e6bc47384 100644
--- a/build/jakarta-poi/docs/hpsf/how-to.html
+++ b/build/jakarta-poi/docs/hpsf/how-to.html
@@ -73,11 +73,503 @@
- TODO: This documentation is still to be written. For the
- time being, please see the API documentation (javadocs) of the
- org.apache.poi.hpsf package.
This HOW-TO is organized in three sections. You should read them sequentially because the later sections build upon the earlier ones.
The first section explains how to read the most important standard properties of a Microsoft Office document. Standard properties are things like title, author, creation date, etc. Chances are that you will find what you need here and won't have to read the other sections.
The second section goes a small step further and focuses on reading additional standard properties. It also covers exceptions that may be thrown when working with HPSF and shows how to read properties of embedded objects.
The third section explains how to read non-standard properties. Non-standard properties are application-specific name/value/type triples.
The first thing you should understand is that properties are stored in separate documents inside the POI filesystem. (If you don't know what a POI filesystem is, read its documentation.) A document in a POI filesystem is also called a stream.
The following example shows how to read a POI filesystem's "title" property. Reading other properties works similarly; refer to the API documentation of org.apache.poi.hpsf.SummaryInformation for the available methods.
The standard properties this section focuses on can be found in a document called \005SummaryInformation in the root of the POI filesystem. The notation \005 in the document's name means the character with the decimal value 5. To read the title, an application has to perform the following steps:
1. Open the document \005SummaryInformation located in the root of the POI filesystem.
2. Create an instance of the class SummaryInformation from that document.
3. Call the SummaryInformation instance's getTitle() method.
Sounds easy, doesn't it? Here are the steps in detail.
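Step 1 refers to the stream by a name whose first character has the value 5. In Java source code such a name can be written with the octal escape \005 (octal 5 equals decimal 5), and because that character is invisible when printed, it may help to render it explicitly in log output. The following is a small sketch; the class StreamNames and its printable method are hypothetical helpers written for illustration, not part of POI.

```java
/** Hypothetical helper: the summary stream name and a printable rendering of it. */
final class StreamNames {
    // "\005" is a Java octal escape for the character with value 5.
    static final String SUMMARY_INFORMATION = "\005SummaryInformation";

    /** Renders characters below U+0020 as a backslash plus three octal digits, e.g. "\005". */
    static String printable(String name) {
        StringBuilder sb = new StringBuilder();
        for (char c : name.toCharArray()) {
            if (c < 0x20) {
                sb.append(String.format("\\%03o", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println((int) SUMMARY_INFORMATION.charAt(0)); // 5
        System.out.println(printable(SUMMARY_INFORMATION));      // \005SummaryInformation
    }
}
```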
An application that wants to open a document in a POI filesystem (POIFS) proceeds as described below. (The full source code of the sample application is available in the examples section of the POI source tree as ReadTitle.java.)
The first interesting statement creates an org.apache.poi.poifs.eventfilesystem.POIFSReader instance, which we need to read the POI filesystem. Before the application actually opens the POI filesystem, we have to tell the POIFSReader which documents we are interested in. In this case, the application should do something with the document \005SummaryInformation.
To do so, the application registers an org.apache.poi.poifs.eventfilesystem.POIFSReaderListener with the POIFSReader for the document name \005SummaryInformation. The POIFSReaderListener interface specifies the method processPOIFSReaderEvent, which processes a document. The class MyPOIFSReaderListener implements the POIFSReaderListener interface and thus the processPOIFSReaderEvent method. The event-driven POIFSReader calls this method when it finds the \005SummaryInformation document. (In the sample application, MyPOIFSReaderListener is a static nested class in the ReadTitle.java source file.)
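The register-then-read mechanism described above is an instance of the observer pattern: listeners are registered per document name, and the reader calls back only for documents it actually encounters. The model below illustrates just that dispatch logic with plain JDK types so that it runs without POI. MiniEventReader, its methods, and the name-to-bytes map standing in for a POI filesystem are all hypothetical; the real classes are POIFSReader and POIFSReaderListener, whose exact signatures should be taken from the POI API documentation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

/** Hypothetical in-memory model of an event-driven reader; NOT the POI API. */
final class MiniEventReader {
    // Callbacks registered per document name.
    private final Map<String, List<BiConsumer<String, byte[]>>> listeners = new HashMap<>();

    /** Registers a callback for documents with exactly this name. */
    void registerListener(BiConsumer<String, byte[]> listener, String name) {
        listeners.computeIfAbsent(name, k -> new ArrayList<>()).add(listener);
    }

    /** "Reads" a filesystem given as name -> contents and notifies matching listeners. */
    void read(Map<String, byte[]> filesystem) {
        for (Map.Entry<String, byte[]> doc : filesystem.entrySet()) {
            // Names nobody registered for yield an empty list, so nothing is called.
            for (BiConsumer<String, byte[]> l : listeners.getOrDefault(doc.getKey(), List.of())) {
                l.accept(doc.getKey(), doc.getValue());
            }
        }
    }

    public static void main(String[] args) {
        Map<String, byte[]> fs = new LinkedHashMap<>();
        fs.put("\005SummaryInformation", new byte[] {1, 2, 3});
        fs.put("Workbook", new byte[] {4});

        MiniEventReader r = new MiniEventReader();
        r.registerListener((name, bytes) ->
                System.out.println("got " + bytes.length + " bytes"), "\005SummaryInformation");
        r.read(fs); // prints "got 3 bytes"; "Workbook" has no listener and is skipped
    }
}
```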
Now everything is prepared, and the application can start reading the POI filesystem.
Let us now look at the MyPOIFSReaderListener class and how it retrieves the title.
The listener first declares a SummaryInformation variable and initializes it with null. We need an instance of this class to access the title. The instance is created in a try block.
The expression e.getStream() returns the input stream containing the bytes of the property set stream named \005SummaryInformation (e is the event object passed to processPOIFSReaderEvent). This stream is passed into the create method of the factory class org.apache.poi.hpsf.PropertySetFactory, which returns an org.apache.poi.hpsf.PropertySet instance. Casting this result to SummaryInformation, a convenience class with methods like getTitle(), getAuthor(), etc., is usually safe here because the listener only receives the stream named \005SummaryInformation. It is not guaranteed, though: a stream's name alone does not prove what the stream contains.
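One defensive option is to check the type before casting, so that a mismatch produces a descriptive error instead of a bare ClassCastException. The generic helper below is a hypothetical sketch (Casts and expect are illustrative names, not POI API); it is shown with JDK types so that it compiles without POI.

```java
/** Hypothetical helper: a type-checked cast with a descriptive error message. */
final class Casts {
    static <T> T expect(Object value, Class<T> type) {
        if (!type.isInstance(value)) { // also rejects null
            String actual = value == null ? "null" : value.getClass().getName();
            throw new IllegalArgumentException(
                    "expected " + type.getName() + " but got " + actual);
        }
        return type.cast(value);
    }

    public static void main(String[] args) {
        Object o = "a title";
        String s = expect(o, String.class); // succeeds
        System.out.println(s);
        try {
            expect(42, String.class);
        } catch (IllegalArgumentException ex) {
            System.out.println(ex.getMessage()); // expected java.lang.String but got java.lang.Integer
        }
    }
}
```

With HPSF on the classpath, the listener could then write something along the lines of si = Casts.expect(PropertySetFactory.create(e.getStream()), SummaryInformation.class); that combination is an untested assumption.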
The PropertySetFactory.create method may throw various exceptions, which are covered in the next section. For now we simply catch all exceptions and throw a RuntimeException containing the message text of the original exception.
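If you keep the catch-all approach, consider passing the original exception to the RuntimeException as its cause instead of only copying its message text; the stack trace of the underlying problem then remains available. A minimal JDK-only sketch, in which the parse method is a hypothetical stand-in for PropertySetFactory.create:

```java
import java.io.IOException;

/** Hypothetical sketch: wrap a checked exception without losing its cause. */
final class Wrapping {
    // Stand-in for a call that may throw a checked exception.
    static Object parse(boolean fail) throws IOException {
        if (fail) {
            throw new IOException("not a property set stream");
        }
        return new Object();
    }

    static Object parseOrThrow(String streamName, boolean fail) {
        try {
            return parse(fail);
        } catch (Exception ex) {
            // Keep a readable message AND the original exception as the cause.
            throw new RuntimeException("Property set stream \"" + streamName + "\": " + ex, ex);
        }
    }

    public static void main(String[] args) {
        try {
            parseOrThrow("\\005SummaryInformation", true);
        } catch (RuntimeException ex) {
            System.out.println(ex.getMessage());
            System.out.println(ex.getCause() instanceof IOException); // true
        }
    }
}
```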
If all goes well, the sample application retrieves the title and prints it to standard output. Note that you must be prepared for the case that the POI filesystem does not have a title.
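The final step of the listener can be sketched as follows. The method name describeTitle is hypothetical; it receives whatever getTitle() returned, assumed here to be null when the document has no title, and the message texts are illustrative.

```java
/** Hypothetical sketch of the listener's last step: report the title or its absence. */
final class TitleReport {
    static String describeTitle(String title) {
        // Assumption: a missing title shows up as null.
        return title != null ? "Title: \"" + title + "\"" : "Document has no title.";
    }

    public static void main(String[] args) {
        System.out.println(describeTitle("Quarterly report")); // Title: "Quarterly report"
        System.out.println(describeTitle(null));               // Document has no title.
    }
}
```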
Please note that a Microsoft Office document does not necessarily contain the \005SummaryInformation stream. The documents created by the Microsoft Office suite have one, as far as I know. However, an Excel spreadsheet exported from StarOffice 5.2 won't have a \005SummaryInformation stream. In this case the application won't throw an exception; the POIFSReader simply never calls the processPOIFSReaderEvent method. You have been warned!
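Since a missing stream produces no callback at all, an application that needs to notice this can let its listener remember whether it was invoked and check that flag after reading has finished. The sketch below models only the listener side; RecordingListener and onDocument are hypothetical names, and the actual registration and reading are left out.

```java
/** Hypothetical listener that remembers whether the reader ever called it. */
final class RecordingListener {
    private boolean called;
    private byte[] contents;

    // In a real application this would be invoked from processPOIFSReaderEvent.
    void onDocument(byte[] contents) {
        this.called = true;
        this.contents = contents;
    }

    boolean wasCalled() {
        return called;
    }

    byte[] contents() {
        return contents;
    }

    public static void main(String[] args) {
        RecordingListener listener = new RecordingListener();
        // ... here the application would register the listener and read the filesystem ...
        // For a file without a \005SummaryInformation stream, no callback happens:
        if (!listener.wasCalled()) {
            System.err.println("Warning: no \\005SummaryInformation stream found");
        }
    }
}
```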
A couple of additional standard properties are not contained in the \005SummaryInformation stream explained above, for example a document's category or the number of multimedia clips in a PowerPoint presentation. Microsoft has invented an additional stream named \005DocumentSummaryInformation to hold these properties. With two minor differences, you can proceed exactly as described above to read the properties stored in \005DocumentSummaryInformation:
1. Instead of \005SummaryInformation, use \005DocumentSummaryInformation as the stream's name.
2. Replace all occurrences of the class SummaryInformation with DocumentSummaryInformation.
And of course you cannot call getTitle(), because DocumentSummaryInformation has different query methods. See the API documentation for the details!
In the previous section the application simply caught all exceptions and ignored the details. However, a real application will likely want to know what went wrong and act appropriately. Besides I/O exceptions, there are three HPSF-specific exceptions you should know about:
NoPropertySetStreamException: This exception is thrown if the application tries to create a PropertySet or one of its subclasses, SummaryInformation and DocumentSummaryInformation, from a stream that is not a property set stream. A faulty property set stream counts as not being a property set stream at all. An application should be prepared to deal with this case even if it opens only streams named \005SummaryInformation or \005DocumentSummaryInformation. These are just names: a stream's name by itself does not ensure that the stream contains the expected contents or that those contents are correct.
UnexpectedPropertySetTypeException: This exception is thrown if a certain type of property set is expected somewhere (e.g. a SummaryInformation or DocumentSummaryInformation) but the provided property set is not of that type.
MarkUnsupportedException: This exception is thrown if an input stream that is to be parsed into a property set does not support the InputStream.mark(int) operation. The POI filesystem uses the DocumentInputStream class, which does support this operation, so you are safe here. However, if you read a property set stream from another kind of input stream, things may be different.
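If you do need to parse a property set from a stream that lacks mark support, one common remedy is to wrap it in a java.io.BufferedInputStream, which supports mark and reset. The helper below is a hypothetical JDK-only sketch of that idea; MarkSupport and ensureMarkSupported are illustrative names.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper: ensure mark/reset support before handing a stream to a parser. */
final class MarkSupport {
    static InputStream ensureMarkSupported(InputStream in) {
        // BufferedInputStream always supports mark/reset.
        return in.markSupported() ? in : new BufferedInputStream(in);
    }

    public static void main(String[] args) throws IOException {
        // A minimal stream yielding 1, 2, 3; InputStream.markSupported() is false by default.
        InputStream noMark = new InputStream() {
            private int next = 1;

            @Override
            public int read() {
                return next <= 3 ? next++ : -1;
            }
        };
        InputStream in = ensureMarkSupported(noMark);
        System.out.println(in.markSupported()); // true
        in.mark(16);           // remember the current position
        int first = in.read();
        in.reset();            // go back to the marked position
        System.out.println(first == in.read()); // true
    }
}
```

The returned stream could then be handed to PropertySetFactory.create in place of the original one.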
Many Microsoft Office documents contain embedded objects, for example an Excel sheet on a page in a Word document. Embedded objects may have property sets of their own. An application can open these property set streams as described above. The only difference is that they are not located in the POI filesystem's root but in a nested directory instead. Just register a POIFSReaderListener for the property set streams you are interested in. For example, the POIBrowser application in the contrib section tries to open each and every document in a POI filesystem as a property set stream. If this succeeds, it displays the properties.
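Because embedded objects keep their property set streams in nested directories, it can help to think of each stream by its full path and to pick out candidates by the last path component. The sketch below does that on a plain list of path strings. The "/"-separated notation, the example directory names, and EmbeddedStreams itself are hypothetical. A matching name is only a candidate, since, as explained above, a name does not guarantee valid contents; unlike POIBrowser, this filter does not try every document.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical sketch: find likely property set streams anywhere in a directory tree. */
final class EmbeddedStreams {
    static final String SUMMARY = "\005SummaryInformation";
    static final String DOC_SUMMARY = "\005DocumentSummaryInformation";

    /** Returns the paths whose last component is one of the two standard stream names. */
    static List<String> candidates(List<String> paths) {
        return paths.stream()
                .filter(p -> {
                    String name = p.substring(p.lastIndexOf('/') + 1);
                    return name.equals(SUMMARY) || name.equals(DOC_SUMMARY);
                })
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> paths = List.of(
                "/" + SUMMARY,
                "/WordDocument",
                "/ObjectPool/_1/" + SUMMARY,
                "/ObjectPool/_1/Workbook");
        System.out.println(candidates(paths).size()); // 2
    }
}
```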