
NCSA HDF5 XML Overview

1. Introduction

The NCSA HDF group is developing a comprehensive suite of standards and tools for using XML together with HDF5. One important goal is to lay the foundation for many uses of XML. To meet this goal, we will follow and support relevant standards.

1.1 How XML Might be Used with HDF5

In earlier work, we analyzed "Use Cases" for how XML might be used with HDF5 [UseCases]. This analysis described a variety of different uses for XML.

One of the most important roles for XML will be as a standard format for interchanging descriptions of scientific datasets, both between programs and across time (i.e., store and retrieve), in an open and heterogeneous environment.

Figure 1 shows some of the roles for XML: transformed to HTML for standard Web browsers (1), ingested into Java and other tools (2), as input to Data Location Services (Catalogs) (3), and as input for ingest, editing, and validation tools (4). We aim to support and enable as many of these uses as possible.


Figure 1. Some Roles for XML with HDF5
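As an illustration of role (1), the following minimal sketch renders an XML description of an HDF5 file as a nested HTML list. The element and attribute names (`HDF5-File`, `RootGroup`, `Group`, `Dataset`, `Name`) and the sample document are assumptions made for this example; consult the HDF5 DTD [DTD] for the actual vocabulary.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical sample; element and attribute names are assumed for illustration.
SAMPLE_XML = """
<HDF5-File>
  <RootGroup OBJ-XID="root">
    <Group Name="experiment">
      <Dataset Name="temperature"/>
      <Dataset Name="pressure"/>
    </Group>
    <Dataset Name="notes"/>
  </RootGroup>
</HDF5-File>
"""


def to_html_list(element):
    """Render the Group and Dataset children of `element` as a nested <ul>."""
    items = []
    for child in element:
        if child.tag not in ("Group", "Dataset"):
            continue  # this sketch ignores every other kind of element
        label = f"{child.tag}: {escape(child.get('Name', '(unnamed)'))}"
        nested = to_html_list(child) if child.tag == "Group" else ""
        items.append(f"<li>{label}{nested}</li>")
    return f"<ul>{''.join(items)}</ul>" if items else ""


def describe_as_html(xml_text):
    """Parse an XML description and return an HTML outline of its objects."""
    root_group = ET.fromstring(xml_text).find("RootGroup")
    return "" if root_group is None else to_html_list(root_group)


if __name__ == "__main__":
    print(describe_as_html(SAMPLE_XML))
```

A production version would more likely use a stylesheet, as the text suggests; the point here is only that the XML description carries enough structure to drive a browsable view.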

Another class of applications may use XML-to-XML translation and filtering to convert between multiple formats and let them interoperate. For example, netCDF may be converted to HDF5 via XML using an XSL stylesheet. [XSLExperiment]
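The idea of XML-to-XML translation can be sketched without XSL. The example below uses Python's standard ElementTree module in place of a stylesheet, and maps a hypothetical netCDF-style description onto an HDF5-style one. All element and attribute names on both sides (`netcdf`, `variable`, `HDF5-File`, `RootGroup`, `Dataset`, `Name`) are assumptions for illustration and are not taken from [XSLExperiment].

```python
import xml.etree.ElementTree as ET

# Hypothetical netCDF-style description; these names are illustrative only.
NETCDF_XML = """
<netcdf>
  <variable name="temp" type="float"/>
  <variable name="time" type="int"/>
</netcdf>
"""


def netcdf_to_hdf5_xml(xml_text):
    """Map each <variable> to a <Dataset> under a single <RootGroup>."""
    source = ET.fromstring(xml_text)
    target = ET.Element("HDF5-File")
    root_group = ET.SubElement(target, "RootGroup")
    for variable in source.iter("variable"):
        # Only the name is carried over; types and dimensions are omitted here.
        ET.SubElement(root_group, "Dataset", {"Name": variable.get("name", "")})
    return ET.tostring(target, encoding="unicode")


if __name__ == "__main__":
    print(netcdf_to_hdf5_xml(NETCDF_XML))
```

A real translation would also have to carry over datatypes, dimensions, attributes, and data values, which is where a declarative stylesheet becomes attractive.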

1.2. The Foundation

To realize this vision for XML, we are providing an initial set of tools. Table 1 briefly describes the HDF5 XML foundation tools.

Table 1. HDF5 XML Foundation.

| Tool | Description | Status |
|---|---|---|
| Document Type Definition | DTD for describing the structure and contents of an HDF5 file. | Updated for HDF5.1.4, April 2001. |
| h5dump --xml | Print an XML description of an HDF5 file. | Available April 2001 (HDF5.1.4 patch). |
| h5gen | Read an XML description, create an HDF5 file. | Updated April 2001. |
| h5view | Read XML, create HDF5, display and edit the HDF5. Write an XML description of an open HDF5 file. | New features available April 2001. |

2. The HDF5 XML Foundation

2.1. HDF5 Document Type Definition (DTD)

The foundation for all uses of XML is a DTD or Schema that defines a valid description of HDF5. The HDF5 DTD is based on earlier work which specified a formal model [UML] and a grammar for describing HDF5 files [DDL]. Given these, defining the DTD was a matter of expressing the concepts in XML.

At the time the DTD was implemented, XML Schema had not yet been agreed upon and was not clearly understood. In the future, the DTD can and should be recast as an XML Schema, and extended to specify additional aspects of the HDF5 model that could not be covered with a DTD.

The HDF5 DTD is in its second major revision, updated to be consistent with HDF5.1.4.[DTD]

Known Limitations

The DTD suffers from generic shortcomings in XML and XML DTDs, including:

These issues are discussed in [DesignNotes].

In addition, several aspects of the HDF5 data model are not specified (yet) in the DTD. These include:

Finally, there are parts of HDF5 which can be expressed correctly in XML, but which the h5gen and other tools cannot process. These include:

In the future, it is likely that an XML Schema will be defined for HDF5 files and objects. This will address some of the issues of marking up data.

2.2. New option to h5dump utility

The HDF5 h5dump utility prints a human readable version of the contents of an HDF5 file. The default output is HDF DDL [DDL]. A new option has been added to output the description in XML.

Essentially any HDF5 file can be dumped in XML, with the proviso that there is no guarantee that all tools will be able to read the XML, even when the XML is perfectly correct. For example, the h5dump utility will write out a dataset with compound data as correct XML. However, the h5gen tool cannot read the data values into HDF5. (For an explanation of this, see [Compound].)
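A minimal sketch of driving the XML option from a script follows. It assumes that `h5dump` is installed and on the PATH and that the XML description is written to standard output; the helper names and the file names in the usage comment are hypothetical.

```python
import subprocess


def h5dump_xml_command(h5_path):
    """Build the argument list for dumping `h5_path` as XML."""
    return ["h5dump", "--xml", h5_path]


def dump_to_xml_file(h5_path, xml_path):
    """Run h5dump and save its standard output to `xml_path`.

    Assumes h5dump is on the PATH; raises CalledProcessError if it fails.
    """
    with open(xml_path, "w") as out:
        subprocess.run(h5dump_xml_command(h5_path), stdout=out, check=True)


# Example usage (file names are placeholders):
#     dump_to_xml_file("file1.h5", "file1.h5.xml")
```

Keeping the command construction separate from the call makes it easy to log or inspect the exact invocation before running it.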

2.3. Java tool, h5gen

The Java h5gen tool (revised version) reads an XML description of an HDF5 file and generates the HDF5 file. This tool calls the HDF5 library through the JHI5 JNI. [JavaHDF5]

The output of h5gen faithfully reproduces an HDF5 file from the XML, except for data that cannot be read by h5gen.

Known Limitations

The h5gen tool is available on all platforms that the Java HDF5 tools support. The h5gen cannot handle some HDF5 objects and interfaces that are not supported by the Java HDF5 Interface. Basically, these are features that are defined in C but are difficult or impossible to implement in Java. The most important cases are:

Other features such as non-base compression or IO modules are not supported at this time.

There may be additional limitations in the implementation, not fully understood at this time. Possible areas of limitation may include:

2.4. New features in the Java Editor h5view

The Java h5view visual browser/editor now has the ability to convert XML to HDF5, and to write XML. The user may select an XML file that conforms to the HDF DTD, and the corresponding HDF5 file will be generated (with the same code as the h5gen tool). The HDF5 file will be opened for browsing and editing.
The h5view can also write a file as XML, which will conform to the HDF5 DTD.

The h5view tool allows the ingest of an XML description of a file (perhaps a template) to create an HDF5 file, editing of the HDF5 file to add, delete, or modify objects or their values, and generation of either HDF5 or XML.

It is important to note that the h5view tool does not edit XML. It converts from XML to HDF5 and then edits the HDF5.

Known Limitations

The h5view uses the same code as the h5gen, and thus has the same limitations. The h5view can output XML for all HDF5 objects that it can read.

The spacing and indentation of the XML output from the h5view may not precisely match the output of the h5dump.
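Because of such layout differences, XML written by two tools is better compared after normalization than byte for byte. The sketch below is one possible approach, assuming Python 3.8 or later: it uses the standard library's `xml.etree.ElementTree.canonicalize` with `strip_text=True`, which drops leading and trailing whitespace in text content. The helper names are hypothetical, and stripping would be inappropriate if whitespace inside data values is significant.

```python
import xml.etree.ElementTree as ET


def normalized(xml_text):
    """Return a canonical form with surrounding whitespace in text removed."""
    return ET.canonicalize(xml_text, strip_text=True)


def same_ignoring_layout(xml_a, xml_b):
    """True if the two documents differ only in spacing and indentation."""
    return normalized(xml_a) == normalized(xml_b)


if __name__ == "__main__":
    compact = '<Group Name="g"><Dataset Name="d"/></Group>'
    indented = '<Group Name="g">\n    <Dataset Name="d" />\n</Group>'
    print(same_ignoring_layout(compact, indented))
```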

2.5 Interoperation of Tools

All the tools discussed here exchange data in either HDF5 or XML, and in general produce the same results.

When h5gen or h5view reads XML to convert it to HDF5, the XML is validated against the HDF5 DTD, and the same HDF5 file will be created for a given XML input file. The h5dump and h5view tools will write the same XML, given the same HDF5 input file.

In fact, the transformation:

    file1.h5 -> h5dump --xml -> file1.h5.xml -> h5gen -> file2.h5

will usually produce a file2.h5 that is identical to file1.h5.

Known Limitations

Some objects cannot be converted to XML at all (e.g., region references), and some objects cannot be read into Java (e.g., compound data). In these cases, the output will be incomplete but correct.

In some cases, the output files may be logically identical (i.e., they have the same elements, attributes, values, etc.), but have slightly different binary representations on disk. This may happen, for instance, when the same objects are written in a different order.
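One way to check logical identity is to dump both files to XML and compare the descriptions without regard to the order of sibling elements. The following is a minimal sketch of such a comparison. Treating sibling order as insignificant is an assumption of this example, the helper names are hypothetical, and the element names in the sample are illustrative only.

```python
import xml.etree.ElementTree as ET


def signature(element):
    """Return a hashable, order-independent summary of an element subtree."""
    text = (element.text or "").strip()
    attributes = tuple(sorted(element.attrib.items()))
    # Sorting the child signatures makes sibling order irrelevant.
    children = tuple(sorted(signature(child) for child in element))
    return (element.tag, attributes, text, children)


def logically_equal(xml_a, xml_b):
    """True if both documents contain the same elements, attributes, and text."""
    return signature(ET.fromstring(xml_a)) == signature(ET.fromstring(xml_b))


if __name__ == "__main__":
    first = '<RootGroup><Dataset Name="a"/><Dataset Name="b"/></RootGroup>'
    second = '<RootGroup><Dataset Name="b"/><Dataset Name="a"/></RootGroup>'
    print(logically_equal(first, second))
```

Text inside a single element is compared as written, so the order of data values within a dataset still matters; only the order of sibling objects is ignored.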

3. Futures

There are many possible future activities with XML. These may include:

References

[UML] "HDF5 Abstract Data Model", http://www.hdfgroup.uiuc.edu/papers/presentations/ADM/ADM_990506/index.html

[DDL] "DDL in BNF for HDF5", /HDF5/doc/ddl.html

[DTD] //DTDs/HDF5-File.dtd

[UseCases] "Some Suggested Use Cases for XML with HDF-5", /HDF5/XML/UseCases/use-cases-1.html

[DesignNotes] "The XML DTD for HDF5: Design Notes", /HDF5/XML/design-notes.html

[XSLExperiment] Robert E. McGrath, "Experiment with XSL: translating scientific data", /HDF5/XML/nctoh5/writeup.htm

[Binary] "Representing 'Binary' Data in XML", /HDF5/XML/tools/binary.html

[Compound] "HDF5 Compound Data: Technical Issues for XML, Java, and Tools", /HDF5/XML/tools/compound-data.html

[JavaHDF5] "THG HDF Java Products", /hdf-java-html

[Folk] Mike Folk, "Proposal for representing simple data in the HDF5 XML DTD", /HDF5/XML/design-notes.html

Last modified: June 26th 2007