17 Using the XML Schema Processor for Java
Topics here cover how to use the Extensible Markup Language (XML) schema processor for Java.
17.1 Introduction to XML Validation
Topics cover the different techniques for XML validation.
17.1.1 Prerequisites for Using the XML Schema Processor for Java
Prerequisites for using the XML schema processor are covered.
This section assumes that you have working knowledge of these technologies:
-
Document type definition (DTD). A DTD defines the legal structure of an XML document.
-
XML Schema language. XML Schema defines the legal structure of an XML document.
To learn more about these technologies, consult the XML resources in Related Documents.
17.1.2 Standards and Specifications for the XML Schema Processor for Java
XML Schema is a World Wide Web Consortium (W3C) standard.
The Oracle XML Schema processor supports the W3C XML Schema specifications:
17.1.3 XML Validation with DTDs
Document type definitions (DTDs) were originally developed for SGML. XML DTDs are a subset of those available in SGML and provide a mechanism for declaring constraints on XML markup. XML DTDs enable the specification of:
-
Which elements can be in your XML documents.
-
The content model of an XML element, that is, whether the element contains only data or has a set of subelements that defines its structure. DTDs can define whether a subelement is optional or mandatory and whether it can occur only once or multiple times.
-
Attributes of XML elements. DTDs can also specify whether attributes are optional or mandatory.
-
Entities that are legal in your XML documents.
An XML DTD is not itself written in XML, but is a context-independent grammar for defining the structure of an XML document. You can declare a DTD in an XML document itself or in a separate file from the XML document.
Validation is the process by which you verify an XML document against its associated DTD, ensuring that the structure, use of elements, and use of attributes are consistent with the definitions in the DTD. Thus, applications that handle XML documents can assume that the data matches the definition.
Using XDK, you can write an application that includes a validating XML parser; that is, a program that parses and validates XML documents against a DTD. Depending on its implementation, a validating parser may:
-
Either stop processing when it encounters an error, or continue.
-
Either report warnings and errors as they occur or in summary form at the end of processing.
-
Enable or disable validation mode.
Most processors can enable or disable validation mode, but they must still process entity definitions and other constructs of DTDs.
17.1.3.1 DTD Samples in XDK
An example DTD is shown, together with an example XML document that conforms to that DTD.
Example 17-1 shows the contents of a DTD named family.dtd, which is located in $ORACLE_HOME/xdk/demo/java/parser/common/. The <!ELEMENT> declarations specify the legal nomenclature and structure of elements in the document, whereas the <!ATTLIST> declarations specify the legal attributes of elements.
Example 17-2 shows the contents of an XML document named family.xml, which is also located in $ORACLE_HOME/xdk/demo/java/parser/common/. The <!DOCTYPE> declaration in family.xml specifies that this XML document conforms to the external DTD named family.dtd.
Example 17-1 family.dtd
<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT family (member*)>
<!ATTLIST family lastname CDATA #REQUIRED>
<!ELEMENT member (#PCDATA)>
<!ATTLIST member memberid ID #REQUIRED>
<!ATTLIST member dad IDREF #IMPLIED>
<!ATTLIST member mom IDREF #IMPLIED>
Example 17-2 family.xml
<?xml version="1.0" standalone="no"?>
<!DOCTYPE family SYSTEM "family.dtd">
<family lastname="Smith">
<member memberid="m1">Sarah</member>
<member memberid="m2">Bob</member>
<member memberid="m3" mom="m1" dad="m2">Joanne</member>
<member memberid="m4" mom="m1" dad="m2">Jim</member>
</family>
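DTD validation with error reporting can be sketched as follows. This is a minimal illustration that uses the standard JAXP parser shipped with the JDK, not the XDK parser classes. The class name DtdValidationSketch is hypothetical, and the family.dtd declarations are inlined as an internal DTD subset (an assumption made so that the sketch needs no files on disk). The error handler records validity errors and lets parsing continue, which is one of the two behaviors a validating parser may choose.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical helper; uses JAXP from the JDK, not the XDK parser.
final class DtdValidationSketch {
    // Declarations from family.dtd, inlined as an internal subset (assumption).
    static final String DOCTYPE =
        "<!DOCTYPE family [\n"
        + "<!ELEMENT family (member*)>\n"
        + "<!ATTLIST family lastname CDATA #REQUIRED>\n"
        + "<!ELEMENT member (#PCDATA)>\n"
        + "<!ATTLIST member memberid ID #REQUIRED>\n"
        + "<!ATTLIST member dad IDREF #IMPLIED>\n"
        + "<!ATTLIST member mom IDREF #IMPLIED>\n"
        + "]>\n";

    static final String VALID = DOCTYPE
        + "<family lastname=\"Smith\">"
        + "<member memberid=\"m1\">Sarah</member>"
        + "<member memberid=\"m2\">Bob</member>"
        + "<member memberid=\"m3\" mom=\"m1\" dad=\"m2\">Joanne</member>"
        + "</family>";

    // Omits the required lastname and memberid attributes.
    static final String INVALID = DOCTYPE + "<family><member>Sarah</member></family>";

    // Parses with DTD validation on and returns every reported problem.
    static List<String> validate(String xml) {
        List<String> problems = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // turn on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) {
                    problems.add("warning: " + e.getMessage());
                }
                @Override public void error(SAXParseException e) {
                    problems.add("error: " + e.getMessage()); // record and continue
                }
                @Override public void fatalError(SAXParseException e) throws SAXParseException {
                    throw e; // not well-formed: stop parsing
                }
            });
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            problems.add("fatal: " + e.getMessage());
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println("valid document problems:   " + validate(VALID));
        System.out.println("invalid document problems: " + validate(INVALID));
    }
}
```

Collecting problems in a list corresponds to reporting in summary form at the end of processing; printing inside the handler would report them as they occur.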
17.1.4 XML Validation with XML Schemas
Concepts involving validation using XML schemas are introduced.
The XML Schema language, also known as XML Schema Definition, was created by the W3C to use XML syntax to describe the content and the structure of XML documents. An XML schema is an XML document written in the XML Schema language. An XML schema document contains rules describing the structure of an input XML document, called an instance document. An instance document is valid if and only if it conforms to the rules of the XML schema.
The XML Schema language defines such things as:
-
Which elements and attributes are legal in the instance document
-
Which elements can be children of other elements
-
The order and number of child elements
-
Data types for elements and attributes
-
Default and fixed values for elements and attributes
A validating XML parser tries to determine whether an instance document conforms to the rules of its associated XML schema. Using XDK you can write a validating parser that performs this schema validation. Depending on its implementation, a validating parser may:
-
Either stop processing when it encounters an error, or continue.
-
Either report warnings and errors as they occur or in summary form at the end of processing.
The processor must consider entity definitions and other constructs that are defined in a DTD that is included by the instance document. The XML Schema language does not define what must occur when an instance document includes both an XML schema and a DTD. Thus, the behavior of the application in such cases depends on the implementation.
17.1.4.1 XML Schema Samples in XDK
A sample XML document is shown which contains a purchase report that describes parts that have been ordered in different regions. This document is located at $ORACLE_HOME/xdk/demo/java/schema/report.xml. An XML schema document, report.xsd, which you can use to validate report.xml, is also shown.
Among other things, the XML schema defines the names of the elements that are legal in the instance document and the type of data that the elements can contain.
Example 17-3 report.xml
<purchaseReport
xmlns="http://www.example.com/Report"
xmlns:xsi = "http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.example.com/Report report.xsd"
period="P3M" periodEnding="1999-12-31">
<regions>
<zip code="95819">
<part number="872-AA" quantity="1"/>
<part number="926-AA" quantity="1"/>
<part number="833-AA" quantity="1"/>
<part number="455-BX" quantity="1"/>
</zip>
<zip code="63143">
<part number="455-BX" quantity="4"/>
</zip>
</regions>
<parts>
<part number="872-AA">Lawnmower</part>
<part number="926-AA">Baby Monitor</part>
<part number="833-AA">Lapis Necklace</part>
<part number="455-BX">Sturdy Shelves</part>
</parts>
</purchaseReport>
Example 17-4 report.xsd
<schema targetNamespace="http://www.example.com/Report"
xmlns="http://www.w3.org/2001/XMLSchema"
xmlns:r="http://www.example.com/Report"
elementFormDefault="qualified">
<annotation>
<documentation xml:lang="en">
Report schema for Example.com
Copyright 2000 Example.com. All rights reserved.
</documentation>
</annotation>
<element name="purchaseReport">
<complexType>
<sequence>
<element name="regions" type="r:RegionsType">
<keyref name="dummy2" refer="r:pNumKey">
<selector xpath="r:zip/r:part"/>
<field xpath="@number"/>
</keyref>
</element>
<element name="parts" type="r:PartsType"/>
</sequence>
<attribute name="period" type="duration"/>
<attribute name="periodEnding" type="date"/>
</complexType>
<unique name="dummy1">
<selector xpath="r:regions/r:zip"/>
<field xpath="@code"/>
</unique>
<key name="pNumKey">
<selector xpath="r:parts/r:part"/>
<field xpath="@number"/>
</key>
</element>
<complexType name="RegionsType">
<sequence>
<element name="zip" maxOccurs="unbounded">
<complexType>
<sequence>
<element name="part" maxOccurs="unbounded">
<complexType>
<complexContent>
<restriction base="anyType">
<attribute name="number" type="r:SKU"/>
<attribute name="quantity" type="positiveInteger"/>
</restriction>
</complexContent>
</complexType>
</element>
</sequence>
<attribute name="code" type="positiveInteger"/>
</complexType>
</element>
</sequence>
</complexType>
<simpleType name="SKU">
<restriction base="string">
<pattern value="\d{3}-[A-Z]{2}"/>
</restriction>
</simpleType>
<complexType name="PartsType">
<sequence>
<element name="part" maxOccurs="unbounded">
<complexType>
<simpleContent>
<extension base="string">
<attribute name="number" type="r:SKU"/>
</extension>
</simpleContent>
</complexType>
</element>
</sequence>
</complexType>
</schema>
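The effect of a data type rule such as the SKU pattern in report.xsd can be tried out with a short sketch. It uses the standard javax.xml.validation API from the JDK rather than the XDK schema processor, and it validates against a cut-down schema that keeps only the SKU type and a single part element. The class name SkuSchemaSketch and the reduced schema are assumptions made for illustration.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;

// Hypothetical helper; uses JAXP from the JDK, not the XDK schema processor.
final class SkuSchemaSketch {
    // Reduced schema: only the SKU simple type and one "part" element (assumption).
    static final String XSD =
        "<schema targetNamespace=\"http://www.example.com/Report\""
        + " xmlns=\"http://www.w3.org/2001/XMLSchema\""
        + " xmlns:r=\"http://www.example.com/Report\""
        + " elementFormDefault=\"qualified\">"
        + "<simpleType name=\"SKU\"><restriction base=\"string\">"
        + "<pattern value=\"\\d{3}-[A-Z]{2}\"/>"
        + "</restriction></simpleType>"
        + "<element name=\"part\"><complexType><simpleContent>"
        + "<extension base=\"string\">"
        + "<attribute name=\"number\" type=\"r:SKU\"/>"
        + "</extension></simpleContent></complexType></element>"
        + "</schema>";

    // Builds a one-element instance document with the given part number.
    static String part(String number) {
        return "<part xmlns=\"http://www.example.com/Report\" number=\""
            + number + "\">Lawnmower</part>";
    }

    // Returns null when the instance is valid, otherwise the first error message.
    static String firstError(String instanceXml) {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            validator.validate(new StreamSource(new StringReader(instanceXml)));
            return null;
        } catch (Exception e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println("872-AA: " + firstError(part("872-AA")));
        System.out.println("87-AA:  " + firstError(part("87-AA")));
    }
}
```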
17.1.5 Differences Between XML Schemas and DTDs
The XML Schema language includes most of the capabilities of the DTD specification. An XML schema serves a similar purpose to a DTD, but is more flexible in specifying document constraints.
Table 17-1 compares some features between the two validation mechanisms.
Table 17-1 Feature Comparison Between XML Schema and DTD
Feature | XML Schema | DTD |
---|---|---|
Element nesting | X | X |
Element occurrence constraints | X | X |
Permitted attributes | X | X |
Attribute types and default values | X | X |
Written in XML | X | |
Namespace support | X | |
Built-In data types | X | |
User-Defined data types | X | |
Include/Import | X | |
Refinement (inheritance) | X | |
These reasons are probably the most persuasive for choosing XML schema validation over DTD validation:
-
The XML Schema language enables you to define rules for the content of elements and attributes. You achieve control over content by using data types. With XML Schema data types you can more easily perform actions such as:
-
Declare which elements are to contain which types of data, for example, positive integers in one element and years in another
-
Process data obtained from a database
-
Define restrictions on data, for example, a number between 10 and 20
-
Define data formats, for example, dates in the form MM-DD-YYYY
-
Convert data between different data types, for example, strings to dates
-
Unlike DTD grammar, documents written in the XML Schema language are themselves written in XML. Thus, you can perform these actions:
-
Use your XML parser to parse your XML schema
-
Process your XML schema with the XML Document Object Model (DOM)
-
Transform your XML document with Extensible Stylesheet Language Transformation (XSLT)
-
Reuse your XML schemas in other XML schemas
-
Extend your XML schema by adding elements and attributes
-
Reference multiple XML schemas from the same document
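Because a schema document is ordinary XML, a general-purpose parser can read it. The following sketch parses a small schema with the JDK's DOM parser and lists the names of the element declarations it finds. The class name SchemaAsXmlSketch and the sample schema are hypothetical; the XDK DOM parser could be used in the same role.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper that treats a schema as a plain XML document.
final class SchemaAsXmlSketch {
    // Small sample schema invented for this sketch.
    static final String SAMPLE_XSD =
        "<schema xmlns=\"http://www.w3.org/2001/XMLSchema\">"
        + "<element name=\"purchaseReport\"><complexType><sequence>"
        + "<element name=\"regions\" type=\"string\"/>"
        + "<element name=\"parts\" type=\"string\"/>"
        + "</sequence></complexType></element>"
        + "</schema>";

    // Returns the name attribute of every element declaration, in document order.
    static List<String> declaredElementNames(String xsd) {
        List<String> names = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed to match the schema namespace
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xsd)));
            NodeList decls = doc.getElementsByTagNameNS(
                XMLConstants.W3C_XML_SCHEMA_NS_URI, "element");
            for (int i = 0; i < decls.getLength(); i++) {
                names.add(((Element) decls.item(i)).getAttribute("name"));
            }
        } catch (Exception e) {
            throw new IllegalStateException("could not parse schema document", e);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(declaredElementNames(SAMPLE_XSD));
    }
}
```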
17.2 Using the XML Schema Processor: Overview
The Oracle XML Schema processor is a SAX-based XML schema validator that you can use to validate instance documents against an XML schema. The processor supports both lax (LAX) and strict validation.
You can use the processor in these ways:
-
Enable it in the XML parser
-
Use it with a DOM tree to validate whole or part of an XML document
-
Use it as a component in a processing pipeline (like a content handler)
You can configure the schema processor in different ways depending on your requirements. For example, you can:
-
Use a fixed XML schema or automatically build a schema based on the schemaLocation attributes in an instance document.
-
Set XMLError and entityResolver to gain better control over the validation process.
-
Determine how much of an instance document is to be validated. You can use any of the validation modes specified in Table 12-1. You can also designate a type of element as the root of validation.
17.2.1 Using the XML Schema Processor for Java: Basic Process
XDK packages that are important for applications that process XML schemas are described.
These are the important packages for applications that process XML schemas:
-
oracle.xml.parser.v2, which provides APIs for XML parsing
-
oracle.xml.parser.schema, which provides APIs for XML Schema processing
The most important classes in the oracle.xml.parser.schema package are described in Table 17-2. These form the core of most XML schema applications.
Table 17-2 oracle.xml.parser.schema Classes
Class/Interface | Description | Methods |
---|---|---|
|
Represents XML Schema component model. An |
The principal methods are:
|
|
Represents schema components in a target namespace, including type definitions, element and attribute declarations, and group and attribute group definitions. |
The principal methods are |
|
Builds an |
The principal methods are:
|
|
Validates an instance XML document against an XML schema. When registered, an |
The principal methods are:
|
Figure 17-1 depicts the basic process of validating an instance document with the XML Schema processor for Java.
The XML Schema processor performs these major tasks:
-
A builder (XSDBuilder object) assembles the XML schema from an input XML schema document. Although instance documents and schemas need not exist specifically as files on the operating system, they are commonly referred to as files. They may exist as streams of bytes, fields in a database record, or collections of XML Infoset "Information Items."
This task involves parsing the schema document into an object. The builder creates the schema object explicitly or implicitly:
-
In explicit mode, you pass in an XML schema when you invoke the processor. Validating Against Externally Referenced XML Schemas explains how to build the schema object in explicit mode.
-
In implicit mode, you do not pass in an XML schema when you invoke the processor because the schema is internally referenced by the instance document. Validating Against Internally Referenced XML Schemas explains how to create the schema object in implicit mode.
-
The XML schema validator uses the schema object to validate the instance document. This task has these steps:
-
A Simple API for XML (SAX) parser parses the instance document into SAX events, which it passes to the validator.
-
The validator receives SAX events as input and validates them against the schema object, sending an error message if it finds invalid XML components.
Validation in the XML Parser describes the validation modes that you can use when validating the instance document. If you do not explicitly set a schema for validation with the XSDBuilder class, then the instance document must have the correct xsi:schemaLocation attribute pointing to the schema file. Otherwise, the program does not perform the validation. If the processor encounters errors, it generates error messages.
-
The validator sends input SAX events, default values, or post-schema validation information to a DOM builder or application.
See Also:
-
Oracle Database XML Java API Reference to learn about the XSDBuilder, DOMParser, and SAXParser classes
-
Using the XML Schema Processor for Java to learn about the XDK SAX and DOM parsers
17.2.2 Running the XML Schema Processor Demo Programs
Demo programs for the XML Schema processor for Java are included in $ORACLE_HOME/xdk/demo/java/schema.
Table 17-3 describes the XML files and programs that you can use to test the XML Schema processor.
Table 17-3 XML Schema Sample Files
File | Description |
---|---|
cat.xsd | A sample XML schema used by the |
catalogue.xml | A sample instance document that the |
catalogue_e.xml | A sample instance document used by the |
DTD2Schema.java | This sample program converts a DTD (first argument) into an XML Schema and uses it to validate an XML file (second argument). |
embeded_xsql.xsd | The XML schema used by |
embeded_xsql.xml | The instance document used by |
juicer1.xml | A sample XML document for use with |
juicer1.xsd | A sample XML schema for use with |
juicer2.xml | A sample XML document for use with |
juicer2.xsd | A sample XML schema for use with |
report.xml | The sample XML file that |
report.xsd | A sample XML schema used by the |
report_e.xml | When the program validates this sample XML file using |
xsddom.java | This program shows how to validate an instance document by getting a DOM representation of the document and using an |
xsdent.java | This program validates an XML document by redirecting the referenced schema in the |
xsdent.xml | This XML document describes a book. The file is used as an input to |
xsdent.xsd | This XML schema document defines the rules for |
xsdent-1.xsd | The XML schema document referenced by the |
xsdproperty.java | This demo shows how to configure the XML Schema processor to validate an XML document based on a complex type or element declaration. |
xsdsax.java | This demo shows how to validate an XML document received as a SAX stream. |
XSDLax.java | This demo is the same as |
XSDSample.java | This program is a sample driver that you can use to process XML instance documents. |
XSDSetSchema.java | This program is a sample driver to process XML instance documents by overriding the |
Documentation for how to compile and run the sample programs is located in the README in the same directory. The basic steps are:
17.2.3 Using the XML Schema Processor Command-Line Utility
You can use the XML parser command-line utility (oraxml) to validate instance documents against XML schemas and DTDs.
See Also:
Using the Java XML Parser Command-Line Utility (oraxml) for information about how to run oraxml.
17.2.3.1 Using oraxml to Validate Against a Schema
An example shows how you can validate document report.xml against the XML schema report.xsd by invoking oraxml on the command line.
Example 17-5 Using oraxml to Validate Against a Schema
Invoke this command in directory $ORACLE_HOME/xdk/demo/java/schema:
oraxml -schema -enc report.xml
The expected output is:
The encoding of the input file: UTF-8
The input XML file is parsed without errors using Schema validation mode.
17.2.3.2 Using oraxml to Validate Against a DTD
An example shows how you can validate document family.xml against the DTD family.dtd by invoking oraxml on the command line.
Example 17-6 Using oraxml to Validate Against a DTD
Invoke this command in directory $ORACLE_HOME/xdk/demo/java/parser/common:
oraxml -dtd -enc family.xml
The expected output is:
The encoding of the input file: UTF-8
The input XML file is parsed without errors using DTD validation mode.
17.3 Validating XML with XML Schemas
Topics cover various ways to validate XML documents using XML schemas.
17.3.1 Validating Against Internally Referenced XML Schemas
$ORACLE_HOME/xdk/demo/java/schema/XSDSample.java shows how to validate against an implicit XML Schema. The validation mode is implicit because the XML schema is referenced in the instance document itself.
Follow the steps in this section to write programs that use the setValidationMode() method of the oracle.xml.parser.v2.DOMParser class:
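For comparison, implicit validation can be sketched with the standard javax.xml.validation API in the JDK. This is not the XDK DOMParser flow that XSDSample.java uses. In JAXP, calling SchemaFactory.newSchema() with no arguments yields a schema that is resolved from the xsi:schemaLocation hints in the instance document. The class name ImplicitSchemaSketch, the file names mini-report.xsd and mini-report.xml, and the reduced schema are all hypothetical, and the sketch assumes that the JDK's default settings allow the parser to read the schema file from disk.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;

// Hypothetical helper; uses JAXP from the JDK, not the XDK DOMParser.
final class ImplicitSchemaSketch {
    static final String NS = "http://www.example.com/Report";

    // Reduced schema: a purchaseReport element with a period attribute (assumption).
    static final String XSD =
        "<schema targetNamespace=\"" + NS + "\""
        + " xmlns=\"http://www.w3.org/2001/XMLSchema\""
        + " elementFormDefault=\"qualified\">"
        + "<element name=\"purchaseReport\"><complexType>"
        + "<attribute name=\"period\" type=\"duration\"/>"
        + "</complexType></element></schema>";

    // Writes schema and instance to a temporary directory, then validates the
    // instance using only the schema location named inside the instance.
    // Returns null when valid, otherwise the first error message.
    static String firstError(String period) {
        try {
            Path dir = Files.createTempDirectory("implicit-xsd");
            Path xsdFile = dir.resolve("mini-report.xsd");
            Path xmlFile = dir.resolve("mini-report.xml");
            String xml =
                "<purchaseReport xmlns=\"" + NS + "\""
                + " xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\""
                + " xsi:schemaLocation=\"" + NS + " mini-report.xsd\""
                + " period=\"" + period + "\"/>";
            Files.write(xsdFile, XSD.getBytes(StandardCharsets.UTF_8));
            Files.write(xmlFile, xml.getBytes(StandardCharsets.UTF_8));

            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            // No schema is passed in: the validator follows xsi:schemaLocation.
            Validator validator = factory.newSchema().newValidator();
            validator.validate(new StreamSource(xmlFile.toFile()));
            return null;
        } catch (Exception e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println("P3M:          " + firstError("P3M"));
        System.out.println("three months: " + firstError("three months"));
    }
}
```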
17.3.2 Validating Against Externally Referenced XML Schemas
$ORACLE_HOME/xdk/demo/java/schema/XSDSetSchema.java shows how to validate against an explicit XML schema. The validation mode is explicit because you use the XSDBuilder class to specify the schema to use for validation; the schema is not specified in the instance document as in implicit validation.
Follow the basic steps in this section to write Java programs that use the build() method of the oracle.xml.parser.schema.XSDBuilder class:
17.3.3 Validating a Subsection of an XML Document
In LAX mode, you can validate parts of an XML document without validating all of it. LAX parsing validates elements in a document that are declared in an associated XML schema. The processor does not consider the instance document invalid if it contains no elements declared in the schema.
By using LAX mode, you can define the schema only for the part of the XML to be validated. The $ORACLE_HOME/xdk/demo/java/schema/XSDLax.java program shows how to use LAX validation. The program follows the basic steps described in Validating Against Externally Referenced XML Schemas:
- Build an XML schema object from the user-specified XML schema document.
- Create a DOM parser to use for validation of the instance document.
- Specify the XML schema to use for validation.
- Set the validation mode for the DOM parser object.
- Set the output error stream for the parser.
- Validate the instance document against the XML schema by invoking DOMParser.parse().
To enable LAX validation, the program sets the validation mode in the parser to SCHEMA_LAX_VALIDATION rather than to SCHEMA_VALIDATION. This code fragment from XSDLax.java shows this technique:
dp.setXMLSchema(schemadoc);
dp.setValidationMode(XMLParser.SCHEMA_LAX_VALIDATION);
dp.setPreserveWhitespace (true);
. . .
You can test LAX validation by running the sample program:
java XSDLax embeded_xsql.xsd embeded_xsql.xml
17.3.4 Validating XML from a SAX Stream
$ORACLE_HOME/xdk/demo/java/schema/xsdsax.java shows how to validate an XML document received as a SAX stream. You instantiate an XSDValidator and register it with the SAX parser as the content handler.
Follow the steps in this section to write programs that validate XML from a SAX stream:
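The same pipeline shape (a validator registered as the content handler of a SAX parser) can be sketched with the standard JAXP classes in the JDK. Here javax.xml.validation.ValidatorHandler plays the role that XSDValidator plays in xsdsax.java; it is a stand-in, not the XDK class. The class name SaxStreamValidationSketch and the one-element schema are assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.ValidatorHandler;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

// Hypothetical helper; ValidatorHandler stands in for the XDK XSDValidator.
final class SaxStreamValidationSketch {
    // One-element schema invented for this sketch.
    static final String XSD =
        "<schema xmlns=\"http://www.w3.org/2001/XMLSchema\">"
        + "<element name=\"zip\"><complexType>"
        + "<attribute name=\"code\" type=\"positiveInteger\"/>"
        + "</complexType></element></schema>";

    // Streams the document through a SAX parser into the validator and
    // returns every validation problem that was reported.
    static List<String> validate(String xml) {
        List<String> errors = new ArrayList<>();
        try {
            Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
            ValidatorHandler validator = schema.newValidatorHandler();
            validator.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) {
                    errors.add("warning: " + e.getMessage());
                }
                @Override public void error(SAXParseException e) {
                    errors.add("error: " + e.getMessage());
                }
                @Override public void fatalError(SAXParseException e) throws SAXParseException {
                    throw e;
                }
            });

            SAXParserFactory parserFactory = SAXParserFactory.newInstance();
            parserFactory.setNamespaceAware(true); // the validator needs namespace events
            XMLReader reader = parserFactory.newSAXParser().getXMLReader();
            reader.setContentHandler(validator);   // validator receives the SAX events
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            errors.add("fatal: " + e.getMessage());
        }
        return errors;
    }

    public static void main(String[] args) {
        System.out.println(validate("<zip code=\"95819\"/>"));
        System.out.println(validate("<zip code=\"-1\"/>"));
    }
}
```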
17.3.5 Validating XML from a DOM
$ORACLE_HOME/xdk/demo/java/schema/xsddom.java shows how to validate an instance document by getting a DOM representation of the document and using an XSDValidator object to validate it.
The xsddom.java program follows these steps:
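As a rough analogue built only on the JDK, the sketch below parses a document into a DOM tree first and then hands the tree to a schema validator. It uses javax.xml.validation.Validator with a DOMSource rather than the XDK XSDValidator, so it illustrates the idea and does not reproduce xsddom.java. The class name DomValidationSketch and the small schema are hypothetical.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper; uses JAXP from the JDK, not the XDK XSDValidator.
final class DomValidationSketch {
    // Small schema invented for this sketch: parts must contain part elements.
    static final String XSD =
        "<schema xmlns=\"http://www.w3.org/2001/XMLSchema\">"
        + "<element name=\"parts\"><complexType><sequence>"
        + "<element name=\"part\" type=\"string\" maxOccurs=\"unbounded\"/>"
        + "</sequence></complexType></element></schema>";

    // Returns null when the DOM tree is valid, otherwise the first error message.
    static String firstError(String xml) {
        try {
            // Step 1: build the DOM tree without any validation.
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

            // Step 2: build the schema object.
            Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));

            // Step 3: validate the in-memory tree.
            schema.newValidator().validate(new DOMSource(doc));
            return null;
        } catch (Exception e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(firstError("<parts><part>Lawnmower</part></parts>"));
        System.out.println(firstError("<parts><widget/></parts>"));
    }
}
```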
17.3.6 Validating XML from Designed Types and Elements
$ORACLE_HOME/xdk/demo/java/schema/xsdproperty.java shows how to configure the XML Schema processor to validate an XML document based on a complex type or element declaration.
The xsdproperty.java program follows these steps:
-
Create String objects for the instance document name, XML schema name, root node namespace, root node local name, and specification of element or complex type ("true" means the root node is an element declaration). This code fragment shows this technique:
String xmlfile = args[0];
String xsdfile = args[1];
...
String ns = args[2]; //namespace for the root node
String nm = args[3]; //root node's local name
String el = args[4]; //true if root node is element declaration,
                     // otherwise, the root node is a complex type
-
Create an XSD builder and use it to create the schema object. This code fragment shows this technique:
XSDBuilder builder = new XSDBuilder();
URL url = XMLUtil.createURL(xsdfile);
XMLSchema schema;
...
schema = (XMLSchema) builder.build(url);
-
Get the node. Invoke different methods depending on whether the node is an element declaration or a complex type:
-
If the node is an element declaration, pass the local name and namespace to the getElement() method of the schema object.
-
If the node is a complex type, pass the namespace, local name, and root complex type to the getType() method of the schema object.
xsdproperty.java uses this control structure:
QxName qname = new QxName(ns, nm);
...
XSDNode nd;
...
if (el.equals("true"))
{
  nd = schema.getElement(ns, nm);
  /* process ... */
}
else
{
  nd = schema.getType(ns, nm, XSDNode.TYPE);
  /* process ... */
}
-
After getting the node, create a new parser and set the schema to the parser to enable schema validation. This code fragment shows this technique:
DOMParser dp = new DOMParser();
URL url = XMLUtil.createURL (xmlURI);
-
Set properties on the parser and then parse the URL. Invoke the setSchemaValidatorProperty() method:
-
Set the root element or type property on the parser to a fully qualified name.
For a top-level element declaration, set the property name to XSDNode.ROOT_ELEMENT and the value to a QName, as shown by the process1() method.
For a top-level type definition, set the property name to XSDNode.ROOT_TYPE and the value to a QName, as shown by the process2() method.
-
Set the root node property on the parser to an element or complex type node.
For an element node, set the property name to XSDNode.ROOT_NODE and the value to an XSDElement node, as shown by the process3() method.
For a type node, set the property name to XSDNode.ROOT_NODE and the value to an XSDComplexType node, as shown by the process3() method.
This code fragment shows the sequence of method invocation:
if (el.equals("true"))
{
  nd = schema.getElement(ns, nm);
  process1(xmlfile, schema, qname);
  process3(xmlfile, schema, nd);
}
else
{
  nd = schema.getType(ns, nm, XSDNode.TYPE);
  process2(xmlfile, schema, qname);
  process3(xmlfile, schema, nd);
}
The processing methods are implemented:
static void process1(String xmlURI, XMLSchema schema, QxName qname)
    throws Exception
{
  /* create parser... */
  dp.setXMLSchema(schema);
  dp.setSchemaValidatorProperty(XSDNode.ROOT_ELEMENT, qname);
  dp.setPreserveWhitespace (true);
  dp.setErrorStream (System.out);
  dp.parse (url);
  ...
}
static void process2(String xmlURI, XMLSchema schema, QxName qname)
    throws Exception
{
  /* create parser... */
  dp.setXMLSchema(schema);
  dp.setSchemaValidatorProperty(XSDNode.ROOT_TYPE, qname);
  dp.setPreserveWhitespace (true);
  dp.setErrorStream (System.out);
  dp.parse (url);
  ...
}
static void process3(String xmlURI, XMLSchema schema, XSDNode node)
    throws Exception
{
  /* create parser... */
  dp.setXMLSchema(schema);
  dp.setSchemaValidatorProperty(XSDNode.ROOT_NODE, node);
  dp.setPreserveWhitespace (true);
  dp.setErrorStream (System.out);
  dp.parse (url);
  ...
}
17.4 Tips and Techniques for Programming with XML Schemas
Topics include overriding schema location and converting a DTD to an XML schema.
17.4.1 Overriding the Schema Location with an Entity Resolver
When XSDBuilder builds a schema, it might need to include or import other schemas that are specified as URLs in a schemaLocation attribute. In some situations, you might want to override the schema locations specified in <import> and supply the builder with the required schema documents.
The xsdent.java demo described in Table 17-3 shows a case where a schema specified as schemaLocation needs to be imported. The document element in the xsdent.xml file contains this attribute:
xsi:schemaLocation = "http://www.example.com/BookCatalogue xsdent.xsd">
The xsdent.xsd document contains these elements:
<schema xmlns="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://www.example.com/BookCatalogue"
xmlns:catd = "http://www.example.com/Digest"
xmlns:cat = "http://www.example.com/BookCatalogue"
elementFormDefault="qualified">
<import namespace = "http://www.example.com/Digest"
schemaLocation = "xsdent-1.xsd" />
As an example of wanting to override schema locations specified in <import> and supplying the builder with the required schema documents, suppose that you have downloaded the schema documents from external websites and stored them in a database. In such a situation, you can set an entity resolver in the XSDBuilder. XSDBuilder passes the schema location to the resolver, which returns an InputStream, Reader, or URL as an InputSource. The builder can read the schema documents from the InputSource.
The xsdent.java program shows how you can override the schema location with an entity resolver. You must implement the EntityResolver interface, instantiate the entity resolver, and set it in the XML schema builder. In the demo code, sampleEntityResolver1 returns InputSource as an InputStream whereas sampleEntityResolver2 returns InputSource as a URL.
Follow these basic steps:
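The idea of intercepting a schema location can be sketched with the JDK alone. In standard JAXP the hook is SchemaFactory.setResourceResolver() with an LSResourceResolver, which is a different API from the EntityResolver that xsdent.java sets on XSDBuilder. The class name SchemaLocationOverrideSketch and both schema strings are hypothetical; the in-memory string stands in for a schema document fetched from a database.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSResourceResolver;

// Hypothetical helper; uses JAXP's LSResourceResolver, not the XDK entity resolver.
final class SchemaLocationOverrideSketch {
    static final String XS = "http://www.w3.org/2001/XMLSchema";

    // Main schema: imports the Digest namespace from "xsdent-1.xsd".
    static final String CATALOGUE_XSD =
        "<schema xmlns=\"" + XS + "\""
        + " targetNamespace=\"http://www.example.com/BookCatalogue\""
        + " xmlns:catd=\"http://www.example.com/Digest\""
        + " elementFormDefault=\"qualified\">"
        + "<import namespace=\"http://www.example.com/Digest\""
        + " schemaLocation=\"xsdent-1.xsd\"/>"
        + "<element name=\"book\" type=\"catd:DigestType\"/>"
        + "</schema>";

    // Imported schema, held in memory instead of at the stated location (assumption).
    static final String DIGEST_XSD =
        "<schema xmlns=\"" + XS + "\""
        + " targetNamespace=\"http://www.example.com/Digest\">"
        + "<simpleType name=\"DigestType\"><restriction base=\"string\">"
        + "<maxLength value=\"8\"/></restriction></simpleType>"
        + "</schema>";

    // Builds the schema and returns the system IDs the resolver served itself.
    static List<String> buildWithOverride() {
        List<String> served = new ArrayList<>();
        try {
            DOMImplementationLS ls = (DOMImplementationLS) DocumentBuilderFactory
                .newInstance().newDocumentBuilder()
                .getDOMImplementation().getFeature("LS", "3.0");
            LSResourceResolver resolver =
                (type, namespaceURI, publicId, systemId, baseURI) -> {
                    if (systemId == null || !systemId.endsWith("xsdent-1.xsd")) {
                        return null; // fall back to default resolution
                    }
                    LSInput input = ls.createLSInput();
                    input.setSystemId(systemId);
                    input.setCharacterStream(new StringReader(DIGEST_XSD));
                    served.add(systemId);
                    return input;
                };
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            factory.setResourceResolver(resolver);
            factory.newSchema(new StreamSource(new StringReader(CATALOGUE_XSD)));
        } catch (Exception e) {
            throw new IllegalStateException("schema could not be built", e);
        }
        return served;
    }

    public static void main(String[] args) {
        System.out.println("locations served from memory: " + buildWithOverride());
    }
}
```

If the resolver did not supply the imported document, the catd:DigestType reference could not be resolved and building the schema would fail, so a successful build indicates that the override took effect.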
17.4.2 Converting DTDs to XML Schemas
Because of the power and flexibility of the XML Schema language, you may want to convert your existing DTDs to XML schema documents. You can use XDK to perform this transformation.
The $ORACLE_HOME/xdk/demo/java/schema/DTD2Schema.java program shows how to convert a DTD. You can test the program:
java DTD2Schema dtd2schema.dtd dtd2schema.xml
Follow these basic steps to convert a DTD to an XML schema document: