XML in Databases
YoonJoon Lee (李潤俊)
Korea Advanced Institute of Science and Technology (KAIST)
Contents
• What is XML?
• XML Data vs Documents
• Store and retrieve XML in RDB
• GML
What is XML?
– A markup language that you can use to create your own tags
– Created by the W3C to overcome the limitations of HTML
– Based on SGML (Standard Generalized Markup Language, jokingly “Sounds great, maybe later”), which is used in the publishing industry
– Designed with the Web in mind
Origins of XML
– In 1996, Jon Bosak convinced the W3C to let him form a committee on using SGML on the Web.
– By November 1996, the committee had produced the beginnings of a simplified form of SGML; this was XML.
– In March 1997, Bosak released a paper “XML, Java and the Future of the Web.”
– SGML was created for general document structuring; HTML is an application of SGML for Web documents; XML is a simplification of SGML for general Web use.
A Sample XML document
<address>
  <name>
    <title>Mrs.</title>
    <first-name>Mary</first-name>
    <last-name>McGoon</last-name>
  </name>
  <street>1401 Main Street</street>
  <city state="NC">Anytown</city>
  <postal-code>34829</postal-code>
</address>
Tags, elements and attributes
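A tag is the name inside the angle brackets, an element is the whole unit from start tag to end tag (including its content), and an attribute is a name/value pair inside a start tag, such as state="NC" on city. As a minimal sketch using Python's standard xml.etree.ElementTree, the address sample above can be parsed to list each element's tag, attributes and text; the helper name describe is hypothetical, chosen for illustration.

```python
import xml.etree.ElementTree as ET

# The address sample from the slide.
ADDRESS = """<address>
  <name>
    <title>Mrs.</title>
    <first-name>Mary</first-name>
    <last-name>McGoon</last-name>
  </name>
  <street>1401 Main Street</street>
  <city state="NC">Anytown</city>
  <postal-code>34829</postal-code>
</address>"""


def describe(xml_text):
    """Return (tag, attributes, text) for every element in document order.

    describe is a hypothetical helper for illustration only.
    """
    root = ET.fromstring(xml_text)
    rows = []
    for elem in root.iter():  # depth-first walk, starting at the root element
        text = (elem.text or "").strip()  # whitespace-only text becomes ""
        rows.append((elem.tag, dict(elem.attrib), text))
    return rows


for tag, attrs, text in describe(ADDRESS):
    print(f"{tag:12} {attrs!s:18} {text!r}")
```

Only city carries an attribute; elements such as address and name hold child elements rather than text.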
DTD (1/2)
– Document Type Definition
– The basis of XML's extensibility: each DTD defines a dialect of XML, e.g.
• RDF, HL7 SGML/XML, MathML, XML/EDI, FDX
– Describes which tags the markup language has, which attributes each tag may have, and how tags may be combined.
– Specifies precisely what information may or may not be included in a document written in that markup language.
– DTD syntax is different from ordinary XML syntax.
DTD (2/2)
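To make the DTD concrete, the sketch below attaches an internal DTD to the address sample and uses Python's standard expat binding to collect the declared element and attribute names, then reports anything the document uses that the DTD does not declare. The declarations and the helper undeclared_names are hypothetical, written for illustration. Note that expat is non-validating: a real validating parser would also enforce content models such as (title?, first-name, last-name) and #REQUIRED attributes, which this sketch ignores.

```python
import xml.parsers.expat

# The address sample with a hypothetical internal DTD subset. The
# declarations are an illustrative assumption, not taken from the slides.
DOC = """<?xml version="1.0"?>
<!DOCTYPE address [
  <!ELEMENT address (name, street, city, postal-code)>
  <!ELEMENT name (title?, first-name, last-name)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT first-name (#PCDATA)>
  <!ELEMENT last-name (#PCDATA)>
  <!ELEMENT street (#PCDATA)>
  <!ELEMENT city (#PCDATA)>
  <!ATTLIST city state CDATA #REQUIRED>
  <!ELEMENT postal-code (#PCDATA)>
]>
<address>
  <name>
    <title>Mrs.</title>
    <first-name>Mary</first-name>
    <last-name>McGoon</last-name>
  </name>
  <street>1401 Main Street</street>
  <city state="NC">Anytown</city>
  <postal-code>34829</postal-code>
</address>"""


def undeclared_names(xml_text):
    """List elements and attributes used in the document but not declared in its DTD.

    Only names are checked; content models, attribute types and #REQUIRED
    are ignored, so this is a partial sketch, not a validating parser.
    """
    elements = set()
    attributes = {}  # element name -> set of declared attribute names
    problems = []

    def on_element_decl(name, model):
        elements.add(name)

    def on_attlist_decl(elname, attname, att_type, default, required):
        attributes.setdefault(elname, set()).add(attname)

    def on_start(name, attrs):
        # The internal subset precedes the root element, so both
        # declaration sets are complete when elements are reported.
        if name not in elements:
            problems.append(f"undeclared element <{name}>")
        for att in attrs:
            if att not in attributes.get(name, set()):
                problems.append(f"undeclared attribute {att} on <{name}>")

    parser = xml.parsers.expat.ParserCreate()
    parser.ElementDeclHandler = on_element_decl
    parser.AttlistDeclHandler = on_attlist_decl
    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)
    return problems


print(undeclared_names(DOC))
print(undeclared_names(DOC.replace('state="NC"', 'state="NC" country="US"')))
```

The DTD lines also illustrate the point that DTD syntax (<!ELEMENT ...>, <!ATTLIST ...>) differs from ordinary XML syntax.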
Is XML a DB?
• In the broad sense, yes: an XML document is a “collection of data”
• Advantages: self-describing, portable, data in tree or graph structure
• Disadvantages: verbose, slow access
+ Provides: storage, schemas, query languages, programming interfaces, …
- Lacks: efficient storage, indexes, security, transactions and data integrity, multi-user access, triggers, queries across multiple documents, …
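The trade-off can be seen in a small sketch: XML tools do offer a query interface, but without indexes every query re-parses and scans the data, and a query over several documents has to be written by hand. This uses the limited XPath subset supported by Python's xml.etree.ElementTree (the [.='text'] predicate needs Python 3.7 or later); the second document, its airline and city, and the function departures_from are hypothetical, for illustration only.

```python
import xml.etree.ElementTree as ET

# Several documents treated as one "collection of data".
# The second document is hypothetical illustrative data.
DOCS = [
    """<Flights><airline>ABC Airways</airline><origin>Dallas</origin>
       <flight><departure>09:15</departure></flight>
       <flight><departure>11:15</departure></flight></Flights>""",
    """<Flights><airline>XYZ Air</airline><origin>Austin</origin>
       <flight><departure>10:00</departure></flight></Flights>""",
]


def departures_from(docs, origin):
    """Return the departure times of all flights leaving origin.

    Every call parses and scans every document: there is no index, and the
    loop over documents is the hand-written "query across documents".
    """
    times = []
    for text in docs:
        root = ET.fromstring(text)
        # XPath subset: a child <origin> whose full text equals the value.
        if root.find(f"origin[.='{origin}']") is not None:
            times.extend(d.text for d in root.findall("flight/departure"))
    return times


print(departures_from(DOCS, "Dallas"))
```

A database would answer the same question from an index instead of re-reading every document.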
Why DB?
• Want to expose legacy data
• Looking for a place to store web pages
• Using a database in an e-commerce application that uses XML as a data transport format
• Key question: is the interest in the data or in the documents?
Data vs. Documents
• Is XML used simply as a data transport between the database and an application?
• Or is its use integral, as in the case of XHTML and DocBook documents?
Data-Centric Documents (1/2)
• For machine consumption
Ex) sales orders, flight schedules, …
• Fairly regular structure, fine-grained data, little or no mixed content; the order of sibling elements is generally not significant
Data-Centric Documents (2/2)
<FlightInfo>
  <airline>ABC Airways</airline> provides <count>three</count>
  non-stop flights daily from <origin>Dallas</origin>
  to <destination>Fort Worth</destination>. Departure times are
  <departure>09:15</departure>, <departure>11:15</departure>,
  and <departure>13:15</departure>. Arrival times are minutes later.
</FlightInfo>
<Flights>
  <airline>ABC Airways</airline>
  <origin>Dallas</origin>
  <destination>Fort Worth</destination>
  <flight>
    <departure>09:15</departure>
    <arrival>09:16</arrival>
  </flight>
  <flight>
    <departure>11:15</departure>
    <arrival>11:16</arrival>
  </flight>
  <flight>
    <departure>13:15</departure>
    <arrival>13:16</arrival>
  </flight>
</Flights>
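Because the Flights form has a regular structure and no mixed content, it maps directly onto rows of a table. A minimal sketch with Python's xml.etree.ElementTree, assuming one row per flight element with the shared airline, origin and destination copied into each row; this table-based mapping and the name flights_to_rows are illustrative choices, not a prescribed method.

```python
import xml.etree.ElementTree as ET

FLIGHTS = """<Flights>
  <airline>ABC Airways</airline>
  <origin>Dallas</origin>
  <destination>Fort Worth</destination>
  <flight><departure>09:15</departure><arrival>09:16</arrival></flight>
  <flight><departure>11:15</departure><arrival>11:16</arrival></flight>
  <flight><departure>13:15</departure><arrival>13:16</arrival></flight>
</Flights>"""


def flights_to_rows(xml_text):
    """Flatten the data-centric Flights document into
    (airline, origin, destination, departure, arrival) tuples,
    one per <flight> element."""
    root = ET.fromstring(xml_text)
    # findtext returns the text of the first matching child element.
    shared = tuple(root.findtext(tag) for tag in ("airline", "origin", "destination"))
    return [
        shared + (f.findtext("departure"), f.findtext("arrival"))
        for f in root.findall("flight")
    ]


for row in flights_to_rows(FLIGHTS):
    print(row)
```

Since sibling order carries no meaning here, the rows can be stored in any order and sorted by departure when read back.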
Document-Centric Documents (1/2)
• For human consumption
Ex) books, email, advertisements, …
• Less regular or irregular structure, larger-grained data, lots of mixed content; the order of sibling elements is almost always significant
Document-Centric Documents (2/2)
<Product>
  <intro>
    The <productname>Turkey Wrench</productname> from <developer>Full Fabrication Labs, Inc.</developer> is <summary>like a monkey wrench, but not as big.</summary>
  </intro>
  <description>
    <para>The turkey wrench, which comes in <i>both right- and left-handed versions (skyhook optional)</i>, is made of the <b>finest stainless steel</b>. The Readi-grip rubberized handle quickly adapts to your hands, even in the greasiest situations. Adjustment is possible through a variety of custom dials.</para>
    <para>You can:</para>
    <list>
      <item><link url="Order.html">Order your own turkey wrench</link></item>
      <item><link url="Wrenches.htm">Read more about wrenches</link></item>
      <item><link url="Catalog.zip">Download the catalog</link></item>
    </list>
    <para>The turkey wrench costs <b>just $19.99</b> and, if you order now, comes with a <b>hand-crafted shrimp hammer</b> as a bonus gift.</para>
  </description>
</Product>
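Mixed content means an element holds both non-whitespace text and child elements, as intro and para do here. A minimal sketch that finds such elements with Python's xml.etree.ElementTree, where character data lives in an element's text (before its first child) and in each child's tail (after that child); mixed_elements is a hypothetical helper, and the input is a shortened excerpt of the Product document above.

```python
import xml.etree.ElementTree as ET

# Shortened excerpt of the Product document.
PRODUCT = """<Product>
  <intro>The <productname>Turkey Wrench</productname> from
    <developer>Full Fabrication Labs, Inc.</developer> is
    <summary>like a monkey wrench, but not as big.</summary></intro>
  <description>
    <para>The turkey wrench costs <b>just $19.99</b>.</para>
  </description>
</Product>"""


def mixed_elements(xml_text):
    """Return the tags of elements that contain both child elements and
    non-whitespace character data, in document order."""
    result = []
    for elem in ET.fromstring(xml_text).iter():
        children = list(elem)
        # Text before the first child is elem.text; text after each
        # child is stored on that child as child.tail.
        pieces = [elem.text or ""] + [c.tail or "" for c in children]
        if children and any(p.strip() for p in pieces):
            result.append(elem.tag)
    return result


print(mixed_elements(PRODUCT))
```

In a mixed element the text pieces only make sense in their original order, which is why sibling order matters in document-centric XML.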
Store & Retrieve XML
• File
• RDBMS
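The two options differ in what the store understands. A file keeps the document intact but opaque. Mapping it into relational tables lets SQL query the data. A minimal sketch of the table-based route using Python's built-in sqlite3 and xml.etree.ElementTree, assuming a single hypothetical flight table filled from the Flights example in the data-centric slides. The names store and retrieve are illustrative, not an established API.

```python
import sqlite3
import xml.etree.ElementTree as ET

FLIGHTS = """<Flights>
  <airline>ABC Airways</airline>
  <origin>Dallas</origin>
  <destination>Fort Worth</destination>
  <flight><departure>09:15</departure><arrival>09:16</arrival></flight>
  <flight><departure>11:15</departure><arrival>11:16</arrival></flight>
</Flights>"""


def store(conn, xml_text):
    """Shred the document into one row per <flight> in a hypothetical table."""
    conn.execute("""CREATE TABLE IF NOT EXISTS flight (
        airline TEXT, origin TEXT, destination TEXT,
        departure TEXT, arrival TEXT)""")
    root = ET.fromstring(xml_text)
    shared = [root.findtext(t) for t in ("airline", "origin", "destination")]
    conn.executemany(
        "INSERT INTO flight VALUES (?, ?, ?, ?, ?)",
        [shared + [f.findtext("departure"), f.findtext("arrival")]
         for f in root.findall("flight")],
    )


def retrieve(conn, airline):
    """Rebuild a Flights document for one airline from the table.

    Assumes all of an airline's rows share one origin and destination,
    as in the example.
    """
    rows = conn.execute(
        "SELECT origin, destination, departure, arrival FROM flight "
        "WHERE airline = ? ORDER BY departure", (airline,)).fetchall()
    root = ET.Element("Flights")
    ET.SubElement(root, "airline").text = airline
    if rows:
        ET.SubElement(root, "origin").text = rows[0][0]
        ET.SubElement(root, "destination").text = rows[0][1]
    for _, _, dep, arr in rows:
        flight = ET.SubElement(root, "flight")
        ET.SubElement(flight, "departure").text = dep
        ET.SubElement(flight, "arrival").text = arr
    return ET.tostring(root, encoding="unicode")


conn = sqlite3.connect(":memory:")
store(conn, FLIGHTS)
print(retrieve(conn, "ABC Airways"))
```

The round trip keeps the data but not the original whitespace or layout. That is acceptable for data-centric documents. A document-centric one, such as the Product example, would more naturally be kept whole (as a file or a single text value) so that its mixed content and sibling order survive.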