XML Canonicalization (1)


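The main purpose of this article is to explain how to canonicalize an XML document. To keep the focus on the canonicalization rules, I left out some material when translating (XML digital signatures, asymmetric key systems, and message digests).
Let's start by looking at the following two files (File 1 and File 2).
** File 1 **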

<rooms>
<room type="single" charge="50" currency="USD"></room>
<room type="double" charge="70" currency="USD"></room>
<room type="suite" charge="100" currency="USD"></room>
</rooms>

** File 2 **

<rooms>
<room charge="50" currency="USD" type="single"></room>
<room charge="70" currency="USD" type="double"></room>
<room charge="100" currency="USD" type="suite"></room>
</rooms>

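You will probably say that these two files are the same. That is right: they carry the same information and use the same document structure, so they are logically equivalent. You may also have noticed a small difference between them: some of the content appears in a different order.
In this example the attributes of the room elements are written in a different order in the two files, so the corresponding byte streams differ. There are many other reasons why logically equivalent XML documents can have different byte streams. The purpose of a canonical form for XML documents is to make it possible to decide whether different XML documents are logically equivalent. The W3C has defined canonicalization rules, and canonicalizing two logically equivalent documents with these rules yields the same document. To decide whether two XML documents are logically equivalent, we first canonicalize them, then convert them to byte streams and compare the results. If the byte streams are identical, we can conclude that the two documents are logically equivalent.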
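
The comparison procedure can be tried out in Python. The sketch below is only an illustration and rests on a few assumptions: it needs Python 3.8 or later for `xml.etree.ElementTree.canonicalize`, that function implements C14N 2.0 rather than the original Canonical XML rules described in this article (both sort attributes), and the helper name `logically_equal` and the one-room sample documents are made up for the example.

```python
from xml.etree.ElementTree import canonicalize


def logically_equal(xml_a: str, xml_b: str) -> bool:
    """Compare two XML documents by their canonical UTF-8 byte streams."""
    bytes_a = canonicalize(xml_a).encode("utf-8")
    bytes_b = canonicalize(xml_b).encode("utf-8")
    return bytes_a == bytes_b


# Same room data, attributes written in a different order (hypothetical sample).
doc_a = '<rooms><room type="single" charge="50" currency="USD"/></rooms>'
doc_b = '<rooms><room charge="50" currency="USD" type="single"/></rooms>'

print(doc_a == doc_b)                 # the raw text differs
print(logically_equal(doc_a, doc_b))  # the canonical byte streams match
```

Comparing the raw strings fails on the attribute order alone, while the canonical byte streams agree.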
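The XML canonicalization rules define how to produce the canonical form of an XML document. The rest of this article takes one file (File 3) as an example and canonicalizes it step by step.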

** File 3 **

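<!DOCTYPE product [
<!ENTITY testhistory "Part has been tested according to the specified standards.">
<!ATTLIST part approved CDATA "yes">
]>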

<product classification="MeasuringInstruments/Electrical/Energy/" id="P 184.435" name='rotating disc "Energymeter"' xmlns="http://www.myFictitiousCompany.com/product" xmlns:sup="http://www.myFictitiousCompany.com/supplier">
<parts>
<part id="P 184.675" name="bearing">
<sup:supplier id="S  1753"/>
<sup:supplier id="S 2341"/>
<sup:supplier id="S 3276"/>
<comments>&testhistory;</comments>
</part>
<part id="P 184.871" name="magnet" xmlns="http://www.myFictitiousCompany.com/product">
<sup:supplier id="S 3908"/>
<sup:supplier id="S 4589"/>
<sup:supplier id="S 1098"/>
<comments>&testhistory;</comments>
</part>
</parts>
</product>

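** 1 Encoding **

An encoding represents characters as bytes according to a particular scheme. Clearly, documents with the same content but different encodings produce different byte streams.
The Canonical XML specification requires the canonical form to be encoded in UTF-8. If the document to be canonicalized uses any other encoding, it must first be converted to UTF-8.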
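
As a rough sketch of the conversion step, the Python snippet below re-encodes a document as UTF-8. The helper name `to_utf8` and the sample text are hypothetical, and the source encoding is assumed to be known in advance; detecting it from the XML declaration is left out.

```python
def to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Re-encode raw document bytes as UTF-8 (the source encoding must be known)."""
    return data.decode(source_encoding).encode("utf-8")


text = "<note>café</note>"
latin1_bytes = text.encode("iso-8859-1")
utf8_bytes = to_utf8(latin1_bytes, "iso-8859-1")

print(latin1_bytes == utf8_bytes)          # same text, different bytes
print(utf8_bytes == text.encode("utf-8"))  # identical after conversion
```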


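** 2 Line breaks **

Text files usually represent a line break with #xA, #xD (hexadecimal), or a combination of the two. An XML document is an ordinary text file, so it also uses #xA and #xD as line breaks. The canonical form requires every line break to be represented as #xA.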
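
A minimal Python sketch of this rule, assuming the document is already available as a decoded string; the function name `normalize_line_breaks` is made up for illustration.

```python
def normalize_line_breaks(text: str) -> str:
    """Replace CR LF pairs and lone CR characters with a single LF (#xA)."""
    # Handle the two-character sequence first so it does not become two LFs.
    return text.replace("\r\n", "\n").replace("\r", "\n")


sample = "<a>\r\n<b/>\r</a>\n"
print(repr(normalize_line_breaks(sample)))
```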

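** 3 Whitespace **

Canonicalization requires whitespace characters inside attribute values (a tab, for example) to be converted to a space (#x20). File 4 is the file after conversion. Note: in File 3, the id value of the first sup:supplier element contains a tab between S and 1753.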
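
An XML parser already performs this conversion when it reads attribute values. The short Python check below assumes the standard `xml.etree.ElementTree` parser and a cut-down, hypothetical `supplier` element; the `\t` escape in the Python string puts a literal tab into the XML text.

```python
import xml.etree.ElementTree as ET

# A literal tab inside the attribute value, as in the supplier id of File 3.
element = ET.fromstring('<supplier id="S\t1753"/>')
normalized_id = element.get("id")

print(repr(normalized_id))  # the tab has been replaced by a single space
```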

** File 4 **

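<!DOCTYPE product [
<!ENTITY testhistory "Part has been tested according to the specified standards.">
<!ATTLIST part approved CDATA "yes">
]>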

<product classification="MeasuringInstruments/Electrical/Energy/" id="P 184.435" name='rotating disc "Energymeter"' xmlns="http://www.myFictitiousCompany.com/product" xmlns:sup="http://www.myFictitiousCompany.com/supplier">
<parts>
<part id="P 184.675" name="bearing">
<sup:supplier id="S 1753"/>
<sup:supplier id="S 2341"/>
<sup:supplier id="S 3276"/>
<comments>&testhistory;</comments>
</part>
<part id="P 184.871" name="magnet" xmlns="http://www.myFictitiousCompany.com/product">
<sup:supplier id="S 3908"/>
<sup:supplier id="S 4589"/>
<sup:supplier id="S 1098"/>
<comments>&testhistory;</comments>
</part>
</parts>
</product>

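** 4 Double quotes around attribute values **

In the canonical form of an XML document, attribute values must be enclosed in double quotes. In File 4 the value of the name attribute is enclosed in single quotes, which must be changed to double quotes. File 5 is the file after this change.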

** File 5 **

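<!DOCTYPE product [
<!ENTITY testhistory "Part has been tested according to the specified standards.">
<!ATTLIST part approved CDATA "yes">
]>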

<product classification="MeasuringInstruments/Electrical/Energy/" id="P 184.435" name="rotating disc "Energymeter"" xmlns="http://www.myFictitiousCompany.com/product" xmlns:sup="http://www.myFictitiousCompany.com/supplier">
<parts>
<part id="P 184.675" name="bearing">
<sup:supplier id="S 1753"/>
<sup:supplier id="S 2341"/>
<sup:supplier id="S 3276"/>
<comments>&testhistory;</comments>
</part>
<part id="P 184.871" name="magnet" xmlns="http://www.myFictitiousCompany.com/product">
<sup:supplier id="S 3908"/>
<sup:supplier id="S 4589"/>
<sup:supplier id="S 1098"/>
<comments>&testhistory;</comments>
</part>
</parts>
</product>

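** 5 Special characters in attribute values **

File 5 has a problem: the value of the name attribute itself contains double quotes. The canonicalization rules require special characters in attribute values (such as the double quote) to be replaced by the corresponding escape (for example, &quot; in place of a double quote). File 6 is the file after this change.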
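
The two attribute rules (double quotes as delimiters, escaped special characters) can be sketched in Python as follows. The function names are invented for this example. Besides the double quote, the sketch also escapes `&`, `<`, tab, line feed, and carriage return, which is the set the Canonical XML recommendation lists for attribute values; treat it as an illustration, not a complete serializer.

```python
def escape_attribute_value(value: str) -> str:
    """Escape the characters that may not appear literally in a canonical attribute value."""
    replacements = [
        ("&", "&amp;"),  # must come first so later escapes are not escaped again
        ("<", "&lt;"),
        ('"', "&quot;"),
        ("\t", "&#x9;"),
        ("\n", "&#xA;"),
        ("\r", "&#xD;"),
    ]
    for raw, escaped in replacements:
        value = value.replace(raw, escaped)
    return value


def serialize_attribute(name: str, value: str) -> str:
    """Render name="value" using double quotes as the delimiters."""
    return f'{name}="{escape_attribute_value(value)}"'


print(serialize_attribute("name", 'rotating disc "Energymeter"'))
# name="rotating disc &quot;Energymeter&quot;"
```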

** File 6 **

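<!DOCTYPE product [
<!ENTITY testhistory "Part has been tested according to the specified standards.">
<!ATTLIST part approved CDATA "yes">
]>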

<product classification="MeasuringInstruments/Electrical/Energy/" id="P 184.435" name="rotating disc &quot;Energymeter&quot;" xmlns="http://www.myFictitiousCompany.com/product" xmlns:sup="http://www.myFictitiousCompany.com/supplier">
<parts>
<part id="P 184.675" name="bearing">
<sup:supplier id="S 1753"/>
<sup:supplier id="S 2341"/>
<sup:supplier id="S 3276"/>
<comments>&testhistory;</comments>
</part>
<part id="P 184.871" name="magnet" xmlns="http://www.myFictitiousCompany.com/product">
<sup:supplier id="S 3908"/>
<sup:supplier id="S 4589"/>
<sup:supplier id="S 1098"/>
<comments>&testhistory;</comments>
</part>
</parts>
</product>

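** 6 Entity references **

File 6 contains a DTD declaration that defines an entity, testhistory, which is referenced by the comments elements. Canonicalization requires that no entity references remain in the document; each reference must be replaced by the entity's content. File 7 is the document after this step.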
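
A parser that reads the DTD replaces entity references on its own. The Python sketch below assumes the standard `xml.etree.ElementTree` parser, which expands entities declared in the internal DTD subset; the reduced `part` document is a hypothetical stand-in for the example file, using the same entity name and text.

```python
import xml.etree.ElementTree as ET

# Hypothetical cut-down document with one internal entity declaration.
document = """<!DOCTYPE part [
<!ENTITY testhistory "Part has been tested according to the specified standards.">
]>
<part><comments>&testhistory;</comments></part>"""

root = ET.fromstring(document)
comment_text = root.find("comments").text

print(comment_text)  # the entity's replacement text, no reference left
```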
** File 7 **

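<!DOCTYPE product [
<!ENTITY testhistory "Part has been tested according to the specified standards.">
<!ATTLIST part approved CDATA "yes">
]>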

<product classification="MeasuringInstruments/Electrical/Energy/" id="P 184.435" name="rotating disc &quot;Energymeter&quot;" xmlns="http://www.myFictitiousCompany.com/product" xmlns:sup="http://www.myFictitiousCompany.com/supplier">
<parts>
<part id="P 184.675" name="bearing">
<sup:supplier id="S 1753"/>
<sup:supplier id="S 2341"/>
<sup:supplier id="S 3276"/>
<comments>Part has been tested according to the specified standards.</comments>
</part>
<part id="P 184.871" name="magnet" xmlns="http://www.myFictitiousCompany.com/product">
<sup:supplier id="S 3908"/>
<sup:supplier id="S 4589"/>
<sup:supplier id="S 1098"/>
<comments>Part has been tested according to the specified standards.</comments>
</part>
</parts>
</product>

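** 7 Default attributes **

The DTD in File 7 defines a default attribute, approved, for the part element. In a canonical document, default attributes must appear explicitly among the element's attributes. File 8 is the file after this step.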
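
Default attributes can be observed in the same way. In this Python sketch the `ATTLIST` declaration and its `CDATA` type are assumptions made for illustration (the article's own DTD may be written differently), and the cut-down document is hypothetical. The standard `xml.etree.ElementTree` parser is expected to report a default from the internal DTD subset as if it had been written on the element.

```python
import xml.etree.ElementTree as ET

# Hypothetical cut-down document; the ATTLIST syntax is an assumption.
document = """<!DOCTYPE parts [
<!ATTLIST part approved CDATA "yes">
]>
<parts><part id="P 184.675" name="bearing"/></parts>"""

root = ET.fromstring(document)
part = root.find("part")

# The approved attribute is not written on the element but is reported anyway.
print(sorted(part.attrib.items()))
```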
** File 8 **

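<!DOCTYPE product [
<!ENTITY testhistory "Part has been tested according to the specified standards.">
<!ATTLIST part approved CDATA "yes">
]>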

<product classification="MeasuringInstruments/Electrical/Energy/" id="P 184.435" name="rotating disc &quot;Energymeter&quot;" xmlns="http://www.myFictitiousCompany.com/product" xmlns:sup="http://www.myFictitiousCompany.com/supplier">
<parts>
<part approved="yes" id="P 184.675" name="bearing">
<sup:supplier id="S 1753"/>
<sup:supplier id="S 2341"/>
<sup:supplier id="S 3276"/>
<comments>Part has been tested according to the specified standards.</comments>
</part>
<part approved="yes" id="P 184.871" name="magnet" xmlns="http://www.myFictitiousCompany.com/product">
<sup:supplier id="S 3908"/>
<sup:supplier id="S 4589"/>
<sup:supplier id="S 1098"/>
<comments>Part has been tested according to the specified standards.</comments>
</part>
</parts>
</product>

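** 9 XML and DTD declarations **

A canonical XML document must not contain an XML declaration or a DTD declaration. File 9 is the file with the XML and DTD declarations removed.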
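
Here is a hedged Python sketch of this step, assuming Python 3.8 or later. `xml.etree.ElementTree.canonicalize` follows C14N 2.0 rather than the original rules, but it likewise writes neither an XML declaration nor a DTD declaration. The sample document is a made-up reduction of the example file.

```python
from xml.etree.ElementTree import canonicalize

# Hypothetical reduced document with both kinds of declaration.
document = """<?xml version="1.0"?>
<!DOCTYPE product [
<!ENTITY testhistory "Part has been tested according to the specified standards.">
]>
<product><comments>&testhistory;</comments></product>"""

canonical = canonicalize(document)
print(canonical)
```

The output begins directly with the `product` start tag, so nothing precedes the document element either.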

** File 9 **

<product classification="MeasuringInstruments/Electrical/Energy/" id="P 184.435" name="rotating disc &quot;Energymeter&quot;" xmlns="http://www.myFictitiousCompany.com/product" xmlns:sup="http://www.myFictitiousCompany.com/supplier">
<parts>
<part approved="yes" id="P 184.675" name="bearing">
<sup:supplier id="S 1753"/>
<sup:supplier id="S 2341"/>
<sup:supplier id="S 3276"/>
<comments>Part has been tested according to the specified standards.</comments>
</part>
<part approved="yes" id="P 184.871" name="magnet" xmlns="http://www.myFictitiousCompany.com/product">
<sup:supplier id="S 3908"/>
<sup:supplier id="S 4589"/>
<sup:supplier id="S 1098"/>
<comments>Part has been tested according to the specified standards.</comments>
</part>
</parts>
</product>

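** 10 Whitespace outside the document element **

A canonical XML document must not contain whitespace outside the document element. The document begins with "<", and no whitespace may precede that "<". File 10 is the file after the whitespace in front of "<" has been removed.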

** File 10 **

<product classification="MeasuringInstruments/Electrical/Energy/" id="P 184.435" name="rotating disc &quot;Energymeter&quot;" xmlns="http://www.myFictitiousCompany.com/product" xmlns:sup="http://www.myFictitiousCompany.com/supplier">
<parts>
<part approved="yes" id="P 184.675" name="bearing">
<sup:supplier id="S 1753"/>
<sup:supplier id="S 2341"/>
<sup:supplier id="S 3276"/>
<comments>Part has been tested according to the specified standards.</comments>
</part>
<part approved="yes" id="P 184.871" name="magnet" xmlns="http://www.myFictitiousCompany.com/product">
<sup:supplier id="S 3908"/>
<sup:supplier id="S 4589"/>
<sup:supplier id="S 1098"/>
<comments>Part has been tested according to the specified standards.</comments>
</part>
</parts>
</product>

** 11 Whitespace in start and end tags **

No whitespace may appear between "<" and the element name, and no whitespace may appear before ">".

** 12 Empty elements **

In a canonical XML document, an empty element must appear as a start-tag and end-tag pair. Converting

<emptyelement/>

to

<emptyelement></emptyelement>

gives File 11.
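
The same conversion can be seen with Python's `xml.etree.ElementTree.canonicalize` (Python 3.8 or later, C14N 2.0), shown here only as a quick illustration.

```python
from xml.etree.ElementTree import canonicalize

expanded = canonicalize("<emptyelement/>")
print(expanded)
# <emptyelement></emptyelement>
```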

** File 11 **

<product classification="MeasuringInstruments/Electrical/Energy/" id="P 184.435" name="rotating disc &quot;Energymeter&quot;" xmlns="http://www.myFictitiousCompany.com/product" xmlns:sup="http://www.myFictitiousCompany.com/supplier">
<parts>
<part approved="yes" id="P 184.675" name="bearing">
<sup:supplier id="S 1753"></sup:supplier>
<sup:supplier id="S 2341"></sup:supplier>
<sup:supplier id="S 3276"></sup:supplier>
<comments>Part has been tested according to the specified standards.</comments>
</part>
<part approved="yes" id="P 184.871" name="magnet" xmlns="http://www.myFictitiousCompany.com/product">
<sup:supplier id="S 3908"></sup:supplier>
<sup:supplier id="S 4589"></sup:supplier>
<sup:supplier id="S 1098"></sup:supplier>
<comments>Part has been tested according to the specified standards.</comments>
</part>
</parts>
</product>

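** 13 Namespace declarations **

Canonicalization keeps every namespace declaration in the document except redundant ones. In File 11 the namespace declaration on the second part element is redundant: removing it does not change the namespace context of any node in the document. File 12 is the file after it has been removed.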
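
As an illustration, the Python sketch below feeds a reduced, hypothetical version of the example (a root element and two `part` children, the second repeating the default namespace) to `xml.etree.ElementTree.canonicalize`. This assumes Python 3.8 or later; because that function follows C14N 2.0, its handling of other namespace declarations, such as unused prefixes, may differ from the rules described here.

```python
from xml.etree.ElementTree import canonicalize

# Hypothetical reduced document: the second part repeats the default namespace.
document = (
    '<product xmlns="http://www.myFictitiousCompany.com/product">'
    '<part/>'
    '<part xmlns="http://www.myFictitiousCompany.com/product"/>'
    '</product>'
)

canonical = canonicalize(document)
declaration_count = canonical.count("xmlns=")

print(canonical)
print(declaration_count)  # only the declaration on the root element remains
```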

** File 12 **

<product classification="MeasuringInstruments/Electrical/Energy/" id="P 184.435" name="rotating disc &quot;Energymeter&quot;" xmlns="http://www.myFictitiousCompany.com/product" xmlns:sup="http://www.myFictitiousCompany.com/supplier">
<parts>
<part approved="yes" id="P 184.675" name="bearing">
<sup:supplier id="S 1753"></sup:supplier>
<sup:supplier id="S 2341"></sup:supplier>
<sup:supplier id="S 3276"></sup:supplier>
<comments>Part has been tested according to the specified standards.</comments>
</part>
<part approved="yes" id="P 184.871" name="magnet">
<sup:supplier id="S 3908"></sup:supplier>
<sup:supplier id="S 4589"></sup:supplier>
<sup:supplier id="S 1098"></sup:supplier>
<comments>Part has been tested according to the specified standards.</comments>
</part>
</parts>
</product>

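** 14 Ordering of element attributes **

Canonicalization requires the attributes of an element to be sorted in ascending lexicographic order. Within an element, the namespace declarations come first, followed by the attributes (name and value). File 13 is the file after sorting.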
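
The ordering rule can be sketched as a sort key in Python. The function name `canonical_attribute_order` is invented, and the sketch assumes that all ordinary attributes are unprefixed; prefixed attributes would have to be ordered by namespace URI first, which is not handled here.

```python
from typing import Dict, List, Tuple


def canonical_attribute_order(attributes: Dict[str, str]) -> List[Tuple[str, str]]:
    """Put namespace declarations first, then the remaining attributes by name.

    Simplifying assumption: ordinary attributes are unprefixed, so sorting
    them by name is enough.
    """
    def sort_key(item: Tuple[str, str]) -> Tuple[int, str]:
        name, _value = item
        is_namespace = name == "xmlns" or name.startswith("xmlns:")
        return (0 if is_namespace else 1, name)

    return sorted(attributes.items(), key=sort_key)


# Attributes of the product element, deliberately shuffled.
product_attributes = {
    "id": "P 184.435",
    "xmlns:sup": "http://www.myFictitiousCompany.com/supplier",
    "classification": "MeasuringInstruments/Electrical/Energy/",
    "xmlns": "http://www.myFictitiousCompany.com/product",
    "name": 'rotating disc "Energymeter"',
}

ordered_names = [name for name, _value in canonical_attribute_order(product_attributes)]
print(ordered_names)
```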
** File 13 **

<product xmlns="http://www.myFictitiousCompany.com/product" xmlns:sup="http://www.myFictitiousCompany.com/supplier" classification="MeasuringInstruments/Electrical/Energy/" id="P 184.435" name="rotating disc &quot;Energymeter&quot;">
<parts>
<part approved="yes" id="P 184.675" name="bearing">
<sup:supplier id="S 1753"></sup:supplier>
<sup:supplier id="S 2341"></sup:supplier>
<sup:supplier id="S 3276"></sup:supplier>
<comments>Part has been tested according to the specified standards.</comments>
</part>
<part approved="yes" id="P 184.871" name="magnet">
<sup:supplier id="S 3908"></sup:supplier>
<sup:supplier id="S 4589"></sup:supplier>
<sup:supplier id="S 1098"></sup:supplier>
<comments>Part has been tested according to the specified standards.</comments>
</part>
</parts>
</product>

(To be continued)

XML canonicalization rules (defined by the W3C)
