How to use regular expressions to strip redundant style code from HTML saved from a Word file

Below is the HTML produced by saving a Word file as HTML:

<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
<meta content="Microsoft Word 10 (filtered)" name="Generator"/>
<title>Hello1</title>
<style>
<!--
/* Font Definitions */
@font-face
{font-family:宋体;
panose-1:2 1 6 0 3 1 1 1 1 1;}
@font-face
{font-family:"\@宋体";
panose-1:2 1 6 0 3 1 1 1 1 1;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
margin-bottom:.0001pt;
text-align:justify;
text-justify:inter-ideograph;
font-size:10.5pt;
font-family:"Times New Roman";}
/* Page Definitions */
@page Section1
{size:595.3pt 841.9pt;
margin:72.0pt 90.0pt 72.0pt 90.0pt;
layout-grid:15.6pt;}
div.Section1
{page:Section1;}
-->
</style>
</head>
<body lang="ZH-CN" style="text-justify-trim:punctuation">
<div class="Section1" style="layout-grid:15.6pt">
<table align="left" border="1" cellpadding="0" cellspacing="0" class="MsoNormalTable" style="border-collapse:collapse;border:none;margin-left:6.75pt;margin-right:
6.75pt">
<tr style="height:22.5pt">
<td rowspan="2" style="width:63.0pt;border:solid windowtext 1.0pt;
padding:0cm 5.4pt 0cm 5.4pt;height:22.5pt" valign="top" width="84">
<p class="MsoNormal"><span lang="EN-US">Hello1</span></p>
</td>
<td colspan="3" style="width:261.0pt;border:solid windowtext 1.0pt;
border-left:none;padding:0cm 5.4pt 0cm 5.4pt;height:22.5pt" valign="top" width="348">
<p class="MsoNormal"><span lang="EN-US">Hello2</span></p>
</td>
</tr>
<tr style="height:15.0pt">
<td colspan="3" style="width:261.0pt;border-top:none;
border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;
padding:0cm 5.4pt 0cm 5.4pt;height:15.0pt" valign="top" width="348">
<p class="MsoNormal"><span lang="EN-US">Hello3</span></p>
</td>
</tr>
<tr style="height:31.5pt">
<td colspan="2" style="width:99.0pt;border:solid windowtext 1.0pt;
border-top:none;padding:0cm 5.4pt 0cm 5.4pt;height:31.5pt" valign="top" width="132">
<p class="MsoNormal"><span lang="EN-US">Hello4</span></p>
</td>
<td style="width:117.0pt;border-top:none;border-left:
none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;
padding:0cm 5.4pt 0cm 5.4pt;height:31.5pt" valign="top" width="156">
<p class="MsoNormal"><span lang="EN-US">Hello5</span></p>
</td>
<td style="width:108.0pt;border-top:none;border-left:
none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;
padding:0cm 5.4pt 0cm 5.4pt;height:31.5pt" valign="top" width="144">
<p class="MsoNormal"><span lang="EN-US">Hello6</span></p>
</td>
</tr>
<tr height="0">
<td style="border:none" width="84"></td>
<td style="border:none" width="48"></td>
<td style="border:none" width="156"></td>
<td style="border:none" width="144"></td>
</tr>
</table>
<p class="MsoNormal"><span lang="EN-US"> </span></p>
</div>
</body>
</html>

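I want to strip the style attributes from the <table>, <tr>, and <td> tags throughout this code. For example, I want to turn
<td style="width:108.0pt;border-top:none;border-left:
none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;
padding:0cm 5.4pt 0cm 5.4pt;height:31.5pt" valign="top" width="144">
into a clean <td>.
I have written many patterns myself, but none of them work. I am new to regular expressions and not yet familiar with them, so I would appreciate any help. VBScript code would be ideal.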
---------------------------------------------------------------
objRegExp.Pattern = "<td(.|\n)+?>"
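
A minimal sketch of how the full replacement might look in VBScript, extended to cover <table> and <tr> as well as <td>. The function name StripTableAttributes is hypothetical and is not part of the original reply. The sketch assumes that no attribute value contains a literal ">" character, which holds for the Word output shown in the question. It swaps the (.|\n)+? alternation for the negated class [^>]*, which also matches line breaks, so style values that wrap across lines are still covered.

```vbscript
Option Explicit

' Hypothetical helper: removes every attribute from the opening
' <table>, <tr> and <td> tags in the given HTML string.
Function StripTableAttributes(html)
    Dim objRegExp
    Set objRegExp = New RegExp
    objRegExp.Global = True       ' replace all matches, not only the first
    objRegExp.IgnoreCase = True   ' match <TD ...> as well as <td ...>
    ' (table|tr|td) captures the tag name as $1.
    ' \b stops "tr" or "td" from matching the start of a longer tag name.
    ' [^>]* consumes everything up to the closing ">", including newlines.
    ' Assumption: attribute values never contain ">".
    objRegExp.Pattern = "<(table|tr|td)\b[^>]*>"
    StripTableAttributes = objRegExp.Replace(html, "<$1>")
End Function

' Usage example (WScript.Echo assumes the script runs under Windows Script Host).
Dim sample
sample = "<td style=""width:108.0pt;border-top:none;" & vbCrLf & _
         "height:31.5pt"" valign=""top"" width=""144"">"
WScript.Echo StripTableAttributes(sample)   ' prints <td>
```

Note that this removes every attribute, including rowspan and colspan, so merged cells in the sample table would lose their layout. If those need to survive, the pattern would have to target only the style attribute instead of the whole tag.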