Using JS to strip the junk markup Word produces when saving as HTML

For example:

<span lang="EN-US" style="FONT-SIZE: 14pt; FONT-FAMILY: 宋体; mso-bidi-font-size: 12.0pt">策划</span>
<span lang="EN-US" style="FONT-SIZE: 14pt; FONT-FAMILY: 宋体; mso-bidi-font-size: 12.0pt"><?xml:namespace prefix = o ns = "urn:schemas-microsoft-com:office:office" /><o:p></o:p></span>
<p align="center" class="MsoNormal" style="MARGIN: 0cm 0cm 0pt; LINE-HEIGHT: 150%; TEXT-ALIGN: center"><span lang="EN-US" style="FONT-FAMILY: 宋体"><font size="3"> <o:p></o:p></font></span></p>

==============================================================
I want to remove all

<span></span>

tags, leaving only

<p></p>

<b></b>

Any help would be appreciated!
---------------------------------------------------------------

Then just use FrontPage. Why does it have to be Word?
---------------------------------------------------------------

Can it be removed? Following this thread.
---------------------------------------------------------------

Use string find-and-replace. Replace every "<span>" and "</span>" with "".
---------------------------------------------------------------

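Open the page in Dreamweaver. Under the "Commands" menu there is a "Clean Up HTML" command; run it and you are done. One of its options is specifically for Word.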
---------------------------------------------------------------

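Pages authored in Word contain a large amount of junk markup,
so I recommend building pages in Dreamweaver or FrontPage instead.
Open the HTML file in Dreamweaver;
it has a command for cleaning up Word's junk markup.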

---------------------------------------------------------------

Following.
---------------------------------------------------------------

<script type="text/javascript">
// Strip Word-specific markup from an HTML string and show the result in textarea "b".
function cleanWordString(html) {
  // Remove all SPAN tags (the text inside them is kept)
  html = html.replace(/<\/?SPAN[^>]*>/gi, "");
  // Remove class attributes
  html = html.replace(/<(\w[^>]*) class=([^ |>]*)([^>]*)/gi, "<$1$3");
  // Remove style attributes (left disabled, as in the original post)
  //html = html.replace(/<(\w[^>]*) style="([^"]*)"([^>]*)/gi, "<$1$3");
  // Remove lang attributes
  html = html.replace(/<(\w[^>]*) lang=([^ |>]*)([^>]*)/gi, "<$1$3");
  // Remove XML declarations such as <?xml:namespace ... />
  html = html.replace(/<\?xml[^>]*>/gi, "");
  // Remove tags with an XML namespace prefix, e.g. <o:p></o:p>
  html = html.replace(/<\/?\w+:[^>]*>/gi, "");
  // Replace every &nbsp; with a plain space (the g flag is needed to catch all of them)
  html = html.replace(/&nbsp;/g, " ");
  // Transform <P> to <DIV>
  var re = new RegExp("(<P)([^>]*>.*?)(<\/P>)", "gi"); // Different because of an IE 5.0 error
  html = html.replace(re, "<div$2</div>");
  //insertHTML(html);
  document.getElementById("b").value = html;
}
</script>
<form id="test">
<textarea cols="60" name="a" rows="13"></textarea><br/>
<textarea cols="60" id="b" name="b" rows="13"></textarea>
<input onclick="cleanWordString(this.form.a.value);" type="button" value="Convert"/>
</form>
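
The function above removes a fixed set of tags and attributes and turns `<p>` into `<div>`, whereas the question asked to end up with only `<p>` and `<b>`. One possible alternative is an allow-list: drop every tag that is not explicitly permitted and strip the attributes from the ones that are. The sketch below is an illustration under that assumption, not the code posted in this thread. `keepOnlyTags` and its parameters are hypothetical names, and regular expressions are not a full HTML parser, so attribute values containing `>` would not be handled.

```javascript
// Hypothetical helper: keeps only the tags named in `allowed`
// (without their attributes) and removes every other tag.
function keepOnlyTags(html, allowed) {
  const allowedSet = new Set(allowed.map((name) => name.toLowerCase()));
  return html
    // Drop Word's <?xml:namespace ... /> declarations first.
    .replace(/<\?xml[^>]*>/gi, "")
    // Visit every remaining tag, including namespaced ones such as <o:p>.
    .replace(/<(\/?)([a-z][\w:]*)\b[^>]*>/gi, (match, slash, name) => {
      const lower = name.toLowerCase();
      // Allowed tags are re-emitted bare; everything else is removed.
      return allowedSet.has(lower) ? `<${slash}${lower}>` : "";
    });
}

const wordHtml =
  '<p class="MsoNormal" style="MARGIN: 0cm"><span lang="EN-US">' +
  "<b>策划</b><o:p></o:p></span></p>";

console.log(keepOnlyTags(wordHtml, ["p", "b"])); // prints "<p><b>策划</b></p>"
```

Because the function returns a string instead of writing to a textarea, it can be tried outside the browser, and the list of permitted tags can be changed without touching the patterns.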