remove attributes from html tags in xslt - XSLT

David W. 159 posts 284 karma points c-trib

Jul 01, 2010 @ 10:26

Remove attributes from html-tags in xslt

XSLT

Hello,

I'm trying to strip all attributes (specifically 'style') from HTML tags in my XSLT for my RSS feed. I want to keep all HTML tags (<p>, etc.), so umbraco.library.StripHtml won't do it.

i.e., I want <p style="...">some text</p> to become <p>some text</p>. How can I achieve this?

Thanks.

Ismail Mayat 4511 posts 10090 karma points MVP 2x admin c-trib

Jul 01, 2010 @ 10:43
Sledger,

Just found this on Google. Not tested, but you need something like:
```
<xsl:template match="p">
  <p>
    <xsl:for-each select="@*">
    </xsl:for-each>
    <xsl:value-of select="./text()"/>
  </p>
</xsl:template>
```
That will loop through all attributes, and since we don't write anything out in the for-each, they will be ignored.

Regards

Ismail
David W. 159 posts 284 karma points c-trib

Jul 01, 2010 @ 10:57

Thanks for the reply. But the string I want to format is from the bodyText-field, like this:

...
<content:encoded>
<xsl:value-of select="concat('&lt;![CDATA[ ', ./data [@alias='bodyText'],']]>')" disable-output-escaping="yes"/>
</content:encoded>
...

Is it possible to apply your method to this as well?

Ismail Mayat 4511 posts 10090 karma points MVP 2x admin c-trib

Jul 01, 2010 @ 11:15
Sledger,

Not sure if this will work but could you do something like
```
<xsl:variable name="tmpBodyText">
  <xsl:copy-of select="./data [@alias='bodyText']"/>
</xsl:variable>
<xsl:apply-templates select="msxml:node-set($tmpBodyText)//p"/>
```
then do what you need to do, again not tested just an idea.
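
For the node-set() call to resolve, the msxml prefix has to be bound to Microsoft's XSLT extension namespace. A minimal, untested sketch of a complete stylesheet follows. It assumes a Microsoft XSLT processor and, as a hypothetical stand-in for bodyText, uses literal markup that is already well-formed nodes rather than an escaped string:

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:msxml="urn:schemas-microsoft-com:xslt"
    exclude-result-prefixes="msxml">

  <xsl:output method="xml" omit-xml-declaration="yes"/>

  <xsl:template match="/">
    <!-- Hypothetical sample markup standing in for the bodyText content -->
    <xsl:variable name="tmpBodyText">
      <p style="color:red" class="intro">some text</p>
    </xsl:variable>
    <!-- The variable holds a result tree fragment; node-set() makes it selectable -->
    <xsl:apply-templates select="msxml:node-set($tmpBodyText)//p"/>
  </xsl:template>

  <!-- Re-create each p element without copying any of its attributes -->
  <xsl:template match="p">
    <p><xsl:value-of select="."/></p>
  </xsl:template>

</xsl:stylesheet>
```

If bodyText arrives as an escaped string instead of nodes, //p would have nothing to select, so this only helps once the markup has been parsed.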

Regards

Ismail
David W. 159 posts 284 karma points c-trib

Jul 01, 2010 @ 11:24

Hm, I will give that a try, but I need it to remove attributes from all HTML tags and not just <p>-tags.
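
One way the p-only template might be generalised to every element is sketched below. It is not tested against Umbraco, and it assumes the HTML is already available to the stylesheet as well-formed nodes (not as an escaped string):

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" omit-xml-declaration="yes"/>

  <!-- Any element: xsl:copy reproduces the element itself but not its
       attributes or children, so only the child nodes we select are kept -->
  <xsl:template match="*">
    <xsl:copy>
      <!-- node() selects child nodes only; attributes are never visited -->
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Text nodes fall through to the built-in template, which copies them -->

</xsl:stylesheet>
```

To drop only the style attribute and keep the rest, the usual variation would be an identity template (match="@* | node()" with xsl:copy) plus an empty template matching @style.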

Lee Kelleher 4020 posts 15802 karma points MVP 13x admin c-trib

Jul 01, 2010 @ 11:44

Hi Sledger,

Not sure that you're going to be able to do this purely with XSLT. (It might be possible, but reckon you'll burn hours trying to achieve it!)

My suggestion is to write an XSLT extension to perform a RegEx against the bodyText, removing specific attributes.

i.e.

// Requires: using System.Text.RegularExpressions;
public static string CleanHtml(string html)
{ 
    // start by completely removing all unwanted tags 
    html = Regex.Replace(html, @"<[/]?(font|span|xml|del|ins|[ovwxp]:\w+)[^>]*?>", "", RegexOptions.IgnoreCase); 
    // then run another pass over the html (twice), removing unwanted attributes 
    html = Regex.Replace(html, @"<([^>]*)(?:class|lang|style|size|face|[ovwxp]:\w+)=(?:'[^']*'|""[^""]*""|[^\s>]+)([^>]*)>","<$1$2>", RegexOptions.IgnoreCase); 
    html = Regex.Replace(html, @"<([^>]*)(?:class|lang|style|size|face|[ovwxp]:\w+)=(?:'[^']*'|""[^""]*""|[^\s>]+)([^>]*)>","<$1$2>", RegexOptions.IgnoreCase); 
    return html;
}

Reference to source: http://tim.mackey.ie/CleanWordHTMLUsingRegularExpressions.aspx
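
A rough sketch of how the RSS stylesheet might call such an extension is shown below. The alias CleanExt and its urn:CleanExt namespace are hypothetical and assume the class has been registered as an XSLT extension in Umbraco (typically via config/xsltExtensions.config). The node/data structure follows the snippet posted earlier in the thread, and the mode name is made up for illustration:

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:CleanExt="urn:CleanExt"
    exclude-result-prefixes="CleanExt">

  <!-- Hypothetical mode name; apply it to each node that becomes an RSS item -->
  <xsl:template match="node" mode="rssItem">
    <content:encoded>
      <!-- Clean the markup first, then wrap the result in a CDATA section.
           string() passes the field to the extension as plain text. -->
      <xsl:value-of
          select="concat('&lt;![CDATA[ ',
                         CleanExt:CleanHtml(string(data[@alias='bodyText'])),
                         ']]&gt;')"
          disable-output-escaping="yes"/>
    </content:encoded>
  </xsl:template>

</xsl:stylesheet>
```

The web templates can keep rendering bodyText untouched, since the cleaning happens only in the feed's stylesheet.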

Good luck, Lee.

Rich Green 2246 posts 4008 karma points

Jul 01, 2010 @ 11:52

I've never used it but isn't http://htmlagilitypack.codeplex.com/ ideal for this type of thing?

Rich

Lee Kelleher 4020 posts 15802 karma points MVP 13x admin c-trib

Jul 01, 2010 @ 11:57

Yes, HTML Agility Pack is excellent for navigating, traversing, and manipulating HTML as a DOM (and more). You could remove all attributes with it, but it's an extra dependency when a quick-n-dirty RegEx can (could) take care of it, since RegEx is already in the .NET framework.

Ismail Mayat 4511 posts 10090 karma points MVP 2x admin c-trib

Jul 01, 2010 @ 12:04

Lee,

Would my suggestion not work? Or is it that bodyText, unless in CDATA, will have entities etc. that will cause it to go boom? P.S. I remembered that idea with @* from a Tridion project where we had to clean out some Word crap.

Regards

Ismail

Ismail Mayat 4511 posts 10090 karma points MVP 2x admin c-trib

Jul 01, 2010 @ 12:18

Guys,

Thinking about this some more, are we not overcomplicating things? We could just update the TinyMCE config file so that, for p elements, the only allowed attribute is class. True, you would have issues with updates because you would end up overwriting the TinyMCE config, but in theory that should sort it?

Regards

Ismail

David W. 159 posts 284 karma points c-trib

Jul 01, 2010 @ 12:26

The XSLT extension seems to me to be the best solution (I really need to start an Umbraco video subscription so I can see the end of Niels's video ;).

Ismail: Thanks for your help, but modifying the RTE is not a solution for me because I need the style attributes for the web presentation; the stripped version is only for the RSS.

Thanks to all.

Lee Kelleher 4020 posts 15802 karma points MVP 13x admin c-trib

Jul 01, 2010 @ 12:32

@Ismail: I've had a quick test of trying to parse the 'bodyText' string as XML, but kept hitting various entity-encoding errors. I'm sure there is a way to do it, but I keep getting cross-eyed with the entities. I recall an old forum post about trying to achieve the same thing, and whoever it was ended up using an XSLT extension to convert the content/string to an XPathNodeIterator. (If I find the topic, I'll post here).

Cheers, Lee.
