Both $escapetool.xml() and $escapetool.html() serve the same purpose in the context of XWiki, since both types of escaping are handled properly by the browsers we support.
However, XML escaping is currently way more adopted than HTML escaping (I’m counting 7624 usages against 94 in xwiki-platform). Also, the current HTML escaping implementation is mostly the default provided by Velocity while XML escaping’s is ours, and has been improved multiple times so far.
Instead of maintaining two implementations that can be used interchangeably, I suggest the following:
Deprecate the use of $escapetool.html() in favor of $escapetool.xml().
Replace the very few uses $escapetool.html() in our code base.
Thanks Pierre. I’m globally ok but there are things to check IMO, see below.
I think you need to elaborate on the differences between the 2 implementations and explain what you propose to do for the differences.
Beyond that we have some other things to dicuss:
Should the deprecated $escapetool.html() be modified to fix some escaping issues (or is it more complete than $escapetool.xml())?
We may want to legacify, which is the only way to not reintroduce the usage of $escapetool.html()
Con: using $escapetool.html() when you have HTML is more natural than using ``$escapetool.xml()` since HTML != XML (except for HTML5 I think). I don’t think it’d a big deal but thought it worth mentioning. Note: I wouldn’t agree on keeping both as synonyms.
We cannot really do that since it’s initially a Velocity API, not ours. Deprecated warnings are quite visible too when you are developing an application.
By default, $escapetool.html() escapes basic 4 characters (", &, <, >) and a set of ISO-8859-1 characters. I believe (but might be wrong) that XWiki extensions should only use UTF-8 encoding, meaning that in our context we only see the 4 basic characters actually escaped. The most notable difference between the two is thus the output, since HTML escaping returns HTML entity codes while XML escaping returns numerical XML entities. $escapetool.html() was recently extended to also escape { and }, but $escapetool.xml() already had support for curly braces for nearly a year.
So in practice, in the context of XWiki, the XML escape tool actually supersedes the HTML escape tool (it also escapes single quote symbols).
The suggestion for deprecation mostly comes from the fact that one implementation (the most used) was more maintained than the other one (which was still Velocity’s default implementation until recently).
While I do agree, this does not reflect the current code base where xml() is already preferred over html(), even for HTML content.
I also agree on this. At the very least, since they do not return the same kinds of entities, they should never be synonyms.
I’d say that’s because it’s mostly used by the XWiki core devs and we know about it. But if you ask someone who’s not an XWiki developer (like someone writing scripts in a wiki page), and ask him what he’d use to escape some HTML between .xml() and .html(), I’m pretty sure he/she’d go for the latter.
My point was more about having inconsistencies in usages if you have 2 APIs.
That’s actually a problem because if the .html() api is provided by Velocity we cannot get rid of it. It’s documented on the web, when you read Velocity user guides, etc. So that would tend to show that we should keep this one instead of the .xml() one. The other solution (a bit less nice I think), is the one I proposed, in answer to Thomas above:
What’s sure for me is that we cannot just keep an API deprecated. It’s not enough. We must have a way to ensure that it cannot be used in our code and fail the build if used. If only to ensure consistency.
I think the fact that we have $escapetool.xml comes from that fact that until recently, we used XHTML in many places and thus our HTML was supposed to be XML. Now XHTML is basically dead (in general, not just in XWiki) and apart from parsing HTML using an XML parser there is no real need to be XML-compliant in XWiki. So we could decide to migrate to $escapetool.html. The problem here is the huge number of calls that would need to be migrated both in XWiki itself and in extensions, in particular as there is no tooling for such refactoring tasks in Velocity. Another aspect we should consider is that $escapetool.html is designed for HTML 4 and escapes a lot of characters we normally wouldn’t want to escape. Therefore, we should probably change the implementation of $escapetool.html to be the same as the current $escapetool.xml - which again would be an API breakage. For backwards-compatibility, I therefore think it would better to stay on $escapetool.xml even though I agree that $escapetool.html feels more natural.
Again my main comment is not so much about html() vs xml() but about ensuring that html() cannot be used inside XS development. It’s not enough to deprecate it.
We have 2 options (that I can think of) if we want to keep xml() and deprecate html():
Option 1: Unusable API
In org.xwiki.velocity.tools.EscapeTool, override html() and throw an exception in it with a message like “…Use xml() instead”
Deprecate html()
In a legacy Aspect, add back the default implementation (with a deprecated annotation)
Option 2: Remove usage of org.apache.velocity.tools.generic.EscapeTool
In org.xwiki.velocity.tools.EscapeTool stop extending org.apache.velocity.tools.generic.EscapeTool but reimplement all the methods except the html() one, using org.apache.velocity.tools.generic.EscapeTool internally.
In a legacy Aspect, add the html() implementation, with a deprecated annotation
Option 2 is cleaner from an API POV (it also provides a greater control over future APIs that we could insert unknowingly (when upgrading Velocity tools). It means more code over option 1 but I don’t think it’s a big deal.
Note that revapi will still bark but since it’s used in a velocity context, I don’t think there would be any real breakage.