As already hinted in the roadmap, I’m currently working on switching the WYSIWYG editing process, which still uses XHTML 1.0, to HTML5. The main motivation is native CKEditor support for figure captions via the <figcaption>-tag, but there are also other reasons:
- The Flamingo Skin already uses HTML5, so page content is already rendered as HTML5 by default. As a result, we currently have a discrepancy between view and edit mode.
- In lots of places, we already use HTML5 data-*-attributes that are not valid in XHTML 1.0.
There are a couple of places where we still use XHTML 1.0. In particular, UI extensions are frequently still rendered as XHTML 1.0, as shown in the tutorial, and the HTML macro currently cleans the HTML to be valid XHTML 1.0, see XRENDERING-509.
My proposal is to basically deprecate XHTML 1.0 and completely switch to HTML5. This will be a gradual switch and, at least for now, there will only be a few differences between XHTML 1.0 and HTML5. In particular, we will still try to produce HTML5 that is valid XML. The main difference for XWiki rendering is that there is no
<tt>-tag anymore for monospace and verbatim content and figures are rendered using the
In particular, this involves the following:
- Add an option to HTMLCleaner to clean using HTML5. My proposal is to leave the default at HTML 4 in order not to break existing usages.
- Introduce an HTML5 parser. For now, this parser will mainly handle the parts of HTML5 that are compatible with XWiki syntax (e.g., no links wrapping flow content).
- Change the WYSIWYG script service and the HTML converter it uses to use HTML5. This will be a breaking change in the sense that all existing methods will expect and produce HTML5, and proper CKEditor support will require a new version of CKEditor. As we pass the HTML through HTMLCleaner, the script service will be very forgiving though if, e.g., the input still contains the <tt>-tag. I propose to add a property to the script service that reports the HTML version, so that, for example, CKEditor can detect whether it is running on the new version or not.
- Change CKEditor to support HTML5. This can be backwards-compatible or we could also release a version 2.0 of CKEditor that is no longer compatible with XWiki versions before that change.
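To make the first and third point a bit more concrete, here is a minimal Python sketch of a cleaner with a version option. It is not the HTMLCleaner API: `CleanerOptions`, `html_version` and `clean` are hypothetical names, and replacing <tt> with a span carrying a `monospace` class is only an assumption about what the HTML5 output could look like. The point is just that the default stays at HTML 4, while HTML5 mode tolerates legacy input.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class CleanerOptions:
    # Hypothetical option: defaults to HTML 4 so existing callers are unaffected.
    html_version: int = 4


def clean(markup: str, options: CleanerOptions = CleanerOptions()) -> str:
    """Clean well-formed markup; in HTML5 mode, rewrite the obsolete <tt> tag."""
    root = ET.fromstring(markup)
    if options.html_version == 5:
        # Collect first so that renaming tags cannot interfere with iteration.
        for element in list(root.iter("tt")):
            element.tag = "span"
            # Assumed replacement markup: keep existing classes, add "monospace".
            classes = element.get("class", "").split()
            element.set("class", " ".join(classes + ["monospace"]))
    return ET.tostring(root, encoding="unicode")


legacy = "<p>Use <tt>git</tt> here</p>"
print(clean(legacy))                                   # HTML 4 mode: <tt> is kept
print(clean(legacy, CleanerOptions(html_version=5)))   # HTML5 mode: <tt> is rewritten
```

With this shape, a caller that never sets the option sees no change, and input that still contains <tt> is accepted in HTML5 mode rather than rejected.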
Are there any objections, in particular to the changes to the WYSIWYG script service, which are not backwards-compatible? Is there any code besides CKEditor that uses this script service in a way that would be affected by the change to HTML5?
There is another topic with respect to cleaning HTML5: In HTML5, a lot of things that were invalid in XHTML 1.0 are valid. HTML5 distinguishes between flow content and phrasing content, where the latter is a subset of the former. In paragraphs, only phrasing content can be used, but as phrasing content including plain text is also flow content, there is no need to wrap plain text in a paragraph: <body>Hello World!</body> is perfectly valid. Further, we now have even more tags that are basically like <div>: Elements like <figcaption> allow flow content as children, i.e., they can contain lists, paragraphs etc. but also simply plain text. My suggestion is the following:
- In HTMLCleaner, wrap all phrasing content that is directly below the <body>-tag in a paragraph, similar to the existing BodyFilter, even though this is not necessary. In this context, treat elements like <del>, which are only phrasing content when they contain phrasing content, as phrasing content. I’ve just noticed that we already allow <del> directly below the <body>-tag, so this probably shouldn’t be changed.
- In WikiModel, treat tags like <figcaption> as “document” (similar to <div>), i.e., like an embedded document where inline content will always be wrapped in a paragraph and all content is allowed.
In particular, this means that <figure><img ... /><figcaption>Caption</figcaption></figure> will be parsed as <figure><p><img ... /></p><figcaption><p>Caption</p></figcaption></figure>. Note that the former HTML code is what CKEditor’s native caption support would produce and (probably) also needs as input. The new figure captions support would thus render the first version without paragraphs but accept the latter version as input.
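The wrapping rule can be sketched in a few lines of Python. This is only an illustration of the idea, not the HTMLCleaner or WikiModel implementation: `BLOCK_TAGS` and `DOCUMENT_TAGS` are hypothetical and deliberately incomplete, everything not listed as block-level is treated as inline, and the special case for <del> discussed above is ignored. It also assumes the input is HTML5 that is valid XML.

```python
import xml.etree.ElementTree as ET

# Assumption: a small, incomplete set of block-level tags; all others count as inline.
BLOCK_TAGS = {"p", "div", "ul", "ol", "table", "pre", "blockquote", "figure", "figcaption"}
# Assumption: elements treated like an embedded document in this sketch.
DOCUMENT_TAGS = {"body", "div", "figure", "figcaption"}


def wrap_inline_content(element: ET.Element) -> None:
    """Wrap each run of text and inline children of document-like elements in <p>."""
    children = list(element)
    for child in children:
        wrap_inline_content(child)
    if element.tag not in DOCUMENT_TAGS:
        return

    new_children = []
    paragraph = None

    def current_paragraph() -> ET.Element:
        # Open a new <p> lazily, only when there is inline content to put into it.
        nonlocal paragraph
        if paragraph is None:
            paragraph = ET.Element("p")
            new_children.append(paragraph)
        return paragraph

    if element.text and element.text.strip():
        current_paragraph().text = element.text
    element.text = None

    for child in children:
        tail, child.tail = child.tail, None
        if child.tag in BLOCK_TAGS:
            paragraph = None  # a block-level child ends the current run
            new_children.append(child)
            if tail and tail.strip():
                current_paragraph().text = tail
        else:
            current_paragraph().append(child)
            child.tail = tail  # text after an inline child stays in the same <p>

    element[:] = new_children


figure = ET.fromstring('<figure><img src="a.png" /><figcaption>Caption</figcaption></figure>')
wrap_inline_content(figure)
print(ET.tostring(figure, encoding="unicode"))
```

Applied to the figure example, the image and the caption text each end up in their own paragraph, and <body>Hello World!</body> is wrapped the same way. Rendering would then do the reverse for figures and drop these paragraphs again.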