Rendering questions about lists containing groups

slauriere · April 14, 2022, 6:58am

Hi everyone,

I’m having trouble getting my head around this: when rendering the text below with syntax xwiki/2.1, Item 3 and Item 4 are interpreted as part of a sublist under Item 2, whereas I would have expected them to be siblings of Item 1 and Item 2 in the same list, since they’re declared on a new line. I guess there’s a good reason for the former interpretation over the latter, but any explanation would be welcome:

* Item 1
* Item 2
(((
* Item 3
)))
(((
* Item 4
)))

Also, when parsing then rendering the same text using syntax xwiki/2.1, I get the following output:

* Item 1
* (((
Item 2(((
* Item 3
)))

(((
* Item 4
)))
)))

I have the following questions:

Why is there a new line inserted between Item 3 and Item 4?
Why is a group created for Item 2?
More generally: is a parsing operation followed by a rendering one (using the same syntax for both operations, on content that is syntactically correct) supposed to produce an output equal to the input, or is that not part of the contract? I guess it’s not, but it would be helpful to be sure, and to know the exact range of changes that are considered legitimate.

The related issue I’m facing is a case where the group creation on a list item produces a layout bug, when using the html macro with wiki="true" on the following content structure:

<section>
* Item 1
* Item 2
</section>
<p>
(((
* Item 3
)))
</p>
<p>
(((
* Item 4
)))
</p>

This input is transformed as follows when parsed then rendered in xwiki/2.1, which then becomes invalid HTML (due to the section tag not being closed at a valid position):

<section>

* Item 1
* (((
Item 2
</section>
<p>(((
* Item 3
)))</p>
<p>

(((
* Item 4
)))</p>
)))

Any insight welcome!

Thanks & Cheers

Stéphane

lucaa · April 14, 2022, 7:37am

The key here is in the details of how groups (the ((( ))) syntax) work. As far as I understand it, the group is a block element, but it is also a way to group some content together so that it is handled as part of another element. As a block element, it behaves like any other block element, in that it needs to be separated from other content by an empty line.
For example, when you write:

* item
(((
content
)))

the group is not separated by an empty line from the previous element (the list item), and thus it becomes part of the list item, grouping all its content inside the list item. This is why anything inside the group is handled in its own context and rendered as content of the list item. A top-level list in this group becomes a ul in the parent li.

When you write:

* item

(((
content
)))

the empty line between the item and the group separates the previous element (list / list item) from the next block (group), and thus the group becomes a sibling of the list.

For groups, the following two syntaxes are equivalent (as far as I know, groups don’t actually have an inline mode; they only have this “special” inline mode for grouping things in table cells or lists):

(((group 1)))(((group 2)))

(((group 1)))

(((group 2)))

They both represent two block elements, one after the other. Unlike elements that have both an ‘inline’ and a ‘block’ mode depending on their context (such as the html, velocity, warning and error macros), the first syntax above does not involve the creation of an inline context (a paragraph) in which both elements are put.

That (the empty line inserted between Item 3 and Item 4, and the group created for Item 2) I cannot really explain; somebody else needs to take over here.

As for whether parsing then rendering is supposed to reproduce the input: I think the answer is “no”, but the two text representations represent the same XDOM. In other words, if you parsed the result of the rendering again, you’d end up with the same XDOM as the one parsed from the initial source. A certain ‘normalization’ of the representation of the XDOM takes place.
I don’t know if this is true for all syntaxes, but it’s definitely the case for XWiki 2.1.
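
To make the “same XDOM, different text” idea concrete, here is a minimal Python sketch. It is only a toy under stated assumptions, not the XWiki rendering API: `toy_parse` and `toy_render` are hypothetical helpers that handle nothing but top-level, non-nested groups, and the “canonical form” is assumed to put an empty line between consecutive groups.

```python
import re

# Matches one non-nested group: ((( ... )))
GROUP = re.compile(r"\(\(\((.*?)\)\)\)", re.DOTALL)

def toy_parse(source: str) -> list[tuple[str, str]]:
    """Hypothetical mini-parser: returns a flat list of ("group", content) blocks."""
    return [("group", m.group(1).strip()) for m in GROUP.finditer(source)]

def toy_render(blocks: list[tuple[str, str]]) -> str:
    """Hypothetical canonical renderer: one group per block, separated by an empty line."""
    return "\n\n".join(f"((({content})))" for _, content in blocks)

source = "(((group 1)))(((group 2)))"
rendered = toy_render(toy_parse(source))

print(rendered != source)                        # the text changed
print(toy_parse(rendered) == toy_parse(source))  # the parsed structure did not
```

In this toy, rendering is a fixed point after one pass: parsing and rendering `rendered` again yields `rendered` unchanged, which is the kind of normalization described above.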

Yeah, the html macro case is tricky, because from an HTML point of view the wiki syntax is ignored when checking for correctness, and from the wiki syntax point of view the HTML syntax is ignored, so depending on which one you parse or verify, you may get different results.

However, what’s invalid in there? After the rendering of the wiki syntax the result is invalid? Or before? Before rendering of wiki syntax there’s nothing invalid in there. After rendering of the wiki syntax I’m curious what’s invalid since the initial syntax should be invalid as well… More specifically, I’m really curious what’s the HTML that gets rendered by the original syntax in your case when rendering wiki syntax, because, since it’s not aware of html syntax, it would render the closing </section> tag as text content and it would render it as content of the Item 2 element, which would not be valid either.
Now, all this may be hidden by some cleanup operation of the html macro or by some html 5 magic in the browser, but that’s still sloppy content that gets rendered to something reasonable only by pure luck, I would say.

Hope this helps!

vmassol · April 14, 2022, 7:44am

A new line is not enough since a paragraph can be on several lines for example.

Line 2 is: * Item 2<NL>(((<NL>* Item 3<NL>)))

In xdom+xml/current syntax you get:

<list>
  <listItem>
    <word>Item</word><space></space><word>2</word>
    <group>
      <list>
        <listItem><word>Item</word><space></space><word>3</word></listItem>
      </list>
    </group>
  </listItem>
</list>

Said differently, when you’re inside a list, the only way to start a new list item is a line beginning with * or 1; see the syntax guide.

slauriere · April 14, 2022, 8:43am

lucaa:

However, what’s invalid in there? After the rendering of the wiki syntax the result is invalid? Or before? Before rendering of wiki syntax there’s nothing invalid in there. After rendering of the wiki syntax I’m curious what’s invalid since the initial syntax should be invalid as well… More specifically, I’m really curious what’s the HTML that gets rendered by the original syntax in your case when rendering wiki syntax, because, since it’s not aware of html syntax, it would render the closing </section> tag as text content and it would render it as content of the Item 2 element, which would not be valid either.
Now, all this may be hidden by some cleanup operation of the html macro or by some html 5 magic in the browser, but that’s still sloppy content that gets rendered to something reasonable only by pure luck, I would say.

Thanks @lucaa, yes, it helps a lot. I’ll see if I can answer this question accurately (for now I have trouble invoking the HTML macro programmatically from Groovy). In the meantime, I’m adding an important observation: when inserting the transformed text (below) into an html macro with wiki="true" clean="true", the produced HTML causes no layout breakage. However, when using clean="false", the global page layout gets broken, even though it’s not clear to me what exactly causes this breakage:

<section>

* Item 1
* (((
Item 2
</section>
<p>(((
* Item 3
)))</p>
<p>

(((
* Item 4
)))</p>
)))

lucaa · April 14, 2022, 10:29am

One thing that @vmassol is touching here without explicitly mentioning it is that you need to imagine the parsing from text to XDOM as working with some sort of ‘scope’ thing, being ‘in’ some element which at some point ends and then you enter another ‘scope’. Getting ‘in’ and ‘out’ of things has some specific rules.

For the example above, this would be: when the parser encounters a line that starts with a *, it assumes a list and then a list item as the current element. So you’ll be in a list item in a list for everything that follows in the syntax. In order to exit the list item and enter another one, the parser needs to encounter a new list item (which automatically ends the previous one), i.e. a line starting with a * or a 1. In order to exit the list, you need an empty line. However, in the list item itself you can have some other syntax that changes the scope (by adding another nested element), and then a line starting with a * takes effect in this new scope, not in the scope of the initial list item.

Otherwise, if we imagine the behaviour of the parser line by line, it would be something like this:

* Item 1

list item starting syntax, in the current ‘empty’ scope. Parser starts a list and a list item in this list:
current scope: list > list item 1

* Item 2

list item starting syntax, in the current scope which is a list item. Parser closes the previous list item scope and opens another one, for this newly encountered list item:
current scope: list > list item 2

</section>
<p>

for the wiki parser this is text, so interpreted as text content of the current scope, which doesn’t change
current scope: list > list item 2

(((

group start syntax, in the current scope, so a new group scope is opened under the current scope:
new current scope: list > list item 2 > group

* Item 3

list item start syntax, in the current scope which is a group. A list & list item is created in the group:
new current scope: list > list item 2 > group > list > list item 3
At this point, this makes Item 3 a sublist of Item 2, not a sibling.

)))

group closing syntax, closes the currently open group and all scopes below:
new current scope: list > list item 2

etc.

For the elements that don’t have an explicit closing syntax (as macros or groups do), there are probably a few rules to know about which element starts auto-close the previously open scopes (and which scopes), and which ones don’t.
For example, a group start doesn’t close list items but generates content inside them, whereas a horizontal line start (----) auto-closes both the list item and the list. Also, no start syntax auto-closes a group, since that’s the purpose of a group: to ‘contain’ all sorts of content in a given scope.
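
The line-by-line walk above can be mimicked with a small Python sketch. It is a toy model of the scope rules as described in this thread, not the real XWiki parser: `trace_scopes` and its scope labels are hypothetical names, nested `**` items and `1.` items are ignored, and only `*`, `(((`, `)))`, empty lines and `----` are recognized. The rule that an empty line inside a group closes only the lists opened within that group is an assumption of the sketch.

```python
def trace_scopes(source: str) -> list[tuple[str, str]]:
    """Return (line, current scope) pairs for a toy subset of the syntax."""
    stack: list[str] = []  # innermost scope is last
    trace = []
    for line in source.splitlines():
        text = line.strip()
        if text.startswith("* "):
            if stack and stack[-1].startswith("list item"):
                stack.pop()            # a new item auto-closes the previous one
            else:
                stack.append("list")   # no list open in this scope yet
            stack.append(f"list item [{text[2:]}]")
        elif text == "(((":
            stack.append("group")      # does not close the enclosing list item
        elif text == ")))":
            # Close the innermost group and every scope opened inside it.
            while stack and stack.pop() != "group":
                pass
        elif text in ("", "----"):
            # Closes list scopes, but never escapes an open group.
            while stack and stack[-1] != "group":
                stack.pop()
        # Anything else is plain text: the scope does not change.
        trace.append((line, " > ".join(stack)))
    return trace

example = "* Item 1\n* Item 2\n(((\n* Item 3\n)))\n(((\n* Item 4\n)))"
for line, scope in trace_scopes(example):
    print(f"{line!r:12} {scope}")
```

For the `* Item 3` line this prints the scope `list > list item [Item 2] > group > list > list item [Item 3]`, matching the walk above. Adding an empty line after `* Item 2` empties the stack first, so the group then opens at the top level.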

slauriere · April 14, 2022, 11:11am

Thank you @lucaa and @vmassol, the fact that a group declaration does not auto-close a list even on a new line is now completely clear to me.

I also better understand the equivalence between juxtaposed groups and groups separated by an empty line.

What remains a question to me and apparently an issue in my scenario is why a group gets generated for Item 2. I understand it’s likely to be due to a normalization of the XDOM, but that’d be great to know:

Why this specific normalization happens
In which other cases the XDOM normalization introduces a new group

lucaa · April 14, 2022, 12:17pm

well, I’ll answer without actually answering:
The group there is necessary if an empty line is generated between the groups for Item 3 and Item 4, otherwise that empty line would break out of the toplevel list (because that’s what an empty line does in a list). Of course, now the question is why the empty line between item 3 and item 4 group is mandatory and whether it could not be omitted. I guess this is about canonical form.

I cannot fully answer that. I guess that at least any case where groups following each other are rendered in a canonical form is likely to have some new groups generated in order to counter the effect of the empty line, but there may also be other cases; I don’t know them.
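
A toy illustration of the empty-line reasoning above, assuming the simplified scope rules discussed earlier in this thread. This is not the real XWiki parser: `scopes_at` is a hypothetical helper, and the `unwrapped` text is an imagined output in which the renderer emits the empty line without the extra group around Item 2.

```python
def scopes_at(source: str, marker: str) -> list[str]:
    """Return the toy scope stack right after the first line equal to `marker`."""
    stack: list[str] = []
    for line in source.splitlines():
        text = line.strip()
        if text == ")))":
            while stack and stack.pop() != "group":
                pass                      # close the group and all scopes inside it
        elif text == "":
            while stack and stack[-1] != "group":
                stack.pop()               # an empty line closes lists, not groups
        else:
            if text.startswith("* "):
                if stack and stack[-1] == "item":
                    stack.pop()           # new item closes the previous one
                else:
                    stack.append("list")
                stack.append("item")
            if text.endswith("((("):
                stack.append("group")     # group opened at the end of the line
        if text == marker:
            break
    return list(stack)

# Imagined output: empty line between the groups, no wrapper group.
unwrapped = "* Item 1\n* Item 2\n(((\n* Item 3\n)))\n\n(((\n* Item 4\n)))"
# Actual normalized output quoted at the top of the thread.
wrapped = "* Item 1\n* (((\nItem 2(((\n* Item 3\n)))\n\n(((\n* Item 4\n)))\n)))"

print(scopes_at(unwrapped, "* Item 4"))
print(scopes_at(wrapped, "* Item 4"))
```

In the unwrapped variant the empty line closes the top-level list, so Item 4 ends up in a top-level group. In the wrapped variant the empty line sits inside the group of Item 2, so Item 4 stays under that list item.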

slauriere · April 27, 2022, 6:20am

Indeed, the initial input gets rendered as follows, with </section> as text content of Item 2, which is invalid. Once parsed and rendered again, the output is invalid as well (no surprise); it’s just differently invalid, and in my scenario, which contained a more complex hierarchy of nodes, it resulted in page breakages that were not visible with the initially rendered input. But the key here, as you pointed out, is that the initial input is invalid. I had overlooked that, so there is no guarantee about what will come out of it. Thanks for the detailed explanations.

Initial input HTML-rendering

<section>
    <ul>
        <li>Item 1</li>
        <li>Item 2
</section>
<p>
<div>
    <ul>
        <li>Item 3</li>
    </ul>
</div></p>
<p>
<div>
    <ul>
        <li>Item 4</li>
    </ul>
</div>
</p>
</li>
</ul>

Parsed input HTML-rendering

<section>
    <ul>
        <li>Item 1</li>
        <li>
            <div>Item 2
</section>
<p>
<div>
    <ul>
        <li>Item 3</li>
    </ul>
</div></p>
<p>
<div>
    <ul>
        <li>Item 4</li>
    </ul>
</div>
</p>
</div>
</li>
</ul>