Add the concept of variants to Syntax

vmassol · November 11, 2020, 6:39pm

Hi devs,

Right now a Syntax format is <id>/<version>. And you can call Syntax#getType()#getId() and Syntax#getVersion().

Note that we also have a qualifier, but it’s only used when serializing the syntax to a string. We introduced it to mark a syntax as experimental.

What we are missing is the concept of syntax variants. This can be useful for example if you wish to list all syntaxes that exist for a base syntax (e.g. all Markdown syntaxes or all XWiki syntaxes).

For example:

Markdown:
- Base syntax id = markdown
- Variants = github (github-flavored), commonmark, etc

So far, we’ve used the notation + in the id part to separate the base and variant part. For example: confluence+xhtml, xdom+xml, etc.

The only problem is that this is not semantic/typed. It’s just a convention. We don’t have a getVariant() API.

This proposal is about adding such a concept of a variant (or variants).

Proposal:

Allow zero, one or several variants
Keep the same syntax format as now, with the “+” to separate the variants
Introduce a new constructor in Syntax that takes a List<String> of variants.
Introduce a List<String> Syntax#getVariants() method.
Modify the toString() to display the variants + change equals/hashcode ofc
Modify the well-known syntax types to use the constructor with the variant
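
To make the proposal above concrete, here is a minimal sketch of what a syntax carrying a list of variants could look like. The class name VariantSyntaxSketch and its constructor are hypothetical stand-ins for illustration, not the actual XWiki Syntax class; only getVariants(), the "+" separator and the toString()/equals()/hashCode() changes come from the proposal.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical stand-in for illustration; NOT the real XWiki Syntax class.
final class VariantSyntaxSketch {
    private final String typeId;          // base syntax id, e.g. "markdown"
    private final List<String> variants;  // zero, one or several variants
    private final String version;         // e.g. "1.0"

    VariantSyntaxSketch(String typeId, List<String> variants, String version) {
        this.typeId = Objects.requireNonNull(typeId);
        this.variants = List.copyOf(variants); // immutable defensive copy
        this.version = Objects.requireNonNull(version);
    }

    List<String> getVariants() {
        return variants;
    }

    // Serializes as <type>[+<variant>]*/<version>.
    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder(typeId);
        for (String variant : variants) {
            sb.append('+').append(variant);
        }
        return sb.append('/').append(version).toString();
    }

    // Variants take part in equality, and their order matters.
    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof VariantSyntaxSketch)) {
            return false;
        }
        VariantSyntaxSketch other = (VariantSyntaxSketch) o;
        return typeId.equals(other.typeId)
            && variants.equals(other.variants)
            && version.equals(other.version);
    }

    @Override
    public int hashCode() {
        return Objects.hash(typeId, variants, version);
    }
}
```

With this sketch, a syntax with no variant serializes exactly as it does today (for example xwiki/2.1), so only syntaxes that actually declare variants change.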

WDYT?

surli · November 18, 2020, 1:45pm

Not sure what you mean by several variants? Either you’re talking about a specific Syntax (such as Markdown) which is allowed to have several variants (but not all at the same time). Or you’re talking about a Syntax instance which can have 0 or 1 variant (e.g. XWiki 2.1 with no variant and Markdown Github with github variant only).

+1

vmassol · November 18, 2020, 1:54pm

Examples

0 variant: xwiki/2.1
1 variant: confluence+xhtml/1.0
2 variants: markdown+commonmark+0.28/1.0 (in this case 0.28 is the commonmark spec version, 1.0 is the implementation version of that syntax in XWiki).
N variants: someid+variant1+...+variantN/1.0
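
The examples above can be read back with a small parser. This is only a sketch under assumptions: the SyntaxIdSketch class and its parse() method are hypothetical names, and the only rules applied are the ones stated in this thread (the last "/" introduces the version, "+" separates the base id from each variant).

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of the XWiki Rendering API.
final class SyntaxIdSketch {
    final String type;
    final List<String> variants;
    final String version;

    private SyntaxIdSketch(String type, List<String> variants, String version) {
        this.type = type;
        this.variants = variants;
        this.version = version;
    }

    // Parses a string of the form <type>[+<variant>]*/<version>.
    static SyntaxIdSketch parse(String id) {
        int slash = id.lastIndexOf('/');
        if (slash <= 0 || slash == id.length() - 1) {
            throw new IllegalArgumentException("Expected <type>[+<variant>]*/<version>: " + id);
        }
        // Everything before the slash is the base type followed by its variants.
        String[] parts = id.substring(0, slash).split("\\+");
        if (parts.length == 0 || parts[0].isEmpty()) {
            throw new IllegalArgumentException("Missing base syntax type: " + id);
        }
        List<String> variants = List.of(Arrays.copyOfRange(parts, 1, parts.length));
        return new SyntaxIdSketch(parts[0], variants, id.substring(slash + 1));
    }
}
```

Parsing markdown+commonmark+0.28/1.0 this way yields the type markdown, the ordered variants [commonmark, 0.28] and the version 1.0. The sketch does not validate individual variants beyond what is shown.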

surli · November 18, 2020, 1:58pm

hmm, shouldn’t we consider this as a unique variant of the same syntax then?
You’d have:

Markdown+github/1.0
Markdown+github-commonmark-0.xx/1.0
Markdown+github-commonmark-0.28/1.0

All those would be syntaxes with a unique variant, wouldn’t that work for you? I’m not sure I see the need for having several variants; it sounds like it would be pretty rare to reuse a variant in several syntaxes.

vmassol · November 18, 2020, 2:08pm

No, it wouldn’t work. I really don’t see why there would be only a single variant per syntax. This would negate the rationale for materializing variants in the first place (since you’d need to implement custom parsing to extract the various variant parts yourself, as we have now - and since it wouldn’t be standardized it would be impossible to do!)

vmassol · November 18, 2020, 2:11pm

PS: I fixed the example which was not correct.

surli · November 18, 2020, 2:27pm

Note that if I follow what you said above it would mean that 0.28 would be parsed as a variant and completely decorrelated from commonmark variant, so I find this example a bit odd.

For me the idea would be to define a syntax as the combination of:

a syntax type
a variant
a version

Semantically, I find it weird to have different variants in the same syntax. I now understand the need you have for parsing different components of a variant, but then I would distinguish between the global unique variant (maybe made of various components) and the different variant components.

vmassol · November 18, 2020, 2:38pm

A variant is not something that exists outside of the syntax. It’s a variant of the syntax and the list matters. If you take github or commonmark outside of any context it doesn’t mean anything. What makes sense is [commonmark, 0.28] in this example.

I think you’re overdoing it and trying to put too much meaning into variants. They’re just variations of the syntax, and the idea of supporting more than 1 is to allow for expansion. I really don’t see the problem with returning List<String> instead of String and considering that + is a variant delimiter. If you don’t need more than 1 variant, then don’t use it. If your variant is named commonmark-0.28 then it’s also fine. But the idea is to support several variants here because I’m pretty sure that we need more than one. I already showed it with the example markdown+commonmark+0.28. Note that there can also be markdown+commonmark+0.27.

vmassol · November 18, 2020, 2:41pm

I really don’t see why. Why would a syntax only have a single variant? Maybe you don’t like the name Variant? We can pick something else, although I like it. It could be “specifier” too if you prefer that.

Again I’m putting examples of syntaxes with different variants, differing only on the last variant:

markdown+commonmark+0.27
markdown+commonmark+0.28

surli · November 18, 2020, 2:44pm

Ok let’s cut it there, it’s not a big deal. I agree in general with the idea and indeed it’s not mandatory to use a list of variants if I don’t want to

vmassol · November 18, 2020, 2:44pm

What is a global unique variant?
What is a variant component?

vmassol · November 18, 2020, 2:46pm

Yeah, it’s just a way to separate the parts after the + character.

vmassol · November 18, 2020, 2:47pm

Thanks @surli for the feedback. I’d also like to have @tmortagne’s opinion on this proposal, especially as he’s the one who started the concept of using + in syntax types.

tmortagne · November 19, 2020, 4:08pm

I’m fine with the concept, that’s what I did for FilterStreamType (but with only two parts and not an infinite list).

vmassol · November 21, 2020, 8:52pm

We need to decide where we put the variants. In the local proof-of-concept code I’ve started, I’ve put them in Syntax and not in SyntaxType, and considered SyntaxType to represent the base syntax:

/**
 * Represents a wiki syntax that the user can use to enter wiki content. A syntax is made of four parts:
 * <ul>
 *   <li>a base syntax type (e.g. {@code xwiki}, {@code confluence}, {@code mediawiki}, etc.).</li>
 *   <li>zero or more variants, which represent Syntax variations. For example the {@code markdown} syntax has
 *       the {@code commonmark} variant and the {@code github} variant.</li>
 *   <li>a version ({@code 1.0}, {@code 2.0}, etc.).</li>
 *   <li>an optional qualifier, which is a free-form string adding some additional information about the Syntax when
 *       serialized as a String. Can be used for example to mark a Syntax as experimental.</li>
 * </ul>
 * The syntax id string format is: <code>&lt;type>[+&lt;variant>]*&#47;&lt;version></code>.
 * Examples:
 * <ul>
 *   <li>{@code xwiki/2.1}</li>
 *   <li>{@code markdown+commonmark/1.2}</li>
 *   <li>{@code sometype+variant1+...+variantN/1.0}</li>
 * </ul>
 *
 * @version $Id: 89454c157252098191b80c6bcc94a07c0d6d2c2d $
 * @since 2.0RC1
 */

Now this will have consequences for the following, for example:

    /**
     * Confluence wiki syntax.
     */
    public static final SyntaxType CONFLUENCE = register("confluence", "Confluence");

    /**
     * Confluence XHTML based syntax.
     *
     * @since 5.3M1
     */
    public static final SyntaxType CONFLUENCEXHTML = register("confluence+xhtml", CONFLUENCE.getName());

Thus this would break backward compatibility I think, even though I find it more logical to have it in the Syntax than in the SyntaxType.

So I’m going to refactor my code and go in the direction of moving the variants concept to SyntaxType instead.
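
As a rough illustration of that direction, here is one possible shape for a syntax type that keeps its existing full id while also exposing the base id and the variants. Everything in it is an assumption made for this sketch: the class name SyntaxTypeSketch, the getBaseId() method and the choice to derive the variants from the id are hypothetical and may differ from what was eventually implemented.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch for illustration; the real SyntaxType API may differ.
final class SyntaxTypeSketch {
    private final String id;   // full id as registered today, e.g. "confluence+xhtml"
    private final String name; // human-readable name, e.g. "Confluence"

    SyntaxTypeSketch(String id, String name) {
        this.id = id;
        this.name = name;
    }

    // Unchanged: existing callers still get the full id.
    String getId() {
        return id;
    }

    String getName() {
        return name;
    }

    // Assumed accessor: the part of the id before the first "+".
    String getBaseId() {
        int plus = id.indexOf('+');
        return plus < 0 ? id : id.substring(0, plus);
    }

    // Assumed accessor: the "+"-separated parts after the base id, in order.
    List<String> getVariants() {
        String[] parts = id.split("\\+");
        return List.of(Arrays.copyOfRange(parts, 1, parts.length));
    }
}
```

In this sketch a type registered as confluence+xhtml still reports that id, which is why existing constants such as CONFLUENCEXHTML would not need to change, while getBaseId() and getVariants() return confluence and [xhtml].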

Let me know if you see something I haven’t seen

Thx

vmassol · July 12, 2022, 3:39pm

Done in Loading...