Language based URL rewriting

We’re using HAProxy in front of an XWiki 11.10.3 instance.

For SEO reasons, the language should be encoded in the URL, so the same site can be accessed with different links like so:

  • https://domain.eu/bin/view/<WIKI_SPACE>/<LANGUAGE_CODE>/<PAGE_PATH>
  • https://domain.eu/bin/view/Public/de/Applications/iOS/ for the German version (default language)
  • https://domain.eu/bin/view/Public/en/Applications/iOS/ for the English version
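
To make the intended scheme concrete, here is a minimal Python sketch of how such a path could be split into its three parts. It is an illustration only, not part of our setup: the function and constant names are hypothetical, the space list is copied from the HAProxy test configuration further down, and a missing language code is assumed to mean German.

```python
import re

# Assumptions for illustration: space names as in the HAProxy test
# configuration below, two supported languages, German as default.
SPACES = ("Main", "Public", "Customer", "OnPremise", "Partner")
LANGUAGES = ("de", "en")
DEFAULT_LANGUAGE = "de"

# /bin/view/<WIKI_SPACE>[/<LANGUAGE_CODE>][/<PAGE_PATH>]
PRETTY_PATH = re.compile(
    r"^/bin/view/(?P<space>%s)(?:/(?P<lang>%s)(?=/|$))?(?P<page>/.*)?$"
    % ("|".join(SPACES), "|".join(LANGUAGES))
)


def split_pretty_path(path):
    """Return (space, language, page_path), or None if the path is not a view URL."""
    match = PRETTY_PATH.match(path)
    if match is None:
        return None
    lang = match.group("lang") or DEFAULT_LANGUAGE
    page = (match.group("page") or "").strip("/")
    return match.group("space"), lang, page


if __name__ == "__main__":
    print(split_pretty_path("/bin/view/Public/en/Applications/iOS/"))
    print(split_pretty_path("/bin/view/Public/Applications/iOS"))
```

Note that a page or space literally named de or en directly below the wiki space would be read as a language code by this sketch.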

As far as we understand, XWiki selects the language through a language request parameter: it stores the value in a cookie and then issues a redirect. The URLs for both languages are the same (https://domain.eu/bin/view/Public/Applications/iOS), with the language controlled by the cookie value.

Due to the redirect, a “simple URL rewrite” (/en → ?language=en) isn’t possible, as it would end in a redirect loop until it eventually stops at XWiki’s default language.

We tried to alter the Cookie values with some success in terms of

  • https://domain.eu/bin/view/Public/de/Applications/iOS resulting in the German page language and
  • https://domain.eu/bin/view/Public/en/Applications/iOS resulting in the English language version

without further redirects. However, the links within the page don’t know about our URL rewriting and always point to an XWiki URL without the respective /de or /en part of the path, causing XWiki to fall back to its default language.

Ex.:
The page https://domain.eu/bin/view/Public/en/Applications/iOS contains a link to a child page called User-Manual, which is rendered as https://domain.eu/bin/view/Public/Applications/iOS/User-Manual/ (without the language in the path).
In this specific example, clicking the link on the English version of the page serves the default-language page (which is German), as per our reverse proxy rules (see below).

We might be able to further improve the cookie handling on the reverse proxy side, giving the cookie value priority over the language code in the URL. But even then, it would probably result in a redirect for every single link without a language code.
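
A minimal Python sketch of that decision, under one possible reading of the idea: the cookie is consulted only when the path carries no valid language code, and an explicit code in the path still wins. That reading, the function name and the defaults are assumptions for illustration, not something we have implemented.

```python
def resolve_language(path_lang, cookie_lang, supported=("de", "en"), default="de"):
    """Return (language, needs_redirect) for one incoming request.

    path_lang   -- language code found in the URL path, or None
    cookie_lang -- value of the language cookie, or None
    """
    if path_lang in supported:
        # The URL is already "pretty": serve it as is.
        return path_lang, False
    if cookie_lang in supported:
        # No language in the path: follow the cookie, but a redirect
        # is still needed to put the language back into the URL.
        return cookie_lang, True
    # Neither path nor cookie is usable: fall back to the default.
    return default, True


if __name__ == "__main__":
    print(resolve_language(None, "en"))
    print(resolve_language("en", "de"))
```

The second return value shows the cost mentioned above: every link that XWiki renders without a language code ends up in one of the two redirecting branches.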

So, is it even possible to achieve what we want? While we’ve read about custom URL schemes, documentation is rather limited. In addition, we would like to stay near XWiki defaults as far as possible to minimize the risk of failure during upgrades.

PS: This is our HAProxy test configuration:

frontend domain.eu
    bind *:80

    # Prettify URLs
    # Language parameter ?language=de|en has to be part of the path
    # Language DE should be default, thus redirecting every request without language specification or any language other than EN
    # https://domain.eu/bin/view/Main/[?language=de] -> https://domain.eu/bin/view/Main/de
    # https://domain.eu/bin/view/Main/?language=en   -> https://domain.eu/bin/view/Main/en

    # Determine language
    acl is_invalid_language_path path_reg ^\/$|\/bin\/view\/(Main|Public|Customer|OnPremise|Partner)($|\/$|\/(?!de\/|en\/).*)
    acl has_valid_language_parameter urlp_reg(language) -i ^(en|de)$
    http-request set-var(req.lang) str("de") if is_invalid_language_path !has_valid_language_parameter
    http-request set-var(req.lang) query,regsub(\".*language=(\S{2}).*\",\1) if is_invalid_language_path has_valid_language_parameter
    http-request set-var(req.lang) path,regsub(\"^\/bin\/view\/(Main|Public|Customer|OnPremise|Partner)\/(\S{2}).*\",\2) if !is_invalid_language_path

    # Determine wiki name
    acl is_root_path path /
    http-request set-var(req.wiki) str("Main") if is_root_path
    http-request set-var(req.wiki) path,regsub(\"^\/bin\/view\/(Main|Public|Customer|OnPremise|Partner).*\",\1) if !is_root_path

    # Determine requested page (either \3 or \4 is populated (exclusively) if there's an actual page requested, otherwise both will return an empty string)
    http-request set-var(req.page) path,regsub(\"^\/$|\/bin\/view\/(Main|Public|Customer|OnPremise|Partner)($|\/$|\/(?!\S{2}\/)(.*)|\/\S{2}\/(.*))\",\3\4)

    # Remove language from URL parameters
    # This will also impact above ACL `has_valid_language_parameter` in terms of it being always `false` when used beyond this line
    http-request set-query "%[query,regsub(\"(^language=\\S{2}&?|&language=\\S{2})\",,)]"
    acl has_additional_query_parameters query -m len gt 0

    # Finally redirect to "pretty" URL
    http-request redirect code 302 location http://%[hdr(host)]/bin/view/%[var(req.wiki)]/%[var(req.lang)]/%[var(req.page)]?%[query] if is_invalid_language_path has_additional_query_parameters
    http-request redirect code 302 location http://%[hdr(host)]/bin/view/%[var(req.wiki)]/%[var(req.lang)]/%[var(req.page)] if is_invalid_language_path !has_additional_query_parameters

    default_backend xwiki

backend xwiki
    option forwardfor

    # When changing language, XWiki redirects from parameterized URL (/?language=en|de) to parameterless URL, saving language choice as cookie
    # We want to avoid that redirect (which would reset the language to default DE) and set the cookie ourselves before forwarding request to XWiki
    acl has_cookie_language hdr_reg(Cookie) -i ^.*language=\S{2}.*$
    http-request replace-value Cookie "(.*)language=\S{2}(.*)" "\1 language=%[var(req.lang)]\2" if has_cookie_language
    http-request replace-value Cookie "(.*)" "\1; language=%[var(req.lang)]" if !has_cookie_language

    acl is_rewritable_path path_reg -i ^\/bin\/view\/(Main|Public|Customer|OnPremise|Partner).*
    acl has_additional_query_parameters query -m len gt 0
    # http-request set-query language=%[var(req.lang)]&%[query] if has_additional_query_parameters is_rewritable_path
    # http-request set-query language=%[var(req.lang)] if !has_additional_query_parameters is_rewritable_path
    http-request set-path /bin/view/%[var(req.wiki)]/%[var(req.page)] if is_rewritable_path

    server xwiki localhost:8080 check port 8080
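
For readers who want to check the cookie handling in the backend section outside of HAProxy, here is a small Python sketch of the same intent: make sure the Cookie header carries exactly the language we determined. The cookie name language comes from the configuration above; the function name and the stricter matching on cookie boundaries are assumptions of this sketch, not a description of what HAProxy does.

```python
import re

# Match a cookie named exactly "language", i.e. at the start of the
# header or right after a ";" separator, so that a cookie such as
# "sublanguage" is left alone.
LANGUAGE_COOKIE = re.compile(r"(^|;\s*)language=[^;]*")


def set_language_cookie(cookie_header, lang):
    """Return a Cookie header value whose language entry equals lang."""
    if not cookie_header:
        # No cookies at all: the header has to be created.
        return "language=%s" % lang
    if LANGUAGE_COOKIE.search(cookie_header):
        # Replace the existing value, keeping the separator in front of it.
        return LANGUAGE_COOKIE.sub(
            lambda m: "%slanguage=%s" % (m.group(1), lang), cookie_header
        )
    # Other cookies exist but no language cookie: append one.
    return "%s; language=%s" % (cookie_header, lang)


if __name__ == "__main__":
    print(set_language_cookie("JSESSIONID=abc; language=de; x=1", "en"))
```

The three branches correspond to three request shapes worth testing against the proxy: no Cookie header, a Cookie header without a language entry, and a Cookie header that already has one.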

Hello,

Just so that it’s mentioned here:

  • one technique that I have seen used in XWiki (and elsewhere) for handling multiple languages, and which also puts the language in the URL, is to have a subwiki for each language. This easily gives URLs like https://domain.eu/wiki/de/view/Public/Applications/iOS for the ‘de’ subwiki, and the language used on that wiki is the default language set in the preferences of the subwiki, which can be de.
    However, many of the multilanguage features of XWiki are lost if you go for this content structure: you won’t have the UI to switch easily between the different translations of the same page, and the UI for creating a translation of an existing document in a new language won’t be there anymore without customization, etc. So I guess it depends on your needs whether this is a real solution for your case or not.

Now, to answer strictly the question that is asked:

  • first of all, from the point of view of XWiki the path has an indefinite number of components. After the first ones, which relate to the path of the servlet, are handled, the “rest” is parsed into the reference of the document to be fetched, with an unlimited number of spaces and an optional page name (…/A/B/C/D will be interpreted as document D in space C in space B in space A, with D being either a terminal or a non-terminal document, whichever exists).
    So, from my point of view, having the language particle “in the middle” of this reference part complicates things more than anything: in order to avoid bugs, you need a way to handle the generic case and make sure that /de/ never ends up interpreted by XWiki as the space called ‘de’, regardless of the depth of the page hierarchy you may have.
    I would say that URLs like https://domain.eu/bin/view/en/Public/Applications/iOS are probably much easier to handle, with the /en/ particle between the action particle and the document reference part, and they are quite equivalent to the one you mentioned as far as the need to have the language in the URL is concerned.

  • second of all, there is a way to use tuckey urlrewrite with XWiki; see the Short URLs documentation on XWiki.org for an example that simplifies the URLs. This method also alters the URLs generated by XWiki, in addition to accepting different incoming URLs, so in order to alter the URLs generated by XWiki you can try to go this way. This solution only involves modifications in the XWiki application; I would say you don’t need to handle anything at the HAProxy level anymore.
    I have not done this specific case, but tuckey urlrewrite is a rather powerful tool, and according to its documentation (the UrlRewriteFilter manual) you can make a rewrite conditional on a cookie value, so you should be able to handle the language part.
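
To illustrate the two directions such a rewrite layer has to cover with the language placed right after the action, here is a small Python sketch. It is not UrlRewriteFilter configuration (that would be XML rules in the webapp); it only models the mapping, and the function names and the language list are assumptions made for the example.

```python
import re

LANGUAGES = ("de", "en")  # assumption: the two languages from the question

# Incoming "pretty" path: /bin/<action>/<lang>/<document reference>
INBOUND = re.compile(
    r"^/bin/(?P<action>[^/]+)/(?P<lang>%s)(?P<rest>/.*)?$" % "|".join(LANGUAGES)
)
# Path as generated by XWiki: /bin/<action>/<document reference>
OUTBOUND = re.compile(r"^/bin/(?P<action>[^/]+)(?P<rest>/.*)?$")


def inbound(path):
    """Map a pretty path to (language, XWiki path); language is None if absent."""
    match = INBOUND.match(path)
    if match is None:
        return None, path
    xwiki_path = "/bin/%s%s" % (match.group("action"), match.group("rest") or "")
    return match.group("lang"), xwiki_path


def outbound(path, lang):
    """Add the language to a path generated by XWiki; leave other paths alone."""
    if INBOUND.match(path):
        return path  # already carries a language segment
    match = OUTBOUND.match(path)
    if match is None:
        return path  # not a /bin/<action>/... URL, e.g. a static resource
    return "/bin/%s/%s%s" % (match.group("action"), lang, match.group("rest") or "")


if __name__ == "__main__":
    print(inbound("/bin/view/en/Public/Applications/iOS"))
    print(outbound("/bin/view/Public/Applications/iOS/User-Manual/", "en"))
```

With this layout the language segment always sits at a fixed position, so the depth of the page hierarchy no longer matters; the remaining conflict would be a top-level space named exactly like a language code.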

Hope this helps, let us know if it works for you.

Enjoy XWiki,
Anca

As a joke, anything that is decidable is possible and it becomes a matter of budget :)

tuckey urlrewrite is an extra “layer” that you put “on top” of the wiki and you can always remove this extra layer (by removing the jar and its config in web.xml) and go back to “standard” XWiki URLs which will work just fine. Of course this removal will change the URLs for your wiki, so the extent of “working just fine” depends on how those URLs are used (for example, if you store them in bookmarks of the browser or copy-paste them in XWiki or external tools, they won’t work anymore when you remove the urlrewrite).

While this is probably supposed to work, I’ve not had any success in using it. It worked on the front page, but once you go into a subpage you have to add /bin to make it work again.

So just a word of warning if you’re going down that route: I, at least, couldn’t make it work properly.

I did use it successfully in the past, with additional rewrite rules on top of the ones for short URLs. So either you had a difficulty with your setup, or there is an issue with this on more recent versions of XWiki, which would be interesting to know about and to report as a regression if confirmed.

Hi Anca,
Thank you very much for your help. I’m more of a middleman and have forwarded your information to my colleagues. Since it will take some time to try it out, I just wanted to give this short response.

We will try out the described options (UrlRewriteFilter looks especially interesting to us) and send our feedback later.

Kind regards