Machine Translation Extension

Hi everyone,

Just a question out of curiosity - I didn’t find anything related using Google:

Is there any XWiki extension available or in development that would automatically translate wiki page contents into another language, preferably while retaining formatting?

It might, e.g., provide a way to create rough drafts of new XWiki page translations, help convert a single-language wiki page from one language to another, or translate just parts of a page.

There are different translation services out there that provide an API which could be used to implement this, and there’s LibreTranslate, whose API can be self-hosted by anyone who needs to translate sensitive information.
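To make the idea a bit more concrete, here is a minimal sketch of what a call to a self-hosted LibreTranslate server could look like, using only the Python standard library. It relies on LibreTranslate's `POST /translate` endpoint with the `q`, `source`, `target`, `format` and optional `api_key` fields. The helper names and the base URL are made up for illustration, and I am assuming the wiki page has already been rendered to HTML so that `format: "html"` can keep the markup intact.

```python
import json
import urllib.request


def build_translate_request(text, source, target, fmt="html", api_key=None):
    """Build the JSON payload for LibreTranslate's POST /translate endpoint.

    fmt="html" asks the server to preserve markup; "text" treats input as plain text.
    """
    payload = {"q": text, "source": source, "target": target, "format": fmt}
    if api_key:
        # Only needed if the server is configured to require API keys.
        payload["api_key"] = api_key
    return payload


def translate(base_url, text, source, target, fmt="html", api_key=None, timeout=30):
    """Send one translation request and return the translated string.

    base_url is an assumption, e.g. "http://localhost:5000" for a local instance.
    """
    payload = build_translate_request(text, source, target, fmt, api_key)
    request = urllib.request.Request(
        base_url.rstrip("/") + "/translate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["translatedText"]
```

Something like `translate("http://localhost:5000", "<p>Hello <b>world</b></p>", "en", "de")` would then return the translated HTML fragment, provided a LibreTranslate instance is listening at that address.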

The XWiki LLM Application might also be a starting point, using LLMs directly. It seems to me, though, that it focuses more on content generation than on transformation, and I think it would make it harder to share the translation backend between different self-hosted services. (E.g. LibreTranslate can also be used by Nextcloud, and maybe in the future by Collabora Online office. On the other hand, https://localai.io/ might be a way to leverage LLM sharing in this direction.)

Does anyone know if there’s anything in the works?


Hello @GOhrner,

You might be interested in the DeepL Application.

There is also a similar feature request for the LLM Application in the issue tracker. [Edit] Just saw that this is actually by you :slight_smile:. As I already replied there, I think that for a simple initial translation, using an LLM might not be the best idea from a performance point of view, and possibly also from a quality point of view if you use smaller self-hosted models.

However, I think LLMs are very interesting if we’re talking about features beyond the initial translation of content. For example, I think you could provide an LLM with a glossary of translations for special terms. Also, a major challenge with translations is keeping them up-to-date. Given enough context about a change (e.g., original page, translated page, diff of the update), an LLM could maybe produce an update to the translated page that leaves unchanged parts unchanged and takes the way the rest of the page is translated into account when translating the changed part.
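As a rough illustration of the kind of context I mean, the following sketch assembles such a prompt from the old and new original page, the existing translation, and an optional glossary. Everything here is hypothetical: the function name, the prompt wording, and the layout are assumptions for illustration and not something the LLM Application provides. Only `difflib.unified_diff` from the Python standard library is real.

```python
import difflib


def build_update_prompt(original_old, original_new, translated_old,
                        target_lang, glossary=None):
    """Assemble a prompt asking an LLM to update an existing translation.

    The diff tells the model what changed in the original page; the existing
    translation shows how the rest of the page was translated.
    """
    diff = "\n".join(difflib.unified_diff(
        original_old.splitlines(),
        original_new.splitlines(),
        fromfile="original (old)",
        tofile="original (new)",
        lineterm="",
    ))
    parts = [
        f"Update the {target_lang} translation below so that it matches the "
        "changed original. Leave unchanged passages exactly as they are and "
        "keep the style and terminology of the existing translation.",
    ]
    if glossary:
        # Fixed translations for special terms, one per line.
        terms = "\n".join(f"- {src} => {dst}" for src, dst in sorted(glossary.items()))
        parts.append("Glossary:\n" + terms)
    parts.append("Diff of the original page:\n" + diff)
    parts.append("Existing translation:\n" + translated_old)
    return "\n\n".join(parts)
```

The resulting string could be sent to whatever chat model is configured. Whether a smaller self-hosted model reliably leaves the unchanged parts alone is exactly the part that would need testing.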

So from my point of view, this is a very interesting feature and I’ve heard that request several times already. We’re currently working heavily on the LLM extension for another project (focused on search), but once this has been finished I would be very interested in finding time (which probably includes finding sponsorship of some sort) for such an advanced translation feature. Independent of that, developing a translation feature based, e.g., on LibreTranslate could also be very interesting. Again, it’s mostly a question of finding time to develop this: somebody needs to have time to actually implement it, which would be easier if somebody paid for it.

If you (whoever is reading this) would be interested in sponsoring such a feature (details to be discussed), feel free to contact the companies providing professional support for XWiki (that is, XWiki SAS at the time of writing). Of course, actual code contributions for such a feature are also very welcome.


@MichaelHamann: Whoops, yes, that was from me, and you already even pointed the DeepL extension out to me back then… I somehow totally forgot about it… :-/

Thanks for reminding me.

I’ll definitely have a look at the DeepL translation - I already had a sneak peek at the code, but I’m pretty short on time at the moment.

Actually, yesterday I hacked together (or rather, was curious about ChatGPT’s programming skills and pushed it to prototype it… ) a tiny Python adapter script in order to run the LibreOffice DeepL feature against a LibreTranslate server. This actually worked quite well already.

If the DeepL extension looks promising, I might try to make the Python adapter compatible with it. (Even though the XWiki DeepL extension looks so simple, and the LibreTranslate API so similar to DeepL’s, that it might be easier just to fork it into a separate LibreTranslate extension. Or maybe make it sufficiently configurable to work with both APIs, which would require more effort, though. And playing with ChatGPT and the adapter script was quite fun… :wink: )
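For the curious, the core of such an adapter is mostly field mapping. The sketch below is not the actual script, just a simplified illustration of the idea. It assumes DeepL-style parameters (`text`, `source_lang`, `target_lang`, `tag_handling`) coming in and LibreTranslate-style payloads (`q`, `source`, `target`, `format`) going out, one request per text. Collapsing regional codes like `EN-US` to `en` is my own simplification, and the function names are made up.

```python
def deepl_to_libretranslate(params):
    """Map DeepL-style /v2/translate parameters to LibreTranslate payloads.

    Returns one payload per input text to keep the response mapping simple.
    """
    texts = params["text"]
    if isinstance(texts, str):
        texts = [texts]
    source = params.get("source_lang")
    return [
        {
            "q": text,
            # LibreTranslate uses lowercase codes; "auto" enables detection.
            "source": source.split("-")[0].lower() if source else "auto",
            "target": params["target_lang"].split("-")[0].lower(),
            "format": "html" if params.get("tag_handling") == "html" else "text",
        }
        for text in texts
    ]


def libretranslate_to_deepl(responses, source_lang=None):
    """Wrap LibreTranslate responses in a DeepL-style response body."""
    translations = []
    for response in responses:
        # detectedLanguage is only present when the source was "auto".
        detected = (response.get("detectedLanguage") or {}).get("language")
        translations.append({
            "detected_source_language": (detected or source_lang or "").upper(),
            "text": response["translatedText"],
        })
    return {"translations": translations}
```

A small HTTP server in front of these two functions would accept the DeepL-style request, forward each payload to the LibreTranslate `/translate` endpoint, and return the wrapped result. Authentication and error handling are left out here.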