In addition to the translator project, we are developing an extension that makes it possible to pass a glossary to the translator.
Description
The goal of the “glossary for the translator” application is to make the glossary available to the translator. Currently the Glossary application lets each entry exist in multiple languages, so we have a translation for every term in the glossary.
This application will let us pass those terms to the translator, so that the translator uses those specific translations when we give it a text to translate. As the translator could be a remote service, we need to synchronise the glossary entries with the translation service.
So this application will:
Generate a list of translations for all words in the glossary (with all language pairs)
Synchronise the previously generated entries with the translator.
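To illustrate the first step, here is a minimal sketch of how the per-language-pair entries could be generated from multilingual glossary entries. It assumes each glossary entry is represented as a map from language code to term; `GlossaryPairs` and `PairEntry` are hypothetical names used only for this illustration and are not part of the Glossary or Translator applications.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class GlossaryPairs {
    /** One term translation for an ordered (source, target) language pair. Hypothetical type. */
    record PairEntry(String sourceLang, String targetLang, String sourceTerm, String targetTerm) {}

    /**
     * Expands each multilingual entry (language code -> term) into one
     * PairEntry per ordered pair of distinct languages.
     */
    static List<PairEntry> generate(List<Map<String, String>> glossary) {
        List<PairEntry> result = new ArrayList<>();
        for (Map<String, String> entry : glossary) {
            // Sort by language code so the output order is deterministic.
            Map<String, String> sorted = new TreeMap<>(entry);
            for (Map.Entry<String, String> source : sorted.entrySet()) {
                for (Map.Entry<String, String> target : sorted.entrySet()) {
                    if (!source.getKey().equals(target.getKey())) {
                        result.add(new PairEntry(source.getKey(), target.getKey(),
                            source.getValue(), target.getValue()));
                    }
                }
            }
        }
        return result;
    }
}
```

An entry available in n languages yields n × (n − 1) pair entries, so the generated list grows quickly with the number of languages.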
Note that this application will depend on the translator application (and also on the glossary application). For the synchronisation, this application will only do the generic part that doesn’t depend on the specific translator implementation. The translator-specific implementation will use the API of the XWiki translator application, which will provide this feature.
Synchronisation will mainly be done by a scheduler. There will also be a button on the UI to force synchronisation if needed.
Implementation
This application will be quite simple, as most of the work will be done by the glossary and translator applications. It will mostly just be the glue between these two extensions.
So this application will have these components:
Scheduler
This will periodically synchronise the glossary to ensure that all data on the translator is up to date.
Job macro page
This job macro will synchronise the glossary from a manual call.
A glossary translator information page
This page will display
The list of glossaries in the translator
The glossary content and information for the translator
The language pairs supported by the translator for the glossary function (note that this is not always the same list as the one supported for plain translation, without the glossary function).
The list of all glossary entries for a selected language pair.
Script service
This provides a method called ‘synchronizeGlossaries()’ to synchronise glossaries.
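As a rough sketch of the split described above between the generic synchronisation part and the translator-specific part, the following assumes the per-pair term lists have already been generated. `TranslatorGlossaryBackend`, `supportedGlossaryPairs`, `replaceGlossary` and `synchronize` are hypothetical names invented for this illustration; they are not the actual Translator Application API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class GlossarySyncSketch {
    /** Hypothetical translator-side contract, implemented once per translator. */
    interface TranslatorGlossaryBackend {
        /** Language pairs (for example "en->de") for which glossaries are supported. */
        Set<String> supportedGlossaryPairs();

        /** Replaces the translator-side glossary for one pair with source -> target terms. */
        void replaceGlossary(String pair, Map<String, String> terms);
    }

    /**
     * Generic part: pushes every generated pair that the translator supports
     * and returns the pairs that were actually pushed.
     */
    static List<String> synchronize(Map<String, Map<String, String>> termsByPair,
        TranslatorGlossaryBackend backend) {
        Set<String> supported = backend.supportedGlossaryPairs();
        List<String> pushed = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> pair : termsByPair.entrySet()) {
            // Skip pairs for which the translator has no glossary support.
            if (supported.contains(pair.getKey())) {
                backend.replaceGlossary(pair.getKey(), pair.getValue());
                pushed.add(pair.getKey());
            }
        }
        return pushed;
    }
}
```

With this shape, both the scheduler and the manual job could call the same generic method, and only the backend would differ between translators.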
It’s not indicated above but after researching it, it seems that you’ve decided to implement the extension mentioned inside the Glossary app. See GLOSSARY-53. This is the key decision and what should have been discussed IMO.
I’ve checked and it seems there’s no dependency on it from the existing Glossary modules. That’s good since it requires some internet connection and we need to keep the Glossary feature working without internet connection.
It’s named application-glossary-machine-translation. Why not simply application-glossary-translator, since AFAICS it depends on the (not yet released) Translator Application?
Could you confirm that the way it works is that application-glossary-machine-translation will iterate over all glossaries, for all languages, and that it’ll ask the Translator API to perform BOTH the translation to a target language and also save the translation as a new wiki page?
This behavior requires synchronization and thus the scheduler idea. Why not do this automatically by listening to glossary changes?
Thanks
PS for @slauriere regarding the translator github project:
In addition, the Documentation & Download part is pointing to some existing but different Google Translate Macro. That looks wrong since it should point to the doc of the Translator extension.
Well, because we thought that the name application-glossary-translator means “translator for the glossary”, which is not what this app is. Ideally the name would be something more like application-translator-glossary, which means “the glossary for the translator”, but that’s not really compatible with the way extensions are named as modules of the Glossary app. That’s why we chose this name.
But personally I don’t mind changing it; we can change it if you have another preference. I think we should also discuss this with @slauriere.
Well, the main task of this extension is only to synchronise the entries with the translator.
After that, when we request a translation, the translator application will check whether a glossary is available (on the translator side) and, if so, use it for the translation.
Yes, we also thought about adding a listener for this, but it’s a bit more complex to implement. We plan to do this in the future.
Regarding the name: in line with what we are discussing about the machine translation extension, the current name application-glossary-machine-translation could be a good fit (to be understood as “glossary for machine translation”). A more accurate name would be machine-translation-glossary, since it is the dedicated term used in this field as far as I can see:
However with the app parent prefix it would become application-glossary-machine-translation-glossary which sounds a bit heavy. Unless using machine-translation-glossary within the Glossary app can be considered but I doubt it. No real preference between keeping it in the same repository or creating a new one. Note that in the future we could consider creating a distinct app machine-translation-glossary which would manage machine translation glossaries in a generic way, not necessarily bound to the way the Glossary app implements glossaries.
Taking into account these considerations, Vincent, would you be fine with the current name? What do you think?
Indeed, and in addition, large translation glossaries are not necessarily available immediately for translations. Without the synchronization action, users will expect a newly added glossary entry to be available immediately, which is not necessarily the case. Also, it’s not clear whether machine translation services support incremental glossary updates, or how much they charge when recreating a glossary. Alleviating the need to synchronize would definitely be a great improvement in terms of UX though.
Out of curiosity: have you considered allowing the import of an existing translation glossary as part of this extension? For example, in the Weblate instance used to translate XWiki it’s possible to download the actual glossary in TermBase eXchange format (see: https://en.wikipedia.org/wiki/TermBase_eXchange), which is a fairly simple XML format. I’m thinking that it could probably be useful to have this kind of feature to allow better integration with other tools.
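To give an idea of how small such an import could be, here is a minimal sketch that reads terms from a TBX document using only the JDK’s DOM parser. It assumes the older TBX layout with `termEntry`, `langSet` (carrying an `xml:lang` attribute) and `term` elements; newer TBX versions use different element names, and a real export may differ from this assumption. `TbxReader` is a hypothetical name used only for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class TbxReader {
    /** Returns one map (language code -> first term) per termEntry element. */
    static List<Map<String, String>> read(String tbxXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Do not fetch the external DTD that TBX files commonly reference.
            factory.setFeature(
                "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(tbxXml)));

            List<Map<String, String>> entries = new ArrayList<>();
            NodeList termEntries = doc.getElementsByTagName("termEntry");
            for (int i = 0; i < termEntries.getLength(); i++) {
                Element termEntry = (Element) termEntries.item(i);
                Map<String, String> byLanguage = new LinkedHashMap<>();
                NodeList langSets = termEntry.getElementsByTagName("langSet");
                for (int j = 0; j < langSets.getLength(); j++) {
                    Element langSet = (Element) langSets.item(j);
                    NodeList terms = langSet.getElementsByTagName("term");
                    if (terms.getLength() > 0) {
                        // Simplification: keep only the first term per language.
                        byLanguage.put(langSet.getAttribute("xml:lang"),
                            terms.item(0).getTextContent().trim());
                    }
                }
                entries.add(byLanguage);
            }
            return entries;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse TBX content", e);
        }
    }
}
```

The resulting maps have the same shape as multilingual glossary entries (one term per language), so they could in principle be turned into glossary pages by a separate import step.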