Wrong encoding issue with svg image on xwiki 10.11.3

Hello,
Since I upgraded to XWiki 10.11.3, pages that display an SVG file as an image show it with the wrong encoding, and every SVG image containing accented text is “destroyed”.
The SVG files are encoded in UTF-8, but in Firefox XWiki displays them as ISO-8859.
See the screenshot on Snipboard.io.
The attached SVG file starts with:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<svg

I used this XWiki syntax to display the SVG files:

[[image:PIC - Outils.svg||width="666" height="500"]]

If I upload the same SVG file again, the rendering is fixed, with the correct “Content-Type: image/svg+xml;charset=UTF-8” … but after a while the wrong encoding comes back (I don’t know what causes it).
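
For readers who want to reproduce the symptom outside the browser, here is a minimal Python sketch of what happens when UTF-8 bytes are read as ISO-8859-1. The helper name `misdecode` and the sample label are made up for illustration; this is not XWiki or Firefox code.

```python
def misdecode(text: str) -> str:
    """Encode text as UTF-8, then decode those bytes as ISO-8859-1 (the wrong charset)."""
    return text.encode("utf-8").decode("iso-8859-1")


# Sample label with an accented character, similar to the file names in this thread.
original = "Activités TMA"
garbled = misdecode(original)

# The damage is reversible as long as the bytes themselves were not altered.
restored = garbled.encode("iso-8859-1").decode("utf-8")

print(garbled)
print(restored)
```

The two-byte UTF-8 sequence for "é" is shown as two separate characters, which matches the “destroyed” accents described above. The file itself is intact; only the charset announced to the browser is wrong.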

In Chromium, the SVG images are displayed correctly.
On XWiki 9.11.4 and earlier, I didn’t have this issue.
Thanks.
Pascal B

Is it still OK when you re-save the page? After a restart?

Re-saving the page doesn’t fix it.

I will restart tonight (to try to fix the CPU usage).

It’s not what I meant.

You said that re-uploading fixes it. Is it still OK if you re-save the page after the upload?

It is still OK, but only for the SVG file I re-uploaded (I have 2 SVG images on the page).
I restarted Tomcat: still OK.
I re-saved the page: still OK for the re-uploaded image.

On the production server, I re-uploaded all the SVG images, but after a while the issue came back.

Could you look at the xwikiattachment table and check the value of the XWA_MIMETYPE column for the broken SVG images?

OK, on my test XWiki, I have a page with 3 SVG images. One SVG is fixed, and all the SVG files have multiple versions:

  • PIC - Outils.svg, which I fixed by uploading the file again: OK
    • only one row in the xwikiattachment table, with “image/svg+xml” and xdd_charset: utf-8 (last version uploaded by the admin account)
  • PIC - Niveaux d’intégration.svg: KO
    • 3 rows in the table:
      • 2017-06-21: empty xwa_mimetype and empty xdd_charset
      • 2017-07-06: image/svg+xml and empty xdd_charset
      • 2017-07-12: empty xwa_mimetype and empty xdd_charset
  • PIC - Activités TMA.svg: KO
    • 3 rows in the table:
      • 2017-06-21: empty xwa_mimetype and empty xdd_charset
      • 2017-07-06: image/svg+xml and empty xdd_charset
      • 2017-07-12: empty xwa_mimetype and empty xdd_charset

For the exact same attachment in the same document? That should not be possible, since the id is unique and generated from the attachment reference.

So that means the encoding is properly stored but then lost from the database for some weird reason. Have the broken attachments you listed been “fixed” before?

Yes, right: apparently they are not in the same doc, because xwa_doc_id is different. So, for the same xwa_doc_id (8566739748077385869):

  • PIC - Outils.svg, which I fixed by uploading the file again: OK
    • only one row in the xwikiattachment table, with “image/svg+xml” and xdd_charset: utf-8
  • PIC - Niveaux d’intégration.svg: KO
    • empty xwa_mimetype and empty xdd_charset
  • PIC - Activités TMA.svg: KO
    • empty xwa_mimetype and empty xdd_charset

No.
On the production server, I uploaded all the SVG files again and that fixed it … but after a while (after a server restart, cleaning the Tomcat cache, and other things I can’t remember) my users told me the issue had come back… and it was true :(

I mean, an empty charset is expected for attachments uploaded before the upgrade, for example. That means we need to find a trick to improve the behavior of the download action when there is no charset, in the context of your application server (we don’t set the ISO-8859 charset ourselves, but maybe the application server sets it automatically when the charset is empty).

Not sure I understand. So can you take a look at the table entry for an attachment that was definitely fixed and on which the issue is back?

OK, on the production server, all the SVG files are broken in the database:

  • PIC - Outils.svg
    • only one row in the xwikiattachment table, with “image/svg+xml” and empty xdd_charset
  • PIC - Niveaux d’intégration.svg
    • empty xwa_mimetype and empty xdd_charset
  • PIC - Activités TMA.svg
    • empty xwa_mimetype and empty xdd_charset

All entries have an xwa_date in … 2017. Maybe I didn’t upload these files again, or maybe the file is not stored when it is exactly the same file?
So I suppose that re-uploading all the SVG files fixes the issue, or better, editing the database entries: xdd_charset for the SVG attachments.

I wonder where ISO-8859 comes from, because my server runs Debian/Tomcat…

I was able to reproduce the same thing using a standard Tomcat 8.0.47. It looks like this is just the default charset in Tomcat (it’s not getting it from the system, since mine is UTF-8). I’m trying to find a way to stop Tomcat from trying to be clever without getting the previous warning.

I guess that in the meantime it’s possible to change that default charset somewhere in the Tomcat configuration and set it to UTF-8.

So the bad news is that it seems to be impossible to force an empty charset in a recent Tomcat (I tried with 8.5.38). Even when using the low-level setHeader(), Tomcat catches it and makes sure the header contains valid content (and an empty charset is not valid from its point of view). Older versions of Tomcat used to accept that. Of course I’m only talking about the default configuration here; I’m sure the default encoding set by Tomcat is configurable, but it’s a pity. Fortunately we now store the charset during upload, but there is not much we can do for old attachments.
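
To make the behavior described above concrete, here is a toy model in Python. It is not Tomcat's or XWiki's code: the constant `CONTAINER_DEFAULT_CHARSET` and the function `content_type_header` are hypothetical names, and the only assumption taken from this thread is that the container fills in ISO-8859-1 when no charset is stored.

```python
from typing import Optional

# Assumption from the thread: the container supplies this default when the
# stored charset is empty. The name is made up for this sketch.
CONTAINER_DEFAULT_CHARSET = "ISO-8859-1"


def content_type_header(mimetype: str, stored_charset: Optional[str]) -> str:
    """Build the Content-Type value a client would see in this toy model."""
    # An empty string or None both count as "no charset stored".
    charset = stored_charset or CONTAINER_DEFAULT_CHARSET
    return f"{mimetype};charset={charset}"


# Attachment uploaded after the upgrade: the charset was stored with it.
fixed = content_type_header("image/svg+xml", "UTF-8")

# Attachment uploaded before the upgrade: the charset column is empty.
legacy = content_type_header("image/svg+xml", "")

print(fixed)
print(legacy)
```

In this model the legacy attachment ends up announced as ISO-8859-1 even though its bytes are UTF-8, which is the mismatch reported in the first post.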

No problem in Jetty.

OK, thanks.
I read https://wiki.apache.org/tomcat/FAQ/CharacterEncoding but didn’t manage to fix my issue (server.xml was fine, and editing web.xml changed nothing).
Anyway, I edited some entries in xwikiattachment (adding image/svg+xml and UTF-8) and restarted my Tomcat 8.5.11, which fixed my issue.
I think I will run an SQL query to fix my issue…

UPDATE xwiki.xwikiattachment
   SET xwa_mimetype='image/svg+xml', xdd_charset='UTF-8'
 WHERE xwa_filename like '%.svg';
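
For anyone who wants to see what this UPDATE touches before running it on a real wiki, here is a small dry run in Python against an in-memory SQLite database. The table below is a made-up stand-in with only the three columns named in this thread (without the `xwiki.` schema prefix), and the `logo.png` row and the helper `fix_svg_rows` are invented for the test; the real xwikiattachment table has more columns.

```python
import sqlite3

# Hypothetical, minimal stand-in for the xwikiattachment table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE xwikiattachment ("
    "xwa_filename TEXT, xwa_mimetype TEXT, xdd_charset TEXT)"
)
conn.executemany(
    "INSERT INTO xwikiattachment VALUES (?, ?, ?)",
    [
        ("PIC - Outils.svg", "image/svg+xml", ""),
        ("PIC - Activités TMA.svg", "", ""),
        ("logo.png", "image/png", ""),  # made-up row that must stay untouched
    ],
)


def fix_svg_rows(connection: sqlite3.Connection) -> int:
    """Run the UPDATE from the post above and return the number of rows changed."""
    cursor = connection.execute(
        "UPDATE xwikiattachment "
        "SET xwa_mimetype = 'image/svg+xml', xdd_charset = 'UTF-8' "
        "WHERE xwa_filename LIKE '%.svg'"
    )
    connection.commit()
    return cursor.rowcount


updated = fix_svg_rows(conn)

# Index the result by file name so the check does not depend on row order.
by_name = {
    name: (mimetype, charset)
    for name, mimetype, charset in conn.execute(
        "SELECT xwa_filename, xwa_mimetype, xdd_charset FROM xwikiattachment"
    )
}
print(updated, by_name)
```

Note that the query marks every `.svg` row as UTF-8 whatever the file actually contains, so it only makes sense if all the SVG attachments really are UTF-8, as in this case.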

This confirms what Tomcat does:

If a character encoding is not specified, the Servlet specification requires that an encoding of ISO-8859-1 is used.

but indeed I cannot find anything on that page about how to change it.
