Pandoc docx to xwiki has several issues

I am trying to convert docx to xwiki syntax using pandoc.

Workaround using commonmark:

I tested several direct and indirect conversions between syntaxes using pandoc. The one that works best is converting docx to commonmark, pasting the result into the xwiki editor with the syntax set to commonmark, and then switching the syntax to xwiki 2.1.
The correct input and output look like this:


Bullet list

  • Asdf asdf
  • Asdf asdf
  • Asd fasdf asd
  • F asd fasdf

Multi-level bullet list

  • Asd fasdf asdf
  • Asdf
    • Asd
    • F a
    • Sdf
      • Asdf
      • Asdfasdfasd fasdf asdf
      • Asdf

Numbered list

  1. Asd fa
  2. Sdf
  3. Asd
  4. F as
  5. Df

Multi-level numbered list

  1. Asd fasdf
  2. Asdf
  3. Asd
  4. F asd
  5. Fas df
  6. asdfasdf
    1. Asdf
    2. Asd
    3. asdf

Links

https://google.com

This is also a link
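
The conversion step of this workaround could be scripted roughly as follows. This is only a sketch: it assumes pandoc is installed and on the PATH, the helper names `pandoc_command` and `convert` and the file names are made up for illustration, and the paste-and-switch-syntax step in the xwiki editor still has to be done by hand.

```python
import subprocess


def pandoc_command(docx_path, out_path, target="commonmark"):
    """Build the pandoc argument list for a docx -> target conversion.

    "commonmark" is the intermediate format used in the workaround;
    passing target="xwiki" gives the direct conversion instead.
    """
    return ["pandoc", "-f", "docx", "-t", target, docx_path, "-o", out_path]


def convert(docx_path, out_path, target="commonmark"):
    """Run pandoc; raises CalledProcessError if the conversion fails."""
    subprocess.run(pandoc_command(docx_path, out_path, target), check=True)


if __name__ == "__main__":
    # Hypothetical file names, replace with your own.
    print(" ".join(pandoc_command("input.docx", "output.md")))
```

Calling `convert("input.docx", "output.md")` writes the commonmark text that is then pasted into the editor.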


Issues with pandoc docx to xwiki

When I use pandoc to convert docx directly to xwiki, the external links, bullet lists and hierarchical numbered lists do not render correctly in the final display:

Bullet list

  • Asdf asdf

  • Asdf asdf

  • Asd fasdf asd

  • F asd fasdf

Multi-level bullet list

  • Asd fasdf asdf

  • Asdf

    • Asd

    • F a

    • Sdf

      • Asdf

      • Asdfasdfasd fasdf asdf

      • Asdf

Numbered list

  1. Asd fa

  2. Sdf

  3. Asd

  4. F as

  5. Df

Multi-level numbered list

  1. Asd fasdf

  2. Asdf

  3. Asd

    1. F asd

    1. Fas df

    1. asdfasdf

      1. Asdf

      1. Asd

      1. asdf

Links

https://google.com__

This is also a link


pandoc docx to xwiki output.txt (1.2 KB)

True, could (partly) reproduce. The Pandoc xwiki converter seems to have a problem with multi-level lists.

Actually it just requires a very small manual correction: in the xwiki source format generated by Pandoc, remove the empty lines between the levels.

This is what it looks like before the correction:
http://playground.xwiki.org/xwiki/bin/view/Sandbox/pandoc_test_docx2xwiki
(Note that it still looks better than what you posted here - maybe that’s because you just copied and pasted into your post.)

And this after the correction, which seems fine:
http://playground.xwiki.org/xwiki/bin/view/Sandbox/pandoc_test_docx2xwiki2

Not sure though whether this forum is the right place to address this.

A workaround could be to script the cleanup of the empty lines. But this issue isn’t the only one. A Word document can unintentionally pick up all sorts of odd formatting that you don’t see or notice (even after displaying the hidden formatting symbols) but that prevents the converter from working correctly. After downloading from Google Docs it’s even worse, because GDocs has the same problem: unwanted formatting accumulates after working on the doc for a while. There’s no easy general solution, I’m afraid.
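
The scripted cleanup of the empty lines mentioned above might look something like this. It is a minimal sketch, not a tested fix for every document: the function name `tighten_lists` is made up, and the regular expression assumes the list markers are the usual xwiki 2.x ones (`*`, `**`, `1.`, `11.`, `1*.`) followed by a space. It removes blank lines only when they sit between two list items, so it would also join two separate lists that are divided by nothing but a blank line.

```python
import re

# Assumed xwiki 2.x list markers at line start: "* ", "** ", "1. ", "11. ", "1*. "
LIST_ITEM = re.compile(r"^(\*+|[1*]+\.) ")


def tighten_lists(text):
    """Drop blank lines that sit between two list-item lines."""
    out = []
    pending_blanks = []
    for line in text.splitlines():
        if not line.strip():
            # Hold blank lines back until we know what follows them.
            pending_blanks.append(line)
            continue
        prev_is_item = bool(out) and LIST_ITEM.match(out[-1]) is not None
        if not (prev_is_item and LIST_ITEM.match(line)):
            # Not between two list items: keep the blank lines.
            out.extend(pending_blanks)
        pending_blanks = []
        out.append(line)
    out.extend(pending_blanks)
    return "\n".join(out)


if __name__ == "__main__":
    sample = "* Asdf\n\n* Asdf\n\n** Asd\n\nLinks\n\n1. Asd\n\n11. Asdf"
    print(tighten_lists(sample))
```

Blank lines before and after ordinary paragraphs or headings are left alone, so only the list blocks are affected.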

Why don’t you use the Office Importer feature of XWiki? It’s the best solution for importing docx documents into XWiki. See Office Importer Application (XWiki.org).

@vmassol Not my preferred solution because a LibreOffice server instance is required. But yes, I will test that too, thanks.

Just found this topic, one additional remark: Pandoc’s XWiki syntax implementation is not maintained by the XWiki development team and I don’t think the developers of Pandoc monitor this forum. So please report a bug in their issue tracker if you want this to be fixed in Pandoc.