Import almost works but rejects some files - List of erroneous pages

zara · July 13, 2020, 4:08pm

I am trying to import many pages from another TWiki by creating an XAR file full of XWiki syntax files.
I have respected the XML format and can import over 200 pages successfully, but around 40 get rejected with the message:

List of erroneous pages

I use pandoc in two steps: first to convert the TWiki file content to HTML (TWiki pages also support HTML), and then again to convert the HTML to XWiki syntax. The XML pages that are created look OK: they have the correct ‘reference’, ‘name’, ‘date’, ‘title’, etc.

Some of the rejected files contain code, but this code is surrounded by the XWiki {{code}} tags (inserted by pandoc). Some rejected files reference image attachments that are not there yet (is this important?). The top WebHome page is rejected too, although it is very small and only contains a short list of URLs.

What should I look out for when cleaning up the files before import?

Thanks for any tips on this

zara · July 17, 2020, 3:09pm

Update:
My problem is with the encoding. Some of the documents have URLs with & characters, and these documents were rejected by the import. By changing each & to &amp; I was able to import almost everything. The command

file -i myimportfile.xml

shows charset=us-ascii

Should this be accepted by the XWiki importer?
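
For anyone repairing already-generated files, here is a minimal sketch of the replacement described above. It assumes the only problem is bare & characters; the helper name `escape_bare_ampersands` is made up for illustration. It leaves existing references such as &amp; or &#38; alone, but it treats anything shaped like `&name;` as an entity, so a name that XML does not define (for example &copy;) would still need fixing by hand. The us-ascii result from `file` is probably not the issue in itself, since ASCII is a subset of UTF-8.

```python
import re

# An ampersand that does NOT already start an entity or character reference
# (for example &amp; &lt; &#38; &#x26;).
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)")


def escape_bare_ampersands(text: str) -> str:
    """Replace each bare '&' with '&amp;', leaving existing references untouched."""
    return BARE_AMP.sub("&amp;", text)


if __name__ == "__main__":
    print(escape_bare_ampersands("http://example.org/view?a=1&b=2"))
```

Running the helper twice on the same text gives the same result as running it once, so it should be safe to apply to a whole directory of page files.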

vmassol · July 30, 2020, 3:34pm

Note that XWiki supports TWiki syntax and can do the conversion. However, I wouldn’t bet on its quality.

OTOH conversion from HTML to XWiki Syntax is very good and you could use that.

vmassol · July 30, 2020, 3:35pm

What do you mean by the XWiki importer?

zara · August 3, 2020, 7:12am

I meant the Admin feature to import from an XAR file. Should the import normally succeed even if a file to import has charset us-ascii and contains the & character?
The example case is the & used to separate the parameters in a URL.

vmassol · August 3, 2020, 7:18am

So you are creating the XAR file by hand? If so make sure you follow https://extensions.xwiki.org/xwiki/bin/view/Extension/XAR%20Module%20Specifications

Provided that the XML files are valid XML files, it should work fine. But a bare & is not valid in an XML file, so your question is a bit strange (you need &amp;)…
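
One way to check this before building the XAR is to run each page through an XML parser and list the ones that fail. The sketch below uses Python's standard `xml.etree.ElementTree`; the function name `find_malformed` and the file names are placeholders, and it only checks well-formedness, not conformance to the XAR specification.

```python
import xml.etree.ElementTree as ET


def find_malformed(xml_texts):
    """Return {name: parser error message} for every document that is not well-formed XML."""
    errors = {}
    for name, text in xml_texts.items():
        try:
            ET.fromstring(text)
        except ET.ParseError as exc:
            errors[name] = str(exc)
    return errors


if __name__ == "__main__":
    pages = {
        "good.xml": "<content>a=1&amp;b=2</content>",
        "bad.xml": "<content>a=1&b=2</content>",  # bare ampersand
    }
    for name, message in find_malformed(pages).items():
        print(name, "->", message)
```

The error message includes a line and column number, which should help locate the offending character in a larger page.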

zara · August 3, 2020, 7:52am

Yes I am creating the XAR file by hand and respecting the XAR syntax and specs.

It concerns the <content>…</content> tag of the XML file. If the imported text has an & in any URL, then the import fails. If the content between the tags is just text, then the & should be valid. I can change these occurrences to &amp;, but I just wanted to clarify whether this is expected behaviour.
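
On the question of whether plain text between the tags can hold a raw &: in XML, element text is still parsed, so & and < have to be escaped there too, and the parser turns them back into the original characters when it reads the file. A minimal sketch of escaping at generation time, assuming the page XML is assembled by a script (the helper name `content_element` is hypothetical, and this is not XWiki's own export code):

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def content_element(wiki_text: str) -> str:
    """Wrap wiki text in a <content> element, escaping &, < and > for XML."""
    return "<content>" + escape(wiki_text) + "</content>"


if __name__ == "__main__":
    source = "[[link>>http://example.org/?a=1&b=2]]"
    xml_fragment = content_element(source)
    print(xml_fragment)
    # Parsing the fragment gives back the original, unescaped wiki text.
    print(ET.fromstring(xml_fragment).text == source)
```

Escaping once at the point where the XML is written avoids having to patch individual URLs afterwards.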