Hi devs,
Context
I’d like to discuss how we could improve our PDF export so that content relying on JavaScript that modifies the DOM gets exported correctly. For example, consider the {{chartjs}} macro, which displays charts using JavaScript.
Right now we render the page on the server side using the XHTML renderer, convert this XHTML to XSL-FO and then ask FOP to convert it to PDF (leaving out various substeps). This means that we don’t execute JavaScript and thus we don’t get the modified HTML DOM in the export, which is a big limitation.
Proposal 1: Headless Chrome
I had an initial discussion with Marius and Thomas and this is the idea we came up with so far:
- Run Chrome in headless mode on the server side. In order to make this portable across OSes, use a Chrome docker image for that.
- Control this Chrome instance from XWiki through a Java API.
- More specifically, when exporting a page, ask Chrome to load the page URL and then to export it to PDF.
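To make the last step a bit more concrete, here is a minimal sketch, assuming the Java side talks to Chrome over the Chrome DevTools Protocol (CDP). It only builds the two JSON messages involved (`Page.navigate`, then `Page.printToPDF`) and decodes the base64 `data` field that `Page.printToPDF` returns. The class name `CdpPdfCommands`, the message ids and the page URL are made up for illustration, and the WebSocket transport, authentication and readiness checks are deliberately left out.

```java
import java.util.Base64;

/** Hypothetical helper: builds the CDP messages needed to print one page to PDF. */
final class CdpPdfCommands {

    /** Escapes the characters that would break a JSON string literal. */
    static String jsonString(String value) {
        StringBuilder out = new StringBuilder("\"");
        for (char c : value.toCharArray()) {
            switch (c) {
                case '"':
                    out.append("\\\"");
                    break;
                case '\\':
                    out.append("\\\\");
                    break;
                default:
                    if (c < 0x20) {
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.append('"').toString();
    }

    /** CDP Page.navigate: asks the tab to load the given URL. */
    static String navigate(int id, String url) {
        return "{\"id\":" + id + ",\"method\":\"Page.navigate\",\"params\":{\"url\":"
            + jsonString(url) + "}}";
    }

    /** CDP Page.printToPDF: asks the tab to render its current content as a PDF. */
    static String printToPdf(int id, boolean printBackground) {
        return "{\"id\":" + id + ",\"method\":\"Page.printToPDF\",\"params\":{\"printBackground\":"
            + printBackground + "}}";
    }

    /** Decodes the base64 string found in the "data" field of the printToPDF result. */
    static byte[] decodePdf(String base64Data) {
        return Base64.getDecoder().decode(base64Data);
    }

    public static void main(String[] args) {
        // Hypothetical page URL, used only for illustration.
        System.out.println(navigate(1, "http://localhost:8080/xwiki/bin/view/Sandbox/WebHome"));
        System.out.println(printToPdf(2, true));
    }
}
```

In practice we would probably rely on an existing Java CDP or WebDriver client rather than hand-built JSON; the point here is only to show how small the core exchange is.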
There are some challenges though, some of which are listed in a previous discussion we had on this topic back in 2017.
Let’s list what I have in mind (please add more if you can think of more):
- When Chrome loads the XWiki page, it must be authenticated as the same user, so that the rendering produces the same result as the page being viewed.
- One solution is to copy the cookies. See this nice article where they listed a lot of challenges and how they solved them.
- Handling of multipage exports. I can think of 2 solutions:
- Prepare a transient XWiki document representing the full content to export (using {{display}} macros for all pages) and ask Chrome to export that page.
- Problems: This will create a page in the wiki which will need to be deleted afterwards, not nice. It could also lead to a very heavy page (imagine wanting to export a full wiki to PDF).
- Possible solution: create a service that does the concatenation
- Loop over all pages and for each one, ask Chrome to generate a PDF for it. Then merge the PDFs using a Java library such as PDFBox.
- Making sure the JS has finished executing before Chrome converts the page to PDF.
- They solved this in this article again, where they mentioned having to modify their code so that their JS can tell when the content is ready. Open question: that’s fine for our code but how do we handle 3rd party code? How do we know it’s finished loading? Do we have such cases?
- Table of content and page numbering (also in the context of multipage exports)
- The TOC could be solved by generating it server-side and then inserting it in the PDF using PDFBox or some other Java library.
- Same for the page numbering I guess. Even if there are options for controlling page numbering through CDP (the Chrome DevTools Protocol), we would still need some manual updates to handle multipage exports if we use the PDF merging approach.
- Lots of other small details that they solved in this article
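On the TOC and page numbering points, if we go with the PDF merging approach, the bookkeeping itself is simple: once we know how many pages each per-document PDF has, we can compute the page on which each document starts in the merged file. Here is a minimal sketch of that arithmetic. The class and method names are hypothetical, and it assumes the page counts are obtained elsewhere (for instance PDFBox's `PDDocument.getNumberOfPages()`) and that the TOC length is known up front.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: page bookkeeping for a PDF merged from per-document PDFs. */
final class MergedPageNumbers {

    /**
     * Returns the 1-based page on which each document starts in the merged PDF.
     * frontMatterPages is the number of pages (cover, TOC) placed before the first document.
     */
    static List<Integer> startPages(List<Integer> pageCounts, int frontMatterPages) {
        List<Integer> starts = new ArrayList<>();
        int next = frontMatterPages + 1;
        for (int count : pageCounts) {
            starts.add(next);
            next += count;
        }
        return starts;
    }

    /** Total number of pages in the merged PDF, front matter included. */
    static int totalPages(List<Integer> pageCounts, int frontMatterPages) {
        int total = frontMatterPages;
        for (int count : pageCounts) {
            total += count;
        }
        return total;
    }

    public static void main(String[] args) {
        // Three exported documents of 3, 1 and 5 pages, preceded by a 2-page TOC.
        List<Integer> counts = Arrays.asList(3, 1, 5);
        System.out.println(startPages(counts, 2));
        System.out.println(totalPages(counts, 2));
    }
}
```

The start pages are what the server-side TOC would point to, and the total is what a "page X of Y" footer would need. The catch is that the TOC length has to be known before the start pages are final, so the TOC may need to be laid out twice.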
Any other issues you see that are not already listed?
Conclusion
While this approach is far from simple and raises lots of questions, it has the nice advantage of relying on Chrome, which is a well-maintained piece of software that gets bug fixes and improvements at each of its regular releases. This seems a much better approach than FOP, which is moving a lot more slowly IMO. So I believe this approach has a higher chance of generating higher quality PDFs (if we can solve all the points mentioned above).
If we implement proposal 1 and succeed in solving all listed items, would you be ok to bundle this in XS (maybe turned on only when there’s Docker installed on the server, and ideally making the standard FOP-based code into an extension so that admins could decide which one to install/uninstall to control how many export to PDF buttons users will see in the export UI)? Even if not bundled by default in XS, would you be ok to make it an extension part of platform and thus officially supported by the XWiki dev team? On my side, I think it would be a good thing since it would solve an important issue we currently have (JS not being executed).
WDYT? Do you see an alternative that would work better?
Thanks
-Vincent