Should we limit the PDF export size by default?

Hi everyone,

The PDF Export Application has a configuration option to limit the size of the PDF export:

# [Since 14.10]
# The maximum content size, in kilobytes (KB), a user is allowed to export to PDF; in order to compute the content size
# we sum the size of the HTML rendering for each of the XWiki documents included in the export; the size of external
# resources, such as images, style sheets, JavaScript code is not taken into account; 0 means no limit;
export.pdf.maxContentSize=100
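
For example (illustrative values, not recommendations), since the value is expressed in KB, an administrator could raise the limit to roughly 5MB or remove the check entirely:

# Allow roughly 5MB of rendered HTML per export
export.pdf.maxContentSize=5120

# Or remove the limit completely (0 means no limit)
export.pdf.maxContentSize=0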

This configuration is exposed in the dedicated administration section. The reasons we introduced this configurable limit are:

  • XWiki supports multi-page PDF export, meaning that any user can trigger a large export (e.g. of tens or hundreds of pages)
  • even if the selected pages are rendered in a background (daemon) thread with low priority, we need to keep their XDOM (see Loading...) in order to compute the aggregated Table of Contents, which can end up taking a lot of memory

I got reports from users complaining they get:

Failed to export as PDF. RuntimeException: Maximum content size limit exceeded

which shows that the default value for this limit is not right. I will improve the message to indicate that the limit can be increased or disabled from the administration section, but we need to decide on the default value. I see two options:

  1. No limit by default, i.e. set the limit to 0 (proposed by @lucaa )
  2. Increase the limit, from 100KB to 1MB

WDYT?

On my side I don’t have a strong opinion, so +0 for either of them.

Thanks,
Marius

Hi Marius,

What’s most important to me is that a user who gets the message should understand what it means and what to do to solve the problem. We need to consider the 2 cases: the user is an Admin, and the user is not an Admin. If he’s an Admin, provide a link to the Admin UI to change the default limit value. If he’s not an Admin, provide the explanation he should pass on to an Admin.
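
Something along these lines could work; this is only a minimal sketch, assuming the ContextualAuthorizationManager from xwiki-platform is used to distinguish the two cases (the class name and the message texts are illustrative, not the actual PDF Export Application code):

// Sketch: vary the "size limit exceeded" message depending on the current user's rights.
import org.xwiki.security.authorization.ContextualAuthorizationManager;
import org.xwiki.security.authorization.Right;

public class SizeLimitMessageBuilder
{
    private final ContextualAuthorizationManager authorization;

    public SizeLimitMessageBuilder(ContextualAuthorizationManager authorization)
    {
        this.authorization = authorization;
    }

    public String build(long maxContentSizeKB)
    {
        if (this.authorization.hasAccess(Right.ADMIN)) {
            // Admins can fix the problem themselves, so point them to the administration section.
            return String.format("Maximum content size limit (%d KB) exceeded. You can increase or"
                + " disable this limit from the PDF Export administration section.", maxContentSizeKB);
        }
        // Non-admins need to know what to ask their administrator for.
        return String.format("Maximum content size limit (%d KB) exceeded. Please ask an administrator"
            + " to increase or disable the export.pdf.maxContentSize limit.", maxContentSizeKB);
    }
}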

Now, even with the improved message, it could be good to avoid this “WTF” moment for the user, as much as possible.

Is it possible to auto-compute the limit as a percentage of the total memory or of the free memory (the latter is not perfect since it depends on whether the GC has executed or not; we could force a GC, but if I’m correct that doesn’t guarantee its execution)?
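
A rough sketch of how such an auto-computed default could look, based on the JVM’s maximum heap rather than on the fluctuating free memory; the 1% factor and the 100KB floor are invented and would need tuning:

// Derive the default content size limit (in KB) from the JVM memory instead of hard-coding it.
public final class ContentSizeLimit
{
    private ContentSizeLimit()
    {
    }

    public static long computeDefaultKB()
    {
        // maxMemory() is the -Xmx ceiling, which is more stable than freeMemory() (the latter
        // fluctuates depending on when the garbage collector last ran, as noted above).
        long maxHeapKB = Runtime.getRuntime().maxMemory() / 1024;

        // Allow, say, 1% of the heap for the rendered content of a single PDF export,
        // with a 100KB floor so small heaps still permit modest exports.
        return Math.max(100, maxHeapKB / 100);
    }
}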

Of course it would be best to fix Loading... (maybe save the XDOM on disk and then stream it back to compute the TOC, so that it doesn’t use much memory?)
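
As a sketch of that direction, only the heading information needed for the aggregated TOC could be extracted right after each wiki page is rendered, letting the full XDOM be discarded (or spilled to disk). This uses the xwiki-rendering Block API, but TocExtractor and TocEntry are hypothetical names, not the actual fix:

import java.util.ArrayList;
import java.util.List;

import org.xwiki.rendering.block.Block;
import org.xwiki.rendering.block.HeaderBlock;
import org.xwiki.rendering.block.XDOM;
import org.xwiki.rendering.block.match.ClassBlockMatcher;

public final class TocExtractor
{
    // Lightweight TOC entry kept in memory instead of the whole XDOM.
    public static final class TocEntry
    {
        public final String id;

        public final int level;

        public TocEntry(String id, int level)
        {
            this.id = id;
            this.level = level;
        }
    }

    private TocExtractor()
    {
    }

    public static List<TocEntry> extract(XDOM xdom)
    {
        List<HeaderBlock> headers =
            xdom.getBlocks(new ClassBlockMatcher(HeaderBlock.class), Block.Axes.DESCENDANT);
        List<TocEntry> entries = new ArrayList<>();
        for (HeaderBlock header : headers) {
            // Keep only the anchor id and heading level (the heading text could be captured
            // similarly, e.g. by rendering the header's children to plain text).
            entries.add(new TocEntry(header.getId(), header.getLevel().getAsInt()));
        }
        return entries;
    }
}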

@mflorea Have you done some testing with no limit and large pages to export, to see what happens and whether the wiki becomes unresponsive or crashes? Does the export timeout start after the full XDOM is in memory, or is that part of it?

Thanks

No, I haven’t done such testing.

The timeout starts after the full XDOM is in memory. The documentation states:

# The number of seconds to wait for the web page to be ready (for print) before timing out.
export.pdf.pageReadyTimeout=60

The timeout is for the HTML page being ready for print. After we render all the selected wiki pages, we compute the aggregated HTML, which is loaded in the browser (the user’s or a remote Chrome instance). We print the HTML page to PDF only after it’s ready (in order to include dynamic content added or modified by JavaScript). Waiting for the HTML page to be ready for print can take a long time or even forever, e.g. if there is a JavaScript bug (infinite loop) or error (that prevents the JavaScript code of the PDF Export Application from executing).

So the size limit (server-side) and the page ready timeout (client-side) serve different needs.

I think this raises a more general question of whether we want to protect against OOM errors. We don’t enforce such limits in other places, like the display macro, which could easily be used to, e.g., include 100 other documents in a single document. The PDF export makes this easier to trigger, though. I think in general it might be a good idea to protect against OOM attacks.

Unfortunately, the current limit is hardly a predictor of the actual memory usage of the XDOM (e.g., embedding images as data URIs could easily increase the output size without needing much memory or causing problems in Chrome). Would it be possible to instead compute a rough estimation of the XDOM’s memory consumption, based on the number of nodes and maybe the number of parameters, in addition to the HTML size limit, and only stop once one of the sizes reaches a much higher limit like 100MB? If this sounds too complicated, I’m +1 for option 2, possibly with an even higher limit like 5MB.
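
A hedged sketch of such a rough estimation, walking the XDOM and counting nodes and parameters (the weights are purely illustrative and would have to be calibrated against real memory measurements):

import org.xwiki.rendering.block.Block;

public final class XDOMWeightEstimator
{
    private XDOMWeightEstimator()
    {
    }

    public static long estimate(Block block)
    {
        // One unit per node plus one per parameter entry; real per-node costs would differ.
        long weight = 1 + block.getParameters().size();
        for (Block child : block.getChildren()) {
            weight += estimate(child);
        }
        return weight;
    }
}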

Another proposal: don’t stop the PDF export when the size limit is only exceeded after processing the last document. That way, a single document can always be exported regardless of its size (and the XDOM is in memory at that point anyway, so it doesn’t really help to abort the export then).

That’s an interesting suggestion, but I’m worried that this leads to some inconsistency: the PDF export may fail (or not) depending on the order of the selected wiki pages. This will certainly create confusion.

That’s true. Another option could be to enable this only for single-document exports, i.e., ignore the size limit if just a single document is being exported.
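
A minimal sketch of how that exemption could look; the class and method names are hypothetical, not the actual PDF Export Application code:

public class ContentSizeGuard
{
    private final long maxContentSizeKB;

    private final int documentCount;

    private long contentSizeKB;

    public ContentSizeGuard(long maxContentSizeKB, int documentCount)
    {
        this.maxContentSizeKB = maxContentSizeKB;
        this.documentCount = documentCount;
    }

    public void addDocument(long renderedSizeKB)
    {
        this.contentSizeKB += renderedSizeKB;
        // 0 means no limit, and single-document exports are never rejected.
        if (this.maxContentSizeKB > 0 && this.documentCount > 1
            && this.contentSizeKB > this.maxContentSizeKB) {
            throw new RuntimeException("Maximum content size limit exceeded");
        }
    }
}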

I’m OK with that. Thanks for the suggestion. I reported Loading... .

Coming back to this, even if we do Loading... there is still the question of whether to apply the size limit to multi-page PDF export by default or not. Basically the options are:

  • Keep the current behaviour, thinking that with Loading... this limit is less of a problem
  • No limit by default, i.e. set the limit to 0 (proposed by @lucaa )
  • Increase the limit, from 100KB to something else, say 1MB or 5MB

WDYT?

Thanks,
Marius

Actually, in a previous reply I proposed improving the error message (telling an Admin how to change the limit and a non-Admin what to ask their Admin for), which is needed IMO whatever other option we choose from your list.

With that message improvement in place, I think I’d be OK with increasing the limit to 5MB.

Ideally, I would have liked to set the limit to 0, but my fear is that this could make DDoS attacks easier (easier since fewer requests are needed) if the admin doesn’t configure it, so it’s probably safer to set a default value. But again, what’s important is for users to understand what they can do if the export fails. That’s the key for me.

Thanks

FTR, I’m planning to implement the following issues (as a result of this thread):

If anyone is against this please let me know ASAP.

Thanks,
Marius