Section Titles (H1) in Header for PDF Export

I’m having a hard time. I’ve tried multiple approaches, but I don’t think I understand the PDF rendering phases well enough. To do this with JavaScript I need to see the page after the page breaks are in place. When I try to retrieve the DOM elements representing all my page headers, I get just one, whether I use…

require(['xwiki-page-ready'], function(pageReady) {

or add…

pageReady.afterPageReady(() => {

or add to that…

return new Promise((resolve, reject) => {

I’m seeing the page before the pagination. Obviously when the table of contents (TOC) is being built and pages are being set, it has exactly the data that I need, since this is generally the last item on the TOC for the page number preceding the current page.

Leaving a few details aside, I’m just trying to put the text of the last “h1” section heading into my page header, to indicate that the current page starts in the middle of that section. That is, instead of just the document title in each page’s header, something like “<document title> - <current section>”, where <current section> is the text content of the last h1 element. I can’t even get that far, but there are two nuances: the literal “Table of Contents” text, which is in an h1 element before the first page of content, should be excluded, and when an h1 is the first content on a new page following a page break, the last section title should be left out of the header. I could probably deal with those cases if I could just get the basic functionality working.
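To pin down the rule described above independently of the rendering problem, here is a minimal sketch as a plain function. It assumes a hypothetical page model (an array of pages, each an array of `{ tag, text }` blocks in reading order); the function name `sectionHeaders` and its parameters are illustrative only and are not part of XWiki or paged.js.

```javascript
// Hypothetical page model: each page is an array of blocks in reading order,
// e.g. { tag: 'h1', text: 'Setup' } or { tag: 'p', text: '...' }.
function sectionHeaders(pages, documentTitle, ignoredTitles = ['Table of Contents']) {
  let current = null; // title of the last h1 seen on earlier pages, if any
  return pages.map((blocks) => {
    const startsWithHeading = blocks.length > 0 && blocks[0].tag === 'h1';
    // Only a section that started on an earlier page is repeated in the header,
    // and not when this page itself opens with a new h1.
    const header = current !== null && !startsWithHeading
      ? `${documentTitle} - ${current}`
      : documentTitle;
    // Update the running section title with the headings found on this page.
    for (const block of blocks) {
      if (block.tag === 'h1') {
        current = ignoredTitles.includes(block.text) ? null : block.text;
      }
    }
    return header;
  });
}

const pages = [
  [{ tag: 'h1', text: 'Table of Contents' }, { tag: 'p', text: '...' }],
  [{ tag: 'h1', text: 'Setup' }, { tag: 'p', text: '...' }],
  [{ tag: 'p', text: '...' }, { tag: 'h1', text: 'Usage' }],
  [{ tag: 'p', text: '...' }],
];
console.log(sectionHeaders(pages, 'My Doc'));
// → ['My Doc', 'My Doc', 'My Doc - Setup', 'My Doc - Usage']
```

The sketch only captures the intended header text per page; it says nothing about how to obtain the paginated pages, which is the part that remains unsolved here.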

I tried doing this with JavaScript inside the PageReady steps shown above, as follows. “hs ” is always printed with a length of 1, even though in the finished document the span with class “pdf-chapter-name” occurs many times, which shows that my script runs before the pages with headers have been set up:

```
// All level-1 headings and all page-header placeholders, in document order.
var sectionHeadings = document.getElementsByTagName('h1');
console.info('shs ' + sectionHeadings.length);
var headers = document.getElementsByClassName('pdf-chapter-name');
console.info('hs ' + headers.length);

var i = 0;
var previousSectionHeading = null;
for (var i2 = 0; i2 < headers.length; i2++) {
  var header = headers[i2];
  console.info('h ' + header.outerHTML);
  // Advance past every h1 that precedes this header in document order.
  // The index check avoids reading outerHTML of undefined past the last h1.
  while (i < sectionHeadings.length &&
      (header.compareDocumentPosition(sectionHeadings[i]) & Node.DOCUMENT_POSITION_PRECEDING)) {
    previousSectionHeading = sectionHeadings[i];
    console.info('sh ' + previousSectionHeading.outerHTML);
    i = i + 1;
  }
  if (previousSectionHeading !== null) {
    // Add the last preceding section title right after the header span.
    let newSpan = document.createElement('span');
    newSpan.innerText = previousSectionHeading.innerText;
    header.after(newSpan);
  }
}
```

I thought about trying to pull from the table of contents page, but I suspect I’d have the same problem and the content doesn’t exist yet when my javascript would run.

I tried doing it in LESS, but lazy evaluation is killing me because I can’t get the sequence right. This:

h1 {
  string-set: section-name content(text);
}

is kind of close, but often one off. So I tried something like (syntax may be wrong)…

h1 {
  string-set: previous-section-name string(section-name);
  string-set: section-name content(text);
}

doesn’t work, because these aren’t evaluated in sequence but lazily. I think that even once I get the syntax right, previous-section-name and section-name end up the same, since section-name is evaluated when it is referenced.

Hi,

If you want to show the current section name for a given level in the PDF header then you should first break the print page before any section of that level. It doesn’t make sense to mix multiple sections of the same level in the same print page if the header shows the name of a single section, which might not even start on that print page.

break-before: page;

Once you do this, then:

string-set: section-name content(text);

should work as expected because for each print page you can have:

  • either the section name is on that page (i.e. the section starts on that page)
  • or there is no section name (of that level) on that page, because the section continues on that page (it has started on a previous print page)

In both cases string-set will set the section-name to the name of the current section (that started on the current print page or a previous print page).

Adding a separator between the wiki page title and the section name, in the PDF header, is a bit tricky, because paged.js currently doesn’t support passing multiple values to string-set, as in:

string-set: section-name ' \00A7 ' content(text);

but you can use other styling to separate the two.

Hope this helps

Current state: I did end up inserting the page breaks, which makes the doc longer (very short sections in particular look bizarrely wasteful). I also get an annoying blank page after the TOC, since the page break precedes each new section, including the first one. Some sections happen to be huge (a long-running table) while others take only a few sentences.

Previous state: It’s a little irritating to be told what I’m asking for doesn’t make sense. It made sense to me. Wiki users asked me for it, because it made sense to them. If it makes sense to us, it does make sense, right? It isn’t crazily abstract. It’s annoying to have to flip forward or backward through pages to figure out where you are. I run into that all the time when trying to find something specific in a textbook that I’m already familiar enough with to flip close to where I want to be. A page number on the page gives no meaningful context if I don’t know the exact page I’m looking for. It just represents an exact location, like the house number in a mailing address: meaningless by itself. If the title of the book is like the city, and the chapter like the street, a cross street name is more memorable and meaningful than the number. Page numbers are arbitrary, based on where previous natural page breaks happen to have fallen. Section titles are meaningful for helping me know where I am in logically organized content. They make it easier to flip through and find what you’re looking for, or to always know what you are looking at. For new sections the title is obviously inline at the start of the section, so it would be redundant to put it at the top of the page. For sections large enough to be interrupted by page breaks, on all but the first page the section title isn’t visible to tell you what section you’re in, and I think it can make perfect sense to repeat it. You could just add " - Continued." to the title to make it clear you’re in the middle of it.

@jamesw maybe I wasn’t clear enough. From my point of view, the information you put in the header or the footer of a page must refer to / cover / apply to the entire page. It’s the header / footer of the page, not the header / footer of the first half of the page:

  • if you put an author, it should be the author of the entire page, not just half of the page
  • if you put a page number, it’s the number of the entire page
  • if you put the title of a chapter then the entire page should be part of that chapter

As a wiki user myself, that’s how I see it and what I expect. But I accept others may see it differently.

I hope it was clear I didn’t suggest to break on all headings, but only on levels that usually / statistically have enough content to span multiple print pages, e.g. break only on level 1 headings.

Hope this helps,
Marius