XWiki for ELN (timestamping/hashing)

svollmar · November 4, 2023, 3:06pm

We are currently evaluating whether XWiki is suitable for some of our “Electronic Lab Notebook” (ELN) use cases (sorry if this has been asked before). In particular, we are looking for efficient methods to asynchronously hash and timestamp new content (this would also be relevant to non-scientific applications where it matters for legal reasons).

We already have code for hashing (SHA-256) and timestamping (RFC 3161). In principle, we could do a nightly export of suitable XAR files, extract the content and attachment parts, then hash and timestamp those. This would work but is not very efficient: ideally, we would want a reliable list of “dirty” (new or changed) content, or a method to generate one on the fly (“list all pages changed after a given modification date”) that we can call, say, once every hour, or preferably after receiving some sort of “change” event.

Hashing/timestamping synchronously (immediately after new content has been committed) is not what we are looking for: timestamping requires communication with an external service provider, a free or commercial Time Stamping Authority (TSA), which for technical reasons might not always be available immediately. The important point is that timestamping happens “reasonably fast” and reliably.

So:
(1) How can we efficiently get a listing of pages with new/changed content (or similar)?
(2) What is the best method to efficiently extract those pages for processing (assume low-level admin access to XWiki)?
(3) Can we add meta-information to individual pages containing information on hashing and timestamping?

Many thanks,
Stefan
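For the hashing step, note that a XAR export is a ZIP archive, so its entries can be hashed without unpacking to disk. The following is a minimal Python sketch (the function name `hash_xar` is hypothetical) that computes a SHA-256 digest per entry in sorted name order and combines them into a single manifest digest that could then be submitted to the TSA. It assumes attachments are embedded in the exported page XML, as in the usual XAR layout; if some entries differ between otherwise identical exports, they would change the manifest digest and might need to be excluded.

```python
import hashlib
import io
import zipfile


def hash_xar(xar_bytes: bytes) -> dict:
    """Return a SHA-256 digest per archive entry plus one digest over all of them.

    Entries are hashed in sorted name order so the result does not depend on
    the order in which the exporter wrote them into the ZIP archive.
    """
    per_entry = {}
    with zipfile.ZipFile(io.BytesIO(xar_bytes)) as archive:
        for name in sorted(archive.namelist()):
            if name.endswith("/"):
                continue  # skip directory entries, they carry no content
            per_entry[name] = hashlib.sha256(archive.read(name)).hexdigest()
    # Combine "name:digest" lines (already in sorted order) into one manifest digest.
    manifest = "\n".join(f"{n}:{d}" for n, d in per_entry.items())
    return {
        "entries": per_entry,
        "manifest_sha256": hashlib.sha256(manifest.encode("utf-8")).hexdigest(),
    }
```

The manifest digest is what would go into the message imprint of the RFC 3161 request; keeping the per-entry digests as well makes it possible to show later which file inside the export changed.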

vmassol · November 5, 2023, 7:40am

Hello, yes, it’s possible with XWiki to listen to events and there’s an event sent when a page is modified.

This requires some Java or Groovy code; see https://www.xwiki.org/xwiki/bin/view/Documentation/DevGuide/Tutorials/WritingEventListenerTutorial/

Storing the hash is easy: you can create an XClass for the hash/timestamp and then add or update an XObject in the listener. See https://www.xwiki.org/xwiki/bin/view/Documentation/DevGuide/DataModel/ ; I recommend following the FAQ tutorial listed at the bottom of that page.

Thanks

svollmar · November 8, 2023, 4:56pm

Many thanks for your help - we are impressed by XWiki’s capabilities.

Before we dive in at the deep end (we are eager to), here is roughly what we plan to do (please excuse potentially stupid questions; XWiki is new to us; assume root-level access for this application):
(1) Use one of the methods from the examples in your links to capture document-modified events and call a URL or, preferably, a local script with the page’s identifier (we need something to trigger asynchronous operations; we have happily used ZeroMQ in the past).
(2) Use a different machine to automatically fetch the recently modified page from XWiki as a XAR file (ideally within a few minutes of the last modification), then hash and time-stamp (RFC 3161) its content and attachments, if any (this does not have to be efficient, as the processing is done asynchronously).
(3) Attach an object to the recently modified page with information about the status of time-stamping and archiving.
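“Within a few minutes of the last modification” in step (2) suggests debouncing: several saves of the same page in quick succession should lead to one export, not one per save. A minimal sketch of the consumer side, assuming each modification event delivers a page reference as a string (the class name `DirtyTracker` and the reference format are hypothetical); the clock is injectable so the logic can be tested without waiting.

```python
import time


class DirtyTracker:
    """Coalesce repeated 'page modified' events into one processing job per page.

    A page becomes due once no further modification has been seen for
    `quiet_seconds`, so a burst of saves produces a single export.
    """

    def __init__(self, quiet_seconds: float = 120.0, clock=time.monotonic):
        self.quiet_seconds = quiet_seconds
        self.clock = clock
        self._last_seen = {}  # page reference -> time of its most recent event

    def record(self, page_ref: str) -> None:
        # Call for every modification event received (e.g. from the message queue).
        self._last_seen[page_ref] = self.clock()

    def due(self) -> list:
        # Return (and forget) the pages whose quiet period has elapsed, sorted.
        now = self.clock()
        ready = sorted(p for p, t in self._last_seen.items()
                       if now - t >= self.quiet_seconds)
        for p in ready:
            del self._last_seen[p]
        return ready
```

A loop on the processing machine would call `record()` for each incoming message and periodically export whatever `due()` returns.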

The idea is to have a simple macro (provided in a template for new pages) that dynamically displays status information at the top of each page (e.g. a red square if the page is “dirty”, a green square if the page has been time-stamped and archived).

Regarding (1): can we use Groovy to call a local bash script with the modified page’s identifier (whatever we need to access the corresponding XAR file, and how?), or should we use the Java example for handling document-modified events, e.g. for performance reasons?

Regarding (3): this can also be done from the remote machine via the REST interface, correct?
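The red/green indicator boils down to a small decision over the stored stamping metadata. A sketch of that logic in Python, assuming hypothetical XObject properties (`hashed_version`, `sha256`, `tsa_token`, `archived`); in XWiki the equivalent check would live in the page template or sheet. One caveat: writing the XObject back is itself a save, which may create a new page version and fire another modification event, so `current_version` here is assumed to be the version of the last content change rather than of the latest save.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StampRecord:
    """Hypothetical contents of the hash/timestamp XObject attached to a page."""
    hashed_version: str                # page version that was exported and hashed
    sha256: str                        # digest of the exported content
    tsa_token: Optional[bytes] = None  # RFC 3161 response, once the TSA answered
    archived: bool = False


def page_status(current_version: str, record: Optional[StampRecord]) -> str:
    """Return 'red' while the page has unstamped changes, 'green' once done."""
    if record is None or record.hashed_version != current_version:
        return "red"  # never stamped, or edited since the last stamp
    if record.tsa_token is None or not record.archived:
        return "red"  # hashed, but timestamp or archive still pending
    return "green"
```

Splitting the “pending” case into a third colour would be a straightforward extension if users should see that stamping is under way.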

vmassol · November 13, 2023, 4:50pm

There’s a better solution than this! Since you are going to use an XClass and XObjects in pages, you can associate a sheet with that XClass, meaning that XWiki will use that sheet to render any page that has an XObject of that XClass. Thus the page itself doesn’t need to include any macro to be rendered the way you want (i.e. with some info at the top).

See https://extensions.xwiki.org/xwiki/bin/view/Extension/Sheet%20Module for details.

The best approach is to use an event listener: listen to document changes and asynchronously call whatever you need (a local OS script, a URL, etc.) to obtain the stamping data, and then update the XObject data. There are various ways to do this asynchronously, depending on your volumes and other factors: for example, background threads handling the work, or a message queue polled by external servers that then perform the actions. Depending on your architecture, you can update the XObject metadata either from within XWiki or from an external server using XWiki’s REST API.
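Because the TSA may be temporarily unreachable, the asynchronous side should retry rather than drop jobs. A minimal sketch of one worker, assuming the listener (or a bridge from a queue such as ZeroMQ) delivers page references into a Python `queue.Queue`; `stamp` and `on_done` are hypothetical callbacks standing in for the existing export/hash/RFC 3161 code and for the write-back step.

```python
import queue
import time


def run_stamping_worker(jobs: queue.Queue, stamp, on_done,
                        initial_delay: float = 5.0, max_delay: float = 600.0,
                        sleep=time.sleep) -> None:
    """Consume page references from `jobs` and stamp each one, retrying on failure.

    `stamp(page_ref)` is expected to raise when the TSA (or XWiki) is unreachable.
    `on_done(page_ref, result)` would write the result back, e.g. via REST.
    A `None` job is a sentinel that stops the worker.
    """
    while True:
        page_ref = jobs.get()
        try:
            if page_ref is None:
                return
            delay = initial_delay
            while True:
                try:
                    result = stamp(page_ref)
                    break
                except Exception:  # e.g. TSA unreachable: keep the job, retry later
                    sleep(delay)
                    delay = min(delay * 2, max_delay)  # capped exponential backoff
            on_done(page_ref, result)
        finally:
            jobs.task_done()
```

In practice this would run in a background thread, e.g. `threading.Thread(target=run_stamping_worker, args=(jobs, stamp, on_done), daemon=True).start()`. Retrying indefinitely favours reliability over latency; a production version would probably also persist pending jobs so they survive a restart.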

Thanks

svollmar · November 16, 2023, 7:01pm

There is even more interesting infrastructure in XWiki than we thought: “Sheet Modules” (thanks for suggesting this) do indeed look very promising for displaying a page’s status (dirty, or archived and time-stamped). We are familiar with message queues, so asynchronous processing on a remote machine is not a problem, as long as we can reliably capture modified/created events and access/change pages via the REST interface. Now we will get some coding done and let you know whether it worked smoothly. Thanks for your help!
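To write the stamping result back from the remote machine, XWiki’s REST API accepts form-encoded requests for adding an object to a page, with a `className` field and one `property#<name>` field per property. The sketch below only builds the URL and request body; the base URL, the class `Lab.StampClass` and its property names are hypothetical, and the exact endpoint and field names should be checked against the REST API documentation for your XWiki version.

```python
from urllib.parse import quote, urlencode


def object_create_request(base_url: str, wiki: str, spaces: list, page: str,
                          class_name: str, properties: dict) -> tuple:
    """Build (url, form_body) for adding an XObject to a page via XWiki's REST API.

    Nested spaces are encoded as repeated /spaces/<name> path segments.
    """
    path = f"/rest/wikis/{quote(wiki, safe='')}"
    for space in spaces:
        path += f"/spaces/{quote(space, safe='')}"
    path += f"/pages/{quote(page, safe='')}/objects"
    form = {"className": class_name}
    form.update({f"property#{k}": v for k, v in properties.items()})
    # urlencode escapes '#' as %23, so the field names survive form encoding.
    return base_url.rstrip("/") + path, urlencode(form)
```

The body would be sent as an authenticated POST with `Content-Type: application/x-www-form-urlencoded`. For a nested page such as Lab.Notebook.Exp1, the document is usually `WebHome` inside the space chain `Lab`, `Notebook`, `Exp1`.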