Break backward compatibility for Active Installs

Hi devs,

I’d like to break the backward compatibility for the Active Installs module. The main reason is that I’m working on upgrading the ElasticSearch (ES) support from version 1.1 to 7.17.x (latest stable), and a lot of things have changed in ES over this time. Some examples:

  • The framework we were using to access ES (JestClient) stopped supporting newer versions of ES a long time ago
  • It also didn’t support Java 11
  • ES has stopped providing a _timestamp field
  • ES has stopped recommending using a type (only the index is recommended now, i.e. installs3 instead of installs3/installs)
  • The latest ES Java client API is now a typed API, which means exposing a Ping Java object as the result of queries instead of an untyped JsonObject.
  • And some more…

I’m also taking the opportunity to provide improvements to the schema to make it more extensible (each data provider now provides its data as a sub item).

Before:

{
    "dbName": "HSQL Database Engine",
    "extensions": [
        {
            "id": "org.xwiki.platform:xwiki-platform-dashboard-ui",
            "version": "6.1-SNAPSHOT"
        },
        ...
    ],
    "instanceId": "xxxxx-0846-4eec-85f0-yyyyyyy",
    "firstPingDate": 1399304100207,
    "dbVersion": "2.3.2",
    "javaVendor": "Oracle Corporation",
    "distributionVersion": "6.1-SNAPSHOT",
    "osName": "Linux",
    "sinceDays": 5,
    "osArch": "amd64",
    "distributionId": "org.xwiki.enterprise:xwiki-enterprise-web",
    "javaVersion": "1.7.0_51",
    "osVersion": "3.10.23-vs2.3.6.8-beng"
}

Now:

{
	"date": {
		"current": "2022-02-24T12:37:29.844197419Z",
		"first": "2022-02-24T12:37:29.844197419Z",
		"since": 0
	},
	"memory": {
		"total": 536870912,
		"max": 1610612736,
		"used": 202686312,
		"free": 334184600
	},
	"os": {
		"name": "Mac OS X",
		"arch": "aarch64",
		"version": "12.2"
	},
	"distribution": {
		"extension": {
			"features": ["featureId/featureVersion"],
			"id": "distributionId",
			"version": "distributionVersion"
		},
		"instanceId": "a0e7adc7-84b7-42f1-a756-1f32ef3fe9da"
	},
	"extensions": [{
		"features": ["featureId1/featureVersion1"],
		"id": "extensionId1",
		"version": "extensionVersion1"
	}, {
		"features": ["featureId2/featureVersion2"],
		"id": "extensionId2",
		"version": "extensionVersion2"
	}],
	"database": {
		"name": "databaseProductName",
		"version": "databaseProductVersion"
	},
	"java": {
		"vendor": "JetBrains s.r.o.",
		"version": "11.0.13",
		"specificationVersion": "11"
	},
	"servletContainer": {
		"name": "servletcontainername",
		"version": "servletcontainerversion"
	}
}
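
As a rough sketch of how the old flat fields could map onto the new nested layout: the mapping below is only inferred from the field names in the two samples above, not taken from the actual module, and `convert_old_ping` is a hypothetical helper. The old sample has no "current" date, so the caller supplies it, and the exact date string format is an assumption.

```python
from datetime import datetime, timezone


def convert_old_ping(old: dict, now: datetime) -> dict:
    """Map a flat old-format ping onto the nested layout (assumed mapping)."""
    # firstPingDate is an epoch timestamp in milliseconds in the old sample.
    first = datetime.fromtimestamp(old["firstPingDate"] / 1000, tz=timezone.utc)
    return {
        "date": {
            # Not present in the old document itself, so it is passed in.
            "current": now.isoformat(),
            "first": first.isoformat(),
            "since": old["sinceDays"],
        },
        "os": {
            "name": old["osName"],
            "arch": old["osArch"],
            "version": old["osVersion"],
        },
        "distribution": {
            "extension": {
                "id": old["distributionId"],
                "version": old["distributionVersion"],
            },
            "instanceId": old["instanceId"],
        },
        "extensions": [
            {"id": e["id"], "version": e["version"]}
            for e in old.get("extensions", [])
        ],
        "database": {"name": old["dbName"], "version": old["dbVersion"]},
        "java": {"vendor": old["javaVendor"], "version": old["javaVersion"]},
    }
```

Data that only exists in the new schema (memory, servletContainer, features, java.specificationVersion) is simply absent from a converted document.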

If ActiveInstalls was a contrib extension this would be a major new version, but since it’s inside XS, I can’t do this.

Since I don’t believe there are many users of the API, and since the schema has changed (so users need to update their code to make it work in any case), I’m proposing to break backward compatibility.

WDYT?

Thanks

It’s not very clear to me what exactly is broken here. You seem to be talking only about people manipulating the schema, but doesn’t this change also mean we won’t receive pings from instances running older versions of XWiki?

Good question.

For old versions of XWiki, my immediate plan would be to keep the ES 1.1 instance. What could be done in the future (but I think it’s overkill) would be to create a proxy server to convert the old JestClient HTTP request (and the old ES 1.1 responses) but it’s really a lot of work.

Now this means that old XWiki pings can still be recorded but we still won’t be able to display them in the ActiveInstalls queries/dashboard. One option could be to move the current ActiveInstalls code to xwiki-attic and have the new ActiveInstalls code use a different Maven artifactId + a new xwiki space for the wiki pages so that we could install both extensions on xwiki.org.

The other option would be to drop displaying old XWiki instance active install data and only consider the new ones (and possibly stop the old ES 1.1 instance and just make sure that we respond with a 200 to ping requests at the old URL).
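
To illustrate the “just respond with a 200” idea, here is a minimal sketch of such a stub using only the Python standard library. `LegacyPingHandler` and `make_server` are hypothetical names, and whether an empty JSON object is an acceptable response body for the old client is an assumption that would need checking.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class LegacyPingHandler(BaseHTTPRequestHandler):
    """Acknowledge any legacy ping request and discard its body."""

    def _acknowledge(self):
        # Drain the request body (if any) so the client can finish sending.
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)
        body = b"{}"  # assumed to be good enough for old clients
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    # The HTTP methods used by old instances are not stated in this thread,
    # so the sketch answers the common ones the same way.
    do_GET = do_POST = do_PUT = _acknowledge

    def log_message(self, format, *args):
        # Keep the stub silent; nothing is recorded.
        pass


def make_server(port: int = 0) -> HTTPServer:
    """Create the stub server; port 0 lets the OS pick a free port."""
    return HTTPServer(("127.0.0.1", port), LegacyPingHandler)
```

Calling `make_server(8080).serve_forever()` would run it on a fixed port (8080 is just an example).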

Opinions welcome.

Thanks

This sounds like the best option.

AFAIU once you break the compatibility we will stop being able to receive information from all XWiki instances older than the breakage. So we’re not really talking about “old” instances here: if it’s broken in 14.2, we won’t get information from any earlier instance, including the current LTS, 14.1, etc. In practice it means that when we upgrade XWiki.org to the next LTS (14.10), by default we won’t get stats anymore for instances using 13.x, 12.x, etc., which is a pity IMO.
So it’s better to be able to install the old extension too at this point, to keep gathering data.

yeah and it also solves the backward compat breakage since there’s no breakage anymore :wink: (replaced by a deprecation)

I’m proposing:

  • Artifact ids: xwiki-platform-activeinstalls2*
  • Java packages: org.xwiki.activeinstalls2*
  • Wiki space ActiveInstalls2
  • Config properties prefix activeinstalls2.
  • Script Service activeinstalls2
  • User Agent XWikiActiveInstalls2
  • Default ping URL https://extensions.xwiki.org/activeinstalls2

A la Apache Commons Lang…

I have the same feeling. We should still gather data from “old” instances.

I discussed this with Ludovic yesterday and while he said it’s ok to proceed as planned, he would prefer that we not lose past data and that we get rid of the old ES instance ASAP.

So I’m putting 2 diagrams showing the current proposal and a proposal for the future to migrate from the current proposal to a target architecture.

Current proposed architecture:

[Diagram: currentProposal]

Future proposed architecture:

[Diagram: futureProposal]

Notes:

  • I’ve put XWiki 14.5 to be the version when we’ll use the new ActiveInstalls2 module but that could change.
  • The migrator part is optional and will depend on whether we want to migrate the existing data or not.
  • The explanation for the LogWriter process is:
    • A simple process with a very good uptime and small memory needs, having the only goal of writing log files
    • The LogWriter will need to understand different input data formats (for XWiki < 14.5 and >= 14.5)
    • As it’s very simple, it offers a very limited attack surface
    • These log files would be the XWiki raw data for active installs
    • They are independent of the underlying technology (ElasticSearch or something else in the future) and can be replayed/loaded into a new store if needed in the future (even if that could take a very long time)
    • It requires a large amount of disk space (old logs could be stored offline if need be)
  • (Optional) Right now the data sent by the XWiki instances continues to use the JSON format of ES, but we could imagine using another format if we wanted. We would just need to modify the LogWriter to support a new format.
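
The LogWriter idea in the notes above could be sketched roughly as follows. This is only an illustration under assumptions: `detect_format` and `write_ping` are hypothetical names, the rule used to tell the two input formats apart is guessed from the two JSON samples earlier in the thread, and one JSON document per line is just one possible log layout.

```python
import json
from pathlib import Path


def detect_format(ping: dict) -> str:
    """Guess which ping format was received (hypothetical detection rule)."""
    # Assumption: the new schema nests data under "distribution",
    # while the old flat schema has a top-level "distributionId".
    if isinstance(ping.get("distribution"), dict):
        return "v2"
    if "distributionId" in ping:
        return "v1"
    raise ValueError("unrecognized ping format")


def write_ping(raw_body: bytes, log_file: Path) -> str:
    """Parse the JSON body, append it as one log line, return the format."""
    ping = json.loads(raw_body)
    if not isinstance(ping, dict):
        raise ValueError("a ping must be a JSON object")
    fmt = detect_format(ping)
    record = {"format": fmt, "ping": ping}
    # Compact, single-line JSON: json.dumps escapes newlines inside strings,
    # so each record stays on exactly one line.
    with log_file.open("a", encoding="utf-8") as out:
        out.write(json.dumps(record, separators=(",", ":")) + "\n")
    return fmt
```

Tagging each line with the detected format is a design choice of this sketch: it would let a later loader treat old and new pings differently without re-guessing.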

WDYT?

FTR these diagrams were generated using yUML.

Current proposal:

[XWiki < 14.5]-Ping (JSON over HTTP) >[ElasticSearch v1.1.1]
[Kibana UI Old]->[ElasticSearch v1.1.1]
[XWiki.org|ActiveInstalls Extension (UI)]->[ElasticSearch v1.1.1]
[XWiki >= 14.5]-Ping (JSON over HTTP) >[ElasticSearch v7.x]
[Kibana UI New]->[ElasticSearch v7.x]
[XWiki.org|ActiveInstalls Extensions (UI)|ActiveInstalls2 Extensions (UI)]->[ElasticSearch v7.x]

Future proposal:

[XWiki < 14.5]-JSON>[XWiki LogWriter Process|Small process to parse JSON & write logs]-Write >[Log Files {bg:green}]
[LogStash]-read >[Log Files {bg:green}]
[LogStash]-write >[ElasticSearch v7.x]
[XWiki >= 14.5]-JSON>[XWiki LogWriter Process]
[Kibana UI New]->[ElasticSearch v7.x]
[XWiki.org|ActiveInstalls2 Extension (UI)]->[ElasticSearch v7.x]
[Migrator {bg:wheat}]-read >[ElasticSearch v1.1.1 {bg:wheat}]
[Migrator {bg:wheat}]-write >[XWiki LogWriter Process]

Big +1 for the XWiki (LogWriter) entry point, never been a fan of communicating directly with ElasticSearch…

For the Log files/LogStash part, I don’t know, I’ve never used LogStash. Are you sure going through logs won’t limit the kind of queries we can then do in the UI? I’m thinking about things like types of metadata and related calculations, etc.

The log files are just the raw data and are not meant to be queried. They need to be loaded in some tool (like ES, Solr, RDBMS, etc) for querying.

Just FTR, that was never the intended target architecture, it was just the simplest thing that could be done with the little time available for some data that was not runtime-critical. Also, there’s a small front end/proxy in front of ES.

We’ve used it a bit in the past (we did that during a hackathon) to gather all xwiki.org logs and load them into an ES instance. We were then using Kibana to view them.

I’ve put Logstash since it already exists and it’s meant for that; it’s also well developed/maintained but it’s possible to use something custom if we want (but I’d really try to avoid that, there are already more custom pieces in the target architecture than before and they require dev, doc & maintenance).

I know that they are not going to be queried, but they are still just lines of text, so I want to be sure we don’t lose information by going through a log format.

It’s essentially a serialization of the JSON sent by the XWiki instances.
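
To make that concrete, here is a small sketch assuming one JSON document per log line (an assumption; the actual log layout is not decided in this thread). `replay` is a hypothetical helper that reads such lines back into documents ready to be loaded into a store.

```python
import io
import json


def replay(lines):
    """Yield one parsed JSON document per non-empty log line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


# Round trip: nested objects, lists and numbers come back unchanged.
ping = {
    "date": {"since": 0},
    "memory": {"max": 1610612736},
    "extensions": [{"id": "extensionId1", "features": ["featureId1/featureVersion1"]}],
}
log = io.StringIO(json.dumps(ping) + "\n")
assert list(replay(log)) == [ping]
```

Since the types survive the round trip, the log format by itself should not restrict what can later be computed in the query store.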

OK, so it’s more file storage or metadata than a real “log” you would look at.

Also a big +1 from me for this new architecture that avoids the direct dependency between XWiki installations and Elasticsearch.

Personally, I’m all for making XWiki independent of Elasticsearch, which isn’t FLOSS anymore, and thus I’m also +1 to move to a JSON format that is independent of Elasticsearch. I understand that moving to a different search backend would be a huge cost at the moment (though a migration to OpenSearch might be feasible), but I think every step to reduce that vendor lock-in is a good one. Making the API that is used by XWiki installations independent of Elasticsearch is a very important one, as it is the one place we have the least control over.

I would also expect that the JSON log files are very well-compressible as the data should be very repetitive.
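
One way to check that expectation would be to gzip a batch of pings and compare sizes. The sketch below assumes one JSON document per line; `compression_ratio` is a hypothetical helper, and any result obtained on synthetic data says little about real pings, so it would need to be run on actual log files.

```python
import gzip
import json


def compression_ratio(pings) -> float:
    """Return compressed size divided by raw size for JSON-lines data."""
    raw = "".join(json.dumps(p) + "\n" for p in pings).encode("utf-8")
    if not raw:
        return 1.0  # nothing to compress
    return len(gzip.compress(raw)) / len(raw)
```

A value well below 1.0 on real data would confirm that keeping the raw logs compressed is cheap.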

+1 for me too. Thanks