Break the details of the wiki + space search REST API

MichaelHamann · May 23, 2025, 2:03pm

Hi everyone,

the wiki and space-level search REST API currently has a very detailed specification of how it performs the search:

Returns the list of pages and objects that contain the {keywords} in the specified {scope}s. Multiple scopes can be specified. Search results are relative to the whole {wikiName} and are obtained via a HQL query. The specified keywords are converted to uppercase and used in a HQL LIKE clause (e.g if the scope is CONTENT then the document’s content is matched to the specified keywords).
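
For illustration, the matching described in this specification behaves roughly like a case-insensitive substring test. The following is a minimal Python sketch of those semantics only, not XWiki's actual HQL. The function name `like_match` is hypothetical, and it assumes the keywords are wrapped in `%` wildcards on both sides.

```python
def like_match(value: str, keywords: str) -> bool:
    """Emulate `upper(value) LIKE '%KEYWORDS%'` (assumed pattern shape).

    LIKE metacharacters (% and _) inside the keywords are not handled here.
    """
    return keywords.upper() in value.upper()


if __name__ == "__main__":
    # A fragment inside a word matches, because this is a substring test.
    print(like_match("Welcome to the Sandbox", "sand"))
    # Several words only match if they appear contiguously in that order.
    print(like_match("Welcome to the Sandbox", "box to"))
```

A pattern with a leading wildcard generally cannot use an ordinary database index, which is consistent with the slowness reported for the page picker.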

The only use of this API that I’m aware of is the page picker. In order to fix the slowness of this page picker (XWIKI-22958), I see two options:

Option 1: Break this description and modify the search behavior. In particular, the search API would have the following new behavior:
1. It uses Solr for searching, so no exact substring match is performed. Instead, the query is matched with the standard Solr query parser. Further, for the last token of the query, a wildcard is added at the end to support a partially typed word at the end of the query. This wildcard processing is disabled for the page content to improve performance.
2. Sorting is by default by match score but still supported for some properties, in particular, fullName, name, title, language, date, creationDate, author, creator, space, version, hidden. For empty queries, as a special case, sorting is by date in descending order (to return the most recently modified pages).
3. Searching in the name restores the previous behavior of matching in all spaces. The title has a higher score, though, so matches that occur in the title will be preferred over just matches in the space. Additionally, when searching in the name, the search now also matches the entered text against the full document reference to specifically support pasting a full document reference (this is an exact match only, no substring).

Option 2: Introduce a new API for the page picker that exposes the behavior described above, or use an existing Solr API that is flexible enough to support the use case (not sure, as it is very nice to be able to use the tokenizer for getting the last token for the wildcard search).
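
To make the proposed query handling more concrete, here is a minimal Python sketch of the last-token wildcard and the default sort selection described above. It is not the actual implementation: the function names are hypothetical, whitespace splitting stands in for the Solr tokenizer, and letting an explicitly requested sort field take precedence over the empty-query special case is an assumption.

```python
from typing import Optional

# Sortable properties as listed in the proposal.
SORTABLE_FIELDS = {
    "fullName", "name", "title", "language", "date", "creationDate",
    "author", "creator", "space", "version", "hidden",
}


def add_trailing_wildcard(query: str) -> str:
    """Append '*' to the last token so a partially typed word still matches.

    Assumption: tokens are split on whitespace; the real code would use
    the Solr tokenizer instead.
    """
    tokens = query.split()
    if not tokens:
        return ""
    tokens[-1] += "*"
    return " ".join(tokens)


def choose_sort(query: str, order_field: Optional[str] = None, order: str = "asc") -> str:
    """Pick a sort clause: requested field, else date for empty queries, else score."""
    if order_field in SORTABLE_FIELDS:
        return f"{order_field} {order}"
    if not query.strip():
        # Empty query: return the most recently modified pages first.
        return "date desc"
    return "score desc"


if __name__ == "__main__":
    print(add_trailing_wildcard("release not"))
    print(choose_sort(""))
    print(choose_sort("release not"))
    print(choose_sort("release not", order_field="title"))
```

In this sketch the wildcard would only be applied to the name and title scopes, since the proposal disables it for the page content.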

I’m in favor of breaking the existing REST API as it is a REST API that exposes slow database searches and I think we should remove such REST APIs. Further, I think it would be better to have a replacement that mimics the old REST API than to completely remove it.

Therefore, I’m opening this vote to perform the breaking change of option 1. If the vote should fail, I’ll proceed with option 2.

I’m opening this vote for a bit more than a week until June 2, 10:00.

tmortagne · May 26, 2025, 8:39am

I’m fine with option 1. IMO, it still covers the intent of that API (fuzzy search of pages) and will improve performance of any code which currently uses it.

vmassol · May 26, 2025, 8:40am

I’m -0 on removing the current REST API since it’s an API and we don’t know where it could be used. I’d have either created a new API, marked the current one as deprecated, and moved it to legacy. Or maybe better, keep the current API but introduce the ability to configure how it operates (either with HQL or with Solr), configure it to use Solr by default, and explain the change in the release notes along with how to bring back the old behavior (and of course explain that it’s a slow API and give figures/explanations).

In any case, I’m not blocking the vote.

Thanks for working on this.

MichaelHamann · June 6, 2025, 9:44am

As the deadline of the vote passed with only two committers answering, this vote failed.

I’m going to proceed by adding an optional parameter that allows choosing the source. As making Solr the default would be a breaking change, too, the default will be the database search, and I’ll change the page picker to use Solr. If we later agree to make Solr the default or to remove the database option, it would be an easy change.

MichaelHamann · June 16, 2025, 9:31am

While adapting the implementation, I noticed that adding another parameter to the REST API wouldn’t be nice as it already has a lot of parameters. Further, adding the parameter on the client-side didn’t seem that straightforward as the URL is provided by XWiki’s JavaScript API. Adding the parameter there would have a similarly breaking effect.

I’ve therefore decided to implement a different approach: I’ve refactored the code to use an internal component role for getting the results. There are two implementations of this role, and a configuration option in xwiki.properties allows switching between them. The Solr-based implementation is provided by the xwiki-platform-search-solr-rest module, which also changes the default of the configuration option to Solr. The standard REST server on its own uses the database search as default, as it only includes that implementation. This configuration option makes it easy for admins to switch back to the previous implementation should there be any problems.
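
The selection logic can be pictured roughly as follows. This is a Python sketch of the pattern only, not the XWiki component code: the class names, the `resolve_source` helper, and the `search.source` key are hypothetical placeholders (the real option name is documented with XWIKI-22958).

```python
from typing import List, Mapping, Protocol

# Hypothetical configuration key, NOT the real xwiki.properties name.
CONFIG_KEY = "search.source"


class SearchSource(Protocol):
    """Stand-in for the internal component role that returns results."""

    def search(self, keywords: str) -> List[str]: ...


class DatabaseSearchSource:
    def search(self, keywords: str) -> List[str]:
        return [f"database result for {keywords!r}"]


class SolrSearchSource:
    def search(self, keywords: str) -> List[str]:
        return [f"solr result for {keywords!r}"]


def resolve_source(
    config: Mapping[str, str],
    installed: Mapping[str, SearchSource],
    default_hint: str = "database",
) -> SearchSource:
    """Return the implementation named in the configuration, else the default."""
    hint = config.get(CONFIG_KEY, default_hint)
    if hint not in installed:
        raise LookupError(f"No search source registered for hint {hint!r}")
    return installed[hint]


if __name__ == "__main__":
    installed = {"database": DatabaseSearchSource(), "solr": SolrSearchSource()}
    # With the Solr module present, the default becomes "solr".
    print(resolve_source({}, installed, default_hint="solr").search("sandbox"))
    # An admin can switch back to the database search explicitly.
    print(resolve_source({CONFIG_KEY: "database"}, installed, default_hint="solr").search("sandbox"))
```

Keeping the switch in configuration rather than in a request parameter means the REST URL built by the JavaScript API stays unchanged.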

MichaelHamann · June 23, 2025, 11:45am

I’ve implemented the variant that I’ve presented in the previous message with the respective configuration options as part of XWIKI-22958. I’ve documented the new configuration option and documented the breakage in the release notes.