Break the documented behavior of the wiki and space search REST API

Hi everyone,

The wiki- and space-level search REST API currently has a very detailed specification of how it performs the search:

Returns the list of pages and objects that contain the {keywords} in the specified {scope}s. Multiple scopes can be specified. Search results are relative to the whole {wikiName} and are obtained via an HQL query. The specified keywords are converted to uppercase and used in an HQL LIKE clause (e.g., if the scope is CONTENT, then the document’s content is matched against the specified keywords).
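To make the quoted behavior concrete, here is a minimal Python sketch of how such a LIKE clause could be assembled. The scope-to-column mapping (`doc.fullName`, `doc.title`, `doc.content`), the `upper()` applied to the column, and the `!` escape character are assumptions for illustration, not the actual XWiki implementation.

```python
# Hypothetical mapping from a search scope to the HQL expression it is matched against.
SCOPE_COLUMNS = {
    "NAME": "doc.fullName",
    "TITLE": "doc.title",
    "CONTENT": "doc.content",
}
ESCAPE_CHAR = "!"


def escape_like(text: str) -> str:
    """Escape LIKE wildcards so user input is matched literally."""
    # Escape the escape character first, so the escapes added for % and _
    # are not themselves escaped a second time.
    for ch in (ESCAPE_CHAR, "%", "_"):
        text = text.replace(ch, ESCAPE_CHAR + ch)
    return text


def build_like_clause(keywords: str, scopes: list[str]) -> tuple[str, dict[str, str]]:
    """Return an HQL WHERE fragment and its named parameter for a substring search."""
    # Keywords are uppercased and wrapped in % so they match anywhere in the value.
    pattern = "%" + escape_like(keywords.upper()) + "%"
    conditions = [
        f"upper({SCOPE_COLUMNS[scope]}) like :keywords escape '{ESCAPE_CHAR}'"
        for scope in scopes
    ]
    return " or ".join(conditions), {"keywords": pattern}
```

The leading `%` is what makes this expensive: a pattern that starts with a wildcard generally cannot use a regular B-tree index, so the database has to scan every candidate row, including full page contents for the CONTENT scope.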

The only use of this API that I’m aware of is the page picker. In order to fix the slowness of this page picker (XWIKI-22958), I see two options:

  1. Break this description and modify the search behavior. In particular, the search API would have the following new behavior:
    1. It uses Solr for searching, so no exact substring match is performed. Instead, the query is parsed with the standard Solr query parser. Additionally, a wildcard is appended to the last token of the query to support a partially typed word at the end of the query. This wildcard processing is disabled for the page content to improve performance.
    2. By default, results are sorted by match score, but sorting is still supported for some properties, in particular fullName, name, title, language, date, creationDate, author, creator, space, version, and hidden. As a special case, results for an empty query are sorted by date in descending order (to return the most recently modified pages).
    3. Searching in the name restores the previous behavior of matching in all spaces. However, the title has a higher score, so matches in the title are preferred over matches only in the space. Additionally, when searching in the name, the entered text is now also matched against the full document reference, specifically to support pasting a full document reference (this is an exact match only, not a substring match).
  2. Introduce a new API for the page picker that exposes the behavior described above, or use an existing Solr API if one is flexible enough to support this use case (I’m not sure there is one, as it is very convenient to be able to use the tokenizer to get the last token for the wildcard search).
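A rough Python sketch of the query construction described in option 1 follows. It assumes hypothetical Solr field names (`title`, `name`, `spaces`, `doccontent`, `fullname`), arbitrary boost values, and a plain whitespace split in place of the Solr tokenizer the proposal mentions; it only illustrates the wildcard-on-last-token rule, the exact full-reference match, and the sort defaults.

```python
import re
from typing import Optional

# Characters with special meaning in the standard Lucene/Solr query syntax.
_SPECIAL = re.compile(r'([+\-&|!(){}\[\]^"~*?:\\/])')

# Hypothetical field names and boosts per scope; the real XWiki schema may differ.
SCOPE_FIELDS = {
    "TITLE": [("title", 4.0)],
    "NAME": [("name", 2.0), ("spaces", 1.0)],
    "CONTENT": [("doccontent", 1.0)],
}
# Fields where the trailing wildcard is skipped for performance.
NO_WILDCARD_FIELDS = {"doccontent"}

# Sort properties that remain supported (list taken from the proposal).
SORTABLE = {"fullName", "name", "title", "language", "date", "creationDate",
            "author", "creator", "space", "version", "hidden"}


def escape_term(term: str) -> str:
    """Backslash-escape query syntax characters so user input is taken literally."""
    return _SPECIAL.sub(r"\\\1", term)


def build_query(text: str, scopes: list[str]) -> str:
    """Build a Solr query string for the given keywords and scopes."""
    # Simplification: whitespace split instead of the Solr tokenizer.
    tokens = [escape_term(t) for t in text.split()]
    if not tokens:
        return "*:*"  # empty query: match everything
    clauses = []
    for scope in scopes:
        for field, boost in SCOPE_FIELDS[scope]:
            terms = list(tokens)
            if field not in NO_WILDCARD_FIELDS:
                # Prefix match on the last, possibly partially typed, word.
                terms[-1] += "*"
            clauses.append(f"{field}:({' '.join(terms)})^{boost:g}")
        if scope == "NAME":
            # Exact (non-substring) match against a pasted full document reference.
            phrase = text.strip().replace("\\", "\\\\").replace('"', '\\"')
            clauses.append(f'fullname:"{phrase}"')
    return " OR ".join(clauses)


def sort_clause(text: str, sort_by: Optional[str] = None,
                ascending: bool = True) -> tuple[str, str]:
    """Return (field, direction) following the proposed sort defaults."""
    if sort_by is None:
        # Empty query: most recently modified first; otherwise best match first.
        return ("date", "desc") if not text.strip() else ("score", "desc")
    if sort_by not in SORTABLE:
        raise ValueError(f"Unsupported sort property: {sort_by!r}")
    return (sort_by, "asc" if ascending else "desc")
```

Splitting on whitespace is the main simplification here: the proposal relies on Solr's own tokenizer to find the last token, which handles punctuation and language-specific rules that `str.split()` does not.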

I’m in favor of breaking the existing REST API because it exposes slow database searches, and I think we should remove such REST APIs. Still, I think it is better to replace it with an implementation that mimics the old REST API than to remove it completely.

Therefore, I’m opening this vote to perform the breaking change of option 1. If the vote fails, I’ll proceed with option 2.

The vote is open for a bit more than a week, until June 2, 10:00.

I’m fine with option 1. IMO, it still covers the intent of that API (fuzzy search of pages) and will improve the performance of any code that currently uses it.

I’m -0 on removing the current REST API since it’s an API and we don’t know where it could be used. I’d have either created a new API, deprecated the current one, and moved it to legacy. Or, maybe better, I’d keep the current API but introduce the ability to configure how it operates (either with HQL or with Solr), configure it to use Solr by default, and explain the change in the release notes (RN), including how to bring back the old behavior (and of course explain that it’s a slow API, with figures/explanations).
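The configurable variant suggested above could look roughly like this minimal Python sketch, where a configuration value selects between the old HQL behavior and the new Solr behavior, defaulting to Solr. The configuration key `rest.search.engine` and the two stub functions are hypothetical placeholders, not existing XWiki code.

```python
from typing import Callable

# Hypothetical configuration key and default; not an existing XWiki property.
CONFIG_KEY = "rest.search.engine"
DEFAULT_ENGINE = "solr"

SearchFunction = Callable[[str, list[str]], str]


def hql_search(keywords: str, scopes: list[str]) -> str:
    # Placeholder for the legacy (slow) HQL LIKE implementation.
    return f"hql:{keywords}:{','.join(scopes)}"


def solr_search(keywords: str, scopes: list[str]) -> str:
    # Placeholder for the new Solr-based implementation.
    return f"solr:{keywords}:{','.join(scopes)}"


ENGINES: dict[str, SearchFunction] = {"hql": hql_search, "solr": solr_search}


def select_engine(config: dict[str, str]) -> SearchFunction:
    """Pick the search implementation from configuration, defaulting to Solr."""
    name = config.get(CONFIG_KEY, DEFAULT_ENGINE).strip().lower()
    try:
        return ENGINES[name]
    except KeyError:
        raise ValueError(
            f"Unknown search engine {name!r}; expected one of {sorted(ENGINES)}"
        ) from None
```

With this shape, the REST endpoint itself stays unchanged and only the configured engine decides which implementation runs.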

In any case, I’m not blocking the vote.

Thanks for working on this.