Solr image search architecture

Hello all,

I’d like to discuss some design choices related to migrating the suggest attachments widget from the database to Solr.

Some context first:

  • the search input is based on Solr (by calling SuggestSolrService)
  • the attachment and image dialogs in CKEditor are based on Solr (also calling SuggestSolrService, but with their own client code)
  • the suggest attachments widget is based on the /attachments REST endpoint (itself based on database queries) and we want to migrate it to Solr
  • a new CKEditor image dialog is in development and will also need to search for images (i.e., attachments), with some additional constraints. For instance, the images from the current document must be listed first.

Here are some solutions I have in mind.

Proposal 1: reusable client library

A JavaScript client library that calls SuggestSolrService is implemented and can be reused by:

  • the existing CKEditor attachment and image dialogs
  • the suggest attachments widget
  • the new CKEditor image dialog

Currently, suggestAttachments.js converts the REST endpoint response to objects that can be used by selectize. We’ll need to do the same with the response from SuggestSolrService. While this is technically possible, we’ll need to make sure that the conversion results are similar. This is not straightforward, as the Solr response does not contain the same information, or represents it differently.
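
To give an idea of the conversion involved, here is a minimal sketch. The field names on both sides (SolrAttachmentDoc, SuggestItem and their properties) are assumptions made for illustration; they are not the actual SuggestSolrService response format nor the exact objects built by suggestAttachments.js.

```typescript
// Hypothetical shape of one result returned by SuggestSolrService.
// These field names are assumptions, not the real Solr schema.
interface SolrAttachmentDoc {
  filename: string;
  documentReference: string; // e.g. "xwiki:Sandbox.WebHome"
  mimetype?: string;
  url?: string;
}

// Hypothetical item shape consumed by the selectize-based widget.
interface SuggestItem {
  value: string;
  label: string;
  hint: string;
  url: string | null;
  isImage: boolean;
}

// Converts one Solr result into a widget item.
function toSuggestItem(doc: SolrAttachmentDoc): SuggestItem {
  return {
    // Attachment reference serialized as "<document>@<file name>".
    value: `${doc.documentReference}@${doc.filename}`,
    label: doc.filename,
    hint: doc.documentReference,
    // Information missing from the Solr result is made explicit as null.
    url: doc.url ?? null,
    isImage: (doc.mimetype ?? "").indexOf("image/") === 0,
  };
}
```

Any field that the REST response provides but the Solr response does not would show up here as a gap to fill, which is where the similarity of the results needs to be checked.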

Proposal 2: Make the /attachments REST endpoints use Solr

  • Proposal 2.1: the REST endpoints are migrated to Solr and the interface does not change
  • Proposal 2.2: the REST endpoints are migrated to Solr, but a type parameter is introduced to choose which store to use.

Proposal 2.1 carries a higher risk of regressions but lets us migrate client code to Solr-based queries more quickly.
In this case too, a conversion is needed (from the Solr results to the response type of /attachments), but it can be slightly simpler than on the client side, since a few additional database queries to complete the Solr results are not too costly on the server.
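
From the client’s point of view, Proposal 2.2 could look like the sketch below. Only the idea of a type parameter comes from the proposal; its values ("database", "solr"), the name filter, and the base path used in the example are assumptions.

```typescript
// Hypothetical values for the "type" parameter of Proposal 2.2.
type AttachmentStore = "database" | "solr";

interface AttachmentQuery {
  name?: string; // assumed filter on the attachment file name
  type?: AttachmentStore; // which store answers the query
}

// Builds the URL of an /attachments call under a given REST base path.
function buildAttachmentsUrl(restBase: string, query: AttachmentQuery): string {
  const params: string[] = [];
  if (query.name !== undefined) {
    params.push("name=" + encodeURIComponent(query.name));
  }
  // Leaving "type" out keeps the current behaviour, so existing callers
  // are unaffected until they opt in.
  if (query.type !== undefined) {
    params.push("type=" + query.type);
  }
  const url = `${restBase}/attachments`;
  return params.length > 0 ? `${url}?${params.join("&")}` : url;
}

// Example: opt in to the Solr-backed store for one call.
const solrUrl = buildAttachmentsUrl("/rest/wikis/xwiki", { name: "my logo", type: "solr" });
```

With Proposal 2.1 the same call would simply never carry the type parameter, which is what makes regressions harder to isolate.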

Proposal 3: Introduce a new REST endpoint

While close to proposal 2, in this solution a new REST endpoint is introduced (e.g., /search/attachments).
This leads to yet another endpoint to maintain, but it would allow us to be search-oriented, unlike the existing /attachments endpoints, which are “list”-oriented.
I think this solution could leave us more room to provide interesting search features (which are not really relevant in the existing /attachments endpoints), especially for search-specific aspects such as:

  • prioritizing the attachments of the current document
  • taking into account OCR-based indexed content (e.g., lowering the priority of these results, and/or ignoring them)
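
As a rough illustration of these two aspects, here is a sketch of the ordering I have in mind. The SearchHit shape, the matchedOnOcrOnly flag and the ocrPenalty factor are all hypothetical, and a real implementation would more likely express this in the Solr query itself than by re-sorting results afterwards.

```typescript
// Hypothetical search hit; none of these fields come from an existing API.
interface SearchHit {
  filename: string;
  documentReference: string;
  score: number; // relevance score from the search backend
  matchedOnOcrOnly: boolean; // assumption: the backend can flag OCR-only matches
}

// Orders hits so that attachments of the current document come first,
// then by score, with OCR-only matches demoted by a penalty factor.
function rankHits(hits: SearchHit[], currentDocument: string, ocrPenalty = 0.5): SearchHit[] {
  const adjusted = (hit: SearchHit): number =>
    hit.matchedOnOcrOnly ? hit.score * ocrPenalty : hit.score;
  // Copy before sorting so the caller's array is left untouched.
  return hits.slice().sort((a, b) => {
    const aCurrent = a.documentReference === currentDocument ? 1 : 0;
    const bCurrent = b.documentReference === currentDocument ? 1 : 0;
    if (aCurrent !== bCurrent) {
      return bCurrent - aCurrent;
    }
    return adjusted(b) - adjusted(a);
  });
}
```

Setting ocrPenalty to 0 would push OCR-only matches to the end of their group without removing them; ignoring them entirely would be a filter applied before the sort.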

In this case we’ll also need to develop a reusable JavaScript client library, then gradually integrate it where it’s needed.
Note that while the implementation would be different, this endpoint could return the same entities as the current /attachments endpoint, making it easier to migrate from one to the other.

WDYT?