Add the ability to do case-insensitive Solr search for StringProperty and ListProperty

I think it would help to be able to do exact (100% match) searches on object string values in a case-insensitive way using Solr.

It would probably make sense for StringProperty and ListProperty object parameters, maybe not so much for LargeStringProperty.

I see several possible resolutions for this matter:

  1. Do nothing and keep things as they are now: we don’t address case-insensitive search on string properties
  2. Systematically have a case-insensitive Solr field for all such properties
  3. Selectively opt in to a case-insensitive Solr field for properties we know it will be useful for
    3.1. XWiki tags should probably be one of them
    3.2. How to opt in? Should this be in the definition of a property?

There’s concern about the Solr index size increase this could incur and this should be considered. On the other hand:

  • not having to reindex when we notice some field would benefit from a case-insensitive Solr field would be great
  • the need for a case-insensitive search may come from a different developer than the one in charge of the property, so not systematically having a case-insensitive Solr field risks a lack of functionality
  • some properties having the case-insensitive field and some not having it could cause confusion because things are less uniform than they could be

Implementation details:

  • stored="false" should help with the Solr index size.
  • we could use the new LowerCaseStrField
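
To make the two points above more concrete, here is a minimal sketch of what such a field could look like when expressed as Solr Schema API commands. It is only an illustration under assumptions: the type name `string_lowercase` and the dynamic field pattern `*_lowercase` are hypothetical, and instead of LowerCaseStrField it uses the long-standing combination of a keyword tokenizer with a lowercase filter, which keeps the whole value as a single lowercased term.

```python
import json


def build_schema_commands(type_name, field_pattern):
    """Build Schema API commands for an exact, case-insensitive field.

    Both names are hypothetical; nothing here is an existing XWiki field.
    """
    return {
        "add-field-type": {
            "name": type_name,
            "class": "solr.TextField",
            "analyzer": {
                # Keep the whole value as one token (exact match)...
                "tokenizer": {"class": "solr.KeywordTokenizerFactory"},
                # ...and lowercase it (case-insensitive match).
                "filters": [{"class": "solr.LowerCaseFilterFactory"}],
            },
        },
        "add-dynamic-field": {
            "name": field_pattern,
            "type": type_name,
            "indexed": True,
            # Not stored: searchable, but the value is not kept for retrieval.
            "stored": False,
            "multiValued": True,
        },
    }


commands = build_schema_commands("string_lowercase", "*_lowercase")
print(json.dumps(commands, indent=2))
```

The resulting JSON is the kind of body one would POST to a core's schema endpoint; whether XWiki would create the field this way or through its own schema initialization is left open here.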

Note: We already have propertyvalue_, which is type="text_general", allowing case-insensitive search, but not exact searches.
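
To illustrate the difference, here is a toy model in Python. It is not Solr's actual analysis chain (the real text_general type does more than this); it only assumes that a tokenized field splits a value into lowercased words, while an exact case-insensitive field keeps the whole value as one lowercased term. The function names are made up for this sketch.

```python
import re


def tokenized_match(value, query):
    """Rough stand-in for a tokenized, lowercased text field:
    the value matches if it contains every word of the query."""
    value_tokens = {t.lower() for t in re.findall(r"\w+", value)}
    query_tokens = [t.lower() for t in re.findall(r"\w+", query)]
    return all(t in value_tokens for t in query_tokens)


def exact_ci_match(value, query):
    """Stand-in for an exact, case-insensitive field:
    the whole value must equal the whole query, ignoring case."""
    return value.lower() == query.lower()


value = "Release Notes 2024"

# A partial query matches the tokenized field, so it is not an exact search.
print(tokenized_match(value, "release notes"))
# The exact field rejects the partial query...
print(exact_ci_match(value, "release notes"))
# ...but accepts the full value regardless of case.
print(exact_ci_match(value, "RELEASE NOTES 2024"))
```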

WDYT?

Note that an extension can already add to the index anything through a SolrEntityMetadataExtractor (and create the field in the index at init using the SolrJ API, if the field is custom too).

2. sounds OK (and definitely useful), but it would be interesting to try it and see how much bigger the index gets.

Not a huge fan of 3. (does not seem very practical). But it should not be too hard to automate the reindexing of all the documents containing only a specific object type for which the indexing has just been enabled/disabled (or the opposite).

It seems it would actually provide a straightforward solution to any extension needing this right now. Do we have some reference / documentation for this? Is there any good practice regarding naming, e.g., to avoid collisions?

Is it generally desirable that extensions add their own fields in the Solr index?

I was wondering if it would actually remove the need for this proposal, but probably not:

  • it’s still probably a bit more complicated and manual to do
  • we risk that extensions define several fields that do the same thing (if several extensions need case-insensitive search for tags, for example, we could end up with several case-insensitive Solr fields for tags), or even conflicting, incompatible custom field definitions if we are not careful to have some naming rule that avoids collisions

+1 for 2, i.e., to add a (separate) lowercase index for all non-large strings and all lists. I cannot imagine that this would significantly impact index size, at least as long as we don’t do this for large strings.

Regarding stored vs. non-stored, I’m not sure it’s worth avoiding storing the value in the document (and thus not having the value in the results). Note that we definitely need docValues=true for facets and sorting. See https://stackoverflow.com/questions/51925871/what-are-docvalues-in-solr-when-should-i-use-them for an explanation of docValues.

I thought about using Solr for tag clouds, and for that we definitely need case-insensitive indexing in Solr. So, completely independently of Confluence-related needs, I think we absolutely need this in XWiki.

It’s fine if they add a custom concept.

IMO, the need you describe here is generic enough to be in XWiki Standard, rather than having each extension that has this need add its own custom field.

It seems to be missing from https://extensions.xwiki.org/xwiki/bin/view/Extension/Solr%20Search%20API, indeed. I just added https://extensions.xwiki.org/xwiki/bin/view/Extension/Solr%20Search%20API#HExtensibility.