Unreliable Search queries

Hello everyone,

We have an issue with our Solr search engine that is becoming more and more of a real problem for our users.

Search queries often return no matches, even though the page contains the words themselves or parts of them.

The latest case was a search for smaller parts of words, such as “schräge ecke”, where the page to be found contained the text “Dachschrägen über Außenecke wird nicht mehr korrekt dargestellt”. (I don’t know if it matters, but the page to be found is part of an application. The same effect occurs on normal pages as well.)

Search queries for
(chschräge or chschräge) or (außene or außene)
or similar did return a match, though, so we don’t understand why the original query didn’t find the corresponding page.

The immediate problem here is that our users rely heavily on the search as a knowledge base. If they don’t find the page on the first try but get results for other pages, they usually don’t alter the query just to find a page they don’t even know exists, so we need to find a way to make the search engine more reliable.

Search engine is: SOLR
XWiki Version is: Enterprise 9.5.1

This issue has been troubling us for quite some time now, so any kind of help will be highly appreciated.

Best regards

Hello @shdwiki,

That behaviour is by design. Solr is “lexically aware”. Try using *schräge* for partial matches.

You might be interested in:

https://lucene.apache.org/solr/guide/6_6/the-extended-dismax-query-parser.html

Also, there were some improvements made to search in v10.1 I think, so check out the release notes.

Ben

Hi Ben,

Thank you for your answer. We already use an approach similar to that query parser in our wiki, but our problem is that even the vanilla search query didn’t return results for “schräge ecke”.

Unfortunately, upgrading the whole wiki to v10.1 is not an option, because this is a rather complicated process in my company that involves several other parties as well. The last major update took us several weeks to complete because of the bureaucracy waterfall before and afterwards.

Is there a way to just update the search engine, while leaving the other components as is?

Best Regards
Björn

Sounds like a question for @mflorea. However, I think I see where you might be having a problem. Are schräge and ecke fully formed words? Apologies, I don’t understand German. If they are, what language is set on the page where they appear?

An example in English would be much appreciated :slight_smile:

You could implement this through the Web UI: Added Solr sloppy phrase matching capability and config by benmegson · Pull Request #617 · xwiki/xwiki-platform · GitHub

Ah, I see. My bad. The system language is set to German.

An example in English would be a search for “quater” and “neighbour”, or “head” and “hood”, to find a page containing the sentence “find our headquater in your neighbourhood”.

In German, “schräge” and “ecke” are both valid words on their own.

OK, so those words won’t expand to form other words because they have a different lexical root, e.g. neighbour will find neighbouring, but not neighbourhood. Google “stemming” to find out more. This is by design, as it helps queries be more relevant.

If you wanted to query in the manner you are suggesting, the standard database query might work better.
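
To illustrate the point about stemming, here is a minimal Python sketch. It is not Solr's analyzer: `toy_stem`, `index_terms` and `matches` are hypothetical helpers, and the suffix list is a deliberately tiny assumption. It only shows why a query term matches words that reduce to the same stem, but not longer words that merely contain it.

```python
import re

# Toy suffix stripper for illustration only (an assumption, not Solr's rules).
# Real analyzers use language-specific stemmers with many more rules.
SUFFIXES = ("ing", "ed", "s")


def toy_stem(word):
    """Lowercase the word and strip one known suffix, if any."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def index_terms(text):
    """Return the set of stemmed tokens that would be stored for a text."""
    return {toy_stem(token) for token in re.findall(r"\w+", text)}


def matches(query, text):
    """True if every stemmed query term equals some stemmed term of the text."""
    terms = index_terms(text)
    return all(toy_stem(token) in terms for token in re.findall(r"\w+", query))


doc_a = "We met the neighbouring team"
doc_b = "find our headquater in your neighbourhood"

print(matches("neighbour", doc_a))      # True: "neighbouring" stems to "neighbour"
print(matches("neighbour", doc_b))      # False: "neighbourhood" is a different term
print(matches("neighbourhood", doc_b))  # True: whole-term match
```

The comparison is always between whole terms after stemming, never between substrings, which is why “neighbour” does not find “neighbourhood”.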

Struggling to understand why you would want this though. We (in my company) find that Solr works really well.

I might have misunderstood your problem though.

FYI… check the page languages, not just the system language. I had some problems with this in v9.5.1. When you search, tick German in the facet list and see what happens. If your pages disappear, you have a problem.

Hi Ben,

I just checked that and you are right! The page disappeared when I unselected German, so it was apparently classified as “without language specification” (ohne Sprachangabe).

Maybe this is the main reason. I will test this and report back but you already have my thanks for this hint.

Best Regards
Björn

There were some outdated properties from the import of our bug tracker interface that prevented the language from being detected as German, but even after I removed these properties, the search query didn’t behave any differently.

The main reason why we also want to find fragments of a word is that sometimes an entry in the knowledge base must be found by someone who has only a vague idea of what he is looking for, so he can’t make the search more specific. Also, if there are already matches for the initial search and you don’t find your entry, nobody would assume that the search was just phrased incorrectly. Instead, one would assume that there is no entry and create a new one, which potentially results in data chaos.

Have a read of this: lucene - How to configure Solr to do partial word matching - Stack Overflow

It should behave a little differently. You should be getting stopword removal and stemming. If you don’t, there’s still something wrong.
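
For background on partial word matching: one common technique is to index n-grams (all substrings within a length range) of every token, which Solr supports through its n-gram filter. The Python sketch below only illustrates the idea on the example sentence from this thread. The helper names (`ngrams`, `build_index`, `matches_all`) and the length limits of 3 to 8 are assumptions, not XWiki's actual configuration.

```python
import re


def ngrams(token, min_len=3, max_len=8):
    """Return all substrings of the lowercased token with min_len..max_len chars."""
    token = token.lower()
    grams = set()
    for n in range(min_len, max_len + 1):
        for start in range(len(token) - n + 1):
            grams.add(token[start:start + n])
    return grams


def build_index(text, min_len=3, max_len=8):
    """Collect the n-grams of every word in the text into one set."""
    index = set()
    for token in re.findall(r"\w+", text):
        index |= ngrams(token, min_len, max_len)
    return index


def matches_all(query, index):
    """True if every whitespace-separated query term is an indexed n-gram."""
    return all(term.lower() in index for term in query.split())


page = "Dachschrägen über Außenecke wird nicht mehr korrekt dargestellt"
index = build_index(page)

print(matches_all("schräge ecke", index))   # True: both are substrings of indexed words
print(matches_all("dach fenster", index))   # False: "fenster" does not occur
```

The trade-off is index size: every token contributes many extra terms, and query terms shorter than `min_len` or longer than `max_len` cannot match in this sketch.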

Hi Ben,

This looks promising. I’ll give this a try and will report back to you here.

Many thanks and best regards
Björn

Indeed, ATM we don’t index prefixes or suffixes so the only way to search for a prefix / suffix is by using wildcards, e.g. quater*

And in case you’re wondering, adding the wildcard automatically is not that easy because the query could be complex (using advanced syntax) and the wildcard can change the semantics of the query (what the user intended).
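
One conservative way to handle this in a query pre-processor is to add wildcards only when the query consists of plain words, and to pass anything that looks like advanced syntax through unchanged. The Python sketch below assumes exactly that. The function name `add_wildcards`, the list of "advanced" characters and the choice to join terms with `AND` are illustrative assumptions, not part of XWiki or Solr.

```python
import re

# Characters and operators that suggest the user wrote advanced query syntax.
# This list is an illustrative assumption, not a complete Solr grammar.
ADVANCED = re.compile(
    r'["*?~^:()\[\]{}+\-!\\/]|\b(?:AND|OR|NOT)\b',
    re.IGNORECASE,
)


def add_wildcards(query):
    """Wrap each plain term in *...*; leave advanced queries untouched."""
    if ADVANCED.search(query):
        return query  # do not second-guess what the user intended
    terms = query.split()
    if not terms:
        return query
    return " AND ".join("*{}*".format(term.lower()) for term in terms)


print(add_wildcards("schräge ecke"))        # *schräge* AND *ecke*
print(add_wildcards('"schräge ecke"'))      # unchanged: phrase query
print(add_wildcards("title:ecke OR dach"))  # unchanged: field and operator
```

Two caveats: leading wildcards can be slow on large indexes, and whether a wildcard term actually matches depends on how the field was analyzed at index time (for example, stemming or umlaut normalization may have changed the stored term).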

@mflorea and @ben.megson

Our main problem is that we didn’t even manage to compose a working search query in the vanilla search (without our query parser in front of the actual search), so we also couldn’t instruct our users how to search properly to find the mentioned term.

Can you give me a hint as to what the user should have searched for instead of “schräge ecke”, or rather “(schräge or schräge) and (ecke or ecke)”, which is what the query actually was after our query parser had processed it? Then we can modify our query parser accordingly.

I’m afraid I don’t follow… Did you try adding a different filter factory?

We discussed this in our team. Since we were not sure how far we can alter these settings and still be able to upgrade to newer versions without an increase in manual maintenance, we haven’t done this yet.

My team leader wants to customize as little as possible. Is there a way to achieve our search query with the vanilla search and a query parser in front of it?