I think the answer you gave would work.
I also tested lucene-analysis-smartcn 9.8.0 (org/apache/lucene/lucene-analysis-smartcn/9.8.0), and it works well as a Chinese tokenizer too.
@tmortagne Could XWiki officially integrate this in a future release?
Just put the smartcn jar in …/solr/search_9/lib/ and add some settings to managed-schema.xml, like:
<!-- smartcn tokenizer -->
<dynamicField name="*_zh" type="text_smartcn" indexed="true" stored="true" multiValued="true" />
<dynamicField name="*_zh_CN" type="text_smartcn" indexed="true" stored="true" multiValued="true" />
<dynamicField name="*_zh_TW" type="text_smartcn" indexed="true" stored="true" multiValued="true" />
<!-- smartcn tokenizer -->
<fieldType name="text_smartcn" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="org.apache.lucene.analysis.cn.smart.HMMChineseTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="org.apache.lucene.analysis.cn.smart.HMMChineseTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
More information for people stuck on the Chinese tokenizer:
XWiki 16.4.0 uses Solr 9.4.1 and Lucene 9.8.0, so the smartcn jar should match that Lucene version.
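To check that the `text_smartcn` field type is actually picked up, one option is Solr's field analysis handler (`/analysis/field`), which shows the tokens an analyzer produces for a given text. Below is a minimal Python sketch that builds such a request. It assumes your Solr is reachable over HTTP (for example a remote Solr setup, not the embedded one); the base URL and the core name `my_core` are placeholders I made up, so replace them with your own values.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_analysis_url(base_url, core, field_type, text):
    """Build a URL for Solr's field analysis handler.

    base_url and core are assumptions about your setup (hypothetical values
    are used in the example below); field_type is the type defined in
    managed-schema.xml.
    """
    params = urlencode({
        "analysis.fieldtype": field_type,   # analyze with this field type
        "analysis.fieldvalue": text,        # text to run through the index analyzer
        "wt": "json",
    })
    return f"{base_url.rstrip('/')}/{core}/analysis/field?{params}"


def fetch_analysis(url):
    """Send the request and return the parsed JSON response."""
    with urlopen(url) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Hypothetical host and core name; adjust to your installation.
    url = build_analysis_url(
        "http://localhost:8983/solr", "my_core", "text_smartcn", "我是中国人"
    )
    print(url)
    # Uncomment when a Solr instance is actually running at that address:
    # print(json.dumps(fetch_analysis(url), ensure_ascii=False, indent=2))
```

If the jar is loaded and the schema is correct, the response should list tokens produced by `HMMChineseTokenizerFactory` as words instead of one token per character. If the field type is unknown, Solr returns an error instead.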