Content from new documents cannot be searched (Solr)

Hi,

I’ve noticed that search in our XWiki instance has become really unreliable recently. To be honest, I’m not sure whether it ever worked as expected, because our documentation suite was small enough to navigate without using search.

What do I mean by unreliable? I have tried multiple different searches, and sometimes a term in the document content is found just fine, while other times it is not found at all. For example, I created a new document with the word “conversation” (typo on purpose) in it, and search never finds it when I look for it. Interestingly, it works just fine when I switch the Search Engine to Database.

I’m running XWiki 14.1 (but had the same issue with 13.10.1 and previous versions) in Docker (14.1.0-mysql-tomcat, to be precise).

Moreover, I have explored this forum looking for advice and here is what I have tried so far:

  • stopped XWiki and removed the folder with Solr data ( /usr/local/xwiki/data/store/solr - I found the path in the Solr-related entries in the XWiki logs) - folder is recreated but
  • used “Delete from Index”, “Add to Index” and “Reindex” on “Entire farm” in the “Search” configuration panel - I see that thousands of documents are indexed, but search is still unreliable (sometimes it works, other times it doesn’t)

Any other ideas about what I should try? I have read the Solr Search Application documentation, but I didn’t find anything there that I should try.

I have restarted XWiki and there is one error that seems to make a Solr operation fail:

2022-03-16 20:50:18,495 [solr/indexer job group daemon thread - org.xwiki.search.solr.internal.job.IndexerJob@422460f([solr, indexer])] INFO  o.x.s.s.i.j.IndexerJob         - 0 documents added, 0 deleted and 0 updated during the synchronization of the Solr index. 
2022-03-16 20:50:18,496 [solr/indexer job group daemon thread - org.xwiki.search.solr.internal.job.IndexerJob@422460f([solr, indexer])] INFO  o.x.s.s.i.j.IndexerJob         - Finished job of type [solr.indexer] with identifier [[solr, indexer]] 
2022-03-16 20:52:09,325 [XWiki Solr index thread] WARN  o.a.p.u.XMLHelper              - SAX Feature unsupported [log suppressed for 5 minutes]http://javax.xml.XMLConstants/property/accessExternalSchema 
java.lang.IllegalArgumentException: Property 'http://javax.xml.XMLConstants/property/accessExternalSchema' is not recognized.
	at org.apache.xerces.jaxp.DocumentBuilderFactoryImpl.setAttribute(Unknown Source)
	at org.apache.poi.util.XMLHelper.trySet(XMLHelper.java:284)
	at org.apache.poi.util.XMLHelper.getDocumentBuilderFactory(XMLHelper.java:114)
	at org.apache.poi.util.XMLHelper.<clinit>(XMLHelper.java:85)
	at org.apache.poi.ooxml.util.DocumentHelper.newDocumentBuilder(DocumentHelper.java:47)
	at org.apache.poi.ooxml.util.DocumentHelper.<clinit>(DocumentHelper.java:36)
	at org.apache.poi.openxml4j.opc.internal.ContentTypeManager.parseContentTypesFile(ContentTypeManager.java:393)
	at org.apache.poi.openxml4j.opc.internal.ContentTypeManager.<init>(ContentTypeManager.java:102)
	at org.apache.poi.openxml4j.opc.internal.ZipContentTypeManager.<init>(ZipContentTypeManager.java:53)
	at org.apache.poi.openxml4j.opc.ZipPackage.getPartsImpl(ZipPackage.java:282)
	at org.apache.poi.openxml4j.opc.OPCPackage.getParts(OPCPackage.java:743)
	at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:315)
	at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:122)
	at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:115)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:289)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:289)
	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:185)
	at org.apache.tika.Tika.parseToString(Tika.java:525)
	at org.apache.tika.Tika.parseToString(Tika.java:495)
	at org.xwiki.tika.internal.TikaUtils.parseToString(TikaUtils.java:153)
	at org.xwiki.search.solr.internal.metadata.AbstractSolrMetadataExtractor.getContentAsText(AbstractSolrMetadataExtractor.java:528)
	at org.xwiki.search.solr.internal.metadata.DocumentSolrMetadataExtractor.setAttachment(DocumentSolrMetadataExtractor.java:281)
	at org.xwiki.search.solr.internal.metadata.DocumentSolrMetadataExtractor.setAttachments(DocumentSolrMetadataExtractor.java:261)
	at org.xwiki.search.solr.internal.metadata.DocumentSolrMetadataExtractor.setExtras(DocumentSolrMetadataExtractor.java:187)
	at org.xwiki.search.solr.internal.metadata.DocumentSolrMetadataExtractor.setFieldsInternal(DocumentSolrMetadataExtractor.java:135)
	at org.xwiki.search.solr.internal.metadata.AbstractSolrMetadataExtractor.getSolrDocument(AbstractSolrMetadataExtractor.java:151)
	at org.xwiki.search.solr.internal.DefaultSolrIndexer.getSolrDocument(DefaultSolrIndexer.java:499)
	at org.xwiki.search.solr.internal.DefaultSolrIndexer.processBatch(DefaultSolrIndexer.java:408)
	at org.xwiki.search.solr.internal.DefaultSolrIndexer.run(DefaultSolrIndexer.java:376)
	at java.base/java.lang.Thread.run(Thread.java:829)

but I have no idea what it means.
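
Reading only what the log itself shows: the entry is logged at WARN level by org.apache.poi.util.XMLHelper.trySet, while Tika was parsing an OOXML (Office) attachment for the Solr indexer, and the exception comes from the Xerces DocumentBuilderFactoryImpl not recognizing the accessExternalSchema property. Below is a minimal Java sketch of how that kind of exception can arise when code tries to set an optional parser attribute. The class TrySetSketch and the helper trySet are hypothetical names made up for illustration; this is not POI's actual code, and it does not show whether the warning is related to the search problem.

```java
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;

public class TrySetSketch {

    // Hypothetical helper (not POI's implementation): try to set an optional
    // attribute on the factory and report whether the factory accepted it.
    public static boolean trySet(DocumentBuilderFactory factory, String name, Object value) {
        try {
            factory.setAttribute(name, value);
            return true;
        } catch (IllegalArgumentException e) {
            // Thrown when the underlying parser implementation does not
            // recognize the attribute, as in the stack trace above.
            System.err.println("WARN attribute not recognized: " + name);
            return false;
        }
    }

    public static void main(String[] args) {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Which implementation is picked depends on the classpath
        // (for example the JDK's built-in one, or a bundled Xerces).
        System.out.println("Factory implementation: " + factory.getClass().getName());

        boolean accepted = trySet(factory, XMLConstants.ACCESS_EXTERNAL_SCHEMA, "");
        System.out.println("accessExternalSchema accepted: " + accepted);
    }
}
```

Running this on the same classpath as the application would show which DocumentBuilderFactory implementation is in use and whether it accepts the property; the result will differ between environments.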

Hi, we were experiencing the same problems and finally decided to switch to database search, because Solr was obviously ignoring some content and no amount of reindexing helped. We use the wiki in Czech and English in parallel, but the trouble also seemed to affect pages in English, with no “special” characters.
