How to query for pages without a specific tag

I’m trying to run a query to find pages that have NOT been tagged with a specific tag in a space. I know that to find pages WITH a tag, I can use the query string:

"select doc.title, doc.fullName from Document doc, 
doc.object(XWiki.TagClass) tag  
where ('someTag' member of tag.tags) and 
(doc.space like :space or doc.space like ':space.%') 
order by doc.fullName asc"

However, the inverse doesn’t work:

"select doc.title, doc.fullName from Document doc, 
doc.object(XWiki.TagClass) tag  
where ('someTag' not member of tag.tags) and 
(doc.space like :space or doc.space like ':space.%') 
order by doc.fullName asc"

When I run this query, it will return the following:

  • Pages that have tags that are not someTag.
  • Pages that have been tagged previously, but no longer have any tags.

The query misses:

  • Pages that have never been tagged.

So the query seems to only find pages that have had a TagClass object initialized at some point in the page’s existence.

What is the proper way to query for pages WITHOUT a specific tag?

I think you’d need to search for documents having an XWiki.TagClass xobject with a tags xproperty not containing your tag value AND for documents not having an XWiki.TagClass xobject.

Thank you @vmassol. I understand the logic, but the part I can’t figure out is how to test whether a document has an XWiki.TagClass object. I’m familiar with SQL, but am new to XWQL/JPQL and HQL. I don’t understand how to work with the entities returned by doc.object(). None of the usual SQL constructs seem to work with it (testing for null, empty, etc.), so I’m clearly misunderstanding what doc.object() is doing. There is very little even mentioned about it either on the XWiki Query Module page or in Oracle’s JPQL documentation. Is there somewhere I can go to learn more about how this works? Could you provide an example on how to find pages without an XWiki.TagClass object?

Indeed, I don’t know how to express that using XWQL. I’ve tried the following but it didn’t work:

select distinct doc.fullName from Document doc where doc.fullName not in (select distinct doc2.fullName from Document doc2, doc2.object(XWiki.TagClass) as tags)

Thanks. I tried something similar yesterday, along with about 50 other ways too. This would be trivially easy if tags were stored in the database with a many-to-many relationship with the xwikidoc table.

Perhaps another approach might work. Are the tags stored in the database at all? There isn’t a table dedicated to tags, but there is an xwikiobjects table that associates the XWiki.TagClass class name in the XWO_CLASSNAME column with documents in the XWO_NAME column. Would this be a way of determining which documents have an XWiki.TagClass object?

I tried running the following SQL directly on the database and it gets me close:

SELECT d.* FROM xwikidoc d WHERE d.XWD_FULLNAME LIKE 'Development.%' 
AND d.XWD_FULLNAME NOT IN 
(SELECT o.XWO_NAME FROM xwikiobjects o WHERE o.XWO_NAME LIKE 'Development.%' 
AND o.XWO_CLASSNAME = 'XWiki.TagClass');

I could only sample a few pages out of the 1256 rows it returned, most of them were indeed untagged. However I found 1 page that was tagged and a bunch of pages in the result set that had been deleted or renamed.

I modified my query to the following and ran it directly on the database:

SELECT d.XWD_FULLNAME FROM xwikidoc d 
WHERE d.XWD_TITLE <> '' 
AND d.XWD_FULLNAME LIKE 'Development.%' 
AND d.XWD_FULLNAME NOT LIKE '%.WebPreferences' 
AND d.XWD_FULLNAME NOT IN (SELECT o.XWO_NAME FROM xwikiobjects o 
WHERE o.XWO_NAME LIKE 'Development.%' 
AND o.XWO_CLASSNAME = 'XWiki.TagClass');

This does find pages without an XWiki.TagClass object, though I have no way of knowing if it is finding all of them yet. I’ll have to analyze some other queries to see if the total number of documents in each add up to the total number of documents in the space. It also filters out all the pages that have been deleted (title = ‘’). Now the question is how can this be done from XWQL?

It might be not possible to do with XWQL (maybe it is, idk), but with HQL that should be doable for sure.

Note that if we cannot do it with XWQL, we’ll need to open a jira issue and improve our XWQL grammar to support it.

Thanks @vmassol. I’ll dig into HQL to see if I can figure it out. I had previously opened a jira issue (now closed) thinking there was a bug with the query in my first post: https://jira.xwiki.org/browse/XWIKI-21291. It seems wrong that you can test whether something IS a member of a set, but you can’t invert it to test whether something is NOT a member of set.

I figured it out in XWQL (with some HQL in the subquery). This query will find all pages that have never been tagged, which means they don’t have an XWiki.TagClass object instance associated with them:

select doc.title, doc.fullName from Document doc where 
doc.fullName not in (select distinct obj.name from BaseObject 
as obj where obj.className = 'XWiki.TagClass') 
and (doc.fullName not like '%.WebPreferences') 
and (doc.title <> '') 
and (doc.space like :space or doc.space like ':space.%') 
order by doc.fullName asc

It also filters out pages that have been deleted, but are still in the Document table. The results match those from the SQL query run directly on the database.

By combining this with my queries that find pages without tags, but that have an associated XWiki.TagClass instance, I will have the complete picture. I’ll post the completed query once I have it working for anyone else who runs into this problem.

I haven’t yet been able to combine the two queries into one since XWQL/JPQL/HQL don’t support SQL’s UNION and other non-UNION methods have failed due to the query’s complexity requiring nested subqueries. However, for my purposes, I only need to list the results on a page, so I ran the queries separately and combined their output in a Velocity script:

{{velocity}}
  #set ($spaceReferenceString = "Development")
  #set ($untagged_query = $services.query.xwql("SELECT doc.title, doc.fullName FROM Document doc, doc.object(XWiki.TagClass) AS obj WHERE (size(obj.tags)=0) AND (doc.space LIKE :space OR doc.space LIKE ':space.%') ORDER BY doc.fullName ASC"))
  #set ($untagged_query = $untagged_query.bindValue('space').literal("${spaceReferenceString}.").anyChars().query())
  #set ($untagged = $untagged_query.execute())
  #set ($never_tagged_query = $services.query.xwql("select doc.title, doc.fullName from Document doc where doc.fullName not in (select distinct obj.name from BaseObject as obj where obj.className = 'XWiki.TagClass') and (doc.fullName not like '%.WebPreferences') and (doc.title <> '') and (doc.space like :space or doc.space like ':space.%') order by doc.fullName asc"))
  #set ($never_tagged_query = $never_tagged_query.bindValue('space').literal("${spaceReferenceString}.").anyChars().query())
  #set ($never_tagged = $never_tagged_query.execute())
  #set ($total = $untagged.size() + $never_tagged.size())
|=Total
$total|=Page Title
Page Reference
    #foreach ($item in $untagged)
        |$velocityCount |$item[0]
        [[$item[1]>>doc:$item[1]]]
    #end
    #foreach ($item in $never_tagged)
        #set ($count = $velocityCount + $untagged.size())
        |$count |$item[0]
        [[$item[1]>>doc:$item[1]]]
    #end
{{/velocity}}

This will return all pages that have never been tagged (no associated XWiki.TagClass instance), or have had all their tags deleted (length of tags list is 0). The only real downside to this method is you can’t order the entire results table. Each set of queries will be in order, but the ordering will start over at the beginning of the second query’s results in the output table.

1 Like