Many attachments appear to be duplicated on the disk

In April, an article with many large mp4 attachments was created. Then in June, a parent article was renamed. Suddenly the attachment folder on disk grew by roughly the combined size of all the mp4 files of this article. BUT this article wasn’t actually moved to the new path; it stayed at the old one. (We could tell because none of this article’s parents existed anymore.)

I tried to investigate:

du -ab /data/xwiki/data/store/file/ | grep -v '\s/[^.]*$' | sort -rh

This was part of the output (I inserted a newline for all different file sizes):

396587201	/data/xwiki/data/store/file/wiki/3/3/abf25f8f1606fdea1da10d168f190d/attachments/9/c/8b87561d8a7da0feb31900d693215e/fv1.1.mp4
396587201	/data/xwiki/data/store/file/wiki/3/3/abf25f8f1606fdea1da10d168f190d/attachments/9/c/8b87561d8a7da0feb31900d693215e/f.mp4

388466341	/data/xwiki/data/store/file/wiki/3/8/bedce67295e2cce2b54a68116e0cd6/attachments/4/4/dbf1179719c5ad1cc2667404308fab/fv1.1.mp4
388466341	/data/xwiki/data/store/file/wiki/3/8/bedce67295e2cce2b54a68116e0cd6/attachments/4/4/dbf1179719c5ad1cc2667404308fab/f.mp4
388466341	/data/xwiki/data/store/file/wiki/3/3/abf25f8f1606fdea1da10d168f190d/attachments/4/4/dbf1179719c5ad1cc2667404308fab/fv1.1.mp4
388466341	/data/xwiki/data/store/file/wiki/3/3/abf25f8f1606fdea1da10d168f190d/attachments/4/4/dbf1179719c5ad1cc2667404308fab/f.mp4

384606891	/data/xwiki/data/store/file/wiki/3/3/abf25f8f1606fdea1da10d168f190d/attachments/6/6/c08a57bd529dc4a53abff283ebb192/fv1.1.mp4
384606891	/data/xwiki/data/store/file/wiki/3/3/abf25f8f1606fdea1da10d168f190d/attachments/6/6/c08a57bd529dc4a53abff283ebb192/f.mp4

365238479	/data/xwiki/data/store/file/wiki/3/8/bedce67295e2cce2b54a68116e0cd6/attachments/7/1/574022a02ffbdb6067f126cafca323/fv1.1.mp4
365238479	/data/xwiki/data/store/file/wiki/3/8/bedce67295e2cce2b54a68116e0cd6/attachments/7/1/574022a02ffbdb6067f126cafca323/f.mp4
365238479	/data/xwiki/data/store/file/wiki/3/3/abf25f8f1606fdea1da10d168f190d/attachments/7/1/574022a02ffbdb6067f126cafca323/fv1.1.mp4
365238479	/data/xwiki/data/store/file/wiki/3/3/abf25f8f1606fdea1da10d168f190d/attachments/7/1/574022a02ffbdb6067f126cafca323/f.mp4

363736390	/data/xwiki/data/store/file/wiki/3/8/bedce67295e2cce2b54a68116e0cd6/attachments/e/e/ee7c1c4372983672870b4803910480/fv1.1.mp4
363736390	/data/xwiki/data/store/file/wiki/3/8/bedce67295e2cce2b54a68116e0cd6/attachments/e/e/ee7c1c4372983672870b4803910480/f.mp4
363736390	/data/xwiki/data/store/file/wiki/3/3/abf25f8f1606fdea1da10d168f190d/attachments/e/e/ee7c1c4372983672870b4803910480/fv1.1.mp4
363736390	/data/xwiki/data/store/file/wiki/3/3/abf25f8f1606fdea1da10d168f190d/attachments/e/e/ee7c1c4372983672870b4803910480/f.mp4

354608928	/data/xwiki/data/store/file/wiki/3/8/bedce67295e2cce2b54a68116e0cd6/attachments/9/1/5c4de92f0cd681f2d0a87e014901bb/fv1.1.mp4
354608928	/data/xwiki/data/store/file/wiki/3/8/bedce67295e2cce2b54a68116e0cd6/attachments/9/1/5c4de92f0cd681f2d0a87e014901bb/f.mp4
354608928	/data/xwiki/data/store/file/wiki/3/3/abf25f8f1606fdea1da10d168f190d/attachments/9/1/5c4de92f0cd681f2d0a87e014901bb/fv1.1.mp4
354608928	/data/xwiki/data/store/file/wiki/3/3/abf25f8f1606fdea1da10d168f190d/attachments/9/1/5c4de92f0cd681f2d0a87e014901bb/f.mp4

326592448	/data/xwiki/data/store/file/wiki/3/3/abf25f8f1606fdea1da10d168f190d/attachments/6/6/ab0a160e262a8856c5853fadb16651/fv1.1.mp4
326592448	/data/xwiki/data/store/file/wiki/3/3/abf25f8f1606fdea1da10d168f190d/attachments/6/6/ab0a160e262a8856c5853fadb16651/f.mp4
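To make the pattern in this listing easier to see, the lines can be grouped by file size. The following is only a rough sketch: the function names `group_by_size` and `page_folders` are made up for illustration, and it assumes each line has the form "size, whitespace, path" as printed by `du -ab`.

```python
from collections import defaultdict


def group_by_size(du_lines):
    """Map each file size (in bytes) to the list of paths reported with that size."""
    groups = defaultdict(list)
    for line in du_lines:
        line = line.strip()
        if not line:
            continue  # skip the blank separator lines
        size_text, path = line.split(None, 1)
        groups[int(size_text)].append(path)
    return dict(groups)


def page_folders(paths, marker="/attachments/"):
    """Return the distinct page folders (the part before '/attachments/')."""
    return sorted({path.split(marker, 1)[0] for path in paths})
```

A size that maps to four paths spread over two page folders is a candidate for a copy left behind by the rename; a size with two paths in one folder is just the f.mp4 / fv1.1.mp4 pair.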

Two things are interesting. First, for every file there is an f.mp4 and an fv1.1.mp4 with the same size (looks like versioning). I checked whether the second one is only a symlink, but that doesn’t seem to be the case. So is it correct that every file is duplicated because of versioning?
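One way to check how such a pair is related on disk is to compare the link status and inode of both files. This is a minimal sketch using only the Python standard library; `describe_pair` is a hypothetical helper name, and the check assumes a POSIX filesystem where inode numbers are meaningful.

```python
import os


def describe_pair(current_path, version_path):
    """Report how two attachment files relate to each other on disk."""
    # lstat does not follow symlinks, so a symlink is inspected as itself
    cur = os.lstat(current_path)
    ver = os.lstat(version_path)
    return {
        "current_is_symlink": os.path.islink(current_path),
        "version_is_symlink": os.path.islink(version_path),
        # same device and inode means both names point at the same data
        "hard_linked": (cur.st_dev, cur.st_ino) == (ver.st_dev, ver.st_ino),
        "same_size": cur.st_size == ver.st_size,
    }
```

If neither file is a symlink and `hard_linked` is false, the two names are independent copies and both take up space.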

Second: a lot of file sizes appear four times, again as f.mp4 and fv1.1.mp4. So very often the files under /data/xwiki/data/store/file/wiki/3/3/abf25f8f1606fdea1da10d168f190d/ are duplicated in /data/xwiki/data/store/file/wiki/3/8/bedce67295e2cce2b54a68116e0cd6/. When I looked at the creation dates, they matched April for the first folder and the exact date and time of the rename for the second folder.

But I can find neither a duplicated article nor duplicated mp4 files in the wiki itself.
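Matching sizes alone do not prove that the files in the two folders are identical. A sketch like the following could confirm it by comparing content hashes of files that sit at the same relative path in both page folders. The helper names are invented for this example, and hashing several GB of video will take a while.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large videos are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def relative_files(root):
    """Map each file's path relative to root to its full path."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            result[os.path.relpath(full, root)] = full
    return result


def identical_copies(old_root, new_root):
    """List relative paths present in both trees with byte-identical content."""
    old_files = relative_files(old_root)
    new_files = relative_files(new_root)
    matches = []
    for rel in sorted(old_files.keys() & new_files.keys()):
        a, b = old_files[rel], new_files[rel]
        # compare sizes first; hash only when the sizes agree
        if os.path.getsize(a) == os.path.getsize(b) and sha256_of(a) == sha256_of(b):
            matches.append(rel)
    return matches
```

Here `old_root` and `new_root` would be the two page folders named above (.../3/3/abf25f8f1606fdea1da10d168f190d and .../3/8/bedce67295e2cce2b54a68116e0cd6).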

Any ideas? Regards, Simpel

PS: As far as I can remember, the storage space for attachments was also full after renaming the parent article. Maybe there wasn’t enough space for all the videos?

Hi,

we recently fixed an old issue related to attachments being saved twice because of versioning, see: Loading...

The issue is that there was always a full file for the latest version, not just a symlink, so the latest version was always duplicated. You can find more details in the issue.
The first point you mentioned is most likely this issue; for the second one I’m not sure.

Hope that helps.

OK, so the first point is clear (we are currently on 13.10.6).

Is there a way to find the related renamed (duplicated) article (which I can’t find in the wiki)? Maybe this article was never created because there was too little storage for all the attachments, but the attachments had already been duplicated in preparation for the rename?

Would be nice to know if it’s safe to delete them.

I don’t remember a bug like this (leftover attachment files after a rename). Is it something you can still reproduce? (I tried quickly in 15.10.9 and could not reproduce it.)

If you want to be sure the folder you are looking at corresponds to the reference of the page you have in mind, you can try the tool at https://www.xwiki.org/xwiki/bin/view/Documentation/AdminGuide/Store/Filesystem (you can get the exact document reference from the Information section below the page). If that page does not exist anymore, then you can safely remove those files.

It could have been a rename that ran into a “disk is full” error. Maybe it was an interrupted rename?

I used the tool you provided. Renaming started from:

  • Anwendungen.SAP.(D)einSAP.(D)einSAP - Controlling.Medienobjektcontrolling (MOC).Schulungsunterlagen (MOC Hilfe).Schulungsvideos.WebHome
    • /3/3/abf25f8f1606fdea1da10d168f190d/

to

  • Anwendungen.SAP.DeinSAP.(D)einSAP - Controlling.Medienobjektcontrolling (MOC).Schulungsunterlagen (MOC Hilfe).Schulungsvideos.WebHome
    • /3/8/bedce67295e2cce2b54a68116e0cd6/

You can see the removed parentheses. When I replace each “.” with “/” and paste this into the address bar (with the full URL, of course), I can see that this article doesn’t exist.
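The manual replacement of “.” with “/” can be sketched in a few lines. This is an illustration only: `reference_to_view_path` is a hypothetical helper, the `/xwiki/bin/view/` prefix is taken from the documentation URL quoted earlier in this thread and may differ per installation, and only the simple `\.` escape inside a space name is handled.

```python
import re
from urllib.parse import quote


def reference_to_view_path(reference, base="/xwiki/bin/view/"):
    """Turn a dotted document reference into a view URL path (simplified)."""
    # split on dots that are not escaped with a backslash
    parts = re.split(r"(?<!\\)\.", reference)
    parts = [part.replace("\\.", ".") for part in parts]
    is_nested_home = bool(parts) and parts[-1] == "WebHome"
    if is_nested_home:
        parts = parts[:-1]  # a nested page's WebHome is addressed by its space path
    path = base + "/".join(quote(part, safe="") for part in parts)
    return path + "/" if is_nested_home else path
```

Prefixing the result with the wiki’s host name gives a URL to try in the browser; percent-encoding the parentheses and spaces avoids copy/paste problems with names like “(D)einSAP - Controlling”.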

In other words: it should be safe to remove the whole folder /data/xwiki/data/store/file/wiki/3/8/bedce67295e2cce2b54a68116e0cd6/?

Simpel

PS: du -h /data/xwiki/data/store/file/wiki/3/8/bedce67295e2cce2b54a68116e0cd6 returns 8.5 GB (of about 100 GB in total), so it’s worth investigating.
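For a scripted version of that size check, something like the following could be used. It is a small sketch with a made-up function name; it sums the apparent size of regular files only, so the number can differ slightly from what du reports (du also counts directories and, without -b, disk blocks).

```python
import os


def tree_size_bytes(root):
    """Sum the apparent size in bytes of all files below root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # lstat so that symlinks are counted as links, not as their targets
            total += os.lstat(os.path.join(dirpath, name)).st_size
    return total
```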

Not sure I understand, didn’t you say this was the new location of the page after the rename?

It depends on what’s in it. If there is no existing page corresponding to it and you only have attachments, then sure. But if you have deleted attachments or documents, it would be better to remove them through the deleted attachment/document index UI.

It should be the new location. A parent was renamed. All articles were moved except this page.

OK, I thought your problem was attachments still being stored at the old location after the rename.

In theory, a failed rename is supposed to roll back this kind of thing too, but I would need to reproduce it to be sure what exactly led to this situation.

ls /data/xwiki/data/store/file/wiki/3/8/bedce67295e2cce2b54a68116e0cd6/ returns attachments only.
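The same check can be written as a short script, which is handy if several leftover folders need to be inspected. This is a sketch with hypothetical function names; it only looks at the top level of the page folder and treats anything other than a single `attachments` entry as a reason to look closer before deleting.

```python
import os


def top_level_entries(page_dir):
    """Return the sorted names directly inside a page folder."""
    return sorted(os.listdir(page_dir))


def only_attachments(page_dir):
    """True if the page folder contains nothing but an 'attachments' entry."""
    return top_level_entries(page_dir) == ["attachments"]
```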