mailbox spring cleaning with notmuch
After many years of switching between email systems, service providers, backup and synchronisation strategies, mail folder layouts, protocols and inadvertent replications, I ended up with a sizeable collection of local email folders containing, I was quite sure, lots of duplicates. My current email handling systems of choice, Gnus with nnml and notmuch, are pretty good at hiding them from me in normal operation, but I know those dups are there, lurking in the dark, and I periodically look for a way to uncover and kill them.
I know about the sophisticated mail-deduplicate utility, but I guess it's too sophisticated for me :) It turns out the solution was right in front of my eyes: notmuch knows about duplicates and can list them easily:
notmuch search --duplicate=2 --output=files '*'
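That command lists the full path of the second file of every message that notmuch has indexed under more than one file. notmuch identifies messages by their Message-ID, so several copies of the same email count as one message with several files, and messages with a single file produce no output at all. Getting rid of the extra copies is then as easy as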
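Since which copy counts as "the second" is notmuch's choice rather than yours, it may be worth a quick look before deleting anything. A minimal sketch, assuming a POSIX shell; `dup_list` and `dup_count` are hypothetical helper names, not notmuch commands:

```shell
# Hypothetical helpers (not part of notmuch) for a dry run.

# Print the path of the second file of every message with 2+ files.
dup_list() {
    notmuch search --duplicate=2 --output=files '*'
}

# Print how many files the deletion loop would remove (0 if none).
# tr strips the padding some wc implementations add.
dup_count() {
    dup_list | wc -l | tr -d ' '
}
```

Running `dup_count` tells you how big the cleanup will be, and `dup_list | less` lets you eyeball the actual paths first.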
notmuch search --duplicate=2 --output=files '*' | while IFS= read -r f; do rm -- "$f"; done
notmuch new
Note that we run notmuch new at the end so that notmuch's database forgets the deleted files. After doing that, searching again for second copies might still turn up files if some of your emails had more than two copies (mine did). Rinse and repeat.
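The rinse-and-repeat step can be automated with a loop that stops once notmuch reports no more second copies. This is a sketch, assuming a POSIX shell; `dedup_all` and its pass limit are hypothetical additions, the limit being there only so that a file that cannot be removed does not cause an endless loop:

```shell
# Hypothetical helper (not part of notmuch): delete second copies and
# re-index until no message has more than one file left.
# Optional argument: maximum number of passes (default 10).
dedup_all() {
    max_passes=${1:-10}
    pass=0
    while [ "$pass" -lt "$max_passes" ]; do
        dups=$(notmuch search --duplicate=2 --output=files '*')
        # Nothing left with a second file: we are done.
        [ -z "$dups" ] && return 0
        # One path per line; IFS= and -r keep spaces and backslashes intact.
        printf '%s\n' "$dups" | while IFS= read -r f; do
            rm -- "$f"
        done
        # Let notmuch notice the removed files before searching again.
        notmuch new
        pass=$((pass + 1))
    done
    echo "still finding duplicates after $max_passes passes" >&2
    return 1
}
```

Each pass removes one extra copy per message, so an email stored four times needs three passes before the search comes back empty.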