Optimise message deduplication

Multiple options (could be combined):

Store the last seen Message-ID, UID, and UIDVALIDITY to detect whether new mails have arrived (as outlined in RFC 3501). If the UIDVALIDITY has not changed, any mail with a UID lower than or equal to the last seen one can be disregarded as already processed. In combination with keeping the Message-IDs, that would remove the need to keep a log of all UIDs we have ever processed.
Store Message-IDs along with the recipients to which that mail has been delivered. Further anonymisation / pseudonymisation could be achieved by hashing the Message-ID and the recipient's address separately.
Come up with a better-performing storage scheme than the one we have right now. Both the UID file and the Message-ID file will grow indefinitely, increasing lookup times. A storage scheme like the following could (maybe, I am not an expert in this) perform better:

-<Message-ID Domain or Fallback (possibly hashed)>/
--<first char of Non-Domain part of the Message ID (possibly hashed)>/
---<Non-Domain part of the Message ID (possibly hashed)>

The <Non-Domain part of the Message ID (possibly hashed)> file would contain one of two things. Either all recipients (possibly hashed) for which this message has been processed, separated by newlines. Or, if a balance between file count and entries per file is desired, the (hashed) Message-ID followed by the (hashed) recipient, separated by e.g. a comma, with one pair per line. In the second case multiple lines for the same Message-ID may exist in one file, but each combination of Message-ID and recipient may exist only once.
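The first variant (one file per Message-ID, one hashed recipient per line) could be sketched roughly as below. This is only an illustration under assumptions: SHA-256 as the hash, "unknown" as the fallback directory for Message-IDs without a domain, and the function names `split_message_id`, `record_path` and `mark_delivered` are all made up for this example and are not part of the current code.

```python
import hashlib
from pathlib import Path


def _digest(value: str) -> str:
    """Hex SHA-256 of a string (the choice of hash is an assumption)."""
    return hashlib.sha256(value.strip().encode("utf-8")).hexdigest()


def split_message_id(message_id: str, fallback: str = "unknown") -> tuple[str, str]:
    """Split '<local@domain>' into (local, domain); use the fallback if there is no domain."""
    bare = message_id.strip().strip("<>")
    local, sep, domain = bare.rpartition("@")
    if not sep:
        # No '@' at all: treat the whole ID as the non-domain part.
        return bare, fallback
    return local, domain or fallback


def record_path(root: Path, message_id: str) -> Path:
    """<hashed domain>/<first char of hashed local part>/<hashed local part>"""
    local, domain = split_message_id(message_id)
    local_hash = _digest(local)
    return root / _digest(domain) / local_hash[0] / local_hash


def mark_delivered(root: Path, message_id: str, recipient: str) -> bool:
    """Record a delivery; return False if this recipient was already recorded."""
    path = record_path(root, message_id)
    entry = _digest(recipient)
    seen = set(path.read_text().split()) if path.exists() else set()
    if entry in seen:
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a") as handle:
        handle.write(entry + "\n")
    return True
```

A lookup then only touches one small file instead of scanning a single ever-growing list. Note that the sketch does no locking, so concurrent writers would need extra care.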

For the latter variant, an FS structure could look like this:

-<Message-ID Domain or Fallback (possibly hashed)>/
--<first two characters of Non-Domain part of the Message ID (possibly hashed)>
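A rough sketch of this bucketed layout, where each file holds comma-separated pairs of hashed Message-ID and hashed recipient, might look as follows. Again, SHA-256, the "unknown" fallback and the names `bucket_path` and `add_pair` are assumptions made for illustration only.

```python
import hashlib
from pathlib import Path


def sha256_hex(value: str) -> str:
    """Hex SHA-256 of a string (the choice of hash is an assumption)."""
    return hashlib.sha256(value.strip().encode("utf-8")).hexdigest()


def bucket_path(root: Path, bare_id: str, fallback: str = "unknown") -> Path:
    """<hashed domain>/<first two chars of hashed non-domain part>"""
    local, sep, domain = bare_id.rpartition("@")
    if not sep:
        # No '@' found: rpartition put the whole ID into `domain`.
        local, domain = domain, fallback
    return root / sha256_hex(domain or fallback) / sha256_hex(local)[:2]


def add_pair(root: Path, message_id: str, recipient: str) -> bool:
    """Append '<hashed id>,<hashed recipient>' unless that exact pair already exists."""
    bare_id = message_id.strip().strip("<>")
    path = bucket_path(root, bare_id)
    line = f"{sha256_hex(bare_id)},{sha256_hex(recipient)}"
    existing = path.read_text().splitlines() if path.exists() else []
    if line in existing:
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a") as handle:
        handle.write(line + "\n")
    return True
```

With two hex characters per bucket there are at most 256 files per domain directory, so the file count stays bounded while each file only holds a fraction of the entries.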

The Message-ID domain could of course also be tiered.
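Coming back to the first option (last seen UID plus UIDVALIDITY), the check itself is small. The sketch below assumes the stored state is just two integers per mailbox; `MailboxState` and `uids_to_process` are hypothetical names, and fetching the UIDs from the server is left out entirely.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MailboxState:
    uidvalidity: int    # UIDVALIDITY reported when we last processed the mailbox
    last_seen_uid: int  # highest UID we have already processed


def uids_to_process(
    state: Optional[MailboxState], uidvalidity: int, uids: list[int]
) -> list[int]:
    """Return the UIDs that still need processing, in ascending order."""
    if state is None or state.uidvalidity != uidvalidity:
        # No stored state, or the server invalidated its UIDs: old UIDs mean
        # nothing any more, so every message is a candidate and the
        # Message-ID based deduplication has to catch repeats.
        return sorted(uids)
    return sorted(uid for uid in uids if uid > state.last_seen_uid)
```

For example, with a stored state of UIDVALIDITY 7 and last seen UID 120, the UIDs 118, 120, 121 and 125 reduce to 121 and 125 as long as the server still reports UIDVALIDITY 7.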