How to identify whether an email belongs to an existing thread or conversation
We have an internal .NET case management application that automatically creates a new case from an email. I want to be able to identify other emails that are related to the original email so we can prevent duplicate cases from being created.
I have observed that many, but not all, emails have a Thread-Index header that looks useful.
Does anybody know of a straightforward algorithm or package that we could use?
Use the JWZ threading algorithm.
As far as I know, there's not going to be a 100% foolproof solution, as not all email clients or gateways preserve or respect all headers.
However, you'll get a pretty high hit rate with the following:
Every email message should have a unique "Message-ID" field. Find this, and keep a record of it as part of the case. (See RFC 822, since superseded by RFC 2822 and RFC 5322.)
If you receive two messages with the same Message-ID, discard the second one as it's a duplicate.
Check for the "In-Reply-To" field; if the ID shown matches a known Message-ID, then you know the email is related.
The "References" and "Original-Message-ID" headers have similar meanings. "References" can list several Message-IDs, so check each of them against your known IDs.
If your system ever generates emails, include a CaseID# in the subject line in a way that you can search for it if you get an email back (e.g. [Case#20081114-01]); most people don't edit subject lines when replying.
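The checks above can be sketched roughly as follows. This is only an illustration in Python using the standard library's email parser, not a drop-in for a .NET application. The CaseIndex class, its method names, and the in-memory dictionary are hypothetical; a real system would look up Message-IDs in its case database. The subject pattern assumes the [Case#20081114-01] format from the example above.

```python
import re
from email import message_from_string

# Message-IDs are written in angle brackets, e.g. <abc123@example.com>.
MSG_ID_RE = re.compile(r"<[^<>\s]+>")
# Assumed tag format, taken from the example [Case#20081114-01].
CASE_TAG_RE = re.compile(r"\[Case#(\d{8}-\d+)\]")


class CaseIndex:
    """Hypothetical in-memory map of known Message-IDs to case IDs."""

    def __init__(self):
        self.case_by_message_id = {}

    def record(self, message_id, case_id):
        self.case_by_message_id[message_id] = case_id

    def classify(self, raw_email):
        """Return ("duplicate" | "related" | "new", case_id or None)."""
        msg = message_from_string(raw_email)

        own_ids = MSG_ID_RE.findall(str(msg.get("Message-ID", "")))
        message_id = own_ids[0] if own_ids else None

        # Same Message-ID seen before: this is a duplicate delivery.
        if message_id in self.case_by_message_id:
            return ("duplicate", self.case_by_message_id[message_id])

        # Collect every ID this message says it relates to.
        related_ids = []
        for header in ("In-Reply-To", "References", "Original-Message-ID"):
            for value in msg.get_all(header, []):
                related_ids.extend(MSG_ID_RE.findall(str(value)))

        for ref in related_ids:
            if ref in self.case_by_message_id:
                case_id = self.case_by_message_id[ref]
                if message_id:
                    # Remember this message so later replies to it also match.
                    self.record(message_id, case_id)
                return ("related", case_id)

        # Fall back to a case tag in the subject line.
        tag = CASE_TAG_RE.search(str(msg.get("Subject", "")))
        if tag:
            if message_id:
                self.record(message_id, tag.group(1))
            return ("related", tag.group(1))

        return ("new", None)
```

When classify returns "new", the caller would create the case and then call record with the message's Message-ID and the new case ID. Note that the subject fallback trusts whatever tag it finds; in practice you would confirm that the case actually exists before attaching the email to it.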
Given that there will always be messages that are missed (for whatever reason), you'll also probably want related features in your case management system - say, "Close as Duplicate Case" or "Merge with Duplicate Case", along with tools to make it easier to find duplicates.