Email database design (schema)

We are currently developing a fairly large application that will have to work with a huge number of records.

The idea is that e-mails will be stored (with attachments), and users should be able to search their stored e-mails via a web API. Users should be able to search (within their own messages, which they have exported into the database/storage) on at least the following items:

  • from
  • to
  • subject
  • date (range)
  • attachments (names & types only)
  • message contents
  • (optional) mailbox / folder structure

The application should be able to handle large numbers of users and extreme numbers of e-mails (easily growing from millions to billions). Users should be able to download the whole original message (with attachments) so they can import it into their e-mail client.

I was thinking about indexing the e-mails in a database and storing each full e-mail, with attachments, as a package under a unique key in separate storage. This way I should keep the database load as low as possible, and therefore the search as quick as possible.

I have found several database schemas for handling e-mail like this, but I couldn't find any database that is able to handle hundreds of millions, and maybe even billions, of records (e-mails).
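
A minimal sketch of that split, assuming Python and its standard `email` package. The function names, the record field names, and the in-memory `blob_store` dict (a stand-in for the separate storage) are all hypothetical, and the sketch assumes the From, To, Subject and Date headers are present:

```python
import hashlib
from email import message_from_bytes, policy
from email.message import EmailMessage
from email.utils import parsedate_to_datetime


def split_message(raw: bytes, blob_store: dict) -> dict:
    """Store the raw message under a unique key and return the searchable record."""
    key = hashlib.sha256(raw).hexdigest()  # content hash used as the unique key
    blob_store[key] = raw                  # stand-in for the separate storage

    msg = message_from_bytes(raw, policy=policy.default)
    body = msg.get_body(preferencelist=("plain",))
    return {
        "storage_key": key,
        "from": str(msg["From"]),
        "to": str(msg["To"]),
        "subject": str(msg["Subject"]),
        "date": parsedate_to_datetime(str(msg["Date"])),
        # Only attachment names and types go into the index, not the payloads.
        "attachments": [
            {"name": part.get_filename(), "type": part.get_content_type()}
            for part in msg.iter_attachments()
        ],
        "contents": body.get_content() if body is not None else "",
    }


def build_sample() -> bytes:
    """Build a small test message with one attachment."""
    msg = EmailMessage()
    msg["From"] = "from@gmail.com"
    msg["To"] = "to@gmail.com"
    msg["Subject"] = "test subject"
    msg["Date"] = "Mon, 02 Jan 2012 10:00:00 +0000"
    msg.set_content("hello world")
    msg.add_attachment(b"%PDF-1.4", maintype="application",
                       subtype="pdf", filename="attachments2.pdf")
    return msg.as_bytes()


if __name__ == "__main__":
    store = {}
    record = split_message(build_sample(), store)
    print(record["subject"], record["attachments"])
```

The record is what would go into the search index; the untouched original stays retrievable by `storage_key`, so a download never has to touch the index.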

Is this the best way to keep it simple, efficient and fast, or am I forgetting anything?

// edit: The idea is to run this on the Amazon cloud. Are there any suggestions related to that?

Answers


You can use MongoDB for this amount of data. Details about MongoDB are available at http://www.mongodb.org/

In MongoDB, what MySQL calls a table is called a collection, and a row is called a document.

MongoDB stores data as JSON-like documents (BSON, a binary form of JSON).

One possible way to lay out the schema:

from: string
to: string
subject: string
date: datetime (searchable by range)
attachments (names & types only): array of objects
message contents: string
(optional) mailbox / folder structure: string

For example:

{
  from: "from@gmail.com",
  to: "to@gmail.com",
  subject: "test subject",
  date: "current date",
  attachments: [
    { names: "attachments1", types: "text" },
    { names: "attachments2", types: "pdf" }
  ]
}
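
To illustrate how the searches from the question could map onto documents shaped like this, here is a rough Python sketch that only builds a MongoDB-style filter document. The `build_filter` function and the `user_id` field are assumptions (the example above has no owner field); `$gte` and `$lt` are standard MongoDB comparison operators, and `attachments.types` uses dotted-path matching into the array of attachment objects:

```python
from datetime import datetime
from typing import Optional


def build_filter(user_id: str,
                 sender: Optional[str] = None,
                 date_from: Optional[datetime] = None,
                 date_to: Optional[datetime] = None,
                 attachment_type: Optional[str] = None) -> dict:
    """Build a filter restricted to one user's messages (user_id is hypothetical)."""
    query: dict = {"user_id": user_id}
    if sender is not None:
        query["from"] = sender

    # Half-open date range: date_from <= date < date_to
    date_range = {}
    if date_from is not None:
        date_range["$gte"] = date_from
    if date_to is not None:
        date_range["$lt"] = date_to
    if date_range:
        query["date"] = date_range

    if attachment_type is not None:
        # Matches when any element of the attachments array has this type.
        query["attachments.types"] = attachment_type
    return query


if __name__ == "__main__":
    print(build_filter("user-1", sender="from@gmail.com",
                       date_from=datetime(2012, 1, 1),
                       date_to=datetime(2012, 2, 1),
                       attachment_type="pdf"))
```

With a driver such as pymongo, a dict like this would typically be passed to `collection.find(...)`. The range query only works if `date` is stored as a real datetime value rather than a string.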

You don't want to store this sort of information in an RDBMS. Rather, you want to extend something like Lucene. For e-mail, Solr has an e-mail indexer. Hope that helps...
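
As a rough illustration of why a full-text engine suits the "message contents" search, here is a toy inverted index in Python. It is not how Lucene or Solr are implemented; the `InvertedIndex` class and its methods are hypothetical, and it shows only the basic idea of mapping each term to the set of messages that contain it:

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list:
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Toy term -> message-id index; real engines add ranking, stemming, etc."""

    def __init__(self) -> None:
        self._postings = defaultdict(set)

    def add(self, doc_id: str, text: str) -> None:
        for term in tokenize(text):
            self._postings[term].add(doc_id)

    def search(self, query: str) -> set:
        """Return ids of messages containing every term in the query."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self._postings.get(term, set())
        return result


if __name__ == "__main__":
    index = InvertedIndex()
    index.add("msg-1", "Test subject about invoices")
    index.add("msg-2", "Another subject")
    print(index.search("test subject"))
```

A lookup touches only the posting sets for the query terms instead of scanning every message body, which is the property that matters at the volumes described in the question.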

