Datastore vs Cloud SQL in Google App Engine
I want to build an application that will serve a lot of people (more than 2 million), so I think I should use Google Cloud Datastore. However, I also know that there is an option to use Google Cloud SQL and still serve a lot of people using MySQL (as Facebook and YouTube do). Am I right to choose Datastore over the relational Cloud SQL for this many users? Thank you in advance.
It is not strictly true that Facebook and YouTube use MySQL to serve the majority of their content to the majority of their users. They both mainly use very large NoSQL stores (Cassandra and Bigtable) for scalability, and probably use MySQL for smaller-scale work that demands more complex relational storage. Try to use Datastore if you can, because you can start for free and will also save money when handling large volumes of data.
To give an intelligent answer, I would need to know a lot more about your app. But... I'll outline the biggest gotchas I've found...
Google Datastore is effectively a distributed hierarchical data store. To get the scalability they wanted, Google had to make some compromises. As a developer you will find that these range from easy to work around, through difficult, to impossible. The last is far more likely than you would ever assume.
If you are accustomed to relational databases and the ability to manipulate data across multiple tables within the same transaction, you are likely to pull your hair out with Datastore. Probably the biggest gotcha is that a transaction can only span a limited number of entity groups (5 at the current time). To give a simple example, say you have a parent-child relationship and you need to update child records under more than 5 parents within a single transaction... can't be done (yes, really). If you reorganize your data structures and put all of the former child records under a single root entity so they can be updated in one transaction, you will run into another limitation... you can't reliably update the same entity group more than once per second (yes, really). And if you query an entity kind across parents without specifying the root entity of each, you will get what is euphemistically referred to as "eventual consistency"... which means the results may not be consistent at the time you read them (yes, really).
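To make the entity-group limit concrete, here is a minimal sketch in plain Python (no Datastore client calls) of what you end up doing when an update touches more than 5 parents: splitting the work into several transactions. The name `plan_transactions`, the tuple shape of an update, and the constant `MAX_GROUPS_PER_TXN` are all hypothetical, and the value 5 is simply the limit quoted above, which may have changed.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Limit quoted in the answer above; treat it as an assumption, not a current fact.
MAX_GROUPS_PER_TXN = 5

# Hypothetical shape of one update: (parent_key, child_key, new_values).
Update = Tuple[str, str, dict]


def plan_transactions(updates: Iterable[Update],
                      max_groups: int = MAX_GROUPS_PER_TXN) -> List[List[Update]]:
    """Group updates by parent (one parent = one entity group), then pack
    at most `max_groups` parents into each planned transaction."""
    by_parent: Dict[str, List[Update]] = defaultdict(list)
    for update in updates:
        by_parent[update[0]].append(update)

    batches: List[List[Update]] = []
    parents = list(by_parent)  # dicts preserve insertion order
    for start in range(0, len(parents), max_groups):
        batch: List[Update] = []
        for parent in parents[start:start + max_groups]:
            batch.extend(by_parent[parent])
        batches.append(batch)
    return batches


if __name__ == "__main__":
    # 12 parents with 2 children each end up in 3 separate transactions.
    updates = [(f"parent-{p}", f"child-{p}-{c}", {"status": "done"})
               for p in range(12) for c in range(2)]
    for i, batch in enumerate(plan_transactions(updates)):
        groups = {parent for parent, _, _ in batch}
        print(f"txn {i}: {len(groups)} entity groups, {len(batch)} updates")
```

Note what this does not give you: the batches are independent transactions, so if the second one fails after the first has committed, your data is left half-updated. That loss of atomicity is exactly the gotcha.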
The above is all in Google's documentation, but you are likely to gloss over it if you are just getting started (of course it can handle it!).
It depends on what you mean by 'a lot of people', what sort of data you have, and what you want to do with it.
Cloud SQL is designed for applications that need a SQL database: one that can handle any query you can write in SQL and ensures your data is always in a consistent state.
Cloud SQL can serve up to 3200 concurrent queries, depending on the tier. If the queries are simple and can be served from RAM, they should take just a few ms each, and assuming your users issue about 1 request per second, it could support tens of thousands of simultaneously active users. If, however, they are running more complex queries such as searches, or writing a lot of data, then that number will be lower.
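The estimate above can be reproduced as a back-of-envelope calculation. This is only a rough sketch: the function name `max_active_users` is hypothetical, and the figure of 100 queries actually executing in parallel is an assumption chosen for illustration (how many queries really run at once depends on the tier's CPU and I/O, and is typically far below the connection ceiling).

```python
def max_active_users(concurrent_queries: int,
                     query_latency_ms: float,
                     requests_per_user_per_s: float = 1.0) -> float:
    """Rough upper bound on simultaneously active users.

    Uses Little's law: throughput = concurrency / latency.
    All inputs are assumptions you should replace with measured values.
    """
    queries_per_second = concurrent_queries * 1000.0 / query_latency_ms
    return queries_per_second / requests_per_user_per_s


if __name__ == "__main__":
    # Simple queries served from RAM, about 5 ms each (assumed).
    print(max_active_users(concurrent_queries=100, query_latency_ms=5))
    # Heavier queries, about 50 ms each (assumed): capacity drops tenfold.
    print(max_active_users(concurrent_queries=100, query_latency_ms=50))
```

With these assumed inputs, the first case works out to 20,000 users and the second to 2,000, which shows how quickly capacity falls as individual queries get slower.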
If you have a simple set of queries, are less concerned about immediate consistency, or expect much more traffic, then you should look at Datastore.