When should I NOT use App Engine's Full Text Search API?

So far, I've used App Engine's Full Text Search to help search through existing entities in my datastore. This involves creating at least one Document per entity, and linking the two together somehow. And every time I change the entity, I must change the corresponding Documents.

My question is, why not just store all my data in Documents and forget about Datastore entities? The search API supports a much richer query language that can handle multiple inequality filters and boolean operators, unlike the datastore.

Am I missing something about the design of the search API that would preclude using it to replace the Datastore entirely?


According to the Java docs:

However, an index search can find no more than 10,000 matching documents. The App Engine Datastore may be more appropriate for applications that need to retrieve very large result sets.

Though I don't see that as a common use case.

More realistically, getting entities by key will be a lot cheaper with the Datastore (presumably faster as well). With the search API, you can either use Index.get() to find a document by ID, or duplicate the ID by storing it in a field and searching on that field.

Here's a cost breakdown:

- Index.get():     $0.10 /  10,000 or $0.00001 per get
- Index.search():  $0.13 /  10,000 or $0.000013 per search
- Datastore get(): $0.06 / 100,000 or $0.0000006 per get

As you can see, a Datastore get is much cheaper than either Search API option (roughly 17x cheaper than Index.get()).

If your data is structured in a way that makes use of a lot of direct gets and few complex searches, the Datastore will be a clear winner in terms of cost.

Note: I did not include the extra cost for storing duplicate data with the Index.search() method, since that depends on how many entities you store.
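To make the arithmetic above easy to check, here is a small sketch in plain Python. The prices are the ones quoted in this answer and may be out of date; the function names (`cost_ratio`, `workload_cost`) are made up for illustration and are not part of any App Engine API.

```python
# Prices quoted above, in US dollars per single operation (may be out of date).
PRICE_PER_OP = {
    "index_get": 0.10 / 10_000,
    "index_search": 0.13 / 10_000,
    "datastore_get": 0.06 / 100_000,
}


def cost_ratio(expensive, cheap):
    """Return how many times more `expensive` costs than `cheap` per operation."""
    return PRICE_PER_OP[expensive] / PRICE_PER_OP[cheap]


def workload_cost(direct_gets, complex_searches, gets_via="datastore_get"):
    """Cost of a workload mixing key lookups with searches.

    `gets_via` picks which operation serves the key lookups, so the same
    workload can be priced with and without the Datastore.
    """
    return (direct_gets * PRICE_PER_OP[gets_via]
            + complex_searches * PRICE_PER_OP["index_search"])


if __name__ == "__main__":
    print("Index.get() vs Datastore get():",
          round(cost_ratio("index_get", "datastore_get"), 1))
    print("1M gets + 1k searches, gets from Datastore:",
          round(workload_cost(1_000_000, 1_000), 3))
    print("1M gets + 1k searches, gets from Index.get():",
          round(workload_cost(1_000_000, 1_000, gets_via="index_get"), 3))
```

With a get-heavy workload like the one in the example, the key lookups dominate the bill, which is why the choice of where to serve them from matters more than the search price.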

Just put the data in both. Storage is cheap, and depending on how many writes your app does, updates could be cheap as well. For easy queries and for getting single entities by key, use memcache and the datastore. For complex queries, use the Search API. You'll have to make the tradeoff once pricing is announced.

Right now I index an entity in a search document every time I put it, and I also index a serialized version of the entity. It's actually much, much faster to search for documents through the Search API and extract the serialized field than to get the same number of entities from the datastore.
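A rough sketch of that pattern, for anyone who wants to see the shape of it. This does not use the real App Engine APIs: `InMemoryDatastore` and `InMemoryIndex` are hypothetical stand-ins so the example runs anywhere, and the `serialized` field name is just an assumption (it would clash with an entity property of the same name).

```python
import json


class InMemoryDatastore:
    """Hypothetical stand-in for the Datastore: a key -> entity map."""

    def __init__(self):
        self.entities = {}

    def put(self, key, entity):
        self.entities[key] = dict(entity)

    def get(self, key):
        return self.entities.get(key)


class InMemoryIndex:
    """Hypothetical stand-in for a search index: doc_id -> fields."""

    def __init__(self):
        self.documents = {}

    def put(self, doc_id, fields):
        # Re-using a doc_id replaces the previous document.
        self.documents[doc_id] = dict(fields)

    def search(self, field, value):
        # Exact-match lookup only; a real index supports far richer queries.
        return [d for d in self.documents.values() if d.get(field) == value]


def save(datastore, index, key, entity):
    """Write the entity, then mirror it into the index with a serialized copy."""
    datastore.put(key, entity)
    fields = dict(entity)
    fields["serialized"] = json.dumps(entity, sort_keys=True)
    index.put(key, fields)


def search_entities(index, field, value):
    """Rebuild entities straight from search hits, without a datastore read."""
    return [json.loads(doc["serialized"]) for doc in index.search(field, value)]
```

The point of routing every write through one `save` function is that the entity and its document cannot drift apart just because one code path forgot to update the index.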

Wouldn't you:

  1. lose any benefits of memcache

  2. face lower quotas. "we expect that our free quota will cover about 1,000 searches per day once the feature has graduated from experimental" I can't see the number of reads you get but I believe it's higher for datastore. I looked at https://developers.google.com/appengine/docs/quotas#Resources

    Also, for an entity, we are charged differently for an update than for a new put. It seems that documents in an index are not updated in place but rather added as new documents (that's what I'm doing anyway). Without the details of index pricing it's difficult to know exactly, but updating one or two indexed values on an entity may be cheaper than putting a whole new document into the index. It would depend on your data, I guess.

    Finally, the Total Index Size for indexes is now at 250 MB, while datastore data is capped at 1 GB. So the datastore allows more data, and there is no word yet on additional pricing for the index.

  3. need to come up with a backup plan. I don't know of any way right now to back up or restore the index if it gets corrupted. Having the data in entities means the search index can be recreated. You can already back up the datastore from the admin console.
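On the last point, recreating the index from entities is conceptually just a loop over the stored data. A minimal sketch, using plain dicts in place of the real Datastore and Search API (the function name and the `indexed_fields` parameter are assumptions for illustration):

```python
def rebuild_index(entities, indexed_fields):
    """Derive a fresh doc_id -> fields map from stored entities only.

    `entities` maps entity key -> dict of properties; only the properties
    named in `indexed_fields` are copied into each document.
    """
    index = {}
    for key, entity in entities.items():
        index[key] = {name: entity[name]
                      for name in indexed_fields if name in entity}
    return index


if __name__ == "__main__":
    stored = {
        "post-1": {"title": "Hello", "body": "First post", "views": 10},
        "post-2": {"title": "Again", "body": "Second post"},
    }
    print(rebuild_index(stored, ["title", "body"]))
```

The reverse direction is not available if the documents are the only copy of the data, which is the risk being described.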

In addition to the performance cost of querying large sets of data, the datastore also has the advantage of allowing strongly consistent reads. Take a look at this link for more information on strongly consistent vs. eventually consistent data.

It should be assumed that documents stored in the Search API indexes are eventually consistent.
