How to best design a date/geographic proximity query on GAE?
I'm building a directory for finding athletic tournaments on GAE with web2py and a Flex front end. The user selects a location, a radius, and a maximum date from a set of choices. I have a basic version of this query implemented, but it's inefficient and slow. One way I know I can improve it is by condensing the many individual queries I'm using to assemble the objects into bulk queries. I just learned that was possible. But I'm also thinking about a more extensive redesign that utilizes memcache.
The main problem is that I can't query the datastore by location because GAE won't allow multiple numerical comparison statements (<,<=,>=,>) in one query. I'm already using one for date, and I'd need TWO to check both latitude and longitude, so it's a no go. Currently, my algorithm looks like this:
1.) Query by date and select
2.) Use destination function from geopy's distance module to find the max and min latitude and longitudes for supplied distance
3.) Loop through results and remove all with lat/lng outside max/min
4.) Loop through again and use distance function to check exact distance, because step 2 will include some areas outside the radius. Remove results outside supplied distance (is this 2/3/4 combination inefficent?)
5.) Assemble many-to-many lists and attach to objects (this is where I need to switch to bulk operations)
6.) Return to client
Here's my plan for using memcache.. let me know if I'm way out in left field on this as I have no prior experience with memcache or server caching in general.
-Keep a list in the cache filled with "geo objects" that represent all my data. These have five properties: latitude, longitude, event_id, event_type (in anticipation of expanding beyond tournaments), and start_date. This list will be sorted by date.
-Also keep a dict of pointers in the cache which represent the start and end indices in the cache for all the date ranges my app uses (next week, 2 weeks, month, 3 months, 6 months, year, 2 years).
-Have a scheduled task that updates the pointers daily at 12am.
-Add new inserts to the cache as well as the datastore; update pointers.
Using this design, the algorithm would now look like:
1.) Use pointers to slice off appropriate chunk of list based on supplied date.
2-4.) Same as above algorithm, except with geo objects
5.) Use bulk operation to select full tournaments using remaining geo objects' event_ids
6.) Assemble many-to-manys
7.) Return to client
Thoughts on this approach? Many thanks for reading and any advice you can give.
You might be interested by geohash, which enables you to do an inequality query like this:
SELECT latitude, longitude, title FROM myMarkers WHERE geohash >= :sw_geohash AND geohash <= :ne_geohash
As a note on your proposed design, don't forget that entities in Memcache have no guarantee of staying in memory, and that you can not have them "sorted by date".
GeoModel is the best I found. You may look how my GAE app return geospatial queries. For instance India http query is with optional cc (country code) using geomodel library lat=20.2095231&lon=79.560344&cc=IN