Automatically Compile LINQ Queries

We've found that precompiling our LINQ queries is much, much faster than having them compile on every call, so we would like to start using compiled queries. The problem is that it makes the code harder to read, because the actual query syntax lives in some other file, away from where it is used.

It occurred to me that it might be possible to write a method (or extension method) that uses reflection to determine what queries are being passed in and cache the compiled versions automatically for use in the future.

var foo = (from f in db.Foo where f.ix == bar select f).Cached();

Cached() would have to reflect over the query object passed in and determine the table(s) selected from and the parameter types for the query. Obviously, reflection is a bit slow, so it might be better to use names as cache keys (but you'd still have to use reflection the first time to compile the query).

var foo = (from f in db.Foo where f.ix == bar select f).Cached("Foo.ix");

Does anyone have any experience with doing this, or know if it's even possible?

UPDATE: For those who have not seen it, you can compile LINQ to SQL queries with the following code:

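using System;
using System.Data.Linq;
using System.Linq;

public static class MyCompiledQueries
{
    // The base DataContext has no Foo property, so the table is fetched with
    // GetTable<Foo>(); with a generated, typed context you can write db.Foo.
    public static Func<DataContext, int, IQueryable<Foo>> getFoo =
        CompiledQuery.Compile(
            (DataContext db, int ixFoo) => (from f in db.GetTable<Foo>()
                                            where f.ix == ixFoo
                                            select f)
        );
}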

What I am trying to do is have a cache of these Func<> objects that I can call into after automatically compiling the query the first time around.

Answers


You can't have extension methods invoked on anonymous lambda expressions, so you'll want to use a Cache class. In order to properly cache a query you'll also need to 'lift' any parameters (including your DataContext) into parameters for your lambda expression. This results in very verbose usage like:

var results = QueryCache.Cache((MyModelDataContext db) => 
    from x in db.Foo where !x.IsDisabled select x);
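
To see why the lifting matters, here is a minimal sketch that uses plain expression trees rather than LINQ to SQL, so it runs without a database. The LiftingDemo class and its method names are made up for illustration. A lambda that captures a local produces the same ToString() text on every call, but the delegate compiled on the first call stays bound to the first captured value, so a cache keyed on that text would quietly reuse a stale argument. With the argument lifted into a lambda parameter, the cached delegate gets the current value on each call.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

// Hypothetical demo class: mimics the cache-by-expression-text idea with
// Expression.Compile() instead of CompiledQuery.Compile().
public static class LiftingDemo
{
    private static readonly Dictionary<string, Delegate> cache =
        new Dictionary<string, Delegate>();

    // 'bar' is captured by the lambda: the key is identical for every value
    // of 'bar', and the cached delegate stays bound to the first closure.
    public static bool NotEqualCaptured(int x, int bar)
    {
        Expression<Func<int, bool>> e = v => v != bar;
        string key = e.ToString();
        Delegate d;
        if (!cache.TryGetValue(key, out d))
        {
            cache[key] = d = e.Compile();
        }
        return ((Func<int, bool>)d)(x);
    }

    // 'bar' is lifted into a lambda parameter: the cached delegate can be
    // reused because the current value is passed in on each call.
    public static bool NotEqualLifted(int x, int bar)
    {
        Expression<Func<int, int, bool>> e = (v, b) => v != b;
        string key = e.ToString();
        Delegate d;
        if (!cache.TryGetValue(key, out d))
        {
            cache[key] = d = e.Compile();
        }
        return ((Func<int, int, bool>)d)(x, bar);
    }
}
```

Calling NotEqualCaptured(1, 1) and then NotEqualCaptured(1, 2) returns false both times, because the second call reuses the delegate bound to bar == 1, while NotEqualLifted(1, 2) returns true as expected.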

In order to clean that up, we can instantiate a QueryCache on a per-context basis if we make it non-static:

public class FooRepository
{
    readonly QueryCache<MyModelDataContext> q = 
        new QueryCache<MyModelDataContext>(new MyModelDataContext());
}

Then we can write a Cache method that will enable us to write the following:

var results = q.Cache(db => from x in db.Foo where !x.IsDisabled select x);

Any arguments in your query will also need to be lifted:

var results = q.Cache((db, bar) => 
    from x in db.Foo where x.id != bar select x, localBarValue);

Here's the QueryCache implementation I mocked up:

using System;
using System.Collections.Generic;
using System.Data.Linq;
using System.Linq;
using System.Linq.Expressions;

public class QueryCache<TContext> where TContext : DataContext
{
    private readonly TContext db;
    public QueryCache(TContext db)
    {
        this.db = db;
    }

    // Compiled queries, keyed by the text of the query expression.
    private static readonly Dictionary<string, Delegate> cache = new Dictionary<string, Delegate>();

    public IQueryable<T> Cache<T>(Expression<Func<TContext, IQueryable<T>>> q)
    {
        string key = q.ToString();
        Delegate result;
        lock (cache)
        {
            // Compile only the first time this expression text is seen.
            if (!cache.TryGetValue(key, out result))
            {
                result = cache[key] = CompiledQuery.Compile(q);
            }
        }
        return ((Func<TContext, IQueryable<T>>)result)(db);
    }

    public IQueryable<T> Cache<T, TArg1>(Expression<Func<TContext, TArg1, IQueryable<T>>> q, TArg1 param1)
    {
        string key = q.ToString();
        Delegate result;
        lock (cache)
        {
            if (!cache.TryGetValue(key, out result))
            {
                result = cache[key] = CompiledQuery.Compile(q);
            }
        }
        return ((Func<TContext, TArg1, IQueryable<T>>)result)(db, param1);
    }

    public IQueryable<T> Cache<T, TArg1, TArg2>(Expression<Func<TContext, TArg1, TArg2, IQueryable<T>>> q, TArg1 param1, TArg2 param2)
    {
        string key = q.ToString();
        Delegate result;
        lock (cache)
        {
            if (!cache.TryGetValue(key, out result))
            {
                result = cache[key] = CompiledQuery.Compile(q);
            }
        }
        return ((Func<TContext, TArg1, TArg2, IQueryable<T>>)result)(db, param1, param2);
    }
}

This can be extended to support more arguments. The great bit is that by passing the parameter values into the Cache method itself, you get implicit typing for the lambda expression.

EDIT: Note that you cannot apply new operators to the compiled queries. Specifically, you cannot do something like this:

var allresults = q.Cache(db => from f in db.Foo select f);
var page = allresults.Skip(currentPage * pageSize).Take(pageSize);

So if you plan on paging a query, you need to do it inside the compiled expression instead of doing it later. This is necessary not only to avoid an exception, but also to keep the whole point of Skip/Take (not returning all rows from the database). This pattern would work:

public IQueryable<Foo> GetFooPaged(int currentPage, int pageSize)
{
    return q.Cache((db, cur, size) => (from f in db.Foo select f)
        .Skip(cur*size).Take(size), currentPage, pageSize);
}

Another approach to paging would be to return a Func:

public Func<int, int, IQueryable<Foo>> GetPageableFoo()
{
    // c and s are the lifted lambda parameters; cur and size are the values passed in.
    return (cur, size) => q.Cache((db, c, s) => (from f in db.Foo select f)
        .Skip(c*s).Take(s), cur, size);
}

This pattern is used like:

var results = GetPageableFoo()(currentPage, pageSize);
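
The Skip(cur*size).Take(size) arithmetic in both paging patterns assumes a zero-based page index. As a quick sanity check, here is a small LINQ to Objects sketch of the same arithmetic; PagingDemo and GetPage are hypothetical names, and nothing here touches a DataContext.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PagingDemo
{
    // Hypothetical helper: returns the zero-based page 'currentPage',
    // holding at most 'pageSize' items.
    public static List<int> GetPage(IEnumerable<int> source, int currentPage, int pageSize)
    {
        return source.Skip(currentPage * pageSize).Take(pageSize).ToList();
    }
}
```

With the numbers 1 through 10 and a page size of 3, page 1 is 4, 5, 6 and page 3 holds only 10. If your UI counts pages from 1, subtract one before calling.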

Since nobody else has attempted this, I'll give it a shot. Maybe we can work this out together. Here is my attempt.

I set this up using a dictionary. I am also not using DataContext, although I believe adding it would be trivial.

using System;
using System.Collections.Generic;
using System.Linq.Expressions;

public static class CompiledExtensions
{
    // Compiled delegates keyed by the caller-supplied name.
    // Note: a plain Dictionary is not safe for concurrent writers.
    private static readonly Dictionary<string, object> _dictionary = new Dictionary<string, object>();

    public static IEnumerable<TResult> Cache<TArg, TResult>(this IEnumerable<TArg> list, string name, Expression<Func<IEnumerable<TArg>, IEnumerable<TResult>>> expression)
    {
        Func<IEnumerable<TArg>, IEnumerable<TResult>> _pointer;
        object cached;

        if (_dictionary.TryGetValue(name, out cached))
        {
            // Reuse the delegate compiled on an earlier call with this name.
            _pointer = (Func<IEnumerable<TArg>, IEnumerable<TResult>>)cached;
        }
        else
        {
            // First call with this name: compile the expression and remember it.
            _pointer = expression.Compile();
            _dictionary.Add(name, _pointer);
        }

        return _pointer(list);
    }
}

Now this allows me to do this:

  List<string> list = typeof(string).GetMethods().Select(x => x.Name).ToList();

  IEnumerable<string> results = list.Cache("To",x => x.Where( y => y.Contains("To")));
  IEnumerable<string> cachedResult = list.Cache("To", x => x.Where(y => y.Contains("To")));
  IEnumerable<string> anotherCachedResult = list.Cache("To", x => from item in x where item.Contains("To") select item);
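
One thing to keep in mind is that the cache is keyed by name only, so the third call above runs whatever delegate was stored under "To" first, regardless of the expression passed in. If the cache may be hit from several threads, a possible variation is to swap the dictionary for a ConcurrentDictionary. The sketch below assumes .NET 4 or later; ConcurrentCompiledExtensions and CacheConcurrent are names I made up for it.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq.Expressions;

public static class ConcurrentCompiledExtensions
{
    private static readonly ConcurrentDictionary<string, Delegate> compiled =
        new ConcurrentDictionary<string, Delegate>();

    public static IEnumerable<TResult> CacheConcurrent<TArg, TResult>(
        this IEnumerable<TArg> list,
        string name,
        Expression<Func<IEnumerable<TArg>, IEnumerable<TResult>>> expression)
    {
        // The factory compiles the expression; it is only invoked when no
        // delegate is stored under 'name' yet. Under contention GetOrAdd may
        // run it more than once, but only one delegate is kept.
        Func<string, Delegate> factory = unusedKey => expression.Compile();
        Delegate d = compiled.GetOrAdd(name, factory);

        // The cast throws if 'name' was first used with different types.
        return ((Func<IEnumerable<TArg>, IEnumerable<TResult>>)d)(list);
    }
}
```

Usage is the same as before, for example list.CacheConcurrent("To", x => x.Where(y => y.Contains("To"))).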

Looking forward to some discussion about this, to develop the idea further.


For posterity: .NET Framework 4.5 will do this by default (according to a slide in a presentation I just watched).

