LINQ to SQL Tricks: Building Efficient Queries that Include Reference Data or Child Entities


Problem Summary

Some times you need to read entities from a table along with some properties that are reference data or child entities. This relationship is usually represented by foreign keys in your database. LINQ to SQL makes this job extremely easy but, by default, also very inefficient. There are multiple ways this inefficiency can be addressed. Some of these alternatives are standard facilities provided by LINQ to SQL and some are custom coded patterns and tricks. This article walks your through these solutions, explaining their pros and cons and helping you decide which one is right for you.

Default Behavior

Consider the following simple subset from the AdventureWorksLT database:

AdventureWorksCustomerAddress

Lets say your job is to select and display all the customer addresses in the database (we will limit results to first 3 to simplify output), with the following fields: First Name, Last Name, City, State, and Modified Date. You have your Data Access Layer (DAL) to get the data and your driver code to print the results. Something like this:

using(AdventureWorksDataContext ctx = new AdventureWorksDataContext())
{
IEnumerable<CustomerAddress> q = BuildQuery(ctx);

foreach (var ca in q)
{
Console.Out.WriteLine(“{0}, {1} lives in {2}, {3} as of {4}”,
ca.Customer.LastName,
ca.Customer.FirstName,
ca.Address.City,
ca.Address.StateProvince,
ca.ModifiedDate);
}
}

private static IEnumerable<CustomerAddress> BuildQuery(
    AdventureWorksDataContext ctx)
{
    var q = (from ca in ctx.CustomerAddresses
             select ca).Take(3);
    return q;
}

Very simple but also very inefficient. If you turn on the logger (ctx.Log = Console.Out;) you can see actual SQL statements generated and issued during program execution:

SELECT TOP (3) [t0].[CustomerID], [t0].[AddressID], [t0].[AddressType], [t0].[rowguid], [t0].[ModifiedDate]
FROM [SalesLT].[CustomerAddress] AS [t0]
-- Context: SqlProvider(Sql2008) Model: AttributedMetaModel Build: 4.0.21006.1

SELECT [t0].[CustomerID], [t0].[NameStyle], [t0].[Title], [t0].[FirstName], [t0].[MiddleName], [t0].[LastName], [t0].[Suffix], [t0].[CompanyName], [t0].[SalesPerson], [t0].[EmailAddress], [t0].[Phone], [t0].[PasswordHash], [t0].[PasswordSalt], [t0].[rowguid], [t0].[ModifiedDate]
FROM [SalesLT].[Customer] AS [t0]
WHERE [t0].[CustomerID] = @p0
-- @p0: Input Int (Size = -1; Prec = 0; Scale = 0) [29485]
-- Context: SqlProvider(Sql2008) Model: AttributedMetaModel Build: 4.0.21006.1

SELECT [t0].[AddressID], [t0].[AddressLine1], [t0].[AddressLine2], [t0].[City], [t0].[StateProvince], [t0].[CountryRegion], [t0].[PostalCode], [t0].[rowguid], [t0].[ModifiedDate]
FROM [SalesLT].[Address] AS [t0]
WHERE [t0].[AddressID] = @p0
-- @p0: Input Int (Size = -1; Prec = 0; Scale = 0) [1086]
-- Context: SqlProvider(Sql2008) Model: AttributedMetaModel Build: 4.0.21006.1

SELECT [t0].[CustomerID], [t0].[NameStyle], [t0].[Title], [t0].[FirstName], [t0].[MiddleName], [t0].[LastName], [t0].[Suffix], [t0].[CompanyName], [t0].[SalesPerson], [t0].[EmailAddress], [t0].[Phone], [t0].[PasswordHash], [t0].[PasswordSalt], [t0].[rowguid], [t0].[ModifiedDate]
FROM [SalesLT].[Customer] AS [t0]
WHERE [t0].[CustomerID] = @p0
-- @p0: Input Int (Size = -1; Prec = 0; Scale = 0) [29486]
-- Context: SqlProvider(Sql2008) Model: AttributedMetaModel Build: 4.0.21006.1

SELECT [t0].[AddressID], [t0].[AddressLine1], [t0].[AddressLine2], [t0].[City], [t0].[StateProvince], [t0].[CountryRegion], [t0].[PostalCode], [t0].[rowguid], [t0].[ModifiedDate]
FROM [SalesLT].[Address] AS [t0]
WHERE [t0].[AddressID] = @p0
-- @p0: Input Int (Size = -1; Prec = 0; Scale = 0) [621]
-- Context: SqlProvider(Sql2008) Model: AttributedMetaModel Build: 4.0.21006.1

SELECT [t0].[CustomerID], [t0].[NameStyle], [t0].[Title], [t0].[FirstName], [t0].[MiddleName], [t0].[LastName], [t0].[Suffix], [t0].[CompanyName], [t0].[SalesPerson], [t0].[EmailAddress], [t0].[Phone], [t0].[PasswordHash], [t0].[PasswordSalt], [t0].[rowguid], [t0].[ModifiedDate]
FROM [SalesLT].[Customer] AS [t0]
WHERE [t0].[CustomerID] = @p0
-- @p0: Input Int (Size = -1; Prec = 0; Scale = 0) [29489]
-- Context: SqlProvider(Sql2008) Model: AttributedMetaModel Build: 4.0.21006.1

SELECT [t0].[AddressID], [t0].[AddressLine1], [t0].[AddressLine2], [t0].[City], [t0].[StateProvince], [t0].[CountryRegion], [t0].[PostalCode], [t0].[rowguid], [t0].[ModifiedDate]
FROM [SalesLT].[Address] AS [t0]
WHERE [t0].[AddressID] = @p0
-- @p0: Input Int (Size = -1; Prec = 0; Scale = 0) [1069]
-- Context: SqlProvider(Sql2008) Model: AttributedMetaModel Build: 4.0.21006.1

Now you can see why it is inefficient. The first query reads top 3 rows from CustomerAddress table as we would expect. What is not expected is that it doesn’t try to make joins to Customer and Address tables, as you would, if you write SQL query yourself. This is because LINQ to SQL does not know about your intention to access child entities upfront. Instead, it issues two separate queries on-demand for EACH row in CustomerAddress table, one to fetch the Customer data, and one to fetch the Address. For a 3 row recordset we got 7(!) queries total. For a recordset of 1000 rows that would be 2001(!) queries. We clearly need to address this.

There are many ways to solve this and we are going to examine the following three solutions:

  1. Using the LoadWith() option provided by LINQ to SQL
  2. Hinting LINQ to SQL to use joins by explicitly including child data in the compound object returned from the LINQ query
  3. Hinting LINQ to SQL to use joins by using intermediate anonymous compound objects while still returning “clean” LINQ to SQL entity (CustomerAddress class in our case)

The last solution is by far my favorite when it comes to DAL API, so, please, bear with me.

LoadWith() option

Using the LoadWith() data option we can hint LINQ to SQL that every time entity X is being requested its child entity Y should be requested as well. Since now LINQ to SQL “knows” upfront we are interested in child entity (entities) it can optimize the query using joins to retrieve all the data in one shot. In fact, this is how our modified code may look like:

private static IEnumerable<CustomerAddress> BuildQuery(
    AdventureWorksDataContext ctx)
{
    // NEW CODE - LoadWith options
    var dataOptions = new System.Data.Linq.DataLoadOptions();
    dataOptions.LoadWith<CustomerAddress>(ca => ca.Address);
    dataOptions.LoadWith<CustomerAddress>(ca => ca.Customer);
    ctx.LoadOptions = dataOptions;
    // END NEW CODE - LoadWith options

    var q = (from ca in ctx.CustomerAddresses
             select ca).Take(3);
    return q;
}

Now, if we run our driver code with logging we see the following SINGLE(!) SQL query is generated:

SELECT TOP (3) [t0].[CustomerID], [t0].[AddressID], [t0].[AddressType], [t0].[rowguid], [t0].[ModifiedDate], [t1].[AddressID] AS [AddressID2], [t1].[AddressLine1], [t1].[AddressLine2],
[t1].[City], [t1].[StateProvince], [t1].[CountryRegion], [t1].[PostalCode], [t1].[rowguid] AS [rowguid2], [t1].[ModifiedDate] AS [ModifiedDate2], [t2].[CustomerID] AS [CustomerID2], [t2].[NameStyle], [t2].[Title], [t2].[FirstName], [t2].[MiddleName], [t2].[LastName], [t2].[Suffix], [t2].[CompanyName], [t2].[SalesPerson], [t2].[EmailAddress], [t2].[Phone], [t2].[PasswordHash], [t2].[PasswordSalt], [t2].[rowguid] AS [rowguid3], [t2].[ModifiedDate] AS [ModifiedDate3]
FROM [SalesLT].[CustomerAddress] AS [t0]
INNER JOIN [SalesLT].[Address] AS [t1] ON [t1].[AddressID] = [t0].[AddressID]
INNER JOIN [SalesLT].[Customer] AS [t2] ON [t2].[CustomerID] = [t0].[CustomerID]
-- Context: SqlProvider(Sql2008) Model: AttributedMetaModel Build: 4.0.21006.1

We got the perfect query using the proper joins. Why do we need to look into alternative solutions? Well, the LoadWith() option has several limitations that may prevent us from using it in some scenarios:

  • LoadWith() option has a global scope and can’t be applied selectively to specific queries. Once applied, ALL the queries requesting the entity will automatically include joins to pull registered child entities. This is not desired in many cases since we want to keep unrelated queries lightweight and free of “LoadWith()” joins. Also, once applied LoadWith() option can not be revoked.
  • Since LoadWith() option applies to the data context it needs to be passed around to various classes that construct the queries. This is a “smell” since in most cases we would prefer to deal with data context at the higher levels only.
  • Setting load options is not allowed after results have been returned from query involving base or child entity. For example the following [simplified] code throws InvalidOperationException:
using (AdventureWorksDataContext ctx = new AdventureWorksDataContext())
{
    var q2 = from ca in ctx.Customers
             select ca;
    q2.ToArray();   // this forces query execution

    // this will throw if using LoadWith()
    IEnumerable<CustomerAddress> q = BuildQuery(ctx);
}

Using compound object (projection) to hint LINQ to SQL into an efficient join

Take a look at the following changes to the code:

using (AdventureWorksDataContext ctx = new AdventureWorksDataContext())
{
    var q = (from ca in ctx.CustomerAddresses
             select new {
                 CustomerAddress = ca,
                 ca.Customer,
                 ca.Address
             }).Take(3);

    foreach (var item in q)
    {
        Console.Out.WriteLine("{0}, {1} lives in {2}, {3} as of {4}",
            item.CustomerAddress.Customer.LastName,
            item.CustomerAddress.Customer.FirstName,
            item.CustomerAddress.Address.City,
            item.CustomerAddress.Address.StateProvince,            item.CustomerAddress.ModifiedDate);
    }
}

The main thing that is different here from our original solution is the use of anonymous type in the select statement. In this type we explicitly mention child entities, so LINQ to SQL optimize the select query to include them upfront with JOIN. In fact, if we run this with logging turned on, we will see exact same single statement that was generated when using LoadWith() options.

This solution does not rely on any global settings like LoadWith(), which eliminates unwanted side effects and much easier to manage in a component/layered architecture.  The obvious drawback of this method is using anonymous type. For example, we aren’t able to simply return resulting IEnumerable from our Data Access Layer. We can replace the anonymous type with a composite class, but would it be nice if we can simply return IEnumerable<CustomerAddress> where access to child properties won’t cause additional database roundtrips?

The perfect solution – the best of both worlds

Here is solution I like the best and I use it a lot in my DAL code:

private static IEnumerable<CustomerAddress> BuildQuery(
    AdventureWorksDataContext ctx)
{
    var q = (from ca in ctx.CustomerAddresses
             select new
             {
                 CustomerAddress = ca,
                 ca.Customer,
                 ca.Address
             }).Take(3)
             .AsEnumerable()
             .Select(ca => ca.CustomerAddress);
    return q;
}

Here is how this works. The select part of it starts exactly the same as the one we used in the “anonymous type” solution. Using anonymous composite type forces the JOINs. Then there is “.AsEnumerable()” – very important part of the solution that I explain later. The last part (.Select(ca => ca.CustomerAddress)) simply “cleanses” anonymous type and “converts” it back to “clean” CustomerAddress class that we want to use/expose. If we don’t use AsEnumerable() before cleansing the whole expression will be evaluated as IQueryable by LINQ to SQL implementation and cleansing will “optimize away” the composite anonymous type benefits, and LINQ to SQL will no longer assume you need child properties, and will NOT generate JOINs. Injecting .AsEnumerable() takes control away from LINQ to SQL at the time when JOINs are assumed, and gives control to LINQ to Objects for cleansing.

The end result is the best of both world: single efficient query that uses JOINs and clean DAL API (IEnumerable<CustomerAddress>).

10 comments
  1. This is good info and a good solution. I’m usually going the anonymous type route and I haven’t seen this problem yet, but I’ll keep this in mind.

    Funny that I never heard of the DataLoadOptions class and now that I know about it my only plan is to avoid using it.

  2. AndreiZ said:

    Is it possible by using this last perfect solution to load child properties which are collections/EntitySet ? For example if CustomerAddress has multiple records for one Customer and we want to load “parent” Customer object together with collection of CustomerAddresses.
    If to use LoadWith approach this works as:
    DataLoadOptions options = new DataLoadOptions();
    db.LoadOptions = options;
    options.LoadWith((Customer c) => c.CustomerAddresses);

  3. Melle said:

    I struggled with DataLoadOptions many times, and did not find a satisfying solution.
    Your trick absolutely rocks! I will try to generalize and embed it into my apps.
    Thank you!

  4. cottsak said:

    LoadWith() is not a smell at all. You should not be passing your DataContext around for multiple jobs- it’s designed to be lightweight for single jobs (UoW) or small tasks only. If you had designed with this in mind you could use different LoadWith() configurations all over the place and it would not be “global”.

  5. sgorozco said:

    Great tip!! Thanks! =)

  6. You, my friend, are awesome. Thanks a lot for writing this article.

  7. Vincent Ottens said:

    Good explanation, but what should i do when i want to load a sub collection of CustomerAddress?

Leave a comment