
Sunday, August 2, 2009

More on Fluent Interface Pattern for Composing Entity Framework Queries

Recap of Last Week

In last week's post I went over an Entity Framework pattern that I’ve been using that’s saved me quite a bit of time in my Repository classes.  The basic idea is that most Entity Framework queries can be broken down into 3 pieces as shown in the diagram below.

[Diagram: an EF query broken into three pieces: a Filter Method, a Sort Method, and a Projection Method]

The interesting part emerges when we realize that an EF query is just an object of type ObjectQuery, and that we can create methods that take an ObjectQuery as a parameter, do their processing, and then pass back a return value of type ObjectQuery. This means that we can chain the methods together, fluent interface style, to create queries that look like this:

List<BlogPost> postSet = GetBlogPostSet().SortBy("Date").GetPage(pageIndex, pageSize);

List<BlogPost> postSet = GetBlogPostSet().SortBy("ID").GetPage(pageIndex, pageSize);

List<BlogPost> postSet = GetBlogPostSet().SortBy("ID").GetAll();

Shaping Data With Projection Methods

This week I want to take a closer look at the Projection Method and how we can use it to shape our return data.  So let’s start out by looking at a pretty standard query that returns a list of BlogPost entities.

List<BlogPost> posts = GetAllBlogPosts().SortBy(sortOption).GetList();

This query contains each of the three components described above: a Filter method (GetAllBlogPosts), a Sort method (SortBy), and a Projection method (GetList). The code for each of the methods is listed below.  Note that our Projection Method, GetList(), just runs ToList() on the query and returns the resulting List<BlogPost>.

// GetAllBlogPosts

private ObjectQuery<BlogPost> GetAllBlogPosts()

{

    var query = from p in Context.BlogPostSet.Include("Categories")

                select p;

    return (ObjectQuery<BlogPost>)query;

}

// SortBy

internal static ObjectQuery<BlogPost> SortBy( this ObjectQuery<BlogPost> query, Enums.BlogPostSortOption sortOption)

{

    switch (sortOption)

    {

        case Enums.BlogPostSortOption.ByDate:

            return (ObjectQuery<BlogPost>)query.OrderByDescending(p => p.PostedUtc);

        case Enums.BlogPostSortOption.BySite:

            return (ObjectQuery<BlogPost>)query.OrderBy(p => p.BlogProfile.BlogName);

        case Enums.BlogPostSortOption.ByVote:

            return query;

        default:

            return (ObjectQuery<BlogPost>)query.OrderByDescending(p => p.PostedUtc);

    }

}

// GetList

internal static List<BlogPost> GetList(this ObjectQuery<BlogPost> query)

{

    return query.ToList<BlogPost>();

}

Wonderful, we can mix and match methods like this and they work great as long as we want to return BlogPost entities. But what if we just want a list of the BlogNames?  How do we handle that? It seems like a waste to pull a list of full BlogPost entity objects when all we really need is the BlogName.  In this case it would be nice to shape my return data into a simple list of BlogNames.

I typically handle this in my Projection Methods.  I already have a Filter Method that’s applying the appropriate filter, and a Sort Method that’s applying the appropriate sort; now I need a Projection Method that projects my results onto the appropriate data shape.  So I need to create a new projection method called GetBlogNames().  My new data access code and the new GetBlogNames() method look like this.

List<String> blogNames = GetAllBlogPosts().SortBy(sortOption).GetBlogNames();

 

// GetBlogNames

internal static List<String> GetBlogNames(this ObjectQuery<BlogPost> query)

{

    var newQuery = from p in query

                   select p.BlogProfile.BlogName;  

    return newQuery.Distinct().ToList<String>();

}

Look closely and you’ll see that the GetBlogNames() method does something kind of interesting.  It does a second select on my query.  We did the first select back in our Filter Method, GetAllBlogPosts(), and that select returned full BlogPost entities.  Now in my projection method I decide that I only need the BlogName property, so I do a second select that returns only the BlogNames.

At first glance it may seem that this is still doing something really inefficient.  We have an initial query that pulls full entity objects, then we do a second select against that result set to get the BlogNames.  We’re still pulling full entities in the first query, aren’t we?

No, we’re not. Thanks to delayed execution, our query never pulls full entities.  This goes back to the big realization that the query is now an object.  I’m a little fuzzy on some of the details here, but it goes something like this.  When we create an Entity Framework query, we’re creating an ObjectQuery that contains a Command Tree.  That’s important.  The ObjectQuery contains a Command Tree that represents each of the commands (like select, order by, from, etc.) in our query.  When we need the results of the ObjectQuery, like when we call ToList() or when we iterate over the results, our Command Tree gets translated into T-SQL (or whatever is used by the provider we’re querying) and the query is executed.

So, right up to the point that I call ToList(), my query is just a data structure containing all of my commands, and adding another select is just adding another command to the Command Tree.  At query time, all of my commands will be translated into SQL and executed as a single query that returns just BlogNames.
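Delayed execution is easy to see outside of Entity Framework too. The sketch below is a LINQ to Objects analog (not EF, and not code from this repository): a source sequence counts how many items are actually pulled, showing that composing Where/OrderBy/Select just builds up a pipeline, and nothing runs until ToList() is called.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class DeferredExecutionDemo
{
    static int itemsPulled = 0;

    // A source that records every element actually enumerated.
    static IEnumerable<int> Source()
    {
        for (int i = 0; i < 100; i++)
        {
            itemsPulled++;
            yield return i;
        }
    }

    static void Main()
    {
        // Compose the query: nothing executes yet.
        var query = Source().Where(n => n % 2 == 0)
                            .OrderByDescending(n => n)
                            .Select(n => n * 10);

        Console.WriteLine(itemsPulled);   // 0 - still just a pipeline

        // Execution happens only here, in a single pass.
        List<int> results = query.Take(3).ToList();

        Console.WriteLine(itemsPulled);   // 100 - OrderBy had to see every item
        Console.WriteLine(results.Count); // 3
    }
}
```

The difference with EF is what happens at execution time: instead of enumerating objects in memory, the Command Tree is translated into a single SQL statement and sent to the database.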

A Better Data Shaping Example

We’ve seen how we can use a select in our projection method to shape data.  Now let’s apply that principle to a more practical, real world example: drop down list choices.  How many times have you seen code that pulled down a list of full entity objects, or a fully populated DataTable, just to populate the choices of a drop down list?  This scenario comes up all the time, so I’ve added a ListItemDTO Data Transfer Object to my BLL (for more on DTOs see What is the difference between a DTO and a POCO). My ListItemDTO class is just a simple data container for a list item.

public class ListItemDTO

{

    public String Text { get; set; }

    public String Value { get; set; }

}

Now, for this example let’s assume that in my UI, I have a page that displays a grid of all blog posts, for all blogs, that were posted in a given week.  I want to add a DropDownList to the top of this page that allows me to pick a blog, and then the grid will display only the posts from that blog.  Simple enough scenario.  I just need to create a DropDownList that contains ListItems with Text equal to BlogName and Value equal to BlogId.  To get the data to populate this DropDownList, I need a method in my Repository that gets the list of BlogPosts for the given week, but then shapes the return value as a distinct list of ListItemDTOs where the Text property is BlogName and the Value property is BlogId.

For us this is no problem.  We already have the Filter and Sort methods needed to get the data, we just need a Projection method that will shape it as a List<ListItemDTO>. 

// GetBlogNameListItemDTOs

internal static List<ListItemDTO> GetBlogNameListItemDTOs(this ObjectQuery<BlogPost> query)

{

    List<ListItemDTO> items = new List<ListItemDTO>();

    var newQuery = from p in query

                   select new {p.BlogProfile.BlogName, p.BlogProfile.BlogId};

    foreach (var item in newQuery.Distinct())

    {

        items.Add( new ListItemDTO{Text=item.BlogName, Value=item.BlogId.ToString()});

    }

    return items;

}

In this Projection method, we added a new select that modified our query to return only the BlogName and the BlogId.  We then used a foreach loop to iterate over the results of the query, create a new ListItemDTO for each item, and return the List<ListItemDTO>.  There are two details to make note of.  First, we modified our query to return its data as an anonymous type with two properties, BlogName and BlogId.  That’s the reason for the “new {p.BlogProfile.BlogName, p.BlogProfile.BlogId}” object syntax.  Second, when we iterate over the query in the foreach loop, we call the Distinct() method which, wait for it, makes sure our results are distinct.
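One detail worth calling out: Distinct() works on the anonymous type because C# anonymous types get compiler-generated value equality over all of their properties. A quick LINQ to Objects sketch (the sample values are made up; only the BlogName/BlogId shape comes from the post) shows two anonymous instances with identical property values collapsing to one:

```csharp
using System;
using System.Linq;

class AnonymousDistinctDemo
{
    static void Main()
    {
        // Hypothetical sample rows; the real query projects
        // p.BlogProfile.BlogName and p.BlogProfile.BlogId.
        var rows = new[]
        {
            new { BlogName = "Rusty Bedsprings", BlogId = 1 },
            new { BlogName = "Rusty Bedsprings", BlogId = 1 }, // duplicate row
            new { BlogName = "Fast Lane",        BlogId = 2 }
        };

        // Anonymous types implement Equals/GetHashCode over all of
        // their properties, so Distinct() can collapse the duplicate.
        var distinct = rows.Distinct().ToList();

        Console.WriteLine(distinct.Count); // 2
    }
}
```

Against LINQ to Entities the same Distinct() call is translated into a SQL DISTINCT rather than being evaluated in memory.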

So, that was pretty painless. We can now write a data access method that leverages our existing Filter and Sort methods, and uses our new GetBlogNameListItemDTOs() to project the results onto a List<ListItemDTO>.  The code is given below.

// BlogNameListItemDtosForWeek

public List<ListItemDTO> BlogNameListItemDTOsForWeek(DateTime startDate)

{

  return GetBlogPostSetForWeek(startDate)

         .SortBy(Enums.BlogPostSortOption.BySite)

         .GetBlogNameListItemDTOs();

}

Conclusion

Hopefully this illustrates a little more of the potential of this pattern.  Now that I’ve been using it for a while I can’t bring myself to structure my Entity Framework Repositories any other way.  There are always situations where the model doesn’t quite work for me, and when that happens I just go back to writing Entity Framework queries from scratch.  But, so far I’ve been pleasantly surprised by how often my data access methods fit neatly into this little time saving model.  If you have suggestions on how to improve the pattern, or other data access patterns that have saved you time, please leave a comment or drop me an email.

For anyone who wants to see the Filter/Sort/Projection model in the context of a repository class, I’m including a partial listing of my BlogPostRepository below.  You’ll see everything we’ve covered over the last two posts, plus there’s some additional code that handles things like caching.

 

public class BlogPostRepository : RepositoryBase

{

 

    // Constructors

    public BlogPostRepository() { }

    public BlogPostRepository(RAEntities context)

    {

        this.Context = context;

    }

 

 

    //*********************************************************************************

    // Data Access Methods

    //******************************************************************************** 

 

    // BlogPostSetForWeek_GetPage

    public List<BlogPost> BlogPostSetForWeek_GetPage(int pageIndex, int pageSize, DateTime startDate, Enums.BlogPostSortOption sortOption, CacheOptions cacheOptions)

    {

        startDate = ToSunday(startDate);

        CacheHelper cache = new CacheHelper();

        string cacheKey = String.Format("BlogPostSetForWeek_GetPage_{0}_{1}_{2}_{3}", pageIndex, pageSize, startDate.ToShortDateString(), sortOption);

        // If the cache contains the data, and the user wants us to use

        // cache, then return the cached data

        if (cache.Contains(cacheKey) && cacheOptions.UseCache)

        {

            return (List<BlogPost>)cache.Get(cacheKey);

        }

        // If we fell through to this point then we need to pull data.

        List<BlogPost> postSet = GetBlogPostSetForWeek(startDate)

                                 .SortBy(sortOption)

                                 .GetPage(pageIndex, pageSize);

        if (cacheOptions.UseCache)

        {

            cache.Insert(postSet, cacheKey);

        }

        return postSet;

    }

 

 

    // BlogNamesForWeek

    public List<String> BlogNamesForWeek(DateTime startDate)

    {

        return GetBlogPostSetForWeek(startDate)

               .SortBy(Enums.BlogPostSortOption.BySite)

               .GetBlogNames();

    }

 

 

    // BlogNameListItemDtosForWeek

    public List<ListItemDTO> BlogNameListItemDTOsForWeek(DateTime startDate)

    {

        return GetBlogPostSetForWeek(startDate)

               .SortBy(Enums.BlogPostSortOption.BySite)

               .GetBlogNameListItemDTOs();

    }

 

 

    //*********************************************************************************

    // Filter Methods

    //*********************************************************************************   

    // GetAllBlogPosts

    private ObjectQuery<BlogPost> GetAllBlogPosts()

    {

        var query = from p in Context.BlogPostSet.Include("Categories").Include("BlogProfile")

                    select p;

        return (ObjectQuery<BlogPost>)query;

    }

 

    // GetBlogPostSetForWeek

    private ObjectQuery<BlogPost> GetBlogPostSetForWeek(DateTime startDate)

    {

        startDate = ToSunday(startDate);

        DateTime startUtc = startDate.Date;

        DateTime endUtc = startDate.AddDays(7).Date;

        var query = from p in Context.BlogPostSet.Include("Categories").Include("BlogProfile")

                    where p.PostedUtc > startUtc && p.PostedUtc < endUtc

                    select p;

        return (ObjectQuery<BlogPost>)query;

    }

 

 

 

 

    //*********************************************************************************

    // Utility Methods

    //*********************************************************************************

 

    public static DateTime ToSunday(DateTime date)

    {

        double offset = date.DayOfWeek == DayOfWeek.Sunday ? 0 : Convert.ToDouble(date.DayOfWeek);

        return date.AddDays(-offset);

    }

}

 

 

internal static class BlogPostRepositoryExtensionMethods

{

    //*********************************************************************************

    // Sort Methods

    //*********************************************************************************

 

    // SortBy

    internal static ObjectQuery<BlogPost> SortBy( this ObjectQuery<BlogPost> query, Enums.BlogPostSortOption sortOption)

    {

        switch (sortOption)

        {

            case Enums.BlogPostSortOption.ByDate:

                return (ObjectQuery<BlogPost>)query.OrderByDescending(p => p.PostedUtc);

            case Enums.BlogPostSortOption.BySite:

                return (ObjectQuery<BlogPost>)query.OrderBy(p => p.BlogProfile.BlogName);

            case Enums.BlogPostSortOption.ByVote:

                return query;

            default:

                return (ObjectQuery<BlogPost>)query.OrderByDescending(p => p.PostedUtc);

        }

    }

 

 

    //*********************************************************************************

    // Projection Methods

    //*********************************************************************************

 

    // GetList

    internal static List<BlogPost> GetList(this ObjectQuery<BlogPost> query)

    {

        return query.ToList<BlogPost>();

    }

 

    // GetPage

    internal static List<BlogPost> GetPage(this ObjectQuery<BlogPost> query, int pageIndex, int pageSize)

    {

        int skipCount = pageSize * pageIndex;

        return query.Skip(skipCount).Take(pageSize).ToList<BlogPost>();

    }

 

    // GetBlogNames

    internal static List<String> GetBlogNames(this ObjectQuery<BlogPost> query)

    {

        var newQuery = from p in query

                       select p.BlogProfile.BlogName;  

        return newQuery.Distinct().ToList<String>();

    }

 

    // GetBlogNameListItemDTOs

    internal static List<ListItemDTO> GetBlogNameListItemDTOs(this ObjectQuery<BlogPost> query)

    {

        List<ListItemDTO> items = new List<ListItemDTO>();

        var newQuery = from p in query

                       select new {p.BlogProfile.BlogName, p.BlogProfile.BlogId};

        foreach (var item in newQuery.Distinct())

        {

         items.Add( new ListItemDTO{Text=item.BlogName, Value=item.BlogId.ToString()});

        }

        return items;

    }

 

}

Tuesday, July 28, 2009

Fluent Interface Pattern for Composing Entity Framework Queries

I’ve been doing a fair amount of work with Entity Framework recently.  There are some things about EF that make me want to throw it out the window, but this post is about something that I really like: the ability to eliminate redundant code from my BLL and DAL and create a fluent interface for composing queries.

The problem we want to solve

So here’s a typical scenario.  I have a blog aggregator application that I’m building.  I use Entity Framework to create a BlogPost entity and its data mappings. Great, now I’m ready to create a BlogRepository class that will contain all of my queries for getting blog posts.  So I write the first data access method, and it looks something like this.

public List<BlogPost> BlogPostSetByWeek_GetPage(int pageIndex, int pageSize, DateTime startDate)

{

    startDate = ToSunday(startDate);        

    DateTime startUtc = startDate.Date;

    DateTime endUtc = startDate.AddDays(7).Date;

    int skipCount = pageSize * pageIndex;

    var query = from p in Context.BlogPostSet.Include("Categories")

                where p.PostedUtc > startUtc && p.PostedUtc < endUtc

                orderby p.PostedUtc descending

                select p;

    List<BlogPost> postSet = query.Skip(skipCount).Take(pageSize).ToList<BlogPost>();

    return postSet;

}

The above method takes a startDate and some paging parameters and then returns the specified page of results in the shape of a generic list of BlogPost entities.  How easy was that!

Now for the next step.  I need a query that’s exactly like the query above, but this time I want the entire list of results instead of just a page.  And after that I need another query that sorts the BlogPosts by Category instead of by PostedUtc, and then another that sorts by BlogName, and on and on.  So how do I handle this?  I could just create a completely new EF query for each one.  Or maybe I could use Entity SQL instead of LINQ to Entities, and then use a bunch of conditional blocks to build the Entity SQL text that I need.  Neither of those solutions really appeals to me.  First, I don’t like the idea of rewriting the same query over and over with minor differences in criteria or sort order; that just seems inefficient.  Second, I don’t really want to use Entity SQL because I like the strong typing that I get with LINQ to Entities, plus I would need a lot of conditionals to handle all of the possible query combinations, and that sounds like a mess.

The Solution

So I was thinking about how much I hate duplicating the same query code over and over when I realized something.  Microsoft has made the query an object. I didn’t really appreciate the significance of that before.  The query is no longer just text; it is now an object, an ObjectQuery<> object to be precise.  The cool part is that if I write methods that take an ObjectQuery as their parameter and return an ObjectQuery as their return value, I can chain them together and use them to compose queries.

How could this work?  I looked at the queries in my BLL and found that each of them consists of 3 major components:

[Diagram: query = Filter Method + Sort Method + Projection Method]

Looking at this breakdown, I realized that I could have a Filter Method that creates an ObjectQuery that gets the data I’m looking for, then I could pass that ObjectQuery to a Sort Method that applies a sort and returns the modified ObjectQuery, then I could pass that to a Projection Method that applies paging, shapes the data, and executes the ObjectQuery.

So, when all this is said and done I should be able to compose Entity framework queries by combining a Filter Method, a Sort Method, and a Projection Method.  The end result should be data access code that looks like this:

List<BlogPost> postSet = GetBlogPostSet().SortBy("Date").GetPage(pageIndex, pageSize);

List<BlogPost> postSet = GetBlogPostSet().SortBy("ID").GetPage(pageIndex, pageSize);

List<BlogPost> postSet = GetBlogPostSet().SortBy("ID").GetAll();

Building an Example

So, I coded it up and it works pretty well.  The first step is creating a Filter Method.  This method takes search criteria as parameters and returns an ObjectQuery. Below is my filter method for getting the BlogPost entities for a given week. 

// GetBlogPostSetForWeek

private ObjectQuery<BlogPost> GetBlogPostSetForWeek(DateTime startDate)

{

    startDate = ToSunday(startDate);

    DateTime startUtc = startDate.Date;

    DateTime endUtc = startDate.AddDays(7).Date;

    var query = from p in Context.BlogPostSet.Include("Categories")

                where p.PostedUtc > startUtc && p.PostedUtc < endUtc

                select p;

    return (ObjectQuery<BlogPost>)query;

}

Now I need to create my Sort Method. This method will take the results of my Filter Method as a parameter, along with an enum that tells the method what sort to apply. Note that I’m using strongly typed object queries of type ObjectQuery<BlogPost>.  The strong typing serves two purposes.  First, it lets my Sort Method know that I’m dealing with BlogPost entities, which tells me what fields are available to sort by.  Second, the strong typing provides a distinct method signature, so I can have multiple methods called SortBy which all handle ObjectQueries that return different types of entities.  I can have a SortBy(ObjectQuery<BlogPost>), SortBy(ObjectQuery<Person>), etc.
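That overload resolution can be sketched without EF. Below is a LINQ to Objects analog with minimal stand-in BlogPost and Person classes (not the actual entities from this post): two extension methods share the name SortBy, and the compiler picks the right one from the sequence's element type.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Minimal stand-ins for illustration only.
class BlogPost { public DateTime PostedUtc; }
class Person   { public string LastName; }

static class SortExtensions
{
    // Same method name, different element types: the compiler resolves
    // the call from the static type of the sequence.
    internal static IEnumerable<BlogPost> SortBy(this IEnumerable<BlogPost> q)
        => q.OrderByDescending(p => p.PostedUtc);

    internal static IEnumerable<Person> SortBy(this IEnumerable<Person> q)
        => q.OrderBy(p => p.LastName);
}

class OverloadDemo
{
    static void Main()
    {
        var people = new List<Person>
        {
            new Person { LastName = "Zed" },
            new Person { LastName = "Abel" }
        };

        // Binds to the Person overload, which sorts ascending by LastName.
        Console.WriteLine(people.SortBy().First().LastName); // Abel
    }
}
```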

One other thing.  I want to chain these methods together, fluent interface style.  For that reason I’m implementing both SortBy and GetPage as extension methods. Here’s the code for the SortBy method.

// SortBy

internal static ObjectQuery<BlogPost> SortBy( this ObjectQuery<BlogPost> query, Enums.BlogPostSortOption sortOption)

{

    switch (sortOption)

    {

        case Enums.BlogPostSortOption.ByDate:

            return (ObjectQuery<BlogPost>)query.OrderByDescending(p => p.PostedUtc);

        case Enums.BlogPostSortOption.BySite:

            return (ObjectQuery<BlogPost>)query.OrderBy(p => p.BlogProfile.BlogName);

        case Enums.BlogPostSortOption.ByVote:

            return query;

        default:

            return (ObjectQuery<BlogPost>)query.OrderByDescending(p => p.PostedUtc);

    }

}

Lastly, we need to create a Projection Method.  Below is the GetPage method.  It takes the ObjectQuery<BlogPost> from the SortBy method, applies paging logic to it, executes the query, then returns the results as a List<BlogPost>.

// GetPage

internal static List<BlogPost> GetPage(this ObjectQuery<BlogPost> query, int pageIndex, int pageSize)

{

    int skipCount = pageSize * pageIndex;

    return query.Skip(skipCount).Take(pageSize).ToList<BlogPost>();

}
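The paging arithmetic deserves a quick sanity check: with a zero-based pageIndex, skipCount = pageSize * pageIndex, so page 2 of size 10 should skip the first 20 items and return items 20 through 29. A minimal LINQ to Objects check:

```csharp
using System;
using System.Linq;

class PagingDemo
{
    static void Main()
    {
        int pageIndex = 2, pageSize = 10;     // zero-based page index
        int skipCount = pageSize * pageIndex; // 20, same formula as GetPage

        var page = Enumerable.Range(0, 95)    // items 0..94
                             .Skip(skipCount)
                             .Take(pageSize)
                             .ToList();

        Console.WriteLine(page.First()); // 20
        Console.WriteLine(page.Last());  // 29
        Console.WriteLine(page.Count);   // 10
    }
}
```

Against EF the same Skip/Take pair is folded into the generated SQL, so only the requested page crosses the wire.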

So that’s it.  I now have all the pieces needed to create my data access methods without duplicating query logic over and over.  If I want all blog posts ordered by date, I can use the code:

  Enums.BlogPostSortOption sort = Enums.BlogPostSortOption.ByDate;

  return GetBlogPostSetForWeek(startDate).SortBy(sort).GetPage(pageIndex, pageSize);

To sort those same results by BlogName I can use the code:

  Enums.BlogPostSortOption sort = Enums.BlogPostSortOption.BySite;

  return GetBlogPostSetForWeek(startDate).SortBy(sort).GetPage(pageIndex, pageSize);

If I want to get BlogPosts by category instead of by week, I just write a new filter method named GetBlogPostSetForCategory and it plugs right in:

  return GetBlogPostSetForCategory(category).SortBy(sort).GetPage(pageIndex, pageSize);

Conclusion

So that's it.  This technique has significantly reduced the amount of data access code in my Repository classes and the time that it takes to write it.  I also like the fact that I’m not writing the same paging and sorting code over and over in different queries.  If you see any advantages or disadvantages to the technique, please leave a comment and let me know what you think.  Also, if you’re aware of anyone else using a similar method, please send me a link at rlacovara@gmail.com, I would like to check it out.

Saturday, April 11, 2009

Entity Framework Patterns: Unit of Work

The purpose of this series is to introduce some patterns used by Entity Framework which are probably a departure from the way most of us are used to handling persistence.  Last time we looked at Identity Map; this time we’re going to look at a closely related pattern, Unit of Work.

What is the Unit of Work Pattern?

This time I’m not even going to try to use the definition.  For some reason the definitions for these patterns seem designed to sound impressive while conveying as little information as possible.  Instead, here are a couple of descriptive paragraphs from Martin Fowler’s book Patterns of Enterprise Application Architecture.

When you're pulling data in and out of a database, it's important to keep track of what you've changed; otherwise, that data won't be written back into the database. Similarly you have to insert new objects you create and remove any objects you delete.

You can change the database with each change to your object model, but this can lead to lots of very small database calls, which ends up being very slow. Furthermore it requires you to have a transaction open for the whole interaction, which is impractical if you have a business transaction that spans multiple requests. The situation is even worse if you need to keep track of the objects you've read so you can avoid inconsistent reads.

A Unit of Work keeps track of everything you do during a business transaction that can affect the database. When you're done, it figures out everything that needs to be done to alter the database as a result of your work.

So what does this mean?  Let’s look at two examples, the traditional save/update scenario and the Unit of Work save/update scenario.

Example: Traditional Save/Update Scenario

Ok, we’re going to look at the old way of doing things first.  For this example we have an architecture that uses the Active Record pattern.  Even if you’re not familiar with the term Active Record, you’ve probably used some variation of this pattern.  We have two classes, PostRecord and CommentRecord.  PostRecord represents a single blog post and CommentRecord represents a single comment on that blog post. Each class represents a single record of data. They encapsulate data, business logic methods, and persistence (Get and Save) methods which are used to get data from or save data to the database.  A typical block of code follows, in which we get a post from the db, make a quick change, and then create some comments for the blog post.  While we run this code I’ll be checking SQL Profiler to monitor any database activity.  The idea is that we want to keep track of when saves and updates are written to the db.

int postId = 1;

PostRecord post = PostRecord.GetPost(postId);

// PROFILER: The line above hits the db and pulls 

// the post with post_id = 1.

 

post.PostTitle = "Rusty Bedsprings";

post.SavePost();

// PROFILER: At this point we see our change to

// the post record is saved back to db.

 

CommentRecord comment1 = new CommentRecord()

{

    CommentId = 100,

    CommentText = "Life in the fast lane.",

    PostId = 1

};

CommentRecord comment2 = new CommentRecord()

{

    CommentId = 101,

    CommentText = "Another one bites the dust",

    PostId = 1

};

 

comment1.SaveComment();

// PROFILER: We see our new comment record

// saved back to db.

comment2.SaveComment();

// PROFILER: Again, we see our new comment

// record saved back to db.

No real surprises there.  We create objects in memory, set their members, then save them.  When we call a save method, our DAL opens a connection to the database and runs the query that saves the data.  This, or something very similar, is the way that persistence is handled in almost every .NET application I’ve ever worked on.  We may hit the db 20 times in a block of code, but each time we open a separate db connection and execute a separate db query.

Example: Unit of Work Save/Update Scenario

Now we’re going to do the same thing using the ADO.NET Entity Framework, which implements the Unit of Work pattern.  There are some minor changes to our architecture.  We still have two classes, Post and PostComment, that represent our post and comment data and the associated business logic. But now our persistence methods are located in a separate ObjectContext class named UowEntities.  Both LINQ to SQL and Entity Framework use a context class to encapsulate persistence logic.  This context class is where the Unit of Work pattern is implemented.  So let’s go through our Entity Framework code, once again running SQL Profiler to monitor database activity.

    UowEntities context = new UowEntities();

    var query = from p in context.PostSet where p.post_id == 1 select p;

    Post existingPost = query.FirstOrDefault<Post>();

    // PROFILER: The line above hits the db and gets

    // data for post with post_id=1

    existingPost.post_title = "Rusty Bedsprings";

    // PROFILER: No db activity

 

    PostComment comment1 = new PostComment()

    {

        comment_id = 100,

        comment_text = "Life in the fast lane.",

    };

    PostComment comment2 = new PostComment()

    {

        comment_id= 101,

        comment_text = "Another one bites the dust",

    };           

    // PROFILER: No db activity

 

    existingPost.PostComments.Add(comment1);

    existingPost.PostComments.Add(comment2);

    // PROFILER: No db activity

 

     context.SaveChanges();

    // PROFILER: This time db code runs to commit all

    // changes. A single connection is opened and SQL

    // is executed that updates the post_title, then

    // SQL is executed that saves the two new comment

    // records to the database.

So the Unit of Work pattern batches all of our db calls together and runs them at one time.  We have an ObjectContext class that encapsulates our persistence logic.  Part of what that class does is to track the state of every entity that we get from it.  It knows what entities have changed.  Then when we call SaveChanges(), it creates a batch of SQL statements to persist all of the changes that have taken place, then it runs them with a single db connection.

Saves and Updates are Deferred

The thing that probably caused me the most heartburn about Unit of Work is that data saves and updates are deferred.  When I change a property on an entity, my change is not saved back to the database.  When I create new comment objects and add them to my context (in the example above I added them to the existingPost.PostComments collection, which adds them to the context behind the scenes), my new comments are not saved back to the database.  Nothing is saved back to the database until I call SaveChanges() on my ObjectContext. Until that method is called, the ObjectContext just sits there and keeps track of all entities that have changed.  I’m still not sure that I like the idea of all my data changes being deferred and then batched together, but it’s growing on me.
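The deferral can be illustrated with a toy unit of work (a sketch only; this is not how Entity Framework is implemented internally): registered changes pile up in memory, and nothing is "executed" until Commit() is called, mirroring the way SaveChanges() flushes all pending work over one connection.

```csharp
using System;
using System.Collections.Generic;

// A toy unit of work: changes are only flushed when Commit() runs.
// (Illustration only - not Entity Framework's actual implementation.)
class ToyUnitOfWork
{
    private readonly List<string> _pendingSql = new List<string>();

    public void RegisterUpdate(string table, int id)
        => _pendingSql.Add($"UPDATE {table} SET ... WHERE id = {id}");

    public void RegisterInsert(string table, int id)
        => _pendingSql.Add($"INSERT INTO {table} VALUES (... {id} ...)");

    public int Commit()
    {
        // One connection, one batch: "run" everything, then clear.
        int count = _pendingSql.Count;
        foreach (var sql in _pendingSql)
            Console.WriteLine(sql);    // stand-in for executing the batch
        _pendingSql.Clear();
        return count;
    }
}

class UowDemo
{
    static void Main()
    {
        var uow = new ToyUnitOfWork();
        uow.RegisterUpdate("posts", 1);      // nothing hits the "db" yet
        uow.RegisterInsert("comments", 100);
        uow.RegisterInsert("comments", 101);

        Console.WriteLine(uow.Commit());     // 3 - all work flushed at once
    }
}
```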

How Does Entity Framework Know Which Entities Have Changed?

So, the context class knows which entities have changed, and it batches SQL statements to update them.  How does it know which entities have changed?  If you said that it keeps a copy of the original data for each entity, then give yourself a gold star.  From the first post in this series, you may recall that the context stores an Identity Map, which is basically just a central cache of all entities that are loaded through the context. In addition to a cached copy of the current state of each entity, Entity Framework also stores a copy of the original state of each entity (all of its original data values).  If you want to read more about it, search for Entity Framework Object Services.
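The snapshot idea is simple enough to sketch. The toy tracker below (an illustration only; EF's Object Services do far more) stores each entity's original value when it is attached and diffs against the current value at save time.

```csharp
using System;
using System.Collections.Generic;

// Toy snapshot-based change tracking: keep the original value per
// entity id, compare at save time. (Not EF's actual implementation.)
class ChangeTracker
{
    private readonly Dictionary<int, string> _originals =
        new Dictionary<int, string>();

    public void Attach(int id, string value) => _originals[id] = value;

    public bool HasChanged(int id, string currentValue)
        => _originals[id] != currentValue;
}

class TrackerDemo
{
    static void Main()
    {
        var tracker = new ChangeTracker();
        tracker.Attach(1, "Original Title");   // snapshot taken here

        string title = "Original Title";
        Console.WriteLine(tracker.HasChanged(1, title)); // False

        title = "Rusty Bedsprings";            // local edit only
        Console.WriteLine(tracker.HasChanged(1, title)); // True
    }
}
```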

Why Do I Care?

So what’s the advantage of using a framework that implements Unit of Work?  I still have a tough time convincing myself that this is a good idea, so if anyone out there is a strong supporter of this pattern, please post your reasons why in the comments.  But, even with my ambivalence toward the pattern, I can see 3 advantages.

  1. Performance.  Batching 30 SQL updates and running them on a single connection is more efficient than running 30 updates over 30 separate connections. Also, if your code is designed in a way that it produces duplicate SQL updates for the same entity (which is bad design but I have seen it before), the framework will consolidate those multiple updates into a single update.
  2. Concurrency. Unit of Work automatically handles concurrency within a single thread by merging updates to the same entity.  Now recall that Entity Framework’s implementation of Unit of Work involves tracking the initial state of each entity.  This means that Entity Framework can use an optimistic concurrency model to determine whether the data in the database has changed since you originally loaded your entities.  Entity data members have an attribute named ConcurrencyMode which is set to “None” by default.  If you set this attribute to “Fixed”, Entity Framework will check the database before saving and will throw an OptimisticConcurrencyException if the data has changed.
  3. Transactions.  Unit of Work is transactional by default.  When you call SaveChanges(), if there is a problem with any of the data updates, they all get rolled back, or depending on the exception, they may never make it to the db at all.  This can provide an easy way to batch data updates in a single transaction, even if the code for those updates exists in multiple, separate code modules.  You just create a new context object, pass it around as a parameter to the different modules (a little Dependency Injection), then call SaveChanges() and all of the data updates from the different modules are batched together in a single transaction.
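A rough sketch of that last scenario (the module classes here are hypothetical; the point is that one shared context gives one transaction, and that the concurrency check from advantage 2 surfaces at the same SaveChanges() call):

```csharp
// Hypothetical modules that each make their own entity changes
// against a single shared context passed in as a dependency.
EFEntities context = new EFEntities();

OrderModule.ApplyChanges(context);
AuditModule.ApplyChanges(context);

try
{
    // All pending changes from both modules commit in one transaction.
    context.SaveChanges();
}
catch (OptimisticConcurrencyException)
{
    // Only thrown for properties with ConcurrencyMode set to Fixed whose
    // database values changed since load. Nothing was committed; refresh
    // the entities and retry, or surface the conflict to the user.
}
```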

So, hopefully this will give you a little background on why things are done the way they are in Entity Framework.  I still find myself ready to chuck it out the window every once in a while (especially when debugging failed transactions).  Then I realize that many of my objections stem from the fact that I’m just not used to using these patterns for data persistence.  They do represent a different paradigm that doesn’t always make sense to me, but they are also best practices that have evolved over years thanks to the efforts of some very intelligent programmers. 


Wednesday, March 25, 2009

Entity Framework Patterns: Identity Map

ADO.Net Entity Framework is a different way of looking at persistence than most of us are used to.  It wants us to do things like add objects/entities to our data context instead of saving them, then it doesn’t save those objects to the database until we call SaveChanges(). We don’t directly save specific entities; instead, EF tracks the entities we’ve loaded and then saves changes to the database for any entities that it thinks have changed. My first reaction when I realized how different these concepts were from my standard way of saving data was that I hated it (this actually took place with LINQ to SQL, which I still don’t care for due to the way it handles sprocs).  But the promise of rapid application development and more maintainable code kept me coming back. I started reading up on architectures using ORMs (mostly in the Java world) and I discovered that most of the things I initially didn’t like about Entity Framework and LINQ to SQL are actually accepted design patterns from the ORM world that have been developed by people much smarter than me who have been working for years to solve the Impedance Mismatch problem.  So I thought it might be helpful to talk about some of these patterns and how they are handled by Entity Framework.  The first one we’ll look at is Identity Map.

Identity Map Definition

In Martin Fowler’s book Patterns of Enterprise Application Architecture, he defines Identity Map with the following two phrases:

Ensures that each object gets loaded only once by keeping every loaded object in a map. Looks up objects using the map when referring to them.

So what does this mean?  It’s probably better to demonstrate than to explain, so let’s look at the characteristics of Identity Map through some code examples.

There Can Be Only One

Let’s start by looking at the other way of doing things.  This is the non-Identity Map example. If we have an app that uses a simple persistence layer that runs a database query and returns a DataTable, we might see code like the following:

DataTable personData1 = BAL.Person.GetPersonByEmail("bill@gates.com");

DataTable personData2 = BAL.Person.GetPersonByEmail("bill@gates.com");

if (personData1 != personData2)

{

    Console.WriteLine("We have 2 different objects");

}

In this example, personData1 and personData2 both contain separate copies of the data for person Bill Gates. If we change the data in personData2, it has no effect on personData1.  They are totally separate objects that happen to contain the same data. If we make changes to both and then save them back to the database, there is no coordination of the changes. One just overwrites the changes of the other.  Our persistence framework (ADO.Net DataTables) just doesn’t know that personData1 and personData2 both contain data for the same entity.  The thing to remember about this scenario is that multiple separate objects that all contain data for the same entity lead to concurrency problems when it’s time to save data.

Now let’s look at the Identity Map way of doing things. Below, we have some ADO.Net Entity Framework code where we create two different object queries that both get data for the same person, and then we use those queries to load three different person entity objects.

EFEntities context = new EFEntities();

 

var query1 = from p in context.PersonSet

            where p.email == "bill@gates.com"

            select p;

Person person1 = query1.FirstOrDefault<Person>();

Person person2 = query1.FirstOrDefault<Person>();

 

var query2 = from p in context.PersonSet

             where p.name == "Bill Gates"

             select p;

Person person3 = query2.FirstOrDefault<Person>();           

 

if (person1 == person2 && person1 == person3)

{

    Console.WriteLine("Identity Map gives us 3 refs to a single object");

}

 

person1.name = "The Billster";

Console.WriteLine(person3.name); // writes The Billster

When I run the code above, all 3 entities are in fact equal.  Plus, when I change the name property on person1, I get that same change on person3. What’s going on here?  They’re all refs to a single object that is managed by the ObjectContext. So Entity Framework does some magic behind the scenes where regardless of how many times or how many different ways we load an entity, the framework ensures that only one entity object is created and the multiple entities that we load are really just multiple references to that one object. That means that we can have 10 entity objects in our code and if they represent the same entity, they will all be references to the same object.  The result is that at save time we have no concurrency issues.  All changes get saved.  So how does this work? 

Every entity type has a key that uniquely identifies that entity.   If we look at one of our Person entities in the debugger, we notice that it has a property that Entity Framework created for us named EntityKey.  EntityKey contains a lot of information on things like what key values our entity has (for our Person entity the key field is PersonGuid), what entity sets our entity belongs to, basically all the information Entity Framework needs to uniquely identify and manage our Person entity.

The EntityKey property is used by the ObjectContext (or just context) that Entity Framework generates for us.  In our example the context class is EFEntities.  The context class does a number of things and one of them is maintaining an Identity Map.  Think of the map as a cache that contains one and only one instance of each object, identified by its EntityKey. In fact, you will probably never hear the term Identity Map used.  Most .Net developers just call it the object cache, or even just the cache. So, in our example, when we get person1 from our context, it runs the query, creates an instance of person (which the context knows is uniquely identified by field PersonGuid), stores that object in the cache, and gives us back a reference to it.  When we get person2 from the context, the context does run the query again and pulls data from our database, but then it sees that it already has a person entity with the same EntityKey in the cache, so it throws out the data and returns a reference to the entity that’s already in cache.  The same thing happens for person3.

Quiz: What Happens To Cached Entities When the Database Changes?

So here’s a question.  Suppose we run the code sample above that loads person1, person2, and person3 from our context, but this time we use a breakpoint to pause execution right after we load person1, then we manually update the database by changing the phone_home field on Bill Gates’ record to “(999) 999-9999”, then we continue executing the rest of our code. What value will we see for phone_home when we look at person1, person2, and person3?  Will it be the original value, or the new value?  Remember that all 3 entities are really just 3 references to the same entity object in the cache, and our first db hit when we got person1 did pull the original phone_home value, but then the queries for person2 and person3 also hit the database and pulled data.  How does Entity Framework handle that?  The answer is shown in the debugger watch window below. It throws the new data out.

[Image: debugger watch window showing person1, person2, and person3 all still holding the original phone_home value]

This can lead to some really unexpected behavior if you don’t know to look for it, especially if you have a long running context that’s persisted and used over and over for multiple requests.  It is very important to be thinking about this when you’re deciding when to create a context, how long to keep it running, and what you want to happen when data on the backend is changed.  There is a way to modify this behavior for individual queries by setting the ObjectQuery.MergeOption property.  But we still need to remember and plan for this default behavior.
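For reference, here’s roughly what opting out of that default looks like. This is a sketch against the Person query from earlier: MergeOption.OverwriteChanges tells Entity Framework to replace cached values with fresh database values, and ObjectContext.Refresh can do the same for an entity you already hold.

```csharp
// Sketch: overriding the default AppendOnly merge behavior for one query.
var query = (ObjectQuery<Person>)(from p in context.PersonSet
                                  where p.email == "bill@gates.com"
                                  select p);
query.MergeOption = MergeOption.OverwriteChanges;   // cached entity updated from the db
Person refreshed = query.FirstOrDefault();

// Or refresh an entity that's already loaded, letting the database win:
context.Refresh(RefreshMode.StoreWins, refreshed);
```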

If There’s a Cache, Why Am I Hitting The Database? 

Remember the second part of Martin Fowler’s definition where he said that the Identity Map looks up objects using the map when referring to them?  The natural question that comes to mind is, if I’m loading an object that already exists in my cache, and Entity Framework is just going to return a reference to that cached object and throw away any changes it gets from the database query, can’t I just get the object directly from my cache and skip the database query altogether? That could really reduce database load.

Unfortunately the answer is kind of, but not really.  In Entity Framework v1, you can get an entity directly from the cache without hitting the database, but only if you use a special method to get the entity by its EntityKey.  Having to use the EntityKey is a big limitation since most of the time you want to look up data by some other field.  For example, in a login situation I need to get a person entity by email or username.  I don’t have the PersonGuid.  I’m hoping that we get more options for loading entities from the cache in v2, but for now, if you do have the key field, this is how you do it:

Guid billsGuid = new Guid("0F3087DB-6A83-4BAE-A1C8-B1BD0CE230C0");

EntityKey key = new EntityKey("EFEntities.PersonSet", "PersonGuid", billsGuid);

Person bill = (Person)context.GetObjectByKey(key);

There are a couple of things I want to point out.  First, when we create the key, the first parameter we have to give is the entity set name that we’re pulling from, and this name must include the name of our ObjectContext class. Second, you’ll notice that GetObjectByKey() returns an Object, so we have to cast the return value to Person.
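One more note: GetObjectByKey() throws an exception when no matching entity exists in the cache or the database, so when the key might not be present, TryGetObjectByKey() is the safer variant. A quick sketch, reusing the key from the example above:

```csharp
// Sketch: TryGetObjectByKey avoids the ObjectNotFoundException that
// GetObjectByKey throws when the key matches nothing.
object result;
if (context.TryGetObjectByKey(key, out result))
{
    Person bill = (Person)result;
    Console.WriteLine(bill.name);
}
else
{
    Console.WriteLine("No person found for that key.");
}
```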

Conclusion

So that’s one pattern down.  Hopefully discussing some of these differences in approaching persistence helps ease your transition to using Entity Framework a bit.  Next time we’ll cover another key pattern, Unit of Work.
