Monday, September 6, 2010

I'm starting a series of blog posts about the Code First approach in Entity Framework 4

Hi everyone.
I changed jobs three months ago and now work for a small software company where we are building a workflow system on .NET. It's a greenfield project, which gives us that rare luxury in software development: we start completely clean of ugly stuff, choose the technologies we work with, don't have to deal with backward compatibility and don't have to refactor legacy code. Quite a list :)
We chose Entity Framework 4 as our ORM. Furthermore, we decided to use the Code First approach, which hasn't had a major release yet (it's still at the CTP stage; the latest one is CTP4). So, the first important thing I'm going to say is: DON'T EVER USE SOLUTIONS THAT HAVE NO STABLE RELEASE. It really sucks, believe me. I was able to find only a small amount of documentation, which means you have to solve all your problems with the library yourself, or endlessly search for blog posts or, maybe, for people who have already run into the same problem and found a solution for it.
While surfing the internet for answers to the many questions I had about Code First, I was astonished (in a bad way) by how few articles, examples and blog posts cover the feature. I don't think I'm the only one in this boat, since many developers across the net keep asking "How can I do this? Where did that come from?".
Not only is Code First poorly covered, I was also shocked to find out that its API has no XML comments (which, by the way, is fairly uncommon for a Microsoft-written piece of .NET functionality). The only sources that turned out to be useful were the ADO.NET team blog, where each post covers only the basics, and several blog posts and articles from .NET community enthusiasts who play around with the feature. Since Code First hasn't been released yet, the only book I found with any examples and explanation of the feature was "Pro Entity Framework 4.0" by Scott Klein, published by Apress. It's a good resource, with the whole of chapter 10 devoted to the Code First approach. Many of its examples are already a bit outdated (the API has been mercilessly refactored since CTP2), but you can still find useful information there. The examples are good, but unfortunately they cover only the basics.
So, having finally won the battle with the framework, I thought I might be of some help to those of you who are still struggling with Code First. I will be writing a series of blog posts covering Code First mapping capabilities. I can't say this is going to be advanced stuff, but I struggled with some of these problems for days. By the way, does anyone know why frameworks that are supposed to make us more productive and efficient only make us work harder?
Although I'm not writing a book, I simply must give one acknowledgement before I start. I would never have succeeded with Code First without the help of my colleague Roman. His broad experience with .NET technologies, and with EF specifically, helped me a lot. He was somehow able to make assumptions about how Code First works, and nearly every one of them turned out to be true.
The Code First approach is pretty powerful since it builds on EF4, but at the moment, to get a non-trivial existing domain model mapped to a database and, more importantly, to let your domain drive your database structure, you need plenty of spare time and the patience of an angel :)
This first post is an introduction. I'll tell you a little about how Entity Framework has evolved since its first release, which shipped with .NET 3.5 SP1. We'll see what the heck Code First is, why we need it and what advantages it brings to the table.
Microsoft ADO.NET Entity Framework was announced at the TechEd 2006 conference and released two years later, in August 2008, as part of Visual Studio 2008 SP1 and .NET 3.5 SP1. So far it has had two major releases, the second on April 12, 2010, along with .NET 4 and Visual Studio 2010.
EF 1 offered no way to embrace Persistence Ignorance, and I honestly don't know whether this release was meant only for developers who are happy to let data access details leak into their business model. The only thing you can do in v1 is generate a conceptual model from an existing database. Every entity you want EF to know about must derive from EntityObject in order to support lazy loading and change tracking. Developers using domain-driven design were left out of this release for the same reason: how can a smart DDD guy allow persistence-aware classes into his perfectly refactored and unit-tested object model? No way. So, when I realized what this release was about, my first thought was: Microsoft rocks again.
Thankfully, in v2 (called EF4 to match the .NET 4 release) we got the ability to generate a database from a model, use the Model First approach and even write our own POCO classes. Great, but there was still no way to control both the classes that represent your entities and the way they map to database tables. For us control freaks this was no good. That's where Code First (or Code Only, depending on which name you prefer) steps in.
Those of you who have used NHibernate (another ORM for the .NET platform, quite mature and robust, although its learning curve is rather steep, I should say) probably know of a project called Fluent NHibernate. It lets us write our mappings in a strongly typed manner, using lambda expressions that the compiler type-checks. For example, this is how the mapping for a simple Book class looks in Fluent NHibernate:
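using System;
using System.Collections.Generic;
using FluentNHibernate.Mapping;

public class Book
{
    public virtual Int32 Id { get; private set; }
    public virtual String Name { get; set; }
    public virtual String ISBN { get; set; }
    // virtual, so NHibernate can lazy-load the collection through a proxy
    public virtual IList<Author> Authors { get; set; }
}

// Minimal Author entity, assumed here so the sample compiles (map it with its own ClassMap<Author>)
public class Author
{
    public virtual Int32 Id { get; private set; }
    public virtual String Name { get; set; }
}

// The mapping lives in a separate class, so Book stays free of persistence details
public class BookMap : ClassMap<Book>
{
    public BookMap()
    {
        Id(b => b.Id);                  // primary key
        Map(b => b.Name);               // plain column mappings
        Map(b => b.ISBN);
        HasManyToMany(b => b.Authors);  // many-to-many through a join table
    }
}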
The whole point of the project is to give developers as much freedom as possible to define and fine-tune their mappings the way they want. Want to name tables your own way? Just implement the right interface. Want to name relationships according to your company's rules? Got it. Persistence details don't leak into the business layer, which is a good thing. It's also better for Separation of Concerns: we can easily unit-test and reuse both the mappings and the model, and since the mapping layer can be thrown away and replaced with something else at any time, our system stays quite flexible. Awesome, isn't it?
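To make the "just implement the right interface" idea a bit more concrete, here is a minimal sketch of convention over configuration in plain C#. The `ITableNameConvention` interface and the classes around it are hypothetical, invented for illustration; they are not Fluent NHibernate's real convention API. The sketch only shows the general shape of the idea: the framework applies a default naming rule unless you hand it your own implementation.

```csharp
using System;

// Hypothetical convention contract: given an entity type, decide its table name.
// Illustration only, NOT Fluent NHibernate's actual interface.
public interface ITableNameConvention
{
    string GetTableName(Type entityType);
}

// Default rule used when nobody overrides it: table name = class name.
public class DefaultTableNameConvention : ITableNameConvention
{
    public string GetTableName(Type entityType)
    {
        return entityType.Name;
    }
}

// Example of a company-specific rule: every table gets a "tbl_" prefix.
public class PrefixedTableNameConvention : ITableNameConvention
{
    public string GetTableName(Type entityType)
    {
        return "tbl_" + entityType.Name;
    }
}

public static class TableNaming
{
    // Uses the supplied convention, or falls back to the default one.
    public static string Resolve(Type entityType, ITableNameConvention convention = null)
    {
        ITableNameConvention effective = convention ?? new DefaultTableNameConvention();
        return effective.GetTableName(entityType);
    }
}
```

With the `Book` class from above, `TableNaming.Resolve(typeof(Book))` would return `Book`, while passing a `PrefixedTableNameConvention` would return `tbl_Book`. The point is that the override is a small, separate class: neither the entity nor the rest of the mapping code has to change.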
So what does the Code First approach have to do with all this? It lets developers keep their model and their mappings separate from each other. To achieve this it uses a fluent interface, pretty much like Fluent NHibernate does, except that EF doesn't give us nearly as much freedom. I've long suspected that Microsoft doesn't trust the developers who use its technologies :)
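The strongly typed, lambda-based mapping style shared by Fluent NHibernate and Code First can be sketched in a few lines of plain C#. The `EntityMap<T>` class below is hypothetical and is not the API of either framework; it only illustrates the underlying trick: the lambda arrives as an expression tree, so the mapper can read the property name out of it instead of relying on a magic string.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

// Hypothetical, minimal fluent mapping builder (not EF's or Fluent NHibernate's API).
// It records the names of the properties selected through lambdas like b => b.Name.
public class EntityMap<T>
{
    private readonly List<string> _mappedProperties = new List<string>();

    public IList<string> MappedProperties
    {
        get { return _mappedProperties; }
    }

    // Accepts an expression tree rather than a compiled delegate,
    // so the member being accessed can be inspected at runtime.
    public EntityMap<T> Property<TProperty>(Expression<Func<T, TProperty>> selector)
    {
        MemberExpression member = selector.Body as MemberExpression;
        if (member == null)
            throw new ArgumentException("Expected a member access such as x => x.Name.", "selector");

        _mappedProperties.Add(member.Member.Name);
        return this; // returning this is what makes the calls chainable
    }
}
```

For example, `new EntityMap<Book>().Property(b => b.Name).Property(b => b.ISBN)` would record `Name` and `ISBN`, and a typo such as `b => b.Nmae` would be rejected by the compiler instead of failing at runtime. Making `Property` generic in `TProperty` (rather than taking `Func<T, object>`) also avoids the extra conversion node the compiler inserts when a value-type property such as `Id` is boxed to `object`.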
As you may have guessed, a framework like Fluent NHibernate puts some constraints on how we declare our POCO classes, and so does Code First. Here they are:
1) If you want to store a piece of information in the database, you should declare it as a public property.
2) Every type you want to map should have a public parameterless constructor.
What the hell? Why should EF put constraints on my domain model? Am I not the developer who decides what is to be done?
The reason is actually very simple. To provide change tracking and lazy loading, the framework generates proxy classes at runtime. A proxy simply derives from your entity class and does some tricky things :) in the members it overrides, which is why your properties need to be public (and, for the proxy to be able to override them, virtual). The proxy also has to create an instance of your class without knowing what arguments to pass, hence the parameterless constructor. Microsoft implemented things this way because in v1 every entity was forced to derive from EntityObject to make room for these features, but Code First is all about Persistence Ignorance, right? So the EntityObject base class and the INotifyPropertyChanged interface were replaced with gentler conventions such as public properties.
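The proxy mechanism described above can be illustrated with a hand-written class. This is a minimal sketch, not how EF actually builds its proxies: `Customer`, `CustomerChangeTrackingProxy` and `ProxyFactory` are hypothetical names, and a real ORM emits its proxy types dynamically at runtime with very different internals. What the sketch does show is why both constraints exist: the proxy can only intercept the setter because the property is public and virtual, and the factory can only create the proxy because a parameterless constructor is available (the `where T : new()` constraint).

```csharp
using System;
using System.Collections.Generic;

// A plain POCO entity: public virtual properties and a public parameterless constructor.
public class Customer
{
    public Customer() { }

    public virtual int Id { get; set; }
    public virtual string Name { get; set; }
}

// Hand-written stand-in for the kind of subclass an ORM generates at runtime.
// Only Name is intercepted here to keep the sketch short.
public class CustomerChangeTrackingProxy : Customer
{
    private readonly HashSet<string> _modified = new HashSet<string>();

    public ISet<string> ModifiedProperties
    {
        get { return _modified; }
    }

    public override string Name
    {
        get { return base.Name; }
        set
        {
            // Record the change only when the value actually differs.
            if (base.Name != value)
                _modified.Add("Name");
            base.Name = value;
        }
    }
}

// Hypothetical factory: "new()" mirrors the parameterless-constructor rule,
// since the framework cannot know which arguments your constructor would need.
public static class ProxyFactory
{
    public static T Create<T>() where T : new()
    {
        return new T();
    }
}
```

Code that only knows about `Customer` keeps working unchanged (`Customer c = ProxyFactory.Create<CustomerChangeTrackingProxy>(); c.Name = "Jack";`), while the proxy quietly records that `Name` was modified. In this sketch, lazy loading would follow the same pattern, with an overridden getter fetching data on first access instead of a setter recording changes.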
What advantages does Code First have? Why should one use it? So far I have found these two:
1) Separation of Concerns: keeping the business model separate from the database mapping is great. If tomorrow your boss comes in and says "Jack, we have a few new business requirements we need to put into our system", you'll be able to modify the business model without having to deal with tightly coupled business-and-database code. You will still need to update the mappings and the database structure afterwards, of course, but the different concerns stay nicely apart.
2) More control over your mappings. With EF v1, you had to rely on many default conventions you couldn't even override. This changes now, because you can say "I want the table for this entity to be named like that", and so on. If you need control, you've got it now.
What about disadvantages? Can a Microsoft technology have disadvantages at all? :)
1) Few extension points in the framework. Fluent NHibernate takes this to the extreme with its convention-over-configuration approach: the framework assumes a lot about your code, but if you don't like the default behavior, you can override pretty much everything. In Code First we don't have that luxury, at least not yet.
2) As I mentioned before, the documentation is very poor: a few articles and examples from the ADO.NET team on their blog, none of the XML comments we are used to seeing in Microsoft code, and no MSDN documentation.
3) The developers on the ADO.NET team have made some very non-obvious choices for the default behavior. I'll cover them in upcoming posts; just believe me for now.
I've already mentioned the book you may want to pick up if you're not that familiar with EF in general. So what will you need to get up and running with this series?
I'm going to use .NET 4, VS2010 Ultimate and SQL Server 2008 for the demos. You also need to install CTP4, a package that contains several Code First-related DLLs. I'm using CTP4 for the demos because it's the latest release available right now.
VS Ultimate and SQL Server 2008 are not required; VS2010 Express and SQL Server Express will do as well. Please don't use the beta or release candidate versions of VS2010: they come with pre-release builds of .NET 4 and have significant performance issues compared to the release version.
Useful links:
- ADO.NET team blog