Monday, 24 September 2012

Introducing Linq


Linq is short for Language Integrated Query. To see what it offers, think about what makes SQL different from C#. Linq is a set of C# 3.0 language features and library APIs centred on data access.

Imagine we have a list of orders.

using System;
using System.Collections.Generic;
using System.Linq;

class Order
{
    // Auto-implemented properties (new in C# 3.0) generate the backing fields for us.
    public int OrderID { get; set; }
    public int CustomerID { get; set; }
    public double Cost { get; set; }
}

var Orders = new List<Order> {
                         new Order {
                             OrderID = 1,
                             CustomerID = 84,
                             Cost = 159.12
                         },
                         new Order {
                             OrderID = 2,
                             CustomerID = 7,
                             Cost = 18.50
                         },
                         new Order {
                             OrderID = 3,
                             CustomerID = 84,
                             Cost = 2.89
                         }
                     };

Suppose we want a list of the costs of all orders placed by the customer with ID 84. We would probably write something like:
     List<double> Found = new List<double>();
     foreach (Order o in Orders)
     {
        if (o.CustomerID == 84) Found.Add(o.Cost);
     }

If we had the orders stored in a table in a database and we used SQL to query it, we would write something like:
SELECT Cost FROM Orders WHERE CustomerID = 84
Here we have not specified an algorithm, or how to get the data. We have just declared what we want and left the computer to work out how to do it. This style is known as declarative programming.

Linq brings declarative programming features into the C# language.

Let’s see an example. With Linq, the query above can be written as:
var Found = from o in Orders
            where o.CustomerID == 84
            select o.Cost;
You may be thinking at this point, "hey, this looks like SQL but kind of backwards and twisted about a bit". That is a pretty good summary. I suspect many who have written a lot of SQL will find the "select comes last" a little grating at first; the other important thing to remember is that all of the conditions are to be expressed in C# syntax, not SQL syntax. That means "==" for equality testing, rather than "=" in SQL. Thankfully, in most cases that mistake will lead to a compile time error anyway.
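To make the comparison concrete, here is a minimal self-contained sketch that runs the foreach loop and the Linq query side by side over the same three orders. It assumes a current .NET SDK (top-level statements need C# 9 or later), and a named tuple stands in for the Order class only to keep the program short; the variable names are illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// The post's three orders; a named tuple stands in for the Order class.
var orders = new List<(int OrderID, int CustomerID, double Cost)>
{
    (1, 84, 159.12),
    (2, 7, 18.50),
    (3, 84, 2.89),
};

// Imperative version: we spell out how to collect the costs.
var loopResult = new List<double>();
foreach (var order in orders)
{
    if (order.CustomerID == 84) loopResult.Add(order.Cost);
}

// Declarative version: we state what we want and Linq does the looping.
var queryResult = from o in orders
                  where o.CustomerID == 84
                  select o.Cost;

Console.WriteLine(loopResult.SequenceEqual(queryResult)); // True
```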
Here’s the million-dollar question: what does this query output? Nothing! In most cases, evaluation is deferred until the result of the query is actually required, an approach known as lazy (or deferred) evaluation.

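Only when something asks for the results does the query actually run. For example, moving an enumerator to its first element runs the query just far enough to find the first match:
var e = Found.GetEnumerator();
e.MoveNext();
Console.WriteLine(e.Current); // 159.12
In practice, we would normally iterate over all the costs in Found with a foreach loop:
foreach (var cost in Found)
    Console.WriteLine(cost);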

One implication of lazy evaluation is that a query expression is re-evaluated each time it is enumerated. In other words, the results of the query are not stored or cached in a collection, but are evaluated lazily and returned on each request. The advantage is that changes in the data are automatically reflected the next time the query is enumerated.
If you want to force immediate evaluation and cache the results in a collection, call the ToArray() or ToList() method.
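The following sketch illustrates the difference between a deferred query and a ToList() snapshot: it adds an order after both have been created and then counts the results of each. It assumes modern C# (top-level statements), uses a named tuple in place of the Order class, and the variable names are illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Named tuples stand in for the post's Order objects.
var orders = new List<(int OrderID, int CustomerID, double Cost)>
{
    (1, 84, 159.12),
    (2, 7, 18.50),
    (3, 84, 2.89),
};

// Deferred: this only describes the query; nothing runs yet.
var deferred = from o in orders
               where o.CustomerID == 84
               select o.Cost;

// Immediate: ToList() runs the query now and caches the results.
List<double> snapshot = deferred.ToList();

// Add another order for customer 84 after both were created.
orders.Add((4, 84, 10.00));

int deferredCount = deferred.Count(); // re-runs the query, so it sees order 4
int snapshotCount = snapshot.Count;   // the cached list is unaffected

Console.WriteLine($"Deferred query: {deferredCount}, ToList snapshot: {snapshotCount}");
// Deferred query: 3, ToList snapshot: 2
```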
A Few More Simple Queries
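Anonymous Types
Instead of selecting a single value, a query can create an anonymous type to return several values at once, including computed ones: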
var Found = from o in Orders
    where o.CustomerID == 84
    select new {OrderID = o.OrderID, Cost =  o.Cost,  CostWithTax = o.Cost * 1.1};
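Here is a small runnable version of the anonymous-type query, assuming modern C# (top-level statements) and a named tuple in place of the Order class. Writing "new { o.OrderID, o.Cost }" lets the compiler infer the property names from the expressions, while CostWithTax has to be named explicitly because it is computed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// The post's three orders; a named tuple stands in for the Order class.
var orders = new List<(int OrderID, int CustomerID, double Cost)>
{
    (1, 84, 159.12),
    (2, 7, 18.50),
    (3, 84, 2.89),
};

// Each result is an instance of a compiler-generated anonymous type
// with read-only properties OrderID, Cost and CostWithTax.
var found = from o in orders
            where o.CustomerID == 84
            select new { o.OrderID, o.Cost, CostWithTax = o.Cost * 1.1 };

// var is the only way to refer to the anonymous type in our code.
foreach (var item in found)
    Console.WriteLine($"Order {item.OrderID}: {item.Cost} ({item.CostWithTax:F2} with tax)");
```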


Ordering
var Found = from o in Orders
                        where o.CustomerID == 84
                        orderby o.Cost ascending
                        select new { o.OrderID, o.Cost };
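orderby also accepts the keyword descending and several comma-separated sort keys (the compiler turns each key after the first into a call to ThenBy or ThenByDescending). A minimal sketch, assuming modern C# and a named tuple in place of the Order class:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var orders = new List<(int OrderID, int CustomerID, double Cost)>
{
    (1, 84, 159.12),
    (2, 7, 18.50),
    (3, 84, 2.89),
};

// The post's query: customer 84's orders, cheapest first.
var cheapestFirst = from o in orders
                    where o.CustomerID == 84
                    orderby o.Cost ascending
                    select new { o.OrderID, o.Cost };

// Two keys: by customer, then most expensive first within each customer.
var byCustomerThenCost = from o in orders
                         orderby o.CustomerID, o.Cost descending
                         select o.OrderID;

Console.WriteLine(string.Join(", ", cheapestFirst.Select(x => x.OrderID))); // 3, 1
Console.WriteLine(string.Join(", ", byCustomerThenCost));                   // 2, 1, 3
```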

 

Joins
So far we have had only one type of object to run our queries over. However, real life is usually more complex than this. For this example, let's introduce another class named Customer.
class Customer
{
    public int CustomerID { get; set; }
    public string Name { get; set; }
    public string Email { get; set; }
}
var Customers = new List<Customer> {
                    new Customer {
                        CustomerID = 7,
                        Name = "Emma",
                        Email = "emz0r@worreva.com"
                    },
                    new Customer {
                        CustomerID = 84,
                        Name = "Pedro",
                        Email = "pedro@cerveza.es"
                    },
                    new Customer {
                        CustomerID = 102,
                        Name = "Vladimir",
                        Email = "vladimir@pivo.ru"
                    }
                };


var Found = from o in Orders
                        join c in Customers on o.CustomerID equals c.CustomerID
                        select new { c.Name, o.OrderID, o.Cost };
            // Display results.
            foreach (var Result in Found)
                Console.WriteLine(Result.Name + " spent " +
                    Result.Cost.ToString() + " in order " +
                    Result.OrderID.ToString());

Output:
Pedro spent 159.12 in order 1
Emma spent 18.5 in order 2
Pedro spent 2.89 in order 3
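The join shown here is an inner join: customers without orders and orders without a matching customer are both left out, which is why Vladimir does not appear in the output. This self-contained sketch checks that, assuming modern C# and using named tuples in place of the Order and Customer classes (Email is omitted).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var orders = new List<(int OrderID, int CustomerID, double Cost)>
{
    (1, 84, 159.12),
    (2, 7, 18.50),
    (3, 84, 2.89),
};

// Named tuples stand in for the Customer class.
var customers = new List<(int CustomerID, string Name)>
{
    (7, "Emma"),
    (84, "Pedro"),
    (102, "Vladimir"),
};

var found = from o in orders
            join c in customers on o.CustomerID equals c.CustomerID
            select new { c.Name, o.OrderID, o.Cost };

// Results follow the order of the outer sequence (orders).
var names = found.Select(r => r.Name).ToList();
Console.WriteLine(string.Join(", ", names)); // Pedro, Emma, Pedro

// Vladimir has no orders, so the inner join leaves him out entirely.
bool vladimirIncluded = names.Contains("Vladimir");
Console.WriteLine(vladimirIncluded); // False
```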
Getting All Combinations With Multiple "from"s
It is possible to write a query that gets every combination of the objects from two collections. This is achieved by using the "from" keyword multiple times.
var Found = from o in Orders
                        from c in Customers
                        select new { c.Name, o.OrderID, o.Cost };
You can think of "from" as being a little bit like a "foreach", and of multiple uses of "from" as nested "foreach" loops: we get every possible combination of the objects from the two collections. Using the same display loop as in the join example, the output will be:
Emma spent 159.12 in order 1
Pedro spent 159.12 in order 1
Vladimir spent 159.12 in order 1
Emma spent 18.5 in order 2
Pedro spent 18.5 in order 2
Vladimir spent 18.5 in order 2
Emma spent 2.89 in order 3
Pedro spent 2.89 in order 3
Vladimir spent 2.89 in order 3
This is not especially useful. You may have spotted, though, that you could combine "where" with the two "from"s to get the same result as the join:

var Found = from o in Orders
                        from c in Customers
                        where o.CustomerID == c.CustomerID
                        select new { c.Name, o.OrderID, o.Cost };
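However, avoid this style for anything but small collections. It generates every possible combination and tests each one against the "where" clause, which then throws most of them away: with n orders and m customers, that is n × m comparisons. A join, on the other hand, matches the keys directly (in Linq to Objects it builds a hash-based lookup of the inner collection), so it never produces the unwanted combinations in the first place.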
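To see the cost difference, this sketch counts how many times the "where" condition is evaluated when filtering the cross product, and checks that the result matches the join. SameCustomer is a hypothetical helper added only to count comparisons; the sketch assumes modern C# (top-level statements, local functions, tuples).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var orders = new List<(int OrderID, int CustomerID, double Cost)>
{
    (1, 84, 159.12),
    (2, 7, 18.50),
    (3, 84, 2.89),
};
var customers = new List<(int CustomerID, string Name)>
{
    (7, "Emma"),
    (84, "Pedro"),
    (102, "Vladimir"),
};

int comparisons = 0;

// Hypothetical helper: counts every (order, customer) pair the where clause tests.
bool SameCustomer(int orderCustomerId, int customerId)
{
    comparisons++;
    return orderCustomerId == customerId;
}

// Cross product filtered by where: every pair is generated and tested.
var viaWhere = (from o in orders
                from c in customers
                where SameCustomer(o.CustomerID, c.CustomerID)
                select (c.Name, o.OrderID)).ToList();

// join matches on keys instead of testing every pair.
var viaJoin = (from o in orders
               join c in customers on o.CustomerID equals c.CustomerID
               select (c.Name, o.OrderID)).ToList();

Console.WriteLine(viaWhere.SequenceEqual(viaJoin)); // True
Console.WriteLine(comparisons);                     // 9 (3 orders x 3 customers)
```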
Grouping
var OrdersByCustomer = from o in Orders
                                   group o by o.CustomerID;
            // Iterate over the groups.
            foreach (var Cust in OrdersByCustomer)
            {
                // About the customer...
                Console.WriteLine("Customer with ID " + Cust.Key.ToString() +
                    " ordered " + Cust.Count().ToString() + " items.");
                // And what they ordered.
                foreach (var Item in Cust)
                    Console.WriteLine("    ID: " + Item.OrderID.ToString() +
                        " Cost: " + Item.Cost.ToString());
            }
Output:
Customer with ID 84 ordered 2 items.
    ID: 1 Cost: 159.12
    ID: 3 Cost: 2.89
Customer with ID 7 ordered 1 items.
    ID: 2 Cost: 18.5
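Each result of "group ... by ..." is an IGrouping<TKey, TElement>: it carries the Key and is itself a sequence of the grouped elements. Groups come out in the order in which their keys first appear in the source, which is why customer 84 is listed before customer 7. The sketch below spells out that type and also shows that the expression before "by" controls what goes into each group. It assumes modern C# and uses a named tuple in place of the Order class.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var orders = new List<(int OrderID, int CustomerID, double Cost)>
{
    (1, 84, 159.12),
    (2, 7, 18.50),
    (3, 84, 2.89),
};

// Grouping whole orders: each group has a Key and contains the orders themselves.
IEnumerable<IGrouping<int, (int OrderID, int CustomerID, double Cost)>> byCustomer =
    from o in orders
    group o by o.CustomerID;

foreach (var customerGroup in byCustomer)
    Console.WriteLine($"Customer {customerGroup.Key}: {customerGroup.Count()} order(s)");
// Customer 84: 2 order(s)
// Customer 7: 1 order(s)

// Grouping only the costs: the expression before "by" picks each group's elements.
var costsByCustomer = from o in orders
                      group o.Cost by o.CustomerID;

// Sum each group's costs into a dictionary keyed by customer ID.
Dictionary<int, double> totals = costsByCustomer.ToDictionary(g => g.Key, g => g.Sum());
```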
Query Continuations
At this point you might be wondering if you can follow a "group ... by ..." with a "select". The answer is yes, but not directly. Both "group ... by ..." and "select" are special in so far as they produce a result. You must terminate a Linq query with one or the other. If you try to do something like:

var CheapOrders = from o in Orders
                  where o.Cost < 10;
You will get a compile-time error. Since both "select" and "group ... by ..." terminate a query, you need a way of taking the results and using them as the input to another query. This is called a query continuation, and the keyword for it is "into".

In the following example we take the result of grouping orders by customer and then use a select to return an anonymous type containing the CustomerID and the number of orders that the customer has placed.

var OrderCounts = from o in Orders
                              group o by o.CustomerID into g
                              select new
                              {
                                  CustomerID = g.Key,
                                  TotalOrders = g.Count()
                              };
Notice the identifier "g", which we introduce after the keyword "into". It represents an element of the collection produced by the previous query, and we use it in the select clause. Remember that each element of the collection we are querying in this second query is actually a collection itself, since this is what "group ... by ..." produces. Therefore, we can call Count() on it to get the number of elements, which is the number of orders per customer. We grouped by the CustomerID field, so that is our Key.
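The C# specification describes a query continuation as shorthand: the part before "into" becomes a nested query that serves as the source of the part after it. This sketch writes the same query both ways and checks that they agree, assuming modern C# and a named tuple in place of the Order class.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var orders = new List<(int OrderID, int CustomerID, double Cost)>
{
    (1, 84, 159.12),
    (2, 7, 18.50),
    (3, 84, 2.89),
};

// Query continuation, as in the post.
var orderCounts = from o in orders
                  group o by o.CustomerID into g
                  select new { CustomerID = g.Key, TotalOrders = g.Count() };

// The same query with the grouping written as a nested source query.
var nested = from g in (from o in orders
                        group o by o.CustomerID)
             select new { CustomerID = g.Key, TotalOrders = g.Count() };

foreach (var row in orderCounts)
    Console.WriteLine($"Customer {row.CustomerID}: {row.TotalOrders} order(s)");
// Customer 84: 2 order(s)
// Customer 7: 1 order(s)

// Anonymous types compare by value, so the two result sequences can be compared directly.
Console.WriteLine(orderCounts.SequenceEqual(nested)); // True
```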

Query continuations can be used to chain together as many selection and grouping queries as you need in whatever order you need.
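For example, this sketch (modern C#, a named tuple in place of the Order class, illustrative names) groups the orders, summarises each group with a select, continues from those summaries with a second "into", and then filters and sorts them:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var orders = new List<(int OrderID, int CustomerID, double Cost)>
{
    (1, 84, 159.12),
    (2, 7, 18.50),
    (3, 84, 2.89),
};

// group ... into g, then select ... into summary, then keep querying the summaries.
var repeatCustomers = from o in orders
                      group o by o.CustomerID into g
                      select new
                      {
                          CustomerID = g.Key,
                          TotalOrders = g.Count(),
                          Spent = g.Sum(x => x.Cost)
                      } into summary
                      where summary.TotalOrders > 1
                      orderby summary.Spent descending
                      select summary.CustomerID;

Console.WriteLine(string.Join(", ", repeatCustomers)); // 84
```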
Under The Hood
Now that we have looked at the practicalities of using Linq, I am going to spend a little time looking at how it works. Don't worry if you don't understand everything in this section; it's here for those who like to dig a little deeper.

Throughout the series I have talked about how all of the language features introduced in C# 3.0 somehow help to make Linq possible. While anonymous types have shown up pretty explicitly and you can see from the lack of type annotations we have been writing that there is some type inference going on, where are the extension methods and lambda expressions?

There's a concept in language design and implementation called "syntactic sugar". It describes syntax that isn't compiled directly, but is first transformed into other, more primitive syntax, which is then compiled as normal. This is exactly what happens with Linq: your queries are transformed into a sequence of method calls and lambda expressions.

The C# 3.0 specification goes into great detail about these transformations. In practice, you probably don't need to know about this, but let's look at one example to help us understand what is going on. Our simple query from earlier:

var Found = from o in Orders
                        where o.CustomerID == 84
                        select o.Cost;
After transformation by the compiler, this becomes:
var Found = Orders.Where(o => o.CustomerID == 84)
                        .Select(o => o.Cost);
And this is what actually gets compiled. Here the use of lambda expressions becomes clear. The lambda passed to the Where method is called on each element of Orders to decide whether it belongs in the result. Where does not build an intermediate collection, though: it returns a lazy sequence, and Select wraps that in another lazy sequence that applies its own lambda to each element that makes it through the filter. That final sequence is what gets assigned to Found, and nothing is evaluated until something enumerates it. Beautiful, huh?
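You can watch this laziness happen by adding logging to the lambdas. In the sketch below (modern C#, a named tuple in place of the Order class, logging added purely for illustration), nothing is logged until the foreach starts, and each order flows through Where and then Select one at a time rather than in separate passes over intermediate collections.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var orders = new List<(int OrderID, int CustomerID, double Cost)>
{
    (1, 84, 159.12),
    (2, 7, 18.50),
    (3, 84, 2.89),
};

var log = new List<string>();

// Method-call form of the post's query, with a log entry added to each lambda.
var found = orders
    .Where(o =>
    {
        log.Add($"Where checks order {o.OrderID}");
        return o.CustomerID == 84;
    })
    .Select(o =>
    {
        log.Add($"Select projects order {o.OrderID}");
        return o.Cost;
    });

Console.WriteLine($"Log entries before enumerating: {log.Count}"); // 0

foreach (var cost in found)
    log.Add("foreach receives a cost");

// Prints the interleaved sequence of calls, one per line.
log.ForEach(Console.WriteLine);
```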

Finally, a note on extension methods. Both Where and Select, along with a range of other methods, have been implemented as extension methods in the System.Linq.Enumerable class. The type they extend (the type of their "this" parameter) is IEnumerable<T>, meaning that any collection that implements that interface can be used with Linq. Without extension methods, it would not have been possible to achieve this level of code reuse.
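As a small illustration, the sketch below runs the same Where call over an array, a List<T>, a HashSet<T> and a hand-written iterator (a hypothetical local function named Generate), and then makes the same call as an ordinary static method on Enumerable, which is what the extension-method syntax stands for. It assumes modern C# (top-level statements); the variable names are illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int[] array = { 5, 12, 7, 20 };
var list = new List<int> { 5, 12, 7, 20 };
var set = new HashSet<int> { 5, 12, 7, 20 };

// Hypothetical hand-written iterator: any IEnumerable<int> works with Where.
IEnumerable<int> Generate()
{
    yield return 5;
    yield return 12;
    yield return 7;
    yield return 20;
}

Func<int, bool> isBig = n => n > 10;

// One extension method, many collection types.
var fromArray = array.Where(isBig).ToList();
var fromList = list.Where(isBig).ToList();
var fromSet = set.Where(isBig).OrderBy(n => n).ToList(); // HashSet order is not guaranteed
var fromIterator = Generate().Where(isBig).ToList();

// Extension-method syntax is shorthand for this ordinary static call.
var staticCall = Enumerable.Where(array, isBig).ToList();

Console.WriteLine(string.Join(", ", fromArray)); // 12, 20
```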

Conclusion
Linq brings declarative programming to the C# language and will refine and unify the way that we work with objects, databases, XML and whatever else anyone writes the appropriate extension methods for. It builds upon the language features that we have already seen in the previous parts of the series, while hiding some of them away under syntactic sugar. Although the query language differs from SQL in a number of ways, there are enough similarities to make knowledge of SQL useful to those who have it. However, its usefulness goes far beyond providing yet another way to work with databases.
