LINQ in depth: advanced features

We continue to explore how LINQ transformed the way we access data in .NET

Wednesday, November 25, 2020

In the previous article, we looked at the evolution of the C# language and the innovations introduced to support LINQ, and began to discover LINQ’s main features.

Let’s now look at some peculiarities of the two syntaxes that LINQ makes available to us which, as we said, are practically equivalent:

// Method syntax
var filteredCustomersAge = customers.Where(c => c.Age > 19 && c.Age < 36);
 
// Query syntax
var filteredCustomersAge =
 from customer in customers
 where customer.Age > 19 && customer.Age < 36
 select customer;

In the Query Syntax version, the compiler translates the where clause into a call to the same .Where() extension method used in the Method Syntax version. Because the final select customer returns each element unchanged, the compiler drops it, so both versions end up as the same Where() call.
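To check that the two forms really are interchangeable, here is a minimal self-contained sketch. The sample data is hypothetical, and customers are modeled as named tuples rather than the article’s Customer class so that the snippet needs no extra type declarations:

```csharp
using System;
using System.Linq;

// Hypothetical sample data: named tuples stand in for the article's Customer class.
var customers = new[]
{
    (CustomerName: "Alice", Age: 18),
    (CustomerName: "Bob", Age: 24),
    (CustomerName: "Carol", Age: 35),
    (CustomerName: "Dave", Age: 55),
};

// Method Syntax: an explicit call to Enumerable.Where.
var methodResult = customers.Where(c => c.Age > 19 && c.Age < 36);

// Query Syntax: the compiler rewrites this into the same Where call
// (the trivial "select customer" is dropped).
var queryResult =
    from customer in customers
    where customer.Age > 19 && customer.Age < 36
    select customer;

Console.WriteLine(string.Join(", ", queryResult.Select(c => c.CustomerName))); // Bob, Carol
Console.WriteLine(methodResult.SequenceEqual(queryResult));                    // True
```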

However, some queries must be written with Method Syntax, because Query Syntax has no keywords for operators such as Count() or Max().

For example, to express a query that retrieves the number of items that match a specified condition:

var filteredCustomersAgeCount = customers
 .Where(c => c.Age > 19 && c.Age < 36)
 .Count(); // Count: 2

Or a query that retrieves the element with the highest value in a source sequence:

var maxAge = customers.Select(c => c.Age).Max(); // Age: 55
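Both Count() and Max() also have overloads that take the predicate or selector directly, which saves the extra Where() or Select() call. A minimal sketch on hypothetical tuple data:

```csharp
using System;
using System.Linq;

// Hypothetical sample data standing in for the article's customers.
var customers = new[]
{
    (CustomerName: "Alice", Age: 18),
    (CustomerName: "Bob", Age: 24),
    (CustomerName: "Carol", Age: 35),
    (CustomerName: "Dave", Age: 55),
};

// Count with a separate Where, as in the article...
int countWithWhere = customers.Where(c => c.Age > 19 && c.Age < 36).Count();
// ...and with the Count(predicate) overload.
int countWithPredicate = customers.Count(c => c.Age > 19 && c.Age < 36);

// Max over a projected sequence, and the Max(selector) overload.
int maxAgeWithSelect = customers.Select(c => c.Age).Max();
int maxAgeWithSelector = customers.Max(c => c.Age);

Console.WriteLine($"{countWithWhere} {countWithPredicate}");   // 2 2
Console.WriteLine($"{maxAgeWithSelect} {maxAgeWithSelector}"); // 55 55
```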

However, it’s always possible to apply Method Syntax to the result of a Query Syntax expression, as in the next example. Here, we use Query Syntax to retrieve the ages of customers older than 19 and younger than 36; then, using Method Syntax, we take the highest of those ages:

int maxAgeInRange = 
 (from customer in customers
 where customer.Age > 19 && customer.Age < 36
 select customer.Age).Max();
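One detail worth knowing when mixing the two syntaxes this way: on a sequence of int, Max() throws an InvalidOperationException if the filter matches nothing. A minimal sketch, with hypothetical data, of a common workaround: casting to int? selects the nullable overload of Max(), which returns null instead:

```csharp
using System;
using System.Linq;

int[] ages = { 18, 55, 62 }; // hypothetical: nobody is between 19 and 36

// Casting to int? selects Max(IEnumerable<int?>),
// which returns null for an empty sequence instead of throwing.
int? maxAgeInRange =
    (from age in ages
     where age > 19 && age < 36
     select (int?)age).Max();

Console.WriteLine(maxAgeInRange.HasValue ? maxAgeInRange.ToString() : "no customer in range");

// Without the cast, the same query throws on the empty result.
bool threw = false;
try
{
    _ = (from age in ages where age > 19 && age < 36 select age).Max();
}
catch (InvalidOperationException)
{
    threw = true;
}
Console.WriteLine(threw); // True
```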

Let’s now look at some examples where Query Syntax is more convenient.

Thanks to the let keyword, we can store an intermediate result and reuse it within the query. Suppose we have a GetYearsOfFidelity() method that returns how many years a customer has been with us:

var querySyntaxGoldenCustomers =
 from customer in customers
 let yearsOfFidelity = GetYearsOfFidelity(customer)
 where yearsOfFidelity > 5
 orderby yearsOfFidelity
 select customer.CustomerName;
 
var methodSyntaxGoldenCustomers = customers
 .Select(customer => new // anonymous type
 {
  YearsOfFidelity = GetYearsOfFidelity(customer),
  Name = customer.CustomerName
 })
 .Where(x => x.YearsOfFidelity > 5)
 .OrderBy(x => x.YearsOfFidelity)
 .Select(x => x.Name);

As we can see, the Query Syntax version is more concise. The Method Syntax version requires us to create an anonymous type and carry it through the rest of the query; the compiler does essentially the same thing when it translates let, but hides it from us.
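A runnable sketch of both versions, under stated assumptions: the customer data is hypothetical, and GetYearsOfFidelity is a hypothetical helper that measures years against a fixed reference year (2020) so the output is deterministic:

```csharp
using System;
using System.Linq;

// Hypothetical data: each customer records the year they joined.
var customers = new[]
{
    (CustomerName: "Alice", JoinYear: 2010),
    (CustomerName: "Bob", JoinYear: 2018),
    (CustomerName: "Carol", JoinYear: 2013),
};

// Hypothetical helper: years of fidelity measured against a fixed year.
int GetYearsOfFidelity((string CustomerName, int JoinYear) customer) => 2020 - customer.JoinYear;

// Query Syntax: "let" stores the computed value for the following clauses.
var querySyntax =
    from customer in customers
    let yearsOfFidelity = GetYearsOfFidelity(customer)
    where yearsOfFidelity > 5
    orderby yearsOfFidelity
    select customer.CustomerName;

// Method Syntax: an anonymous type carries the computed value along.
var methodSyntax = customers
    .Select(customer => new { YearsOfFidelity = GetYearsOfFidelity(customer), Name = customer.CustomerName })
    .Where(x => x.YearsOfFidelity > 5)
    .OrderBy(x => x.YearsOfFidelity)
    .Select(x => x.Name);

Console.WriteLine(string.Join(", ", querySyntax));  // Carol, Alice
Console.WriteLine(string.Join(", ", methodSyntax)); // Carol, Alice
```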

With multiple data sources, Query Syntax is probably the better choice, because we can use the from keyword multiple times and the code is more self-explanatory:

var rows = Enumerable.Range(1, 3); //1,2,3
var columns = new string[] { "A", "B", "C" };
 
var querySyntax = from row in rows
 from col in columns
 select $"cell [{row}, {col}]";
 
var methodSyntax = rows.SelectMany(row => columns, (r, c) => $"cell [{r}, {c}]");
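The two versions produce the same 3 × 3 combinations, because the compiler translates the second from clause into a SelectMany() call like the one written by hand. A self-contained sketch that checks this:

```csharp
using System;
using System.Linq;

var rows = Enumerable.Range(1, 3);           // 1, 2, 3
var columns = new string[] { "A", "B", "C" };

// Two "from" clauses: the compiler turns the second one into a SelectMany call.
var querySyntax = from row in rows
                  from col in columns
                  select $"cell [{row}, {col}]";

var methodSyntax = rows.SelectMany(row => columns, (r, c) => $"cell [{r}, {c}]");

Console.WriteLine(querySyntax.Count());                     // 9
Console.WriteLine(querySyntax.First());                     // cell [1, A]
Console.WriteLine(querySyntax.Last());                      // cell [3, C]
Console.WriteLine(querySyntax.SequenceEqual(methodSyntax)); // True
```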

Let’s now examine code that combines Object Initializers with LINQ. Object Initializers are typically used in query expressions to project each element of a data source into a new data type. Using our Customer class, suppose we have a data source called IncomingOrders and that, for each order with an OrderSize greater than 100, we need to create a new Customer object based on that order:

var largeOrderCustomers = from o in IncomingOrders
 where o.OrderSize > 100
 select new Customer { CustomerName = o.CName, CustomerID = o.CId };

The data source can have properties that the Customer class does not have, such as OrderSize, but thanks to the Object Initializer, the data returned by the query is shaped into the desired type in a single operation. The result is an IEnumerable<Customer> containing the new Customer objects.

In this and in the previous article on LINQ, we talked about the features supporting this powerful framework and saw small examples of its use. But why did we talk about anonymous delegates?

Let’s analyze one of the queries previously seen:

var filteredCustomersAge = customers.Where(c => c.Age > 19 && c.Age < 36);

The Where method takes a condition as a parameter, in this case expressed by a lambda expression, which is a function that takes as input a Customer object and returns a Boolean:

Func<Customer, bool> func = c => c.Age > 19 && c.Age < 36;
 
var filteredCustomersAge = customers.Where(func);

But what is that function actually?

That function is a delegate: a reference to executable code. The LINQ extension methods for IEnumerable<T> accept a delegate as a parameter, whether it is an anonymous function, such as the lambda expression in the previous example, or an explicitly defined method:

bool IsInAgeRange(Customer customer)
{
 return customer.Age > 19 && customer.Age < 36;
}

var filteredCustomersAge = customers.Where(IsInAgeRange);
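The three ways of supplying the condition can be compared side by side. In this sketch (hypothetical data, ages only, and a hypothetical named method IsInAgeRange), a lambda, a Func<int, bool> variable, and a named method converted from a method group all reach Where() as the same kind of delegate:

```csharp
using System;
using System.Linq;

int[] ages = { 18, 24, 35, 55 }; // hypothetical sample data

// 1. An anonymous function written inline as a lambda expression.
var withLambda = ages.Where(a => a > 19 && a < 36);

// 2. The same lambda stored in a Func<int, bool> delegate variable.
Func<int, bool> predicate = a => a > 19 && a < 36;
var withFuncVariable = ages.Where(predicate);

// 3. A named method: the method group is converted to Func<int, bool>.
bool IsInAgeRange(int age) => age > 19 && age < 36;
var withNamedMethod = ages.Where(IsInAgeRange);

Console.WriteLine(string.Join(", ", withNamedMethod)); // 24, 35
Console.WriteLine(withLambda.SequenceEqual(withFuncVariable)
                  && withLambda.SequenceEqual(withNamedMethod)); // True
```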

In theory, it would be possible to analyze the IL of the method we pass, work out what it does, and apply the same logic to any underlying data source. But this would be anything but easy.

Fortunately, .NET provides the IQueryable<T> interface, which derives from IEnumerable<T> and offers the same LINQ extension methods (defined in the System.Linq.Queryable class). However, these methods accept expression trees as parameters instead of delegates.
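The same lambda text therefore binds to a different Where() depending on the source type. This sketch uses AsQueryable(), which wraps an in-memory array (hypothetical data) in an IQueryable<T> backed by the EnumerableQuery provider, so no database is needed to see the difference:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

int[] ages = { 18, 24, 35, 55 }; // hypothetical sample data

// On IEnumerable<int>, Where resolves to Enumerable.Where(Func<int, bool>):
// the lambda is compiled into a delegate.
IEnumerable<int> inMemory = ages.Where(a => a > 19 && a < 36);

// On IQueryable<int>, the same lambda resolves to
// Queryable.Where(Expression<Func<int, bool>>): it becomes an expression tree.
IQueryable<int> queryable = ages.AsQueryable().Where(a => a > 19 && a < 36);

// The IQueryable keeps the whole call as data we can inspect.
var call = (MethodCallExpression)queryable.Expression;
Console.WriteLine(call.Method.DeclaringType); // System.Linq.Queryable
Console.WriteLine(call.Method.Name);          // Where

// The predicate argument is a quoted lambda: a tree, not a delegate.
var lambda = (LambdaExpression)((UnaryExpression)call.Arguments[1]).Operand;
Console.WriteLine(lambda.Body.NodeType);      // AndAlso
```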

Let’s take a closer look at what an expression tree is and how a lambda expression is translated into one.

For the sake of convenience, let’s consider a simple lambda expression similar to the previous one:

Func<int, int, int> function = (a,b) => a + b;

LINQ gives us a simple syntax for this. The first step is to add a using directive for the System.Linq.Expressions namespace:

using System.Linq.Expressions;

Now we can create an expression tree:

Expression<Func<int, int, int>> expression = (a, b) => a + b;

The lambda expression is converted into an expression tree of type Expression<TDelegate> (here, Expression<Func<int, int, int>>), which is not executable code: all the elements of our expression are represented as nodes of a data structure.

We can inspect this structure with the Visual Studio debugger.

An expression tree exposes four main properties:

Body: contains the body of the expression. For our lambda, it is a BinaryExpression representing a + b.
Parameters: contains the parameters of the lambda expression, here a and b.
NodeType: indicates the kind of node, as a value of the ExpressionType enumeration, which distinguishes nodes that return constants, nodes that return parameters, comparisons such as less than (<) and greater than (>), additions (+), and so on. For our lambda, it is ExpressionType.Lambda.
Type: the static type of the expression, in this case Func<int, int, int>.
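A minimal sketch that reads these four properties from the (a, b) => a + b tree and, finally, compiles the tree back into an executable delegate:

```csharp
using System;
using System.Linq.Expressions;

Expression<Func<int, int, int>> expression = (a, b) => a + b;

// NodeType of the lambda itself, and the static Type of the expression.
Console.WriteLine(expression.NodeType); // Lambda
Console.WriteLine(expression.Type);     // System.Func`3[System.Int32,System.Int32,System.Int32]

// Parameters: the two ParameterExpression nodes a and b.
foreach (ParameterExpression p in expression.Parameters)
    Console.WriteLine($"{p.Name}: {p.Type}"); // "a: System.Int32", then "b: System.Int32"

// Body: a BinaryExpression of kind Add whose operands are the parameters.
var body = (BinaryExpression)expression.Body;
Console.WriteLine(body.NodeType);                         // Add
Console.WriteLine(body.Left == expression.Parameters[0]); // True (same node object)

// The tree is data, but it can be compiled into an executable delegate.
Func<int, int, int> add = expression.Compile();
Console.WriteLine(add(2, 3)); // 5
```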

We can therefore access and analyze these properties in our own code.

Now that we have a clearer idea, let’s analyze the code of this LINQ to SQL query:

var query = from c in db.Customers
 where c.City == "Nantes"
 select new { c.City, c.CompanyName };

The query variable returned by this LINQ expression is an IQueryable (specifically, an IQueryable<T> whose element type is an anonymous type), and the IQueryable interface defines a property of type Expression. This property holds the expression tree associated with the IQueryable instance: a data structure equivalent to the code written in the query expression:

public interface IQueryable : IEnumerable
{
 Type ElementType { get; }
 Expression Expression { get; }
 IQueryProvider Provider { get; }
}
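We can see what that Expression property holds without a database. In this sketch, a hypothetical in-memory array of anonymous objects, wrapped with AsQueryable(), stands in for db.Customers; the query is stored as nested method calls, Select() wrapping Where():

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical in-memory stand-in for db.Customers; AsQueryable() gives it
// an IQueryable whose provider (EnumerableQuery) runs the tree locally.
var customers = new[]
{
    new { City = "Nantes", CompanyName = "Acme", Country = "France" },
    new { City = "Paris", CompanyName = "Globex", Country = "France" },
}.AsQueryable();

var query = from c in customers
            where c.City == "Nantes"
            select new { c.City, c.CompanyName };

// The whole query lives in query.Expression as Select(Where(source, ...), ...).
var selectCall = (MethodCallExpression)query.Expression;
var whereCall = (MethodCallExpression)selectCall.Arguments[0];
Console.WriteLine($"{selectCall.Method.Name}({whereCall.Method.Name}(...))"); // Select(Where(...))

// Only when a result is requested does the provider execute the tree.
Console.WriteLine(query.Count()); // 1
```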

But what is the main reason why in some cases, for example with LINQ to SQL, we need the IQueryable interface and expression trees?

A LINQ to SQL query is not actually executed within the C# program!

The query expression is translated into an SQL query that is sent, as a string, to another process: in this case, a SQL Server database. It is much easier to translate a data structure such as an expression tree into SQL statements than to translate IL or raw executable code.

The previous query is therefore first translated into an SQL statement like the one below, which is then executed on the server:

SELECT [t0].[City], [t0].[CompanyName]
FROM [dbo].[Customers] AS [t0]
WHERE [t0].[City] = @p0

A very sophisticated LINQ to SQL algorithm analyzes the different parts of an expression tree and derives a string containing an SQL statement that will return the requested data.
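The real LINQ to SQL translator is far more involved, but the core idea of walking the tree and emitting text can be sketched in a few lines. This toy translator is purely illustrative and not how LINQ to SQL is implemented: it handles only ==, &&, member access and constants, and MakePredicate is a hypothetical helper for getting an expression tree over an anonymous type without declaring a class:

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

// Hypothetical helper: infers T from an anonymous-type sample so we can
// write a predicate over "customer-like" objects without declaring a class.
Expression<Func<T, bool>> MakePredicate<T>(T sample, Expression<Func<T, bool>> expr) => expr;

var predicate = MakePredicate(new { City = "", CompanyName = "" }, c => c.City == "Nantes");

var parameters = new List<object>();

// Toy translator: recursively turns supported nodes into SQL-like text.
string Translate(Expression node) => node switch
{
    BinaryExpression { NodeType: ExpressionType.Equal } b => $"{Translate(b.Left)} = {Translate(b.Right)}",
    BinaryExpression { NodeType: ExpressionType.AndAlso } b => $"({Translate(b.Left)} AND {Translate(b.Right)})",
    MemberExpression member => $"[{member.Member.Name}]",
    ConstantExpression constant => AddParameter(constant.Value),
    _ => throw new NotSupportedException($"Unsupported node: {node.NodeType}")
};

// Constants become numbered parameters (@p0, @p1, ...) rather than being pasted into the SQL text.
string AddParameter(object value)
{
    parameters.Add(value);
    return $"@p{parameters.Count - 1}";
}

string whereClause = "WHERE " + Translate(predicate.Body);
Console.WriteLine(whereClause);   // WHERE [City] = @p0
Console.WriteLine(parameters[0]); // Nantes
```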

Let’s also take a look at the definition of the IEnumerable<T> interface:

public interface IEnumerable<T> : IEnumerable
{
 IEnumerator<T> GetEnumerator();
}

As we can see, it does not contain a property of type Expression.

That’s why the IEnumerable<T> interface is best suited for contexts where you can convert the query expression directly into .NET code that can be executed and where you don’t need to translate it to a string or perform any other complex operation on it.

It is therefore used to query in-memory collections such as List<T> and arrays, and is best suited to LINQ to Objects and LINQ to XML queries.

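Its operators do use deferred execution, but they always run in the client’s memory: operators such as Skip() and Take() cannot be pushed down to a remote data source, so on its own it is not suitable for server-side pagination.

Also, when a database is queried through IEnumerable<T>, the select query runs on the server, all the resulting rows are loaded into client memory, and only then are the filters applied.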
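Both behaviors can be observed with a small sketch. FetchRows is a hypothetical iterator standing in for rows streamed from a database; the counter shows that building the query runs nothing, and that enumerating it pulls every row to the client before Where() discards the ones it does not need:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int rowsFetched = 0;

// Hypothetical stand-in for rows streamed from a database to the client.
IEnumerable<int> FetchRows()
{
    for (int id = 1; id <= 10; id++)
    {
        rowsFetched++;
        yield return id;
    }
}

// Building the query executes nothing yet (deferred execution).
IEnumerable<int> evenIds = FetchRows().Where(id => id % 2 == 0);
Console.WriteLine(rowsFetched); // 0

// Enumerating pulls every row to the client, where Where() filters them.
List<int> result = evenIds.ToList();
Console.WriteLine(rowsFetched);  // 10
Console.WriteLine(result.Count); // 5
```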

The IQueryable<T> interface, instead, as we have seen, is used where it is necessary to translate a query expression into a string that will be passed to another process.

It is therefore better suited to querying out-of-memory data sources, such as remote databases and services, for example with LINQ to SQL.

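Finally, because operators such as Skip() and Take() become part of the expression tree, the provider can translate them into the remote query, which makes IQueryable<T> suitable for server-side pagination; through its IQueryProvider, with the CreateQuery and Execute methods, it also supports building and running custom queries.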
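To make this concrete, here is a minimal sketch using the in-memory EnumerableQuery provider returned by AsQueryable(), with hypothetical data: Skip() and Take() are recorded as method-call nodes of the tree rather than executed, which is what allows a database provider to turn them into its own paging clause:

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

var ids = Enumerable.Range(1, 100).AsQueryable(); // hypothetical data source

// Third page with a page size of 10.
int pageIndex = 2, pageSize = 10;
IQueryable<int> page = ids.OrderBy(id => id).Skip(pageIndex * pageSize).Take(pageSize);

// Skip and Take are not executed here: they are nodes of the expression tree.
var takeCall = (MethodCallExpression)page.Expression;
var skipCall = (MethodCallExpression)takeCall.Arguments[0];
Console.WriteLine($"{takeCall.Method.Name} <- {skipCall.Method.Name}"); // Take <- Skip

// Requesting results executes the tree (here in memory).
Console.WriteLine(page.First()); // 21
Console.WriteLine(page.Count()); // 10
```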

In this article we have analyzed some aspects of how LINQ works and the features that support it. It would take a very long time to explain what is behind it and all that such a complex framework allows us to do.

I’d like to end with a list of some of the advantages we gain by using it:

It lets us write less code that is more readable and maintainable, reducing development time;
We don’t have to learn a new query language for every type of data source or data format;
It provides IntelliSense support when writing queries over generic collections, and greater type safety, because object types are checked at compile time;
It makes it easier to query multiple data sources in a single query and, thanks to its hierarchical functionality, to compose queries that join multiple tables in less time;
It simplifies debugging thanks to its integration with the C# language;
It offers an easy way to convert one data type to another, such as transforming SQL data into XML data;
It is extensible, which means LINQ can be used to query new types of data sources.

See you in the next article! Stay Tuned!

Written by

Francesco Vastarella

Tag

News & Events

Discover more from Blexin