Executing Dynamically Generated SQL in EF Core

October 10, 2020

Entity Framework Core is primarily concerned with defining and executing pre-defined queries on a DB table, or executing a simple join on two of more tables. You can do more, but that’s its sweet spot - and for good reason. Have a think about the last project you worked on: I reckon 95% of you will be thinking about a forms-over-data application. Get a list of orders, update the product price, create a new user: really basic CRUD operations. So it makes sense that a framework like EF Core should make the 95% as easy as possible.

But what if you’re in the 5%? What if you’re working on a project where you have a query with 5 or 6 tables. Maybe you don’t even know which fields you’ll need to filter on. Well, for those users, EF Core provides two methods:



FromSqlRaw

And



FromSqlInterpolated

Both methods basically allow you to build your own SQL string, and execute it against the DB Context. It’s worth remembering that, unlike ADO.NET, you can’t just parse the output, you need to be ready for it; but that’s not the subject of this post. Here. We’re talking about a dynamically build query that returns a known type.

So, what are we trying to do?

The Problem

Let’s imagine that we have a table, called MyTable for the sake of argument, and MyTable has five columns:



MyTable
-- Field1
-- Field2
-- Field3
-- Field4
-- Field5

Now, let’s imagine that we have an app that allows the user to pick one or more fields to filter on (obviously, exposing the DB structure to the user is a bad idea unless you’re writing an SSMS clone, but bear with me here). When this comes through to EF, you’ve basically got three ways to implement this:

1. Dynamically build the query string and execute it directly.
2. Use (either in raw SQL, a stored procedure, or in Linq) the **Field1 = filter OR filter = ""** method.
3. Bring the data down to the client, and filter it there.

For the purpose of this post, we’re going to discuss option (1). All the options have merit, depending on the use case.

Let’s talk about building dynamic SQL, and some of the pitfalls.

Dynamic SQL

Building dynamic SQL is easy, right? You could just do this:



string sql = "select \* from MyTable ";

If (!string.IsNullOrWhitespace(filter1)
{
    sql += $"where Field1 = {filter1}"
}

// Add additional fields, and deal with the WHERE / AND problem

var result = \_myDbContext.MyTable.FromSqlInterpolated(sql);

So, this code is bad for several reasons. Let’s run through them.

1. It doesn’t compile

The first thing (although by far not the worst), is that this code won’t compile. The reason this won’t compile is that FromSqlInterpolated takes a FormattableString. Of course, this is easily correctable:



var result = \_myDbContext.MyTable.FromSqlInterpolated($"{sql}");

Now the code compiles, but it probably doesn’t do what you want (to be clear, it probably will work at this point).

The next issue is one of security.

2. SQL Injection

If the field above (filter1) is set to: ‘1’; DROP TABLE MyTable; (or something equivalent), your app will execute it. This is because we’re not using placeholders. What does this mean:

1. **FromSqlInterpolated** accepts an interpolated string, but what we're passing here is a pre-built string.  The code being passed into **FromSqlInterpolated** needs to be interpolated at the time; e.g.:
	- \_myDbContext.MyTable.FromSqlInterpolated($"select \* from MyTable where field1 = {filter1}");
2. Since this won't work in our case, we'll need to build up the query using **FromSqlRaw**, and pass in parameters.

3. Caching

The way that most (at least relational) databases work, is that they try to cache the most frequently used queries. The problem is that, if you do something like the query above: “select * from MyTable where Field1 = ‘myvalue’”, that gets cached. If you run that again, but pass ‘myvalue2’ then that gets cached. Run it 1000 times with different values, and other queries start to get pushed out of the cache.

So, how can we build up a dynamic SQL string, without leaving ourselves open to SQL injection, or flooding the cache?

A Solution

This is a solution, it is not the solution. In it, we’re reverting a little to an ADO.NET style of doing things, by providing SqlParameters. Let’s see what that might look like:



            string sql =
                "select \* " +
                "from MyTable ";

            var parameters = new List<SqlParameter>();

            int i = 1;
            foreach (var filter in filters)
            {
                if (i == 1)
                {
                    sql += $"where Field{i} = @filter{i} ";                    
                }
                else
                {
                    sql += $"and Field{i} = @filter{i} ";
                }
                parameters.Add(new SqlParameter($"@filter{i++}", filter));
            }

            var result = \_paymentsDbContext.MyTable
                .FromSqlRaw(sql, parameters.ToArray())
                .ToList();


We’re assuming that we have an array / list of filters here, and we just create a query that looks something like this:



select \*
from MyTable
where Field1 = @filter1
and Field3 = @filter3

Because these are placeholders, you’re protected against SQL injection, and the DB engine will cache this query (so changing the values themselves doesn’t affect the cache). It’s worth bearing in mind that if we run this again, and end up with the following:



select \*
from MyTable
where Field1 = @filter1
and Field4 = @filter4
and Field5 = @filter5

This will be separately cached, so you’d need to make a decision as to whether you are likely to have few enough queries that it doesn’t matter.

Summary

Quite often, people use EF as though the data was all local. It’s always worth remembering that each time you make a call, you are accessing the DB - despite the fact that Microsoft have gone to great lengths to make you think you are not. Each time you touch the DBMS, you change something - or, rather, something is changed as a result of you touching the DB. This might be obvious, like you insert a record, or it might be hidden, like the cache is updated. Nevertheless, the DB is a service, and it is probably the most important service in your system.

This post is based on my knowledge of relational databases, so the same may not be completely true of No-Sql databases.



Profile picture

A blog about one man's journey through code… and some pictures of the Peak District
Twitter

© Paul Michaels 2024