Category Archives: 70-461

Understanding the PIVOT function

The PIVOT function is very useful but not easily understood at first. I thank Itzik Ben-Gan, Dejan Sarka and Ron Talmage and their study guide Exam 70-461: Querying Microsoft SQL Server 2012 for the brilliant explanation.

Here’s their example replicated here. Remember there are 3 columns used in PIVOT. The Grouping column is what you want each row to be aggregated to. The Spreading column becomes the columns after PIVOTing, and the Aggregate columns are the measures.

You need a pivot aggregate function (SUM in this case) and COUNT(*) cannot be used but COUNT(<column>) can.

The reason a common table expression is used to select only the necessary columns is because all other columns beside the Spreading and Aggregate columns will become the Grouping element which may not necessary be what you want.

Continue reading

SQL Query Result Order

Every time an SQL query is run without the ORDER BY clause, the sort order of the results returned is not guaranteed.

SQL Server decides which is the best order to return the query. This is based on factors like whether indexes are used or data fragmentation on the physical drive because SQL Server will attempt to return the result in the most efficient way.

If you run the same query a few times in the same environment, you may find that the sort order remains the same. It does not mean the sort order is guaranteed but that SQL Server went through processing the data in the same way.

The same query may not retrieve the result in the same order if indexes have been rebuilt or the database is on another server or many other factors.

Just a point to note.

SQL Logical Query Processing Order

This probably applies more to beginners or people who have been using SQL for a while but have been wondering why sometimes the query works in a certain way and sometimes it didn’t.

Before I read the book for the exam 70-461, I found it relatively easy to just plonk in some code in SQL Server Management Studio and Voila! it works. Well, not all the time.

For example, this works:

SELECT customer, YEAR(orderdate) AS OrderYear FROM orders WHERE YEAR(orderdate) = 2013

But this doesn’t:

SELECT customer, YEAR(orderdate) AS OrderYear FROM orders WHERE OrderYear = 2013

Most of us will normally just go: “Hey, it’s no biggie, I’ll just use the former and then find out why later.”, but most often we will be too busy to do so. ­čÖé

Here is the “keyed-in order” of a standard SQL Query:

  1. SELECT
  2. FROM
  3. WHERE
  4. GROUP BY
  5. HAVING
  6. ORDER BY

This is how it is supposed to be typed so that it does not throw a syntax error.

This is how SQL Server processes the query:

  1. FROM
  2. WHERE
  3. GROUP BY
  4. HAVING
  5. SELECT
  6. ORDER BY

See the difference?

Let’s reference back to the first example:

SELECT customer, YEAR(orderdate) AS OrderYear FROM orders WHERE OrderYear = 2013

Since the query is processed in that order, SQL Server processes the WHERE clause before the SELECT clause and it has no clue of the alias “OrderYear”.. hence it fails.

Another common way to write queries with the ORDER BY clause is by using the alias:

SELECT customer, YEAR(orderdate) AS OrderYear FROM orders ORDER BY OrderYear

Again referring back to the Logical Query Processing Order, this query works because ORDER BY is processed after the SELECT clause, therefore SQL Server is fully aware of the alias “OrderDate”.

What are the benefits?

In my experience, I often re-format datetime field into something more readable, but this makes the query longer. Therefore to make code more elegant, I use the alias:

SELECT customer, CONVERT(VARCHAR(10), orderdate, 112) AS NewOrderDate FROM orders ORDER BY NewOrderDate

I find that knowing this makes me more efficient as I am coding with this in mind and make less mistakes.

If this helps anyone, do let me know. Thank you.

 

 

 

SQL and Relational Theory

Do you know the foundation of SQL?

According to the book Querying Microsoft SQL Server 2012 Training Kit on the Microsoft label, T-SQL is a Microsoft flavour of the more standard ANSI and ISO version of SQL which is based on a branch of mathematics called Relational Model.

Relational Model is further broken down into two theories – Set Theory and Predicate Logic.

Set Theory says that interaction with a set is not with the individual elements but with the set of elements on a whole. A set consists of a collection of distinct, definite objects.

Predicate Logic is when an expression evaluates out to true or false.

The combination of Set Theory and Predicate Logic gives you the foundation of SQL.

How is this useful?

Set theory implies that all elements are unique (distinct) and the order does not matter. Therefore, when constructing T-SQL queries, one considers the set of elements as a whole when manipulating data. This means that you are already walking in the right direction. Now you need to decide if you are trying to get results fast for a one off query, or optimising the results for repeat use (as used in business reports). Only deviate from standard relational model when it finally comes to presentation. For example, by adding the ORDER BY clause to sort the data.