Category Archives: SQL

Using PIVOT function without a CTE

There are some cases when you need to use a PIVOT function in SQL but you cannot use a CTE. One such case is when you create a named query in a SSAS cube.

To do so, place the first part of the CTE query as a subquery of the Non-CTE query as shown here. Remember to add an alias to the subquery. The PIVOT portion of the code remains the same.

PIVOTwithoutCTE

The basic understanding of the PIVOT function can be found here.

Note: It may increase query performance when joining another table to the pivoted table to use join hints. e.g. LEFT OUTER HASH JOIN

Microsoft Dynamics AX 2012 – Multiple Rows In Demand Forecast Table [FORECASTSALES] for each item

Why are there multiple rows for each item in the Demand Forecast Table in Microsoft Dynamics AX 2012?

This is helpful to know for BI for cubes that aggregate the data in the [FORECASTSALES] table.

Basically the overview line and each exploded line of the same item is stored in the same [FORECASTSALES] table. Therefore, if any measure in this table needs to be aggregated per item, filter out just the overview rows. You can do this by filtering the column [EXPANDID] = 0.

Continue reading

Understanding the PIVOT function

The PIVOT function is very useful but not easily understood at first. I thank Itzik Ben-Gan, Dejan Sarka and Ron Talmage and their study guide Exam 70-461: Querying Microsoft SQL Server 2012 for the brilliant explanation.

Here’s their example replicated here. Remember there are 3 columns used in PIVOT. The Grouping column is what you want each row to be aggregated to. The Spreading column becomes the columns after PIVOTing, and the Aggregate columns are the measures.

You need a pivot aggregate function (SUM in this case) and COUNT(*) cannot be used but COUNT(<column>) can.

The reason a common table expression is used to select only the necessary columns is because all other columns beside the Spreading and Aggregate columns will become the Grouping element which may not necessary be what you want.

Continue reading

Get list of dates using recursive Common Table Expression (CTE)

Many times we need a list of consecutive dates or time at runtime mainly for a recursive ETL process. This code provides a list almost instantaneously with only the start and end dates being the only input. The output is in Date format but is easily extended to a DateTime format.

For best results, you can use this in a CTE as part of the solution so that it does not take up space in the database.

The MaxRecursion option ensures that the recursion does not go on forever, however the Per Minute and Per Second versions will definitely overrun the MaxRecursion option.

Note: The code for each one are similar, the only difference is the first parameter of the DATEADD function which determines the recursive detail.

Daily
Monthly
Yearly
Hourly
Per Minute
Per Second
Continue reading

Building a simple forecast solution on SQL Server

Pricing Forecast is a marketing function that requires a lot of data crunching.

Normally there are some factors that the Marketing team wants to adjust to find out just the optimum pricing for their products.

Accordingly, the forecast solution needs to be fairly flexible but still be powerful enough to make the necessary and sometimes even complex calculation. Otherwise, if the design is too rigid and requirements change slightly, then major changes in design need to be made, which in turn means whole rounds of testing.

Continue reading

SQL Query Result Order

Every time an SQL query is run without the ORDER BY clause, the sort order of the results returned is not guaranteed.

SQL Server decides which is the best order to return the query. This is based on factors like whether indexes are used or data fragmentation on the physical drive because SQL Server will attempt to return the result in the most efficient way.

If you run the same query a few times in the same environment, you may find that the sort order remains the same. It does not mean the sort order is guaranteed but that SQL Server went through processing the data in the same way.

The same query may not retrieve the result in the same order if indexes have been rebuilt or the database is on another server or many other factors.

Just a point to note.

SQL Logical Query Processing Order

This probably applies more to beginners or people who have been using SQL for a while but have been wondering why sometimes the query works in a certain way and sometimes it didn’t.

Before I read the book for the exam 70-461, I found it relatively easy to just plonk in some code in SQL Server Management Studio and Voila! it works. Well, not all the time.

For example, this works:

SELECT customer, YEAR(orderdate) AS OrderYear FROM orders WHERE YEAR(orderdate) = 2013

But this doesn’t:

SELECT customer, YEAR(orderdate) AS OrderYear FROM orders WHERE OrderYear = 2013

Most of us will normally just go: “Hey, it’s no biggie, I’ll just use the former and then find out why later.”, but most often we will be too busy to do so. 🙂

Here is the “keyed-in order” of a standard SQL Query:

  1. SELECT
  2. FROM
  3. WHERE
  4. GROUP BY
  5. HAVING
  6. ORDER BY

This is how it is supposed to be typed so that it does not throw a syntax error.

This is how SQL Server processes the query:

  1. FROM
  2. WHERE
  3. GROUP BY
  4. HAVING
  5. SELECT
  6. ORDER BY

See the difference?

Let’s reference back to the first example:

SELECT customer, YEAR(orderdate) AS OrderYear FROM orders WHERE OrderYear = 2013

Since the query is processed in that order, SQL Server processes the WHERE clause before the SELECT clause and it has no clue of the alias “OrderYear”.. hence it fails.

Another common way to write queries with the ORDER BY clause is by using the alias:

SELECT customer, YEAR(orderdate) AS OrderYear FROM orders ORDER BY OrderYear

Again referring back to the Logical Query Processing Order, this query works because ORDER BY is processed after the SELECT clause, therefore SQL Server is fully aware of the alias “OrderDate”.

What are the benefits?

In my experience, I often re-format datetime field into something more readable, but this makes the query longer. Therefore to make code more elegant, I use the alias:

SELECT customer, CONVERT(VARCHAR(10), orderdate, 112) AS NewOrderDate FROM orders ORDER BY NewOrderDate

I find that knowing this makes me more efficient as I am coding with this in mind and make less mistakes.

If this helps anyone, do let me know. Thank you.

 

 

 

SQL and Relational Theory

Do you know the foundation of SQL?

According to the book Querying Microsoft SQL Server 2012 Training Kit on the Microsoft label, T-SQL is a Microsoft flavour of the more standard ANSI and ISO version of SQL which is based on a branch of mathematics called Relational Model.

Relational Model is further broken down into two theories – Set Theory and Predicate Logic.

Set Theory says that interaction with a set is not with the individual elements but with the set of elements on a whole. A set consists of a collection of distinct, definite objects.

Predicate Logic is when an expression evaluates out to true or false.

The combination of Set Theory and Predicate Logic gives you the foundation of SQL.

How is this useful?

Set theory implies that all elements are unique (distinct) and the order does not matter. Therefore, when constructing T-SQL queries, one considers the set of elements as a whole when manipulating data. This means that you are already walking in the right direction. Now you need to decide if you are trying to get results fast for a one off query, or optimising the results for repeat use (as used in business reports). Only deviate from standard relational model when it finally comes to presentation. For example, by adding the ORDER BY clause to sort the data.

Address Clean-up for Mail Merge using SSIS

Remember the good ol’ days of going through a list of addresses and manually cleaning up hundreds if not thousands of addresses to prepare for a snail mail merge? What? You’re still doing it now in the 21st century? Just kidding.

Chances are, you will need to be able to detect duplicates. I find that perhaps the hardest task to accomplish is to manually eyeball the data for duplicates because duplicates seldom mean that the data is THE SAME. More likely, data is SIMILAR but noticeably different in subtle little ways. For example, Address has been split into Address1, Address2… and so on. Or someone decides to register again but adding a middle name this time.

Whatever the case, I find that using logic (in my head) and SSIS, I can prepare a more thorough cleaning of the data. Here’s my Data Flow Model:

And here’s the Control Flow diagram showing that I read data from a flat file (.csv) and then update the Database everytime I run this SSIS Package. I suggest not disabling the update tasks even though there may not be changes because if there are changes and you forget to re-enable the tasks, then changes will not be applied!

Proper Case transformation for text.

There isn’t a built in Proper Case Transformation but the function is built into VisualBasic so you just need to include that in your reference.

 

/* Microsoft SQL Server Integration Services Script Component
* Write scripts using Microsoft Visual C# 2008.
* ScriptMain is the entry point class of the script.*/

using System;
using System.Data;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;
using Microsoft.VisualBasic;

[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{

public override void PreExecute()
{
base.PreExecute();
/*
Add your code here for preprocessing or remove if not needed
*/
}

public override void PostExecute()
{
base.PostExecute();
/*
Add your code here for postprocessing or remove if not needed
You can set read/write variables here, for example:
Variables.MyIntVar = 100
*/
}

public override void Input0_ProcessInputRow(Input0Buffer Row)
{
Row.DerivedFirstName = Strings.StrConv(Row.FirstName, VbStrConv.ProperCase, 0);
Row.DerivedName = Strings.StrConv(Row.Name, VbStrConv.ProperCase, 0);
Row.DerivedAddress1 = Strings.StrConv(Row.AddressLine1, VbStrConv.ProperCase, 0);
Row.DerivedAddress2 = Strings.StrConv(Row.AddressLine2, VbStrConv.ProperCase, 0);
Row.DerivedCity = Strings.StrConv(Row.City, VbStrConv.ProperCase, 0);
Row.DerivedState = Strings.StrConv(Row.State, VbStrConv.ProperCase, 0);
Row.DerivedPostCode = Strings.StrConv(Row.Postcode, VbStrConv.ProperCase, 0);
Row.DerivedCountry = Strings.StrConv(Row.Country, VbStrConv.ProperCase, 0);
}

}

Happy mail merging!