Remember the good ol’ days of going through a list of addresses and manually cleaning up hundreds if not thousands of addresses to prepare for a snail mail merge? What? You’re still doing it now in the 21st century? Just kidding.
Chances are, you will need to be able to detect duplicates. I find that perhaps the hardest task to accomplish is to manually eyeball the data for duplicates because duplicates seldom mean that the data is THE SAME. More likely, data is SIMILAR but noticeably different in subtle little ways. For example, Address has been split into Address1, Address2… and so on. Or someone decides to register again but adding a middle name this time.
Whatever the case, I find that using logic (in my head) and SSIS, I can prepare a more thorough cleaning of the data. Here’s my Data Flow Model:
And here’s the Control Flow diagram showing that I read data from a flat file (.csv) and then update the Database everytime I run this SSIS Package. I suggest not disabling the update tasks even though there may not be changes because if there are changes and you forget to re-enable the tasks, then changes will not be applied!
Proper Case transformation for text.
There isn’t a built in Proper Case Transformation but the function is built into VisualBasic so you just need to include that in your reference.
/* Microsoft SQL Server Integration Services Script Component
* Write scripts using Microsoft Visual C# 2008.
* ScriptMain is the entry point class of the script.*/
public class ScriptMain : UserComponent
public override void PreExecute()
Add your code here for preprocessing or remove if not needed
public override void PostExecute()
Add your code here for postprocessing or remove if not needed
You can set read/write variables here, for example:
Variables.MyIntVar = 100
public override void Input0_ProcessInputRow(Input0Buffer Row)
Row.DerivedFirstName = Strings.StrConv(Row.FirstName, VbStrConv.ProperCase, 0);
Row.DerivedName = Strings.StrConv(Row.Name, VbStrConv.ProperCase, 0);
Row.DerivedAddress1 = Strings.StrConv(Row.AddressLine1, VbStrConv.ProperCase, 0);
Row.DerivedAddress2 = Strings.StrConv(Row.AddressLine2, VbStrConv.ProperCase, 0);
Row.DerivedCity = Strings.StrConv(Row.City, VbStrConv.ProperCase, 0);
Row.DerivedState = Strings.StrConv(Row.State, VbStrConv.ProperCase, 0);
Row.DerivedPostCode = Strings.StrConv(Row.Postcode, VbStrConv.ProperCase, 0);
Row.DerivedCountry = Strings.StrConv(Row.Country, VbStrConv.ProperCase, 0);
Happy mail merging!