Jan 082014
 

On mssqltips.com you can find my latest post Parallel execution of child SQL Server Integration Services Packages using For Loop Containers.

The post describes a very simple way how to parallelize execution of child packages in SSIS. There exists third party components for parallel execution, but sometimes they are costly and some time they do not work exactly as expected. The for loop containers allow us to implement it using purely using the building components and at no costs with exactly defined behavior.

 Posted by at 7:07 am
Jan 082014
 

On mssqltips.com you can find my post Exporting CLR Assemblies from SQL Server back to .dll files.

This post describes how you can export the assemblies from SQL server back to physical .dll files. This can be very useful in cases when you do not have the source code for he assemblies inside the DB and you would like to do a security review of the assembly. This commonly happens if you receive the assembly as BYTE code in T-SQL.

Nov 292013
 

On www.mssqltips.com (Reading SharePoint Surveys and Lists using a SQL Server Integration Services SSIS Package) you can find an updated version of my original post Reading SharePoint List and Survey data using Integration Services package.

The updated version on mssqltips describes an easier way of reading the Surveys and Lists data with simplified use of URL protocol an more elegant way of final data extraction using the T-SQL XML methods.

Post also contains complete demo solution.

Nov 062013
 

On www.mssqltips.com you can find my latest tip Report launcher to run SSRS report subscriptions on demand.

SSRS Subscriptions are great feature, but if you would like to run the subscription on demand out of the predefined schedule, it can be a tricky and you will start to dig into the SSRS database and agent jobs.

The tip describes, how you can easily create a set of simple SSRS reports, which allow very easy on demand launching of subscriptions.

Nov 062013
 

On www.mssqltips.com you can find my tip Custom Attribute Names in Role Playing Dimensions for SSAS.

It describes details how to partially overcome the limitation of SSAS that it does not support custom attribute names in role playing dimensions.

 

Jun 272012
 

Recently I was going through the archives of posts by Kalen Delaney blog and I came across very interesting post Did you know? — Altering the length of a fixed-length column.  This article gives information how SQL Server is wasting space when you alter a fixed length column and increase its length.

It is interesting so I wanted to take a closer look on this and take a look on the physical db pages to see what happens and to see how the data are stored after alter is done.

Test data preparation and initial view of the data

So first let’s prepare a testing table with some testing data.

CREATE TABLE dbo.AlterTest (
    ID int NOT NULL IDENTITY(1,1),
    col1 char(2000),
    col2 char(1000),
    col3 int
)
GO
INSERT INTO dbo.AlterTest (col1, col2, col3)
VALUES('aaa', 'bbb', ABS(BINARY_CHECKSUM(NEWID())))
GO 4

Now if we take a look on the columns information in metadata tables using Kalen’s query to sys.system_internal_partition_columns we can see the offsets of the table columns.

SELECT  
    c.name AS column_name, 
    column_id, 
    max_inrow_length,
    pc.system_type_id, 
    leaf_offset 
FROM sys.system_internals_partition_columns pc
INNER JOIN sys.partitions p ON p.partition_id = pc.partition_id 
INNER JOIN sys.columns c ON column_id = partition_column_id AND c.object_id = p.object_id
WHERE p.object_id=object_id('AlterTest');
column_name   column_id   max_inrow_length system_type_id leaf_offset
------------- ----------- ---------------- -------------- -----------
ID            1           4                56             4
col1          2           2000             175            8
col2          3           1000             175            2008
col3          4           4                56             3008

As we can see, the physical order is the order in which the columns were defined using the CREATE TABLE statement.

Once we have the data in the table, let’s take a look on how the data are stored. The below query will work only on SQL Server 2008+ and is using undocumented virtual column %%physloc%% which provides information about rows physical location in database and undocumented function sys.fn_PhysLocCracker, which cracks the physical location to human readable FileID, PageID and SlotID.

SELECT
    DB_ID() AS DBID
    ,pl.*
    ,t.*
FROM dbo.AlterTest t
CROSS APPLY sys.fn_PhysLocCracker(%%physloc%%) pl
DBID   file_id     page_id     slot_id     ID          col1    col2    col3
------ ----------- ----------- ----------- ----------- ------- ------ -----------
16     1           168         0           1           aaa     bbb    587800818
16     1           168         1           2           aaa     bbb    1396332306
16     1           171         0           3           aaa     bbb    844570652
16     1           171         1           4           aaa     bbb    1332601405

Let’s take a look on the first physical page 168.

dbcc traceon (3604,-1)
GO
dbcc page(16,1,168,3)
GO

The partial results are below

PAGE: (1:168)

BUFFER:

BUF @0x00000005011003C0

bpage = 0x000000048C308000          bhash = 0x0000000000000000          bpageno = (1:168)
bdbid = 16                          breferences = 0                     bcputicks = 0
bsampleCount = 0                    bUse1 = 44191                       bstat = 0xb
blog = 0xab21cccc                   bnext = 0x0000000000000000          

PAGE HEADER:

Page @0x000000048C308000

m_pageId = (1:168)                  m_headerVersion = 1                 m_type = 1
m_typeFlagBits = 0x0                m_level = 0                         m_flagBits = 0x8000
m_objId (AllocUnitId.idObj) = 98    m_indexId (AllocUnitId.idInd) = 256 
Metadata: AllocUnitId = 72057594044350464                                
Metadata: PartitionId = 72057594039959552                                Metadata: IndexId = 0
Metadata: ObjectId = 549576996      m_prevPage = (0:0)                  m_nextPage = (0:0)
pminlen = 3012                      m_slotCnt = 2                       m_freeCnt = 2062
m_freeData = 6126                   m_reservedCnt = 0                   m_lsn = (37:227:3)
m_xactReserved = 0                  m_xdesId = (0:0)                    m_ghostRecCnt = 0
m_tornBits = 0                      DB Frag ID = 1                      

Allocation Status

GAM (1:2) = ALLOCATED               SGAM (1:3) = ALLOCATED              
PFS (1:1) = 0x62 MIXED_EXT ALLOCATED  80_PCT_FULL                        DIFF (1:6) = CHANGED
ML (1:7) = NOT MIN_LOGGED           

Slot 0 Offset 0x60 Length 3015

Record Type = PRIMARY_RECORD        Record Attributes =  NULL_BITMAP    Record Size = 3015

Memory Dump @0x000000002054A060

0000000000000000:   1000c40b 01000000 61616120 20202020 20202020  ..Ä.....aaa         
0000000000000014:   20202020 20202020 20202020 20202020 20202020                      
0000000000000028:   20202020 20202020 20202020 20202020 20202020                      
...
...
00000000000007BC:   20202020 20202020 20202020 20202020 20202020                      
00000000000007D0:   20202020 20202020 62626220 20202020 20202020          bbb         
00000000000007E4:   20202020 20202020 20202020 20202020 20202020                      
...
...
0000000000000BA4:   20202020 20202020 20202020 20202020 20202020                      
0000000000000BB8:   20202020 20202020 f2200923 040000                     ò 	#...

Slot 0 Column 1 Offset 0x4 Length 4 Length (physical) 4

ID = 1                              

Slot 0 Column 2 Offset 0x8 Length 2000 Length (physical) 2000

col1 = aaa                                                                                                       

Slot 0 Column 3 Offset 0x7d8 Length 1000 Length (physical) 1000

col2 = bbb                                                                                                       

Slot 0 Column 4 Offset 0xbc0 Length 4 Length (physical) 4

col3 = 587800818                    

Slot 1 Offset 0xc27 Length 3015

We can see, that the physical storage corresponds to the metadata stored in the system table sys.system_internal_partition_columns.

Altering the column length and analyzing impacts

Now let’s alter the table and increase the length of Col1 to 3000 characters and take a look what happens.

ALTER TABLE dbo.AlterTest
    ALTER COLUMN Col1 char(3000)
GO

If we take a look on the sys.system_iternal_partition_columns we will see following:

column_name   column_id   max_inrow_length system_type_id leaf_offset
------------- ----------- ---------------- -------------- -----------
ID            1           4                56             4
col2          3           1000             175            2008
col3          4           4                56             3008
col1          2           3000             175            3012

We can see, that the offset of the Col1 has changed and the Column was moved to the end of the record. From here we can see, that the original 2000 bytes were wasted. If we take a look on the physical page 168 as above, we will see, that there is no change in the physical page as the this change to the column is metadata change only.

Impact on new records

So let’s take a look what impact this have on new records added to the table.

INSERT INTO dbo.AlterTest (col1, col2, col3)
VALUES('ccc', 'ddd', ABS(BINARY_CHECKSUM(NEWID())))
GO 4
SELECT
    DB_ID() AS DBID
    ,pl.*
    ,t.*
FROM dbo.AlterTest t
CROSS APPLY sys.fn_PhysLocCracker(%%physloc%%) pl
DBID   file_id     page_id     slot_id     ID          col1   col2  col3
------ ----------- ----------- ----------- ----------- ------ ----- -----------
16     1           168         0           1           aaa    bbb   587800818
16     1           168         1           2           aaa    bbb   1396332306
16     1           171         0           3           aaa    bbb   844570652
16     1           171         1           4           aaa    bbb   1332601405
16     1           175         0           5           ccc    ddd   2059368981
16     1           177         0           6           ccc    ddd   1449062892
16     1           178         0           7           ccc    ddd   267569086
16     1           179         0           8           ccc    ddd   1325350591

From the results we can see, that the first 4 records originally inserted are occupied only two pages as two records were stored per database page. After the update we can see that each single record is occupied its own page. This is due to the fact that the record length has increased not only by the 1000 characters by which the length of the Col1 was modified but also the original 2000 bytes were wasted. there fore the data length on the page increased from 3008 bytes to 6008 bytes.

Now let’s take a closer look on the physical page. For example the first page occupied by the newly inserted data  (page 175).

dbcc page(16,1,175,3)
GO

Partial results of the DBCC command are here:

PAGE: (1:175)

BUFFER:

BUF @0x0000000503366540

bpage = 0x00000004D7F84000          bhash = 0x000000067753ED81          bpageno = (1:175)
bdbid = 16                          breferences = 0                     bcputicks = 0
bsampleCount = 0                    bUse1 = 45327                       bstat = 0x10b
blog = 0x212121cc                   bnext = 0x0000000000000000          

PAGE HEADER:

Page @0x00000004D7F84000

m_pageId = (1:175)                  m_headerVersion = 1                 m_type = 1
m_typeFlagBits = 0x0                m_level = 0                         m_flagBits = 0x8000
m_objId (AllocUnitId.idObj) = 98    m_indexId (AllocUnitId.idInd) = 256 
Metadata: AllocUnitId = 72057594044350464                                
Metadata: PartitionId = 72057594039959552                                Metadata: IndexId = 0
Metadata: ObjectId = 549576996      m_prevPage = (0:0)                  m_nextPage = (0:0)
pminlen = 6012                      m_slotCnt = 1                       m_freeCnt = 2079
m_freeData = 6111                   m_reservedCnt = 0                   m_lsn = (37:390:11)
m_xactReserved = 0                  m_xdesId = (0:0)                    m_ghostRecCnt = 0
m_tornBits = 0                      DB Frag ID = 1                      

Allocation Status

GAM (1:2) = ALLOCATED               SGAM (1:3) = NOT ALLOCATED          
PFS (1:1) = 0x62 MIXED_EXT ALLOCATED  80_PCT_FULL                        DIFF (1:6) = CHANGED
ML (1:7) = NOT MIN_LOGGED           

Slot 0 Offset 0x60 Length 6015

Record Type = PRIMARY_RECORD        Record Attributes =  NULL_BITMAP    Record Size = 6015

Memory Dump @0x000000002267A060

0000000000000000:   10007c17 05000000 df5af8d7 04000000 ffffffff  ..|.....ßZø×....ÿÿÿÿ
0000000000000014:   ffffff7f 9860f8d7 04000000 0040f8d7 04000000  ÿÿÿ..`ø×.....@ø×....
0000000000000028:   01000000 00000000 63000000 00000000 dd7618f4  ........c.......Ýv.ô
0000000000000794:   00000000 00000000 00000000 00000000 01000000  ....................
00000000000007A8:   000071d8 fe070000 c072210e 00000000 00000000  ..qØþ...Àr!.........
00000000000007BC:   00000000 00000000 00000000 00000000 00000000  ....................
00000000000007D0:   00000000 00000000 64646420 20202020 20202020  ........ddd         
00000000000007E4:   20202020 20202020 20202020 20202020 20202020                      
0000000000000BA4:   20202020 20202020 20202020 20202020 20202020                      
0000000000000BB8:   20202020 20202020 157abf7a 63636320 20202020          .z¿zccc     
0000000000000BCC:   20202020 20202020 20202020 20202020 20202020                      
0000000000000BE0:   20202020 20202020 20202020 20202020 20202020                      
000000000000175C:   20202020 20202020 20202020 20202020 20202020                      
0000000000001770:   20202020 20202020 20202020 050000                         ...

Slot 0 Column 1 Offset 0x4 Length 4 Length (physical) 4

ID = 5                              

Slot 0 Column 67108865 Offset 0x8 Length 0 Length (physical) 2000

DROPPED = NULL                      

Slot 0 Column 3 Offset 0x7d8 Length 1000 Length (physical) 1000

col2 = ddd                                                                                                       

Slot 0 Column 4 Offset 0xbc0 Length 4 Length (physical) 4

col3 = 2059368981                   

Slot 0 Column 2 Offset 0xbc4 Length 3000 Length (physical) 3000

col1 = ccc

From the output above we can see, the now on the physical page, there is a Column 67108865 (DROPPED) following the Column1 and Column2 has physically moved to the end of the record at offset 0xbc4 (3012).  Also from the page dump we can see that the space from the offset 0x8 to 0x7d7 is containing a mess and that those 2000 bytes are wasted.

As we can see, the original query to sys.system_internals_partition_columns doesn’t show the Column with ID 67108865 (DROPPED). It’s because it uses join to the sys.columns and the DROPPED column is not part of the table, but is par tof the partition. If we use the query without join to the sys.columns it will be shown also in the query output.

SELECT 
    partition_column_id, 
    max_inrow_length,
    pc.system_type_id, 
    leaf_offset,
    is_dropped 
FROM sys.system_internals_partition_columns pc
INNER JOIN sys.partitions p ON p.partition_id = pc.partition_id 
WHERE p.object_id=object_id('AlterTest');
partition_column_id max_inrow_length system_type_id leaf_offset is_dropped
------------------- ---------------- -------------- ----------- ---------
1                   4                56             4           0
67108865            2000             175            8           1
3                   1000             175            2008        0
4                   4                56             3008        0
2                   3000             175            3012        0

It Seems that the dropped columns have IDs starting from 67108865 and the numbers increase as there are more dropped columns. Also the DROPPED columns have flag is_dropped = 1.

Multiple updates of the column size

In previous examples we took a look on the update to a single column and in Delaney’s post you can see the result if we want to update multiple column. In case we try to multiple columns, space allocated for all the original columns is dropped and new space allocated. But what happens if we update the same column multiple times? Let’s make a simple test.

CREATE TABLE AlterTest2 (
	id int not null identity(1,1),
	Col1 char(1000),
	Col2 char(1000),
	Col3 int
)
GO
ALTER TABLE AlterTest2
ALTER COLUMN Col1 char(2000)
GO
ALTER TABLE AlterTest2
ALTER COLUMN Col1 char(2500)
GO
SELECT 
    partition_column_id, 
    max_inrow_length,
    pc.system_type_id, 
    leaf_offset,
    is_dropped 
FROM sys.system_internals_partition_columns pc
INNER JOIN sys.partitions p ON p.partition_id = pc.partition_id 
WHERE p.object_id=object_id('AlterTest2');
partition_column_id max_inrow_length system_type_id leaf_offset is_dropped
------------------- ---------------- -------------- ----------- ----------
1                   4                56             4           0
67108865            1000             175            8           1
3                   1000             175            1008        0
4                   4                56             2008        0
67108866            2000             175            2012        1
2                   2500             175            4012        0

From the example we can see, that each change which increase the fixed column length causes that the original column space is dropped and additional space is allocated.

Recovering the wasted space

It is great, that extending the column width of fixed length column is a metadata only operation as it is very quick and avoids blocking especially on large tables, but on the other side as we can see, this can cause a significant space wasting.

In case there will be less new inserts into the table than the current about of rows we do not need to take care about the wasted space much (from the point of wasted storage space) as the metadata change didn’t affect the current records and on the current records we are saving the space as we have extended the record length and only the new records inserted are wasting the space.

On the other side, if we know there will be a lot of inserts and reads of the newly inserted records, than it’s a good idea to recover the wasted space as the data will consume more space and further reads will have to read more unnecessary data.

In the comments to the original post there are some suggestions how to resolve the problem.

From my perspective if the table is already clustered, we do not need to crop and recreate the clustered index, but it is enough to REBUILD the index. The rebuild operation will reorder the data and free up the wasted space and also write the original data with new record length.

In the case of heap, creating and dropping clustered index will be quite costly operation which will move the data twice. On small tables this doesn’t matter but on larger amounts of data it will be better to do simple SELECT * INTO newTable from aTable and than simply drop the original table and rename the new one to the original one.

SELECT
*
INTO dbo.AlterTest2
FROM dbo.AlterTest
GO
SELECT
    DB_ID() AS DBID
    ,pl.*
    ,t.*
FROM dbo.AlterTest2 t
CROSS APPLY sys.fn_PhysLocCracker(%%physloc%%) pl
GO
DBID   file_id     page_id     slot_id     ID          col1  col2  col3
------ ----------- ----------- ----------- ----------- ----- ----- -----------
16     1           411         0           1           aaa   bbb   587800818
16     1           411         1           2           aaa   bbb   1396332306
16     1           412         0           3           aaa   bbb   844570652
16     1           412         1           4           aaa   bbb   1332601405
16     1           413         0           5           ccc   ddd   2059368981
16     1           413         1           6           ccc   ddd   1449062892
16     1           414         0           7           ccc   ddd   267569086
16     1           414         1           8           ccc   ddd   1325350591

We can see, that the new table has no wasted space and all pages are now allocated by two records.

Of course if we there are some foreign keys and indexes than those will have to be recreated. Anyway this will cost much less I/O and processing power than creating and dropping the clustered key

 Conclusion

As we can see fro the examples above, altering the fixed column length and increasing it, even it’s metadata  only operation causes, that the originally allocated space is dropped and new space is allocated in the row for all newly inserted or updated rows in the table. So be carefull when altering a fixed length columns especially when you are doing multiple alters to a single column as significat space can be wasted. Knowing the fact that the dropped columns have flag is_dropped = 1 in the sys.system_internals_partition_columns, we can use a below query to list all the tables containing DROPPED columns with wasted space.

SELECT distinct
	p.partition_id,
	p.object_id,
	o.name as table_name
FROM sys.system_internals_partitions p
INNER JOIN sys.objects o on p.object_id = o.object_id
INNER JOIN sys.system_internals_partition_columns pc ON p.partition_id = pc.partition_id
WHERE pc.is_dropped = 1
partition_id         object_id   table_name
-------------------- ----------- -----------
72057594040025088    1253579504  AlterTest2
72057594040090624    1269579561  AlterTest
Jun 122012
 

Among lot of new features introduced in SQL Server 2012 also a new Windowing functions were introduced. The new functionality allow us to use the ORDER BY clause in the OVER clause with aggregate functions and also new ROWS and RANGE clauses were introduced to limit rows. The ORDER BY allow us define the order of rows processing and the ROWS/RANGE clauses put limits on the rows being processed in partition. All the details related to the OVER clause you can find on MSDN: OVER Clause (Transact-SQL).

ROWS/RANGE clause

The ROWS clause limits the rows in a parittion by specifying a fixed number of rows preceding or folowing the current rows. The rows preceeding and following are determined by the order specified in the ORDER BY clause.

The limit can be specified by serveral methods:

  • <unsigned integer> PRECEDING -fixed number of preceding rows
  • CURRENT ROW – representing current row being processed
  • UNBOUNDED PRECEDING – all previous records
  • <unsigned integer> FOLLOWING – fixed number of following rows
  • UNBOUNDED FOLLOWING – all rows following current row

So we can specify the limits like

ROWS BETWEEN 3 PRECEEDING AND 1 FOLLOWING
ROWS BETWEEN UNBOUNDED PRECEEDING AND CURRENT ROW
ROWS BETWEEN CURRENT ROW AND 3 FOLLOWING

RANGE BETWEEN UNBOUNDED PRECEEDING AND CURRENT ROW
RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
RENGE CURRENT ROW

The RANGE clause can be only used with the UNBOUNDED limit and CURRENT ROW. The difference between ROWS and RANGE clause is, that ROWS works with physical rows and RANGE works with range of rows based on the current row value in the terms of ORDER BY clause. This means that for ROWS clause the CURRENT ROW represents the only current row being processed. For RANGE the CURRENT ROW represents all the rows with the same value in the fields specified in the ORDER BY clause within current partition as the current row being processed. So if we use RANGE and multiple rows have the same rank in the terms of order within the partition, then all those rows will represent current row.

When there is no ROWS/RANGE clause specified after the ORDER BY clause, then the default RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW is used by SQL Server.

Samples how to use the window functions

Let’s take a look on a few samples, how we can use the window functions and what results they will provide.

Test data preparation

To be able to test the new functionality

--======================
-- Create test database
--======================
CREATE DATABASE WindowFunctionsTest
GO
USE WindowFunctionsTest
GO 

--Create Testing Tables
CREATE TABLE [dbo].[Accounts](
	[TransactionID] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
	[TransactionDate] [datetime] NULL,
	[Balance] [float] NULL
)
GO
CREATE TABLE [dbo].[MultiAccounts](
	[TransactionID] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
	[AccountID] [int] NOT NULL,
	[TransactionDate] [datetime] NULL,
	[Balance] [float] NULL
)
GO
--Fill test tables with data
INSERT INTO [dbo].[Accounts](
	[TransactionDate],
	[Balance]
)
SELECT '2000-1-1', 100 UNION ALL
SELECT '2000-1-1', -50 UNION ALL
SELECT '2000-1-2', 200 UNION ALL
SELECT '2000-1-3', 500 UNION ALL
SELECT '2000-1-4', -200 UNION ALL
SELECT '2000-1-5', 1000 UNION ALL
SELECT '2000-1-5', -300 UNION ALL
SELECT '2000-1-6', -300 UNION ALL
SELECT '2000-1-7', -200 UNION ALL
SELECT '2000-1-8', 2000 UNION ALL
SELECT '2000-1-9', 100 UNION ALL
SELECT '2000-1-10', -50 UNION ALL
SELECT '2000-1-10', 500 UNION ALL
SELECT '2000-1-11', 200 UNION ALL
SELECT '2000-1-12', 200 UNION ALL
SELECT '2000-1-13', 1000 UNION ALL
SELECT '2000-1-14', 1000 UNION ALL
SELECT '2000-1-15', -500 UNION ALL
SELECT '2000-1-15', -300 UNION ALL
SELECT '2000-1-16', 1000 UNION ALL
SELECT '2000-1-17', 1000 UNION ALL
SELECT '2000-1-18', -800 UNION ALL
SELECT '2000-1-19', 2000 UNION ALL
SELECT '2000-1-20', -1000
GO

INSERT [dbo].[MultiAccounts] (
	[AccountID],
	[TransactionDate],
	[Balance]
)
SELECT 1, '2000-1-1', 100 UNION ALL
SELECT 1, '2000-1-1', -50 UNION ALL
SELECT 1, '2000-1-2', 200 UNION ALL
SELECT 1, '2000-1-3', 500 UNION ALL
SELECT 1, '2000-1-4', -200 UNION ALL
SELECT 1, '2000-1-5', 1000 UNION ALL
SELECT 1, '2000-1-5', -300 UNION ALL
SELECT 1, '2000-1-6', -300 UNION ALL
SELECT 1, '2000-1-7', -200 UNION ALL
SELECT 2, '2000-1-1', 2000 UNION ALL
SELECT 2, '2000-1-2', 100 UNION ALL
SELECT 2, '2000-1-3', -50 UNION ALL
SELECT 2, '2000-1-4', 500 UNION ALL
SELECT 2, '2000-1-5', 200 UNION ALL
SELECT 2, '2000-1-6', 200 UNION ALL
SELECT 2, '2000-1-7', 1000 UNION ALL
SELECT 2, '2000-1-7', 1000 UNION ALL
SELECT 3, '2000-1-1', 800 UNION ALL
SELECT 3, '2000-1-2', -300 UNION ALL
SELECT 3, '2000-1-3', 1000 UNION ALL
SELECT 3, '2000-1-4', 1000 UNION ALL
SELECT 3, '2000-1-5', -800 UNION ALL
SELECT 3, '2000-1-6', 2000 UNION ALL
SELECT 3, '2000-1-7', -1000
GO

Window functions samples

If we try any of below queries they will provide the same results

--Using the ROWS clause
SELECT
    [TransactionID]
    ,[TransactionDate]
    ,[Balance]
    ,SUM(Balance) OVER (ORDER BY TransactionDate, TransactionID ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS CummulativeBalance
FROM [dbo].[Accounts]
ORDER BY TransactionDate, TransactionID
GO
--The same as abowe the ROWS UNBOUNDED PRECEDING will be completed by SQL Server to ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
--If we specify only the left boundary, SQL Servers automaticaly fills the right BOUNDARY
SELECT
    [TransactionID]
    ,[TransactionDate]
    ,[Balance]
    ,SUM(Balance) OVER (ORDER BY TransactionDate, TransactionID ROWS UNBOUNDED PRECEDING ) AS CummulativeBalance
FROM [dbo].[Accounts]
ORDER BY TransactionDate, TransactionID
GO
--Using the RANGE Clause
SELECT
    [TransactionID]
    ,[TransactionDate]
    ,[Balance]
    ,SUM(Balance) OVER (ORDER BY TransactionDate, TransactionID RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS CummulativeBalance
FROM [dbo].[Accounts]
ORDER BY TransactionDate, TransactionID
GO
--The same as above as RANGE UNBOUNDED PRECEDING AND CURRENT ROW will be complete by SQL Server as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
--If we specify only the left boundary, SQL Servers automaticaly fills the right BOUNDARY
SELECT
    [TransactionID]
    ,[TransactionDate]
    ,[Balance]
    ,SUM(Balance) OVER (ORDER BY TransactionDate, TransactionID RANGE UNBOUNDED PRECEDING ) AS CummulativeBalance
FROM [dbo].[Accounts]
ORDER BY TransactionDate, TransactionID
GO
--NO ROWS/RANGE Clause (SQL Server will use the DEFAULT RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
SELECT
    [TransactionID]
    ,[TransactionDate]
    ,[Balance]
    ,SUM(Balance) OVER (ORDER BY TransactionDate, TransactionID) AS CummulativeBalance
FROM [dbo].[Accounts]
ORDER BY TransactionDate, TransactionID
GO

Results are below and we can see, a correct cumulative balance is calculated.

TransactionID TransactionDate         Balance                CummulativeBalance
------------- ----------------------- ---------------------- ----------------------
1             2000-01-01 00:00:00.000 100                    100
2             2000-01-01 00:00:00.000 -50                    50
3             2000-01-02 00:00:00.000 200                    250
4             2000-01-03 00:00:00.000 500                    750
5             2000-01-04 00:00:00.000 -200                   550
6             2000-01-05 00:00:00.000 1000                   1550
.             .                       .                      .
.             .                       .                      .
.             .                       .                      .
20            2000-01-16 00:00:00.000 1000                   5900
21            2000-01-17 00:00:00.000 1000                   6900
22            2000-01-18 00:00:00.000 -800                   6100
23            2000-01-19 00:00:00.000 2000                   8100
24            2000-01-20 00:00:00.000 -1000                  7100

ROWS clause with not unique order

SELECT
    [TransactionID]
    ,[TransactionDate]
    ,[Balance]
    ,SUM(Balance) OVER (ORDER BY TransactionDate ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS CummulativeBalance
FROM [dbo].[Accounts]
ORDER BY TransactionDate
GO

Results will be the same as in previous example. They are the same because there is no parallelism and we have CLUSTERED INDEX on the TransactionID (otherwise the final order could be different because the order of rows with the same TransactionDate is not guaranteed here.

TransactionID TransactionDate         Balance                CummulativeBalance
------------- ----------------------- ---------------------- ----------------------
1             2000-01-01 00:00:00.000 100                    100
2             2000-01-01 00:00:00.000 -50                    50
3             2000-01-02 00:00:00.000 200                    250
4             2000-01-03 00:00:00.000 500                    750
5             2000-01-04 00:00:00.000 -200                   550
6             2000-01-05 00:00:00.000 1000                   1550
7             2000-01-05 00:00:00.000 -300                   1250
.             .                       .                      .
.             .                       .                      .
.             .                       .                      .
20            2000-01-16 00:00:00.000 1000                   5900
21            2000-01-17 00:00:00.000 1000                   6900
22            2000-01-18 00:00:00.000 -800                   6100
23            2000-01-19 00:00:00.000 2000                   8100
24            2000-01-20 00:00:00.000 -1000                  7100

RANGE Clause with not unique order

SELECT
    [TransactionID]
    ,[TransactionDate]
    ,[Balance]
    ,SUM(Balance) OVER (ORDER BY TransactionDate RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS CummulativeBalance
FROM [dbo].[Accounts]
ORDER BY TransactionDate, TransactionID
GO

Here we see, that the results are quite different. The final sum is the same, but the intermediate are not.

TransactionID TransactionDate         Balance                CummulativeBalance
------------- ----------------------- ---------------------- ----------------------
1             2000-01-01 00:00:00.000 100                    50
2             2000-01-01 00:00:00.000 -50                    50
3             2000-01-02 00:00:00.000 200                    250
4             2000-01-03 00:00:00.000 500                    750
5             2000-01-04 00:00:00.000 -200                   550
6             2000-01-05 00:00:00.000 1000                   1250
7             2000-01-05 00:00:00.000 -300                   1250
8             2000-01-06 00:00:00.000 -300                   950
.             .                       .                      .
.             .                       .                      .
.             .                       .                      .
17            2000-01-14 00:00:00.000 1000                   5700
18            2000-01-15 00:00:00.000 -500                   4900
19            2000-01-15 00:00:00.000 -300                   4900
20            2000-01-16 00:00:00.000 1000                   5900
21            2000-01-17 00:00:00.000 1000                   6900
22            2000-01-18 00:00:00.000 -800                   6100
23            2000-01-19 00:00:00.000 2000                   8100
24            2000-01-20 00:00:00.000 -1000                  7100

Here we can see, that the RANGE works as described above. All rows with the same value in the ORDER BY clause are considered as current row. Therefore for the dates ‘2000/01/01′ ,  ‘2000/01/05′ and ‘2000/01/15′ the values for each date are the same.

Working with FOLLOWING Rows

All the examples above worked with current row and all previous rows. Except this we can even work with rows following current row in particular order.

Here are a few other examples incorporating also FOLLOWING rows.

--Sum of current row and all following rows
SELECT
    [TransactionID]
    ,[TransactionDate]
    ,[Balance]
    ,SUM(Balance) OVER (ORDER BY TransactionDate, TransactionID ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS CummulativeBalance
FROM [dbo].[Accounts]
ORDER BY TransactionDate, TransactionID

--SUM of 1 preceding, current and one following row
SELECT
    [TransactionID]
    ,[TransactionDate]
    ,[Balance]
    ,SUM(Balance) OVER (ORDER BY TransactionDate, TransactionID ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS CummulativeBalance
FROM [dbo].[Accounts]
ORDER BY TransactionDate, TransactionID

--SUM of all rows in each row
SELECT
    [TransactionID]
    ,[TransactionDate]
    ,[Balance]
    ,SUM(Balance) OVER () AS FinalBalance
FROM [dbo].[Accounts]
ORDER BY TransactionDate, TransactionID

Example with Partitioning results

In previous examples we have worked with single partition. The OVER clause also allows partitioning the results. So let see some a few examples with partitioning.

--Sum of current row and all following rows partitioned, by AccountID
SELECT
    [TransactionID]
	,[AccountID]
    ,[TransactionDate]
    ,[Balance]
    ,SUM(Balance) OVER (PARTITION BY [AccountID] ORDER BY TransactionDate, TransactionID ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS CummulativeBalance
FROM [dbo].[MultiAccounts]
ORDER BY AccountID, TransactionDate, TransactionID

--SUM of 1 preceding, current and one following row, partitioned by AccountID
SELECT
    [TransactionID]
	,[AccountID]
    ,[TransactionDate]
    ,[Balance]
    ,SUM(Balance) OVER (PARTITION BY [AccountID] ORDER BY TransactionDate, TransactionID ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS CummulativeBalance
FROM [dbo].[MultiAccounts]
ORDER BY AccountID, TransactionDate, TransactionID

--SUM of all rows in each row
SELECT
    [TransactionID]
	,[AccountID]
    ,[TransactionDate]
    ,[Balance]
    ,SUM(Balance) OVER (PARTITION BY [AccountID]) AS FinalBalance
FROM [dbo].[MultiAccounts]
ORDER BY AccountID, TransactionDate, TransactionID

--SUM of all preceeding and current row, partitioned by AccountID order is based only on TransactionDate - using RANGE
SELECT
    [TransactionID]
	,[AccountID]
    ,[TransactionDate]
    ,[Balance]
    ,SUM(Balance) OVER (PARTITION BY [AccountID] ORDER BY TransactionDate RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS CummulativeBalance
FROM [dbo].[MultiAccounts]
ORDER BY AccountID, TransactionDate

When to use ROWS and when RANGE

Now we can ask, when we should use the ROWS clause and when RANGE clause to limit the rows. The answer comes fro the definition how the ROWS and RANGE clauses works. As described, ROWS works with each unique rows and RANGE handles multiple rows with the same order position as current row.

So in case the combination of fields specified in the ORDER BY clause does not uniquely specify the order of rows (as in case of examples above when only TransactionDate was used), then you should use RANGE, as the processing order of rows with the same order position is not guaranteed. In case the rows are uniquely identified, then ROWS should be used as there are no rows with the same order in the partition.

Conclusion

The new windowing functions brings new possibilities in writing T-SQL queries can simplify a lot of tasks which were problematic to write without these constructs. It allow us to bypass the recursive CTE, other solutions for calculation of running totals or averages without knocking down the server and also allow us to bypass quirky updates, CLR solutions which have some pitfalls when are used.

In my next post I will take a closer look on the Running Totals problem when using this new windowing functionality. Also I will take a closer look on the query plans produced by those constructs and give some advices for using them.

Jun 062012
 

Probably you have come to situation when you have a SSIS project with higher count of packages and you need to add for example a parameter or variable to all or most of the packages.  Normally you have to open each package and add variable/parameter to all packages one by one and specify all the properties in each package.

Using the PowerShell will save you a lot of manual work and tons of clicks inside the SQL Server Data Tools (SSDT) or  Business Intelligence Development Studio (BIDS) as PowerShell will allow you to automate the manual work. It’s very easy to manipulate the packages by PowerShell as SSIS packages are pure XML Files and PowerShell has a good XML support.

For writing, testing and running PowerShell script I suggest you download and install PowerGUI which has great IntelliSense support and is probably best PowerShell editor available. Also I suggest you to install the Run AddOn which allows you to execute only selected part of script.

Also before you start executing the scripts, ensure that you have properly set ExecutionPolicy to allow script execution. You have to run it with elevated Administrator privileges.

#Enable Local not signed Script Execution
Set-ExecutionPolicy RemoteSigned

Example how to manipulate SSIS using PowerShell

In this example we will manipulate SSIS 2012 packages and we will add a new parameter to the packages.

First thing you have to do is to retrieve list of SSIS package to be processed. This can be easily done by the Get-ChildItem cmdlet.

$files = Get-ChildItem "C:MySSISProjects*.dtsx"

Once we have the list of file to be processed we can start processing file by file using the foreach structure. In this example we will add a SSISTaskID package parameter.

#Get List of all relevat SSIS Packages
$files = Get-ChildItem "C:MyProjectsMySSISProject*.dtsx"

#process file by file
foreach($file in $files)
{
	#get the content of SSIS package as XML
	$dts = [xml](Get-Content $file.FullName)

	#create XmlNamespaceManager
	$mng = [System.Xml.XmlNamespaceManager]($dts.NameTable)
	#add a DTS namespace to the XmlNamespaceManager
	$mng.AddNamespace("DTS", "www.microsoft.com/SqlServer/Dts")

	#use XPath query to get DTS:PackageParameters node
	$params = $dts.SelectSingleNode("/DTS:Executable/DTS:PackageParameters[1]", $mng)

	#use XPath query to get eventual existing SSISTaskID parameter
	$param = $dts.SelectSingleNode("/DTS:Executable/DTS:PackageParameters/DTS:PackageParameter[@DTS:ObjectName='SSISTaskID'][1]", $mng)

	if ($param -eq $null) #parameter does not exists
	{
		#Create New Package parameter with prefix DTS and correct namespace uri
		$param = $dts.CreateElement("DTS", "PackageParameter", "www.microsoft.com/SqlServer/Dts")

		$attr = $dts.CreateAttribute("DTS", "CreationName", "www.microsoft.com/SqlServer/Dts")
		$param.Attributes.Append($attr)

		#Parameter Data Type
		$attr = $dts.CreateAttribute("DTS", "DataType", "www.microsoft.com/SqlServer/Dts")
		$attr.Value = "3" #DataType = 3 is equal to integer
		$param.Attributes.Append($attr)

		##Description
		$attr = $dts.CreateAttribute("DTS", "Description", "www.microsoft.com/SqlServer/Dts")
		$attr.Value = "This is a TaskID"
		$param.Attributes.Append($attr)

		#DTSID - unique GUID for created parameter
		$attr = $dts.CreateAttribute("DTS", "DTSID", "www.microsoft.com/SqlServer/Dts")
		$attr.Value = "{" + [System.Guid]::NewGuid().toString().ToUpper() + "}" #Each parameter has unique GUID in SSIS package, so let's generate one
		$param.Attributes.Append($attr)

		#parameter name
		$attr = $dts.CreateAttribute("DTS", "ObjectName", "www.microsoft.com/SqlServer/Dts")
		$attr.Value = "SSISTaskID"
		$param.Attributes.Append($attr)

		#Parameter Value property - setting the parameter value
		$paramValue = $dts.CreateElement("DTS", "Property", "www.microsoft.com/SqlServer/Dts");

		#setting attributes of Parameter Value Property
		$attr = $dts.CreateAttribute("DTS", "DataType", "www.microsoft.com/SqlServer/Dts")
		$attr.Value = "3"
		$paramValue.Attributes.Append($attr)

		$attr = $dts.CreateAttribute("DTS", "Name", "www.microsoft.com/SqlServer/Dts")
		$attr.Value = "ParameterValue"
		$paramValue.Attributes.Append($attr)

		#Set the value of the ParameterValue property
		$paramValue.InnerText = "0"

		#add the property to the Parameter
		$param.AppendChild($paramValue)		

		#Add the Parameter to the Params collection
		$params.AppendChild($param)
	}		

	################################################################################################
	#  Once the Parameter is created it is time to write the package back to disk
	################################################################################################

	#Create XmlWriterSettings as we want to format the SSIS package properly
	[System.Xml.XmlWriterSettings] $settings = New-Object -TypeName System.Xml.XmlWriterSettings

	$settings.Indent = $true #Indent child nodes
	$settings.NewLineOnAttributes = $true #put each attribude on separate line

	#Create XmlWriter with appropriate settings and path to overwrite the original package
	[System.Xml.XmlWriter]$writer = [System.Xml.XmlWriter]::Create($file.FullName, $settings)

	#Save the package using the XmlWriter
	$dts.Save($writer)

	#Close the writer
	$writer.Close()
}

In the script we are using XPath query to get relevant Parameter Nodes. This can be easily used for locating whatever node in the XML source is necessary. For more details related to XPath you can take a look  XPath Reference on MSDN. I our example we take first DTS:PackageParameters node under the DTS:Executable node.

For correct querying we have to use XmlNameSpaceManager as all the nodes in SSIS Package are prefixed with DTS name space.

For checking whether the parameter we would like to add exists or not we again use XPath with appropriate filters to find it by Name.

Then if the parameter doesn’t exists, we create a new one. What attributes and child nodes are necessary for such parameter you find out easy. Create the parameter using BIDS or SSDT in one of the SSIS packages, then take a look into the source code of the package and you will be able to find all the information necessary. For example in our case the parameter node in the source XML looks as below:

<DTS:PackageParameter
    DTS:CreationName=""
    DTS:DataType="3"
    DTS:Description="This is a TaskID"
    DTS:DTSID="{3252C760-114A-4E8F-86EB-D3E6BAACA0E4}"
    DTS:ObjectName="SSISTaskID">
    <DTS:Property
    DTS:DataType="3"
    DTS:Name="ParameterValue">0</DTS:Property>
</DTS:PackageParameter>

So we had to create a PackageParameterNode with CreationName, DataType, Description, DTSID and ObjectName attributes and also with child node Property, which represents default value of the parameter.

As each parameter, variable or whatever part in SSIS has it’s own unique DTSID property, which is GUID, we have used the [System.Guid]::NewGuid() method to generate unique DTSID for our parameter.

Finally once the changes into the SSIS package are applied, it is necessary to write it back to the file system. By default the source of the SSIS package is indented and and each attribute is written on separate line. SSIS packages are written in that way to be more human readable and allow an easy processing and comparison by versioning systems.

To achieve the same formatting we have to create and use instance of the XmlWriter class with appropriate XmlWriterSettings. As you can see from the script we set Indent and NewLineOnAttributes properties to true.

Once the writer is constructed, we can write the package back to the file system. It is also important to close the writer once the saving is done otherwise you will keep open handles to the processed files.

In the example we are also overwriting the original package as we specify the $file.FullName as path for the XmlWriter class. If we want, we can provide whatever name for the package and save it under different name. In case we would like to add a “_modified” suffix to the file name, we can use:

[System.IO.Path]::Combine($file.DirectoryName, $file.BaseName + "_modified") + $file.Extension

Conclusion

As we can see from the example, the SSIS manipulation using PowerShell is quite easy and after a small training with writing this modification also very quick and will save you a lot of manual work if you have to modify a higher count of packages. You can alter the above script slightly to do nearly whatever modification is necessary.

This example was done on SSIS 2012 package source, but can be relatively easy applied also on SSIS 2005 and SSI 2005 although the XML source of those packages are not so easy readable as the source of SSIS 2012.

Apr 122012
 

In one of my previous posts SSIS 2008 Columns To XML Data Flow Transformation I have presented a custom SSIS Data Flow Transformation.

As mentioned in the post, to be able to use the component, it has to be put into PipelineComponents folder of the SSIS Installation (For SQL Server 2008 common location is C:Program FilesMicrosoft SQL Server100DTSPipelineComponents) and also it needs to be installed into GAC(Global Assembly Cache). The location in the PipelineComponents is important for the BIDS designer and the GAC installation is important for SSIS execution.

Placing the component assembly into the PipelineComponents folder is very straightforward as you simply copy/past the assembly .dll file into this directory.

On the other side, installation into the GAC can be a little bit tricky and depend on the OS and .Net Framework version for which the assembly is built.

Components for SQL Server 2005 and 2008 (.Net < 4.0)

If the component is build for SQL Server 2005 or 2008, then yo use .Net framework lower than 4.0 and the installation relatively easy.

Installation on Windows Server 2003 and Windows XP

Installation of the .Net 2.0 or .Net 3.5 assemblies on the Windows Server or Windows XP is relatively easy as the GAC is located in C:Windowsassembly directory and can be easily managed by the windows Explorer. Simply navigate into this directory and Drag-Drop the assembly .dll file(s) here.

If you need to uninstall the assembly, simply find your assembly in the GAC, right-click on it and choose Uninstall. You will need to uninstall the assembly if you would like to deploy an updated version (eg. because of bug fix or for whatever reason). In that case you have to uninstall such assembly first and the install it again.

Installation on Windows Server 2008, Vista ans Windows 7

Installation of assembly into GAC on the Windows Server 2008, Vista and Windows 7 can be a little bit trickier. However the assemtrickierbly has the same location as on Windows Server 2003 and Windows XP (C:Windowsassembly), those systems are missing the explorer integration for Installation purposes (you cannot Drag-Drop assemblies into that directory).

For that purpose you can use the gacutil.exe but this is not part of the standard installation. You have to install the Windows SDK. Once installed, then this utility is located in C:Program Files (x86)Microsoft SDKsWindowsv7.0ABin directory. You do not need to install the SDK on all the machines where you need to deploy the assembly. It is enough to copy the gacutil.exe and gacutil.exe.config files to the target machines and use it to install/un-install the assembly. You have to execute it from command prompt with administrative privileges, so in case of UAC enabled, you have to run the command prompt AS Administrator.

To install use the gacutil.exe with the /i switch:

gacutil.exe /i "C:Program FilesMicrosoft SQL Server100DTSTE.SSIS.DataFlow.dll"

To list all the assemblies in GAC you can use /l switch:

gacutil.exe /l

Eventually you can put assembly name filter after the /l switch (it will list all installed versions of the assembly, as GAC support multiple versions of the same assembly to be installed at the same time):

gacutil.exe /l TE.SSIS.DataFlow

To uninstall an assembly from GAC use the /u switch:

gacutil.exe /u TE.SSIS.DataFlow

Components for SQL Server 2012 (.Net  4.0) and .Net 4.0 GAC Structure

With SQL Server 2012 the things start to be a little more tricky than with previous versions as components needs to be compiled under .Net framework 4.0.

In .Net 4.0 the GAC location has been changed and is located under C:WindowsMicrosoft.NETassembly directory.

As we can see, the structure has changed and there are separate directories for each platform. When building the components for SSIS we build them for MSIL (as it is platform independed and allow us to run the packages in both x32 and x64 runtime). In fact the structure is the same as for c:windowsassembly, but for the c:windowsassembly it is hidden in explorer. If you list the c:windowsassembly directory by dir command, you should see the same three directories on x64 system.

Each assembly has another directory in the GAC corresponding to the assembly name and under this directory there are directories for each version of the assembly and are named using the assembly strong name. Then  under this strong name directory are located the assembly .dlls

Installing .Net 4.0 assembly into GAC

As mentioned above, the GAC location and structure has changed in .Net 4.0. Also it is not possible to use the tools for .Net 3.5 and lower (I men gacutil.exe for older framework) and for security reason, you cannot also use simply Copy-Paste into GAC directory as it was for .Net < 4.0 on Server 2003 and Windows XP.

Again in you install Windows SDK and go through the directory structure, you will find tools for .Net 4.0 under the C:Program Files (x86)Microsoft SDKsWindowsv7.0ABinNETFX 4.0 Tools

In this directory you will find also the gacutil.exe for .Net 4.0. I was able to use it and install the assembly into GAC without problems on Windows 7 in the same way as it was mentioned above.

But when I try to copy it and use it on Windows Server 2008, I was not able to get the gacutil.exe for .net 40 working on it. gacutil.exe only prints a copyright information and that’s all.

 PowerShell the ultimate GAC installation solution

As I was not able to install the assemblies using the gacutil.exe I was starting to search for alternate solution of GAC installation and found the PowerShell.

You are able to install the assemblies to GAC on all above mentioned systems using a simple PowerShell Script.  For this purposes PowerShell will utilize System.EnterpriseServices.Internal.Publishh class which is able to install and uninstall the assemblies from GAC.

Only note to the Publish class. When you use it and make some typo or something, the GacInstall and GacRemove methods do not print any error message. So you have to be careful when typing the paths etc. To verify that the assembly was successfully installed /uninstalled to/ from GAC, simply browse to the GAC directory using Explorer and search for your assembly.

Install script we will use GacInstall method of the Publish class.

#Load System.EnterpriseServices assembly as it contain classes to handle GAC
[Reflection.Assembly]::LoadWithPartialName("System.EnterpriseServices")

#Create instance of Publish class which can handle GAC Installation and/or removal
[System.EnterpriseServices.Internal.Publish] $publish = new-object System.EnterpriseServices.Internal.Publish;

#Install dll into GAC using GacInstall method (Provide full path to the assembly)
$publish.GacInstall("C:Program Files\Microsoft SQL Server\110\DTS\TE.SSIS.DataFlow.dlll");

To uninstall an assembly you can use the GacRemove method of the Publish class. When uninstalling, you have to provide the complete path to the assembly dll.

#Remove from GAC using GacRemove method (Provide full path to the assembly in GAC)
$publish.GacRemove("C:\Windows\Microsoft.NET\assembly\GAC_MSIL\TE.SSIS.DataFlow\v4.0_1.0.0.0__0eedace497e8d607\TE.SSIS.DataFlow.dll")

You can save the script into a script file e.g. GACInstall.ps1 and execute it from command line:

powershell.exe -noprofile -executionpolicy RemoteSigned -file c:scriptsGACInstall.ps1

Alternatively you can run the PowerShell console and run the command by command. Again PowerShell needs to be executed with Administrative privileges so take UAC in mind.

If you would like to use the PowerShell ISE (Integrated Scripting Environment) and execute the script form there, don’t forget set the Execution policy appropriately:

#Enable Local not signed Script Execution
Set-ExecutionPolicy RemoteSigned;

If you want to use the ISE also on Windows Server 2008, you have to enable that feature on the server first, as the PowerShell ISE is not installed and enabled by default on Windows Server 2008. So go to Server Manager,  choose Add Features and add the PowerShell ISE Feature.

Using the PowerShell, you are independed of any tools like gacutil.exe etc. and you are able to install whatever assembly on whatever system which has PowerShell available.

Hope this post will help you saving troubles when deploying custom SSIS packages to server.

 Posted by at 12:05 pm
Jan 272012
 

Some times it may happen, that you need to remove accent (diacritics) from string when you are querying data or when you are storing data into particular table.

There are several methods how to achieve this. One of the possible solution is using a CONVERT with COLLATE as Twitted by MVP Mladen Prajdic and mentioned on Luke Jian blog, but this method has an issue I will mention below.

Other solution could be replacement of accented characters with the ones without accent. But this is a very problematic solution as you have to do a lot of replacement (the performance can suffer) and it will be problematic to cover all the possible accent characters and you easily miss one. Similar to this could be creation of a replacement table and implementation of function similar to Splitting function using Tally Table (Jeff Moden). But again you will have to write all possible combination of accented and not accented characters into the replacement table.

So if you do not want to change the collation and would like to avoid possible issues of the CONVERT method and avoid writing replacement tables etc., you can easily use a very simple amd known CLR method. This method is using string normalization and CharUnicodeInfo class from the System.Globalization name space. Based on this you can write a very simple scalar CLR function.

using System;
using System.Collections.Generic;
using System.Text;
using Microsoft.SqlServer.Server;
using System.Data.SqlTypes;
using System.Globalization;

public class StringsCommon
{
    /// <summary>
    /// Removes Accent (Diacritics) from string
    /// </summary>
    /// <param name="sourceString">Source string fro wchich accent should be removed</param>
    /// <returns>string without accent</returns>
    [SqlFunction(IsDeterministic = true)]
    public static SqlChars RemoveAccent(SqlString sourceString)
    {
        if (sourceString.IsNull)
            return SqlChars.Null;

        string normalized = sourceString.Value.Normalize(NormalizationForm.FormD);

        StringBuilder output = new StringBuilder(sourceString.Value.Length);

        foreach (char ch in normalized)
        {
            if (CharUnicodeInfo.GetUnicodeCategory(ch) != UnicodeCategory.NonSpacingMark)
                output.Append(ch);
        }
        return new SqlChars(output.ToString());
    }
}

Once you compile above mentioned function into an .net assembly, you can register the assembly and function in your database.

CREATE ASSEMBLY [PPSqlClrSafe]
AUTHORIZATION [dbo]
FROM 'C:CLRPPSqlClrSafe.dll'
WITH PERMISSION_SET = SAFE
GO

CREATE FUNCTION [ClrSafe].[fn_RemoveAccent](
	@sourceString [nvarchar](max)  --Source string to remove accent
)
RETURNS [nvarchar](4000)
WITH EXECUTE AS CALLER
AS
EXTERNAL NAME [PPSqlClrSafe].[StringsCommon].[RemoveAccent]
GO

Once the function is registered, you can easily use it to remove accent (diacritics) from whatever string you want

WITH Data AS (
    SELECT
        'Czech' AS [Language]
        ,N'á, é, í, ó, ú, ý, č, ď, ě, ň, ř, š, ť, ž, ů' AS [AccentChars] UNION ALL
    SELECT
        'Norwegian' AS [Language]
        ,N'ø' AS [AccentChars] UNION ALL
    SELECT
        'Estonian' AS [Language]
        ,N'õ' AS [AccentChars] UNION ALL
    SELECT
        'French' AS [Language]
        ,N'à, â, ç, é, è, ê, ë, î, ï, ô, ù, û, ü' AS [AccentChars] UNION ALL
    SELECT
        'Croatian, Vietnamese' AS [Language]
        ,N'đ' AS [AccentChars] UNION ALL
    SELECT
        'Latvian' AS [Language]
        ,N'ā, ē, ī, ū, ļ, ķ, ņ, ģ' AS [AccentChars] UNION ALL
    SELECT
        'Hungarian' AS [Language]
        ,N'ö, ü, ő, ű' AS [AccentChars] UNION ALL
    SELECT
        'Polish' AS [Language]
        ,N'ą, ć, ę, ń, ó, ś, ź, ż' AS [AccentChars] UNION ALL
    SELECT
        'Romanian' AS [Language]
        ,N'ă, â, î, ș, ț' AS [AccentChars] UNION ALL
    SELECT
        'Slovak' AS [Language]
        ,N'ô, ť, ľ, ŕ, ĺ, ä' AS [AccentChars] UNION ALL
    SELECT
        'Spanish' AS [Language]
        ,N'ñ' AS [AccentChars] UNION ALL
    SELECT
        'Swedish' AS [Language]
        ,N'å' AS [AccentChars] UNION ALL
    SELECT
        'Turkish' AS [Language]
        ,N'ç, ş, ğ' AS [AccentChars] UNION ALL
    SELECT
        'Greece' AS [Language]
        ,N'Γ, δ, ξ, Φ' AS [AccentChars]
)
SELECT
    [Language]
    ,[AccentChars]
    ,[ClrSafe].fn_RemoveAccent([AccentChars]) AS [RemovedAccent]
FROM Data

Which produces below result:

Language             AccentChars                                 RemovedAccent
-------------------- ------------------------------------------- --------------------------------------------
Czech                á, é, í, ó, ú, ý, č, ď, ě, ň, ř, š, ť, ž, ů a, e, i, o, u, y, c, d, e, n, r, s, t, z, u
Norwegian            ø                                           ø
Estonian             õ                                           o
French               à, â, ç, é, è, ê, ë, î, ï, ô, ù, û, ü       a, a, c, e, e, e, e, i, i, o, u, u, u
Croatian, Vietnamese đ                                           đ
Latvian              ā, ē, ī, ū, ļ, ķ, ņ, ģ                      a, e, i, u, l, k, n, g
Hungarian            ö, ü, ő, ű                                  o, u, o, u
Polish               ą, ć, ę, ń, ó, ś, ź, ż                      a, c, e, n, o, s, z, z
Romanian             ă, â, î, ș, ț                               a, a, i, s, t
Slovak               ô, ť, ľ, ŕ, ĺ, ä                            o, t, l, r, l, a
Spanish              ñ                                           n
Swedish              å                                           a
Turkish              ç, ş, ğ                                     c, s, g
Greece               Γ, δ, ξ, Φ                                  Γ, δ, ξ, Φ

When you compare it to the CONVERT with COLLATION method, you can find that this doesn’t have the problem with non existen characters. As the CONVERT converts the characters to particular character set which does not need to contain all the charecters of the source character set. We can see this e.g. on the Norwegian, and Greece characters.

If we use the CONVERT with COLLATION method we receive:

WITH Data AS (
    SELECT
        'Czech' AS [Language]
        ,N'á, é, í, ó, ú, ý, č, ď, ě, ň, ř, š, ť, ž, ů' AS [AccentChars] UNION ALL
    SELECT
        'Norwegian' AS [Language]
        ,N'ø' AS [AccentChars] UNION ALL
    SELECT
        'Estonian' AS [Language]
        ,N'õ' AS [AccentChars] UNION ALL
    SELECT
        'French' AS [Language]
        ,N'à, â, ç, é, è, ê, ë, î, ï, ô, ù, û, ü' AS [AccentChars] UNION ALL
    SELECT
        'Croatian, Vietnamese' AS [Language]
        ,N'đ' AS [AccentChars] UNION ALL
    SELECT
        'Latvian' AS [Language]
        ,N'ā, ē, ī, ū, ļ, ķ, ņ, ģ' AS [AccentChars] UNION ALL
    SELECT
        'Hungarian' AS [Language]
        ,N'ö, ü, ő, ű' AS [AccentChars] UNION ALL
    SELECT
        'Polish' AS [Language]
        ,N'ą, ć, ę, ń, ó, ś, ź, ż' AS [AccentChars] UNION ALL
    SELECT
        'Romanian' AS [Language]
        ,N'ă, â, î, ș, ț' AS [AccentChars] UNION ALL
    SELECT
        'Slovak' AS [Language]
        ,N'ô, ť, ľ, ŕ, ĺ, ä' AS [AccentChars] UNION ALL
    SELECT
        'Spanish' AS [Language]
        ,N'ñ' AS [AccentChars] UNION ALL
    SELECT
        'Swedish' AS [Language]
        ,N'å' AS [AccentChars] UNION ALL
    SELECT
        'Turkish' AS [Language]
        ,N'ç, ş, ğ' AS [AccentChars] UNION ALL
    SELECT
        'Greece' AS [Language]
        ,N'Γ, δ, ξ, Φ' AS [AccentChars]
)
SELECT
    [Language]
    ,[AccentChars]
    ,CONVERT(varchar(50), [AccentChars]) COLLATE Cyrillic_General_CI_AI AS [RemovedAccent]
FROM Data

Results:

Language             AccentChars                                 RemovedAccent
-------------------- ------------------------------------------- --------------------------------------------
Czech                á, é, í, ó, ú, ý, č, ď, ě, ň, ř, š, ť, ž, ů a, e, i, o, u, y, c, d, e, n, r, s, t, z, u
Norwegian            ø                                           o
Estonian             õ                                           o
French               à, â, ç, é, è, ê, ë, î, ï, ô, ù, û, ü       a, a, c, e, e, e, e, i, i, o, u, u, u
Croatian, Vietnamese đ                                           d
Latvian              ā, ē, ī, ū, ļ, ķ, ņ, ģ                      a, e, i, u, l, k, n, g
Hungarian            ö, ü, ő, ű                                  o, u, o, u
Polish               ą, ć, ę, ń, ó, ś, ź, ż                      a, c, e, n, o, s, z, z
Romanian             ă, â, î, ș, ț                               a, a, i, ?, ?
Slovak               ô, ť, ľ, ŕ, ĺ, ä                            o, t, l, r, l, a
Spanish              ñ                                           n
Swedish              å                                           a
Turkish              ç, ş, ğ                                     c, s, g
Greece               Γ, δ, ξ, Φ                                  ?, ?, ?, ?

From above we can see, that the CLR solution provides much better results and you do not loose any characters by conversion to non Unicode character set

If you are interested, you can grab sample C# project here: PPSqlClrSafe_AccentRemoval,zip

Any comments are welcomed.