Showing posts with label SQL. Show all posts
Showing posts with label SQL. Show all posts

PIVOT - UNPIVOT operator

Explain PIVOT and UNPIVOT operators in SQL

PIVOT and UNPIVOT are relational operators which were introduced in SQL Server 2005. These operators allow easy transformation of table-valued expression into another table.

PIVOT allows you to rotate rows into columns during the execution of an SQL Query. It performs an aggregation, and merges possible multiple rows into a single row in the output.

UNPIVOT does almost the opposite of PIVOT operator by rotating columns into rows, but it is not exactly opposite. PIVOT merges multiple rows into single while UNPIVOT fails to reproduce the original table due to the merged rows. Null input values to UNPIVOT disappear in its output.

A simple example of PIVOT operator

Suppose we have a dbo.currency_rate table which consists of three columns - in_currency_id, mn_rate and dt_currency.

-- Create table
CREATE TABLE dbo.currency_rate(in_currency_id int, mn_rate money, dt_currency datetime)

-- Insert data in table
INSERT INTO dbo.currency_rate VALUES (1, 60.15, '2015-06-15 09:59:16.640')
INSERT INTO dbo.currency_rate VALUES (2, 35.57, '2015-06-15 09:59:16.640')
INSERT INTO dbo.currency_rate VALUES (3, 78.90, '2015-06-15 09:59:16.640')
INSERT INTO dbo.currency_rate VALUES (4, 87.15, '2015-06-15 09:59:16.640')
INSERT INTO dbo.currency_rate VALUES (1, 10.18, '2015-06-16 09:59:16.640')
INSERT INTO dbo.currency_rate VALUES (2, 11.25, '2015-06-16 09:59:16.640')
INSERT INTO dbo.currency_rate VALUES (3, 08.48, '2015-06-16 09:59:16.640')
INSERT INTO dbo.currency_rate VALUES (4, 18.31, '2015-06-16 09:59:16.640')

-- Select and confirm the data before any operation
SELECT * FROM dbo.currency_rate

in_currency_id mn_rate dt_currency
1 60.15 2015-06-15 09:59:16.640
2 35.57 2015-06-15 09:59:16.640
3 78.90 2015-06-15 09:59:16.640
4 87.15 2015-06-15 09:59:16.640
1 10.18 2015-06-16 09:59:16.640
2 11.25 2015-06-16 09:59:16.640
3 8.48 2015-06-16 09:59:16.640
4 18.31 2015-06-16 09:59:16.640


Use PIVOT operator to transform rows into columns:

-- Pivot table with two rows and five columns
SELECT *
FROM
[dbo].[currency_rate]
PIVOT
(
AVG(mn_rate)
FOR [in_currency_id] IN ([1], [2], [3], [4])
) AS pivot_table;


Output:

dt_currency 1 2 3 4
2015-06-15 09:59:16.640 60.15 35.57 78.90 87.15
2015-06-16 09:59:16.640 10.18 11.25 8.48 18.31


A simple example of UNPIVOT operator

In continuation with the PIVOT operator example, save the result of the PIVOT operator in a temporary table:

SELECT *
INTO #tmp_pivot_table
FROM
[dbo].[currency_rate]
PIVOT
(
AVG(mn_rate)
FOR [in_currency_id] IN ([1], [2], [3], [4])
) AS pivot_table;


Use UNPIVOT operator to transform rows into columns:

SELECT [dt_currency], [in_currency_id], [mn_rate]
FROM #tmp_pivot_table
UNPIVOT
(
mn_rate
FOR in_currency_id IN ([1], [2], [3], [4])
) AS unpivot_table

Output:

dt_currency in_currency_id mn_rate
2015-06-15 09:59:16.640 1 60.15
2015-06-15 09:59:16.640 2 35.57
2015-06-15 09:59:16.640 3 78.90
2015-06-15 09:59:16.640 4 87.15
2015-06-16 09:59:16.640 1 10.18
2015-06-16 09:59:16.640 2 11.25
2015-06-16 09:59:16.640 3 8.48
2015-06-16 09:59:16.640 4 18.31

Conversion of Seconds to HH:MM:SS format

Recently I faced a requirement of converting seconds to HH:MM:SS format. After some R&D, I found the following solution:

CODE

DECLARE @in_seconds int

SET @in_seconds = 3661 -- One Hour One Minute and One Second

SELECT CONVERT(CHAR(8), DATEADD(SECOND, @in_seconds, 0), 108) As Hour_Minute_Second

OUTPUT

01:01:01

Note: This SQL code is applicable only for time less than 24 hours.

To overcome this limitation of 24 hours, I created the following function, which has the ability to return correct time duration for large time duration in seconds:

USE [AdventureWorks]
GO
/****** Object:  UserDefinedFunction [dbo].[fnc_convert_seconds_to_HHMMSS]    Script Date: 10/02/2013 08:48:33 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO

CREATE FUNCTION [dbo].[fnc_convert_seconds_to_HHMMSS]
(
@dc_time decimal(18,2)
)

RETURNS VARCHAR(20)

AS

BEGIN
RETURN REPLACE(STR(CONVERT(INT,@dc_time/3600), LEN(LTRIM(CONVERT(INT,@dc_time/3600)))) + ':' + STR(CONVERT(INT,(@dc_time/60)%60), 2) + ':' + STR(@dc_time%60, 2), ' ', '0')
END

Example: SELECT dbo.fnc_convert_seconds_to_HHMMSS(36460) AS vc_time_in_HHMMSS

Output: 10:07:40

Conversion of Seconds to Hours:Minutes:Seconds format

Recently I faced a requirement of converting seconds to Hours:Minutes:Seconds format. After some R&D, I found the following solution:

CODE

DECLARE @in_seconds int

SET @in_seconds = 3661 -- One Hour One Minute and One Second

SELECT CONVERT(CHAR(8), DATEADD(SECOND, @in_seconds, 0), 108) As Hour_Minute_Second

OUTPUT

01:01:01

Note: This SQL code is applicable only for time less than 24 hours.

Ranking functions in SQL

Ranking functions introduced with SQL Server 2005 allow us to assign a number to each row returned from a query. They allow us to rank each row in respect to others in several different ways. Ranking functions can be used only with the SELECT and ORDER BY statements. They cannot be used directly in a WHERE or GROUP BY clause, but can be used in a CTE or derived table.

The syntax for ranking functions is shown as follows:

<function_name>() OVER([PARTITION BY <partition_by_list>]
ORDER BY <order_by_list>)
  • function_name: Can be one of ROW_NUMBER, RANK, DENSE_RANK, and NTILE
  • OVER: Defines the details of how the ranking should order or split the data
  • PARTITION BY: Details which data the column should use as the basis of the splits
  • ORDER BY: Details the ordering of the data
There are four ranking functions in SQL Server:
  1. ROW_NUMBER – It allows us to provide incrementing sequential integer values to the rows in the result-set of a query based on logical order that is specified in the ORDER BY subclause of the OVER clause. The ROW_NUMBER function contains the OVER clause, which the function uses to determine the numbering behavior. The ORDER BY option, which determines the order in which the function applies the numbers, must be included in the query. We have the option of starting the numbers over whenever the values of a specified column change, called partitioning, with the PARTITION BY clause.

    Example Query 1: Basic use of ROW_NUMBER()
    SELECT CustomerID, FirstName + ' ' + LastName AS Name,
    ROW_NUMBER() OVER (ORDER BY LastName, FirstName) AS Row
    FROM Sales.Customer AS c
    INNER JOIN Person.Person AS p ON c.PersonID = p.BusinessEntityID;

    Example Query 2: Using ROW_NUMBER() in a CTE
    WITH customers AS(
      SELECT CustomerID, FirstName + ' ' + LastName AS Name,
      ROW_NUMBER() OVER (ORDER BY LastName, FirstName) AS Row
      FROM Sales.Customer AS c
      INNER JOIN Person.Person AS p ON c.PersonID = p.BusinessEntityID
      )
    SELECT CustomerID, Name, Row
    FROM customers
    WHERE Row > 50
    ORDER BY Row;

    Example Query 3: Using PARTITION BY option in ROW_NUMBER()
    SELECT CustomerID, FirstName + ' ' + LastName AS Name, c.TerritoryID,
    ROW_NUMBER() OVER (PARTITION BY c.TerritoryID ORDER BY LastName, FirstName) AS Row
    FROM Sales.Customer AS c
    INNER JOIN Person.Person AS p ON c.PersonID = p.BusinessEntityID;

  2. RANKRANK assigns the same number to the duplicate rows and skips numbers not used. It assigns an ascending, nonunique ranking number to a set of rows, giving the same number to duplicate rows; numbers are skipped for the number of rows that have the same value.
    If rows 2 and 3 are duplicates, RANK will supply the values 1, 3, 3, and 4.

    Example Query 1: Using RANK
    SELECT ROW_NUMBER() OVER (PARTITION BY SUBSTRING(LastName,1,2) ORDER BY LastName) AS RowNum,
    RANK() OVER(ORDER BY SUBSTRING(LastName,1,2) ) AS Ranking, CONCAT(FirstName,' ',LastName) AS CustomerName, UnclearedBalance
    FROM CustomerDetails.Customers
    WHERE UnclearedBalance is not null
    ORDER BY Ranking

  3. DENSE_RANK - This is similar to RANK, but each row number returned will be one greater than the previous setting, no matter how many rows are the same. DENSE_RANK doesn’t skip numbers.
    If rows 2 and 3 are duplicates, DENSE_RANK will supply the values 1, 2, 2, and 3.

    Example Query:
    SELECT ROW_NUMBER() OVER (PARTITION BY SUBSTRING(LastName,1,2) ORDER BY LastName) AS RowNum,
    DENSE_RANK() OVER(ORDER BY SUBSTRING(LastName,1,2) ) AS Ranking, CONCAT(FirstName,'',LastName) AS CustomerName, UnclearedBalance
    FROM CustomerDetails.Customers
    WHERE UnclearedBalance is not null
    ORDER BY Ranking

    Note: Both RANK and DENSE_RANK are similar to the ROW_NUMBER function, but they produce the same ranking value in all rows that have the same logical ordering value.
    The difference between RANK and DENSE_RANK is that RANK indicates how many rows have a lower ordering value, whereas DENSE_RANK indicates how many distinct ordering values are lower.
    For example, a rank of 9 indicates eight rows with lower values. A dense rank of 9 indicates eight distinct lower values.

  4. NTILE – The NTILE function assigns buckets to groups of rows. It allows us to associate the rows in the result with tiles (equally sized groups of rows) by assigning a tile number to each row. This takes the rows from the query and places them into an equal (or as close to equal as possible) number of specified numbered groups, where NTILE returns the group number the row belongs to. The value in parentheses after NTILE defines the number of groups to produce, so NTILE(25) would produce 25 groups of as close a split as possible of even numbers.

    Example Query:
    SELECT NTILE(10) OVER (ORDER BY LastName) AS BatchNumber, CONCAT(FirstName,' ',LastName) AS CustomerName, UnclearedBalance
    FROM CustomerDetails.Customers
    WHERE UnclearedBalance is not null
    ORDER BY BatchNumber
The following SQL query uses all the ranking functions in a single query:
 
SELECT orderid, custid, val,
ROW_NUMBER() OVER(ORDER BY val) AS rownum,
RANK() OVER(ORDER BY val) AS rank,
DENSE_RANK() OVER(ORDER BY val) AS dense_rank,
NTILE(100) OVER(ORDER BY val) AS ntile
FROM Sales.OrderValues
ORDER BY val;

What is the maximum allowable size of a row in SQL SERVER?

  • If a table does not contain any sparse column, then the maximum allowable row size is 8060 bytes.
  • However, if a table contains sparse columns, then the maximum allowable row size is 8018 bytes.

How is a column changed from sparse to nonsparse or nonsparse to sparse

  • When a column changed from sparse to nonsparse or nonsparse to sparse, the storage format of the column is changed. This is accomplished by the SQL SERVER Database Engine using the following procedure:
  • The SQL SERVER Database Engine adds a new column to the table in the new storage size and format.
  • For each row in the table, the SQL SERVER Database Engine updates and copies the value stored in the old column to the new column.
  • The SQL SERVER Database Engine then removes the old column from the table schema.
  • It then rebuilds the table (if there is no clustered index) or rebuilds the clustered index to reclaim space used by the old column.

List down the limitations for using SPARSE Columns

  • A sparse column must be nullable and cannot have the ROWGUIDCOL or IDENTITY properties.
  • A sparse column cannot be of the following data types: text, ntext, image, timestamp, user-defined data type, geometry, or geography; or have the FILESTREAM attribute.
  • A sparse column cannot have a default value or bound to a rule.
  • A computed column cannot be marked as SPARSE although it can contain a sparse column.
  • A sparse column cannot be part of a clustered index or a unique primary key index.
  • A sparse column cannot be used as a partition key of a clustered index or heap, but it can be used as the partition key of a nonclustered index.
  • A sparse column cannot be part of a user-defined table type, which are used in table variables and table-valued parameters.
  • Sparse columns cannot be added to compressed tables, nor can any tables containing sparse columns be compressed.
  • When a non-sparse column is changed to a sparse column, the sparse column will consume more space for non-null values. When a row is close to the maximum row size limit, the operation can fail.
  • If a table contains a sparse column, the maximum size of a row in a table will be decreased from 8060 bytes to 8012 bytes.

What are the datatypes which cannot be specified as SPARSE?

Given below are the data types which cannot be specifies as SPARSE:
  • geography
  • geometry
  • image
  • ntext
  • text
  • timestamp
  • user-defined data types

Explain the COLUMNS_UPDATED function and its relation to Sparse Column

The COLUMNS_UPDATED function returns a varbinary value to indicate all the columns that were updated during a DML action. The bits that are returned by the COLUMNS_UPDATED function are as follows:
  • When a sparse column is explicitly updated, the corresponding bit for that sparse column is set to 1, and the bit for the column set is set to 1.
  • When a column set is explicitly updated, the bit for the column set is set to 1, and the bits for all the sparse columns in that table are set to 1.
  • For insert operations, all bits are set to 1.

Sparse Columns in SQL Server 2008

Sparse Columns are a new feature introduced in SQL Server 2008. Sparse columns are ordinary columns that have an optimized storage for NULL values. Sparse columns require more storage space for nonnull values than the space required for identical data that is not marked SPARSE. Thus, they reduce the space requirements for null values at the cost of more overhead to retrieve nonnull values.
Sparse columns should be used when the space saved is at least 20 percent to 40 percent. Sparse columns can be used with column sets (column sets are defined by using the CREATE TABLE or ALTER TABLE statements) and filtered indexes.

Example

CREATE TABLE DocumentStore
(in_doc_id int PRIMARY KEY,
vc_title varchar(200) NOT NULL,
vc_production_specification varchar(20) SPARSE NULL,
in_production_location smallint SPARSE NULL,
vc_marketing_survey_group varchar(20) SPARSE NULL);
GO

Characteristics of Sparse Columns
  • The Sparse keyword is used by the SQL Server Database Engine to optimize the storage of values in that column. When the column value is NULL for any row, the values require no storage.
  • A table having sparse columns has the same catalog views as for a typical table. The sys.columns catalog view contains a row for each column in the table and includes a column set if one is defined.
  • Since sparse columns are a property of the storage layer, a SELECT…INTO statement does not copy over the sparse column property into a new table.

What is a Stored Procedure

A stored procedure (SP) is a precompiled set of T-SQL statements, the basic purpose of which is to group T-SQL statements together to perform a task or set of multiple tasks. It is stored in the data dictionary, and can be executed either from the SQL Server Management Studio or from within an application as required.
Stored procedures can execute batches of T-SQL statements, and return multiple result sets as well. It can accept input parameters and can return one or more output parameters to the calling procedure or batch. A stored procedure can call other stored procedures and can also return status information to the calling procedure to indicate whether they succeeded or failed.

Benefits of a Stored Procedure
  • Modular Programming – After a Stored Procedure is created, it can be invoked multiple times from multiple places from any application. If any modification is needed in the SP, it needs to be done only at one place. It increases code reusability.
  • Improved Performance – Stored Procedures executes the code faster and reduces the network traffic.
    Faster execution – Stored procedures are precompiled i.e; parsed and optimized as soon as they are created and are stored in memory. Thus it executes a lot faster than sending many lines of SQL code from your application to the SQL Server. Doing that requires SQL Server to compile and optimize your SQL code every time it runs.
    Reduced network traffic – Sending many lines of SQL code from an application over the network to SQL Server, impacts network performance. This is especially true if the SQL code is lengthy and is accessed frequently in the application. Running the code on the SQL Server (as a Stored Procedure) eliminates the need to send this code over the network. The only network traffic will be the parameters supplied ti invoke the SP and the results of any query. Stored Procedures limit data with WHERE clauses, ensuring that your application sends just the necessary data over the network wire.
  • Security – Users can execute a stored procedure without needing to execute any of the SQL statements directly. Therefore, a stored procedure can provide advanced database functionality for users who wouldn't normally have access to these tasks, but this functionality is made available in a tightly controlled way. You can grant rights to the stored procedure without granting rights to the underlying objects.
    Stored Procedures can be used to make SQL injection attacks more difficult to execute successfully.
    Stored Procedures enable you to perform error-handling at the server.

Syntax of a Stored Procedure

CREATE PROCEDURE procedure_name
[ { @parameter data_type }[ OUTPUT ]
] [ ,...n ]
[ WITH
{ RECOMPILE | ENCRYPTION } ]
AS sql_statement [ ...n ] >


Types of Stored Procedures

There are three types of Stored Procedures:
  • User Defined Stored Procedures – User Defined stored procedures are created by normal users like us.
  • SQL Server System Stored Procedures – System stored procedures usually begin with sp_ and most of them live in the Master database. They handle much of what is considered to be the nuts-and-bolts of SQL Server administration, and are written in T-SQL just like user defined SPs. Many vital functions of SQL Server rely on these SPs to remain intact!
  • Extended Stored Procedures – Extended stored procedures usually begins with xp_ or sp_ and also live in the Master database. They are not written in T-SQL. They are in fact compiled DLLs which open up a world of functionality to the SQL Server environment.

BETWEEN logical operator (SQL) is Inclusive

SQL logical operator BETWEEN is inclusive. It uses a closed interval ([a,b]).

Syntax:

 <#test_expression#> [ NOT ] BETWEEN <#begin_expression#> AND <#end_expression#>

Result Type: Boolean

BETWEEN returns TRUE if the value of the <#test_expression#> is greater than or equal to the value of <#begin_expression#> and less than or equal to the value of <#end_expression#>.

NOT BETWEEN returns TRUE if the value of <#test_expression#> is less than the value of <#begin_expression#> or greater than the value of <#end_expression#>.

Note: You can use the greater than (>) and less than operators (<) to specify an exclusive range. If any input to the BETWEEN or NOT BETWEEN predicate is NULL, the result is UNKNOWN.

Example: The example below retrieves a list of Opportunities which has been implemented last 60 days:

SELECT a.Id, a.Name, a.StageName
FROM dbo.sf_opportunity a
WHERE a.Consultation_Date_Time_test__c BETWEEN DATEADD(DAY, -60, GETDATE()) AND GETDATE() AND a.IsDeleted = 0