Indexes in SQL Server

One of the physical storage structures provided by most SQL-based DBMS is an index, which is a structure that provides rapid access to the rows of a table based on the values of one or more columns.
Indexes speed up the querying process by providing swift access to rows in the data tables, similarly to the way a book’s index helps you find information quickly within that book.
Indexes are created on columns in tables or views.

What are the data types on which indexes cannot be created?

We can create indexes on most columns in a table or a view. The exceptions are primarily those columns configured with large object (LOB) data types, such as image, text, and varchar (max).

Index Structures

Some DBMS products support two or more different types of indexes, which are optimized for different types of database access:

B-tree index – It uses a tree structure of index entries and index blocks (groups of index entries) to organize the data values that it contains into ascending or descending order. This structure is hierarchical in nature, with the root node at the top of the hierarchy and the leaf nodes at the bottom. This type of index, which is the default type in nearly all DBMS products, provides efficient searching for a single value or for a range of values, such as the search required for an inequality comparison operator or a range test (BETWEEN) operation.

Hash index – It uses a randomizing technique to place all of the possible data values into a moderate number of buckets within the index. Since a given data value is always placed into the same bucket, the DBMS can search for that value simply by locating the appropriate bucket and searching within it. But the assignment of values to buckets does not preserve the order of data values, so a hash index cannot be used for inequality or range searches.

T-tree index – It is a variation of the B-tree index that is optimized for in-memory databases.

Bitmap index – It is useful when there are a relatively small number of possible data values.

Index-organized table – It is relatively a new option that stores the entire table in the index. This is useful for tables that have few columns other than the primary key, such as code lookup tables that typically have only a code (such as a department code) and a description (such as a department name).

Index Types

Implicit Index: They are created when a column is explicitly defined with PRIMARY KEY, UNIQUE KEY constraint.

Explicit Index: They are created using the "CREATE INDEX …" syntax.

Clustered Index: A clustered index stores the actual data rows at the leaf level of the index. It is like a telephone directory, where you find the entry you are looking for.
An important characteristic of the clustered index is that the indexed values are sorted in either ascending or descending order. As a result, there can be only one clustered index on a table or view because the set of rows can be maintained in only one order at a time.
In addition, data in a table is sorted only if a clustered index has been defined on a table.
Clustered Index is also called Physical Index.

Note: A table that has a clustered index is referred to as a clustered table. A table that has no clustered index is referred to as a heap.

Non-Clustered index: The leaf nodes of a non-clustered index contain only the values from the indexed columns and row locators that point to the actual data rows. This means that the query engine must take an additional step in order to locate the actual data.
A non-clustered index is like the index in the back of a book. You can quickly search the index for the topic you want, and then you get a reference to a page that you must look up to find the rest of the information.
A row locator’s structure depends on whether it points to a clustered table or to a heap. If referencing a clustered table, the row locator points to the clustered index, using the value from the clustered index to navigate to the correct data row. If referencing a heap, the row locator points to the actual data row.
Non-Clustered indexes cannot be sorted like clustered indexes; however, you can create more than one non-clustered index per table or view.
Non-Clustered Index is also called Logical Index.

Note: SQL Server 2005 supports up to 249 non-clustered indexes, and SQL Server 2008 support up to 999.

In addition to an index being clustered or non-clustered, it can be configured in other ways:


  • Composite index: An index that contains more than one column. In both SQL Server 2005 and 2008, you can include up to 16 columns in an index, as long as the index doesn’t exceed the 900-byte limit. Both clustered and non-clustered indexes can be composite indexes.
  • Unique Index: Indexes on primary keys are a special type called a unique index, in which each value can appear only once. This is how the database system ensures that primary key values are never duplicated in the tables. If the index is composite, the uniqueness is enforced across the columns as a whole, not on the individual columns. A unique index is automatically created when you define a primary key or unique constraint:
      Primary Key: When you define a primary key constraint on one or more columns, SQL Server automatically creates a unique, clustered index if a clustered index does not already exist on the table or view. However, you can override the default behavior and define a unique, non-clustered index on the primary key.

The index on the primary key can be clustered or non-clustered. In SQL Server, it defaults to being a clustered index. The SQL Server syntax for manually creating a clustered index on the primary key fields is shown below:

CREATE CLUSTERED INDEX index_name ON table_name (column_name1, column_name2, …);

        Unique Key: When you define a unique constraint, SQL Server automatically creates a unique, non-clustered index. You can specify that a unique clustered index be created if a clustered index does not already exist on the table. 
     CREATE UNIQUE CLUSTERED INDEX index_name
            ON table_name (column_name1, column_name2, …);
  • Covering index: This is a type of index that includes all the columns that are needed to process a particular query. For example, your query might retrieve the FirstName and LastName columns from a table, based on a value in the ContactID column. You can create a covering index that includes all three columns.

Syntax to CREATE INDEX:

CREATE INDEX index_name
ON table_name;

Single-Column Indexes:

CREATE
INDEX index_name
ON table_name (column_name1);

Composite Indexes:

CREATE INDEX index_name;
ON table_name (column1, column2…);

Unique Indexes:

CREATE
UNIQUE INDEX index_name 

ON table_name (column_name1, column_name2...);

The DROP INDEX Command:

An index can be dropped using SQL DROP command. Care should be taken when dropping an index because performance may be slowed or improved. Some indexes are not so easy to drop, namely any index supporting a unique or primary key constraint.

DROP INDEX index_name;

View Existing Indexes:

EXEC sp_helpindex table_name;

Rename an Index:

EXEC sp_rename 'table_name.index_name', 'index_name';

When should indexes be avoided / used? 


Although indexes are intended to enhance a database's performance, there are times when they should be avoided. The following guidelines indicate when the use of an index should be reconsidered:
  • Indexes should not be used on small tables or tables that have frequent, large batch update or insert operations.
  • Indexes should not be used on columns that contain a high number of NULL values. 
  • Columns that are frequently manipulated should not be indexed.
  • If you have a lot of indexes, then every time you add, delete, or amend a row in a table (DML operations like INSERT, UPDATE, DELETE), all the indexes must be updated which takes time. So while indexes can speed up retrieval, they may slow some maintenance operations. Indexes also take up room on your storage device.
  • Indexes should be used only on columns which are used to search the table frequently.
  • Try to insert or modify as many rows as possible in a single statement, rather than using multiple queries.
  • Create non-clustered indexes on columns used frequently in your statement’s predicates and join conditions.
  • Consider indexing columns used in exact-match queries.

What is De-normalization?


De-normalization is a technique to move from higher to lower normal forms of database modeling in order to speed up database access.
De-normalization is an approach to speed up (optimizing) read performance (data retrieval) in which the administrator selectively adds back specific instances of redundant data after the data structure has been normalized.
De-normalization is the process of introducing redundancy into a table by incorporating data from a related table. Tables are usually de-normalized to prevent expensive SQL join operations between them. One should always normalize to Third Normal Form (3NF) and only apply de-normalization selectively as a last resort if performance problems are experienced.

Note: De-normalizations are not free and introduces the following problem into the design:
  • More disk space is used as the same data is duplicated in more than one table
  • DML operations are more expensive as the same data must be maintained in more than one table
  • Risk of  "out of sync" data increases 

Explain the various Normal Forms

First Normal Form (1NF) – Eliminate Repeating Groups: A relation R is said to be in First Normal Form (1NF) if and only if all the attributes of the relation R are atomic in nature. A table (relation) is in 1NF if
  • There are no duplicate rows in the table
  • Each cell is single-valued (i.e., there are no repeating groups or arrays)
  • Entries in a column (attribute, field) are of the same kind 
Note: Make a separate table for each set of related attributes, and give each table a primary key. Each field contains at most one value from its attribute domain.

Second Normal Form (2NF) – Eliminate Redundant Data: A relation R is said to be in Second Normal Form (2NF) if and only if
  • It is in the First Normal Form (1NF), and 
  • No partial dependency exists between non-key attributes and key attributes
Note: If an attribute depends on only a part of the multi-valued key, remove it to a separate table.

Third Normal Form (3NF) – Eliminate Columns Not Dependant On Key: A relation R is said to be in Third Normal Form (3NF) if and only if
  • It is in Second Normal Form (2NF) 
  • No transitive dependency exists between non-key attributes and key attributes 
Note: If attributes do not contribute to a description of the key, remove them to a separate table. All attributes must be directly dependent on the primary key.

Boyce-Codd Normal Form (BCNF): A table is in BCNF if it is in 3NF and if every determinant is a candidate key. A 3NF relation is almost always in BCNF. However, the following conditions define some situations when a 3NF relation may not be in BCNF:
  • The candidate keys are composite 
  • There are more than one candidate keys in the relation 
  • There are some common attributes in the relation 
Note: If there are non-trivial dependencies between candidate key attributes, separate them out into distinct tables. 

Fourth Normal Form (4NF) – Isolate Independent Multiple Relationships: A table is in 4NF if it is in BCNF and if it has no multi-valued dependencies. 

Note: No table may contain two or more 1:n or n:m relationships that are not directly related. 

Fifth Normal Form (5NF) – Isolate Semantically Related Multiple Relationships: A table is in 5NF, also called "Projection-Join Normal Form" (PJNF), if it is in 4NF and if every join dependency in the table is a consequence of the candidate keys of the table. 

Note: There may be practical constrains on information that justify separating logically related many-to-many relationships. 

Domain-Key Normal Form (DKNF): A table is in DKNF if every constraint on the table is a logical consequence of the definition of keys and domains. It is a model which is free from all modification anomalies. 

Summary of the normalization steps:

Input Relation
Transformation
Output Relation
All Relations
 Eliminate variable length record. Remove multi-attribute lines in table.
1NF
1NF
 Remove dependency of non-key attributes on part of a multi-attribute key.
2NF
2NF
 Remove dependency of non-key attributes on other non-key attributes.
3NF
3NF
 Remove dependency of an attribute of a multi attribute key on an attribute of another (overlapping) multi-attribute key.
BCNF
BCNF
 Remove more than one independent multi-valued dependency from relation by splitting relation.
4NF
4NF
 Add one relation relating attributes with multi-valued dependency.
5NF