Differentiate between DDL and DML commands

The Structured Query Language (SQL) comprises one of the fundamental building blocks of modern database architecture. SQL defines the methods used to create and manipulate relational databases on all major platforms.

SQL comes in many flavors. Oracle databases utilize their proprietary PL/SQL. Microsoft SQL Server makes use of Transact-SQL. However, all of these variations are based upon the industry standard ANSI SQL.

SQL commands can be divided into two main sublanguages: Data Definition Language (DDL) and Data Manipulation Language (DML).

Data Definition Language: The Data Definition Language (DDL) contains the commands used to create and destroy databases and database objects. These commands are primarily used by database administrators during the setup and removal phases of a database project. After the database structure is defined with DDL, database administrators and users can utilize the Data Manipulation Language (DML) to insert, retrieve and modify the data contained within it. Let's take a look at the structure and usage of four basic DDL commands:

CREATE - Installing a database management system (DBMS) on a computer allows you to create and manage many independent databases. For example, you may want to maintain a database of customer contacts for your sales department and a personnel database for your HR department. The CREATE command can be used to establish each of these databases on your platform. For example, the command:

CREATE DATABASE employees

creates an empty database named "employees" on your DBMS. After creating the database, your next step is to create tables that will contain data. Another variant of the CREATE command can be used for this purpose. The command:

CREATE TABLE personal_info (first_name char(20) not null, last_name char(20) not null, employee_id int not null)

establishes a table titled "personal_info" in the current database. In our example, the table contains three attributes: first_name, last_name and employee_id.

USE - The USE command allows you to specify the database you wish to work with within your DBMS. For example, if we're currently working in the sales database and want to issue some commands that will affect the employees’ database; we would preface them with the following SQL command:

USE employees

It's important to always be conscious of the database you are working in before issuing SQL commands that manipulate data.

ALTER - Once you've created a table within a database, you may wish to modify the definition of it. The ALTER command allows you to make changes to the structure of a table without deleting and recreating it. Take a look at the following command:

ALTER TABLE personal_info
ADD salary money null

This example adds a new attribute to the personal_info table -- an employee's salary. The "money" argument specifies that an employee's salary will be stored using a dollars and cents format. Finally, the "null" keyword tells the database that it's OK for this field to contain no value for any given employee.

DROP - The final command of the Data Definition Language, DROP, allows us to remove entire database objects from our DBMS. For example, if we want to permanently remove the personal_info table that we created, we'd use the following command:

DROP TABLE personal_info

Similarly, the command below would be used to remove the entire employees’ database:

DROP DATABASE employees

Use this command with care! Remember that the DROP command removes entire data structures from your database. If you want to remove individual records, use the DELETE command of the Data Manipulation Language.
  
Data Manipulation Language: The Data Manipulation Language (DML) is used to retrieve, insert and modify database information. These commands will be used by all database users during the routine operation of the database. Let's take a brief look at the basic DML commands:

INSERT – The INSERT command in SQL is used to add records to an existing table. Returning to the personal_info example from the previous section, let's imagine that our HR department needs to add a new employee to their database. They could use a command similar to the one shown below:

INSERT INTO personal_info VALUES ('bart','simpson',12345,$45000)

Note that there are four values specified for the record. These correspond to the table attributes in the order they were defined: first_name, last_name, employee_id, and salary.

SELECT - The SELECT command is the most commonly used command in SQL. It allows database users to retrieve the specific information they desire from an operational database. Let's take a look at a few examples, again using the personal_info table from our employees’ database.
The command shown below retrieves all of the information contained within the personal_info table. Note that the asterisk is used as a wildcard in SQL. This literally means, "Select everything from the personal_info table."

SELECT * FROM personal_info

Alternatively, users may want to limit the attributes that are retrieved from the database. For example, the Human Resources department may require a list of the last names of all employees in the company. The following SQL command would retrieve only that information:

SELECT last_name FROM personal_info

Finally, the WHERE clause can be used to limit the records that are retrieved to those that meet specified criteria. The CEO might be interested in reviewing the personnel records of all highly paid employees. The following command retrieves all of the data contained within personal_info for records that have a salary value greater than $50,000:

SELECT * FROM personal_info WHERE salary > $50000

UPDATE - The UPDATE command can be used to modify information contained within a table, either in bulk or individually. Each year, our company gives all employees a 3% cost-of-living increase in their salary. The following SQL command could be used to quickly apply this to all of the employees stored in the database:

UPDATE personal_info SET salary = salary * 1.03

On the other hand, our new employee Bart Simpson has demonstrated performance above and beyond the call of duty. Management wishes to recognize his stellar accomplishments with a $5,000 raise. The WHERE clause could be used to single out Bart for this raise:

UPDATE personal_info SET salary = salary + $5000 WHERE employee_id = 12345

DELETE - Finally, let's take a look at the DELETE command. You'll find that the syntax of this command is similar to that of the other DML commands. Unfortunately, our latest corporate earnings report didn't quite meet expectations and poor Bart has been laid off. The DELETE command with a WHERE clause can be used to remove his record from the personal_info table:

DELETE FROM personal_info WHERE employee_id = 12345

Multithreaded Programming

A multithreaded program contains two or more parts that can run concurrently. Each part of such a program is called a thread, and each thread defines a separate path of execution. Thus multithreading is a specialized form of multitasking.

Process: A process is, in essence, a program that is executing.

Thread: A thread is a dispatchable unit of executable code.

Types of Multitasking: There are two distinct types of multitasking: Process-Based Multitasking and Thread Based Multitasking. C# and .NET Framework supports both the types.

Process-Based Multitasking: Process-Based Multitasking handles the concurrent execution of programs.

Thread Based Multitasking: Thread Based Multitasking deals with concurrent execution of pieces of the same program.

States of a Thread: There are various states of a thread as mentioned below:

1)       Running
2)       Ready to run – as soon as it gets CPU time.
3)       Suspended – a running thread can be suspended (temporary halt to its execution).
4)       Resume – a suspended thread can later be resumed.
5)       Blocked – a thread can be blocked when waiting for a response.
6)       Terminated – a thread can be terminated, in which case its execution ends and cannot be resumed.

Threads in .NET Framework: The .Net Framework defines two types of threads: foreground and background. By default, when we create a thread, it is a foreground thread, but it can be changed to a background thread.
A background thread is automatically terminated when all foreground threads in its process have stopped.

The classes that support multithreaded programming are defined in the System.Threading namespace.

The Thread Class: C# multithreading system is built upon the Thread class, which encapsulates a thread of execution. The Thread class is sealed. It defines several methods and properties that help manage threads.

Creating and Starting a Thread: To create a thread, we instantiate an object of type Thread. The simplest Thread constructor is shown below:

public Thread (ThreadStart entryPoint)

Here, entryPoint is the name of the method that will be called to begin execution of the thread.

ThreadStart is a delegate defined by the .NET Framework as shown below:

public delegate void ThreadStart ()

The entryPoint method must have a void return type and take no arguments.

The newly created thread starts running after we call its Start() method. Start() is defined by Thread and calls the method specified by entryPoint. Form of Start() is shown below:

public void Start()

Note: If we try to call Start() on a thread that has already been started, a ThreadStateException will be thrown.

Web Service Standards

WSDLWeb Services Description Language – WSDL is an XML based standard that specifies how a client can interact with a web service. It tells what methods are present in a web service, what parameters and return values each method uses, and how to communicate with them. It specifies what protocol (HTTP GET or HTTP POST or SOAP (over HTTP)) should be used for transmission over the internet.
The WSDL standard is fairly complex, but its underlying logic is hidden from the developer in ASP.NET programming. ASP.NET automatically creates WSDL documents for a web service. It can also create a proxy class based on a WSDL document. This proxy class allows a client to call a web service without worrying about networking or formatting issues.

SOAPSimple Object Access Protocol – A client can use three protocols to communicate with a web service in .NET:
a)       HTTP GET – It communicates with a web service by encoding information in the query string and retrieves information as a basic XML document.
b)       HTTP POST – It places parameters in the request body (as form values) and retrieves information as a basic XML document.
c)       SOAP – It uses XML for both request and response messages. Like HTTP GET and HTTP POST, SOAP works over HTTP, but it uses a more detailed XML-based language for bundling information.

SOAP messages are widely supported by many platforms.

Although .NET has the ability to support all three of these protocols, it restricts the first two for better security. By default, it disables HTTP GET, and it restricts HTTP POST to the local computer.

SOAP messages are not handled directly by our applications. Instead, .NET will translate the information in a SOAP message into the corresponding .NET data types before the data reaches our code. This allows us to interact with web services in the same way we interact with any other object.

DISCODiscovery (DISCO is short form for Discovery) – Disco is a Microsoft specific standard, that is used for discovering Web Services. A .disco file specifies the location of a web service on a web server. It provides a straight forward way to create a repository of web service links that can be used automatically by .NET.

Benefits of a .disco file:
a)       It is only used for web services.
b)       We can insert <disco> elements for as many web services as we want, including the ones that resides on the other web servers.

UDDIUniversal Description, Discovery and Integration – The goal of UDDI is to provide repositories where businesses can advertise all the web services they have. Generally groups of companies and consortiums band together to set up their own UDDI registries organized into specific industries. The web services are registered in the UDDI registry. Many of these UDDI registries are restricted and not publicly available.
The UDDI registry defines a complete programming interface that specifies how SOAP messages can be used to retrieve information about a business or register the web services for a business.

WS-Interoperability Basic Profile – It is a recent standard that sets out a broad range of rules and recommendations designed to guarantee interoperability across the web service implementations on the different vendors. It specifies a recommended sub-set of SOAP 1.1 and WSDL 1.1 specifications and lays out a few ground rules. WS – Interoperability is strongly backed by all web service vendors (including Microsoft, IBM, SUN and Oracle).

Web Services

Web Services are individual units of programming logic that exist on a web server. They can be integrated into all sorts of applications. It can convert our application into a Web Application, which can publish its content (function or message) to the rest of the world.

Web Services are built on a foundation of open standards. These standards allow web services to be created with .NET but consumed on other platforms or vice versa.

The root standard for all the individual web service standards is XML. Because XML is text-based, web service invocations can pass over normal HTTP channels. So, web services are governed by cross platform standards as well as easier to use.

Application programmers tend to treat a web service as a set of methods that can be called over the internet. XML gurus treat web services as a way to exchange XML messages.

Web Service Client: An application that uses a web service is called a Web Service Client.

Message: The information exchanged between the client application and the web service is called a message. Messages include the calls made by a client application to a Web Method and the data returned by the Web Method to the client.

When should Web Services be used?

Web Services should be used when our application needs to cross platform boundaries. Web Services are a perfect choice if we need to provide data to a Java Client running on a UNIX computer.
Web Services should also be used when our application needs to cross trust boundaries, which is crossed when our system incorporates applications from more than one company or organization. Web Services work well if we need to provide some information from a database to an application written by other developers.

When should Web Services not be used?

Web Services should not be used if we aren’t crossing platform or trust boundaries. Web Services are generally a poor way to share functionality between different types of applications in the same company.

Indexes in SQL Server

One of the physical storage structures provided by most SQL-based DBMS is an index, which is a structure that provides rapid access to the rows of a table based on the values of one or more columns.
Indexes speed up the querying process by providing swift access to rows in the data tables, similarly to the way a book’s index helps you find information quickly within that book.
Indexes are created on columns in tables or views.

What are the data types on which indexes cannot be created?

We can create indexes on most columns in a table or a view. The exceptions are primarily those columns configured with large object (LOB) data types, such as image, text, and varchar (max).

Index Structures

Some DBMS products support two or more different types of indexes, which are optimized for different types of database access:

B-tree index – It uses a tree structure of index entries and index blocks (groups of index entries) to organize the data values that it contains into ascending or descending order. This structure is hierarchical in nature, with the root node at the top of the hierarchy and the leaf nodes at the bottom. This type of index, which is the default type in nearly all DBMS products, provides efficient searching for a single value or for a range of values, such as the search required for an inequality comparison operator or a range test (BETWEEN) operation.

Hash index – It uses a randomizing technique to place all of the possible data values into a moderate number of buckets within the index. Since a given data value is always placed into the same bucket, the DBMS can search for that value simply by locating the appropriate bucket and searching within it. But the assignment of values to buckets does not preserve the order of data values, so a hash index cannot be used for inequality or range searches.

T-tree index – It is a variation of the B-tree index that is optimized for in-memory databases.

Bitmap index – It is useful when there are a relatively small number of possible data values.

Index-organized table – It is relatively a new option that stores the entire table in the index. This is useful for tables that have few columns other than the primary key, such as code lookup tables that typically have only a code (such as a department code) and a description (such as a department name).

Index Types

Implicit Index: They are created when a column is explicitly defined with PRIMARY KEY, UNIQUE KEY constraint.

Explicit Index: They are created using the "CREATE INDEX …" syntax.

Clustered Index: A clustered index stores the actual data rows at the leaf level of the index. It is like a telephone directory, where you find the entry you are looking for.
An important characteristic of the clustered index is that the indexed values are sorted in either ascending or descending order. As a result, there can be only one clustered index on a table or view because the set of rows can be maintained in only one order at a time.
In addition, data in a table is sorted only if a clustered index has been defined on a table.
Clustered Index is also called Physical Index.

Note: A table that has a clustered index is referred to as a clustered table. A table that has no clustered index is referred to as a heap.

Non-Clustered index: The leaf nodes of a non-clustered index contain only the values from the indexed columns and row locators that point to the actual data rows. This means that the query engine must take an additional step in order to locate the actual data.
A non-clustered index is like the index in the back of a book. You can quickly search the index for the topic you want, and then you get a reference to a page that you must look up to find the rest of the information.
A row locator’s structure depends on whether it points to a clustered table or to a heap. If referencing a clustered table, the row locator points to the clustered index, using the value from the clustered index to navigate to the correct data row. If referencing a heap, the row locator points to the actual data row.
Non-Clustered indexes cannot be sorted like clustered indexes; however, you can create more than one non-clustered index per table or view.
Non-Clustered Index is also called Logical Index.

Note: SQL Server 2005 supports up to 249 non-clustered indexes, and SQL Server 2008 support up to 999.

In addition to an index being clustered or non-clustered, it can be configured in other ways:


  • Composite index: An index that contains more than one column. In both SQL Server 2005 and 2008, you can include up to 16 columns in an index, as long as the index doesn’t exceed the 900-byte limit. Both clustered and non-clustered indexes can be composite indexes.
  • Unique Index: Indexes on primary keys are a special type called a unique index, in which each value can appear only once. This is how the database system ensures that primary key values are never duplicated in the tables. If the index is composite, the uniqueness is enforced across the columns as a whole, not on the individual columns. A unique index is automatically created when you define a primary key or unique constraint:
      Primary Key: When you define a primary key constraint on one or more columns, SQL Server automatically creates a unique, clustered index if a clustered index does not already exist on the table or view. However, you can override the default behavior and define a unique, non-clustered index on the primary key.

The index on the primary key can be clustered or non-clustered. In SQL Server, it defaults to being a clustered index. The SQL Server syntax for manually creating a clustered index on the primary key fields is shown below:

CREATE CLUSTERED INDEX index_name ON table_name (column_name1, column_name2, …);

        Unique Key: When you define a unique constraint, SQL Server automatically creates a unique, non-clustered index. You can specify that a unique clustered index be created if a clustered index does not already exist on the table. 
     CREATE UNIQUE CLUSTERED INDEX index_name
            ON table_name (column_name1, column_name2, …);
  • Covering index: This is a type of index that includes all the columns that are needed to process a particular query. For example, your query might retrieve the FirstName and LastName columns from a table, based on a value in the ContactID column. You can create a covering index that includes all three columns.

Syntax to CREATE INDEX:

CREATE INDEX index_name
ON table_name;

Single-Column Indexes:

CREATE
INDEX index_name
ON table_name (column_name1);

Composite Indexes:

CREATE INDEX index_name;
ON table_name (column1, column2…);

Unique Indexes:

CREATE
UNIQUE INDEX index_name 

ON table_name (column_name1, column_name2...);

The DROP INDEX Command:

An index can be dropped using SQL DROP command. Care should be taken when dropping an index because performance may be slowed or improved. Some indexes are not so easy to drop, namely any index supporting a unique or primary key constraint.

DROP INDEX index_name;

View Existing Indexes:

EXEC sp_helpindex table_name;

Rename an Index:

EXEC sp_rename 'table_name.index_name', 'index_name';

When should indexes be avoided / used? 


Although indexes are intended to enhance a database's performance, there are times when they should be avoided. The following guidelines indicate when the use of an index should be reconsidered:
  • Indexes should not be used on small tables or tables that have frequent, large batch update or insert operations.
  • Indexes should not be used on columns that contain a high number of NULL values. 
  • Columns that are frequently manipulated should not be indexed.
  • If you have a lot of indexes, then every time you add, delete, or amend a row in a table (DML operations like INSERT, UPDATE, DELETE), all the indexes must be updated which takes time. So while indexes can speed up retrieval, they may slow some maintenance operations. Indexes also take up room on your storage device.
  • Indexes should be used only on columns which are used to search the table frequently.
  • Try to insert or modify as many rows as possible in a single statement, rather than using multiple queries.
  • Create non-clustered indexes on columns used frequently in your statement’s predicates and join conditions.
  • Consider indexing columns used in exact-match queries.

What is De-normalization?


De-normalization is a technique to move from higher to lower normal forms of database modeling in order to speed up database access.
De-normalization is an approach to speed up (optimizing) read performance (data retrieval) in which the administrator selectively adds back specific instances of redundant data after the data structure has been normalized.
De-normalization is the process of introducing redundancy into a table by incorporating data from a related table. Tables are usually de-normalized to prevent expensive SQL join operations between them. One should always normalize to Third Normal Form (3NF) and only apply de-normalization selectively as a last resort if performance problems are experienced.

Note: De-normalizations are not free and introduces the following problem into the design:
  • More disk space is used as the same data is duplicated in more than one table
  • DML operations are more expensive as the same data must be maintained in more than one table
  • Risk of  "out of sync" data increases