Tables without Clustered Indexes?

Today, we experienced performance issues with some of the SSRS reports that were deployed as part of the latest application/database release. While investigating, I realized that the underlying tables for these report queries do not have a clustered index. I was even more surprised to find that some of these tables are huge, with a few million rows each. A good practice is to have a clustered index on every table in SQL Server, as it usually improves query performance. This is because a clustered index determines the order in which a table’s rows are stored, whereas a table without a clustered index is stored in a set of data pages called a heap, where:

  • New rows are placed wherever SQL Server finds free space, with no guaranteed order.
  • Rows are not stored in any particular order.
  • Pages aren’t sequenced in any particular order.
  • The data pages are not connected to each other in a linked list.

When a table has a clustered index, SQL Server stores the table rows in clustered index order, based on the values of the clustered index key columns. In short, the leaf level of a clustered index consists of the data pages themselves, so scanning it returns the actual data rows. Because the rows can be stored in only one order, a table can have only one clustered index.
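As a rough illustration of the difference, the sketch below creates a heap, checks how SQL Server reports it in sys.indexes, and then converts it by adding a clustered index. The table dbo.ReportSales, its columns, and the index name are hypothetical and exist only for this example.

```sql
-- Hypothetical demo table: no primary key and no clustered index,
-- so SQL Server stores it as a heap.
CREATE TABLE dbo.ReportSales
(
    SaleId   int            NOT NULL,
    SaleDate datetime       NOT NULL,
    Amount   decimal(18, 2) NOT NULL
);

-- A heap appears in sys.indexes with index_id = 0 and type_desc = 'HEAP'.
SELECT  i.[index_id], i.[type_desc]
FROM    sys.indexes i
WHERE   i.[object_id] = OBJECT_ID(N'dbo.ReportSales');

-- Adding a clustered index rebuilds the table in key order.
CREATE CLUSTERED INDEX CIX_ReportSales_SaleDate
    ON dbo.ReportSales (SaleDate, SaleId);

-- The same check now reports index_id = 1 and type_desc = 'CLUSTERED'.
SELECT  i.[index_id], i.[type_desc]
FROM    sys.indexes i
WHERE   i.[object_id] = OBJECT_ID(N'dbo.ReportSales');
```

On a large existing table, creating the clustered index rewrites the whole table, so it is usually better done in a maintenance window.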

When to have a clustered index on a table?

Although a clustered index is not mandatory, according to the MSDN article Clustered Index Design Guidelines, with few exceptions, every table should have one. A clustered index is particularly worthwhile in the following cases:

  • The table is large and has no nonclustered index. A clustered index improves performance here because, without any index, SQL Server must read every row of the table to find a single row.
  • A column or columns are frequently queried and the data is returned in sorted order. A clustered index on the sorting column or columns lets SQL Server return the rows in that order without a separate sort operation.
  • A column or columns are frequently queried and the data is grouped. Grouping often requires the data to be sorted first, so a clustered index on the grouping column or columns can avoid that sort.
  • A column or columns are frequently used in queries that search for ranges of data. With a clustered index on the range column, the matching rows are stored next to each other, so SQL Server can read just that range instead of scanning the entire table.
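The range and sort cases above can be sketched as follows. The table dbo.OrdersDemo, its columns, the index name, and the date range are all hypothetical; the point is only that a range predicate and an ORDER BY on the clustered key can usually be answered by reading one contiguous section of the index.

```sql
-- Hypothetical demo table.
CREATE TABLE dbo.OrdersDemo
(
    OrderId    int            NOT NULL,
    OrderDate  datetime       NOT NULL,
    CustomerId int            NOT NULL,
    Total      decimal(18, 2) NOT NULL
);

-- Cluster the table on the column used for range searches and sorting.
CREATE CLUSTERED INDEX CIX_OrdersDemo_OrderDate
    ON dbo.OrdersDemo (OrderDate);

-- Range query on the clustered key: the qualifying rows are stored together,
-- and they are already in OrderDate order, so the plan typically needs
-- neither a full scan nor a Sort operator.
SELECT  OrderId, OrderDate, CustomerId, Total
FROM    dbo.OrdersDemo
WHERE   OrderDate >= '20240101'
        AND OrderDate < '20240201'
ORDER BY OrderDate;
```

Comparing the actual execution plan for this query before and after creating the index is a simple way to confirm whether the scan and sort really disappear for your data.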

To resolve these performance issues, I rewrote the queries and created clustered indexes on the tables where appropriate. I then investigated further and used the following two queries to find out which tables in other databases do not have a clustered index defined.

The first query returns the names of all tables that have a row count greater than the specified threshold and do not have a clustered index defined. It inner joins the sys.tables catalog view to the sys.dm_db_partition_stats dynamic management view to obtain this information (see below):

DECLARE @MinTableRowsThreshold [int];

SET @MinTableRowsThreshold = 5000;

;WITH    [TablesWithoutClusteredIndexes] ( [db_name], [table_name], [table_schema], [row_count] )
          AS ( SELECT   DB_NAME() ,
                        t.[name] ,
                        SCHEMA_NAME(t.[schema_id]) ,
                        SUM(ps.[row_count])
               FROM     [sys].[tables] t
                        INNER JOIN [sys].[dm_db_partition_stats] ps
                            ON ps.[object_id] = t.[object_id]
               WHERE    OBJECTPROPERTY(t.[object_id], N'TableHasClustIndex') = 0
                        AND ps.[index_id] < 2
               GROUP BY t.[name] ,
                        t.[schema_id] )
    SELECT  *
    FROM    [TablesWithoutClusteredIndexes]
    WHERE   [row_count] > @MinTableRowsThreshold;
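As a cross-check, heaps can also be listed from sys.indexes, where every heap has a row with type = 0 (type_desc = 'HEAP'). The sketch below is an alternative to the OBJECTPROPERTY test, not part of the original query, and it deliberately leaves out the row-count threshold.

```sql
-- List every user table in the current database that is stored as a heap.
SELECT  SCHEMA_NAME(t.[schema_id]) AS [table_schema],
        t.[name]                   AS [table_name]
FROM    sys.tables t
        INNER JOIN sys.indexes i
            ON i.[object_id] = t.[object_id]
WHERE   i.[type] = 0            -- 0 = heap
ORDER BY [table_schema], [table_name];
```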

The second query is a slightly modified version of the first. It returns the names of actively queried tables that have a row count greater than the specified threshold and do not have a clustered index defined. It adds an inner join to sys.dm_db_index_usage_stats to identify the actively queried tables (see below):

DECLARE @MinTableRowsThreshold [int];

SET @MinTableRowsThreshold = 5000;

;WITH    [TablesWithoutClusteredIndexes] ( [db_name], [table_name], [table_schema], [row_count] )
          AS ( SELECT   DB_NAME() ,
                        t.[name] ,
                        SCHEMA_NAME(t.[schema_id]) ,
                        SUM(ps.[row_count])
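               FROM     [sys].[tables] t
                        INNER JOIN [sys].[dm_db_partition_stats] ps
                            ON ps.[object_id] = t.[object_id]
                        -- Usage stats cover every database on the instance, so
                        -- restrict them to the current database, and match on
                        -- index_id so row counts are not multiplied by the join.
                        INNER JOIN [sys].[dm_db_index_usage_stats] us
                            ON us.[object_id] = ps.[object_id]
                               AND us.[index_id] = ps.[index_id]
                               AND us.[database_id] = DB_ID()
               WHERE    OBJECTPROPERTY(t.[object_id], N'TableHasClustIndex') = 0
                        AND ps.[index_id] < 2
                        -- The counters are never NULL, so test for actual activity.
                        AND ( us.[user_seeks] + us.[user_scans]
                              + us.[user_lookups] + us.[user_updates] ) > 0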
               GROUP BY t.[name] ,
                        t.[schema_id] )
    SELECT  *
    FROM    [TablesWithoutClusteredIndexes]
    WHERE   [row_count] > @MinTableRowsThreshold;
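One caveat when reading the results: the counters in sys.dm_db_index_usage_stats are reset when the SQL Server service restarts, so a table can look inactive simply because the instance started recently. A minimal check, assuming SQL Server 2008 or later (where sys.dm_os_sys_info exposes sqlserver_start_time) and VIEW SERVER STATE permission:

```sql
-- Shows how long the usage counters have been accumulating.
SELECT  [sqlserver_start_time]
FROM    sys.dm_os_sys_info;
```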

I hope you find this post informative. For further information about clustered index design, see the MSDN article Clustered Index Design Guidelines.

Further Reading:

Guide to SQL Server Table Indexes (Part 1)

One of the most important tasks for every database administrator (DBA) is to ensure that query times are consistent with service-level agreements (SLAs) or within user expectations. One of the most effective techniques for achieving this is to create indexes on tables for your queries. SQL Server uses indexes to sort and organize table data. It builds indexes by ordering the values from one or more selected columns. SQL Server automatically creates an index when you define a primary key or unique constraint. You can use indexes to manage and track rows in a table or an indexed view.

Indexes improve the performance of most data scans by reducing the overall time a query takes to run and the amount of work and resources it consumes. The amount of improvement depends on the size of the tables involved, the index design, and the type of query. You can see the role of indexes by observing what happens during queries and data manipulation.

Check out part 1 of my five-part article series, Guide to SQL Server Table Indexes, in which you’ll learn the basics of SQL Server table indexes, the difference between clustered and nonclustered indexes, and how leaf nodes, non-leaf nodes, and heaps relate to data storage.

This article is published on SQL-SERVER-PERFORMANCE.COM.