Tables without Clustered Indexes?

Today, we experienced performance issues with some of the SSRS reports that were deployed as part of the latest application/database release. While investigating this performance problem, I realized that the underlying tables for these report queries do not have clustered index. I was even more surprised when I realized that some of these tables are huge and have few million rows in them. A good practice is to have a clustered index on all tables in SQL Server, as it helps to improve query performance. This is due to the fact that clustered indexes affect a table’s physical sort order, and a table that doesn’t have a clustered index is stored in a set of data pages called a heap where:

  • Data is stored in the order in which it is entered.
  • Rows are not stored in any particular order.
  • Pages aren’t sequenced in any particular order.
  • There is not a linked list linking the data pages.

When a table has clustered index, SQL Server physically sorts table rows in clustered index order based on clustered index key column values. In short, leaf node of clustered index contains data pages, and scanning them will return the actual data rows. Therefore, table can have only one clustered index.

When to have a clustered index on table?

Although it is not mandatory to have a clustered index per table, but, according to the MSDN article (Clustered Index Design Guidelines), with few exceptions, every table should have a clustered index defined on the column or columns that used as follows:

  • The table is large and does not have nonclustered index. Having clustered index improves performance, because without it, all rows of the table should be read to find any row.
  • Column or columns are frequently queried and data is returned in sorted ordered. Having clustered index on the sorting column or columns prevents sorting operation and returns the data in sorted order.
  • Column or columns are frequently queried and data is grouped together. As data must be sorted before it is grouped, having clustered index on the sorting column or columns prevents sorting operation.
  • Column or columns data that are frequently used in queries to search data ranges from the table. Having clustered indexes on the range column will avoid sorting entire table data.

So in order to resolve these performance issues, I re-wrote these queries and created clustered indexes on tables where appropriate. Moreover, I analyse further, and used the following two queries, to find out which tables in other databases do not have a clustered index defined.

The first query return names of all tables with row count greater than specified threshold, and do not have a clustered index defined. This query inner joins sys.tables system catalog to sys.dm_db_partition_stats dynamic management view to obtain this information (See below):

DECLARE @MinTableRowsThreshold [int];

SET @MinTableRowsThreshold = 5000;

;WITH    [TablesWithoutClusteredIndexes] ( [db_name], [table_name], [table_schema], [row_count] )
          AS ( SELECT   DB_NAME() ,
                        t.[name] ,
                        SCHEMA_NAME(t.[schema_id]) ,
                        SUM(ps.[row_count])
               FROM     [sys].[tables] t
                        INNER JOIN [sys].[dm_db_partition_stats] ps
						ON ps.[object_id] = t.[object_id]
               WHERE    OBJECTPROPERTY(t.[object_id], N'TableHasClustIndex') = 0
                        AND ps.[index_id] < 2
               GROUP BY t.[name] ,
                        t.[schema_id] )
    SELECT  *
    FROM    [TablesWithoutClusteredIndexes]
    WHERE   [row_count] > @MinTableRowsThreshold;

The second query is slightly a modified version of first query and returns the names of actively queried tables with row count greater than specified threshold, and do not have a clustered index defined. This query inner joins sys.dm_db_index_usage_stats to the first query to identify actively queried tables (See below):

DECLARE @MinTableRowsThreshold [int];

SET @MinTableRowsThreshold = 5000;

;WITH    [TablesWithoutClusteredIndexes] ( [db_name], [table_name], [table_schema], [row_count] )
          AS ( SELECT   DB_NAME() ,
                        t.[name] ,
                        SCHEMA_NAME(t.[schema_id]) ,
                        SUM(ps.[row_count])
               FROM     [sys].[tables] t
                        INNER JOIN [sys].[dm_db_partition_stats] ps
						ON ps.[object_id] = t.[object_id]
			            INNER JOIN [sys].[dm_db_index_usage_stats] us
						ON ps.[object_id] = us.[object_id]
               WHERE    OBJECTPROPERTY(t.[object_id], N'TableHasClustIndex') = 0
                        AND ps.[index_id] < 2
			AND COALESCE(us.[user_seeks] ,
				         us.[user_scans] ,
				         us.[user_lookups] ,
				         us.[user_updates]) IS NOT NULL
               GROUP BY t.[name] ,
                        t.[schema_id] )
    SELECT  *
    FROM    [TablesWithoutClusteredIndexes]
    WHERE   [row_count] > @MinTableRowsThreshold;

I hope you will find this post informative. For further information about clustered index design guideline, see MSDN resource here.

Further Reading:

Advertisements

SQL Server index related dynamic management views and functions (Part 1)

Microsoft SQL Server provides DBAs and Developers with several index related dynamic management views and functions, which they can use to maintain, modify, tune and identify missing indexes. Some of these dynamic management views (DMVs) and functions (DMFs) you might use are listed as follow:

  • sys.dm_db_index_operational_stats — Returns information about low level input/output (I/O) operations, locking and latching contention, and access methods.
  • sys.dm_db_index_usage_stats — Returns an index operations count, identifying operations performed and the last time each operation was performed.
  • sys.dm_db_index_physical_stats — Returns information about table and index size and fragmentation, including both in-row and LOB data.
  • sys.dm_db_missing_index_details — Returns detailed information about missing indexes, including the table, columns used in equality operations, columns used in inequality operations, and columns used in include operations.
  • sys.dm_db_missing_index_group_stats — Returns information about groups of missing indexes, which SQL Server updates with each query execution (not based on query compilation or recompilation).
  • sys.dm_db_missing_index_groups — Returns information about missing indexes contained in a missing index group.

Checkout the part-1 of my three part article series on index-related dynamic management views and function here, in which I covered the following two index-related dynamic management view and function: sys.dm_db_index_operational_stats (DMF) and sys.dm_db_index_usage_stats (DMV).

This article is published on SSWUG.org.

Find unused indexes using sys.dm_db_index_usage_stats

The SQL Server maintains all indexes defined against a table regardless of their usage. Index maintenance can cause significant amounts of CPU and I/O usage, which can be detrimental to performance in a write-intensive system. With this in mind, it makes sense to identify and remove any indexes that are not being used as they are a pointless drain on resources.

Even though there are several different methods in SQL Server which you can use to identify unused indexes but since SQL Server 2005 onwards the most common way to monitor unused indexes is to use sys.dm_db_index_usage_stats DMV. As the name suggests this DMV returns the information that is tracked about index usage from SQL Server cache.

SQL Server caches the following information for each index (for user queries and system queries):

  • User Seeks – The number of times index has been used in a seek operation (either looking up a single row, or doing a range scan) along with the time of the last seek.
  • User Scans – The number of times index has been used in a scan operation (e.g. a SELECT * FROM TableName operation) along with the time of the last scan.
  • User Lookups – The number of times index has been used in a lookup operation (this means a bookmark lookup – where a non-clustered index does not fully cover a query and additional columns must be retrieved from the base table row) along with the time of the last lookup.
  • User Updates – The number of times index has been used in an update operation (this counts inserts, updates, and deletes) along with the time of the last update.

This information helps us to identify and remove the unused index accurately because every individual seek, scan, lookup, or update on the specified index by one query execution is counted as a use of that index and increments the corresponding counter in this view. Using this DMV carefully removes the risks associated with dropping useful indexes.

Syntax:

-- Ensure a USE statement has been executed first.
SELECT * FROM sys.dm_db_index_usage_stats
GO

Example:

Following query helps you to find all unused indexes within database using sys.dm_db_index_usage_stats DMV. I took the approach of finding the indexes that have a lot of updates to them but don’t have any user scans/seeks/lookups which means we are only inserting into the indexes and our queries are not really using them:

-- Ensure a USE statement has been executed first.
SELECT u.*
FROM [sys].[indexes] i
INNER JOIN [sys].[objects] o ON (i.OBJECT_ID = o.OBJECT_ID)
LEFT JOIN [sys].[dm_db_index_usage_stats] u ON (i.OBJECT_ID = u.OBJECT_ID)
	AND i.[index_id] = u.[index_id]
	AND u.[database_id] = DB_ID() --returning the database ID of the current database
WHERE o.[type] <> 'S' --shouldn't be a system base table
	AND i.[type_desc] <> 'HEAP'
	AND i.[name] NOT LIKE 'PK_%'
	AND u.[user_seeks] + u.[user_scans] + u.[user_lookups] = 0
	AND u.[last_system_scan] IS NOT NULL
ORDER BY 1 ASC

Output:

Key considerations when using sys.dm_db_index_usage_stats:

  • SQL Server cache is flushed whenever you restart SQL Server instance, and is not persisted across instance restarts.
  • All cache entries for indexes in a database are removed when the database status is changed to read_only.
  • Rebuilding an index clears the stats from SQL Server cache for the index involved.
  • It is important to make sure that index monitoring is performed over a representative time period. If you only monitor indexes during specific time frames you may incorrectly highlight indexes as being unused. The safest method is to use a monitoring period which spans the whole lifecycle of your application, including any OLTP and batch operations.