Tables without Clustered Indexes?

Today, we experienced performance issues with some of the SSRS reports that were deployed as part of the latest application/database release. While investigating this performance problem, I realized that the underlying tables for these report queries do not have clustered index. I was even more surprised when I realized that some of these tables are huge and have few million rows in them. A good practice is to have a clustered index on all tables in SQL Server, as it helps to improve query performance. This is due to the fact that clustered indexes affect a table’s physical sort order, and a table that doesn’t have a clustered index is stored in a set of data pages called a heap where:

  • Data is stored in the order in which it is entered.
  • Rows are not stored in any particular order.
  • Pages aren’t sequenced in any particular order.
  • There is not a linked list linking the data pages.

When a table has clustered index, SQL Server physically sorts table rows in clustered index order based on clustered index key column values. In short, leaf node of clustered index contains data pages, and scanning them will return the actual data rows. Therefore, table can have only one clustered index.

When to have a clustered index on table?

Although it is not mandatory to have a clustered index per table, but, according to the MSDN article (Clustered Index Design Guidelines), with few exceptions, every table should have a clustered index defined on the column or columns that used as follows:

  • The table is large and does not have nonclustered index. Having clustered index improves performance, because without it, all rows of the table should be read to find any row.
  • Column or columns are frequently queried and data is returned in sorted ordered. Having clustered index on the sorting column or columns prevents sorting operation and returns the data in sorted order.
  • Column or columns are frequently queried and data is grouped together. As data must be sorted before it is grouped, having clustered index on the sorting column or columns prevents sorting operation.
  • Column or columns data that are frequently used in queries to search data ranges from the table. Having clustered indexes on the range column will avoid sorting entire table data.

So in order to resolve these performance issues, I re-wrote these queries and created clustered indexes on tables where appropriate. Moreover, I analyse further, and used the following two queries, to find out which tables in other databases do not have a clustered index defined.

The first query return names of all tables with row count greater than specified threshold, and do not have a clustered index defined. This query inner joins sys.tables system catalog to sys.dm_db_partition_stats dynamic management view to obtain this information (See below):

DECLARE @MinTableRowsThreshold [int];

SET @MinTableRowsThreshold = 5000;

;WITH    [TablesWithoutClusteredIndexes] ( [db_name], [table_name], [table_schema], [row_count] )
          AS ( SELECT   DB_NAME() ,
                        t.[name] ,
                        SCHEMA_NAME(t.[schema_id]) ,
                        SUM(ps.[row_count])
               FROM     [sys].[tables] t
                        INNER JOIN [sys].[dm_db_partition_stats] ps
						ON ps.[object_id] = t.[object_id]
               WHERE    OBJECTPROPERTY(t.[object_id], N'TableHasClustIndex') = 0
                        AND ps.[index_id] < 2
               GROUP BY t.[name] ,
                        t.[schema_id] )
    SELECT  *
    FROM    [TablesWithoutClusteredIndexes]
    WHERE   [row_count] > @MinTableRowsThreshold;

The second query is slightly a modified version of first query and returns the names of actively queried tables with row count greater than specified threshold, and do not have a clustered index defined. This query inner joins sys.dm_db_index_usage_stats to the first query to identify actively queried tables (See below):

DECLARE @MinTableRowsThreshold [int];

SET @MinTableRowsThreshold = 5000;

;WITH    [TablesWithoutClusteredIndexes] ( [db_name], [table_name], [table_schema], [row_count] )
          AS ( SELECT   DB_NAME() ,
                        t.[name] ,
                        SCHEMA_NAME(t.[schema_id]) ,
                        SUM(ps.[row_count])
               FROM     [sys].[tables] t
                        INNER JOIN [sys].[dm_db_partition_stats] ps
						ON ps.[object_id] = t.[object_id]
			            INNER JOIN [sys].[dm_db_index_usage_stats] us
						ON ps.[object_id] = us.[object_id]
               WHERE    OBJECTPROPERTY(t.[object_id], N'TableHasClustIndex') = 0
                        AND ps.[index_id] < 2
			AND COALESCE(us.[user_seeks] ,
				         us.[user_scans] ,
				         us.[user_lookups] ,
				         us.[user_updates]) IS NOT NULL
               GROUP BY t.[name] ,
                        t.[schema_id] )
    SELECT  *
    FROM    [TablesWithoutClusteredIndexes]
    WHERE   [row_count] > @MinTableRowsThreshold;

I hope you will find this post informative. For further information about clustered index design guideline, see MSDN resource here.

Further Reading:

8 thoughts on “Tables without Clustered Indexes?

  1. Hi,

    Nice piece on heaps. I prefer this tiny script to detect them, though:

    SELECT OBJECT_NAME(i.object_id) AS TableName, p.rows
    FROM sys.indexes i
    INNER JOIN sys.partitions p ON p.object_id = i.object_id AND p.index_id = i.index_id
    WHERE i.index_id = 0
    ORDER BY p.rows DESC

    Furthermore, I would not set 5000 as the sample threshold. As per MSDN, heaps are only recommended for really tiny tables “This can be acceptable when the table is tiny, such as a list of the 12 regional offices of a company”
    http://technet.microsoft.com/en-us/library/hh213609.aspx

    Like

    • Thanks. Yes indeed its another way to detect them. However, I only suggest you to create clustered index on tables that are frequently queried. No point in creating it on a tables that are rarely queried. Therefore, I suggest readers to use second query in this blog post to find tables that are actively queried but has no clustered index.

      Like

    • Without going through tedious testing to prove a point, I suspect that 5,000 rows is a lot closer to the tipping point than 12, dependent on row length. Remember that rows on a page aren’t actually stored sequentially even if a clustered index exists. It’s analogous to scattering some playing cards on a table and trying to find the highest spade, 4 cards no problem, 20 no problem, but eventually it is more efficient to sort them. It is often worth looking at tables with too many indexes as well as too few. Glen Berry publishes queries that show the cost of inserts compared to how many time the index is read.

      Like

      • Thanks for the comments Colin. The figure of 5000 rows I used in my script is for demo purpose. I have not said that create clustered index on every table. I’m saying create where it’s appropriate. I agree user must test it properly before creating clustered index. As I have seen creating clustered index on a wrong column may break application. Again, testing is a key

        Like

  2. Pingback: Tables without Clustered Indexes? | modyrefay

  3. Pingback: Tables without Clustered Indexes? | Garrett's Blog

  4. Pingback: Should every user table have a clustered Index? | modyrefay

  5. Besides the primary key clustered index, there are non-clustered indexes that can also provide performance gains on columns (or column sets) that are frequently queried. Such extra indexes are often really necessary. And there should (as a general rule) be non-clustered indexes on columns that have FK constraints.
    There’s a famous SQL expert (I forget her name, I think Tritt is last name?) who says that large tables with frequent inserts should use Identity columns as PKs, because that causes new rows to be added to the end of the table, rather than force-fitted into the middle causing page rewrites (or whatever it’s called when a page has to have an extension block added).

    But sometimes an Identity column is not useful, especially in tables with static data, such as lookup tables — such as a State or Zipcode table. In a State table (in the U.S.A.), for example, the best PK would be a 2-character column [State_Abbrev] (for the state abbreviations, e.g. ‘NC’, ‘AZ’, ‘TX’, etc.)
    There’s a good discussion of PKs, clustered indexes, and identity columns vs other column types for PKs in this web page:
    http://sqlblog.com/blogs/aaron_bertrand/archive/2010/02/08/bad-habits-to-kick-putting-an-identity-column-on-every-table.aspx

    Like

Leave a comment