Home » Microsoft » 70-762 » Which indexing strategy should you use?
You are developing a database reporting solution for a table that contains 900 million rows and is 103 GB.
The table is updated thousands of times a day, but data is not deleted.
The SELECT statements vary in the number of columns used and the amount of rows retrieved.
You need to reduce the amount of time it takes to retrieve data from the table. The must prevent data duplication.
Which indexing strategy should you use?
A. a nonclustered index for each column in the table
B. a clustered columnstore index for the table
C. a hash index for the table
D. a clustered index for the table and nonclustered indexes for nonkey columns
Correct Answer: B
Explanation/Reference:
Columnstore indexes are the standard for storing and querying large data warehousing fact tables. It uses column-based data storage and query processing to achieve up to 10x query performance gains in your data warehouse over traditional row-oriented storage.
A clustered columnstore index is the physical storage for the entire table.
Generally, you should define the clustered index key with as few columns as possible.
A nonclustered index contains the index key values and row locators that point to the storage location of the table data. You can create multiple nonclustered indexes on a table or indexed view. Generally, nonclustered indexes should be designed to improve the performance of frequently used queries that are not covered by the clustered index.
References: https://docs.microsoft.com/en-us/sql/relational-databases/indexes/columnstore-indexes-overview?view=sql-server-2017
I think I’ll go with B as well
I don’t like answer B. Column store index should not be updated frequently.
“Columnstore indexes work well when the data is stable. Queries should be updating and deleting less than 10% of the rows.” (from https://docs.microsoft.com/en-us/sql/relational-databases/indexes/columnstore-indexes-design-guidance?view=sql-server-ver15)
I would choose D (though nonclustered indexes are data duplicates).
OK, thousands of updates a day is less then 10% from 900 million rows. So I agree with B.