 |
| Home |
 |
| Events |
|
 |
| Services |
 |
| Resources |
 |
| Partners |
 |
| Communications |
 |
| Contact Us |
 |
 |
 |
|
 |
| Q&A |
Q: How can I use Profiler to Filter on a stored procedure name or a procedure object id?
The answer to this one may seem a bit frustrating BUT the way that Profiler works is that ONLY events that generate a value for the column can be filtered on that column. If you ONLY want to see things with a very specific value it can take quite a bit of time to remove all of the uninteresting events to narrow it down to only what you need. As for stored procedures - out of the stored procedure events - the objectname is NOT produced for SP:StmtStarting or SP:StmtCompleted but the ObjectID is. So - there are often cases where you might see a value for objectname and not for objected or visa versa. For more details check out the following topics in the BOL.
Check out this topic in the SQL Server Books Online:
- Stored Procedures Event Category
Q: When you CREATE a proc with recompile...does it then always recompile every time it executes?
This is in fact the pure definition of it. If you create with recompile then no plan is saved and therefore each execution generates a new plan!
Q: IF the "dbo." is better then why doesn't Microsoft default to it? In other words, if only the object name is specified why not assume "dbo." and if not found then look for "user." instead.
I guess I see your point but there is the fact that this is not only a flexibility change (in ownership on creation) but also a backward compatibility change. This behavior has been for many versions (in fact - I think *all* versions). The only change is that the object name (the owner) defaults to the local user (the user creating the object) rather than the dbo if the creator is only a member of the db_owner role and not "the" DBO (i.e. the database creator or the database owner changed by the Sys Admin executing the sp_changedbowner procedure). It's really more of a problem with object creation scripts and applications that assume the creator (or the person running the script) is a Sys Admin (the Sys Admin will act just like "the" dbo and all objects will default to dbo.). This is also potentially a problem when you go from development systems to production (where in Dev you're a Sys Admin and in production you might only be "a" member of the db_owner role).
Q: Which event class in profiler identifies recompiles?
The Event Class is SP:Recompile but that's typically not enough. What you need to make sure to add is the Data Column: EventSubClass. This column will further describe the reason for the recompile. To get more details on this review this whitepaper and this KB Article.
Q: Monitoring SP recompilation will help identify procs recompiling too much, but what do you suggest for identifying procs that are NOT compiling and should? (i.e., monitor stored proc durations? or something of this nature?
There is no direct way to look for stored procedures that aren't be recompiling and should... This can only occur by evaluating optimal plans and executing procedures with recompile (which I showed in demos). In other words, you can really only know by doing a significant amount of testing - at development time preferably - but if you're monitoring an existing application and you suspect that performance is affected by plans which are bad then there are a couple of things to look for... I would consider looking for plans that have a high number of I/Os (a bad choice that results in an excessive bookmark lookup) and/or plans where there are lots of scans and/or plans that seem to very a lot - might do an aggregation where you look at min execution and compare to max execution - try to resolve why there's such an extreme variation...could be a bad plan!
Q: Please, please, please provide more stuff on temp tables. There's an institutional insistence on using #temp tables, & I need to determine with certainty when they're appropriate--and especially when they're not.
Well - we should almost consider another webseminar for this one! In a short list of things to do - instead of using temp tables - here are the things that I'd recommend learning a bit about:
- Derived Tables - use these to help minimize the complexity of a query by placing a "subquery" in the location of the FROM clause.
- Views - define each of the temp tables as a view and then join the views instead of the temp tables. IF this gives you better performance then stick with it...if not - try seeing if there are better indexes to create instead.
If none of these yield better performance - then you can certainly go back to using temp tables. Luckily derived tables and views are fairly easy and fast to create/test! BUT - you might also consider whether or not the overall process can be more streamlined.
Q: Is it possible to get each other's email addresses and start, say a Yahoo! group. Or better yet, a chat site hosted by Penton with an occasional guest appearance by Kim?
This is flattering but I doubt they want to move to "chat" sites with the mag and more than anything I think they'd rather have people to continue to use what's available... So - the best choice is to start a thread on the SQL Magazine forum (on www.sqlmag.com) and there are also the public newsgroups on Microsoft. I tend to get there as often as I can and in addition to my experience there are TONS of MVPs and other SQL Server experts that can quickly help out. Worst case scenario you can always email me (Kimberly@SQLSkills.com or Kimberly@SolidQualityLearning.com) and I'll try to take a quick stab at it! J To see the list of newsgroups as well as instructions for using the NNTP newsgroups/browser go to: http://www.microsoft.com/sql/community/newsgroups/default.asp.
Q: I've set up profile traces before and would like to set up a trace to start and stop at a specific time. I don't see start time as an option in profiler. How do I go about setting up a job to perform a trace during a specific time window?
The steps to automating traces are somewhat "outside" of (or used in addition to) the Profiler tool - and because of this - probably hard to find! The key thing to remember is that a trace can be scripted. The option to script a trace is under the File menu in Profiler: choose File, Script Trace, For SQL Server 2000 (or For SQL Server 7.0 if needed - they are very different traces). Once scripted you will see that each trace will generate a TraceID. Once you have the TraceID you can use the sp_trace_setstatus stored procedure to start and stop a trace. Additionally you can use the SQL Agent to automate the starting and stopping of a trace and programmatically you can use WAITFOR DELAY to put periodic starts and stops into an automated job. There are lots of topics to review here and a great place to start is the webcast. Once through the webcast checkout the two Books Online topics below. This will give you all of the pieces for automating Profiler Traces!
Make sure to watch the SQL Server Webcast titled "What's New and How to Effectively Use Profiler" at http://support.microsoft.com/servicedesks/Webcasts/WC111400/wcblurb111400.asp.
Check out these topics in the SQL Server Books Online:
- Starting, Pausing, and Stopping Traces
- sp_trace_setstatus
Q: When running the 3 step code set to clear cache and buffers, which database do you run it against (e.g. master, or each affected db)?
First, the "3 step code" to which this question refers is:
CHECKPOINT
-- Executed within an active database
-- this statement will force dirty pages
-- (pages which have changed since they
-- were read into cache) to be written (and
-- therefore "cleaned") to disk. The process
-- of "cleaning" the pages does not remove
-- them from cache but instead it allows
-- them to be removed.
DBCC DROPCLEANBUFFERS -- This removed the "clean" buffers.
DBCC FREEPROCCACHE -- This clears all plans from cache.
If you want to see what's currently in cache use the system table syscacheobjects. A typical query could be:
SELECT *
FROM master.dbo.syscacheobjects
WHERE CacheObjType = 'Executable Plan'
So within the code this question is answered - somewhat?! The answer is that checkpoint only applies to the database in which it is executed (the two DBCC statements apply serverwide). If you have numerous active databases it is likely that you will want to execute a checkpoint in more than one database. The rights to execute a checkpoint fall to the database owner or higher (meaning SysAdmin). In general the ability to execute many DBCC statements is typically ONLY for System Administrators however a Server Administrator can also execute the DBCC FREEPROCCACHE statement.
Q: So if you add an index to a table, should you run sp_recompile on that table?
Since sp_recompile is only a one time thing and very inexpensive (remember - it doesn't actually recompile it just invalidates plans) I would recommend it anytime you've made optimization changes from which you think the procedures will benefit. There are some cases where SQL Server automatically does this behind the scenes - for example, when the schema changes the procedures are automatically recompiled. Check out the sql script "Views&Procs with schema changes.sql" for an example of a procedure recompile due to a schema change. Adding indexes does not force a recompile.
Q: Do you have any suggestions what to do about a lot of existing user procs with prefix sp_ ?
This is a tough one because it's hard to say how much work it's going to be. If you only have 10 or so then it's not too bad... What might be best is to create a couple of scripts... One which is all of the drop procedures (don't run it yet J) and another which includes all of the drops, the creates and the permissions (although the drops are not really necessary they might help if you make a few mistakes - to keep running and dropping and re-creating the entire script until everything works. Also you really only want to execute all of the search and replace requests once! In this process I'd make sure you do a couple of things:
- Update the names to NOT use sp_, a simple change would be to replace all "sp_" with "usp_" for user-defined stored procedure. Make sure to check for any system procedures that still require the name sp_!!! Use a select from master.dbo.sysobjects and msdb.dbo.sysobjects to create a list of names that you don't want to rename!
- Update all of the calls (where you EXEC proc) to be EXEC dbo.procname (if they are owned by the dbo).
- Add SET NOCOUNT ON at the beginning of each procedure (typically a good habit and causes few problems)
Then test, test, test!!!
IMPORTANT NOTE: Applications will need to know of these changes, users may need to be notified and if there are any cross-database stored procedure calls you will need to ensure that these all work properly. The testing part is really the harder part.
Q: Is it safe to say that when your stored procedure returns various row counts and has many parameters you should create it using "WITH RECOMPILE"?
As Anthony mentioned the first time around - I'll continue with my "nice pc answer" and say that "it depends." Additionally, I've always tried to minimize my use of the word always! J
But - in this case I would probably agree that this might be an OK way to go about it? At least for those procedures where performance significantly varies and where the number of rows and distribution of data varies, it might prove to be an easy and even OK choice. However, it is always (hmm!) best to check to see first if the optimal plans vary - it could be that (even though the distribution varies) the plans don't change. Also, you may find that in reality - the stored procedure is quite large but only part of the plan varies significantly. In this case you would be creating unnecessary overhead in recompiling the entire procedure and would benefit more from isolating and modularizing the offending line(s) of code into their own procedure and have it recompile on each execution (i.e. create the nested procedure WITH RECOMPILE).
Q: Is there a reason that the query optimizer doesn't manipulate queries to isolate the column name for you?
AND
Q: So, just to confirm, regarding SARGs like LEFT(MyField,5) will that always scan? Is there no way to exploit indices?
Both of these questions are similar enough that I will take one answer for both... This is a historical issue and EVERY release of SQL Server and even a few competing products behave this way. The primary reason is that NOT all queries can be easily re-written/converted and always guaranteed to produce the same results. Yes, it's fairly obvious for certain types of operations - especially mathematical - but the behavior of the server was to go the "safe" route at the expense of performance. At least that's how I've always understood it. SQL Server has always done this...
If you have ANY computation on the column then SQL Server will absolutely force a scan. I took the specific case listed about LEFT(LastName, 5) = 'TRIPP' and compared that to column like 'Tripp%' and the performance difference was staggering!
Here is the result:
SELECT * FROM Member
WHERE LEFT(LastName,5) = 'Tripp'
-- Causes a TABLE SCAN
-- Performs 143 I/Os (small table, luckily J)
-- Showplan estimate is 92.71% overall cost between these two queries
SELECT * FROM Member
WHERE LastName LIKE 'Tripp%'
-- Uses an Index
-- Performs 9 I/Os
-- Showplan estimate is 7.29% overall cost between these two queries
Simple rule - try re-writing SARGs and get users to understand some of these benefits!
Q: Does it matter if you isolate column on left or right?
No.
Q: If "*" is slower, would selecting a constant (like "1") work even better? To clarify, in an EXISTS subquery, would selecting a constant value (like "1") work better than selecting "*"?
I've had this question before! So - I'll post my previous questions/answers. These are from the Indexing Session's Q&A:
What about using "*" in this case: IF EXISTS(SELECT * FROM Users WHERE UserID = X ?
Another very common question and also relates to some "folklore" that seems to be going around... An existence test does NOT return columns. Because of this the * (or constant for that matter) is irrelevant. In the ANSI standard the definition of an existence test is that it is introduced with * - this is preferred and does NOT cause any performance problems. In fact, it is preferred syntax. No benefit in using a constant.
What about using * in COUNT(*)?
Ah - this used to be a big one... People would often put in a column and SQL Server would be forced to read that data - even when the column did not allow NULL values. However, in SQL Server 2000 they are aware that * and any (column) where the column does not allow NULL values is a row count. If you need a row count then they know where they can find it - in a leaf scan of the smallest non-clustered index. So - yes, * is just fine here and in fact, it is preferred.
OK - you guys made some good points... * is not ALWAYS evil. In these cases it is just fine. I would also add that it is appropriate when you are selecting from something which already limits the column list (i.e. a view, stored procedure or table valued functions). I'm sure we could probably come up with a few more... just remember that less is more when it comes to optimization!
Q: So in general you weigh cost (execution plan-wise) more than I/O?
This is a great question! Yes, I usually do - as SQL Server does as well... SQL Server uses a "cost-based" optimizer. The primary goal is for the "overall" cost of a query's plan to be as low as possible. This means they have to account for I/O, memory, etc. What may appear to have less I/Os may be more costly in terms of temporary resources, etc. The showplan tells you the "overall" estimated cost - this is a better view of the whole picture but more than anything you need to look at all numbers - when I/Os are less than a few thousand apart it's probably not that significant as a few thousand logical I/Os can be performed in milliseconds.
Q: What columns to add when looking for scans in Profiler? Can I filter for specific tables?
Well - you'd think you could just add "ObjectName" as a data column and all would be well... but it's not. Some "events" don't produce values for every data column. Within the books online there are topics for Profiler event types and within each one they list the data columns for which there are values provided.
To see a list of all of the categories with links to their data column lists, etc. check out the Books Online topic titled "Monitoring with SQL Profiler Event Categories."
Here are the KEY things I would do to get better information about SCANs:
- Make sure you use FILTERS - one that you REALLY need if you use Profiler to look at scans is "Exclude system IDs" (the checkbox on the filters tab). This will SIGNIFICANTLY reduce the excessive information produced.
- As for data columns - you must add ObjectID NOT ObjectName. Unfortunately using filters here is a bit harder BUT you can use either a system table OR a system function to determine your table's IDs. The system table is sysobjects and the function is called object_id. Here are two sample queries:
SELECT Name, ID FROM sysobjects WHERE type = 'U'
-- or to get just the ID for a specific table
SELECT object_id('member')
- Finally, make sure you set a filter for the specific database you want to profile.
Q: We have 5000 to 8000 I/O sec at peak. Unsure how much of that is INSERTing? These inserts are in a heap.
Not sure where you're going with this one other than you're trying to decide if you would be better off clustering instead of using a heap. In this case I'd need more info to really decide - and it's mainly because you might be exceeding the 400 inserts/sec rule here. BUT - typically the performance of inserts can be better with a clustered index over a heap (with the exception of bulk loading into an empty heap - it is better to load into an empty table and create the indexes after the load rather than load into an already indexed table). So - in your specific case I would try to setup a reproducible test environment. I would:
Backup the database and load thousands (via simulated application clients and/or osql executions) of rows into the heap
Restore the database, make the table clustered - with an identity (just to start) and then try loading the same thousands of rows - if the performance is worse then you might try one other option...
Again re-restore the first version of the database, make the table clustered - with a combination of columns that create multiple hotspots (possibly use a foreign key column and the primary key column - use a foreign key column that has 10-20 distinct values - ie. lots of duplicates). I would aim for something that creates 10-20 hot spots. Finally, try loading the same thousands of rows - if again, the performance is worse then you might be a rare case where the heap structure is better than a clustered table
But I'll stress that looking at inserts is only ONE piece of the puzzle - this table has other issues. For example, scans are likely to be VERY expensive if there are updates and therefore forwarding pointers. You might want to find out how many pages the table has and then perform a query which does a scan - to see how many forwarding pointers you have.
To determine pages:
exec sp_spaceused tname, true
Take the data column and divide the number by 8 to determine pages
Execute:
SET STATISTICS IO ON
SELECT count(*) FROM tname WITH (index (0))
If difference between the number of pages scanned in the select is higher than the number of pages calculated by the sp_spacesused pages values divided by 8 then you have forwarding pointers - and possibly lots of them. Each one adds unnecessary overhead in a scan. There are many pro's and con's here - again, this is just a couple but a clustered table wouldn't have these.
Finally, if you want to determine the estimated number of I/Os needed to do each insert - use statistics I/O in query analyzer and run the same process to insert data while capturing I/Os.
Q: When I see Clustered Index Scan, I have NO IDEA how to coerce it instead to Seek?
The easiest way to start is to highlight the query and run it through index tuning wizard. This will work very well on most queries. When ITW doesn't give any advice and you still want to try and move forward - manually - implementing indexes then you'll need to evaluate quite a bit about a query to get the "right" index(es) and the right balance of indexing pros to maintenance and overhead cons. There's a lot of info needed to fully answer this but it's a good recommendation to consider the online webseminar series on Indexing: Internals, Performance and Maintenance - as this is what I talked about for 4 sessions! Here are the links:
Q: I'm getting a lot of institutional resistance to views. Can views negatively impact performance?
This is becoming a frequently asked question! I'm hearing a lot of comments like this these days and I have a feeling it has to do with rumors that are FLYING all over the place that say that SQL Server creates temp tables for views when they are used within other queries...people also say that views add unnecessary overhead. Ugh! There are cases where when views are used improperly they might cause unnecessary work to be performed but generally a view does not add any additional costs v. the same query executing. Views are not usually the problem - it's the usage of the views that can be problematic. In fact, I was hearing this so much that I ended up writing a FREE TSQLSolutions article specifically detailing this problem. All you need to do is register to read these... I think this will help convince you that views are not inherently evil. J It's part II that speaks more to performance issues.
And one other thing I'll add to this is that OFTEN views can IMPROVE performance by focusing the users on JUST they data they need, providing a well written join to handle complex schemas (with the users struggling and/or executing poorly written code against the server) and simplifying permissions. I'm a fan of views - as long as they're used properly!
Q: Where is the best place to learn about the individual elements of the showplan, for instance Hash?
AND
Q: Where can we learn EXACTLY what all the various icons mean in the Graphical ShowPlan? They're explained, sort of, in BOL--it's helpful, but the info is shallow. I want MORE. Indeed, I want it ALL!
Another set of questions I can lump into one answer BUT the bad news is that I'm not aware of any single reference that covers all of these. There are quite a few KB Articles that do a great job of focusing you on specific elements and explain them really well and I'd certainly start with those and scan through them (start with KB Q224587 - Troubleshooting Application Performance with SQL Server).
Also, books like Kalen Delaney's Inside SQL Server 2000 also cover quite a few of them along with lots of internals and examples of what's going on BUT there's no single reference. Would be a good idea for a book.... Hmmm. Also, if more general "algorithms" are desired you could search sites like the acm.org for information about Hashing/Hash algorithms - these are fairly standard. Kalen's book does have good information about the different types of hashing that occur within SQL Server.
Q: If I have a table with a few indices, but SysIndexes reveals that there are, say, a half-dozen colums with AutoStatically generated indices--does that mean I could benefit from more indices?
Another set of questions I can lump into one answer BUT the bad news is that I'm not aware of any single reference that covers all of these.
Q: How can I found out how many times a query is executed?
I answered this one during the seminar but I wanted to make sure that you felt clear about the options you need to put together using Profiler. First, you want to make sure that you don't "lose" queries from the Profiler Workload... By setting the "Server processes trace data" you will guarantee that no queries are lost. From the BOL:
If selected, no events will be skipped under stress conditions; however, performance of the server may be affected, depending on the number of events being traced. If this check box is cleared, then the processing is performed by the client application, and there is a possibility that some events will not be traced under stress conditions.
You can certainly start by getting a trace without setting this option and see what queries are the most frequent BUT with a system under load you may not be getting a truly accurate picture.
Once you have the trace saved as a workload file you can import it into a table. If you choose NOT to set the "server processes trace data" then you can save the workload rows directly into a table.
Once the data is within a table - you can aggregate the "TextData" based on a substring of TextData. In fact, I've chosen two slides from Brian Moran's "Quick Strike" methodology lecture titled: Solve Performance Problems Using A Repeatable, Structured Methodology. He titles this method of analyzing data creating "query classes." Query Classes are where "enough" of the text is similar to consider it a group. If you're interested in the entire session he will be delivering it at the SQL Magazine Connections conference in Palm Springs this October. For those of you with access to these slides - Brian's slides have been placed directly into the slide deck immediately after slide 15: Finding Frequently Executed Queries. For more information about the upcoming SQL Magazine Connections conference, check out: www.sqlconnections.com.
Generally, the recommendation is to aggregate over 30-50 characters. For example,
SELECT LEFT(TextData, 30), COUNT(*)
FROM WorkloadTable
GROUP BY LEFT(TextData, 30)
HAVING COUNT > x
Q: Is there a more efficient way to do user defined SORT BYs other than assembling the query on the application server?
I'm not sure I completely understand this but I suspect it's really asking for further clarification for a point I was trying to make during the session...
Often applications bring large amounts of data down to the client and then format, sort and/or eliminate some rows there (after the server has processed the larger set and after the network has delivered significantly more data than necessary). My primary goal in optimization from a client-server perspective is ask the server for exactly what you want and make sure the server is optimized to handle it (i.e. indexes, optimized procedural code, etc.).
Now, having said that - there are exceptions (aren't there always J)... IF an application wants a subset of data displayed in numerous sort orders over a period of time AND they're OK with having a "stale" set of data... then it's more than ok to deliver a reasonable amount of data to the client and manipulate it there (possibly with periodic refreshes).
There are lots of books regarding application best practices as well as tips and tricks on many application developer's sites. One title that might interest you (and there are a few articles on SQL Mag as well) is "ADO and ADO.Net Examples and Best Practices" by William R. Vaughn. Bill will also be at SQL Magazine Connections in the fall.
Q: When you are going through profiler there are delays and you tend to go to quick through this demo area with the screens not appearing in a timely manner?
I apologize for this. I will make sure that we notice this type of comment faster next time. Hopefully the screen shots display better in the replay. More than anything I recommend two things to help you with Profiler: Watching the PSS WebCast (link at the end of this Q&A) AND Playing with Profiler. You'll be surprised at how easy it becomes if you stay focused and limit the number of events and data columns as well as use lots of filter. Filter for a specific database, filter out system information (via the checkbox), add additional filters for other text that pops up if you're not interested in seeing it! Once you focused Profiler to just what you need you won't get overwhelmed by it (otherwise it's just too much information).
Q: When a query hits a page, for say a limited number of columns, does SQL Server load the entire page into memory, or just the selected columns?
SQL Server's smallest unit of I/O is a page. So the answer to this is YES! You should always consider the "physical" make up of a data row and try to keep you "internal" fragmentation on a page to a minimum. Internal fragmentation is when empty space is unintentionally left on pages within the table. The key word here is "un"-intentionally... Often administrators will leave a small amount of free space to minimize table fragmentation but too much and/or too many splits OR bad design - could make this less than optimal. There are lots of concepts here but some great resources out there to get more info. Check out:
Index Defragmentation Best Practices (listed at the end)
DBCC SHOWCONTIG (in the BOL)
Inside SQL Server 2000, Kalen Delaney
ISBN: 0-7356-0998-5
Q: How can I get a copy of your slides?
For those of you who attended the online seminar a copy of these slides are provided for your PERSONAL USE. As just an fyi - I rarely give out actual slide decks... It's important to me that you keep these for your own personal use. Thanks a lot! I really appreciate your discretion.
Q: Is it useful to remain disciplined, say using TinyInt & SmallInt when appropriate, in hopes that the cumulative effect will be to fit more rows on datapages...or do you visit this only when necessary?
This is great! I actually struggled with whether or not I'd add a few slides about table design and/or row size optimization, etc. and I chose not to... But at least I can answer this here. YES!!! I always try to find the most appropriate data type for the job. I would make sure that you always use tinyint over smallint or int IF you truly only need less than 256 values. More than anything this is dictated by the data however here are a few rules:
-
For character data of less than 5 characters - even if the actual length of the data varies... use CHAR. There's overhead for variable width data and it's often not worth it considering the column width. Of course there could be exceptions (say where the column is usually 1 character (90% of the time) and 1-5 chars the rest of the time. You might save a tad by using varchar but nothing except some application testing, etc. will help you to determine whether or not the performance overhead outweighs the space savings. Also the space savings may be ZERO. Depending on how many rows fit on a page there's almost always a small amount of free space - if you have a tad less free space on the page it really doesn't change the overall space needed for the table. Again, lots of concepts here but BOL, Inside SQL Server and the Database Structures whitepaper on MSDN should all help with this!
-
For character data between 5 and 20 bytes - that's much harder! It depends a lot on how much of it is used on average (per row) and whether or not the data is extremely volatile... In this grey area, use your best guesstimate.
-
For character data greater than 20 bytes - I almost always go varchar if the data varies. This is where the space savings starts to significantly outweigh the variable character data overhead.
For all of the other data types - just find the best one for the job. If you don't really need the full range of date values supported by datetime use smalldatetime...
Q: I am used to seeing the clustered index on the PK columns. But in many cases the PK is not ever-increasing. How important is it to have a clustered index on a table?
There's a lot to answer in this one! Generally speaking a clustered index can help improve performance and help keep a table compact - if chosen appropriately. When you don't have a clustered index data is not ordered and insert performance is compromised because heaps (a table without a clustered index) tend to be more optimized for space efficiency instead of performance. In order to optimize for space, SQL Serve needs to locate gaps for inserts - the space into which the new row will be inserted. The lookup for this free space can be expensive. SQL Server uses a structure called a PFS to find the free space (PFS = Page Free Space) but these lookups add time. Having a clustered index dictates the location for the row and no lookup is necessary. In general - and when chosen appropriates - a table performs better with a clustered index. Having said that, there are a few things I tend to look for in the clustered index and with each of these is the reason. The most important are that the clustered index is:
-
Narrow - narrow because all non-clustered indexes depend on the clustered index and actually include the clustering key in their index. The wider the clustering key the wider (and probably unnecessarily) the non-clustered indexes.
-
Static - static for many of the same reasons as above. Since the non-clustered indexes depend on the clustering key a change to a value which is overly duplicated will be very expensive.
-
Unique - Again, because non-clustered indexes depend on the clustered index they use the clustering key to reference rows. Therefore, each row must be uniquely identifiable. If the clustered index is not already unique SQL Server will have to "uniqify" the value by adding a 4-byte integer to the duplicate values.
All of these things add time and overhead and should be avoided.
As for an ever-increasing - key... there are a couple of reasons why I like the identity value. First, it naturally meets all of the above criteria. Second it minimizes page splits which in turn keep the table more compact and less fragmented (ah - a good lead in to the rest of Part IV). Finally, it creates a hot spot - while that's normally not desired, it can be beneficial as the pages into which you are inserting - are already in cache. This effectively minimizes cache requirements.
Q: How do deletes affect page splits?
This is a good question - especially as I didn't directly address this in the slide (good catch). A delete creates open space - which can actually be beneficial to a point. If other rows are modified and the page already has space to grow then a delete can effectively make room for later rows to expand without causing splits. However, if that space is not reused by another row modification then that empty space creates a problem known as internal fragmentation. With all types of fragmentation you must rebuild or defrag in order to reclaim that space. Note one exception: if all rows on the page are deleted SQL Server will mark that page as reserved but unused, if all pages within an extent are unused then that extent is freed for other tables to reserve/used.
Q: But if primary key is a non-clustered index and you do your queries on the primary key how does clustered-index on say identity col help?
Typically requests to a primary key are highly selective. For highly selective queries both indexes are highly affective at answering the request with very few reads. Having the identity column helps keep the table more compact, less fragmented, requires less rebuilds, has less splits and large scans stay more efficient.
Q: Would you recommend introducing a surrogate key identity column to an intersection table satisfying a many to many relationship between two table both of which have identity columns?
Well, there are really two questions here.
-
First, would I recommend a surrogate key if your tables don't have an identity-like column on which you can cluster? Probably!
-
And do I recommend intermediate tables to define a many-to-many relationships? Yes, usually!
Q: Does running DBCC SHOWCONTIG use a lot of resources or cause locking?
Yes, it can. First, it runs a full table scan. Second, some calculations are more expensive than others. You might want to check out the WITH FAST option for SHOWCONTIG. While not all values are produced, you might get enough with fewer locks. Regardless, you're still going to get a shared table level lock during showcontig.
Q: What is the default fill factor?
If FILLFACTOR is not specified, SQL Server will use the default fillfactor. The default fillfactor is configurable using sp_configure and is set to 0 by default. 0 means that SQL Server will fill the leaf pages to 100% and the non-leaf pages such that there is at least 1 space available.
Q: Do foreign keys have be dropped first when using Create with drop existing?
Not if the key value/index definition is not changing. If you use CREATE with DROP_EXISTING to change the clustering key, then yes as the references will no longer make sense.
Q: If using Database Maintenance Plans, what mechanism does SQL Server use for database reindexing? Drop and recreate or dbcc reindex?
DBCC DBREINDEX.
Q: When I create a database I can allocate all space up front (making it more physically contiguous) or allocate a small amount of space and let it grow automatically. But with auto grow I will get fragmentation of pages wont I? And if so how do indexes help with this? Or should you also pre-allocate your database space?
In general, it's always best to pre-allocate space to databases. This minimizes FILE fragmentation. Index fragmentation (either internal or external index fragmentation) can occur regardless of whether or not the file is fragmented. But - it's good to minimize a lot of autogrowths.
Q: Can you use the dbcc commands across linked servers?
Not really. But you might be able to create a procedure which runs these DBCC commands and then execute the procedure through 4 part naming.
Q: Indexed Views seem overly complex - are they every useful?
You're correct! There are a TON of restrictions and often you'll want to avoid indexing a view when it creates too much duplicate data that needs to be updated or when it creates a hot row scenario (like an aggregate on country) BUT these can also more drastically than any other index - increase your performance... You should do some testing with these and always remember INSERT/UPDATE and DELETE performance as well. This is really another form of covering and as I mentioned in Part II - not everything should be covered!
Q: Can we Index a View based on a View?
We were right at 60 minutes and I completely spaced on the answer to this one when you asked but the answer is a simple NO! You cannot create an index on a view which contains views or functions. Check out the "Indexes on Views on Views - NOT Allowed.sql" script to generate the following error:
Server: Msg 1937, Level 16, State 1, Line 1
Cannot index the view 'Credit.dbo.test2'. It references another view or function 'dbo.test'.
Q: I assume indexed views referencing a table that is bulk loaded must be dropped before a bulk load to achieve a fast bulk load?
The performance of a bulk operation is impacted by many factors - only one of which is existing indexes. Yes, it's generally true that the fastest load is achieved by loading into a completely empty heap BUT that's not always possible. Depending on the percentage of the data loaded and the total number of indexes you really have two options:
-
Drop all indexes, load data, recreate indexes.
Best when a large percentage of data (relative to the table) is being loaded
-
Leave indexes in place, load data, possibly rebuild indexes after load.
Best when a small percentage of data (relative to the table) is being loaded.
In addition to indexes, the logging mode can significantly impact the performance of a bulk operation. A database logs operations based on the recovery model chosen for that database. Choosing a recovery model is not something to chose randomly - you should spend a bit of time reviewing the books online BUT setting the recovery model to BULK_LOGGED for the duration of the bulk operation can significantly reduce the logging overhead and time of the operation. More than anything you should do some testing. Also, consider checking out some of the past presentations on www.SQLDev.net. Gert Drapers has a few sessions on High Performance Data Loading.
Q: Where can we get all of these scripts/demos, etc.?
As an attendee you have already received an email which should have included the sample scripts BUT I also keep them on www.SQLSkills.com under Resources, Past Events or Events, Past Events. All of the Demo Scriptsare there and available!
Q: If you need more columns out of the table than any of your indexes cover, so you have to do a bookmark lookup, is it still so bad to use SELECT * ?
This is an excellent question! In fact, only the day before this particular webseminar someone said something similar and even said "don't I get these columns for free" if I have to do a table scan or access a view... I responded - nothing's free; there's no free lunch. Whenever you ask for something within the select list (or anywhere for that matter) you are telling SQL Server that it is REQUIRED. Currently, there is no designation for "return these if they're not expensive" and in fact, regardless, you are potentially wasting bandwidth so it's always better to limit your column list as much as possible. In fact, there are some optimizer choices that are somewhat surprising, like the fact that they can use multiple indexes to cover a query - which is less possible when the query is unnecessarily wide.
Q: What about using "*" in this case: ?
Another very common question and also relates to some "folklore" that seems to be going around... An existence test does NOT return columns. Because of this the * (or constant for that matter) is irrelevant. In the ANSI standard the definition of an existence test is that it is introduced with * - this is preferred and does NOT cause any performance problems. In fact, it is preferred syntax.
Q: What about using * in COUNT(*)?
Ah - this used to be a big one... People would often put in a column and SQL Server would be forced to read that data - even when the column did not allow NULL values. However, in SQL Server 2000 they are aware that * and any (column) where the column does not allow NULL values is a row count. If you need a row count then they know where they can find it - in a leaf scan of the smallest non-clustered index. So - yes, * is just fine here and in fact, it is preferred.
OK - you guys made some good points... * is not ALWAYS evil. In these cases it is just fine. I would also add that it is appropriate when you are selecting from something which already limits the column list (i.e. a view, stored procedure or table valued functions). I'm sure we could probably come up with a few more...just remember that less is more when it comes to optimization!
Q: What if you had individual nonclustered indexes rather than composite indexes?
This is a great question! I've been to a few sites where their indexing strategy is to create single column indexes on every column - YIKES! Without getting into all of the gory details here I want to stress that wider indexes tend to handle more and different types of requests. BUT - I would not work so hard to cover queries if all of your queries are highly-selective point queries. In another way, I recommend that you create narrow indexes for highly selective SARGs and slightly wider covering indexes for the low selectivity queries. If you end up with one index on ColumnA for point queries and another index on ColumnA, ColumnB and ColumnC for a low selectivity range query then I would DEFINITELY drop the single column index on ColumnA alone. Don't get me wrong - there are queries which could benefit from the narrower single column index on just ColumnA BUT if the queries are all point queries then the width of the index will not likely impact performance at all. It is definitely not worth it having both indexes. So, look at it another way - you won't be adding a lot of new indexes, instead you will be slightly widening a few of the existing indexes.
Let me see if I can summarize a bit here.
Let's look at a few of the ways each of the following indexes can be used:
An index on one column (remember that the index will always include the clustering key if the table is clustered - which is interesting in that SQL Server just inherently covers more queries) but this is still a less useful index. An index on title and includes title_id (because that's the clustering key) is useful for these types of queries:
-
SELECT title and/or title_id FROM titles
-
SELECT title and/or title_id FROM titles WHERE title like 'Gone with%'
-
SELECT title and/or title_id FROM titles WHERE title_id > 100 - SQL Server will use this index when the query is not selective enough to benefit from using the clustered index - because the table is that much wider than this index...
-
This index is also beneficial when ONLY these columns are requested in a join, or when a count is requested (although there might be a smaller non-clustered index to cover a count(*)).
An index on multiple columns - may make the index only a shade wider (for example if the column added is reasonably narrow (remember that the row is the widest structure) and yet cover a much larger number of queries! In fact, the index is useful when you submit ANY combination of these columns - even when you are not joining or performing a SARG on the high-order element (the first column) in the index.
Q: Can you discuss the downsides of having to many indexes?
This question goes right along with the previous couple... I'm not recommending that you index every column or even cover every query. Instead I recommend that you minimize having a lot of narrow indexes (again, unless you have a lot of point queries). Too many indexes can cause blocking problems on volatile columns and too much overhead as SQL Server always maintains index data on every INSERT/DELETE and every UPDATE which directly affects a column which is indexed.
Q: Say you have a table with columns: id (int), startdate and enddate. How would you define the clustered index if SARG was select * from table where '1/1/2003' between startdate and enddate? You also might query select * from table where id = 7. The id column is unique.
This is a tough one! SQL Server MUST do a comparison between two columns and in fact, you're better off saying
WHERE startdate <= value and value <= enddate
And if this query returns a LARGE set then the indexes which would be most beneficial would be
(startdate, enddate, id)
or
(enddate, startdate, id)
as to which one SQL Server would choose? It would depend on how selective that value really is... If there are more row that start before that date then end after then the second index will be more useful then the first. If there are more rows that end after that date then start before then the first index will be best. BUT - there will be a scan of the entire set before OR after (depending on which index is chosen) because they'll need to check the enddate or the startdate, respectively.
So - as for the real question... How would I define the clustering key... I don't really have enough information yet... J More than anything I'd need to know the general activity on this table as well as its width (meaning are these the ONLY columns in the table or are there others). If the table is quite a bit wider then I would strongly consider covering; however, that might change if you also tell me that the table is very volatile. If the data is very volatile then I would tell you to do the low-selectivity queries within an analysis database and leave the data gathering for the OLTP server and I would certainly cluster (again, especially if the table is very wide and volatile) on the ID. If the performance is not significantly affected by these indexes or the data is not volatile then I might consider clustering on startdate, enddate, and id - this might be OK because it's likely to be increasing and may not cause a lot of fragmentation... but I'd also need to know a lot more about the table. I hope this gives you a few ideas?
Q: Would you use the combination of multiple indexes to cover a query which has too many columns to be included in a single index?
This is probably the toughest to try and answer... Covering a query with two indexes is not necessarily something I strive for purposely. Often I use this as a benefit of covering a couple of the higher priority queries... Having said that - if the query performs extremely poorly and it is a predominantly read environment then an indexed view might be appropriate...good lead in for Part III. :)
Q: I sometimes use a construct in an SP such as WHERE (Col1 = @Param1 OR @Param1 IS NULL) to provide for optional parameters to stored procs. What are the performance ramifications of this?
Without seeing more of the code I'd have to suggest that you avoid this like the plague! J OK - I'd need to see more code BUT after a quick test I think I have a good code sample which shows why it's better to have multiple procedures and limit the flexibility a bit OR use dynamic string execution...which unfortunately will start a whole separate thread as far as "sql injection" issues/attacks and security (the user executing the procedure will need permission to directly execute the statement). Before I show the code I want to stress that this procedure could not be hacked because of the addition of QUOTENAME (for character params) which protect the value coming in.
OK - from this test it is VERY obvious that you ONLY want DSE of these two procedures but I would still argue that more than one procedure is likely to produce better results (and not have the same security issues) BUT one procedure with all purpose parameters is definitely NOT a good choice! See the ProcedureParameters.sql procedure to test this within the credit database. You will need to modify some data to get rows to return to these queries.
CREATE PROC GetMemberInfoParam
@Lastname varchar(30) = NULL,
@Firstname varchar(30) = NULL,
@member_no int = NULL
AS
SELECT * FROM member
WHERE (lastname LIKE @lastname OR @lastname IS NULL)
AND (member_no = @member_no OR @member_no IS NULL)
AND (firstname LIKE @firstname OR @firstname IS NULL)
go
exec GetMemberInfoParam @Lastname = 'Tripp' with recompile
exec GetMemberInfoParam @Firstname = 'Katie' with recompile
exec GetMemberInfoParam @Member_no = 9912 with recompile
go
CREATE PROC GetMemberInfoParam2
@Lastname varchar(30) = NULL,
@Firstname varchar(30) = NULL,
@member_no int = NULL
AS
IF @LastName IS NULL AND @FirstName IS NULL AND @Member_no IS NULL
RAISERROR ('You must supply at least one parameter.', 16, -1)
DECLARE @ExecStr varchar(1000),
@MemberNoStr varchar(100)
SELECT @ExecStr = 'SELECT * FROM member WHERE '
IF @LastName IS NOT NULL
SELECT @Lastname = 'lastname LIKE ' + QUOTENAME(@lastname, '''')
IF @FirstName IS NOT NULL
SELECT @Firstname = 'firstname LIKE ' + QUOTENAME(@firstname, '''')
IF @Member_no IS NOT NULL
SELECT @MemberNoStr = 'member_no = ' + convert(varchar(5), @member_no)
SELECT @ExecStr = @ExecStr + ISNULL(@LastName, ' ')
+
CASE
WHEN @LastName IS NOT NULL AND @FirstName IS NOT NULL
THEN ' AND '
ELSE ' '
END
+
ISNULL(@FirstName, ' ')
+
CASE
WHEN (@LastName IS NOT NULL OR @FirstName IS NOT NULL)
AND @MemberNoStr IS NOT NULL
THEN ' AND '
ELSE ' '
END
+
ISNULL(@MemberNoStr, ' ')
EXEC(@ExecStr)
go
exec GetMemberInfoParam2 @Lastname = 'test', @FirstName = 'Katie'
exec GetMemberInfoParam2 @Firstname = 'Katie'
exec GetMemberInfoParam2 @Firstname = 'Katie', @Member_no = 842
exec GetMemberInfoParam2 @Member_no = 9912
exec GetMemberInfoParam2 @Lastname = 'Florini', @Member_no = 9912
go
The first procedure generates a horrible plan (I knew it would be bad but I'm somewhat stumped by its choice) but the second procedure generates a good plan for each and every execution (as expected). Also, I spent a few minutes trying to come up with a way you could copy this code as well... Let me know. I think this is a good example of what you can do!
Q: In many cases, I have found in working with large tables it is easier to use temp tables to get the smaller result set before the joins. Comments?
And you're not alone...this was an extremely common practice in 6.5 - especially due to the 16-columns-per-join limit; however this might do more harm than good in SQL 7.0 or 2000 (NOT ALWAYS). So, since you're already doing this - do NOT re-write your code...yet. Something I would try - which will help you make the final decision as to whether or not you need to re-write your code - is to create views using the same definition which goes into creating the temp tables and then join the views... If that yields better performance than stick with it! It's possible that SQL Server might beat your join (and the cost of creating all of those temp tables). If it's good news - let me know... :) Ha - let me know either way!
Q: What would be the index strategy for very large (millions of records), thin (2-3 columns) lookup tables where all/most columns are used in association with the EXISTS clause?
An existence test should be treated just like a join (because that's in fact what it is). What you should look at are the columns on which you are testing existence. Depending on the clustering key and the selectivity of the join you might consider an index on the column(s) defined in the EXISTS clause. Again, since the tables are narrow this is a tough decision. Since the table is narrow then you might be better off clustering for the join - it would be great if the join column is narrow, unique and an identity but if not it might not be a problem. Since an existence test only tests for the existence of a row then the clustered index could be on this column (if there are two columns put the more selective first) but this table might become significantly fragmented. You should use DBCC INDEXDEFRAG to clean up the fragmentation (while keeping the table accessible...good lead-in for Part IV on Index Maintenance).
Q: Can you compare/contrast the indices that BOL says that SQL Server creates for constraints, including unique constraints?
There's quite a bit of confusion around indexes created with constraints (Primary Key and Unique Key) v. indexes created with the CREATE INDEX statement but in fact they're really similar. The only difference is that a constraint is a logical concept - enforced through an index. An index is a physical concept that can enforce uniqueness or aid in lookups, etc. In terms of performance - a unique constraint on the SocialSecurityNumber column will behave exactly the same as the CREATE UNIQUE INDEX statement; however, I recommend you create unique columns with constraints as SQL Server allows foreign keys to reference them - that's the only difference!
Q: Should the volume of data (rows) influence the decision on choice of a heap?
This is a great question - and one I often miss... I often spend a lot of time talking about large tables and minimizing fragmentation, etc. yet people often wonder what about the little tables? While the size of the table does play a large role in the index choices you must remember that it's important to index EVERY table. Even the small tables benefit from indexes - especially in the area of locking. With indexes SQL Server can lookup and lock at a smaller granularity! Also, don't forget that indexes are the only way to ensure uniqueness. So - would I choose a heap for smaller tables...probably not! In fact, I generally only recommend heaps for bulk loading.
Q: Can you talk about rule of thumb regarding clustering on non key columns, or AK?
Here I tend to think about uniqueness and whether or not it's static... If the column(s) are unique (which a key would be!) and the data is static then that's a good start. The negative is that it may not be narrow? Well, if it's narrow as well then you've got some of the primary factors covered! The only last item would be - is the key ever increasing? If not then this key may cause unnecessary fragmentation (i.e. clustering on Social Security Number). So - for larger tables, I'd probably avoid this. For smaller tables and/or systems where availability is not as critical - management issues (rebuilding/defraging) won't be as difficult (smaller tables rebuild/defrag faster).
Q: How about using a select top n instead of a select *? Does this improve performance?
When talking about "minimizing data" returned to the client - almost every option is a good one! Using TOP can be especially important when users have the habit of running a select with an order by and then canceling the query after they get the data they're looking for... This causes the server to process the entire set, start delivering the data and then the client says...Stop - that's all I need. The server has already done the work and had optimized to get to the last row quickly - you didn't want ALL rows you only wanted the first 10 (or whatever TOP set). If you want the TOP n then asking for it causes SQL Server to optimize for that set! So, YES - this should help improve performance as well!! Also, if you're really lazy (like me!) then check out my script (ColumnList.sql) to help you simply pull back the list of columns from a table - to cut and paste and then remove the columns you're not interested in. This makes it easier to write your code when you only want 11 out of the 47 columns your table has! :)
Q: Are you suggesting an extra identity column in place of a composite pk?
The choice of the primary key is more a logical database design choice and that's really up to you (although if it is wide then all of your foreign key tables might be unnecessarily wide)... My biggest concern is the clustering key! Although they are often though as one and the same they really don't have to be. Having said that - when the table is large and management issues (rebuilding/defraging) are difficult then I will consider adding an arbitrary [identity] column solely for clustering!
Q: What about ITW? Any way to use it with cross-db queries?
Ah - this question was a bit different in my view! I gave a few recommendations on how to use ITW - and I'll give them here for completeness however, it is a bummer! It doesn't work on cross database queries and this is a tough one. However, over the next couple of sessions I'll give you a few ways to pull apart queries and break them down so that you can manually index them - with ease! If you want to use ITW though here are my general recommendations:
-
Choose your Clustering Key
-
Add you constraints: Primary Key (maybe NC?) and Unique Key constraints
-
Manually index your foreign key columns
-
Capture a workload (get a realistic set of queries, make sure that their frequencies are realistic as well...if everyone uses stored procedures then you might want to extract some of the queries, hard-code the parameters and then run a few of these as "sample" queries)
-
Run ITW
-
Since ITW won't get everything...then you'll still need to manually index a few things - stay tuned to the next few web seminars and I'll help you figure out some of these strategies!
Resources:
Q: What about allowing SQL Server to name index v. doing it yourself, systematically - recommendation?
The only type of indexes that SQL Server doesn't require a name for are constraint based indexes. Primary Key and Unique Key constraints can be specified with either of the following syntax:
-- Named Constraints
ALTER TABLE Employee
ADD CONSTRAINT EmployeePK
PRIMARY KEY CLUSTERED (EmployeeID)
ALTER TABLE Employee
ADD CONSTRAINT EmployeeSSNUK
UNIQUE NONCLUSTERED (SSN)
-- Unnamed constraints
ALTER TABLE Employee
ADD PRIMARY KEY CLUSTERED (EmployeeID)
ALTER TABLE Employee
ADD UNIQUE NONCLUSTERED (SSN)
go
sp_helpconstraint Employee
In a quick example I executed the two bolded statements from above (i.e. the named primary key constraint and the unnamed unique key constraint). Here are the two names:
-
EmployeePK
-
UQ__Employee__114A936A
The reason why these are so "ugly" is because constraint names must be unique - database wide! So - to do this SQL Server adds the random hex number to the end.
Q: Primary key for partitioned tables?
This is a tough one. Quite a few people choose to create the PK as a uniqueidentifier when their tables are partitioned... However the GUID doesn't necessarily make for a good clustered index - it's wide[r] and it's not monotonically increasing. More than anything is depending on your partitioning scheme but you might consider a surrogate key added solely for clustering and then have the PK as non-clustered.
Q: Would you recommend building the clustered index after a bulk load?
Yes, definitely! Try some tests... I've found substantial gains in loading data into a heap first and then creating indexes! The only negative is that the build of the clustered requires space to build... So definitely do some tests but I think you'll be very surprised at the gains with creating the indexes AFTER the bulk load!
Q: Recommendation on auto statistics and the background indexes it creates?
In SQL Server 7.0 - not always but MOST of the time. In SQL Server 2000 - YES! The way the SQL Server 2000 manages the updating of statistics has been significantly improved!
Check out the whitepaper: Statistics Used by the Query Optimizer in Microsoft SQL Server 2000 http://msdn.microsoft.com/library/en-us/dnsql2k/html/statquery.asp?frame=true
Q: Why does it take as long to drop a clustered index if you only have to drop the b-tree?
In earlier releases (pre-7.0), dropping a clustered index was instantaneous because you're right - all they needed to do was drop the b-tree...simple! However, in SQL Server 7.0+ non-clustered indexes depend on whether or not the table is clustered... So when you drop a clustered index all of the non-clustered indexes must be rebuilt.
Q: Do we need an Index on a temporary table that we will only scan once?
Tough one! Hard to say as it depends on the access (is there an order by or group by)...but if you're truly only doing a table scan on the temp table - after you build it then a heap might be just fine. I'd probably end up doing a bit of testing with this one!
Q: What about clustering on values that aren't strictly monotonically increasing but have a general tendency to increase (for example, a date of an event's occurrence)?
Yes, it's OK to create the clustered index on a column which is ever increasing - that is not an identity column BUT you should make sure that it is unique. Unfortunately using the datetime datatype might not be helpful here because of how SQL Server stores datatime data - it's very UNlikely to be unique. You will get the ever increasing benefits but you add some overhead with "uniqification" of the duplicate datetime values.
Check out the short sidebar on datetime precision for more details! http://www.sqlmag.com/Articles/Index.cfm?ArticleID=25174&
Q: What about the advantages of using a wider clustering key to cause more NC indexes to be covered?
It's a good point but are those columns (that you're covering) requested in every query? You might be making your nonclustered indexes unnecessarily wide when only a few queries really need all of those columns... not to mention - it might not be static, unique or ever-increasing - so you lose those benefits!
Q: How about indexing related to constraints (I've been a bit confused about entries under Object Browser's Constraint v. Indexes nodes)
Yes, I would agree that it's not very obvious what's going on here... A constraint is more a logical concept from relational theory - an index is a physical structure. However, SQL Server enforces some of the relational requirements of a constraint (i.e. uniqueness) by creating indexes. This seems confusing but it can be boiled down to this - if you have Primary Key and Unique Key constraints then you will have indexes with exactly the same names created (to enforce the uniqueness requirement). If you manually create indexes - then these will ONLY be listed under the index tab!
|
|