MSDN Webcast: A Primer to Proper SQL Server Development
Designing Tables that Scale, Best Practices in Data Types and Initial Table Structures, Part 3 of 10
Presented by Kimberly L. Tripp, SQLskills.com
Q: Can I view a recording of this webcast? Here’s the specific replay link.
Q: Where can we get the demo scripts? The demo scripts are in this zip (20050819MSDNScripts.zip (3.67 KB)); here in this blog entry. However, at the series completion, I will also create an entry under Past Event Resources for the entire webcast series.
Q: Does Kimberly L. Tripp have a blog? Yes, www.SQLskills.com/blogs/Kimberly and if you’re reading this on my blog…well, then you already know! J
Q: Where are the links to all prior Webcast Q&As from this series?
Q: Is this Session 2 of the "Effectively Designing a Scalable and Reliable Database" webcast series? The Title bar here reads "MSDN Webcast: MSDN: A Primer to Proper SQL Server Development (Part…)". A bit misleading don't you think? So, I finally figured out the mystery here. When I originally submitted my proposal for this series, I titled it like this: Effectively Designing a Scalable and Reliable DatabaseA Database Developer’s Primer to Proper SQL Server Development (Level 200-300) But somehow, through a few edits and since each module also has a more detailed title shown in this following list, then I think the title just got edited down to fit. In fact, with subtitles like these, I can barely fit the entire title, subtitle and session title on a single slide. J
Part 1: Creating a Recoverable Database
Part 2: Creating a Reliable and Automated Backup Strategy
Part 3: Designing Tables that Scale – Best Practices in Data Types and Initial Table Structures
So, the mystery ends here… It’s all the same content and the general idea is that building a scalable and reliable database takes a variety of best practices that together make it happen; there is no “DBCC MAKEITGOFASTER(‘TRUE’)” option. I'll unveal a new title slide in part 4… oh, the excitement!
Q: How can I replay previous sessions? I thought we were going to get emails for replaying — but I haven't received any replay emails. You will receive replay emails ONLY when you register for these sessions through MSDN. We’ve come to find out that there are other ways to register but it’s only through MSDN that we know for sure you will receive the replay information. Regardless, you can always find the “on-demand” version of the sessions here.
Q: What does CLR stand for? Common Language Runtime. Quick WHATIS.COM Definition here.
Q: If something uses sp_OA* now in SQL 2000, am I guaranteed it will use SQLCLR in SQL 2005? Not automatically; however, you should consider rewriting you sp_OA* is used to automate COM objects; these, by definition, are not framework objects. Moving to SQL Server 2005, you can continue to run with your previous sp_OA procedures to access COM objects but if you can write SAFE .NET Frameworks code to do the same thing then you should convert your sp_OA* code where possible. Not everything can be converted but the basic principle is that which you use sp_OA* is a good thing to target for conversion. For some great details on the differences and how to move forward – go here. Also helpful is this KB Article regarding COM and .NET in SQL Server 2000 Using extended stored procedures or SP_OA stored procedures to load CLR in SQL Server is not supported
Q: I loved how XML data is available interactively in SQL Server Management Studio (SSMS); can HTML data columns be displayed in the same way? Yes, if you store HTML in a column of type XML then the column data will appear as a link and when clicked, you can edit it in the XML Editor window in SSMS.
Q: Is the "XML" datatype really a "varchar(max)"? No. In previous releases the XML type really was just a “blob” stored in the database. There were no inherent optimizations and you were not able to define a scheme for an XML column. In SQL Server 2005, XML data is stored natively and offers the ability to put indexes on the columns as well as better access and manipulation. You *can* store XML as varchar(max) but it will be harder to access/code. There’s a very nice description of the tradeoffs in this whitepaper:XML Options in Microsoft SQL Server 2005
Q: What is the difference between numeric and decimal? In SQL Server they are synonyms. I seem to remember hearing that there was a difference in the standards…somewhere but with a few references and looking around I can’t find any. Regardless, I’d make sure that you standardize on one or the other – just in case they do change, at least all of your data is consistent.
Q: I use decimal datatype for money values. Why it may be suboptimal to use decimal (instead of money datatype)? I wouldn’t say that it’s suboptimal – except maybe in storage length. However, decimal can be more precise.
Q: When should you use a float? A number of my clients use Float – i.e. Banks, Stocks, etc. What would be your argument against this? Just the lack of an absolute minimum precision/scale.
Q: We have a database where datatypes are set. But to use a simple percentage calculation we had to add .0 to the calculation to get the correct percentage. For example 100(5/9) is truncated to 0, but 100((5+.0)/9) yields correct result. Is there any standard procedure without massive conversions to assure a correct result? It’s a matter of data type. In the case of 100, 5 and 9 – these are all integers – without any decimal place. So – the division of 5 divided by 9 yields 0. However, 5 divided by 9.0 or 5.0/9 – both have a decimal and in that case the other data types are implicitly converted to the higher/more precise data value. A cool way to find out what the actual “base type” and/or the level of precision/scale is to ask these questions as if the data is of type sql_variant:
SELECT SQL_VARIANT_PROPERTY(5/9.0, 'BaseType')
SELECT SQL_VARIANT_PROPERTY(5/9.0, 'Precision')
SELECT SQL_VARIANT_PROPERTY(5/9.0, 'Scale') – numeric(8,6)
SELECT SQL_VARIANT_PROPERTY(5/9, 'BaseType')
SELECT SQL_VARIANT_PROPERTY(5/9, 'Precision')
SELECT SQL_VARIANT_PROPERTY(5/9, 'Scale') – int
Q: I am surprised by the nchar/nvarchar suggestions. I've always been told to NOT use these data types. Can you give the "reasoning" for this suggestion here? The primary reason that you want to use Unicode data is to avoid a cost in “translation” between the client and the server – when you’re using web-based applications that natively use Unicode. Additionally, with ever growing customer bases – you can more effectively store international data properly by supporting Unicode in the client and in the database. You are correct that these take 2 bytes per character rather than one – but that’s what gives them the larger range of characters to use/display/manipulate. Also, if you plan to use the SQLCLR against these columns, you will need to use Unicode as opposed to ASCii as the SQLCLR requires it.
Q: Does Unicode use twice as much space? Yes, Unicode columns are 2 bytes per character. If you defined a column of type nchar(100), that column will require 200 bytes of storage.
Q: Sometimes, and not always, it becomes a tradeoff in Storage Space v. Performance, doesn’t it? That’s an excellent point. You always need to look at the trade-offs associated with any decision and in fact, this is yet another. So, I always want the row size to be as optimal and as reasonable as possible. However, there are many compelling reasons – international and/or web-based applications as well as SQLCLR functionality.
Q: Is the N' keyword usable in MSSQL 2000? Yes, all Unicode character strings should be preceded with an upper case N in all releases where Unicode is supported (SQL Server 7.0 was the first release that supported Unicode).
Q: Is there any relationship between the OS's code page and nvarchar/varchar? Yes. When SQL Server is installed- it chooses the server’s default code page from the Operating System. A code page (or collation) defines the order in which characters are evaluated. The additional attributes a collation can define are: whether or not the characters are case-sensitive; the sensitivity of width and kana-type. This may not seem obvious but if these characters sets are native to you then the order to the data would probably make more sense. Once the collation is set for the server, the database inherits the server’s collation by default, when tables are created – each character-based column inherits the database’s default collation by default. However, at ANY of these levels the collation can be changed. There are benefits to doing so – in that you can keep data within a specific table ordered for a specific character set. Wow, now that I’m getting started, this could be a session in and of itself. So, just to give you a few tips:
- Be careful using temporary tables if your database’s collation is different than your server’s collation.
- Changing collation on the fly (for example, if you want to do case-sensitive searching through a view – of which I’ve included a sample file) can be expensive.
- Changing a database’s collation is supported but you might have troubles converting from a case-sensitive to a case insensitive database.
The files that can help you to get more familiar with verifying collations, changing collations, etc. are CaseSensitiveSearchingInView.sql and ChangingDatabaseCollation.sql.
Q: Another great use of adjunct tables is to isolate culture-specific data from the base (and usually indexed) data. So, this is a great point. If you change the collation for a column – and index it – then you can get better performance when accessing that column. Changing a collation on the fly (as in the script sample for the previous questions), is what’s expensive. However, you should make sure to do a lot of testing.
Q: I have a US code page OS but Latin general 850 collation set for SQL Server; I have no problems storing and retrieving data. Exactly! That’s the benefit of collation information being stored with each column. It allows the application to store many types of data – natively and then just deal with their display at the client. For those of you who want more details on how to do this, see COLLATE in the books online.
Q: Somewhere along the line I was always told to define varchars in multiples of 8 characters. Something to do with space allocated with new rows. Any ideas if it has any premise? Well, my first response to this was that I’ve never heard of this… So, I thought I’d check around and well, the couple of folks I’ve chatted with haven’t heard this either… IF anyone has – feel free to comment on this blog.
Q: How about images? Is it better to store them in the db or just the url? This is really an old debate. I still think there’s validity to both sides of it as well. I’d say that you should go through the pro’s and con’s and really choose what’s best for your application. If you use a lot of urls then you might be able to reference more without having to manage a copy of that data… If the url is your data then you might find that managing it in the database (from a backup/restore perspective) is easier – and you know that it’s always going to be there v. the url occasionally not working. Wow, I’d almost like to get a panel of folks together to have everyone hash out their opinions on this one. J SQL Server 2005 does make managing images easier but there are still points to both arguments!
Q: How would SQL Server support becoming a data server of video file? The only real format for this data would be binary. However, there is a 2GB limitation to a single value.
Q: Are you aware of a stored procedure to handle blob data effectively? i.e. for image storage? No. But there is an executable that comes with SQL Server 2000’s samples and I think it’s called textcopy.exe. That might be some good sample code to review.
Q: Does SQL Server support the SQL 2003 NULLS FIRST and NULLS LAST keywords of the ORDER BY clause? No support for NULLS FIRST or NULLS LAST but the order by can include ASC for ASCending or DESC for DESCending. NULL values are treated as the lowest value so NULLS FIRST would be ASC and NULLS LAST would be DESC. But – that’s the only support for specific placement of NULL values.
Q: Is there a "best practice" for the new C# nullable types? i.e. int? vs. int — re:MSSQL2005? When programming with the SQLCLR, always use types that allows NULLs (assuming your data can contain nulls) and then make sure to always follow the appropriate behavior with NULL. For example, concatenating NULL should yield NULL.
Q: Can I change (refactor) the underlying type of a UDT (e.g., from int to bigint) in SQL Server 2005? Unless I've missed something, I can't do it in SQL Server 2000 unless I drop the type from all uses. No, as the user-defined data type (UDDT) is only used at create time to map to the datatype; the connection is then broken (altering the UDDT definition will only impact new tables and variable definitions). User-defined Types (UDTs) are even more schema bound than UDDTs and it's a much tougher job to alter them (you need to write conversion from the old to the new). This is much more complex since they fall outside the conversion matrix and the only thing SQL knows is the binary representation. So if the data type was smart enough to add a signature to the serialized data, so that the next version can understand which data type is represents and determine if it can convert to it you might have a shot, otherwise you have to add a column, and convert columns and drop a column afterwards.
Q: Can I easily replace the SQL Server 2000 UDT with a SQL Server 2005 CLR Custom Type? I have a client who wants to — eventually — migrate to SQL Server 2005, but for the immediate future wants me to build their application with SQL Server 2000. I'm very interested in SQLCLR types, especially WRT to maintaining type consistency across T-SQL, CLR SP's and external data objects, so using CLR custom types. This is a typical thought for the SQLCLR – “I am going to build an object database.” Type fidelity between the client and the server exists today, but they want class fidelity, so they are going to create the customer object UDT, which is not really for what the SQLCLR was intended. You should be thinking scalar types with additional custom characteristics – not object types – as the scalar types will give you optimal and predictable performance.
Q: Can you find ANSI_NULL_DFLT_OFF setting as on or off? Does DBCC USEROPTIONS do that? Yes, DBCC USEROPTIONS is a nice, quick way of seeing the currently set LIST of session settings. However, there are better ways to check these settings programmatically. it’s not as nice to check it programmatically The files that can help you to get more familiar with ANSI Null Issues is ANSINullIssues.sql.
Q: It would be great if you could show us a list of classic fields "Account No", "Account Balance", "Description" etc… and tell us what type you would use inc. "Null/Not Null" etc. Well, this is a bit subjective and with a bank account there would probably be a lot of business rules that surround the account number. However, if this were a sales table and you needed an ordered then it would be a bit easier (again, business rules might change this). Typically, I’d go for a Primary Key which is simple and narrow and not necessarily a “natural” key. However, some of this reasoning comes from how tables are joined and how SQL Server internally stores data. Account Balance – I’d go with a precise numeric. Based on length of storage and the business rules behind required precision, you might choose a numeric(19,6). This will require 9 bytes of storage. For description, I would definitely go with a variable type – probably nvarchar(200) depending on what description really means. Here’s how the table would probably look on creation:
CREATE TABLE Account(
AccountID int IDENTITY NOT NULL CONSTRAINT AccountPK PRIMARY KEY,
Balance numeric(19,6) NOT NULL CONSTRAINT AccountBalanceDflt DEFAULT 0,
Description nvarchar(200) NULL)
INSERT Account (Description) VALUES ('The account of Kimberly Tripp.')
PRINT 'Kimberly Tripp''s account number is ' + convert(varchar(10), @@IDENTITY) + '.'
Q: In the Q&A you might want to tell your listeners that in SQL the string 'abc' is equal to 'abc ' when compared – unlike for C, C++ ,C# , VBNET, etc. This will help with the char v. varchar issue… It’s a good point! You should make sure that you do some testing to make sure that data behaves the way you expect. Trailing blanks may be trimmed with some data types and not others and once again, ANSI settings may impact this behavior.
Q: What are the NULL block and varchar block in the row layout? The NULL block helps to store and identify rows that are set to NULL. In the past, SQL Server stored this as a single space in a varchar column ad there was no real way to differentiate between a real space (as a value) from a NULL value (stored as a space). When the storage engine was re-architected for SQL Server 7.0 the NULL block was added to store a NULL value specifically.
Q: How would the table structures differ between the 32-bit and 64-bit in SQL Server 2005 and SQL Server 2000? They don’t! There is no difference in the on-disk format between any of the versions of SQL Server. In fact, this was even true of the MIPS/Alpha/Win32 releases when SQL Server 6.0 released on all of those platforms.
Q: do you have a script to calculate rows per pages (using the formula you showed? Ha! Seemed a bit painful I know… But that formula works.But, yes – it’s a heck of a lot easier to just do this yourself if you have a table already. In SQL Server 2000 you need to use DBCC SHOWCONTIG with TABLERESULTS and in SQL Server 2005, you use a new DMV called sys.dm_db_index_physical_stats. This takes 5 parameters and has some really nice new features to it.
Q: I might have missed this, but is the SQL Server Management Studio (SSMS) the replacement for Enterprise Manager? Yes, SSMS replaces the Enterprise Manager and the Query Analyzer. And, for a more scaled down query tool, you can use Express Manager (XM). To download XM, you can go here. If you want more details regarding the latest SQL Server CTP, you can go here.
Q: Is Management Studio backward compatible for use with SQL Server 2000? Yes! You can use SSMS to connect to SQL Server 2005 and SQL Server 2000.
Q: How do we write our queries for Insert/Update/Select on a database that we have no idea how it was setup – so we don’t experience the local issue that you talked about? Basically, the best way is for the developer to create an interface that masks the complexities of the database schema through stored procedures, functions and views.
Q: Do indexes point to the row level or the page level? There are two primary parts to an index – the leaf level and the non-leaf level. The leaf-level of a clustered index IS the data (a clustered index is an ordered table).
Q: What is the best practice or rule of thumb on using identity primary keys on tables in a database? And would that be all tables and/or some tables or not all? This is a great question. However, I might phrase it a bit differently because I *think* I know where you’re going with this… Generally, I recommend that large tables have a clustered index on an identity column. This can help to improve performance of inserts by not causing splits (inserts go to the end) and by helping non-clustered indexes look up the corresponding rows in the clustered table by using a clustering key which is unique, narrow and static. This is really quite a bit related to the session on indexing… I think you should definitely attend session!
Q: How could a LOB end up in the leaf level? This is really another great question for the indexing session – and a bit on internals as well. In SQL Server 2005, rows can span pages (so you can have a row greater than 8060 bytes) and rows can have LOB types (even in the leaf level of an index). I’ll explain how this can happen with the new INCLUDE option for indexes – in the next session!
Q: Are you talking at the Vegas launch of 2005? Yes! You can see the complete list of my upcoming events here.
Comment: By the way, thanks for the great Web cast. I don't know of many other companies that do this sort of thing — take questions directly from their customers.
Thanks! This is part of what I love about these webcasts… it allows me to hear what you’re interested in and it allows me to keep learning. To be honest, often I have to look things up and/or hit the KB and/or ask some of the other SQL geeks I know – in order to answer some of these questions. It really helps to keep me on my toes! In fact, this week I bothered two of my favorite people, my partner Patrick and my good friend Gert.
Thanks again for another GREAT week of questions!
See you later today – Friday, Aug 26th