How to Handle Power Grid Failure

The Blizzard

On March 13-14, 2019, most of eastern Colorado experienced a pretty severe blizzard. A blizzard is when you have snow combined with sustained winds of 35 mph or more that last for three hours or more. This storm was unusual because it was characterized by extremely low barometric pressure. Lamar, CO had an official reading of 970.4 millibars, which may be the lowest in Colorado history. The lower the barometric pressure, the more severe the storm typically is. A storm like this is more serious when you have a power grid failure.

You may have heard this storm called a “bomb cyclone”, which is a reference to bombogenesis. Bombogenesis is when you have a pressure drop of at least 24 millibars in 24 hours. The weather station at my house recorded a low of 979.3 millibars around 10AM on March 13, 2019. You can convert from inHg to millibars by multiplying inHg times 33.8637526.
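As a quick sketch of the conversion and the bombogenesis rule described above, here is how they might look in Python. The function names are made up for illustration, the conversion factor is the one quoted in the text, and the starting pressure in the example is a hypothetical value, not a reading from my weather station.

```python
# Conversion factor quoted in the text (millibars per inch of mercury).
MBAR_PER_INHG = 33.8637526


def inhg_to_mbar(inhg: float) -> float:
    """Convert a barometric pressure reading from inHg to millibars."""
    return inhg * MBAR_PER_INHG


def mbar_to_inhg(mbar: float) -> float:
    """Convert a barometric pressure reading from millibars to inHg."""
    return mbar / MBAR_PER_INHG


def is_bombogenesis(start_mbar: float, end_mbar: float, hours: float) -> bool:
    """True if pressure fell at least 24 millibars within 24 hours or less."""
    return hours <= 24 and (start_mbar - end_mbar) >= 24


if __name__ == "__main__":
    # The low recorded at my house, expressed in inHg.
    print(f"979.3 mbar is about {mbar_to_inhg(979.3):.2f} inHg")
    # Hypothetical starting pressure 24 hours earlier, for illustration only.
    print(is_bombogenesis(start_mbar=1005.0, end_mbar=979.3, hours=24))
```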

 

My Electric Company

I get my electricity from Intermountain Rural Electric Association (IREA). IREA is an electric distribution cooperative that covers 5,000 square miles in Colorado. They buy power from Xcel Energy and the Western Area Power Administration. They also buy some solar power from the 13-megawatt Victory Solar facility in Bennett, CO. Unfortunately, they also own 25.3% of Comanche Unit 3, a 750-megawatt coal-fired generating unit in Pueblo, CO that went online in 2010.

I don’t like IREA. Until recently, they were extremely hostile to renewable energy, especially residential solar PV electricity. Unlike most utility companies, they have never offered any rebates or incentives for energy efficiency upgrades. This includes things like getting energy efficient appliances, upgrading insulation, getting a solar PV system, etc. None of this is encouraged or incentivized by IREA.

IREA also used to include a regular screed in their monthly newsletter about the folly of renewable energy. It was too expensive, it was too unreliable; IREA had a multitude of objections. They have finally stopped doing that, giving in to the changing economics of renewable energy.

 

Power Grid Failure

We lost power twice on March 13, 2019. The first time was for about two hours from about 12PM till 2PM. That was not a big deal. Due to my preparations, we got through that with not much of a problem. The second outage was 16 hours, from about 5PM until about 9:30AM on March 14, 2019. That was less fun, due to the length and the time of day (and night) that it happened.

Our house is primarily electric. We do have a natural gas fueled water heater and furnace, which both require electricity to actually operate. They don’t have pilot lights, and you need electricity for the furnace fan and the exhaust vent fan for the water heater.

 

Residential Solar PV Power

We have a 9.7 kW rooftop grid-tied solar PV system that went into service in July of 2015. It consists of (29) 335 watt SunPower SPR-X21-335 panels connected to two SunPower SMA 5000 TL US 22 inverters. This system has generated 63.5 MWh of electricity so far.

With a typical grid-tied solar PV system, when the public grid goes down, your solar system is automatically shut down. This happens to avoid backfeeding the grid and creating a potential safety hazard for utility workers. This means you have no power at all when the public grid is down.

You can get special inverter models that have a dedicated “secure power supply” (SPS) feature that you enable by flipping a switch during a power outage. If the sun is up, then after a short delay you can pull up to 1500 watts from two dedicated outlets next to each inverter. This would let you run a decent amount of stuff during the day during an extended power outage (like a zombie apocalypse). It would not have helped last night, since it was dark. This video explains how it works:

Sunny Boy TL-US Secure Power Supply

IREA Limitations

IREA limits you to a 10 kW or smaller solar PV system. By Colorado state law, IREA has to offer net-metering, which means that when your solar system is producing more electricity than you are using, your electric meter will run backwards. IREA will track your net metering credits (on an annualized basis). They settle up every May, and issue a bill credit for your net metering credit for the year. Settling up in May is the worst time of the year for the consumer, coming pretty soon after winter.

IREA charges $10.00 per month to be connected to their system. Our solar PV system generates more electricity than we use just about every month. The exception is December and January, because of the shorter days and the sun being lower in the sky. The annual net metering credit is usually about $90.00, so we usually pay about $30.00/year for electricity.

 

Energy Storage

Even with SPS inverters, residential solar PV by itself will not get you through an extended power outage unless you have storage. You won’t get any solar power production at all during nighttime. Your production will also be lowered by bad weather, especially if you have snow on your panels. So far, I have some very limited energy storage capability. This includes:

(3) APC BX1500G UPS

(2) CyberPower LX1500GU UPS (two 12V 9.0 Ah batteries)

(1) DeWalt DCB1800B Portable Power Station (with four 20V 5.0 Ah batteries)

The APC and CyberPower UPS units each have about 216 watt hours of storage. The DeWalt Portable Power Station has about 400 watt hours of storage. This means that I only have about 1480 watt hours total (nearly 1.5 kWh). So, I could run a 1500 watt load for about one hour, or a 100 watt load for about 15 hours. That really isn’t enough capacity for most purposes, but it is better than nothing!
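The arithmetic above can be sketched in a few lines of Python. This is only a back-of-the-envelope estimate: the helper names are hypothetical, the per-UPS figure of 216 watt hours is applied to all five UPS units as stated in the text, and the `efficiency` parameter is an assumption I added to show where inverter losses would enter (it defaults to 1.0, meaning no losses).

```python
def battery_wh(volts: float, amp_hours: float, count: int = 1) -> float:
    """Nominal stored energy in watt hours for a set of identical batteries."""
    return volts * amp_hours * count


def runtime_hours(capacity_wh: float, load_watts: float, efficiency: float = 1.0) -> float:
    """Estimated hours a load can run from the given storage capacity."""
    if load_watts <= 0:
        raise ValueError("load_watts must be positive")
    return capacity_wh * efficiency / load_watts


# Each UPS: two 12V 9.0 Ah batteries.
ups_wh = battery_wh(12, 9.0, count=2)
# DeWalt Portable Power Station: four 20V 5.0 Ah batteries.
dewalt_wh = battery_wh(20, 5.0, count=4)
# Five UPS units (three APC, two CyberPower) plus the DeWalt unit.
total_wh = 5 * ups_wh + dewalt_wh

if __name__ == "__main__":
    print(f"Total storage: {total_wh:.0f} Wh")
    for load in (1500, 100):
        print(f"{load} W load: about {runtime_hours(total_wh, load):.1f} hours")
```

Real-world runtime will be shorter than this estimate, since the UPS inverters are not 100% efficient and the batteries lose capacity as they age.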

 

Energy Storage Performance

In my case, I was able to keep my Arris SB8200 cable modem and ASUS router going through the entire outage. I was also able to keep our two cell phones charged. We had regular internet until Comcast’s neighborhood equipment probably exhausted its batteries sometime while we were sleeping.

Once the sun came up on March 14, I was able to pull power from my SPS outlets on one inverter in order to start charging most of the UPS units and the DeWalt Portable Power Station. I was also able to turn on a small electric heater plugged into the other inverter.

 

Figure 1: Charging From SPS Outlets

 

Power Grid Restoration

Luckily, the grid power was restored mid-morning on March 14. If this hadn’t happened, we could have probably limped along for an extended period, not being very comfortable. As it was, our interior house temperature got down to 55 degrees Fahrenheit after 16 hours with no heat. This was while it was 18 degrees Fahrenheit outside. Our house is very well insulated, with R-54 in the ceiling and about R-40 in the walls. We were losing about one degree per hour, which wasn’t too bad.
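To get a rough feel for how long we could have held out, here is a simple linear extrapolation in Python using the one-degree-per-hour loss I observed. The function name and the 40 degree threshold are hypothetical, chosen only for illustration. A real house cools more slowly as the indoor temperature approaches the outdoor temperature, so assuming the outdoor temperature holds steady, a straight-line estimate like this probably errs on the pessimistic side.

```python
def hours_until(temp_now_f: float, threshold_f: float, loss_f_per_hour: float = 1.0) -> float:
    """Hours until the indoor temperature falls to threshold_f at a constant loss rate."""
    if loss_f_per_hour <= 0:
        raise ValueError("loss_f_per_hour must be positive")
    return max(0.0, (temp_now_f - threshold_f) / loss_f_per_hour)


if __name__ == "__main__":
    # Indoor temperature implied at the start of the 16-hour outage.
    print(f"Implied starting temperature: {55 + 16 * 1.0:.0f} F")
    # Hypothetical threshold of 40 F, for illustration only.
    print(f"Hours from 55 F down to 40 F: {hours_until(55, 40):.0f}")
```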

My Tesla Model 3 was sitting in the garage with a fully charged battery. Over time, it would lose its charge since it automatically keeps the battery pack warm enough to protect the batteries. When the house got too cold to bear, we could have slept in the Tesla with the seat heaters on. If I had a long enough heavy-gauge extension cord (which I don’t), I could have also slowly charged the Tesla from the SPS outlet during daylight hours. Officially, you are NOT supposed to use any type of extension cord to charge a Tesla.

 

Lessons Learned

Well, I don’t have enough energy storage to deal with an extended outage, especially during a blizzard in the winter. I knew that, but this experience reinforced that knowledge. I can keep a small amount of electronics going during the night, and recharge my storage during the day. I also have enough power generation during the day to do a little bit of electrical heating. But that isn’t enough to stay very comfortable.

On the other hand, my current setup can deal very well with short electrical outages. All of the UPS units are mainly there for surge protection and line conditioning. The battery capacity is just a small bonus. So what can I do about this?


Long Term Solutions

The cheapest solution would be a gasoline-powered generator, hooked up to a transfer switch. But then I would have to be storing gasoline, and rotating it to keep it from going bad. Plus, I would probably only have a few gallons on hand. I could also siphon gasoline from my wife’s car, but her tank is only about 13 gallons.

Another solution would be a natural gas-powered generator from somebody like Generac. This would work as long as you had natural gas.

Figure 2: Generac Guardian

 

The high-tech, expensive solution is a couple of Tesla Powerwall 2 units. These are 13.5 kWh battery packs with their own inverters. You can use these with or without solar PV. If you have solar PV, it will charge the Powerwall(s) first, before going back to the public grid. When the public grid goes down, you automatically pull energy from the Powerwall until it is exhausted. If you have solar PV, you will run off of that during the day, and then draw from the Powerwall during the night. This would be the most environmentally friendly solution, which is important to me.

 

Figure 3: Tesla Powerwall 2

 

Figure 4 shows how this works.

Figure 4: Solar PV and Powerwall

 

The Tesla app that I already use for my car also lets you monitor your solar system and your Powerwall units.

Figure 5: Tesla App for Powerwall

 

Conclusion

This was an interesting, if slightly uncomfortable, experience. It was short enough that we did not get truly cold, and we did not lose the food in our refrigerator and freezer. It proved that my limited preparation so far can handle relatively short outages but not longer outages. This is especially true during a winter blizzard, which limits your solar production during the day. I also got some real-world experience with how the SPS circuit works.

I am going to make a few small changes and improvements to my current setup, and then decide what I want to do for a much better setup that can handle a longer grid outage. Just like with SQL Server, going through an actual outage teaches you how resilient your system actually is. It also shows where you have room for improvement.

 

 

 

A SQL Server Hardware Tidbit a Day – Day 19

For Day 19 of this series, I am going to talk a little about RAID, which stands for Redundant Array of Independent Disks or Redundant Array of Inexpensive Disks, depending on who you believe.

RAID is a technology that allows the use of multiple hard drives, combined in various ways, to improve redundancy, availability and performance, depending on the RAID level used. When a RAID array is presented to a host in Windows, it is called a logical drive. Using RAID, the data is distributed across multiple disks in order to:

    • Overcome the I/O bottleneck of a single disk
    • Get protection from data loss through the redundant storage of data on multiple disks
    • Avoid any one hard drive being a single point of failure
    • Manage multiple drives more effectively

Regardless of whether you are using traditional magnetic hard drive storage or newer solid state storage technology, most database servers will employ some sort of RAID technology. RAID improves redundancy, improves performance, and makes it possible to have larger logical drives. RAID is used for both OLTP and DW workloads. Having more spindles in a RAID array helps both IOPS and throughput, although ultimately throughput can be limited by a RAID controller, HBA, NIC, or the PCI-E slot that is being used.

Keep in mind that while RAID does provide redundancy in your data storage, it is not a substitute for an effective backup strategy or a high availability/disaster recovery (HA/DR) strategy. Regardless of what level of RAID you use in your storage subsystem, you still need to run SQL Server full, differential, and log backups as necessary to meet your recovery point objective (RPO) and recovery time objective (RTO) goals.

There are a number of commercially-available RAID configurations, which I’ll review over the coming sections, and each has associated costs and benefits. When considering which level of RAID to use for different SQL Server components, you have to carefully consider your workload characteristics, keeping in mind your hardware budget. If cost is no object, I am going to want RAID 10 for everything, i.e. data files, log file, and tempdb. If my data is relatively static, I may be able to use RAID 5 for my data files. It is also fairly common to use RAID 5 for SQL Server backup files.

During the discussion, I will assume that you have a basic knowledge of how RAID works, and what the basic concepts of striping, mirroring, and parity mean.

RAID 0 (disk striping with no parity)

RAID 0 simply stripes data across multiple physical disks. This allows reads and writes to happen simultaneously across all of the striped disks, offering improved read and write performance compared to a single disk. However, it provides no redundancy whatsoever. If any disk in a RAID 0 array fails, the array is off-line and all of the data in the array is lost. This is actually more likely to happen than if you only have a single disk, since the probability that at least one disk in the array fails goes up as you add more disks. There is no disk space loss for storing parity data (since there is no parity data with RAID 0), but I don’t recommend that you use RAID 0 for database use, unless you enjoy updating your resume! RAID 0 is often used by serious computer gaming enthusiasts in order to reduce the time it takes to load portions of their favorite games. They do not keep any important data on their “gaming rigs”, so they are not that concerned about losing one of their drives. Even this usage is declining over time as SSDs become more affordable.
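To put a number on why more disks means more risk with RAID 0, here is a minimal Python sketch. It assumes that disk failures are independent and that every disk has the same failure probability over some period. The 3% figure is a hypothetical value for illustration, not a measured failure rate.

```python
def prob_any_disk_fails(n_disks: int, p_single: float) -> float:
    """Probability that at least one of n independent disks fails."""
    if n_disks < 1:
        raise ValueError("n_disks must be at least 1")
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be between 0 and 1")
    # The array survives only if every single disk survives.
    return 1.0 - (1.0 - p_single) ** n_disks


if __name__ == "__main__":
    # Hypothetical 3% chance that any one disk fails during the period.
    for n in (1, 2, 4, 8):
        print(f"{n} disk(s): {prob_any_disk_fails(n, 0.03):.1%} chance of losing the RAID 0 array")
```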

RAID 1 (disk mirroring or duplexing)

You need at least two physical disks for RAID 1. Your data is mirrored between the two disks, i.e. the data on one disk is an exact mirror of that on the other disk. This provides redundancy, since you can lose one side of the mirror without the array going off-line and without any data loss, but at the cost of losing 50% of your space to the mirroring overhead. RAID 1 can improve read performance, but can hurt write performance in some cases, since the data has to be written twice.

On a database server, it is very common to install the Windows Server operating system on two of the internal drives, configured in a RAID 1 array, and using an embedded internal RAID controller on the motherboard. In the case of a non-clustered database server, it is also common to install the SQL Server binaries on the same two drive RAID 1 array as the operating system. This provides basic redundancy for both the operating system and the SQL Server binaries. If one of the drives in the RAID 1 array fails, you will not have any data loss or down-time. You will need to replace the failed drive and rebuild the mirror, but this is a pretty painless operation, especially compared to reinstalling the operating system and SQL Server!

RAID 5 (striping with parity)

RAID 5 is probably the most commonly-used RAID level, for both general file server systems and for SQL Server. RAID 5 requires at least three physical disks. The data, and calculated parity information, is striped across the physical disks by the RAID controller. This provides redundancy because if one of the disks goes down, then the missing data from that disk can be reconstructed from the parity information on the other disks. Also, rather than losing 50% of your storage in order to achieve redundancy, as with disk mirroring, you only lose 1/N of your disk space (where N equals the number of disks in the RAID 5 array) for storing the parity information. For example, if you had six disks in a RAID 5 array, you would lose 1/6th of your space for the parity information. As you add more disks to a RAID 5 array, the chances of losing at least one of the disks go up (due to simple statistics), so that is a reliability consideration for larger arrays.
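The idea of reconstructing a missing disk from parity can be illustrated with XOR, which is the usual basis for RAID 5 parity. The following Python sketch handles a single stripe with made-up data blocks. It is a toy illustration, not how any particular RAID controller is implemented, and it ignores the fact that real RAID 5 rotates the parity block across the disks from stripe to stripe.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe spread across three data disks (made-up contents).
data = [b"SQL1", b"SQL2", b"SQL3"]

# The parity block is the XOR of all of the data blocks in the stripe.
parity = xor_blocks(data)

# Simulate losing the second disk: XOR the survivors with the parity block.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

if __name__ == "__main__":
    print("Rebuilt block matches the lost block:", rebuilt == data[1])
```

Every read of the missing block requires reading all of the surviving blocks in the stripe, which is a big part of why a degraded array is so much slower.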

However, you will notice a very significant decrease in performance while you are missing a disk in a RAID 5 array, since the RAID controller has to work pretty hard to reconstruct the missing data. Furthermore, if you lose a second drive in your RAID 5 array, the array will go offline, and all of the data will be lost. As such, if you lose one drive, you need to make sure to replace the failed drive as soon as possible. RAID 6 stores more parity information than RAID 5, at the cost of an additional disk devoted to parity information, so you can survive losing a second disk in a RAID 6 array.

Finally, there is a write performance penalty with RAID 5, since there is overhead to write the data, and then to calculate and write the parity information. As such, RAID 5 is usually not a good choice for transaction log drives, where we need very high write performance. I would also not want to use RAID 5 for data files where I am changing more than 10% of the data each day. One good candidate for RAID 5 is your SQL Server backup files. You can still get pretty good backup performance with RAID 5 volumes, especially if you use backup compression and striped backups.
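A common rule of thumb for estimating this write overhead is to assign each RAID level a "write penalty", meaning the number of physical I/Os needed for one small logical write. The penalty values below (1 for RAID 0, 2 for RAID 1 and RAID 10, 4 for RAID 5, 6 for RAID 6) are widely quoted textbook figures rather than anything measured for this article, and the disk count and per-disk IOPS in the example are hypothetical. Controller write caching can change the real numbers considerably, so treat this Python sketch as a rough planning aid only.

```python
# Commonly quoted physical I/Os per small logical write (rule-of-thumb values).
WRITE_PENALTY = {"RAID 0": 1, "RAID 1": 2, "RAID 10": 2, "RAID 5": 4, "RAID 6": 6}


def effective_iops(level: str, disks: int, iops_per_disk: float, write_fraction: float) -> float:
    """Estimate the logical IOPS an array can deliver for a given read/write mix."""
    if not 0.0 <= write_fraction <= 1.0:
        raise ValueError("write_fraction must be between 0 and 1")
    raw_iops = disks * iops_per_disk
    penalty = WRITE_PENALTY[level]
    # Each read costs one physical I/O, each write costs `penalty` physical I/Os.
    return raw_iops / ((1.0 - write_fraction) + write_fraction * penalty)


if __name__ == "__main__":
    # Hypothetical array: eight disks at 150 IOPS each, 50% writes.
    for level in ("RAID 5", "RAID 10"):
        print(f"{level}: about {effective_iops(level, 8, 150, 0.5):.0f} IOPS")
```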

RAID 10 and RAID 0+1

When you need the best possible write performance, you should consider either RAID 0+1 or, preferably, RAID 10. These two RAID levels both involve mirroring (so there is a 50% mirroring overhead) and striping but differ in the details in how it is done in each case.

In RAID 10 (striped set of mirrors), the data is first mirrored and then striped. In this configuration, it is possible to survive the loss of multiple drives in the array (as long as no mirrored pair loses both of its drives), while still leaving the system operational. Since RAID 10 is more fault tolerant than RAID 0+1, it is preferred for database usage.

In RAID 0+1 (mirrored pair of stripes) the data is first striped, and then mirrored. Losing a single drive takes its entire stripe set off-line, so this configuration cannot handle the loss of a drive on each side of the mirror.
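The difference in fault tolerance is easier to see by brute force. This Python sketch enumerates every possible combination of failed drives and counts how many of them each layout survives. The drive numbering and function names are my own assumptions for illustration: in the RAID 10 model, drives (0,1), (2,3), and so on are mirrored pairs, and in the RAID 0+1 model the first half of the drives is one stripe set and the second half is its mirror.

```python
from itertools import combinations


def raid10_survives(failed: set, n_drives: int) -> bool:
    """RAID 10 survives unless both drives of some mirrored pair have failed."""
    return all(not {a, a + 1} <= failed for a in range(0, n_drives, 2))


def raid01_survives(failed: set, n_drives: int) -> bool:
    """RAID 0+1 survives only if at least one stripe set has no failed drives."""
    half = n_drives // 2
    side_a_ok = not any(d < half for d in failed)
    side_b_ok = not any(d >= half for d in failed)
    return side_a_ok or side_b_ok


def survival_rate(survives, n_drives: int, n_failures: int) -> float:
    """Fraction of all possible failure combinations that the layout survives."""
    cases = [set(c) for c in combinations(range(n_drives), n_failures)]
    return sum(survives(c, n_drives) for c in cases) / len(cases)


if __name__ == "__main__":
    # Eight drives with two simultaneous failures.
    print(f"RAID 10:  {survival_rate(raid10_survives, 8, 2):.1%} of cases survive")
    print(f"RAID 0+1: {survival_rate(raid01_survives, 8, 2):.1%} of cases survive")
```

Both layouts survive any single drive failure. The gap only shows up once a second drive fails before the first one has been replaced and rebuilt.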

RAID 10 and RAID 0+1 offer the highest read/write performance, but incur a roughly 100% storage cost penalty, which is why they are sometimes called “rich man’s RAID”. These RAID levels are most often used for OLTP workloads, for both data files and transaction log files. As a SQL Server database professional, you should always try to use RAID 10 if you have the hardware and budget to support it. On the other hand, if your data is less volatile, you may be able to get perfectly acceptable performance using RAID 5 for your data files. By “less volatile”, I mean if less than 10% of your data changes per day, then you may still get acceptable performance from RAID 5 for your data file(s).
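To compare the storage cost of the RAID levels discussed above, here is a small Python sketch that computes usable capacity. The function name is hypothetical, it assumes all disks in the array are the same size, and it treats RAID 1 as a simple two-disk mirror. The disk size in the example is made up for illustration.

```python
def usable_capacity_gb(level: str, disks: int, disk_size_gb: int) -> int:
    """Usable capacity of an array of identical disks for a given RAID level."""
    if level == "RAID 0":
        if disks < 2:
            raise ValueError("RAID 0 needs at least two disks")
        return disks * disk_size_gb
    if level == "RAID 1":
        if disks != 2:
            raise ValueError("this sketch models RAID 1 as a two-disk mirror")
        return disk_size_gb
    if level == "RAID 5":
        if disks < 3:
            raise ValueError("RAID 5 needs at least three disks")
        return (disks - 1) * disk_size_gb  # one disk's worth of parity
    if level == "RAID 6":
        if disks < 4:
            raise ValueError("RAID 6 needs at least four disks")
        return (disks - 2) * disk_size_gb  # two disks' worth of parity
    if level == "RAID 10":
        if disks < 4 or disks % 2:
            raise ValueError("RAID 10 needs an even number of disks, at least four")
        return (disks // 2) * disk_size_gb  # half of the disks hold mirror copies
    raise ValueError(f"unknown RAID level: {level}")


if __name__ == "__main__":
    # Hypothetical array of six 600GB disks.
    for level in ("RAID 0", "RAID 5", "RAID 6", "RAID 10"):
        print(f"{level}: {usable_capacity_gb(level, 6, 600)} GB usable")
```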

A SQL Server Hardware Tidbit a Day – Day 11

For Day 11 of this series, I am going to talk about some of the basic things that you should consider from a hardware perspective when you are trying to increase the basic resiliency and availability of an individual database server. These are some of the first steps you would take as part of designing a high availability solution for your data tier.

The basic principle here is to try to eliminate as many single points of failure as possible at the hardware and configuration level. I believe you should do these things regardless of what other high availability techniques you decide to use. When you are choosing components for a database server (as opposed to a web server, for example), here are some basic things to include:

  1. Two internal drives in a RAID 1 configuration for the operating system and SQL Server binaries. These drives should be using the embedded hardware RAID controller that is available on most new rack mounted servers. I try to get at least 146GB, 15K 2.5” drives for this purpose. Using 15K drives will help Windows Server boot a little faster, and will help SQL Server load a little faster when the service first starts up. Using 146GB (or larger) drives will give you more room to accommodate things like SQL Server error log files, dump files, etc., without being worried about drive space. Another increasingly viable alternative is to use two of the newer, entry-level data center SSDs, such as the 200GB Intel DC S3700 in a RAID 1 configuration to get even better performance and reliability for your system drive.
  2. Use dual power supplies for the server, each plugged into separate circuits in your server room or data center. The server should also be plugged into an Uninterruptible Power Supply (UPS) on each circuit, and ideally have a backup power source, such as a diesel generator for your data center. The idea here is to protect against an internal power supply failure, a cord being kicked out of a plug, a circuit breaker tripping, or loss of electrical power from the utility grid.
  3. You should have multiple network ports in the server, with Ethernet connections into at least two different network switches. These network switches should be plugged into different electrical circuits in your data center. Most new rack mounted servers have at least four gigabit Ethernet ports embedded on the motherboard.
  4. You should have multiple RAID controller cards (if you are using Direct Attached Storage), multiple Host Bus Adapters (HBAs) if you are using a Fibre Channel SAN, or multiple PCI-e Gigabit (or better) Ethernet cards with an iSCSI SAN. This will give you better redundancy and better throughput, depending on your configuration.
  5. Wherever your SQL Server data files, log files, tempdb files, and SQL Server backup files are located, they should be protected by an appropriate RAID level, depending on your budget and performance needs. We want to keep our databases from going down due to the loss of a single drive. One thing to keep in mind is that RAID is not a substitute for an appropriate SQL Server backup and restore strategy! Never, never, never let anyone, whether it is a SAN vendor, a server admin from your Operations team, or your boss, talk you into not doing SQL Server backups as appropriate for your Recovery Point Objective (RPO) and Recovery Time Objective (RTO) metrics. I cannot emphasize this point enough! There is absolutely no substitute for having viable SQL Server backup files.

Despite this fact, you will undoubtedly be pressured multiple times in your career, by different people, into not running SQL Server database backups for one reason or another. You really need to stand your ground and not give in to this pressure. There is an old saying: “If you don’t have backups, you don’t have a database”.

I also want to note one configuration setting I like to use for database servers, to reduce their boot and SQL Server startup time. For a standalone database server, reducing your total reboot time has a direct effect on your high availability numbers. I always go into the BIOS setup for the server, and disable the memory testing that normally occurs during the POST sequence.

This will shave a significant amount of time off of the POST sequence (often many minutes), so the server will boot faster. I think this is pretty low risk, since this testing only occurs during the POST sequence. It has nothing to do with detecting a memory problem while the server is running later (which is the job of your hardware monitoring software). I am sure some people may disagree with this setting, so I would love to hear your opinions.
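To show how reboot time feeds into availability numbers for a standalone server, here is a rough Python sketch. The function names are made up, and the reboot counts and durations in the example are hypothetical values, not measurements from any particular server.

```python
MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_budget_minutes(availability_pct: float) -> float:
    """Minutes of downtime per year allowed by an availability target."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)


def availability_pct(reboots_per_year: int, minutes_per_reboot: float) -> float:
    """Availability achieved if reboots are the only source of downtime."""
    downtime = reboots_per_year * minutes_per_reboot
    return 100 * (1 - downtime / MINUTES_PER_YEAR)


if __name__ == "__main__":
    print(f"99.9% availability allows about {downtime_budget_minutes(99.9):.0f} minutes of downtime per year")
    # Hypothetical: monthly patching reboots taking 10 minutes versus 5 minutes each.
    for minutes in (10, 5):
        print(f"12 reboots at {minutes} minutes each: {availability_pct(12, minutes):.4f}% availability")
```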