In this episode of the State of Enterprise IT Security podcast, Brad Bussie, e360 CISO, is joined by Erin Carpenter, e360 Marketing Director, to explore the recent critical issues faced by CrowdStrike due to a defective software update. They examine the incident's widespread impact on Windows systems, discuss preventive measures organizations could take to avoid similar problems, and consider the broader implications for cybersecurity and organizational resilience. The episode highlights the importance of robust testing processes and diversified IT environments in maintaining system stability and security.
Listen to the Episode:
Episode 27: From Updates to Outages: A Deep Dive into CrowdStrike's Recent Issue
[00:00:45] Hey everybody! Brad Bussie, Chief Information Security Officer here at e360. Thank you for joining me for the State of Enterprise IT Security Edition. Today, we're going to change it up a little bit, and we're going to be talking about CrowdStrike.
[00:01:01] And I'm sure all of you have had some kind of interface with this issue that CrowdStrike had, and we're going to dive into it a little bit. I'm pretty excited because joining me today is Erin Carpenter. She is our Senior Director of Marketing, and what you all may not know is she's the one that makes the show look awesome and helps with a lot of the content development, as well as just making sure that you're getting a consistent experience.
[00:01:37] So this is exciting for me. It's always good when I'm not here just by myself. So Erin, thank you for joining me today.
[00:01:45] Erin Carpenter: Thank you so much for having me, Brad. It's a pleasure to be in front of the camera this time. And on that note about consistent experience, we are providing an inconsistent experience today. But, you [00:02:00] know, truthfully, it's actually exciting to be in front of the camera for this particular episode because, while I'm not a CISO and I'm not an IT leader, unless you're living under a rock or completely off the grid,
[00:02:15] you're hearing about CrowdStrike. We're hearing about Microsoft. We're hearing about the blue screen of death, and it's a hot topic. So even I have a range of questions, and you and I are going to discuss that today. I mean, you and I go off on tangents offline, and I'm asking you these technical questions because I'm so curious, and I know so many other people who are in my shoes are as well. So let's dive in. I want to hear what you have to say about this.
[00:02:46] Brad Bussie: Yeah. Honestly, I think I'll start with the first piece of this, and this is something that CrowdStrike has gone out and been very clear on: this was not a cyber attack.
[00:03:03] It was a software update that proved to have a defect that really impacted most Windows systems, and this impacted not just a laptop or a PC. There were some servers that were impacted, and it also impacted cloud systems. So if there was a cloud instance that was running the Windows operating system, it blue screened
[00:03:34] as well. So it is interesting how far reaching this impact was, and I'm sure by now many of you have heard that the Mac operating system was immune, and Linux and Unix were immune. It was mainly Windows machines that were running CrowdStrike and that were turned on during the [00:04:00] update period.
[00:04:01] So this was like Thursday night, and it's interesting because there were some Windows systems that were actually not impacted by this, and I started looking into why. And you know, by now, I'm pretty, I'm not going to say maniacal, but I like to make sure that things are updated. And it turns out that the Windows systems that were not impacted were ones that weren't patched.
[00:04:29] They were running an older version of Windows, whether it was, you know, a month or two back or whatever. Those were not impacted. So, fun fact, some of us woke up to a blue screen of death. Others woke up and were like, what's everybody talking about? My system's working fine.
[00:04:49] Erin Carpenter: Hey, by the way, go figure.
[00:04:51] Don't use that as an excuse to not update your systems. Right?
[00:04:54] Brad Bussie: Right. Exactly. Yeah. And I think we'll touch on that today, too, because that kind of leads to what we are always talking about: time is of the essence when it comes to cyber security. And if there's a zero-day threat or a zero-day attack,
[00:05:13] sometimes the only protection that you have is when it is actually patched. Whatever piece of software is vulnerable, it gets patched, we push it out, and that system is now safe. So we're definitely not condoning not updating things.
[00:05:33] Erin Carpenter: That's right. That's right. Well, thank you for the quick recap. Really appreciate that.
[00:05:37] So I'm curious. What could we have done differently? You and I have talked in the past, and there are so many different perspectives to discuss around this: a leader perspective, from an end user, from a perspective of... well, just go ahead. What could we have done differently?
[00:05:57] Brad Bussie: So I'm going to take this two different [00:06:00] ways.
[00:06:00] First of all, having been a product manager, I have built software, built security software, and I've seen this type of issue happen before. We write something, we push it to a system, and the system doesn't like it for whatever reason. That system ends up crashing, and then we have to roll back what we did
[00:06:26] and try again. So normally this type of, I'm going to call it a defect in an update, is caught during a QA process. What that means is, and I'm going to use a fake number, we have like 20 different systems. Some of them are Linux. Some of them are Mac. The majority of them are Windows PCs.
[00:06:51] Now, they're supposed to be running a sample of real-world patch levels and real-world software installs, because you're trying to catch anything that could potentially impact your user base. So, generally, these types of problems are caught in QA. It gets rolled back, we try again, and then we push out to the QA machines again and make sure everything works.
[00:07:23] Now, I think we still need some more time to hear from CrowdStrike exactly what happened and why this particular defect wasn't caught before it got rolled out to everyone. So the second piece of this is looking at it from an IT perspective. I'm taking off my CISO hat, and I'm putting on, like, my origin story, which was, I was in IT before I was in cyber.
[00:07:56] And I was patching machines, I was doing all those things, I was a help desk [00:08:00] guy, I was running around crazy. But one of the things that I always made sure of is that we didn't have automatic updates turned on in production. I know that's challenging for cyber security professionals, because we want to get that instant update to make sure that we are securing systems
[00:08:21] and making them less vulnerable. In this instance, I think it shined a light on the fact that we need to have a process where, even for security updates, we test them in a development environment. We then move on to what I would call a staging environment, which is maybe a small subset of my organization, maybe 20, maybe 50 different machines of different operating systems and patch levels. They get the update. We see how that goes for a day, and then we push it to production.
[00:08:56] Obviously, if there's a very impactful zero day, we need a process where we can push it all at once and, you know, kind of cross our fingers and hope we don't have a blue screen problem.
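The dev, staging, production flow described above, with a break-glass path for a critical zero-day, can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the ring names, the soak times, `RingResult`, and `next_ring` are all hypothetical and are not part of CrowdStrike's or Microsoft's tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Ring names and soak times are illustrative assumptions, not vendor settings.
RINGS = ["dev", "staging", "production"]
SOAK = {"dev": timedelta(hours=4), "staging": timedelta(days=1)}


@dataclass
class RingResult:
    ring: str               # which ring received the update
    deployed_at: datetime   # when it was pushed to that ring
    crashed_hosts: int = 0  # hosts that blue screened after the push


def next_ring(history: list[RingResult], now: datetime,
              break_glass: bool = False) -> Optional[str]:
    """Return the next ring to deploy to, or None to hold or stop."""
    if any(r.crashed_hosts > 0 for r in history):
        return None  # any crash in an earlier ring halts the rollout
    if any(r.ring == "production" for r in history):
        return None  # rollout already complete
    if break_glass:
        return "production"  # critical zero-day: skip the remaining soak time
    if not history:
        return RINGS[0]
    last = history[-1]
    if now - last.deployed_at >= SOAK[last.ring]:
        return RINGS[RINGS.index(last.ring) + 1]
    return None  # still soaking in the current ring


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 9, 0)  # arbitrary example timestamp
    history = [RingResult("dev", start)]
    print(next_ring(history, start + timedelta(hours=5)))
```

The design choice worth noting is that the crash check runs before the break-glass check, so even an emergency push stops if an earlier ring has already blue screened.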
[00:09:08] But for this one in particular, if I'm looking at what we could have done differently, this is a bit challenging because CrowdStrike, and this comes from George, the CEO himself, said they have not changed their deployment process since the beginning. The way that they update the Falcon sensor is the same way they've always updated it since day one.
[00:09:33] I think what we've experienced is we've hit a critical mass of how many organizations are running CrowdStrike. And I think what it comes back to is, I've even said this, no one ever got fired for buying CrowdStrike, and that has hit us globally, because CrowdStrike's got a good product, they've got a good sales team, and they've acquired [00:10:00] and built a platform.
[00:10:02] So it does a lot more than what they first started doing with Falcon back in the day. So what could we have done differently? Hindsight being 20/20, only a couple of things. But you have to look at it as kind of unprecedented. Nothing like this on this scale has happened before, even with ransomware attacks, even with some of the, you know, more famous breaches,
[00:10:31] hospital system taken offline. There were millions and millions of devices taken offline at the same exact moment. And honestly, I think that shines a light on the fact that we've gotten, I'll say as a culture, a little bit lazy in the way that we are patching and doing software rollouts. And I think because it's security software, we're like, let's keep this up to date in the moment.
[00:11:04] And I have a strong suspicion that this is going to be different going forward for a lot of organizations.
[00:11:11] Erin Carpenter: Yeah, Brad, I was reading a bunch of Reddit threads, and some of them were comparing this to the likes of the Y2K that didn't happen, also saying, oh, this is an extinction-level event for CrowdStrike.
[00:11:29] Gosh, I hope, I really hope not. What are some of the other commentaries? What are the broader implications, though, for the other solutions, the other platforms that we have in our systems? I mean, we have an over reliance on big name platforms and big name solutions.
[00:11:52] This could happen. I mean, you've said it to me. This could happen with any cybersecurity platform with any [00:12:00] other systems as well.
[00:12:01] Brad Bussie: It could. And I think this goes back to the concept of cyber resiliency and just resiliency in a system overall. And I think what we proved is that as a whole, we are not resilient when it comes to our systems.
[00:12:27] And what I mean by that is, if I were going to look at this backwards and I were going to say, how do I make sure that I can survive as a company, as a business, as a team, I need to build in some redundancies and differences within my [environment]. So, maybe half of my team should be running Macs. Maybe the other half should be running PCs.
[00:12:57] Maybe we should have a split in our cyber security, I'll call it platforms. Now, the challenge with this is it introduces complexity. And from a cyber perspective, complexity is, is how most organizations will end up getting breached because it's just too challenging with the size of teams that we have, honestly, the size of teams that we can afford.
[00:13:28] And the talent that's actually out there. So trying to do these things that I'm suggesting are a little bit challenging, but if I want to prevent something like this from happening, I would focus more on like critical infrastructure, like things that can't go down, things that shouldn't have happened that did, you know, when you look at an airplane, how many redundancies are built into that airplane?
[00:13:56] Sometimes for systems, there's like six different redundancies. If [00:14:00] one fails, the second picks up. If that fails, there's a third piece that picks up. So I'm always looking at this as, should we design our systems like an airplane, where there's six different things that have to go wrong before the thing has an actual problem?
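To put rough numbers on the airplane analogy, here is a small worked equation. It assumes each of the $n$ redundant components fails independently with probability $p$; both symbols and the example values are illustrative assumptions, not figures from the episode. Under that assumption the whole system fails only when every component fails:

```latex
P(\text{system fails}) = p^{n},
\qquad \text{e.g. } p = 0.01,\; n = 6
\;\Rightarrow\; P = \left(10^{-2}\right)^{6} = 10^{-12}
```

The independence assumption carries the whole argument. If every machine runs the same operating system and the same agent and takes the same update at the same moment, the failures are correlated and the benefit largely disappears, which is the case for mixing platforms made earlier in the conversation.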
[00:14:17] But then I look at how expensive that would be. And that's, I think, why we had the significant impact that we did, because we are relying on, I'd say, a subset of very large platforms because it's become more cost effective. And that's really, I think, where all of this is narrowing down to:
[00:14:40] affording a resilient solution is probably going to be out of the reach of most organizations. Are there ways? Sure. I don't think we need to talk too much about that today, but I think we are a bit over reliant on too few of these platforms. And I think what you're going to see coming out of this, and this may actually be like a government mandate.
[00:15:09] And I'm trying to wrap my brain around what the implications would actually look like if, let's just say, the federal government steps in and says only a certain number of critical infrastructure firms or departments can have CrowdStrike. The rest need to have something else, or only half can have CrowdStrike.
[00:15:34] The other half of the same agency or department have to have something else: SentinelOne, you know, Microsoft Defender, Palo, like whatever. And that way we make sure that if there's a failure in one, not everything goes down at once. That's a potential. We'll just kind of have to see. I think it's still kind of early, and, you know, I've watched a lot of podcasts where people are [00:16:00] talking about this.
[00:16:00] I've seen some pretty shady stuff from some of the CrowdStrike competitors that are like, our infrastructure is better. The way we design things is better. Come, come on home. And, and that's just like, that's a low blow and you know, not a lot of class right now. Let's just say.
[00:16:18] Erin Carpenter: Yeah, yeah, this is not the time and we all know that it could happen to anyone.
[00:16:23] Yeah, it could happen to anyone. You know, essentially, we didn't prepare for this, but your analogy with airplanes made me think, because I did use to fly, and I remember there's a reason why I never flew an airplane with one engine, and there's a reason why there's always a second pilot, for the human redundancy as well.
[00:16:45] Could this add an incentive for these companies to have more failover technology of some sort? So forgive my ignorance in that question, but I'm just thinking, are there other failover-type technologies that the security firms could put in place so that if something like that happens?
[00:17:07] Brad Bussie: Yeah. This one in particular, I don't think there's anything that CrowdStrike could have built in to prevent or to help this one, unfortunately, because what ends up happening is when a system starts to blue screen, it takes a person behind the keyboard to remediate it, we'll say 9 out of 10 times.
[00:17:32] And if you read, there were some reports that, hey, if you reboot your system 15 times because of this blue screen, it'll come back. Maybe that worked for a handful, but for most, that did absolutely nothing. And I think what this shed light on is there's one very important thing that, as cyber professionals, has kind of been ingrained in us, which is we need to [00:18:00] protect the operating system and we need to protect that hard drive or that image.
[00:18:07] So what do we do? We encrypt it. And what a lot of people aren't talking about is that is what made this so incredibly complicated, because I couldn't just call you, Erin. I couldn't just say, hey, I want you to hit the button 20 times. It'll drop you into a command prompt. And then I want you to enter this command and then reboot and you're back up and running.
[00:18:33] That's pretty easy to do. Here's how it actually went, though. Hey, Erin, I have to find your BitLocker key because BitLocker is encrypting and protecting this hard drive and you can't even get to the command that is required to delete the file because it's protected in the operating system. So here's what happens.
[00:19:02] I go and thankfully our IT department is pretty awesome. And, we had all of our keys logged with the, the device and I just called Chris, our IT guy, and I said, Hey, do you happen to have my encryption key? Yeah, here you go. Boom. He emails it to me, granted, I was on my phone cause my laptop was blue screened. But I was able to take the key, type it in, run the command.
[00:19:29] I was back up and running in about five minutes. Unfortunately, a large group of companies have their devices blue screening. People called IT: what do I do? IT realizes they don't have the key. They don't have the encryption key anymore. They don't know what happened to it. Was it lost? Was it ever recorded in the first place?
[00:19:54] Did they use the right kind of image? That is unknown. [00:20:00] And the sad thing is, there's probably five different things to try. CrowdStrike's got a bunch of stuff right now, as far as tools that you can run. Microsoft has published a couple of things, but then they kind of point the finger and say, well, go to CrowdStrike.
[00:20:15] I've actually talked to a couple of different companies. They couldn't find their keys, and they are now re-imaging every single device that is blue screening because they cannot delete that file, because they cannot unlock that drive. So the ones that you're hearing about that are still down, they either have, you know, 100,000 endpoints with this problem and a small IT staff, or they are now in the middle of re-imaging hundreds, thousands, tens of thousands of devices.
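The missing-key problem described here is the kind of thing a periodic audit can catch before an outage. The sketch below is a minimal Python illustration, assuming the device inventory has already been exported as a mapping from hostname to escrowed recovery key. The function name `missing_recovery_keys`, the hostnames, and the placeholder key strings are all hypothetical; nothing here calls BitLocker or any real management API.

```python
from typing import Optional


def missing_recovery_keys(inventory: dict[str, Optional[str]]) -> list[str]:
    """Return hostnames of encrypted devices with no recovery key on file.

    `inventory` maps hostname -> escrowed recovery key. A value of None,
    an empty string, or whitespace counts as missing.
    """
    return sorted(
        host for host, key in inventory.items()
        if key is None or not key.strip()
    )


if __name__ == "__main__":
    # Hypothetical export from a device-management system.
    inventory = {
        "laptop-01": "PLACEHOLDER-KEY-A",
        "laptop-02": None,   # key never recorded
        "server-01": "   ",  # blank entry
    }
    for host in missing_recovery_keys(inventory):
        print(f"No recovery key escrowed for {host}")
```

Running a check like this on a schedule turns "was it ever recorded in the first place?" into a question that gets answered on a quiet day rather than in the middle of an incident.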
[00:20:55] I mean, I can't imagine the extra time and expense involved in doing that, not to mention the business interruption costs, which are significant.
[00:21:08] I've heard some crazy stuff. Like, a CISO friend of mine, he's keeping track of every person that's spending hours and all of the other hands that they're recruiting, and he's like, I'm gonna send that bill to CrowdStrike. We were kind of laughing because he's like, you know, that's probably not going to do anything. But then we started to realize there's probably going to be a class action lawsuit, at least in the United States, where I think a lot of companies are going to come together
[00:21:39] and I think CrowdStrike is going to be found, maybe not fully liable, because I think some of the things we talked about, there were ways that organizations could have kept this from completely doing like a business stoppage. For instance, like if a machine was [00:22:00] sleeping or was offline, when the update initially got pushed out that Thursday, CrowdStrike caught this error pretty quickly and pushed out the kind of the proper fix.
[00:22:12] And anything that booted up the next morning did not have the problem. So I think it's going to be somewhere in the middle. And I, and I believe that there's going to be something that happens. And I think Erin, you're going to ask me something that you and I were talking about too, about our, our friends across the pond, how that's going to potentially shape up.
[00:22:34] Erin Carpenter: Yeah, let's talk about that. So, GDPR. There was an article in Fast Company this morning about CrowdStrike having European-sized data problems on its hands. The intro said that while CrowdStrike already faces criticism for bricking eight and a half million PCs worldwide, data protection experts say it's possible the company also breached data protection laws.
[00:23:05] So this spills over into a whole other set of issues. Let's talk about that.
[00:23:10] Brad Bussie: It does. Okay. So I think when you look at this, you think, well, wait a second, GDPR, we're talking data privacy. We're talking about, you know, a breach. We're talking generally about the exposure, the theft, or the improper use of information. That doesn't fit this particular incident.
[00:23:30] But what I will say is if you zoom in on some of the verbiage in GDPR, one piece of it is availability: someone or some company impacting a user's ability to access their information. And I feel the way that they're framing this is the fact that systems were down. [00:24:00] You know, unable to access data, unable to log into things, you were blue screening, whatever. That could be a violation of availability per GDPR.
[00:24:12] And I think the letter of the law is CrowdStrike would have had to alert the EU and say something about the fact that there was a challenge here. And to our knowledge, that didn't happen. I think it remains to be seen what actually comes of this. But I think that's the angle that they're taking in the article: they're still reviewing if this is actually something that could go to court, but it's looking plausible.
[00:24:51] Erin Carpenter: Yeah. I mean, that's, that's quite a twist in the story. It's you know, that's, that's going to be interesting to see how that plays out.
[00:24:58] All right. So last topic on my mind. Well, actually we have a couple more, but let's say that, you know, I'm a security leader and all of a sudden I have a lot of pressure that I did not have before from the CEO, the board, people around the organization that say, you know, what do we do?
[00:25:24] Erin Carpenter: Are we going to continue using CrowdStrike? What are you doing to prevent this in the future? Do we need to move away for legal, security, PR reasons? What decision do you make, first of all, and how do you defend that decision should you decide to keep them as part of your solutions stack?
[00:25:45] Brad Bussie: Right. Boy, this is a can of worms. So I think it depends on the organization. I know a couple of organizations that were pretty much all Mac [00:26:00] OS and didn't even have a challenge. Their challenge came from the ripple effect, the third parties. For instance, I went onto Amazon. I wanted to buy something, and there was a banner up top that said, due to a third party IT outage or issue, your delivery may be delayed one to two days.
[00:26:25] And I saw this banner all over the place. So just because that organization that had all Macs wasn't impacted, didn't mean they weren't actually impacted down the supply chain. So I think what this exposes is an over reliance like, like I talked about. So if I'm an organization that is leveraging CrowdStrike, I have to, and I'm sure you could see it on my bookshelf, have a little bit of extreme ownership of this, of this particular, it might be over there, but this,
[00:27:00] Erin Carpenter: I love Jocko Willink.
[00:27:02] Yes. Let's hop to him. Great book.
[00:27:03] Brad Bussie: Yes. He's a good dude. So I will have to look at this and say, what could I do differently? And as a collaborative CISO, I've already spoken with our IT manager and leader over there, and I said, hey, let's create a process and procedure where we do just like we do with Windows updates: we apply it to some dev machines, then we apply it to a small subset of our user base in staging,
[00:27:33] then we push it to production. I'd like to do the same thing now with CrowdStrike. So we turn off the auto update. But we need a break-glass scenario and someone who is watching for critical updates that are zero-day related. For a while, I'm going to be a little hesitant to push all of that at once, but I'm going to [00:28:00] make sure we push it,
[00:28:02] we watch for a day and then maybe we push it to everything else. Also, for us. And this is what I would recommend to other organizations is updating your incident response policy and procedures, because I think a lot of us have this where we, you know, we don't have access to systems anymore or the cloud or the internet or whatever.
[00:28:29] But I don't think all of us were really prepared for: hey, I don't have access, but neither does the rest of my supply chain, and we still have to do business. So, like, what do we do? I think we're going to be rewriting some of this and bringing in more of the resiliencies that I talked about, where we have the ability to recover from something like this.
[00:29:01] And I'll even get a little more nerdy here and say, for instance, when I rebooted my machine and I was in safe mode, and I realized that I didn't have my BitLocker key, what I did have access to is restoring to a previous point in time. This is Windows functionality. But, I hate to say it, from a security perspective,
[00:29:24] I asked that that stuff be turned off, so somebody couldn't go back to a period of time and circumvent a security control, a patch, something being pushed to a machine. I didn't want somebody to get physical access to a device, put it in safe mode, and go to a previous restore point. I think some of that now needs to be balanced: what is the impact from a cyber perspective versus what is the impact on business operations?
[00:29:56] So I think, again, I said this, it's going to be somewhere in the middle. [00:30:00] Do we still allow a Windows device to have a restore point? I think we need to talk about that.
[00:30:09] So, if I'm being asked by the board, you know, should I keep CrowdStrike? I would say, let's go back to Monday of last week.
[00:30:24] CrowdStrike was generally considered the best endpoint security solution on the market. And I think that's why they've hit so hard on the fact that this was not a security event. It wasn't a breach. It was bad code that then became an IT issue, that then became an availability issue, which becomes a cyber issue.
[00:30:50] So I think you have to kind of look back and say, what's changed? And can I make a couple of changes in my organization that are going to help me, not just if there's another CrowdStrike event of pushing a bad update, but if somebody else ever does it? I mean, this has happened before for Microsoft too, where stuff was blue screening, just not on this scale.
[00:31:17] And I think that's where we keep going back to the scale of this is why it's so painful. So I think it's going to be a blend of some of this, I know that just for our organization, we're going to make a couple of changes, and it's really going to come down to each organization saying, you know, what's your risk tolerance, and I'm always a, when something happens once, let's all make some changes and let's be cautious.
[00:31:47] If it happens again, then it's time to do something completely different.
[00:31:50] Erin Carpenter: Fool me once, that's on you. Or fool me, what's the saying, fool me twice? Fool me once, yeah. [00:32:00] Yeah, yeah. That's wrong, actually. So, what I hear you say also, regarding the communication, is: let them know this could happen no matter what, and here's our plan, right?
[00:32:12] Here's our plan to address it in the future. And I almost wonder if it's kind of like that restaurant, you know, the restaurant that has a black swan health issue, right? Someone gets sick and there are a lot of PR issues around it. Well, guess which restaurant is probably the safest one to eat at after that?
[00:32:33] Brad Bussie: Right.
[00:32:33] Erin Carpenter: Yeah. It's probably that one, 'cause they've learned. So, you know, I guarantee you CrowdStrike, well, I shouldn't guarantee, never guarantee, but I bet that they are more prepared than any other security firm to handle something like this in the future. So they're...
[00:32:49] Brad Bussie: Oh yeah. And having, you know, been a product manager, I can say maybe they're going to reimagine and reinvent their update
[00:33:00] capabilities and their process, because they've been doing this the same way since day one of Falcon's inception. This is how they push updates. Maybe they're going to change it and make it different, so that it either can't blue screen, or the likelihood of introducing something that is that incompatible with an operating system,
[00:33:25] like, that won't happen again. Or you're going to see a huge improvement in their QA process. I don't know, there's so many different things that I think they're going to be doing. And if you look at some of the cybersecurity firms in the past that have gotten breached themselves, and I won't name names because I don't want to throw shade, but they have a program that they call Project Bedrock, where they went back to the beginning, they looked at every line of code, they made it secure by design, they got [00:34:00] rid of their third party call centers, they brought it all back in house, and they made some significant investments and changes. And the market does notice those things. I think in this instance, they need to go a little over in their change and in announcing what they're doing, because if it's just, you know, more of the, we're sorry, things will get back to normal, and then it's business as usual,
[00:34:34] that's going to be bad.
[00:34:35] Erin Carpenter: No, for sure. I bet their PR team is working overtime.
[00:34:41] Brad Bussie: A hundred percent. Yeah.
[00:34:42] Erin Carpenter: That's great. Well, that's all I have. Is there anything that I missed, Brad, that we should cover?
Brad Bussie: I don't think so, but as always, if people have questions or if they want to go deeper on a particular topic, they can reach out to us on LinkedIn or any of the other socials and, you know, I'll happily sit down, have a conversation. Erin and I both would love to sit down and have a deeper conversation.
[00:35:10] Erin Carpenter: Yeah, please do. LinkedIn, however you're listening to this, if you're watching it on YouTube, you're watching on LinkedIn...
[00:35:16] if you're listening on the podcast, just shoot us a comment. We'd love to hear. Well, thank you so much again. I really appreciate you bringing me on the show for this episode. It was a lot of fun. And until the next one, have a good one.
[00:35:28] Brad Bussie: Thanks everybody.