Tech Sessions - Ep. 8 Maximizing Efficiency: Cost Optimization through SQL Server Re-architecture on Google Cloud

Written by Jeff Dickman | Oct 22, 2024 10:23:31 PM

Overview:

In this episode of the Tech Sessions podcast, host Jeff Dickman and e360's Senior Director of DevOps and Architecture, Roy Douber, dive deep into the world of SQL Server re-architecture on Google Cloud. Discover how e360 tackled a massive SQL environment plan for a client to deliver $30 million in savings over five years. Learn about the challenges, solutions, and key strategies for optimizing SQL performance and cost efficiency. Perfect for IT leaders and cloud enthusiasts looking to transform their infrastructure and achieve operational excellence. Don't miss out on these expert insights and practical tips!

Key Topics Covered:

Initial challenges and state analysis of the client's SQL environment
Implementing failover clustering and automation for improved performance
Achieving significant cost savings and operational efficiencies

Key Takeaways:

The importance of a thorough financial analysis alongside performance testing in SQL re-architecture projects.
How failover clustering can reduce costs and improve disaster recovery capabilities.
The role of automation in minimizing operational toil and ensuring consistency.
Insights into selecting the right storage solutions and configurations for high-performance SQL services.
Pro tips for architects and engineers on testing, automation, and adaptability.

Read the Transcript:

Maximizing Efficiency: Cost Optimization through SQL Server Re-architecture on Google Cloud

[00:00:00] Jeff Dickman: Hello, everyone, and welcome to today's Tech Sessions podcast. I'm your guest host, Jeff Dickman, filling in for Kevin Kohn today. Today I'm with Roy Douber, who is our Senior Director of DevOps and Architecture at e360. We're going to be talking today about SQL migrations and opportunities for improvement with customers around cost efficiencies, technical upgrades, and things like that.

[00:00:34] And so, Roy, did you want to briefly introduce yourself and give a little bit of experience that you have within the cloud environment?

[00:00:41] Roy Douber: Yeah, so my name is Roy Douber. I've been here at e360 roughly two and a half years at this point. We've worked on projects in the data space, in the DevOps space, in the cloud space.

[00:00:55] This particular episode is around SQL. However, we tackle a lot of problems, and that's what I've prided myself on throughout my career: solving difficult problems. I also have a past as a site reliability engineering lead.

[00:01:12] Pleasure. Pleasure to be here.

[00:01:14] Jeff Dickman: So Roy, in this discussion that we're going to have today, we're going to be talking about taking an environment running a large number of smaller servers and how we scale that down for a customer to provide improved performance.

[00:01:30] Is that correct?

[00:01:31] Roy Douber: Yeah, absolutely.

[00:01:33] Jeff Dickman: All right, so let's get started then. Can you describe the initial challenges that we had with the customer and how we started to address those?

[00:01:43] Roy Douber: Yeah. So the client

[00:01:47] came to us with a particular challenge. First of all, they had a large footprint that was very difficult to manage.

[00:01:56] And they were struggling to just keep the lights on. [00:02:00] The environment was going in and out of sync. They were having a lot of operational toil within the environment, and they were looking at avenues to reduce that. So the first thing we did was look at not just their billing, but the state of the environment and how they service their customers.

[00:02:27] And we indexed that, including the financial picture of what this looks like. Obviously, whatever re-architecture needs to be done, for operational toil or not, needs to be financially sound. So we started with the financial picture of the existing current state, and then we did the full current-state analysis.

[00:02:54] We made sure we understood exactly how the client serves all of their clients. This is one of the largest SQL estates in the world in this specific sector. So we analyzed everything, came up with a picture of what it looks like, and presented that to the customer.

[00:03:20] The customer agreed: yes, this is what it looks like. This is how much we pay for licensing. This is how much we pay for Windows licensing. This is how much we pay for the boxes that are custom-sized to fit our clientele. From there, we gave them multiple options for potential architectures. Before we even began any engineering, we said, okay, here are some things we could do.

[00:03:49] Then we continuously chiseled down to the top three solutions, and then we went deeper. [00:04:00] The idea was to address those operational inefficiencies, to address the high costs, and ideally to take the number of servers down and service more customers within a cluster, which was important for them because they had some noisy-client issues with the smaller servers.

[00:04:28] Meaning one customer would go online and, oftentimes, they'd have to move customers around to fit into their SaaS better, because certain clients were just too big to be good neighbors within the multi-tenant environment.

[00:04:46] Jeff Dickman: Okay. So it sounds like they were playing Tetris quite a bit with their servers.

[00:04:50] Would you say that the issues this client was experiencing are pretty common for customers running SQL in the cloud?

[00:05:00] Roy Douber: I think so. Especially in a large multi-tenant environment where you're trying to optimize the database for each client. And any rising cost is directly passed on to the client, right?

[00:05:17] You don't want to raise your costs on a client continuously, so you're very sensitive to any of those types of changes. I would say it's very, very common to run into some of these noisy-neighbor situations and to be moving clients around. And what we've seen is that oftentimes the best way to address that is to have a larger moat, essentially.

[00:05:46] If you have a larger server, it mutes some of the spikes in usage. So a large client comes online and does some batch processing job [00:06:00] that they require from the SaaS, which is what they're trying to do within the platform. That's okay if you have a larger server that can handle that spike

[00:06:13] in traffic. So it creates a scenario where they can fit three times the clientele on a larger server, yet the cost is not three times.

[00:06:25] Jeff Dickman: So does moving to a bigger server mitigate the noisy neighbors?

[00:06:31] Roy Douber: Yeah, because you have more capacity to handle the traffic.

[00:06:34] You're kind of muting the response. Whereas initially you'd be serving, let's say, a hundred clients, you could serve 300 clients, and not all those clients come online at the exact same time. So your peaks are lower, because you have more resourcing to handle

[00:06:57] the load.
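
Editor's note: the pooling effect described above can be checked with a little probability. The sketch below is not from the episode. It assumes, purely for illustration, that every client is independently busy 5% of the time, and it asks how many simultaneously busy clients a server must absorb to cover 99.9% of moments. The function name and every number in it are hypothetical.

```python
from math import comb


def busy_clients_to_cover(num_clients, busy_prob, coverage=0.999):
    """Smallest k such that P(at most k clients are busy at once) >= coverage.

    Assumes each client is busy independently with probability busy_prob,
    so the number of busy clients follows a binomial distribution.
    """
    cumulative = 0.0
    for k in range(num_clients + 1):
        cumulative += (
            comb(num_clients, k)
            * busy_prob ** k
            * (1 - busy_prob) ** (num_clients - k)
        )
        if cumulative >= coverage:
            return k
    return num_clients


# Hypothetical figures: 100 clients on a small server, 300 on a large one.
small = busy_clients_to_cover(100, busy_prob=0.05)
large = busy_clients_to_cover(300, busy_prob=0.05)

print(f"100 clients: plan for {small} busy at once")
print(f"300 clients: plan for {large} busy at once")
print(f"peak per client: {small / 100:.3f} vs {large / 300:.3f}")
```

Under these assumptions the larger pool needs less than three times the headroom to serve three times the clients, which is the effect described in the conversation. Real tenants are not independent (batch jobs often cluster at the same hours), so treat this as intuition rather than a sizing method.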

[00:06:58] Jeff Dickman: Okay, cool. So as you worked through this, can you give us an overview of the solution that you implemented with this particular customer?

[00:07:07] Roy Douber: Yes.

So the customer had a very specific use case in that they required in-memory databases. And this client was having a lot of pain with Always On availability groups.

[00:07:30] And so we defaulted to a secondary option, which is failover clustering, which by its nature also allows you to have one less instance online at any given time. With an Always On availability group, you have to have three instances online as a minimum. And I believe [00:08:00] in some cases you'd want to have even more than that, because it allows you to have more flexibility, and there's also some licensing magic where you benefit if you have more instances online. But they were running into a lot of desynchronization issues.

[00:08:17] And so failover clustering is much easier to manage and, I feel, fits a bit more cleanly into the cloud than Always On availability groups. You end up using just a different set of configurations, and for this specific client, this was the right move. There are certainly situations for Always On availability groups, but

[00:08:49] in this specific case, failover clustering makes the most sense, not only from a cost perspective, but for disaster recovery and other scenarios. It just made a lot more sense. And the idea as well is that, when you're using failover clustering, let's say you have those hundred clients that sit

[00:09:16] on one instance. It's an active-active configuration, so if one instance does fail, the hundred clients fail over directly onto the secondary instance. So you're still online, there's still high availability, and there are still DR solutions that can handle a regional outage. And it still allows you to have the benefit of using a larger server

[00:09:47] and not having to run an N+2-style configuration for your SQL environments.
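
Editor's note: the instance-count difference described above is simple arithmetic. The sketch below takes the per-cluster counts stated in the conversation (three instances for an Always On availability group, one fewer for a failover cluster). The number of clusters is a made-up value, and the function name is hypothetical.

```python
def total_instances(num_clusters, instances_per_cluster):
    """Total SQL Server instances across all clusters."""
    return num_clusters * instances_per_cluster


NUM_CLUSTERS = 10  # hypothetical; the real estate size is not given

availability_group_total = total_instances(NUM_CLUSTERS, 3)
failover_cluster_total = total_instances(NUM_CLUSTERS, 2)
reduction = 1 - failover_cluster_total / availability_group_total

print(availability_group_total, failover_cluster_total)
print(f"reduction: {reduction:.0%}")
```

The reduction works out to one third whatever cluster count is plugged in, because it depends only on the ratio of two instances to three.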

[00:09:58] Jeff Dickman: Okay, cool. [00:10:00] So we're talking about one-third fewer servers here. From a quantity standpoint, what does that really look like?

[00:10:08] Roy Douber: So with this specific solution, there was a CPU-to-memory ratio requirement that best fits the client's usage profile.

[00:10:25] And so if you can find a server with high enough memory and a low enough core count, that is beneficial for the clients and the workload, because Windows gets licensed by the core, I believe, and SQL gets licensed by the core. So you're paying by the core, but if you can allow for more memory throughput, great, right?

[00:10:57] Your clients will have a much better customer experience. And so that was the premise.
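
Editor's note: the reasoning about per-core licensing can be sketched as a cost-per-gigabyte comparison. Nothing below comes from the episode: the machine shapes and the license price are placeholders, chosen only to show why a higher memory-to-core ratio lowers the licensing cost attached to each gigabyte of memory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MachineShape:
    name: str
    cores: int
    memory_gb: int


def license_cost_per_gb(shape, license_cost_per_core):
    """Per-core license spend divided by the memory the machine provides."""
    return shape.cores * license_cost_per_core / shape.memory_gb


LICENSE_COST_PER_CORE = 100.0  # placeholder unit price, not a real quote

# Both shapes are hypothetical and do not name real machine types.
balanced = MachineShape("balanced", cores=32, memory_gb=128)
memory_heavy = MachineShape("memory-heavy", cores=16, memory_gb=256)

for shape in (balanced, memory_heavy):
    cost = license_cost_per_gb(shape, LICENSE_COST_PER_CORE)
    print(f"{shape.name}: {shape.memory_gb / shape.cores:.0f} GB per core, "
          f"{cost:.2f} license units per GB")
```

With these placeholder shapes, the memory-heavy machine carries a quarter of the license cost per gigabyte, which is why the core count matters so much for a memory-bound workload.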

[00:11:05] Jeff Dickman: So when you're looking at this setup, it sounds like it was a lot of servers. Did you have to set up SQL on all of them, or did you have an automated process for deploying SQL?

[00:11:18] Roy Douber: Oh, so everything is automated and terraformed out, and thought through down to the nitty-gritty of the optimized configuration for IO, because these are production clusters serving clients that need the data as quickly as possible.

[00:11:45] Jeff Dickman: Okay, cool. When we're talking about SQL, storage is important, and high performance with the storage is really important. What kind of things did you implement around storage to ensure that the SQL service would see the necessary IOPS?

[00:12:03] Roy Douber: So we did a full analysis of the various storage technologies that are available to us.

[00:12:10] We were even looking at some preview, forward-looking storage capabilities, and we landed on a multi-writer-style configuration that allows for the highest level of performance per cost for the client and still allows for the level of IOPS required, so that it meets and exceeds the current configuration by a significant margin.

[00:12:37] When we ran our testing, we ran into certain configuration flags and made certain optimizations along the way that improved the client's overall SQL posture. Everything was running faster. Their startup times were faster. We found one particular trace flag that we were able to enable that sped up, I don't remember if it was restore time or startup time, by something like 10 times when you recover an instance.

[00:13:15] So we were very thorough with our analysis of the internal environment, and that led to some serious savings. After the extensive testing and validation that we ran for the client, they were able to implement, even in their current environment, some of the trace flags, changes, and optimizations that we made in testing for the new environment.

[00:13:43] So that's a huge testament to the success of building this out, and of just looking at an older environment that needed some love and care.

[00:13:55] Jeff Dickman: Okay. So it sounds like when you're doing this kind of an activity, where you're going to be upgrading your databases, changing your architecture, potentially even moving between clouds,

[00:14:03] one of the things that you want to have is a really solid testing plan to validate performance and take care of those things. Would you say that you need to have a similar approach for the financial analysis of the project?

[00:14:16] Roy Douber: Oh, yeah. I think everything starts with finance, because at the end of the day, as a large company with a large SQL estate, you want to be able to go to your board and say, hey, if I make this big investment to make this change, it has to be worth it in dollars and cents.

[00:14:38] So that's a part of it. Cost is one piece, but then there's also exercising the system in production under various settings and various load scenarios, with load testing, and really ensuring that the end result is going to be something that the team that manages this can support.

[00:15:06] First of all, that the operational toil is far, far lower. Financial is one piece, but there are also soft costs. If my DBA team is working 12 hours a day just trying to keep the lights on, then I'm not moving my customer storyline forward. There are no incremental improvements that I can make to the underlying system or systems for my clients, so that the system continues to improve from a performance and a value perspective

[00:15:45] across the board. That was very much the case here, but that never shows up in the financial reporting, right? That's often termed soft cost. But we think that your engineers sleeping at night is equally important to the financial savings. You want to retain the engineers.

[00:16:07] You want to keep them around. You want the engineers to work not just on operational toil. You want them to be able to go and tinker with the system and improve performance and look at ways they can create additional value.

[00:16:26] Jeff Dickman: Yeah, for sure. Well rested engineers definitely get more done.

[00:16:29] So when we're talking about the cost savings, we did an analysis and came up with what those cost savings were going to be. Sometimes when we do that and actually implement the solution, the cost savings aren't always what we projected. In this case, I think we exceeded expectations.

[00:16:49] Can you talk about how much we saved this customer through the course of this project?

[00:16:54] Roy Douber: Yeah. So over five years, this was in the tens of millions of dollars. So this was an awesome exercise. We mapped out all of their resource costing over time.

[00:17:12] We had a chart, basically a graph, that showed what would happen if they continued going down this path, and what would happen if they were to do a 12-month migration. And there were two different configurations. One was actually looking at using some kind of a data pop structure where they almost take things back on premises, which is something that they also considered.

[00:17:44] But it ended up that the benefits of cloud, the years invested in the cloud and in terraforming, and all the value that cloud provides tipped the decision. While that may not show up when you just look at the cost of a machine that you're spinning up anywhere in the world, the cloud is very beneficial,

[00:18:11] and having the flexibility of the cloud and being able to spin up SQL instances on demand was very, very important for this client, especially as their product is growing

[00:18:25] all the time. So that was the decision we went with. And I think the end forecast was roughly 30 million over five years.
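
Editor's note: the kind of comparison behind such a chart can be sketched as two cumulative cost curves, one for staying on the current path and one for a 12-month migration. The model below is an assumption made for illustration (a linear cost ramp during migration, flat costs otherwise), the function names are hypothetical, and the figures are abstract units rather than the client's numbers.

```python
def status_quo_costs(current_monthly, months):
    """Monthly costs if nothing changes."""
    return [current_monthly] * months


def migration_costs(current_monthly, target_monthly, migration_months, months):
    """Monthly costs that ramp linearly from current to target during migration."""
    costs = []
    for month in range(1, months + 1):
        progress = min(month / migration_months, 1.0)
        costs.append(current_monthly - (current_monthly - target_monthly) * progress)
    return costs


# Placeholder inputs in abstract cost units, over a five-year horizon.
HORIZON_MONTHS = 60
baseline = sum(status_quo_costs(100.0, HORIZON_MONTHS))
migrated = sum(migration_costs(100.0, 60.0, 12, HORIZON_MONTHS))
savings = baseline - migrated

print(f"status quo: {baseline:.0f} units")
print(f"12-month migration: {migrated:.0f} units")
print(f"forecast savings: {savings:.0f} units")
```

A real forecast would also account for growth in the customer base, migration labor, and a period of running both environments in parallel, none of which this sketch models.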

[00:18:37] Jeff Dickman: Wow. That's pretty phenomenal. So you talked a little bit about the soft improvements: not just the engineers sleeping, but less time spent on what I like to call the tyranny of the urgent, those outages or issues, or playing Tetris with

[00:18:52] the data to make it all perform. Were there other operational improvements that the client experienced post migration?

[00:18:59] Roy Douber: So the client had a DR strategy, but it felt like they didn't exercise it enough for it to be a proper DR strategy. So we revamped that as well as a part of this.

[00:19:15] And obviously, as we mentioned before, there were faster recovery times and more optimized resource usage. The noisy neighbor issue that we discussed should happen far, far less if you have a larger server to blunt the load across various clients. And then there's the administrative overhead, right?

[00:19:41] Like we mentioned, I got a chance to look at the client's PagerDuty alerts and how often the DBA team was being paged. Basically, if you took the number of DBAs and the number of hours, it was [00:20:00] more than a working week of paging and escalation. So there was no chance to improve, and they were always operating with their hair on fire.

[00:20:09] Ideally, with this new configuration, a lot of that should go away. It's a much simpler configuration, and keeping it simple, I think, is very valuable in this scenario, especially with a large estate, and they're going to reap the benefits.

[00:20:31] We believe that, once this is fully rolled out, they're going to experience less than half the amount of toil they currently experience.

[00:20:41] Jeff Dickman: Okay, cool. So you also mentioned just a little bit ago that there are a lot of benefits to using Google for this and, within GCP, taking advantage of the services that are there.

[00:20:53] Did you use GCP's on-demand scaling for this project? And if so, how did it benefit things?

[00:21:00] Roy Douber: Yeah, so some on-demand scaling, definitely more on the storage side. Some of the latest and greatest services were leveraged, like GCBDR for backup and recovery and some of their latest PD-SSDs that are configurable for IOPS.

[00:21:31] And we implemented high availability and data integrity. We were looking at what the best possible scenario is for the client from a failover perspective and from a startup perspective. There were various configurations that we were looking at at any given point, and from a GCP perspective, we were able to leverage some technology and some we were not.

[00:22:04] So, for example, because it's in-memory databases, we could not use Cloud SQL. But we could use GCE, and that works just as well. There's also a dependency on Windows, and Cloud SQL currently runs on Linux, so that impacted the decisions that we had to make along the way.

[00:22:34] So that was also looked at as a potential option, but it was just not chosen. And the multi-writer technology is a really new feature, in preview right now, I believe. We were leveraging that, and we were also using some under-the-hood Windows technology that was required because of the choices that were made along the way.

[00:23:06] This is also a relatively old application, right? So there are certain constraints that you have. Anybody that's running a SaaS application at scale had to make some choices along the way, and sometimes you have to stick to those choices. You don't have the option of refactoring the code today or

[00:23:30] in the next couple of years, even. So you stick with the choices and optimize against those choices. And that's exactly what we did.

[00:23:41] Jeff Dickman: Yeah. And when you talk about that, the idea that you had these guardrails or boundaries around what you could do with the application: you couldn't use Cloud SQL because you needed

[00:23:52] capabilities that only exist in SQL for Windows, and it had to be in memory. You have all those pieces, and yet you [00:24:00] were still able to achieve a forecasted 30 million in savings over five years, which is, from a financial standpoint, pretty awesome. How do you compare that to the pieces of this that would make it technically awesome?

[00:24:15] What was, in your opinion, the most innovative thing that you did within this SQL solution that you built?

[00:24:25] Roy Douber: I think just simplifying, right? I really think that in the original state, things were kind of the bespoke best practice, but the reality is that sometimes simple is better.

[00:24:49] So, technically awesome: the users were experiencing issues. The customer experience wasn't great. There were a lot of unplanned outages for the clients. And by keeping things simple, we're working towards alleviating all of that and creating a more awesome environment for the clients.

[00:25:19] And I think that's the most important thing. If the client senses that every time they log into the system it's clunky or slow, then over time you've lost the confidence of the client. And so I think that's what we achieved. We went from an architecture that on paper should be the best possible architecture, but realistically was not

[00:25:46] in this specific use case, and we architected this into something that will be simple to maintain and grow into, and that can continue leveraging the future roadmap in [00:26:00] GCP to continuously reduce costs over time. There's some future roadmap coming to GCP that's going to reduce the cost of this technology even further.

[00:26:12] Jeff Dickman: Wow. Very cool. So you mentioned that you did automation around this, that there's definitely automation in play. How did you enhance the deployment process for the solution's automation? What did you automate? What tools did you use to do that automation?

[00:26:28] Roy Douber: Yeah. So again, the application is relatively old. They're using Desired State Configuration, which is a PowerShell add-on of sorts, to manage it, and it's still very widely used.

[00:26:45] It's not a bad tool necessarily. It's just less cloud native. So the move that we made here is moving everything more towards Ansible. Terraform and Ansible were tool choices that we made. We're also leveraging Nomad for some of the server builds, so images.

[00:27:14] And it'll be a much more cloud-native, cloud-friendly automation process moving forward. In some cases, because there's a timeline and we're trying to save money, we had to wrap some of the automation scripts with Ansible, with the idea that in the future we're going to refactor. But we tried to take as much of this as possible from a first-principles approach.

[00:27:45] So we removed what we felt was not needed and did just the most basic spin-up so we could get up and running and online and achieve the cost savings that we're shooting for.

[00:28:04] Jeff Dickman: Yeah, it sounds like as you did this, you sort of planned for the future, and you've ensured that there's going to be some scalability and flexibility for the customer as new services are released in Google and as, you know, maybe they evolve their application.

[00:28:22] Can you talk a little bit more about that?

[00:28:26] Roy Douber: Yeah, so the customer is going through a bit of a renaissance, as I think many other companies are. There's a big platform engineering push. They're hiring SREs and platform engineers, and they're building resources as a service internally, which was just not the case up until this point.

[00:28:53] And we were trying to fit the solution to all those goals the customer was pushing towards, so we worked with the client to understand exactly what they've got in their ecosystem, not just from a SQL perspective, but holistically. It was: we're going down the Backstage route, we want to use Ansible, we want to use Terraform, we want to use Terraform Cloud.

[00:29:24] How do we build this in such a way that it's maintainable, but also not so bespoke, so that a new engineer can come in from another company that has these relatively prevalent technologies and get to work? That was not the case before.

[00:29:44] So again: simplifying, using established processes and practices in the cloud, using established tooling in the cloud, and bringing things to a point where it's no longer a question [00:30:00] of how you deploy this application. And again, we've improved their build and startup time by 6 to 10 times through this process.

[00:30:12] We were always thinking about the future engineers, those engineers that are new, the junior engineer coming in from another company or from school. What tools would they like to use? How could they get to work without us having to train them on technologies that were going out of favor, or having them look at some of these older scripts in the environment that nobody even understood anymore, because there's been either churn or people have just moved on from those scripts, but they're still being used in a production setting?

[00:30:55] Why was this flag enabled? Why was this particular piece installed here? Nobody knew. So, first-principles approach. I'm a big fan of deleting: as long as things continue to work, delete, delete, delete until it breaks. That's essentially what was done here. Oftentimes we didn't even look at the old code. We just planned ahead and had the conversation about what was required rather than what was not required.

[00:31:31] Jeff Dickman: Sure. Yeah. So it sounds like as you went through this project, some of the things that you had to figure out were what worked, what didn't work, and what was just old legacy, kind of like cling-ons in the code that you had to deal with in order to get things working. And some of that, I'm sure, helped improve

[00:31:51] performance as well when you were doing startup and failover and stuff like that, because those things didn't have to be executed anymore once they were removed. So [00:32:00] one of the things that I love about projects is that there's always a learning curve, and there are always ways that we can evaluate these projects to do better, both in the project and next time for the next project.

[00:32:13] So, within this project, were there any unique insights that you gained around the project and how the project went, things that you thought were valuable that you would recommend to other folks who are looking to do similar projects around Microsoft SQL and implementing it on Google?

[00:32:31] Roy Douber: Yeah. So I think: keep things simple, right? Create a very thorough testing and validation plan. Leverage cloud-native solutions as much as possible, as long as they fit into the end solution that you're trying to implement. You can't always get what you want, right?

[00:33:02] Unfortunately, in some cases you have some constraints that you have to work with. And then choose the right storage solutions and configurations that fit the workload. I think

[00:33:23] oftentimes, as engineers and architects, we always like the cool, shiny things. Sometimes the cool and shiny thing is not the right answer for a large enterprise environment. Sometimes it might be the path more well traveled that might not be as commonly recommended as the newest shiny solution that [00:34:00] came out of the labs in the last couple of years.

[00:34:03] It might be the more tried and true.

[00:34:06] Jeff Dickman: Yeah, as a geek, it's hard for me to agree with that, but it's absolutely true that the tried-and-true tooling and capabilities are sometimes the right choice while you wait for the newer stuff to mature and get rounded out so the rough edges are a little more worn down. But sometimes it can be painful, because implementing the cool stuff is always really fun.

[00:34:31] Roy Douber: And you know, there are small skunkworks projects that really fit some of those new cutting-edge or bleeding-edge technologies. And eventually maybe they end up in a large estate that does all the magic. But generally speaking, those would not be an overnight success, right?

[00:34:58] There are bugs that need to be worked out. There are configurations that are not well documented. And so in this specific case, I think, with the old configuration, the client bit off a bit more than they could chew. And I'm glad that together we were able to come to a decision that was more reasonable from a toil perspective, from a cost perspective, and from a value-to-the-client perspective.

[00:35:25] Jeff Dickman: Yeah. So, no project goes on without issues or problems. Were there any lessons learned that you had on this project? Things that were totally unexpected, but that, if you ever do it again, you're going to think about?

[00:35:41] Roy Douber: you know, we, again, we started with many solutions, some on the bleeding edge, some on the cutting edge.

[00:35:51] I think we ran into issues. We vetted technologies that just did not work, right? We believed that they would work. And when it came down to exercising the solution, we realized that the in-memory databases became a constraint on the storage solutions that we picked, right? So we had to scrap them.

[00:36:17] So I think there are a lot of lessons to be learned through any project. There are a lot of failures. And your best bet is to have multiple configurations that you're going to vet and eventually land on one that works, or at least one, maybe a couple, and then pick from the couple.

[00:36:43] So we never came in with the notion that everything we were going to propose was going to work. I mean, this is a unique problem for a unique client with unique specifications. You build and you discover as you go. If it were easy, the client wouldn't have requested any assistance whatsoever.

[00:37:06] And the very smart engineers the client has would have resolved this issue a long time ago. But they came to us because we had some know-how and some expertise with these problems that they could leverage, and putting our brains together to solve that problem was ultimately way more beneficial for all of us.

[00:37:29] Yeah. Everybody.

[00:37:31] Jeff Dickman: Yeah. Do you have any pro tips for architects or engineers that might be taking on a project like this?

[00:37:37] Roy Douber: Yeah, so

[00:37:38] if it's at this scale, when you're talking about a massive estate: cost-benefit analysis. Before any migration: test, test, test. Automate as much as you can. You don't want to be doing anything manual.

[00:37:57] And, you know, [00:38:00] humans are good at many things, but we're very prone to errors. So automate, automate all the time. Any subprocess or any subsystem that you can automate should be automated. And then, be adaptable, right? Regularly review and optimize configurations, and adapt to any changed requirements or any changing roadmaps from the hypervisors that you're working in, because ultimately that adds the most value for your customers or for your company.

[00:38:45] Jeff Dickman: Yeah. Okay, cool. Those are great pro tips. So the migration that we went through with this customer was taking their legacy SQL environment and optimizing it with a failover cluster, implementing automation for SQL. We did a huge amount of financial analysis on this to make sure that we're implementing the right solution.

[00:39:09] And then in addition to that, the technical analysis to make sure that the right technology was being put into place, as well as the test plan that was done for that, was all pretty heavy from a lifting standpoint and from a workload standpoint. Just as a rough estimate, how long did this project run to get from the point where we started having conversations to the point that we actually implemented with the customer?

[00:39:39] Roy Douber: It's still ongoing, obviously, but I think it's coming up on a little bit less than a year, 10 months or so. Very, very thorough. Nothing like this can be done in a vacuum or quickly.

[00:39:58] Jeff Dickman: Yeah. So it sounds like a lot of due diligence went into this. And I imagine we spent a lot of time collaborating with the customer and presenting to their executive leadership on the plan,

[00:40:09] and potentially what they were going to see from a performance and a cost standpoint, so that the business impacts would be fully understood there as well. Hey, Roy, I got a question for you. So, e360 does a lot with cloud. Where can potential customers or anyone interested go to find out more about what e360 does or get a demo of the capabilities that we offer?

[00:40:34] Roy Douber: Obviously, e360.com has a thorough list of services we can offer, from DevOps to cloud architecture to bespoke software engineering. Whatever your needs are, we generally have an answer. And if we don't, then that's okay too; we'll find somebody that we can forward you to. But we can do a lot, and we encourage you guys to explore everything that we have to offer. Our SQL migration and cloud migration engineering teams are fantastic.

[00:41:15] And we've done quite a few of these now, this being one of the larger ones. Also, we now have Tech Sessions on YouTube and on LinkedIn, I believe it's e360 Pulse, where you can see some tech news that we release. And you can also follow this podcast for more insights and updates on the latest and greatest in technology, or in this case, maybe

[00:41:46] the kind of tried and true technology that you may want to use in your day to day.

[00:41:54] Jeff Dickman: Awesome. Hey, Roy, thanks for your time today. I appreciate you walking through this project and the things that were done on it and how we helped improve efficiency and costs for our customer. I think it's been really fantastic. And to our listeners, thank you for hanging in there and listening with us as we covered all of this.

[00:42:09] This has been another Tech Sessions podcast from e360. And I hope you have a great day.

[00:42:16] Roy Douber: Thank you.
