The pandemic has rapidly accelerated three important technology trends: (1) cloud migrations, (2) security related to those migrations and (3) data-driven decision making and customer experiences. Let’s focus on the third of those initiatives because, of the three, serious progress in data analytics has been by far the slowest.
In the world of analytics infrastructure, it’s time to stop playing on the margins and instead pursue a major leap forward. We can and must do better than what we’re accepting today as “normal.” I’ve built my career around databases and data infrastructure for the past 30 years and am shocked to see just how frustrated the industry is when it comes to fully leveraging data.
A new study commissioned by Dremio and produced by Wakefield Research found the vast majority of respondents were using at least a data warehouse (84%), with about half using a data lake (51%); many are actively using both (37%). Presumably, all of them are attempting to empower business users, but only 28% report that it is “very easy” for their end users to develop insights on their own. This is no doubt frustrating to both the data engineering teams and the business data consumers, and why wouldn’t it be? It’s not for lack of spending. More than half (55%) say their company is spending more on storing and using data than it was two years ago, but only 22% of those who increased spend believe they have seen a fully realized return on that investment. Of course they’re frustrated. One could even say “desperate,” given that 63% of respondents acknowledged they have thrown money at bad investments to try to improve their analytics situations.
My experience is that the primary culprit behind these unrealized expectations is complexity. I’ve seen it from early in my career right through to the present day. What else could explain such an incredibly low bar for expectations around something as basic as data freshness? Only 16% of respondents said they expect “same-day freshness” in their data sets. This is in a world where everything is happening at machine speed. What have they settled for instead? Sadly, 51% say they expect “fresh” data within weeks or longer.
Under the covers, data copying and movement are often nefariously chipping away at timelines and efficiency. An overwhelming 80% of respondents said that Extract, Transform and Load (ETL) times are underestimated in project planning, leading to delays and failed business objectives. Copies make matters worse: 60% said they have 10 or more copies of the same data sets floating around to satisfy various analytics needs. This, of course, introduces security and governance issues, and it also directly impacts business decisions. Just over 80% reported that data analysts have used inconsistent versions of what should have been the same data set in their decision making. With long-running ETL jobs and proliferating copies, it’s not a mystery as to why.
Yet with all those frustrations, few believe we can stop investing. In fact, 79% of data leaders report being concerned with the ongoing costs of scaling their infrastructure. Remember, this is on top of what I mentioned earlier about throwing additional money into existing bad investments. Meanwhile, 76% of leaders are concerned about vendor lock-in and closed systems, which limit their ability to explore new and innovative solutions.
As the old saying goes, “Everyone complains about the weather, but nobody does anything about it!” For those of us who have been around this industry for a while, these concerns are not new. Seeing them in quantitative form is unsettling, but we all know there have been long-standing problems that have only gotten marginally better over time. We need them to get a lot better – and fast. Is that possible? I believe it is.
Perhaps an analogy is a good place to start. As the application world moved towards native internet and mobile applications, there was a massive shift in application architectures. We went from a world of client/server to microservices. In that latter world, things became more modularized and more open. No longer did a developer have to make changes to a massive monolithic code base to add functionality or enhance an existing feature. The nature of cloud resources allowed for radical improvements to application performance and availability SLAs, but that was only possible because we embraced a fundamental architecture change underneath.
Data architectures need to be revolutionized in the same way.
The most fundamental of these changes comes in seeing the data layer as its own first-class tier in the architecture diagram. No longer should we focus on bringing data to the service (i.e., moving the data into a data warehouse); instead we should bring the analytics and business intelligence services directly to the data (data lakes). This represents a big step forward in rethinking the accessibility, manageability and flexibility of using our data. Imagine a world where your data lands in one place, and from there is accessible by any number of services, each accessing the data through open standards. Once you begin with an architecture like this – an “open data architecture” – numerous innovations and possibilities emerge, all geared towards faster time to data access, lower costs, more flexibility and better governance.
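To make the idea concrete, here is a minimal sketch in Python using only the standard library. A CSV held in memory stands in for data landing once in an open format (in practice this would be something like Apache Parquet files on object storage); two hypothetical "services" then read that single copy in place rather than each maintaining its own extract. The table and service names are illustrative, not from any particular product.

```python
import csv
import io
import statistics

# Data "lands" once, in an open format, in shared storage.
# (Here an in-memory CSV stands in for Parquet files in a data lake.)
lake = io.StringIO()
writer = csv.writer(lake)
writer.writerow(["order_id", "region", "amount"])
writer.writerows([[1, "east", 120.0], [2, "west", 80.0], [3, "east", 50.0]])

def read_lake():
    # Every service reads the same single copy via the open standard;
    # no per-service ETL job, no proliferating copies.
    lake.seek(0)
    return list(csv.DictReader(lake))

# Hypothetical service A: a BI-style aggregate over the shared data.
revenue_by_region = {}
for row in read_lake():
    revenue_by_region[row["region"]] = (
        revenue_by_region.get(row["region"], 0.0) + float(row["amount"])
    )

# Hypothetical service B: an analytics-style computation over the same copy.
mean_order = statistics.mean(float(row["amount"]) for row in read_lake())

print(revenue_by_region)  # {'east': 170.0, 'west': 80.0}
```

Because both services read the same copy through the same open interface, the inconsistent-versions problem described above cannot arise: there is only one version to read.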
Sounds dreamy, I know. But that’s how innovation works – by dreaming big and not accepting the status quo of the discouraging stats presented in this article. And if you start taking a look around at what leading data companies are doing, you may realize that this dream is becoming a reality quicker than you might think.