For Next-Generation Data Analytics, Go Back To The Basics

We all know that Silicon Valley is always trying to build a better mousetrap, but often, that mousetrap is flawed. Maybe it improves one feature but breaks three others in the process. Or perhaps the new mousetrap is just too complicated to deploy at scale. What’s even more frustrating? The existing mousetrap usually works just fine — maybe even better than the new one.

Sometimes innovation happens for innovation’s sake rather than to meet a real business need. And Silicon Valley is notorious for “drinking its own Kool-Aid.” But the most powerful tool for guiding a business is one every company already has.

Structured data makes up the vast majority of the information that businesses need to make critical decisions. There are no bells and whistles — just the data that companies collect every day using the systems that create the corporate backbone, from manufacturing to marketing and from operations to sales to finance.

So if Silicon Valley companies want to be truly revolutionary, they should reference the playbook of consumer packaged goods (CPG), financial services and manufacturing companies. In other words, they should go back to the basics.

All of these traditional companies have one thing in common: They never veer from the best practices that have served them well for decades. What makes them successful is that they all adhere to three main principles:

1. Agree on an internal system of record.

It’s a given that you’re going to have data — and yes, even the same data — stored in different departments. For example, sales, marketing and support departments all collect and store customer data, and some of it is redundant.

That’s OK if you have one central location that holds all of the data (i.e., one system of record). That one system is your company’s “secret sauce” on which every department in the company can agree.

2. Consolidate data in a cloud data warehouse.

Everyone’s extolling the virtues of MongoDB and Elasticsearch, but that’s like buying a Ferrari when you’re only planning to drive the speed limit. It’s “too much car” for the purpose. 

They’re optimized for all of the new data types and sources that you really don’t need if you’re storing and analyzing structured data. So, you’d be optimizing for the wrong thing (stay tuned for more on that topic in my next piece). Doesn’t it make more sense to optimize for the 90% of your data that’s valuable than for the 10% that’s not?

Another way to look at it: It’s easier to start with a rigid structure and relax it a bit than to start with chaos and attempt to impose order on it. Frankly, the latter is next to impossible. This means it’s easier to start with a highly structured data warehouse, like a relational database management system (RDBMS), and then work in semi-structured data.

Even semi-structured data needs to have some structure imposed on it to make it useful, and the most popular and powerful cloud warehouses built on RDBMS principles (e.g., Snowflake, BigQuery and Redshift) can all handle semi-structured data fairly easily.
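To make "imposing structure on semi-structured data" concrete, here is a minimal Python sketch of the kind of flattening these warehouses do internally: nested JSON records are unrolled into the flat columns a relational table expects. The event records and field names are hypothetical examples, not part of the original article.

```python
import json

# Hypothetical example: semi-structured customer events arriving as JSON strings.
raw_events = [
    '{"customer_id": 1, "event": "purchase", "details": {"sku": "A-100", "qty": 2}}',
    '{"customer_id": 2, "event": "signup", "details": {"plan": "free"}}',
]

def flatten(record: dict, prefix: str = "") -> dict:
    """Flatten nested JSON into the flat columns a warehouse table expects."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Nested objects become prefixed columns, e.g. details_sku.
            flat.update(flatten(value, prefix=f"{name}_"))
        else:
            flat[name] = value
    return flat

rows = [flatten(json.loads(event)) for event in raw_events]
# Each row now maps cleanly onto columns such as customer_id, event, details_sku.
```

Warehouses like Snowflake and BigQuery offer native types and functions for exactly this, so in practice you would rarely write the flattening by hand; the point is that the semi-structured data ends up in a structured table either way.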

3. Create a solid dimensional schema.

This final step is really only a requirement for companies that have begun to scale their use of analytics and want to enable a wide range of users across the company to consume data. At this stage, companies need to define a common data model, which essentially gets everyone on the same page. 

Each standard table (or view) in the common model contains a set of columns (or dimensions) with preprocessed, deduplicated and clean data, which acts as a starting point for everyone’s custom analyses and dashboards.

For example, analysts in several departments might be interested in analyzing your customers for a variety of reasons. Information about customers comes from several different sources and must be merged and cleaned before it is ready to analyze. You don’t want multiple analysts to duplicate their work and come up with different representations of a customer.

Instead, you want the merging and cleaning to be done once by a core team of analysts, resulting in a common customer table or view that the rest of the analysts in the company can use as a starting point.
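The merge-once-use-everywhere idea above can be sketched in a few lines of Python. The source records, field names and the email-as-key choice are illustrative assumptions, not the article's prescription; in a real warehouse this logic would live in the core team's transformation layer.

```python
# Hypothetical example: sales and support each hold partly overlapping customer data.
sales = [
    {"email": "ana@example.com", "name": "Ana", "region": "EMEA"},
    {"email": "bo@example.com", "name": "Bo", "region": "APAC"},
]
support = [
    {"email": "ana@example.com", "tickets": 3},
    {"email": "cy@example.com", "tickets": 1},
]

def build_customer_view(*sources):
    """Merge records keyed on a normalized email, done once by a core team,
    so every analyst starts from the same deduplicated customer table."""
    merged = {}
    for source in sources:
        for record in source:
            key = record["email"].strip().lower()  # normalize before deduplicating
            merged.setdefault(key, {}).update(record)
    return list(merged.values())

customers = build_customer_view(sales, support)
# Ana appears once, carrying fields from both sales and support.
```

Downstream analysts then query `customers` (in practice, a shared table or view) rather than re-merging the raw sources, which is what keeps everyone's dashboards agreeing on what a "customer" is.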

So while elaborate data engineering may be fashionable, 99% of the time it’s just not needed. What you do need is not just a good data warehouse, but the right data warehouse to drive your analytics-driven business decisions.
