Data Lakes For Dummies. Alan R. Simon
are employees happy working here?
Descriptive, diagnostic, predictive, and discovery analytics all help you gain valuable insights into different aspects of your organization, its performance, possible risks, and much more. However, you need more than insights! You need to drive those insights into decisions and actions.
Prescriptive analytics is a relative newcomer to the overall analytics continuum. “Wait a minute!” you may be thinking. “I’ve been making decisions and taking actions for a long time!” The “secret sauce” of prescriptive analytics, however, is making those decisions and taking those actions with a healthy assist from your organization’s data being fed into increasingly sophisticated analytics. And yes, you guessed it: Your data lake will play a starring role in driving prescriptive analytics. So, your data lake will help you with the following scenarios:
Based on market forecasts and the overall economy, you need to cut approximately 10 percent of your headcount. What are your options? How do you get the work done? Can you shift some of the work to lower-cost contractors? Should you try a voluntary early retirement program to reduce the number of involuntary terminations? Name four or five scenarios with all the data and all the trimmings!
Then, out of those four or five scenarios, which one is “best” and why? Are there any downside surprise risks you should be aware of?
Table 2-1 shows you the relationship between the easy-to-understand questions and the more formal names you’ll use as you plan your data lake.
TABLE 2-1 Matching Analytics and Business Questions
Question | Type of Analytics |
---|---|
What happened? | Descriptive analytics |
Why did it happen? | Diagnostic analytics |
What’s happening right now? | Descriptive analytics |
What’s likely to happen? | Predictive analytics |
What’s something interesting and important out of this mountain of data? | Discovery analytics |
What are our options? | Prescriptive analytics |
What should we do? | Prescriptive analytics |
Mapping your analytics needs to your data lake road map
Jan, your CPO, is thrilled with the work that Raul and his team have done compiling the HR analytics continuum. They’ve produced an exhaustive list of more than 500 analytical functions that will be supported by the data lake, covering the broad continuum from simple “What happened?” descriptive analytics through more than a dozen complex prescriptive analytics scenarios.
Now what?
As you might guess, that 500-plus master list of HR analytics isn’t going to be available the first day your data lake goes operational. A data lake is built in a phased, incremental manner, probably over several years.
But where to start?
In Chapter 17, I show you how to build your road map that will take you from your first ideas about your data lake all the way through multiple phases of implementation.
Your data lake road map should be driven by your organization’s analytical needs rather than by available data. You should address your highest-impact, highest-value analytics needs first, for two reasons:
You need the initial operating capability (IOC) of your data lake to come with some “oomph.” In other words, you want people across your organization to sit up and take notice that the data lake is, from its first days, providing some really great analytics.
You want to build your data lake using a “pipeline” approach that not only loads your data lake with lots of data but carries that data all the way through to critical business insights.
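To make "highest-impact, highest-value first" a little more concrete, here is a minimal Python sketch of one way you might rank a master list of analytics needs and slice it into road map phases. Everything in it is an assumption for illustration only: the `AnalyticsNeed` class, the `plan_phases` function, the 1-to-5 scoring scale, and the sample HR needs are all hypothetical, and simply adding impact and value together is just one of many possible ranking rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnalyticsNeed:
    """One entry from the master list of analytics (hypothetical structure)."""
    name: str
    impact: int  # assumed 1-5 score: how much the organization will notice it
    value: int   # assumed 1-5 score: how much business value it delivers


def plan_phases(needs, per_phase):
    """Rank needs by combined impact and value, then split them into phases.

    Phase 1 (the first list returned) holds the highest-scoring needs.
    Needs with equal scores keep their original relative order.
    """
    if per_phase < 1:
        raise ValueError("per_phase must be at least 1")
    ranked = sorted(needs, key=lambda n: n.impact + n.value, reverse=True)
    return [ranked[i:i + per_phase] for i in range(0, len(ranked), per_phase)]


if __name__ == "__main__":
    # Made-up sample needs and scores, for illustration only.
    master_list = [
        AnalyticsNeed("Monthly headcount report", impact=2, value=2),
        AnalyticsNeed("Attrition risk forecast", impact=5, value=4),
        AnalyticsNeed("Workforce reduction scenarios", impact=5, value=5),
        AnalyticsNeed("Engagement survey trends", impact=3, value=3),
    ]
    for number, phase in enumerate(plan_phases(master_list, per_phase=2), start=1):
        print(f"Phase {number}: {[need.name for need in phase]}")
```

In practice, your scoring would come from conversations with business stakeholders rather than from two integers, but the idea is the same: the analytics drive the order of the phases, and the data follows.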
Building the best data pipelines inside your data lake
A data pipeline is an end-to-end flow of data from the original sources all the way to the end users of analytics. Figure 2-7 shows a data pipeline overlaying the journey from source systems, through the data lake’s bronze zone (the home of raw data), through the cleansing of that data into the silver zone, into the gold zone that consists of curated “packages” of data, and then finally to the users who consume the data-driven insights.
FIGURE 2-7: A data pipeline into, through, and then out of the data lake.
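If you like to see ideas in code, the following is a minimal Python sketch of the journey described above: raw data lands in the bronze zone, is cleansed into the silver zone, and is then curated into a gold "package" for users. It is a toy, in-memory illustration under stated assumptions, not how any particular data lake product works. The `DataLake` class, the `ingest`, `cleanse`, and `curate` functions, and the `department` and `tenure_years` fields are all hypothetical names chosen for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DataLake:
    """Toy in-memory stand-in for the three zones (hypothetical structure)."""
    bronze: list[dict] = field(default_factory=list)      # raw data, as received
    silver: list[dict] = field(default_factory=list)      # cleansed data
    gold: dict[str, float] = field(default_factory=dict)  # curated package


def ingest(lake: DataLake, source_records: list[dict]) -> None:
    """Land source records in the bronze zone without changing them."""
    lake.bronze.extend(source_records)


def cleanse(lake: DataLake) -> None:
    """Rebuild the silver zone from bronze, dropping unusable records
    and normalizing the fields that remain."""
    lake.silver = [
        {
            "department": record["department"].strip().lower(),
            "tenure_years": float(record["tenure_years"]),
        }
        for record in lake.bronze
        if record.get("department") and record.get("tenure_years") not in (None, "")
    ]


def curate(lake: DataLake) -> None:
    """Build a gold package: average tenure per department."""
    by_department: dict[str, list[float]] = {}
    for record in lake.silver:
        by_department.setdefault(record["department"], []).append(record["tenure_years"])
    lake.gold = {dept: sum(vals) / len(vals) for dept, vals in by_department.items()}


if __name__ == "__main__":
    lake = DataLake()
    # Made-up source records, including messy ones that cleansing must handle.
    ingest(lake, [
        {"department": " Sales ", "tenure_years": "4"},
        {"department": "sales", "tenure_years": 6},
        {"department": "", "tenure_years": 3},
    ])
    cleanse(lake)
    curate(lake)
    print(lake.gold)  # what the end users of analytics would consume
```

Notice that the bronze zone keeps the raw records exactly as they arrived, even the messy ones. That way, you can always rerun the cleansing and curation steps if your rules change later.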
You can think of a data pipeline the same way you think of shopping. Suppliers sell and ship their products to distributors, who then resell and ship some of those products to a wholesaler. The wholesaler then resells and ships the products yet again to a retailer, which is where you come to buy whatever it is that you’re looking for. Figure 2-8 shows how this paradigm can apply to data pipelines within a data lake.
FIGURE 2-8: An easy way to understand data pipelines and data lakes.
Addressing future gaps and shortfalls
Your road map is only the beginning of your data lake journey. You may think you have a pretty good idea of what your data and analytical needs will be over the next couple of years, and you may do a good job of prioritizing the various phases of how your data lake will be built.
The world is constantly changing, though, which means that the farther out your data lake road map stretches, the more likely it is that any given phase will be preempted by changing priorities and new analytical needs.
As your organization’s analytical needs evolve and — hopefully — become more sophisticated over time, you’ll continually adjust your data lake plans to reflect the real world.
Think of a data lake as a living entity that is subject to constant change. Remember that century-long life span of a U.S. Air Force B-52, with changing missions over the years being addressed by constantly incorporating new technology to extend the plane’s value.
Speedboats, Canoes, and Lake Cruises: Traversing the Variable-Speed Data Lake
You can stream all kinds of data into your data lake as quickly as that data is created in your source applications.