Data Lakes For Dummies. Alan R. Simon
Here’s what the icons mean:
These are the tricks of the data lake trade. You can save yourself a great deal of time and avoid more than a few false starts by following specific tips collected from the best practices (and learned from painful experiences) of those who preceded you on the path to the data lake.
Data lakes are often filled with dangerous icebergs. (Okay, bad analogy, but you hopefully get the idea.) When you’re working on your organization’s data lake efforts, pay particular attention to situations that are called out with this icon.
If you’re more interested in the conceptual and architectural aspects of data lakes than the nitty-gritty implementation details, you can skim or even skip material that is accompanied by this icon.
Some points are so critically important that you’ll be well served by committing them to memory. You’ll even see some of these points repeated later in the book because they tie in with other material. This icon calls out this crucial content.
Beyond the Book
In addition to the material in the print or e-book you’re reading right now, this product comes with a free Cheat Sheet for the three types of data for your data lake, four zones inside your data lake, five phases to building your data lake, and more. To access the Cheat Sheet, go to www.dummies.com and type Data Lakes For Dummies Cheat Sheet in the Search box.
Where to Go from Here
Now it’s time to head off to the lake — the data lake, that is! If you’re totally new to the subject, you don’t want to skip the chapters in Part 1 because they’ll provide the foundation for the rest of the book. If you already have some exposure to data lakes, I still recommend that you at least skim Part 1 to get a sense of how to get beyond all the hype, buzzwords, and generalities related to data lakes.
You can then read the book sequentially from front to back or jump around as needed. Whatever path works best for you is the one you should take.
Part 1
Getting Started with Data Lakes
IN THIS PART …
Separate the data lake reality from the hype.
Steer your data lake efforts in the right direction.
Diagnose and avoid common pitfalls that can dry up your data lake.
Chapter 1
Jumping into the Data Lake
IN THIS CHAPTER
Defining and scoping the data lake
Diving underwater in the data lake
Dividing up the data lake
Making sense of conflicting terminology
The lake is the place to be this season — the data lake, that is!
Just like the newest and hottest vacation destination, everyone is booking reservations for a trip to the data lake. Unlike a vacation, though, you won’t just be spending a long weekend or a week or even the entire summer at the data lake. If you and your work colleagues do a good job, your data lake will be your go-to place for a whole decade or even longer.
What Is a Data Lake?
Ask a friend this question: “What’s a lake?” Your friend thinks for a moment, and then gives you this answer: “Well, it’s a big hole in the ground that’s filled with water.”
Technically, your friend is correct, but that answer also is far from detailed enough to really tell you what a lake actually is. You need more specifics, such as:
How big, dimension-wise (how long and how wide)
How deep that “big hole in the ground” goes
How much variability there is from one lake to another in terms of those length, width, and depth dimensions (the Great Lakes, anyone?)
How much water you’ll find in the lake and how much that amount of water may vary among different lakes
Whether a lake contains freshwater or saltwater
Some follow-up questions may pop into your mind as well:
A pond is also a big hole in the ground that’s filled with water, so is a lake the same as a pond?
What distinguishes a lake from an ocean or a sea?
Can a lake be physically connected to another lake?
Can the dividing line between two states or two countries be in the middle of a lake?
If a lake is empty, is it still considered a lake?
If one lake leaves Chicago, heading east and traveling at 100 miles per hour, and another lake heads west from New York … oh wait, wrong kind of word problem, never mind.
So many missing pieces of the puzzle, all arising from one simple question!
You’ll find the exact same situation if you ask someone this question: “What’s a data lake?” In fact, go ahead and ask your favorite search engine that question. You’ll find dozens of high-level definitions that will almost certainly spur plenty of follow-up questions as you try to get your arms around the idea of a data lake.
Here’s a better idea: Instead of filtering through all that varying — and even conflicting — terminology and then trying to consolidate all of it into a single comprehensive definition, just think of a data lake as the following:
A solidly architected, logically centralized, highly scalable environment filled with different types of analytic data that are sourced from both inside and outside your enterprise with varying latency, and which will be the primary go-to destination for your organization’s data-driven insights.
Wow, that’s a mouthful! No worries: Just as if you were eating a gourmet fireside meal while camping at your favorite lake, you can break up that definition into bite-size pieces.
Rock-solid water
A data lake should remain viable and useful for a long time after it becomes operational. Also, you’ll be continually expanding and enhancing your data lake with new types and forms of data, new underlying technologies, and support for new analytical uses.