Data Science For Dummies. Lillian Pierson
Mathematics uses deterministic methods to form a quantitative (or numerical) description of the world; statistics is a form of science that’s derived from mathematics, but it focuses on using a stochastic (probabilistic) approach and inferential methods to form a quantitative description of the world. I tell you more about both in Chapter 4. Data scientists use mathematical methods to build decision models, generate approximations, and make predictions about the future. Chapter 4 presents many mathematical approaches that are useful when working in data science.
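To make the deterministic-versus-stochastic distinction concrete, here is a minimal Python sketch. It computes the area of a circle twice: once with an exact formula and once by random sampling (a Monte Carlo estimate). The function names, the sample size, and the seed are my own choices for illustration, not something taken from the book.

```python
import math
import random


def circle_area_exact(radius):
    """Deterministic: the same input always gives the same answer."""
    return math.pi * radius ** 2


def circle_area_simulated(radius, n_samples, rng):
    """Stochastic: estimate the area by throwing random points at a square."""
    inside = 0
    for _ in range(n_samples):
        # Draw a point uniformly from the square that encloses the circle.
        x = rng.uniform(-radius, radius)
        y = rng.uniform(-radius, radius)
        if x * x + y * y <= radius * radius:
            inside += 1
    square_area = (2 * radius) ** 2
    # The fraction of points inside the circle approximates the area ratio.
    return square_area * inside / n_samples


rng = random.Random(42)  # fixed seed (an arbitrary choice) for repeatability
exact = circle_area_exact(1.0)
estimate = circle_area_simulated(1.0, 100_000, rng)
print(f"exact area:     {exact:.4f}")
print(f"simulated area: {estimate:.4f}")
```

The simulated value lands close to the exact one but changes with the seed and the number of samples, which is the kind of uncertainty that statistical methods are designed to quantify.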
In this book, I assume that you have a fairly solid skill set in basic math — you will benefit if you’ve taken college-level calculus or even linear algebra. I try hard, however, to meet readers where they are. I realize that you may be working based on a limited mathematical knowledge (advanced algebra or maybe business calculus), so I convey advanced mathematical concepts using a plain-language approach that’s easy for everyone to understand.
Deriving insights from statistical methods
In data science, statistical methods are useful for better understanding your data’s significance, for validating hypotheses, for simulating scenarios, and for making predictive forecasts of future events. Advanced statistical skills are somewhat rare, even among quantitative analysts, engineers, and scientists. If you want to go places in data science, though, take some time to get up to speed in a few basic statistical methods, like linear and logistic regression, naïve Bayes classification, and time series analysis. These methods are covered in Chapter 4.
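As a small taste of one of the methods named above, the following sketch fits a simple linear regression by ordinary least squares using only the Python standard library. The data (advertising spend versus units sold) is invented for illustration, and the helper name fit_line is hypothetical. Real projects would more likely use a library such as scikit-learn or statsmodels.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: advertising spend (in $1,000s) and units sold.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
units_sold = [12.0, 14.0, 16.0, 18.0, 20.0]

slope, intercept = fit_line(ad_spend, units_sold)

# Use the fitted line to forecast sales at a spend level not yet observed.
forecast = slope * 6.0 + intercept
print(f"slope={slope:.2f}, intercept={intercept:.2f}, forecast={forecast:.2f}")
```

The toy data here falls exactly on a straight line, so the fit is perfect. Real data is noisy, and judging how much to trust the fitted line is where the statistical side of the method comes in.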
Coding, coding, coding — it’s just part of the game
Coding is unavoidable when you’re working in data science. You need to be able to write code so that you can instruct the computer in how to manipulate, analyze, and visualize your data. Programming languages such as Python and R are important for writing scripts for data manipulation, analysis, and visualization. SQL, on the other hand, is useful for data querying. Finally, the JavaScript library D3.js is often required for making cool, custom, and interactive web-based data visualizations.
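The division of labor described above, SQL for querying and Python for the rest, can be sketched in a few lines. This example uses Python's built-in sqlite3 module with an in-memory database. The sales table and its contents are made up for illustration.

```python
import sqlite3

# Build a throwaway in-memory database with a hypothetical sales table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [
        ("North", 120.0),
        ("South", 80.0),
        ("North", 50.0),
        ("South", 30.0),
        ("West", 200.0),
    ],
)

# SQL does the querying: total sales per region, largest first.
rows = conn.execute(
    "SELECT region, SUM(amount) AS total "
    "FROM sales GROUP BY region ORDER BY total DESC"
).fetchall()
conn.close()

# Python takes over for any further manipulation or reporting.
totals = dict(rows)
top_region = rows[0][0]
for region, total in rows:
    print(f"{region}: {total:.2f}")
```

In practice the database would be a production system rather than an in-memory one, but the pattern of querying with SQL and then analyzing the result in Python stays the same.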
Although coding is a requirement for data science, it doesn’t have to be the big, scary thing that people make it out to be. Your coding can be as fancy and complex as you want it to be, but you can also take a rather simple approach. Although these skills are paramount to success, you can pretty easily learn enough coding to practice high-level data science. I’ve dedicated Chapters 6 and 7 to helping you get started in Python and R and in querying with SQL, respectively.
Applying data science to a subject area
Statisticians once exhibited some measure of obstinacy in accepting the significance of data science. Many statisticians have cried out, “Data science is nothing new — it’s just another name for what we’ve been doing all along!” Although I can sympathize with their perspective, I’m forced to stand with the camp of data scientists who markedly declare that data science is separate, and definitely distinct, from the statistical approaches that comprise it.
My position on the unique nature of data science is based to some extent on the fact that data scientists often use computer languages not used in traditional statistics and take approaches derived from the field of mathematics. But the main point of distinction between statistics and data science is the need for subject matter expertise.
Because statisticians usually have only a limited amount of expertise in fields outside of statistics, they’re almost always forced to consult with a subject matter expert (SME) to verify exactly what their findings mean and to determine the best direction in which to proceed. Data scientists, on the other hand, should have strong subject matter expertise in the area in which they’re working. Data scientists generate deep insights and then use their domain-specific expertise to understand exactly what those insights mean with respect to the area in which they’re working.
The following list describes a few ways in which today’s knowledge workers are coupling data science skills with their respective areas of expertise in order to amplify the results they generate.
Clinical informatics scientists combine their healthcare expertise with data science skills to produce personalized healthcare treatment plans. They use healthcare informatics to predict and preempt future health problems in at-risk patients.
Marketing data scientists combine data science with marketing expertise to predict and preempt customer churn (in other words, the loss of customers from a product or service to a competitor’s). They also optimize marketing strategies, build recommendation engines, and fine-tune marketing mix models. I tell you more about using data science to increase marketing ROI in Chapter 11.
Data journalists scrape websites (extract data in bulk directly from the pages on a website, in other words) for fresh data in order to discover and report the latest breaking-news stories. (I talk more about data storytelling in Chapter 8.)
Directors of data science bolster their technical project management capabilities with an added expertise in data science. Their work includes leading data projects and working to protect the profitability of the data projects for which they’re responsible. They also act to ensure transparent communication between C-suite executives, business managers, and the data personnel on their team who actually do the implementation work. (I share more details in Part 4 about leading successful data projects; check out Chapter 18 for details about data science leaders.)
Data product managers supercharge their product management capabilities with the power of data science. They use data science to generate predictive insights that better inform decision-making around product design, development, launch, and strategy. This is a classic type of data leadership role, the likes of which are covered in Chapter 18. For more on developing effective data strategy, take a gander at Chapters 15 through 17.
Machine learning engineers combine software engineering superpowers with data science skills to build predictive applications. This is a classic data implementation role, more of which is discussed in Chapter 2.
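The web scraping mentioned in the data journalist item of the list above can be illustrated with a short sketch. To keep it self-contained, it parses a hard-coded sample page rather than fetching a live website. The page content and the HeadlineParser class are hypothetical. Only the standard library's html.parser module is used.

```python
from html.parser import HTMLParser


class HeadlineParser(HTMLParser):
    """Collect the text inside every <h2> element of a page."""

    def __init__(self):
        super().__init__()
        self.in_h2 = False
        self.headlines = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.in_h2 = True
            self.headlines.append("")  # start a new headline

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_h2 = False

    def handle_data(self, data):
        if self.in_h2:
            self.headlines[-1] += data


# A made-up page standing in for a downloaded news site.
SAMPLE_PAGE = """
<html><body>
<h2>City council approves budget</h2>
<p>Details of the vote...</p>
<h2>Local team wins final</h2>
<p>Match report...</p>
</body></html>
"""

parser = HeadlineParser()
parser.feed(SAMPLE_PAGE)
headlines = [h.strip() for h in parser.headlines]
print(headlines)
```

A real scraper would also download the pages, respect the site's terms of use, and cope with messier markup, so treat this as the extraction step only.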
Communicating data insights
As a data scientist, you must have sharp verbal communication skills. If a data scientist can’t communicate, all the knowledge and insight in the world do nothing for the organization. Data scientists need to be able to explain data insights in a way that staff members can understand. Not only that, data scientists need to be able to produce clear and meaningful data visualizations and written narratives. Most of the time, people need to see a concept for themselves in order to truly understand it. Data scientists must be creative and pragmatic in their means and methods of communication. (I cover the topics of data visualization and data-driven storytelling in much greater detail in Chapter