Smarter Data Science. Cole Stryker
organization may not have any indication as to the actual consumer of the product, representing a lack of depth.
Breadth and depth can be approximated in terms of percentages and mapped to an intersection. Figure 2-2 shows an example where a breadth sliver is shown to be approximately 75%, and a depth sliver is approximately 25%. The third box combines the breadth and depth slivers together.
In Figure 2-3, the quality of the information is graded against the breadth and depth. The diamond grid pattern indicates that the data quality is known to be poor. The diagonal stripes pattern indicates that the data quality is moderate, meaning the information may prove unreliable under specific conditions. The square grid pattern indicates that the information is of high quality and is reliable.
Figure 2-3: Grading
Therefore, even if the breadth and depth are not both 100%, the available data can be graded in the context of the information that is at hand.
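The mechanics of grading can be pictured in a short sketch. The function below is an illustration, not something prescribed by the book: it combines a breadth sliver and a depth sliver into a single coverage figure and maps that figure onto the three grades shown in Figure 2-3. The thresholds are assumed values chosen purely for illustration.

```python
# Illustrative sketch of grading data by breadth and depth.
# The combination rule and the thresholds below are assumptions,
# not values taken from the book.

def grade_data(breadth_pct: float, depth_pct: float) -> str:
    """Combine breadth and depth slivers and map them to a quality grade."""
    coverage = (breadth_pct / 100.0) * (depth_pct / 100.0)  # combined sliver
    if coverage >= 0.50:
        return "high"      # square grid pattern: reliable
    if coverage >= 0.15:
        return "moderate"  # diagonal stripes: unreliable in some conditions
    return "poor"          # diamond grid pattern

# Figure 2-2's example: roughly 75% breadth and 25% depth
print(grade_data(75, 25))
```

Under these assumed thresholds, the Figure 2-2 example lands in the moderate band, which matches the intuition that broad but shallow information can still be useful in specific conditions while remaining unreliable in others.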
Across the overall trust matrix, if the information for a particular need could be measured in terms of breadth and depth for each aspect across each time horizon and then graded, a person or a machine could evaluate the risk of consuming that information. By accurately quantifying risk in terms of how much is known and at what level of detail, an organization can pursue being data-driven with confidence, knowing that all subsequent actions or decisions are being made on the merit of the data.
Furthermore, a data-driven organization that uses its data as critical evidence to help inform and influence strategy will need a means to weigh options against any inherent risk. A data-driven organization must develop an evidence-based culture in which data can be evaluated against established measures of trust, and in which the analytics and AI performed against the data are deemed to be highly relevant, informative, and useful in determining next steps.
The Importance of Metrics and Human Insight
For organizations that make gut-feel decisions and are apprehensive about pursuing data-driven means, the ability to measure is vital. The ends and means model shown in Chapter 1, “Climbing the AI Ladder,” Figure 1-4 illustrates the necessity of balancing what ultimately needs to be measured against proxies that are themselves measurable and aligned to that end.
The use of AI requires an organization to become data-driven, especially when a person is in a decision-making loop. Machine-to-machine communication fosters the ability of a machine to act independently, making decisions based purely on the information at hand. Orchestrating a person into a communication flow allows that person to augment decisions and to act as a gatekeeper.
In the 1960s, the moniker Mad Men was supposedly coined by those working in the field of advertising, where—in the United States—the industry was heavily centered around Madison Avenue in New York City (the men of Madison Avenue). Mad Men created messages for the masses. Whether messages were regionally appropriate or whether an advertisement resonated exceptionally well with the discrete needs of each singular individual was not the core focus. Eventually, the gut feel of the Mad Men approach gave way to the focus group–oriented view of the Media Men. In turn, the Media Men have given way to the Math Men. The Math Men are the men and women of data science whose province is the horde of big data and thick data, along with the algorithms and machine learning that derive insight from that data. As the new-collar worker expands into all aspects of corporate work and draws on model-based outcomes, each decision is going to be based on data. New-collar workers are data-driven, and so are their decisions.
THE ZACHMAN FRAMEWORK
The six interrogatives—what, how, where, who, when, why—provide a methodical means toward inquiry. However, the use of the interrogatives in the Zachman Framework provides a structural device for the framework. Because the Zachman Framework is structural in nature and is not a methodology, the framework is actually an ontology for describing the enterprise.
The Zachman Framework is not a methodology because it is neither prescriptive nor anchored on a process. The framework is concerned with creating, operating, or changing essential components that are of interest to an enterprise. The components can be big or small and include the enterprise itself, a department, a cloud, an app, a container, a schema, and an AI model.
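The distinction between an ontology and a methodology can be made concrete with a small sketch. Below, the framework is pictured as a classification grid rather than a sequence of steps: columns are the six interrogatives, rows are stakeholder perspectives. The perspective labels follow common renderings of the framework, and the cell value is an illustrative placeholder, not an official model name.

```python
# Illustrative sketch: the Zachman Framework as a classification grid,
# not a process. Perspective labels follow common renderings of the
# framework; the cell content below is a placeholder.

INTERROGATIVES = ["what", "how", "where", "who", "when", "why"]
PERSPECTIVES = ["executive", "business management", "architect",
                "engineer", "technician", "enterprise"]

# One cell per (perspective, interrogative) pair: a 6 x 6 grid.
grid = {(p, q): None for p in PERSPECTIVES for q in INTERROGATIVES}

# Classifying a component means filling a cell, not following a process step.
grid[("architect", "what")] = "logical data model (placeholder)"

print(len(grid))  # 36 cells
```

The point of the sketch is that nothing here orders the work: any cell can be filled at any time, which is what makes the framework descriptive rather than prescriptive.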
Democratizing Data and Data Science
Despite the explosion of interest in data collection and storage, many organizations intentionally relegate data science knowledge to a small, discrete number of employees. While organizations must foster areas of specialization, the tendency to restrict the data scientist label to a small cohort of employees seems to stem from a misguided belief that AI is somehow magic.
In the long term, neither data science nor AI should be the sole purview of the data scientist. The democratization of data science involves opening up the fundamentals of data science to a broader set of employees, paving the way for the establishment of new roles, including the citizen data scientist.
For example, a citizen data scientist would “create or generate [AI] models that use advanced diagnostic analytics or predictive and prescriptive capabilities, and whose primary job function is outside the field of statistics and analytics” (www.gartner.com/en/newsroom/press-releases/2017-01-16-gartner-says-more-than-40-percent-of-data-science-tasks-will-be-automated-by-2020). Citizen data scientists would extend the type of analytics that can be associated with self-service paradigms that are offered by organizations.
A citizen data scientist is still able to make use of advanced analytics without having all of the skills that characterize the conventional data scientist. Skills associated with a conventional data scientist would include proficiency in a programming language, such as Python or R, and applied knowledge of advanced-level math and statistics. By contrast, a citizen data scientist may possess intrinsic and expert domain knowledge that the data scientist does not possess. When a data scientist does additionally possess domain knowledge, they are jokingly referred to as a unicorn.
Attempting to relegate the handling of data associated with AI to a small specialized team of people within a company can be fraught with challenges. Some data scientists may find it wearisome to communicate insight and nuance to other employees who lack specific data literacy skills, such as the ability to read and work with digitized data. Business stakeholders can become frustrated when data requests are not addressed quickly or appear to fail at addressing their questions.
Many software tools that have been designed for use by the data scientist community end up residing solely within each data science team. But while logical, creating a silo of data software tools and restricting tool access to a small team (such as a team of data scientists) can create its own dilemma. All departments across an organization can generate analytical needs. Each need can span a spectrum of complexity, from ultra-simple to insanely tricky. But realistically, not every requirement is going to be anchored on the insanely tricky end of the analytical need spectrum. Many needs may be solvable or addressable by someone with basic analytical training. By instituting the citizen data scientist role, organizations can better reserve initiatives that genuinely require the deep expertise of the data scientist.
Democratizing data science empowers as many people as possible to make data-driven decisions. Empowering begins with education and is sustained through continual education. If AI is to impact 100% of all future jobs, education on AI and data literacy (data literacy is addressed in Chapter 7, “Maximizing the Use of Your Data: Being Value Driven;” statistical literacy is covered in