Excel Power Pivot & Power Query For Dummies. Michael Alexander
Читать онлайн книгу.table is divided into rows (called records) and columns (called fields), with the first row (the heading on top of each column) containing the names of the fields in the database.
Each row is a single record containing fields that are related to that record. In a manual system, the rows are individual forms (sheets of paper), and the fields are equivalent to the blank areas on a printed form that you fill in.
Each column is a field that includes many properties specifying the type of data contained within the field and how the database should handle the field’s data. These properties include the name of the field (Company) and the type of data in the field (Text). A field may include other properties as well. For example, the Address field’s Size property tells the database the maximum number of characters allowed for the address.
At the intersection of a record and a field is a value — the actual data element. For example, in a field named Company, a company name entered into that field would represent one data value.
When working with Microsoft Access, the term field is used to refer to an attribute stored in a record. In many other database systems, including SQL Server, column is the expression you hear most often in place of field — field and column mean the same thing. The exact terminology that’s used relies somewhat on the context of the database system underlying the table containing the record.
Queries
Most relational database systems allow the creation of queries (sometimes called views). A query extracts information from the tables in the database; a query selects and defines a group of records that fulfill a certain condition. Most database outputs are based on queries that combine, filter, or sort data before it’s displayed. Queries are often called from other database objects, such as stored procedures, macros, or code modules. In addition to extracting data from tables, queries can be used to change, add, or delete database records.
An example of a query is when a person at the sales office tells the database, “Show me all customers, in alphabetical order by name, who are located in Massachusetts and who made a purchase over the past six months.” Or “Show me all customers who bought Chevrolet car models within the past six months, and display them sorted by customer name and then by sale date.”
Rather than ask the question using English words, a person uses a special syntax, such as Structured Query Language (or SQL), to communicate to the database what the query will need to do.
Understanding Relationships
After you understand the basic terminology of databases, it’s time to focus on one of their more useful features: A relationship is the mechanism by which separate tables are related to each other. You can think of a relationship as a kind of VLOOKUP, in which you relate the data in one data range to the data in another data range using an index or a unique identifier. In databases, relationships do the same thing, but without the hassle of writing formulas.
Relationships are important because most of the data you work with fits into a multidimensional hierarchy of sorts. For example, you may have a table showing customers who buy products. These customers require invoices that have invoice numbers. Those invoices have multiple lines of transactions listing what they bought. A hierarchy exists there.
Now, in the one-dimensional spreadsheet world, this data typically would be stored in a flat table, like the one shown in Figure 1-1.
Because customers have more than one invoice, the customer information (in this example, CustomerID and CustomerName) has to be repeated. This causes a problem when that data needs to be updated.
FIGURE 1-1: Data is stored in an Excel spreadsheet using a flat-table format.
For example, imagine that the name of the company Aaron Fitz Electrical changes to Fitz and Sons Electrical. Looking at Figure 1-1, you see that multiple rows contain the old name. You would have to ensure that every row containing the old company name is updated to reflect the change. Any rows you miss will not correctly map back to the right customer.
Wouldn’t it be more logical and efficient to record the name and information of the customer only one time? Then, rather than have to write the same customer information repeatedly, you could simply have some form of customer reference number.
This is the idea behind relationships. You can separate customers from invoices, placing each in their own tables. Then you can use a unique identifier (such as CustomerID) to relate them together.
Figure 1-2 illustrates how this data would look in a relational database. The data would be split into three separate tables: Customers, InvoiceHeader, and InvoiceDetails. Each table would then be related using unique identifiers (CustomerID and InvoiceNumber, in this case).
FIGURE 1-2: Databases use relationships to store data in unique tables and simply relate these tables to each other.
The Customers table would contain a unique record for each customer. That way, if you need to change a customer’s name, you would need to make the change in only that record. Of course, in real life, the Customers table would include other attributes, such as customer address, customer phone number, and customer start date. Any of these other attributes could also be easily stored and managed in the Customers table.
The most common relationship type is a one-to-many relationship. That is, for each record in one table, one record can be matched to many records in a separate table. For example, an invoice header table is related to an invoice detail table. The invoice header table has a unique identifier: Invoice Number. The invoice detail will use the Invoice Number for every record representing a detail of that particular invoice.
Another kind of relationship type is the one-to-one relationship: For each record in one table, one and only one matching record is in a different table. Data from different tables in a one-to-one relationship can technically be combined into a single table.
Finally, in a many-to-many relationship, records in both tables can have any number of matching records in the other table. For instance, a database at a bank may have a table of the various types of loans (home loan, car loan, and so on) and a table of customers. A customer can have many types of loans. Meanwhile, each type of loan can be granted to many customers.
If your head is spinning from all this database talk, don’t worry. You don’t need to be an expert database modeler to use Power Pivot. But it’s important to understand these concepts. The better you understand how data is stored and managed in databases, the more effectively you’ll leverage Power Pivot for reporting.
Chapter 2
Introducing Power Pivot
IN THIS CHAPTER
Getting to know the Internal Data Model
Activating the Power Pivot add-in