SAS Viya. Kevin D. Smith
Читать онлайн книгу.63.667470 19.236588
ProbT
0 3.331256e-129
1 4.374977e-129
2 1.994305e-57
3 3.209704e-42
+ Elapsed: 0.0256s, user: 0.019s, sys: 0.009s, mem: 1.74mb
The summary action displays summary statistics in a form that is familiar to SAS users. If you want them in a form similar to what Pandas users are used to, you can use the describe method (just like on DataFrames).
In [20]: iris.describe()
Out[20]:
SepalLength SepalWidth PetalLength PetalWidth
count 150.000000 150.000000 150.000000 150.000000
mean 5.843333 3.054000 3.758667 1.198667
std 0.828066 0.433594 1.764420 0.763161
min 4.300000 2.000000 1.000000 0.100000
25% 5.100000 2.800000 1.600000 0.300000
50% 5.800000 3.000000 4.350000 1.300000
75% 6.400000 3.300000 5.100000 1.800000
max 7.900000 4.400000 6.900000 2.500000
Note that when you call the describe method on a CASTable object, it calls various CAS actions in the background to do the calculations. This includes the summary, percentile, and topk actions. The output of those actions is combined into a DataFrame in the same form that the real Pandas DataFrame describe method returns. This enables you to use CASTable objects and DataFrame objects interchangeably in your workflow for this method and many other methods.
Data Visualization
Since the tables that come back from the CAS server are subclasses of Pandas DataFrames, you can do anything to them that works on DataFrames. You can plot the results of your actions using the plot method or use them as input to more advanced packages such as Matplotlib and Bokeh, which are covered in more detail in a later section.
The following example uses the plot method to download the entire data set and plot it using the default options.
In [21]: iris.plot()
Out[21]: <matplotlib.axes.AxesSubplot at 0x5339050>
If the plot doesn’t show up automatically, you might have to tell Matplotlib to display it.
In [22]: import matplotlib.pyplot as plt
In [23]: plt.show()
The output that is created by the plot method follows.
Even if you loaded the same data set that we have used in this example, your plot might look different since CAS stores data in a distributed manner. Because of this, the ordering of data from the server is not deterministic unless you sort it when it is fetched. If you run the following commands, you plot the data sorted by SepalLength and SepalWidth.
In [24]: iris.sort_values(['SepalLength', 'SepalWidth']).plot()
Closing the Connection
As with any network or file resource in Python, you should close your CAS connections when you are finished. They time out and disappear eventually if left open, but it’s always a good idea to clean them up explicitly.
In [25]: conn.close()
Conclusion
Hopefully this 10-minute guide was enough to give you an idea of the basic workflow and capabilities of the Python CAS client. In the following chapters, we dig deeper into the details of the Python CAS client and how to blend the power of SAS analytics with the tools that are available in the Python environment.
1 Later in the book, we show you how to store your password so that you do not need to specify it in your programs.
Chapter 3: The Fundamentals of Using Python with CAS
The SAS SWAT package includes an object-oriented interface to CAS as well as utilities to handle results, format data values, and upload data to CAS. We have already covered the installation of SWAT in an earlier chapter, so let’s jump right into connecting to CAS.
There is a lot of detailed information about parameter structures, error handling, and authentication in this chapter. If you feel like you are getting bogged down, you can always skim over this chapter and come back to it later when you need more formal information about programming using the CAS interface.
Connecting to CAS
In order to connect to a CAS host, you need some form of authentication. There are various authentication mechanisms that you can use with CAS. The different forms of authentication are beyond the scope of this book, so we use user name and password authentication in all of our examples. This form of authentication assumes that you have a login account on the CAS server that you are connecting to. The disadvantage of using a user name and password is that you typically include your password in the source code. However, Authinfo is a solution to this problem, so we’ll show you how to store authentication information using Authinfo as well.
Let’s make a connection to CAS using an explicit user name and a password. For this example, we use an IPython shell. As described previously, to run IPython, you use the ipython command from a command shell or the Anaconda menu in Windows.
The first thing you need to do after starting IPython is to import the SWAT package. This package contains a class called CAS that is the primary interface to your CAS server. It requires at least two arguments: CAS host name or IP address, and the port number that CAS is running on1. Since we use user name and password authentication, we must specify them as the next