SAS Viya. Kevin D. Smith

63.667470 19.236588

                 ProbT
      0  3.331256e-129
      1  4.374977e-129
      2   1.994305e-57
      3   3.209704e-42

      + Elapsed: 0.0256s, user: 0.019s, sys: 0.009s, mem: 1.74mb

      The summary action displays summary statistics in a form that is familiar to SAS users. If you want them in a form similar to what Pandas users are used to, you can use the describe method (just like on DataFrames).

      In [20]: iris.describe()

       Out[20]:

             SepalLength  SepalWidth  PetalLength  PetalWidth
      count   150.000000  150.000000   150.000000  150.000000
      mean      5.843333    3.054000     3.758667    1.198667
      std       0.828066    0.433594     1.764420    0.763161
      min       4.300000    2.000000     1.000000    0.100000
      25%       5.100000    2.800000     1.600000    0.300000
      50%       5.800000    3.000000     4.350000    1.300000
      75%       6.400000    3.300000     5.100000    1.800000
      max       7.900000    4.400000     6.900000    2.500000

      Note that when you call the describe method on a CASTable object, it calls various CAS actions in the background to do the calculations. This includes the summary, percentile, and topk actions. The output of those actions is combined into a DataFrame in the same form that the real Pandas DataFrame describe method returns. This enables you to use CASTable objects and DataFrame objects interchangeably in your workflow for this method and many other methods.
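      To make the idea of combining several results into one describe-style table more concrete, here is a minimal local sketch. It does not call CAS or SWAT at all: it uses only the Python standard library, the helper name describe_column is hypothetical, and the quartile convention (method="inclusive") is an assumption chosen for the illustration rather than a statement about what the server computes.

```python
import statistics


def describe_column(values):
    """Merge separately computed statistics into one describe-style dict.

    Hypothetical helper for illustration only; not part of SWAT or Pandas.
    """
    data = sorted(values)

    # Step 1: moment-style statistics, computed in one pass of their own.
    summary = {
        "count": float(len(data)),
        "mean": statistics.mean(data),
        "std": statistics.stdev(data),  # sample standard deviation
        "min": data[0],
        "max": data[-1],
    }

    # Step 2: quartiles, computed separately from the statistics above.
    q25, q50, q75 = statistics.quantiles(data, n=4, method="inclusive")
    percentiles = {"25%": q25, "50%": q50, "75%": q75}

    # Step 3: merge both results in the row order that describe displays.
    order = ["count", "mean", "std", "min", "25%", "50%", "75%", "max"]
    merged = {**summary, **percentiles}
    return {name: merged[name] for name in order}


result = describe_column([1.0, 2.0, 3.0, 4.0, 5.0])
for name, value in result.items():
    print(f"{name:<6}{value:>10.6f}")
```

      The point of the sketch is only the shape of the workflow: independent computations produce partial results, and a final step arranges them into the familiar count, mean, std, min, quartiles, max layout.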

      Since the tables that come back from the CAS server are subclasses of Pandas DataFrames, you can do anything to them that works on DataFrames. You can plot the results of your actions using the plot method or use them as input to more advanced packages such as Matplotlib and Bokeh, which are covered in more detail in a later section.

      The following example uses the plot method to download the entire data set and plot it using the default options.

      In [21]: iris.plot()

      Out[21]: <matplotlib.axes.AxesSubplot at 0x5339050>

      If the plot doesn’t show up automatically, you might have to tell Matplotlib to display it.

      In [22]: import matplotlib.pyplot as plt

      In [23]: plt.show()

      The output that is created by the plot method follows.

      [Figure: plot produced by iris.plot()]

      Even if you loaded the same data set that we used in this example, your plot might look different. CAS stores data in a distributed manner, so the order in which rows come back from the server is not deterministic unless you sort them when they are fetched. The following command plots the data sorted by SepalLength and SepalWidth.

      In [24]: iris.sort_values(['SepalLength', 'SepalWidth']).plot()

      [Figure: plot produced by iris.sort_values(['SepalLength', 'SepalWidth']).plot()]
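      The effect of sorting on row order can be illustrated without a server. The sketch below is a local simulation under stated assumptions: the rows are made-up values (not the real iris data), and the hypothetical fetch function simply shuffles them to stand in for rows arriving from a distributed table in varying order.

```python
import random

# Made-up (SepalLength, SepalWidth) rows; not the real iris data.
rows = [(5.1, 3.5), (4.9, 3.0), (5.1, 3.3), (4.7, 3.2), (4.9, 2.4)]


def fetch(seed):
    """Stand-in for a fetch whose row order can differ from call to call."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    return shuffled


first = fetch(1)
second = fetch(2)


def sort_key(row):
    # Sort on SepalLength first, then SepalWidth to break ties.
    return (row[0], row[1])


sorted_first = sorted(first, key=sort_key)
sorted_second = sorted(second, key=sort_key)

# Whatever order the rows arrived in, the sorted results agree.
print(sorted_first == sorted_second)  # prints True
```

      Sorting on both columns matters here: SepalLength alone leaves ties, and tied rows could still appear in a different order from one fetch to the next.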

      As with any network or file resource in Python, you should close your CAS connections when you are finished. They time out and disappear eventually if left open, but it’s always a good idea to clean them up explicitly.

      In [25]: conn.close()
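      One general Python idiom for making sure that close is always called, even when an error occurs partway through, is contextlib.closing from the standard library. The sketch below uses a hypothetical FakeConnection class as a stand-in so that it runs without a server; the only assumption it makes about a real connection object is that it has a close method, as shown above.

```python
from contextlib import closing


class FakeConnection:
    """Hypothetical stand-in for a connection; only close() matters here."""

    def __init__(self):
        self.is_open = True

    def run(self):
        # Pretend to do some work that requires an open connection.
        if not self.is_open:
            raise RuntimeError("connection is closed")
        return "ok"

    def close(self):
        self.is_open = False


fake_conn = FakeConnection()

# closing() calls fake_conn.close() when the block exits,
# whether it exits normally or because of an exception.
with closing(fake_conn) as c:
    status = c.run()

print(status, fake_conn.is_open)  # prints: ok False
```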

      Hopefully this 10-minute guide was enough to give you an idea of the basic workflow and capabilities of the Python CAS client. In the following chapters, we dig deeper into the details of the Python CAS client and how to blend the power of SAS analytics with the tools that are available in the Python environment.

      Chapter 3: The Fundamentals of Using Python with CAS

       Connecting to CAS

       Running CAS Actions

       Specifying Action Parameters

       CAS Action Results

       Working with CAS Action Sets

       Details

       Getting Help

       Dealing with Errors

       SWAT Options

       CAS Session Options

       Conclusion

      The SAS SWAT package includes an object-oriented interface to CAS as well as utilities to handle results, format data values, and upload data to CAS. We have already covered the installation of SWAT in an earlier chapter, so let’s jump right into connecting to CAS.

      There is a lot of detailed information about parameter structures, error handling, and authentication in this chapter. If you feel like you are getting bogged down, you can always skim over this chapter and come back to it later when you need more formal information about programming using the CAS interface.

      In order to connect to a CAS host, you need some form of authentication. There are various authentication mechanisms that you can use with CAS. The different forms of authentication are beyond the scope of this book, so we use user name and password authentication in all of our examples. This form of authentication assumes that you have a login account on the CAS server that you are connecting to. The disadvantage of using a user name and password is that you typically include your password in the source code. However, Authinfo is a solution to this problem, so we’ll show you how to store authentication information using Authinfo as well.
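      As a rough preview of the Authinfo idea, the sketch below writes a one-line entry to a file and restricts its permissions to the owner. Treat it as an illustration under assumptions rather than a reference: the host name, port, user name, and password are placeholders, the entry layout (host, port, user, password keywords on one line) reflects our understanding of the Authinfo convention and should be checked against the SAS documentation, and a temporary directory stands in for your home directory so that nothing real is overwritten. The permission step assumes a POSIX system.

```python
import os
import stat
import tempfile

# Placeholder values for illustration only; substitute your own.
entry = "host cas.example.com port 5570 user myuser password mypassword\n"

# A temporary directory stands in for the home directory in this sketch.
home = tempfile.mkdtemp()
authinfo_path = os.path.join(home, ".authinfo")

with open(authinfo_path, "w") as f:
    f.write(entry)

# The file holds a password, so allow only the owner to read and write it.
os.chmod(authinfo_path, stat.S_IRUSR | stat.S_IWUSR)

mode = stat.S_IMODE(os.stat(authinfo_path).st_mode)
print(authinfo_path, oct(mode))
```

      Keeping the password in a file like this, rather than in the program text, means the source code can be shared or checked into version control without exposing credentials.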

      Let’s make a connection to CAS using an explicit user name and a password. For this example, we use an IPython shell. As described previously, to run IPython, you use the ipython command from a command shell or the Anaconda menu in Windows.

      The first thing you need to do after starting IPython is to import the SWAT package. This package contains a class called CAS that is the primary interface to your CAS server. It requires at least two arguments: the CAS host name or IP address, and the port number that CAS is running on. Since we use user name and password authentication, we must specify them as the next

