SAS Viya Tips: Finding and Utilising the Number of Rows in a CAS Table Through Macros

Today we will be having a look into troubleshooting the SAS Viya CAS tools.

This article might be somewhat more specific than a few of our broader discussions here. However, SAS Viya is a reasonably new tool, and consequently there is not currently a massive body of content on the use of CAS-enabled procedures available on the internet. This also means that everyone’s favourite code debuggers, such as ChatGPT, are unable to provide useful assistance, an example of a time where generative AI isn’t always the answer.

Often it can be useful to be able to find out the number of rows in a dataset, for example, you may have conditional logic which will only execute certain parts of your program if a particular dataset is not empty.

I was working on a project using SAS Viya 3.5. I found that many of the options normally used in SAS to count rows may not actually be CAS-enabled. For example I had a piece of code that utilised the automatic data step variable “_N_” to check if there were any rows in a dataset. This variable wasn’t CAS enabled, and it hence forced CAS to not utilise multiple workers, and caused our code to fail on very large datasets!

So we managed to combine ideas from a few different places, to figure out how to efficiently count the number of rows in a CAS table, and put this value into a macro variable to be able to utilise the value elsewhere in the code.

Example Code

proc cas; /* 1 */

table.recordCount result=obs /                /* 2 */  

table={caslib="your_CASLIB" name="your_table"}; /* 3 */

  call symputx('nobs',obs.recordCount[1,1],"L");  /* 4 */

quit;

  1. To run Proc Cas you will need to have an active CAS session.

  2. This step tells Proc Cas to take the record count of the table, and place it into a variable called “obs”.

  3. This step selects the table from the CASLIB and the table name. Note that the table needs to be loaded into memory for this to work.1

  4. This step extracts the actual record count from the variable obs (“obs” is a dictionary - see here for more info), and then places it into the macro variable “nobs”.2

Following this code, the macro variable “nobs” will now have the number of rows in the dataset specified. Another benefit to this approach, as opposed to more traditional SAS methods, is that using CAS tends to be much more efficient than SAS, as demonstrated in this great post.

I hope this helped and if you found this interesting, then why not check out some of our other articles on CAS in the links above. If you have got any further questions, then feel free to get in touch with me at martin@butterflydata.co.uk.

Finally, just two things worth noting about this walkthrough:

  1. I believe there’s no way to assess the number of rows in a dataset that is not loaded, as SAS has no way of viewing the dataset, however, it could be possible to interact with the the metadata of a database, depending on your CASLIB connection.

  2. Note that when using CAS datasets, you have to specify the macro variable scope during call symputx, or it will return a null value, I believe.

Previous
Previous

International Women's Day 2024 - #InspireInclusion

Next
Next

Hackolade - Another View of Data Management