Dynamically Creating Code with SAS

The ability to dynamically create code allows you to make incredibly flexible systems for processing data. For example, on a project Butterfly Data has worked on, we built a data quality system which checked the quality of 190+ data types, each of which had different formats. When it came to importing this data, obviously we could write 190 different import scripts, or we could use the metadata to dynamically write code that does that for us. (Note – this blog does assume a situation with good quality metadata, which you ideally would have).

Let’s assume you already have your data imported into SAS, but your dataset doesn’t have the proper names or formats. Of course, you could manually do this:

data your_data_2;

   set your_data_1;

   rename col1 = your_col1 col2= your_col2 … ;

   attrib col1 informat=$char30. col2 informat=MMDDYY10. … ;

run;

This step works fine when using one dataset. But as I mentioned earlier, what if you need to build a system that can accept 100s of different types of datasets? Well, assuming you had fairly simple metadata, it could look something like this:

column_sequence column_name column_format
1 your_col1 $char30.
2 your_col2 MMDDYY10.


This would then allow you to format a dataset using dynamically created code. An argument could be made that maintaining metadata is as much effort as maintaining code, but generally you should have metadata already (which is already half the battle). Furthermore, this means that if you want to change something in the import process, it can just be changed in one place in the code, rather than in many more places.

Now, how can we use this metadata to format our dataset?

Using the metadata, we want to create a column, which is in SAS code, as below:

data your_metadata_2;

   your_metadata_1;

   field_name = cat(“COL”, strip(column_sequence), ” =  “,  strip(column_name));

   field_format = cat(strip(column_name),” informat=”,strip(column_format));

run; 

For the first row of our metadata, described above, this would create the two statements “COL1 = your_col1” and “your_col1 informat=$CHAR30.” for field_name and field_format respectively.

So now we have dynamically created SAS code using metadata! But you may have noticed we’ve not actually run the code yet. This is a two step trick using the incredibly versatile “into” statement in SAS’s PROC SQL, to put all the code into a macro variable and then running the macro variable, as code, in a data step. The second steps work because the macro variable is compiled, before the data step is executed, and hence you can use the macro variable to dynamically create code! This would look something like this:

proc sql noprint;

   select field_name, field_format

   into :field_names separated by ‘ ‘, :field_formats separated by ‘ ‘

   from your_metadata_2;

quit;

data your_data_2;

   set your_data_1;

   rename &field_names.;

   attrib &field_formats.;

run;

The first step produces macro variables, which for the “field_names” macro variable would be a string something like this: “COL1 = your_col1 COL2 = your_col2 ...” Now imagine you were doing this to a dataset with 100 fields, you’ve saved yourself a lot of typing (and let’s be honest, mistyping incredibly basic words, which is where most of us developers make mistakes so avoid typing things out manually at all cost).

Now we have code that can change the names and formats of any dataset inputted to the system, using the metadata. This means you could even get your system to process a new format of data without making any code changes at all. 

Once you can get your head round how to use the PROC SQL “into” statement in combination with SAS macro, you can dynamically create really complicated pieces of code. This will allow you to develop metadata driven data processing systems, with great flexibility and reliability.

Previous
Previous

Happy Holidays from Butterfly Data!

Next
Next

Butterfly Data Automates Forecast Process from 5 Days to 2