SAS Data Management Server integration

Admittedly, this is a niche interest article, but I hope it will be helpful to anyone who has the need to use data jobs within SAS Data Management Studio (DMStudio) and SAS Data Management Server (DMServer) or simply for those who are interested in learning a little more about both of these suites.

For context, recently a client had a particular requirement to use DMServer to conduct repeatable data quality (DQ) assessment on a variety of datasets. The design was developed for data to be prepared and have enriched metadata appended using SAS code (SAS Studio, Enterprise Guide) before being passed to DMServer to score the DQ. To enable the integration of DMServer functions within SAS code there is a PROC function (PROC DMSRVDATASVC), this blog details a list of findings/feature of using the DMSRVDATASVC function that are not immediately apparent from the SAS documentation. 

  1. The use and format of PARMLIST – an example of the format of a PARMLIST is [PARMLIST= "'V1'='Name_Prefix','V2'='Name_First'"]. The value pairs need to have a matching variable configured in the DMS data job. Once variables are passed in this method then they can be used in the data job by encasing them in double % (e.g. %%V1%%).

  2. The limitation of pass through variables - variables passed into a data job from a PARMLIST are like Macro variables that can be set in the DMS config files. Through development a key difference was identified, macro variables are available when the data job is called and can be used to configure the job (e.g. naming fields, naming embedded data job parameters) but PARMLIST variables appear to only be set after the data job has been compiled. While this can be accounted for in the design it is a difference worth knowing before you start using this function.

  3. The need to use TRIM – the TRIM statement can be added to the PROC DMSRVDATASVC statement. As documented, this option trims the leading and trailing white space from the fields being passed to DMServer. If this function is not used then the function will left field the data by adding white space to the right of the field, so that the field matches the defined SAS field length - i.e. for a String $8, with contents “test” will be received by DMServer as “test “. This can prevent the correct DQ assessment of the original data. The downside of the TRIM option is the masking of actual leading and trailing spaces in the data.

  4. Embedded data jobs – to enable modular rule development, embedded data jobs were used within a primary data job.

    1. Non absolute referencing - when developing a data job in DMStudio the path of embedded data jobs must be specified, the location is written with a variables in place to point to the dataflux repository location on either DMStudio or DMServer (e.g. dfr:///data_services/V1/address(DataJob)) the dfr:// resolves in either DMStudio or DMServer so the data job is able to be deployed to a server and still function. There is an issue with this reference hardcoding into the job the directory of the called job, e.g. V1 in the previous example. If the primary data job is moved to another directory (e.g. V2) it will still be pointing to the original embedded data job. While this might be convenient in some cases it can cause issues in other e.g. development branches. It is possible to instead use a relative path, this is possible by changing the file location to point to the embedded data job file, i.e. change    dfr:///data_services/V1/address(DataJob) to address.ddf. This will work if the embedded data job is in the same directory as the primary data job.

    2. No macro variables when naming – as mentioned in the PARMLIST section above, it is not possible to use variables to name the imbedded file, i.e.  %%V1%%.ddf does not work.

    3. KEY_VALUES to pass through variables – the KEY_VALUES option in an embedded job works the same as a PARMLIST in PROC DMSRVDATASVC statement. It is able to map variables from a primary data job into an embedded data job. Without passing variables in this way the variables will not be available in embedded data jobs.

  5. Reference data – to validate data against a defined list of values it is possible to import the list of values to a data job. Unlike embedded data jobs, the imported data node will work with variables. This allows for the PROC DMSRVDATASVC statement to pass in a variable that will define the list of values to use for a particular data job.

Hopefully this quick look at the DMSRVDATASVC function in SAS DMStudio and DMServer will be useful and will lead to more effective use of these SAS suites in your day to day work.

Previous
Previous

VS Code for Python Programming

Next
Next

Every Cloud has a beneficial lining