
  • © 2020. All rights reserved. IQVIA® is a registered trademark of IQVIA Inc. in the United States, the European Union, and various other countries.

    Lisa Mendez, PhD

    Andrew T. Kuligowski

    Using Base SAS® to Automate

    Quality Checks of Excel ®

    Workbooks that have Multiple

    Worksheets

  • 1

    • The Process

    • Determine how to identify the smaller problems within the larger, overwhelming problem

    • Solve each problem using SAS code

    • Putting it all together

    • Implementing the code

    • Lessons Learned

    Overview

  • 2

    • Unfamiliar with the data – thrown into the deep end

    • Five Markets

    - ADHD, BNZD, CNNB, CDNE, and PAIN

    • Each market had seven (7) Excel workbooks to be checked

    • Each workbook had multiple worksheets (variables were the same, but worksheet names were different for each market)

    - ADHD – 7 worksheets

    - BNZD – 24 worksheets

    - CNNB – 7 worksheets

    - CDNE – 5 worksheets

    - PAIN – 55 worksheets

    Background

  • 3

    Background

    • Sample (note: the number of workbooks has increased since the original paper and presentation were written)

  • 4

    Background

    • Sample of variable names (partial) and worksheet names (partial)

  • 5

    • Let’s do the math…

    • 5 markets multiplied by 7 workbooks (35 workbooks) that had a total of 98 worksheets that needed to be checked

    - That is 3,430 worksheets

    - For 27 quarters!!!!

    - For a grand total of 92,610 worksheets

    • That can be just a little bit overwhelming!

    The Overarching Problem
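
    The arithmetic on this slide can be reproduced in a short DATA step. This is only a sketch that restates the slide's own figures; the variable names are invented for illustration.

    ```sas
    data _null_;
        workbooks     = 5 * 7;                  /* 5 markets x 7 workbooks each */
        sheets_listed = sum(7, 24, 7, 5, 55);   /* per-market worksheet counts from the Background slide */
        per_quarter   = workbooks * sheets_listed;
        all_quarters  = per_quarter * 27;       /* 27 quarters */
        put workbooks= sheets_listed= per_quarter= all_quarters=;
    run;
    ```

    The log should show 35, 98, 3430, and 92610, matching the totals quoted above.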

  • 6

    1. lexjansen.com (more information at the end of the discussion)

    2. SAS Communities

    3. SAS Support

    4. University websites

    Research

  • 7

    • XLSX Engine

    - Allows you to read and write Microsoft Excel files as if they were data sets in a library

    - Advantage: it accesses the XLSX file directly and does not use the Microsoft data APIs as a go-between

    - You must have a license for SAS/ACCESS to PC Files to use the XLSX engine

    - In SAS University Edition, the SAS/ACCESS product is part of the package

    Getting the Data into SAS

    libname Cadhd1 XLSX "C:\Users\lmendez\Documents\RMPDC\Deliverables2017_Q2\ADHD\RMPD_Patient Tracking_ADHD_NDW_2018Q2.xlsx";

  • 8

    Getting the Data into SAS

    libname Cadhd1 XLSX "C:\Users\lmendez\Documents\RMPDC\Deliverables2017_Q2\ADHD\RMPD_Patient Tracking_ADHD_NDW_2018Q2.xlsx";

    The libname statement sets up the datasets, and you will see them in the cadhd1 library, but the datasets will be empty

    Names of datasets are the names of the worksheets

  • 9

    • Using PROC SQL and SAS Dictionary Tables

    Loading the Data

  • 10

    Loading the Data

    Note: the libref must be in all caps in the WHERE clause: where libname="CADHD1"
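
    The code shown on these slides did not survive in the transcript. A minimal sketch of the approach, assuming the CADHD1 libref is already assigned and using the macro variable names that appear in the log output (snamlist_1 and n_1), might look like this. The '*' delimiter is inferred from that log output; everything else is an assumption.

    ```sas
    /* Assumes the CADHD1 libref from the LIBNAME statement is assigned. */
    proc sql noprint;
        select memname
            into :snamlist_1 separated by '*'
            from dictionary.tables
            where libname = "CADHD1"      /* librefs are stored in upper case */
              and memtype = "DATA";
    quit;

    %let n_1 = &sqlobs;   /* number of rows returned by the SELECT above */

    %put &snamlist_1;     /* show the list of worksheet names in the log */
    %put &n_1;            /* show the count in the log */
    ```

    Storing the count alongside the list makes it easy to loop over the worksheet names later inside a macro.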

  • 11

    Loading the Data

  • 12

    Loading the Data

  • 13

    • Macro variables (will be used in the macro)

    Loading the Data

    Output from the log:

    53 %put &snamlist_1; /* show the macro variable snamlist in the log */

    LOOKUP*STATE_SUBGRP*STATE_SUPERGRP*ZIP_SUBGRP_AMPH*ZIP_SUBGRP_METH*ZIP_SUBGRP_OTH_ANAL*ZIP_SUBGRP_OTH_ANTI*ZIP_SUPER

    54 %put &n_1; /* show the macro variable n_1 in the log */

    8

  • 14

    • SAS Macro

    Loading the Data

    54 %put &n_1; /* show the macro variable n_1 in the log */

    8

    LOOKUP*STATE_SUBGRP*STATE_SUPERGRP*ZIP_SUBGRP_AMPH*ZIP_SUBGRP_METH*ZIP_SUBGRP_OTH_ANAL*ZIP_SUBGRP_OTH_ANTI*ZIP_SUPER
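
    The macro itself is not reproduced in the transcript. One plausible sketch, assuming the goal is to copy each worksheet into WORK and that the worksheet names are valid SAS names, loops over the '*'-delimited list with %SCAN. The macro name load_sheets and its parameters are hypothetical.

    ```sas
    /* Assumes &snamlist_1 and &n_1 were created beforehand, as in the log above. */
    %macro load_sheets(inlib=CADHD1, outlib=WORK);
        %local i sname;
        %do i = 1 %to &n_1;
            /* pick the i-th worksheet name from the '*'-delimited list */
            %let sname = %scan(&snamlist_1, &i, *);

            data &outlib..&sname;
                set &inlib..&sname;
            run;
        %end;
    %mend load_sheets;

    %load_sheets()
    ```

    Worksheet names that contain spaces or special characters would need name literals or renaming first, which is the problem the next slide addresses.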

  • 15

    • Lessons Learned

    - Vince DelGobbo, when consulted, suggested using PROC DATASETS to copy the worksheets into a library

    - Different ways to deal with invalid worksheet names

    › Rename the dataset

    › Delete the invalid SAS dataset

    Loading the Data
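
    As a rough illustration of these lessons, the sketch below copies every worksheet from the XLSX library into WORK with PROC DATASETS, then shows the two ways of handling an invalid name. It assumes the CADHD1 libref is assigned; the worksheet names in the CHANGE and DELETE statements are hypothetical placeholders, not names from the actual workbooks.

    ```sas
    options validmemname=extend;   /* allow name literals such as 'Bad Name'n */

    proc datasets library=work nolist;
        copy in=cadhd1 out=work memtype=data;   /* copy all worksheets at once */
    run;
        change 'Bad Name'n = bad_name;          /* option 1: rename (hypothetical names) */
        delete 'Other Bad Name'n;               /* option 2: delete (hypothetical name) */
    run;
    quit;
    ```

    Copying the whole library in one step avoids looping over each worksheet in a macro.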

  • 16

    Loading the Data

  • 17

    • Need templates to compare

    • Load templates each quarter

    - Ensure permanent template library (libname statement)

    - By Market

    › List of variable names

    › List of worksheet names

    Validating Worksheet and Variable Names

  • 18

    • Once templates are loaded, compare worksheet names

    Validating Worksheet Names
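
    The comparison code is not included in the transcript. The sketch below shows one way to compare worksheet names against a template with a PROC SQL full join. Both input tables are built with DATALINES so the example runs on its own; the table names, column names, error messages, and the misspelled name are all made up for illustration.

    ```sas
    /* Hypothetical template list; in practice it would live in a permanent library. */
    data template_sheets;
        length memname $32;
        input memname $;
        datalines;
    LOOKUP
    STATE_SUBGRP
    STATE_SUPERGRP
    ;
    run;

    /* Hypothetical current list; in practice it would come from DICTIONARY.TABLES. */
    data current_sheets;
        length memname $32;
        input memname $;
        datalines;
    LOOKUP
    STATE_SUBGRP
    STATE_SUPRGRP
    ;
    run;

    proc sql;
        create table ws_errors as
        select coalesce(t.memname, c.memname) as worksheet length=32,
               case
                   when c.memname is null then "In template, missing from workbook"
                   else "In workbook, not in template"
               end as error_msg length=40
        from template_sheets as t
             full join current_sheets as c
             on upcase(t.memname) = upcase(c.memname)
        where t.memname is null or c.memname is null;   /* keep mismatches only */
    quit;
    ```

    With this sample data, ws_errors should contain two rows, one for each side of the misspelling. When all names match, the table is empty.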

  • 19

    • Dataset created after PROC SQL compare for Worksheet Names

    • All worksheet names match – no errors

    Validating Worksheet Names

  • 20

    • Create an error report

    Validating Worksheet Names

  • 21

    • Sample of Worksheet Error

    Exporting Error Report to Excel

  • 22

    • Once templates are loaded, compare variable names

    • Use Proc Contents to get a current list of variable names

    Validating Variable Names
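
    A comparable sketch for variable names uses PROC CONTENTS with OUT= to capture the current names and then joins them to a template list. The data set work.lookup, the template table, and the variable names are hypothetical stand-ins so the example is self-contained.

    ```sas
    /* Hypothetical stand-in for one loaded worksheet. */
    data work.lookup;
        length product $20 state $2;
        product = "ABC"; state = "TX"; qtr = 1;
    run;

    /* Hypothetical template of expected variable names. */
    data template_vars;
        length name $32;
        input name $;
        datalines;
    PRODUCT
    STATE
    QUARTER
    ;
    run;

    /* Current list of variable names for the worksheet. */
    proc contents data=work.lookup out=current_vars(keep=name) noprint;
    run;

    proc sql;
        create table var_errors as
        select coalesce(t.name, upcase(c.name)) as variable length=32,
               case
                   when c.name is null then "In template, missing from worksheet"
                   else "In worksheet, not in template"
               end as error_msg length=40
        from template_vars as t
             full join current_vars as c
             on upcase(t.name) = upcase(c.name)
        where t.name is null or c.name is null;   /* keep mismatches only */
    quit;
    ```

    Comparing on UPCASE() sidesteps case differences between the template and the names PROC CONTENTS returns.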

  • 23

    • Dataset created after PROC SQL

    compare for Variable Names

    • Everything Matches

    • Note: change variable names either

    before PROC SQL, or in the PROC

    SQL statement

    Validating Variable Names

  • 24

    Exporting Error Report to Excel

    The macro variable ‘x’ is used to

    number the reports that correspond

    with each workbook

  • 25

    • Used within a macro

    • One Excel file per Market

    • Multiple worksheets for each

    workbook checked

    • No errors for this workbook

    Exporting Error Report to Excel

    Each worksheet corresponds to a workbook

  • 26

    • Lessons Learned:

    - Do not output if there are no errors, or output a “no error” message, because most of the workbooks do not have variable name or worksheet name errors

    • Found code that detects data sets with no observations so that they are not written out to Excel

    - See next two slides

    Exporting Error Report to Excel

  • 27

    Export only datasets that have error messages

    Code continues on next slide

  • 28

    Export only datasets that have error messages
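
    The code from these two slides is not in the transcript. One common way to achieve the same effect, offered here as an assumption and not as the authors' code, is to read the observation count with the OPEN and ATTRN functions and run PROC EXPORT only when it is greater than zero. The macro name, its parameters, the sheet naming, and the output path are hypothetical.

    ```sas
    %macro export_if_errors(ds=, outfile=, x=);
        %local dsid nobs rc;
        %let nobs = 0;

        /* Open the data set and read its number of (non-deleted) observations. */
        %let dsid = %sysfunc(open(&ds));
        %if &dsid %then %do;
            %let nobs = %sysfunc(attrn(&dsid, nlobs));
            %let rc   = %sysfunc(close(&dsid));
        %end;

        %if &nobs > 0 %then %do;
            proc export data=&ds outfile="&outfile" dbms=xlsx replace;
                sheet="Workbook_&x";   /* x numbers the report for each workbook */
            run;
        %end;
        %else %put NOTE: &ds has no observations, nothing exported.;
    %mend export_if_errors;

    /* Hypothetical call */
    %export_if_errors(ds=work.ws_errors, outfile=C:\temp\ADHD_errors.xlsx, x=1)
    ```

    Like the XLSX libname engine, DBMS=XLSX requires SAS/ACCESS to PC Files.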

  • 29

    • A macro variable was created, using the same methods as before for all the

    worksheet/dataset names

    • The macro variable was used in conjunction with a macro to execute a data step

    multiple times to check all the data within a worksheet/dataset

    Validating Data

  • 30

    Validating Data

  • 31

    • Similar code was written to check the products within a workbook

    • A pre-loaded template was used to ensure the correct products were in the

    correct worksheet/dataset

    • A macro was used, along with a data step, and a PROC SQL step to compare

    product names in the pre-loaded template with the product names of the current

    data

    • An exception report was created for the values check

    • Utilized lesson learned from previous Excel export

    - For these exception reports, an MS Excel workbook was created for a worksheet only if errors were found

    Validating Data

  • 32

    • Run multiple SAS programs from one “Main” program

    “Main” Program
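
    A minimal sketch of such a driver program uses %INCLUDE to run each step in order. The folder and program names below are hypothetical.

    ```sas
    /* Hypothetical folder and program names. */
    %let pgmdir = C:\projects\qc_checks;

    %include "&pgmdir\01_load_workbooks.sas";
    %include "&pgmdir\02_check_names.sas";
    %include "&pgmdir\03_check_data.sas";
    %include "&pgmdir\04_export_reports.sas";
    ```

    Keeping each step in its own file lets a single step be rerun or debugged without touching the others.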

  • 33

    • Many macros are used to create many datasets in the process of checking one

    workbook

    • To ensure there is enough space in the SAS session, PROC Datasets is used to

    clean up the libraries used in the program

    • To delete all files in a SAS data library at one time use the KILL option

    • CAUTION: The KILL option deletes all members of the library immediately after

    the statement is submitted

    Deleting Datasets

  • 34

    Deleting Datasets
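
    A sketch of the cleanup step, assuming the intermediate data sets are in WORK and the workbook libref is CADHD1:

    ```sas
    /* Delete every data set in WORK. MEMTYPE=DATA leaves catalogs,  */
    /* such as the one holding compiled macros, in place.            */
    proc datasets library=work kill memtype=data nolist;
    run;
    quit;

    /* Release the workbook once it has been checked. */
    libname cadhd1 clear;
    ```

    Because KILL acts immediately and without confirmation, it is safest to point it only at scratch libraries.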

  • 35

    • When faced with an overwhelming task, break it down

    • Solve one problem at a time

    • Doing research online may help provide different solutions

    - Find one that works for your problem, and YOU prefer

    - Don’t be afraid to code your program and do some steps that are not as efficient

    (“down and dirty”)

    • When utilizing macros, get the program to work before coding the macro(s)

    • Enhance your program for efficiency when you have more time

    Conclusion

  • 36

    LexJansen.com


    Lisa Mendez


    Andrew T. Kuligowski


    Thank you!