Ruben Opdebeeck
Johan Fabry
Tim Molderez
Jonas De Bleser
Coen De Roover
Mining for Graph-Based Library Usage Patterns in COBOL Systems
COBOL Library Usage Pattern Mining
2
IDENTIFICATION DIVISION.
PROGRAM-ID. BARTHOL.
ENVIRONMENT DIVISION.
CONFIGURATION SECTION.
DATA DIVISION.
WORKING-STORAGE SECTION.
COPY TRANCTRL.
COPY ACCCTRL.
COPY SCREENIOV.
01 M-MENU.
   05 M-SELECTION PIC X(1) VALUE SPACES.
   05 M-TITLE PIC X(40) VALUE "Barthol Bank - Main menu".
   05 M-OPTION.
      10 M-OPT1.
         15 M-OPT1-CODE PIC X VALUE 'A'.
         15 M-OPT1-TEXT PIC X(40) VALUE "Manage accounts".
      10 M-OPT3.
         15 M-OPT1-CODE PIC X VALUE 'T'.
         15 M-OPT1-TEXT PIC X(40) VALUE "Manage transactions".
      10 M-OPT4.
         15 M-OPT1-CODE PIC X VALUE SPACES.
         15 M-OPT1-TEXT PIC X(40) VALUE SPACES.
      10 M-OPT5.
         15 M-OPT1-CODE PIC X VALUE 'Q'.
         15 M-OPT1-TEXT PIC X(40) VALUE "Quit application".
      10 M-OPT99.
         15 M-OPT1-CODE PIC X VALUE LOW-VALUE.
         15 M-OPT1-TEXT PIC X(40) VALUE LOW-VALUE.
01 WRK-VARS.
   05 WRK-INPUT-VAR PIC X(10).
PROCEDURE DIVISION.
    DISPLAY "Starting".
    PERFORM INIT-WORK.
    PERFORM MAIN-LOOP.
    PERFORM CLOSE-WORK.
    DISPLAY "Done".
    GOBACK.
INIT-WORK.
    CALL "TRANIO" USING TRAN-CTRL-BLK.
    CALL "ACCIO" USING ACC-CTRL-BLK.
MAIN-LOOP.
    PERFORM WITH TEST AFTER UNTIL M-SELECTION='Q'
        PERFORM CLS
        CALL "RMENU" USING M-MENU
        EVALUATE M-SELECTION
            WHEN 'A' PERFORM MANAGE-ACCOUNTS
            WHEN 'T' PERFORM MANAGE-TRANSACTIONS
        END-EVALUATE
    END-PERFORM.
MANAGE-ACCOUNTS.
    CALL "ACCMENU".

IDENTIFICATION DIVISION.
PROGRAM-ID. ACCMENU.
ENVIRONMENT DIVISION.
CONFIGURATION SECTION.
DATA DIVISION.
WORKING-STORAGE SECTION.
COPY TRANCTRL.
COPY ACCCTRL.
COPY SCREENIOV.
COPY ACCOUNT REPLACING ==:PREFIX:== BY ====.
COPY ACCOUNT REPLACING ==:PREFIX:== BY ==T-==.
COPY TRANL REPLACING ==:PREFIX:== BY ====.
COPY TRAN REPLACING ==:PREFIX:== BY ====.
01 M-MENU.
   05 M-SELECTION PIC X(1) VALUE SPACES.
   05 M-TITLE PIC X(40) VALUE "Barthol Bank - Account menu".
   05 M-OPTION.
      10 M-OPT1.
         15 M-OPT1-CODE PIC X VALUE 'C'.
         15 M-OPT1-TEXT PIC X(40) VALUE "Create account".
      10 M-OPT2.
         15 M-OPT1-CODE PIC X VALUE 'S'.
         15 M-OPT1-TEXT PIC X(40) VALUE "Select account".
      10 M-OPT3.
         15 M-OPT1-CODE PIC X VALUE 'E'.
         15 M-OPT1-TEXT PIC X(40) VALUE "Edit account".
      10 M-OPT35.
         15 M-OPT1-CODE PIC X VALUE 'D'.
         15 M-OPT1-TEXT PIC X(40) VALUE "Delete account".
      10 M-OPT4.
         15 M-OPT1-CODE PIC X VALUE 'H'.
         15 M-OPT1-TEXT PIC X(40) VALUE "Account history".
      10 M-OPT5.
         15 M-OPT1-CODE PIC X VALUE 'L'.
         15 M-OPT1-TEXT PIC X(40) VALUE "List bank status".
      10 M-OPT6.
         15 M-OPT1-CODE PIC X VALUE SPACES.
         15 M-OPT1-TEXT PIC X(40) VALUE SPACES.
      10 M-OPT7.
         15 M-OPT1-CODE PIC X VALUE 'Q'.
         15 M-OPT1-TEXT PIC X(40) VALUE "Quit menu".
      10 M-OPT99.
         15 M-OPT1-CODE PIC X VALUE LOW-VALUE.
         15 M-OPT1-TEXT PIC X(40) VALUE LOW-VALUE.
01 WRK-VARS.
   05 WRK-INPUT-VAR PIC X(10).
   05 W-ACC-ID PIC 9(5).
   05 W-AMOUNT PIC -ZZZ9.99.
PROCEDURE DIVISION.
    PERFORM INIT-WORK.
    PERFORM MAIN-LOOP.
    GOBACK.
INIT-WORK.
Goal: Understand library usages to estimate modernisation effort
Library Usages ⇒ Pattern Mining ⇒ Library Usage Patterns
[Figure: Library Usages. Example Groums extracted from COBOL programs; the graphs start at nodes IC102A, IC105A, IC107A, IC109A, IC110A, IC111A, IC113A, IC115A, IC117M, IC118M, IC202A and ID1[IC204A]. Call nodes (e.g. IC102A, ID1[IC202A], ID2[], CANCEL IC204A), termination nodes (STOP RUN, STOP PROGRAM) and no-op nodes (e.g. 418248 (no-op, L268)) are linked by "order" edges; data nodes (e.g. DN1 (Group), TABLE-1 (Group)) are linked by "use" edges.]
[Figure: Library Usage Patterns. An example pattern with nodes V1[C1], CANCEL V1[C1], C2 and GOBACK linked by "order" edges, and data nodes D1 (Group) to D7 (Group) linked by "use" edges.]
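As a rough illustration of the step from library usages to usage patterns, the sketch below counts how often an "order" edge between two call labels recurs across several usage graphs. It is a deliberate simplification: the three graphs and the `min_support` threshold are made up for illustration (only the program names TRANIO, ACCIO, RMENU and ACCMENU come from the slide), and actual Groum mining looks for frequent subgraphs rather than single edges.

```python
from collections import Counter

# Hypothetical usage graphs, one per program. Each graph is reduced to its
# set of "order" edges, written as (earlier_call, later_call).
usage_graphs = [
    {("TRANIO", "ACCIO"), ("ACCIO", "RMENU")},
    {("TRANIO", "ACCIO"), ("ACCIO", "ACCMENU")},
    {("TRANIO", "RMENU"), ("RMENU", "ACCMENU")},
]

def frequent_order_edges(graphs, min_support):
    """Return the order edges that occur in at least `min_support` graphs."""
    support = Counter()
    for edges in graphs:
        support.update(set(edges))  # count each edge once per graph
    return {edge: count for edge, count in support.items() if count >= min_support}

patterns = frequent_order_edges(usage_graphs, min_support=2)
print(patterns)
```

Only the edge shared by two of the three graphs survives the threshold; raising or lowering `min_support` trades pattern generality against noise.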
Groums for OO languages
3
Java snippet using Collection API:
  long my_method(List<String> lst) {
    return lst.stream()
              .filter(s -> s.startsWith("test_"))
              .count();
  }
Graph-based object usage model:
  Action nodes (library calls): Collection.stream(), Stream.filter(Predicate), Stream.count()
  Data nodes (data): List, Predicate, Stream, Stream, long
  order edges (control flow): Collection.stream() → Stream.filter(Predicate) → Stream.count()
  use and def edges (data flow): List -use→ Collection.stream() -def→ Stream -use→ Stream.filter(Predicate) -def→ Stream -use→ Stream.count() -def→ long; Predicate -use→ Stream.filter(Predicate)
T.T. Nguyen, H.A. Nguyen, N.H. Pham, J.M. Al-Kofahi, T.N. Nguyen. Graph-Based Mining of Multiple Object Usage Patterns. In Proc. ESEC/FSE’09, 2009.
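To make the graph on this slide concrete, here is a minimal sketch that stores the Groum of the Java snippet as plain Python data. The node ids (`a1`, `d1`, ...) and the helper `call_sequence` are hypothetical, and the edge directions (data node to call for "use", call to data node for "def") are an assumption inferred from the edge labels on the slide.

```python
from collections import Counter

# Nodes: id -> (kind, label). Action nodes are library calls, data nodes are values.
nodes = {
    "a1": ("action", "Collection.stream()"),
    "a2": ("action", "Stream.filter(Predicate)"),
    "a3": ("action", "Stream.count()"),
    "d1": ("data", "List"),
    "d2": ("data", "Predicate"),
    "d3": ("data", "Stream"),
    "d4": ("data", "Stream"),
    "d5": ("data", "long"),
}

# Edges: (source, target, label).
edges = [
    ("a1", "a2", "order"), ("a2", "a3", "order"),          # control flow
    ("d1", "a1", "use"), ("a1", "d3", "def"),              # lst.stream()
    ("d3", "a2", "use"), ("d2", "a2", "use"), ("a2", "d4", "def"),  # .filter(...)
    ("d4", "a3", "use"), ("a3", "d5", "def"),              # .count()
]

def call_sequence(nodes, edges):
    """Follow the order edges from the first action node and list the calls."""
    successor = {s: t for s, t, label in edges if label == "order"}
    targets = set(successor.values())
    current = next(n for n, (kind, _) in nodes.items()
                   if kind == "action" and n not in targets)
    sequence = [nodes[current][1]]
    while current in successor:
        current = successor[current]
        sequence.append(nodes[current][1])
    return sequence

edge_counts = Counter(label for _, _, label in edges)
print(call_sequence(nodes, edges))
print(dict(edge_counts))
```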
COBOL Primer
4-7
  IDENTIFICATION DIVISION.
  PROGRAM-ID. exc.
  PROCEDURE DIVISION.
  P0.
      DISPLAY "--- Start ---".
      CALL "logger".
      PERFORM P1 THRU P3.
  PZ.
      CALL "loggerZ".
      DISPLAY "--- End ---".
      STOP RUN.
  P3.
      CALL "loggerB".
  P1.
      CALL "loggerA".
      GO TO P3.
Paragraphs (P0, PZ, P3, P1)
Call statement: calls an external program
Perform statement
Go to statement
Jumping leads to complex control flow
Challenges
8
Challenge                       Solution
Definition of a library         Marked by the user
Granularity of Groums           Whole program
Inter-paragraph control flow    Graph inlining
Iteration through GOTO          Inlining stack
Program termination             Auxiliary nodes
Inter-program data flow         Only data usages
Program name in variable        Special encoding
Language support tradeoffs      Focus on control flow constructs
Details in the paper
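The slide does not spell out the special encoding used when the called program's name is held in a variable. Node labels such as ID1[IC202A] and ID2[] in the example graphs of this deck suggest one possible reading: the variable name followed by the program names it is known to hold. The sketch below encodes that reading only; the function name, its parameters and the exact label format are assumptions, not the paper's definition.

```python
def call_label(target, is_variable=False, possible_values=()):
    """Build a node label for a CALL statement.

    target: the program name for CALL "NAME", or the variable name when
            is_variable is True (CALL NAME).
    possible_values: hypothetical set of program names the variable may hold.
    """
    if not is_variable:
        return target
    return f"{target}[{','.join(sorted(possible_values))}]"

print(call_label("IC202A"))
print(call_label("ID1", is_variable=True, possible_values={"IC202A"}))
print(call_label("ID2", is_variable=True))
```

Keeping the variable name in the label lets two calls through the same variable share a label even when the concrete program name is unknown.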
Groums for COBOL
9
Single paragraph (often) too small
⇒ Inter-paragraph Groums
⇒ Inter-paragraph control flow
⇒ Graph inlining
Phase 1: Intermediate Groums
10-13
[Figure: one intermediate Groum per paragraph of the example program, nodes linked by "order" edges:
  P0: logger → PERFORM P1 THRU P3 → GO TO After_P0
  PZ: loggerZ → STOP RUN
  P3: loggerB → GO TO After_P3
  P1: loggerA → GO TO P3]
Explicit jumps: represent PERFORM/GO TO statements
Implicit jumps: end of the paragraph
Termination node: STOP RUN/GOBACK/…
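The three kinds of nodes described on these slides can be sketched as follows. This is a minimal illustration, not the paper's implementation: the tuple representation of statements and the function name `intermediate_groum` are assumptions, and the rule "append an implicit jump unless the paragraph already ends in GO TO or STOP RUN" is inferred from the example graphs.

```python
# The example program reduced to its control-flow-relevant statements,
# in source order. DISPLAY statements are omitted.
PROGRAM = [
    ("P0", [("CALL", "logger"), ("PERFORM", "P1", "P3")]),
    ("PZ", [("CALL", "loggerZ"), ("STOP RUN",)]),
    ("P3", [("CALL", "loggerB")]),
    ("P1", [("CALL", "loggerA"), ("GO TO", "P3")]),
]

def intermediate_groum(name, statements):
    """Return (node labels, order edges) for a single paragraph."""
    labels = []
    for stmt in statements:
        if stmt[0] == "CALL":
            labels.append(stmt[1])                              # library call node
        elif stmt[0] == "PERFORM":
            labels.append(f"PERFORM {stmt[1]} THRU {stmt[2]}")  # explicit jump
        elif stmt[0] == "GO TO":
            labels.append(f"GO TO {stmt[1]}")                   # explicit jump
        else:
            labels.append(stmt[0])                              # termination node
    # Implicit jump: control leaves at the end of the paragraph unless the
    # last statement already transfers control unconditionally.
    last = statements[-1][0] if statements else None
    if last not in ("GO TO", "STOP RUN"):
        labels.append(f"GO TO After_{name}")
    order_edges = list(zip(labels, labels[1:]))
    return labels, order_edges

groums = {name: intermediate_groum(name, stmts) for name, stmts in PROGRAM}
for name, (labels, order_edges) in groums.items():
    print(name, labels, order_edges)
```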
Phase 2: Groum Inlining
14
[Figure: the intermediate Groums from Phase 1 shown next to the example program, with the inlining steps numbered ① to ④.]
15
[Figure: the resulting inlined Groum: logger → loggerA → loggerB → loggerZ → STOP RUN, linked by "order" edges.]
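A minimal sketch of how inlining could resolve the jump nodes for the example program is given below. It is an assumption-laden illustration rather than the paper's algorithm: the node tuples, the `inline` function and the `max_steps` guard are hypothetical, and a stack of pending PERFORMs stands in for the "inlining stack" mentioned among the challenges.

```python
# Paragraphs in source order, each as a list of intermediate Groum nodes.
ORDER = ["P0", "PZ", "P3", "P1"]
GROUMS = {
    "P0": [("call", "logger"), ("perform", "P1", "P3"), ("after", "P0")],
    "PZ": [("call", "loggerZ"), ("stop", "STOP RUN")],
    "P3": [("call", "loggerB"), ("after", "P3")],
    "P1": [("call", "loggerA"), ("goto", "P3")],
}

def inline(order, groums, max_steps=1000):
    """Resolve jump nodes and return the node labels in execution order."""
    result = []
    stack = []  # pending PERFORMs: (exit paragraph, return paragraph, return index)
    para, idx = order[0], 0
    for _ in range(max_steps):
        node = groums[para][idx]
        kind = node[0]
        if kind == "call":
            result.append(node[1])
            idx += 1
        elif kind == "stop":                  # termination node ends the walk
            result.append(node[1])
            return result
        elif kind == "perform":               # explicit jump with a return point
            stack.append((node[2], para, idx + 1))
            para, idx = node[1], 0
        elif kind == "goto":                  # explicit jump without return
            para, idx = node[1], 0
        elif kind == "after":                 # implicit jump at paragraph end
            if stack and stack[-1][0] == node[1]:
                _, para, idx = stack.pop()    # end of a PERFORM range: return
            else:
                nxt = order.index(node[1]) + 1
                if nxt == len(order):
                    return result             # fell off the end of the program
                para, idx = order[nxt], 0     # fall through to next paragraph
    raise RuntimeError("step limit reached; the program may loop")

inlined = inline(ORDER, GROUMS)
order_edges = list(zip(inlined, inlined[1:]))
print(inlined)
```

The step limit is only a crude guard against GO TO loops; it is not how the paper handles iteration.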
RQ1: Does Extraction Scale?
16
Setup
All CALL statements considered library calls
Real-world, industrial COBOL cases from Raincode Labs
>10GB of ASTs
Results
Max. 100 call nodes
[Figure: results plot; the only recoverable label is "Nodes + edges".]
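The setup rule "all CALL statements considered library calls" can be illustrated on source text with a small sketch. Note the assumptions: the evaluation works on ASTs, whereas this sketch uses a regular expression on raw text, handles only double-quoted literal program names, and reports calls in textual order rather than execution order. The names `CALL_LITERAL` and `library_calls` are hypothetical.

```python
import re

# Matches CALL "NAME" with a literal, double-quoted program name.
CALL_LITERAL = re.compile(r'\bCALL\s+"([^"]+)"', re.IGNORECASE)

def library_calls(source):
    """Return the literal targets of all CALL statements, in textual order."""
    return CALL_LITERAL.findall(source)

SOURCE = '''
P0. DISPLAY "--- Start ---". CALL "logger". PERFORM P1 THRU P3.
PZ. CALL "loggerZ". DISPLAY "--- End ---". STOP RUN.
P3. CALL "loggerB".
P1. CALL "loggerA". GO TO P3.
'''

calls = library_calls(SOURCE)
print(calls)
```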
RQ2: Does Mining Scale? (Setup)
17
Graph-based Mining of Multiple Object Usage Patterns
Tung Thanh Nguyen, Hoan Anh Nguyen, Nam H. Pham, Jafar M. Al-Kofahi, Tien N. Nguyen
Electrical and Computer Engineering Department, Iowa State University
ABSTRACT
The interplay of multiple objects in object-oriented programming often follows specific protocols, for example certain orders of method calls and/or control structure constraints among them that are parts of the intended object usages. Unfortunately, the information is not always documented. That creates long learning curve, and importantly, leads to subtle problems due to the misuse of objects.
In this paper, we propose GrouMiner, a novel graph-based approach for mining the usage patterns of one or multiple objects. GrouMiner approach includes a graph-based representation for multiple object usages, a pattern mining algorithm, and an anomaly detection technique that are efficient, accurate, and resilient to software changes. Our experiments on several real-world programs show that our prototype is able to find useful usage patterns with multiple objects and control structures, and to translate them into user-friendly code skeletons to assist developers in programming. It could also detect the usage anomalies that caused yet undiscovered defects and code smells in those programs.
Categories and Subject Descriptors
D.2.7 [Software Engineering]: Distribution, Maintenance, and Enhancement
General Terms
Algorithms, Design, Reliability
1. INTRODUCTIONIn object-oriented programming, developers must deal with
multiple objects of the same or different classes. Objectsinteract with one another via their provided methods andfields. The interplay of several objects, which involves ob-jects’ fields/method calls and the control flow among them,often follows certain orders or control structure constraintsthat are parts of the intended usages of corresponding classes.
Permission to make digital or hard copies of all or part of this work forpersonal or classroom use is granted without fee provided that copies arenot made or distributed for profit or commercial advantage and that copiesbear this notice and the full citation on the first page. To copy otherwise, torepublish, to post on servers or to redistribute to lists, requires prior specificpermission and/or a fee.ESEC-FSE’09, August 24–28, 2009, Amsterdam, The Netherlands.Copyright 2009 ACM 978-1-60558-001-2/09/08 ...$10.00.
In team development, newly introduced program-specificAPIs by one or more team members often lack of usage doc-umentation due to busy schedules. Other developers haveto look through new code to understand the programmingusages. This is a very inefficient, confusing, and error-proneprocess. Developers often do not know where to start. Evenworse, some of them do not properly use newly introducedclasses, leading to errors. Moreover, specific orders and/orcontrol flows of objects’ method calls cannot be checked atcompile time. As a consequence, errors could not be caughtuntil testing and even go unnoticed for a long time. Thesealso occur often in the case of general API usages.
In this paper, we propose GrouMiner, a new approachfor mining the usage patterns of objects and classes usinggraph-based algorithms. In our approach, the usage of a setof objects in a scenario is represented as a labeled, directedacyclic graph (DAG), of which nodes represent objects’ con-structor calls, method calls, field accesses, and branchingpoints of control structures, and edges represent temporalusage orders and data dependencies among them.
A usage pattern is considered as a (sub)graph that fre-quently “appears” in the object usage graphs extracted fromall methods in the code base. Appearance here means thatit is label-isomorphic to an induced subgraph of each objectusage graph, i.e. satisfying all temporal orders and data de-pendencies between the corresponding nodes in that graph.
GrouMiner detects those patterns using a novel graph-based algorithm for mining the frequent induced subgraphsin a graph dataset. The patterns are generated increasinglyby their sizes (i.e. the number of nodes). Each pattern Q ofsize k+1 is discovered from a pattern P of size k via extend-ing the occurrence(s) of P in every method’s graph G in thedataset with relevant nodes of G. The generated subgraphsare then compared to find isomorphic ones. To avoid thecomputational cost of graph isomorphism solutions, we useExas [24], our efficient structural feature extraction methodfor graph-based structures to extract a characteristic vec-tor for each subgraph. It is an occurrence-count vector ofsequences of nodes and edges’ labels. The generated sub-graphs having the same vector are considered isomorphicand counted toward the frequency of the corresponding can-didate. If it exceeds a threshold, the candidate is consideredas a pattern and is used to discover the larger patterns.
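The vector comparison described above can be sketched in Python. This is a simplified stand-in for Exas, not its actual feature set: it counts label sequences along directed paths of up to `max_len` nodes and treats two graphs with equal counts as matching. The function name, the graph encoding (a node-label dict plus labelled edge triples), and the `max_len` parameter are assumptions made for illustration.

```python
from collections import Counter

def characteristic_vector(labels, edges, max_len=3):
    """Count label sequences along directed paths with at most max_len nodes.

    labels: dict mapping node id -> node label
    edges:  list of (src, dst, edge_label) triples; the graph must be acyclic
    """
    successors = {node: [] for node in labels}
    for src, dst, edge_label in edges:
        successors[src].append((dst, edge_label))

    vector = Counter()

    def walk(node, sequence, length):
        vector[sequence] += 1            # count the path that ends at this node
        if length == max_len:
            return
        for nxt, edge_label in successors[node]:
            walk(nxt, sequence + (edge_label, labels[nxt]), length + 1)

    for node in labels:
        walk(node, (labels[node],), 1)
    return vector

# Two usages with different node ids but the same labelled structure.
g1 = ({1: "open", 2: "read", 3: "close"},
      [(1, 2, "order"), (2, 3, "order"), (1, 3, "order")])
g2 = ({"a": "open", "b": "read", "c": "close"},
      [("a", "b", "order"), ("b", "c", "order"), ("a", "c", "order")])
# Same labels, but close comes before read.
g3 = ({1: "open", 2: "close", 3: "read"},
      [(1, 2, "order"), (2, 3, "order"), (1, 3, "order")])

v1 = characteristic_vector(*g1)
print(v1 == characteristic_vector(*g2))
print(v1 == characteristic_vector(*g3))
```

Equal vectors are a cheap approximation of isomorphism: the comparison avoids a graph matching step, at the price that two different graphs could in principle share a vector.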
After the patterns are mined, they could be translated into user-friendly code skeletons, which assist developers in object usages. The patterns that are confirmed by the developers as typical usages could also be used to automatically detect the locations in programs that deviate from them. A
Corpora
Case               # Groums   Min. support
Case 1             273        30
Case 2 (full)      3925       400
Case 2 (limited)   3573       400
# Unique programs
Patterns with ≥ 2 call nodes
GrouMiner
• Apriori pattern growth
• GrouMiner*: Memory optimisations
BigGroum
• Clustering, slicing, embeddings via SAT formulae
• Scales better than GrouMiner
Mining Framework Usage Graphs from App Corpora
Sergio Mover, Sriram Sankaranarayanan, Rhys Braginton Pettee Olsen, Bor-Yuh Evan Chang
University of Colorado Boulder, USA
Abstract: We investigate the problem of mining graph-based usage patterns for large, object-oriented frameworks like Android, revisiting previous approaches based on graph-based object usage models (groums). Groums are a promising approach to represent usage patterns for object-oriented libraries because they simultaneously describe control flow and data dependencies between methods of multiple interacting object types. However, this expressivity comes at a cost: mining groums requires solving a subgraph isomorphism problem that is well known to be expensive. This cost limits the applicability of groum mining to large API frameworks.
In this paper, we employ groum mining to learn usage patterns for object-oriented frameworks from program corpora. The central challenge is to scale groum mining so that it is sensitive to usages horizontally across programs from arbitrarily many developers (as opposed to simply usages vertically within the program of a single developer). To address this challenge, we develop a novel groum mining algorithm that scales on a large corpus of programs. We first use frequent itemset mining to restrict the search for groums to smaller subsets of methods in the given corpus. Then, we pose the subgraph isomorphism as a SAT problem and apply efficient pre-processing algorithms to rule out fruitless comparisons ahead of time. Finally, we identify containment relationships between clusters of groums to characterize popular usage patterns in the corpus (as well as classify less popular patterns as possible anomalies). We find that our approach scales on a corpus of over five hundred open source Android applications, effectively mining obligatory and best-practice usage patterns.
I. INTRODUCTION
We consider the problem of mining graph-based descriptions of application programming interface (API) usage patterns from large software repositories. Such usage patterns are invaluable in understanding how the API is typically used by developers and help highlight anomalous usage. This work can in turn lead to automatic approaches to detecting potential defects, automatic code completion, and automatic repair. The API usage mining problem has been well-studied in the past two decades with numerous proposed approaches (e.g., [1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12], [13], [14]). The work continues, as each approach has an expressivity versus efficiency tradeoff. As the API usage patterns get more expressive, the problem of capturing common patterns from large repositories becomes ever harder. The ideal usage pattern describes (a) the control flow of the various methods called, including control structures such as branches and loops; and (b) the data flow between how an object created by one method call gets utilized in another call. Commonly used APIs such as Android involve a large collection of methods and API usage protocols, each involving multiple object types as well as constraints on both the control and data flows.
Graph-based models of API usage are therefore very natural for such applications. A previous work [15] considered the problem of mining program-specific usage patterns using GRaph-based Object Usage Models (groums), an abstracted control-flow graph with data dependency edges. In a groum, nodes represent API calls (or control structures) and edges represent data dependencies or control flow between the nodes.
While groums are a natural and promising representation for expressing API usage, the main unsolved challenge in applying groum mining for finding framework usage patterns is to efficiently scale on a large corpora of Android applications (apps). Efficiently mining groums from a large corpus is challenging because of the cost of computing subgraph isomorphisms, the basic operation used to compute graph patterns. To illustrate this challenge, we tried to run the GrouMiner tool [15] on over five hundred Android projects with over 70,000 methods downloaded from GitHub. While this was never the intended application of GrouMiner, this approach also did not produce results despite our running it over 3 days.
To address this challenge, we make the following contributions:
• We combine frequent itemset mining with groum mining to restrict the search for relevant groums to smaller subsets of API methods (Section II). At a high level, we first use the mined itemsets as a basis for partitioning the entire corpus of groums into smaller clusters.
• We use a SAT-based encoding, coupled with filtering techniques to avoid unnecessary solver calls (Section IV), for computing subgraph isomorphisms, the main operation used to construct the lattice-ordered bins of groums.
• We organize the groums of each sub-corpus defined by a itemset cluster into lattice-ordered bins of groums (logroums), using containment-based subsumption relation to define the order of the lattice. Groum bins can then be labeled as POPULAR, ANOMALOUS, or ISOLATED usage patterns (Section V) according to their frequency of occurrence in the corpus and the ordering relationship with other bins. Using the mined itemsets, we then further slice the graphs according to the items (API calls), to compute subgraph isomorphisms on smaller instances.
• We evaluate our approach, BIGGROUM, on a corpus consisting of over five hundred Android apps (Section VI). We first show that BIGGROUM is efficient, being able to mine the patterns for the entire corpus of apps, and that our con-
RQ2: Does Mining Scale? (Results)
18
Case               GrouMiner*          BigGroum
                   Time (min)   # P    Time (min)   # Clusters   # P
Case 1             0.95         3      10.4         3            4
Case 2 (full)      -            -      -            -            -
Case 2 (limited)   121.8        7      618.2        1            0
Failed: Graphs too large
[Figure: BigGroum pipeline. Library usages → Frequent itemset mining → Cluster #1 ... Cluster #n → SAT solver (one per cluster) → Library usage patterns #1 ... #n]
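The first stage of the BigGroum pipeline, frequent itemset mining followed by clustering, can be sketched in Python. This is a naive enumeration that assumes each groum is reduced to the set of library programs it calls; BigGroum's actual itemset miner and clustering rules are not reproduced here, and `frequent_itemsets`, `cluster_by_itemset`, and the toy corpus are hypothetical.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support, max_size=3):
    """Return {itemset: support} for itemsets found in >= min_support transactions."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, max_size + 1):
        for candidate in combinations(items, size):
            itemset = frozenset(candidate)
            support = sum(1 for t in transactions if itemset <= t)
            if support >= min_support:
                result[itemset] = support
    return result

def cluster_by_itemset(transactions, itemsets):
    """Map each frequent itemset to the indices of the transactions containing it."""
    return {itemset: [i for i, t in enumerate(transactions) if itemset <= t]
            for itemset in itemsets}

# Toy corpus: one set of CALL targets per program (illustrative data only).
usages = [
    {"TRANIO", "ACCIO", "RMENU"},
    {"TRANIO", "ACCIO"},
    {"ACCIO", "RMENU"},
    {"TRANIO", "ACCIO", "LOGGER"},
]
found = frequent_itemsets(usages, min_support=3, max_size=2)
clusters = cluster_by_itemset(usages, found)
print(clusters[frozenset({"TRANIO", "ACCIO"})])
```

Each cluster is then small enough to be handed to the more expensive subgraph comparison, which is the point of clustering before solving.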
Conclusion
19
[Figure: extracted library usage graphs (groums). Call nodes include IC102A, IC104A, IC105A, IC107A, IC109A, IC110A, IC111A, IC113A, IC115A, IC117M, IC118M, IC202A, IC204A and IC205A; calls through a variable appear as ID1[IC202A], ID1[IC204A] and ID2[]; CANCEL statements appear as nodes such as CANCEL IC204A and CANCEL ID1[IC204A]. Data nodes are marked (Group), e.g. DN1, TABLE-1, GRP-01. Termination nodes are STOP RUN and STOP PROGRAM, some followed by an auxiliary no-op node. Edges are labelled "order" or "use".]
Library Usages → Pattern Mining → Library Usage Patterns
[Figure: example usage pattern with nodes V1[C1], CANCEL V1[C1], C2 and GOBACK connected by "order" edges, and data groups D1 to D7 connected by "use" edges.]
Goal: Understand library usages to estimate modernisation effort
Challenges
Challenge                      Solution
Definition of a library        Marked by the user
Granularity of Groums          Whole program
Inter-paragraph control flow   Graph inlining
Iteration through GOTO         Inlining stack
Program termination            Auxiliary nodes
Inter-program data flow        Only data usages
Program name in variable       Special encoding
Language support tradeoffs     Focus on control flow constructs
Details in the paper
RQ1: Does Extraction Scale?
RQ2: Does Mining Scale?