mining for graph-based library usage patterns in cobol systems

29
Ruben Opdebeeck Johan Fabry Tim Molderez Jonas De Bleser Coen De Roover [email protected] [email protected] [email protected] [email protected] [email protected] Mining for Graph-Based Library Usage Patterns in COBOL Systems

Upload: others

Post on 29-Jun-2022

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Mining for Graph-Based Library Usage Patterns in COBOL Systems

Ruben Opdebeeck

Johan Fabry

Tim Molderez

Jonas De Bleser

Coen De Roover

[email protected]

[email protected]

[email protected]

[email protected]

[email protected]

Mining for Graph-Based Library Usage Patterns in COBOL Systems

Page 2: Mining for Graph-Based Library Usage Patterns in COBOL Systems

COBOL Library Usage Pattern Mining

2

IDENTIFICATION DIVISION. PROGRAM-ID. BARTHOL. ENVIRONMENT DIVISION. CONFIGURATION SECTION. DATA DIVISION. WORKING-STORAGE SECTION. COPY TRANCTRL. COPY ACCCTRL. COPY SCREENIOV. 01 M-MENU. 05 M-SELECTION PIC X(1) VALUE SPACES. 05 M-TITLE PIC X(40) VALUE "Barthol Bank - Main menu". 05 M-OPTION. 10 M-OPT1 15 M-OPT1-CODE PIC X VALUE 'A'. 15 M-OPT1-TEXT PIC X(40) VALUE "Manage accounts". 10 M-OPT3 15 M-OPT1-CODE PIC X VALUE 'T'. 15 M-OPT1-TEXT PIC X(40) VALUE "Manage transactions". 10 M-OPT4 15 M-OPT1-CODE PIC X VALUE SPACES. 15 M-OPT1-TEXT PIC X(40) VALUE SPACES. 10 M-OPT5 15 M-OPT1-CODE PIC X VALUE 'Q'. 15 M-OPT1-TEXT PIC X(40) VALUE "Quit application". 10 M-OPT99 15 M-OPT1-CODE PIC X VALUE LOW-VALUE. 15 M-OPT1-TEXT PIC X(40) VALUE LOW-VALUE. 01 WRK-VARS. 05 WRK-INPUT-VAR PIC X(10). PROCEDURE DIVISION. DISPLAY "Starting". PERFORM INIT-WORK. PERFORM MAIN-LOOP. PERFORM CLOSE-WORK. DISPLAY "Done". GOBACK. INIT-WORK. CALL "TRANIO" USING TRAN-CTRL-BLK. CALL "ACCIO" USING ACC-CTRL-BLK. MAIN-LOOP. PERFORM WITH TEST AFTER UNTIL M-SELECTION='Q' PERFORM CLS CALL "RMENU" USING M-MENU EVALUATE M-SELECTION WHEN 'A' PERFORM MANAGE-ACCOUNTS WHEN 'T' PERFORM MANAGE-TRANSACTIONS END-EVALUATE END-PERFORM. MANAGE-ACCOUNTS. CALL "ACCMENU".

IDENTIFICATION DIVISION. PROGRAM-ID. ACCMENU. ENVIRONMENT DIVISION. CONFIGURATION SECTION. DATA DIVISION. WORKING-STORAGE SECTION. COPY TRANCTRL. COPY ACCCTRL. COPY SCREENIOV. COPY ACCOUNT REPLACING ==:PREFIX:== BY ====. COPY ACCOUNT REPLACING ==:PREFIX:== BY ==T-==. COPY TRANL REPLACING ==:PREFIX:== BY ====. COPY TRAN REPLACING ==:PREFIX:== BY ====. 01 M-MENU. 05 M-SELECTION PIC X(1) VALUE SPACES. 05 M-TITLE PIC X(40) VALUE "Barthol Bank - Account menu". 05 M-OPTION. 10 M-OPT1 15 M-OPT1-CODE PIC X VALUE 'C'. 15 M-OPT1-TEXT PIC X(40) VALUE "Create account". 10 M-OPT2 15 M-OPT1-CODE PIC X VALUE 'S'. 15 M-OPT1-TEXT PIC X(40) VALUE "Select account". 10 M-OPT3 15 M-OPT1-CODE PIC X VALUE 'E'. 15 M-OPT1-TEXT PIC X(40) VALUE "Edit account". 10 M-OPT35 15 M-OPT1-CODE PIC X VALUE 'D'. 15 M-OPT1-TEXT PIC X(40) VALUE "Delete account". 10 M-OPT4 15 M-OPT1-CODE PIC X VALUE 'H'. 15 M-OPT1-TEXT PIC X(40) VALUE "Account history". 10 M-OPT5 15 M-OPT1-CODE PIC X VALUE 'L'. 15 M-OPT1-TEXT PIC X(40) VALUE "List bank status". 10 M-OPT6 15 M-OPT1-CODE PIC X VALUE SPACES. 15 M-OPT1-TEXT PIC X(40) VALUE SPACES. 10 M-OPT7 15 M-OPT1-CODE PIC X VALUE 'Q'. 15 M-OPT1-TEXT PIC X(40) VALUE "Quit menu". 10 M-OPT99 15 M-OPT1-CODE PIC X VALUE LOW-VALUE. 15 M-OPT1-TEXT PIC X(40) VALUE LOW-VALUE. 01 WRK-VARS. 05 WRK-INPUT-VAR PIC X(10). 05 W-ACC-ID PIC 9(5). 05 W-AMOUNT PIC -ZZZ9.99. PROCEDURE DIVISION. PERFORM INIT-WORK. PERFORM MAIN-LOOP. GOBACK. INIT-WORK.

Goal: Understand library usages to estimate modernisation effort

Page 3: Mining for Graph-Based Library Usage Patterns in COBOL Systems

COBOL Library Usage Pattern Mining

2

IDENTIFICATION DIVISION. PROGRAM-ID. BARTHOL. ENVIRONMENT DIVISION. CONFIGURATION SECTION. DATA DIVISION. WORKING-STORAGE SECTION. COPY TRANCTRL. COPY ACCCTRL. COPY SCREENIOV. 01 M-MENU. 05 M-SELECTION PIC X(1) VALUE SPACES. 05 M-TITLE PIC X(40) VALUE "Barthol Bank - Main menu". 05 M-OPTION. 10 M-OPT1 15 M-OPT1-CODE PIC X VALUE 'A'. 15 M-OPT1-TEXT PIC X(40) VALUE "Manage accounts". 10 M-OPT3 15 M-OPT1-CODE PIC X VALUE 'T'. 15 M-OPT1-TEXT PIC X(40) VALUE "Manage transactions". 10 M-OPT4 15 M-OPT1-CODE PIC X VALUE SPACES. 15 M-OPT1-TEXT PIC X(40) VALUE SPACES. 10 M-OPT5 15 M-OPT1-CODE PIC X VALUE 'Q'. 15 M-OPT1-TEXT PIC X(40) VALUE "Quit application". 10 M-OPT99 15 M-OPT1-CODE PIC X VALUE LOW-VALUE. 15 M-OPT1-TEXT PIC X(40) VALUE LOW-VALUE. 01 WRK-VARS. 05 WRK-INPUT-VAR PIC X(10). PROCEDURE DIVISION. DISPLAY "Starting". PERFORM INIT-WORK. PERFORM MAIN-LOOP. PERFORM CLOSE-WORK. DISPLAY "Done". GOBACK. INIT-WORK. CALL "TRANIO" USING TRAN-CTRL-BLK. CALL "ACCIO" USING ACC-CTRL-BLK. MAIN-LOOP. PERFORM WITH TEST AFTER UNTIL M-SELECTION='Q' PERFORM CLS CALL "RMENU" USING M-MENU EVALUATE M-SELECTION WHEN 'A' PERFORM MANAGE-ACCOUNTS WHEN 'T' PERFORM MANAGE-TRANSACTIONS END-EVALUATE END-PERFORM. MANAGE-ACCOUNTS. CALL "ACCMENU".

IDENTIFICATION DIVISION. PROGRAM-ID. ACCMENU. ENVIRONMENT DIVISION. CONFIGURATION SECTION. DATA DIVISION. WORKING-STORAGE SECTION. COPY TRANCTRL. COPY ACCCTRL. COPY SCREENIOV. COPY ACCOUNT REPLACING ==:PREFIX:== BY ====. COPY ACCOUNT REPLACING ==:PREFIX:== BY ==T-==. COPY TRANL REPLACING ==:PREFIX:== BY ====. COPY TRAN REPLACING ==:PREFIX:== BY ====. 01 M-MENU. 05 M-SELECTION PIC X(1) VALUE SPACES. 05 M-TITLE PIC X(40) VALUE "Barthol Bank - Account menu". 05 M-OPTION. 10 M-OPT1 15 M-OPT1-CODE PIC X VALUE 'C'. 15 M-OPT1-TEXT PIC X(40) VALUE "Create account". 10 M-OPT2 15 M-OPT1-CODE PIC X VALUE 'S'. 15 M-OPT1-TEXT PIC X(40) VALUE "Select account". 10 M-OPT3 15 M-OPT1-CODE PIC X VALUE 'E'. 15 M-OPT1-TEXT PIC X(40) VALUE "Edit account". 10 M-OPT35 15 M-OPT1-CODE PIC X VALUE 'D'. 15 M-OPT1-TEXT PIC X(40) VALUE "Delete account". 10 M-OPT4 15 M-OPT1-CODE PIC X VALUE 'H'. 15 M-OPT1-TEXT PIC X(40) VALUE "Account history". 10 M-OPT5 15 M-OPT1-CODE PIC X VALUE 'L'. 15 M-OPT1-TEXT PIC X(40) VALUE "List bank status". 10 M-OPT6 15 M-OPT1-CODE PIC X VALUE SPACES. 15 M-OPT1-TEXT PIC X(40) VALUE SPACES. 10 M-OPT7 15 M-OPT1-CODE PIC X VALUE 'Q'. 15 M-OPT1-TEXT PIC X(40) VALUE "Quit menu". 10 M-OPT99 15 M-OPT1-CODE PIC X VALUE LOW-VALUE. 15 M-OPT1-TEXT PIC X(40) VALUE LOW-VALUE. 01 WRK-VARS. 05 WRK-INPUT-VAR PIC X(10). 05 W-ACC-ID PIC 9(5). 05 W-AMOUNT PIC -ZZZ9.99. PROCEDURE DIVISION. PERFORM INIT-WORK. PERFORM MAIN-LOOP. GOBACK. INIT-WORK.

IC102A (0)

IC102A (2)

order

DN1 (Group) (1)

use

use

IC102A (3)

use

order

IC102A (4)

order

IC102A (6)

order

DN2 (Group) (5)

use

use

STOP RUN (7)

order

418248 (no-op, L268) (8)

order

IC105A (0)

IC105A (3)

order

MAIN-DN2 (Group) (1)

use

use

IC105A (4)

use

IC105A (5)

use

MAIN-DN1 (Group) (2)

use

use

use

use

order

order

IC104A (6)

order

STOP RUN (10)

order

GROUP-02 (Group) (7)

use

GROUP-01 (Group) (8)

use

ELEM-77 (Group) (9)

use

1757336 (no-op, L272) (11)

order

IC107A (0)

STOP RUN (4)

order

TABLE-1 (Group) (1)

use

IDN1 (Group) (2)

use

TABLE-2 (Group) (3)

use

650331 (no-op, L275) (5)

order

IC109A (0)

STOP RUN (2)

order

GRP-01 (Group) (1)

use

454282 (no-op, L270) (3)

order

IC110A (0)

STOP PROGRAM (3)

order

GRP-01 (Group) (1)

use

WS1 (Group) (2)

use

IC111A (0)

STOP PROGRAM (4)

order

LS1 (Group) (1)

use

WS2 (Group) (2)

use

GRP-01 (Group) (3)

use

IC113A (0)

STOP RUN (6)

order

SQ-FS3R1-F-G-120 (Group) (1)

use

IC113A (5)

use

ERROR-FLAG (Group) (2)

use use

WRK-CS-09V00 (Group) (3)

use use

RECORDS-IN-ERROR (Group) (4)

use use

order

214890 (no-op, L343) (7)

order order

IC115A (0)

IC115A (3)

order

FILE-REC-SQ-FS3 (Group) (1)

use

use

IC115A (4)

use

IC115A (5)

use

IC115A (6)

use

GROUP-LINKAGE-VARIABLES (Group) (2)

use

use

use

use

useorder

order

order

STOP RUN (7)

order

180074 (no-op, L290) (8)

order

IC117M (0)

STOP RUN (1)

order

833320 (no-op, L279) (2)

order

IC118M (0)

STOP PROGRAM (1)

order

IC202A (0)

ID1[IC202A] (5)

order

DN3 (Group) (1)

use

use

ID2[] (6)

use

ID2[] (7)

use

ID1[IC202A] (9)

use

IC202A (11)

use

IC202A (12)

use

ID1[IC202A] (13)

use

ID1[IC202A] (14)

use

DN2 (Group) (2)

use

use

use

use

IC202A (8)

use

use

ID1[IC202A] (10)

use

use

use

use

use

DN4 (Group) (3)

use

use

use

use

use

use

use

use

use

DN1 (Group) (4)

use

use

use

use

use

use

use

use

use

use

use

order

order

order

order

order

order

order

order

order

STOP RUN (15)

order

518479 (no-op, L270) (16)

order

ID1[IC204A] (0)

ID1[IC204A] (3)

order

TABLE-1 (Group) (1)

use

use

IC204A (5)

use

IC204A (7)

use

IC204A (8)

use

ID1[IC204A] (10)

use

IC204A (11)

use

ID1[IC204A] (14)

use

IC205A (16)

use

ID1[IC204A] (18)

use

DN1 (Group) (2)

use

use

use

use

use

use

use

use

use

order

DN5 (Group) (4)

use

CANCEL IC204A (6)

order

order

order

CANCEL ID1[IC204A] (9)

order

order

order

CANCEL ID1[IC204A] (12)

order

CANCEL ID1[IC204A] (13)

order

order

CANCEL IC205A (15)

order

order

order

TABLE-2 (Group) (17)

use

CANCEL ID1[IC204A] (19)

order

STOP RUN (20)

order

1359146 (no-op, L272) (21)

order

Library Usages

Goal: Understand library usages to estimate modernisation effort

Page 4: Mining for Graph-Based Library Usage Patterns in COBOL Systems

COBOL Library Usage Pattern Mining

2

IDENTIFICATION DIVISION. PROGRAM-ID. BARTHOL. ENVIRONMENT DIVISION. CONFIGURATION SECTION. DATA DIVISION. WORKING-STORAGE SECTION. COPY TRANCTRL. COPY ACCCTRL. COPY SCREENIOV. 01 M-MENU. 05 M-SELECTION PIC X(1) VALUE SPACES. 05 M-TITLE PIC X(40) VALUE "Barthol Bank - Main menu". 05 M-OPTION. 10 M-OPT1 15 M-OPT1-CODE PIC X VALUE 'A'. 15 M-OPT1-TEXT PIC X(40) VALUE "Manage accounts". 10 M-OPT3 15 M-OPT1-CODE PIC X VALUE 'T'. 15 M-OPT1-TEXT PIC X(40) VALUE "Manage transactions". 10 M-OPT4 15 M-OPT1-CODE PIC X VALUE SPACES. 15 M-OPT1-TEXT PIC X(40) VALUE SPACES. 10 M-OPT5 15 M-OPT1-CODE PIC X VALUE 'Q'. 15 M-OPT1-TEXT PIC X(40) VALUE "Quit application". 10 M-OPT99 15 M-OPT1-CODE PIC X VALUE LOW-VALUE. 15 M-OPT1-TEXT PIC X(40) VALUE LOW-VALUE. 01 WRK-VARS. 05 WRK-INPUT-VAR PIC X(10). PROCEDURE DIVISION. DISPLAY "Starting". PERFORM INIT-WORK. PERFORM MAIN-LOOP. PERFORM CLOSE-WORK. DISPLAY "Done". GOBACK. INIT-WORK. CALL "TRANIO" USING TRAN-CTRL-BLK. CALL "ACCIO" USING ACC-CTRL-BLK. MAIN-LOOP. PERFORM WITH TEST AFTER UNTIL M-SELECTION='Q' PERFORM CLS CALL "RMENU" USING M-MENU EVALUATE M-SELECTION WHEN 'A' PERFORM MANAGE-ACCOUNTS WHEN 'T' PERFORM MANAGE-TRANSACTIONS END-EVALUATE END-PERFORM. MANAGE-ACCOUNTS. CALL "ACCMENU".

IDENTIFICATION DIVISION. PROGRAM-ID. ACCMENU. ENVIRONMENT DIVISION. CONFIGURATION SECTION. DATA DIVISION. WORKING-STORAGE SECTION. COPY TRANCTRL. COPY ACCCTRL. COPY SCREENIOV. COPY ACCOUNT REPLACING ==:PREFIX:== BY ====. COPY ACCOUNT REPLACING ==:PREFIX:== BY ==T-==. COPY TRANL REPLACING ==:PREFIX:== BY ====. COPY TRAN REPLACING ==:PREFIX:== BY ====. 01 M-MENU. 05 M-SELECTION PIC X(1) VALUE SPACES. 05 M-TITLE PIC X(40) VALUE "Barthol Bank - Account menu". 05 M-OPTION. 10 M-OPT1 15 M-OPT1-CODE PIC X VALUE 'C'. 15 M-OPT1-TEXT PIC X(40) VALUE "Create account". 10 M-OPT2 15 M-OPT1-CODE PIC X VALUE 'S'. 15 M-OPT1-TEXT PIC X(40) VALUE "Select account". 10 M-OPT3 15 M-OPT1-CODE PIC X VALUE 'E'. 15 M-OPT1-TEXT PIC X(40) VALUE "Edit account". 10 M-OPT35 15 M-OPT1-CODE PIC X VALUE 'D'. 15 M-OPT1-TEXT PIC X(40) VALUE "Delete account". 10 M-OPT4 15 M-OPT1-CODE PIC X VALUE 'H'. 15 M-OPT1-TEXT PIC X(40) VALUE "Account history". 10 M-OPT5 15 M-OPT1-CODE PIC X VALUE 'L'. 15 M-OPT1-TEXT PIC X(40) VALUE "List bank status". 10 M-OPT6 15 M-OPT1-CODE PIC X VALUE SPACES. 15 M-OPT1-TEXT PIC X(40) VALUE SPACES. 10 M-OPT7 15 M-OPT1-CODE PIC X VALUE 'Q'. 15 M-OPT1-TEXT PIC X(40) VALUE "Quit menu". 10 M-OPT99 15 M-OPT1-CODE PIC X VALUE LOW-VALUE. 15 M-OPT1-TEXT PIC X(40) VALUE LOW-VALUE. 01 WRK-VARS. 05 WRK-INPUT-VAR PIC X(10). 05 W-ACC-ID PIC 9(5). 05 W-AMOUNT PIC -ZZZ9.99. PROCEDURE DIVISION. PERFORM INIT-WORK. PERFORM MAIN-LOOP. GOBACK. INIT-WORK.

IC102A (0)

IC102A (2)

order

DN1 (Group) (1)

use

use

IC102A (3)

use

order

IC102A (4)

order

IC102A (6)

order

DN2 (Group) (5)

use

use

STOP RUN (7)

order

418248 (no-op, L268) (8)

order

IC105A (0)

IC105A (3)

order

MAIN-DN2 (Group) (1)

use

use

IC105A (4)

use

IC105A (5)

use

MAIN-DN1 (Group) (2)

use

use

use

use

order

order

IC104A (6)

order

STOP RUN (10)

order

GROUP-02 (Group) (7)

use

GROUP-01 (Group) (8)

use

ELEM-77 (Group) (9)

use

1757336 (no-op, L272) (11)

order

IC107A (0)

STOP RUN (4)

order

TABLE-1 (Group) (1)

use

IDN1 (Group) (2)

use

TABLE-2 (Group) (3)

use

650331 (no-op, L275) (5)

order

IC109A (0)

STOP RUN (2)

order

GRP-01 (Group) (1)

use

454282 (no-op, L270) (3)

order

IC110A (0)

STOP PROGRAM (3)

order

GRP-01 (Group) (1)

use

WS1 (Group) (2)

use

IC111A (0)

STOP PROGRAM (4)

order

LS1 (Group) (1)

use

WS2 (Group) (2)

use

GRP-01 (Group) (3)

use

IC113A (0)

STOP RUN (6)

order

SQ-FS3R1-F-G-120 (Group) (1)

use

IC113A (5)

use

ERROR-FLAG (Group) (2)

use use

WRK-CS-09V00 (Group) (3)

use use

RECORDS-IN-ERROR (Group) (4)

use use

order

214890 (no-op, L343) (7)

order order

IC115A (0)

IC115A (3)

order

FILE-REC-SQ-FS3 (Group) (1)

use

use

IC115A (4)

use

IC115A (5)

use

IC115A (6)

use

GROUP-LINKAGE-VARIABLES (Group) (2)

use

use

use

use

useorder

order

order

STOP RUN (7)

order

180074 (no-op, L290) (8)

order

IC117M (0)

STOP RUN (1)

order

833320 (no-op, L279) (2)

order

IC118M (0)

STOP PROGRAM (1)

order

IC202A (0)

ID1[IC202A] (5)

order

DN3 (Group) (1)

use

use

ID2[] (6)

use

ID2[] (7)

use

ID1[IC202A] (9)

use

IC202A (11)

use

IC202A (12)

use

ID1[IC202A] (13)

use

ID1[IC202A] (14)

use

DN2 (Group) (2)

use

use

use

use

IC202A (8)

use

use

ID1[IC202A] (10)

use

use

use

use

use

DN4 (Group) (3)

use

use

use

use

use

use

use

use

use

DN1 (Group) (4)

use

use

use

use

use

use

use

use

use

use

use

order

order

order

order

order

order

order

order

order

STOP RUN (15)

order

518479 (no-op, L270) (16)

order

ID1[IC204A] (0)

ID1[IC204A] (3)

order

TABLE-1 (Group) (1)

use

use

IC204A (5)

use

IC204A (7)

use

IC204A (8)

use

ID1[IC204A] (10)

use

IC204A (11)

use

ID1[IC204A] (14)

use

IC205A (16)

use

ID1[IC204A] (18)

use

DN1 (Group) (2)

use

use

use

use

use

use

use

use

use

order

DN5 (Group) (4)

use

CANCEL IC204A (6)

order

order

order

CANCEL ID1[IC204A] (9)

order

order

order

CANCEL ID1[IC204A] (12)

order

CANCEL ID1[IC204A] (13)

order

order

CANCEL IC205A (15)

order

order

order

TABLE-2 (Group) (17)

use

CANCEL ID1[IC204A] (19)

order

STOP RUN (20)

order

1359146 (no-op, L272) (21)

order

Library Usages Pattern Mining

Goal: Understand library usages to estimate modernisation effort

Page 5: Mining for Graph-Based Library Usage Patterns in COBOL Systems

COBOL Library Usage Pattern Mining

2

IDENTIFICATION DIVISION. PROGRAM-ID. BARTHOL. ENVIRONMENT DIVISION. CONFIGURATION SECTION. DATA DIVISION. WORKING-STORAGE SECTION. COPY TRANCTRL. COPY ACCCTRL. COPY SCREENIOV. 01 M-MENU. 05 M-SELECTION PIC X(1) VALUE SPACES. 05 M-TITLE PIC X(40) VALUE "Barthol Bank - Main menu". 05 M-OPTION. 10 M-OPT1 15 M-OPT1-CODE PIC X VALUE 'A'. 15 M-OPT1-TEXT PIC X(40) VALUE "Manage accounts". 10 M-OPT3 15 M-OPT1-CODE PIC X VALUE 'T'. 15 M-OPT1-TEXT PIC X(40) VALUE "Manage transactions". 10 M-OPT4 15 M-OPT1-CODE PIC X VALUE SPACES. 15 M-OPT1-TEXT PIC X(40) VALUE SPACES. 10 M-OPT5 15 M-OPT1-CODE PIC X VALUE 'Q'. 15 M-OPT1-TEXT PIC X(40) VALUE "Quit application". 10 M-OPT99 15 M-OPT1-CODE PIC X VALUE LOW-VALUE. 15 M-OPT1-TEXT PIC X(40) VALUE LOW-VALUE. 01 WRK-VARS. 05 WRK-INPUT-VAR PIC X(10). PROCEDURE DIVISION. DISPLAY "Starting". PERFORM INIT-WORK. PERFORM MAIN-LOOP. PERFORM CLOSE-WORK. DISPLAY "Done". GOBACK. INIT-WORK. CALL "TRANIO" USING TRAN-CTRL-BLK. CALL "ACCIO" USING ACC-CTRL-BLK. MAIN-LOOP. PERFORM WITH TEST AFTER UNTIL M-SELECTION='Q' PERFORM CLS CALL "RMENU" USING M-MENU EVALUATE M-SELECTION WHEN 'A' PERFORM MANAGE-ACCOUNTS WHEN 'T' PERFORM MANAGE-TRANSACTIONS END-EVALUATE END-PERFORM. MANAGE-ACCOUNTS. CALL "ACCMENU".

IDENTIFICATION DIVISION. PROGRAM-ID. ACCMENU. ENVIRONMENT DIVISION. CONFIGURATION SECTION. DATA DIVISION. WORKING-STORAGE SECTION. COPY TRANCTRL. COPY ACCCTRL. COPY SCREENIOV. COPY ACCOUNT REPLACING ==:PREFIX:== BY ====. COPY ACCOUNT REPLACING ==:PREFIX:== BY ==T-==. COPY TRANL REPLACING ==:PREFIX:== BY ====. COPY TRAN REPLACING ==:PREFIX:== BY ====. 01 M-MENU. 05 M-SELECTION PIC X(1) VALUE SPACES. 05 M-TITLE PIC X(40) VALUE "Barthol Bank - Account menu". 05 M-OPTION. 10 M-OPT1 15 M-OPT1-CODE PIC X VALUE 'C'. 15 M-OPT1-TEXT PIC X(40) VALUE "Create account". 10 M-OPT2 15 M-OPT1-CODE PIC X VALUE 'S'. 15 M-OPT1-TEXT PIC X(40) VALUE "Select account". 10 M-OPT3 15 M-OPT1-CODE PIC X VALUE 'E'. 15 M-OPT1-TEXT PIC X(40) VALUE "Edit account". 10 M-OPT35 15 M-OPT1-CODE PIC X VALUE 'D'. 15 M-OPT1-TEXT PIC X(40) VALUE "Delete account". 10 M-OPT4 15 M-OPT1-CODE PIC X VALUE 'H'. 15 M-OPT1-TEXT PIC X(40) VALUE "Account history". 10 M-OPT5 15 M-OPT1-CODE PIC X VALUE 'L'. 15 M-OPT1-TEXT PIC X(40) VALUE "List bank status". 10 M-OPT6 15 M-OPT1-CODE PIC X VALUE SPACES. 15 M-OPT1-TEXT PIC X(40) VALUE SPACES. 10 M-OPT7 15 M-OPT1-CODE PIC X VALUE 'Q'. 15 M-OPT1-TEXT PIC X(40) VALUE "Quit menu". 10 M-OPT99 15 M-OPT1-CODE PIC X VALUE LOW-VALUE. 15 M-OPT1-TEXT PIC X(40) VALUE LOW-VALUE. 01 WRK-VARS. 05 WRK-INPUT-VAR PIC X(10). 05 W-ACC-ID PIC 9(5). 05 W-AMOUNT PIC -ZZZ9.99. PROCEDURE DIVISION. PERFORM INIT-WORK. PERFORM MAIN-LOOP. GOBACK. INIT-WORK.

IC102A (0)

IC102A (2)

order

DN1 (Group) (1)

use

use

IC102A (3)

use

order

IC102A (4)

order

IC102A (6)

order

DN2 (Group) (5)

use

use

STOP RUN (7)

order

418248 (no-op, L268) (8)

order

IC105A (0)

IC105A (3)

order

MAIN-DN2 (Group) (1)

use

use

IC105A (4)

use

IC105A (5)

use

MAIN-DN1 (Group) (2)

use

use

use

use

order

order

IC104A (6)

order

STOP RUN (10)

order

GROUP-02 (Group) (7)

use

GROUP-01 (Group) (8)

use

ELEM-77 (Group) (9)

use

1757336 (no-op, L272) (11)

order

IC107A (0)

STOP RUN (4)

order

TABLE-1 (Group) (1)

use

IDN1 (Group) (2)

use

TABLE-2 (Group) (3)

use

650331 (no-op, L275) (5)

order

IC109A (0)

STOP RUN (2)

order

GRP-01 (Group) (1)

use

454282 (no-op, L270) (3)

order

IC110A (0)

STOP PROGRAM (3)

order

GRP-01 (Group) (1)

use

WS1 (Group) (2)

use

IC111A (0)

STOP PROGRAM (4)

order

LS1 (Group) (1)

use

WS2 (Group) (2)

use

GRP-01 (Group) (3)

use

IC113A (0)

STOP RUN (6)

order

SQ-FS3R1-F-G-120 (Group) (1)

use

IC113A (5)

use

ERROR-FLAG (Group) (2)

use use

WRK-CS-09V00 (Group) (3)

use use

RECORDS-IN-ERROR (Group) (4)

use use

order

214890 (no-op, L343) (7)

order order

IC115A (0)

IC115A (3)

order

FILE-REC-SQ-FS3 (Group) (1)

use

use

IC115A (4)

use

IC115A (5)

use

IC115A (6)

use

GROUP-LINKAGE-VARIABLES (Group) (2)

use

use

use

use

useorder

order

order

STOP RUN (7)

order

180074 (no-op, L290) (8)

order

IC117M (0)

STOP RUN (1)

order

833320 (no-op, L279) (2)

order

IC118M (0)

STOP PROGRAM (1)

order

IC202A (0)

ID1[IC202A] (5)

order

DN3 (Group) (1)

use

use

ID2[] (6)

use

ID2[] (7)

use

ID1[IC202A] (9)

use

IC202A (11)

use

IC202A (12)

use

ID1[IC202A] (13)

use

ID1[IC202A] (14)

use

DN2 (Group) (2)

use

use

use

use

IC202A (8)

use

use

ID1[IC202A] (10)

use

use

use

use

use

DN4 (Group) (3)

use

use

use

use

use

use

use

use

use

DN1 (Group) (4)

use

use

use

use

use

use

use

use

use

use

use

order

order

order

order

order

order

order

order

order

STOP RUN (15)

order

518479 (no-op, L270) (16)

order

ID1[IC204A] (0)

ID1[IC204A] (3)

order

TABLE-1 (Group) (1)

use

use

IC204A (5)

use

IC204A (7)

use

IC204A (8)

use

ID1[IC204A] (10)

use

IC204A (11)

use

ID1[IC204A] (14)

use

IC205A (16)

use

ID1[IC204A] (18)

use

DN1 (Group) (2)

use

use

use

use

use

use

use

use

use

order

DN5 (Group) (4)

use

CANCEL IC204A (6)

order

order

order

CANCEL ID1[IC204A] (9)

order

order

order

CANCEL ID1[IC204A] (12)

order

CANCEL ID1[IC204A] (13)

order

order

CANCEL IC205A (15)

order

order

order

TABLE-2 (Group) (17)

use

CANCEL ID1[IC204A] (19)

order

STOP RUN (20)

order

1359146 (no-op, L272) (21)

order

Library Usages Pattern Mining Library Usage Patterns

V1[C1] CANCEL V1[C1] C2 GOBACK

order order order

orderD3 (Group)D2 (Group)

D1 (Group)

use

use use

D5 (Group) D6 (Group)D4 (Group) D7 (Group)

use use

use

use

Goal: Understand library usages to estimate modernisation effort

Page 6: Mining for Graph-Based Library Usage Patterns in COBOL Systems

COBOL Library Usage Pattern Mining

2

IDENTIFICATION DIVISION. PROGRAM-ID. BARTHOL. ENVIRONMENT DIVISION. CONFIGURATION SECTION. DATA DIVISION. WORKING-STORAGE SECTION. COPY TRANCTRL. COPY ACCCTRL. COPY SCREENIOV. 01 M-MENU. 05 M-SELECTION PIC X(1) VALUE SPACES. 05 M-TITLE PIC X(40) VALUE "Barthol Bank - Main menu". 05 M-OPTION. 10 M-OPT1 15 M-OPT1-CODE PIC X VALUE 'A'. 15 M-OPT1-TEXT PIC X(40) VALUE "Manage accounts". 10 M-OPT3 15 M-OPT1-CODE PIC X VALUE 'T'. 15 M-OPT1-TEXT PIC X(40) VALUE "Manage transactions". 10 M-OPT4 15 M-OPT1-CODE PIC X VALUE SPACES. 15 M-OPT1-TEXT PIC X(40) VALUE SPACES. 10 M-OPT5 15 M-OPT1-CODE PIC X VALUE 'Q'. 15 M-OPT1-TEXT PIC X(40) VALUE "Quit application". 10 M-OPT99 15 M-OPT1-CODE PIC X VALUE LOW-VALUE. 15 M-OPT1-TEXT PIC X(40) VALUE LOW-VALUE. 01 WRK-VARS. 05 WRK-INPUT-VAR PIC X(10). PROCEDURE DIVISION. DISPLAY "Starting". PERFORM INIT-WORK. PERFORM MAIN-LOOP. PERFORM CLOSE-WORK. DISPLAY "Done". GOBACK. INIT-WORK. CALL "TRANIO" USING TRAN-CTRL-BLK. CALL "ACCIO" USING ACC-CTRL-BLK. MAIN-LOOP. PERFORM WITH TEST AFTER UNTIL M-SELECTION='Q' PERFORM CLS CALL "RMENU" USING M-MENU EVALUATE M-SELECTION WHEN 'A' PERFORM MANAGE-ACCOUNTS WHEN 'T' PERFORM MANAGE-TRANSACTIONS END-EVALUATE END-PERFORM. MANAGE-ACCOUNTS. CALL "ACCMENU".

IDENTIFICATION DIVISION. PROGRAM-ID. ACCMENU. ENVIRONMENT DIVISION. CONFIGURATION SECTION. DATA DIVISION. WORKING-STORAGE SECTION. COPY TRANCTRL. COPY ACCCTRL. COPY SCREENIOV. COPY ACCOUNT REPLACING ==:PREFIX:== BY ====. COPY ACCOUNT REPLACING ==:PREFIX:== BY ==T-==. COPY TRANL REPLACING ==:PREFIX:== BY ====. COPY TRAN REPLACING ==:PREFIX:== BY ====. 01 M-MENU. 05 M-SELECTION PIC X(1) VALUE SPACES. 05 M-TITLE PIC X(40) VALUE "Barthol Bank - Account menu". 05 M-OPTION. 10 M-OPT1 15 M-OPT1-CODE PIC X VALUE 'C'. 15 M-OPT1-TEXT PIC X(40) VALUE "Create account". 10 M-OPT2 15 M-OPT1-CODE PIC X VALUE 'S'. 15 M-OPT1-TEXT PIC X(40) VALUE "Select account". 10 M-OPT3 15 M-OPT1-CODE PIC X VALUE 'E'. 15 M-OPT1-TEXT PIC X(40) VALUE "Edit account". 10 M-OPT35 15 M-OPT1-CODE PIC X VALUE 'D'. 15 M-OPT1-TEXT PIC X(40) VALUE "Delete account". 10 M-OPT4 15 M-OPT1-CODE PIC X VALUE 'H'. 15 M-OPT1-TEXT PIC X(40) VALUE "Account history". 10 M-OPT5 15 M-OPT1-CODE PIC X VALUE 'L'. 15 M-OPT1-TEXT PIC X(40) VALUE "List bank status". 10 M-OPT6 15 M-OPT1-CODE PIC X VALUE SPACES. 15 M-OPT1-TEXT PIC X(40) VALUE SPACES. 10 M-OPT7 15 M-OPT1-CODE PIC X VALUE 'Q'. 15 M-OPT1-TEXT PIC X(40) VALUE "Quit menu". 10 M-OPT99 15 M-OPT1-CODE PIC X VALUE LOW-VALUE. 15 M-OPT1-TEXT PIC X(40) VALUE LOW-VALUE. 01 WRK-VARS. 05 WRK-INPUT-VAR PIC X(10). 05 W-ACC-ID PIC 9(5). 05 W-AMOUNT PIC -ZZZ9.99. PROCEDURE DIVISION. PERFORM INIT-WORK. PERFORM MAIN-LOOP. GOBACK. INIT-WORK.

IC102A (0)

IC102A (2)

order

DN1 (Group) (1)

use

use

IC102A (3)

use

order

IC102A (4)

order

IC102A (6)

order

DN2 (Group) (5)

use

use

STOP RUN (7)

order

418248 (no-op, L268) (8)

order

IC105A (0)

IC105A (3)

order

MAIN-DN2 (Group) (1)

use

use

IC105A (4)

use

IC105A (5)

use

MAIN-DN1 (Group) (2)

use

use

use

use

order

order

IC104A (6)

order

STOP RUN (10)

order

GROUP-02 (Group) (7)

use

GROUP-01 (Group) (8)

use

ELEM-77 (Group) (9)

use

1757336 (no-op, L272) (11)

order

IC107A (0)

STOP RUN (4)

order

TABLE-1 (Group) (1)

use

IDN1 (Group) (2)

use

TABLE-2 (Group) (3)

use

650331 (no-op, L275) (5)

order

IC109A (0)

STOP RUN (2)

order

GRP-01 (Group) (1)

use

454282 (no-op, L270) (3)

order

IC110A (0)

STOP PROGRAM (3)

order

GRP-01 (Group) (1)

use

WS1 (Group) (2)

use

IC111A (0)

STOP PROGRAM (4)

order

LS1 (Group) (1)

use

WS2 (Group) (2)

use

GRP-01 (Group) (3)

use

IC113A (0)

STOP RUN (6)

order

SQ-FS3R1-F-G-120 (Group) (1)

use

IC113A (5)

use

ERROR-FLAG (Group) (2)

use use

WRK-CS-09V00 (Group) (3)

use use

RECORDS-IN-ERROR (Group) (4)

use use

order

214890 (no-op, L343) (7)

order order

IC115A (0)

IC115A (3)

order

FILE-REC-SQ-FS3 (Group) (1)

use

use

IC115A (4)

use

IC115A (5)

use

IC115A (6)

use

GROUP-LINKAGE-VARIABLES (Group) (2)

use

use

use

use

useorder

order

order

STOP RUN (7)

order

180074 (no-op, L290) (8)

order

IC117M (0)

STOP RUN (1)

order

833320 (no-op, L279) (2)

order

IC118M (0)

STOP PROGRAM (1)

order

IC202A (0)

ID1[IC202A] (5)

order

DN3 (Group) (1)

use

use

ID2[] (6)

use

ID2[] (7)

use

ID1[IC202A] (9)

use

IC202A (11)

use

IC202A (12)

use

ID1[IC202A] (13)

use

ID1[IC202A] (14)

use

DN2 (Group) (2)

use

use

use

use

IC202A (8)

use

use

ID1[IC202A] (10)

use

use

use

use

use

DN4 (Group) (3)

use

use

use

use

use

use

use

use

use

DN1 (Group) (4)

use

use

use

use

use

use

use

use

use

use

use

order

order

order

order

order

order

order

order

order

STOP RUN (15)

order

518479 (no-op, L270) (16)

order

ID1[IC204A] (0)

ID1[IC204A] (3)

order

TABLE-1 (Group) (1)

use

use

IC204A (5)

use

IC204A (7)

use

IC204A (8)

use

ID1[IC204A] (10)

use

IC204A (11)

use

ID1[IC204A] (14)

use

IC205A (16)

use

ID1[IC204A] (18)

use

DN1 (Group) (2)

use

use

use

use

use

use

use

use

use

order

DN5 (Group) (4)

use

CANCEL IC204A (6)

order

order

order

CANCEL ID1[IC204A] (9)

order

order

order

CANCEL ID1[IC204A] (12)

order

CANCEL ID1[IC204A] (13)

order

order

CANCEL IC205A (15)

order

order

order

TABLE-2 (Group) (17)

use

CANCEL ID1[IC204A] (19)

order

STOP RUN (20)

order

1359146 (no-op, L272) (21)

order

Library Usages Pattern Mining Library Usage Patterns

V1[C1] CANCEL V1[C1] C2 GOBACK

order order order

orderD3 (Group)D2 (Group)

D1 (Group)

use

use use

D5 (Group) D6 (Group)D4 (Group) D7 (Group)

use use

use

use

Goal: Understand library usages to estimate modernisation effort

Page 7: Mining for Graph-Based Library Usage Patterns in COBOL Systems

Groums for OO languages

3

Collection.stream()List Predicate

Stream.filter(Predicate)Stream

Stream.count() Streamlong

order

order

use

use

use

use

def

def

def

void my_method(List<String> lst) { return lst.stream() .filter(s -> s.startsWith("test_")) .count(); }

Graph-based object usage modelJava snippet using Collection API

T.T. Nguyen, H.A. Nguyen, N.H. Pham, J.M. Al-Kofahi, T.N. Nguyen. Graph-Based Mining of Multiple Object Usage Patterns. In Proc. ESEC/FSE’09, 2009.

Page 8: Mining for Graph-Based Library Usage Patterns in COBOL Systems

Groums for OO languages

3

Collection.stream()List Predicate

Stream.filter(Predicate)Stream

Stream.count() Streamlong

order

order

use

use

use

use

def

def

def

void my_method(List<String> lst) { return lst.stream() .filter(s -> s.startsWith("test_")) .count(); }

Graph-based object usage modelJava snippet using Collection API

Library calls

Control flow

T.T. Nguyen, H.A. Nguyen, N.H. Pham, J.M. Al-Kofahi, T.N. Nguyen. Graph-Based Mining of Multiple Object Usage Patterns. In Proc. ESEC/FSE’09, 2009.

Page 9: Mining for Graph-Based Library Usage Patterns in COBOL Systems

Groums for OO languages

3

Collection.stream()List Predicate

Stream.filter(Predicate)Stream

Stream.count() Streamlong

order

order

use

use

use

use

def

def

def

void my_method(List<String> lst) { return lst.stream() .filter(s -> s.startsWith("test_")) .count(); }

Graph-based object usage modelJava snippet using Collection API

Data

Data flow

T.T. Nguyen, H.A. Nguyen, N.H. Pham, J.M. Al-Kofahi, T.N. Nguyen. Graph-Based Mining of Multiple Object Usage Patterns. In Proc. ESEC/FSE’09, 2009.

Page 10: Mining for Graph-Based Library Usage Patterns in COBOL Systems

IDENTIFICATION DIVISION. PROGRAM-ID. exc. PROCEDURE DIVISION. P0. DISPLAY "--- Start ---". CALL "logger". PERFORM P1 THRU P3.

PZ. CALL "loggerZ". DISPLAY "--- End ---". STOP RUN. P3. CALL "loggerB" P1. CALL "loggerA". GO TO P3.

COBOL Primer

4

Page 11: Mining for Graph-Based Library Usage Patterns in COBOL Systems

IDENTIFICATION DIVISION. PROGRAM-ID. exc. PROCEDURE DIVISION. P0. DISPLAY "--- Start ---". CALL "logger". PERFORM P1 THRU P3.

PZ. CALL "loggerZ". DISPLAY "--- End ---". STOP RUN. P3. CALL "loggerB" P1. CALL "loggerA". GO TO P3.

COBOL Primer

5

Paragraphs

Page 12: Mining for Graph-Based Library Usage Patterns in COBOL Systems

IDENTIFICATION DIVISION. PROGRAM-ID. exc. PROCEDURE DIVISION. P0. DISPLAY "--- Start ---". CALL "logger". PERFORM P1 THRU P3.

PZ. CALL "loggerZ". DISPLAY "--- End ---". STOP RUN. P3. CALL "loggerB" P1. CALL "loggerA". GO TO P3.

COBOL Primer

6

Call statement Calls an external program

Page 13: Mining for Graph-Based Library Usage Patterns in COBOL Systems

IDENTIFICATION DIVISION. PROGRAM-ID. exc. PROCEDURE DIVISION. P0. DISPLAY "--- Start ---". CALL "logger". PERFORM P1 THRU P3.

PZ. CALL "loggerZ". DISPLAY "--- End ---". STOP RUN. P3. CALL "loggerB" P1. CALL "loggerA". GO TO P3.

COBOL Primer

7

Perform statement

Go to statement

Jumping leads to complex control flow

Page 14: Mining for Graph-Based Library Usage Patterns in COBOL Systems

Challenges

8

Challenge SolutionDefinition of a library Marked by the user

Granularity of Groums Whole programInter-paragraph control flow Graph inlining

Iteration through GOTO Inlining stackProgram termination Auxiliary nodes

Inter-program data flow Only data usagesProgram name in variable Special encoding

Language support tradeoffs Focus on control flow constructs

Details in the paper

Page 15: Mining for Graph-Based Library Usage Patterns in COBOL Systems

IDENTIFICATION DIVISION. PROGRAM-ID. exc. PROCEDURE DIVISION. P0. DISPLAY "--- Start ---". CALL "logger". PERFORM P1 THRU P3.

PZ. CALL "loggerZ". DISPLAY "--- End ---". STOP RUN. P3. CALL "loggerB" P1. CALL "loggerA". GO TO P3.

Groums for COBOL

9

Single paragraph (often) too small

Page 16: Mining for Graph-Based Library Usage Patterns in COBOL Systems

IDENTIFICATION DIVISION. PROGRAM-ID. exc. PROCEDURE DIVISION. P0. DISPLAY "--- Start ---". CALL "logger". PERFORM P1 THRU P3.

PZ. CALL "loggerZ". DISPLAY "--- End ---". STOP RUN. P3. CALL "loggerB" P1. CALL "loggerA". GO TO P3.

Groums for COBOL

9

Single paragraph (often) too small

Inter-paragraph Groums

Page 17: Mining for Graph-Based Library Usage Patterns in COBOL Systems

IDENTIFICATION DIVISION. PROGRAM-ID. exc. PROCEDURE DIVISION. P0. DISPLAY "--- Start ---". CALL "logger". PERFORM P1 THRU P3.

PZ. CALL "loggerZ". DISPLAY "--- End ---". STOP RUN. P3. CALL "loggerB" P1. CALL "loggerA". GO TO P3.

Groums for COBOL

9

Single paragraph (often) too small

Inter-paragraph Groums

Inter-paragraph control flow

Page 18: Mining for Graph-Based Library Usage Patterns in COBOL Systems

IDENTIFICATION DIVISION. PROGRAM-ID. exc. PROCEDURE DIVISION. P0. DISPLAY "--- Start ---". CALL "logger". PERFORM P1 THRU P3.

PZ. CALL "loggerZ". DISPLAY "--- End ---". STOP RUN. P3. CALL "loggerB" P1. CALL "loggerA". GO TO P3.

Groums for COBOL

9

Single paragraph (often) too small

Inter-paragraph Groums

Inter-paragraph control flow

⇒Graph inlining

Page 19: Mining for Graph-Based Library Usage Patterns in COBOL Systems

IDENTIFICATION DIVISION. PROGRAM-ID. exc. PROCEDURE DIVISION. P0. DISPLAY "--- Start ---". CALL "logger". PERFORM P1 THRU P3.

PZ. CALL "loggerZ". DISPLAY "--- End ---". STOP RUN. P3. CALL "loggerB" P1. CALL "loggerA". GO TO P3.

Phase 1: Intermediate Groums

10

logger

PERFORM P1 THRU P3

GO TO After_P0

order

order

loggerZ

STOP RUN

order

loggerB

GO TO After_P3

order

loggerA

GO TO P3

order

Page 20: Mining for Graph-Based Library Usage Patterns in COBOL Systems

Phase 1: Intermediate Groums

11

logger

PERFORM P1 THRU P3

GO TO After_P0

order

order

loggerZ

STOP RUN

order

loggerB

GO TO After_P3

order

loggerA

GO TO P3

order

Explicit Jumps Represent PERFORM/GOTO statements

Page 21: Mining for Graph-Based Library Usage Patterns in COBOL Systems

Phase 1: Intermediate Groums

12

logger

PERFORM P1 THRU P3

GO TO After_P0

order

order

loggerZ

STOP RUN

order

loggerB

GO TO After_P3

order

loggerA

GO TO P3

order

Implicit Jumps End of the paragraph

Page 22: Mining for Graph-Based Library Usage Patterns in COBOL Systems

Phase 1: Intermediate Groums

13

logger

PERFORM P1 THRU P3

GO TO After_P0

order

order

loggerZ

STOP RUN

order

loggerB

GO TO After_P3

order

loggerA

GO TO P3

order

Termination Node STOP RUN/GOBACK/…

Page 23: Mining for Graph-Based Library Usage Patterns in COBOL Systems

Phase 2: Groum Inlining

14

logger

PERFORM P1 THRU P3

GO TO After_P0

order

order

loggerZ

STOP RUN

order

loggerB

GO TO After_P3

order

loggerA

GO TO P3

order

IDENTIFICATION DIVISION. PROGRAM-ID. exc. PROCEDURE DIVISION. P0. DISPLAY "--- Start ---". CALL "logger". PERFORM P1 THRU P3.

PZ. CALL "loggerZ". DISPLAY "--- End ---". STOP RUN. P3. CALL "loggerB" P1. CALL "loggerA". GO TO P3.

Page 24: Mining for Graph-Based Library Usage Patterns in COBOL Systems

Phase 2: Groum Inlining

15

logger

loggerA

loggerB

loggerZ

STOP RUN

order

order

order

order

IDENTIFICATION DIVISION. PROGRAM-ID. exc. PROCEDURE DIVISION. P0. DISPLAY "--- Start ---". CALL "logger". PERFORM P1 THRU P3.

PZ. CALL "loggerZ". DISPLAY "--- End ---". STOP RUN. P3. CALL "loggerB" P1. CALL "loggerA". GO TO P3.

Page 25: Mining for Graph-Based Library Usage Patterns in COBOL Systems

RQ1: Does Extraction Scale?

16

Setup

All CALL statements considered library calls

Real-world, industrial COBOL cases from

Raincode Labs

>10GB of ASTs

Page 26: Mining for Graph-Based Library Usage Patterns in COBOL Systems

RQ1: Does Extraction Scale?

16

Setup

All CALL statements considered library calls

Real-world, industrial COBOL cases from

Raincode Labs

Results

Max. 100 call nodes

Nodes + edges>10GB of ASTs

Page 27: Mining for Graph-Based Library Usage Patterns in COBOL Systems

RQ2: Does Mining Scale? (Setup)

17

Graph-based Mining of Multiple Object Usage Patterns

Tung Thanh [email protected]

Hoan Anh [email protected]

Nam H. [email protected]

Jafar M. [email protected]

Tien N. [email protected]

Electrical and Computer Engineering DepartmentIowa State University

ABSTRACTThe interplay of multiple objects in object-oriented pro-gramming often follows specific protocols, for example cer-tain orders of method calls and/or control structure con-straints among them that are parts of the intended objectusages. Unfortunately, the information is not always docu-mented. That creates long learning curve, and importantly,leads to subtle problems due to the misuse of objects.

In this paper, we propose GrouMiner, a novel graph-basedapproach for mining the usage patterns of one or multipleobjects. GrouMiner approach includes a graph-based repre-sentation for multiple object usages, a pattern mining algo-rithm, and an anomaly detection technique that are efficient,accurate, and resilient to software changes. Our experimentson several real-world programs show that our prototype isable to find useful usage patterns with multiple objects andcontrol structures, and to translate them into user-friendlycode skeletons to assist developers in programming. It couldalso detect the usage anomalies that caused yet undiscovereddefects and code smells in those programs.

Categories and Subject DescriptorsD.2.7 [Software Engineering]: Distribution, Maintenance,and Enhancement

General TermsAlgorithms, Design, Reliability

1. INTRODUCTIONIn object-oriented programming, developers must deal with

multiple objects of the same or different classes. Objectsinteract with one another via their provided methods andfields. The interplay of several objects, which involves ob-jects’ fields/method calls and the control flow among them,often follows certain orders or control structure constraintsthat are parts of the intended usages of corresponding classes.

Permission to make digital or hard copies of all or part of this work forpersonal or classroom use is granted without fee provided that copies arenot made or distributed for profit or commercial advantage and that copiesbear this notice and the full citation on the first page. To copy otherwise, torepublish, to post on servers or to redistribute to lists, requires prior specificpermission and/or a fee.ESEC-FSE’09, August 24–28, 2009, Amsterdam, The Netherlands.Copyright 2009 ACM 978-1-60558-001-2/09/08 ...$10.00.

In team development, newly introduced program-specificAPIs by one or more team members often lack of usage doc-umentation due to busy schedules. Other developers haveto look through new code to understand the programmingusages. This is a very inefficient, confusing, and error-proneprocess. Developers often do not know where to start. Evenworse, some of them do not properly use newly introducedclasses, leading to errors. Moreover, specific orders and/orcontrol flows of objects’ method calls cannot be checked atcompile time. As a consequence, errors could not be caughtuntil testing and even go unnoticed for a long time. Thesealso occur often in the case of general API usages.

In this paper, we propose GrouMiner, a new approachfor mining the usage patterns of objects and classes usinggraph-based algorithms. In our approach, the usage of a setof objects in a scenario is represented as a labeled, directedacyclic graph (DAG), of which nodes represent objects’ con-structor calls, method calls, field accesses, and branchingpoints of control structures, and edges represent temporalusage orders and data dependencies among them.

A usage pattern is considered as a (sub)graph that fre-quently “appears” in the object usage graphs extracted fromall methods in the code base. Appearance here means thatit is label-isomorphic to an induced subgraph of each objectusage graph, i.e. satisfying all temporal orders and data de-pendencies between the corresponding nodes in that graph.

GrouMiner detects those patterns using a novel graph-based algorithm for mining the frequent induced subgraphsin a graph dataset. The patterns are generated increasinglyby their sizes (i.e. the number of nodes). Each pattern Q ofsize k+1 is discovered from a pattern P of size k via extend-ing the occurrence(s) of P in every method’s graph G in thedataset with relevant nodes of G. The generated subgraphsare then compared to find isomorphic ones. To avoid thecomputational cost of graph isomorphism solutions, we useExas [24], our efficient structural feature extraction methodfor graph-based structures to extract a characteristic vec-tor for each subgraph. It is an occurrence-count vector ofsequences of nodes and edges’ labels. The generated sub-graphs having the same vector are considered isomorphicand counted toward the frequency of the corresponding can-didate. If it exceeds a threshold, the candidate is consideredas a pattern and is used to discover the larger patterns.

After the patterns are mined, they could be translatedinto user-friendly code skeletons, which assist developers inobject usages. The patterns that are confirmed by the devel-opers as typical usages could also be used to automaticallydetect the locations in programs that deviate from them. A

CorporaCase # Groums Min. support

Case 1 273 30Case 2 (full) 3925 400Case 2 (limited) 3573 400

#Unique programs

Patterns with ≥ 2 call nodes

GrouMiner • Apriori pattern growth • GrouMiner*: Memory optimisations

BigGroum • Clustering, slicing, embeddings

via SAT formulae • Scales better than GrouMiner

Mining Framework Usage Graphs from AppCorpora

Sergio Mover, Sriram Sankaranarayanan, Rhys Braginton Pettee Olsen, Bor-Yuh Evan ChangUniversity of Colorado Boulder, USA

Abstract—We investigate the problem of mining graph-

based usage patterns for large, object-oriented frameworks like

Android—revisiting previous approaches based on graph-based

object usage models (groums). Groums are a promising approach

to represent usage patterns for object-oriented libraries because

they simultaneously describe control flow and data dependencies

between methods of multiple interacting object types. However,

this expressivity comes at a cost: mining groums requires solving

a subgraph isomorphism problem that is well known to be

expensive. This cost limits the applicability of groum mining to

large API frameworks.

In this paper, we employ groum mining to learn usage patterns

for object-oriented frameworks from program corpora. The

central challenge is to scale groum mining so that it is sensitive

to usages horizontally across programs from arbitrarily many

developers (as opposed to simply usages vertically within the

program of a single developer). To address this challenge, we

develop a novel groum mining algorithm that scales on a large

corpus of programs. We first use frequent itemset mining to

restrict the search for groums to smaller subsets of methods in

the given corpus. Then, we pose the subgraph isomorphism as

a SAT problem and apply efficient pre-processing algorithms

to rule out fruitless comparisons ahead of time. Finally, we

identify containment relationships between clusters of groums

to characterize popular usage patterns in the corpus (as well

as classify less popular patterns as possible anomalies). We find

that our approach scales on a corpus of over five hundred open

source Android applications, effectively mining obligatory and

best-practice usage patterns.

I. INTRODUCTION

We consider the problem of mining graph-based descrip-tions of application programming interface (API) usage pat-terns from large software repositories. Such usage patterns areinvaluable in understanding how the API is typically used bydevelopers and help highlight anomalous usage. This workcan in turn lead to automatic approaches to detecting potentialdefects, automatic code completion, and automatic repair. TheAPI usage mining problem has been well-studied in the pasttwo decades with numerous proposed approaches (e.g., [1],[2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12], [13], [14]).The work continues, as each approach has an expressivityversus efficiency tradeoff. As the API usage patterns get moreexpressive, the problem of capturing common patterns fromlarge repositories becomes ever harder. The ideal usage patterndescribes (a) the control flow of the various methods called,including control structures such as branches and loops; and(b) the data flow between how an object created by one methodcall gets utilized in another call. Commonly used APIs suchas Android involve a large collection of methods and API

usage protocols, each involving multiple object types as wellas constraints on both the control and data flows.

Graph-based models of API usage are therefore very naturalfor such applications. A previous work [15] considered theproblem of mining program-specific usage patterns usingGRaph-based Object Usage Models (groums), an abstractedcontrol-flow graph with data dependency edges. In a groum,nodes represent API calls (or control structures) and edgesrepresent data dependencies or control flow between the nodes.

While groums are a natural and promising representation forexpressing API usage, the main unsolved challenge in applyinggroum mining for finding framework usage patterns is to effi-ciently scale on a large corpora of Android applications (apps).Efficiently mining groums from a large corpus is challengingbecause of the cost of computing subgraph isomorphisms, thebasic operation used to compute graph patterns. To illustratethis challenge, we tried to run the GrouMiner tool [15] onover five hundred Android projects with over 70,000 methodsdownloaded from GitHub. While this was never the intendedapplication of GrouMiner, this approach also did not produceresults despite our running it over 3 days.

To address this challenge, we make the following contribu-tions:

• We combine frequent itemset mining with groum miningto restrict the search for relevant groums to smaller subsetsof API methods (Section II). At a high level, we first use themined itemsets as a basis for partitioning the entire corpus ofgroums into smaller clusters.

• We use a SAT-based encoding, coupled with filteringtechniques to avoid unnecessary solver calls (Section IV), forcomputing subgraph isomorphisms, the main operation usedto construct the lattice-ordered bins of groums.

• We organize the groums of each sub-corpus defined by aitemset cluster into lattice-ordered bins of groums (logroums),using containment-based subsumption relation to define theorder of the lattice. Groum bins can then be labeled as POPU-LAR, ANOMALOUS, or ISOLATED usage patterns (Section V)according to their frequency of occurrence in the corpus andthe ordering relationship with other bins. Using the mineditemsets, we then further slice the graphs according to theitems (API calls), to compute subgraph isomorphisms onsmaller instances.

• We evaluate our approach, BIGGROUM, on a corpusconsisting of over five hundred Android apps (Section VI).We first show that BIGGROUM is efficient, being able to minethe patterns for the entire corpus of apps, and that our con-

Page 28: Mining for Graph-Based Library Usage Patterns in COBOL Systems

RQ2: Does Mining Scale? (Results)

18

CaseGrouMiner* BigGroum

Time (min) # P Time (min) # Clusters # P

Case 1 0.95 3 10.4 3 4Case 2 (full) - - - - -Case 2 (limited) 121.8 7 618.2 1 0

xFailed: Graphs

too large

Library usages Frequent itemsetmining

Cluster #1

Cluster #2

Cluster #n

...

SAT solver

SAT solver

SAT solver

Library usagepatterns #1

Library usagepatterns #2

Library usagepatterns #n

BigGroum

Page 29: Mining for Graph-Based Library Usage Patterns in COBOL Systems

Conclusion

19

COBOL Library Usage Pattern Mining2

IDENTIFICATION DIVISION. PROGRAM-ID. BARTHOL. ENVIRONMENT DIVISION. CONFIGURATION SECTION. DATA DIVISION. WORKING-STORAGE SECTION. COPY TRANCTRL. COPY ACCCTRL. COPY SCREENIOV. 01 M-MENU. 05 M-SELECTION PIC X(1) VALUE SPACES. 05 M-TITLE PIC X(40) VALUE "Barthol Bank - Main menu". 05 M-OPTION. 10 M-OPT1 15 M-OPT1-CODE PIC X VALUE 'A'. 15 M-OPT1-TEXT PIC X(40) VALUE "Manage accounts". 10 M-OPT3 15 M-OPT1-CODE PIC X VALUE 'T'. 15 M-OPT1-TEXT PIC X(40) VALUE "Manage transactions". 10 M-OPT4 15 M-OPT1-CODE PIC X VALUE SPACES. 15 M-OPT1-TEXT PIC X(40) VALUE SPACES. 10 M-OPT5 15 M-OPT1-CODE PIC X VALUE 'Q'. 15 M-OPT1-TEXT PIC X(40) VALUE "Quit application". 10 M-OPT99 15 M-OPT1-CODE PIC X VALUE LOW-VALUE. 15 M-OPT1-TEXT PIC X(40) VALUE LOW-VALUE. 01 WRK-VARS. 05 WRK-INPUT-VAR PIC X(10). PROCEDURE DIVISION. DISPLAY "Starting". PERFORM INIT-WORK. PERFORM MAIN-LOOP. PERFORM CLOSE-WORK. DISPLAY "Done". GOBACK. INIT-WORK. CALL "TRANIO" USING TRAN-CTRL-BLK. CALL "ACCIO" USING ACC-CTRL-BLK. MAIN-LOOP. PERFORM WITH TEST AFTER UNTIL M-SELECTION='Q' PERFORM CLS CALL "RMENU" USING M-MENU EVALUATE M-SELECTION WHEN 'A' PERFORM MANAGE-ACCOUNTS WHEN 'T' PERFORM MANAGE-TRANSACTIONS END-EVALUATE END-PERFORM. MANAGE-ACCOUNTS. CALL "ACCMENU".

IDENTIFICATION DIVISION. PROGRAM-ID. ACCMENU. ENVIRONMENT DIVISION. CONFIGURATION SECTION. DATA DIVISION. WORKING-STORAGE SECTION. COPY TRANCTRL. COPY ACCCTRL. COPY SCREENIOV. COPY ACCOUNT REPLACING ==:PREFIX:== BY ====. COPY ACCOUNT REPLACING ==:PREFIX:== BY ==T-==. COPY TRANL REPLACING ==:PREFIX:== BY ====. COPY TRAN REPLACING ==:PREFIX:== BY ====. 01 M-MENU. 05 M-SELECTION PIC X(1) VALUE SPACES. 05 M-TITLE PIC X(40) VALUE "Barthol Bank - Account menu". 05 M-OPTION. 10 M-OPT1 15 M-OPT1-CODE PIC X VALUE 'C'. 15 M-OPT1-TEXT PIC X(40) VALUE "Create account". 10 M-OPT2 15 M-OPT1-CODE PIC X VALUE 'S'. 15 M-OPT1-TEXT PIC X(40) VALUE "Select account". 10 M-OPT3 15 M-OPT1-CODE PIC X VALUE 'E'. 15 M-OPT1-TEXT PIC X(40) VALUE "Edit account". 10 M-OPT35 15 M-OPT1-CODE PIC X VALUE 'D'. 15 M-OPT1-TEXT PIC X(40) VALUE "Delete account". 10 M-OPT4 15 M-OPT1-CODE PIC X VALUE 'H'. 15 M-OPT1-TEXT PIC X(40) VALUE "Account history". 10 M-OPT5 15 M-OPT1-CODE PIC X VALUE 'L'. 15 M-OPT1-TEXT PIC X(40) VALUE "List bank status". 10 M-OPT6 15 M-OPT1-CODE PIC X VALUE SPACES. 15 M-OPT1-TEXT PIC X(40) VALUE SPACES. 10 M-OPT7 15 M-OPT1-CODE PIC X VALUE 'Q'. 15 M-OPT1-TEXT PIC X(40) VALUE "Quit menu". 10 M-OPT99 15 M-OPT1-CODE PIC X VALUE LOW-VALUE. 15 M-OPT1-TEXT PIC X(40) VALUE LOW-VALUE. 01 WRK-VARS. 05 WRK-INPUT-VAR PIC X(10). 05 W-ACC-ID PIC 9(5). 05 W-AMOUNT PIC -ZZZ9.99. PROCEDURE DIVISION. PERFORM INIT-WORK. PERFORM MAIN-LOOP. GOBACK. INIT-WORK.

IC102A (0)

IC102A (2)

order

DN1 (Group) (1)

use

use

IC102A (3)

use

order

IC102A (4)

order

IC102A (6)

order

DN2 (Group) (5)

use

use

STOP RUN (7)

order

418248 (no-op, L268) (8)

order

IC105A (0)

IC105A (3)

order

MAIN-DN2 (Group) (1)

use

use

IC105A (4)

use

IC105A (5)

use

MAIN-DN1 (Group) (2)

use

use

use

use

order

order

IC104A (6)

order

STOP RUN (10)

order

GROUP-02 (Group) (7)

use

GROUP-01 (Group) (8)

use

ELEM-77 (Group) (9)

use

1757336 (no-op, L272) (11)

order

IC107A (0)

STOP RUN (4)

order

TABLE-1 (Group) (1)

use

IDN1 (Group) (2)

use

TABLE-2 (Group) (3)

use

650331 (no-op, L275) (5)

order

IC109A (0)

STOP RUN (2)

order

GRP-01 (Group) (1)

use

454282 (no-op, L270) (3)

order

IC110A (0)

STOP PROGRAM (3)

order

GRP-01 (Group) (1)

use

WS1 (Group) (2)

use

IC111A (0)

STOP PROGRAM (4)

order

LS1 (Group) (1)

use

WS2 (Group) (2)

use

GRP-01 (Group) (3)

use

IC113A (0)

STOP RUN (6)

order

SQ-FS3R1-F-G-120 (Group) (1)

use

IC113A (5)

use

ERROR-FLAG (Group) (2)

use use

WRK-CS-09V00 (Group) (3)

use use

RECORDS-IN-ERROR (Group) (4)

use use

order

214890 (no-op, L343) (7)

order order

IC115A (0)

IC115A (3)

order

FILE-REC-SQ-FS3 (Group) (1)

use

use

IC115A (4)

use

IC115A (5)

use

IC115A (6)

use

GROUP-LINKAGE-VARIABLES (Group) (2)

use

use

use

use

useorder

order

order

STOP RUN (7)

order

180074 (no-op, L290) (8)

order

IC117M (0)

STOP RUN (1)

order

833320 (no-op, L279) (2)

order

IC118M (0)

STOP PROGRAM (1)

order

IC202A (0)

ID1[IC202A] (5)

order

DN3 (Group) (1)

use

use

ID2[] (6)

use

ID2[] (7)

use

ID1[IC202A] (9)

use

IC202A (11)

use

IC202A (12)

use

ID1[IC202A] (13)

use

ID1[IC202A] (14)

use

DN2 (Group) (2)

use

use

use

use

IC202A (8)

use

use

ID1[IC202A] (10)

use

use

use

use

use

DN4 (Group) (3)

use

use

use

use

use

use

use

use

use

DN1 (Group) (4)

use

use

use

use

use

use

use

use

use

use

use

order

order

order

order

order

order

order

order

order

STOP RUN (15)

order

518479 (no-op, L270) (16)

order

ID1[IC204A] (0)

ID1[IC204A] (3)

order

TABLE-1 (Group) (1)

use

use

IC204A (5)

use

IC204A (7)

use

IC204A (8)

use

ID1[IC204A] (10)

use

IC204A (11)

use

ID1[IC204A] (14)

use

IC205A (16)

use

ID1[IC204A] (18)

use

DN1 (Group) (2)

use

use

use

use

use

use

use

use

use

order

DN5 (Group) (4)

use

CANCEL IC204A (6)

order

order

order

CANCEL ID1[IC204A] (9)

order

order

order

CANCEL ID1[IC204A] (12)

order

CANCEL ID1[IC204A] (13)

order

order

CANCEL IC205A (15)

order

order

order

TABLE-2 (Group) (17)

use

CANCEL ID1[IC204A] (19)

order

STOP RUN (20)

order

1359146 (no-op, L272) (21)

order

Library Usages Pattern Mining Library Usage Patterns

V1[C1] CANCEL V1[C1] C2 GOBACK

order order order

orderD3 (Group)D2 (Group)

D1 (Group)

use

use use

D5 (Group) D6 (Group)D4 (Group) D7 (Group)

use use

use

use

Goal: Understand library usages to estimate modernisation e!ort

Challenges8

Challenge SolutionDefinition of a library Marked by the user

Granularity of Groums Whole programInter-paragraph control flow Graph inlining

Iteration through GOTO Inlining stackProgram termination Auxiliary nodes

Inter-program data flow Only data usagesProgram name in variable Special encoding

Language support tradeoffs Focus on control flow constructs

Details in the paper

RQ1: Does Extraction Scale?

CaseGrouMiner* BigGroum

Time (min) # P Time (min) # Clusters # P

Case 1 0.95 3 10.4 3 4Case 2 (full) - - - - -Case 2 (limited) 121.8 7 618.2 1 0

RQ2: Does Mining Scale?