Efficient and Flexible Architectural Support for Dynamic Monitoring YUANYUAN ZHOU, PIN ZHOU, FENG QIN, WEI LIU, & JOSEP TORRELLAS UIUC

Outline

Background
iWatcher Functionality
iWatcher Design
Performance
Conclusion

Static or Dynamic Monitoring?

Static Monitoring
– Needs annotations: extra programmer work
– Difficult for unsafe languages (C, C++)

Dynamic Monitoring
– Large instrumentation cost
– Significant slowdown

Dynamic monitoring is stronger than static monitoring
– It is based on the actual execution path, not on all possible paths

Code-Controlled or Location-Controlled Dynamic Monitoring?

Code-Controlled Monitoring
– Monitoring is performed by explicitly inserted checking instructions
– Assertions and dynamic checkers belong here
– No hardware support needed

Location-Controlled Monitoring
– Monitoring is performed only when the program accesses a watched memory location, by any means
– Hardware support is usually required
– Examples: iWatcher and hardware-assisted watchpoints

iWatcher Functionality

Flexible and low-overhead dynamic monitoring

With hardware support:
– No expensive exceptions
– The program has its own internal, lightweight exception handler: the monitoring function

When a watched memory address is accessed, the monitoring function is automatically executed.

iWatcher Functionality (cont)

If the check performed by the monitoring function fails, the reaction mode decides what happens:
– Report: report the error and continue (non-interactive)
– Break: raise a hardware exception, switching control to the debugger
– Rollback: revert to a safe checkpoint

More than one monitor may watch the same address.
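The three reactions can be sketched as a toy model in C. The names ReactMode, Outcome, react, take_checkpoint and the State struct are hypothetical, and the "checkpoint" is just a saved copy of a struct; the real system's rollback mechanism is not modeled here.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical names: the slide only lists the three reactions. */
typedef enum { REPORT_MODE, BREAK_MODE, ROLLBACK_MODE } ReactMode;
typedef enum { CONTINUE, ENTER_DEBUGGER, RESTORED_CHECKPOINT } Outcome;

/* Toy program state; a saved copy of it plays the role of a checkpoint. */
typedef struct { int x; int y; } State;

static State checkpoint;

void take_checkpoint(const State *s) { checkpoint = *s; }

/* Decide what to do after a monitoring function returns check_ok. */
Outcome react(bool check_ok, ReactMode mode, State *s) {
    if (check_ok)
        return CONTINUE;                 /* invariant holds: keep running */
    switch (mode) {
    case REPORT_MODE:
        fprintf(stderr, "monitor check failed\n");
        return CONTINUE;                 /* report only, non-interactive */
    case BREAK_MODE:
        return ENTER_DEBUGGER;           /* real system: hardware exception */
    case ROLLBACK_MODE:
        *s = checkpoint;                 /* revert to the last safe state */
        return RESTORED_CHECKPOINT;
    }
    return CONTINUE;                     /* unreachable for valid modes */
}
```

A passing check never reaches the switch, so the mode only matters once a monitor reports a violation.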

iWatcher – Software Level

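#include <stdbool.h>
/* iWatcherOn/iWatcherOff, READWRITE and BreakMode are provided by the iWatcher system */
bool MonitorX(int *x, int value);   /* monitoring function, defined below */
int *foo(void);                     /* some function returning a pointer */
int x, *p;                          /* assume invariant: x == 1 */
int z, Array[10];                   /* illustrative declarations */
iWatcherOn(&x, sizeof(int), READWRITE, BreakMode,
           &MonitorX, &x, 1);       /* &x and 1 are passed to MonitorX */
...
p = foo();     /* a bug: p incorrectly points to x */
*p = 5;        /* line A: a triggering access (write to x through p) */
z = Array[x];  /* line B: a triggering access (read of x) */
...
iWatcherOff(&x, sizeof(int), READWRITE, &MonitorX);
/* Returns true if the invariant holds, i.e. *x still equals value. */
bool MonitorX(int *x, int value) {
    return (*x == value);
}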
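The following is a minimal software emulation of the example above, assuming every access is routed through a helper function; in iWatcher the hardware detects the access itself, so no such helper is needed. The names sw_watch_on, sw_watch_off, checked_store, checked_load and failures are hypothetical, and only a single watch is supported.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software stand-in for iWatcherOn/iWatcherOff. */
typedef bool (*Monitor)(int *addr, int param);

static struct {
    bool active;
    uintptr_t lo, hi;       /* watched byte range [lo, hi) */
    Monitor fn;
    int *arg;               /* parameters passed to the monitor */
    int param;
} watch;

static int failures;        /* number of failed checks observed */

void sw_watch_on(void *addr, size_t len, Monitor fn, int *arg, int param) {
    watch.active = true;
    watch.lo = (uintptr_t)addr;
    watch.hi = (uintptr_t)addr + len;
    watch.fn = fn;
    watch.arg = arg;
    watch.param = param;
}

void sw_watch_off(void) { watch.active = false; }

/* Run the monitor after an access that falls inside the watched range. */
static void trigger_if_watched(const void *addr) {
    uintptr_t a = (uintptr_t)addr;
    if (watch.active && a >= watch.lo && a < watch.hi &&
        !watch.fn(watch.arg, watch.param))
        failures++;         /* a real system would Report/Break/Rollback */
}

void checked_store(int *addr, int v) { *addr = v; trigger_if_watched(addr); }

int checked_load(const int *addr) {
    int v = *addr;
    trigger_if_watched(addr);
    return v;
}

/* Same check as the slide's MonitorX. */
bool MonitorX(int *x, int value) { return *x == value; }
```

For instance, after sw_watch_on(&x, sizeof x, MonitorX, &x, 1), a store of 5 through an alias p == &x makes MonitorX return false, mirroring line A. The monitor fires no matter which pointer performed the access, which is what makes the scheme location-controlled.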

Modest Hardware Support (?)

How to monitor a location?

When iWatcherOn() is called:
– Add the monitoring function to the (software) CheckTable
– If size < LargeRegion → all words are brought into the L2 cache and tagged with WatchFlags; L1 copies are updated if necessary
– If size > LargeRegion → the entire region is recorded in the Range Watch Table (RWT)

If the RWT is full, proceed as if size < LargeRegion
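The placement rule can be sketched as follows, assuming a LargeRegion of 1024 bytes and a four-entry RWT (both hypothetical values; the slides give neither). The enum, struct and function names are also hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical parameters: the slides do not give LargeRegion or the RWT size. */
#define LARGE_REGION 1024   /* bytes */
#define RWT_ENTRIES  4

typedef enum { TAG_CACHE_WORDS, TAG_RWT_RANGE } Placement;

typedef struct { bool valid; size_t start, end; } RwtEntry;

static RwtEntry rwt[RWT_ENTRIES];

/* Decide where iWatcherOn records the watched region [start, start + size). */
Placement place_watch(size_t start, size_t size) {
    if (size > LARGE_REGION) {
        for (int i = 0; i < RWT_ENTRIES; i++) {
            if (!rwt[i].valid) {
                rwt[i] = (RwtEntry){ true, start, start + size };
                return TAG_RWT_RANGE;    /* whole range in one RWT entry */
            }
        }
        /* RWT full: fall through and tag word by word, as for small regions. */
    }
    /* Small region (or fallback): load each word into L2 and set its WatchFlag. */
    return TAG_CACHE_WORDS;
}
```

The RWT keeps large regions from flooding the caches with tagged words, while the fallback guarantees a watch request never fails just because the table is full.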

How to monitor a location? (cont)

If a watched word is evicted from L2, its watch bits (if any are set) are saved in the Victim WatchFlag Table (VWT)
– If the VWT is full, the OS must handle the overflow (rare)

When the word is brought back into the cache, its watch bits are copied back from the VWT

When iWatcherOff() is called:
– Remove the monitoring function from the CheckTable
– If no other monitor watches this region, clear the watch bits in the VWT, RWT, L1 and L2 as necessary
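A toy model of the VWT, assuming a two-entry table (hypothetical size). A 32-bit mask stands in for the watch bits of one evicted cache block; the granularity, the names, and freeing the entry on refill are all assumptions rather than details from the slides.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical VWT size and layout. */
#define VWT_ENTRIES 2

typedef struct { bool valid; uint64_t addr; uint32_t watch_bits; } VwtEntry;

static VwtEntry vwt[VWT_ENTRIES];

/* On L2 eviction: save the block's watch bits if any are set.
 * Returns false when the VWT is full (the rare case left to the OS). */
bool vwt_on_evict(uint64_t addr, uint32_t watch_bits) {
    if (watch_bits == 0)
        return true;                     /* not watched: nothing to save */
    for (int i = 0; i < VWT_ENTRIES; i++) {
        if (!vwt[i].valid) {
            vwt[i] = (VwtEntry){ true, addr, watch_bits };
            return true;
        }
    }
    return false;                        /* overflow: hand off to the OS */
}

/* On refill into L2: return the saved watch bits and free the entry. */
uint32_t vwt_on_refill(uint64_t addr) {
    for (int i = 0; i < VWT_ENTRIES; i++) {
        if (vwt[i].valid && vwt[i].addr == addr) {
            vwt[i].valid = false;
            return vwt[i].watch_bits;
        }
    }
    return 0;                            /* block was not watched */
}
```

Skipping blocks with no watch bits set keeps the table small, since only watched data ever needs an entry.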

How to detect a triggering access?

Out-of-order execution, pipelining → not all executed instructions commit, so a triggering access can only be acted on when it commits

For each load/store:
– Check whether a valid entry exists in the RWT
– Load: bring the word and its WatchFlag from the cache; store: prefetch the word into the cache and read its WatchFlag
– Keep the flags in the instruction's ReOrder Buffer (ROB) entry
– When the instruction retires (if it does), jump to the monitor if any flag is set
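Holding the flags in the ROB means only committed accesses can trigger a monitor. A minimal sketch of that rule, using a hypothetical RobEntry layout in which a squashed flag marks wrong-path instructions:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical ROB entry: the WatchFlag travels with the instruction. */
typedef struct {
    bool is_mem;        /* load or store */
    bool watch_flag;    /* WatchFlag fetched with the word (or from the RWT) */
    bool squashed;      /* cancelled, e.g. after a branch misprediction */
} RobEntry;

/* Retire entries in program order and count the triggers that fire.
 * Squashed (wrong-path) accesses never reach the monitor. */
int retire_all(const RobEntry *rob, size_t n) {
    int triggers = 0;
    for (size_t i = 0; i < n; i++) {
        if (rob[i].squashed)
            continue;                   /* never commits: no trigger */
        if (rob[i].is_mem && rob[i].watch_flag)
            triggers++;                 /* real HW: save state, jump to monitor */
    }
    return triggers;
}
```

Checking at retirement rather than at execution avoids running monitors for speculative accesses that the pipeline later discards.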

How to Trigger Monitoring Functions?

When a triggering access is detected:
– Save the processor state and jump to the function whose address is held in the Main_Check_Function register
– Main_Check_Function scans the CheckTable and serially calls every monitor that watches this address for this access mode
– For performance, Thread-Level Speculation (TLS) may be used to run the monitors in parallel with the rest of the program
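Main_Check_Function's scan can be sketched as a linear search over the CheckTable. The mode bits, struct layout and function names are hypothetical; count_ok and always_fail are demo monitors included only to exercise the scan. TLS is not modeled: monitors run serially, as described above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical access-mode bits. */
enum { WATCH_READ = 1, WATCH_WRITE = 2, WATCH_READWRITE = 3 };

typedef bool (*MonitorFn)(void *param);

typedef struct {
    uintptr_t start, end;   /* watched range [start, end) */
    int mode;               /* WATCH_READ, WATCH_WRITE, or both */
    MonitorFn fn;
    void *param;
} CheckEntry;

/* Call, one after another, every monitor that watches addr for this access
 * mode. Returns the number of monitors whose check failed. */
int main_check_function(const CheckEntry *table, size_t n,
                        uintptr_t addr, int access) {
    int failed = 0;
    for (size_t i = 0; i < n; i++) {
        const CheckEntry *e = &table[i];
        if (addr >= e->start && addr < e->end && (e->mode & access)) {
            if (!e->fn(e->param))
                failed++;
        }
    }
    return failed;
}

/* Demo monitors that count how often they are called. */
int monitor_calls;
bool count_ok(void *p)    { (void)p; monitor_calls++; return true; }
bool always_fail(void *p) { (void)p; monitor_calls++; return false; }
```

A write to an address covered by two entries runs both monitors, which is how several monitors can watch the same address.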

Executing Monitoring Functions

Comparison to Other Approaches

Performance Compared to Valgrind

4–179% overhead; 25–169× less overhead than Valgrind
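Overhead percentages convert to slowdown factors as follows, assuming overhead is measured relative to the execution time of the unmonitored run:

```latex
\text{overhead} = \frac{T_{\text{mon}} - T_{\text{base}}}{T_{\text{base}}}
\quad\Longrightarrow\quad
\frac{T_{\text{mon}}}{T_{\text{base}}} = 1 + \text{overhead}
```

So 4% overhead corresponds to a 1.04× slowdown and 179% to a 2.79× slowdown.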

Performance with/without TLS

Up to 30% reduction in two cases

Performance varying the fraction of triggering loads and TLS

Performance varying the size of the monitoring function and TLS

Beyond 4 TLS contexts, there is no significant further improvement

Conclusion

Some hardware changes
<180% overhead when 20% of loads are monitored
Detects most bugs:
– Buffer overflows
– Memory leaks
– Accesses to unallocated or uninitialized memory
– …