Cool Numbers
• 7 students got the lowest complexity: O(n^(1/6)). Well done!
– Emily Vukovich
– Priyank Malvania
– Kunal Choudhary
– Vincent Lo
– Carl Lam
– Jerry Zhang
– Weicheng Cao
• Best solution: not just coding, also thinking about the math of the problem
– Typical of the best programs/algorithms
Administration
• Course evaluations
– Due April 13
• Response rate so far: 1.2%!
– Please fill out
• We read them all & take them very seriously
• Completely redesigned course
– What worked & what didn’t?
• Free-form questions
– 1 most interesting / important thing you learned
– 1 least interesting thing: what would you remove?
Multi-Threading
• Program start: 1 “main” thread created
• Need more: construct threads

#include <iostream>
#include <thread>   // C++11 feature
using namespace std;

void start_func_for_thread () {
    cout << "myThread lives!\n";
    do_func_a ();
}

int main() {
    cout << "Main thread starting.\n";
    thread myThread (start_func_for_thread);
    cout << "Main thread continues!\n";
    do_func_b ();
    return 0;
}

Compile: g++ --std=c++11 -pthread main.cpp
Thread Timing
[Timing diagram, thread vs. time: main prints "Main thread starting", constructs myThread, prints "Main thread continues", runs do_func_b (), then exits the program. myThread prints "myThread lives!" and runs do_func_a ().]
Output:
Main thread starting
myThread lives!
Main thread continues
Another Possible Timing
[Timing diagram, thread vs. time: same threads as before, but myThread prints "myThread lives!" only after main has printed "Main thread continues".]
Output:
Main thread starting
Main thread continues
myThread lives!
Possible Timing?
[Timing diagram, thread vs. time: main prints both of its messages, runs do_func_b (), and exits the program before myThread has printed anything.]
Output:
Main thread starting
Main thread continues
Multi-Threading
#include <iostream>
#include <thread>   // C++11 feature
using namespace std;

void start_func_for_thread () {
    cout << "myThread lives!\n";
    do_func_a ();
}

int main() {
    cout << "Main thread starting.\n";
    thread myThread (start_func_for_thread);
    cout << "Main thread continues!\n";

    // main thread waits here until myThread finishes
    myThread.join ();
    do_func_b ();
    return 0;
}
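New Thread Timing
[Timing diagram, thread vs. time: main prints "Main thread starting", constructs myThread, prints "Main thread continues", then waits at join until myThread has printed "myThread lives!" and finished do_func_a (). Only then does main run do_func_b () and exit the program.]
Output:
Main thread starting
myThread lives!
Main thread continues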
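To make the effect of join() concrete, here is a minimal sketch. The helper name run_and_join and the value 42 are illustrative assumptions, not from the slides. The worker writes a result and the calling thread reads it only after join() has returned, so the read cannot run ahead of the write.

```cpp
#include <cassert>
#include <thread>

// Hypothetical helper (not from the slides): starts one worker thread,
// waits for it with join(), then returns what the worker wrote.
int run_and_join() {
    int result = 0;                                  // written only by the worker
    std::thread worker([&result] { result = 42; });  // worker starts running here
    assert(worker.joinable());                       // a started thread still needs a join
    worker.join();                                   // caller blocks until the worker finishes
    return result;                                   // safe: the worker is done
}
```

Without the join() call, the function could return while the worker was still running, which is the same problem as the main thread exiting early in the timing diagrams.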
Many Threads
#define NUM_THREADS 10

void call_from_thread (int tid) {
    cout << "Thread " << tid << ": Godfrey Hounsfield is #" << tid << endl;
}

int main() {
    thread myThread[NUM_THREADS];

    for (int i = 0; i < NUM_THREADS; i++)   // Launch threads
        myThread[i] = thread(call_from_thread, i);

    cout << "Main thread: Godfrey's the main man!\n";

    // Main thread waits for all threads to complete
    for (int i = 0; i < NUM_THREADS; i++)
        myThread[i].join();

    return 0;
}

Can pass parameters to thread start function
Output?
Thread Thread Thread Thread 12: Godfrey Hounsfield is #: Godfrey Hounsfield is #120: Godfrey Hounsfield is #0Thread 4: Godfrey Hounsfield is #43: Godfrey Hounsfield is #3
Thread 6: Godfrey Hounsfield is #6
Thread 5: Godfrey Hounsfield is #5
Thread 7: Godfrey Hounsfield is #7
Thread 8: Godfrey Hounsfield is #8
Main thread: Godfrey's the main man!
Thread 9: Godfrey Hounsfield is #9
What happened?
Thread Synchronization
• Problem
– cout is a global variable
– All threads writing to it without coordination / synchronization
– Getting their output interleaved
• Solution
– Only one thread should write to cout at a time
– How?
– mutex (mutual exclusion) variable
– “lock” this variable
• Only 1 thread gets lock at a time
• Only that thread can write to cout
Synchronizing Threads
#include <mutex>
mutex outputSync;   // Global variable: who controls cout?

void call_from_thread (int tid) {
    // Need to grab the cout control variable (mutex)
    // Will block (wait) on next line until we get the lock
    lock_guard<mutex> getTheOutput(outputSync);
    cout << "Thread " << tid << ": Godfrey Hounsfield is #" << tid << endl;
    // Lock will be automatically released by destructor
    // Then another thread can get it.
}

int main() {
    thread myThread[NUM_THREADS];

    for (int i = 0; i < NUM_THREADS; i++)   // Launch threads
        myThread[i] = thread(call_from_thread, i);

    cout << "Main thread: Godfrey's the main man!\n";
    . . .
New Output
Thread 0: Godfrey Hounsfield is #0
Thread 7: Godfrey Hounsfield is #7
Main thread: Godfrey's the main man!
Thread 9: Godfrey Hounsfield is #9
Thread 3: Godfrey Hounsfield is #3
Thread 4: Godfrey Hounsfield is #4
Thread 5: Godfrey Hounsfield is #5
Thread 6: Godfrey Hounsfield is #6
Thread 1: Godfrey Hounsfield is #1
Thread 2: Godfrey Hounsfield is #2
Thread 8: Godfrey Hounsfield is #8
Output not interleaved
Each thread grabs cout to output an entire line
Order in which threads execute still arbitrary
Find the Bug!
#include <mutex>
mutex outputSync;   // Global variable: who controls cout?

void call_from_thread (int tid) {
    lock_guard<mutex> getTheOutput(outputSync);
    cout << "Thread " << tid << ": Godfrey Hounsfield is #" << tid << endl;
}

int main() {
    thread myThread[NUM_THREADS];

    for (int i = 0; i < NUM_THREADS; i++)   // Launch threads
        myThread[i] = thread(call_from_thread, i);

    cout << "Main thread: Godfrey's the main man!\n";
    . . .
Not respecting the mutex on cout! Can interleave output with the helper threads
Corrected Code
#include <mutex>
mutex outputSync;   // Global variable: who controls cout?

void call_from_thread (int tid) {
    lock_guard<mutex> getTheOutput(outputSync);
    cout << "Thread " << tid << ": Godfrey Hounsfield is #" << tid << endl;
}

int main() {
    thread myThread[NUM_THREADS];

    for (int i = 0; i < NUM_THREADS; i++)   // Launch threads
        myThread[i] = thread(call_from_thread, i);

    {
        lock_guard<mutex> getOutput(outputSync);
        cout << "Main thread: Godfrey's the main man!\n";
    }   // Destructor will release the lock here
    . . .

Why are these braces here?
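One way to see what the braces do is the following sketch. The function name probeWhileHeld and the probe thread are assumptions made for illustration. While the lock_guard is alive inside the inner scope, a second thread's try_lock() on the same mutex fails; once the closing brace runs the destructor, the mutex can be locked again.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

std::mutex outputSync;   // same role as on the slides: who controls cout?

// Hypothetical helper: returns true if another thread managed to take
// outputSync while this thread held it inside the braces.
bool probeWhileHeld() {
    bool acquired = true;
    {
        std::lock_guard<std::mutex> hold(outputSync);   // lock taken here
        std::thread probe([&acquired] {
            acquired = outputSync.try_lock();           // fails: mutex is held by the caller
            if (acquired) outputSync.unlock();
        });
        probe.join();
        assert(!acquired);   // nobody else can get the lock inside the scope
    }   // hold's destructor releases the lock here

    // The lock is free again, so taking it here does not block forever.
    std::lock_guard<std::mutex> again(outputSync);
    return acquired;
}
```

Without the braces, the lock_guard would live until the end of the enclosing function, so the lock would stay held while the main thread waited in join() and the helper threads could never print.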
Who is Godfrey Hounsfield?
1. A British Electrical Engineer
2. Knighted (Sir Godfrey)
3. Winner of the Nobel Prize in medicine
4. Inventor of the CAT scanner
Broad knowledge lets you innovate across disciplines!
Multithreading for Speed
• Vector dot product
– prod = v1 · v2

double dotProd (const vector<float>& v1, const vector<float>& v2) {
    double prod = 0;
    for (size_t i = 0; i < v1.size(); i++) {
        prod += v1[i] * v2[i];
    }
    return (prod);
}

#define N 400000000
int main () {
    // vector(count, value): N elements each, all 1s in v1 and all 2s in v2
    vector<float> v1(N, 1), v2(N, 2);
    cout << "dot product: " << dotProd (v1, v2) << endl;
}

Output: 8e8
Parallel Dot Product: Idea
[Diagram: v1 holds twelve 1s and v2 holds twelve 2s.
Serial: myThread[0] walks all of v1 and v2; prod goes 0, 2, 4, 6, 8, . . . , 24.
Parallel on 4 CPUs: myThread[0] to myThread[3] each take a quarter of v1 and v2; prod goes 0, 8, 16, 24.]
Parallel Dot Product
#define NUM_THREADS 4   // 4 CPUs on each UG machine

double dotProd (const vector<float>& v1, const vector<float>& v2) {
    double prod = 0;
    thread myThread[NUM_THREADS];

    // Send 1/4 of the vector to each of 4 threads
    for (int ithr = 0; ithr < NUM_THREADS; ithr++)
        myThread[ithr] = thread (dpHelper, ref(v1), ref(v2),
                                 ithr * N/NUM_THREADS, (ithr+1)*N/NUM_THREADS,
                                 ref(prod));

    // Wait for all threads to complete
    for (int ithr = 0; ithr < NUM_THREADS; ithr++)
        myThread[ithr].join();
    return (prod);
}

Won’t pass by reference unless you explicitly say to (with ref): thread wants to make a copy per thread
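The copy-by-default behaviour can be seen in a smaller sketch. The names add_to and add_in_thread are hypothetical, chosen only for this illustration. std::thread copies each argument it is given, so a start function that takes a reference only sees the caller's variable when the argument is wrapped in std::ref.

```cpp
#include <cassert>
#include <functional>   // std::ref
#include <thread>

// Hypothetical thread start function: updates a variable owned by the caller.
void add_to(int& total, int amount) {
    total += amount;
}

// Runs add_to in a separate thread and returns the updated total.
int add_in_thread(int start, int amount) {
    int total = start;
    // std::ref(total) hands the thread a reference wrapper instead of a copy.
    // Passing plain `total` here would not compile, because the thread's
    // private copy cannot bind to the int& parameter.
    std::thread t(add_to, std::ref(total), amount);
    t.join();
    assert(total == start + amount);   // the caller's variable was updated
    return total;
}
```

For example, add_in_thread(10, 5) returns 15 because the thread modified the caller's total rather than a private copy.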
Parallel Dot Product
#define NUM_THREADS 4 // 4 CPUs on each UG machine
void dpHelper (const vector<float>& v1, const vector<float>& v2,
               int istart, int iend, double& prod) {
    for (int i = istart; i < iend; i++)
        prod += v1[i] * v2[i];
}

Output: 2e8 or 3.3e8 or 4.6e8, ??
Race condition! Many threads updating one variable
The += is a read, then a write! Some threads read the old value, add to it, and overwrite someone else’s addition
What’s Really Happening
[Diagram: v1 holds twelve 1s and v2 holds twelve 2s. Parallel on 4 CPUs, myThread[0] to myThread[3] all update the same prod; prod goes 0, 6, 10, 14. Should be 24, not 14!]
Unsynchronized updates: some additions are being lost/overwritten
Fix with Output Product per Thread
[Diagram: v1 holds twelve 1s and v2 holds twelve 2s. Parallel on 4 CPUs, each myThread[i] accumulates into its own prod[i]; prod[0], prod[1], prod[2] and prod[3] each go 0, 2, 4, 6. The main thread then adds them up; total goes 0, 6, 12, 18, 24.]
Only one thread ever reads/writes to these variables at a time
No race condition, and no lost additions
Fixed Code: Output Var per Thread
double dotProd (const vector<float>& v1, const vector<float>& v2) {
    double prod[NUM_THREADS] = {0};   // Partial dot products, zeroed because dpHelper uses +=
    thread myThread[NUM_THREADS];

    for (int ithr = 0; ithr < NUM_THREADS; ithr++)
        myThread[ithr] = thread (dpHelper, ref(v1), ref(v2),
                                 ithr * N/NUM_THREADS, (ithr+1)*N/NUM_THREADS,
                                 ref(prod[ithr]));

    // Wait for all threads to complete
    for (int ithr = 0; ithr < NUM_THREADS; ithr++)
        myThread[ithr].join();

    double total = 0;   // Now add up the complete total
    for (int ithr = 0; ithr < NUM_THREADS; ithr++)
        total += prod[ithr];

    return (total);
}
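An alternative to one output variable per thread is to keep a single shared prod and protect it with a mutex. The sketch below is one possible way to do that, not the slides' solution; the names prodSync, dpHelperLocked, dotProdLocked and kNumThreads are assumptions. Each thread sums its slice into a private local variable and takes the lock only once, so the lock is not contended on every element.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>   // std::ref, std::cref
#include <mutex>
#include <thread>
#include <vector>

constexpr int kNumThreads = 4;   // assumption: mirrors NUM_THREADS on the slides
std::mutex prodSync;             // hypothetical mutex guarding the shared prod

// Sums one slice privately, then adds it to the shared total under the lock.
void dpHelperLocked(const std::vector<float>& v1, const std::vector<float>& v2,
                    std::size_t istart, std::size_t iend, double& prod) {
    double local = 0;                               // private to this thread
    for (std::size_t i = istart; i < iend; i++)
        local += v1[i] * v2[i];
    std::lock_guard<std::mutex> lock(prodSync);     // one locked update per thread
    prod += local;
}

double dotProdLocked(const std::vector<float>& v1, const std::vector<float>& v2) {
    assert(v1.size() == v2.size());
    const std::size_t n = v1.size();
    double prod = 0;
    std::thread myThread[kNumThreads];

    for (int ithr = 0; ithr < kNumThreads; ithr++)
        myThread[ithr] = std::thread(dpHelperLocked, std::cref(v1), std::cref(v2),
                                     ithr * n / kNumThreads,
                                     (ithr + 1) * n / kNumThreads,
                                     std::ref(prod));

    for (int ithr = 0; ithr < kNumThreads; ithr++)   // wait for all threads
        myThread[ithr].join();
    return prod;
}
```

Locking inside the loop on every element would also be correct, but the threads would then spend most of their time waiting for each other.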
m4
• Do not try multi-threading until you have a good serial implementation!
• Could multi-thread:
– Calculation of paths / path delays
– Search for best solutions
• Need to privatize (duplicate per thread) some data structures
– E.g. priority queue for wavefront
– Best solution for optimizer?
• Or should you lock it?
• Or a combination?
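As a rough illustration of the "combination" option, the sketch below privatizes the priority queue and locks only the shared best solution. Everything in it is hypothetical: the BestSolution struct, searchSlice, findBestCost, and the idea that a solution is just a cost value are assumptions for illustration, not part of the m4 code.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>   // std::greater, std::ref, std::cref
#include <limits>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical shared state: the lowest cost found by any thread so far.
struct BestSolution {
    std::mutex guard;
    double cost = std::numeric_limits<double>::infinity();
};

// Each thread fills a private min-heap from its slice (no locking needed),
// then merges its best candidate into the shared result under the lock.
void searchSlice(const std::vector<double>& costs, std::size_t begin,
                 std::size_t end, BestSolution& best) {
    std::priority_queue<double, std::vector<double>, std::greater<double>> wavefront;
    for (std::size_t i = begin; i < end; i++)
        wavefront.push(costs[i]);
    if (wavefront.empty()) return;

    std::lock_guard<std::mutex> lock(best.guard);   // shared data: lock it
    if (wavefront.top() < best.cost)
        best.cost = wavefront.top();
}

double findBestCost(const std::vector<double>& costs, int numThreads) {
    assert(numThreads > 0);
    BestSolution best;
    std::vector<std::thread> workers;
    const std::size_t n = costs.size();
    for (int t = 0; t < numThreads; t++)
        workers.emplace_back(searchSlice, std::cref(costs),
                             t * n / numThreads, (t + 1) * n / numThreads,
                             std::ref(best));
    for (auto& w : workers)
        w.join();
    return best.cost;
}
```

The private queue is touched by one thread only, so it needs no synchronization; the shared best cost is touched by all threads, but only once each, so a single mutex is cheap here.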