Source: drona.csa.iisc.ernet.in/~govind/sprf-2016.pdf

REPORT OF WORK DONE

DURING APRIL 2006 – May 2016

R. Govindarajan
Professor

Dept. of Computer Science and Automation
Supercomputer Education and Research Centre

Indian Institute of Science
Bangalore 560 012, INDIA

Table of Contents

0. Executive Summary 1
1. Details of Academic Qualifications 3
2. Details of Service at the Institute 3
3. Awards and Recognition 4
4. Report of Research/Teaching/Industry Interaction 4
5. Services Provided 10
6. Work Done for the Faculty/Senate 13

Appendix
A. Details of Research and Project Guidance 15
B. Details of Publications 20


Executive Summary

1. Research Areas: Computer Architecture, Compilers, and High Performance Computing

2. Research Vision:

Understanding the interaction between architecture, programming models, and compilers; developing novel programming models, languages, efficient compiler analyses and optimizations, and runtime techniques for high performance computing systems; designing efficient processor and memory subsystems for multi-core and heterogeneous systems.

3. Awards and Recognition:

• Prof. Rustom Choksi award for Excellence in Research in Engineering, IISc, 2015

• Best Paper Award at the International Symposium on Code Generation and Optimization (CGO-2015), San Francisco, USA.

• Fellow of Indian National Academy of Engineering (FNAE), 2011

• Prof. Satish Dhawan Young Engineers Award 2008, Karnataka State Council for Science & Technology, for contributions in Computer Science & Electronics, awarded in 2011

• Associate Editor of ACM Transactions on Architecture and Code Optimizations (TACO) (since 2016) and ACM Transactions on Parallel Computing (ToPC) (since 2012).

• Associate Editor of IEEE Micro (2011 – 2015) and Computer Architecture Letters (2011 – 2014)

• IBM Faculty Award, 2008

4. Research Publications during the Review Period:

Journal Papers: 8
Top-Tier International Conferences: 27
Refereed International Conferences: 18
Total: 53

My publications have received a total of 2156 citations (as listed in Google Scholar), of which 956 are from the last 5 years. My h-index is 24 and my i10-index is 56.

5. Research Supervision during the Review Period:
Graduated 6 Ph.D. students; 1 is about to defend the thesis and 2 more are in various stages of progress.
Graduated 11 M.Sc.[Engg] students; 1 is in progress.

6. Industry Interaction:
2 sponsored R&D projects worth Rs. 250 Lakhs (USD 500,000) and 1 consultancy project.


7. Teaching:
Taught 4 different courses a total of 14 times.

8. Other Services:

• Member of the Technical Advisory Committee and Chairman of the Expert Group on R & D in the National Supercomputing Mission.

• Member of the Research Council of the Advanced Numerical Research and Analysis Group (ANURAG), DRDO.

• Member of the Executive Council of the Central University of Karnataka, Gulbarga.

• Member of the Research Advisory Board of Sri Sathya Sai Institute of Higher Learning,Puttaparthi, since 2015.

• Chairman of the Project Review & Steering Group (PRSG) of two major projects, and Member of PRSG and Member of the Working Group of projects funded by the Department of Information Technology.

• Member of the Technical Evaluation Committee for the purchase of High Performance Computing systems for the University of Hyderabad, JNCASR, C-DAC, and C-MMACS.

• Chairman, Supercomputer Education and Research Centre (SERC), since Dec. 2004, involved in planning the growth and expansion of SERC as a state-of-the-art high performance computing centre and as a leading academic department with a strong research focus in computer systems and computational science.


PROFORMA

Name of the member of staff: R. Govindarajan
Designation: Professor
Department: Dept. of Computer Science and Automation; Supercomputer Education and Research Centre, Indian Institute of Science, Bangalore 560 012
Age and Date of Birth: 54 years – July 25, 1961
Present Salary and Grade: Rs. 65,870 in the grade Rs. 37,400 – 67,000 with Grade Pay Rs. 10,500

1 Details of Academic Qualifications

Examination/Degree                              University/Institute                   Class     Year of Obtaining
                                                                                       Obtained  the Degree

Doctor of Philosophy (Computer Science)         Indian Institute of Science,           —         1989
                                                Bangalore, India
Bachelor of Engineering (Electronics & Commn.)  Indian Institute of Science,           —         1984
                                                Bangalore, India
Bachelor of Science (Mathematics)               University of Madras,                  I         1981
                                                Madras, India

1.1 Membership of Professional Bodies

• Senior Member, IEEE

• Member, IEEE Computer Society

• Member, Association for Computing Machinery

2 Details of Service at the Institute

From           To             Designation

March 2006     Present        Professor
March 2000     March 2006     Associate Professor
August 1995    March 2000     Assistant Professor


3 Awards/Recognition

• Prof. Rustom Choksi award for Excellence in Research in Engineering, IISc, 2015

• Best Paper Award in the International Symposium on Code Generation and Optimization, Palo Alto, CA, USA, 2015.

• Elected as Fellow of Indian National Academy of Engineering (FNAE), 2011

• Awarded the Prof. Satish Dhawan Young Engineers Award (for the year 2008) by the Karnataka State Council for Science & Technology, for contributions in Computer Science & Electronics, in 2011

• Associate Editor, ACM Transactions on Architecture and Code Optimizations (TACO), since2016

• Associate Editor, ACM Transactions on Parallel Computing (ToPC), since 2012

• Associate Editor, IEEE Micro, 2011 - 2015

• Associate Editor, Computer Architecture Letters, 2011 - 2014

• General Co-Chair of the 29th IEEE International Parallel and Distributed Processing Symposium (IPDPS-2015), Hyderabad, 2015.

• General Co-Chair of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Bangalore, 2010.

• General Co-Chair of the 3rd IEEE International Conference on E-Science and Grid Computing, Bangalore, 2007.

• IBM Faculty Award, 2008

4 Report on Research/Teaching/Industry Interaction During the

Period Under Review

4.1 Research and Scientific Investigation

4.1.1 Research Focus

• Computer Architecture

• Compiler Analysis and Optimizations

• High Performance Computing

Details of my research work are presented in Section 4.1.5.


4.1.2 Guidance of Students for Research Conferments

                At the time of       During the Period    Total
                Last Review          under Review

Ph.D.           Thesis Submitted 1   Completed 6          Completed 7
                Under Review 0       Under Review 1       Under Review 1
                Under Progress 3     Under Progress 2     Under Progress 2

M.Sc.[Engg.]    Completed 8          Completed 11         Completed 19
                Under Progress 3     Under Progress 1     Under Progress 1

Details of the research topics pursued by my students are presented in Appendix A (page 15).

4.1.3 Sponsored Projects (Research Schemes)

Title                             Funding Agency     Value           Period    Co-Investigators

Compiling OpenCL for              AMD Corp.          Rs. 50 Lakhs    2010-13   —
AMD Platforms

High Performance Applications     Microsoft Corp.,   Rs. 205 Lakhs   2008-11   Prof. Ravi Nanjundiah,
on Heterogeneous Windows          USA                                          Prof. Rahul Pandit,
Clusters                                                                       Prof. Matthew Jacob T.

Enabling Technology for Future    Intel Corp.,       Rs. 24 Lakhs    2005-06   Prof. Ravi Nanjundiah,
High Performance Computing        USA                                          Dr. S. Raha,
Applications                                                                   Dr. Nagasuma Chandra

Bridging the Generation Gap       IISc, X Plan       Rs. 8 Lakhs     2005-06   Dr. Sathish Vadhiyar
between Clusters of
Different Generations

4.1.4 Publications

A complete list of publications is included in Appendix B (page 20).

                                     During the Period    Before the Period    Total
                                     Under Review         Under Review

Book Chapters                        0                    2                    2
International Journals               8                    21                   29
Top-Tier International Conferences   27                   20                   47
Refereed International Conferences   18                   41                   56

Total                                53                   84                   137

Top-Tier International Conferences are those rated A or A* by the Australian CORE Ranking (http://www.core.edu.au/). Acceptance in most of these conferences is through a rigorous review process, often involving 4 or 5 reviews per paper, and acceptance rates are very low (typically 20% – 30%). These conference publications also have high citation counts and impact. Hence they are deemed on par with journal publications.

Citation Report on Publications

My publications have received a total of 2156 citations (as listed in Google Scholar), of which 956 are from the last 5 years. My h-index is 24 and my i10-index is 56. The citation results for my publications can be viewed at:

http://scholar.google.com/citations?user=y91My88AAAAJ&hl=en

4.1.5 Research Summary of Work Carried out Under Period of Review

Our research focus has been on exploiting parallelism at all levels through novel architecture design and innovative compilation techniques, keeping pace with the rapid advances witnessed in this area. While our earlier work focused on instruction-level parallelism compilation techniques, such as elegant software pipelining methods, efficient instruction scheduling approaches, and integrated register allocation techniques, as well as on the design of effective vectorizing compilers for exploiting data-level parallelism, our recent research efforts have been on exploiting thread-level, data-level, and task-level parallelism for efficient execution on modern Graphics Processing Units (GPUs). Designing efficient programming models and compilation techniques, along with effective runtime systems for automatic mapping of applications to heterogeneous accelerator-based architectures, has been the recent research focus of our group. We have also focused our attention on the design of novel cache replacement techniques for last-level caches in multi-core architectures, and on prefetching schemes for power-efficient and performance-centric memory subsystem design. We have developed a comprehensive and accurate analytical model for the memory system in multi-core architectures; the model has been extended and used to derive useful insights about the memory system which can be used to enhance its performance. In addition, our group has developed scalable context-sensitive and flow-sensitive points-to analysis methods, accurate and comprehensive path-sensitive data-flow analysis, and compiler transformations for improving the performance of Software Transactional Memory.

A brief summary of the research work is presented below.

Research in Computer Architecture

With multi-core architectures playing a dominant role, our research focus in computer architecture during the last 6 years has been on memory subsystem performance. Specifically, our work covers the design of a novel two-level cache mapping method [RT.5]1, an efficient replacement strategy which emulates the optimal replacement policy [Pb.32]2, a novel partitioned cache design and replacement policy which uses program characteristics such as next-use distance [Pb.25], and effective cache management policies for last-level caches in multi-core, multiprogrammed environments [Pb.19]. We have expanded and explored the design space of prefetch schemes, quantified the regularity in prefetch history using entropy, and proposed an adaptive algorithm which can choose the best prefetch history for a given application and prefetch scheme [Pb.40]. We have also proposed a performance-aware prefetching scheme [Pb.29] which yields multiple benefits: improved performance, more accurate prefetches, and reduced memory traffic. Our work on multi-core caches and prefetching is reported in [RT.2].

1References that start with the prefix “RT” correspond to research/project guidance under my supervision. These are listed in Appendix A (page 15).
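The entropy-based characterization of prefetch-history regularity mentioned above can be illustrated with a small sketch: reduce a (hypothetical) miss-address stream to its successive address deltas, and compute the Shannon entropy of the delta distribution; low entropy indicates a regular, predictable stream. This is only an illustration of the general idea, not the metric or algorithm of [Pb.40], and all names and values below are invented.

```python
from collections import Counter
from math import log2

def delta_entropy(addresses):
    """Shannon entropy (in bits) of the distribution of address deltas.

    Low entropy -> regular, predictable stream (good prefetch candidate);
    high entropy -> irregular stream where history-based prefetching helps less.
    """
    deltas = [b - a for a, b in zip(addresses, addresses[1:])]
    counts = Counter(deltas)
    total = len(deltas)
    return -sum((c / total) * log2(c / total) for c in counts.values())

# A strided stream is perfectly regular: every delta is 64, so entropy is 0.
strided = list(range(0, 64 * 16, 64))
print(delta_entropy(strided))  # 0.0

# A stream alternating between two strides carries exactly 1 bit of uncertainty.
mixed = [0]
for i in range(16):
    mixed.append(mixed[-1] + (64 if i % 2 == 0 else 128))
print(delta_entropy(mixed))  # 1.0
```

An adaptive scheme in this spirit could compare such entropy values across candidate history lengths and pick the most predictable one per application.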

With the widening gap between processor and memory performance, the memory (DRAM) system can impact the overall performance of a multi-core system in a significant way. We investigated various techniques to improve the performance of the memory system design for multi-core architectures [RT.7]. We have also developed a detailed and accurate analytical model for evaluating memory system performance (latency and bandwidth) [Pb.12]. The model has been used to derive useful insights to improve the performance of memory systems [Pb.18]. It has been extended to stacked DRAM caches [Pb.37] and used to reveal interesting insights in memory system design, which in turn has led to performance- and power-efficient stacked DRAM cache designs [Pb.11, Pb.36].
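One common ingredient of analytical memory-system models is queueing delay at a contended resource such as a DRAM bank or channel. The toy M/M/1 sketch below is not the model of [Pb.12]; it merely illustrates, under textbook queueing assumptions, how latency grows non-linearly with utilization. All parameter values are invented.

```python
def mm1_latency(arrival_rate, service_time):
    """Average time in an M/M/1 queue: W = S / (1 - rho), rho = lambda * S."""
    rho = arrival_rate * service_time
    assert rho < 1, "queue is unstable at utilization >= 1"
    return service_time / (1 - rho)

# Doubling the request rate to a bank with 20 ns service time more than
# triples the average latency: queueing delay dominates at high utilization.
print(mm1_latency(0.02, 20))  # ~33.3 ns at 40% utilization
print(mm1_latency(0.04, 20))  # ~100 ns at 80% utilization
```

A full model would additionally account for bank-level parallelism, row-buffer hits versus conflicts, and scheduling policy, which is where detailed models earn their accuracy.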

Research in Compiler Analyses and Optimizations

In the area of compilers, our research group’s focus in the past was on compiler optimizations. In the last 5+ years it has also expanded to compiler analysis, specifically static analysis. First, we have proposed a comprehensive, elegant, and efficient framework for path-sensitive data-flow analysis which improves the precision of data-flow analysis in the presence of complex control flow. This is achieved by restructuring the control flow graph and making use of the notion of product automata [Pb.30], [RT.15].

We have proposed several methods to improve the efficiency and scalability of context-sensitive points-to analysis [RT.3]. In particular, we have proposed two methods which trade off a small amount of precision (while retaining the safety/correctness of the method) for significant improvements in performance. These use either a novel randomized approach [RT.3] to points-to analysis or a probabilistic data structure [Pb.44] to reduce, respectively, the computational and memory requirements of the method. We have also proposed two exact methods for context-sensitive points-to analysis. In the first approach, we formulate the points-to analysis problem as a set of linear equations [Pb.26]. In the second, we propose efficient heuristics to prioritize the order in which edges are introduced in the constraint graph (used in points-to analysis), which leads to significant improvements in how fast the fixpoint is reached [Pb.24].
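The constraint-graph fixpoint computation referred to above can be sketched with a minimal inclusion-based (Andersen-style) solver. This toy version handles only address-of and copy constraints, uses a plain FIFO worklist rather than any prioritized edge ordering, and is context- and flow-insensitive; all names are illustrative and it is not the published algorithm.

```python
from collections import defaultdict, deque

def andersen(addr_of, assigns):
    """Minimal inclusion-based points-to analysis.

    addr_of: list of (p, x) for statements  p = &x
    assigns: list of (p, q) for statements  p = q, i.e. pts(q) is a
             subset of pts(p).
    Returns the points-to set of every variable at the fixpoint.
    """
    pts = defaultdict(set)
    succ = defaultdict(set)            # copy edges of the constraint graph
    work = deque()
    for p, x in addr_of:               # seed base constraints
        pts[p].add(x)
        work.append(p)
    for p, q in assigns:
        succ[q].add(p)
    while work:                        # propagate until no set changes
        q = work.popleft()
        for p in succ[q]:
            if not pts[q] <= pts[p]:
                pts[p] |= pts[q]
                work.append(p)
    return dict(pts)

# p = &a; q = &b; r = p; r = q   =>   pts(r) = {a, b}
result = andersen([("p", "a"), ("q", "b")], [("r", "p"), ("r", "q")])
print(sorted(result["r"]))  # ['a', 'b']
```

The order in which worklist entries are processed does not change the final fixpoint, only how quickly it is reached, which is why prioritization heuristics can pay off.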

More recently, we extended our work on points-to analysis to flow-sensitive techniques [RT.11]. First, we have developed an efficient parallel graph-rewriting method for flow-sensitive points-to analysis. More specifically, we formulate the staged flow-sensitive pointer analysis as a graph-rewriting problem, and solve two key challenges that flow-sensitivity adds to a graph-rewriting formulation, viz., the introduction of spurious edges and the problems due to strong and weak updates.

2References that start with the prefix “Pb” correspond to my publications. These are listed in Appendix B (page 20).


The proposed parallel solution has been implemented using Intel Threading Building Blocks and evaluated on a set of benchmark programs [Pb.14]. More recently, our group improved the execution time of staged flow-sensitive analysis by approximating frequently occurring groups of objects (in a points-to set) by a single object. The work uses a well-known data mining technique called frequent itemset mining to obtain these frequently occurring object sets efficiently. While this approach introduces some approximation, it is shown to be safe and the precision loss is only minor [Pb.10]. This work won the Best Paper Award at the International Symposium on Code Generation and Optimization (CGO-2015).
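The frequent-itemset idea can be sketched as follows: count how often groups of pointed-to objects co-occur across points-to sets, then collapse a frequent group into a single representative object that stands for all its members, which shrinks the sets at the cost of a safe over-approximation. The sketch below mines only pairs and is in no way the algorithm of [Pb.10]; all names are invented.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(points_to_sets, min_support):
    """Count how often each pair of objects co-occurs across points-to
    sets; keep pairs meeting the support threshold (pair-level itemsets)."""
    counts = Counter()
    for s in points_to_sets:
        for pair in combinations(sorted(s), 2):
            counts[pair] += 1
    return {pair for pair, c in counts.items() if c >= min_support}

def collapse(points_to_sets, group, rep):
    """Replace every occurrence of `group` (as a subset) by the single
    representative `rep`. Safe as long as `rep` is treated as standing
    for all members of the group in later queries."""
    g = set(group)
    return [(s - g) | {rep} if g <= s else s for s in points_to_sets]

sets_ = [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "d"}, {"c", "d"}]
print(frequent_pairs(sets_, 3))  # {('a', 'b')}
collapsed = collapse(sets_, {"a", "b"}, "AB")  # {a,b} -> representative "AB"
```

After collapsing, three of the four sets store one element fewer; over millions of points-to sets this kind of sharing is where the speedup comes from.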

Research in Compiler Transformation for Software Transactional Memory

With the advent of multi-core processors, Software Transactional Memory (STM) has attracted significant interest as an effective and efficient programming paradigm for shared memory systems. However, the adoption of STMs in mainstream software has been quite low, primarily due to the performance overheads they incur. Two major contributors to the poor performance of STMs are the significant number of transactional aborts and excessive cache misses. We have proposed compiler analysis techniques and automatic code transformations to address these issues [RT.4]. We have conducted a detailed experimental evaluation of the cache behavior of STM applications to quantify the impact of different STM factors on the cache misses experienced, and classified the misses into different components based on the STM factors which cause them. We have proposed several compiler transformations, namely lock-data collocation, redundant lock access removal, and selective per-partition time-stamp transformation, to address the different cache miss components [Pb.21]. Next, we have addressed the problem of excessive transactional aborts in STM applications. We have proposed compiler analyses to identify always-conflicting atomic sections, and developed two different mechanisms, viz., selective pessimistic concurrency control and compiler-inserted early conflict checks, to reduce the impact of transactional aborts [RT.4], [Pb.23]. In addition, we have proposed a novel technique for reconciling part of aborted transactions (transactions in which a conflict has occurred) through compiler support [Pb.20]. The proposed approach is effective in reconciling many transactions which would otherwise have been aborted.

Research in High Performance Computing

Recent advances in the design of Graphics Processing Units (GPUs) have significantly increased the compute capabilities of systems with hundreds of SIMD cores. This provides an excellent opportunity for exploiting data-level, thread-level, and task-level parallelism. Our research group has worked on this problem of programming heterogeneous multi-core accelerator-based architectures. In particular, our efforts have been on efficient and automatic compilation of programs written in different programming languages to accelerator architectures. First, we have proposed an efficient compilation method for programs written in StreamIt (a stream programming language which allows exploiting data-level, thread-level, task-level, and pipelined parallelism) for synergistic execution on multiple CPU cores and GPUs [Pb.28], [Pb.46], [RT.14]. We have also proposed an automatic compilation framework for synergistic execution of MATLAB programs on CPU and GPU cores [Pb.22], [RT.13]. Next, we have turned our attention to Partitioned Global Address Space (PGAS) languages, such as X10 and Chapel, and proposed an efficient automatic memory management scheme that uses a compiler-assisted runtime coherence approach [RT.1], [Pb.17]. In addition, we have developed approaches for executing programs written in CUDA and OpenMP synergistically on a cluster of CPU cores or on CPU and GPU cores [Pb.39]. Further, our group has studied concurrent execution of multiple kernels on GPUs and proposed several concurrency-aware scheduling policies for GPUs [Pb.15].

We have designed and developed a runtime system, called FluidiCL, which takes an OpenCL kernel written for a single device and executes it on multiple heterogeneous devices to speed up the computation. The runtime performs the required dynamic work distribution and data transfer for cooperatively executing the kernel on all available devices, such as CPU cores, GPUs, and Intel's many-core processors [Pb.13, RT.12]. Lastly, we have explored different scheduling methods for improving the execution of programs on GPUs. These include warp scheduling methods [Pb.9], an inspector-executor method for runtime dependence computation and efficient execution of loops on GPUs [Pb.16], and taming control divergence in GPUs using control flow linearization [Pb.38].
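The cooperative execution that FluidiCL performs can be caricatured in a few lines: a data-parallel kernel written for one device is split into index sub-ranges that two workers process concurrently, and their results are merged. The real runtime performs dynamic work distribution and OpenCL buffer transfers between physical devices; the thread-based Python sketch below only illustrates the range-splitting idea, and every name in it is invented.

```python
from concurrent.futures import ThreadPoolExecutor

def kernel(x):
    """Stand-in for the per-work-item body of an OpenCL kernel."""
    return x * x

def run_cooperatively(data, split):
    """Execute `kernel` over `data`, splitting the global index range at
    `split` between two workers (standing in for a CPU and a GPU) and
    merging their partial results in index order."""
    def run_range(lo, hi):
        return [kernel(v) for v in data[lo:hi]]

    with ThreadPoolExecutor(max_workers=2) as pool:
        cpu_part = pool.submit(run_range, 0, split)          # "CPU" share
        gpu_part = pool.submit(run_range, split, len(data))  # "GPU" share
        return cpu_part.result() + gpu_part.result()

print(run_cooperatively(list(range(8)), split=3))  # [0, 1, 4, 9, 16, 25, 36, 49]
```

In a real runtime the split point would be adjusted dynamically based on how fast each device finishes its share, rather than fixed up front as here.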

4.2 Teaching

4.2.1 Courses Taught (at Graduate Level)

Course    Title of Course            Credits   No. of Times   Alone/        Average Class
Number                                         Offered        Jointly       Strength

E0-243    Computer Architecture      3:1       6              Alone 5,      20
                                                              Jointly 1
E0-343    Topics in Computer         3:1       2              Alone 1,      5 (alone),
          Architecture                                        Jointly 1     2 (jointly)
SE-290    Modeling and Simulation    3:0       1              Jointly 1     10
SE-292    High Performance           3:1       5              Alone 1,      15
          Computing                                           Jointly 4

4.2.2 Guidance of M.E./M.Tech Project Students

                At the time of Appointment/    During the Period
                Last Review                    under Review

M.E./M.Tech.    20                             4

Details of the M.E./M.Tech project work are listed in Section A.2 in Appendix A (page 18).

4.2.3 Projects in Education/Short-Term Courses

• Gave several keynote talks at national conferences and invited lectures on various topics in Computer Architecture, Compiler Optimizations, and High Performance Computing at a number of engineering colleges.


4.3 Interaction with Industry

4.3.1 Sponsored Research Projects

• Completed a research project on “High Performance Applications on Heterogeneous Windows Clusters” funded by Microsoft Corp., USA.
In this project our objective was to enhance the performance and functionality of a Windows-based HPC Cluster (WHPCC) running under the Windows Compute Cluster Server (WCCS) and deploy it for real-world applications. More specifically, the objectives of this project included:

– Enhancing the performance of a Windows HPC Cluster (WHPCC) by exploiting the capabilities of Graphics Processing Units (GPUs).

– Enhancing the functionalities of the WCCS platform to support Distributed Shared Memory (DSM) and OpenMP programming.

– Using high-performance computing on a WHPCC to study real-world applications such as partial-differential-equation models for cardiac tissue and the Regional Ocean Modeling System for the Indian Ocean.

• Completed a research project on “Compiling OpenCL for AMD Platforms” funded by AMD Corporation, USA.
The objective of this project was to explore an alternative execution model for OpenCL which performs as much of the task mapping, scheduling, and assignment at compile time as possible in order to efficiently execute these tasks, exploiting task-level, data-level, and pipeline parallelism across multiple heterogeneous computing devices and cores. The proposed model was implemented and tested on a heterogeneous cluster with AMD multi-core processors and ATI Radeon graphics processing units. We developed a couple of large real-world applications to demonstrate the usefulness and effectiveness of the proposed model. Further, we explored compiling other high-level languages, such as OpenMP, into OpenCL and executing them on heterogeneous computing platforms.

4.3.2 Consultancy Projects

• Completed a consultancy project on the “Performance Evaluation of Network Processors” for Mindtree Consultants, Bangalore.

5 Services Provided

I have been serving as the Chairman of the Supercomputer Education and Research Centre (SERC) since Dec. 2004. As the Chairman of the centre, my responsibilities include planning the growth of SERC as a leading academic department in the country with a strong research focus in computer systems and computational science. I have also been involved in planning the expansion and growth of the computing facilities at SERC, the network services, storage, and software packages.

In the area of upgradation of computing facilities, I was primarily involved in:


• Assisted in preparing the report on High Performance Computing Initiatives during the 12th Plan Period, submitted to the Planning Commission, which identifies potential application areas and their HPC requirements. The report was subsequently accepted by the Government of India and the National Supercomputing Mission was initiated with an investment of Rs. 4,500 Crores over a period of 7 years. Indian Institute of Science is one of the two implementing agencies for the project and I am a co-investigator in this major initiative (along with Prof. N. Balakrishnan as Principal Investigator).

I continue to play a key role in the National Supercomputing Mission; I am a member of the Technical Advisory Committee (TAC) of the mission. I am also the Chairman of the Expert Group on Research & Development (R & D).

• As Chairman of SERC, over the last 10 years, I have planned and augmented the HPC facilities at SERC and ensured the availability of state-of-the-art computing systems, including the first Petascale system in the country, SahasraT, for the users at SERC. The computing capability has been enhanced significantly (from a 0.2 TeraFLOPS system to a 1,300 TeraFLOPS system), with a proportional increase in the storage system. The data center infrastructure has also gone through a planned expansion. Periodic planning, upgradation, and procurement of software packages were also done to meet user requirements. I have also been involved in the planning and expansion of the campus network services (wired and wireless LAN and Internet services) as well as campus e-mail services.

5.1 Administrative Activities at the Institute

I have been a member/chairman of several academic and administrative committees at the Institute.

• Chairman, Contract Labour Management committee

• Member of the Undergraduate Purchase Committee

• Member of the Undergraduate Engineering Curriculum Committee

• Member of the Standing Committee on Network and Internet Services.

• Member of Library Committee.

• Member of the Organizing Committee of the IISc Centenary Conference and the IISc Global Conference.

• Member of a number of committees for the procurement of clusters in various departments (including Physics, Chemical Engineering, the Centre for Atmospheric and Oceanic Sciences, and Computer Science and Automation).

5.2 Committees Outside the Institute

• Member of the Technical Advisory Committee and Chairman of the Expert Group on R & D in the National Supercomputing Mission.


• Member of the Research Council of the Advanced Numerical Research and Analysis Group (ANURAG), DRDO.

• Member of the Executive Council of the Central University of Karnataka, Gulbarga, since 2016.

• Member of the Research Advisory Board of Sri Sathya Sai Institute of Higher Learning,Puttaparthi, since 2015.

• Member of the High Power Committee for the Petascale Computing Initiative of CSIR (Council of Scientific and Industrial Research).

• Chairman, Project Review & Steering Group (PRSG) for the project titled “Proof of Concept Phase of National Grid Computing Initiative: Garuda” at C-DAC, Bangalore.

• Chairman, Project Review & Steering Group (PRSG) for the project titled “Centre for Advanced Computing Research and Education (CARE)” implemented by the Department of Information Technology, Anna University, Madras Institute of Technology Campus.

• Member, Project Review & Steering Group (PRSG) for the “National Grid Computing Initiative: Garuda” project, being implemented by C-DAC, Bangalore.

• Member, Thematic Review Committee of projects on High Performance Computing & Grid Computing at C-DAC.

• Member of the R & D Working Group for C-DAC, formed by the Ministry of Communication and Information Technology (MCIT).

• Member/Chairman of various Purchase/Advisory committees for the procurement of High Performance Computing systems and storage for C-DAC, Pune; C-DAC, Bangalore; the University of Hyderabad; the JN Centre for Advanced Scientific Research (JNCASR), Bangalore; and the CSIR Centre for Mathematical Modelling and Computer Simulation (C-MMACS), Bangalore.

• Member of the faculty selection committee for the Indian Institute of Information Technology Design and Manufacturing (IIITD&M), Kancheepuram.

• Member of Faculty Selection Committee for National Institute of Technology, Nagpur.

• Member of Faculty Promotion Committee for University of Hyderabad.

• Member of Negotiation Committee, INDEST-AICTE Consortium for e-Journals, New Delhi.

• Member, Expert Committee for the AICTE (All India Council for Technical Education) visits to engineering colleges in and around Bangalore.

5.3 Professional Activities

• Program Committee member for various top-tier international conferences (ranked A* or A by the Australian CORE ranking), including ISCA-2016 (ACM SIGARCH International Symposium on Computer Architecture), ASPLOS-2015 (ACM International Conference on Architectural Support for Programming Languages and Operating Systems), PLDI-2014 (ACM SIGPLAN Conference on Programming Language Design and Implementation), PACT-2014, PACT-2011, PACT-2006 (ACM/IEEE International Conference on Parallel Architectures and Compilation Techniques), PPoPP-2013, PPoPP-2011, and PPoPP-2009 (ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming).

• Program Committee member for CGO-2013 (International Symposium on Code Generation and Optimization), ISLPED-2009 and ISLPED-2008 (International Symposium on Low Power Electronics and Design), CASES-2013 and CASES-2009 (International Conference on Compilers, Architecture and Synthesis for Embedded Systems), ICPP-2016 and ICPP-2015 (IEEE International Conference on Parallel Processing), and HiPC 2006-2011, 2013, 2015 (International Conference on High Performance Computing).

• Associate Editor, ACM Transactions on Architecture and Code Optimizations (TACO), since2016

• Associate Editor, ACM Transactions on Parallel Computing (ToPC), since 2012

• Associate Editor, IEEE Micro, 2011 - 2015

• Associate Editor, Computer Architecture Letters, 2011 - 2014

• General Co-Chair for the 29th IEEE International Parallel and Distributed Processing Symposium (IPDPS-2015), Hyderabad, May 2015.

• General Co-Chair for the 15th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (PPoPP-2010), Bangalore, Jan. 2010.

• Organized the 1st ATIP Workshop on High Performance Computing in India, held in conjunction with the International Conference on Supercomputing (SC-09), Portland, USA, Nov. 2009.

• General Co-Chair for the 3rd IEEE International Conference on e-Science and Grid Computing, Bangalore, Dec. 2007.

• Organized the workshop on “New Horizons in Compilers”, held in conjunction with the International Conference on High Performance Computing, in Dec. 2007.

• Served as a reviewer for several international journals, such as IEEE Transactions on Computers, IEEE Transactions on Parallel and Distributed Systems, ACM Transactions on Architecture and Code Optimization, and ACM Computing Surveys, and conferences, including the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), the International Symposium on Code Generation and Optimization (CGO), and the International Parallel and Distributed Processing Symposium (IPDPS).

6 Work Done for the Faculty/Senate

• Thesis Examiner for Masters Theses at IISc.

• Thesis Examiner for Ph.D. and Masters Theses at IIT Delhi.

• Thesis Examiner for Masters Theses at IIT Madras.


• Thesis Examiner for a Ph.D. Thesis at IIIT Hyderabad.

• Senate Nominee for Comprehensive Oral Examination (Qualifier) for Ph.D./M.Sc.(Engg) students at IISc.

• Examiner for several M.E./M.Tech Projects at IISc.


APPENDIX A

A Details of Research and Project Guidance

The following table summarizes the research/project supervision of graduate and undergraduate students.

                In Progress   Under Review   Completed During   Completed Before   Total
                                             the Period         the Period

Ph.D.                2              1               6                  1             10
M.Sc.(Engg.)         1              0              11                  8             20
M.E.                 0              0               4                 20             24
Total                3              1              21                 29             54

A.1 Research Guidance

Currently I am supervising 2 Ph.D. and 1 M.Sc.(Engg.) students. I have graduated 6 Ph.D. students and 11 M.Sc.(Engg) students during the period under review. Further, 1 Ph.D. student is scheduled to defend his thesis next month (June 2016). In addition, 1 Ph.D. and 8 M.Sc.(Engg) theses were completed under my supervision before the current review period. The names of the students and their research topics are listed below.

A.1.1 Ph.D. Theses (Completed) – During the Current Review Period

1. Sreepathi Pai, Efficient Dynamic Automatic Memory Management and Concurrent Kernel Execution for General-Purpose Programs on Graphics Processing Units, Ph.D., Dec. 2014.

2. R. Manikantan, Improving Last-Level Cache Performance in Single and Multi-Core Processors, Ph.D., Nov. 2013. Won the Alumni Medal for the best Ph.D. thesis in the Department of Computer Science & Automation for the year 2013-14.

3. Rupesh Nasre, Scaling Context-Sensitive Points-to Analysis, Ph.D., Sept. 2011.

4. Sandya S. Mannarswamy, Compiler Transformation for Improving the Performance of Software Transactional Memory, Ph.D., July 2011.

5. Kaushik Rajan, Efficient Cache Organization for Application Specific and General Purpose Processors, Ph.D., May 2008.

6. T.S. Rajesh Kumar, On-Chip Memory Architecture Exploration of Embedded System on Chip, Ph.D., May 2008.


A.1.2 Ph.D. Research Theses (Under Review) – During the Current Review Period

7. G.D. Nagendra, Multi-Core Memory System Design: Developing and Using Analytical Models for Performance Evaluation and Enhancements, Ph.D., Aug. 2015 (Thesis Defense scheduled on June 10, 2016).

A.1.3 Ph.D. Research Theses (In Progress) – During the Current Review Period

8. Jayvant Anantpur, Efficient Scheduling in GPU Architectures, Ph.D. (Submission likely in Aug. 2016).

9. Shilpa Babalad, Efficient Program Execution in Many-Core Architectures, Ph.D. (joined in Jan. 2014).

A.1.4 Ph.D. Theses (Completed) – Before the Current Review Period

10. V. Santhosh Kumar, Improving Communication Performance of I/O Intensive and Communication Intensive Applications in Cluster Computer Systems, Ph.D., Oct. 2006.

A.1.5 M.Sc(Engg) Research Theses (Completed) – During the Current Review Period

11. Vaivaswatha Nagaraj, Fast Flow-Sensitive Pointer Analysis, M.Sc.(Engg), Oct. 2015

12. Prasanna Pandit, Cooperative Execution of OpenCL Programs on Multiple Heterogeneous Devices, M.Sc.(Engg), Jan. 2014.

13. Ashwin Prasad, Automatic Compilation of MATLAB Programs for Synergistic Execution on Heterogeneous Processors, M.Sc.(Engg), Jan. 2012.

14. Abhishek Udupa, Efficient Compilation of Stream Programs for Multicores with Accelerators, M.Sc(Engg.), July 2009. Won the M.N.S. Swamy Medal for the best M.Sc(Engg) thesis in the Department of Computer Science & Automation for the year 2010-11.

15. Aditya Thakur, Comprehensive Path-sensitive Data-flow Analysis, M.Sc(Engg.), July 2008.

16. B.C. Girish, Efficient Techniques Exploiting Memory Hierarchy to Improve Network Processor Performance, M.Sc(Engg.), Feb. 2008.

17. Santosh G. Nagarakatte, Spill Code Minimization and Buffer and Code Size Aware Instruction Scheduling Techniques, M.Sc(Engg.), July 2007.

18. Sudhakar Surendran, A Systematic Approach to Synthesis of Verification Test-suites for Modular SoC Designs, M.Sc(Engg.), Dec. 2006.

19. Govind S., Performance Modeling and Evaluation of Network Processors, M.Sc(Engg.), Dec. 2006. Won the Sir Vithal N Chandavarkar Memorial Medal for the best M.Sc(Engg) thesis in the Supercomputer Education and Research Centre for the year 2007-08.


20. Rajesh Vivekanandam, Scalable Low Power Issue Queue and Store Queue Design for Superscalar Processors, M.Sc(Engg.), Dec. 2006.

21. K. Shyam, Power-Aware Compilation Techniques for Embedded Systems, M.Sc(Engg.), July 2006.

A.1.6 M.Sc(Engg) Research Theses (In Progress) – During the Current Review Period

22. Adarsh Patil, Memory Hierarchy Design in Fusion Architecture, M.Sc.(Engg)

A.1.7 M.Sc(Engg) Research Theses (Completed) – Before the Current Review Period

23. Sarvani V.V.N.S., Compiler Techniques for Code Size and Power Reduction for Embedded Processors, M.Sc(Engg.), June 2004.

24. A. Radhika Sarma, A Simple Replacement Policy and a Dynamic Prefetching Technique for WWW Cache Processors, M.Sc(Engg.), April 2004.

25. Subash Chandar G., Reconfigurable Architectures for Application Specific Processors used in Embedded Control Applications, M.Sc(Engg.), Feb. 2002.

26. K.V. Manjunath, Performance Analysis of Methods that Overcome False Sharing Effects in Software DSMs, M.Sc(Engg.), Apr. 2001.

27. Madhavi G. Valluri, Evaluation of Register Allocation and Instruction Scheduling Methods in Multiple Issue Processors, M.Sc.(Engg.), 1999.

28. N.P. Manoj, Compile-Time Support Approach for Software Distributed Shared Memory Systems, M.Sc.(Engg.), 1999.

29. V. Sricharan, Cache Performance Study in Software Distributed Shared Memory Systems, M.Sc.(Engg.), 1999.

30. V. Janaki Ramanan, Efficient Resource Usage Modelling, M.Sc.(Engg.), 1999.

I was associated with the supervision of (i) one Ph.D. student, Hongbo Yang, who graduated in Aug. 2004 from the Department of Electrical and Computer Engineering, University of Delaware, Newark, USA, (ii) two Ph.D. students, Erik R. Altman and Shashank Nemawarkar, who graduated in July 1995 and Aug. 1996, respectively, from the Department of Electrical Engineering, McGill University, Montreal, Canada, and (iii) a Masters student, Florea Suciu, at the Department of Computer Science, Memorial University of Newfoundland, St. John’s, Canada.


A.2 Project Guidance

A.2.1 M.E./M.Tech Project Guidance – During the Current Review Period

31. Patel Arth Kausheybhai, Improving Memory Hierarchy Performance in Heterogeneous System Architecture (HSA), M.E. Project, June 2014.

32. Aravind Krishnan, Large Graph Applications in OpenCL, M.Tech Project, June 2012.

33. T. Vasu Babu, Incorporating Dynamic Rate Support in StreamIt, M.E. Project, 2009.

34. R. Karthikeyan, A Multi-stage Linear Regression Strategy for Determining Rmax of a TOP500 System, M.Tech Project, 2008.

A.2.2 M.E./M.Tech Project Guidance – Before the Current Review Period

35. R. Manikantan, Performance Enhancement Schemes for Superscalar Processors: Exploiting Narrow Width Results and Limited Prefetching, M.E. Project, June 2006.

36. Rajani Pai, Instruction Scheduling Techniques in SUIF for Value Speculation, M.E. Project, June 2005.

37. Dushyant M.P., Enhancing the Performance of Clustered Superscalar Processors, M.E. Project, June 2005.

38. S. Sujatha, A Clustered Digital Library Server with Cooperative Semantic Cache, M.Tech. Project, Jan. 2002.

39. Anand Chitipothu, DOMP: OpenMP Programming on Cluster of SMPs, M.Tech. Project, Jan. 2002.

40. Vinodh Kumar R., Dynamic Path Profile Aided Recompilation in a Java Just-In-Time Compiler, M.E. Project, Jan. 2001.

41. S.M. Sandya, Instruction Scheduling Techniques in SUIF for Value Speculation, M.E. Project, Jan. 2000.

42. Amit H. Rangari, Implementation of Simple COMA Simulator on RSIM, M.E. Project, Jan. 2000.

43. Veeral P. Shah, Copy Propagation Optimization and Linear Scan Register Allocation in JIT Compilation, M.E. Project, Jan. 2000.

44. N. Sreraman, Compilation Techniques for Exploiting MMX Features of Intel Architecture, M.E. Project, June 1999.

45. R. Srinivasan, Instruction Scheduling for Load Value Speculation, M.E. Project, June 1999.

46. Kumar Valluri, Java Virtual Machine: Just-In-Time Compiler Implementation for SPARC, M.E. Project, Jan. 1999.

47. V. Amar Nath, Performance Enhancement of Software Distributed Shared Memory, M.E. Project, Jan. 1999.


48. P.S. Udaya Shankara, Granularity Study and Evaluation of Performance Metrics for Shared Memory Accesses in Distributed Shared Memory Architectures, M.E. Project, Jan. 1999.

49. R. Lakshmi, Performance Enhancement and Evaluation of DSM-SP2: A Distributed Shared Memory, M.E. Project, July 1997.

50. B. Hari Krishna, Enhancing the Performance of Multithreaded Architectures, M.E. Project, Jan. 1997.

51. N.S.S. Narasimha Rao, Implementation of Three Software Pipelining Methods, M.E. Project, Jan. 1997.

52. S. Ramesh, DSM-SP2: An Implementation of Distributed Shared Memory on IBM SP2, M.E. Project, Jan. 1997.

53. Biren Gandhi, Distributed Shared Memory on Network of Workstations with TCP/IP, M.E. Project, Jan. 1997.

54. Amod K. Dani, Software Pipelining for VLIW Architectures, M.E. Project, Jan. 1997.

A.3 Courses Offered at Universities Abroad

• Parallel Processing (graduate/undergraduate course), offered in Aug. 2003 in the Dept. of Computer Science, Arizona State University, Tempe, AZ, USA.

• Topics in Compiler Design: Software and Hardware Tradeoffs (graduate/undergraduate course), offered in Jan. 2003 in the Dept. of Electrical & Computer Engineering, University of Delaware, Newark, DE, USA.

A.4 Professional Experience at Other Places

Position                   University                                    From         To

Visiting Professor         Dept. of Informatique,                        May 2013     July 2013
                           École Normale Supérieure, Paris
Visiting Professor         Dept. of Computer Science,                    Aug. 2003    Dec. 2003
(on Sabbatical from IISc)  Arizona State Univ., Tempe, AZ, USA
Visiting Professor         Dept. of Electrical & Computer Engineering,   Oct. 2002    Aug. 2003
(on Sabbatical from IISc)  Univ. of Delaware, Newark, DE, USA
Assistant Professor        Dept. of Computer Science, Memorial           Sept. 1994   July 1995
                           Univ. of Nfld., St. John’s, Canada
Assistant Professor        Dept. of Electrical Engineering,              Sept. 1992   Aug. 1994
                           McGill University, Montreal, Canada
Post-Doctoral Fellow       Dept. of Electrical Engineering,              Oct. 1990    Aug. 1992
                           McGill University, Montreal, Canada
Post-Doctoral Fellow       Dept. of Computer Science, Univ. of           Aug. 1989    Aug. 1990
                           Western Ontario, London, Canada


APPENDIX B

B List of Publications

                                     During the Period   Before the Period   Total
                                     Under Review        Under Review

Book Chapters                                 0                  2              2
International Journals                        8                 21             29
Top-Tier International Conferences           27                 20             47
Refereed International Conferences           18                 41             59

Total                                        53                 84            137

B.1 Publications During the Period of Review

For each publication, I have indicated the percentage of my contribution to the paper and the acceptance rate (for conference publications), wherever available.

B.1.1 Journal Papers

1. Martin Kong, Antoniu Pop, Louis-Noël Pouchet, R. Govindarajan, Albert Cohen, and P. Sadayappan, “Compiler/Run-time Framework for Dynamic Data-Flow Parallelization of Tiled Programs”, ACM Transactions on Architecture and Code Optimization (TACO), vol. 11, no. 4, December 2014. [Contrib.: 10%]

2. Mrugesh R. Gajjar, T. V. Sreenivas, and R. Govindarajan, “Fast Likelihood Computation in Speech Recognition using Matrices”, vol. 70, no. 2, pp. 219-234, 2013. [Contrib.: 20%]

3. T.S. Rajesh Kumar, R. Govindarajan, and C.P. Ravikumar, “On-Chip Memory Architecture Exploration Framework for DSP Processor Based Embedded System-on-Chip”, ACM Transactions on Embedded Computing Systems, vol. 11, no. 1, March 2012. [Contrib.: 40%]

4. R. Manikantan and R. Govindarajan, “Performance Oriented Prefetching Enhancements Using Commit Stalls”, Journal of Instruction-Level Parallelism, vol. 13, pp. 1–28, March 2011. [Contrib.: 30%]

5. Kaushik Rajan and R. Govindarajan, “A Novel Cache Architecture and Placement Framework for Packet Forwarding Engines”, IEEE Transactions on Computers, vol. 58, no. 8, pp. 1009-1025, Aug. 2009. [Contrib.: 30%]

6. V. Santhosh Kumar, R. Nanjundiah, M.J. Thazhuthaveetil, and R. Govindarajan, “Impact of message compression on the scalability of an atmospheric modeling application on clusters”, Parallel Computing, vol. 34, no. 1, pp. 1–16, 2008. [Contrib.: 30%]


7. Rajani Pai and R. Govindarajan, “FEADS: A Framework for Exploring the Application Design Space on Network Processors”, International Journal of Parallel Programming, vol. 35, no. 1, pp. 1-31, Feb. 2007. [Contrib.: 40%]

8. H. Rong, Z. Tang, R. Govindarajan, A. Douillet, and G.R. Gao, “Single-Dimension Software Pipelining for Multi-Dimensional Loops”, ACM Transactions on Architecture and Code Optimization, vol. 4, no. 1, 2007. [Contrib.: 30%]

B.1.2 Top-Tier International Conferences

Conferences rated as A or A* by the Australian CORE Ranking (http://www.core.edu.au/). Acceptance in most of these conferences is through a rigorous review process, often involving 4 or 5 reviews per paper, and the acceptance rates are very low (typically 20% – 30%). The conference publications also have a high citation index and impact factor. Hence these publications are deemed on par with journal publications.

9. Jayvant Anantpur and R. Govindarajan, “Progress Aware GPU Warp Scheduling”, in the 29th IEEE International Parallel and Distributed Processing Symposium, Hyderabad, May 2015. [Contrib.: 30% Accept. Rate: 21.8%]

10. Vaivaswatha Nagaraj and R. Govindarajan, “Approximating Flow-Sensitive Pointer Analysis Using Frequent Itemset Mining”, in 2015 International Symposium on Code Generation and Optimization (CGO-2015), Feb. 2015. (Best Paper Award) [Contrib.: 30% Accept. Rate: 27.3%]

11. G.D. Nagendra, M. Mehendale, R. Manikantan, and R. Govindarajan, “Bi-Modal DRAM Cache: Improving Hit Rate, Hit Latency and Bandwidth”, in Proc. of the 47th Annual IEEE/ACM International Symposium on Microarchitecture (Micro-2014), Cambridge, UK, Dec. 2014. [Contrib.: 30% Accept. Rate: 19%]

12. G.D. Nagendra, M. Mehendale, R. Manikantan, and R. Govindarajan, “ANATOMY: An Analytical Model of Memory System Performance”, in Proc. of the 2014 ACM International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS-2014), Austin, TX, USA, June 2014. [Contrib.: 30% Accept. Rate: 16.8%]

13. Prasanna Pandit and R. Govindarajan, “FluidiCL Kernels: Cooperative Execution of OpenCL Programs on Multiple Heterogeneous Devices”, Proc. of the International Symposium on Code Generation and Optimization (CGO-14), Orlando, FL, USA, Feb. 2014. [Contrib.: 35% Accept. Rate: 28%]

14. N. Vaivaswatha and R. Govindarajan, “Parallel Flow-Sensitive Pointer Analysis by Graph-Rewriting”, in Proc. of the 22nd International Conference on Parallel Architecture and Compilation Techniques (PACT-2013), Edinburgh, Scotland, Sept. 2013. [Contrib.: 25% Accept. Rate: 17%]

15. Sreepathi Pai, M. J. Thazhuthaveetil, and R. Govindarajan, “Improving GPGPU Concurrency with Elastic Kernels”, in Proc. of the 18th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-2013), Houston, USA, March 2013. [Contrib.: 30% Accept. Rate: 22.7%]


16. Jayvant Anantpur and R. Govindarajan, “Runtime Dependence Computation and Execution of Loops on Heterogeneous Systems”, Proc. of the International Symposium on Code Generation and Optimization (CGO-13), Shenzhen, China, Feb. 2013. [Contrib.: 30% Accept. Rate: 28.0%]

17. Sreepathi Pai, R. Govindarajan, and M. J. Thazhuthaveetil, “Fast and Efficient Automatic Memory Management for GPUs using Compiler-Assisted Runtime Coherence Scheme”, in Proc. of the 21st International Conference on Parallel Architecture and Compilation Techniques (PACT-2012), Minneapolis, USA, Sept. 2012. [Contrib.: 30% Accept. Rate: 18.9%]

18. G.D. Nagendra, R. Manikantan, M. Mehendale, and R. Govindarajan, “Multiple Sub-Row Buffers in DRAM: Unlocking Performance and Energy Improvement Opportunities”, in Proc. of the International Conference on Supercomputing (ICS-2012), Venice, June 2012. [Contrib.: 30%]

19. R. Manikantan, Kaushik Rajan, and R. Govindarajan, “Probabilistic Shared Cache Management (PriSM)”, in Proc. of the 39th International Symposium on Computer Architecture (ISCA-2012), Portland, OR, June 2012. [Contrib.: 25% Accept. Rate: 17.9%]

20. Sandya Mannarswamy and R. Govindarajan, “Reconciling Transactional Conflicts with Compiler’s Help”, in the Proc. of the International Symposium on Code Generation and Optimization (CGO-12), San Jose, CA, USA, Apr. 2012. [Contrib.: 30% Accept. Rate: 28.0%]

21. Sandya Mannarswamy and R. Govindarajan, “Making STMs Cache Friendly with Compiler Transformations”, in the Proceedings of the 20th International Conference on Parallel Architectures and Compilation Techniques (PACT-2011), Galveston Island, TX, USA, Oct. 2011. [Contrib.: 30% Accept. Rate: 16.3%]

22. Ashwin Prasad, Jayvant Anantpur, and R. Govindarajan, “Automatic Compilation of MATLAB Programs for Synergistic Execution on Heterogeneous Processors”, in Proc. of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI-2011), San Jose, California, June 4–8, 2011. [Contrib.: 30% Accept. Rate: 23.3%]

23. Sandya Mannarswamy and R. Govindarajan, “Variable Granularity Access Tracking Scheme for Improving the Performance of Software Transactional Memory”, in Proc. of the 25th IEEE International Parallel & Distributed Processing Symposium (IPDPS-2011), Anchorage, Alaska, USA, May 16–20, 2011. [Contrib.: 25% Accept. Rate: 19.6%]

24. Rupesh Nasre and R. Govindarajan, “Prioritizing Constraint Evaluation for Efficient Points-to Analysis”, in Proc. of the IEEE/ACM International Symposium on Code Generation and Optimization (CGO-2011), Chamonix, France, April 02–06, 2011. [Contrib.: 30% Accept. Rate: 26.7%]

25. R. Manikantan, Kaushik Rajan, and R. Govindarajan, “NUcache: An Efficient Multicore Cache Organization Based on Next-Use Distance”, in Proc. of the 17th International Conference on High Performance Computer Architecture (HPCA-2011), San Antonio, Texas, February 12–16, 2011. [Contrib.: 25% Accept. Rate: 18.5%]

26. Rupesh Nasre and R. Govindarajan, “Points-to Analysis as a System of Linear Equations”, in Proc. of the 17th International Static Analysis Symposium (SAS-2010), Perpignan, France, September 14–16, 2010. [Contrib.: 25%]


27. Sandya Mannarswamy, R. Govindarajan, and Rishi Surendran, “Region Based Structure Layout Optimization by Selective Data Copying”, in Proceedings of the 18th International Conference on Parallel Architectures and Compilation Techniques (PACT-2009), Raleigh, USA, August 2009. [Contrib.: 35% Accept. Rate: 18.1%]

28. A. Udupa, R. Govindarajan, and M. J. Thazhuthaveetil, “Software Pipelined Execution of Stream Programs on GPUs”, in Proc. of the International Symposium on Code Generation and Optimization (CGO-09), Seattle, WA, USA, Mar. 2009. [Contrib.: 35% Accept. Rate: 37.1%]

29. R. Manikantan and R. Govindarajan, “Focused Prefetching: Performance Oriented Prefetching Based on Commit Stalls”, in Proc. of the Intl. Conf. on Supercomputing (ICS-08), Kos, Greece, June 2008. [Contrib.: 35% Accept. Rate: 25.0%]

30. Aditya Thakur and R. Govindarajan, “Comprehensive Path-Sensitive Dataflow Analysis”, in Proc. of the International Symposium on Code Generation and Optimization (CGO-08), Boston, MA, USA, Apr. 2008. [Contrib.: 25% Accept. Rate: 31.8%]

31. K. Shyam and R. Govindarajan, “Compiler-Directed Dynamic Voltage Scaling using Program Phases”, in Proc. of the 14th Annual International Conference on High Performance Computing (HiPC-07), Goa, India, Dec. 2007. [Contrib.: 40% Accept. Rate: 20.5%]

32. Kaushik Rajan and R. Govindarajan, “Emulating Optimal Replacement with Shepherd Cache”, in Proc. of the Annual IEEE/ACM Intl. Symp. on Microarchitecture, Dec. 2007. [Contrib.: 30% Accept. Rate: 21.1%]

33. S. Govind, R. Govindarajan, and J. Kuri, “Packet Reordering in Network Processors”, in Proc. of the International Parallel and Distributed Processing Symposium (IPDPS-07), April 2007. [Contrib.: 30% Accept. Rate: 26.0%]

34. Santosh G. Nagarakatte and R. Govindarajan, “Register Allocation and Optimal Spill Code Scheduling in Software Pipelined Loops Using 0-1 Integer Linear Programming Formulation”, in Proc. of the International Conference on Compiler Construction (CC-07), pp. 126-140, Braga, Portugal, March 2007. [Contrib.: 40% Accept. Rate: 25.0%]

35. K. Shyam and R. Govindarajan, “Compiler Directed Power Optimization for Partitioned Memory Architectures”, in Proc. of the International Conference on Compiler Construction (CC-07), pp. 32-47, Braga, Portugal, March 2007. [Contrib.: 40% Accept. Rate: 25.0%]

B.1.3 Refereed International Conferences

36. G.D. Nagendra, M. Mehendale, and R. Govindarajan, “MicroRefresh: Minimizing Refresh Overhead in DRAM Caches”, in the Proc. of the Memory Systems Conference (MEMSYS 2016), Oct. 2016. [Contrib.: 30%]

37. G.D. Nagendra, M. Mehendale, and R. Govindarajan, “A Comprehensive Analytical Performance Model of the DRAM Cache”, in the Proc. of the 6th ACM/SPEC International Conference on Performance Engineering (ICPE 2015). [Contrib.: 20% Accept. Rate: 30.0%]


38. Jayvant Anantpur and R. Govindarajan, “Taming Control Divergence in GPUs through Control Flow Linearization”, in the Proc. of the International Conference on Compiler Construction (CC-2014), Grenoble, France, April 2014. [Contrib.: 30% Accept. Rate: 29.8%]

39. Raghu Prabhakar, R. Govindarajan, and Matthew J. Thazhuthaveetil, “CUDA-For-Clusters: A System for Efficient Execution of CUDA Kernels on Multi-Core Clusters”, in Proc. of the 18th International European Conference on Parallel and Distributed Computing (Euro-Par 2012), Rhodes Island, Greece, August 2012. [Contrib.: 40%]

40. R. Manikantan, R. Govindarajan, and Kaushik Rajan, “Extended Histories - Improving Regularity and Performance in Correlation Prefetchers”, in Proc. of the 6th International Conference on High Performance and Embedded Architectures and Compilers (HiPEAC-2011), Heraklion, Crete, Greece, January 22–26, 2011. [Contrib.: 30% Accept. Rate: 23.0%]

41. Mrugesh R. Gajjar, T. V. Sreenivas, and R. Govindarajan, “Fast Computation of Gaussian Likelihoods using Low-Rank Matrix Approximations”, in the Proceedings of the IEEE Workshop on Signal Processing Systems (SiPS-2011), Beirut, Lebanon, Oct. 2011. [Contrib.: 20%]

42. Sandya Mannarswamy and R. Govindarajan, “Handling Conflicts with Compiler’s Help in Software Transactional Memory Systems”, in Proc. of the 39th International Conference on Parallel Processing (ICPP-2010), San Diego, CA, September 13–16, 2010. [Contrib.: 30% Accept. Rate: 21.0%]

43. Sreepathi Pai, R. Govindarajan, and M. J. Thazhuthaveetil, “PLASMA: Portable Programming for SIMD Heterogeneous Accelerators”, in the Workshop on Language, Compiler, and Architecture Support for GPGPU, held in conjunction with HPCA/PPoPP 2010, Bangalore, India, January 9, 2010. [Contrib.: 30%]

44. Rupesh Nasre, Kaushik Rajan, R. Govindarajan, and Uday P. Khedker, “Scalable Context-Sensitive Points-To Analysis using Multi-Dimensional Bloom Filters”, in Proceedings of the Seventh Asian Symposium on Programming Languages and Systems (APLAS 2009), Seoul, South Korea, December 14–16, 2009. [Contrib.: 30%]

45. R. Govindarajan, “Reducing Buffer Requirements in Core Routers using Dynamic Buffering”, in Proceedings of the 18th International Conference on Computer Communications and Networks (ICCCN 2009), San Francisco, USA, September 2009. [Contrib.: 35%]

46. A. Udupa, R. Govindarajan, and M. J. Thazhuthaveetil, “Synergistic Execution of Stream Programs on Multicore with Accelerators”, in Proc. of the ACM SIGPLAN/SIGBED 2009 Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES 2009), Dublin, Ireland, June 2009. [Contrib.: 30% Accept. Rate: 25.0%]

47. B.C. Girish and R. Govindarajan, “Improving Performance of Result Caches in Network Processors”, in Proc. of the 15th Annual International Conference on High Performance Computing (HiPC-08), Bangalore, Dec. 2008. [Contrib.: 30% Accept. Rate: 14.4%]

48. Mrugesh R. Gajjar, R. Govindarajan, and T.V. Sreenivas, “Online Unsupervised Pattern Discovery in Speech using Parallelization”, in Proc. of InterSpeech 2008, Brisbane, Sept. 2008. [Contrib.: 30%]


49. Sudhakar Surendran, Rubin Parekhji, and R. Govindarajan, “A Systematic Approach to Synthesis of Verification Test-suites for Modular SoC Designs”, in Proc. of the 21st Annual IEEE SoC Conference (SoCC-08), Newport Beach, CA, USA, Sept. 2008. [Contrib.: 35%]

50. T.S. Rajesh Kumar, C.P. Ravikumar, and R. Govindarajan, “Memory Architecture Exploration Framework for Cache Based Embedded SoC”, in Proc. of the International Conference on VLSI Design (VLSI-08), Hyderabad, India, Jan. 2008. [Contrib.: 35%]

51. B.C. Girish and R. Govindarajan, “A Petri Net Model for Evaluating Packet Buffering Strategies in a Network Processor”, in Proc. of the Intl. Conf. on Quantitative Evaluation of Systems (QEST-07), Sept. 2007. [Contrib.: 35%]

52. T.S. Rajesh Kumar, C.P. Ravi Kumar, and R. Govindarajan, “MODLEX: A Multi-Objective Data Layout Exploration Framework for Embedded SoC”, in Proc. of the 12th Asia and South Pacific Design Automation Conference (ASP-DAC-07), 2007. [Contrib.: 35%]

53. T.S. Rajesh Kumar, C.P. Ravi Kumar, and R. Govindarajan, “MAX: A Multi Objective Memory Architecture eXploration Framework for Embedded Systems-on-Chip”, in Proc. of the International Conference on VLSI Design (VLSI-07), 2007. [Contrib.: 35%]

B.2 Publications Before the Period of Review

B.2.1 Book Chapters

54. R. Govindarajan, “Instruction Scheduling”, in The Compiler Design Handbook: Optimization and Machine Code Generation. Editors: Y. N. Srikant and P. Shankar. CRC Press, Boca Raton, FL, 2002.

55. H. Rong and R. Govindarajan, “Advances in Software Pipelining”, in The Compiler Design Handbook: Optimization and Machine Code Generation, Second Edition. Editors: Y. N. Srikant and P. Shankar. CRC Press, Boca Raton, FL, 2007.

B.2.2 In International Journals

56. G. Subash Chandar, M. Mehendale, and R. Govindarajan, “Area and Power Reduction of Embedded DSP Systems using Instruction Compression and Reconfigurable Encoding”, Journal of VLSI Signal Processing, 2006. [Contrib.: 40%]

57. H. Yang, R. Govindarajan, G.R. Gao, and Z. Hu, “Improving Power Efficiency with Compiler Assisted Cache Replacement”, Journal of Embedded Computing, 2005. [Contrib.: 30%]

58. K.V. Manjunath and R. Govindarajan, “Performance Analysis of Methods that Overcome False Sharing Effects in Software DSMs”, Journal of Parallel and Distributed Computing, vol. 64, no. 8, pp. 887-907, August 2004. [Contrib.: 40%]

59. N.P. Manoj, K.V. Manjunath, and R. Govindarajan, “CAS-DSM: A Compiler Assisted Software DSM”, International Journal of Parallel Programming, vol. 32, no. 2, pp. 77-122, April 2004. [Contrib.: 40%]


60. R. Govindarajan, H. Yang, C. Zhang, J. N. Amaral, and G.R. Gao, “Using Instruction Sequencing to Avoid Register Spills in Superscalar Architectures”, IEEE Transactions on Computers, vol. 54, no. 1, pp. 4-20, Jan. 2003.

61. R. Govindarajan, E.R. Altman, and G.R. Gao, “A Theory for Co-Scheduling Hardware and Software Pipelines in ASIPs and Embedded Processors”, Jl. of Design Automation for Embedded Systems: An International Journal, vol. 7, no. 3, pp. 243-275, March 2002. [Contrib.: 75%]

62. R. Govindarajan, G.R. Gao, and P. Desai, “Minimizing Buffer Requirement under Rate-Optimal Schedules in Regular Dataflow Networks”, Journal of VLSI Signal Processing, vol. 31, no. 3, 2002. [Contrib.: 75%]

63. N. Sreraman and R. Govindarajan, “A Vectorizing Compiler for Multimedia Extension”, Intl. Journal of Parallel Programming, vol. 28, no. 4, pp. 363–400, Aug. 2000. [Contrib.: 50%]

64. R. Govindarajan, N.S.S. Narasimha Rao, E.R. Altman, and G.R. Gao, “Enhanced Co-Scheduling using Reduced MS-State Diagrams”, International Journal of Parallel Programming, vol. 28, no. 1, pp. 1-45, Feb. 2000. [Contrib.: 50%]

65. E.R. Altman, R. Govindarajan, and G.R. Gao, “A Unified Framework for Instruction Scheduling and Mapping for Function Units with Structural Hazards”, Journal of Parallel and Distributed Computing, vol. 49, no. 2, pp. 259-294, Feb. 1998. [Contrib.: 40%]

66. R. Govindarajan, E.R. Altman, and G.R. Gao, “A Framework for Resource-Constrained Rate-Optimal Software Pipelining”, IEEE Transactions on Parallel and Distributed Systems, vol. 7, no. 11, pp. 1133-1149, Nov. 1996. [Contrib.: 50%]

67. R. Govindarajan and G.R. Gao, “Rate-Optimal Scheduling in Multi-Rate DSP Applications”, Journal of VLSI Signal Processing, vol. 9, pp. 211-232, 1995. [Contrib.: 80%]

68. R. Govindarajan, “Exception Handlers in Functional Programming”, IEEE Transactions on Software Engineering, vol. 19, no. 8, pp. 826-834, Aug. 1993. [Contrib.: 100%]

69. R. Govindarajan, Sheng Yu, and V.S. Lakshmanan, “Data Flow Implementation of Generalized Guarded Commands”, International Journal of Parallel Programming, vol. 21, no. 4, pp. 225-267, 1992. [Contrib.: 60%]

70. R. Govindarajan and L.M. Patnaik, “A Data Structure for Applicative Languages”, accepted for publication in The Computer Journal, 1990. [Contrib.: 90%]

71. R. Govindarajan and L.M. Patnaik, “Lenient Execution and Concurrent Execution of Reentrant Routines”, The Computer Journal, vol. 33, no. 2, pp. 185-187, April 1990. [Contrib.: 90%]

72. R. Govindarajan, R. Kumar, D. Kumar, and L.M. Patnaik, “PROMIDS: A PROtotype Multi-rIng Data flow System for Functional Programming”, Microprocessing and Microprogramming, vol. 26, no. 3, pp. 161-173, October 1989. [Contrib.: 60%]

73. R. Govindarajan and L.M. Patnaik, “OR-Parallel Evaluation of Logic Programs in Functional Languages”, Vivek, Journal of National Center for Software Technology, June 1989. [Contrib.: 90%]


74. L.M. Patnaik, R. Govindarajan, J. Silc, and M. Spegel, “A Critique on Parallel Computer Architectures”, Journal of Computing and Informatics, vol. 12, no. 2, pp. 47-64, 1988. [Contrib.: 40%]

75. L.M. Patnaik, R. Govindarajan, and N.S. Ramadoss, “Design and Performance Evaluation of EXMAN: An EXtended MANchester Data Flow Computer”, IEEE Transactions on Computers, vol. C-35, no. 3, pp. 229-244, March 1986. [Contrib.: 40%]

76. K. Viswanathan Iyer, R. Govindarajan, and L.M. Patnaik, “Simulation of a Concurrency Control Algorithm Using SIMULA”, SIMULA Newsletter, vol. 14, no. 4, pp. 4-9, Nov. 1986. [Contrib.: 30%]

B.2.3 Top-Tier International Conferences

77. Kaushik Rajan and R. Govindarajan, “Two-level Mapping Based Cache Index Selection for Packet Forwarding Engines”, in Proc. of the Fifteenth Intl. Conf. on Parallel Architectures and Compilation Techniques (PACT-06), Seattle, Washington, USA, Sept. 2006.

78. Rajesh Vivekanandham, Bharadwaj Amrutur, and R. Govindarajan, “A Scalable Low Power Issue Queue for Large Instruction Window Processors”, in Proc. of the Intl. Conf. on Supercomputing (ICS-06), June 2006. [Contrib.: 30%, Accept. Rate: 26.2%]

79. V. Santhosh Kumar, M.J. Thazhuthaveetil, and R. Govindarajan, “Exploiting Programmable Network Interfaces for Parallel Query Execution in Workstation Clusters”, in Proc. of the Intl. Parallel and Distributed Processing Symposium (IPDPS-06), Rhodes Island, Greece, April 2006. [Contrib.: 30%, Accept. Rate: 23.5%]

80. Kaushik Rajan and R. Govindarajan, “Heterogeneously Segmented Cache Architecture for a Packet Forwarding Engine”, in Proc. of the Intl. Conf. on Supercomputing (ICS-05), June 2005. [Contrib.: 30%, Accept. Rate: 27.6%]

81. H. Rong, Z. Tang, R. Govindarajan, A. Douillet, and G.R. Gao, “Single-Dimension Software Pipelining for Multi-Dimensional Loops”, in Proc. of the 2004 International Symposium on Code Generation and Optimization, Palo Alto, CA, Mar. 2004. (Best Paper Award) [Contrib.: 20%, Accept. Rate: 31.6%]

82. H. Rong, A. Douillet, R. Govindarajan, and G.R. Gao, “Code Generation for Single-Dimension Software Pipelining of Multi-Dimensional Loops”, in Proc. of the 2004 International Symposium on Code Generation and Optimization, Palo Alto, CA, Mar. 2004. [Contrib.: 20%, Accept. Rate: 31.6%]

83. R. Achutharaman, R. Govindarajan, G. Hariprakash, and Amos R. Omondi, “Exploiting Java-ILP on a Simultaneous Multi-Trace Instruction Issue (SMTI) Processor”, in Proc. of the Intl. Parallel and Distributed Processing Symposium (IPDPS-03), Nice, France, April 2003. [Contrib.: 30%, Accept. Rate: 29.2%]

84. G. Subash Chandar, M. Mehendale, and R. Govindarajan, “Area and Power Reduction of Embedded DSP Systems using Instruction Compression and Reconfigurable Encoding”, in Proc. of the Intl. Conf. on Computer Aided Design (ICCAD-2001), San Jose, Nov. 2001. [Contrib.: 30%, Accept. Rate: 30.6%]


85. R. Govindarajan, H. Yang, C. Zhang, J.N. Amaral, and G.R. Gao, “Minimum Register Instruction Sequence Problem: Revisiting Optimal Code Generation for DAGs”, in the Proc. of the Intl. Parallel and Distributed Processing Symposium (IPDPS-2001), San Jose, April 2001. [Contrib.: 60%, Accept. Rate: 36.2%]

86. Madhavi Gopal Valluri and R. Govindarajan, “Evaluating Register Allocation and Instruction Scheduling Techniques in Out-Of-Order Issue Processors”, in the Proc. of the Intl. Conf. on Parallel Architectures and Compilation Techniques (PACT-99), 1999. [Contrib.: 30%]

87. V. Janaki Ramanan and R. Govindarajan, “Resource Usage Models for Instruction Scheduling: Two New Models and a Classification”, in the Proc. of the Intl. Conf. on Supercomputing (ICS-99), 1999. [Contrib.: 50%]

88. C. Zhang, R. Govindarajan, S. Ryan, and G.R. Gao, “Efficient State-Diagram Construction Methods for Software Pipelining”, in the Proc. of the Compiler Construction Conference, Amsterdam, March 1999. [Contrib.: 40%]

89. R. Govindarajan, N.S.S. Narasimha Rao, E.R. Altman, and G.R. Gao, “An Enhanced Co-Scheduling Method using Reduced MS-State Diagram”, in the Proc. of the 12th International Parallel Processing Symposium and 9th Symposium on Parallel and Distributed Processing, March 1998. [Contrib.: 40%]

90. A. Dani, V.J. Ramanan, and R. Govindarajan, “Register-Sensitive Software Pipelining”, in the Proc. of the 12th International Parallel Processing Symposium and 9th Symposium on Parallel and Distributed Processing, March 1998. [Contrib.: 30%]

91. R. Silvera, J. Wang, G.R. Gao, and R. Govindarajan, “A Register Pressure Sensitive Instruction Scheduler for Dynamic Issue Processors”, in the Proc. of the International Conference on Parallel Architectures and Compilation Techniques, San Francisco, Nov. 1997. [Contrib.: 20%]

92. R. Govindarajan, E.R. Altman, and G.R. Gao, “Co-Scheduling Hardware and Software Pipelines”, in the Proc. of the Second International Symposium on High Performance Computer Architecture, San Jose, CA, pp. 52-61, Feb. 1996. [Contrib.: 60%]

93. E.R. Altman, R. Govindarajan, and G.R. Gao, “Scheduling and Mapping: Software Pipelining in the Presence of Structural Hazards”, in the Proc. of the ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 139-150, La Jolla, CA, June 1995. [Contrib.: 40%]

94. R. Govindarajan, S.S. Nemawarkar, and Philip LeNir, “Design and Performance Evaluation of a Scalable Multithreaded Architecture”, in the Proc. of the First International Symposium on High Performance Computer Architecture, pp. 298-307, Raleigh, NC, January 1995. [Contrib.: 70%]

95. R. Govindarajan, E.R. Altman, and G.R. Gao, “Minimizing Register Requirement under Resource-Constrained Rate-Optimal Software Pipelining”, in the Proc. of the 27th Annual International Symposium on Microarchitecture, pp. 85-94, San Jose, CA, December 1994. [Contrib.: 50%]

96. Philip LeNir, R. Govindarajan, and S.S. Nemawarkar, “Exploiting Instruction-Level Parallelism: The Multithreaded Approach”, in the Proc. of the 25th Annual Symposium on Microarchitecture, pp. 189-192, Portland, December 1992. [Contrib.: 40%]


B.2.4 In Refereed Conference Proceedings

97. V. Santhosh Kumar, M.J. Thazhuthaveetil, and R. Govindarajan, “Offloading Bloom Filter Operations to Network Processor for Parallel Query Processing in Cluster of Workstations”, in Proc. of the Intl. Conf. on High-Performance Computing (HiPC-05), Dec. 2005. [Contrib.: 30%, Accept. Rate: 13.8%]

98. S. Govind and R. Govindarajan, “Performance Modeling and Architecture Exploration of Network Processors”, in Proc. of the Intl. Conf. on Quantitative Evaluation of Systems (QEST-05), Sept. 2005. [Contrib.: 30%]

99. Uday Khedker and R. Govindarajan, “Compiler analysis and optimizations: What is new?”, Invited Paper, in the Workshop on Cutting Edge Computing, held in conjunction with the Intl. Conf. on High Performance Computing, Hyderabad, Dec. 2003. [Contrib.: 50%]

100. A. Radhika Sarma and R. Govindarajan, “An Efficient Web Cache Replacement Policy”, in the Proc. of the Intl. Conf. on High Performance Computing, Hyderabad, Dec. 2003. [Contrib.: 30%, Accept. Rate: 29.3%]

101. Z. Hu, Y. Xie, G.R. Gao, and R. Govindarajan, “Code Size Oriented Memory Allocation for Temporary Variables”, in the Proc. of the 5th Workshop on Media and Streaming Processors, held in conjunction with the 36th International Symposium on Microarchitecture (MICRO-36), San Diego, CA, Dec. 2003. [Contrib.: 20%]

102. H. Yang, R. Govindarajan, G.R. Gao, and Z. Hu, “Compiler-Assisted Cache Replacement: Problem Formulation and Performance Evaluation”, in the Proc. of the 16th International Workshop on Languages and Compilers for Parallel Computing, College Station, Texas, Oct. 2003. [Contrib.: 20%]

103. V.V.N.S. Sarvani and R. Govindarajan, “Unified Instruction Reordering and Algebraic Transformations for Minimum Cost Offset Assignment”, in the Proc. of the 7th International Workshop on Software and Compilers for Embedded Systems (SCOPES-2003), Vienna, Austria, September 2003. [Contrib.: 30%]

104. A. Jacquet, V. Janot, C. Leung, G.R. Gao, R. Govindarajan, and T.L. Sterling, “An Executable Analytical Performance Evaluation Approach for Early Performance Prediction”, in the Workshop on Massively Parallel Processing (MPP), held in conjunction with the Intl. Parallel and Distributed Processing Symposium (IPDPS-03), Nice, France, April 2003. [Contrib.: 30%]

105. T.S. Rajesh Kumar, R. Govindarajan, and C.P. Ravi Kumar, “Optimal Code and Data Layout in Embedded Systems”, in the Proc. of the Intl. Conf. on VLSI Systems, New Delhi, India, Jan. 2003. [Contrib.: 40%, Accept. Rate: 39.2%]

106. R. Vinodh Kumar, B. Lakshmi Narayanan, and R. Govindarajan, “Dynamic Path Profile Aided Recompilation in a JAVA Just-In-Time Compiler”, in the Proc. of the 9th Intl. Conf. on High Performance Computing, Bangalore, Dec. 2002. [Contrib.: 30%, Accept. Rate: 39.3%]

107. H. Yang, R. Govindarajan, G.R. Gao, G. Cai, and Z. Hu, “Exploiting Schedule Slacks for Rate-Optimal Power-Minimum Software Pipelining”, in the Proc. of the Workshop on Compilers and Operating Systems for Low Power (COLP-2002), Charlottesville, Virginia, Sep. 2002. [Contrib.: 30%]


108. H. Yang, G.R. Gao, C. Leung, R. Govindarajan, and H. Wu, “On Achieving Balanced Power Consumption in Software Pipelined Loops”, in the Proc. of the Intl. Conf. on Compilers, Architecture, and Synthesis for Embedded Systems (CASES-2002), Grenoble, France, Oct. 2002. [Contrib.: 30%, Accept. Rate: 25%]

109. H. Yang, R. Govindarajan, G.R. Gao, and K. Theobald, “Power-Performance Trade-offs for Energy-Efficient Architectures: A Quantitative Study”, in the Proc. of the Intl. Conference on Computer Design (ICCD-2002), pp. 174-179, Freiburg, Germany. [Contrib.: 30%, Accept. Rate: 27.2%]

110. K.V. Manjunath and R. Govindarajan, “Hidden Costs in Avoiding False Sharing in Software DSMs”, in the Proc. of the Intl. Conf. on High Performance Computing, pp. 294-304, Hyderabad, Dec. 2001. [Contrib.: 30%, Accept. Rate: 26.9%]

111. R. Govindarajan, E.R. Altman, and G.R. Gao, “A Theory for Software-Hardware Co-Scheduling for ASIPs and Embedded Processors”, in the Proc. of the Intl. Conf. on Application Specific Array Processors, Boston, July 2000. [Contrib.: 75%]

112. V. Janaki Ramanan and R. Govindarajan, “Resource Usage Modelling for Software Pipelining”, in the Proc. of the 6th International Conference on High Performance Computing (HiPC-99), Calcutta, Dec. 1999. [Contrib.: 30%]

113. R. Govindarajan, C. Zhang, and G.R. Gao, “Minimal Register Instruction Scheduling: A New Approach for Dynamic Instruction Scheduling Processors”, in the Proc. of the Twelfth International Workshop on Languages and Compilers for Parallel Computing, San Diego, Aug. 1999. [Contrib.: 75%]

114. Manoj N.P. and R. Govindarajan, “CAS-DSM: A Compiler Assisted Software Distributed Shared Memory”, in the Proc. of the 1st Workshop on Software Distributed Shared Memory, Rhodes, June 1999. [Contrib.: 30%]

115. V. Sricharan and R. Govindarajan, “Cache and TLB Performance in Software Distributed Shared Memory”, in the Proc. of the 1st Workshop on Software Distributed Shared Memory, Rhodes, June 1999. [Contrib.: 30%]

116. Madhavi G. Valluri and R. Govindarajan, “Modulo-Variable Expansion Sensitive Scheduling”, in the Proc. of the 5th International Conference on High-Performance Computing, pp. 334-341, Chennai, India, Dec. 1998. (Best Paper Award) [Contrib.: 30%]

117. V. Janaki Ramanan and R. Govindarajan, “Instruction Scheduling Method Using Group Automaton”, in the Proc. of the 6th International Conference on Advanced Computing, Pune, India, Dec. 1998. [Contrib.: 20%]

118. B. Hari Krishna and R. Govindarajan, “Classification and Performance Evaluation of Simultaneous Multithreaded Architectures”, in the Proc. of the 4th International Conference on High-Performance Computing, pp. 34-39, Bangalore, India, Dec. 1997. [Contrib.: 40%]

119. S. Ramesh, R. Lakshmi, and R. Govindarajan, “Distributed Shared Memory on IBM-SP2”, in the Proc. of the International Conference on Parallel and Distributed Systems, Seoul, Korea, Dec. 1997. [Contrib.: 30%]


120. W.M. Zuberek and R. Govindarajan, “Performance Balancing in Multithreaded Multiprocessor Systems”, in the Proc. of the 4th Australasian Conference on Parallel and Real-Time Systems (PART-97), Newcastle, Australia, Sept. 1997. [Contrib.: 40%]

121. R. Govindarajan, F. Suciu, and W. Zuberek, “Timed Petri Net Models of Multithreaded Multiprocessor Architectures”, in the Proc. of the 7th International Workshop on Petri Nets and Performance Models, pp. 163-172, Saint Malo, France, June 1997. [Contrib.: 40%]

122. R. Govindarajan and S. Rengarajan, “Buffer Allocation in Regular Dataflow Networks: An Approach Based on Coloring Circular-Arc Graphs”, in the Proc. of the 3rd International Conference on High-Performance Computing, pp. 419-424, Trivandrum, India, Dec. 1996. [Contrib.: 70%]

123. R. Govindarajan, E.R. Altman, and G.R. Gao, “Instruction Scheduling in the Presence of Structural Hazards: An Integer Programming Approach to Software Pipelining”, in the Proc. of the 1995 International Conference on High-Performance Computing, pp. 291-296, New Delhi, India, Dec. 1995. [Contrib.: 40%]

124. E.R. Altman, R. Govindarajan, and G.R. Gao, “An Experimental Study of an ILP-Based Exact Solution Method for Software Pipelining”, in the Proc. of the Workshop on Languages and Compilers for Parallel Computing, Columbus, OH, Aug. 1995. [Contrib.: 40%]

125. R. Govindarajan, G.R. Gao, and P. Desai, “Minimizing Buffer Requirement in Rate-Optimal Schedules for DSP Applications”, in the Proc. of the 1994 Intl. Conference on Application-Specific Array Processors, pp. 87-98, San Francisco, CA, August 1994. [Contrib.: 60%]

126. R. Govindarajan, E.R. Altman, and G.R. Gao, “A Framework for Rate-Optimal Resource-Constrained Software Pipelining”, in the Proc. of the Conference on Vector and Parallel Processing (CONPAR-94), pp. 640-651, Linz, Austria, September 1994. [Contrib.: 50%]

127. S.S. Nemawarkar, R. Govindarajan, G.R. Gao, and V.K. Agarwal, “Performance of Interconnection Network in Multithreaded Architectures”, in the Proc. of the Parallel Architectures and Languages, Europe (PARLE-94) Conference, July 1994. [Contrib.: 30%]

128. S.S. Nemawarkar, R. Govindarajan, G.R. Gao, and V.K. Agarwal, “Analysis of Multithreaded Multiprocessor Architectures with Distributed Shared Memory”, in the Fifth IEEE Symposium on Parallel and Distributed Processing, pp. 114-121, Dallas, 1993. [Contrib.: 30%]

129. R. Govindarajan and G.R. Gao, “A Novel Framework for Multi-Rate Scheduling in DSP Applications”, in the Proc. of the 1993 Intl. Conf. on Application-Specific Array Processors, pp. 77-88, Venice, Italy, 1993. [Contrib.: 80%]

130. R. Govindarajan and S.S. Nemawarkar, “SMALL: A Scalable Multithreaded Architecture that Exploits Large Locality”, in the Proc. of the Fourth IEEE Symposium on Parallel and Distributed Processing, pp. 32-39, Dallas, December 1992. [Contrib.: 80%]

131. R. Govindarajan, “Software Fault Tolerance in Functional Languages”, in the Proc. of the 16th Annual International Computer Software and Applications Conference, pp. 194-199, Chicago, IL, September 1992. [Contrib.: 100%]

132. R. Govindarajan and S.S. Nemawarkar, “A Large Context Multithreaded Architecture”, in the Proc. of the Joint Conference on Vector and Parallel Processing (CONPAR 92), Lyon, France, September 1992. [Contrib.: 80%]


133. R. Govindarajan, “Shielded Objects: A Fresh Look at Exception Handlers in Functional Languages”, in the Proc. of the 1992 Workshop on Fault-Tolerant Parallel and Distributed Systems, Boston, MA, July 1992. [Contrib.: 100%]

134. S. Nemawarkar, R. Govindarajan, G.R. Gao, and V.K. Agarwal, “Performance Evaluation of Latency Tolerant Architectures”, in the Proc. of the 4th International Conference on Computing and Information, pp. 183-186, Toronto, Canada, May 1992. [Contrib.: 30%]

135. G.R. Gao, R. Govindarajan, and P. Panangaden, “Well-Behaved Programs for DSP Computation”, in the Proc. of the Intl. Conference on Acoustics, Speech and Signal Processing, pp. V.561-564, San Francisco, March 1992. [Contrib.: 40%]

136. R. Govindarajan, L. Gou, Sheng Yu, and P. Wang, “ParC Project: Practical Constructs for Parallel Programming Languages”, in the Proc. of the 15th Annual International Computer Software and Applications Conference (COMPSAC 91), pp. 183-189, Tokyo, Japan, September 1991. [Contrib.: 30%]

137. R. Govindarajan and Sheng Yu, “Data Flow Implementation of Generalized Guarded Commands”, in Proc. of the Conference on Parallel Architectures and Languages Europe (PARLE 91), Lecture Notes in Computer Science 505, Eds. E.H.L. Aarts, J. van Leeuwen, and M. Rem, pp. 372-389, Eindhoven, The Netherlands, June 1991. [Contrib.: 60%]

B.3 Citation Report on Publications

My publications have received a total of 2156 citations, as listed on Google Scholar; 956 of these citations are from the last 5 years. My h-index is 24 and my i10-index is 56. The citation results for my publications can be viewed at:

http://scholar.google.com/citations?user=y91My88AAAAJ&hl=en
