Compiler++ Evolving the compiler - C2.DLL


Post on 24-Feb-2016



TRANSCRIPT


Compiler++ Evolving the compiler - C2.DLL

Jim Radigan - Architect, C++ Optimizer

9/3/2013

© 2011 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

Mission: Evolving the C++ compiler

1. ~Absolute Correctness
2. Compiler throughput
3. Code size
4. Code quality

$87.7 B

$100.0 B+

Evolve the red arrow

3,100,000 Transistors

Ivy Bridge 1.4 Billion Transistors

TEGRA 3 - 5 cores / 128 bit vector instructions

Haswell

C++

Built with C++: Windows, SQL, Office

Mission critical correctness and compile time

Financial impact to the company. The impact we have through these teams is huge. And just here, with these three teams, we're talking about thousands and thousands of developers having access to the work we've done.

Compiler++ Evolving the compiler

How we work

Core Technologies

Where we are going

Full compile, test build Windows: N hours

24 cores + 32 GB memory, 3 RAID 0 drives

If you're in a hurry: 40 cores

x86, ARM, x64 - retail and checked

N Applications - then stress a compiler's build

Compiler developer bad day

Win8 improved but still a work/life balance thing

Compiler++ Evolving the compiler

How we work

Core Technologies

Where we are going

Compiler Business

Absolutely NO new compiler optimization switches

Each switch would cost millions $$

Core Technologies

Code size / stack size / data alignment

Vectorization/Parallelization of existing C++

Security

Parallelizing C++ control flow

Alias analysis

FOR ALL HARDWARE & RUNTIMES!!

Code Size / Stack Size

Foo (int p1, int p2, int p3) {
    int w, x, y, z
    ...
    if (flag) {
        w = ...
        x = w + z
        return x
    } else {
        y = ...
    }

[ebp+10] Parameter 3
[ebp+0C] Parameter 2
[ebp+08] Parameter 1
[ebp+04] Return address
[ebp+00] Old ebp
[ebp-04] Local 1 // w
[ebp-08] Local 2 // x
[ebp-0C] Local 3 // z or y

Stack Packing

?Bind_DeterminePinned@CBase@@UAEXXZ:
638643E0: 8B FF     mov  edi,edi
638643E2: 53        push ebx
638643E3: 56        push esi
638643E4: 8B F1     mov  esi,ecx
638643E6: 8B 5E 18  mov  ebx,dword ptr [esi+18h]
638643E9: 8B 46 04  mov  eax,dword ptr [esi+4]
638643EC: F6 C3 01  test bl,1
638643EF: 74 08     je   638643F9
638643F1: 3B 46 08  cmp  eax,dword ptr [esi+8]
638643F4: 76 1E     jbe  63864414
638643F6: 5E        pop  esi
638643F7: 5B        pop  ebx
638643F8: C3        ret
MORE COLD CODE

No Stack Packing (R1 to R5: reasons for bad code)

?Bind_DeterminePinned@CBase@@UAEXXZ:
639E2840: 8B FF     mov  edi,edi
639E2842: 55        push ebp                       #R1
639E2843: 8B EC     mov  ebp,esp
639E2845: 51        push ecx                       #R2
639E2846: 53        push ebx
639E2847: 56        push esi
639E2848: 8B F1     mov  esi,ecx
639E284A: 57        push edi                       #R3
639E284B: 8B 5E 18  mov  ebx,dword ptr [esi+18h]
639E284E: 8B 46 04  mov  eax,dword ptr [esi+4]
639E2851: F6 C3 01  test bl,1
639E2854: 74 0C     je   639E2862
639E2856: 3B 46 08  cmp  eax,dword ptr [esi+8]
639E2859: 76 3F     jbe  639E289A
639E285B: 5F        pop  edi                       #R4
639E285C: 5E        pop  esi
639E285D: 5B        pop  ebx
639E285E: 8B E5     mov  esp,ebp                   #R5
639E2860: 5D        pop  ebp
639E2861: C3        ret
MORE COLD CODE
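The slot sharing shown in the frame layout above ("Local 3 // z or y") can be sketched as a small interval-packing routine. This is only an illustration under stated assumptions, not the optimizer's actual algorithm: the `LiveRange` struct, the `packStackSlots` and `slotCount` functions, and the idea of describing each local by a single linear range of instruction indices are all hypothetical simplifications introduced here. Which existing slot a local ends up reusing depends on the greedy order, so the pairing may differ from the one on the slide.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of one local variable: it is live over the half-open
// range [begin, end) of instruction indices in a linearized function body.
struct LiveRange {
    std::string name;
    int begin;
    int end;
};

// Greedy packing: visit locals in order of first use and put each one in the
// lowest-numbered slot whose previous occupant is already dead.
// Returns the slot index for each local, in the same order as the input.
std::vector<int> packStackSlots(const std::vector<LiveRange>& locals) {
    std::vector<std::size_t> order(locals.size());
    for (std::size_t i = 0; i < order.size(); ++i) order[i] = i;
    std::stable_sort(order.begin(), order.end(),
                     [&](std::size_t a, std::size_t b) {
                         return locals[a].begin < locals[b].begin;
                     });

    std::vector<int> slotFreeAt;                  // index at which each slot becomes free
    std::vector<int> slotOf(locals.size(), -1);
    for (std::size_t idx : order) {
        const LiveRange& r = locals[idx];
        assert(r.begin <= r.end);
        int chosen = -1;
        for (std::size_t s = 0; s < slotFreeAt.size(); ++s) {
            if (slotFreeAt[s] <= r.begin) { chosen = static_cast<int>(s); break; }
        }
        if (chosen < 0) {                         // no free slot: grow the frame
            chosen = static_cast<int>(slotFreeAt.size());
            slotFreeAt.push_back(r.end);
        } else {
            slotFreeAt[chosen] = r.end;           // reuse a dead local's slot
        }
        slotOf[idx] = chosen;
    }
    return slotOf;
}

// Number of distinct slots the frame needs after packing.
int slotCount(const std::vector<int>& slotOf) {
    int maxSlot = -1;
    for (int s : slotOf) maxSlot = std::max(maxSlot, s);
    return maxSlot + 1;
}
```

With ranges loosely modeled on `Foo` (w, x and z live in the if-branch, y only in the else-branch), four locals fit in three slots, which is the saving the slide's layout shows.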

It's all about CACHE LINES

NTSTATUS
NtfsCommonRead (
    PIRP_CONTEXT IrpContext,
    PIRP Irp,
    BOOLEAN AcquireScb)
{
    NTSTATUS Status;
    PIO_STACK_LOCATION IrpSp;
    PFILE_OBJECT FileObject;
    TYPE_OF_OPEN TypeOfOpen;
    PVCB Vcb;
    PFCB Fcb;
    PSCB Scb;
    PCCB Ccb;
    ATTRIBUTE_ENUMERATION_CONTEXT AttrContext;
    EOF_WAIT_BLOCK EofWaitBlock;
    PFSRTL_ADVANCED_FCB_HEADER Header;
    PTOP_LEVEL_CONTEXT TopLevelContext;
    VBO StartingVbo;
    LONGLONG ByteCount;
    LONGLONG ByteRange;
    ULONG RequestedByteCount;
    PCOMPRESSION_SYNC CompressionSync = ((void *)0);
    BOOLEAN FoundAttribute = 0;
    BOOLEAN PostIrp = 0;
    BOOLEAN OplockPostIrp = 0;
    BOOLEAN ScbAcquired = 0;
    BOOLEAN ReleaseScb;
    BOOLEAN PagingIoAcquired = 0;
    BOOLEAN DoingIoAtEof = 0;
    BOOLEAN Wait;
    BOOLEAN PagingIo;
    BOOLEAN NonCachedIo;
    BOOLEAN SynchronousIo;
    BOOLEAN CompressedIo = 0;

    __try {
        NtfsPrePostIrp( IrpContext, Irp );
        if (( (((Fcb->FcbState) & ((0x00000004)))) ) && ( (((Scb->ScbState) & ((0x00000010)))) )) {
            FsRtlPostPagingFileStackOverflow( IrpContext, Event, NtfsStackOverflowRead );
        } else {
            FsRtlPostStackOverflow( IrpContext, Event, NtfsStackOverflowRead );
        }
        (void) KeWaitForSingleObject( Event, Executive, KernelMode, 0, ((void *)0) );
        Status = ((NTSTATUS)0x00000103L);
    } __finally {
        if (Resource != ((void *)0)) {
            (ExReleaseResourceLite(Resource));
        }
        ExFreeToNPagedLookasideList( &NtfsKeventLookasideList, Event );
    }
    } else {
        if (Irp->Tail.Overlay.AuxiliaryBuffer != ((void *)0)) {
            IrpContext->Union.AuxiliaryBuffer = (PFSRTL_AUXILIARY_BUFFER)Irp->Tail.Overlay.AuxiliaryBuffer;
            if (!( (((IrpContext->Union.AuxiliaryBuffer->Flags) & (0x00000001))) )) {
                Irp->Tail.Overlay.AuxiliaryBuffer = ((void *)0);
            }
        }
        Status = NtfsCommonRead( IrpContext, Irp, 1 );
    }
    break;
    } __except (NtfsExceptionFilter( IrpContext, (struct _EXCEPTION_POINTERS *)_exception_info() )) {
        NTSTATUS ExceptionCode;
        ExceptionCode = _exception_code();
        if (ExceptionCode == ((NTSTATUS)0xC0000123L)) {
            IrpContext->ExceptionStatus = ExceptionCode = ((NTSTATUS)0xC0000011L);
            Irp->IoStatus.Information = 0;
        }
    }
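Try Region Graph - asynchronous lifetimes

[Figure: try region graph. Nodes: ROOT, TRY, EXCEPT, TRY, FINALLY. Second panel annotates the nodes with the accesses "= x" and "x ="]

int x, y;

_try {
    _try {
        x = ...
    } _finally {
        ...
    }
    ... = x + y
    ... = ...
} _except (filter()) {
    ... = y
}

Recall: Compiler dev. primary concern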

C++ Core Technologies

Code size / stack size / data alignment

Vectorization/Parallelization of existing C++

Security

Parallelizing C++ control flow

Alias analysis

C++ Compiler - Auto Parallelism

Vector - all loads before all stores

[Figure: xmm0 holds B[0] B[1] B[2] B[3]; xmm1 holds A[0] A[1] A[2] A[3]; addps xmm1, xmm0 leaves A[0] + B[0], A[1] + B[1], A[2] + B[2], A[3] + B[3] in xmm1]

Simple vector add loop - unaligned

for (i = 0; i < 1000/4; i++) {
    movps xmm0, [ecx]
    movps xmm1, [eax]
    addps xmm0, xmm1
    movps [edx], xmm0
}

for (i = 0; i < 1000; i++)
    A[i] = B[i] + C[i];
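What the vectorizer produces for the scalar loop above can be approximated by hand with SSE intrinsics. The following is a minimal sketch, not the compiler's actual output: the function name `vectorAdd` and the macro `SKETCH_HAVE_SSE` are made up for illustration, and the slide's `movps` is taken to mean an unaligned packed-single move, which corresponds to `_mm_loadu_ps` / `_mm_storeu_ps`. On targets without SSE the sketch falls back to the plain scalar loop.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define SKETCH_HAVE_SSE 1
#else
#define SKETCH_HAVE_SSE 0
#endif

// Computes a[i] = b[i] + c[i] for i in [0, n).
// Assumption: a does not overlap b or c in a way that creates a dependence
// between iterations; the compiler has to prove this before it may vectorize.
void vectorAdd(float* a, const float* b, const float* c, std::size_t n) {
    assert(n == 0 || (a && b && c));
    std::size_t i = 0;
#if SKETCH_HAVE_SSE
    // Four floats per iteration: both loads are issued before the store,
    // matching the "all loads before all stores" rule on the slide.
    for (; i + 4 <= n; i += 4) {
        __m128 vb = _mm_loadu_ps(b + i);           // unaligned load of b[i..i+3]
        __m128 vc = _mm_loadu_ps(c + i);           // unaligned load of c[i..i+3]
        _mm_storeu_ps(a + i, _mm_add_ps(vb, vc));  // unaligned store to a[i..i+3]
    }
#endif
    // Scalar remainder (and the whole loop when SSE is unavailable).
    for (; i < n; ++i) {
        a[i] = b[i] + c[i];
    }
}
```

Because every load in a group of four happens before the store, the vector form is only equivalent to the scalar loop when no iteration reads a value written by one of the three iterations grouped with it.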

Compiler looks across loop iterations!

Auto Parallelism/Vectorization for C++

Fo