php compiler internals


Sebastian Bergmann's "PHP Compiler Internals" slides from Dutch PHP Conference 2009

TRANSCRIPT

Page 1: PHP Compiler Internals

(Do not be afraid of)

PHP Compiler Internals

Sebastian Bergmann

June 13th 2009

Page 2: PHP Compiler Internals

Who I Am

Sebastian Bergmann

Involved in the PHP project since 2000

Creator of PHPUnit

Co-Founder and Principal Consultant with thePHP.cc

Page 3: PHP Compiler Internals

Under PHP's Hood

Server API (SAPI)

(mod_php, FastCGI, CLI, ...)

PHP Core

Request Management

File and Network Operations

Extensions

(date, dom, gd, json, mysql, pcre, pdo, reflection, session, standard, …)

Zend Engine

Compilation and Execution

Memory and Resource Allocation

This slide contains material by Sara Golemon

Page 4: PHP Compiler Internals

How PHP executes code

Lexical Analysis

Converts the source from a sequence of characters into a sequence of tokens

Page 5: PHP Compiler Internals

How PHP executes code

Lexical Analysis

Syntax Analysis

Analyzes a sequence of tokens to determine their grammatical structure

Page 6: PHP Compiler Internals

How PHP executes code

Lexical Analysis

Syntax Analysis

Bytecode Generation

Generates bytecode based on the information gathered by analyzing the source code

Page 7: PHP Compiler Internals

How PHP executes code

Lexical Analysis

Syntax Analysis

Bytecode Generation

Bytecode Execution

Page 8: PHP Compiler Internals

Lexical Analysis

Scan a sequence of characters

    <?php
    if (TRUE) {
        print '*';
    }
    ?>

Page 9: PHP Compiler Internals

Lexical Analysis

Scan a sequence of characters

    <?php
    if (TRUE) {
        print '*';
    }
    ?>

    T_OPEN_TAG

Page 10: PHP Compiler Internals

Lexical Analysis

Scan a sequence of characters

    <?php
    if (TRUE) {
        print '*';
    }
    ?>

    T_OPEN_TAG
    T_IF T_WHITESPACE ( T_STRING ) T_WHITESPACE { T_WHITESPACE

Page 11: PHP Compiler Internals

Lexical Analysis

Scan a sequence of characters

    <?php
    if (TRUE) {
        print '*';
    }
    ?>

    T_OPEN_TAG
    T_IF T_WHITESPACE ( T_STRING ) T_WHITESPACE { T_WHITESPACE
    T_PRINT T_WHITESPACE T_CONSTANT_ENCAPSED_STRING ;

Page 12: PHP Compiler Internals

Lexical Analysis

Scan a sequence of characters

    <?php
    if (TRUE) {
        print '*';
    }
    ?>

    T_OPEN_TAG
    T_IF T_WHITESPACE ( T_STRING ) T_WHITESPACE { T_WHITESPACE
    T_PRINT T_WHITESPACE T_CONSTANT_ENCAPSED_STRING ;
    T_WHITESPACE }

Page 13: PHP Compiler Internals

Lexical Analysis

Scan a sequence of characters

    <?php
    if (TRUE) {
        print '*';
    }
    ?>

    T_OPEN_TAG
    T_IF T_WHITESPACE ( T_STRING ) T_WHITESPACE { T_WHITESPACE
    T_PRINT T_WHITESPACE T_CONSTANT_ENCAPSED_STRING ;
    T_WHITESPACE }
    T_WHITESPACE T_CLOSE_TAG

Page 14: PHP Compiler Internals

Lexical Analysis

Scan a sequence of characters

    T_OPEN_TAG                    <?php
    T_IF                          if
    T_WHITESPACE
    (
    T_STRING                      TRUE
    )
    T_WHITESPACE
    {
    T_WHITESPACE
    T_PRINT                       print
    T_WHITESPACE
    T_CONSTANT_ENCAPSED_STRING    '*'
    ;
    T_WHITESPACE
    }
    T_WHITESPACE
    T_CLOSE_TAG                   ?>

Page 15: PHP Compiler Internals

Lexical Analysis

Scan a sequence of characters
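The scanning step walked through on the previous slides can be sketched as a small hand-written scanner. This is only an illustration of the idea, not PHP's scanner: the `sketch_token` enum, its values, and `sketch_next_token()` are hypothetical names invented here, and only the handful of tokens in the example script are recognized. The sketch assumes that the opening tag swallows one following whitespace character, which is consistent with the token sequence on the slides (no T_WHITESPACE between T_OPEN_TAG and T_IF).

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token kinds for this sketch; not PHP's real token numbers. */
typedef enum {
    TK_OPEN_TAG, TK_CLOSE_TAG, TK_IF, TK_PRINT, TK_STRING,
    TK_CONSTANT_ENCAPSED_STRING, TK_WHITESPACE,
    TK_CHAR,  /* single-character token such as ( ) { } ; */
    TK_EOF
} sketch_token;

/*
 * Scan one token starting at src[*pos], advance *pos past it
 * and return the kind of token that was found.
 */
sketch_token sketch_next_token(const char *src, size_t *pos)
{
    const char *p = src + *pos;
    size_t len = 1;                 /* default: single-character token */
    sketch_token kind = TK_CHAR;

    if (*p == '\0') {
        return TK_EOF;              /* nothing left; *pos is not advanced */
    }

    if (strncmp(p, "<?php", 5) == 0 && isspace((unsigned char)p[5])) {
        kind = TK_OPEN_TAG;
        len = 6;                    /* tag plus one whitespace character */
    } else if (strncmp(p, "?>", 2) == 0) {
        kind = TK_CLOSE_TAG;
        len = 2;
    } else if (isspace((unsigned char)*p)) {
        kind = TK_WHITESPACE;       /* a run of whitespace is one token */
        len = 0;
        while (isspace((unsigned char)p[len])) {
            len++;
        }
    } else if (*p == '\'') {
        kind = TK_CONSTANT_ENCAPSED_STRING;
        len = 1;                    /* no escape handling in this sketch */
        while (p[len] != '\0' && p[len] != '\'') {
            len++;
        }
        if (p[len] == '\'') {
            len++;                  /* include the closing quote */
        }
    } else if (isalpha((unsigned char)*p) || *p == '_') {
        len = 0;
        while (isalnum((unsigned char)p[len]) || p[len] == '_') {
            len++;
        }
        if (len == 2 && strncmp(p, "if", 2) == 0) {
            kind = TK_IF;
        } else if (len == 5 && strncmp(p, "print", 5) == 0) {
            kind = TK_PRINT;
        } else {
            kind = TK_STRING;       /* any other identifier, e.g. TRUE */
        }
    }

    *pos += len;
    return kind;
}
```

Calling `sketch_next_token()` in a loop on the example script, starting with `pos = 0` and stopping at `TK_EOF`, yields the same sequence of seventeen tokens that the slides build up step by step.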

Page 16: PHP Compiler Internals

Lexical Analysis

Scanner Generators

You do not want to write a scanner by hand, at least not when the code for the scanner should be efficient and maintainable

Tools such as flex or re2c generate the code for a scanner from a set of rules

    "if" {
        return T_IF;
    }

    <ST_IN_SCRIPTING>"if" {
        return T_IF;
    }
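The second rule above differs from the first only in its `<ST_IN_SCRIPTING>` prefix, a start condition that restricts the rule to one scanner state. A rough hand-written equivalent, assuming a scanner with just two states, might look like the sketch below. The `sc_state` and `sc_token` enums and `sc_classify()` are hypothetical names, and the code a real generator emits looks very different.

```c
#include <string.h>

/* Hypothetical scanner states; a real scanner has more of them. */
typedef enum { SC_INITIAL, SC_IN_SCRIPTING } sc_state;

/* Hypothetical token kinds for this sketch. */
typedef enum { SC_TOK_INLINE_HTML, SC_TOK_IF, SC_TOK_STRING } sc_token;

/*
 * Classify a word the way a rule guarded by a start condition would:
 * the keyword rule only applies while the scanner is inside PHP code.
 */
sc_token sc_classify(sc_state state, const char *lexeme)
{
    if (state != SC_IN_SCRIPTING) {
        return SC_TOK_INLINE_HTML;  /* outside PHP tags "if" is plain text */
    }
    if (strcmp(lexeme, "if") == 0) {
        return SC_TOK_IF;           /* the guarded rule matches */
    }
    return SC_TOK_STRING;           /* some other identifier */
}
```

Without the state check, the word "if" in the HTML part of a file would be reported as a keyword, which is why the rule carries the condition.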

Page 17: PHP Compiler Internals

Lexical Analysis

PHP Tokens

T_ABSTRACT

T_AND_EQUAL

T_ARRAY

T_ARRAY_CAST

T_AS

T_BAD_CHARACTER

T_BOOLEAN_AND

T_BOOLEAN_OR

T_BOOL_CAST

T_BREAK

T_CASE

T_CATCH

T_CHARACTER

T_CLASS

T_CLASS_C

T_CLONE

T_CLOSE_TAG

T_COMMENT

T_CONCAT_EQUAL

T_CONST

T_CONSTANT_ENCAPSED_STRING

T_CONTINUE

T_CURLY_OPEN

T_DEC

T_DECLARE

T_DEFAULT

T_DIR

T_DIV_EQUAL

T_DNUMBER

T_DOC_COMMENT

T_DO

T_DOLLAR_OPEN_CURLY_BRACES

T_DOUBLE_ARROW

T_DOUBLE_CAST

T_DOUBLE_COLON

T_ECHO

T_ELSE

T_ELSEIF

T_EMPTY

T_ENCAPSED_AND_WHITESPACE

T_ENDDECLARE

T_ENDFOR

T_ENDFOREACH

T_ENDIF

T_ENDSWITCH

T_ENDWHILE

T_END_HEREDOC

T_EVAL

T_EXIT

T_EXTENDS

T_FILE

T_FINAL

T_FOR

T_FOREACH

T_FUNCTION

T_FUNC_C

T_GLOBAL

T_GOTO

T_HALT_COMPILER

T_IF

T_IMPLEMENTS

T_INC

T_INCLUDE

T_INCLUDE_ONCE

T_INLINE_HTML

T_INSTANCEOF

T_INT_CAST

T_INTERFACE

T_ISSET

T_IS_EQUAL

T_IS_GREATER_OR_EQUAL

T_IS_IDENTICAL

Page 18: PHP Compiler Internals

Lexical Analysis

PHP Tokens

T_IS_NOT_EQUAL

T_IS_NOT_IDENTICAL

T_IS_SMALLER_OR_EQUAL

T_LINE

T_LIST

T_LNUMBER

T_LOGICAL_AND

T_LOGICAL_OR

T_LOGICAL_XOR

T_METHOD_C

T_MINUS_EQUAL

T_ML_COMMENT

T_MOD_EQUAL

T_MUL_EQUAL

T_NAMESPACE

T_NS_C

T_NEW

T_NUM_STRING

T_OBJECT_CAST

T_OBJECT_OPERATOR

T_OLD_FUNCTION

T_OPEN_TAG

T_OPEN_TAG_WITH_ECHO

T_OR_EQUAL

T_PAAMAYIM_NEKUDOTAYIM

T_PLUS_EQUAL

T_PRINT

T_PRIVATE

T_PUBLIC

T_PROTECTED

T_REQUIRE

T_REQUIRE_ONCE

T_RETURN

T_SL

T_SL_EQUAL

T_SR

T_SR_EQUAL

T_START_HEREDOC

T_STATIC

T_STRING

T_STRING_CAST

T_STRING_VARNAME

T_SWITCH

T_THROW

T_TRY

T_UNSET

T_UNSET_CAST

T_USE

T_VAR

T_VARIABLE

T_WHILE

T_WHITESPACE

T_XOR_EQUAL

Page 19: PHP Compiler Internals

Syntax Analysis

Analyze a sequence of tokens

Page 20: PHP Compiler Internals

Syntax Analysis

Parser Generators

You do not want to write a parser by hand, at least not when the code for the parser should be efficient and maintainable

Tools such as bison or lemon generate the code for a parser from a set of rules

    T_IF '(' expr ')' { ... }
        statement { ... }
        elseif_list else_single { ... }
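To make the shape of such a rule concrete, here is a minimal hand-written recognizer for a heavily reduced version of the `if` rule. Note the swap in technique: bison generates a table-driven bottom-up parser, whereas this sketch uses recursive descent because it fits in a few lines. The `ptoken` enum, the `p_*` helpers and `sketch_parse()` are hypothetical, and `elseif_list`, `else_single` and the semantic actions are left out.

```c
#include <stddef.h>

/* Hypothetical token kinds; whitespace is assumed to be filtered out. */
typedef enum {
    P_IF, P_LPAREN, P_RPAREN, P_LBRACE, P_RBRACE,
    P_PRINT, P_STRING, P_CONST_STRING, P_SEMI, P_END
} ptoken;

typedef struct {
    const ptoken *toks;  /* token stream terminated by P_END */
    size_t pos;          /* index of the next unread token */
} pstate;

/* Consume the next token if it is of kind t. */
static int p_accept(pstate *s, ptoken t)
{
    if (s->toks[s->pos] == t) {
        s->pos++;
        return 1;
    }
    return 0;
}

/* expr: T_STRING | T_CONSTANT_ENCAPSED_STRING   (greatly simplified) */
static int p_expr(pstate *s)
{
    return p_accept(s, P_STRING) || p_accept(s, P_CONST_STRING);
}

/*
 * statement: '{' statement* '}'
 *          | T_IF '(' expr ')' statement
 *          | T_PRINT expr ';'
 */
static int p_statement(pstate *s)
{
    if (p_accept(s, P_LBRACE)) {
        while (s->toks[s->pos] != P_RBRACE && s->toks[s->pos] != P_END) {
            if (!p_statement(s)) {
                return 0;
            }
        }
        return p_accept(s, P_RBRACE);
    }
    if (p_accept(s, P_IF)) {
        return p_accept(s, P_LPAREN) && p_expr(s)
            && p_accept(s, P_RPAREN) && p_statement(s);
    }
    if (p_accept(s, P_PRINT)) {
        return p_expr(s) && p_accept(s, P_SEMI);
    }
    return 0;
}

/* Return 1 if the token stream is exactly one valid statement, else 0. */
int sketch_parse(const ptoken *toks)
{
    pstate s = { toks, 0 };
    return p_statement(&s) && s.toks[s.pos] == P_END;
}
```

Each function corresponds to one grammar rule, which is the same correspondence a parser generator maintains for you once the grammar grows beyond a handful of rules.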

Page 21: PHP Compiler Internals

PHP Bytecode

Disassembling with vld

    <?php
    if (TRUE) {
        print '*';
    }
    ?>

    sb@thinkpad ~ % php -dextension=vld.so -dvld.active=1 -dvld.execute=0 if.php
    filename:       /home/sb/if.php
    function name:  (null)
    number of ops:  8
    compiled vars:  none
    line     #  op          fetch  ext  return  operands
    -------------------------------------------------------------------------------
       2     0  EXT_STMT
             1  JMPZ                            true, ->6
       3     2  EXT_STMT
             3  PRINT                   ~0      '%2A'
             4  FREE                            ~0
       4     5  JMP                             ->6
       6     6  EXT_STMT
             7  RETURN                          1

Page 22: PHP Compiler Internals

PHP Bytecode

Disassembling with bytekit-cli

    <?php
    if (TRUE) {
        print '*';
    }
    ?>

    sb@thinkpad ~ % bytekit if.php
    bytekit-cli 1.0.0 by Sebastian Bergmann.

    Filename:           /home/sb/if.php
    Function:           main
    Number of oplines:  8

    line  #  opcode    result  operands
    -----------------------------------------------------------------------------
       2  0  EXT_STMT
          1  JMPZ              true, ->6
       3  2  EXT_STMT
          3  PRINT     ~0      '*'
          4  FREE              ~0
       4  5  JMP               ->6
       6  6  EXT_STMT
          7  RETURN            1

Page 23: PHP Compiler Internals

PHP Bytecode

Bytecode visualization with bytekit-cli

    <?php
    if (TRUE) {
        print '*';
    }
    ?>

    sb@thinkpad ~ % bytekit --graph /tmp --format svg if.php

Page 24: PHP Compiler Internals

PHP Bytecode

Disassembling with bytekit-cli

    <?php
    $a = 1;
    $b = 2;
    print $a + $b;
    ?>

    sb@thinkpad ~ % bytekit add.php
    bytekit-cli 1.0.0 by Sebastian Bergmann.

    Filename:           /home/sb/add.php
    Function:           main
    Number of oplines:  10
    Compiled variables: !0 = $a, !1 = $b

    line  #  opcode    result  operands
    -----------------------------------------------------------------------------
       2  0  EXT_STMT
          1  ASSIGN            !0, 1
       3  2  EXT_STMT
          3  ASSIGN            !1, 2
       4  4  EXT_STMT
          5  ADD       ~2      !0, !1
          6  PRINT     ~3      ~2
          7  FREE              ~3
       6  8  EXT_STMT
          9  RETURN            1
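One way to read a listing like the one for add.php is to execute it by hand. The toy interpreter below does that for the ten oplines shown. It is a sketch under strong assumptions, not the Zend Engine's executor: all `vm_*` names are hypothetical, values are plain `long` integers instead of zvals, `!n` is modeled as a compiled-variable slot and `~n` as a temporary slot, and PRINT appends to a caller-supplied buffer.

```c
#include <stddef.h>
#include <stdio.h>

typedef enum { VM_EXT_STMT, VM_ASSIGN, VM_ADD, VM_PRINT, VM_FREE, VM_RETURN } vm_opcode;
typedef enum { VM_UNUSED, VM_CONST, VM_CV, VM_TMP } vm_kind;

typedef struct { vm_kind kind; long value; } vm_operand;  /* literal or slot index */
typedef struct { vm_opcode opcode; vm_operand result, op1, op2; } vm_op;

#define VM_NONE   { VM_UNUSED, 0 }
#define VM_LIT(n) { VM_CONST, (n) }   /* constant operand   */
#define VM_VAR(n) { VM_CV, (n) }      /* !n in the listing  */
#define VM_T(n)   { VM_TMP, (n) }     /* ~n in the listing  */

/* The ten oplines of add.php, transcribed from the listing. */
const vm_op vm_add_program[] = {
    { VM_EXT_STMT, VM_NONE, VM_NONE,   VM_NONE   },
    { VM_ASSIGN,   VM_NONE, VM_VAR(0), VM_LIT(1) },
    { VM_EXT_STMT, VM_NONE, VM_NONE,   VM_NONE   },
    { VM_ASSIGN,   VM_NONE, VM_VAR(1), VM_LIT(2) },
    { VM_EXT_STMT, VM_NONE, VM_NONE,   VM_NONE   },
    { VM_ADD,      VM_T(2), VM_VAR(0), VM_VAR(1) },
    { VM_PRINT,    VM_T(3), VM_T(2),   VM_NONE   },
    { VM_FREE,     VM_NONE, VM_T(3),   VM_NONE   },
    { VM_EXT_STMT, VM_NONE, VM_NONE,   VM_NONE   },
    { VM_RETURN,   VM_NONE, VM_LIT(1), VM_NONE   },
};

/* Read the current value of an operand. */
static long vm_fetch(const vm_operand *o, const long *cv, const long *tmp)
{
    switch (o->kind) {
    case VM_CONST: return o->value;
    case VM_CV:    return cv[o->value];
    case VM_TMP:   return tmp[o->value];
    default:       return 0;
    }
}

/*
 * Execute oplines until RETURN. Printed values are appended to out
 * (out_size must be at least 1); the RETURN operand is returned.
 */
long vm_run(const vm_op *ops, char *out, size_t out_size)
{
    long cv[8] = { 0 }, tmp[8] = { 0 };  /* fixed slot counts: sketch only */
    size_t used = 0;

    out[0] = '\0';
    for (size_t pc = 0; ; pc++) {
        const vm_op *op = &ops[pc];
        switch (op->opcode) {
        case VM_ASSIGN:
            cv[op->op1.value] = vm_fetch(&op->op2, cv, tmp);
            break;
        case VM_ADD:
            tmp[op->result.value] = vm_fetch(&op->op1, cv, tmp) + vm_fetch(&op->op2, cv, tmp);
            break;
        case VM_PRINT: {
            int n = snprintf(out + used, out_size - used, "%ld", vm_fetch(&op->op1, cv, tmp));
            if (n > 0) {
                used += (size_t)n;
                if (used >= out_size) {
                    used = out_size - 1;  /* output was truncated */
                }
            }
            tmp[op->result.value] = 1;    /* print evaluates to 1 */
            break;
        }
        case VM_RETURN:
            return vm_fetch(&op->op1, cv, tmp);
        case VM_EXT_STMT:                 /* statement marker: nothing to do */
        case VM_FREE:                     /* nothing to release in this model */
            break;
        }
    }
}
```

Running `vm_run(vm_add_program, buffer, sizeof buffer)` leaves the printed sum in `buffer`. The PRINT opline has a result operand because `print` is an expression in PHP, which is also why the listing follows it with a FREE of that temporary.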

Page 25: PHP Compiler Internals

PHP Bytecode

List of Opcodes

NOP

ADD

SUB

MUL

DIV

MOD

SL

SR

CONCAT

BW_OR

BW_AND

BW_XOR

BW_NOT

BOOL_NOT

BOOL_XOR

IS_IDENTICAL

IS_NOT_IDENTICAL

IS_EQUAL

IS_NOT_EQUAL

IS_SMALLER

IS_SMALLER_OR_EQUAL

CAST

QM_ASSIGN

ASSIGN_ADD

ASSIGN_SUB

ASSIGN_MUL

ASSIGN_DIV

ASSIGN_MOD

ASSIGN_SL

ASSIGN_SR

ASSIGN_CONCAT

ASSIGN_BW_OR

ASSIGN_BW_AND

ASSIGN_BW_XOR

PRE_INC

PRE_DEC

POST_INC

POST_DEC

ASSIGN

ASSIGN_REF

ECHO

PRINT

JMPZ

JMPNZ

JMPZNZ

JMPZ_EX

JMPNZ_EX

CASE

SWITCH_FREE

BRK

BOOL

INIT_STRING

ADD_CHAR

ADD_STRING

ADD_VAR

BEGIN_SILENCE

END_SILENCE

INIT_FCALL_BY_NAME

DO_FCALL

DO_FCALL_BY_NAME

RETURN

RECV

RECV_INIT

SEND_VAL

SEND_VAR

SEND_REF

NEW

FREE

INIT_ARRAY

ADD_ARRAY_ELEMENT

INCLUDE_OR_EVAL

UNSET_VAR

UNSET_DIM

UNSET_OBJ

FE_RESET

FE_FETCH

EXIT

FETCH_R

FETCH_DIM_R

FETCH_OBJ_R

FETCH_W

FETCH_DIM_W

FETCH_OBJ_W

FETCH_RW

FETCH_DIM_RW

FETCH_OBJ_RW

FETCH_IS

FETCH_DIM_IS

FETCH_OBJ_IS

FETCH_FUNC_ARG

Page 26: PHP Compiler Internals

PHP Bytecode

List of Opcodes

FETCH_DIM_FUNC_ARG

FETCH_OBJ_FUNC_ARG

FETCH_UNSET

FETCH_DIM_UNSET

FETCH_OBJ_UNSET

FETCH_DIM_TMP_VAR

FETCH_CONSTANT

EXT_STMT

EXT_FCALL_BEGIN

EXT_FCALL_END

EXT_NOP

TICKS

SEND_VAR_NO_REF

CATCH

THROW

FETCH_CLASS

CLONE

INIT_METHOD_CALL

INIT_STATIC_METHOD_CALL

ISSET_ISEMPTY_VAR

ISSET_ISEMPTY_DIM_OBJ

PRE_INC_OBJ

PRE_DEC_OBJ

POST_INC_OBJ

POST_DEC_OBJ

ASSIGN_OBJ

INSTANCEOF

DECLARE_CLASS

DECLARE_INHERITED_CLASS

DECLARE_FUNCTION

RAISE_ABSTRACT_ERROR

ADD_INTERFACE

VERIFY_ABSTRACT_CLASS

ASSIGN_DIM

ISSET_ISEMPTY_PROP_OBJ

HANDLE_EXCEPTION

Page 27: PHP Compiler Internals

Extending the Compiler

Page 28: PHP Compiler Internals

Test First!

Zend/tests/unless.phpt

    --TEST--
    unless statement
    --FILE--
    <?php
    unless (FALSE) {
        print 'unless FALSE is TRUE, this is printed';
    }

    unless (TRUE) {
        print 'unless TRUE is TRUE, this is printed';
    }
    ?>
    --EXPECT--
    unless FALSE is TRUE, this is printed

Page 29: PHP Compiler Internals

Extending the Compiler

Add token for unless to the scanner

Add rule for unless to the parser

Generate bytecode for unless in the compiler

Add token for unless to ext/tokenizer

Page 30: PHP Compiler Internals

Add unless scanner token

Zend/zend_language_scanner.l

    <ST_IN_SCRIPTING>"if" {
        return T_IF;
    }

    <ST_IN_SCRIPTING>"unless" {
        return T_UNLESS;
    }

    <ST_IN_SCRIPTING>"elseif" {
        return T_ELSEIF;
    }

    <ST_IN_SCRIPTING>"endif" {
        return T_ENDIF;
    }

    <ST_IN_SCRIPTING>"else" {
        return T_ELSE;
    }

Page 31: PHP Compiler Internals

Add unless parser rule

Zend/zend_language_parser.y

    %token T_NAMESPACE
    %token T_NS_C
    %token T_DIR
    %token T_NS_SEPARATOR
    %token T_UNLESS
    .
    .
    unticked_statement:
            '{' inner_statement_list '}'
        |   T_IF '(' expr ')' {
        .
        .
        |   T_UNLESS '(' expr ')' { zend_do_unless_cond(&$3, &$4 TSRMLS_CC); }
            statement { zend_do_if_after_statement(&$4, 1 TSRMLS_CC); }
            { zend_do_if_end(TSRMLS_C); }
        .
        .

Page 32: PHP Compiler Internals

How if is compiled

Zend/zend_compile.c

    void zend_do_if_cond(const znode *cond, znode *closing_bracket_token TSRMLS_DC)
    {
    }

zend_do_if_cond() is called when an if statement is compiled

    typedef struct _znode {
        int op_type;
        union {
            zval constant;

            zend_uint var;
            zend_uint opline_num;
            zend_op_array *op_array;
            zend_op *jmp_addr;
            struct {
                zend_uint var;
                zend_uint type;
            } EA;
        } u;
    } znode;

Page 33: PHP Compiler Internals

How if is compiled

Zend/zend_compile.c

    void zend_do_if_cond(const znode *cond, znode *closing_bracket_token TSRMLS_DC)
    {
        int if_cond_op_number = get_next_op_number(CG(active_op_array));
        zend_op *opline = get_next_op(CG(active_op_array) TSRMLS_CC);
    }

Allocate a new opline in the current oparray

    struct _zend_op {
        opcode_handler_t handler;
        znode result;
        znode op1;
        znode op2;
        ulong extended_value;
        uint lineno;
        zend_uchar opcode;
    };

Page 34: PHP Compiler Internals

How if is compiled

Zend/zend_compile.c

    void zend_do_if_cond(const znode *cond, znode *closing_bracket_token TSRMLS_DC)
    {
        int if_cond_op_number = get_next_op_number(CG(active_op_array));
        zend_op *opline = get_next_op(CG(active_op_array) TSRMLS_CC);

        opline->opcode = ZEND_JMPZ;
    }

Set the opcode of the new opline to JMPZ (jump if zero)

Page 35: PHP Compiler Internals

How if is compiled

Zend/zend_compile.c

    void zend_do_if_cond(const znode *cond, znode *closing_bracket_token TSRMLS_DC)
    {
        int if_cond_op_number = get_next_op_number(CG(active_op_array));
        zend_op *opline = get_next_op(CG(active_op_array) TSRMLS_CC);

        opline->opcode = ZEND_JMPZ;
        opline->op1 = *cond;
    }

Set the first operand of the new opline to the if condition

Page 36: PHP Compiler Internals

How if is compiled

Zend/zend_compile.c

    void zend_do_if_cond(const znode *cond, znode *closing_bracket_token TSRMLS_DC)
    {
        int if_cond_op_number = get_next_op_number(CG(active_op_array));
        zend_op *opline = get_next_op(CG(active_op_array) TSRMLS_CC);

        opline->opcode = ZEND_JMPZ;
        opline->op1 = *cond;
        closing_bracket_token->u.opline_num = if_cond_op_number;
        SET_UNUSED(opline->op2);
        INC_BPC(CG(active_op_array));
    }

Perform bookkeeping tasks such as marking the second operand of the new opline as unused and incrementing the backpatching counter for the current oparray
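The function records the number of the JMPZ opline because the jump target is not yet known when the condition is compiled; it has to be filled in once the body has been emitted. That technique, backpatching, can be sketched as follows. This is a simplified model and not the engine's code: every `bp_*` name is hypothetical, operands are reduced to a jump target, and the split of work between the "after statement" and "end" steps is an assumption modeled on the function names that appear in the parser rule.

```c
#include <stddef.h>

typedef enum { BP_JMPZ, BP_JMP, BP_PRINT, BP_RETURN } bp_opcode;

typedef struct {
    bp_opcode opcode;
    size_t target;       /* jump target; only meaningful for jumps */
} bp_op;

typedef struct {
    bp_op ops[16];       /* fixed capacity, no bounds checks: sketch only */
    size_t count;        /* number of oplines emitted so far */
} bp_op_array;

/* Append a new opline and return its number. */
static size_t bp_emit(bp_op_array *a, bp_opcode opcode)
{
    size_t number = a->count++;
    a->ops[number].opcode = opcode;
    a->ops[number].target = 0;  /* unknown for now */
    return number;
}

/* Condition compiled: emit JMPZ and remember where it is. */
size_t bp_if_cond(bp_op_array *a)
{
    return bp_emit(a, BP_JMPZ);
}

/* Body compiled: emit JMP, then point the JMPZ just past it. */
size_t bp_if_after_statement(bp_op_array *a, size_t jmpz_number)
{
    size_t jmp_number = bp_emit(a, BP_JMP);
    a->ops[jmpz_number].target = a->count;
    return jmp_number;
}

/* Whole statement compiled: point the JMP at the next opline. */
void bp_if_end(bp_op_array *a, size_t jmp_number)
{
    a->ops[jmp_number].target = a->count;
}

/* Compile the equivalent of: if (cond) { print ...; } */
void bp_compile_if(bp_op_array *a)
{
    a->count = 0;
    size_t jmpz = bp_if_cond(a);
    bp_emit(a, BP_PRINT);
    size_t jmp = bp_if_after_statement(a, jmpz);
    bp_if_end(a, jmp);
    bp_emit(a, BP_RETURN);
}
```

After `bp_compile_if()` both jumps point at the RETURN opline, which mirrors the `JMPZ ->6` and `JMP ->6` pair in the disassembly of if.php (the opline numbers differ because this sketch emits no EXT_STMT or FREE oplines).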

Page 37: PHP Compiler Internals

Add unless to compiler

Zend/zend_compile.c

    void zend_do_unless_cond(const znode *cond, znode *closing_bracket_token TSRMLS_DC)
    {
        int unless_cond_op_number = get_next_op_number(CG(active_op_array));
        zend_op *opline = get_next_op(CG(active_op_array) TSRMLS_CC);

        opline->opcode = ZEND_JMPNZ;
        opline->op1 = *cond;
        closing_bracket_token->u.opline_num = unless_cond_op_number;
        SET_UNUSED(opline->op2);
        INC_BPC(CG(active_op_array));
    }

Compared to generating code for the if statement, all we have to do to generate code for the unless statement is use the JMPNZ (jump if not zero) opcode instead of the JMPZ (jump if zero) opcode
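Why swapping the opcode is enough follows from what the jump does: it skips the statement body, so the body runs exactly when the jump is not taken. A minimal sketch of that reasoning, with the hypothetical names `sk_jump` and `sk_body_runs()` and with the condition reduced to a C integer:

```c
/* The two conditional jumps discussed on the slide. */
typedef enum { SK_JMPZ, SK_JMPNZ } sk_jump;

/*
 * Return 1 if the statement body is executed, 0 if it is skipped.
 * The conditional jump leads past the body, so the body runs only
 * when the jump is NOT taken.
 */
int sk_body_runs(sk_jump opcode, int cond)
{
    int jump_taken = (opcode == SK_JMPZ) ? (cond == 0)   /* jump if zero     */
                                         : (cond != 0);  /* jump if not zero */
    return !jump_taken;
}
```

With `SK_JMPZ` the body runs for a true condition (the `if` case), and with `SK_JMPNZ` it runs for a false condition (the `unless` case), which is the behaviour the unless.phpt test expects.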

Page 38: PHP Compiler Internals

Add unless to compiler

The generated bytecode

    <?php
    unless (FALSE) {
        print '*';
    }
    ?>

    sb@thinkpad ~ % bytekit unless.php
    bytekit-cli 1.0.0 by Sebastian Bergmann.

    Filename:           /home/sb/unless.php
    Function:           main
    Number of oplines:  8

    line  #  opcode    result  operands
    -----------------------------------------------------------------------------
       2  0  EXT_STMT
          1  JMPNZ             true, ->6
       3  2  EXT_STMT
          3  PRINT     ~0      '*'
          4  FREE              ~0
       4  5  JMP               ->6
       6  6  EXT_STMT
          7  RETURN            1

Page 39: PHP Compiler Internals

Run the test

    sb@thinkpad php-5.3-unless % make test TESTS=Zend/tests/unless.phpt

    Build complete.
    Don't forget to run 'make test'.

    =====================================================================
    PHP         : /usr/local/src/php/php-5.3-unless/sapi/cli/php
    PHP_SAPI    : cli
    PHP_VERSION : 5.3.0RC3-dev
    ZEND_VERSION: 2.3.0
    PHP_OS      : Linux 2.6.28-11-generic #42-Ubuntu SMP Fri Apr 17 01:57:59 UTC 2009 i686 GNU/Linux
    INI actual  : /usr/local/src/php/php-5.3-unless/tmp-php.ini
    More .INIs  :
    CWD         : /usr/local/src/php/php-5.3-unless
    Extra dirs  :
    VALGRIND    : Not used
    =====================================================================
    Running selected tests.
    PASS unless statement [Zend/tests/unless.phpt]
    =====================================================================
    Number of tests :    1                 1
    Tests skipped   :    0 (  0.0%) --------
    Tests warned    :    0 (  0.0%) (  0.0%)
    Tests failed    :    0 (  0.0%) (  0.0%)
    Expected fail   :    0 (  0.0%) (  0.0%)
    Tests passed    :    1 (100.0%) (100.0%)
    ---------------------------------------------------------------------
    Time taken      :    0 seconds
    =====================================================================

Page 40: PHP Compiler Internals

Add unless to ext/tokenizer

ext/tokenizer/tokenizer_data.c

    sb@thinkpad tokenizer % ./tokenizer_data_gen.sh
    Wrote tokenizer_data.c
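The slides do not show what the regenerated file contains. Conceptually, a tokenizer extension needs a table from numeric token identifiers to their names so that it can report the new token by name. The sketch below illustrates that idea only: the `SKT_*` numbers, the function `sk_token_name()` and the "UNKNOWN" fallback are all assumptions made for this example, not the contents of tokenizer_data.c.

```c
/* Hypothetical token numbers; the real ones come from the generated parser. */
enum { SKT_IF = 1000, SKT_UNLESS, SKT_ELSEIF };

/* Map a token number to its name; unknown numbers get a placeholder. */
const char *sk_token_name(int token)
{
    switch (token) {
    case SKT_IF:     return "T_IF";
    case SKT_UNLESS: return "T_UNLESS";  /* the entry added for unless */
    case SKT_ELSEIF: return "T_ELSEIF";
    default:         return "UNKNOWN";
    }
}
```

Because such a table is derived mechanically from the token declarations, it makes sense to regenerate it with a script, as on the slide, rather than to edit it by hand.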

Page 41: PHP Compiler Internals

The End

Thank you for your interest!

These slides will be linked soon from http://sebastian-bergmann.de/

You can vote for this talk on http://joind.in/582

Page 42: PHP Compiler Internals

Acknowledgements

Thomas Lee, whose Python Language Internals presentation at OSDC 2008 inspired this presentation

Stefan Esser for creating the Bytekit extension that provides PHP bytecode access and analysis features

Derick Rethans, David Soria Parra, and Scott MacVicar for reviewing these slides

Page 43: PHP Compiler Internals

References

http://www.php.net/manual/en/tokens.php

http://www.zapt.info/opcodes.html

Sara Golemon: "Extending and Embedding PHP"

http://derickrethans.nl/vld.php

http://bytekit.org/

http://github.com/sebastianbergmann/bytekit-cli/

Page 44: PHP Compiler Internals

  This presentation material is published under the Attribution-Share Alike 3.0 Unported license.

  You are free:

✔ to Share – to copy, distribute and transmit the work.

✔ to Remix – to adapt the work.

  Under the following conditions:

● Attribution. You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work).

● Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under the same, similar or a compatible license.

  For any reuse or distribution, you must make clear to others the license terms of this work.

  Any of the above conditions can be waived if you get permission from the copyright holder.

  Nothing in this license impairs or restricts the author's moral rights.

License