php|architect (may 2005)

This copy is registered to: Linn Wilson [email protected]


Page 1: php|architect (May 2005)


Page 2: php|architect (May 2005)


Page 4: php|architect (May 2005)

TABLE OF CONTENTS

Departments

6 Editorial: You Know Nothing

7 What’s New!

51 Test Pattern: The Never Ending Backlog, by Marcus Baker

55 Product Review: Jaws 0.5: Just When You Thought it was Safe to Go Back in the Water, by Peter B. MacIntyre

59 Security Corner: Persistent Logins

62 exit(0); Oh No, Not Again!, by Marco Tabini

Features

10 The Anatomy of a Hit: An Advanced PHP & MySQL Hit Counter, by John R. Zaleski, Ph.D.

22 Solving the Unicode Puzzle, by Michael Toppa

29 XMLPull: An Alternative to DOM & SAX, by Markus Nix

40 More on Advanced Sessions and Authentication in PHP5, by Ed Lecky-Thompson


Page 6: php|architect (May 2005)

EDITORIAL

You Know Nothing

Software development is humbling. Just when you think you’ve got a solid handle on every last (important) bit of technology you need to complete the project at hand, you’re often slapped in the face with the news that you’re just plain wrong. This news can be both frustrating and encouraging (at the same time, believe it or not).

Let me set the scene. Your team has been commissioned with adding a new section to your corporate intranet. In the course of the addition, you adopt a new technology of some sort. Perhaps this is a new database abstraction layer, or a different manner of handling HTML forms. It could be anything; it doesn’t really matter. Your team has worked on this new module for two months. You’ve put all of your collective knowledge and experience into the project. The launch date is in a couple of days, and you’re actually going to make your deadline.

So, this sounds pretty good so far; what could go wrong? Perhaps one of the directors is about to walk in with a must-have feature that needs to be in the next release, and will disrupt your schedule? Sure. This happens all the time, but it’s not the scenario I’m thinking of. That’s just frustrating, and rarely the least bit encouraging. The bad situation that I’m thinking of is (oddly) free of managerial influence.

This new technology that you’ve adopted is really great. It has a few problems, but you’ve managed to work around them. All things considered, it’s saved you many hours in the course of the past few weeks, and you’ve been bragging about it to your developer-friends who work at different companies.

Then, in the course of your daily, duly-diligent reading of various PHP news sources, you discover a brand-new, just-released-yesterday extension that could replace this other new technology you’ve already adopted. Not only is it a suitable replacement, but it solves all of the problems you had to work around, and also opens the door to new possibilities that you didn’t even consider.

Frustrating because you’re about to release a critical project that encompasses technology that you’ve just discovered is inferior. But encouraging because you’re now awaiting the day you’re allowed to rip out all of that legacy (but, ironically, not-yet-released) code and employ a superior product.

So, what’s my point? Simple: I know nothing. What I think I know is only temporary, and could be supplanted at any moment. My life as a developer is a constant journey of staying on top of things, and no matter how much I think I “have it covered,” there’s always something new about to appear on the weblog, newsgroup, or source repository of tomorrow.

I hope the articles in this issue open your eyes to new ideas, especially the XMLPull article, which covers what I think is pretty sweet new (well, newer) technology. I also hope that it’s not too late to incorporate these ideas into your current (or next) project.

May 2005 ● PHP Architect ● www.phparch.com 6

php|architect

Volume IV - Issue 5

May, 2005

Publisher: Marco Tabini

Editor-in-Chief: Sean Coates

Editorial Team: Arbi Arzoumani, Peter MacIntyre, Eddie Peloke

Graphics & Layout: Aleksandar Ilievski

Managing Editor: Emanuela Corso

News Editor: Leslie Hill

Authors: Marcus Baker, Ed Lecky-Thompson, Peter B. MacIntyre, Chris Shiflett, John R. Zaleski, Ph.D., Michael Toppa, Markus Nix

php|architect (ISSN 1709-7169) is published twelve times a year by Marco Tabini & Associates, Inc., P.O. Box 54526, 1771 Avenue Road, Toronto, ON M5M 4N5, Canada.

Although all possible care has been placed in assuring the accuracy of the contents of this magazine, including all associated source code, listings and figures, the publisher assumes no responsibilities with regards of use of the information contained herein or in all associated material.

Contact Information:General mailbox: [email protected]: [email protected]: [email protected] & advertising: [email protected] support: [email protected]

Copyright © 2003-2005 Marco Tabini & Associates, Inc. All Rights Reserved


Page 7: php|architect (May 2005)

Solar 0.2.0

paul-m-jones.com announces the release of Solar 0.2.0. What is it? According to solarphp.com: "Solar is a simple object library and application repository (that is, a combined class library and application component suite) for PHP5."

"Solar provides simple, easy-to-comprehend classes and components for the common aspects of web-based rapid application development, all under the LGPL."

"Solar is designed for developers who intend to distribute their applications to the world. This means the database driver functions work exactly the same way for each supported database. It also means that localization support is built in from the start."

Get all the latest info from solarphp.com.

phpBB 2.0.14

The phpBB Group announces the release of phpBB 2.0.14, the "We know we are (not) furry" edition. "This release addresses some bugfixes as well as fixing some minor non-critical security issues. All issues not reported to us before being released are not credited to the founder, as usual."

"As with all new releases, we urge you to update as soon as possible. You can, of course, find this download on our downloads page (http://www.phpbb.com/downloads.php). As usual, three packages are available to simplify your update."

"The Full Package contains entire phpBB2 source and English language package."

For more information visit: http://phpbb.com


What’s New!


Vogoo PHP API v0.8.2

Vogoo-API.com is happy to announce the release of Vogoo PHP API 0.8.2. Vogoo-API.com announces: Vogoo PHP API v0.8.2 is a free PHP API licensed under the terms of the GNU GPL. With Vogoo PHP API, you can easily and freely add professional collaborative filtering features to your Web Site.

v0.8.2 features

• Handles all member/product votes (available since v0.8)

• Fast computation of similarities between members (available since v0.8)

• One-to-one product recommendations (available since v0.8)

• Ability for members to specify when they are not interested in a product recommendation

Planned features for future versions

• New engine based on product recommendations that gives better performance when little information is available on the member

• Real time targeted ads

• Handles multiple product categories

• Collaborative filtering features available for non-member visitors

• Administration tool

• Engine for 'related sales'

Check out Vogoo-API.com for all the latest info.

The Zend PHP Certification Practice Test Book is now available!

We're happy to announce that, after many months of hard work, the Zend PHP Certification Practice Test Book, written by John Coggeshall and Marco Tabini, is now available for sale from our website and most book sellers worldwide!

The book provides 200 questions designed as a learning and practice tool for the Zend PHP Certification exam. Each question has been written and edited by four members of the Zend Education Board, the very same group who prepared the exam. The questions, which cover every topic in the exam, come with a detailed answer that explains not only the correct choice, but also the question's intention, pitfalls and the best strategy for tackling similar topics during the exam.

For more information, visit http://www.phparch.com/cert/mock_testing.php

Page 8: php|architect (May 2005)


Check out some of the hottest new releases from PEAR.

MDB2_Schema 0.2.0

PEAR::MDB2_Schema enables users to maintain RDBMS independent schema files in XML that can be used to create, alter and drop database entities and insert data into a database. Reverse engineering database schemas from existing databases is also supported. The format is compatible with both PEAR::MDB and Metabase.

MDB2 2.0.0beta4

PEAR MDB2 is a merge of the PEAR DB and Metabase PHP database abstraction layers. Note that the API will be adapted to better fit with the new PHP 5-only PDO before the first stable release.

It provides a common API for all supported RDBMS. The main difference to most other DB abstraction packages is that MDB2 goes much further to ensure portability. Among other things, MDB2 features:

• An OO-style query API

• A DSN (data source name) or array format for specifying database servers

• Datatype abstraction and on demand datatype conversion

• Portable error codes

• Sequential and non sequential row fetching as well as bulk fetching

• Ability to make buffered and unbuffered queries

• Ordered array and associative array for the fetched rows

• Prepare/execute (bind) emulation

• Sequence emulation

• Replace emulation

• Limited Subselect emulation

• Row limit support

• Transactions support

• Large Object support

• Index/Unique support

• Module Framework to load advanced functionality on demand

• Table information interface

• RDBMS management methods (creating, dropping, altering)

• RDBMS independent xml based schema definition management

• Reverse engineering schemas from an existing DB (currently only MySQL)

• Full integration into the PEAR Framework

• PHPDoc API documentation

Currently supported RDBMS:

• MySQL (mysql and mysqli extension)

• PostgreSQL

• Oracle

• Frontbase

• Querysim

• Interbase/Firebird

• MSSQL

• SQLite

• Others soon to follow.

Cache 1.5.5RC1

With the PEAR Cache, you can cache the result of certain function calls, as well as the output of a whole script run, or share data between applications.

DB_DataObject_FormBuilder 0.14.0

DB_DataObject_FormBuilder will aid you in rapid application development using the packages DB_DataObject and HTML_QuickForm. For having a quick but working prototype of your application, simply model the database, run DataObject's createTable script over it and write a script that passes one of the resulting objects to the FormBuilder class. The FormBuilder will automatically generate a simple but working HTML_QuickForm object that you can use to test your application. It also provides a processing method that will automatically detect if an insert() or update() command has to be executed after the form has been submitted. If you have set up DataObject's links.ini file correctly, it will also automatically detect if a table field is a foreign key and will populate a selectbox with the linked table's entries. There are many optional parameters that you can place in your DataObjects.ini or in the properties of your derived classes, that you can use to fine-tune the form-generation, gradually turning the prototypes into fully-featured forms, and you can take control at any stage of the process.

Net_GeoIP 0.9.0alpha1

A library that uses Maxmind's GeoIP databases to accurately determine geographic location of an IP address.

Page 9: php|architect (May 2005)


Looking for a new PHP Extension? Check out some of the latest offerings from PECL.

archive 0.2

The archive extension allows reading and writing tar and cpio archives using libarchive (http://people.freebsd.org/~kientzle/libarchive/).

xmlReader 1.0.1

This extension wraps the libxml xmlReader API. The reader acts as a cursor going forward on the document stream and stopping at each node on the way. xmlReader is similar to SAX, though it uses a much simpler API.

runkit 0.1.0

Replace, rename, and remove user defined functions and classes. Define customized superglobal variables for general purpose use. Execute code in restricted environment (sandboxing).

mqseries 0.8.0

This package provides support for IBM Websphere MQ (MQSeries).

colorer 0.2

Colorer take5 is a syntax highlighting and text parsing library that provides services of text parsing in host editor systems in real-time and transforming results into colored text. For details, see http://colorer.sourceforge.net/

While colorer is primarily designed for use with text editors, it can also be used for non-interactive syntax highlighting, for example, in web applications. This PHP extension provides basic functions for syntax highlighting.

CONFERENCES

ApacheCon Europe 05

ApacheCon.com announces:

"ApacheCon Europe, the official conference of the Apache Software Foundation (ASF), will be held July 18-22 in Stuttgart, Germany. For the fourth consecutive year, half- and full-day pre-conference tutorials offer real world insight, techniques, and methodologies pivotal to the increasing demand for Open Source software. Topics include Scalable Internet Architectures, Web Services, PHP, mod_perl, Apache HTTP Server, Java, XML, Subversion, and SpamAssassin.

The three main conference days offer a wide range of beginner, intermediate and advanced sessions. ApacheCon attendees have more than 70 sessions to choose from, to learn firsthand the latest developments of key Open-Source projects including the Apache HTTP Server, the world's most popular web server software.

With plenty of room for networking and peer discussions, attendees can meet ASF Members and participants during the ApacheCon Expo, evening events, Birds Of a Feather sessions and a number of informal social gatherings."

For more information visit: http://www.apachecon.com/

VS.Php 1.1.1

Jcx.Software brings news of the immediate availability of VS.Php version 1.1.1. This update adds support for PhpDoc commenting, secure ftp deployment capabilities and many bug fixes.

PhpDoc is a powerful feature of PHP that allows the developer to add comments to the source code that can be used to generate documentation. VS.Php uses this information to provide better intellisense content. For instance, VS.Php is able to parse those comments to determine the type of a particular variable. Intellisense uses this information to better help the developer. This update also adds support for the secure ftp protocol for deploying applications through a secure connection.

For information or to download VS.Php, visit: http://www.jcxsoftware.com/

PHPEdit 1.2

PHPEdit proudly announces the release of the latest version, PHPEdit 1.2.

The next major version of PHPEdit is finally available for download. This version includes lots of changes in its internals, and adds new, powerful features to the IDE, like complete PHP5 support, real-time syntax checking, jump to declaration, SimpleTest integration, new document templates, phpDocumentor Wizard and lots of enhancements in existing tools like CodeHint, CodeInsight and CodeBrowser.

This version is available for free to all our customers. You can download it and test it for 30 days. You can also buy a license to avoid the time limit.

To grab the latest version, visit http://www.waterproof.fr/products/PHPEdit/

Page 10: php|architect (May 2005)

FEATURE

The Anatomy of a Hit

An Advanced PHP & MySQL Hit Counter

by John R. Zaleski, Ph.D.

The combined approach of capturing web page access and charting the results provides a simple standalone capability for graphically displaying hit counts for a web site. It requires only a basic working knowledge of PHP and MySQL, yet provides a basic model for expanding and developing a much more sophisticated counter. Furthermore, the methodology for charting the hit count data can be decoupled from basic web page access counting for use in academic, business, or other types of data mining applications where data charting and mining provide a unique way of comparing and contrasting data as they change over time.

REQUIREMENTS

PHP: 5.0 or greater (5.0.4 available)

OS: Win2K Prof, Win2K Advanced Server, WinXP SP1/SP2

Other Software: MySQL version 4.0 or greater (4.1 available)

Code Directory: hitcounter

RESOURCES

URL: http://www.tizag.com/mysqlTutorial/

URL: http://php.resourceindex.com/Complete_Scripts/Access_Counters/Text_Based/

The following methodology was motivated by a request from a client of mine who asked me to provide a web page access counter for their main corporate web site. A condition of the deal, though, was that they did not want to show the actual number of accesses publicly on the web site itself. Instead, they wanted to keep track of this data privately.

Their reasons for omitting a public counter were in keeping with the idea that they did not want to broadcast the activity on their site to all visitors, and, in keeping with the tone of their message, did not desire to display a typical web page access counter on their site.

Instead, they wanted an access counter that would provide them with a means of comparing and contrasting the number of accesses from day to day so that they could analyze advertising impacts on the number of visitors who were hitting their site.

As you may know, numerous types of Web counters exist that are wide ranging in their capabilities and styles. However, I wanted to tailor a solution for my client that would keep track of the number of accesses to their site, while providing a tool to view these data in a manner that was meaningful and comparative. The output would provide an at-a-glance summary that would allow my client to assess the effectiveness of advertising campaigns with respect to changes in site activity.

What developed was a custom hit counter which continues to evolve over time; an example screenshot can be seen in Figure 1. The benefits of this hit counter are not so much in its uniqueness as in the possibilities it offers to the average PHP developer who is interested in evolving their skills in the domain of PHP, MySQL, and user interface design.

The counter and graphing methodology I provide here are very simple to understand and can be modified and used for many applications, even beyond web page access counting.

Calling the Hit Counter

The visual hit counter methodology consists of two separate pieces of code: one for incrementing hit count statistics on a web page, and another for analyzing and mining those statistics for relevant value. The decision to separate these two sets of functionality is somewhat based on heuristics, but is borne out by logic: by separating the processing from the actual hit counting, we remove the potential performance impacts associated with database access for each visit to a web page. Instead, we assign the analytical data mining of the statistics themselves to a web site dedicated to their study. This has the overall effect of reducing the load time of the original web site so that users are not impacted.

To implement the data collection part of the process, the initial step in any web page involves incorporating the following lines of code:

    <!-- Add the client hit counter -->
    <?php include "./hc.php"; ?>
    <!-- End body tag -->

The hc.php file is then included in the web page, at the desired location. Those wishing to make use of this methodology need only include the above code segment in their PHP page (once all supporting files have been uploaded to the server), and the hit counter becomes operational.

The hc.php code contains the logic to open a data file (hitcounter.dat), increment a counter, and store various other statistics to the opened file each time a web page with the preceding include statement is encountered.

We begin the code in hc.php by assigning the name of the data file to the variable $COUNT_FILE:

    $COUNT_FILE = "hitcounter.dat";
    // ...
    if ( filesize( $COUNT_FILE ) > 0 ) {
        $contents = fread ( $fp, filesize( $COUNT_FILE ));
        // ...
    }

If the file referred to by $COUNT_FILE exists, and already contains data, we can assume the contents are the results of previous page accesses. So, we read the contents of the entire file into the $contents variable. Having located the last counter value in it, I increment that value by 1 and append a new record to the hitcounter.dat file.

If this is the first time the web page has been accessed, the file is empty (or the file does not exist), so we have to create the file and write new data to it. In addition to simply writing the current counter value, I also write the date and time stamp; this is to facilitate the data mining process. The hitcounter.dat file has the following format:

    [1] 23 14 45 PM Wednesday July 28th 2004 1
    [2] 06 19 09 AM Thursday July 29th 2004 2
    [3] 08 29 13 AM Thursday July 29th 2004 3

Note that much more information can be added (such as the identity of those accessing the web page). However, that code would need to be added to the structure of the hit count listing. The code fragment responsible for writing the output listing above is:

    // Format string as used in Listing 1, so that the output matches
    // the space-delimited records shown above.
    fwrite( $fp, "[" . $counter . "] " . date("H i s A l F dS Y") . " " .
        $counter . " \n" );

The entire code listing for the hit counter is contained in Listing 1. It is important to set the permissions so that the hc.php script can read and write files in the directory in which it is placed. If this is not done properly, the script will be unable to write to the hitcounter.dat file.
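As a rough illustration of the read, increment, and append cycle described above, the sketch below wraps it in a single function. It is not the article's Listing 1: the function name recordHit, the regular expression used to find the last counter, and the flock() call (added here on the assumption that two requests may arrive at the same time) are all illustrative choices.

```php
<?php
// Hypothetical helper (not from the article): appends one hit record
// to the data file and returns the new counter value.
function recordHit(string $countFile): int
{
    // "c+" opens for reading and writing and creates the file if it is
    // missing, without truncating existing content.
    $fp = fopen($countFile, 'c+');
    if ($fp === false) {
        throw new RuntimeException("Cannot open $countFile");
    }

    // Assumption: an exclusive lock keeps concurrent requests from
    // reading the same previous count.
    flock($fp, LOCK_EX);

    // Walk the file and remember the number in the last "[n]" prefix.
    $previous = 0;
    while (($line = fgets($fp)) !== false) {
        if (preg_match('/^\s*\[(\d+)\]/', $line, $m)) {
            $previous = (int) $m[1];
        }
    }

    $counter = $previous + 1;

    // Same space-delimited layout as the sample records:
    // [n] hour minute second AM/PM weekday month day year n
    fseek($fp, 0, SEEK_END);
    fwrite($fp, "[$counter] " . date('H i s A l F dS Y') . " $counter \n");

    fflush($fp);
    flock($fp, LOCK_UN);
    fclose($fp);

    return $counter;
}
```

A page could then call recordHit("hitcounter.dat") where it would otherwise include hc.php; the same directory permission requirement applies.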

Plotting Preliminaries

Plotting preparation is accomplished using the siteIndex.php file (Listing 2). As I explained earlier, I had opted to create the hit counter method independently of the plotting code to decouple the hit counter method from the database. This serves several purposes. First, it allows those interested in just a plain hit counter to implement it without requiring them to master the techniques of database connectivity. Second, this takes performance considerations into account by avoiding database access during the counter incrementing process. Third, and finally, this enables the user to alter and improve the plotting routine independently of the hit counter so that accurate statistics can continue to be kept by keeping the index page intact.

[Figure 1: example screenshot of the hit counter]

Note that in the hit counter method I developed in Listing 1, there is no direct output of the number of hits to the Web page. This is a matter of choice for the Web page owner. Sometimes individuals perceive that, if the count is too low, this can bode poorly for return visits, while others believe that the hit count statistic may be seen as inappropriate or tacky for the particular site. I manage several sites for local businesses, and I have experienced both kinds of sentiments from the business owners. Thus, by creating this separate method, and only publishing the link to a site that is not directly associated with the web index page and its child links, the business owners can privately view the web page statistics to determine how many accesses have been made. They can also view when these hits occurred in the course of the past weeks and months, and correlate the data to external events (for instance, periods of specific types of advertising).

Updating the Database

I begin by opening a connection to the database and entering all existing data from the hit counter method into it. This is accomplished in the siteIndex.php code:

    $conn = mysql_connect("localhost", "root", "admin");

In the examples I provide, everything is run on the local machine (localhost), and I have set the username and password to root and admin, respectively. The name of the database instance can be arbitrarily defined by the user; I chose sitestats. Developers have their own naming conventions, and I’m merely giving you some insight into my own. So, selecting the appropriate database is accomplished via the following statement:

    mysql_select_db("sitestats", $conn)
        or die("Could not open sitestats: " . mysql_error());

The “or die” clause allows me to catch any errors and kick them out for debugging purposes, should a connection problem arise. I now read the table of site entries and find the last value so that it can be updated with the latest data:

    $table = "sitevisits";
    $check = "select * from $table";
    $qry = mysql_query($check)
        or die ("Could not match data because " . mysql_error());
    $nRows = mysql_num_rows($qry);

This query allows me to determine the current number of rows contained in the table; this will be necessary later. In addition, I load an array with the data that I just read. To plot the data, I need it in a form that I can manipulate in memory:

    while ($newArray = mysql_fetch_array($qry) ) {
        $visits = $newArray['visits'];
        if ( strcmp( $debug, "yes" ) == 0 )
            echo " maxVisits = " . $maxVisits .
                 " value from db = " . $visits . "<br>";

        if ( $visits > $maxVisits ) $maxVisits = $visits;
    }

From this segment, we determine the number of visits and adjust our old maximum to reflect the current value. The array variable, $visits, now contains all of the data from the database. Therefore, $visits is a multi-dimensional array that allows us to keep track of all of this data. The time has come to read the hitcounter.dat file and determine what’s new so that this can be added to the database, and the $visits array. The hitcounter.dat file is opened and its records are stored in a new temporary array, $fileElements:

    $data = file($fileName);
    foreach ($data as $column => $val) {
        // Skip lines that consist of a single space; split the rest.
        // (The printed comparison read "== 0", which would keep only
        // those blank lines.)
        if ( strcmp($val, " ") != 0 ) {
            $fileElements[$column] = explode(" ", $val);
        }
    }

The explode function is very useful in expanding the elements read from the data file into separate fields that are then assigned to the $fileElements array. This is simple because the field delimiter in the hitcounter.dat file is the space character.

Listing 1

    <?php
    // hc.php

    $debug = 0;

    $ra = $_SERVER["REMOTE_ADDR"];
    $rh = $_SERVER["REMOTE_HOST"];

    $COUNT_FILE = "hitcounter.dat";

    $counter = 0; $start = 0; $stop = 0;

    if (file_exists($COUNT_FILE)) {
        $fp = fopen("$COUNT_FILE", "r");

        // If file exists, and has content, read that content,
        // extract the counter value, add 1 to it, and re-write
        // to the counter data file.

        if ( filesize( $COUNT_FILE ) > 0 ) {
            $contents = fread ( $fp, filesize( $COUNT_FILE ) );
            if ( $debug == 1 ) echo $contents;

            $stringlength = strlen($contents);
            fclose( $fp );

            $fp = fopen("$COUNT_FILE", "a");

            // Scan the contents; after the loop, $start and $stop mark
            // the positions just past the last "[" and the last "]".
            $i = 0;

            while ( $i < $stringlength )
            {
                $char = $contents[$i];
                $i = $i + 1;

                if ( $char == "[" ) {
                    if ( $debug == 1 )
                        echo "<br> Found [ " . $i . "<br>";
                    $start = $i;
                }
                if ( $char == "]" ) {
                    if ( $debug == 1 ) echo " Found ] " . $i . "<br>";
                    $stop = $i;
                }
            }

            if ( $debug == 1 )
                echo " start: " . $start . "<br>";
            if ( $debug == 1 )
                echo " stop: " . $stop . "<br>";

            // Take only the digits between the last "[" and "]".
            $previous_count = substr( $contents, $start,
                $stop - $start - 1 );

            if ( $debug == 1 )
                echo " Previous Count: " . $previous_count
                    . " <br>";

            $counter = 1 + (int) $previous_count;

            if ( $debug == 1 )
                echo " Counter: " . $counter . "<br>";

            fwrite($fp, "[" . $counter . "] " .
                date("H i s A l F dS Y") . " " . $counter
                . " \n" );

            fclose($fp);
        } // endif

        // If file exists, but has no content, this means it is
        // the first time the counter is being used. In this
        // instance, write the counter number and the date/time
        // stamp to the hit counter file, with the counter
        // number = 1.

        if ( filesize( $COUNT_FILE ) == 0) {
            fclose( $fp );

            $fp = fopen("$COUNT_FILE", "a");

            $counter = 1;

            if ( $debug == 1 ) echo "[" . $counter . "] "
                . date("H i s A l F dS Y");

            // Same format as above, so every record has the same fields.
            fwrite( $fp, "[" . $counter . "] "
                . date("H i s A l F dS Y") . " " . $counter . " \n" );
            fclose( $fp );

        } // end if filesize = 0
    } else {
        echo "Can't find file, check '\$file'<BR>";
    }
    ?>
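To make the field positions concrete, here is a small sketch of splitting one hitcounter.dat record on the space character. The function name parseHitRecord and the associative keys are illustrative assumptions rather than part of the article's listings; the positions follow the sample records, with the hour at index 1 and the visit count at index 9.

```php
<?php
// Hypothetical helper: split one space-delimited record into named fields.
// Returns null for blank or malformed lines.
function parseHitRecord(string $line): ?array
{
    $fields = explode(' ', trim($line));
    if (count($fields) !== 10) {
        return null;
    }
    return [
        'hour'       => $fields[1],
        'minute'     => $fields[2],
        'second'     => $fields[3],
        'DayofWeek'  => $fields[5],
        'Month'      => $fields[6],
        'DayofMonth' => $fields[7],
        'Year'       => $fields[8],
        'visits'     => (int) $fields[9],
    ];
}

$record = parseHitRecord("[2] 06 19 09 AM Thursday July 29th 2004 2 \n");
echo $record['DayofWeek'], ' ', $record['visits'], "\n"; // prints "Thursday 2"
```

Checking the field count up front means a record written in a different date format is rejected instead of silently shifting every field by one position.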

The next step in the process involves locating the cur-rent position in the database and determining howmany new data points need to be added. Then, welocate where to begin entering data into the databasetable. This is accomplished by reading thehhiittccoouunntteerr..ddaatt file and comparing the maximum num-ber of visits last recorded in the database with the asso-ciated visit data contained in the data file. When thetwo are equal, the point has been reached in the datafile wherein the last entry was made to the database.Any data contained beyond this point represents newinformation that must be inserted into the instance.This defines the starting index for future inserts into thedatabase, which we fill using a ffoorr loop as follows:

    for ($k = $startIndex+1; $k < sizeof($data)-1; $k++ ) {
        if ( strcmp($fileElements[$k][5], $fileElements[$k+1][5]) != 0 ) {
            $hour = $fileElements[$k][1];
            // ...
            $visits = $fileElements[$k][9];

            $sql = "insert into sitevisits (visit_ID, hour, minute,
                second, DayofWeek, Month, DayofMonth, Year, visits)
                values ('', '$hour', '$minute', '$second',
                '$DayofWeek', '$Month', '$DayofMonth', '$Year',
                '$visits')";
            // ...
        }
    }

The code snippet above is contained in Listing 2; it inserts the new data into the sitevisits table. The starting point for the instance is at $startIndex+1. We can identify where the new data begins from the hitcounter.dat file and the ending point is sizeof($data), that is, the total amount of data contained within the hitcounter.dat file. The fields entered into the database are truncated in the code segment above to save space. However, the fields include $hour, $minute, $second, $DayofWeek, $Month, $DayofMonth, $Year, and $visits.
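The selection logic can be sketched without a database. The two functions below are hypothetical helpers, not part of Listing 2: the first locates the record whose visit count matches the database maximum, and the second collects the records that Listing 2 would insert, namely each record followed by a different day of week, plus the final record. Returning -1 when nothing matches is an assumption of this sketch; Listing 2 defaults to 0.

```php
<?php
// Find the record whose visit count (index 9) equals $maxVisits.
// Returning -1 when nothing matches is a choice made for this sketch.
function findStartIndex(array $records, int $maxVisits): int
{
    foreach ($records as $k => $fields) {
        if ((int) $fields[9] === $maxVisits) {
            return $k;
        }
    }
    return -1;
}

// Collect the records after $startIndex that close out a day (the next
// record has a different day of week at index 5), plus the final record.
function selectNewRecords(array $records, int $startIndex): array
{
    $selected = [];
    $last = count($records) - 1;
    for ($k = $startIndex + 1; $k <= $last; $k++) {
        if ($k === $last || $records[$k][5] !== $records[$k + 1][5]) {
            $selected[] = $records[$k];
        }
    }
    return $selected;
}

// Made-up sample data in the assumed ten-field layout.
$records = [
    ['x', '09', '00', '00', 'y', 'Mon', 'May', '02', '2005', '10'],
    ['x', '17', '30', '00', 'y', 'Mon', 'May', '02', '2005', '12'],
    ['x', '08', '15', '00', 'y', 'Tue', 'May', '03', '2005', '15'],
    ['x', '21', '45', '00', 'y', 'Tue', 'May', '03', '2005', '19'],
];
$start = findStartIndex($records, 10);
foreach (selectNewRecords($records, $start) as $fields) {
    echo $fields[5] . ' ' . $fields[9] . "\n";   // prints "Mon 12" then "Tue 19"
}
```

Folding the final record into the same loop removes the need for a separate last-row insert, at the cost of departing slightly from the structure of the listing.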

Querying Results

Listing 3 is what I’ll call queryDB.php, one of the plotting workhorses of the methodology. I start by performing a general query and fetching all data within the database:

    $table = "sitevisits";
    $check = "select * from $table";
    $qry = mysql_query($check)
        or die ("Could not match data because " . mysql_error());

Then, I assign these data to an array:

    while ($newArray = mysql_fetch_array($qry) ) {
        $dow = $newArray['DayofWeek'];
        $mo = $newArray['Month'];
        $dom = $newArray['DayofMonth'];
        $yr = $newArray['Year'];
        $vis = $newArray['visits'];
        $dbElements[$i][0] = $dow;
        $dbElements[$i][1] = $mo;
        $dbElements[$i][2] = $dom;
        $dbElements[$i][3] = $yr;
        $dbElements[$i][4] = $vis;
        $i++;
    }

These elements are to be used in the plotting process. The actual plotting takes place within the queryDB.php code using the values stored by the $dbElements[$i][4] = $vis; assignment. Quite simply, I arbitrarily define a field width (in pixels) that sets the span or range of the plotting window. I selected 400 pixels simply because in this way the entire screen will not be taken over by the plotting of the individual bar chart elements. Furthermore, I scale the plotting of the individual bars to the current maximum value contained within the database. This is logical because over time, as more data accumulates, the overall maximum number of visits increases. It is therefore necessary to scale all data by the new maximum value so that earlier hit count recordings will display proportionally with respect to one another. Furthermore, since the maximum number of visits is

“The output would provide an at-a-glance summary that would allow my client to assess the effectiveness of advertising campaigns...”


Listing 2 (cont’d)

93 echo “[“ . $val . “]<br>”; 9495 if ( strcmp($val,” “) == 0 ) { 96 } 97 else 98 { 99 $fileElements[$column] = explode(“ “, $val);

100 } 101 } 102103 //******************************************************** 104 // Determine where to begin new data entry into database, 105 // based on what is contained in the hitcounter file 106 //******************************************************** 107108 if ( strcmp( $debug, “yes” ) == 0 ) 109 echo “ number of total data elements now: “ 110 . sizeof($data) . “<br>”; 111112 $startIndex = 0; 113 for ($k = 1; $k < sizeof($data); $k++ ) 114 { 115 if ( strcmp($maxVisits, $fileElements[$k][9]) == 0 ) { 116 // Found the entry 117 $startIndex = $k; 118 } 119 if ( strcmp( $debug, “yes” ) == 0 ) 120 echo “ “ . $fileElements[$k][9] . “ “ . $maxVisits 121 . “<br>”; 122 } 123124 if ( strcmp( $debug, “yes” ) == 0 ) 125 echo “ new start index: “ . $startIndex . “<br>”; 126127 if ( strcmp( $debug, “yes” ) == 0 ) 128 echo “ start index: “ . $startIndex . “<br>”; 129130 //*************************************************** 131 // Insert table data, beginning with the start index 132 //*************************************************** 133134 for ($k = $startIndex+1; $k < sizeof($data)-1; $k++ ) 135 { 136 if ( strcmp($fileElements[$k][5], 137 $fileElements[$k+1][5]) != 0 ) { 138139 $hour = $fileElements[$k][1]; 140 $minute = $fileElements[$k][2]; 141 $second = $fileElements[$k][3]; 142 $DayofWeek = $fileElements[$k][5]; 143 $Month = $fileElements[$k][6]; 144 $DayofMonth = $fileElements[$k][7]; 145 $Year = $fileElements[$k][8]; 146 $visits = $fileElements[$k][9]; 147148149 $sql = “insert into sitevisits (visit_ID, hour, 150 minute, second, DayofWeek, Month, DayofMonth, 151 Year, visits) values (‘’, ‘$hour’, ‘$minute’, 152 ‘$second’, ‘$DayofWeek’, ‘$Month’, ‘$DayofMonth’, 153 ‘$Year’, ‘$visits’)”; 154155 if ( strcmp( $debug, “yes” ) == 0 ) 156 echo “ sql statement: “ . $sql . “<br>”; 157158 // 159 // Execute the SQL statement 160 // 161162 $result = mysql_query($sql); 163164 if ( strcmp( $debug, “yes” ) == 0 ) 165 echo “ result of insert: “ . $result . 
“<br>”; 166167 if ( strcmp( $debug, “yes” ) == 0 ) 168 echo “ result: “ . $result . “<br>”; 169170 if ( strcmp( $debug, “yes” ) == 0 ) { 171 for ($m = 1; $m < 10; $m++ ) 172 { 173 echo $fileElements[$k][$m] . “ “; 174 } 175 echo “<br>”; 176 } // end if 177 } 178 } 179180 //********************************** 181 // Insert the last row of data 182 //********************************** 183184 if ( strcmp( $debug, “yes” ) == 0 )

Listing 2

1 <?php 2 // siteindex.php 34 $debug = “no”; 56 //******************************** 7 // Read the hitcounter file 8 //******************************** 9

10 $fileName = “../hitcounter.dat”; 111213 //************************************** 14 // Open the db connection to sitestats 15 // and look at the last entry 16 //************************************** 1718 $conn = mysql_connect(“localhost”, “root”, “admin”); 19 if ( ! $conn ) 20 die(“Could not connect to MySQL” ); 2122 mysql_select_db(“sitestats”,$conn) 23 or die(“Could not open sitestats: “ . mysql_error()); 2425 //****************************************** 26 // Select the last visit entry in the table 27 //****************************************** 2829 $table = “sitevisits”; 30 $check = “select * from $table”; 31 $qry = mysql_query($check) 32 or die (“Could not match data: “ . mysql_error()); 33 $nRows = mysql_num_rows($qry); 34 $maxVisits = 0; 3536 while ($newArray = mysql_fetch_array($qry) ) { 37 $visits = $newArray[‘visits’]; 3839 if ( strcmp( $debug, “yes” ) == 0 ) 40 echo “ maxVisits = “ . $maxVisits 41 . “ value from db = “ . $visits . “<br>”; 4243 if ( $visits > $maxVisits ) $maxVisits = $visits; 44 } 4546 if ( strcmp( $debug, “yes” ) == 0 ) 47 echo “ max visits: “ . $maxVisits . “<br>”; 4849 mysql_close($conn); 5051 if ( strcmp( $maxVisits, “” ) == 0 ) $maxVisits = 0; 5253 if ( strcmp( $debug, “yes” ) == 0 ) 54 echo “ Maximum number of visits stored in database: “ 55 . $maxVisits . “<br>”; 56575859 //*************************************** 60 // Open the db connection to sitestats 61 // and prepare to insert data 62 //*************************************** 6364 $conn = mysql_connect(“localhost”, “root”, “admin”); 6566 if ( strcmp( $debug, “yes” ) == 0 ) 67 echo “ $conn = “ . $conn . “<br>”; 6869 if ( ! $conn ) die(“Could not connect to MySQL” ); 7071 mysql_select_db(“sitestats”,$conn) 72 or die(“Could not open sitestats: “ . 
mysql_error()); 7374 if ( strcmp( $debug, “yes” ) == 0 ) 75 echo “ selected table <br>”; 7677 //******************************************** 78 // Load data from hitcounter file into array 79 //******************************************** 8081 $data = file($fileName); 8283 //************************************** 84 // Extract each value and explode into 85 // a two-dimensional array 86 //************************************** 8788 foreach ($data as $column => $val ) 89 { 90 // Explore data into a new array 9192 if ( strcmp( $debug, “yes” ) == 0 )


(logically) always represented by the last data element within the database, it follows that we need to scale based on this last element.

Thus, I define a maximum width using the variable $graphWidthMax = 400 pixels. Now, I need to define the height of each bar (that is, its extent in the vertical sense), which I’ve arbitrarily set to $barHeight = 10 pixels, and the value that corresponds to the absolute maximum width of a bar, taken as the last entry in the data read from the sitevisits table: $barMax = $dbElements[$nRows-1][4];

I also need to define the number of rows to plot on a given web page. This is an important feature because the number that should be plotted is related to each bar’s width as well as the resolution of the screen and the ability of the user to see the data clearly without having to use the scroll bar. Scrollbars can become a nuisance, too, if the user is continually moving them to see all data. Hence, one requirement which I imposed was to keep all of the data within the eye span of the user. So, I opted for a relatively low count in terms of bars per page. Now, since I will only be plotting 10 bars per page, I need to come up with a mechanism for allowing the user to move to a new page and show the next 10 bars in the database. I therefore defined variables to keep track of the starting row and the ending row on any given page. These quantities are represented as follows:

    $numberRowsToPlot = 10;
    $startRow = 0;
    $endRow = $startRow + $numberRowsToPlot;

These equations will become important shortly. First, let’s plot the first 10 rows of data. We do this in a for loop, like this:

    for ( $i = $startRow; $i < $endRow; $i++ ) {
        $countVal = intval( $dbElements[$i][4] );
        $barWidth = $graphWidthMax * $countVal/$barMax;
        // ...
    }

I begin at $startRow and stop just before $endRow. For each index $i, I retrieve the counter value from the $dbElements array and assign it to the variable $countVal. I then scale $barWidth in proportion to the maximum graphing width (defined earlier as 400 pixels), normalized by the maximum number of hits. This gives me a proportional width with respect to the 400-pixel limit within the plotting frame (here, the web page itself).
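The scaling amounts to one line of arithmetic, sketched below as a small function. The parameter names mirror the variables in Listing 3; the function name `barWidth`, the guard against an empty table, and the rounding to whole pixels are additions assumed for this example.

```php
<?php
// Scale a visit count to a bar width in pixels. $graphWidthMax and $barMax
// mirror the variables in Listing 3; the zero guard and the rounding are
// additions made for this sketch.
function barWidth(int $count, int $barMax, int $graphWidthMax = 400): int
{
    if ($barMax <= 0) {
        return 0;                   // empty table: avoid dividing by zero
    }
    return (int) round($graphWidthMax * $count / $barMax);
}

echo barWidth(250, 1000) . "\n";    // 400 * 250 / 1000 = 100
echo barWidth(1000, 1000) . "\n";   // the largest count fills the full width: 400
```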

You’ll note from Figure 1 that data are printed alongside the bars, including the value of a particular bar width. This is done in a straightforward manner by simply encapsulating the printing of the data within a table, as columns within that table. This ensures uniform spacing and alignment of the data within the cells.

Without going into all of the details (because Listing 3 provides the explicit implementation), the key elements of this plotting process are as follows: create a table, enter the data values into columns via an echo statement, and concatenate multiple columns so that the data are aligned across the page:

    echo "<tr>";
    echo "<td align=right><font face=arial color=blue size=2>";
    echo $dbElements[$i][0] . ",</font></td>";

But how do we actually create the bar? Very easily: we have a JPG image of a single pixel, named reddot.jpg. Within the second-to-last column of the table we create an image reference to that JPG image and size it so that its width is equal to $barWidth and its


Listing 2 (cont’d)

185 echo “ startIndex = “ . $startIndex 186 . “ sizeof(data) = “ . sizeof($data) . “<br>”; 187188 if ( $startIndex+1 < sizeof($data) ) { 189 $hour = $fileElements[sizeof($data)-1][1]; 190 $minute = $fileElements[sizeof($data)-1][2]; 191 $second = $fileElements[sizeof($data)-1][3]; 192 $DayofWeek = $fileElements[sizeof($data)-1][5]; 193 $Month = $fileElements[sizeof($data)-1][6]; 194 $DayofMonth = $fileElements[sizeof($data)-1][7]; 195 $Year = $fileElements[sizeof($data)-1][8]; 196 $visits = $fileElements[sizeof($data)-1][9]; 197198 $sql = “insert into siteVisits (hour, minute, second, 199 DayofWeek, Month, DayofMonth, Year, visits) values 200 (‘$hour’, ‘$minute’, ‘$second’, ‘$DayofWeek’, 201 ‘$Month’, ‘$DayofMonth’, ‘$Year’, ‘$visits’)”; 202203 // 204 // Execute the SQL statement 205 // 206207 $result = mysql_query($sql); 208209 if ( strcmp( $debug, “yes” ) == 0 ) 210 echo “ result: “ . $result . “<br>”; 211212 if ( strcmp( $debug, “yes” ) == 0 ) { 213 for ($m = 1; $m < 10; $m++ ) 214 { 215 echo $fileElements[sizeof($data)-1][$m] . “ “; 216 } 217 echo “<br>”; 218 } // end if 219 } // end if 220221 //*********************** 222 // Close the connection 223 //*********************** 224225 mysql_close($conn); 226 header(“Location: queryDB.php”); 227228 ?>



Listing 3

1 <?php 2 // queryDB.php 34 include(“header.php”); 5 include(“logo.php”); 67 $debug = “no”; 8 $production = “no”; 9

10 //*************************************** 11 // Open the db connection to sitestats 12 // and look at the last entry 13 //*************************************** 1415 $conn = mysql_connect(“localhost”, “root”, “admin”); 1617 if ( ! $conn ) 18 die(“Could not connect to MySQL” ); 1920 mysql_select_db(“sitestats”,$conn) 21 or die(“Could not open sitestats: “ . mysql_error()); 222324 //***************************************************** 25 // Note: mysql_fetch_row($qry) retrieves a single row 26 // mysql_fetch_field($qry, $i) fetches field $i 27 //***************************************************** 2829 $table = “sitevisits”; 30 $check = “select * from $table”; 31 $qry = mysql_query($check) 32 or die (“Could not match data: “ . mysql_error()); 3334 if ( strcmp( $debug, “yes” ) == 0 ) 35 echo “ qry = “ . $qry . “<br>”; 3637 $nRows = mysql_num_rows($qry); 3839 if ( strcmp( $debug, “yes” ) == 0 ) echo “<table>”; 40 if ( strcmp( $debug, “yes” ) == 0 ) echo “<th>”; 41 if ( strcmp( $debug, “yes” ) == 0 ) echo “</th>”; 4243 $i = 0; 44 while ($newArray = mysql_fetch_array($qry) ) { 4546 $dow = $newArray[‘DayofWeek’]; 47 $mo = $newArray[‘Month’]; 48 $dom = $newArray[‘DayofMonth’]; 49 $yr = $newArray[‘Year’]; 50 $vis = $newArray[‘visits’]; 5152 $dbElements[$i][0] = $dow; 53 $dbElements[$i][1] = $mo; 54 $dbElements[$i][2] = $dom; 55 $dbElements[$i][3] = $yr; 56 $dbElements[$i][4] = $vis; 57 if ( strcmp( $debug, “yes” ) == 0 ) 58 echo “ ==> “ . $dbElements[$i][4] . “<br>”; 59 $i++; 6061 if ( strcmp( $debug, “yes” ) == 0 ) echo “<tr>”; 62 if ( strcmp( $debug, “yes” ) == 0 ) 63 echo “<td><font face=arial color=blue size=3>” 64 . $dow . “</font></td>”; 65 if ( strcmp( $debug, “yes” ) == 0 ) 66 echo “<td><font face=arial color=blue size=3>” 67 . $mo . “</font></td>”; 68 if ( strcmp( $debug, “yes” ) == 0 ) 69 echo “<td><font face=arial color=blue size=3>” 70 . $dom . “</font></td>”; 71 if ( strcmp( $debug, “yes” ) == 0 ) 72 echo “<td><font face=arial color=blue size=3>” 73 . 
$yr . “</font></td>”; 74 if ( strcmp( $debug, “yes” ) == 0 ) 75 echo “<td><font face=arial color=blue size=3>” 76 . $vis . “</font></td>”; 77 if ( strcmp( $debug, “yes” ) == 0 ) 78 echo “</tr>”; 79 } // end while 8081 if ( strcmp( $debug, “yes” ) == 0 ) echo “</table>”; 8283 mysql_close($conn); 8485 //************************************************* 86 // Sort by visits, ascending, using insertion sort 87 //************************************************* 8889 for ( $i = 1; $i < $nRows; $i++ ) 90 { 91 $index0 = $dbElements[$i][0]; 92 $index1 = $dbElements[$i][1];

Listing 3 (cont’d)

93 $index2 = $dbElements[$i][2]; 94 $index3 = $dbElements[$i][3]; 95 $index4 = $dbElements[$i][4]; 9697 $j = $i; 98 while ( ($j > 0) && ($dbElements[$j-1][4] > $index4) ) 99 {

100 $dbElements[$j][4] = $dbElements[$j-1][4]; 101 $dbElements[$j][3] = $dbElements[$j-1][3]; 102 $dbElements[$j][2] = $dbElements[$j-1][2]; 103 $dbElements[$j][1] = $dbElements[$j-1][1]; 104 $dbElements[$j][0] = $dbElements[$j-1][0]; 105106 $j = $j - 1; 107 } 108109 $dbElements[$j][0] = $index0; 110 $dbElements[$j][1] = $index1; 111 $dbElements[$j][2] = $index2; 112 $dbElements[$j][3] = $index3; 113 $dbElements[$j][4] = $index4; 114 } 115116 //************************************ 117 // Print out new table and plot graph 118 //************************************ 119120 $graphWidthMax = 400; 121 $barHeight = 10; // pixels 122 $barMax = $dbElements[$nRows-1][4]; 123124 $numberRowsToPlot = 10; 125 $startRow = 0; 126 $endRow = $startRow + $numberRowsToPlot; 127128 if ( strcmp( $debug, “yes” ) == 0 ) 129 echo “ Max = “ . $barMax . “<br>”; 130131 echo “<table>”; 132 echo “<th>”; 133 echo “</th>”; 134135 for ( $i = $startRow; $i < $endRow; $i++ ) 136 { 137 $countVal = intval( $dbElements[$i][4] ); 138 $barWidth = $graphWidthMax * $countVal/$barMax; 139140 echo “<tr>”; 141 echo “<td align=right><font face=arial color=blue “ 142 . “size=2>” . $dbElements[$i][0] . “,</font></td>”; 143 echo “<td align=right><font face=arial color=blue “ 144 . “size=2>” . $dbElements[$i][1] . “</font></td>”; 145 echo “<td align=right><font face=arial color=blue “ 146 . “size=2>” . $dbElements[$i][2] . “</font></td>”; 147 echo “<td align=right><font face=arial color=blue “ 148 . “size=2>” . $dbElements[$i][3] . “</font></td>”; 149 print(“<td>\n”); 150 echo “<font face=arial color=purple size=2>”; 151 echo “<b>”; 152 print(“<img src=\”reddot.jpg\” “); 153 print(“width=\”$barWidth\” height=\”$barHeight\”>”); 154 echo “ “ . 
$dbElements[$i][4]; 155 echo “</b>”; 156 echo “</font>”; 157 print(“</td>\n”); 158159 echo “</tr>”; 160 } 161162 echo “</table>”; 163 ?> 164 <table> 165 <tr> 166 <td> 167 <font Style=”font-family:arial; font-size:12pt; 168 font-style: bold; color: #000000;”> 169 Entries: <?php echo $startRow; ?> to 170 <?php echo $endRow; ?> with 171 <?php echo $barMax; ?> total rows 172 </font> 173 </td> 174 <td> 175 <form method=”post” action=”queryDB1.php”> 176 <input type=”hidden” name=”startRow” 177 value=”<?php echo $startRow; ?>” > 178 <input type=”hidden” name=”numberRowsToPlot” 179 value=”<?php echo $numberRowsToPlot; ?>” > 180 <input type=”hidden” name=”discrim” value=”add” > 181 <input type=”hidden” name=”delta” value=”10” > 182 <input type=”submit” value=”>” 183 Style=”font-family:sans-serif; font-size:10pt; 184 font-style:bold; background:#4400ff none;


height is equal to $barHeight, as shown below:

    print("<td>\n");
    // ...
    print("<img src=\"reddot.jpg\" ");
    print("width=\"$barWidth\" height=\"$barHeight\">");
    echo " " . $dbElements[$i][4];
    // ...
    print("</td>\n");
    echo "</tr>";

At the end of each bar, I print the actual value of the bar, accomplished by outputting the value of $dbElements[$i][4].
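The row-building steps can be gathered into one function, sketched below under a few assumptions. The name `renderBarRow` is hypothetical, the font tags and trailing comma of the listing are left out for brevity, and the call to htmlspecialchars() is an addition; the listings echo the values directly.

```php
<?php
// Build one table row: four text cells followed by the bar and its value.
// $row holds day of week, month, day of month, year and visits, in the
// order used by $dbElements. htmlspecialchars() is an addition here.
function renderBarRow(array $row, int $barWidth, int $barHeight = 10): string
{
    $html = '<tr>';
    foreach (array_slice($row, 0, 4) as $cell) {
        $html .= '<td align="right">' . htmlspecialchars((string) $cell) . '</td>';
    }
    $html .= '<td><img src="reddot.jpg" width="' . $barWidth
           . '" height="' . $barHeight . '" alt=""> '
           . htmlspecialchars((string) $row[4]) . '</td>';
    return $html . '</tr>';
}

echo renderBarRow(['Mon', 'May', '02', '2005', '1234'], 100) . "\n";
```

Returning a string rather than echoing keeps the markup easy to inspect; the caller decides when to send it to the browser.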

Getting the Next 10 Rows

At the bottom of Listing 3, there are two forms. I will focus on the first form for the time being. This form accepts the current values of $startRow and $numberRowsToPlot and passes these, as hidden values, to the PHP code in Listing 4 (queryDB1.php). This is shown in the code segment below:

    <form method="post" action="queryDB1.php">
      <input type="hidden" name="startRow"
             value="<?php echo $startRow; ?>" >
      <input type="hidden" name="numberRowsToPlot"
             value="<?php echo $numberRowsToPlot; ?>" >
      <input type="hidden" name="discrim" value="add" >
      <input type="hidden" name="delta" value="10" >
      <input type="submit" value=">"
             Style="font-family:sans-serif; font-size:10pt;
                    font-style:bold; background:#4400ff none;
                    color: #ccbbcc; height: 2em; width: 2em">
    </form>

Key within this form code are the variables named $discrim and $delta, which are passed as hidden variables from queryDB.php to queryDB1.php. The ASCII text string “add” is assigned to the discrim field. As you’ll see in a moment, this is the key to how the queryDB1.php code decides which results to display. The values are posted through the form and retrieved within queryDB1.php using the following code:

    $startRow = $_POST['startRow'];
    $numberRowsToPlot = $_POST['numberRowsToPlot'];
    $discrim = $_POST['discrim'];
    $delta = $_POST['delta'];

Again, I open the database and retrieve the data, translate it to the $dbElements array, and then apply the $discrim parameter to the data.

    if ( strcmp($discrim, "add") == 0 ) { // Going up
        $startRow = $startRow + $delta;
        $endRow = $startRow + $delta;
        if ( $endRow > $barMax ) {
            $endRow = $barMax;
        }
    }

If we click the right-hand arrow in Figure 1 (that is, the “increase” button) then we expect that we will be presented the next 10 rows of data. This is accomplished within queryDB1.php by adding the value $delta to the current $startRow and assigning the new $endRow equal to the updated $startRow plus $delta. We must be careful if we are at the last few elements of data, because by attempting to add $delta rows to the current $startRow we may, in effect, run off the end of the data table. To accommodate this event, I perform a check on the value of $endRow in relation to $barMax. If $endRow is greater than $barMax, then simply assign $endRow to $barMax. The application of this logic results in the screen snapshot shown in Figure 2, in which the next 10 rows appear.
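A sketch of the "next page" step follows. Note that in Listing 4 $barMax is read from the visits column of the last sorted row, so it holds a visit count; the version below clamps against the number of rows fetched ($nRows in the listings) instead, which is an assumption about the intended behavior. The function name `nextWindow` and the choice to stay on the last page are likewise illustrative.

```php
<?php
// Advance the display window by $delta rows and clamp it to the data.
// $rowCount is the number of rows fetched from the table ($nRows in the
// listings); using it as the upper bound is an assumption of this sketch.
function nextWindow(int $startRow, int $delta, int $rowCount): array
{
    $newStart = $startRow + $delta;
    if ($newStart >= $rowCount) {
        $newStart = $startRow;      // already on the last page: stay put
    }
    $endRow = min($newStart + $delta, $rowCount);
    return [$newStart, $endRow];
}

[$start, $end] = nextWindow(0, 10, 25);
echo "$start to $end\n";            // 10 to 20
[$start, $end] = nextWindow(20, 10, 25);
echo "$start to $end\n";            // 20 to 25
```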

In the interest of completeness, it must be noted that code Listings 5, 6, and 7 are those for header.php, logo.php, and footer.php, respectively. These are small files that contain web page header, title, and page closing HTML tags that are included in the main PHP documents.

Getting the Previous 10 Rows

This process continues: located at the bottom of queryDB1.php are three forms. The second form is the


Listing 3 (cont’d)

185 color: #ccbbcc; height: 2em; width: 2em”> 186 </form> 187 </td> 188 <td> 189 <font Style=”font-family:arial; font-size:12pt; font-style: bold; color: #000000;”> 190 Go to Entry: 191 </font> 192 </td> 193 <td> 194 <form method=”post” action=”queryDB1.php”> 195 <input name=”startRow” type=”text” > 196 <input type=”hidden” name=”numberRowsToPlot” 197 value=”<?php echo $numberRowsToPlot; ?>” > 198 <input type=”hidden” name=”discrim” value=”val” > 199 <input type=”hidden” name=”delta” value=”10” > 200 <input type=”submit” value=”>|<” 201 Style=”font-family:sans-serif; font-size:8pt; 202 font-style:bold; background:#4400ff none; 203 color: #ccbbcc; height: 3em; width: 3em”> 204 </form> 205 </td> 206 </tr> 207 </table> 208209 <?php 210 include(“footer.php”); 211 ?>

Figure 2


Listing 4

1 <?php 2 // queryDB1.php 3 include(“header.php”); 4 include(“logo.php”); 56 $startRow = $_POST[‘startRow’]; 7 $numberRowsToPlot = $_POST[‘numberRowsToPlot’]; 8 $discrim = $_POST[‘discrim’]; 9 $delta = $_POST[‘delta’];

1011 $debug = “no”; 1213 //*************************************** 14 // Open the db connection to sitestats 15 // and look at the last entry 16 //*************************************** 1718 $conn = mysql_connect(“localhost”, “root”, “admin”); 1920 if ( ! $conn ) 21 die(“Could not connect to MySQL” ); 2223 mysql_select_db(“sitestats”,$conn) 24 or die(“Could not open sitestats: “ . mysql_error()); 252627 //***************************************************** 28 // Note: mysql_fetch_row($qry) retrieves a single row 29 // mysql_fetch_field($qry, $i) fetches field $i 30 //***************************************************** 3132 $table = “sitevisits”; 3334 $check = “select * from $table”; 3536 $qry = mysql_query($check) 37 or die (“Could not match data because “ . mysql_error()); 3839 if ( strcmp( $debug, “yes” ) == 0 ) 40 echo “ qry = “ . $qry . “<br>”; 4142 $nRows = mysql_num_rows($qry); 4344 if ( strcmp( $debug, “yes” ) == 0 ) echo “<table>”; 45 if ( strcmp( $debug, “yes” ) == 0 ) echo “<th>”; 46 if ( strcmp( $debug, “yes” ) == 0 ) echo “</th>”; 4748 $i = 0; 49 while ($newArray = mysql_fetch_array($qry) ) { 50 $dow = $newArray[‘DayofWeek’]; 51 $mo = $newArray[‘Month’]; 52 $dom = $newArray[‘DayofMonth’]; 53 $yr = $newArray[‘Year’]; 54 $vis = $newArray[‘visits’]; 5556 $dbElements[$i][0] = $dow; 57 $dbElements[$i][1] = $mo; 58 $dbElements[$i][2] = $dom; 59 $dbElements[$i][3] = $yr; 60 $dbElements[$i][4] = $vis; 61 if ( strcmp( $debug, “yes” ) == 0 ) 62 echo “ ==> “ . $dbElements[$i][4] . “<br>”; 63 $i++; 6465 if ( strcmp( $debug, “yes” ) == 0 ) echo “<tr>”; 66 if ( strcmp( $debug, “yes” ) == 0 ) 67 echo “<td><font face=arial color=blue size=3>” 68 . $dow . “</font></td>”; 69 if ( strcmp( $debug, “yes” ) == 0 ) 70 echo “<td><font face=arial color=blue size=3>” 71 . $mo . “</font></td>”; 72 if ( strcmp( $debug, “yes” ) == 0 ) 73 echo “<td><font face=arial color=blue size=3>” 74 . $dom . 
“</font></td>”; 75 if ( strcmp( $debug, “yes” ) == 0 ) 76 echo “<td><font face=arial color=blue size=3>” 77 . $yr . “</font></td>”; 78 if ( strcmp( $debug, “yes” ) == 0 ) 79 echo “<td><font face=arial color=blue size=3>” 80 . $vis . “</font></td>”; 81 if ( strcmp( $debug, “yes” ) == 0 ) 82 echo “</tr>”; 83 } // end while 8485 if ( strcmp( $debug, “yes” ) == 0 ) echo “</table>”; 8687 mysql_close($conn); 8889 //************************************************* 90 // Sort by visits, ascending, using insertion sort 91 //************************************************* 9293 for ( $i = 1; $i < $nRows; $i++ ) 94 { 95 $index0 = $dbElements[$i][0]; 96 $index1 = $dbElements[$i][1];

Listing 4 (cont’d)

97 $index2 = $dbElements[$i][2]; 98 $index3 = $dbElements[$i][3]; 99 $index4 = $dbElements[$i][4];

100101 $j = $i; 102 while ( ($j > 0) && ($dbElements[$j-1][4] > $index4) ) 103 { 104 $dbElements[$j][4] = $dbElements[$j-1][4]; 105 $dbElements[$j][3] = $dbElements[$j-1][3]; 106 $dbElements[$j][2] = $dbElements[$j-1][2]; 107 $dbElements[$j][1] = $dbElements[$j-1][1]; 108 $dbElements[$j][0] = $dbElements[$j-1][0]; 109110111 $j = $j - 1; 112 } 113114 $dbElements[$j][0] = $index0; 115 $dbElements[$j][1] = $index1; 116 $dbElements[$j][2] = $index2; 117 $dbElements[$j][3] = $index3; 118 $dbElements[$j][4] = $index4; 119120 } 121122 //************************************ 123 // Print out new table and plot graph 124 //************************************ 125126 $graphWidthMax = 400; 127 $barHeight = 10; // pixels 128 $barMax = $dbElements[$nRows-1][4]; 129130 // 131 // Define the field range to show on the page. 132 // 133134 if ( strcmp($discrim,”val”) == 0 ) { 135 // Go to specific range 136 $endRow = $startRow + $delta; 137 if ( $endRow > $barMax ) $endRow = $barMax; 138 } 139140 // 141 // Adding $delta 142 // 143144 if ( strcmp($discrim,”add”) == 0 ) { // Going up 145146 $startRow = $startRow + $delta; 147 $endRow = $startRow + $delta; 148149 if ( $endRow > $barMax ) { 150 $endRow = $barMax; 151 } 152 } 153154 // 155 // Subtracting $delta 156 // 157158 if ( strcmp($discrim, “subtract”) == 0 ) { // Going down 159160 $startRow = $startRow - $delta; 161 $endRow = $startRow + $delta; 162163 if ( $startRow <= 0 ) { 164 $startRow = 0; 165 $endRow = $startRow + $delta; 166 } 167 } 168169 if ( strcmp( $debug, “yes” ) == 0 ) 170 echo “ Max = “ . $barMax . “<br>”; 171172 echo “<table>”; 173 echo “<th>”; 174 echo “</th>”; 175176 if ( strcmp( $debug, “yes” ) == 0 ) 177 echo “ startRow = “ . $startRow . “<br>”; 178 if ( strcmp( $debug, “yes” ) == 0 ) 179 echo “ endRow = “ . $endRow . “<br>”; 180 if ( strcmp( $debug, “yes” ) == 0 ) 181 echo “ delta = “ . $delta . “<br>”; 182 if ( strcmp( $debug, “yes” ) == 0 ) 183 echo “ discrim = “ . $discrim . 
“<br>”; 184185 for ( $i = $startRow; $i < $endRow; $i++ ) 186 { 187 $countVal = intval( $dbElements[$i][4] ); 188189 if ( $countVal != “” ) { 190 $barWidth = $graphWidthMax * $countVal/$barMax; 191192 echo “<tr>”;


same as shown for queryDB.php, in which the variable $delta is added to the current $startRow and $endRow. The first form accommodates the left-hand arrow, and assigns the string “subtract” to the $discrim variable. The form then posts back to queryDB1.php itself. If the user opts to back up ten rows, then there is a “subtract” branch that does the following:

    if ( strcmp($discrim, "subtract") == 0 ) { // Going down
        $startRow = $startRow - $delta;
        $endRow = $startRow + $delta;
        if ( $startRow <= 0 ) {
            $startRow = 0;
            $endRow = $startRow + $delta;
        }
    }

In this instance, $startRow is decremented by the amount in $delta, and $endRow is still set $delta rows above $startRow. Then, we must accommodate the possibility of decrementing below the first row. The conditional statement handles this event by checking whether the current value of $startRow is less than or equal to zero. If so, it assigns zero to the $startRow variable and sets $endRow to zero plus $delta.

Starting at an Arbitrary Row

The third and last form contained in queryDB1.php accommodates the condition in which a user wishes to go to an arbitrary row within the table. This behavior is preferred when, for example, much data exists within the database and the user would like to jump nearly to the end.

In this case, the value for $startRow is assigned directly by the user, through the form, and queryDB1.php is called again. The value of $discrim picks up the string value “val” from the form’s hidden field, and queryDB1.php uses this to assign the row range:

    <form method="post" action="queryDB1.php">
      <input name="startRow" type="text" >
      <input type="hidden" name="numberRowsToPlot"
             value="<?php echo $numberRowsToPlot; ?>" >
      <input type="hidden" name="discrim" value="val" >
      <input type="hidden" name="delta" value="10" >
      <input type="submit" value=">|<"
             Style="font-family:sans-serif; font-size:8pt;
                    font-style:bold; background:#4400ff none;
                    color: #ccbbcc; height: 3em; width: 3em">
    </form>

The $startRow variable becomes the point at which values will start to be displayed, and is entered by the user through the form above. Again, the form posts back to queryDB1.php, and the $discrim value is set to the string “val”. The code segment that catches this value follows:

    if ( strcmp($discrim, "val") == 0 ) { // Go to specific range
        $endRow = $startRow + $delta;
        if ( $endRow > $barMax ) $endRow = $barMax;
    }


Listing 4 (cont’d)

        echo "<td align=right><font face=arial color=blue "
            . "size=2>" . $dbElements[$i][0] . ",</font></td>";
        echo "<td align=right><font face=arial color=blue "
            . "size=2>" . $dbElements[$i][1] . "</font></td>";
        echo "<td align=right><font face=arial color=blue "
            . "size=2>" . $dbElements[$i][2] . "</font></td>";
        echo "<td align=right><font face=arial color=blue "
            . "size=2>" . $dbElements[$i][3] . "</font></td>";
        print("<td>\n");
        echo "<font face=arial color=purple size=2>";
        echo "<b>";
        print("<img src=\"reddot.jpg\" ");
        print("width=\"$barWidth\" height=\"$barHeight\">");
        echo " " . $dbElements[$i][4];
        echo "</b>";
        echo "</font>";
        print("</td>\n");
        echo "</tr>";
    }
}

echo "</table>";

?>
<table>
<tr>
<td>
<font Style="font-family:arial; font-size:12pt;
    font-style: bold; color: #000000;">
Entries: <?php echo $startRow; ?> to
<?php echo $endRow; ?> with
<?php echo $barMax; ?> total rows
</font>
</td>

<?php
if ( $startRow > 0 ) {
?>
<td>
<form method="post" action="queryDB1.php">
<input type="hidden" name="startRow"
    value="<?php echo $startRow; ?>" >
<input type="hidden" name="numberRowsToPlot"
    value="<?php echo $numberRowsToPlot; ?>" >
<input type="hidden" name="discrim"
    value="subtract" >
<input type="hidden" name="delta" value="10">
<input type="submit" value="<"
    Style="font-family:sans-serif; font-size:10pt;
    font-style:bold; background:#4400ff none;
    color: #ccbbcc; height: 2em; width: 2em">
</form>
</td>
<?php
}
?>

<td>
<form method="post" action="queryDB1.php">
<input type="hidden" name="startRow"
    value="<?php echo $startRow; ?>" >
<input type="hidden" name="numberRowsToPlot"
    value="<?php echo $numberRowsToPlot; ?>" >
<input type="hidden" name="discrim" value="add" >
<input type="hidden" name="delta" value="10" >
<input type="submit" value=">"
    Style="font-family:sans-serif; font-size:10pt;
    font-style:bold; background:#4400ff none;
    color: #ccbbcc; height: 2em; width: 2em">
</form>
</td>
<td>
<font Style="font-family:arial; font-size:12pt;
    font-style: bold; color: #000000;">
Go to Entry:
</font>
</td>
<td>
<form method="post" action="queryDB1.php">
<input name="startRow" type="text" >
<input type="hidden" name="numberRowsToPlot"
    value="<?php echo $numberRowsToPlot; ?>" >
<input type="hidden" name="discrim" value="val" >
<input type="hidden" name="delta" value="10" >
<input type="submit" value=">|<"
    Style="font-family:sans-serif; font-size:8pt;
    font-style:bold; background:#4400ff none;
    color: #ccbbcc; height: 3em; width: 3em">
</form>
</td>
</tr>
</table>

<?php
include("footer.php");
?>


The $endRow variable is set to $startRow plus $delta. If $endRow exceeds the number of rows in the database, it is automatically set to the maximum database row. In this way a user can access any starting row and hop over intermediate values as needed. The data are passed recursively back to queryDB1.php using the following variables, which are retrieved from the form post code:

$startRow = $_POST['startRow'];
$numberRowsToPlot = $_POST['numberRowsToPlot'];
$discrim = $_POST['discrim'];
$delta = $_POST['delta'];

The values are set based on the user’s selection during the previous call to queryDB1.php. It is possible to augment these statements by incorporating some error checking into the code to verify that the values have been set within the proper ranges. This is merely one suggestion offered to improve the robustness of the methodology.
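The error checking suggested above could look something like the following. This is only a minimal sketch: the helper names (intOrDefault, readPagingParams), the default values, and the clamping rules are assumptions made for illustration and are not part of the article's listings.

```php
<?php
// Hypothetical helpers, not part of the article's listings: the names,
// the defaults, and the clamping rules are assumptions for illustration.

// Return $value as an integer, or $default when it is missing or not an integer.
function intOrDefault($value, int $default): int
{
    $result = filter_var($value, FILTER_VALIDATE_INT);
    return $result === false ? $default : $result;
}

// Read the four posted values and force each one into a sane range.
function readPagingParams(array $post, int $maxRow): array
{
    $startRow         = intOrDefault($post['startRow'] ?? null, 0);
    $numberRowsToPlot = intOrDefault($post['numberRowsToPlot'] ?? null, 10);
    $delta            = intOrDefault($post['delta'] ?? null, 10);

    // Accept only the three modes that the forms actually send.
    $discrim = $post['discrim'] ?? 'val';
    if (!in_array($discrim, ['add', 'subtract', 'val'], true)) {
        $discrim = 'val';
    }

    return [
        'startRow'         => max(0, min($startRow, $maxRow)),
        'numberRowsToPlot' => max(1, $numberRowsToPlot),
        'delta'            => max(1, $delta),
        'discrim'          => $discrim,
    ];
}
```

A call such as readPagingParams($_POST, $barMax) would then replace the four direct assignments, assuming $barMax holds the total number of rows as it does in the listing's "total rows" output.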

Operation and Database Table Structure

For those interested in using this methodology on their own sites, all files are provided for download in the code archive. Figure 3 shows the structure of the sitestats database and the sitevisits table; it contains a screenshot taken from phpMyAdmin, a useful tool for managing MySQL databases. A user wishing to recreate this site counter tool will need to install MySQL on the server and create the database instance and table required to run the code.

Summary

I have tried to provide some insight into how to develop a simple and useful bar-chart based hit counter using PHP and MySQL. The code I have provided is the same as that which I am using on client sites to keep track of access statistics. A user having ordinary skill in the art of PHP and MySQL can take this idea much further and include many different types of statistics.

The methodology I provide has educational value as well, by illustrating a simple manner of implementing PHP database connectivity, a capability that is necessary for any type of advanced commercial application. Some additional ideas include adding site statistics on time of day, user identity, and server identity. It is even possible to accommodate statistics for each web page associated with a site, thereby providing details on the popularity of various pages and on whether the site is able to hold the interest of individuals so that they visit other features available at your site.

There is no limit to what you can do.


About the Author

John R. Zaleski, Ph.D., is a biomedical systems engineer with 20 years of experience in software development and medical device integration as applied to acute care hospital environments. He has developed and fielded medical products that are currently in use in large acute care hospitals. He has developed products and many applications in Java, PHP, and MySQL and has authored two dozen patent applications and an equal number of refereed publications in the areas of medical device integration, software methods for medical device communication, software performance, and real-time clinical analysis of patient data.

To discuss this article: http://forums.phparch.com/218

Listing 5

<?php
// header.php
echo "<html>";
echo "<head>";
echo "<title>Site Counter Tool</title>";
echo "</head>";
echo "<body bgcolor='#fffffb'>";
?>

Listing 6

<?php
// logo.php
echo "<table>";
echo "<tr align=center>";
echo "<td>";
echo "<h1>Site Counter Tool</h1>";
echo "</td>";
echo "</tr>";
echo "</table>";
?>

Listing 7

<?php
// footer.php
echo "</body>";
echo "</html>";
?>

Figure 3


Solving the Unicode Puzzle
by Michael Toppa

Many web sites cannot correctly interpret or display anything other than English language characters. Converting your site to UTF-8 (Unicode) enables you to handle characters from almost any language in the world. However, currently available conversion guidelines typically focus on just a single software product, offering little guidance on how to move UTF-8 encoded data between different products. Configuring your web server, PHP, and your database to support UTF-8 is one thing; configuring them so UTF-8 encoded data moves smoothly between them is another. This article guides you through a UTF-8 conversion using PHP, Oracle, and Apache. It also covers data exports to PDF, RTF, email, and plain text.

REQUIREMENTS
PHP: 4.3.10 or higher
OS: Any
Other Software: Oracle 9, Apache, PDFLib
Code Directory: n/a

REFERENCES
UNICODE: http://www.unicode.org/
UNICODE: http://www.alanwood.net/unicode/
ORACLE: http://www.oracle.com/technology/tech/opensource/php/globalizing_oracle_php_applications.html
PHP: http://us3.php.net/manual/en/ref.mbstring.php

Unicode is a single character set designed to include characters from just about every writing system on the planet (and off the planet: even Klingon has been written for Unicode, although it is not part of the official standard). In recent years, Unicode has become more prevalent on the web, and all major web browsers, web servers, programming languages, and databases worth their salt now support it. Switching your web applications to Unicode will give you the ability to correctly handle and display any character from any language you’re likely to encounter.

Understanding the significance of Unicode requires first understanding some basics of character sets, and their history. The first thing you need to know was said best by Joel Spolsky of Joel On Software: “There ain’t no such thing as plain text.” If you don’t know the character set and the encoding that were used in the creation of a string of text, then you won’t know how to display it properly. For modern purposes, the story of character sets starts with ASCII. In the 1960s, unaccented English characters, as well as various control characters for carriage returns, form feeds, etc., were each assigned a number from 0 to 127; there was general agreement on these number assignments, and so ASCII was born. The ASCII characters could fit in 7 bits, and computers used 8-bit bytes, which left an extra bit of space. This led to the proliferation of hundreds of different character sets, with each one using this extra space in a different way. The characters from 0-127 are often referred to as Lower ASCII, and the characters from 128-255 as Upper ASCII or Extended ASCII. Extended ASCII character sets added characters from non-English languages, special characters like copyright symbols, and line-drawing characters to simplify drawing boxes, etc. With all these different versions of extended ASCII floating around, text generated on, say, a computer in Russia would turn into gibberish if you tried to read it on a computer in the US. This happened because the number codes representing the Cyrillic characters were assigned to totally different characters on the US computer. This became a bit of a problem when everyone started using the internet.
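To make the "gibberish" problem concrete, here is a small sketch that takes one byte above 127 and interprets it under two different extended character sets. The choice of byte and of the two character sets (ISO-8859-1 and Windows-1251) is mine, purely for illustration; the example assumes PHP's mbstring extension is available.

```php
<?php
// One byte above 127. Which character it stands for depends entirely on
// the character set the reader assumes.
$byte = "\xE4";

// mb_convert_encoding() comes from PHP's mbstring extension.
// Interpreted as Latin-1, the byte is "a with diaeresis" (U+00E4).
$asLatin1 = mb_convert_encoding($byte, 'UTF-8', 'ISO-8859-1');

// Interpreted as Windows-1251, the same byte is Cyrillic small "de" (U+0434).
$asCyrillic = mb_convert_encoding($byte, 'UTF-8', 'Windows-1251');

// Show the resulting UTF-8 bytes in hex so the difference is visible
// regardless of the terminal's own encoding.
echo bin2hex($asLatin1), "\n";   // c3a4
echo bin2hex($asCyrillic), "\n"; // d0b4
```

The input byte never changes; only the assumed character set does, which is exactly why text needs to travel with a label saying how it was encoded.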

Unicode represents an effort to clean up this mess. The Unicode slogan is: “Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language.” Unicode can do this because it allows characters to occupy more than one byte, so it has enough room to store characters from languages around the world, even Asian languages that have thousands of characters. With Unicode, it’s particularly important to understand the distinction between a character set and a character encoding. Unicode is a single character set, but there are three different ways to encode it: they are called UTF-8, UTF-16, and UTF-32 (there’s also UTF-7, but it was never officially adopted by the Unicode Consortium, and for the most part it’s been deprecated in favor of UTF-8). The numbers 8, 16, and 32 indicate the bits used for the Unicode code units (a complete character may occupy more than one code unit; it can be multi-byte). All three encodings can represent any Unicode character, and each has its own advantages and disadvantages depending on what’s important in a particular implementation. In the case of web applications, UTF-8 is the encoding of choice because it stores the lower ASCII characters in a single byte format. This makes UTF-8 fully compatible with “plain text,” even if you’re clueless about character encoding.
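The difference between the three encodings is easy to see by measuring how many bytes each one needs for the same character. The sketch below assumes the mbstring extension; the helper name encodedLengths and the sample characters are my own choices for illustration.

```php
<?php
// Hypothetical helper: byte length of one UTF-8 encoded character when it
// is stored in each of the three Unicode encodings.
function encodedLengths(string $utf8Char): array
{
    return [
        'UTF-8'  => strlen($utf8Char),
        // The "BE" variants are used so no byte order mark is added.
        'UTF-16' => strlen(mb_convert_encoding($utf8Char, 'UTF-16BE', 'UTF-8')),
        'UTF-32' => strlen(mb_convert_encoding($utf8Char, 'UTF-32BE', 'UTF-8')),
    ];
}

// Sample characters, written as UTF-8 byte escapes so the example does not
// depend on how this file itself is saved.
$samples = [
    'Latin capital A (U+0041)'   => 'A',
    'e with acute (U+00E9)'      => "\xC3\xA9",
    'euro sign (U+20AC)'         => "\xE2\x82\xAC",
    'musical G clef (U+1D11E)'   => "\xF0\x9D\x84\x9E",
];

foreach ($samples as $label => $char) {
    $len = encodedLengths($char);
    printf("%-26s UTF-8: %d  UTF-16: %d  UTF-32: %d\n",
        $label, $len['UTF-8'], $len['UTF-16'], $len['UTF-32']);
}
```

Plain ASCII costs one byte in UTF-8 but two or four in the other encodings, while the last sample shows a character that needs two 16-bit code units in UTF-16.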

For the sake of brevity, I’ve glossed over a great number of points related to Unicode and character sets. If you want to learn more, I highly recommend the article The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky, at www.joelonsoftware.com/articles/Unicode.html. It contains links to a number of other good resources as well.

Why Care About Unicode?

As far as Unicode and UTF-8 are concerned, all web sites can be placed in one of three categories: those that don’t need to care about them, those that should convert to UTF-8, and those that should convert to UTF-8 and internationalize.

The most common character set currently in use on the English-speaking side of the web, other than UTF-8, is Western ISO-8859-1 (aka Latin-1). If your site isn’t already using UTF-8, then you’re probably using Latin-1. If you’ve had no problems related to character sets so far, and you have absolutely no foreseeable needs to handle text outside the ASCII range, then you fall into the first category: you probably don’t need to do anything. As you’ll see in the rest of this article, converting to UTF-8 is not a painless process, so you should only undertake the work if you have some clearly identifiable, relevant goals to meet.

Here at the University of Pennsylvania School of Medicine, we fall into the second category: our web sites are in English, but we occasionally handle data from a variety of foreign languages that don’t use the English alphabet. We must receive, store, display, and transmit these characters faithfully. Since we can’t reliably predict what sort of characters might come our way, converting our applications to UTF-8 was the logical choice, since it can handle any language we might need to support.

The third category is for sites that don’t just occasionally handle foreign characters; they actually serve an international audience. In addition to using UTF-8, these sites typically employ various mechanisms that allow visitors to choose the language for displaying content. One important term applied here is internationalization, defined by the W3C as “[t]he process of designing, creating, and maintaining software that can serve the needs of users with differing language, cultural, or geographic requirements and expectations” (see http://www.w3.org/TR/ws-i18n-scenarios/). Another key term is localization: “[t]he tailoring of a system to the individual cultural expectations for a specific target market or group of individuals.” Sites that are able to dynamically perform localization for a variety of target audiences can do so because they’ve been configured with a good internationalization framework.

Internationalization and localization are substantial topics, and are not the focus of this article. However, getting all the various components of your web application environment to play nicely together using UTF-8 is a necessary step before you can even try internationalizing your site. So this article will be of interest to those who only want to handle the occasional non-English characters, and to those who are contemplating fully internationalizing their site.

Getting Ready for UTF-8

The first step is determining the scope of your work. At a minimum, you probably have PHP, a web server, and a database to consider. I’ll cover doing a UTF-8 conversion with PHP, Apache, and Oracle. If you are also using Oracle, then you must read An Overview on Globalizing Oracle PHP Applications at http://www.oracle.com/technology/tech/opensource/php/globalizing_oracle_php_applications.html. It’s an excellent starting point, but, unfortunately, it doesn’t always explain the reasons behind its recommendations, which means you’ll get stuck if things don’t happen to work after you follow its instructions. I’ll try to fill those gaps.

You also have to take a look at any other applications that interact with PHP, your web server, or your database, as they will also be affected by a character set conversion. For us, that included Smarty, PDFlib, and exporting data to RTF, text files, and email, so I’ll discuss those as well. Even if you have a different mix of applications, the concepts I’ll describe are probably applicable to your situation, although the implementation specifics, obviously, will be different.

Configuring Apache, PHP, and Oracle

Most of the time, PHP web applications are run under the Apache web server, which itself is running in a user account (assuming you’re in a Unix-ish environment). So, the first step is to set the environment of this account correctly. Since PHP and Oracle are speaking to each other through this account, it’s crucial to specify the right character set for it, so they both know what to expect. You do this by setting the NLS_LANG environment variable in the Apache configuration. The Oracle Overview document mentioned above says to set it to .AL32UTF8, but doesn’t fully explain why. So when this didn’t do the trick for me, I had to do some more research. I looked up the Oracle Character Set descriptions and learned that .AL32UTF8 corresponds to Unicode 3.1. After talking with our DBA I learned that our Oracle database was set to Unicode 3.0, which meant I needed to set NLS_LANG=.UTF8. Note that we ultimately switched to .AL32UTF8, since it corresponds to the latest version of Unicode, and in Oracle it allows for conversion between UTF-16 and UTF-8 (just in case you ever need to do that). The moral of the story is that NLS_LANG should exactly match the character set you’re using in Oracle.
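Because a mismatch here is easy to overlook, it can help to have PHP report what it actually sees. The following is a minimal sketch, assuming the usual NLS_LANG layout of an optional language and territory followed by a dot and the character set name; the helper name nlsLangCharset and the expected value are assumptions for illustration.

```php
<?php
// Hypothetical helper: pull the character set out of an NLS_LANG value,
// which has the form [LANGUAGE[_TERRITORY]].CHARSET (for example
// "AMERICAN_AMERICA.AL32UTF8" or just ".UTF8").
function nlsLangCharset(?string $nlsLang): ?string
{
    if ($nlsLang === null) {
        return null;
    }
    $dot = strrpos($nlsLang, '.');
    if ($dot === false || $dot === strlen($nlsLang) - 1) {
        return null; // no character set part present
    }
    return strtoupper(substr($nlsLang, $dot + 1));
}

// Assumption: the database character set is AL32UTF8. Use whatever your
// DBA tells you the database is really set to.
$expected = 'AL32UTF8';

$value  = getenv('NLS_LANG');
$actual = nlsLangCharset($value === false ? null : $value);

if ($actual !== $expected) {
    error_log(sprintf(
        'NLS_LANG character set is %s, expected %s',
        $actual === null ? '(not set)' : $actual,
        $expected
    ));
}
```

Run from a page served by Apache, this shows the environment as PHP sees it, which is not necessarily the same as the environment of your login shell.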

What I just said contradicts the advice of the Oracle Overview document, where it says NLS_LANG should be set to match the client (in this case, PHP) but that it doesn’t need to match the database character set. That’s technically true, but a mismatch will quickly lead to trouble if, for example, you try to insert records from PHP that are in an encoding that’s not compatible with the Oracle character set. If you’re going to switch to UTF-8, do it wholeheartedly: set PHP, your web server, and your database all to UTF-8. This will save you the headache of translating character encodings as you move data around.

NLS_LANG is not the end of the story. It applies to the communication between PHP and Oracle, but it doesn’t determine how characters are encoded within PHP, and it doesn’t influence how documents are served by Apache. There are a few different approaches to consider for having Apache and PHP serve your web pages in UTF-8.

If you want all of the documents on your server to default to UTF-8, one option is to set the AddDefaultCharset directive in the Apache configuration to UTF-8. Note, however, that the Apache documentation at http://httpd.apache.org/docs-2.0/mod/core.html does not express enthusiasm about this approach: “AddDefaultCharset should only be used when all of the text resources to which it applies are known to be in that character encoding and it is too inconvenient to label their charset individually. One such example is to add the charset parameter to resources containing generated content, such as legacy CGI scripts, that might be vulnerable to cross-site scripting attacks due to user-provided data being included in the output. Note, however, that a better solution is to just fix (or delete) those scripts…”

If you want all of your PHP-generated content to be served in UTF-8, set default_charset=UTF-8 in your php.ini file. It’s OK if the PHP default_charset is different from what’s specified in Apache AddDefaultCharset: the former will apply only to PHP files, and the latter will apply to everything else.

If you want some (but not all) of your PHP documents served in UTF-8, you don’t have to modify php.ini. Instead, specify UTF-8 as the character set in the Content-type header of those files. It’s important to point out here that you should set this header with the PHP header() function. If you try to set it with an HTML Meta tag, and you’ve used Apache’s AddDefaultCharset directive to specify a different character set, the Apache directive will override your Meta tag.
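Setting the header from PHP takes only a line or two. The sketch below wraps it in a small helper so the MIME type and character set are explicit; the helper name contentTypeHeader is hypothetical, while header() and headers_sent() are standard PHP functions.

```php
<?php
// Hypothetical helper: build the Content-Type header line for a page.
function contentTypeHeader(string $mimeType = 'text/html', string $charset = 'UTF-8'): string
{
    return sprintf('Content-Type: %s; charset=%s', $mimeType, $charset);
}

// header() only works before any output has been sent, so this belongs
// at the very top of the script. headers_sent() guards against calling
// it too late.
if (!headers_sent()) {
    header(contentTypeHeader());
}
```

Placing this in a file that every UTF-8 page includes first keeps the setting in one place without touching php.ini.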

Now that you’ve configured how you want documents served, you need to configure PHP so it can internally handle UTF-8. This means enabling multi-byte character support. You’ll need to re-compile PHP with the --enable-mbstring option (unless, of course, you had the foresight to do it previously), and set mbstring.internal_encoding=UTF-8 in your php.ini file.

Look over the PHP documentation for multi-byte string functions at http://www.php.net/ref.mbstring. Many of the PHP string functions have multi-byte equivalents. An example is the best way to illustrate what this means. The multi-byte version of strlen() is mb_strlen(). The strlen() function assumes that a character always occupies a single byte, so it actually returns the length of a string in bytes, and does not necessarily indicate the number of characters. In UTF-8, though, a string that is 4 characters long could occupy anywhere from 4 to 16 bytes, depending on the presence of multi-byte characters. The mb_strlen() function will correctly tell you the number of characters in such a string, but the regular strlen() function won’t.
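A quick sketch of the difference, assuming the mbstring extension is loaded. The sample word is my own; it is written with byte escapes so the result does not depend on how the script file is saved.

```php
<?php
// "café" in UTF-8: the final character occupies two bytes.
$word = "caf\xC3\xA9";

$bytes = strlen($word);             // counts bytes
$chars = mb_strlen($word, 'UTF-8'); // counts characters

printf("bytes: %d, characters: %d\n", $bytes, $chars); // bytes: 5, characters: 4
```

Passing the encoding explicitly as the second argument avoids depending on the mbstring.internal_encoding setting.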

Because of all this, you should consider enabling PHP’s function overloading feature, described at http://php.net/ref.mbstring#mbstring.overload. Activating function overloading will cause PHP to automatically assume it’s handling multi-byte strings, so, continuing with the example, it will actually execute mb_strlen() when you call strlen(). If you’re making a wholesale conversion to UTF-8, and you don’t want to revise all of the string function calls in your existing code, implementing function overloading makes sense. But there are a couple of caveats:

Watch out for calls to strlen() (or any other string function) where it really is intended to work with the byte length, not the character length. In that situation, function overloading will end up giving you an unintended result. Fortunately, there is a workaround for mb_strlen(): it accepts a character set specification as a second argument, and if you pass in ‘latin1’ (even though it’s actually handling a UTF-8 string), the string will be evaluated as if it were single-byte encoded. mb_strlen($your_utf8_string, 'latin1') will give you the number of bytes in a multi-byte string.

You may not want to do function overloading on mail(). I’ll explain why in the discussion of email below.

Note that if you haven’t upgraded to PHP 5, the html_entity_decode() function will return an error if you pass it a UTF-8 string. This was the only UTF-8 incompatibility we found in PHP 4.3.
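The byte-length workaround from the first caveat can be wrapped in a tiny helper. This sketch swaps in mbstring's '8bit' encoding name, which treats every byte as one character, in place of the Latin-1 name used in the text; both approaches give the same count. The helper name byteLength is hypothetical. Note also that the overloading feature itself belongs to the PHP versions this article covers and was removed in PHP 8.0, so on current PHP the helper is simply an explicit way to say "bytes, not characters."

```php
<?php
// Hypothetical helper: always return the length in bytes, whether or not
// strlen() has been overloaded to count characters.
function byteLength(string $utf8): int
{
    // '8bit' tells mbstring to treat each byte as a single character.
    return mb_strlen($utf8, '8bit');
}

$word = "caf\xC3\xA9"; // "café" in UTF-8

echo byteLength($word), "\n";                // 5
echo mb_strlen($word, 'ISO-8859-1'), "\n";   // 5, same idea with a Latin-1 name
echo mb_strlen($word, 'UTF-8'), "\n";        // 4 characters
```

This is handy for things like Content-Length headers or binary protocols, where the byte count is what matters.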

Going back to Oracle: starting with Oracle 9i, it provides improved handling for multi-byte characters by giving you a way to distinguish between byte length and character length. When creating a table, you can specify whether a column’s length is defined in terms of characters or bytes. For example, VARCHAR2(20 BYTE) will give you a 20-byte length field, and VARCHAR2(20 CHAR) will give you a 20-character length field. The default is BYTE, which you can alter with the NLS_LENGTH_SEMANTICS parameter; see your Oracle documentation for more details.
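The practical effect of the two length semantics can be checked on the PHP side before an insert is attempted. This is a rough sketch only: the helper name fitsColumn is hypothetical, and it models nothing beyond the declared length, so treat it as an illustration of BYTE versus CHAR rather than a full picture of Oracle's limits.

```php
<?php
// Hypothetical helper: would this UTF-8 value fit a column declared as
// VARCHAR2($limit BYTE) or VARCHAR2($limit CHAR)?
function fitsColumn(string $utf8Value, int $limit, string $semantics = 'BYTE'): bool
{
    $length = $semantics === 'CHAR'
        ? mb_strlen($utf8Value, 'UTF-8') // characters
        : strlen($utf8Value);            // bytes
    return $length <= $limit;
}

// Fifteen two-byte characters: 15 characters, but 30 bytes.
$value = str_repeat("\xC3\xA9", 15);

var_dump(fitsColumn($value, 20, 'CHAR')); // bool(true)
var_dump(fitsColumn($value, 20, 'BYTE')); // bool(false)
```

A value that looks comfortably short on screen can still overflow a BYTE column once its characters are multi-byte.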

Beware Windows-1252 in Web Forms

As I mentioned, other than UTF-8, the character encoding you’re most likely to find on English-speaking web sites, these days, is Latin-1 (aka Western ISO-8859-1). One of the nice things about Unicode is that its first 256 characters are the same as in Latin-1. That is, the Latin-1 ASCII characters and its Extended ASCII characters live in the same numerical locations in Unicode (although UTF-8 stores the characters above 127 in two bytes rather than one). If you’re currently on Latin-1, this greatly eases the pain of switching to UTF-8.

So, the big “however” comes from (you guessed it) Windows. Fortunately, Windows NT, 2000, and XP use Unicode internally and shouldn’t cause headaches for a UTF-8 web site. But Windows 95 and 98 use the Windows-1252 character set. Its standard ASCII characters from 0-127 are the same as Latin-1 and UTF-8, but its Extended ASCII set is different. If you have a form on a web page that’s UTF-8 encoded, and someone running Windows 9x fills out the form by copying-and-pasting text from Microsoft Word, Extended ASCII characters may not be interpreted properly. You may have experienced this before: for example, the “©” symbol in your Word document turned into something like “ä” when you pasted it into a form. Nothing about the character’s underlying data changed (the decimal representation of the character is the same as it was before); it just means something different in UTF-8 than it does in Windows-1252.
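When you know that a piece of text arrived as Windows-1252, converting it explicitly avoids the misinterpretation. The sketch below assumes the mbstring extension; the sample text (Word-style curly quotes) is my own example.

```php
<?php
// Curly quotes as Microsoft Word produces them in Windows-1252:
// 0x93 is the left double quote, 0x94 the right double quote.
$pasted = "\x93Hello\x94";

// Converting from the correct source character set yields real
// Unicode quotation marks (U+201C and U+201D).
$utf8 = mb_convert_encoding($pasted, 'UTF-8', 'Windows-1252');
echo bin2hex($utf8), "\n"; // e2809c48656c6c6fe2809d

// Treating the same bytes as Latin-1 yields invisible control codes
// (U+0093 and U+0094) instead of quotation marks.
$wrong = mb_convert_encoding($pasted, 'UTF-8', 'ISO-8859-1');
echo bin2hex($wrong), "\n"; // c29348656c6c6fc294
```

The hex output makes the point: the same two input bytes end up as entirely different characters depending on which source character set you declare.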

This was more of a problem in the past than it is now, as modern browsers try to transparently perform a character set conversion for you as needed in these situations. But the problems are by no means entirely resolved: see FORM submission and i18n at http://ppewww.ph.gla.ac.uk/~flavell/charset/form-i18n.html for a thorough overview of all the issues related to this, as well as a rundown of how the major browsers behave (if you’re wondering about the meaning of i18n, it’s short-hand for internationalization).

What makes this a truly maddening problem is converting a Latin-1 encoded database to UTF-8 when some of the data in it came from Latin-1 encoded web forms where users pasted in Windows-1252 text, and their browsers didn’t convert the characters properly. There is no easy fix for this, as you simply have to look at the records yourself to see if the Extended ASCII characters are displaying as the user intended, or if there was a character set conversion problem along the way.
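Although the final judgment has to be made by eye, a simple heuristic can at least narrow down which records deserve a look. The sketch below relies on the fact that bytes 0x80 through 0x9F are control codes in Latin-1 but printable characters (curly quotes, dashes, the euro sign, and so on) in Windows-1252, so their presence in supposedly Latin-1 text is suspicious. The helper name is hypothetical and this is a heuristic, not a reliable detector.

```php
<?php
// Hypothetical helper: flag values that were stored as Latin-1 but contain
// bytes in the 0x80-0x9F range, which usually means Windows-1252 text was
// pasted in.
function looksLikeWindows1252(string $storedValue): bool
{
    // No /u modifier, so the pattern works on raw bytes.
    return preg_match('/[\x80-\x9F]/', $storedValue) === 1;
}

var_dump(looksLikeWindows1252("\x93quoted\x94")); // bool(true)
var_dump(looksLikeWindows1252("caf\xE9"));        // bool(false), ordinary Latin-1
```

Records flagged this way are candidates for conversion from Windows-1252 rather than Latin-1; records that are not flagged can still be wrong, so a spot check remains worthwhile.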

UTF-8 Support in Smarty

Smarty handles UTF-8 transparently, almost. The one trouble spot is the escape modifier. It calls the PHP htmlentities() and htmlspecialchars() functions, but it doesn’t provide them with the necessary charset argument so they’ll work with UTF-8. The solution is to override escape with your own custom version. Start by making a copy of the Smarty escape modifier, and tweak it to pass along a charset argument to PHP. Then override the original with your custom version. If you won’t always be using UTF-8, set your custom version to accept a charset argument, so you can adjust the functionality as needed. Look up the “Extending Smarty with Plugins” section of the manual on the Smarty site (http://smarty.php.net/) for instructions on how to customize Smarty.
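The heart of such a custom modifier is simply passing the charset through to PHP. The function below is a stand-alone sketch of that idea, not Smarty's own code: the name escapeWithCharset is hypothetical, only two escape types are handled, and how you register it with Smarty should follow the plugin instructions in the Smarty manual.

```php
<?php
// Hypothetical stand-in for a customized escape modifier. Only the two
// HTML-related escape types are sketched here.
function escapeWithCharset(string $string, string $type = 'html', string $charset = 'UTF-8'): string
{
    switch ($type) {
        case 'html':
            // Escapes only the characters that are special in HTML.
            return htmlspecialchars($string, ENT_QUOTES, $charset);
        case 'htmlall':
            // Converts every character that has a named entity.
            return htmlentities($string, ENT_QUOTES, $charset);
        default:
            // Other escape types are left out of this sketch.
            return $string;
    }
}

$name = "<b>caf\xC3\xA9</b>"; // UTF-8 input containing markup

echo escapeWithCharset($name), "\n";            // &lt;b&gt;café&lt;/b&gt;
echo escapeWithCharset($name, 'htmlall'), "\n"; // &lt;b&gt;caf&eacute;&lt;/b&gt;
```

Because the charset is a parameter with a UTF-8 default, the same function still works for a page that has to stay in another character set.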

Exporting UTF-8 Data to PDF, RTF, Plain Text, and Email

It may not always be wise, or even possible, to keep data encoded in UTF-8 when exporting to other formats. As you’ll see below, sometimes you need to change the character set before performing the export. Take a look at PHP’s utf8_decode() and iconv() functions to learn about converting UTF-8 to single-byte encoding. Note that utf8_decode(), while easy to use, is limited to the Latin-1 character set (see the user contributed notes on the PHP utf8_decode() page for tips on dealing with other character sets).
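As a sketch of the conversion step, the helper below turns UTF-8 into Latin-1 using mb_convert_encoding() from the mbstring extension. That function is swapped in here as an alternative to the utf8_decode() and iconv() calls mentioned above (utf8_decode() is deprecated in recent PHP releases); the helper name toLatin1 is hypothetical.

```php
<?php
// Hypothetical helper: convert a UTF-8 string to single-byte Latin-1.
// Characters that do not exist in Latin-1 cannot survive the trip; mbstring
// replaces them with its configured substitute character.
function toLatin1(string $utf8): string
{
    return mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
}

$utf8   = "caf\xC3\xA9";   // "café", 5 bytes in UTF-8
$latin1 = toLatin1($utf8); // 4 bytes in Latin-1

echo bin2hex($latin1), "\n"; // 636166e9
```

The conversion is lossy by nature, so it is only appropriate when you know the data stays within the target character set.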

Our applications require exporting data to PDF, RTF, text files, and email:

To generate PDF, we run the PDFlib application on our web server to create PDF documents on the fly. PDFlib is an application specifically designed for processing PDF data and dynamically generating PDF documents; you can learn more about it at http://www.pdflib.com/. For it to work with UTF-8 data, you need to use it with a UTF-8 compatible font. The commonly used Windows TrueType fonts (Arial, Times New Roman, and Courier New) are Unicode compliant. However, that doesn’t mean they can display any Unicode character. They are fine for English and most Central and Eastern European languages. For more on this, see the Font section of Alan Wood’s Unicode Resources at http://www.alanwood.net/unicode/. It’s important to mention Microsoft’s Arial Unicode MS font, which is not the same as the standard Arial font. Arial Unicode MS can display characters from Arabic, Tamil, Thai, Hangul, Chinese, and many other languages. This means the font itself is huge: approximately 23 MB. If you try to use it with PDFlib running on your web server, you may run into performance problems.

If you are using, for example, Microsoft Word, it’s easy to take a Unicode document and save it as an RTF file. It’s also not difficult to use a tool like RTF File Generator (available at http://www.paggard.com/projects/rtf.generator/) to generate RTF files using PHP, as long as the source data does not include characters from multiple languages. It turns out to be quite difficult to use PHP to generate an RTF file when the source data is UTF-8 encoded and contains characters from several different languages. This is because RTF requires you to specify a character set for displaying the characters, and you can’t just say “Unicode.” You have to specify one or more ANSI, PC-8, Mac, or IBM PC character sets. This means you must analyze the multi-byte characters in a UTF-8 string and figure out what characters they represent. Then you need to specify in the header of the RTF file what character sets are needed to display them: a Hebrew character set for Hebrew characters, Arabic for Arabic, etc. Then in the body of the file you must flag the various chunks of non-English text and indicate which of these character sets are needed to display them. Rather than attempting this Herculean task, our solution is to do a utf8_decode() on our data before generating RTF files, so that the text is all in Latin-1. At the moment we can get away with this since none of the data going into the RTF files we currently generate contains non-English characters. We are planning to eventually discontinue our RTF support, so this will not be a long-term problem. Acquiring an understanding of how RTF works with Unicode data was difficult: of all the applications we encountered in this project, RTF was the least well documented when it came to Unicode.

We export data to text files, primarily in .csv format for use in spreadsheets. Surprisingly, current versions of Microsoft Excel do not support importing UTF-8 encoded text files. As with RTF, our solution is to perform a utf8_decode() before generating these text files. This doesn’t pose any problems for us since the kind of data we put in spreadsheets does not contain any non-English characters.
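Put together, a CSV export along these lines might look like the sketch below. It is an illustration under stated assumptions, not our production code: the function name writeLatin1Csv and the sample rows are invented, and mb_convert_encoding() stands in for utf8_decode().

```php
<?php
// Hypothetical helper: write rows of UTF-8 data to an open file handle as
// Latin-1 encoded CSV.
function writeLatin1Csv($handle, array $rows): void
{
    foreach ($rows as $row) {
        // Convert each field from UTF-8 to Latin-1 before writing.
        $converted = array_map(
            function ($field) {
                return mb_convert_encoding((string) $field, 'ISO-8859-1', 'UTF-8');
            },
            $row
        );
        // Delimiter, enclosure, and escape character are passed explicitly.
        fputcsv($handle, $converted, ',', '"', '\\');
    }
}

// Write to an in-memory stream so the example needs no file on disk.
$out = fopen('php://memory', 'w+');
writeLatin1Csv($out, [
    ['name', 'city'],
    ["Ren\xC3\xA9e", "Montr\xC3\xA9al"],
]);
rewind($out);
$csv = stream_get_contents($out);
fclose($out);

echo bin2hex($csv), "\n";
```

In a real export the handle would come from fopen() on the target file, or from php://output when the file is sent straight to the browser.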

As I mentioned, I do not recommend doing function overloading on the PHP mail() function. The reason has to do with line breaks. In Unix, a line break is represented by a line feed (LF, or \n) character; on Macs, it’s represented by a carriage return (CR, or \r) character; and on Windows, by a CR+LF (\r\n). For email to work between platforms, an email standard was agreed upon in the early days of the internet, which is CR+LF. So, for example, on Unix, sendmail will add a CR as needed to each LF it finds in the body of an email message. But when an email is UTF-8, PHP will first base64 encode it before passing it off to sendmail. This encoding is done so that multi-byte UTF-8 characters can be transported within the 7-bit world of email (for more about this, see Advanced E-mail Manipulation by Wez Furlong, php|architect Vol. 3, Iss. 5). Sendmail and other mailers do not attempt to wade through the base64 encoding to “fix” the line breaks. Unless you’re careful to put CR+LF line breaks in all your PHP generated emails before sending them, you’ll end up sending emails with improper line breaks. This can have unpredictable results, as you’re at the mercy of the recipient’s email client software, and what it chooses to do with malformed line breaks. In our testing, we found that the LF-only line breaks in our UTF-8 encoded emails were interpreted as desired in Mac and Unix mail readers, and by Microsoft Outlook on Windows, but not by Eudora 6.2 (and previous versions) on Windows. In Eudora, the messages displayed with no line breaks at all. You can’t say it’s a Eudora bug, since the line breaks weren’t meeting the standard. At this time, the emails we generate only contain basic English characters, so sticking with the standard mail() function meets our needs for now.
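If you do send UTF-8 email, normalizing the line breaks yourself before the body is encoded sidesteps the problem. A minimal sketch follows; the helper name normalizeToCrlf is hypothetical.

```php
<?php
// Hypothetical helper: convert any mix of Unix (LF), old Mac (CR), and
// Windows (CR+LF) line breaks to the CR+LF that the email standard expects.
function normalizeToCrlf(string $text): string
{
    // "\r\n" is listed first so an existing CR+LF pair is matched as a
    // unit and not doubled.
    return preg_replace('/\r\n|\r|\n/', "\r\n", $text);
}

$body = "Line one\nLine two\r\nLine three\rLine four";
echo bin2hex(normalizeToCrlf($body)), "\n";
```

Calling this on the message body right before it is handed to the mail function means the line breaks are already correct by the time the text is base64 encoded.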

The Bumpy Road to Unicode Compliance

As you can see, converting your web site to UTF-8 is by no means a painless process. But the payoff is worth it if you plan to support characters from several languages. It’s also a fascinating educational experience: you’ll gain a stronger understanding of how Apache, Oracle, and PHP interact, how Unicode supports so many different languages, some of the gory details of how email works, how browsers deal with mismatching character sets, what a Unicode compliant font is, and much more. Even if you’re not using the same software discussed in this article, hopefully I’ve at least imparted a sense of what kinds of problems you should look out for. If nothing else, hopefully you’ll remember, “there ain’t no such thing as plain text.”


About the Author

Michael Toppa is a web applications developer at the University of Pennsylvania School of Medicine. He has previously worked for Ask Jeeves, E*TRADE, and Stanford University Libraries’ HighWire Press. He can be found on the web at www.toppa.com. Credit for a lot of the research in this article goes to all of the U Penn School of Medicine Web Development team.

To Discuss this article: http://forums.phparch.com/219

XMLPull: An Alternative to SAX and DOM
by Markus Nix

Despite the popularity of known APIs for XML processing, such as SAX and DOM, the XMLPull parser is finding more and more followers. There are equivalent programs for Java, Python, and Perl, and Harry Fuecks is writing an equivalent implementation for PHP. PHP 5 also comes with a native extension called xmlReader.

REQUIREMENTS
PHP: 4/5
OS: N/A
Other Software: N/A
Code Directory: xmlpull

The hype around XML (the logical connection of structure and data within a document) remains unbroken: there is no serious Content Management System that doesn’t offer at least rudimentary XML support in one form or another.

The dominant APIs for XML processing are DOM (Document Object Model) and SAX (Simple API for XML), two APIs that focus more on tags and less on data. The DOM API creates an XML document in a tree-like structure that is saved in memory for continuous use. SAX is different: it runs through a document and fires events based on the contents of the XML it is parsing.

Even before there was XML, there was the Document Object Model, or DOM. It allows a developer to refer to, retrieve, and change items within an XML structure, and is essential to working with XML. The Document Object Model is a platform- and language-neutral interface that allows programs and scripts to dynamically access and update the structure, content and style of documents. For large XML documents the memory and processor resources consumed can be prohibitive, because building a DOM object is relatively processor intensive and the resulting DOM object usually consumes a large amount of memory.

The SAX parser is often used to process large XML documents, but, unfortunately, it is poorly designed. Rather than being called by the parsing application, the SAX parser uses a message handler with callbacks, which is not straightforward. The approach taken by SAX makes the software architecture much more difficult than it needs to be. Although the resulting code may look sufficient, there are always some inherent problems because SAX does not maintain information about the current state; that’s up to you. This can be fixed by keeping track of how deeply nested the current start or end element is and by using extra flags, but it always requires adding extra state variables and code to do validation. Unlike that of DOM, the SAX specification is not a W3C (World Wide Web Consortium) standard; it was, instead, created by the members of the XML-DEV mailing list. A SAX parser doesn’t build a tree structure of the document in memory, like DOM does: the XML document is read sequentially, and special events are fired if the parser recognizes a significant component of the document (e.g. a comment). The parser doesn’t keep track of previous elements; when it runs into a recognized chunk of the document, its work is done.

XMLPull is an alternative API for parsing XML. Perhaps you find the memory consumption of DOM too high or the manipulation of data with SAX too involved. If so, it will pay to take a closer look at XMLPull. Parsing XML with XMLPull reflects the organization of data structures, and therefore code written to use the XMLPull parser is much easier to maintain. State information is kept, naturally, on the parser’s stack, as a consequence of method calls that can be nested as many times as necessary. Pull parsers offer big ease-of-use advantages compared to SAX, but you may be left wondering if they can measure up to SAX’s industrial-strength performance. They can!
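
To make the state-keeping burden of SAX concrete, here is a minimal sketch that uses PHP's callback-based xml (expat) functions to collect the text of every <user> element. The $state array and its keys are invented for this illustration; the point is only that the application itself must remember where it is in the document between callbacks.

```php
<?php
$xml = '<config><database host="localhost">'
     . '<user>HarryF</user><pass>Secret</pass>'
     . '</database></config>';

// All bookkeeping lives in this hypothetical state array, because the
// parser does not tell a callback where in the document it currently is.
$state = ['depth' => 0, 'inUser' => false, 'users' => []];

$parser = xml_parser_create();
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0); // keep tag names as written

xml_set_element_handler(
    $parser,
    function ($p, $name, $attribs) use (&$state) {   // start tag callback
        $state['depth']++;
        if ($name === 'user') {
            $state['inUser']  = true;
            $state['users'][] = '';
        }
    },
    function ($p, $name) use (&$state) {             // end tag callback
        $state['depth']--;
        if ($name === 'user') {
            $state['inUser'] = false;
        }
    }
);

xml_set_character_data_handler(
    $parser,
    function ($p, $data) use (&$state) {
        // Text may arrive in several chunks, so append to the last entry.
        if ($state['inUser']) {
            $state['users'][count($state['users']) - 1] .= $data;
        }
    }
);

xml_parse($parser, $xml, true);
```

Every additional element of interest needs another flag and another branch in each callback, which is the growth in complexity that a pull parser avoids.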

XMLPull was introduced in early 2002 by ringleaders from the two leading pull parser implementations, Stefan Haustein from the kXML project and Aleksander Slominski from XPP3 (XML Pull Parser). Both, feeling that the lack of a common API hindered wider pull parsing adoption, began to work on XMLPull in December 2001. The resulting API reflects their substantial experience, drawing from their respective projects to produce an interface that works well for a wide range of applications.

XMLPull for Java, for example, supports everything from J2ME (Java 2 Platform, Micro Edition) to J2EE (Java 2 Platform, Enterprise Edition). The J2ME requirement forced the lead developers of XMLPull to create a simple interface with the minimum number of classes necessary to function well in low memory environments. In contrast, J2EE environments don’t usually suffer from such limited resources, but, instead, demand flexibility and performance. Accommodating both extremes with a single interface is tough.

According to the API introduction by Aleksander Slominski, “XML pull parsing allows incremental (sometimes called streaming) parsing of XML where application is in control: the parsing can be interrupted at any given moment and resumed when application is ready to consume more input.”

While many Java programmers are already familiar with XMLPull, this method of accessing an XML document is still strange to most PHP programmers. The xmlReader API is similar to the SAX API (which is frequently used for simple XML processing in PHP), but provides a simpler, more standard and more extensible interface to handle large documents than the existing SAX version. It should be noted that XMLPull has no notion of callbacks. Think of XMLPull as defining a special kind of iterator that delivers an XML document’s components to you, one at a time. It is totally up to you to decide when you’re done with the current component, and ready to move to the next one. The parser always holds a particular state that matches the current component type. Many of the methods prove meaningful only when the parser is in a particular state, which is identified by a set of constant definitions.

The Java API allows you to choose the detail level that your program will see. This is a very powerful feature when talking about layering. The original SAX interface did not report all of the information needed to validate a document, so developers had to build special methods into their parsers if they wanted to support validation.

Listing 1

<?php
/**
 * @package XML_SAXFilters
 * @version $Id: XML_XMLPull.php,v 1.4 2003/09/12 20:58:24 harryf Exp $
 */

/**
 * Includes
 */
if ( !defined( 'XML_XMLPULL' ) ) {
    define( 'XML_XMLPULL', 'XML/' );
}

if ( !defined( 'XML_SAXFILTERS' ) ) {
    define( 'XML_SAXFILTERS', 'XML/' );
}

require_once 'PEAR.php'; /* provides PEAR::raiseError() */
require_once( XML_XMLPULL . 'XmlPull/PushListener.php' );
require_once( XML_XMLPULL . 'XmlPull/PullParser.php' );

/**
 * Factory function for creating the pull parser
 * @param string parser type ('Expat' or 'HTMLSax')
 * @param string reader type ('File', 'String' or 'Struct')
 * @param mixed source to read (e.g. string, file path, struct)
 * @access public
 * @return PullParser
 * @package XML_XMLPull
 */
function &XML_XMLPull_createParser( $parserType, $readerType, &$input ) {
    /* choose the reader that feeds data to the parser */
    switch ( strtolower( $readerType ) ) {
        case 'file':
            require_once XML_SAXFILTERS . '/SaxFilters/IO/FileReader.php';
            $reader = &new XML_SaxFilters_FileReader( $input );
            break;

        case 'string':
            require_once XML_SAXFILTERS . '/SaxFilters/IO/StringReader.php';
            $reader = &new XML_SaxFilters_StringReader( $input );
            break;

        case 'struct':
            require_once XML_SAXFILTERS . '/SaxFilters/IO/StructReader.php';
            $reader = &new XML_SaxFilters_StructReader( $input );
            break;

        default:
            /* return a variable, because the function returns by reference */
            $error = PEAR::raiseError( 'Unrecognized reader type: ' . $readerType );
            return $error;
    }

    /* choose the push parser that the pull parser wraps */
    switch ( strtolower( $parserType ) ) {
        case 'expat':
            require_once XML_XMLPULL . 'XmlPull/ExpatPushListener.php';
            $push = &new XML_XMLPull_Expat( $reader );
            break;

        case 'htmlsax':
            require_once XML_XMLPULL . 'XmlPull/HTMLSaxPushListener.php';
            $push = &new XML_XMLPull_HTMLSax( $reader );
            break;

        default:
            $error = PEAR::raiseError( 'Unrecognized parser type: ' . $parserType );
            return $error;
    }

    $pull = &new XML_XMLPull_PullParser( $push );
    return $pull;
}

?>

A new Java Community Process (JCP) specification request specifies a standard API for Java pull parsers: JSR-173 (Streaming API for XML). Like SAX, XMLPull is not a W3C recommendation, and the only existing reference implementations are explicitly Java based (see the XML Pull API at http://xmlpull.org/). A PHP implementation by Harry Fuecks (http://sourceforge.net/projects/htmlsax) is currently in the testing phase. It can be picked up from CVS:

cvs \
    -d:pserver:[email protected]:/cvsroot/htmlsax \
    login
cvs -z3 \
    -d:pserver:[email protected]:/cvsroot/htmlsax \
    co xmlpull

If you know how callback functions work in the SAX parser, the interface of the XMLPull parser is easy to understand: a simple factory method is enough to establish a parser or reader type. The document is then iterated to capture the parts that are of interest. The HTMLSax XMLPull implementation continues in the spirit of the original Java specification, and offers a simple interface, versatility, ease of use, and good performance.

SAX Pushes, XMLPull Pulls

A pull parser turns the paradigm of SAX parsers around. Instead of having the parser execute predefined callback functions when a certain component of a document is reached, the parser is asked to reply with the next component. This results in “pulling” instead of “pushing”, and makes data processing easier.

In the Java community, there is a certain hype that surrounds pull parsing, because, unlike SAX (or rather SAX2, if you prefer working with namespaces), it gives control of the parsing back to the developer, instead of relying on a “black box.” XMLPull allows incremental (streaming) parsing, so it is possible to pause the parser in its work, for example, to wait for the arrival of new data in unpredictable surroundings (such as when pulling data from a remote server). Parser variants built for J2ME are made for such surroundings: good performance with a small footprint.

The PHP implementation follows the Java API in most scenarios. The principle of parsing using pull is very easy: the parser iterates over a data stream with the parse() method, and travels from event to event. In the Java original, the getEventType() method reports the event type as one of the constants START_DOCUMENT, START_TAG, TEXT, END_TAG, and END_DOCUMENT. In PHP, these differ slightly: XML_PULL_START_TAG, XML_PULL_END_TAG, XML_PULL_TEXT and XML_PULL_PI. XML_PULL_START_TAG offers information about the start tag of an element, including information about the attributes. XML_PULL_TEXT delivers CDATA information. The other conditions are self-explanatory. The parsing of an XML document with XMLPull can be seen in Listing 2.

At the time of writing, Fuecks’ Pull Parser supports four conditions that are represented through the constants that I’ve mentioned above. In addition to these main four, there are also XML_PULL_ESCAPE and XML_PULL_JASP; these are useful only when working with the PEAR package (also written by Harry Fuecks). Support for namespaces is currently missing.

Most SAX parsers are built on top of a pull parsing layer. It is an interesting challenge to expose both the pull and push layers to the user, but such functionality allows a developer to use pull parsing when needed, without having to stop using the SAX API.

It is possible to convert a pull parser into a push model: during pull parsing, the caller has control over parsing and can push events. It is also possible to convert push into pull parsers, but this requires that all events be buffered, and converted from SAX callbacks. An alternative implementation of this conversion involves an extra thread that can be used to pull more data from the SAX parser, but is kept suspended until the user asks for more events. This approach is best exemplified by Fuecks’ Pull Parser Wrapper for SAX that allows conversion from a SAX model into an XML pull parser. The parser implementation by Fuecks is based on the XML_SaxFilters PEAR package (see http://pear.php.net/package/XML_SaxFilters), and uses PEAR’s iteration mechanism extensively. The PHP implementation of the SAX filter code was originally from Luis Argerich (http://phpxmlclasses.sourceforge.net/show_doc.php?class=class_sax_filters.html), and was described in greater detail in the Wrox Press title “PHP 4 XML.” Fuecks’ PEAR package has almost nothing in common with the initial implementation, which neither used the PEAR interfaces for DataReader and DataWriter, nor had the ability to parse XML documents recursively or by using filters.
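
The buffering approach to turning a push parser into a pull parser can be sketched in a few lines. The class below is purely illustrative and is not Fuecks' wrapper: the name BufferedPullParser, the event array layout, and the decision to buffer the whole document up front are all assumptions made for this example. It relies only on PHP's xml (expat) functions.

```php
<?php
// Hypothetical wrapper: expat pushes events into a queue through callbacks,
// and the caller pulls them out one at a time with parse().
class BufferedPullParser
{
    private $events = [];

    public function __construct(string $xml)
    {
        $parser = xml_parser_create();
        xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

        xml_set_element_handler(
            $parser,
            function ($p, $name, $attribs) {
                $this->events[] = ['type' => 'start', 'name' => $name, 'attribs' => $attribs];
            },
            function ($p, $name) {
                $this->events[] = ['type' => 'end', 'name' => $name];
            }
        );
        xml_set_character_data_handler(
            $parser,
            function ($p, $text) {
                $this->events[] = ['type' => 'text', 'text' => $text];
            }
        );

        // The complete document is pushed through the callbacks here,
        // so every event is buffered before the first pull.
        xml_parse($parser, $xml, true);
    }

    // Returns the next buffered event, or null when none are left.
    public function parse(): ?array
    {
        return array_shift($this->events);
    }
}

$pull  = new BufferedPullParser('<a x="1"><b>hi</b></a>');
$names = [];

while (($event = $pull->parse()) !== null) {
    if ($event['type'] === 'start') {
        $names[] = $event['name'];
    }
}
```

The cost of this simplicity is memory: because all events are queued first, the sketch gives up the streaming advantage that a true pull parser offers.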

The idea behind SAX filters is simple: the code used to parse XML documents can be created modularly, and is therefore easier to implement. The parser delegates events to filters; filters forward events to other filters, and so on. The developer uses SAX to describe filter compositions that are as flexible and powerful as DOM, but can be assembled freely.
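
The delegation idea can be shown without any XML library at all. The classes in the following sketch are hypothetical and do not reproduce the XML_SaxFilters API; they only illustrate how one filter handles an event and then forwards it to its child.

```php
<?php
// Hypothetical base class: by default a filter passes events on unchanged.
abstract class EventFilter
{
    protected $child = null;

    public function setChild(EventFilter $child): void
    {
        $this->child = $child;
    }

    public function text(string $data): void
    {
        if ($this->child !== null) {
            $this->child->text($data);
        }
    }
}

// Strips surrounding whitespace before forwarding.
class TrimFilter extends EventFilter
{
    public function text(string $data): void
    {
        parent::text(trim($data));
    }
}

// Records every non-empty piece of text it receives.
class CollectFilter extends EventFilter
{
    public $seen = [];

    public function text(string $data): void
    {
        if ($data !== '') {
            $this->seen[] = $data;
        }
        parent::text($data);
    }
}

$trim    = new TrimFilter();
$collect = new CollectFilter();
$trim->setChild($collect);

// In a real setup the parser's character data callback would make these calls.
foreach (["  HarryF ", "\n", "Secret"] as $chunk) {
    $trim->text($chunk);
}
```

Each filter stays small and testable on its own, and the chain can be rearranged without touching the parser.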

The PHP implementation works differently from its Java, Python and Perl counterparts: these other implementations use the parent-child concept more stringently. In the PHP version, the parser itself is the head of the “filter family”; it may have children but no parents. The only task of the parser is to forward the XML data from a reader to an appropriate filter. The described XMLPull implementation uses native XML processing, and is based on events, which allows XMLPull to treat resources more gently than a similarly functioning DOM parser.

It should be noted that, although the PHP XMLPull implementation is lighter than SAX and DOM, benchmarks are not available at this time. The ideal use of XMLPull in PHP is in the processing of small documents, where performance is a concern, and when the developer wishes to access particular elements within these documents. Another drawback is that XML document validation is not available in the expat library (by James Clark) bundled with PHP 4 and, therefore, isn’t available in XMLPull, but this is a shortcoming that you can easily overlook if you consider the ease of processing with XMLPull.

Listing 2

<?php
require_once 'XML_XMLPull.php';

$test = <<<EOD
<?xml version="1.0"?>
<config>
    <database host="localhost">
        <dbname>test</dbname>
        <user>HarryF</user>
        <pass>Secret</pass>
    </database>
</config>
EOD;

/* test with character string as source */
$parser = &XML_XMLPull_createParser( 'htmlsax', 'string', $test );

/* that's how XML Pull processes events - easier than SAX? */
while ( $event = $parser->parse() ) {
    switch ( $type = $event->getType() ) {
        case XML_PULL_START_TAG:
            echo '<hr />';
            echo 'Start tag: ' . $event->getName() . '<br />';
            echo 'Attributes: <pre>';
            print_r( $event->getAttribs() );
            echo '</pre>';
            break;

        case XML_PULL_END_TAG:
            echo 'End tag: ' . $event->getName() . '<br />';
            echo '<hr />';
            break;

        case XML_PULL_TEXT:
            echo 'Text: ' . $event->getText() . '<br />';
            break;
    }
}
?>

The Java implementation of XMLPull by Aleksander Slominski was originally intended to parse SOAP documents, but XMLPull’s worth was proven when developers discovered that this purpose was only the tip of the iceberg: applications driven by XMLPull stay clear, even with complex XML documents, especially compared to SAX.

How can code that is easier to write and maintain achieve more than the accepted implementation? The answer is selective control. With just one pull parser, you can call methods that work directly on special components within a document. With SAX, however, you are at the mercy of the parser: it makes you process everything that it delivers. Direct control over the parser simplifies the source code. As the developer, you get to decide when a given element is processed, and when to continue with the next one. This is the fundamental difference in the handling of different event-based parsers. A pull parser remembers the state of the in-process component.

In the PHP implementation, the code iterates with a while loop, using the parse() method to retrieve data. The well-known Java implementation offers two kinds of iteration; one supports fewer scenarios than the other. We’ll concentrate on the more powerful iteration type, with very little focus on validating the XML document. We will also cover one of the more flexible parts of XMLPull: layering.


Listing 3

<?xml version="1.0"?>

<books xmlns:dc="http://purl.org/dc/elements/1.1/">
    <!-- This is a comment -->
    <book id="1" isbn="3-8266-0612-4">
        <title>Apache Web-Server</title>
        <year>2000</year>
        <dc:subject>Webserver</dc:subject>
    </book>
    <book id="2" isbn="3-8266-0550-0">
        <title>Linux f&#xFC;r Internet und Intranet</title>
        <year>2000</year>
        <dc:subject>Operating Systems</dc:subject>
    </book>
</books>

Listing 4

<?php
$reader = new xmlReader();
$reader->open( 'books.xml' );

/* walk the document node by node */
while ( $reader->read() ) {
    switch ( $reader->nodeType ) {
        /* later releases expose this constant as XMLReader::ELEMENT */
        case XMLREADER_ELEMENT:
            /* indent the element name according to its nesting depth */
            print "\n " . str_repeat( " ", $reader->depth );
            print $reader->localName . " ";

            if ( $reader->namespaceURI ) {
                print "NS: $reader->namespaceURI ";
            }

            break;
    }
}
?>

Listing 5

<?xml version="1.0"?>

<chevrolet_models>
    <model name="Alero" />
    <model name="Astro" />
    <model name="Avalanche" />
    <model name="Beretta" />
    <model name="Bel Air" />
    <model name="Beretta" />
    <model name="Biscayne" />
    <model name="Blazer" />
    <model name="Camaro" />
    <model name="Caprice" />
    <model name="Corvair" />
    <model name="Corsica" />
    <model name="Cavalier" />
    <model name="Chevelle" />
    <model name="Cheyenne" />
    <model name="Corvette" wishlist="yes" />
    <model name="G30" />
    <model name="El Camino" />
    <model name="Lumina" />
    <model name="Impala" />
    <model name="Nova" />
    <model name="Malibu" />
    <model name="Monza" />
    <model name="Pick Up" />
    <model name="Suburban" />
    <model name="Tahoe" />
    <model name="Trailblazer" />
    <model name="Trans Sport" />
    <model name="Transcar" />
    <model name="Van" />
    <model name="Matiz" europe_only="yes" />
    <model name="Karos" europe_only="yes" />
    <model name="Lacetti" europe_only="yes" />
    <model name="Nubira" europe_only="yes" />
    <model name="Rezzo" europe_only="yes" />
    <model name="Evanda" europe_only="yes" />
</chevrolet_models>

Listing 6

<?php

$reader = new xmlReader();
$reader->open( 'listing5.xml' );

while ( $reader->read() ) {
    switch ( $reader->nodeType ) {
        case XMLREADER_ELEMENT:
            /* skip anything that is not a <model> with attributes;
               "continue 2" leaves the switch and resumes the while loop */
            if ( $reader->name != "model" || !$reader->hasAttributes ) {
                continue 2;
            }

            echo "Model found:<br />\n";
            $attr = $reader->moveToFirstAttribute();

            /* the cursor now sits on each attribute in turn */
            while ( $attr ) {
                echo "{$reader->name}: {$reader->value}<br />\n";
                $attr = $reader->moveToNextAttribute();
            }

            echo "<br />\n";
            break;
    }
}

?>


Everything’s Different with PHP 5

In PHP 4, XML support was mainly SAX based (through the expat-backed xml extension), while DOM was implemented with the domxml extension. Later, the xslt extension (with Sablotron as the backend) was added. During the PHP 4 life cycle, additional features like DTD validation were added to the domxml extension. Unfortunately, since the xslt and domxml extensions never really left the experimental stage, they were never enabled in PHP’s default configuration. Furthermore, the domxml extension did not implement the DOM standard defined by the W3C, but had its own method naming scheme. While this was improved in the 4.3 series of PHP, it never reached a truly stable stage, and it was almost impossible to really fix the deeper issues.

Therefore, almost everything related to XML was rewritten for PHP 5: all the XML extensions are now based on the excellent libxml2 library, which was developed by the GNOME project. This allows interoperability between different extensions, allowing the core developers to work with a single underlying library. All of the XML extensions now support PHP streams throughout, even when the stream is not accessed directly from PHP code. Basically, you can access a PHP stream anywhere you can access a normal file. PHP 5 now supports DOM according to the W3C standard, and it incorporates standards-compliant XSLT with the very fast libxslt engine. PHP also now has its own SimpleXML extension.

In PHP 5, there is also a pull parser implementation that was written in C, and deserves special notice: xmlReader by Rob Richards and Christian Stocker. Based on libxml, this implementation has what it takes to develop into a powerful alternative to SAX parsers. xmlReader is like SAX in that it does not load the complete document into memory and it is only suitable for reading XML (no writing).

Remember, the essential difference between SAX based parsers and those centered around an XMLPull implementation is this: SAX based parsers push events to the user, while pull parsers retrieve information only on request, allowing the application to control the cursor.

xmlReader for PHP 5.0 is available for download from the PECL repository, and is quite easy to install with the PEAR installer, when available.

pear install xmlReader

If your PHP installation is on Linux, or a similar platform, and you would like to experiment with a more current version of xmlReader, or if (for example) the PEAR installer isn’t available, then you can get the source directly from the CVS at http://cvs.php.net/cvs.php/pecl/xmlreader. You’ll also need a copy of the PHP source code. To compile xmlReader, copy it into the ext/ directory of the PHP source code and execute the following commands:

phpize
./configure --with-xmlreader
make

Once these commands have run their course, you will see a file called xmlreader.so in the modules/ subdirectory, which can be loaded into PHP via php.ini, or with the dl() function.

If you’re using Windows, a pre-built DLL file can be found at http://snaps.php.net/win32/. This file can be loaded in the same way as the .so file, mentioned above.

In the upcoming PHP 5.1 release, xmlReader has been merged into the core PHP distribution, but it may not be activated by default.

xmlReader’s only external dependency is libxml2 2.6.x, the same as DOM and SimpleXML.

The extension is still relatively new and is, therefore, not very well documented. There are, however, a number of slides available from presentations given by Christian Stocker. There is also some demo code in the source repository. If you are interested in more detailed information, you will have to dig into the C API.

The lack of documentation, and the relative immaturity of xmlReader, should not keep you from experimenting with it. It is faster, more actively developed and maintained, and supports numerous features that have become standard in the XML processing world (e.g. namespaces). Another strong benefit is that using xmlReader to parse documents requires less code than SAX to achieve the same goals.

To demonstrate the simplicity of xmlReader, let’s take a look at the XML document in Listing 3. The next bit of code, Listing 4, shows how easy it is to work with this data.

Similar to Harry Fuecks’ implementation, the internal cursor moves from node to node within the XML document, and passes individual elements back to the hosting application. The type of node that is currently in focus can easily be determined from the value of $reader->nodeType. This property holds an integer value which represents the opening tag, closing tag, text data, attributes, etc. xmlReader will populate the nodeType property with the value of one of the following constants, if applicable: XMLREADER_ELEMENT (opening tag), XMLREADER_END_ELEMENT (closing tag), XMLREADER_ATTRIBUTE, XMLREADER_TEXT (text between tags), XMLREADER_CDATA, XMLREADER_COMMENT, XMLREADER_PI, XMLREADER_NONE (the cursor is not yet positioned on a node), XMLREADER_ENTITY and XMLREADER_XML_DECLARATION.

xmlReader can retrieve attributes associated with a node in a number of ways. The easiest method is by passing the name of the desired attribute to getAttribute(). If you don’t know which attributes are in a given node, you can reference attributes by numeric index, using the getAttributeNo() method. Furthermore, it is possible to move the cursor with the moveToFirstAttribute() and moveToNextAttribute() methods. Listing 5 shows an example of an XML document where certain elements have arbitrary attributes. Listing 6 shows how to parse and capture attributes.
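
When you do know the attribute name, getAttribute() alone is enough. The sketch below assumes a current PHP build in which the extension exposes its constants on the class (XMLReader::ELEMENT) rather than as the XMLREADER_ELEMENT globals used in this article, and it reads a shortened, inline version of the model list so that it is self-contained.

```php
<?php
// A shortened stand-in for the model list, embedded as a string.
$xml = '<chevrolet_models>'
     . '<model name="Corvette" wishlist="yes" />'
     . '<model name="Matiz" europe_only="yes" />'
     . '</chevrolet_models>';

$reader = new XMLReader();
$reader->XML($xml);

$wishlist = [];
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'model') {
        // Asking by name; an absent attribute simply does not match 'yes'.
        if ($reader->getAttribute('wishlist') === 'yes') {
            $wishlist[] = $reader->getAttribute('name');
        }
    }
}
$reader->close();
```

Because getAttribute() does not move the cursor, there is no need to return to the element before the next read().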

Information about the XML node that is currently in scope can be retrieved from the xmlReader object. A node’s name, value and attributes are not the only values that can be determined from the object, though. The following table shows the most important properties:

nodeType          type of the current node
name              name of the XML element, or #text for character data
value             value of the node
hasAttributes     reports if a tag has attributes
attributeCount    number of attributes of a tag
depth             node depth
isEmptyElement    reports if a tag is empty

One of the big disadvantages of SAX is the high complexity of the program code when parsing XML documents with a high node depth. xmlReader offers an elegant solution for this problem: elements that aren’t of interest to our code can be skipped easily by using the next() method, in which case an entire subtree is ignored, and the cursor moves forward to the next element of the same level.
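
A short sketch shows the effect. The <meta> element and its nested content are invented for this illustration, and the code assumes a current PHP build with class constants (XMLReader::ELEMENT) and the readString() method.

```php
<?php
$xml = '<books>'
     . '<book><title>Apache Web-Server</title><meta><a><b>deep</b></a></meta></book>'
     . '<book><title>Linux</title><meta><a><b>deeper</b></a></meta></book>'
     . '</books>';

$reader = new XMLReader();
$reader->XML($xml);

$titles = [];
$more   = $reader->read();

while ($more) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'meta') {
        // Jump over <meta> and everything nested inside it.
        $more = $reader->next();
        continue;
    }

    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'title') {
        $titles[] = $reader->readString();
    }

    $more = $reader->read();
}
$reader->close();
```

The loop never sees the <a> and <b> elements at all, so no depth counter or "ignore" flag is needed.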

If you would prefer to process the active part of a tree with a different PHP 5 XML extension, xmlReader can handle this elegantly. With the expand() method, part of the XML tree can easily be opened up:

$node = $reader->expand();
$simple = simplexml_import_dom( $node );

Brilliant, isn’t it?
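
In practice, it is common to copy the expanded node into a DOMDocument of your own before passing it to SimpleXML. The following sketch takes that route; it assumes a current PHP build with the DOM and SimpleXML extensions enabled, and uses a trimmed-down inline version of the book list.

```php
<?php
$xml = '<books><book id="1"><title>Apache Web-Server</title>'
     . '<year>2000</year></book></books>';

$reader = new XMLReader();
$reader->XML($xml);

$dom   = new DOMDocument();
$title = null;
$id    = null;

while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'book') {
        // Copy the current subtree into our own document, then wrap it.
        $node  = $dom->importNode($reader->expand(), true);
        $book  = simplexml_import_dom($node);
        $title = (string) $book->title;
        $id    = (string) $book['id'];
        break;
    }
}
$reader->close();
```

Only the <book> subtree is ever built in memory, so large documents can still be streamed while each record is handled with the more comfortable SimpleXML syntax.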


Validation

Since xmlReader is based on libxml2, like the DOM extension, it is possible to validate documents, in addition to parsing them. This validation can be handled with the assistance of a DTD or with RelaxNG (http://www.relaxng.org/).

Validating against a DTD is as simple as adding a Document Type Declaration to your document:

<!DOCTYPE chevrolet_models SYSTEM "chevrolet_models.dtd">

The XMLREADER_LOADDTD option must be set to true to load the DTD. If the document should then be validated against the loaded DTD, the XMLREADER_VALIDATE option must be set as well. Options are set with the setParserProperty() method, which must be called after the open() method, but before the first call to the read() method. To check if the document is valid, you can call the isValid() method, and check its return value, which will be either true or false.
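
Put together, the steps look roughly like this. The sketch assumes a current PHP build, where the options are class constants (XMLReader::LOADDTD and XMLReader::VALIDATE), and it uses an internal DTD subset instead of an external file so that it runs on its own. The <truck> element is deliberately undeclared to make the document invalid.

```php
<?php
$xml = <<<EOD
<?xml version="1.0"?>
<!DOCTYPE chevrolet_models [
  <!ELEMENT chevrolet_models (model*)>
  <!ELEMENT model EMPTY>
  <!ATTLIST model name CDATA #REQUIRED>
]>
<chevrolet_models>
  <model name="Camaro" />
  <truck name="Silverado" />
</chevrolet_models>
EOD;

// Collect validation messages quietly instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$reader = new XMLReader();
$reader->XML($xml);

// Options go after the document is opened, but before the first read().
$reader->setParserProperty(XMLReader::LOADDTD, true);
$reader->setParserProperty(XMLReader::VALIDATE, true);

$valid = true;
while ($reader->read()) {
    if (!$reader->isValid()) {
        $valid = false;
    }
}
$reader->close();
```

Since validation happens while the document is streamed, isValid() only reports on what has been read so far, which is why the check sits inside the loop.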

Another useful feature allows the DTD to specify default attribute values. xmlReader offers an option to populate absent attributes with the DTD’s default value, when available.

Similarly, when using a RelaxNG schema instead of a DTD, the setRelaxNGSchema() and setRelaxNGSchemaSource() methods allow the validation of an XML document. The isValid() method is still called in the same manner.
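
As a rough sketch of the RelaxNG variant, the schema below is a small one written for this example (it is not shipped with the article's code), passed in as a string through setRelaxNGSchemaSource(). A current PHP build with class constants is assumed.

```php
<?php
// Illustrative schema: a root element holding any number of <model name="..."/>.
$schema = <<<EOD
<element name="chevrolet_models" xmlns="http://relaxng.org/ns/structure/1.0">
  <zeroOrMore>
    <element name="model">
      <attribute name="name"/>
    </element>
  </zeroOrMore>
</element>
EOD;

libxml_use_internal_errors(true);

$reader = new XMLReader();
$reader->XML('<chevrolet_models><model name="Nova"/><model name="Tahoe"/></chevrolet_models>');

// Like the parser options, the schema is set before the first read().
$schemaLoaded = $reader->setRelaxNGSchemaSource($schema);

$valid = true;
while ($reader->read()) {
    if (!$reader->isValid()) {
        $valid = false;
    }
}
$reader->close();
```

To validate against a schema stored in a file, setRelaxNGSchema() takes the filename instead of the schema text.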

The table in Figure 1 gives an overview of the methods in the xmlReader extension.

Figure 1

bool close()                                    frees resources
DOMNode expand()                                builds up the tree under the current node
string getAttribute(string att)                 returns the value of an attribute
string getAttributeNo(int no)                   returns the value of the attribute at the given index
string getAttributeNs(string att, string ns)    returns the value of the attribute in a namespace
bool getParserProperty(int prop)                returns if an option is set
bool isValid()                                  checks if the document is valid
string lookupNamespace(string prefix)           returns the namespace URI from the current node for the given prefix
bool moveToAttribute(string att)                moves the cursor to the named attribute
bool moveToAttributeNo(int no)                  moves the cursor to the attribute at the given position
bool moveToAttributeNs(string att, string ns)   moves the cursor to the given attribute (with namespace)
bool moveToElement()                            moves the cursor back to the element that owns the current attribute
bool moveToFirstAttribute()                     moves the cursor to the first attribute
bool moveToNextAttribute()                      moves the cursor to the next attribute
bool next()                                     moves the cursor to the next element on the same level
bool open(string uri)                           opens an XML document
bool read()                                     moves the cursor to the next node in the document
bool setParserProperty(int prop, bool value)    sets a parser option
bool setRelaxNGSchema(string file)              sets the filename of a RelaxNG schema
bool setRelaxNGSchemaSource(string schema)      sets a RelaxNG schema from a string
bool XML(string xml)                            reads an XML document from a string

Conclusion

SAX parsers have proven their efficiency in several applications, but there is no hiding that processing complex XML documents with SAX will unavoidably result in confusing code, no matter what the programming language. The application becomes more confusing and, thus, more difficult to maintain with every additional level of nesting in the XML document. In the past, solving this problem required a series of code hacks (e.g. the delegation of events to subclasses that represent specialized elements). With XMLPull, the code can be much simpler.

Pull parsing is ideally suited for applications that need to transform XML into other formats. This process is typically complex; the code must retain state information during parsing. Using SAX would require the code to maintain state between callbacks to be able to determine the correct response to SAX events. In pull parsing, applications can be structured naturally, and information can be pulled from XML as needed. Your application can pull the next event when it is ready to process it.

The results of Java performance tests show that XMLPull parsers perform very well compared to the older SAX2 parsers, even when working on large documents. Dennis M. Sosnoski published detailed benchmarks using large document collections which consisted of small, mid-sized, and large documents. He tested five SAX2 parsers (including Xerces2, AElfred2, and Piccolo) and two XMLPull implementations, namely kXML, a compact, J2ME-compatible parser, and XPP3, a compact parser originally designed for SOAP. The XMLPull parsers performed extremely well with the small documents, beating all the SAX parsers except Piccolo. AElfred2 and Xerces2 both delivered acceptable performance, although they took more than twice as long as Piccolo. For the tested mid-sized XML documents, there was little performance difference between the XMLPull parsers and most of the SAX2 parsers; the performance range here is much smaller than for the small documents. The large document results show an even smaller performance difference than the mid-sized documents.

For xmlReader, Christian Stocker has published impressive benchmarks on his website (http://svn.bitflux.org/repos/public/php5examples/largexml/fulldocu.pdf) that compare different XML extensions when parsing documents in the 10 MB range. The result: xmlReader is the fastest, least resource-intensive solution for extracting specific information from an XML document. This is because the next() method allows the main processing to take place in the C backend rather than directly in PHP.

xmlReader is not just of interest to beginners, who are often frightened by the complexity of SAX and DOM. It unites the strengths of SAX and DOM without carrying their weaknesses. That is reason enough to merge the xmlReader extension into the core PHP distribution.


About the Author

Markus Nix (mnix@docuverse.de) is a freelance developer currently working for the German company Mayflower. He concentrates on the subjects PHP, Java, XML and Content Management.

To discuss this article: http://forums.phparch.com/220


When we parted company last time, we had just finished being a bit unkind about PHP’s native session handling functionality. You may recall that we cited several key reasons why it was not especially well suited to the enterprise.

First and foremost, we blasted its non-OOP approach in an increasingly OOP-friendly PHP, particularly version 5. Second, we attacked its relatively poor security, and discussed how a determined intruder could breach another user’s session with relative ease. Third, we criticized its mechanism of data storage, which we determined was unlikely to be suited to multi-server or shared hosting environments.

At the very end of part one, we declared that with a bit of work, we could do a whole lot better ourselves. Accordingly, over the next few pages, we’re going to put together a toolkit of classes to replace and improve upon PHP 5’s native session management. We’ll make these classes as modular as possible, so that with relatively little modification they can be painlessly dropped right into our own real-world applications.

Security

The first requirement we will look at is that of security. If you’ll recall from last month, we highlighted a couple of fundamental problems with PHP’s session security which we need to sort out.

The first concern is the identifier generated by PHP to represent a session and stored by the browser as the value of the session cookie: the session identifier. We determined that PHP uses an MD5 hash of some reasonably random value to generate a session identifier. The key words here are “reasonably random”: it is based on the system clock, which has a finite resolution. The other constituents of the hash are the remote IP address and the contents of $_SERVER, which although reasonably unique to the server are certainly in no way necessarily unique to the session. This makes it fairly trivial for an intruder to make educated guesses at valid session identifiers by simply using brute force.

REQUIREMENTS

PHP: 5
OS: Linux/UNIX or Windows
Other Software: N/A
Code Directory: advanced

More on Advanced Sessions and Authentication in PHP5

by Ed Lecky-Thompson


Native session support has been present in PHP since version 4, but its lack of sophistication means it is often found wanting in enterprise-level development environments. In this two-part article, we’ll tackle sessions from the ground up; from recapping PHP’s built-in support right through to the development of a sophisticated set of classes, especially optimized for session handling and authentication in PHP 5.


Second, no real effort is made to prohibit session hijacking, as described above. While obfuscating session identifiers is a good start, there are a few simple steps that can be taken to help detect a malicious user guessing session identifiers, and not only reject the offered session identifier as invalid, but actually destroy the session that is being guessed at in the first place, precluding any subsequent guesses from taking place.

So, to tackle these one at a time, let’s improve the obscurity of our session identifier. We should use things which are not only unique to our server and to our user, but actually unique to the moment in time at which the identifier is generated. A good list might be:

• Current system clock
• User’s remote IP address
• User’s remote User Agent string
• Server eth0 IP address

Concatenating all of the above and piping it into md5() will produce our session identifier.
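As a rough illustration, the four ingredients listed above could be combined along these lines. This is a minimal sketch rather than the code from the listings: the function name makeSessionId() and its parameters are assumptions, chosen so the example runs from the command line. In a real request the values would come from $_SERVER.

```php
<?php
// Hypothetical helper: builds a session identifier from the four
// ingredients listed above. All inputs are passed in explicitly so
// the sketch can run outside a web server.
function makeSessionId($remoteIp, $userAgent, $serverIp, $now = null)
{
    // microtime(true) has a finer resolution than time().
    if ($now === null) {
        $now = microtime(true);
    }
    return md5($now . '|' . $remoteIp . '|' . $userAgent . '|' . $serverIp);
}

$id = makeSessionId('194.193.10.25', 'Mozilla/5.0 (example)', '10.0.0.1');
echo $id, "\n"; // 32 hexadecimal characters
```

The optional $now argument exists only so that the function can be exercised with a fixed clock value.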

Now, let’s see how we can best tackle the presentation of a seemingly valid session identifier by an intruder. With session identifiers as unique as the above, this will be a rarity, but it’s better to be safe than sorry.

There are a few simple tricks we could use to determine whether or not the session identifier being presented is not only legitimate but actually offered by the rightful owner.

The first is to perform simple checks on things we think should remain fairly consistent from request to request. That way, on the first request, we can record such credentials, and on subsequent requests can determine whether or not they have changed since the last request. If they have, we can be pretty sure somebody is attempting to hijack a legitimate session. The most obvious credentials we can use are the remote IP address and HTTP User Agent.

The User Agent string is provided by the requesting web browser, and usually gives browser type, version, and the underlying operating system. Unless a user specifically changes it, this will be consistent from request to request, so we can record it on the first hit, and then ensure its consistency on subsequent requests.

It is less safe to check against the IP address. Two things to be worried about here are setups where requests from one user seem to come from multiple addresses (an IP- or NAT-Pool), and setups where many users proxy through the same IP.

Often (but not always), the former situation provides IPs that change subtly, but not drastically. It is a fairly safe bet to say that the first two octets of an address aren’t really going to change from request to request. If the first request came from 194.193.10.25, it’s conceivable that the next request might come from 194.193.14.26, but unlikely that the next request could legitimately come from 86.20.8.14.

This allows us to use a partial IP address and HTTP user agent as sanity checks.
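The sanity check described above can be sketched as follows. The helper names ipPrefix() and sameClient() are hypothetical and used only for illustration; the addresses are the ones from the example in the text.

```php
<?php
// Hypothetical helpers illustrating the sanity check described above.

// Returns the first two octets of a dotted-quad IPv4 address,
// e.g. "194.193.10.25" becomes "194.193".
function ipPrefix($ip)
{
    return implode('.', array_slice(explode('.', $ip), 0, 2));
}

// True when the current request appears to come from the same client
// that opened the session.
function sameClient($storedPrefix, $storedAgent, $currentIp, $currentAgent)
{
    return $storedPrefix === ipPrefix($currentIp)
        && $storedAgent === $currentAgent;
}

$prefix = ipPrefix('194.193.10.25');   // recorded on the first request
$agent  = 'Mozilla/5.0 (example)';

var_dump(sameClient($prefix, $agent, '194.193.14.26', $agent)); // bool(true)
var_dump(sameClient($prefix, $agent, '86.20.8.14', $agent));    // bool(false)
```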

What do we do if a session identifier is presented which appears to be valid, but has the wrong IP address or HTTP user agent presented alongside it? Obviously, we should disallow the use of the session and issue a new one; but, in addition, we should invalidate the legitimate session that the hijacker is attempting to access.

This may seem like overkill, but this allows us to use one final clever bit of technology: a secondary key, which is a second, randomly generated identifier that is stored in a cookie when the session is first offered to the web browser. The secondary key is linked to the session in our handler, and the web browser must offer both the primary session identifier and the secondary key on all subsequent requests. The handler will ensure that these two pieces of data match.

This virtually eliminates session guessing. Sure, a hijacker may stumble upon a valid session identifier by accident, but the chances of him guessing both pieces are virtually nil; especially because the moment he tries a secondary key which doesn’t work, the original session is invalidated. This could result in the unfortunate re-authentication of the legitimate user, but it’s better than somebody getting to her bank account, for example.

With this mechanism in place, brute force as a means of guessing at sessions is out of the question.
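To make the invalidate-on-mismatch rule concrete, here is a minimal sketch that uses an in-memory array in place of the session table. The function name checkSession() and the sample identifiers are assumptions for illustration only.

```php
<?php
// Hypothetical in-memory store standing in for the session table:
// session identifier => secondary key.
$sessions = array(
    'abc123' => 'key-for-abc123',
);

// Returns true if both values match. If the session identifier exists
// but the secondary key is wrong, the session is destroyed so that no
// further guesses can be made against it.
function checkSession(array &$sessions, $sessionId, $secondaryKey)
{
    if (!isset($sessions[$sessionId])) {
        return false;                    // unknown identifier
    }
    if ($sessions[$sessionId] !== $secondaryKey) {
        unset($sessions[$sessionId]);    // invalidate the real session
        return false;
    }
    return true;
}

var_dump(checkSession($sessions, 'abc123', 'wrong-guess'));    // bool(false)
var_dump(checkSession($sessions, 'abc123', 'key-for-abc123')); // bool(false)
```

The second call also fails because the wrong guess has already destroyed the session; this is the forced re-authentication of the legitimate user mentioned above.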

Robust Data Storage

We took issue with the way in which PHP stores session data. Our first objection was one of security; the second related to multi-server environments.

By default, PHP stores its session data in /tmp, which (again, by default) is world-readable. This isn’t a big deal on a dedicated server, but spells trouble in a shared server environment: other users on that same shared server will be able to read your session data. If you’re storing sensitive data in session variables, this could be a huge security problem. Even if you’re not, the session identifiers will be visible, which also presents a security risk.

The second problem related to an enterprise server environment where two or more web servers are running in a load-balanced environment. There is no guarantee that from one request to the next, the same server will be responsible for service. If each server maintains its own /tmp directory, then each server will maintain its own sessions, and hence sessions will not necessarily “carry” from one request to the next, because the servers aren’t aware of the sessions created by their peers.

A hack solution to the above is to use a shared /tmp directory, using NFS, SMB or similar. This, however, is extremely slow.

You can get around both these problems by using a database to store session data. This provides improved security (in a shared hosting environment, one usually only has access to one’s own databases) and avoids our multiple-server conundrum, since both web servers can connect to the same database.

All that remains to be determined is the structure of our tables. If you were using PostgreSQL as your underlying database, your tables might look something like Listing 1.

As you can see, there’s not too much to our two session tables. user_session holds data on the sessions themselves. Our classes will use this data for further enforcement of validity. You will also see that it records whether or not the owner of the session is logged in, and their numeric user ID (which will correspond to a row in our user table), if they are. Our second table, user_session_variable, holds data on variables (variable name and value) associated with a session. You will notice that we have a one-to-many relationship in our database schema; one user session can be linked to many session variables.

The user table does what you might expect, and holds details of valid login information. Depending on your application, you’ll want to add and subtract fields here, perhaps to hold information on the user’s mailing address, for example. The fundamentals are here, though.

That’s it. Not much to it, really. With the above in place, we’ve got a storage mechanism that lets us store everything PHP would normally put in /tmp, but without the security and scalability headaches.

An OOP approach

As a reader of php|architect, you’re serious about your PHP. It’s fairly safe to say, therefore, that you understand how writing object-oriented code can be a good thing. With this in mind, we need to figure out how best to express all the elements of our user session management puzzle as classes; what member variables those classes will need to have, what methods they’ll expose, and how they all hook together.

First, let’s look at our HTTPRequest class. As you’ll recall from part one, the session paradigm is built on the foundation of the traditional HTTP request. A single HTTP request consists of a number of useful nuggets of data, and sounds like a perfect candidate for a class. GET, POST and COOKIE values are normally exposed through the ubiquitous $_REQUEST global pre-defined variable. Request path and other HTTP headers are exposed in $_SERVER. So the data’s all there, but it’s not in OOP form.

The class itself will be primarily nothing more than a simple OOP container to the aforementioned global variables, but it will secondarily take responsibility for spawning an instance of our HTTPSession class, which we’ll meet shortly.

Since it’s not really valid to allow more than one instance of HTTPRequest to exist, simply because the context of any active PHP script is only able to access a single underlying HTTP request, we will simply declare all methods of HTTPRequest to be static.

We will expose a method to retrieve a generic request parameter (be it a GET, POST or COOKIE value), methods to retrieve individual values, and a method to access the minutiae of the HTTP request itself.

Next is the HTTPSession class. An instance of this class will represent a session within our application.

The relationship between a request and a session is a pretty straightforward one. One session is comprised of one or more requests, since the very first request of a user’s sitting will yield a new session. That session shall continue to live for many requests to come, limited only by such environmental factors as inactivity timeouts and maximum duration limits.
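The two limits mentioned above can be expressed as a small check. This is a sketch under assumptions: the function name sessionExpired() is hypothetical, and the defaults of 600 and 3600 seconds mirror the $inactivity_timeout and $max_session_age values that appear in Listing 5.

```php
<?php
// Hypothetical helper; all timestamps are Unix times in seconds.
function sessionExpired($whenCreated, $lastImpression, $now,
                        $inactivityTimeout = 600, $maxSessionAge = 3600)
{
    $idleTooLong = ($now - $lastImpression) > $inactivityTimeout;
    $tooOld      = ($now - $whenCreated) > $maxSessionAge;
    return $idleTooLong || $tooOld;
}

$created = 1000000;
// Idle for 200 seconds, 500 seconds old: still valid.
var_dump(sessionExpired($created, $created + 300, $created + 500));   // bool(false)
// Idle for 700 seconds: inactivity timeout exceeded.
var_dump(sessionExpired($created, $created + 300, $created + 1000));  // bool(true)
// Recently active, but 3700 seconds old: maximum age exceeded.
var_dump(sessionExpired($created, $created + 3500, $created + 3700)); // bool(true)
```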

Additionally, we note that a request can exist without a partner session, but that a session cannot exist without an underlying request. As a result, it seems natural to conclude that our static HTTPRequest class should yield, on demand, an instance of an HTTPSession class by means of an accessor method, which I’ve called getSession().

There’s yet another trap waiting for us, however. We don’t want to confuse matters by having multiple sessions associated with a single request. It is neither feasible nor architecturally correct to return a static class from an accessor method; we can only return traditional instantiated objects. Accordingly, our HTTPRequest class must have an in-built mechanism to prevent it from ever returning more than one distinct instance of the HTTPSession class during the lifetime of a request. It must be perfectly allowable to call the accessor method more than once throughout the lifetime of a request, since our application may feature utility classes which make use of the session. Those utility classes should be able to import the HTTPSession instance into scope without recourse to global variables. We, in turn, must ensure that the accessor method returns a brand new instance of HTTPSession only on the first call of a request, and that all subsequent calls return a reference to the previously instantiated object.

We’ll figure out exactly how HTTPSession will achieve this a little later on. For now, let’s look at the required functionality in isolation.

Listing 1

CREATE TABLE user_session (
    id serial NOT NULL,
    user_id integer NOT NULL,
    session_id character varying(100) NOT NULL,
    ip_address character varying(15) NOT NULL,
    user_agent character varying(100) NOT NULL,
    logged_in boolean,
    last_impression timestamp without time zone,
    when_created timestamp without time zone,
    secondary_key character varying(32)
);

CREATE TABLE user_session_variable (
    id serial NOT NULL,
    session_id integer NOT NULL,
    variable_name character varying(128) NOT NULL,
    variable_value text
);

CREATE TABLE "user" (
    id serial NOT NULL,
    username character varying(32),
    md5_pw character varying(32),
    last_login timestamp without time zone,
    account_created timestamp without time zone,
    email_address character varying(128),
    first_name character varying(128),
    last_name character varying(128),
    approved bool,
    date_of_birth date,
    sex character(1)
);

First and foremost, we’ll want the constructor of our class to take the burden of determining if a valid session exists or not, and creating a new one if necessary. Our application will probably only want to do one of two things with the instantiated session object: read or write session data, or associate the session with a particular user.

I should point out that the latter can actually be accomplished with the former. You could quite easily define a session-level variable called user_id, have your application handle the user’s login, and then set the value of user_id to equate to the identifier of the relevant user in your underlying database table. The act of authenticating a user login, however, is something that is so common to such a huge number of applications that it makes sense to make it part of our toolkit, too.

In order to handle the need to read and write session variables, we’ll expose accessor methods called getValue() and setValue(), respectively.

Handling our authentication needs requires functionality that is a little more sophisticated. We’ll need a login() method, which will take a username and password as its parameters. If this pair proves to be valid, the relevant user ID value will need to be married to the session. If the credentials are invalid, we simply return false so our application can provide an appropriate message. We’ll also need to provide a logout() method to divorce the user from the session.

Continuing the OOP theme, you’ll want your application to make use of a User class. An object of this class will allow your application to read and write attributes to the associated user’s record in your datastore.

The final requirement of HTTPSession, therefore, is a method to expose the instantiated object representing the logged in user. We’ll call this method getUser(); it consults the internal private member variable user_id before instantiating a User object, in order to reference the relevant user’s data.

The User class is typical of a utility class found in an every-day OOP-compliant web application. It sits neatly alongside such classes as Order, Product and Category as may well be found in any typical e-commerce site, for example.

What makes User unique is its relationship with HTTPSession. Most applications which employ session management at some stage will require the end user to log in to complete a particular process, or access a particular restricted page. In our e-commerce example, the user would probably be allowed to browse the site and even add products to her shopping basket without being logged in, right up to the point of purchase. The underlying Session exists from the moment the user hits the web site, but a particular User object is only tied to the session at the point of checkout.

A session can exist without a corresponding user. A user can exist without a corresponding session. But when that user does log in, her existence is intrinsically linked to that session, and that session inextricably linked to that particular user. This is a one-to-one relationship, albeit a strange one.

Let’s look at the properties and methods that our User class will need. Like its sister classes, most of its particulars (first name, last name, username, password and so forth) will exist in an underlying database table. An instance of User, therefore, effectively encapsulates


Listing 2

<?php
// Definitions for connecting to a PostgreSQL database.
// Constant names are quoted; bare names are an error in current PHP.
define("DATABASE_USERNAME", "phpa");
define("DATABASE_PASSWORD", "phpa");
define("DATABASE_HOSTNAME", "localhost");
define("DATABASE_BASENAME", "phpa");
// Shared connection handle, picked up by the sql class.
$GLOBALS["db"] = pg_connect("user='" . DATABASE_USERNAME
    . "' password='" . DATABASE_PASSWORD . "' dbname='"
    . DATABASE_BASENAME . "' host='" . DATABASE_HOSTNAME
    . "'");
?>


Listing 4

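<?php
class HTTPRequest {

    // Returns a GET, POST or COOKIE value, or null if it is not set.
    static function GetParameter($strParameterName) {
        // $_REQUEST is a superglobal, so no "global" statement is needed.
        return isset($_REQUEST[$strParameterName])
            ? $_REQUEST[$strParameterName] : null;
    }

    // Returns the single HTTPSession object for this request,
    // creating it on the first call only.
    static function GetSession() {
        if (empty($GLOBALS["SESSION_OBJECT"])) {
            $GLOBALS["SESSION_OBJECT"] = new HTTPSession();
        }
        return $GLOBALS["SESSION_OBJECT"];
    }

}
?>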

Listing 3

<?php

require_once("dbaccess.php");

class sql {

    private $result_rows;       # Result rows hash
    private $query_handle;      # db: the query handle
    private $link_ident;        # db: the link identifier
    private $close_connection;  # should close connection
                                # after use?
    private $DEBUG = 0;

    public function __construct($blShouldClose = false) {
        # Constructor
        $this->link_ident = $GLOBALS["db"];
        $this->close_connection = $blShouldClose;
    }

    public function query($sql, $code_return_mode = 0) {
        if ($this->DEBUG) {
            error_log("DEBUG: QUERY: $sql\n");
        }
        $return_array = array();
        $q_handle = pg_query($this->link_ident, $sql);
        if (!$q_handle) {
            error_log("QUERY FAILED: $sql\n");
        } else {
            # Only fetch rows when the query succeeded.
            for ($i = 0; $i < pg_num_rows($q_handle); $i++) {
                $return_array[$i] = pg_fetch_array($q_handle, $i);
            }
        }
        $this->result_rows = $return_array;
        if ($code_return_mode) {
            # Exit-code style: 1 if the query failed, 0 if it succeeded.
            return $q_handle ? 0 : 1;
        }
        return 1;
    }

    public function get_result($row_num, $column_name) {
        # Returns null when the row or column does not exist.
        return isset($this->result_rows[$row_num][$column_name])
            ? $this->result_rows[$row_num][$column_name] : null;
    }

    public function get_row_hash($row_num) {
        return isset($this->result_rows[$row_num])
            ? $this->result_rows[$row_num] : null;
    }

    public function get_table_hash() {
        return $this->result_rows;
    }

    public function __destruct() {
        if ($this->close_connection == true) {
            pg_close($this->link_ident);
        }
    }

}

Listing 5

<?php
require_once("dbaccess.php");
require_once("sql.php");
require_once("httprequest.php");
require_once("user.php");

class HTTPSession {
    private $session_id;     # Current session ID. Should
                             # theoretically never be null
    private $secondary_key;  # Secondary key. Used to stop
                             # session-guessing algorithms
    private $logged_in;      # Logged in? 1/0
    private $user_id;        # ID of the logged-in user (0 if none)

    # Inactivity timeout (seconds).
    private $inactivity_timeout = 600;
    # Maximum session age (seconds).
    private $max_session_age = 3600;

    public function isLoggedIn() {
        return $this->logged_in;
    }

    public function getUserID() {
        return $this->user_id;
    }

    public function getUser() {
        return new User($this->getUserID());
    }

    public function getSessionID() {
        return $this->session_id;
    }

    private function newSession() {
        # Establish DB connection.
        $sql = new sql();
        # See if the browser was offering a session that failed
        # validation; if so, remove it now. This also invalidates
        # a session that somebody is trying to guess.
        $t_session_id = HTTPRequest::GetParameter("php5prisessid");
        $t_secondary_key = HTTPRequest::GetParameter("php5secsessid");
        if (($t_session_id) && ($t_secondary_key)) {
            # Cookie values are untrusted input: escape before use in SQL.
            $safe_id = pg_escape_string($GLOBALS["db"], $t_session_id);
            # Remove the session's variables first, while the session
            # row still exists to map the identifier to its numeric id.
            $sql->query("DELETE FROM user_session_variable
                WHERE session_id IN (SELECT id FROM \"user_session\"
                WHERE session_id = '$safe_id')");
            # Then remove the session itself.
            $sql->query("DELETE FROM \"user_session\" "
                . "WHERE session_id = '$safe_id'");
        }
        # Now create a new session.
        # First, make up a session ID. Use the current
        # timestamp, combined with the user's IP address
        # (first two octets only), user agent and
        # contents of `ps axuw` (Unix only).
        $ip_parts = explode(".", $_SERVER["REMOTE_ADDR"]);
        $remote_ip = join(".", array($ip_parts[0],
            $ip_parts[1]));
        $timestamp = time();
        $user_agent = $_SERVER["HTTP_USER_AGENT"];
        $string_to_md5 = $remote_ip . $timestamp .
            $user_agent . `ps axuw`;
        $n_session_id = md5($string_to_md5);
        $this->session_id = $n_session_id;
        setcookie("php5prisessid", $n_session_id,
            time() + 128800, "/");
        # Now make a secondary key using the process
        # table and microtimestamp.
        $strSecondaryKey = md5(`ps auxw` . microtime());
        $this->secondary_key = $strSecondaryKey;
        setcookie("php5secsessid", $strSecondaryKey,
            time() + 128800, "/");
        # Now inject it into the database. The user agent is
        # client-supplied, so escape it first.
        $safe_agent = pg_escape_string($GLOBALS["db"], $user_agent);
        return($sql->query("INSERT INTO \"user_session\"
            (session_id, logged_in, user_id, ip_address,
            user_agent, secondary_key, last_impression,
            when_created) VALUES ('$n_session_id', 'f',
            '0', '$remote_ip', '$safe_agent',
            '$strSecondaryKey', now(), now())"));


a particular row of that table. For this reason, classes like User are sometimes called entity classes.

This architecture allows us to assume that our User class would have a constructor which accepts a numeric identifier to map that instance of the class to a corresponding row in the underlying table. This would then be stored as a private member variable. By making this parameter to our constructor optional, we can allow our application to pass in a blank or zero value; this represents a new user, which does not yet exist in the underlying database.

You may consider it appropriate to provide attribute-manipulation methods like getFirstName(), getZipCode(), setPassword() and so forth, but this seriously limits the portability of your class, since not all your applications will share the same database schema for the user table. A better bet is to provide generic getField() and setField() methods which take, as a parameter, the name of the database field you wish to retrieve or set. The setField() method would also take the value to which you wish to set the property as a second parameter. If this feels awkward when coding, you could always make use of the __call() method provided in PHP 5 to dynamically translate a method call like getFirstName() to getField("first_name").
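PHP 5 routes calls to undefined methods through the magic __call() method, which is what makes this kind of translation possible. The following is a minimal sketch of the idea; the class name MagicRecord and its array-backed storage are assumptions for illustration, standing in for a class that would really talk to the database.

```php
<?php
// Hypothetical, array-backed stand-in for an entity class; a real
// version would read from and write to the database instead.
class MagicRecord
{
    private $fields = array();

    public function getField($field)
    {
        return isset($this->fields[$field]) ? $this->fields[$field] : null;
    }

    public function setField($field, $value)
    {
        $this->fields[$field] = $value;
    }

    // Invoked by PHP for any method that is not defined on the class.
    // getFirstName() becomes getField('first_name'), and
    // setFirstName($v) becomes setField('first_name', $v).
    public function __call($method, $arguments)
    {
        $prefix = substr($method, 0, 3);
        if ($prefix !== 'get' && $prefix !== 'set') {
            throw new Exception("Unknown method: $method");
        }
        // "FirstName" -> "First_Name" -> "first_name"
        $field = strtolower(
            preg_replace('/(?<!^)[A-Z]/', '_$0', substr($method, 3))
        );
        if ($prefix === 'get') {
            return $this->getField($field);
        }
        $this->setField($field, isset($arguments[0]) ? $arguments[0] : null);
        return null;
    }
}

$user = new MagicRecord();
$user->setFirstName('Ada');
echo $user->getFirstName(), "\n"; // prints "Ada"
```

Because getField() and setField() are real methods, they are called directly and never pass through __call().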

Most important, of course, is the interface to the underlying database. Again, your instincts may drive you to make your getField() accessor method invoke a SELECT statement, and your setField() an INSERT or UPDATE statement, as appropriate. While this works for reading data, it is not efficient; getting the value of


Listing 5 (cont’d)

    }

    public function __construct() {

        # Import session ID and secondary key, if
        # possible, using cookies (or in theory,
        # get/post vars)
        $t_session_id = HTTPRequest::GetParameter("php5prisessid");
        $t_secondary_key = HTTPRequest::GetParameter("php5secsessid");
        # If we still don't have a session ID, we need
        # to send a new one to the browser
        if (!$t_session_id) {
            $this->newSession();
            $this->logged_in = 0;
            $this->user_id = 0;
        } else {
            $ip_parts = explode(".", $_SERVER["REMOTE_ADDR"]);
            $remote_ip = join(".", array($ip_parts[0],
                $ip_parts[1]));
            # Cookie and header values are untrusted input:
            # escape them before use in SQL.
            $safe_id = pg_escape_string($GLOBALS["db"],
                (string) $t_session_id);
            $safe_key = pg_escape_string($GLOBALS["db"],
                (string) $t_secondary_key);
            $safe_agent = pg_escape_string($GLOBALS["db"],
                (string) $_SERVER["HTTP_USER_AGENT"]);
            # Otherwise, attempt to retrieve session
            # information from database.
            $sql = new sql(0);
            $maxAge = $this->max_session_age;
            $maxIdle = $this->inactivity_timeout;
            $stmt = "SELECT id, logged_in, user_id FROM
                \"user_session\" WHERE
                session_id = '$safe_id' AND
                secondary_key = '$safe_key' AND
                (now() - last_impression <=
                '$maxIdle seconds') AND (now() - when_created
                <= '$maxAge seconds') AND
                user_agent = '$safe_agent' AND
                ip_address = '$remote_ip'";
            $sql->query($stmt);
            if ($sql->get_result(0, "logged_in") == 't') {
                $this->logged_in = 1;
            } else {
                $this->logged_in = 0;
            }
            $this->user_id = $sql->get_result(0, "user_id");
            if (!($sql->get_result(0, "id") > 0)) {
                # Invalid (in some way) session ID was
                # supplied, so force a new session.
                $this->newSession();
                $this->user_id = 0;
            } else {
                $sid = $sql->get_result(0, "id");
                $this->session_id = $t_session_id;
                $this->secondary_key = $t_secondary_key;
                if (($this->logged_in == 1) &&
                    ($this->user_id)) {
                    # Update 'last impression' property.
                    $sql->query("UPDATE \"user_session\"
                        SET last_impression = now()
                        WHERE id = $sid");
                }
            }
        }
    }

    public function login($username, $password) {
        # Attempt to login using given username
        # and password.
        $this->logged_in = 0;
        $sql = new sql();
        # The username is user input: escape it before use in SQL.
        $safe_name = pg_escape_string($GLOBALS["db"], $username);
        $sql->query("SELECT id, md5_pw FROM
            \"user\" WHERE username='$safe_name'");
        $real_id = $sql->get_result(0, "id");
        $real_pw_md5 = $sql->get_result(0, "md5_pw");
        if ($real_id > 0) {
            $proposed_md5 = md5($password);
            # Strict comparison avoids loose numeric-string matches.
            if ($proposed_md5 === $real_pw_md5) {
                $this->logged_in = 1;
                $this->user_id = $real_id;
                # Update session table to reflect
                # this information.
                $session_id = $this->session_id;
                $sql->query("UPDATE \"user_session\"
                    SET logged_in='t', user_id='$real_id'
                    WHERE session_id='$session_id'");
                # Set user logged in timestamp
                $sql->query("UPDATE \"user\" SET
                    last_login = now() WHERE id = "
                    . $this->user_id);
            }
        }
        # This will effectively return 0 or 1 depending
        # on whether the login was successful.

Listing 5 (cont’d)

181 return ($this->logged_in); 182 }183184 public function logout() {185 $sql = new sql();186 $session_id = $this->session_id;187 $sql->query(“UPDATE \”user_session\” SET188 logged_in=’f’, user_id = 0 WHERE189 session_id=’$session_id’”);190 $this->logged_in = 0;191 $this->user_id = 0;192 }193194 public function SetValue($varName, $varValue) {195 $sql = new sql();196 $sql->query(“DELETE FROM user_session_variable197 WHERE session_id = “ . $this->session_id .198 “ AND variable_name = ‘“ . $varName . “‘“);199 return($sql->query(“INSERT INTO200 user_session_variable(session_id, variable_name,201 variable_value) VALUES (‘“ . $this->session_id202 . “‘, ‘’” . $varName . “‘, ‘“ .203 serialize($varValue) . “‘“));204 }205206 public function GetValue($varName, $varValue) {207 $sql = new sql();208 $sql->query(“SELECT variable_value FROM user209 session_variable WHERE session_id = “ .210 $this->session_id . “ AND211 variable_name = ‘“ . $varName . “‘“);212 return(@unserialize($sql->get_result(0,213 ‘variable_value’)));214 }215216 };217218 ?>


Listing 6

<?php

require_once("dbaccess.php");
require_once("sql.php");

class User {

    # Member Variables
    protected $id;    # The id of the user in question
                      # referring to the value of the
                      # 'id' column

    # A hash of table column values,
    # e.g. $database_fields["release_date"]
    protected $database_fields = array();
    # Flag 1 or 0 if loaded
    private $loaded = 0;
    # Hash of fields (with values true/false) that have
    # been modified since loading.
    private $modified_fields = array();

    public function __construct($user_id = NULL) {
        $this->id = $user_id;
    }

    # [Re]loads information from database
    public function Reload() {
        $sql = new sql();
        $id = $this->id;
        $sql->query("SELECT * FROM \"user\"
            WHERE id='$id'");
        $result_fields = $sql->get_row_hash(0);
        $this->database_fields = $result_fields;
        $this->loaded = 1;
        if (sizeof($this->modified_fields) > 0) {
            foreach ($this->modified_fields as $key => $value) {
                $this->modified_fields[$key] = false;
            };
        };
    }

    public function Load() {
        $this->Reload();
        $this->loaded = 1;
    }

    public function ForceLoaded() {
        $this->loaded = 1;
    }

    # Returns the value of the database table entry for
    # column "$field"
    public function GetField($field) {
        if ($this->loaded == 0) {
            if ($this->id > 0) {
                $this->Load();    # If no database information
                                  # has been loaded in yet,
                                  # best do that now.
            } else {
                error_log("Not loading as no ID");
            };
        };
        return $this->database_fields[$field];
    }

    public function GetAllFields() {
        if ($this->loaded == 0) {
            if ($this->id > 0) {
                $this->Load();    # If no database information
                                  # has been loaded in yet,
                                  # best do that now.
            } else {
                error_log("Not loading as no ID");
            };
        };
        return($this->database_fields);
    }

    public function GetID() {
        return $this->id;
    }

    # Sets the value of the database table entry for
    # column "$field" to "$value"
    public function SetField($field, $value) {
        if ($this->loaded == 0) {
            if ($this->id) {
                $this->Load();    # If no database information
                                  # has been loaded in yet,
                                  # best do that now.
            };
        };
        $this->database_fields[$field] = $value;
        $this->modified = 1;
        $this->modified_fields[$field] = true;
    }

    public function Destroy() {
        $id = $this->id;
        if ($id) {
            $sql = new sql();
            $stmt = "DELETE FROM \"user\"
                WHERE id='" . $id . "'";
            $sql->query($stmt);
        };
    }

    public function Save() {
        $id = $this->id;
        $sql = new sql();
        if (!$id) {
            $this->loaded = 0;
        };
        if ($this->loaded == 0) {
            # assume this is a new entity
            $stmt = "INSERT INTO \"user\"(";
            foreach ($this->database_fields as $key => $value) {
                if (!is_numeric($key)) {
                    $key = str_replace("'", "\'", $key);
                    if ($value != "") {
                        $stmt .= "\"$key\",";
                    };
                };
            };
            # Chop last comma
            $stmt = substr($stmt, 0, strlen($stmt) - 1);
            $stmt .= ") VALUES (";
            foreach ($this->database_fields as $key => $value) {
                if (!is_numeric($key)) {
                    if ($value != "") {
                        $value = str_replace("'", "\'", $value);
                        $stmt .= "'$value',";
                    };
                };
            };
            # Chop last comma
            $stmt = substr($stmt, 0, strlen($stmt) - 1);
            $stmt .= ")";
        } else {
            $stmt = "UPDATE \"user\" SET ";
            foreach ($this->database_fields as $key => $value) {
                if (!is_numeric($key)) {
                    if ($this->modified_fields[$key] == true) {
                        $value = str_replace("'", "\'", $value);
                        if ($value == "") {
                            $stmt .= "\"$key\" = NULL, ";
                        } else {
                            $stmt .= "\"$key\" = '$value', ";
                        };
                    };
                };
            };

            # Chop last comma and space
            $stmt = substr($stmt, 0, strlen($stmt) - 2);
            $stmt .= " WHERE id='$id'";
        };
        $return_code = $sql->query($stmt, 1);

        if ($this->loaded == 0) {
            # Try to get the ID of the new tuple.
            $stmt = "SELECT MAX(id) AS id FROM
                \"user\" WHERE ";
            foreach ($this->database_fields as $key => $value) {
                if (!is_numeric($key)) {
                    if ($value) {
                        if ($this->modified_fields[$key] == true) {
                            $value = str_replace("'", "\'", $value);
                            $stmt .= "\"$key\" = '$value' AND ";
                        };
                    };
                };
            };
            # Chop last " AND " (superfluous)
            $stmt = substr($stmt, 0, strlen($stmt) - 5);
            error_log($stmt);
            $sql->query($stmt);
            $result_rows = $sql->get_table_hash();
            $proposed_id = $result_rows[0]["id"];
            if ($proposed_id > 0) {
                $this->loaded = 1;
                $this->id = $proposed_id;
                return true;
            } else {
                return false;
            };
        };
        return($return_code);
    }

}
?>


eight different database fields will yield eight separate SELECT queries.

For setting values, this approach is not only inefficient; it may not work at all. Consider an example where you are working on a brand new user. Your first setField() call will need to issue an INSERT statement, since this is a new user and no database row yet exists. The automatically allocated serial number (id) would be captured from the database and retained in the object, and subsequent calls to setField() could safely use an UPDATE statement.

But what if the underlying table had NOT NULL constraints on multiple columns; e.g. both first and last names were considered mandatory fields? The first INSERT, invoked by your first use of setField("first_name", "Ed"), would fail, since you had not provided data for the second required column.

There is, however, an efficient workaround. Rather than relying on the database to retain the user's data throughout the lifetime of the object, we can temporarily cache data in a member variable. We then create a couple of methods, load() and save(), whose job it is to transfer data between the object and the underlying database.

A call to load() issues a SELECT * statement, which collects the value of every column and stores them in our member variable. A subsequent call to getField() can then consult this member variable instead of the database, which is a lot quicker. A call to setField() updates only the member variable, not the database; again, far quicker, and safe. When all the changes are made, a call to save() generates the relevant UPDATE or INSERT statement to bring the database back in sync.
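The caching idea can be tried without a database. The following is a minimal sketch, not the article's User class: FakeUserTable and CachedUser are hypothetical names, and the fake table merely counts simulated round trips where a real implementation would run SELECT, INSERT and UPDATE statements.

```php
<?php
// Hypothetical in-memory stand-in for the "user" table, so the sketch
// runs without a database. Each method counts as one round trip.
class FakeUserTable {
    public $rows = array();     // id => array of column values
    public $queries = 0;        // number of simulated queries
    private $nextId = 1;

    public function select($id) {
        $this->queries++;
        return isset($this->rows[$id]) ? $this->rows[$id] : array();
    }
    public function insert(array $fields) {
        $this->queries++;
        $id = $this->nextId++;
        $this->rows[$id] = $fields;
        return $id;
    }
    public function update($id, array $changed) {
        $this->queries++;
        $this->rows[$id] = array_merge($this->rows[$id], $changed);
    }
}

class CachedUser {
    private $table;
    private $id;
    private $fields = array();   // cached column values
    private $dirty = array();    // columns changed since the last load/save
    private $loaded = false;

    public function __construct(FakeUserTable $table, $id = null) {
        $this->table = $table;
        $this->id = $id;
    }
    public function load() {
        $this->fields = $this->table->select($this->id);
        $this->dirty = array();
        $this->loaded = true;
    }
    public function getField($name) {
        if (!$this->loaded && $this->id) {
            $this->load();                   // one SELECT, then served from the cache
        }
        return isset($this->fields[$name]) ? $this->fields[$name] : null;
    }
    public function setField($name, $value) {
        if (!$this->loaded && $this->id) {
            $this->load();
        }
        $this->fields[$name] = $value;       // touches memory only
        $this->dirty[$name] = true;
    }
    public function save() {
        if (!$this->id) {
            // New user: every cached column goes out in a single INSERT.
            $this->id = $this->table->insert($this->fields);
            $this->loaded = true;
        } else {
            // Existing user: write back only the modified columns.
            $changed = array_intersect_key($this->fields, $this->dirty);
            $this->table->update($this->id, $changed);
        }
        $this->dirty = array();
        return $this->id;
    }
}

$table = new FakeUserTable();
$user = new CachedUser($table);
$user->setField("first_name", "Ed");
$user->setField("last_name", "Lecky-Thompson");
$newId = $user->save();
```

Because both mandatory columns reach the table in the same INSERT, the NOT NULL problem described earlier does not arise, and repeated reads cost a single query.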

There's just one final consideration, and it concerns new users; at registration, for example. We mentioned before that we need to pull out the value of id allocated immediately after the INSERT statement has been executed, so we know which value to record in the User object. Some databases provide a mechanism to do this. MySQL, for example, offers mysql_insert_id(). This is not true of all databases, though. Consult your database's documentation for the preferred method of retrieving this data.

That more or less sums up the architecture of the User class. Let's now look at its implementation, as well as that of the other classes which comprise our toolkit.

Meet the Toolkit

The toolkit consists of two support classes and three "core" classes. The support classes are considered extraneous to the core goal here, which is to provide a robust mechanism for session handling. Nonetheless, they are pretty crucial to our project's success. To a degree, they provide abstraction and make our core classes fairly portable. For example, porting our toolkit to MySQL means changing only dbaccess.php (Listing 2) and sql.php (Listing 3).

The core classes are httprequest.php (Listing 4), httpsession.php (Listing 5), and user.php (Listing 6). The session-focused code is in httpsession.php, so we'll cover this in a little more detail, but be sure to look at the others, as all of these classes work together.

The HTTPSession Class

An important rule to observe regarding our custom session implementation is: never explicitly start a session using PHP's built-in session_start() mechanism. The HTTPSession class's constructor (or, more specifically, the HTTPRequest::getSession() method) takes care of this.

When HTTPSession's constructor is called, any existing session identifier and secondary key are extracted from available cookie values using the getParameter() method of the static HTTPRequest class. If no such parameters are available, the private newSession() method is called to create a new session. If they are available, they are checked against the underlying database for validity; that is, that an appropriate row exists in the user_session table, that it has not expired or timed out, and that the current HTTP User Agent and first two octets of the IP address match those offered when the session was first created.
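The client check in Listing 5 is done inside the SQL query, but the comparison itself is easy to isolate. This sketch is only an illustration: ipPrefix() and clientMatches() are hypothetical helper names, while the ip_address and user_agent keys mirror the column names used in the listing.

```php
<?php
// Hypothetical helper: reduce a dotted IPv4 address to its first two
// octets, e.g. "192.168.10.7" becomes "192.168".
function ipPrefix($remoteAddr) {
    $parts = explode(".", $remoteAddr);
    if (count($parts) < 2) {
        return $remoteAddr;      // not dotted IPv4; compare the whole string
    }
    return $parts[0] . "." . $parts[1];
}

// True when the current request matches the values recorded when the
// session was created.
function clientMatches(array $stored, $remoteAddr, $userAgent) {
    return $stored["ip_address"] === ipPrefix($remoteAddr)
        && $stored["user_agent"] === $userAgent;
}

$stored = array("ip_address" => "192.168", "user_agent" => "ExampleBrowser/1.0");
$sameNetwork  = clientMatches($stored, "192.168.44.9", "ExampleBrowser/1.0");
$otherNetwork = clientMatches($stored, "10.0.0.5", "ExampleBrowser/1.0");
```

Comparing only two octets tolerates clients whose address changes within the same network block, at the cost of a weaker check.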

If the checks pass, the session identifier and secondary key are stored against this instance of the class as private member variables for future use. In addition, the logged_in and user_id columns of the matching database row are consulted to determine whether or not a user is logged into this session and, if so, to fetch the user's identifier.

Finally, the user_session table is updated to set its last_impression property, recording the current time and date. This is important, since future instantiations of HTTPSession will consult this table to determine if a session has expired, and if so, will dispose of it.

The newSession() method will be called in one of two scenarios, and must determine which applies: either no existing primary session identifier and secondary key pair was offered as a request parameter, or such data was offered but is found to be invalid due to expiry, session timeout, or a mismatched secondary key. In the latter case, we must discard the old session data. With that out of the way, we can invent a new session identifier to issue to the client. We generate and record a new primary session identifier against the object as a private member variable, and also use setcookie() to push it to the web browser. We use a similar process to assign and record a secondary key.
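How the identifiers are invented matters, since a guessable value defeats the scheme. The sketch below is an assumption rather than the toolkit's own generator: newToken() is a hypothetical helper, and it relies on random_bytes(), which requires PHP 7 or later.

```php
<?php
// Hypothetical helper. Assumes PHP 7 or later, where random_bytes()
// supplies cryptographically secure random data.
function newToken($bytes = 16) {
    return bin2hex(random_bytes($bytes));   // two hex characters per byte
}

$sessionId    = newToken();
$secondaryKey = newToken();

// In a web request the values would then be sent to the browser, e.g.
// setcookie("session_id", $sessionId, 0, "/");
// The cookie name here is an assumption, not taken from the listings.
```

The same helper serves for both the primary identifier and the secondary key, since each only needs to be long and unpredictable.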

The only task remaining to facilitate a new user session is to record this data to our database.

At some stage throughout the life of your session, your user may wish to log in using a username and password. The login() method should be passed, verbatim, the username and password offered by the user. First, the method sets the logged_in member variable to 0. In other words, if a user was already logged in, they'll be logged out right away. We then hash the password with MD5 and compare the result with the stored hash associated with the passed username. If we have a match, we set the member variable logged_in to 1 and user_id to the matching user's identifier. We also need to update the underlying database.
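Listing 5 compares unsalted MD5 digests. Where PHP 5.5 or later is available, a salted and deliberately slow hash is generally a safer choice. The sketch below swaps in password_hash() and password_verify() for md5(); it is an alternative technique, not the listing's method, and the two wrapper function names are hypothetical.

```php
<?php
// Assumes PHP 5.5 or later for password_hash() and password_verify().
// Store the returned string in place of the md5_pw value.
function hashForStorage($password) {
    return password_hash($password, PASSWORD_DEFAULT);
}

// Compare a submitted password with the stored hash.
function passwordMatches($password, $storedHash) {
    return password_verify($password, $storedHash);
}

$stored = hashForStorage("correct horse");
$ok  = passwordMatches("correct horse", $stored);
$bad = passwordMatches("wrong guess", $stored);
```

The stored column would need to be wide enough for the longer hash string; the PHP manual recommends allowing 255 characters.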

The logout() method is hopefully pretty self-explanatory. Its sole purpose is to log the current user out of your application. It does not delete the session, but updates the database to reflect the session as "logged out." Any user ID associated with the session is zeroed, too.

The setValue() and getValue() methods should also be self-explanatory. They're used to store and fetch session values.

Good housekeeping

There is one housekeeping requirement you need to be aware of when treating sessions in the manner described here.

The table in which you store session data will fill up pretty quickly, so it's prudent to schedule a regularly recurring job (e.g. every 24 hours) that cleans up sessions that are older than your session timeout.
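A cleanup job of this kind boils down to one DELETE statement. The sketch below only builds that statement; buildSessionCleanupSql() is a hypothetical helper, the table and column names are taken from Listing 5, and the interval syntax assumes the same PostgreSQL-style comparisons the listing uses.

```php
<?php
// Hypothetical helper that builds the cleanup statement for the
// "user_session" table of Listing 5.
function buildSessionCleanupSql($maxIdleSeconds, $maxAgeSeconds) {
    $maxIdle = (int) $maxIdleSeconds;   // cast so only integers reach the SQL
    $maxAge  = (int) $maxAgeSeconds;
    return "DELETE FROM \"user_session\" WHERE "
         . "(now() - last_impression > '$maxIdle seconds') OR "
         . "(now() - when_created > '$maxAge seconds')";
}

// Example values: one hour of inactivity, one day maximum age.
$cleanupSql = buildSessionCleanupSql(3600, 86400);
```

A scheduled script could pass this string to the toolkit's sql class. Rows in user_session_variable that belong to deleted sessions would need the same treatment.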

Trying it Out

Before we go, let's very quickly touch on how you might implement these classes in a typical (albeit small) application.

The home page of your application is unlikely to need your user to log in. We could therefore consider it to be a typical unrestricted page. The only requirement, therefore, is to instantiate the session:

$objSession = HTTPRequest::GetSession();

The remainder of your code can follow. Let's say somewhere on your page you decided to link to another page called restricted.php, access to which requires your user to log in.

<a href="restricted.php">View a restricted page</a>

The desired behavior would clearly be that if the user is logged in, he is taken to the page straight away; if he is not logged in, he is taken to a login page, allowed to log in and then, assuming he has logged in correctly, taken to the page he originally requested.

A simple check at the top of a restricted page, prior to the content, allows you to quickly determine whether or not a user is logged in:

$objSession = HTTPRequest::GetSession();
if ($objSession->isLoggedIn() != true) {
    # Encode the original URL so its own query string survives the trip.
    header("Location: login.php?redirectTo=" .
        urlencode($_SERVER["REQUEST_URI"]));
    exit(0);
};

As you can see, we use the isLoggedIn() method of our session object to determine whether or not the user is currently logged in.

If he is, the conditional is allowed to pass, and the restricted content is displayed. If not, a 302 HTTP redirection to our login page takes place, passing as part of that redirect a GET parameter called redirectTo, which contains the URL originally requested. This can then be used by login.php upon successful login to redirect the user back to the page originally requested.
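One detail worth spelling out is that the original URL may carry a query string of its own, so it should be encoded before being appended as a parameter. A minimal sketch follows, with loginUrlFor() as a hypothetical helper name:

```php
<?php
// Hypothetical helper: build the login URL for a given request URI.
// urlencode() keeps "?", "&" and "=" in the original URL from being
// read as part of the login page's own query string.
function loginUrlFor($requestUri) {
    return "login.php?redirectTo=" . urlencode($requestUri);
}

$target = loginUrlFor("/restricted.php?item=5&view=full");
// $target is "login.php?redirectTo=%2Frestricted.php%3Fitem%3D5%26view%3Dfull"
```

PHP decodes the value again when it populates $_GET and $_REQUEST, so login.php sees the original URL intact.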

The login page

There are two valid routes to the login page: either directly (the user selects a "login" link), or as a result of a redirection following an attempt by a logged-out user to access a restricted page.

A simple form allows us to provide a mechanism for entering a username and password, as well as perpetuating (by means of a hidden form variable) the redirection target, which may have been passed by attempts to access restricted pages.

<form method="post" action="login.php">
    <input type="hidden" name="redirectTo"
        value="<?= isset($_REQUEST["redirectTo"]) ? htmlspecialchars($_REQUEST["redirectTo"]) : "" ?>">
    username: <input type="text" name="username">
    password: <input type="password" name="password">
    <input type="submit" value="Log in">
</form>

It's important to perpetuate the value of redirectTo in this manner, since it would otherwise be lost when the form is submitted.

Some simple PHP code allows us to capture form submissions and act on them accordingly:

$objSession = HTTPRequest::GetSession();
if (!empty($_POST["username"])) {
    if ($objSession->login($_POST["username"],
            $_POST["password"]) == true) {
        $strURL = $_REQUEST["redirectTo"];
        if (!$strURL) {
            $strURL = "/home.php";
        };
        header("Location: $strURL");
        exit(0);
    } else {
        # Carry on so the page can report the failure.
        $loginFail = true;
    };
};

As you can see, if our login succeeds, we check to see if the GET/POST parameter redirectTo has a value and, if so, redirect to that URL (i.e. the page the user originally requested). If not, we redirect to the home page by default.
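Because redirectTo arrives from the browser, it is worth checking that it really points back into your own site before redirecting to it. The following is one possible check, offered as a sketch: safeRedirectTarget() is a hypothetical helper, and the rule it applies (local paths only) is an assumption about what your application wants to allow.

```php
<?php
// Hypothetical helper: accept only local paths such as "/restricted.php"
// and fall back to a default for anything else.
function safeRedirectTarget($candidate, $default = "/home.php") {
    if (!is_string($candidate) || $candidate === "") {
        return $default;
    }
    // Reject control characters and backslashes outright.
    if (preg_match('/[\x00-\x1f\\\\]/', $candidate)) {
        return $default;
    }
    // Accept only "/path" forms; "//host" and "http://host" lead elsewhere.
    if ($candidate[0] !== "/" || substr($candidate, 0, 2) === "//") {
        return $default;
    }
    return $candidate;
}

$local    = safeRedirectTarget("/restricted.php?item=5");
$external = safeRedirectTarget("http://example.org/phish");
```

In the login code, the result of this function would be used in place of the raw redirectTo value when building the Location header.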

If the user fails to log in, we set the variable $loginFail to true. How best to act on this is up to you, but it makes sense to test for this value in your script and display an error message like "Sorry, your login failed" should it evaluate to true.

The logout page

It's appropriate, too, to provide a mechanism for users to log out. The code is quite simple:

$objSession = HTTPRequest::GetSession();
$objSession->logout();
header("Location: /home.php");
exit(0);

Notice we redirect the user back to the home page after he has logged out. It's important that whatever page we choose to redirect to has no login requirement. Otherwise, the user will simply be "bounced" straight back to the login page.

Saving and Recalling Variables

Finally, let's take a look at how you might save and recall variables against the HTTP session. Let's say you have shopping basket code which looks like this:

$objBasket = new ShoppingBasket();

$objBasket->AddItem($objProduct);

You could save this shopping basket to the session as follows:

$objSession = HTTPRequest::GetSession();
$objSession->SetValue("BASKET", $objBasket);

A few pages later, this basket could easily be recalled like this:

$objSession = HTTPRequest::GetSession();
$objBasket = $objSession->GetValue("BASKET");
if ($objBasket instanceof ShoppingBasket) {
    $objFirstProduct = $objBasket->GetFirstProduct();
}

Note the use of instanceof to check that retrieving the variable from the underlying database worked. Were we not to perform this check, and the data returned from the database did not unserialize() correctly back to an instance of ShoppingBasket, we would be hit with a runtime error when calling GetFirstProduct().
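The guard can be demonstrated without the session class. In this sketch the ShoppingBasket class is a hypothetical stand-in for your own basket, and restoreBasket() is an invented helper that wraps the same unserialize() and instanceof steps.

```php
<?php
// Hypothetical minimal basket, standing in for a real ShoppingBasket.
class ShoppingBasket {
    public $items = array();
    public function AddItem($item) {
        $this->items[] = $item;
    }
}

// Returns a basket, or null when the stored string does not
// unserialize to a ShoppingBasket.
function restoreBasket($stored) {
    $value = @unserialize($stored);      // false on malformed input
    return ($value instanceof ShoppingBasket) ? $value : null;
}

$basket = new ShoppingBasket();
$basket->AddItem("widget");

$good = restoreBasket(serialize($basket));
$bad  = restoreBasket("not serialized data");
```

Calling a method on $bad without the check would fail, which is exactly the runtime error the instanceof test avoids.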

Where we go from here

By now, you've hopefully got a good idea of how to put together a simple application which makes use of our advanced session toolkit.

There's no reason to stop there, though. Here are a few ideas for a rainy day:

Create a GenericObject base class, and rewrite User and other utility classes to inherit from it.

Create a reusable, generic ShoppingBasket class and integrate it directly into HTTPSession, to avoid having to depend on SetValue() and GetValue() to access the session-wide basket.

Extend your User class to add routines for generating "forgotten password" emails.

Enhance still further the authentication mechanism of the toolkit to make use of "memorable words" and other second-factor mechanisms.

I hope this article has left you with a comprehensive, reusable toolkit that will bring far greater security and flexibility to your applications than PHP's built-in session management.


About the Author

Ed Lecky-Thompson (ed@ashridgenewmedia.com) is founder of Ashridge New Media, a professional development agency based in London, England. Ashridge works almost exclusively in PHP as a preference, and Ed has led development on more than fifty large PHP web applications in the past six years. Ed has also co-authored Professional PHP5, and contributed to Beginning PHP5, both published worldwide by Wrox.

To discuss this article:

http://forums.phparch.com/221

TEST PATTERN

The Never Ending Backlog

by Marcus Baker

You probably dream of clearing your backlog. You would love to experience the joy of declaring a project "done" rather than starting each day with a never ending "to do" list. Isn't this what all that planning and project management was supposed to achieve after all? A controlled, fixed target and nice charts saying when we are going to finish; at least that was the promise. Well, maybe the backlog isn't going away? Just suppose for a minute we embrace it. What does our project look like then...?

REQUIREMENTS
PHP: Any
OS: Any
Other Software: None

Deadlines are the bane of a manager's life. That's not what you expected to hear, was it? Surely they are the problem of the developer who has to honor them. If our managers don't like deadlines, then why can't they stop setting them? Well, just for a minute let's step into the shoes of our manager. We'll pretend that they have to manage the needs of marketing, sales, content management, customer services, accounting, and of course, us.

Now marketing would like to launch the upcoming products as soon as possible and also have a stack of product news that they need published at particular times. The content manager has to set deadlines on the authors for commissioned copy. Sales want a usable site with the flexibility to add special offers, and both sales and marketing need to review the final look of the project. Accounting needs to supply a payment service and also needs to keep track of the project itself. The poor old manager has to balance all of these competing concerns and to manage the whole project with the minimum of risk and cost, or he'll get fired. How do we, the developers, contribute to the final solution? We don't.

We ask for a complete specification up front and say that any change to this plan will be extremely costly. Faced with this ultimatum, the manager has no choice but to add every feature so far, because he is not going to get another chance. Of course, this just results in a jaw-dropping time estimate for completion. The project manager now goes back to the various stakeholders and asks them to strip features. Well, some of the features were speculative anyway, because no one will know for sure until the web site is up and running what the highest value components will be. These and a few more get dropped, but some get dropped because they will be useless in the timescale discussed, no matter how desirable. This is called "opportunity cost" and is a failure before the project is even started.

The now-stressed manager, again, negotiates with our development team regarding the completion date and passes the pressure that they feel on to us. Faced with this pressure the developers will usually "compromise" and agree timescales at the upper limit of what is possible. The deadline now has no chance of achievement.

Actually, it had no chance of happening even without that pressure. Developers, especially me, are incurably optimistic. Try this. Ask a developer how long it would take to implement some page or task. You will get an answer of a few days, I suspect. Ask them again the next day, but this time, ask them to draw out the architecture and explain how a customer would use the new functionality. The result of these discussions will be an estimate that is likely twice the size of the original. The more detail you go into, the longer the estimate becomes. Never trust a developer estimate.

That's probably not the advice our long-suffering manager wants to hear right now. The work seems to be progressing well, but developers are like wizards: they mutter strange incantations that mean nothing to the casual observer. No one really knows how the project is going, although the stakeholders still ask daily. I guess you know the ending and it's not a happy one. None of the stakeholders got all of the features they needed and none of the features that they did receive arrived on time. Even the developers are not happy, and the late hours while the project crashed through repeated deadlines have sapped morale. Not surprisingly, no one believes the manager's deadlines ever again after this. Some even say we'd be better off without managers.

Multitasking is Evil

While the main project has been running, some unofficial work will have been happening at the same time. This usually happens when an urgent issue comes up and the stakeholder has sidestepped the company hierarchy and gone straight to the developers. This is a genuinely effective strategy, as some problems really are urgent and have a high cost if they are not acted upon straight away. Despite our protests about not changing things once specified, we developers are hypocritically happy to accommodate such requests. We like to feel important too, and quickly implementing a feature or fix has a certain thrill. Even more so if the main project is going slowly.

As a company strategy though, this process leaves a lot to be desired. Software developers have no way of ranking the relative importance of these interruptions. Urgent problems always seem more important than they really are, and development time spent fire fighting is time lost implementing the long term strategy.

To make things worse, nothing destroys productivity like interruptions and task switching. Even a minor distraction can lose you twenty minutes while you put yourself back on track mentally. In a culture of multitasking, individual tasks run slower. This means that when other people hand you a task they have to wait longer for the result. This in turn causes them to take on more tasks, each of which gets more finely time sliced. That means other people are left waiting on them. This is the result of a management vacuum.

Let's try to get a grip on the problem. We don't want to multitask. We do want to prioritize, but don't have the knowledge. We are not very good at meeting deadlines. We do want our stakeholders bringing their urgent problems to us. We want to do the important things first, but we want our stakeholders to be able to change their minds. Our stakeholders want to know how far their projects have progressed so that they can adapt. Finally, the stakeholders need to know the relative difficulty and value of each feature so that they can horse trade with each other. Tricky.

Embrace the Backlog

There is a system that meets all of these demands, and it can be done with a pack of index cards. It comes from a methodology called Scrum (http://www.controlchaos.com/) and it's called the "project backlog." Here is how it works.

Because we don't want to multitask, and because nothing has value unless it is finished, our atomic unit will be a feature. We start gathering features by visiting our stakeholders. You, as either a developer or a project manager, ask them for all the features that they think they will need over the next month or two, and each becomes the title of an index card. Write the feature as one line at the top. Don't turn these requests into a wish list, as you will get too many cards, but do capture any option that is potentially valuable to the organization.

The features should be written in the language of the business, not the language of the developer. A good example of a business feature might be "Track requests to the search engine by product to influence future purchasing." This isn't very detailed, but it includes the vital motivation. Extra information can be written on the card so that everyone knows when the task is finished. A poor feature request would be "Users should have a cookie identifying themselves." This is obviously a feature that was really written by the developers. It uses "cookie", which is a technical implementation detail, and no business purpose is given. Because of this, it will suffer scope creep once the implementation process starts, with the developers implementing the feature the way they think it should be done. How much of the site usage must be tracked in this cookie, for example? What information is to be stored in it? Ironically, having the business people write features usually leads to tighter scoping. An example feature request is shown in Figure 1.

The sum total of all the request cards is what makes up the project backlog.

Stakeholders Select, Developers Estimate

Once all of the features have been gathered, the developers get their only influence on the process. Each task is estimated by the developers in "ideal days". These are days that are free of interruptions, illness, holidays, meetings or alcohol abuse. If we attempt to build in a fudge factor for these, we just end up with a wilder prediction, and so we don't try. The estimates will still be overoptimistic of course, but we'll tackle that later. Note that the stakeholders have no say in this. In the same way that the developers are never allowed to make business decisions, the business is never allowed to tell us how long something will take; this is part of the deal. We also don't attempt to exploit commonality across the cards to reduce the estimates. If building a framework in one card would save time in another, we ignore it. You want an estimate for just the specific feature on a given card, as if it were the only thing you were implementing, ever. This keeps the cards as independent as possible, which will help later. Figure 2 shows the modified feature request sitting on top of a backlog for a news site.

The world of business is chaotic. A good plan today may be rendered irrelevant tomorrow. We software developers need more stability than that, but we have to drop our requirement for a complete specification up front and instead come to a compromise. That compromise is the iteration. The business has to set a fixed goal for a fixed period of typically one to four weeks. At the next iteration, the business has the right to completely change direction, but it will be extremely damaging to change tasks within an iteration, and so it is sealed. If intervention has to happen, then the whole iteration should be declared null and another iteration started, to discourage people from sidestepping the process. This means that all of the planning comes to a head in the regular iteration meeting.

The key point of the iteration meeting is that all of the project stakeholders and developers take part in a single conversation. First, the stakeholders take all of the cards and attempt to prioritize them. You can imagine some pretty strong discussions at this point, and that is why we have written them on cards. The stakeholders have something physical to hold and to point at, but more importantly, different orderings can be tried very quickly. If you ask an interested party to rate the importance of the features that they desire, you can guarantee that most of them will be rated "critical". This system avoids such useless labeling. Features are simply compared one on one for their relative business value. The development team keeps out of this discussion unless they are asked for advice, or they need to estimate the amount of time to complete a new feature. The most important features will appear at the top of the backlog, and our iteration plan comes about from chopping the top off. The next problem is how much to chop.

The Velocity

Remember, we don’t believe our own estimates. Instead, we monitor the number of ideal days completed on each iteration and use that figure to plan the next one. The number of ideal days of work completed each time is called the “project velocity” and the ratio of real days to ideal days is called the “load factor”. Now this is our first iteration so we don’t yet have this vital piece of information, but I am sure you would like me to tell you the typical load factor so that you can make a start. OK, I will, but first, are you sitting down? You are? OK then, a typical load factor for a well gelled team is 3. Yes that’s right, software usually takes three times longer to develop than you think it does.
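As a rough illustration of the arithmetic, the sketch below computes a velocity and a load factor from one finished iteration. The function names and the sample figures are assumptions made for this example. The load factor is taken here as real developer-days divided by ideal days completed, which is the reading that yields the typical value of 3 quoted above.

```php
<?php
// Hypothetical helpers; names and sample numbers are illustrative only.

/** Velocity: total ideal days of the cards fully completed in an iteration. */
function projectVelocity(array $completedCardEstimates): float
{
    return (float) array_sum($completedCardEstimates);
}

/** Load factor: real developer-days spent per ideal day completed. */
function loadFactor(int $developers, int $workingDays, float $velocity): float
{
    if ($velocity <= 0) {
        throw new InvalidArgumentException('Velocity must be positive.');
    }
    return ($developers * $workingDays) / $velocity;
}

// Three developers, ten working days, cards estimated at 4 + 3 + 2 + 1 ideal days.
$velocity = projectVelocity([4, 3, 2, 1]);
$factor   = loadFactor(3, 10, $velocity);

printf("Velocity: %.1f ideal days, load factor: %.1f\n", $velocity, $factor);
```

Measuring both numbers at the end of every iteration, rather than guessing them once, is what lets the next plan correct for optimistic estimates.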

The Never Ending Backlog

Figure 1

Figure 2

Page 54: php|architect (May 2005)

Assuming three developers and a two week iteration, we have our iteration plan ready in Figure 3. At this point, I usually point at the cards that fell just below the cut-off in case anyone wants a last minute adjustment of priorities. If all is well then the developers have two weeks of stability and the stakeholders arrived at the plan all on their own. The plan is a small one, so it can swing into action immediately.
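The “chopping the top off” step can be sketched in a few lines. This is only an illustration under stated assumptions: the card titles, the estimates and the function name are invented, and the load factor of 3 is the typical figure mentioned earlier rather than a measured one.

```php
<?php
// Hypothetical sketch: card titles, estimates and the load factor are made up.

/**
 * Take cards from the top of a prioritized backlog until the next card
 * would exceed the ideal days available in the iteration.
 */
function planIteration(array $backlog, int $developers, int $workingDays, float $loadFactor): array
{
    $capacity = ($developers * $workingDays) / $loadFactor; // ideal days available
    $plan = [];
    $used = 0.0;

    foreach ($backlog as $card) {
        if ($used + $card['ideal_days'] > $capacity) {
            break; // the cut-off: everything below stays in the backlog
        }
        $plan[] = $card['title'];
        $used  += $card['ideal_days'];
    }

    return ['capacity' => $capacity, 'used' => $used, 'cards' => $plan];
}

$backlog = [
    ['title' => 'Front page headlines', 'ideal_days' => 4],
    ['title' => 'Article search',       'ideal_days' => 3],
    ['title' => 'Reader comments',      'ideal_days' => 2],
    ['title' => 'RSS feed',             'ideal_days' => 3],
    ['title' => 'Print view',           'ideal_days' => 1],
];

$iteration = planIteration($backlog, 3, 10, 3.0);
print_r($iteration);
```

The loop stops at the first card that does not fit instead of skipping ahead to smaller ones. That keeps the stakeholders’ ordering intact; whether a lower priority card should jump the queue is a business decision for the meeting, not for the code.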

The next cycle is the same as the previous one. During the next two weeks, the stakeholders will write more cards and present them to us for estimation. We won’t be distracted from our current tasks, but instead, will add them to the project backlog. In addition, we publish progress in the current iteration in a very visible way. The easiest way is to pin the cards to a cork board and tick them off as they are done. That way, everyone can see the progress and react if there is a problem. This transparency helps to dissolve any previous blame culture that may have existed after any deadline fiasco.

On successive cycles, the business can adapt to changes and can respond to progress by adding or dropping features. It also has a fast market response, because the organization can change direction, completely, in the space of a few weeks. For everyone to feel this relaxed, though, progress has to be predictable.

The Discipline

If the estimates start to become inaccurate then the backlog system will fall apart. One way that this could happen is with a lack of design, making the code become messy, quickly. As the code base gets larger and more entangled, features take longer and longer to complete and the velocity drops off. The way around this is to improve the design on every iteration, either at the start of each feature or continually in a process called refactoring. It’s important to fight code rot; doing so helps maintain long term trust with the project sponsors.

Another way to make a mess of the estimates is a phased delivery. If you plan to write code for the first few iterations, then test for the last few and then deploy in the last one, you are courting surprises. For consistent timing you should code, test and deploy a feature in its entirety before a backlog card is marked as done. If it’s not completely rolled out then you cannot count those ideal days toward the next iteration. If you were surprised by the load factor of 3 earlier, it is likely that you did not factor in the complete software lifecycle. Having a complete microcosm of this in each iteration gives us the true cost of each feature.

Despite imposing this extra discipline, I find the Scrum backlog a very effective technique for small companies with lots of competing interests. As developers, we feel much more involved in a process such as this, and in turn, we think more flexibly. Not surprisingly, this methodology is very easy to sell to project managers, as well.

Figure 3

About the Author ?>

To Discuss this article:

http://forums.phparch.com/222

Marcus Baker works at Wordtracker (www.wordtracker.com) as Head of Technical, where his responsibilities include the development of applications for mining Internet search engine data. His previous work includes telephony and robotics. Marcus is the lead developer of the SimpleTest project, which is available on Sourceforge. He's also a big fan of eXtreme programming, which he has been practising for about two years.

Page 55: php|architect (May 2005)

PRODUCT REVIEW

This month, I am reviewing a product called Jaws. It is a web development tool created in PHP that is built on frameworks and modules. The Jaws website has this to say about itself:

Jaws is a Framework and Content Management System for building dynamic web sites. It aims to be User Friendly giving ease of use and lots of ways to customize web sites, but at the same time is Developer Friendly, it offers a simple and powerful framework to hack your own modules.

Even though its release level is not yet at the full one-point-zero level (1.0), the Jaws product is still very functional and easy to install. Figure 1 shows the installation screen that is displayed when you activate the install.php script. I have seen this kind of installation process a few times and I like it. The install procedure is itself a PHP application, so even the installation of the product is operating system independent.

The installation process creates a table schema, and adds some entries into a MySQL database that you specify. All you have to do is make sure that the database itself exists, and the install takes care of the rest. The install process only takes 7 steps to complete, so the ease and clarity of this process is also a plus.

In The Shallows

When the install process had completed, I started by looking at the built in administration site. This has a very clean look and feel. The administration page opens up after you sign in with the appropriate authentication (that you set up during the install), and it shows you all

Just when you thought it was safe to go back in the water!

by Peter B. MacIntyre

PRODUCT INFORMATION

PHP: 4+

OS: Any

Product Version: 0.5

Price: Free

Web Address: http://www.jaws-project.com/

Page 56: php|architect (May 2005)

the options that are at your disposal, right away. Some gadgets are not enabled at the outset, but it is just a matter of enabling them and setting their parameters to make them accessible. The Jaws project seems to be lending itself to the current popularity of weblogging, but there are many other “plugins” and “gadgets” available. Figure 2 shows the system control panel with many of the gadgets enabled; the disabled gadgets are on the right-hand side.

This control panel is very user friendly, straightforward, and easy to use. The only drawback to its use was the annoying recurrence of the message shown in Figure 3. I certainly have to forgive the developers for this, as their product is not at the release stage yet, so some glitches are to be expected. There were some other slightly annoying problems in certain locations of the administration interface where textboxes were defined as one column wide. Once these little fixes are taken care of, the product will be very stable and useful.

In the Deeps

Once I got into the control panel and figured out how things were done, I designed my own test web site with an on-line poll section, a blog section, and a photo album section. Figure 4 shows how I set up my first poll question including the possible answers, and Figure 5 shows how the blog section was configured. Once you get the hang of the layout manager, and the other inner workings of the administration section, you will be comfortable managing multiple web sites through this interface.

Other gadgets that are currently available are: Banner controller, Chat box, Friends, Menu Manager, RSS Reader, Static Page Manager, FAQ Manager, File Browser, Glossary, Preferences, Search Tool, Server Time Display, Visit Counter, Weather Displayer, and a Web Cam. That’s a lot of gadgets for a pre-1.0 release!

The “cookie cutter” approach to web design that Jaws takes on is not bad, in itself. There are many options that you can implement in the settings section of the admin area. The default theme is the first place to look. This is a drop down list that lets you give different graphical appearances to your sites. Currently there are only 8 themes. I think this is one of the main areas

Figure 1

Figure 2

Figure 3

Page 57: php|architect (May 2005)

that should really show some growth before Jaws goes to general release. I used the default Jaws theme (shown in Figure 6) and then used the Flower theme (Figure 7) just to see what it looked like. As you can tell, there are many ways to create similar web systems with this product and still give them their own unique look.

The plugins are another subset of the Jaws environment and they can (depending on their use) be applied to the gadgets that you have employed. For example, there is an emoticon (smiley icons) plugin that can be added to the blog gadget so that the bloggers can “say it with a smile” (☺). These plugins are effective as well in adding a little flair to any site that you may build with this product.

Summary (Back to Shore)

I really liked this product! It has a lot of potential in the small “family” type web site arena. I certainly would not recommend it for any large commercial site, but then again I have been proven wrong before. The product certainly has maturing to do and I would love to take another look at it when it reaches 1.0. Another good thing about this tool is that it was written in PHP, thus proving once again that PHP is a very versatile language. As well, Jaws is an open source project, so if you are so inclined you could sign up as a co-developer of the product.

I give this product 4 out of 5 stars.

About the Author ?>

Peter MacIntyre lives and works in Prince Edward Island, Canada. He has been an editor with php|architect since September 2003. Peter’s web site is at http://paladin-bs.com

Jaws 0.5

Figure 5

Figure 6

Figure 7

Figure 4

Page 58: php|architect (May 2005)

Can’t stop thinking about PHP?

Write for us!

Visit us at http://www.phparch.com/writeforus.php

Page 59: php|architect (May 2005)

Remember Me

Have you ever visited a web site and noticed a checkbox that says “Remember Me” directly underneath the login form? This is the most common phrase used to describe this feature to the user, and there are two major implementations:

The user’s username is stored in a cookie, so that the user only has to provide a password in situations where the user would otherwise be required to provide both a username and password.

An authentication cookie is created that allows the user to completely bypass the next authentication. This usually means that the user is automatically logged in on the next visit and often up to a certain number of future visits.

This second type is called a persistent login, and this is the implementation that I have chosen to focus on. Remembering a user’s username in a cookie is easy to do and not as prone to errors as trying to create a persistent login.

Security Corner

Persistent Logins

by Chris Shiflett

Welcome to another edition of Security Corner. This month’s topic is Persistent Logins, a common feature that provides users the option of having a web site remember them across browser sessions. It is most often used as a way to make the authentication process more convenient.

This feature has many variations, and attempts to implement this feature are frequently the cause of security vulnerabilities. This month’s column attempts to provide some guidelines and suggestions for adding a persistent login feature to your web applications without compromising your security standards.

Page 60: php|architect (May 2005)

Persistent Login

Because cookies are the only good source of persistence between browser sessions, they provide the foundation of most persistent login implementations. The most common mistake I have observed when auditing PHP applications that attempt to provide this feature is storing both the username and password in a cookie. It’s easy to understand the temptation: you simply retain the access credentials and basically save the user the trouble of entering them. Of course, this approach has numerous risks, including the fact that the access credentials are subject to a drastic increase in exposure.

Your persistent login cookies should be temporary. As such, they should not be based on any information that provides permanent access. This primarily means that you should not be basing the cookie on the user’s password. The username is far less sensitive, and the username is often public anyway. You can use this in your persistent login cookie to help you identify the user. The challenge, of course, is in authenticating the user with this cookie in such a way that an attacker has a very difficult time reproducing your efforts.

Begin with a simple idea: the authentication token. This is a random string that you associate with a single user, and you can generate a good random string with the following:

$token = md5(uniqid(rand(), true));

Because you need to associate this with a single user, you might be tempted to do so using sessions. While this is a good idea when your purpose is to protect against session hijacking, it doesn’t help you persist logins across browser sessions. Therefore, you need to associate this token with the user in the database, typically in the same place that you keep the username and password.
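The sketch below shows one way the token could be generated and tied to a user record. It is an illustration under assumptions, not the column’s own code: it uses `random_bytes()`, which needs PHP 7 or later, as an alternative source of randomness, and the `$users` array with its `auth_token` field is a hypothetical stand-in for the database table that holds the username and password.

```php
<?php
// Sketch only: $users stands in for the database table that holds the
// username and password; the field names are hypothetical.

/** Generate a 32-character hexadecimal token from a CSPRNG (PHP 7+). */
function generateAuthToken(): string
{
    return bin2hex(random_bytes(16));
}

/** Associate a fresh token with one user and return it. */
function issueAuthToken(array &$users, string $username): string
{
    if (!isset($users[$username])) {
        throw new InvalidArgumentException('Unknown user.');
    }
    $token = generateAuthToken();
    $users[$username]['auth_token'] = $token;
    return $token;
}

$users = ['chris' => ['password_hash' => 'stored-hash-here', 'auth_token' => null]];
$token = issueAuthToken($users, 'chris');
```

Keeping the token next to the user record means a later request can look it up by username alone, which is why the username travels in the cookie as well.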

With this idea, you now have something that can be considered a temporary password. This authentication token can be provided by the user to bypass the authentication step, and you should only allow an authentication token to be used once before it is considered expired.

In order for this to be useful, you need to keep it in a cookie. You also need the user to let you know the username associated with the authentication token, so that you can verify it. A good way to accomplish this is with a single cookie that has both:

setcookie('auth', "$username:$token", time() + 60*60*24*7);

This cookie is set to expire in one week, and the value of it is the username and authentication token separated by a colon.
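On a later request the cookie value has to be taken apart again. A minimal sketch follows, with hypothetical helper names; it assumes the token never contains a colon (true for a hexadecimal token) and therefore splits on the last colon, so that a username containing a colon does not break parsing.

```php
<?php
// Hypothetical helpers for the "username:token" cookie format.

function buildAuthCookieValue(string $username, string $token): string
{
    return $username . ':' . $token;
}

/**
 * Split on the LAST colon, so a username that itself contains a colon
 * still parses, assuming the token never contains one.
 * Returns [username, token], or null when the value is malformed.
 */
function parseAuthCookieValue(string $value): ?array
{
    $pos = strrpos($value, ':');
    if ($pos === false || $pos === 0 || $pos === strlen($value) - 1) {
        return null;
    }
    return [substr($value, 0, $pos), substr($value, $pos + 1)];
}

$value  = buildAuthCookieValue('chris', 'a3f1c2');
$parsed = parseAuthCookieValue($value);
$bad    = parseAuthCookieValue('no-separator');
```

The cookie is user-supplied input, so a malformed value is simply treated as “no persistent login” and the visitor falls back to the normal login form.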

Note: Because the expiry of a cookie depends on the user’s computer having an accurate clock, you might consider having it expire in the distant future and keeping up with when you want it to actually expire on the server. A good place for this is in the same data store in which you associate an authentication token with a user.

If you implement this feature, there are some additional rules to follow. One is to never allow an authentication token to be used more than once. If you want a user who checks “Remember Me” to be remembered for a period of two weeks, for example, you will simply want to generate a new token after each authentication, and you can then set a new persistent login cookie.
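The single-use rule and the server-side expiry can be combined in one check. The following is a sketch under assumptions: the `$users` array and its `auth_token` and `auth_expires` fields are hypothetical stand-ins for the data store, `random_bytes()` needs PHP 7 or later, and the two-week window is left unchanged when the token is rotated, which is only one possible reading of the rule.

```php
<?php
// Sketch only: $users stands in for the user table; field names are hypothetical.

/**
 * Check a presented token against the stored one. On success the token is
 * replaced, so each token is accepted once; the new token is returned for
 * the next cookie. Returns null when the login must fall back to a password.
 */
function redeemAuthToken(array &$users, string $username, string $token, int $now): ?string
{
    $user = $users[$username] ?? null;
    if ($user === null || !is_string($user['auth_token'] ?? null)) {
        return null;
    }
    // Expiry is tracked on the server rather than trusted to the cookie.
    if ($now >= $user['auth_expires']) {
        return null;
    }
    // hash_equals() compares in constant time (PHP 5.6+).
    if (!hash_equals($user['auth_token'], $token)) {
        return null;
    }
    $newToken = bin2hex(random_bytes(16));
    $users[$username]['auth_token'] = $newToken;
    return $newToken;
}

$now   = 1000000;
$users = ['chris' => ['auth_token' => 'old-token', 'auth_expires' => $now + 14 * 24 * 60 * 60]];

$first  = redeemAuthToken($users, 'chris', 'old-token', $now);     // succeeds, rotates
$replay = redeemAuthToken($users, 'chris', 'old-token', $now + 1); // rejected: already used
$late   = redeemAuthToken($users, 'chris', (string) $first, $now + 15 * 24 * 60 * 60); // rejected: expired
```

After a successful call, the returned token would go into a fresh persistent login cookie, so a copied cookie stops working as soon as the legitimate user’s next visit rotates the token.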

Another good rule to follow is to require that the user provide a password for any sensitive transaction. The persistent login should only grant access to the features of your site that aren’t considered to be extremely sensitive. There is simply no substitute for requiring a user to verify his password immediately before an important transaction.
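One way to enforce this rule is to record when the password was last verified and to check that record before anything sensitive. The sketch below is an assumption-laden illustration: the session keys, the function names and the five-minute window are invented for the example, and `password_hash()` / `password_verify()` require PHP 5.5 or later.

```php
<?php
// Sketch only: the session keys and the five-minute window are assumptions.

/**
 * A sensitive action is allowed only when the password was verified
 * recently in this session; a persistent login alone never qualifies.
 */
function maySensitiveActionProceed(array $session, int $now, int $window = 300): bool
{
    $verifiedAt = $session['password_verified_at'] ?? null;
    return is_int($verifiedAt) && ($now - $verifiedAt) <= $window;
}

/** Record a successful password check in the session data. */
function confirmPassword(array &$session, string $password, string $storedHash, int $now): bool
{
    if (!password_verify($password, $storedHash)) {
        return false;
    }
    $session['password_verified_at'] = $now;
    return true;
}

$hash    = password_hash('correct horse', PASSWORD_DEFAULT);
$session = ['username' => 'chris', 'via' => 'persistent_login'];
$now     = time();

$before = maySensitiveActionProceed($session, $now);               // not yet verified
$wrong  = confirmPassword($session, 'guess', $hash, $now);         // wrong password
$ok     = confirmPassword($session, 'correct horse', $hash, $now); // verified
$after  = maySensitiveActionProceed($session, $now);               // now allowed
```

In a real application `$session` would be `$_SESSION`, and the window could be shrunk to zero to demand the password for every sensitive request.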

Last, you want to make sure that a user who logs out is really logged out. This includes deleting the persistent login cookie:

setcookie('auth', 'DELETED!', time());

This overwrites the cookie with a useless value and also sets it to expire immediately. Thus, a user whose clock causes this cookie to persist should still be effectively logged out.
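Clearing the stored token at the same time goes one step beyond what is described above, and is offered here only as a hedged suggestion: with the server-side copy gone, a cookie that lingers in the browser can no longer match anything. The `$users` array and the function name are hypothetical.

```php
<?php
// Sketch only: $users stands in for the user table; field names are hypothetical.

/**
 * Forget the stored token so that a lingering cookie can no longer be
 * redeemed, and return the arguments for the setcookie() call that
 * overwrites the cookie in the browser.
 */
function forgetPersistentLogin(array &$users, string $username, int $now): array
{
    if (isset($users[$username])) {
        $users[$username]['auth_token'] = null;
    }
    return ['auth', 'DELETED!', $now];
}

$users = ['chris' => ['auth_token' => 'some-token']];
$args  = forgetPersistentLogin($users, 'chris', time());

// In a real request handler, before any output is sent:
// setcookie(...$args);
```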

Until Next Time...

Hopefully, you now see how you can provide this useful feature to your users without placing them at an unnecessary risk. Persistent logins are very convenient, but without proper guidance, they can create major security vulnerabilities.

If you have any features that you want to add to your PHP applications, but you’re concerned about the security implications, feel free to drop me a line, and perhaps I’ll discuss the issue in depth in a future Security Corner. Until next month, be safe.

About the Author ?>

Chris Shiflett is an internationally recognized expert in the field of PHP security and the founder and President of Brain Bulb, a PHP consultancy that offers a variety of services to clients around the world. Chris is a leader in the PHP industry, and his involvement includes being the founder of the PHP Security Consortium, the founder of PHPCommunity.org, a member of the Zend PHP Advisory Board, and an author of the Zend PHP Certification. A prolific writer, Chris has regular columns in both PHP Magazine and php|architect. He is also the author of the HTTP Developer's Handbook (Sams) as well as the highly anticipated PHP Security (O'Reilly). You can contact him at shiflett@php.net or visit his web site at http://shiflett.org/.

Page 61: php|architect (May 2005)

*By signing this order form, you agree that we will charge your account in Canadian dollars for the “CAD” amounts indicated above. Because of fluctuations in the exchange rates, the actual amount charged in your currency on your credit card statement may vary slightly.

Choose a Subscription type:

Canada/USA $ 77.99 CAD ($59.99 US*)

International Air $105.19 CAD ($80.89 US*)

Combo edition add-on $ 14.00 CAD ($10.00 US)

(print + PDF edition)

Your charge will appear under the name "Marco Tabini & Associates, Inc." Please allow up to 4 to 6 weeks for your subscription to be established and your first issue to be mailed to you.

*US Pricing is approximate and for illustration purposes only.

php|architect Subscription Dept.
P.O. Box 54526
1771 Avenue Road
Toronto, ON M5M 4N5
Canada

Name: ____________________________________________

Address: _________________________________________

City: _____________________________________________

State/Province: ____________________________________

ZIP/Postal Code: ___________________________________

Country: ___________________________________________

Payment type: VISA Mastercard American Express

Credit Card Number:________________________________

Expiration Date: _____________________________________

E-mail address: ______________________________________

Phone Number: ____________________________________

Visit: http://www.phparch.com/print for more information or to subscribe online.

Signature: Date:

To subscribe via snail mail - please detach/copy this form, fill it out and mail to the address above or fax to +1-416-630-5057

php|architect
The Magazine For PHP Professionals

You’ll never know what we’ll come up with next

Upgrade to the

Print edition

and save!

For existing

subscribers

Login to your account

for more details.

LOWER PRICE!

NEW

Page 62: php|architect (May 2005)

EXIT(0);

Am I the only one who has gotten tired of seeing every other presentation about PHP start with the same three slides?

So PHP is used by millions of people worldwide. According to recent statistics, thirteen million people in the world use cocaine on a regular basis, but to me, that hardly seems like a good reason to start doing hard drugs.

So Yahoo! uses PHP. That’s great. Except that if after three years we still use that as the prime example of how far PHP has come, we’re in more trouble than any of us is willing to admit. Don’t get me wrong, the fact that a company like Y! uses PHP is great, but it’s not the example for all seasons: the IT manager of a hundred-year-old bank that is making its foray into online account management isn’t going to be impressed by its name.

And that’s not all. If anybody ever asks me, again, whether they should choose Java or PHP, I promise I’m going to plug his USB mouse in the power outlet and then fix his computer so that the only application he can run is Photoshop (without the keyboard).

As I write this column, I am getting ready to leave for php|tropics, and am just putting the finishing touches on my presentation for the conference (as well as on about 28,754 other things that need finishing before I can leave the country, which probably accounts for my exceedingly cheerful mood).

This time, I am determined to show that there is life in PHP-land beyond Netcraft, Yahoo! and the evergreen war of the worlds against those coffee-bean aliens. It’s not that difficult really, if one digs deep enough, to find out that the key to the acceptance of PHP is not in name dropping or statistics, but in the value that PHP consultants can bring to the table.

Consider this: would you buy a Rolex from a street vendor? Probably not. Why? You don’t know whether the Rolex is authentic or whether the guy found it in a box of cereal, but you don’t trust the deal in principle… because you don’t trust the person at the other end of the transaction. The same thing happens in software: most likely, you’ll be dealing with someone who doesn’t have the slightest clue about technology, and what he’s really doing is sizing you up (as well as your ability to sell an idea).

A good friend of mine, who tried in vain to teach me how to sell things, often told me that the key to good salesmanship is all about putting yourself in the other person’s shoes. Now picture someone coming to your office tomorrow to sell you a new Internet line only to waste half an hour of your life telling you that they use the same brand of routers and switches that NASA uses, and how impressive that is. Impressive, maybe, but you don’t really care: you want to know that their services are reliable and competitively priced, and you know that a Ferrari in the hands of an idiot is just as likely to crash as a Tercel.

Selling IT services (at least to small- to medium-size businesses) should really be the same, and so many people still do it the wrong way. Maybe we can all start with a Dead Poets’ Society moment of our own and delete those three slides from our presentations, never to be seen again on the face of the Earth! I’ll start with my presentation… you’re welcome to follow.

Oh No, Not Again!

by Marco Tabini

php|a