Implementing a JSR-283 Content Repository in PHP

Upload: karsten-dambekalns

Post on 08-May-2015

DESCRIPTION

Session on implementing a JSR-283 Content Repository in PHP presented at the PHP Conference in Québec, Canada in March 2008.

TRANSCRIPT

Page 1: Implementing a JSR-283 Content Repository in PHP

Implementing a JSR-283 Content Repository in PHP

Page 2: Implementing a JSR-283 Content Repository in PHP

Inspiring people to share

Introduction

Your flight plan

About me

Some words about the project’s background

What is a Content Repository?

Why should I use a CR?

Why code it ourselves?

Inside the TYPO3 CR

Where do we stand? Our plans for the future...

Page 3: Implementing a JSR-283 Content Repository in PHP

Introduction

About me

Born 1977, living (mostly) in Germany

Started out with BASIC on a Commodore 128

Now a PHP addict open to other languages as well

Active member of the TYPO3 Association

Developer with the TYPO3 project

Page 4: Implementing a JSR-283 Content Repository in PHP

Introduction

Project background

Page 5: Implementing a JSR-283 Content Repository in PHP

Introduction

About TYPO3

One of the leading open-source CMS

Invented by Kasper Skårhøj in 1997

Written in PHP, released under GPL in 2000

Now used by small and large companies around the world

Hundreds of thousands of websites built with TYPO3

Backed by a huge community and the TYPO3 Association

Page 6: Implementing a JSR-283 Content Repository in PHP

Introduction

The future of TYPO3

The current architecture of TYPO3 is becoming outdated

We decided to write TYPO3 5.0 – it soon became clear that we'd do more than "just write a new CMS"

We started with some groundwork, resulting in the FLOW3 framework – more on that in a minute

We decided to use a CR for the new version

And of course we still have the ultimate goal to come up with a new TYPO3 CMS...

Page 7: Implementing a JSR-283 Content Repository in PHP

Introduction

TYPO3 5.0 CMS

Successor to TYPO3 v4, which is the result of 10 years of development

Start from scratch, but keep the soul of TYPO3

Shall

offer lower complexity

make use of advanced PHP features

be more (and more easily) extensible, ...

Page 8: Implementing a JSR-283 Content Repository in PHP

What happened so far

FLOW3

Provides an advanced programming framework with support for

Dependency Injection / Inversion of Control

Aspect Oriented Programming

Component and Package Management

enhanced Reflection

Caching

MVC and more

Page 9: Implementing a JSR-283 Content Repository in PHP

What happened so far

FLOW3

“Best of breed”:

Inspired by the most popular frameworks and toolkits from Smalltalk, Python, Ruby and Java

Picking the best concepts, skipping the annoyances

Not tied to TYPO3 CMS, can be used for any PHP-based project

Important to you!?

Have a look at the website at flow3.typo3.org

Page 10: Implementing a JSR-283 Content Repository in PHP

Introduction

About the TYPO3 Association

Founded in November 2004 by a group around Kasper Skårhøj

Its goals:

Support TYPO3 development on a more steady basis

Improve the transparency and efficiency of various aspects of the TYPO3 project

Is funded by members and sponsors

Financed the development of TYPO3 v5 and related projects until now

Page 11: Implementing a JSR-283 Content Repository in PHP

What is a Content Repository?

Page 12: Implementing a JSR-283 Content Repository in PHP

What is a CR?

Jackrabbit says

A content repository is a hierarchical content store with support for structured and unstructured content

In addition to a hierarchically structured storage system, common services of a content repository are versioning, access control, full text searching, and event monitoring

Typical applications that use content repositories include content management, document management, and records management systems

Page 13: Implementing a JSR-283 Content Repository in PHP

What is a CR?

But it is for Java, no?

The Java Community Process (JCP) is very efficient, not only when compared to other standardization bodies

The Java Specification Request (JSR) 170 led to the specification

Content Repository for Java technology API (JCR) is the result

First JSR with a real open source license (Apache-style)

The API is defined in Java, but can be ported to other languages

No, it’s not only for Java!

Page 14: Implementing a JSR-283 Content Repository in PHP

What is a CR?

Nodes and Properties

A Content Repository (CR) allows the storage and retrieval of arbitrary content as nodes and properties in a tree structure
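The idea of nodes and properties in a tree can be pictured with a few lines of code. This is only a minimal in-memory sketch in Python: the class `Node` and the methods `add_node` and `get_node` are hypothetical names chosen for illustration, loosely modelled on the concepts above, and are not the JSR-283 API or the TYPO3 CR code.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Node:
    """Hypothetical in-memory node: a name, properties, and child nodes."""
    name: str
    properties: Dict[str, Any] = field(default_factory=dict)
    children: Dict[str, "Node"] = field(default_factory=dict)

    def add_node(self, name: str) -> "Node":
        # Create a child node and return it so calls can be chained.
        child = Node(name)
        self.children[name] = child
        return child

    def get_node(self, rel_path: str) -> Optional["Node"]:
        # Resolve a relative path such as "news/first" one segment at a time.
        node: Optional["Node"] = self
        for segment in rel_path.split("/"):
            node = node.children.get(segment)
            if node is None:
                return None
        return node


# Build a tiny tree: (root) -> news -> first, and attach properties to a node.
root = Node("")
article = root.add_node("news").add_node("first")
article.properties["title"] = "Hello repository"
article.properties["published"] = True
```

Content lives in the properties, structure lives in the tree; a lookup such as `root.get_node("news/first")` walks the tree by name.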

Page 15: Implementing a JSR-283 Content Repository in PHP

What is a CR?

Workspaces

A repository can contain multiple independent workspaces that can correspond to each other, allowing comparison

Page 16: Implementing a JSR-283 Content Repository in PHP

What is a CR?

The Basics

The tree structure can be freely defined by the user of the CR

Nodes may be typed with a rigid structure – or free-form

The API abstracts the actual data storage used (RDBMS, ODBMS, files, ...)

Binary content can be stored and queried as effectively as textual content

Export to and import from XML are possible

Versioning, locking, transactions, event listeners, ...

Page 17: Implementing a JSR-283 Content Repository in PHP

Why use a CR?

Page 18: Implementing a JSR-283 Content Repository in PHP

Why use a CR?

Best of both^Wthree worlds...

Page 19: Implementing a JSR-283 Content Repository in PHP

Why use a CR?

Isn’t that convincing?

Page 20: Implementing a JSR-283 Content Repository in PHP

Why use a CR?

From a coder’s perspective

One well-designed API instead of different ones

Common language and concepts

Properties instead of fields give flexibility

Learn once, use everywhere

Portable code allows easier reuse of existing solutions

Rich set of tools

No more SQL!

Page 21: Implementing a JSR-283 Content Repository in PHP

Why use a CR?

Summary

A content repository provides robust storage for your content, be it text, images, or code, structured or unstructured

Knowledge and tools can be reused at will

A Content Repository (CR) promises to solve a lot of problems

A stable standard with a fresh version in the making

SQL has been around for 35+ years, CR has “just started”

Page 22: Implementing a JSR-283 Content Repository in PHP

Why code a CR in PHP?

Page 23: Implementing a JSR-283 Content Repository in PHP

Why code a CR in PHP?

Page 24: Implementing a JSR-283 Content Repository in PHP

Why code a CR in PHP?

No, really...

There are better reasons, of course!

Page 25: Implementing a JSR-283 Content Repository in PHP

Why code a CR in PHP?

Existing implementations

Jackrabbit is the reference implementation, available as open source from the Apache Software Foundation

Day CRX is the commercial CR implementation from the "inventor" of JSR-170, Day Software

Other implementations include eXo JCR and Jeceira (the latter being dead)

JSR-170 connectors exist for Alfresco, BEA Portal Server, IBM Domino and others

Page 26: Implementing a JSR-283 Content Repository in PHP

Why code a CR in PHP?

PHP ports of the JSR-170/283 API

What about PHP?

Travis Swicegood ported the JSR-170 API to PHP in 2005 - project is dead

There is a port of the JSR-170 API available in the Jackrabbit sources, added 2005 - no relevant changes since then

No full port of the JSR-283 API available today

Page 27: Implementing a JSR-283 Content Repository in PHP

Why code a CR in PHP?

What about using what’s there?

We tried to integrate Jackrabbit using the PHP-Java-Bridge

(Almost) every call to Jackrabbit needs to be wrapped for type conversion, exception mapping, ...

We ran into massive memory issues

More complex to set up and maintain

A dependency on Java is a no-go (not only) for our PHP-based project

Page 28: Implementing a JSR-283 Content Repository in PHP

Why code a CR in PHP?

Summary

Various implementations exist, mostly in Java

A CR offers a truckload of advantages, and we want to leverage them

No PHP implementation of a CR exists

Using existing non-PHP implementations isn’t an alternative

We need to build our own CR

Page 29: Implementing a JSR-283 Content Repository in PHP

TYPO3 Content Repository

Page 30: Implementing a JSR-283 Content Repository in PHP

The TYPO3 CR

Three truths about the TYPO3 CR

Goal is a pure PHP implementation of JSR-283

although functionality needed for TYPO3 CMS has priority over specification compliance for now

Will take advantage of the FLOW3 framework, but not be tied to the TYPO3 CMS

Could eventually become the standard CR for the PHP community?!

Page 31: Implementing a JSR-283 Content Repository in PHP

The TYPO3 CR

Porting the JSR-283 API

Issues

Typing: some Java types simply do not exist in PHP

Constructor overloading is impossible in PHP

Binary data (might be FLOW3 Resource Manager handles instead of streams)

Interfaces will not be ported up-front, but as we need them

Useful by-product of our development process

Page 32: Implementing a JSR-283 Content Repository in PHP

The TYPO3 CR

Development model

Based on the FLOW3 Framework

Domain Driven Design (will be) used

Use of AOP planned to avoid tight internal coupling

Test Driven Development with Continuous Integration

Automatic checks against coding guidelines

Page 33: Implementing a JSR-283 Content Repository in PHP

The TYPO3 CR

Aspect Oriented Programming

AOP is a programming paradigm

Not a new concept, but still new to PHP

Complements OOP by separating concerns to improve modularization

OOP modularizes concerns: methods, classes, packages

AOP addresses cross-cutting concerns

Page 34: Implementing a JSR-283 Content Repository in PHP

Aspect Oriented Programming

Cross-cutting concerns

Content Repository

Domain Model

Page 35: Implementing a JSR-283 Content Repository in PHP

Aspect Oriented Programming

Cross-cutting concerns

Content Repository

Domain Model

Page 36: Implementing a JSR-283 Content Repository in PHP

Aspect Oriented Programming

Cross-cutting concerns

Content Repository

Domain Model

Security

Logging

Page 37: Implementing a JSR-283 Content Repository in PHP

Aspect Oriented Programming

How AOP sounds

Some language first

Aspects contain advices that you want to add to your software

Pointcuts expressed by pointcut expressions define where to add advices to your code

Join points are events in the flow of a program, such as calling a method or throwing an exception

Targets are the classes and methods being advised by aspects

Page 38: Implementing a JSR-283 Content Repository in PHP

Aspect Oriented Programming

How AOP works

Three steps to AOP use

Write the code for the cross-cutting concern

Define a pointcut expression telling the framework where to add that code

Get some coffee

The (hard) work is to identify the cross-cutting concerns

and to define the simplest possible pointcut expression

Page 39: Implementing a JSR-283 Content Repository in PHP

Aspect Oriented Programming

Example: Logging

It might be good to know who deleted the mail archive of the last four years

Logging could solve this

A logging aspect added at the right places solves this easily

Using AOP

makes changing the logging a snap

keeps the code clean
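The logging example can be sketched in a few lines. What follows is a minimal illustration in Python, not FLOW3's AOP framework: the names `weave`, `logging_advice` and `MailArchive` are hypothetical, the "pointcut expression" is assumed to be a plain regular expression over `Class.method` names, and a list stands in for a real logger.

```python
import functools
import re

log = []  # stands in for a real logger so the effect is easy to inspect


def logging_advice(join_point, args):
    # The advice: code for the cross-cutting concern, kept out of the target class.
    log.append(f"{join_point} called with {args}")


def _advised(method, join_point, advice):
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        advice(join_point, args)  # run the advice before the original method
        return method(self, *args, **kwargs)
    return wrapper


def weave(cls, pointcut, advice):
    # Wrap every method of cls whose "Class.method" name matches the pointcut.
    for name, method in list(vars(cls).items()):
        join_point = f"{cls.__name__}.{name}"
        if callable(method) and re.fullmatch(pointcut, join_point):
            setattr(cls, name, _advised(method, join_point, advice))
    return cls


class MailArchive:
    """The target: it contains no logging code at all."""

    def __init__(self):
        self.mails = ["2005", "2006", "2007"]

    def delete(self, year):
        self.mails.remove(year)

    def count(self):
        return len(self.mails)


# The pointcut expression decides where the advice is added.
weave(MailArchive, r"MailArchive\.delete.*", logging_advice)
```

Changing what gets logged means editing the advice or the pointcut expression; `MailArchive` itself stays untouched, which is the "keeps the code clean" point.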

Page 40: Implementing a JSR-283 Content Repository in PHP

Aspect Oriented Programming

Example: Security

It would have been even better to not allow deletion of the mail archive of the last four years...

Security is a complex issue, solving this “right, now” seems impossible

Using AOP

makes changing the security code easier

allows adding security everywhere, anytime

keeps the code clean

Page 41: Implementing a JSR-283 Content Repository in PHP

The TYPO3 CR

Actual data storage

The underlying storage of the TYPO3CR will be an RDBMS in most cases

Currently PDO is used to access SQLite

Easy to use for development and unit testing

The use of PDO already enables any PDO-supported database

Specialized DB connectors will follow, using optimized queries, stored procedures, ...

Page 42: Implementing a JSR-283 Content Repository in PHP

Actual data storage

Data storage techniques

Basically we need to store a simple tree

Read access must be fast, write access should be fast, as the majority of requests are read requests

Traditional approach as used in TYPO3 today is to store a triplet (uid, pid, sorting), resulting in an adjacency list

Alternative & sometimes faster methods

Materialized Path

Nested sets, Nested intervals
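To make the adjacency list concrete, here is a small sketch using Python's built-in `sqlite3` module. The table name `nodes`, the `name` column and the sample rows are assumptions for illustration; only the (uid, pid, sorting) triplet comes from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE nodes (uid INTEGER PRIMARY KEY, pid INTEGER, "
    "sorting INTEGER, name TEXT)"
)
conn.executemany(
    "INSERT INTO nodes (uid, pid, sorting, name) VALUES (?, ?, ?, ?)",
    [
        (1, 0, 0, "root"),
        (2, 1, 10, "news"),
        (3, 1, 20, "about"),
        (4, 2, 10, "first"),
        (5, 2, 20, "second"),
    ],
)


def children(uid):
    # Direct children are one cheap lookup, ordered by the sorting column.
    rows = conn.execute(
        "SELECT name FROM nodes WHERE pid = ? ORDER BY sorting", (uid,)
    )
    return [name for (name,) in rows]


def descendants(uid):
    # A whole subtree needs one step per tree level (here a recursive query).
    rows = conn.execute(
        """
        WITH RECURSIVE subtree(uid, name) AS (
            SELECT uid, name FROM nodes WHERE pid = ?
            UNION ALL
            SELECT n.uid, n.name FROM nodes n JOIN subtree s ON n.pid = s.uid
        )
        SELECT name FROM subtree ORDER BY uid
        """,
        (uid,),
    )
    return [name for (name,) in rows]
```

Writes are trivial (insert one row), but reading anything deeper than direct children requires walking the tree level by level, which is what the alternative methods try to avoid.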

Page 43: Implementing a JSR-283 Content Repository in PHP

Actual data storage

Nested sets

Better suited to how RDBMS work internally

Stores numbers determined by preorder tree traversal

Very fast read access, problematic write access

Concurrency demands locking

On average half of all nodes need to be updated on insertion of a new node
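The trade-off between fast reads and expensive writes shows up directly in a small sketch. The column names `lft` and `rgt`, the sample tree and the helper names below are assumptions for illustration, using Python's built-in `sqlite3`.

```python
import sqlite3

tree = {"root": ["news", "about"], "news": ["first", "second"]}


def number(tree, name, counter=1, out=None):
    # Preorder walk: assign lft on the way down and rgt on the way back up.
    out = {} if out is None else out
    lft = counter
    counter += 1
    for child in tree.get(name, []):
        counter = number(tree, child, counter, out)[1]
    out[name] = (lft, counter)
    return out, counter + 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (name TEXT PRIMARY KEY, lft INTEGER, rgt INTEGER)")
numbers, _ = number(tree, "root")
conn.executemany(
    "INSERT INTO nodes VALUES (?, ?, ?)",
    [(n, l, r) for n, (l, r) in numbers.items()],
)


def descendants(name):
    # Reading a subtree is a single range query, no recursion needed.
    lft, rgt = conn.execute(
        "SELECT lft, rgt FROM nodes WHERE name = ?", (name,)
    ).fetchone()
    rows = conn.execute(
        "SELECT name FROM nodes WHERE lft > ? AND rgt < ? ORDER BY lft", (lft, rgt)
    )
    return [n for (n,) in rows]


def append_child(parent, name):
    # Writing is the costly part: every number at or after the insertion
    # point has to be shifted by two to make room for the new node.
    (rgt,) = conn.execute(
        "SELECT rgt FROM nodes WHERE name = ?", (parent,)
    ).fetchone()
    shifted = conn.execute(
        "UPDATE nodes SET rgt = rgt + 2 WHERE rgt >= ?", (rgt,)
    ).rowcount
    conn.execute("UPDATE nodes SET lft = lft + 2 WHERE lft > ?", (rgt,))
    conn.execute("INSERT INTO nodes VALUES (?, ?, ?)", (name, rgt, rgt + 1))
    return shifted
```

In this five-node tree, appending one child below `news` already rewrites three existing rows; the two `UPDATE` statements are also the reason concurrent writers need locking.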

Page 44: Implementing a JSR-283 Content Repository in PHP

Actual data storage

Speeding up nested sets!?

Write access can be sped up by various approaches like spacing and variable length indices for the pre/post numbers or by partitioning the data over more tables

Materialized path works like adjacency list and stores the full path to the node

Nested intervals are sometimes considered OMPM – “Obfuscated Materialized Path Method”

All methods have their (dis-)advantages

Finally: DB-specific tricks change the problem!
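For comparison, a materialized path can be sketched as follows. The table layout, the sample paths and the helper names are assumptions for illustration, again using Python's built-in `sqlite3`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (path TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO nodes VALUES (?)",
    [("/",), ("/news",), ("/news/first",), ("/news/second",), ("/about",)],
)


def descendants(path):
    # All descendants share the node's path as a prefix, so one LIKE suffices.
    # Names containing the LIKE wildcards % or _ would need escaping.
    prefix = path.rstrip("/") + "/"
    rows = conn.execute(
        "SELECT path FROM nodes WHERE path LIKE ? AND path <> ? ORDER BY path",
        (prefix + "%", path),
    )
    return [p for (p,) in rows]


def ancestors(path):
    # Ancestors can be derived from the path string without touching the DB.
    if path == "/":
        return []
    parts = path.strip("/").split("/")
    return ["/"] + ["/" + "/".join(parts[:i]) for i in range(1, len(parts))]
```

Inserting a node touches only one row; the price is paid when a subtree is moved or renamed, because every stored path below it has to be rewritten.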

Page 45: Implementing a JSR-283 Content Repository in PHP

The TYPO3 CR

Querying the TYPO3 CR

Level 1 methods

Using getRootNode() and friends from the API

Using XPath queries

Optional methods

Using SQL queries

With JSR-283 XPath will be dropped

Page 46: Implementing a JSR-283 Content Repository in PHP

Querying the TYPO3 CR

XPath support for TYPO3CR

To enable XPath we need

an XPath parser

an efficient way to transform an XPath query into SQL for the low-level data structure in use

The latter is a lot easier when storing the tree as a nested set

The problems caused by this have been mentioned already...

With JSR-283 XPath will be dropped

Page 47: Implementing a JSR-283 Content Repository in PHP

XPath support for TYPO3CR

Pre/Post Plane Encoding

Stores numbers determined by preorder and postorder tree traversal

Allows partitioning the nodes into four regions, as shown for node ƒ

Very fast read access, e.g. a single SELECT to query all ancestors of a node ƒ:

SELECT * FROM nodes WHERE pre < ƒ.pre AND post > ƒ.post
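The four regions follow mechanically from the two ranks. The sketch below computes preorder and postorder ranks for a small sample tree and runs the ancestor query from the slide against SQLite; the tree, the node names `a` to `f` and the helper names are assumptions for illustration, with node `f` playing the role of ƒ.

```python
import sqlite3

tree = {"a": ["b", "e"], "b": ["c", "d"], "e": ["f"]}


def pre_post(tree, root):
    # One depth-first walk yields both ranks: pre is assigned when a node is
    # entered, post when it is left.
    ranks, pre, post = {}, 0, 0

    def visit(name):
        nonlocal pre, post
        my_pre = pre
        pre += 1
        for child in tree.get(name, []):
            visit(child)
        ranks[name] = (my_pre, post)
        post += 1

    visit(root)
    return ranks


ranks = pre_post(tree, "a")
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (name TEXT PRIMARY KEY, pre INTEGER, post INTEGER)")
conn.executemany(
    "INSERT INTO nodes VALUES (?, ?, ?)",
    [(n, p, q) for n, (p, q) in ranks.items()],
)

# Each region of the pre/post plane is one pair of comparisons.
REGIONS = {
    "ancestors": "pre < :pre AND post > :post",
    "descendants": "pre > :pre AND post < :post",
    "preceding": "pre < :pre AND post < :post",
    "following": "pre > :pre AND post > :post",
}


def region(kind, name):
    pre, post = ranks[name]
    rows = conn.execute(
        f"SELECT name FROM nodes WHERE {REGIONS[kind]} ORDER BY pre",
        {"pre": pre, "post": post},
    )
    return [n for (n,) in rows]
```

Because every XPath axis of this kind maps to a pair of range comparisons, translating such a query into SQL is straightforward with this encoding.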

Page 48: Implementing a JSR-283 Content Repository in PHP

Querying the TYPO3 CR

SQL support for TYPO3CR

Using SQL we need

a (simple) SQL parser

an efficient way to transform that SQL into equivalent SQL for the low-level data structure in use

This still needs to be investigated; possible approaches:

storing a reference to the parent node

using the pre/post plane only as a cache for XPath read queries, optimizing the native storage for SQL read queries

Page 49: Implementing a JSR-283 Content Repository in PHP

The TYPO3 CR

Extensions to JSR-283

A vendor may choose to offer additional features in their CR implementation

The TYPO3CR will offer support for

Persistence through code annotations

Automatic node type generation based on class members

Rules for setting up virtual root nodes based on node types

Page 50: Implementing a JSR-283 Content Repository in PHP

Extensions to JSR-283

Persistence to the CR

Annotations define objects and their properties to be persistable

Properties are stored in the CR according to reflection results and hints from annotations

The FLOW3 persistence manager is transparently enhanced by the CR persistence mechanism

An object-to-object mapper does the hard work
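The combination of reflection and annotation hints can be illustrated by analogy. The sketch below uses Python dataclass field metadata in place of PHP doc-comment annotations; the names `persistent`, `property_name`, `Customer` and `to_node_properties` are hypothetical and say nothing about how the FLOW3 persistence manager is actually implemented.

```python
from dataclasses import dataclass, field, fields


def persistent(**hints):
    # Hypothetical marker: flags a property as persistable and carries hints.
    return field(metadata={"persistent": True, **hints})


@dataclass
class Customer:
    name: str = persistent()
    email: str = persistent(property_name="mail")  # hint: store under another name
    session_token: str = ""  # not marked, so it is never stored


def to_node_properties(obj):
    # Reflection step: inspect the declared fields and keep only marked ones.
    props = {}
    for f in fields(obj):
        if f.metadata.get("persistent"):
            props[f.metadata.get("property_name", f.name)] = getattr(obj, f.name)
    return props
```

The object itself contains no storage code; a mapper reads the markers and decides which values become node properties and under which names.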

Page 51: Implementing a JSR-283 Content Repository in PHP

Extensions to JSR-283

Automatic node type generation

Persistence stores properties in the CR according to reflection results and hints from annotations

Node types can be generated automatically if wanted

Manually adding content cannot break the needed structure

Browsing the repository reveals a clear structure

Using content from other applications is less error-prone

Maybe this is utter nonsense - depends on whom you ask :)

Page 52: Implementing a JSR-283 Content Repository in PHP

Extensions to JSR-283

Virtual root nodes

The repository has one root node, added nodes must be placed somewhere

It might be useful to find all nodes under a common node, depending on type or other attributes

Such a virtual root node is

like a smart folder or playlist

like a view in an RDBMS
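The comparison with a database view can be taken literally in a short sketch. The table layout, the `nodetype` column, the view name `all_articles` and the sample rows are assumptions for illustration, using Python's built-in `sqlite3`; this is not how the TYPO3 CR defines its rules.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (path TEXT PRIMARY KEY, nodetype TEXT)")
conn.executemany(
    "INSERT INTO nodes VALUES (?, ?)",
    [
        ("/site/news/first", "article"),
        ("/site/about", "page"),
        ("/archive/2007/old", "article"),
    ],
)

# The "virtual root" is a stored rule, not a copy of the data.
conn.execute(
    "CREATE VIEW all_articles AS SELECT path FROM nodes WHERE nodetype = 'article'"
)


def virtual_children():
    # Everything matching the rule shows up here, wherever it lives in the tree.
    rows = conn.execute("SELECT path FROM all_articles ORDER BY path")
    return [p for (p,) in rows]
```

Nodes keep their real place in the tree; a node added later appears under the virtual root automatically as soon as it matches the rule.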

Page 53: Implementing a JSR-283 Content Repository in PHP

The TYPO3 CR

Current status

Currently the code supports a subset of the required features of levels 1 & 2 and the optional parts of the JSR-283 specification

Basic read & write access

Namespace registration

Node type discovery and registration

Data storage uses the naive approach known from TYPO3 v4

Have a look at the Subversion repository for up-to-date information

Page 54: Implementing a JSR-283 Content Repository in PHP

The TYPO3 CR

Future plans

Write test

Code

Test

Write test

Code

Test

...

Page 55: Implementing a JSR-283 Content Repository in PHP

The TYPO3 CR

Summary

Implementing the specification is not an easy task, but doable

For the various parts a lot of research has already been done

2008 will see full-time development on the TYPO3 CR

The repository is a major improvement over currently widespread ways of storing data

The whole PHP community could^Wwill benefit!

Page 56: Implementing a JSR-283 Content Repository in PHP

So long and thanks for the fish

Links

TYPO3 Website: http://typo3.org

TYPO3 Development Website: http://forge.typo3.org

FLOW3 Website: http://flow3.typo3.org

TYPO3 5.0 Subsite: http://typo3.org/gimmefive

Page 57: Implementing a JSR-283 Content Repository in PHP

So long and thanks for the fish

Questions?

beer

Page 58: Implementing a JSR-283 Content Repository in PHP