

ARTICLE IN PRESS

Contents lists available at ScienceDirect

Signal Processing: Image Communication

Signal Processing: Image Communication 24 (2009) 452–467

journal homepage: www.elsevier.com/locate/image

Improved BSDL-based content adaptation for JPEG 2000 and HD Photo (JPEG XR)

Wesley De Neve a,*, Davy Van Deursen b, Wim Van Lancker b, Yong Man Ro a, Rik Van de Walle b

a Image and Video Systems Lab, Korea Advanced Institute of Science and Technology, Research Wing R304, 103-6 Munji-dong, Yuseong-gu, Daejeon 305-732, Republic of Korea

b Multimedia Lab, Department of Electronics and Information Systems, Ghent University - IBBT, Gaston Crommenlaan 8, bus 201, B-9050 Ledeberg-Ghent, Belgium

Article info

Article history: Received 12 February 2009; Accepted 18 February 2009

Keywords: BSDL; Content adaptation; HD Photo; JPEG 2000; JPEG XR; STX; XML

0923-5965/$ - see front matter © 2008 Elsevier B.V. All rights reserved.

doi:10.1016/j.image.2009.02.013

* Corresponding author. Tel.: +82 42 866 6279; fax: +82 42 866 6245.

E-mail addresses: [email protected] (W. De Neve), [email protected] (D. Van Deursen), [email protected] (W. Van Lancker), [email protected] (Y. Man Ro), [email protected] (R. Van de Walle).

Abstract

JPEG 2000 and HD Photo (JPEG XR) enable the coding of images with several adaptivity provisions. These provisions allow taking into account the constraints of diverse usage environments. However, the actual adaptation of the coded bitstreams requires additional system complexity. This paper investigates how JPEG 2000 and HD Photo can be used in a standardized and XML-based framework for format-independent content adaptation in the compressed domain. An analysis is provided of the cost of providing adaptivity in the compressed domain, in terms of loss of coding efficiency, as well as in terms of the complexity of the format-agnostic adaptation system used. This complexity is expressed in terms of execution times, memory consumption, and file size overhead of the XML descriptions. Using the outcome of our analysis, a number of improvements are proposed that allow reducing the complexity of XML-based adaptation for JPEG 2000 and HD Photo, in particular the use of compact XML descriptions and an adaptation chain based on the Simple API for XML (SAX). Compact XML descriptions contain a minimal amount of information for the purpose of adaptation; their use is enabled by introducing an additional lightweight pre- and post-processing step in the overall adaptation chain. Our results show that editing-style operations on JPEG 2000 and HD Photo bitstreams can be executed with a feasible complexity when relying on compact XML descriptions. These editing-style operations include exploiting spatial and quality scalability, as well as interactive extraction of regions-of-interest (ROIs). However, low-level operations in the compressed domain, such as rotating an image, cannot be supported, due to the need for entropy decoding and data reordering at macroblock or codeblock level.

© 2008 Elsevier B.V. All rights reserved.

1. Introduction

The ever-increasing heterogeneity in terms of coding formats, devices, and networks hampers the straightforward consumption of multimedia content. This observation has triggered several research and standardization efforts in the domain of scalable coding formats, which allow taking into account different device and network characteristics, and format-independent adaptation systems, which allow dealing with an increasing number of scalable coding formats. An overview of these research and standardization activities, including a discussion of other content adaptation approaches such as transcoding and transmoding, can be found in [1–3].

This paper discusses the integration of two formats for the coding of still images into a standardized and format-independent adaptation system. The investigated coding formats are JPEG 2000, which became an international standard in December 2000 (ISO/IEC 15444-1; [4]), and HD Photo, which is under consideration for standardization by the Joint Photographic Experts Group (JPEG) at the time of writing. This potentially new standard is tentatively called JPEG Extended Range (JPEG XR; ISO/IEC 29199-2). Both JPEG 2000 and HD Photo offer support for the scalable and lossless representation of still images and intra-only video. Further, as HD Photo is natively supported by the Windows Vista operating system, it can be considered the first widespread scalable coding format.

The adaptation system investigated in this paper makes use of the Bitstream Syntax Description Language (BSDL; ISO/IEC 23001-5; [5]), which enables the automatic creation of XML-based descriptions of the high-level structure of coded bitstreams. These format-specific descriptions contain information required for content adaptation, thus concealing the internal complexity of a scalable format. The main merit of BSDL lies in the fact that it allows developing a format-independent adaptation system: the same logic can be applied for the adaptation of scalable audio, video, and still image bitstreams [6]. Format-independence also means that the adaptation system is future proof and suitable for a hardware implementation.

This paper is an extended and improved version of a paper presented during the VIE 2008 Workshop on Scalable Coded Media Beyond Compression, which provided an initial analysis of XML-based content adaptation for JPEG 2000 and HD Photo. In particular, we additionally address three issues identified in the VIE 2008 paper: the slow creation of XML descriptions for HD Photo (practically unacceptable), the file size overhead of the textual XML descriptions for both JPEG 2000 and HD Photo, and the computationally intensive nature of bitstream reordering for JPEG 2000, used for the combined exploitation of spatial and quality scalability.

To solve the first two issues, we propose the use of compact XML descriptions, which contain minimal information for the purpose of content adaptation. As for the computationally intensive nature of bitstream reordering in JPEG 2000, except for the use of faster hardware, no solution can be readily offered in this paper, due to the need for a significant number of byte-copy operations. Therefore, we essentially propose to avoid the use of bitstream reordering by relying on in-place exploitation of combined scalability. Finally, we also briefly discuss and evaluate a SAX-based content adaptation chain for the pipelined and on-the-fly execution of the XML transformation and binarization steps.

Fig. 1. HD Photo bitstream structure. [Figure: an elementary bitstream made up of a header, an index table, and spatial tiles 1 to 4; in frequency mode, each spatial tile is split into a DC tile, a lowpass tile, a highpass tile, and a flexbits tile (image with 4 spatial tiles and 16 frequency tiles). The index table lists tile bitstream offsets, e.g., 0, 752333, 1516993, and 1928318 for the four spatial tiles, and DC 0, lowpass 4726, highpass 65556, flexbits 304543, DC 752333, ... in frequency mode.]

The organization of this paper is as follows. Section 2 provides a technical discussion of JPEG 2000 and HD Photo, putting the emphasis on the emerging HD Photo format. Content adaptation using BSDL is briefly discussed in Section 3, while extensive experimental results are presented in Section 4. Improved BSD-based adaptation for HD Photo and JPEG 2000 is discussed and evaluated in Section 5. Finally, a number of directions for future research and conclusions are provided in Sections 6 and 7, respectively.

2. Technical discussion of HD Photo and JPEG 2000

This section reviews the new HD Photo format in terms of its bitstream structure and its adaptivity provisions. Wherever meaningful, a concise comparison is drawn with JPEG 2000. JPEG 2000 has already been extensively covered by the scientific literature (see for instance [7,8]), while details about HD Photo are provided in an overview paper written by Srinivasan et al. [9]. Consequently, we assume some awareness of the basic design principles of JPEG 2000 and HD Photo.

2.1. Bitstream structure

As illustrated by Fig. 1, an HD Photo elementary bitstream consists of a header, an index table, and a sequence of tiles. The spatial tiles can be stored using two modes of operation: spatial mode or frequency mode. In spatial mode, the bitstream of each spatial tile is laid out in macroblock order, that is, the compressed bits of each macroblock are located together. In frequency mode, the transform coefficients of all macroblocks in a spatial tile are laid out as a hierarchy of four frequency subbands: the DC subband, the lowpass (LP) subband, the highpass (HP) subband, and the flexbits subband (Flexbits). These subbands enable several types of scalability, as will be discussed in Section 2.2. It should be clear that the number of tiles in frequency mode is four times higher than in spatial mode.

Table 1. Adaptivity provisions in JPEG 2000 and HD Photo.

Adaptivity provision               JPEG 2000   HD Photo
Flipping and rotating              +           +
Arbitrary ROI extraction           +           +
Tile-aligned ROI extraction        ++          ++
Spatial scalability                ++          ++
Lossless-to-lossy degradation      ++          ++
Fine-grained quality scalability   ++          -
Color component scalability        ++          +
Bitstream reordering               ++          +
Spatial retiling                   +           +

Compared to JPEG 2000, a spatial tile in HD Photo always consists of four frequency tiles when the frequency mode is in use, while a spatial tile in JPEG 2000 may contain a variable number of packets, dependent on the number of quality layers, spatial layers, and color components used. Further, to parse a bitstream, HD Photo primarily relies on absolute tile bitstream offsets stored in an index table. Start codes are also specified by HD Photo, but they cannot be reliably used for parsing purposes due to the lack of means for start code emulation prevention (footnote 1). JPEG 2000 allows parsing based on both start codes and tile and packet bitstream offsets.

2.2. Adaptivity provisions in the compressed domain

In frequency mode, a trivial form of quality scalability is supported by simply removing all Flexbits tiles from an HD Photo bitstream. For images coded in a lossless manner, removing the Flexbits tiles comes down to degrading the image quality from a lossless to a lossy representation. Spatial scalability is supported by additionally removing the HP and LP subband tiles, each time resulting in a reduction of the spatial resolution by a factor of four along the horizontal and vertical axes. Dependent on the application targeted, spatial scalability in HD Photo can also be seen as coarse-grained quality scalability when still displaying the adapted image at its original resolution. The removal of tiles also requires updating the index table in order to maintain format compliance: bitstream offsets belonging to removed tiles need to be eliminated from the index table, while remaining offsets need to be recomputed due to their absolute nature. Correctly updating the index table is responsible for a major part of the complexity of exploiting scalability in HD Photo.
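The index table update described above can be sketched as follows. This is a minimal illustration in Python, not the stylesheet logic used in this paper: the `Tile` record, the `drop_bands` helper, and the assumption that offsets are counted from the start of the first tile are all hypothetical. The offsets of the first four tiles are the example values shown in Fig. 1; the length of the last tile is invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tile:
    band: str    # "dc", "lowpass", "highpass", or "flexbits"
    offset: int  # absolute offset in bytes (assumed relative to the first tile)
    length: int  # tile length in bytes


def drop_bands(tiles, bands_to_drop):
    """Remove all tiles of the given bands and recompute the absolute offsets."""
    adapted = []
    cursor = 0
    for tile in tiles:
        if tile.band in bands_to_drop:
            continue  # the index entry of a removed tile is eliminated
        adapted.append(Tile(tile.band, cursor, tile.length))
        cursor += tile.length  # the next kept tile starts right after this one
    return adapted


# First spatial tile from Fig. 1, followed by a DC tile of hypothetical length.
original = [
    Tile("dc", 0, 4726),
    Tile("lowpass", 4726, 60830),
    Tile("highpass", 65556, 238987),
    Tile("flexbits", 304543, 447790),
    Tile("dc", 752333, 1000),
]

lossy = drop_bands(original, {"flexbits"})
print([(t.band, t.offset) for t in lossy])
```

In this toy run, the second DC tile moves from offset 752333 to offset 304543, which is exactly the recomputation that makes every removal touch all later index entries.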

Besides quality and spatial scalability, HD Photo also supports other operations in the compressed domain: flipping and rotating, arbitrary cropping or ROI extraction (where the ROI may have arbitrary pixel coordinates), fast cropping or ROI extraction (where the ROI is aligned with tile boundaries), color component scalability, bitstream reordering (going from spatial mode to frequency mode, and vice versa), and spatial retiling.

The previously enumerated compressed-domain operations are also summarized in Table 1, and compared to JPEG 2000: a ‘-’ symbol means that the operation is not supported by the format in question; a ‘+’ symbol denotes that the operation requires (partial) entropy decoding and/or bitstream reordering at macroblock (HD Photo) or codeblock (JPEG 2000) level; and a ‘++’ symbol indicates that the operation can be executed in an editing-style fashion (e.g., by simply removing particular tiles and updating certain syntax elements).

1 HD Photo uses offset-based parsing as this is more efficient in terms of I/O, an important characteristic for mobile applications.

In Table 1, the term ‘bitstream reordering’ refers to changing the bitstream progression order in JPEG 2000 (e.g., from spatial progression to progression by quality), while in HD Photo, ‘bitstream reordering’ refers to changing the bitstream mode (e.g., from spatial to frequency mode). Further, arbitrary ROI extraction in JPEG 2000 includes the ability to perform ROI extraction at the level of precincts (i.e., packet partition locations) and codeblocks, coming with reduced decoding but increased adaptation complexity [10].

3. Description-driven content adaptation

Our research uses a format-independent system for adapting scalable bitstreams, relying on the standardized and XML-based BSDL [11]. The overall method for adapting a scalable bitstream using MPEG-B BSDL is illustrated in Fig. 2 for fast (i.e., tile-aligned) ROI extraction. Explanatory notes are provided below:

1. a Bitstream Syntax Schema (BS Schema) is designed, a document manually written in BSDL and containing a formal description of the high-level syntax of a particular media format;

2. a Bitstream Syntax Description (BSD) is created by the format-independent BintoBSD Parser, taking as input a particular bitstream and a BS Schema;

3. the BSD is transformed, taking into account constraints imposed by the targeted usage environment;

4. the format-independent BSDtoBin Parser is used for creating an adapted bitstream, taking the BS Schema, the original bitstream and the transformed BSD as input.
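Step 4 can be pictured with the simplified notation used in Fig. 2, where each element holds a start offset and a length in the original bitstream. The following Python sketch is a toy stand-in for BSDtoBin, not the reference software: the function name `binarize` and the sample data are hypothetical, and it only copies byte ranges, so any syntax element whose value must change (such as an index table) is outside its scope.

```python
import xml.etree.ElementTree as ET


def binarize(bsd_xml, original):
    """Concatenate the byte ranges listed in a simplified description."""
    adapted = bytearray()
    for element in ET.fromstring(bsd_xml):
        # Each element is assumed to contain "<start offset> <length>".
        start, length = (int(value) for value in element.text.split())
        adapted += original[start:start + length]
    return bytes(adapted)


# Invented 16-byte "bitstream": header, index, and four tiles.
original = b"HHIItttTTTuuuUUU"

# Transformed description that keeps the second and fourth tile.
transformed_bsd = """
<bitstream>
  <header>0 2</header>
  <index>2 2</index>
  <tile>7 3</tile>
  <tile>13 3</tile>
</bitstream>
"""

print(binarize(transformed_bsd, original))
```

Because the adapted bitstream is assembled purely from ranges of the original, the cost of this step is dominated by byte-copy operations.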

BSDL does not standardize the way a BSD is transformed. This can for instance be done by stylesheets written in Extensible Stylesheet Language Transformations (XSLT; [12]) or Streaming Transformations for XML (STX; [13]). These stylesheets contain the actual logic for removing and updating particular syntax structures (e.g., for updating the index table in HD Photo). An in-depth evaluation of the adaptation chain in Fig. 2, in the context of JPEG 2000 and HD Photo, will be provided in Sections 4 and 5.

Fig. 2. Description-driven content adaptation with BSDL [11]. [Figure: the bitstream and the BS Schema (1) are input to BintoBSD (2), which produces the bitstream syntax description; the transformation (3), steered by the usage environment description, produces the transformed bitstream syntax description; BSDtoBin (4) produces the adapted bitstream. In the example, the second and fourth tile of a four-tile image are retained.]

Bitstream syntax description:

    <bitstream xml:base="breeze.wdp">
      <header>0 24</header>
      <index>24 67</index>
      <tile>91 746</tile>
      <tile>837 903</tile>
      <tile>1740 857</tile>
      <tile>2597 1103</tile>
    </bitstream>

Transformed bitstream syntax description:

    <bitstream xml:base="breeze.wdp">
      <header>0 24</header>
      <index>24 46</index>
      <tile>837 903</tile>
      <tile>2597 1103</tile>
    </bitstream>
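A streaming transformation of the kind expressed in an STX stylesheet can be approximated with a SAX handler. The sketch below is written in Python rather than STX, and the `TileSelector` class and its `keep` parameter are hypothetical; it selects the second and fourth tile of the simplified description in Fig. 2 and deliberately leaves the index table update out of scope.

```python
import xml.sax


class TileSelector(xml.sax.ContentHandler):
    """Collect the byte ranges of selected <tile> elements in a single pass."""

    def __init__(self, keep):
        super().__init__()
        self.keep = keep      # 1-based positions of the tiles to retain
        self.position = 0     # number of <tile> elements seen so far
        self.buffer = None    # text chunks of the current kept tile, if any
        self.selected = []

    def startElement(self, name, attrs):
        if name == "tile":
            self.position += 1
            if self.position in self.keep:
                self.buffer = []

    def characters(self, content):
        if self.buffer is not None:
            self.buffer.append(content)  # text may arrive in several chunks

    def endElement(self, name):
        if name == "tile" and self.buffer is not None:
            self.selected.append("".join(self.buffer).strip())
            self.buffer = None


bsd = b"""<bitstream xml:base="breeze.wdp">
<header>0 24</header><index>24 67</index>
<tile>91 746</tile><tile>837 903</tile>
<tile>1740 857</tile><tile>2597 1103</tile>
</bitstream>"""

handler = TileSelector(keep={2, 4})
xml.sax.parseString(bsd, handler)
print(handler.selected)
```

The handler keeps only a counter and one small buffer in memory, which is the property that makes event-based processing attractive for descriptions with many tiles.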

BSDL was initially part of the MPEG-21 Digital Item Adaptation standard (MPEG-21 DIA; ISO/IEC 21000-7; [14]). However, to promote its use for purposes beyond adaptation (such as format-independent streaming [15]), the language in question was recently relocated to a standalone standard known as MPEG-B BSDL (ISO/IEC 23001-5; [5]). An in-depth overview of BSD-driven content adaptation can be found in [6,16].

4. Performance analysis and problem identification

This section presents a number of performance results, obtained by integrating HD Photo and JPEG 2000 in an adaptation system using BSDL. An outline of an application scenario is given as well, together with more details regarding the test setup used.

4.1. Application scenario

This paper targets a static client–server application, in which the server allows heterogeneous clients to consume still images in two different coding formats: JPEG 2000 and HD Photo. The images, pre-coded in a lossless and scalable manner, support a number of common adaptivity provisions that are to be exploited in an on-the-fly fashion (i.e., while handling a client request): lossless-to-lossy degradation of the perceptual image quality, spatial scalability, and interactive and tile-aligned ROI extraction. Interactive means that the location of the ROI is only known after encoding. Consequently, a tile structure with a fine granularity has to be used. This may have a high impact on the coding efficiency and the performance of the adaptation system, as further discussed in Section 4.3.

The server relies on BSDL for the format-agnostic adaptation of images in the compressed domain, taking into account information about the usage environment and the properties of the scalable image content. A more advanced scenario might also include audio and video content. As outlined in Section 3, the use of BSDL requires translating the high-level structure of each image into a BSD. We assume that BSDs are first generated offline and subsequently stored on the server for further processing (i.e., on-the-fly BSD transformation and binarization while handling a client request).

A mobile client may communicate with the server as follows. First, after having selected a particular coding format, the client may request a low-resolution image that fits the limited display resolution. This request can be implemented by the server by extracting a low-resolution version of the selected image. Next, the client may request a high-quality version of a semantically meaningful region that fits the resolution of the client device. The server can implement this request by first extracting an ROI and by subsequently dropping a quality layer to go from a lossless to a lossy representation.

A client may communicate usage environment properties (e.g., format support, display resolution, ROI coordinates, and user preferences) to the server by relying on the usage environment description (UED) tool of DIA [14]. Determining an ROI can be done either manually, with user intervention, or automatically, by using a content-aware image processing tool [17].

4.2. Evaluation methodology

Integrating HD Photo and JPEG 2000 in a BSDL-based adaptation system requires the design of BS Schemata, which allow exposing the high-level structure of the coded bitstreams as a BSD, and a number of format-specific stylesheets, which allow transforming the BSDs in a pipelined fashion. In our research, the stylesheets were written in STX, because of its low computational complexity, low memory footprint, and streaming capabilities [13].

For HD Photo, a BS Schema and two stylesheets were newly designed: one stylesheet for dropping subbands and one stylesheet for interactive and tile-aligned ROI extraction. In Appendix A, part of the BS Schema for HD Photo is listed in Fig. 11, while a BSD excerpt can be found in Fig. 12.


For JPEG 2000, the BS Schema and the XSLT stylesheets in the MPEG-21 DIA reference software repository were used as a starting point: the BS Schema was extended with support for handling multiple tiles in a JPEG 2000 bitstream, while the XSLT stylesheets for exploiting color, spatial, and quality scalability were rewritten in STX. These stylesheets use simple bitstream truncation for each spatial tile in order to adapt a JPEG 2000 bitstream along the color, spatial, or perceptual quality axis [6]. Further, a new STX stylesheet was written to support interactive and tile-aligned ROI extraction in a JPEG 2000 bitstream, as well as a new STX stylesheet to change the bitstream order from progression by spatial resolution to progression by quality.

BSD-based extraction of pre-encoded ROIs in JPEG 2000 bitstreams has been briefly discussed before in [18]. However, the authors focus on the performance of the automatic detection of ROIs using a visual attention model; they do not discuss the efficiency of the BSD-based content adaptation chain, nor do they outline how to automatically transform a BSD in order to extract a pre-encoded ROI.

Our motivation for studying the use of bitstream reordering in JPEG 2000 is twofold. First, bitstream reordering makes it easy to reuse the functionality provided by the XSLT stylesheets, rewritten in STX, for the combined exploitation of spatial and quality scalability in a single JPEG 2000 bitstream (as required by the example application discussed in Section 4.1). Indeed, spatial scalability can be exploited first using simple bitstream truncation operations, followed by bitstream reordering to go from progression by spatial resolution to progression by perceptual quality. Finally, quality scalability can be exploited by again using simple bitstream truncation operations. Second, bitstream reordering in JPEG 2000 is a high-level compressed-domain operation, thus motivating an analysis of whether this feature can be efficiently exploited using a BSD-based approach.
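The truncate, reorder, truncate sequence can be illustrated on a toy model in which each packet is just a (resolution, layer, component) triple. This Python sketch assumes one packet per triple within a single tile and ignores precincts and packet headers; the helper names are hypothetical and do not correspond to the stylesheets used in this paper.

```python
from itertools import product

# Layer configuration of the JPEG 2000 test bitstreams (see Section 4.2).
RESOLUTIONS, LAYERS, COMPONENTS = 6, 4, 3


def progression_by_resolution():
    """Packets as (resolution, layer, component), resolution varying slowest."""
    return list(product(range(RESOLUTIONS), range(LAYERS), range(COMPONENTS)))


def reorder_by_quality(packets):
    """Reorder the packets so that the quality layer varies slowest."""
    return sorted(packets, key=lambda p: (p[1], p[0], p[2]))


def truncate(packets, count):
    """Simple bitstream truncation: keep only the first `count` packets."""
    return packets[:count]


keep_resolutions, keep_layers = 3, 2

packets = progression_by_resolution()
# 1. Exploit spatial scalability by truncation.
packets = truncate(packets, keep_resolutions * LAYERS * COMPONENTS)
# 2. Reorder from progression by resolution to progression by quality.
packets = reorder_by_quality(packets)
# 3. Exploit quality scalability by truncation.
packets = truncate(packets, keep_layers * keep_resolutions * COMPONENTS)
print(len(packets))
```

Both truncation steps are cheap prefix operations; only the middle step moves packets around, which is where the byte-copy cost of reordering arises.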

Operations marked with a ‘+’ symbol in Table 1 cannot be supported by BSDL, due to a lack of support for entropy decoding and data reordering at the level of macroblocks (HD Photo) and codeblocks (JPEG 2000). Indeed, the vision of BSDL is to only support editing-style operations in the compressed domain, facilitated by descriptions of the high-level bitstream syntax that abstract the internal complexity of the format processed.

Fig. 3. Test images: (a) Breeze; (b) Kungfu; (c) Night; (d) Plane; (e) Waves.

Experiments were carried out on a PC with an Intel Pentium 4 2.6 GHz processor, 2 GB of system memory, and an ST3120026AS hard disk with a speed of 7200 revolutions per minute (RPM). The PC used was running Windows XP Pro SP2 and Sun Microsystems’ Java 2 Runtime Environment (Standard Edition version 1.5.0_05). The memory consumption of Java programs was measured by relying on JProfiler 4.2.1. Version 1.2.1 of the BSDL reference software was used, supporting the context management attributes as introduced in a revision of BSDL by MPEG. These attributes allow keeping the internal memory consumption of BintoBSD constant [19]. Compressing the textual BSDs was done using WinRAR 3.70. The STX stylesheets were executed using Joost (version 2005-05-21).

Time measurements were executed six times; an average was calculated over the last five runs to mitigate startup effects, resulting in a standard deviation that is less than 0.8, 0.3, and 0.1 s for BSD creation, transformation, and binarization, respectively. The low values for the standard deviation show that our methodology produces reliable and consistent time measurements.
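The measurement protocol can be sketched as follows in Python (the experiments themselves were run in Java). The `summarize` helper is hypothetical, the sample timings are invented, and the use of the sample standard deviation is an assumption, since the text does not state which estimator was used.

```python
import statistics


def summarize(timings, warmup=1):
    """Discard the first `warmup` runs and return (mean, standard deviation)."""
    steady = timings[warmup:]
    return statistics.mean(steady), statistics.stdev(steady)


# Six invented timings in seconds; the first run includes startup effects.
runs = [9.0, 1.0, 2.0, 3.0, 4.0, 5.0]
mean, deviation = summarize(runs)
print(mean, round(deviation, 3))
```

Dropping the first run keeps one-time costs, such as class loading and just-in-time compilation in a Java runtime, out of the reported average.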

Five 24-bit RGB test images with a resolution of 1920 × 1072 were used (visualized in Fig. 3). Each image maps to the first frame of one of the five VIPER HD video sequences distributed by FastVDO during the MPEG meeting in Munich. The images were encoded in a lossless and scalable manner, once by using the Kakadu v5.2.5 JPEG 2000 encoder [20] and once by relying on the encoder in Microsoft’s HD Photo Device Porting Kit 1.0 [21]. Table 2 provides more information about a subset of our test bitstreams, in particular, bitstreams only containing one spatial tile. For HD Photo, the numbers between brackets denote size information for bitstreams in frequency mode.

As a trade-off between the coarse- and fine-grained scalability capabilities of HD Photo and JPEG 2000, respectively, the JPEG 2000 bitstreams were encoded with four quality layers, six spatial layers, and three color components. The four quality layers in the JPEG 2000 bitstreams resemble the quality scalability features of HD Photo bitstreams (when spatial scalability in HD Photo is seen as coarse-grained quality scalability), while the six spatial layers in the JPEG 2000 bitstreams mimic the spatial scalability capabilities of HD Photo bitstreams (given the fact that HD Photo bitstreams enable the exploitation of three different spatial resolutions in the compressed domain, and the fact that the design philosophy of HD Photo indicates that intermediate resolutions can be achieved by relying on client-side downsampling techniques after bitstream extraction). This implies that each spatial tile in a JPEG 2000 bitstream contains 72 (4 × 6 × 3) packets (where each spatial tile in an HD Photo bitstream in frequency mode always contains four frequency tiles).

Table 2. Bitstream characteristics.

Name         File size (KB)   Textual BSD size (KB)   Compressed BSD size (KB)
Breeze.wdp   2590 (2590)      4 (5)                   1 (1)
Kungfu.wdp   2208 (2208)      4 (5)                   1 (1)
Night.wdp    2002 (2002)      4 (5)                   1 (1)
Plane.wdp    2571 (2571)      4 (5)                   1 (1)
Waves.wdp    2278 (2278)      4 (5)                   1 (1)
Breeze.j2c   2483             15                      2
Kungfu.j2c   1956             15                      2
Night.j2c    1451             15                      2
Plane.j2c    2394             15                      2
Waves.j2c    2063             15                      2

2 As the concept of a macroblock does not exist in JPEG 2000, we assume that a macroblock also refers to 16 × 16 pixels in the context of JPEG 2000.

4.3. Experimental results

Our performance analysis breaks up the functioning of the BSD-based adaptation system in four basic steps: creation of scalable bitstreams, BSD generation, BSD transformation, and adapted bitstream generation. The first step analyzes how offering adaptivity in the compressed domain influences the coding efficiency. The next three steps are analyzed in terms of their memory and computational complexity. The file size overhead of textual and compressed BSDs is also taken into account. Furthermore, information is provided regarding the trade-off between the tile size granularity, the file size of a compressed image, and the file size of textual and compressed BSDs. Knowing this trade-off is important to efficiently enable interactive and tile-based ROI extraction. Having to support ROI extraction is the most challenging requirement in our application scenario, due to the need for a fine-grained tile structure.

In the remainder of this paper, if performance results for Breeze are used as a representative example, then similar results were achieved for the other test images (due to a high similarity in coded bitstream structure).

4.3.1. Creation of scalable bitstreams

Fig. 4 shows the file size overhead of HD Photo and JPEG 2000 representations of the Breeze test image, for a uniform tile size varying between 8 × 8 and 1 × 1 macroblocks (footnote 2). The loss in coding efficiency is measured compared to a representation of Breeze that only contains one spatial tile (see Table 2). The following observation can be made: the smaller the tile size used, the finer the interactive ROI selection, but the lower the coding efficiency. For HD Photo, a tile size of 4 × 4 macroblocks (64 × 64 pixels) is feasible in terms of file size overhead, which is less than 4% for the five test images used. For JPEG 2000, to achieve an overall overhead strictly less than 4%, a tile size of 12 × 12 macroblocks (192 × 192 pixels) needs to be used.

Compared to HD Photo, the file size overhead for JPEG 2000 is significantly higher, due to a lack of intra prediction and a broken context for entropy coding. As for JPEG 2000, ROI extraction at the level of precincts or codeblocks should result in a smaller decrease of the coding efficiency (thanks to the use of a coarse-grained tile structure). However, this approach is not possible when using BSDL, due to the need for partial decoding.

The file size overhead of HD Photo bitstreams in frequency mode, which enables quality and spatial scalability, compared to HD Photo bitstreams in spatial mode, is limited. This overhead stems from the use of additional tile headers and a larger index table. For example, for the Breeze test image in frequency mode, compared to the Breeze test image in spatial mode, the file size overhead can be ignored when the image contains one spatial tile (see Table 2), while the overhead amounts to 4.5% or 154 KB when the Breeze test image contains 8040 spatial tiles (or 32160 frequency tiles).
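The tile counts quoted above follow directly from the image dimensions. The Python sketch below assumes 16 × 16 pixel macroblocks and that partial tiles at the image border count as full tiles; `count_tiles` is a hypothetical helper.

```python
import math


def count_tiles(width, height, tile_size_mb, macroblock=16):
    """Number of spatial tiles for a square tile size given in macroblocks."""
    columns = math.ceil(width / (tile_size_mb * macroblock))
    rows = math.ceil(height / (tile_size_mb * macroblock))
    return columns * rows


# Tiles of 1 x 1 macroblock: spatial tiles, then frequency tiles (four per spatial tile).
spatial = count_tiles(1920, 1072, 1)
print(spatial, 4 * spatial)

# Tiles of 4 x 4 macroblocks (64 x 64 pixels).
print(count_tiles(1920, 1072, 4))
```

A 1920 × 1072 image holds 120 × 67 macroblocks, which gives the 8040 spatial tiles and 32160 frequency tiles mentioned in the text.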

4.3.2. BSD creation

For the five test images used, Table 3 clarifies the trade-off between the tile size granularity, the file size of compressed images, and the file size of textual and compressed BSDs. In general, the file size overhead of textual BSDs is high, but it can be significantly reduced by using compressed BSDs. This observation regarding the efficient compression of BSDs is in line with results previously presented in [6].

Fig. 5 shows the speed at which BSDs can be created by the format-independent BintoBSD Parser for HD Photo representations of Breeze. The speed is constant for a bitstream with a particular number of tiles, but decreases when the number of tiles increases. This behavior, which is unacceptable in practice, is due to an increasing number of entries in the index table.

The index table is accessed multiple times by BintoBSD when parsing a tile (to check whether a tile bitstream offset is equal to zero, signalling an absent Flexbits tile, and to compute the payload length of a tile). As such, the more entries in the index table, which is internally represented by BintoBSD using an XML tree, the more verbose the internal context, and the more time is needed to access the index table. Accessing the index table is done using rather complicated XML Path (XPath; [22]) expressions (long location steps need to be used to navigate to a particular tile bitstream offset in the XML tree, as well as several predicates).

Table 3. Trade-off between tile size, image file size, and BSD file sizes (for all test images). Values are [minimum, maximum] overhead ranges.

Overhead for HD Photo (frequency mode)

| Tile size | 4 × 4 | 5 × 5 | 6 × 6 | 8 × 8 |
|---|---|---|---|---|
| Image file size (%) | [2, 4] | [1, 3] | [1, 2] | [0, 1] |
| Textual BSDs (%) | [25, 34] | [17, 23] | [11, 15] | [6, 9] |
| Compressed BSDs (%) | [1, 2] | [0, 2] | [0, 1] | [0, 1] |

Overhead for JPEG 2000 (progression by resolution)

| Tile size | 4 × 4 | 12 × 12 | 14 × 14 | 16 × 16 |
|---|---|---|---|---|
| Image file size (%) | [15, 25] | [2, 4] | [2, 4] | [1, 3] |
| Textual BSDs (%) | [202, 323] | [28, 46] | [20, 35] | [18, 35] |
| Compressed BSDs (%) | [5, 10] | [1, 2] | [0, 2] | [0, 2] |

Fig. 4. Image file size overhead (for Breeze). File size overhead (%) per tile size (in macroblocks), as labeled in the chart:

| Tile size | 8 × 8 | 7 × 7 | 6 × 6 | 5 × 5 | 4 × 4 | 3 × 3 | 2 × 2 | 1 × 1 |
|---|---|---|---|---|---|---|---|---|
| HD Photo - spatial | 1 | 1 | 1 | 2 | 3 | 5 | 10 | 33 |
| HD Photo - frequency | 1 | 1 | 2 | 2 | 3 | 6 | 13 | 39 |
| JPEG 2000 | 4 | 7 | 9 | 12 | 16 | 29 | 59 | 192 |

Chart annotations: the HD Photo file with one spatial tile or four frequency tiles has a size of 2590 KB; the HD Photo file with 508 spatial tiles or 2040 frequency tiles has a size of 2677 KB, a file size overhead of 3.4% (about 87 KB).

W. De Neve et al. / Signal Processing: Image Communication 24 (2009) 452–467

Further, as shown in Fig. 5, JPEG 2000 bitstreams can be efficiently parsed by BintoBSD, thanks to the presence of start codes that make it easy to detect tile and packet boundaries, so that no XPath expressions need to be executed.
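To illustrate why start codes make boundary detection cheap, the following Python sketch scans a byte buffer for the JPEG 2000 start-of-tile-part (SOT, 0xFF90) and start-of-packet (SOP, 0xFF91) markers. It is only a sketch under stated assumptions: the buffer is synthetic, the helper `find_markers` is ours, SOP markers are optional in JPEG 2000 and are assumed to have been enabled at encoding time, and a real parser would also interpret the marker segments rather than rely on a raw byte search.

```python
SOT = b"\xff\x90"  # start of tile-part marker
SOP = b"\xff\x91"  # start of packet marker (optional in JPEG 2000)


def find_markers(data: bytes, marker: bytes) -> list:
    """Return the byte positions of all occurrences of a two-byte marker."""
    positions = []
    pos = data.find(marker)
    while pos != -1:
        positions.append(pos)
        pos = data.find(marker, pos + len(marker))
    return positions


# Synthetic buffer: SOC, filler, one tile-part with two packets, a second tile-part, EOC.
data = (b"\xff\x4f" + b"\x00" * 4
        + SOT + b"\x01\x02" + SOP + b"\x03" + SOP + b"\x04"
        + SOT + b"\x05" + b"\xff\xd9")

tile_starts = find_markers(data, SOT)
packet_starts = find_markers(data, SOP)
print(tile_starts, packet_starts)  # prints "[6, 16] [10, 13]"
```

Each boundary is found with a single linear pass over the bytes, without consulting any previously parsed context.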

To generate BSDs for HD Photo in a more efficient way, we have also used BFlavor³ (BSDL + XFlavor; [23]). BFlavor is a modification of XFlavor; it allows the automatic creation of a format-specific parser, based on a manually designed description of the high-level syntax of the format in question using a Java-like approach. This parser is then able to produce BSDs that are equivalent to the BSDs generated by BintoBSD.

³ http://multimedialab.elis.ugent.be/bflavor/.

Fig. 5 shows that BFlavor significantly outperforms BintoBSD in terms of BSD generation speed for HD Photo. This can be explained by the internal data model used: to represent and access the index table of HD Photo, BintoBSD uses an XML tree and XPath, whereas BFlavor uses arrays and simple indexing operations. Further, the BSD generation speed of BintoBSD and BFlavor is similar for JPEG 2000. This can be attributed to the fact that parsing our JPEG 2000 bitstreams does not require BintoBSD to execute complicated XPath expressions against a verbose context (thanks to the use of start codes in the JPEG 2000 bitstreams). As such, I/O (i.e., the size of the textual BSDs) is the main bottleneck in the BSD generation process for JPEG 2000, for both BintoBSD and BFlavor. The latter also explains why BFlavor’s BSD generation speed for HD Photo is significantly higher than for JPEG 2000, as BSDs for JPEG 2000 tend to be more verbose than for HD Photo, given a particular tile size (as shown in Tables 2 and 3).
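The difference between the two data models can be sketched in a few lines of Python. The example below is not the BintoBSD or BFlavor implementation: the element names (`indexTable`, `tileOffset`), the helper functions, and the offsets are hypothetical, and the handling of zero offsets (absent Flexbits tiles) is omitted. It only contrasts a path-expression lookup in an XML tree, which scans the table entries on every access, with constant-time array indexing.

```python
import xml.etree.ElementTree as ET


def build_index_table(offsets):
    """Represent the index table as an XML tree (hypothetical element names)."""
    table = ET.Element("indexTable")
    for i, offset in enumerate(offsets):
        entry = ET.SubElement(table, "tileOffset", {"id": str(i)})
        entry.text = str(offset)
    return table


def payload_length_tree(table, i, end_of_payload):
    """Tree model: each lookup evaluates a path expression with a predicate."""
    current = int(table.find(f"./tileOffset[@id='{i}']").text)
    following = table.find(f"./tileOffset[@id='{i + 1}']")
    end = int(following.text) if following is not None else end_of_payload
    return end - current


def payload_length_array(offsets, i, end_of_payload):
    """Array model: the same computation with simple indexing."""
    end = offsets[i + 1] if i + 1 < len(offsets) else end_of_payload
    return end - offsets[i]


offsets = [0, 10, 25, 40]  # illustrative tile bitstream offsets
table = build_index_table(offsets)
lengths_tree = [payload_length_tree(table, i, 50) for i in range(len(offsets))]
lengths_array = [payload_length_array(offsets, i, 50) for i in range(len(offsets))]
print(lengths_tree, lengths_array)  # prints "[10, 15, 15, 10] [10, 15, 15, 10]"
```

Both functions return the same lengths, but the cost of the tree lookup grows with the number of entries, so parsing all tiles takes time that grows roughly quadratically with the table size, whereas the array variant stays linear.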

The peak memory consumption for BintoBSD and BFlavor is illustrated in Fig. 6 for a number of HD Photo representations of Breeze. The following observation can be made: the memory consumption is constant for an HD Photo bitstream with a fixed number of tiles, thanks to the use of the context management attributes. However, the memory consumption increases when the number of tiles increases in an HD Photo file, which is due to the increasing size of the index table (thus resulting in an increasing size of BintoBSD’s internal XML tree). Indeed, the index table needs to be kept entirely in the system memory as this syntax structure is parsed before tiles are parsed. This illustrates that BSDL’s context management attributes fail in their goal to enable an efficient evaluation of XPath expressions by keeping the memory consumption in BintoBSD constant; to do so, it is also necessary to keep the size of the context in BintoBSD below a particular threshold.

Fig. 5. BSD generation speed (for Breeze). [Chart: BSD generation speed (tiles/s) versus number of spatial tiles (0 to 8000), for BintoBSD - HD Photo, BintoBSD - JPEG 2000, BFlavor - HD Photo, and BFlavor - JPEG 2000. A note in the chart reads "Downscaled by a factor of 100 for visualization purposes."]

Fig. 6. Memory consumption during BSD creation (for Breeze). Peak memory consumption (MB) per tile size (in macroblocks), as labeled in the chart:

| Tile size | 5 × 5 | 4 × 4 | 3 × 3 | 2 × 2 | 1 × 1 |
|---|---|---|---|---|---|
| BintoBSD | 3.0 | 3.6 | 4.8 | 7.8 | 25.0 |
| BFlavor | 0.4 | 0.8 | 1.2 | 1.3 | 2.3 |

⁴ In the remainder of this paper, wherever appropriate, results are presented for Breeze, containing tiles with a size of 4 × 4 and 12 × 12 macroblocks for an HD Photo and JPEG 2000 representation, respectively. These results are to be interpreted as an upper limit for performance results that can be obtained by using a coarser tile grid.

For JPEG 2000, the peak memory consumption is constant and low for both BintoBSD and BFlavor, and independent of the number of tiles present in a particular file (as parsing is done using start codes). As an example, for the JPEG 2000 representation of Breeze, the peak memory consumption has a constant value of 1.4 MB for BintoBSD and 0.7 MB for BFlavor.

4.3.3. BSD transformation and binarization

Transforming and binarizing BSDs for HD Photo bitstreams can be efficiently implemented. This is an important observation for the use case discussed in Section 4.1, as these steps need to be implemented in a pipelined and on-the-fly fashion (in contrast to creating scalable bitstreams and their BSDs, which can be done offline). As a representative example, for the Breeze test image containing tiles with a size of 4 × 4 macroblocks, all BSD transformations (subband skipping, ROI extraction) can be separately performed in less than 1.1 s, using less than 2 MB of system memory. Further, adapted versions of the Breeze test image can be generated by BSDtoBin in less than 1.2 s, again using less than 2 MB of system memory.⁴

As for JPEG 2000, all BSD transformations (exploiting spatial and quality scalability, bitstream reordering, ROI extraction) can be separately performed in less than 4.3 s, using less than 2 MB of system memory (for the Breeze test image containing tiles with a size of 12 × 12 macroblocks). Further, adapted JPEG 2000 bitstreams are generated by BSDtoBin in less than 7.0 s, again using less than 2 MB of system memory. The slow execution of BSDtoBin is due to the need for bitstream reordering. This operation is computationally intensive, for both BSD transformation and BSDtoBin. When not taking into account bitstream reordering, all BSD transformations can be separately performed in less than 3.3 s, while all adapted bitstreams can be generated in less than 0.9 s.

The execution times for BSD transformation and binarization do not include the time necessary for loading the different STX stylesheets and the BS Schemata, as these steps can be done offline (in less than 0.7 and 0.4 s, respectively, for both coding formats).

Using the outcome of our initial performance analysis, the next section proposes a number of improvements that reduce the complexity of BSD-based content adaptation for JPEG 2000 and HD Photo.

5. Improved BSD-based content adaptation

This section addresses the three issues identified in the previous section: the slow creation of BSDs for HD Photo by BintoBSD, the file size overhead of textual BSDs for HD Photo and in particular JPEG 2000, and the computationally intensive nature of bitstream reordering for JPEG 2000, used in our research for the exploitation of combined scalability.

Fig. 7. Pre- and post-processing in a BSD-based content adaptation chain, enabling the use of compact BSDs. [Diagram: bitstream → BintoBSD → BSD → BSD pre-processing → compact BSD → compact BSD transformation → transformed compact BSD → BSD post-processing → expanded and transformed BSD → BSDtoBin → adapted bitstream; a BS Schema box is also shown.]

To solve the first two issues, we propose the use of compact BSDs, which contain a minimal amount of information for the purpose of content adaptation. Their use is enabled by an additional pre- and post-processing step in the content adaptation chain, as shown in Fig. 7. As for the computationally intensive nature of bitstream reordering in JPEG 2000, except for the use of faster hardware, no solution can be readily offered, due to the need for a significant number of byte-copy operations. Therefore, we recommend avoiding bitstream reordering by relying on in-place exploitation of combined scalability. Finally, this section also evaluates a SAX-based approach for the pipelined execution of the BSD transformation and binarization steps.

5.1. Enhanced BSD-based adaptation for HD Photo

5.1.1. Compact BSD creation

As discussed in Section 4.3, BSD creation for HD Photo is time-consuming and memory-intensive when making use of BintoBSD in a conventional way. BFlavor is a solution for the efficient generation of BSDs. However, the use of BFlavor introduces an external tool in the content adaptation chain. In this section, we propose a second solution that relies on a more intelligent use of BintoBSD and the BSD transformation step, thus no longer requiring BFlavor. The proposed solution is based on the following two observations:

1. a BSD typically contains two classes of information: information required for guiding the adaptation process (i.e., for steering the BSD transformation step) and information required for correctly guiding the binarization process (i.e., for steering BSDtoBin);

2. BintoBSD creates information for BSD transformation and subsequent binarization at once, although information needed by BSDtoBin is not necessarily required during BSD transformation.

As for HD Photo, information for guiding the BSD transformation step is stored in the main header and in the index table of a compressed bitstream. Parsing this information only requires the execution of a limited number of rather straightforward XPath expressions, thus not resulting in performance issues for BintoBSD. On the other hand, information needed by the binarization process is essentially stored in the tile descriptions, containing tile header and payload information. Parsing tile descriptions by BintoBSD is a time-consuming and memory-intensive process, due to the use of XPath expressions for querying the index table. However, tile header and payload information is also implicitly available in the main header and the index table of an HD Photo bitstream, albeit in a format that cannot be readily understood by BSDtoBin. Therefore, the following solution is proposed for the efficient creation of BSDs for HD Photo bitstreams by BintoBSD:

• The main header and the index table of an HD Photo bitstream are translated into an XML description by BintoBSD.

• A BSD post-processing step is introduced, responsible for creating tile descriptions after BSD transformation. This is illustrated in Fig. 8. In particular, tile header and payload information, implicitly available in the main header and the index table of an HD Photo bitstream, is rewritten during the post-processing step into a format that can be understood by BSDtoBin. The post-processing step, which guarantees a correct functioning of the BSDtoBin Parser, can for instance be implemented by an additional stylesheet that is to be executed after the actual BSD transformation.

• A BSD pre-processing step is introduced, responsible for duplicating the tile bitstream offsets stored in the index table of an HD Photo bitstream, as well as for adding information about the length of each tile to the index table. This is illustrated in Fig. 9. Information about the original tile bitstream offsets, changed during the actual BSD transformation step, and the length of each tile is needed by the post-processing step in order to generate tile descriptions that still describe the original HD Photo bitstream. The pre-processing step, which guarantees a correct functioning of the post-processing step, can for instance be implemented by an additional stylesheet that is to be executed before the actual BSD transformation.

The BSD that is the output of pre-processing contains a minimal amount of information to guarantee a correct BSD transformation step. This XML description is further referred to as a compact BSD. A BSD as used in Section 4.3 is further referred to as a complete BSD.
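The interplay of the pre-processing, transformation, and post-processing steps can be sketched as follows. This Python example is a simplified stand-in, not the STX stylesheets used in this paper: the dictionary keys, function names, and offsets are hypothetical, headers are ignored, and absent Flexbits tiles (zero offsets) are not handled. It shows why the original offsets and the tile lengths must be recorded before the transformation rewrites the index table.

```python
def preprocess(offsets, payload_size):
    """Pre-processing: duplicate each tile offset and add the tile length."""
    entries = []
    for i, offset in enumerate(offsets):
        end = offsets[i + 1] if i + 1 < len(offsets) else payload_size
        entries.append({"tile": i, "offset": offset,
                        "orig_offset": offset, "length": end - offset})
    return entries


def transform(entries, keep_tiles):
    """Stand-in for the BSD transformation: keep some tiles and rewrite their offsets."""
    adapted, new_offset = [], 0
    for entry in entries:
        if entry["tile"] in keep_tiles:
            adapted.append({**entry, "offset": new_offset})
            new_offset += entry["length"]
    return adapted


def postprocess(entries):
    """Post-processing: derive tile descriptions that still point into the original bitstream."""
    return [(entry["orig_offset"], entry["length"]) for entry in entries]


def binarize(payload, tile_descriptions):
    """Stand-in for BSDtoBin: copy the described byte ranges from the original payload."""
    return b"".join(payload[start:start + length] for start, length in tile_descriptions)


payload = bytes(range(50))                      # synthetic tile payload data
compact = preprocess([0, 10, 25, 40], len(payload))
adapted = transform(compact, keep_tiles={1, 3})  # e.g., a tile-aligned ROI
tiles = postprocess(adapted)
output = binarize(payload, tiles)
print(tiles, len(output))  # prints "[(10, 15), (40, 10)] 25"
```

After the transformation, the `offset` fields describe the adapted bitstream (0 and 15 here), while `orig_offset` and `length` still locate the tile data in the original bitstream, which is what the binarization step needs.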


Fig. 8. Post-processing for HD Photo.

Fig. 9. Pre-processing for HD Photo.


5.1.2. Qualitative and quantitative evaluation

The proposed solution for BSD creation for HD Photo has three advantages, which are listed below. The first two advantages are experimentally quantified, and compared to the performance results of Section 4.3 (in which complete BSDs are used for adaptation purposes).

1. BSD creation can be done in an efficient way without requiring the use of additional tools such as BFlavor. As a representative example, for the Breeze test image containing tiles with a size of 4 × 4 macroblocks, BintoBSD generates an XML representation of the main header and the index table in 1.2 s, using 0.2 MB of system memory, while the subsequent pre-processing step is executed in 0.6 s, using 1.5 MB of system memory (the execution of the post-processing step requires 0.9 s and 2.1 MB of system memory). On the other hand, generating a complete BSD with BintoBSD required 222 s and 3.6 MB of system memory, while BFlavor generated a complete BSD in 1.6 s, consuming 0.8 MB of system memory. These observations are also summarized in Table 4.

2. Throughout a major part of the content adaptation chain, compact BSDs are used with a file size that is significantly smaller. Indeed, during BSD transformation, a compact BSD only contains an XML representation of the main header and an extended version of the index table. It is up to the post-processing step to transform a compact BSD into a complete BSD (i.e., to add tile descriptions to a compact BSD), which can then be used by BSDtoBin. For the Breeze test bitstreams used, we have observed that the file size overhead of the textual and the compressed BSDs decreases by a factor of 3.6 and 1.6 on average, respectively (as shown in Table 5).

3. Stylesheets written for transforming complete BSDs can be reused without any modification for transforming compact BSDs. The main difference in their execution lies in the fact that templates matching on tile descriptions are not applied during the transformation of compact BSDs.

Table 4. Performance of BSD creation for complete and compact BSDs (only using BintoBSD and/or STX).

| | Execution time (s), complete | Execution time (s), compact | Memory consumption (MB), complete | Memory consumption (MB), compact |
|---|---|---|---|---|
| HD Photo | 222 | 1.8 | 3.6 | 1.7 |
| JPEG 2000 | 4.2 | 4.9 | 1.4 | 1.8 |

Table 5. File size reduction factor (numbers between brackets denote standard deviation).

| | Textual BSD size | Compressed BSD size |
|---|---|---|
| HD Photo | 3.6 (0.7) | 1.6 (0.3) |
| JPEG 2000 | 4.0 (0.4) | 1.4 (0.1) |

Table 6. Performance of BSD transformation and binarization for complete and compact BSDs, using separate and pipelined execution.

| | Execution time (s), complete + separate | Execution time (s), compact + pipelined | Memory consumption (MB), complete + separate | Memory consumption (MB), compact + pipelined |
|---|---|---|---|---|
| HD Photo | 2.3 | 1.5 | 4.0 | 2.8 |
| JPEG 2000 | 4.2 | 1.4 | 4.0 | 2.2 |

The proposed solution for BSD creation for HD Photo has three disadvantages.

1. The construction of tile descriptions during the post-processing step assumes that implicit knowledge is available about the bitstream order of the frequency tiles. As for the HD Photo bitstreams used in our research, the frequency tiles in a particular spatial tile are always ordered as follows (default output of the HD Photo encoder available in Microsoft’s HD Photo Device Porting Kit 1.0): DC, LP, HP, and Flexbits.

2. Additional complexity is added to the BSD transformation step. Indeed, a pre-processing step is needed for adding information to the index table, while a post-processing step is needed to construct tile descriptions. However, similar to BintoBSD, the pre-processing step can be executed offline for the application scenario discussed in Section 4.1 (i.e., the generation of a compact BSD can be completely done offline). Also, the BSD post-processing step can be pipelined with the BSD transformation step and BSDtoBin (see further in this section).

3. The stylesheets implementing the pre- and post-processing step are both format-specific, preventing their integration in BintoBSD and BSDtoBin, respectively. Nonetheless, the format-specific stylesheets are still executed in a format-agnostic way by the underlying transformation engine.

To quantify the complexity of the pipelined execution of BSD transformation, post-processing, and binarization, a program was written on top of the Java API for XML Processing (JAXP; [24]), which includes a SAX package that enables the chain-linking of several transformer objects into a SAX filter pipeline [25]. This program acts as a wrapper for Joost, responsible for BSD transformation and post-processing, and BSDtoBin, which minimizes the communication overhead in the content adaptation chain (see [16,26]).

Using our wrapper, BSD transformation, post-processing, and binarization can be done in less than 1.5 s for the Breeze test image containing tiles with a size of 4 × 4 macroblocks, consuming less than 2.8 MB of system memory (see Table 6 for a comparison with the approach used in Section 4.3, using a separate BSD transformation and binarization step to process complete BSDs). The execution time of 1.5 s does not include the time necessary for initializing the SAX pipeline, as this step can be done offline (the SAX pipeline for HD Photo can be initialized in 1.5 s).
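The idea of chaining SAX stages, so that each stage consumes events as soon as the previous one emits them, can be illustrated with Python's standard `xml.sax` module. This is only an analogy to the JAXP-based wrapper described above, not its implementation: the classes `TileFilter` and `TileCollector`, the element and attribute names, and the toy description are all hypothetical.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class TileFilter(ContentHandler):
    """Stage 1 (transformation stand-in): forward events, dropping unwanted <tile> elements."""

    def __init__(self, downstream, keep_ids):
        super().__init__()
        self.downstream = downstream
        self.keep_ids = keep_ids
        self.skip_depth = 0  # greater than 0 while inside a dropped element

    def startElement(self, name, attrs):
        if self.skip_depth or (name == "tile" and attrs.get("id") not in self.keep_ids):
            self.skip_depth += 1
            return
        self.downstream.startElement(name, attrs)

    def endElement(self, name):
        if self.skip_depth:
            self.skip_depth -= 1
            return
        self.downstream.endElement(name)

    def characters(self, content):
        if not self.skip_depth:
            self.downstream.characters(content)


class TileCollector(ContentHandler):
    """Stage 2 (binarization stand-in): record the tiles that reach the end of the chain."""

    def __init__(self):
        super().__init__()
        self.tiles = []

    def startElement(self, name, attrs):
        if name == "tile":
            self.tiles.append((attrs["id"], int(attrs["length"])))


bsd = (b'<bitstream><header/><tile id="0" length="10"/>'
       b'<tile id="1" length="15"><payload/></tile>'
       b'<tile id="2" length="15"/></bitstream>')

collector = TileCollector()
xml.sax.parseString(bsd, TileFilter(collector, keep_ids={"0", "2"}))
print(collector.tiles)  # prints "[('0', 10), ('2', 15)]"
```

No intermediate document is materialized between the two stages, which is the property that keeps memory consumption low in a pipelined chain.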

5.2. Enhanced BSD-based adaptation for JPEG 2000

5.2.1. Compact BSD creation

As outlined in Section 4, the file size overhead of textual BSDs is significant for JPEG 2000. This observation triggered the question whether compact BSDs can also be meaningfully used in the context of JPEG 2000.

Similar to HD Photo, the use of compact BSDs in JPEG 2000 can be enabled by introducing a pre- and post-processing step in the BSD-based content adaptation chain. The pre-processing step is responsible for removing information in the tile and packet headers that is not necessary for the purpose of BSD transformation (as shown in Fig. 10), while the post-processing step regenerates the information removed during the pre-processing step and then updates the regenerated information in order to reflect the changes applied during the BSD transformation step.

Fig. 10. Tile and packet filtering for JPEG 2000.

5.2.2. Qualitative and quantitative evaluation

Throughout a major part of the content adaptation chain, compact BSDs can be used with a significantly smaller file size. For the Breeze test bitstreams used, the file size overhead of the textual and compressed BSDs decreases by a factor of 4.0 and 1.4 on average, respectively (as shown in Table 5). Table 4 summarizes the time and memory needed for creating compact BSDs by BintoBSD and the pre-processing step.

Using our Java-based wrapper, we could also observe that, for the Breeze test image containing tiles with a size of 12 × 12 macroblocks, BSD transformation, post-processing, and binarization can be done in less than 1.4 s, consuming less than 2.2 MB of system memory (see Table 6). Note that the execution time does not include the time necessary for initializing the SAX pipeline, as this step can be done offline (the SAX pipeline for JPEG 2000 can be initialized in 1.8 s). Also, the execution time does not include the time needed for reordering the packets in a JPEG 2000 bitstream, which is further discussed in Section 5.3.

A disadvantage of the use of compact BSDs in the BSD-based content adaptation chain for JPEG 2000 is the added complexity, due to the need for a pre- and post-processing step. However, this added complexity is outweighed by the reduction in the overall computational complexity of the BSD-based content adaptation chain for JPEG 2000 (as shown in Table 6). Further, similar to HD Photo, the pre- and post-processing steps for JPEG 2000 are also format-specific.

5.3. Bitstream reordering in JPEG 2000

Initially, the application proposed in Section 4.1 relied on bitstream reordering to facilitate the exploitation of both spatial and quality scalability in JPEG 2000 bitstreams, for reasons explained in Section 4.2. However, as discussed in Section 4, bitstream reordering is an operation that is computationally expensive, at the level of the STX stylesheet (due to the use of a multi-pass algorithm for reordering the packets of each spatial tile, where the number of passes depends on the number of quality layers), as well as at the level of BSDtoBin (due to a high number of out-of-order byte-copy operations, requiring a significant amount of processor-bound I/O).

At the level of the BSD transformation step, two solutions are possible to avoid the use of a multi-pass algorithm. First, an upper bound for the number of quality layers can be hard-coded in the STX stylesheet. Second, in-place exploitation of spatial scalability can be used for a JPEG 2000 bitstream having the progression by perceptual quality order, requiring the (straightforward) computation of the scalability coordinates for each packet. The first method has the disadvantage that it limits the number of quality layers in a JPEG 2000 bitstream. The second method has the disadvantage that only one progression order can be used (i.e., progression by perceptual quality). Further, as for the first method, BSDtoBin still has to cope with a stream of packets that are out-of-order, resulting in inefficient bitstream reordering, while in the second method, BSDtoBin only has to deal with a stream of in-order packets.
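The computation of scalability coordinates for the second method can be sketched as follows. The example assumes a layer-resolution-component-position nesting of the packets within a tile (which we take to correspond to the quality-progressive order mentioned above) and, as a further simplification, the same number of precincts at every resolution level. The function names and the parameter values are ours and purely illustrative.

```python
def packet_coordinates(k, num_res, num_comp, num_prec):
    """Map the k-th packet of a tile to (layer, resolution, component, precinct)."""
    layer, rest = divmod(k, num_res * num_comp * num_prec)
    res, rest = divmod(rest, num_comp * num_prec)
    comp, prec = divmod(rest, num_prec)
    return layer, res, comp, prec


def keep_in_place(num_packets, num_res, num_comp, num_prec, max_layer, max_res):
    """Select the packets to keep without changing their order in the bitstream."""
    kept = []
    for k in range(num_packets):
        layer, res, _, _ = packet_coordinates(k, num_res, num_comp, num_prec)
        if layer <= max_layer and res <= max_res:
            kept.append(k)
    return kept


# Toy tile: 2 quality layers, 3 resolution levels, 1 component, 1 precinct.
kept = keep_in_place(num_packets=6, num_res=3, num_comp=1, num_prec=1,
                     max_layer=1, max_res=1)
print(kept)  # prints "[0, 1, 3, 4]"
```

The kept packet indices are strictly increasing, so a binarizer only performs in-order copies; dropping the highest resolution level does not require moving any packet.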

To summarize, for the application scenario presented in Section 4.1, in-place exploitation of spatial scalability is the most efficient way to exploit both quality and spatial scalability in a single JPEG 2000 bitstream, avoiding the need for bitstream reordering. As an example, for the Breeze test image containing tiles with a size of 12 × 12 macroblocks, in-place exploitation of spatial scalability can be realized in less than 1.3 s, consuming less than 1.8 MB of system memory, when making use of compact BSDs and our Java-based wrapper for Joost and BSDtoBin. On the other hand, using the same settings, only applying bitstream reordering requires an execution time of 10.0 s and a system memory consumption of 2.3 MB (relying on an STX stylesheet that contains an upper bound for the number of quality layers).

6. Directions for future work

Future research might investigate the use of XPath variables in the context of HD Photo. XPath variables were introduced as an optional feature in BSDL during a recent revision of the language by MPEG. These variables may allow for a more efficient handling of HD Photo’s index table, as BintoBSD can then rely on an internal data model that is similar to the data model used by BFlavor. However, at the time of writing, support for XPath variables had not yet been integrated in the BSDL reference software by its proponents. Also, the use of XPath variables would only solve the problem of the slow creation of BSDs for HD Photo; issues related to the file size overhead of the textual BSDs would remain.

Future research may also investigate the usefulness of the proposed BSD pre- and post-processing steps in the context of other coding formats. Finally, future research may also consider application scenarios requiring higher resolution image content, such as a scenario involving geospatial content.

7. Conclusions

This paper discussed the integration of HD Photo and JPEG 2000 in a format-independent adaptation system using MPEG-B BSDL and STX. As BSDL aims at supporting high-level, editing-style adaptation operations in the compressed domain, only a subset of the adaptivity provisions in HD Photo and JPEG 2000 could be exploited, in particular quality and spatial scalability, as well as interactive and tile-aligned ROI extraction. For JPEG 2000, BSDL also allows exploiting color component scalability and changing the bitstream progression order. However, low-level operations in the compressed domain, such as rotating an image, cannot be supported by BSDL, due to a lack of support for entropy decoding and data reordering at the level of macroblocks (HD Photo) and codeblocks (JPEG 2000).

Our research also investigated the loss in coding efficiency when providing adaptivity in the compressed domain, as well as the complexity of the adaptation system used. In general, adaptivity in HD Photo and JPEG 2000, and in particular interactive ROI extraction, can be provided with an acceptable loss in coding efficiency by choosing an appropriate tile size during encoding: for the test images used, tile sizes of 4 × 4 and 12 × 12 macroblocks were appropriate for HD Photo and JPEG 2000, respectively, implying a loss in coding efficiency of less than 4%.

An initial analysis of the complexity of BSD-based adaptation for HD Photo and JPEG 2000 resulted in the identification of three bottlenecks: the slow creation of BSDs for HD Photo, the file size overhead of the BSDs for HD Photo and in particular JPEG 2000, and the computationally intensive nature of bitstream reordering for JPEG 2000, used for the combined exploitation of spatial and quality scalability in a single JPEG 2000 bitstream. To solve the problem of the unacceptably slow creation of BSDs for HD Photo by BintoBSD, we have proposed the use of compact BSDs, which contain minimal information for the purpose of content adaptation. The use of compact BSDs is enabled by an additional lightweight pre- and post-processing step in the overall content adaptation chain.

As a side effect, the use of compact BSDs also significantly reduced the file size overhead of the textual XML descriptions for HD Photo and JPEG 2000. In particular, thanks to the use of compact BSDs, we have observed that the file size overhead of textual BSDs decreases by a factor of 3.8 on average, while the overhead of the compressed BSDs decreases by a factor of 1.5 on average. As for the computationally intensive nature of bitstream reordering in JPEG 2000, except for the use of faster hardware, no solution could be readily offered in this paper, due to the need for a significant number of byte-copy operations. Therefore, we recommend avoiding bitstream reordering by relying on in-place exploitation of combined scalability.
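As a quick consistency check, the overall averages quoted above appear to follow from the per-format reduction factors in Table 5. The sketch below assumes that the overall figures are plain means over the two coding formats, which is our reading rather than something stated explicitly.

```python
# Reduction factors per coding format, as listed in Table 5.
textual = {"HD Photo": 3.6, "JPEG 2000": 4.0}
compressed = {"HD Photo": 1.6, "JPEG 2000": 1.4}

avg_textual = round(sum(textual.values()) / len(textual), 1)
avg_compressed = round(sum(compressed.values()) / len(compressed), 1)
print(avg_textual, avg_compressed)  # prints "3.8 1.5"
```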

Finally, this paper also briefly discussed and evaluated a SAX-based content adaptation chain for HD Photo and JPEG 2000, allowing a pipelined and on-the-fly execution of the BSD transformation and binarization steps. To summarize, by making use of compact BSDs and a SAX-based content adaptation chain, all adaptivity tools in HD Photo and JPEG 2000 could be exploited in less than 1.5 s, consuming less than 2.8 MB of system memory. It should be noted that the results obtained in this paper depend on the hardware and software platform used, and that these results might be the subject of further optimization. Moreover, the overall complexity of the adaptation system used can be further reduced by relaxing the trade-off between the tile size granularity, the coded bitstream overhead, and the overhead of the textual and compressed BSDs.

Fig. 11. Excerpt from the BS Schema for the HD Photo coding format, formally describing the image header, image plane header, and index table syntax structures in BSDL (context management attributes omitted for simplicity).

Acknowledgments

The research activities that have been described in this paper were funded by Ghent University (UGent), the Information and Communications University (ICU – merged with the Korea Advanced Institute of Science and Technology (KAIST) as of March 2009), the Interdisciplinary Institute for Broadband Technology (IBBT), and the Brain Korea 21 (BK21) program of the Korean Ministry of Education, Science and Technology.

Appendix A. BS Schema and BSD for HD Photo

In this appendix, part of the BS Schema for HD Photo is listed in Fig. 11, while a BSD excerpt can be found in Fig. 12.

Fig. 12. Excerpt from a BSD for a particular HD Photo bitstream, describing instances of the image header, image plane header, and index table syntax structures. Two tile instances are also described.

References

[1] J. Magalhaes, F. Pereira, Using MPEG standards for multimedia customization, Signal Processing: Image Communication 19 (5) (2004) 437–456.

[2] A. Vetro, C. Christopoulos, T. Ebrahimi, Universal multimedia access, IEEE Signal Processing Magazine 20 (2) (2003) 16.

[3] A. Vetro, C. Timmerer, Digital item adaptation: overview of standardization and research activities, IEEE Transactions on Multimedia 7 (3) (2005) 418–426.

[4] ISO/IEC JTC 1, Information technology - JPEG 2000 image coding system: core coding system.

[5] ISO/IEC JTC 1, Information technology - MPEG systems technologies - Part 5: Bitstream Syntax Description Language (BSDL), ISO/IEC 23001-5:2008, 2008.

[6] G. Panis, A. Hutter, J. Heuer, H. Hellwagner, H. Kosch, C. Timmerer, S. Devillers, M. Amielh, Bitstream syntax description: a tool for multimedia resource adaptation within MPEG-21, Signal Processing: Image Communication 18 (8) (2003) 721–747.

[7] A. Skodras, C. Christopoulos, T. Ebrahimi, The JPEG 2000 still image compression standard, IEEE Signal Processing Magazine 18 (5) (2001) 36–58.

[8] D.S. Taubman, M.W. Marcellin, JPEG2000: Image Compression Fundamentals, Standards, and Practice, Kluwer Academic Publishers, US-MA, USA, 2001.

[9] S. Srinivasan, C. Tu, S.L. Regunathan, G.J. Sullivan, HD Photo: a new image coding technology for digital photography, in: Proceedings of SPIE, vol. 6696, San Diego, US-CA, USA, 2007.

[10] M.W. Marcellin, M.J. Gormish, A. Bilgin, M.P. Boliek, An overview of JPEG-2000, in: Proceedings of IEEE Data Compression Conference, Snowbird, Utah, USA, 2000, pp. 523–541.

[11] M. Amielh, S. Devillers, Bitstream Syntax Description Language: application of XML-schema to multimedia content adaptation, in: WWW2002: The 11th International World Wide Web Conference, Honolulu, Hawaii, USA, 2002, available on <http://www2002.org/CDROM/alternate/334/>.

[12] J. Clark, XSL Transformations (XSLT) version 1.0, W3C Recommendation, W3C, November 1999.

[13] P. Cimprich et al., Streaming Transformations for XML (STX), Technical Report, 2004, available on <http://stx.sourceforge.net/documents/spec-stx-20040701.html>.

[14] ISO/IEC JTC 1, Information technology - multimedia framework (MPEG-21) - Part 7: digital item adaptation, ISO/IEC 21000-7:2004, 2004.

[15] J. Thomas-Kerr, I. Burnett, C. Ritz, Format-independent rich media delivery using the bitstream binding language, IEEE Transactions on Multimedia 10 (3) (2008) 514–522.

[16] S. Devillers, C. Timmerer, J. Heuer, H. Hellwagner, Bitstream syntax description-based adaptation in streaming and constrained environments, IEEE Transactions on Multimedia 7 (3) (2005) 463–470.

[17] H. Li, Y. Wang, C. Chen, An attention-information-based spatial adaptation framework for browsing videos via mobile devices, EURASIP Journal on Advances in Signal Processing 2007 (20) (2007) 1–12.

[18] Y. Hu, L.-T. Chia, D. Rajan, JPEG2000 Image Adaptation for MPEG-21 Digital Items, Lecture Notes in Computer Science - Advances in Multimedia Information Processing - PCM 2004, vol. 3331, 2004, pp. 470–477.

[19] D. De Schrijver, W. De Neve, K. De Wolf, R. De Sutter, R. Van deWalle, An optimized MPEG-21 BSDL framework for the adaptationof scalable bitstreams, Journal of Visual Communication & ImageRepresentation 18 (3) (2007) 217–239.

[20] Kakadu Software, available on hhttp://www.kakadusoftware.com/i.[21] HD Photo Device Porting Kit 1.0, available on hhttp://www.

microsoft.com/whdc/xps/hdphotodpk.mspxi.[22] J. Clark, S. DeRose, XML path language (Version 1.0), W3C

recommendation, W3C, November 1999.[23] W. De Neve, D. Van Deursen, D. De Schrijver, S. Lerouge, K. De Wolf,

R. Van de Walle, BFlavor: a harmonized approach to media resourceadaptation, inspired by MPEG-21 BSDL and XFlavor, Signal Proces-sing: Image Communication 21 (10) (2006) 862–889.

[24] JAXP Reference Implementation Project, available on hhttp://jaxp.dev.java.net/i.

[25] O. Becker, Extended SAX filter processing with STX, available onhhttp://www2.informatik.hu-berlin.de/�obecker/Docs/EML2003/script.html/i.

[26] D. Van Deursen, S. De Bruyne, W. Van Lancker, W. De Neve, D. DeSchrijver, H. Hellwagner, R. Van de Walle, MuMiVA: a multimediadelivery platform using format-agnostic, XML-driven contentadaptation, in: P. Kellenberger (Ed.), Proceedings of the 9thInternational Symposium on Multimedia, IEEE Computer Society,Los Alamitos, CA, USA, 2007, pp. 131–138.