Real-time reference A-line subtraction and saturation artifact removal using graphics processing unit for high-frame-rate Fourier-domain optical coherence tomography video imaging

Yong Huang, Jin U. Kang

Downloaded From: http://opticalengineering.spiedigitallibrary.org/ on 09/12/2012 Terms of Use: http://spiedl.org/terms


Real-time reference A-line subtraction and saturation artifact removal using graphics processing unit for high-frame-rate Fourier-domain optical coherence tomography video imaging

Yong Huang
Jin U. Kang
The Johns Hopkins University
Department of Electrical and Computer Engineering
Barton Hall 402, 3400 North Charles Street
Baltimore, Maryland 21210
E-mail: [email protected]

Abstract. Variations in the spectral shape and the amplitude of the optical coherence tomography (OCT) signal and reference cause fixed-pattern noise, and light reflected from a highly specular surface can cause saturation artifacts. In real-time video-rate OCT imaging, these effects make the OCT video image appear unstable and difficult to view. To eliminate these problems, we implemented real-time reference A-line subtraction and saturation detection and correction, frame by frame, on standard Fourier-domain optical coherence tomography (FD-OCT) video imaging. This real-time OCT data processing method eliminates the need for a physical reference measurement procedure and automatically detects and corrects any saturated A-scans within a frame. The technique is also robust to reference and signal amplitude variations and provides a higher signal-to-noise ratio than the usual fixed-reference subtraction method. To implement an effective interventional OCT imaging system, the technique was integrated with other graphics processing unit-based OCT processing techniques (resampling, dispersion compensation, fast Fourier transform, log scaling, and soft thresholding). Real-time fixed-pattern-artifact-free FD-OCT imaging was achieved at 70 frames/s for a frame size of 1000 (lateral) by 1024 (axial) pixels. The theoretical maximum processing and rendering rate was measured to be 266,000 A-scans/s. © 2012 Society of Photo-Optical Instrumentation Engineers (SPIE). [DOI: 10.1117/1.OE.51.7.073203]

Subject terms: optical coherence tomography; interventional imaging; reference fixed pattern artifact removal; saturation A-scan correction; graphics processing unit.

Paper 120383 received Mar. 13, 2012; revised manuscript received Apr. 19, 2012; accepted for publication May 21, 2012; published online Jul. 6, 2012.

1 Introduction

Fourier-domain optical coherence tomography (FD-OCT) is a high-speed, high-resolution, 3-D imaging modality widely used in biomedical imaging. Real-time image processing and display are required for optical coherence tomography (OCT) to find applications in the interventional imaging area. The fixed-pattern-noise artifact, which forms strong erroneous horizontal lines lying over the image, is among the most common noise sources in FD-OCT.1

The fixed-pattern-noise artifact can be removed if the reference spectrum of the imaging frame is known. Therefore, in the case of high-resolution OCT imaging of a fixed site, simple subtraction of the reference spectrum from the OCT signal spectra works very effectively. However, the source spectrum shape varies over time; the OCT signal level and the spectra vary frame by frame; and the optical phase and polarization can also vary frame to frame. An effective signal processing method that removes, in real time, the DC levels caused by these changes, and thus the fixed pattern noise, is therefore of great importance in improving the quality of OCT images.2,3 Periodically measuring and verifying the reference spectrum during OCT video imaging, however, is highly inconvenient and impractical.

Saturation artifacts occur when light reflected back from a highly specular surface generates signals that exceed the dynamic range of the data acquisition system.4 It is not uncommon to see OCT images corrupted by saturation artifacts, for example in cornea imaging,5,6 intracoronary imaging,4 and finger pad imaging.7 Real-time removal of the saturation artifacts will increase diagnostic and interventional accuracy. Owing to the tremendous parallel computing power of the graphics processing unit (GPU), GPU-based OCT processing techniques have recently been shown to be highly effective in accelerating a wide range of OCT signal processing methods, such as the fast Fourier transform (FFT), resampling, discrete Fourier transform, dispersion compensation, and 3-D rendering, for high-speed, high-quality, real-time FD-OCT imaging.8–13 In this work, we developed a GPU-optimized algorithm based on the minimum-variance mean-line subtraction (MVMS) method, first proposed by Moon et al.,2 to remove the reference fixed pattern artifact. To address a commonly encountered problem of large fluctuations in signal intensity, we also developed and incorporated an algorithm that performs automatic detection of saturated A-scan lines and their correction using linear interpolation of adjacent normal A-scan lines. We optimized both methods for the GPU architecture and successfully integrated them with existing GPU-based OCT signal processing to demonstrate real-time, fixed-pattern-noise-free FD-OCT video imaging with automatic saturation artifact correction at 70 frames/s for a frame size of 1000 (lateral) × 1024 (axial) pixels. The theoretical maximum processing and rendering rate was calculated to be 266,000 A-scans/s.

0091-3286/2012/$25.00 © 2012 SPIE

Optical Engineering 51(7), 073203 (July 2012)
Optical Engineering 073203-1 July 2012/Vol. 51(7)

2 Methods

The proposed method was demonstrated using an in-house-developed spectral-domain OCT system. The system configuration is shown in Fig. 1. The customized spectrometer was built using a 12-bit, 70 kHz, 2048-pixel CCD line-scan camera (EM4, e2v, USA) as the detector. A superluminescent diode (SLED) light source with an output power of 10 mW and an effective bandwidth of 105 nm centered at 845 nm, giving a theoretical axial resolution of 3.0 μm in air, was used for the experiment. A quad-core 2.4 GHz Dell Precision T7500 workstation was used to host a frame grabber (NI PCIe-1429, PCIE-x4 interface) and an NVIDIA GeForce GTX 590 GPU (PCIE-x16 interface, 32 stream multiprocessors, 1024 cores at 1.21 GHz, 3 GB graphics memory). Residual dispersion of the system caused by the components L1, L2, L3, SL, and the optical fibers was compensated numerically through a GPU process described in Ref. 12. All data acquisition, image processing, and rendering were performed on this multi-thread, CPU-GPU heterogeneous computing system. A customized user interface was designed and programmed in C++ (Microsoft Visual Studio 2008). Compute unified device architecture (CUDA) 4.0 from NVIDIA (Santa Clara, California) was used to program the GPU for general-purpose computation.14

Figure 2 shows the data processing flowchart of the OCT system. The acquired data (16-bit unsigned integer format) from the frame grabber were stored in a host pinned (page-locked) memory buffer before GPU processing. This eliminated the need to copy the data to the GPU global memory, using the mapped-memory method supported by CUDA 4.0. The GPU processing included cubic resampling, saturation detection, dispersion compensation, FFT, MVMS, log scaling, soft thresholding, and, if necessary, saturation correction to form the final OCT image data. The image data were then copied back to the host memory for display and saving. Details of the implementation of the signal processing other than MVMS and saturation detection and correction can be found in Refs. 8–13.

In the MVMS scheme, the segment with the minimum variance is selected, and its mean value $\mu_\Omega(z)$ is assigned as the reference A-line value at that axial position $z$:

$$\mu_\Omega(z) = \frac{1}{L} \sum_{l \in \Omega} g_l(z), \tag{1}$$

where $\Omega$ represents a segment of the A-line data and $L$ is the number of data points in $\Omega$. $g_l(z)$ is a complex A-line data point obtained after the Fourier transform of the cubic-spline-interpolated spectrum. The segmental variance $v_\Omega$ is defined as

$$v_\Omega = \frac{1}{L} \sum_{l \in \Omega} \left(\mathrm{Re}\{g_l\} - \mathrm{Re}\{\mu_\Omega\}\right)^2 + \frac{1}{L} \sum_{l \in \Omega} \left(\mathrm{Im}\{g_l\} - \mathrm{Im}\{\mu_\Omega\}\right)^2. \tag{2}$$

The mean of the segment whose variance is the minimum along a horizontal line is assigned to the reference A-line for the axial position $z$.2
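As a minimal illustration of Eqs. (1) and (2), the following CPU sketch (plain C++ for clarity, not the authors' GPU kernel; the function name and layout are ours) computes each segment's mean and variance for one axial position and returns the minimum-variance segment's mean as the reference value at that depth:

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <limits>
#include <vector>

// For one axial position z, `lineValues` holds g_l(z) for every lateral
// A-line l. The lateral range is split into `numSegments` equal segments
// (assumed to divide the size evenly); each segment's mean (Eq. 1) and
// variance (Eq. 2) are computed, and the mean of the minimum-variance
// segment is returned as the reference value for this depth.
std::complex<float> mvmsReferenceAtDepth(
        const std::vector<std::complex<float>>& lineValues,
        std::size_t numSegments) {
    const std::size_t L = lineValues.size() / numSegments; // points per segment
    std::complex<float> best{0.0f, 0.0f};
    float bestVar = std::numeric_limits<float>::max();
    for (std::size_t s = 0; s < numSegments; ++s) {
        std::complex<float> mean{0.0f, 0.0f};
        for (std::size_t l = 0; l < L; ++l) mean += lineValues[s * L + l];
        mean /= static_cast<float>(L);                       // Eq. (1)
        float var = 0.0f;
        for (std::size_t l = 0; l < L; ++l) {                // Eq. (2)
            const std::complex<float> d = lineValues[s * L + l] - mean;
            var += d.real() * d.real() + d.imag() * d.imag();
        }
        var /= static_cast<float>(L);
        if (var < bestVar) { bestVar = var; best = mean; }
    }
    return best;
}
```

Running this once per depth $z$ and stacking the results gives the full reference A-line that is later subtracted from the frame.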

To implement this method in real time on a GPU architecture, a customized data struct composed of three floats, holding the mean value of the real part, the mean value of the imaginary part, and the variance, respectively, was used. The reference A-line is represented by an array of these struct units whose length is half the number of data points in one A-scan spectrum. After a reference A-line was obtained, it was subtracted from the image in the later log-scaling and soft-thresholding step. To make Eqs. (1) and (2) more suitable for GPU thread execution, a recursion method was used to minimize thread global-memory read operations, as shown in Eqs. (3) and (4):

$$\mu_{l+1}(z) = \mu_l(z) + \frac{1}{l+1}\left[g_{l+1}(z) - \mu_l(z)\right], \tag{3}$$

Fig. 1 System configuration. C: 50–50 broadband fiber coupler; L1, L2, L3: achromatic lenses; M: mirror; GVS: galvanometer pair; PC: polarization controller; SL: scanning lens.

Fig. 2 Data processing flow chart for standard FD-OCT. Dashed arrows: thread triggering; solid arrows: data stream. Here, all GPU memory buffers were allocated in global memory. Thread 1 controls the data acquisition, thread 2 controls the data processing (all done on the GPU), and thread 3 controls the image display. Dashed squares indicate the task of each thread.


Huang and Kang: Real-time reference A-line subtraction and saturation artifact removal…


$$v_{l+1}^2(z) = v_l^2(z) + \frac{1}{l+1}\left[\frac{l}{l+1}\left(g_{l+1}(z) - \mu_l(z)\right)^2 - v_l^2(z)\right]. \tag{4}$$

Increasing the index l one by one leads to the final variance and mean value of that segment. The recursion method decreases the memory read operations by half. The algorithm was further optimized through the block and grid design of the GPU kernel launch and through optimization of the number of segments for a given number of lateral A-scans per frame. For a frame with 1000 lateral pixels, a block size of 128 threads and a grid size of 128 were found to be optimal. This way, each stream multiprocessor has eight blocks running simultaneously. A segment count of 10 was found to be optimal, based on comparing kernel launch configurations with 5, 10, and 20 segments.
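The three-float per-depth struct and the recursion of Eqs. (3) and (4) can be sketched as follows (a plain C++ sketch of the update step, not the actual CUDA kernel; names are ours). Note that the variance must be updated before the mean, since Eq. (4) uses the previous mean $\mu_l$:

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>

// Per-depth accumulator matching the three-float struct in the text:
// mean of the real part, mean of the imaginary part, and running variance.
struct RefAccumulator {
    float meanRe = 0.0f;
    float meanIm = 0.0f;
    float var    = 0.0f;

    // Fold in the (l+1)-th point g using Eqs. (3) and (4), where l is the
    // number of points accumulated so far; each point is read from global
    // memory exactly once.
    void update(std::size_t l, std::complex<float> g) {
        const float n   = static_cast<float>(l + 1);
        const float dRe = g.real() - meanRe;
        const float dIm = g.imag() - meanIm;
        // Eq. (4): update variance first, using the previous mean.
        var += ((static_cast<float>(l) / n) * (dRe * dRe + dIm * dIm) - var) / n;
        // Eq. (3): then update the running means.
        meanRe += dRe / n;
        meanIm += dIm / n;
    }
};
```

After the last point of a segment has been folded in, `meanRe`, `meanIm`, and `var` equal the batch results of Eqs. (1) and (2) for that segment, up to floating-point rounding.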

The saturation detection process was combined with the cubic spline interpolation to make the signal processing seamless. Since the spectrometer's dynamic range is 12 bits, its outputs are values between 0 and 4095; an unsaturated spectrum should therefore have all values less than 4095. While each A-line spectrum point is being calculated, the count of points within that A-scan whose value equals 4095 is updated. If an A-scan contains more than two saturated points, that A-line spectrum is marked as saturated by setting the corresponding status flag to false in an array that records the status of each A-line. The saturation correction was performed in the spatial image domain after log rescaling and soft thresholding. The kernel function checks the status of each A-line of the processed frame. If an A-line is marked false, the program searches for the nearest upper and lower A-lines that are marked true, and the saturated A-line is then corrected by linear interpolation between them.
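The detection and correction logic above can be sketched as follows (plain C++, not the GPU kernel; function names are ours, and the threshold of more than two samples at the 12-bit ceiling of 4095 follows the text):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Mark an A-line as saturated when more than two of its spectrum samples
// sit at the 12-bit ADC ceiling (4095).
bool isSaturated(const std::vector<uint16_t>& spectrum) {
    int atCeiling = 0;
    for (uint16_t s : spectrum)
        if (s == 4095) ++atCeiling;
    return atCeiling > 2;
}

// In the processed image (one vector of depth samples per lateral A-line),
// replace each A-line whose status flag is false by linearly interpolating
// between the nearest unsaturated neighbours on either side.
void correctSaturatedLines(std::vector<std::vector<float>>& image,
                           const std::vector<bool>& good) {
    const std::size_t n = image.size();
    for (std::size_t i = 0; i < n; ++i) {
        if (good[i]) continue;
        std::size_t lo = i, hi = i;
        while (lo > 0 && !good[lo]) --lo;       // nearest good line above
        while (hi + 1 < n && !good[hi]) ++hi;   // nearest good line below
        if (!good[lo] || !good[hi]) continue;   // no good pair: leave as-is
        const float w = static_cast<float>(i - lo) / static_cast<float>(hi - lo);
        for (std::size_t z = 0; z < image[i].size(); ++z)
            image[i][z] = (1.0f - w) * image[lo][z] + w * image[hi][z];
    }
}
```

As the text notes, this scheme assumes saturated lines are sparse; a run of flagged lines at the frame edge is simply left uncorrected here.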

3 Results and Discussion

CUDA Profiler 4.0 from the CUDA Toolkit 4.014 was used to analyze the time cost of each kernel function of our GPU program. We launched five sessions of our application; each session ran the application five times. All the following time costs are averages of the measured values, with standard deviations indicated.

The GPU processing time for MVMS based on Eqs. (1) and (2) was 440 ± 0.6 μs, and for Eqs. (3) and (4) it was 400 ± 0.6 μs, for an image frame size of 1024 × 1000. This comparison shows that the proposed recursive MVMS processing method improves the processing speed as expected, though the reduction in time cost is only about 10%. To verify the speedup over a CPU implementation, the same MVMS processing method based on Eqs. (3) and (4) was implemented on the workstation hosting the GPU, using C++ (Microsoft Visual Studio 2008) with the CPU as the only computing resource. An artificially generated frame of complex data of size 1024 × 1000 was used, and the CPU-based MVMS was performed with 10 segments of 100 points each. Running the same procedure 1000 times, the average time cost was measured to be 55.22 ± 0.54 ms. This shows that the GPU-based MVMS method provided a 138-fold speed enhancement over the CPU-based MVMS method.

The saturation detection step added very little computational burden to the cubic spline interpolation processing. We measured the time cost of the cubic spline interpolation alone and of the cubic spline interpolation combined with saturation detection: the additional time cost of the saturation detection was 90 ± 0.15 μs. The saturation correction step cost 36 ± 0.07 μs. Note that this time cost may vary depending on how many saturated A-lines there are within one frame. The total additional time cost for saturation-corrected imaging is only 126 ± 0.22 μs.

The time cost of all the system GPU kernel functions for a single image frame of size 1024 × 1000, corresponding to a spectrum data size of 2048 × 1000, is presented in the Fig. 3 pie chart. As seen from Fig. 3, cubic interpolation of the spectrum data was the main consumer of GPU computation, even though the cubic interpolation had already been optimized.12 By using pinned memory, the explicit data transfer from host to GPU is negligible. Note that different implementations and optimizations may yield different time-cost distributions. For our system, MVMS takes 10.5% of the total GPU computing time, and saturation detection and correction take 2.4% and 1%, respectively. The total GPU time is 3.76 ± 0.01 ms, corresponding to a maximum imaging speed of 266,000 A-lines/s, which is well above the camera imaging speed of 70,000 A-lines/s and leaves enough room for further GPU-accelerated functionality. Our software processing system could also benefit wavelength-swept-source OCT systems operating with a k-clock and other Fourier-domain OCT systems based on a k-sampled spectrometer: if the cubic interpolation can be omitted, the system could potentially run at a peak processing speed of 714,000 A-lines/s.
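The quoted throughput follows directly from the per-frame GPU time: 1000 A-lines processed in 3.76 ms gives roughly 266,000 A-lines/s. A one-line check of the arithmetic (the function name is ours):

```cpp
// Maximum sustained A-line rate from the total GPU time per frame:
// (lines per frame) / (seconds per frame).
double maxAlineRate(double linesPerFrame, double gpuMillisecondsPerFrame) {
    return linesPerFrame / (gpuMillisecondsPerFrame * 1e-3);
}
```

With 1000 lines and 3.76 ms, this evaluates to about 265,957 A-lines/s, which rounds to the reported 266,000.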

To demonstrate the effectiveness of the MVMS method, we imaged a layered polymer phantom and a human finger and analyzed the results, shown in Fig. 4. Figure 4(a) is an image without the reference subtraction; one can clearly see the strong horizontal line artifacts running across the image. Figure 4(b) is an image using MVMS processing, and Fig. 4(c) is an image obtained with physical reference subtraction. The physical reference was measured before the image acquisition by blocking the sample arm and was subtracted from the measured spectrum during the image acquisition process. We can clearly see that MVMS processing removes the fixed pattern artifact cleanly and performs better than physical reference subtraction in terms of image visibility. Further analysis shows that the image in Fig. 4(b) exhibits a 2 dB signal-to-noise ratio (SNR) improvement over the image in Fig. 4(c). The reason for this improvement is that MVMS, performed in real time, is more effective in removing the effect of the time-varying reference spectrum and other DC components during the image acquisition process. Since wavelength-dependent absorption and scattering from the imaging sample change the overall spectrum, a physical reference spectrum measured with no sample in the sample arm does not truly reflect the effective reference spectrum that should be subtracted. We also obtained video sequences of an in vivo human fingertip captured at 70 Hz with and without MVMS processing, shown in Fig. 4(d)–4(k). One can clearly see that the fixed-pattern artifact was effectively removed in Fig. 4(h)–4(k).

Fig. 3 Pie chart of the time cost of all system GPU kernel functions.

To further demonstrate the effectiveness of the saturation detection and correction, we imaged the layered polymer phantom again. The layered polymer phantom was chosen for its abundance of highly specular reflections along the scanning surface. We also increased the integration time of our system to make saturation occur more frequently. The result is shown in Fig. 5. The images are cropped to the first 100 pixels in the axial direction to clearly show the saturation artifacts. We scanned the same area of interest with and without the saturation detection and correction. Figure 5(a) is the saved image frame without the saturation detection and correction; the saturated areas are marked with arrows on the image. Figure 5(b) is the same image with the saturation detection and correction. We can see clearly that the saturation artifacts have been detected and removed. Here, the saturation correction was performed with linear interpolation. Switching to cubic interpolation would increase the interpolation accuracy at a greater time cost; cubic-interpolation-based saturation correction is currently being developed in our laboratory. Note that this method is limited to cases where the saturation artifacts are sparse. When there are large areas of continuous saturation, the interpolation would cause errors, and the system integration time and reference level need to be adjusted dynamically instead.

4 Conclusion

In conclusion, we implemented a real-time reference A-line subtraction and saturation detection and correction technique for high-frame-rate FD-OCT video imaging on a GPU architecture to remove fixed pattern noise and saturation artifacts. The method is shown to be highly effective in removing both types of artifact and can be easily integrated into GPU-based, real-time FD-OCT signal processing. Our system was configured to operate at 70 fps with a frame size of 1024 × 1000 and significantly improved the quality of the OCT video. The system is capable of operating at up to 266,000 A-lines/s and is being developed for interventional imaging of microsurgical procedures.

Acknowledgments

This work was supported in part by NIH grants 1R01EY021540-01A1 and R21 1R21NS063131-01A1. Yong Huang was partially supported by the China Scholarship Council (CSC).

References

1. R. A. Leitgeb and M. Wojtkowski, "Complex and coherent noise free Fourier domain optical coherence tomography," in Optical Coherence Tomography: Technology and Applications, W. Drexler and J. G. Fujimoto, Eds., pp. 177–207, Springer-Verlag, New York (2008).

2. S. Moon, S.-W. Lee, and Z. Chen, "Reference spectrum extraction and fixed-pattern noise removal in optical coherence tomography," Opt. Express 18(24), 24395–24404 (2010).

3. B. Hofer et al., "Artefact reduction for cell migration visualization using spectral domain optical coherence tomography," J. Biophotonics 4(5), 355–367 (2011).

4. H. G. Bezerra et al., "Intracoronary optical coherence tomography: a comprehensive review," JACC Cardiovasc. Interv. 2, 1035–1046 (2009).

5. A. Sanjay et al., "Detailed visualization of the anterior segment using Fourier-domain optical coherence tomography," Arch. Ophthalmol. 126(6), 765–771 (2008).

6. F. LaRocca et al., "Robust automatic segmentation of corneal layer boundaries in SDOCT images using graph theory and dynamic programming," Biomed. Opt. Express 2(6), 1524–1538 (2011).

7. M. A. Choma, K. Hsu, and J. A. Izatt, "Swept source optical coherence tomography using an all-fiber 1300-nm ring laser source," J. Biomed. Opt. 10(4), 044009 (2005).

8. Y. Watanabe and T. Itagaki, "Real-time display on Fourier domain optical coherence tomography system using a graphics processing unit," J. Biomed. Opt. 14(6), 060506 (2009).

9. S. Van Der Jeught, A. Bradu, and A. G. Podoleanu, "Real-time resampling in Fourier domain optical coherence tomography using a graphics processing unit," J. Biomed. Opt. 15(3), 030511 (2010).

10. J. Rasakanthan, K. Sugden, and P. H. Tomlins, "Processing and rendering of Fourier domain optical coherence tomography images at a line rate over 524 kHz using a graphics processing unit," J. Biomed. Opt. 16(2), 020505 (2011).

11. K. Zhang and J. U. Kang, "Real-time 4D signal processing and visualization using graphics processing unit on a regular nonlinear-k Fourier-domain OCT system," Opt. Express 18(11), 11772–11784 (2010).

Fig. 4 Images of a layered polymer phantom: (a) without reference subtraction; (b) real-time reference A-line subtraction; (c) physical reference spectrum subtraction; and real-time in vivo fingertip video imaging (d)–(g) with reference correction and (h)–(k) without reference correction (time bar marked at the bottom; scale bar is 400 μm; white arrows indicate several obvious fixed pattern noise lines).

Fig. 5 Layered polymer phantom imaging (a) without saturation detection and correction and (b) with saturation detection and correction (saturated lines are indicated by arrows).


12. K. Zhang and J. U. Kang, "Real-time numerical dispersion compensation using graphics processing unit for Fourier-domain optical coherence tomography," Electron. Lett. 47(5), 309–310 (2011).

13. K. Zhang and J. U. Kang, "Real-time intraoperative 4D full-range FD-OCT based on the dual graphics processing units architecture for microsurgery guidance," Biomed. Opt. Express 2(4), 764–770 (2011).

14. NVIDIA CUDA Zone, http://www.nvidia.com/object/cuda_home_new.html (4 September 2011).

Yong Huang received his BS degree in physics from Peking University, Beijing, China, in 2009. Since 2009, he has been a recipient of the China Scholarship Council (CSC) Fellowship. He is currently a PhD candidate working on intraoperative optical coherence tomography in the Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, Maryland.

Jin U. Kang received his PhD degree in electrical engineering and optical sciences from the University of Central Florida, Orlando, in 1996. From 1996 through 1998, he was a research engineer with the United States Naval Research Laboratory, Washington, DC. He is currently a professor and chair in the Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, Maryland. His research interests include fiber optic sensors and imaging systems, novel fiber laser systems, and biophotonics.
