brianj. harker kennethj. mighell arxiv:1208.0302v1 …arxiv:1208.0302v1 [astro-ph.sr] 1 aug 2012 in...

arX

iv:1

208.

0302

v1 [

astr

o-ph

.SR

] 1

Aug

201

2In preparation, to be submitted to The Astrophysical JournalPreprint typeset using LATEX style emulateapj v. 2/16/10

A GPU-COMPUTING APPROACH TO SOLAR STOKES PROFILE INVERSION

Brian J. HarkerNational Solar Observatory, Tucson, AZ 85719

and

Kenneth J. MighellNational Optical Astronomy Observatory, Tucson, AZ 85719In preparation, to be submitted to The Astrophysical Journal

ABSTRACT

We present a new computational approach to the inversion of solar photospheric Stokes polarizationprofiles, under the Milne-Eddington model, for vector magnetography. Our code, named genesis(genetic stokes inversion strategy), employs multi-threaded parallel-processing techniques to har-ness the computing power of graphics processing units (gpus), along with algorithms designed toexploit the inherent parallelism of the Stokes inversion problem. Using a genetic algorithm (ga) engi-neered specifically for use with a gpu, we produce full-disc maps of the photospheric vector magneticfield from polarized spectral line observations recorded by the Synoptic Optical Long-term Investi-gations of the Sun (solis) Vector Spectromagnetograph (vsm) instrument. We show the advantagesof pairing a population-parallel genetic algorithm with data-parallel gpu-computing techniques, andpresent an overview of the Stokes inversion problem, including a description of our adaptation to thegpu-computing paradigm. Full-disc vector magnetograms derived by this method are shown, usingsolis/vsm data observed on 2008 March 28 at 15:45 ut.a

Subject headings: line: profiles — methods: data analysis — polarization — radiative transfer — Sun:magnetic fields — Sun: photosphere

1. INTRODUCTION

Since Hale (1908) first inferred the presence of mag-netic fields on the sun by observing the Zeeman-inducedseparation of the components of magnetically-sensitivespectral lines, the reliable determination of vector fieldsfrom spectral line observations has played a pivotal rolein diagnosing solar magnetism. The Stokes vector, Iλ =[Iλ, Qλ, Uλ, Vλ]

T , whose elements are linear combinationsof measured polarization intensities at a wavelength λ,provide a convenient way to observe and characterizethe effects of magnetic fields on the absorbing medium.Unno (1956) first described the Zeeman-induced atten-uation of the Stokes vector by magnetic fields in theframework of the radiative transfer equations neglectingmagneto-optical effects. The solution was later gener-alized to include magneto-optical (Faraday rotation) ef-fects by Rachkovsky (1962).Much information can be directly extracted from

the observed Stokes vectors themselves; Rees & Semel(1979) developed a center-of-gravity technique for esti-mating the longitudinal component of the magnetic fieldfrom Stokes I & V observations, and Ronan et al. (1987)described an integral method for recovering the longitu-dinal and transverse fields from integrated Stokes linearand circular polarization profiles. Furthermore, severalconvenient weak-field calculation “recipes” can be foundin Landi Degl’Innocenti (1994).The unique specification of the magnetic and thermo-

dynamic state of the solar photosphere solely from ob-servations of the Stokes vector is classified as an “in-

[email protected]@noao.edu

a Full-resolution versions of the images in this paper are availablein the journal- and electronic-version.

verse” problem. These types of problems are oftencomputationally complex and may be ill-conditioned.Conversely, the forward-modeling of the Stokes vectoremergent from an assumed model atmosphere is triv-ial. This asymmetry can be exploited to interpret theobserved Stokes vector within the framework of a mag-netic model atmosphere, by tuning its configurationto minimize (in a least-squares sense) the model de-viation from the observed Stokes vector. We collec-tively refer to all such solution procedures as “inversionmethods.” Auer et al. (1977) developed a now tradi-tional Stokes inversion method based on the Levenberg-Marquardt (l-m) algorithm (Levenberg 1944; Marquardt1963), which was subsequently extended and improvedupon by Skumanich & Lites (1987). This optimiza-tion approach performs a nonlinear least-squares fit ofa Milne-Eddington model to the observations, return-ing the set of atmospheric parameters that describe thepolarized spectral line. Ruiz Cobo & del Toro Iniesta(1992) extended the optimization method by creatingan inversion code called sir (stokes inversion based onresponse functions), which performs the inversion of ob-served profiles while simultaneously inferring the strati-fication of the model parameters with optical depth.More recently, artificial intelligence methods and pat-

tern recognition approaches have been gaining groundin the computational methods of spectropolarimetricanalysis. Carroll & Staude (2001) proposed a techniquebased on an artificial neural network (ann), wherebya large database of Stokes profiles (either synthetic orpre-inverted by some other means) is used to train theann to recognize the functional relationship between themodel parameters and the spectral lines in the train-ing set. Once suitably trained, the network can gener-

http://arxiv.org/abs/1208.0302v1

mailto:[email protected]

mailto:[email protected]

2

alize the relationship to other samples not explicitly in-cluded in the training set. Rees et al. (2000) developeda technique, based on the Principal Component Analy-sis (pca) formulated by Pearson (1901), to decompose aline profile into their so-called eigenprofiles. The eigen-values associated with these eigenprofiles define a pointin the model manifold, from which the associated modelparameters can be calculated by interpolation over thetraining set. This method has been subsequently usedon real observations by Socas-Navarro et al. (2001) andEydenberg et al. (2005).This paper introduces a new approach to the syn-

thesis and inversion of spectral lines, based on graph-ics processing units (gpus), which can rapidly calculatefull-disc vector magnetic fields at the photospheric level.The technique is based on the combination of a highly-parallel genetic algorithm with a computing architecturewell suited to exploit the many levels of parallelism in theStokes inversion problem. The remainder of this paperis organized as follows. Section 2 presents the obser-vational data used in this work. Sections 3 and 4 out-line our genetic algorithm optimization engine and gpu-programming techniques, respectively. Details of our im-plementation of Stokes inversion under the assumptionof a Milne-Eddington atmosphere are given in Section 5.Results from the analysis presented in Section 6. Finally,we offer some outlooks on the future implementation ofour method in the vector field data reduction pipelinefor the Vector Spectromagnetograph (vsm), the scanningspectropolarimeter instrument package in operation aspart of the Synoptic Optical Long-term Investigations ofthe Sun (solis) telescope, located at the National SolarObservatory atop Kitt Peak, AZ.

2. DATA & OBSERVATIONS

This work uses full-disc solis/vsm observations of thefour Stokes I, Q, U , and V spectra in a 3.45A bandpassencompassing Fe i multiplet #816 near 6302A (see Table1). For this work, we utilize only the absorption lineat λ0 = 6302.5017A, but note that our inversion codecalculates the full Zeeman pattern of an input transition.Laboratory wavelengths for this multiplet and for twonearby terrestrial O2 absorption features are taken fromPierce & Breckinridge (1973).

Table 1Line-formation parameters for Fe i multiplet #816.

Wavelength Transition geff χe log(gf)(A) (eV)

6301.5091 5P2–5D2 1.667 3.654 -0.7186302.5017 5P1–5D0 2.500 3.686 -1.235

Figure 1 shows the solar disc as it was observedbysolis/vsm in the Stokes I continuum (redward of6302.5A), on 2008 March 28 at 15:45 ut.Figure 2 shows sample Stokes profiles taken from a

penumbral region in NOAA 10988 (marked by the ×symbol in Figure 1), close to disc-center. The some-what low spatial resolution of solis/vsm pixels (1′′.125pixel−1) smears the individual Zeeman components ofthe Fe i 6302.5A line, preventing observation of fully re-

Figure 1. Solar disc on 2008 March 28 15:45 ut as observed bythe solis/vsm instrument in the 6302.5A continuum. The compasspoints to heliographic solar north. The inset shows NOAA 10988in more detail.

solved Zeeman lobes, even in sunspot umbrae where thesplitting is largest.

Figure 2. Observed Stokes profiles (normalized by the local con-tinuum) for a penumbral pixel (represented by the × symbol in theinset of Figure 1) belonging to NOAA 10988. solis/vsm spectraldispersion is ∆λ = 27.1 mA, and the wavelength scale is relativeto the Fe i λ6301.5 line-core wavelength. The narrow, shallow linesare terrestrial O2 absorption features, and are ignored in all anal-yses. The vertical lines denote the illumination edges of the CCD.

For the inversion of this dataset, our routines analyzeonly those pixels whose fractional polarization degree,

p =

√

Q2max + U2

max + V 2max

Ic, (1)

exceeds a threshold set by the polarimetric noise, defined

3

by

σP = Pthresh

√

σ2Q + σ2

U + σ2V

Ic(2)

where σ2QUV are the variances of the Stokes polarization

profiles in a spectrally-quiet region away from the ob-served spectral lines, Ic is the continuum intensity (de-termined in this same spectral window), and Pthresh is aproportionality factor. A value of Pthresh equal to unitywill signal the inversion code to process all pixels withpolarization degree strictly larger than the noise, whilea value of ∼ 10 typically provides good discriminationbetween active regions and surrounding quiet-sun, as ob-served by solis/vsm. While the spectral uncertaintieswill vary somewhat between scanlines and pixels, theyare typically of the order of the vsm polarimetric accu-racy (∼ 0.1% of the continuum intensity). Pixels withpolarization degree less than that given by equation 2 areinverted under the weak-field approximation; the longi-tudinal field strength is calculated from the separationbetween the I±V line-cores, while the field orientationis calculated from Stokes Q, U , and V maxima, as inAuer et al. (1977). To perform the inversion, we utilizea class of global optimization algorithms known as ge-netic algorithms.

3. A STANDARD GENETIC ALGORITHM

Genetic algorithms (gas) are a broad class of searchand optimization algorithms which exploit computa-tional analogues to the principles of evolutionary biology,first proposed by Darwin (1859). gas operate on (poten-tial) solutions which are encoded into a binary represen-tation. This requires distinction between the phenotypicexpression (the parameters of the problem themselves) ofthe genotypic representation (the encoded parameters).Genetic algorithms are receiving ever-increasing at-

tention as powerful tools for search, optimization,and pattern recognition in fields as far-ranging asengineering design (Karr & Freeman 1999), job-shopscheduling (Grefenstette 1985), and stock market anal-ysis (Mahfoud & Mani 1996); Diver & Ireland (1997),McIntosh et al. (1998), and Ramirez & Fuentes (2002)have applied them to spectral analysis of nighttime astro-nomical observations; Charbonneau (1995) created thepikaia genetic algorithm and demonstrated its utilityon several distinct problems of astrophysical importance,while Lagg et al. (2004) more recently applied the pikaiaalgorithm to the problem of diagnosing magnetic fields inthe upper chromosphere with the He i triplet at 10830A.They are quite robust, and often succeed where more tra-ditional methods fail (e.g. multi-modal optimization).The ga solution procedure relies on genetic operators

which are repeatedly applied to a “parent” population ofcandidate solutions to produce a new “offspring” popula-tion of solutions, containing better solutions than thosefound in the previous population. The quantity whichdetermines whether one solution is better than anotheris called the fitness, evaluated for each candidate solu-tion via a fitness function. Here, the fitness function isa χ2-like merit score describing model deviations fromobservations. After some number of iterations of thisprocess, the population will have converged to a smallneighborhood around candidate solutions which have the

best fitness scores. Although a comprehensive review ofgenetic algorithms and operators is well beyond the scopeof this paper, we present here a simple overview of basicga properties. An N -bit binary string is used to encodethe jth model parameter,

b(j) ≡[

b(j)0 b

(j)1 b

(j)2 · · · b

(j)N−1

]

, (3)

where b(j)i = [0, 1], for each free parameter of the model.

The convention adopted here is that bit 0 is the most-significant bit, while bit N −1 is the least-significant bit.If the jth real-valued free parameter, pj , is constrainedsuch that

uj ≤ pj ≤ vj , (4)

then each binary string can be decoded into a realfloating-point value via the transformation:

pj ≡ E−1(b(j)) = uj +

vj − uj2N − 1

[N−1∑

k=0

b(j)N−k × 2k

]

. (5)

The genetic algorithm population therefore consists ofNpop binary strings of the form:

M×N bits︷︸︸︷

1000010101001001︸︷︷︸

N bits

· · · 1010010100100101︸︷︷︸

N bits

, (6)

where M represents the number of free parameters ofthe model. The following list elucidates the notation ofAlgorithm 1, which presents a pseudocode listing of thebasic genetic algorithm functionality.

• Pt denotes the population of candidate solutionsat generation t. Each (real-valued) candidate so-lution represents a single realization of a plane-parallel, Milne-Eddington m-e model of the lineformation region. We seek the candidate whoseforward-modeled Stokes vector shows minimum de-viation from the observed profiles.

• Gt denotes the population of binary-encoded can-didate solutions at generation t.

• Ft denotes the population fitness at generation t,and is generated by application of the evaluationoperator, F , to Pt. Better candidate solutions willhave smaller fitness values.

• The operator S represents sampling without re-placement. The sampling is stochastically biasedsuch that better candidate solutions are (proba-bilistically) more frequently selected from the pop-ulation.

• The operator E represents the encoding of a can-didate solution into its binary representation. AnN -bit binary string gives a numerical resolution of∆pi = (vi − ui) /

(2N − 1

), if ui and vi are the

lower and upper bounds, respectively, of the ith

model parameter. The corresponding decoding op-eration (equation 5) is denoted by E−1.

4

• The operatorR represents pair-wise recombinationof the population. Two candidate solutions (se-lected by S) swap segments of their binary repre-sentations to create the representations of two newoffspring solutions.

• The operator M represents a probabilistic muta-tion for each member of the offspring population.Each bit on the binary string has a small prob-ability of being flipped to its complementary bit.The mutation operator used in this work was de-signed to favor local exploration by mutating less-significant bits more frequently.

Algorithm 1 Pseudocode for a standard genetic algo-rithm.

t = 0Create P0

Evaluate F0 ← F [P0]while (not termination-condition) do

t← t+ 1Select: Pt ← S[Pt−1,Ft−1]Encode: Gt ← E[Pt]Recombine: Gt ←R[Gt]Mutate: Gt ←M[Gt]Decode: Pt ← E

−1[Gt]Evaluate: Ft ← F [Pt]

end while

We use gpu-computing techniques to offload data-parallel, compute- intensive calculations with high arith-metic intensity to a massively-parallel high-speed com-pute architecture. For this work, we employ an extensionof the genetic algorithm inversion engine developed anddescribed in Harker (2009) and a single nvidia TeslaC1060 gpu. We utilize nvidia’s Compute Unified De-vice Architecture (cuda) programming interface to han-dle data transfer and computations the gpu. The nextsection gives a brief introduction to structured parallelprogramming with cuda.

4. THE CUDA PROGRAMMING MODEL

The cuda programming interface consists of a minimalset of extensions to the standard C/C++ language whichallow a properly-constructed data-parallel algorithm toexecute among the thousands of concurrently runningthreads resident on the gpu. This is accomplished bya kernel function, which controls the computations andmemory accesses to be performed at the thread granu-larity. The kernel function is callable from another user-defined function (the host function), which controls datatransfers to/from the gpu and is responsible for launch-ing the kernel execution.A traditional criticism of the application of genetic al-

gorithms to real-world problems is the fact that theytend to be slow, since they process a potentially largeset of candidate solutions. The cuda programming in-terface allows nvidia gpus to execute such data-parallelalgorithms, by organizing the computations into groupsof concurrently-executing threads called blocks, whilethese blocks are themselves organized into groups ofconcurrently- executing grids. Mirroring this thread or-ganization in the genetic algorithm population allows usto write a kernel function that dedicates each block to

the calculations for a specific member of the popula-tion, while all threads in each block are dedicated tothe per-wavelength calculations required by the spectralline synthesis and fitness function evaluation for eachpopulation member. The power of this approach is ev-ident; in a serial genetic algorithm, the total numberof wavelengths to synthesize (per generational iterationof Algorithm 1) is NλNpop, where Nλ is the number ofwavelengths spanning the spectral line, and Npop is thesize of the population. While this can only be done onewavelength at a time, one population member at a timefor the serial algorithm, using a cuda-capable gpu al-lows all NλNpop calculations to be done simultaneously,constrained only by the physical limitations of the gpuhardware itself (i.e., the maximum possible number ofconcurrently-executable threads and total onboard mem-ory).While a comprehensive review of the cuda architec-

ture is outside the scope of this paper, we refer the inter-ested reader to the cuda Programming Guide and Soft-ware Development Kit (sdk), currently available fromhttp://www.nvidia.com/getcuda.

4.1. Thread & Memory Hierarchy

The cuda programming model requires an executionconfiguration, whereby the thread distribution is and or-ganized into thread blocks, and similarly how individualthread blocks are organized into a grid. Figure 3 showsschematically how an example execution configuration isindexed, so that each thread can be uniquely identifiedin the grid.Threads from the same thread block may communicate

with each other by utilizing shared memory (see below),although one must be careful to structure the program insuch a way as to avoid memory access conflicts betweenthreads. Global memory allows threads from differentblocks to communicate with each other. Once the execu-tion configuration is defined, the cuda kernel function islaunched by invoking it with a special syntax, shown inAlgorithm 2. Global memory for the input data and out-put result is allocated via cudaMalloc, and the data iscopied from cpu memory to the allocated gpu memorywith cudaMemcpy. The kernel function is invoked withnBlocks blocks of nThreads threads, and the desired re-sults are copied back from device to host memory, againvia cudaMemcpy. Please note, however, that an actualproduction-grade kernel invocation is considerably moreinvolved than the toy example presented in Algorithm 2.

Algorithm 2 An example C-style cuda kernel invoca-tion.cudaMalloc( (void**)&dResult, sizeof(hResult) );cudaMalloc( (void**)&dData, sizeof(hData) );cudaMemcpy( dData, hData, sizeof(hData),

cudaMemcpyHostToDevice );MyKernel<<<nThreads, nBlocks>>>( dData, dResult );cudaMemcpy( dResult, hResult, sizeof(hResult),

cudaMemcpyDeviceToHost );

4.2. Memory transfers between cpu and gpu

An important principle of gpu-computing is to min-imize the amount of data that is copied between cpumemory and the global memory on the gpu. Band-width across the PCI-Express (pci-e) bus connecting

http://www.nvidia.com/getcuda

5

Figure 3. cuda thread hierarchy ad heterogeneous computingparadigm. cuda kernel functions are interleaved with the hostcode, and multiple (potentially different) kernels may be launchedfrom a host. The <<<>>> syntax is one of the cuda exten-sions to the C language; it is used to specify the size and geome-try of the execution configuration. Reproduced, with permission,from the cuda Programming Guide, courtesy of nvidia corpora-tion. (NVIDIA Corp. 2011).

cpu memory to gpu memory is much lower than theshared memory bandwidth, as can be seen in Table 2.The table shows the results of an initial bandwidth testperformed on the gpu used in this work; these figurescharacterize our single realization of hardware compo-nents, and will vary depending on the exact system com-ponents and hardware specifications. The theoreticalmaximum bandwidth for the Tesla C1060 is 102 GB s−1

(NVIDIA Corp. 2010), while we achieve ≈ 72% of thistheoretical limit in our hardware configuration.To maximize the efficiency of the data transfer to

the gpu device, we modified our genetic algorithm byrephrasing the representation from the traditional bi-nary arrays to unsigned short integer arrays. For exam-ple, consider a binary encoding of M parameters withN = 16 bits each. Instead of defining the binary geno-type string as char gene[M*N], with gene[i] = 0 or 1(see equation (6)), the genotype is defined as unsignedshort gene[M], since each unsigned short is internally

Table 2Initial bandwidth tests for the nvidia Tesla C1060 used in this

work.

Transfer Direction Bandwidth(GB s−1)

cpu-to-gpu (global) 1.4634gpu-to-cpu (global) 1.1840gpu-to-gpu (shared) 73.3165

represented by 16 bits. The former encoding techniquerequires 8 ×M × N bits of storage, while the latter re-quires onlyM ×N bits, representing a savings in storagespace (and transfer times) of a factor of 8. Furthermore,the traditional encoding technique is incredibly wasteful,since only the least-significant bit of each 8-bit char isneeded to encode a 0 or 1, meaning only 1/8 of the datatransferred to the gpu memory would actually be use-ful. In contrast, our modified binary genotype encodesthe same amount of useful information, but occupies afraction of the space in memory.Transferring the binary encoded parameters to the gpu

allows the threads to use the high-speed shared memoryto decode the unsigned short integers to the real floating-point values needed for the model synthesis on the gpu,so this is always a net gain in performance over decodingthe parameters serially and transferring the real floating-point values to the gpu.This modified binary genotype also allows our genetic

operators to use the bit-shifting and bit-masking tech-niques of Iuspa & Scaramuzzino (2001) to operate di-rectly on the internal binary representation of the un-signed short integers. These bit-manipulation techniquesyield faster genetic operators than would be used for thetraditional binary representation, which typically involveseveral nested loops for analysis of each bit in the binaryarray.We have presented the computational aspects of our

gpu code in this section, so we now turn to the imple-mentational details of our inversion and synthesis rou-tines to be run on the gpu.

5. IMPLEMENTATION OF STOKES INVERSION

The polarized radiative transfer equations (prte) de-scribe the modification of the Stokes vector of a beam asit propagates, in the direction s, through some medium.Formally, it is given here as

µdIλds

= Kλ (Iλ − Sλ) , (7)

where µ is the cosine of the heliocentric angle, and Kλ

and Sλ are the propagation matrix and source functionvector, respectively.If we model both continuum and line absorption pro-

cesses, thenKλ = 1+ η0Φλ, (8)

where η0 is the ratio of line-to-continuum absorption co-efficients, and the matrix Φλ includes absorption andmagneto-optical effects, parameterized by the magneticand thermodynamic properties of the model atmosphere.Expressions for these matrix elements may be found in(e.g.) Landi Degl’Innocenti & Landolfi (2004) and ref-erences therein. These matrix elements are functions of

6

the Voigt and Faraday-Voigt line profiles, calculated tohigh accuracy via the rational function approximationof Hui et al. (1978). This formulation is extremely well-suited to evaluation on a gpu, due to its high arithmeticintensity (ratio of math operations to memory accesses).The source function vector describes the ratio of emis-

sion to absorption in the beam, and includes both con-tinuum and line contributions, so that

Sλ=Sce+ η0SlΦλe, (9)

e=(1, 0, 0, 0)T, (10)

where Sc is the continuum source function and Sl isthe line source function. Assuming local thermodynamicequilibrium reduces both continuum and line source func-tion to the Planck function at the local temperature,Bλ(T ). Adopting a Milne-Eddington (m-e) relation forthe source function variation as a linear function of op-tical depth,

Sc = Sl = Bλ(T ) = S0 + S1τ = S0(1 + β0τ), (11)

where β0 = S1/S0 represents the inverse of the character-istic length scale over which the source function changesappreciably, the prte admits an analytical solution forthe model Stokes profiles, IMλ , given here as

IMλIc

=[

(1− β)1+ β (1+ η0Φλ)−1

]

e (12)

β=µβ0

1 + µβ0, (13)

where 1 is a 4 × 4 identity matrix and Ic denotes theobserved local continuum intensity. This m-e solutionis characterized by magnetic and thermodynamic pa-rameters assumed to be constant with depth throughthe line-formation region, here collectively representedby the model vector of free parameters, p. Since wedo not consider gradients with respect to optical depthof any parameter except the source function, the m-eatmosphere represents a kind of integrated behavor ofthe true parameters over the height of line-formation(Orozco Suarez & Del Toro Iniesta 2007). Formally, thekth candidate solution in the genetic algorithm representsa single model vector

pk ≡ [B,ψ, χ, λ0, adc,∆λD, η0, S0, β0]T , (14)

where B is the magnetic field strength, ψ is the incli-nation of the field with respect to the observer’s line ofsight, χ is the azimuthal angle of the field, λ0 is theline-center wavelength of the spectral line, adc is theatomic damping constant of the spectral line, ∆λD is theDoppler line-width, η0 is the line-to-continuum opacityratio, and S0 and β0 are the linear source function coef-ficients such that the continuum intensity is given by theEddington approximation as

Ic = S0 + µS1 = S0 (1 + β0µ) , (15)

To maintain generalizability in the inversion code, thefull Zeeman pattern of the spectral line is calculated atthe start of inversion. This is done only once, so theoverhead incurred is negligible, considering the flexibil-ity gained. The particular spectral line to be synthesizedis configurable by the user, and the code contains all the

necessary generalizations of the m-e solutions to allowit to function with arbitrary photospheric spectral lines.The fitness function to be minimized by the genetic al-gorithm is the following χ2-like merit function, whichquantifies the fit of the Stokes vector (IM ) generated bythe model pk (with ν = 4Nλ−M degrees of freedom) tothe observations (IO), given here explicitly as

χ2(pk) =1

ν

∑

i

Nλ∑

j=1

w2ij

[IOi (λj)− IMi (λj ;pk)

]2, (16)

where i = I,Q, U, V . The quantities wij are weightingfactors, traditionally used to adjust the contribution ofdifferent wavelengths to the total deviation across thespectral line, and are discussed further in Section 5.4.Here, the j index is dropped from the weights; theyare taken as constant over the spectral line, but distinctfor each of the four Stokes profiles. Using this form ofthe merit function allows the calculation of uncertainties(over the χ2 hypersurface) in the recovered model pa-rameters by straightforward techniques, once the geneticalgorithm has converged.It is an important principle of optimization to work in

the smallest possible parameter space; reducing the di-mensionality of the model vector will increase the speedof the inversion and enhance the stability of the algo-rithm, since there are fewer (potentially degenerate) pa-rameters to simultaneously determine. The next sectiondescribes some of the techniques used in our approach toreduce the dimension of the parameter space and there-fore enhance the efficiency of the genetic search.

5.1. Reduction of the model manifold

Although the line-center wavelength can be a free pa-rameter of the fit, genesis instead directly uses the ob-served line-center wavelength as measured from the coreof the Stokes I profile. After calibrating the observedwavelength scale by measuring the separation of the twoterrestrial O2 absorption lines near 6302A, a center-of-symmetry approach,

λsym ≡ argminλi

S(λi) =∑

j

|I(λi+j)− I(λi−j)| (17)

is used to determine a rough estimate of the line-centerwavelength. This estimate is refined by bracketing λsymwith a wavelength triplet and calculating the minimumof its uniquely-fit polynomial.The m-e model atmosphere specifies a source func-

tion linear in optical depth, characterized by its valueat the τ = 0 photospheric surface (S0) and its in-verse characteristic length scale (β0). Inspection of theUnno-Rachkovsky solutions reveals a simple normaliza-tion scheme that eliminates the dependence on S0, as wasadopted in Auer et al. (1977). Here, we define a modifiedStokes vector,

IMλ ← Ice− IMλ , (18)

which consists of the Stokes I line depression and StokesQ, U , and V profiles. Note we choose to leave β0 (whichinfluences the amplitude of the synthesized Stokes pro-files) as a free parameter of the fit.

7

5.2. Limited spatial resolution & scattered light

To account for limited spatial resolution, a new free pa-rameter is introduced; the magnetic fill-fraction α repre-sents the fractional pixel area occupied by the magneticfield. The remainder of the pixel area (1−α) is assumedto be field-free. The Stokes vector then becomes a linearsuperposition of the magnetic and non-magnetic profiles,weighted by α,

IMλ ← αIMλ + (1− α)Inmλ e. (19)

The non-magnetic profile is assumed to be a quiet-sun Stokes I profile from the local surroundings. Us-ing a locally-averaged quiet-sun profile would requirebreaking one of the most advantageous properties ofthe algorithm, namely that each scanline/pixel can beinverted independently of the data from neighboringscanlines/pixels. Furthermore, this approach requiresa tremendous amount of disk I/O, and can be quiteslow. To maintain a totally independent scanline inver-sion, we have taken a different approach; the center-to-limb variation (CLV) of the quiet-sun Stokes I profilehas been measured (Harvey, 2010, private communica-tion) and parameterized as a pure Voigt function char-acterized by its amplitude, full-width at half-maximum(FWHM) and atomic damping parameter. Gaussian andLorentzian components of the profile can be derived fromthe FWHM. A 3rd-order polynomial is fit (as a functionof heliocentric µ) to the CLV of each of these parameters.The resulting polynomial fits are used within the inver-sion to calculate the appropriate values of the quiet-sunparameters as a function of disc position for each pixel.The non-magnetic profile is then synthesized from theseparameters, centered on the observed Stokes I line-centerwavelength.We account for scattered light at the ∼5% level by first

correcting the measured continuum. The baseline of theobserved Stokes I profile is increased, and subsequentlyrenormalized to the corrected continuum, following Gray(2005).

5.3. Thermodynamic parameters

It is well known that there exists some level of degen-eracy between the magnetic and thermodynamic param-eters, with respect to their influence on the model pro-file line-shapes. Figure 11.1 of del Toro Iniesta (2003)shows an explicit example of this effect; it is not al-ways clear whether one combined magnetic and thermo-dynamic configuration leads to a better fit to the observa-tions than another, even if the configurations are notice-ably different (Borrero et al. 2011). The convention typ-ically adopted by inversion practitioners is that the ther-modynamic model parameters are of less importance tothe final quality of the fit than the magnetic parameters.Skumanich & Lites (1987) suggested that in order to finda robust fit, some of the thermodynamic parameters mustbe fixed prior to the inversion, and Borrero et al. (2011)investigated the effects of holding the atomic dampingat a constant value during their inversions. They foundnegligible differences between the recovered vector mag-netic fields. It is not clear, however, that this proce-dure is generally acceptable for observations with muchhigher spectral resolution than in Borrero et al. (2011).We have decided not to hold fixed any of the thermody-

namic parameters, and have implemented a “pre-fitting”initialization in which we fit a non-magnetic Stokes Iprofile to the observed Stokes I profile, using a simpleLevenberg-Marquardt algorithm. This returns values forthe atomic damping parameter, adc, Doppler width ∆λD,line-to-continuum opacity ratio, η0, and source functionparameter, β0. Assuming a non-magnetic model for a(potentially) Zeeman-broadened profile will, of course,lead to errors in the derived thermodynamic variables.However, these values are utilized only to constrain theparameter space to sensible ranges within which the gacan search.

5.4. Weighting scheme

Since the Stokes I profile will always have much largersignal strengths than the polarization profiles, deviationsbetween the observed and synthesized Stokes I profileswill dominate the χ2 value, essentially causing the algo-rithm to fit the model atmosphere solely to the StokesI intensity profile. To mitigate this effect, we equalizethe importance of all four Stokes profiles by using an“inverse-max” weighting scheme to ensure that all devi-ations contribute roughly equally to the calculated χ2.The weighting scheme is given here explicitly as:

wI =(Ic − I

obs0

)−1(20)

wQ=(max |Qobs

λ |)−1

(21)

wU =(max |Uobs

λ |)−1

(22)

wV =(max |V obs

λ |)−1

, (23)

where Iobs0 is the observed line-core intensity.

5.5. Model manifold boundaries

The genetic inversion will be most efficient in a pa-rameter space that has the smallest physically-realisticdomain for each parameter. By seeding the initial popu-lation of the genetic algorithm with some specific heuris-tic knowledge of the problem at hand, we can both ensurethat we start with at least a few high-quality solutions,and suitably restrict the domain of the searchable param-eter space. Both scenarios will accelerate the convergenceand improve the final accuracy of the solutions.Here, the initial population includes representations

of magnetic field vectors generated from the weak-fieldapproximations in Auer et al. (1977), as well as from afunctional relationship between the field geometry andintegrated measures of the Stokes polarization profilesfound in Ronan et al. (1987). The longitudinal fieldstrength estimated from the center-of-gravity approachby Rees & Semel (1979) is also seeded into the initialpopulation. In the case of full-disc inversions, where ev-ery on-disc pixel is inverted, the trivial non-magnetic so-lution is seeded into the population as well.The field inclination domain is naturally restricted to a

single polarity, ψ ∈ [0, 90]◦ or [90, 180]◦, initialized to bein agreement with the order of the blue- and red-lobes ofthe observed Stokes V profile. The seed fields are checkedto ensure that they are consistent with this polarity.Additionally, we use empirical knowledge derived

specifically from solis/vsm data to discriminate be-tween different magnetic structures observed on the disc.

8

Figure 4. Line morphology classification for NOAA 10988. Theimage depicts Stokes I line-core intensity, overlaid with red, blue,and yellow contours demarcating umbral, penumbral, and plageregions, respectively.

Spatial pixels are labeled as likely belonging to the struc-tures seen in Figure 4, based on their observed continuumand line-core intensities relative to the average quiet-sunprofile. Following Hestroffer & Magnan (1998), we ap-ply a simple correction for limb-darkening to flatten theobserved intensity profile in pixels far from disc-center,before a label is assigned. The details of this empiricalclassification scheme (and the subsequent compartmen-talization of the B-α subspace) are given in Table 3.

Table 3Active region structure discrimination parameters.

discriminator Umbra Penumbra Plage

Ic/Iqsc < 0.65 [0.65-0.90] > 0.90

I0/Iqs0

< 0.69 > 0.69 > 0.85B [G] [1000-3500] [0-2500] [0-1500]α [0.25-1] [0.25-1] [0-0.5]

5.6. Population initialization

As described in Section 3, the genetic algorithm worksby continually evolving good solutions out of a popula-tion of candidate solutions, represented by the reducedmodel vectors

pk ≡ [B,ψ, χ, adc,∆λD, η0, β0, α]T , (24)

with the number of free parameters, M = 8. Hardwarelimitations dictate the maximum size of the population;the Tesla C1060 gpu contains 30 multiprocessors, eachof which is composed of 8 stream processors. Therefore,the total number of thread blocks (candidate solutions)which can be simultaneously processed isNpop = 30×8 =240.It is traditional to initialize the genetic algorithm with

a random (but bounded) population to ensure enoughdiversity for the genetic operators to produce meaning-ful evolution. However, we instead generate the initialpopulation (of non-seeded candidate solutions) by re-peatedly sampling from a Sobol sequence generated via

the Bratley & Fox (1988) algorithm. An M -dimensionalSobol sequence is a quasi-random sequence that is maxi-mally self-avoiding; the points in the sequence tend to(roughly) evenly distribute themselves throughout theM -dimensional hypercube [0, 1]M . This property givesrobust, even coverage of the parameter space withoutplacing the initial population on a regularized grid, whichwould completely inhibit the search action of the recom-bination operator (R). In addition, this approach elimi-nates chance clustering in the initial population, therebyavoiding the processing of redundant candidate solutions.The final benefit of using quasi-random initialization liesin the efficiency of population restarts; when the popu-lation has converged to some self-monitored degree, allbut the best individual(s) are re-initialized. Generat-ing new candidates according to a quasi-random scheduleguarantees that we will be refreshing the “gene pool” inthe most efficient way, by using the self-avoidance prop-erty to automatically ignore previously-sampled regionsof the parameter space. We exploit this property to ad-dress any issues related to premature/false convergence;every Nsamp iterations of the genetic algorithm, we dis-card the worst half of the population (“dead solutions”)and reinitialize them according to the Sobol mechanismdescribed above. In practice, for a maximum numberof generations Ngen over which to evolve, we note thatNsamp ≈ Ngen/4 provides a good balance between deepgenetic search and the introduction of new genetic ma-terial. We allow the population to evolve for Ngen = 100generations.

6. RESULTS & DISCUSSION

Proceeding along each scanline and performing thegenesis inversion on the corresponding spectra for eachpixel builds a map of the model parameters over the full-disc. Figure 5 shows the magnetic field strength, inclina-tion, azimuthal angle, and fill-fraction over the full-discas inferred by the genesis inversion. The inset showsNOAA 10988.The 0◦ reference direction for the azimuthal angle is

along the horizontal axis of the image. The field has notbeen resolved of the π-ambiguity inherent in all Stokes in-version techniques, hence the antisymmetric color wheelNevertheless, the radial structure of penumbral fields iswell-determined. The fill-fraction displays the expectedbehavior of values ≈ 1 for the umbral and penumbralregions, with a decline to values ≤ 0.5 for surroundingplage regions.The statistical spread of the final population provides

a convenient means to generate initial estimates of theuncertainties associated with the fitted model parame-ters. Measuring the spread around the identified opti-mum gives population uncertainties, from which we esti-mate the gradient of the χ2 manifold:

[∇χ2

(popt

)]

j≈χ2 (popt)− χ2 (pi)

[∆p]j, (25)

where pi is selected from the best of the final popula-tion, chosen such that we avoid numerical difficulties inthe calculation of the finite differences (i.e. ratio of twovery small numbers). Using this estimate, we bootstrapthe derivatives to higher accuracy by Richardson extrap-olation to zero stepsize within Neville’s algorithm (see,

9

Figure 5. (top left): field strength, (top right): field inclination, (bottom left): field azimuth, (bottom right): fill-fraction. The verticalstripe down the center of the azimuthal angle image is an artifact of the dual Rockwell camera system in solis/vsm, and do not appear innewer Sarnoff camera data. The compass points to solar north.

e.g., Press et al. 1988). The Hessian matrix, H, is thuscalculated as the Jacobian of the resulting gradient vec-tor and the variance of the ith model parameter is givenby Sanchez Almeida (1997):

σ2pi

=[H−1

]

ii

χ2(popt)

M. (26)

This evaluation requires a (potentially) large number offitness function calls, but is only done once per pixel, sothe small overhead incurred is outweighed by the ben-efit of having an accurate error estimate for the model

parameters.Table 4 presents an estimation of the uncertainties in

the fitted model parameters determined in this man-ner. The uncertainties are broken down according to thestructure discrimination method in Section 5.5. For eachstructure, we calculate the average uncertainty, ∆i, andcorresponding standard deviation, σ∆i

. The table showsthe quantity ∆i±σ∆i

, so that ∆i+3σ∆irepresents an up-

per limit to the model parameter uncertainties in 99.73%of the pixels belonging to each active region structure.For example, 99.73% of umbral pixels have uncertainties

10

Table 4Distribution of uncertainties in Fe i λ6302.5 magnetic parameters,

broken down by structure.

Structure ∆B ± σ∆B∆γ ± σ∆γ

∆χ ± σ∆χ∆α ± σ∆α

(G) (◦) (◦) (×10−3)

Umbra 1.3±2.9 0.45±0.69 0.73±0.89 2.4±5.9Penumbra 2.4±4.5 0.62±0.80 0.73±0.91 2.5±5.4Plage 1.2±2.7 0.65±0.79 0.84±0.96 2.1±3.6

in field strength of 10 G or less. These quantities arenot meant to represent the uncertainties in any particu-lar pixel or structure, but instead characterize the overalldistribution of uncertainties derived from the inversion.It should be noted, however, that these uncertainties arederived solely from the topology of the χ2 hypersurface,and do not include systematic errors resulting from theassumptions and/or limitations of the Milne-Eddingtonmodel, which is sometimes not a suitably accurate de-scription of the real solar atmosphere.Table 5 shows a speed comparison between the serial

and gpu-accelerated versions of the genesis inversioncode, for both iron lines, averaged over 50 independentruns. The total evaluation time (∆Teval) is the time spentsynthesizing the model spectra and evaluating the var-ious contributions to equation 16. For the serial code,the evaluation time consumes a strong majority (65.7%and 78.5% for Fe i λ6302.5 and Fe i λ6301.5, respec-tively) of the total algorithm runtime, ∆Trun. Use of thegpu as a co-processor has reduced the total evaluationtime to only 7.1% and 12.1%, respectively, of the gpu-accelerated total runtime. The total runtime is quitestable for both the serial and gpu-accelerated versions ofthe code. The table shows 1σ standard deviations, whichdemonstrate that the serial and gpu- accelerated versionscan vary in their total runtimes by up to a minute. Thegpu-accelerated evaluation time further shows why thegpu chip architecture is so amenable to parallel compu-tations. The variation in the average time spent inter-acting with the gpu is less than a second; this stabilityis precisely due to the many-cored nature of the gpu,which uses a very efficient thread scheduler to keep thecores of each multiprocessor optimally busy.Overall, the gpu-accelerated Fe i λ6302.5 inversion is a

factor of 2.6 times faster than the serial version, with anincrease to a factor of 5.2 for the Fe i λ6301.5 non-normaltriplet. The total synthesis and evaluation time showsjust how powerful the cuda programming paradigm canbe, performing the computations 23.7 and 33.3 timesfaster than the serial versions of the code can manage.Since the non-normal triplet Fe i λ6301.5 has four con-tributions (from levels with equal ∆MJ) to each of thethree Zeeman components, a greater proportion of workis done on the gpu. This highlights another principle ofgpu computing; the more work you assign to the gpu,the more efficiently it can perform it.

7. CONCLUSIONS

We have described a novel computational approach tothe inference of photospheric vector magnetic fields fromobservations of the Stokes polarization profiles. Our newinversion code, named genesis, is capable of quickly pro-ducing full-disc spatial maps of the magnetic structure

of the solar photosphere observed by the solis/vsm in-strument located atop Kitt Peak at the National SolarObservatory, in a fraction of the time required by a sim-ilar serial technique. The inversion code is capable ofrecovering magnetic fields with uncertainties on the or-der of 0.5%, with errors in the field orientation of a fewdegrees. Fill-fractions are recovered with uncertaintieson the order of 2%. Currently, the code is only capableof inverting a single line at a time, though we plan toinvestigate the extension to the simultaneous inversionof both lines of the Fe i λ6302 multiplet.We have shown the technique to be amenable to the

reduction and analysis of large volumes of spectropolari-metric data. To this end, we are currently investigatingthe the assimilation of the gpu hardware and special-ized cuda-based algorithms into the solis/vsm vectorfield pipeline. Our long-term goals for this work are toprovide near real-time vector magnetic fields to the sci-entific community. Increasing the cadence of solis/vsmvector data products will also allow us to support andcomplement observations taken by the Solar DynamicsObservatory (sdo) Helioseismic and Magnetic Imager(hmi), which produces full-disc vector magnetograms at4096× 4096 resolution with a cadence of approximately12 minutes.The gpu programming paradigm is highly scalable; a

compiled cuda application can execute on any cuda-capable device, subject to hardware limitations. Thethread scheduler will automatically allocate the appro-priate number of thread blocks to the stream processors.Coupled with the Message Passing Interface (mpi) to par-allelize over scanlines (i.e. independent scanlines are in-verted independently, utilizing their own distinct gpu),this could lead to incredibly fast (i.e. near-realtime or re-altime) full-disc inversions with a modest number of cpu-gpu pairs. Improvements in gpu hardware are steadilyadvancing; the Fermi architecture is the recent successorto the Tesla architecture, offering up to 512 cuda cores,larger-capacity memory banks, and increased floating-point performance. With the next-generation Kepler ar-chitecture on the horizon, the prospects for acceleratedsolar data processing are indeed promising. Finally, asgpu computing matures, we expect to extend this ap-proach to better hardware, with a cautious eye towardspectropolarimetric analyses of data recorded by the up-coming Advanced Technology Solar Telescope (atst).The volume of data expected from this next-generationobservatory will greatly exceed that of the current gen-eration, requiring new and faster techniques to properlyhandle and reduce the observations in a timely man-ner. We feel the integration of gpu-accelerated data-reduction techniques will be key for the analysis of suchlarge datasets, and may make available important (near)real-time information on the photospheric vector mag-netic field to the space-weather forecasting community.

The authors wish to thank J. Enos and V. Kindratenkoof the National Center for Supercomputing Applica-tions (ncsa) at the University of Illinois at Urbana-Champagne for kindly providing access to the Accel-erator Cluster. We are also indebted to H. Lin (Insti-tute for Astronomy) for providing gpu hardware withwhich to work locally. The authors also wish to thank

11

Table 5Timing profiles for genesis inversions of Fe i λλ6301.5,6302.5

mode ∆Trun ∆Teval

(min) (min)

6301.5A 6302.5A 6301.5A 6302.5Aserial 187.83±0.69 84.56±0.26 147.54±0.68 55.53±0.17gpu-acc. 36.40±0.15 32.82±0.36 4.43±0.01 2.34±0.01

12

A. Pevtsov, J. Harvey, and the anonymous referee forhelpful comments on the manuscript. solis/vsm dataused here are produced cooperatively by nsf/nso andnasa/lws. Support for this work was provided by nasaGrant NNH08AH25I (A. Norton, PI).Facilities: SOLIS (VSM).

REFERENCES

Auer, L. H., House, L. L., & Heasley, J. N. 1977, Sol. Phys., 55, 47Borrero, J. M., Tomczyk, S., Kubo, M., et al. 2011, Sol. Phys.,

273, 267Bratley, P., & Fox, B. L. 1988, ACM Trans. Math. Softw., 14, 88Carroll, T. A., & Staude, J. 2001, A&A, 378, 316Charbonneau, P. 1995, ApJS, 101, 309Darwin, C. 1859, On the Origin of Species by Means of Natural

Selection (London, U.K., W. Clowes and Sons)del Toro Iniesta, J. C. 2003, Introduction to Spectropolarimetry

(Cambridge, UK: Cambridge University Press, April 2003.)Diver, D. A., & Ireland, D. G. 1997, Nuclear Instruments and

Methods in Physics Research A, 399, 414Eydenberg, M. S., Balasubramaniam, K. S., & Lopez Ariste, A.

2005, ApJ, 619, 1167Gray, D. F. 2005, The Observation And Analysis Of Stellar

Photospheres (Cambridge University Press)Grefenstette, J. J., ed. 1985, Proceedings of the 1st International

Conference on Genetic Algorithms, Pittsburgh, PA, USA, July1985 (Lawrence Erlbaum Associates)

Hale, G. E. 1908, ApJ, 28, 315Harker, B. J. 2009, PhD thesis, Utah State UniversityHestroffer, D., & Magnan, C. 1998, A&A, 333, 338Hui, A., Armstrong, B., & Wray, A. 1978, Journal of Quantitative

Spectroscopy and Radiative Transfer, 19, 509Iuspa, L., & Scaramuzzino, F. 2001, Soft Computing - A Fusion

of Foundations, Methodologies and Applications, 5, 58,10.1007/s005000000066

Karr, C. L., & Freeman, L. M. 1999, Industrial Applications ofGenetic Algorithms, CRC Press International Series OnComputational Intelligence (CRC Press)

Lagg, A., Woch, J., Krupp, N., & Solanki, S. K. 2004, A&A, 414,1109

Landi Degl’Innocenti, E. 1994, in Solar Surface Magnetism, ed.R. J. Rutten & C. J. Schrijver, 29

Landi Degl’Innocenti, E., & Landolfi, M., eds. 2004, Astrophysicsand Space Science Library, Vol. 307, Polarization in SpectralLines

Levenberg, K. 1944, Quarterly of Applied Mathematics, 2, 164Mahfoud, S., & Mani, G. 1996, Applied Artificial Intelligence, 10,

543Marquardt, D. W. 1963, SIAM Journal on Applied Mathematics,

11, 431McIntosh, S. W., Diver, D. A., Judge, P. G., et al. 1998, A&AS,

132, 145NVIDIA Corp. 2010, Tesla C1060 Computing Processor Board:

Board Specifications (NVIDIA Corp.)—. 2011, NVIDIA CUDA C Programming Guide (NVIDIA Corp.)Orozco Suarez, D., & Del Toro Iniesta, J. C. 2007, A&A, 462,

1137Pearson, K. 1901, Philosophical Magazine, 2(6), 559Pierce, A. K., & Breckinridge, J. B. 1973, The Kitt Peak Table of

Photographic Solar Spectrum Wavelengths, Contribution (KittPeak National Observatory) (Kitt Peak National Observatory)

Press, W. H., Teukolsky, S. A., Flannery, B. P., & Vetterling,W. T. 1988, Numerical Recipes in C (Cambridge UniversityPress)

Rachkovsky, D. N. 1962, Izv. Krymsk. Astrofiz. Obs., 27, 148Ramirez, J., & Fuentes, O. 2002, Experimental Astronomy, 14,

129, 10.1023/B:EXPA.0000009933.44289.e4Rees, D. E., Lopez Ariste, A., Thatcher, J., & Semel, M. 2000,

A&A, 355, 759Rees, D. E., & Semel, M. D. 1979, A&A, 74, 1Ronan, R. S., Mickey, D. L., & Orrall, F. Q. 1987, Sol. Phys.,

113, 353Ruiz Cobo, B., & del Toro Iniesta, J. C. 1992, ApJ, 398, 375Sanchez Almeida, J. 1997, ApJ, 491, 993Skumanich, A., & Lites, B. W. 1987, ApJ, 322, 473Socas-Navarro, H., Lopez Ariste, A., & Lites, B. W. 2001, ApJ,

553, 949Unno, W. 1956, PASJ, 8, 108

brianj. harker kennethj. mighell arxiv:1208.0302v1 …arxiv:1208.0302v1 [astro-ph.sr] 1 aug 2012 in...

Documents