Mission to Planet Earth: AI Views the World





Nicholas M. Short, Jr., Robert F. Cromp, William J. Campbell, James C. Tilton, Jacqueline Le Moigne, Gyorgy Fekete, Nathan S. Netanyahu, and Keith Wichmann, NASA/Goddard Space Flight Center
Walter B. Ligon III, Clemson University

IN THE LATE 1990S, NASA WILL LAUNCH A SERIES OF SATELLITES TO STUDY THE EARTH AS A DYNAMIC SYSTEM. THE ENORMOUS SIZE AND COMPLEXITY OF THE RESULTING DATA HOLDINGS POSE SEVERAL CHALLENGES AND PROMISE TO TEST THE LIMITS OF PRACTICAL AI TECHNIQUES.

In response to concerns about understanding the environment, the US government began the Mission to Planet Earth and the Global Change Research Programs to promote a new science, called earth systems science. Unlike earth science's past approach of examining the Earth through independent scientific disciplines, ESS intends to describe the Earth as a dynamic system of interacting constituents influenced by natural and anthropogenic processes. The National Aeronautics and Space Administration's role in ESS involves the Earth Observing System program and several smaller earth science missions. These missions will study the interrelationships among the Earth's geosphere, biosphere, atmosphere, and cryosphere by using space-based observations of surface temperature, ozone depletion and greenhouse effects, land vegetation and ocean productivity, and desert and vegetation patterns, to name a few. (See the Landsat box.)

With participation from the European Space Agency, Japan, and Canada, NASA will launch several platforms containing a multitude of sensors in the late 1990s. These platforms will provide scientists with the necessary global measurements for documenting global change in an extensive information base. The success of EOS and these other missions depends on making the appropriate information base accessible to both scientists and nonscientists in a form both can readily understand. This effort will support scientists, educational programs, and related activities through the EOS Data and Information System (EOSDIS).

Yet, ensuring the proper processing and retrieval of this data and information is a formidable task because of the enormous size of EOSDIS. The system will involve collecting several hundred gigabytes of data per day for processing into several levels of data products. Over the 15-year life of EOS, this system will produce an estimated 11,000 terabytes of collected, processed, and stored data.

While mass storage systems promise to solve some of the archiving problems, having enough storage capacity for this complex data does not guarantee that the environmental community will find relevant information. For example, this project will produce hundreds of output data product types for various times, locations, and sensor types. Moreover, these data will consist of a variety of data types, including raster satellite images, ancillary vector and raster maps, multidimensional nonscalar data, derived spatial products from model simulations (such as output from global temperature models), and associated engineering and management textual data. Hence, the complexity, diversity, and geographic distribution of the archive and its indexes will require extensive data and production-line management.

Earth systems science: A global view



Figure 1. The Intelligent Information Fusion System components.

The goal of ESS is to understand the Earth as a holistic system, including atmosphere, ocean, land surface, and living organisms, through macroscopic examination using remotely sensed data. At the core of the science, ESS researchers perform remote object recognition by mapping spectral characteristics, called a spectral signature, to objects with known spectral properties at specific wavelengths of the electromagnetic spectrum. This recognition includes, for example, the ability to distinguish different forms of land cover (urban, forest, desert, and so forth) and the ability to quantify atmospheric makeup at different altitudes. In fact, much of the research in remote sensing for the past 20 years has aimed to prove and enlarge this capability by statistically comparing sensor data to small ground surveys or standard weather data, that is, to in situ observations.

Based on this capability of object recognition from a distance, remote sensing from space involves satellite-based sensors that collect data covering the entire Earth. The data collection occurs over several channels, each corresponding to a band of the electromagnetic spectrum, over a range of electromagnetic wavelengths, and spanning several years. This data is processed to detect features of interest and to summarize and correlate the different sensor data. Ancillary data, in the form of ground observations, demographics, economics, industrial metabolism and productivity, energy consumption, and public health, supplement the remotely sensed data. These data flow into a database and, possibly, a geographic information system for photointerpretation, analysis, and map generation. In some cases, Earth scientists study the data to develop predictive models. These researchers then simulate models using the collected data for initialization. They further verify and refine the models by comparing them to the available data. The results of this process then help shape the focus of future science and policy making.

For this approach to be successful, the collected data need to cover the entire Earth using a variety of instruments during similar time periods. Taken together, current satellite sensor platforms, such as the Landsat, GOES, and Nimbus series, provide good coverage but are not well coordinated. Even within each series of platforms, the various sensors are distributed among the satellites so that compiling a full range of data for a given point at a given time is impractical. At best, the scientists must correlate and average the data from different times. In fact, the coordination difficulties arise because many of these satellite applications provide service to different scientific disciplines (and the commercial sector). Each of these disciplines concerns itself with regional applications such as crop assessment and management, regional planning, geologic mapping, and weather forecasting.

Given that remote sensing has proven itself fairly well at the regional level in both the scientific community and the commercial sector, ESS, by contrast, focuses on the global view. It has more platforms, sensors, and channels, and a wider range of the electromagnetic spectrum collected from multiple sensors simultaneously. The final result will be twofold: a system with improved accuracy over other remotely sensed Earth data systems, and well-developed Earth process models that relate different scientific disciplines. By developing these process models that describe the Earth as a system, the project will provide a basis for unifying these different disciplines. Unifying these fields will, more than ever, depend on improvements in the processing technologies. (See the MODIS sidebar.)

An intelligent information fusion system

As a result of the original research program started in 1983,[1] NASA's Information Science and Technology Branch has developed an end-to-end scientific spatial database management system called the Intelligent Information Fusion System. This system's purpose is to develop, incorporate, and evaluate state-of-the-art techniques for handling EOS-era scientific data challenges. Since 1989, the IIFS has resided on an object-oriented database that stores metadata about remotely sensed images (see Figure 1). The metadata itself is organized to enable fast, efficient access to the appropriate images with respect to typical image header data, plus it incorporates spatial data structures known as sphere quadtrees (SQTs), which more naturally represent data acquired globally.


Additionally, the IIFS contains a number of fast techniques for automatically extracting information about image content, enabling users to query for pertinent datasets based on features of scientific interest within the images themselves. This capability enables the formulation and satisfaction of queries involving change detection and analogy. Finally, the IIFS includes a complex real-time planner and scheduler module that monitors and assigns system resources for processing the data flow, including all processing of higher-level products, extraction and organization of metadata, and even statistics gathering for proper planning in the overall dynamic computing environment.

As Figure 1 shows, the IIFS modules provide dual archiving and retrieving functions. When new data arrive, the system assigns a unique ID to each meaningful granule (in most instances, a scene or snapshot from an instrument). This ID permits the planner to communicate additional metadata about a scene as it becomes available asynchronously. (Meanwhile, the object database instantiates a new entity and fills in its structure with basic information such as spatial location and the platform, sensor, or instrument involved. The header files associated with the data make this basic information readily available.) Depending on the type of data, scientific requirements, demand measured from monitoring retrieval profiles, and available resources, the planner may spawn a number of processes to derive additional information from the data. These might include performing special calibrations or running standard analyses, such as land-use and land-cover percentages, or cloud type. The object database is updated as this additional information becomes available or demand for particular products increases.
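To make the data-driven side concrete, the following Python sketch shows how such an ingest step might look. It is illustrative only: the object database and planner interfaces (object_db.insert, planner.select_tasks, planner.spawn) and the header field names are hypothetical stand-ins, not the IIFS's actual API.

```python
import uuid

def ingest(granule, object_db, planner):
    """Data-driven archiving: register a granule's basic metadata,
    then let the planner derive whatever else demand justifies."""
    granule_id = str(uuid.uuid4())  # unique ID for this scene/snapshot
    # Basic information comes straight from the granule's header files.
    object_db.insert(granule_id, {
        "platform": granule.header["platform"],
        "sensor": granule.header["sensor"],
        "location": granule.header["location"],
        "time": granule.header["time"],
    })
    # The planner may spawn processes (special calibrations, land-cover
    # percentages, cloud type, ...) based on demand and resources; each
    # result updates the object database asynchronously under this ID.
    for task in planner.select_tasks(granule_id, granule):
        planner.spawn(task)
    return granule_id
```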

In the retrieval role, the object database communicates what it knows to the user-interface module so that the user can always query on the available holdings. Thus, when the object database learns of a new platform and its instruments, all future user-interface connections automatically display the availability of that data type. A user's query may be satisfied directly by the object database, as in the case of "Which Landsat Thematic Mapper scenes include the Chesapeake Bay in May 1995?"

Other queries, however, cause the object database to invoke the planner module, as in a user request for a vegetation index of some study area at a specific time. After locating an appropriate complete scene containing the user’s region, the object database would need to invoke the planner to have a vegetation index algorithm performed on the specific study area. This problem itself may require multiple steps. The planner must:

• allocate a machine for this task,
• isolate the region,
• compute the algorithm,
• place the resulting file in an appropriate location so that the user interface can retrieve it, and
• inform the object database of the status and availability of the result.

The object database can then relay the result to the user interface, which retrieves and displays the file automatically. Thus, the IIFS runs in both data- and demand-driven modes.
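A rough sketch of this dual retrieval path follows. The query, database, and planner objects and every method on them are hypothetical, intended only to show how a direct metadata query differs from one that triggers product generation:

```python
def handle_query(query, object_db, planner):
    """Demand-driven retrieval: answer directly from the holdings when
    possible; otherwise plan and execute the product generation."""
    scenes = object_db.search(query.constraints)  # e.g., TM scenes, May 1995
    if query.derived_product is None:
        return scenes                             # direct metadata answer
    # Derived request, e.g., a vegetation index over a study area:
    scene = next(s for s in scenes if s.contains(query.region))
    plan = planner.build_plan(goal=query.derived_product,
                              region=query.region, source=scene)
    result = planner.execute(plan)    # allocate machine, isolate region,
                                      # compute algorithm, store result
    object_db.record_status(result)   # availability noted for the UI
    return result
```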

We designed the IIFS to mimic the major functionality of an advanced ESS processing system. The overall task of ESS processing is to transform raw sensor and ancillary data into working models of the complex whole-Earth systems called standard products. The process of going from raw sensor data to integrated Earth models involves several levels of processing through a production pipeline. Typically, raw data are received either directly at receiving stations scattered throughout the world or through a network of relay satellites called the Tracking and Data Relay Satellite System.

After being dumped to some form of mass storage, data flow to various processing locations for preprocessing. The first type of preprocessing involves radiometrically and geometrically calibrating the data to correct for atmospheric anomalies, sensor noise, and spacecraft attitude or orientation. After calibration, the IIFS transforms the data into their intended sensor units (such as radar backscatter cross section, brightness, or temperature) and, depending on the target application, maps the data into a set of scientifically meaningful features. This classification can be as simple as taking the ratio of two channels to quantify biomass, or as complex as statistically classifying the spectral signatures into known features such as land-use categories.
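As an illustration of the simplest kind of feature mapping mentioned above, the sketch below computes a two-channel index with NumPy. The normalized-difference form shown (an NDVI-style index) is one common variant of a channel ratio; the function name and the array values are ours, not the article's.

```python
import numpy as np

def normalized_ratio(nir, red, eps=1e-6):
    """Normalized difference of two calibrated channels; values near 1
    indicate dense vegetation, a simple proxy for biomass."""
    nir = nir.astype(np.float64)
    red = red.astype(np.float64)
    return (nir - red) / (nir + red + eps)

# Synthetic 2x2 calibrated reflectance values for two channels.
nir = np.array([[0.50, 0.62], [0.71, 0.30]])
red = np.array([[0.10, 0.21], [0.20, 0.28]])
print(normalized_ratio(nir, red))
```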

Next, the IIFS considers dynamic issues by mapping the data to a uniform space-time coordinate system. This often involves the interpolation of missing values due to orbital track characteristics and the juxtaposing or mosaicking of multiple orbits. Data processed to this level are correlated with other datasets and ancillary data to validate and refine numerical simulation models. Processing at this level becomes highly application-specific, involving various types of numerical code. Remotely sensed data serve for initializing the models and validating intermediate simulation results. Analysis of this validation may then lead to further refined models.

Eventually, the IIFS compares results from working models against other simulations to factor in system interrelationships. For example, ocean current models can be used in conjunction with atmospheric models to study ocean-atmosphere interaction. Finally, the data that this research produces must be presented to the scientists in a readily understandable form. This implies both statistical and scientific visualization, which in turn require still more image and graphical processing, multimedia, and data communications.

Once the data are processed, researchers must archive them all and make them available for future reprocessing. This is a particularly difficult problem because simply placing the data in mass storage is not enough; there must be a mechanism to locate the data efficiently for reprocessing based on innumerable domain-specific search criteria. For example, a scientist may want to retrieve a set of data for a particular region over a specified period of time to study the region's dynamics. Another may wish to select data covering the entire Earth for a single time frame to correlate new information to existing information.

Each case presents a considerable problem in finding the desired data; registering the data in terms of time, geographic orientation, and resolution; and selecting appropriate processing for the given application. This must be done in terms of metadata that define parameters of the data, including information about the sensor that collected it, preprocessing parameters, and summary content information. Summary metadata, known as browse products, provide a low-resolution, low-accuracy view of the data to assist in determining which data may be of interest to the scientists. These metadata, stored in an electronic card catalog called a metadatabase, include interface software to allow novice users to browse the database and initiate retrieval and reprocessing.

Finally, all of the activities considered (data retrieval, preprocessing, application-specific processing, reprocessing, and data management) must be coordinated to ensure smooth operation and timeliness. A major factor here is that ESS processing is not a static field; in fact, researchers are constantly proposing, developing, and refining new processes and algorithms. Once these new methods are accepted through peer review, these additions could require reprocessing much of the data into new products at the same time that data are being gathered from the satellites. These issues, combined with the arbitrary arrival times of data and requests, make automated, real-time coordination of interdependent tasks a significant processing problem.

Automatic data characterization and cataloging

One of the most challenging parts of the IIFS is the module responsible for automatically characterizing image content.[3] As data holdings continue to grow, catalogs will need more semantically rich discriminators so that users can efficiently locate pertinent datasets. Currently, the earth science community queries datasets only through the most mundane of attributes, such as time of observation, location, or range of wavelength. These attributes provide little or no information about the phenomena being observed, and in the end can only serve to provide soft recommendations on the relevance of individual observations. Attributes detailing information on the content of the images themselves are noticeably lacking, except perhaps for an indication of the percentage of clouds within an image. Even this attribute makes only a cameo appearance, as its primary purpose is quality control.

To enable users to search data holdings based on image content, the substance of each image must be summarized. Just as a postage stamp or thumbnail version of an image serves as a browse product of the full image, a characterization vector categorizes at a high level the features contained within an image. Each sensor is designed to detect specific features. The success of this sensing depends on the bands of wavelengths used in the sensor's channels, the time of observation, the pixel resolution, the frequency of observation, viewing conditions, the health of the satellite and receiving station, and so on.

The IIFS project takes the view that each sensor can be used to detect a finite set of known features. Application developers then build characterization algorithms specific to each sensor for extracting these features from images at a high level, producing characterization vectors that then serve to populate the object-oriented metadatabase. As we will see in the next section, a planner/scheduler chooses which characterization algorithm to apply to a given image. This selection depends on the level of accuracy required, the allotted processing time, the available computing and data resources, and the kinds of products that need to be generated using the image.

All characterization algorithms rely on clustering techniques. Unsupervised clustering methods can identify homogeneous regions within an image in feature space but cannot attach semantic labels to those regions. With respect to metadata extraction, the output from unsupervised clustering algorithms therefore requires further processing before it can support querying by content. Supervised clustering algorithms are trained on labeled datasets, so they can identify the class to which a pixel or region belongs. The training data used by supervised clustering techniques consist of input/output pairings. In the remote-sensing domain, we could define the input as the sensor reading for a given pixel, and the output as the class to which that pixel should be assigned, as claimed by ground reference data. Typically, accurate classification requires a minimum sample size of 60 pixels.

The IIFS has promoted a vigorous research program into the development of robust, fast, and efficient characterization algorithms. Conventional Bayesian classifiers apply a maximum-likelihood decision rule to identify the class to which an input vector belongs, typically involving a discriminant function centered on a multivariate normal density distribution in the data. Neural network-based classifiers do not make any assumptions about the underlying data distribution, and marginally outperform Bayesian classifiers. The computationally intensive backpropagation algorithm has so far produced the most accurate characterization algorithms for remotely sensed data,[4] but the prohibitive training times have led to the investigation of probabilistic neural networks.[5] The training time for PNNs is O(n), where n is the size of the training set, whereas that of the backpropagation algorithm has been estimated as O(n³).[6]

It naturally follows that a combination of probabilistic neural networks and backpropagation-trained neural networks is best suited to the task of rapid, accurate pixel labeling. The PNN has a fast training time, whereas the backpropagation algorithm learns notoriously slowly. Hence, the IIFS uses the PNN to fine-tune the training set. This adjustment produces contingency tables that help elucidate which classes are being confused, indicating that the training sets in those areas should be strengthened. Finally, once a sufficiently descriptive training set has evolved, backpropagation produces an efficient neural network.

Both of these neural network techniques require the user to supply a training set of sample spectral readings, ancillary data (such as wavelet coefficients, elevation, and aspect), and associated classifications. Because of the prohibitive costs of performing on-site ground truth studies, the IIFS uses a photointerpretation tool (PIT) that enables a remote-sensing scientist to identify, select, and label homogeneous regions within a multispectral image. Typically, we use PIT on a 2,982 × 4,320-pixel Landsat image to label about 50,000 pixels. Some classes are easier to identify than others.

System developers can then choose pixels that are labeled through PIT for placement in the training set. Because of potential similarities in the spectral signatures of some of the classes, several iterations may be needed before a descriptive set of sample points from all of the classes has been assembled. Fortunately, the PNN model can quickly highlight those classes for which training data are lacking or insufficient.

The PNN centers a Gaussian distribution at each point in a user-furnished training set. To classify a new pixel in this high-dimensional feature space, the PNN sums, for each known class, the Gaussian contributions of that class's training points at the pixel's location. The pixel is assigned to the class with the highest summed strength.

From this brief description, it is apparent that every point in the train- ing set is directly involved in the computation for classifying each pixel. For training and testing on a small sample set, this is computa- tionally feasible. For use in an oper- ational environment on millions of pixels per scene, this is no longer a viable option.
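A minimal NumPy sketch of this decision rule makes the cost visible: every training point contributes one Gaussian term to the classification of every pixel. The kernel width sigma and the array layout are our assumptions, not parameters from the article.

```python
import numpy as np

def pnn_classify(pixel, train_x, train_y, sigma=0.1):
    """Probabilistic neural network decision rule: sum a Gaussian
    centered at every training point, per class, and return the
    class with the largest total contribution."""
    sq_dist = np.sum((train_x - pixel) ** 2, axis=1)  # one term per point
    contrib = np.exp(-sq_dist / (2.0 * sigma ** 2))   # Gaussian kernels
    classes = np.unique(train_y)
    totals = [contrib[train_y == c].sum() for c in classes]
    return classes[int(np.argmax(totals))]
```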

Luckily, we can apply the backpropagation algorithm to the final version of the training set to produce an efficient neural network with the same level of performance accuracy as achieved by the PNN. Once the PNN performs at a high level of accuracy (generally around 70% or more of the pixels correctly classified with respect to a test dataset), we can apply the backpropagation neural network training algorithm to the training set to produce a definitive feed-forward neural network that is computationally more efficient (due to a smaller number of internal nodes), while producing a comparable level of labeling accuracy.

In an operational mode, the characterization module realizes further computational savings by classifying randomly sampled pixels from an image. An examination of no more than 17,000 pixels (much less than one percent of an entire Landsat image) can produce a characterization vector that has more than a 90% chance of accurately summarizing, to within one percent, the overall percentage of each land-cover class within the image, according to the classifier.
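The sampling step might look like the following sketch, where classify is any per-pixel classifier (for instance, the feed-forward network above); the function name and interface are ours.

```python
import numpy as np

def sample_characterization(pixels, classify, n=17000, seed=0):
    """Estimate the per-class land-cover fractions of a scene from a
    random sample of pixels instead of classifying every pixel."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(pixels.shape[0], size=n, replace=False)
    labels = np.array([classify(p) for p in pixels[idx]])
    classes, counts = np.unique(labels, return_counts=True)
    return dict(zip(classes, counts / float(n)))  # characterization vector
```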

Regardless of the type of classifier employed, users can adjust the overall characterization vector to account for the statistical performance of the classifier. When classifiers misclassify, they tend to do so consistently. By deriving the conditional probability matrix for a given classifier, users can probabilistically adjust the characterization vector to produce better results.[7]
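One way such an adjustment could work is sketched below. It assumes the conditional probabilities P(assigned label | true class), estimated from a labeled test set, are available as a matrix; the three classes and all numbers are illustrative, not taken from the article.

```python
import numpy as np

# Rows: true classes; columns: assigned labels. Entry [i, j] is
# P(classifier says j | true class is i), estimated on test data.
confusion = np.array([
    [0.90, 0.08, 0.02],   # water
    [0.05, 0.85, 0.10],   # forest
    [0.03, 0.12, 0.85],   # urban
])

observed = np.array([0.30, 0.45, 0.25])  # raw characterization vector

# observed = confusion.T @ true, so solve for the adjusted vector.
adjusted = np.linalg.solve(confusion.T, observed)
adjusted = np.clip(adjusted, 0, None)
adjusted /= adjusted.sum()               # renormalize to proportions
print(adjusted)
```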

Data production planning and scheduling

The IIFS solution for managing the product-generation side is to model modern-day manufacturing and production plants. This model switches the emphasis in processing from supply-oriented production, which tends toward mass production irrespective of scientific demand, to a demand-driven approach in which products are delivered just in time for scientific needs while keeping a minimal amount of inventory, that is, intermediate data products. In this type of system, the production pipeline must reconfigure itself to adapt to new product lines, varying resources, and changing product needs according to newly arriving raw material, while still meeting manufacturing deadlines and budgetary constraints.

If very high data rates prevent the IIFS from generating products in time, then after the raw data have been placed in mass storage, it generates browse products that scientists use to guide further product processing. These summaries, or browse products, include standard information such as the satellite name, sensor, and time of the observation, and a description of the contents of the imagery. This latter type of browse product, which relies heavily on automated image processing, is typically less accurate than its corresponding standard product because of a tradeoff between the accuracy level and the amount of time spent processing. This tradeoff can involve algorithms that can be interrupted at any time for useful results, or an examination of resampled (smaller) images. Such an approach ties scientific demand to production by allowing the scientist to make a processing decision based upon the easily generated browse product.

Given the dynamic, complex nature of this environment, several techniques from the planning field must be used to manage the generation of both standard and browse products in the context of soft and hard goals. For standard products, the stability of the processing environment allows for the application of classical approaches using hierarchical planning with task reduction.

Standard and browse product requests are represented by explicit or hard goals, such as "produce a characterization vector" or "create a postage stamp." However, because the aim of this production pipeline is to gather as much information about a raw satellite image as possible, the degree of success can vary greatly. The desire to satisfy as many of the possible hard goals for an image to the highest degree of accuracy implies expressing intentions as soft goals.

Hence, methods such as transforming a soft goal into a hard goal by satisficing can be a preliminary step toward construction of a plan. That is, unfilled information from the database serves as an initial set of goals. The current computational environment, having deadlines and previous runtime estimates for similar goals, determines which goals can realistically be produced and in what priority. In particular, a simple satisficing procedure orders the unfilled goal slots from high to low mean demand. After summing the mean runtimes, the procedure removes the last k goals that may push the unknown plan past its deadline. Finally, these hard goals go to the planner for task reduction, where computationally cheaper algorithms may be substituted because the initial satisficing procedure's runtime estimates were incorrect.
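In outline, that satisficing step might be coded as follows; the goal records and their field names (mean_demand, mean_runtime) are hypothetical.

```python
def satisfice(unfilled_goals, deadline):
    """Order unfilled goal slots by mean demand, then drop the trailing
    goals whose summed mean runtimes would push the plan past its
    deadline; what survives becomes the hard goals for task reduction."""
    ordered = sorted(unfilled_goals,
                     key=lambda g: g["mean_demand"], reverse=True)
    hard_goals, total_runtime = [], 0.0
    for goal in ordered:
        if total_runtime + goal["mean_runtime"] > deadline:
            break                    # this and all later goals are removed
        hard_goals.append(goal)
        total_runtime += goal["mean_runtime"]
    return hard_goals
```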

After hard-goal reduction takes place, plans go to an execution monitor that interprets plans according to the runtime environment, assigns uncommitted tasks to processes, and collects statistics for the planner. These statistics provide best- and worst-case estimate intervals for primitive tasks and propagate back up a task-definition hierarchy[2] to provide better constraints during task decomposition. Operators have a functional duration that allows runtimes to be estimated analytically from input size. During the propagation of intervals up the task hierarchy, the system uses both the analytical and empirical estimations.

For browse-product generation, the demands of image processing require that plans be conditional; that is, each action depends on the outcome of previous actions.[8] For example, in the absence of a geographical database, distinguishing between lakes and rivers requires evidence from land-cover categories derived from the spectral signature, as well as noticing highly elongated features through successful spatial analysis. Moreover, conditional plans can be probabilistically dependent on the effects of operators that involve database search. For example, the choice of a classification method depends on, among other things, whether previously classified images exist for the desired region.

Like planning, scheduling of both standard and browse product tasks is an iterative process in which loose constraints must be tightened as more information about resources and responses from information retrieval arrives.

Sphere quadtrees: A global knowledge representation

Data representation poses its own challenges. Often the manner in which the investigating scientists like to visualize or retrieve their data will strongly influence the choice of a particular representation scheme. The result is a sampling scheme that is very conducive to generating graphic renditions using specific projection methods. For example, there are formidable volumes of published earth science data gridded along lines of latitude and longitude, which makes storing the data in tables of rows and columns very convenient. (See Figure 2.)

Figure 2. Conceptual diagram of a sphere quadtree.

Although it is easy to generate an image from data already organized into rows and columns, other gridding schemes have desirable properties. For example, sea-surface investigators may try to keep as much of the water surfaces together as possible, even at the expense of introducing tears into the land masses. Others may opt for leaving the land masses undisturbed while tearing up the globe along lines contained in the oceans. In any case, attempts at making digital flat maps from the sphere involve warping (stretching) and tearing, followed by resampling (gridding).

Although these methods may adequately support an individual project's immediate data representation and analysis needs, they have some significant drawbacks: unacceptable distortions, broken topology, space inefficiency, and difficulty sharing data between projects, all caused by the lack of a good global representation scheme.

First, many regular gridding schemes that deal with global coverage either create large gaps or sacrifice uniformity in some geometric properties of the grid cells. For example, in a rectilinear array of data gridded by latitude and longitude, a 1° × 1° grid cell covers a larger area near the equator than it does near the poles. A simple mapping of rows and columns of data to pixels in a rectangular image causes increasing distortions as we approach the poles. The familiar oversized appearance of Greenland and the greatly distorted shape of Antarctica are direct consequences of such gridding.
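The disparity is easy to quantify: on a sphere of radius R, a cell spanning dlon of longitude between latitudes lat1 and lat2 has area R² · dlon · (sin lat2 − sin lat1), with angles in radians. A quick check of two 1° × 1° cells (our own worked example, not the article's):

```python
import numpy as np

R = 6371.0  # mean Earth radius in km

def cell_area_km2(lat_deg, dlat=1.0, dlon=1.0):
    """Area of a dlat x dlon cell whose southern edge sits at lat_deg."""
    lat1, lat2 = np.radians(lat_deg), np.radians(lat_deg + dlat)
    return R ** 2 * np.radians(dlon) * (np.sin(lat2) - np.sin(lat1))

print(cell_area_km2(0.0))    # ~12,365 km^2 at the equator
print(cell_area_km2(80.0))   # ~2,040 km^2 near the pole
```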

Usually, software systems that deal with these projections also compensate for undesirable artifacts by warping and resampling the data as appropriate. Space efficiency suffers as a direct result of the disparity in grid-cell areas: the same number of cells is associated with a small area near the pole as with a larger one at the equator, although the amount of useful information is much less. With the advent of terabyte-sized data archives, even a small percentage of wasted resources translates to an unnecessarily increased cost.

Data sharing among projects that use their own custom gridding schemes is also a problem. Managing the coordination efforts involved with reprojecting, resampling, and reformatting data is no easy task, and it leads to a confusion of conversion programs. Although conventional quadtree- or R-tree-based geographic information systems manage spatial data efficiently, they still treat the Earth as a quilt made up of piecewise flat regions. Rectangular latitude/longitude query boundaries can become inadequate for rectangular regions that contain one of the poles.

Figure 3. Multiresolutional representation in a sphere quadtree.

To address these shortcomings, the IIFS uses a hierarchical spatial data structure called a sphere quadtree.[9] An SQT can model a digital spherical image consisting of small triangular picture elements, called trixels, that cover the sphere's surface (see Figures 2 and 3). The method for generating trixels is based on the successive subdivision of the faces of a regular convex polyhedron (an icosahedron in our case). The vertices of the newly obtained facets are projected onto the circumscribing sphere. The larger the number of subdivisions, the more closely the new object approximates the surface of a true sphere. The subdivision scheme can be naturally organized into a forest of quadtrees, one root for each of the 20 faces of the icosahedron. This data structure has several desirable properties:

• Abstraction. We have a well-defined, discrete arithmetic over trixel addresses that allows us to manipulate topological objects over a digital spherical geometry.
• Topological consistency. Our abstraction allows us to derive efficient integer-based algorithms to deal with connectivity, distance, and neighborhoods of trixels. The algorithms are insensitive to the special-case considerations of conventional methods, which must treat regions close to the poles differently.
• Rapid access to data. The spatially indexed tree structure speeds up searches based on location and size. It is well known that a well-balanced tree with n nodes can be searched in O(log n) time.
• Variable resolution. Quadtrees, spherical or linear, can also be organized so that full extension of the tree is realized only if there is variance in the data in the region represented by the node that covers it. This translates directly into efficient use of space.
• Multiple resolution. Quadtrees can also represent a pyramid of images at varying resolutions. This would be very useful for distributed browsers that incorporate anytime transmission and rendering algorithms that improve image quality with time. Often an image of lower spatial and spectral information content suffices for the browsing individual to either abort or confirm the transmission of a prospective image (see Figure 3).
• Improved interactive browsing. There is no need to reproject an SQT-based map for interactive browsing of global datasets. Except for translations and rotations in the viewing screen's two dimensions, the IIFS can quickly generate local maps that avoid breaks and tears where they are not convenient for the browsing individual.
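The subdivision that generates trixels can be sketched in a few lines: split each triangle at its edge midpoints, project the new vertices back onto the unit sphere, and recurse, appending one base-4 digit to the trixel address per level. This is our simplified rendering of the idea, not the IIFS implementation.

```python
import numpy as np

def on_sphere(v):
    """Project a vertex onto the circumscribing unit sphere."""
    return v / np.linalg.norm(v)

def subdivide(triangle, depth, address, out):
    """Recursively split a trixel into four children; collect
    (address, vertices) pairs at the requested depth."""
    if depth == 0:
        out.append((address, triangle))
        return
    a, b, c = triangle
    ab = on_sphere((a + b) / 2.0)   # edge midpoints, pushed back
    bc = on_sphere((b + c) / 2.0)   # out to the sphere's surface
    ca = on_sphere((c + a) / 2.0)
    for digit, child in enumerate([(a, ab, ca), (ab, b, bc),
                                   (ca, bc, c), (ab, bc, ca)]):
        subdivide(child, depth - 1, address + str(digit), out)

# One icosahedron face would seed one quadtree root; 20 faces form the
# forest. (The seed vertices below are illustrative, not icosahedral.)
out = []
subdivide((np.array([1.0, 0, 0]), np.array([0, 1.0, 0]),
           np.array([0, 0, 1.0])), 2, "0", out)
print(len(out))   # 16 trixels at depth 2
```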

For the IIFS, the SQT can implement a structure called the hypercylinder, which organizes metadata about satellite observations of the Earth or space within the object-oriented database management system. The key objective is to provide a method for satisfying queries of the following kinds:

• Find all images that overlap a region of interest.
• Find all images that completely contain a region of interest.

The SQT, just as linear quadtrees do, also makes it possible to handle regions of arbitrary shapes with any number of holes.

Object-oriented databases. Because of the complexity of ESS data, relational databases do not map well into the ESS domain. Conventional databases were designed to support commercial data-processing applications, which are characterized by simple data types, transaction-oriented operations, and relational-associative access to data in a restrictive retrieval language such as SQL. The relational vendors, in fact, have focused on maximizing the number of simple transactions per second at the sacrifice of data modeling. ESS applications, on the other hand, require complex, flexible, and often application-dependent index methods for retrieving data types such as images, video, and text.

The database portion of the IIFS includes a commercial object-oriented database management system called ObjectStore (by Object Design, Inc.), a complete data model of the ESS processing domain, several specialized indexing methods such as SQTs, and a Lisp-like interprocess query-language interface. Use of standard techniques such as abstraction, encapsulation, inheritance, and multiple indexing methods makes the extended OODBMS a natural interface paradigm to the AI components within the IIFS.

Data mining. Given the existence of extensive multilayered datasets, it is natural to wonder if we could construct a curiosity module that would be responsible for discovering interesting relations among portions of the data. Minimal research exists in this area.

Data mining is the area of machine learning concerned with automatically discovering relations embedded in complex datasets. Still in its infancy, research investigating scientific discovery has concentrated on producing equations that describe the relationships among scalar variables or, for more complex data, on discovering qualitative, conceptual descriptions to guide the formulation of equations. A qualitative version of calculus has been developed to explain the dynamics of a complex system using state diagrams. The KAM program[10] uses classic AI techniques and domain-specific knowledge to perform automatic qualitative analysis of nonlinear Hamiltonian systems.



A curiosity module for exploring datasets of remotely sensed images would require aspects of all of the earlier approaches, and would pose several new challenges. Better search techniques would be needed because of the size and resolution of the data. The system would require comprehensive knowledge of its domain to keep it from discovering relationships already well known to the scientific community. To test theories, the system would need to be able to locate images containing instances of specific phenomena, and use these results to refine its hypotheses.[11]

Although the intent of the IIFS has been to create a framework for testing technologies in ESS, several components of the IIFS have been applied to various projects. Recently, a team at NASA has been applying the IIFS to the acquisition of real-time satellite data from both the GOES and NOAA series satellites. As several IIFS technologies become better understood, researchers are applying them based upon the evolving requirements of their projects.

The IIFS is the culmination of over 10 years of research by the authors. It combines technologies from AI as well as conventional computer science to understand the systemic properties of the ESS domain. Unlike current systems-analysis approaches that use top-down design, the AI paradigm for systems engineering, as exhibited in the IIFS, is to have the system gradually evolve toward a solution. Because of the enormous size of imminent ESS missions, future ESS systems must use the AI approach to meet the demands of changing requirements and new scientific knowledge.

Acknowledgments

The authors extend special thanks to Mel Montemerlo (NASA/HQ). Also, we thank Samir Chettri (GST, Inc.), Howard Leckner (GST, Inc.), Lloyd Treinish (IBM-Watson), Compton Tucker (NASA/GSFC), Marc Imhoff (NASA/GSFC), Joel Susskind (NASA/GSFC), M. Manohar (USRA), Mark Boddy (Honeywell TC), Jim White (Honeywell TC), John Beane (Honeywell TC), Amy Lansky (NASA-Ames), Ron Rymon (U. Pitt), Phil Romig (U. Nebraska, Lincoln), Becky Hertenstein (Notre Dame), Tony Lopez (Loyola, New Orleans), and Phuong Tran (Loyola, New Orleans) for their invaluable contributions throughout the project. Finally, thanks to the anonymous reviewers for their comments and viewpoints.

References

1. W.J. Campbell and L.H. Roelofs, "Artificial Intelligence Applications Concepts for the Remote Sensing and Earth Science Community," Proc. 9th Pecora Symp., IEEE Computer Society Press, Los Alamitos, Calif., Oct. 1984, pp. 232-242.

2. N.M. Short, Jr., and L. Dickens, "Automatic Generation of Products for Terabyte-Size Geographic Information Systems Using Planning and Scheduling," Int'l J. Geographic Information Systems, Vol. 9, No. 1, 1995, pp. 47-65.

3. W.J. Campbell, L.H. Roelofs, and M. Goldberg, "Automated Cataloging and Characterization of Space-Derived Data," Telematics and Informatics, Vol. 5, No. 3, 1988, pp. 279-288.

4. W.J. Campbell, S.E. Hill, and R.F. Cromp, "Automatic Labeling and Characterization of Objects Using Artificial Neural Networks," Telematics and Informatics, Vol. 6, Nos. 3-4, 1989, pp. 259-271.

5. S.R. Chettri and R.F. Cromp, "The Probabilistic Neural Network Architecture for High-Speed Classification of Remotely Sensed Imagery," Telematics and Informatics, Vol. 10, No. 3, 1993, pp. 187-198.

6. H. Mühlenbein, "Limitations of Multilayer Perceptron Networks: Steps Towards Genetic Neural Networks," Parallel Computing, Vol. 14, 1990, pp. 249-260.

7. R.F. Cromp, "Automated Extraction of Metadata from Remotely Sensed Satellite Imagery," Tech. Papers, 1991 ACSM-ASPRS Ann. Conv., Vol. 3, American Congress on Surveying and Mapping/American Soc. Photogrammetry and Remote Sensing, 1991, pp. 91-101.

8. M. Boddy et al., "Planning Applications in Image Analysis," Proc. 1994 NASA/GSFC Conf. Space Applications of AI, NASA-Goddard, Greenbelt, Md., May 1994.

9. G. Fekete and L.S. Davis, "Property Spheres: A New Representation for 3D Object Recognition," Proc. Workshop on Computer Vision: Representation and Control, CS Press, 1984, pp. 192-204.

10. K.M. Yip, "Understanding Complex Dynamics by Visual and Symbolic Reasoning," Artificial Intelligence, Vol. 51, 1991, pp. 179-221.

11. R.F. Cromp and W.J. Campbell, "Data Mining of Multidimensional Remotely Sensed Images," Proc. Second Int'l Conf. Information and Knowledge Management, ACM Press, New York, 1993, pp. 471-481.

Nicholas M. Short, Jr., has been a computer scientist in the Information Science and Technology Branch of the NASA/Goddard Space Flight Center, where he has been a key architect and co-investigator in the development of the Intelligent Information Fusion System. His research interests include planning and scheduling, distributed AI, image classification, databases, and natural language processing. He received his degrees in mathematics/statistics, systems analysis, and computer science/AI (BA 1986, BS 1986, and MSE 1990) from Miami University and the University of Pennsylvania. He chaired the 1994 Conference on Space Applications of Artificial Intelligence and the 1995 Workshop for Planning/Scheduling of Earth Science Data Processing for the EOS project.

Robert F. Cromp is principal investigator of the Intelligent Information Fusion System project at NASA/Goddard Space Flight Center in the Information Sciences and Technology Branch, Code 935. He also teaches at the University of Maryland, University College. His research interests include image data mining, knowledge acquisition and representation, the theory of strategies, information fusion, machine learning, spatial data structures, natural language processing, and discrete mathematics. He received his BA in computer science from the State University of New York at Buffalo in 1982, and his MS and PhD in computer science from Arizona State University, Tempe, in 1983 and 1988. He is a member of the AAAI, ACM, IEEE, INNS, ASPRS, and MAA.

William J. Campbell is the head of the Information Science and Technology Branch at NASA/Goddard Space Flight Center, which is responsible for the development of intelligent information and data management value-added systems. His work at GSFC has concentrated on the development of large-scale data information systems for Earth science and on the integration and modeling of satellite remote-sensing data into geographic information systems. He has an MS in physical geography from Southern Illinois University. He also serves as an associate editor for Photogrammetric Engineering and Remote Sensing, and has served as a member of the Committee on Information, Robotics, and Intelligent Systems of the National Science Foundation, and as a committee member of the National Research Council of the National Academy of Sciences.

James C. Tilton is a computer engineer with the Information Science and Technology Branch of the Space Data and Computing Division at the NASA/Goddard Space Flight Center. As a member of ISTB, he helps define future requirements for image analysis, information extraction, and data compression in support of NASA programs. He is also responsible for designing and developing computer software tools for space and earth science image analysis and compression algorithms, and for encouraging the use of these computer tools through close interactions with space and earth scientists. He received a BA in electrical engineering, environmental science and engineering, and anthropology (1976) and an MEEE (1976) from Rice University, an MS in optical sciences from the University of Arizona, Tucson (1978), and his PhD in electrical engineering from Purdue University (1981). He is a member of the IEEE Computer Society, as well as the Phi Beta Kappa, Tau Beta Pi, and Sigma Xi honor societies. He currently serves as a member of the IEEE Geoscience and Remote Sensing Society Administrative Committee.

Jacqueline Le Moigne is at NASA/Goddard Space Flight Center with the Center of Excellence in Space Data and Information Sciences (CESDIS), where she is a senior scientist and the head of the Computational Sciences Branch. Her research interests mainly focus on computer vision using massively parallel computers, applied to Earth and space science problems. She received her BS in mathematics in 1977, MS in artificial intelligence in 1980, and PhD in computer vision in 1983, all from the Pierre and Marie Curie University, Paris. To bridge the gap between earth science and computer science, she also organized several CESDIS-sponsored seminar series on Earth remote sensing.

Gyorgy (George) Fekete is the chief architect and implementor of the sphere quadtree data type. His interests are spatial data management, graphics, visualization, and interactive techniques. He received his BS (1978), MS (1979), and PhD (1988) in computer science from the University of Maryland at College Park. He is a member of the ACM and the IEEE Computer Society.

Nathan S. Netanyahu is an assistant research scientist affiliated with the Center for Automation Research, University of Maryland, College Park, and the Center of Excellence in Space Data and Information Sciences, NASA Goddard Space Flight Center. He received his BSc and MSc in electrical engineering from the Technion, Israel Institute of Technology, and his MSc and PhD in computer science from the University of Maryland, College Park. His main research interests are in algorithm design and analysis, computational geometry, image processing, pattern recognition, remote sensing, and robust statistical estimation.

Walter B. Ligon III is an assistant professor of computer engineering at Clemson University and is the head of the Parallel Architectures Research Lab at Clemson. Current projects include the Macintosh Telemetry and Control project, the Parallel Virtual File System, and parallel problem-solving environments for computational engineering applications. His primary research interests are computer architecture, parallel processing, operating systems, and compiler design. He received his PhD in computer science from the Georgia Institute of Technology in 1992. Readers can contact him at the Dept. of Electrical and Computer Engineering, 102 Riggs Hall, Clemson Univ., Box 340915, Clemson, SC 29634-0915.

Keith Wichmann is working on his PhD dissertation at Clemson while working on the IIFS at NASA/GSFC Code 935 as an employee of Global Science and Technology. He has a strong interest in object-oriented techniques and computer architecture. He earned his BS in computer engineering from Clemson University in 1989 and his MS in 1991.

Address correspondence for authors other than Walter B. Ligon III to Code 935, NASA/Goddard Space Flight Center, Greenbelt, MD 20771; lastname@gsfc.nasa.gov.
