

Journal of Food Engineering 61 (2004) 27–35

www.elsevier.com/locate/jfoodeng

Meat quality evaluation by computer vision

Jinglu Tan *

Department of Biological Engineering, University of Missouri, Columbia, MO 65211-0001, USA

Received 18 April 2002; accepted 2 May 2003

Abstract

Applying computer vision in meat quality evaluation has been an active area of research in recent years. Various studies have addressed issues from basic technique development to applications. This paper summarizes the main results from a number of recent application studies, which include characterization of quality attributes such as color, marbling, maturity and texture; prediction of sensory scores and grades; and prediction of cooked-meat tenderness. The promise of computer vision for objective meat quality evaluation is demonstrated and the remaining challenges are discussed.

© 2003 Elsevier Ltd. All rights reserved.

Keywords: Image processing; Meat grading; Tenderness

1. Introduction

The existing methods for meat quality (palatability) grading rely heavily on subjective visual appraisal of certain carcass characteristics. The official USDA (US Department of Agriculture) beef quality grading system consists of three visually assessed attributes: abundance of marbling (intra-muscular fat), muscle color, and skeletal maturity. An official pork grading system does not yet exist, but color is considered an important quality attribute because it influences consumers in their product selection.

While visual appraisal has served the industry for decades, it has some major drawbacks. Although the graders are professionally trained, inconsistencies and variations are intrinsic to subjective evaluations (Cross, Gilliland, Durland, & Seideman, 1983). This has seriously limited the ability of the meat industry to provide consumers with products of consistent quality. Furthermore, the grades have limited power to predict eating quality. Research has shown that marbling and color characteristics, and thereby the final grade, explain only a rather low percentage of the variations in important palatability measures such as tenderness (Li, Tan, Martz, & Heymann, 1999; Li, Tan, & Shatadal, 2001).

* Corresponding author. Tel.: +1-573-882-7778; fax: +1-573-884-5650.

E-mail address: [email protected] (J. Tan).

0260-8774/$ - see front matter © 2003 Elsevier Ltd. All rights reserved.

doi:10.1016/S0260-8774(03)00185-7

Objective measures of beef quality have been a long-time desire of the industry and there have been many research efforts in developing instruments. One popular, and obvious, approach has been to measure the mechanical properties as indicators of tenderness. A number of devices have appeared. The best known is probably the Warner–Bratzler shear force instrument. The shear strength of cooked meat is correlated with sensory tenderness scores (Shackelford, Wheeler, & Koohmaraie, 1995). This method, however, is not practical for commercial fresh-meat grading.

Computer vision has been recognized as the most promising approach to objective assessment of meat quality from fresh-meat characteristics. Research in this area began in the early 1980s. Lenhert and Gilliland (1985) describe the design of a black-and-white (B/W) imaging system for lean yield estimation. Cross et al. (1983) and Wassenberg, Allen, and Kemp (1986) report application results of the system. Beef quality assessment by image processing started with the work by Chen, McDonald, and Crouse (1989) to quantify the marbling area percentage in six standard USDA marbling photographs. McDonald and Chen (1990a, 1990b) used morphological operations to separate connected muscle tissues from the longissimus dorsi (LD) muscle. A Boolean random set model was proposed to describe the spatial marbling distribution (McDonald & Chen, 1992) but it did not significantly improve marbling score prediction over using only the marbling area (McDonald & Chen, 1991). To separate lean from fat in B/W steak images, Chen, Nguyen, and Park (1995) suggested using the sixth moment as a thresholding criterion.

To develop a computer vision system for objective meat grading, several steps must be completed. While the existing human grading system has many weaknesses, any new system designed as a replacement must be compared against the human system before it is accepted. Since the existing system is defined in terms of qualitative subjective assessments, the quantitative characteristics that contribute to the human grading are not always obvious. It is therefore necessary to search for image features that are related to human scores for marbling abundance, muscle color and maturity and, eventually, to the USDA grades. Moreover, to improve the usefulness of the grading system, new instrumentally measurable characteristics are needed to enhance the power of the grades in predicting eating quality attributes such as tenderness.

This paper attempts to show where we stand towards computer vision-based meat grading by summarizing some recent results from work performed at the University of Missouri and discussing some remaining challenges. This will shed light on how successful we have been in characterizing the traditional quality attributes, predicting official grades, and finding computable features for tenderness prediction. The difficulties discussed will point to some future directions of research.

Fig. 1. Example beef image. Upper: original. Lower: segmented LD muscle. The holes in the LD muscle give the marbling image.

2. Characterizing USDA quality attributes

Meat images were processed to characterize the quality attributes defined in the USDA beef grading system. Color image features were extracted to predict human scores of color, marbling and maturity.

2.1. Color and marbling

Sixty wholesale beef ribs of varying color and degree of marbling were acquired from a local supplier. Five-cm-thick slices were taken from the ribs. Each slice was then cut into two 2.5-cm-thick steaks. The two freshly cut surfaces, mirror images of each other, were used for analysis, one for image capture and the other for sensory analysis.

A 10-member panel was assembled and trained to evaluate the color and marbling of beefsteaks. Muscle color was evaluated according to a beef color guide on an 8-point scale: 1 = bleached red, 2 = very light cherry red, 3 = moderately light cherry red, 4 = cherry red, 5 = slightly dark red, 6 = moderately dark red, 7 = dark red, and 8 = very dark red. Marbling was rated according to the USDA marbling scorecards on a 9-point scale: 1 = devoid, 2 = practically devoid, 3 = traces, 4 = slight, 5 = small, 6 = modest, 7 = moderate, 8 = slightly abundant, and 9 = moderately abundant. The panel averages were used as the sensory scores.

Images were acquired while the sensory evaluations were proceeding. A color imaging system was used to capture sample images. The illumination and camera settings were selected so that the image resolution was appropriate for revealing the small marbling flecks.

The steak images were subjected to several steps of processing: filtering, background removal, segmentation of fat from muscles, isolation of the LD muscle, and segmentation of marbling from the LD muscle. Details of the segmentation algorithms developed and used can be found in Gao, Tan, and Gerrard (1995), Lu and Tan (1998), and Lu (2002). Fig. 1 shows an original image and the segmented LD muscle. The holes in the LD muscle give an image of the marbling flecks (not shown).

From the segmented LD muscle and marbling images, image features relating to muscle color and marbling abundance were extracted. The LD muscle color was characterized with the means (μR, μG and μB) and standard deviations (σR, σG and σB) of the red, green and blue color functions.
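
The color features above can be illustrated with a minimal sketch. It assumes the image is available as rows of (R, G, B) tuples and that a boolean mask marks the segmented LD muscle; the function name `color_features` and the sample values are hypothetical, and the population form of the standard deviation is an assumption because the text does not say which form was used.

```python
from statistics import mean, pstdev


def color_features(pixels, mask):
    """Per-channel mean and standard deviation over the masked pixels.

    pixels: rows of (R, G, B) tuples; mask: rows of booleans (True = muscle).
    """
    selected = [px for row, mrow in zip(pixels, mask)
                for px, keep in zip(row, mrow) if keep]
    channels = list(zip(*selected))            # all R, all G, all B values
    means = tuple(mean(c) for c in channels)
    stds = tuple(pstdev(c) for c in channels)  # population std (assumption)
    return means, stds


# Tiny hypothetical 2x2 image; only the top row belongs to the muscle.
image = [[(200, 60, 50), (180, 40, 30)],
         [(255, 255, 255), (0, 0, 0)]]
muscle = [[True, True],
          [False, False]]
mu, sigma = color_features(image, muscle)
```

Restricting the statistics to the mask keeps background and fat pixels from biasing the muscle color estimate.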


Features representing the amount and the spatial distribution of marbling were extracted from labeled marbling images (Gerrard, Gao, & Tan, 1996). The size of each marbling fleck was calculated. To take into account the effects of fleck size, the marbling flecks were classified into the following three size categories:

A1: <2.7 mm²,
A2: 2.7–21.4 mm², and
A3: >21.4 mm².

The following marbling features were then computed to measure marbling abundance:

Dci: count of marbling flecks in size category Ai per unit ribeye area,
Dai: marbling area in size category Ai per unit ribeye area,
Dc: count of all marbling flecks per unit ribeye area, and
Da: total marbling area per unit ribeye area.

Dc and Da define two marbling densities: Dc is a count density and Da is an area density. Dci and Dai represent the densities for each fleck size category. The two densities describe marbling abundance from different perspectives.
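
A minimal sketch of how the count and area densities could be computed from a binary marbling mask follows. The 4-connected labelling, the pixel-to-mm² scale, the function names and the sample mask are all assumptions for illustration; the labelling actually used is described in Gerrard, Gao, and Tan (1996), not here.

```python
def fleck_sizes(mask):
    """Pixel count of each 4-connected fleck in a binary mask."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    sizes = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                stack, count = [(r, c)], 0
                seen[r][c] = True
                while stack:                      # iterative flood fill
                    y, x = stack.pop()
                    count += 1
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                sizes.append(count)
    return sizes


def size_category(area_mm2):
    """Index 0, 1, 2 for categories A1 (<2.7), A2 (2.7-21.4), A3 (>21.4)."""
    if area_mm2 < 2.7:
        return 0
    return 1 if area_mm2 <= 21.4 else 2


def marbling_densities(mask, ribeye_area_mm2, mm2_per_pixel):
    areas = [n * mm2_per_pixel for n in fleck_sizes(mask)]
    counts, sums = [0, 0, 0], [0.0, 0.0, 0.0]
    for a in areas:
        k = size_category(a)
        counts[k] += 1
        sums[k] += a
    return {
        "Dci": [n / ribeye_area_mm2 for n in counts],
        "Dai": [s / ribeye_area_mm2 for s in sums],
        "Dc": len(areas) / ribeye_area_mm2,
        "Da": sum(areas) / ribeye_area_mm2,
    }


# Hypothetical 3x4 mask with two flecks (2 and 3 pixels), 1 mm² per pixel.
marbling = [[1, 0, 0, 1],
            [1, 0, 0, 1],
            [0, 0, 0, 1]]
features = marbling_densities(marbling, ribeye_area_mm2=12.0, mm2_per_pixel=1.0)
```

Here the 2-mm² fleck falls in A1 and the 3-mm² fleck in A2, so each contributes to a different per-category density while both count toward Dc and Da.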

To characterize the spatial distribution of marbling, a ribeye image was divided into 5.5-cm² sub-regions. Dc and Da were computed for each sub-region. The variations in these localized densities would indicate the spatial distribution of marbling over the LD. The following statistics were computed from these densities as image features:

σDc, σDa: standard deviations of the count and area densities, and
MDc, MDa: third moments of the count and area densities about the mean.
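
As a small illustration of these two statistics, the sketch below takes a list of local densities (one per sub-region) and returns the standard deviation and the third moment about the mean. The function names and the sample densities are hypothetical, and the population (divide-by-n) forms are an assumption.

```python
from statistics import mean, pstdev


def third_central_moment(values):
    """Average cubed deviation from the mean."""
    m = mean(values)
    return sum((v - m) ** 3 for v in values) / len(values)


def spatial_features(local_densities):
    """Standard deviation and third moment of sub-region densities."""
    return pstdev(local_densities), third_central_moment(local_densities)


# Hypothetical case: marbling concentrated in one of four sub-regions.
sd, m3 = spatial_features([0.0, 0.0, 0.0, 4.0])
```

A large positive third moment, as in this example, indicates that a few sub-regions carry much more marbling than the rest.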

Statistical analysis was conducted. The SAS Backward Elimination procedure (SAS, 1996) was used to select image features significant for sensory score prediction. For color scores, two image features (μR and μG) were significant. For marbling scores, five image features (μR, Dc1, Da1, Da3 and Da) were useful. The R² values of regression were 0.86 for color and 0.84 for marbling, indicating that the features were useful in explaining the variations in sensory scores.

The means of red and green (μR and μG) were significant for color prediction but μB was not. This indicates that, while all three color components varied, the blue component did not affect the judges' scoring. The fact that μR was significant in marbling prediction shows that the judges' opinions were influenced by the lean color. Both the count and area densities of small marbling flecks (Dc1 and Da1) influenced the sensory scores as expected. The significance of the area density of large flecks (Da3) indicated that the presence of a few large marbling flecks affected the sensory scoring although the judges were instructed not to put more weight on larger flecks. The global marbling area density (Da) also influenced the scoring.

The image features characterizing the spatial variation of marbling were not significant in the regression. This agrees with McDonald and Chen (1992) that information on the spatial distribution of marbling did not correlate significantly with marbling scores.

Since the sensory scores were imprecise in nature, the data were also analyzed by using fuzzy set and neural network techniques (Tan, Gao, & Gerrard, 1998). The sensory scales were described as fuzzy sets, sensory attributes as fuzzy variables, and sensory responses as sample membership grades. Multi-judge responses were formulated as a fuzzy membership vector or fuzzy histogram of response, which gave an overall panel response free of unverifiable assumptions implied in conventional approaches. Neural networks were used to predict the sensory responses in their naturally fuzzy and complex form from the image features selected by backward elimination. A maximum method of defuzzification was employed to give a crisp grade of majority opinion. The fuzzy set and neural network method classified color and marbling of a set of samples 100% correctly. This further verified the usefulness of the image features extracted.
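
The idea of a fuzzy histogram of response and maximum defuzzification can be sketched as follows. This is only one plausible reading: it assumes each judge's membership grade is simply a vote for one scale point, which may differ from the formulation in Tan, Gao, and Gerrard (1998). The function names and panel responses are hypothetical.

```python
from collections import Counter


def fuzzy_histogram(responses, scale):
    """Fraction of judges choosing each scale point (membership vector)."""
    counts = Counter(responses)
    return [counts.get(point, 0) / len(responses) for point in scale]


def defuzzify_max(membership, scale):
    """Maximum method: the scale point with the largest membership."""
    best = max(range(len(scale)), key=lambda i: membership[i])
    return scale[best]


# Hypothetical marbling responses from a 10-member panel on the 9-point scale.
scale = list(range(1, 10))
panel = [5, 5, 6, 5, 4, 5, 6, 5, 5, 6]
vector = fuzzy_histogram(panel, scale)
grade = defuzzify_max(vector, scale)
```

Unlike a panel average, the membership vector keeps the spread of opinion, and the maximum method returns the grade most judges agreed on.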

In addition to beef color and marbling, fresh pork color was also evaluated by image processing (Lu, Tan, & Gerrard, 1997; Lu, Tan, Shatadal, & Gerrard, 2000). Since consumers pay attention to pork color during product selection, it is of interest to know if human responses to pork color can be predicted by image processing. Forty-four pork loins were randomly picked and cut at the 10th rib. The muscle color was subjected to evaluation by a seven-member sensory panel trained according to procedures described in the American Meat Science Association guidelines (AMSA, 1991). The color scale used was: 1 = pale-purplish gray, 2 = grayish pink, 3 = reddish pink, 4 = purplish red, and 5 = dark purplish red. Images were captured immediately after sensory scoring under the same lighting conditions. Algorithms were developed to segment the loin images into background, muscle and fat (Lu et al., 2000). Color image features were computed, which included the means and standard deviations of red, green, and blue values of the segmented loin muscle areas. Both statistical and neural network models were employed to predict the sensory color scores from the image features. The partial least squares technique was used to determine the latent variables, which were subsequently used to derive a statistical model by multiple linear regression (MLR) and a neural network model by back-propagation.

Fig. 2. Vertebra image. Upper: original. Lower: segmented bone-cartilage object.


The correlation coefficients between the predicted and the sensory color scores were 0.75 for the neural network model and 0.52 for the statistical model. A prediction error of 0.6 or lower was considered negligible from a practical point of view. For the neural network model, 93.2% of the samples had a prediction error lower than 0.6; and for the statistical model, the percentage was 84.1%. These results showed that image processing plus neural network modeling is an effective tool for predicting consumer responses to fresh pork color.

2.2. Skeletal maturity

Image processing and neural network techniques were developed to determine the skeletal maturity of beef carcasses (Hatem & Tan, 1998). Maturity refers to the physiological age, which is not synonymous with chronological age. Maturity is a factor in beef quality grading. Beef tenderness decreases with increasing maturity. In the United States, beef carcass maturity is determined by subjective evaluation of indicators such as cartilage ossification in the vertebrae. There are five maturity levels, designated as “A” (young) through “E” (old).

Two sets of carcass samples were used. The first set included 110 carcasses selected from a commercial plant. An official USDA grader graded the carcasses and provided the maturity scores, which ranged from “A” to “E”. The second set included 28 cattle of known chronological age. Most of them were of “A” maturity and the rest were of “B” maturity. They were prepared in a different commercial plant.

Digital color images were taken right after official grading under the same fluorescent lighting conditions. For each set of images, the lighting and camera settings were kept constant. The images were focused on the thoracic vertebrae around the 13th to 15th ribs (Fig. 2).

Existing research has shown that the degree of cartilage ossification in the vertebrae is the most important indicator of skeletal maturity. For “A” maturity, the cartilage in the thoracic vertebrae is free of ossification; and for “B” maturity, there is some evidence of ossification. Then the cartilage becomes progressively ossified with age until it appears as bone.

The degree of cartilage ossification was characterized by processing the vertebra images. First, image segmentation was performed to isolate the bones with the cartilage (Hatem & Tan, 2000). Color and spatial features were used for segmentation. The hue in the HSI (hue, saturation, and intensity) color system was found effective in segmenting the cartilage areas while the a* value in the CIE-Lab color system gave good results for segmenting the bones. After a set of morphological operations to refine the segmented cartilage and bone, the two were combined into a bone-cartilage object (Fig. 2), which was then used to characterize cartilage ossification.

The fact that cartilage has a lighter color than bone was used to derive image features. The color values vary along the bone-cartilage object differently for different degrees of ossification or maturity (Hatem & Tan, 1998). Younger animals have more cartilage and thus give a longer segment of light colors along the length of the object. The average hue value was computed along the length of the bone-cartilage object as image features. After normalization, these hue values were used as input vectors to a neural network. The output of the neural network was the maturity score. The network had a two-layer log-sigmoid/log-sigmoid structure. It was trained by using the back-propagation algorithm in the Matlab environment (The Mathworks, Natick, MA). The trained neural network served as both a feature extractor and maturity score predictor. Each set of samples was divided into five sub-sets for training and testing in a rotating fashion. For each rotation, four sub-sets were used for training and the fifth sub-set for testing. The predicted maturity scores were compared with those given by the professional grader and the percentage of correct classification was calculated. The fuzzy concept described in Tan et al. (1998) was incorporated in the use of human scores. The average correct classification rates for the five rotations varied from 55% to 77%.
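
The hue profile used as the network input can be sketched as below. The conversion is the common textbook RGB-to-HSI hue formula, which is an assumption since the paper does not state the formula it used. The choice of image rows as positions along the object's length, the function names and the sample pixels are likewise hypothetical.

```python
import math


def hsi_hue(r, g, b):
    """Hue in degrees (common HSI formula); None for achromatic pixels."""
    num = 0.5 * ((r - g) + (r - b))
    den = math.sqrt((r - g) ** 2 + (r - b) * (g - b))
    if den == 0:
        return None
    theta = math.degrees(math.acos(max(-1.0, min(1.0, num / den))))
    return theta if b <= g else 360.0 - theta


def hue_profile(rows):
    """Average hue of each cross-section (row) along the object's length."""
    profile = []
    for row in rows:
        hues = [h for h in (hsi_hue(*px) for px in row) if h is not None]
        profile.append(sum(hues) / len(hues) if hues else None)
    return profile


# Hypothetical two-row object: a pure red row followed by a pure green row.
obj = [[(255, 0, 0), (255, 0, 0)],
       [(0, 255, 0), (0, 255, 0)]]
profile = hue_profile(obj)
```

Hue is an angle, so a plain arithmetic mean like this one is only meaningful when the hues in a cross-section do not wrap around 0°/360°; a circular mean would be needed otherwise.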

To verify the results from the first set of samples, the same algorithm was applied to the second set of samples. The correct classification rate ranged between 57% and 86% with an average of 75%. This shows that the procedure developed had a certain degree of generality and robustness.

3. Predicting USDA grades

USDA beef quality and yield grades were predicted from image features (Lu, Tan, Gao, & Gerrard, 1998). Beef carcasses (247 for quality grading, 241 for yield grading) of the same maturity were selected in a commercial packing plant during normal production operations. The carcasses were prepared the way they are normally processed in the industry. The quality and yield grades were given by an official USDA grader. Quality was appraised with an 8-point scale (prime, choice, select, standard, commercial, utility, cutter, and canner). Yield was scored with a 5-point scale (1–5).

Digital color images of the ribbed surfaces (steaks) were captured immediately after the official grading under the same lighting conditions. After various steps of image segmentation to obtain the image regions of interest, image features were computed. The color and marbling features are described in Section 2.1.

Fat thickness is an indicator of lean meat yield. Fig. 3 shows the fat area for the sample shown in Fig. 1. The back-fat area was partitioned into two halves: the dorsal part (the upper-left half of the fat area in Fig. 3) and the ventral part (the lower-right half of the fat area in Fig. 3). The thickness was computed in the direction approximately perpendicular to the back curvature (lower boundary of the fat area in Fig. 3). The average thickness of the ventral part and that of the dorsal part were used as fat thickness measures.

Fig. 3. Fat area for the sample shown in Fig. 1.

A method of divergence maximization was developed to maximize the differences among classes by applying linear and nonlinear transforms (Lu & Tan, 1998). Linear transforms were employed for quality classification. Linear, quadratic and cubic transforms were applied for yield classification. The Euclidean distance was used as a similarity measure. Supervised classifiers were trained for both quality and yield classification. The data set was randomly partitioned into 10 sub-sets. Nine of the 10 sub-sets were used for training and the 10th for testing in a rotating manner until all the sub-sets had been used for testing.
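
The rotating train/test procedure with a Euclidean-distance classifier can be sketched as follows. This sketch deliberately omits the divergence-maximizing transform of Lu and Tan (1998) and substitutes a plain nearest-class-mean rule on the raw features; the function names, the fixed random seed and the synthetic two-class data are assumptions for illustration.

```python
import math
import random


def class_means(samples, labels):
    """Mean feature vector of each class."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def classify(x, means):
    """Assign x to the class whose mean is nearest in Euclidean distance."""
    return min(means, key=lambda y: math.dist(x, means[y]))


def rotating_accuracy(samples, labels, k=10, seed=0):
    """Correct-classification rate for each of k rotations."""
    order = list(range(len(samples)))
    random.Random(seed).shuffle(order)        # random partition
    folds = [order[i::k] for i in range(k)]
    rates = []
    for test_idx in folds:
        held_out = set(test_idx)
        train = [i for i in order if i not in held_out]
        means = class_means([samples[i] for i in train],
                            [labels[i] for i in train])
        hits = sum(classify(samples[i], means) == labels[i] for i in test_idx)
        rates.append(hits / len(test_idx))
    return rates


# Synthetic, well-separated two-class data (hypothetical feature values).
samples = ([(0.1 * i, 0.0) for i in range(10)]
           + [(10.0 + 0.1 * i, 5.0) for i in range(10)])
labels = ["select"] * 10 + ["choice"] * 10
rates = rotating_accuracy(samples, labels, k=5)
```

Reporting the rate per rotation, as the text does, shows how much the result depends on which samples happen to be held out.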

The rate of correct quality classification varied with the rotations of the procedure. For three of the 10 rotations, it was 100%; four of them, 90–99%; and the remaining three, 60–70%. The overall rate of correct quality classification was 85.3%. For classification of biological products, which usually involve significant variability and inconsistencies, the results were considered excellent.

With a linear transform, the rate of correct yield classification for the 10 rotations of the procedure was above 50% for all the rotations except two. Applying quadratic and cubic transforms did not bring about significant improvements. The linear transform yielded the best performance. The overall rate of correct classification was 64.2%. This was considered reasonably good, given the fact that the grades from a single grader are usually not very consistent.

4. Predicting tenderness

In commercial production, marbling and color constitute most of the beef quality grade since most animals are young (“A” or “B” maturity). Color and marbling, and thus the quality grade, however, are only weak predictors of the eating quality attributes, of which tenderness is the most important. There is a need to find other quantitative predictors of beef tenderness. Muscle texture was characterized by image processing. Color, marbling and textural features were used to predict beef tenderness measured with Warner–Bratzler shear forces and sensory evaluations.

4.1. Warner–Bratzler shear force

Two hundred sixty-five carcasses were selected to differ in USDA quality grades in a commercial packing plant. The samples were all of “A” maturity. A rib section (posterior end) was removed and vacuum-packaged. This rib section was fabricated into 2.54-cm-thick steaks for Warner–Bratzler shear force measurements. The steaks were cooked on a Hobart Model-CB51 char broiler. During broiling, steaks were turned at 4, 8, 11, and, if necessary, 14 min until they reached a final internal temperature of 70 °C. Temperature was monitored with a thermocouple-based thermometer. After cooking, the steaks were cooled to room temperature (approximately 18 °C). Eight cores of 1.27-cm diameter were removed from each steak parallel to the muscle fibers and sheared with a Warner–Bratzler instrument. The average maximum shear force readings were used in the data analysis and they varied from 1.25 to 5.24 kg (12.25–51.35 N).

Fig. 4. The saturation images of two samples of different tenderness exhibit different image textures. The upper sample is less tender.

Images of the ribbed surfaces were captured in the plant immediately following quality grading. The same exposure and focal distance were used for all the images. The steak images were segmented into muscle, fat and marbling as described previously. The color and marbling features were those described in Section 2.1.

The image texture of beef muscles was considered directly or indirectly related to tenderness. Fig. 4 shows differences in the image textures of beef samples of different tenderness and these differences are measurable by image processing (Li, Tan, & Shatadal, 1999). Image textural features based on pixel value run length and spatial dependence were computed as predictors of tenderness.

A pixel value run is a set of connected pixels having the same or close pixel values. Pixel value runs can be characterized by the pixel value, the length, and the direction of the run. Within a run, the pixel values are within a relatively narrow band, forming a fairly smooth area. The histogram of pixel value band run length is defined as P(R, θ, T), where P stands for probability, R is run length in number of pixels, θ is the run direction on the image plane, and T is the pixel value band thickness or range of pixel values included in a run (Gao & Tan, 1996a, 1996b). Features computed from the pixel value band run length histogram included:

μrT: mean, giving the average run length,
σrT: standard deviation, showing variation of run length, and
MrT: third moment, indicating the imbalance of run length.

Subscript T denotes the band thickness and it ranged from 2 to 7.
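
A rough sketch of the run length features is given below for horizontal runs only. It assumes a run is grown greedily from left to right and ends when the spread of its pixel values (maximum minus minimum) would reach the band thickness; the exact run definition in Gao and Tan (1996a, 1996b) may differ. Function names and the sample row are hypothetical.

```python
def band_run_lengths(row, thickness):
    """Split one row into runs whose pixel values span less than `thickness`."""
    runs, lo, hi, length = [], None, None, 0
    for v in row:
        new_lo = v if lo is None else min(lo, v)
        new_hi = v if hi is None else max(hi, v)
        if length and new_hi - new_lo >= thickness:
            runs.append(length)              # close the current run
            lo, hi, length = v, v, 1         # start a new run at this pixel
        else:
            lo, hi, length = new_lo, new_hi, length + 1
    if length:
        runs.append(length)
    return runs


def run_length_features(image, thickness):
    """Mean, standard deviation and third central moment of run lengths."""
    runs = [n for row in image for n in band_run_lengths(row, thickness)]
    m = sum(runs) / len(runs)
    var = sum((n - m) ** 2 for n in runs) / len(runs)
    m3 = sum((n - m) ** 3 for n in runs) / len(runs)
    return m, var ** 0.5, m3


# Hypothetical one-row image; with thickness 2 the runs have lengths 3, 2, 1.
texture = [[10, 11, 10, 20, 21, 30]]
mean_run, std_run, m3_run = run_length_features(texture, thickness=2)
```

Longer average runs correspond to smoother image regions, which is why the mean run length serves as a coarseness measure.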

Pixel value spatial dependence can be described by a matrix P. Entry P(i, j, d, α) is the normalized frequency at which two pixels d pixels apart in direction α have pixel values i and j respectively. A general procedure can be found in Haralick, Shanmugam, and Dinstein (1973) for extracting textural properties from this matrix and 14 texture features are suggested. The first 13 features were used in the research.

Since the images were roughly isotropic, only direction α = 0 (horizontal) was used. Distance d was chosen from 2 to 7. A method based on the auto-correlation function was used to determine the appropriate distance (Li et al., 1998).
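
The spatial dependence matrix for the horizontal direction can be sketched as follows, together with two of the Haralick features (angular second moment and contrast). The sketch assumes pixel values are already quantized to integers in the range 0 to levels-1 and counts each pair in both orders so that the matrix is symmetric; function names and the sample image are hypothetical.

```python
def cooccurrence(image, d, levels):
    """Normalized horizontal co-occurrence matrix for pixel distance d."""
    counts = [[0] * levels for _ in range(levels)]
    total = 0
    for row in image:
        for x in range(len(row) - d):
            i, j = row[x], row[x + d]
            counts[i][j] += 1
            counts[j][i] += 1                # count both orders (symmetric)
            total += 2
    return [[c / total for c in r] for r in counts]


def angular_second_moment(p):
    """Sum of squared entries; larger for more uniform textures."""
    return sum(v * v for r in p for v in r)


def contrast(p):
    """Entries weighted by the squared difference of the two pixel values."""
    return sum((i - j) ** 2 * v
               for i, r in enumerate(p) for j, v in enumerate(r))


# Hypothetical two-level image with one vertical edge.
patch = [[0, 0, 1, 1],
         [0, 0, 1, 1]]
p = cooccurrence(patch, d=1, levels=2)
asm = angular_second_moment(p)
con = contrast(p)
```

Repeating the computation for each candidate distance d gives one feature set per distance, from which the most informative distance can then be chosen.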

When d = 3, the texture features had the highest correlation with shear force and they were selected for subsequent analyses. Principal component regression (PCR), which is principal component analysis (PCA) followed by MLR, and partial least squares (PLS) were performed to test the improvement in shear force prediction after adding the texture features to the color and marbling features. The SAS Stepwise procedure was performed to select the significant variables. Discriminant analysis was also used to classify the beef samples.

Predictions of shear force from color and marbling characteristics turned out to be very poor. When the official color and marbling scores were used as predictor variables, the shear force regression R²-value was less than 0.05. When the color and marbling image features were used, R² = 0.16.

Muscle image textural features were combined with the color and marbling features to improve the prediction of shear force. Three methods were used to analyze the usefulness of textural features: PCR, PLS and discriminant analysis.

PCR gave a model with an R²-value of 0.18 while PLS yielded a model with an R²-value of 0.34. These results are still very poor, but the PLS analysis showed that adding the textural features significantly improved the shear force prediction over using color and marbling features only (R² = 0.16).

The beef samples were segregated into three categories based on shear force values (≤1.71 kg, 1.71–3.09 kg, and ≥3.09 kg). One hundred ninety-eight samples were used as calibration data and 45 samples were used as test data. The SAS Discriminant procedure with a linear discriminant function was used to classify the samples. The calibration samples could be classified with 76% accuracy and the test samples with 77% accuracy.

Although it may be possible to improve the models for shear force prediction from color, marbling and textural features, the results obtained were poor. This may indicate that color, marbling and muscle image texture do not contain sufficient information to define cooked-meat shear force. Nevertheless, the inclusion of textural features brought about significant improvement. The image texture of muscles is at least as significant an indicator of the mechanical properties of beef muscles as color and marbling (Li et al., 1999).

4.2. Sensory tenderness

Beef samples were cut from the short loins of pasture-finished steers and feedlot-finished steers of the same maturity grade. Two sets of short strip loins were taken from the carcass samples, each having 97 pieces. One set of the samples was used for sensory tenderness evaluation and the other for image analysis.

A 10-person panel was trained for sensory evaluation. The steak samples were cooked individually by broiling to an internal temperature of 68 °C. The cooked samples were then cut into small pieces and evaluated, while warm, by the trained panel for tenderness. A 16.4-cm line scale was used for tenderness scoring.

Images of the beef samples were acquired individually in a chamber with a uniform black background and warm-white-deluxe fluorescent lighting. The same exposure and focal distance were used for all the samples. The images were processed and the features described in the last section were computed. The features computed were first screened for collinearity. By using MLR, the run-length features for band thickness T = 2 and the spatial dependence features for distance d = 1 were selected. Finally, 37 features were selected for further analysis.

The beef samples (97) were divided into a training sub-set (72) and a test sub-set (25). PCR and PLS were performed to test the improvement in tenderness prediction resulting from adding texture features. The SAS Stepwise procedure was performed to select the variables significant for tenderness prediction. A neural network model was also developed.

All the samples were used in PCR. The R²-value for the regression model increased from 0.30 with the color and marbling features alone to 0.72 after adding the texture features. This improvement indicated that the texture features made a significant contribution to beef tenderness prediction.

In the PLS analysis, the first 14 factors explained most of the variations and were used for regression. The R²-values for the training data set and test data set were 0.35 and 0.17, respectively, when only the color and marbling features were used. They increased to 0.70 and 0.62, respectively, after adding the texture features. The increase in R²-value again verified the usefulness of the texture features as indicators of beef tenderness.
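For readers comparing the training and test figures, the following minimal sketch shows one common way an R²-value is computed from observed and predicted scores. It assumes the coefficient-of-determination definition, 1 − SS_res/SS_tot; the source does not state whether the reported test-set values follow this definition or the squared correlation. The function name and the numbers are illustrative only.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    return 1.0 - ss_res / ss_tot


# Made-up tenderness scores and model predictions.
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]
r2 = r_squared(observed, predicted)
```

Evaluating the same function separately on the training sub-set and the held-out test sub-set gives the kind of paired values reported above.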

The 14 factors from the PLS analysis were used as inputs and the tenderness scores were used as the output for a neural network with one hidden layer. The back-propagation algorithm was used to train the neural network. After the network was trained, the test data set was used to test the model. A prediction R²-value of 0.70 was obtained, a result similar to those from PCR and PLS (Li et al., 1999).

In a further study (Li et al., 2001), 59 crossbred steers were used. The sample preparation and sensory evaluation were conducted in the same manner as described above. The samples were then segregated into "tough" (tenderness score <8) and "tender" (otherwise) groups. An area of about 250 mm² of the LD muscle was imaged for each sample. Each color image (480 × 512 pixels) was divided into several sub-images of 64 × 64 pixels. Forty-five of these smaller images from the tough group and 45 from the tender group were randomly selected for further analysis. The tenderness scores for the tough samples ranged from 4.17 to 6.67 and those for the tender samples varied from 8.54 to 11.99.
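The sub-image extraction can be sketched as follows. This is an illustration under stated assumptions: the tiles are taken as non-overlapping, partial tiles at the image border are discarded, and the helper names and the fixed random seed are hypothetical. The source does not describe how the sub-images were positioned.

```python
import random


def tile_origins(height, width, tile=64):
    """Top-left (row, col) of every full, non-overlapping tile."""
    return [(r, c)
            for r in range(0, height - tile + 1, tile)
            for c in range(0, width - tile + 1, tile)]


def crop(image, origin, tile=64):
    """Cut one tile out of an image stored as a list of pixel rows."""
    r0, c0 = origin
    return [row[c0:c0 + tile] for row in image[r0:r0 + tile]]


origins = tile_origins(480, 512)           # 7 x 8 full tiles
image = [[0] * 512 for _ in range(480)]    # placeholder image
patch = crop(image, origins[-1])

rng = random.Random(0)                     # fixed seed for repeatability
selected = rng.sample(origins, 5)          # random choice without replacement
```

Under these assumptions a 480 × 512 image yields 56 full tiles; the 45 sub-images per group in the study were drawn at random from the pooled sub-images of each group.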

A wavelet-based method was developed to decompose a texture image into textural primitives of different sizes (Li et al., 1999). In this research, the textural primitives were rectangles. Each image (the saturation function was used) was decomposed into 35 different primitives. The degree of presence of each primitive type was measured by the percentage of the image area occupied by the primitive, which was referred to as the primitive fraction. Analysis of variance and correlation analysis reduced the 35 primitive fractions to 10, which were significantly different between the tough and tender groups and were not significantly correlated with one another.

The 10 primitive fractions were used for multivariate classification based on Fisher's linear discriminant (SAS, 1996). The leave-one-out scheme was used, in which 89 of the 90 samples were used to train the Fisher's linear discriminant and the remaining one was used for testing. This scheme was repeated rotationally for all 90 samples. Out of the 45 tough and 45 tender samples, 38 tough and 37 tender samples were correctly classified, giving an overall correct classification rate of 83.3%. Instead of Fisher's linear discriminant, a neural network classifier was trained and used in the same rotational fashion, which yielded an overall correct classification rate of 82.3%. The nonlinear neural network classifier showed little advantage over the linear Fisher classifier.

Fig. 5. A near-infrared beefsteak image taken at a wavelength of 850 nm.
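The leave-one-out scheme can be sketched as below. To keep the example short and dependency-free, a nearest-centroid classifier is substituted for Fisher's linear discriminant; it is not the method used in the study, and the toy feature values are made up. The last line reproduces the reported rate from the counts given in the text.

```python
def fit_centroids(features, labels):
    """Mean feature vector of each class (a stand-in classifier)."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def predict(centroids, x):
    """Label of the centroid closest to x in squared Euclidean distance."""
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(x, center))
    return min(centroids, key=lambda y: sq_dist(centroids[y]))


def leave_one_out_rate(features, labels):
    """Train on all samples but one, test on the held-out one, rotate."""
    correct = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = fit_centroids(train_x, train_y)
        correct += predict(model, features[i]) == labels[i]
    return correct / len(features)


# Toy two-feature data, well separated by construction.
features = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
            [1.0, 0.9], [0.9, 1.1], [1.1, 1.0]]
labels = ["tough"] * 3 + ["tender"] * 3
toy_rate = leave_one_out_rate(features, labels)

reported_rate = (38 + 37) / 90   # counts quoted in the text
```

Because every sample is held out exactly once, the rate is an estimate of performance on unseen samples rather than a fit to the training data.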

Near-infrared (NIR) spectral features have also been studied for beef tenderness prediction (Hatem, Tan, & Shatadal, 1999). Forty steers were slaughtered and graded. Short loins were removed for sensory tenderness evaluation by a trained panel as described previously and for NIR imaging. Steak images were captured with an NIR camera and a tunable NIR filter for 20 different wavelengths from 650 to 1030 nm at 20-nm increments. Fig. 5 shows an image taken at 850-nm wavelength. The image intensities were adjusted according to the spectral characteristics of the camera and the filter.

Areas of muscle, fat and marbling were segmented and a 20-point spectrum was computed for the muscle, fat and marbling of every sample. Different combinations of these spectra were used as inputs to train neural network models for tenderness prediction. The most useful combination found was the difference between the muscle and fat spectra. The sensory tenderness scale was from 0 to 16.4. If a prediction within unity from the average sensory panel score was considered correct, the correct prediction rate was 67%. Over 90% of the predictions were within one standard deviation from the averages of the 10-member panel scores.
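The wavelength grid, the muscle-minus-fat difference spectrum and the "within unity" criterion can be illustrated with a short sketch. The function names and example scores are hypothetical, and treating a deviation of exactly 1.0 as correct is an assumption.

```python
# 650 to 1030 nm inclusive at 20-nm increments gives 20 wavelengths.
wavelengths_nm = list(range(650, 1031, 20))


def difference_spectrum(muscle, fat):
    """Point-by-point muscle-minus-fat spectrum used as a model input."""
    return [m - f for m, f in zip(muscle, fat)]


def within_tolerance_rate(predicted, panel_mean, tolerance=1.0):
    """Share of predictions within `tolerance` of the panel average."""
    hits = sum(abs(p - m) <= tolerance
               for p, m in zip(predicted, panel_mean))
    return hits / len(predicted)


# Made-up predicted and panel-average tenderness scores.
rate = within_tolerance_rate([8.0, 10.5, 6.0], [8.6, 9.0, 6.9])
```

Replacing the fixed tolerance with each sample's panel standard deviation would give the second criterion mentioned above.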

5. Remaining challenges and opportunities

Great strides have been made towards computer vision-based evaluation of meat. To make this technology practically useful, several challenges still remain. At the same time, there are many research opportunities of great potential.

5.1. Image segmentation

System robustness, real-time capability, sample handling, and standardization are among the issues that remain to be addressed. The last three issues will require further research and development, but there do not seem to be insurmountable difficulties. System robustness or reliability, however, entails further in-depth research. It is a major challenge to design a system that has sufficient flexibility and adaptability to handle the biological variations in meat products. At the core of this issue is image segmentation. Reliably and consistently segmenting a meat image into parts of interest without human intervention is a prerequisite to the success of all subsequent operations and thereby to the eventual success of computer vision-based meat grading.

Given the complex nature of meat images, no existing algorithms are totally effective for meat image segmentation. There have been successes in employing multivariate and nonlinear approaches (Gao et al., 1995; Lu & Tan, 1998). Our recent work has focused on fundamental methodology development in multivariate classification that will lead to unsupervised image segmentation algorithms effective for meat image processing (Lu, 2002). Unsupervised techniques and algorithms with self-learning capability appear to be the key to robust meat image segmentation. Basic methodology development is needed.

5.2. Power of quality indicators

As discussed earlier, the conventional quality indicators of color, marbling and maturity lead to poor predictions of important quality measures such as tenderness. There are both needs and opportunities to empower the quality grading system by discovering new measurable fresh-meat characteristics that are predictors of cooked-meat quality. The ability of image processing to characterize complex and subtle differences provides a range of possibilities. Our efforts have focused on image texture and NIR characteristics. Many more possibilities remain unexplored. In particular, images captured with different wavelengths and even different modalities have the potential to reveal additional quality information.


6. Conclusions

Results from several applications show that color image processing is a useful technique for meat quality evaluation. Quality attributes such as muscle color, marbling, maturity and muscle texture can be effectively quantified and characterized. USDA quality and yield grades and cooked-beef tenderness can be predicted to a satisfactory accuracy. Computer vision is a promising technology for objective meat quality grading. The existing research has formed a foundation so that a computerized grading assistant can be implemented. This assistant can provide a human grader with quantitative information unobtainable subjectively. To replace the human graders, however, much work remains. Among the issues requiring continued research, effective methodologies for consistent meat image segmentation are at the top of the list. This will entail basic research on image segmentation and classification. There are also needs and opportunities for discovering new quality indicators. In particular, imaging with different wavelengths and modalities holds promise.

References

AMSA (1991). American Meat Science Association Committee on

guidelines for meat color evaluation. Contribution no. 91-545-A,

AMSA, Savoy, IL, USA.

Chen, Y. R., McDonald, T. P., & Crouse, J. D. (1989). Determining

percent intra-muscular fat on ribeye surface by image processing.

1989 ASAE Annual International Meeting, Paper no. 893009,

ASAE, St. Joseph, MI, USA.

Chen, Y. R., Nguyen, M., & Park, B. (1995). An image processing

algorithm for separation of fat and lean tissues on beef cut surface.

1995 ASAE Annual International Meeting, St. Joseph, MI, USA.

Cross, H. R., Gilliland, D. A., Durland, P. R., & Seideman, S. (1983).

Beef carcass evaluation by use of a video image analysis system.

Journal of Animal Science, 57(4), 910–917.

Gao, X., & Tan, J. (1996a). Analysis of expanded-food texture by

image processing, part I: geometric properties. Journal of Food

Process Engineering, 19(4), 425–444.

Gao, X., & Tan, J. (1996b). Analysis of expanded-food texture by

image processing, part II: mechanical properties. Journal of Food

Process Engineering, 19(4), 445–456.

Gao, X., Tan, J., & Gerrard, D. E. (1995). Image segmentation in 3-

dimensional color space. 1995 ASAE Annual International Meeting,

Paper no. 953607, ASAE, St. Joseph, MI, USA.

Gerrard, D. E., Gao, X., & Tan, J. (1996). Determining beef marbling

and color scores by image processing. Journal of Food Science,

61(1), 145–148.

Haralick, R. M., Shanmugam, K., & Dinstein, I. (1973). Textural features for image classification. IEEE Transactions on Systems, Man, and Cybernetics, SMC-3(6), 610–621.

Hatem, I., & Tan, J. (1998). Determination of skeletal maturity by

image processing. 1998 ASAE Annual International Meeting, Paper

no. 983019, ASAE, St. Joseph, MI, USA.

Hatem, I., & Tan, J. (2000). Cartilage segmentation in vertebra images.

2000 ASAE Annual International Meeting, Paper no. 003125,

ASAE, St. Joseph, MI, USA.

Hatem, I., Tan, J., & Shatadal, P. (1999). Beef quality prediction by

using near-infrared image features. 1999 ASAE Annual Interna-

tional Meeting, Paper no. 993159, ASAE, St. Joseph, MI, USA.

Lenhert, D. H., & Gilliland, D. A. (1985). The design and testing of an

automated beef grader. 1985 ASAE International Meeting, Paper

no. 853035, ASAE, St. Joseph, MI, USA.

Li, J., Tan, J., Gao, X., Lu, J., Gerrard, D. E., Smith, G. C., Tatum, J.

D., & George, M. H. (1998). Image texture features as indicators of

beef muscle mechanical properties. 1998 ASAE Mid-Central

Conference, Paper no. MC98130, ASAE, St. Joseph, MI, USA.

Li, J., Tan, J., Martz, F., & Heymann, H. (1999). Image texture

features as indicators of beef tenderness. Journal of Meat Science,

53, 17–22.

Li, J., Tan, J., & Shatadal, P. (1999). Discrimination of beef images by

texture features. 1999 ASAE Annual International Meeting, Paper

no. 993158, ASAE, St. Joseph, MI, USA.

Li, J., Tan, J., & Shatadal, P. (2001). Classification of tough and tender

beef by image texture analysis. Journal of Meat Science, 57, 341–

346.

Lu, J. (2002). Transforms for multivariate classification and application

to tissue image segmentation. PhD Dissertation, University of

Missouri, Columbia, MO, USA.

Lu, J., & Tan, J. (1998). Application of image segmentation to meat

image processing. 1998 ASAE Annual International Meeting, Paper

no. 983016, ASAE, St. Joseph, MI, USA.

Lu, J., Tan, J., Gao, X., & Gerrard, D. E. (1998). USDA beef

classification based on image processing. 1998 ASAE Mid-Central

Conference, Paper no. MC98131, ASAE, St. Joseph, MI, USA.

Lu, J., Tan, J., & Gerrard, D. E. (1997). Pork quality evaluation by

image processing. 1997 ASAE Annual International Meeting,

Paper no. 973125, ASAE, St. Joseph, MI, USA.

Lu, J., Tan, J., Shatadal, P., & Gerrard, D. E. (2000). Evaluation of

pork color by using computer vision. Journal of Meat Science, 56,

563–566.

McDonald, T. P., & Chen, Y. R. (1990a). Application of morpho-

logical image processing in agriculture. Transactions of the ASAE,

33(4), 1345–1352.

McDonald, T. P., & Chen, Y. R. (1990b). Separating connected muscle

tissues in images of beef carcass ribeyes. Transactions of the ASAE,

33(6), 2059–2065.

McDonald, T. P., & Chen, Y. R. (1991). Visual characterization of

marbling in beef ribeyes and its relationship to taste parameters.

Transactions of the ASAE, 34(6), 2054–2499.

McDonald, T. P., & Chen, Y. R. (1992). A geometric model of

marbling in beef longissimus dorsi. Transactions of the ASAE,

35(3), 1057–1062.

SAS. (1996). The SAS System for Windows (Release 6.11). SAS, Cary,

NC, USA.

Shackelford, S. D., Wheeler, T. L., & Koohmaraie, M. (1995).

Relationship between shear force and trained sensory panel

tenderness ratings of 10 major muscles from Bos indicus and Bos taurus cattle. Journal of Animal Science, 73, 3333–3340.

Tan, J., Gao, X., & Gerrard, D. E. (1998). Application of fuzzy sets

and neural networks in sensory analysis. Journal of Sensory

Studies, 14, 119–138.

Wassenberg, R. L., Allen, D. M., & Kemp, K. E. (1986). Video image

analysis prediction of total kilograms and percent primal lean and

fat yield of beef carcasses. Journal of Animal Science, 62(6), 1609–

1616.