CHAPTER 2
A METHODOLOGY FOR TEXT-BASED MEDICINAL
PLANTS’ INFORMATION RETRIEVAL
In Chapter 1, we introduced certain applications of computers
connected to the research work carried out. The literature survey
revealed that, in spite of the use of computers in fields such as
agriculture, horticulture and forestry, no standard databases of
medicinal plant images exist. India has a rich treasure of medicinal
plants spread across the country, and these plants have
disease-healing properties. Hence, this chapter deals with the
development of a methodology for text-based information retrieval
from a medicinal plants database that we have developed.
2.1 NECESSITY OF MEDICINAL PLANT’S DATABASE
At present, there is a renewed interest in the traditional system of
medicine known as Ayurveda. India has a rich repository of medicinal
plants. Proper care is needed to conserve, domesticate and use these
plants. Efforts are under way to improve Ayurveda and naturopathy as
alternative systems of medicine. As the proverb says, prevention is
better than cure. To keep ourselves healthy and to prevent diseases,
knowledge of medicinal plants is necessary. Documentation on
medicinal plants exists in the form of books and palm leaves. There
also exist some online databases with free access. The data available
in these media is inadequate. A holistic and systematic approach is
essential to promote the productivity of medicinal plants.
Knowledge of Indian medicinal plants is required by the common
man in view of prevailing home remedies. It is difficult, if not
impossible, for novice users to identify, classify and use Indian
medicinal plants. Knowledge of the botanical and colloquial names of
medicinal plants is essential. Even though medicinal plants have been
catalogued over the years, there is still a serious deficiency in
terms of conservation, establishment of natural preserves, and
location and protection of rare species. There is no database for
wide usage at present. Advanced technology needs to be leveraged to
enable easy, rapid identification and retrieval of medicinal plants.
In this connection, we have developed a database and proposed
a methodology for information retrieval from it. The work is useful
to common people, Ayurveda practitioners, researchers and other
agencies involved in the use of medicinal plants. The design of a
database depends on the data content, access methods and user
requirements. We have obtained the basic authentic data from P.K.
Warrier [99] for the development of the database. We have also
carried out a literature survey to learn the state of the art in
medicinal plant databases.
2.2 RELATED WORK
Following is the gist of the literature survey in the area of
application of databases in different fields.
Carsten Kettner et al. have created a relational database system,
MEDPHYT, which contains complete information on European
pharmaceutical and toxicological plants. The database contains the
details of plant botanical characteristics, therapeutic use,
etymology and synonyms. Apart from the botanical characterization, it
holds information on medically relevant biochemical compounds and
their physicochemical characteristics, as well as toxicological and
pharmaceutical facts. The user queries for the required data using
the Boolean AND operator. The information is retrieved by matching
text strings on botanical name, biochemy and toxication [76].
Srikanta Bedathur et al. have created a database system called
BODHI (Bio-diversity Object Database arcHItecture). The database is
specifically designed to cater to the special needs of biodiversity
applications. BODHI hosts purely plant-related data. An
object-oriented database is constructed to process queries across
multiple domains, such as taxonomy characteristics, spatial
distribution and genomic sequences [119].
Multi-faceted information on the medicinal plants of India has been
aggregated by the Foundation for Revitalization of Local Health
Traditions (FRLHT), Bangalore, India, in the form of computerized
databases, specialized reports, information products, websites and
trade bulletins (http://www.frlht.org.in/ - Encyclopedia of medicinal
plants) [52].
A plant conservation software called “PlantCon” is proposed (Central
Institute of Medicinal and Aromatic Plants, http://www.cimap.res.in)
[51].
Andres has proposed an approach to retrieve plant images using
fuzzy concepts such as fuzzy subset theory and fuzzy thesauri. A few
electronic databases, such as MAPA, CAB Abstracts, AGRIS, AGRICOLA,
PASCAL, MEDLINE, EMBASE and APINMAP, are presented in this work.
For database retrieval, suitable query processing and access
mechanisms are also necessary to achieve acceptable time and space
complexity. Hence, to achieve optimal query processing on a medicinal
plants database, the authors have presented some literature on
database access approaches [6].
Cao and Badia have proposed an approach for representing a query
in the form of a graph model, in which each node represents an
operation and each edge carries a weight that gives the cost. Query
transformation algorithms are provided for rewriting nested SQL
queries into equivalent flat queries, which are processed more
efficiently. Algebraic optimization rules are used. The selection and
projection operations are pushed down, and the join operation is
executed so as to reduce the number of tuples. The method is efficient
and takes less time as the sizes of the relations increase [18].
Ceri and Gottlob have developed a translator that transforms SQL
queries into relational algebra with aggregate functions. The SQL
queries are written in the form of relational algebra, and query
optimizations are carried out using aggregate functions [22].
Hellerstein [50] has introduced the use of rank parameters for each
operation based on selectivity and cost per tuple. A heuristic
algorithm is proposed for selecting the tuples for execution with
lower rank having a smaller number of attributes.
From the literature, it is observed that the few existing plant
databases give information only on plant names, chemical composition
and biodiversity. We have also found some medicinal plant data in the
form of books and websites. However, a standard Indian medicinal
plants database is not available to researchers, practitioners and
common users. Further, it is also observed that very little
technological effort has gone into this area. Hence, we have designed
a database for medicinal plants, keeping in mind the need for the
medicinal values of the plants and image-based information retrieval.
We have also proposed a methodology for efficient information
retrieval from the database of medicinal plants.
2.3 DESIGN OF DATABASE SCHEMA
Over the last decade, the database research community has been
actively involved in building various ways of managing data, such as
genetic data, criminal data, demographic data and the like. No data
management work is cited on Indian medicinal plants. A medicinal
plant is identified by its morphological characteristics. Each plant
has certain biochemical content for which it is drawn upon for
therapeutic use. Such information is widely used by people in the
treatment of diseases. Hence, we have listed plants with the features
of their parts and their medicinal usage. Figure 2.1 shows the major
available features of medicinal plants.
Figure 2.1: Medicinal plant properties (diagram of the entity Plant
and its properties: Leaf arrangement, code, MedicinalProperty, Family
code, Leaf shape, Plant picture, Local name, Seed Shape, Part,
Blooming Type, Flower position, Reproduction, Bark Type, Tree Type,
Leaf Composition, FruitType, Type, Medicinal PropertyType)
A medicinal plant is identified and classified based on these
features. Sometimes, however, the identification of medicinal plants
becomes a problem due to overlapping features. Hence, for proper
identification of plants, unique discriminating features need to be
devised. In botany, a taxonomy expert classifies plants based on many
criteria. Plants having common characteristics are grouped into a
class. Therefore, to classify plants, it is necessary to identify the
characteristics of the given group. One of the oldest and most
commonly used methods of grouping plants is based on physical or
morphological characteristics. These characteristics are shown in
Table 2.1. A plant is identified by its height, type, flowers and
leaves.
Table 2.1: Database Schema PLANTS for Medicinal Plants
Columns: Sl No, Coll No, Hindi name, Kannada name, English name,
Scientific name, Distribution, About_plant, Parts_used, Property_used

Sl No: 1
Coll No: AVS 2358
Hindi name: Tikhor
Kannada name: Kuvehittu, Tavaksiri
English name: Arrowroot
Scientific name: Maranta arundinacea Linn.
Distribution: Cultivated throughout India
About_plant: Herb, 90-180 cm high; leaves ovate-oblong to ovate,
  lanceolate, base rounded or cuneate, tip acute; flowers white in
  clusters, fertile stamen with appendage, ovary one-celled,
  one-ovuled.
Parts_used: Underground rhizome
Property_used: Starch of rhizome refrigerant, tonic, cough,
  aphrodisiac, diarrhea, dyspepsia, bronchitis, a nourishing food for
  infants, invalids, convalescents. As ingredient in biscuits, cakes,
  puddings, jellies, face powder.

Sl No: 2
Coll No: AVS 1174
Hindi name: Jasum, Jasut, Java
Kannada name: Dasavala
English name: Shoe-flower plant, hibiscus
Scientific name: Hibiscus rosa-sinensis Linn. Malvaceae
Distribution: Throughout India, cultivated
About_plant: An evergreen shrub, with pale grey or whitish bark.
  Leaves simple, bright green, ovate, entire, serrate towards the
  top, minute hairs beneath; flowers solitary and axillary, pedicels
  jointed, with pistil and stamens projecting from the centre,
  anthers reniform or kidney shaped; 1-celled.
Parts_used: roots, leaves, flowers
Property_used: The roots are sweetish, useful in cough, venereal
  diseases, menorrhagia, pruritus and fever. The leaves are useful in
  burning sensation, hepatopathy, fatigue, abscesses, expulsion of
  the placenta, skin diseases, fever, constipation and pruritus. The
  flowers are used as brain tonic, constipating, urinary astringent
  and cardiotonic. Useful in kapha and pitta, boils, inflammations,
  epilepsy, cerebropathy, dysentery, haemorrhoids, urethrorrhea,
  diabetes, cardiac debility, haemoptysis.

Sl No: 3
Coll No: AVS 2516
Hindi name: Nim, Nimb
Kannada name: Huccabev, Cikkabevu
English name: Neem tree
Scientific name: Azadirachta indica
Distribution: Throughout India, in deciduous forests, also widely
  cultivated
About_plant: A medium to large sized tree, 15-20 m in height and 7 mt
  having grayish to dark grey tubercled bark; leaves compound,
  imparipinnate, leaflets opposite, elliptic, very oblique at base,
  acute tip; flowers cream or yellowish white in axillary panicles,
  staminal cylindric, widening above, 9-10 lobed at the apex; fruits
  1-seeded drupes, woody endocarp, greenish yellow when ripe, seeds
  ellipsoid.
Parts_used: bark, leaves, flowers, seeds, oil
Property_used: The bark is astringent, acrid, antiperiodic,
  demulcent, insecticidal, liver tonic, expectorant, urinary
  astringent, anthelmintic, pectoral and tonic. Used in vitiated
  conditions of pitta, hyperdipsia, leprosy, skin diseases, eczema,
  leucoderma, pruritus, intermittent and malarial fevers, wounds,
  ulcers, burning sensation, tumour, uterine stimulant and urinary
  astringent. They are useful in tumours, leprosy, skin diseases,
  odontalgia, intestinal worms, haemorrhoids, pulmonary tuberculosis,
  ophthalmopathy, wounds.

: : : : : : : : : : :
Table 2.1 (continued)
Sl No: 500
Coll No: AVS 2428
Hindi name: Lobiya Santa
Kannada name: Alasandi
English name: Cow pea
Scientific name: Vigna unguiculata
Distribution: Throughout India, cultivated
About_plant: A herb; leaves pinnately 3-foliate, leaflets 7.5-15 cm
  long, broadly or narrowly ovate, often rhomboidal, entire or
  slightly lobed; flowers white, pale violet or purple with a yellow
  eye, fruits pods, up to 90 cm long, 10-20 seeded; seeds vary in
  size, shape and colour.
Parts_used: Seeds
Property_used: The seeds are sweet, astringent, appetiser, laxative,
  anthelmintic, anaphrodisiac, diuretic, galactagogue and liver
  tonic. They are useful in vitiated conditions of kapha and pitta,
  anorexia, constipation, helminthiasis, strangury, agalactia,
  jaundice and general debility.

The morphological characteristics include the arrangement of parts
within a flower, arrangements of groups of flowers, leaf shapes, vein
patterns, shapes of leaf bases and apices, margins, arrangements of
leaflets, stem types, fruit shapes, sap color, smell of flowers, etc.
We have designed a database for medicinal plants containing
the text, image features and information on medicinal plants in the
form of a relational database. The MS-Access format is used for data
storage, and access is provided through an ASP.net interface. The
text database comprises 500 medicinal plant species, of which 164 are
herbs, 125 are shrubs and 159 are trees. The database also includes
52 plant species of climbers and creepers. The features of climbers
and creepers are not considered for image-based retrieval. We have
collected images of 174 plant species from forests and parks, of
which 32 are herbs, 80 are shrubs and 62 are trees. However, a few
plant species have an irregular plant structure and an uneven
background. From the images, it is observed that these plant species
are not suitable for the developed feature extraction. Hence, we have
considered images of 30 plant species of each class (herbs, shrubs
and trees) for the work. The leaf images of medicinal plant species
are used to investigate the leaf characteristics. The leaf samples of
certain species have varying shapes and margin features. Sample
images of plant species are shown in Figure 2.2.
Figure 2.2: Sample plant species in the database (a) Ocimum basilicum
(b) Costus speciosa (c) Solanum melongena (d) Cycas circinalis
(e) Hibiscus rosa-sinensis (f) Ixora coccinia (g) Cocos nucifera
(h) Mangifera indica (i) Nerium indica (j) Prunus domestica
The master database, named PLANTS, contains properties such as SlNo
(serial number), CollNo (collection number), the plant names in
Hindi, Kannada and English, Distribution, About_plant, Parts_used and
Property_used. Hindi, Kannada and English are the national language,
the regional language of Karnataka and an international language,
respectively. The property About_plant includes the description of
the leaf, flower, fruit and other properties required to describe a
particular plant. SlNo is used as the primary key. The collection
numbers are drawn from the book written by P.K. Warrier et al.
(2003). In addition, the scientific names associated with the
medicinal plants are also kept as part of the database for unique
identification.
The global schema of the medicinal plant database, PLANTS, is
represented by the attributes Slno, Collno, Hindiname, Kanadaname,
Englishname, Scietificname, Distribution, About_plant, Parts_used and
Property_used. This schema contains a few anomalies, because of which
information cannot be inferred for further sub-level access
operations. Hence, the database is normalized.
2.3.1 Normalization of Databases
The database (PLANTS) is not in 1NF (first normal form), as it does
not satisfy the atomicity requirement. The schema contains the
multi-valued attribute about_plant (Mark I. Hwand et al., 2001). The
attribute about_plant is composite and holds more than one data item.
Hence, it is decomposed to satisfy atomicity and uniqueness, as shown
in Table 2.2. However, this approach produces a larger number of
duplicate values, which increases redundancy. Moreover, the 1NF
relation contains many null values when a plant lacks attributes such
as flower or fruit. Hence, we have formed separate subschemas, namely
PLANT_BODY, LEAVES, FLOWERS and FRUIT_SEEDS. These are
self-descriptive and contain atomic values, as shown in Tables 2.2 to
2.6. The tables contain no repeating groups, and each row is uniquely
identified by the serial number and the collection number, which
together form a concatenated key.
Table 2.2: PLANTS database schema in 1NF with atomic values
Columns: Sl No, Coll No, Hindi name, Kannada name, English name,
Scientific name, Distribution, About_plant, Parts_used, Property_used

Sl No: 3
Coll No: AVS 2516
Hindi name: Nim, Nimb
Kannada name: Huccabev, Cikkabevu
English name: Neem tree
Scientific name: Azadirachta indica
Distribution: Throughout india, …
About_plant: medium to large sized tree, 15-20 mt in height and 7 mt
  grayish bark, tubercled bark
Parts_used: bark, leaves, flowers, seeds, …
Property_used: The bark is astringent, acrid, antiperiodic, …

Sl No: 3
Coll No: AVS 2516
Hindi name: Nim, Nimb
Kannada name: Huccabev, Cikkabevu
English name: Neem tree
Scientific name: Azadirachta indica
Distribution: Throughout india, …
About_plant: leaves compound, imparipinnate, leaflets opposite,
  elliptic, oblique at base, acute tip
Parts_used: bark, leaves, flowers, seeds, …
Property_used: The bark is astringent, …, tonic

Sl No: 3
Coll No: AVS 2516
Hindi name: Nim, Nimb
Kannada name: Huccabev, Cikkabevu
English name: Neem tree
Scientific name: Azadirachta indica
Distribution: Throughout india, deciduous forests, …
About_plant: flowers cream or yellowish white in axillary panicles,
  staminal cylindric, widening above, 9-10 lobed at the apex
Parts_used: bark, leaves, flowers, seeds, …
Property_used: The bark is astringent, …, tonic

Sl No: 3
Coll No: AVS 2516
Hindi name: Nim, Nimb
Kannada name: Huccabev, Cikkabevu
English name: Neem tree
Scientific name: Azadirachta indica
Distribution: Throughout india, deciduous forests, …
About_plant: fruits 1-seeded drupes, woody endocarp, greenish yellow
  when ripe, seeds ellipsoid
Parts_used: bark, leaves, flowers, seeds, …
Property_used: The bark is astringent, …, tonic
Table 2.3: Decomposing PLANTS into 1NF Relation PLANT_BODY
SlNo | CollNo | Type | Height_from | Height_To | Stem | Branching | Roots
3 | AVS 2516 | Evergreen Tree | 15mt | 20mt | 7 mt having grayish bark grey, tubercled bark, …. | - | -
: | : | : | : | : | : | : | :
i | … | … | … | … | … | … | …

Table 2.4: Decomposing PLANTS into 1NF Relation LEAVES
SlNo | CollNo | Tip | Base | Hairy | Special | Shape
3 | AVS 2516 | acute tip | oblique at base with acute angle | - | Leaves compound, imparipinnate, leaflets, opposite | Elliptic
: | : | : | : | : | : | :
j | … | … | … | … | … | …

Table 2.5: Decomposing PLANTS into 1NF Relation FLOWERS
SlNo | CollNo | Color | Petals | Shape | Others
3 | AVS 2516 | cream or yellowish white | 9-10 lobed at apex | Cylindric | flowers in axillary panicles, staminal cylindric, widening above
: | : | : | : | : | :
k | … | … | … | … | …

Table 2.6: Decomposing PLANTS into 1NF Relation FRUIT_SEEDS
SlNo | CollNo | Fruit_shape | Fruit_color | Seed_property | Ovary | Seed_no
3 | AVS 2516 | drupes, woody endocarp, seeds ellipsoid | greenish yellow when ripe | - | - | 1
: | : | : | : | : | : | :
l | … | … | … | … | … | …
The subschemas LEAVES and FLOWERS violate 2NF (second normal
form), which requires every non-prime attribute, such as the shape,
base and apex of leaves, to be completely dependent on the primary
key. The flower is likewise dependent on its color and petals. Hence,
further normalization makes the features independent, so that leaves
and flowers can be used to identify the plants uniquely. Therefore,
the relations LEAVES and FLOWERS are normalized into 2NF and
decomposed further into subschemas. This property helps us to
recognize the medicinal plants. Similarly, the other tables,
PLANT_BODY and FRUIT_SEED, are also subjected to 2NF normalization.
Subschemas:
LEAVES1(SlNo, CollNo, Tip)
LEAVES2(SlNo, CollNo, Base)
LEAVES3(SlNo, CollNo, Shape)
FLOWER1(SlNo, CollNo, Color)
FLOWER2(SlNo, CollNo, Petals)
FLOWER3(SlNo, CollNo, Shape)
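
The decomposition described in this section can be illustrated with a small sketch. It is not the implementation used in this work, which stores the data in MS-Access; here Python's built-in sqlite3 module is swapped in so that the example runs on its own. The table and column names follow Tables 2.4 and 2.6 in simplified form, and the two sample rows are illustrative values loosely taken from Table 2.1. The point shown is that a plant with no recorded fruit simply has no row in fruit_seeds, instead of a row full of NULL values in one wide table.

```python
import sqlite3

# In-memory database standing in for the MS-Access store (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE plants (
    slno INTEGER, collno TEXT, englishname TEXT,
    PRIMARY KEY (slno, collno)          -- concatenated key
);
CREATE TABLE leaves (
    slno INTEGER, collno TEXT, tip TEXT, base TEXT, shape TEXT,
    PRIMARY KEY (slno, collno)
);
CREATE TABLE fruit_seeds (
    slno INTEGER, collno TEXT, fruit_shape TEXT, fruit_color TEXT,
    PRIMARY KEY (slno, collno)
);
""")

# Illustrative rows only; simplified from Table 2.1.
conn.executemany(
    "INSERT INTO plants VALUES (?, ?, ?)",
    [(1, "AVS 2358", "Arrowroot"), (3, "AVS 2516", "Neem tree")],
)
conn.executemany(
    "INSERT INTO leaves VALUES (?, ?, ?, ?, ?)",
    [
        (1, "AVS 2358", "acute", "rounded or cuneate", "ovate"),
        (3, "AVS 2516", "acute", "oblique", "elliptic"),
    ],
)
# Only the neem tree has a fruit description, so only one row is stored.
conn.execute(
    "INSERT INTO fruit_seeds VALUES (?, ?, ?, ?)",
    (3, "AVS 2516", "drupe", "greenish yellow"),
)

# Join a subschema back to the master table on the concatenated key.
leaf_rows = conn.execute("""
    SELECT p.englishname, l.shape
    FROM plants AS p
    JOIN leaves AS l ON p.slno = l.slno AND p.collno = l.collno
    ORDER BY p.slno
""").fetchall()
plants_with_fruit = conn.execute(
    "SELECT slno FROM fruit_seeds ORDER BY slno"
).fetchall()

print(leaf_rows)
print(plants_with_fruit)
```

Keeping each plant part in its own relation means that a missing part costs no storage, at the price of a join whenever several parts are queried together.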
2.4 QUERY PROCESSING
The database system contains two major components, the
database and the query processor. The database component writes
data to and reads data from the data store, and manages record-level
access with minimum cost. The query processor accepts a Structured
Query Language (SQL) query from the user, executes the query and
produces the relevant information. Typical query processing on the
medicinal plant database for retrieving the required information is
shown in Figure 2.3. The stages through which a query proceeds have
the following functionality.
Figure 2.3: Query execution using heuristic rule (flow diagram with
the labels Query Language (SQL), Query Parser, Relational Calculus,
Query Standardization process, Relational Algebra, Code
Generator/Interpreter, Record-at-a-time calls, Query Processor and
Retrieved information)
The query parser is designed to check the validity of the query
syntax. A syntactically correct query is translated into an internal
form. We have used a query tree, which is a useful representation for
relational calculus expressions. The query standardization process
examines all the algebraic expressions that are equivalent to the
given query and chooses the one that has the least cost. By cost, we
mean the number of tuples processed. The code generator transforms
the access plan generated by the standardization process into calls
to the query processor. The query processor is responsible for the
actual execution of the query.
A query may have alternative execution plans, which are equivalent
in terms of the results but differ in cost. The cost is expressed in
terms of the amount of time needed to execute a query. There are two
approaches to querying, the heuristic approach and the cost-based
approach. The cost-based approach uses knowledge of the underlying
data and storage structures, and is not considered in this work. The
goal of our approach is to retrieve the image-based information as
efficiently as possible. We have used the heuristic approach for
medicinal plants information retrieval.
The heuristic approach is a rule-based method for producing an
efficient execution plan. The selection, projection and join
operations are used to convert a given SQL query into a standard
relational algebra expression. A query tree represents the algebraic
query. During execution, the query tree is transformed into an
equivalent tree by rearranging the operations without losing
information (Ceri and Gottlob [22]). For the purpose of
rearrangement, a heuristic rule is used, in which cascaded selects
are broken into individual selects. In addition, the selects and
projects are pushed down so that the inner query and join operations
process a minimum number of tuples. Consider the query “Retrieve the
medicinal plants with ovate leaves”. The inner query is written as
follows:
σ shape = ‘ovate’ (LEAVES)
The query retrieved 30% of the plants in the database. From the
detailed execution of queries with different alternatives, we
observed that the number of tuples returned is minimum for the
characteristic ‘root’ and maximum for ‘leaves’. We have concluded
that plants are characterized mainly by their leaves rather than by
their other parts. Thus, leaves help in the image-based retrieval of
medicinal plants. As an illustration, the number of tuples obtained
after query execution on different parts of plants, on a plants
database of 100 plants, is given in Table 2.7.
Table 2.7: Plants Database parameters after query execution
Relation      Cardinality
PLANTS        100
FLOWERS       84
LEAVES        100
FRUIT_SEED    80
PLANTBODY     100
Table 2.7 reveals that ‘plant-body’ and ‘leaves’ are the most
prominent parts in the identification of plants. Hence, plant-body
dimensions and leaf features are used for further image-based
retrieval of information. To illustrate efficient and accurate access
to plant information, consider the execution of the query “Retrieve
the Kanada name and SlNo of the plants having ovate leaves”. The
query is written in relational algebra as follows.
∏ Kanada_name, slno ((σ shape = ‘ovate’ (LEAVES)) ⋈ LEAVES.slno = PLANTS.slno PLANTS)
The three possible alternatives for the execution of this query are
shown in Figure 2.4.
T1: ∏ Kanada_name, slno ((σ Shape = ‘ovate’ (LEAVES)) ⋈ Slno = Slno PLANTS)
T2: ∏ Kanada_name, slno (σ Shape = ‘ovate’ (LEAVES ⋈ Slno = Slno PLANTS))
T3: ∏ Kanada_name, slno ((∏ slno (σ Shape = ‘ovate’ (LEAVES))) ⋈ Slno = Slno (∏ Kanada_name, slno (PLANTS)))
Figure 2.4: (i) T1 and T2 are Query Trees (ii) T3 Efficient Tree
The push-down strategy is adopted for the above query. The query
processing strategy of T3 is more effective than those of T2 and T1.
The selection of ovate leaves is moved down to the leaf level of the
tree, and the projections and joins are rearranged so that a minimum
number of tuples is obtained, giving a good selectivity factor. Based
on the selectivity and dilation factors, an efficient query execution
plan is generated. From this, we infer that the proposed approach
gives a good result for any type of plant species.
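
The effect of the push-down strategy can be sketched with a few lines of Python. This is only an illustration under stated assumptions, not the query processor used in this work: the helper functions select, project and join are hypothetical, the relations are plain lists of dictionaries, and the three rows are toy data. The sketch evaluates the query in the join-first order of T2 and in the push-down order of T3, and compares the number of tuples produced by the join in each plan.

```python
# Toy relations; the rows are illustrative only.
PLANTS = [
    {"slno": 1, "kanada_name": "Kuvehittu"},
    {"slno": 2, "kanada_name": "Dasavala"},
    {"slno": 3, "kanada_name": "Huccabev"},
]
LEAVES = [
    {"slno": 1, "shape": "ovate"},
    {"slno": 2, "shape": "ovate"},
    {"slno": 3, "shape": "elliptic"},
]


def select(rows, predicate):
    """Selection: keep the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def project(rows, columns):
    """Projection: keep only the named columns of every row."""
    return [{col: row[col] for col in columns} for row in rows]


def join(left, right, key):
    """Nested-loop equi-join on a shared key column."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]


def is_ovate(row):
    return row["shape"] == "ovate"


def plan_t2():
    """Join first, then select and project (tree T2)."""
    joined = join(LEAVES, PLANTS, "slno")
    result = project(select(joined, is_ovate), ["kanada_name", "slno"])
    return result, len(joined)


def plan_t3():
    """Push selection and projection below the join (tree T3)."""
    ovate = project(select(LEAVES, is_ovate), ["slno"])
    names = project(PLANTS, ["kanada_name", "slno"])
    joined = join(ovate, names, "slno")
    return project(joined, ["kanada_name", "slno"]), len(joined)


result_t2, join_size_t2 = plan_t2()
result_t3, join_size_t3 = plan_t3()
print(result_t2 == result_t3, join_size_t2, join_size_t3)
```

Both plans return the same plants, but the join in the T3 order produces fewer tuples, because the non-ovate leaves are discarded before the join.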
2.4.1 Query based retrieval of medicinal plant species
In order to retrieve medicinal plants from the database according
to user needs, a view-based approach is developed. The
characteristics differ from plant to plant.
We have established the relations at the view level. Many entries
in the database PLANTS are NULL, as their values are either not well
defined or do not exist. The view-based approach not only suppresses
the insert, update and delete anomalies but also minimizes the
normalization overhead. The main purpose of the work undertaken is to
identify the medicinal plants uniquely, irrespective of the
normalization of the database for optimum processing and storage. A
typical example of common queries is given in Box 2.1.
The variable names preceded by @ represent the values passed to
the query, because the statements are all compounded. The ‘*value*’
wildcard format detects the keywords even when they are embedded in a
sentence along with other words. Since the “height” attribute is
numeric, absolute values need to be passed in order to detect the
medicinal plants with specific heights. The developed database design
allows access to the plant information by joining more than one
subschema. Hence, we have accessed the plant information with a
combination of more than one feature.
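
A parameterized keyword search of this kind can be sketched as follows. This is a minimal illustration and not the ASP.net code of this work: Python's sqlite3 module is swapped in for MS-Access, so the wildcard is % instead of *, and the named parameters :base and :tip play the role of the @ variables. The function find_by_leaf and the two sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE plants (slno INTEGER PRIMARY KEY, englishname TEXT);
CREATE TABLE leaves (slno INTEGER, base TEXT, tip TEXT);
""")
# Illustrative rows only; descriptions shortened from Table 2.1.
conn.executemany(
    "INSERT INTO plants VALUES (?, ?)",
    [(1, "Arrowroot"), (3, "Neem tree")],
)
conn.executemany(
    "INSERT INTO leaves VALUES (?, ?, ?)",
    [
        (1, "base rounded or cuneate", "tip acute"),
        (3, "very oblique at base", "acute tip"),
    ],
)


def find_by_leaf(connection, base, tip):
    """Return English names of plants whose leaf base or tip
    description contains the given keyword (hypothetical helper)."""
    sql = """
        SELECT p.englishname
        FROM plants AS p
        JOIN leaves AS l ON p.slno = l.slno
        WHERE l.base LIKE :base OR l.tip LIKE :tip
        ORDER BY p.slno
    """
    # Wrap the keywords in wildcards so that they match anywhere
    # inside the stored description.
    params = {"base": f"%{base}%", "tip": f"%{tip}%"}
    return [row[0] for row in connection.execute(sql, params)]


oblique_or_obtuse = find_by_leaf(conn, "oblique", "obtuse")
cuneate_or_acute = find_by_leaf(conn, "cuneate", "acute")
print(oblique_or_obtuse)
print(cuneate_or_acute)
```

Passing the keywords as parameters, rather than pasting them into the SQL text, keeps the statement fixed while the user-supplied values change.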
Therefore, the user can express a detailed query using
conjunctions and disjunctions of medicinal plant features. Moreover,
in the case of medicinal plants, we use a continuous property such as
the leaf size, interval data such as between 2 cm and 5 cm, or a
linguistic variable such as small, medium and tall. It is also common
to use modifiers, such as very small, to attribute a different level
of importance. An illustration of complex query execution on the
global and fragmentation schemas of the medicinal plant database is
given in Box 2.2. The query retrieves the English name of the
medicinal plant that is herbaceous and has white flowers and conical
fruits.
We have presented some preliminary results of experimentation
on the designed database. Basically, plants are classified as herbs,
shrubs and trees. Hence, to find the percentage of plants in each of
these classes, a large number of queries are executed on the
database. Plants are divided into three main categories, herbs,
shrubs and trees, using a query of the following form.
SELECT * FROM PLANT_BODY WHERE type = 'herb';
Figure 2.5 gives the percentage of retrieval of different medicinal
plants from the database. The pie chart indicates the percentage of
plants grown in India in each of the growth-form categories, herbs,
shrubs and trees.
Box 2.1: Example of common queries
(i) Classification based on the base and tip of leaves
SELECT [plants].[kanada_name], [plants].[hindi_name], [plants].[sci_name],
       [plants].[distribution], [plants].[parts_used], [plants].[about_plant]
FROM plants, leaves
WHERE ([plants].[slno] = [leaves].[slno])
  And ([plants].[collno] = [leaves].[collno])
  And (([leaves].[base] Like [@base]) Or ([leaves].[tip] Like [@tip]));
(ii) Classification based on flower color
SELECT [plants].[kanada_name], [plants].[hindi_name], [plants].[sci_name],
       [plants].[distribution], [plants].[parts_used], [plants].[about_plant],
       [flowers].[color], [flowers].[shape]
FROM plants, flowers
WHERE ([plants].[slno] = [flowers].[slno])
  And ([plants].[collno] = [flowers].[collno])
  And (([flowers].[shape] Like [@shape]) Or ([flowers].[color] = [@color]))
ORDER BY [color];
(iii) Classification based on fruit shape description
SELECT *
FROM fruit_seed
WHERE len(fruit_shape) > 1;
(iv) Classification based on fruit seed
SELECT [plants].[kanada_name], [plants].[hindi_name], [plants].[sci_name],
       [plants].[distribution], [plants].[parts_used], [plants].[about_plant]
FROM plants, fruit_seed
WHERE ([plants].[slno] = [fruit_seed].[slno])
  And ([plants].[collno] = [fruit_seed].[collno])
  And (([fruit_seed].[fruit_shape] Like [@Fruit_shape])
       Or ([fruit_seed].[seed_property] = [@Seed]));
Hence, the conjunctive queries usually require plant classification
based on plant type and features. The shrubs are few in number and
are considered for query processing.
Box 2.2: Typical query execution on global and fragmentation schemas
Different schemas of the database
Global schema
PLANTS(slno, collno, hindiname, kanadaname, englishname, scietificname,
       distribution, about_plant, parts_used, property_used)
Subschema
LEAVES(slno, tip, base, spacial, hairy, shape)
FRUIT(slno, fruit_shape, fruit_color, seed_property, ovary, seed_no)
FLOWER(slno, color, shape, petals, others)
PLANTBODY(slno, type, height_from, height_to, stem, branching, roots)
Fragmentation Schema
PLANT_FL = SL FLOWER.color = 'white' PJ slno (FLOWER)
PLANT_FR = SL FRUIT.shape = 'conical' PJ slno (FRUIT)
PLANT_TP = SL PLANTBODY.type = 'herb' PJ slno (PLANTBODY)
PLANT_TEMP = (PJ slno (PLANT_FL JN PLANT_FL.slno = PLANT_FR.slno PLANT_FR))
PLANT_RECN = PLANT_TEMP JN PLANT_TEMP.slno = PLANT_TP.slno PLANT_TP
PLANTNAME_RECN_UNIQUE = PJ slno, Englishname (PLANTS JN (PLANTS.slno =
                        PLANT_RECN.slno) PLANT_RECN)
Plants_With_White_Flower = {2, 3, 4, 63, 71, 72, 75, 10, 83, 54, 84, 62, 8,
                            33, 55, 16, 1, 7, 9, 13, 18, 34, 5, 11, 6, 20, 61}
Plants_With_Conical_Fruits = {7, 54}
Herb_Plants = {1, 2, 4, 5, 6, 12, 19, 25, 27, 32, 33, 36, 37, 42, 45, 49, 54,
               55, 60, 65, 66, 74, 75, 76, 77, 82, 84}
White_Flower ∩ Conical_Fruits ∩ Herb_Plants = {54}
Therefore, plant 54 is a herb which possesses a conical fruit and a white flower.
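
The final step of Box 2.2 is an intersection of the three fragment results. As a small check, the serial-number sets listed in the box can be intersected with Python's built-in set type; the variable names are illustrative, while the numbers are those given in the box.

```python
# Serial numbers returned by the three fragment queries of Box 2.2.
white_flower = {2, 3, 4, 63, 71, 72, 75, 10, 83, 54, 84, 62, 8, 33, 55,
                16, 1, 7, 9, 13, 18, 34, 5, 11, 6, 20, 61}
conical_fruit = {7, 54}
herbs = {1, 2, 4, 5, 6, 12, 19, 25, 27, 32, 33, 36, 37, 42, 45, 49, 54,
         55, 60, 65, 66, 74, 75, 76, 77, 82, 84}

# A conjunctive query keeps only the plants present in every fragment.
matching = white_flower & conical_fruit & herbs
print(sorted(matching))  # prints [54]
```

Plant 7 has a white flower and a conical fruit but is not listed among the herbs, so only plant 54 satisfies all three conditions.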
Figure 2.5: Percentage of medicinal plants with categories of growth forms
Figure 2.6: Classification of medicinal plants based on the plant properties
From the graph shown in Figure 2.6, we conclude that leaves, height,
flower, branching and fruits are the most common properties, defined
for the majority of the plants, and that among them leaves are the
most well-defined property for the recognition of plants. Hence, the
retrieval percentage obtained is high.
Figure 2.7: Classification based on the plant sub properties
The plants with the properties given in Figure 2.6 constitute 90% of
the entire database. The climbers and creepers constitute only 10% of
the medicinal plants and are also identified by leaf and tendril
properties. Moreover, some plants share the same global properties
but differ in their local properties. Hence, in order to classify
them, we have identified some minute local properties specific to a
particular plant. Such properties are used to differentiate between
similar plants. Figure 2.7 gives the percentage of retrieval versus
sub-properties. The medicinal plants are categorized based on the
sub-properties combined in a single query. From the experimentation,
we have observed that most of the plants are well defined by the
common properties.
2.5 INDIAN MEDICINAL PLANT DATABASE ON THE WEB
In order to make the database available online, it is web-enabled
and implemented using ASP.Net and the Web Matrix server. The
developed design schema shows how easily the plant properties are
fragmented and analyzed. The OLEDB database interface is used, which
involves MS-Access and the core component ADO.Net integrated with Web
Matrix. The connection string provides a direct path to the MS-Access
file, which is located in the web directory. For record
representation, one serializable object (DataGrid) and one
non-serializable object (DataReader) are used. The grid is used for
displaying and joining results, whereas the reader is used to fetch
independent records.
Summary
We have developed a standard database for Indian medicinal
plants using various sources such as books, images and a utility
package. The proposed methodology has given 85% accuracy for a
unique property. The database facilitates the retrieval of plants
from their properties. Hence, the developed design methodology helps
in the preparation of home remedies. The work focuses on combining
text and image for image-based information retrieval. The ultimate
goal is to develop a machine vision system for medicinal plants that
considers their properties and sub-properties for retrieval.