data mining query language

Post on 28-Nov-2015


    DATA MINING QUERY LANGUAGES

    DMQL: A Data Mining Query Language

    + A data mining language must be designed to facilitate flexible and effective knowledge discovery.
    + Having a query language for data mining may help standardize the development of platforms for data mining systems.
    + But designing a language is challenging because data mining covers a wide spectrum of tasks and each task has different requirements.
    + Hence, the design of a language requires a deep understanding of the limitations and underlying mechanisms of the various kinds of tasks.
    + So how would you design an efficient query language?
    + Based on the primitives discussed earlier.

    + DMQL allows mining of different kinds of knowledge from relational databases and data warehouses at multiple levels of abstraction
    + Adopts SQL-like syntax
    + Hence, can be easily integrated with relational query languages
    + Defined in BNF grammar
        o [ ] represents zero or one occurrence
        o { } represents zero or more occurrences
        o Words in sans serif represent keywords

    + A DMQL can provide the ability to support ad-hoc and interactive data mining
    + By providing a standardized language like SQL
        o Hope to achieve an effect similar to the one SQL has had on relational databases
        o Foundation for system development and evolution
        o Facilitate information exchange, technology transfer, commercialization, and wide acceptance

    Design

    + DMQL is designed with the primitives described as follows:
    + Syntax for DMQL
        o Syntax for specification of
            - task-relevant data
            - the kind of knowledge to be mined
            - concept hierarchy specification
            - pattern presentation and visualization
        o Putting it all together: a DMQL query

    Syntax of DMQL

    <DMQL> ::= <DMQL_Statement>; {<DMQL_Statement>}

    <DMQL_Statement> ::= <Data_Mining_Statement>
        | <Concept_Hierarchy_Definition_Statement>
        | <Visualization_and_Presentation>

    <Data_Mining_Statement> ::=
        use database <database_name> | use data warehouse <data_warehouse_name>
        {use hierarchy <hierarchy_name> for <attribute_or_dimension>}
        <Mine_Knowledge_Specification>
        in relevance to <attribute_or_dimension_list>
        from <relation(s)/cube(s)>
        [where <condition>]
        [order by <order_list>]
        [group by <grouping_list>]
        [having <condition>]
        {with [<interest_measure_name>] threshold = <threshold_value> [for <attribute(s)>]}

    <Mine_Knowledge_Specification> ::= <Mine_Char> | <Mine_Desc> | <Mine_Assoc> | <Mine_Class>

    <Mine_Char> ::= mine characteristics [as <pattern_name>]
        analyze <measure(s)>

    <Mine_Desc> ::= mine comparison [as <pattern_name>]
        for <target_class> where <target_condition>
        {versus <contrast_class_i> where <contrast_condition_i>}
        analyze <measure(s)>

    <Mine_Assoc> ::= mine associations [as <pattern_name>] [matching <metapattern>]

    <Mine_Class> ::= mine classification [as <pattern_name>]
        analyze <classifying_attribute_or_dimension>

    <Concept_Hierarchy_Definition_Statement> ::=
        define hierarchy <hierarchy_name> [for <attribute_or_dimension>]
        on <relation_or_cube_or_hierarchy>
        as <hierarchy_description> [where <condition>]

    <Visualization_and_Presentation> ::= display as <result_form> | {<Multilevel_Manipulation>}

    <Multilevel_Manipulation> ::= roll up on <attribute_or_dimension>
        | drill down on <attribute_or_dimension>
        | add <attribute_or_dimension>
        | drop <attribute_or_dimension>
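To make the clause order of a data mining statement concrete, here is a minimal Python sketch that assembles the statement text from its parts. The function `build_mining_statement` and its parameter names are hypothetical, invented for this illustration; no DMQL engine or library is assumed, and only a subset of the clauses is covered. Optional `[ ]` clauses become arguments that may be `None`, and repeatable `{ }` clauses become sequences that may be empty.

```python
def build_mining_statement(database, mine_spec, relevant, sources,
                           hierarchies=(), where=None, thresholds=()):
    """Assemble the text of a DMQL data mining statement (illustrative only).

    hierarchies and thresholds model { } clauses (zero or more occurrences);
    where models a [ ] clause (zero or one occurrence).
    """
    lines = [f"use database {database}"]
    for name, attribute in hierarchies:
        lines.append(f"use hierarchy {name} for {attribute}")
    lines.append(mine_spec)
    lines.append("in relevance to " + ", ".join(relevant))
    lines.append("from " + ", ".join(sources))
    if where is not None:
        lines.append(f"where {where}")
    # Simplification: this sketch always expects a measure name,
    # although the grammar marks it as optional.
    for measure, value in thresholds:
        lines.append(f"with {measure} threshold = {value}")
    return "\n".join(lines)


query = build_mining_statement(
    database="AllElectronics_db",
    mine_spec="mine characteristics as customerPurchasing analyze count%",
    relevant=["C.age", "I.type"],
    sources=["customer C", "item I"],
    thresholds=[("noise", 0.05)],
)
print(query)
```

Because `where` is left out and `hierarchies` is empty, the printed statement simply omits those clauses, which is the behaviour the `[ ]` and `{ }` notation describes.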

    DMQL: Syntax for task-relevant data specification

    + Names of the relevant database or data warehouse, conditions, and relevant attributes or dimensions must be specified
    + use database

    Discrimination

    Mine_Knowledge_Specification ::=
        mine comparison [as pattern_name]
        for target_class where target_condition
        {versus contrast_class_i where contrast_condition_i}
        analyze measure(s)

    + Specifies that discriminant descriptions are to be mined; these compare a given target class of objects with one or more contrasting classes (thus referred to as comparison)
    + Analyze specifies aggregate measures
    + Example:
        mine comparison as purchaseGroups
        for bigSpenders where avg(I.price) >= $100
        versus budgetSpenders where avg(I.price) < $100
        analyze count
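As a rough illustration of what the target and contrast conditions in the example select, the following Python sketch splits customers into the two classes by their average item price and reports the `count` measure for each. The purchase records and function names are made up for the illustration, and the sketch covers only the class split and the aggregate; a real comparison would go on to describe what distinguishes the classes.

```python
from collections import Counter, defaultdict

# Hypothetical purchase records: (customer_id, item_price).
purchases = [
    ("c1", 250.0), ("c1", 120.0),
    ("c2", 40.0), ("c2", 60.0),
    ("c3", 100.0),
    ("c4", 15.0),
]


def average_price_per_customer(records):
    """Return {customer_id: average item price}."""
    totals = defaultdict(lambda: [0.0, 0])  # customer -> [sum, count]
    for customer, price in records:
        totals[customer][0] += price
        totals[customer][1] += 1
    return {customer: s / n for customer, (s, n) in totals.items()}


def compare_classes(records, cutoff=100.0):
    """Count customers in the target class (avg >= cutoff) and the contrast class."""
    averages = average_price_per_customer(records)
    return Counter(
        "bigSpenders" if avg >= cutoff else "budgetSpenders"
        for avg in averages.values()
    )


counts = compare_classes(purchases)
print(counts["bigSpenders"], counts["budgetSpenders"])
```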

    Association

    Mine_Knowledge_Specification ::=
        mine associations [as pattern_name] [matching metapattern]

    + Specifies the mining of patterns of association
    + Can provide templates (metapatterns) with the matching clause
    + Example:
        mine associations as buyingHabits
        matching P(X: customer, W) and Q(X, Y) => buys(X, Z)

    Classification

    Mine_Knowledge_Specification ::=
        mine classification [as pattern_name]
        analyze classifying_attribute_or_dimension

    + Specifies that patterns for data classification are to be mined
    + The analyze clause specifies that classification is performed according to the values of classifying_attribute_or_dimension
    + For categorical attributes or dimensions, each value represents a class (such as low-risk, medium-risk, high-risk)
    + For numeric attributes, each class is defined by a range (such as 20-39, 40-59, 60-89 for age)
    + Example:
        mine classification as classifyCustomerCreditRating
        analyze credit_rating

    To specify what concept hierarchies to use:

        use hierarchy <hierarchy_name> for <attribute_or_dimension>

    We use different syntax to define different types of hierarchies

    o schema hierarchies

        define hierarchy time_hierarchy on date as [date, month, quarter, year]

    o set-grouping hierarchies

        define hierarchy age_hierarchy for age on customer as
            level1: {young, middle_aged, senior} < level0: all
            level2: {20, ..., 39} < level1: young
            level2: {40, ..., 59} < level1: middle_aged
            level2: {60, ..., 89} < level1: senior

    o operation-derived hierarchies

        define hierarchy age_hierarchy for age on customer as
            {age_category(1), ..., age_category(5)} := cluster(default, age, 5)
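The set-grouping hierarchy for age can be pictured as a lookup from raw values to named groups. The Python sketch below is one possible encoding, assuming the three level-1 groups and their ranges as listed in the definition; the names `AGE_HIERARCHY` and `generalize_age` are hypothetical and are not part of DMQL.

```python
# level1 groups and the level2 values they cover (from the age_hierarchy definition).
AGE_HIERARCHY = {
    "young": range(20, 40),        # 20, ..., 39
    "middle_aged": range(40, 60),  # 40, ..., 59
    "senior": range(60, 90),       # 60, ..., 89
}


def generalize_age(age, level):
    """Return the value of `age` at a hierarchy level.

    level 2 is the raw age, level 1 is the named group, level 0 is 'all'.
    """
    if level == 2:
        return age
    if level == 0:
        return "all"
    if level == 1:
        for group, ages in AGE_HIERARCHY.items():
            if age in ages:
                return group
        raise ValueError(f"age {age} is not covered by the hierarchy")
    raise ValueError(f"unknown level: {level}")


print([generalize_age(a, 1) for a in (25, 47, 71)])
```

Ages outside 20 to 89 raise an error here because the hierarchy as written does not place them in any group; a real system might treat them differently.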

    display as <result_form>

    Result_form = rules, tables, crosstabs, pie or bar charts, decision trees, cubes, curves, or surfaces

    To facilitate interactive viewing at different concept levels, the following syntax is defined:

    Multilevel_Manipulation ::= roll up on attribute_or_dimension
        | drill down on attribute_or_dimension
        | add attribute_or_dimension
        | drop attribute_or_dimension
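To show what roll up and drill down do to a result, the following Python sketch moves along a schema hierarchy of the form [date, month, quarter, year] and re-aggregates a small set of counts at each level. The sales figures and all function names are invented for the illustration; this is a sketch of the idea, not of how any DMQL implementation works internally.

```python
from collections import Counter
from datetime import date

# Levels of the schema hierarchy, from most specific to most general.
TIME_LEVELS = ["date", "month", "quarter", "year"]


def generalize(day, level):
    """Map a date to its value at the given hierarchy level."""
    if level == "date":
        return day.isoformat()
    if level == "month":
        return f"{day.year}-{day.month:02d}"
    if level == "quarter":
        return f"{day.year}-Q{(day.month - 1) // 3 + 1}"
    if level == "year":
        return str(day.year)
    raise ValueError(f"unknown level: {level}")


def roll_up(level):
    """Return the next more general level."""
    position = TIME_LEVELS.index(level)
    if position == len(TIME_LEVELS) - 1:
        raise ValueError("already at the most general level")
    return TIME_LEVELS[position + 1]


def drill_down(level):
    """Return the next more specific level."""
    position = TIME_LEVELS.index(level)
    if position == 0:
        raise ValueError("already at the most specific level")
    return TIME_LEVELS[position - 1]


def aggregate(sales, level):
    """Sum units sold per value of the chosen level."""
    totals = Counter()
    for day, units in sales:
        totals[generalize(day, level)] += units
    return dict(totals)


# Hypothetical (date, units sold) records.
sales = [
    (date(2015, 1, 15), 3),
    (date(2015, 2, 3), 2),
    (date(2015, 5, 20), 4),
    (date(2016, 1, 1), 1),
]

level = "month"
by_month = aggregate(sales, level)
level = roll_up(level)              # month -> quarter
by_quarter = aggregate(sales, level)
print(by_month)
print(by_quarter)
```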

    Putting it all together: a DMQL query

        use database AllElectronics_db
        use hierarchy location_hierarchy for B.address
        mine characteristics as customerPurchasing
        analyze count%
        in relevance to C.age, I.type, I.place_made
        from customer C, item I, purchases P, items_sold S, works_at W, branch B
        where I.item_ID = S.item_ID and S.trans_ID = P.trans_ID
            and P.cust_ID = C.cust_ID and P.method_paid = "AmEx"
            and P.empl_ID = W.empl_ID and W.branch_ID = B.branch_ID
            and B.address = "Canada" and I.price >= 100
        with noise threshold = 0.05
        display as table
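The query leaves the effect of `analyze count%` and `with noise threshold = 0.05` implicit. The Python sketch below shows one plausible reading, stated here as an assumption rather than as DMQL's defined semantics: each distinct combination of the relevant attributes is reported with its share of the task-relevant tuples, and combinations covering less than the threshold fraction are dropped as noise. The rows and the function name `characterize` are hypothetical.

```python
from collections import Counter


def characterize(rows, noise_threshold=0.05):
    """Return {group: percentage of rows}, omitting groups below the threshold.

    Assumption: the noise threshold is a minimum fraction of all rows.
    """
    counts = Counter(rows)
    total = sum(counts.values())
    table = {}
    for group, n in counts.items():
        share = n / total
        if share >= noise_threshold:
            table[group] = round(100 * share, 1)
    return table


# Hypothetical generalized tuples: (age group, item type, place made).
rows = (
    [("young", "TV", "Japan")] * 22
    + [("senior", "computer", "USA")] * 17
    + [("young", "camera", "Germany")] * 1
)

table = characterize(rows)
for group, percent in table.items():
    print(group, f"{percent}%")
```

In this sample the single camera purchase accounts for 2.5% of the rows, so it falls below the 5% threshold and is left out of the table.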

    Other Data Mining Languages & Standardization Efforts

    + Association rule language specifications
        o MSQL (Imielinski & Virmani '99)
        o MineRule (Meo, Psaila and Ceri '96)
        o Query flocks based on Datalog syntax (Tsur et al. '98)
    + OLE DB for DM (Microsoft 2000) and recently DMX (Microsoft SQL Server 2005)
        o Based on OLE, OLE DB, OLE DB for OLAP, C#
        o Integrating DBMS, data warehouse and data mining
    + CRISP-DM (CRoss-Industry Standard Process for Data Mining)
        o Providing a platform and process structure for effective data mining
        o Emphasizing the deployment of data mining technology to solve business problems
    + DMML (Data Mining Mark-up Language) by DMG (www.dmg.org)
        o Providing a platform and process structure for effective data mining

    Hierarchy Specification

    A hierarchy is a root member of an alternate hierarchy, which is always at generation 2 of a dimension. Member value expressions are not allowed as hierarchy arguments.

    Alternate hierarchies are applicable to aggregate storage databases only.

    The dimension of the hierarchy argument passed to a function must match the dimension of the other arguments passed to the function. If they do not match, an error is returned and the query is aborted.
