
  • 7/30/2019 Logic Based Pattern Discovery New

    1/73

    LOGIC BASED PATTERN DISCOVERY

    A Project Report Submitted In Partial Fulfillment of the Requirements for the Award Of

    MASTER OF TECHNOLOGY

    IN

    SOFTWARE ENGINEERING

    BY

    E.SREE LAKSHMI

    (09C31D2503)

    UNDER THE GUIDANCE OF

    MRS. RAZIYA

    Assoc. Prof.

    DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

    BALAJI INSTITUTE OF TECHNOLOGY & SCIENCE

    NARSAMPET, WARANGAL - 506331

    2011-2012


    DEPARTMENT OF INFORMATION TECHNOLOGY

    BALAJI INSTITUTE OF TECHNOLOGY & SCIENCE

    NARSAMPET, WARANGAL- 506331

    CERTIFICATE

    This is to certify that E. SREE LAKSHMI, Roll No. 09C31D2503 of the M.Tech has

    satisfactorily completed the dissertation work entitled LOGIC BASED PATTERN

    DISCOVERY in partial fulfillment of the requirements of the M.Tech degree during the academic year 2011-2012.

    Mr. D. Venkateshwarlu                    Mr. M. Srinivas
    Supervisor(s)                            Head of the Department

    External


    Abstract

    Previous studies have presented convincing arguments that a frequent

    pattern mining algorithm should not mine all frequent patterns but only the

    closed ones, because the latter leads to a more compact yet complete

    result set as well as better efficiency. However, most previously developed

    closed pattern mining algorithms work under the candidate maintenance-and-test

    paradigm, which is inherently costly in runtime and space usage when the

    support threshold is low or the patterns become long. A new pattern mining

    algorithm is proposed to report domain knowledge as coherent rules,

    where coherent rules are discovered based on the properties of an inference

    analysis approach. This approach uses the Back Scan pruning technique.


    CHAPTER 1

    INTRODUCTION

    Data is stored in databases, data warehouses and other information repositories. As

    the size of data increases, there is a pressing need for data mining to

    extract knowledge of interest for users. Thus, data mining is a process of

    extracting knowledge from information repositories by extracting interesting data

    patterns representing knowledge. We get these interesting data patterns by

    evaluating data patterns (task-relevant information) obtained from various databases and data warehouses.

    Data Mining is carried out using various data mining functionalities in which

    Association Rule Mining (ARM) is commonly used to extract interesting data

    patterns. It is used by marketing and retail communities in order to find

    interesting association rules between frequent item sets which can boost the

    sales of an item set in the market in order to make profits. Mining association

    rules are useful for discovering relationships among items from large databases.

    Association rule mining deals with market basket data analysis for finding

    frequent item sets to generate valid and important association rules from them.

    Association Rule Mining finds interesting data patterns based on association

    relationship between various items of a data set by using association rules which

    are used to specify the association relationship between various items of a data set.

    {milk, bread} → {butter}; in this association rule, there is a correlation

    between two item sets: {milk, bread} and {butter}.

    A frequent item set is a set of items that appear together frequently in a data

    set. Only the item sets whose frequency of occurrence is >= the min_support threshold

    value given by the domain experts are considered to be frequent patterns.

    Hence, Association Rule Mining is also called Frequent Pattern Mining.


    An association between these frequent itemsets is interesting, if it satisfies

    two interestingness measures called support and confidence. By using a

    min_support value given by a domain expert we lose some interesting association

    rules, as this threshold value is not always correct. So we should shift from the

    existing support and confidence framework to a framework which uses a logic

    principle to check the interestingness of an association rule. As this

    framework relies completely on logic instead of a min_support threshold value given

    by a domain expert, all the interesting association rules are discovered. This

    framework is further enhanced by applying a pruning technique to reduce the

    search space so that time and space complexity are reduced.

    1.1 OBJECTIVE

    To eliminate the need of using the minimum support threshold for discovering

    interesting association rules by obtaining association rules based on their support

    value (i.e. their frequency of occurrence) as observed in the transactional data set

    and then evaluating these association rules based on certain logic principles. This

    process is followed in order to get only strong and interesting association rules and

    completely eliminate uninteresting association rules which we get when mining is

    performed based on minimum support threshold value given by a domain expert.

    This process is to be further enhanced in order to reduce space and time

    complexity.

    1.2 PROBLEM STATEMENT

    The use of a minimum support threshold generally assumes that: -

    A domain expert can provide this threshold value accurately, which is not always

    the case.

    The knowledge of interest, i.e. an interesting data pattern in the form of an

    interesting association rule, can be obtained within this threshold value.

    This single threshold value is enough to get the knowledge of interest

    required by the user.

    Because of these assumptions, we have the following disadvantages: -

    Loss of association rules involving frequently observed items.


    Loss of association rules involving infrequently observed items.

    In order to overcome these disadvantages, we need to shift from the

    existing support and confidence framework to a framework

    which discovers interesting association rules.

    1.3 DEFINITIONS

    Data mining: - It is a process of discovering knowledge from various

    information repositories by extracting interesting data patterns representing

    knowledge. These interesting data patterns are obtained by evaluating data

    patterns (i.e. task relevant information) obtained from various data sources like

    databases and data warehouses. Task-relevant information obtained from

    various data sources like databases and data warehouses is called a data pattern.

    Association Rule Mining: - It is a data mining functionality which is used to

    find interesting data patterns based on association or correlation relationship

    between various items of a transactional data set.

    Association Rule: - It is a rule used to specify association relationship between

    items of a frequent itemset obtained from a transactional data set.

    Let I = {i1, i2, ..., im} be a set of items. Let D, the task-relevant data, be a

    set of database transactions where each transaction T is a set of items such that

    T ⊆ I. Each transaction is associated with an identifier, called TID. Let A be a set

    of items. A transaction T is said to contain A if and only if A ⊆ T. An association

    rule is an implication of the form A → B, where A ⊂ I, B ⊂ I, and A ∩ B = ∅.

    The rule A → B holds in the transaction set D with support s, where s is

    the percentage of transactions in D that contain A ∪ B (i.e., the union of sets A

    and B, or say, both A and B). This is taken to be the probability P(A ∪ B). The

    rule A → B has confidence c in the transaction set D, where c is the percentage of

    transactions in D containing A that also contain B. This is taken to be the

    conditional probability P(B | A).

    Support (A → B) = P(A ∪ B) ... (1.3.1)

    Confidence (A → B) = P(B | A) ... (1.3.2)
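    The two measures above can be computed directly from a transaction set. Below is a minimal illustrative sketch; the transaction data is hypothetical, not taken from this report.

```python
# Illustrative sketch: computing support and confidence for a rule A -> B
# over a small, made-up transaction set (not data from this report).

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """support(A U B) / support(A), an estimate of P(B | A)."""
    return (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))

transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"bread", "butter"},
    {"milk", "bread", "butter", "eggs"},
]
A, B = {"milk", "bread"}, {"butter"}
print(support(A | B, transactions))    # 0.5: A U B appears in 2 of 4 transactions
print(confidence(A, B, transactions))  # 2/3: of 3 transactions with A, 2 contain B
```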


    Frequent itemset: - A frequent itemset refers to a set of items that appear together

    frequently in a transactional data set. For ex: - {milk, bread}.

    Subsequence: - A data pattern that appears in a sequential order in a data set

    is called as a frequent sequential pattern or a frequently occurring subsequence.

    For ex: - A pattern which shows that

    customers tend to purchase first a PC, followed by a digital camera and then a

    memory card is a frequent sequential pattern or subsequence.

    Implication: - If an association rule meets certain logic principles based on the

    values of a truth table, then it is called an implication.

    An implication is formed by using two propositions p and q, from which we have

    four implications: -

    p → q

    ¬p → q

    p → ¬q

    ¬p → ¬q

    The symbol → is used to describe the relation between p and q,

    and the symbol ¬

    means a negated (false) proposition. Thus, an association rule X → Y is mapped

    to p → q iff both X and Y are observed; here X is mapped to p and Y

    is mapped to q.

    Equivalence: - An equivalence is a mode of implication, where an implication

    has to satisfy the following condition in order to qualify as an equivalence: -

    p ↔ q iff ¬(p XOR q) ... (1.3.3)

    Here p and q are propositions and ↔ is the equivalence symbol. The below

    given truth table is the truth table for equivalence.

    Table 1.3.1 Truth Table for Logical Equivalence


    Thus, the association rule whose implication satisfies the equivalence condition

    given above is considered to be an interesting association rule.
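    The equivalence test can be checked mechanically against the truth table. A small sketch of my own, not code from the report:

```python
# Truth table for logical equivalence p <-> q: true exactly when p and q
# have the same truth value, i.e. equivalent to not (p XOR q).

def equivalence(p, q):
    return p == q

for p in (True, False):
    for q in (True, False):
        # the column (T, F, F, T) used later when mapping association rules
        assert equivalence(p, q) == (not (p ^ q))
        print(p, q, equivalence(p, q))
```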

    CHAPTER 2

    LITERATURE SURVEY

    This section contains information about the knowledge discovery process, the architecture

    of a data mining system, data warehouses, association rule mining, the Apriori algorithm

    and the FP-Growth algorithm.

    2.1 DATA MINING

    It is the process of extracting knowledge from large amounts of data. Data

    mining is an essential step in the process of knowledge discovery from data (KDD),

    which is shown as follows: -


    Figure 2.1.1 KNOWLEDGE DISCOVERY FROM DATA (KDD)

    The essential steps involved in data mining are:-

    Data Preprocessing: - Data cleaning and data integration are the steps involved in

    data preprocessing. In data cleaning, noisy data (data errors) is removed and in

    data integration, data from multiple data sources is merged into a single unified

    format.

    Data selection: - Where data relevant to the analysis task are retrieved from

    the database.

    Data transformation: - Where data are transformed or consolidated into forms

    appropriate for mining by performing summary or aggregation operations.

    Data mining: - It is an essential process where intelligent methods are applied

    in order to extract data patterns.

    Pattern evaluation: - In this step we identify the truly interesting patterns

    representing knowledge based on some interestingness measures.

    Knowledge presentation: - Here visualization and knowledge representation

    techniques are used to present the mined knowledge to the user.

    Figure 2.1.2 Architecture of a Typical Data Mining System


    Database, data warehouse and other information repository: - This is a

    set of databases, data warehouses and other kinds of information repositories.

    Data cleaning and data integration techniques are performed on the data.

    Database or data warehouse server: - The database or data warehouse

    server is responsible for fetching the relevant data, based on the user's data

    mining request.

    Knowledge base: -This is the domain knowledge that is used to evaluate the

    interestingness of resulting patterns. Such knowledge can include concept

    hierarchies, used to organize attributes or attribute values into different levels of

    abstraction.

    Data mining engine: - This is essential to the data mining system and ideally

    consists of a set of functional modules for tasks such as characterization,

    association and correlation analysis, classification, prediction, cluster analysis,

    outlier analysis, and evolution analysis.

    Pattern evaluation module: -This component uses interestingness measures

    and interacts with the data mining modules so as to focus the search towards

    interesting patterns. It uses interestingness thresholds to filter out discovered

    patterns.


    User interface: - This module communicates between users and the data

    mining system, allowing the user to interact with the system by specifying a

    data mining query or task, providing information to help focus the search, and

    performing exploratory data mining based on the intermediate data mining

    results. In addition, this component allows the user to browse database and data

    warehouse schemas or data structures, evaluate mined patterns, and visualize

    the patterns in different forms.

    2.2 DATA WAREHOUSE

    A Data Warehouse is a subject-oriented, integrated, time-variant and

    non-volatile collection of data which supports the managerial decision-making process.

    The four keywords specified in the above definition can be described as follows:-

    Subject Oriented: - A data warehouse provides a simple and concise view

    about particular subject issues by excluding data which is not useful for decision

    making process. Thus, it is specially designed to focus on the modeling and

    analysis of data for decision makers. For ex: - A data warehouse is organized

    around major subjects like customer, supplier, product and sales, rather than

    concentrating on day-to-day operations.

    Integrated: - A data warehouse is constructed by integrating multiple

    heterogeneous sources like relational databases and data warehouses.

    Time-Variant: - Data are stored to provide information from historical

    perspective like a period from 5 to 10 years.

    Non-Volatile: - A data warehouse is a permanent storage of data; it is a

    physically separate store of data transformed from the application data found in

    the operational environment. Due to this separation, a data warehouse does not

    require transaction processing, recovering and concurrency control mechanisms.

    It requires only two operations on data: -

    Initial loading of data.

    Access of data


    Figure 2.2.1 Three-Tier Data Warehouse Architecture

    1) Bottom Tier: - It is a data warehouse server. Back-end tools are used to feed

    data into the bottom tier from operational databases or other external sources.

    These tools perform data extraction, cleaning, integration and transformation. The

    data is extracted using application program interfaces known as gateways. A gateway is

    supported by the underlying DBMS and allows client programs to generate SQL

    code to be executed at a server. For ex: - ODBC (Open Database Connectivity).

    2) Middle Tier: - The middle tier is an OLAP server implemented by using either a

    relational OLAP model, i.e. an extended relational DBMS that maps operations on

    multidimensional data to standard relational operations, or a multidimensional


    OLAP model, that is, a special-purpose server that directly implements

    multidimensional data and operations.

    3) Top Tier: -The top tier is a front-end client layer, which contains query and

    reporting tools, analysis tools, and data mining tools.

    2.3 ASSOCIATION RULE MINING

    It is a data mining functionality or a data mining method used to find interesting

    data patterns based on association or correlation relationship among a large set

    of data items by using association rules which specify the association

    relationship among data items. For ex: - {milk, bread} → {butter}

    The itemsets whose frequency of occurrence is greater than or equal to the

    min_support count threshold given by domain experts are the only ones considered to be frequent patterns or frequent itemsets. This threshold value is provided to start

    the pattern discovery process. Thus, ARM is also called frequent pattern

    mining.

    An association or correlation between the items of these frequent itemsets is

    said to be interesting if it satisfies two interestingness measures called support

    and confidence that are used to evaluate the interestingness of an association

    rule.

    Figure 2.3.1 Example Of Association Rule Mining

    For association rule A → B:

    Support = support_count (A ∪ B) / total no. of transactions = 2 / 4 = 50%

    Confidence = support_count (A ∪ B) / support_count (A) = 2 / 3 = 66.6%

    So this rule is considered an interesting one.

    The above example shows that the confidence of rule A → B can be easily

    derived from the support counts of A and A ∪ B. That is, once the support counts of

    A, B and A ∪ B are known, it is straightforward to derive the corresponding

    association rules


    A → B and B → A and check whether they are strong. Thus the problem of mining

    association rules can be reduced to that of mining frequent itemsets.

    Association rule mining can be viewed as a two-step process:

    1. Find all frequent itemsets: - Each of these itemsets will occur at least as

    frequently as a predetermined minimum support count (min_sup).

    2. Generate strong association rules from the frequent itemsets: - These

    rules must satisfy minimum support and minimum confidence and then only they

    are considered to be interesting association rules.
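    Step 1 above can be sketched with a naive level-wise miner in the spirit of Apriori. This is an illustrative simplification of my own (it omits the subset-pruning step of the real algorithm), not the report's implementation:

```python
from itertools import combinations

def frequent_itemsets(transactions, min_sup):
    """Naive level-wise search: keep k-itemsets whose support count >= min_sup,
    then join pairs of them to form (k+1)-candidates, until none remain."""
    items = sorted({i for t in transactions for i in t})
    candidates = [frozenset([i]) for i in items]
    frequent, k = {}, 1
    while candidates:
        # count how many transactions contain each candidate
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_sup}
        frequent.update(level)
        # join frequent k-itemsets into (k+1)-candidates
        candidates = list({a | b for a, b in combinations(level, 2)
                           if len(a | b) == k + 1})
        k += 1
    return frequent

transactions = [{"milk", "bread", "butter"}, {"milk", "bread"},
                {"bread", "butter"}, {"milk", "bread", "butter"}]
result = frequent_itemsets(transactions, min_sup=2)
print(result[frozenset({"milk", "bread", "butter"})])  # 2
```

    Step 2 would then generate rules from each frequent itemset and keep those whose confidence also meets a minimum confidence threshold.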

    Types of Association Rules: The different types of association rules used in operational and relational databases are: -

    1. Quantitative Association rules

    2. Single-Dimensional Association rules

    3. Multi-Dimensional Association rules

    4. Multi level Association rules

    1) Quantitative Association rules: - This approach considers individual

    numerical attributes as quantities, so it is also called a dynamic multidimensional

    association rule.

    Here, Aquan1 ∧ Aquan2 → Acat, where Aquan1 and Aquan2 are tests on

    quantitative attribute intervals (where the intervals are dynamically determined),

    and Acat tests a categorical attribute from the task-relevant data. Such rules have

    been referred to as two-dimensional quantitative association rules. For ex: - age(X,

    "30...39") ∧ income(X, "42K...48K") → buys(X, "HDTV")

    2) Single-Dimensional Association rules: - It consists of only a single

    dimension (predicate), which is used multiple times. For ex: - buys(X,

    "laptop") → buys(X, "HP printer")


    3) Multi-Dimensional Association rules: - Association rules that involve two or

    more dimensions or predicates can be referred to as multidimensional association

    rules.

    For ex: - age(X, "20...29") ∧ occupation(X, "student") → buys(X, "laptop")

    4) Multilevel Association rules: - When data mining is performed at multiple

    levels of abstraction, the rules which are extracted are referred to as multilevel

    association rules. This is done by using a Concept Hierarchy.

    CHAPTER 3

    PROBLEM ANALYSIS

    3.1 PROBLEM DESCRIPTION

    Previous frequent pattern mining algorithms such as Apriori and FP-Growth use a

    minimum support threshold to find frequent itemsets in order to discover

    interesting association rules, and these algorithms are based on the following

    assumptions: -

    The threshold value provided by the domain expert is very accurate.

    The frequent patterns (frequent itemsets) must have occurred frequently at least

    equal to the threshold.

    Because of these assumptions, we have the following disadvantages: -

    Loss of association rules involving frequently observed items.

    Loss of association rules involving infrequently observed items.

    No consideration for negative association rules.

    Loss of Association rules involving Frequently Observed Items :-


    Use of a minimum support threshold assumes that an ideal minimum support

    threshold exists for frequent patterns, and that a user can identify this threshold

    accurately. But it is unclear how to find this threshold as there is no universal

    standard for setting this threshold value. Different minimum support thresholds

    would result in inconsistent mining results, even when the mining process is

    performed on the same data set. i.e. a lower minimum support threshold would

    result in more unnecessary association rules being found, and a higher minimum

    support threshold would result in fewer association rules being found. We consider

    this situation as a case of losing association rules involving frequent items. The

    problem of losing frequent association rules can be solved only by lifting the

    minimum support threshold value.

    Loss of Association Rules involving infrequently Observed Items :-

    Typically, a data set contains items that appear frequently while other items

    rarely occur. For Ex: - In a retail fruit market, fruits are frequently observed but

    occasionally bread is also observed. Some items are rare in nature or infrequently

    found in a data set. These items are called rare items. If a single minimum support

    threshold is used and is set high, those association rules involving rare items will

    not be discovered. Use of a single and lower minimum support threshold, on the

    other hand, would result in too many uninteresting association rules having that

    rare item. This is called the rare item problem.

    No consideration for negative association rules:-

    Algorithms like Apriori and FP Growth do not give importance to the absence of

    items within a transactional data set. They can discover only positive association

    rules.

    For ex: - An association rule like ¬{milk, bread} → ¬{butter}, which tells

    about the absence of both the antecedent and consequent parts of an association rule, is not considered for discovery during the mining process.

    3.2 EXISTING SYSTEM

    The existing system can either implement Apriori algorithm or FP Growth algorithm.

    The main input parameter given to the existing system is the minimum support


    threshold value in order to get frequent itemsets. An association or correlation

    between these frequent itemsets is said to be interesting, if it satisfies two

    interestingness measures called support and confidence, which are used to

    evaluate the interestingness of an association rule. In this way we can discover

    interesting association rules.

    Demerits of existing system are:-

    It is assumed that the domain expert provides an accurate minimum support

    threshold value.

    It is also assumed that the frequent patterns or frequent itemsets have occurred

    at least as frequently as the threshold.

    Negative association rules are not given any importance.

    Loss of association rules involving frequently observed items.

    Loss of association rules involving infrequently observed items.

    3.3 PROPOSED SYSTEM

    A novel framework is proposed which removes the demerits of the existing system

    by removing the need for minimum support threshold value. Here associations are

    discovered based on logical implications. The principle of this approach considers

    that an association rule should be reported only when there is enough logical

    evidence about it in the data set. To do this, we should consider both the presenceand absence of items during the mining process.

    For ex: An association such as A B will be reported only when there are fewer

    occurrences of A B, A B but more occurrences of A B

    Figure 3.3.1 Framework of Association Rules Based On Pseudo Implications


    In the first step, the association rules that are observed in the data set are

    mapped to their implications based on comparison between their support count

    values (i.e. their frequency of occurrence). The implications which are obtained in

    this way are called pseudo implications.

    In the second step, these pseudo implications are mapped to a mode of

    implication called equivalence based on some conditions. The pseudo implications

    which satisfy all these conditions are called pseudo implications of equivalence.

    Only if a pair of pseudo implications satisfy the same conditions do they together

    form a coherent rule.

    Coherent rule: - If a pair of pseudo implications satisfy all four conditions of

    equivalence then those two pseudo implications form a coherent rule; these four

    conditions are: -

    S (X, Y) > S (X, ¬Y)

    S (X, Y) > S (¬X, Y)

    S (¬X, ¬Y) > S (X, ¬Y)

    S (¬X, ¬Y) > S (¬X, Y)
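    These four comparisons can be expressed as a single predicate over the four co-occurrence counts. A sketch under my own naming, not code from the report:

```python
# The four co-occurrence support counts are S(X,Y), S(X,¬Y), S(¬X,Y), S(¬X,¬Y).
# A coherent rule requires the agreeing cells (X with Y, ¬X with ¬Y) to
# strictly dominate both disagreeing cells.
def is_coherent(s_xy, s_x_ny, s_nx_y, s_nx_ny):
    return (s_xy > s_x_ny and       # S(X, Y)   > S(X, ¬Y)
            s_xy > s_nx_y and       # S(X, Y)   > S(¬X, Y)
            s_nx_ny > s_x_ny and    # S(¬X, ¬Y) > S(X, ¬Y)
            s_nx_ny > s_nx_y)       # S(¬X, ¬Y) > S(¬X, Y)

print(is_coherent(3, 1, 2, 95))   # True: all four inequalities hold
print(is_coherent(8, 5, 33, 55))  # False: 8 > 33 fails
```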

    Association rules decoupled from coherent rules are interesting association

    rules as they are related to true implications based only on logic, not on domain

    knowledge. So coherent rules don't need users to preset the minimum support

    threshold to get frequent patterns, as they can be identified via truth table values.

    Pseudo implications: - Association rules mapped to implications based on

    comparison between their support count values are called pseudo implications.

    These implications are called pseudo as they resemble real implications. A

    pseudo implication is judged true or false based on comparison between supports,

    but an implication is judged true or false based on the binary values 1 or 0.

    Mapping association rules to equivalences

    The association rules can be mapped to equivalences in two ways: -

    Mapping an association rule to equivalence using a single transaction record

    Mapping an association rule to equivalence using multiple transaction records

    Mapping an association rule to equivalence using a single transaction

    record

    The steps involved in this process are: -

    Itemsets of an association rule are mapped to propositions of an implication:

    For ex: - Presence of an itemset X is mapped to proposition p = T iff X is

    observed. Absence of an itemset X is mapped to p = F, i.e. ¬p, iff X is not

    observed. Thus, itemsets X and Y are mapped to p = T and q = T iff both X and Y

    are observed.

    Mapping association rules to implications:

    For ex: - An association rule X → ¬Y is mapped to implication p → ¬q, iff

    X is observed but Y is not observed.


    Mapping association rules to equivalences:

    Association rules are mapped to equivalences based on the truth table values

    (T, F, F, T) for the implications p → q, p → ¬q, ¬p → q and ¬p → ¬q.

    For ex: - Association rule X → Y is mapped to p ↔ q iff

    X → Y is true (as p = T and q = T, therefore p ↔ q = T),

    X → ¬Y is false,

    ¬X → Y is false and ¬X → ¬Y is true.

    Mapping an Association rule to equivalence using multiple transaction

    records

    In multiple transaction records, an itemset X is observed in many transaction

    records. So based on comparison between the presence and absence of an itemset,

    each itemset can be mapped to propositions p and q as follows: -

    If S(X) > S(¬X) then itemset X is mapped to p = T. X is considered interesting

    as it is mostly observed in the dataset.

    But if a union of itemsets such as (X, Y) is involved, then X → Y can be mapped

    to an implication p → q only when the union of itemsets (X, Y) is observed more

    in transactions when compared to (X, ¬Y), (¬X, Y) and (¬X, ¬Y).

    Thus, association rules that are mapped to their implications by comparing their

    support count values are called pseudo implications.
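    The presence/absence comparison above can be sketched as follows; the data and names are hypothetical, for illustration only:

```python
# Map an itemset X to a proposition p over multiple transaction records:
# p = T iff X is present in more records than it is absent, i.e. S(X) > S(¬X).
def proposition_for(itemset, transactions):
    present = sum(1 for t in transactions if itemset <= t)
    absent = len(transactions) - present
    return present > absent

transactions = [{"milk", "bread"}, {"milk"},
                {"bread", "butter"}, {"milk", "bread", "butter"}]
print(proposition_for({"milk"}, transactions))    # True: present in 3 of 4
print(proposition_for({"butter"}, transactions))  # False: 2 present vs 2 absent
```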

    These pseudo implications can be mapped to equivalence when following

    conditions are met:-

    S (X, Y) > S (X, ¬Y)

    S (X, Y) > S (¬X, Y)


    Predator (Boolean attribute)

    Toothed (Boolean attribute)

    Backbone (Boolean attribute)

    Breathes (Boolean attribute)

    Venomous (Boolean attribute)

    Fins (Boolean attribute)

    Legs (numeric attribute, integer value range: [0, 2, 4, 5, 6, 8])

    Tail ( Boolean attribute)

    Domestic (Boolean attribute)

    Cat size (Boolean attribute)

    Type (numeric attribute, integer value range: [1, 2, 3, 4, 5, 6, 7], which

    represents each class of animals).

    Table 3.3.1 Total Frequency of Occurrence for Each Class of Animals

    Let the minimum support threshold be 5%, now the classes of animals whose

    frequency of occurrence is greater than minimum support threshold are considered

    to be frequent. And the classes of animals whose frequency of occurrence is less

    than minimum support threshold are considered to be infrequent.

    Thus, the classes of animals: Mammals, Birds, Fishes, Insects and Invertebrates are

    considered to be frequent. And the classes of animals: Amphibians and Reptiles are

    considered to be infrequent.

    Problem with infrequent association rules

    The rules that involve infrequent items, i.e. infrequent classes of animals like

    Amphibians and Reptiles, are not discovered via the Apriori approach, though they are

    interesting, because their support count is less than the minimum support threshold value. But

    by using the Coherent Rule Mining approach we can discover these kinds of association

    rules.

    Let us consider an association rule:-

    {eggs (1), toothed (1), breathes (1), tail (1)} → {Reptile (1)}

    Let, X = {eggs (1), toothed (1), breathes (1), tail (1)}

    And Y = {Reptiles (1)}

    Now the association rule X → Y is reported as an interesting association rule

    only when there is enough logical evidence about it in the data set, i.e. if the

    association rule satisfies all four conditions for equivalence: -

    S (X, Y) > S (X, ¬Y)

    S (X, Y) > S (¬X, Y)

    S (¬X, ¬Y) > S (X, ¬Y)

    S (¬X, ¬Y) > S (¬X, Y)


    Table 3.3.2 This can be shown with the help of the table given below: -

    Frequency of co-occurrences                                | Y = {Reptile(1)} | Not Y = {Reptile(0)} | Total
    Antecedent X = {eggs(1), toothed(1), breathes(1), tail(1)} | 3                | 1                    | 4
    Not X = {eggs(0), toothed(0), breathes(0), tail(0)}        | 2                | 95                   | 97
    Total                                                      | 5                | 96                   | 101

Thus, the table given above shows that:

S(X U Y) > S(X U ~Y) (3 > 1)

S(X U Y) > S(~X U Y) (3 > 2)

S(~X U ~Y) > S(X U ~Y) (95 > 1)

S(~X U ~Y) > S(~X U Y) (95 > 2)

Thus the coherent rule formed is:

{Eggs(1), toothed(1), breathes(1), tail(1)} => {Reptile(1)}

Not {Eggs(1), toothed(1), breathes(1), tail(1)} => Not {Reptile(1)}

Thus, the given association rule is considered to be interesting, as its interestingness is based on pure logic, i.e. it is logically correct. This coherent rule specifies that an animal which lays eggs, has teeth, breathes through lungs and has a tail is a reptile, but an animal which does not have all these four attributes is not a reptile.

Problem with Frequent Association Rules

    Let us consider the comparison between the two approaches: Coherent Rule Mining

    and Apriori.

    Table 3.3.3 Frequent rules found for class Mammal using the two

    approaches

Now, by using the Coherent Rule Mining approach, we get 5 coherent rules, from which we get 10 association rules which are logically correct. But by using the Apriori approach, we get some unnecessary association rules that are not interesting.

By using the Apriori approach we get the association rule:

Domestic(1) => Mammal(1) (support = 7.9%, confidence = 61.5%)

This can be shown through the table given below:

Table 3.3.4 Table for the above given association rule

Frequency of co-occurrences  | Consequent Y = {Mammal(1)} | Not Y = {Mammal(0)} | Total
Antecedent X = {Domestic(1)} | 8                          | 5                   | 13
Not X = {Domestic(0)}        | 33                         | 55                  | 88
Total                        | 41                         | 60                  | 101

The above table shows that the association rule does not satisfy one condition for equivalence:

S(X U Y) < S(~X U Y) (i.e. 8 < 33)

So this association rule is not considered to be interesting, as it does not satisfy the condition for equivalence [i.e. S(X U Y) > S(~X U Y)]. We can also observe that 33 out of 41 mammals, i.e. 80.5% of mammals, are not domestic, but this fact is ignored and a weak association rule like Domestic(1) => Mammal(1) is reported, which, when used in a business application, leads to wrong decisions.
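For reference, the reported support and confidence follow directly from the counts in Table 3.3.4; a small illustrative Java snippet (class and method names are ours):

```java
public class WeakRuleNumbers {
    // support = co-occurrences / total records
    static double support(int cooccur, int total) { return (double) cooccur / total; }
    // confidence = co-occurrences / records matching the antecedent
    static double confidence(int cooccur, int antecedentTotal) { return (double) cooccur / antecedentTotal; }

    public static void main(String[] args) {
        // Counts from Table 3.3.4 for Domestic(1) => Mammal(1)
        System.out.printf("support=%.3f%n", support(8, 101));      // ~0.079, i.e. 7.9%
        System.out.printf("confidence=%.3f%n", confidence(8, 13)); // ~0.615, i.e. 61.5%
        // 33 of the 41 mammals are not domestic, i.e. ~80.5%
        System.out.printf("not-domestic mammals=%.3f%n", 33.0 / 41);
    }
}
```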

    Enhancement to Proposed System


Here we are using the concept of pruning. Pruning is the process of removing the supersets of the item sets that do not satisfy any one of the four conditions for equivalence.

For example, the association rule Domestic(1) => Mammal(1) is not considered to be interesting, as it is logically incorrect. Therefore all its supersets are pruned; this is called the downward closure property.

This downward closure property is used within the Forecast To Prune technique, where we calculate the coherent rule measure H for an item set. By considering an opening window value w% (i.e. the minimum support threshold) we calculate a moving window value mv, where mv = H - (H * w%). If the H value of a superset is not within the range (mv, H) of the item set, then that superset is pruned.

We calculate the coherent rule measure H of an association rule as follows:

H = [min(Cov Y, mCov Y) - min(q1, q2) - min(q3, q4)] / min(Cov Y, mCov Y)

Here Cov Y = q1 + q3, mCov Y = q2 + q4, q1 = S(X U Y), q2 = S(X U ~Y), q3 = S(~X U Y) and q4 = S(~X U ~Y).
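Restated as a small Java sketch (method and variable names are ours; the arithmetic follows the formula above):

```java
public class CoherentRuleMeasure {
    // H = [min(CovY, mCovY) - min(q1,q2) - min(q3,q4)] / min(CovY, mCovY)
    // with CovY = q1 + q3 and mCovY = q2 + q4.
    static double measureH(int q1, int q2, int q3, int q4) {
        int covY = q1 + q3;   // records where Y holds
        int mCovY = q2 + q4;  // records where Y does not hold
        double denom = Math.min(covY, mCovY);
        return (denom - Math.min(q1, q2) - Math.min(q3, q4)) / denom;
    }

    public static void main(String[] args) {
        // Table 3.3.5 counts for Milk(1) => Mammal(1): 41, 0, 0, 60.
        double h = measureH(41, 0, 0, 60);
        double mv = h - h * 0.05; // moving window with w = 5%
        System.out.println(h);    // prints 1.0
        System.out.println(mv);   // prints 0.95
    }
}
```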

Table 3.3.5 Table for the association rule Milk(1) => Mammal(1)

Frequency of co-occurrences | Consequent Y = {Mammal(1)} | Not Y = {Mammal(0)} | Total
Antecedent X = {milk(1)}    | 41                         | 0                   | 41
Not X = {milk(0)}           | 0                          | 60                  | 60
Total                       | 41                         | 60                  | 101

We calculate the coherent rule measure H of the above association rule, as per the table above, as follows:

H = [min(41, 60) - min(41, 0) - min(0, 60)] / min(41, 60) = 41/41 = 1

and mv = H - (H * 5%) = 1 - (1 * 5%) = 1 - 0.05 = 0.95

Now, we consider a superset of the item set {milk(1), mammal(1)}, which is {milk(1), feathers(0), mammal(1)}, whose association rule is:

Milk(1), feathers(0) => Mammal(1)

Its H value is 1, which is within the range (mv, H), i.e. (0.95, 1). Therefore it is not pruned.
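The forecast-to-prune decision itself can be sketched as follows (an illustrative helper, not the project code; the window bounds follow the worked example above):

```java
public class ForecastToPrune {
    // A superset survives only if its H value stays within (mv, H] of its
    // parent item set, where mv = H - H * w (w is the opening window, e.g. 5%).
    static boolean isPruned(double parentH, double w, double supersetH) {
        double mv = parentH - parentH * w;
        return !(supersetH > mv && supersetH <= parentH);
    }

    public static void main(String[] args) {
        // {milk(1), feathers(0)} => Mammal(1) has H = 1, window (0.95, 1]: kept.
        System.out.println(isPruned(1.0, 0.05, 1.0));  // prints false (not pruned)
        System.out.println(isPruned(1.0, 0.05, 0.90)); // prints true (pruned)
    }
}
```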


    CHAPTER 4

    SYSTEM STUDY

    4.1 Feasibility Study

    4.1.1 Technical Feasibility

Evaluating the technical feasibility is the trickiest part of a feasibility study. This is because, at this point in time, not many detailed designs of the system exist, making it difficult to assess issues like performance and costs (on account of the kind of technology to be deployed). A number of issues have to be considered while doing a technical analysis.

i) Understand the different technologies involved in the proposed system:

Before commencing the project, we have to be very clear about the technologies that are required for the development of the new system.

ii) Find out whether the organization currently possesses the required technologies:

Is the required technology available with the organization? If so, is the capacity sufficient? For instance, will the current printer be able to handle the new reports and forms required for the new system?

    4.1.2 Operational Feasibility

A proposed project is beneficial only if it can be turned into an information system that will meet the organization's operating requirements. Simply stated, this test of feasibility asks if the system will work when it is developed and installed. Are there major barriers to implementation? Here are questions that will help test the operational feasibility of a project.

Is there sufficient support for the project from management and from users? If the current system is well liked and used to the extent that persons will not be able to see reasons for change, there may be resistance. Are the current business methods acceptable to the users? If they are not, users may welcome a change that will bring about a more operational and useful system. Have the users been involved in the planning and development of the project? Early involvement reduces the chances of resistance to the system in general and increases the likelihood of a successful project. Since the proposed system was to help reduce the hardships encountered in the existing manual system, the new system was considered to be operationally feasible.

    4.1.3 Economical Feasibility

Economic feasibility attempts to weigh the costs of developing and implementing a new system against the benefits that would accrue from having the new system in place. This feasibility study gives the top management the economic justification for the new system.

A simple economic analysis which gives the actual comparison of costs and benefits is much more meaningful in this case. In addition, it proves to be a useful point of reference to compare actual costs as the project progresses. There could be various types of intangible benefits on account of automation. These could include increased customer satisfaction, improvement in product quality, better decision making, timeliness of information, expediting of activities, improved accuracy of operations, better documentation and record keeping, faster retrieval of information, and better employee morale.

The system is completely based on the Model View Controller architecture. This architecture defines a pattern in which the three individual components work together. The model is concerned with all the business logic, the view with the user interface design, and the controller transfers data between the model and the view.


In our system the view is designed using the Swing components provided with the Java programming language. The model and controller are developed using pure core Java classes.

The following block diagram shows the MVC architecture.

    CHAPTER 5

REQUIREMENT ANALYSIS

    5.1 Functional Requirements

    Inputs:

    The input to the system will be a dataset. Zoo dataset has taken

    as an input dataset in this project. The inputs will be as follows.

    Select type of animal:

    Select one animal type among the given seven animal types.

    Processing

    The input data i.e. zoo data is processed by the model.

    Output

The output will be coherent rules which satisfy propositional logic.

    Performance requirements


    Due to the high scope of the software, the performance

    requirements are high. The speed at which the software is

    required to operate is nominal.

    Error message design

The design of error messages is an important part of the user interface design. As the user is bound to commit some error or other while using the system, the system should be designed to be helpful by providing the user with information regarding the error he/she has committed.

    Error detection:

Even though every effort is made to avoid the occurrence of errors, a small portion of errors is always likely to occur. These types of errors can be discovered by using validations to check the input data.

The system is designed to be a user-friendly one. In other words, the system has been designed to communicate effectively with the user. The system has been designed with buttons.

    5.2 Non Functional Requirements

    The major non-functional Requirements of the system are as follows

    Usability

The system is designed as a completely automated process; hence there is little or no user intervention.

    Reliability

The system is more reliable because of the qualities that are inherited from the chosen platform, Java. Code built using Java is more reliable.

    Performance


This system is developed in a high-level language using advanced front-end and back-end technologies; it will give a response to the end user on the client system within very little time.

    Supportability

The system is designed to be cross-platform supportable. The system is supported on a wide range of hardware and on any software platform which has a JVM built into it.

    5.3 Hardware Requirements

    The hardware used for the development of the project is:

    PROCESSOR : A CPU with CORE2duo

    RAM : 2 GB RAM

MONITOR : 17" COLOR

    HARD DISK : 80 GB

    5.4 Software Requirements

    The software used for the development of the project is:

    OPERATING SYSTEM : ANY OS

    USER INTERFACE : AWT AND SWINGS

    PROGRAMMING LANGUAGE : JAVA

    IDE/WORKBENCH : MY ECLIPSE 6.0

    CHAPTER 6

    SYSTEM DESIGN


Design is a multi-step process that focuses on data structures, software architecture, procedural details (algorithms etc.) and interfaces between modules. The design process also translates the requirements into a presentation of the software that can be assessed for quality before coding begins.

Computer software design changes continuously as new methods, better analysis and broader understanding evolve. Software design is at a relatively early stage in its evolution.

Therefore, software design methodology lacks the depth, flexibility and quantitative nature that are normally associated with more classical engineering disciplines. However, techniques for software design do exist, criteria for design quality are available and design notation can be applied.

    6.1 Modules:

    The system after careful analysis has been identified to be presented with the

    following modules:

The modules involved are:

User Interface Module

Mapping Association Rules

Deriving Coherent Rules from Mapped Association Rules

    6.1.1 User Interface Module:

A rich user interface is developed in order to select the type of animal from the drop-down list, with a button for generating coherent rules of that type.

    6.1.2 Mapping Association rule

In this module we derive the approach of mapping an association rule to an equivalence. A complete mapping between the two is realized in three progressive steps. Each step depends on the success of the previous step. In the first step, item sets are mapped to propositions in an implication. Item sets can be either observed or not observed in an association rule. Similarly, a proposition can either be true or false in an implication. Analogously, the presence of an item set can be mapped to a true proposition because this item set can be observed in transactional records.
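A minimal sketch of this first mapping step (the helper below is ours, not the project code): the presence of an item set in a transaction record corresponds to a true proposition, and its absence to a false one.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class PropositionMapping {
    // First mapping step: an item set observed in a record maps to a true
    // proposition; an item set not (fully) observed maps to a false one.
    static boolean asProposition(Set<String> record, Set<String> itemSet) {
        return record.containsAll(itemSet);
    }

    public static void main(String[] args) {
        Set<String> record = new HashSet<>(Arrays.asList("eggs", "toothed", "tail"));
        Set<String> x = new HashSet<>(Arrays.asList("eggs", "tail"));
        Set<String> y = new HashSet<>(Arrays.asList("milk"));
        System.out.println(asProposition(record, x)); // prints true
        System.out.println(asProposition(record, y)); // prints false
    }
}
```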

    6.1.3 Deriving Coherent Rules from mapped association rules

The pseudo implications of equivalence can be further defined into a concept called coherent rules. We highlight that not all pseudo implications of equivalence can be created using item sets X and Y. Nonetheless, if one pseudo implication of equivalence can be created, then another pseudo implication of equivalence also coexists. Two pseudo implications of equivalence always exist as a pair because they are created based on, and share, the same conditions. Coherent rules meet the necessary and sufficient conditions and have the truth table values of logical equivalence. By definition, a coherent rule consists of a pair of pseudo implications of equivalence that have higher support values compared to the other two pseudo implications of equivalence. Each pseudo implication of equivalence is an association rule with the additional property that it can be mapped to a logical equivalence.


    6.2 Module Diagrams:

    6.2.1 UML Diagrams

    Use Case Diagram


    Sequence Diagram:


    Activity Diagram:


    6.3 Algorithm used: Search Algorithm

We propose to search for coherent rules by exploiting the anti-monotone property found on the condition S(X, Y) > S(~X, Y), targeting a preselected consequent item set Y.
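A rough in-memory sketch of this pruning condition (our own illustrative helper; the project's implementation works against the database instead): candidate item sets X that fail S(X, Y) > S(~X, Y) for the preselected consequent Y need not be extended.

```java
import java.util.List;
import java.util.Set;

public class AntiMonotoneSketch {
    // S(X, Y) when xPresent is true, or S(~X, Y) when false;
    // here ~X means X is not fully observed in the record.
    static int support(List<Set<String>> records, Set<String> x, boolean xPresent, String y) {
        int n = 0;
        for (Set<String> r : records) {
            boolean hasX = r.containsAll(x);
            if ((xPresent ? hasX : !hasX) && r.contains(y)) n++;
        }
        return n;
    }

    // Extend X with further items only while S(X,Y) > S(~X,Y) holds.
    static boolean worthExtending(List<Set<String>> records, Set<String> x, String y) {
        return support(records, x, true, y) > support(records, x, false, y);
    }
}
```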

    6.3.1 Distinct Features of ChSearch

We list some features of ChSearch compared to Apriori. Unlike Apriori, ChSearch:

Does not require a preset minimum support threshold. ChSearch does not require a preset minimum support threshold to find association rules. Coherent rules are found based on mapping to logical equivalences. From the coherent rules, we can decouple the pair into two pseudo implications of equivalence. The latter can be used as association rules with the property that each rule can be further mapped to a logical equivalence.

Does not need to generate frequent item sets. ChSearch does not need to generate frequent item sets, nor does it need to generate the association rules within each item set. Instead, ChSearch finds coherent rules directly. Coherent rules are found within the small number of candidate coherent rules allowed through its constraints.

Identifies negative association rules. ChSearch, by default, also identifies negative association rules. Given a set of transaction records that does not indicate item absence, Apriori cannot identify negative association rules. ChSearch finds the negative pseudo implications of equivalence and uses them to complement both the positive and negative rules found.

    6.3.2 Quality of Logic-Based Association Rules

Coherent rules are defined based on logic. This improves the quality of the association rules discovered because there are no missing association rules due to threshold setting. A user can discover all association rules that are logically correct without having to know the domain knowledge. This is fundamental to various application domains. For example, one can discover the relations in a retail business without having to study the possible relations among items. Any association rule that is not captured by coherent rules can be denied its importance. These rules are either in contradiction with others (among the positive and negative association rules) or less stringent compared to the definition of logical equivalences.

As an example, consider that a non-logic-based association rule is found within 100 transaction records between item i1 and item i2 with confidence at 75 percent and support at 30 percent. This association rule is not important if the absence of the same item i1 (i.e. ~i1) is found associated with item i2 with a higher confidence at 85 percent and a higher support at 51 percent. Without further analysis, the first discovery misleads decision makers to conclude that item i1 is associated with item i2, whereas the relation having item ~i1 is, in fact, stronger. Coherent rules avoid this problem altogether, based on logic.


    CHAPTER 6

    IMPLEMENTATION

Implementation is the most crucial stage in achieving a successful system and giving the users confidence that the new system is workable and effective. Here it means the implementation of a modified application to replace an existing one. This type of conversion is relatively easy to handle, provided there are no major changes in the system.

Each program was tested individually at the time of development using test data, and it has been verified that the programs link together in the way specified in the program specifications. The computer system and its environment were tested to the satisfaction of the user. The system that has been developed is accepted and proved to be satisfactory for the user, and so the system is going to be implemented very soon. A simple operating procedure is included so that the user can understand the different functions clearly and quickly.


Initially, as a first step, the executable form of the application is to be created and loaded on the common server machine which is accessible to all the users, and the server is to be connected to a network. The final stage is to document the entire system, which provides the components and the operating procedures of the system.

    6.1 SCREEN SHOTS


    Fig.1.Animal Table

Screen description: the above figure represents the table that is used in this project, which is used for retrieval and comparison of attributes.


    Fig.2. Main Window

Screen description: the above figure represents the main window of this project, through which we can select the animal type, i.e. mammal, reptile, etc.


    Fig.3. Selecting Mammal Type

Screen description: the above figure represents the mammal type selected from the dropdown list.


    Fig.4. Coherent rules generated for Mammal type

    Screen description: the above figure represents the output generated (Coherent

    rules) for the mammal type.


    6.2 SAMPLE CODE

OrderedPowerSet.java

    package coherent;

    import java.util.ArrayList;

    import java.util.Iterator;

    import java.util.StringTokenizer;

    public class OrderedPowerSet {

    private ArrayList list=new ArrayList();

    Iterator it1=null;

    ArrayList result=new ArrayList();

    public ArrayList getSet(String[] src)

    {

    result.add(" ");

int source[]=new int[src.length];

// loop body reconstructed (lost in the OCR source): seed the working list
// with 1-based item indices and add each single item to the result
for(int var=0;var<src.length;var++)
{
source[var]=var+1;
list.add(String.valueOf(source[var]));
result.add(src[var]);
}

Iterator it1=list.iterator();

    while(it1.hasNext())

    {

    ArrayList list1=getSetList(list,source,src);

    list=new ArrayList();

    list=list1;

    list1=new ArrayList();

    it1.next();

    }

    return result;

    }

    public ArrayList getSetList(ArrayList list,int[]

    source,String[] src)

    {

    ArrayList res=new ArrayList();

    it1=list.iterator();

    while(it1.hasNext())

    {

    String s=(String)it1.next();

    String ss=s;

    int x=Integer.parseInt(getLastToken(s,","));

// loop header reconstructed (lost in the OCR source)
for(int i=x;i<source.length;i++)
{

String s1=ss+","+source[i];

    res.add(s1);

    addToResult(src,s1);

    }

    }

    return res;

    }

    public void addToResult(String src[],String str)

    {

    StringTokenizer st=new StringTokenizer(str);

    StringBuffer sb=new StringBuffer();

    while(st.hasMoreTokens())

    {

    int loc=Integer.parseInt(st.nextToken(","));

    if(st.hasMoreTokens())

    {

    sb=sb.append(src[loc-1]+",");

    }

    else

    {

    sb=sb.append(src[loc-1]);

    }

    }

    String r=new String(sb);


    result.add(r);

    }

    private String getLastToken(String strValue,String token )

    {

    String strlttoken = null;

    String []strArray = strValue.split(token);

    strlttoken = strArray[strArray.length-1];

    return strlttoken;

    }

    }

PowerSet.java

    package coherent;

import java.io.FileWriter;
import java.io.IOException;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.*;
import java.util.logging.Level;
import java.util.logging.Logger;

    public class PowerSet {


    ArrayList list;

    int values[]=new int[4];

    public void getSet(String args[],String filename) throws IOException {

    list=new ArrayList();

    OrderedPowerSet ops=new OrderedPowerSet();

    list=ops.getSet(args);

    FileWriter fw=new FileWriter(filename);

    Iterator itr = list.iterator();

    StringBuffer s=new StringBuffer();

    int powercount=0;

    while(itr.hasNext())

    {

    powercount+=1;

    String item=itr.next().toString().replace("{","[").replace("}","]");

    s=s.append("{"+item+"},");

    }

    String result=new String(s);

    fw.write(result);

    fw.flush();

    fw.close();

    System.out.println(powercount);

System.out.println("With "+filename+", PowerSet is generated in the current working directory");

    s=new StringBuffer();


    setList(list);

    result=null;

    }

    int len;

    Connection con=null;

    Statement stmt=null;

    CompareTable ct = new CompareTable();

public void doCalculation(ArrayList list1,ArrayList list2,int selectedatr) throws SQLException{

    ct.connectionEstablish();

    Iterator it2=list2.iterator();

    String qryatr2=new String();

    while(it2.hasNext()){

    String s=it2.next().toString();

    s=s.replace('[', ' ');

    s=s.replace(']', ' ');

    s=s.trim();

    if(!s.equals("")){

    qryatr2=new String(s);

    }

    }

    DBConnection db=new DBConnection();

    int q1=0,q2=0,q3=0,q4=0;

    try {


    con = db.getConnection();

    stmt=con.createStatement();

    } catch (ClassNotFoundException ex) {

    Logger.getLogger(PowerSet.class.getName()).log(Level.SEVERE, null, ex);

    }

    int totalcount=0;

    Iterator it1=list1.iterator();

    int lines=0;

    while(it1.hasNext())

    {

    String s=it1.next().toString();

    s=s.replace('[', ' ');

    s=s.replace(']',' ');

    s=s.trim();

    if(!s.equals(""))

    {

    StringTokenizer st=new StringTokenizer(s,",");

    int i=0;

    len=0;

    while(st.hasMoreTokens()){

    st.nextToken();

    len=len+1;

    }

    StringTokenizer st1=new StringTokenizer(s,",");


    String qryatrs1[]=new String[len];

    boolean legbo=false;

    while(st1.hasMoreTokens())

    {

    qryatrs1[i]=st1.nextToken().trim();

    if(qryatrs1[i].equals("LEGS"))

    {

    legbo=true;

    }

    i=i+1;

    }

    int legatr[]={0,2,4,5,6,8};

    if(legbo)

    {

// loop and query construction reconstructed (lost in the OCR source);
// the helper name prepareQueryForLeg is assumed, mirroring displayOutput1ForLeg
for(int j=0;j<legatr.length;j++)
{
String qry1=ct.prepareQueryForLeg(qryatrs1, qryatr2, 1, selectedatr, true, true, legatr[j]);
String qry2=ct.prepareQueryForLeg(qryatrs1, qryatr2, 1, selectedatr, true, false, legatr[j]);
String qry3=ct.prepareQueryForLeg(qryatrs1, qryatr2, 1, selectedatr, false, true, legatr[j]);
String qry4=ct.prepareQueryForLeg(qryatrs1, qryatr2, 1, selectedatr, false, false, legatr[j]);
String qry=qry1+" UNION ALL "+qry2+" UNION ALL "+qry3+" UNION ALL "+qry4;
ResultSet rs=stmt.executeQuery(qry);
rs.next();

    q1=rs.getInt(1);

    rs.next();

    q2=rs.getInt(1);

    rs.next();

    q3=rs.getInt(1);

    rs.next();

    q4=rs.getInt(1);

    if(((q1>q2)&&(q1>q3))&&((q4>q2)&&(q4>q3)))

    {

    totalcount+=1;

    String rel1=ct.displayOutput1ForLeg(qryatrs1,qryatr2,legatr[j]);

    System.out.println(q1+" "+q2+" "+q3+" "+q4);

    System.out.println(rel1);

    values[0]=q1;

    values[1]=q2;

    values[2]=q3;

    values[3]=q4;

    this.setValues(values);

    this.setResult(rel1);

    }

    }

    }


    else

    {

    String qry1=ct.prepareQuery(qryatrs1, qryatr2, 1, selectedatr,true,true);

    String qry2=ct.prepareQuery(qryatrs1, qryatr2, 1, selectedatr, true, false);

    String qry3=ct.prepareQuery(qryatrs1, qryatr2, 1, selectedatr, false, true);

    String qry4=ct.prepareQuery(qryatrs1, qryatr2, 1, selectedatr, false, false);

    String qry=qry1+" UNION ALL "+qry2+" UNION ALL "+qry3+" UNION ALL "+qry4;

    ResultSet rs=stmt.executeQuery(qry);

    rs.next();

    q1=rs.getInt(1);

    rs.next();

    q2=rs.getInt(1);

    rs.next();

    q3=rs.getInt(1);

    rs.next();

    q4=rs.getInt(1);

    if(((q1>q2)&&(q1>q3))&&((q4>q2)&&(q4>q3)))

    {

    String re1=ct.displayOutput1(qryatrs1,qryatr2);

    System.out.println(q1+" "+q2+" "+q3+" "+q4);

    System.out.println(re1);

    values[0]=q1;

    values[1]=q2;

    values[2]=q3;


    values[3]=q4;

    this.setValues(values);

    this.setResult(re1);

    this.setX(qryatrs1);

    this.setY(qryatr2);

    totalcount+=1;

    }

    }

    }

    }

    System.out.println("Total Count = "+totalcount);

    ct.closeConnection();

    }

    public void setList(ArrayList list)

    {

    this.list=list;

    }

    public ArrayList getList()

    {

    return list;

    }

    public void setValues(int[] values)

    {

    this.values=values;


    }

    public int[] getValues()

    {

    return values;

    }

    private String[] x;

    private String y;

    private String result;

    public String getResult() {

    return result;

    }

    public void setResult(String result) {

    this.result = result;

    }

    public String[] getX() {

    return x;

    }

    public void setX(String[] x) {

    this.x = x;

    }

    public String getY() {

    return y;

    }


    public void setY(String y) {

    this.y = y;

    }

    }

CompareTable.java

package coherent;

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class CompareTable {

    Connection con=null;

    Statement stmt=null;

    ResultSet rs=null;

    CompareTable()

    {

    }

    public void connectionEstablish() throws SQLException

    {

    try {

    con = DBConnection.getConnection();

    } catch (ClassNotFoundException ex) {

    }

    }

    public void closeConnection() throws SQLException


    {

    con.close();

    }

public String prepareQuery(String atr1[],String atr2,int i,int j,boolean b1,boolean b2) throws SQLException

    {

    StringBuffer sb=new StringBuffer();

    if(!b1)

    {

    sb=sb.append("NOT(");

    }

    else

    {

    sb=sb.append("(");

    }

for(int k=0;k<atr1.length;k++)
{
if(k>0)

    {

    sb=sb.append(" and ");

    }

    sb=sb.append(atr1[k]+"="+i);

    }

    sb=sb.append(")");


    if(b2)

    {

    sb=sb.append(" and (TYPE="+j+")");

    }

    else

    {

    sb=sb.append(" and NOT(TYPE="+j+")");

    }

    String str=new String(sb);

    String qry="select count(*) from animal where "+str;

    return qry;

    }

    public String displayOutput1(String atr1[],String atr2)

    {

    StringBuffer s=new StringBuffer();

    s=s.append("{");

// loop bodies below reconstructed (lost in the OCR source)
for(int i=0;i<atr1.length;i++)
{
if(i>0)
{
s=s.append(",");
}
s=s.append(atr1[i]+"(1)");
}
s=s.append(" } ");
s=s.append("==> { "+atr2+"(1) }\n");
s=s.append("Not{ ");
for(int i=0;i<atr1.length;i++)
{
if(i>0)
{
s=s.append(",");
}
s=s.append(atr1[i]+"(1)");
}
s=s.append(" } ");
s=s.append("==> Not{ "+atr2+"(1) }");
return new String(s);
}

// signature reconstructed; the name prepareQueryForLeg is assumed,
// mirroring displayOutput1ForLeg used elsewhere in the project
public String prepareQueryForLeg(String atr1[],String atr2,int i,int j,boolean b1,boolean b2,int legatr) throws SQLException
{
StringBuffer sb=new StringBuffer();

    if(!b1)

    {

    sb=sb.append("NOT(");

    }

    else

    {

    sb=sb.append("(");

    }

for(int k=0;k<atr1.length;k++)
{
if(k>0)

    {

    sb=sb.append(" and ");

    }

    if(atr1[k].equals("LEGS"))

    {

    sb=sb.append(atr1[k]+"="+legatr);

    }

    else

    {

    sb=sb.append(atr1[k]+"="+i);

    }

    }

    sb=sb.append(")");


    if(b2)

    {

    sb=sb.append(" and (TYPE="+j+")");

    }

    else

    {

    sb=sb.append(" and NOT(TYPE="+j+")");

    }

    String str=new String(sb);

    String qry="select count(*) from animal where "+str;

    return qry;

    }

    }


    CHAPTER 7

    SCOPE FOR FUTURE DEVELOPMENT

    Every application has its own merits and demerits. The project has covered

    almost all the requirements. Further requirements and improvements can easily be

    done since the coding is mainly structured or modular in nature. Changing the

    existing modules or adding new modules can append improvements.

    Further enhancements:

Further enhancements can be made to the application so that the windows application functions in a more attractive and useful manner than the present one. We applied the logic on the zoo dataset. We can apply the same logic to any transaction dataset by making slight modifications as well.


    CHAPTER 8

    CONCLUSION

    We used mapping to logical equivalences according to propositional logic to

    discover all interesting association rules without loss. These association rules

    include item sets that are frequently and infrequently observed in a set of

    transaction records. In addition to a complete set of rules being considered, these

    association rules can also be reasoned as logical implications because they inherit

    propositional logic properties. Having considered infrequent items, as well as being

    implicational, these newly discovered association rules are distinguished from

    typical association rules. These new association rules reduce the risks associated

with using an incomplete set of association rules for decision making, as follows:

    Our new set of association rules avoids reporting that item A is associated

    with item B if there is a stronger association between item A and the absence

    of item B. Using prior association rules that do not consider this situation

    could lead a user to erroneous conclusions about the relationships among

    items in a data set. Again, identifying the strongest rule among the same

    items will promote information correctness and appropriate decision making.

    The risks associated with incomplete rules are reduced fundamentally

    because our association rules are created without the user having to identify

    a minimum support threshold. Among the large number of association rules,

    only those that can be mapped to logical equivalences according to

    propositional logic are considered interesting and reported.
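To make the threshold-free discovery described above concrete, the following self-contained sketch (the data and all names are hypothetical) fills all four contingency-table cells in a single pass over the records and reports a candidate only if it passes the equivalence test, so no minimum support threshold ever prunes an infrequent candidate.

```java
public class ThresholdFreeSketch {

    // A candidate X => TYPE=j is kept only when both the rule and its
    // contrapositive dominate the two contradicting quadrants.
    static boolean isCoherent(int q1, int q2, int q3, int q4) {
        return q1 > q2 && q1 > q3 && q4 > q2 && q4 > q3;
    }

    public static void main(String[] args) {
        // Hypothetical zoo-style records: whether the animal has hair, and its type.
        boolean[] hasHair = {true, true, true, false, false, true};
        int[] type        = {1,    1,    1,    2,     2,     2};
        int j = 1; // candidate consequent: TYPE=1

        // One pass fills the four contingency-table cells; no support
        // threshold discards any candidate beforehand.
        int q1 = 0, q2 = 0, q3 = 0, q4 = 0;
        for (int i = 0; i < type.length; i++) {
            if (hasHair[i] && type[i] == j) q1++;      //  X and  Y
            else if (hasHair[i]) q2++;                 //  X and not Y
            else if (type[i] == j) q3++;               // not X and  Y
            else q4++;                                 // not X and not Y
        }
        System.out.println(isCoherent(q1, q2, q3, q4)); // prints true (counts 3,1,0,2)
    }
}
```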


    CHAPTER 9

    BIBLIOGRAPHY

    Books:

    1. Java 2: The Complete Reference, Herbert Schildt.

    2. Software Engineering: A Practitioner's Approach, 6th Edition, Roger S.

    Pressman, Tata McGraw-Hill.

    3. Software Testing: Principles and Practices, Srinivasan Desikan and

    Gopalaswami Ramesh, Pearson Education, India.

    4. The Unified Modeling Language User Guide, 2nd Edition, Grady Booch,

    James Rumbaugh, and Ivar Jacobson, for UML concepts and models.

    References:

    Alex Tze Hiang Sim, Maria Indrawan, Samar Zutshi, and Bala Srinivasan,

    "Logic-Based Pattern Discovery."

    R. Agrawal, T. Imielinski, and A. Swami, "Mining Association Rules between

    Sets of Items in Large Databases."

    A.T.H. Sim, M. Indrawan, and B. Srinivasan, "Mining Infrequent and Interesting

    Rules from Transaction Records."

    S. Brin, R. Motwani, J.D. Ullman, and S. Tsur, "Dynamic Itemset Counting and

    Implication Rules for Market Basket Data."

    X. Wu, C. Zhang, and S. Zhang, "Mining Both Positive and Negative Association

    Rules."


    CHAPTER 10

    APPENDIX

    10.1 List of Symbols

    S.NO  SYMBOL NAME         DESCRIPTION

    1     Class               Classes represent a collection of similar entities grouped together.

    2     Association         Association represents a static relationship between classes.

    3     Aggregation         Aggregation is a form of association; it aggregates several classes into a single class.

    4     Actor               Actors are the users of the system and other external entities that interact with the system.

    5     Use Case            A use case is an interaction between the system and the external environment.

    6     Relation (Uses)     It is used for additional process communication.

    7     Communication       It is the communication between various use cases.

    8     State               It represents the state of a process; each state goes through various flows.

    9     Initial State       It represents the initial state of the object.

    10    Final State         It represents the final state of the object.

    11    Control Flow        It represents the various control flows between the states.

    12    Decision Box        It represents the decision-making process from a constraint.

    13    Node                Deployment diagrams use nodes to represent physical modules, which are collections of components.

    14    Data Process/State  A circle in a DFD represents a state or process triggered by some event or action.

    15    External Entity     It represents any external entity, such as a keyboard or sensors, used in the system.

    16    Transition          It represents any communication that occurs between the processes.

    17    Object Lifeline     Object lifelines represent the vertical dimension along which objects communicate.

    18    Message             It represents the messages exchanged.


    10.2 List of Abbreviations

    S.NO ABBREVIATION DESCRIPTION

    1 DFD Data Flow Diagram

    2 API Application Programming Interface

    3 UML Unified Modeling Language

    4 GUI Graphical User Interface

    5 IDE Integrated Development Environment

    6 LBPD Logic based Pattern Discovery

    7 AR Association Rule

    8 PD Pattern Discovery