Query Processing

Author: goodnesskani
Posted on 26-Oct-2014

1.1 OBJECTIVE : The main objective is to compute approximate answers, in less time, for systems facing dynamic failure. Query processing is carried out over a peer-to-peer (P2P) network.

1.2 EXISTING SYSTEM : The existing system uses a structured P2P network with a Distributed Hash Table (DHT). Exact query processing is possible in the existing system, but it has certain disadvantages. A structured network is organized so that data items are located at specific nodes, and those nodes maintain state information to enable efficient retrieval. Structured networks are not efficient or flexible enough for applications where nodes join or leave the network frequently. A sequential algorithm is used, which increases latency. Since nodes are selected sequentially, if any node gets disconnected the exact answer is not received, and identifying the disconnected node becomes tedious and sometimes impossible.
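For contrast, the key-to-node mapping a structured DHT provides can be sketched as follows (a minimal Python illustration with invented peer names, not the project's code; the actual system is implemented in C#.NET). It also shows why unannounced departures break exact lookup:

```python
import hashlib
from bisect import bisect_left

def h(value: str) -> int:
    """Hash a string onto a small DHT identifier ring."""
    return int(hashlib.sha1(value.encode()).hexdigest(), 16) % 2**16

class DHTRing:
    """Toy structured overlay: each key lives at its successor node."""
    def __init__(self, nodes):
        self.ring = sorted((h(n), n) for n in nodes)

    def lookup(self, key: str) -> str:
        ids = [nid for nid, _ in self.ring]
        i = bisect_left(ids, h(key)) % len(self.ring)  # wrap around the ring
        return self.ring[i][1]

    def leave(self, node):
        # A node that departs without notice takes its keys with it
        # until the overlay repairs itself.
        self.ring = [(nid, n) for nid, n in self.ring if n != node]

ring = DHTRing(["peer1", "peer2", "peer3", "peer4"])
owner = ring.lookup("item42")   # deterministic placement: exact answers possible
ring.leave(owner)               # churn: the same lookup now misses that node
assert ring.lookup("item42") != owner
```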


1.3 PROPOSED SYSTEM : The proposed system uses an unstructured P2P network. No assumptions are made about the location of data items on the nodes. Nodes can join at random times and depart without prior notification. We use approximate query processing to reduce latency, which is the aim of the project, and the system can run while nodes are dynamically added and removed.
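The idea of approximate answering can be illustrated with a small sketch (Python with made-up numbers; the project itself is built in C#.NET): estimate an aggregate from a uniform random sample and scale it up, trading a small error for much less work.

```python
import random

def approximate_sum(population, sample_size, rng=random):
    """Estimate sum(population) by scaling up the sum of a uniform sample."""
    sample = rng.sample(population, sample_size)
    return len(population) * sum(sample) / sample_size

random.seed(7)
tuples = list(range(1, 10_001))   # pretend these are tuples spread over peers
exact = sum(tuples)
estimate = approximate_sum(tuples, 500)
# the estimate lands close to the exact answer at a fraction of the cost
assert abs(estimate - exact) / exact < 0.1
```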





2.1 SYSTEM ANALYSIS : As P2P systems mature beyond file-sharing applications and are deployed in increasingly sophisticated e-business and scientific environments, the vast amount of data within P2P databases poses a challenge that has not been adequately researched. Aggregation queries have potential applications in decision support, data analysis, and data mining. For example, millions of peers across the world may be cooperating on a grand experiment in astronomy, and astronomers may be interested in queries that require aggregating vast amounts of data covering thousands of peers.

There is real-world value for aggregation queries in Intrusion Detection Systems and in application signature analysis in P2P networks.

2.2 SYSTEM REQUIREMENTS :

2.2.1 HARDWARE REQUIREMENTS :
Hard disk       : 40 GB
RAM             : 512 MB
Processor       : Pentium IV
Processor speed : 3.00 GHz


2.2.2 SOFTWARE REQUIREMENTS :
Front end   : VS .NET 2005
Code behind : C#.NET
Back end    : SQL Server 2000

2.3 FLOW CHART : The data flow diagrams (fig. 2.1) explain the working of the project in detail. After the registration and login process, the query is issued from the query node. Internally, a node is selected at random, and the data is retrieved from it and stored.



[Fig. 2.1 Data flow diagram: Start -> login at the query node -> Peer Lister -> SQL Server connected peers -> two-phase sampling -> visited nodes (active peers) / unvisited nodes (inactive peers) -> random walk over the active nodes -> passing aggregate rules -> select table (Product or Order Details) -> calculate the probability of active nodes and prove the result of inactive nodes -> generate the report.]


3. MODULE DESCRIPTION : This project uses six modules: Sign In, Peerlister, Activepeers, Aggregation, Viewtable, and Report.

Sign In: Users register with their passwords here. Then, using the login form, users can enter the query processing unit.

Peerlister: Peerlister lists the peers that are connected to the query node.

[Figure 3.1: Login -> Peer Lister -> Peers 1-5; a failed login returns a login error.]


Activepeers: This module obtains all the peers that are connected to the SQL server. All SQL-server-connected peers form the group of visited peers; the remaining peers are maintained as unvisited peers. After this, the two-phase sampling is carried out.

[Figure 3.2: Peer Lister -> SQL Server connected peers / disconnected peers; two-phase sampling performs a random walk from the connected nodes and segments it into two phases.]
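The two-phase sampling step can be sketched as follows (a simplified Python illustration, not the project's exact algorithm): phase one visits a few peers to estimate how many tuples each holds; phase two spends the sampling budget across the visited peers in proportion to those estimated sizes.

```python
import random

def two_phase_sample(peers, pilot_visits, budget, rng=random):
    """peers: dict mapping peer name -> list of tuples held by that peer."""
    # Phase 1: pilot visits to estimate peer sizes.
    pilot = rng.sample(list(peers), min(pilot_visits, len(peers)))
    sizes = {p: len(peers[p]) for p in pilot}
    total = sum(sizes.values())
    # Phase 2: allocate the tuple budget proportionally to estimated size.
    sample = []
    for p, n in sizes.items():
        k = max(1, round(budget * n / total))
        sample.extend(rng.sample(peers[p], min(k, n)))
    return sample

random.seed(1)
peers = {f"peer{i}": list(range(i * 100)) for i in range(1, 6)}
s = two_phase_sample(peers, pilot_visits=3, budget=30)
assert 0 < len(s) <= 40   # roughly the budget, up to rounding
```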


Aggregation (Process): The aggregate rules are passed to the table selected in the Northwind database, for the peers among the visited nodes.
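As an illustration of the aggregate rule a visited peer might evaluate locally, here is a Python sketch using an in-memory SQLite stand-in with an invented table and rows (the project's actual back end is SQL Server 2000 with the Northwind database):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE order_details (orderid INT, productid INT, quantity INT);
    INSERT INTO order_details VALUES (1, 10, 5), (1, 11, 3), (2, 10, 7);
""")
# The aggregate rule passed to the visited peer: a SUM over the chosen table.
(total,) = conn.execute("SELECT SUM(quantity) FROM order_details").fetchone()
# partial results from each visited peer are later combined at the query node
assert total == 15
```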

Viewtable: This module enables us to view the tables, and their respective fields, in any database.

Report: The report presents the two phases and the visited and unvisited peers in a chart representation, and records the response time of each peer.





P2P systems are becoming very popular because they provide an efficient mechanism for building large scalable systems.

Recent work has developed powerful techniques for employing sampling in the database engine to approximate aggregation queries and to estimate database statistics.

Recent techniques have focused on providing formal foundations and algorithms for block-level sampling and are thus most relevant to our work. The objective in block-level sampling is to derive a representative sample by randomly selecting a set of disk blocks of a relation.
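Block-level sampling as described can be sketched like this (an illustrative Python fragment, not code from the cited work): whole disk blocks are chosen at random, and every tuple in a chosen block enters the sample.

```python
import random

def block_level_sample(relation, block_size, num_blocks, rng=random):
    """Sample whole blocks of tuples rather than individual tuples."""
    blocks = [relation[i:i + block_size]
              for i in range(0, len(relation), block_size)]
    chosen = rng.sample(blocks, min(num_blocks, len(blocks)))
    return [t for block in chosen for t in block]

random.seed(3)
relation = list(range(1000))               # 1000 tuples, 100 per block
sample = block_level_sample(relation, block_size=100, num_blocks=3)
assert len(sample) == 300
```

Tuples within one block are often correlated, which is exactly the quality concern the challenges section below raises for sampling many tuples from a single peer.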


GOAL OF THE PROJECT : Given an aggregation query at a query node, compute, with minimum cost, an approximate answer with the least error.

4.3 CHALLENGES FACED : Picking even a set of uniformly random peers is a difficult problem, as the query node does not have the Internet Protocol (IP) addresses of all peers in the network. This is a well-known problem that other researchers have tackled (in different contexts) using random-walk techniques on the P2P graph. That is, a Markovian random walk is initiated from the query node, picking adjacent peers to visit with equal probability; under certain connectivity properties, the walk is expected to reach a stationary distribution rapidly. If the graph is badly clustered with small cuts, the speed at which the walk converges suffers. Even if we could select a peer (or a set of peers) uniformly at random, selecting a uniform random set of tuples is not much easier. Visiting a peer has an associated overhead, so it makes sense to select multiple tuples at random from that peer during the same visit; however, this may compromise the quality of the final set of tuples retrieved, as tuples within the same peer are likely to be correlated.
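The Markovian random walk described above can be sketched in a few lines (Python, over a toy adjacency list; the real overlay is discovered at runtime):

```python
import random

def random_walk(graph, start, steps, rng=random):
    """Markovian walk: from each peer, move to a uniformly random neighbor."""
    node = start
    for _ in range(steps):
        node = rng.choice(graph[node])
    return node

random.seed(5)
# small undirected overlay as an adjacency list
graph = {
    "A": ["B", "C"], "B": ["A", "C", "D"],
    "C": ["A", "B", "D"], "D": ["B", "C"],
}
# after enough steps, the visit distribution approaches the stationary one
# (proportional to node degree for an undirected graph)
end = random_walk(graph, "A", steps=50)
assert end in graph
```

On a badly clustered graph with small cuts, the walk lingers inside one cluster for a long time, which is the convergence problem noted above.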


THE PEER-TO-PEER MODEL : Each peer p is identified by its processor's IP address and a port number (IPp and portp). The peer p is also characterized by the capabilities of the processor on which it is located, including its CPU speed (pcpu), memory bandwidth (pmem), and disk space (pdisk). The node also has a limited amount of network bandwidth, say pband.

In unstructured P2P networks, a node becomes a member of the network by establishing a connection with at least one peer currently in the network. Each node maintains a small number of connections with its peers.
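The peer model above can be written down directly as a record (a Python sketch; field names follow the text, and the sample values are invented):

```python
from dataclasses import dataclass

@dataclass
class Peer:
    """A peer p, identified by (IPp, portp) and characterized by the
    resources of the processor it runs on, per the model above."""
    ip: str            # IPp
    port: int          # portp
    cpu: float         # pcpu, CPU speed
    mem: float         # pmem, memory bandwidth
    disk: float        # pdisk, disk space
    band: float        # pband, network bandwidth
    neighbors: tuple = ()   # connections to peers currently in the network

p = Peer(ip="10.0.0.1", port=9000, cpu=3.0, mem=4.0, disk=40.0, band=1.0)
assert (p.ip, p.port) == ("10.0.0.1", 9000)
```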


QUERY COST MEASURE : The primary cost measure that we consider is latency: the time it takes to propagate the query across multiple peers and receive replies at the query node.
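With latency as the cost measure, the cost of visiting peers sequentially can be modeled very simply (an illustrative sketch with hypothetical per-hop delays, not a formula given in the source):

```python
def query_latency(hop_delays):
    """Latency of a sequential walk: each hop's delay is paid on the way
    out, and the reply returns along the same path (processing ignored)."""
    outward = sum(hop_delays)
    return outward * 2          # out and back

# hypothetical per-hop delays in milliseconds
assert query_latency([20, 35, 15]) == 140
```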




5. SYSTEM ENVIRONMENT

5.1 FRONT END USED : Microsoft Visual Studio .NET is used as the front-end tool, for the following reasons.

FEATURES OF MICROSOFT VISUAL STUDIO .NET : Visual Studio .NET is flexible, allowing one or more languages to interoperate in a single solution. This cross-language compatibility lets us build projects faster.

Visual Studio .NET has the Common Language Runtime, which allows all components to converge into one intermediate format and then interact.

Visual Studio .NET provides strong security when an application is executed on the system.

Visual Studio .NET lets us configure the working environment to best suit our individual style. We can choose between single and multiple document interfaces, and we can adjust the size and positioning of the various IDE elements.

Visual Studio .NET has intelligent features that make coding easy, and Dynamic Help keeps coding time low. The working environment is often referred to as an Integrated Development Environment (IDE) because it integrates many different functions, such as design, editing, compiling, and debugging, within a common environment.

After creating a Visual Studio .NET application, we can freely distribute it to anyone who uses Microsoft Windows: on disk, on CDs, across networks, or over an intranet or the Internet. Toolbars provide quick access to commonly used commands in the programming environment; clicking a toolbar button carries out the action it represents. By default, the Standard toolbar is displayed at startup; additional toolbars for editing, form design, and debugging can be toggled on or off from the Toolbars command on the View menu. Many parts of Visual Studio are context sensitive, meaning we can get help on them directly without going through the Help menu. For example, to get help on any keyword in the Visual Basic language, place the insertion point on that keyword in the code window and press F1. Visual Studio interprets our code as we enter it, catching and highlighting most syntax or spelling errors on the fly. It is almost like having an expert watching over our shoulder as we enter our code.

5.2 BACK END USED : Microsoft SQL Server 2000 is used as the back-end tool, for the following reasons:

FEATURES OF SQL SERVER 2000 : The OLAP Services feature available in SQL Server version 7.0 is now called SQL Server 2000 Analysis Services; the term OLAP Services has been replaced with Analysis Services, which also includes a new data mining component. The Repository component available in SQL Server version 7.0 is now called Microsoft SQL Server 2000 Meta Data Services; references to the component now use the term Meta Data Services, and the term repository is used only for the repository engine within Meta Data Services. The database consists of several types of objects: 1. TABLE 2. QUERY 3. FORM 4. REPORT 5. MACRO


1) TABLE: A table is a collection of data about a specific topic. We can view a table in two ways:

a) Design View: To build or modify the structure of a table, we work in the table design view, where we specify what kind of data the table will hold.

b) Datasheet View: To add, edit, or analyze the data itself, we work in the table's datasheet view.

2) QUERY: A query is a question asked to get the required data. Access gathers the data that answers the question from one or more tables. The data that makes up the answer is either a dynaset (which can be edited) or a snapshot (which cannot be edited). Each time we run a query, we get the latest information in the dynaset, which is either displayed for viewing or used to perform an action, such as deleting or updating.



3) FORM: A form is used to view and edit information in a database record. A form displays only the information we want to see, in the way we want to see it, using familiar controls such as textboxes and checkboxes; this makes viewing and entering data easy. We can work with forms in several views, primarily two:

a) Design View: To build or modify the structure of a form, we work in the form's design view. We can add controls to the form that are bound to fields in a table or query, including textboxes, option buttons, graphs, and pictures.

b) Form View: The form view displays the whole design of the form.

4) REPORT: A report is used to view and print information from the database. A report can group records into many levels and compute totals and averages by checking values across many records at once. Reports can also be made attractive and distinctive, because we have control over their size and appearance.



5) MACRO: A macro is a set of actions, each of which does something, such as opening a form or printing a report. We write macros to automate common tasks, making work easier and saving time.



6. SYSTEM TESTING & MAINTENANCE

6.1 TESTING : Testing is done for each module. After all modules are tested, they are integrated and the final system is tested with test data specially designed to show that the system will operate successfully in all conditions.

6.1.1 SYSTEM TESTING: Procedure-level testing is done first. By giving improper inputs, the errors that occur are noted and eliminated. System testing is thus a confirmation that everything is correct and an opportunity to show the user that the system works. The final step involves validation testing, which determines whether the software functions as the user expects; the end user, rather than the system developer, conducts this test. Most software developers use a process called alpha and beta testing to uncover the problems that only the end user seems able to find. This is the final step in the system life cycle: the tested, error-free system is deployed into a real-life environment, running in an online fashion, and necessary changes are made. System maintenance is then done every month or year, based on company policies, checking for problems such as runtime errors and long-run errors, along with other maintenance tasks such as table verification and reports.

6.1.2 UNIT TESTING: Unit testing verifies the smallest unit of software design, the module; it is also known as module testing. The modules are tested separately, during the programming stage itself. In these testing steps, each module is found to work satisfactorily with regard to its expected output.

6.1.3 INTEGRATION TESTING: Integration testing is a systematic technique for constructing tests to uncover errors associated with interfaces. In this project, all the modules are combined and then the entire program is tested as a whole. All errors uncovered in the integration-testing step are corrected before the next testing steps.

6.1.4 VALIDATION TESTING: Validation testing uncovers functional errors, i.e., it checks whether the functional characteristics conform to the specification.


SYSTEM MAINTENANCE : The objective of maintenance work is to make sure that the system keeps working at all times, without any bugs. Provision must be made for environmental changes that may affect the computer or software system; this is called maintenance of the system. Nowadays there is rapid change in the software world, and the system should be capable of adapting to these changes, so maintenance plays a vital role. The system should be designed to accommodate all new changes without affecting its performance or accuracy.



7. CONCLUSION & FUTURE ENHANCEMENT

7.1 CONCLUSION : Our approach requires a minimal number of communications over the network and provides tunable parameters to maximize performance for various network topologies. It provides a powerful technique for approximating aggregates across various topologies and data clusterings, but it has limitations that depend on a given topology's structure and connectivity.


For topologies with very distinct clusters of peers, it becomes increasingly difficult to obtain accurate random samples, owing to the inability of the random-walk process to quickly reach all clusters.

7.2 FUTURE ENHANCEMENT : Approximate query processing may be enhanced to exact query processing, which at present poses many difficulties because an unstructured network is used instead of a structured one, and because of congestion, high latency, and the difficulty posed by nodes frequently joining or leaving the network without prior notification.


The query approximation technique used in this project decreases latency, which is weighted as the major consideration, at some cost in accuracy.





[Screen shots: Two Phase Sampling; Random Nodes; View Table & Fields; Aggregation Rules; Error Rate.]


TABLES :

Table 1: register (aqp database)
Column Name   Data Type   Length
uname         varchar     50
upwd          varchar     50

Table 2: peers (aqp database)
Column Name   Data Type   Length
pid           int         4
peername      varchar     50

Table 3: visitpeers (aqp database)
Column Name   Data Type   Length
vpid          int         4
npname        varchar     50

Table 4: unvisitpeers (aqp database)
Column Name   Data Type   Length
vpid          int         4
npname        varchar     50

Table 5: revisit (aqp database)
Column Name   Data Type   Length
vpid          int         4
vpname        varchar     50
res           varchar     50
stime         varchar     50
etime         varchar     50
resptime      varchar     50

Table 6: apValue (aqp database)
Column Name   Data Type   Length
pid           int         4
vpname        varchar     50
prob          nvarchar    50
fname         varchar     50
aggregate     varchar     50

Table 7: errorrate (aqp database)
Column Name   Data Type   Length
prob          varchar     50
sprob         varchar     50
err           varchar     50
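The schemas above can be expressed directly as DDL; the sketch below recreates them in an in-memory SQLite database for illustration (SQL Server 2000 syntax is essentially identical for tables this simple):

```python
import sqlite3

DDL = """
CREATE TABLE register    (uname VARCHAR(50), upwd VARCHAR(50));
CREATE TABLE peers       (pid INT, peername VARCHAR(50));
CREATE TABLE visitpeers  (vpid INT, npname VARCHAR(50));
CREATE TABLE unvisitpeers(vpid INT, npname VARCHAR(50));
CREATE TABLE revisit     (vpid INT, vpname VARCHAR(50), res VARCHAR(50),
                          stime VARCHAR(50), etime VARCHAR(50),
                          resptime VARCHAR(50));
CREATE TABLE apValue     (pid INT, vpname VARCHAR(50), prob NVARCHAR(50),
                          fname VARCHAR(50), aggregate VARCHAR(50));
CREATE TABLE errorrate   (prob VARCHAR(50), sprob VARCHAR(50), err VARCHAR(50));
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
# confirm all seven tables were created
tables = {r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'")}
assert "register" in tables and len(tables) == 7
```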

