
How to Test Big Data Systems


The definition of Big Data

| Big Data is perceived as a huge amount of data and information

| However, it is a lot more than this

| Big Data may be said to be a whole set of approaches, tools and methods of processing large volumes of unstructured as well as structured data

| Big Data is defined by three parameters

| These describe how you have to process an enormous amount of data (Volume) in different formats (Variety) at different rates (Velocity)


The three parameters on which Big Data is defined: Volume, Variety, and Velocity


Testing Big Data can be quite a challenge for organizations

| Traditional analysis techniques have certain limitations

| The difficulty owes largely to the complexity of dealing with such large sets of data

| Testing Big Data is especially challenging for organizations that have little knowledge of what to test and how to test it

| There are certain basic aspects of Big Data processing

| On that basis, further testing procedures can be determined


Aspects of Big Data Testing


Risk of failing

| Failure in Big Data Testing can have negative consequences; it may result in:

| Production of poor-quality data

| Delays in testing

| Increased cost of testing

| Big Data Testing can be performed in two ways: functional and non-functional testing

| Strong test data and test environment management is required to ensure error-free processing of data


Functional Testing

| Functional Testing is performed in three stages:

| Pre-Hadoop Process Testing

| MapReduce Process Validation

| Extract-Transform-Load Process Validation and Report Testing


Pre-Hadoop Process Testing

| HDFS stands for Hadoop Distributed File System

| HDFS lets you store huge amounts of data on a cluster of machines

| When data is extracted from various sources such as web logs, social media, and RDBMSs, and uploaded into HDFS, an initial stage of testing is carried out
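As a concrete illustration, a minimal sketch of the upload step using the Hadoop FileSystem API follows; the staging and HDFS paths are hypothetical placeholders, and the cluster is assumed to be reachable through the default configuration:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsUpload {
    public static void main(String[] args) throws Exception {
        // Connect to the cluster named by fs.defaultFS (core-site.xml)
        FileSystem fs = FileSystem.get(new Configuration());
        // Copy an extracted source file (e.g. a web log) into HDFS;
        // both paths are hypothetical placeholders
        fs.copyFromLocalFile(new Path("/staging/weblogs/day01.log"),
                             new Path("/data/weblogs/day01.log"));
        fs.close();
    }
}
```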


Initial stage of Testing

| Verification that the data acquired from the original source is not corrupted

| Validation that the data files were uploaded into the correct HDFS location

| Checking the file partitioning and the copying of files to different data nodes

| Determination of the complete set of data to be checked

| Verification that the source data remains in sync with the data uploaded into HDFS (a sketch of these checks follows below)
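A minimal sketch of the corruption, location, and partitioning checks, assuming the same hypothetical paths as the upload example; an MD5 comparison via Apache Commons Codec stands in for whatever corruption check the project actually uses:

```java
import org.apache.commons.codec.digest.DigestUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

public class HdfsUploadCheck {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path hdfsFile = new Path("/data/weblogs/day01.log");            // placeholder
        java.nio.file.Path localFile = Paths.get("/staging/weblogs/day01.log");

        // Check 1: the file landed in the expected HDFS location
        if (!fs.exists(hdfsFile)) {
            throw new IllegalStateException("File missing from target HDFS location");
        }

        // Check 2: the uploaded bytes match the source (corruption / sync check)
        String localMd5, hdfsMd5;
        try (InputStream in = Files.newInputStream(localFile)) {
            localMd5 = DigestUtils.md5Hex(in);
        }
        try (InputStream in = fs.open(hdfsFile)) {
            hdfsMd5 = DigestUtils.md5Hex(in);
        }
        if (!localMd5.equals(hdfsMd5)) {
            throw new IllegalStateException("Source and HDFS copies are out of sync");
        }

        // Check 3: report how the file was partitioned into blocks
        long len = fs.getFileStatus(hdfsFile).getLen();
        long blockSize = fs.getDefaultBlockSize(hdfsFile);
        System.out.println("Verified " + len + " bytes in ~"
                + ((len + blockSize - 1) / blockSize) + " blocks");
    }
}
```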


MapReduce Process Validation

| MapReduce is a data processing model used to condense massive volumes of data into compact, aggregated results. Validating it involves:

| Testing of business logic, first on a single node and then on multiple nodes

| Validation of the MapReduce process to ensure the correct generation of the “key-value” pair

| Validation of the aggregation and consolidation of data after the “reduce” operation

| Comparison of the generated output with the input files to make sure the output meets all the requirements (a unit-test sketch follows below)
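A minimal sketch of single-node business-logic testing, using Apache MRUnit as one common choice for driving mappers and reducers in isolation; WordCountMapper and WordCountReducer are hypothetical stand-ins for the job under test:

```java
import java.util.Arrays;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.apache.hadoop.mrunit.mapreduce.ReduceDriver;
import org.junit.Test;

public class MapReduceValidationTest {

    @Test
    public void mapperEmitsExpectedKeyValuePairs() throws Exception {
        // Validates correct generation of the "key-value" pair on one node
        MapDriver.newMapDriver(new WordCountMapper())
                 .withInput(new LongWritable(0), new Text("big data big"))
                 .withOutput(new Text("big"), new IntWritable(1))
                 .withOutput(new Text("data"), new IntWritable(1))
                 .withOutput(new Text("big"), new IntWritable(1))
                 .runTest();
    }

    @Test
    public void reducerAggregatesAndConsolidates() throws Exception {
        // Validates aggregation of values after the "reduce" operation
        ReduceDriver.newReduceDriver(new WordCountReducer())
                    .withInput(new Text("big"),
                               Arrays.asList(new IntWritable(1), new IntWritable(1)))
                    .withOutput(new Text("big"), new IntWritable(2))
                    .runTest();
    }
}
```

Running the same logic end-to-end on a multi-node cluster then covers the second half of this stage.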


Extract-Transform-Load Process Validation and Report Testing

| ETL stands for Extract, Transform, Load. This is the last stage of testing in the queue, where data generated by the previous stage is first unloaded and then loaded into the downstream repository system, i.e. the Enterprise Data Warehouse (EDW), where reports are generated or a transactional system analysis is done for further processing.


Purposes of ETL Process Validation & Report Testing

| To check the correct application of transformation rules

| Inspection of data aggregation to ensure the data is loaded into the target system without distortion

| To ensure there is no data corruption, by comparing the target data with the HDFS file system data

| Validation that reports include the required data and that all indicators are displayed correctly (a reconciliation sketch follows below)
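A minimal sketch of one such reconciliation check, assuming the transformed data lands in an EDW table reachable over JDBC; the HDFS output path, JDBC URL, credentials, and table name are illustrative placeholders:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class EtlReconciliationCheck {
    public static void main(String[] args) throws Exception {
        // Count records in the HDFS output produced by the MapReduce stage
        FileSystem fs = FileSystem.get(new Configuration());
        long hdfsRows = 0;
        try (BufferedReader r = new BufferedReader(new InputStreamReader(
                fs.open(new Path("/output/sales/part-r-00000"))))) {
            while (r.readLine() != null) hdfsRows++;
        }

        // Count the rows actually loaded into the EDW target table
        long edwRows;
        try (Connection c = DriverManager.getConnection(
                     "jdbc:postgresql://edw-host/dw", "tester", "secret");
             Statement s = c.createStatement();
             ResultSet rs = s.executeQuery("SELECT COUNT(*) FROM fact_sales")) {
            rs.next();
            edwRows = rs.getLong(1);
        }

        // Any mismatch indicates data loss or distortion during the load
        if (hdfsRows != edwRows) {
            throw new IllegalStateException(
                "Row-count mismatch: HDFS=" + hdfsRows + " EDW=" + edwRows);
        }
        System.out.println("HDFS and EDW row counts match: " + hdfsRows);
    }
}
```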


Non-Functional Testing

| Hadoop processes large volumes of data of varying formats at varying speeds

| Hence it becomes imperative to perform architectural testing of Big Data systems to ensure the success of the projects in question

| This non-functional testing is performed in two ways:

| 1) Performance Testing

| 2) Failover Testing


Performance Testing

| Performance Testing covers:

| Job completion time

| Memory utilization

| Data throughput of Big Data systems

| The main objective of performance testing is not restricted to verifying application performance

| It also aims to improve the performance of the Big Data system as a whole


Performance Testing Process

| Obtain the performance metrics of Big Data systems, e.g. response time, maximum data processing capacity, and speed of data consumption

| Determine the conditions that cause performance problems, i.e. assess performance-limiting conditions

| Verification of the speed at which MapReduce operations (sorts, merges) are executed

| Verification of how data is stored at different nodes

| Test JVM parameters such as heap size, garbage collection algorithms, etc.

| Test the values for connection timeout, query timeout, etc. (a timing-harness sketch follows below)
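A minimal sketch of a timing harness around a standard MapReduce job; the JVM options, timeout value, paths, and the WordCountMapper/WordCountReducer classes are illustrative assumptions, not prescribed settings:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.TaskCounter;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class JobTimingHarness {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // JVM parameters under test: heap size and GC algorithm
        conf.set("mapreduce.map.java.opts", "-Xmx2048m -XX:+UseG1GC");
        // Timeout under test: how long a task may go silent before it is killed
        conf.set("mapreduce.task.timeout", "600000");

        Job job = Job.getInstance(conf, "perf-test-run");
        job.setJarByClass(JobTimingHarness.class);
        job.setMapperClass(WordCountMapper.class);    // placeholder job logic
        job.setReducerClass(WordCountReducer.class);  // placeholder job logic
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path("/data/weblogs"));
        FileOutputFormat.setOutputPath(job, new Path("/output/perf-run"));

        // Job completion time is the headline metric
        long start = System.currentTimeMillis();
        boolean ok = job.waitForCompletion(true);
        long elapsedMs = System.currentTimeMillis() - start;

        // Derive data-consumption speed from the job's own counters
        long records = job.getCounters()
                          .findCounter(TaskCounter.MAP_INPUT_RECORDS).getValue();
        System.out.printf("success=%b time=%d ms rate=%.0f records/s%n",
                ok, elapsedMs, records * 1000.0 / Math.max(elapsedMs, 1));
    }
}
```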


Failover Testing

| Failover testing is done to verify seamless processing of data in case of failure of data nodes

| It validates the recovery process and the processing of data when switched to other data nodes

| Two types of metrics are observed during this testing (an RTO measurement sketch follows below):

| 1) Recovery Time Objective (RTO): how quickly the system resumes processing data after a failure

| 2) Recovery Point Objective (RPO): how much data the system can afford to lose when a failure occurs
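A minimal sketch of measuring RTO, assuming a data node is taken down by external tooling while the harness polls a canary read until HDFS serves the block from a remaining replica; the canary path is a placeholder:

```java
import java.io.InputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FailoverTimer {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path canary = new Path("/test/canary.dat");   // placeholder test file

        long failureInjectedAt = System.currentTimeMillis();
        // ... a DataNode is killed here by external tooling ...

        while (true) {
            try (InputStream in = fs.open(canary)) {
                in.read();   // succeeds once another replica serves the block
                break;
            } catch (Exception notYetRecovered) {
                Thread.sleep(1000);   // keep polling until recovery
            }
        }
        long rtoMs = System.currentTimeMillis() - failureInjectedAt;
        System.out.println("Observed recovery time (RTO): " + rtoMs + " ms");
    }
}
```

RPO is assessed separately, by comparing the data available after recovery with the data ingested before the failure.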


Big Data Testing Process


Conclusion

| Many big firms, including cloud enablers and various project management platforms, are using Big Data

| The main challenge faced by such organizations today is how to test Big Data and how to improve the performance and processing power of Big Data systems

| The testing described above is performed to ensure all is working well: the data extracted and processed is undistorted and in sync with the original data

| Big Data processing could be batch, real-time or interactive

| Hence, when dealing with such huge amounts of data, Big Data testing becomes imperative as well as inevitable

www.QualiTestGroup.com

Thank You!