Ramu - Hadoop/Spark Developer with 6 years of experience
Ramu | Mobile: +91-9866407641
Email: [email protected]
SUMMARY:
Over 6 years of professional IT experience, including 3+ years in the Big Data and Hadoop ecosystem along with Spark.
Experience in requirement gathering, designing, developing, testing, implementing and maintaining systems.
Experience in all phases of the Software Development Life Cycle (SDLC).
Expertise in Hadoop architecture and its various components: Hadoop Distributed File System (HDFS), MapReduce, NameNode, DataNode, JobTracker, TaskTracker, Secondary NameNode and YARN.
Expertise in developing and implementing big data solutions and data mining applications on Hadoop using Hive/Hive2, Pig, Spark, Sqoop, Impala, HUE and Oozie workflows.
Expertise in working with Hue (the Hadoop user interface), used for project development and Hadoop testing.
POC experience in Spark with single RDDs, pair RDDs and DStreams.
Extensive expertise in extracting data from, and loading data to, various sources including Oracle, MS SQL Server, Teradata, flat files and XML files.
Expertise in the Talend ETL data integration tool.
Extensive expertise in developing XSDs and XSLTs, and preparing XML files conforming to the XSD, to parse the XML data into flat files for processing into HDFS.
Developed Avro schemas to create Avro and Parquet tables in Hive using the Avro schema URL.
Good experience working with SerDes for Avro- and Parquet-format data.
Good experience in developing reports using Hive queries and Hive UDFs, and in preparing Pig scripts and Pig UDFs for analytics per client requirements.
Good experience in writing UNIX shell scripts.
Extensive expertise in Hadoop support and maintenance.
Experience with data analysis; able to implement complex and sophisticated SQL logic.
Develops and reviews project plans, identifies and resolves issues, and communicates the status of assigned projects to users and managers.
Expertise in Microsoft Azure HDInsight, including creating Hadoop clusters dynamically on top of Microsoft Azure and developing projects per client requirements.
Knowledge of IBM BigInsights: analytic applications, the IBM big data platform, accelerators and information integration.
Excellent understanding of the Hadoop MapReduce programming paradigm.
Good knowledge of Hadoop cluster administration, and of monitoring and managing Hadoop clusters and infrastructure using Cloudera Manager.
Experience in understanding and managing Hadoop log files.
Experience in importing and exporting data using Sqoop and Flume.
Good knowledge of job workflow scheduling and monitoring tools such as Oozie.
Experience in developing Hadoop integrations for data ingestion, data mapping and data processing capabilities.
Experience in performing offline analysis of large data sets using components from the Hadoop ecosystem.
Sound knowledge of the NoSQL database HBase and its architecture.
Good knowledge of Impala.
Experience in retrieving data from Business Objects universes, personal data files, stored procedures and RDBMSs, and in creating complex, sophisticated reports with multiple data providers and drill-down and slice-and-dice features using Business Objects.
Excellent problem-solving and communication skills.
Experience in integrating various data sources such as SQL Server, Oracle, Teradata, flat files, DB2 and mainframes.
Developed multiple proofs of concept to justify the viability of the ETL solution, including performance and compliance with non-functional requirements.
Conducted Hadoop training workshops for development teams as well as directors and management to increase awareness.
Prepared presentations of solutions to Big Data/Hadoop business cases and presented them to company directors to get the go-ahead on implementation.
Designed the end-to-end ETL flow for a feed with millions of records flowing in daily, using the Apache tools/frameworks Hive, Pig and Sqoop for the entire ETL workflow.
Set up the Hadoop cluster; built Hadoop expertise across the development, production support and testing teams; enabled production support functions; and optimized Hadoop cluster performance in isolation as well as in the context of production workloads/jobs.
Highly motivated to work independently, take responsibility and prioritize multiple tasks.
TECHNICAL SKILLS:
Big Data Ecosystems: Hadoop (MapReduce, HDFS), YARN, Zookeeper, Pig, Hive, Sqoop, Flume, Spark, Hue, Oozie
ETL Tools: Informatica
Databases: Oracle, SQL Server, MySQL, Teradata, NoSQL, DB2
Software Tools: SQL*Plus, Toad, SQL*Loader
Programming Languages: XML, REXX, COBOL, JCL, PL/I
Operating Systems: Linux, Windows
Reporting Tools: SAP BO
Experience:
Organization Designation Duration
Infosys India Pvt Ltd    Technology Analyst    (April 2014 to date)
IBM India Pvt Ltd    System Engineer    (July 2010 to March 2014)
Projects Profile:
1. Project Name: DDSW (Dealer Data Staging Warehouse)
Client Caterpillar
Role Hadoop Developer
Organization Infosys Pvt Ltd, India
Duration Sep 2014 to date
Environment Distribution: Cloudera Hadoop Distribution
Components: HDFS, MapReduce, YARN
Ecosystem: Hive, Pig, Impala
Middleware: Sqoop
Scripting: UNIX shell scripting and Ruby scripting
Workflow: Oozie
HDFS user interface: Hue (Hadoop User Experience)
Database: Oracle
ETL: DataStage
Source file: XML data
Project Description:
Caterpillar’s business model originates from a guide, issued in the 1920s, that established territory relationships with a number of Dealer affiliates. These largely autonomous relationships allowed the Dealers to develop their own models for tracking important data, such as customers and inventory that relate to local market conditions, including government regulation and customary business practices.
The Dealer Data Staging Warehouse (DDSW) platform stages the data received from Caterpillar’s Dealers and prepares them for consumption for a wide variety of uses, such as customer portal services, analytics for equipment monitoring, parts pricing, and customer lead generation, and other emerging applications.
The DDSW project is an ETL pipeline between per-dealer inbound data and a per-domain dataset. DDSW is charged with accepting, validating, transforming, securing, and exposing Dealer data for consumption by various Caterpillar consumers.
Consumer access to all dealer data is constrained by a View, which omits data to which a Consumer should not have access. Access rules are maintained by a matrix of Domains and Dealers permitted to each Consumer, which in turn informs the configuration of Views.
Contribution:
Understanding business needs, analysing functional specifications and mapping those to develop HQLs and Pig Latin scripts.
Designed XSDs and XSLs to parse the XML structure file into pipe-delimited format, to facilitate effective querying of the data.
Hands-on experience with Pig and Hive user-defined functions (UDFs).
Executed Hadoop ecosystem jobs and applications through Apache Hue.
Prepared the Avro schema structure to create the Hive tables, and also created the Parquet-format tables to process the pipe-delimited data.
Feasibility analysis (for the deliverables): evaluating the feasibility of the requirements against complexity and timelines.
Involved in creating folders for the code, lib and data, including the Avro schema, to execute the project in a properly structured manner.
Wrote Pig scripts for data analysis; the end result is processed to HDFS.
Implemented Hive tables and HQL queries for the reports.
Wrote and used complex data types in Hive; stored and retrieved data using HQL in Hive.
Developed Hive queries to analyse reducer output data.
Developed Pig Latin scripts to extract data from the source system.
Involved in extracting data from Hive and loading it into an RDBMS using Sqoop.
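The XSD/XSLT-driven XML-to-pipe-delimited parsing described above can be sketched in plain Python using only the standard library; the element and field names here are invented for illustration and are not the project's actual schema.

```python
# Minimal sketch of flattening XML records into pipe-delimited lines for HDFS.
# Element names (dealer, record, part_no, qty) are hypothetical placeholders.
import xml.etree.ElementTree as ET

SAMPLE = """
<dealer>
  <record><part_no>AB-100</part_no><qty>4</qty></record>
  <record><part_no>CD-200</part_no><qty>7</qty></record>
</dealer>
"""

def xml_to_pipe_delimited(xml_text, fields=("part_no", "qty")):
    """Flatten each <record> element into one pipe-delimited line."""
    root = ET.fromstring(xml_text)
    lines = []
    for rec in root.iter("record"):
        # findtext returns "" for a missing field, keeping columns aligned
        lines.append("|".join(rec.findtext(f, default="") for f in fields))
    return "\n".join(lines)

print(xml_to_pipe_delimited(SAMPLE))
# Each output line ("AB-100|4", "CD-200|7") is then ready to land in HDFS
```

In the project itself this transformation was done with XSD/XSLT; the sketch only shows the shape of the output the Hive and Pig jobs would consume.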
Spark Mini project (POC):
2. Project Name: DDSW (Dealer Data Staging Warehouse)
Client Caterpillar
Role Hadoop Developer
Organization Infosys Pvt Ltd, India
Duration Feb 2015– till date
Environment Distribution: Cloudera Hadoop Distribution
Components: HDFS, MapReduce, YARN
API: Scala
Ecosystem: Hive, Impala and Spark
Middleware: Sqoop
Scripting: Shell, Python and Ruby
Workflow: Oozie
Scheduling: Hue (Hadoop User Experience)
Database: Oracle
ETL: DataStage
Source file: XML data
Project Description:
The Dealer Data Staging Warehouse (DDSW) platform stages the data received from Caterpillar’s Dealers and prepares them for consumption for a wide variety of uses, such as customer portal services, analytics for equipment monitoring, parts pricing, and customer lead generation, and other emerging applications.
The DDSW project is an ETL pipeline between per-dealer inbound data and a per-domain dataset. DDSW is charged with accepting, validating, transforming, securing, and exposing Dealer data for consumption by various Caterpillar consumers.
Consumer access to all dealer data is constrained by a View, which omits data to which a Consumer should not have access. Access rules are maintained by a matrix of Domains and Dealers permitted to each Consumer, which in turn informs the configuration of Views.
Contribution:
Understanding business needs, analysing functional specifications and mapping those to develop Spark SQL.
Designed XSDs and XSLs to parse the XML structure file into a pipe-delimited text file, to facilitate effective querying of the data.
Created file-based RDDs from that text file after parsing the XML, and also converted the same RDD to DataFrames to compare the performance of the existing MapReduce methodology against the Spark methodology.
Created Avro- and Parquet-format tables and stored the data in Avro and Parquet format.
Developed and implemented Spark SQL queries by connecting to the Hive tables to process the data, and created views used to build reports in Tableau.
Executed Hadoop ecosystem jobs and applications through Apache Hue.
Feasibility analysis (for the deliverables): evaluating the feasibility of the requirements against complexity and timelines.
Involved in creating folders for the code, lib and data, including the Avro schema, to execute the project in a properly structured manner.
Wrote Pig scripts for data analysis; the end result is processed to HDFS.
Developed RDDs that processed the data in Spark, and wrote Pig scripts to analyse the data per client requirements.
Involved in extracting data from Hive and loading it into the Oracle RDBMS using Sqoop.
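The Avro/Parquet table pattern described above can be sketched in HiveQL; the table names, HDFS location, and schema URL below are hypothetical placeholders, not the project's actual definitions.

```sql
-- Hypothetical sketch of an Avro-backed Hive table driven by a schema URL,
-- plus a Parquet copy for faster columnar scans. All names are illustrative.
CREATE EXTERNAL TABLE dealer_inventory_avro
STORED AS AVRO
LOCATION '/data/ddsw/dealer_inventory'
TBLPROPERTIES ('avro.schema.url' = 'hdfs:///schemas/dealer_inventory.avsc');

-- Materialize the same data in Parquet format
CREATE TABLE dealer_inventory_parquet STORED AS PARQUET
AS SELECT * FROM dealer_inventory_avro;
```

Keeping the Avro schema in a `.avsc` file referenced by `avro.schema.url` lets the table definition evolve with the schema file rather than with DDL changes.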
3. Project Name: aEDW Enhanced Capabilities Project
Client Toyota Financial Services
Role Hadoop Developer
Organization Infosys Pvt. Ltd, India
Duration April 2014 - August 2014
Environment Distribution: Cloudera Hadoop Distribution
Components: HDFS, MapReduce
Ecosystem: Hive, Pig
Middleware: Sqoop
Scripting: Shell
Workflow: Oozie
Scheduling: Autosys
Database: Teradata
Project Description
The Active Enterprise Data Warehouse (aEDW) is being developed as a central repository for all lines of business, from which reports are generated. The data is extracted from different legacy systems such as SQL Server, Oracle, Sybase and MySQL using Informatica. This data is transformed and transported to the centralized repository using Teradata utilities. Summary tables are derived, and subject-specific data marts are designed for faster and more accurate analysis. These data marts are used for current as well as future analytic and reporting needs. Reports are developed using Cognos. The objective of the aEDW customer account profile is to provide a 360° view of the Toyota Financial Services customer life cycle nationwide.
Contribution
Understanding business needs, analysing functional specifications and mapping those to develop HQLs and Pig Latin scripts.
The project runs in a Teradata environment and is gradually being moved to a Hadoop system.
Created Hive tables with structures similar to the Teradata tables, connected to Teradata via JDBC through Sqoop, and imported the tables into Hive.
Feasibility analysis (for the deliverables): evaluating the feasibility of the requirements against complexity and timelines.
Involved in creating code, lib and data folders to execute the project in a properly structured manner.
Performed data migration from legacy RDBMS databases to HDFS using Sqoop.
Wrote Pig scripts for data analysis; the end result is processed to HDFS.
Implemented Hive tables and HQL queries for the reports.
Wrote and used complex data types in Hive; stored and retrieved data using HQL in Hive.
Developed Hive queries to analyse reducer output data.
Highly involved in designing the next-generation data architecture for unstructured data.
Developed Pig Latin scripts to extract data from the source system.
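A Sqoop import from Teradata into Hive along the lines described above might look like the following; the host, credentials, database, and table names are placeholders, not project values.

```shell
# Hypothetical sketch of a JDBC import from Teradata into a Hive table.
# All connection details and names below are invented for illustration.
sqoop import \
  --connect jdbc:teradata://td-prod/DATABASE=finance \
  --driver com.teradata.jdbc.TeraDriver \
  --username etl_user -P \
  --table ACCOUNT_PROFILE \
  --hive-import --hive-table aedw.account_profile \
  -m 4                      # four parallel mappers
```

With `--hive-import`, Sqoop creates the Hive table (mirroring the source structure) and loads the imported data in one step, matching the migration flow described above.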
POC 1 (Proof of Concept) on Hadoop:
Project Name: Acquisition and Statistical Knowledge Made Easy (ASKME)
Client AT&T
Role Hadoop Team Member and SAP BO Developer
Organization IBM India Pvt Ltd, Chennai
Duration Sep 2013 – Feb 2014
Environment Distribution: Cloudera Hadoop Distribution
Components: HDFS, MapReduce
Ecosystem: Hive, Pig
Middleware: Sqoop
Workflow: Oozie
Scheduling: Hue (Hadoop User Experience)
Database: Teradata
Source file: .csv files and Mainframe sequence files.
Project description:
ASKME is a data warehouse containing detailed information on historical provisioning and maintenance activity for AT&T. It is the primary source of information for closed trouble tickets for the West, Southwest, Midwest, and East regions. It also provides information related to closed service orders.
Contribution
Understanding business needs, analysing functional specifications and mapping those to develop HQLs and Pig Latin scripts.
The project runs in a Teradata environment and is gradually being moved to a Hadoop system.
Imported the mainframe sequence files from the Teradata system to HDFS through NDM (Network Data Mover).
Feasibility analysis (for the deliverables): evaluating the feasibility of the requirements against complexity and timelines.
Involved in creating code, lib and data folders to execute the project in a properly structured manner.
Performed data migration from legacy RDBMS databases to HDFS using Sqoop.
Wrote Pig scripts for data analysis; the end result is processed to HDFS.
Implemented Hive tables and HQL queries for the reports.
Wrote and used complex data types in Hive; stored and retrieved data using HQL in Hive.
Developed Hive queries to analyse reducer output data.
Highly involved in designing the next-generation data architecture for unstructured data.
Developed Pig Latin scripts to extract data from the source system.
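The Hive complex data types mentioned above (arrays, maps, structs) can be illustrated with a hypothetical table; the table and column names are invented, not taken from the ASKME project.

```sql
-- Hypothetical illustration of Hive complex data types; all names invented.
CREATE TABLE trouble_tickets (
  ticket_id   STRING,
  events      ARRAY<STRING>,                        -- ordered status history
  attrs       MAP<STRING, STRING>,                  -- free-form attributes
  site        STRUCT<region:STRING, state:STRING>
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '|'
  COLLECTION ITEMS TERMINATED BY ','
  MAP KEYS TERMINATED BY ':';

-- Complex fields are addressed with [], ['key'] and dot syntax:
SELECT ticket_id, events[0] AS first_event, attrs['severity'], site.region
FROM trouble_tickets;
```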
4. Project Name: Acquisition and Statistical Knowledge Made Easy (ASKME)
Client AT&T
Role SAP BO Developer
Organization IBM India Pvt Ltd, Chennai
Duration Aug 2010 – Feb 2014
Environment Reporting Tool: SAP BO
Database: Mainframe Teradata
Project description:
ASKME is a data warehouse containing detailed information on historical provisioning and maintenance activity for AT&T. It is the primary source of information for closed trouble tickets for the West, Southwest, Midwest, and East regions. It also provides information related to closed service orders.
Contribution:
Involved in understanding the business requirements specified in the BRD.
Involved in understanding the client's business environment and database for reporting.
Involved in gathering report requirements by coordinating with onsite SPOCs.
Involved in generating reports from scratch.
Extensive experience in designing universes using Designer.
Involved in universe design: designing the schema and resolving join path problems such as loops and traps (chasm trap, fan trap) using contexts or aliases.
Extensively used hierarchies, derived tables, @functions and cascading prompts.
Extensively used universe tuning techniques such as aggregate tables, indexes, shortcut joins and conditional objects.
Well experienced in generating complex reports using WebI to meet customer requirements.
Created complex reports using merged dimensions, combined queries, prompts, filters, conditional variables, alerts, hyperlinks and charts.
Involved in scheduling and publishing reports directly to the customer.
Involved in the complete reporting phase: gathering requirements, developing and deploying.
Certifications:
1. InfoSphere BigInsights Essentials using Apache Hadoop (SPVC) 2W602
2. DB2 9 Fundamentals certification, as per client requirement.
Trainings:
1. InfoSphere BigInsights Essentials using Apache Hadoop (SPVC) 2W602
2. BAO (Teradata, BO)
3. QlikView reporting tool