SYLLABUS
DR VISHWANATH KARAD MIT - WORLD PEACE UNIVERSITY
FACULTY OF SCIENCE
M.SC. Big Data Analytics
BATCH – 2018-19
Dr. Sudhir Gavhane, Dean, LASC
PROGRAMME STRUCTURE
Preamble:
Big Data Analytics addresses problems faced by industry today. Its techniques and tools are used to
solve problems in a wide variety of industries, such as manufacturing, services, retail, banking and
finance, sports, pharmaceuticals and aerospace.
Big Data Analytics is interdisciplinary: it analyses ever-growing large data (growing in volume,
velocity and variety) by applying techniques such as data mining, machine learning and deep
learning, drawn from computer science, statistics and mathematics.
Big Data Analytics must cope with rapid changes in both domain knowledge and technology. It is
one of the fastest-growing and most promising technologies.
The first year provides a foundation in Big Data technology, mathematics and statistics, including
programming languages. It covers technologies such as Hadoop, techniques such as data mining,
and subjects in computer programming, mathematics and statistics that form the students'
foundation.
The second year includes subjects from the track each student chooses, according to his/her own
interest, within Big Data Analytics. It also includes advanced topics and technologies in Big Data,
together with a mini project and an internship that give students industrial exposure.
Vision and Mission of the Programme
Vision:
To contribute to society through excellence in scientific and knowledge-based education,
utilizing the potential of computer science with a deep passion for wisdom, culture and
values.
Mission:
The Big Data Analytics programme aims to offer thorough professional training that prepares
students to embark on careers in Big Data Analytics, one of the fastest-growing
technologies. It also provides a very good foundation for further study at PhD level.
Prepare and equip students for opportunities in ever-changing technology with hands-on
industrial training.
Transform students into globally competent professionals through international
training/internship.
Nurture creativity and inculcate entrepreneurial skills among the students.
Programme Educational Objectives
To enable learners to develop expert knowledge and analytical skills in current and developing areas of analysis, statistics and machine learning.
To provide learners with deep and systematic knowledge of business and technical strategies for data analytics, and the skills to implement solutions in these areas.
To help learners develop applied skills that are directly complementary and relevant to the workplace.
To develop in learners a deep and systematic understanding of current issues in research and analysis.
To enable learners to conduct independent research and analysis in the field of data analytics.
To enable learners to identify, develop and apply detailed analytical, creative and problem-solving skills.
To provide learners with a comprehensive platform for career development, innovation and further study.
Programme Specific Outcomes
A graduate with an M.Sc. in Big Data Analytics will have the ability to communicate
computer science concepts, designs and solutions effectively and professionally.
This course aims to offer training that prepares students to embark on careers in Big Data
Analytics, one of the fastest-growing technologies. It also provides
a very good foundation for further study at PhD level.
Prepare and equip students for opportunities in ever-changing technology with hands-on
industrial training.
Transform students into globally competent professionals through internship.
Nurture creativity and inculcate entrepreneurial skills among the students.
Project work gives students hands-on experience in solving a real-world problem.
The syllabus also develops the professional skills and problem-solving abilities required for
a career in the software industry.
Programme Structure:
(a) Programme duration: 2 years full time.
(b) System followed: Trimester
(c) Credits System:
(i) Per term or per year, as applicable
(ii) Total in the programme, as applicable
(d) Credits for activities other than academics: NA
(e) Internship: Full-time industrial training of three months must be completed.
(f) Assessment Criteria: A minimum of 50% of the first-year credits is required for
admission to the second year.
(g) Branches or Specialisations: NA
(h) Mandatory Attendance to appear for examination:
Students are expected to attend every lecture, tutorial and laboratory practical
session in a course for academic excellence. However, allowing for contingencies,
the minimum attendance requirement is 90% of the classes scheduled/held.
(j) Medium of Instruction and Examination: English
(k) Eligibility criteria for admission to the programme: B.Sc. (CS), BCS, B.Sc. (IT),
BCA, BE-IT, Comp., E&TC with 50% marks (45% aggregate marks for candidates
from backward class categories and persons with disability belonging to
Maharashtra state only).
M.Sc. Big Data Analytics
2017-18
A. Definition of Credit:
4 hours of lecture/tutorial per week: 3 credits
3 hours of practical (lab) per week: 3 credits
B. Credits:
The total number of credits for the two-year postgraduate M.Sc. programme is 120.
C. Structure of Credits for the Post Graduate M.Sc. Programme:
S. No. | Category | Suggested Breakup of Credits (Total 120)
1 | Humanities and Social Sciences and Peace Programmes including Management courses | 10
2 | Professional core courses including Laboratory/Mini Project Work | 84
3 | Professional Elective courses | 06
4 | Full Time Industrial Training | 20
  | Total | 120
D. Course code and definition:
Course code | Definition
L | Lecture
T | Tutorial
WP | Humanities and Social Sciences and Peace Programs including Management courses
MBD | M.Sc. (Big Data Analytics)
E. Grading Scheme:
Grades & Grade Points
Marks (out of 100) | Grade | Grade Point
80-100 | O: Outstanding | 10
70-79 | A+: Excellent | 9
60-69 | A: Very Good | 8
55-59 | B+: Good | 7
50-54 | B: Above Average | 6
45-49 | C: Average | 5
40-44 | Pass | 4
0-39 | Fail | 0
Ab | Absent | NA
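The marks-to-grade mapping in the grading scheme can be expressed as a simple lookup. This is an illustrative sketch only: it assumes whole-number marks out of 100, the names `GRADE_BANDS` and `grade_for` are hypothetical, and the "Ab" (Absent) case is not modelled.

```python
# Grade bands from the grading scheme, highest first:
# (lowest mark in the band, grade, grade point).
GRADE_BANDS = [
    (80, "O", 10),
    (70, "A+", 9),
    (60, "A", 8),
    (55, "B+", 7),
    (50, "B", 6),
    (45, "C", 5),
    (40, "Pass", 4),
    (0, "Fail", 0),
]


def grade_for(marks):
    """Return (grade, grade_point) for whole-number marks out of 100."""
    if not 0 <= marks <= 100:
        raise ValueError("marks must be between 0 and 100")
    # The first band whose lower bound the marks reach gives the grade.
    for lowest, grade, point in GRADE_BANDS:
        if marks >= lowest:
            return grade, point
```

For example, `grade_for(57)` falls in the 55-59 band and returns `("B+", 7)`.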
M.Sc. Big Data Analytics (First Year) (Batch 2017-18) Trimester I
Type: Core
Weekly Teaching Hours: 25
Total Credits: First Year M.Sc. Big Data Analytics Trimester I: 20
Sr. No. | Course Code | Name of Course | Type | Weekly Workload, Hrs (Theory / Tutorial / Lab) | Credits (Th / Lab) | Assessment Marks** (CCA* / LCA* / End Term Test / Total)
1 | MIT-WPU-MBD-1101 | Data Warehousing & Data Mining | Core | 3 / 1 / - | 3 / - | 50 / - / 50 / 100
2 | MIT-WPU-MBD-1102 | Parallel and Distributed Computing | Core | 4 / - / - | 3 / - | 50 / - / 50 / 100
3 | MIT-WPU-MBD-1103 | Big Data Architecture & Ecosystem - Hadoop | Core | 4 / - / - | 3 / - | 50 / - / 50 / 100
4 | MIT-WPU-MBD-1104 | Python Programming | Core | 3 / 1 / - | 3 / - | 50 / - / 50 / 100
5 | MIT-WPU-MBD-1105 | Lab on Python | Core | - / - / 3 | - / 3 | - / 50 / 50 / 100
6 | MIT-WPU-MBD-1106 | Lab on Hadoop using HDFS | Core | - / - / 3 | - / 3 | - / 50 / 50 / 100
7 | WP | Philosophy of Science and Spirituality | SEC | 3 / - / - | 2 / - | 25 / 25 / - / 50
Total | | | | 17 / 02 / 06 | 14 / 06 | 225 / 125 / 300 / 650
*CCA: Class Continuous Assessment; *LCA: Laboratory Continuous Assessment
**Assessment marks are valid only if the attendance criteria are met.
M.Sc. Big Data Analytics (First Year) (Batch 2017-18) Trimester II
Type: Core
Weekly Teaching Hours: 25
Total Credits: First Year M.Sc. Big Data Analytics Trimester II: 20
Sr. No. | Course Code | Name of Course | Type | Weekly Workload, Hrs (Theory / Tutorial / Lab) | Credits (Th / Lab) | Assessment Marks** (CCA* / LCA* / End Term Test / Total)
1 | MIT-WPU-MBD-1201 | R Programming | Core | 3 / 1 / - | 3 / - | 50 / - / 50 / 100
2 | MIT-WPU-MBD-1202 | Distributed Processing using Hadoop | Core | 4 / - / - | 3 / - | 50 / - / 50 / 100
3 | MIT-WPU-MBD-1203 | Operation Research | Core | 4 / - / - | 3 / - | 50 / - / 50 / 100
4 | MIT-WPU-MBD-1204 | Next Generation Databases | Core | 3 / 1 / - | 3 / - | 50 / - / 50 / 100
5 | MIT-WPU-MBD-1205 | Lab on R Programming | Core | - / - / 3 | - / 3 | - / 50 / 50 / 100
6 | MIT-WPU-MBD-1206 | Lab on Hadoop and Tools | Core | - / - / 3 | - / 3 | - / 50 / 50 / 100
7 | WP | Philosophy of Science and Spirituality | SEC | 3 / - / - | 2 / - | 25 / 25 / - / 50
Total | | | | 17 / 02 / 06 | 14 / 06 | 225 / 125 / 300 / 650
*CCA: Class Continuous Assessment; *LCA: Laboratory Continuous Assessment
**Assessment marks are valid only if the attendance criteria are met.
M.Sc. Big Data Analytics (First Year) (Batch 2017-18) Trimester III
Type: Core
Weekly Teaching Hours: 25
Total Credits: First Year M.Sc. Big Data Analytics Trimester III: 20
Total First Year M.Sc. Big Data Analytics Credits: 60
Sr. No. | Course Code | Name of Course | Type | Weekly Workload, Hrs (Theory / Tutorial / Lab) | Credits (Th / Lab) | Assessment Marks** (CCA* / LCA* / End Term Test / Total)
1 | MIT-WPU-MBD-1301 | Statistical Computing | Core | 4 / - / - | 3 / - | 50 / - / 50 / 100
2 | MIT-WPU-MBD-1302 | Information Security | Core | 4 / - / - | 3 / - | 50 / - / 50 / 100
3 | MIT-WPU-MBD-1303 | Apache Spark | Core | 3 / 1 / - | 3 / - | 50 / - / 50 / 100
4 | MIT-WPU-MBD-1304 | Machine Learning Algorithm-I | Core | 3 / 1 / - | 3 / - | 50 / - / 50 / 100
5 | MIT-WPU-MBD-1305 | Lab on Statistical Computing | Core | - / - / 3 | - / 3 | - / 50 / 50 / 100
6 | MIT-WPU-MBD-1306 | Lab on Machine Learning Algorithms-I | Core | - / - / 3 | - / 3 | - / 50 / 50 / 100
7 | WP | Creativity and Innovation | SEC | 3 / - / - | 2 / - | 25 / 25 / - / 50
Total | | | | 17 / 02 / 06 | 14 / 06 | 225 / 125 / 300 / 650
*CCA: Class Continuous Assessment; *LCA: Laboratory Continuous Assessment
**Assessment marks are valid only if the attendance criteria are met.
M.Sc. Big Data Analytics (Second Year) (Batch 2017-18) Trimester I
Type: Core/Elective
Weekly Teaching Hours: 26
Total Credits: Second Year M.Sc. Big Data Analytics Trimester I: 20
Sr. No. | Course Code | Name of Course | Type | Weekly Workload, Hrs (Theory / Tutorial / Lab) | Credits (Th / Lab) | Assessment Marks** (CCA* / LCA* / End Term Test / Total)
1 | MIT-WPU-MBD-2101 | Principles of Deep Learning | Core | 3 / 1 / - | 3 / - | 50 / - / 50 / 100
2 | MIT-WPU-MBD-2102 | Machine Learning Algorithm-II | Core | 3 / 1 / - | 3 / - | 50 / - / 50 / 100
3 | MIT-WPU-MBD-2103 | Data Science Life Cycle | Core | 4 / - / - | 3 / - | 50 / - / 50 / 100
4 | MIT-WPU-MBD-2104 | Lab on Machine Learning Algorithms II | Core | 4 / - / - | - / 3 | - / 50 / 50 / 100
5 | MIT-WPU-MBD-2105 | Lab on Data Science Life Cycle | Core | - / - / 3 | - / 3 | - / 50 / 50 / 100
6 | - | Elective I | Elective | 4 / - / - | 3 / - | 50 / - / 50 / 100
7 | WP | Scientific Studies of Peace – Mind, Matter, Spirit and Consciousness | SEC | 3 / - / - | 2 / - | 25 / 25 / - / 50
Total | | | | 21 / 02 / 03 | 14 / 06 | 225 / 125 / 300 / 650
*CCA: Class Continuous Assessment; *LCA: Laboratory Continuous Assessment
**Assessment marks are valid only if the attendance criteria are met.
M.Sc. Big Data Analytics (Second Year) (Batch 2017-18) Trimester II
Type: Core/Elective
Weekly Teaching Hours: 26
Total Credits: Second Year M.Sc. Big Data Analytics Trimester II: 20
Sr. No. | Course Code | Name of Course | Type | Weekly Workload, Hrs (Theory / Tutorial / Lab) | Credits (Th / Lab) | Assessment Marks** (CCA* / LCA* / End Term Test / Total)
1 | MIT-WPU-MBD-2201 | Natural Language Processing | Core | 4 / - / - | 3 / - | 50 / - / 50 / 100
2 | MIT-WPU-MBD-2202 | Web & Social Intelligence | Core | 3 / 1 / - | 3 / - | 50 / - / 50 / 100
3 | MIT-WPU-MBD-2203 | Cloud Computing | Core | 4 / - / - | 3 / - | 50 / - / 50 / 100
4 | MIT-WPU-MBD-2204 | Lab on Web & Social Intelligence | Core | 3 / 1 / - | 3 / - | - / 50 / 50 / 100
5 | MIT-WPU-MBD-2205 | Mini Project | Core | - / - / 3 | - / 3 | - / 50 / 50 / 100
6 | - | Elective II | Elective | 4 / - / - | 3 / - | 50 / - / 50 / 100
7 | WP | Business-Strategic Planning and Finance | SEC | 3 / - / - | 2 / - | 25 / 25 / - / 50
Total | | | | 21 / 02 / 03 | 17 / 03 | 225 / 125 / 300 / 650
*CCA: Class Continuous Assessment; *LCA: Laboratory Continuous Assessment
**Assessment marks are valid only if the attendance criteria are met.
M.Sc. Big Data Analytics (Second Year) (Batch 2017-18) Trimester III
Type: Core
Weekly Teaching Hours: 15
Total Credits: Second Year M.Sc. Big Data Analytics Trimester III: 20
Total Second Year M.Sc. Big Data Analytics Credits: 60
Sr. No. | Course Code | Name of Course | Type | Weekly Workload, Hrs | Credits | Assessment Marks** | Total
1 | MIT-WPU-MS-2301 | Full Time Industrial Training | Core | 4 | 3 | 50, 50 | 100
Total | | | | 4 | 3 | 50, 50 | 100
*CCA: Class Continuous Assessment; *LCA: Laboratory Continuous Assessment
**Assessment marks are valid only if the attendance criteria are met.
Elective Courses:
Elective | Code | Title | Code | Title
Elect I | MIT-WPU-MBD-2106 | Internet of Things | MIT-WPU-MBD-2206 | Marketing Analytics
Elect II | MIT-WPU-MBD-2107 | Introduction to Image Processing | MIT-WPU-MBD-2207 | HR Analytics
Name of Specialisation: Big Data Analytics
COURSE STRUCTURE
Course Code MIT-WPU-MBD-1101
Course Category Core Big Data Analytics
Course Title Data Warehousing & Data Mining
Teaching Scheme and Credits (weekly load, hrs)
L | T | Laboratory | Credits
4 | - | - | 3
Pre-requisites:
Understanding of: relational database normalization techniques, physical design of a database,
and concepts of algorithm design and analysis. Basic understanding of: software engineering
principles and techniques; probability and statistics (Bayesian theory, regression, hypothesis
testing).
Course Objectives:
1. To understand the structure of a data warehouse.
2. To understand different data pre-processing techniques.
3. To understand basic descriptive and predictive data mining techniques.
4. To use data mining tools on different data sets.
5. To understand classification algorithms.
6. To understand prediction algorithms.
7. To understand clustering algorithms.
Course Outcomes:
Students will gain knowledge of:
Data processing and data quality.
Modelling and design of data warehouses.
Basic and advanced concepts of algorithms for data mining.
Data mining tools, with practical experience of applying data mining algorithms.
Course Contents:
Introduction to Data Mining
Basic concepts of data mining, types of data to be mined.
Introduction to Data Warehouse
Data warehouse and DBMS, architecture of a data warehouse.
Data pre-processing
Need for data pre-processing, attributes and data types.
Data Mining Techniques: Association Rule Mining
Basic idea: item sets, frequent item-sets.
Data Mining Techniques: Classification
Definition of classification; decision tree induction: information gain, gain ratio, Gini index.
Data Mining Techniques: Prediction
Definition of prediction, linear regression.
Data Mining Techniques: Clustering
Definition of clustering, partitioning methods.
Performance Measures
Precision, recall, F-measure.
Problem solving with R or Weka: filters, discretization, mining association rules, decision trees,
prediction, k-means.
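The decision tree induction measures listed under Classification (information gain, gain ratio and Gini index) can be computed directly from class labels. A minimal Python sketch, assuming a candidate split is given as a list of child label lists; the function names are hypothetical, and the course itself uses R or Weka for the practical exercises.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def gini(labels):
    """Gini index: 1 minus the sum of squared class proportions."""
    total = len(labels)
    return 1 - sum((n / total) ** 2 for n in Counter(labels).values())


def information_gain(parent, children):
    """Entropy of the parent node minus the size-weighted entropy of its child partitions."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children if child)
    return entropy(parent) - weighted


def gain_ratio(parent, children):
    """Information gain normalised by the split information of the partition sizes."""
    total = len(parent)
    split_info = -sum(len(c) / total * log2(len(c) / total) for c in children if c)
    return information_gain(parent, children) / split_info if split_info else 0.0
```

For a node with 9 "yes" and 5 "no" labels, `entropy` gives about 0.940 bits; a split into pure children has a gain equal to the parent's entropy, which is why induction prefers such attributes.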
Learning Resources:
Reference Books:
1. Han, Data Mining: Concepts and Techniques, Elsevier, ISBN: 9789380931913 / 9788131205358.
2. Margaret H. Dunham, S. Sridhar, Data Mining: Introductory and Advanced Topics, Pearson Education.
3. Kimball, Data Warehousing: Fundamentals for IT Professionals, 3rd edition, Wiley.
4. Ian H. Witten, Eibe Frank, Data Mining: Practical Machine Learning Tools and Techniques, Elsevier (Morgan Kaufmann), ISBN: 9789380501864.
5. Pang-Ning Tan, Michael Steinbach, Vipin Kumar, Introduction to Data Mining, Addison Wesley, 2005, ISBN: 0-321-32136-7.
6. Research papers: relevant research papers that contain recent results and developments in the data mining field.
Pedagogy: Participative learning, discussions, algorithm, Flowchart & Program writing,
experiential learning through practical problem solving, assignment, PowerPoint presentation.
Assessment Scheme:
Class Continuous Assessment (CCA): 50 Marks
Assignments | Test | Presentations | Case study | MCQ | Oral | Attendance
10 | 10 | - | - | 10 | 10 | 10
Term End Examination: 50 Marks
Syllabus:
Module No. | Contents | Workload in Hrs (Theory / Lab / Assess)
1
Introduction to Data Mining: Basic concepts of data mining,
Types of Data to be mined, Stages of the Data Mining
Process, Data Mining Techniques, Knowledge Discovery in
Databases, Data Mining Issues, Applications of Data Mining
4 - -
2
Introduction to Data Warehouse: Data Warehouse and DBMS
Architecture of Data Warehouse, Multidimensional data model,
Concepts of OLAP and Data Cube, OLAP
operations, Dimensional Data Modelling: Star and Snowflake
schemas
5 - -
3
Data pre-processing: Need for data pre-processing, Attributes and
Data types, Statistical descriptions of Data, Handling missing
Data, Data sampling, Data cleaning, Data Integration and
transformation, Data reduction, Discretization and generating
concept hierarchies
6 - -
4 Data Mining Techniques: Association Rule Mining: Basic
idea: item sets, Frequent Item-sets, Association Rule Mining,
Generating item sets and rules efficiently, FP growth algorithm
4 - -
5
Data Mining Techniques: Classification: Definition of
Classification, Decision tree Induction: Information gain, gain
ratio, Gini Index, Issues: Over-fitting, tree pruning methods,
missing values, continuous classes, Classification and Regression
Trees (CART), Bayesian Classification: Bayes Theorem, Naïve
Bayes classifier, Bayesian Networks, Linear classifiers,
Least squares, SVM classifiers, Lazy Learners (or Learning from
Your Neighbors)
9 - -
6 Data Mining Techniques: Prediction: Definition of Prediction,
Linear regression, Non-linear regression, Logistic regression
3 - -
7 Data Mining Techniques: Clustering: Definition of Clustering,
Partitioning Methods, Hierarchical Methods, Distance Measures
in Algorithmic Methods, Density Based Clustering
6 - -
8 Performance Measures: Precision, recall, F-measure, confusion
matrix, cross-validation, bootstrap. 3 - -
9 Problem solving with R or Weka: filters, Discretization,
mining association rules, decision trees, Prediction, k-means 5 - -
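The performance measures in module 8 follow directly from confusion-matrix counts. A minimal Python sketch for a binary problem; the function names are hypothetical, and returning 0.0 when a denominator is zero is an assumed convention (Weka or R packages may report that case differently).

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN), treating `positive` as the positive class."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        elif actual == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision = TP/(TP+FP), recall = TP/(TP+FN), F-measure = their harmonic mean."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

With actual labels `[1, 1, 1, 0, 0, 0]` and predictions `[1, 1, 0, 1, 0, 0]` there are 2 true positives, 1 false positive and 1 false negative, so precision, recall and F-measure are all 2/3.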
Prepared By: Ms. Devyani B Kamble, Assistant Professor
Checked By: Ms. Pradnya Mahadik, BOS Chairman
Approved By: Dr. Sudhir Gavhane, Dean, LASC
COURSE STRUCTURE
Course Code MIT-WPU-MBD-1102
Course Category Core Big Data Analytics
Course Title Parallel And Distributed Computing
Teaching Scheme and Credits (weekly load, hrs)
L | T | Laboratory | Credits
4 | - | - | 3
Pre-requisites:
1. Ability to program well in C, C++ or Fortran.
2. Willingness to rethink how problems should be solved.
3. Algorithms and data structures.
4. Basics of computer architecture.
Course Objectives:
1. To learn basic models of parallel machines and tools.
2. To learn how to parallelize programs and how to use basic tools such as MPI and POSIX threads.
3. To learn the core ideas behind parallel and distributed computing.
4. To explore the methodologies adopted for concurrent and distributed environments.
5. To understand the networking aspects of parallel and distributed computing.
6. To provide an overview of the computational aspects of parallel and distributed computing.
7. To learn parallel and distributed computing models.
Course Outcomes:
Students will be able to:
1. Explore the methodologies adopted for concurrent and distributed environments.
2. Analyse the networking aspects of distributed and parallel computing.
3. Explore the different performance issues and tasks in parallel and distributed computing.
4. Develop parallel algorithms for solving real-world problems.
Course Contents:
1. Parallel and Distributed Computing: introduction, benefits and needs, programming
environment, theoretical foundations. Parallel algorithms: parallel models and algorithms,
sorting, matrix multiplication, convex hull, pointer-based data structures.
2. Synchronization, process parallel languages, architecture of parallel and distributed
systems, consistency and replication, security, parallel operating systems.
3. Management of resources in parallel systems, tools for parallel computing, parallel
database systems and multimedia object servers.
4. Networking aspects of distributed and parallel computing, process, parallel and
distributed scientific computing.
5. High-performance computing in molecular sciences, communication, multimedia
applications for parallel and distributed systems, distributed file systems.
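The data-decomposition idea behind parallel matrix multiplication (unit 1) can be sketched by splitting the rows of A into blocks and giving each block to a worker. This is a hedged teaching sketch, not an MPI or POSIX-threads program: it uses Python's standard `concurrent.futures.ThreadPoolExecutor`, and the names `multiply_rows` and `parallel_matmul` are hypothetical. Because of CPython's global interpreter lock, threads will not speed up this pure-Python arithmetic; the point is the decomposition, which carries over to processes, MPI ranks or POSIX threads in C.

```python
from concurrent.futures import ThreadPoolExecutor


def multiply_rows(a_rows, b):
    """Compute the rows of C = A x B that correspond to the given rows of A."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a_rows]


def parallel_matmul(a, b, workers=4):
    """Row-block decomposition: each task computes one contiguous block of rows of C."""
    if len(a[0]) != len(b):
        raise ValueError("number of columns of A must equal number of rows of B")
    block = -(-len(a) // workers)  # ceiling division so every row is assigned
    blocks = [a[i:i + block] for i in range(0, len(a), block)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in submission order, so row blocks stay in sequence.
        partial = list(pool.map(multiply_rows, blocks, [b] * len(blocks)))
    return [row for rows in partial for row in rows]
```

Each worker reads all of B but writes only its own rows of C, so no synchronization is needed beyond collecting the results in order.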
Learning Resources:
Reference Books:
1. Jacek Błażewicz, et al., “Handbook on Parallel and Distributed Processing”, Springer Science & Business Media, 2013.
2. Andrew S. Tanenbaum and Maarten Van Steen, “Distributed Systems: Principles and Paradigms”, Prentice-Hall, 2007.
3. George F. Coulouris, Jean Dollimore and Tim Kindberg, “Distributed Systems: Concepts and Design”, Pearson Education, 2005.
4. Gregor Kosec and Roman Trobec, “Parallel Scientific Computing: Theory, Algorithms, and Applications of Mesh Based and Meshless Methods”, Springer, 2015.
Supplementary Reading:
1. M. J. Quinn, “Parallel Computing: Theory and Practice”, McGraw-Hill.
2. A. Gibbons and W. Rytter, “Efficient Parallel Algorithms”, Cambridge University Press.
3. Shameem A and Jason, “Multicore Programming”, Intel Press, 2006.
Weblinks:
1. https://www.tutorialspoint.com/parallel_algorithm/parallel_algorithm_introduction.htm
Pedagogy: Participative learning, discussions, demonstrations, practical, assignment, PowerPoint
presentation
Assessment Scheme:
Class Continuous Assessment (CCA): 50 Marks
Assignments | Test | Presentations | Case study | MCQ | Oral | Attendance
20 | 10 | 10 | - | - | - | 10
Term End Examination: 50 Marks
Syllabus:
Module No. | Contents | Workload in Hrs (Theory / Lab / Assess)
1
Parallel and Distributed Computing: Introduction, Benefits and Needs, Parallel and Distributed Systems, Programming Environment, Theoretical Foundations. Parallel Algorithms: Introduction, Parallel Models and Algorithms, Sorting, Matrix Multiplication, Convex Hull, Pointer Based Data Structures.
10 - -
2
Synchronization, Process Parallel Languages, Architecture of Parallel and Distributed Systems, Consistency and Replication, Security, Parallel Operating Systems.
10 - -
3
Management of Resources in Parallel Systems, Tools for Parallel Computing, Parallel Database Systems and Multimedia Object Servers.
6 - -
4
Networking Aspects of Distributed and Parallel Computing, Process, Parallel and Distributed Scientific Computing.
11 - -
5
High-Performance Computing in Molecular Sciences, Communication, Multimedia Applications for Parallel and Distributed Systems, Distributed File Systems.
8 - -
Prepared By: Ms. Deepali Sonawane, Assistant Professor
Checked By: Ms. Pradnya Mahadik, BOS Chairman
Approved By: Dr. Sudhir Gavhane, Dean, LASC
COURSE STRUCTURE
Course Code MIT-WPU-MBD-1103
Course Category Core Big Data Analytics
Course Title Big Data Architecture & Ecosystem - Hadoop
Teaching Scheme and Credits (weekly load, hrs)
L | T | Laboratory | Credits
4 | - | - | 3
Pre-requisites:
Basic knowledge of and experience with Java (JARs, arrays, classes, objects, etc.).
Course Objectives:
1. To learn how to ingest data into Hadoop.
2. To learn to build and maintain reliable, scalable, distributed systems with Hadoop.
3. To be able to apply Hadoop ecosystem components.
Course Outcomes:
1. Students will be able to ingest data into Hadoop.
2. Students will understand distributed systems built with Apache Hadoop.
3. Students will be able to apply Hadoop ecosystem components.
Course Contents:
1. Introduction to big data: introduction, distributed file system, Big Data and its importance,
drivers, big data analytics, big data applications, algorithms using MapReduce, matrix-vector
multiplication by MapReduce.
2. Introduction to Hadoop: Big Data, Apache Hadoop and the Hadoop ecosystem, MapReduce,
data serialization.
3. Hadoop Architecture: architecture, storage, task trackers, Hadoop configuration.
4. Hadoop Ecosystem and YARN: Hadoop ecosystem components; Hadoop 2.0 new features:
NameNode High Availability, HDFS Federation, MRv2, YARN, running MRv1 in YARN.
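The matrix-vector multiplication by MapReduce in unit 1 can be simulated in plain Python to show the data flow through map, shuffle and reduce. A minimal sketch, assuming the matrix is stored as (i, j, m_ij) triples and that the vector v is small enough for every mapper to hold in memory (the usual textbook setting); the function names are hypothetical and nothing here uses the Hadoop API.

```python
from collections import defaultdict


def map_phase(entries, v):
    """Map: for each matrix entry (i, j, m_ij) emit the pair (i, m_ij * v[j])."""
    for i, j, m_ij in entries:
        yield i, m_ij * v[j]


def shuffle(pairs):
    """Shuffle/sort: group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: sum the partial products for each row i, giving x_i of x = M v."""
    return {i: sum(values) for i, values in sorted(groups.items())}


def matvec_mapreduce(entries, v):
    """Run the three phases in sequence on a single machine."""
    return reduce_phase(shuffle(map_phase(entries, v)))
```

For M = [[1, 2], [3, 4]] and v = [5, 6] the result is {0: 17, 1: 39}. Because only stored entries are mapped, a row with no stored entries does not appear in the output (its x_i is 0).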
Syllabus:
Module No. | Contents | Workload in Hrs (Theory / Lab / Assess)
1
1. Introduction to big data
Introduction – distributed file system – Big Data and its
importance, Four Vs, Drivers for Big data, Big data analytics, Big
data applications, Algorithms using MapReduce, Matrix-Vector
Multiplication by MapReduce.
11 - -
2
Introduction to HADOOP
Big Data, Apache Hadoop & Hadoop Ecosystem, Moving Data
in and out of Hadoop,
Understanding inputs and outputs of MapReduce, Data
Serialization.
11 - -
Learning Resources:
Reference Books:
1. Boris Lublinsky, Kevin T. Smith, Alexey Yakubovich, “Professional Hadoop Solutions”,
Wiley, ISBN: 9788126551071, 2015.
2. Chris Eaton, Dirk Deroos et al., “Understanding Big Data”, McGraw Hill, 2012.
3. Tom White, “Hadoop: The Definitive Guide”, O’Reilly, 2012.
4. Donald Miner and Adam Shook, “MapReduce Design Patterns: Building Effective Algorithms and
Analytics for Hadoop”.
Supplementary Reading:
Weblinks:
https://cloudthat.in/course/processing-bigdata-with-apache-hadoop/
Pedagogy: Participative learning, discussions, algorithm, Flowchart & Program writing,
experiential learning through practical problem solving, assignment, PowerPoint presentation.
Assessment Scheme:
Class Continuous Assessment (CCA): 50 Marks
Assignments | Test | Presentations | Case study | MCQ | Oral | Attendance
20 | 10 | 10 | - | - | - | 10
Term End Examination: 50 Marks
3
HADOOP Architecture
Hadoop Architecture, Hadoop Storage: HDFS, Common Hadoop Shell commands, Anatomy of File Write and Read, NameNode, Secondary NameNode and DataNode, Hadoop MapReduce Paradigm, Map and Reduce tasks, Job and Task trackers, Cluster Setup: SSH and Hadoop Configuration, HDFS Administration, Monitoring and Maintenance.
12 - -
4
HADOOP Ecosystem and YARN
Hadoop ecosystem components, Schedulers: Fair and Capacity, Hadoop 2.0 New Features: NameNode High Availability, HDFS Federation, MRv2, YARN, Running MRv1 in YARN.
11 - -
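A worked estimate connects HDFS storage with the NameNode/DataNode anatomy in module 3: a file is split into fixed-size blocks, and each block is replicated across DataNodes. The sketch assumes the Hadoop 2.x defaults of a 128 MB block size (`dfs.blocksize`) and a replication factor of 3 (`dfs.replication`); both are configurable per cluster and per file, and the function name is hypothetical.

```python
import math

BLOCK_SIZE_MB = 128  # default dfs.blocksize in Hadoop 2.x (134217728 bytes)
REPLICATION = 3      # default dfs.replication


def hdfs_footprint(file_size_mb, block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Estimate (blocks, block replicas, raw DataNode storage in MB) for one file."""
    blocks = math.ceil(file_size_mb / block_size_mb)
    replicas = blocks * replication
    # A final partial block occupies only its actual size on disk, so raw storage
    # is the file size times the replication factor (checksums and metadata ignored).
    raw_storage_mb = file_size_mb * replication
    return blocks, replicas, raw_storage_mb
```

A 1000 MB file, for instance, needs 8 blocks and 24 block replicas, about 3000 MB of raw DataNode storage, while the NameNode tracks the metadata for all 8 blocks.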
Prepared By: Ms. Deepali Sonawane, Assistant Professor
Checked By: Ms. Pradnya Mahadik, BOS Chairman
Approved By: Dr. Sudhir Gavhane, Dean, LASC
COURSE STRUCTURE
Course Code MIT-WPU-MBD-1104
Course Category Core Big Data Analytics
Course Title Python Programming
Teaching Scheme and Credits (weekly load, hrs)
L | T | Laboratory | Credits
4 | - | - | 3
Pre-requisites:
Knowledge of any scripting language, XML.
Course Objectives:
1. To understand why Python is a useful scripting language for developers.
2. To learn how to design and program Python applications.
3. To learn how to use lists, tuples, and dictionaries in Python programs.
4. To learn how to identify Python object types.
5. To define the structure and components of a Python program.
6. To learn how to write loops and decision statements in Python.
Course Outcomes:
1. Students will demonstrate the ability to solve problems using system approaches, critical
and innovative thinking, and technology to create solutions.
2. Students will design, develop, and present their final project.
3. Students will understand the purpose and the process of code reviews.
4. Students will be able to create scripts in Python for Autodesk's Maya.
5. Students will understand, articulate and apply the principles of 3D graphics.
Course Contents:
Introduction to Python
Introduction to the Python language.
Conditional Statements & Looping
Introduction to conditional and looping statements in Python.
String Manipulation
Introduction to various operations on strings.
Lists, Tuple and Dictionaries
Introduction to various operations on lists, tuples and dictionaries.
Functions
Introduction to functions in Python.
Modules
Introduction to modules and packages in Python.
Input-Output
Handling of input and output in Python.
Regular expressions
Use of regular expressions in Python.
CGI
Introduction to CGI and cookies.
Database
Database handling in Python.
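A short exercise can tie several of these topics together: string methods, a loop with a conditional, a dictionary, tuples, list slicing and an anonymous (lambda) function. A minimal sketch; the function names are hypothetical and the punctuation handling is deliberately simple.

```python
def word_frequencies(text):
    """Count how often each word occurs, ignoring case and surrounding punctuation."""
    counts = {}                                 # dictionary: word -> count
    for word in text.lower().split():           # string methods and a for loop
        word = word.strip(".,;:!?\"'")          # strip punctuation from both ends
        if word:                                # skip tokens that were only punctuation
            counts[word] = counts.get(word, 0) + 1
    return counts


def top_words(counts, n=3):
    """Return the n most frequent words as (word, count) tuples, ties broken alphabetically."""
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))  # anonymous function
    return ranked[:n]                           # list slicing
```

For the text "Big data, big ideas. Data is big!", `top_words(word_frequencies(text), 2)` returns `[("big", 3), ("data", 2)]`.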
Learning Resources:
Reference Books:
1. Dive into Python by Mark Pilgrim
2. Programming Python by Mark Lutz, O’Reilly Media
3. “Python Programming: An Introduction to Computer Science” by John Zelle
Supplementary Reading:
1. Python Testing Cookbook by Greg L. Turnquist
Web Resources:
1. www.tutorialspoint.com/python/
2. docs.python.org/3/tutorial/
3. www.learnpython.org
4. www.guru99.com/python-tutorials.html
5. www.tutorialspoint.com/cprogramming/
6. www.learn-c.org/
7. www.w3schools.in/c-tutorial/
Pedagogy: Participative learning, discussions, Problem Solving, experiential learning through
practical problem solving, assignment, PowerPoint presentation
Assessment Scheme:
Class Continuous Assessment (CCA): 50 Marks
Assignments Test Presentations Case study MCQ Oral Attendance
20 10 10 - - 10
Term End Examination: 50 Marks
Syllabus:
Module No. | Contents | Workload in Hrs (Theory / Lab / Assess)
1
Introduction to Python
History
Features
Setting up path
Working with Python
Basic Syntax
Variables and Data Types
Operators
4 - -
2
Conditional Statements & Looping
If, If-else, Nested if-else
For, While, Nested loops
Break, Continue, Pass
4 - -
3
String Manipulation
Accessing Strings
Basic Operations
String slices
Functions and Methods
5 - -
4
Lists, Tuple and Dictionaries
Lists: Introduction, Accessing lists, Operations, Working with lists, Functions and Methods
Tuple: Introduction, Accessing tuples, Operations, Working, Functions and Methods
Dictionaries: Introduction, Accessing values in dictionaries, Working with dictionaries, Properties, Functions
6 - -
5
Functions
Defining a function
calling a function
Types of functions
Function Arguments
Anonymous functions
Global and local variables
4 - -
6
Modules
Importing module
Math module
Random module
Packages
Composition
4 - -
7
Input-Output
Printing on screen
Reading data from keyboard
Opening and closing file
Reading and writing files
Functions
4 - -
8
Regular expressions
Match function
Search function
Matching VS Searching
Modifiers
Patterns
4 - -
9
CGI
Introduction
Architecture
CGI environment variable
GET and POST methods
Cookies
File upload
5 - -
10
Database
Introduction
Connections
Executing queries
Transactions
Handling error
5 - -
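Module 10 (connections, executing queries, transactions, handling errors) can be illustrated with Python's built-in `sqlite3` module. A minimal sketch, assuming an in-memory SQLite database and a hypothetical `accounts` table; using the connection as a context manager commits the transaction on success and rolls it back if an exception escapes, so a failed transfer leaves both balances unchanged.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical accounts table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts ("
                 "name TEXT PRIMARY KEY, "
                 "balance INTEGER NOT NULL CHECK (balance >= 0))")
    with conn:  # commit the inserts as one transaction
        conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                         [("alice", 100), ("bob", 50)])
    return conn


def transfer(conn, src, dst, amount):
    """Move money between accounts atomically; return False if the transaction was rolled back."""
    try:
        with conn:  # commits on success, rolls back on any exception
            debit = conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
            credit = conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            if debit.rowcount != 1 or credit.rowcount != 1:
                raise ValueError("unknown account")
        return True
    except (sqlite3.Error, ValueError) as exc:
        # CHECK-constraint violations surface as sqlite3.IntegrityError.
        print("transfer rolled back:", exc)
        return False


def balances(conn):
    """Return current balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))
```

Transferring to a non-existent account, or overdrawing and so violating the CHECK constraint, returns `False` and leaves the earlier debit undone.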
Prepared By: Ms. Punam Nikam, Assistant Professor
Checked By: Ms. Pradnya Mahadik, BOS Chairman
Approved By: Dr. Sudhir Gavhane, Dean, LASC
COURSE STRUCTURE
Course Code MIT-WPU-MBD-1105
Course Category Core Big Data Analytics
Course Title Lab on Python
Teaching Scheme and Credits (weekly load, hrs)
L | T | Laboratory | Credits
- | - | 3 | 3
Pre-requisites:
Knowledge of any scripting language, XML.
Course Objectives:
1. To understand why Python is a useful scripting language for developers.
2. To learn how to design and program Python applications.
3. To learn how to use lists, tuples, and dictionaries in Python programs.
4. To learn how to identify Python object types.
5. To define the structure and components of a Python program.
6. To learn how to write loops and decision statements in Python.
Course Outcomes:
1. Students will demonstrate the ability to solve problems using system approaches, critical
and innovative thinking, and technology to create solutions.
2. Students will design, develop, and present their final project.
3. Students will understand the purpose and the process of code reviews.
4. Students will be able to create scripts in Python for Autodesk's Maya.
5. Students will understand, articulate and apply the principles of 3D graphics.
Course Contents:
Introduction to Python
Introduction to the Python language.
Conditional Statements & Looping
Introduction to conditional and looping statements in Python.
String Manipulation
Introduction to various operations on strings.
Lists, Tuple and Dictionaries
Introduction to various operations on lists, tuples and dictionaries.
Functions
Introduction to functions in Python.
Modules
Introduction to modules and packages in Python.
Input-Output
Handling of input and output in Python.
Regular expressions
Use of regular expressions in Python.
CGI
Introduction to CGI and cookies.
Database
Database handling in Python.
Laboratory Exercises / Practical:
1. Introduction to Python: Assignment on simple programs in Python
2. Conditional Statements & Looping: Assignment on conditional statements and looping
statements
3. String Manipulation: Assignment on string manipulations
4. Lists, Tuple and Dictionaries: Assignment on lists, tuples and dictionaries
5. Functions: Assignment on functions
6. Modules: Assignment on use of modules
7. Input-Output: Assignment on input-output operations
8. Regular expressions: Assignment on use of regular expressions
9. CGI: Assignment on CGI
10. Database: Assignment on database
Learning Resources:
Reference Books:
1. Dive into Python by Mark Pilgrim
2. Programming Python by Mark Lutz, O’Reilly Media
3. Python Programming: An Introduction to Computer Science” by John Zelle
Supplementary Reading:
1. Python Testing Cookbook by Greg L. Turnquist
Web Resources:
1. www.tutorialspoint.com/python/
2. docs.python.org/3/tutorial/
3. www.learnpython.org
4. www.guru99.com/python-tutorials.html
5. www.tutorialspoint.com/cprogramming/
6. www.learn-c.org/
7. www.w3schools.in/c-tutorial/
Pedagogy: Participative learning, discussions, Problem Solving, experiential learning through
practical problem solving, assignment, PowerPoint presentation
Assessment Scheme:
Class Continuous Assessment (CCA) 50 Marks
Assignments Test Presentations Case study MCQ Oral Attendance
20 10 10 - - 10
Term End Examination : 50 Marks
Dr. Sudhir Gavhane
Dean, LASC
`
Module
No. Contents
Workload in Hrs
Theory Lab Assess
1
Introduction to Python
History
Features
Setting up path
working with Python
Basic Syntax
Variables and Data Types
Operators
- 2 -
2
Conditional Statements & Looping
If, If- else, Nested if-else
For, While, Nested loops
Break, Continue, Pass
- 2 -
3
String Manipulation
Accessing Strings
Basic Operations
String slices
Functions and Methods
- 2 -
4
Lists, Tuple and Dictionaries
Lists – Introduction, Accessing lists, Operations, Working with
lists, Functions and Methods
Tuples – Introduction, Accessing tuples, Operations, Working with
tuples, Functions and Methods
Dictionaries - Introduction, Accessing values in dictionaries,
working with dictionaries, Properties, Functions
- 2 -
5
Functions
Defining a function
calling a function
Types of functions
Function Arguments
Anonymous functions
Global and local variables
- 3 -
6
Modules
Importing module
Math module
Random module
Packages
Composition
- 3 -
7
Input-Output
Printing on screen
Reading data from keyboard
Opening and closing file
Reading and writing files
Functions
- 3 -
8
Regular expressions
Match function, Search function
Matching VS Searching
Modifiers
Patterns
- 3 -
9
CGI
Introduction
Architecture
CGI environment variable
GET and POST methods
Cookies
File upload
- 2 -
10
Database
Introduction, Connections
Executing queries
Transactions
Handling errors
- 2 -
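The Database module (connections, executing queries, transactions, handling errors) can be illustrated with Python's standard-library `sqlite3` module. This is a minimal sketch under stated assumptions:

- The syllabus does not name a database engine. SQLite is used here only because it needs no server.
- The `students` table and its data are hypothetical.

Using the connection as a context manager commits on success and rolls back if an exception escapes the block.

```python
import sqlite3

# Connect to an in-memory SQLite database (no server or file needed).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE students (roll_no INTEGER PRIMARY KEY, name TEXT NOT NULL, marks INTEGER)")

# Parameterised queries (?) avoid SQL injection and quoting problems.
rows = [(1, "Asha", 78), (2, "Ravi", 85), (3, "Meera", 91)]  # hypothetical data
cur.executemany("INSERT INTO students VALUES (?, ?, ?)", rows)
conn.commit()  # make the inserts permanent

# Transaction: both statements succeed together or neither is applied.
try:
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("UPDATE students SET marks = marks + 5 WHERE roll_no = ?", (1,))
        # Duplicate primary key: raises sqlite3.IntegrityError
        conn.execute("INSERT INTO students VALUES (?, ?, ?)", (2, "Duplicate", 50))
except sqlite3.IntegrityError as err:
    print("Transaction rolled back:", err)

# The UPDATE was rolled back too, so roll number 1 keeps its original marks.
cur.execute("SELECT name, marks FROM students WHERE marks > ? ORDER BY marks DESC", (80,))
high_scorers = cur.fetchall()
first_student_marks = conn.execute("SELECT marks FROM students WHERE roll_no = 1").fetchone()
print(high_scorers)         # [('Meera', 91), ('Ravi', 85)]
print(first_student_marks)  # (78,)
conn.close()
```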
Prepared By
Ms. Punam Nikam Assistant Professor
Checked By
Ms. Pradnya Mahadik
BOS Chairman
Approved By
Dr. Sudhir Gavhane Dean, LASC
COURSE STRUCTURE
Course Code MIT-WPU-MBD-1106
Course Category Core Big Data Analytics
Course Title Lab on Hadoop using HDFS
Teaching Scheme and Credits
Weekly load hrs
L T Laboratory Credits
4 - - 3
Pre-requisites:
Basic knowledge of and experience with Java (JARs, arrays, classes, objects, etc.)
Course Objectives:
1. Learn tips and tricks for Big Data use cases and solutions.
2. Learn to build and maintain reliable, scalable, distributed systems with Apache Hadoop.
3. Learn to apply Hadoop ecosystem components.
Course Outcomes:
1. Students will learn tips and tricks for Big Data use cases and solutions.
2. They will be able to build distributed systems with Apache Hadoop.
3. They will be able to apply Hadoop ecosystem components.
Course Contents:
1. Introduction to big data: Introduction, distributed file system, Big Data and its importance,
Drivers, Big data analytics, Big data applications. Algorithms using MapReduce, Matrix-Vector
Multiplication by MapReduce.
2. Introduction to HADOOP: Big Data, Apache Hadoop & Hadoop Ecosystem, MapReduce,
Data Serialization.
3. HADOOP Architecture: Architecture, Storage, Task trackers, Hadoop Configuration
4. HADOOP ecosystem and YARN: Hadoop ecosystem components, Hadoop 2.0 new features:
NameNode High Availability, HDFS Federation, MRv2, YARN, Running MRv1 in YARN.
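Matrix-vector multiplication by MapReduce can be sketched in a single Python process. The sketch simulates the map, shuffle and reduce phases of the usual formulation:

- The matrix is stored as (i, j, m_ij) triples.
- The vector v is assumed small enough for every mapper to hold, for example by shipping it through the Distributed Cache.
- Map emits (i, m_ij * v_j).
- Reduce sums the products for each row i.

The function names and data are hypothetical, and this is not Hadoop API code.

```python
from collections import defaultdict

def map_phase(matrix_entries, v):
    """Map: for each matrix entry (i, j, m_ij) emit the key-value pair (i, m_ij * v[j])."""
    for i, j, m_ij in matrix_entries:
        yield i, m_ij * v[j]

def shuffle(pairs):
    """Shuffle/sort: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))

def reduce_phase(groups):
    """Reduce: x_i is the sum of all products emitted for row i."""
    return {i: sum(values) for i, values in groups.items()}

# Hypothetical sparse 3x3 matrix M as (row, column, value) triples; zero entries omitted.
M = [(0, 0, 2), (0, 2, 1), (1, 1, 3), (2, 0, 4), (2, 1, 1), (2, 2, 5)]
v = [1, 2, 3]

x = reduce_phase(shuffle(map_phase(M, v)))
print(x)  # {0: 5, 1: 6, 2: 21}
```

If v were too large to fit in memory, the usual approach is to split both the matrix and the vector into stripes. That variant is not shown here.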
Lab Assignments
1. Lab on Installing and configuring a Hadoop cluster
2. Lab on Manipulating files in HDFS using hadoop fs commands.
3. Lab on Manipulating files in HDFS programmatically using the FileSystem API. Alternative
Hadoop file systems: IBM GPFS, MapR-FS, Lustre, Amazon S3, etc.
4. Lab on Writing an inverted index MapReduce application with a custom Partitioner and
Combiner; custom types and composite keys; custom comparators; InputFormats and
OutputFormats; Distributed Cache; MapReduce design patterns: sorting and joins.
5. Lab on Writing a streaming MapReduce job in Python; YARN and Hadoop 2.0.
6. Lab on Exporting data from HDFS; other data integration tools: Flume, Kafka,
Informatica, Talend, etc.
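Labs 4 and 5 combine an inverted index with a streaming MapReduce job in Python. Hadoop Streaming runs the mapper and the reducer as separate executables. Each reads lines on standard input and writes tab-separated key/value lines to standard output, with the framework sorting by key in between. By default, the text up to the first tab is the key.

The sketch below keeps both functions in one file and uses `sorted()` as a local stand-in for the shuffle/sort. The input format `doc_id<TAB>text`, the function names and the sample documents are assumptions. The custom Partitioner and Combiner named in the lab are not shown.

```python
from itertools import groupby

def mapper(lines):
    """Map: each input line is 'doc_id<TAB>text'; emit 'word<TAB>doc_id' per word."""
    for line in lines:
        doc_id, _, text = line.rstrip("\n").partition("\t")
        for word in text.lower().split():
            yield f"{word}\t{doc_id}"

def reducer(sorted_lines):
    """Reduce: lines arrive sorted by key, so all lines for one word are adjacent.
    Emit 'word<TAB>doc1,doc2,...' (the posting list)."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        doc_ids = sorted({doc_id for _, doc_id in group})
        yield f"{word}\t{','.join(doc_ids)}"

if __name__ == "__main__":
    # Hypothetical input documents. Locally, sorted() stands in for Hadoop's shuffle/sort;
    # under Hadoop Streaming the mapper and reducer would run as separate processes
    # reading lines from standard input.
    docs = ["d1\thadoop stores big data", "d2\tspark and hadoop process data"]
    for out in reducer(sorted(mapper(docs))):
        print(out)
```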
Learning Resources:
Reference Books:
1. Boris Lublinsky, Kevin T. Smith, Alexey Yakubovich, “Professional Hadoop Solutions”,
Wiley, ISBN: 9788126551071, 2015.
2. Chris Eaton, Dirk deRoos et al., “Understanding Big Data”, McGraw Hill, 2012.
3. Tom White, “Hadoop: The Definitive Guide”, O’Reilly, 2012.
4. Donald Miner, Adam Shook, “MapReduce Design Patterns: Building Effective Algorithms and
Analytics for Hadoop”.
Supplementary Reading:
Weblinks:
https://cloudthat.in/course/processing-bigdata-with-apache-hadoop/
Pedagogy: Participative learning, discussions, algorithm, Flowchart & Program writing,
experiential learning through practical problem solving, assignment, PowerPoint presentation.
Assessment Scheme:
Laboratory Continuous Assessment (LCA) 50 Marks
Practical | Oral based on practical | Site Visit | Mini Project | Problem based Learning | Any other
10 | 20 | - | - | 20 | -
Term End Examination : 50 Marks
Syllabus:
Module
No. Contents
Workload in Hrs
Theory Lab Assess
1
1. Introduction to big data
Introduction – distributed file system – Big Data and its
importance, Four Vs, Drivers for Big data, Big data analytics, Big
data applications. Algorithms using map reduce, Matrix-Vector
Multiplication by Map Reduce.
11 - -
2
Introduction to HADOOP
Big Data, Apache Hadoop & Hadoop Ecosystem, Moving Data
in and out of Hadoop,
Understanding inputs and outputs of MapReduce, Data
Serialization.
11 - -
3
HADOOP Architecture
Hadoop Architecture, Hadoop Storage: HDFS, Common Hadoop
Shell commands, Anatomy of
File Write and Read, NameNode, Secondary NameNode, and
DataNode, Hadoop MapReduce
Paradigm, Map and Reduce tasks, Job, Task trackers - Cluster
Setup – SSH & Hadoop
Configuration – HDFS Administering – Monitoring &
Maintenance.
12 - -
4
HADOOP ecosystem and yarn
Hadoop ecosystem components - Schedulers - Fair and Capacity,
Hadoop 2.0 New Features: NameNode High Availability, HDFS
Federation, MRv2, YARN, Running MRv1 in YARN.
11 - -
Prepared By
Ms. Deepali Sonawane
Assistant Professor
Checked By
Ms. Pradnya Mahadik
BOS Chairman
Approved By
Dr. Sudhir Gavhane Dean, LASC
COURSE STRUCTURE
Course Code MIT-WPU-MBD-1201
Course Category Core Big Data Analytics
Course Title R Programming
Teaching Scheme and Credits
Weekly load hrs
L T Laboratory Credits
4 - - 3
Pre-requisites: Knowledge of any Programming Language
Course Objectives:
1. Understand the basics of R programming including objects, classes, vectors etc.
2. Write functions including generic functions using various methods and loops
3. Install various packages and work effectively in the R environment
4. Become proficient in writing fundamental programs and performing analytics with R
Course Outcomes:
Students will be able to:
1. Recognize and make appropriate use of different types of data structures
2. Use R to create sophisticated figures and graphs
3. Identify and implement appropriate control structures to solve a particular programming
problem
4. Design and write functions in R and implement simple iterative algorithms.
Course Contents:
Introduction to R
Overview of R programming, Evolution of R, Applications of R programming, Basic syntax
Basic Concepts of R
Reserved Words, Variables & Constants
Data structures in R
Vectors, Matrix
Control flow
if...else, ifelse() Function
Functions
R Functions, Function Return Value
Strings
String construction rules
R packages
Study of different packages in R
R Data Reshaping
Joining Columns and Rows in a Data Frame
Working with files
Reading from and writing to different types of files
R object and Class
Object and Class, R S3 Class, R S4 Class
Data visualization in R and Data Management
Bar Chart, Dot Plot
Statistical modelling and Databases in R
Mean, mode, median
Learning Resources:
Reference Books:
1. The Art of R Programming: A Tour of Statistical Software Design by Norman Matloff
2. R Cookbook: Proven Recipes for Data Analysis, Statistics, and Graphics (O'Reilly
Cookbooks) by Paul Teetor
3. R in Action by Rob Kabacoff
4. Practical Data Science with R by Nina Zumel, John Mount, Jim Porzak
5. Learning R: A Step-by-Step Function Guide to Data Analysis by Richard Cotton
Pedagogy:
Participative learning, discussions, algorithm, Flowchart & Program writing, experiential learning
through practical problem solving, assignment, PowerPoint presentation.
Assessment Scheme:
Class Continuous Assessment (CCA) 50 Marks
Assignments Test Presentations Case study MCQ Oral Attendance
10 10 - - 10 10 10
Term End Examination : 50 Marks
Syllabus:
Module
No. Contents
Workload in Hrs
Theory Lab Assess
1 Introduction to R:
Overview of R programming, Evolution of R, Applications of R
programming, Basic syntax
2 - -
2 Basic Concepts of R: Reserved Words, Variables & Constants
Operators, Operator Precedence, Data Types, Input and Output 4 - -
3 Data structures in R: Vectors, Matrix, List in R programming
Data Frame, Factor 5 - -
4 Control flow: if...else, ifelse() Function, for loop,
while loop, break & next, repeat loop 4 - -
5 Functions: R Functions, Function Return Value, Environment &
Scope, R Recursive Function, R Infix Operator, R Switch
Function.
4 - -
6 Strings: String construction rules, String Manipulation functions 3 - -
7 R packages: Study of different packages in R 2 - -
8 R Data Reshaping: Joining Columns and Rows in a Data Frame
Merging Data Frames, Melting and Casting 4 - -
9 Working with files: Reading from and writing to different types of files 2 - -
10 R Object and Class: Object and Class, R S3 Class, R S4 Class,
R Reference Class, R Inheritance 2 - -
11
Data visualization in R and Data Management: Bar Chart, Dot
Plot, Scatter Plot (3D), Spinning Scatter Plots, Pie Chart
Histogram (3D) [including colorful ones], Overlapping
Histograms, Boxplot, Plotting with Base and Lattice Graphics
Missing Value Treatment, Outlier Treatment, Sorting Datasets
Merging Datasets, Binning variables
7 - -
12 Statistical modelling and Databases in R: Mean, mode, median
Linear regression, Decision tree, K-means Clustering, RODBC
and DBI Package, Performing queries
6 - -
Prepared By
Preeti Adhav Lecturer
Checked By Pradnya Mahadik BOS Chairman
Approved By
Dr. Sudhir Gavhane Dean
COURSE STRUCTURE
Course Code MIT-WPU-BA-1202
Course Category Core Big Data Analytics
Course Title Distributed Processing of Data using Hadoop
Teaching Scheme and Credits
Weekly load hrs
L T Laboratory Credits
3 -- -- 3
Pre-requisites:
Basic knowledge of and experience with Java (JARs, arrays, classes, objects, etc.)
Course Objectives:
What Hadoop is and how it can help process large data sets.
How to write MapReduce programs using the Hadoop API.
How to use HDFS (the Hadoop Distributed File System), from the command line and API,
for effectively loading and processing data in Hadoop.
How to ingest data from an RDBMS or a data warehouse into Hadoop.
Best practices for building, debugging and optimizing Hadoop solutions.
An introduction to tools like Pig, Hive, HBase, Elastic MapReduce, etc., and how
they can help in Big Data projects.
Course Outcomes:
Understand Sqoop architecture and uses.
Able to load real-time data from an RDBMS table/query onto HDFS.
Able to write Sqoop scripts for exporting data from HDFS onto RDBMS tables.
Understand Apache Pig and the Pig data flow engine.
Understand data types, data model, and modes of execution.
Able to store the data from a Pig relation onto HDFS.
Able to load data into a Pig relation with or without a schema.
Able to split, join, filter, and transform data using Pig operators.
Able to write Pig scripts and work with UDFs.
Understand the importance of Hive and the Hive architecture.
Able to create managed, external, partitioned and bucketed tables.
Able to query the data and perform joins between tables.
Understand storage formats of Hive.
Understand vectorization in Hive.
Course Contents
Data Storage
The Hadoop Distributed File System (HDFS). Architecture of HDFS. Architectural assumptions
and goals. How data is stored in HDFS. How data is read from HDFS.
NameNodes and DataNodes.
Data Processing
Uses of MapReduce. Architecture of the MapReduce framework. Phases of a
MapReduce job. MapReduce design patterns. YARN architecture.
Data Integration
How to integrate Hadoop into an existing enterprise. Introduction to Sqoop.
Higher Level Tools
Workflows with Oozie. Introduction to Hive and its architecture. Data types and file formats.
How to create tables and load data. How to read and query data. Introduction to Pig: the
Grunt shell and Pig's data model. Introduction to HBase: architecture, client API,
and MapReduce integration.
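How data is stored in HDFS comes down to two steps: splitting a file into fixed-size blocks, then replicating each block. The calculation below assumes the Hadoop 2.x defaults of a 128 MB block size (`dfs.blocksize`) and a replication factor of 3 (`dfs.replication`). A real cluster may configure different values. The helper function is hypothetical.

```python
import math

def hdfs_footprint(file_size_mb, block_size_mb=128, replication=3):
    """Return (number of blocks, size of the last block in MB, raw cluster storage in MB).
    Defaults assume dfs.blocksize = 128 MB and dfs.replication = 3."""
    if file_size_mb <= 0:
        return 0, 0, 0
    num_blocks = math.ceil(file_size_mb / block_size_mb)
    # Only the last block can be partially full; it uses only as much disk as its data.
    last_block_mb = file_size_mb - (num_blocks - 1) * block_size_mb
    # Every block is stored `replication` times (by default on different DataNodes).
    raw_storage_mb = file_size_mb * replication
    return num_blocks, last_block_mb, raw_storage_mb

print(hdfs_footprint(500))  # (4, 116, 1500)
```

For a 500 MB file this gives three full blocks plus one 116 MB block. Each block is tracked by the NameNode and replicated three times, for 1500 MB of raw storage.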
Learning Resources:
Reference Books:
1. Hadoop: The Definitive Guide by Tom White.
2. MapReduce Design Patterns (Building Effective Algorithms & Analytics for Hadoop) by
Donald Miner & Adam Shook
3. Professional Hadoop Solutions by Boris Lublinsky, Kevin Smith, and Alexey Yakubovich
Weblinks:
https://cloudthat.in/course/processing-bigdata-with-apache-hadoop/
Pedagogy:
Participative learning, discussions, algorithm, Flowchart & Program writing, experiential learning
through practical problem solving, assignment, PowerPoint presentation
Assessment Scheme:
Class Continuous Assessment (CCA) 50 marks
Assignments Test Attendance Viva Presentation Any other
10 10 10 10 10 -
Term End Examination : 50 marks
Syllabus:
Module
No. Contents
Workload in Hrs
Theory Lab Assess
1
Data Storage
File System Abstraction
Big Data and Distributed File Systems
Hadoop Distributed File System (HDFS)
HDFS Architecture
Architectural assumptions and goals
How data is stored in HDFS
How data is read from HDFS
Namenodes and Datanodes
Blocks
Data Replication
Fault Tolerance
Data Integrity
Namespaces
Federation in Hadoop 2.0
High Availability in Hadoop 2.0
Security and Encryption
HDFS Interfaces: FileSystem API, FSShell, WebHDFS, Fuse
etc.
13 - -
2
Data Processing
MapReduce
The fundamentals: map() and reduce()
Data Locality
Architecture of the MapReduce framework.
Phases of a MapReduce Job
Custom types and Composite Keys
Custom Comparators
InputFormats and OutputFormats
Distributed Cache
MapReduce Design Patterns
Sorting
Joins
YARN and Hadoop 2.0
Separating resource management and processing
YARN Applications: MapReduce, Tez, HBase, Storm, Spark,
Giraph etc.
YARN Architecture, ResourceManager, NodeManagers
ApplicationMasters, Containers, Fault Tolerance
12 - -
3 Data Integration
Integrating Hadoop into your existing enterprise.
Introduction to Sqoop
10 - -
4
Higher Level Tools
Defining workflows with Oozie
An introduction to Hive
Architecture
Interfaces: Hive Shell, Thrift, JDBC, ODBC etc.
HiveQL: A dialect of SQL
Data Types and File Formats
Creating Tables and Loading Data
Schema at Read
Querying Data
User Defined Functions
An introduction to Pig
Grunt Shell
Pig's Data Model
Pig Latin
User Defined Functions
An introduction to HBase
Architecture
Client API
MapReduce Integration
Schema Design
10
-
-
Prepared By
Ms. Varsha Gholave Assistant Professor
Checked By
Ms. Pradnya Mahadik
BOS Chairman
Approved By
Dr. Sudhir Gavhane Dean, LASC
COURSE STRUCTURE
Course Code MIT-WPU-MBD-1203
Course Category Core Big Data Analytics
Course Title Operational Research
Teaching Scheme and Credits
Weekly load hrs
L T Laboratory Credits
3 1 -- 3
Pre-requisites:
1. Linear algebra
2. Probability and Statistics
Course Objectives:
To introduce the students to the use of basic methodology for the solution of linear programs
and integer programs.
To introduce the students to the advanced methods for large-scale transportation and
assignment problems.
Course Outcomes:
Define and formulate linear programming problems and appreciate their limitations.
Solve linear programming problems using appropriate techniques and optimization solvers,
interpret the results obtained and translate solutions into directives for action.
Conduct and interpret post-optimal and sensitivity analysis and explain the primal-dual
relationship.
Identify the special features of the transportation problem, and assignment problem.
Course Contents
Introduction to Operation Research
Brief introduction to optimization and the OR process. Descriptive vs. Simulation. Exact vs.
Heuristic techniques, Deterministic vs. Stochastic models.
LPP and Methods to solve LPP
Duality Theory and applications. Dual Simplex method. Sensitivity analysis in L.P., Parametric
Programming. Transportation, assignment and least cost transportation. Interior point methods:
scaling techniques, log barrier methods. Dual and primal-dual extensions.
Non-Linear programming
Kuhn-Tucker conditions. Convex functions and convex regions. Convex programming
problems. Algorithms for solving convex programming problems.
PERT and CPM
Basic differences between PERT and CPM. Arrow networks, time estimates, earliest
expected time. Representation in tabular form, critical path. Probability of meeting the scheduled
date of completion.
Calculations on a CPM network, various floats for activities. Critical path, updating projects.
Project time-cost trade-off curve. Selection of a schedule based on cost.
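The PERT/CPM calculations listed above are small enough to sketch in Python:

- three-time estimates
- forward and backward passes
- total float
- critical path
- probability of meeting a scheduled date

Two assumptions apply. For brevity, the network is given as an activity list with predecessors (activity-on-node) rather than an arrow diagram. The project data and the scheduled date of 19 are hypothetical. The probability step uses the usual PERT normal approximation.

```python
import math

# Hypothetical project: activity -> (predecessors, optimistic a, most likely m, pessimistic b).
# Activities are listed so that every predecessor appears before its successors.
activities = {
    "A": ([], 2, 4, 6),
    "B": (["A"], 3, 5, 7),
    "C": (["A"], 1, 2, 3),
    "D": (["B", "C"], 4, 6, 14),
}
order = list(activities)

# PERT expected time te = (a + 4m + b) / 6 and variance ((b - a) / 6)^2.
te = {k: (a + 4 * m + b) / 6 for k, (_, a, m, b) in activities.items()}
var = {k: ((b - a) / 6) ** 2 for k, (_, a, m, b) in activities.items()}

# Forward pass: earliest start (ES) and earliest finish (EF).
es, ef = {}, {}
for k in order:
    es[k] = max((ef[p] for p in activities[k][0]), default=0)
    ef[k] = es[k] + te[k]
project_time = max(ef.values())

# Backward pass: latest finish (LF) and latest start (LS).
successors = {k: [s for s in order if k in activities[s][0]] for k in order}
lf, ls = {}, {}
for k in reversed(order):
    lf[k] = min((ls[s] for s in successors[k]), default=project_time)
    ls[k] = lf[k] - te[k]

# Total float = LS - ES; activities with zero total float lie on the critical path.
total_float = {k: ls[k] - es[k] for k in order}
critical_path = [k for k in order if abs(total_float[k]) < 1e-9]

# Probability of meeting a scheduled date Ts: Z = (Ts - Te) / sigma, where sigma^2 is the
# sum of variances along the critical path and Z is treated as standard normal.
sigma = math.sqrt(sum(var[k] for k in critical_path))
scheduled_date = 19
z = (scheduled_date - project_time) / sigma
probability = 0.5 * (1 + math.erf(z / math.sqrt(2)))

print(project_time, critical_path)  # 16.0 ['A', 'B', 'D']
print(round(probability, 3))
```

Activity C has a total float of 3 time units, so it can slip by that much without delaying the project.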
Network Flow Problem
Formulation, Max-Flow Min-Cut theorem. Ford and Fulkerson’s algorithm. Exponential
behavior of Ford and Fulkerson’s algorithm.
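Below is a sketch of the Ford and Fulkerson method in Python, together with the Max-Flow Min-Cut theorem. Augmenting paths are chosen by breadth-first search, which is the Edmonds-Karp variant and is a deliberate choice. With arbitrary path choices on networks with large integer capacities, the number of augmentations can grow exponentially in the input size. Shortest-path selection bounds it polynomially. The network is hypothetical.

```python
from collections import deque

def max_flow(capacity, source, sink):
    """Ford-Fulkerson method with breadth-first search for augmenting paths
    (the Edmonds-Karp variant). capacity[u][v] is the capacity of edge u -> v."""
    # Residual graph: start from the capacities and add reverse edges with capacity 0.
    residual = {u: dict(edges) for u, edges in capacity.items()}
    for u, edges in capacity.items():
        for v in edges:
            residual.setdefault(v, {}).setdefault(u, 0)

    flow = 0
    while True:
        # BFS finds a shortest augmenting path in the residual graph.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow, residual  # no augmenting path left, so the flow is maximum

        # Walk back from the sink to collect the path edges and their bottleneck.
        path = []
        v = sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[u][v] for u, v in path)

        # Augment: use up forward residual capacity, add reverse (undo) capacity.
        for u, v in path:
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
        flow += bottleneck

def min_cut_source_side(residual, source):
    """Vertices still reachable from the source in the final residual graph."""
    seen, stack = {source}, [source]
    while stack:
        u = stack.pop()
        for v, cap in residual.get(u, {}).items():
            if cap > 0 and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

# Hypothetical network: capacities of directed edges.
network = {
    "s": {"a": 10, "b": 5},
    "a": {"b": 15, "t": 5},
    "b": {"t": 10},
}
flow, residual = max_flow(network, "s", "t")
cut = min_cut_source_side(residual, "s")
# Capacity of the edges leaving the cut equals the maximum flow (Max-Flow Min-Cut).
cut_capacity = sum(c for u in cut for v, c in network.get(u, {}).items() if v not in cut)
print(flow, cut, cut_capacity)  # 15 {'s'} 15
```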
Learning Resources:
Reference Books:
1. Hadley G. (1969): Linear Programming, Addison-Wesley.
2. Taha H. A. (1971): Operations Research: An Introduction, Macmillan, N.Y.
3. Kanti Swarup, Gupta and Man Mohan (1985): Operations Research, Sultan
Chand and Co.
4. Sharma J. K. (2003): Operations Research: Theory and Applications, 2nd Ed.,
Macmillan India Ltd.
5. Sharma J. K. (1986): Mathematical Models in Operations Research, McGraw Hill.
Pedagogy:
Participative learning, discussions, algorithm, Flowchart & Program writing, experiential learning
through practical problem solving, assignment, PowerPoint presentation
Assessment Scheme:
Class Continuous Assessment (CCA) 50 marks
Assignments Test Attendance Viva Presentation Any other
10 10 10 10 10 -
Term End Examination : 50 marks
Syllabus:
Module
No. Contents
Workload in Hrs
Theory Lab Assess
1
Introduction to Operation Research
The nature of O.R., History, Meaning, Models, Principles
Problem solving with mathematical models. Optimization and the
OR process. Descriptive vs. Simulation. Exact vs. Heuristic
techniques, Deterministic vs. Stochastic models.
5 - -
2
LPP and Methods to solve LPP
Linear Programming: Introduction. Graphical Solution and
Formulation of L.P. Models. Simplex Method (Theory and
Computational aspects), Revised Simplex. Duality Theory and
applications. Dual Simplex method. Sensitivity analysis in L.P.,
Parametric Programming. Transportation, assignment and least
cost transportation. Interior point methods: scaling techniques,
log barrier methods. Dual and primal-dual extensions.
10 - -
3
Non-Linear programming
Kuhn-Tucker conditions. Convex functions and convex regions.
Convex programming problems. Algorithms for solving convex
programming problems.
10 - -
4
PERT and CPM
Basic differences between PERT and CPM. Arrow Networks,
time estimates, Earliest expected time. Latest allowable
occurrence time. Forward Pass Computation, Backward Pass
Computation. Representation in Tabular Form, Critical Path.
Probability of meeting scheduled date of completion. Calculations
on a CPM network, Various floats for activities. Critical path,
updating projects. Project time-cost trade-off curve.
Selection of schedule based on cost.
10 - -
5
Network Flow Problem
Formulation, Max-Flow Min-Cut theorem. Ford and Fulkerson’s
algorithm. Exponential behavior of Ford and Fulkerson’s
algorithm.
10 - -
Prepared By
Ms. Varsha Ghule
Assistant Professor
Approved By
Dr. Sudhir Gavhane Dean, LASC
Checked By
Pradnya Mahadik
BOS Chairman
COURSE STRUCTURE
Course Code MIT-WPU-MBD-1204
Course Category Core Big Data Analytics
Course Title Next Generation Databases (NoSQL Databases)
Teaching Scheme and Credits
Weekly load hrs
L T Laboratory Credits
4 - - 3
Pre-requisites:
Knowledge of RDBMS
Course Objectives:
1. To study the usage and applications of object-oriented databases
2. To acquire knowledge of a variety of NoSQL databases
3. To develop an inquisitive attitude towards research topics in NoSQL databases
Course Outcomes:
1: Master the basics of SQL, construct queries using PL/SQL efficiently, and apply object-oriented
features for developing database applications.
2: Compare and contrast NoSQL databases with each other and with relational database
systems.
3: Critically analyse and evaluate a variety of NoSQL databases.
4: Demonstrate the knowledge of Key-Value databases, Document based Databases,
Column based Databases and Graph Databases.
Course Contents:
1. Introduction to NOSQL
Definition of NOSQL, History of NOSQL and Different NOSQL products, Exploring MongoDB with
Java/Ruby/Python, Interfacing and Interacting with NOSQL
2. NOSQL Basics
NOSQL Storage Architecture, CRUD operations with MongoDB, Querying, Modifying and
Managing NOSQL Data stores, Indexing and ordering datasets (MongoDB/CouchDB/Cassandra)
3. Advanced NOSQL
NOSQL in CLOUD, Parallel Processing with Map Reduce, Big Data with Hive
4. Working with NOSQL
Surveying Database Internals, Migrating from RDBMS to NOSQL, Web Frameworks and NOSQL,
using MySQL as a NOSQL database
5. Developing Web Application with NOSQL and NOSQL Administration
PHP and MongoDB, Python and MongoDB, Creating a Blog Application with PHP, NOSQL
Database Administration
Learning Resources:
Reference Books:
Dan Sullivan, "NoSQL for Mere Mortals", 1st Edition, Pearson Education, 2015. (ISBN-13:
978-9332557338)
Supplementary Reading:
Pramod J. Sadalage, Martin Fowler, "NoSQL Distilled: A Brief Guide to the Emerging
World of Polyglot Persistence", 1st Edition, Pearson Education, 2012. (ISBN-13:
978-8131775691)
Pedagogy: Participative learning, discussions, Problem Solving, experiential learning through
practical problem solving, assignment, PowerPoint presentation
Assessment Scheme:
Class Continuous Assessment (CCA) 50 Marks
Assignments Test Presentations Case study MCQ Oral Attendance
20 10 10 - - 10
Term End Examination : 50 Marks
Syllabus:
Module
No. Contents
Workload in Hrs
Theory Lab Assess
1
Introduction to NOSQL
Definition of NOSQL, History of NOSQL and Different NOSQL
products, Exploring MongoDB with Java/Ruby/Python, Interfacing
and Interacting with NOSQL
6 - -
2
NOSQL Basics
NOSQL Storage Architecture, CRUD operations with MongoDB,
Querying, Modifying and Managing NOSQL Data stores,
Indexing and ordering datasets (MongoDB/CouchDB/Cassandra)
12 - 1
3
Advanced NOSQL
NOSQL in CLOUD, Parallel Processing with Map Reduce, Big
Data with Hive
8 - 1
4
Working with NOSQL
Surveying Database Internals, Migrating from RDBMS to
NOSQL, Web Frameworks and NOSQL, using MySQL as a
NOSQL database
9 - 1
5
Developing Web Application with NOSQL and NOSQL
Administration
PHP and MongoDB, Python and MongoDB, Creating a Blog
Application with PHP, NOSQL Database Administration
10 - 1
Prepared By
Ms. Smita Patil
Assistant Professor
Checked By
Ms. Pradnya Mahadik
BOS Chairman
Approved By
Dr. Sudhir Gavhane Dean, LASC
COURSE STRUCTURE
Course Code MIT-WPU-MBD-1205
Course Category Core Big Data Analytics
Course Title Lab on R Programming
Teaching Scheme and Credits
Weekly load hrs
L T Laboratory Credits
- - 3 3
Pre-requisites: Knowledge of any Programming Language
Course Objectives:
1. Understand the basics of R programming including objects, classes, vectors etc.
2. Write functions including generic functions using various methods and loops
3. Install various packages and work effectively in the R environment
4. Become proficient in writing fundamental programs and performing analytics with R
Course Outcomes:
Students will be able to:
1. Recognize and make appropriate use of different types of data structures
2. Use R to create sophisticated figures and graphs
3. Identify and implement appropriate control structures to solve a particular programming
problem
4. Design and write functions in R and implement simple iterative algorithms.
Course Contents:
Basic Concepts of R: Variables, constants, Operators, datatypes, input output
Data structures in R:
Vectors, Matrix, List, Data Frame/ Factor
Control flow:
Decision making, Repeat, while, for
Functions: built-in, user defined
R packages, R Data Reshaping: Joining Columns and Rows in a Data Frame, Merging Data
Frames
Working with files, R object and Class: csv, excel, S3 and S4 Class, reference
Data visualization in R and Data Management: Bar Chart, Dot Plot, Scatter Plot (3D), Spinning
Scatter Plots, Pie Chart, Histogram (3D) [including colorful ones], Overlapping Histograms,
Boxplot, Plotting with Base and Lattice Graphics, Missing Value Treatment, Outlier Treatment,
Sorting Datasets, Merging Datasets, Binning variables
Statistical modelling and Databases in R: Mean, mode, median, Linear regression,
Decision tree, K-means Clustering
Laboratory Exercises / Practical:
1. Assignments on Basic Concepts of R
2. Assignments on Data structures in R
3. Assignments on Control flow
4. Assignments on Functions
5. Assignments on R packages, R Data Reshaping
6. Assignments on Working with files, R object and Class
7. Assignments on Data visualization in R and Data Management
8. Assignments on Statistical modelling and Databases in R
Learning Resources:
Reference Books:
1. The Art of R Programming: A Tour of Statistical Software Design by Norman Matloff
2. R Cookbook: Proven Recipes for Data Analysis, Statistics, and Graphics (O'Reilly
Cookbooks) by Paul Teetor
3. R in Action by Rob Kabacoff
4. Practical Data Science with R by Nina Zumel, John Mount, Jim Porzak
5. Learning R: A Step-by-Step Function Guide to Data Analysis by Richard Cotton
Pedagogy:
Participative learning, discussions, algorithm, Flowchart & Program writing, experiential learning
through practical problem solving, assignment, PowerPoint presentation.
Assessment Scheme:
Class Continuous Assessment (CCA) 50
Assignments Test Presentations Case study MCQ Oral Attendance
10 10 - - 10 10 10
Term End Examination : 50 Marks External
Laboratory Continuous Assessment (LCA) 50
Practical | Oral based on practical | Site Visit | Mini Project | Problem based Learning | Attendance
10 | 10 | - | 10 | 10 | 10
Term End Examination : 50 Marks External
Syllabus:
Module
No. Contents
Workload in Hrs
Theory Lab Assess
1 Basic Concepts of R: Variables, constants, Operators,
datatypes, input output - 3 -
2 Data structures in R:
Vectors, Matrix, List, Data Frame/ Factor - 3 -
3 Control flow:
Decision making, Repeat, while, for - 3 -
4 Functions: built-in, user defined - 3 -
5 R packages, R Data Reshaping: Joining Columns and Rows in a
Data Frame, Merging Data Frames - 3 -
6 Working with files, R object and Class: csv, excel, S3 and S4
Class, reference
- 3 -
7
Data visualization in R and Data Management: Bar Chart, Dot
Plot, Scatter Plot (3D), Spinning Scatter Plots, Pie Chart,
Histogram (3D) [including colorful ones], Overlapping
Histograms, Boxplot, Plotting with Base and Lattice Graphics,
Missing Value Treatment, Outlier Treatment, Sorting Datasets,
Merging Datasets, Binning variables
- 3 -
8
Statistical modelling and Databases in R: Mean, mode,
median, Linear regression, Decision tree, K-means
Clustering
- 3 -
Prepared By
Preeti Adhav Lecturer
Checked By Pradnya Mahadik BOS Chairman
Approved By
Dr. Sudhir Gavhane Dean
COURSE STRUCTURE
Course Code MIT-WPU-BA-1206
Course Category Core Big Data Analytics
Course Title Lab on Hadoop and Databases
Teaching Scheme and Credits
Weekly load hrs
L T Laboratory Credits
-- -- 3 3
Pre-requisites:
Basic knowledge of and experience with Java (JARs, arrays, classes, objects, etc.)
Course Objectives:
What Hadoop is and how it can help process large data sets.
How to write MapReduce programs using the Hadoop API.
How to use HDFS (the Hadoop Distributed File System), from the command line and API,
for effectively loading and processing data in Hadoop.
How to ingest data from an RDBMS or a data warehouse into Hadoop.
Best practices for building, debugging and optimizing Hadoop solutions.
An introduction to tools like Pig, Hive, HBase, Elastic MapReduce, etc., and how
they can help in Big Data projects.
Course Outcomes:
Understand Sqoop architecture and uses.
Able to load real-time data from an RDBMS table/query onto HDFS.
Able to write Sqoop scripts for exporting data from HDFS onto RDBMS tables.
Understand Apache Pig and the Pig data flow engine.
Understand data types, data model, and modes of execution.
Able to store the data from a Pig relation onto HDFS.
Able to load data into a Pig relation with or without a schema.
Able to split, join, filter, and transform data using Pig operators.
Able to write Pig scripts and work with UDFs.
Understand the importance of Hive and the Hive architecture.
Able to create managed, external, partitioned and bucketed tables.
Able to query the data and perform joins between tables.
Understand storage formats of Hive.
Understand vectorization in Hive.
Course Contents
Data Storage
The Hadoop Distributed File System (HDFS). Architecture of HDFS. Architectural assumptions
and goals. How data is stored in HDFS. How data is read from HDFS.
NameNodes and DataNodes.
Data Processing
Uses of MapReduce. Architecture of the MapReduce framework. Phases of a
MapReduce job. MapReduce design patterns. YARN architecture.
Data Integration
How to integrate Hadoop into an existing enterprise. Introduction to Sqoop.
Higher Level Tools
Workflows with Oozie. Introduction to Hive and its architecture. Data types and file formats.
How to create tables and load data. How to read and query data. Introduction to Pig: the
Grunt shell and Pig's data model. Introduction to HBase: architecture, client API,
and MapReduce integration.
Lab Assignments
1. Lab on Manipulating files in HDFS programmatically using the FileSystem API. Alternative
Hadoop file systems: IBM GPFS, MapR-FS, Lustre, Amazon S3, etc.
2. Lab on Writing an inverted index MapReduce application with a custom Partitioner and
Combiner; custom types and composite keys; custom comparators; InputFormats and
OutputFormats; Distributed Cache; MapReduce design patterns: sorting and joins.
3. Lab on Writing a streaming MapReduce job in Python; YARN and Hadoop 2.0.
4. Lab on Importing data from an RDBMS to HDFS using Sqoop.
5. Lab on Exporting data from HDFS; other data integration tools: Flume, Kafka,
Informatica, Talend, etc.
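Lab 2 calls for a custom Partitioner, which decides which reducer receives each key. The Python sketch below contrasts two approaches:

- Hash partitioning, using the formula of Hadoop's default `HashPartitioner`, `(hashCode & Integer.MAX_VALUE) % numReduceTasks`.
- A hypothetical range partitioner that keeps keys alphabetically grouped, which supports the sorting pattern.

Java's `String.hashCode()` is reproduced as an assumed hash. Hadoop's `Text` keys hash their bytes differently, so real partition numbers on a cluster may differ. All function names are hypothetical, and this is not Hadoop API code.

```python
def java_string_hash(s):
    """Java's String.hashCode(): s[0]*31^(n-1) + ... + s[n-1] in 32-bit signed arithmetic.
    Assumes characters in the Basic Multilingual Plane (one UTF-16 unit each)."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF        # keep only the low 32 bits
    return h - (1 << 32) if h >= (1 << 31) else h  # reinterpret as a signed int

def hash_partition(key, num_reducers):
    """Same formula as Hadoop's default HashPartitioner:
    (hashCode & Integer.MAX_VALUE) % numReduceTasks, which is never negative."""
    return (java_string_hash(key) & 0x7FFFFFFF) % num_reducers

def range_partition(key, num_reducers):
    """Hypothetical custom partitioner: split keys into contiguous alphabetical ranges by
    first letter, so concatenating reducer outputs in order gives globally sorted keys."""
    first = key[:1].lower()
    if not ("a" <= first <= "z"):
        return 0  # empty keys and non-letters all go to the first reducer
    return (ord(first) - ord("a")) * num_reducers // 26

words = ["apple", "hadoop", "mapreduce", "node", "yarn", "zookeeper"]
for w in words:
    print(w, hash_partition(w, 2), range_partition(w, 2))
```

With two reducers, the range partitioner sends keys starting with a to m to reducer 0 and n to z to reducer 1. Hash partitioning spreads keys more evenly but loses the global order.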
Learning Resources:
Dr. Sudhir Gavhane
Dean, LASC
Syllabus:
Module
No. Contents
Workload in Hrs
Theory Lab Assess
Data Storage
File System Abstraction
Big Data and Distributed File Systems
Hadoop Distributed File System (HDFS)
HDFS Architecture,Architectural assumptions and goals
How data is stored in HDFS
How data is read from HDFS
Namenodes and Datanodes
Blocks,Data Replication
Fault Tolerance
Data Integrity Namespaces
Federation in Hadoop 2.0
High Availability in Hadoop 2.0
Security and Encryption
HDFS Interfaces: FileSystem API, FSShell, WebHDFS,
Fuse etc.
- 13 -
Reference Books:
1. The Definitive Guide by Tom White.
2. MapReduce Design Patterns (Building Effective Algorithms & Analytics for Hadoop) by
Donald Miner & Adam Shook
3. Professional Hadoop Solutions by Boris Lublinksy, Kevin Smith, and Alexey Yakubovich
Weblinks:
https://cloudthat.in/course/processing-bigdata-with-apache-hadoop/
Pedagogy:
Participative learning, discussions, algorithm, Flowchart & Program writing, experiential learning
through practical problem solving, assignment, PowerPoint presentation
Assessment Scheme:
Class Continuous Assessment (CCA) 50 marks
Practical: 15, Viva: 10, Attendance: 15, Mini Project: 10, Any other: -
Term End Examination : 50 marks
2
Data Processing
MapReduce
The fundamentals: map() and reduce()
Data Locality
Architecture of the MapReduce framework.
Phases of a MapReduce Job
Custom types and Composite Keys
Custom Comparators
InputFormats and OutputFormats
Distributed Cache
MapReduce Design Patterns
Sorting, Joins
YARN and Hadoop 2.0
Separating resource management and processing
YARN Applications: MapReduce, Tez, HBase, Storm,
Spark, Giraph etc.
YARN Architecture
ResourceManager
NodeManagers
ApplicationMasters
Containers, Fault Tolerance
12 -
3
Data Integration
Integrating Hadoop into your existing enterprise.
Introduction to Sqoop
- 10 -
4
Higher Level Tools
Defining workflows with Oozie
An introduction to Hive
Architecture
Interfaces: Hive Shell, Thrift, JDBC, ODBC etc.
HiveQL: A dialect of SQL
Data Types and File Formats
Creating Tables and Loading Data
Schema on Read
Querying Data
User Defined Functions
An introduction to Pig
Grunt Shell
Pig's Data Model
Pig Latin
User Defined Functions
An introduction to HBase
Architecture
Client API
MapReduce Integration
Schema Design
- 10
Prepared By
Ms. Varsha Ghule
Assistant Professor
Approved By
Dr. Sudhir Gavhane Dean, LASC
Checked By
Pradnya Mahadik
BOS Chairman
COURSE STRUCTURE
Course Code MIT-WPU-MBD-1301
Course Category Core Big Data Analytics
Course Title Statistical Computing
Teaching Scheme and Credits
Weekly load hrs
L T Laboratory Credits
3 3
Pre-requisites:
1. Linear algebra
2. Probability and Statistics
Course Objectives:
1. To provide an understanding of concepts and techniques of Business Statistics
2. To learn how to use Excel, Python or R to solve business statistics problems
3. To learn experimental design
Course Outcomes:
1. The student should be able to formulate and solve problems related to topics covered in this course.
2. The student should be able to solve the problems using Python or R
3. The student should be able to perform statistical analysis on a variety of data.
Course Contents:
1. Data and Statistics
2. Descriptive Statistics: Tabular and Graphical Presentations
3. Descriptive Statistics: Numerical Measures
4. Probability
5. Discrete Probability Distributions
6. Continuous Probability Distribution
7. Sampling and Sampling Distributions
8. Interval Estimation
9. Fundamentals of Hypothesis Testing
10. Two-Sample Tests
11. Inferences about Population Variances
12. Tests of Goodness of Fit and Independence
13. Experimental Design and ANOVA
14. Simple Linear Regression
Laboratory Exercises / Practical:
1. Discrete Probability Distributions
2. Continuous Probability Distribution
3. Sampling and Sampling Distributions
4. Interval Estimation
5. Fundamentals of Hypothesis Testing
6. Two-Sample Tests
7. Inferences about Population Variances
8. Tests of Goodness of Fit and Independence
9. Experimental Design and ANOVA
10. Simple Linear Regression
Learning Resources:
Reference Books:
Text Book: David R. Anderson, Dennis J. Sweeney, Thomas A. Williams, Jeffrey D. Camm and
James J. Cochran, Statistics for Business and Economics, 12th Edition, Cengage Learning, 2014
(a newer 13th edition has recently been published but is not yet widely available).
Pedagogy: Participative learning, discussions, demonstrations, practical, assignment
Assessment Scheme:
Class Continuous Assessment (CCA)
Assignments: 10%, Test: 10%, Presentations: 10%, Case study: 10%, Attendance: 10%
Term End Examination : 50%
Syllabus:
Module No. | Contents | Workload in Hrs (Theory | Lab | Assess)
1
Data and Statistics: Applications in Business and
Economics, Data, Data Sources, Descriptive Statistics,
Statistical Inference, Computers and Statistical Analysis,
Data Mining, and Ethical Guidelines for Statistical
Practice (Self Study)
4
2
Descriptive Statistics: Tabular and Graphical
Presentations, Summarizing Qualitative Data,
Summarizing Quantitative Data, Cross Tabulation and
Scatter Diagrams, Data Visualization Practices (Self
Study)
4
3
Descriptive Statistics: Numerical Measures
Measures of Location
Measures of Variability
Measures of Shape, Relative Location and Detecting Outliers
Exploratory Data Analysis
Measures of Association between Two Variables
Data Dashboards (Self Study)
4
4
Probability
Basic Probability Concepts
Conditional Probability
Bayes’ Theorem
4
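Bayes' theorem, P(H | E) = P(E | H) P(H) / P(E), with P(E) expanded by the law of total probability, can be checked with exact fractions. The screening-test figures below (1% prevalence, 95% sensitivity, 5% false-positive rate) are illustrative assumptions, not course data, and `posterior` is a hypothetical helper name.

```python
from fractions import Fraction


def posterior(prior, p_evidence_given_h, p_evidence_given_not_h):
    """Return P(H | E) via Bayes' theorem and the law of total probability."""
    p_evidence = p_evidence_given_h * prior + p_evidence_given_not_h * (1 - prior)
    return p_evidence_given_h * prior / p_evidence


# Hypothetical screening test: P(D) = 0.01, P(+ | D) = 0.95, P(+ | not D) = 0.05
p = posterior(Fraction(1, 100), Fraction(95, 100), Fraction(5, 100))
print(p, float(p))  # 19/118, roughly 0.161
```

Even with a fairly accurate test, a positive result here implies only about a 16% chance of disease, because the condition is rare.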
5
Discrete Probability Distributions
Probability Distribution for a Discrete Random
Variable
Properties: Expectation, Variance
Binomial Distribution
Poisson Distribution
Hypergeometric Distribution
Discrete Bivariate Distributions: Covariance and Financial
Portfolios
4
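A short sketch of the binomial and Poisson probability functions, assuming the standard formulas P(X = k) = C(n, k) p^k (1 - p)^(n - k) and P(X = k) = e^(-λ) λ^k / k!. It also computes the expectation and variance directly from the distribution so they can be compared with the closed forms np and np(1 - p). The parameter values are arbitrary examples.

```python
import math


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)


n, p = 10, 0.3
probs = [binomial_pmf(k, n, p) for k in range(n + 1)]
mean = sum(k * q for k, q in enumerate(probs))
variance = sum((k - mean) ** 2 * q for k, q in enumerate(probs))
# Expectation and variance should match np = 3 and np(1 - p) = 2.1.
print(round(mean, 6), round(variance, 6))  # 3.0 2.1
```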
6
Continuous Probability Distribution
Uniform Probability Distributions
Normal Probability Distribution
Normal Approximation to Binomial Probabilities
Exponential Probability Distribution
4
7
Sampling and Sampling Distributions
Simple Random Sampling
Point Estimation
Introduction to Sampling Distribution
Sampling Distribution of the Mean
Sampling Distribution of Proportion
Properties of Point Estimators
Other Sampling Methods
4
8
Interval Estimation
Confidence Interval Estimation for the Mean (σ known)
Confidence Interval Estimation for the Mean (σ unknown)
Determining Sample Size
Confidence Interval Estimation for the Proportion
4
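For the σ-known case, the interval is x̄ ± z_{α/2} σ / √n, and the sample size needed for a margin of error E is n = (z_{α/2} σ / E)², rounded up. The sketch below uses only the standard library (`statistics.NormalDist`, Python 3.8+). The σ-unknown case needs t critical values, which the standard library does not provide, so it is not shown. Function names are illustrative.

```python
import math
from statistics import NormalDist, fmean


def z_interval(data, sigma, confidence=0.95):
    """Confidence interval for the mean when the population sigma is known."""
    n = len(data)
    x_bar = fmean(data)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    margin = z * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin


def sample_size_for_mean(sigma, margin_of_error, confidence=0.95):
    """Smallest n whose margin of error does not exceed the requested value."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / margin_of_error) ** 2)
```

For example, with σ = 15 and a desired margin of 3 at 95% confidence, `sample_size_for_mean(15, 3)` gives 97 (the unrounded value is about 96.04).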
9
Fundamentals of Hypothesis Testing
Hypothesis Testing Methodology
Z test of Hypothesis for the Mean (σ known)
t test of Hypothesis for the Mean (σ unknown)
Z test of Hypothesis for the Proportion
Decision Making, Probability of Type-II Errors,
Sample Size Determination
4
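The Z test for a mean with σ known computes z = (x̄ − μ₀) / (σ / √n) and converts it to a p-value using the standard normal distribution. The sketch below is a minimal standard-library version. The `alternative` argument names are an assumption chosen for readability.

```python
import math
from statistics import NormalDist


def z_test_mean(x_bar, mu0, sigma, n, alternative="two-sided"):
    """Z test for a population mean with known sigma; returns (z, p_value)."""
    z = (x_bar - mu0) / (sigma / math.sqrt(n))
    std_normal = NormalDist()
    if alternative == "two-sided":
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    elif alternative == "greater":
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "less":
        p_value = std_normal.cdf(z)
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return z, p_value


# Hypothetical sample: x_bar = 103, H0: mu = 100, sigma = 10, n = 36
z, p = z_test_mean(103, 100, 10, 36)
```

Here z = 1.8 and the two-sided p-value is about 0.072, so H₀ would not be rejected at α = 0.05 but would be rejected at α = 0.10.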
10
Two-Sample Tests
Comparing Means of Two Independent Populations
Comparing Means of Two Related Populations
Comparing Two Population Proportions
4
11
Inferences about Population Variances
Inferences about a Population Variance
Inferences about Two Population Variances
4
12
Tests of Goodness of Fit and Independence
Test the Equality of Three or More Population Proportions
Test of Independence
Goodness of Fit Test: A Multinomial Population
(Self Study)
4
13
Experimental Design and ANOVA
An Introduction
ANOVA and the Completely Randomized Design
Multiple Comparison Procedure
Randomized Block Design and Factorial Experiment (Self Study)
4
14
Simple Linear Regression
Simple Linear Regression Model
Least Squares method
Coefficient of Determination
Model Assumptions
Testing for Significance
Computer Solution
Residual Analysis (Self Study)
4
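The least squares method gives b₁ = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b₀ = ȳ − b₁x̄. The coefficient of determination is r² = 1 − SSE/SST. The sketch below computes all three from first principles. The function name and sample data are illustrative.

```python
from statistics import fmean


def least_squares(x, y):
    """Fit y = b0 + b1 * x by ordinary least squares; return (b0, b1, r_squared)."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # Coefficient of determination: r^2 = 1 - SSE / SST
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return b0, b1, 1 - sse / sst


b0, b1, r2 = least_squares([1, 2, 3, 4], [2, 4, 5, 4])  # b0 = 2.0, b1 = 0.7
```

On Python 3.10 and later, `statistics.linear_regression(x, y)` returns the same slope and intercept, which is a convenient cross-check for the "Computer Solution" topic.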
Checked By Ms. Pradnya Mahadik BOS Chairman
Approved by Dr. Sudhir Gavhane Dean, LASC
Prepared by Ms. Pradnya Mahadik Assistant Professor
COURSE STRUCTURE
Course Code MIT-WPU-MBD-1302
Course Category Core BigData Analytics
Course Title Information Security
Teaching Scheme and Credits
Weekly load hrs
L T Laboratory Credits
4 - - 3
Pre-requisites:
Basic concepts of Networking.
Course Objectives:
1. To provide an understanding of the principal concepts, major issues, technologies and basic
approaches in information security.
2. To develop a basic understanding of cryptography, how it has evolved, and some key encryption
techniques used today.
Course Outcomes:
The students will have a firm understanding of:
1. Basic concepts related to network and system level security.
2. Basics of cryptography, security management and network security techniques.
3. Information security governance, and related legal and regulatory issues
4. How threats to an organization are discovered, analyzed, and dealt with.
Course Contents:
UNIT - I Security Attacks (Interruption, Interception, Modification and Fabrication), Security Services
UNIT - II Conventional Encryption Principles, Conventional encryption algorithms
UNIT - III Public key cryptography principles, public key cryptography algorithms
UNIT - IV Email privacy: Pretty Good Privacy (PGP) and S/MIME.
UNIT - V IP Security Overview, IP Security Architecture
UNIT - VI Web Security Requirements, Secure Socket Layer (SSL) and Transport Layer Security (TLS)
Syllabus:
Module No. | Contents | Workload in Hrs (Theory | Lab | Assess)
1
UNIT - I Security Attacks (Interruption, Interception, Modification and
Fabrication), Security Services (Confidentiality, Authentication,
Integrity, Non-repudiation, access Control and Availability) and
9 - -
UNIT - VII Basic concepts of SNMP, SNMPv1 Community facility and SNMPv3.
UNIT - VIII Firewall Design principles, Trusted Systems. Intrusion Detection Systems.
Learning Resources:
TEXT BOOKS:
1. Network Security Essentials (Applications and Standards) by William Stallings, Pearson Education.
2. Hack Proofing Your Network by Ryan Russell, Dan Kaminsky, Rain Forest Puppy, Joe Grand,
David Ahmad, Hal Flynn, Ido Dubrawsky, Steve W. Manzuik and Ryan Permeh, Wiley Dreamtech.
REFERENCES:
1. Fundamentals of Network Security by Eric Maiwald (Dreamtech Press).
2. Network Security: Private Communication in a Public World by Charlie Kaufman, Radia
Perlman and Mike Speciner, Pearson/PHI.
3. Cryptography and Network Security, Third Edition, William Stallings, PHI/Pearson.
4. Principles of Information Security, Whitman, Thomson.
5. Network Security: The Complete Reference, Robert Bragg, Mark Rhodes, TMH.
6. Introduction to Cryptography, Buchmann, Springer.
Pedagogy: Participative learning, discussions, algorithm, Flowchart & Program writing,
experiential learning through practical problem solving, assignment, PowerPoint presentation.
Assessment Scheme:
Class Continuous Assessment (CCA) 50 Marks
Assignments: 10, Test: 10, Presentations: 10, Case study: -, MCQ: 10, Oral: -, Attendance: 10
Term End Examination : 50 Marks
Mechanisms, A model for Internetwork security, Internet
Standards and RFCs, Buffer overflow & format string
vulnerabilities, TCP session hijacking, ARP attacks, route table
modification, UDP hijacking, and man-in-the-middle attacks.
2
UNIT - II Conventional Encryption Principles, Conventional encryption
algorithms, cipher block modes of operation, location of
encryption devices, key distribution, approaches to message
authentication, Secure Hash Functions and HMAC.
6 - -
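HMAC, listed in Unit II, can be understood by building it by hand from a hash function: HMAC(K, m) = H((K' ⊕ opad) ‖ H((K' ⊕ ipad) ‖ m)), where K' is the key hashed if it is longer than the block and zero-padded to the block size. The sketch below does this with SHA-256 and then uses Python's standard `hmac` module, which a real program should prefer, for constant-time verification. The function names are illustrative.

```python
import hashlib
import hmac

BLOCK_SIZE = 64  # SHA-256 processes 64-byte blocks


def hmac_sha256_by_hand(key: bytes, message: bytes) -> bytes:
    """HMAC-SHA-256 computed directly from the ipad/opad construction."""
    if len(key) > BLOCK_SIZE:
        key = hashlib.sha256(key).digest()  # long keys are hashed first
    key = key.ljust(BLOCK_SIZE, b"\x00")  # then zero-padded to the block size
    inner_pad = bytes(b ^ 0x36 for b in key)
    outer_pad = bytes(b ^ 0x5C for b in key)
    inner_hash = hashlib.sha256(inner_pad + message).digest()
    return hashlib.sha256(outer_pad + inner_hash).digest()


def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Check a received tag in constant time using the standard-library HMAC."""
    expected = hmac.new(key, message, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)
```

`compare_digest` is used instead of `==` so that the comparison time does not reveal how many leading bytes of a forged tag were correct.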
3
UNIT - III Public key cryptography principles, public key cryptography
algorithms, digital signatures, digital certificates, Certificate
Authority and key management, Kerberos, X.509 Directory
Authentication Service.
6 - -
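The Unit III ideas of public key encryption and digital signatures can be illustrated with a toy "textbook" RSA example using tiny primes (p = 61, q = 53, e = 17, the well-known classroom values). This is an illustration only. Unpadded RSA with small keys is insecure, and real systems use keys of 2048 bits or more with OAEP or PSS padding through a vetted library. The helper names are hypothetical.

```python
def make_keys(p, q, e):
    """Textbook RSA key generation from two primes p, q and public exponent e."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)  # modular inverse (Python 3.8+); fails if gcd(e, phi) != 1
    return (n, e), (n, d)


def apply_key(key, value):
    """Modular exponentiation: encrypt/verify with the public key,
    decrypt/sign with the private key."""
    n, exponent = key
    return pow(value, exponent, n)


public, private = make_keys(61, 53, 17)     # n = 3233, d = 2753
ciphertext = apply_key(public, 65)          # 2790
plaintext = apply_key(private, ciphertext)  # 65
signature = apply_key(private, 42)          # "sign" the message 42
assert apply_key(public, signature) == 42   # anyone with the public key can verify
```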
4 UNIT - IV Email privacy: Pretty Good Privacy (PGP) and S/MIME.
4 - -
5
UNIT - V IP Security Overview, IP Security Architecture, Authentication
Header, Encapsulating Security Payload, Combining Security
Associations and Key Management.
7 - -
6
UNIT - VI Web Security Requirements, Secure Socket Layer (SSL) and
Transport Layer Security (TLS), Secure Electronic Transaction
(SET).
5 - -
7 UNIT - VII Basic concepts of SNMP, SNMPv1 Community facility and
SNMPv3. Intruders, Viruses and related threats.
5 - -
8 UNIT - VIII Firewall Design principles, Trusted Systems. Intrusion Detection
Systems. 3 - -
Prepared By
Ms. Devyani B Kamble
Assistant Professor
Checked By
Ms. Pradnya Mahadik
BOS Chairman
Approved By
Dr. Sudhir Gavhane Dean, LASC
COURSE STRUCTURE
Course Code MIT-WPU-MBD-1303
Course Category Core BigData Analytics
Course Title Big Data – Apache Spark - In memory
distributed processing
Teaching Scheme and Credits
Weekly load hrs
L T Laboratory Credits
4 - - 3
Pre-requisites:
Basic knowledge of object-oriented programming concepts, Java, database concepts and
any Linux distribution.
Course Objectives:
1. To understand the concepts of Scala and learn their implementation.
2. To understand the Apache Spark architecture.
3. To understand Spark Resilient Distributed Datasets – Transformation, Action.
Course Outcomes:
The student will gain knowledge of:
1. Concepts of Scala and its implementation.
2. Concepts of Spark and how it is used along with Scala.
Course Contents:
Introduction: Introduction to Scala, History of Scala
Conditional Expressions: If-else, While, do-while
Scala Function: Function declaration, function definition.
Scala Classes and Objects: Object, Class, Singleton Object
Array and Strings: Single dimensional
Scala Collections: Sequence, List
File Input-Output: Reading and Writing of files
Introduction to Apache Spark: Features of Apache Spark
Resilient Distributed Dataset (RDD): Introduction of Resilient Distributed Dataset
Spark RDD operations: RDD Transformation
Learning Resources:
1. Programming Scala by Dean Wampler, Alex Payne
2. Scala Cookbook by Alvin Alexander
3. Scala in Depth by Joshua D. Suereth
4. Programming in Scala by Martin Odersky, Lex Spoon, Bill Venners
5. Scala for the Impatient by Cay S. Horstmann
6. Learning Spark by Matei Zaharia, Patrick Wendell, Andy Konwinski, Holden Karau
7. Advanced Analytics with Spark by Sandy Ryza, Uri Laserson, Sean Owen and Josh Wills
8. Mastering Apache Spark by Mike Frampton
9. Apache Spark Graph Processing by Rindra Ramamonjison
Pedagogy: Participative learning, discussions, algorithm, Flowchart & Program writing,
experiential learning through practical problem solving, assignment, PowerPoint presentation.
Assessment Scheme:
Class Continuous Assessment (CCA) 50 Marks
Assignments: 10, Test: 10, Presentations: -, Case study: -, MCQ: 10, Oral: 10, Attendance: 10
Term End Examination : 50 Marks
Syllabus:
Module No. | Contents | Workload in Hrs (Theory | Lab | Assess)
1 Introduction: Introduction to Scala, History of Scala, Features,
Basic Syntax, Scala Comments, Variables, Data types, Operators. 3 - -
2 Conditional Expressions: If-else, While, do-while, for, Pattern
matching, break statement. 5 - -
3
Scala Function: Function declaration, function definition,
Function calling, Functions-Call by name, Functions with named
arguments, Functions with variable arguments, Default parameter
values, Nested functions, Recursion, Higher order functions,
Scala Closures.
7 - -
4
Scala Classes and Objects: Object, Class, Singleton Object,
Companion Object, access modifiers, constructors, method
overloading, inheritance, method overriding, field overriding,
this keyword, final, Scala Abstract class, Scala Trait, Apply and
Unapply.
4 - -
5 Array and Strings: Single dimensional, Passing array into the
function, Multidimensional Array, Strings, String methods, String
Interpolation
5 - -
6 Scala Collections: Sequence, List, Set, Map, Tuples, Options,
Iterators 5 - -
7 File Input-Output: Reading and Writing of files 1 - -
8
Introduction to Apache Spark: Features of Apache Spark,
Apache Spark Architecture, Spark Applications, Apache Spark
Components, Different Data Sources and Formats in
Spark.
5 - -
9 Resilient Distributed Dataset (RDD): Introduction of RDD,
Features of RDD in Spark, RDD operations. 5 - -
10 Spark RDD operations: RDD Transformation, RDD Action. 5 - -
Prepared By
Ms. Devyani B Kamble
Assistant Professor
Checked By
Ms. Pradnya Mahadik
BOS Chairman
Approved By
Dr. Sudhir Gavhane Dean, LASC
COURSE STRUCTURE
Course Code MIT-WPU-MBD-1304
Course Category Core Big Data Analytics
Course Title Machine Learning Algorithm -I
Teaching Scheme and Credits
Weekly load hrs
L T Laboratory Credits
4 -- -- 3
Pre-requisites:
1. Familiarity with data analysis, the main prerequisite for machine learning
2. Familiarity with probability theory
3. Familiarity with linear algebra
Course Objectives:
1. To introduce the basic concepts and techniques of Machine Learning.
2. To develop the skills in using recent machine learning software for solving practical
problems.
3. To be familiar with a set of well-known supervised, semi-supervised and unsupervised
learning algorithms
Course Outcomes:
1. Select real-world applications that need machine learning based solutions.
2. Implement and apply machine learning algorithms.
3. Select appropriate algorithms for solving a particular group of real-world problems.
4. Recognize the characteristics of machine learning techniques that are useful to solve
real-world problems.
Course Contents
Introduction to learning
What are supervised, unsupervised and reinforcement learning? Visualization of algebraic
concepts.
Linear Regression
What is regression? What are the simple one-variable regression line and the coefficients of the line? What
are the assumptions of linear regression? What are the gradient descent algorithm and the cost function used to find
the 'beta' values?
Gradient Descent
How is the problem represented in matrix form? How is gradient descent used for multiple features, and how is
feature scaling used in gradient descent? What are the types of feature scaling? How are the coefficients found
analytically?
Logistic Regression
What is the logistic regression model? What is the sigmoid function and its graphical representation?
What is the receiver operating characteristic (ROC) curve, and what is it used for?
Optimization and Classifications
How does the optimization objective of logistic regression lead to support vector machines? What is a large margin classifier? What is
the concept behind large margin classification using SVMs?
Learning Resources:
1. T. Hastie, R. Tibshirani and J. Friedman, “The Elements of Statistical Learning”, Springer, 2009.
2. E. Alpaydin, “Machine Learning”, MIT Press, 2010.
3. K. Murphy, “Machine Learning: A Probabilistic Perspective”, MIT Press, 2012.
4. C. Bishop, “Pattern Recognition and Machine Learning”, Springer, 2006.
5. Shai Shalev-Shwartz, Shai Ben-David, “Understanding Machine Learning: From Theory to
Algorithms”, Cambridge University Press, 2014.
6. John Mueller and Luca Massaron, “Machine Learning for Dummies”, John Wiley &
Sons, 2016.
Pedagogy:
Participative learning, discussions, algorithm, Program writing, experiential learning through
practical problem-solving, assignment, PowerPoint presentation
Assessment Scheme:
Class Continuous Assessment (CCA)
Assignments: 10, Test: 10, Problem solving: 10, Attendance: 10, Case study: 10, Any other: -
Term End Examination : 50 marks External
Syllabus:
Module No. | Contents | Workload in Hrs (Theory | Lab | Assess)
1
Introduction to learning
Supervised, Unsupervised and Reinforcement Learning,
geometry (lines, curves and 3D spaces) and visualization of
algebraic concepts
5 - -
2
Linear Regression
Regression as a concept, simple one variable regression line,
coefficients of the line, assumptions of linear regression,
Gradient descent algorithm, cost function to find 'beta' values
and concept, local and global minima, concept of learning rate
8 - -
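Batch gradient descent for one-variable regression minimizes the cost J(b₀, b₁) = (1/2m) Σ(b₀ + b₁xᵢ − yᵢ)² by repeatedly stepping against the gradient, with both parameters updated simultaneously. The sketch below is a plain-Python version. The learning rate and iteration count are illustrative defaults, not prescribed values.

```python
def gradient_descent(x, y, learning_rate=0.05, iterations=2000):
    """Fit y ~ b0 + b1 * x by batch gradient descent on the cost
    J(b0, b1) = (1 / 2m) * sum((b0 + b1 * x_i - y_i) ** 2)."""
    m = len(x)
    b0 = b1 = 0.0
    for _ in range(iterations):
        errors = [b0 + b1 * xi - yi for xi, yi in zip(x, y)]
        grad_b0 = sum(errors) / m                            # dJ/db0
        grad_b1 = sum(e * xi for e, xi in zip(errors, x)) / m  # dJ/db1
        # Simultaneous update of both parameters
        b0 -= learning_rate * grad_b0
        b1 -= learning_rate * grad_b1
    return b0, b1


b0, b1 = gradient_descent([1, 2, 3, 4, 5], [3, 5, 7, 9, 11])  # close to (1, 2)
```

The learning rate matters. For this data, the cost is a convex bowl, so there is a single global minimum and no local minima. Any rate below about 0.169 converges, while a rate of 0.2 makes the updates overshoot and diverge.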
3
Gradient Descent
Matrix representation of problem, Gradient descent for multiple
features, use of feature scaling techniques in gradient descent,
types of feature scaling, finding coefficients analytically,
normal equation, (matrix) non-invertibility
7 - -
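Two common types of feature scaling are min-max scaling to [0, 1] and z-score standardization (subtract the mean, divide by the standard deviation). Scaling puts features on comparable ranges so that gradient descent takes balanced steps. The sketch below implements both for a single feature. The sample "house size" values are made up.

```python
from statistics import fmean, pstdev


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Z-score standardization using the population standard deviation."""
    mu, sigma = fmean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


sizes = [850, 1200, 1500, 2100, 3000]  # hypothetical feature values
scaled = min_max_scale(sizes)
z_scores = standardize(sizes)
```

The mean and standard deviation (or minimum and maximum) must be computed on the training data and reused unchanged when scaling new inputs.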
4
Logistic Regression
Logistic regression model, matrix representation, general
sigmoid function and graphical representation, decision
boundary (linear and non-linear), metrics for logistic regression
(accuracy, sensitivity, specificity, etc.), receiver
operating characteristic (ROC) curve, use of the ROC curve to find
the optimum decision boundary, convexity and non-convexity
of a group of points
13 - -
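The sigmoid turns a linear score into a probability. Thresholding that probability gives predictions, and from those come accuracy, sensitivity and specificity. Sweeping the threshold produces the (FPR, TPR) points of the ROC curve. The sketch below uses plain Python with hypothetical helper names. The simple sigmoid can overflow for very large negative inputs, which is acceptable for illustration.

```python
import math


def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z)), mapping any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def confusion_metrics(y_true, scores, threshold=0.5):
    """Accuracy, sensitivity (TPR) and specificity (TNR) at a given threshold."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1:
            fp += 1
        elif label == 0:
            tn += 1
        else:
            fn += 1
    accuracy = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return accuracy, sensitivity, specificity


def roc_points(y_true, scores):
    """(FPR, TPR) pairs obtained by lowering the threshold through every score."""
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for t in sorted(set(scores), reverse=True):
        _, tpr, tnr = confusion_metrics(y_true, scores, t)
        points.append((1.0 - tnr, tpr))
    return points


labels = [0, 0, 1, 1]
probs = [sigmoid(z) for z in (-2.0, -0.5, 0.3, 1.5)]  # hypothetical model scores
```

The threshold whose ROC point lies closest to the top-left corner (FPR = 0, TPR = 1) is one common way to pick an operating decision boundary.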
5
Optimization and Classifications
Optimization objective from logistic regression to support
vector machines, large margin classifier, concepts behind large
margin classifications, kernels (concept, types and graphical
explanations), using SVM
12
Prepared By
Archana Varade
Assistant Professor
Checked By
Pradnya Mahadik BOS Chairman
Approved By
Dr. Sudhir Gavhane Dean
COURSE STRUCTURE
Course Code MIT-WPU- MBD-1305
Course Category Core Big Data Analytics
Course Title Lab on Statistical Computing
Teaching Scheme and Credits
Weekly load hrs
L T Laboratory Credits
3 3
Course Objectives:
1. To provide an understanding of concepts and techniques of Business Statistics
2. To learn how to use Excel to solve business statistics problems
3. To provide hands-on training in Python and R.
Course Outcomes:
1. The student should be able to formulate and solve problems related to topics covered in this course.
2. The student should be able to solve the problems using Python or R
3. The student should be able to perform statistical analysis on a variety of data.
Course Contents:
Laboratory Exercises / Practical:
1. Data and Statistics
2. Descriptive Statistics: Tabular and Graphical Presentations
3. Descriptive Statistics: Numerical Measures
4. Probability
5. Discrete Probability Distributions
6. Continuous Probability Distribution
7. Sampling and Sampling Distributions
8. Interval Estimation
9. Fundamentals of Hypothesis Testing
10. Two-Sample Tests
11. Inferences about Population Variances
12. Tests of Goodness of Fit and Independence
13. Experimental Design and ANOVA
14. Simple Linear Regression
Learning Resources:
Reference Books:
Text Book: David R. Anderson, Dennis J. Sweeney, Thomas A. Williams, Jeffrey D. Camm and James J.
Cochran, Statistics for Business and Economics, 12th Edition, Cengage Learning, 2014 (a newer
13th edition has recently been published but is not yet widely available).
Pedagogy: Participative learning, discussions, demonstrations, practical, assignment
Assessment Scheme:
Laboratory Continuous Assessment (LCA)
Practical: 20%, Oral based on practical: 10%, Problem based learning: 10%, Attendance: 10%
Term End Examination : 50%
Syllabus:
Module No. | Contents | Workload in Hrs (Theory | Lab | Assess)
1 Data and Statistics 3
2 Descriptive Statistics: Tabular and Graphical Presentations 3
3 Descriptive Statistics: Numerical Measures 3
4 Probability 3
5 Discrete Probability Distributions 3
6 Continuous Probability Distribution 3
7 Sampling and Sampling Distributions 3
8 Interval Estimation 3
9 Fundamentals of Hypothesis Testing 3
10 Two-Sample Tests 3
11 Inferences about Population Variances 3
12 Tests of Goodness of Fit and Independence 3
13 Experimental Design and ANOVA 3
14 Simple Linear Regression 3
Prepared by Ms. Pradnya Mahadik Assistant Professor
Checked By Ms. Pradnya Mahadik BOS Chairman
Approved by Dr. Sudhir Gavhane Dean, LASC
COURSE STRUCTURE
Course Code MIT-WPU-MBD-1306
Course Category Core Big Data Analytics
Course Title Lab on Machine Learning Algorithms - I
Teaching Scheme and Credits
Weekly load hrs
L T Laboratory Credits
-- -- 3 3
Pre-requisites:
1. Basic Linear Algebra
2. Programming Experience
3. Statistics and Probability
Course Objectives:
1. To introduce basic machine learning techniques.
2. To develop the skills in using recent machine learning software for solving practical problems in a
high-performance computing environment.
3. To develop the skills in applying appropriate supervised, semi-supervised or unsupervised
learning algorithms for solving practical problems.
Course Outcomes:
Students will be able to:
1. Implement and apply machine learning algorithms to solve problems.
2. Select appropriate algorithms for solving real-world problems.
3. Use machine learning techniques in a high-performance computing environment to solve real-
world problems.
Course Contents
Laboratory Exercises / Practical:
1. Exercises to solve the real-world problems using the following machine learning
methods:
Linear Regression
Logistic Regression
Multi-Class Classification
Neural Networks
Support Vector Machines
K-Means Clustering & PCA
2. Develop programs to implement Anomaly Detection & Recommendation Systems.
3. Implement GPU computing models to solve some of the problems mentioned in Problem 1.
Reference Books
1. Peter Flach: Machine Learning: The Art and Science of Algorithms that Make
Sense of Data, Cambridge University Press, Edition 2012.
2. Hastie, Tibshirani, Friedman: Introduction to Statistical Machine Learning with
Applications in R, Springer, 2nd Edition-2012.
3. C. M. Bishop : Pattern Recognition and Machine Learning, Springer 1st Edition-
2013.
4. Ethem Alpaydin : Introduction to Machine Learning, PHI 2nd Edition-2013.
5. Parag Kulkarni : Reinforcement and Systematic Machine Learning for Decision
Making, Wiley-IEEE Press, Edition July 2012.
Supplementary Reading:
Web Resources:
Weblinks: -
MOOCs: -
Pedagogy:
Mini Project development, Problem solving approach, Participative learning, discussions, algorithm,
Program writing, experiential learning through practical problem-solving, assignment, PowerPoint
presentation
Assessment Scheme:
Class Continuous Assessment (CCA)
Assignments: 10, Test: 10, Presentations: 10, Attendance: 10, Viva: 10, Any other: -
Term End Examination : 50 marks External
Syllabus:
Module No. | Contents | Workload in Hrs (Theory | Lab | Assess)
1
Exercises to solve the real-world problems using the following
machine learning methods:
Linear Regression
Logistic Regression
- 3 -
2
Exercises to solve the real-world problems using the following
machine learning methods:
Multi-Class Classification
Neural Networks
- 3 -
3
Exercises to solve the real-world problems using the following
machine learning methods:
Support Vector Machines
K-Means Clustering & PCA
- 3 -
4 Develop programs to implement Anomaly Detection &
Recommendation Systems. - 3 -
5 Implement GPU computing models to solve some of the
problems mentioned in Problem 1. - 3 -
6 Implement GPU computing models to solve some of the
problems mentioned in Problem 2. - 3 -
7 Implement GPU computing models to solve some of the problems
mentioned in Problem 3. - 3 -
Prepared By
Dr. C. H. Patil Assistant Professor
Checked By
Pradnya Mahadik Course Coordinator
Approved By
Dr. Sudhir Gavhane Dean
COURSE STRUCTURE
Course Code MIT-WPU-MBD-2101
Course Category Core Big Data Analytics
Course Title Principles Of Deep Learning
Teaching Scheme and Credits
Weekly load hrs
L T Laboratory Credits
3 -- -- 3
Pre-requisites:
This is an upper-level undergraduate/graduate course. All students should have the following
skills:
1. Calculus, Linear Algebra
2. Probability & Statistics
3. Ability to code in Python
Course Objectives:
To understand learning in neural networks: output vs. hidden layers; linear vs. nonlinear networks.
Course Outcomes: Students will be able to understand the principles of deep learning.
Course Contents
Course overview: What is deep learning? DL successes; syllabus & course logistics.
Intro to neural networks: cost functions, hypotheses and tasks; training data; maximum likelihood
based cost, cross entropy, MSE cost; feed-forward networks; MLP, sigmoid units; neuroscience
inspiration.
Learning in neural networks: output vs hidden layers; linear vs nonlinear networks.
Backpropagation: learning via gradient descent; recursive chain rule (backpropagation); if time:
bias-variance tradeoff, regularization; output units: linear, softmax; hidden units: tanh,
Deep learning strategies I (e.g., GPU training, regularization, etc.); project proposals.
Deep learning strategies II (e.g., ReLUs, dropout, etc.).
SCC/TensorFlow overview: How to use the SCC cluster; introduction to TensorFlow.
CNNs I: Convolutional neural networks.
Deep Belief Nets I: probabilistic methods.
RNNs I: Recurrent neural networks.
Other DNN variants (e.g., attention, memory networks, etc.).
Neural Turing Machines.
Unsupervised deep learning I (e.g., autoencoders).
Unsupervised deep learning II (e.g., deep generative models).
Deep reinforcement learning.
Vision applications I.
NLP applications I.
Laboratory Exercises / Practical: NA
Reference Books
1. Ian Goodfellow, Yoshua Bengio, Aaron Courville. Deep Learning. MIT Press. 2016.
Supplementary Reading:
1. Duda, R.O., Hart, P.E., and Stork, D.G. Pattern Classification. Wiley-Interscience.
2nd Edition. 2001.
2. Theodoridis, S. and Koutroumbas, K. Pattern Recognition. 4th Edition. Academic
Press, 2008.
3. Russell, S. and Norvig, P. Artificial Intelligence: A Modern Approach. Prentice Hall
Series in Artificial Intelligence. 2003.
4. Bishop, C. M. Neural Networks for Pattern Recognition. Oxford University Press.
1995.
5. Hastie, T., Tibshirani, R. and Friedman, J. The Elements of Statistical Learning.
Springer. 2001.
6. Koller, D. and Friedman, N. Probabilistic Graphical Models. MIT Press. 2009.
Web Resources:
Weblinks: -
MOOCs: -
Pedagogy:
Mini Project development, Problem solving approach, Participative learning, discussions, algorithm,
Program writing, experiential learning through practical problem-solving, assignment, PowerPoint
presentation
Assessment Scheme:
Class Continuous Assessment (CCA)
Assignments: 10, Test: 10, Presentations: 10, Attendance: 10, Viva: 10, Any other: -
Term End Examination : 50 marks External
Syllabus:
Module No. | Contents | Workload in Hrs (Theory | Lab | Assess)
1
Course overview
What is deep learning? DL successes; syllabus & course
logistics;
2 - -
2
Intro to neural networks cost functions, hypotheses and tasks;
training data; maximum likelihood based cost, cross entropy,
MSE cost; feed-forward networks; MLP, sigmoid units;
neuroscience inspiration;
4 - -
3 Learning in neural networks
output vs hidden layers; linear vs nonlinear networks; 4 - -
4
Backpropagation learning via gradient descent; recursive chain
rule (backpropagation); if time: bias-variance tradeoff,
regularization; output units: linear, softmax; hidden units: tanh,
4 - -
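For a softmax output unit with cross-entropy loss, applying the chain rule through the softmax gives the compact gradient ∂L/∂zᵢ = pᵢ − yᵢ. A standard way to test a backpropagation derivation is a gradient check that compares this analytic gradient with central finite differences. The sketch below does this in plain Python. The function names and the sample logits are illustrative.

```python
import math


def softmax(logits):
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target_index):
    """Cross-entropy loss -log p_target for a softmax output unit."""
    return -math.log(softmax(logits)[target_index])


def grad_logits(logits, target_index):
    """Chain-rule result: dL/dz_i = p_i - y_i, with y the one-hot target."""
    probs = softmax(logits)
    return [p - (1.0 if i == target_index else 0.0) for i, p in enumerate(probs)]


def numerical_grad(logits, target_index, eps=1e-6):
    """Central finite differences, used to check the analytic gradient."""
    grads = []
    for i in range(len(logits)):
        plus = list(logits)
        minus = list(logits)
        plus[i] += eps
        minus[i] -= eps
        diff = cross_entropy(plus, target_index) - cross_entropy(minus, target_index)
        grads.append(diff / (2 * eps))
    return grads


logits = [2.0, -1.0, 0.5]  # hypothetical pre-activation outputs
analytic = grad_logits(logits, 0)
numeric = numerical_grad(logits, 0)
```

The two gradients should agree to within about 1e-6. A large mismatch usually points to a mistake in the hand-derived backward pass.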
5 Deep learning strategies I
(e.g., GPU training, regularization, etc.); project proposals 2
6 Deep learning strategies II
(e.g., ReLUs, dropout, etc.) 2
7 SCC/TensorFlow overview
How to use the SCC cluster; introduction to TensorFlow. 2
8 CNNs I Convolutional neural networks 2
9 Deep Belief Nets I probabilistic methods 2
10 RNNs I Recurrent neural networks 2
11 Other DNN variants (e.g. attention, memory networks, etc.) 2
12 Neural Turing Machines 2
13 Unsupervised deep learning I (e.g. autoencoders etc.) 2
14 Unsupervised deep learning II (e.g. deep generative models etc.) 2
15 Deep reinforcement learning 2
16 Vision applications I 2
17 NLP applications I 2
Prepared By
Dr. C. H. Patil Assistant Professor
Checked By
Pradnya Mahadik Course Coordinator
Approved By
Dr. Sudhir Gavhane Dean LASC
COURSE STRUCTURE
Course Code MIT-WPU-MBD-2102
Course Category Core Big Data Analytics
Course Title Machine Learning Algorithm -II
Teaching Scheme and Credits
Weekly load hrs
L T Laboratory Credits
4 -- -- 3
Pre-requisites:
1. Familiarity with data analysis, the main prerequisite for machine learning
2. Familiarity with probability theory
3. Familiarity with linear algebra
Course Objectives:
1. To introduce the basic concepts and techniques of Machine Learning.
2. To develop the skills in using recent machine learning software for solving practical
problems.
3. To be familiar with a set of well-known supervised, semi-supervised and unsupervised
learning algorithms
Course Outcomes:
1. Select real-world applications that need machine learning based solutions.
2. Implement and apply machine learning algorithms.
3. Select appropriate algorithms for solving a particular group of real-world problems.
4. Recognize the characteristics of machine learning techniques that are useful to solve
real-world problems.
Course Contents
Decision trees and random forests
Concept, diagrammatic representation, random forest as a voting committee of decision trees,
parameter meaning and explanation.
Naive Bayes:
Venn diagrams, Naive Bayes algorithm, application and problems, Naive Bayes learning, Bayesian
inference, Retail basket analysis; Concept of boosting and bagging
Unsupervised learning methods/Clustering:
K-means algorithm, optimization objective, graphical representation, random initialization,
choosing number of clusters
Association Rules
Association rule mining, K-nearest neighbours algorithm.
Learning Resources:
1. T. Hastie, R. Tibshirani and J. Friedman, “The Elements of Statistical Learning”, Springer, 2009.
2. E. Alpaydin, “Machine Learning”, MIT Press, 2010.
3. K. Murphy, “Machine Learning: A Probabilistic Perspective”, MIT Press, 2012.
4. C. Bishop, “Pattern Recognition and Machine Learning”, Springer, 2006.
5. Shai Shalev-Shwartz, Shai Ben-David, “Understanding Machine Learning: From Theory to
Algorithms”, Cambridge University Press, 2014.
6. John Mueller and Luca Massaron, “Machine Learning for Dummies”, John Wiley &
Sons, 2016.
Pedagogy:
Participative learning, discussions, algorithm, Program writing, experiential learning through
practical problem-solving, assignment, PowerPoint presentation
AssessmentScheme:
Class Continuous Assessment (CCA)
Assignments: 10 | Test: 10 | Problem solving: 10 | Attendance: 10 | Case study: 10 | Any other: -
Term End Examination: 50 marks External
Syllabus:
Module No. | Contents | Workload in Hrs: Theory | Lab | Assess
1
Decision trees and random forests
Concept, diagrammatic representation, random forest as a
voting committee of decision trees, parameter meaning and
explanation.
12 - -
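To make Module 1 concrete, the sketch below computes the Gini impurity that a decision tree commonly uses to score a candidate split, and the majority vote by which a random forest combines the predictions of its trees. It is a minimal sketch under stated assumptions: Gini is only one possible split criterion (entropy is another), and the function names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum_k p_k^2."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gini(left, right):
    """Size-weighted Gini impurity of a candidate binary split (lower is better)."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def forest_vote(tree_predictions):
    """Random-forest style majority vote; ties go to the label seen first."""
    return Counter(tree_predictions).most_common(1)[0][0]


if __name__ == "__main__":
    print(gini(["yes", "yes", "no", "no"]))          # 0.5
    print(split_gini(["yes", "yes"], ["no", "no"]))  # 0.0 (a pure split)
    print(forest_vote(["yes", "no", "yes"]))         # yes
```

A tree grower would evaluate `split_gini` for every candidate split and keep the lowest; a forest trains many such trees on bootstrap samples and passes their outputs to `forest_vote`.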
2
Naive Bayes:
Venn diagrams, Naive Bayes algorithm, application and
problems, Naive Bayes learning, Bayesian inference, Retail
basket analysis; Concept of boosting and bagging
12 - -
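The Naive Bayes algorithm listed in Module 2 can be sketched as follows for categorical features. This is a minimal sketch, assuming Laplace (add-one) smoothing and a small hypothetical weather/play data set; the function names and data are illustrative only. Log-probabilities are summed instead of multiplying raw probabilities, which avoids numerical underflow.

```python
import math
from collections import Counter, defaultdict


def train_nb(rows, labels):
    """Collect class counts and per-(class, feature) value counts."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (class, feature index) -> Counter of values
    feature_values = defaultdict(set)    # feature index -> distinct values seen
    for row, y in zip(rows, labels):
        for j, v in enumerate(row):
            value_counts[(y, j)][v] += 1
            feature_values[j].add(v)
    return class_counts, value_counts, feature_values


def predict_nb(model, row, alpha=1.0):
    """Return argmax_y [log P(y) + sum_j log P(x_j | y)] with add-alpha smoothing."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for y, n_y in class_counts.items():
        score = math.log(n_y / total)  # log prior
        for j, v in enumerate(row):
            k = len(feature_values[j])  # number of distinct values of feature j
            score += math.log((value_counts[(y, j)][v] + alpha) / (n_y + alpha * k))
        if score > best_score:
            best_label, best_score = y, score
    return best_label


# Hypothetical toy data: (outlook, temperature) -> play?
ROWS = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
        ("rainy", "cool"), ("overcast", "hot")]
LABELS = ["no", "no", "yes", "yes", "yes"]

if __name__ == "__main__":
    model = train_nb(ROWS, LABELS)
    print(predict_nb(model, ("rainy", "mild")))  # yes
    print(predict_nb(model, ("sunny", "hot")))   # no
```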
3
Unsupervised learning methods/Clustering:
K-means algorithm, optimization objective, graphical
representation, random initialization, choosing number of
clusters
12 - -
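The K-means topics of Module 3 (optimization objective, random initialization, choosing the number of clusters) can be illustrated with the minimal sketch below. It assumes 2-D points, Lloyd's alternating assignment/update iterations, and a seeded random choice of k distinct data points as initial centroids; the point data and function names are hypothetical.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def kmeans(points, k, max_iters=100, seed=0):
    """Lloyd's k-means on 2-D points; returns (centroids, assignments)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initialization: k distinct data points
    assign = None
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid.
        new_assign = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_assign == assign:  # no point changed cluster: converged
            break
        assign = new_assign
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = (sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members))
    return centroids, assign


def distortion(points, centroids, assign):
    """Optimization objective J: mean squared distance of points to their centroids."""
    return sum(sq_dist(p, centroids[a]) for p, a in zip(points, assign)) / len(points)


if __name__ == "__main__":
    pts = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 8.5), (8.5, 9)]
    for k in (1, 2, 3):
        cents, labels = kmeans(pts, k)
        print(k, round(distortion(pts, cents, labels), 3))  # J drops sharply from k=1 to k=2
```

Plotting J against k and looking for the "elbow" where the decrease levels off is one common heuristic for choosing the number of clusters; because results depend on the initialization, practical code usually runs several seeds and keeps the lowest J.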
4
Association Rules
Association rule mining, K-nearest neighbours algorithm.
09 - -
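Association rule mining is usually driven by the support and confidence of candidate rules (lift is a common extra measure), while the K-nearest neighbours classifier takes a vote among the closest training points. A minimal sketch of both follows, assuming hypothetical market baskets and points; a complete miner such as Apriori would also enumerate the frequent itemsets, which is not shown here.

```python
from collections import Counter


def support(transactions, itemset):
    """Fraction of transactions that contain every item of itemset."""
    items = set(itemset)
    return sum(items <= set(t) for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """conf(A -> C) = support(A and C together) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


def lift(transactions, antecedent, consequent):
    """lift(A -> C) = conf(A -> C) / support(C); values above 1 suggest positive association."""
    return confidence(transactions, antecedent, consequent) / support(transactions, consequent)


def knn_predict(train, query, k=3):
    """Majority label among the k training points nearest to query (squared Euclidean distance)."""
    nearest = sorted(train, key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Hypothetical market baskets.
BASKETS = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]

if __name__ == "__main__":
    print(support(BASKETS, {"bread", "milk"}))       # 0.5
    print(confidence(BASKETS, {"bread"}, {"milk"}))  # about 0.667: 2 of the 3 bread baskets hold milk
    print(lift(BASKETS, {"bread"}, {"milk"}))        # about 0.889, slightly below 1
```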
Prepared By
Sameer Kakade Asst. Prof.
Checked By
Pradnya Mahadik Course Coordinator
Approved By
Dr. Sudhir Gavhane Dean LASC
COURSE STRUCTURE
Course Code MIT-WPU-MBD-2103
Course Category Core Big Data Analytics
Course Title Data Science life cycle & Visualization
Teaching Scheme and Credits
Weekly load hrs
L: 3 | T: - | Laboratory: -- | Credits: 3
Pre-requisites:
Computing: The Structure and Interpretation of Computer Programs
Math: basic concepts of linear algebra (linear operators, eigenvectors) and calculus (derivatives,
integrals), to enable statistical inference and the derivation of new prediction algorithms.
Course Objectives:
To describe the Data Science life cycle.
To describe Data Visualization.
Course Outcomes:
Students will be able to understand the Data Science life cycle and Data Visualization.
Course Contents:
1. What is Data Science?
What does Data Science involve?
Era of Data Science
Business Intelligence vs Data Science
Life cycle of Data Science, including Extract, Transform and Load (ETL)
Data Preprocessing
Data Imputation
Data Cleaning
Data Transformation
Data Visualization
Data Analysis
Data Engineering - Big Data
Tools of Data Science
2. Data Extraction, Wrangling & Exploration
Data Analysis Pipeline
What is Data Extraction?
Types of Data
Raw and Processed Data
Data Wrangling
Exploratory Data Analysis
3. Visualization of Data
Introduction to Visualization.
Human Perception and Information Processing
Data types
Graphical perception (the ability of viewers to interpret visual
(graphical) encodings of information and thereby decode information in graphs)
Color for information display
Color management systems
Picture visualization and fruition
Transformation of data into sources of knowledge through visual representation.
Requirements and heuristics for high-quality visualizations.
Charts and standard views: relevance and appropriateness.
Advanced and innovative tools for data visualization and advanced quantitative analysis.
The evaluation of the quality of visualizations and infographics.
Learning Resources:
Reference Books:
1. Avrim Blum, John Hopcroft and Ravindran Kannan, “Foundations of Data Science”.
Pedagogy:
Participative learning, discussions, algorithm, experiential learning through practical problem
solving, assignment, PowerPoint presentation
Assessment Scheme:
Class Continuous Assessment (CCA)
Assignments: 10 | Test: 10 | Problem solving: 10 | Attendance: 10 | Case study: 10 | Any other: -
Term End Examination: 50 marks External
Syllabus:
Module No. | Contents | Workload in Hrs: Theory | Lab | Assess
1
What is Data Science?
What does Data Science involve?
Era of Data Science
Business Intelligence vs Data Science
Life cycle of Data Science
Tools of Data Science
12 - -
2
Data Extraction, Wrangling & Exploration
Data Analysis Pipeline
What is Data Extraction?
Types of Data
Raw and Processed Data
Data Wrangling
Exploratory Data Analysis
12 - -
3
Visualization of Data
Introduction to Visualization.
Human Perception and Information Processing
Data types
Graphical perception (the ability of viewers to interpret visual
(graphical) encodings of information and thereby decode
information in graphs)
Color for information display
Color management systems
Picture visualization and fruition
Data Transformation into sources of knowledge through visual
representation.
Requirements and heuristics for high-quality visualizations.
Charts and standard views: relevance and appropriateness.
Advanced and innovative tools for data visualization and
advanced quantitative analysis.
The evaluation of the quality of visualizations and infographics.
12 - -
Prepared By
Preeti Adhav Asst. Prof.
Checked By
Pradnya Mahadik BOS Chairman
Approved By
Dr. Sudhir Gavhane Dean LASC
COURSE STRUCTURE
Course Code MIT-WPU-MBD-2104
Course Category Core Big Data Analytics
Course Title Machine Learning Algorithm -I
Teaching Scheme and Credits
Weekly load hrs
L: 4 | T: -- | Laboratory: -- | Credits: 3
Pre-requisites:
1. The main prerequisite for machine learning is data analysis.
2. Familiarity with probability theory
3. Familiarity with linear algebra
Course Objectives:
1. To introduce the basic concepts and techniques of Machine Learning.
2. To develop skills in using recent machine learning software for solving practical
problems.
3. To familiarize students with a set of well-known supervised, semi-supervised and unsupervised
learning algorithms.
Course Outcomes:
1. Select real-world applications that need machine learning based solutions.
2. Implement and apply machine learning algorithms.
3. Select appropriate algorithms for solving a particular group of real-world problems.
4. Recognize the characteristics of machine learning techniques that are useful to solve
real-world problems.
Course Contents
Introduction to learning
What are Supervised, Unsupervised and Reinforcement Learning? Visualization of algebraic
concepts.
Linear Regression
What is Regression? What is a simple one-variable regression line, and what are its coefficients?
What are the assumptions of linear regression? What is the Gradient descent algorithm, and how is
a cost function used to find the 'beta' values?
Gradient Descent
How is the problem represented in matrix form? How is Gradient descent used for multiple features,
and how are scaling techniques applied in gradient descent? What are the types of feature scaling?
How are the coefficients found analytically?
Logistic Regression
What is the Logistic regression model? What is the Sigmoid function and its graphical representation?
What is the Receiver Operating Characteristic (ROC) curve, and what is it used for?
Optimization and Classifications
How does the optimization objective carry over from logistic regression to SVMs? What is a large
margin classifier? What is the concept behind large margin classification using SVMs?
Learning Resources:
1. T. Hastie, R. Tibshirani and J. Friedman, “Elements of Statistical Learning”, Springer, 2009.
2. E. Alpaydin, “Machine Learning”, MIT Press, 2010.
3. K. Murphy, “Machine Learning: A Probabilistic Perspective”, MIT Press, 2012.
4. C. Bishop, “Pattern Recognition and Machine Learning”, Springer, 2006.
5. Shai Shalev-Shwartz and Shai Ben-David, “Understanding Machine Learning: From Theory to
Algorithms”, Cambridge University Press, 2014.
6. John Mueller and Luca Massaron, “Machine Learning for Dummies”, John Wiley &
Sons, 2016.
Pedagogy:
Participative learning, discussions, algorithm, Program writing, experiential learning through
practical problem-solving, assignment, PowerPoint presentation
Assessment Scheme:
Class Continuous Assessment (CCA)
Assignments: 10 | Test: 10 | Problem solving: 10 | Attendance: 10 | Case study: 10 | Any other: -
Term End Examination: 50 marks External
Syllabus:
Module No. | Contents | Workload in Hrs: Theory | Lab | Assess
1
Introduction to learning
Supervised, Unsupervised and Reinforcement Learning,
geometry (lines, curves and 3D spaces) and visualization of
algebraic concepts
5 - -
2
Linear Regression
Regression as a concept, simple one-variable regression line,
coefficients of the line, assumptions of linear regression,
Gradient descent algorithm, cost function for finding the 'beta' values
and its underlying concept, local and global minima, concept of learning rate
8 - -
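The gradient descent procedure in Module 2 can be sketched for one-variable regression as below. This is a minimal sketch, assuming the common mean-squared-error cost with a 1/(2m) factor; the learning rate, number of epochs and data are hypothetical choices.

```python
def cost(xs, ys, b0, b1):
    """Cost J(b0, b1) = (1 / 2m) * sum((b0 + b1*x - y)^2)."""
    m = len(xs)
    return sum((b0 + b1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


def gradient_descent(xs, ys, learning_rate=0.05, epochs=5000):
    """Fit the line y = b0 + b1*x by batch gradient descent on J."""
    m = len(xs)
    b0, b1 = 0.0, 0.0  # initial 'beta' values
    for _ in range(epochs):
        errors = [b0 + b1 * x - y for x, y in zip(xs, ys)]
        grad_b0 = sum(errors) / m                             # dJ/db0
        grad_b1 = sum(e * x for e, x in zip(errors, xs)) / m  # dJ/db1
        # Simultaneous update of both coefficients.
        b0 -= learning_rate * grad_b0
        b1 -= learning_rate * grad_b1
    return b0, b1


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5]
    ys = [3, 5, 7, 9, 11]  # hypothetical data lying exactly on y = 1 + 2x
    b0, b1 = gradient_descent(xs, ys)
    print(round(b0, 4), round(b1, 4))  # 1.0 2.0
```

The learning rate matters: for this particular data the iterates diverge once it exceeds roughly 0.17, while very small values converge slowly. Because J is convex for linear regression, there is a single global minimum and no spurious local minima.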
3
Gradient Descent
Matrix representation of problem, Gradient descent for multiple
features, use of feature scaling techniques in gradient descent,
types of feature scaling, finding coefficients analytically,
normal equation, (matrix) non-invertibility
7 - -
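For the Module 3 topics of finding the coefficients analytically and non-invertibility, the standard derivation of the normal equation is sketched below. It assumes a design matrix $X$ of size $m \times (n+1)$ whose first column is all ones, a target vector $y$, and the same $\frac{1}{2m}$ cost used for gradient descent; z-score scaling is shown as one common feature-scaling choice.

```latex
J(\beta) = \frac{1}{2m}\,(X\beta - y)^{\top}(X\beta - y), \qquad
\nabla_{\beta} J = \frac{1}{m}\,X^{\top}(X\beta - y) = 0
\;\Longrightarrow\; X^{\top}X\,\hat{\beta} = X^{\top}y
\;\Longrightarrow\; \hat{\beta} = (X^{\top}X)^{-1}X^{\top}y

x_j' = \frac{x_j - \mu_j}{\sigma_j} \qquad \text{(z-score scaling of feature } j\text{)}
```

$X^{\top}X$ is singular (non-invertible) when features are linearly dependent, for example the same quantity recorded in two different units, or when there are fewer examples than parameters ($m < n+1$). Dropping redundant features, regularization, or using the pseudo-inverse $X^{+}y$ are the usual remedies. Feature scaling does not change the normal-equation solution in terms of fitted values, but it strongly affects how quickly gradient descent converges.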
4
Logistic Regression
Logistic regression model, matrix representation, general
Sigmoid function and graphical representation, decision
boundary (linear and non-linear), metrics for logistic regression
(accuracy, sensitivity, specificity, etc.), Receiver
Operating Characteristic (ROC) curve, use of the ROC curve to find
the optimum decision boundary, convexity and non-convexity
of a group of points
13 - -
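Module 4's sigmoid function, sensitivity/specificity and ROC-based choice of a decision threshold can be illustrated with the minimal sketch below. It assumes that predicted probabilities are already available (the scores are hypothetical, not fitted by a model), that both classes are present, and it uses Youden's J statistic as one common way to pick the "optimum" point on the ROC curve; other criteria exist.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + e^(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow of exp(-z) for large negative z
    return e / (1.0 + e)


def sens_spec(y_true, scores, threshold):
    """(sensitivity, specificity) when predicting class 1 for score >= threshold."""
    tp = fn = tn = fp = 0
    for y, s in zip(y_true, scores):
        predicted = 1 if s >= threshold else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    return tp / (tp + fn), tn / (tn + fp)


def roc_points(y_true, scores):
    """ROC curve as (false positive rate, true positive rate) pairs, strictest threshold first."""
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for t in sorted(set(scores), reverse=True):
        sens, spec = sens_spec(y_true, scores, t)
        points.append((1.0 - spec, sens))
    return points


def best_threshold(y_true, scores):
    """Threshold maximizing Youden's J = sensitivity + specificity - 1."""
    return max(sorted(set(scores)), key=lambda t: sum(sens_spec(y_true, scores, t)) - 1)


if __name__ == "__main__":
    y = [0, 0, 1, 1, 0, 1]
    probs = [0.1, 0.3, 0.6, 0.8, 0.5, 0.9]  # hypothetical model outputs
    print(sigmoid(0.0))               # 0.5
    print(best_threshold(y, probs))   # 0.6
    print(roc_points(y, probs))
```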
5
Optimization and Classifications
Optimization objective from logistic regression to support
vector machines, large margin classifier, concepts behind large
margin classifications, kernels (concept, types and graphical
explanations), using SVM
12
Prepared By
Archana Varade
Assistant Professor
Checked By
Pradnya Mahadik BOS Chairman
Approved By
Dr. Sudhir Gavhane Dean
COURSE STRUCTURE
Course Code MIT-WPU-MBD-2105
Course Category Core Big Data Analytics
Course Title Lab on R Programming
Teaching Scheme and Credits
Weekly load hrs
L: - | T: - | Laboratory: 3 | Credits: 3
Pre-requisites
Computing: The Structure and Interpretation of Computer Programs
Math: basic concepts of linear algebra (linear operators, eigenvectors) and calculus (derivatives,
integrals), to enable statistical inference and the derivation of new prediction algorithms.
Course Objectives:
To describe the Data Science life cycle.
To describe Data Visualization.
Course Outcomes:
Students will be able to understand the Data Science life cycle and Data Visualization.
Course Contents:
Data Cleaning
Data Transformation
Data Visualization
Data Analysis
Data Engineering - Big Data
Tableau Desktop
Getting Started
Connecting to Data
Visual Analytics
Dashboards and Stories
Mapping
Calculations
Why is Tableau Doing That?
How To cleanse & represent
Learning Resources:
Reference Books:
Avrim Blum, John Hopcroft and Ravindran Kannan, “Foundations of Data Science”.
Pedagogy:
Participative learning, discussions, algorithm, Flowchart & Program writing, experiential learning
through practical problem solving, assignment, PowerPoint presentation.
Assessment Scheme:
Class Continuous Assessment (CCA): 50
Assignments: 10 | Test: 10 | Presentations: - | Case study: - | MCQ: 10 | Oral: 10 | Attendance: 10
Term End Examination: 50 Marks External
Laboratory Continuous Assessment (LCA): 50
Practical: 10 | Oral based on practical: 10 | Site Visit: - | Mini Project: 10 | Problem based Learning: 10 | Attendance: 10
Term End Examination: 50 Marks External
Syllabus:
Module No. | Contents | Workload in Hrs: Theory | Lab | Assess
1 Assignment on Data Cleansing
2 Assignment on Transformation
3 Assignment on
4
Basics of Tableau:
i. Tableau interface:
Menus and Toolbar
Data Pane
Analytics Pane
Sheet Tabs
Shelves and Cards
Marks Card
Legends
Layout for Dashboards & Stories
Distributing and Publishing
ii. Distributing & publishing:
Ways to share
Exploring images and PDFs
Workbook file types
Opening workbook files
Sharing securely
- -
5
Connecting with Data:
Getting Started with Data
Managing Metadata
Managing Extracts
Saving and Publishing Data Sources
Data Prep with Text and Excel Files
Join Types with Union
Cross-database Joins
Data Blending
Additional Data Blending Topics
Connecting to Cubes
Connecting to PDFs
- -
6
Visual Analytics:
Getting Started with Visual Analytics
Drill Down and Hierarchies
Sorting
Grouping
Additional Ways to Group
Creating Sets
Working with Sets
Ways to Filter
Using the Filter Shelf
Interactive Filters
Where Tableau Filters
Additional Filtering Topics
Parameters
Formatting
The Formatting Pane
Basic Tooltips
Viz in Tooltip
Trend Lines
Reference Lines
Forecasting
Clustering
Analysis with Cubes and MDX
- -
7
Dashboards and Stories:
Getting Started with Dashboards and Stories
Building a Dashboard
Dashboard Objects
Dashboard Formatting
Device Designer
Dashboard Interactivity Using Actions
Story Points
- -
8
Mapping:
Getting Started with Mapping
Maps in Tableau
Editing Unrecognized Locations
Spatial Files
Expanding Tableau's Mapping Capabilities
Custom Geocoding
Polygon Maps
Mapbox Integration
WMS: Web Mapping Services
Background Images
- -
9
Calculations:
Calculation Syntax
Introduction to LOD Expressions
Modifying Table Calculations
Aggregate Calculations
Logic Calculations
String Calculations
Number Calculations
Type Calculations
Conceptual Topics with LOD Expressions
Aggregation and Replication with LOD Expressions
Nested LOD Expressions
How to Integrate R and Tableau
Using R within Tableau
Date Calculations
Getting Started with Calculations
Intro to Table Calculations
- -
10
Why is Tableau Doing That?
Understanding Pill Types
Measure Names and Measure Values
Aggregation, Granularity, and Ratio Calculations
When to Blend and When to Join
Fixing "Incorrect" Sorts
Filtering for Top Across Panes
- -
11
How To
Finding the Second Purchase Date with LOD Expressions
Using a Parameter to Change Fields
Cleaning Data by Bulk Re-aliasing
Bollinger Bands
Bump Charts
Control Charts
Funnel Charts
Pareto Charts
Waterfall Charts
- -
Prepared By
Preeti Adhav Lecturer
Checked By
Pradnya Mahadik BOS Chairman
Approved By
Dr. Sudhir Gavhane Dean LASC
COURSE STRUCTURE
Course Code MIT-WPU-MBD-2106
Course Category Elective Big Data Analytics
Course Title Internet of Things
Teaching Scheme and Credits
Weekly load hrs
L: 4 | T: - | Laboratory: - | Credits: 3
Pre-requisites:
1. Knowledge of networking, sensing, databases, programming, and related technology.
2. Familiarity with business concepts and marketing.
Course Objectives:
1. Vision and Introduction to IoT.
2. Understand IoT Market perspective.
3. Data and Knowledge Management and use of Devices in IoT Technology.
4. Understand State of the Art – IoT Architecture.
5. Real World IoT Design Constraints, Industrial Automation and Commercial Building
Automation in IoT.
Course Outcomes:
1. Students will understand the IoT market perspective.
2. Students will understand data and knowledge management and the use of devices in IoT
technology.
3. Students will understand the state-of-the-art IoT architecture.
4. Students will understand real-world IoT design constraints, industrial automation and
commercial building automation in IoT.
Course Contents:
M2M to IoT
Introduction of M2M to IoT
M2M to IoT – A Market Perspective
Introduce basic concepts of IoT. Emerging industrial structure for IoT and development of IoT
architecture.
M2M and IoT Technology Fundamentals
Fundamental concepts of technology required for M2M and IoT
IoT Architecture-State of the Art
Includes study of IoT reference model.
IoT Reference Architecture
Study of different views of the reference architecture. Introduction to industrial automation:
service-oriented architecture-based device integration.
Commercial Building Automation
Case study for Commercial Building Automation.
Learning Resources:
Reference Books:
1. Jan Holler, Vlasios Tsiatsis, Catherine Mulligan, Stefan Avesand, Stamatis
Karnouskos, David Boyle, “From Machine-to-Machine to the Internet of Things:
Introduction to a New Age of Intelligence”, 1st Edition, Academic Press, 2014.
2. Anahory, Murray, “Data Warehousing in the Real World”, Pearson Education.
3. Vijay Madisetti and Arshdeep Bahga, “Internet of Things (A Hands-on-Approach)”, 1st Edition, VPT, 2014.
4. Francis daCosta, “Rethinking the Internet of Things: A Scalable Approach to Connecting Everything”, 1st Edition, Apress Publications, 2013.
Supplementary Reading:
1. Fawzi Behmann, Kwok Wu, “Collaborative Internet of Things (C-IoT): For Future Smart Connected
Life and Business”.
Weblinks:
www.tutorialspoint.com
Pedagogy:
Participative learning, discussions, Problem Solving, experiential learning through practical
problem solving, assignment, PowerPoint presentation
Assessment Scheme:
Class Continuous Assessment (CCA) 50 Marks
Assignments Test Presentations Case study MCQ Oral Attendance
20 10 10 - - 10
Term End Examination: 50 Marks
Syllabus:
Module No. | Contents | Workload in Hrs: Theory | Lab | Assess
1 M2M to IoT
The Vision-Introduction, From M2M to IoT, M2M towards IoT-
the global context, A use case example, Differing Characteristics
5 - -
2
M2M to IoT – A Market Perspective
Introduction, Some Definitions, M2M Value Chains, IoT Value
Chains, An emerging industrial structure for IoT, The
international driven global value chain and global information
monopolies. M2M to IoT-An Architectural Overview– Building
an architecture, Main design principles and needed capabilities,
An IoT architecture outline, standards considerations
7 - 1
3
M2M and IoT Technology Fundamentals
Devices and gateways, Local and wide area networking, Data
management, Business processes in IoT, Everything as a
Service (XaaS), M2M and IoT Analytics, Knowledge
Management
7 - 1
4
IoT Architecture-State of the Art
Introduction, State of the art, Architecture Reference Model-
Introduction, Reference Model and architecture, IoT reference
Model
6 - 1
5
IoT Reference Architecture
Introduction, Functional View, Information View, Deployment
and Operational View, Other Relevant architectural views. Real-
World Design Constraints- Introduction, Technical Design
constraints-hardware is popular again, Data representation and
visualization, Interaction and remote control. Industrial
Automation- Service-oriented architecture-based device
integration, SOCRADES: realizing the enterprise integrated Web
of Things, IMC-AESOP: from the Web of Things to the Cloud of
Things
8 - 1
6
Commercial Building Automation
Introduction, Case study: phase one-commercial building
automation today, Case study: phase two- commercial building
automation in the future
7 - 1
Prepared By
Ms. Smita Patil
Assistant Professor
Checked By
Pradnya Mahadik
BOS Chairman
Approved By
Dr. Sudhir Gavhane Dean, LASC
COURSE STRUCTURE
Course Code MIT-WPU-MBD-2107
Course Category Elective Big Data Analytics
Course Title Introduction to Image Processing
Teaching Scheme and Credits
Weekly load hrs
L: -- | T: 04 | Laboratory: -- | Credits: 03
Pre-requisites:
Basic knowledge of Core Java programming
Course Objectives: 1. To learn the fundamental concepts of Digital Image Processing.
2. To study basic image processing operations.
3. To understand image analysis algorithms.
4. To expose students to current applications in the field of digital image processing.
Course Outcomes: 1. Understand image formation and the role the human visual system plays in the
perception of gray and color image data.
2. Get broad exposure to and understanding of various applications of image processing in
industry, medicine, and defense.
3. Learn the signal processing algorithms and techniques in image enhancement and image
restoration.
4. Acquire an appreciation for the image processing issues and techniques and be able to apply
these techniques to real world problems.
5. Be able to conduct independent study and analysis of image processing problems and
techniques
Course Contents
Introduction
What is Image Processing?, The origins of Image Processing, Examples of Fields that use Image
Processing, Gamma-Ray Imaging, X-Ray Imaging, Imaging in the Ultraviolet Band, Imaging in
the Visible and Infrared Bands, Imaging in the Microwave Band, Imaging in the Radio Band,
Fundamental steps in Digital Image Processing, Components of an Image Processing System
Digital Image Fundamentals
Elements of Visual Perception, Light and the Electromagnetic Spectrum, Image sensing and
Acquisition, Image Sampling and Quantization, Some Basic Relationships between Pixels, An
Introduction to the Mathematical Tools Used in Digital Image Processing, Array versus Matrix
Operations, Linear versus Nonlinear Operations, Arithmetic Operations, Set and Logical
Operations
Intensity Transformation and Spatial Filtering
Background, Some Basic Intensity Transformation Functions, Histogram Processing, Histogram
Equalization, Histogram Matching (Specification), Local Histogram Processing, Fundamentals of
Spatial Filtering, Smoothing Spatial Filters, Sharpening Spatial Filters, Combining Spatial
Enhancement Methods
Filtering in the Frequency Domain
Background, Preliminary Concepts, Sampling and the Fourier Transform of Sampled Functions,
The Discrete Fourier Transform (DFT) of One variable, Extension to Functions of Two Variables.
Image Restoration and Reconstruction
A Model of the Image Degradation / Restoration Process, Noise Models, Restoration in the
Presence of Noise Only- Spatial Filtering, Periodic Noise Reduction by Frequency Domain
Filtering, Bandreject Filters, Bandpass Filters, Notch Filters, Estimating the Degradation Function,
Inverse Filtering, Minimum Mean Square Error (Wiener) Filtering, Geometric Mean Filter
Morphological Image Processing
Preliminaries, Erosion and Dilation, Opening and Closing, The Hit-or-Miss Transformation, Some
Basic Morphological Algorithms, Boundary Extraction, Hole Filling, Extraction of Connected
Components, Convex Hull, Thinning, Thickening, Skeletons, Pruning, Morphological
Reconstruction
Image Segmentation
Fundamentals, Point, Line, and Edge Detection, Background, Detection of Isolated Points, Line
Detection
Edge Models, Basic Edge Detection, Edge Linking and Boundary Detection, Thresholding,
Foundation, Basic Global Thresholding, Optimum Global Thresholding Using Otsu's Method.
Learning Resources:
Reference Books
B1: Cay S. Horstmann and Gary Cornell, “Core Java”, Volume 1 and Volume 2.
B2: Herbert Schildt, “Java 2: The Complete Reference”, Fifth Edition, TMH.
Pedagogy:
Participative learning, discussions, algorithm, Flowchart & Program writing, experiential learning
through practical problem solving, assignment, PowerPoint presentation.
Assessment Scheme:
Class Continuous Assessment (CCA) 50 Marks
Assignments Test Presentations Case study MCQ Oral Any other
Syllabus:
Module No. | Contents | Workload in Hrs: Theory | Lab | Assess
1
Introduction [3]
What is Image Processing?, The origins of Image Processing,
Examples of Fields that use Image Processing, Gamma-Ray
Imaging, X-Ray Imaging, Imaging in the Ultraviolet Band,
Imaging in the Visible and Infrared Bands, Imaging in the
Microwave Band, Imaging in the Radio Band, Fundamental steps
in Digital Image Processing, Components of an Image
Processing System
4 - -
2
Digital Image Fundamentals [6]
Elements of Visual Perception, Light and the Electromagnetic
Spectrum, Image sensing and Acquisition, Image Sampling and
Quantization, Some Basic Relationships between Pixels, An
Introduction to the Mathematical Tools Used in Digital Image
Processing, Array versus Matrix Operations, Linear versus
Nonlinear Operations, Arithmetic Operations, Set and Logical
Operations
10 - -
3
Intensity Transformation and Spatial Filtering [7]
Background, Some Basic Intensity Transformation Functions,
Histogram Processing, Histogram Equalization, Histogram
Matching (Specification), Local Histogram Processing,
Fundamentals of Spatial Filtering, Smoothing Spatial Filters,
Sharpening Spatial Filters, Combining Spatial Enhancement
Methods
9 - -
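Histogram equalization, listed under this module, maps each intensity level through the image's cumulative histogram. A minimal pure-Python sketch follows, assuming an integer grayscale image stored as nested lists and the discrete transformation $s_k = (L-1)\sum_{j=0}^{k} n_j/n$ used in standard texts; a production implementation would normally use an array library, and the tiny image is hypothetical.

```python
def equalize(image, levels=256):
    """Global histogram equalization of a grayscale image (list of rows of ints in 0..levels-1)."""
    pixels = [p for row in image for p in row]
    n = len(pixels)
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    # Transformation s_k = (L - 1) * sum_{j<=k} n_j / n, rounded half up to an integer level.
    mapping, cumulative = [], 0
    for count in hist:
        cumulative += count
        mapping.append(int((levels - 1) * cumulative / n + 0.5))
    return [[mapping[p] for p in row] for row in image]


if __name__ == "__main__":
    dark = [[0, 0, 0, 1, 1, 2, 7]]  # hypothetical 1x7 image with 3-bit levels (L = 8)
    print(equalize(dark, levels=8))  # [[3, 3, 3, 5, 5, 6, 7]]
```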
4
Filtering in the Frequency Domain [10]
Background, Preliminary Concepts, Sampling and the Fourier
Transform of Sampled Functions, The Discrete Fourier
Transform (DFT) of One variable, Extension to Functions of Two
Variables.
7 - -
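The DFT of one variable and its extension to two variables can be illustrated with the naive sketch below. It assumes the forward-transform convention $F(u) = \sum_{x=0}^{M-1} f(x)\,e^{-j2\pi ux/M}$ with no $1/M$ factor (conventions differ between texts) and computes the 2-D transform by separability, rows first and then columns. It runs in $O(N^2)$ time per 1-D transform; practical code uses an FFT.

```python
import cmath


def dft(x):
    """Naive O(N^2) 1-D DFT: F(u) = sum_{k=0}^{N-1} x[k] * exp(-2*pi*i*u*k/N)."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * u * k / n) for k in range(n))
            for u in range(n)]


def dft2(image):
    """2-D DFT using separability: transform every row, then every column."""
    rows = [dft(row) for row in image]
    cols = [dft(list(col)) for col in zip(*rows)]
    return [list(r) for r in zip(*cols)]


if __name__ == "__main__":
    print([round(abs(v), 6) for v in dft([1, 1, 1, 1])])  # [4.0, 0.0, 0.0, 0.0]
```

A constant signal puts all of its energy in the zero-frequency (DC) term, which is why only `F[0]` is non-zero in the usage line.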
5
Image Restoration and Reconstruction [6]
A Model of the Image Degradation / Restoration Process, Noise
Models, Restoration in the Presence of Noise Only- Spatial
Filtering, Periodic Noise Reduction by Frequency Domain
Filtering, Bandreject Filters, Bandpass Filters, Notch Filters,
Estimating the Degradation Function, Inverse Filtering,
Minimum Mean Square Error (Wiener) Filtering, Geometric
Mean Filter
7 - -
6
Morphological Image Processing [5]
Preliminaries, Erosion and Dilation, Opening and Closing, The
Hit-or-Miss Transformation, Some Basic Morphological Algorithms,
Boundary Extraction, Hole Filling, Extraction of Connected
Components, Convex Hull, Thinning, Thickening, Skeletons,
Pruning, Morphological Reconstruction
8 - -
7
Image Segmentation [7]
Fundamentals, Point, Line, and Edge Detection, Background,
Detection of Isolated Points, Line Detection
Edge Models, Basic Edge Detection, Edge Linking and Boundary
Detection, Thresholding, Foundation, Basic Global Thresholding,
Optimum Global Thresholding Using Otsu's Method.
7 - -
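Optimum global thresholding with Otsu's method picks the threshold $k$ that maximizes the between-class variance $\sigma_B^2(k) = \frac{[m_G P_1(k) - m(k)]^2}{P_1(k)[1 - P_1(k)]}$, where $P_1(k)$ is the fraction of pixels at or below level $k$, $m(k)$ is the cumulative mean up to $k$, and $m_G$ is the global mean. A minimal sketch over a flat list of integer intensities follows; the pixel data are hypothetical.

```python
def otsu_threshold(pixels, levels=256):
    """Otsu's method: return k maximizing the between-class variance, class 0 being pixels <= k."""
    n = len(pixels)
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    global_mean = sum(pixels) / n  # m_G
    best_k, best_var = 0, -1.0
    count0 = sum0 = 0  # running pixel count and intensity sum of class 0
    for k in range(levels):
        count0 += hist[k]
        sum0 += k * hist[k]
        if count0 == 0 or count0 == n:  # one class would be empty
            continue
        p1 = count0 / n  # P1(k): probability of class 0
        m_k = sum0 / n   # m(k): cumulative mean up to level k
        between = (global_mean * p1 - m_k) ** 2 / (p1 * (1.0 - p1))
        if between > best_var:  # strict '>' keeps the lowest k among ties
            best_k, best_var = k, between
    return best_k


if __name__ == "__main__":
    # Hypothetical 8-bit data: dark background (levels 20 and 30) and a bright object (220).
    pixels = [20] * 4 + [30] * 4 + [220] * 4
    t = otsu_threshold(pixels)
    print(t)  # 30
    print([1 if p > t else 0 for p in pixels])
```

Running counts are kept as integers so that the "one class empty" test is exact rather than subject to floating-point rounding.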
Prepared By
Nilesh Magar
Assistant Professor
Checked By
Pradnya Mahadik
Course Coordinator
Approved By
Dr. Sudhir Gavhane Dean LASC
COURSE STRUCTURE
Course Code MIT-WPU-MBD-2201
Course Category Core Big Data Analytics
Course Title Natural Language Processing
Teaching Scheme and Credits
Weekly load hrs
L: 3 | T: -- | Laboratory: -- | Credits: 3
Pre-requisites:
1. Linear algebra
2. Probability & Statistics
3. Artificial Intelligence and Neural Networks
Course Objectives: To understand natural language processing, algorithms, structures and
meanings
Course Outcomes: 1. Students will understand Word forms.
2. Students will understand structures.
3. Students will understand meaning processing.
Course Contents
Introduction to Natural Language Processing
Brief history of, and introduction to, Natural Language Processing
ML basics
Algorithms, Naïve Bayes, Bayesian Statistics, HMM, CRF
Word Forms
POS tagging and Chunking: Morphology fundamentals; Morphological Diversity of Indian
Languages; Morphology Paradigms; Finite State Machine Based Morphology; Automatic
Morphology Learning; Shallow Parsing; Named Entities; Maximum Entropy Models; Random
Fields, POS tagging techniques, Chunking techniques: CRF.
Structures
Theories of Parsing, Parsing Algorithms; Robust and Scalable Parsing on Noisy Text as in Web
documents; dependency parsing; Hybrid of Rule Based and Probabilistic Parsing: MST, MALT
parser; Scope Ambiguity and Attachment Ambiguity resolution.
Meaning
Lexical Knowledge Networks, Wordnet Theory; Indian Language Wordnets and Multilingual
Dictionaries; Semantic Roles; Word Sense Disambiguation; WSD and Multilinguality; Metaphors;
Co-references.
Learning Resources:
Reference Books:
1. Allen, James, “Natural Language Understanding”, Second Edition, Benjamin/Cummings, 1995.
2. Charniak, Eugene, “Statistical Language Learning”, MIT Press, 1993.
3. Jurafsky, Dan and Martin, James, “Speech and Language Processing”, Second Edition, Prentice
Hall, 2008.
4. Manning, Christopher and Schütze, Hinrich, “Foundations of Statistical Natural Language
Processing”, MIT Press, 1999.
5. Bharati, Akshar, Chaitanya, Vineet and Sangal, Rajeev, “Natural Language Processing: A Paninian
Perspective”.
Web Resources:
Weblinks: -
MOOCs:-
Pedagogy:
Participative learning, discussions, algorithm, Flowchart & Program writing, experiential learning
through practical problem solving, assignment, PowerPoint presentation
Assessment Scheme:
Class Continuous Assessment (CCA)
Assignments: 10 | Test: 10 | Presentations: 10 | Attendance: 10 | Viva: 10 | Any other: -
Term End Examination: 50 marks External
Syllabus:
Module No. | Contents | Workload in Hrs: Theory | Lab | Assess
1
Introduction to Natural Language Processing
Brief History, Applications: Speech to text, story understanding,
QA system, Machine Translation, Text summarization, text
classification, sentiment analysis, chatterbox, challenges/Open
Problems, Natural Language (NL) Characteristics and NL
computing techniques, NL tasks: Segmentation, Chunking,
tagging, NER, Parsing, Word Sense Disambiguation, NL
Generation, Web 2.0 Applications : Sentiment Analysis; Text
Entailment; Cross Lingual Information Retrieval (CLIR).
10 - -
2
ML basics
Algorithms, Naïve Bayes, Bayesian Statistics, HMM, CRF
5 - -
3
Word Forms
POS tagging and Chunking: Morphology fundamentals;
Morphological Diversity of Indian Languages; Morphology
Paradigms; Finite State Machine Based Morphology; Automatic
Morphology Learning; Shallow Parsing; Named Entities;
Maximum Entropy Models; Random Fields, POS tagging
techniques, Chunking techniques: CRF.
10 - -
4
Structures
Theories of Parsing, Parsing Algorithms; Robust and Scalable
Parsing on Noisy Text as in Web documents; dependency
parsing; Hybrid of Rule Based and Probabilistic Parsing: MST,
MALT parser; Scope Ambiguity and Attachment Ambiguity
resolution.
10 - -
5
Meaning
Lexical Knowledge Networks, Wordnet Theory; Indian
Language Wordnets and Multilingual Dictionaries; Semantic
Roles; Word Sense Disambiguation; WSD and Multilinguality;
Metaphors; Coreferences.
10
Prepared By
Mr. Sameer Kakade Asst. Professor
Checked By
Ms. Pradnya Mahadik BOS Chairman
Approved By
Dr. Sudhir Gavhane Dean, LASC
COURSE STRUCTURE
Course Code MIT-WPU-MBD-2202
Course Category Core Big Data Analytics
Course Title Web & Social Intelligence
Teaching Scheme and Credits
Weekly load hrs
L: 4 | T: - | Laboratory: - | Credits: 3
Pre-requisites:
Knowledge of any scripting language, XML and cloud computing
Course Objectives:
Organizations worldwide are recognizing the opportunity that the web offers for fulfilling business
objectives ranging from Sales, Marketing and CRM to Product Development and Research. This has
created an ever-increasing demand for skilled Web Analytics professionals. The objective of this
course is to help meet this demand.
Course Outcomes:
After taking this course, you will be able to:
• Utilize various Application Programming Interface (API) services to collect data from different
social media sources such as YouTube, Twitter, and Flickr.
• Process the collected data, primarily structured, using methods involving correlation, regression,
and classification to derive insights about the sources and the people who generated that data.
• Analyze unstructured data, primarily textual comments, for the sentiments expressed in them.
• Use different tools for collecting, analyzing, and exploring social media data for research and
development purposes.
Course Contents:
Introduction to web analytics
What’s analysis?
Getting started with Google Analytics
Google Analytics
Getting Started With Google Analytics
How Google Analytics works?
Accounts, profiles, and users
Navigating Google Analytics
Content performance analysis
Pages and Landing Pages
Event Tracking and AdSense
Site Search
Visitor analysis
Unique visitors
Geographic and language information
Technical reports
Benchmarking
Social media analytics
Facebook Insights
Twitter analytics
YouTube analytics
Social Ad analytics / ROI measurement
Social & CRM Analysis
Radian6
Sentiment analysis
Workflow management
Text analytics
Learning Resources:
Reference Books:
Avinash Kaushik (Digital Marketing Evangelist for Google; Co-Founder and Chief Education Officer,
Market Motive):
1. “Web Analytics 2.0”.
2. “Web Analytics: An Hour A Day”.
Supplementary Reading:
Weblinks:
Pedagogy:
Participative learning, discussions, Problem Solving, experiential learning through practical
problem solving, assignment, PowerPoint presentation
Assessment Scheme:
Class Continuous Assessment (CCA): 50 Marks
Assignments Test Presentations Case study MCQ Oral Attendance
20 10 10 - - 10
Term End Examination: 50 Marks
Syllabus:
Module No. | Contents | Workload in Hrs: Theory | Lab | Assess
1
Introduction to web analytics
What’s analysis?
Is analysis worth the effort?
• Small businesses
• Medium and large scale businesses
Analysis vs intuition
What is web analytics?
Getting started with Google Analytics
• How Google Analytics works
• Accounts, profiles, and users
5 - -
2
Google Analytics
Getting Started With Google Analytics
How Google Analytics works
Accounts, profiles, and users
Navigating Google Analytics
Basic metrics
The main sections of Google Analytics reports
Traffic Sources
Direct, referring, and search traffic
Campaigns
AdWords, AdSense
7 - 1
3
Content performance analysis
Pages and Landing Pages
Event Tracking and AdSense
Site Search
7 - 1
4
Visitor analysis
Unique visitors
Geographic and language information
Technical reports
Benchmarking
6 - 1
5
Social media analytics
Facebook Insights
Twitter analytics
YouTube analytics
Social Ad analytics / ROI measurement
8 - 1
6
Social & CRM Analysis
Radian6
Sentiment analysis
Workflow management
Text analytics
7 - 1
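The ROI measurement in Module 5 rests on a few standard ratios: click-through rate (clicks / impressions), cost per click (spend / clicks) and ROI ((revenue - spend) / spend). A minimal sketch follows; the campaign names and all figures are invented, and the `revenue` field assumes some attribution model has already credited revenue to each campaign.

```python
from dataclasses import dataclass


@dataclass
class AdCampaign:
    """One paid social campaign; revenue is what an (assumed) attribution model credits to it."""
    name: str
    impressions: int
    clicks: int
    spend: float
    revenue: float

    def ctr(self) -> float:
        """Click-through rate = clicks / impressions."""
        return self.clicks / self.impressions

    def cpc(self) -> float:
        """Cost per click = spend / clicks."""
        return self.spend / self.clicks

    def roi(self) -> float:
        """Return on investment = (revenue - spend) / spend."""
        return (self.revenue - self.spend) / self.spend


# Invented figures for illustration only.
campaigns = [
    AdCampaign("facebook_video", impressions=50_000, clicks=1_000, spend=500.0, revenue=1_250.0),
    AdCampaign("twitter_promoted", impressions=20_000, clicks=200, spend=400.0, revenue=300.0),
]
for c in campaigns:
    print(f"{c.name}: CTR={c.ctr():.2%}  CPC={c.cpc():.2f}  ROI={c.roi():.0%}")
```

With these invented numbers the first campaign returns 150% on its spend and the second loses 25%.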
Prepared By
Ms. Smita Patil
Assistant Professor
Checked By
Pradnya Mahadik
BOS Chairman
Approved By
Dr. Sudhir Gavhane Dean, LASC
COURSE STRUCTURE
Course Code MIT-WPU-MBD-2203
Course Category Core Big Data Analytics
Course Title Cloud Computing
Teaching Scheme and Credits
Weekly load hrs
L T Laboratory Credits
-- 04 -- 03
Pre-requisites: 1. Basic understanding of distributed computing
2. Basic understanding of networking: VLANs, IP addressing (Classes A, B, C), VNets,
subnets, private address space (RFC 1918), and how DNS systems work in general
3. Cloud storage systems
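The addressing prerequisites can be checked with Python's standard `ipaddress` module. This sketch tests membership in the three RFC 1918 private blocks, derives the historical (pre-CIDR) address class from the first octet, and carves a /16 into /24 subnets the way one might plan a virtual network. The `10.1.0.0/16` plan is hypothetical. The RFC 1918 blocks are listed explicitly because `ipaddress`'s `is_private` also covers other reserved ranges such as loopback.

```python
import ipaddress

# The three private IPv4 blocks reserved by RFC 1918.
RFC1918_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(addr: str) -> bool:
    """True if addr falls in one of the RFC 1918 private ranges."""
    ip = ipaddress.ip_address(addr)
    return any(ip in block for block in RFC1918_BLOCKS)


def classful_class(addr: str) -> str:
    """Historical (pre-CIDR) class of an IPv4 address, decided by its first octet."""
    first_octet = int(ipaddress.IPv4Address(addr)) >> 24
    if first_octet < 128:
        return "A"
    if first_octet < 192:
        return "B"
    if first_octet < 224:
        return "C"
    if first_octet < 240:
        return "D"  # multicast
    return "E"  # reserved


# Hypothetical address plan: split a /16 virtual network into /24 subnets.
vnet = ipaddress.ip_network("10.1.0.0/16")
subnets = list(vnet.subnets(new_prefix=24))
print(len(subnets), subnets[0], subnets[1])                    # 256 10.1.0.0/24 10.1.1.0/24
print(is_rfc1918("172.31.255.255"), is_rfc1918("172.32.0.1"))  # True False
print(classful_class("10.0.0.1"), classful_class("172.16.0.1"), classful_class("192.168.1.1"))  # A B C
```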
Course Objectives: This course traces the evolution of cloud computing and the cloud services available today,
which can lead to the design and development of a simple cloud service. It also focuses on some
key challenges and issues around cloud computing.
Course Outcomes: After successfully completing this course, students should be able to:
Articulate the main concepts, key technologies, strengths, and limitations of cloud
computing and the possible applications for state-of-the-art cloud computing
Identify the architecture and infrastructure of cloud computing, including SaaS, PaaS, IaaS,
public cloud, private cloud, hybrid cloud, etc.
Explain the core issues of cloud computing such as security, privacy, and interoperability.
Choose the appropriate technologies, algorithms, and approaches for the related issues.
Identify problems, and explain, analyze, and evaluate various cloud computing solutions.
Provide the appropriate cloud computing solutions and recommendations according to the
applications used.
Attempt to generate new ideas and innovations in cloud computing.
Collaboratively research and write a research paper, and present the research online.
Course Contents:
INTRODUCTION
Introduction to Cloud Computing
CLOUD SERVICES
Types of Cloud services
Service providers - Google, Amazon, Microsoft Azure, IBM, Salesforce
COLLABORATING USING CLOUD SERVICES
Email Communication over the Cloud - CRM Management - Project Management - Event
Management - Task Management – Calendar - Schedules - Word Processing – Presentation -
Spreadsheet - Databases – Desktop - Social Networks and Groupware
VIRTUALIZATION FOR CLOUD
Need for Virtualization – Pros and cons of Virtualization – Types of Virtualization – System
VM, Process VM, Virtual Machine Monitor – Virtual machine properties - Interpretation and
Binary translation, HLL VM - Hypervisors – Xen, KVM, VMware, VirtualBox, Hyper-V.
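The difference between interpretation and binary translation can be sketched with a hypothetical five-instruction guest ISA (the opcodes `li`, `add`, `dec`, `jnz` and `halt` are invented for illustration). The interpreter decodes every instruction each time it executes. The "translator" decodes each instruction once into a host-level Python closure and afterwards only dispatches. Real dynamic binary translators emit native host machine code, usually per basic block, and cache it, so this models the idea rather than the mechanism.

```python
# Hypothetical guest program: sum 5 + 4 + 3 + 2 + 1 into r0.
PROGRAM = [
    ("li", "r0", 0),      # r0 = 0
    ("li", "r1", 5),      # r1 = 5 (loop counter)
    ("add", "r0", "r1"),  # r0 += r1
    ("dec", "r1"),        # r1 -= 1
    ("jnz", "r1", 2),     # jump to pc 2 while r1 != 0
    ("halt",),
]


def interpret(program):
    """Decode and dispatch every guest instruction each time it runs."""
    regs, pc = {}, 0
    while True:
        op, *args = program[pc]
        if op == "li":
            regs[args[0]] = args[1]
            pc += 1
        elif op == "add":
            regs[args[0]] += regs[args[1]]
            pc += 1
        elif op == "dec":
            regs[args[0]] -= 1
            pc += 1
        elif op == "jnz":
            pc = args[1] if regs[args[0]] != 0 else pc + 1
        elif op == "halt":
            return regs
        else:
            raise ValueError(f"unknown opcode {op}")


def translate(program):
    """Decode each guest instruction once into a closure returning the next pc (None = halt)."""
    def compile_one(instr):
        op, *args = instr
        if op == "li":
            r, imm = args
            def f(regs, pc):
                regs[r] = imm
                return pc + 1
        elif op == "add":
            dst, src = args
            def f(regs, pc):
                regs[dst] += regs[src]
                return pc + 1
        elif op == "dec":
            (r,) = args
            def f(regs, pc):
                regs[r] -= 1
                return pc + 1
        elif op == "jnz":
            r, target = args
            def f(regs, pc):
                return target if regs[r] != 0 else pc + 1
        elif op == "halt":
            def f(regs, pc):
                return None
        else:
            raise ValueError(f"unknown opcode {op}")
        return f
    return [compile_one(instr) for instr in program]


def run_translated(code):
    """Execute pre-translated closures; no decoding happens in this loop."""
    regs, pc = {}, 0
    while pc is not None:
        pc = code[pc](regs, pc)
    return regs


print(interpret(PROGRAM))                   # {'r0': 15, 'r1': 0}
print(run_translated(translate(PROGRAM)))   # same result
```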
SECURITY, STANDARDS AND APPLICATIONS
Security in Clouds: Cloud security challenges – Software as a Service Security, Common
Standards: The Open Cloud Consortium – The Distributed Management Task Force –
Standards for Application Developers – Standards for Messaging – Standards for Security,
End user access to cloud computing, Mobile Internet devices and the cloud.
Learning Resources:
TEXT BOOKS:
1. John Rittinghouse & James Ransome, Cloud Computing, Implementation, Management
and Strategy, CRC Press, 2010.
2. Michael Miller, Cloud Computing: Web-Based Applications That Change the Way You
Work and Collaborate, Que Publishing, August 2008.
3. James E Smith, Ravi Nair, Virtual Machines, Morgan Kaufmann Publishers, 2006.
REFERENCES:
1. David E. Y. Sarna, Implementing and Developing Cloud Application, CRC Press, 2011.
2. Lee Badger, Tim Grance, Robert Patt-Corner, Jeff Voas, NIST, Draft cloud computing
synopsis and recommendation, May 2011.
3. Anthony T. Velte, Toby J. Velte, Robert Elsenpeter, Cloud Computing: A Practical
Approach, Tata McGraw-Hill, 2010.
4. Haley Beard, Best Practices for Managing and Measuring Processes for On-demand
Computing, Applications and Data Centers in the Cloud with SLAs, Emereo Pty Limited,
July 2008.
5. G. J. Popek, R. P. Goldberg, Formal requirements for virtualizable third generation
architectures, Communications of the ACM, Vol. 17, No. 7, July 1974.
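Reference 5 gives the classic sufficient condition for trap-and-emulate virtualization: every sensitive instruction must also be privileged, so that it traps when the guest runs it in user mode. The sketch below checks that subset relation. The instruction sets are small, incomplete illustrative subsets: the toy ISA is hypothetical, and the x86 entries reflect the commonly cited observation that, before hardware virtualization extensions, instructions such as POPF and SGDT were sensitive but did not trap in user mode.

```python
# Popek & Goldberg (1974): a trap-and-emulate VMM can be built when every
# sensitive instruction is also privileged (i.e. traps outside supervisor mode).


def trap_and_emulate_possible(sensitive: set, privileged: set) -> bool:
    """Check the sufficient condition: sensitive is a subset of privileged."""
    return sensitive <= privileged


def virtualization_holes(sensitive: set, privileged: set) -> set:
    """Sensitive instructions that would execute silently instead of trapping."""
    return sensitive - privileged


# Illustrative, incomplete subsets only.
toy_isa = {"sensitive": {"load_psw", "set_timer", "io"},
           "privileged": {"load_psw", "set_timer", "io", "halt"}}
classic_x86 = {"sensitive": {"LGDT", "HLT", "POPF", "SGDT"},
               "privileged": {"LGDT", "HLT"}}

for name, isa in [("toy_isa", toy_isa), ("classic_x86", classic_x86)]:
    ok = trap_and_emulate_possible(isa["sensitive"], isa["privileged"])
    print(name, ok, sorted(virtualization_holes(isa["sensitive"], isa["privileged"])))
```

The non-empty "holes" set is why such architectures needed workarounds such as binary translation or paravirtualization before hardware support arrived.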
Pedagogy:
Participative learning, discussions, algorithm, Flowchart & Program writing, experiential learning
through practical problem solving, assignment, PowerPoint presentation.
Assessment Scheme:
Class Continuous Assessment (CCA) 50 Marks
Assignments Test Presentations Case study Attendance Oral Any other
10 10 10 10 10 - -
Term End Examination: 50 Marks (external examination)
Syllabus:
Module
No. Contents
Workload in Hrs
Theory Lab Assess
1
INTRODUCTION
Cloud-definition, benefits, usage scenarios, History
of Cloud Computing - Cloud Architecture
Types of Clouds - Business models around Clouds –
Major Players in Cloud Computing -
Issues in Clouds - Eucalyptus - Nimbus -
OpenNebula, CloudSim.
9 - -
2
CLOUD SERVICES
Types of Cloud services: Software as a Service -
Platform as a Service – Infrastructure as
a Service - Database as a Service - Monitoring as a
Service – Communication as a Service.
Service providers - Google, Amazon, Microsoft
Azure, IBM, Salesforce
9 - -
3
UNIT III COLLABORATING USING CLOUD
SERVICES
Email Communication over the Cloud - CRM
Management - Project Management - Event
Management - Task Management – Calendar -
Schedules - Word Processing – Presentation -
Spreadsheet - Databases – Desktop - Social
Networks and Groupware
9 - -
4
UNIT IV VIRTUALIZATION FOR CLOUD
Need for Virtualization – Pros and cons of
Virtualization – Types of Virtualization – System
VM, Process VM, Virtual Machine Monitor – Virtual
machine properties - Interpretation and
Binary translation, HLL VM - Hypervisors – Xen,
KVM, VMware, VirtualBox, Hyper-V.
9 - -
5
UNIT V SECURITY, STANDARDS AND
APPLICATIONS
Security in Clouds: Cloud security challenges –
Software as a Service Security, Common
Standards: The Open Cloud Consortium – The
Distributed Management Task Force –
Standards for Application Developers – Standards for
Messaging – Standards for Security,
End user access to cloud computing, Mobile Internet
devices and the cloud.
9 - -
Prepared By
Nilesh Magar
Assistant Professor
Checked By
Pradnya Mahadik
Course Coordinator
Approved By
Dr. Sudhir Gavhane Dean LASC
COURSE STRUCTURE
Course Code MIT-WPU-MBD-2204
Course Category Lab Big Data Analytics
Course Title Web & Social Intelligence
Teaching Scheme and Credits
Weekly load hrs
L T Laboratory Credits
4 - - 3
Pre-requisites:
Course Objectives:
Organizations worldwide are recognizing the opportunity that the web and social media offer for
business objectives ranging from sales, marketing, CRM, and product development to research.
This has created an ever-increasing demand for skilled web analytics professionals. The objective
of this course is to help meet this demand.
Course Outcomes:
After taking this course, you will be able to:
- Utilize various Application Programming Interface (API) services to collect data from different
social media sources such as YouTube, Twitter, and Flickr.
- Process the collected data, primarily structured, using methods involving correlation, regression,
and classification to derive insights about the sources and the people who generated that data.
- Analyze unstructured data, primarily textual comments, for the sentiments expressed in them.
- Use different tools for collecting, analyzing, and exploring social media data for research and
development purposes.
Course Contents:
Introduction to web analytics
What’s analysis?
Getting started with Google Analytics
Google Analytics
Getting Started With Google Analytics
How Google Analytics works
Accounts, profiles, and users
Navigating Google Analytics
Content performance analysis
Pages and Landing Pages
Event Tracking and AdSense
Site Search
Visitor analysis
Unique visitors
Geographic and language information
Technical reports
Benchmarking
Social media analytics
Facebook Insights
Twitter analytics
YouTube analytics
Social Ad analytics / ROI measurement
Social & CRM Analysis
Radian6
Sentiment analysis
Workflow management
Text analytics
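Several of the metrics listed above (unique visitors, sessions, landing pages, bounce rate) can be computed from a raw hit log. This sketch assumes each hit is a pageview, that sessions are already identified, and that a bounce is a single-pageview session; that is a simplification of Google Analytics' own definitions. "Unique visitors" here counts distinct client IDs, which in cookie-based tools approximates browsers or devices rather than people. The log itself is invented.

```python
from collections import Counter, defaultdict

# Hypothetical pageview log in time order: (client_id, session_id, page).
HITS = [
    ("c1", "s1", "/home"), ("c1", "s1", "/pricing"),
    ("c2", "s2", "/blog/post-1"),
    ("c1", "s3", "/home"), ("c1", "s3", "/docs"),
    ("c3", "s4", "/blog/post-1"), ("c3", "s4", "/signup"),
]


def summarize(hits):
    """Unique visitors, sessions, pageviews, and per-landing-page bounce rate."""
    sessions = defaultdict(list)  # session_id -> pages viewed, in order
    visitors = set()
    for client_id, session_id, page in hits:
        visitors.add(client_id)
        sessions[session_id].append(page)

    entrances = Counter(pages[0] for pages in sessions.values())  # first page = landing page
    bounces = Counter(pages[0] for pages in sessions.values() if len(pages) == 1)
    landing = {page: {"entrances": n, "bounce_rate": bounces[page] / n}
               for page, n in entrances.items()}
    return {
        "unique_visitors": len(visitors),
        "sessions": len(sessions),
        "pageviews": len(hits),
        "landing_pages": landing,
    }


print(summarize(HITS))
```

In this log, visitor `c1` accounts for two sessions, so there are 3 unique visitors but 4 sessions; `/blog/post-1` has a 50% bounce rate as a landing page, while `/home` has none.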
Learning Resources:
Reference Books:
Avinash Kaushik (Digital Marketing Evangelist for Google; Co-Founder and Chief Education Officer,
Market Motive):
1. Web Analytics 2.0
2. Web Analytics: An Hour a Day
Supplementary Reading:
Weblinks:
Pedagogy:
Participative learning, discussions, Problem Solving, experiential learning through practical
problem solving, assignment, PowerPoint presentation
Assessment Scheme:
Class Continuous Assessment (CCA): 50 Marks
Assignments Test Presentations Case study MCQ Oral Attendance
20 10 10 - - 10
Term End Examination : 50 Marks
Syllabus:
Module
No. Contents
Workload in Hrs
Theory Lab Assess
1
Introduction to web analytics
What’s analysis?
Is analysis worth the effort?
• Small businesses
• Medium and large scale businesses
Analysis vs intuition
What is web analytics?
Getting started with Google Analytics
• How Google Analytics works
• Accounts, profiles, and users
5 - -
2
Google Analytics
Getting Started With Google Analytics
How Google Analytics works
Accounts, profiles, and users
Navigating Google Analytics
Basic metrics
The main sections of Google Analytics reports
Traffic Sources
Direct, referring, and search traffic
Campaigns
AdWords, AdSense
7 - -
3
Content performance analysis
Pages and Landing Pages
Event Tracking and AdSense
Site Search
7 - 1
4
Visitor analysis
Unique visitors
Geographic and language information
Technical reports
Benchmarking
6 - 1
5
Social media analytics
Facebook Insights
Twitter analytics
YouTube analytics
Social Ad analytics / ROI measurement
8 - 1
6
Social & CRM Analysis
Radian6
Sentiment analysis
Workflow management
Text analytics
7 - 1
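A common first step in the text analytics topic of Module 6 is counting term frequencies across customer comments, for example CRM tickets. This sketch assumes a tiny hand-picked stop-word list and simple lowercase word tokenization; the ticket texts are invented.

```python
import re
from collections import Counter

# Assumed, minimal stop-word list for illustration.
STOPWORDS = {"the", "a", "an", "and", "is", "it", "was", "to", "of", "for", "on"}


def term_frequencies(docs, top_n=5):
    """Most frequent non-stop-word terms across a collection of short texts."""
    counts = Counter()
    for doc in docs:
        tokens = re.findall(r"[a-z]+", doc.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(top_n)


# Invented CRM-style customer comments.
tickets = [
    "Delivery was late and the support was slow",
    "Support resolved it quickly",
    "Late delivery again, support never replied",
]
print(term_frequencies(tickets, top_n=3))  # [('support', 3), ('delivery', 2), ('late', 2)]
```

Frequent terms like these are often the starting point for tagging tickets by theme or feeding a sentiment model.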
Prepared By
Ms. Smita Patil
Assistant Professor
Checked By
Pradnya Mahadik
BOS Chairman
Approved By
Dr. Sudhir Gavhane Dean, LASC
COURSE STRUCTURE
Course Code MIT-WPU-MBD-2206
Course Category Elective Big Data Analytics
Course Title Marketing Analytics
Teaching Scheme and Credits
Weekly load hrs
L T Laboratory Credits
4 -- -- 3
Pre-requisites:
Course Objectives:
1. This course focuses on developing marketing strategies and resource allocation
decisions driven by quantitative analysis.
2. This course covers basic concepts of the marketing process and of measuring brand assets.
3. This course includes customer lifetime value, regression analysis, and spreadsheets
with formulas.
Course Outcomes:
1. Students will know the basic marketing strategies.
2. Students will learn the core concepts and tools of marketing.
3. Students will be able to measure and calculate brand value.
4. Students will understand marketing models.
Course Contents
The Marketing Process
What is the marketing process, and what are its strategic challenges? How can text analytics
support data-driven marketing strategy? How can data be used to improve marketing strategy?
Metrics for Measuring Brand Assets
What metrics measure brand assets? What does the Snapple case show about brand value?
How to develop brand personality and brand architecture, build a brand pyramid, and measure and
calculate brand value?
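The syllabus lists revenue premium as a measure of brand equity. In one common formulation it is the brand's revenue minus the revenue of a private-label (store-brand) product in the same category over the same period. The sketch below computes it and, as one possible algebraic split (an illustrative choice, not necessarily the course's), separates a price effect from a volume effect that add up to the premium. All figures are invented.

```python
def revenue_premium(brand_units: float, brand_price: float,
                    private_label_units: float, private_label_price: float) -> float:
    """Revenue premium = brand revenue - private-label revenue in the same category."""
    return brand_units * brand_price - private_label_units * private_label_price


def decompose(brand_units, brand_price, private_label_units, private_label_price):
    """Split the premium into a price effect and a volume effect (they sum to the premium)."""
    price_effect = (brand_price - private_label_price) * brand_units
    volume_effect = (brand_units - private_label_units) * private_label_price
    return price_effect, volume_effect


# Invented figures for one period.
premium = revenue_premium(10_000, 1.50, 12_000, 1.00)
price_effect, volume_effect = decompose(10_000, 1.50, 12_000, 1.00)
print(premium, price_effect, volume_effect)  # 3000.0 5000.0 -2000.0
```

In this example the brand sells fewer units than the store brand but earns a 3,000 premium because its price advantage outweighs the volume shortfall.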
Customer Lifetime Value
What is Customer Lifetime Value (CLV)? How to calculate CLV, and how to understand, apply, and
extend the CLV formula? How to use CLV to make decisions?
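One common textbook form of CLV assumes a constant margin m per period, a retention rate r, a discount rate d, and a margin collected at the start of each period, so the first period is undiscounted: CLV = sum over t of m * r^t / (1 + d)^t, which for an unlimited horizon simplifies to m(1 + d) / (1 + d - r). Conventions differ (some start at t = 1 or subtract acquisition cost), so this sketch is one variant, not necessarily the one the course uses. Inputs and the acquisition cost are hypothetical.

```python
def clv_finite(margin: float, retention: float, discount: float, periods: int) -> float:
    """Sum of expected discounted margins; period 0 is counted undiscounted."""
    return sum(margin * retention**t / (1 + discount) ** t for t in range(periods))


def clv_infinite(margin: float, retention: float, discount: float) -> float:
    """Closed form of the same sum as periods -> infinity: m(1 + d) / (1 + d - r)."""
    return margin * (1 + discount) / (1 + discount - retention)


# Hypothetical inputs: margin of 100 per year, 80% retention, 10% discount rate.
m, r, d = 100.0, 0.8, 0.10
print(round(clv_finite(m, r, d, 5), 2))
print(round(clv_infinite(m, r, d), 2))  # 366.67

# Using CLV for a decision: acquire the customer only if CLV exceeds acquisition cost.
acquisition_cost = 250.0  # hypothetical
print("acquire" if clv_infinite(m, r, d) > acquisition_cost else "do not acquire")
```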
Marketing Experiments
Working with spreadsheets and formulas. How to determine cause and effect through experiments?
How to design basic experiments, before-and-after experiments, and full factorial web
experiments? How to calculate projected lift?
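A full factorial design tests every combination of factor levels, and projected lift extrapolates the observed difference between a treatment and the control. This sketch enumerates the design cells with `itertools.product` and computes relative lift and projected extra conversions. The factors, levels and conversion counts are hypothetical, and the sketch does not test statistical significance; the projection assumes the observed difference persists at scale.

```python
from itertools import product

# Hypothetical factors and levels for a web page test.
FACTORS = {
    "headline": ["A", "B"],
    "button_color": ["green", "orange"],
    "price_display": ["monthly", "annual"],
}


def full_factorial(factors: dict) -> list:
    """Every combination of factor levels (2 x 2 x 2 = 8 cells here)."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


def lift(treatment_rate: float, control_rate: float) -> float:
    """Relative lift of treatment over control."""
    return (treatment_rate - control_rate) / control_rate


def projected_extra_conversions(treatment_rate: float, control_rate: float,
                                visitors: int) -> float:
    """Extra conversions if the observed difference held for `visitors` future visitors."""
    return (treatment_rate - control_rate) * visitors


cells = full_factorial(FACTORS)
print(len(cells), cells[0])

# Invented results: control converts 200/10,000, the best cell 250/10,000.
control, treatment = 200 / 10_000, 250 / 10_000
print(f"lift = {lift(treatment, control):.0%}")  # lift = 25%
extra = projected_extra_conversions(treatment, control, 100_000)
print(f"extra conversions per 100,000 visitors = {extra:.0f}")  # 500
```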
Regression Basics
What is regression analysis? How to interpret regression outputs? What are multivariable
regressions and omitted variable bias? How to use price elasticity to evaluate marketing? What are
log-log models and marketing mix models?
Learning Resources:
1. Ashok Charan (NUS, Singapore), Marketing Analytics: A Practitioner's Guide to Marketing
Analytics and Research Methods.
2. Dilip Soman, Sara N-Marandi (University of Toronto, Canada), Managing Customer Value:
One Stage at a Time.
3. Luiz Moutinho (Dublin City University, Ireland), Worldwide Casebook in Marketing Management.
4. Mark Jeffery, Data-Driven Marketing: The 15 Metrics Everyone in Marketing Should Know,
February 8, 2010.
5. Alistair Croll, Benjamin Yoskovitz, Lean Analytics: Use Data to Build a Better Startup Faster
(Lean Series), March 21, 2013.
6. Chuck Hemann, Ken Burbary, Digital Marketing Analytics: Making Sense of Consumer Data in a
Digital World (Que Biz-Tech), April 25, 2013.
Pedagogy:
Participative learning, discussions, algorithm, Program writing, experiential learning through
practical problem-solving, assignment, PowerPoint presentation
Assessment Scheme:
Class Continuous Assessment (CCA): 50 Marks
Assignments Test Case study-1 Attendance Case study-2 Any other
10 10 10 10 10 -
Term End Examination: 50 Marks (external)
Syllabus:
Module
No. Contents
Workload in Hrs
Theory Lab Assess
1
The Marketing Process
Introduction to the Marketing Process, Marketing Process,
Strategic Challenge, Marketing Strategy with Data, Using Text
Analytics, Utilizing Data to Improve Marketing Strategy,
Improving the Marketing Process with Analytics, case study
7 - -
2
Metrics for Measuring Brand Assets
Intro to Metrics for Measuring Brand Assets, Snapple and
Brand Value, Developing Brand Personality, Developing Brand
Architecture, Brand Pyramid, Measuring Brand Value, Revenue
Premium as a Measure of Brand Equity, Calculating Brand
Value, case study
10 - -
3
Customer Lifetime Value
Customer Lifetime Value (CLV), Calculating CLV,
Understanding the CLV Formula, Applying the CLV Formula,
Extending the CLV Formula, Using CLV to Make Decisions,
CLV: A Forward-Looking Measure, case study
10 - -
4
Marketing Experiments
Spreadsheet with Formulas, Determining Cause and Effect
through Experiments, Designing Basic Experiments, Designing
Before-After Experiments, Designing Full Factorial Web
Experiments, Designing an Experiment, Analyzing an
Experiment, Projecting Lift, Calculating Projected Lift, Pitfalls
of Marketing Experiments, Maximizing Effectiveness, case
study
10 - -
5
Regression Basics
Using Regression Analysis, What Regressions Reveal,
Interpreting Regression Outputs, Multivariable Regressions,
Omitted Variable Bias, Using Price Elasticity to Evaluate
Marketing, Understanding Log-Log Models, Marketing Mix
Models
8
Prepared By
Archana Varade
Assistant Professor
Checked By
Pradnya Mahadik BOS Chairman
Approved By
Dr. Sudhir Gavhane Dean
COURSE STRUCTURE
Course Code MIT-WPU-MBD-2207
Course Category Elective Big Data Analytics
Course Title Human Resource Analytics
Teaching Scheme and Credits
Weekly load hrs
L T Laboratory Credits
4 - - 3
Pre-requisites:
Knowledge of any scripting language, XML.
Course Objectives:
1. To introduce use of data analytics techniques in HR
Course Outcomes:
1. Students will be able to use data analytics techniques in HR.
Course Contents:
HR Analytics in perspective
Introduction to the role of data analytics in HR
A day in the life of HR
Introduction to the daily activities of HR using case studies
An analytics method
Challenges in HR and how data analytics can address them
Hands-on introduction to HRA
A practical approach to collecting and cleaning the data required for HRA.
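Cleaning HR data usually means normalizing identifiers and categories, parsing dates, handling missing values, and removing duplicates. The sketch below does this for a few invented records. The date formats (including day-first for slashed dates), the department alias table, and the keep-first rule for duplicate employees are all assumptions to adapt to the real source systems.

```python
from datetime import datetime

# Hypothetical raw extract from two HR systems with inconsistent formatting.
RAW = [
    {"emp_id": " e001", "dept": "sales ", "hire_date": "2017-03-01", "salary": "52000"},
    {"emp_id": "E002", "dept": "HR", "hire_date": "01/08/2016", "salary": ""},
    {"emp_id": "E001", "dept": "Sales", "hire_date": "2017-03-01", "salary": "52000"},
    {"emp_id": "E003", "dept": "finance", "hire_date": "not known", "salary": "61000"},
]

DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")  # assumed formats (day-first for slashes)
DEPT_ALIASES = {"sales": "Sales", "hr": "HR", "finance": "Finance"}  # assumed canonical names


def parse_date(text):
    """Try each known format; return None rather than guess."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None


def clean(records):
    """Normalize IDs and departments, parse dates and salaries, drop duplicate employees."""
    cleaned, seen = [], set()
    for rec in records:
        emp_id = rec["emp_id"].strip().upper()
        if emp_id in seen:  # keep the first record for each employee
            continue
        seen.add(emp_id)
        dept = rec["dept"].strip()
        salary = rec["salary"].strip()
        cleaned.append({
            "emp_id": emp_id,
            "dept": DEPT_ALIASES.get(dept.lower(), dept),
            "hire_date": parse_date(rec["hire_date"]),
            "salary": float(salary) if salary else None,  # missing stays missing
        })
    return cleaned


for row in clean(RAW):
    print(row)
```

Leaving unparseable dates and blank salaries as `None` keeps the gaps visible for later supplementing, instead of hiding them behind guessed values.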
Toolkits
Introduction to various toolkits required for HRA.
Data challenges
Introduction to statistical methods for processing data.
Making HR data operational
Use of HR data for analysis
Predictive analytics
Introduction to the use of predictive analysis for HR data.
Learning Resources:
Reference Books:
1. The New HR Analytics: Predicting the Economic Value of Your Company's Human
By Jac Fitz-enz
2. Predictive HR Analytics: Mastering the HR Metric By Dr Martin R. Edwards,
Kirsten Edwards
3. Predictive Analytics for Human Resources By Jac Fitz-enz, John Mattox, II
Supplementary Reading:
1. James C. Sesil, Applying Advanced Analytics to HR Management Decisions: Methods for
Selection, Developing Incentives and Improving Collaboration, First Edition.
Web Resources:
Weblinks:
1. MOOCs:
Pedagogy: Participative learning, discussions, Problem Solving, experiential learning through
practical problem solving, assignment, PowerPoint presentation
Assessment Scheme:
Class Continuous Assessment (CCA) 50 Marks
Assignments Test Presentations Case study MCQ Oral Attendance
20 10 10 - - 10
Term End Examination : 50 Marks
Syllabus:
Module
No. Contents
Workload in Hrs
Theory Lab Assess
1
HR Analytics in perspective
Analytics roles
Defining HR Analytics
Typical problems (working session)
4 - -
2
A day in the life of HR
Case examples
3 - -
3
An analytics method
Understanding the organizational system (Lean)
Locating the HR challenge in the system
Valuing HR Analytics (working session)
Understanding the organizational system
5 - -
4
Hands-on introduction to HRA
Typical data sources
Typical questions faced (survey)
Typical data issues
Connecting HR Analytics to business benefit (3 x case studies)
Techniques for establishing questions
Building support and interest
Obtaining data
Cleaning data (exercise)
Supplementing data
9 - -
5
Toolkits
Options, advantages and disadvantages
Common toolkits: OrgVue, Tableau, Excel, Alteryx, QlikView
Practical exercises
6 - -
6
Data challenges
Correlation (R², ecological fallacy, 10 simple stats)
Causation
6 - -
7
Making HR data operational
Case examples
4 - -
8
Predictive analytics
When to use predictive analysis
Importance of innovation
What is “the organization as a system”?
Organization design
Process-led design
Workforce planning
Transition management
Impact analysis
Communication
Real-time HR Analytics
8 - -
Prepared By
Ms. Punam Nikam Assistant Professor
Checked By
Ms. Pradnya Mahadik
BOS Chairman
Approved By
Dr. Sudhir Gavhane Dean, LASC