Data Science Training Course in Delhi | Data Scientist Certification Online

Module-1: Introduction to Business Analytics

1. Introduction to Statistics and Business Analytics - Sample Vs. Population, Variables and Types of Data, Primary & Secondary Data, Data Collection and Sampling Techniques

Module-2: Statistics and Business Analytics

1. Descriptive Statistics - Measures of Central Tendency - Mean, Median, Mode; Measures of Dispersion - Range, Interquartile Range, Variance & Standard Deviation, Coefficient of Variation; Kurtosis, Skewness, Chebyshev's Theorem; Measures of Position - Percentiles, Deciles, Quartiles.
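
As an illustration of the measures listed in this unit, here is a minimal sketch using Python's standard `statistics` module. The sample values and variable names are made up for illustration and are not course material; the quartiles use the module's default "exclusive" method, and other tools may give slightly different quartile values.

```python
import statistics

# Hypothetical sample: illustrative values only, not course data.
sales = [12, 15, 15, 18, 20, 22, 30]

# Measures of central tendency
mean = statistics.mean(sales)
median = statistics.median(sales)
mode = statistics.mode(sales)

# Measures of dispersion
data_range = max(sales) - min(sales)
variance = statistics.variance(sales)   # sample variance (n - 1 denominator)
std_dev = statistics.stdev(sales)       # sample standard deviation
coeff_var = std_dev / mean              # coefficient of variation

# Measures of position: quartiles (default "exclusive" method, Python 3.8+)
q1, q2, q3 = statistics.quantiles(sales, n=4)
iqr = q3 - q1                           # interquartile range

print(mean, median, mode, data_range, variance, std_dev, coeff_var, iqr)
```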

2. Introduction to Random Variables (Discrete and Continuous Random Variables), Exploratory Data Analysis, Frequency Tables and Frequency Distributions, Types of Graphs

3. Inferential Statistics & Hypothesis Testing - Formulation of Hypothesis Statement, p-value, Type I and Type II Errors, Z-Test, t-Test, Chi-Square Test
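
To make the hypothesis-testing vocabulary concrete, the following is a minimal sketch of a one-sample, two-sided Z-test in Python. The function name `one_sample_z_test`, the numbers, and the 0.05 significance level are assumptions chosen for illustration; the sketch also assumes the population standard deviation is known.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(sample_mean, pop_mean, pop_std, n):
    """Return (z, two-sided p-value) for H0: population mean == pop_mean."""
    standard_error = pop_std / sqrt(n)
    z = (sample_mean - pop_mean) / standard_error
    # Two-sided p-value: probability of a |Z| at least this large under H0.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical figures, for illustration only.
z, p = one_sample_z_test(sample_mean=52, pop_mean=50, pop_std=8, n=64)

# Rejecting a true H0 would be a Type I error; alpha = 0.05 caps that risk.
reject_h0 = p < 0.05
print(z, p, reject_h0)
```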

4. Introduction to Statistical Estimation and Confidence Interval

5. Probability Theory - Introduction to Probability Theory & Counting Rules

6. Probability Distributions - Discrete and Cumulative Probability Distribution, Sampling Distribution, Binomial Distribution, Standard Normal Distribution, Poisson Distribution.

7. Chi-Square, F-Distribution, and ANOVA (One-Way and Two-Way ANOVA)

8. Correlation and OLS/Multiple Regression (Logistic and Linear Regression)

Module-3: Business Analytics Software and Tools

1. Statistical Analysis using Excel

2. Introduction to R software, and Statistical Analysis using R

3. Introduction to Tableau & its application in Analytics

4. Introduction to Python and its application in Analytics

Module-4: Mastering Big Data & Tools

Module 1: Introduction to Big Data, Hadoop (HDFS and MapReduce)

1. Big Data Introduction

2. Hadoop Introduction

3. Hadoop components

4. HDFS Introduction

5. MapReduce Introduction

Module 2: Deep Dive in HDFS

1. HDFS Design

2. Fundamentals of HDFS (Blocks, NameNode, DataNode, Secondary NameNode)

3. Rack Awareness

4. Read/Write from HDFS

5. HDFS Federation and High Availability (Hadoop 2.x.x)

6. Parallel Copying using DistCp

7. HDFS Command Line Interface

Module 3: HDFS File Operation Lifecycle

1. File Read Cycle from HDFS
- DistributedFileSystem
- FSDataInputStream

2. Failure or Error Handling When File Reading Fails

3. File Write Cycle to HDFS
- FSDataOutputStream

4. Failure or Error Handling When File Writing Fails

Module 4: Understanding MapReduce

1. JobTracker and TaskTracker

2. Hadoop Cluster Topology

3. Example of MapReduce
- Map Function
- Reduce Function

4. Java Implementation of MapReduce

5. DataFlow of MapReduce

6. Use of Combiner
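
The map, combine, and reduce roles covered in this module can be sketched with a word count in plain Python. This is a single-process illustration, not Hadoop code: the function names are hypothetical, `shuffle` stands in for the grouping step the framework performs between map and reduce, and each input line plays the role of one mapper's input.

```python
from collections import defaultdict


def map_function(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def combine(pairs):
    """Combiner: pre-aggregate one mapper's output before the shuffle."""
    local_counts = defaultdict(int)
    for word, count in pairs:
        local_counts[word] += count
    return list(local_counts.items())


def shuffle(pairs):
    """Group values by key, as the framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_function(word, counts):
    """Reduce: sum all partial counts received for one word."""
    return word, sum(counts)


def word_count(lines):
    combined = []
    for line in lines:  # each line acts as one mapper's input
        combined.extend(combine(map_function(line)))
    return dict(reduce_function(w, c) for w, c in shuffle(combined).items())


result = word_count(["big data big tools", "data tools"])
print(result)
```

In this sketch the combiner turns the first line's two ("big", 1) pairs into a single ("big", 2) pair, so fewer pairs reach the shuffle step while the final counts stay the same.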

Module 5: MapReduce Internals

1. How MapReduce Works

2. Anatomy of MapReduce Job (MR-1)

3. Submission & Initialization of MapReduce Job

4. Assigning & Execution of Tasks

5. Monitoring & Progress of MapReduce Job

6. Completion of Job

7. Failure Handling of MapReduce Job
- Task Failure
- TaskTracker Failure
- JobTracker Failure

Module 6: MapReduce-2 (YARN: Yet Another Resource Negotiator, Hadoop 2.x.x)

1. Limitations of Current Architecture (Classic)

2. What are the Requirements?

3. YARN Architecture

4. Job Submission and Job Initialization

5. Task Assignment and Task Execution

6. Progress and Monitoring of the Job

Module 7: Failure Handling in YARN
- Task Failure
- Application Master Failure
- Node Manager Failure
- Resource Manager Failure

Module 8: Apache Pig
1. What is Pig?

2. Introduction to Pig Data Flow Engine

3. Pig and MapReduce in Detail

4. When Should Pig Be Used?

5. Pig and Hadoop Cluster

6. Pig Interpreter and MapReduce

7. Pig Relations and Data Types

8. PigLatin Example in Detail

9. Debugging and Generating Examples in Apache Pig

Module 9: Apache Hive
1. What is Hive?

2. Architecture of Hive

3. Hive Services

4. Hive Clients

5. How Hive Differs from Traditional RDBMS

6. Introduction to HiveQL

7. Data Types and File Formats in Hive

8. File Encoding

9. Common problems while working with Hive

Module 10: Apache Hive Advanced
1. HiveQL

2. Managed and External Tables

3. Understand Storage Formats

4. Querying Data
- Sorting and Aggregation
- MapReduce In Query
- Joins, SubQueries and Views

5. Writing User Defined Functions (UDFs)

6. Data Types and Schemas

7. HiveODBC

Module 11: HBase Introduction
1. Fundamentals of HBase

2. Usage Scenario of HBase

3. Use of HBase in Search Engine

4. HBase DataModel
- Table and Row
- Column Family and Column Qualifier
- Cell and its Versioning
- Regions and Region Server

5. HBase Designing Tables

6. HBase Data Coordinates

7. Versions and HBase Operation
- Get/Scan
- Put
- Delete
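
The data model and operations listed above can be pictured with a toy in-memory table in Python. This is a sketch under stated assumptions and not the HBase client API: the class `ToyTable`, its method signatures, and the sample rows are hypothetical, and real HBase marks deletes with tombstones rather than removing data immediately.

```python
class ToyTable:
    """In-memory stand-in for an HBase table (illustration only)."""

    def __init__(self):
        # Data coordinates: row key -> "family:qualifier" -> {timestamp: value}
        self.rows = {}

    def put(self, row, column, value, ts):
        """Write one cell version at the given timestamp."""
        self.rows.setdefault(row, {}).setdefault(column, {})[ts] = value

    def get(self, row, column, versions=1):
        """Return up to `versions` values for a cell, newest first."""
        cell = self.rows.get(row, {}).get(column, {})
        newest_first = sorted(cell.items(), reverse=True)
        return [value for _, value in newest_first[:versions]]

    def scan(self, start_row, stop_row):
        """Return row keys in sorted order; stop_row is exclusive."""
        return [r for r in sorted(self.rows) if start_row <= r < stop_row]

    def delete(self, row, column):
        """Drop every version of a cell (simplified; no tombstones)."""
        self.rows.get(row, {}).pop(column, None)


table = ToyTable()
table.put("user1", "info:email", "a@example.com", ts=1)
table.put("user1", "info:email", "b@example.com", ts=2)
table.put("user2", "info:email", "c@example.com", ts=1)

latest = table.get("user1", "info:email")
history = table.get("user1", "info:email", versions=2)
scanned = table.scan("user1", "user2")
table.delete("user1", "info:email")
after_delete = table.get("user1", "info:email")
print(latest, history, scanned, after_delete)
```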

Module 12: Apache Sqoop
1. Sqoop Tutorial

2. How Does Sqoop Work?

3. Sqoop JDBC Driver and Connectors

4. Sqoop Importing Data

5. Various Options to Import Data
- Table Import
- Binary Data Import
- Speeding Up the Import
- Filtering Import
- Full Database Import

Module 13: Apache Flume
1. Data Acquisition: Apache Flume Introduction

2. Apache Flume Components

3. POSIX and HDFS File Write

4. Flume Events

5. Interceptors, Channel Selectors, Sink Processor

Module 14: Apache Oozie
1. Introduction to Oozie

2. Creating different jobs
- Workflow
- Coordinator
- Bundle

3. Creating and scheduling jobs for different components

Module 15: Introduction to Spark and Scala
Module 16: Introduction to Data Visualization tools
Module 17: Projects
Module 18: CV creation and Interview Preparation

Data Science

Admissions Open

Running regular batches. Contact us to get further details.
