  Have Queries? Ask us +91 98636 36336

Data Engineering with PySpark

In this course, you'll learn how to use Spark from Python! Spark is a tool for parallel computation on large datasets, and it integrates well with Python. PySpark is the Python package that makes the magic happen. You'll use this package to work with real-time data. You'll learn to wrangle this data and build a complete machine learning pipeline. Get ready to put some Spark in your Python code and dive into the world of high-performance machine learning. Our course is uniquely designed around this approach.
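
Spark's core idea is to split a dataset into partitions, process each partition in parallel, then combine the partial results. The sketch below illustrates that pattern with a word count that uses only Python's standard library (`concurrent.futures`, `collections.Counter`, `functools.reduce`). It is not the PySpark API, and the helper names `partition`, `count_words` and `word_count` are hypothetical, chosen for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def partition(items, num_partitions):
    """Split a list into roughly equal chunks, loosely like Spark partitions."""
    size = max(1, -(-len(items) // num_partitions))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def count_words(lines):
    """Map step: count words inside a single partition."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def word_count(lines, num_partitions=4):
    """Process partitions concurrently, then merge the partial counts (reduce step).

    Threads keep this sketch portable; real Spark runs partitions on
    executor processes spread across a cluster.
    """
    chunks = partition(lines, num_partitions)
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partial_counts = list(pool.map(count_words, chunks))
    return reduce(lambda a, b: a + b, partial_counts, Counter())


if __name__ == "__main__":
    data = ["Spark is fast", "PySpark brings Spark to Python", "Python is popular"]
    print(word_count(data, num_partitions=2))
```

Merging `Counter` objects with `+` plays the role of the reduce step; in PySpark the equivalent work would typically be expressed with RDD or DataFrame operations instead.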


Data Engineering with PySpark live online classes

06 Jul, 2020 Mon to Fri Batch

Filling Fast

Timing - 09:00 to 13:30 (IST)

18 Jul, 2020 Weekend Batch

Filling Fast

Timing - 09:00 to 13:00 (IST)
  • Statistics for Data Science 

    Module 1: Introduction to Statistics

    • Descriptive and Inferential Statistics. Definitions, terms, types of data

    Module 2: Harnessing Data

    • Types of Data Sampling: Simple random sampling, Stratified sampling, Cluster sampling. Sampling error.

    Module 3: Exploratory Analysis

    • Mean, Median and Mode, Data variability, Standard deviation, Z-score, Outliers

    Module 4: Distributions

    • Normal Distribution, Central Limit Theorem, Histogram, Normalization, Normality tests, skewness, Kurtosis.

    Module 5: Hypothesis & Computational Techniques

    • Hypothesis Testing, Null Hypothesis, P-value, Type I & II errors, parametric testing: t-tests, ANOVA; non-parametric testing

    Module 6:

    • Correlation & Regression
  • Machine Learning - Basics 

    Module 1: Machine Learning Introduction

    • What is ML? ML vs AI. ML workflow, statistical modeling of ML. Applications of ML

    Module 2: Machine Learning Algorithms

    • Popular ML algorithms, clustering, classification and regression, supervised vs unsupervised. Choosing an ML algorithm

    Module 3: Supervised Learning

    • Simple and Multiple Linear Regression, KNN, and more.

    Module 4: Linear Regression and Logistic Regression

    • Theory of linear regression, hands-on with use cases

    Module 5:

    • K-Nearest Neighbour (KNN)

    Module 6:

    • Decision Tree

    Module 7:

    • Naïve Bayes Classifier

    Module 8: Unsupervised Learning

    • K-means Clustering.
  • Machine Learning Expert 

    Module 1: Advanced Machine Learning Concepts

    • Tuning with hyperparameters. Popular ML algorithms, clustering, classification and regression, supervised vs unsupervised. Choosing an ML algorithm

    Module 2: Random Forest (Ensemble)

    • Ensemble theory, random forest tuning

    Module 3: Support Vector Machine (SVM)

    • Simple and Multiple Linear regression, KNN

    Module 4: Natural Language Processing (NLP)

    • Text processing with vectorization, sentiment analysis with TextBlob, Twitter sentiment analysis.

    Module 5: Naïve Bayes Classifier

    • Naïve Bayes for text classification, news article tagging

    Module 6: Artificial Neural Network (ANN)

    • Basic ANN for regression and classification

    Module 7: TensorFlow Overview and Deep Learning Intro

    • TensorFlow workflow demo and introduction to deep learning.
  • Python for Data Science 

    Module 1:

    • Introduction to Data Science with Python

    Module 2: Python Basics: Basic Syntax, Data Structures

    • Installing Python, programming basics, native data types, data objects, math, comparison operators, conditional statements, loops, lists, tuples, sets, dicts, functions

    Module 3: NumPy Package

    • Overview, arrays, selecting data, slicing, iterating, manipulations, stacking, splitting arrays, functions

    Module 4: Pandas Package

    • Overview, Series and DataFrame, manipulation.

    Module 5: Python Advanced: Data Munging with Pandas

    • Histogramming, grouping, aggregation, treating missing values, removing duplicates, transforming data

    Module 6:

    • Python Advanced: Visualization with Matplotlib

    Module 7:

    • Exploratory Data Analysis: Data Cleaning, Data Wrangling

    Module 8:

    • Exploratory Data Analysis: Case Study
  • Time Series Analysis 

    Module 1:

    • What is Time Series?
    • Trend, seasonality, cyclical and random components
    • White Noise
    • Auto Regressive Model (AR)
    • Moving Average Model (MA)
    • ARMA Model
    • Stationarity of Time Series
    • ARIMA Model – Prediction Concepts
    • ARIMA Model Hands-on with Python
    • Case Study Assignment on ARIMA
  • Deep Learning - CNN Foundation 

    Module 1: REST API

    • API concepts, web servers, URL parameters

    Module 2: FLASK Web framework

    • Installing Flask, configuration.

    Module 3: API in Flask

    • API coding in Flask

    Module 4: End to End Deployment

    • Exporting the trained model, creating an end-to-end API.
  • Data Science & Bigdata Analytics Overview 

    Module 1:

    • Introduction to Data Science and Big Data Analytics, Roles played by a Data Scientist, Technologies for Data Scientists such as Hadoop, Spark, Scala, Python, R, Machine Learning and Analytics used for analysis, Architecture and Methodologies used to solve Big Data problems. Defining Machine Learning with an example. Hadoop Framework

    Module 2: HDFS

    • What is Big Data? What are the challenges of processing big data? What technologies support big data? What is Hadoop? Why Hadoop? History of Hadoop, Use Cases of Hadoop, Hadoop ecosystem.

    Module 3: Understanding The Cluster

    • Typical Workflow, Writing files to HDFS, Reading files from HDFS, Rack Awareness, the 5 Hadoop daemons.

    Module 4: MapReduce

    • Before MapReduce, MapReduce Overview, JobTracker, TaskTracker, Job Scheduling, Mapper and Reducer code, Configuring the development environment (Eclipse), Anatomy of a MapReduce Job Run, Job Submission, Job Initialization, Task Assignment, Job Completion, Job Scheduling, Job Failures, Shuffle and Sort

    Module 5: Pig

    • Pig basics, Install and configure Pig on a cluster, Pig vs MapReduce and SQL, Pig vs Hive, Pig Latin, Primitive Data Types and Complex Data Types, Types of Modes: Interactive mode, Script mode and Embedded mode; Modes of running Pig, Running in the Grunt shell, Programming in Eclipse, Loading and Storing Datasets, Filters, Groups, Co-Groups, Foreach, Nested Foreach, Parallel, Distinct, Limit, Sample, Different Types of Joins, Debugging Commands (Illustrate and Explain), Processing Log Files using Regex, Working with Predefined Functions, User-Defined Functions, How to Load and Write JSON Data using Pig

    Module 6: Hive

    • Hive Introduction, Hive Architecture, Different Modes to Access Hive: Command Line Interface, Web Interface (HWI) and Thrift Interface; Hive Metastore, HiveQL, Primitive Data Types and Complex Data Types, Working with Partitions, Hive Bucketed Tables and Sampling, External Tables, Nested Queries, Multiple Inserts, Dynamic Partitions, Different Types of Joins, ORDER BY, SORT BY, DISTRIBUTE BY, CLUSTER BY, Indexes, Views, Compression on Hive Tables and Migrating Hive Tables, Hive SerDes, Processing XML Files using Regex, Processing Log Files using Regex, Accessing HBase Tables using Hive, Hive UDF, Hive UDAF, Hive UDTF

    Module 7: HBase

    • HBase Introduction, HBase Data Model and Comparison between RDBMS and NoSQL, HBase Architecture: Master, HRegionServer, ZooKeeper, HRegion, MemStore, HLog, Auto-Sharding; File Storage Architecture: HFiles, Compaction, Decompaction, Region Splits; HBase Operations (DDL and DML) through the Shell, HBase Installation with Internal ZooKeeper and External ZooKeeper, HBase Counters, HBase Filters, HBase Use Cases, Install and Configure HBase on a Multi-Node Cluster, Create Database, Develop and Run Sample Applications, Access Data Stored in HBase using Clients like Java and Python, MapReduce Client to Access the HBase Data, HBase and Hive Integration, HBase Admin Tasks

    Module 8: Cassandra

    • Introduction, Installation, Creation of Database, Queries and Manipulations
  • Scala Programming for Analytics  

    Module 1:

    • Scala and Java: which to use, when and why. Overview of Scala development tools (Eclipse, Scalac, Sbt, Maven, Gradle, REPL, ScalaTest), Overview of Scala Frameworks

    Module 2:

    • Scala Syntax Fundamentals: Data types, Variables, Operators, Functions and lambdas, Scala Statements / Loops / Expressions, Extending Builtins, Easy I/O in Scala

    Module 3:

    • Functional Programming with Scala: What is functional programming? Using "match", Case Classes, Wildcards, Case Constructors and Deep Matching, Using Extractors, Pure and First-Class Functions, Anonymous Functions, Higher-Order Functions, Currying, Closures and Partials.

    Module 4:

    • Collections and Generics: Java and Scala Collections, Mutable and Immutable Collections, Using generic types, Lists, tuples and dictionaries, Functional programming and collections, map, fold and filter, Flattening collections and flatMap, The "for comprehension", Pattern Matching with Scala
  • Spark Framework for Analytics

    Module 1:

    • Introduction to Big Data. Challenges to old Big Data solutions, Batch vs Real-time vs In-memory processing, MapReduce and its limitations, Apache Storm and its limitations, Need for a general-purpose solution: Apache Spark. What is Apache Spark? Components of Spark architecture, Apache Spark design principles, Spark features and characteristics, Apache Spark ecosystem components and their insights

    Module 2:

    • Setting up the Spark Environment, Installing and configuring prerequisites, Installing Apache Spark in local mode, Working with Spark in local mode, Troubleshooting encountered problems in Spark, Installing Spark in standalone mode, Installing Spark in YARN mode, Installing & configuring Spark on a real multi-node cluster, Playing with Spark in cluster mode, Best practices for Spark deployment, Playing with the Spark shell, Executing Scala and Java statements in the shell, Understanding the Spark context and driver

    Module 3:

    • Reading data from the local filesystem, Integrating Spark with HDFS, Caching the data in memory for further use, Distributed persistence, Testing and troubleshooting, What is an RDD in Spark, How RDDs make Spark a feature-rich framework, Transformations in Apache Spark RDDs, Spark RDD actions and persistence, Spark Lazy Operations: Transformation and Caching, Fault tolerance in Spark, Loading data and creating RDDs in Spark, Persisting RDDs in memory or on disk, Pair operations and key-value pairs in Spark, Spark integration with Hadoop, Apache Spark practicals and workshops

    Module 4:

    • The need for stream analytics, Comparison with Storm and S4, Real-time data processing using Spark Streaming, Fault tolerance and check-pointing, Stateful stream processing, DStream and window operations, Spark Stream execution flow, Connection to various source systems, Performance optimizations in Spark. What is Spark SQL, Apache Spark SQL features and data flow, Spark SQL architecture and components, Hive and Spark SQL together, Play with DataFrames and data states, Data loading techniques in Spark, Hive queries through Spark, Various Spark SQL DDL and DML operations, Performance tuning in Spark

    Module 5:

    • Why Machine Learning is needed, What is Spark Machine Learning, Various Spark ML libraries, Algorithms for clustering, statistical analytics, classification, etc. What is GraphX, The need for different graph processing engines, Graph handling using Apache Spark
  • PySpark for Data Engineering, Data Science and Big Data Analytics

    Module 1:

    • What is PySpark? Installing and Configuring PySpark, Interactive Use of PySpark, Standalone Programs, PySpark RDD Operations and Commands, PySpark MLlib Algorithms & Parameters, PySpark Profiler: Methods and Functions, PySpark SparkContext and its parameters.
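
The Spark modules above distinguish lazy transformations from the actions that actually trigger computation. The toy class below imitates that behaviour in plain Python so the idea can be tried without installing Spark. `LazyDataset` and its methods are hypothetical names for illustration, not PySpark's API, although PySpark's real `RDD` class does expose similarly named `map`, `filter`, `collect` and `count` methods.

```python
from typing import Callable, Iterable, List


class LazyDataset:
    """Toy stand-in for an RDD: transformations are recorded, not executed."""

    def __init__(self, source: Iterable, steps=None):
        self._source = list(source)
        self._steps = steps or []  # recorded ("map" | "filter", function) pairs

    # Transformations: return a new dataset and run nothing yet.
    def map(self, fn: Callable) -> "LazyDataset":
        return LazyDataset(self._source, self._steps + [("map", fn)])

    def filter(self, pred: Callable) -> "LazyDataset":
        return LazyDataset(self._source, self._steps + [("filter", pred)])

    # Actions: replay every recorded step, one element at a time.
    def collect(self) -> List:
        result = []
        for item in self._source:
            keep = True
            for kind, fn in self._steps:
                if kind == "map":
                    item = fn(item)
                elif not fn(item):  # a "filter" step rejected this element
                    keep = False
                    break
            if keep:
                result.append(item)
        return result

    def count(self) -> int:
        return len(self.collect())


if __name__ == "__main__":
    calls = []

    def square(x):
        calls.append(x)  # records when the function actually runs
        return x * x

    ds = LazyDataset(range(6)).map(square).filter(lambda x: x % 2 == 0)
    print(calls)         # [] because no action has run yet
    print(ds.collect())  # [0, 4, 16]
```

Because `collect()` replays every recorded step, calling it twice recomputes everything; avoiding that recomputation is what caching (`cache()` and `persist()` in real Spark) is for.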

Like the curriculum? Enroll Now

Structure your learning and get a certificate to prove it.



  • Instructor-led Sessions

    Duration: 2 Months
    Weekday classes (Mon to Fri): 40 Sessions
    Daily 2 Hours per Session
  • Real-life Case Studies

    A live project based on one of the selected use cases, involving the implementation of Data Science techniques.
  • Assignments

    Every class will be followed by practical assignments which aggregate to a minimum of 60 hours.
  • Lifetime Access

    Lifetime access to Learning Management System (LMS) which has class presentations, quizzes, installation guide & class recordings.
  • 24 x 7 Expert Support

    Lifetime access to our 24x7 online support team, who will resolve all your technical queries through a ticket-based tracking system.
  • Certification

    Successful completion of the final project will get you certified as a Data Science Professional by GoSkills.


  • What if I miss a class?  

    You will never miss a lecture at GoSkill! You can choose either of the two options:

    • View the recorded session of the class available in your LMS.
    • Attend the missed session in any other live batch.
  • Will I get placement assistance?  
    • To help with your job search, we have added a resume builder tool to your LMS. You can create a winning resume in just 3 easy steps, with unlimited access to templates for different roles and designations. All you need to do is log in to your LMS and click on the "create your resume" option.
  • Can I attend a demo session before enrollment?  
    • We limit the number of participants in each live session to maintain quality standards. So, unfortunately, participation in a live class without enrollment is not possible. However, you can watch the sample class recording, which gives a clear insight into how the classes are conducted, the quality of the instructors and the level of interaction in a class.
  • Who are the instructors?  
    • All the instructors at GoSkill! are industry practitioners with a minimum of 10-12 years of relevant IT experience. They are subject matter experts and are trained by Edureka to provide an awesome learning experience to the participants.
  • What if I have more queries?  

