  Have Queries? Ask us +91 98636 36336

Data Science with Spark

This course covers data science concepts from scratch through to the deployment of data science models. All topics are taught with Spark and Scala. The course is open to anyone.

Data Science with Spark live online classes

29 Jul, 2020 Mon to Fri Batch

Filling Fast

Timing - 09:00 to 24:30 PM (IST)
11 Jul, 2020 Sat to Sun Batch

Filling Fast

Timing - 09:30 to 24:00 PM (IST)
  • Statistics for Data Science 

    Module 1: Introduction to Statistics

    • Descriptive and Inferential Statistics. Definitions, terms, types of data

    Module 2: Harnessing Data

    • Types of Sampling Data. Simple random sampling, Stratified, Cluster sampling. Sampling error.

    Module 3: Exploratory Analysis

    • Mean, Median and Mode, Data variability, Standard deviation, Z-score, Outliers

    Module 4: Distributions

    • Normal Distribution, Central Limit Theorem, Histogram, Normalization, Normality tests, skewness, Kurtosis.

    Module 5: Hypothesis & Computational Techniques

    • Hypothesis Testing, Null Hypothesis, P-value, Type I & II errors, parametric testing: t-tests, ANOVA test, non-parametric testing

    Module 6:

    • Correlation & Regression
  • Machine Learning - Basics 

    Module 1: Machine Learning Introduction

    • What is ML? ML vs AI. ML workflow, statistical modeling of ML. Application of ML

    Module 2: Machine Learning Algorithms

    • Popular ML algorithms, clustering, classification and regression, supervised vs unsupervised. Choice of ML

    Module 3:

    • Supervised Learning: Simple and multiple linear regression, KNN, and more.

    Module 4:

    • Linear Regression and Logistic Regression: Theory of linear regression, hands-on with use cases

    Module 5:

    • K-Nearest Neighbour (KNN)

    Module 6:

    • Decision Tree

    Module 7:

    • Naïve Bayes Classifier

    Module 8:

    • Unsupervised Learning
    • K-means Clustering.
  • Machine Learning Expert 

    Module 1:

    • Advanced Machine Learning Concepts: Tuning with hyperparameters. Popular ML algorithms, clustering, classification and regression, supervised vs unsupervised. Choice of ML

    Module 2:

    • Random Forest – Ensemble: Ensemble theory, random forest tuning

    Module 3:

    • Support Vector Machine (SVM): Simple and multiple linear regression, KNN

    Module 4:

    • Natural Language Processing (NLP): Text processing with vectorization, sentiment analysis with TextBlob, Twitter sentiment analysis.

    Module 5:

    • Naïve Bayes Classifier: Naïve Bayes for text classification, new articles tagging

    Module 6:

    • Artificial Neural Network (ANN): Basic ANN for regression and classification

    Module 7:

    • TensorFlow Overview and Deep Learning Intro: TensorFlow workflow demo and intro to deep learning.
  • Scala Programming for Analytics  

    Module 1:

    • Scala and Java: which to use, when and why; overview of Scala development tools (Eclipse, scalac, sbt, Maven, Gradle, REPL, ScalaTest); overview of Scala frameworks

    Module 2:

    • Scala Syntax Fundamentals: Data types, Variables, Operators, Functions and lambdas, Scala statements / loops / expressions, Extending builtins, Easy I/O in Scala

    Module 3:

    • Functional Programming with Scala: What is functional programming?, Using "Match", Case Classes, Wildcards, Case Constructors and Deep Matching, Using Extractors, Pure and First Class Functions, Anonymous Functions, Higher Order Functions, Currying, Closures and Partials.

    Module 4:

    • Collections and Generics: Java and Scala Collections, Mutable and immutable collections, Using generic types, Lists, tuples and dictionaries, Functional programming and collections, map, fold and filter, Flattening collections and flatMap, The "For Comprehension", Pattern Matching with Scala
  • Spark Framework for Analytics

    Module 1:

    • Introduction to Big Data. Challenges to old Big Data solutions, Batch vs Real-time vs In-memory processing, MapReduce and its limitations, Apache Storm and its limitations, Need for a general-purpose solution - Apache Spark. What is Apache Spark?, Components of Spark architecture, Apache Spark design principles, Spark features and characteristics, Apache Spark ecosystem components and their insights

    Module 2:

    • Setting up the Spark Environment, Installing and configuring prerequisites, Installing Apache Spark in local mode, Working with Spark in local mode, Troubleshooting encountered problems in Spark, Installing Spark in standalone mode, Installing Spark in YARN mode, Installing & configuring Spark on a real multi-node cluster, Playing with Spark in cluster mode, Best practices for Spark deployment, Playing with the Spark shell, Executing Scala and Java statements in the shell, Understanding the

    Module 3:

    • Spark context and driver, Reading data from the local filesystem, Integrating Spark with HDFS, Caching the data in memory for further use, Distributed persistence, Testing and troubleshooting, What is an RDD in Spark, How do RDDs make Spark a feature-rich framework, Transformations in Apache Spark RDDs, Spark RDD action and persistence, Spark Lazy Operations - Transformation and Caching, Fault tolerance in Spark, Loading data and creating RDD in Spark, Persist RDD in memory or disk, Pair operations and key-value in Spark, Spark integration with Hadoop, Apache Spark practicals and workshops

    Module 4:

    • The need for stream analytics, Comparison with Storm and S4, Real-time data processing using Spark Streaming, Fault tolerance and check-pointing, Stateful stream processing, DStream and window operations, Spark Stream execution flow, Connection to various source systems, Performance optimizations in Spark. What is Spark SQL, Apache Spark SQL features and data flow, Spark SQL architecture and components, Hive and Spark SQL together, Play with DataFrames and data states, Data loading techniques in Spark, Hive queries through Spark, Various Spark SQL DDL and DML operations, Performance tuning in Spark

    Module 5:

    • Why Machine Learning is needed, What is Spark Machine Learning, Various Spark ML libraries, Algorithms for clustering, statistical analytics, classification etc. What is GraphX, The need for different graph processing engines, Graph handling using Apache Spark. PySpark for Data Engineering, Data Science and Big Data Analytics. Module 1: What is PySpark?, Installing and Configuring PySpark, Interactive Use of PySpark, Standalone Programs, PySpark RDD With Operations and Commands, PySpark MLlib Algorithms & Parameters, PySpark Profiler – Methods and Functions, PySpark SparkContext and its parameters.

Like the curriculum? Enroll Now

Structure your learning and get a certificate to prove it.



  • Feature

    Instructor-led Sessions

    Duration: 2 Months
    Weekday classes (M-F): 40 Sessions
    2 hours per session, daily
  • Feature

    Real-life Case Studies

    Live project based on any of the selected use cases, involving the implementation of Data Science.
  • Feature


    Every class will be followed by practical assignments, which add up to a minimum of 60 hours.
  • Feature

    Lifetime Access

    Lifetime access to the Learning Management System (LMS), which has class presentations, quizzes, installation guides and class recordings.
  • Feature

    24 x 7 Expert Support

    Lifetime access to our 24x7 online support team, who will resolve all your technical queries through a ticket-based tracking system.
  • Feature


    Successful completion of the final project will get you certified as a Data Science Professional by GoSkills.


  • What if I miss a class?  

    You will never miss a lecture at GoSkill! You can choose either of the two options:

    • View the recorded session of the class available in your LMS.
    • Attend the missed session in any other live batch.
  • Will I get placement assistance?  
    • To help you in this endeavor, we have added a resume builder tool to your LMS. You can now create a winning resume in just 3 easy steps, with unlimited access to these templates across different roles and designations. All you need to do is log in to your LMS and click on the "create your resume" option.
  • Can I attend a demo session before enrollment?  
    • We limit the number of participants in a live session to maintain quality standards. So, unfortunately, participation in a live class without enrollment is not possible. However, you can go through the sample class recording, which gives you a clear insight into how the classes are conducted, the quality of the instructors and the level of interaction in a class.
  • Who are the instructors?  
    • All the instructors at GoSkill! are practitioners from the industry with a minimum of 10-12 years of relevant IT experience. They are subject matter experts and are trained by edureka to provide an awesome learning experience to the participants.
  • What if I have more queries?  

