+91 98636 36336


Data Science with Spark
This course covers data science concepts from scratch through to the deployment of data science models. All topics are taught with Spark and Scala, and anybody can take this course.

Data Science with Spark live online classes
29 Jul, 2020 | Mon to Fri Batch | Filling Fast | Timing - 09:00 to 24:30 PM (IST)
11 Jul, 2020 |
Sat to Sun Batch
Sat to Sun Batch |
Filling FastTiming - 09:30 to 24:00 PM (IST) |
Course Price at
50000.00
Curriculum
-
Statistics for Data Science
Module 1: Introduction to Statistics
- Descriptive and Inferential Statistics. Definitions, terms, types of data
Module 2: Harnessing Data
- Types of Sampling Data. Simple random sampling, Stratified, Cluster sampling. Sampling error.
Module 3: Exploratory Analysis
- Mean, Median and Mode, Data variability, Standard deviation, Z-score, Outliers
Module 4: Distributions
- Normal Distribution, Central Limit Theorem, Histogram, Normalization, Normality tests, skewness, Kurtosis.
Module 5: Hypothesis & Computational Techniques
- Hypothesis Testing, Null Hypothesis, P-value, Type I & II errors, parametric testing: t-tests, ANOVA test, non-parametric testing
Module 6:
- Correlation & Regression
-
Machine Learning - Basics
Module 1: Machine Learning Introduction
- What is ML? ML vs AI. ML workflow, statistical modeling of ML. Applications of ML
Module 2: Machine Learning Algorithms
- Popular ML algorithms, clustering, classification and regression, supervised vs unsupervised. Choice of ML algorithm
Module 3:
- Supervised Learning: Simple and Multiple Linear Regression, KNN, and more.
Module 4:
- Linear Regression and Logistic Regression: Theory of linear regression, hands-on with use cases
Module 5:
- K-Nearest Neighbour (KNN)
Module 6:
- Decision Tree
Module 7:
- Naïve Bayes Classifier
Module 8:
- Unsupervised Learning
- K-means Clustering.
-
Machine Learning Expert
Module 1:
- Advanced Machine Learning Concepts: Tuning with hyperparameters. Popular ML algorithms, clustering, classification and regression, supervised vs unsupervised. Choice of ML algorithm
Module 2:
- Random Forest (Ensemble): Ensemble theory, random forest tuning
Module 3:
- Support Vector Machine (SVM): Simple and Multiple Linear Regression, KNN
Module 4:
- Natural Language Processing (NLP): Text processing with vectorization, sentiment analysis with TextBlob, Twitter sentiment analysis.
Module 5:
- Naïve Bayes Classifier: Naïve Bayes for text classification, news article tagging
Module 6:
- Artificial Neural Network (ANN): Basic ANN network for regression and classification
Module 7:
- TensorFlow Overview and Deep Learning Intro: TensorFlow workflow demo and intro to deep learning.
-
Scala Programming for Analytics
Module 1:
- Scala and Java - which to use, when and why, Overview of Scala development tools (Eclipse, Scalac, Sbt, Maven, Gradle, REPL, ScalaTest), Overview of Scala Frameworks
Module 2:
- Scala Syntax Fundamentals: Data types, Variables, Operators, Functions and lambdas, Scala, Statements / Loops / Expressions, Extending Builtins, Easy I/O in Scala
Module 3:
- Functional Programming with Scala: What is functional programming?, Using "Match", Case Classes, Wildcards, Case Constructors and Deep Matching, Using Extractors, Pure and First Class Functions, Anonymous Functions, Higher Order Functions, Currying, Closures and Partials.
Module 4:
- Collections and Generics: Java and Scala Collections, Mutable and immutable collections, Using generic types, Lists, tuples and dictionaries, Functional programming and collections, map, fold and filter, Flattening collections and flatMap, The "For Comprehension", Pattern Matching with Scala
-
Spark Framework for Analytics
Module 1:
- Introduction to Big Data. Challenges to old Big Data solutions, Batch vs Real-time vs In-Memory processing, MapReduce and its limitations, Apache Storm and its limitations, Need for a general purpose solution - Apache Spark, What is Apache Spark?, Components of Spark architecture, Apache Spark design principles, Spark features and characteristics, Apache Spark ecosystem components and their insights
Module 2:
- Setting up the Spark Environment, Installing and configuring prerequisites, Installing Apache Spark in local mode, Working with Spark in local mode, Troubleshooting encountered problems in Spark, Installing Spark in standalone mode, Installing Spark in YARN mode, Installing & configuring Spark on a real multi-node cluster, Playing with Spark in cluster mode, Best practices for Spark deployment, Playing with the Spark shell, Executing Scala and Java statements in the shell, Understanding the
Module 3:
- Spark context and driver, Reading data from the local filesystem, Integrating Spark with HDFS, Caching the data in memory for further use, Distributed persistence, Testing and troubleshooting, What is an RDD in Spark, How do RDDs make Spark a feature-rich framework, Transformations in Apache Spark RDDs, Spark RDD action and persistence, Spark Lazy Operations - Transformation and Caching, Fault tolerance in Spark, Loading data and creating RDD in Spark, Persist RDD in memory or disk, Pair operations and key-value in Spark, Spark integration with Hadoop, Apache Spark practicals and workshops
Module 4:
- The need for stream analytics, Comparison with Storm and S4, Real-time data processing using Spark Streaming, Fault tolerance and check-pointing, Stateful stream processing, DStream and window operations, Spark Stream execution flow, Connection to various source systems, Performance optimizations in Spark, What is Spark SQL, Apache Spark SQL features and data flow, Spark SQL architecture and components, Hive and Spark SQL together, Play with DataFrames and Datasets, Data loading techniques in Spark, Hive queries through Spark, Various Spark SQL DDL and DML operations, Performance tuning in Spark
Module 5:
- Why Machine Learning is needed, What is Spark Machine Learning, Various Spark ML libraries, Algorithms for clustering, statistical analytics, classification etc., What is GraphX, The need for different graph processing engines, Graph handling using Apache Spark
-
PySpark for Data Engineering, Data Science and Big Data Analytics
Module 1:
- What is PySpark?, Installing and Configuring PySpark, Interactive Use of PySpark, Standalone Programs, PySpark RDD with Operations and Commands, PySpark MLlib Algorithms & Parameters, PySpark Profiler - Methods and Functions, PySpark SparkContext and its parameters.
Like the curriculum? Enroll Now
Structure your learning and get a certificate to prove it.
Features
-
Instructor-led Sessions
Duration: 2 Months
Week Day classes (M-F): 40 Sessions
Daily 2 Hours per Session
-
Real-life Case Studies
Live project based on any of the selected use cases, involving the implementation of Data Science.
-
Assignments
Every class will be followed by practical assignments, which add up to a minimum of 60 hours.
-
Lifetime Access
Lifetime access to the Learning Management System (LMS), which has class presentations, quizzes, an installation guide and class recordings.
-
24 x 7 Expert Support
Lifetime access to our 24x7 online support team, who will resolve all your technical queries through a ticket-based tracking system.
-
Certification
Successful completion of the final project will get you certified as a Data Science Professional by GoSkills.
FAQs
-
What if I miss a class? - You will never miss a lecture at GoSkill! You can choose either of two options:
- View the recorded session of the class, available in your LMS.
- Attend the missed session in any other live batch.
-
Will I get placement assistance? - To help you in this endeavor, we have added a resume builder tool to your LMS. You can create a winning resume in just 3 easy steps, with unlimited access to templates across different roles and designations. All you need to do is log in to your LMS and click on the "create your resume" option.
-
Can I attend a demo session before enrollment? - We limit the number of participants in a live session to maintain quality standards, so unfortunately participation in a live class without enrollment is not possible. However, you can go through the sample class recording, which gives a clear insight into how the classes are conducted, the quality of the instructors and the level of interaction in a class.
-
Who are the instructors? - All the instructors at GoSkill! are industry practitioners with a minimum of 10-12 years of relevant IT experience. They are subject matter experts and are trained by edureka to provide an excellent learning experience to the participants.
-
What if I have more queries? - Just call us at +91 98636 36336 (US Toll-free Number) or email us at Marketing@goskills.in