Data Science
Course Features
 Course Duration: 40 Hours (approx.)
 Category: Artificial Intelligence, Machine Learning, Predictive Analysis
 Available Modes: Online (Batch or One on One)
 Certificate: Yes
 Location: Online, Live Sessions
 Language: English
 Sessions: Weekday and Weekend
 Prerequisites: None
 Skill Level: Beginner
 Course Capacity: 20
Descriptions
This course focuses on the practical application of data science in solving real-world problems. Students will learn how to use statistical analysis, machine learning, and data visualization techniques to extract insights from complex data sets. They will also develop practical skills such as data cleaning, preparation, and wrangling. Through hands-on projects and case studies, students will gain experience in the entire data science pipeline, from data collection to decision-making. By the end of the course, students will be equipped with the tools and techniques necessary to use data science to make informed decisions in their personal and professional lives.
Machine Learning with Python, Artificial Intelligence, Deep Learning.
Data science, Machine Learning with Big data
Course Content:
 Introduction to Data Science
 What is data science
 Data science is the study of data. It involves developing methods of recording, storing, and analysing data to effectively extract useful information. The goal of data science is to gain insights and knowledge from any type of data — both structured and unstructured.
 Role of a data scientist
 How data science is driving industries
 Role of Python in data science applications and why we choose Python
 Introduction to Python
 Introduction to Python programming language
 Features and how it is different from other programming languages
 Python & Anaconda Installation on Windows, Linux and Mac
 Python IDE working mechanism
 Python Basics
 Variables
 Data Types
 Keywords
 Examples on variable methods
 Operators
 Python Data structures
 Data Structures
 List
 Tuple
 Dictionary
 Set
 Slicing
 Q & A’s
 Hands-on Exercises
 Control statements and Loops
 IF ELSE statements
 For Loop and While Loop
 Q & A’s
 Hands-on Exercises
 Functions
 Role of functions
 Parameters
 Executing functions
 Q & A’s
 Hands-on Exercises
 Lambda functions
 Exceptions and how we use them in projects
 OOPS concepts & Database access
 Understanding object-oriented programming
 Global and Local variables
 Methods
 Connecting to a database and pulling data
 Q & A’s
 Hands-on Exercises
 Setting up the Jupyter notebook environment
Modules:
11. NumPy
NumPy is not a separate programming language but a Python extension module. It
provides fast and efficient operations on arrays of homogeneous
data. NumPy extends Python into a high-level language for manipulating numerical
data, similar to MATLAB.

 Understanding NumPy
 Role of NumPy in Data Science
 Arrays and Matrices
 Important Methods
 Slicing
 Q & A’s
 Hands-on Exercises
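As a rough illustration of the array, matrix, and slicing topics listed above, here is a minimal sketch. It assumes NumPy is installed; the variable names and values are made up for illustration.

```python
import numpy as np

# A 1-D array of homogeneous (all float) values
prices = np.array([10.0, 12.5, 9.0, 15.0])

# Vectorized arithmetic: no explicit Python loop is needed
discounted = prices * 0.9

# A 2-D array (matrix) and basic slicing
matrix = np.arange(12).reshape(3, 4)   # 3 rows, 4 columns, values 0..11
first_row = matrix[0, :]               # every column of row 0
last_two_cols = matrix[:, -2:]         # last two columns of every row

# Matrix product with the @ operator
gram = matrix @ matrix.T               # shape (3, 3)

print(discounted)
print(first_row, last_two_cols.shape, gram.shape)
```

Slices such as `matrix[:, -2:]` return views rather than copies, which is one reason NumPy operations stay fast on large arrays.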
12. SciPy
SciPy is used for scientific and technical computing. It contains modules
for optimization, linear algebra, integration, interpolation, special functions, FFT,
signal and image processing, ODE solvers and other tasks common in science and
engineering.

 Introduction
 Characteristics of SciPy
 Subpackages of SciPy
 Bayes' theorem
 Q & A’s
 Hands-on Exercises
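To give a feel for a few of the SciPy subpackages mentioned above, the sketch below touches integration, optimization, and linear algebra. It assumes SciPy and NumPy are installed; the functions and numbers are toy examples chosen so the answers are easy to check by hand.

```python
import numpy as np
from scipy import integrate, linalg, optimize

# Integration: area under x**2 between 0 and 1 (the exact value is 1/3)
area, abs_error = integrate.quad(lambda x: x ** 2, 0.0, 1.0)

# Optimization: the minimum of (x - 2)**2 lies at x = 2
result = optimize.minimize_scalar(lambda x: (x - 2.0) ** 2)

# Linear algebra: solve the system A @ solution = b
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
solution = linalg.solve(A, b)

print(area, result.x, solution)
```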
 Pandas (Data manipulation)
Data in pandas is often used to feed statistical analysis in SciPy, plotting
functions from Matplotlib, and machine learning algorithms in scikit-learn. Jupyter
Notebooks offer a good environment for using pandas to do data exploration and
modelling, but pandas can be used just as easily from scripts written in a text editor.

 DataFrames and their methods
 Reading and writing the different file formats (CSV, JSON, etc.)
 Connecting to a database
 Data manipulation techniques
 Joins and merge
 NumPy dependency of Pandas library
 Exploring and analysing datasets
 Q & A’s
 Hands-on Exercises
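The data manipulation topics above (missing values, joins and merges, aggregation) could look roughly like the following sketch. It assumes pandas is installed; the two tables and their column names are hypothetical.

```python
import pandas as pd

# Hypothetical tables: orders (with one missing amount) and customers
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "customer_id": [10, 11, 10, 12],
    "amount": [250.0, None, 100.0, 75.0],
})
customers = pd.DataFrame({
    "customer_id": [10, 11, 12],
    "city": ["Pune", "Delhi", "Pune"],
})

# Handle the missing value by filling it with the column mean
orders["amount"] = orders["amount"].fillna(orders["amount"].mean())

# SQL-style left join on the shared key
merged = orders.merge(customers, on="customer_id", how="left")

# Aggregate: total order amount per city
totals = merged.groupby("city")["amount"].sum()
print(totals)
```

Filling with the mean is only one possible strategy; dropping the row or using a median are common alternatives, depending on the data.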
Data Analysis and Machine learning:
 Machine learning

 Introduction
 Various tools in Python used for machine learning (NumPy, Pandas, Matplotlib, scikit-learn etc.)
 Use cases of Machine learning
 Machine learning flow
 Handling missing values
Algorithms:
a. Linear Regression
 Linear regression is a basic and commonly used type of predictive analysis. The overall idea of regression is to examine two things:
 Does a set of predictor variables do a good job of predicting an outcome (dependent) variable?
 Which variables in particular are significant predictors of the outcome variable, and in what way do they, as indicated by the magnitude and sign of the beta estimates, impact the outcome variable?
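The two questions above can be made concrete with a small least-squares fit. This is a minimal sketch using NumPy only; the data is hypothetical and generated without noise, so the fitted beta estimates recover the true coefficients exactly. It does not compute significance tests, which would need standard errors.

```python
import numpy as np

# Hypothetical data generated from y = 1 + 2*x1 - 3*x2 (no noise)
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]])
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1]

# Add a column of ones so the first coefficient is the intercept
X_design = np.column_stack([np.ones(len(X)), X])

# Ordinary least squares: beta minimizes ||X_design @ beta - y||^2
beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)

# R^2 addresses "do the predictors do a good job?"
residuals = y - X_design @ beta
r_squared = 1.0 - (residuals @ residuals) / np.sum((y - y.mean()) ** 2)

print(beta, r_squared)
```

The sign of each beta shows the direction of the effect (x1 raises y, x2 lowers it) and the magnitude shows how much y changes per unit of the predictor.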
b. Logistic Regression
 Logistic regression is a supervised learning classification algorithm used to predict the probability that an observation belongs to a given class of the target variable.
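A minimal sketch of predicting class probabilities, assuming scikit-learn is installed. The "hours studied" data is hypothetical and the model uses scikit-learn's default settings.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: hours studied -> passed the exam (1) or not (0)
hours = np.array([[0.5], [1.0], [1.5], [2.0], [3.0], [3.5], [4.0], [5.0]])
passed = np.array([0, 0, 0, 0, 1, 1, 1, 1])

model = LogisticRegression()
model.fit(hours, passed)

# predict_proba returns one column per class: [P(class 0), P(class 1)]
probabilities = model.predict_proba(np.array([[1.0], [2.5], [4.5]]))
p_pass = probabilities[:, 1]
print(p_pass)
```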
c. Gradient descent
 Gradient descent is the process of minimizing a cost function by repeatedly stepping in the direction opposite to its gradient. This requires knowing the form of the cost as well as its derivative, so that from a given point you can compute the gradient and move downhill.
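The idea can be sketched in a few lines of plain Python. The cost function, learning rate, and step count below are illustrative assumptions, not values prescribed by the course.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient to reduce the cost."""
    x = start
    for _ in range(steps):
        x = x - learning_rate * grad(x)
    return x


# Example cost: cost(x) = (x - 3)**2, whose derivative is 2 * (x - 3)
def cost_gradient(x):
    return 2.0 * (x - 3.0)


minimum = gradient_descent(cost_gradient, start=0.0)
print(minimum)  # close to 3.0, the minimizer of the cost
```

If the learning rate is too large the steps overshoot and may diverge; if it is too small, convergence is slow.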
d. Time series analysis
 Time series analysis is the analysis of data collected at specific intervals over a period of time, with the purpose of identifying trends, cycles, and seasonal variations to aid in forecasting future events. Data is any observed outcome that’s measurable.
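One simple way to separate a trend from seasonal variation is a moving average. The sketch below assumes pandas is installed and uses a hypothetical quarterly series built from a straight-line trend plus a repeating seasonal pattern.

```python
import pandas as pd

# Hypothetical quarterly series: upward trend plus a repeating seasonal pattern
dates = pd.date_range("2020-01-01", periods=12, freq="QS")
trend_part = [float(t) for t in range(12)]
seasonal_part = [1.0, -1.0, 2.0, -2.0] * 3   # sums to zero over each year
sales = pd.Series(
    [t + s for t, s in zip(trend_part, seasonal_part)], index=dates
)

# A 4-quarter moving average smooths out the seasonal swings
trend_estimate = sales.rolling(window=4).mean()

# What remains after removing the trend approximates the seasonal component
seasonal_estimate = sales - trend_estimate

print(trend_estimate.dropna())
```

The first three values of the moving average are missing because a full 4-quarter window is not yet available.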
 Q & A’s
 Hands-on Exercises
15. Supervised Learning
 What is Supervised Learning
 A supervised learning algorithm analyzes labelled training data and produces an inferred function, which can be used to map new examples to outputs.
 Classification
 Classification is the process of predicting the class of given data points. Classes are sometimes called targets, labels, or categories. Classification belongs to the category of supervised learning, where the targets are provided along with the input data.
 Decision Tree and algorithm for Decision Tree induction
 A decision tree is a flowchart-like structure in which each internal node represents a "test" on an attribute (e.g. whether a coin flip comes up heads or tails), each branch represents the outcome of the test, and each leaf node represents a class label (decision taken after computing all attributes).
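The flowchart structure can be printed directly from a fitted tree. This sketch assumes scikit-learn is installed and uses its bundled iris dataset; the depth limit is an arbitrary choice to keep the output short.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# A shallow tree keeps the printed flowchart readable
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(iris.data, iris.target)

# Lines with a feature name are tests on an attribute;
# lines containing "class:" are leaf nodes
rules = export_text(tree, feature_names=list(iris.feature_names))
print(rules)
```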
 Confusion Matrix
 A confusion matrix is a table that is often used to describe the performance of a classification model (or “classifier”) on a set of test data for which the true values are known
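A confusion matrix can be built by counting (true, predicted) pairs. The sketch below uses only the standard library; the labels and predictions are hypothetical.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


y_true = ["spam", "spam", "ham", "ham", "ham", "spam"]
y_pred = ["spam", "ham", "ham", "ham", "spam", "spam"]

matrix = confusion_matrix(y_true, y_pred, labels=["ham", "spam"])

# Correct predictions sit on the diagonal
accuracy = sum(matrix[i][i] for i in range(2)) / len(y_true)
print(matrix, accuracy)
```

Here the off-diagonal cells show one ham message flagged as spam and one spam message missed, which accuracy alone would not reveal.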
 Random Forest
 Random forest is an ensemble of randomized decision trees. Combining many trees increases predictive power and helps prevent overfitting. It is simple to use, widely used, and applies to both classification and regression.
 Naïve Bayes
 Naive Bayes applies Bayes' theorem, with the simplifying assumption that attributes are independent of each other, to predict the probability of each class based on various attributes. This algorithm is mostly used in text classification and with problems having multiple classes.
 Implement Naïve Bayes Classifier
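A text classifier of this kind could be sketched as follows, assuming scikit-learn is installed. The four training messages and their labels are made up, and `MultinomialNB` is one of several Naive Bayes variants that scikit-learn provides.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical training messages and labels
train_texts = [
    "win money now",
    "free prize money",
    "meeting at noon",
    "lunch meeting tomorrow",
]
train_labels = ["spam", "spam", "ham", "ham"]

# Turn each message into a vector of word counts
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_texts)

classifier = MultinomialNB()
classifier.fit(X_train, train_labels)

# New messages must be transformed with the same fitted vocabulary
X_new = vectorizer.transform(["free money", "meeting tomorrow"])
predictions = classifier.predict(X_new)
print(predictions)
```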
 Q & A’s
 Support vector machine and its process
 A support vector machine (SVM) is a supervised machine learning algorithm which can be used for classification or regression problems. It can use a technique called the kernel trick to implicitly transform the data, and based on this transformation it finds an optimal boundary between the possible outputs.
 Hyperparameter optimization
 A hyperparameter is a parameter whose value is set before training and controls the learning process.
 Comparing Random search with Grid search
 Implement Support vector machine for classification
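The sketch below fits an SVM classifier and compares the two search strategies for its hyperparameters. It assumes scikit-learn and SciPy are installed and uses the bundled iris dataset; the candidate values, the sampling range for `C`, and the number of iterations are illustrative choices.

```python
from scipy.stats import loguniform
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Grid search: tries every combination of the listed values (3 x 2 = 6)
grid = GridSearchCV(
    SVC(),
    param_grid={"C": [0.1, 1.0, 10.0], "kernel": ["linear", "rbf"]},
    cv=5,
)
grid.fit(X, y)

# Random search: samples a fixed number of candidates from distributions
random_search = RandomizedSearchCV(
    SVC(),
    param_distributions={
        "C": loguniform(1e-2, 1e2),
        "kernel": ["linear", "rbf"],
    },
    n_iter=6,
    cv=5,
    random_state=0,
)
random_search.fit(X, y)

print(grid.best_params_, grid.best_score_)
print(random_search.best_params_, random_search.best_score_)
```

Grid search cost grows multiplicatively with every added hyperparameter, while random search keeps the budget fixed at `n_iter` candidates, which is why it is often preferred for larger search spaces.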
 Q & A’s
 Hands-on exercise to implement the above algorithms using SciPy
 Unsupervised Learning
 Introduction and use cases of Unsupervised Learning
 K-means clustering
 The K-means clustering algorithm is used to find groups which have not been explicitly labelled in the data. This can be used to confirm business assumptions about what types of groups exist or to identify unknown groups in complex data sets.
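A minimal sketch of finding unlabelled groups, assuming scikit-learn and NumPy are installed. The six points are hypothetical and deliberately form two well-separated groups.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical unlabelled data: two visibly separate groups of 2-D points
points = np.array([
    [1.0, 1.1], [0.9, 1.0], [1.2, 0.8],
    [8.0, 8.2], [7.9, 8.1], [8.3, 7.8],
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(points)

print(labels)                   # cluster index assigned to each point
print(kmeans.cluster_centers_)  # one centre per discovered group
```

The cluster indices themselves are arbitrary; only the grouping matters.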
 Optimal clustering
 The optimal number of clusters k is the one that maximizes the average silhouette over a range of possible values for k. The procedure is similar to the elbow method and can be computed as follows: run the clustering algorithm (e.g., K-means clustering) for different values of k, compute the average silhouette for each k, and choose the k with the highest value.
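The silhouette procedure can be sketched as a loop over candidate values of k. This assumes scikit-learn is installed; the synthetic data is generated around three hypothetical centres, so k = 3 is expected to score best.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data around three well-separated, hypothetical centres
X, _ = make_blobs(
    n_samples=150,
    centers=[(-5, -5), (0, 5), (5, -5)],
    cluster_std=0.5,
    random_state=0,
)

# Average silhouette for each candidate number of clusters
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(scores, best_k)
```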
 Hierarchical clustering
 Hierarchical clustering is a powerful technique that allows you to build tree structures from data similarities
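A tree of merges can be built and then cut into flat clusters. The sketch below assumes SciPy and NumPy are installed; the points, the Ward linkage method, and the choice of three clusters are illustrative assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical 2-D points forming three tight pairs
points = np.array([
    [1.0, 1.0], [1.2, 0.9],
    [5.0, 5.1], [5.2, 4.9],
    [9.0, 1.0], [9.1, 1.2],
])

# Z encodes the tree: each row merges two clusters at a given distance
Z = linkage(points, method="ward")

# Cut the tree so that at most three flat clusters remain
labels = fcluster(Z, t=3, criterion="maxclust")
print(Z.shape, labels)
```

Unlike K-means, the tree is built once and can be cut at different levels without refitting.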
 Implementation of K-means and hierarchical clustering
 Q & A’s
 Introduction to NLP
 Natural language processing (NLP) helps resolve ambiguity in language and adds useful numeric structure to the data for many downstream applications, such as speech recognition or text analytics.
 Working with NLP on text data
 Analysing sentences
 Bag-of-words model
 The bag-of-words model is a simplifying representation used in natural language processing and information retrieval (IR). In this model, a text (such as a sentence or a document) is represented as the bag (multiset) of its words, disregarding grammar and even word order but keeping multiplicity.
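The multiset idea maps directly onto Python's `collections.Counter`. This is a simplified sketch: it splits on whitespace only, whereas real tokenizers also handle punctuation.

```python
from collections import Counter


def bag_of_words(text):
    """Lower-case the text, split on whitespace, and count each word."""
    return Counter(text.lower().split())


bag_a = bag_of_words("the cat chased the dog")
bag_b = bag_of_words("the dog chased the cat")

# Word order differs, but the bags are identical
same_bag = bag_a == bag_b

print(bag_a)
print(same_bag)
```

The two sentences mean different things yet produce the same bag, which shows what the model discards (order) and what it keeps (multiplicity, e.g. "the" counted twice).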
 Extracting features from text
 Searching a grid
 Model training
 Multiple parameters and building of a pipeline
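Feature extraction, model training, and a grid search over multiple parameters can be combined in one pipeline. The sketch below assumes scikit-learn is installed; the step names `vect` and `clf`, the tiny review dataset, and the parameter values are all hypothetical.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical labelled reviews
texts = [
    "great movie loved it",
    "wonderful acting great story",
    "loved the story",
    "great film wonderful cast",
    "terrible movie hated it",
    "boring story bad acting",
    "hated the cast",
    "bad film terrible plot",
]
labels = ["pos", "pos", "pos", "pos", "neg", "neg", "neg", "neg"]

# Step 1 extracts bag-of-words features, step 2 trains the classifier
pipeline = Pipeline([
    ("vect", CountVectorizer()),
    ("clf", MultinomialNB()),
])

# "<step name>__<parameter>" addresses a parameter of a pipeline step
param_grid = {
    "vect__ngram_range": [(1, 1), (1, 2)],
    "clf__alpha": [0.1, 1.0],
}

search = GridSearchCV(pipeline, param_grid, cv=2)
search.fit(texts, labels)

prediction = search.predict(["great film"])
print(search.best_params_, prediction)
```

Searching over the whole pipeline means the vectorizer is refitted inside each cross-validation fold, so the held-out fold never influences the vocabulary.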
 Q & A’s
 Hands-on Exercises using SciPy
 Project implementation