Machine Learning Using Python
About this course
This program provides a strong foundation in data analytics by bringing together a diverse body of knowledge, including exploratory data analysis (EDA), applied statistics, applied mathematics, computer science, optimisation, consumer behaviour and decision theory.
Learning Outcomes. By the end of this course, you will be able to:
- Identify the data sets required for the analysis, given a problem statement or use case
- Perform exploratory data analysis (EDA) on the data sets and identify which machine learning techniques or models to try
- Validate the accuracy of the models on a test or hold-out data set
- Select the single best model, or combine multiple models into an ensemble
“Numbers have an important story to tell. They rely on you to give them a voice.”
Participants should have some familiarity with data preparation or data usage (Data Engineering or Business Intelligence), e.g. SQL, R, Python and data visualization tools.
Overview of Data Science
- What data science is and why it is becoming popular
- Applications of analytics; analytics technologies and resources
- Models and algorithms
- Data exploration and preparation
Module – 1: Getting Familiar with Python
- Python installation and launching
- Data types – strings, lists, dictionaries, tuples
- Python loops – simple and nested loops
- Array manipulation – through the NumPy module
- DataFrame creation and manipulation – through the pandas module
- Images and graphs – through the seaborn and matplotlib modules
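A minimal sketch of the Module 1 topics, assuming NumPy and pandas are installed. The variable names, column names and values are made up for illustration; plotting with seaborn/matplotlib is left out so the snippet runs without a display.

```python
import numpy as np
import pandas as pd

# Core built-in data types (illustrative values)
scores = [72, 85, 90, 64]                    # list: ordered and mutable
point = (3, 4)                               # tuple: ordered and immutable
ages = {"asha": 31, "ben": 27, "chen": 45}   # dictionary: key -> value
greeting = "hello " + "world"                # string concatenation

# Simple loop over the dictionary's key/value pairs
for name, age in ages.items():
    print(f"{name} is {age} years old")

# Nested loops: build a 3x3 multiplication table as a list of lists
table = []
for i in range(1, 4):
    row = []
    for j in range(1, 4):
        row.append(i * j)
    table.append(row)

# NumPy: turn the nested list into a 2-D array and manipulate it
arr = np.array(table)        # shape (3, 3)
col_sums = arr.sum(axis=0)   # sum down each column
doubled = arr * 2            # element-wise arithmetic, no explicit loop
upper_left = arr[:2, :2]     # slice out the top-left 2x2 sub-array

# pandas: build a DataFrame from the dictionary and add a derived column
df = pd.DataFrame({"name": list(ages.keys()), "age": list(ages.values())})
df["over_30"] = df["age"] > 30
adults = df[df["over_30"]]   # boolean row filtering
print(df.describe())         # summary statistics for the numeric columns
```

The nested loops are written out explicitly to match the topic; in everyday code the NumPy expression `np.outer(range(1, 4), range(1, 4))` would build the same table without Python-level loops.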
Module – 2: Starting with Machine Learning (ML)
- Basic concepts of ML – decision boundaries, supervised & unsupervised learning, classification, regression, clustering
- Getting familiar with Python ML libraries – scikit-learn (sklearn), NumPy, pandas, seaborn
- Data preprocessing – treatment of missing values, outliers and sparsity; correlation analysis; summary statistics and quantile computations
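The preprocessing steps listed above can be sketched in pandas as follows. This is one common set of choices, not necessarily the course's: median imputation for missing values, the 1.5 × IQR rule (Tukey fences) for flagging outliers, and capping at those fences as the treatment. The `income`/`spend` columns and their values are made up.

```python
import numpy as np
import pandas as pd

# Small made-up data set with one missing value and one obvious outlier
df = pd.DataFrame({
    "income": [42.0, 48.0, np.nan, 51.0, 45.0, 250.0],
    "spend":  [10.0, 12.0, 11.0, 13.0, 0.0, 40.0],
})

# 1. Missing values: count per column, then impute with the column median
print(df.isna().sum())
df["income"] = df["income"].fillna(df["income"].median())

# 2. Summary statistics and quantiles (pandas interpolates linearly by default)
print(df.describe())
q1, q3 = df["income"].quantile([0.25, 0.75])

# 3. Outliers: flag values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] ...
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = df[(df["income"] < lower) | (df["income"] > upper)]
# ... and, as one possible treatment, cap them at the fences
df["income_capped"] = df["income"].clip(lower, upper)

# 4. Sparsity: share of exact zeros in each column
sparsity = (df[["income", "spend"]] == 0).mean()

# 5. Correlation analysis (Pearson by default)
print(df[["income", "spend"]].corr())
```

Whether to cap, drop or keep an outlier depends on whether it is a data error or a genuine extreme value; the snippet only shows the mechanics.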
Module – 3: Classification and Regression with Python
- Linear and Logistic Regression
- Regularization-based models – Ridge and Lasso regression
- Tree-based models – Random Forest, Gradient Boosting, XGBoost
- Support Vector Machines (SVMs)
- Hyperparameter tuning – through grid search
- Performance metrics – confusion matrix, accuracy, precision and recall, ROC curve, precision-recall curve (classification); MAE, MSE, RMSE, R² (regression)
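To make the performance metrics concrete, here is a from-scratch NumPy sketch of the standard formulas for a binary classifier (labels assumed to be 0/1, with 1 as the positive class) and for a regressor. The function names and sample values are illustrative, not part of any library.

```python
import numpy as np

def classification_metrics(y_true, y_pred):
    """Binary confusion matrix plus accuracy, precision and recall (positive class = 1)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    confusion = np.array([[tn, fp], [fn, tp]])  # rows: actual 0/1, columns: predicted 0/1
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0  # of predicted positives, how many are right
    recall = tp / (tp + fn) if tp + fn else 0.0     # of actual positives, how many were found
    return confusion, accuracy, precision, recall

def regression_metrics(y_true, y_pred):
    """MAE, MSE, RMSE and R^2 for continuous targets."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mae = np.mean(np.abs(err))
    mse = np.mean(err ** 2)
    rmse = np.sqrt(mse)
    # R^2 = 1 - (residual sum of squares) / (total sum of squares around the mean)
    r2 = 1 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return mae, mse, rmse, r2

cm, acc, prec, rec = classification_metrics(
    [1, 0, 1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 0, 1, 1, 0]
)
mae, mse, rmse, r2 = regression_metrics([3.0, 5.0, 2.0, 7.0], [2.5, 5.0, 3.0, 8.0])
print(cm, acc, prec, rec)
print(mae, mse, rmse, r2)
```

In practice `sklearn.metrics` provides `confusion_matrix`, `accuracy_score`, `precision_score`, `recall_score`, `mean_absolute_error`, `mean_squared_error` and `r2_score`, which should agree with these hand-written versions on the same binary inputs; writing them out once mainly helps in reading what each number means.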
Module – 4: Clustering
- k-means clustering – choosing the number of clusters with elbow and inertia plots
- Cluster profiling, interpretation and analysis
- Brief introduction to other clustering techniques – DBSCAN, Gaussian mixture models (GMM)
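The elbow method for k-means can be sketched with a from-scratch NumPy version of Lloyd's algorithm (the standard k-means iteration), run for several values of k on synthetic data with three blobs. The course itself would more likely use `sklearn.cluster.KMeans`, whose fitted `inertia_` attribute holds the same quantity; the helper names and data here are illustrative assumptions.

```python
import numpy as np

def _assign(X, centroids):
    """Index of the nearest centroid for every point (Euclidean distance)."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm; returns (centroids, labels, inertia)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids as k distinct data points chosen at random
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = _assign(X, centroids)
        # Update step: move each centroid to the mean of its points (keep it if empty)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    labels = _assign(X, centroids)
    # Inertia: sum of squared distances from each point to its assigned centroid
    inertia = float(np.sum((X - centroids[labels]) ** 2))
    return centroids, labels, inertia

# Synthetic data: three 2-D blobs of 50 points each
rng = np.random.default_rng(42)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

# Elbow data: inertia for k = 1..6 (plot k against inertia and look for the bend)
inertias = {k: kmeans(X, k)[2] for k in range(1, 7)}
for k, inertia in inertias.items():
    print(k, round(inertia, 1))

# Minimal cluster profile for k = 3: size and mean of each cluster
centroids3, labels3, _ = kmeans(X, 3)
sizes = np.bincount(labels3, minlength=3)
print(sizes, centroids3.round(2))
```

For this three-blob data the bend would be expected around k = 3, but because k-means starts from random centroids it can settle in a local optimum; running several initialisations and keeping the lowest inertia (what `KMeans` does via `n_init`) makes the elbow plot more reliable.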
Module – 5: ML through TensorFlow and Keras
- Installing the TensorFlow and Keras modules
- Building perceptrons and neural networks with TensorFlow and Keras
- Neural network (NN) hyperparameter tuning
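To show what a single perceptron computes before it is wrapped in TensorFlow/Keras layers, here is a plain NumPy sketch of Rosenblatt's perceptron learning rule trained on logical AND. This is deliberately not the Keras API, only the underlying idea; the function names, learning rate and epoch count are illustrative assumptions.

```python
import numpy as np

def train_perceptron(X, y, lr=1.0, epochs=20):
    """Rosenblatt perceptron with a step activation; labels y in {0, 1}."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for xi, target in zip(X, y):
            pred = 1 if xi @ w + b > 0 else 0
            update = lr * (target - pred)  # zero when the prediction is correct
            w += update * xi
            b += update
            errors += int(update != 0)
        if errors == 0:  # converged: every training point is classified correctly
            break
    return w, b

def predict(X, w, b):
    """Step activation applied to the weighted sum."""
    return (X @ w + b > 0).astype(int)

# Logical AND is linearly separable, so the perceptron rule converges
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1])
w, b = train_perceptron(X, y)
print(w, b, predict(X, w, b))
```

With zero initial weights the learning rate only rescales `w` and `b`, so `lr=1.0` keeps the arithmetic exact. A Keras model with a single `Dense(1)` unit is the closest layer-based analogue, though Keras trains it by gradient descent on a differentiable loss rather than with this step-function update, and that difference is what makes multi-layer networks and their hyperparameters (learning rate, epochs, layer sizes) tunable.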