After this course you should understand the basics of machine learning and how to implement machine learning algorithms on your data sets using Python. Specifically, you should understand basic regression, classification, and clustering algorithms and how to fit a model and use it to predict future outcomes.
Module 1: Python setup
- Python installation
- Data types in Python
- Basic operations and functions
Module 2: Numerical computing using python
- Introducing NumPy
- Vectors, matrices and arrays
- Plotting and visualization
Module 3: Introducing SciPy
- SciPy toolkit
Module 4: Introduction to pandas
- NumPy for pandas
- Playing around with pandas Series
- Playing around with pandas DataFrames
- Working with datasets in pandas
- Combining and reshaping data
- Grouping and aggregating data
Module 5: Basics of machine learning
- Definition of machine learning
- Types of machine learning
- A few examples of machine learning implementations
Module 6: Machine learning with scikit-learn
- Machine learning: the problem setting
- Loading an example dataset
- Learning and predicting
- Model persistence
Module 7: Data visualization
- Data visualization using pandas
- Using matplotlib
- Using seaborn
Machine Learning With Python - Lab
Downloading, Installing and Starting Python SciPy
Install SciPy Libraries
This tutorial assumes Python version 2.7 or 3.5+.
There are five key libraries that you will need to install. Below is a list of the Python SciPy libraries required for this tutorial:
- scipy
- numpy
- matplotlib
- pandas
- sklearn
There are many ways to install these libraries. My best advice is to pick one method and use it consistently to install each library.
The SciPy installation page provides excellent instructions for installing the above libraries on several platforms, including Linux, macOS and Windows. If you have any doubts or questions, refer to this guide. Thousands of people have followed it.
- On macOS, you can use MacPorts to install Python 2.7 and these libraries. For more information, see the MacPorts homepage.
- On Linux, you can use your package manager, such as yum on Fedora, to install RPMs.
If you are on Windows or are not confident, I recommend installing the free version of Anaconda, which includes everything you need.
Note: This tutorial assumes you have scikit-learn version 0.18 or higher installed.
Start Python and Check Versions
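This step names a version check, but the script for it is missing. The sketch below assumes scipy, numpy, matplotlib, pandas and scikit-learn are installed. It prints each version and checks the scikit-learn 0.18 minimum from the note above. The `major_minor` helper is a hypothetical convenience and is not part of any library.

```python
# Check the Python and library versions used in this lab
import re
import sys

import matplotlib
import numpy
import pandas
import scipy
import sklearn


def major_minor(version):
    """Hypothetical helper: return (major, minor) ints from a string like '1.4.2'."""
    match = re.match(r'(\d+)\.(\d+)', version)
    return int(match.group(1)), int(match.group(2))


print('Python: {}'.format(sys.version))
versions = {
    'scipy': scipy.__version__,
    'numpy': numpy.__version__,
    'matplotlib': matplotlib.__version__,
    'pandas': pandas.__version__,
    'sklearn': sklearn.__version__,
}
for name, version in versions.items():
    print('{}: {}'.format(name, version))

# The lab assumes scikit-learn 0.18 or higher
if major_minor(versions['sklearn']) < (0, 18):
    print('Warning: scikit-learn 0.18 or higher is recommended for this lab')
```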
Load the Data
- Import libraries
- Load Dataset
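The rest of the lab predicts species from flower measurements, which matches the classic iris flower dataset. No file or URL is given here, so this sketch assumes the copy of iris bundled with scikit-learn (`sklearn.datasets.load_iris`). The column names `sepal-length`, `sepal-width`, `petal-length`, `petal-width` and `class` are also assumptions. The bundled data labels the species `setosa`, `versicolor` and `virginica`.

```python
# Import the libraries used to load the data
import pandas as pd
from sklearn.datasets import load_iris

# Assumed column names for the four flower measurements
feature_names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width']

# Load the bundled iris data instead of downloading a CSV file
iris = load_iris()
dataset = pd.DataFrame(iris.data, columns=feature_names)
# Map integer targets (0, 1, 2) to species names such as 'setosa'
dataset['class'] = iris.target_names[iris.target]
print(dataset.head())
```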
Summarize the Dataset
In this step, we look at the data in a few different ways:
- Dimensions of the dataset.
- Peek at the data itself.
- Statistical summary of all attributes.
- Breakdown of the data by the class variable.
Dimensions of the Dataset
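The `shape` attribute of a pandas DataFrame shows how many instances (rows) and attributes (columns) the data has. This sketch reloads scikit-learn's bundled iris data under assumed column names so that it runs on its own.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Rebuild the iris DataFrame (assumed column names)
feature_names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width']
iris = load_iris()
dataset = pd.DataFrame(iris.data, columns=feature_names)
dataset['class'] = iris.target_names[iris.target]

# shape is a (rows, columns) tuple: 150 instances, 4 measurements + 1 class label
print(dataset.shape)  # → (150, 5)
```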
Peek at the Data
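To eyeball the raw values, `DataFrame.head(n)` returns the first `n` rows. The sketch below again assumes scikit-learn's bundled iris data and hypothetical column names.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Rebuild the iris DataFrame (assumed column names)
feature_names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width']
iris = load_iris()
dataset = pd.DataFrame(iris.data, columns=feature_names)
dataset['class'] = iris.target_names[iris.target]

# Show the first 20 rows to get a feel for the values
first_rows = dataset.head(20)
print(first_rows)
```

The bundled data is sorted by species, so these first rows all belong to `setosa`.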
Statistical Summary
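`DataFrame.describe()` summarizes each numeric attribute with its count, mean, standard deviation, minimum, maximum and quartiles. By default it skips the text `class` column. The sketch assumes scikit-learn's bundled iris data and hypothetical column names.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Rebuild the iris DataFrame (assumed column names)
feature_names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width']
iris = load_iris()
dataset = pd.DataFrame(iris.data, columns=feature_names)
dataset['class'] = iris.target_names[iris.target]

# count, mean, std, min, 25%, 50%, 75%, max for each numeric column
summary = dataset.describe()
print(summary)
```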
Class Distribution
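To see how many instances belong to each class, group by the class column and count the rows. This sketch assumes the bundled iris data, where each of the three species has 50 instances.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Rebuild the iris DataFrame (assumed column names)
feature_names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width']
iris = load_iris()
dataset = pd.DataFrame(iris.data, columns=feature_names)
dataset['class'] = iris.target_names[iris.target]

# Number of rows per species
class_counts = dataset.groupby('class').size()
print(class_counts)
```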
Data Visualization
We are going to look at two types of plots:
- Univariate plots to better understand each attribute.
- Multivariate plots to better understand the relationships between attributes.
Univariate Plots
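Box-and-whisker plots and histograms are two common univariate plots, and pandas can draw both directly from a DataFrame. The choice of these two plot types, the non-interactive `Agg` backend and the output file names are assumptions made so the sketch runs without a display. In an interactive session you could call `plt.show()` instead of saving.

```python
import matplotlib
matplotlib.use('Agg')  # file-only backend so the script also runs without a display
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import load_iris

# Rebuild the iris DataFrame (assumed column names)
feature_names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width']
iris = load_iris()
dataset = pd.DataFrame(iris.data, columns=feature_names)
dataset['class'] = iris.target_names[iris.target]

# Box-and-whisker plot: one panel per numeric attribute in a 2x2 grid
dataset.plot(kind='box', subplots=True, layout=(2, 2), sharex=False, sharey=False)
plt.savefig('iris_box.png')

# Histogram of each numeric attribute (the text 'class' column is skipped)
hist_axes = dataset.hist()
plt.savefig('iris_hist.png')
plt.close('all')
```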
Multivariate Plots
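A scatter matrix plots every pair of numeric attributes against each other, which helps reveal structured relationships between them. This sketch uses `pandas.plotting.scatter_matrix` and assumes the same bundled iris data and an arbitrary output file name.

```python
import matplotlib
matplotlib.use('Agg')  # file-only backend so the script also runs without a display
import matplotlib.pyplot as plt
import pandas as pd
from pandas.plotting import scatter_matrix
from sklearn.datasets import load_iris

# Rebuild the iris DataFrame (assumed column names)
feature_names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width']
iris = load_iris()
dataset = pd.DataFrame(iris.data, columns=feature_names)
dataset['class'] = iris.target_names[iris.target]

# One panel per pair of numeric attributes; the diagonal shows histograms
axes = scatter_matrix(dataset)
plt.savefig('iris_scatter_matrix.png')
plt.close('all')
```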
Evaluate Some Algorithms
Here is what we are going to cover in this step:
- Separate out a validation dataset
- Set up the test harness to use 10-fold cross validation
- Build six different models to predict species from flower measurements
- Select the best model
Create a Validation Dataset
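Holding back part of the data lets you check the chosen model on data it never saw during training. This sketch uses scikit-learn's `train_test_split`. The 80/20 split and `random_state=1` are assumed values, and the bundled iris data stands in for the lab's dataset.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Rebuild the iris DataFrame (assumed column names)
feature_names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width']
iris = load_iris()
dataset = pd.DataFrame(iris.data, columns=feature_names)
dataset['class'] = iris.target_names[iris.target]

# Slice inputs (first four columns) and output (class label)
X = dataset.iloc[:, 0:4].values
y = dataset['class'].values

# Keep 80% for training, hold back 20% for validation
X_train, X_validation, Y_train, Y_validation = train_test_split(
    X, y, test_size=0.20, random_state=1)
print(X_train.shape, X_validation.shape)
```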
Test Harness
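With 10-fold cross validation, the training data is split into 10 parts. Each model is trained on 9 parts and tested on the remaining one, and this repeats for all 10 combinations. The sketch uses scikit-learn's `StratifiedKFold` (which keeps class proportions in each fold) with `cross_val_score` and accuracy as the metric. The seed and the KNN model used to exercise the harness are assumptions for illustration.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Rebuild the iris DataFrame and validation split (assumed names and seed)
feature_names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width']
iris = load_iris()
dataset = pd.DataFrame(iris.data, columns=feature_names)
dataset['class'] = iris.target_names[iris.target]
X = dataset.iloc[:, 0:4].values
y = dataset['class'].values
X_train, X_validation, Y_train, Y_validation = train_test_split(
    X, y, test_size=0.20, random_state=1)

# 10 stratified folds, shuffled once with a fixed seed for repeatability
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)
scores = cross_val_score(KNeighborsClassifier(), X_train, Y_train,
                         cv=kfold, scoring='accuracy')
print('KNN accuracy: {:.3f} ({:.3f})'.format(scores.mean(), scores.std()))
```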
Build Models
We will evaluate six different algorithms:
- Logistic Regression (LR)
- Linear Discriminant Analysis (LDA)
- K-Nearest Neighbors (KNN)
- Classification and Regression Trees (CART)
- Gaussian Naive Bayes (NB)
- Support Vector Machines (SVM)
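The six algorithms above map directly onto scikit-learn estimators. The sketch evaluates each one with the same 10-fold harness and then selects the model with the highest mean accuracy. The hyperparameters are assumptions chosen for convergence and reproducibility, not values from the lab: `max_iter=1000` for LR, `random_state=1` for CART and `gamma='auto'` for SVM. Choosing by mean accuracy alone is one simple selection rule. You would normally also look at the spread of the fold scores.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Rebuild the iris DataFrame and validation split (assumed names and seed)
feature_names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width']
iris = load_iris()
dataset = pd.DataFrame(iris.data, columns=feature_names)
dataset['class'] = iris.target_names[iris.target]
X = dataset.iloc[:, 0:4].values
y = dataset['class'].values
X_train, X_validation, Y_train, Y_validation = train_test_split(
    X, y, test_size=0.20, random_state=1)

# The six candidate models (hyperparameters are assumptions)
models = [
    ('LR', LogisticRegression(max_iter=1000)),
    ('LDA', LinearDiscriminantAnalysis()),
    ('KNN', KNeighborsClassifier()),
    ('CART', DecisionTreeClassifier(random_state=1)),
    ('NB', GaussianNB()),
    ('SVM', SVC(gamma='auto')),
]

# Evaluate each model with the same 10-fold stratified harness
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)
results = {}
for name, model in models:
    cv_results = cross_val_score(model, X_train, Y_train, cv=kfold, scoring='accuracy')
    results[name] = cv_results
    print('{}: {:.3f} ({:.3f})'.format(name, cv_results.mean(), cv_results.std()))

# Select the model with the highest mean cross-validation accuracy
best_name = max(results, key=lambda n: results[n].mean())
print('Best model: {}'.format(best_name))
```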