Data Wrangling using Pandas and NumPy

According to the World Health Organization (WHO), cardiovascular diseases (CVDs) are the number one cause of death globally, taking an estimated 17.9 million lives each year and affecting the quality of life of many more people worldwide. Treating these diseases requires a proper diagnostic method to identify their occurrence and type, and such a diagnosis involves in-depth analysis of a large number of parameters. This demo uses Pandas and NumPy to analyse one such dataset, A1_heart_disease_dataset.

Link to Notebook

Unsupervised Learning with PyClustering

This demo applies unsupervised machine learning models to a dataset containing attributes of individuals related to obesity levels, in order to predict the “weight” of an individual from the other features and to group individuals into classes. It uses clustering models and experiments with different distance metrics, such as Euclidean distance and Manhattan distance, to determine the cluster membership of an inference data point.
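
As an illustration of why the choice of distance metric matters, here is a minimal sketch in plain Python. It does not use the PyClustering API or the demo's dataset; the centroids, the query point, and the helper names (`euclidean`, `manhattan`, `assign_cluster`) are hypothetical and chosen only so that the two metrics disagree.

```python
import math

def euclidean(a, b):
    """Straight-line (L2) distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """City-block (L1) distance between two points."""
    return sum(abs(x - y) for x, y in zip(a, b))

def assign_cluster(point, centroids, distance):
    """Return the index of the centroid nearest to `point` under `distance`."""
    return min(range(len(centroids)), key=lambda i: distance(point, centroids[i]))

# Hypothetical centroids and inference point (not taken from the demo).
centroids = [(3.0, 3.0), (0.0, 5.0)]
point = (0.0, 0.0)

# Euclidean: centroid 0 is closer (about 4.24 versus 5.0).
# Manhattan: centroid 1 is closer (5.0 versus 6.0).
print("Euclidean assignment:", assign_cluster(point, centroids, euclidean))
print("Manhattan assignment:", assign_cluster(point, centroids, manhattan))
```

The same point lands in a different cluster depending on the metric, which is the kind of effect the demo's experiments are meant to expose.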

Link to Notebook

Customer Behaviour Prediction with Supervised Learning

Customer details such as gender, age group and education are among the most significant features for analysing customer behaviour and predicting buying behaviour. It is essential for businesses to analyse their customer details to better understand consumer behaviour and its impact on various products. This demo uses a CRM dataset with customer details, comprising 20 attributes and 9134 records. It includes three machine learning models for predicting customer response. As part of feature engineering, it employs feature importance techniques to extract the important features and reduce the dimensionality of the dataset.

Link to Notebook

Ensemble Learning with LogitBoost

This demo replicates, in Python with scikit-learn, a research article that predicts the survival of patients with heart failure and whose analysis was originally written in R. The authors of the paper used 10 machine learning models, which are replicated in Python using the same set of features, classifiers, train/test splitting approach, and performance metric: the Matthews Correlation Coefficient.
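
For readers unfamiliar with the metric, the Matthews Correlation Coefficient can be computed directly from the four confusion-matrix counts. The sketch below is a plain-Python illustration with made-up labels, not the notebook's code; the helper name `mcc` is hypothetical, and scikit-learn provides an equivalent as `sklearn.metrics.matthews_corrcoef`.

```python
import math

def mcc(y_true, y_pred):
    """Matthews Correlation Coefficient for binary labels encoded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0 when a row or column of the confusion matrix is empty.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

# Made-up labels for illustration only: TP=2, FN=1, TN=4, FP=1.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
print(round(mcc(y_true, y_pred), 4))  # (2*4 - 1*1) / sqrt(3*3*5*5) = 7/15
```

The score ranges from -1 to +1 and uses all four counts, so on imbalanced data it is harder to inflate than plain accuracy.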

The demo builds various machine learning models, such as a K-Nearest Neighbour classifier, a linear SVM classifier, a radial SVM classifier, and an artificial neural network (multilayer perceptron), to classify the dataset. It also builds another type of boosting model, LogitBoost, which brings the loss function of logistic regression into the boosting framework. Adaptive Boosting (AdaBoost) uses the exponential loss function, whereas LogitBoost uses the logistic loss function. The logistic loss penalises misclassified examples in a more measured way than AdaBoost's exponential loss: for badly misclassified examples it grows roughly linearly rather than exponentially. As a result, LogitBoost is less sensitive to outliers and noise.
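
The difference between the two losses can be seen by evaluating them at a few margins. This is a minimal sketch rather than the notebook's implementation: it assumes labels y in {-1, +1}, defines the margin as y * F(x), and uses the common forms exp(-margin) and log(1 + exp(-margin)); scaling conventions vary between references.

```python
import math

def exponential_loss(margin):
    """AdaBoost-style exponential loss, where margin = y * F(x)."""
    return math.exp(-margin)

def logistic_loss(margin):
    """Logistic loss log(1 + exp(-margin)), the form assumed in this sketch."""
    return math.log1p(math.exp(-margin))

# Positive margins are correct predictions; negative margins are misclassified.
for margin in (2.0, 0.0, -2.0, -5.0):
    print(
        f"margin={margin:+.1f}  "
        f"exponential={exponential_loss(margin):8.3f}  "
        f"logistic={logistic_loss(margin):6.3f}"
    )
```

At a margin of -5 the exponential loss is well over 100 while the logistic loss is only slightly above 5, so a single noisy or mislabelled point pulls far less weight under the logistic loss.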

Link to Notebook