Posts

Showing posts from July, 2024

Machine Learning: Visualisation and Analysis with Amazon SageMaker Data Wrangler

In a machine learning project, the most time-consuming part is data preparation. Data scientists and data engineers spend most of their time preparing the data so that it is suitable for the machine learning algorithms.  Let’s say there is a clinic that specialises in hypertension (high blood pressure). Over a number of years the clinic has treated more than a hundred thousand hypertension patients. The clinic keeps patient records covering age, gender, ethnic group, job sector, weekly exercise habits, other chronic conditions, long-term medications, address (which may hint at the patient’s socioeconomic position, though not reliably) and others. The treatment plan varies from patient to patient and is also part of the patient records (the patient dataset). There are more than 70 features in the patient dataset.  Through treatment, many patients can have their hypertension condition managed reasonabl

Fairness Evaluation and Model Explainability In AI

Artificial Intelligence (AI) relies on machine learning and its models to produce outcomes. Imagine a credit card company that receives hundreds of thousands of applications each year and would like to use AI for the first round of application filtering. If the AI is not developed properly, its decisions can be unfairly skewed, for example by rejecting far more applications from a certain age group, a certain gender, or a certain employment history pattern. Unfairness is the key issue here: the AI and machine learning inference does not reflect what the actual circumstances warrant. This is typically called bias in AI and machine learning. Biases can come from various stages of a machine learning life cycle: • Biases may exist in the pre-training data, e.g., the dataset to be used in machine learning training has biases in it • Biases may be introduced by the machine learning exercise  Below is a machine learning lifecycle char