Machine Learning: Visualisation and Analysis with Amazon SageMaker Data Wrangler
In a machine learning project, the most time-consuming part is data preparation. Data scientists and data engineers spend most of their time preparing the data so that it becomes suitable for machine learning algorithms. Let’s say there is a clinic that specialises in Hypertension (high blood pressure). Over a number of years, the clinic has treated more than a hundred thousand Hypertension patients. The clinic keeps patient records with information such as age, gender, ethnic group, job sector, weekly exercise habits, other chronic conditions, long-term medications, address (which may hint at the patient’s socioeconomic position, although this link may not be reliable) and others. The treatment plan varies from patient to patient and is also part of the patient records (the patient dataset). There are more than 70 features in the patient dataset. Through treatments, many patients can have their Hypertension condition managed reasonabl