Posts

Improve Output Quality When Streaming Log Data to Amazon QuickSight for Visualisation

Amazon QuickSight is a commonly used business intelligence tool set that helps a wide range of users comprehend data through visualisation. Log data from many sources, both AWS and non-AWS services, can be streamed to Amazon QuickSight for data visualisation, analytics, and leveraging generative AI. This is a fairly typical use case of QuickSight. But the quality of the Amazon QuickSight outputs for log data is influenced, or even decided, before QuickSight ever receives the data. This is because log data often has many dimensions, or simply many fields, and their significance is not equal. Insignificant data fields significantly reduce the quality and readability of the QuickSight visualisation. Sometimes you also want to strip sensitive data before sending the dataset for visualisation. There are creative ways to improve log data quality when streaming it to Amazon QuickSight, using filtering and/or redacting. Be...
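
To make the filtering and redacting idea concrete, here is a minimal sketch of a record-transformation Lambda for Amazon Data Firehose, one common way to stream logs towards the dataset that QuickSight reads. The field names are assumptions for illustration only, not a real schema.

# Hedged sketch of a Firehose record-transformation Lambda that keeps only the
# fields worth visualising and redacts a sensitive field before the data reaches
# the destination QuickSight reads from. Field names below are assumptions.
import base64
import json

KEEP_FIELDS = {"timestamp", "service", "status_code", "latency_ms", "region"}  # assumed
REDACT_FIELDS = {"user_email"}  # assumed sensitive field

def lambda_handler(event, context):
    output = []
    for record in event["records"]:
        payload = json.loads(base64.b64decode(record["data"]))
        # Drop insignificant fields so every dimension left is worth plotting.
        trimmed = {k: v for k, v in payload.items() if k in KEEP_FIELDS}
        # Redact rather than drop fields that are useful for counting but sensitive.
        for field in REDACT_FIELDS:
            if field in payload:
                trimmed[field] = "REDACTED"
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode((json.dumps(trimmed) + "\n").encode()).decode(),
        })
    return {"records": output}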

A Problem Solving Experience on Amazon S3 Multi-Region Access Points

Amazon S3 buckets can be used for storing large amounts of data. AWS's comprehensive data analytics tools can then be leveraged to process the data. Often, the post-analytics (processed) data is stored in another S3 bucket, and users access the processed data by fetching it from that bucket. This piece discusses a problem recently experienced when accessing S3 buckets behind a Multi-Region Access Point setup, and the problem-solving experience. The Setup: A large amount of data is deposited into Amazon S3 for analytics processing. The processed data is then stored in a different S3 bucket. A user base of around 10,000 actively accesses the processed data using a proprietary application, which is latency sensitive. Most users are located in three geographical regions: Australia and New Zealand, North America, and South America. Due to the latency-sensitive requirement, an Amazon S3 Multi-Region Access Point setup has been used. ...
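
For context on what "going through the Multi-Region Access Point" looks like in code, here is a minimal sketch of fetching an object through an MRAP with boto3. The account ID, MRAP alias and key are placeholders, and boto3 needs the optional CRT dependency installed for the SigV4A signing that Multi-Region Access Points require.

# Minimal sketch: read processed data via an S3 Multi-Region Access Point ARN
# instead of a specific regional bucket. Install boto3 with the CRT extra
# (pip install "boto3[crt]") so requests can be signed with SigV4A.
import boto3

MRAP_ARN = "arn:aws:s3::111122223333:accesspoint/example-alias.mrap"  # placeholder

s3 = boto3.client("s3")

def fetch_processed_object(key: str) -> bytes:
    """Read an object via the MRAP; S3 routes the request to a nearby Region."""
    response = s3.get_object(Bucket=MRAP_ARN, Key=key)
    return response["Body"].read()

if __name__ == "__main__":
    data = fetch_processed_object("processed/report.parquet")  # placeholder key
    print(f"fetched {len(data)} bytes")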

Solving PII Data Security Problems in An AWS Machine Learning Use Case

Recently I discussed how a solution for extracting large volumes of data from a set of enterprise applications to AWS S3 for processing helped an organisation achieve its desired data analytics outcomes. A further initiative has commenced to leverage the data using AI, with the aim of applying further intelligence in analytics and providing a generative AI style service in which users can ask questions on the likes of career development training options and get intelligent suggestions. Obviously this initiative is a progressive process. But before anything can be done with AI, the data needs to be fed into AWS machine learning services for ML training and analysis. This was where a big obstacle existed that almost ground the development to a standstill – the data from the enterprise applications contains PII (Personally Identifiable Information), and the organisation has clear policies on protecting PII, including that no PII can be made subject to machine learning or A...
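
As one illustration of what stripping PII before ML can look like (not necessarily the approach the post itself describes), here is a hedged sketch using Amazon Comprehend's PII detection to redact free-text fields before they reach a training pipeline. The sample text and score threshold are assumptions.

# Hedged sketch: redact PII spans detected by Amazon Comprehend before the text
# is fed into an ML pipeline. Other mechanisms (e.g. Macie or Glue transforms)
# could serve the same purpose.
import boto3

comprehend = boto3.client("comprehend")

def redact_pii(text: str, language_code: str = "en", min_score: float = 0.8) -> str:
    """Replace detected PII spans with their entity type, e.g. [NAME], [EMAIL]."""
    entities = comprehend.detect_pii_entities(
        Text=text, LanguageCode=language_code
    )["Entities"]
    # Replace from the end of the string so earlier offsets stay valid.
    for entity in sorted(entities, key=lambda e: e["BeginOffset"], reverse=True):
        if entity["Score"] >= min_score:
            text = (
                text[: entity["BeginOffset"]]
                + f"[{entity['Type']}]"
                + text[entity["EndOffset"]:]
            )
    return text

if __name__ == "__main__":
    print(redact_pii("Jane Doe (jane.doe@example.com) asked about training options."))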

Using AWS Services to Perform Data Analysis With SAP SuccessFactors

An organisation has had a successful experience performing unique data analysis with SAP SuccessFactors using AWS services. SuccessFactors is a suite of applications from SAP, a renowned player in enterprise and business applications. SuccessFactors mainly covers core Human Resources services, Payroll services, Recruiting Management, Performance Management, Workforce Planning, Learning and Development Management, and other related Human Capital Management functions. While SAP offers data analytics, the organisation had unique requirements for what it wanted to achieve in data analytics, which led it to look into other possibilities. Eventually, the following setup helped deliver the desired outcomes. Extracting Data from SAP SuccessFactors Using AWS Glue: AWS Glue is a fully managed, serverless ETL (Extract, Transform, Load) service with efficient operations. Its high performance makes it suitable for extracting data from SuccessFact...
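
To give a flavour of the extraction step, below is a simplified sketch written as it could run in an AWS Glue Python shell job: it pages through a SuccessFactors OData v2 entity set and lands the raw JSON in S3 for downstream transforms. The host, entity name, bucket and credential handling are placeholders; the actual setup may well use a Glue connector rather than direct HTTP calls.

# Simplified, hedged sketch of extracting a SuccessFactors entity set and
# writing the raw records to S3. All identifiers below are placeholders.
import json

import boto3
import requests

API_BASE = "https://api.example.successfactors.com/odata/v2"  # placeholder host
ENTITY = "EmpJob"                    # example SuccessFactors entity set
RAW_BUCKET = "example-raw-hr-data"   # placeholder S3 bucket

def extract_entity(session: requests.Session, entity: str) -> list:
    """Page through an OData v2 entity set and return all records."""
    records, url = [], f"{API_BASE}/{entity}?$format=json"
    while url:
        payload = session.get(url, timeout=60).json()["d"]
        records.extend(payload["results"])
        url = payload.get("__next")  # OData v2 server-side paging link
    return records

def main():
    session = requests.Session()
    session.auth = ("apiuser@COMPANY_ID", "PASSWORD")  # use Secrets Manager in practice
    records = extract_entity(session, ENTITY)
    boto3.client("s3").put_object(
        Bucket=RAW_BUCKET,
        Key=f"successfactors/{ENTITY}.json",
        Body=json.dumps(records).encode(),
    )

if __name__ == "__main__":
    main()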

Several Operational Considerations for Amazon RDS

Amazon Relational Database Service (RDS) is a very common cloud-based database service. It is easy to set up and operate, with many operational tasks automated, and it can scale with demand. RDS offers eight engine types, including six standard RDS engines: RDS for PostgreSQL, RDS for MySQL, RDS for MariaDB, RDS for SQL Server, RDS for Oracle, and RDS for Db2, as well as two Amazon Aurora engine types: Aurora PostgreSQL-Compatible Edition and Aurora MySQL-Compatible Edition. A number of operational considerations can help with making better use of RDS in enterprise settings. This blog piece discusses several of them based on real-world experience. It is not intended to be an exhaustive list – each organisation is expected to research and apply the considerations most relevant to its specific use of RDS. It would already be meaningful if this piece encourages some active research and discussion on this topic. When the Provisioned IOPS storage t...
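
As a concrete example of acting on the Provisioned IOPS consideration the excerpt opens with, here is a minimal boto3 sketch that moves an instance to the io1 storage type with a chosen IOPS value. The identifier and numbers are placeholders, not recommendations.

# Minimal sketch: switch an RDS instance to Provisioned IOPS (io1) storage.
import boto3

rds = boto3.client("rds")

response = rds.modify_db_instance(
    DBInstanceIdentifier="example-postgres-prod",  # placeholder instance
    StorageType="io1",        # Provisioned IOPS SSD
    AllocatedStorage=400,     # GiB; must satisfy the engine's IOPS-to-storage ratio
    Iops=12000,               # provisioned IOPS value being evaluated
    ApplyImmediately=False,   # apply during the next maintenance window
)
print(response["DBInstance"]["PendingModifiedValues"])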

Machine Learning: Visualisation and Analysis with Amazon SageMaker Data Wrangler

In a Machine Learning project, the most time-consuming part is data preparation. In practice, data scientists and data engineers have to spend most of their time preparing the data so that it becomes suitable for the machine learning algorithms. Let's say there is a clinic that specialises in hypertension (high blood pressure) conditions. Over a number of years the clinic has treated more than a hundred thousand hypertension patients. The clinic keeps records of the patients with information like age, gender, ethnic group, job sector, weekly exercise habits, other chronic conditions, long-term medications, address (which may potentially indicate the patient's socioeconomic position, though this may not be reliable) and others. The treatment plan varies from patient to patient and is also part of the patient records (the patient dataset). There are more than 70 features in the patient dataset. Through treatments, many patients can have their hypertension condition managed ...
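
The post is about doing this preparation visually in Amazon SageMaker Data Wrangler; purely to illustrate the kind of steps involved, here is a small pandas sketch with hypothetical column names from the hypertension example.

# Hedged sketch of typical preparation steps on the (hypothetical) patient dataset.
import pandas as pd

df = pd.read_csv("hypertension_patients.csv")  # assumed export of the patient dataset

# Drop a feature flagged as an unreliable socioeconomic proxy.
df = df.drop(columns=["address"])

# Fill gaps in numeric habit data and encode categorical features.
df["weekly_exercise_hours"] = df["weekly_exercise_hours"].fillna(0)
df = pd.get_dummies(df, columns=["gender", "ethnic_group", "job_sector"])

# A simple target: whether the condition ended up managed under the treatment plan.
X = df.drop(columns=["condition_managed"])
y = df["condition_managed"]
print(X.shape, y.value_counts(normalize=True))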

Fairness Evaluation and Model Explainability In AI

Artificial Intelligence (AI) relies on machine learning and its modelling to provide outcomes. Imagine a credit card company receives hundreds of thousands of applications each year and would like to use AI to do the first round of application filtering. If the AI is not developed properly, its decisions will be skewed by unfairness – such as unfairly rejecting far more applications from a certain age group, a certain gender, or a certain employment history pattern. Please note unfairness is the key issue here – in other words, AI and machine learning inference that does not reflect the due course of actual situations. This is typically called bias in AI and machine learning. Biases can come from various stages of a machine learning life cycle:
• Biases may exist in the pre-training data, e.g. the dataset to be used in machine learning training has biases in it
• Biases may be introduced by the machine learning exercise
Below is a machine learning lifec...
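
As a simple illustration of checking pre-training data for bias, here is a small sketch computing the gap in positive-label rates between two groups in the credit card example. The column names, sample values and the idea of using this particular gap as the check are hypothetical, not taken from the post.

# Hedged sketch: difference in approval rates between groups (demographic parity gap).
import pandas as pd

applications = pd.DataFrame({
    "age_group": ["under_30", "under_30", "30_plus", "30_plus", "30_plus", "under_30"],
    "approved":  [0,          1,          1,         1,         0,         0],
})

def positive_rate_difference(df: pd.DataFrame, group_col: str, label_col: str) -> float:
    """Gap between the highest and lowest positive-label rate across groups."""
    rates = df.groupby(group_col)[label_col].mean()
    return float(rates.max() - rates.min())

gap = positive_rate_difference(applications, "age_group", "approved")
print(f"approval-rate gap between age groups: {gap:.2f}")  # large gaps flag potential bias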