Posts

Several Operational Considerations for Amazon RDS

Amazon Relational Database Service (RDS) is a very widely used cloud-based database service. It is easy to set up and operate, with many operational tasks automated, and can scale with demand. RDS offers eight engine types: six standard RDS engines (RDS for PostgreSQL, RDS for MySQL, RDS for MariaDB, RDS for SQL Server, RDS for Oracle, and RDS for Db2) and two Amazon Aurora engine types (Aurora PostgreSQL-Compatible Edition and Aurora MySQL-Compatible Edition). A number of operational considerations can help organisations make better use of RDS in enterprise settings. This blog piece discusses several of them based on real-world experience. It is not intended to be an exhaustive list – each organisation is expected to research and apply the considerations that are most relevant to its specific RDS use case. It would already be meaningful if this piece encourages some active research and discussion on this topic.  When the Provisioned IOPS storage type

Machine Learning: Visualisation and Analysis with Amazon SageMaker Data Wrangler

In a machine learning project, the most time-consuming part is data preparation: data scientists and data engineers spend the largest share of their time preparing the data so that it becomes suitable for the machine learning algorithms.  Let’s say there is a clinic that specialises in hypertension (high blood pressure). Over a number of years the clinic has treated more than a hundred thousand hypertension patients. The clinic keeps patient records covering information such as age, gender, ethnic group, job sector, weekly exercise habits, other chronic conditions, long-term medications, address (which may hint at the patient’s socioeconomic position, although this may not be reliable) and others. The treatment plan, which varies from patient to patient, is also part of the patient records (the patient dataset). There are more than 70 features in the patient dataset.  Through treatments, many patients can have their Hypertension condition managed reasonabl

Fairness Evaluation and Model Explainability In AI

Artificial Intelligence (AI) relies on machine learning and its modelling to provide outcomes. Imagine a credit card company that receives hundreds of thousands of applications each year and would like to use AI for the first round of application filtering. If the AI is not developed properly, its decisions will be skewed and unfair, for example by rejecting far more applications from a certain age group, a certain gender, or a certain employment history pattern than is justified. Unfairness is the key issue here: in other words, AI and machine learning inference that does not reflect the actual situation. This is typically called bias in AI and machine learning. Biases can come from various stages of a machine learning life cycle:
• Biases may exist in the pre-training data, e.g., the dataset to be used in machine learning training has biases in it
• Biases may be introduced by the machine learning exercise
Below is a machine learning lifecycle char

Amazon CloudFront and Its Primary and Secondary Origins

Nowadays most online content is delivered through web services. A user accesses a specific piece of content, for example a web page, by connecting a client-side application (such as a web browser) running on their desktop or mobile device to the server that hosts the web page. There are billions of websites and web pages. DNS services, online search engines and links on well-known ‘portal’ websites help people find and locate the content they would like to visit.  A user’s web client device and the server it visits can be worlds apart: for example, think of a user in Tasmania, Australia who uses their smartphone to visit a website hosted on a server located in a data centre in Montreal, Canada, which is more than sixteen thousand kilometres away. They are connected by the Internet. The Internet comprises millions of interconnected networks. Traffic on the Internet is routed between users and servers through the interconnected networks in between. Using the ab

AWS and Generative AI

Generative Artificial Intelligence, also called Generative AI or GenAI, has been one of the hottest topics in recent months. You would ‘have to be living under a rock to not notice this massive topic’.  ‘Generative AI is a branch of AI that focuses on creating new data. It is a subset of machine learning. The goal of generative AI is to create new data that is similar to the data that was used to train the model.’ – text generated by an AI tool.  Recently I experimented with a couple of random text-to-image GenAI tools to observe how generative AI creates content and how the outputs of different GenAI tools, built on different models, differ. My experiment was straightforward: text-to-image GenAI tools ask you to provide some text as a prompt, and then use this input to generate an image. So that was what I did. The prompt I supplied was: ‘female student graduation photo in front of a white building with flowers in hand in Paris’. Both GenAI tools I tested could output an image withi