Improve Output Quality When Streaming Log Data to Amazon QuickSight for Visualisation

Amazon QuickSight is a commonly used business intelligence toolset that helps a wide range of users comprehend data through visualisation.

Log data from many sources, both AWS services and non-AWS services, can be streamed to Amazon QuickSight for visualisation, analytics, and for leveraging generative AI. This is a fairly typical use case of QuickSight.

But the quality of the Amazon QuickSight outputs for log data is influenced, or even decided, before QuickSight ever receives the data. This is because log data often has many dimensions, or simply a large number of fields, and not all of them are equally significant. Insignificant fields will noticeably reduce the quality and readability of the QuickSight visualisation. Sometimes you also want to strip sensitive data before sending the data set for visualisation.

There are creative ways to improve log data quality when streaming it to Amazon QuickSight, using filtering and/or redaction.

Below, let's discuss a case of improving the output quality when streaming AWS WAF (Web Application Firewall) logs, which are a suitable example due to their wide application and the comprehensiveness of their log data, to Amazon QuickSight.


Built-in Log Filtering

Log filtering can be done by specifying filter conditions and filtering out log data based on rule actions, or on labels generated by rules during evaluation. It can be used, for example, to filter out logs generated by green-light transactions so that more concerning logs can be visualised properly and not get drowned out by them.

Perform rule action filtering by specifying, in a filter condition, the action that should be logged. The log filters inspect the rule action sections of each log entry, and when the action matches the defined one, the filter adds the entry to the log.

In a rule group scenario, when the action for a rule in a rule group is set to Count, the logs for requests matching that rule don't contain that action. Instead, the logs show the rule under the excludedRules field. On the other hand, when a rule with the non-terminating Count action is matched alongside a terminating rule action such as Allow or Block, those requests are included in logs filtered by the terminating action. When the action for a rule in a rule group is set to Override to Count, the log contains a Count action in the nonTerminatingMatchingRules field; the log filters check this field, so the rule can be filtered by the Count action.
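For illustration, here is an abridged sketch of where these fields sit in a log entry, shown as a Python dict; the rule and rule group names are hypothetical examples, not taken from any particular deployment:


    # Abridged, hypothetical WAF log entry: the request was ultimately allowed,
    # while a rule overridden to Count is recorded under nonTerminatingMatchingRules.
    sample_log_entry = {
        "action": "ALLOW",  # the terminating action for the request
        "terminatingRuleId": "Default_Action",
        "ruleGroupList": [
            {
                "ruleGroupId": "AWS#AWSManagedRulesCommonRuleSet",
                # a rule overridden to Count shows up here with a COUNT action
                "nonTerminatingMatchingRules": [
                    {"ruleId": "SizeRestrictions_BODY", "action": "COUNT"}
                ],
            }
        ],
    }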

A sample AWS Command Line Interface input for enabling filtering looks like this:


"LoggingFilter": { 

    "DefaultBehavior": "string",

    "Filters": [ 

    { 

        "Behavior": "string",

        "Conditions": [ 

            { 

                "ActionCondition": { 

                "Action": "string"

                },

                "LabelNameCondition": { 

                "LabelName": "string"

                }

            }

        ],

        "Requirement": "string"

    }

]

}
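As a concrete illustration, here is a minimal boto3 sketch (the web ACL and Firehose ARNs are placeholders) that drops everything except entries whose action is BLOCK or COUNT, so that green-light traffic never reaches the visualisation:


    import boto3

    wafv2 = boto3.client("wafv2")

    # Placeholders: substitute your own web ACL and log destination ARNs.
    wafv2.put_logging_configuration(
        LoggingConfiguration={
            "ResourceArn": "<YOUR_WEB_ACL_ARN>",
            "LogDestinationConfigs": ["<YOUR_FIREHOSE_STREAM_ARN>"],
            "LoggingFilter": {
                "DefaultBehavior": "DROP",  # discard anything no filter keeps
                "Filters": [
                    {
                        "Behavior": "KEEP",
                        "Requirement": "MEETS_ANY",  # any matching condition suffices
                        "Conditions": [
                            {"ActionCondition": {"Action": "BLOCK"}},
                            {"ActionCondition": {"Action": "COUNT"}},
                        ],
                    }
                ],
            },
        }
    )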



Log Redaction


Log redaction masks parts of the request that should not appear in the logs.

These fields can be redacted from the log records: URI path, query string, a single header, and HTTP method. The redacted content appears as REDACTED in the logs.


The command for performing redaction is like below. The wafv2 CLI takes the whole logging configuration, including the fields to redact, as a single --logging-configuration argument, typically supplied as a JSON file:


    aws wafv2 put-logging-configuration --logging-configuration file://logging-config.json


The JSON file contains the following keys:

ResourceArn: The ARN of your web ACL.

LogDestinationConfigs: The ARN of the logging destination, for example the Amazon Data Firehose stream that receives the logs.

RedactedFields: A list of the fields to redact, each specified as UriPath, QueryString, Method, or SingleHeader (with the name of the header to mask).
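Equivalently, here is a minimal boto3 sketch (the ARNs are placeholders) that masks the cookie header and the query string in the logs:


    import boto3

    wafv2 = boto3.client("wafv2")

    # Placeholders: substitute your own web ACL and log destination ARNs.
    wafv2.put_logging_configuration(
        LoggingConfiguration={
            "ResourceArn": "<YOUR_WEB_ACL_ARN>",
            "LogDestinationConfigs": ["<YOUR_FIREHOSE_STREAM_ARN>"],
            "RedactedFields": [
                {"SingleHeader": {"Name": "cookie"}},  # shown as REDACTED in logs
                {"QueryString": {}},                   # shown as REDACTED in logs
            ],
        }
    )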


Sending logs to an Amazon (Kinesis) Data Firehose delivery stream and performing Lambda-based redaction and filtering


Using Amazon Data Firehose to stream data to QuickSight is a standard way to supply data for visualisation, and does not itself need to be a topic of this blog. What can be discussed here is using AWS Lambda functions to redact and filter the data. Compared to the built-in means mentioned above, AWS Lambda functions provide further control and flexibility over which data is filtered and redacted and what data is included in the final log output.

The first step is to set up an Amazon Data Firehose delivery stream.

Then configure a data transformation Lambda function. This Lambda function is invoked by Firehose with batches of the records entering the stream. The function analyses each log entry and removes or redacts the information that needs to be excluded.
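As a sketch of the wiring, the transformation Lambda is attached to the stream through its processing configuration. The role, bucket, and function ARNs below are hypothetical placeholders; note that a Firehose stream receiving AWS WAF logs must have a name beginning with aws-waf-logs-:


    import boto3

    firehose = boto3.client("firehose")

    # Placeholders throughout; WAF requires the "aws-waf-logs-" name prefix.
    firehose.create_delivery_stream(
        DeliveryStreamName="aws-waf-logs-quicksight",
        DeliveryStreamType="DirectPut",
        ExtendedS3DestinationConfiguration={
            "RoleARN": "<FIREHOSE_DELIVERY_ROLE_ARN>",
            "BucketARN": "arn:aws:s3:::<YOUR_LOG_BUCKET>",
            "ProcessingConfiguration": {
                "Enabled": True,
                "Processors": [
                    {
                        "Type": "Lambda",
                        "Parameters": [
                            {
                                "ParameterName": "LambdaArn",
                                "ParameterValue": "<TRANSFORM_LAMBDA_ARN>",
                            }
                        ],
                    }
                ],
            },
        },
    )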

Redaction involves replacing the data with a set text, such as "REDACTED".

Filtering involves deleting the field containing the data. Such data might include cookies, personal information, API keys, or other data that should not be included in logs for reporting and visualisation.

The AWS Lambda function first converts each log entry into a usable data structure. It then locates the fields containing data to be redacted or filtered, performs the redaction and/or filtering, and finally returns the modified log entry back to the Firehose stream.

An example Lambda function looks like this:


import base64
import json


def lambda_handler(event, context):
    output = []

    for record in event['records']:
        # Firehose delivers each record's data base64-encoded
        payload = json.loads(base64.b64decode(record['data']))

        # Redact the 'cookies' field if it exists
        if 'httpRequest' in payload and 'cookies' in payload['httpRequest']:
            payload['httpRequest']['cookies'] = "REDACTED"

        # Re-encode the modified payload and mark the record as processed
        output.append({
            'recordId': record['recordId'],
            'result': 'Ok',
            'data': base64.b64encode(json.dumps(payload).encode('utf-8')).decode('utf-8')
        })

    return {'records': output}
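The sketch above only redacts. To filter an entry out entirely, for example one generated by an allowed request, the record can be marked as Dropped instead of Ok, and Firehose then discards it from the delivery. A minimal variation inside the same loop:


    # Drop records for allowed requests so they never reach QuickSight
    if payload.get('action') == 'ALLOW':
        output.append({
            'recordId': record['recordId'],
            'result': 'Dropped',
            'data': record['data']
        })
        continue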


Once the Lambda function has returned the redacted and filtered records to the Firehose stream, they are delivered onwards to Amazon QuickSight.


Amazon Data Firehose to Amazon QuickSight


The Amazon Data Firehose stream uses Amazon S3 as its destination. A general purpose bucket is suitable for this purpose, and the bucket policy should allow QuickSight access to the data in the bucket.
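Below is a minimal sketch of such a bucket policy, assuming QuickSight uses its default service role (aws-quicksight-service-role-v0); the account ID and bucket name are placeholders:


    import json

    import boto3

    s3 = boto3.client("s3")

    # Placeholders: substitute your account ID and log bucket name.
    bucket_policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "AWS": "arn:aws:iam::<ACCOUNT_ID>:role/service-role/aws-quicksight-service-role-v0"
                },
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [
                    "arn:aws:s3:::<YOUR_LOG_BUCKET>",
                    "arn:aws:s3:::<YOUR_LOG_BUCKET>/*"
                ],
            }
        ],
    }

    s3.put_bucket_policy(Bucket="<YOUR_LOG_BUCKET>", Policy=json.dumps(bucket_policy))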

The QuickSight account should be granted access, through IAM, to the S3 bucket just mentioned above.

Then, within the QuickSight console, a dataset is configured with the S3 bucket as the data source. Update the URIPrefixes in the manifest to match the files in the S3 bucket.

A manifest file is needed. Below is a sample:


{
    "fileLocations": [
        {
            "URIPrefixes": [
                "s3://akenza-quicksight-testing/2025/"
            ]
        }
    ],
    "globalUploadSettings": {
        "format": "JSON"
    }
}


Then, on the QuickSight Analyses page, create a new analysis using the dataset specified above, and continue on to create a new sheet. The analysis is then created and can be published as a dashboard.

Finally, here is a sample visualisation screenshot:




By using some of the means discussed above, the quality and efficiency of the Amazon QuickSight visualisation outputs will improve, as the data streamed in becomes more meaningful and more digestible. Sensitive data is also stripped out, so there are data security enhancements as well.


                            Simon Wang

 
