eidorian

Integrate Dynamic Data in GenAI with Amazon Bedrock Agent and Lambda to Access APIs

2024-10-18T00:00:00+08:00

Background

A common use case we see in Generative AI applications is a chatbot integrated with a knowledge base, such as enterprise data, that feeds the AI additional information when responding to the chat. These knowledge bases hold static or unstructured data, like documents stored in S3, which is indexed regularly so that updates can be searched by the foundation model. Patterns such as Retrieval-Augmented Generation (RAG), document chunking, and vector stores (Amazon OpenSearch, Aurora PostgreSQL, etc.) are used. However, if we need to incorporate dynamic structured data, like data stored in relational databases or data that is frequently accessed and updated, this indexing pattern will not work because the updates can happen in real time. This is where agents come in: they help the foundation model perform tasks other than just language generation. Amazon Bedrock has a Bedrock Agents feature that serves this purpose. Your Bedrock foundation model can be integrated with your other workloads, such as an API behind a Lambda function, to access your dynamic data. We will explore this pattern with a simple example of a Bedrock Agent accessing a Payment API through a Lambda function.

Architecture Diagram of Bedrock Agent Demo Integrating with Lambda

The demo we will build is a simple flow using the Bedrock console to test. We will just use two services - Bedrock and Lambda. In Bedrock we will create the Agent to orchestrate the flow and set the foundation model to be used. In Lambda, we will create a function that serves as the Payment API to represent the dynamic data we want to integrate.

Create a Bedrock Agent

In the Amazon Bedrock console, navigate to Builder tools and then Agents in the left menu, then click the Create Agent button.

Bedrock Create Agent console

It will prompt you to key in the Agent name and description. The Agent we will create will access the Payment API so we will name it as agent-payment-api. You can also accept the generated default name for quick prototyping if you just want to test it out.

Enter the Agent name and description

Next, you will be in the Agent builder page where you can add more details to the Agent like the API schema, permissions and prompts. This will be our main page for editing, saving the configuration and testing the Agent. You can always return to this page if you get lost in the console, which happened to me at first while figuring out how to navigate the UI.

Just below the Agent name is the setting for the service role. Choose the default setting, which is to create a new service role for this Agent. A service role defines what this Agent is permitted to do against your other AWS resources.

Create a new service role for the Agent

Select the Foundation Model and Prompt

Then select the model. The model you choose will be the model used by the Agent when handling tasks. In our case, we will use the Anthropic Claude 3 Haiku model.

Make sure that your AWS account has access to at least one foundation model. You can check this in the left navigation under Bedrock Configurations - Model access. Request access if you don’t have access to any available model.

Next is probably the most important setting for the Agent, which is providing its instructions. This is essentially prompting the model on what to do to perform its task well. The more specific and clear the instructions we give, the better the results will be. You can experiment with this to improve the outcome. Also check the documentation of the foundation model you are using for guidance on optimizing the prompt.

Select the foundation model and provide a detailed instruction

Let’s try an instruction that tells the Agent its role and its objective, and guides it on what to do when given a few parameters so it knows how to process them, which API to invoke, and what to do with the response payload. Since our backend API is a Payment API, we will tell the Agent to act like a financial manager that manages the payment transactions of customers. We will give it an objective so it knows what its output should be and what actions are available to it. Then, we list the different actions. For simplicity, we only have the retrieval part of the API, so we define it by describing what the Agent needs to do when asked to retrieve a transaction given a transaction ID. Yes, it is quite specific, so that we get a more predictable result. Simply put, we are telling the Agent that if someone asks it to get the details of a payment transaction given its ID, it needs to look for a retrieve or get payment API, invoke it, and summarize the results based on the data it gets from the response.

The following is the full text of the instructions. Again, you can experiment and tweak this prompt to suit your APIs. It all depends on your objectives for creating the Agent in the first place.

Role: You are a financial manager responsible for managing the payment transactions of your customers.

Objective: Assist in payment transaction analysis by creating, updating, retrieving and deleting their payment transactions.

Payment Transaction Creation:

Payment Transaction Update:

Payment Transaction Retrieval:

Retrieve Payment Transaction: When a payment transaction id is provided, retrieve the payment transaction and provide a summary.

Payment Transaction Deletion:

Define the API actions

Our next step is to define the actions the Agent can perform. In the Action groups section click Add to create a new action group. Let’s call it action-group-payment-transactions.

Create Action Group

In the Action group type choose Define with API Schemas. With this option, we will specify a Lambda function that hosts our Payment API.

Scroll down to the Action group schema section and select Define via in-line schema editor. In the text box that follows, paste the OpenAPI schema of our Payment API. The full JSON format can be found in the GitHub source here.

Define the API schema using OpenAPI format

The point of providing the schema is to give the Agent as much information as possible about the APIs: what paths are available, what parameters are required, their data types, the response parameters, etc. During its orchestration, the foundation model analyzes the available actions and their corresponding invocations based on the configuration we set here.

If you look at the schema, it describes the available paths. To get a payment transaction by ID, the Agent must construct an API call using the path /getTransaction/, the method post, and a query parameter transactionId of type integer. Internally, that’s what the Agent figures out with the help of the foundation model. It acts like a client invoking your Payment API.

    "paths": {
      "/getTransaction/": {
        "post": {
          "description": "Get payment transaction by id",
          "parameters": [
            {
              "name": "transactionId",
              "in": "query",
              "description": "Payment transaction identifier",
              "required": true,
              "schema": {
                "type": "integer",
                "format": "int32"
              }
            }
          ],

OpenAPI schema snippet of the get transaction request

    "components": {
      "schemas": {
        "PaymentTransactionData": {
          "type": "object",
          "description": "Single payment transaction data",
          "properties": {
            "transactionId": {
              "type": "integer",
              "description": "Payment transaction identifier"
            },
            "amount": {
              "type": "number",
              "description": "Price of the payment transaction"
            },
            "product": {
              "type": "string",
              "description": "Description of the product purchased for this payment transaction"
            },
            "quantity": {
              "type": "number",
              "description": "Number of items purchased in this payment transaction"
            },
            "date": {
              "type": "string",
              "description": "Date of this payment transaction"
            }
          }
        }
      }
    }
  }

OpenAPI schema snippet of the Payment Transaction Data

In terms of the response, the Agent receives a PaymentTransactionData payload with the fields transactionId, amount, product, quantity and date and their respective descriptions. Making the schema very descriptive helps the Agent understand your data and create a meaningful response.
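To make the schema discussion more concrete, here is a small Python sketch that walks an OpenAPI-style dictionary and lists each operation with its required parameters, which is roughly the information the Agent has to work with. The SCHEMA dictionary is a trimmed copy of the snippet above and the list_operations helper is hypothetical; this is not how Bedrock parses the schema internally.

```python
# Trimmed copy of the OpenAPI snippet shown above (illustration only).
SCHEMA = {
    "paths": {
        "/getTransaction/": {
            "post": {
                "description": "Get payment transaction by id",
                "parameters": [
                    {
                        "name": "transactionId",
                        "in": "query",
                        "description": "Payment transaction identifier",
                        "required": True,
                        "schema": {"type": "integer", "format": "int32"},
                    }
                ],
            }
        }
    }
}


def list_operations(schema):
    """Return (METHOD, path, [required parameter names]) for each operation."""
    operations = []
    for path, methods in schema.get("paths", {}).items():
        for method, operation in methods.items():
            required = [
                param["name"]
                for param in operation.get("parameters", [])
                if param.get("required")
            ]
            operations.append((method.upper(), path, required))
    return operations


for method, path, required in list_operations(SCHEMA):
    print(method, path, "requires", required)
```

A helper like this can also serve as a quick sanity check that every operation in your schema declares the parameters you expect before you paste it into the console.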

Build the Lambda function API

The next step is quite straightforward: create the API that accesses the dynamic data. In the Action group invocation section, choose Select an existing Lambda function.

Select the Lambda function to invoke for this action group

We don’t have the Lambda function yet so open the Lambda console in a new tab. In the Lambda console, click Create function.

Create the Lambda function hosting the Payment API

We will create the function from scratch. Use the name payment-transaction-api. Select Python 3.11 as the runtime and arm64 as the architecture. Click Create function.

In the code section, paste the full source code of the Lambda handler in the lambda_function.py file. Then click Deploy.

Paste the code of the Payment API

We will not be using a database source to retrieve the dynamic data. The data we will use for testing is hard coded in the Lambda function itself.

payment_transactions = [
    {"transactionId": 1, "amount": 2.00, "product": "coffee", "quantity": 1, "date": "10-03-2024"},
    {"transactionId": 2, "amount": 1.50, "product": "tea", "quantity": 3, "date": "11-03-2024"},
    {"transactionId": 3, "amount": 3.00, "product": "biscuits", "quantity": 1, "date": "11-03-2024"},
    {"transactionId": 4, "amount": 6.00, "product": "chips", "quantity": 2, "date": "03-04-2024"},
    {"transactionId": 5, "amount": 15.00, "product": "cake", "quantity": 1, "date": "12-04-2024"},
    {"transactionId": 6, "amount": 6.00, "product": "cookies", "quantity": 3, "date": "19-04-2024"},
    {"transactionId": 7, "amount": 17.00, "product": "pizza", "quantity": 1, "date": "30-04-2024"},
    {"transactionId": 8, "amount": 12.00, "product": "sandwich", "quantity": 1, "date": "01-05-2024"},
    {"transactionId": 9, "amount": 22.00, "product": "burger", "quantity": 1, "date": "03-05-2024"},
    {"transactionId": 10, "amount": 10.00, "product": "fries", "quantity": 2, "date": "04-05-2024"},
    {"transactionId": 11, "amount": 9.50, "product": "noodles", "quantity": 1, "date": "10-05-2024"},
    {"transactionId": 12, "amount": 16.80, "product": "pasta", "quantity": 4, "date": "14-05-2024"}
]

Test data of payment transactions

Fun fact: I also used GenAI to generate this dummy data with the help of Amazon CodeWhisperer enabled in my IDE.

Let’s walk through the Python code. It’s pretty much a standard Python Lambda function with lambda_handler as the entry point. However, your handler must follow the request and response payload formats that the Bedrock Agent sends and expects, respectively. The Lambda input is an input event from Amazon Bedrock. You can find more details here. In our Payment API, the key parameters we need are the apiPath, to determine which operation to process, and the transactionId in the parameters array. When constructing the response, include the body in the responseBody field. Our example is simple so we only need these, but you can also explore the other parameters, such as sessionAttributes and promptSessionAttributes, which carry contextual attributes across sessions and prompts.
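The following is a minimal sketch of a handler that follows this request and response shape. It is not the full source from the GitHub repository. The two-item data list, the find_transaction helper and the error messages are assumptions made for illustration, while the event and response field names (apiPath, parameters, actionGroup, httpMethod, messageVersion, responseBody) follow the Bedrock Agent Lambda payload format.

```python
import json

# Shortened, hypothetical copy of the test data used in this post.
PAYMENT_TRANSACTIONS = [
    {"transactionId": 1, "amount": 2.00, "product": "coffee", "quantity": 1, "date": "10-03-2024"},
    {"transactionId": 3, "amount": 3.00, "product": "biscuits", "quantity": 1, "date": "11-03-2024"},
]


def find_transaction(transaction_id):
    """Return the matching transaction, or None if the id is unknown."""
    for transaction in PAYMENT_TRANSACTIONS:
        if transaction["transactionId"] == transaction_id:
            return transaction
    return None


def lambda_handler(event, context):
    # Bedrock passes parameters as a list of {"name", "type", "value"} objects.
    params = {p["name"]: p["value"] for p in event.get("parameters", [])}
    api_path = event.get("apiPath", "")

    status, body = 404, {"error": f"Unsupported apiPath: {api_path}"}
    if api_path.rstrip("/") == "/getTransaction":
        try:
            transaction = find_transaction(int(params["transactionId"]))
        except (KeyError, TypeError, ValueError):
            status, body = 400, {"error": "transactionId must be an integer"}
        else:
            if transaction is None:
                status, body = 404, {"error": "Transaction not found"}
            else:
                status, body = 200, transaction

    # Echo the routing fields back and wrap the JSON body in responseBody.
    return {
        "messageVersion": "1.0",
        "response": {
            "actionGroup": event.get("actionGroup"),
            "apiPath": api_path,
            "httpMethod": event.get("httpMethod"),
            "httpStatusCode": status,
            "responseBody": {"application/json": {"body": json.dumps(body)}},
        },
    }
```

You can exercise it locally with a hand-written event, for example {"actionGroup": "action-group-payment-transactions", "apiPath": "/getTransaction/", "httpMethod": "POST", "parameters": [{"name": "transactionId", "type": "integer", "value": "3"}]}, before deploying it to Lambda.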

Before you proceed back to the Bedrock console, make sure you have deployed the Lambda function. Click Deploy in the code tab.

Remember the full source code is available in GitHub here.

Now that we have created and deployed the Lambda function, go back to the Bedrock console. In the Action group invocation section select the Lambda function payment-transaction-api. If you can’t find it, click the refresh icon to refresh the list.

Select the payment-transaction-api function

Finally, at the bottom of the Action group details page, click Save and exit. You will return to the Agent builder page, where you click Save as well. A prompt will tell you to prepare the Agent so that its details are up to date.

Prompt to prepare the Agent to keep it up to date before testing

On the right there’s a Test agent pane. Click the Prepare button there to update the Agent.

Prepare the Agent in the Test console

Test the Agent

In the Test console, try asking the agent with a prompt like “Give me a summary of the payment details in transaction id 3.”.

Test Agent with permission error

You will see an error that says Access denied when invoking the Lambda function…. Right, we didn’t give permission for the Agent to invoke our Lambda function.

Setup the Agent permissions to invoke Lambda

One of the usual errors you will face when integrating different services in AWS is access permissions. Here we are integrating our Lambda function with the Bedrock Agent. One way to do this is to update the Lambda function’s Resource Policy to allow the Agent access to it. You can also refer to the documentation here.

Go back to the Lambda console and open the payment-transaction-api function. Go to the Configuration tab and click Permissions in the left menu.

Lambda Permissions under the Configuration tab

Find the Resource-based policy statements section and click Add permissions. Add a Resource Based permission in Lambda

Add a new policy in the Lambda function to allow the service bedrock.amazonaws.com as the Principal and the specific Bedrock Agent as the Source Arn. You need to grab the ARN of the Agent we just created. Go back to the Bedrock Console, under Agents open the agent-payment-api and in the Agent Overview section you will see the Agent ARN. Its format is something like arn:aws:bedrock:[region]:[accountId]:agent/[agent-id]. Then in the Action field choose lambda:InvokeFunction. Click Save.
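For reference, the resource-based policy statement that results from these console steps should look roughly like the dictionary built below. This is a sketch under assumptions: the build_invoke_statement helper, the statement ID, the region, the account ID and the agent ID are all placeholders for illustration, so substitute your own ARNs.

```python
import json


def build_invoke_statement(function_arn, agent_arn, sid="AllowBedrockAgentInvoke"):
    """Build a policy statement letting one Bedrock Agent invoke one function."""
    return {
        "Sid": sid,  # hypothetical statement ID
        "Effect": "Allow",
        "Principal": {"Service": "bedrock.amazonaws.com"},
        "Action": "lambda:InvokeFunction",
        "Resource": function_arn,
        # Restrict the grant to the specific Agent rather than all of Bedrock.
        "Condition": {"ArnLike": {"AWS:SourceArn": agent_arn}},
    }


# Placeholder ARNs: replace the region, account ID and agent ID with yours.
statement = build_invoke_statement(
    function_arn="arn:aws:lambda:us-east-1:123456789012:function:payment-transaction-api",
    agent_arn="arn:aws:bedrock:us-east-1:123456789012:agent/ABCDEFGHIJ",
)
print(json.dumps(statement, indent=2))
```

Scoping the statement with the Source ARN condition means other agents in the account cannot invoke the function through the same grant.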

Edit the Lambda policy to allow the Bedrock Agent

Test the Agent again with the right permissions

Let’s test the Agent again in the Bedrock console. This time it has the proper permissions to access the Lambda function.

Let’s try asking in the prompt.

Give me a summary of the payment details in transaction id 3.

Test Prompt 1 - Payment details of Transaction Id 3

Refer back to our test data to verify. Looks like it was able to retrieve one biscuit at $3 which is the right information for the transaction ID 3.

This time let’s try asking just for the product purchased.

What product was purchased in transaction id 10?

Test Prompt 2 - What product was purchased in Transaction Id 10

Fries is correct.

How about asking for multiple transactions?

List the products purchased for transaction ids 1, 2 and 3?

Test Prompt 3 - List products purchased for transaction IDs 1, 2, and 3

Great! It was able to invoke the API multiple times.

You may have noticed the Show trace option in the console. Click it to see the Agent’s orchestration flow: what payload it uses to invoke the Lambda function, how many times it invokes it, etc. It is also useful when troubleshooting.

Show trace to see the steps of the Agent orchestration

Pricing

In terms of pricing, the Bedrock Agent itself does not incur additional cost. You are only charged for the models used, which in our case is Claude 3 Haiku.

When using Amazon Bedrock Agents and Amazon Bedrock Knowledge Bases, you are only charged for the models and the vector databases you use with these capabilities.

Refer also to the Bedrock Pricing and the Lambda Pricing.

Clean up

Make sure to clean up the resources to avoid further costs. Delete the Lambda function payment-transaction-api. Then delete the Agent agent-payment-api.

Next steps

In this demo we only tried one operation which is retrieval of dynamic data. You can try to expand this example and add the other API operations such as create, delete and update of the payment transactions. Make sure to define them well in the schema including the required parameters so the Agent will know which operation to choose during its orchestration of prompts from the user. You can also try being creative in the prompts and see how the Agent handles the API invocation.

Summary

In using Bedrock Agents, we can enrich our GenAI applications with dynamic content in real-time that can only be accessed programmatically through APIs. Agents can be easily created in the console by providing clear and specific instructions and a defined API schema to point it to the right Lambda functions. Also make sure that the right permissions are created to make the integration successful. With the Lambda function interaction with Bedrock through the Bedrock Agent, our AI assistant can have a wide range of possibilities. Imagine executing other external APIs, integrating with other systems, and letting the Agent do multiple stages of actions. Your foundation model can now interact with your APIs and dynamic data through these agents and make your GenAI applications do tasks for you.

Build an EKS Private Cluster in Isolated Subnets with CDK

2024-09-13T00:00:00+08:00

Challenges

A common requirement of customers building applications in the cloud, especially those in highly regulated industries like banking or healthcare, is to deploy in an isolated environment with no internet access. This adds extra protection to the workload and prevents its data from leaking out. In AWS, this is done by deploying in isolated subnets, in a VPC with no Internet Gateway attached and no proxies that connect to the internet. If your workload needs access to AWS services, you will then need to add the respective VPC Endpoints or AWS PrivateLink.

Adding AWS PrivateLink is fine if you know which service endpoints you need. But if an AWS service, for example Amazon EKS, depends on other services that you are not aware of, access to those services will be blocked until you add their VPC endpoints, causing the cluster creation to fail.

Another obstacle is that when you use an IaC like the AWS Cloud Development Kit or CDK to build your infrastructure, its Constructs sometimes abstract the underlying implementations and you are not aware of what other AWS services they use. In this post, I will list down the services that need VPC endpoints when creating an EKS private cluster in isolated subnets using AWS CDK.

Architecture Overview

Architecture diagram of a private Kubernetes cluster in EKS on isolated subnets with the required VPC endpoints

EKS Private Cluster Creation

An EKS cluster is a Kubernetes cluster managed by AWS. When you use CDK to create the cluster, you can use constructs such as Cluster, which is part of the package software.amazon.awscdk.services.eks in the Amazon EKS Construct Library in Java. Other languages are also supported; refer to the CDK documentation. To build the cluster, you call the class method Cluster.Builder.create() followed by a series of configuration methods. Here’s an example.

private void createEksCluster(Role clusterAdmin) {
    this.cluster =
        Cluster.Builder.create(this, "eks")
            .vpc(vpc)
            .version(KubernetesVersion.V1_28)
            .vpcSubnets(
                List.of(SubnetSelection.builder().subnetType(SubnetType.PRIVATE_ISOLATED).build()))
            .endpointAccess(EndpointAccess.PRIVATE)
            .clusterName("eks-private")
            .kubectlLayer(new KubectlLayer(this, "kubectl-layer"))
            .defaultCapacity(0)
            .mastersRole(clusterAdmin)
            .placeClusterHandlerInVpc(true)
            .clusterHandlerEnvironment(Map.of("AWS_STS_REGIONAL_ENDPOINTS", "regional"))
            .kubectlEnvironment(Map.of("AWS_STS_REGIONAL_ENDPOINTS", "regional"))
            .outputClusterName(true)
            .outputConfigCommand(true)
            .outputMastersRoleArn(true)
            .build();
}

The full code in Java CDK is available in GitHub aws-samples.

If you look closely at the configuration, we are placing the cluster in private isolated subnets and making the Kubernetes access endpoint private with the method calls .subnetType(SubnetType.PRIVATE_ISOLATED) and .endpointAccess(EndpointAccess.PRIVATE), respectively. This ensures that the Kubernetes cluster has no internet access.

VPC Endpoint Dependencies

Now, the CDK construct will call other AWS services and assumes they are accessible. But since we told CDK to create it in private isolated subnets, you need to ensure that the respective VPC endpoints are created to provide access to these other services.

When creating a VPC endpoint you specify the service name. An EKS cluster obviously requires access to the EKS service, which has the service name com.amazonaws.[region].eks, where region is the AWS Region where it is deployed, for example com.amazonaws.ap-southeast-1.eks. Amazon ECR is also needed, as that is where the container images are pulled from. The ECR endpoint service names are com.amazonaws.[region].ecr.api and com.amazonaws.[region].ecr.dkr. These services, including CDK, also use Amazon S3, so an endpoint to it must be created. For S3 it is com.amazonaws.[region].s3.

When the EKS cluster scales, it creates or terminates EC2 worker node instances. This means it needs access to the EC2 service, so we need to add com.amazonaws.[region].ec2. In our example, we also need EC2 to run the kubectl client. EKS also needs the AWS Security Token Service to manage the authentication of Kubernetes users, pods and services, so we also need com.amazonaws.[region].sts. For observability, the Amazon CloudWatch (https://docs.aws.amazon.com/cloudwatch/) service needs to be accessible too, through the com.amazonaws.[region].logs and com.amazonaws.[region].monitoring endpoints.

AWS recommends using AWS Systems Manager or SSM to manage the EC2 instances or EKS worker nodes. We need three endpoints to make SSM work. These are com.amazonaws.[region].ec2messages, com.amazonaws.[region].ssm and com.amazonaws.[region].ssmmessages. You can refer here for details on how these endpoints are used by SSM.

Lastly, CDK and its constructs use AWS Lambda cluster handler functions and AWS Step Functions to manage the creation and monitoring of the EKS cluster so VPC endpoints to these services are also required. For Lambda it is com.amazonaws.[region].lambda and for Step Functions you need com.amazonaws.[region].states and com.amazonaws.[region].sync-states.

List of VPC Endpoints

The following is a list of VPC endpoints required to create an EKS cluster in isolated subnets using CDK. For the full service names, prefix each endpoint with com.amazonaws.[region]. (including the trailing dot).

S3 - s3
ECR - ecr.api, ecr.dkr
EC2 - ec2
EKS - eks
Security Token Service - sts
CloudWatch - logs, monitoring
Systems Manager - ec2messages, ssm, ssmmessages
Lambda - lambda
Step Functions - states, sync-states
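The list above can be expanded into full service names with a few lines of Python. This is only a convenience sketch; the ENDPOINT_SUFFIXES dictionary and the endpoint_service_names helper are hypothetical names, and the suffixes are copied from the list in this post.

```python
# Endpoint suffixes per service, copied from the list above.
ENDPOINT_SUFFIXES = {
    "S3": ["s3"],
    "ECR": ["ecr.api", "ecr.dkr"],
    "EC2": ["ec2"],
    "EKS": ["eks"],
    "Security Token Service": ["sts"],
    "CloudWatch": ["logs", "monitoring"],
    "Systems Manager": ["ec2messages", "ssm", "ssmmessages"],
    "Lambda": ["lambda"],
    "Step Functions": ["states", "sync-states"],
}


def endpoint_service_names(region):
    """Return the full VPC endpoint service names for the given region."""
    return [
        f"com.amazonaws.{region}.{suffix}"
        for suffixes in ENDPOINT_SUFFIXES.values()
        for suffix in suffixes
    ]


for name in endpoint_service_names("ap-southeast-1"):
    print(name)
```

The output can be compared against the endpoints that actually exist in your VPC to spot a missing one before a cluster deployment fails.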

For reference, here’s the full list of AWS PrivateLink endpoint service names.

Once you have created all of the above endpoints in your isolated subnets, you can try the CDK Construct to build the EKS cluster. I have pushed the example and the full code to GitHub, where it is available in the aws-samples repository. Please refer to the README page, which has a step-by-step approach to deploying and testing the cluster.

This example is also referenced in the official AWS CDK documentation on how to create an EKS cluster in isolated subnets under the Amazon EKS Construct Library. Search for the keyword Isolated where a note is created to refer to our example.

Updating the metadata of video timestamps using exiftool

2024-01-02T00:00:00+08:00

I have a Sony a7c and a Sony RX 100 M3. When recording videos, both of these cameras seem to disregard the timezone settings and just use UTC time. Whenever I import these videos into digikam, which is the software I use to manage my photos, the sorting gets messed up because the times of the videos are not in sync with the photos, which have the correct local time with timezone.

In digikam there is a setting to adjust the date and time, but it does not seem to edit the video metadata itself. I’m using digikam version 8.0.0 on Linux Mint 20. The change is only applied locally in digikam. If you upload your videos to other software or to the cloud, the original incorrect timestamp is retained. So here are some exiftool commands that modify the metadata directly on the video file.

Check all the available metadata with date in its name

exiftool myvideo.mp4 | grep -i date

Sample output

File Modification Date/Time     : 2024:01:02 18:47:47+08:00
File Access Date/Time           : 2024:01:02 18:47:45+08:00
File Inode Change Date/Time     : 2024:01:02 18:47:58+08:00
Create Date                     : 2023:06:17 03:30:04
Modify Date                     : 2023:06:17 03:30:04
Track Create Date               : 2023:06:17 03:30:04
Track Modify Date               : 2023:06:17 03:30:04
Media Create Date               : 2023:06:17 03:30:04
Media Modify Date               : 2023:06:17 03:30:04
Last Update                     : 2023:06:17 11:30:04+08:00
Creation Date Value             : 2023:06:17 11:30:04+08:00

If one metadata field has the correct timestamp, based on when you know the video was taken or on the generated thumbnail photo of that video, then copy that field to the other timestamp fields. On the Sony a7c, the Creation Date Value seems to be the correct field, so we copy its value over to the other timestamp fields. The fields to update are the Create Date, Modify Date, Track Create Date, Track Modify Date, Media Create Date, and Media Modify Date.

To update all timestamp metadata of all mp4 files in the current directory to use the Creation Date Value

exiftool '-mediacreatedate

In the exiftool command above, the less-than sign (<) copies the value of the field on its right to the field on its left. The leading dash (-) marks each field to write, and -ext sets the file extension of the files to update. The . refers to the current directory.
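As a sketch of how such a command can be assembled for all six fields, the Python snippet below builds the argument list and prints it in shell-quoted form. It rests on an assumption: that the field labels shown in the sample output correspond to the exiftool tag names CreateDate, ModifyDate, TrackCreateDate, TrackModifyDate, MediaCreateDate, MediaModifyDate and CreationDateValue. The build_exiftool_args helper is hypothetical and the snippet only prints the command; it does not run exiftool.

```python
import shlex

# Assumed exiftool tag names for the fields listed above.
SOURCE_TAG = "CreationDateValue"
TARGET_TAGS = [
    "CreateDate",
    "ModifyDate",
    "TrackCreateDate",
    "TrackModifyDate",
    "MediaCreateDate",
    "MediaModifyDate",
]


def build_exiftool_args(directory=".", extension="mp4"):
    """Build the exiftool argument list that copies SOURCE_TAG to each target."""
    args = ["exiftool"]
    # "-TARGET<SOURCE" copies the value of SOURCE into TARGET.
    args += [f"-{target}<{SOURCE_TAG}" for target in TARGET_TAGS]
    args += ["-ext", extension, directory]
    return args


# Print a shell-quoted command line for review before running it yourself.
print(shlex.join(build_exiftool_args()))
```

Try the printed command on a copy of one video first and re-run the grep check above to confirm the fields changed as expected.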

Lastly, exiftool generates a backup with _original suffix. After you’ve verified the metadata update, you can delete this backup with the -delete_original option.

To delete the _original copies

exiftool -delete_original -ext mp4 .

Learn more about exiftool here.

Hosting S3 Static Website using CloudFront with OAI

2020-02-08T00:00:00+08:00

An insecure website is not acceptable these days. If you’re hosting your website using an AWS S3 bucket’s static website hosting feature, one limitation is that your pages are served over http only, and browsers will report this as Not Secure. This does not give a good impression to your visitors.

Another security compromise that you have to make, and the more critical one, is that you need to set your S3 bucket to be publicly readable. AWS does not recommend this. More and more security breaches are happening due to wrongly configured permissions on S3 buckets.

So how do we solve this? Use CloudFront with an Origin Access Identity (OAI).

CloudFront is the AWS CDN solution, where you can target your private S3 bucket as the origin using an OAI. This will be the identity defined in your S3 bucket policy to grant permission only to the CloudFront distribution and nobody else.

CloudFront also ensures that data in transit is served securely over https.

Here’s the overview of the set-up using CloudFront.

Architecture Overview

Route 53 resolves the domain name to the target CloudFront distribution. For example, for this website, code.eidorian.com is registered in Route 53, which resolves it to the target alias d123456.cloudfront.net, the CloudFront distribution.

The user’s browser requests the website from the CloudFront distribution. If there’s a cache hit, the distribution returns the object immediately.

The SSL certificate is managed in AWS Certificate Manager (ACM) and configured in CloudFront during the creation of the distribution.

The CloudFront distribution is replicated across all edge locations of AWS.

If extra logic handling is needed, a Lambda@Edge can be deployed on the edge locations to do additional processing.

The private S3 bucket containing the static website is accessed by the CloudFront distribution via the granted permission given to its OAI in the S3’s bucket policy.

The requested object is returned to the distribution.
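To illustrate the bucket policy mentioned in the flow above, here is a sketch that builds a policy granting read access to a single OAI. The build_oai_bucket_policy helper, the statement ID, the bucket name and the OAI ID are placeholders for illustration; the principal follows the documented format for CloudFront origin access identities.

```python
import json


def build_oai_bucket_policy(bucket_name, oai_id):
    """Build an S3 bucket policy that lets one CloudFront OAI read objects."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowCloudFrontOAIRead",  # hypothetical statement ID
                "Effect": "Allow",
                "Principal": {
                    "AWS": "arn:aws:iam::cloudfront:user/"
                    f"CloudFront Origin Access Identity {oai_id}"
                },
                "Action": "s3:GetObject",
                # Objects in the bucket, not the bucket itself.
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            }
        ],
    }


# Placeholder bucket name and OAI ID: replace with your own values.
policy = build_oai_bucket_policy("mybucket", "E2EXAMPLE1OAI")
print(json.dumps(policy, indent=2))
```

With a policy like this in place, the bucket can stay private because only requests that come through the distribution’s OAI are allowed to read objects.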

Pre-requisites

Before creating the CloudFront distribution, ensure that you have the following items ready.

An S3 bucket with the static content of the website.
A registered domain name.
An SSL certificate in AWS Certificate Manager or ACM.

Set-up the S3 bucket static content

The S3 bucket contains the website. For testing, have at least something like index.html and an error page like error.html.

In my case, I am using Jekyll which is a static website generator. It has an index.html and a 404.html error page. We will be using that in this example.

Register a domain name

You can use Route 53 or some other domain name registrar to register your domain.

Create an SSL certificate in AWS Certificate Manager

If you don’t have a certificate yet, create one for your registered domain name in ACM.

Important: Create the certificate in the us-east-1 (N. Virginia) region. CloudFront only sees certificates in this region.

Make sure that all the CNAMEs that you will use in CloudFront are also included in the certificate. Here I’m adding both eidorian.com and code.eidorian.com.

Then wait for the validation status to be Success.

If it’s Pending for quite a while, check the details. It may be waiting for an action from you like adding a record to Route 53.

Take note of the certificate’s ARN. You will need it later in the parameters section.

Create the CloudFront distribution using CloudFormation

Okay, so now that we have all the pre-reqs ready, let’s create the CloudFront distribution. Using the AWS console is not very exciting, so let’s do it the CloudFormation way!

I have prepared a re-usable template below with three input Parameters. These are the three pre-requisites mentioned above - bucket name, SSL cert, and the CNAMEs.

CloudFormation Template

AWSTemplateFormatVersion: '2010-09-09'
Parameters:
  BucketName:
    Description: S3 Bucket name
    Type: String
  SSLCert:
    Description: ACM certificate arn
    Type: String
  DomainNames:
    Description: Domain names or CNAMEs
    Type: CommaDelimitedList
Resources:
  WebsiteDistribution:
    Type: AWS::CloudFront::Distribution
    Properties:
      DistributionConfig:
        Aliases: !Ref DomainNames
        Origins:
        - DomainName: !Join ['', [!Ref BucketName, '.s3.amazonaws.com']]
          Id: !Join ['', [!Ref BucketName, 'S3OriginId']]
          S3OriginConfig:
            OriginAccessIdentity: !Join ['', ['origin-access-identity/cloudfront/', !Ref CloudFrontOAI]]
        Enabled: 'true'
        Comment: !Join ['', ['CloudFront for S3 bucket ', !Ref BucketName]]
        DefaultRootObject: index.html
        CustomErrorResponses:
          - ErrorCode: 404
            ResponseCode: 200
            ResponsePagePath: /404.html
          - ErrorCode: 403
            ResponseCode: 200
            ResponsePagePath: /404.html
        DefaultCacheBehavior:
          AllowedMethods:
          - GET
          - HEAD
          TargetOriginId: !Join ['', [!Ref BucketName, 'S3OriginId']]
          ForwardedValues:
            QueryString: 'false'
            Cookies:
              Forward: none
          ViewerProtocolPolicy: redirect-to-https
        ViewerCertificate:
          AcmCertificateArn: !Ref SSLCert
          MinimumProtocolVersion: TLSv1
          SslSupportMethod: sni-only
  CloudFrontOAI:
    Type: AWS::CloudFront::CloudFrontOriginAccessIdentity
    Properties:
      CloudFrontOriginAccessIdentityConfig:
        Comment: !Join ['', [!Ref BucketName, '-origin-access-identity']]

In the Resources section, we have two types. One is the CloudFront distribution AWS::CloudFront::Distribution and the other one is the OAI AWS::CloudFront::CloudFrontOriginAccessIdentity.

AWS::CloudFront::Distribution

In the CloudFront distribution resource, we map the parameter values to the distribution properties.

Property	Parameter	Example
Aliases	DomainNames	eidorian.com,code.eidorian.com
DomainName	BucketName	mybucket.s3.amazonaws.com
AcmCertificateArn	SSLCert	arn:aws:acm:us-east-1:youraccount:certificate/1234
OriginAccessIdentity	via Reference	CloudFrontOAI

The Aliases property sets the CNAMEs of the distribution. This is required later when setting the Route53 record to target the distribution alias. In the DomainNames parameter, put your registered domain name including the alternate names.
The DomainName property is the target origin domain name of the distribution where the CloudFront will get its content. In this case, it is the S3 bucket containing the website. The CloudFormation template uses the BucketName parameter to set this property by concatenating the bucket name with the .s3.amazonaws.com suffix. This suffix is the AWS domain name for S3 buckets.
The AcmCertificateArn property tells CloudFront which SSL certificate to use. Here the parameter SSLCert defines this with the ARN string of the certificate in ACM.

Double-check your cert ARN, it should be in the us-east-1 region.
The OriginAccessIdentity property is the key property here. It tells CloudFront which identity to use when accessing the origin (the S3 bucket). No parameter is passed for this since the OAI does not exist yet before the CloudFormation stack is created. Instead, the template references the OAI resource by its logical name, CloudFrontOAI, and adds the required prefix origin-access-identity/cloudfront/.

For the other properties of the distribution, you can look them up here for details. But briefly, what we configured here is that CloudFront will default to index.html in the root folder. If the S3 origin returns 404 Not Found or 403 Forbidden, CloudFront will display the error page 404.html and remap the response to HTTP 200.

For convenience, some of the properties like IDs and comments are set by the template automatically using the bucket name. For example, the Origin ID is set to {BucketName}S3OriginId. You can change this string value if you want.
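To make the naming concrete, here is a small Python sketch of the strings that the template's !Join expressions produce for a given bucket name. The helper name derived_names is hypothetical and only mirrors the template; it is not part of CloudFormation.

```python
def derived_names(bucket_name: str) -> dict:
    """Mirror the !Join expressions in the template for one bucket name."""
    return {
        # Origins[0].DomainName
        "origin_domain": bucket_name + ".s3.amazonaws.com",
        # Origins[0].Id and DefaultCacheBehavior.TargetOriginId
        "origin_id": bucket_name + "S3OriginId",
        # DistributionConfig.Comment
        "comment": "CloudFront for S3 bucket " + bucket_name,
        # CloudFrontOriginAccessIdentityConfig.Comment
        "oai_comment": bucket_name + "-origin-access-identity",
    }


names = derived_names("code.eidorian.com")
for key, value in names.items():
    print(key, "=", value)
```

Checking these values against the Origins tab after the stack is created is a quick way to confirm the right bucket was wired in.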

AWS::CloudFront::CloudFrontOriginAccessIdentity

This is the OAI resource that creates the Origin Access Identity with the name CloudFrontOAI. It simply creates the OAI and assigns a comment for description purpose.

If you already have an existing OAI and want to re-use it, you can just pass its ID as a parameter to set the OriginAccessIdentity. You won’t need the OAI resource in the template.

CloudFormation JSON property file

We can pass the parameter values to the template via command line option, AWS console, or using a property file. We will use the last one to create the CloudFormation stack.

Here’s a sample property file of the parameters and their values.

[
    {
        "ParameterKey": "BucketName",
        "ParameterValue": "code.eidorian.com"
    },
    {
        "ParameterKey": "SSLCert",
        "ParameterValue": "arn:aws:acm:us-east-1:youraccount:certificate/11111111-1111-1111-1111-111111111111"
    },
    {
        "ParameterKey": "DomainNames",
        "ParameterValue": "eidorian.com,code.eidorian.com"
    }
]
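If you maintain several sites, the property file can also be generated. The Python sketch below builds the same JSON structure and rejects a certificate ARN that is not in us-east-1, which is the check mentioned earlier. The function name build_parameters is an assumption made for illustration.

```python
import json


def build_parameters(bucket_name, ssl_cert_arn, domain_names):
    """Build the CloudFormation parameter list for the template above."""
    # ARN layout: arn:partition:service:region:account:resource
    region = ssl_cert_arn.split(":")[3]
    if region != "us-east-1":
        raise ValueError("certificate must be in us-east-1, got " + region)
    return [
        {"ParameterKey": "BucketName", "ParameterValue": bucket_name},
        {"ParameterKey": "SSLCert", "ParameterValue": ssl_cert_arn},
        # CommaDelimitedList parameters are passed as one comma-separated string
        {"ParameterKey": "DomainNames", "ParameterValue": ",".join(domain_names)},
    ]


params = build_parameters(
    "code.eidorian.com",
    "arn:aws:acm:us-east-1:youraccount:certificate/11111111-1111-1111-1111-111111111111",
    ["eidorian.com", "code.eidorian.com"],
)
print(json.dumps(params, indent=4))
```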

Executing the CloudFormation template using AWS CLI

Alright. We are all set.

Open a terminal and run the AWS CLI to create the stack.

In the sample commands, the template file name is cloudfront-s3-origin.yaml and the property file name is code-eidorian-com-properties.json

Create stack

aws cloudformation create-stack --stack-name cloudfront-s3-code-eidorian-com \
--template-body file://./cloudfront-s3-origin.yaml \
--parameters file://./code-eidorian-com-properties.json

Delete stack

If something goes wrong and your stack rolls back, delete the stack and create it again.

aws cloudformation delete-stack --stack-name cloudfront-s3-code-eidorian-com

Update stack

If you update some of the properties in the template, simply update the stack.

aws cloudformation update-stack --stack-name cloudfront-s3-code-eidorian-com \
--template-body file://./cloudfront-s3-origin.yaml \
--parameters file://./code-eidorian-com-properties.json

Creating the CloudFront distribution can take a while (around 30 minutes) to complete, because CloudFront has to propagate the distribution’s configuration to all its edge locations. Deleting or updating the stack can take about as long.

Wait for your distribution status until it says Deployed. Then go to the distribution and verify the settings. Hopefully everything went well and your CloudFront distribution was created successfully with all the correct properties in place.

Verify the CloudFront distribution

Open your distribution and look at the tabs.

General tab

In the General tab you will see the CNAMEs you put in the Aliases property, the SSL certificate ARN and a link to it, index.html as the default root object, sni-only as the SSL support method, TLSv1 as the minimum protocol version, and the comment set by the template.

Origins tab

In the origins tab is where you will find the OAI, the Origin ID we gave and the S3 origin domain name.

Behaviors tab

The DefaultCacheBehavior property values can be seen in the behaviors tab.

If you edit the behavior item, you will find more settings including the GET and HEAD methods we set in the template.

Error pages tab

Lastly, in the error pages tab, you can verify that 404.html is set as the error page for errors 404 and 403.

Update the S3’s bucket policy

Now that the CloudFront distribution has been created and verified, there’s just one last thing you need to do before testing it out: tell the S3 bucket to allow the OAI to access its content. This is where you update the S3 bucket’s policy and make the bucket private, allowing only the OAI ARN as the Principal to access the bucket and no one else.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Allow-OAI-Access-To-Bucket",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::cloudfront:user/CloudFront Origin Access Identity E1111111111111"
            },
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::s3bucket/*"
        }
    ]
}
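The two values that change per site are the OAI ID and the bucket name. As a minimal sketch, the policy can be built from those two inputs in Python; the function name oai_bucket_policy is hypothetical and the IDs are the same placeholders used above.

```python
import json


def oai_bucket_policy(oai_id, bucket_name):
    """Return a bucket policy that lets only the given OAI read objects."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "Allow-OAI-Access-To-Bucket",
                "Effect": "Allow",
                "Principal": {
                    "AWS": "arn:aws:iam::cloudfront:user/"
                           "CloudFront Origin Access Identity " + oai_id
                },
                "Action": "s3:GetObject",
                # The /* suffix grants access to the objects, not the bucket itself
                "Resource": "arn:aws:s3:::" + bucket_name + "/*",
            }
        ],
    }


policy = oai_bucket_policy("E1111111111111", "code.eidorian.com")
print(json.dumps(policy, indent=4))
```

The printed JSON can be pasted into the bucket policy editor in the S3 console.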

Then you can safely make your S3 bucket private by setting the S3 static website hosting to disabled and blocking all public access.

Test your new secure website

That’s all folks. Now try and hit your website using https. The browser should now say it is secure. If you try an invalid path or page, you should see the default error page. If you try going to your S3 bucket’s direct url like the index.html S3 url, the access will be denied.

Final thoughts

There are a lot of steps here and I tried to explain them in as much detail as I can. Hopefully this is helpful, especially the re-usable CloudFormation template. I removed the Lambda@Edge part since it is optional and this post is getting long. I will talk about it more in my next post. Let me know your comments below.

Serverless Webhooks using AWS Lambda - Part 4

2020-01-12T00:00:00+08:00

This is the fourth and last part of my Serverless Webhooks post. You can find Part 3 here where we built the processor application, Part 2 here where we integrated SQS and Part 1 here where we built the Lambda function handler.

Let’s review again the architecture.

Architecture

In the previous post, we already built the poc-data-processor application and tested it locally. It can read the SQS queue poc-data-feed-queue and update the DynamoDB table poc-data-feed. Our final step is to run this application on AWS. That is, we will build the Docker image, push it to ECR, deploy it to an ECS container and test it on AWS.

Build the Docker image and push to ECR

Since this is a Spring Boot application, in the Dockerfile we will use the openjdk:8-jdk-alpine image.

FROM openjdk:8-jdk-alpine
VOLUME /tmp
ARG JAR_FILE
COPY ${JAR_FILE} app.jar
ENTRYPOINT ["java", "-Djava.security.egd=file:/dev/./urandom", "-jar", "/app.jar"]

Build the application

mvn clean package

Build the Docker image

mvn dockerfile:build

Tag the image for your AWS account’s ECR

docker tag adr1/poc-data-processor:latest myaccount.dkr.ecr.myregion.amazonaws.com/poc-data-processor:latest

Your image name may vary. Check your pom.xml if you have a different image prefix. Here I’m using adr1.

    
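    <properties>
        <docker.image.prefix>adr1</docker.image.prefix>
    </properties>
    <plugin>
        <groupId>com.spotify</groupId>
        <artifactId>dockerfile-maven-plugin</artifactId>
        <version>1.3.6</version>
        <configuration>
            <repository>${docker.image.prefix}/${project.artifactId}</repository>
            <buildArgs>
                <JAR_FILE>target/${project.build.finalName}.jar</JAR_FILE>
            </buildArgs>
        </configuration>
    </plugin>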

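Log in to your AWS account’s ECR. The get-login command below is for AWS CLI version 1; AWS CLI version 2 replaces it with get-login-password.

$(aws ecr get-login --no-include-email --region ap-southeast-1)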

Push the image to your AWS account’s ECR

docker push myaccount.dkr.ecr.myregion.amazonaws.com/poc-data-processor:latest

The Docker image is now in the registry. Next is to configure ECS to run this image.

Configure ECS

In configuring the ECS container, we will create three things - a cluster, a task definition and a service.

The task definition poc-data-processor-task defines how to launch the Docker image poc-data-processor that we just pushed to ECR. The service poc-data-processor-service manages the workload of running this task. Then the cluster poc-data-processor-cluster, which we will define to use the Fargate launch type, takes care of the underlying compute that runs these tasks.

Create the cluster

In ECS, go to Clusters and Create Cluster. We will use Fargate to simplify our setup without worrying much about managing the infrastructure and server details.

Name the cluster as poc-data-processor-cluster and create a new VPC for it. You can use an existing VPC but it is better to separate this PoC setup to isolate it. Later, it would be easier to clean-up too when we tear down this cluster.

Click Create and view the cluster details after creation. Here you will see the networking details that were set up during the cluster creation like the VPC, subnets, internet gateway, etc. You need not worry about these as Fargate is supposed to take care of them for you.

Create the task definition

Go to Task Definitions and click Create new Task Definition and choose Fargate.

Take note of the Task Role ecsTaskExecutionRole. We will modify this later to give access to DynamoDB.

For task size, 2GB memory and 1vCPU should be sufficient for our Spring Boot application.

In Container Definitions, click Add container. This is where we define our container and where to get the image. Enter here the image location in the ECR. For the memory, set at least 300MiB that is required by our application.

Leave the rest of the configuration to default. Make sure the Log configuration is ticked so we can monitor in CloudWatch our application.

In Fargate, we won’t have access to log in to the underlying server to troubleshoot. So having our application log sent to CloudWatch is important.

After creating the Task Definition, go to IAM and modify the task definition’s role ecsTaskExecutionRole. Add an inline policy to the task process to be able to access DynamoDB. Remember our application needs to read and update the table poc-data-feed.

The inline policy to add is similar to the policy we gave access to our Lambda handler poc-data-feed-handler. You can refer to the first post here and copy the policy from AWSLambdaBasicExecutionRole.
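As a starting point, the inline policy could be narrowed to the two calls the processor actually makes on the table, GetItem and UpdateItem. The Python sketch below builds such a policy document. The function name, the Sid, and the placeholder region and account are assumptions; the table name follows the poc_data_feed ARN used in the first post.

```python
import json


def processor_table_policy(region, account, table="poc_data_feed"):
    """Return an inline policy allowing reads and updates on one table."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadUpdateTable",
                "Effect": "Allow",
                # Only the operations used by getItem and updateItem in the app
                "Action": ["dynamodb:GetItem", "dynamodb:UpdateItem"],
                "Resource": f"arn:aws:dynamodb:{region}:{account}:table/{table}",
            }
        ],
    }


policy = processor_table_policy("yourregion", "youraccount")
print(json.dumps(policy, indent=4))
```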

Create the service

Finally, we will create the service poc-data-processor-service to manage running the task definition.

Go back to the cluster poc-data-processor-cluster. In the Services tab, click Create. Here we will specify the task definition and cluster we just created.

The Number of tasks tells Fargate how many task instances it will run for this service. For this PoC, we will just specify one. Later, to stop the application, we can update the service and set this to zero.

Choose the VPC previously created for this and leave the rest to default.

Disable the features we do not need. Set the Load balancer type to None, Service Discovery to disabled, and Auto-scaling to off.

Review and create the service.

It will provision the service and launch the task. Wait for a while until its status is RUNNING.

Full test end-to-end

Similar to the previous posts, we will test this via Postman and send a sample payload.

Go to CloudWatch and Log groups and open /ecs/poc-data-processor-task.

Here you will see application logs similar to the ones we saw when we tested locally in the previous post. The message is received from SQS, the data payload is retrieved from DynamoDB with PENDING status, and the data is processed by our task, which is running the Docker image poc-data-processor and updates the item status to COMPLETED.

You can also verify in DynamoDB the actual item is updated.

That’s it for our ECS setup. Our Docker data processor application is now running on the cloud.

Clean-up

Before we conclude, make sure to tear down the ECS set-up to not incur further costs.

Update the service poc-data-processor-service and change the Number of tasks to 0 (zero). This will stop the running task. After that, delete the cluster.
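The same scale-down can be scripted. This is a minimal sketch that assumes a boto3 ECS client is passed in; update_service with desiredCount is the boto3 call, while the wrapper name stop_service is hypothetical.

```python
def stop_service(ecs_client, cluster, service):
    """Scale the service down to zero so no task keeps running."""
    return ecs_client.update_service(
        cluster=cluster,
        service=service,
        desiredCount=0,
    )


# Usage (needs the boto3 package and AWS credentials):
# import boto3
# stop_service(boto3.client("ecs"),
#              "poc-data-processor-cluster",
#              "poc-data-processor-service")
```

Passing the client in keeps the function easy to try with a stand-in object before pointing it at a real account.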

Summary

This completes our entire Serverless Webhook architecture in AWS. It has been a long series of posts. Hopefully we got a basic understanding of how to set-up a serverless webhook using API Gateway, Lambda, SQS, ECS, and DynamoDB. With this kind of set-up, we can have a cheap data receiver that only runs when data is available. It is easily expandable too by adding more Lambda functions to handle different data sets. If the data is large and requires longer processing time, we have a container task instance to do the heavy workload processing.

Serverless Webhooks using AWS Lambda - Part 3

2020-01-02T00:00:00+08:00

This is the third part of my Serverless Webhooks post. You can find Part 2 here where we integrated SQS and Part 1 here where we built the Lambda function handler.

This post is Part 3 of our serverless webhook system. Let’s look back at the architecture. So far we have the Lambda function handler poc-data-feed-handler as the webhook that receives the data. It then stores the raw data to the DynamoDB table poc-data-feed with the generated unique id txn_id. This unique id is also passed to a SQS message queue poc-data-feed-queue waiting to be processed.

What’s next is for us to build the poc-data-processor application to get the txn_id in the queue, get the corresponding data from the table, process and update it.

Architecture

Create the data processor application

Let’s go straight to building the application. Here we will use Java and Spring. You can choose your own stack. The approach is similar. AWS provides an array of SDKs for different programming languages.

Spring and Java code

Create a new Java project poc-data-processor. We will use Spring Boot as the framework of the application, Spring Cloud to access the SQS queue, and the AWS SDK for Java to access the DynamoDB table.

Add the dependencies

Set the Maven dependencies as below.

	
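    <properties>
        <spring-cloud-version>2.1.3.RELEASE</spring-cloud-version>
        <aws-java-sdk-version>1.11.699</aws-java-sdk-version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.springframework.cloud</groupId>
            <artifactId>spring-cloud-aws-context</artifactId>
            <version>${spring-cloud-version}</version>
        </dependency>
        <dependency>
            <groupId>org.springframework.cloud</groupId>
            <artifactId>spring-cloud-aws-messaging</artifactId>
            <version>${spring-cloud-version}</version>
        </dependency>
        <dependency>
            <groupId>org.springframework.cloud</groupId>
            <artifactId>spring-cloud-starter-aws</artifactId>
            <version>${spring-cloud-version}</version>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>com.amazonaws</groupId>
            <artifactId>aws-java-sdk-dynamodb</artifactId>
            <version>${aws-java-sdk-version}</version>
        </dependency>
    </dependencies>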

Listen to the queue

In Spring Cloud, just add the annotation @SqsListener to enable your method to listen to the SQS queue.

    @SqsListener("poc-data-feed-queue")
    public void dataFeedListener(String data) {
        System.out.println("message received: " + data);
        processItem(data);
    }

The method dataFeedListener will receive the SQS message body. In this case, it will be the txn_id generated by the Lambda function.

Sample log of receiving a message:

message received: e51c16cb-41d0-4fff-8de4-de32de42205e

It then invokes the method to process the data passing the txn_id as key.

Process the data

The processItem method retrieves the item from the DynamoDB table poc-data-feed. Here we will use the AWS SDK for Java directly. Spring Boot supports accessing the DynamoDB table too but I find using the AWS SDK easier and simpler for this PoC.

First, setup a bean for the DynamoDB client.

    @Value("${cloud.aws.region.static}")
    private String region;

    @Bean
    public AmazonDynamoDB amazonDynamoDB() {
        AmazonDynamoDB amazonDynamoDB = AmazonDynamoDBClientBuilder.standard().withRegion(region).build();
        return amazonDynamoDB;
    }

Then add processItem, getItem and updateItem methods in the DataProcessor class.

    final static String TABLE_NAME = "poc_data_feed";

    @Autowired
    private AmazonDynamoDB dynamoDB;

    private void processItem(String txnId) {
        Map<String, AttributeValue> itemKey = new HashMap<>();
        itemKey.put("uuid", new AttributeValue(txnId));

        Map<String, AttributeValue> item = getItem(itemKey);
        System.out.println("data payload: " + item);

        Map<String, AttributeValueUpdate> updatedItem = new HashMap<>();
        updatedItem.put("status", new AttributeValueUpdate(new AttributeValue("COMPLETED"), AttributeAction.PUT));

        updateItem(itemKey, updatedItem);
        System.out.println("updated payload: " + getItem(itemKey));
    }

    private Map<String, AttributeValue> getItem(Map<String, AttributeValue> itemKey) {
        GetItemRequest request = new GetItemRequest()
                .withKey(itemKey)
                .withTableName(TABLE_NAME);
        Map<String, AttributeValue> item = dynamoDB.getItem(request).getItem();
        return item;
    }

    private void updateItem(Map<String, AttributeValue> itemKey, Map<String, AttributeValueUpdate> updatedItem) {
        dynamoDB.updateItem(TABLE_NAME, itemKey, updatedItem);
    }

In processItem, we first create the itemKey map using the txnId. We then get and update the data using this key.

The data type Map<String, AttributeValue> represents an item in the DynamoDB table. The AttributeValue class is part of the AWS SDK for Java.

In getItem, we retrieve the entire data payload on the table using the itemKey map. After getting the data, we can do the processing. In this PoC, we are simply updating the status to COMPLETED to imply that we have received the data, processed it as necessary and updated its status.

If you notice in the previous post, we have updated the Lambda function to add a new attribute status=”PENDING” for every new data it receives and stores in DynamoDB.

Then in updateItem, we pass the itemKey and the new updatedItem to commit the changes back to the DynamoDB table.

The data type Map<String, AttributeValueUpdate> represents the attributes to update on an item in the DynamoDB table. The AttributeValueUpdate class is also part of the AWS SDK for Java.

That’s all that we need for the application to process the data. You can also refer to the official AWS SDK for Java documentation on how to access DynamoDB tables.

Test the application locally

Let’s run the application locally first before we containerize and deploy it to ECS. When testing locally, we have to setup the credentials to access the AWS resources. In our case, our application needs access to the SQS queue and DynamoDB table.

Check locally that you have ~/.aws/credentials already set-up. This is where your AWS access key and secret are stored. We will need it for local testing. Another option is to pass the credentials via environment variables.

Also set the Spring Cloud property cloud.aws.credentials.useDefaultAwsCredentialsChain=true in the application.properties of the Spring Boot app. This property tells Spring Cloud to use the default AWS credential chain as defined in the SDK. Later, when we deploy this application to ECS on Fargate, the same chain will obtain the credentials from the task’s IAM role.

So regardless of where you put your credentials, be it in environment variables, an instance profile or the credential profile in ~/.aws/credentials, the precedence will follow the AWS credential chain. Just ensure you do not store the credentials in your code or in your instances.

Now, build and run the app.

mvn clean package
java -jar target/poc-data-processor-1.0-SNAPSHOT.jar

Fire some requests using the same Postman payload as in the previous post.

Sample log of processing the data:

message received: a6e7965a-f2fb-4e76-ac26-75d2e221ea13
data payload: {referralId={S: sazed55,}, name={S: Sazed,}, emailId={S: sazed@gmail.com,}, uuid={S: a6e7965a-f2fb-4e76-ac26-75d2e221ea13,}, age={N: 50,}, status={S: PENDING,}}
updated payload: {referralId={S: sazed55,}, name={S: Sazed,}, emailId={S: sazed@gmail.com,}, uuid={S: a6e7965a-f2fb-4e76-ac26-75d2e221ea13,}, age={N: 50,}, status={S: COMPLETED,}}

Notice that when we retrieved the item, its status was PENDING and it was updated to COMPLETED after the processing.

Note that if the processing fails for some reason, the SQS message has already been consumed. Either return the message to the queue when the processing fails, or retry the processing in the application.
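The second option, retrying inside the application, can be sketched in a few lines. The example is in Python for brevity rather than in the Java of the processor, and the names process_with_retry and max_attempts are hypothetical. It only shows the control flow, not a production retry policy with backoff.

```python
def process_with_retry(process, txn_id, max_attempts=3):
    """Call process(txn_id) until it succeeds or the attempts run out.

    Returns True on success and False if every attempt raised an error.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            process(txn_id)
            return True
        except Exception as error:
            print(f"attempt {attempt} failed for {txn_id}: {error}")
    return False


def always_ok(txn_id):
    """Stand-in for the real processing step."""
    print("processed", txn_id)


succeeded = process_with_retry(always_ok, "a6e7965a-f2fb-4e76-ac26-75d2e221ea13")
```

When the function returns False, the caller still has the txn_id and can decide whether to put it back on the queue or flag the item for manual review.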

That’s it for our data processing application. We’ve tested it and showed that it can consume the SQS message and update the data in the DynamoDB table.

In my next post, we will containerize this application and deploy it to ECS.

12Jan20 Update: The amazonDynamoDB bean has been modified to include the AWS Region. I have also uploaded the full source code of the poc-data-processor application on GitHub here.

Serverless Webhooks using AWS Lambda - Part 2

2019-12-31T00:00:00+08:00

This is Part 2 of my Serverless Webhooks post. You can find Part 1 here.

In this continuation of the Serverless Webhooks design using AWS Lambda, we will look at processing the ingested data using containers. Lambda as a function is good to accept the data payload but if the data requires some heavy computation for instance, a running process on a container would be a better and more practical approach.

Architecture

In the above diagram, we enhanced the architecture to do the data processing via a container (see the blue arrows). Instead of the Lambda function directly storing to DynamoDB, it can send the data or the data id to a message queue like Amazon SQS. The application poc-data-processor running on the ECS container regularly polls the SQS queue poc-data-feed-queue for new messages and processes the incoming data before storing it to the DynamoDB table.

For Part 2, we will focus on the SQS and how to set it up with our Lambda function.

SQS setup

Linking the Lambda function to the data processor on the ECS container requires an integration service like SQS. The SQS queue will asynchronously accept the incoming data, hold it until the container app can process it. In this way, the Lambda function can terminate immediately and need not wait for the entire data processing to complete.

Remember that AWS Lambda functions are charged by the running time, so it is best to not use it for long-running processes like heavy data analysis and calculations.

Creating the queue is straight-forward. Just go to AWS Simple Queue Service and choose the Standard Queue type. Enter the queue name as poc-data-feed-queue.

Update the Lambda function to send the data to the new queue

Update the Lambda function poc-data-feed-handler created in Part 1.

To keep the previous Lambda function version, in the Action section of the Lambda in the AWS console, click on Publish new version.

Add another boto client for SQS.

boto3.client('sqs')

Get the queue url of the newly created SQS queue. This is required by the boto client when sending the message. It will look something like https://sqs.yourregion.amazonaws.com/youraccount/poc-data-feed-queue.

Below is the updated Python script that sends the data to the SQS queue. The function to invoke is send_message where the parameters queue url, message attributes and message body are passed.

import json
import boto3
import uuid

def lambda_handler(event, context):
    request = json.loads(event['body'])
    txn_id = uuid.uuid4()
    
    item = {
        'uuid': {'S': str(txn_id)},
        'name': {'S': request['name']},
        'age': {'N': str(request['age'])},
        'emailId': {'S': str(request['emailId'])},
        'referralId': {'S': str(request['referralId'])},
        'status': {'S': "PENDING"}
    }
    
    dynamodb = boto3.client('dynamodb')
    dynamodb.put_item(TableName='poc_data_feed', Item=item)

    referral_id = request['referralId']
    sqs = boto3.client('sqs')
    queue_url = 'https://sqs.yourregion.amazonaws.com/youraccount/poc-data-feed-queue'
    response = sqs.send_message(
        QueueUrl = queue_url,
        MessageAttributes = {
            'name': {
                'DataType': 'String',
                'StringValue': request['name']
            }        
        },
        MessageBody = str(txn_id)
    )

    return {
        'statusCode': 200,
        'body': json.dumps({
            "txnId": str(txn_id),            
            "message": "Data received."
        })
    }

One thing to take note of: we are only passing the txn_id to the SQS queue and not the entire data payload. The Lambda function still stores the raw data payload directly to the DynamoDB table and just passes a unique id like the txn_id to the queue. This is usually the case when the payload is large and the data needs to be persisted first. It’s also not practical to store the entire payload in the queue. It is up to the processor to pick up the entire data from the DynamoDB table based on the txn_id it got from the queue.
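The flow described above, where the payload is stored first and only its id travels through the queue, can be illustrated without any AWS services. In this sketch a dict stands in for the DynamoDB table and a list stands in for the SQS queue; the function names receive and process_next are made up for the illustration.

```python
import uuid

table = {}  # stands in for the DynamoDB table, keyed by txn_id
queue = []  # stands in for the SQS queue, holds txn_ids only


def receive(payload):
    """Webhook side: persist the full payload, enqueue only its id."""
    txn_id = str(uuid.uuid4())
    table[txn_id] = dict(payload, status="PENDING")
    queue.append(txn_id)
    return txn_id


def process_next():
    """Processor side: take an id off the queue and load the full item."""
    txn_id = queue.pop(0)
    item = table[txn_id]
    item["status"] = "COMPLETED"
    return txn_id


txn_id = receive({"name": "Vin", "age": 22})
process_next()
print(table[txn_id]["status"])  # prints COMPLETED
```

The queue entry stays small no matter how large the payload is, which is the reason for splitting storage and messaging this way.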

Before testing the function, make sure to add the SQS policy so that the Lambda function can access the SQS queue. For simplicity since this is just a PoC, we will use the managed policy AmazonSQSFullAccess and give our Lambda function full access.

Test the Lambda and SQS connectivity

Using the same request format we used in Part 1, trigger a request to the Lambda via the API Gateway. Refer to the sample Postman request and response below.

Let’s verify if the message is passed to the SQS queue. Open the queue in the AWS console and poll for messages. You will see the newly passed data in the SQS with the message body as the txn_id.

Our Lambda function can now store data to both DynamoDB and SQS queue where the message in the SQS queue is waiting to be consumed by the processor container.

In the next part, we will show how to build the data processor application using Spring and consume the SQS message.

Serverless Webhooks using AWS Lambda - Part 1

2019-12-17T00:00:00+08:00

In this post, we will setup a service to receive data from an external system.

One approach to receive data is through a webhook endpoint where the external system can trigger and post data to that endpoint. Before serverless architecture we typically would need a server to host this webhook endpoint. As more external systems connect to you with different sets of data, you would need to host more endpoints and that means more servers.

Luckily, through serverless architecture, we can define each webhook as a function and this function will only run when data is available. In AWS, we can design it using a Lambda function with an API Gateway in front and a DynamoDB at the backend to persist the data.

Architecture

The design is pretty straightforward. The key component is the Lambda function poc-data-feed-handler. This function will receive the data in JSON format and store it in the DynamoDB table poc-data-feed.

Create your Lambda

In AWS Lambda designer, setup your Lambda such that it has the following components:

The role assigned to your Lambda is important because it determines which other services the function can interact with. Ensure that it has AWSLambdaBasicExecutionRole, which carries the basic CloudWatch Logs write access policy. Then for DynamoDB, attach an inline policy to the role that grants access to the table where we will store the data.

Here’s a sample inline policy to allow DynamoDB access.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ReadWriteTable",
            "Effect": "Allow",
            "Action": [
                "dynamodb:BatchGetItem",
                "dynamodb:GetItem",
                "dynamodb:Query",
                "dynamodb:Scan",
                "dynamodb:BatchWriteItem",
                "dynamodb:PutItem",
                "dynamodb:UpdateItem"
            ],
            "Resource": "arn:aws:dynamodb:yourregion:youraccount:table/poc_data_feed"
        },
        {
            "Sid": "GetStreamRecords",
            "Effect": "Allow",
            "Action": "dynamodb:GetRecords",
            "Resource": "arn:aws:dynamodb:yourregion:youraccount:table/poc_data_feed/stream/*"
        },
        {
            "Sid": "WriteLogStreamsAndGroups",
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "*"
        },
        {
            "Sid": "CreateLogGroup",
            "Effect": "Allow",
            "Action": "logs:CreateLogGroup",
            "Resource": "*"
        }
    ]
}

Take note to update the DynamoDB table ARN in the policy.

Then in the Lambda code, write the handler function to accept the data and store in the database. In this example, I used Python as the runtime environment.

As an example, let’s say the external system is a registration system that will trigger our Lambda webhook with registration data for new user sign-ups. We will receive a data payload that contains name, age, emailId and referralId.

{
    "name": "Vin",
    "age": 22,
    "emailId": "vin@example.com",
    "referralId": "elend16"
}

The function will generate a uuid as a transaction reference and insert the data into DynamoDB using the boto client.
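A minimal sketch of such a handler is shown here, modeled on the payload above and on the poc_data_feed table name used in the policy. The helper build_item is introduced only for illustration, and boto3 is imported inside the handler so the item mapping can be tried locally without it.

```python
import json
import uuid

TABLE_NAME = "poc_data_feed"  # assumed to match the table ARN in the policy


def build_item(request, txn_id):
    """Map the JSON payload to DynamoDB's typed attribute format."""
    return {
        "uuid": {"S": txn_id},
        "name": {"S": request["name"]},
        # DynamoDB numbers are sent as strings under the N type
        "age": {"N": str(request["age"])},
        "emailId": {"S": request["emailId"]},
        "referralId": {"S": request["referralId"]},
    }


def lambda_handler(event, context):
    import boto3  # available in the Lambda Python runtime

    request = json.loads(event["body"])
    txn_id = str(uuid.uuid4())
    boto3.client("dynamodb").put_item(
        TableName=TABLE_NAME, Item=build_item(request, txn_id)
    )
    return {
        "statusCode": 200,
        "body": json.dumps({"txnId": txn_id, "message": "Data received."}),
    }


sample = {"name": "Vin", "age": 22, "emailId": "vin@example.com", "referralId": "elend16"}
item = build_item(sample, "test-id")
print(item)
```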

Create the DynamoDB table

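Create the poc-data-feed table in DynamoDB. Use the uuid attribute as the partition key. Note that the sample policy and the Lambda code refer to the table as poc_data_feed, so whichever name you choose, use it consistently in all three places.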

Get the ARN of this table and update the policy for DynamoDB mentioned above.

Setup the API Gateway

Add the API Gateway as a trigger if it hasn’t been added yet. Change the HTTP method to POST since we will be posting data to this webhook.

Make sure that this API endpoint points to the target Lambda poc-data-feed-handler we just created. Then take note of the endpoint’s url for testing. You can get this in the Stages section of the gateway.

Test the webhook setup

Now that the webhook setup is complete from API Gateway to Lambda to DynamoDB, we can fire some payload requests with sample data and see that it gets stored in the DynamoDB table.

Here’s the sample using Postman to trigger the API with test data.
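If you prefer a script over Postman, the same request can be built with Python's standard library. The endpoint URL below is a made-up placeholder; replace it with the URL from the Stages section of your API. The sending step is left commented out so the sketch can be run without network access.

```python
import json
import urllib.request

# Hypothetical endpoint, not a real API; use your own stage URL here.
ENDPOINT = "https://example.execute-api.yourregion.amazonaws.com/default/poc-data-feed-handler"


def build_request(url, payload):
    """Build a JSON POST request for the webhook."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_request(
    ENDPOINT,
    {"name": "Vin", "age": 22, "emailId": "vin@example.com", "referralId": "elend16"},
)
print(request.get_method(), request.full_url)

# To actually send it:
# with urllib.request.urlopen(request) as response:
#     print(response.status, response.read().decode("utf-8"))
```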

Here’s the data inserted in DynamoDB.

And that’s it! You can now setup as many Lambda webhooks as you want to receive different types of data, process them and store in DynamoDB.

In my next post, we will look at expanding this solution to cater to data processing that would require longer times to run or passing the data to an existing containerized application.

Building a publishing pipeline for static websites using AWS

2019-04-06T14:52:00+08:00

Looking at the devops solutions in AWS, I was searching for a project to try them out. This static website came to mind. The goal was to try AWS CodeCommit, CodeBuild and CodePipeline to automate the publishing of this static website.

Upload your site to AWS CodeCommit

Go to your blog folder and initialize git for source code management.

cd myblog
git init

For Jekyll, I set up my .gitignore to exclude the following files.

_site
.sass-cache
.jekyll-metadata
.idea

Create a repo in AWS CodeCommit for your blog.

Set your remote repository to the newly created repo and push your code there.

git remote add origin ssh://myblog/repo/path
git push origin master

Set up AWS CodeBuild to build your site

In AWS CodeBuild, create a build project and specify in the source provider the CodeCommit repository you previously created for your blog.

In the environment section, choose Ubuntu server and Ruby as the runtime builder. Remember that Jekyll build uses Ruby.

In the Buildspec section, use the below buildspec.yml file and store it in the root folder of your code.

You can store it in other folders but make sure to tell CodeBuild where it is located.

version: 0.2

phases:
  install:
    commands:
      - echo "******** Install Bundles ********"
      - bundle install
  build:
    commands:
      - echo "******** Building Jekyll site ********"
      - JEKYLL_ENV=production bundle exec jekyll build
artifacts:
  base-directory: '_site'
  files:
    - '**/*'
  name: code.eidorian.com-$(date +%Y-%m-%d-%H%M%S)

A few things to note in this buildspec file:

The bundle install command installs the gems specified in the Gemfile of your blog.
The bundle exec jekyll build builds the blog site itself in the _site folder.
The artifacts section tells CodeBuild the location of the artifacts and this will be used later by CodePipeline in the deployment.
Together, base-directory and files ensure that the _site folder itself is left out of the artifact and only its contents are deployed.
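To make the effect of base-directory concrete, here is a small Python sketch that lists files the way they would appear relative to a base folder. It only illustrates the resulting paths and is not how CodeBuild itself is implemented. The artifact_paths function is a hypothetical helper.

```python
from pathlib import Path


def artifact_paths(base_directory):
    """Return every file under base_directory, relative to that folder."""
    base = Path(base_directory)
    return sorted(
        path.relative_to(base).as_posix()
        for path in base.rglob("*")
        if path.is_file()
    )
```

For a build output such as _site/index.html and _site/css/main.css, the relative paths are index.html and css/main.css, so the site root lands at the root of the S3 bucket rather than under a _site prefix.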

Learn more about the buildspec syntax in the AWS documentation.

Use AWS CodePipeline to orchestrate the build and publishing

Finally, create an AWS CodePipeline to trigger a build whenever you push your code to CodeCommit and deploy the site to your target S3.

There are three stages to the pipeline - source, build and deploy.

Source

Specify in the source stage your CodeCommit repository and the branch to be built. This is the master branch by default. Also choose the recommended CloudWatch Events as the trigger for the build.

Build

In the build stage, choose the CodeBuild project you just created.

Deploy

Then in the deploy stage, choose the S3 bucket that is hosting your static website.

Review your pipeline and create it.

Trigger the pipeline by pushing a code change to your blog repository, or start it manually via Release change.

Fix the brightness buttons of Asus laptops running Linux Mint

2017-08-18T00:00:00+08:00

The solution here was tested on:

Asus K401U

Linux Mint 18 Mate

Nvidia GeForce 940MX

First, do some troubleshooting. Check if your Fn keys are responding. On Asus laptops, the Fn keys for brightness control are Fn+F5 and Fn+F6. Run the acpi_listen command and then press Fn+F5 or Fn+F6.

$ acpi_listen 
video/brightnessdown BRTDN 00000087 00000000 K
 PNP0C14:01 000000ff 00000000
video/brightnessup BRTUP 00000086 00000000 K
 PNP0C14:01 000000ff 00000000

Normally, the keys respond and the events are mapped correctly. It is the video device that is not getting triggered to adjust the brightness. That is why the brightness won't change even if you install xbacklight or adjust it through the desktop UI settings.

Optional:
This is just for testing but it’s interesting to try it out. You can adjust directly in /sys/class/backlight. This will contain the different devices that you have e.g. intel_backlight or acpi_video. Inside each folder, you will find the brightness and max_brightness files. Take note of their values and update the brightness file to increase or decrease the brightness without exceeding the max_brightness value.
$ cat max_brightness
937
$ cat brightness
500
$ echo 200 | sudo tee brightness
200
Did the screen brightness change?
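The same test can be sketched in Python with a guard against exceeding max_brightness. This is just an illustration: set_brightness is a name I made up, the device folder (for example /sys/class/backlight/intel_backlight) depends on your hardware, and writing to it needs root privileges.

```python
from pathlib import Path


def set_brightness(device_dir, value):
    """Write a brightness value, clamped between 0 and max_brightness."""
    device = Path(device_dir)
    max_value = int((device / "max_brightness").read_text().strip())
    clamped = max(0, min(value, max_value))
    (device / "brightness").write_text(f"{clamped}\n")
    return clamped
```

The function returns the value it actually wrote, so a request above the maximum comes back as the max_brightness value.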

Now for the permanent fix.

Update grub and add acpi_osi= and acpi_backlight=native

$ sudo vi /etc/default/grub

e.g.

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash acpi_osi= acpi_backlight=native"
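If you want to script this edit instead of doing it by hand in vi, a sketch of the text transformation could look like this. It only rewrites the contents of the file as a string. Reading and writing /etc/default/grub (as root) is left to you, and add_kernel_params is a hypothetical helper name.

```python
import re


def add_kernel_params(grub_text, params=("acpi_osi=", "acpi_backlight=native")):
    """Append missing kernel parameters to GRUB_CMDLINE_LINUX_DEFAULT."""

    def patch(match):
        existing = match.group(2).split()
        for param in params:
            # Skip parameters that are already present so reruns are harmless.
            if param not in existing:
                existing.append(param)
        return match.group(1) + '"' + " ".join(existing) + '"'

    pattern = r'^(GRUB_CMDLINE_LINUX_DEFAULT=)"([^"]*)"'
    return re.sub(pattern, patch, grub_text, flags=re.MULTILINE)
```

Other lines in the file are left untouched, and running it twice does not duplicate the parameters.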

$ sudo update-grub

Restart.

$ sudo reboot

This worked for me even with bumblebee installed.