Case: Improving user experience: a banking solution
The heavy impact of COVID-19 on financial companies is well known. The lockdown forced customers to change their habits, and they quickly turned to web portals, most of which were clearly not prepared for high workloads. Some companies have not yet managed to adapt to the change; others reacted quickly and redirected their efforts. Our customer was among the latter: it started a digital transformation and adopted the cloud so that its platforms would be stable, scalable, and secure, responding to its clients' demands while avoiding performance problems. It also added new functionality that allowed end users to carry out procedures that, before the pandemic, had to take place in physical branches.
One of the companies that managed to adapt and focus its efforts on digital transformation is our client, a renowned Chilean financial institution with more than 50 years of history and more than one million users.
Our client worked to respond quickly to the contingency produced by the pandemic and thereby retain its customers, delivering a new Portal that enhances the user experience of the business services provided through the Web and Mobile platforms.
To face this challenge, the Amazon Web Services (AWS) Cloud platform was selected, together with a team of specialized professionals to advise and accompany the client through the process of adopting the cloud and implementing hybrid architectures.
The Client, together with its Architecture, Development, Security, and Infrastructure areas, saw in AWS an ally for building a new Portal, drawing on Cloud advantages such as elasticity, high availability, connectivity, and cost management.
The project contemplated the implementation of a frontend (Web and Mobile) together with a layer of micro-services integrated with the client's On-Premise systems by consuming services exposed in its ESB, which in turn accesses its legacy systems, thus forming a hybrid architecture.
Within the framework of this project, 3HTP Cloud Services actively participated in the advice, definitions, and technical implementations of the infrastructure, and in supporting the development and automation teams, taking as reference the five pillars of the AWS Well-Architected Framework.
3HTP Cloud Services' participation focused on the following activities:
- Validation of architecture proposed by the client
- Infrastructure as code (IaC) project provisioning on AWS
- Automation and CI / CD Integration for infrastructure and micro-services
- Refinement of infrastructure, definitions, and standards for Operation and monitoring
- Stress and Load Testing for AWS and On-Premise Infrastructure
The client achieved several relevant benefits. Among the most outstanding:
- Automation, management, and deployment of the infrastructure and application components that run on it, allowing the client to accelerate and strengthen the life cycle of the solution.
- Generation of volatile environments in an automated way as a result of the previous point.
- Improved infrastructure to support high load demand, in both productive and non-productive environments. The appropriate sizing was defined based on the results and conclusions of the load and stress tests carried out in the AWS and On-Premise environments, across the different components that make up the hybrid system.
- Significant cost reduction in non-productive environments through efficient use of AWS services (for example, Spot Instances and Aurora Serverless), as a result of recommendations based on findings and good practices applied during the project.
- The institution was able to meet its governance, security, scalability, continuous delivery, and continuous deployment (CI / CD) objectives, as well as interaction with its on-premises infrastructure using the AWS cloud.
- Growth and acquisition of technical experience of the different client work teams involved in the life cycle of the solution.
Services Performed by 3HTP Cloud Services: Initial Architecture Validation
The institution already had an initial cloud adoption architecture for its client portal. As a multidisciplinary team, we therefore began with a diagnosis of the current situation and of the proposal made by the client. This diagnosis and evaluation produced the following architecture recommendations:
- Separation of architectures for productive and non-productive environments
- The use of infrastructure as code in order to create volatile environments by project, by business unit, etc.
- CI / CD implementation to automate the creation, management, and deployment of both Infrastructure and micro-services.
Productive Environment Architecture
- This architecture is based on three Availability Zones (AZ). On-Demand instances are used for the AWS EKS workers, and reserved instances are used for the database and cache, with 24×7 high availability.
The number of instances to use for the Redis Cluster is defined.
Non-Productive Environment Architecture
Non-production environments do not require 24/7 use, but they do need an architecture at least similar to that of production. An approved architecture was therefore defined that allows the different components to run in high availability while minimizing costs. For this, the following was defined:
- Reduction of availability zones for non-productive environments, remaining in two availability zones (AZ)
- Using Spot Instances to Minimize AWS EKS Worker Costs
- Scheduled shutdown and startup of resources so that they run only during business hours.
- Using Aurora Serverless
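As an illustration of the business-hours on/off configuration listed above, the following Python sketch decides whether non-production resources should be running at a given moment. The schedule (Monday to Friday, 08:00 to 20:00) and the function names are assumptions made for this example; the source does not state the actual hours or the mechanism used.

```python
from datetime import datetime, time

# Hypothetical schedule, not taken from the project: Monday-Friday, 08:00-20:00.
BUSINESS_DAYS = range(0, 5)   # datetime.weekday(): Monday=0 ... Friday=4
OPEN_AT = time(8, 0)
CLOSE_AT = time(20, 0)


def should_be_running(now: datetime) -> bool:
    """Return True when non-production resources should be switched on."""
    return now.weekday() in BUSINESS_DAYS and OPEN_AT <= now.time() < CLOSE_AT


def desired_worker_count(now: datetime, business_hours_size: int) -> int:
    """Desired number of workers: full size in business hours, zero otherwise."""
    return business_hours_size if should_be_running(now) else 0
```

A scheduled job could call `desired_worker_count` periodically and resize the worker group to the returned value; how that resize is performed is outside the scope of this sketch.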
The instances to be used were defined considering that there are only two availability zones; the number of instances for non-production environments is 4.
Non-production environments diagram
Infrastructure as Code
So that the architectures could be created dynamically and the environments could be volatile over time, it was decided that the infrastructure must be created by means of code. Terraform was chosen as the primary tool to achieve this objective.
As a result, two fully parameterized Terraform projects were created, capable of creating the architectures described in the previous section in a matter of minutes. Each execution of these projects requires an S3 bucket to store the state created by Terraform.
Additionally, these projects are executed from Jenkins Pipelines, so the creation of a new environment is completely automated.
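A minimal sketch of how a pipeline step might assemble the Terraform invocations for one environment, keeping each environment's state under its own key in the S3 bucket. The bucket name, key layout, and per-environment `.tfvars` file are hypothetical; only `-backend-config`, `-var-file`, and `-auto-approve` are standard Terraform CLI options. The sketch assumes the Terraform project declares an S3 backend with partial configuration.

```python
from typing import List


def terraform_commands(environment: str, state_bucket: str, region: str) -> List[List[str]]:
    """Build the Terraform CLI invocations for one environment (nothing is executed)."""
    init = [
        "terraform", "init",
        # Backend settings supplied at init time, so one project serves many environments.
        f"-backend-config=bucket={state_bucket}",
        f"-backend-config=key={environment}/terraform.tfstate",  # hypothetical key layout
        f"-backend-config=region={region}",
    ]
    apply = [
        "terraform", "apply",
        "-auto-approve",                      # non-interactive, as a pipeline would need
        f"-var-file={environment}.tfvars",    # hypothetical per-environment variables file
    ]
    return [init, apply]
```

A Jenkins stage could run each returned command in order, for example with `subprocess.run(cmd, check=True)`.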
Automation and CI / CD Integration for infrastructure and micro-services
Micro-services Deployment in EKS
We helped the financial institution deploy the micro-services associated with its business solution in the Kubernetes cluster (AWS EKS). Several definitions were made so that these micro-services could be deployed in an automated way, thus covering the complete DevOps process (CI and CD).
A Jenkins pipeline was created to automatically deploy the micro-services to the EKS cluster.
In summary, the pipeline executes the following steps:
- Get micro-service code from Bitbucket
- Compile code
- Create a new image with the package generated in the compilation
- Push image to AWS ECR
- Create Kubernetes manifests
- Apply manifests in EKS
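The steps above can be sketched as an ordered list of commands. This is an illustration only, not the client's Jenkins pipeline: the build and manifest-rendering commands are passed in by the caller because the source does not name the tools used, the `k8s` manifest directory is hypothetical, and registry authentication is omitted.

```python
from typing import List, Tuple

Step = Tuple[str, List[str]]


def ecr_image_uri(account_id: str, region: str, repository: str, tag: str) -> str:
    """Compose an image URI in the standard ECR registry format."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


def pipeline_steps(
    service: str,
    repo_url: str,
    image_uri: str,
    build_command: List[str],
    render_command: List[str],
) -> List[Step]:
    """Return the six pipeline steps, in order, as (name, command) pairs."""
    return [
        ("Get micro-service code", ["git", "clone", repo_url, service]),
        ("Compile code", build_command),                # tool not stated in the source
        ("Create image", ["docker", "build", "-t", image_uri, service]),
        ("Push image to AWS ECR", ["docker", "push", image_uri]),
        ("Create Kubernetes manifests", render_command),  # tool not stated in the source
        ("Apply manifests in EKS", ["kubectl", "apply", "-f", f"{service}/k8s"]),
    ]
```

Keeping the steps as data makes the order explicit and lets the same definition be logged, dry-run, or executed by whichever runner the pipeline uses.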
Refinement, definitions, and standards to be used on the infrastructure
Security is critical for the institution, as it is for any company. For this reason, a dedicated Docker image was created that had no known vulnerabilities and did not allow applications to elevate privileges; this image is used as the base for the micro-services. As part of this process, the institution's Security Area ran repeated penetration tests until the image reported no vulnerabilities known at that time.
AWS EKS configurations
In order to use the EKS clusters more productively, additional configurations were made on them:
- Use of Kyverno: a tool that allows policies to be created in the cluster to enforce security compliance and good practices (https://kyverno.io/)
- Metrics Server installation: this component is installed so that the micro-services can later work with the Horizontal Pod Autoscaler
- X-Ray: X-Ray is enabled on the cluster to better trace the use of the micro-services
- Cluster Autoscaler: this component is configured to provide elastic, dynamic scaling of the cluster
- AWS App Mesh: a proof of concept of the AWS App Mesh service was carried out, using some specific micro-services for the test
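To give an idea of the kind of policy Kyverno supports, the sketch below builds a ClusterPolicy that requires every Pod container to declare CPU and memory limits. The policy name, rule name, and message are invented for this example, and the source does not list the policies actually applied. The placement of the failure-action field has changed between Kyverno versions, so it should be checked against the documentation for the version in use.

```python
def require_limits_policy(action: str = "Audit") -> dict:
    """Build a Kyverno ClusterPolicy requiring CPU and memory limits on Pods."""
    return {
        "apiVersion": "kyverno.io/v1",
        "kind": "ClusterPolicy",
        "metadata": {"name": "require-resource-limits"},  # hypothetical name
        "spec": {
            # "Audit" only reports violations; "Enforce" rejects the resource.
            "validationFailureAction": action,
            "rules": [{
                "name": "check-container-limits",
                "match": {"any": [{"resources": {"kinds": ["Pod"]}}]},
                "validate": {
                    "message": "CPU and memory limits are required.",
                    # "?*" is the Kyverno wildcard for "any non-empty value".
                    "pattern": {"spec": {"containers": [
                        {"resources": {"limits": {"cpu": "?*", "memory": "?*"}}}
                    ]}},
                },
            }],
        },
    }
```

Serialized to JSON or YAML, the returned dictionary can be applied to the cluster like any other manifest.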
Defining Kubernetes Objects
- Use of Resource Limits: to avoid overloading the cluster, the first rule a micro-service must fulfil is to define its memory and CPU usage, both for the start of the Pod and for its maximum growth. Client micro-services were categorized according to their use (Low, Medium, High), and each category has default values for these parameters.
- Use of Readiness Probe: to avoid loss of service during the deployment of new versions, each micro-service must pass a readiness test before it receives load in the cluster.
- Use of Liveness Probe: each micro-service to be deployed must have a liveness test configured so that its behavior can be checked.
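The three rules above could come together in a Deployment manifest along these lines. This is a sketch under stated assumptions: the per-category CPU and memory values, the port, the `/health` path, and the probe timings are all hypothetical, since the source gives only the category names (Low, Medium, High).

```python
# Hypothetical per-category defaults; the actual values are not given in the source.
RESOURCE_PROFILES = {
    "low": {"requests": {"cpu": "100m", "memory": "128Mi"},
            "limits": {"cpu": "250m", "memory": "256Mi"}},
    "medium": {"requests": {"cpu": "250m", "memory": "256Mi"},
               "limits": {"cpu": "500m", "memory": "512Mi"}},
    "high": {"requests": {"cpu": "500m", "memory": "512Mi"},
             "limits": {"cpu": "1", "memory": "1Gi"}},
}


def deployment_manifest(name: str, image: str, category: str,
                        port: int = 8080, health_path: str = "/health") -> dict:
    """Build a Deployment with resource limits and readiness/liveness probes."""
    probe = {"httpGet": {"path": health_path, "port": port}}
    container = {
        "name": name,
        "image": image,
        "ports": [{"containerPort": port}],
        # "requests" covers the start of the Pod, "limits" its maximum growth.
        "resources": RESOURCE_PROFILES[category],
        # Readiness gates traffic; liveness restarts a container that stops responding.
        "readinessProbe": {**probe, "initialDelaySeconds": 10, "periodSeconds": 5},
        "livenessProbe": {**probe, "initialDelaySeconds": 30, "periodSeconds": 10},
    }
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "selector": {"matchLabels": labels},
            "template": {"metadata": {"labels": labels},
                         "spec": {"containers": [container]}},
        },
    }
```

The replica count is deliberately left out of the manifest so that an autoscaler can own it.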
Two types of Kubernetes Services were defined:
- ClusterIP: for all micro-services that only communicate with other micro-services within the cluster and do not expose APIs to external clients or users.
- NodePort: for services that expose APIs to external clients or users. These services are later exposed via a Network Load Balancer and API Gateway.
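A short sketch of how the choice between the two Service types might be encoded. The `app` label used as selector and the single-port layout are assumptions for this example.

```python
def service_manifest(name: str, port: int, exposed_externally: bool) -> dict:
    """Build a Service: NodePort for externally exposed APIs, ClusterIP otherwise."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "type": "NodePort" if exposed_externally else "ClusterIP",
            "selector": {"app": name},  # assumes Pods carry an "app" label
            "ports": [{"port": port, "targetPort": port, "protocol": "TCP"}],
        },
    }
```

For a NodePort Service, Kubernetes assigns the node port automatically unless one is set explicitly, which is why the sketch does not fix it.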
ConfigMap / Secrets
Micro-services should read their customizable settings from Kubernetes Secrets or ConfigMaps.
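As a sketch of this convention, the function below builds a ConfigMap, a Secret, and the `envFrom` fragment that injects both into a container as environment variables. The naming scheme (`<service>-config`, `<service>-secret`) is hypothetical.

```python
from typing import Dict, List, Tuple


def settings_manifests(service: str, config: Dict[str, str],
                       secrets: Dict[str, str]) -> Tuple[dict, dict, List[dict]]:
    """Return (ConfigMap, Secret, envFrom fragment) for one micro-service."""
    config_name = f"{service}-config"   # hypothetical naming scheme
    secret_name = f"{service}-secret"
    config_map = {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": config_name},
        "data": dict(config),
    }
    secret = {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": secret_name},
        "type": "Opaque",
        # stringData accepts plain strings; the API server stores them base64-encoded.
        "stringData": dict(secrets),
    }
    env_from = [
        {"configMapRef": {"name": config_name}},
        {"secretRef": {"name": secret_name}},
    ]
    return config_map, secret, env_from
```

The `env_from` list goes under the container's `envFrom` key in the Deployment, so the image itself stays free of environment-specific values.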
Horizontal Pod Autoscaler (HPA)
Each micro-service that needs to be deployed in the EKS cluster requires the use of HPA in order to define the minimum and maximum number of replicas required of it.
The client’s micro-services were categorized according to their use (Low, Medium, High) and each category has a default value of replicas to use.
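A minimal sketch of an HPA definition driven by the usage category. The replica ranges per category and the 70% CPU target are hypothetical; the source states only that each category has default replica values.

```python
# Hypothetical (min, max) replica defaults per category; not taken from the source.
REPLICA_PROFILES = {"low": (2, 4), "medium": (2, 8), "high": (3, 12)}


def hpa_manifest(name: str, category: str, cpu_target_percent: int = 70) -> dict:
    """Build a HorizontalPodAutoscaler targeting the Deployment of the same name."""
    min_replicas, max_replicas = REPLICA_PROFILES[category]
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": name},
        "spec": {
            "scaleTargetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": name},
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            # Scale on average CPU utilization, which relies on Metrics Server data.
            "metrics": [{
                "type": "Resource",
                "resource": {
                    "name": "cpu",
                    "target": {"type": "Utilization",
                               "averageUtilization": cpu_target_percent},
                },
            }],
        },
    }
```

CPU utilization is measured against the container's CPU request, so an HPA of this kind only works when the micro-service declares its resource requests.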
Stress and Load Testing for AWS and On-Premise Infrastructure
One of the great challenges of this type of (hybrid) architecture, where the backend and core of the business are On-Premise and the frontend and logic layers are in dynamic, elastic clouds, is to define to what extent the architecture can be elastic without affecting the On-Premise and legacy services related to the solution.
To solve this challenge, load and stress tests were carried out on the environment, simulating both peak and normal business loads. During the tests, monitoring was carried out in the different layers of the complete solution: at the AWS level (CloudFront, API Gateway, NLB, EKS, Redis, RDS) and at the on-premise level (ESB, legacy systems, networks, and links).
As a result of the various tests, it was possible to define the minimum and maximum elasticity limits in AWS (number of workers, number of replicas, number of instances, instance types, among others) and at the On-Premise level (number of workers, bandwidth, etc.).
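One way to reason about such a limit, shown here purely as an illustration and not as the method used in the project, is to cap the number of cloud replicas so that their combined request rate stays below what the on-premise backend sustained in the load tests. The parameter names and the headroom factor are assumptions.

```python
import math


def max_safe_replicas(backend_capacity_rps: float, per_replica_rps: float,
                      headroom: float = 0.2) -> int:
    """Largest replica count whose combined load stays within backend capacity.

    backend_capacity_rps: requests/second the on-premise backend sustained in tests.
    per_replica_rps: peak requests/second one replica sends to the backend.
    headroom: fraction of backend capacity kept in reserve (0 <= headroom < 1).
    """
    if per_replica_rps <= 0:
        raise ValueError("per_replica_rps must be positive")
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    usable = backend_capacity_rps * (1 - headroom)
    # Always allow at least one replica so the service can run at all.
    return max(1, math.floor(usable / per_replica_rps))
```

The result would serve as an upper bound for the autoscaler's maximum replicas, so that elasticity in the cloud cannot push more load to the legacy systems than they can absorb.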
Projects of this type are now very common: they develop components in cloud environments that interact with on-premises services or components, forming a hybrid architecture. They take advantage of what the Cloud provides while accessing business logic and data held in on-premises legacy systems, which creates a technical and governance challenge in ensuring adequate performance and operation.
In the cloud almost everything is scalable, dynamic, and elastic; on-premise legacy systems, however, are by their nature less elastic and resilient. That is why various aspects must be taken into consideration and a multidisciplinary approach is required (architects specialized in cloud and traditional middleware, network specialists, developers, and specialists in load testing in hybrid environments, among others) to obtain the best definitions for each aspect involved in this type of scenario and achieve a successful project. It is important to apply good practices and never lose sight of the fact that the solution spans two worlds that can be very different and requires a consolidated (non-isolated) vision to get the best out of each of them.