
  • Accessing APIs and Extracting Data with Airflow

Intro

Airflow provides different ways of working with automated flows, and one of them is the ability to access external APIs using HTTP operators and extract the necessary data.

Hands-on

In this tutorial we will create a DAG that accesses an external API and extracts the data directly to a local file. If this is your first time using Airflow, I recommend accessing this link to understand more about Airflow and how to set up an environment.

Creating the DAG

For this tutorial, we will create a DAG that triggers every hour (schedule_interval="0 * * * *") and accesses an external API, extracting some data directly to a local JSON file. In this scenario we will use the SimpleHttpOperator, which provides an API capable of executing requests to external APIs.

Note that we use two operators within the same DAG. The SimpleHttpOperator provides a way of accessing external APIs: through the method field we define the HTTP method (GET, POST, PUT, DELETE), the endpoint field specifies the endpoint of the API (in this case products), and the http_conn_id parameter receives the identifier of the connection, which will be defined next through the Airflow UI. As shown below, access the menu Admin > Connections, fill in the data as shown in the image below and then save.

As for the PythonOperator, we are only using it to execute a Python function called _write_response, which uses XComs: through the task_id of the write_response task it is possible to retrieve the result of the response and use it in any part of the code. In this scenario we use the result retrieved from the API to write the file. XCom is a communication mechanism between different tasks that makes Airflow very flexible. Tasks can be executed on different machines, and with XComs, communication and information exchange between tasks becomes possible.

Finally, we define the execution of the tasks and their dependencies. Note that we use the >> operator, which defines the order of execution between the tasks. In our case, API access and extraction must be performed before writing to the file: extract_data >> write_response.

After executing the DAG, it is possible to access the file that was generated with the result of the extraction; just access one of the workers via the terminal (in this case there is only one). Run the following command to list the containers:

docker ps

A listing similar to the one below will be displayed. Notice that one of the lines in the NAMES column refers to the worker, in this case coffee_and_tips_airflow-worker_1. Continuing in the terminal, type the following command to access the Airflow directory where the extract_data.json file is located:

docker exec -it coffee_and_tips_airflow-worker_1 /bin/bash

That's it, now just open the file and check the content.

Conclusion

Once again we saw the power of Airflow for automated processes that require easy access to and integration of external APIs with few lines of code. In this example, we explored the use of XComs, which makes the exchange of messages between tasks that can run on different machines in a distributed environment more flexible. Hope you enjoyed!

  • Quick guide about Apache Kafka: Powering Event-Driven architecture

Introduction

In today's data-driven world, the ability to efficiently process and analyze vast amounts of data in real time has become a game-changer for businesses and organizations of all sizes. From e-commerce platforms and social media to financial institutions and IoT devices, the demand for handling data streams at scale is ever-increasing. This is where Apache Kafka steps in as a pivotal tool in the world of event-driven architecture. Imagine a technology that can seamlessly connect, process, and deliver data between countless systems and applications in real time. Apache Kafka, often referred to as a distributed streaming platform, is precisely that technology. It's the unsung hero behind the scenes, enabling real-time data flow and providing a foundation for a multitude of modern data-driven applications. In this quick guide we'll take a deep dive into Apache Kafka, unraveling its core concepts, architecture, and use cases. Whether you're new to Kafka or looking to deepen your understanding, this guide will serve as your compass on a journey through the exciting world of real-time data streaming. We'll explore the fundamental principles of Kafka, share real-world examples of its applications, and provide practical insights for setting up your own Kafka environment. So, let's embark on this adventure and discover how Apache Kafka is revolutionizing the way we handle data in the 21st century.

Key Concepts of Kafka

1. Topics

What Are Kafka Topics? In Kafka, a topic is a logical channel or category for data. It acts as a named conduit for records, allowing producers to write data to specific topics and consumers to read from them. Think of topics as a way to categorize and segregate data streams. For example, in an e-commerce platform, you might have topics like "OrderUpdates," "InventoryChanges," and "CustomerFeedback," each dedicated to a specific type of data.

Partitioning within Topics: One of the powerful features of Kafka topics is partitioning. When a topic is divided into partitions, it enhances Kafka's ability to handle large volumes of data and distribute the load across multiple brokers. Partitions are the unit of parallelism in Kafka, and they provide fault tolerance, scalability, and parallel processing capabilities. Each partition is ordered and immutable, and records within a partition are assigned a unique offset, a numeric identifier representing the position of a record within the partition. This offset is used by consumers to keep track of the data they have consumed, allowing them to resume from where they left off in case of failure or when processing real-time data.

Data organization: Topics provide a structured way to organize data. They are particularly useful when dealing with multiple data sources and data types. Topics also work as storage within the Kafka context: data sent by producers is organized into topics and partitions.

Publish-Subscribe Model: Kafka topics implement a publish-subscribe model, where producers publish data to a topic and consumers subscribe to the topics of interest to receive the data. A useful analogy is subscribing to a newsletter: when a new article is posted, you, as a subscriber, receive it.

Scalability: Topics can be split into partitions, allowing Kafka to distribute data across multiple brokers for scalability and parallel processing.

Data Retention: Each topic can have its own data retention policy, defining how long data remains in the topic. This makes it easier to manage data volume and free up space when data is no longer needed.
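To illustrate topics and partitions in practice, the sketch below creates a topic with three partitions using the Kafka AdminClient for Java. The broker address (localhost:9092), the replication factor, and the reuse of the OrderUpdates topic name are illustrative assumptions, not part of the original article.

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.Collections;
import java.util.Properties;

public class CreateTopicExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Broker address is an illustrative assumption
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // Topic with 3 partitions and replication factor 1 (single-broker setup)
            NewTopic orderUpdates = new NewTopic("OrderUpdates", 3, (short) 1);
            admin.createTopics(Collections.singletonList(orderUpdates)).all().get();
        }
    }
}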
2. Producers

In Kafka, a producer is a crucial component responsible for sending data to Kafka topics. Think of producers as information originators—applications or systems that generate and publish records to specific topics within the Kafka cluster. These records could represent anything from user events on a website to system logs or financial transactions. Producers are the source of truth for data in Kafka. They generate records and push them to designated topics for further processing. They also decide which topic a message will be sent to based on the nature of the data, which ensures that data is appropriately categorized within the Kafka ecosystem.

Data Type: Producers usually send messages in JSON format, which makes it easier to transfer the data into storage.

Acknowledgment Handling: Producers can handle acknowledgments from the Kafka broker, ensuring that data is successfully received and persisted. This acknowledgment mechanism contributes to data reliability.

Sending data to specific partitions: Producers can send messages directly to a specific partition within a topic.

3. Consumers

Consumers are important components in the Kafka context; they are responsible for consuming data and making it available to their applications. Basically, consumers subscribe to Kafka topics, and any data produced there will be received by the consumers, following the pub/sub approach (a short producer/consumer sketch appears at the end of this article).

Subscribing to Topics: Consumers actively subscribe to Kafka topics, indicating their interest in specific streams of data. This subscription model enables consumers to receive relevant information aligned with their use case.

Data Processing: Consumers continuously receive new data from topics, and each consumer is responsible for processing this data according to its needs. A microservice that works as a consumer, for example, can consume data from a topic responsible for storing application logs and perform any processing before delivering it to the user or to other third-party applications.

Integration between apps: As mentioned previously, Kafka enables applications to easily integrate their services across varied topics and consumers. One of the most common use cases is integration between applications. In the past, applications needed to connect to different databases to access data from other applications; this created vulnerabilities and violated principles of responsibility between applications. Technologies like Kafka make it possible to integrate different services using the pub/sub pattern, where different consumers represented by applications can access the same topics and process this data in real time without the need to access third-party databases or any other data source, avoiding security risks and adding agility to the data delivery process.

4. Brokers

Brokers are fundamental pieces of Kafka's architecture; they are responsible for mediating and managing the exchange of messages between producers and consumers. Brokers manage the storage of data produced by producers and guarantee reliable transmission of data within a Kafka cluster. In practice, brokers have a transparent role within a Kafka cluster, but below I highlight some of their responsibilities that make all the difference to the functioning of Kafka.

Data reception: Brokers are responsible for receiving the data; they function as an entry point or proxy for the data produced and then manage all the storage so that it can be consumed by any consumer.
Fault tolerance: As with any data architecture, we need to think about fault tolerance. In the context of Kafka, brokers are responsible for ensuring that even in the event of failures, data remains durable and highly available. Brokers manage the partitions within the topics and are able to replicate the data, anticipating failures and reducing the possibility of data loss.

Data replication: As mentioned in the previous item, data replication is a way to reduce data loss in case of failures. Replication is done by keeping multiple replicas of partitions stored on different brokers, so that even if one broker fails, the data is replicated on several others.

Responsible for managing partitions: We discussed partitions within topics earlier, but we did not mention who manages them. Each partition is managed by a broker, which coordinates reads and writes to that partition and helps distribute the data load across the cluster. In short, brokers perform orchestration work within a Kafka cluster, managing the reads and writes done by producers and consumers and ensuring that message exchanges are carried out without data loss in the event of failures, thanks to the data replication they also manage.

Conclusion

Apache Kafka stands as a versatile and powerful solution, addressing the complex demands of modern data-driven environments. Its scalable, fault-tolerant, and real-time capabilities make it an integral part of architectures handling large-scale, dynamic data streams. Kafka has been adopted by companies across different business sectors, such as LinkedIn (where Kafka was originally created), Netflix, Uber, Airbnb, Walmart, Goldman Sachs, Twitter and more.
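To tie the concepts above together, here is a minimal sketch that publishes a message to the OrderUpdates topic and then consumes it using the official Kafka Java client. The broker address (localhost:9092), consumer group name, and message payload are illustrative assumptions.

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class OrderUpdatesExample {

    public static void main(String[] args) {
        // Producer: publishes a JSON message to the OrderUpdates topic
        Properties producerProps = new Properties();
        producerProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        producerProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        producerProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
            producer.send(new ProducerRecord<>("OrderUpdates", "order-1", "{\"orderId\":1,\"status\":\"CREATED\"}"));
        }

        // Consumer: subscribes to the same topic and processes incoming records
        Properties consumerProps = new Properties();
        consumerProps.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        consumerProps.put(ConsumerConfig.GROUP_ID_CONFIG, "order-service");
        consumerProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        consumerProps.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        consumerProps.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps)) {
            consumer.subscribe(Collections.singletonList("OrderUpdates"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
            for (ConsumerRecord<String, String> record : records) {
                // Each record carries the partition and offset discussed in the Topics section
                System.out.printf("partition=%d offset=%d value=%s%n",
                        record.partition(), record.offset(), record.value());
            }
        }
    }
}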

  • Differences between Future and CompletableFuture

    Introduction In the realm of asynchronous and concurrent programming in Java, Future and CompletableFuture serve as essential tools for managing and executing asynchronous tasks. Both constructs offer ways to represent the result of an asynchronous computation, but they differ significantly in terms of functionality, flexibility, and ease of use. Understanding the distinctions between Future and CompletableFuture is crucial for Java developers aiming to design robust and efficient asynchronous systems. At its core, a Future represents the result of an asynchronous computation that may or may not be complete. It allows developers to submit tasks for asynchronous execution and obtain a handle to retrieve the result at a later point. While Future provides a basic mechanism for asynchronous programming, its capabilities are somewhat limited in terms of composability, exception handling, and asynchronous workflow management. On the other hand, CompletableFuture introduces a more advanced and versatile approach to asynchronous programming in Java. It extends the capabilities of Future by offering a fluent API for composing, combining, and handling asynchronous tasks with greater flexibility and control. CompletableFuture empowers developers to construct complex asynchronous workflows, handle exceptions gracefully, and coordinate the execution of multiple tasks seamlessly. In this article, we will dive deeper into the differences between Future and CompletableFuture, exploring their respective features, use cases, and best practices. By understanding the distinct advantages and trade-offs of each construct, developers can make informed decisions when designing asynchronous systems and leveraging concurrency in Java applications. Let's embark on a journey to explore the nuances of Future and CompletableFuture in the Java ecosystem. Use Cases for Future Parallel Processing: Use Future to parallelize independent tasks across multiple threads and gather results asynchronously. For example, processing multiple files concurrently. Asynchronous IO: When performing IO operations that are blocking, such as reading from a file or making network requests, you can use Future to perform these operations in separate threads and continue with other tasks while waiting for IO completion. Task Execution and Coordination: Use Future to execute tasks asynchronously and coordinate their completion. For example, in a web server, handle multiple requests concurrently using futures for each request processing. Timeout Handling: You can set timeouts for Future tasks to avoid waiting indefinitely for completion. This is useful when dealing with resources with unpredictable response times. Use Cases for CompletableFuture Async/Await Pattern: CompletableFuture supports a fluent API for chaining asynchronous operations, allowing you to express complex asynchronous workflows in a clear and concise manner, similar to the async/await pattern in other programming languages. Combining Results: Use CompletableFuture to combine the results of multiple asynchronous tasks, either by waiting for all tasks to complete (allOf) or by combining the results of two tasks (thenCombine, thenCompose). Exception Handling: CompletableFuture provides robust exception handling mechanisms, allowing you to handle exceptions thrown during asynchronous computations gracefully using methods like exceptionally or handle. 
Dependency Graphs: You can build complex dependency graphs of asynchronous tasks using CompletableFuture, where the completion of one task triggers the execution of another, allowing for fine-grained control over the execution flow. Non-blocking Callbacks: CompletableFuture allows you to attach callbacks that are executed upon completion of the future, enabling non-blocking handling of results or errors. Completing a Future Manually: Unlike Future, you can complete a CompletableFuture manually using methods like complete, completeExceptionally, or cancel. This feature can be useful in scenarios where you want to provide a result or handle exceptional cases explicitly.

Examples

Creation and Completion

Future code example of creation and completion:

ExecutorService executor = Executors.newSingleThreadExecutor();
Future<Integer> future = executor.submit(() -> {
    Thread.sleep(2000);
    return 10;
});

CompletableFuture code example of creation and completion:

CompletableFuture<Integer> completableFuture = CompletableFuture.supplyAsync(() -> {
    try {
        Thread.sleep(2000);
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
    return 10;
});

With CompletableFuture, the supplyAsync method allows for asynchronous execution without the need for an external executor service, as shown in the first example.

Chaining Actions

A plain Future offers no fluent way to chain actions; you have to block with get() and then transform the result:

Future<Integer> future = executor.submit(() -> 10);
String result = "Result: " + future.get(); // blocks until the computation finishes

Now, an example of how to chain actions using CompletableFuture:

CompletableFuture<Integer> completableFuture = CompletableFuture.supplyAsync(() -> 10);
CompletableFuture<String> result = completableFuture.thenApply(i -> "Result: " + i);

CompletableFuture offers a fluent API (thenApply, thenCompose, etc.) to chain actions, making it easier to express asynchronous workflows.

Exception Handling

Handling an exception using Future:

Future<Integer> future = executor.submit(() -> {
    throw new RuntimeException("Exception occurred");
});

Handling an exception using CompletableFuture:

CompletableFuture<Integer> completableFuture = CompletableFuture.supplyAsync(() -> {
    throw new RuntimeException("Exception occurred");
});

CompletableFuture allows for more flexible exception handling using methods like exceptionally or handle.

Waiting for Completion

// Future
Integer result = future.get();

// CompletableFuture
Integer result = completableFuture.get();

Both Future and CompletableFuture provide the get() method to wait for the completion of the computation and retrieve the result.

Combining Multiple CompletableFutures

CompletableFuture<Integer> future1 = CompletableFuture.supplyAsync(() -> 10);
CompletableFuture<Integer> future2 = CompletableFuture.supplyAsync(() -> 20);
CompletableFuture<Integer> combinedFuture = future1.thenCombine(future2, (x, y) -> x + y);

CompletableFuture provides methods like thenCombine, thenCompose, and allOf to perform combinations or compositions of multiple asynchronous tasks.
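The allOf method mentioned above can be used to wait for a group of independent futures and then combine their results. Here is a minimal sketch in the same style as the snippets above:

CompletableFuture<Integer> f1 = CompletableFuture.supplyAsync(() -> 10);
CompletableFuture<Integer> f2 = CompletableFuture.supplyAsync(() -> 20);
CompletableFuture<Integer> f3 = CompletableFuture.supplyAsync(() -> 30);

// Completes when all three futures are done
CompletableFuture<Void> all = CompletableFuture.allOf(f1, f2, f3);

// Once everything has completed, join() no longer blocks and we can combine the results
CompletableFuture<Integer> total = all.thenApply(v -> f1.join() + f2.join() + f3.join());

System.out.println(total.join()); // prints 60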
Conclusion

In the dynamic landscape of asynchronous and concurrent programming in Java, both Future and CompletableFuture stand as indispensable tools, offering distinct advantages and use cases. While Future provides a basic mechanism for representing the result of asynchronous computations, its capabilities are somewhat limited when it comes to composability, exception handling, and asynchronous workflow management. On the other hand, CompletableFuture emerges as a powerful and flexible alternative, extending the functionalities of Future with a fluent API for composing, combining, and handling asynchronous tasks with greater control and elegance. The choice between Future and CompletableFuture hinges on the specific requirements and complexities of the task at hand. For simple asynchronous operations or when working within the confines of existing codebases, Future may suffice. However, in scenarios that demand more sophisticated asynchronous workflows, exception handling, or task coordination, CompletableFuture offers a compelling solution with its rich feature set and intuitive API.

  • How to create a serverless app with AWS SAM

For this post, I will teach you how to create a serverless app with AWS SAM. AWS SAM (Serverless Application Model) is an extension of AWS CloudFormation, specifically designed for serverless application development and deployment using well-known serverless services such as AWS Lambda, API Gateway and DynamoDB, among other AWS features.

Level of abstraction

AWS SAM is an application-level tool primarily focused on building and deploying serverless applications on AWS. It provides higher-level abstractions to facilitate the development and deployment of serverless applications, with a focus on the AWS services needed to support this type of architecture, i.e. the whole focus is on AWS and not on another cloud. AWS SAM also provides a way to generate the project's code locally and makes it possible to run tests, build and deploy through the SAM CLI.

How to install AWS SAM

Go to this link and follow the steps according to each operating system.

How to create a serverless project

After installing, manage your project locally through a terminal by generating the necessary files and then deploying the application. First, go to the folder where you want to generate your serverless resource and open the terminal. Type the following command in the terminal to start SAM:

sam init

After typing, a prompt will appear with some options for you to fill in your project information. Above we have 2 options to generate our initial template; type 1 to choose option 1 - AWS Quick Start Templates. After typing, a new list will be shown with some template options. Note that each option boils down to a resource such as a Lambda, a Dynamo table or even a simple API using API Gateway. For this scenario, let's create a Dynamo table; in this case, type option 13 and press enter. After that, some questions will be asked; just type y to proceed until a new screen asking for the project information appears as below. Type the name of the project you want and press enter. In our case I typed the following name for the project, dynamo-table-using-aws-sam, as in the image below. After typing the project name, the template and files containing the base code will be available and ready for deployment. Access the folder and see that a file called template.yaml has been created containing information about the resources that will be created. It's very similar to a CloudFormation template, but shorter. Open the file and notice that several helper resources have been mapped into the template, such as Dynamo itself, a Lambda and an API Gateway. Some base code related to the Lambda and some unit tests that allow local invocations were also created.

How to deploy

Now that our template and base code have been generated, it's time to create the Dynamo table on AWS; just follow the next steps. Access the terminal again and type the following command:

sam deploy --guided

After executing this command, the following options will be shown in the terminal prompt for completion. For the Stack Name field, enter a value that will be the identifier of that stack, which will be used by CloudFormation to create the necessary resources. When in doubt, follow what was typed as per the image above, in this case dynamo-stack. After filling in all the fields, a summary of what will be created will be presented as shown in the image below. Finally, one last question will be asked about whether you want to deploy; just type y to confirm.
After confirming the operation, the progress of creating the resources will be displayed in the terminal until the end of the process. With the deploy finished, notice again the resources that were created. Now just access the AWS console and check the table created in Dynamo.

Deleting Resources

If necessary, you can delete the resources via the SAM CLI; just run the command below:

sam delete --stack-name dynamo-stack

The dynamo-stack value refers to the identifier we typed earlier in the Stack Name field, remember? Use the same value to delete the entire created stack. After typing the command above, just confirm the next steps. That's how simple it is to create a serverless resource with AWS SAM; there are some advantages and disadvantages, and it all depends on your strategy. Hope you enjoyed!

  • Understanding the different Amazon S3 Storage Classes

What are Amazon S3 Storage Classes?

Amazon S3 (Simple Storage Service) provides a strategic way to organize objects in different tiers, where each tier has its own particularities that we will detail later. The storage classes are characterized by offering different levels of durability, availability, performance and cost. Because of this, you must understand which strategy to use to keep your objects at the best cost-benefit. Next, we'll detail each class, describing its advantages and disadvantages.

S3 Standard

The S3 Standard storage class is the default and most widely used option for Amazon S3. It is designed to provide high durability, availability, and performance for frequently accessed objects.

Advantages: S3 Standard is the most common class for storing and accessing objects frequently, as it is the tier that offers low latency, which allows it to be used for different use cases where dynamic access to objects is essential. Another advantage is the durability of 99.999999999%, which means that the chances of objects being corrupted or even lost are very low. As for availability, this class provides an SLA of 99.99%, which means that the objects are highly available for access.

Disadvantages: S3 Standard has some disadvantages compared to other classes. One of them is the high storage cost for rarely accessed objects. That's why it's important to define lifecycle policies to deal with infrequently accessed objects; in this case, the S3 Standard-Infrequent Access class would be more appropriate for this context. We will talk about this class shortly. Another disadvantage is related to accessing newly created objects: even though low latency is one of this class's main characteristics, newly created objects may not be immediately available in all regions, and it may take time for objects to become available in some regions, causing higher latency.

S3 Intelligent-Tiering

The S3 Intelligent-Tiering storage class provides a mechanism that automatically moves objects, based on their usage pattern, to more suitable tiers, aiming for lower storage costs.

Advantages: The name itself says a lot about one of the advantages of using S3 Intelligent-Tiering. This class is capable of managing objects based on their usage pattern, so objects that are rarely accessed are moved by the class itself to more suitable tiers, aiming for lower storage costs. S3 Intelligent-Tiering automatically monitors and moves objects to the most suitable tiers according to the usage pattern; generally this involves three tiers: a tier optimized for frequently accessed objects, a tier optimized for rarely accessed objects, which according to AWS generates savings of up to 40%, and a last tier targeted at objects that are very rarely accessed, generating storage savings of around 68%. Another advantage is that there's no charge for data access using S3 Intelligent-Tiering, only for storage and transfer.

Disadvantages: There can be an increase in latency for objects accessed for the first time, because when objects are moved to more suitable tiers, there's the possibility of higher latency for these rarely accessed objects.

S3 Standard-Infrequent Access (S3 Standard-IA)

A suitable class for storing objects that are accessed less frequently but need to be available for quick access, keeping latency low. It is a typical class for storing long-term data.
Advantages: The storage cost is lower compared to the S3 Standard class, while maintaining the same durability characteristics. Regarding availability, it has the same characteristics as the S3 Intelligent-Tiering class, with a 99.9% SLA. It also allows fast access to data by offering a high throughput rate. There is a minimum storage charge, unlike classes such as S3 Standard and S3 Intelligent-Tiering.

Disadvantages: Data access is charged per gigabyte retrieved. So, depending on the frequency of access and the volume accessed, it might be better to keep the data in a tier like S3 Standard. Everything will depend on your strategy.

S3 One Zone-Infrequent Access (S3 One Zone-IA)

An ideal storage class for objects that are accessed infrequently and that only need to be available in one Availability Zone. AWS itself suggests this class for secondary backup copies.

Advantages: The cost is lower compared to other storage classes, as the data is stored in only one zone, making it a low-cost option.

Disadvantages: Unlike other storage classes, where objects are stored in at least 3 Availability Zones (AZs), S3 One Zone-Infrequent Access keeps data in only 1 zone, meaning that there is no redundancy, so there's a possibility of data loss if that zone fails.

S3 Glacier Instant Retrieval

S3 Glacier Instant Retrieval is part of the Glacier family, which features low-cost storage for rarely accessed objects. It's an ideal storage class for archiving data that still needs immediate access.

Advantages: Low storage costs. It has the same availability as the S3 Intelligent-Tiering and S3 Standard-IA classes. It provides redundancy, which means that the data is replicated to at least 3 Availability Zones (AZs).

Disadvantages: Although it offers immediate data retrieval while maintaining the same throughput as classes like S3 Standard and S3 Standard-IA, the cost becomes high when it's necessary to retrieve this data frequently in a short period.

S3 Glacier Flexible Retrieval

S3 Glacier Flexible Retrieval is the storage class previously called just S3 Glacier; like any other class of the Glacier family, it is designed to store long-lived objects. This class is ideal for objects that are accessed 1 to 2 times a year and that can be retrieved asynchronously, without immediate access.

Advantages: This class is ideal for keeping objects that don't require immediate retrieval, which is a cost advantage. For data such as backups, where retrieval is very rare, this class does not charge retrieval costs, based on the idea that the frequency of access to this data is very close to zero.

Disadvantages: Retrieval time can be slow for some scenarios. As a characteristic of the class itself, S3 Glacier Flexible Retrieval may fall short when immediate access to data is required.

S3 Glacier Deep Archive

The lowest-cost storage class among the Glacier family. It is ideal for storing data that is accessed 1 to 2 times a year. AWS suggests using this class for scenarios where data has to be kept for 8 to 10 years in order to comply with compliance regulations or any other rules related to long-term data retention.

Advantages: The lowest cost among classes in the same segment, with 99.99% availability. The class is available in at least 3 Availability Zones (AZs) and is ideal for data that requires long retention periods.

Disadvantages: Long retrieval time. So, if you need quick data retrieval, this SLA may not meet your expectations, since this class assumes data is rarely accessed and retrieval costs can be higher depending on the frequency of access.
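As a practical note on how these classes are selected, the sketch below uploads an object with an explicit storage class using the AWS SDK for Java v2. The bucket name, object key, and the choice of S3 Standard-IA are illustrative assumptions.

import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;
import software.amazon.awssdk.services.s3.model.StorageClass;

public class UploadWithStorageClass {
    public static void main(String[] args) {
        // Uses the default region and credentials providers
        try (S3Client s3 = S3Client.create()) {
            // Store an infrequently accessed report directly in S3 Standard-IA
            PutObjectRequest request = PutObjectRequest.builder()
                    .bucket("my-example-bucket")           // illustrative bucket name
                    .key("reports/2023/annual-report.csv") // illustrative key
                    .storageClass(StorageClass.STANDARD_IA)
                    .build();

            s3.putObject(request, RequestBody.fromString("report content"));
        }
    }
}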
Well that's it, I hope you enjoyed it!

  • Creating a Spring Boot project with Intellij IDEA

Sometimes we have to create a new project for some reason: study, work or just a test. There are also a lot of tools that help us create one; in this tutorial I will show you how to create a new Spring Boot project directly from your IntelliJ IDE. For this tutorial I am using the latest version of IntelliJ IDEA (2023.1).

Creating the project:

The first step is creating a new project; go to: File > New > Project. After that, select Spring Initializr and fill in your project information. In this window you fill in:

Name: the name of your project
Location: the place where the project will be saved
Language: the language of your project
Type: the dependency management tool that will help us with the dependencies, Gradle or Maven
Group: the name of the base packages
Artifact: the name of the artifact
Package name: the base package that will store your classes
JDK: the local Java JDK that you will use
Packaging: the type of package the project will generate; for Spring Boot we use Jar

When you click Next, you can choose all the dependencies of your project. If you'd like to create a Spring Boot REST API, look for the Spring Web dependency as in the following image. When you finish choosing all the dependencies, click Create to generate the project. At this moment your project is created and the IDE will download all the dependencies and apply the configurations. Now you can start coding!

Links: IntelliJ IDEA - https://www.jetbrains.com/idea/

  • Database version control with Flyway and Spring boot

When we're working with microservices, one of the goals is to have self-contained applications. The database in general is one of the items we have to handle, and in a lot of cases it is managed outside of the application. One framework that allows us to version the database with migrations is Flyway (https://flywaydb.org/). Flyway helps us bring all the changes of the database into your Spring Boot project through SQL scripts and some metadata to handle all the changes of the database. The advantage of this method is that anyone with the project will have the same state of the DB, in general a copy of the development or production database. In this article I'll show how to configure Flyway in a Spring Boot project.

Creating the project

To create the project, we'll use the official Spring site to set up a new project. First access the website: https://start.spring.io/ When the website is open, you can set the configuration like the following image. When it's done, you can click on the GENERATE button to download the configured project. After that, you can import it into your IDE. I will use IntelliJ IDEA (https://www.jetbrains.com/idea/).

Understanding Flyway

If you open the file pom.xml, you will see the flyway-core dependency like this: In the project structure you will see the folder db.migration; we will save all the SQL files inside this folder. When we start up the project, one of the tasks is to check whether any new script was included in the project; if there is a new one, the project will run it against the database. To create a new script, we have to follow a pattern in the name of the file. The pattern includes a number that is incremented, which helps Flyway determine the sequence of the migration execution. For this tutorial we will create a script like the following example and use V1, V2, V3 to increment the new files: V1__create_base_tables.sql

Creating the first file

Create the new file called V1__create_base_tables.sql in the db.migration folder, following the script below:

Configuring the database

To simplify this tutorial I will use the H2 database (an in-memory DB) to show how Flyway works. We need to set up the project with the H2 parameters. In the pom.xml file add the following dependency: And next we will set the login settings in the project; in the application.properties file add the following settings: After running, you'll see logs similar to these on the console:

2023-04-07 14:12:29.896 INFO 8012 --- [ main] o.f.c.i.database.base.BaseDatabaseType : Database: jdbc:h2:mem:testdb (H2 2.1)
2023-04-07 14:12:30.039 INFO 8012 --- [ main] o.f.core.internal.command.DbValidate : Successfully validated 1 migration (execution time 00:00.037s)
2023-04-07 14:12:30.055 INFO 8012 --- [ main] o.f.c.i.s.JdbcTableSchemaHistory : Creating Schema History table "PUBLIC"."flyway_schema_history" ...
2023-04-07 14:12:30.132 INFO 8012 --- [ main] o.f.core.internal.command.DbMigrate : Current version of schema "PUBLIC": << Empty Schema >>
2023-04-07 14:12:30.143 INFO 8012 --- [ main] o.f.core.internal.command.DbMigrate : Migrating schema "PUBLIC" to version "1 - create base tables"
2023-04-07 14:12:30.177 INFO 8012 --- [ main] o.f.core.internal.command.DbMigrate : Successfully applied 1 migration to schema "PUBLIC", now at version v1 (execution time 00:00.057s)
2023-04-07 14:12:30.477 INFO 8012 --- [ main] o.hibernate.jpa.internal.util.LogHelper : HHH000204: Processing PersistenceUnitInfo [name: default]

When we insert a new script, like V2__new_tables.sql, Flyway will execute only the new script. Consideration: in this case we're using an in-memory database, so when the application stops all data will be lost. When we start it again with the second script, Flyway will initialize the database again, running all the scripts. In the next posts I will cover a real database and explore some use cases.

Conclusion

Versioning the database from the project gives us some advantages, like giving all the developers a mirror of the database. When we have any modifications, the application will handle them and apply them to the development or production environment.

References

For more details you can see the official Spring documentation: https://docs.spring.io/spring-boot/docs/current/reference/html/howto.html#howto.data-initialization.migration-tool.flyway
Creating migrations: https://flywaydb.org/documentation/tutorials/baselineMigrations
H2 database: http://www.h2database.com/html/tutorial.html

  • Tutorial: Kinesis Firehose Data Transformation with Terraform and Java

Introduction

AWS provides different ways to transform data through its services, and one of my favorites is Kinesis Firehose Data Transformation. This is a strategy for transforming data by leveraging the stream service to deliver data to a destination. For this tutorial, we're going to use the strategy below: Kinesis Firehose will send data and, instead of writing it to the S3 bucket, it will invoke a Lambda to transform that data and then send it back to Kinesis Firehose, which will deliver the same data to S3.

Creating the project

For this post we'll use Java as the language and Maven as the dependency manager. Therefore, it's necessary to generate a Maven project that will create the structure of our project. If you don't know how to generate a Maven project, I recommend seeing this post where I show how to generate it.

Project structure

After generating the Maven project and importing it into your IDE, we're going to create the same files and packages shown on the side, except for the pom.xml that was created by the Maven generator. Inside the java/ folder, create a package called coffee.tips.lambda and also create a Java class called Handler.java inside this same package. Now, create a package called model inside coffee.tips and then create two Java classes: Record.java and Records.java. Lastly, create a new package called status and also create an enum called Status.

Creating the Record Class

Why do we need to create the Record class? Kinesis Firehose expects an object as the return value, containing the fields below. When Kinesis Firehose invokes the Lambda to transform data, the Lambda must return an object containing these filled fields.

recordId: This field value must contain the same ID as the incoming Kinesis record.
result: This field value controls the transformation status result. The possible values are: Ok (record successfully transformed), Dropped (record dropped intentionally according to your processing logic) and ProcessingFailed (data could not be transformed).
data: The transformed data payload, after the data has been encoded to Base64.

This model must contain these parameters; otherwise, Kinesis Firehose rejects it and treats it as a data transformation failure.

Creating the Records Class

The Records class will be our response Java class, containing a list of Record type.

Creating the Status Enum

I decided to create the Status enum just to keep the code elegant, but it's useful when we need to map different values for a specific context. This enum will be used in our transformation logic.

Creating the Handler Class

The Handler class will be our controller for the Lambda. This Lambda will be invoked by Kinesis Firehose, passing parameters containing the data to be transformed. Note that the handleRequest method receives a parameter called input of type KinesisFirehoseEvent, which contains the records sent by Kinesis Firehose, and the same method returns an object of type Records containing a list of records that will later be sent back to Kinesis Firehose for delivery to S3. Within an iteration using the Java Stream API, we create some conditions just to explore how the result field works. Depending on the condition, we set the result value to Dropped, which means the data won't be delivered to the destination. Otherwise, for those set to Ok, the data will be delivered. Another detail is that you can change values during execution: we set "TECH" as the value of the TICKER_SYMBOL field when the SECTOR value is TECHNOLOGY. It's one way to transform data. A simplified sketch of this handler is shown below.
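The sketch below is a simplified illustration of the structure described above, not the exact code from the original project: the condition checks are reduced to plain string matching instead of JSON parsing, the Base64 decode/encode helpers are inlined, and the model classes are kept in the same file for brevity (in the project they live in the model package and the result values come from the Status enum).

package coffee.tips.lambda;

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.KinesisFirehoseEvent;

import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;

public class Handler implements RequestHandler<KinesisFirehoseEvent, Records> {

    @Override
    public Records handleRequest(KinesisFirehoseEvent input, Context context) {
        List<Record> results = new ArrayList<>();

        for (KinesisFirehoseEvent.Record firehoseRecord : input.getRecords()) {
            // Decode the payload sent by Kinesis Firehose
            String payload = StandardCharsets.UTF_8.decode(firehoseRecord.getData()).toString();

            Record transformed = new Record();
            transformed.setRecordId(firehoseRecord.getRecordId());

            if (payload.contains("\"CHANGE\":-")) {
                // Drop records whose CHANGE value is negative (simplified string check)
                transformed.setResult("Dropped");
            } else {
                if (payload.contains("\"SECTOR\":\"TECHNOLOGY\"")) {
                    // Transform the data: tag technology records with the TECH ticker symbol
                    payload = payload.replaceAll("\"TICKER_SYMBOL\"\\s*:\\s*\"\\w+\"", "\"TICKER_SYMBOL\":\"TECH\"");
                }
                transformed.setResult("Ok");
            }

            // Kinesis Firehose expects the returned payload encoded back to Base64
            transformed.setData(Base64.getEncoder().encodeToString(payload.getBytes(StandardCharsets.UTF_8)));
            results.add(transformed);
        }

        Records response = new Records();
        response.setRecords(results);
        return response;
    }
}

// Minimal response model, kept in this file only for brevity
class Record {
    private String recordId;
    private String result;
    private String data;
    public String getRecordId() { return recordId; }
    public void setRecordId(String recordId) { this.recordId = recordId; }
    public String getResult() { return result; }
    public void setResult(String result) { this.result = result; }
    public String getData() { return data; }
    public void setData(String data) { this.data = data; }
}

class Records {
    private List<Record> records;
    public List<Record> getRecords() { return records; }
    public void setRecords(List<Record> records) { this.records = records; }
}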
Finally, two other methods were created just to decode and encode the data, a requirement for the processing to work correctly.

Updating pom.xml

After generating our project via Maven, we need to add some dependencies and a Maven plugin to package the code and libraries for deployment. The pom.xml content follows below.

Creating resources with Terraform

Instead of creating the Kinesis Firehose, Lambda, policies and roles manually via the console, we're going to create them via Terraform. If you don't know much about Terraform, I recommend seeing this tutorial Getting started using Terraform on AWS. Inside the terraform folder, create the following files:

vars.tf content

The vars.tf file is where we declare the variables. Variables provide flexibility when we need to work with different resources.

vars.tfvars content

Now we need to set the values of these variables. So, let's create a folder called /development inside the terraform folder. After creating the folder, create a file called vars.tfvars as in the side image and paste the content below. Note that for the bucket field, you must specify the name of your own bucket (bucket names must be unique).

main.tf content

For this file, we just declare the provider. The provider is the cloud service we're going to use to create our resources. In this case, we're using AWS as the provider, and Terraform will download the necessary packages to create the resources. Note that for the region field, we're using the var keyword to assign the region value already declared in the vars.tfvars file.

s3.tf content

This file is where we declare resources related to S3. In this case, we only create the S3 bucket. But if you want to create more S3-related features like policies, roles and so on, you can declare them here.

lambda.tf content

The content below will be responsible for creating the AWS Lambda and its roles and policies. Note that in the same file we created a resource called aws_s3_object. It's a strategy to upload the JAR file directly to S3 after packaging. Keeping some files on S3 is a smart approach when we have large files.

Understanding lambda.tf content

1. We declared aws_iam_policy_document data sources that describe what actions the resources assigned to these policies can perform.
2. The aws_iam_role resource provides the IAM role and will control some of the Lambda's actions.
3. We declared the aws_s3_object resource because we want to store our JAR file on S3. So, during the deploy phase, Terraform will get the JAR file created in the target folder and upload it to S3.
depends_on: Terraform must create this resource before the current one.
bucket: The name of the bucket where the JAR file will be stored.
key: The JAR's name.
source: The source file's location.
etag: Triggers updates when the value changes.
4. aws_lambda_function is the resource responsible for creating the Lambda, and we need to fill in some fields such as:
function_name: The Lambda's name.
role: The Lambda role declared in previous steps, which provides access to AWS services and resources.
handler: In this field you need to pass the main class directory.
source_code_hash: This field is responsible for triggering Lambda updates.
s3_bucket: The name of the bucket where the JAR file generated during deploy will also be stored.
s3_key: The JAR's name.
runtime: Here you can set the programming language supported by Lambda. For this example, java11.
timeout: The Lambda's execution timeout.
5. aws_iam_policy provides IAM policies for the resources, where we define some actions to be performed. In this case, we define actions such as Lambda invocation and CloudWatch logging.
6. For the aws_iam_role_policy_attachment resource, we can attach IAM policies to IAM roles. In this case, we attached the lambda_iam_role and lambda_policies created previously.
7. Finally, we have the aws_lambda_permission resource; we need this resource to give Kinesis Firehose permission to invoke the Lambda.

kinesis.tf content

Understanding kinesis.tf content

1. We declared the aws_kinesis_firehose_delivery_stream resource and its fields, with the following details:
destination: That's the destination itself; Kinesis provides mechanisms to deliver data to S3 (extended_s3), Redshift, Elasticsearch (the OpenSearch service from AWS), Splunk and an HTTP endpoint (http_endpoint).
name: The Kinesis Firehose name.
depends_on: Kinesis Firehose will be created only if the S3 bucket already exists.
extended_s3_configuration:
1. bucket_arn: The S3 bucket, set with its ARN.
2. role_arn: The role ARN.
3. prefix: The S3 bucket folder where data will be stored. You can specify a time format using expressions such as "/year=!{timestamp:yyyy}/month=!{timestamp:MM}/".
4. error_output_prefix: For this field, you can define a path to store the processing failure results.
5. buffer_interval: The Kinesis Firehose buffer setting to deliver data at a specific interval.
6. buffer_size: The Kinesis Firehose buffer setting to deliver data at a specific size. Kinesis Firehose uses both options to handle buffering.
7. compression_format: There are some compression format options like ZIP, Snappy, HADOOP_SNAPPY and GZIP. For this tutorial, we chose GZIP.
processing_configuration: That's the block where we define which resource will process the data. In this case, AWS Lambda.
1. enabled: true to enable and false to disable.
2. type: The processor's type.
3. parameter_value: The Lambda function name with its ARN.
2. We declared aws_iam_policy_document data sources that describe what actions the resources assigned to these policies can perform.
3. The aws_iam_role resource provides the IAM role to control some Kinesis actions.
4. aws_iam_policy provides IAM policies for the resources, where we define some actions to be performed. In this case, we define S3 and some Lambda actions.
5. For the aws_iam_role_policy_attachment resource, we can attach IAM policies to IAM roles. In this case, we attached the firehose_iam_role and firehose_policies created previously.

Packaging

We've created our Maven project, the Handler class in Java and the Terraform files to create our resources on AWS. Now, let's run the following commands to deploy the project. First, open the terminal, make sure you're in the project's root directory and run the following Maven command:

mvn package

The above command will package the project, creating the JAR file to be deployed and uploaded to S3. To make sure, check the target folder and see that some files were created, including the lambda-kinesis-transform-1.0.jar file.

Running Terraform

Now, let's run some Terraform commands. Inside the terraform folder, run the following commands in the terminal:

terraform init

The above command will initialize Terraform, downloading Terraform libraries and also validating the Terraform files. For the next command, let's run the plan command to check which resources will be created:

terraform plan -var-file=development/vars.tfvars

After running, you'll see similar logs on the console. Finally, we can apply the changes to create the resources through the following command:

terraform apply -var-file=development/vars.tfvars

After running, you must confirm the actions by typing "yes". Now the provisioning is complete!
Sending messages

Well, now we need to send some messages to be transformed, and we're going to send them via the Kinesis Firehose console. Obviously there are other ways to send them, but for this tutorial we're going to use the easiest way. Open the Kinesis Firehose console and access the Delivery Stream option as shown in the image below. In the Test with demo data section, click the Start sending demo data button. After clicking, the messages will be sent through Kinesis Firehose and, according to the buffer settings, Kinesis Firehose will deliver the data after 2 minutes or once it reaches 1 MiB of data. Let's take a look at our Lambda and see the metrics: click on the Monitor tab, then the Metrics option, and note that the Lambda has been invoked and there are no errors.

Transformed data results

Now that we know everything is working fine, let's take a look at the transformed data directly on Amazon S3. Go and access the created S3 bucket. Note that many files were created. Let's read one of them and see the transformed data. Choose a file as in the image below, click on the Actions button and then on the Query with S3 Select option. Following the selected options in the image below, click on the Run SQL query button to see the result. Based on the image above, you can see that, according to Handler.java, we defined an algorithm that drops data with a CHANGE field value less than zero and, for records with a SECTOR field value equal to TECHNOLOGY, sets the TICKER_SYMBOL field value to TECH. This was an example of how you can transform data using Kinesis Firehose Data Transformation and Lambda as an inexpensive component to transform data.

Stop sending messages

You can stop sending messages before destroying the created resources via Terraform in order to save money. Just go back to the Kinesis Firehose console and click on the Stop sending demo data button.

Destroy

AWS billing charges will happen if you don't destroy these resources, so I recommend destroying them to avoid unnecessary charges. To do it, run the command below:

terraform destroy -var-file=development/vars.tfvars

Remember you need to confirm this operation, cool?

Conclusion

Kinesis Firehose definitely isn't just a service to deliver data. There's flexibility in integrating AWS services and the possibility of delivering data to different destinations, transforming data and applying logic according to your use case.

Github repository

Books to study and read

If you want to learn more and reach a high level of knowledge, I strongly recommend reading the following book(s): AWS Cookbook is a practical guide containing 70 familiar recipes about AWS resources and how to solve different challenges. It's a well-written, easy-to-understand book covering key AWS services through practical examples. AWS, or Amazon Web Services, is the most widely used cloud service in the world today; if you want to understand more about the subject to be well positioned in the market, I strongly recommend studying it.

Setup recommendations

If you are interested in knowing the setup I use to develop my tutorials, here it is: Notebook Dell Inspiron 15 15.6, Monitor LG Ultrawide 29WL500-29.

Well that's it, I hope you enjoyed it!

  • Understanding AWS Lambda in 3 minutes

AWS Lambda is a compute service with no servers to manage, also known as serverless computing. Lambda enables you to create backend applications using different programming languages such as Java, Python, Node.js, .NET, Ruby, Go and more. The best part about Lambda is that you don't need to worry about server instances to deploy and run it. There are no capacity provisioning responsibilities, as usually happens with EC2 instances, making it a cheaper alternative for composing architectures.

How it works

Lambdas are often used to compose architectures, being responsible for specific workloads. For example, using Lambda you can listen for files from an S3 bucket and process them to normalize the data, or you can use EventBridge (CloudWatch Events) to create schedules through a cron expression that trigger the Lambda to run a workload and shut down the process afterwards. As shown in the image below, we have some examples of AWS Lambda integrations, so you can use them to invoke Lambdas for a variety of scenarios.

Limitations

Lambdas can run for up to 15 minutes, so if you want to try it out, be careful handling workloads that take longer than 15 minutes.

Integrations

As mentioned earlier, AWS Lambda allows various service integrations to be used as a trigger. If you want to listen to objects created in an S3 bucket, you can use S3 as a trigger. If you need to process notifications from SNS, you can also set Amazon Simple Notification Service (SNS) as the trigger and the Lambda will receive all the notifications to process. Note that there are different scenarios that Lambda can solve efficiently. Here you can see a complete list of the integrated services.

Prices

AWS has certain policies regarding the use of each service. Lambdas are basically billed by the number of requests and code execution time. For more details, see here.

Use Cases

Here are some examples where the use of Lambda can be an interesting option.
Data processing: Imagine you must normalize unstructured files into semi-structured ones to be read by some process. In this case it is possible to listen to an S3 bucket looking for new objects to be transformed.
Security: A Lambda that updates an application's user tokens.
Data Transform: You can use Kinesis/Firehose as a trigger, so that Lambda can listen to each event, transform it, and send it back to Kinesis to deliver the data to S3.

Benefits

Price: Pay just for requests and code runtime.
Serverless: No application server needed.
Integrated: Lambda provides integration with AWS services.
Programming Language: It's possible to use the main programming languages.
Scaling and Concurrency: Allows you to control concurrency and scale the number of executions up to the account limit.

Books to study and read

If you want to learn more and reach a high level of knowledge, I strongly recommend reading the following book(s): AWS Cookbook is a practical guide containing 70 familiar recipes about AWS resources and how to solve different challenges. It's a well-written, easy-to-understand book covering key AWS services through practical examples. AWS, or Amazon Web Services, is the most widely used cloud service in the world today; if you want to understand more about the subject to be well positioned in the market, I strongly recommend studying it.

Well that's it, I hope you enjoyed it!

  • Converting Parquet table to Delta Table

For this post we're going to create examples of how to convert a Parquet table to a Delta table. First, we'll create a Parquet table from scratch through a Spark Dataframe and then convert it to a Delta table. Using a Delta table has some benefits compared to a Parquet table: Delta enables you to restore versions of your table through the time travel function, supports ACID transactions and more.

Creating a Parquet table

First of all, let's create a Parquet table to be converted later to a Delta table. I prefer to create a Parquet table from scratch to bring a better understanding. The following code will be executed once, just to create the Parquet table. We're going to use a Spark Dataframe that will be loaded from a JSON file containing semi-structured records.

public static void main(String[] args){
    SparkConf conf = new SparkConf();
    conf.setAppName("spark-delta-table");
    conf.setMaster("local[1]");

    SparkSession session = SparkSession.builder()
            .config(conf)
            .getOrCreate();

    Dataset<Row> dataFrame = session.read().json("product.json");
    dataFrame.write().format("parquet").save("table/product");
}

In the above example, we start by creating a SparkSession object to create and manage a Spark Dataframe that is loaded from the product.json file content. After loading, the Dataframe is written as a table in Parquet format in the table/product directory.

JSON content

The file product.json contains semi-structured records:

{"id":1, "name":"rice", "price":12.0, "qty": 2}
{"id":2, "name":"beans", "price":7.50, "qty": 5}
{"id":3, "name":"coke", "price":5.50, "qty": 2}
{"id":4, "name":"juice", "price":3.80, "qty": 1}
{"id":5, "name":"meat", "price":1.50, "qty": 1}
{"id":6, "name":"ice-cream", "price":6.0, "qty": 2}
{"id":7, "name":"potato", "price":3.70, "qty": 10}
{"id":8, "name":"apple", "price":5.60, "qty": 5}

After running the code above, Parquet files will be generated in the table/product directory, containing the files below.

Converting the Parquet table to a Delta table

Now that we have a Parquet table already created, we can easily convert it to a Delta table. Let's do this.

public static void main(String[] args){
    SparkConf conf = new SparkConf();
    conf.setAppName("spark-delta-table");
    conf.setMaster("local[1]");

    SparkSession session = SparkSession.builder()
            .config(conf)
            .getOrCreate();

    DeltaTable.convertToDelta(session, "parquet.`table/product`");
}

The DeltaTable.convertToDelta method is responsible for converting the Parquet table to a Delta table. Note that we had to pass the SparkSession as a parameter and also specify the path of the Parquet table using the format "parquet.`<path>`". The result after execution can be seen in the picture below. After the conversion runs, Delta creates the famous _delta_log directory containing commit info and checkpoint files.

Books to study and read

If you want to learn more and reach a high level of knowledge, I strongly recommend reading the following book(s): AWS Cookbook is a practical guide containing 70 familiar recipes about AWS resources and how to solve different challenges. It's a well-written, easy-to-understand book covering key AWS services through practical examples. AWS, or Amazon Web Services, is the most widely used cloud service in the world today; if you want to understand more about the subject to be well positioned in the market, I strongly recommend studying it.

Well that's it, I hope you enjoyed it.

  • Understanding Delta Lake Time Travel in 2 minutes

    Delta Lake provides a way to version data for operations like merge, update and delete. This makes the data life cycle inside Delta Lake transparent. For each operation a new version is created, so a table with multiple operations will have several versions. Delta Lake offers a mechanism to navigate over the different versions called Time Travel. It's a temporary way to access data from the past. For this post we're going to use this feature to see different versions of a table. Below we have a Delta table called people whose versions were all generated through write operations using append mode. Current version When we perform a simple read on a table, the current version is always the most recent one. So, for this scenario, the current version is 2 (two). Note that we don't need to specify which version we want to use because we're not using Time Travel yet. session.read().format("delta").load("table/people") .orderBy("id").show(); Nothing changes at the moment; let's keep going with the next steps. Working with Time Travel Here is where we start working with Time Travel. For the next steps, we'll perform reads on the people table specifying different versions to understand how Time Travel works. Reading Delta table - Version 0 (zero) Now we're going to work with different versions starting from version 0 (zero). Let's read the table again, but now adding a new parameter; take a look at the code below. session.read().format("delta") .option("versionAsOf", 0) .load("table/people") .orderBy("id").show(); Notice that we added a new option called versionAsOf. This option allows us to configure the version number we want to restore temporarily for a table. For this scenario we configure the read for version zero (0) of the Delta table, which was the first version generated by Delta Lake after the write operation. Reading Delta table - Version 1 (one) For this last step we're using version one (1); note that the data from the previous version has been maintained because an append mode was used. session.read().format("delta") .option("versionAsOf", 1) .load("table/people") .orderBy("id").show(); If you want to check which versions are available for a table, see the sketch at the end of this post. Delta Lake has a lot of benefits, and Time Travel gives us flexibility in a Big Data architecture; for more details I recommend reading the Delta Lake docs. Books to study and read If you want to learn more and reach a higher level of knowledge, I strongly recommend reading the following book(s): AWS Cookbook is a practical guide containing 70 familiar recipes about AWS resources and how to solve different challenges. It's a well-written, easy-to-understand book covering key AWS services through practical examples. AWS, or Amazon Web Services, is the most widely used cloud service in the world today; if you want to understand more about the subject to be well positioned in the market, I strongly recommend studying it. Well that's it, I hope you enjoyed it.
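    As a complement to the reads above, here is a minimal sketch of how you could list the versions that Time Travel can restore. It assumes the io.delta.tables.DeltaTable API from the delta-core dependency and the table/people path used above; the class name is illustrative.

import io.delta.tables.DeltaTable;
import org.apache.spark.sql.SparkSession;

public class ShowTableHistory {
    public static void main(String[] args) {
        SparkSession session = SparkSession.builder()
                .appName("delta-time-travel")
                .master("local[1]")
                .getOrCreate();

        // Each commit (our append writes) appears as one row with the version number
        // that versionAsOf points to, plus its timestamp and operation type
        DeltaTable.forPath(session, "table/people")
                .history()
                .select("version", "timestamp", "operation")
                .show(false);
    }
}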

  • Tutorial: Creating AWS Lambda with Terraform

    For this post, we're going to create an AWS Lambda with Terraform and Java as the runtime. But first of all, have you heard about Lambda? I recommend reading this post about Lambda. And about Terraform? There's another post where I show the first steps with Terraform, just click here. The idea of this post is to create an AWS Lambda that will be triggered by CloudWatch Events through an automated schedule using a cron or rate expression. Usually we can create any AWS resource using the console, but here we're going to use Terraform as an IaC (Infrastructure as Code) tool that will create every resource necessary to run our AWS Lambda. As the runtime, we chose Java, so it's important that you understand at least the basics of Maven. Remember that you can run Lambda using different languages as the runtime, such as Java, Python, .NET, Node.js and more. Even though it's a Java project, the most important part of this post is understanding Lambda and how you can provision it through Terraform. Intro Terraform will be responsible for creating all the resources for this post, such as the Lambda, roles, policies, CloudWatch Events and the S3 bucket where we're going to keep our application's JAR file. Our Lambda will be invoked by CloudWatch Events every 3 minutes, running a simple Java method that prints a message. This is going to be a simple example that you can reuse in your projects. Note that we're using S3 to store our deployment package, the JAR file in this case. It's an AWS recommendation to upload larger deployment packages to S3 instead of keeping them on Lambda itself, since S3 has better support for uploading large files. Don't worry about uploading files manually; Terraform will also be responsible for that during the build phase. Creating the project For this post we're going to use Java as the language and Maven as the dependency manager, therefore it's necessary to generate a Maven project that will create our project structure. If you don't know how to generate a Maven project, I recommend reading this post where I show how to do it. Project structure After generating the Maven project, we're going to create the remaining files and packages ourselves, except pom.xml, which was created by the Maven generator. It's a characteristic of Maven projects to generate this folder structure, such as src/main/java/. Within the java/ folder, create a package called coffee.tips.lambda and create a Java class named Handler.java within this same package. Updating pom.xml For this post, add the Lambda dependencies and the Maven build configuration to the pom.xml file. Creating a Handler A handler is basically the Lambda controller. Lambda always looks for a handler to start its process; to summarize, it's the first code to be invoked. We created a basic handler just to log messages when invoked by CloudWatch Events. Note that we implement the RequestHandler interface, which allows receiving a Map object as a parameter, but for this example we won't explore the data from this parameter.
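    A minimal sketch of such a handler is shown below. It assumes the aws-lambda-java-core dependency is declared in pom.xml; the exact code in your project may differ, and the return value is illustrative.

package coffee.tips.lambda;

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;

import java.util.Map;

public class Handler implements RequestHandler<Map<String, Object>, String> {

    @Override
    public String handleRequest(Map<String, Object> event, Context context) {
        // Just log the event sent by the CloudWatch Events rule; we don't inspect its content here
        context.getLogger().log("Lambda invoked by CloudWatch Events: " + event);
        return "done";
    }
}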
Understanding the Terraform files Now we're going to understand how the resources will be created using Terraform. vars.tf The vars.tf file is where we declare the variables. Variables provide flexibility when we need to work with different resources. vars.tfvars Now we need to set the values of these variables. To do that, create a folder called /development inside the terraform folder and, inside it, create a file called vars.tfvars with the variable values. Note that for the bucket field you must specify the name of your own bucket; bucket names must be globally unique. main.tf In this file we just declare the provider. The provider is the cloud service we're going to use to create our resources. In this case, we're using AWS as the provider, and Terraform will download the packages necessary to create the resources. Note that for the region field we're using the var keyword to get the region value already declared in the vars.tfvars file. s3.tf This file is where we declare the resources related to S3. In this case, we only create the S3 bucket, but if you want to create more S3-related resources such as policies, roles or S3 notifications, you can declare them here; it's a way to organize the code by resource. Note again that we use the var keyword to reference the bucket variable declared in vars.tf. lambda.tf Finally, our last Terraform file, in which we declare the resources related to the Lambda and the Lambda itself. It's worth explaining some details about this file: 1. We declare two aws_iam_policy_document data sources that describe which actions the resources assigned to these policies can perform. 2. The aws_iam_role resource provides the IAM role and will control some of the Lambda's actions. 3. The aws_iam_role_policy resource provides an IAM role inline policy and registers the previous role together with the policies from aws_iam_policy_document.aws_iam_policy_coffee_tips_aws_lambda_iam_policy_document. 4. We declare an aws_s3_object resource because we want to store our JAR file on S3. During the deploy phase, Terraform takes the JAR file created in the target folder and uploads it to S3. depends_on: Terraform must create this resource before the current one. bucket: the name of the bucket where the JAR file will be stored. key: the JAR's name. source: the source file's location. etag: triggers updates when the value changes. 5. aws_lambda_function is the resource responsible for creating the Lambda, and we need to fill in some fields such as: function_name: the Lambda's name. role: the Lambda role declared in the previous steps, which provides access to AWS services and resources. handler: in this field you pass the handler's fully qualified class name. source_code_hash: this field is responsible for triggering Lambda updates. s3_bucket: the name of the bucket where the JAR file generated during deploy will be stored. s3_key: the JAR's name. runtime: here you pass one of the Lambda-supported programming languages; for this example, java11. timeout: the Lambda's execution timeout. 6. aws_cloudwatch_event_rule is the rule related to the CloudWatch event execution. In this case, we set the cron through the schedule_expression field to define when the Lambda will run. 7. aws_cloudwatch_event_target is the resource responsible for triggering the Lambda using CloudWatch Events. 8. aws_lambda_permission allows CloudWatch to invoke the Lambda. Packaging Now that you're familiar with Lambda and Terraform, let's package our project via Maven before creating the Lambda. The idea is to create a JAR file that will be used for the Lambda executions and stored on S3. For this example, we're going to package locally. Remember that for a production environment we could use a continuous integration tool such as Jenkins, Drone or even GitHub Actions to automate this process. First, open the terminal, make sure you're in the project's root directory, and run the following Maven command: mvn clean install -U Besides packaging the project, this command will download and install the dependencies declared in the pom.xml file.
After running the above command, a JAR file will be generated inside the target/ folder, which is also created by the build. Running Terraform Well, we're almost there. Now let's provision our Lambda via Terraform by running some Terraform commands. Inside the terraform folder, run the following commands in the terminal: terraform init This command initializes Terraform, downloading the required provider plugins and validating the Terraform files. Next, let's run the plan command to check which resources will be created: terraform plan -var-file=development/vars.tfvars After running it, you'll see on the console a plan of the resources to be created. Finally, we can apply the plan to create the resources with the following command: terraform apply -var-file=development/vars.tfvars After running it, you must confirm the actions by typing "yes". Now the provisioning is complete! Lambda Running Access the AWS console to see the Lambda executions: open the Monitor tab, then the Logs tab inside the Monitor section, and you'll see the messages printed at the interval defined by the schedule_expression (every 3 minutes in this example). Destroy AWS billing charges will happen if you don't destroy these resources, so I recommend destroying them to avoid unnecessary charges. To do that, run the command below. terraform destroy -var-file=development/vars.tfvars Remember that you need to confirm this operation, cool? Conclusion In this post, we created an AWS Lambda provisioned by Terraform. Lambda is an AWS service that we can use for different use cases, making it easier to compose a software architecture. We could also note that Terraform brings flexibility for creating resources on different cloud services and is easy to adopt in software projects. Github repository Books to study and read If you want to learn more and reach a higher level of knowledge, I strongly recommend reading the following book(s): AWS Cookbook is a practical guide containing 70 familiar recipes about AWS resources and how to solve different challenges. It's a well-written, easy-to-understand book covering key AWS services through practical examples. AWS, or Amazon Web Services, is the most widely used cloud service in the world today; if you want to understand more about the subject to be well positioned in the market, I strongly recommend studying it. Setup recommendations If you're interested in knowing the setup I use to develop my tutorials, here it is: Notebook Dell Inspiron 15 15.6 Monitor LG Ultrawide 29WL500-29 Well that’s it, I hope you enjoyed it!

bottom of page