Cloud Native
Cloud native has been deeply entrenched in organizations for years now, yet it remains an evolving and innovative area of the software development industry. Organizations rely on a cloud-centric state of development that allows their applications to remain resilient and scalable in this ever-changing landscape. Amidst market concerns, tool sprawl, and the increased need for cost optimization, there are few conversations more important today than those around cloud-native efficacy at organizations.

Google Cloud breaks down "cloud native" into primary pillars: containers and orchestration, microservices, DevOps, and CI/CD. For DZone's 2024 Cloud Native Trend Report, we further explored these pillars, focusing our research on how nuanced technologies and methodologies are driving the vision for what cloud native means and entails today. The articles, contributed by experts in the DZone Community, bring the pillars into conversation via topics such as automating the cloud through orchestration and AI, using shift left to improve delivery and strengthen security, surviving observability challenges, and strategizing cost optimizations.
From the day I wrote my first Hello World program, it took me two years to land a job at Amazon and another two years to get into Google. That's because I accomplished this without having a Computer Science degree or attending a boot camp. I made countless mistakes along the way that made my path to becoming a Software Engineer longer than it should have been. I spent countless hours watching YouTube tutorials and paid for numerous Udemy courses, only to find that they added no real value. If I could go back in time and cut out everything that didn't work, I could have reached where I am today within six months of starting programming. That's exactly why I am writing this piece. Today, I'll cut out all the unnecessary fluff and give you the quickest route from beginner to full-time Software Engineer.

Avoiding Common Mistakes Most Programmers Make

Before I begin: there are three major mistakes that can slow down your progress toward becoming a full-time Software Engineer. I will share these three mistakes along the way, so stay tuned for that.

Choosing the Right Programming Language

As a new programmer, your first decision is, "Which programming language should I learn?" To help you answer that, let's discuss what beginners typically look for in a programming language. Number one, the language should be easy and intuitive to write. It should not require learning very complex syntax and should read as close as possible to plain English. Next, the programming language should be versatile and have many applications. As a beginner, you don't want to learn a new language for every new project you want to build. In other words, the language should offer great returns for the time you invest in learning it. Lastly, the programming language should be fast to write. You shouldn't have to waste time spelling out the declaration of a new variable or a simple iteration through a list. In other words, it should be concise and get the job done in a minimum number of lines of code.

As some of you might have already guessed, Python is the language that checks all these boxes. It's almost as easy as writing in English. It has many different applications, like web development, data science, and automation. And Python is extremely fast to write compared with other popular languages because it requires fewer lines of code for the same amount of functionality. As an example, here is the same program (counting the empty lines in a text file) written in JavaScript and in Python. You can see that the Python version consists of a few short lines, while the JavaScript version is much longer.

JavaScript

const fs = require('fs');
const path = require('path');

const directoryPath = path.join(__dirname, '.');
const filePath = path.join(directoryPath, 'Code.txt');

fs.readFile(filePath, 'utf-8', (err, data) => {
  if (err) {
    console.error(err);
    return;
  }

  const lines = data.split('\n');
  let emptyLineCount = 0;

  lines.forEach(line => {
    if (line.trim() === '') {
      emptyLineCount++;
    }
  });

  console.log('Number of empty lines:', emptyLineCount);
});

Python

my_file = open("Code.txt")
lines = my_file.read().split("\n")
print("Number of empty lines:", sum(1 for line in lines if line.strip() == ""))
my_file.close()

Effective Learning Methods

Now that we know we should learn Python, let's talk about how to learn it. And this is where most new programmers make the first major mistake that slows them down: they learn by watching others code.
Let me explain by telling you how most people learn programming. Most newbies go to a course provider like Udemy and look up Python courses. They pick one of those 20+ hour courses, thinking that because the course is long and detailed, it must be good for them. And then they never finish it. That's because 20 hours of content is not the same as 20 hours of great content.

The Right Way To Learn Code

Some people go to YouTube and watch someone else code without ever writing any code themselves. Watching these tutorials gives them a false sense of progress, because coding in your head is very different from actually writing down the code and debugging the errors. So, what is the right way to do it? The answer is very simple: learn by coding. For this, you can go to the free website learnpython.org. On this website, just focus on the basic Python lessons and don't worry about the data science tutorials or any advanced tutorials. Even if you learn advanced concepts right now, you will not remember them until you have actually applied them to a real-world problem. You can always come back to the advanced concepts later, when you need them for your projects. Each lesson first explains a basic concept and then asks you to apply that concept to a problem. Feel free to play with the sample code. Think about other problems you could solve with the concepts you just learned, and try to solve them in the exercise portion. Once you're done with the basics, you're ready to move on to the next steps.

Building Projects

In the spirit of learning by coding, the next step is to build some projects in Python. In the beginning, it's very hard to do something on your own, so we'll take the help of experts. Watch the video below on 12 beginner Python projects. In this video, they build 12 beginner Python projects from scratch, including Madlibs, Tic-Tac-Toe, and Minesweeper, and all of them are very interesting. They walk you through the implementation of each project step by step, making it very easy to follow. But before you start watching this tutorial, there are two things you should know.

Setting up Your IDE

Number one, you should not watch this tutorial casually. Follow along if you really want to learn programming and become a Software Engineer. To follow along, you will need an Integrated Development Environment (IDE) to build these projects. An IDE, in simplest terms, is an application where you can write and run your code. There are several popular IDEs for Python. This tutorial uses VS Code, so you might want to download VS Code and set it up for Python before starting the tutorial. Once you have completed this tutorial, you are ready to work on your own projects.

Developing Your Own Projects

Building your own projects will help you in multiple ways. Number one, it will introduce you to how Software Engineers work in the real world. You will write code that fails, debug it, and repeat the process over and over again. This is exactly what a day in the life of a Software Engineer looks like. Number two, you will build a portfolio of projects. You can host your code on GitHub and put the link in your resume. This will help you attract recruiters and get your resume shortlisted.
Number three, building your own projects will give you the confidence that you are ready to tackle new challenges as a Software Engineer. But what kind of projects should you work on? Any project you find interesting works, but here are some examples: a web crawler, an alarm clock, an app that shows you the Wikipedia article of the day, or an online calculator. You could also build a spam filter, an algorithmic trading engine, or an e-commerce website.

Preparing for Job Applications

Now you have a great resume, and you are confident about your programming skills. Let's start applying for Software Engineer positions. Wait a second. This is actually the second major mistake new programmers make. In an ideal world, good programming skills and a great resume would be all you need to become a Software Engineer. But unfortunately for us, tech companies like to play games in their interviews. They ask specific kinds of programming questions, and if you don't prepare for these questions, you might not get the results you expect.

Essential Course: Data Structures and Algorithms

So, let's see how to prepare for interviews. All the interviews are based on one course that is taught to all Computer Science graduates: Data Structures and Algorithms. Fortunately for us, Google has created this course and made it available for free on Udacity. And the best part is that it is taught in Python. In this three-month course, you'll learn about different algorithms related to searching and sorting, and about data structures like maps, trees, and graphs. Don't worry if you don't know any of these terms right now; by the end of this course, you'll be a pro. For that, just keep two things in mind. Number one, be regular and finish the course. As I mentioned earlier, most people start courses and never finish them, so make sure you take small steps every day and make regular progress. Number two, complete all the exercises in the course. As I have said many times, the only way to learn coding is by coding, so implement the algorithms on your own and complete all the assignments. Trust me when I say this: when it comes to interviewing for entry-level jobs, this course is the only difference between you and someone who dropped more than a hundred thousand dollars on a Computer Science degree. If you finish this course, you'll be pretty much on par with a CS graduate when you interview.

Interview Preparation

After completing this course on Data Structures and Algorithms, you'll have all the foundational knowledge needed to tackle interviews. To further sharpen your skills, practice with questions previously asked by tech companies. For that, you should use a website called Leetcode.com. On Leetcode, you will get interview-style questions, and you can write your code and test your solution directly on the website. Leetcode is great for beginners because all the questions are tagged as easy, medium, or hard based on difficulty level. If you buy a premium subscription, you can also filter the questions by the tech company that asked them in past interviews. You should start with easy questions and keep working on them until you can solve them in 45 minutes. Once that happens, you can move on to medium questions. To give you a feel for the format, an example easy question and solution follow below.
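"Two Sum" is the classic first easy problem on Leetcode: given a list of numbers and a target, return the indices of the two numbers that add up to the target. Here is a minimal Python sketch of one common approach; the function name and the test values are just illustrative, not part of any official solution.

Python

def two_sum(nums, target):
    """Return the indices of the two numbers in nums that sum to target."""
    seen = {}  # maps each value we have already visited to its index
    for i, num in enumerate(nums):
        complement = target - num
        if complement in seen:
            return [seen[complement], i]
        seen[num] = i
    return []  # no pair found

print(two_sum([2, 7, 11, 15], 9))  # prints [0, 1] because 2 + 7 == 9

The dictionary trades memory for speed, turning the brute-force nested loop into a single pass. Recognizing trade-offs like this is exactly what the interview questions are probing for.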
When you start solving mediums in 45 minutes, you can start applying for Software Engineering jobs. If you are lucky, you will get a job right away. For most people, though, it will be a process full of disappointment and rejection.

Handling Rejections

And this is where they make the third and biggest mistake of all: they quit. The main reason people give up early is that they overthink and complicate the interview process. After every rejection, they replay the interview over and over in their head to figure out why they failed, and they take every rejection personally. To avoid this, stay inside your circle of control: do everything you can to influence the outcome of your interviews, but never get tangled up in the things you can't control. In other words, do your best to crack the interviews, but stay detached from their outcomes.
Increase your testing efficiency by utilizing Cucumber for Java application testing, fully integrated with Behavior-Driven Development (BDD). This guide provides comprehensive steps for project setup, scenario writing, step implementation, and reporting.

Introduction

Cucumber is a tool that supports Behavior-Driven Development (BDD). A good starting point for learning more about BDD and Cucumber is the Cucumber guides. BDD itself was introduced by Dan North in 2006; you can read his blog post introducing BDD. Note, however, that Cucumber is a tool that supports BDD; using Cucumber does not by itself mean you are practicing BDD. The Cucumber myths is an interesting read in this regard. In the remainder of this blog, you will learn more about the features of Cucumber when developing a Java application. Do know that Cucumber is not limited to testing Java applications; a wide list of languages is supported. The sources used in this blog can be found on GitHub.

Prerequisites

Prerequisites for this blog are:

- Basic Java knowledge (Java 21 is used);
- Basic Maven knowledge;
- Basic comprehension of BDD (see the resources in the introduction).

Project Setup

An initial project can be set up by means of the Maven cucumber-archetype. Change the groupId, artifactId, and package to fit your preferences and execute the following command:

Shell

$ mvn archetype:generate \
  "-DarchetypeGroupId=io.cucumber" \
  "-DarchetypeArtifactId=cucumber-archetype" \
  "-DarchetypeVersion=7.17.0" \
  "-DgroupId=mycucumberplanet" \
  "-DartifactId=mycucumberplanet" \
  "-Dpackage=com.mydeveloperplanet.mycucumberplanet" \
  "-Dversion=1.0.0-SNAPSHOT" \
  "-DinteractiveMode=false"

The necessary dependencies are downloaded and the project structure is created. The output ends with the following:

Shell

[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 2.226 s
[INFO] Finished at: 2024-04-28T10:25:16+02:00
[INFO] ------------------------------------------------------------------------

Open the project with your favorite IDE. If you are using IntelliJ, a message is shown prompting you to install a plugin. Take a closer look at the pom:

- The dependencyManagement section contains BOMs (Bills of Materials) for Cucumber and JUnit;
- Several dependencies are added for Cucumber and JUnit;
- The build section contains the compiler plugin and the Surefire plugin. The compiler is set to Java 1.8; change it to 21.
XML

<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>io.cucumber</groupId>
            <artifactId>cucumber-bom</artifactId>
            <version>7.17.0</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
        <dependency>
            <groupId>org.junit</groupId>
            <artifactId>junit-bom</artifactId>
            <version>5.10.2</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>

<dependencies>
    <dependency>
        <groupId>io.cucumber</groupId>
        <artifactId>cucumber-java</artifactId>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>io.cucumber</groupId>
        <artifactId>cucumber-junit-platform-engine</artifactId>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.junit.platform</groupId>
        <artifactId>junit-platform-suite</artifactId>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.junit.jupiter</groupId>
        <artifactId>junit-jupiter</artifactId>
        <scope>test</scope>
    </dependency>
</dependencies>

<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>3.13.0</version>
            <configuration>
                <encoding>UTF-8</encoding>
                <source>21</source>
                <target>21</target>
            </configuration>
        </plugin>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-surefire-plugin</artifactId>
            <version>3.2.5</version>
        </plugin>
    </plugins>
</build>

In the test directory, you will find a RunCucumberTest class, a StepDefinitions class, and an example.feature file in the resources section. The RunCucumberTest file is necessary to run the feature files and the corresponding steps. Feature files and steps will be discussed later on; do not worry too much about them now.

Java

@Suite
@IncludeEngines("cucumber")
@SelectPackages("com.mydeveloperplanet.mycucumberplanet")
@ConfigurationParameter(key = PLUGIN_PROPERTY_NAME, value = "pretty")
public class RunCucumberTest {
}

Run the tests; the output should be successful.

Shell

$ mvn test

Write Scenario

When practicing BDD, you need to write a scenario first. Taken from the Cucumber documentation:

"When we do Behavior-Driven Development with Cucumber we use concrete examples to specify what we want the software to do. Scenarios are written before production code. They start their life as an executable specification. As the production code emerges, scenarios take on a role as living documentation and automated tests."

The application you need to build for this blog is quite a basic one:

- You need to be able to add an employee;
- You need to retrieve the complete list of employees;
- You need to be able to remove all employees.

A feature file follows the Given-When-Then (GWT) notation. A feature file consists of:

- A feature name (it is advised to keep it the same as the file name);
- A feature description;
- One or more scenarios containing steps in the GWT notation. A scenario illustrates how the application should behave.

Plain Text

Feature: Employee Actions
  Actions to be made for an employee

  Scenario: Add employee
    Given an empty employee list
    When an employee is added
    Then the employee is added to the employee list

Run the tests and you will notice that the feature file is now executed. The tests fail, of course, but example code is provided in order to create the step definitions.
Shell

[INFO] -------------------------------------------------------
[INFO]  T E S T S
[INFO] -------------------------------------------------------
[INFO] Running com.mydeveloperplanet.mycucumberplanet.RunCucumberTest

Scenario: Add employee # com/mydeveloperplanet/mycucumberplanet/employee_actions.feature:4
  Given an empty employee list
  When an employee is added
  Then the employee is added to the employee list
[ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.104 s <<< FAILURE! -- in com.mydeveloperplanet.mycucumberplanet.RunCucumberTest
[ERROR] Add an employee.Add employee -- Time elapsed: 0.048 s <<< ERROR!
io.cucumber.junit.platform.engine.UndefinedStepException: The step 'an empty employee list' and 2 other step(s) are undefined.
You can implement these steps using the snippet(s) below:

@Given("an empty employee list")
public void an_empty_employee_list() {
    // Write code here that turns the phrase above into concrete actions
    throw new io.cucumber.java.PendingException();
}

@When("an employee is added")
public void an_employee_is_added() {
    // Write code here that turns the phrase above into concrete actions
    throw new io.cucumber.java.PendingException();
}

@Then("the employee is added to the employee list")
public void the_employee_is_added_to_the_employee_list() {
    // Write code here that turns the phrase above into concrete actions
    throw new io.cucumber.java.PendingException();
}

    at io.cucumber.core.runtime.TestCaseResultObserver.assertTestCasePassed(TestCaseResultObserver.java:69)
    at io.cucumber.junit.platform.engine.TestCaseResultObserver.assertTestCasePassed(TestCaseResultObserver.java:22)
    at io.cucumber.junit.platform.engine.CucumberEngineExecutionContext.lambda$runTestCase$4(CucumberEngineExecutionContext.java:114)
    at io.cucumber.core.runtime.CucumberExecutionContext.lambda$runTestCase$5(CucumberExecutionContext.java:136)
    at io.cucumber.core.runtime.RethrowingThrowableCollector.executeAndThrow(RethrowingThrowableCollector.java:23)
    at io.cucumber.core.runtime.CucumberExecutionContext.runTestCase(CucumberExecutionContext.java:136)
    at io.cucumber.junit.platform.engine.CucumberEngineExecutionContext.runTestCase(CucumberEngineExecutionContext.java:109)
    at io.cucumber.junit.platform.engine.NodeDescriptor$PickleDescriptor.execute(NodeDescriptor.java:168)
    at io.cucumber.junit.platform.engine.NodeDescriptor$PickleDescriptor.execute(NodeDescriptor.java:90)
    at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)
    at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)
[INFO]
[INFO] Results:
[INFO]
[ERROR] Errors:
[ERROR]   The step 'an empty employee list' and 2 other step(s) are undefined.
You can implement these steps using the snippet(s) below:

@Given("an empty employee list")
public void an_empty_employee_list() {
    // Write code here that turns the phrase above into concrete actions
    throw new io.cucumber.java.PendingException();
}

@When("an employee is added")
public void an_employee_is_added() {
    // Write code here that turns the phrase above into concrete actions
    throw new io.cucumber.java.PendingException();
}

@Then("the employee is added to the employee list")
public void the_employee_is_added_to_the_employee_list() {
    // Write code here that turns the phrase above into concrete actions
    throw new io.cucumber.java.PendingException();
}
[INFO]
[ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0

Add Step Definitions

Add the example code from the output above to the StepDefinitions file. Run the tests again.
Of course, the tests fail again, but this time a PendingException is thrown, indicating that the steps still need to be implemented.

Shell

[INFO] -------------------------------------------------------
[INFO]  T E S T S
[INFO] -------------------------------------------------------
[INFO] Running com.mydeveloperplanet.mycucumberplanet.RunCucumberTest

Scenario: Add employee # com/mydeveloperplanet/mycucumberplanet/employee_actions.feature:4
  Given an empty employee list # com.mydeveloperplanet.mycucumberplanet.StepDefinitions.an_empty_employee_list()
      io.cucumber.java.PendingException: TODO: implement me
          at com.mydeveloperplanet.mycucumberplanet.StepDefinitions.an_empty_employee_list(StepDefinitions.java:12)
          at ✽.an empty employee list(classpath:com/mydeveloperplanet/mycucumberplanet/employee_actions.feature:5)
  When an employee is added # com.mydeveloperplanet.mycucumberplanet.StepDefinitions.an_employee_is_added()
  Then the employee is added to the employee list # com.mydeveloperplanet.mycucumberplanet.StepDefinitions.the_employee_is_added_to_the_employee_list()
[ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.085 s <<< FAILURE! -- in com.mydeveloperplanet.mycucumberplanet.RunCucumberTest
[ERROR] Add an employee.Add employee -- Time elapsed: 0.032 s <<< ERROR!
io.cucumber.java.PendingException: TODO: implement me
    at com.mydeveloperplanet.mycucumberplanet.StepDefinitions.an_empty_employee_list(StepDefinitions.java:12)
    at ✽.an empty employee list(classpath:com/mydeveloperplanet/mycucumberplanet/employee_actions.feature:5)
[INFO]
[INFO] Results:
[INFO]
[ERROR] Errors:
[ERROR]   TODO: implement me
[INFO]
[ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0

Implement Application

The first scenario is defined; let's implement the application. Create a basic EmployeeService which provides the needed functionality. An employee can be added to an employee list, which is just a map of employees. The list of employees can be retrieved, and the list can be cleared.

Java

public class EmployeeService {

    private final HashMap<Long, Employee> employees = new HashMap<>();
    private Long index = 0L;

    public void addEmployee(String firstName, String lastName) {
        Employee employee = new Employee(firstName, lastName);
        employees.put(index, employee);
        index++;
    }

    public Collection<Employee> getEmployees() {
        return employees.values();
    }

    public void removeEmployees() {
        employees.clear();
    }
}

The employee is a basic record.

Java

public record Employee(String firstName, String lastName) {
}

Implement Step Definitions

Now that the service exists, you can implement the step definitions. It is rather straightforward: you create the service and invoke its methods in the Given-When implementations. Verifying the result is done with assertions, just as you would do in your unit tests.

Java

public class StepDefinitions {

    private final EmployeeService service = new EmployeeService();

    @Given("an empty employee list")
    public void an_empty_employee_list() {
        service.removeEmployees();
    }

    @When("an employee is added")
    public void an_employee_is_added() {
        service.addEmployee("John", "Doe");
    }

    @Then("the employee is added to the employee list")
    public void the_employee_is_added_to_the_employee_list() {
        assertEquals(1, service.getEmployees().size());
    }
}

Run the tests; they are successful now.
Shell

[INFO] -------------------------------------------------------
[INFO]  T E S T S
[INFO] -------------------------------------------------------
[INFO] Running com.mydeveloperplanet.mycucumberplanet.RunCucumberTest

Scenario: Add employee # com/mydeveloperplanet/mycucumberplanet/employee_actions.feature:4
  Given an empty employee list # com.mydeveloperplanet.mycucumberplanet.StepDefinitions.an_empty_employee_list()
  When an employee is added # com.mydeveloperplanet.mycucumberplanet.StepDefinitions.an_employee_is_added()
  Then the employee is added to the employee list # com.mydeveloperplanet.mycucumberplanet.StepDefinitions.the_employee_is_added_to_the_employee_list()
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.081 s -- in com.mydeveloperplanet.mycucumberplanet.RunCucumberTest
[INFO]
[INFO] Results:
[INFO]
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0

Extra Scenario

Add a second scenario that tests the removal of employees. Add the scenario to the feature file.

Plain Text

Scenario: Remove employees
  Given a filled employee list
  When the employees list is removed
  Then the employee list is empty

Implement the step definitions.

Java

@Given("a filled employee list")
public void a_filled_employee_list() {
    service.addEmployee("John", "Doe");
    service.addEmployee("Miles", "Davis");
    assertEquals(2, service.getEmployees().size());
}

@When("the employees list is removed")
public void the_employees_list_is_removed() {
    service.removeEmployees();
}

@Then("the employee list is empty")
public void the_employee_list_is_empty() {
    assertEquals(0, service.getEmployees().size());
}

Tags

In order to run a subset of scenarios, you can add tags to features and scenarios.

Plain Text

@regression
Feature: Employee Actions
  Actions to be made for an employee

  @TC_01
  Scenario: Add employee
    Given an empty employee list
    When an employee is added
    Then the employee is added to the employee list

  @TC_02
  Scenario: Remove employees
    Given a filled employee list
    When the employees list is removed
    Then the employee list is empty

Run only the test tagged with @TC_01 by using a filter.

Shell

$ mvn clean test -Dcucumber.filter.tags="@TC_01"
...
[INFO] -------------------------------------------------------
[INFO]  T E S T S
[INFO] -------------------------------------------------------
[INFO] Running com.mydeveloperplanet.mycucumberplanet.RunCucumberTest
[WARNING] Tests run: 2, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 0.233 s -- in com.mydeveloperplanet.mycucumberplanet.RunCucumberTest
[INFO]
[INFO] Results:
[INFO]
[WARNING] Tests run: 2, Failures: 0, Errors: 0, Skipped: 1

Reporting

When executing tests, it is often required that appropriate reporting is available. Until now, only console output has been shown. Generate an HTML report by adding the following configuration parameter to RunCucumberTest.

Java

@Suite
@IncludeEngines("cucumber")
@SelectPackages("com.mydeveloperplanet.mycucumberplanet")
@ConfigurationParameter(key = PLUGIN_PROPERTY_NAME, value = "pretty")
@ConfigurationParameter(key = PLUGIN_PROPERTY_NAME, value = "html:target/cucumber-reports.html")
public class RunCucumberTest {
}

After running the tests, a rather basic HTML report is available at the specified path. Several third-party reporting plugins are also available; the cucumber-reporting-plugin offers a more elaborate report. Add the dependency to the pom.
XML

<dependency>
    <groupId>me.jvt.cucumber</groupId>
    <artifactId>reporting-plugin</artifactId>
    <version>5.3.0</version>
</dependency>

Enable the report in RunCucumberTest.

Java

@Suite
@IncludeEngines("cucumber")
@SelectPackages("com.mydeveloperplanet.mycucumberplanet")
@ConfigurationParameter(key = PLUGIN_PROPERTY_NAME, value = "pretty")
@ConfigurationParameter(key = PLUGIN_PROPERTY_NAME, value = "html:target/cucumber-reports.html")
@ConfigurationParameter(key = PLUGIN_PROPERTY_NAME, value = "me.jvt.cucumber.report.PrettyReports:target/cucumber")
public class RunCucumberTest {
}

Run the tests; the report is generated in the target/cucumber directory. Open the file starting with report-feature.

Conclusion

Cucumber has great support for BDD. It is quite easy to use, and in this blog you only scratched the surface of its capabilities. An advantage is that you can make use of JUnit and assertions, and the steps can be implemented in Java. There is no need to learn a new language when your application is also built in Java.
A/B testing has long been the cornerstone of experimentation in the software and machine learning domains. By comparing two versions of a webpage, application, feature, or algorithm, businesses can determine which version performs better based on predefined metrics of interest. However, as the complexity of business problems and experiments grows, A/B testing can become a constraint on empirically evaluating successful development. Multi-armed bandits (MABs) are a powerful alternative that can scale complex experimentation in enterprises by dynamically balancing exploration and exploitation.

The Limitations of A/B Testing

While A/B testing is effective for simple experiments, it has several limitations:

- Static allocation: A/B tests allocate traffic equally or according to a fixed ratio, potentially wasting resources on underperforming variations.
- Exploration vs. exploitation: A/B testing focuses heavily on exploration, often ignoring the potential gains from exploiting known good options.
- Time inefficiency: A/B tests can be time-consuming, requiring sufficient data collection periods before drawing conclusions.
- Scalability: Managing multiple simultaneous A/B tests for complex systems can be cumbersome and resource-intensive.

Multi-Armed Bandits

The multi-armed bandit problem is a classic Reinforcement Learning problem where an agent must choose between multiple options (arms) to maximize the total reward over time. Each arm provides a random reward from a probability distribution unique to that arm. The agent must balance exploring new arms (to gather more information) and exploiting the best-known arms (to maximize reward). In the context of experimentation, MAB algorithms dynamically adjust the allocation of traffic to different variations based on their performance, leading to more efficient and adaptive experimentation.

The terms "exploration" and "exploitation" refer to the fundamental trade-off that an agent must balance to maximize cumulative rewards over time. This trade-off is central to the decision-making process in MAB algorithms.

Exploration

Exploration is the process of trying out different options (or "arms") to gather more information about their potential rewards. The goal of exploration is to reduce uncertainty and discover which arms yield the highest rewards.

Purpose: To gather sufficient data about each arm to make informed decisions in the future.

Example: In an online advertising scenario, exploration might involve displaying various ads to users to determine which ad generates the most clicks or conversions. Even though some ads perform poorly initially, they are still shown to collect enough data to understand their true performance.

Exploitation

Exploitation, on the other hand, is the process of selecting the option (or "arm") that currently appears to offer the highest reward based on the information gathered so far. The main purpose of exploitation is to maximize immediate rewards by leveraging known information.

Purpose: To maximize the immediate benefit by choosing the arm that has provided the best results so far.

Example: In the same online advertising case, exploitation would involve predominantly showing the advertisement that has already shown the highest click-through rate, thereby maximizing the expected number of clicks.
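Before looking at specific algorithms, here is a minimal, self-contained simulation sketch of this trade-off using the epsilon-greedy strategy described in the next section. The three arms and their click-through rates are illustrative assumptions, not data from any real experiment.

Python

import random

# Illustrative assumption: three ad variants with unknown true click-through rates
true_ctrs = [0.03, 0.05, 0.04]
counts = [0, 0, 0]        # how many times each arm has been shown
values = [0.0, 0.0, 0.0]  # running mean reward per arm
epsilon = 0.1             # 10% of traffic is spent exploring

def choose_arm():
    if random.random() < epsilon:
        return random.randrange(len(true_ctrs))  # explore: pick a random arm
    return max(range(len(true_ctrs)), key=lambda a: values[a])  # exploit: best arm so far

for _ in range(10_000):
    arm = choose_arm()
    reward = 1 if random.random() < true_ctrs[arm] else 0  # simulated click
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]    # incremental mean update

print("Traffic share per arm:", [round(c / sum(counts), 3) for c in counts])
print("Estimated CTRs:", [round(v, 4) for v in values])

Over time, most traffic drifts to the best-performing arm while a small epsilon fraction keeps exploring, which is exactly the adaptive behavior that static A/B allocation lacks.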
Types of Multi-Armed Bandit Algorithms

- Epsilon-Greedy: With probability ε, the algorithm explores a random arm, and with probability 1-ε, it exploits the best-known arm.
- UCB (Upper Confidence Bound): This algorithm selects arms based on their average reward and the uncertainty or variance in their rewards, favoring less-tested arms to a calculated degree.
- Thompson Sampling: This Bayesian approach samples from the posterior distribution of each arm's reward, balancing exploration and exploitation according to the likelihood of each arm being optimal.

Implementing Multi-Armed Bandits in Enterprise Experimentation

Step-By-Step Guide

1. Define objectives and metrics: Clearly outline the goals of your experimentation and the key metrics for evaluation.
2. Select an MAB algorithm: Choose an algorithm that aligns with your experimentation needs. For instance, UCB is suitable for scenarios requiring a balance between exploration and exploitation, while Thompson Sampling is beneficial for more complex and uncertain environments.
3. Set up infrastructure: Ensure your experimentation platform supports dynamic allocation and real-time data processing (e.g., Apache Flink or Apache Kafka can help manage the data streams effectively).
4. Deploy and monitor: Launch the MAB experiment and continuously monitor the performance of each arm. Adjust parameters like ε in epsilon-greedy or the prior distributions in Thompson Sampling as needed.
5. Analyze and iterate: Regularly analyze the results and iterate on your experimentation strategy. Use the insights gained to refine your models and improve future experiments.

Top Python Libraries for Multi-Armed Bandits

MABWiser
- Overview: MABWiser is a user-friendly library specifically designed for multi-armed bandit algorithms. It supports various MAB strategies like epsilon-greedy, UCB, and Thompson Sampling.
- Capabilities: Easy-to-use API, support for context-free and contextual bandits, online and offline learning.

Vowpal Wabbit (VW)
- Overview: Vowpal Wabbit is a fast and efficient machine learning system that supports contextual bandits, among other learning tasks.
- Capabilities: High performance, scalable, supports contextual bandits with rich feature representations.

Contextual
- Overview: Contextual is a comprehensive library for both context-free and contextual bandits, providing a flexible framework for various MAB algorithms.
- Capabilities: Extensive documentation, support for numerous bandit strategies, and easy integration with real-world data.

Keras-RL
- Overview: Keras-RL is a library for reinforcement learning that includes implementations of bandit algorithms. It is built on top of Keras, making it easy to use with deep learning models.
- Capabilities: Integration with neural networks, support for complex environments, easy-to-use API.

Example using MABWiser:

Python

# Import the MABWiser library
from mabwiser.mab import MAB, LearningPolicy, NeighborhoodPolicy

# Data
arms = ['Arm1', 'Arm2']
decisions = ['Arm1', 'Arm1', 'Arm2', 'Arm1']
rewards = [20, 17, 25, 9]

# Model
mab = MAB(arms, LearningPolicy.UCB1(alpha=1.25))

# Train
mab.fit(decisions, rewards)

# Test
mab.predict()

Example from MABWiser of a context-free MAB setup:

Python

# 1. Problem: A/B testing for website layout design.
# 2. An e-commerce website experiments with 2 different layout options for its homepage.
# 3. Each layout decision leads to generating different revenues.
# 4. What should the choice of layout be, based on historical data?
from mabwiser.mab import MAB, LearningPolicy

# Arms
options = [1, 2]

# Historical data of layout decisions and corresponding rewards
layouts = [1, 1, 1, 2, 1, 2, 2, 1, 2, 1, 2, 2, 1, 2, 1]
revenues = [10, 17, 22, 9, 4, 0, 7, 8, 20, 9, 50, 5, 7, 12, 10]
arm_to_features = {1: [0, 0, 1], 2: [1, 1, 0], 3: [1, 1, 0]}

# Epsilon-greedy learning policy with random exploration set to 15%
greedy = MAB(arms=options,
             learning_policy=LearningPolicy.EpsilonGreedy(epsilon=0.15),
             seed=123456)

# Learn from the past and predict the next best layout
greedy.fit(decisions=layouts, rewards=revenues)
prediction = greedy.predict()

# Expected revenues from historical data and results
expectations = greedy.predict_expectations()
print("Epsilon Greedy: ", prediction, " ", expectations)
assert(prediction == 2)

# More data from online learning
additional_layouts = [1, 2, 1, 2]
additional_revenues = [0, 12, 7, 19]

# Model update and new layout
greedy.partial_fit(additional_layouts, additional_revenues)
greedy.add_arm(3)

# Warm starting the new arm
greedy.warm_start(arm_to_features, distance_quantile=0.5)

Conclusion

Multi-armed bandits offer a sophisticated and scalable alternative to traditional A/B testing, particularly suited for complex experimentation in enterprise settings. By dynamically balancing exploration and exploitation, MABs enhance resource efficiency, provide faster insights, and improve overall performance. For software and machine learning engineers looking to push the boundaries of experimentation, incorporating MABs into your toolkit can lead to significant advancements in optimizing and scaling your experiments. We have touched on just the tip of the iceberg of the rich and actively researched Reinforcement Learning literature; it is more than enough to get you started.
Editor's Note: The following is an article written for and published in DZone's 2024 Trend Report, Cloud Native: Championing Cloud Development Across the SDLC.

2024 and the dawn of cloud-native AI technologies marked a significant jump in computational capabilities. We're experiencing a new era where artificial intelligence (AI) and platform engineering converge to transform cloud computing landscapes. AI is merging with cloud computing, transcending traditional boundaries and offering scalable, efficient, and powerful solutions that learn and improve over time. Platform engineering provides the backbone for these AI systems to operate seamlessly within cloud environments. This shift entails designing, implementing, and managing the software platforms that serve as the fertile ground for AI applications to flourish. Together, the integration of AI and platform engineering in cloud-native environments is not just an enhancement but a transformative force, redefining the very fabric of how services are delivered, consumed, and evolved in the digital cosmos.

The Rise of AI in Cloud Computing

Azure and Google Cloud are pivotal players in cloud computing, each offering a robust suite of AI capabilities that cater to a wide array of business needs. Azure brings to the table its AI Services and Azure Machine Learning, a collection of AI tools that enable developers to build, train, and deploy AI models rapidly, leveraging its vast cloud infrastructure. Google Cloud, on the other hand, shines with its AI Platform and AutoML, which simplify the creation and scaling of AI products and integrate seamlessly with Google's data analytics and storage services. These platforms empower organizations to integrate intelligent decision-making into their applications, optimize processes, and provide insights that were once beyond reach.

A quintessential case study illustrating the successful implementation of AI in the cloud is that of the Zoological Society of London (ZSL), which utilized Google Cloud's AI to tackle the biodiversity crisis. ZSL's "Instant Detect" system harnesses AI on Google Cloud to analyze vast amounts of images and sensor data from wildlife cameras across the globe in real time. This system enables rapid identification and categorization of species, transforming the way conservation efforts are conducted by providing precise, actionable data and leading to more effective protection of endangered species. Implementations such as ZSL's not only showcase the technical prowess of cloud AI capabilities but also underscore their potential to make a significant positive impact on critical global issues.

Platform Engineering: The New Frontier in Cloud Development

Platform engineering is a multifaceted discipline that refers to the strategic design, development, and maintenance of software platforms to support more efficient deployment and application operations. It involves creating a stable and scalable foundation that gives developers the tools and capabilities needed to develop, run, and manage applications without the complexity of maintaining the underlying infrastructure. The scope of platform engineering spans the creation of internal development platforms, automation of infrastructure provisioning, implementation of continuous integration and continuous deployment (CI/CD) pipelines, and assurance of the platforms' reliability and security. In cloud-native ecosystems, platform engineers play a pivotal role.
They are the architects of the digital landscape, responsible for constructing the robust frameworks upon which applications are built and delivered. Their work involves creating abstractions on top of cloud infrastructure to provide a seamless development experience and operational excellence.

Figure 1. Platform engineering from the top down

Platform engineers enable teams to focus on creating business value by abstracting away complexities related to environment configurations, resource scaling, and service dependencies. They ensure that the underlying systems are resilient and self-healing and can be deployed consistently across various environments. The convergence of DevOps and platform engineering with AI tools is an evolution that is reshaping the future of cloud-native technologies. DevOps practices are enhanced by AI's ability to predict, automate, and optimize processes. AI tools can analyze data from development pipelines to predict potential issues, automate root cause analyses, and optimize resources, leading to improved efficiency and reduced downtime. Moreover, AI can drive intelligent automation in platform engineering, enabling proactive scaling, self-tuning of resources, and personalized developer experiences. This synergy creates a dynamic environment where the speed and quality of software delivery are continually advancing, setting the stage for more innovative and resilient cloud-native applications.

Synergies Between AI and Platform Engineering

AI-augmented platform engineering introduces a layer of intelligence to automate processes, streamline operations, and enhance decision-making. Machine learning (ML) models, for instance, can parse through massive datasets generated by cloud platforms to identify patterns and predict trends, allowing for real-time optimizations. AI can automate routine tasks such as network configurations, system updates, and security patches; these automations not only accelerate the workflow but also reduce human error, freeing up engineers to focus on more strategic initiatives.

There are various examples of AI-driven automation in cloud environments, such as intelligent systems that analyze application usage patterns and automatically adjust computing resources to meet demand without human intervention. The resulting cost savings and performance improvements provide exceptional value to an organization. AI-operated security protocols can autonomously monitor and respond to threats more quickly than traditional methods, significantly enhancing the security posture of the cloud environment.

Predictive analytics and ML are particularly transformative in platform optimization. They allow for anticipatory resource management, where systems can forecast loads and scale resources accordingly. ML algorithms can optimize data storage, intelligently archiving or retrieving data based on usage patterns and access frequencies.

Figure 2. AI resource autoscaling

Moreover, AI can oversee and adjust platform configurations, ensuring that the environment is continuously refined for optimal performance. These predictive capabilities are not limited to resource management; they also extend to predicting application failures, user behavior, and even market trends, providing insights that can inform strategic business decisions.
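To make the idea of anticipatory resource management concrete, here is a minimal sketch of forecast-driven autoscaling. The moving-average forecaster, the per-replica capacity, and the headroom factor are all illustrative assumptions; a production system would use a real forecasting model and a platform autoscaler API.

Python

# Minimal sketch of forecast-driven autoscaling; all numbers are illustrative assumptions.
REQUESTS_PER_REPLICA = 500  # assumed capacity of a single replica (req/s)

def forecast_load(recent_loads, window=3):
    """Forecast the next interval's load as a simple moving average."""
    window = min(window, len(recent_loads))
    return sum(recent_loads[-window:]) / window

def replicas_needed(predicted_load, headroom=1.2):
    """Scale to the predicted load plus 20% headroom, never below one replica."""
    return max(1, round(predicted_load * headroom / REQUESTS_PER_REPLICA))

# Simulated requests-per-second samples from the last few intervals
recent_loads = [1200, 1500, 1800]
predicted = forecast_load(recent_loads)
print(f"Predicted load: {predicted:.0f} req/s -> scale to {replicas_needed(predicted)} replicas")

The point is the shift in posture: instead of reacting to CPU alarms after demand arrives, the platform provisions capacity ahead of the forecasted load.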
The proactive nature of predictive analytics means that platform engineers can move from reactive maintenance to a more visionary approach, crafting platforms that are not just robust and efficient but also self-improving and adaptive to future needs.

Changing Landscapes: The New Cloud Native

The landscape of cloud native and platform engineering is rapidly evolving, particularly with leading cloud service providers like Azure and Google Cloud. This evolution is largely driven by the growing demand for more scalable, reliable, and efficient IT infrastructure, enabling businesses to innovate faster and respond to market changes more effectively.

In the context of Azure, Microsoft has been heavily investing in Azure Kubernetes Service (AKS) and serverless offerings, aiming to provide more flexibility and ease of management for cloud-native applications. Azure's emphasis on DevOps, through tools like Azure DevOps and Azure Pipelines, reflects a strong commitment to streamlining the development lifecycle and enhancing collaboration between development and operations teams. Azure's focus on hybrid cloud environments, with Azure Arc, allows businesses to extend Azure services and management to any infrastructure, fostering greater agility and consistency across different environments.

Google Cloud, meanwhile, has been leveraging its expertise in containerization and data analytics to enhance its cloud-native offerings. Google Kubernetes Engine (GKE) stands out as a robust, managed environment for deploying, managing, and scaling containerized applications on Google's infrastructure. Google Cloud's approach to serverless computing, with products like Cloud Run and Cloud Functions, offers developers the ability to build and deploy applications without worrying about the underlying infrastructure. Google's commitment to open-source technologies and its leading-edge work in AI and ML integrate seamlessly into its cloud-native services, providing businesses with powerful tools to drive innovation.

Both Azure and Google Cloud are shaping the future of cloud native and platform engineering by continuously adapting to technological advancements and changing market needs. Their focus on Kubernetes, serverless computing, and seamless integration between development and operations underlines a broader industry trend toward more agile, efficient, and scalable cloud environments.

Implications for the Future of Cloud Computing

AI is set to revolutionize cloud computing, making cloud-native technologies more self-sufficient and efficient. Advanced AI will oversee cloud operations, enhancing performance and cost-effectiveness while enabling services to self-correct. Yet integrating AI presents ethical challenges, especially concerning data privacy and decision-making bias, and poses risks that require solid safeguards. As AI reshapes cloud services, sustainability will be key; future AI must be energy efficient and environmentally friendly to ensure responsible growth.

Kickstarting Your Platform Engineering and AI Journey

To effectively adopt AI, organizations must nurture a culture oriented toward learning and prepare by auditing their IT setup, pinpointing AI opportunities, and establishing data management policies. Further:

- Upskilling in areas such as machine learning, analytics, and cloud architecture is crucial.
- Launching AI integration through targeted pilot projects can showcase the potential and inform broader strategies.
- Collaborating with cross-functional teams and selecting cloud providers with compatible AI tools can streamline the process.
- Balancing innovation with consistent operations is essential for embedding AI into cloud infrastructures.

Conclusion

Platform engineering with AI integration is revolutionizing cloud-native environments, enhancing their scalability, reliability, and efficiency. By enabling predictive analytics and automated optimization, AI ensures cloud resources are effectively utilized and services remain resilient. Adopting AI is crucial for future-proofing cloud applications, and it necessitates foundational adjustments and a commitment to upskilling. The advantages include staying competitive and quickly adapting to market shifts. As AI evolves, it will further automate and refine cloud services, making continued investment in AI a strategic choice for forward-looking organizations.

This is an excerpt from DZone's 2024 Trend Report, Cloud Native: Championing Cloud Development Across the SDLC.
When I think about technical debt, I still remember the first application I created that made me realize the consequences of an unsuitable architecture. It happened back in the late 1990s when I was first getting started as a consultant. The client had requested the use of the Lotus Notes platform to build a procurement system for their customers. Using the Lotus Notes client and a custom application, end users could make requests that would be tracked by the application and fulfilled by the product owner's team. In theory, it was a really cool idea, especially since web-developed applications were not prevalent and everyone used Lotus Notes on a daily basis.

The core problem was that the data was very relational in design, and Lotus Notes was not a relational database. The solution's design required schema management within every Lotus Notes document and leaned on a series of multi-value fields to simulate the relationships between data attributes. It was a mess. A great deal of logic in the Lotus Notes application would not have been required if a better platform had been recommended. The source code was complicated to support. Enhancements to the data structure resulted in major refactoring of the underlying code, not to mention running server-based jobs to convert the existing data. Don't get me started on the effort behind report creation. Since I was early in my career, I focused on providing the solution the client wanted rather than trying to offer a better one.

This was certainly a lesson I learned early in my career, but in the years since that project, I've come to realize that the consequence of architectural technical debt is an unfortunate reality we all face. Let's explore the concept of architectural tech debt a little more at a macro level.

Architectural Tech Debt (ATD)

The Architectural Technical Debt (ATD) Library at Carnegie Mellon University provides the following definition of ATD:

"Architectural technical debt is a design or construction approach that's expedient in the short term, but that creates a technical context in which the same work requires architectural rework and costs more to do later than it would cost to do now (including increased cost over time)."

In "Quick Answer: How to Manage Architecture Technical Debt" (published 09/22/2023), Gartner Group defines ATD as follows:

"Architecture technical debt is that type of technical debt that is caused by architectural drift, suboptimal architectural decisions, violations of defined target product architecture and established industry architectural best practices, and architecture trade-offs made for faster software delivery."

In both cases, benefits that often yield short-term celebrations can be met with long-term challenges, much like my Lotus Notes example mentioned in the introduction. To further complicate matters, tooling to help identify and manage tech debt for software architecture has been missing in comparison to other aspects of software development: for code quality, observability, and SCA, proven tooling exists with products like SonarQube, Datadog, New Relic, GitHub, and Snyk. The software architecture segment, however, has lagged behind without any proven solutions. This is unfortunate, given that ATD is consistently the largest, and most damaging, type of technical debt, as found in the "Measure It? Manage It? Ignore It? Software Practitioners and Technical Debt" 2015 study published by Carnegie Mellon.
The following illustration summarizes Figure 4 from that report, concluding that bad architecture choices were the clear leader among sources of technical debt. If not managed, ATD can continue to grow over time at an increasing rate, as demonstrated in this simple illustration. Without mitigation, architecture debt will eventually reach a breaking point for the underlying solution being measured.

Managing ATD

Before we can manage ATD, we must first understand the problem. Desmond Tutu once wisely said that "there is only one way to eat an elephant: a bite at a time." The shift-left approach embraces the concept of moving a given aspect closer to the beginning of a lifecycle than to the end. This concept gained popularity with shift-left for testing, where the test phase became part of the development process rather than a separate event to be completed after development was finished. Shift-left can be implemented in two different ways for managing ATD:

- Shift-left for resiliency: Identify sources that have an impact on resiliency, and then fix them before they manifest in performance.
- Shift-left for security: Detect and mitigate security issues during the development lifecycle.

Just like shift-left for testing, a prioritized focus on resilience and security during the development phase reduces the potential for unexpected incidents.

Architectural Observability

Architectural observability gives engineering teams the ability to incrementally address architectural drift within their services at a macro level. In fact, the Wall Street Journal reported the cost to fix technical debt at $1.52 trillion earlier this year in its article "The Invisible $1.52 Trillion Problem: Clunky Old Software." To be successful, engineering leadership must be in full alignment with the following organizational objectives:

- Resiliency: To recover swiftly from unexpected incidents.
- Scalability: To scale appropriately with customer demand.
- Velocity: To deliver features and enhancements in line with product expectations.
- Cloud suitability: To transform legacy solutions into efficient cloud-native service offerings.

I recently discovered vFunction's AI-driven architectural observability platform, which is focused on the following deliverables:

- Discover the real architecture of solutions via static and dynamic analysis.
- Prevent architectural drift via real-time views of how services are evolving.
- Increase the resiliency of applications via the elimination of unnecessary dependencies and improvements between application domains and their associated resources.
- Manage and remediate tech debt via AI-driven observability.

Additionally, the vFunction platform provides the side benefit of offering a migration path to transform from monoliths to cloud-native solutions. Once teams have modernized their platforms, they can continuously observe them for ongoing drift. If companies already have microservices, they can use vFunction to detect complexity in distributed applications and address dependencies that impact resiliency and scalability. In either case, once implemented, engineering teams can mitigate ATD well before reaching the breaking point. In the illustration above, engineering teams are able to mitigate technical debt as a part of each release, thanks to the implementation of the vFunction platform and an underlying shift-left approach.
Conclusion

My readers may recall that I have been focused on the following mission statement, which I feel can apply to any IT professional:

"Focus your time on delivering features/functionality that extends the value of your intellectual property. Leverage frameworks, products, and services for everything else." — J. Vester

The vFunction platform adheres to my mission statement by helping engineering teams employ a shift-left approach to the resiliency and security of their services at a macro level. This is an important distinction because, without such tooling, teams are likely to mitigate at a micro level, resolving tech debt that doesn't really matter from an organizational perspective.

When I think back to that application that made me realize the challenges of tech debt, I can't help but think about how that solution yielded more issues than benefits with each feature that was introduced. Certainly, the use of shift-left for resiliency alone would have helped surface issues with the underlying architecture at a point where the cost of considering alternatives was still feasible.

If you are interested in learning more about the vFunction solution, you can read more about them here. Have a really great day!
While I was coding for a performance back-end competition, I tried a couple of tricks and began to wonder whether there was a faster validator for Java applications, so I built a sample application. I used a very simple scenario: just validate the user's email.

Controller With Hibernate Validator

Hibernate Validator needs an object to attach its rules to, so we have this:

Java
public record User(
    @NotNull @Email String email
) {}

This is used in the HibernateValidatorController class, which uses jakarta.validation.Validator (an interface implemented by Hibernate Validator):

Java
@RestController
@Validated
public class HibernateValidatorController {

    @Autowired
    private Validator validator;

    @GetMapping("/validate-hibernate")
    public ResponseEntity<String> validateEmail(@RequestParam String email) {

Using the validate method, we can check whether this user's email is valid and return a proper HTTP response:

Java
        var user = new User(email);
        var violations = validator.validate(user);

        if (violations.isEmpty()) {
            return ResponseEntity.ok("Valid email: 200 OK");
        } else {
            var violationMessages = new StringBuilder();
            for (ConstraintViolation<User> violation : violations) {
                violationMessages.append(violation.getMessage()).append("\n");
            }
            return ResponseEntity.status(HttpStatus.BAD_REQUEST)
                .body("Invalid email: 400 Bad Request\n" + violationMessages);
        }
    }
}

Controller With Regular Expression

For validation with regex, we need just the email regex and a method to validate:

Java
static final String EMAIL_REGEX = "^[A-Za-z0-9+_.-]+@(.+)$";

boolean isValid(String email) {
    return email != null && email.matches(EMAIL_REGEX);
}

The RegexController class just gets an email from the request and uses the isValid method to validate it:

Java
@GetMapping("/validate-regex")
public ResponseEntity<String> validateEmail(@RequestParam String email) {
    if (isValid(email)) {
        return ResponseEntity.ok("Valid email: 200 OK");
    } else {
        return ResponseEntity.status(HttpStatus.BAD_REQUEST).body("Invalid email: 400 Bad Request");
    }
}

Controller With Manual Validation

We won't use any framework or libraries to validate, just plain old String methods:

Java
boolean isValid(String email) {
    if (email == null) return false;
    int atIndex = email.indexOf("@");
    int dotIndex = email.lastIndexOf(".");
    return atIndex > 0 && dotIndex > atIndex + 1 && dotIndex < email.length() - 1;
}

The ProgrammaticController class likewise gets an email from the request and uses its isValid method:

Java
@GetMapping("/validate-programmatic")
public ResponseEntity<String> validateEmail(@RequestParam String email) {
    if (isValid(email)) {
        return ResponseEntity.ok("Valid email: 200 OK");
    } else {
        return ResponseEntity.status(HttpStatus.BAD_REQUEST).body("Invalid email: 400 Bad Request");
    }
}

Very Simple Stress Test

We use Apache JMeter to test all three APIs. The simulation runs 1,000 concurrent users in a loop of 100 iterations, sending a valid email on each request. Running it on my desktop machine produced similar results for all three APIs, but the winner was Hibernate Validator.

| API                           | avg (ms) | 99% (ms) | max (ms) | TPS   |
|-------------------------------|----------|----------|----------|-------|
| Regex API Thread Group        | 18       | 86       | 254      | 17784 |
| Programmatic API Thread Group | 13       | 67       | 169      | 19197 |
| Hibernate API Thread Group    | 10       | 59       | 246      | 19960 |

Conclusion

Before this test, I assumed my own code would perform far better than somebody else's, but Hibernate Validator turned out to be the best option for my test.
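If you want to exercise the three endpoints by hand before (or after) running the load test, a couple of curl calls are enough. A minimal sketch, assuming the Spring Boot app is running on its default port 8080; the endpoint paths come from the controllers above, and the sample addresses are arbitrary:

Shell
# A well-formed address should get a 200 OK from all three endpoints
curl "http://localhost:8080/validate-hibernate?email=user@example.com"
curl "http://localhost:8080/validate-regex?email=user@example.com"
curl "http://localhost:8080/validate-programmatic?email=user@example.com"

# An address with no domain should be rejected with a 400 Bad Request
curl -i "http://localhost:8080/validate-hibernate?email=user@"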
You can also run this test yourself and check out the source code on my GitHub.
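If you clone the project and prefer to drive the load test from the command line rather than the JMeter GUI, JMeter's non-GUI mode is the usual route. A minimal sketch; the test plan filename here is hypothetical:

Shell
# Run the test plan headless and write raw results to a .jtl file
jmeter -n -t validation-stress-test.jmx -l results.jtl

# Generate an HTML dashboard report from the collected results
jmeter -g results.jtl -o report/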
TL;DR: Top Five Rookie Mistakes by Self-Proclaimed Scrum Masters

Are you struggling with imposter syndrome as a new Scrum Master? Avoid the five common rookie mistakes Scrum Masters make. Instead, discover how to set clear Sprint Goals, build trust, balance metrics, and empower your team to make independent decisions. Don’t let early missteps define your journey; learn from these mistakes and transform them into stepping stones toward mastery. By understanding and addressing these pitfalls, you’ll gain confidence, enhance your leadership skills, and truly embody the principles of Scrum. This article provides actionable insights and practical exercises to help you grow from a beginner into an effective and respected Scrum Master.

Rookie Mistakes Scrum Masters Make

Let us delve into the rookie mistakes Scrum Masters make:

1. Ignoring the Importance of the Sprint Goal

Mistake: Treating the Sprint Goal as optional or just a list of tasks, leading to a lack of focus and direction for the Scrum Team.
Why it’s a mistake: Without a clear Sprint Goal, the team lacks a unified purpose, resulting in fragmented efforts, reduced overall value delivery, and difficulty measuring success. The team may become directionless, working on tasks that don’t align with the Product Goal or the strategic objectives of the product in general.
Learning opportunity: True beginners quickly realize the importance of a well-defined Sprint Goal as a beacon guiding the team’s efforts. It fosters collaboration and ensures that the team delivers meaningful value each Sprint. To practice this, try a Sprint Goal workshop before the next Sprint, where the team collaborates to draft a clear, cohesive goal.

2. Micromanaging the Team

Mistake: Acting as a task or project manager, constantly overseeing and directing the work of the Developers.
Why it’s a mistake: Scrum Masters should serve the team by removing impediments and facilitating processes, not controlling the work; the Developers have agency over their part. Micromanagement stifles team autonomy and innovation, leading to reduced morale and a lack of ownership among team members, ultimately hampering productivity and creativity.
Learning opportunity: True beginners learn to trust their team’s capabilities, focusing instead on enabling the team to self-organize and resolve issues independently, which leads to higher engagement and better problem-solving. An exercise to help with this is to refrain from solving issues during a Sprint and instead observe and support the team’s progress.

3. Neglecting To Build Team Trust and Psychological Safety

Mistake: Failing to create an environment of trust and psychological safety where all team members feel comfortable sharing ideas and concerns.
Why it’s a mistake: Without trust and safety, team members are less likely to engage fully, collaborate effectively, or take risks. Neglecting trust stifles innovation and continuous improvement, leading to a work environment with undisclosed problems, lackluster team engagement, and restrained creativity. It can also result in high turnover and low job satisfaction.
Learning opportunity: Proficient Scrum Masters actively work to build and maintain a culture of trust and psychological safety. They encourage open communication and constructive feedback. A practical exercise is to hold regular team-building activities and trust exercises, such as sharing personal success stories, challenges, and failures to build empathy and understanding among team members.
4. Focusing Solely on Metrics and Reporting

Mistake: Overemphasizing metrics, OKRs, and KPIs, turning the Scrum Master role into that of a data-entry clerk burdened with excessive reporting.
Why it’s a mistake: While metrics can provide valuable insights, overemphasis distracts from the true purpose of Scrum: delivering value through collaborative effort and continuous feedback based on frequent releases and an empirical process. A metrics-driven approach can also lead to gaming the system, where team members focus on meeting the metrics rather than creating genuine value, thus distorting the team’s priorities.
Learning opportunity: Effective Scrum Masters balance metrics with qualitative insights, using them to support, not dictate, team decisions and progress. They understand that metrics are tools, not goals in themselves. An exercise to implement this is to periodically review the metrics with the team, discussing their relevance and how they align with actual value delivery, ensuring a balanced approach.

5. Failing To Empower the Team

Mistake: Not empowering the team to make decisions and solve problems, often stepping in to make decisions or resolve conflicts.
Why it’s a mistake: This approach undermines the team’s confidence and ability to self-manage, leading to dependency on the Scrum Master and reduced team ownership of the work. It hampers the team’s growth, creativity, and capacity for innovation, as members are not encouraged to think independently or take initiative.
Learning opportunity: Good Scrum Masters learn to step back and facilitate the team’s decision-making processes, encouraging team members to take ownership of their work and develop their problem-solving skills. A helpful exercise is to use a decision matrix (for example, based on the outcome of a Delegation Poker session) where the team collaboratively decides on solutions to issues without direct intervention from the Scrum Master, promoting autonomy and confidence.

Useful Practices for Beginners To Avoid Rookie Mistakes Scrum Masters Make

Some food for thought for the aspiring learner: there is no need for you to reinvent the wheel.

Embrace Continuous Learning
Scrum Masters should always be on a path of continuous learning. Scrum and agile practices evolve, and so should your understanding and application of them. Seek opportunities for training, certifications, and networking with other Scrum professionals; for example, join the Hands-on Agile Slack community or our Meetup group.

Understand the Organizational Context
Every organization has its unique culture and challenges. Understanding the broader context within which your team operates can help you better support and advocate for Scrum practices. Engage with stakeholders and management to align Scrum with organizational goals. Remember, you cannot change a system at the Scrum team level.

Balance Empathy With Accountability
Building a high-performing team requires a delicate balance of empathy and accountability. While fostering a supportive environment is crucial, holding the team accountable to commitments and quality standards is equally important. Great Scrum teams hold themselves accountable all the time; they are professionals.

Be a Servant Leader
As a Scrum Master, your primary role is to serve the team, for example, by removing impediments, facilitating communication, and supporting the team’s self-organization. The team’s success is the measure of your success, so focus on empowering them.
Adaptability Is Key
No two teams are the same, and what works for one might not work for another. Be flexible and willing to adapt your approach based on the team’s needs and feedback. Continuously inspect and adapt not just the team’s processes but also your own practices and mindset.

Foster a Growth Mindset
Encourage a culture where failure is seen as an opportunity to learn and grow. Such a culture, coupled with a growth mindset, can significantly enhance the team’s ability to innovate and improve continuously. Celebrate successes, but also openly discuss failures and the lessons learned from them.

Value Feedback Loops
Feedback is the cornerstone of continuous improvement. Make sure your team regularly seeks and gives feedback, not just during formal events like Sprint Reviews and Retrospectives but also in daily interactions. Feedback taken seriously helps identify issues early and promotes a culture of transparency and improvement.

Conclusion

The critical difference between the rookie mistakes of the ignorant imposter and the actions of a learning beginner is the willingness to reflect, adapt, and grow from experience. Self-proclaimed experts who misunderstand and, consequently, misapply the principles of Scrum fail to recognize and rectify their mistakes. True beginners, by contrast, use these early missteps as stepping stones toward becoming effective and respected Scrum Masters.
Editor's Note: The following is an article written for and published in DZone's 2024 Trend Report, Cloud Native: Championing Cloud Development Across the SDLC.

Cloud native and observability are an integral part of developers' lives. Understanding their responsibilities within observability at scale helps developers tackle the challenges they face on a daily basis. There is more to observability than just collecting and storing data, and developers are essential to surviving these challenges.

Observability Foundations

Gone are the days of monitoring a known application environment, debugging services within our development tooling, and waiting for new resources to deploy our code to. Deployment environments have become dynamic and agile, with auto-scaling infrastructure quickly available in final production. Developers now strive to observe everything they create, from development to production, often owning their code for its entire lifecycle. The tooling of days old, such as Nagios and HP OpenView, can't keep up with constantly changing cloud environments that contain thousands of microservices. The infrastructure for cloud-native deployments is designed to scale dynamically as needed, making it even more essential for observability platforms to condense all that data noise and detect trends leading to downtime before it happens.

Splintering of Responsibilities in Observability

Cloud-native complexity not only changed the developer world but also impacted how organizations are structured. The responsibilities of creating, deploying, and managing cloud-native infrastructure have split across a series of new organizational teams. Developers are tasked with more than just code creation and are expected to adopt hybrid roles within some of these new teams. Observability teams have been created to focus on specific aspects of the cloud-native ecosystem and provide their organization a service within the cloud infrastructure. In Table 1, we can see the splintering of traditional roles into these teams with specific focuses.
Table 1. Who's who in the observability game

| Team | Focus | Maturity goals |
|------|-------|----------------|
| DevOps | Automation and optimization of the app development lifecycle, including post-launch fixes and updates | Early stages: developer productivity |
| Platform engineering | Designing and building toolchains and workflows that enable self-service capabilities for developers | Early stages: developer maturity and productivity boost |
| CloudOps | Providing organizations proper (cloud) resource management, using DevOps principles and IT operations applied to cloud-based architectures to speed up business processes | Later stages: cloud resource management, costs, and business agility |
| SRE | All-purpose role aiming to manage reliability for any type of environment; a full-time job avoiding downtime and optimizing performance of all apps and supporting infrastructure, regardless of whether it's cloud native | Early to late stages: on-call engineers trying to reduce downtime |
| Central observability team | Defining observability standards and practices, delivering key data to engineering teams, and managing tooling and observability data storage | Later stages: owning monitoring standards and practices, delivery of monitoring data to engineering teams, reliability and stability measurement of monitoring solutions, and management of tooling and metrics storage |

To understand how these teams work together, imagine a large, mature, cloud-native organization that has all the teams featured in Table 1:

The DevOps team is the first line for standardizing how code is created, managed, tested, updated, and deployed. They work with the toolchains and workflows provided by the platform engineering team, advising on new tooling and workflows and creating continuous improvements to both.
A CloudOps team focuses on cloud resource management and on getting the most out of the cloud budgets spent by the other teams.
An SRE team is on call to manage reliability, avoiding downtime for all supporting infrastructure in the organization. They provide feedback to all the teams to improve tools, processes, and platforms.
The overarching central observability team sets the observability standards for all teams to adhere to, delivering the right observability data to the right teams and managing tooling and data storage.

Why Observability Is Important to Cloud Native

Today, cloud-native usage has grown to the point that developers are overwhelmed by vast responsibilities that go beyond just coding. The complexity introduced by cloud-native environments means that observability is becoming essential to solving many of the challenges developers face.

Challenges

Increasing cloud-native complexity means that developers are delivering more code faster, and that code must pass more rigorous testing to ensure applications work at cloud-native scale. These challenges expanded the need for observability within what was traditionally the developers' coding environment. Not only do developers need to provide code and testing infrastructure for their applications, they are also required to instrument that code so that business metrics can be monitored. Over time, developers learned that fully automated instrumentation was overkill, with much of the resulting data being unnecessary. This led them to fine-tune their instrumentation methods and turn to manual instrumentation, where only the metrics they actually need are collected. Another challenge arises when organizations decide to integrate existing application landscapes with new observability practices.
The time developers spend manually instrumenting existing applications so that they provide the needed data to an observability platform is an often overlooked burden. New observability tools designed to help with metrics, logs, and traces are introduced to development teams, leading to more challenges. Often, these tools are mastered by only a few, creating siloed knowledge; the result is organizations paying premium prices for advanced observability tools only to use a small fraction of their capabilities. Finally, when exploring the data ingested from our cloud infrastructure, the first thing that becomes obvious is that we don't need to keep everything being ingested. We need control over our telemetry data and a way to find out what goes unused by our observability teams. There are questions we need to answer about how we can:

Identify ingested data that is not used in dashboards or alerting rules, nor touched in ad hoc queries by our observability teams
Control telemetry data with aggregation and rules before putting it into expensive, longer-term storage
Use only the telemetry data needed to support the monitoring of our application landscape

Tackling the flood of cloud data so as to filter out unused telemetry, keeping only what serves our observability needs, is crucial to making this data valuable to the organization.

Cloud Native at Scale

The use of cloud-native infrastructure brings a lot of flexibility, but at scale, the small complexities can become overwhelming. This is due to the premise of cloud native, where we describe how our infrastructure should be set up, how our applications and microservices should be deployed, and how everything automatically scales when needed. This approach reduces our control over how our production infrastructure reacts to surges in customer usage of an organization's services.

Empowering Developers

Empowering developers starts with platform engineering teams that focus on developer experience. We create developer experiences in our organization that treat observability as a priority, dedicating resources to creating a telemetry strategy from day one. In this culture, we set up development teams for success with cloud infrastructure, using observability alongside testing, continuous integration, and continuous deployment. Developers not only own the code they deliver but are now encouraged and empowered to create, test, and own the telemetry data from their applications and microservices. This is a brave new world where they are the owners of their work, providing agility and consensus within the various teams working on cloud solutions. Rising to the challenges of observability in a cloud-native world is a success metric for any organization, and organizations can't afford to get it wrong. Observability needs to be front of mind for developers, considered a first-class citizen in their daily workflows, consistently helping them with the challenges they face.

Artificial Intelligence and Observability

Artificial intelligence (AI) has risen in popularity not only within developer tooling but also in the observability domain. The application of AI in observability falls into one of two use cases:

Monitoring machine learning (ML) solutions or large language model (LLM) systems
Embedding AI into the observability tooling itself as an assistant

The first case is when you want to monitor specific AI workloads, such as ML or LLMs.
These workloads can be further split into two situations you might want to monitor: the training platform and the production platform. Training infrastructure and the processes involved can be approached just like any other workload: monitoring is easy to achieve using instrumentation and existing methods, such as observing specific traces through a solution. This is not the complete monitoring story for these solutions, but out-of-the-box observability tooling is quite capable of supporting infrastructure and application monitoring for such workloads.

The second case is when AI assistants, such as chatbots, are included in the observability tooling that developers are exposed to. This is often in the form of a code assistant, such as one that helps fine-tune a dashboard or query time series data ad hoc. While these are nice to have, organizations are very mindful of developers inputting queries that include proprietary or sensitive data. It's important to understand that training these tools might include using proprietary data in their training sets, or even the data developers input, to further train the agents for future query assistance. Predicting the future of AI-assisted observability is not going to be easy, as organizations consider their data one of their most valued assets and will continue to protect it from usage outside of their control. To that end, one direction that might help adoption is to have agents trained only on in-house data, though that means the training data is smaller than for publicly available agents.

Cloud-Native Observability: The Developer Survival Pattern

While we as developers spend a lot of time on tooling, we all understand that tooling is not always the fix for the complex problems we face. Observability is no different. While developers are often exposed to the mantra of metrics, logs, and traces for solving their observability challenges, this is not a path to follow without considering the big picture. The amount of data generated in cloud-native environments, especially at scale, makes it impossible to keep collecting all of it. This flood of data, the challenges that arise, and the inability to sift through the information to find the root causes of issues become detrimental to the success of development teams. It would be more helpful if developers were supported with just the right amount of data, in just the right forms, at just the right time to solve issues. Nobody minds how observability is achieved if solutions to problems are found quickly, situations are remediated faster, and developers are satisfied with the results. If this is done with one log line, two spans from a trace, and three metric labels, then that's all we want to see.

To do this, developers need to know when issues arise with their applications or services, preferably before they happen. They start troubleshooting with data that their instrumented applications have already narrowed down to succinctly point at areas within the offending application. Good tooling lets the investigating developer see dashboards reporting visual information that directs them to the problem and the moment it likely started. It is crucial that developers be able to remediate the problem, perhaps by rolling back a code change or deployment, so the application can continue to support customer interactions. Figure 1 illustrates the path cloud-native developers take when solving observability problems.
The last step for any developer is to determine how the issues encountered can be prevented going forward.

Figure 1. Observability pattern

Conclusion

Observability is essential for organizations to succeed in a cloud-native world. The splintering of responsibilities in observability, along with the challenges that cloud-native environments bring at scale, cannot be ignored. Understanding the challenges that developers face in cloud-native organizations is crucial to achieving observability happiness. Empowering developers, providing ways to tackle observability challenges, and understanding how the future of observability might look are the keys to handling observability in modern cloud environments.

DZone Refcard resources:

Full-Stack Observability Essentials by Joana Carvalho
Getting Started With OpenTelemetry by Joana Carvalho
Getting Started With Prometheus by Colin Domoney
Getting Started With Log Management by John Vester
Monitoring and the ELK Stack by John Vester

This is an excerpt from DZone's 2024 Trend Report, Cloud Native: Championing Cloud Development Across the SDLC.
Editor's Note: The following is an article written for and published in DZone's 2024 Trend Report, Cloud Native: Championing Cloud Development Across the SDLC.

Simplicity is a key selling point of cloud technology. Rather than worrying about racking and stacking equipment, configuring networks, and installing operating systems, developers can just click through a friendly web interface and quickly deploy an application. Of course, that friendly web interface hides serious complexity, and deploying an application is just the first and easiest step toward a performant and reliable system. Once an application grows beyond a single deployment, issues begin to creep in: new versions require database schema changes or added components, and multiple team members can change configurations. The application must also be scaled to serve more users, provide redundancy to ensure reliability, and manage backups to protect data. While it might be possible to manage this complexity using that friendly web interface, we need automated cloud orchestration to deliver consistently at speed. There are many choices for cloud orchestration, so which one is best for a particular application? Let's use a case study to consider two key decisions in the trade space:

The number of different technologies we must learn and manage
Our ability to migrate to a different cloud environment with minimal changes to the automation

Before we look at the case study, let's start by understanding some must-have features of any cloud automation.

Cloud Orchestration Must-Haves

Our goal with cloud orchestration automation is to manage the complexity of deploying and operating a cloud-native application. We want to be confident that we understand how our application is configured, that we can quickly restore the application after outages, and that we can manage changes over time with confidence in bug fixes and new capabilities while avoiding unscheduled downtime.

Repeatability and Idempotence

Cloud-native applications use many cloud resources, each with different configuration options. Problems with infrastructure or applications can leave resources in an unknown state. Even worse, our automation might fail partway through due to network or configuration issues. We need to be able to run our automation confidently, even when cloud resources are in an unknown state. This key property is called idempotence, and it simplifies our workflow: we can run the automation no matter the current system state and be confident that successful completion places the system in the desired state. Idempotence is typically accomplished by having the automation check the current state of each resource, including its configuration parameters, and apply only the necessary changes. This kind of smart resource application demands dedicated orchestration technology rather than simple scripting.

Change Tracking and Control

Automation needs to change over time as we respond to changes in application design or scaling needs. As needs change, we must manage automation changes, because dueling versions would defeat the purpose of idempotence. This means we need Infrastructure as Code (IaC), where cloud orchestration automation is managed identically to other developed software, including change tracking and version management, typically in a Git repository such as this example. Change tracking helps us identify the source of issues sooner by knowing what changes have been made.
For this reason, we should modify our cloud environments only by automation, never manually, so we know that the repository matches the system state and so we can ensure changes are reviewed, understood, and tested prior to deployment.

Multiple Environment Support

To test automation prior to production deployment, we need our tooling to support multiple environments. Ideally, we can support rapid creation and destruction of dynamic test environments, because this increases confidence that there are no lingering manual configurations and enables us to test our automation by using it. Even better, dynamic environments allow us to easily test changes to the deployed application, creating unique environments for developers, complex changes, or staging purposes prior to production. Cloud automation accomplishes multi-environment support through variables or parameters passed from a configuration file, environment variables, or the command line (a brief command sketch appears in the case study below).

Managed Rollout

Together, idempotent orchestration, a Git repository, and rapid deployment of dynamic environments bring the concept of dynamic environments to production, enabling managed rollouts for new application versions. There are multiple managed rollout techniques, including blue-green deployments and canary deployments. What they have in common is that a rollout consists of separately deploying the new version, transitioning users over to it either all at once or incrementally, and then removing the old version. Managed rollouts can eliminate application downtime when moving to new versions, and they enable rapid detection of problems coupled with automated fallback to a known working version. However, a managed rollout is complicated to implement, as not all cloud resources support it natively, and changes to application architecture and design are typically required.

Case Study: Implementing Cloud Automation

Let's explore the key features of cloud automation in the context of a simple application. We'll deploy the same application using both a cloud-agnostic approach and a single-cloud approach to illustrate how both solutions provide the necessary features of cloud automation, but with differences in implementation and various advantages and disadvantages. Our simple application is based on Node, backed by a PostgreSQL database, and provides an interface to create, retrieve, update, and delete a list of to-do items. The full deployment solutions can be seen in this repository. Before we look at the differences between the two deployments, it's worth considering what they have in common:

Use a Git repository for change control of the IaC configuration
Are designed for idempotent execution, so both have a simple "run the automation" workflow
Allow for configuration parameters (e.g., cloud region data, unique names) that can be used to adapt the same automation to multiple environments

Cloud-Agnostic Solution

Our first deployment, as illustrated in Figure 1, uses Terraform (or OpenTofu) to deploy a Kubernetes cluster into a cloud environment. Terraform then deploys a Helm chart containing both the application and the PostgreSQL database.

Figure 1. Cloud-agnostic deployment automation

The primary advantage of this approach, as seen in the figure, is that the same deployment architecture is used to deploy to both Amazon Web Services (AWS) and Microsoft Azure. The container images and Helm chart are identical in both cases, and the Terraform workflow and syntax are also identical.
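To make that identical workflow concrete, here is a minimal sketch of an environment-parameterized Terraform run. The workspace and variable-file names are hypothetical; the terraform subcommands and flags are standard:

Shell
# One-time setup, then an isolated state per environment
terraform init
terraform workspace new staging   # or: terraform workspace select staging

# Preview and apply the same configuration with environment-specific parameters
terraform plan -var-file="staging.tfvars"
terraform apply -var-file="staging.tfvars"

# Because the automation is idempotent, re-running apply against an
# unchanged environment reports that there is nothing to do
terraform apply -var-file="staging.tfvars"

Running the same pair of commands with a production.tfvars file (and its own workspace) would drive production, which is what keeps the workflow identical across environments and clouds.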
Additionally, we can test container images, Kubernetes deployments, and Helm charts separately from the Terraform configuration that creates the Kubernetes environment, making it easy to reuse much of this automation to test changes to our application. Finally, with Terraform and Kubernetes, we're working at a high level of abstraction, so our automation code is short but can still take advantage of the reliability and scalability capabilities built into Kubernetes. For example, an entire Azure Kubernetes Service (AKS) cluster is created in about 50 lines of Terraform configuration via the azurerm_kubernetes_cluster resource:

HCL
resource "azurerm_kubernetes_cluster" "k8s" {
  location = azurerm_resource_group.rg.location
  name     = random_pet.azurerm_kubernetes_cluster_name.id
  ...
  default_node_pool {
    name       = "agentpool"
    vm_size    = "Standard_D2_v2"
    node_count = var.node_count
  }
  ...
  network_profile {
    network_plugin    = "kubenet"
    load_balancer_sku = "standard"
  }
}

Even better, the Helm chart deployment is just five lines and is identical for AWS and Azure:

HCL
resource "helm_release" "todo" {
  name       = "todo"
  repository = "https://book-of-kubernetes.github.io/helm/"
  chart      = "todo"
}

However, a cloud-agnostic approach brings additional complexity. First, we must create and maintain configuration using multiple tools, which requires understanding Terraform syntax, Kubernetes manifest YAML, and Helm templates. Also, while the overall Terraform workflow is the same, the cloud provider configuration differs due to differences in Kubernetes cluster configuration and authentication, meaning that adding a third cloud provider would require significant effort. Finally, if we wanted to use additional features such as cloud-native databases, we'd first need to understand the key configuration details of that cloud provider's database and then understand how to apply that configuration using Terraform. In short, we pay an additional price in complexity for each native cloud capability we use.

Single-Cloud Solution

Our second deployment, illustrated in Figure 2, uses AWS CloudFormation to deploy an Elastic Compute Cloud (EC2) virtual machine and a Relational Database Service (RDS) cluster:

Figure 2. Single-cloud deployment automation

The biggest advantage of this approach is that we create a complete application deployment solution entirely in CloudFormation's YAML syntax. By using CloudFormation, we work directly with AWS cloud resources, so there's a clear correspondence between resources in the AWS web console and our automation. As a result, we can take advantage of the specific cloud resources best suited to our application, such as RDS for our PostgreSQL database. Using the best resources for our application helps us manage its scalability and reliability needs while also managing our cloud spend. The tradeoff in exchange for this simplicity and clarity is a more verbose configuration: we're working at the level of specific cloud resources, so we have to specify each resource, including items such as routing tables and subnets that Terraform configures automatically.
The resulting CloudFormation YAML is 275 lines and includes low-level details such as egress routing from our VPC to the internet:

YAML
TodoInternetRoute:
  Type: AWS::EC2::Route
  Properties:
    DestinationCidrBlock: 0.0.0.0/0
    GatewayId: !Ref TodoInternetGateway
    RouteTableId: !Ref TodoRouteTable

Also, of course, the resources and configuration are AWS-specific, so if we wanted to adapt this automation to a different cloud environment, we would need to rewrite it from the ground up. Finally, while we can easily adapt this automation to create multiple deployments on AWS, it is not as flexible for testing changes to the application, as we have to deploy a full RDS cluster for each new instance.

Conclusion

Our case study enabled us to exhibit the key features and tradeoffs of cloud orchestration automation. There are many more options than just these two, but whatever solution is chosen should use an IaC repository for change control and a tool that provides idempotence and support for multiple environments. Within that cloud orchestration space, our deployment architecture and tool selection will be driven by the importance of portability to new cloud environments, weighed against the cost in additional complexity.

This is an excerpt from DZone's 2024 Trend Report, Cloud Native: Championing Cloud Development Across the SDLC.
Editor's Note: The following is an article written for and published in DZone's 2024 Trend Report, Cloud Native: Championing Cloud Development Across the SDLC.

When it comes to software engineering and application development, cloud native has become commonplace in many teams' vernacular. When people survey the world of cloud native, they often come away with the perspective that the entire process of cloud native is meant for large enterprise applications. A few years ago, that may have been the case, but with the advancement of tooling and services surrounding systems such as Kubernetes, the barrier to entry has been substantially lowered. Even so, does adopting cloud-native practices for applications consisting of a few microservices make a difference? Just as cloud native has become commonplace, the shift-left movement has made inroads into many organizations' processes. Shifting left is a focus on application delivery from the outset of a project, where software engineers are just as focused on the delivery process as they are on writing application code. Shifting left implies that software engineers understand deployment patterns and technologies and implement them earlier in the SDLC. Shifting left using cloud native with microservices development may sound like a definition containing a string of contemporary buzzwords, but there's real benefit to be gained in combining these closely related topics.

Fostering a Deployment-First Culture

Process is necessary within any organization. Processes are broken down into manageable tasks across multiple teams, with the objective being an efficient path by which an organization sets out to reach a goal. Unfortunately, organizations can get lost in their processes. Teams and individuals focus on doing their tasks as well as possible, at times so much so that the goal for which the process was defined gets lost. Software development lifecycle (SDLC) processes are not immune to this problem. In any given organization, if individuals on application development teams are asked how they perceive their objectives, responses can include:

"Completing stories"
"Staying up to date on recent tech stack updates"
"Ensuring their components meet security standards"
"Writing thorough tests"

Most of these answers demonstrate a commitment to the process, which is good. However, what is the goal? The goal of the SDLC is to build software and deploy it. Whether it be an internal or SaaS application, deploying software helps an organization meet an objective. When presented with the statement that the goal of the SDLC is to deliver and deploy software, just about anyone who participates in the process would say, "Well, of course it is." Yet teams often lose sight of this "obvious" directive because they're far removed from the actual deployment process. A strategic investment in the process can close that gap. Cloud-native abstractions bring a common domain and dialogue across disciplines within the SDLC, and Kubernetes is a good basis upon which those abstractions can be leveraged. Not only does Kubernetes' usefulness span applications of many shapes and sizes, but when it comes to the SDLC, Kubernetes can also be the environment used on systems ranging from local engineering workstations, through the entire delivery cycle, and on to production.
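Standing up such a workstation cluster has become a small task. As a rough sketch, one way is kind (Kubernetes in Docker); Docker and kind are assumed to be installed, and the cluster name is arbitrary:

Shell
# Create a local, disposable Kubernetes cluster for day-to-day development
kind create cluster --name dev

# Verify the cluster is up and set as the current kubectl context
kubectl cluster-info --context kind-dev

# Tear it down when finished; rebuilding takes only a minute or two
kind delete cluster --name dev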
Bringing the deployment platform all the way "left" to an engineer's workstation has everyone in the process speaking the same language, and deployment becomes a focus from the beginning of the process. Various teams in the SDLC may look at "Kubernetes everywhere" with skepticism, but the work done to reduce Kubernetes' footprint for systems such as edge devices has made running Kubernetes on a workstation very manageable. Introducing teams to Kubernetes through automation allows them to absorb the platform iteratively. The most important thing is building a deployment-first culture.

Plan for Your Deployment Artifacts

With all teams and individuals focused on the goal of getting their applications to production as efficiently and effectively as possible, how does the evolution of application development shift? The shift is subtle. With a shift-left mindset, there aren't necessarily a lot of new tasks; the shift is in where the tasks take place within the overall process. When a detailed discussion of application deployment begins with the first line of code, existing processes may need to be updated.

Build Process

If software engineers are to deploy to their personal Kubernetes clusters, are they able to build and deploy enough of an application that they're not reliant on code running on a system beyond their workstation? And there is more to consider than just application code. Is a database required? Does the application use a caching system? It can be challenging to review an existing build process and refactor it for workstation use. The CI/CD build process may need to be re-examined to consider how it can be invoked on a workstation. For most applications, the build process can be refactored in such a way that the goal of local build and deployment is met while the refactored process is also used in the existing CI/CD pipeline. For new projects, begin by designing the build process for the workstation; it can then be added to a CI/CD pipeline. The local build and CI/CD build processes should share as much code as possible, which keeps the entire team up to date on how the application is built and deployed.

Build Artifacts

The primary deliverables of a build process are its build artifacts. For cloud-native applications, these include container images (e.g., Docker images) and deployment packages (e.g., Helm charts). When an engineer executes the build process on their workstation, the artifacts will likely need to be published to a repository, such as a container registry or chart repository. The build process must therefore be aware of context. Existing processes may already be context-aware, with settings for environments ranging from test and staging to production; workstation builds become an additional context. Given this awareness, build processes can publish artifacts to workstation-specific registries and repositories. For cloud-native development, and in keeping with the local workstation paradigm, container registries and chart repositories are deployed as part of the workstation Kubernetes cluster. As the process moves from build to deploy, maintaining build context includes accessing resources within the current context.

Parameterization

Central to this entire process is that key components of the build and deployment process definition cannot be duplicated per runtime environment; a single parameterized process must serve every context, as the sketch below illustrates.
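Here is one rough sketch of such a context-aware flow, building and deploying against a workstation registry. All names (the registry address, image name, chart path, and values file) are hypothetical; the docker and helm commands themselves are standard:

Shell
# Build context is supplied by parameters, not by a separate script per environment
REGISTRY=localhost:5000               # workstation registry running inside the local cluster
TAG=dev-$(git rev-parse --short HEAD) # image tag derived from the current commit

# Build and publish the container image to the current context's registry
docker build -t "$REGISTRY/todo:$TAG" .
docker push "$REGISTRY/todo:$TAG"

# Deploy the chart with context-specific parameters (image, replicas, resources)
helm upgrade --install todo ./chart \
  -f values-workstation.yaml \
  --set image.repository="$REGISTRY/todo" \
  --set image.tag="$TAG" \
  --set replicaCount=1

The CI/CD pipeline would invoke this same flow with production-grade parameter values, which is exactly the single-process property argued for next.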
If a container image is built and published one way on the local workstation and another way in the CI/CD pipeline, how long will it be before the two diverge? Most likely, sooner than expected. Divergence in a build process creates divergence across environments, which leads to divergence between teams and erodes the deployment-first culture. That may sound a bit dramatic, but as soon as any code forks, without a deliberate plan to merge the forks, the code eventually becomes, for all intents and purposes, unmergeable. Parameterizing the build and deployment process is required to maintain a single set of build and deployment components. Parameters define build context, such as the registries and repositories to use. Parameters define deployment context as well, such as the number of pod replicas to deploy or resource constraints. As the process is created, lean toward over-parameterization: it's easier to maintain a parameter as a constant than to extract a parameter from an existing process.

Figure 1. Local development cluster

Cloud-Native Microservices Development in Action

In addition to a deployment-first culture, cloud-native microservices development requires tooling support that doesn't impede the day-to-day tasks performed by an engineer. If engineers can be shown a new pattern for development that allows them to be more productive with only a minimum-to-moderate level of understanding of new concepts, while still using their favorite tools, they will embrace the paradigm. While engineers may push back or be skeptical about a new process, once the impact on their productivity is tangible, they will be energized to adopt the new pattern.

Easing Development Teams Into the Process

Changing culture is about getting teams on board with adopting a new way of doing something. The next step is execution. Shifting left requires that software engineers move from designing and writing application code to becoming an integral part of the design and implementation of the entire build and deployment process. This means learning new tools and exploring areas in which they may not have a great deal of experience. Human nature tends to resist change. Software engineers may look at this entire process and think, "How can I absorb this new process and these new tools while trying to maintain a schedule?" It's a valid question. However, software engineers are typically fine with incorporating a new development tool or process that helps them and the team without drastically disrupting their daily routine. Whether beginning a new project or refactoring an existing one, adopting a shift-left engineering process requires introducing new tools in a way that allows software engineers to remain productive while iteratively learning the new tooling. This starts with automating and documenting the build-out of their new development environment: their local Kubernetes cluster. It also requires listening to the team's concerns and suggestions, as this will be their daily environment.

Dev(elopment) Containers

The Development Containers specification is a relatively new advancement based on an existing concept in supporting development environments. Many engineering teams have leveraged virtual desktop infrastructure (VDI) systems, where a developer's workstation is hosted on virtualized infrastructure.
Companies that implement VDI environments like the centralized control they provide, and software engineers like the idea of a pre-packaged environment that contains all the components required to develop, debug, and build an application. What software engineers do not like about VDI environments are the network issues that make their IDEs sluggish and frustrating to use. Development containers leverage the same concept as VDI environments but bring it to the local workstation, allowing engineers to use their locally installed IDE while connected to a running container. The engineer gets the experience of local development while working inside a running container. Development containers do require an IDE that supports the pattern. What makes development containers so attractive is that engineers can attach to a container running within a Kubernetes cluster and access services as configured for an actual deployment. In addition, development containers support a first-class development experience, including all the tools a developer would expect in a development environment. From a broader perspective, development containers aren't limited to local deployments: when configured for access, cloud environments can provide the same first-class development experience. Here, the deployment abstraction provided by container orchestration layers really shines.

Figure 2. Microservice development container configured with dev containers

The Synergistic Evolution of Cloud-Native Development Continues

There's a synergy across shift-left, cloud-native, and microservices development. Together, they present a pattern for application development that can be adopted by teams of any size. Tooling continues to evolve, making practical use of the technologies involved in cloud-native environments accessible to everyone in the application delivery process. It is a culture change that entails a shift in mindset while learning new processes and technologies. It's important that teams aren't burdened with a collection of manual processes where they feel their productivity is being lost. Automation helps ease teams into adopting the pattern and technologies. As with any other organizational change, upfront planning and preparation are important. Just as important is involving the teams in the plan: when individuals have a say in change, ownership and adoption become a natural outcome.

This is an excerpt from DZone's 2024 Trend Report, Cloud Native: Championing Cloud Development Across the SDLC.