Using Preview Environments To Accelerate UX Feedback

Ricardo Simoes
10 min read · Apr 5, 2023


Depending on the project, agile workflows can be further adapted to fit the needs of not only the team, but also the business. Using Kubernetes and automated pipelines, the feedback loop for UX changes can be shortened with the addition of preview environments. The result is fewer round trips to fulfill UX requirements, which increases the business value generated per time spent.

Introduction

Over time, the way development teams are managed and accomplish work has shifted from strict, static structures to fluid ones, where barriers to interaction steadily decrease and communicative workflows take hold. While agile workflows certainly have downsides, the opportunities they offer can be very beneficial: a quicker feedback and test loop has proven again and again to benefit not only the development teams, but also the business behind a given project. In addition, automated pipelines, often referred to as CI/CD pipelines, allow teams to use their time more effectively by automating away repetitive workflows — for example, by automatically deploying new features for the business to evaluate quickly. But this is not a task with a set finish line. New features, user groups, devices, and requirement changes can all force changes to existing workflows, or demand completely new ones. Even simple metrics, such as developer velocity or how many tickets are reopened, can reveal cases worth optimizing.

Finding problematic cases

The first step is to look at the current environment of the project. In ours, we use a trunk-based development workflow with short-lived feature branches. These are merged into the development branch using pull requests, where we also find our first set of triggers and automated workflows. These workflows involve several steps, from unit tests and system tests to image builds. While some might argue that this goes against trunk-based development, where the idea is to commit often and fast to the development branch, these flows let us optimize the time of the individual developer. They take care of repetitive tasks like testing, which also helps reviewers assess the quality of a change. The reviewer can therefore focus on the important questions: Does the implementation fulfill the requirements of the business? Are both positive and negative test cases present that cover real use cases? Do the coding style, naming, and so on fit the project? This helps our developers focus on one thing: generating more business value per time spent on the project.

Figure 1: Example trunk-based development workflow using short-lived feature branches.
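A minimal sketch of such a pull-request pipeline, assuming a GitHub Actions-style setup (the article does not name a specific CI system; job names, make targets, and the registry URL are illustrative):

```yaml
# Illustrative PR pipeline: run tests, then build a container image,
# for every pull request targeting the development branch.
name: pr-checks
on:
  pull_request:
    branches: [development]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Unit tests
        run: make test          # hypothetical target
      - name: System tests
        run: make system-test   # hypothetical target
  build:
    needs: test                 # only build if tests pass
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Build container image tagged with the commit hash
        run: docker build -t registry.example.com/app:${{ github.sha }} .
```

Gating the image build on the test job keeps reviewers from ever seeing a green build for code whose tests fail.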

This workflow, with all its triggers and automations, is relatively easy to integrate and get right, and can be used for backend as well as frontend code, which works well enough in our monolithic application. One important difference is that the frontend has requirements not only on the code, but also on the user experience. While automated tests can cover some of this, you always need a first session where actual users tell you how they feel about it. This is called User Acceptance Testing (UAT), and it allows new UX features and changes to be evaluated with a selected group of users before rolling them out to the masses. In our use case, as soon as a pull request is merged into the development branch, it is automatically deployed to the staging environment, where finished tickets can be accessed and evaluated. The same applies to UX changes: once they are merged into development, they are deployed and UAT can commence. Figure 2 shows an example git history of this workflow.

Figure 2: Git history with visualization of staging environment deployments in a normal workflow

The example illustrated in figure 2 is relatively simple, but shows a critical issue clearly. For a new feature x, one ticket implements the required backend changes. Frontend work on the new elements usually starts at the same time, but is blocked until the API is finished. Once the frontend is also merged into the development branch, the evaluation of feature x can start through UAT. As our deployments to production are triggered manually, we don't risk accidentally pushing changes to users.

The UAT can go several ways: the testers can accept the change as is, they can find bugs related to uncovered business edge cases, or they might conclude that the design simply does not fit the users' workflow.

The feedback loop here is beneficial for everyone: the team can catch problems before they reach the production environment and anger users with implementations that just don't fulfill their needs — problems which would then require short-term action that disrupts the development team.
Still, there are several pain points in this workflow. The first is repetitive time sinks. Remember: each time a pull request is made, code reviews are done. This requires the reviewers to understand the requirements at hand, the changes, and whether they appear to meet the goal. It also increases the workload on the developers, who need to constantly go back to tickets that were "finished", which limits their efficiency. On top of that, releasing becomes more complex: releasing now would mean cherry-picking single changes to push forward, which adds manual work and can delay the release of other important features. In one case, UAT had problems correctly testing a new feature because two simultaneous changes were deployed with no clear distinction of where each came from, which delayed the testing of both.

As shown, we have a case that does not work perfectly smoothly with our current workflows. This requires us to change the workflow to better cover, or completely eliminate, the defined pain points.

Improving the feedback loop

To address this, the workflow is adapted. The first step is always to find the pain points and their root cause. The pain points have already been described above; the remaining question is the root cause. Figure 2 makes it easy to see: there is too big a gap between the frontend developer implementing UX changes and the UAT team analyzing them. When the developer thinks they are done, only half of the ticket is actually done; they may well have to reopen it several times in the following days for UAT-related fixes and changes. By the time the feedback from UAT arrives, they might already be knee-deep in the next ticket. To reduce the feedback distance between UAT and developer, we decided to move UAT to the moment the pull request for the frontend changes is opened. As this is how the developer usually signals that the changes are "ready for review", it is the earliest point where UAT can start.

Technical implementation

Of course, this is easier said than done. With the solution in mind, the technical implementation has to be exactly defined and the possible risks evaluated. Figure 3 shows the general idea of how UAT can be enabled during a pull request. The key point is the deployment and accessibility of the frontend for each pull request. To reduce the number of necessary deployments, we add a simple filtering mechanism to the workflow: only pull requests labelled with frontend are deployed.

Figure 3: Workflow designed to help accelerate the feedback loop
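The filtering step could be sketched like this in a GitHub Actions-style workflow (an assumption — the article only specifies that pull requests labelled frontend are deployed; the script path is a placeholder):

```yaml
# Deploy a preview environment only for pull requests carrying the "frontend" label.
name: preview-deploy
on:
  pull_request:
    types: [opened, synchronize, labeled]   # re-run when commits or labels change
jobs:
  deploy-preview:
    # Skip the whole job unless the PR is labelled "frontend".
    if: contains(github.event.pull_request.labels.*.name, 'frontend')
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Trigger preview deployment
        run: ./scripts/deploy-preview.sh "${{ github.event.pull_request.number }}"  # hypothetical script
```

Listening to the `labeled` event type means a preview can also be spun up later, by adding the label to an already-open PR.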

The team discusses the conceptualized workflow and agrees on its implementation. Luckily, our existing environment and tooling already offer the required foundations. Having a cloud-native application with extensive automations does have its benefits. Parts of the workflow could stay as is, some could be easily adapted, and only a minimal part required new implementations.

Figure 4: Technical workflow of the solution

The red-circled part in figure 4 shows what already exists and can be reused as is. This is one of the previously discussed automated steps that help reviewers assess the state of the PR, namely the build step. The other parts are either completely new or adaptations of something that already exists. For example, we already have workflows that automatically tag pull requests as frontend, backend, or both. ArgoCD is used to facilitate and automate deployments. In our case, upon the creation of a PR with the tag frontend, it generates the necessary Infrastructure-as-Code definitions in combination with the hash of the latest commit on the PR, and uses them to deploy this new preview setup in our Kubernetes environment. For this we gave our Kubernetes environment a new namespace called "preview", specifically for these use cases. The deployment then continuously fetches the freshly built image from the given docker registry. At the same time, the existing issuers for HTTPS certificates are reused to generate the necessary certificates using LetsEncrypt¹, fulfilling the security requirements of the preview environments. As soon as the deployment is complete, the PR is updated with the URL to use to access the new preview environment.
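As a rough sketch, the generated definition could be an ArgoCD Application pointing at the PR's latest commit and targeting the preview namespace (all names, the PR number, the commit hash, and the repository path are illustrative assumptions):

```yaml
# Per-PR ArgoCD Application, generated when a "frontend"-labelled PR is opened.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: preview-pr-1234          # hypothetical PR number
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.example.com/org/app.git
    targetRevision: 3f9c2ab      # hash of the latest commit on the PR branch
    path: deploy/preview         # IaC definitions for the preview setup
  destination:
    server: https://kubernetes.default.svc
    namespace: preview           # the dedicated preview namespace
  syncPolicy:
    automated:
      prune: true                # tear resources down when the definition is removed
```

With `prune: true`, deleting the generated Application when the PR closes also cleans up everything it deployed.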

Figure 5: Example kubernetes environment

Figure 5 showcases our Kubernetes environment, containing the default namespace used for the staging environment, as well as the new preview namespace created specifically for this workflow. To make sure that the UAT teams have access to the deployed preview environment, each deployment includes ingress resources, which inform the nginx ingress controller of any routing it should be aware of.
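Each preview deployment could then carry an Ingress along these lines, giving it a unique hostname and a LetsEncrypt certificate via the reused issuer (hostname scheme, issuer name, and service names are assumptions):

```yaml
# Per-PR Ingress: routes a unique hostname to the preview frontend
# and requests an HTTPS certificate from the existing cluster issuer.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: preview-pr-1234
  namespace: preview
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt   # reuse the existing issuer
spec:
  ingressClassName: nginx
  tls:
    - hosts: [pr-1234.preview.example.com]
      secretName: preview-pr-1234-tls             # cert-manager stores the cert here
  rules:
    - host: pr-1234.preview.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: preview-pr-1234-frontend    # hypothetical frontend Service
                port:
                  number: 80
```

The resulting URL (here `https://pr-1234.preview.example.com`) is what gets posted back to the pull request.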

Security

As always, security is a big concern that needs to be taken seriously. In figure 5, we can see that the pod running the API backend is only present in the staging deployment, which is a deliberate choice. On one hand, it reduces the overall size of each preview environment; on the other, it allows comparing results on the same data between the current staging UX and the preview environment. For this, the preview environments need access to the backend API to pull data. This is possible using Kubernetes Services of type ExternalName, which let us easily create connections between different namespaces and deployments to access different services.
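Such a cross-namespace alias can be sketched with an ExternalName Service (service and namespace names are illustrative; the staging namespace is assumed to be `default`, as in figure 5):

```yaml
# Inside the preview namespace, "backend-api" resolves via DNS
# to the staging backend Service in the default namespace.
apiVersion: v1
kind: Service
metadata:
  name: backend-api
  namespace: preview
spec:
  type: ExternalName
  externalName: backend-api.default.svc.cluster.local  # staging backend Service
```

The preview frontend can then be configured against `backend-api` exactly as it would be in staging, with no environment-specific URL baked into the image.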
Another security risk is the possibility of malicious or faulty code being deployed. To cover this, all containers running in the preview environments have highly limited privileges and capabilities, limiting the potential damage that can be done.
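A hardened container spec along these lines might look like the following — a sketch, since the article does not spell out the exact restrictions in use:

```yaml
# Restrictive security context for preview containers (illustrative values).
securityContext:
  runAsNonRoot: true               # refuse to start as root
  runAsUser: 10001                 # arbitrary unprivileged UID
  allowPrivilegeEscalation: false  # block setuid-style escalation
  readOnlyRootFilesystem: true     # container cannot modify its own filesystem
  capabilities:
    drop: ["ALL"]                  # drop every Linux capability
```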
Another covered case is a developer changing the actual workflow that creates these preview environments. If they could simply remove the limited privileges and enable full access, the change would still be deployed automatically, and damage could be done before anyone realizes. The solution is to always use the workflow definition currently present in the development branch, not the one in the feature branch of the pull request. As the development branch is protected, we can be sure that the workflow defined there fulfills our needs and security requirements.
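If the pipeline runs on GitHub Actions (an assumption; the article does not name the CI system), this maps to the `pull_request_target` event, which always executes the workflow file from the base branch rather than the PR branch:

```yaml
# pull_request_target runs the workflow definition from the base (protected)
# branch, so workflow changes made inside the PR itself cannot take effect.
on:
  pull_request_target:
    types: [opened, synchronize, labeled]
```

Note that workflows triggered this way run with elevated permissions, so checked-out PR code must still be treated as untrusted.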
The last part of the security measures is a set of highly restrictive IP rules, set up using Kubernetes NetworkPolicies. The preview environments have their egress traffic highly restricted, so that the only possible communication is through the defined API services.
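An egress policy for the preview namespace could be sketched as follows (selectors, labels, and ports are placeholders; the staging namespace is again assumed to be `default`):

```yaml
# Restrict preview pods to talking to the staging namespace and to cluster DNS.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: preview-egress
  namespace: preview
spec:
  podSelector: {}            # applies to every pod in the preview namespace
  policyTypes: [Egress]
  egress:
    - to:                    # allow traffic to the staging namespace only
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: default
      ports:
        - protocol: TCP
          port: 443
    - to:                    # allow in-cluster DNS resolution
        - namespaceSelector: {}
      ports:
        - protocol: UDP
          port: 53
```

Because NetworkPolicies are allow-lists, any egress not matched by one of these rules is dropped once the policy selects a pod.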

Results

Measuring the results of the changes can be done relatively quickly, and already in the first sprint after implementing the change we could see a reduction in reopened UX tickets. While figure 6 suggests that we are moving away from trunk-based development, as our commit frequency to the development branch decreases, we are merely moving the goal: instead of aiming for a high commit throughput, we aim for a high throughput of commits that fully fulfill business requirements. For project management, this makes it much easier to see how long tickets take and why, which could otherwise involve multiple round trips between teams to gather the right information. It is important to note that the code review does not have to wait until UAT is done; it can, and should, be done in parallel, which further speeds up commit frequency. It also facilitates better feedback: combining code review and UAT enables the team to give more feedback to the people defining the tickets themselves, as badly written requirements can also increase the time spent on implementation and testing.

Figure 6: Git history with visualization of staging environment deployments with the optimized workflow

Thanks to this, our developers can now focus more on adding business value with their time, rather than revisiting previously finished tickets. This benefits everyone and can be measured easily. After one sprint of using the preview environments for UX tickets, our metrics revealed an 87.5% reduction in reopened tickets, as shown in figure 7. Note that one sprint per workflow is a very short timespan; confirming a real, long-lasting gain needs longer measurements.

Figure 7: Pie charts of the distribution of UX tickets that were reopened multiple times and UX tickets finished in one go

Conclusion

As shown, workflows and automations are an excellent enabler for teams to be more productive, and not exclusive to developers! As with many things in DevOps, this is only a small piece of the puzzle and certainly not applicable to every project out there. Depending on the workflows your project has, this might not work for you. Beyond the simple, generic steps that can be applied almost anywhere (automatic tests, builds), it requires a specific analysis of the workflows and pain points a team faces. Such an analysis can often reveal the parts that can be optimized away using triggers and automated workflows. But the most important part is to realize that you're never done optimizing your workflows, because your work never stops changing.

[1] Remember that LetsEncrypt has some limitations to the number of certificates you can generate in a given timespan!
