The Autonomous Vehicle industry has a massive potential to change the world, and good developer tooling is going to be a huge part of how we get there. I designed a tool to help AV engineers rapidly assess the effectiveness and impact of code changes on driving behavior.
I have been interested in the Autonomous Vehicles space for quite some time. When this technology becomes widely available, it is going to have a huge positive impact on so many aspects of society.
A key element of bringing that potential to reality is creating tooling that allows AV engineers to rapidly iterate on the technology while working effectively with safety considerations.
Developer tooling is a force multiplier, enabling insight and innovation. This is especially the case in the Autonomous Vehicles context, where the possibility space of outcomes is so vast, and the potential safety risk is so high.
Not long ago I had the pleasure of speaking briefly with a designer at Applied Intuition about a few of the challenges faced by AV Developers:
I've been really interested in these problems, so as an exercise I decided to design a tool that helps AV developers rapidly assess the effectiveness and impact of code changes on driving behavior. It does this by augmenting an Integrated Development Environment with data visualization and simulation workflows tailored specifically to the AV context.
I set out to design a tool that would help AV engineers rapidly assess the effectiveness and impact of code changes on driving behavior.
For the sake of this exercise, I chose to home in on a scenario of iterating on lane change behavior and analyzing results in that context, but I believe the same tool could be used to work on nearly any aspect of driving behavior, since the underlying workflow remains similar no matter the context:
Here's my final concept from this exercise - an integrated visual test environment for driving behavior delivered as a VS Code extension. It would allow AV developers to get near-instant feedback on how code changes affect driving behaviors across a variety of scenarios from within their code editor. The Simulation Results panel provides an overview of results across different scenarios and allows developers to quickly identify problem areas. Clicking any of the scenarios allows you to play back rich simulation visualizations to understand what went right or wrong.
A good notebook is a designer's best friend. Before actually designing this tool, I spent some time thinking through the problem space and writing and sketching out my thoughts. Below are some pages from my brainstorming session.
Apologies in advance for my handwriting...
One of the unique challenges of working in the AV space is that changes made to driving behavior will be released "into the wild" to operate in highly dynamic and unpredictable traffic situations, and that human lives are at stake.
Subtle changes to driving behavior may have unpredictable effects given the vast array of scenarios in which that behavior may be triggered. As a result it's incredibly important to provide direct insight into the effects of driving behavior changes across multiple scenarios.
When thinking about how a tool could aid in this process, my focus was on providing direct feedback during the development process in a way that maximizes foresight while maintaining a reasonable balance with compute requirements.
In order to be able to iterate quickly on driving behaviors, developers would need:
The design I propose is a greenfield design, but it's based on a few fundamental assumptions:
Each of these assumptions is likely a huge problem in itself, worthy of its own article but outside the scope of this exercise.
I am sure there are a myriad of other constraints that I am not taking into account here, but hey, this is just an exercise!
When thinking about how to design this tool my next step was to put myself in the mind of an AV developer to actually think through the problem space.
I brainstormed a few common scenarios for lane change behavior, along with edge cases and related scenarios:
To expand the possible scenarios even more, each of these situations is subject to various permutations, including:
When you take these permutations and combinations together, there's a huge breadth of possible scenarios for AV developers to consider when modifying driving behavior. This represents a unique challenge for AV development that does not exist in other areas of software engineering, and a huge opportunity for tooling to make a positive impact.
Based on this understanding of the domain, I am confident that rapid simulation and reporting across a handful of important traffic scenarios during the development process has the potential to improve developers' ability to effectively modify driving behaviors.
Developer preferences around code editors vary widely, but if there's any consensus right now, it's that VS Code is one of the best ever built. It was also created with extensibility built in, making it a perfect candidate platform to expand on for this exercise.
The team at Microsoft, along with developers around the world, has been hard at work for years making VS Code the best it can be. So rather than focusing on the code editing aspect of AV software development tooling, I chose to focus on how I might augment a tool such as VS Code with functionality specific to the needs of an AV developer.
Presumably, this addition could be realized via a VS Code extension, making the editor even better than it is today.
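To make that a little more concrete, here is a minimal sketch of what such an extension's entry point might look like: a command that opens a "Simulation Results" panel beside the editor. The VS Code APIs used here (`registerCommand`, `createWebviewPanel`) are real; the command id, view type, and placeholder HTML are my own assumptions, not an existing extension.

```typescript
// extension.ts -- hypothetical sketch of an AV tooling extension entry point.
import * as vscode from 'vscode';

export function activate(context: vscode.ExtensionContext) {
  // Contribute a command that opens the Simulation Results panel next to the code.
  const showResults = vscode.commands.registerCommand('avTools.showSimulationResults', () => {
    const panel = vscode.window.createWebviewPanel(
      'avSimulationResults',    // internal view type (assumed name)
      'Simulation Results',     // panel title shown to the developer
      vscode.ViewColumn.Beside, // open beside the file being edited
      { enableScripts: true }   // the visualizations would need scripts to render
    );
    // In a real tool this HTML would be generated from simulation output.
    panel.webview.html = '<html><body><h1>Simulation Results</h1></body></html>';
  });

  context.subscriptions.push(showResults);
}

export function deactivate() {}
```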
While I have high confidence that running simulations in real time has huge potential to help AV developers understand their changes, I'm also well aware that the compute required to perform a robust simulation across many traffic scenarios is enormous and infeasible to complete mid-workstream.
So I got to thinking: how could this tool be set up to offer quick feedback during development while also supporting a mechanism for robust simulation and analysis?
I think the best of both worlds is to support a 2-stage simulation and analysis process:
Small multiples is a powerful technique in data visualization, and is well suited for the AV development use case because developers need an understanding of the impacts of their changes across multiple traffic scenarios.
An AV generates a huge stream of data that could feed this kind of feedback loop. For the sake of this exercise, I came up with four data visualization candidates for providing developers feedback on their changes during the development process.
The above four visualizations should be created for each of a handful of top traffic scenarios related to the driving behavior that's being modified, and the first three (scenario viz, velocity control viz, and metrics readout) should be linked so that you can scrub through and see data from the same point in time as the video clip.
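One way to realize that linking is a shared playback cursor that every linked view subscribes to: scrubbing the clip publishes a timestamp, and the scenario viz, velocity control viz, and metrics readout all re-render at that time. The sketch below is a bare-bones illustration of the idea; all of the names in it are hypothetical.

```typescript
// A tiny shared "time cursor" that keeps linked visualizations in sync.
type TimeListener = (timeSec: number) => void;

class PlaybackCursor {
  private listeners: TimeListener[] = [];
  private current = 0;

  // Each linked view (scenario viz, velocity viz, metrics readout) subscribes here.
  onSeek(listener: TimeListener): void {
    this.listeners.push(listener);
    listener(this.current);
  }

  // Scrubbing the video clip calls this; every subscribed view updates to the same time.
  seek(timeSec: number): void {
    this.current = timeSec;
    this.listeners.forEach((l) => l(timeSec));
  }
}

// Usage sketch:
const cursor = new PlaybackCursor();
cursor.onSeek((t) => console.log(`velocity chart now showing t=${t.toFixed(2)}s`));
cursor.onSeek((t) => console.log(`metrics readout now showing t=${t.toFixed(2)}s`));
cursor.seek(12.4); // scrubbing the clip to 12.4s updates both views
```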
In addition to the small multiples visualizations, there should be an additional section in the tool with pass/fail indicators and overall driving scores for some relevant edge cases. Since these edge cases are included to find potential pitfalls of any changes early and don't represent the breadth of all possible edge cases, I figure a simple pass/fail at this stage would be sufficient. Omitting the full small multiples visualization by default for edge cases also helps optimize the performance of the tool. If the developer is interested in diving deeper into one of the edge cases based on the pass/fail readout, perhaps there could be an option to click on the edge case and have that box morph into a full small multiples visualization on demand for closer inspection.
In order to further optimize performance while maximizing value from rapid feedback, this overall visualization array could be generated every time the developer saves the file they are working on. This is a common pattern in other testing tools, e.g. Jest's CLI watch mode. This was a deliberate choice, as generating data visualizations in this manner is akin to running a suite of unit and integration tests on save.
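In VS Code extension terms, that on-save trigger might look something like the sketch below. `onDidSaveTextDocument` is a real extension API; the `runQuickSimulations` hook and the file filter are placeholders for whatever the simulation backend actually exposes.

```typescript
import * as vscode from 'vscode';

// Hypothetical hook into the quick simulation backend; not a real API.
declare function runQuickSimulations(changedFile: string): Promise<void>;

export function watchForSaves(context: vscode.ExtensionContext) {
  // Re-run the quick simulation suite whenever a behavior source file is saved,
  // much like a test runner's watch mode re-runs affected tests.
  const watcher = vscode.workspace.onDidSaveTextDocument(async (doc) => {
    if (doc.languageId !== 'cpp' && doc.languageId !== 'python') {
      return; // assumption: behavior code lives in C++ or Python files
    }
    await runQuickSimulations(doc.fileName);
    // ...then refresh the Simulation Results panel with the new visualizations.
  });
  context.subscriptions.push(watcher);
}
```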
I've found this type of feedback extremely helpful in my own software development practice, and there are applications for development methods such as TDD as well as simple experimentation. Although the scenario visualization clips are likely the 'heaviest' part of this visual test suite, in many ways they are the most important - in my frontend development work, HTML logging has been immensely helpful to understand what's going on 'inside the code' in a relatively simple scenario. I can only imagine that the usefulness of a video clip would be even higher for understanding the behavior of a vehicle as it interacts with its surroundings at speed.
While data visualizations can often be flashier than they are useful, the whole point of this exercise is to create a tool that helps enable a better AV developer workflow. I imagine that the developer workflow with this tool might go something like this:
Step 1 is a bit outside the scope of this tool - I imagine that the concept generation part of the process probably happens at a whiteboard, as a result of reading some new research papers, or as the beginning of a new loop when analyzing results from step 4.
Steps 2 and 3 are the primary scope of what I've covered in this design - the actual moment-to-moment developer experience and rapid feedback tooling to support it. As a developer works on building out the instructions for new driving behavior, they will have an opportunity to quickly scan across the pass/fail summary row to get a general sense of the impact their changes have and how far they are from the goal. When they have reached a point where they want to ensure that their changes have gone 'very well' or to understand why they went 'very badly' (which has just as much potential to be informative), the next step is to make a commit and kick off a CI/CD process that will submit the code changes for more robust simulation.
Step 4 is where things start to get really interesting, because once the code has been submitted, we start getting into the realm of team collaboration and analytics. This is a bit out of the scope of this exercise but I will elaborate on some of my thoughts below.
To be quite honest, I'm not entirely sure how feasible it is to run simulations of roughly 10 scenarios and edge cases on a single machine in order to generate the visualizations I described in 'realtime' during development. There are certainly ways to further minimize the compute requirements, such as defaulting to pass/fail with complex visualizations on demand for all simulations, and allowing configurability around the number of scenarios and edge cases to run each time.
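That configurability could surface as ordinary extension settings. The `avTools.*` setting names below are made up, but `workspace.getConfiguration` is the standard way a VS Code extension would read them.

```typescript
import * as vscode from 'vscode';

// Read developer-tunable limits for the quick, in-editor simulation pass.
// The 'avTools.*' setting names are hypothetical.
function getQuickSimLimits() {
  const config = vscode.workspace.getConfiguration('avTools');
  return {
    maxScenarios: config.get<number>('quickSim.maxScenarios', 5),   // full small-multiples views
    maxEdgeCases: config.get<number>('quickSim.maxEdgeCases', 10),  // pass/fail only by default
    renderClips: config.get<boolean>('quickSim.renderClips', true), // heaviest part; can be disabled
  };
}
```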
A nontrivial consideration for the success of a tool like the one I describe is establishing a comprehensive mapping of the scenarios relevant to the driving behaviors being modified. I have tried to take my best guess at what these might be for lane change behavior, but I would guess that behaviors such as pick up/drop off, pedestrian crossing response, and defensive driving tactics each have a different cluster of pertinent scenarios to simulate. Giving developers sensible defaults, but also agency and flexibility in choosing which scenarios to run, would be helpful for experimentation during development.
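As one possible shape for that mapping, a simple default table keyed by driving behavior, which developers could extend or override per project. Every scenario name here is my own guess for illustration, not an established taxonomy.

```typescript
// Hypothetical default mapping from driving behavior to the scenarios
// worth simulating on every save. Developers could override this per project.
type ScenarioMap = Record<string, { scenarios: string[]; edgeCases: string[] }>;

const defaultScenarioMap: ScenarioMap = {
  laneChange: {
    scenarios: ['highway-merge', 'slow-lead-vehicle', 'passing-on-two-lane-road'],
    edgeCases: ['vehicle-in-blind-spot', 'motorcycle-lane-splitting', 'abrupt-cut-in'],
  },
  pedestrianCrossingResponse: {
    scenarios: ['marked-crosswalk', 'mid-block-crossing'],
    edgeCases: ['occluded-pedestrian', 'pedestrian-reverses-direction'],
  },
};
```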
The final step of my imagined developer workflow is to submit prospective driving behavior changes and run them through a standardized and more robust simulation pipeline. I imagine that this could be done in the cloud and would probably need to run as part of a CI/CD process. The results of this robust simulation and reporting pipeline might not look too different from what I've proposed for the real-time developer experience.
I imagine that the number of scenarios and edge cases simulated at this phase would be much higher, and that there might be some additional meta-statistics generated for easy comparison between branches and over time. Perhaps the compute cost of running robust simulations across candidate branches is even so high that there would need to be a third (fourth, fifth?) stage of simulation and analysis to achieve high confidence in results before testing new behaviors on the road.
Nevertheless, the key element here is that there's a centralized repository for reviewing and visually analyzing results across multiple behavior change candidates, hopefully allowing developers to see patterns in what's working or where the new behavior models are coming up short. The focus for this tool is less on code review (there's already a rich ecosystem of existing tools for that) and more on comprehensively analyzing behavioral outcomes. Looking at all the code changes in this zoomed-out way is a potential jumping off point for deeper analysis of the source code, as well as a dashboard of progress over time. Ideally, as driving behavior improves, the metrics should trend up and green.
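Those meta-statistics could be as simple as per-branch aggregates that the centralized dashboard diffs against the main branch. The sketch below shows one possible shape; every field and name in it is an assumption on my part.

```typescript
// Hypothetical aggregate produced by the robust (CI/CD) simulation pipeline,
// stored per branch so the dashboard can compare candidates over time.
interface BranchSimulationSummary {
  branch: string;
  commit: string;
  scenariosRun: number;
  passRate: number;         // fraction of scenarios passing, 0..1
  meanDrivingScore: number; // whatever composite score the team settles on
  regressions: string[];    // scenario ids that passed on main but fail here
}

// The dashboard might simply diff a candidate branch against main:
function compare(candidate: BranchSimulationSummary, main: BranchSimulationSummary) {
  return {
    passRateDelta: candidate.passRate - main.passRate,
    scoreDelta: candidate.meanDrivingScore - main.meanDrivingScore,
    newRegressions: candidate.regressions,
  };
}
```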