Thoughts on Autonomous Vehicle Developer Tooling

The Autonomous Vehicle industry has massive potential to change the world, and good developer tooling will be a huge part of how we get there. I designed a tool to help AV engineers rapidly assess the effectiveness and impact of code changes on driving behavior.

Why AV Developer Tooling Matters

I have been interested in the Autonomous Vehicles space for quite some time. When this technology becomes widely available, it will have a huge positive impact on many aspects of society.

A key element of bringing this potential to reality is creating tooling that allows AV engineers to rapidly iterate on AV technology while working effectively within safety constraints.

Developer tooling is a force multiplier, enabling insight and innovation. This is especially the case in the Autonomous Vehicles context, where the possibility space of outcomes is so vast, and the potential safety risk is so high.

Developer Challenges

Not long ago I had the pleasure of speaking briefly with a designer at Applied Intuition about a few of the challenges faced by AV developers:

  • How do I know that the latest version of our AV software is better than previous versions?
  • If I make a code change to improve our lane change behavior, what other behaviors of the car are impacted?

I've been really interested in these problems, so as an exercise, I decided to design a tool that would help AV developers rapidly assess the effectiveness and impact of code changes on driving behavior by augmenting an Integrated Development Environment with data visualization and simulation workflows tailored specifically to the AV context.

An AV Development Environment

I set out to design a tool that would help AV engineers rapidly assess the effectiveness and impact of code changes on driving behavior.

For the sake of this exercise, I chose to home in on a scenario of iterating on lane change behavior and analyzing results in that context, but I believe the same tool could be used to work on nearly any aspect of driving behavior, as the underlying workflow remains similar no matter the context:

  • Come up with ideas for new driving strategies or improvements to existing behaviors
  • Create a new branch and work towards implementing the new strategy/behavior
  • Iterate in branch with near-realtime feedback via integrated simulation & reporting
  • When satisfied with the results or interested in further analysis based on realtime feedback / progress, submit branch for deeper analysis

Here's my final concept from this exercise - an integrated visual test environment for driving behavior, delivered as a VS Code extension. It would allow AV developers to get near-instant feedback on how code changes affect driving behaviors across a variety of scenarios, from within their code editor. The Simulation Results panel provides an overview of results across different scenarios and allows developers to quickly identify problem areas. Clicking any of the scenarios lets you play back rich simulation visualizations to understand what went right or wrong.

Final Concept Mock

My Process

A good notebook is a designer's best friend. Before actually designing this tool, I spent some time thinking through the problem space and writing and sketching out my thoughts. Below are some pages from my brainstorming session.

Apologies in advance for my handwriting...

Coming up with the design brief and thinking through some related scenarios

What do AV developers need from a tool?

One of the unique challenges of working in the AV space is that changes made to driving behavior will be released "into the wild" to operate in highly dynamic and unpredictable traffic situations, where human lives are at stake.

Subtle changes to driving behavior may have unpredictable effects given the vast array of scenarios in which that behavior may be triggered. As a result it's incredibly important to provide direct insight into the effects of driving behavior changes across multiple scenarios.

When thinking about how a tool could aid in this process, my focus was on providing direct feedback during the development process in a way that maximizes foresight while maintaining a reasonable balance with compute requirements.

To iterate quickly on driving behaviors, developers would need:

  • Realtime feedback at the time changes are made
  • The ability to test across a variety of common scenarios
  • The ability to test various edge cases
  • The ability to programmatically set and monitor safety tolerances
  • A visual display of results information across all of the above behaviors to aid in decision making
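The "programmatically set and monitor safety tolerances" need above could be as simple as a declarative tolerance config checked against each simulated scenario's metrics. Here's a minimal sketch; the metric names and threshold values are illustrative assumptions, not real AV safety standards.

```typescript
// Hypothetical sketch of programmatic safety tolerances.
interface Tolerance {
  metric: string; // e.g. "minFollowingDistanceM" (assumed name)
  min?: number;   // inclusive lower bound, if any
  max?: number;   // inclusive upper bound, if any
}

type MetricReadout = Record<string, number>;

// Returns the names of metrics that fall outside their tolerances.
function violations(readout: MetricReadout, tolerances: Tolerance[]): string[] {
  return tolerances
    .filter(t => {
      const value = readout[t.metric];
      if (value === undefined) return true; // a missing metric counts as a violation
      if (t.min !== undefined && value < t.min) return true;
      if (t.max !== undefined && value > t.max) return true;
      return false;
    })
    .map(t => t.metric);
}
```

Keeping tolerances as data rather than code means the same definitions could drive both the in-editor checks and a later CI stage.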

The design I propose is a greenfield design, but it's based on a few fundamental assumptions:

  • The ability to quantify standards and tolerances for safe driving behavior
  • The ability to simulate a handful of traffic scenarios quickly and efficiently
  • The existence of a comprehensive repository of edge case scenarios for simulation
  • The ability to summarize driving behavior into a single metric of performance

Each of these assumptions is likely a huge problem in itself, worthy of its own article but outside the scope of this exercise.

I am sure there are a myriad of other constraints that I am not taking into account here, but hey, this is just an exercise!

Traffic scenarios and edge cases related to lane change behavior

When thinking about how to design this tool my next step was to put myself in the mind of an AV developer to actually think through the problem space.

I brainstormed a few common scenarios for lane change behavior, along with edge cases and related scenarios:

  • Changing lanes to merge between 2 vehicles
  • Changing lanes while overtaking another vehicle
  • Changing lanes while merging behind another vehicle
  • "Cut off" merge collision avoidance
  • Hard brake events
  • Yielding to a vehicle approaching from behind
  • Sudden obstructions
  • Low visibility conditions
  • Unprotected left turns
  • Chained lane changes while merging through traffic, with combinations of the above

To expand the possible scenarios even more, each of these situations is subject to various permutations, including:

  • Vehicle acceleration
  • Vehicle deceleration
  • Stop & Go traffic
  • Road obstacles
  • Slower moving road users including cyclists and pedestrians
  • Defensive driving considerations
  • Weather and road quality conditions

When you take these permutations and combinations together, there's a huge breadth of possible scenarios for AV developers to consider when modifying driving behavior. This represents a unique challenge for AV development that does not exist in other areas of software engineering, and a huge opportunity for tooling to make a positive impact.

Based on this understanding of the domain, I am confident that rapid simulation and reporting across a handful of important traffic scenarios during the development process has the potential to improve developers' ability to effectively modify driving behaviors.

Thinking through some more scenarios and workflow

Imagining the tool

Developer preferences around code editors vary widely, but if there's any consensus right now, it's that VS Code is one of the best ever built. It was also created with extensibility built in, making it a perfect candidate platform to expand on for this exercise.

The team at Microsoft, along with developers around the world, has been hard at work making VS Code the best it can be for years, so rather than focusing on the code editing aspect of AV software development tooling, I chose to focus on how I might augment a tool such as VS Code with functionality specific to the needs of an AV developer.

Presumably, this addition could be realized via a VS Code extension, making the editor even better than it is today.

While I have high confidence that running simulations in realtime has a huge potential to help AV developers understand changes, I'm also well aware that the compute power required to perform a robust simulation across many traffic scenarios is huge and infeasible to complete mid-workstream.

So I got to thinking: how could this tool be set up to offer quick feedback during development while also supporting a mechanism for robust simulation and analysis?

I think the best of both worlds is a two-stage simulation and analysis process:

  • Common Scenarios & important edge cases are rapidly simulated and reported on during development
  • Promising results from the rapid feedback phase can be submitted to a second stage for a more robust simulation and analysis
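The two stages above could share one scenario vocabulary, with the rapid stage running a small in-editor subset and the robust stage sweeping a much larger repository in CI. A rough sketch follows; the scenario names and the exact rapid/robust split are assumptions for this exercise.

```typescript
// Illustrative two-stage pipeline configuration.
type Stage = "rapid" | "robust";

interface ScenarioSuite {
  scenarios: string[];
  simulateInEditor: boolean; // rapid stage runs locally; robust stage runs in CI
}

function suiteFor(stage: Stage): ScenarioSuite {
  if (stage === "rapid") {
    // A handful of common scenarios plus key edge cases, cheap enough
    // to simulate during development.
    return {
      scenarios: ["merge-between-two-vehicles", "overtake", "cut-off-avoidance"],
      simulateInEditor: true,
    };
  }
  // The robust stage sweeps a much larger scenario repository.
  return {
    scenarios: [
      "merge-between-two-vehicles", "overtake", "cut-off-avoidance",
      "hard-brake", "low-visibility", "chained-lane-changes", // many more in practice
    ],
    simulateInEditor: false,
  };
}
```
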

Thinking through visualizations to support the development process

Data visualization and realtime developer feedback

Small multiples is a powerful technique in data visualization, and is well suited for the AV development use case because developers need an understanding of the impacts of their changes across multiple traffic scenarios.

An AV generates a huge stream of potential data sources to use in the feedback process. For the sake of this exercise, I came up with 4 data visualization candidates for providing developers feedback on their changes during the development process.

  • Low fidelity traffic scenario visualization: a video clip showing the behavior of the AV and surrounding vehicles during a traffic scenario, with safety zones for surrounding vehicles clearly visualized. This is the 'richest' data visualization, and likely the most expensive in terms of compute, but important because it makes driving behaviors tangible
  • AV Velocity and control inputs visualization: visualization of the throttle & steering inputs to the vehicle that's synced with the video, superimposed above a vector visualization of the vehicle's velocity. This should be useful in combination with the simulation clip for understanding the actions that the AV is taking, as well as its understanding of 'self' during the traffic scenario
  • Metrics readout: this is the least concrete idea in my mind as I don't have a great idea of what the relevant metrics would be for a driving scenario, but some possibilities include: detected object distance, vehicle speed, planned behavior execution time, and defensive driving measurements. Presumably this would be useful additional data to provide context for the scenario and vehicle control input visualizations. In order to make this information more "visual," metrics that deviate from expected tolerances might be highlighted with a high visibility background color
  • Pass / Fail indicator and overall driving score: This is perhaps the most important visualization - a large color block with clear indication of scenario pass/fail based on safety and driving performance criteria. In addition to a simple pass/fail, it's supplemented by a summary score that provides insight into the magnitude of the pass/fail in the scenario. Scanning these blocks in aggregate provides granular feedback on the overall success of a change
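One possible data model behind the pass/fail block and summary score is sketched below. The scoring scale, passing threshold, and the rule that any safety violation forces a fail are all invented for illustration.

```typescript
// Hypothetical per-scenario result model for the pass/fail visualization.
interface ScenarioResult {
  scenario: string;
  safetyViolations: number; // any violation forces a fail (assumed rule)
  drivingScore: number;     // 0-100, higher is better (assumed metric)
}

function passFail(result: ScenarioResult, passingScore = 70): "pass" | "fail" {
  if (result.safetyViolations > 0) return "fail";
  return result.drivingScore >= passingScore ? "pass" : "fail";
}

// Aggregate summary for the pass/fail row the developer scans.
function summary(results: ScenarioResult[]): { passed: number; total: number } {
  const passed = results.filter(r => passFail(r) === "pass").length;
  return { passed, total: results.length };
}
```

Separating the hard safety gate from the graded score keeps the block unambiguous: a high score can never mask a safety violation.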

The above four visualizations should be created for each of a handful of top traffic scenarios related to the driving behavior that's being modified, and the first three (scenario viz, velocity control viz, and metrics readout) should be linked so that you can scrub through and see data from the same point in time as the video clip.

In addition to the small multiples visualizations, there should be an additional section in the tool with pass/fail indicators and overall driving scores for some relevant edge cases. Since these edge cases are included to find potential pitfalls of any changes early and don't represent the breadth of all possible edge cases, I figure a simple pass/fail at this stage would be sufficient. Omitting the full small multiples visualization by default for edge cases is also a consideration to optimize performance of the tool. If the developer is interested in diving deeper into one of the edge cases based on the pass / fail readout, perhaps there could be an option to click on the edge case and have that box morph into a full small multiples visualization on demand for closer inspection.

In order to further optimize performance while maximizing value from rapid feedback, this overall visualization array could be generated every time the developer saves the file they are working on. This is a common pattern in other testing tools, e.g. Jest's CLI watch mode. This was a deliberate choice, as generating data visualizations in this manner is akin to running a suite of unit and integration tests on save.
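The run-on-save pattern might look something like the sketch below: saves are debounced so a burst of rapid saves triggers a single simulation run, similar in spirit to Jest's watch mode. The editor integration that would call notifySaved() is assumed, not shown.

```typescript
// Sketch of a debounced run-on-save trigger for the rapid simulation suite.
class OnSaveRunner {
  private timer: ReturnType<typeof setTimeout> | null = null;
  private pendingFile: string | null = null;

  constructor(
    private runSuite: (file: string) => void, // kicks off simulation & reporting
    private delayMs = 300,
  ) {}

  // Call on every file save; only the last save in a burst triggers a run.
  notifySaved(file: string): void {
    this.pendingFile = file;
    if (this.timer) clearTimeout(this.timer);
    this.timer = setTimeout(() => this.flush(), this.delayMs);
  }

  // Run any pending suite immediately.
  flush(): void {
    if (this.timer) clearTimeout(this.timer);
    this.timer = null;
    if (this.pendingFile !== null) {
      const file = this.pendingFile;
      this.pendingFile = null;
      this.runSuite(file);
    }
  }
}
```

In a VS Code extension, notifySaved() could be wired to the editor's document-save event, with the delay exposed as a user setting since simulation cost dwarfs the debounce window.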

I've found this type of feedback extremely helpful in my own software development practice, and there are applications for development methods such as TDD as well as simple experimentation. Although the scenario visualization clips are likely the 'heaviest' part of this visual test suite, in many ways they are the most important - in my frontend development work, HTML logging has been immensely helpful to understand what's going on 'inside the code' in a relatively simple scenario. I can only imagine that the usefulness of a video clip would be even higher for understanding the behavior of a vehicle as it interacts with its surroundings at speed.

Developer workflow

While data visualizations can often be flashier than they are useful, the whole point of this exercise is to create a tool that helps enable a better AV developer workflow. I imagine that the developer workflow with this tool might go something like this:

  1. Come up with ideas for new driving strategies or improvements to existing behaviors
  2. Create a new branch and work towards implementing the new strategy/behavior
  3. Iterate in branch with near-realtime feedback via integrated simulation & reporting
  4. When satisfied with the results or interested in further analysis based on realtime feedback / progress, submit branch for robust simulation, reporting, and analysis

Step 1 is a bit outside the scope of this tool - I imagine that the concept generation part of the process probably happens at a whiteboard, as a result of reading some new research papers, or as the beginning of a new loop when analyzing results from step 4.

Steps 2 and 3 are the primary scope of what I've covered in this design - the actual moment-to-moment developer experience and rapid feedback tooling to support it. As a developer works on building out the instructions for new driving behavior, they will have an opportunity to quickly scan across the pass/fail summary row to get a general sense of the impact their changes have and how far they are from the goal. When they have reached a point where they want to ensure that their changes have gone 'very well' or to understand why they went 'very badly' (which has just as much potential to be informative), the next step is to make a commit and kick off a CI/CD process that will submit the code changes for more robust simulation.

Step 4 is where things start to get really interesting, because once the code has been submitted, we start getting into the realm of team collaboration and analytics. This is a bit out of the scope of this exercise but I will elaborate on some of my thoughts below.

Concept Mocks

Additional considerations and future directions

Performance & scale

To be quite honest, I'm not entirely sure how feasible it is to run simulations of roughly 10 scenarios and edge cases on a single machine in order to generate the visualizations I described in 'realtime' during development. There are certainly ways to further minimize the compute requirements, such as defaulting to pass/fail with complex visualizations on demand for all simulations, and allowing configurability around the number of scenarios and edge cases to run each time.

Mapping of important scenarios and edge cases to driving behaviors being modified

A nontrivial consideration for the success of a tool like the one I describe is establishing a comprehensive mapping of the scenarios that are relevant to the driving behaviors being modified. I have tried to take my best guess at what these might be for lane change behavior, but I would guess that behaviors such as pick up/drop off, pedestrian crossing response, defensive driving tactics, etc. have a different cluster of pertinent scenarios to simulate. Designing so that developers have sensible defaults but also agency and flexibility in determining which scenarios to run would be helpful for experimentation during development.
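The "sensible defaults with developer overrides" idea could be as simple as a behavior-to-scenario map plus an override mechanism. All behavior and scenario names below are hypothetical placeholders.

```typescript
// Hypothetical mapping from a driving behavior under modification to
// the default scenarios worth simulating for it.
const defaultScenarios: Record<string, string[]> = {
  "lane-change": ["merge-between-two-vehicles", "overtake", "cut-off-avoidance"],
  "pedestrian-crossing-response": ["crosswalk-yield", "sudden-obstruction"],
};

// Developers keep the defaults but can add or remove scenarios per run.
function scenariosFor(
  behavior: string,
  overrides: { add?: string[]; remove?: string[] } = {},
): string[] {
  const base = defaultScenarios[behavior] ?? [];
  const removed = new Set(overrides.remove ?? []);
  return [...base.filter(s => !removed.has(s)), ...(overrides.add ?? [])];
}
```
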

Extended workflows / collaborating on AV development in a team environment

The final step of my imagined developer workflow is to submit prospective driving behavior changes and run them through a standardized and more robust simulation pipeline. I imagine that this could be done in the cloud and would probably need to run as part of a CI/CD process. The results of this robust simulation and reporting pipeline might not look too different from what I've proposed for the real-time developer experience.

I imagine that the number of scenarios and edge cases simulated at this phase would be much higher, and that there might be some additional meta-statistics generated for easy comparison between branches and over time. Perhaps the compute cost of running robust simulations across candidate branches is even so high that there might need to be a third (fourth, fifth?) stage of simulation and analysis to achieve high confidence in results before testing new behaviors on the road.

Nevertheless, the key element here is that there's a centralized repository for reviewing and visually analyzing results across multiple behavior change candidates, hopefully allowing developers to see patterns in what's working or where the new behavior models are coming up short. The focus for this tool is less on code review (a rich ecosystem of tools already exists for that) and more on comprehensively analyzing behavioral outcomes. Looking at all the code changes in this zoomed-out way is a potential jumping-off point for deeper analysis of the source code, as well as a dashboard of progress over time. Ideally, as driving behavior improves, the metrics should trend up and green.

Let's connect

Life is better with friends