Cameron Yick
Great.
Hello, everyone.
Please raise your hand if any of this sounds familiar.
Have you ever worked on a project with more than one file?
And then have you ever drawn rectangles and lines on a whiteboard somewhere to explain to somebody how those files worked?
Nice, so many people.
And then have you ever felt like saying, “I’ve fallen into my code, and I can’t get up?” If not you, maybe a teammate.
So I’ve been there, and I think there’s a better way.
The origins and applications of automatic visualizations
In this talk, we’re going to explore how the world of generated visualizations of frontend code could make the developer experience just a little bit better.
So to understand the origins of this idea, you should know I’ve always liked pictures of how pieces of a system fit together.
In electrical engineering, we had circuit diagrams, with both an abstract and a physical view.
As a data engineer, I leaned on Airflow’s task dependency diagrams to show how pieces of a data pipeline fit together.
Currently, I work at Datadog where we make tools for visualizing data about your cloud infrastructure.
This is a view of our network topology graph, which shows you throughput and how different pieces of your system talk to each other, but I also like to explore this sort of thing in my free time.
So I take the train every day, and this is the Metro-North Rail in New York City, and I reused a technique from 19th-century France to see which tracks and stations are busiest.
And, unlike the paper version, because it’s React, you can filter and regenerate the graphic any time the underlying data changes.
I also experiment with making data-driven thank you cards for the interns on our team because, usually when you leave, you don’t get to keep your data.
So this is generated from people’s GitHub and Trello data, and everybody gets a unique snowflake.
I’d like to open source this soon, so stay tuned.
Automatic visualizations can help you find your way around code
But let’s talk about why you also might want to have some automatic visualizations in your life.
It’s pretty simple.
I think feeling lost in your code feels really bad and there’s no 911 for you to call.
And there are so many ways in which this can happen.
It can happen when you switch to working on a new feature that just went from prototype to prod, so there’s no documentation. Or you’re poking at the internals of some new npm dependency. Or maybe you’re on call and the stack trace puts you into an area of code you’ve never been in before.
Let’s think about how we get unlost when we’re in the real world.
First, you wander around, you look for landmarks, and these help you orient yourself.
You may ask for directions, hopefully, from someone who knows their way around and isn’t a tourist as well.
But, most of the time, you have some sort of map or GPS which shows you both where you are and where you might go next.
Let’s compare this to the code world.
So we also have wandering but it looks a little bit different.
You’re scrolling in files, you’re using Ctrl+F, maybe you’re using grep, and your signs look like comments or maybe a README.
You can also ask for directions.
You can use git blame, maybe your project has a CODEOWNERS file and you look for an expert in your feature.
But when it comes to maps, we don’t really have a GPS, there is no Google Maps.
And this talk was born from a desire to explore how we can make the code world just a little bit more like the real world, at least for mapping.
Mapping your code with automatic visualizations
So there are many types of data visualization that could help us, but we’ll focus on maps.
And many people fixate on a map being something from the physical world, but I just like how it’s a graphical representation that lets you think about something as if it took up real space, because our brains like metaphors a lot.
Now, for a map to be effective, it has to be abstract.
The only way to include every single detail would be to just look at every single line of your code, and that doesn’t save you any time.
So to be useful, a map must leave information out, and that’s considered a feature and not a bug.
So modularity and information hiding are important in software engineering, but we see these in maps too.
So Massimo Vignelli spent eight years redesigning the New York City subway signage, and part of the project was to make the schematic diagram of the subway lines.
And this diagram focuses on information that you can see when you’re underground but it skips all the details aboveground, like where the streets are.
Now, it’s a really clean design and it’s informed by user research, but it’s not the map we use today.
This is the actual MTA map.
It conveys information about aboveground, and it reveals too that the lines are not perfectly parallel to each other.
You might be asking, “Which map is right?”
Well, it depends.
More information isn’t helpful if it’s irrelevant to what your users are trying to do, so a good design is going to be informed by the specific problem that your readers have.
In other words, I’d say that neither map is correct because the ideal abstraction level is going to be situation-specific.
With this in mind, today we’ll focus on the information that you need when you’re going to change your code, whether you’re adding a feature or fixing a bug.
It’s not just about getting to a destination.
A good map provides you with information that you didn’t ask for because when you’re lost, you don’t know where you might need to be cautious.
In other words, I’m talking about the dragons in your codebase.
So think about the parts that are waiting to bite the next person who joins your team.
Maybe it’s a function that depends on a global, maybe it’s a file with no test coverage, but maybe having a visual aid to organize this information could be helpful.
Drawing inspiration from the past
So here’s our plan for today.
We’re gonna look to the past, we’re gonna go to the ’90s, we’ll move on to some tools that you can use right now, and we’ll look to apply some newish ideas from outside the world of software to the diagrams we saw earlier.
So, in preparing for this talk, I spent a few weekends reading some HCI papers, looking at how people study developers as they change code, and this one comes from Jimmy Koppel.
To provide some context, let’s address the main concern I hear when I first tell people about this idea of a code diagram.
Usually, I hear something like, “My project had an architecture diagram at the start, but then we built it just a little bit differently and then nobody went back and changed it.”
And the diagram never got revisited because just like a bad translation, an incomplete or a misleading map is worse than not having one at all.
So, the Software Reflexion Model paper from 1995 addresses this head-on.
The authors proposed a technique in C and Java to bridge the gap between source and high-level models and it was used to help refactor Excel, which is millions of lines.
Before we go through an example, here are the steps.
So first, you’ll draw a concept model, which looks a lot like the flowcharts that you might be familiar with.
Then, you’ll relate each of those concepts to files in your project.
Then, you’ll run some sort of automated dependency extractor.
And then, you’ll combine those earlier three steps and highlight the differences.
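The four steps above can be sketched in a few lines. This is a minimal illustration and not the tooling from the paper: the concept names, file paths, and import list are all hypothetical, and the extractor output is hard-coded where a real dependency tool would supply it. The terms convergence, divergence, and absence follow the reflexion model vocabulary.

```typescript
// All concepts, paths, and edges below are hypothetical illustrations.
type Edge = [string, string];

// Step 1: the concept model a person draws (an edge means "imports").
const plannedEdges: Edge[] = [
  ["Container", "View"],
  ["Container", "DataAPI"],
];

// Step 2: regular expressions that relate file paths to concepts.
const conceptPatterns: Array<[string, RegExp]> = [
  ["Container", /containers\//],
  ["View", /views\//],
  ["DataAPI", /api\//],
];

// Step 3: file-level imports, as a dependency extractor might report them.
const extractedImports: Edge[] = [
  ["src/containers/UserList.js", "src/views/UserTable.js"],
  ["src/views/UserTable.js", "src/api/formatUser.js"],
];

function conceptOf(file: string): string | undefined {
  const match = conceptPatterns.find(([, pattern]) => pattern.test(file));
  return match ? match[0] : undefined;
}

// Step 4: lift file edges to concept edges and compare them with the plan.
function reflexion(planned: Edge[], extracted: Edge[]) {
  const key = ([from, to]: Edge) => `${from} -> ${to}`;
  const actual = new Set<string>();
  for (const [from, to] of extracted) {
    const a = conceptOf(from);
    const b = conceptOf(to);
    if (a && b && a !== b) actual.add(key([a, b]));
  }
  const plannedKeys = planned.map(key);
  return {
    // Planned and found in the code.
    convergences: plannedKeys.filter((k) => actual.has(k)),
    // Planned but missing from the code.
    absences: plannedKeys.filter((k) => !actual.has(k)),
    // Found in the code but never planned.
    divergences: Array.from(actual).filter((k) => !plannedKeys.includes(k)),
  };
}

const report = reflexion(plannedEdges, extractedImports);
```

With this made-up input, the report contains one edge of each kind, so the interesting part of the output is the absences and divergences lists: those are the places where the plan and the code disagree.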
Now, before we see an example, I wanna spend a little bit more time on why these differences are so important.
I think this is where bugs hide.
Because when the relationships in your mental model do not match how the code is actually connected, you’re going to have a really hard time solving your problems and you’ll probably introduce bugs when you’re trying to fix things.
So here’s an example reflexion map from one of Professor Murphy’s slides.
On the left here, we have the plan, and on the right is what actually happened.
But to make this a little clearer, I’m gonna walk you through a small React example.
So here we have a concept model of the container/view pattern that we heard about from Becca and Tanya yesterday.
Each of these edges represents an import, so we have our container, which holds our business logic and some state, we have our view function, and the intentionally vaguely named Data API, which will become important later.
So here’s a little bit of code for each of those concepts.
We have our data fetcher, we have our view component, and we have our container component, but don’t worry about the lines—I’ll post them later.
We then turn the reflexion crank.
So you make a regular expression mapping each one of those files to your concepts, then you run a dependency analysis tool.
And because this is a greenfield project, everything lines up (hopefully)—and in this case, it does, everything matches. But let’s consider what happens a few sprints later.
So, a new API was added and someone tells you they did some refactoring, but all your unit tests still pass because you were testing the interface (which is the container), so maybe everything’s okay.
So here are the changes.
Once again, don’t worry about the lines, but, like I said, some functionality was rearranged and some new things were added.
So we turn the reflexion crank once again and our graphic is quite different.
There’s an edge we didn’t expect and an edge that we thought would be there is gone.
So, reflexion has shown us that there’s a gap between our plan and what was actually built.
So what happens next?
Well, there’s a choice.
Well first, you can have a conversation with your teammate about what exactly Data API means, like should it be for network things or should it be for data processing?
But then you can choose to rearrange your code or you can redraw your diagram.
Neither one is correct per se, but now you can decide.
So this technique is especially useful when your system has many parts.
The first step to solving any problem is defining the problem, and these diagrams help you define the problem graphically.
So if we reconsider our original question of how we minimize the gap between our plan and what is made, well, we’ve learned at least two things.
First, we shouldn’t fixate on whether the plan or the code is the source of truth.
Seeing the difference is the important part to getting things to line up again.
And, secondly, what I really like about this approach is that it plays to the strength of people and machines separately, right?
It addresses the “this is out-of-date” problem faced by hand-drawn diagrams, and it solves the “show everything but it’s chaotic” problem faced by autogenerated diagrams.
So humans and computers are good at different things, and reflexion lets each party focus on their respective strength.
Lessons from the present
We’re gonna jump forward in time, a quarter-century, and examine what we can use right now in the React ecosystem.
Specifically, we’re interested in tools that help us keep our visual plan and our code up-to-date without the annotation steps that are required by reflexion because I promised you automatic.
One way to think about this is to use a language that looks just like your plan, because they can’t diverge if it’s just the same thing.
This thinking guided the design of some signal processing languages and visual programming.
We’ll look at two small examples. The first of these is Max/MSP, and it’s older than JavaScript.
You have blocks that can make audio signals and transform signals and you can mix and match them.
And seeing the plan laid out this way makes it very approachable compared to the equivalent text-based code, even though you can still make very advanced music and sounds.
Then, there’s Meemoo, which is open source and entirely runs in the browser, and a nice feature here is at each block, you get a preview of what happens to the data that flows through it.
What useful visualizations can be generated from code?
You could have a whole talk just about visual programming, but they won’t help us with our existing JavaScript applications, so I’m gonna change our question a little bit.
We’re gonna focus on the visualizations that we can make from our existing code.
It turns out there are many tools out there and they generate some form of directed graph, visualized as some form of node and link diagram.
If our edges have some sort of direction, we can say it’s a directed graph, and if there are no loops, we can say it’s an acyclic graph.
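To make the directed and acyclic vocabulary concrete, here is a small sketch that stores a directed graph as an adjacency list and checks whether it contains a loop. The node names are placeholders, and the depth-first search is one common way to do the check, not something taken from the tools in this talk.

```typescript
// Adjacency list: each node maps to the nodes its outgoing edges point at.
type Graph = Record<string, string[]>;

// Depth-first search. Reaching a node that is still "visiting" means the
// edges led back to somewhere we have not finished with, which is a cycle.
function hasCycle(graph: Graph): boolean {
  const state: Record<string, "visiting" | "done"> = {};
  const visit = (node: string): boolean => {
    if (state[node] === "visiting") return true;
    if (state[node] === "done") return false;
    state[node] = "visiting";
    const found = (graph[node] ?? []).some((next) => visit(next));
    state[node] = "done";
    return found;
  };
  return Object.keys(graph).some((node) => visit(node));
}

const acyclicExample: Graph = { a: ["b", "c"], b: ["c"], c: [] };
const cyclicExample: Graph = { a: ["b"], b: ["a"] };
```

Here `hasCycle(acyclicExample)` is false, so that graph is a DAG, while `hasCycle(cyclicExample)` is true because `a` and `b` point at each other.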
Drawing a parallel to real-world maps
Every time we draw an edge, we’re just saying some node A uses another node B, but I’m leaving the question of what a node means a little bit open-ended because this is situation-specific.
We don’t have to choose one thing.
And we can lean on how we treat political abstraction in the same way.
We go from continents all the way down to neighborhoods with many steps in between.
And we don’t need to worry too much about actual landmass when making this ladder because even though states are lower on the ladder than countries, you can fit the United Kingdom into Texas.
So, similarly, let’s not worry about whether a component is fitting into one file or it’s in four places and they’re imported by a root.
It’s the conceptual boundary that matters more.
We’re gonna walk down this ladder from general and go down to specific.
Visualizing modules and dependencies
So, tools at every rung of this ladder are all helping with variations of the same question.
If I change a node, what other nodes could be affected?
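That question can be answered mechanically once the graph exists. The sketch below assumes the graph is a plain object mapping each file to the files it imports; the file names are hypothetical. It walks the edges backwards to collect everything that directly or transitively depends on the changed node.

```typescript
// Each file maps to the files it imports. File names are hypothetical.
type Graph = Record<string, string[]>;

// Everything that directly or transitively imports `changed`.
function affectedBy(graph: Graph, changed: string): string[] {
  const affected = new Set<string>();
  const queue: string[] = [changed];
  while (queue.length > 0) {
    const current = queue.shift() as string;
    for (const file of Object.keys(graph)) {
      const importsCurrent = (graph[file] ?? []).includes(current);
      if (importsCurrent && !affected.has(file)) {
        affected.add(file);
        queue.push(file);
      }
    }
  }
  return Array.from(affected).sort();
}

const imports: Graph = {
  "App.js": ["Container.js"],
  "Container.js": ["View.js", "api.js"],
  "View.js": [],
  "api.js": [],
};

const impacted = affectedBy(imports, "api.js");
```

In this example, changing `api.js` could affect `Container.js` and, through it, `App.js`, while changing `App.js` affects nothing else because no file imports it.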
Our first stop is where we treat each node as a file.
So, there’s Arkit, short for architecture, not the augmented reality tool.
You’ll get a module dependency graph, it’s powered by Graphviz.
Every time one of your files uses the import keyword, a new edge appears.
As your graph gets busier, you may want to group the related nodes.
So, here, everything that’s an npm dependency has been put together.
If you don’t have time to make custom groups and your project folder structure is very well-organized (as everybody’s is), you can use dependency-cruiser.
It will group your nodes based on what’s already put together into the same folder and you can even write import-based linting rules.
And lastly, we have Madge, the Module Dependency Generator.
And a nice idea here is that we’ve colored our nodes differently based on their graph properties, so our root nodes and our leaves are shaded differently.
It has a very extensible API, so we’ll see it more in our demos later.
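Shading nodes by their graph properties only needs the dependency object itself. The sketch below is not Madge's implementation, and the definitions are assumptions made for illustration: a root is a file nothing else imports, a leaf is a file that imports nothing, and an island is both at once. File names are hypothetical.

```typescript
// Each file maps to the files it imports.
type Graph = Record<string, string[]>;
type Role = "island" | "root" | "leaf" | "inner";

function classify(graph: Graph): Record<string, Role> {
  // Collect every file that appears on the receiving end of an import.
  const imported = new Set<string>();
  for (const deps of Object.values(graph)) {
    deps.forEach((dep) => imported.add(dep));
  }
  const roles: Record<string, Role> = {};
  for (const [file, deps] of Object.entries(graph)) {
    const isRoot = !imported.has(file); // nothing imports it
    const isLeaf = deps.length === 0; // it imports nothing
    if (isRoot && isLeaf) roles[file] = "island";
    else if (isRoot) roles[file] = "root";
    else if (isLeaf) roles[file] = "leaf";
    else roles[file] = "inner";
  }
  return roles;
}

const roles = classify({
  "index.js": ["App.js"],
  "App.js": ["utils.js"],
  "utils.js": [],
  "unused.js": [],
});
```

A renderer can then pick one color per role, which is enough to make entry points and dead ends stand out before any labels are read.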
Functions as nodes
The next rung on our ladder treats functions as our nodes, and these graphs aren’t necessarily DAGs because we permit recursion and looping.
So we’ve seen performance graphs like this several times already today, but each one of these things is a dynamic callgraph.
This is a snapshot of how your functions actually ended up calling each other and our x-axis is encoding real time, but there are variations.
If it looks like this, it’s a flame graph, if it’s upside-down, it’s an icicle chart, if you wrap it around a circle, it becomes a sunburst, but they’re all variations on the same thing.
The difference between this and the previous one is that the x-axis encodes duration rather than absolute time.
So if a function has been called multiple times, they will be bunched up into a single block.
So you’ve gained some summary information in exchange for having less information about control flow.
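The bunching-up step can be sketched as a fold over profiler samples. This assumes a simplified input where each sample is a call stack (outermost function first) plus the time spent in it; the function names and timings are invented, and real profilers emit richer data.

```typescript
// One profiler sample: a stack, outermost function first, observed for `ms`.
interface Sample {
  stack: string[];
  ms: number;
}

interface FlameNode {
  name: string;
  totalMs: number;
  children: FlameNode[];
}

// Merge samples that share a stack prefix into one block per function,
// summing durations. The order in which samples happened is discarded.
function fold(samples: Sample[]): FlameNode {
  const root: FlameNode = { name: "(root)", totalMs: 0, children: [] };
  for (const { stack, ms } of samples) {
    let node = root;
    node.totalMs += ms;
    for (const name of stack) {
      let child = node.children.find((c) => c.name === name);
      if (!child) {
        child = { name, totalMs: 0, children: [] };
        node.children.push(child);
      }
      child.totalMs += ms;
      node = child;
    }
  }
  return root;
}

// Hypothetical samples: fetchUser runs twice, separated by formatUser.
const flame = fold([
  { stack: ["render", "fetchUser"], ms: 5 },
  { stack: ["render", "formatUser"], ms: 2 },
  { stack: ["render", "fetchUser"], ms: 3 },
]);
```

The two separate `fetchUser` calls end up as a single block of 8 ms under `render`. Drawing each block with a width proportional to `totalMs` gives the flame graph, and the fact that `formatUser` ran in between is exactly the control flow detail that gets lost.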
But let’s say you want to see everything that could happen instead of what actually happened, and for that, we have static callgraphs.
This is an example from Python called Pyan, and it shows, theoretically, every possible route.
Now, I don’t know of any open source tools that make something like this for JavaScript, but if you’re interested, there is a data structure library which can generate these things for projects that don’t have webpack and aren’t using TypeScript yet.
So if you’d like to contribute to an open source project after this presentation, I recommend building on top of js-callgraph.
Variables as nodes in dataflow graphs
So, at the bottom of our ladder, let’s treat variables as our nodes and make some dataflow graphs.
One place that embraces this is the Observable notebook.
If you haven’t seen it before, it’s a reactive environment for writing JavaScript, combining some of the best ideas from Jupyter notebooks and Excel.
So just like React and Excel, you don’t have to tell specific cells to rerun if you update a dependency. The dependents just rerun automatically, but you can’t see those relationships just from scanning.
So Observable was made by Mike Bostock who also created the D3 data visualization library.
And being the person who likes visualization that he is, he made a tool that lets you debug any Observable notebook by generating the dataflow graph for any URL that you pass in.
So each oval is a cell in your notebook and the dependency lines tell you what cells will rerun if you change some other one.
And he actually uses screenshots from this tool to help people when they have questions in the forums about why things are happening.
So, here’s a dataflow diagram used with Reselect, which is a helpful library for memoizing and composing selectors.
If you’re not familiar with it, you can check it out in the Redux Starter Kit that Mark Erikson launched this week, which is the official set of libraries for making Redux development easier.
Each one of our nodes is a selector function, which is a function that takes in state as an argument and computes some new derived state when an upstream dependency has changed. But keeping track of these dependencies can get out of hand very quickly.
So, this is wrapped up inside something that looks like the Redux dev tools, which is because it uses the same components, and we have three details we haven’t seen yet.
One of these is if you click on a node, we highlight the ancestors and the descendants.
You can also see the value of each selector at runtime and you can count how frequently each node has been used, which helps with performance profiling.
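To show what a node in that diagram does, here is a hand-rolled sketch of a memoized selector with a recomputation counter. The `memoSelector` helper and the `State` shape are hypothetical and are not Reselect's API; the sketch only illustrates the idea of recomputing when an upstream result changes.

```typescript
interface State {
  users: { name: string; active: boolean }[];
}

// Hypothetical helper, not Reselect's API: derive a value from upstream
// selectors and recompute only when one of their results changes identity.
function memoSelector<S, A, R>(
  inputs: Array<(state: S) => A>,
  combine: (...args: A[]) => R
) {
  let cache: { args: A[]; result: R } | undefined;
  const selector = (state: S): R => {
    const args = inputs.map((input) => input(state));
    const prev = cache;
    // Reuse the cached result when every upstream value is unchanged.
    if (prev && args.every((arg, i) => arg === prev.args[i])) {
      return prev.result;
    }
    const result = combine(...args);
    cache = { args, result };
    selector.recomputations += 1;
    return result;
  };
  selector.recomputations = 0;
  return selector;
}

// A two-node dependency chain: users -> active users -> active names.
const getUsers = (state: State) => state.users;
const getActiveUsers = memoSelector([getUsers], (users) =>
  users.filter((user) => user.active)
);
const getActiveNames = memoSelector([getActiveUsers], (users) =>
  users.map((user) => user.name)
);
```

Calling `getActiveNames` twice with the same state runs each combine function once. The `recomputations` counter on each selector is the kind of number a tool can overlay on a node to help with performance profiling.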
State machines
I had to mention one more approach to modeling computation even though it didn’t fit into my ladder because it enables some really spectacular visual tooling, and I’m talking about state machines.
It showed up on one slide yesterday very briefly and, luckily, they didn’t talk about it anymore, so I get to tell you more about it. The nice thing is that it gives us automatic visual documentation.
So, state machines are a model for computation where a system is a list of states and a list of ways to move between them.
The important thing to take away here is that the diagram on the left of a traffic light is generated from the thing that looks like JSON on the right.
And you can take this picture to your PM or your designer and work together to find edge cases, even if they’re not familiar with JavaScript.
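The traffic light can be written down as plain data. The object below is shaped loosely like a state machine config, but the `Machine` type, the `TIMER` event name, and both helper functions are assumptions made for this sketch and do not use any particular library's API.

```typescript
// A machine is a list of states and, per state, the events that leave it.
interface Machine {
  initial: string;
  states: Record<string, { on: Record<string, string> }>;
}

const trafficLight: Machine = {
  initial: "green",
  states: {
    green: { on: { TIMER: "yellow" } },
    yellow: { on: { TIMER: "red" } },
    red: { on: { TIMER: "green" } },
  },
};

// Follow one event; stay in the same state if the event is not handled.
function transition(machine: Machine, state: string, event: string): string {
  return machine.states[state]?.on[event] ?? state;
}

// The edges of the diagram fall straight out of the same definition.
function toEdges(machine: Machine): string[] {
  const edges: string[] = [];
  for (const [from, { on }] of Object.entries(machine.states)) {
    for (const [event, to] of Object.entries(on)) {
      edges.push(`${from} --${event}--> ${to}`);
    }
  }
  return edges;
}

const lightEdges = toEdges(trafficLight);
```

Because the running code and the edge list both read from `trafficLight`, the picture cannot drift away from the behavior, which is the property that makes the generated diagram trustworthy.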
Once you have a machine, you can plug it into an ever-growing number of visualization tools.
So, Sketch.systems is one where you can write something that looks a lot like YAML and is very concise, and you can generate a clickable document on the left and it will actually generate XState code for you to plug into your React applications.
Looking to the future
There are actually many more tools that we didn’t have time for, but it’s time to look to the future.
For this last section, I explored some ways in which we can extend what we’ve seen.
I picked two ideas to experiment with: hairball untangling and wayfinding. These are challenges that I felt were important after reflecting about why I wasn’t using all of these things in production already.
So in this case, a hairball is not something that a cat coughs up. It’s a network graph that’s too tangled to read.
Drawing large, directed graphs that remain readable is really hard, and in bioinformatics, they’ve developed some new layouts to deal with this.
We’re gonna focus on just node-link diagrams today.
I really think we need to solve this because if you don’t manage a busy diagram, people don’t take it seriously.
If you have something that’s up-to-date but people can’t understand how it helps them, it’s almost like not having it at all.
So, I’m gonna reframe our question once again.
We don’t have to show everything
Rather than asking, “How do I see everything,” we’re gonna focus on, “Which parts do I actually need to see right now?”
So, we’re gonna borrow a page from the world of cartography.
I’m gonna show you a series of maps, four panels, gradually zooming into a city.
So, at the country level, you barely see the state outlines.
As you zoom in closer, you get state labels, you get highways, and, eventually, you get streets. This progressive reveal approach limits information density so people don’t get overwhelmed.
How do map makers do this?
Well, they have this thing called a “map taxonomy chart.”
It’s a very detailed document where you outline exactly what labels and what concepts you can see at every zoom level, and they do it for 16 layers.
We’re a little bit of a ways off from having our file imports, and our callgraphs, and our dataflow all in the same diagram, but for now, we can look to this idea when we reduce visual noise even at one layer.
A design pattern for managing information throughput is the Information-Seeking Mantra coined by Professor Ben Shneiderman in ‘96, and it has three parts: overview first, zoom and filter, and details on demand.
So, if a picture’s worth a thousand words, a prototype’s worth a thousand meetings.
And so I’ve made a demo to illustrate the ways in which these ideas can be combined with the Madge diagrams we saw earlier.
Making a demo Electron app
I made a demo Electron app to help me find my way in two open-source projects that I looked at recently, JSNetworkX, which is for network analysis, and nteract, which is a React-based Jupyter environment.
So the first idea was to take the file dependency diagram and make it clickable with two tweaks.
So first, since it’s an overview, I can take the labels away and everything gets equal visual weight.
And secondly, I added some colors to separate our island nodes and our regular nodes from the roots and leaves that are already shaded.
And this helps me find entry points, because that’s the starting point for spelunking in any new codebase.
Pruning your codebase
The next idea was to let the user make the tree smaller by removing the nodes that aren’t relevant.
Since we’re clipping branches off a tree, I call it “pruning.”
There’s a heavy prune where you can remove entire folders and a light prune where you take one file away at a time, and this is our chief flexible defense against the hairball.
If you get carried away with your pruning and your tree looks like Charlie Brown’s, don’t worry: every click generates a regular expression under the hood.
So you can go back and edit the regular expression, or if you don’t like clicking and you just really like configuration files, you can jump straight to the menu option.
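Here is a rough sketch of how clicks could turn into regular expressions. The demo's actual code is not shown in the talk, so the function names, the pattern shapes, and the file paths are all assumptions.

```typescript
// Each file maps to the files it imports. Paths are hypothetical.
type Graph = Record<string, string[]>;

// Drop every node that matches an excluded pattern, plus edges pointing at it.
function prune(graph: Graph, excluded: RegExp[]): Graph {
  const keep = (file: string) =>
    !excluded.some((pattern) => pattern.test(file));
  const pruned: Graph = {};
  for (const [file, deps] of Object.entries(graph)) {
    if (keep(file)) pruned[file] = deps.filter(keep);
  }
  return pruned;
}

// Turning a click into a pattern, escaping regex metacharacters in the path.
const escapeRegExp = (text: string) =>
  text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
// Heavy prune: everything inside a folder.
const heavyPrune = (folder: string) => new RegExp(`^${escapeRegExp(folder)}/`);
// Light prune: exactly one file.
const lightPrune = (file: string) => new RegExp(`^${escapeRegExp(file)}$`);

const fullGraph: Graph = {
  "src/index.js": ["src/app.js", "test/helpers.js"],
  "src/app.js": [],
  "test/helpers.js": [],
  "test/app.test.js": ["src/app.js"],
};

const withoutTests = prune(fullGraph, [heavyPrune("test")]);
const withoutApp = prune(fullGraph, [heavyPrune("test"), lightPrune("src/app.js")]);
```

Keeping the prune state as a list of patterns is what makes it undoable: removing a pattern from the list and pruning the full graph again restores the branch.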
But once your graph is pruned, layout algorithms that wouldn’t have made any sense on the full graph can become practical.
So, here, we have a hierarchical layout that wouldn’t have worked on the main tree, so the modules that have the most children get floated up to the very top.
Details on demand: neighbors
For the last stage of the mantra, I focused on details about individual nodes.
So if you click on something in the main map, you get a minimap of that node and all its neighbors one hop away.
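The data behind a minimap like that is small. As a sketch under the same assumption as before, that the graph is an object mapping files to their imports, the one-hop neighborhood is just the outgoing edges of the clicked node plus every node with an edge into it. File names are hypothetical.

```typescript
// Each file maps to the files it imports.
type Graph = Record<string, string[]>;

// The selected node and everything exactly one hop away from it.
function neighborhood(graph: Graph, selected: string) {
  const imports = graph[selected] ?? [];
  const importedBy = Object.keys(graph).filter((file) =>
    (graph[file] ?? []).includes(selected)
  );
  return { selected, imports, importedBy };
}

const local = neighborhood(
  {
    "App.js": ["Container.js"],
    "Container.js": ["View.js", "api.js"],
    "View.js": [],
    "api.js": [],
  },
  "Container.js"
);
```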
And I found this useful when the graph was very busy and I wasn’t sure where to prune and I needed to walk a local subnetwork. But this metadata doesn’t have to be graph-related.
Any tool that gives you more information about some file can help.
So, I used git log, where you can get the most recent commits for any file, you can group them by author, and you can sort them by the most recent modifier.
And this might help you with the finding directions step of getting unlost, so you know who to ask.
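Grouping commits by author can be sketched as a small parser over `git log` output. The exact command the demo runs is not shown in the talk; this assumes one tab-separated line per commit, as produced by something like `git log --format=%an%x09%ad --date=short -- path/to/file`. The author names in the sample are invented.

```typescript
interface AuthorSummary {
  author: string;
  commits: number;
  lastCommit: string; // YYYY-MM-DD, so string comparison orders by date
}

// `logOutput` is assumed to hold lines of "author<TAB>date".
function summarizeAuthors(logOutput: string): AuthorSummary[] {
  const byAuthor: Record<string, AuthorSummary> = {};
  for (const line of logOutput.split("\n")) {
    const [author, date] = line.split("\t");
    if (!author || !date) continue; // skip blank or malformed lines
    const entry = byAuthor[author] ?? { author, commits: 0, lastCommit: date };
    entry.commits += 1;
    if (date > entry.lastCommit) entry.lastCommit = date;
    byAuthor[author] = entry;
  }
  // Most recent modifier first.
  return Object.values(byAuthor).sort((a, b) =>
    a.lastCommit < b.lastCommit ? 1 : a.lastCommit > b.lastCommit ? -1 : 0
  );
}

const sampleLog = "Ada\t2019-04-02\nGrace\t2019-03-28\nAda\t2019-01-15\n";
const authors = summarizeAuthors(sampleLog);
```

The first entry in the result is the person who touched the file most recently, which is usually a reasonable first person to ask.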
Wayfinding: figuring out where you are—and where you need to go
The second idea that I explored was motivated by having a map, but not being sure where to try to go when I was in it, and architects think about this all the time through a subfield called “wayfinding.”
Wayfinding covers the techniques that people use to figure out where they are and where they need to go.
It’s very popular to have projects like this for hospitals and airports, because these are complex spaces that people don’t enter very often, they’re under time pressure, and there are high safety and monetary costs to going into the wrong place.
This felt to me like being on call, so I took a closer look.
There are several core questions in every wayfinding project, but today we’ll just focus on selection, and this is based on how I felt after looking at my own demo.
I had this automatically-generated map, I knew how to make it less busy, but I wasn’t sure which parts to spend more time in.
So my idea to do something about this was to just add some data and maybe the usefulness could grow.
This is kind of the same reason that most of us look at geographic maps.
You don’t really look at them to remember how the states are shaped, you look at them to put data into spatial context. So weather maps help you decide whether you’re going to cancel a trip, and flight coverage on a map is much nicer than looking at a list of coordinates.
We see this idea in other software tools.
So when I worked on data pipelines, one of the best things about Airflow was you can see the success or fail status of every part of your data pipeline right on the graph instead of getting a printout of log messages.
And in our callgraphs, you can count the number of invocations or you can count total runtime and display this on our nodes and links instead of putting it into a table.
Finding TODOs
So for this second demo, I was inspired by location-based search tools like Yelp and Google Maps, because on the left, you get your list of search results, and on the right, you see those results overlaid onto a map.
This demo works on non-JavaScript projects because I’m just leaning on your folder structure.
So on the left, I’ve put the JSNetworkX algorithms folder flowing from left to right, and let’s say you want to find all the TODOs in this part of the project.
So I connected it to ripgrep, which is the Rust-based speedy grep, and I’ve highlighted all the places where TODOs appear in the project.
I also kept two ideas from the mantra in our first demo.
We keep the map interactions where you can pan and zoom through your folder tree, and then if you click on a node, you see the actual lines that showed up in each one of those files.
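Overlaying search results on the folder tree starts with grouping matches by file. The sketch below assumes the search output has one `path:line:text` entry per match, as printed by something like `rg --line-number --no-heading TODO src/`. How the demo actually talks to ripgrep is not shown in the talk, and the paths and comments in the sample are invented.

```typescript
interface Hit {
  line: number;
  text: string;
}

// `rgOutput` is assumed to hold lines of "path:line:text".
// Paths that themselves contain ":<digits>:" would confuse this simple parser.
function groupHits(rgOutput: string): Record<string, Hit[]> {
  const byFile: Record<string, Hit[]> = {};
  for (const entry of rgOutput.split("\n")) {
    const match = /^(.+?):(\d+):(.*)$/.exec(entry);
    if (!match) continue; // skip blank or unexpected lines
    const [, file, line, text] = match;
    const hits = byFile[file] ?? [];
    hits.push({ line: Number(line), text: text.trim() });
    byFile[file] = hits;
  }
  return byFile;
}

const sampleOutput = [
  "src/algorithms/centrality.js:12:  // TODO: handle directed graphs",
  "src/algorithms/centrality.js:40:// TODO: add tests",
  "src/algorithms/cluster.js:7:/* TODO: docs */",
].join("\n");

const todoHits = groupHits(sampleOutput);
```

The keys of the result tell the map which nodes to highlight, and the hits per key are the details shown when a node is clicked.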
In the future, I’d like to combine the information from the folder graph with the dependency graph and then maybe weave in some messaging data so we can see if Conway’s law about how software shape mirrors organizational shape is actually holding.
Main takeaways
So while I work on making some documentation for these demos, let’s step through the main takeaways from our tour today.
We started by looking at the reflexion model.
This approach combined our human ability to make abstract concept maps with the thoroughness of machines, and we highlighted the places where our plan and what was made ended up being different, and this can help us debug and communicate with our teams more effectively.
Secondly, we walked down a ladder of tools for building some sort of dependency graph.
We worked at many layers of abstraction, we saw over nine different tools and they’ll all stay up-to-date at the press of a button if you point them at the right source file.
So, please try these out with your teams.
And, lastly, we looked at some tools or ideas to make those graphics a little bit more actionable by adding some interaction techniques guided by the Information-Seeking Mantra and the idea of layering data on top of our map so we have some more context.
I’d like to leave you with a quote about why I think working on tools that make code visually understandable is so important.
“We shape our tools and thereafter our tools shape us.”
If we spend less time trying to figure out how our code works, we can apply that energy to focusing on more important problems in the real world.
As Tom said yesterday, our better DX is in the service of delivering a better UX.
So, while I don’t have a crystal ball and I don’t know what that new efficient shape will look like, I’m optimistic that spending less time being lost will leave us happier and with more time to help other people find their way.
So I’m planning to open source the demos.
Please send me your screenshots and ideas for improvement, but thank you for listening and reach out to me if you have questions.
Thank you.