
Amazon Echo Bot

 
 

Echo News Bot

 

In early 2016, my team at Vox Media explored what it would take to create a bot for the Amazon Echo. The idea of designing a compelling experience for a voice-based product intrigued me. After all, an interface is merely a means of connecting a user to a system, be it screen or speech—how hard could it be?

This is the story of how my team and I designed and developed an MVP news bot for the Amazon Echo.

 

Challenge

Create a useful news experience that feels native to the screenless, voice-activated Amazon Echo. To accomplish this, I had to ensure that the app hierarchy and navigation system matched our users’ mental models, so they always felt in control of the experience.

Solution

The final model invites users to explore three new stories each day, each written for audio. The navigation system gives flexibility without overwhelming people with choices.

Results

Within two months, we developed a working MVP that performed well with our target users. As Vox Media’s first official foray into conversational and voice interfaces, our bot laid the groundwork for future work in this space. I shared our takeaways in a post on Vox Product’s blog and spoke about the design process at various conferences.

Role

I was the lead designer on this project. My team included Chao Li (product manager), Yuri Victor (developer), and Allison McHenry (developer). We also worked with Emily Withrow and Joe Germuska of the Northwestern University Knight Lab, who brought us the initial idea and research for the project.


 

Process

User Research

Conducting user research on a new platform is challenging because there are few competitors or existing experiences to compare against.

Fortunately, prior to taking on this project, our team had already conducted research and design work on bots, messaging platforms, and conversational interfaces. I sifted through that research and brought relevant takeaways to the Echo project. I supplemented these learnings with additional research on voice UI.

 
 
(Clockwise from top left) Freeform whiteboarding session on conversational UI, prototype of an SB Nation bot on Facebook Messenger, competitive analysis of messaging platforms, divergent brainstorming sketches, freeform word association on messaging.


 
 

I also pushed to identify our editorial partner early in the process. Once we settled on a gadget bot for The Verge, I could then gain a better understanding of our target users and their needs.

Brainstorming and Concepting

By this point, I had an idea of what might make a great news experience, and I needed to marry that experience with the realities of our newsroom.

Given our product resources, editorial manpower, and scope, we decided to go with a prompt-and-response system as opposed to a completely open-ended AI chatbot.

Navigation and User Flow

Our bot centered on the idea of a “story”: a single primary piece of content with a handful of related pieces. Here’s how it works: when a user starts the experience, the Echo plays the first story (an audio clip). Users can then move on to the next story, or say one of three prompts to access a related clip.
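To make that content model concrete, here is a minimal sketch of how a “story” might be represented. The field names, prompt wording, and URLs are hypothetical illustrations, not the actual Vox Media implementation.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Story:
    """One story in the day's lineup: a primary audio clip plus related clips keyed by prompt."""
    title: str
    audio_url: str            # the main audio clip the Echo plays first
    related: Dict[str, str]   # prompt phrase -> URL of the related clip

# A day's lineup: three stories the Echo plays in order.
lineup: List[Story] = [
    Story(
        title="Example gadget story",
        audio_url="https://example.com/audio/story-1.mp3",
        related={
            "tell me more": "https://example.com/audio/story-1-more.mp3",
            "how much does it cost": "https://example.com/audio/story-1-price.mp3",
            "when can I get it": "https://example.com/audio/story-1-release.mp3",
        },
    ),
    # ...two more Story entries would follow for a full day
]
```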

I had to create a voice-based navigation system that made it easy for users to access all of the information.

Web designers can count on users inferring some things for themselves, such as where they are on a site, from the visual indicators in front of them. Voice UI designers, in contrast, have to be extremely explicit.

I considered a ton of approaches, from a completely set flow (where the user says a single prompt to move forward in the experience)...

 
 
 
 

… to a totally open-ended model (where the user can try a number of prompts to reach more content).

 
 
 
 

Neither of those models worked for us. The set flow felt too scripted, whereas the open-ended option paralyzed users with choice.

That reasoning ultimately led us to a limited choose-your-own-adventure experience, where users could say one of three prompts to dive deeper into a single story, or move on to the next story.
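Building on the hypothetical Story model sketched earlier, the routing logic for that limited flow could look roughly like this; the phrase matching and return values are simplified for illustration.

```python
from typing import Optional, Tuple

REPROMPT = "REPROMPT"  # sentinel: re-read the three prompts to the listener

def route(lineup, story_index: int, utterance: str) -> Tuple[int, Optional[str]]:
    """Pick the next clip given the current story and what the listener said.

    Returns (story_index, audio_url). audio_url is None when the lineup is
    exhausted, or the REPROMPT sentinel when the input wasn't recognized.
    """
    story = lineup[story_index]
    phrase = utterance.strip().lower()

    if phrase in story.related:
        # Dive deeper: play the related clip, stay on the same story.
        return story_index, story.related[phrase]

    if phrase in ("next", "next story", "skip"):
        if story_index + 1 < len(lineup):
            nxt = story_index + 1
            return nxt, lineup[nxt].audio_url
        return story_index, None  # no more stories today

    # Anything else: stay put and remind the listener of the three prompts.
    return story_index, REPROMPT
```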

 
 
 
 

During testing, we found that this flow gave users just the right amount of control: enough choice to feel in charge without overwhelming them with options.

With the flow set, I could focus on the details, like help commands, prompts, stress cases, and so forth.

 
 
 
 

Design, Prototype, Test

At this point, we were ready to move into code. Yuri built out a baseline bot that I could get in front of users. We then worked together to iterate in code. As Yuri refined the bot, I conducted more than a dozen usability testing sessions with Echo users and non-Echo users to identify problems with the design. Some of the questions I wanted to answer were specific (e.g. “Can the user access all of the content?”), while others were more intangible (e.g. “Does the copy still work on the first use and the hundredth?”).

These key takeaways emerged:

1) Be wary of what you ask users to remember

During early testing sessions, users regularly forgot prompts, merged two into one, or recalled phrases that were never actually stated. Whenever this happened, users immediately assumed the bot was broken—and for all intents and purposes, they were right.

To address this, we made sure that our prompts did not change from story to story—I didn’t want users to have to relearn the commands every time they launched the bot. We also played the prompts after the first story, so users had little time between hearing the instructions and having to act on them. Finally, we provided auditory cues. If users hesitated or said something the bot didn’t recognize, they were reminded of the three prompts.
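In Alexa Skills Kit terms, that reminder maps onto the reprompt field of the response, which the Echo reads aloud if the user hesitates. A simplified sketch follows; the prompt wording is invented for illustration, not the bot’s actual copy.

```python
def build_response(speech: str, reprompt: str, end_session: bool = False) -> dict:
    """Assemble an Alexa-style response: main speech plus a reprompt that plays
    if the listener hesitates or says something the skill doesn't recognize."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech},
            "reprompt": {
                "outputSpeech": {"type": "PlainText", "text": reprompt}
            },
            "shouldEndSession": end_session,
        },
    }

# Hypothetical prompt wording: the same three prompts, every story, every day.
PROMPT_REMINDER = (
    "You can say 'tell me more', 'how much does it cost', "
    "'when can I get it', or 'next story'."
)

response = build_response("Here is today's first story...", PROMPT_REMINDER)
```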

2) Make room for natural speech

Echo users and non-Echo users alike addressed the Echo in different ways. Some stated the prompts as single-word commands, others asked them as questions, and still others phrased them as directives (e.g. “Tell me about the…”).

To account for those behaviors, we broadened the range of inputs the bot would accept. Our test bot anticipated and accepted more than a dozen alternate phrasings for each of the three prompts. People are not robots, and we shouldn’t expect them to behave as such.
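One lightweight way to think about that broadening is a lookup table that collapses many natural phrasings onto one canonical prompt. The phrasings below are hypothetical examples, not the bot’s real utterance list; in practice, Alexa’s sample-utterance definitions play this role.

```python
from typing import Optional

# Hypothetical alternate phrasings mapped onto a canonical prompt.
ALTERNATE_PHRASINGS = {
    # "tell me more" variants
    "tell me more": "tell me more",
    "more": "tell me more",
    "more please": "tell me more",
    "give me more": "tell me more",
    "what else is there": "tell me more",
    # "how much does it cost" variants
    "how much does it cost": "how much does it cost",
    "how much is it": "how much does it cost",
    "what does it cost": "how much does it cost",
    "tell me about the price": "how much does it cost",
    # "next story" variants
    "next story": "next story",
    "next": "next story",
    "skip": "next story",
    "skip this one": "next story",
    "move on": "next story",
}

def normalize(utterance: str) -> Optional[str]:
    """Collapse a natural phrasing onto one of the canonical prompts, if recognized."""
    return ALTERNATE_PHRASINGS.get(utterance.strip().lower())
```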

 
 
 
 

3) Design for uncertainty

While a consideration for every product, designing for multiple contexts is particularly important for the Echo: a product intended to sit in the home, listening and responding to spoken prompts.

I pressed our team early on to consider the contexts in which users would be interacting with the bot and to brainstorm ways it might be invasive. What if they trigger the bot by accident? What if they are in a rush and need a response immediately?

I devoted an entire portion of our usability testing to skipping content and ending the experience. I tried to mimic intense situations by describing scenarios for users and asking them to react. Doing so revealed a number of phrases that a user might rely on to stop the bot (e.g. “leave,” “quit,” and “enough”) in addition to the ones that Amazon automatically provides. Although Amazon supports a final message after a stop command, we decided not to include one. If a user said stop, the bot stopped. The last thing I wanted was for users to feel trapped in the experience.
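In response terms, that decision amounts to ending the session with no closing speech. Here is a sketch of what a stop handler might return as raw Alexa-style JSON; the intent names and extra phrase mappings are illustrative, not the production configuration.

```python
def handle_stop() -> dict:
    """End the session immediately, with no closing message.

    Amazon's built-in AMAZON.StopIntent and AMAZON.CancelIntent cover "stop" and
    "cancel"; phrases like "leave", "quit", and "enough" would be mapped to the
    same behavior.
    """
    return {
        "version": "1.0",
        "response": {
            # No outputSpeech: if a user says stop, the bot just stops.
            "shouldEndSession": True,
        },
    }
```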

Designing the Editorial Experience

We had two sets of users for this product: the end-users who would be interacting with our bot, as well as the reporters and editors who would be inputting the content.

I devoted as much effort to the admin interface as to the bot experience. At minimum, we needed to ensure that editors understood how to input content into the bot and could clearly identify the different buckets of content.

Over the two-month period, I designed, tested, and iterated on the interface. I started with rough sketches and ended with high-fidelity prototypes.

The final prototype reflects some of the most significant insights that emerged from testing. First, the interface shows a quick snapshot of the day’s story lineup. Second, the language, features, and flow of adding a story support our users’ mental model of “publishing” to a bot. Third, the interface caters to the power-user workflow by supporting quick, WYSIWYG editing. Finally, smart constraints help editors avoid the worst-case scenario of publishing a story accidentally or unknowingly. We eventually decided to build the admin on WordPress for the sake of the deadline, but I’ve included the admin wireframe below for reference.

 
 
 
 

Results

Our team finished an MVP bot by May 2016. The news bot succeeded in delivering an engaging, intuitive experience to users.

We didn’t launch the bot publicly, so I can’t share a link to the actual product, but our insights are recapped in this post on the Vox Product blog.

A number of designers have argued that the best interface is no interface. Whether you subscribe to that mantra or not, this project gave me the opportunity to entertain that reality. Voice design is still nascent, and I’m happy that our team could help set the stage for how Vox Media’s content could exist in a voice interface.

Update (June 26, 2017): Since the time of this case study, I’ve spoken about this project and voice design in general at two different conferences: the Society for News Design annual conference (Designing interfaces for voice-based products) and the International Symposium on Online Journalism annual conference (Conversational journalism: How bots and artificial intelligence can get us there).