
Inside UXR
Explore the practice of user experience research with Drew and Joe, one question at a time.
Send your questions and feedback to insideUXR@gmail.com
What is Keystroke Level Modeling? (Update)
We’re revisiting one of our favorite episodes: Keystroke Level Modeling (KLM). In this rebroadcast, Drew and Joe break down how KLM can predict the time it takes for an expert user to complete routine tasks in software. From understanding key components like keystrokes, mouse movements, and mental preparation time, to real-world use cases and practical tips, this episode offers a deep dive into a niche but powerful UX research tool. If you’ve ever wondered how to measure workflow efficiency before development even begins, this one’s for you!
Send your questions to InsideUXR@gmail.com
Visit us on LinkedIn or on our website at www.insideUXR.com
Credits:
Art by Kamran Hanif
Theme music by Nearbysound
Voiceover by Anna V
What is Keystroke Level Modeling? (Update)
Joe Marcantano: Hey, folks. Joe here. Like Drew mentioned last week, we're going to do some rebroadcasts of some of our favorite episodes over the holidays. We're gonna take some time to enjoy the holidays and some time with our families. We will be back on January 6th with all new episodes. So in the meantime, enjoy these. These are some of our favorites. This next one is the keystroke level modeling episode. It's one of my favorites, and I hope you enjoy it as well. I'll also reiterate the call that when we come back, we want to talk about your questions. So send your questions in to insideUXR@gmail.com, and otherwise enjoy the episode.
Joe Marcantano: Drew, welcome to today's episode.
Drew Freeman: Hey, Joe, how you doing?
Joe Marcantano: I am doing well. I'm excited about this one. I had to do zero prep for this episode.
Drew Freeman: I mean, I also had to do very little prep, but that's because we're nerding out on a topic that is very niche and very specific to me.
Joe Marcantano: Yeah, it's great because I get to play myself, someone who does not know much about this particular topic, and I get to just ask you questions the whole time.
Drew Freeman: Everybody likes nerding out a little bit occasionally.
Joe Marcantano: Yeah. So why don't we jump right into the question?
Drew Freeman: Yeah, let's do it.
Joe Marcantano: So, Drew, what is keystroke level modeling?
Drew Freeman: So, keystroke level modeling is a tool that allows someone to predict or to estimate the time it will take for an experienced user to complete a routine task in a piece of software. There's a lot of pieces to that, and I'll explain them one by one. But let me give you the history just a little bit first.
Joe Marcantano: Yeah.
Drew Freeman: So the keystroke level model goes all the way back to, like, the 70s, but was publicly introduced and published, in paper form, in the early 80s by three computer scientists, Card, Moran and Newell. And broadly, it's stayed relevant and it's stayed the same since then. It's been adapted and updated for things like smartphones, touch screens, those kinds of interactions. But the bones of it are still pretty much the same as they were when it was introduced and created in the 80s.
Joe Marcantano: So I have so many questions. I want to start with: you said this is a tool. Now, when you say that, do you mean that this is a research technique, or is this like a piece of evaluative software?
Drew Freeman: It is not a piece of evaluative software. It's a tool in the way that a hammer is a tool.
Joe Marcantano: Okay.
Drew Freeman: It is a method to do things, just like a hammer is a method for doing things.
Joe Marcantano: And then it sounds like, correct me if I'm wrong here, it sounds like this is more of a predictive thing than it is an evaluative thing. So this isn't like measuring people who are already proficient at the task. This is, if someone were proficient, how quickly would we expect them to be able to do it. Is that right?
Drew Freeman: You've hit upon it exactly. You've picked up on the major things that I wanted to zero in on in that original explanation that I gave. You've zeroed in on them immediately. So, yeah. I specifically did not use the word measure, as in measure how long it will take, because that's not what we're doing. If you want to measure how long it takes an experienced user to complete a task, you use a stopwatch. But that only tells you so much. Like, that doesn't give you all of the information that you really need. And it also doesn't allow you to do things like, well, what if we added this step into the workflow? How would that impact the efficiency and how long it might take someone to do this thing? A keystroke level model analysis, or keystroke level modeling, that's the way that you can try to do that hypothetical or prospective kind of analysis.
Joe Marcantano: Okay. So it's a way for us to kind of predict how our user journeys or our flows might change, given someone who is already proficient with the software and the task.
00:05:00
Drew Freeman: Yes, but there are other use cases than that too. So when I was training this in my last job, because this was my thing, this was my niche at my last job, I always like to say there's a couple of different buckets of use cases. So one bucket is that prospective one: we've got a change that we think will be beneficial to the workflow, let's test it and let's find out. I love anyone who's thinking about using it that way, but honestly, that was the one where I'm least expecting anyone to do this. That's the most advanced, the most nuanced use case. The most common use case is we've got a pre-change workflow and a post-change workflow, we've already made the changes, and we want to see, did we have an impact on the efficiency? We've already done the development, we've already made the configuration change, whatever. We just want to compare a pre-treatment and a post-treatment scenario. So then the middle use case is more along the lines of, we've got users complaining about how long it takes to do this workflow. Let's use keystroke level modeling to kind of baseline everything and see if we can zoom in and identify individual subtasks that are not very efficient, and then we can target those for workflow improvements.
Joe Marcantano: So then talk to me about how or why someone might use this method versus, you know, maybe doing an unmoderated test and just getting 50 people to test each journey before and after and then just comparing the times. Why might this be better?
Drew Freeman: So this could be better. I will say could because it depends a little bit on how skilled and familiar you are with using this tool to do analysis. But for me, as someone who's very experienced in this, I can run a pretty basic analysis in a matter of a couple of hours. Like we're talking hours, not days.
Joe Marcantano: Okay.
Drew Freeman: Whereas if I wanted to do that kind of analysis with, you know, 50 participants in an unmoderated sense, that's a lot of people hours.
Joe Marcantano: Yeah, that's a lot of time, both for the participants, but then for you too. You know, it takes a day to program it, and then you've got to soft launch it, and then you've got to analyze all your results.
Drew Freeman: Yeah, I wasn't even thinking about the amount of people hours that you take up with those 50 participants. I was just thinking about how long it's gonna take me to sit with a stopwatch and have to watch all 50 of those recordings, basically, and start and stop and keep notes and all that sort of stuff.
Joe Marcantano: Yeah, I mean, even if you're using a tool that does that automatically, you still have to watch them. You still have to see if people are guessing, or if they're unsure and pause. Like, you still have to watch all of those journeys. So it still takes a lot of time.
Drew Freeman: And you just hit on another nuance of this keystroke level modeling kind of analysis framework, which is that this is not a great framework to use, or it is not a framework that is well set up to use, live in production. Because live in production, there is so much that changes with the context of the individual situation. You know, the person might make a wrong move down a false path and then have to backtrack. So with keystroke level modeling, honestly, usually task zero is defining your workflow and scripting out, these are the steps of the workflow. Which also actually makes it a really interesting use case for, what are we training? And are we training the most efficient workflow?
Joe Marcantano: So this is about trying to figure out experimentally or analytically which flow is the best, and then teaching users that flow, or seeing if users can intuitively find it on their own. That's a separate thing. That's kind of step two, if you will.
Drew Freeman: Yes, the tool is all about identifying the best workflow. Caveat: best in this case means the most efficient workflow. It might not be the easiest. And you might decide that while workflow A is 20% more efficient, it is 50% more difficult for people to do correctly than workflow B. So we're going to go with workflow B, because it means we get fewer failures, even if it does take longer.
Joe Marcantano: Okay, interesting.
Drew Freeman: So a keystroke level model analysis requires a lot of understanding, and it is, you know, it's like a hammer. A hammer is good at hammering things. A hammer is not good at anything else. A
00:10:00
Drew Freeman: hammer is the perfect tool for a subset of tasks, but it's not a big subset.
Joe Marcantano: Okay, so let's talk a little bit about how one goes about doing this analysis. What is step one? How does somebody approach this and say, all right, let's get going.
Drew Freeman: So I'm going to take a step back and describe each of the constituent components that make up the model. So there are six different elements. They're also called operators, because scientists like to have their own jargon. But there's six different things. There's the letter K, and K is a keystroke or a button press. So simply, it counts the number of times that keyboard buttons are pressed, it counts the number of mouse clicks, and it aggregates them all together. It's important to note that it counts keystrokes rather than characters. So typing a capital A is two different K actions, one for pressing shift and then another for pressing A. So generally, K includes button clicks as well. Some people like to count mouse button clicks as B, they'll categorize them under a different letter. I don't bother with that. I just count K as, this is all of the times I'm pressing a button on the keyboard and all of the times I'm pressing a mouse button.
Joe Marcantano: That's K. Does that include the scroll wheel?
Drew Freeman: Scrolling is a different action. It gets counted in a different element. Okay, okay. So then the next element is the letter P, and this is pointing with a mouse. So this is moving your mouse from one part of the screen to another. It is a separate action from clicking the mouse. But basically every single mouse click is always going to have a P, pointing the mouse, action associated with it.
Joe Marcantano: That makes sense.
Drew Freeman: Then the next one is H, for homing the hands on the keyboard or another device like a mouse. And in this context, homing can be thought of as positioning. So that can be moving from your keyboard to your mouse. It can be more finely positioning your hands on the keyboard, like moving from the letter keys to a number pad. Anything where you are repositioning your hands on the same device or from one device to another, that's an H action.
Joe Marcantano: Okay, so not me moving my pointer finger from the H to the Y, but more like my hand entirely.
Drew Freeman: It's like a reset.
Joe Marcantano: Okay.
Drew Freeman: Yeah. One of the things that this modeling really drives home for people is that those kinds of transitions or resets from one device to another are incredibly costly. So if you can devise a workflow that can be done entirely with the keyboard and you don't have to use the mouse at all, that is a really, really big efficiency accelerator because resetting your hand from the keyboard to the mouse is really slow.
Joe Marcantano: I'm thinking about a very specific program that I was using a lot this week that really desperately needs this analysis, I think.
Drew Freeman: Yeah, I mean, they're all over the place. Okay, so then the next one is the letter D, for manually drawing. This is used when you're drawing a straight line with a mouse. It frankly doesn't get used very often. Like, there aren't that many scenarios where you're drawing a line with a mouse. The next one is M, for mental preparation. And this is the time that's needed to think or plan out an action. It's also used for decision-making time. And then the last element, the sixth element, is R, for system response time. And this element is only used when the person using the software has to wait for the system to do something. So this might be processing time, this might be loading time, that sort of thing. What that does mean is that R is unique to the individual software and has to be measured for that given piece of software.
Joe Marcantano: I would imagine it's unique for the software and the computer that the user is using.
Drew Freeman: Yes, yes, that's fair. But typically you're going to try to normalize all those variables. So you're not going to try to evaluate for someone's supercomputer versus my potato of a computer. You're just going to pick something that's kind of industry or consumer average, and you're just going to use that and pretend that everyone's using that same computer
00:15:00
Drew Freeman: all the time. Some people have a hard time kind of understanding this response time, so some experts do suggest calling it W, or waiting time, which can more clearly describe what is happening. So some people call it W for waiting time. They're interchangeable in terms of what they are actually measuring.
Joe Marcantano: Hey folks, Joe here. Wanted to take a quick second to ask for your help. We want to hear from you, our listeners. We'd like to know what you want us to talk about and what questions you have, in order to make this show the most beneficial for you. We want to talk about your questions. Send those in to insideUXR@gmail.com. You can remain anonymous if you'd like, and you'll get to hear us talk about your questions. Thanks, all. Okay, so let's say we've measured all these things, we've got our values. What are we doing with these numbers?
Drew Freeman: So really all you're doing is scripting out the tasks, or scripting out the steps that are needed to complete a task. This is unlike usability testing, where you don't want to define your task as system dependent, you want to define your task as real-world steps and real-world actions. With a KLM, which is what I shorten keystroke level model to, that's the abbreviation. When you perform a KLM analysis, you need to make sure that every single step is software specific, because you need to be able to know how many button presses, how many keystrokes, yada yada. So basically all you do is script out all of the steps, count how many of each of those individual elements there are, and then aggregate those all together and multiply each of the elements by the time that is associated with each of those elements. So for example, K, which, remember, is the keystrokes, the keyboard button presses and the mouse button presses, for an average typist, which is defined as 40 words per minute, is 0.28 seconds. So for someone who types at 40 words per minute, pressing a button takes just under a third of a second. P, which again is pointing the mouse, so moving your mouse from one part of the screen to the other, has been defined as 1.1 seconds. So on average, that's how long it takes to move your mouse from one part of a UI to another. And then M, which is thinking time, has been defined as 1.35 seconds. And again, this all comes from, you know, the empirical research that those scientists back in the 80s did. And it can change from field to field and from time to time. But you know, as we go through eras, people have greater understanding, but also software gets more complicated. So those are the numbers as I am aware of them right now.
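For listeners who want to follow along at home, here's a minimal sketch of that count-and-multiply step in Python. It's our illustration, not something from the episode: the K, P, and M times are the ones Drew quotes, while the B and H values are the commonly published KLM defaults and are an assumption on our part, and D and R are left out because they have to be measured for your own workflow and software.

```python
# Minimal KLM calculator sketch (our illustration, not from the episode).
# Unit times are in seconds. K, P, and M are the values quoted in the episode;
# B and H use commonly published KLM defaults and are assumptions here.
# D (drawing) and R (system response) are case-specific, so measure them
# for your own workflow and add them to the dictionary if you need them.
UNIT_TIMES = {
    "K": 0.28,  # keystroke or keyboard button press (average 40 wpm typist)
    "B": 0.28,  # mouse button press or release (Drew rolls these into K)
    "P": 1.10,  # pointing the mouse from one part of the screen to another
    "H": 0.40,  # homing the hands between devices (assumed standard value)
    "M": 1.35,  # mental preparation / decision time
}

def klm_time(counts: dict[str, int], unit_times: dict[str, float] = UNIT_TIMES) -> float:
    """Multiply each operator count by its unit time and sum the results."""
    return sum(unit_times[op] * n for op, n in counts.items())
```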
Joe Marcantano: So you take all these actions and times, you multiply the actions by the times, then I presume add those numbers together and it gives you a score.
Drew Freeman: Yeah. So I'll give you an incredibly simplified example. So our example task is to enter a street address into a text field. So you break it down into its component parts. Step number one in our example task is to initiate the entry, or like, we're going to start thinking about putting our address into the field. That's an M, or thinking time, action. Step two, finding the correct text field to put our address into. That's another M action. Step three, we need to move our mouse, or point, to the correct field. That's a P action. Step four, we need to press the mouse button to get our focus in there. That's a B, or remember, B can be rolled up into K, so that's one of those actions. Then you also need to release that mouse button, so that's another one. Then you need to move your hands from the mouse to the keyboard. That's an H, for homing. And then we need to type 123 Main Street. In this case, typing 123 Main Street is 14 button presses, because you have spaces, you have capitalization. So that's 14 K actions. So at that point we've got 2 M's, 1 P, 2 B's that are the button presses, 1 H, and 14 K's. So our total time is 2M plus 1P plus 2B plus 1H plus 14K. And it's a math
00:20:00
Drew Freeman: problem.
Joe Marcantano: So Drew, I'm presuming that, just like a chef that goes onto a talk show, you've got the completed one in the oven that you're going to pull out, and you're going to tell us the answer here.
Drew Freeman: Yeah. So let me just pull that out of the oven, and the total time for this very small task is 8.68. And I don't generally put a unit on that. You could, but it doesn't really mean anything. Like, this is not an actual measured time of 8.68 seconds. This is a theoretical, calculated kind of prediction time. So for me, I just don't put a unit on it. If you want to just say, like, 8.68 efficiency units, that's fine. I just think of it as 8.68.
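As a quick check on that arithmetic, here's the street-address example worked as code. The operator counts are the ones Drew lists; the K, P, and M times are from the episode, and treating B like K at 0.28 seconds with an assumed standard H of 0.40 seconds reproduces his total of 8.68.

```python
# Street-address example: 2 M + 1 P + 2 B + 1 H + 14 K.
# K, P, and M are the episode's values; B = K and H = 0.40 s are assumptions
# matching the commonly published KLM figures.
unit_times = {"K": 0.28, "B": 0.28, "P": 1.10, "H": 0.40, "M": 1.35}
counts = {"M": 2, "P": 1, "B": 2, "H": 1, "K": 14}

total = sum(unit_times[op] * n for op, n in counts.items())
print(round(total, 2))  # prints 8.68
```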
Joe Marcantano: And this sounds like it's just like golf: the lower score is better. So if you're comparing two paths that you've kind of drawn up, the more efficient path would be the one with the lower score.
Drew Freeman: 100%. Exactly.
Joe Marcantano: And like you talked about earlier, just because something's most efficient might not mean it's the easiest for the user. So it sounds like this is a modeling that needs to be done in conjunction with some other form of research, to test and ensure that the path you've created is actually intuitive and useful to the user.
Drew Freeman: That's always literally the next paragraph in any written blog or article I write about this. Yeah, you should not use keystroke level modeling by itself. You should always use it in conjunction with another usability method if what you're trying to do is decide which is the most usable or, quote unquote, best workflow. All that keystroke level modeling can do is measure ideal efficiency.
Joe Marcantano: And, I don't mean to demean this, but this doesn't sound terribly difficult. This sounds like something a junior researcher with a calculator or a smartphone could do, to sit down and map out all of the actions.
Drew Freeman: Yeah, absolutely. In practice, the most difficult parts of doing this analysis are two things. One, figuring out what that system- or software-dependent R, the response time, is. You're generally going to have to go to a developer, an engineer, or specifically, if your group has a performance engineer, that's the kind of person that you want to go to. I can almost guarantee that they will have metrics that they already use for system response time. For the researcher, the actual hardest part of this is getting people to agree on what the scripted workflow is. Like, when you get a group of experts together and say, what is the way that we want to have people accomplish X task in our system, that is where the most disagreement and the most challenge will be.
Joe Marcantano: I totally believe that, because in our example it was just click on the text field and type in the address. But if we were to take even a slightly more complicated example, say, copy and paste a paragraph from one document to the other, I could think of four or five paths that we could take to do that. And they're all, you know, correct paths.
Drew Freeman: Right. So, I mean, let's even just take the address example. That's just one text field in a form that probably has dozens of fields that need to be entered in. And they're not all gonna be text fields. Some of them might be dropdowns, some of them might be radio buttons, you know. So you think about even, like, a one-page form, and it starts to just get really long. And there's lots of different ways that you could go through that. Are we going to say that people are doing it with keyboard only and they're tabbing through everything, or are we going to say that they're using the mouse to go from one to the other? But that agreement and that alignment on, this is the ideal workflow, this is the script that we want to follow, is really important. Because remember, we are not testing the real-world efficiency. We are testing the ideal, kind of laboratory, efficiency of a workflow. And yeah, that sounds like a lot of ifs and buts and caveats, but this is still an important measurement. Like, this is still a very powerful tool. You just have to be very aware that it has its place. And its place is not a big place. It's a pretty narrow set of use cases, but it's a powerful set of use cases.
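To make that keyboard-versus-mouse comparison concrete, here's a hedged sketch comparing two hypothetical scripted workflows for the same short form. The operator counts are invented purely for illustration; only the unit times echo the episode, with the same assumed B and H values as in the earlier sketch.

```python
# Hypothetical comparison of two scripted workflows for the same short form.
# Operator counts are invented for illustration; unit times follow the episode,
# with B = K and H = 0.40 s assumed.
unit_times = {"K": 0.28, "B": 0.28, "P": 1.10, "H": 0.40, "M": 1.35}

workflows = {
    # Mouse-driven: point and click into each of three fields, then type.
    "mouse": {"M": 4, "P": 3, "B": 6, "H": 4, "K": 30},
    # Keyboard-only: tab between fields instead of pointing and clicking.
    "keyboard_only": {"M": 4, "H": 1, "K": 32},  # extra K's are the Tab presses
}

for name, counts in workflows.items():
    total = sum(unit_times[op] * n for op, n in counts.items())
    print(f"{name}: {total:.2f}")
# The lower score is the more efficient scripted workflow.
```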
Joe Marcantano: When done well, I could totally see the value. Maybe it helps give you a starting point on what you think your user journeys are going to be, or just a baseline test of, hey, does this seem like
00:25:00
Joe Marcantano: something that's going to frustrate our users? What is this score going to end up being? This absolutely seems like something that, even though we're not involving the specific user at this point, definitely gives us a starting point.
Drew Freeman: Well, one of the use cases that I actually loved in a previous job is seeing teams, specifically software development teams, say: these are our key highest-volume, highest-value workflows that users do in our software. Every time we have a new release that comes out, every time we have a new patch, whatever, we need to make sure that we are being efficiency neutral at worst. We either need to be the same or better. So if we are introducing a new thing that has to go into this high-volume, high-throughput workflow, we have to find efficiency somewhere else. I kind of love that, because this gives us a way to measure that and to know if we are getting closer to or farther away from our ideal.
Joe Marcantano: So Drew, tell me about, you know, where's the right place to use this? Where's the right place to implement it?
Drew Freeman: Yeah, so I've talked a little bit about the buckets that I kind of see it in. But some examples where I have seen teams that I've worked with, or clients that I've worked with, use this really well in the real world, I've got three of them. One, identifying those parts of tasks or parts of workflows that are particularly painful when it comes to efficiency. So basically finding those inefficient subtasks within a larger workflow. Generally I've seen this done so that you can then help point designers, engineers, people who are doing configuration, to those parts of the workflow where they can have the biggest impact. The second example is comparing various options, whether that be configuration options, whether that be, you know, different patches or releases of a piece of software, or possible solutions. Maybe there's two different solutions to a problem and we're pretty confident that both of them will solve that problem about equally well. You can use the keystroke level model to kind of be your tiebreaker. Maybe one of them is more efficient than the other. There you go, you should use that solution. You can also use this, like I said, as a pre and post measurement, and you can use it to model those nonexistent, theoretical workflow changes before you actually spend time coding and developing them. And then the third bucket is not really specific to actually making the software faster. This is more of a change management and communication use for the tool, which I'm actually really excited about and really interested in. I think it's a creative use, but I've seen people use this as part of their change management process to really help drive that enthusiasm for a change or for an upgrade, and to help ease those concerns from users who might otherwise be change resistant. Because if you can tell a user, hey, this task that you're really frustrated with and you have to do all day long, we've made it 20% more efficient, you've probably just converted that user, and now they're excited about an upgrade rather than dreading an upgrade.
Joe Marcantano: This is a really cool model and a really cool tool, and I'm excited to add it as a new tool to my chest of UX research tools.
Drew Freeman: And I could talk about this for hours and hours, but I'm gonna have to cut myself off somewhere.
Joe Marcantano: Yep, we don't want to lose the whole audience. So, Drew, this was really cool, both for me, for a chance to learn something and not have to do any prep work for the episode, but also it was really neat to be able to present to folks a research method that might be a little less utilized than it probably sounds like it should be.
Drew Freeman: Yeah. And like I said, this is one of those areas where I am an expert and not very many people are, so it's always fun to talk about.
Joe Marcantano: Awesome. Well, I want to thank everybody for joining us today. Give us a like and a subscribe wherever you listen to your podcasts. If you have a question that you want to hear us talk about and blab on about and make an episode about, send it over to insideUXR@gmail.com. You can find us on the big social platforms at Inside UXR. And if you'd like to support the show, you can do that through the link in the show notes. With that, I'm Joe Marcantano.
Drew Freeman: I'm Drew Freeman.
Joe Marcantano: And we'll see you next time.
00:30:00
Joe Marcantano: A chef on a cooking show. And you're gonna pull out the completed one from the oven. What would our score be on this?
Drew Freeman: You would assume that, but no. No, I did not.
Joe Marcantano: Well, now you ruined my joke.
00:30:18