Notes from Jeff Johnson’s presentation at BayCHI

Notes from Jeff Johnson’s presentation at PARC. He’s from the company UI Wizards, Inc.

He points out that Shneiderman’s UI Design Guidelines is the classic book, but that the guidelines in it are very “open to interpretation” and applying them requires a lot of background (like watching users use applications). He thinks there’s also a lot of cognitive and perceptual psychology underlying the rules.

Nielsen & Molich’s UI Design Guidelines is another classic text:
* error prevention
* recognition rather than recall
* flexibility & efficiency of use
* consistency & standards
Again, these are all subject to interpretation.

Stone et al. (2005) UI Design Guidelines is another set of rules, but the same problem applies.

It used to be that people who came into UI design had a background in cog psych, but not anymore. His goal is to give people a basic background.

He says UI guidelines are like laws — they’re interpreted by judges who have extensive background in understanding the rules, and how to apply them.

He put up a list of facts about human perception, and said (quoting verbatim here) “Bla bla bla.”

==Perception==
We perceive what we expect, and this is biased by what we’ve seen in the past, the context of the thing we’re seeing, and our plans for the future (what we plan to do).

He showed R.C. James’s Dalmatian picture and explained how the way he introduces it affects the way people perceive it.

Another example is 4 screens, each with a “back” button on the left and a “next” button on the right, except that on the 4th screen the buttons have been swapped. He says people will get used to clicking the button on the right, and then when the buttons are switched, they’ll click it and wonder why they went backwards. They’re not reading anymore; they’re being biased by previous experience.

“Our perception & attention focuses almost totally on our goals. We tend not to notice things unrelated to goal”

He also did an example of asking people to look at a web page for a task, and the audience did, but focused on the task he gave, and no one noticed the text in the corner that said “If you notice this, you get $100”

He says Gestalt principles are no longer considered useful for understanding how the human vision system works, but they are still relevant for understanding how to design things. He goes into examples: classic examples of the Closure principle, and Symmetry, where he uses the example of two overlapping squares and points out that we all see two squares overlapping, not two ‘L’ Tetris-looking pieces next to each other, or some weird 8-shaped outline with a square in the middle.

He shows the cover of Thagard’s “Coherence in Thought & Action”, which looks like a cube, but there’s no cube actually printed onto the page — our brain puts it there.

The point he’s making is that humans seek structure, and gives an example of a sentence versus tabular data for communicating. Also shows the exact same information in paragraph form, but broken up with headings, and the heading-ized version is significantly easier to look at.

He makes the point that, despite everyone in the room presumably being a good reader, reading is not a natural human activity (as opposed to speaking, which is). He bases this on the fact that the human brain has physical structures for learning spoken language; we’re pre-wired for it. But there’s no equivalent for reading text. It’s a learned activity, the same way that you learn to write, ride a bike, or kung fu kick. It’s very much a practiced/learned activity. He points out that people learn to read better when they’re *taught*, so if your parents read books to you, you will be a better reader.

He points out that when you see a common word, you see the pattern, and your brain goes to match it to meaning. If you see an uncommon word, you actually parse the word apart into morphemes (?) and build the word in conscious thought. He uses examples of legal words, but also the example of extreme fonts/all caps, poor contrast, or centered text: it’s not the way we’re used to seeing text presented, so it takes more effort to process. He says it’s literally hard work for people to read if they don’t read much.

He says in modern society, vision doesn’t require use of our rods — in fact, modern society is so well-lit that it’s hugely overstimulating the rod cells. It’s basically all about the cones. Rods are exclusively for night vision. We haven’t really used our rods for about 300 years now.

He points out that we’re optimized to see contrasts (edges and changes), but not absolute levels. He uses the example of the checkerboard with a shadow; the image is originally from Edward H. Adelson. As a further example, he claims you could darken the brightly lit room that we’re in by 10% and no one would notice, because all the colors and shades are still the same relative to one another.

We have trouble discriminating:
* pale colors
* small color patches
* separated patches
Uses an example of a web page with visited and unvisited links in similar shades of blue. No one figured out which color was which. He adds that from certain angles on an LCD monitor, there is visually no difference. You can’t control the monitor that your user uses — so don’t try.

He points out that color blindness doesn’t mean you can’t see colors; it means there are certain combinations of colors that the person can’t tell apart.

The important implication for UI design — don’t use color exclusively to indicate things, use it redundantly with other cues — he gives the example of colored text, and then improves it by using colored text with bolded box & text.

His next point is that our peripheral vision is poor. He shows an example of a login screen where the error message is in the top left, while the bulk of the information is in the very middle of the screen.

He then puts up a graph that he stole from Don (Norman?)’s 1973 book; it shows the visual field in degrees. In the dead center of your vision, you have 150k cones per square millimeter. Daaamn. This is called your fovea. If you hold your arm out with your thumb up, your thumbnail is basically the size of your fovea. Immediately outside this range, your eye has cones at 10k/sq mm. That’s a big dropoff.

Also, in the fovea, the cells are directly linked to the brain. In the periphery, the cones are linked in groups of 2-3. So our brain effectively has to compress the information from the periphery, which means loss, so we don’t see as well. (From the graph, it looks like there is a full blind spot at about 13-19 degrees where you cannot see anything, but your brain photoshops in the gap.) Additionally, half of the visual cortex is hooked up to the fovea: half of the visual cortex deals with 1% of your vision.

You can see 300 dpi on your thumbnail at arm’s length. At the edge of your visual field — you don’t measure pixels per *inch*, you measure in pixels per *yard*. At the edge, you only have 5 pixels per yard.

He says our visual field is also optimized for up-down viewing, not left-right. Our visual field is actually an oval, not a circle.

The whole point of this is shown in a regular login form, where there’s a red title, with a red error message above the field name. Honestly, it looks like a pretty reasonable login screen — not playing any forced games here. He says when people log in, they are looking at the login button, and then shows the same form blurred according to our visual acuity, and sure enough, the error message is basically not visible, and the blur where it used to be also blends a bit with the title. It’s really hard to notice.

He says the heavy artillery for vision is a popup, an audio beep, or a flash/wiggle/motion (but *not* continuous). If you *have* to use these, use them sparingly. A beep does cause people’s eyes to start moving, but is inappropriate for both loud and quiet environments. Also, peripheral vision is great for picking up motion, so that’s a good way to grab attention. A small animated error icon might be a good idea.

He also points out that red == error is not good on Stanford’s homepage — the whole page is red so it gets lost. Or, in China, red means good, so they won’t realize that it’s an *error*. Animated text/blinking is useful, but only blinking once or twice, otherwise it will get ignored as an ad.

He also points out in OSX that when you select a menu item, it actually blinks momentarily, which is a very useful visual cue. He re-emphasizes that blinking is meant to be used very sparingly. (Amusingly, he gets called out on the OSX blink by a member of the audience, who Jeff points out to be a world-premier expert on the UIs of operating systems. Lol.)

==Memory==
Short-term memory is not a separate store from long-term memory. He points out STM has a limited capacity. The old rule was 7 +/- 2 (thanks, George Miller), but this is inaccurate. He says it’s really 3-5 items (or 4 +/- 1). New items can push out the old, and they’re easy to forget. In this case, an item == a *feature*: goals, words, numbers, objects. So, a face isn’t 2 eyes, one nose, one mouth; it counts as a single whole.

As an example, he puts 3 8 4 7 5 3 9 up on the board, covers them, and then asks us to repeat our phone numbers backwards. No one can remember the numbers after this. He repeats it, but with 3 1 4 1 5 9, and 1 3 5 7 9 11 13. The first example is stored as 7 distinct features/items. The second and third are each stored as a single feature/item (the digits of pi, and the odd numbers).

Another example is a search form from Slate.com that doesn’t redisplay the search term. A programmer would think people would remember what they typed on the previous page, *but they don’t*. As soon as people start trying to parse the features of the new page, they flush STM from the previous page, and they literally will forget what they’ve searched for by the time they have had a moment to examine the search results.
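The fix for the search form is cheap. Here is a minimal sketch, assuming a results page that gets the query string back from the server. The function names, the `/search` URL and the `q` field are made up for illustration and are not from Slate.com or the talk.

```javascript
// Escape user text before placing it into HTML.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical results-page header: it prefills the search box with the
// query and repeats it next to the result count, so nothing depends on STM.
function renderResultsHeader(query, resultCount) {
  const safe = escapeHtml(query);
  return (
    `<form action="/search"><input name="q" value="${safe}"></form>` +
    `<p>${resultCount} results for "${safe}"</p>`
  );
}
```

The point is only that the term the user typed appears on the new page, both in the box and in the summary line.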

Our brains are designed for pattern recognition — but not recall. He specifically cites the example of pilot checklists (lol @ Atul Gawande reference).

He makes the point that users actually ignore the interfaces that the designers in the room try so hard to design. He also points out Krug’s “Don’t Make Me Think” is highly applicable.

People are focused on completing goals, and prefer familiar paths over exploration. He quotes a user (verbatim) who said “I’m in a hurry, so I’ll do it the long way”.

He also points out that people are super literal when following goals or following instructions. If given an instruction, people will look at the UI for the exact words given in the instructions — people don’t naturally consider synonyms or alternate text — the first thing they do is go for exact pattern matching. If they don’t find it, conscious thought kicks in and *then* they’ll start with thinking and synonyms.

He recommends checking YouTube for the “door study”. Derren Brown also does a person swap. Check that out too; he cranks the concept to 11. This is an example of how quickly and fully STM gets flushed.

He points out that people often forget the final step of a sequence of actions if the last step isn’t crucial to the task (e.g., if you xerox a bunch of pages, when the last page shoots out, you’ll pick them up and walk away because the task of copying is done, but you’ll leave the original copy of the last page still on the copier). Systems should remind people of loose ends.

As an example of how our brain recalls/searches, he puts up a picture of Bill Gates (everyone recognizes, duh). Then he puts up a picture of his friend’s father. He points out that it takes the same amount of time for us to recognize that we’ve never seen this face before. It’s basically the brain firing a sequence of neurons, and then immediately replying “nope, never fired those neurons before”. (Tim note: it’s almost like our brain is similar to a Bloom Filter).
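For anyone who hasn’t met a Bloom filter, here is a minimal sketch of the data structure the comparison refers to. This is my illustration, not something from the talk, and the hash function and sizes are arbitrary choices. The property that matters for the analogy is that a “never seen this” answer comes back quickly and is always right, while a “seen it” answer is only probably right.

```javascript
// Minimal Bloom filter: answers "definitely never added" or "possibly added".
class BloomFilter {
  constructor(bitCount = 1024, hashCount = 3) {
    this.bitCount = bitCount;
    this.hashCount = hashCount;
    this.bits = new Uint8Array(bitCount); // one byte per bit, for clarity
  }

  // Seeded FNV-1a-style string hash; illustrative, not tuned.
  hash(text, seed) {
    let h = (2166136261 ^ seed) >>> 0;
    for (let i = 0; i < text.length; i++) {
      h ^= text.charCodeAt(i);
      h = Math.imul(h, 16777619) >>> 0;
    }
    return h % this.bitCount;
  }

  add(text) {
    for (let seed = 0; seed < this.hashCount; seed++) {
      this.bits[this.hash(text, seed)] = 1;
    }
  }

  // false => never added; true => probably added (false positives possible).
  mightContain(text) {
    for (let seed = 0; seed < this.hashCount; seed++) {
      if (this.bits[this.hash(text, seed)] === 0) return false;
    }
    return true;
  }
}

const seenFaces = new BloomFilter();
seenFaces.add("Bill Gates");
```

Calling `seenFaces.mightContain("Bill Gates")` returns `true`. For a name that was never added, the lookup costs the same few hash checks and will almost always return `false`.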

“See & choose” is easier to learn than “remember and type”. So this is why a UI is easier than command line. (Don Norman (?) points out this is why gestural interfaces are not very good — they require too much learning).

He points out that automated actions don’t involve STM at all. He compares it to “compiled mode, parallel processing”. Our brains have done this forever — generalizing and performing well-learned routines.

He gives examples of reciting the alphabet from A-M, then M-A. Write your name with dominant hand versus non-dominant. Count backwards from 10-1, versus counting backwards from 21 in 3s.

Problem solving is evolutionarily new. Only a few mammals & birds can do it. Humans are best. The cerebral cortex is where conscious reasoning happens. It’s more like a dynamic/scripted language: it ends up not multitasking, uses STM, and runs slow. For example, diagnosing computer problems is hard and requires systematic testing of possibilities (imagine helping someone over the phone who says “I can’t hear the audio on this YouTube video”).

We learn faster when vocabulary is familiar and task focused. So, a message like “The request is not valid in the current state” (an error message from Windows Media Player) doesn’t make sense to people. Jeff, playing the part of an average user, says “What, I can’t do this in California?”

==Human Real-Time Characteristics==
The shortest audible silent gap in sound is 0.001 seconds. The auditory system is physically based (as opposed to vision, where it’s chemical signals), so it’s faster. For vision, it takes 0.005 seconds to create a recognition, but you won’t *see* it. It’s Tyler Durden style: people see it and respond, but don’t consciously notice.

He lists more thresholds:
* Threshold for auditory fusion of clicks: 0.02 seconds
* Visual fusion of images: 0.05 seconds
* Speed of involuntary flinch reflex: 0.08 seconds
* Lag of full awareness of a visual event: 0.1 seconds
* Limit on perception of cause/effect: 0.14 seconds
* Time for a skilled reader to comprehend a word: 0.15 seconds (no morphemic breakdown/analysis)
* Time to subitize 1-5 items: 0.2 seconds
* Time to identify/name a visual object: 0.25 seconds
* Time to count items in the visual field: 0.5 seconds/item
* Minimum visual-motor reaction time: 0.7 seconds (in contrast to the flinch reflex, this is a learned response to a visual event; for example, braking to avoid a kid running into the street is about 10x slower than flinching when someone throws something at your face)
* Average conversational gap: 1 second (but this varies between cultures; if you stop talking for a full second, people wonder what’s happening)
* Length of unbroken attention to a task: 6-30 seconds

In a neat trick, if you were watching a rabbit run across a field, your brain actually displays the rabbit ahead and further along the trajectory than it actually is when the photons hit your cones.

Controls must react within 0.14 seconds to clicks, or perception of cause/effect breaks.
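One way to put these numbers to work is to check measured response times against them. The sketch below is my own illustration, assuming you already have a response time in seconds from your own instrumentation. The threshold values are copied from these notes, while the function and constant names are invented.

```javascript
// Thresholds in seconds, copied from the figures in these notes.
const THRESHOLDS = [
  { seconds: 0.1, label: "lag of full awareness of a visual event" },
  { seconds: 0.14, label: "limit on perception of cause/effect" },
  { seconds: 1, label: "average conversational gap" },
];

// Returns the label of every threshold that the response time exceeds.
function exceededThresholds(responseSeconds) {
  return THRESHOLDS
    .filter((t) => responseSeconds > t.seconds)
    .map((t) => t.label);
}

// True when a control's reaction still reads as caused by the click.
function feelsCausal(responseSeconds) {
  return responseSeconds <= 0.14;
}
```

A button that takes 0.2 seconds to show any reaction fails `feelsCausal`, and `exceededThresholds(0.2)` reports the first two entries of the table.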

The HTML structure of webmail interfaces: Gmail, Hotmail, and Yahoo Mail

As part of the Zentact project I’ve been working on, we were asked to integrate with various webmail clients. This makes it easy to manage your contacts while sending email.

Doing this was a bit of a pain. Since all the code is minified, and they all use JavaScript events differently, it took a good bit of work to figure out the details. I wanted to share this info in a blog post for programmers who come along in the future. If you don’t know/care about HTML, JavaScript events, the DOM, YUI, or AJAX, this post is not for you. Please enjoy one of my other fine posts, perhaps this post on military code names.

Before I begin: there was a ton of info learned (and already forgotten) about this process. This is not a complete guide, but is mostly a brain dump from implementing UI integration on three different webmail interfaces.

  • Gmail uses 6-character strings, [A-Za-z0-9], for all its classes. These classes remain the same from load to load, but I believe that they may change over time with minification. IDs are not as constant, and many are dynamically assigned. The IDs start with a colon.
  • When you’re working with events, you may get inconsistent results. Some events are not fully propagated: they get captured and you can’t find out about them. If onclick doesn’t work, try listening for onmousedown or onmouseup. One of them may get you notified of the event you want. The same advice goes for onkeydown, onkeyup, and onkeypress. That being said, once you get into these, realize that these three events occur in a particular order (keydown, then keypress, then keyup). Make sure you’ll be getting notified at the right time.
  • All of the webmail UIs use iframes. This lets them keep their code for loading the UI separate from the code to display the UI. I know there’s some cross-site scripting implications in this, but I’m not sure of all the details. Gmail’s loading screen (the loading bar they show you) is a different iframe than the one that shows you the inbox. All of these iframes are at the root of the document, and there’s nothing else in there.
  • You could use Firebug breakpoints to pause the code and examine what’s going on, but nearly all the JS is minified. Since breakpoints can only be set by line, and there are multiple functions defined per line, it ends up not being helpful.
  • For its UI, Yahoo seems to use YUI, plus some other stuff on top of that. There are some weird results because of this. The body of the email editor is a group of DIVs: some are invisible, some are for border decoration, and others are for the background of the editor.
  • When we inserted elements into Yahoo Mail using regular DOM operations, they would appear behind other page elements, until another part of the UI was interacted with, when the screen would redraw and then they would bump into place. YUI seems to have its own redraw/repaint functionality, and it won’t play nice with DOM manipulations.
  • Hotmail is strangely one of the less-exotic interfaces. They use consistent IDs. I don’t think they’re hand-coded, however, because they follow a naming scheme that seems machine-generated. But still, they are there, and you should take advantage of them.
  • When you’re using events, and you get notified of an event, use the event.originalTarget property to find out where in the DOM you are (originalTarget is Firefox-specific; the standard property is event.target). That’s useful information when you’re dealing with a DOM tree of nonsense class names and IDs.
  • When you’re trying to figure out where in a DOM tree you are, don’t hesitate to go up several levels and check a great grandparent node, or a “cousin” node. Once you get a single point of reference, you can generally work out where everything else is, relative to it.
  • Some UIs open each message in its own iframe, which means that IDs are consistent since they’re in their own namespace.
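
Here is a rough sketch of a few of the tips above as helper functions. This is not the actual Zentact code. The function names are made up, and listening in the capture phase is my own assumption about one way to see events that the page’s handlers would otherwise swallow; it won’t help in every case.

```javascript
// Gmail class names are described above as 6 characters from [A-Za-z0-9].
function looksLikeMinifiedClass(name) {
  return /^[A-Za-z0-9]{6}$/.test(name);
}

// Register one handler for several event types in the capture phase, so it
// runs before handlers deeper in the tree get a chance to stop propagation.
function listenForAny(root, eventTypes, handler) {
  for (const type of eventTypes) {
    root.addEventListener(type, handler, true); // true = capture phase
  }
}

// Prefer the standard event.target; originalTarget is Firefox-only.
function targetOf(event) {
  return event.target || event.originalTarget || null;
}

// Walk up from a node until the predicate matches, giving up after maxLevels.
function findAncestor(node, predicate, maxLevels = 10) {
  let current = node;
  for (let level = 0; current && level <= maxLevels; level++) {
    if (predicate(current)) return current;
    current = current.parentNode;
  }
  return null;
}

// Example use in a page (commented out so the helpers load anywhere):
// listenForAny(document, ["click", "mousedown", "mouseup"], (event) => {
//   const start = targetOf(event);
//   const anchor = findAncestor(start, (n) => n.id && n.id.startsWith(":"));
//   console.log(start, anchor);
// });
```

One practical note on the colon-prefixed IDs: `document.getElementById(":xy")` takes the ID as-is, while a selector like `#:xy` needs the colon escaped, so `getElementById` is the less fiddly choice.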

Also, thanks to Nate Koechley for helping me get through some of the Yahoo details.

If you’ve got other questions, shoot me an email. I remember more stuff, but might need a good question to shake it loose.