I have used a number of UIs over the years on a number of operating systems. I have stabilized on a rather select group of utilities with which I can be extraordinarily efficient. However, OS X's general utility value has lured me into the wild world of commercial GUIs on a daily basis. I came for the seamless support of all of my laptop's hardware, but there are a number of daily frustrations from a few poor UI choices.
There are tomes upon tomes written about good UI practice. They mostly focus on making things obvious for people who are using an application for the first time. These guidelines are generally pretty good as far as they go. Apple, for example, famously produced the Apple Human Interface Guidelines. As a result, most applications look and work the same. Once you get inside the mind of iTunes, using iPhoto or iMovie will yield few surprises. Apple just as famously completely ignored these guidelines in QuickTime Player, but that's another story.
The trouble with these guidelines is that they are targeted at people who have never used a computer before. iTunes is a very compelling application if you've never seen a well-managed collection of MP3s before. But if you have already established a management technique, you may find iTunes to be worse than useless as it tries to apply its own least-common-denominator approach to music management. The simple fact of the matter is that most people are not using a computer for the first time.
So I am publishing here a few generalized guidelines for UI design that should be followed in addition to the existing rules. These rules mostly have to do with stuff a newbie will never notice, but for those of us who got an iBook as a compromise until we can afford a neural jack, these suggestions are necessary.
The number one problem with OS X is its control of keyboard and visual focus, though OS X is not unique in this respect. There are two basic classes of focus problems: inter-application (window management), and intra-application (navigating the various forms and other elements within a window).
Consider an example situation. I am reading email in a text-based email client and I encounter a URL I would like to visit. My email environment uses AppleScript to tell Firefox to open the URL. Firefox then steals keyboard focus, but Firefox takes a long time to do anything so I Command-Tab back to my email while it loads. Then Firefox steals visual focus, painting over the email program (but the email retains keyboard focus). So I Command-Tab around until I get my email shuffled back on top. Then the page finally loads and Firefox steals focus again, interrupting me halfway through writing an email. The second half of a sentence and an Enter press make it through to the Firefox location bar, which causes Firefox to start churning again trying to resolve the URL "g home now." into something useful.
There is unfortunately a counter-example where the Principle of Least Astonishment applies. Suppose a newbie has just clicked on a URL in his email and Firefox opens in the background and the newbie has no idea how to find it. OS X has a compromise here where the Dock will provide a non-interrupting visual cue (a bouncing icon) that a background window would appreciate your attention. This may work, but probably what is really needed is the ability to have it both ways (something Apple hates to give you). Bizarrely, AppleScript is already built with this in mind. There are separate "open" and "activate" commands. So Outlook for OS X can by default send open and activate together to the web browser, and my custom AppleScript can send just the open and not the activate. But I don't have this option because crappy Firefox is least-common-denominator.
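To make the "have it both ways" idea concrete, here is a minimal sketch in Python. It does not use AppleScript; it swaps in the macOS `open` command-line tool, whose `-g` flag asks that the application not be brought to the foreground. The function names and the `activate` parameter are my own inventions for illustration, and whether a given browser actually behaves itself after being opened this way is up to the browser.

```python
import subprocess
import sys


def build_open_command(url, app="Firefox", activate=False):
    """Build an argv list for the macOS `open` tool.

    activate=False adds `-g`, which asks `open` not to bring the
    application to the foreground. (Hypothetical helper, not part of
    any mail client or browser.)
    """
    argv = ["open"]
    if not activate:
        argv.append("-g")  # open without activating
    argv += ["-a", app, url]
    return argv


def open_url(url, app="Firefox", activate=False):
    """Actually run the command; only meaningful on macOS."""
    if sys.platform != "darwin":
        raise RuntimeError("this sketch relies on the macOS `open` tool")
    subprocess.run(build_open_command(url, app, activate), check=True)
```

A mail client aimed at newbies would call `open_url(url, activate=True)`, and my own setup would call it with `activate=False`. The caller gets to decide, which is the whole point.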
I hardly ever use the mouse, so I really notice when applications have poor intra-application keyboard focus behaviour. I again come to Firefox. Firefox has an enormous number of useful keyboard shortcuts, such as Command-W to close the current tab. Unfortunately Command-W closes the tab that has current keyboard focus, not the tab that has current visual focus (i.e., the one you can see). Most of the time no tab has keyboard focus (it is instead lost in some Flash or Java or perhaps *nothing* has keyboard focus), so you must click on the tab then press Command-W. What's the point of a keyboard shortcut you have to use the mouse to access?
This problem shows up with every single Mozilla keyboard command. For example, consider Tab (which moves between various GUI elements such as links in Firefox). If the focus has become lost, Tab becomes useless.
How do you give a tab focus in Firefox? Well, you can click on the little tab indicator in the tab bar, which will (almost) always work. Most of the time, however, people just click in the middle of the window/tab that they want to have focus. This works very well most of the time, but in Firefox the odds that clicking on a random piece of the screen will activate a link are pretty high. Oddly, there are few visual cues about which parts of the screen are active in this way. Some links are underlined or colored, but that varies depending on the page. Some pages even have large areas of background imagery that are active. Some pages have no inactive area whatsoever. You should not have to hunt for inactive area just to establish focus.
Consider the ubiquitous file selection dialog. Every commercial GUI has several variations on it. But they all share this idea of a widget with a very large list of files that you can only see about ten of at once and must then scroll through. Various keyboard shortcuts can make this task less arduous, but it's inherently crapulent. Let me explain why.
Imagine you would like to be listening to Soft Cell's Non-Stop Erotic Cabaret. If you are using CDs, you will probably have a shelf of some sort with a zillion plastic cases piled up on it. If you are at all anal about your music, you probably know exactly where Soft Cell is. You don't need to scan through Beatles, Bowie, Byrne, Grateful Dead, and so on to get to the Soft Cell on the third shelf. What scanning you do perform is probably limited to Radiohead or Sneaker Pimps. The moral of the story here is that the shelf is truly random access.
Imagine you are using Unix and you happen to know that you have Soft Cell on a "/media" NFS share. I type very quickly, so I type "mpg321 /media/mp3/soft<tab><tab>" into bash before you can even move your fingers from the mouse back to the keyboard. bash interprets Tab to mean "show me the completions of this filename", which produces a list of Soft Cell albums on my screen. I then type in the first few letters of the album name, hit Tab again, and I'm there.
Imagine you are using a GUI mp3 player. You switch focus to the MP3 player, then you visit a file selection dialog. You then wait for whatever directory you last looked at to finish loading in the file selection dialog. Then you select the /media volume using some clumsy technique and wait for your NFS to wake up and show you the list of directories in /media. Then you select mp3 from that list and wait while it sloooowly loads the list of every MP3 album over NFS just so you can skip two thirds down the list for Soft Cell. Because you are impatient you are scrolling down the list while it is still loading, causing all sorts of confusing poor scrollbar behaviour. Finally you're done.
The actual level of frustration experienced in the file selection dialog can be ameliorated with any number of tricks. For example, I'm certain most MP3 players employ some form of playlist caching to speed up the process. Most Winamp users, for instance, keep a single large list of all of their songs or albums open at all times so they don't have to navigate any directory structure or wait for the list to reload. However, none of these approaches is perfect. How many times have you had to wait for Winamp to finish drawing the part of the list you're looking at because it was swapped out or something?
The key difference is in the number of interactions between myself and the computer. In Unix there are three steps:
The ideal scroll list navigation step operates as a fast feedback loop. Every time you move the scrollbar a single pixel, the list of files changes instantly, and your mind sees "Beatles" and knows it is nowhere near yet, then it sees "Kraftwerk" and knows it's getting there, then by the time you see "Radiohead" you better be slowing down. If your computer is very fast and you are very lucky, this scrolling may seem like a single smooth operation. But in fact it is a zillion interactions -- each minute movement of the scrollbar produces new feedback for you and you cannot continue moving the scrollbar efficiently until the feedback appears.
File selection dialogs pose the worst case of this flaw because you may know ahead of time a very long directory path, but in order to navigate it, you must generally have a two-step interaction with the computer for each step of the path. If any of the intermediate directories are big or slow (or both), you may be waiting for seconds or minutes for the computer to look up information you aren't even interested in. At best, it breaks up your typing.
The solution is to accept complete paths typed into the file selection dialog without any surprises. Bash-style tab completion is also necessary.
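Here is a rough sketch of what that completion could look like, in Python. This is not how bash or any real file dialog implements it; `complete_path` is a hypothetical helper, and it matches case-sensitively as bash does by default. The thing to notice is that nothing is listed until you ask for a completion, and then only the one directory you have typed your way into.

```python
import os


def complete_path(partial):
    """Return (extended, candidates) for a partially typed path.

    `extended` is `partial` grown to the longest prefix shared by all
    candidates; `candidates` are the matching names in that directory.
    """
    directory, prefix = os.path.split(partial)
    try:
        names = sorted(os.listdir(directory or "."))
    except OSError:
        # Unreadable or nonexistent directory: leave the input alone.
        return partial, []
    matches = [name for name in names if name.startswith(prefix)]
    if not matches:
        return partial, []
    extended = os.path.join(directory, os.path.commonprefix(matches))
    # A unique directory match gets a trailing separator so the user
    # can keep typing the next path component.
    if len(matches) == 1 and os.path.isdir(extended):
        extended += os.sep
    return extended, matches
```

A dialog built on this would take whatever you type, call `complete_path` only when you press Tab, and otherwise open the path as given without loading any intermediate directory.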
Really this is just a generalization of why scroll lists are evil.
Imagine you want to run the batch file c:\dos\run within a DOS window on a Windows XP machine.
You know you must type something like:
Since those are all simple keyboard commands, type them as fast as you can. Go ahead, try. Depending entirely on how fast your computer is and how fast you type, you will either see what you want or perhaps the little Run window with just the words "s\run" in it, because the rest of your keypresses got gobbled up and lost while Windows was struggling to pop up the Start menu.
This problem is pervasive in GUIs. Every action has an accompanying visual feedback. The whole system falls apart if you respond to that visual feedback before the computer is ready. Computers are really very simple devices and 99% of the time you actually know what is about to happen and do not need to see the visual feedback.
The problem gets a lot worse when mice get involved. Suppose you want to drag and drop something, but your stupid computer is giving you the hourglass cursor. You know what you want, and all of the visual elements are on the screen, but if you start telling the computer what to do before it's ready to listen, god knows what will happen. Often enough, you will be able to insert your drag and drop operation into the queue of things to do when it finishes whatever it's doing. But sometimes you will enter your action against the current screen state, and the computer will then process it in a different context. For example, suppose you are looking at the desktop while Firefox is loading, and you decide to drag some stuff around on the desktop. If Firefox finishes loading at the wrong time, the drag and drop commands that you input while the desktop was visible instead become gibberish commands to Firefox.
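One way to think about a fix, sketched in Python: stamp each input event with whatever had focus at the moment the user produced it, and deliver it there no matter what has popped up in the meantime. This is a toy model under my own assumptions, not how any real window system queues events; `InputQueue` and its methods are made-up names.

```python
from collections import deque


class InputQueue:
    """Queue input events tagged with the target they were aimed at."""

    def __init__(self, focused):
        self.focused = focused    # what the user currently sees in front
        self._pending = deque()   # (target, event) pairs not yet processed

    def press(self, event):
        # Record the target now, while it matches what the user is looking at.
        self._pending.append((self.focused, event))

    def change_focus(self, target):
        # A slow application finishing its work may trigger this at any moment.
        self.focused = target

    def deliver(self):
        """Drain the queue, grouping events by the target recorded at press time."""
        received = {}
        while self._pending:
            target, event = self._pending.popleft()
            received.setdefault(target, []).append(event)
        return received
```

If you press "drag" and "drop" while the desktop is in front, and Firefox grabs focus before the computer gets around to processing them, `deliver()` still hands both events to the desktop. Only events produced after the focus change go to Firefox.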
The core element here is that the user is always waiting for the computer before he can proceed to the next step, no matter what the next step is or how well the user understands it. Visual feedback is useful -- even necessary -- for newbies, but you're not embarking on a voyage of discovery every time you select the mp3 folder within the media folder. Why should your computer?
The average text on my screen is about 20 pixels tall. In, for example, a scroll list, I must click on an exact line of text -- not above or below even a pixel. I am not an accurate mouser at that level of detail. Before you tell me to learn, note that I also do not experience repetitive stress injuries.
Most of the time the mouse is used as a poor-man's touch screen. Laptops have shit-poor mice. This is a fact of life. Why use a poor-man's poor-man's touch screen on a laptop when you could just install an actual touch screen?