The desktop metaphor is, like, so five minutes ago
Update: this was written before I ever touched an iPhone or iPad. These devices are major improvements over the desktop metaphor GUIs I complain about below.
When you grow up playing video games, like I did, the primitiveness of office software user interface design comes as a shock. The desktop metaphor was a brilliant stroke back in the 1970s when they thought it up at Xerox PARC, but I feel like it has outlived its usefulness.
User interfaces are the first and most immediate form of computer instruction, and for many people the only instruction they ever receive. Not every interface designer teaches their product equally well. The problems mostly come from designers presuming knowledge on the user's part that might not really be there. There are plenty of computer science concepts that are common knowledge to programmers and engineers, but esoteric or totally opaque to the population at large. For example, the general public uses the terms memory and storage interchangeably, even though they refer to different computer components that function in very different ways. Most normal people don't have mental models of a computer program's inner workings, and rely entirely on the interface to provide the model.

The desktop metaphor treats the screen of a computer as if it's the top of your desk. Great, except that it's vertical instead of horizontal, and the laws of physics mostly don't apply. A window in an operating system isn't very much like a window in a wall and is even less like a sheet of paper. Typing some text on the screen only very superficially resembles typing it on paper. Unsaved text is in volatile memory that needs to be continuously powered to work. If you turn off the computer, intentionally or not, memory gets instantly blanked, and your text is gone forever. The only way to prevent this tragedy is to make sure to explicitly tell the computer to copy the text from memory onto a storage device like the hard disk.
Computer experts sneer at people who don't understand the concept of saving, but the sneering is unfair: there's no reason anyone would intuit the difference between volatile memory and non-volatile storage unless it's been carefully explained to them. We're used to making marks on a surface and having them persist unless we take steps to erase them. There is a trend toward auto-saving in software, but it's been absurdly slow in coming.
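To make the distinction concrete, here's a minimal sketch, in Python with invented names, of what a text editor is actually doing: your text lives in a volatile in-memory buffer, nothing touches the disk until something explicitly writes it out, and auto-saving is nothing exotic, just a timer that performs the write for you.

```python
import threading

class EditorBuffer:
    """Unsaved text lives here, in volatile memory (RAM)."""

    def __init__(self, path):
        self.path = path    # where the durable copy will live on disk
        self.text = ""      # gone forever if the power goes out

    def type(self, more_text):
        self.text += more_text  # only the in-memory copy changes

    def save(self):
        # The explicit step novices shouldn't have to know about:
        # copy the volatile buffer out to non-volatile storage.
        with open(self.path, "w") as f:
            f.write(self.text)

def autosave(buffer, interval_seconds=30):
    """Auto-saving is just a timer that calls save() on your behalf."""
    buffer.save()
    t = threading.Timer(interval_seconds, autosave,
                        args=(buffer, interval_seconds))
    t.daemon = True  # don't keep the program alive just to autosave
    t.start()
```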
The filing cabinet analogy for hard drives made more sense back in the eighties, when disks were expensive and limited in their storage capacity. The first computer I was in charge of was the one I took to college in 1993. I knew intimately which programs I had installed on it at all times, because if I wanted to install a new one, I had to erase something first. Now the hard disk's filing system has to contend with billions of bytes of data. In this era, a big searchable database is a better model than a filing cabinet. Instead of putting a file in a single, unambiguous location, you're better off tagging it with descriptive metadata so you'll be able to find it in a search later. The filing cabinet analogy has the virtue of accurately representing the file system's actual organization, but it's not very human-friendly. Our own minds are organized into associative networks, not hierarchical directories.
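Here's a toy version of what I mean, sketched in Python with invented names: instead of filing each document in exactly one place, you tag it with as many descriptive labels as you like, and find it later by intersecting tags.

```python
from collections import defaultdict

# A toy tag index: each file can live under any number of tags,
# instead of in exactly one spot in a directory hierarchy.
tag_index = defaultdict(set)

def tag_file(path, *tags):
    for tag in tags:
        tag_index[tag].add(path)

def search(*tags):
    """Find every file carrying ALL of the given tags."""
    results = [tag_index[tag] for tag in tags]
    return set.intersection(*results) if results else set()

tag_file("trip-report.txt", "travel", "writing")
tag_file("budget.xls", "money", "travel")
print(search("travel"))             # both files
print(search("travel", "writing"))  # just the trip report
```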

The most annoying aspect of the conventional desktop metaphor is its Escher-like recursiveness. The desktop is itself a folder somewhere inside the computer, yet sitting on it is an icon representing the entire computer, so the desktop folder appears to be inside itself. The recursion is interesting from a metaphysical perspective, but deeply confusing if you're just trying to understand your file system.

Much as I love the Mac OS, its recursiveness can be even more confusing than Windows's. The Mac has two sets of folders called Applications, Documents, Library, and so on: one set at the top level of the hard drive, and another inside each user's home folder. Even though their names are identical, their contents and functions are totally separate.
For non-expert users, probably the most difficult aspect of the graphical user interface is keeping track of the keyboard focus. Even the Mac OS doesn't always do a wonderful job of making it obvious which text field in which window currently has it. It's easy to end up with a screen where the text cursor could be subtly blinking in any of at least three different places.
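Under the hood, the rule is brutally simple, which may be why interfaces rarely bother to spell it out: the window system keeps a single pointer to whichever widget currently has focus, and every keystroke is routed there, no matter what the user believes. A minimal sketch, with all the names invented:

```python
class TextField:
    def __init__(self, name):
        self.name = name
        self.contents = ""

    def handle_key(self, char):
        self.contents += char

class WindowSystem:
    """One invisible pointer decides where every keystroke lands."""

    def __init__(self):
        self.focus = None  # the single widget receiving keyboard input

    def click(self, widget):
        self.focus = widget  # clicking moves the focus, often silently

    def key_press(self, char):
        if self.focus is not None:
            self.focus.handle_key(char)  # the user had better know which one

search_box = TextField("search")
document = TextField("document")
ws = WindowSystem()
ws.click(document)
ws.key_press("h")    # lands in the document...
ws.click(search_box)
ws.key_press("i")    # ...and this lands in the search box
```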

As a teacher of novice users, I hear one phrase a lot: "I don't know where I am, I want to go here." This is an interesting phrase to me. No one says, "I don't know which task or process is set to receive text input." People intuitively conceptualize computer interfaces as places inhabited by their own bodies. The intuition is misguided: even when you "surf" the web, you're not going anywhere; bits are just shuttled back and forth from one computer to another. But there's no talking people out of their intuition. Computer programs are easier and more fun to use when they present a user illusion that accommodates our instincts.
Windows represents your onscreen avatar as a little arrow, a little cartoon glove, or a little hourglass. Macs represent you as an arrow or a psychedelic rotating rainbow ball. Efforts to make the computer into a "person" you're talking to always fail. Everyone hates Microsoft's anthropomorphic paper clip and dog, and people would rather interact with impersonal Google than ask Jeeves. It would be better to represent the operating system as a place, with the user "embodied" by an avatar. It doesn't have to be a person, or even humanoid. It can be a bird or a robot or whatever. But you do have to be able to control and manipulate it easily, and it needs to unambiguously represent the keyboard focus.
I think the future of interface design is to be found not in Apple's products, but in video games, especially eighties video games, and most especially the ones by Nintendo. The user interface of every Nintendo product has to be intelligible to semi-literate young children in every world culture, and for the most part they succeed heroically. Long before video games had any of their present lavish production values, they conveyed a strong sense of first-person experience that could be instantly grasped by preschool-aged children. This metaphor is as old as human consciousness, and we're vastly more adept with it than we are with the metaphors of desktops, file cabinets, and disembodied cartoon fingers. If I controlled the universe, my computer's file system would look like one of those games.

The Mario games are models of clarity and graphic economy. One visual metaphor I'm particularly fond of is the level selection system in Super Mario 64 -- I'd include a screenshot but I can't find a good one. You start the game in Princess Peach's castle. Each level is represented by a painting hanging on the wall. To visit that level, you jump into the painting. I'm imagining a nice system for pointers and aliases where you could take the paintings off the wall, carry them around with you as you see fit, rearrange them, etc.
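In file system terms, each painting would be an alias or pointer: a lightweight stand-in you can carry around and rearrange without ever disturbing the thing it points to. A rough sketch of the idea, all names invented:

```python
class Level:
    """The real thing: a folder, a document, an application."""
    def __init__(self, name):
        self.name = name

class Painting:
    """A movable alias. Carrying it around never moves the level itself."""
    def __init__(self, target, wall):
        self.target = target   # the pointer: which level this painting opens
        self.wall = wall       # where the painting currently hangs

    def rehang(self, new_wall):
        self.wall = new_wall   # rearranging affects only the alias

    def jump_in(self):
        return self.target     # entering the painting opens the real thing

bobomb = Level("Bob-omb Battlefield")
painting = Painting(bobomb, wall="castle foyer")
painting.rehang("upstairs hallway")  # the level itself never moved
print(painting.jump_in().name)       # still opens Bob-omb Battlefield
```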
Mario 64 is a 3D game, but I'm not advocating the use of 3D spaces as interfaces generally. I think eighties video games are a better model for interface designers exactly because they're confined to 2D planes with only modest attempts at depth. Unless they're handled very expertly, illusory projections of 3D space onto a 2D screen cause a lot of confusion. Playing a game like Halo or Quake is a poor approximation of our actual 3D experience: it's like viewing the world through a cardboard box with a little rectangular hole cut in it, and with only one eye. I think it's better to look for creatively optimized plane layouts than to burden the user with a lot of projective geometry.