I’m long on spatial computing

What follows are some rough notes on spatial computing, after a month of heavy use of my Apple Vision Pro.

Buckminster Fuller’s term ephemeralization describes doing “more and more with less and less until eventually you can do everything with nothing,” and few framings are more apt for the future of spatial computing.

Humans are better adapted to spatial computing than 2D screen computing.
Spatial computers require pushing display, camera, battery, networking, storage, and silicon to infinity (areas where Apple has been investing for a while!) with an array of biometrics & enchanted devices/accessories.
The interfaces are better, the form factor will become more portable, and it’s the ultimate modular computer. I think spatial computers will replace all our desktop/laptop/tablet computers to start, phones last, and some other wearables ~never.

Why spatial?

Human brains are dramatically better-adapted to physical spaces & tools than virtual ones. I knew how to use Blender way better 5 years ago, but there's basically no muscle memory for software years later. Taking a break from knitting for 5 years, I forgot nothing about the physical motions. Our brains remember physical spaces & interactions better, and our brains are no longer evolving, at least meaningfully on the time scale of computers. I almost never dream about computer UI, but it's more important to me than most physical objects. Embed virtual experiences in the core of our brains to make them more compelling & memorable.

What’s the final abstraction over owning an array of every screen size? Having anything virtual in your physical world.
- Having your computer everywhere is alluring. There's a reason iPhone got so popular, and Apple Watch, and AirPods.
- What's still better about an iPad than Vision Pro today? Cameras, Apple Pencil, 5G. 5G will be solved when the form factor makes sense to wear in public, and the battery has any spare capacity. Cameras will keep getting better, and the silicon for them is a big limiter right now. Imagine "scanning" a document on future Vision Pro, then marking it up with a digitally-enchanted pencil, and it being a tactile object in virtual space.

Isolation vs immersion

In an increasingly stressful world, controlling your level of immersion is power & relief for the privileged. It's nice sometimes, even to watch a movie on my couch, and more so on an airplane. Part of the appeal of AirPods & iPhones is being able to control your level of immersion in the world. Extending that control from your body language & sound to your vision is appealing too.
Isolation is a central issue with Vision Pro today. A smaller form factor partly solves that by making isolation optional.
Are glasses the form factor for the eventual product?
- You need to show virtual materials to your eyes, and glasses are socially acceptable, provide a relatively easy form factor, and are easy to remove too
- Immersion will need to be an accessory if so—valuable to dial into full VR sometimes
- Further down the line of ephemeralization: contact lenses with infinite networking/graphics/sensing that are completely invisible

The computing

Voice interfaces are not the future for complex computing. Humane & Rabbit have nice features for on-the-go, and some interactions are fastest by voice—I can say "5 minute timer" to a device faster/more smoothly than opening any timer & starting it, or typing it. Voice interfaces aren't going away. But the Bloomberg Terminal isn't switching to being an Alexa skill. Visual interfaces are faster to parse, denser with interactions. This will continue to be the case regardless of the technology.
We need to move beyond the Files app and app silos. These aged quickly & don't feel at home on visionOS today.
The interaction of AirDropping a file to visionOS & having it open in your space is incredible. Take actions on content, physically move it around, feel progress in your space. So much more human than file interactions on computers today. AI will help here.
There's potential for an array of digitally-enchanted accessories. AirPods are the first, along with the AR preview on the keyboard, but there's room to add haptics and precision input devices of all sorts. Unclear to me if inanimate tactile objects help when virtually enhanced: if iPad is a magical sheet of glass that can turn into anything, but you can spawn one anywhere, does holding a sheet of glass help? For now, having physical objects that beam virtual content into our vision (Mac Virtual Display, AR keyboard bar) is compelling.
Mac Virtual Display today is needed because of political software restrictions (macOS vs visionOS apps), plus the silicon headroom on Vision Pro isn't sufficient to run all the software one might want to. Both of those are solvable problems. Having your entire workstation with you at all times is the default trend of the world right now. Unclear how much networking improvements make some of it offloadable (e.g. game streaming, Mighty browser for your computer), but betting on silicon getting better faster seems more viable.

AI

Humane & Rabbit are trying to make some interactions easier, and I applaud the effort on both hardware & software. But those devices can't succeed as silos outside our primary computers; they need to be integrated with all our private data, biometrics, apps, subscriptions, friends, content, internet, suite of accessories. Apple has a huge head start here.
Voice & hands are less precise input devices by default, but they're great because you have them everywhere, in public & private: you can’t lose them, and they’re usually available. With inputs like that, being able to communicate more generally & have a computer make that request more specific is valuable.
Generative UI makes a ton of sense in a spatial environment, like the AirDropping example.
Trustable AI that's a network of what’s on your computer (your apps & your data, to use today’s containers) is far more useful than general-purpose AI like ChatGPT. Apple Shortcuts provides deeply-integrated hooks for thousands of pieces of software to work together, and ChatGPT's API spec shows a future where creating those integrations can be a lot easier. But there's privacy/consent built in, there's on-device processing of your data & permissions for when it leaves.
Having assistants understand what's in your physical world with cameras & sensors, plus what's onscreen with machine-readable UIs, then being able to take actions to connect apps & content together, that's game-changing. Companies are working at this from all different angles but the dream is what Siri could be on visionOS.
As ~everything becomes digital, having assistance that manipulates your space means so much more. Imagine telling Siri to draft emails and leave them on your desk for tomorrow. When you're in the space of your office & in the headspace of work, you'll find what office men in Mad Men wanted.

The hardware

The majority of what’s wrong with Vision Pro can be fixed with dramatically better tech.

Video game development & play, and digital animation/VFX, have long pushed the digital state of the art. Spatial computing is like that: an M2 that's super fast in a MacBook can barely keep up with visionOS, and visionOS is operating at a fraction of the resolution/throughput it ultimately will.
We need dramatically better tech at every level. We need cameras to record entertainment at 4x the linear resolution (32K 3D video, not 8K), which works out to 16x the pixels per frame. We need to stream that in real-time, store it instantaneously. Render as many 3D people with realistic lighting in our space as we want. We'll need orders of magnitude more internet bandwidth, storage, graphics power, battery energy density, clean energy for data centers to stream all this. Silicon improvements are critical above all else, but humans’ track record there is solid.
Open question to me: how much sensing does this computer do automatically when you're walking around? On Google Glass or Humane, you interact to take a photo or ask it something using the camera. Vision Pro, meanwhile, can’t use your space as input at all. Tab is a vision where the AI-enabled wearable is mostly for inputting data, which makes sense with today's technical constraints (indie founders can't build next-gen Vision Pro), but socially that's a huge no. Wearables sensing me consensually (measuring vitals, fitness on Apple Watch) are great. Unclear to me how valuable automatic capture of anything about other people is, and it opens cans of worms. (Also see: Ted Chiang’s “The Truth of Fact, the Truth of Feeling”)

The future

Teleportation isn't getting any more likely from physicists. But as Cleo Abram describes, Apple Vision Pro is laying the groundwork, through the visual immersion, Personas, Spatial Audio, and more, for the closest, lowest-carbon substitute we’ll get anytime soon. Most people are under-prepared for how soon convincing virtual teleportation will be possible.
- Shared environments. Made possible by 3D photogrammetry of spaces past & present & here & elsewhere, real-time streaming of full-body Personas, shared software experiences & content. Doing things together regardless of where you are.
- 15 years ago, FaceTime fundamentally changed what a long-distance relationship can look like, with global video calling on front & back cameras. That story is not over just because we have mostly-functioning video chat, as marvelous as it is. We're not going to make it feel like in-person, but it will continue marching along the spectrum of how much spending time together digitally feels like "spending time together."
Nostalgia is one of the most powerful emotions on Vision Pro today. Not sure where this fits in yet.

We’re not going to stop going places physically, spending time in nature, spending time with people. But there are experiences our computers can enhance, ways our wearables can make us better versions of ourselves (health, navigation, reminders). These realities will continue to coexist, getting wired into each other in ever-more-complex ways until our tools merge with us.