Beyond the SmartWatch


A Design Space Exploration of Near-Wrist Bioacoustic On-Skin Interactions

Multi-phase participatory design thesis exploring a novel input gesture modality of near-wrist bioacoustic on-skin interactions with smartwatches. Includes gesture elicitation study, neural network training and validation, and usability study of an applied prototype.

[Image: Apple Watch Ultra with a bioacoustic waveform visualization extending from the wrist]

Role: Sole Researcher
Context: Master's Thesis, School of Information, RIT
Timeline: July 2025 – May 2026
Duration: 1 year
Team: Myself

What was built

  • Gesture Elicitation Study: participatory design study, a gold-standard method for discovering intuitive gestures for novel and emerging interface modalities
  • WatchOS Data Collection App: custom Apple Watch app for high-frequency accelerometer recording during gesture sessions
  • Desktop Training & Validation Tool: Swift/SwiftUI app for guided data collection, ML model training, and classifier evaluation
  • CNN Gesture Classifier: CoreML model that distinguishes gesture types and skin locations from accelerometer waveforms
  • Functional WatchOS Prototypes: working smartwatch apps that use the trained classifier for real on-skin gesture interaction
  • Literature Review: academic review of prior related implementations and design explorations
  • Gesture Taxonomy: mapped design space of bioacoustic on-skin gestures across type, location, and interaction mode

The Problem

Smartwatches are powerful, always-available computers on your wrist, but interacting with them is still frustratingly limited. The tiny touch screen demands precise taps with your other hand. The digital crown works for scrolling but little else. Voice assistants are slow, unreliable in noise, and socially conspicuous. Most of the time, interacting with a smartwatch means stopping what you’re doing, raising your wrist, and fiddling with a screen smaller than a postage stamp.

This isn’t just an inconvenience; it’s a fundamental constraint on what wearables can be. The hardware is capable of far more than the interaction surface allows. And for users with disabilities, limited dexterity, or situational impairments (holding a bag, gripping a bike handle, cooking), the gap between what the device can do and what you can actually ask it to do becomes even wider.

The skin around the wrist (the hand, forearm, digits) is an untapped interaction surface that travels with the user everywhere the watch does.

The Idea

What if the watch could sense gestures performed on your own skin? Not through a camera or an external sensor, but through the accelerometer already built into every smartwatch on the market.

When you tap the back of your hand, knock on your wrist, clench your fist, or flick a finger, the mechanical impact creates vibrations that propagate through bone and tissue to the wrist. These bioacoustic signals are detectable by the watch’s inertial measurement unit (IMU), the same sensor that counts your steps and tracks your workouts. By recording accelerometer data at high frequency and classifying the resulting waveforms with a machine learning model, the watch can distinguish between different gesture types performed at different locations on the body.

No additional hardware. No external sensors. Just the watch people already wear, listening to the vibrations of their own body.
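The sensing pipeline this implies can be illustrated in a few lines. The sketch below is plain Python with made-up thresholds and sample values (the real implementation streams CoreMotion data on-device): it detects a tap-like impact in an accelerometer stream with a simple energy threshold and cuts a fixed-length window around it for a classifier.

```python
# Sketch: detect a tap-like impact in an accelerometer stream and
# extract a fixed-length window around it for a classifier.
# Illustrative only; thresholds and window sizes are made up.

def magnitude(sample):
    """Euclidean magnitude of one 3-axis accelerometer sample."""
    x, y, z = sample
    return (x * x + y * y + z * z) ** 0.5

def extract_windows(samples, threshold=2.0, window=8):
    """Return fixed-length magnitude windows around threshold crossings."""
    mags = [magnitude(s) for s in samples]
    windows, i = [], 0
    while i < len(mags):
        if mags[i] > threshold:
            start = max(0, i - window // 2)
            seg = mags[start:start + window]
            if len(seg) == window:      # keep only full windows
                windows.append(seg)
            i += window                 # skip past this impact
        else:
            i += 1
    return windows

# Simulated stream: quiet baseline (~1 g) with one sharp tap impulse.
stream = [(0.0, 0.0, 1.0)] * 10 + [(0.5, 0.3, 3.0)] + [(0.0, 0.0, 1.0)] * 10
wins = extract_windows(stream)
print(len(wins))  # 1: one impact detected
```

In the actual system these windows are the inputs to the CNN classifier rather than being thresholded by hand, but the segmentation step looks broadly like this.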

Why Bioacoustics?

The skin is a surprisingly natural input surface. Studies show that around half the population already writes on their own skin regularly: phone numbers, grocery lists, reminders. It’s ubiquitously available, proprioceptively rich (you can feel exactly where you’re touching without looking), and socially unobtrusive compared to voice commands or exaggerated air gestures.

Prior work in on-skin interaction (systems like SkinTrack, TapSkin, and Back-Hand-Pose) has demonstrated that skin-transmitted vibrations can be classified with high accuracy. But most of these systems require dedicated external sensors: electrode arrays, additional IMUs strapped to the forearm, or specialized microphones. This thesis asks a more constrained question: what can we do with just the watch itself?

By limiting the sensing to the consumer-grade accelerometer already in the Apple Watch, the results are immediately applicable to the hundreds of millions of smartwatches already on people’s wrists.

Design Space

The thesis maps the interaction space across three key dimensions:

Gesture types: The set of bioacoustic primitives detectable at the wrist: tap, double-tap, knock, clench, flick, pinch, and combinations. Each produces a distinct vibration signature in the accelerometer data. The gesture vocabulary is deliberately kept compact to support memorability; this is an imaginary interface with no visual signifiers, so every gesture must be intuitive enough to learn and recall without labels or buttons.

Skin locations: Where on the body the gesture is performed: back of hand, palm, individual digits, radial and ulnar sides of the wrist, and proximal and distal forearm. Location adds a second axis of expressiveness: the same tap means different things depending on where it lands.

Interaction modes: Foreground interactions replace what you’d normally do on the screen (opening an app, selecting a notification). Background interactions happen while the user’s primary attention is elsewhere: controlling music playback while cycling, pausing a workout timer while exercising, advancing slides during a presentation. This foreground/background distinction is a novel contribution of the thesis, enabling mode-switching through gesture context.
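These three dimensions compose multiplicatively, which is what makes a compact vocabulary expressive. A quick sketch (Python; the lists are illustrative subsets of the thesis taxonomy, not its final contents) enumerates the space:

```python
# Sketch: the design space as a cross product of the three dimensions.
# The lists below are illustrative subsets of the taxonomy.
from itertools import product

gestures = ["tap", "double-tap", "knock", "clench", "flick", "pinch"]
locations = ["back of hand", "palm", "radial wrist", "ulnar wrist", "forearm"]
modes = ["foreground", "background"]

design_space = list(product(gestures, locations, modes))
print(len(design_space))  # 6 * 5 * 2 = 60 candidate interactions
```

Even with six gestures and five locations, two interaction modes yield sixty candidate interactions, far more than a watch screen can expose at once; the elicitation study is what prunes this space down to the mappings people actually find intuitive.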

Research Approach

The thesis follows a three-study methodology, each building on the last:

Study 1: Gesture Elicitation

With 12–20 participants, I use a participatory gesture elicitation methodology, the standard in HCI for discovering intuitive gesture mappings. Participants are shown action referents (visual demonstrations of smartwatch functions like “open the app switcher” or “skip to the next song”) and asked to propose on-skin gestures that feel natural for triggering each action. A think-aloud protocol and semi-structured interviews capture the reasoning behind their choices. Agreement analysis reveals which gesture-to-function mappings have strong consensus across participants.
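Agreement analysis scores each referent by how concentrated participants' proposals are. A common formulation (the agreement score from Wobbrock et al.'s elicitation work; the sketch below is illustrative, not the thesis's final analysis) sums the squared proportion of each group of identical proposals:

```python
# Sketch: agreement score for one referent, following the common
# formulation A = sum over proposal groups of (|group| / |all|)^2.
# Proposal labels here are hypothetical examples.
from collections import Counter

def agreement(proposals):
    """Agreement score for one referent's list of proposed gestures."""
    n = len(proposals)
    return sum((count / n) ** 2 for count in Counter(proposals).values())

# Eight participants propose gestures for "skip to the next song":
proposals = ["flick"] * 5 + ["tap back of hand"] * 2 + ["knock"]
print(round(agreement(proposals), 3))  # (5/8)^2 + (2/8)^2 + (1/8)^2 = 0.469
```

A score of 1.0 means every participant proposed the same gesture; scores near 1/n mean no consensus, which flags referents that need a designer-chosen mapping instead.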

Study 2: ML Classification

Using data collected alongside the elicitation study, I train a CNN classifier on accelerometer waveforms to distinguish gesture types and locations. The custom desktop Swift/SwiftUI application pairs with a WatchOS data collection app: the watch streams sensor data in real time while participants perform guided gesture sequences. Validation measures accuracy, latency, and cross-user robustness, and identifies which gestures in the taxonomy are too similar for reliable classification.
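Cross-user robustness is typically measured with leave-one-user-out validation: train on every participant but one, test on the held-out participant, and rotate through all of them. A minimal sketch (plain Python; participant IDs and sample counts are made up stand-ins for real accelerometer windows):

```python
# Sketch: leave-one-user-out splits for measuring cross-user robustness.
# Each recorded window is tagged with the participant who produced it;
# the classifier is trained on everyone except the held-out user.

def leave_one_user_out(samples_by_user):
    """Yield (held_out_user, train_samples, test_samples) splits."""
    for held_out in samples_by_user:
        train = [s for u, ss in samples_by_user.items()
                 if u != held_out for s in ss]
        yield held_out, train, samples_by_user[held_out]

# Hypothetical data: windows per participant (stand-ins for waveforms).
data = {"P1": ["w1", "w2"], "P2": ["w3"], "P3": ["w4", "w5", "w6"]}
for user, train, test in leave_one_user_out(data):
    print(user, len(train), len(test))
```

Averaging accuracy across these splits estimates how the classifier performs on a wrist it has never seen, which is the deployment scenario that matters for a consumer smartwatch.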

[Flowchart: three research sessions. Session 1 combines gesture elicitation (1h) and ML data collection (30m); Session 2 handles classifier validation (up to 1h); Session 3 evaluates the final prototype (1h)]
Research session structure. Gesture elicitation and data collection are combined in Session 1 to reduce participant burden, followed by classifier validation and prototype evaluation in separate sessions.

Study 3: Prototype Evaluation

The final study builds functional WatchOS prototypes that use the trained classifier to enable real on-skin gesture interaction. Participants use the prototypes in realistic task scenarios and evaluate usability, learnability, and overall experience. This closes the loop between theory and implementation: not just can we classify these gestures, but should we, and do people actually find it useful?
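Usability in studies like this is often quantified with a standardized instrument such as the System Usability Scale (SUS); whether the thesis uses SUS specifically is an assumption here, but its scoring is standard and easy to sketch:

```python
# Sketch: standard System Usability Scale (SUS) scoring, a common
# usability instrument (the thesis's actual questionnaire may differ).
# Responses are 1-5 Likert ratings for ten alternating pos/neg items.

def sus_score(responses):
    """Convert ten 1-5 SUS responses into a 0-100 score."""
    assert len(responses) == 10
    total = 0
    for i, r in enumerate(responses):
        # Even indices (items 1, 3, ...) are positively worded.
        total += (r - 1) if i % 2 == 0 else (5 - r)
    return total * 2.5

# A hypothetical participant's responses:
print(sus_score([5, 1, 5, 2, 4, 2, 5, 1, 4, 1]))  # 90.0
```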


Applications & Implications

The interaction modality explored in this thesis has broad implications:

Eyes-free interaction: Proprioceptive feedback from your own body means you can execute gestures confidently without looking at the watch. Control music while cycling, manage a workout timer during exercise, or advance presentation slides, all without breaking focus from your primary task.

Accessibility: The same qualities that enable eyes-free use make this modality a natural fit for blind and low-vision users. One-handed gesture input benefits users with limited mobility or dexterity. And unlike voice interfaces, on-skin gestures work in noisy environments and don’t require speech.

Discreet input: Small, quiet gestures on your own skin are far less conspicuous than talking to your wrist or performing exaggerated air gestures. Useful in meetings, classrooms, public transit, any context where social norms constrain interaction.

AR/VR input: Smartwatches are always-available controllers. Rather than requiring dedicated handheld devices or external tracking systems, the user’s own body becomes the interaction surface, a “good enough” input method for lightweight augmented and virtual reality experiences.

Expanded vocabulary for everyone: Even for users without specific accessibility needs, on-skin gestures dramatically expand what’s possible with a smartwatch. The tiny screen is no longer the bottleneck for interaction complexity.

My Role

I’m the sole researcher on this project, designing the studies, building all software (WatchOS data collection app, desktop training/validation tool, ML pipeline, functional prototypes), conducting user studies, analyzing data, and writing the thesis.

Methods

Literature Review, Usability Testing, Interaction Design, Physical Prototyping, User-Centered Design, Semi-Structured Interviews, Surveys, Thematic Analysis, Accessibility Design, Participatory Design, Think-Aloud Protocol, Statistical Analysis

Tech Stack

Swift, SwiftUI, WatchOS, CoreML, Apple Watch Accelerometer/IMU, CNN Classifier, Python

Collaborators

  • Dr. Garreth Tigwell (Chair, Assistant Professor)
  • Dr. Roshan Peiris (Committee Member, Assistant Professor)
  • Dr. Tae Oh (Committee Member, Professor)