
Machine Vision Exercise Coaching App


A three-lab collaboration between my HCI lab and two other labs led by associate deans in kinesiology and health sciences

Technical and design lead on a cross-institutional research project spanning three labs at RIT and the University of South Carolina, building a native iOS app backed by custom ML training pipelines and large language models. The app affords independent learning opportunities for blind and low-vision children, with age- and needs-adaptive coaching feedback to help them improve their exercise performance, while also providing valuable clinical insights. Field-tested with 15 BLV participants.

Role
Technical & Design Lead
Context
Multi-institution research project between RIT, USC, and SUNY Brockport
Timeline
Jun 2025 – present
Duration
Ongoing
Team
Myself

What was built

  • iOS Coaching Application: Native Swift/SwiftUI app with video capture, rep segmentation, pose analysis, ML inference, and age-adaptive feedback, distributed via TestFlight
  • Custom Phase Classification Model: BiLSTM trained in PyTorch to segment the Supine-to-Stand exercise into 5 sub-phases from pose landmark sequences
  • ML Training Pipeline: End-to-end PyTorch pipeline with data annotation (Label Studio), pose extraction (MediaPipe, Apple Vision), BoT-SORT re-identification, and model training
  • Field Study Data: Phase 1 data collection with 15 BLV children at camp, including video analysis, feedback delivery, and focus groups

The Problem

Supine-to-Stand is a deceptively simple exercise: lie flat on your back, then stand up as quickly as you can. Execution time on this movement is a meaningful clinical indicator, correlating with balance, coordination, strength, and overall motor development. For children who are blind or low-vision (BLV), tracking and improving STS performance is particularly valuable, but practicing the technique independently, without visual feedback, is especially challenging.

A sighted coach can observe a student performing the exercise and offer tips and encouragement, but this reliance limits opportunities for independent practice. It is also difficult for a coach to provide thorough, comprehensive feedback on a fast-moving, full-body exercise: which phases were slow, which transitions were inefficient, where specific joints lagged. Doing this across many students and reps, and tracking the data over time, is impractical without technological support.

The Solution (Coaching App)

To address these problems, I built a native iOS (iPhone) application that records an exercise session and transforms it into reviewable, detailed, actionable feedback.

The workflow:

Record: The app captures a video of one or more reps of the Supine-to-Stand exercise and automatically segments the recording into individual reps by detecting the boundaries between repetitions.

Analyze: Each rep is processed through a machine vision pipeline. Pose landmark extraction (using Apple Vision and MediaPipe) identifies joint positions and body orientation across every frame. A custom-trained BiLSTM model classifies each moment of the exercise into one of five sub-phases, capturing the sequential progression of the movement. From the pose data and phase segmentation, the system computes granular metrics: execution time (the primary research variable), phase durations, joint angles, transition quality, and other biomechanical indicators.

Feedback: The raw analysis is synthesized into human-readable coaching feedback through two parallel paths. A rule-based system maps specific patterns to targeted tips and suggestions. An LLM layer (GPT-4o) produces a natural-language synthesis that contextualizes the findings and offers encouragement and guidance.

Review: All session data is preserved. Coaches and participants can replay any rep, scrub through the video with pose overlays, review the phase breakdown, and compare performance across sessions over time.
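As an illustration of the metric computation in the analysis step, here is a minimal Python sketch of how a joint angle can be derived from pose landmarks. The function name and coordinates are hypothetical, not the app's actual implementation, which computes these metrics from Apple Vision / MediaPipe landmark sequences.

```python
import numpy as np

def joint_angle(a, b, c):
    """Angle (in degrees) at joint b, formed by landmarks a-b-c,
    e.g. hip-knee-ankle for knee flexion. Each point is an (x, y)
    pose landmark coordinate."""
    ba = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    bc = np.asarray(c, dtype=float) - np.asarray(b, dtype=float)
    cos = np.dot(ba, bc) / (np.linalg.norm(ba) * np.linalg.norm(bc) + 1e-9)
    # Clip to guard against floating-point drift outside [-1, 1]
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

# Example: knee angle from hypothetical hip, knee, ankle coordinates
angle = joint_angle((0.5, 0.4), (0.5, 0.6), (0.5, 0.8))  # collinear -> ~180°
```

Tracked per frame, angles like this feed into phase durations, transition quality, and the other biomechanical indicators described above.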

The ML Pipeline

The most technically demanding part of the project is the custom phase classification model and the infrastructure around it.

The Supine-to-Stand exercise decomposes into five sequential sub-phases, distinct stages of the movement from lying flat to standing upright. I'm training a BiLSTM (bidirectional long short-term memory) network in PyTorch to classify these phases from sequences of pose landmarks. The sequential architecture matters: unlike a frame-by-frame classifier, the BiLSTM captures temporal context, drawing on both earlier and later frames to make better predictions about what's happening at any given moment.
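A minimal PyTorch sketch of what such a per-frame BiLSTM classifier can look like. The layer sizes are illustrative, and the 66-feature input assumes 33 landmarks with x/y coordinates each; these are assumptions, not the project's actual configuration.

```python
import torch
import torch.nn as nn

class PhaseClassifier(nn.Module):
    """BiLSTM that assigns one of five STS sub-phases to every frame.
    Input:  (batch, frames, features) pose-landmark sequences.
    Output: (batch, frames, 5) per-frame phase logits."""
    def __init__(self, n_features=66, hidden=128, n_phases=5):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, num_layers=2,
                            batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_phases)  # 2x: both directions

    def forward(self, x):
        out, _ = self.lstm(x)   # (batch, frames, 2 * hidden)
        return self.head(out)   # per-frame phase logits

model = PhaseClassifier()
logits = model(torch.randn(4, 120, 66))  # 4 clips, 120 frames each
# logits.shape -> torch.Size([4, 120, 5])
```

Because the LSTM is bidirectional, each frame's hidden state summarizes the whole clip in both directions, which is what lets the model disambiguate visually similar poses that occur in different phases.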

Training data is annotated using Label Studio. I lead two undergraduate researchers who handle annotation tasks under my guidance, working from a codebook I developed for consistent phase labeling.

The training pipeline handles several real-world complications. Sample videos come from varied environments, some with crowds, occlusion, or multiple people in frame. I use BoT-SORT (a multi-object tracking algorithm) for re-identification, ensuring the pipeline tracks the correct person across frames even in cluttered scenes. The model training uses focal loss to handle class imbalance across the five phases (some phases are inherently shorter than others), along with sequential constraints that encode the fact that exercise phases follow a natural ordering.
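Focal loss itself is compact to express in PyTorch. A common formulation is sketched below, with the frequently used gamma = 2 as a default; the project's actual hyperparameters may differ.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0):
    """Focal loss for per-frame phase classification: down-weights
    easy, confidently classified frames so the rarer, shorter phases
    contribute more to the gradient."""
    ce = F.cross_entropy(logits, targets, reduction="none")
    pt = torch.exp(-ce)  # model's probability for the true class
    return ((1.0 - pt) ** gamma * ce).mean()

logits = torch.randn(8, 5)               # 8 frames, 5 phase classes
targets = torch.randint(0, 5, (8,))
loss = focal_loss(logits, targets)
```

Since the modulating factor (1 - pt)^gamma is at most 1, focal loss never exceeds plain cross-entropy on the same batch; it simply shifts emphasis toward the frames the model is getting wrong.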

Research Collaboration

This project sits at the intersection of three labs across three universities:

AIR Lab (RIT): Accessible and Immersive Realities Lab, directed by Dr. Roshan Peiris, my advisor. We specialize in the design and technological implementation of “immersive accessibility” technologies and human-computer interaction.

IMSVID Lab: Institute of Movement Studies for Individuals with Visual Impairments, directed by Dr. Pamela Beach, who is also the RIT Associate Dean for the College of Health Sciences and Technology. IMSVID is a multi-institutional research institute, split between RIT, the University of South Carolina, and SUNY Brockport.

PDDR Lab (University of South Carolina): Physical and Developmental Disabilities Research Lab, Dr. Ali S. Brian, Associate Dean for Research in the College of Education.

Field Study

Phase 1 data collection took place on-site at a camp for BLV children. We recorded 15 children (ranging from elementary to older teens) each performing three reps plus a practice rep of the Supine-to-Stand exercise. The app analyzed each session, and we shared different versions of its feedback with the participants.

Data collection included two rounds of focus groups, with the 15 children split into three groups of five. The first round was an informal sharing session and mini focus group that I co-facilitated with Dr. Brian; the second was a more detailed, structured focus group conducted by Dr. Beach and her undergraduate research assistants.

My Role

I am the design and technical project lead. I conceived the technical approach, designed and built the full iOS application, created the ML training pipeline, trained the phase classification models, developed the annotation codebook, and supervise an undergraduate annotation team. I coordinate the research across three labs at different universities, participate in study design, and co-facilitate data collection sessions. This project is supervised by Dr. Roshan Peiris.


Graduate Research, AIR Lab, RIT · In collaboration with IMSVID Lab (multi-institution) and PDDR (USC) · 2025 – present

Tech Stack

Swift SwiftUI CoreML Apple Vision SwiftData Combine PyTorch MediaPipe BoT-SORT Label Studio OpenAI GPT-4o Python TestFlight

Collaborators

  • Dr. Roshan Peiris · Advisor; Director, AIR Lab, RIT
  • Dr. Pamela Beach · Associate Dean, RIT College of Health Sciences & Technology; Director, IMSVID Lab
  • Dr. Ali S. Brian · Associate Dean for Research, USC College of Education; Institute of Movement Studies

Publications

In preparation

Target venue TBD (ASSETS or a kinesiology/accessibility venue)