AI-Powered AR Assistant and Vocational Coach

The Problem

Customer service looks simple from the outside, but it demands a lot happening at once: listening, interpreting, remembering menu options, tracking a multi-step order, making change, recovering when a customer changes their mind, all while maintaining a natural conversational flow. For neurodivergent individuals, particularly autistic job trainees, the challenge isn’t a lack of capability or knowledge. It’s the unpredictable, fast-moving, socially loaded nature of live interactions, where there’s no pause button and the “right” response depends on context that shifts moment to moment.

Traditional vocational training relies on scripts, shadowing, and repeated practice. Career coaches do essential work, but they can’t be present for every interaction. When the coach steps away, the trainee is on their own, and the gap between supported practice and independent performance is where confidence breaks down.

The System

We’re building a modular AR platform that sits in that gap: a discreet, always-present assistant that coaches the trainee during the interaction, not before or after it. The system is designed to be adaptable to different training scenarios and task domains; the barista counter we’re currently testing with is one instantiation of a broader architecture.

The trainee wears a pair of XREAL Air AR glasses. A microphone captures the ongoing conversation, and a Python Flask server orchestrates the pipeline behind the scenes. Whisper handles speech-to-text, feeding a running transcript to GPT-4o, which analyzes the conversation history, the current task state, and a structured knowledge base to determine what the trainee needs to know right now. The AR display overlays guidance directly in the trainee’s field of view.

What the trainee sees depends on what’s happening:

Conversational coaching: Suggestions for how to respond to a customer’s question or request, updated as the conversation evolves. Not a rigid script, but contextual prompts that adapt to the actual flow of dialogue.

Dynamic task checklists: Multi-step tasks (taking an order, preparing a drink, closing out a transaction) are tracked and checked off in real time as the trainee completes them, reducing the need to hold the full sequence in working memory.

Menu and pricing reference: The system knows the full inventory: items, categories, prices, customization options. It can surface this information when relevant, helping the trainee answer a customer’s question about availability, price an in-progress order, or guide through customization options.

Making change: When it’s time to handle payment, the system can visualize the correct cash and coin change, removing the mental arithmetic from an already cognitively demanding moment.

The design philosophy is support without substitution. The system doesn’t speak for the trainee or automate the interaction. It reduces cognitive load on the mechanical parts (remembering prices, tracking steps) so the trainee can focus their energy on the human parts (listening, responding, building rapport). The goal is independence: using the system as scaffolding that can eventually be removed.

A Scenario: The Barista Counter

Our primary test scenario is a coffee shop. A customer approaches and orders a latte. The system recognizes the order type and coaches the trainee through the natural follow-ups: what size, what type of milk, any flavor shots. As the trainee works through the conversation, the checklist updates. If the customer asks about pricing or available flavors, the system surfaces the answer from its menu database. If the customer changes their order mid-conversation, the LLM reanalyzes the context and adjusts its guidance.

The barista scenario is our current testbed, but the platform is designed to be modular. The same architecture (real-time conversational analysis, dynamic task tracking, contextual knowledge retrieval) can be reconfigured for different training scenarios by swapping out the knowledge base and task definitions. Retail, food service, front desk work, inventory management: any domain where a trainee needs contextual guidance during live interactions. Related work at the AIR Lab includes a complementary VR training system where job coaches can author their own training scenarios and virtual environments, and this AR platform shares the same philosophy: empowering coaches and trainees with flexible, adaptable tools rather than one-size-fits-all solutions.

Research Process

This is a collaboration between the RIT AIR Lab and AutismUp, a Rochester-based organization that supports autistic individuals and their families through vocational training and other programs. We work directly with their trainees and career coaches, who are co-designers and evaluators of the system, not just study participants.

Phase 1: Co-Design & Discovery (Complete)

We ran co-design workshops where trainees used the system through structured task scenarios while coaches observed and participated. Semi-structured interviews with both groups captured detailed perspectives on the experience. An ethnographic study component let us observe the natural dynamics and pain points of the existing training environment.

Phase 2: Refined Evaluation (Imminent)

The prototype has been significantly iterated based on Phase 1 findings. Phase 2 will run a focused usability study with the improved system, followed by in-depth interviews to assess how the refinements address the issues and opportunities identified earlier.

My Role

I serve as lead developer and co-researcher on this project, working as a graduate research assistant at the AIR Lab under Dr. Roshan Peiris. On the engineering side, I built and continue to iterate on the Flask server, Whisper integration, GPT-4o pipeline, and AR interface. On the research side, I participate in all co-design workshops, usability sessions, and interviews, and contribute to study design and analysis. The project is led by Dhaval Mahajan as part of his master’s capstone work; we collaborate closely on both the technical and research dimensions.

Why This Matters

There’s a version of AR + LLM work that’s about novelty: impressive demos, futuristic interfaces, technology looking for a problem. This project is the opposite. It started with a real population, real barriers to employment, and a real partner organization, and the technology was shaped entirely by what those people actually needed. The co-design methodology means the system reflects the preferences and feedback of the people who would use it, not our assumptions about them.

The broader implication is that modular, adaptable assistive platforms can expand who gets to participate in the workforce, not by lowering the bar, but by providing the right support at the right moment so people can meet it on their own terms. And by designing the system to be reconfigurable across scenarios, we’re building something that coaches and organizations can shape to fit their own training needs.

Graduate Research, AIR Lab, RIT · In collaboration with AutismUp · 2025 – present

AR Guidance System	Real-time conversational coaching, dynamic task checklists, and contextual prompts displayed on XREAL Air glasses
Flask Orchestration Server	Python backend managing session state, conversation history, speech-to-text, and LLM analysis pipeline
Co-Design Research Findings	Quantitative metrics and detailed qualitative feedback from workshops with autistic trainees and career coaches
Research Paper	In progress, targeting ASSETS 2026

AI-Powered AR Assistant and Vocational Coach

What was built

The Problem

The System

A Scenario: The Barista Counter

Research Process

Phase 1: Co-Design & Discovery (Complete)

Phase 2: Refined Evaluation (Imminent)

My Role

Why This Matters

Tech Stack

Collaborators

Publications

Presentations

Links