
Introduction

Real-time speech emotion recognition for enterprise applications

ProsodyAI Documentation

ProsodyAI provides real-time speech emotion recognition powered by state space models (SSMs) with domain-specific taxonomies for enterprise applications.

ProsodyAI analyzes prosodic features—pitch, energy, rhythm, and voice quality—to detect emotional states with sub-second latency.
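
To make the feature side concrete, the following is a minimal, self-contained sketch of frame-level prosodic feature extraction (an RMS energy value and a rough autocorrelation pitch estimate per frame) using only NumPy. It is purely illustrative: the function name, frame sizes, and feature choices are assumptions, not ProsodyAI's internal pipeline.

    import numpy as np

    def prosodic_frames(audio, sr=16000, frame_ms=25, hop_ms=10):
        """Illustrative frame-level prosody: RMS energy and autocorrelation pitch."""
        frame = int(sr * frame_ms / 1000)
        hop = int(sr * hop_ms / 1000)
        feats = []
        for start in range(0, len(audio) - frame, hop):
            x = audio[start:start + frame].astype(float)
            energy = float(np.sqrt(np.mean(x ** 2)))           # loudness proxy
            x = x - x.mean()
            ac = np.correlate(x, x, mode="full")[frame - 1:]   # non-negative lags
            lo, hi = sr // 400, sr // 50                        # search 50-400 Hz
            lag = lo + int(np.argmax(ac[lo:hi]))
            pitch = sr / lag if ac[lag] > 0 else 0.0            # rough F0 estimate
            feats.append((pitch, energy))
        return np.array(feats)                                  # shape: (frames, 2)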

Features

  • Real-time Analysis: Sub-500ms latency for streaming audio (see the streaming sketch after this list)
  • Domain-Specific Taxonomies: Pre-built emotional states for 8 enterprise verticals
  • Forward Prediction: Forecast conversation outcomes while the conversation is still in progress
  • Continuous Learning: Models improve from production feedback
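
The streaming sketch below shows one way a client might push audio chunks and read back per-chunk predictions in real time. The endpoint URL, authentication, and response fields are assumptions for illustration only; consult the API reference for the actual contract.

    import asyncio
    import json
    import websockets  # pip install websockets

    # Hypothetical endpoint and response fields -- placeholders, not the real API.
    STREAM_URL = "wss://api.prosody.example/v1/stream"
    CHUNK_MS = 100  # 100 ms of 16 kHz, 16-bit mono PCM per message

    async def stream_emotions(pcm_bytes: bytes):
        chunk = 16000 * 2 * CHUNK_MS // 1000  # bytes per 100 ms chunk
        async with websockets.connect(STREAM_URL) as ws:  # auth omitted for brevity
            for i in range(0, len(pcm_bytes), chunk):
                await ws.send(pcm_bytes[i:i + chunk])    # raw audio frame
                result = json.loads(await ws.recv())     # per-chunk prediction
                print(result.get("emotion"), result.get("confidence"))

    # asyncio.run(stream_emotions(open("call.pcm", "rb").read()))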

Supported Verticals

Vertical              Use Cases
Contact Center        Escalation prediction, CSAT forecasting, agent coaching
Healthcare            Mental health screening, clinical attention indicators
Sales                 Deal probability, objection detection, buying intent
Education             Engagement tracking, comprehension monitoring
HR/Interviews         Authenticity detection, confidence assessment
Media/Entertainment   Audience engagement, emotional impact measurement
Finance               Suitability assessment, comprehension verification
Legal                 Credibility assessment, testimony analysis
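
As a rough illustration of how a vertical might be selected, the configuration sketch below uses placeholder field names and values; the real configuration keys may differ.

    # Hypothetical session configuration -- field names and values are illustrative.
    session_config = {
        "vertical": "contact_center",        # one of the 8 supported verticals
        "taxonomy": "escalation_v2",         # pre-built emotional-state set
        "outputs": ["emotion", "vad", "escalation_risk"],
        "latency_budget_ms": 500,            # matches the real-time target above
    }

In this sketch, changing the vertical value selects which pre-built emotional states and outcome scores the predictions are mapped to, mirroring the table above.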

Architecture Overview

Audio Input → Feature Extraction → SSM Classifier → Vertical Taxonomy → Actionable Insights
                    ↓                    ↓
              Prosodic Features    Emotion + VAD
              (pitch, energy,      predictions
               rhythm, voice)

ProsodyAI extracts 28 prosodic features and 4+ phonetic features, processes them through a Mamba-style state space model, and maps predictions to your configured vertical taxonomy.
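
For intuition only, here is a minimal sketch of the linear state space recurrence that Mamba-style models build on: a hidden state is updated once per frame of features and then read out as emotion logits. The dimensions (32 input features, a 64-dim state, 6 classes), the random weights, and the plain softmax readout are assumptions for illustration; a Mamba-style model would additionally use input-dependent ("selective") parameters rather than the fixed matrices shown here.

    import numpy as np

    # Illustrative sizes: 32 features per frame (e.g., 28 prosodic + 4 phonetic),
    # a 64-dim hidden state, 6 emotion classes. Weights are random placeholders.
    D_IN, D_STATE, N_CLASSES = 32, 64, 6
    rng = np.random.default_rng(0)
    A = rng.normal(scale=0.05, size=(D_STATE, D_STATE))   # state transition
    B = rng.normal(scale=0.1, size=(D_STATE, D_IN))       # input projection
    C = rng.normal(scale=0.1, size=(N_CLASSES, D_STATE))  # readout to logits

    def ssm_classify(frames):
        """Linear SSM recurrence over per-frame features: h_t = A h_{t-1} + B x_t."""
        h = np.zeros(D_STATE)
        for x in frames:                   # frames: (T, D_IN) feature vectors
            h = A @ h + B @ x
        logits = C @ h                     # read out the final state
        return np.exp(logits) / np.exp(logits).sum()    # softmax over classes

    probs = ssm_classify(rng.normal(size=(200, D_IN)))   # ~2 s of 10 ms frames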