Data Pipeline
Meeting content flows from live capture through transcription, semantic chunking, and embedding before landing in the vector store. Each processing step sits behind a provider adapter, so swapping implementations is a configuration change, not a code change.
Ingestion Flow
The meeting bot and Job Runner operate independently. The bot captures audio and uploads to blob storage; the Job Runner handles all downstream processing asynchronously.
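The decoupling described above can be sketched as follows. This is a minimal in-memory illustration, not the actual implementation: the names BlobStore, JobQueue, bot_upload, and run_pending_jobs are hypothetical, the blob key layout is invented for the example, and the use of a queue to notify the Job Runner is an assumption (the source only states that the two components operate independently).

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class BlobStore:
    """Hypothetical stand-in for blob storage."""
    blobs: dict[str, bytes] = field(default_factory=dict)

    def put(self, key: str, data: bytes) -> None:
        self.blobs[key] = data

    def get(self, key: str) -> bytes:
        return self.blobs[key]


@dataclass
class JobQueue:
    """Hypothetical stand-in for whatever signals new uploads to the Job Runner."""
    pending: deque = field(default_factory=deque)

    def enqueue(self, blob_key: str) -> None:
        self.pending.append(blob_key)

    def dequeue(self) -> Optional[str]:
        return self.pending.popleft() if self.pending else None


def bot_upload(store: BlobStore, queue: JobQueue, meeting_id: str, audio: bytes) -> str:
    """Bot side: store the raw audio and record that work is pending. Nothing else."""
    key = f"meetings/{meeting_id}/audio.webm"  # assumed key layout
    store.put(key, audio)
    queue.enqueue(key)
    return key


def run_pending_jobs(store: BlobStore, queue: JobQueue,
                     process: Callable[[str, bytes], object]) -> list:
    """Job Runner side: drain pending uploads and run downstream processing."""
    results = []
    while (key := queue.dequeue()) is not None:
        results.append(process(key, store.get(key)))
    return results
```

The bot never calls the processing code directly, so a slow or failed downstream job cannot block capture.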
Transcription Provider Strategy
Transcription is a discrete pipeline step. The meeting bot delivers raw audio (WebM) to blob storage; the Job Runner picks it up and dispatches it to the configured transcription provider. The provider's interface is abstracted behind an adapter.
At these price points, transcription output approximates what commercial off-the-shelf (COTS) providers such as Fireflies and MeetGeek deliver. Accuracy can be tuned upward, but returns diminish quickly and costs rise steeply.
Cheapest option with strong accuracy
Nano tier available for lower cost when top accuracy isn't needed
Build the provider adapter from day one. Swapping transcription providers should be a configuration change, not a code change.
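One way to make the provider a configuration choice is a small interface plus a registry keyed by a config value. The sketch below is illustrative only: TranscriptionProvider, EchoTranscriber, PROVIDERS, load_transcriber, and the transcription_provider config key are all hypothetical names, and EchoTranscriber is a placeholder rather than a real vendor integration.

```python
from typing import Callable, Protocol


class TranscriptionProvider(Protocol):
    """Interface every transcription adapter is assumed to satisfy."""

    def transcribe(self, audio: bytes, mime_type: str) -> str: ...


class EchoTranscriber:
    """Placeholder adapter for local testing; a real adapter would call a vendor API."""

    def transcribe(self, audio: bytes, mime_type: str) -> str:
        return f"[{len(audio)} bytes of {mime_type}]"


# Registry of available adapters. Adding a provider means adding one entry here.
PROVIDERS: dict[str, Callable[[], TranscriptionProvider]] = {
    "echo": EchoTranscriber,
}


def load_transcriber(config: dict) -> TranscriptionProvider:
    """Pick the adapter named in configuration (hypothetical key name)."""
    name = config["transcription_provider"]
    try:
        factory = PROVIDERS[name]
    except KeyError:
        raise ValueError(f"unknown transcription provider: {name!r}") from None
    return factory()
```

The Job Runner would depend only on TranscriptionProvider, so changing the config value changes the vendor without touching pipeline code.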
Embedding Model Strategy
The embedding model is separate from the reasoning model used at query time. It converts semantic chunks into dense vectors for storage in Qdrant. Two strong candidates exist, each with different trade-offs.
OpenAI: generally considered the highest-quality general-purpose embedding model
Voyage: Anthropic has optimized Claude to work particularly well with Voyage embeddings
Which combination performs best is a moving target. OpenAI's embeddings tend to be higher quality in isolation, but Claude's optimization for Voyage means retrieval quality can be better in the full Claude + Voyage pipeline.
Build the embedding provider behind an adapter from day one. Swapping models should be a configuration change.
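An embedding adapter likely needs to expose more than an embed call: different models produce vectors of different sizes, so stored vectors should be traceable to the model that produced them. The sketch below assumes that design. EmbeddingProvider, HashEmbedder, EMBEDDERS, load_embedder, and the embedding_provider config key are hypothetical, and HashEmbedder is a deterministic toy with no semantic meaning, used only so the example runs without a vendor account.

```python
import hashlib
from dataclasses import dataclass
from typing import Protocol


class EmbeddingProvider(Protocol):
    """Interface every embedding adapter is assumed to satisfy."""
    model_id: str   # recorded alongside stored vectors
    dimension: int  # vector size; differs between models

    def embed(self, texts: list[str]) -> list[list[float]]: ...


@dataclass
class HashEmbedder:
    """Toy adapter: derives a fixed-size vector from a SHA-256 digest."""
    model_id: str = "hash-demo"
    dimension: int = 8

    def __post_init__(self) -> None:
        # A SHA-256 digest is 32 bytes, which caps this toy's vector size.
        if not 1 <= self.dimension <= 32:
            raise ValueError("dimension must be between 1 and 32")

    def embed(self, texts: list[str]) -> list[list[float]]:
        vectors = []
        for text in texts:
            digest = hashlib.sha256(text.encode("utf-8")).digest()
            vectors.append([b / 255.0 for b in digest[: self.dimension]])
        return vectors


EMBEDDERS = {"hash-demo": HashEmbedder}


def load_embedder(config: dict) -> EmbeddingProvider:
    """Pick the adapter named in configuration (hypothetical key name)."""
    name = config["embedding_provider"]
    if name not in EMBEDDERS:
        raise ValueError(f"unknown embedding provider: {name!r}")
    return EMBEDDERS[name]()
```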
We will almost certainly swap embedding models at least once. The ability to re-embed the entire corpus against a new model needs to exist before that day comes.
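A re-embedding job could look roughly like the sketch below: stream every stored chunk, embed it in batches with the new model, and write the results to a separate target so the live index stays untouched until the run completes. The function names, the (chunk_id, text) tuple shape, and the plain dict standing in for a new vector collection are all assumptions made for illustration; a real job would write to the vector store and add retries and progress tracking.

```python
from typing import Callable, Iterable, Iterator

# Any callable that maps a batch of texts to one vector per text.
Embed = Callable[[list[str]], list[list[float]]]
Chunk = tuple[str, str]  # (chunk_id, chunk_text), an assumed shape


def batched(items: Iterable[Chunk], size: int) -> Iterator[list[Chunk]]:
    """Yield successive lists of at most `size` chunks."""
    batch: list[Chunk] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def reembed_corpus(chunks: Iterable[Chunk], embed: Embed,
                   target: dict[str, list[float]], batch_size: int = 64) -> int:
    """Embed every chunk with the new model and write it to `target`.

    `target` stands in for a fresh collection; the existing one is not modified.
    Returns the number of vectors written.
    """
    written = 0
    for batch in batched(chunks, batch_size):
        ids = [chunk_id for chunk_id, _ in batch]
        vectors = embed([text for _, text in batch])
        for chunk_id, vector in zip(ids, vectors):
            target[chunk_id] = vector
            written += 1
    return written
```

Writing to a separate target means queries can keep using the old vectors during the run, and the cutover becomes a single switch once the count of written vectors matches the corpus size.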