Project

Automated Video-to-RAG Content Pipeline Using Object Storage & AI

Objective

Retrieval-Augmented Generation (RAG) systems have become a core pattern for building intelligent applications that use enterprise data for AI-assisted interpretation and question-answering. Traditional RAG pipelines primarily ingest text documents, PDF files, or web content, all of which can be vectorized directly. However, a growing share of enterprise knowledge now exists as video: training sessions, technical demos, customer support recordings, security briefings, and product walk-throughs. Much of the information in these videos is carried by the spoken audio track, but most RAG systems cannot use it directly because they have no way to extract, transcribe, and vectorize content from multimedia sources.

Outcome

This project aims to bridge that gap by designing and implementing an automated ingestion pipeline for video files stored in an open-source, S3-compatible object storage platform (e.g., Ceph, MinIO). When a user uploads a video object into a designated bucket, the system should automatically:

1. Detect that the uploaded file is a video, using metadata or content inspection.
2. Extract the audio stream from the video.
3. Convert the audio to text using an open-source speech-to-text model or API.
4. Chunk and vectorize the extracted transcript using an embedding model.
5. Store the resulting vectors in a vector database, making the video’s content available for RAG-based querying and chat interfaces.
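The steps above include three that can be illustrated in plain Java (the language listed under Tools-Technologies) without calling any external service: detecting a video, preparing the audio-extraction command, and chunking the transcript. This is a minimal sketch under stated assumptions, not a reference implementation. The class `VideoIngestSteps` and its methods are hypothetical; the set of accepted MP4 brands is a non-exhaustive assumption; the ffmpeg target of 16 kHz mono 16-bit PCM WAV is an assumed input format that many open-source speech-to-text models accept, not a project requirement; and the window size and overlap passed to `chunk` are placeholders to tune against the chosen embedding model.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper class: names are illustrative, not from any existing library.
class VideoIngestSteps {

    // Non-exhaustive ISO-BMFF "major brands" used by MP4/QuickTime video (assumption; tune as needed).
    private static final Set<String> VIDEO_BRANDS =
            Set.of("isom", "iso2", "mp41", "mp42", "avc1", "qt  ", "M4V ");

    /** Step 1: detect a video from the object's Content-Type, falling back to magic bytes. */
    static boolean isVideo(String contentType, byte[] header) {
        // Metadata check, e.g. "video/mp4" stored with the object at upload time.
        if (contentType != null && contentType.toLowerCase(Locale.ROOT).startsWith("video/")) {
            return true;
        }
        if (header == null) {
            return false;
        }
        // MP4/MOV: bytes 4..7 spell "ftyp" and bytes 8..11 hold the major brand.
        if (header.length >= 12
                && new String(header, 4, 4, StandardCharsets.US_ASCII).equals("ftyp")
                && VIDEO_BRANDS.contains(new String(header, 8, 4, StandardCharsets.US_ASCII))) {
            return true;
        }
        // Matroska/WebM: EBML magic number 1A 45 DF A3 (also used by audio-only .mka files).
        return header.length >= 4
                && (header[0] & 0xFF) == 0x1A && (header[1] & 0xFF) == 0x45
                && (header[2] & 0xFF) == 0xDF && (header[3] & 0xFF) == 0xA3;
    }

    /** Step 2: ffmpeg arguments that drop the video stream and write 16 kHz mono 16-bit PCM WAV. */
    static List<String> ffmpegAudioCommand(String inputPath, String outputWav) {
        return List.of("ffmpeg", "-y", "-i", inputPath,
                "-vn",               // no video in the output
                "-ac", "1",          // one audio channel
                "-ar", "16000",      // 16 kHz sample rate
                "-c:a", "pcm_s16le", // 16-bit little-endian PCM
                outputWav);
    }

    /** Step 4: split a transcript into windows of `size` words that overlap by `overlap` words. */
    static List<String> chunk(String transcript, int size, int overlap) {
        if (size <= 0 || overlap < 0 || overlap >= size) {
            throw new IllegalArgumentException("require size > 0 and 0 <= overlap < size");
        }
        List<String> chunks = new ArrayList<>();
        if (transcript == null || transcript.isBlank()) {
            return chunks;
        }
        String[] words = transcript.trim().split("\\s+");
        for (int start = 0; start < words.length; start += size - overlap) {
            int end = Math.min(start + size, words.length);
            chunks.add(String.join(" ", Arrays.copyOfRange(words, start, end)));
            if (end == words.length) {
                break; // last window reached; avoid a trailing chunk that is pure overlap
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        System.out.println(isVideo("video/mp4", null));
        System.out.println(ffmpegAudioCommand("demo.mp4", "demo.wav"));
        System.out.println(chunk("a b c d e f g", 3, 1));
    }
}
```

The returned argument list can be handed to `java.lang.ProcessBuilder` to run ffmpeg, which must be installed on the worker. Word-based windows are only a rough proxy for the token limits that embedding models impose, so a real system may prefer a tokenizer-aware splitter.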

Apply By Date	15 Jan 2026
Students	0 / 10
Duration	6 months
Mentor	Sarvesh S Patel

Tools-Technologies

Java, Watson APIs, Watson VR API, WatsonX.ai

Platform

1. WatsonX

College

1. Sanjivani College of Engineering

Sarvesh S Patel's Comment

Students will design the complete event-driven pipeline using open-source tools, agentic automation (optional n8n-based orchestration), and AI components. The final system should allow any video uploaded into the storage bucket to become searchable and usable by downstream RAG applications, effectively turning raw media into structured, AI-ready knowledge.
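The event-driven flow described above can be outlined as a Java skeleton. This is a minimal sketch, not a prescribed design: a handler receives one bucket-notification record, ignores anything that is not a newly created video, and then runs extract, transcribe, chunk, embed, and store in order. The `ObjectEvent` fields, the stage interfaces, and the in-memory fakes in `main` are all hypothetical; in a real deployment they would wrap ffmpeg, a speech-to-text model, an embedding model, and a vector-database client. The event-name check assumes S3-style notifications as sent by MinIO (`s3:ObjectCreated:Put`) or AWS (`ObjectCreated:Put`), and whether the content type arrives in the notification or needs a separate metadata lookup is assumed to depend on the storage platform.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical orchestration skeleton. Every stage is an interface so a real
// implementation (ffmpeg, a speech-to-text model, an embedding model, a vector
// database client) can be plugged in; none of these types come from a library.
class VideoToRagPipeline {

    /** Minimal view of one bucket-notification record (field choice is an assumption). */
    record ObjectEvent(String eventName, String bucket, String key, String contentType) {}

    /** One row for the vector database: source object, chunk position, text, embedding. */
    record VectorRecord(String sourceKey, int chunkIndex, String text, float[] vector) {}

    interface AudioExtractor { byte[] extract(String bucket, String key); }
    interface Transcriber { String transcribe(byte[] audio); }
    interface Embedder { float[] embed(String text); }
    interface VectorStore { void upsert(VectorRecord row); }

    private final AudioExtractor extractor;
    private final Transcriber transcriber;
    private final Function<String, List<String>> chunker;
    private final Embedder embedder;
    private final VectorStore store;

    VideoToRagPipeline(AudioExtractor extractor, Transcriber transcriber,
                       Function<String, List<String>> chunker,
                       Embedder embedder, VectorStore store) {
        this.extractor = extractor;
        this.transcriber = transcriber;
        this.chunker = chunker;
        this.embedder = embedder;
        this.store = store;
    }

    /** Handles one event; returns how many chunks were stored (0 if the event is skipped). */
    int onEvent(ObjectEvent event) {
        // MinIO names creation events "s3:ObjectCreated:Put"; AWS-style records omit "s3:".
        if (event.eventName() == null || !event.eventName().contains("ObjectCreated:")) {
            return 0;
        }
        // Only videos enter the pipeline; other uploads are ignored.
        if (event.contentType() == null || !event.contentType().startsWith("video/")) {
            return 0;
        }
        byte[] audio = extractor.extract(event.bucket(), event.key());
        String transcript = transcriber.transcribe(audio);
        List<String> chunks = chunker.apply(transcript);
        for (int i = 0; i < chunks.size(); i++) {
            String text = chunks.get(i);
            store.upsert(new VectorRecord(event.key(), i, text, embedder.embed(text)));
        }
        return chunks.size();
    }

    public static void main(String[] args) {
        List<VectorRecord> db = new ArrayList<>(); // stands in for a vector database
        VideoToRagPipeline pipeline = new VideoToRagPipeline(
                (bucket, key) -> new byte[0],                          // fake audio
                audio -> "Welcome to the demo. Today we cover setup.",  // fake transcript
                text -> List.of(text.split("\\.\\s*")),                // sentence chunks
                text -> new float[] {text.length()},                   // fake 1-d embedding
                db::add);
        int stored = pipeline.onEvent(new ObjectEvent(
                "s3:ObjectCreated:Put", "videos", "demo.mp4", "video/mp4"));
        System.out.println("stored " + stored + " chunks");
        db.forEach(r -> System.out.println(r.chunkIndex() + ": " + r.text()));
    }
}
```

Keeping every stage behind an interface lets the same handler be triggered from a webhook, a message queue, or an n8n workflow, and lets each stage be tested in isolation with fakes, as `main` does here.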