Akshat2512/AI-Voice-Assistant


AI Assistant Overview

I successfully developed and deployed a sophisticated AI chatbot application using FastAPI and Azure Cloud Platform. The chatbot leverages advanced AI models to provide a rich, interactive user experience.

Technologies Used

  • FastAPI: Handles continuous audio streams efficiently on the server side.
  • HTML, CSS, and JavaScript: Build the web-based UI for the chatbot and add styling and interactivity to the web app.
  • AI models: Generate responses to the user's text and voice input.

AI Models

  • Gemini 2.0 Flash: Utilized for generating natural language responses to user inputs and speech-to-text conversion, enabling voice interaction.
  • black-forest-labs/FLUX.1-schnell-Free: Employed for creating images from textual descriptions provided by users.

Features

  • Speech Recognition: Implemented using Gemini 2.0 Flash model to convert user speech to text accurately.
  • Text Response Generation: Integrated Google's Gemini model to handle text-based queries, ensuring conversational relevance and coherence.
  • Text-to-Image Generation: Leveraged the black-forest-labs FLUX.1-schnell model to transform user-provided text into high-quality images.
  • Continuous Audio Streaming: Used FastAPI to handle continuous streams of audio, enabling real-time processing and interaction on server side.
  • Memory Retention Feature: Implemented a chat memory management feature for each user, ensuring that the chat conversation is maintained in memory as long as the backend instance is running.
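
The continuous audio streaming feature above can be illustrated with a minimal buffering sketch. This is not the repository's actual code: the `AudioBuffer` class, the `chunk_size_bytes` helper, and the audio format (16 kHz, 16-bit mono PCM) are assumptions made for illustration; only the 100 ms chunk interval comes from the repository description. It shows how arbitrarily sized messages from a client might be regrouped into fixed-length chunks before being handed to a speech model.

```python
from dataclasses import dataclass, field

# Assumed audio format (not stated in the README): 16 kHz, 16-bit mono PCM.
SAMPLE_RATE = 16_000
BYTES_PER_SAMPLE = 2
CHUNK_MS = 100  # chunk interval mentioned in the repository description


def chunk_size_bytes(sample_rate=SAMPLE_RATE,
                     bytes_per_sample=BYTES_PER_SAMPLE,
                     chunk_ms=CHUNK_MS):
    """Return the number of bytes in one chunk of raw PCM audio."""
    return sample_rate * bytes_per_sample * chunk_ms // 1000


@dataclass
class AudioBuffer:
    """Hypothetical helper: collects byte messages, emits fixed-size chunks."""
    chunk_bytes: int = chunk_size_bytes()
    _pending: bytearray = field(default_factory=bytearray)

    def feed(self, data: bytes) -> list[bytes]:
        """Add incoming bytes and return every complete chunk now available."""
        self._pending.extend(data)
        chunks = []
        while len(self._pending) >= self.chunk_bytes:
            chunks.append(bytes(self._pending[:self.chunk_bytes]))
            del self._pending[:self.chunk_bytes]
        return chunks
```

In a FastAPI WebSocket handler, each message read with `await websocket.receive_bytes()` could be passed to `feed`, and every returned chunk forwarded to the speech-recognition step. Leftover bytes stay in the buffer until the next message arrives.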

Here’s a sneak peek of the frontend and the sample conversation between me and my assistant.

Image 1

Image 2

Image 3

Image 4

Image 5

Key Responsibilities and Achievements

Design and Development

  • Designed the architecture of the chatbot application, ensuring seamless integration of various AI models.
  • Developed the front-end using Jinja2 template library, creating an intuitive and user-friendly interface.

Integration of AI Models

  • Successfully integrated Gemini 2.0 Flash to handle text-based queries, ensuring conversational relevance and accurate speech recognition.
  • Implemented the black-forest-labs FLUX.1-schnell model for generating images based on user descriptions, enhancing the visual interaction capabilities of the chatbot.

Backend Management

  • Managed and optimized the backend processes to handle real-time user interactions efficiently.
  • Used FastAPI to handle continuous streams of audio, ensuring smooth and responsive audio processing.
  • Implemented a chat memory management feature that retains each conversation for as long as the backend instance is running.
  • Ensured smooth communication between the UI and the AI models, reducing latency and improving performance.
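
The chat memory management described above could be sketched roughly as follows. This is an illustrative assumption, not the project's implementation: the `ChatMemory` class, its method names, and the `max_turns` limit are hypothetical. The history lives in an ordinary in-process dictionary, which is why it survives only while the backend instance is running.

```python
from collections import defaultdict


class ChatMemory:
    """Hypothetical in-process, per-user chat history (lost on restart)."""

    def __init__(self, max_turns=50):
        self.max_turns = max_turns          # assumed cap, not from the README
        self._history = defaultdict(list)   # user_id -> list of turns

    def add(self, user_id, role, text):
        """Append one turn and drop the oldest turns beyond max_turns."""
        turns = self._history[user_id]
        turns.append({"role": role, "text": text})
        if len(turns) > self.max_turns:
            del turns[:len(turns) - self.max_turns]

    def get(self, user_id):
        """Return a copy of the user's history (empty list if unknown)."""
        return list(self._history.get(user_id, []))


memory = ChatMemory(max_turns=2)
memory.add("alice", "user", "hi")
memory.add("alice", "model", "hello")
memory.add("alice", "user", "draw a cat")
recent = [turn["text"] for turn in memory.get("alice")]
```

Keeping the store in memory avoids a database dependency, at the cost of losing all conversations whenever the server process restarts.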

Impact

  • Improved accessibility through voice interactions, making the application user-friendly.

Setup Instructions

  • Clone the Repository

Clone the repository using git:

     git clone https://github.com/Akshat2512/AI_Voice_Assistant.git 
     cd AI_Voice_Assistant # move to the root folder of the application

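If you want to create a separate virtual environment for Python (the activation command below is for Windows; on Linux or macOS, run source my_env/bin/activate instead):

     python -m venv my_env
     my_env\Scripts\activate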

Then install the required libraries:

     pip install -r requirements.txt

To start the application, launch the FastAPI server defined in app.py by running this command in the terminal:

    uvicorn app:app --host localhost --port 5000 --reload

Then go to http://localhost:5000.

About

Engineered a sophisticated voice assistant, integrated OpenAI models for AI-generated responses and image generation like ChatGPT, utilized real-time voice recognition to process audio chunks every 100ms.
