# 🧠 ALXR Agent: WebSocket Chat API
This project provides a real-time AI assistant powered by local LLMs and Retrieval-Augmented Generation (RAG).
It exposes a FastAPI WebSocket API that streams model responses chunk by chunk, supports product and document retrieval, and can be deployed locally or on cloud infrastructure.
## 🚀 Features
- 🔸 Real-time streaming responses through WebSocket
- 🪄 Local LLM support (GGUF via llama.cpp or Hugging Face Transformers)
- 📚 Integrated document & product retrieval (RAG)
- 🧭 Simple API key enforcement (optional; see the sketch after this list)
- ⚡ Fast cold-start through pre-warming of embeddings
- 🧩 CORS-enabled for frontend clients (e.g., React / Streamlit)
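The server code itself isn't shown in this README, but the optional key check is easy to picture. A minimal sketch, assuming the key arrives in the client's first JSON message (field names follow the example payload at the end of this README; `chat_ws` is an illustrative handler name, not necessarily the project's):

```python
import json
import os

from fastapi import FastAPI, WebSocket

app = FastAPI()
API_KEY = os.getenv("ALXR_API_KEY")  # unset -> enforcement disabled

@app.websocket("/v1/chat/ws")
async def chat_ws(ws: WebSocket):
    await ws.accept()
    first = json.loads(await ws.receive_text())
    # Close with "policy violation" if a key is configured but doesn't match.
    if API_KEY and first.get("api_key") != API_KEY:
        await ws.close(code=1008)
        return
    await ws.send_text("authenticated")  # placeholder for the chat loop
```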
## 📦 Tech Stack
- FastAPI – Web framework
- uvicorn – ASGI server
- llama.cpp / Hugging Face Transformers – Model inference
- ChromaDB – Vector database for RAG
- Python 3.10+
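A `requirements.txt` matching this stack might look like the sketch below; the exact package set (and whether `torch` or an embedding library such as `sentence-transformers` is also needed) depends on the project's code, so treat these entries as assumptions:

```text
fastapi
uvicorn[standard]
llama-cpp-python      # GGUF inference via llama.cpp
transformers          # Hugging Face model inference
chromadb              # vector store for RAG
python-dotenv         # loads the .env file described below
```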
## 🧰 Installation
1. **Clone the repository**

```bash
git clone https://github.com/your-username/alxr-agent.git
cd alxr-agent
```

2. **Install the dependencies**

```bash
pip install -r requirements.txt
```
3. **Set up environment variables**

Create a `.env` file in the root directory:

```bash
ALXR_MODEL_TYPE=gguf
ALXR_GGUF_MODEL_PATH=./Mistral-7B-Instruct-v0.3/mistral-7b-instruct-v0.3-Q5_K_M.gguf
ALXR_TRANSFORMERS_MODEL_PATH=./Mistral-7B-Instruct-v0.3
ALXR_CHROMA_PATH=./chroma_db
ALXR_DOC_COLLECTION=documents
ALXR_PRODUCT_COLLECTION=products
ALXR_API_KEY=your_secret_key_here # Optional
ALXR_HOST=0.0.0.0
ALXR_PORT=8080
# ==== Embeddings (LOCAL only) ====
ALXR_EMBED_PATH=./models/bge-m3
EMBED_BACKEND=local
HF_HUB_OFFLINE=1
TRANSFORMERS_OFFLINE=1
```
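The server presumably loads these variables at startup. A minimal sketch of that step using `python-dotenv` (the variable names match the `.env` above; the surrounding settings code is assumed, not taken from the project):

```python
import os

from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # reads .env from the working directory

MODEL_TYPE = os.getenv("ALXR_MODEL_TYPE", "gguf")
GGUF_MODEL_PATH = os.getenv("ALXR_GGUF_MODEL_PATH")
CHROMA_PATH = os.getenv("ALXR_CHROMA_PATH", "./chroma_db")
API_KEY = os.getenv("ALXR_API_KEY")  # None -> API key check disabled
HOST = os.getenv("ALXR_HOST", "0.0.0.0")
PORT = int(os.getenv("ALXR_PORT", "8080"))
```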
4. **Run the server**

```bash
python alxr_ws_server.py
```

or with uvicorn directly:

```bash
uvicorn alxr_ws_server:app --host 0.0.0.0 --port 8080
```
5. **Connect to the WebSocket endpoint**

```
ws://localhost:8080/v1/chat/ws
```

Example payload sent from the client after connecting:

```json
{
  "api_key": "your_secret_key_here",
  "message": "What is the price of CBD oil?",
  "history": []
}
```
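To see the chunk-by-chunk streaming end to end, a small test client can be written with the `websockets` package. This is a sketch under assumptions: the shape of the streamed frames (plain text chunks vs. JSON events) depends on the server implementation, which isn't shown in this README.

```python
import asyncio
import json

import websockets  # pip install websockets

async def main():
    async with websockets.connect("ws://localhost:8080/v1/chat/ws") as ws:
        await ws.send(json.dumps({
            "api_key": "your_secret_key_here",
            "message": "What is the price of CBD oil?",
            "history": [],
        }))
        # Print each streamed chunk as it arrives; iteration ends
        # when the server closes the connection.
        async for chunk in ws:
            print(chunk, end="", flush=True)

asyncio.run(main())
```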