The goal of this project is to implement a Retrieval-Augmented Generation (RAG) system that helps students at Brown University explore the courses offered in a semester and understand their degree requirements by combining two sources: the Brown University Bulletin (concentration/degree requirement pages) and CAB (Courses @ Brown, the course catalog).
A FastAPI backend handles data processing and serves queries, and a web UI (ui.py) provides an interactive interface for users.
Note: Currently, the concentration data covers only undergraduate programs, and the course data covers only the Fall 2025 semester, but both can be extended to other programs and semesters.
1) Data Acquisition
bulletin.py: Scrapes Bulletin concentration pages concurrently using crawl4ai, then processes, cleans, and organizes the data and writes it to files/bulletin.json (a rough sketch of this step is shown below)
cab.py: Queries the CAB API for all available departments in parallel for the Fall 2025 term, then processes, cleans, and organizes the results and writes them to files/cab.json
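The scraping step in bulletin.py could look roughly like the following sketch. It is only an illustration of crawl4ai's async crawler; the URL list and output handling are placeholders, not the repository's actual code.

```python
import asyncio
import json
from crawl4ai import AsyncWebCrawler

# Placeholder list; bulletin.py builds the real set of concentration page URLs.
URLS = ["https://bulletin.brown.edu/the-college/concentrations/..."]

async def scrape_all(urls):
    async with AsyncWebCrawler() as crawler:
        # Crawl all concentration pages concurrently.
        results = await asyncio.gather(*(crawler.arun(url=u) for u in urls))
    # Keep the page markdown; cleaning and organizing happen afterwards.
    return {r.url: str(r.markdown) for r in results}

if __name__ == "__main__":
    pages = asyncio.run(scrape_all(URLS))
    with open("files/bulletin.json", "w") as f:
        json.dump(pages, f, indent=2)
```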
2) Indexing and Vectorization

indexing.py: Reads the data in files/bulletin.json and files/cab.json, splits it into chunks, and embeds the chunks with the selected embedding model (e.g., text-embedding-3-large); see the sketch below
vector_store.pkl: Stores human-readable chunks and metadata used by the RAG runtime
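In outline, the indexing step might look like the sketch below, assuming OpenAI embeddings stored in a persistent Chroma collection. The chunk construction, field names, and collection name here are illustrative, not the exact ones used in indexing.py.

```python
import json
import chromadb
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
chroma = chromadb.PersistentClient(path="chroma_db")  # illustrative path
collection = chroma.get_or_create_collection("text-embedding-3-large")

with open("files/cab.json") as f:
    courses = json.load(f)  # structure simplified for this sketch

for i, course in enumerate(courses):
    # Hypothetical fields; the real chunking logic lives in indexing.py.
    text = f"{course['code']}: {course['title']}\n{course['description']}"
    emb = client.embeddings.create(model="text-embedding-3-large", input=text)
    collection.add(
        ids=[f"cab-{i}"],
        documents=[text],
        embeddings=[emb.data[0].embedding],
        metadatas=[{"department": course["department"]}],  # used later for filtering
    )
```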
3) Retrieval and Generation

rag.py: Core RAG class with methods:
load: Loads the persisted vector store and selects the appropriate collection based on the chosen embedding model
retrieve: Filters the chunks in the vector store based on the department and/or concentration specified by the user, retrieves the chunks most relevant to the user query with the selected embedding model, and (optionally) reranks them with a CrossEncoder model (see the sketch after this list)
generate: Calls ChatOpenAI to produce the final answer based on the user query and the retrieved context
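In simplified form, retrieve can be pictured as a metadata-filtered Chroma query followed by optional CrossEncoder reranking, as in this sketch (function and field names are illustrative, not the actual implementation):

```python
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("BAAI/bge-reranker-base")

def retrieve(collection, embed_fn, query, department=None, top_k=20, rerank_top_n=None):
    # Restrict the search to the department chosen in the UI, if any.
    where = {"department": department} if department else None
    hits = collection.query(query_embeddings=[embed_fn(query)], n_results=top_k, where=where)
    docs = hits["documents"][0]
    if rerank_top_n:
        # Score each (query, chunk) pair with the cross-encoder and keep the best ones.
        scores = reranker.predict([(query, d) for d in docs])
        docs = [d for _, d in sorted(zip(scores, docs), reverse=True)][:rerank_top_n]
    return docs
```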
4) Serving

api.py: Initializes and caches a RAG instance per embedding backend (via rag_instances and get_or_create_rag) so models and Chroma collections load once and are reused; this pattern is sketched after the file list below. It lets clients switch embedding backends per request without reloads and keeps both instances warm and ready for queries. Serves /query and /evaluate, logs requests/responses, and can precompute evaluation summaries at startup.
/query: Retrieves relevant chunks from the vector store, (optionally) reranks them, and generates an answer
ui.py: Allows users to ask questions about different courses and degree requirements

In addition:
models.py: Includes the Pydantic request/response models used by the API
startup.sh: Orchestrates extraction (if missing) and indexing (if missing), then starts the API and UI with health checks
docker-compose.yml and Dockerfile: Containerize the full stack and run both services together
deploy-aws.sh: Sets up Docker and Docker Compose on a fresh Ubuntu EC2 instance
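The per-backend caching in api.py can be pictured roughly as below; rag_instances and get_or_create_rag are the names mentioned above, while the RAG constructor arguments are assumed.

```python
from rag import RAG  # the core RAG class from rag.py

rag_instances: dict[str, RAG] = {}

def get_or_create_rag(embedding_model: str) -> RAG:
    # Build and load a pipeline the first time a backend is requested,
    # then reuse the same warm instance for every later request.
    if embedding_model not in rag_instances:
        rag = RAG(embedding_model=embedding_model)  # constructor signature is assumed
        rag.load()  # loads the persisted vector store / Chroma collection
        rag_instances[embedding_model] = rag
    return rag_instances[embedding_model]
```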
1) Bi-Encoder Embeddings

text-embedding-3-large
2) Vector Database for Indexing and Retrieval
ChromaDB: Supports metadata filtering with a where clause, has a simple Python API, works well for small/medium datasets, is self-contained, easy to dockerize, and requires no extra dependencies.

3) Cross-Encoder Reranking for Retrieval
BAAI/bge-reranker-base
4) Generator
gpt-4o-mini
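Since generate calls ChatOpenAI with gpt-4o-mini, the generation step can be sketched as follows; the prompt wording and function signature are illustrative.

```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

def generate(query: str, context_chunks: list[str]) -> str:
    # Ground the answer in the retrieved chunks; this prompt is only an example.
    context = "\n\n".join(context_chunks)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
    return llm.invoke(prompt).content
```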
Currently, the pipeline performs well for most questions. Wrong answers usually occur when the question is too short, when it contains only course codes or technical terms (e.g., APMA 2230, CSCI 0320, ECON 2950) without further context, and/or when the wrong department or concentration is selected in the UI. To address the poor performance on such queries, I tried integrating a sparse retriever (BM25) alongside the dense retriever and re-ranking the combined retrieved chunks with a Cross Encoder, but this did not help much, so I removed it.
In addition, the retrieval + generation process usually takes 2-5 seconds, though it can occasionally take up to 30 seconds for the chunks to be retrieved from the vector store and an answer to be generated. To reduce latency, the Cross Encoder step is currently disabled, but it can be re-enabled by specifying the rerank_top_n and rerank_min_score parameters in the rag.retrieve() call in api.py.
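For example, re-enabling the reranker might look like this; only rerank_top_n and rerank_min_score come from the description above, the other names and values are guesses.

```python
from api import get_or_create_rag  # names from this README; exact signatures are assumed

rag = get_or_create_rag("text-embedding-3-large")
chunks = rag.retrieve(
    query="What are the prerequisites for CSCI 0320?",
    department="CSCI",      # placeholder filter value
    rerank_top_n=5,         # keep the 5 best chunks after cross-encoder scoring
    rerank_min_score=0.3,   # drop chunks the reranker scores below this threshold
)
```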
Although a significant share of the time is spent on text generation, caching is still an important feature to add, and it will be integrated in the next steps, since the same or similar questions may be asked by hundreds or thousands of students. A FAISS vector database could also be tried in place of ChromaDB to check whether retrieval time drops significantly.
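One possible shape for the planned caching layer, keying answers by the normalized question plus the selected filters (a sketch of the idea only; the rag.retrieve/rag.generate signatures are assumed):

```python
# Identical questions with the same filters reuse the stored answer
# instead of re-running retrieval and generation.
answer_cache: dict[tuple[str, str | None, str | None], str] = {}

def answer_with_cache(rag, question: str, department=None, concentration=None) -> str:
    key = (question.strip().lower(), department, concentration)
    if key not in answer_cache:
        chunks = rag.retrieve(query=question, department=department, concentration=concentration)
        answer_cache[key] = rag.generate(query=question, context=chunks)
    return answer_cache[key]
```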
1) Create .env
Copy the example environment file and edit it:
cp env.example .env
Open .env and set OPENAI_API_KEY and API_TOKEN
2) Build and Run
docker-compose up --build
3) Access
UI: http://localhost:8501
API docs: http://localhost:8000/docs

First run notes: the first run performs data extraction and indexing if the data files and vector store are not already present (see startup.sh), so it takes longer than subsequent runs.
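To sanity-check the API, a request against /query could look roughly like this; the JSON field names and auth header format are illustrative (the real schema lives in models.py), and the token is the API_TOKEN from .env.

```python
import os
import requests

resp = requests.post(
    "http://localhost:8000/query",
    headers={"Authorization": f"Bearer {os.environ['API_TOKEN']}"},  # auth scheme is assumed
    json={
        "query": "Which courses count toward the Computer Science concentration?",  # field names are assumed
        "department": "CSCI",
    },
    timeout=60,
)
print(resp.json())
```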
1) Create EC2 Instance
2) Connect and Prepare the Machine
ssh -i your-key.pem ubuntu@your-public-ip
sudo apt update && sudo apt upgrade -y
sudo apt install -y docker.io git
sudo systemctl enable --now docker
sudo usermod -aG docker ubuntu
# Install Docker Compose
sudo curl -L "https://github.com/docker/compose/releases/latest/download/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose
# Log out and back in so the docker group membership takes effect
exit
ssh -i your-key.pem ubuntu@your-public-ip
3) Deploy Code
Clone your repository and configure environment:
git clone https://github.com/ozyurtf/brown-assistant.git
cd brown-assistant
cp env.example .env
nano .env # set OPENAI_API_KEY and any other variables
4) Launch Services
# Build and start the services in the background
docker-compose up -d --build

# Or run in the foreground to watch the logs
docker-compose up
5) Access
UI: http://your-public-ip:8501
API docs: http://your-public-ip:8000/docs