The goal of this project is to implement a Retrieval-Augmented Generation (RAG) system that helps students at Brown University explore the courses offered in a given semester and understand their degree requirements by combining two sources: the Bulletin (concentration requirements) and CAB (course listings).
A FastAPI backend handles data processing and serves queries, and a Streamlit user interface lets users interact with the system.
Note: Currently, the concentration data covers undergraduate programs only, and course information is available only for the Fall 2025 semester. The pipeline can be extended to other programs and semesters.
1) Data Acquisition
ingestion/bulletin.py: Scrapes Bulletin concentration pages concurrently using crawl4ai, processes and cleans the markdown, and writes data/concentration.json.
ingestion/cab.py: Queries the CAB API for all departments in parallel (Fall 2025 term by default), normalizes the results, and writes data/cab.json.
2) Indexing and Vectorization
ingestion/indexing.py: Reads data/cab.json and data/concentration.json.
Embeds the text with the OpenAI embedding model (text-embedding-3-large).
Stores the vectors in a Chroma collection (courses_openai) along with metadata (source, department, concentration, term).
3) Retrieval and Generation
backend/rag.py: Core RAG class with methods:
load: Opens the persisted Chroma collection.
retrieve: Retrieves the most similar/relevant chunks across the vector store based on the user query, and (optionally) reranks them with a CrossEncoder model when rerank=True. Department, concentration, term, and source are stored as chunk metadata at indexing time and returned alongside results, but they are not used to pre-filter the search. Retrieval is pure semantic similarity over all Bulletin and CAB chunks.
generate: Calls ChatOpenAI to produce a streaming answer based on the user query and the retrieved context.
4) Serving
backend/api.py: Initializes a single RAG instance at startup so models and the Chroma collection load once and are reused across requests. Serves POST /query, which retrieves relevant chunks, optionally reranks them, and streams the generated answer back via Server-Sent Events. Logs each request to logs/records.txt.
frontend/ui.py: Streamlit interface that lets users ask questions about courses and degree requirements.
In addition,
backend/models.py includes the Pydantic request/response schemas used by the API.
scripts/startup.sh orchestrates extraction (if missing), indexing (if missing), then starts the API and UI with health checks.
docker-compose.yml and Dockerfile containerize the full stack and run both services together.
scripts/deploy-aws.sh sets up Docker and Docker Compose on a fresh Ubuntu EC2 instance.
1) Bi-Encoder Embeddings
text-embedding-3-large
2) Vector Database for Indexing and Retrieval
where clause, simple Python API, works well for small/medium datasets, self-contained, easy to dockerize, no extra dependencies.
3) Cross-Encoder Reranking for Retrieval
BAAI/bge-reranker-base
4) Generator
gpt-4o-mini
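The retrieve-then-rerank design listed above can be illustrated with toy data. This is a minimal, dependency-free sketch and not the code in backend/rag.py: the three-dimensional vectors are hand-written stand-ins for text-embedding-3-large embeddings, the chunk texts are made up, and `overlap_score` is a hypothetical placeholder for a cross-encoder such as BAAI/bge-reranker-base.

```python
import math
from typing import Callable, Optional


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(
    query_vec: list[float],
    query_text: str,
    chunks: list[dict],
    k: int = 2,
    rerank_fn: Optional[Callable[[str, str], float]] = None,
) -> list[dict]:
    """Stage 1: rank all chunks by vector similarity and keep the top k.
    Stage 2 (optional): reorder those k chunks with a pairwise scorer."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c["vector"]), reverse=True)
    top = ranked[:k]
    if rerank_fn is not None:
        top = sorted(top, key=lambda c: rerank_fn(query_text, c["text"]), reverse=True)
    return top


def overlap_score(query: str, text: str) -> float:
    """Hypothetical stand-in for a cross-encoder: counts shared lowercase tokens."""
    return float(len(set(query.lower().split()) & set(text.lower().split())))


# Toy chunks; metadata is carried along but never used to filter the search.
CHUNKS = [
    {"text": "CSCI 0320 Introduction to Software Engineering",
     "vector": [0.9, 0.1, 0.0],
     "metadata": {"source": "cab", "department": "CSCI"}},
    {"text": "Computer Science concentration requirements",
     "vector": [0.8, 0.2, 0.1],
     "metadata": {"source": "bulletin", "department": "CSCI"}},
    {"text": "ECON 2950 Independent Research",
     "vector": [0.0, 0.1, 0.9],
     "metadata": {"source": "cab", "department": "ECON"}},
]

query_text = "CSCI 0320 software engineering"
query_vec = [0.85, 0.2, 0.05]  # made-up embedding for the query

dense_only = retrieve(query_vec, query_text, CHUNKS, k=2)
reranked = retrieve(query_vec, query_text, CHUNKS, k=2, rerank_fn=overlap_score)

print([c["text"] for c in dense_only])
print([c["text"] for c in reranked])
```

In this toy setup the second stage only reorders the candidates that the first stage already selected, which is why reranking adds latency without widening the search.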
Currently, the pipeline performs well for most questions. Wrong answers usually occur when the question is too short or consists only of course codes (e.g., APMA 2230, CSCI 0320, ECON 2950) without context. Because retrieval runs over the entire collection (Bulletin + CAB), very short or vocabulary-overlapping queries can pull in chunks from unintended departments. To address the poor performance on course codes, I tried combining a sparse retriever (BM25) with the dense retriever and then reranking the merged chunks with a Cross Encoder, but this did not help much, so I removed it. Another option was to add controls to the user interface for metadata pre-filtering by department or concentration, but I wanted to keep the interface as simple as possible for users, so I did not do this.
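For readers unfamiliar with BM25, the sketch below shows how a sparse scorer treats a course-code query. It is a generic Okapi BM25 implementation with the commonly used defaults k1=1.5 and b=0.75, written for illustration only; the BM25 component that was tried and removed from this project is not shown here, and the sample documents are made up.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase whitespace tokenization (a deliberate simplification)."""
    return text.lower().split()


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Return one Okapi BM25 score per document for the given query."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs

    # Document frequency: in how many documents each term appears.
    doc_freq: Counter = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0)
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


DOCS = [
    "CSCI 0320 Introduction to Software Engineering",
    "APMA 2230 Partial Differential Equations",
    "ECON 2950 Independent Research",
    "CSCI 1430 Computer Vision",
]

scores = bm25_scores("CSCI 0320", DOCS)
best = max(range(len(DOCS)), key=lambda i: scores[i])
print(DOCS[best])
```

Exact token matches drive the score, so a bare course code ranks its own listing first and gives zero weight to documents that share no tokens with the query.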
In addition, the retrieval and generation process usually takes between 1 and 5 seconds. To reduce latency, the Cross Encoder reranker is disabled by default. It can be enabled either per request by sending "rerank": true in the /query payload, or globally by changing the rerank argument in the rag.retrieve() call inside backend/api.py.
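A per-request override might look like the client sketch below, which uses only the standard library. Several details are assumptions rather than facts from this repository: the question field is called "query" here (the real name is defined in backend/models.py), the API token is assumed to be sent as a Bearer header, and each Server-Sent Event is assumed to carry a piece of the answer in its data field. `build_query_request`, `iter_sse_data`, and `stream_answer` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Iterable, Iterator, Optional


def build_query_request(
    question: str,
    rerank: bool = False,
    base_url: str = "http://localhost:8000",
    token: Optional[str] = None,
) -> urllib.request.Request:
    """Build a POST /query request; set rerank=True to enable the reranker."""
    payload = {"query": question, "rerank": rerank}  # "query" is an assumed field name
    headers = {"Content-Type": "application/json", "Accept": "text/event-stream"}
    if token:
        headers["Authorization"] = f"Bearer {token}"  # assumed auth scheme
    return urllib.request.Request(
        f"{base_url}/query",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def iter_sse_data(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield the payload of each 'data:' line from a Server-Sent Events stream."""
    for raw in lines:
        line = raw.decode("utf-8").rstrip("\r\n")
        if line.startswith("data:"):
            value = line[len("data:"):]
            # Per the SSE format, one optional space follows the colon.
            yield value[1:] if value.startswith(" ") else value


def stream_answer(question: str, rerank: bool = False, token: Optional[str] = None) -> None:
    """Send the request and print streamed chunks (requires the API to be running)."""
    request = build_query_request(question, rerank=rerank, token=token)
    with urllib.request.urlopen(request) as response:
        for chunk in iter_sse_data(response):
            print(chunk, end="", flush=True)


# Build a request with the reranker enabled, without sending it.
req = build_query_request("What are the prerequisites for CSCI 0320?", rerank=True)
print(req.full_url, json.loads(req.data))
```

Calling `stream_answer(..., rerank=True)` against a running instance would trade some latency for the reranked ordering on that request only.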
1) Create .env
Copy the example environment file and edit it:
cp env.example .env
Open .env and set OPENAI_API_KEY and API_TOKEN
2) Build and Run
docker-compose up --build
3) Access
UI: http://localhost:8501
API docs: http://localhost:8000/docs
First run notes
1) Create EC2 Instance
2) Connect and Prepare the Machine
ssh -i your-key.pem ubuntu@your-public-ip
sudo apt update && sudo apt upgrade -y
sudo apt install -y docker.io git
sudo systemctl enable --now docker
sudo usermod -aG docker ubuntu
# Install Docker Compose
sudo curl -L "https://github.com/docker/compose/releases/latest/download/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose
exit
ssh -i your-key.pem ubuntu@your-public-ip
3) Deploy Code
Clone the repository and configure the environment:
git clone https://github.com/ozyurtf/brown-assistant.git
cd brown-assistant
cp env.example .env
nano .env # set OPENAI_API_KEY and any other variables
4) Launch Services
docker-compose up -d --build
docker-compose up
5) Access
UI: http://your-public-ip:8501
API docs: http://your-public-ip:8000/docs