nathaphat.net

frontend

netwrxk is built using next.js for the frontend

scraping pipeline

our scraping pipeline starts with browser-use, which gathers the names of professors for a given university and department. we send those names to SERP API to retrieve candidate websites, then scrape those sites with both LLM and non-LLM methods (following robots.txt, of course!) to gather information on each professor. finally, we use LLMs to summarize the data into JSON, which is stored in R2 on Cloudflare.
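as a rough illustration of the robots.txt step, here is a minimal sketch using python's standard urllib.robotparser. the user agent name, the sample robots.txt, and the example URLs are all made up for this example, and the real pipeline may well do this check differently.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse the text of a robots.txt file that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed_urls(urls, robots_txt, user_agent="netwrxk-bot"):
    """Keep only the URLs the given user agent is allowed to fetch.

    "netwrxk-bot" is a hypothetical user agent name used for illustration.
    """
    parser = build_parser(robots_txt)
    return [url for url in urls if parser.can_fetch(user_agent, url)]


# Hypothetical robots.txt for a department site.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

# Hypothetical candidate URLs, e.g. as returned by the search step.
urls = [
    "https://cs.example.edu/people/faculty",
    "https://cs.example.edu/private/directory",
]

print(allowed_urls(urls, SAMPLE_ROBOTS))
```

in practice each site's robots.txt would be fetched once per host and cached, so every candidate URL from the search step can be filtered before any scraper (LLM or non-LLM) touches it.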

backend

we built the backend using python (fastapi).

API Framework: fastapi handles all incoming requests, routing, and response generation. CORS is configured to allow communication with the next.js frontend, and rate limiting is implemented with slowapi.
AI Orchestration & Language Models: langgraph is our central AI orchestration layer, managing complex conversational flows and interactions with the various AI services. Google's Gemini models (gemini 2.5 flash) are used for the chatbot, RAG, and creating vector embeddings. for external data retrieval beyond our scraped data, we use Exa AI for web searches.
retrieval augmented generation (RAG): our RAG pipeline is built with LlamaIndex and uses the following tools:
Vector Database: qdrant was originally used as the vector store (we have since transitioned to HelixDB).
Chat History Management: user conversation history is stored and managed using supabase.
auth & security: secure access to the API is enforced using JWT, ensuring that only authenticated users can access protected endpoints.
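
to make the JWT check concrete, here is a minimal standard-library sketch of what verifying a token involves: recompute the signature, compare it in constant time, then check the expiry. it assumes HS256 tokens and a shared secret, and the function names (sign_token, verify_token) are made up for this example. the actual backend most likely relies on a JWT library wired into fastapi rather than hand-rolled code like this.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs do."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Decode base64url text, restoring the stripped padding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_token(payload: dict, secret: bytes) -> str:
    """Create an HS256-signed token (only used here to produce test input)."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def verify_token(token: str, secret: bytes, now=None) -> dict:
    """Return the payload if the token is valid, otherwise raise ValueError."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
    except ValueError:
        raise ValueError("malformed token") from None

    header = json.loads(_b64url_decode(header_b64))
    if header.get("alg") != "HS256":
        raise ValueError("unexpected algorithm")

    expected = hmac.new(
        secret, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256
    ).digest()
    # compare_digest avoids leaking information through timing differences.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")

    payload = json.loads(_b64url_decode(payload_b64))
    current_time = time.time() if now is None else now
    if payload.get("exp", float("inf")) < current_time:
        raise ValueError("token expired")
    return payload


# Hypothetical secret and claims, for illustration only.
token = sign_token({"sub": "user-1", "exp": 2000}, b"demo-secret")
print(verify_token(token, b"demo-secret", now=1000)["sub"])
```

a protected endpoint would run a check like verify_token on the bearer token before doing any work, and return a 401 response when it raises.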