netwrxk
frontend
netwrxk is built using next.js for the frontend.
scraping pipeline
for our scraping pipeline we used browser-use as our initial step, to gather the names of professors in any university and department. we then send those names to a SERP API to retrieve the professors' websites, and we use both LLM and non-LLM methods to scrape those websites (ofc following robots.txt!) for information on each professor. finally, we use LLMs to summarize all the data into JSON, which is sent to our R2 storage on Cloudflare.
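
as a rough illustration of the robots.txt check mentioned above, here is a minimal sketch using python's standard `urllib.robotparser`. this is not our actual scraper code: the user agent name, the sample robots.txt, and the helper names `build_parser` and `allowed_urls` are all hypothetical, and a real run would download each site's robots.txt first.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical user agent name, used only for this sketch.
USER_AGENT = "netwrxk-scraper"


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse the text of a robots.txt file that was already downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed_urls(parser: RobotFileParser, urls, user_agent: str = USER_AGENT):
    """Keep only the URLs that the robots.txt rules let this agent fetch."""
    return [url for url in urls if parser.can_fetch(user_agent, url)]


# Made-up robots.txt content standing in for a real university site.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

parser = build_parser(SAMPLE_ROBOTS)
candidates = [
    "https://example.edu/faculty/jane-doe",
    "https://example.edu/private/admin",
]
print(allowed_urls(parser, candidates))
```

filtering the candidate URLs before any request is made keeps the check in one place, so both the LLM and non-LLM scraping paths can share it.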
backend
we built the backend using python (fastapi).
- API Framework: fastapi handles all incoming requests, routing, and response generation. CORS is configured to allow communication with the next.js frontend, and rate limiting is implemented using slowapi.
- AI Orchestration & Language Models: langgraph is our central AI orchestration layer, managing complex conversational flows and interactions with various AI services. Google's Gemini 2.5 Flash model is used for the chatbot, RAG, and creating vector embeddings. for external data retrieval beyond our scraped data, we use Exa AI to do web searches.
- retrieval augmented generation (RAG): we have a RAG pipeline built with LlamaIndex. we use the following tools:
  - Vector Database: qdrant is used as the vector store (transitioned to HelixDB).
  - Chat History Management: user conversation history is stored and managed using supabase.
- auth & security: secure access to the API is enforced using JWT, ensuring that only authenticated users can access protected endpoints.
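
to make the JWT check above concrete, here is a minimal standard-library sketch of signing and verifying a token. it assumes HS256 with a shared secret, which the write-up does not specify, and the helper names `sign_token` and `verify_token` are hypothetical. a real backend would more likely rely on an established JWT library than on hand-rolled code like this.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url_encode(data: bytes) -> str:
    """Base64url-encode without padding, as JWT segments require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Restore the stripped padding, then base64url-decode."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _signature(signing_input: str, secret: bytes) -> bytes:
    return hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()


def sign_token(claims: dict, secret: bytes) -> str:
    """Build an HS256 token: header.payload.signature."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url_encode(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    return f"{signing_input}.{_b64url_encode(_signature(signing_input, secret))}"


def verify_token(token: str, secret: bytes) -> Optional[dict]:
    """Return the claims if the token is valid and unexpired, else None."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        claims = json.loads(_b64url_decode(payload_b64))
        signature = _b64url_decode(signature_b64)
        expected = _signature(f"{header_b64}.{payload_b64}", secret)
    except ValueError:
        return None  # wrong segment count, bad base64, or bad JSON
    if not isinstance(header, dict) or not isinstance(claims, dict):
        return None
    if header.get("alg") != "HS256":
        return None  # refuse tokens that ask for another algorithm
    if not hmac.compare_digest(signature, expected):
        return None  # signature mismatch
    exp = claims.get("exp")
    if isinstance(exp, (int, float)) and time.time() >= exp:
        return None  # token has expired
    return claims


# Placeholder secret and claims, for illustration only.
secret = b"example-secret"
token = sign_token({"sub": "user-123", "exp": time.time() + 3600}, secret)
print(verify_token(token, secret) is not None)
```

a protected endpoint would run a check like `verify_token` on the bearer token of each request and reject the request when it returns `None`.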