Professor Starstuff ✨
~ An AI Chatbot ~
Bringing Astronomy to Life for Kids 🚀

📝 Introduction

"What if kids could have their very own AI-powered space professor, ready to answer all their astronomy questions, tell fascinating space stories, and even generate mini-podcasts about the cosmos?"

Meet Professor Starstuff, an AI chatbot that makes astronomy fun and interactive for kids! It combines:

  • 🌟 Natural Language Processing (NLP) to understand kids' questions.
  • 🌟 A vector-based knowledge system for retrieving accurate space facts.
  • 🌟 NASA's Image API to provide real space images and enhance learning.
  • 🌟 Podcast-style responses that turn complex topics into engaging storytelling.

With Professor Starstuff, kids don’t just learn about space - they explore it in a whole new way! 🚀🌠

📊 Dataset

Where’s the Data From?

The knowledge base is built from the transcripts of a provided set of YouTube videos.

How is the Data Retrieved?

Using the youtube_transcript_api library:

  • 📌 Fetch video IDs & titles
  • 📌 Extract transcripts from ~8 hours of content

Chunking for Better Searchability:

  • 📌 Text is split using RecursiveCharacterTextSplitter
  • 📌 Chunk size: 500 tokens
  • 📌 Overlap: 100 tokens (for better context retention)
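
To show what chunk size and overlap mean in practice, here is a minimal sketch of fixed-window chunking. It is a simplified stand-in, not RecursiveCharacterTextSplitter itself (which also tries to split at natural separators such as paragraph breaks), and it approximates tokens with whitespace-separated words. The helper name `chunk_words` is hypothetical.

```python
def chunk_words(text, chunk_size=500, overlap=100):
    """Split text into windows of chunk_size words that share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks
```

With these defaults, each chunk repeats the last 100 words of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk.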

Embedding & Storage:

  • 📌 Chunks are embedded and stored in ChromaDB
  • 📌 Metadata (e.g., video title) is stored with each chunk for improved retrieval

💻 Technical Summary

Architecture Overview

Professor Starstuff is powered by a LangGraph agent, using:

  • 🛠️ GPT-4 for decision-making (tool usage).
  • ⚡ GPT-3.5 Turbo for faster response time.
  • 🗂️ ChromaDB as a vector store for context-aware retrieval.
  • 📡 NASA Image API for real-time space images.
  • 🔊 OpenAI TTS for converting answers into spoken podcasts.

How It Works Together

  • ⁉️ User Input: The user’s message is passed to the agent.
  • ❓ Decision Phase (GPT-4):
    • ☑️ Determines whether the query is about astronomy or general conversation.
    • ☑️ If general, it takes a shortcut for a quick response.
  • ❓ Retrieval Phase (GPT-3.5 Turbo):
    • ☑️ ChromaDB (retrieves relevant space facts).
    • ☑️ NASA Image API (fetches related images).
    • ☑️ OpenAI TTS (generates a podcast-style response).
  • ‼️ Final Answer: The chatbot returns a text response, related images, and a topic teaser podcast based on the user’s query.
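
The flow above can be sketched in plain Python. This is an illustration of the routing idea only, not the project's LangGraph code: the classifier, chat model, and tools are passed in as ordinary callables, and every name here (`Answer`, `answer_query`, and the parameters) is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Answer:
    text: str
    images: List[str] = field(default_factory=list)
    teaser_audio: Optional[bytes] = None


def answer_query(query, is_astronomy, chat, retrieve, find_images, make_teaser):
    """Route a query: quick reply for small talk, full tool pipeline otherwise."""
    if not is_astronomy(query):            # decision phase
        return Answer(text=chat(query, context=[]))  # shortcut, no tools
    facts = retrieve(query)                # vector store lookup
    return Answer(
        text=chat(query, context=facts),   # response grounded in retrieved facts
        images=find_images(query),         # related space images
        teaser_audio=make_teaser(query),   # short teaser podcast
    )
```

Passing the models and tools in as arguments keeps the routing logic testable without any API keys: in a test, each callable can be replaced by a stub.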

Deployment & Infrastructure

  • 🟢 Django (Backend Framework):
    • Handles chatbot logic, API communication, and serves as the main application.
  • 🔵 SQLite (ChromaDB storage, embedded in Django):
    • Local vector database stored within the Django app, used for retrieval-augmented generation (RAG) without requiring an external database server.
  • 🔴 Redis (Cloud Memory Storage):
    • A fast, cloud-based key-value store, used to maintain conversation memory across user interactions, ensuring context persistence.
  • 🟣 Heroku (PaaS Hosting Platform):
    • Deploys and runs the full Django + ChromaDB backend, with Redis connected as a cloud service for optimized performance.
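
As one way to picture the Redis conversation memory, the sketch below keeps the most recent messages of each session in a Redis list. It is an assumption about how such memory could be stored, not the project's code: the key format, the `max_turns` limit, and both class names are hypothetical. `InMemoryClient` is a tiny stand-in so the example runs without a server; a real deployment would pass a redis-py client, which offers the same `rpush`, `ltrim`, and `lrange` commands.

```python
import json


class ConversationMemory:
    """Keep the last `max_turns` messages per session in a Redis-style list."""

    def __init__(self, client, max_turns=10):
        self.client = client      # e.g. a redis-py client in a real deployment
        self.max_turns = max_turns

    def append(self, session_id, role, content):
        key = f"chat:{session_id}"
        self.client.rpush(key, json.dumps({"role": role, "content": content}))
        self.client.ltrim(key, -self.max_turns, -1)  # drop the oldest messages

    def history(self, session_id):
        raw = self.client.lrange(f"chat:{session_id}", 0, -1)
        return [json.loads(message) for message in raw]


class InMemoryClient:
    """Stand-in implementing only the three list commands used above."""

    def __init__(self):
        self.data = {}

    def rpush(self, key, value):
        self.data.setdefault(key, []).append(value)

    def ltrim(self, key, start, end):
        self.data[key] = self.lrange(key, start, end)

    def lrange(self, key, start, end):
        items = self.data.get(key, [])
        stop = len(items) if end == -1 else end + 1  # Redis ranges are inclusive
        return items[start:stop]
```

Trimming on every write bounds how much history is sent back to the model, which keeps both memory use and prompt length predictable.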

🛠️ Tools & Technologies

Professor Starstuff leverages multiple tools to enhance user experience.

🧠 Vector Database

  • 🧩 ChromaDB stores and retrieves preprocessed astronomy facts.
  • 🧩 OpenAI text-embedding-3-large generates the text embeddings.
  • 🧩 Chunking Strategy: 500-token size, 100-token overlap.
  • 🧩 Retrieval Strategy: Fetches top 3 most relevant chunks (k=3).
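
The k=3 retrieval step can be pictured with a small, dependency-free sketch. It ranks stored chunks by cosine similarity to the query embedding and returns the best three. This is for illustration only: ChromaDB does the ranking in the real system, the distance metric used by the project's collection is not stated, and the toy two-dimensional vectors below stand in for real embeddings. `cosine` and `top_k` are hypothetical helpers.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, store, k=3):
    """store: list of (vector, text, metadata) tuples. Return the k closest chunks."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[0]), reverse=True)
    return [(text, meta) for _, text, meta in ranked[:k]]


store = [
    ([1.0, 0.0], "Mars is called the Red Planet.", {"title": "Planets"}),
    ([0.0, 1.0], "Venus spins backwards.", {"title": "Planets"}),
    ([0.9, 0.1], "Mars has two small moons.", {"title": "Moons"}),
    ([0.5, 0.5], "Saturn has bright rings.", {"title": "Rings"}),
]
results = top_k([1.0, 0.0], store)
```

Returning the metadata alongside each chunk lets the answer mention which video a fact came from.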

πŸŽ™ Podcast AI Storytelling OpenAI Logo

OpenAI TTS is used to generate two types of audio responses:

  • 🔊 Teaser Podcast – A quick summary of the topic (default).
  • 🔊 On-Demand Podcast – Full episodes for deeper exploration.

🔭 NASA Images API

  • 🪐 An LLM extracts the main topic from the user’s query.
  • 🪐 It generates a relevant search term dynamically.
  • 🪐 The search term is sent to NASA's Image API to find relevant space images.
  • 🪐 Users get 10 images to shuffle and explore.
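
The last two steps might look like the sketch below, which builds a search URL for NASA's Image and Video Library and pulls up to 10 preview links out of a response. It makes no network call. The response layout (`collection` → `items` → `links` → `href`) is an assumption about the API's JSON, and both function names are hypothetical.

```python
from urllib.parse import urlencode

NASA_SEARCH_URL = "https://images-api.nasa.gov/search"


def build_search_url(term):
    """Build an image-only search URL for the given search term."""
    return f"{NASA_SEARCH_URL}?{urlencode({'q': term, 'media_type': 'image'})}"


def extract_image_links(payload, limit=10):
    """Collect at most `limit` image links, one per result item."""
    links = []
    for item in payload.get("collection", {}).get("items", []):
        if len(links) >= limit:
            break
        hrefs = [link["href"] for link in item.get("links", []) if link.get("href")]
        if hrefs:
            links.append(hrefs[0])  # first link of the item, assumed to be its preview
    return links
```

For the shuffle feature, the returned list could be reordered with `random.shuffle` before it is shown to the user.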

📈 Evaluation & Optimization

LangSmith Evaluation

  • 🦜 Inference time (overall speed of responses).
  • 🦜 Vector chunk retrieval efficiency (ensuring relevant facts are fetched).
  • 🦜 Tool retrieval efficiency (NASA Image API, Teaser Podcast, ChromaDB).
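
To make these metrics concrete, here is a small sketch that summarizes a batch of evaluation runs. It does not use the LangSmith API. The record format (`latency_s`, `expected_tools`, `called_tools`) and the function name are hypothetical, and "tool hit rate" here simply means the share of runs in which every expected tool was actually called.

```python
from statistics import mean


def summarize_runs(runs):
    """Return mean latency and the share of runs that called all expected tools."""
    hits = [set(r["expected_tools"]) <= set(r["called_tools"]) for r in runs]
    return {
        "mean_latency_s": mean(r["latency_s"] for r in runs),
        "tool_hit_rate": sum(hits) / len(runs),
    }


runs = [
    {"latency_s": 1.0, "expected_tools": ["chromadb", "nasa"], "called_tools": ["chromadb", "nasa", "tts"]},
    {"latency_s": 3.0, "expected_tools": ["chromadb"], "called_tools": []},
]
summary = summarize_runs(runs)
```

The function expects at least one run; an empty list raises an error rather than reporting a misleading average.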

Model Selection Optimization

  • ✅ Goal: Improve response speed without compromising quality.
  • 🧪 Experimented with different LLMs to optimize performance.
  • 📌 GPT-4 → 🏆 Best for decision-making (accurate classification).
  • 📌 GPT-3.5 Turbo → ⚡ Best for response generation (fast, effective).

✨ Next Steps for Professor Starstuff 🚀

  • 🚀 Expand Knowledge Base: Integrate more astronomy datasets and sources.
  • 🛰️ Implement Full Voice Interaction: Improve natural conversation flow.
  • 🔭 Increase Real-Time Data Access: Connect to live space events and news.
  • 🎙️ Implement Streaming: Enable faster podcast-style interactions.
  • 🌌 Make It More Interactive: Add quizzes and space learning features.
  • 🖼️ Discuss Retrieved Images: Provide in-depth explanations of space images.
  • 👤 User Profile Management: Personalize experiences with saved preferences.

🥳 School’s Out! 🎉

Thank You! 💙