[Demo video: chalktalk_demo_2.mp4, created from a simple Markdown file (chalktalk_features_demo.qmd). Click the volume (🔊) button at the bottom right of the video to hear the audio.]
Even as teaching tools have evolved from chalkboards to smartboards, and pencils to tablets, the gold standard of education—truly personalized tutoring—remains out of reach for most.
The recent revolution in Large Language Models (LLMs) may finally bridge this gap by adapting content to individual learning needs.
However, current AI models primarily generate text, while research shows video content can be more effective for learning [1].
The GRAPH Courses is developing ChalkTalk, an open-source tool that converts markdown text into dynamic videos with audio overlays and avatars. ChalkTalk empowers educators to create engaging video content efficiently, leveraging the simplicity of markdown for quick iterations and updates.
ChalkTalk's main function is simple: it uses text-to-speech to turn markdown files into videos (a minimal sketch of this pipeline appears after the list below). But with added AI modules, it becomes a powerful tool for creating personalized and engaging educational content:
- Personalization with LLMs: Since text is the primary medium in ChalkTalk, Large Language Models (LLMs) can quickly adjust lessons to fit each student's needs, adapting to different learning styles and levels (see the personalization sketch after this list).
- Custom Images with AI: Use text-to-image models to create custom media that enhance the material, providing helpful visuals to support learning (see the image-generation sketch below).
- Easy Diagrams with Markdown: Use markdown tools like Mermaid to make diagrams directly in your content, making it simple to illustrate complex ideas without extra software (a Mermaid example follows this list).
- Dynamic Visual Highlights: Apply computer vision models to highlight key parts of diagrams during explanations, helping students focus on important details.
- Instant Translation: Easily translate content into many languages, making educational materials globally accessible (the personalization sketch below shows the same pattern applied to translation).
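To make the core pipeline concrete, here is a minimal sketch of the markdown-to-video step. It assumes slide images have already been rendered (for example, by Quarto) and that each slide's narration text has been extracted; the function names and file layout are hypothetical, not ChalkTalk's actual code, though the Azure Speech SDK and MoviePy (1.x API) calls are standard.

```python
import azure.cognitiveservices.speech as speechsdk
from moviepy.editor import AudioFileClip, ImageClip, concatenate_videoclips

def narrate(text: str, out_wav: str, key: str, region: str) -> None:
    """Synthesize one slide's narration to a WAV file with Azure TTS."""
    config = speechsdk.SpeechConfig(subscription=key, region=region)
    config.speech_synthesis_voice_name = "en-US-JennyNeural"
    audio_out = speechsdk.audio.AudioOutputConfig(filename=out_wav)
    speechsdk.SpeechSynthesizer(
        speech_config=config, audio_config=audio_out
    ).speak_text_async(text).get()

def build_video(slides, out_mp4, key, region):
    """slides: list of (slide_image_path, narration_text) pairs."""
    clips = []
    for i, (image, text) in enumerate(slides):
        wav = f"narration_{i}.wav"
        narrate(text, wav, key, region)
        audio = AudioFileClip(wav)
        # Show each slide image for exactly as long as its narration lasts.
        clips.append(ImageClip(image).set_duration(audio.duration).set_audio(audio))
    concatenate_videoclips(clips).write_videofile(out_mp4, fps=24)
```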
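For the LLM personalization module, one plausible approach is to rewrite the lesson's markdown for a stated audience before rendering. The function and prompt below are illustrative only; the chat-completions call follows the current OpenAI Python SDK.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def personalize(lesson_md: str, audience: str) -> str:
    """Rewrite a markdown lesson for a given audience, keeping the slide structure."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system",
             "content": "Rewrite the user's markdown lesson for the stated audience. "
                        "Preserve headings, slide breaks, and code blocks exactly."},
            {"role": "user", "content": f"Audience: {audience}\n\n{lesson_md}"},
        ],
    )
    return response.choices[0].message.content

# e.g. personalize(open("lesson.qmd").read(), "high-school students new to statistics")
```

The same pattern covers the instant-translation module: swap the system prompt for a translation instruction and the rest of the pipeline is unchanged.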
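Similarly, the custom-images module could call a text-to-image API and drop the result into the markdown. The helper below is hypothetical; the images endpoint is the standard OpenAI SDK call.

```python
from openai import OpenAI

client = OpenAI()

def generate_figure(prompt: str) -> str:
    """Generate an illustrative image and return its URL (hypothetical helper)."""
    response = client.images.generate(
        model="dall-e-3",
        prompt=prompt,
        size="1024x1024",
        n=1,
    )
    return response.data[0].url

# e.g. url = generate_figure("A clean diagram of the water cycle for a middle-school lesson")
```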
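And because the source is plain markdown, diagrams need nothing beyond a standard Mermaid code block in the .qmd file, for example:

```mermaid
flowchart LR
    A[Markdown lesson] --> B[Text-to-speech narration]
    A --> C[Rendered slides]
    B --> D[Final video]
    C --> D
```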
Given the importance of this area, similar tools are being created, such as Synthesia, InVideo, and Heygen. However, most existing solutions have significant downsides for our use case:
- Lack of Markdown Support: These platforms do not support raw text or markdown inputs, limiting iteration speed and flexibility.
- Proprietary: Being closed-source, they require paid subscriptions, creating barriers for widespread adoption.
- Commercial Focus: They are primarily designed for sales and enterprise use cases like automated advertising and employee onboarding, and lack the ability to handle technical academic or educational content.
Automated video generation from markdown slides has been attempted before, notably with the Ari R package. However, this approach predates key technological advancements such as modern LLMs, Quarto, lifelike text-to-speech (TTS), and vision-language models, limiting its effectiveness.
The table below compares some of these products with what we aim to build with ChalkTalk.
| Feature | Synthesia | InVideo.io | Heygen | Ari R package | ChalkTalk (proposed) |
|---|---|---|---|---|---|
| Markdown support | ❌ | ❌ | ❌ | ✅ | ✅ |
| Open-source | ❌ | ❌ | ❌ | ✅ | ✅ |
| LLM integration for automated content generation and personalization | ❌ | ❌ | ❌ | ❌ | ✅ |
| Video avatars | ✅ | ❌ | ✅ | ❌ | ✅ |
| Multilingual support | ✅ | ✅ | ✅ | ✅ | ✅ |
| Computer vision for dynamic visual explanations (e.g., highlighting parts of diagrams) | ❌ | ❌ | ❌ | ❌ | ✅ |
| Hosted version available | ✅ | ✅ | ✅ | ❌ | ✅ |
We are currently seeking funding to support the development of the core ChalkTalk libraries and AI extensions, which will be open-source.
Since running TTS engines and LLMs locally can be technically challenging for individual users, we are considering offering a hosted version of ChalkTalk with a subscription model. This hosted service will provide an easy-to-use interface and manage all technical aspects, making it accessible to educators without technical expertise. Over time, this could sustain the development of the open-source libraries.
Run the app with `python app.py`. It assumes you have an Azure key for text-to-speech and an OpenAI key for LLM access stored in a `.env` file.
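A `.env` file along the following lines should work; the variable names here are placeholders, so check `app.py` for the exact keys it expects:

```
# Hypothetical variable names: confirm against app.py
AZURE_SPEECH_KEY=your-azure-speech-key
AZURE_SPEECH_REGION=your-azure-region
OPENAI_API_KEY=sk-...
```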