ArXFlix is a powerful tool that automatically transforms research papers from ArXiv into engaging two-minute video summaries. It leverages advanced AI models to extract key information, generate concise scripts, synthesize audio, and produce visually appealing videos complete with subtitles and rich content.
- Automated Paper Summarization:
  - Fetches paper content from ArXiv using either the `arxiv_gpt` or `arxiv_html` method.
  - Generates concise summaries using AI models such as OpenAI, Gemini, or local models.
- Script Generation:
  - Creates engaging video scripts tailored for a two-minute format.
  - Supports multiple script generation methods: `openai`, `local`, and `gemini`.
- Audio Synthesis:
  - Converts scripts into natural-sounding audio using either the `elevenlabs` or `lmnt` text-to-speech service.
- Video Generation:
  - Combines the generated audio, subtitles (SRT), and rich content (JSON) into a complete video.
  - Uses FFmpeg for video processing.
- Flexible API:
  - Provides a FastAPI backend with endpoints for each stage of the video generation pipeline.
  - Allows customization of AI models, audio services, and output formats.
- User-Friendly Frontend:
  - Offers a React-based frontend built with Next.js and Tailwind CSS.
  - Provides an intuitive interface for entering ArXiv paper IDs and generating videos.
- Gradio Demo:
  - Includes a Gradio demo (`arxflix_gradio.py`) for easy experimentation and sharing.
- Backend:
  - Python 3.9+
  - FFmpeg
  - pnpm
- Frontend:
  - Node.js
  - pnpm
1. Clone the repository:

   ```bash
   git clone https://github.com/julien-blanchon/arxflix.git
   cd arxflix/backend
   ```

2. Create and activate a virtual environment (recommended):

   ```bash
   python3 -m venv .venv
   source .venv/bin/activate   # Linux/macOS
   .venv\Scripts\activate      # Windows
   ```

3. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

4. Install FFmpeg and pnpm (if not already installed):

   ```bash
   # macOS
   brew install ffmpeg pnpm

   # Debian/Ubuntu (adjust for your distribution)
   sudo apt-get install ffmpeg pnpm
   ```

5. Run the backend server:

   ```bash
   uvicorn main:api --reload
   ```

   The backend server will be running at `http://localhost:8000`.
1. Navigate to the frontend directory:

   ```bash
   cd ../frontend
   ```

2. Install dependencies:

   ```bash
   pnpm install
   ```

3. Generate the API client:

   ```bash
   pnpm generate-client
   ```

4. Run the frontend development server:

   ```bash
   pnpm dev
   ```

   The frontend will be accessible at `http://localhost:3000` or `http://localhost:3001`.
1. From the root of the repository, install the required dependencies:

   ```bash
   pip install gradio
   ```

2. Run the Gradio demo:

   ```bash
   python arxflix_gradio.py
   ```

   This will launch a Gradio interface in your browser for easy interaction with the ArXFlix pipeline.
You can interact with the API directly using tools like `curl`, or through the frontend:

```bash
# Fetch paper content from ArXiv
curl -X GET "http://localhost:8000/generate_paper/?method=arxiv_html&paper_id=2404.02905"

# Generate a video script from the paper markdown
curl -X POST "http://localhost:8000/generate_script/?method=openai&paper_id=2404.02905&paper_markdown=<PAPER_MARKDOWN>" -H "Content-Type: application/json"

# Synthesize audio, subtitles, and rich content from the script
curl -X POST "http://localhost:8000/generate_assets/?method=elevenlabs&script=<SCRIPT>" -H "Content-Type: application/json"

# Assemble the final video
curl -X POST "http://localhost:8000/generate_video/?input_dir=<INPUT_DIR>&output_video=output.mp4" -H "Content-Type: application/json"
```

Note: Replace placeholders like `<PAPER_MARKDOWN>`, `<SCRIPT>`, and `<INPUT_DIR>` with actual values.
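The four `curl` calls above can also be chained programmatically. The sketch below is a minimal stdlib-only Python client, assuming the default backend address and the query parameters shown above; the `build_url`, `call`, and `run_pipeline` helper names are illustrative, not part of the project.

```python
from urllib import parse, request

BASE = "http://localhost:8000"  # default backend address from the setup steps

def build_url(path: str, params: dict) -> str:
    """Build a backend endpoint URL with query parameters."""
    return f"{BASE}{path}?{parse.urlencode(params)}"

def call(http_method: str, path: str, params: dict) -> str:
    """Send one request to the ArXFlix backend and return the response body."""
    req = request.Request(build_url(path, params), method=http_method,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return resp.read().decode()

def run_pipeline(paper_id: str, input_dir: str) -> None:
    """Chain the four pipeline stages (requires the backend to be running)."""
    paper = call("GET", "/generate_paper/",
                 {"method": "arxiv_html", "paper_id": paper_id})
    script = call("POST", "/generate_script/",
                  {"method": "openai", "paper_id": paper_id,
                   "paper_markdown": paper})
    call("POST", "/generate_assets/", {"method": "elevenlabs", "script": script})
    call("POST", "/generate_video/",
         {"input_dir": input_dir, "output_video": "output.mp4"})

print(build_url("/generate_paper/", {"method": "arxiv_html", "paper_id": "2404.02905"}))
```

Passing full paper markdown or a long script as a query parameter may hit URL length limits for large papers; the `curl` examples above share the same caveat.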
- API Keys: You'll need to set up API keys for services like OpenAI, ElevenLabs, etc. Store these in a `.env` file in the `backend` directory (see `backend/requirements.txt` for the required services).
- Customization: You can modify the script generation prompts, audio settings, and video processing parameters within the `backend/utils` files.
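As an illustration, such a `.env` file might look like the following. The variable names here are assumptions (check the backend code for the names it actually reads), and the values are placeholders:

```dotenv
# Hypothetical variable names -- verify against the backend code
OPENAI_API_KEY=your-openai-key
ELEVENLABS_API_KEY=your-elevenlabs-key
```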
Contributions are highly encouraged! Please follow these steps:

1. Fork the repository.
2. Create a new branch:

   ```bash
   git checkout -b feature/your-feature-name
   ```

3. Make your changes and commit them:

   ```bash
   git commit -m "Add your feature"
   ```

4. Push to the branch:

   ```bash
   git push origin feature/your-feature-name
   ```

5. Open a pull request against the `main` branch.
Please ensure your code follows the project's coding style and includes appropriate documentation.
This project is licensed under the MIT License - see the LICENSE file for details. Note that some components may have their own licenses (e.g., Remotion).