PromptOps

PromptOps is a local-first LLM automation system that mimics how humans operate a computer—via reasoning, vision, and keystrokes. It interprets natural language goals and executes them like a real user would, using keyboard inputs and visual feedback to interact with applications.


🚀 What It Does

PromptOps takes natural language prompts, plans the necessary steps, and simulates human-like actions—typing, scrolling, reading screen content—to execute the task on a desktop autonomously.
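
For illustration, a prompt like "Write a three-line summary in Notepad" might be decomposed into a plan along these lines (the field names below are illustrative, not the exact schema PlannerAgent emits):

```python
# Hypothetical shape of a plan for "Write a three-line summary in Notepad".
# Field names are illustrative only, not PromptOps' actual schema.
plan = {
    "goal": "Write a three-line summary in Notepad",
    "steps": [
        {"action": "open_app", "target": "notepad", "via": "keyboard"},  # e.g. Win+R, "notepad", Enter
        {"action": "type_text", "text": "Line 1\nLine 2\nLine 3"},
        {"action": "verify", "check": "screen shows three lines of text"},
    ],
}
```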


🛠️ How We Built It

We used Python for core logic, integrating pyautogui/pynput for UI simulation and Gemini for LLM reasoning. The system includes a planner, a skill execution engine, and a vision layer that parses screen content to guide decisions.
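
The action layer reduces to a handful of keyboard and screenshot primitives. A minimal sketch, assuming PyAutoGUI and Windows-style shortcuts (the function names are ours, not PromptOps' actual module API):

```python
# Minimal sketch of a keyboard-only action layer on top of PyAutoGUI.
# Function names are illustrative, not PromptOps' actual API.
import time
import pyautogui

def open_app_via_run_dialog(command: str) -> None:
    """Launch an application the way a user would: Win+R, type the command, press Enter."""
    pyautogui.hotkey("win", "r")
    time.sleep(0.5)                          # wait for the Run dialog to appear
    pyautogui.write(command, interval=0.05)
    pyautogui.press("enter")

def type_text(text: str) -> None:
    """Type text into whichever window currently has focus."""
    pyautogui.write(text, interval=0.03)

def capture_screen(path: str = "screen.png") -> str:
    """Save a screenshot so the vision layer can read the current UI state."""
    pyautogui.screenshot(path)
    return path
```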


🧗 Challenges We Ran Into

  • Reliable UI control without clicking
  • Parsing dynamic screen content contextually
  • Balancing flexibility with deterministic execution
  • Designing prompt interpretation without rigid skill trees

🏆 Accomplishments That We're Proud Of

  • A modular LLM-agent pipeline with screen-grounded actions
  • Local-first design with no external APIs required
  • Real-time execution based on visible UI context
  • Planner that adapts actions based on outcomes

📚 What We Learned

  • LLMs can simulate goal-directed human behavior when grounded in visual input
  • Skill-based design is brittle early on; prompt-based planning is more flexible
  • Abstracting actions into reusable modules improves maintainability and growth potential

🔮 What’s Next for PromptOps

  • Add support for dynamic skill generation using LLMs
  • Integrate full vision-based UI navigation
  • Build memory and long-term goal management
  • Extend to goal-based software creation from prompts

⚙️ Architecture Overview

  • main.py: Entry point that loads model and initializes all agents and controller
  • PlannerAgent: Converts user prompt into a structured plan (dict of steps)
  • EvaluatorAgent: Validates execution outcomes and identifies failures
  • FixerAgent: Attempts to replan or fix issues if execution fails
  • ClarifierAgent: Requests clarification from the user if the prompt is ambiguous
  • VisionAgent: Takes screenshots and interprets screen state using an LLM vision analyzer
  • Memory: Tracks plan steps, history, and prior context
  • Controller: Central executor coordinating planner, vision, and evaluator to run the task (see the sketch below this list)
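
As a rough sketch of how these pieces fit together in one run (class and method names paraphrase the descriptions above, not the repository's exact interfaces):

```python
# Simplified orchestration loop; names paraphrase the agent descriptions above,
# not the repository's exact classes or method signatures.
def run_task(prompt, planner, executor, vision, evaluator, fixer, memory, max_retries=3):
    """Drive one task end to end: plan, execute each step, verify via vision, repair on failure."""
    plan = planner.plan(prompt)                      # PlannerAgent: prompt -> dict of steps
    memory.record_plan(plan)
    for step in plan["steps"]:
        executor.run(step)                           # keyboard-level execution of one step
        screen = vision.describe_screen()            # VisionAgent: screenshot + LLM reading
        verdict = evaluator.check(step, screen)      # EvaluatorAgent: did the step succeed?
        retries = 0
        while not verdict.ok and retries < max_retries:
            step = fixer.repair(step, screen)        # FixerAgent: replan or adjust the step
            executor.run(step)
            screen = vision.describe_screen()
            verdict = evaluator.check(step, screen)
            retries += 1
        memory.record_outcome(step, verdict)
```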

(Architecture diagram: PromptOps)


✅ Example Use Cases

  • "Search Google for latest tech news"
  • "Write a three-line summary in Notepad"
  • "List all files in Downloads folder via terminal"

All are executed through reasoning and keyboard input, without selector-based UI automation or app-specific APIs.


🧠 Why It’s Different

  • Doesn’t rely on hardcoded scripts, XPath selectors, or app-specific APIs
  • Doesn’t use robotic mouse control—fully keyboard driven
  • Uses vision as a feedback mechanism to emulate human perception

📦 Tech Stack

  • Python (core logic)
  • OpenAI Vision or Gemini Vision (LLM-based screen reading; see the sketch after this list)
  • PyAutoGUI / Pynput (keyboard control)
  • FastAPI for API hooks (optional)
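
As a rough sketch of the screen-reading piece with the Gemini client (the model name and prompt are placeholders; PromptOps' own vision layer may wrap this differently):

```python
# Rough sketch of LLM-based screen reading with the google-generativeai client.
# Model name and prompt are placeholders, not PromptOps' exact setup.
import os
import google.generativeai as genai
from PIL import Image

genai.configure(api_key=os.environ["GEMINI_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-flash")

def describe_screen(screenshot_path: str) -> str:
    """Ask the vision model what is currently visible on screen."""
    image = Image.open(screenshot_path)
    response = model.generate_content(
        ["Describe the visible UI elements and any text on this screen.", image]
    )
    return response.text
```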

🔭 Roadmap

  • Full multi-agent loop (planner, executor, evaluator, fixer)
  • File system awareness + context memory
  • Human-like web browsing & data extraction
  • Task persistence + retry logs

📄 License

MIT License

