[shortfin_apps/llm] Need documentation for SystemManager: sysman.ls.create_worker() hangs if run immediately after sysman.start() #906

Open
renxida opened this issue Feb 4, 2025 · 0 comments

I'm trying to get one fiber and one worker so I can use them to test some KV-cache-related things. I'm a little confused about how SystemManager works and what the order of events should be between sysman.start() and sysman.ls.create_worker().

From service.py, it looks like sysman.start() should be called by FastAPI AFTER the workers and fibers are created, but from reading SystemManager I had expected the opposite.

Naively, I expected to create a system manager, fully initialize it, and then create workers from it, like so:

from shortfin_apps.llm.components.manager import SystemManager
sysman = SystemManager(device="local-task")
sysman.start()
worker = sysman.ls.create_worker("test-worker")

but this hangs indefinitely.

If I just don't start the sysman, then everything works:

from shortfin_apps.llm.components.manager import SystemManager
sysman = SystemManager(device="local-task")

worker = sysman.ls.create_worker("test-worker")
fiber = sysman.ls.create_fiber(worker)

print(fiber.devices_dict)
device = list(fiber.devices_dict.values())[0]
print(f"Using device: {device}")

Interestingly, sleeping between sysman.start() and the create_worker() call also works:

import logging
import time

from shortfin_apps.llm.components.manager import SystemManager

# Set up logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Create and start system manager
logger.info("Creating system manager...")
sysman = SystemManager(device="local-task")
sysman.start()

try:
    # time.sleep(1)  # without this sleep, create_worker() below hangs indefinitely
    logger.info("Creating worker...")
    worker = sysman.ls.create_worker("test-worker")
    logger.info(f"Created worker: {worker}")
    
    logger.info("Creating fiber...")
    fiber = sysman.ls.create_fiber(worker)
    logger.info(f"Created fiber: {fiber}")
    
    logger.info("Available devices:")
    logger.info(fiber.devices_dict)
    device = list(fiber.devices_dict.values())[0]
    logger.info(f"Using device: {device}")
    
    input("Press Enter to exit...")
    
finally:
    # Clean up
    logger.info("Shutting down system manager...")
    sysman.shutdown()
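
To reproduce the hang without blocking the terminal, the suspect call can be run in a daemon thread with a timeout. This is only a minimal sketch: `call_with_timeout` is a hypothetical helper, not part of shortfin, and it will not fire if the blocked call holds the GIL while it waits.

```python
import threading


def call_with_timeout(fn, timeout_s=5.0):
    """Run fn() in a daemon thread and wait at most timeout_s seconds.

    Returns (finished, result). If fn() is still running when the timeout
    expires, returns (False, None); the thread is left running as a daemon
    so it does not keep the interpreter alive at exit.
    """
    box = {}

    def target():
        try:
            box["result"] = fn()
        except BaseException as exc:  # hand the error back to the caller
            box["error"] = exc

    thread = threading.Thread(target=target, daemon=True)
    thread.start()
    thread.join(timeout_s)

    if thread.is_alive():
        return False, None
    if "error" in box:
        raise box["error"]
    return True, box.get("result")
```

With the scripts above, the call would be wrapped as `call_with_timeout(lambda: sysman.ls.create_worker("test-worker"))`; a `False` first element means the call did not return within the timeout.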

It looks like sysman.start() and sysman.shutdown() do some interfacing with FastAPI (see the following snippet from server.py):

@asynccontextmanager
async def lifespan(app: FastAPI):
    system.start()
    yield
    print("Shutting down shortfin")
    system.shutdown()


system = System()
app = FastAPI(lifespan=lifespan)

and worker/fiber configuration happens BEFORE sysman.start() is called.
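
If that ordering is the intended one (which is exactly what I'd like documented), a standalone script could mirror it roughly like this. This is a sketch under that assumption: `started_system` is a hypothetical helper, and it uses only the calls already shown above (`ls.create_worker`, `ls.create_fiber`, `start`, `shutdown`).

```python
from contextlib import contextmanager


@contextmanager
def started_system(sysman, worker_name="test-worker"):
    """Create a worker and fiber first, then start the system manager.

    Assumes sysman exposes ls.create_worker(), ls.create_fiber(),
    start() and shutdown(), as in the snippets above.
    """
    # Configure workers and fibers BEFORE start(), as service.py appears to do.
    worker = sysman.ls.create_worker(worker_name)
    fiber = sysman.ls.create_fiber(worker)

    sysman.start()
    try:
        yield worker, fiber
    finally:
        # Always shut down, even if the body raises.
        sysman.shutdown()
```

Usage would then be `with started_system(SystemManager(device="local-task")) as (worker, fiber): ...`, with the KV-cache test code inside the `with` block.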
