Creating accurate, descriptive captions for images is essential for accessibility, content organization, and automated tagging. Traditional approaches, such as template- or rule-based captioning, often produce generic descriptions that miss the context of the image. Our goal is to build a system around the BLIP (Bootstrapping Language-Image Pre-training) model that generates precise, relevant captions, improving both accessibility and content management.
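As a starting point, BLIP can be run through the Hugging Face `transformers` library. The sketch below assumes the publicly available `Salesforce/blip-image-captioning-base` checkpoint and uses a synthetic blank image for illustration; in practice you would load a real photo with `PIL.Image.open`.

```python
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

# Load the pretrained BLIP captioning model and its processor
# (downloads weights on first run).
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained(
    "Salesforce/blip-image-captioning-base"
)

# Placeholder image for illustration; replace with Image.open("photo.jpg").
image = Image.new("RGB", (384, 384), color="white")

# Preprocess the image into model inputs, then generate a caption.
inputs = processor(images=image, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=30)
caption = processor.decode(output_ids[0], skip_special_tokens=True)
print(caption)
```

The processor handles resizing and normalization, so the same pipeline works for images of arbitrary size.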