
Service that converts input lecture material files (PDFs, PowerPoint presentations, lecture videos) into text, which the DocProcAI neural networks can then ingest to provide semantic search, linking, summarization, etc.


MEITREX/docprocai_service


DocProcAI-Service

This service processes and manages uploaded lecture material (video recordings, documents, slides) to enable several advanced features of the MEITREX platform.

Features

  • Splitting of lecture videos into sections based on slide changes detected via computer vision
  • OCR of on-screen text in lecture videos
  • Transcript and closed caption generation for lecture videos
  • Generation of text embeddings per section for videos and per page for documents
  • Semantic search, i.e. fetching semantically similar sections of lecture material
  • Automatic generation of titles for the generated video sections
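The semantic search feature builds on the per-section and per-page embeddings: sections whose embedding vectors lie close to a query's embedding are treated as semantically similar. A minimal sketch of that ranking step, assuming the embeddings have already been computed and using cosine similarity as the closeness measure; the function names and toy vectors are hypothetical and not the service's actual API:

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def most_similar_sections(
    query: List[float],
    sections: Dict[str, List[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the top_k (section_id, score) pairs, best match first."""
    scored = [(sid, cosine_similarity(query, emb)) for sid, emb in sections.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical embeddings; real embedding models produce much longer vectors.
sections = {
    "video1/section0": [0.9, 0.1, 0.0],
    "video1/section1": [0.1, 0.9, 0.1],
    "slides/page3": [0.8, 0.2, 0.1],
}
print(most_similar_sections([1.0, 0.0, 0.0], sections, top_k=2))
```

A production system would typically use a vector index rather than a linear scan, but the ranking criterion is the same.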

For a deeper dive into the features and the design considerations made during development, see our paper on DocProcAI.

Installation

Neural Network Models Installation

This service cannot function without neural network models. Download them and place them in an llm_data folder in the repository root. This folder is mounted into the Docker container automatically, and the files inside can then be referenced from config.yaml.

Caution

The service cannot run without at least a sentence embedding model installed!

Tip

The segment_title_generator and document_summary_generator tasks require LLMs only if these features are enabled in config.yaml. Both are enabled by default.
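Before starting the container, it can help to check that the required models are actually present in llm_data. A minimal sketch, assuming each model lives in its own subdirectory of llm_data; the folder names and the boolean feature flags below are hypothetical and do not mirror the real config.yaml keys:

```python
from pathlib import Path
from typing import List


def required_model_dirs(
    embedding_model: str,
    llm_model: str,
    title_generation_enabled: bool = True,
    summary_generation_enabled: bool = True,
) -> List[str]:
    """List the llm_data subdirectories needed for the enabled features."""
    required = [embedding_model]  # the embedding model is always mandatory
    if title_generation_enabled or summary_generation_enabled:
        required.append(llm_model)  # an LLM is only needed for titles/summaries
    return required


def missing_models(llm_data: Path, required: List[str]) -> List[str]:
    """Return the required model directories that do not exist under llm_data."""
    return [name for name in required if not (llm_data / name).is_dir()]


# Hypothetical folder names; use whatever names your config.yaml references.
needed = required_model_dirs("gte-large-en-v1.5", "Llama-3.1-8B-Instruct")
missing = missing_models(Path("llm_data"), needed)
print("Missing models:", missing or "none")
```

With both generation features disabled, only the embedding model is reported as required, matching the Caution and Tip above.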

Recommended Neural Network Models

  • For the text embedding, we recommend Alibaba-NLP/gte-large-en-v1.5
  • For the title and summary generation, we recommend meta-llama/Llama-3.1-8B-Instruct
  • Title generation should work with the base model alone, but we recommend our custom fine-tuned LoRA adapter for better results in the title generation task. The adapter files may be provided to you upon request.
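For readers unfamiliar with LoRA: in the standard formulation, an adapter does not replace the base model's weights but stores a low-rank update to a frozen weight matrix $W_0$. This is why the adapter files are small and must be loaded on top of the base model:

```latex
W = W_0 + \Delta W = W_0 + \frac{\alpha}{r} B A,
\qquad W_0 \in \mathbb{R}^{d \times k},\;
B \in \mathbb{R}^{d \times r},\;
A \in \mathbb{R}^{r \times k},\;
r \ll \min(d, k)
```

Only $A$ and $B$ are trained. For illustration, with $d = k = 4096$ and a hypothetical rank $r = 16$, they hold $2 \cdot 4096 \cdot 16 = 131{,}072$ parameters per adapted matrix, versus $4096^2 = 16{,}777{,}216$ in $W_0$.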

GPU Acceleration

This service requires PyTorch. Because some features need PyTorch GPU support, the pip-distributed version of PyTorch cannot be used; a platform-specific build must be installed instead. By default, PyTorch for NVIDIA CUDA 12.4 is used, as this should offer the broadest compatibility with common GPUs. To use a different PyTorch build, change the install script in the Dockerfile.

Warning

GPU features require a supported GPU and operating system. Because the service runs in a Docker container, the host's Docker setup must also support GPU access.

Docker currently does not provide GPU support on macOS, so the service's GPU features do not work on macOS.

GPU features can be disabled in config.yaml. You may additionally need to remove the GPU device reservation from docker-compose.yaml.
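To verify whether PyTorch can actually use a GPU (for example, from a Python shell inside the container), a quick check like the following may help. It is a sketch, not part of the service: select_device and cuda_build_version are hypothetical helper names, while torch.cuda.is_available() and torch.version.cuda are standard PyTorch attributes. With the default CUDA 12.4 build, the reported CUDA version should be 12.4. On macOS hosts, or when no GPU is visible to the container, the device falls back to cpu.

```python
from typing import Optional


def select_device() -> str:
    """Return "cuda" if a CUDA-enabled PyTorch build can see a GPU, else "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch is not installed in this environment
    return "cuda" if torch.cuda.is_available() else "cpu"


def cuda_build_version() -> Optional[str]:
    """CUDA version PyTorch was compiled against (None for CPU-only builds)."""
    try:
        import torch
    except ImportError:
        return None
    return torch.version.cuda


print("device:", select_device())
print("PyTorch CUDA build:", cuda_build_version())
```

If the build reports a CUDA version but the device is still cpu, the likely cause is the host or Docker GPU setup rather than the PyTorch install.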

Configuration

The service is configured via the config.yaml file in the root directory. All configuration properties are documented with comments inside that file.

Resource Requirements, Additional Information & Design Rationale

For additional information on the design and implementation of this service, check out the accompanying paper.

Training Repository

Scripts used for training live in the training repository.
