Complete pipeline for generating DBpedia text embeddings using OpenAI's embedding models and publishing them as Hugging Face datasets.

redis-performance/vector-embeddings

dbpedia-openai-text-embeddings-to-huggingface

This repository provides a complete pipeline for generating DBpedia text embeddings with OpenAI's embedding models and publishing them as Hugging Face datasets. From the same OpenAI model and source data, the pipeline can produce embeddings at several different dimensions, so you can publish multiple dataset variants that trade vector size (storage and search cost) against retrieval quality.
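One way to get several dimensions from a single model is the approach OpenAI's embeddings guide describes for text-embedding-3 models: keep the first `dim` values of the full 3,072-value vector and L2-normalize the result. The sketch below assumes that approach; the pipeline may instead request each size directly through the API's `dimensions` parameter. `shorten_embedding`, `variants`, and `TARGET_DIMS` are hypothetical names.

```python
import math
from typing import Dict, List, Sequence

# Dimensions of the published variants (taken from the dataset table).
TARGET_DIMS = (512, 1024, 1536, 2048, 3072)


def shorten_embedding(embedding: Sequence[float], dim: int) -> List[float]:
    """Keep the first `dim` values of an embedding and L2-normalize them.

    Re-normalizing keeps cosine / dot-product scores on the same scale
    as a unit-length full-size vector.
    """
    if not 0 < dim <= len(embedding):
        raise ValueError(f"dim must be in 1..{len(embedding)}, got {dim}")
    prefix = list(embedding[:dim])
    norm = math.sqrt(sum(x * x for x in prefix))
    if norm == 0.0:
        return prefix  # all-zero prefix: nothing to normalize
    return [x / norm for x in prefix]


def variants(full_embedding: Sequence[float]) -> Dict[int, List[float]]:
    """Derive every target size that fits inside one full-size vector (hypothetical helper)."""
    return {
        d: shorten_embedding(full_embedding, d)
        for d in TARGET_DIMS
        if d <= len(full_embedding)
    }
```

Deriving all sizes from one full-size call means each text is sent to the API once, at the cost of storing the largest vector until the variants are written.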

Features

  • Flexible Embedding Dimensions: Generate embeddings with different dimensions (e.g., 512, 1024, 1536, 3072) from the same OpenAI model
  • Scalable Processing: Multi-process embedding generation for large datasets
  • Hugging Face Integration: Direct upload to Hugging Face Hub
  • Resume Support: Skip already processed chunks to resume interrupted jobs
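The Scalable Processing and Resume Support features can be sketched together: split the corpus into fixed-size chunks, embed each chunk in a worker process, and treat an existing per-chunk output file as proof that the chunk is done. This is a minimal sketch under assumptions, not the repository's code: the chunk size, the `chunk_NNNNN.json` layout, and the names `run_pipeline`, `chunk_path`, and `embed_fn` are hypothetical, and `embed_fn` stands in for whatever function calls the OpenAI API.

```python
import json
import os
from concurrent.futures import Executor, ProcessPoolExecutor
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical signature: takes a batch of texts, returns one vector per text.
EmbedFn = Callable[[List[str]], List[List[float]]]


def chunk(items: Sequence[str], size: int) -> List[List[str]]:
    """Split the corpus into fixed-size chunks (the last one may be shorter)."""
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


def chunk_path(out_dir: str, index: int) -> str:
    """Hypothetical on-disk layout: one JSON file per chunk."""
    return os.path.join(out_dir, f"chunk_{index:05d}.json")


def _process_chunk(job: Tuple[int, List[str], str, EmbedFn]) -> int:
    """Embed one chunk and write it atomically; top-level so it can be pickled."""
    index, texts, out_dir, embed_fn = job
    vectors = embed_fn(texts)
    path = chunk_path(out_dir, index)
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(vectors, f)
    # Rename only after a complete write, so a crash never leaves a file
    # that a later run would mistake for a finished chunk.
    os.replace(tmp_path, path)
    return index


def run_pipeline(
    texts: Sequence[str],
    out_dir: str,
    embed_fn: EmbedFn,
    chunk_size: int = 1000,
    executor: Optional[Executor] = None,
) -> List[int]:
    """Embed every chunk whose output file is missing; return the indices processed."""
    os.makedirs(out_dir, exist_ok=True)
    jobs = [
        (i, batch, out_dir, embed_fn)
        for i, batch in enumerate(chunk(texts, chunk_size))
        if not os.path.exists(chunk_path(out_dir, i))  # resume: skip finished chunks
    ]
    if not jobs:
        return []
    if executor is not None:
        return sorted(executor.map(_process_chunk, jobs))
    # Default: a process pool with one worker per CPU; embed_fn must then be
    # a picklable top-level function.
    with ProcessPoolExecutor() as pool:
        return sorted(pool.map(_process_chunk, jobs))
```

Rerunning `run_pipeline` after an interruption only embeds chunks whose files are missing. Because embedding requests spend most of their time waiting on the network, passing a `ThreadPoolExecutor` as `executor` is a reasonable alternative to the default process pool.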

Generated Datasets

Dataset                                         Embedding model         Dimensions  Vectors
dbpedia-openai-1M-text-embedding-3-large-512d   text-embedding-3-large  512         1M
dbpedia-openai-1M-text-embedding-3-large-1024d  text-embedding-3-large  1024        1M
dbpedia-openai-1M-text-embedding-3-large-1536d  text-embedding-3-large  1536        1M
dbpedia-openai-1M-text-embedding-3-large-2048d  text-embedding-3-large  2048        1M
dbpedia-openai-1M-text-embedding-3-large-3072d  text-embedding-3-large  3072        1M
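The dataset names in the table follow a visible pattern (`dbpedia-openai-1M-<model>-<dims>d`). The sketch below rebuilds those names and pushes one variant to the Hub with the `datasets` library's `Dataset.from_dict(...).push_to_hub(...)`. The column names (`id`, `text`, `embedding`), the `your-org` namespace, and the helper names are hypothetical; the published datasets may use a different schema. Pushing requires a Hugging Face access token (for example via `huggingface-cli login`).

```python
from typing import Dict, Sequence


def dataset_name(dims: int, model: str = "text-embedding-3-large", size_label: str = "1M") -> str:
    """Rebuild the naming pattern used by the datasets in the table."""
    return f"dbpedia-openai-{size_label}-{model}-{dims}d"


def to_columns(
    ids: Sequence[str],
    texts: Sequence[str],
    embeddings: Sequence[Sequence[float]],
) -> Dict[str, list]:
    """Assemble column-oriented data (hypothetical schema) for Dataset.from_dict."""
    if not (len(ids) == len(texts) == len(embeddings)):
        raise ValueError("ids, texts and embeddings must have the same length")
    return {
        "id": list(ids),
        "text": list(texts),
        "embedding": [list(e) for e in embeddings],
    }


def publish(columns: Dict[str, list], repo_id: str) -> None:
    """Upload the columns as a Hub dataset; needs `datasets` and a HF token."""
    # Imported lazily so the helpers above work without `datasets` installed.
    from datasets import Dataset

    Dataset.from_dict(columns).push_to_hub(repo_id)


# Usage (hypothetical namespace):
# publish(to_columns(ids, texts, vectors), f"your-org/{dataset_name(512)}")
```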
