Fix Zulip connector schema + links and enable temporal metadata #4005

Merged
yuhongsun96 merged 6 commits into onyx-dot-app:main from feature/enhance-zulip-metadata on Feb 15, 2025

@ATSiem ATSiem commented Feb 14, 2025

Description

This PR addresses critical issues with the Zulip connector's metadata handling and URL validation:

  1. Fixes type mismatches in the Document model metadata that caused indexing failures (enables indexing for both Zulip Server and Zulip Cloud)
  2. Improves URL handling for Zulip realm configurations (fixes reference links)
  3. Extends the metadata fields to support temporal document ordering (improves RAG)
  4. Adds timezone handling for message timestamps (improves RAG)
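As an illustration of the first item, here is a minimal sketch of the kind of string coercion involved. It assumes the Document metadata is typed as a mapping from `str` to `str` or `list[str]`; the helper name `stringify_metadata` and the field names in the usage note are hypothetical and are not taken from this PR's diff.

```python
from __future__ import annotations

from typing import Any


def stringify_metadata(raw: dict[str, Any]) -> dict[str, str | list[str]]:
    """Coerce values to str (or list[str]) and drop None values.

    Hypothetical helper: the real connector code may differ.
    """
    cleaned: dict[str, str | list[str]] = {}
    for key, value in raw.items():
        if value is None:
            # Skip missing fields rather than storing the string "None".
            continue
        if isinstance(value, (list, tuple)):
            cleaned[key] = [str(item) for item in value if item is not None]
        else:
            cleaned[key] = str(value)
    return cleaned
```

For example, `stringify_metadata({"stream_id": 12, "edited": None})` keeps `stream_id` as the string `"12"` and omits `edited` entirely, so integers and `None` values from the Zulip API never reach a string-typed metadata field.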

Key improvements:

  • Type-safe metadata handling with proper string conversions
  • Robust URL validation with scheme detection and domain normalization
  • Enhanced error handling for message link generation
  • UTC-aware timestamp handling for temporal consistency
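The scheme detection and domain normalization mentioned above could look roughly like the following sketch. It uses only the standard library; the function name `normalize_realm_url` and the choice of `https` as the default scheme are assumptions for illustration, not the connector's actual implementation.

```python
from urllib.parse import urlparse


def normalize_realm_url(raw: str, default_scheme: str = "https") -> tuple[str, str]:
    """Return (base_url, domain) for a realm given as a URL or a bare domain.

    Hypothetical helper: raises ValueError for input it cannot interpret.
    """
    candidate = raw.strip()
    if not candidate:
        raise ValueError("realm URL is empty")
    # Scheme detection: a bare domain such as "chat.example.com" gets a default scheme.
    if "://" not in candidate:
        candidate = f"{default_scheme}://{candidate}"
    parsed = urlparse(candidate)
    if parsed.scheme not in ("http", "https"):
        raise ValueError(f"unsupported scheme: {parsed.scheme!r}")
    if not parsed.hostname:
        raise ValueError(f"no host found in {raw!r}")
    # Domain normalization: lowercase host, keep an explicit port, drop path and query.
    domain = parsed.hostname
    if parsed.port is not None:
        domain = f"{domain}:{parsed.port}"
    return f"{parsed.scheme}://{domain}", domain
```

Returning both the base URL and the bare domain lets the caller build message links from the former and use the latter wherever only a host name is expected.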

How Has This Been Tested?

  1. Connector Initialization:
     • Tested with various URL formats (http, https, domain-only)
     • Verified proper handling of malformed URLs
     • Confirmed domain normalization works correctly
  2. Document Creation:
     • Verified metadata type conversion and validation
     • Tested message link generation across different realm configurations
     • Confirmed UTC timestamp handling for messages
     • Validated temporal ordering of documents
  3. Type Safety:
     • Ran mypy type checking
     • Verified all metadata values are properly converted to strings
     • Confirmed error handling for None values
  4. Integration Testing:
     • Tested full indexing flow with Zulip Server 8.2+ and Zulip Cloud
     • Verified document creation through the ingestion API
     • Confirmed temporal RAG pipeline functionality (see example)
image
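
The UTC timestamp and temporal ordering checks listed above can be reproduced in isolation with a sketch like this one. It assumes each Zulip message carries a `timestamp` field in UNIX epoch seconds; the helper name and the sample messages are made up for illustration.

```python
from datetime import datetime, timezone


def message_time_utc(epoch_seconds: int) -> datetime:
    """Convert a UNIX timestamp in seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


# Illustrative messages only; real payloads contain many more fields.
messages = [
    {"id": 3, "timestamp": 1739577600},
    {"id": 1, "timestamp": 1739491200},
    {"id": 2, "timestamp": 1739534400},
]

# Sorting on aware datetimes gives the same order regardless of the host's local timezone.
ordered = sorted(messages, key=lambda m: message_time_utc(m["timestamp"]))
```

Passing `tz=timezone.utc` explicitly matters here: without it, `datetime.fromtimestamp` returns a naive datetime in the server's local time, which makes timestamps from differently configured hosts hard to compare.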

Backporting

  • This PR should be backported
  • Override Linear Check


vercel bot commented Feb 14, 2025

@ATSiem is attempting to deploy a commit to the Danswer Team on Vercel.

A member of the Team first needs to authorize it.

@ATSiem ATSiem closed this Feb 15, 2025
@ATSiem ATSiem deleted the feature/enhance-zulip-metadata branch February 15, 2025 02:25
@ATSiem ATSiem restored the feature/enhance-zulip-metadata branch February 15, 2025 02:45
@ATSiem ATSiem reopened this Feb 15, 2025

@yuhongsun96 yuhongsun96 left a comment


lgtm!

@yuhongsun96 yuhongsun96 merged commit f371efc into onyx-dot-app:main Feb 15, 2025
1 check failed
@ATSiem ATSiem deleted the feature/enhance-zulip-metadata branch February 16, 2025 01:07
ATSiem added a commit to ATSiem/onyx that referenced this pull request Feb 17, 2025