restructure responsibilities around indexing and generative content #317

Southclaws · 2024-12-14T23:51:12Z

remove autotag/summarise from index consumer
fix some issues with auto tag
refactor node mutation logic

- remove autotag/summarise from index consumer - fix some issues with auto tag - refactor node mutation logic

vercel · 2024-12-14T23:51:17Z

The latest updates on your projects. Learn more about Vercel for Git ↗︎

2 Skipped Deployments

Name	Status	Preview	Comments	Updated (UTC)
storyden	⬜️ Ignored (Inspect)	Visit Preview		Dec 15, 2024 8:08pm
storyden-homepage	⬜️ Ignored (Inspect)	Visit Preview		Dec 15, 2024 8:08pm

coderabbitai · 2024-12-14T23:51:19Z

📝 Walkthrough

Walkthrough

This pull request introduces significant changes to the node management and processing system. The modifications span multiple files and services, focusing on enhancing node creation, updating, and indexing processes. Key changes include adding an AutoFillNode type, modifying the summarization and tagging mechanisms, and streamlining the node mutation workflow. The changes aim to provide more flexible content handling, with improved support for automatic summarization and tagging.

Changes

File	Change Summary
`app/resources/mq/message_types.go`	Added `AutoFillNode` type with `ID`, `SummariseContent`, and `AutoTag` fields; Updated `IndexNode` to include `ID`
`app/services/generative/*`	Modified `Summariser` interface and implementation to use `datagraph.Content` instead of `datagraph.Item`
`app/services/library/node_mutate/*`	Significant refactoring of node mutation logic, added `preMutation` method, introduced new options handling
`app/services/library/node_semdex/*`	Added `autofill` method, simplified indexing process, removed direct summarization and tagging methods
`app/transports/http/bindings/nodes.go`	Updated `NodeUpdate` method to use new mutation logic
`web/src/screens/library/LibraryPageScreen/useLibraryPageScreen.ts`	Modified link import and node update handling

Possibly related PRs

remove items entirely in preparation for a simpler one-type datagraph node tree #113: The changes in the main PR introduce a new type AutoFillNode and modify the IndexNode type, which relates to the overall restructuring of node types in the retrieved PR that removes items in preparation for a simpler node tree.
Content refactor for nodes and HTML content #139: The modifications in the main PR enhance the data structure definitions, which could be related to the content refactor for nodes and HTML content in the retrieved PR, particularly in how nodes are structured and managed.
implement marks for library nodes #193: The introduction of new fields and types in the main PR aligns with the implementation of marks for library nodes in the retrieved PR, as both involve enhancements to node structures.
rework node query to apply correct visibility rules as with tree traversal and listing #204: The changes in the main PR regarding node types may relate to the rework of node queries to apply correct visibility rules, as both involve modifications to how nodes are handled and queried.
Datagraph-search-filters #295: The main PR's focus on enhancing node structures could be connected to the introduction of search filters in the retrieved PR, which also deals with node management and querying.
Rework OAuth2 providers #304: The modifications in the main PR regarding node types and structures may relate to the rework of OAuth2 providers, as both involve changes to how entities are managed within the application.
add experimental fully local vector DB semdex impl #315: The experimental implementation of a fully local vector DB in the retrieved PR may connect to the changes in the main PR that enhance node structures, as both involve improvements to data handling and storage mechanisms.

Poem

🐰 Nodes dancing with new grace,
Summaries bloom in every space,
Tags springing forth with glee,
Automation sets content free!
A rabbit's code, both smart and light,
Transforms the digital delight! 🌟

Tip

CodeRabbit's docstrings feature is now available as part of our Early Access Program! Simply use the command @coderabbitai generate docstrings to have CodeRabbit automatically generate docstrings for your pull request. This feature will be included in our Pro Plan when released.

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Generate unit testing code for this file.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai generate unit testing code for this file.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and generate unit testing code.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

coderabbitai

Actionable comments posted: 4

🧹 Outside diff range and nitpick comments (4)

app/services/library/node_mutate/update.go (1)

58-58: Logging failure to publish index post message

Logging the error when failing to publish the index post message is appropriate. However, consider whether additional error handling or retries are necessary to ensure indexing consistency.
app/services/library/node_semdex/autofill.go (1)
16-35: Add documentation for the autofill function

The function implements important business logic for node autofill but lacks documentation explaining its purpose and behavior.

Add a doc comment explaining the function's purpose:
+// autofill updates a node with optional summarization and auto-tagging.
+// It delegates the actual updates to the node updater while maintaining
+// separation of concerns between indexing and content generation.
 func (i *semdexer) autofill(ctx context.Context, id library.NodeID, summarise bool, autotag bool) error {
app/services/library/node_semdex/node_semdex.go (1)
149-169: Consider adding graceful shutdown handling for the subscription goroutine.

The new subscription goroutine for autoFillQueue should be properly cleaned up when the service shuts down. Consider using a context cancellation or a done channel.
 lc.Append(fx.StartHook(func(_ context.Context) error {
+    ctx, cancel := context.WithCancel(ctx)
     sub, err := autoFillQueue.Subscribe(ctx)
     if err != nil {
         return err
     }
 
     go func() {
+        defer cancel()
         for msg := range sub {
             ctx = session.GetSessionFromMessage(ctx, msg)
 
             if err := re.autofill(ctx, msg.Payload.ID, msg.Payload.SummariseContent, msg.Payload.AutoTag); err != nil {
                 l.Error("failed to autofill node", zap.Error(err))
             }
 
             msg.Ack()
         }
     }()
 
     return nil
 }))
app/transports/http/bindings/nodes.go (1)
246-248: Consider adding validation for tag fill rule.

The tag fill rule handling could benefit from additional validation before applying it to the partial struct.
 if tfr, ok := tagFillRuleParam.Get(); ok {
+    // Validate the fill rule before applying
+    if err := validateTagFillRule(tfr); err != nil {
+        return nil, fault.Wrap(err, fctx.With(ctx))
+    }
     partial.TagFill = opt.New(tag.TagFillCommand{FillRule: tfr})
 }

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 130e1b7 and 5d8b329.

📒 Files selected for processing (13)

app/resources/mq/message_types.go (1 hunks)
app/services/generative/generative.go (1 hunks)
app/services/generative/summary.go (2 hunks)
app/services/library/node_fill/filler.go (3 hunks)
app/services/library/node_mutate/create.go (2 hunks)
app/services/library/node_mutate/mutate.go (5 hunks)
app/services/library/node_mutate/options.go (1 hunks)
app/services/library/node_mutate/update.go (3 hunks)
app/services/library/node_semdex/autofill.go (1 hunks)
app/services/library/node_semdex/indexer.go (2 hunks)
app/services/library/node_semdex/node_semdex.go (7 hunks)
app/transports/http/bindings/nodes.go (3 hunks)
web/src/screens/library/LibraryPageScreen/useLibraryPageScreen.ts (2 hunks)

🔇 Additional comments (23)

app/services/library/node_mutate/create.go (2)

9-9: Importing opt package is appropriate

The addition of github.com/Southclaws/opt is necessary for handling optional values in the code.

40-44: Verify that preMutation fully replaces applyOpts

Ensure that the new preMutation method encompasses all functionalities previously handled by applyOpts, to prevent any regression or loss of functionality.

Run the following script to check for any remaining references to applyOpts:

✅ Verification successful

preMutation fully replaces and enhances the functionality of applyOpts

Based on the comprehensive code review, the preMutation method not only replaces but significantly enhances the previous functionality:

It handles all basic field updates (name, slug, content, etc.)
Adds advanced features like:
- Asset source handling
- URL fetching and link creation
- Tag management (add/remove)
- Content summarization
- Parent node resolution
Properly structures the mutation options into a result type with additional metadata

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify that `applyOpts` has been fully removed from the codebase.

# Test: Search for any occurrences of `applyOpts`.
rg 'applyOpts' -A 2 -B 2

Length of output: 24

Script:

#!/bin/bash
# Let's try a more comprehensive search strategy

# 1. Search for any historical references to applyOpts in git history
git log -p --all | rg "applyOpts"

# 2. Search for the current implementation of preMutation to verify its functionality
ast-grep --pattern 'func $_($_ *service) preMutation($$$) {
  $$$
}'

# 3. Search for any test files that might contain references
rg -t go "preMutation|applyOpts" --glob "*_test.go" -A 2 -B 2

Length of output: 134

Script:

#!/bin/bash
# Let's try a different approach focusing on the current file and its implementation

# 1. Look for the file content to understand the implementation
cat app/services/library/node_mutate/create.go

# 2. Search for any other files in the same directory that might contain related code
fd . app/services/library/node_mutate/ -t f -x cat {}

# 3. Search for preMutation usage across the codebase
rg "preMutation" -A 3 -B 3

Length of output: 20873

app/services/library/node_mutate/update.go (3)

44-49: Usage of preMutation in Update method

The Update method now utilizes preMutation to prepare options. This change improves consistency with the Create method. Ensure that preMutation handles all necessary option preparations.

73-73: Assigning TagSuggestions from pre.tags

Setting TagSuggestions from pre.tags ensures that any generated tag suggestions are included in the response. This is appropriate and maintains consistency with the updated tagging process.

24-24: Verify calls to Update method reflect the new signature

The Update method no longer accepts the options parameter. Ensure that all calls to Update have been updated to match the new signature.

Run the following script to find calls to Update that may still use the old signature:

✅ Verification successful

Let me search for direct calls to the Update method of the Manager type in the node_mutate package to verify if there are any callers still using the old signature.

Let me try a different search pattern to find calls to the Update method in the node_mutate package.

Based on the search results, I can now provide a final response:

All calls to the Update method have the correct signature

The search results show all calls to the Update method in the codebase, and they all use the new signature with three parameters (ctx, qk, partial). Here are the verified calls:

app/services/asset/analyse/pdf.go: Uses Update(ctx, library.QueryKey{}, node_mutate.Partial{})
app/services/library/node_semdex/autofill.go: Uses Update(ctx, qk, p)
app/services/library/node_mutate/update.go: Internal implementation and a call with Update(ctx, qk, pre.opts...)
app/transports/http/bindings/nodes.go: Multiple calls, all using Update(ctx, deserialiseNodeMark(), partial)

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Find calls to `Update` with the old method signature.

# Test: Search for `Update` calls with four or more arguments.
rg 'Update\([^,]+,[^,]+,[^,]+,[^,]+\)' -A 2 -B 2

Length of output: 26902

Script:

#!/bin/bash
# Search for direct calls to Manager.Update in node_mutate package
ast-grep --pattern 'Manager.Update($$$)'

# Also search using ripgrep to catch any potential calls
rg 'node_mutate.*Manager.*Update\(' -A 2 -B 2

Length of output: 90

Script:

#!/bin/bash
# Search for files that import node_mutate and look for Update calls
rg -l "node_mutate" | xargs rg "Update\(" -A 2 -B 2

# Also search specifically in the library package
rg -l "library" | xargs rg "node_mutate.*Manager.*Update" -A 2 -B 2

Length of output: 4338

app/services/library/node_mutate/mutate.go (3)

42-42: Addition of TagFill to Partial struct

Including TagFill in the Partial struct allows for more flexible tag management during node mutations. Ensure that all uses of Partial handle this new field correctly.

56-56: Introduction of summariser in Manager struct

Adding the summariser to Manager integrates summarization capabilities into node mutations. Confirm that summariser is properly initialized and utilized where necessary.

Line range hint 71-85: Verify constructors are updated with new summariser parameter

The New constructor now requires a summariser parameter. Ensure that all instantiations of Manager pass this new parameter to prevent runtime issues.

Run the following script to find any constructor calls missing the summariser argument:

✅ Verification successful

No issues with the summariser parameter in constructors

The verification shows that the dependency injection is properly set up:

The summariser is provided via the generative package's fx setup (app/services/generative/generative.go) where it's registered as fx.As(new(Summariser))
The node_mutate.New constructor is injected through fx.Provide in app/services/library/library.go
Since this is using the fx dependency injection framework, the framework ensures all required dependencies are provided at startup time, and will fail fast if the summariser is not properly configured

The dependency injection framework guarantees that the new summariser parameter will be properly passed to all Manager instantiations. No manual verification of constructor calls is needed as they are handled automatically by the fx framework.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Find `node_mutate.New` calls lacking the `summariser` parameter.

# Test: Search for `node_mutate.New` calls with fewer than expected arguments.
rg 'node_mutate\.New\(' -A 5 | awk '/node_mutate\.New\(/ {c=5} c&&c--'

Length of output: 72

Script:

#!/bin/bash
# Let's try a different approach to find Manager instantiations
# First, find all files that might contain node_mutate.New or direct Manager struct initialization
rg -l "node_mutate|Manager{" 

# Then let's look for actual instantiations with more context
rg -B 2 -A 5 "(?:node_mutate\.New|Manager{)"

Length of output: 321

Script:

#!/bin/bash
# Let's try a simpler approach with proper escaping
# First find files containing node_mutate.New
rg -l "node_mutate\.New"

# Then look for the actual instantiations with context
rg -B 2 -A 5 "node_mutate\.New"

# Also search for direct Manager struct initialization
rg -B 2 -A 5 "Manager\{"

Length of output: 6465

Script:

#!/bin/bash
# Let's examine the dependency injection setup since we found node_mutate.New is provided via fx.Provide
# First, let's find where the summariser is provided
rg -l "Summariser|summariser"

# Then look for the actual provider setup
rg -B 2 -A 5 "(?i)summariser"

# Also check for any wire/fx setup files
fd -e go -x grep -l "fx\." {} \;

Length of output: 7848

app/services/library/node_mutate/options.go (5)

38-149: Review preMutation method for comprehensive option handling

The preMutation method centralizes option preparation for node mutations. Ensure that all fields in Partial are correctly mapped to their corresponding options and that error handling is thorough.

75-87: Queue assets for analysis when ContentFill is specified

The logic correctly queues assets for analysis when ContentFill is provided. Confirm that assets are processed as expected and that potential errors are properly managed.

97-102: Ensure correct content selection for pre-mutation tasks

The method selects content from the Partial or the current node for tasks like tag suggestion and summarization. Verify that this logic correctly prioritizes new content over existing when appropriate.

105-117: Validate tag filling logic

The handling of TagFill respects the specified fill rule. When Replace is used, tags are replaced with suggestions; otherwise, existing tags are preserved or augmented. Ensure this aligns with the desired tag management strategy.

207-219: Handle empty content in buildTagSuggestionOpts

The method gracefully returns when content is empty, avoiding unnecessary processing. Error handling for the tagging process is also appropriate. Confirm that this behavior meets expected requirements.

app/services/generative/generative.go (1)

18-18: Interface change requires verification of all implementations

The parameter type change from datagraph.Item to datagraph.Content is more specific and aligned with the content processing responsibility. However, this is a breaking change.

✅ Verification successful

Let me try a different approach to find implementations of the Summariser interface.

Let me check the implementation in the generator struct.

Let me check the previous implementation to confirm the change.

Based on the search results, I can now provide a final response:

Interface change appears to be properly implemented

The Summariser interface is implemented by the generator struct in the generative package, and its implementation in summary.go already uses the new datagraph.Content parameter type. The interface is only used internally within the same package and by node_mutate package as a dependency injection.

✓ Implementation in generator.Summarise matches the interface
✓ All usages of Summariser are properly wired through dependency injection
✓ No other implementations of the interface were found in the codebase

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Find all implementations of the Summariser interface to ensure they're updated
ast-grep --pattern 'type $_ struct {
  $$$
}

func ($_ $_) Summarise(ctx context.Context, $_) (string, error) {
  $$$
}'

Length of output: 122

Script:

#!/bin/bash
# Search for struct types that implement Summarise method
rg -U "type.*struct.*\n.*\n.*Summarise.*context\.Context.*\).*\(string,.*error\)" --type go

# Also search for the interface usage to find where it might be implemented
rg "Summariser" --type go

Length of output: 479

Script:

#!/bin/bash
# Check the generator struct implementation
rg -A 10 "type generator struct" --type go

# Check the actual Summarise method implementation
rg -B 2 -A 10 "func \(.*generator\) Summarise\(" --type go

Length of output: 1552

Script:

#!/bin/bash
# Check git history for the previous implementation of the Summarise method
git log -p -- app/services/generative/summary.go | rg -B 5 -A 5 "func.*Summarise"

Length of output: 0

app/services/library/node_semdex/indexer.go (1)

Line range hint 14-32: Clean separation of indexing from content generation

The simplified index method now properly focuses on its core responsibility of indexing, with content generation concerns moved elsewhere. This aligns well with the PR objectives.

✅ Verification successful

Let me verify the context of getSummary usage to ensure it's not related to the indexing functionality.

Verified: Indexing functionality successfully separated from content generation

The getSummary occurrences found are in app/resources/datagraph/content.go and are part of content processing logic, completely unrelated to the indexing functionality. The old summarization and tagging logic has been successfully removed from the indexer, making it more focused on its core responsibility.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Verify that old summarization and tagging logic is completely removed
rg -g '!*.md' -i 'getSummary|generateTags|applyTags'

Length of output: 207

Script:

#!/bin/bash
# Check the context of getSummary usage in content.go
rg -A 5 -B 5 'getSummary' app/resources/datagraph/content.go

Length of output: 639

app/resources/mq/message_types.go (2)

20-21: LGTM: IndexNode ID field addition

The addition of the ID field to IndexNode struct provides clear identification of nodes for indexing operations.

Line range hint 23-27: LGTM: Well-structured AutoFillNode type

The new AutoFillNode type is well-designed with clear boolean flags for controlling summarization and auto-tagging features.

app/services/generative/summary.go (2)

Line range hint 15-30: LGTM: Improved summarization prompt

The updated prompt template provides:

Clear instructions for handling malformed content
Better guidance for maintaining neutral tone
Explicit output format requirements

Line range hint 32-46: LGTM: Improved type safety in Summarise method

The change from datagraph.Item to datagraph.Content provides better type safety and clearer intent.

app/services/library/node_fill/filler.go (3)

28-31: LGTM: Well-structured Filler type

The addition of autoFillQueue is well-integrated with existing queue fields.

Line range hint 77-88: LGTM: Proper error handling for asset queue

The error handling for asset queue publishing is properly implemented with fault wrapping.

90-97: LGTM: Well-structured auto-fill queue publishing

The auto-fill queue publishing is properly implemented with appropriate error handling.

app/transports/http/bindings/nodes.go (1)

233-244: LGTM! Well-structured partial object construction.

The construction of the node_mutate.Partial struct is clear and organized, with proper use of optional types.

coderabbitai · 2024-12-14T23:59:04Z

app/services/library/node_mutate/update.go

+			// failing to publish the deletion message is worthy of an error.
+			return nil, fault.Wrap(err, fctx.With(ctx))


⚠️ Potential issue

Return error on failure to publish deletion message

By returning an error when failing to publish the deletion message, you ensure that this critical failure is propagated. Confirm that this change aligns with the overall error handling strategy and that upstream functions handle this error appropriately.

coderabbitai · 2024-12-14T23:59:04Z

app/services/library/node_fill/filler.go

+			ID:               library.NodeID(library.NodeID(cfr.TargetNodeID)),
+			SummariseContent: true,


⚠️ Potential issue

Fix redundant type conversion

There's a redundant type conversion in the ID field assignment.

Apply this fix:

-ID: library.NodeID(library.NodeID(cfr.TargetNodeID)), +ID: library.NodeID(cfr.TargetNodeID),

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

ID: library.NodeID(library.NodeID(cfr.TargetNodeID)),

SummariseContent: true,

ID: library.NodeID(cfr.TargetNodeID),

SummariseContent: true,

coderabbitai · 2024-12-14T23:59:04Z

app/services/library/node_semdex/node_semdex.go

+				ctx = session.GetSessionFromMessage(ctx, msg)
+
+				if err := re.autofill(ctx, msg.Payload.ID, msg.Payload.SummariseContent, msg.Payload.AutoTag); err != nil {
+					l.Error("failed to autofill node", zap.Error(err))
+				}


🛠️ Refactor suggestion

Enhance error handling for session context retrieval.

The session context retrieval and autofill operations should handle potential errors more gracefully.

-ctx = session.GetSessionFromMessage(ctx, msg) +newCtx, err := session.GetSessionFromMessage(ctx, msg) +if err != nil { + l.Error("failed to get session from message", zap.Error(err)) + msg.Ack() + continue +} +ctx = newCtx if err := re.autofill(ctx, msg.Payload.ID, msg.Payload.SummariseContent, msg.Payload.AutoTag); err != nil { l.Error("failed to autofill node", zap.Error(err)) + // Consider implementing a retry mechanism or dead letter queue }

Committable suggestion skipped: line range outside the PR's diff.

coderabbitai · 2024-12-14T23:59:05Z

web/src/screens/library/LibraryPageScreen/useLibraryPageScreen.ts

+        await linkCreate(
+          { url: link.url },
          {
-            content: link.description,
-            primary_image_asset_id: link.primary_image?.id,
+            content_fill_rule: "replace",
+            node_content_fill_target: node.id,


🛠️ Refactor suggestion

Improve error handling in linkCreate operation.

The linkCreate operation should handle potential errors more explicitly and provide user feedback.

await linkCreate( { url: link.url }, { content_fill_rule: "replace", node_content_fill_target: node.id, + }).catch((error) => { + console.error('Failed to create link:', error); + throw new Error('Failed to import link. Please try again.'); }, );

Committable suggestion skipped: line range outside the PR's diff.

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (8)

internal/infrastructure/ai/ai.go (2)
15-15: Consider adding documentation for the EmbeddingFunc method

The new interface method would benefit from documentation explaining its purpose, expected behavior, and return value characteristics.
+// EmbeddingFunc returns a function that generates vector embeddings for text input.
+// The returned function should be thread-safe and suitable for concurrent use.
 EmbeddingFunc() func(ctx context.Context, text string) ([]float32, error)
37-39: Consider adding error documentation for disabled implementation

The nil return might be unexpected. Consider documenting the behavior or returning a specific error.
 func (d *Disabled) EmbeddingFunc() func(ctx context.Context, text string) ([]float32, error) {
-    return nil
+    return func(ctx context.Context, text string) ([]float32, error) {
+        return nil, ErrDisabled
+    }
 }
internal/infrastructure/ai/mock.go (2)
34-35: Document the embedding generation formula

The magic numbers (256, 3071, 65535) in the embedding generation formula need documentation explaining their significance.
+// Generate a deterministic embedding value using:
+// - 256: byte range for character normalization
+// - 3071: prime factor for distribution
+// - 65535: maximum uint16 for range limitation
 c := ((float32(v % 256)) / 256) * float32(((i+1)*3071)%65535)
43-56: Consider optimizing vector normalization

The normalization function could be optimized by:

Pre-computing the squared values

Using a single allocation
 func normalizeVector(vec []float32) []float32 {
     var n float32
+    r := make([]float32, len(vec))
     for _, v := range vec {
         n += v * v
     }
     n = float32(math.Sqrt(float64(n)))
 
-    r := make([]float32, len(vec))
     for i, v := range vec {
         r[i] = v / n
     }
 
     return r
 }
internal/infrastructure/ai/openai.go (2)
16-17: Document thread safety requirements for ef field

The embedding function field should be documented regarding its thread safety guarantees.
 type OpenAI struct {
     client *openai.Client
+    // ef is a thread-safe function for generating text embeddings
     ef     func(ctx context.Context, text string) ([]float32, error)
 }
46-48: Consider adding context validation in EmbeddingFunc wrapper

The EmbeddingFunc method could benefit from additional context validation.
 func (o *OpenAI) EmbeddingFunc() func(ctx context.Context, text string) ([]float32, error) {
-    return o.ef
+    return func(ctx context.Context, text string) ([]float32, error) {
+        if ctx == nil {
+            ctx = context.Background()
+        }
+        return o.ef(ctx, text)
+    }
 }
app/services/semdex/semdexer/semdexer.go (1)

13-13: LGTM! Clean dependency injection implementation.

The changes effectively decouple the semantic indexer from specific AI implementations by introducing dependency injection through the ai.Prompter interface. This improves modularity and testability.

Consider documenting the expected behavior of the ai.Prompter interface in the package documentation to help implementers understand the requirements.

Also applies to: 23-23, 31-31
app/services/semdex/semdexer/chromem_semdexer/chromem.go (1)
35-36: Consider enhancing error message clarity.

The error message could be more specific about why the language model provider is required.

Consider this alternative error message:
-		return nil, fault.New("a language model provider must be enabled for the embedded semdexer to be enabled")
+		return nil, fault.New("embedded semdexer requires an enabled language model provider for generating embeddings")

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5d8b329 and 9e38766.

📒 Files selected for processing (5)

app/services/semdex/semdexer/chromem_semdexer/chromem.go (2 hunks)
app/services/semdex/semdexer/semdexer.go (2 hunks)
internal/infrastructure/ai/ai.go (2 hunks)
internal/infrastructure/ai/mock.go (1 hunks)
internal/infrastructure/ai/openai.go (2 hunks)

🔇 Additional comments (3)

internal/infrastructure/ai/ai.go (1)

23-25: LGTM: Mock provider addition

The addition of the mock provider case is well-placed and follows the existing pattern.

app/services/semdex/semdexer/chromem_semdexer/chromem.go (2)

20-20: LGTM! Clean interface implementation.

The addition of the ai.Prompter parameter and its import aligns with the dependency injection pattern established in the parent package.

Also applies to: 29-29

39-39: Verify error handling for embedding function.

The embedding function is retrieved but there's no explicit error handling if the function returns nil or fails.

Let's check if the EmbeddingFunc method can return nil or error:

coderabbitai · 2024-12-15T20:15:22Z

internal/infrastructure/ai/mock.go

+		for _, v := range text {
+			for i := range mockEmbeddingSize {
+				c := ((float32(v % 256)) / 256) * float32(((i+1)*3071)%65535)
+				embedding[i] = c
+			}
+		}


⚠️ Potential issue

Fix the range loop in embedding generation

There's a bug in the embedding generation loop. The range should be over indices, not the size itself.

-for i := range mockEmbeddingSize { +for i := 0; i < mockEmbeddingSize; i++ {

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

for _, v := range text {

for i := range mockEmbeddingSize {

c := ((float32(v % 256)) / 256) * float32(((i+1)*3071)%65535)

embedding[i] = c

}

}

for _, v := range text {

for i := 0; i < mockEmbeddingSize; i++ {

c := ((float32(v % 256)) / 256) * float32(((i+1)*3071)%65535)

embedding[i] = c

}

}

coderabbitai · 2024-12-15T20:15:23Z

internal/infrastructure/ai/openai.go

 	client := openai.NewClient(option.WithAPIKey(cfg.OpenAIKey))
-	return &OpenAI{client: client}, nil
+	ef := chromem.NewEmbeddingFuncOpenAI(cfg.OpenAIKey, chromem.EmbeddingModelOpenAI3Large)
+	return &OpenAI{client: client, ef: ef}, nil


⚠️ Potential issue

Add error handling for embedding function initialization

The embedding function initialization should include error handling as it might fail.

func newOpenAI(cfg config.Config) (*OpenAI, error) { client := openai.NewClient(option.WithAPIKey(cfg.OpenAIKey)) - ef := chromem.NewEmbeddingFuncOpenAI(cfg.OpenAIKey, chromem.EmbeddingModelOpenAI3Large) - return &OpenAI{client: client, ef: ef}, nil + ef, err := chromem.NewEmbeddingFuncOpenAI(cfg.OpenAIKey, chromem.EmbeddingModelOpenAI3Large) + if err != nil { + return nil, fault.Wrap(err, fctx.With(ctx)) + } + return &OpenAI{client: client, ef: ef}, nil }

Committable suggestion skipped: line range outside the PR's diff.

restructure responsibilities around indexing and generative content

5d8b329

- remove autotag/summarise from index consumer - fix some issues with auto tag - refactor node mutation logic

coderabbitai bot reviewed Dec 14, 2024

View reviewed changes

barneyferry added 2 commits December 15, 2024 15:08

add a mock embedding provider

9854967

use custom embedding provider in chromem

9e38766

coderabbitai bot reviewed Dec 15, 2024

View reviewed changes

Southclaws merged commit 4613bb4 into main Dec 19, 2024
4 checks passed

Southclaws deleted the rework-node-mutation-fill-rules branch December 19, 2024 18:41

This was referenced Dec 20, 2024

add title fill rule #323

Merged

rework middleware and add frontend proxy #331

Merged

This was referenced Jan 1, 2025

add pinecone semdexer #340

Merged

rework reindexing and update pinecone dedupe #341

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

restructure responsibilities around indexing and generative content #317

restructure responsibilities around indexing and generative content #317

Southclaws commented Dec 14, 2024

vercel bot commented Dec 14, 2024 •

edited

Loading

coderabbitai bot commented Dec 14, 2024 •

edited

Loading

Walkthrough

Changes

Possibly related PRs

Poem

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (`.coderabbit.yaml`)

Documentation and Community

coderabbitai bot left a comment

coderabbitai bot Dec 14, 2024

coderabbitai bot Dec 14, 2024

coderabbitai bot Dec 14, 2024

coderabbitai bot Dec 14, 2024

coderabbitai bot left a comment

coderabbitai bot Dec 15, 2024

coderabbitai bot Dec 15, 2024

		// failing to publish the deletion message is worthy of an error.
		return nil, fault.Wrap(err, fctx.With(ctx))

		ID: library.NodeID(library.NodeID(cfr.TargetNodeID)),
		SummariseContent: true,

restructure responsibilities around indexing and generative content #317

restructure responsibilities around indexing and generative content #317

Conversation

Southclaws commented Dec 14, 2024

vercel bot commented Dec 14, 2024 • edited Loading

coderabbitai bot commented Dec 14, 2024 • edited Loading

Walkthrough

Changes

Possibly related PRs

Poem

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Documentation and Community

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot Dec 14, 2024

Choose a reason for hiding this comment

coderabbitai bot Dec 14, 2024

Choose a reason for hiding this comment

coderabbitai bot Dec 14, 2024

Choose a reason for hiding this comment

coderabbitai bot Dec 14, 2024

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot Dec 15, 2024

Choose a reason for hiding this comment

coderabbitai bot Dec 15, 2024

Choose a reason for hiding this comment

vercel bot commented Dec 14, 2024 •

edited

Loading

coderabbitai bot commented Dec 14, 2024 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)