5 custom model training #15
Conversation
Note: Reviews paused.
**Walkthrough**

The pull request introduces a comprehensive redesign of the IntentGuard project's architecture, focusing on modularizing the code verification system. Key changes include creating new packages and classes for inference providers, prompt factories, judgement caches, and domain models. The implementation introduces a flexible framework for code assertion using language models, with support for different providers, caching mechanisms, and evaluation strategies. The changes include adding dataset generation tools, model training scripts, and infrastructure for local model inference using Llamafile.

**Assessment against linked issues**
Actionable comments posted: 17
🧹 Nitpick comments (47)
ai_research/model_training/install.sh (3)
`1-2`: **Add execution permissions to the script**

Ensure the script is executable by running:

```bash
chmod +x install.sh
```
`3-3`: **Consider pinning PyTorch version for reproducibility**

The PyTorch installation lacks version pinning, which could lead to inconsistent environments across different setups.

```diff
-python -m pip install torch --index-url https://download.pytorch.org/whl/cu121 || exit 1
+python -m pip install torch==2.1.2 --index-url https://download.pytorch.org/whl/cu121 || exit 1
```
`1-6`: **Add error handling and user feedback**

The script would benefit from better error handling and user feedback.
Add these improvements:
- Set error handling flags
- Add progress messages
- Verify Python version
- Check CUDA availability
```diff
 #!/bin/bash
+set -euo pipefail
+
+echo "Checking Python environment..."
+python --version || { echo "Python not found"; exit 1; }
+
+echo "Checking CUDA availability..."
+if ! python -c "import torch; print(torch.cuda.is_available())" 2>/dev/null; then
+  echo "Warning: CUDA may not be available"
+fi
+
+echo "Installing PyTorch..."
 python -m pip install torch --index-url https://download.pytorch.org/whl/cu121 || exit 1
+
+echo "Installing requirements..."
 python -m pip install -r requirements.txt || exit 1
```

ai_research/model_training/run.sh (2)
`3-3`: **Add error handling and logging for installation.**

While the script handles installation failure with `exit 1`, it would be beneficial to add logging for better debugging.

```diff
-bash ./install.sh || exit 1
+echo "Starting installation..."
+if ! bash ./install.sh; then
+  echo "Installation failed. Check install.sh logs for details."
+  exit 1
+fi
+echo "Installation completed successfully."
```
`1-7`: **Consider adding script documentation and environment validation.**

The script would benefit from:
- A header comment explaining its purpose and requirements
- Environment validation (Python version, gcloud auth)
- Cleanup of temporary files if any are created during training
Add this at the beginning of the script:
```diff
 #!/bin/bash
+
+# IntentGuard Model Training Runner
+# This script automates the model training pipeline on Google Cloud:
+# 1. Installs dependencies
+# 2. Runs model training
+# 3. Shuts down the compute instance on completion
+#
+# Requirements:
+# - Python 3.8+
+# - Authenticated gcloud CLI
+# - Google Cloud compute instance metadata access
+
+# Validate environment
+command -v python >/dev/null 2>&1 || { echo "Python is required but not installed."; exit 1; }
+command -v gcloud >/dev/null 2>&1 || { echo "gcloud CLI is required but not installed."; exit 1; }
+
+# Ensure we're running on a Google Cloud instance
+if ! curl -sf -H "Metadata-Flavor: Google" http://metadata.google.internal > /dev/null; then
+  echo "This script must be run on a Google Cloud compute instance."
+  exit 1
+fi
```

scripts/prepare.py (3)
`26-29`: **Consider handling I/O errors explicitly.**

The `verify_checksum` function does not handle potential I/O or permission errors when reading the file. While it's not mandatory, wrapping these calls in a try-except block can help diagnose file access issues more gracefully.
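For illustration, the wrapping could look roughly like this (the function name and the 4096-byte block size follow this review's description of `scripts/prepare.py`; the exact signature is an assumption):

```python
import hashlib
import logging

logger = logging.getLogger(__name__)


def verify_checksum(file_path: str, expected: str) -> bool:
    """Compare a file's SHA-256 digest against an expected value,
    reporting I/O problems instead of letting them propagate raw."""
    try:
        sha256 = hashlib.sha256()
        with open(file_path, "rb") as f:
            for block in iter(lambda: f.read(4096), b""):
                sha256.update(block)
        return sha256.hexdigest() == expected
    except OSError as e:  # missing file, permissions, disk errors, ...
        logger.warning("Could not read %s for checksum verification: %s", file_path, e)
        return False
```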
`31-47`: **Optional: Provide a progress indicator or logging for large downloads.**

Currently, file download progress isn't shown. If the file is large, consider implementing a more granular progress reporting mechanism or structured logging to keep track of download progress.
`62-86`: **Consider enhanced logging for easier debugging.**

intentguard/app/judgement_cache.py (1)
`44-66`: **Cache update policy is left to the concrete implementation.**

This design choice is flexible, but consider documenting typical invalidation or update scenarios for new contributors.

ai_research/dataset_generation/domain/example.go (3)
`89-105`: **Regex for code objects is quite specific.**

The approach captures code segments using Python fences. Consider parameterizing or extending it if other languages should be supported.
`117-125`: **Code section extraction is strict.**

The code section must appear before `[Thinking]`. This constraint is fine if guaranteed by upstream data, but adding robust fallback logic could help if inputs vary.
`127-135`: **Assertion text parsing is strict.**

Similar to code section parsing, the function expects `[Assertion]` preceding `[Code]`. If future data format variations arise, consider more flexible parsing or failover strategies.

intentguard/infrastructure/llamafile.py (1)
`18-22`: **Consider making constants configurable**

These constants are hardcoded but might need to be adjusted based on different deployment scenarios or model requirements.

Consider making these configurable through environment variables or a configuration file:

```diff
-STARTUP_TIMEOUT_SECONDS = 120  # 2 minutes
-INFERENCE_TIMEOUT_SECONDS = 300  # 5 minutes
-CONTEXT_SIZE = 8192
-MODEL_FILENAME = "IntentGuard-1.Q8_0.gguf"
-MODEL_NAME = "IntentGuard-1"
+STARTUP_TIMEOUT_SECONDS = int(os.getenv("LLAMAFILE_STARTUP_TIMEOUT", 120))  # 2 minutes
+INFERENCE_TIMEOUT_SECONDS = int(os.getenv("LLAMAFILE_INFERENCE_TIMEOUT", 300))  # 5 minutes
+CONTEXT_SIZE = int(os.getenv("LLAMAFILE_CONTEXT_SIZE", 8192))
+MODEL_FILENAME = os.getenv("LLAMAFILE_MODEL_FILENAME", "IntentGuard-1.Q8_0.gguf")
+MODEL_NAME = os.getenv("LLAMAFILE_MODEL_NAME", "IntentGuard-1")
```

intentguard/app/intentguard.py (1)
`26-28`: **Add type hints to class variables**

The class variables lack type hints which could help with static type checking.

Add type hints to class variables:

```diff
-    _inference_provider: InferenceProvider
-    _prompt_factory: PromptFactory
-    _judgement_cache_provider: JudgementCache
+    _inference_provider: ClassVar[InferenceProvider]
+    _prompt_factory: ClassVar[PromptFactory]
+    _judgement_cache_provider: ClassVar[JudgementCache]
```

ai_research/model_training/intentguard_finetuning.py (1)
`11-17`: **Make output directories configurable**

The output directories are hardcoded but should be configurable.

Make the directories configurable through environment variables:

```diff
-MAX_SEQ_LENGTH = 128000
-DTYPE = None
-LOAD_IN_4BIT = True
-OUTPUT_DIR = "outputs"
-LORA_OUTPUT_DIR = "lora_model"
-GGUF_OUTPUT_DIR = "model"
-GGUF_QUANTIZATION_METHOD = "q4_k_m"
+MAX_SEQ_LENGTH = int(os.getenv("MAX_SEQ_LENGTH", 128000))
+DTYPE = os.getenv("DTYPE", None)
+LOAD_IN_4BIT = os.getenv("LOAD_IN_4BIT", "true").lower() == "true"
+OUTPUT_DIR = os.getenv("OUTPUT_DIR", "outputs")
+LORA_OUTPUT_DIR = os.getenv("LORA_OUTPUT_DIR", "lora_model")
+GGUF_OUTPUT_DIR = os.getenv("GGUF_OUTPUT_DIR", "model")
+GGUF_QUANTIZATION_METHOD = os.getenv("GGUF_QUANTIZATION_METHOD", "q4_k_m")
```

ai_research/dataset_generation/cli/root.go (1)
`10-21`: **Excellent Command Setup**

Using `rootCmd` for the top-level command is a common pattern, and the `Execute` function handles errors gracefully. Consider adding usage examples and/or help text for new subcommands.

intentguard/app/inference_options.py (1)
`5-18`: **Consider providing a default value for `temperature` or documenting safe defaults.**

Currently, the `InferenceOptions` class requires the caller to explicitly provide a `temperature`. Many frameworks set a default value (e.g., `0.7`) for convenience. Providing a default (or clarifying valid ranges) might improve usability, especially if developers are unsure about the appropriate level of randomness.
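As a sketch, a default could be expressed directly on the dataclass field (the field set shown is illustrative, not the full class):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InferenceOptions:
    # 0.4 mirrors the default used by IntentGuardOptions elsewhere in
    # this PR; any documented value in (0, 1] would serve as well.
    temperature: float = 0.4
```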
intentguard/__init__.py (1)
`7-8`: **Validate or initialize FsJudgementCache with caution.**

Consider whether the `FsJudgementCache` requires initialization parameters (e.g., a cache directory) that might need to be configurable. Document or parameterize such options to avoid runtime issues if file permissions or directory structure are different from defaults.
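A hypothetical parameterized constructor, just to show the shape (the `.intentguard` default comes from the `fs_judgement_cache.py` review below; everything else is an assumption):

```python
from pathlib import Path


class FsJudgementCache:
    def __init__(self, cache_dir: str = ".intentguard") -> None:
        # Letting callers override the directory avoids failures when
        # the default location is read-only or otherwise undesirable.
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
```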
intentguard/app/message.py (1)
`5-23`: **Consider enforcing strict role constraints.**

Currently, the `role` is an unconstrained string. Using an enum (e.g., `'system'`, `'user'`, `'assistant'`) or a dedicated type can prevent accidental typos and make the code more self-documenting.
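A minimal sketch of that idea (names are illustrative; the real `Message` class may carry different fields):

```python
from dataclasses import dataclass
from enum import Enum


class Role(str, Enum):
    SYSTEM = "system"
    USER = "user"
    ASSISTANT = "assistant"


@dataclass(frozen=True)
class Message:
    role: Role
    content: str


# A typo such as Role.ASISTANT now fails immediately instead of
# silently producing an invalid prompt role.
msg = Message(role=Role.USER, content="Does {obj} satisfy the assertion?")
```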
ai_research/dataset_generation/infrastructure/storage.go (2)
`15-34`: **Evaluate the use of `log.Fatal` for error handling.**

While `log.Fatalf` is straightforward, it abruptly terminates the program. If partial failures are tolerable, consider returning errors up the call stack or using structured error handling. This approach improves testability and makes the system more resilient.
`24-34`: **Assess concurrency implications for file output.**

Appending lines to `output.jsonl` can cause contention in multi-threaded or multi-process environments. If concurrency is expected, you may want a locking mechanism or a more robust solution (e.g., a queue or a database) to avoid interleaved writes.

ai_research/dataset_generation/domain/assertion.go (1)
`14-35`: **Improve edge case handling for parseAssertion.**

The function handles missing matches gracefully by returning an empty slice. However, consider logging or returning an error if the assertion text is entirely malformed. That might help with debugging.

```diff
 func parseAssertion(assertionText string) Assertion {
 	re := regexp.MustCompile(`\{(?P<object>[^}]*)}`)
 	trimmedAssertionText := strings.Trim(assertionText, `"`)
 	matches := re.FindAllStringSubmatch(assertionText, -1)
 	if matches != nil {
 		// existing code ...
 	}
-	return Assertion{
+	// Optionally, log or handle malformed input:
+	log.Printf("Warning: no code object matches found in assertion text: %s", assertionText)
+	return Assertion{
 		AssertionText:   trimmedAssertionText,
 		CodeObjectNames: []string{},
 	}
 }
```
`9-21`: **Error-handling strategy.**

Skipping problematic examples instead of raising errors is appropriate if partial data is acceptable. Consider returning an aggregated error list to aid debugging.
`18-29`: **Mismatch in expected count comment.**

Line 21 states "Expected 2 code object names" but your logic expects 3. Update the comment to avoid confusion.

```diff
-		t.Errorf("Expected 2 code object names, got %d", len(assertion.CodeObjectNames))
+		t.Errorf("Expected 3 code object names, got %d", len(assertion.CodeObjectNames))
```
`1-3`: **Use type annotations consistently for clarity.**

While the `predict` method has type annotations, the imports from `typing` (List, etc.) hint that you value explicit type usage. Consider also applying type hints to class variables or other function parameters you might add in the future.
`9-17`: **Document explicit usage scenarios.**

The class docstring is descriptive, but it can be improved by illustrating example usage or clarifying the expected structure of messages in the prompt. Future maintainers and integrators will benefit from concrete examples.
ai_research/dataset_generation/cli/transform.go (3)
`14-18`: **Consider stricter error handling for CLI users.**

When input/output files are absent or fail to open, the command logs a fatal error and exits. This is acceptable for small internal scripts but might not be optimal for broader CLI usage. You might consider returning a user-friendly message or fallback logic if you anticipate partial or typical user error.
`31-49`: **Validate JSON data before processing.**

The loop decodes each JSON line into an `OutputLine`. Currently, if the JSON structure is incorrect, you break out of the loop but do not warn or track the malformed line. Logging partial successes versus failures might improve traceability for large datasets.
`53-57`: **Make flag naming consistent across CLI commands.**

Using the same naming patterns (like `--input-file`/`--output-file`) across all commands can improve discoverability. Consider aligning with other commands in this repository (e.g., the `generate` command if it uses different naming).

tests/test_intentguard.py (1)
`43-46`: **Evaluate edge cases for consensus logic.**

The test ensures consensus is determined for a given `num_evaluations` but doesn't verify edge cases (e.g., `num_evaluations = 1` or extremely high values). Consider additional tests to confirm correct behavior.
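A sketch of what such tests might look like (assuming, as the existing test suggests, that `IntentGuardOptions` exposes `num_evaluations`; the actual assertion call is elided because it depends on the configured provider):

```python
import unittest

from intentguard import IntentGuardOptions


class TestConsensusEdgeCases(unittest.TestCase):
    def test_single_evaluation(self):
        # With one evaluation, the lone judgement is the consensus.
        options = IntentGuardOptions(num_evaluations=1)
        self.assertEqual(options.num_evaluations, 1)

    def test_high_evaluation_count(self):
        # A large (odd) count should be accepted; ties stay impossible.
        options = IntentGuardOptions(num_evaluations=99)
        self.assertEqual(options.num_evaluations, 99)
```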
ai_research/dataset_generation/infrastructure/llm_provider.go (1)
`30-53`: **Allow customization of retry strategy.**

The provider sets a fixed maximum of 360 retries with a 10-second delay, which might be too high for some use cases. Add a parameter for customizing retry behavior or consider more dynamic backoff strategies to avoid excessive waiting.
ai_research/dataset_generation/cli/generate.go (2)
`11-15`: **Consider scoping these variables more narrowly.**

Defining multiple global variables can complicate concurrency or future refactoring if the CLI grows in features. It may be cleaner and safer to define these variables within the command's `Run` function or a dedicated configuration struct if they are only needed there.
`56-58`: **Question the default provider choice.**

Using `"groq"` as the default provider may be fine if that is the most common provider in your environment. Otherwise, consider a more neutral default or require it to be specified.

intentguard/domain/code_object.py (1)
`27-67`: **Robust exception handling in source extraction.**

The method gracefully handles `TypeError` and other unexpected exceptions by logging and providing fallback source text. This enhances resilience against object types that lack inspectable source.

ai_research/model_training/requirements.txt (1)
`1-4`: **Pin versions for reproducibility**

Consider specifying versions for these dependencies (e.g., `jinja2==...`) to ensure consistent, reproducible environments for training.
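For example (the version numbers below are illustrative placeholders; pin to whatever versions the training run has actually been validated against):

```text
jinja2==3.1.4
datasets==2.20.0
trl==0.9.6
transformers==4.44.0
```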
intentguard/domain/judgement_options.py (1)
`4-18`: **Consider deferring this abstraction until needed.**

While the documentation clearly outlines potential future features, introducing an empty dataclass might be premature. Consider either:
- Adding at least one configuration option now, or
- Deferring this class until concrete requirements emerge.
If you have immediate plans to implement any of the mentioned features (voting thresholds, weighting strategies, etc.), I can help design and implement them. Would you like to discuss the implementation details?
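If the first route is taken, a minimal version might look like this (the `voting_threshold` field is hypothetical, named after the features the docstring mentions):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JudgementOptions:
    # Fraction of positive votes required to accept a judgement;
    # 0.5 corresponds to a simple majority.
    voting_threshold: float = 0.5
```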
intentguard/app/intentguard_options.py (1)
`12-12`: **Consider documenting temperature's impact.**

While the temperature parameter is documented, it would be helpful to explain:
- How this value affects the model's behavior
- Why 0.4 was chosen as the default
- Recommended ranges for different use cases
Add more detailed documentation:
```diff
     temperature (float, optional): The temperature parameter for the LLM.
-        Defaults to 0.4.
+        Controls randomness in model outputs. Lower values (0.1-0.4)
+        produce more focused and deterministic responses, while higher
+        values (0.5-1.0) increase creativity and diversity. Defaults
+        to 0.4 for balanced precision and creativity.
```

Also applies to: 21-22
ai_research/dataset_generation/README.md (3)
`12-23`: **Consider adding example entries.**

While the field descriptions are clear, including sample JSON entries would make it easier for users to understand the exact format.

Add example entries like:

### Example Entry

```json
{
  "assertion": {
    "assertionText": "The function always returns a positive number",
    "codeObjectNames": ["calculate_square"]
  },
  "codeObjects": [{
    "name": "calculate_square",
    "code": "def calculate_square(x): return x * x"
  }],
  "thoughts": "Since multiplication of any number by itself produces..."
}
```

`15-19`: **Fix markdown list indentation.**

The nested list indentation is inconsistent with markdown standards.

```diff
-    * `assertionText`: A natural language assertion describing a positive property of the code.
-    * `codeObjectNames`: A list of strings, each representing the name of a code component referenced in the assertion.
-* **`codeObjects`**: A list of code objects, each with:
-    * `code`: A string containing a Python code snippet.
-    * `name`: A string representing the name of the code component.
+  * `assertionText`: A natural language assertion describing a positive property of the code.
+  * `codeObjectNames`: A list of strings, each representing the name of a code component referenced in the assertion.
+* **`codeObjects`**: A list of code objects, each with:
+  * `code`: A string containing a Python code snippet.
+  * `name`: A string representing the name of the code component.
```
`27-28`: **Consider adding dataset validation tools.**

Since this dataset is crucial for model training, it would be helpful to provide tools for validating dataset entries.

Would you like me to help create a validation script that ensures the following? (A rough sketch follows the list.)
- JSON schema compliance
- Code syntax validation
- Assertion-code relationship verification
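As a starting point, a checker along these lines could cover all three (field names follow the README above; treat this as a sketch, not a finished tool):

```python
import ast
import json
import sys

REQUIRED_KEYS = {"assertion", "codeObjects", "thoughts"}


def validate_line(line: str) -> list:
    """Return a list of problems found in one JSON Lines entry."""
    try:
        entry = json.loads(line)
    except json.JSONDecodeError as e:
        return [f"invalid JSON: {e}"]
    problems = [f"missing key: {k}" for k in sorted(REQUIRED_KEYS - entry.keys())]
    # Code syntax validation for every code object.
    for obj in entry.get("codeObjects", []):
        try:
            ast.parse(obj.get("code", ""))
        except SyntaxError as e:
            problems.append(f"{obj.get('name', '?')}: syntax error: {e}")
    # Assertion-code relationship: every referenced name must be defined.
    defined = {o.get("name") for o in entry.get("codeObjects", [])}
    referenced = set(entry.get("assertion", {}).get("codeObjectNames", []))
    if not referenced <= defined:
        problems.append(f"unresolved references: {sorted(referenced - defined)}")
    return problems


if __name__ == "__main__":
    with open(sys.argv[1], encoding="utf-8") as f:
        for number, line in enumerate(f, start=1):
            for problem in validate_line(line):
                print(f"line {number}: {problem}")
```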
ai_research/dataset_generation/data/prompt.txt (4)
`1-9`: **Enhance template clarity and markdown consistency.**

Consider these improvements for better clarity and maintainability:

- Add a description of valid categories where `{{.Category}}` is used
- Use markdown-style numbered list for consistency

```diff
-Your task is to generate high-quality training examples for the **{{.Category}}** category. Each example includes:
+Your task is to generate high-quality training examples for the **{{.Category}}** category (e.g., "Error Handling", "Design Patterns", etc.). Each example includes:

-1. A natural language assertion describing positive properties of the code.
-2. Python code snippets for each referenced component.
-3. A [Thinking] section that provides a chain-of-thought analysis of the assertion and code, assuming no prior knowledge of whether this is a positive or negative example.
-4. If it is a negative example, an explanation detailing why the code does not meet the assertion.
+1. A natural language assertion describing positive properties of the code
+2. Python code snippets for each referenced component
+3. A [Thinking] section that provides a chain-of-thought analysis of the assertion and code, assuming no prior knowledge of whether this is a positive or negative example
+4. If it is a negative example, an explanation detailing why the code does not meet the assertion
```
🪛 LanguageTool
[style] ~8-~8: As an alternative to the over-used intensifier ‘very’, consider replacing this phrase.
Context: ...sertion. Follow the instructions below very carefully. --- ### FORMAT AND EXAMPLE STRUCTURE...(EN_WEAK_ADJECTIVE)
`57-61`: **Enhance code quality requirements.**

The code guidelines should include specific criteria for quality and security:

```diff
 2. **Code Guidelines:**
    - For every `{name}` in the assertion, provide a corresponding Python code implementation block in the [Code] section.
    - The code should be realistic, follow Python best practices, and either demonstrate adherence to the assertion (positive example) or fail to do so (negative example).
    - Code should be coherent and self-contained. Even in negative examples, it should not be blatantly nonsensical.
+   - Code must follow security best practices (e.g., input validation, proper error handling).
+   - Each component should not exceed 50 lines of code for clarity.
+   - Complexity should be kept minimal (max cyclomatic complexity of 5).
+   - Include type hints and docstrings for better documentation.
```
`94-125`: **Enhance error handling example with more scenarios.**

The error handling example could be more comprehensive:

```diff
 class DatabaseService:
     def __init__(self, error_handler):
         self.error_handler = error_handler
+        self.retries = 3

     def execute_query(self, query):
-        with self.error_handler.handle_operation("database_query"):
-            result = self._run_query(query)
-            return result
+        for attempt in range(self.retries):
+            with self.error_handler.handle_operation(f"database_query_attempt_{attempt + 1}"):
+                try:
+                    result = self._run_query(query)
+                    return result
+                except TemporaryDatabaseError:
+                    if attempt == self.retries - 1:
+                        raise
+                    continue

     def _run_query(self, query):
-        # Simulate the actual database operation
-        pass
+        # Example database operation with potential errors
+        if not query:
+            raise ValueError("Query cannot be empty")
+        # ... database operation logic ...
```
`195-195`: **Specify example count and quality criteria.**

The final instruction should be more specific about expectations:

```diff
-**Now produce new examples for the category ({{.Category}}) following these instructions.**
+**Now produce 3-5 high-quality examples for the category ({{.Category}}) following these instructions. Ensure each example:**
+1. Demonstrates a unique aspect of the category
+2. Contains realistic, production-ready code
+3. Maintains a balance between positive and negative cases
+4. Includes comprehensive thinking analysis
```
51-54
: Consider adding more stop sequences for better control.The current stop sequences are good, but consider adding more to prevent hallucination:
PARAMETER stop "<|start_header_id|>" PARAMETER stop "<|end_header_id|>" PARAMETER stop "<|eot_id|>" PARAMETER stop "<|eom_id|>" +PARAMETER stop "```json" +PARAMETER stop "```" +PARAMETER stop "[Code]" +PARAMETER stop "[Assertion]"
`58-205`: **Enhance system prompt with security considerations.**

The system prompt is well-structured but could benefit from explicit security considerations:

```diff
 Your task is to:

 1. **Parse and Interpret**:
    * Identify each named code object (e.g., `{user_service}`) and load its corresponding Python code.
    * Interpret the assertion, understanding the desired properties and how they relate to the code objects.
+   * Identify any security-sensitive components or operations in the code.

 2. **Evaluate**:
    * Analyze the code, step-by-step, to determine if it fulfills the assertion.
    * Check if all referenced components exist and are implemented as expected.
    * Verify if the described properties hold true for the provided code snippets.
+   * Check for potential security vulnerabilities or unsafe practices.
```
`208-217`: **Consider adding configuration options.**

The `LlamafilePromptFactory` class could benefit from configuration options:

```diff
 class LlamafilePromptFactory(PromptFactory):
+    def __init__(
+        self,
+        max_prompt_size: int = 8192,
+        sanitize_inputs: bool = True,
+        logging_level: int = logging.DEBUG
+    ):
+        """
+        Initialize the prompt factory with configuration options.
+
+        Args:
+            max_prompt_size: Maximum allowed prompt size in tokens
+            sanitize_inputs: Whether to sanitize input strings
+            logging_level: Logging level for debug messages
+        """
+        self.max_prompt_size = max_prompt_size
+        self.sanitize_inputs = sanitize_inputs
+        logger.setLevel(logging_level)
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (2)
- `ai_research/dataset_generation/go.sum` is excluded by `!**/*.sum`
- `poetry.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (50)
- `.cdigestignore` (1 hunks)
- `.github/workflows/main.yml` (2 hunks)
- `.gitignore` (1 hunks)
- `ai_research/.gitignore` (1 hunks)
- `ai_research/dataset_generation/.gitignore` (1 hunks)
- `ai_research/dataset_generation/README.md` (1 hunks)
- `ai_research/dataset_generation/cli/generate.go` (1 hunks)
- `ai_research/dataset_generation/cli/root.go` (1 hunks)
- `ai_research/dataset_generation/cli/transform.go` (1 hunks)
- `ai_research/dataset_generation/data/prompt.txt` (1 hunks)
- `ai_research/dataset_generation/domain/assertion.go` (1 hunks)
- `ai_research/dataset_generation/domain/assertion_test.go` (1 hunks)
- `ai_research/dataset_generation/domain/category.go` (1 hunks)
- `ai_research/dataset_generation/domain/code_object.go` (1 hunks)
- `ai_research/dataset_generation/domain/example.go` (1 hunks)
- `ai_research/dataset_generation/domain/example_test.go` (1 hunks)
- `ai_research/dataset_generation/domain/llm_response.go` (1 hunks)
- `ai_research/dataset_generation/go.mod` (1 hunks)
- `ai_research/dataset_generation/infrastructure/llm_provider.go` (1 hunks)
- `ai_research/dataset_generation/infrastructure/prompts.go` (1 hunks)
- `ai_research/dataset_generation/infrastructure/storage.go` (1 hunks)
- `ai_research/dataset_generation/main.go` (1 hunks)
- `ai_research/model_training/install.sh` (1 hunks)
- `ai_research/model_training/intentguard_finetuning.py` (1 hunks)
- `ai_research/model_training/requirements.txt` (1 hunks)
- `ai_research/model_training/run.sh` (1 hunks)
- `ai_research/result/IntentGuard-1.Modelfile` (1 hunks)
- `intentguard/__init__.py` (1 hunks)
- `intentguard/app/inference_options.py` (1 hunks)
- `intentguard/app/inference_provider.py` (1 hunks)
- `intentguard/app/intentguard.py` (1 hunks)
- `intentguard/app/intentguard_options.py` (1 hunks)
- `intentguard/app/judgement_cache.py` (1 hunks)
- `intentguard/app/message.py` (1 hunks)
- `intentguard/app/prompt_factory.py` (1 hunks)
- `intentguard/cache.py` (0 hunks)
- `intentguard/domain/code_object.py` (1 hunks)
- `intentguard/domain/evaluation.py` (1 hunks)
- `intentguard/domain/judge.py` (1 hunks)
- `intentguard/domain/judgement_options.py` (1 hunks)
- `intentguard/infrastructure/.gitignore` (1 hunks)
- `intentguard/infrastructure/fs_judgement_cache.py` (1 hunks)
- `intentguard/infrastructure/llamafile.py` (1 hunks)
- `intentguard/infrastructure/llamafile_prompt_factory.py` (1 hunks)
- `intentguard/intentguard.py` (0 hunks)
- `intentguard/prompts.py` (0 hunks)
- `pyproject.toml` (1 hunks)
- `scripts/prepare.py` (1 hunks)
- `tests/test_cache.py` (0 hunks)
- `tests/test_intentguard.py` (2 hunks)
💤 Files with no reviewable changes (4)
- intentguard/intentguard.py
- intentguard/prompts.py
- tests/test_cache.py
- intentguard/cache.py
✅ Files skipped from review due to trivial changes (7)
- .cdigestignore
- ai_research/.gitignore
- ai_research/dataset_generation/main.go
- ai_research/dataset_generation/.gitignore
- .gitignore
- intentguard/infrastructure/.gitignore
- ai_research/dataset_generation/go.mod
🧰 Additional context used
🪛 LanguageTool
ai_research/dataset_generation/data/prompt.txt
[style] ~8-~8: As an alternative to the over-used intensifier ‘very’, consider replacing this phrase.
Context: ...sertion. Follow the instructions below very carefully. --- ### FORMAT AND EXAMPLE STRUCTURE...
(EN_WEAK_ADJECTIVE)
[style] ~69-~69: The words ‘Explanation’ and ‘Explain’ are quite similar. Consider replacing ‘Explain’ with a different word.
Context: ...es (for Negative Examples Only):** - Explain why the code fails to meet the assertio...
(VERB_NOUN_SENT_LEVEL_REP)
ai_research/dataset_generation/README.md
[uncategorized] ~9-~9: Loose punctuation mark.
Context: ... and test.json
. * train.jsonl
: Contains training examples, one per lin...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~10-~10: Loose punctuation mark.
Context: ...n JSON Lines format. * test.json
: Contains a list of test examples in JSO...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~14-~14: Loose punctuation mark.
Context: ...e following fields: * assertion
: * assertionText
: A natural lang...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~16-~16: Loose punctuation mark.
Context: ...y of the code. * codeObjectNames
: A list of strings, each representing th...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~17-~17: Loose punctuation mark.
Context: ... in the assertion. * codeObjects
: A list of code objects, each with: ...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~19-~19: Loose punctuation mark.
Context: ...ng a Python code snippet. * name
: A string representing the name of the c...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~21-~21: Loose punctuation mark.
Context: ...o meet the assertion. * thoughts
: A string containing a chain-of-thought ...
(UNLIKELY_OPENING_PUNCTUATION)
🪛 Markdownlint (0.37.0)
ai_research/dataset_generation/README.md
15-15: Expected: 2; Actual: 4
Unordered list indentation
(MD007, ul-indent)
16-16: Expected: 2; Actual: 4
Unordered list indentation
(MD007, ul-indent)
18-18: Expected: 2; Actual: 4
Unordered list indentation
(MD007, ul-indent)
19-19: Expected: 2; Actual: 4
Unordered list indentation
(MD007, ul-indent)
🪛 Ruff (0.8.2)
intentguard/infrastructure/llamafile.py
170-170: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling (B904)
🔇 Additional comments (62)
ai_research/model_training/install.sh (1)
`4-4`: **Verify requirements.txt existence and content**

The script assumes requirements.txt exists but doesn't verify it.
✅ Verification successful
**Requirements file exists and contains valid Python packages**

The verification confirms that `requirements.txt` exists in the expected location (`ai_research/model_training/requirements.txt`) and contains a list of valid Python packages (jinja2, datasets, trl, transformers) needed for model training. The install script's assumption about the file's existence is correct.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
```bash
#!/bin/bash
# Check if requirements.txt exists and show its content
if [ -f "ai_research/model_training/requirements.txt" ]; then
  echo "requirements.txt found. Content:"
  cat "ai_research/model_training/requirements.txt"
else
  echo "requirements.txt not found in expected location"
fi
```
Length of output: 214
ai_research/model_training/run.sh (2)
`1-2`: **LGTM: Proper shebang and spacing.**

The script starts with the correct shebang line and maintains good readability with proper spacing.
`7-7`: 🛠️ **Refactor suggestion: Enhance cloud instance shutdown with error handling and validation.**
The current implementation of the cloud instance shutdown has several potential issues:
- No error handling for metadata server failures
- No validation of instance name/zone retrieval
- No confirmation of successful shutdown
```diff
-gcloud compute instances stop "$(curl -H "Metadata-Flavor: Google" http://metadata.google.internal/computeMetadata/v1/instance/name)" --zone="$(curl -H "Metadata-Flavor: Google" http://metadata.google.internal/computeMetadata/v1/instance/zone | awk -F'/' '{print $NF}')"
+# Fetch instance details
+echo "Fetching instance metadata..."
+INSTANCE_NAME=$(curl -sf -H "Metadata-Flavor: Google" http://metadata.google.internal/computeMetadata/v1/instance/name)
+INSTANCE_ZONE=$(curl -sf -H "Metadata-Flavor: Google" http://metadata.google.internal/computeMetadata/v1/instance/zone | awk -F'/' '{print $NF}')
+
+# Validate metadata retrieval
+if [ -z "$INSTANCE_NAME" ] || [ -z "$INSTANCE_ZONE" ]; then
+  echo "Failed to retrieve instance metadata"
+  exit 1
+fi
+
+echo "Stopping instance $INSTANCE_NAME in zone $INSTANCE_ZONE..."
+if ! gcloud compute instances stop "$INSTANCE_NAME" --zone="$INSTANCE_ZONE" --quiet; then
+  echo "Failed to stop instance"
+  exit 1
+fi
+echo "Instance stopped successfully."
```
Let's verify the dependencies and related files:

✅ Verification successful

**Enhance cloud instance shutdown with error handling and validation**

The suggested refactoring is appropriate given the context:

- The `run.sh` script already has error handling for installation (`|| exit 1`) but lacks it for the critical instance shutdown operation
- The script is part of a model training workflow where reliable cleanup is essential
- No existing error handling or validation for the metadata retrieval and instance shutdown
- The suggested changes align with the error handling pattern used in `install.sh`
Key improvements in the suggested changes:
- Separates metadata retrieval for better error handling
- Adds validation for instance details
- Provides feedback on the shutdown process
- Maintains consistent error handling with the rest of the script
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
```bash
#!/bin/bash
# Check for the presence of required files and permissions
echo "Verifying script dependencies..."
# Check for install.sh
if [ ! -f "./install.sh" ]; then
  echo "Missing install.sh"
fi
# Check for training script
if [ ! -f "./intentguard_finetuning.py" ]; then
  echo "Missing intentguard_finetuning.py"
fi
# Check execute permissions
ls -l run.sh install.sh
```
Length of output: 411
Script:
```bash
#!/bin/bash
# Let's first locate the run.sh file and check its content
fd -t f "run.sh$"
# Check the directory structure around run.sh
fd -t f . "ai_research/model_training"
# Check if there are any other similar shutdown scripts
rg -l "gcloud compute instances stop"
```
Length of output: 335
Script:
```bash
#!/bin/bash
# Let's check the content of run.sh to understand the full context
cat ai_research/model_training/run.sh
# Check if there are any dependencies or error handling patterns in install.sh
cat ai_research/model_training/install.sh
# Check if there are any environment variables or configuration settings
cat ai_research/model_training/requirements.txt
```
Length of output: 849
scripts/prepare.py (2)
`17-23`: **Good use of chunk-based reading for large files.**

Using a 4096-byte block size in `compute_checksum` helps avoid excessive memory usage with large files. This is a best practice.
`49-60`: **Nice handling of partial downloads and re-downloading.**

This pattern of verifying the checksum and re-downloading if incorrect is robust.
pyproject.toml (2)
`15-33`: **Well-defined include/exclude sets.**

Excluding transitory directories (e.g., `.run`, `dist`, `tests`) prevents unnecessary files from being packaged. Ensure that excluding the `scripts` folder is truly desired if scripts are intended to be distributed.
`38-39`: **Helpful addition of the `prepare` script.**

Defining `prepare = "scripts.prepare:main"` provides a convenient entry point for environment setup. This aligns well with the new workflow steps.
.github/workflows/main.yml (2)
`37-39`: **Appropriate placement for the prepare step in the test job.**

Adding `poetry run prepare` ensures that all required files are downloaded and verified before tests run. This reduces failures caused by missing or corrupted dependencies.
`73-75`: **Consistent usage of the prepare step in the publish job.**

Ensuring the same environment setup for publishing helps maintain parity with the test environment.
intentguard/app/prompt_factory.py (3)
`1-3`: **Imports are concise and appropriately typed.**

No issues found here regarding correctness or security.
`8-16`: **Well-defined abstract base class.**

The docstring clearly outlines the purpose and usage of `PromptFactory`, making it easy for future implementers to understand how prompt generation should be structured.
`18-42`: **Method contract clarity.**

The `create_prompt` method provides a well-defined signature and thorough documentation. This ensures maintainers and implementers know exactly what to return (a list of `Message` objects) and how to shape the prompt logic.
intentguard/app/judgement_cache.py (3)
`1-8`: **Imports & initial setup are straightforward.**

These imports align well with the rest of the codebase. No immediate concerns.
`10-17`: **Good use of an abstract class for caching.**

`JudgementCache` sets clear expectations for cache implementations, reducing the likelihood of inconsistent caching logic.
`19-43`: **Effective documentation for the `get` method.**

It clearly states the requirement of using prompt, inference options, and judgement options for the cache key, preventing accidental collisions.
intentguard/domain/judge.py (4)
`1-9`: **Logging usage is beneficial for debugging.**

The logger is appropriately set, offering granular debug information.
`11-18`: **Clear class docstring for `Judge`.**

The docstring provides straightforward insight into the voting mechanism, assisting future maintenance.
`20-28`: **Constructor straightforwardly stores configuration.**

Storing `judgement_options` at initialization is a clean approach, allowing easy customization of future expansions (like weighting or different thresholds).
`29-64`: **Majority vote logic is concise and robust.**

- Using a `Counter` simplifies vote tallying.
- Explanation retrieval for negative results is well-structured.
- Additional negative result details are obtained from the first failing evaluation, which is typically sufficient.
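The tallying idea, distilled into a standalone sketch (the actual `Judge` operates on `Evaluation` objects; this reduces it to plain booleans and explanations):

```python
from collections import Counter
from typing import List, Optional, Tuple


def majority_vote(
    results: List[bool], explanations: List[Optional[str]]
) -> Tuple[bool, Optional[str]]:
    """Tally votes; on a negative outcome, surface the first
    failing evaluation's explanation."""
    votes = Counter(results)
    if votes[True] > votes[False]:
        return True, None
    for result, explanation in zip(results, explanations):
        if not result:
            return False, explanation
    return False, None
```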
ai_research/dataset_generation/domain/example.go (6)
`1-15`: **Data model definition is clear.**

The `Example` struct fields—Assertion, CodeObjects, Thoughts, Explanation—are self-explanatory. This fosters maintainability.
`17-50`: **Example parsing pipeline is well-organized.**

- The function calls for `parseAssertionText`, `parseCodeSection`, etc., demonstrate a clean separation of concerns.
- The final validation step (`validateExample`) helps keep consistent data.
`52-75`: **Validation ensures consistency between assertion and code objects.**

- Error messages are descriptive, aiding debugging when code object counts do not match.
- Checking for forbidden words in `explanation` is an effective guard.
`77-87`: **Explanation parsing is optional and well-handled.**

If the explanation is absent, the function returns `nil`, which is cleanly integrated into the data model.
`107-115`: **Thoughts parsing includes necessary error handling.**

Returning an error if none is found ensures the rest of the pipeline is halted appropriately.
`137-144`: **Custom JSON marshaling is straightforward.**

This approach to shaping the JSON structure is well-structured for external usage.
intentguard/infrastructure/fs_judgement_cache.py (5)
`1-6`: **Imports set the stage for hashing and file-based I/O.**

No issues. Package usage is appropriate.
`16-24`: **Class docstring thoroughly explains the file-based caching strategy.**

Future contributors can easily understand the approach of hashing input parameters for file naming.
`26-29`: **Initialization ties into a default local cache directory.**

Storing cached data in `.intentguard` is explicit and easy to locate or ignore via `.gitignore`.
`30-54`: **Hashing approach will keep collisions low, but consider size constraints.**

While SHA-256 is robust, keep in mind potential OS filename length limits in unusual environments, though unlikely an immediate issue.
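The filename-length point is easy to reason about: a SHA-256 hex digest is always 64 characters, well under typical 255-character filesystem limits. A sketch of such a key derivation (the parameter list here is hypothetical):

```python
import hashlib


def cache_key(*parts: str) -> str:
    """Derive a fixed-length, filesystem-safe name from arbitrary inputs."""
    digest = hashlib.sha256()
    for part in parts:
        digest.update(part.encode("utf-8"))
        digest.update(b"\x00")  # separator avoids ambiguous concatenation
    return digest.hexdigest()  # always 64 hex characters
```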
`55-90`: **`get` method gracefully handles file read errors.**

- Logging a cache miss as debug is helpful.
- Catching and warning about exceptions ensures graceful fallback.
ai_research/dataset_generation/domain/code_object.go (3)
`1-4`: **Package and Imports Look Good**

The package name `domain` is sensible, and using the built-in `encoding/json` library is appropriate for JSON marshaling.
`5-8`: **Struct Fields Are Clear and Consistent**

Defining `Name` and `Code` fields as strings is straightforward and expresses clear intent.
`10-15`: **Custom JSON Marshaling is Well-Structured**

The `MarshalJSON` implementation neatly wraps the `Name` and `Code` fields. Consider whether you might want to include additional fields or transformations in the future if the struct evolves.
ai_research/dataset_generation/infrastructure/prompts.go (2)
`1-8`: **Package and Constants Are Appropriate**

The `infrastructure` package is a logical choice for this functionality, and `promptFilename` is a helpful constant.
`10-18`: **Robust Error Handling**

The function reads the file reliably, and returning a wrapped error is good practice. The usage of `fmt.Errorf` ensures better traceability of file handling failures.
ai_research/dataset_generation/cli/root.go (1)
`1-9`: **CLI Structure Is Well Organized**

The package name `cli` and usage of `cobra` for the CLI is a standard, maintainable choice.
ai_research/dataset_generation/domain/category.go (3)
`1-3`: **No Immediate Issues With Imports**

Import statements are minimal and reflect usage of `math/rand`.
`5-19`: **Meaningful Category List**

The category entries are well-chosen; they cover a broad range of software quality considerations.
`21-23`: **Check for Potential Randomness Concerns**

`math/rand.Intn` relies on a global seed, which can lead to predictable outputs if the seed isn't set. If this randomness must be reproducible or more secure, consider explicitly seeding or using `crypto/rand`.
intentguard/__init__.py (2)
`10-11`: **Enhance error handling strategy for Llamafile usage.**

When setting up the `Llamafile` inference provider, consider anticipating service availability or permission issues. You could, for example, catch or verify successful initialization. This step will help surface potential misconfigurations more gracefully.
`13-14`: **Confirm the prompt factory configuration consistency.**

Check that the `LlamafilePromptFactory` interacts seamlessly with the `Llamafile` provider's input and output format. A mismatch can cause unexpected runtime failures if prompts are not properly formatted for the backend.
intentguard/domain/evaluation.py (3)
`1-2`: **Use of built-in dataclasses and Optional is appropriate.**

The imports for `dataclass` and `Optional` are correct and align with Pythonic best practices for strongly typed data structures.
`5-23`: **Docstring clarity and completeness.**

The docstring comprehensively explains the purpose of this class, including usage context. It's well-structured and valuable for other team members or users who rely on your code.
`25-26`: **Data fields are well-defined.**

Defining `result` as a `bool` and `explanation` as an `Optional[str]` is clear. This design supports flexible usage—some evaluations may not require an explanation.
ai_research/dataset_generation/domain/assertion.go (2)
`9-12`: **Struct naming clarity.**

The `Assertion` struct's naming and field definitions are intuitive, making it straightforward to understand how assertion data is stored.
`37-42`: **JSON marshaling.**

Using an inline map structure for marshaling is straightforward and provides consistent JSON output. This design ensures minimal overhead and flexible fields.
ai_research/dataset_generation/domain/llm_response.go (1)
`23-47`: **Robust text extraction approach.**

Using a regex approach to parse sections is elegant. However, confirm that new lines or additional whitespace won't break the pattern. Consider more test coverage with varied formatting to ensure reliability.

Would you like a specialized test added to ensure that multiple consecutive blank lines or unusual spacing conditions are gracefully handled?
ai_research/dataset_generation/domain/assertion_test.go (2)
`8-16`: **Example test clarifies function usage.**

Use of an example function with a comment about the expected console output is a great way to document usage.
`31-36`: **Empty string edge case tested.**

Ensuring zero code object names when the string is empty verifies your function's robustness for boundary conditions. Great coverage!
intentguard/app/inference_provider.py (1)
`19-37`: **Ensure child classes handle errors gracefully.**

Since `predict` is abstract, implementations must handle provider-specific error cases. For instance, network disruptions or malformed responses from external LLMs should be caught and surfaced. Consider instructing child classes to raise meaningful, domain-specific exceptions.
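One hypothetical shape for those domain-specific exceptions (none of these names exist in the PR; they only illustrate the suggestion):

```python
class InferenceError(Exception):
    """Base class for failures surfaced by an InferenceProvider."""


class ProviderTimeoutError(InferenceError):
    """The backend did not respond within the configured timeout."""


class MalformedResponseError(InferenceError):
    """The backend responded, but the payload could not be parsed."""
```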
tests/test_intentguard.py (1)
`1-6`: **Clarify why logging is set to INFO.**

For test environments, it might be more typical to set logging to WARNING or ERROR. Using INFO may produce extra output that can clutter test results. Consider revamping logs if they're strictly for debugging.
ai_research/dataset_generation/cli/generate.go (1)
`21-53`: **Be cautious with the infinite loop.**

The `for { ... }` loop continuously generates examples without a break condition. If this is intentional, consider documenting or logging the reason for the endless generation. Otherwise, introduce a mechanism to exit or limit the number of iterations for safer operation.
intentguard/domain/code_object.py (1)
`9-26`: **Class and docstring are well-structured.**

The `CodeObject` class and its docstring clearly describe the purpose and usage, making it straightforward for other developers to understand and integrate.
ai_research/dataset_generation/domain/example_test.go (4)
`1-2`: **Test file structure is consistent.**

The package layout, imports, and basic arrangement align with Go best practices for test files.
`8-60`: **Positive example coverage is comprehensive.**

The snippet thoroughly showcases business logic examples and how they're tested or demonstrated, aiding clarity for new contributors and maintainers.
`62-102`: **Contrast between assertions and actual implementations is clear.**

Highlighting the negative example fosters a strong understanding of how real implementations can diverge from defined business rules. This is helpful for teaching or demonstrating code smells.
`179-184`: **Excellent negative test coverage.**

The `Test_parseExplanation` function ensures correct handling of missing explanations by verifying that an explanation is nil in a scenario where it's not expected to exist.
intentguard/app/intentguard_options.py (1)
`24-25`: **LGTM! Clean implementation of the temperature parameter.**

The field initialization is straightforward and type-annotated correctly.
ai_research/dataset_generation/README.md (1)
`1-11`: **LGTM! Clear and concise dataset overview.**

The introduction and file structure are well-documented, providing a good high-level understanding of the dataset's purpose and organization.
ai_research/result/IntentGuard-1.Modelfile (2)
`55-56`: **Verify temperature and context window settings.**

The temperature (0.4) is reasonable for deterministic responses. The context window (8192) should be verified against the base model's capabilities.
`1-1`: **Verify the base model version and compatibility.**

The model file is based on `IntentGuard-1.Q8_0.gguf`. Let's verify its existence and compatibility.
✅ Verification successful

**The base model configuration is properly set up and verified**

The model configuration is correct and well-implemented:

- The base model `IntentGuard-1.Q8_0.gguf` is properly referenced in the Modelfile
- The model is downloaded from a verified source (HuggingFace) with SHA-256 checksum verification
- The infrastructure is set up to handle the model with proper error handling and verification
- The model is used with appropriate context size (8192) and server configuration

Key implementation details:

- Model URL: `https://huggingface.co/kdunee/IntentGuard-1/resolve/main/IntentGuard-1.Q8_0.gguf`
- SHA-256: `0cb9476a129e7fc68b419ab86397b9ce4309b0d5faf6ba5d18629e796ca01383`
- The `prepare.py` script ensures the model is downloaded and verified before use
- The `llamafile.py` properly configures and uses the model for inference
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
```bash
#!/bin/bash
# Description: Check if the base model exists and verify its format
# Expected: The base model should exist and be in GGUF format
# Search for the base model file
fd -e gguf 'IntentGuard-1.Q8_0'
# Verify the file header to confirm it's a GGUF model
if [ -f "IntentGuard-1.Q8_0.gguf" ]; then
  # GGUF files start with 0x67676d6c
  hexdump -n 4 -e '1/4 "%08x"' "IntentGuard-1.Q8_0.gguf"
fi
```
Length of output: 68
Script:
```bash
#!/bin/bash
# Let's check the Modelfile and related files to understand the model setup
# First, let's find all Modelfile(s) in the repository
fd Modelfile
# Then, let's examine their content to understand the model configuration
fd Modelfile -x cat {}
# Let's also check for any documentation or configuration files that might reference the model
rg -i "intentguard.*gguf" --type-add 'config:*.{json,yaml,yml,conf,config,toml}'
```
Length of output: 487
Script:
```bash
#!/bin/bash
# Let's examine the prepare.py script to understand the model setup
cat scripts/prepare.py
# Also check the llamafile.py to see how the model is used
cat intentguard/infrastructure/llamafile.py
# Let's verify if the infrastructure directory exists and check its contents
fd . intentguard/infrastructure/
```
Length of output: 9423
intentguard/infrastructure/llamafile_prompt_factory.py (1)
`1-8`: **LGTM: Clean imports and logger setup.**

The imports are well-organized and the logger setup follows best practices.
@coderabbitai pause

✅ **Actions performed**

Reviews paused.
LGTM 👍
closes #5
**Summary by CodeRabbit**

- New Features
- Infrastructure
- Documentation
- Chores