Commit de98478

Update README with evaluate command usage
1 parent 5cd740e commit de98478

1 file changed, +60 -2 lines changed
README.md

@@ -84,12 +84,70 @@ file.
 
 ## Evaluating an Implementation
 
-Comming soon...
+RubyTuner provides an `evaluate` command to assess various evaluation criteria
+for generated content and the original implementation of a feature. This command is
+useful for testing the output of fine-tuned models or for comparing different
+implementations.
+
+### Usage
+
+```bash
+ruby_tuner evaluate FEATURE_ID [IMPLEMENTATION]
+```
+
+**Parameters:**
+
+- `FEATURE_ID`: The ID of the feature to evaluate (required).
+- `IMPLEMENTATION`: The implementation to evaluate (optional).
+
+**Options:**
+
+* `--similarity-method METHOD`: Specify the similarity method to use (`tf_idf` or `exact`; default: `tf_idf`).
+* `--acceptance-score SCORE`: Set the similarity score that passes evaluation (default: `0.8`).
+* `--file PATH`: Specify a file containing the implementation to evaluate.
+
+### Examples
+
+Evaluate an inline implementation:
+
+```bash
+ruby_tuner evaluate sort-array "def sort_array(arr); arr.sort; end"
+```
+
+Evaluate an implementation from a file:
 
 ```bash
-ruby_tuner evaluate your-feature-description
+ruby_tuner evaluate sort-array --file ./implementations/sort_array.rb
+```
+
+Evaluate an implementation from standard input:
+
+```bash
+echo "def sort_array(arr); arr.sort; end" | ruby_tuner evaluate sort-array
+```
+
+Use a different similarity method and threshold:
+
+```
+ruby_tuner evaluate sort-array --similarity-method exact --similarity-threshold 0.9 "def sort_array(arr); arr.sort; end"
 ```
 
+### How it works
+
+The evaluate command compares the provided implementation with the original
+implementation stored in the feature's directory. It uses the specified
+similarity method to calculate a similarity score and determines if the
+implementation passes based on the similarity threshold.
+
+This command is particularly useful for:
+
+* Assessing the quality of generated code from fine-tuned models
+* Comparing different implementations of the same feature
+* Validating machine-generated code against human-written implementations
+
+The evaluation results, including similarity scores and pass/fail status, will
+be displayed in the console output.
+
 ## Generating Training Data
 
 Comming soon...
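
To make the "How it works" text added in this commit more concrete, here is a minimal Ruby sketch of the flow it describes: score a candidate against the original implementation with either method, then compare the score with the acceptance score. This is not RubyTuner's actual code. The module name `EvaluateSketch`, the tokenizer, the smoothed IDF formula, and the reading of `exact` as "identical after trimming whitespace" are all assumptions made for illustration. The threshold is named after the documented `--acceptance-score` option (default `0.8`); the last example in the diff passes `--similarity-threshold` instead, and the diff does not say whether both flags exist.

```ruby
# Hypothetical sketch; none of these names come from RubyTuner itself.
module EvaluateSketch
  ACCEPTANCE_SCORE = 0.8 # default documented in the README diff

  # Split source code into lowercase identifier-like tokens (assumed tokenizer).
  def self.tokenize(code)
    code.downcase.scan(/[a-z_][a-z0-9_]*/)
  end

  # Assumed meaning of "exact": 1.0 if the trimmed texts are identical, else 0.0.
  def self.exact(original, candidate)
    original.strip == candidate.strip ? 1.0 : 0.0
  end

  # Assumed meaning of "tf_idf": cosine similarity of TF-IDF vectors
  # built over a two-document corpus (original and candidate).
  def self.tf_idf(original, candidate)
    docs = [tokenize(original), tokenize(candidate)]
    return 0.0 if docs.any?(&:empty?)

    vocab = docs.flatten.uniq
    # Smoothed IDF so that terms shared by both texts keep a non-zero weight.
    idf = vocab.to_h do |term|
      df = docs.count { |tokens| tokens.include?(term) }
      [term, Math.log((1.0 + docs.size) / (1.0 + df)) + 1.0]
    end
    vectors = docs.map do |tokens|
      counts = tokens.tally
      vocab.map { |term| counts.fetch(term, 0) * idf[term] }
    end
    cosine(*vectors)
  end

  # Cosine of the angle between two equal-length numeric vectors.
  def self.cosine(u, v)
    dot = u.zip(v).sum { |x, y| x * y }
    norm = Math.sqrt(u.sum { |x| x * x }) * Math.sqrt(v.sum { |x| x * x })
    norm.zero? ? 0.0 : dot / norm
  end

  # Return the similarity score and whether it reaches the acceptance score.
  def self.evaluate(original, candidate, method: :tf_idf, acceptance_score: ACCEPTANCE_SCORE)
    score = method == :exact ? exact(original, candidate) : tf_idf(original, candidate)
    { score: score, passed: score >= acceptance_score }
  end
end
```

A call such as `EvaluateSketch.evaluate(original_source, generated_source, method: :exact, acceptance_score: 0.9)` returns a hash with `:score` and `:passed`, which mirrors the similarity score and pass/fail status the README says the command prints.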