Commit

Built site for gh-pages
Quarto GHA Workflow Runner committed Oct 3, 2024
1 parent 517ed33 commit ad40f7b
Showing 6 changed files with 944 additions and 167 deletions.
2 changes: 1 addition & 1 deletion .nojekyll
@@ -1 +1 @@
-4eda9161
+c81fd013
10 changes: 5 additions & 5 deletions docs/dataset-formats/index.html
@@ -363,39 +363,39 @@ <h1 class="title">Dataset Formats</h1>
</tr>
</thead>
<tbody class="list">
-<tr data-index="0" data-listing-file-modified-sort="1727718988927" data-listing-reading-time-sort="1" data-listing-word-count-sort="47" data-listing-title-sort="Pre-training" data-listing-filename-sort="pretraining.qmd">
+<tr data-index="0" data-listing-file-modified-sort="1727917381030" data-listing-reading-time-sort="1" data-listing-word-count-sort="47" data-listing-title-sort="Pre-training" data-listing-filename-sort="pretraining.qmd">
<td>
<a href="../../docs/dataset-formats/pretraining.html" class="title listing-title">Pre-training</a>
</td>
<td>
<span class="listing-description">Data format for a pre-training completion task.</span>
</td>
</tr>
-<tr data-index="1" data-listing-file-modified-sort="1727718988927" data-listing-reading-time-sort="2" data-listing-word-count-sort="308" data-listing-title-sort="Instruction Tuning" data-listing-filename-sort="inst_tune.qmd">
+<tr data-index="1" data-listing-file-modified-sort="1727917381030" data-listing-reading-time-sort="2" data-listing-word-count-sort="308" data-listing-title-sort="Instruction Tuning" data-listing-filename-sort="inst_tune.qmd">
<td>
<a href="../../docs/dataset-formats/inst_tune.html" class="title listing-title">Instruction Tuning</a>
</td>
<td>
<span class="listing-description">Instruction tuning formats for supervised fine-tuning.</span>
</td>
</tr>
-<tr data-index="2" data-listing-file-modified-sort="1727718988926" data-listing-reading-time-sort="2" data-listing-word-count-sort="254" data-listing-title-sort="Conversation" data-listing-filename-sort="conversation.qmd">
+<tr data-index="2" data-listing-file-modified-sort="1727917381030" data-listing-reading-time-sort="2" data-listing-word-count-sort="254" data-listing-title-sort="Conversation" data-listing-filename-sort="conversation.qmd">
<td>
<a href="../../docs/dataset-formats/conversation.html" class="title listing-title">Conversation</a>
</td>
<td>
<span class="listing-description">Conversation format for supervised fine-tuning.</span>
</td>
</tr>
-<tr data-index="3" data-listing-file-modified-sort="1727718988927" data-listing-reading-time-sort="1" data-listing-word-count-sort="3" data-listing-title-sort="Template-Free" data-listing-filename-sort="template_free.qmd">
+<tr data-index="3" data-listing-file-modified-sort="1727917381030" data-listing-reading-time-sort="1" data-listing-word-count-sort="3" data-listing-title-sort="Template-Free" data-listing-filename-sort="template_free.qmd">
<td>
<a href="../../docs/dataset-formats/template_free.html" class="title listing-title">Template-Free</a>
</td>
<td>
<span class="listing-description">Construct prompts without a template.</span>
</td>
</tr>
-<tr data-index="4" data-listing-file-modified-sort="1727718988927" data-listing-reading-time-sort="1" data-listing-word-count-sort="92" data-listing-title-sort="Custom Pre-Tokenized Dataset" data-listing-filename-sort="tokenized.qmd">
+<tr data-index="4" data-listing-file-modified-sort="1727917381030" data-listing-reading-time-sort="1" data-listing-word-count-sort="92" data-listing-title-sort="Custom Pre-Tokenized Dataset" data-listing-filename-sort="tokenized.qmd">
<td>
<a href="../../docs/dataset-formats/tokenized.html" class="title listing-title">Custom Pre-Tokenized Dataset</a>
</td>
2 changes: 1 addition & 1 deletion docs/input_output.html
@@ -474,7 +474,7 @@ <h3 class="anchored" data-anchor-id="check-the-prompts">3. Check the prompts</h3
<span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a><span class="op">&gt;&gt;&gt;</span> <span class="bu">print</span>(tok.decode(row[<span class="st">'input_ids'</span>]))</span>
<span id="cb8-3"><a href="#cb8-3" aria-hidden="true" tabindex="-1"></a><span class="op">&lt;</span>s<span class="op">&gt;</span> Hello</span>
<span id="cb8-4"><a href="#cb8-4" aria-hidden="true" tabindex="-1"></a> hi there<span class="op">!</span>. goodbye farewell<span class="op">&lt;/</span>s<span class="op">&gt;</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
-<p>We can check that the right tokens are ingored by comparing the labels to each token:</p>
+<p>We can check that the right tokens are ignored by comparing the labels to each token:</p>
<div class="sourceCode" id="cb9"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> pandas <span class="im">as</span> pd</span>
<span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a>pd.DataFrame([{<span class="st">'token'</span>: tok.decode(i), <span class="st">'label'</span>: l, <span class="st">'id'</span>:i} <span class="cf">for</span> i,l <span class="kw">in</span></span>
<span id="cb9-3"><a href="#cb9-3" aria-hidden="true" tabindex="-1"></a> <span class="bu">zip</span>(row[<span class="st">'input_ids'</span>], row[<span class="st">'labels'</span>])])</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
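
For readers who want to try the label check from the snippet above without loading a tokenizer, here is a minimal, self-contained sketch of the same idea. It is not taken from the commit: the `vocab` mapping, the `row` values, and the `label_report` helper are hypothetical, and it assumes ignored positions are marked with `-100`, the default `ignore_index` of PyTorch's cross-entropy loss.

```python
# Hypothetical stand-ins: a toy vocabulary instead of the real tokenizer,
# and a hand-written row instead of an actual tokenized dataset entry.
IGNORE_INDEX = -100  # assumed mask value (PyTorch cross-entropy default)

vocab = {1: "<s>", 2: "Hello", 3: "hi", 4: "there", 5: "!", 6: "</s>"}

row = {
    "input_ids": [1, 2, 3, 4, 5, 6],
    "labels": [IGNORE_INDEX, IGNORE_INDEX, 3, 4, 5, 6],
}


def label_report(row, decode):
    """Pair every token with its label and flag positions excluded from the loss."""
    return [
        {"token": decode(i), "label": l, "id": i, "ignored": l == IGNORE_INDEX}
        for i, l in zip(row["input_ids"], row["labels"])
    ]


report = label_report(row, vocab.get)
for entry in report:
    print(entry)
```

With a real tokenizer, `tok.decode` would take the place of `vocab.get`, and the list of dictionaries can be passed to `pd.DataFrame` exactly as in the snippet above.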