fix typos #3543

Merged
merged 71 commits on Jul 19, 2023

Commits (71)
80b3f90
fix typos
RainRat May 4, 2023
e979f11
Merge branch 'main' of https://github.com/RainRat/Open-Assistant
RainRat May 4, 2023
6fbea2e
Merge branch 'main' of https://github.com/RainRat/Open-Assistant
RainRat May 5, 2023
f93b9b8
Merge branch 'main' of https://github.com/RainRat/Open-Assistant
RainRat May 6, 2023
29bc2cf
Merge branch 'main' of https://github.com/RainRat/Open-Assistant
RainRat May 7, 2023
8130017
Merge branch 'main' of https://github.com/RainRat/Open-Assistant
RainRat May 7, 2023
80da028
Merge branch 'main' of https://github.com/RainRat/Open-Assistant
RainRat May 7, 2023
8f876ba
Merge branch 'LAION-AI:main' into main
RainRat May 8, 2023
82d330d
Merge branch 'LAION-AI:main' into main
RainRat May 8, 2023
79aaef6
Merge branch 'LAION-AI:main' into main
RainRat May 10, 2023
cb3f63f
Merge branch 'LAION-AI:main' into main
RainRat May 10, 2023
148208d
Merge branch 'LAION-AI:main' into main
RainRat May 11, 2023
abb7e0e
Merge branch 'LAION-AI:main' into main
RainRat May 11, 2023
0231c38
Merge branch 'LAION-AI:main' into main
RainRat May 12, 2023
02999e5
Merge branch 'LAION-AI:main' into main
RainRat May 12, 2023
3cf47d4
Merge branch 'LAION-AI:main' into main
RainRat May 13, 2023
8519b9f
Merge branch 'LAION-AI:main' into main
RainRat May 13, 2023
e65b985
Merge branch 'LAION-AI:main' into main
RainRat May 14, 2023
0126e99
Merge branch 'LAION-AI:main' into main
RainRat May 14, 2023
c1b9ade
Merge branch 'LAION-AI:main' into main
RainRat May 15, 2023
bd98526
Merge branch 'LAION-AI:main' into main
RainRat May 15, 2023
9080ee1
Merge branch 'LAION-AI:main' into main
RainRat May 16, 2023
3baf679
Merge branch 'LAION-AI:main' into main
RainRat May 17, 2023
6f960dc
Merge branch 'LAION-AI:main' into main
RainRat May 19, 2023
63a11db
Merge branch 'LAION-AI:main' into main
RainRat May 21, 2023
95f33ec
Merge branch 'LAION-AI:main' into main
RainRat May 22, 2023
6dfba94
fix typos
RainRat May 23, 2023
223ac1a
try to fix minor formatting issues caused by typo fixes
RainRat May 23, 2023
4aef508
try fix formatting another way
RainRat May 23, 2023
25c4338
one more try at fix formatting
RainRat May 23, 2023
b809ff0
Merge branch 'LAION-AI:main' into main
RainRat May 23, 2023
879f3de
Merge branch 'LAION-AI:main' into main
RainRat May 24, 2023
32d6f80
Merge branch 'LAION-AI:main' into main
RainRat May 25, 2023
91c1d86
Merge branch 'LAION-AI:main' into main
RainRat May 27, 2023
ab7af81
Merge branch 'LAION-AI:main' into main
RainRat May 27, 2023
69d6d13
Merge branch 'LAION-AI:main' into main
RainRat May 29, 2023
818b91d
Merge branch 'LAION-AI:main' into main
RainRat May 29, 2023
e745730
Merge branch 'LAION-AI:main' into main
RainRat May 30, 2023
767c951
Merge branch 'LAION-AI:main' into main
RainRat May 31, 2023
4a71ac2
Merge branch 'LAION-AI:main' into main
RainRat Jun 2, 2023
007b402
Merge branch 'LAION-AI:main' into main
RainRat Jun 2, 2023
6d9bb3a
Merge branch 'LAION-AI:main' into main
RainRat Jun 3, 2023
c141778
Merge branch 'LAION-AI:main' into main
RainRat Jun 5, 2023
aa445df
Merge branch 'LAION-AI:main' into main
RainRat Jun 6, 2023
62baf9b
Merge branch 'LAION-AI:main' into main
RainRat Jun 7, 2023
3cec7c5
Merge branch 'main' of https://github.com/RainRat/Open-Assistant
RainRat Jun 7, 2023
51a7173
Merge branch 'main' of https://github.com/RainRat/Open-Assistant
RainRat Jun 8, 2023
9495904
Merge branch 'LAION-AI:main' into main
RainRat Jun 9, 2023
021e58e
Merge branch 'LAION-AI:main' into main
RainRat Jun 10, 2023
19af304
Merge branch 'LAION-AI:main' into main
RainRat Jun 11, 2023
e2c622f
Merge branch 'LAION-AI:main' into main
RainRat Jun 12, 2023
b27c80c
Merge branch 'LAION-AI:main' into main
RainRat Jun 12, 2023
72225a5
Merge branch 'LAION-AI:main' into main
RainRat Jun 12, 2023
504812d
Merge branch 'LAION-AI:main' into main
RainRat Jun 13, 2023
e2eb672
Merge branch 'LAION-AI:main' into main
RainRat Jun 14, 2023
11e65f6
Merge branch 'LAION-AI:main' into main
RainRat Jun 14, 2023
037e6b8
Merge branch 'LAION-AI:main' into main
RainRat Jun 16, 2023
bdce3bf
Merge branch 'LAION-AI:main' into main
RainRat Jun 17, 2023
fd45186
Merge branch 'LAION-AI:main' into main
RainRat Jun 18, 2023
5cb66e8
Merge branch 'LAION-AI:main' into main
RainRat Jun 19, 2023
44d83d4
Merge branch 'LAION-AI:main' into main
RainRat Jun 20, 2023
ff060cb
Merge branch 'LAION-AI:main' into main
RainRat Jun 20, 2023
ddf63a6
Merge branch 'LAION-AI:main' into main
RainRat Jun 23, 2023
ac71c2b
Merge branch 'LAION-AI:main' into main
RainRat Jun 26, 2023
779974e
Merge branch 'LAION-AI:main' into main
RainRat Jun 30, 2023
39876a5
Merge branch 'LAION-AI:main' into main
RainRat Jul 4, 2023
73ce63f
fix typos
RainRat Jul 4, 2023
2a953c6
change whitespace to follow formatting
RainRat Jul 4, 2023
e15bedf
Merge branch 'LAION-AI:main' into main
RainRat Jul 12, 2023
e94feec
Merge branch 'LAION-AI:main' into main
RainRat Jul 16, 2023
2c0a502
Merge branch 'LAION-AI:main' into main
RainRat Jul 19, 2023
Original file line number Diff line number Diff line change
@@ -38,7 +38,7 @@
"\n",
"this list was build from https://anvaka.github.io/redsim. Can be used to expand the list of favourable subreddits.\n",
"\n",
"takeing these for now"
"taking these for now"
]
},
{
2 changes: 1 addition & 1 deletion data/datasets/oa_leet10k/oa_leet10k.ipynb
@@ -7,7 +7,7 @@
"source": [
"Takes this Kaggle dataset 'leetcode-solutions'\n",
"https://www.kaggle.com/datasets/erichartford/leetcode-solutions, and turns them into basic\n",
"dialogue using a preset list of user prompt tempaltes."
"dialogue using a preset list of user prompt templates."
]
},
{
2 changes: 1 addition & 1 deletion data/datasets/poetry_instruction/README.md
@@ -12,7 +12,7 @@ Dataset Structure This dataset follows the OA format, which is:

INSTRUCTION (string): The user asks for a poem (from a variety of premade
prompts) with topics (tags). If the given poem has no tags, the user asks for a
poem on it's own.
poem on its own.

RESPONSE (string): The assistant replies with the poem and title (from a variety
of premade prompts).
2 changes: 1 addition & 1 deletion data/datasets/recipes/tasty_recipes.ipynb
@@ -5,7 +5,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Takes this Kaggle dataset 'Recipes from Tasty' https://www.kaggle.com/datasets/zeeenb/recipes-from-tasty?select=ingredient_and_instructions.json, and turns them into basic dialogue using a preset list of user prompt tempaltes."
"Takes this Kaggle dataset 'Recipes from Tasty' https://www.kaggle.com/datasets/zeeenb/recipes-from-tasty?select=ingredient_and_instructions.json, and turns them into basic dialogue using a preset list of user prompt templates."
]
},
{
2 changes: 1 addition & 1 deletion data/datasets/safety_directory/child_help/child_help.py
@@ -951,7 +951,7 @@
"Parent-Child Support Line": {
"region": "Hong Kong (China)",
"page": "https://childhelplineinternational.org/hong-kong-china-parent-child-support-line/",
"description": "Operated by Action Againt Abuse (ACA), the Parent-Child Support Line provides service where parents, children, professionals and the public can call the hotline 2755 1122, or go to the ACA centre to report suspected child abuse cases or ask questions about any issues they are facing. It is also a support and hotline for children to express their voices and opinions. The personal data and case content of the data provider/reporter are kept strictly confidential.",
"description": "Operated by Action Against Abuse (ACA), the Parent-Child Support Line provides service where parents, children, professionals and the public can call the hotline 2755 1122, or go to the ACA centre to report suspected child abuse cases or ask questions about any issues they are facing. It is also a support and hotline for children to express their voices and opinions. The personal data and case content of the data provider/reporter are kept strictly confidential.",
"contacts": {
"Website": {"type": "website", "link": "https://www.aca.org.hk/index.php#.YmRbANNBw-Q"},
"116 111": {"type": "phone", "link": "tel:"},
6 changes: 3 additions & 3 deletions data/datasets/tv_dialogue/imsdb.ipynb
@@ -231,9 +231,9 @@
" text += f\"{speaker}\\r\\n\"\n",
" if not re.findall(r\"\\[.+?\\] .+?\\r\\n\\r\\n\\[.+?\\] .+?\\r\\n\\r\\n\", text):\n",
" return \"\"\n",
" first_occurance = re.findall(r\"\\[.+?\\] \", text)[0]\n",
" if len(re.findall(re.escape(first_occurance), text)) == 1:\n",
" text = re.sub(re.escape(first_occurance), f\"{first_occurance[1:-2]}\\r\\n\", text)\n",
" first_occurrence = re.findall(r\"\\[.+?\\] \", text)[0]\n",
" if len(re.findall(re.escape(first_occurrence), text)) == 1:\n",
" text = re.sub(re.escape(first_occurrence), f\"{first_occurrence[1:-2]}\\r\\n\", text)\n",
"\n",
" text = text.replace(\"&\", \"&\")\n",
" text = \"\\r\\n\".join(text.splitlines())\n",
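The renamed `first_occurrence` variable in the hunk above drives a small regex transformation; a standalone sketch of that logic (the sample transcript below is invented, not taken from the dataset):

```python
import re

# Invented sample in the "[SPEAKER] line" transcript format being parsed.
text = "[NARRATOR] Once upon a time.\r\n\r\n[ALICE] Hello there.\r\n\r\n[ALICE] Bye.\r\n\r\n"

# Grab the first speaker tag, e.g. "[NARRATOR] ".
first_occurrence = re.findall(r"\[.+?\] ", text)[0]

# If that tag occurs exactly once, drop the brackets and move the
# speaker name onto its own line, mirroring the notebook's cleanup.
if len(re.findall(re.escape(first_occurrence), text)) == 1:
    text = re.sub(re.escape(first_occurrence), f"{first_occurrence[1:-2]}\r\n", text)

# text now begins "NARRATOR\r\nOnce upon a time."
```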
4 changes: 2 additions & 2 deletions data/datasets/zhihu-kol/convert_parquet.py
@@ -3,7 +3,7 @@
import pandas as pd


def reformat_csv_to_openassitant(df: pd.DataFrame) -> pd.DataFrame:
def reformat_csv_to_openassistant(df: pd.DataFrame) -> pd.DataFrame:
"""
Reformat the downloaded CSV into either Instruction or Text format
so that it could be directly ingested into the training pipeline.
@@ -44,6 +44,6 @@ def reformat_csv_to_openassitant(df: pd.DataFrame) -> pd.DataFrame:
input_csv = "zhihu.csv"
# Create a pandas dataframe from your dataset file(s)
df = pd.read_csv(input_csv) # or any other way
df = reformat_csv_to_openassitant(df)
df = reformat_csv_to_openassistant(df)
# Save the file in the Parquet format
df.to_parquet("dataset.parquet", row_group_size=100, engine="pyarrow", index=False)
4 changes: 2 additions & 2 deletions data/datasets/zhihu-kol/main.py
@@ -155,7 +155,7 @@ def get_answer_content(qid: str, aid) -> str:
return content


def reformat_csv_to_openassitant(df: pd.DataFrame) -> pd.DataFrame:
def reformat_csv_to_openassistant(df: pd.DataFrame) -> pd.DataFrame:
"""
Reformat the downloaded CSV into either Instruction or Text format
so that it could be directly ingested into the training pipeline.
@@ -226,7 +226,7 @@ def start(qid: str, aid: str):
start(qid, aid)
multitasking.wait_for_tasks()
df["回答内容"] = df["问题ID"].apply(lambda x: content_list[x])
updated_df = reformat_csv_to_openassitant(df)
updated_df = reformat_csv_to_openassistant(df)
updated_df.to_csv(csv_path, encoding="utf-8-sig", index=None)
bar.close()
print(f"url_token 为 {url_token} 的用户回答数据已存储到文件:{csv_path}")
Original file line number Diff line number Diff line change
@@ -8,7 +8,7 @@ image: https://img.youtube.com/vi/5IymlBZDw-0/0.jpg

import ReactPlayer from "react-player";

Livestream playing around with Open Assistant and AI allignement :)
Livestream playing around with Open Assistant and AI alignment :)

https://open-assistant.io/chat

2 changes: 1 addition & 1 deletion docs/docs/research/retrieval.md
@@ -139,7 +139,7 @@ i.e. the 7B can utilize 40 nearest neighbor chunks, a 172M model only 10 NNs.
### Bertsch et al. 2023: Unlimiformer: Long-Range Transformers with Unlimited Length Input

Idea: Use retrieval to actually maximize overlap of "query embeddings" with
embeddings from an encoder (in a encoder-decoder architecture). Essentially it
embeddings from an encoder (in an encoder-decoder architecture). Essentially it
is an ideal approximation of the softmax in the Cross-Attention over all
previous tokens (in the encoder inputs).

2 changes: 1 addition & 1 deletion inference/worker/chat_chain.py
@@ -440,7 +440,7 @@ def handle_conversation(
id="1",
chat_id="1",
parent_id=None,
content="Hello, my name is Open Assisstant, how i can help you today?",
content="Hello, my name is Open Assistant, how i can help you today?",
created_at=datetime.datetime.now(),
role="assistant",
state=inference.MessageState.complete,
2 changes: 1 addition & 1 deletion model/model_training/configs/ppo_config.yaml
@@ -18,7 +18,7 @@ model:
tokenizer:
tokenizer_path:
truncation_side: "left" # AKo: changed this to "left", otherwise it sees only the beginning of conversations
padding_side: "left" # use with rotary positional embedidngs
padding_side: "left" # use with rotary positional embeddings

optimizer:
name: "adamw"
4 changes: 2 additions & 2 deletions model/model_training/tests/test_dialogue_data_collator.py
@@ -53,7 +53,7 @@ def test_dataset_entry_no_context(pythia_tokenizer):
# check if targets are as expected
assert pythia_tokenizer.decode(batch.targets[0]) == expected_decoded_targets[0]

# check if masking is correct. Note that we mask the system aswell
# check if masking is correct. Note that we mask the system as well
Collaborator
From a swollen rump, to "too" 😂

assert pythia_tokenizer.decode(batch.input_ids[0][batch.label_masks[0]]) == expected_masked[0]


@@ -114,7 +114,7 @@ def test_dataset_entry(pythia_tokenizer):
assert pythia_tokenizer.decode(batch.targets[0]) == expected_decoded_targets[0]
assert pythia_tokenizer.decode(batch.targets[1]) == expected_decoded_targets[1]

# check if masking is correct. Note that we mask the system aswell
# check if masking is correct. Note that we mask the system as well
assert pythia_tokenizer.decode(batch.input_ids[0][batch.label_masks[0]]) == expected_masked[0]
assert pythia_tokenizer.decode(batch.input_ids[1][batch.label_masks[1]]) == expected_masked[1]
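The assertions above index `input_ids` with a boolean label mask to recover only the target tokens; a toy illustration of that selection (the token IDs and mask values are made up):

```python
# Toy stand-ins: five token IDs and a mask that keeps only the
# assistant-reply positions, masking out the system/prompter tokens.
input_ids = [101, 102, 103, 104, 105]
label_mask = [False, False, True, True, True]

# Boolean-mask selection, as the test does with batch.label_masks.
targets = [tok for tok, keep in zip(input_ids, label_mask) if keep]
# targets == [103, 104, 105]
```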

2 changes: 1 addition & 1 deletion model/model_training/tools/sample_rm_data.py
@@ -1,5 +1,5 @@
"""
Recursive method to travese down the the conversation tree
Recursive method to traverse down the conversation tree

Use fastlangid for language identification :
>> pip install fastlangid
Original file line number Diff line number Diff line change
@@ -271,7 +271,7 @@
"metadata": {},
"source": [
"#### Convert red-teaming data to prosocial format\n",
"- Red-teaming data is laballed for the whole conversation\n",
"- Red-teaming data is labelled for the whole conversation\n",
"- Pro-social data is labelled per dialog \n",
"- The `episode_done` flag indicates the end of a conversation"
]
Original file line number Diff line number Diff line change
@@ -544,7 +544,7 @@
}
],
"source": [
"# Test to see if it was sucessful\n",
"# Test to see if it was successful\n",
"table = pq.read_table(\"output.parquet\")\n",
"table.to_pandas()"
],
Original file line number Diff line number Diff line change
@@ -23,7 +23,7 @@
"id": "o0lAqmWhsiUe"
},
"source": [
"The goal of this notebook is to use data argumentation to have data on improving essays. The way this is done is by taking a template \"good\" essay and making step by step changes that make it worse and add intructions on how to fix it."
"The goal of this notebook is to use data argumentation to have data on improving essays. The way this is done is by taking a template \"good\" essay and making step by step changes that make it worse and add instructions on how to fix it."
]
},
{
@@ -165,7 +165,7 @@
},
"outputs": [],
"source": [
"# Make stucture error (shuffle one paragraph with another)\n",
"# Make structure error (shuffle one paragraph with another)\n",
"essay_paragraphs = essay.split(\"\\n\\n\") # Splitting a String by newline character (\\n)\n",
"\n",
"rand1 = random.randint(0, len(essay_paragraphs) - 1)\n",
@@ -286,7 +286,7 @@
}
],
"source": [
"# Prints intrcutions (final step)\n",
"# Prints instructions (final step)\n",
"for i in instructions:\n",
" print(i)\n",
"instructions.clear()"
Original file line number Diff line number Diff line change
@@ -198,7 +198,7 @@
" ## make sure that characters are the same\n",
" if char_id_new == char_ids:\n",
" lineids_new = [int(lineid[1:]) for lineid in data[\"utterance\"][\"LineID\"]]\n",
" if lineids_new[0] == (lineids[-1] + 1): ##ensure continuety\n",
" if lineids_new[0] == (lineids[-1] + 1): ##ensure continuity\n",
" lineids.extend(lineids_new)\n",
" else:\n",
" break\n",
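The continuity test corrected in this hunk compares consecutive line IDs; isolated as a small helper (the IDs below are invented for illustration):

```python
def continues(lineids, lineids_new):
    # A new utterance extends the exchange only when its first line ID
    # immediately follows the last line ID collected so far.
    return bool(lineids_new) and lineids_new[0] == lineids[-1] + 1

consecutive = continues([194, 195], [196, 197])  # 196 == 195 + 1, so keep going
gapped = continues([194, 195], [198, 199])       # numbering gap, the loop breaks
```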
2 changes: 1 addition & 1 deletion notebooks/data-augmentation/unified-qa/unified-qa.ipynb
@@ -737,7 +737,7 @@
" conv.append(\"I have some common sense questions for you to answer.\")\n",
" conv.append(\"Okay, I can try to answer your questions while using common sense. Please provide the question.\")\n",
" conv.append(q)\n",
" conv.append(\"The commmon sense answer would be: \" + a)\n",
" conv.append(\"The common sense answer would be: \" + a)\n",
" return conv\n",
"\n",
"\n",
2 changes: 1 addition & 1 deletion notebooks/data-augmentation/wikidata-qa/README.md
@@ -41,7 +41,7 @@ print(wg.generate("Q115564437"))

>>> Questions and Answers on ChatGPT (also known as GPT-3.5, Generative Pre-trained Transformer):

Q: What is the offical website for Generative Pre-trained Transformer?
Q: What is the official website for Generative Pre-trained Transformer?
A: Its web address is: https://chat.openai.com/chat

Q: What's ChatGPT's license?
12 changes: 6 additions & 6 deletions notebooks/data-augmentation/wikidata-qa/wikidata.ipynb
@@ -318,7 +318,7 @@
" \"What's {name}'s capital city? Thank you in advance!\",\n",
" ],\n",
" \"P37\": [\n",
" \"Tell me what the offical language of {name} is?\",\n",
" \"Tell me what the official language of {name} is?\",\n",
" \"What language do they speak in {name}?\",\n",
" \"How do they speak in {name}?\",\n",
" \"What languages they understand in {name}?\",\n",
@@ -368,7 +368,7 @@
" \"Who crated {name}?\",\n",
" \"Who is {name}'s creator?\",\n",
" \"Who made {name}?\",\n",
" \"Who is reponsible for {name}?\",\n",
" \"Who is responsible for {name}?\",\n",
" ],\n",
" \"P225\": [\n",
" \"Describe {name} to me in latin.\",\n",
@@ -477,7 +477,7 @@
" \"Give me the URL for {name}.\",\n",
" \"What's the URL for {name}? Thanks!\",\n",
" \"What's {name}'s website?\",\n",
" \"What is the offical website for {name}?\",\n",
" \"What is the official website for {name}?\",\n",
" \"Can you tell me the link to {name}?\",\n",
" ],\n",
" \"P973\": [\n",
@@ -587,7 +587,7 @@
" \"P36\": [\"Do you know {pos} capital?\", \"What is {pos} capital called?\", \"What's the name of {pos} capital?\"],\n",
" \"P37\": [\n",
" \"Describe {pos} official language.\",\n",
" \"What is {pos} offical language?\",\n",
" \"What is {pos} official language?\",\n",
" \"What language do they speak there?\",\n",
" ],\n",
" \"P38\": [\n",
@@ -713,7 +713,7 @@
" \"P856\": [\n",
" \"Send me {pos} web address.\",\n",
" \"What's the address of {pos} website?\",\n",
" \"What is {pos} offical website?\",\n",
" \"What is {pos} official website?\",\n",
" \"Can you tell me the link to {obj}?\",\n",
" ],\n",
" \"P973\": [\n",
@@ -1180,7 +1180,7 @@
"Q: How big is Budim?\r\n",
"A: Budim's area is 52514 hectare\n",
"\n",
"Q: What is its offical website?\r\n",
"Q: What is its official website?\r\n",
"A: The URLs for Budapest are: https://budapest.hu and https://budapest.hu/sites/english/\n",
"\n",
"Q: Tell me who its governor is!\r\n",
Original file line number Diff line number Diff line change
@@ -38,7 +38,7 @@
"where `stripped_constraint` is the constraint found.\n",
"\n",
"* Answer beginning constraints: this constraint was imposed by the way the answer should start. \n",
"> Base template, starting with: {beggining} -> Rosey: Sure, here's a story about: {stripped_prompt}, starting with: {beggining}:\\n{story}\n",
"> Base template, starting with: {beginning} -> Rosey: Sure, here's a story about: {stripped_prompt}, starting with: {beginning}:\\n{story}\n",
"\n",
"where `beginning` is the first sentence of a story.\n",
"\n",
@@ -1009,7 +1009,7 @@
" dialogBase = \"\"\"User: write me a story about: {stripped_prompt}\"\"\"\n",
" dialog1 = \"\"\" -> Rosey: Sure, here's a story about: {stripped_prompt}:\\n{story}\"\"\"\n",
" dialog2 = \"\"\", {stripped_constraint} -> Rosey: Sure, here's a story about: {stripped_prompt}, {stripped_constraint}:\\n{story}\"\"\"\n",
" dialog3 = \"\"\", starting with: {beggining} -> Rosey: Sure, here's a story about: {stripped_prompt}, starting with: {beggining}:\\n{story}\"\"\"\n",
" dialog3 = \"\"\", starting with: {beginning} -> Rosey: Sure, here's a story about: {stripped_prompt}, starting with: {beginning}:\\n{story}\"\"\"\n",
" dialog4 = \"\"\", ending with: {ending} -> Rosey: Sure, here's a story about {stripped_prompt}: ending with: {ending}\\n{story}\"\"\"\n",
" dialog5 = \"\"\", where the middle of the story is about: {middle} -> Rosey: Sure, here's a story about: {stripped_prompt}, where the middle of the story is about: {middle}:\\n{story}\"\"\"\n",
"\n",
@@ -1041,7 +1041,7 @@
" stripped_prompt=strippedPrompt, stripped_constraint=strippedConstraint, story=story\n",
" )\n",
" dialogs.append(get_sample_dict(row.split, row.splitIndex, dialog))\n",
" dialog = dialogBeg + dialog3.format(stripped_prompt=strippedPrompt, story=story, beggining=beginning)\n",
" dialog = dialogBeg + dialog3.format(stripped_prompt=strippedPrompt, story=story, beginning=beginning)\n",
" dialogs.append(get_sample_dict(row.split, row.splitIndex, dialog))\n",
" dialog = dialogBeg + dialog4.format(stripped_prompt=strippedPrompt, story=story, ending=ending)\n",
" dialogs.append(get_sample_dict(row.split, row.splitIndex, dialog))\n",
@@ -1087,7 +1087,7 @@
" dialogBase = \"\"\"User: write me a story about: {stripped_prompt}\"\"\"\n",
" dialog1 = \"\"\" -> Rosey: Sure, here's a story about: {stripped_prompt}:\\n{story}\"\"\"\n",
" dialog2 = \"\"\", {stripped_constraint} -> Rosey: Sure, here's a story about: {stripped_prompt}, {stripped_constraint}:\\n{story}\"\"\"\n",
" dialog3 = \"\"\", starting with: {beggining} -> Rosey: Sure, here's a story about: {stripped_prompt}, starting with: {beggining}:\\n{story}\"\"\"\n",
" dialog3 = \"\"\", starting with: {beginning} -> Rosey: Sure, here's a story about: {stripped_prompt}, starting with: {beginning}:\\n{story}\"\"\"\n",
" dialog4 = \"\"\", ending with: {ending} -> Rosey: Sure, here's a story about {stripped_prompt}: ending with: {ending}\\n{story}\"\"\"\n",
" dialog5 = \"\"\", where the middle of the story is about: {middle} -> Rosey: Sure, here's a story about: {stripped_prompt}, where the middle of the story is about: {middle}:\\n{story}\"\"\"\n",
"\n",
@@ -1117,7 +1117,7 @@
" if beginning is not None:\n",
" beginning, middles, ending = extract_story_parts(story)\n",
" dialog = dialogBeg + dialog3.format(\n",
" stripped_prompt=strippedPrompt, story=story, beggining=beginning\n",
" stripped_prompt=strippedPrompt, story=story, beginning=beginning\n",
" )\n",
" dialogs.append(get_sample_dict(row.split, row.splitLineIndex, dialog))\n",
" dialog = dialogBeg + dialog4.format(stripped_prompt=strippedPrompt, story=story, ending=ending)\n",
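The `{beggining}` → `{beginning}` renames in this file matter beyond spelling: `str.format` matches placeholder names against keyword arguments, so the template and every call site must agree. A minimal sketch with invented values:

```python
# Invented stand-ins for the notebook's prompt, opening line, and story.
template = ", starting with: {beginning} -> Rosey: Sure, here's a story about: {stripped_prompt}, starting with: {beginning}:\n{story}"

line = template.format(
    stripped_prompt="a lost robot",
    beginning="It was raining.",
    story="It was raining. The robot walked on.",
)

# Renaming only one side leaves a stale placeholder and fails loudly:
try:
    "starting with: {beggining}".format(beginning="It was raining.")
except KeyError as err:
    stale = str(err)  # "'beggining'"
```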
4 changes: 2 additions & 2 deletions notebooks/detoxify-evaluation/README.md
@@ -1,7 +1,7 @@
# Detoxify evaluation

[Detoxify](https://github.com/unitaryai/detoxify) is a open source model used to
identify prompts as toxic
[Detoxify](https://github.com/unitaryai/detoxify) is an open source model used
to identify prompts as toxic

<img src="https://raw.githubusercontent.com/unitaryai/detoxify/master/examples.png" alt="Image from detoxify github that shows the example input/output of their model" />

2 changes: 1 addition & 1 deletion notebooks/detoxify-evaluation/detoxify-evaluation.ipynb
@@ -31,7 +31,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"[Detoxify](https://github.com/unitaryai/detoxify) is a open source model used to identify prompts as toxic\n",
"[Detoxify](https://github.com/unitaryai/detoxify) is an open source model used to identify prompts as toxic\n",
"\n",
"<img src=\"https://raw.githubusercontent.com/unitaryai/detoxify/master/examples.png\" alt=\"Image from detoxify github that shows the example input/output of their model\" />\n",
"\n",
8 changes: 4 additions & 4 deletions oasst-data/README.md
@@ -145,10 +145,10 @@ messages are those which have a `review_result` that is `false`.

Conversation threads are a linear lists of messages. THese objects can be
identified by the presence of the `"thread_id"` property which contains the UUID
of the last message of the the thread (which can be used to reconstruct the
thread by returning the list of ancestor messages up to the prompt root
message). The message_id of the first message is normally also the id of the
message-tree that contains the thread.
of the last message of the thread (which can be used to reconstruct the thread
by returning the list of ancestor messages up to the prompt root message). The
message_id of the first message is normally also the id of the message-tree that
contains the thread.
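The reconstruction described here (walking ancestor links from the thread's last message back to the prompt root) can be sketched with a hypothetical parent map; the message IDs are invented:

```python
# Hypothetical id -> parent_id links; None marks the prompt root.
parents = {"m3": "m2", "m2": "m1", "m1": None}

def reconstruct_thread(thread_id):
    # Collect ancestors from the thread's last message up to the root,
    # then reverse so the conversation reads top-down.
    ids, cur = [], thread_id
    while cur is not None:
        ids.append(cur)
        cur = parents[cur]
    return list(reversed(ids))

# reconstruct_thread("m3") == ["m1", "m2", "m3"]
```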

```json
{
```