docs: add glossary #153

Merged
merged 9 commits into from
Jul 10, 2025
19 changes: 19 additions & 0 deletions .github/instructions/genaiscript.instructions.md
@@ -0,0 +1,19 @@
---
applyTo: "**/*.genai.*"
---

## GenAIScript Code Generation Instructions

GenAIScript is a custom runtime for Node.js. It provides a set of unique APIs and supports TypeScript syntax, ESM, and async/await.

- GenAIScript documentation: https://microsoft.github.io/genaiscript/llms-full.txt
- GenAIScript ambient type definitions: https://microsoft.github.io/genaiscript/genaiscript.d.ts

## Guidance for Code Generation

- you always generate JavaScript code using ESM modules for Node.js.
- you prefer APIs from GenAIScript `genaiscript.d.ts` over Node.js APIs. Do NOT use Node.js imports.
- you keep the code simple and avoid exception handlers or error checking.
- you add `TODO` comments where you are unsure so that the user can review them.
- the global types in `genaiscript.d.ts` are already loaded in the global context; there is no need to import them.
- you save generated code with the `.genai.js` extension.
3 changes: 3 additions & 0 deletions docs/scripts/.gitignore
@@ -0,0 +1,3 @@
genaiscript.d.ts
tsconfig.json
jsconfig.json
175 changes: 175 additions & 0 deletions docs/scripts/glossary.genai.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,175 @@
// GenAIScript to create a shared glossary from markdown files in lessons
script({
  title: "Generate glossary from lessons",
  description: "Process all markdown files under /lessons/** to create a shared glossary.md file",
  parameters: {
    force: {
      type: "boolean",
      description: "Force regeneration of the entire glossary",
      default: false,
    },
  },
  temperature: 0.1,
});

// Find all markdown files under lessons, excluding translations
const files = await workspace.findFiles("lessons/**/*.md", {
  ignore: "**/translations/**",
});

console.log(`Found ${files.length} markdown files to process`);

// Check if glossary.md already exists
const glossaryPath = "glossary.md";
let existingGlossary = "";
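try {
  const glossaryFile = await workspace.readText(glossaryPath);
  existingGlossary = glossaryFile?.content || "";
} catch (error) {
  // Treat a read failure as "no existing glossary"
}
// Log based on the content actually read, so an empty or missing file is not reported as found
console.log(
  existingGlossary
    ? "Found existing glossary.md, will extend it"
    : "No existing glossary.md found, will create new one"
);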

// Extract existing terms from glossary if it exists
const existingTerms = new Set();
if (existingGlossary) {
  const termMatches = existingGlossary.matchAll(/^- \*\*([^*]+)\*\*/gm);
  for (const match of termMatches) {
    existingTerms.add(match[1].toLowerCase());
  }
  console.log(`Found ${existingTerms.size} existing terms in glossary`);
}

// Process each markdown file
let allContent = "";
for (const file of files) {
  console.log(`Processing: ${file.filename}`);
  const fileContent = await workspace.readText(file.filename);
  const content = fileContent?.content || "";
  allContent += `\n\n--- ${file.filename} ---\n\n${content}`;
}

// Create the prompt for extracting technical terms
const { text: newTermsResponse } = await prompt`
You are tasked with creating a comprehensive glossary of technical terms from the provided content.

## Content to analyze:
${allContent}

## Instructions:
1. Extract technical terms from the content to analyze related to:
- Generative AI and Machine Learning concepts
- Programming and development terms
- Web development technologies
- APIs and software development concepts
- AI/ML frameworks and tools
- Data science and computational terms

2. For each term, provide a concise one-line definition (maximum 20 words)

3. Focus on terms that would be valuable for developers learning about AI and JavaScript. Avoid terms that are too basic or not relevant to the context of AI and JavaScript development

4. Exclude these terms and concepts:
- Common programming terms that most developers would know (like "function", "variable", "array")
- Historical terms or concepts that are only there for the storytelling aspect of the lessons
- Terms that are too similar to existing terms. For example, "Chain of Thought" and "Chain of Thought Prompting" are too similar and should not both be included.

5. Format each entry as: **Term**: Definition

6. It's OK to not output anything if no new terms are found. In that case, just return an empty string.

${
existingTerms.size > 0
? `## Existing terms to avoid duplicating:
${Array.from(existingTerms).join(", ")}`
: ""
}

## Output format:
Provide only the glossary entries, one per line, sorted alphabetically. Do not include any headers, explanations, or other text.
`;

// Combine existing and new terms
let finalGlossary = "";
let glossarySize = 0;
let previousSize = 0;

if (existingGlossary && !env.vars.force) {
  // Parse existing glossary and add new terms
  const lines = existingGlossary.split("\n");
  // Use the callback index: lines.indexOf(line) would always return the FIRST blank line,
  // so the blank line that ends the header would never be detected
  const headerEndIndex = lines.findIndex(
    (line, i) => line.trim() === "" && lines[i - 1]?.includes("technical terms")
  );

  if (headerEndIndex > 0) {
    // Keep existing header
    finalGlossary = lines.slice(0, headerEndIndex + 1).join("\n") + "\n";
  } else {
    // Create new header
    finalGlossary = `# Glossary\n\nA comprehensive list of technical terms used throughout the lessons.\n\n`;
  }

  // Get existing entries
  const existingEntries = [];
  const termPattern = /^- \*\*([^*]+)\*\*: (.+)$/gm;
  let match;
  while ((match = termPattern.exec(existingGlossary)) !== null) {
    existingEntries.push({ term: match[1], definition: match[2] });
  }

  // Parse new entries
  const newEntries = [];
  const newTermLines = newTermsResponse.split("\n").filter((line) => line.trim());
  for (const line of newTermLines) {
    const termMatch = line.match(/\*\*([^*]+)\*\*:\s*(.+)/);
    if (termMatch) {
      const term = termMatch[1].trim();
      const definition = termMatch[2].trim();
      if (!existingTerms.has(term.toLowerCase())) {
        newEntries.push({ term, definition });
      }
    }
  }

  // Combine and sort all entries
  const allEntries = [...existingEntries, ...newEntries];
  allEntries.sort((a, b) => a.term.toLowerCase().localeCompare(b.term.toLowerCase()));

  // Add all entries to glossary
  for (const entry of allEntries) {
    finalGlossary += `- **${entry.term}**: ${entry.definition}\n`;
  }

  previousSize = existingEntries.length;
  glossarySize = allEntries.length;
  console.log(`Added ${newEntries.length} new terms to existing glossary`);
} else {
  // Create completely new glossary
  finalGlossary = `# Glossary\n\nA comprehensive list of technical terms used throughout the lessons.\n\n`;

  const newTermLines = newTermsResponse.split("\n").filter((line) => line.trim());
  const entries = [];

  for (const line of newTermLines) {
    const termMatch = line.match(/\*\*([^*]+)\*\*:\s*(.+)/);
    if (termMatch) {
      entries.push({ term: termMatch[1].trim(), definition: termMatch[2].trim() });
    }
  }

  entries.sort((a, b) => a.term.toLowerCase().localeCompare(b.term.toLowerCase()));

  for (const entry of entries) {
    finalGlossary += `- **${entry.term}**: ${entry.definition}\n`;
  }

  glossarySize = entries.length;
  console.log(`Created new glossary with ${entries.length} terms`);
}

// Write the glossary file
await workspace.writeText(glossaryPath, finalGlossary);
console.log(`Glossary saved to ${glossaryPath}`);

env.output.appendContent(`Glossary generated with ${glossarySize} terms (previously ${previousSize} terms).\n\n`);
env.output.appendContent(`Glossary saved to \`${glossaryPath}\`.\n`);
env.output.appendContent(`Make sure to perform a manual review before committing the changes to ensure accuracy and relevance of the terms.\n\n`);
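
The parse, de-duplicate, sort, and render steps in `glossary.genai.js` can be exercised without the GenAIScript runtime. The following is a minimal sketch of that logic as pure functions, not the code from this PR: the names `parseEntries`, `mergeEntries`, and `renderEntries` are hypothetical, and unlike the script the sketch also drops duplicates that occur within the new entries themselves.

```javascript
// Hypothetical helpers for illustration; these names are not part of the PR.

// Parse lines shaped like "- **Term**: Definition" (the leading "- " is optional,
// because the model is asked to return "**Term**: Definition" without a bullet).
function parseEntries(text) {
  const entries = [];
  for (const line of text.split("\n")) {
    const m = line.match(/^\s*(?:-\s*)?\*\*([^*]+)\*\*:\s*(.+)$/);
    if (m) entries.push({ term: m[1].trim(), definition: m[2].trim() });
  }
  return entries;
}

// Merge incoming entries into existing ones, skipping case-insensitive duplicates,
// then sort alphabetically by term.
function mergeEntries(existing, incoming) {
  const seen = new Set(existing.map((e) => e.term.toLowerCase()));
  const merged = [...existing];
  for (const entry of incoming) {
    const key = entry.term.toLowerCase();
    if (!seen.has(key)) {
      seen.add(key);
      merged.push(entry);
    }
  }
  return merged.sort((a, b) =>
    a.term.toLowerCase().localeCompare(b.term.toLowerCase())
  );
}

// Render entries in the list format used by glossary.md.
function renderEntries(entries) {
  return entries.map((e) => `- **${e.term}**: ${e.definition}\n`).join("");
}

const existing = parseEntries(
  "- **JSON**: A data-interchange format.\n- **API**: A set of rules."
);
const incoming = parseEntries(
  "**Embedding**: A numeric vector.\n**json**: Duplicate entry."
);
const merged = mergeEntries(existing, incoming);
console.log(renderEntries(merged));
```

Keeping these steps free of `workspace` and `env` calls would make them easy to unit test before running the script against the lessons.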
44 changes: 44 additions & 0 deletions glossary.md
@@ -0,0 +1,44 @@
# Glossary

A comprehensive list of technical terms used throughout the lessons.

- **API**: A set of rules enabling software applications to communicate with each other, commonly used in generative AI integration.
- **API key**: A secret credential used to authenticate requests to an application programming interface.
- **Augmented Prompt**: A prompt enhanced with additional context or information to improve the relevance of AI-generated responses.
- **Azure AI Studio**: A platform to build, evaluate, and deploy AI models using Microsoft Azure.
- **Azure OpenAI**: A cloud service for deploying and scaling OpenAI models like GPT for applications.
- **Caesar cipher**: A substitution cipher shifting characters by a fixed number of places in the alphabet.
- **Chain-of-Thought Prompting**: A technique guiding models to break down complex tasks into sequential reasoning steps for more accurate outputs.
- **Chatbot**: An application designed to simulate conversation with human users, often using natural language processing.
- **Completions API**: API to generate text or code based on inputs, used for predictive or generative tasks in AI models.
- **Context Window**: The amount of past input that a language model can consider when generating responses, measured in tokens.
- **CSV**: A data format consisting of values separated by commas, often used for structured data retrieval and modification.
- **Embedding**: Numeric vector representation of data, often used for semantic search or clustering in machine learning.
- **Escape Hatch**: A technique instructing AI to admit lack of knowledge when data is insufficient to ensure accurate responses.
- **Few-Shot Prompting**: A method of providing a few examples in the prompt to steer the model's output toward a specific context or format.
- **Full-Stack Development**: Development of both the client (frontend) and server (backend) in software applications.
- **Function Calling**: A capability that lets a model request calls to application-defined functions by returning structured arguments.
- **GitHub Codespaces**: A cloud-based environment for coding, testing, and running applications directly from GitHub repositories.
- **GitHub Models**: A platform hosting pre-trained AI models for use and integration with GitHub development workflows.
- **GitHub Token**: A credential used to authenticate securely with GitHub-hosted APIs or services.
- **Integrated Development Environments (IDEs)**: Software providing coding, debugging, and testing tools for developers.
- **JSON**: A lightweight data-interchange format used for structured information exchange between systems, including generative AI responses.
- **Knowledge Bases**: Data repositories used to enhance AI applications by providing reliable, domain-specific information.
- **LangChain**: A framework for building AI applications that focus on chaining multiple models and functionalities together.
- **Large Language Model (LLM)**: An AI model trained on large text datasets to generate human-like responses for diverse applications.
- **Maieutic Prompting**: A technique involving follow-up queries to challenge or validate AI-generated responses for accuracy and reasoning.
- **Managed Identity**: A secure cloud mechanism that provides applications with automatic authentication to access resources without managing passwords.
- **Markdown**: A lightweight markup language for formatting plain text into structured layouts, like tables or lists.
- **Meta Prompts**: Instructions added before a user's prompt to refine or restrict the AI's behavior and output format.
- **Multimodal Capabilities**: AI functionality to process various formats like text, image, or video input and deliver diverse outputs.
- **Node.js**: A runtime environment allowing developers to execute JavaScript code server-side for building scalable applications.
- **OpenAI**: A pioneering organization in AI research and APIs for language models integrated into applications for generative tasks.
- **Prompt Engineering**: The process of crafting effective prompts to guide AI models toward desired responses and behaviors.
- **RAG (Retrieval-Augmented Generation)**: A technique combining retrieval-based methods with generative models for more accurate, data-grounded outputs.
- **Semantic Search**: Search method leveraging the meaning of terms for more contextually accurate and nuanced results.
- **Structured Output**: Data output organized in predefined formats like tables or JSON, enabling easier integration with systems.
- **System Message**: A prompt in conversational AI that specifies contextual boundaries or personality for the assistant.
- **TensorFlow.js**: A JavaScript-based machine learning library enabling browser and Node.js-based AI/ML applications and training.
- **Tokenizer**: A tool that converts text into tokens, the units a model processes as input and output.
- **Vector Search**: Retrieval technique comparing encoded vectors to find semantically similar information in AI applications.
- **XML**: A markup language formatting structured data for information storage, exchange, or generative model input/output.
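
The generation prompt asks for one-line definitions of at most 20 words, and the script's output recommends a manual review before committing. A small check like the one below could support that review. It is a sketch under assumptions: `findLongDefinitions` is a hypothetical helper that is not part of this PR, and it counts whitespace-separated words only.

```javascript
// Hypothetical checker for illustration; not part of the PR.
// Returns the terms whose definition exceeds maxWords whitespace-separated words.
function findLongDefinitions(glossaryText, maxWords = 20) {
  const offenders = [];
  for (const line of glossaryText.split("\n")) {
    // Same entry shape the script writes: "- **Term**: Definition"
    const m = line.match(/^- \*\*([^*]+)\*\*: (.+)$/);
    if (!m) continue;
    const words = m[2].trim().split(/\s+/).length;
    if (words > maxWords) offenders.push({ term: m[1], words });
  }
  return offenders;
}

// Made-up sample data: one short entry and one deliberately long entry.
const sample = [
  "# Glossary",
  "",
  "- **CSV**: A data format consisting of values separated by commas.",
  "- **Verbose**: " + Array(25).fill("word").join(" "),
].join("\n");

const longOnes = findLongDefinitions(sample);
console.log(longOnes);
```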