codelion/hash-hop Long Context Evaluation #920

ShellLM opened this issue Aug 31, 2024 · 1 comment
ShellLM commented Aug 31, 2024

HashHop Long Context Evaluation

This repository contains the code for HashHop, our long context architecture benchmark.

Installation Guide

Prerequisites

  • Python and Poetry (step 2 below runs poetry install)

Steps

  1. Clone the repository:

    git clone git@github.com:magicproduct/hash-hop.git
    cd hash-hop
    
  2. Install dependencies:

    poetry install
    

Generating Evaluation Data

The MultiHopEval.make_one function generates a MultiHopSample object, which can be used either for evaluation (via the targets field) or for training models on the multihop task (via the completion field).

Usage Example

    from hashhop import MultiHopEval

    CHARS_PER_TOKEN = 3
    datapoint = MultiHopEval.make_one(
        n_chars_problem=int(1_000_000 * CHARS_PER_TOKEN),
        num_queries=5,
        hops=2,
        hash_pair_str_length=16,
        chain_of_thought=False,
    )
    print(datapoint.prompt)
    print(datapoint.completion)
    print(datapoint.targets)
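The targets field can be scored with a simple exact-match loop. Below is a minimal scoring sketch; the exact_match_accuracy helper and the model_answers dict are hypothetical illustrations, not part of the hashhop library:

```python
def exact_match_accuracy(targets: dict[str, str], answers: dict[str, str]) -> float:
    """Fraction of queries whose predicted hash exactly matches the ground truth."""
    if not targets:
        return 0.0
    correct = sum(1 for query, truth in targets.items() if answers.get(query) == truth)
    return correct / len(targets)

# Hypothetical example: two queries, one answered correctly.
targets = {
    "HETyxiWTFSVUYega": "pChfybAJRUBmdAGC",
    "KeiVcwXpnYIWLPmk": "JhgvBFdYCnLVZBoy",
}
model_answers = {
    "HETyxiWTFSVUYega": "pChfybAJRUBmdAGC",
    "KeiVcwXpnYIWLPmk": "wrongHashValue00",
}
print(exact_match_accuracy(targets, model_answers))  # 0.5
```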

Parameters

  • n_chars_problem: int
    • The size of the problem in characters.
  • num_queries: int
    • The number of queries in the completion.
  • hops: int
    • The number of hops in the reasoning chain.
  • hash_pair_str_length: int
    • The number of characters per hash.
  • chain_of_thought: bool
    • If True, the model is asked to produce H1 -> H2 -> H3.
    • If False, the model is asked to produce H1 -> H3.

Output

  • prompt: str
    • Contains the shuffled hash pairs.
  • completion: str (used for training)
    • The queries and targets in string format.
  • targets: Dict[str, str] (used for evaluation)
    • Contains query-to-ground-truth pairs in structured format.
    • If chain_of_thought is False, maps H1 directly to H3 (e.g. 'HETyxiWTFSVUYega': 'pChfybAJRUBmdAGC').
    • If chain_of_thought is True, maps H1 to the full chain 'H2 = H3' (e.g. 'KeiVcwXpnYIWLPmk': 'GmmNmICdvEErHgei = JhgvBFdYCnLVZBoy').
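The hop structure behind these two target formats can be illustrated without the library. The sketch below is a self-contained toy reconstruction (the random_hash and resolve helpers are illustrative assumptions, not the library's actual generation code):

```python
import random
import string


def random_hash(length: int = 16) -> str:
    """Generate a random 16-character string standing in for a hash."""
    return "".join(random.choices(string.ascii_letters, k=length))


def resolve(pairs: dict[str, str], query: str, hops: int) -> list[str]:
    """Follow hash assignments from the query for the requested number of hops."""
    chain = [query]
    for _ in range(hops):
        chain.append(pairs[chain[-1]])
    return chain


random.seed(0)
# Build one 2-hop chain H1 -> H2 -> H3, stored as two assignment pairs.
h1, h2, h3 = (random_hash() for _ in range(3))
pairs = {h1: h2, h2: h3}

chain = resolve(pairs, h1, hops=2)
flat_target = {chain[0]: chain[-1]}             # chain_of_thought=False: {H1: H3}
cot_target = {chain[0]: " = ".join(chain[1:])}  # chain_of_thought=True: {H1: 'H2 = H3'}
print(flat_target)
print(cot_target)
```

In the real benchmark the prompt contains many such pairs in shuffled order, so the model must locate each link of the chain anywhere in the context.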

Citation

    @misc{magic2024hashhop,
      author = {Magic},
      title = {HashHop: Long Context Evaluation},
      year = {2024},
      publisher = {GitHub},
      journal = {GitHub repository},
      howpublished = {\url{https://github.com/magicproduct/hash-hop}},
    }

License

MIT

Suggested labels

None

ShellLM commented Aug 31, 2024

Related content

#309 similarity score: 0.89
#919 similarity score: 0.89
#848 similarity score: 0.89
#456 similarity score: 0.88
#811 similarity score: 0.88
#324 similarity score: 0.88
