thesis

Reading and Understanding Annual Reports Using Large Language Models (LLMs) and Retrieval-Augmented Generation.

The primary research questions:

  • How can open-source utilities (libraries and models) be effectively combined with information retrieval to enhance question answering over and understanding of financial reports?
  • What results does a RAG pipeline built from open-source utilities achieve on real-life, finance-domain-specific questions?

Methodology:

  • A comprehensive review of existing research and theoretical foundations on embeddings, LLMs, vector storage, RAG systems, and their applications.
  • Using open-source LLMs and frameworks to develop a RAG system that can process and analyze financial reports. The system will be designed to run locally (without any API calls), ensuring cost-effectiveness (see the pipeline sketch after this list).
  • Conducting experiments to assess the quality of the system's responses to real financial queries. This includes testing the system on questions of varying complexity and specificity, as well as manual evaluation of its accuracy.
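
A minimal sketch of what such a local pipeline could look like, combining the stack listed under "models / frameworks / databases" below (LangChain, ChromaDB, BAAI/bge-small-en-v1.5 embeddings, and microsoft/Phi-3-mini-4k-instruct). The package paths, chunking parameters, file name, and prompt are assumptions that depend on the installed library versions, not the thesis implementation itself:

```python
# Minimal local RAG sketch (no API calls): load an annual report, chunk it,
# index it in ChromaDB with bge-small embeddings, and answer with Phi-3.
# Package paths and parameter values are assumptions and may vary by version.
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_community.llms import HuggingFacePipeline

# 1. Load and chunk the report
docs = PyPDFLoader("annual_report.pdf").load()
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=150).split_documents(docs)

# 2. Embed the chunks and store them in a local Chroma collection
embeddings = HuggingFaceEmbeddings(model_name="BAAI/bge-small-en-v1.5")
vectorstore = Chroma.from_documents(chunks, embeddings, persist_directory="./chroma_db")
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})

# 3. Run a small instruct model locally
llm = HuggingFacePipeline.from_model_id(
    model_id="microsoft/Phi-3-mini-4k-instruct",
    task="text-generation",
    pipeline_kwargs={"max_new_tokens": 256},
)

# 4. Retrieve context and generate an answer
question = "What was the company's total revenue in the reporting year?"
context = "\n\n".join(d.page_content for d in retriever.invoke(question))
print(llm.invoke(f"Use the context to answer.\n\nContext:\n{context}\n\nQuestion: {question}"))
```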

experiments board

Experiments board: LangSmith projects

datasets

questions

  • A prepared list of 35 questions most commonly used in analyzing corporate reports. These questions will be used to test the system.
  • 150 questions from FINANCEBENCH

models / frameworks / databases

  1. microsoft/Phi-3-mini-4k-instruct
  2. ChromaDB
  3. LangChain
  4. BAAI/bge-small-en-v1.5
  5. FlashRank
  6. FlagEmbedding Reranker
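
FlashRank and the FlagEmbedding reranker fit in as a post-retrieval step that reorders candidate chunks before they reach the LLM. A minimal sketch of calling both rerankers directly; the reranker model names and example passages are assumptions for illustration:

```python
# Rerank retrieved chunks before passing them to the LLM.
# Model names are assumptions; both libraries download their models on first use.
from flashrank import Ranker, RerankRequest
from FlagEmbedding import FlagReranker

query = "What was the net profit in 2023?"
passages = [
    {"id": 1, "text": "Net profit for 2023 amounted to 1.2 billion USD."},
    {"id": 2, "text": "The board of directors met four times during the year."},
]

# FlashRank: a lightweight cross-encoder reranker
flash = Ranker(model_name="ms-marco-MiniLM-L-12-v2")
print(flash.rerank(RerankRequest(query=query, passages=passages)))

# FlagEmbedding: BGE reranker, returns a relevance score per (query, passage) pair
bge = FlagReranker("BAAI/bge-reranker-base", use_fp16=True)
print(bge.compute_score([[query, p["text"]] for p in passages]))
```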

evaluation

  1. Manual evaluation

additional literature

remarks

  • Models such as MosaicML MPT-7B or Llama 2 7B are too large to run on a free Google Colab account, even though they are the smallest in their respective families. A free Colab account provides about 12 GB of RAM, while a rough estimate of the RAM required for a 7B-parameter model in 32-bit precision is: model parameters 28 GB, model overhead 3-6 GB, I/O buffers 1-2 GB, OS/framework overhead 2-4 GB, in total 34-40 GB.
  • llama.cpp is a plain C/C++ implementation of LLaMA inference that supports 4-bit integer quantization. The TheBloke/Llama-2-13B-chat-GGUF model, with 13B parameters, requires at most 11.73 GB (a loading sketch follows these remarks).
  • Quantized Llama 2 models: the recommendation is to use a 4-bit quantized model at the largest parameter count that fits on your GPU (a rough estimate: 1B parameters ≈ 1 GB of VRAM).
  • Model size: the industry standard is 32 bits per parameter. For a 13B model, 13,000,000,000 parameters × 32 bits ÷ 8 bits per byte ÷ 1,024³ bytes per gigabyte ≈ 48.4 GB. In 16-bit format the size is half of that, ≈ 24.2 GB; in 8-bit ≈ 12.1 GB; and in 4-bit ≈ 6 GB (see the memory-estimation sketch after these remarks).
  • The second consideration is how the model is loaded. The most efficient option is to keep it entirely in VRAM, since GPUs are very fast at floating-point and matrix operations, so the fastest inference requires the whole model to fit in VRAM; this is why the quantized (smaller) versions are needed. A 13B model in 32-bit format needs a little more than two RTX 4090s, which is expensive for a home setup, while the output quality at 4-bit quantization is roughly 95% of the original model, a very favorable trade-off for the size reduction.
  • https://oobabooga.github.io/blog/posts/perplexities/ - a comparison of quantization levels and an explanation of their effect on model perplexity.
  • A comprehensive study of RAG systems.
  • FAISS supports searching only from RAM, as disk databases are orders of magnitude slower, even with SSDs.
  • The RAG-Token model from the paper is an uncased model, which means that capital letters are simply converted to lower-case letters. The complete legacy index requires over 75 GB of RAM.
  • Tools such as LangChain and ChunkViz let you test various chunk sizes to gain some intuition; in particular, it is worth examining where the document is split under different split sizes or strategies and whether semantically related content is split unnaturally.
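
As a quick check of the size arithmetic in the remarks above, a small script that estimates the parameter-memory footprint of a model at different precisions (it covers parameter storage only; activations, KV cache, and framework overhead come on top):

```python
# Estimate parameter memory for a model at different precisions.
# Covers parameter storage only; overheads (see the 7B Colab estimate above)
# come on top of this.
def param_memory_gb(n_params: float, bits_per_param: int) -> float:
    return n_params * bits_per_param / 8 / 1024**3

for bits in (32, 16, 8, 4):
    print(f"13B model at {bits:2d}-bit: {param_memory_gb(13e9, bits):6.2f} GB")
# 32-bit ~48.4 GB, 16-bit ~24.2 GB, 8-bit ~12.1 GB, 4-bit ~6.1 GB
```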
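
A minimal sketch of loading a 4-bit GGUF quantization of Llama 2 13B chat with llama-cpp-python, the Python bindings for llama.cpp mentioned in the remarks above; the file name and parameter values are assumptions:

```python
# Run a 4-bit quantized Llama 2 13B chat model locally with llama.cpp.
# The GGUF file name and parameters are assumptions; download the file
# from the TheBloke/Llama-2-13B-chat-GGUF repository beforehand.
from llama_cpp import Llama

llm = Llama(
    model_path="./llama-2-13b-chat.Q4_K_M.gguf",  # ~4-bit quantization
    n_ctx=2048,          # context window
    n_gpu_layers=-1,     # offload all layers to the GPU if VRAM allows
)

out = llm("Q: Summarize the key risks in an annual report. A:", max_tokens=128)
print(out["choices"][0]["text"])
```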
