<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20190208//EN"
"JATS-publishing1.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" dtd-version="1.2" article-type="other">
<front>
<journal-meta>
<journal-id></journal-id>
<journal-title-group>
<journal-title>Journal of Open Source Software</journal-title>
<abbrev-journal-title>JOSS</abbrev-journal-title>
</journal-title-group>
<issn publication-format="electronic">2475-9066</issn>
<publisher>
<publisher-name>Open Journals</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">7211</article-id>
<article-id pub-id-type="doi">10.21105/joss.07211</article-id>
<title-group>
<article-title>ollamar: An R package for running large language
models</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<contrib-id contrib-id-type="orcid">https://orcid.org/0000-0003-4590-7039</contrib-id>
<name>
<surname>Lin</surname>
<given-names>Hause</given-names>
</name>
<xref ref-type="aff" rid="aff-1"/>
</contrib>
<contrib contrib-type="author">
<contrib-id contrib-id-type="orcid">https://orcid.org/0009-0000-5659-9890</contrib-id>
<name>
<surname>Safi</surname>
<given-names>Tawab</given-names>
</name>
<xref ref-type="aff" rid="aff-1"/>
</contrib>
<aff id="aff-1">
<institution-wrap>
<institution>Massachusetts Institute of Technology, USA</institution>
</institution-wrap>
</aff>
</contrib-group>
<pub-date date-type="pub" publication-format="electronic" iso-8601-date="2024-11-21">
<day>21</day>
<month>11</month>
<year>2024</year>
</pub-date>
<volume>10</volume>
<issue>105</issue>
<fpage>7211</fpage>
<permissions>
<copyright-statement>Authors of papers retain copyright and release the
work under a Creative Commons Attribution 4.0 International License (CC
BY 4.0)</copyright-statement>
<copyright-year>2025</copyright-year>
<copyright-holder>The article authors</copyright-holder>
<license license-type="open-access" xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>Authors of papers retain copyright and release the work under
a Creative Commons Attribution 4.0 International License (CC BY
4.0)</license-p>
</license>
</permissions>
<kwd-group kwd-group-type="author">
<kwd>R</kwd>
<kwd>large language models</kwd>
<kwd>Ollama</kwd>
<kwd>natural language processing</kwd>
<kwd>artificial intelligence</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec id="summary">
<title>Summary</title>
<p>Large language models (LLMs) have transformed natural language
processing and AI applications across numerous domains. While
cloud-based LLMs are common, locally deployed models offer distinct
advantages in reproducibility, data privacy, security, and
customization. <monospace>ollamar</monospace> is an R package that
provides an interface to Ollama, enabling researchers and data
scientists to integrate locally hosted LLMs into their R workflows
seamlessly. It implements a consistent API design that mirrors the
official Ollama API and its client libraries in other programming
languages, and follows established LLM usage conventions. It further
distinguishes itself by offering flexible
output formats and easy management of conversation history.
<monospace>ollamar</monospace> is maintained on GitHub, where it
regularly undergoes comprehensive continuous integration testing
across multiple platforms, and is distributed through the
Comprehensive R Archive Network (CRAN).</p>
</sec>
<sec id="state-of-the-field">
<title>State of the Field</title>
<p>The increasing importance of LLMs in various fields has created a
demand for accessible tools that allow researchers and practitioners
to leverage LLMs within their preferred programming environments.
Locally deployed LLMs offer advantages in terms of data privacy,
security, reproducibility, and customization, making them an
attractive option for many users
(<xref alt="Chan et al., 2024" rid="ref-Chan2024Aug" ref-type="bibr">Chan
et al., 2024</xref>;
<xref alt="Liu et al., 2024" rid="ref-Liu2024Aug" ref-type="bibr">Liu
et al., 2024</xref>;
<xref alt="Lytvyn, 2024" rid="ref-Lytvyn2024Jun" ref-type="bibr">Lytvyn,
2024</xref>;
<xref alt="Shostack, 2024" rid="ref-Shostack2024Mar" ref-type="bibr">Shostack,
2024</xref>). Currently, Ollama (https://ollama.com/) is one of the
most popular tools for running locally hosted LLMs, offering access to
a range of models with different sizes and capabilities. Several R
packages currently facilitate interaction with locally deployed LLMs
through Ollama, each with distinct approaches, capabilities, and
limitations.</p>
<p>The <monospace>rollama</monospace>
(<xref alt="Gruber &amp; Weber, 2024" rid="ref-Gruber2024Apr" ref-type="bibr">Gruber
&amp; Weber, 2024</xref>) and <monospace>tidyllm</monospace>
(<xref alt="Brüll, 2024" rid="ref-Brull2024" ref-type="bibr">Brüll,
2024</xref>) libraries focus on text generation, conversations, and
text embedding, but their core functions do not always mirror the
official Ollama API endpoints, which leads to inconsistencies and
confusion for users familiar with the official API. Additionally,
these libraries may not support all Ollama
endpoints and features. Another popular R library is
<monospace>tidychatmodels</monospace>
(<xref alt="Albert, 2024" rid="ref-Rapp2024" ref-type="bibr">Albert,
2024</xref>), which allows users to chat with different LLMs, but it
is not available on CRAN, and therefore is not subject to the same
level of testing and quality assurance as CRAN packages.</p>
<p>All these libraries also adopt the tidyverse workflow, which some
may find restrictive, opinionated, or unfamiliar. While the tidyverse
is popular in the R community, it may not align with the programming
style or workflow of all users, especially those coming from other
programming languages or domains. This limitation can hinder the
accessibility and usability of these libraries for a broader audience
of R users. Thus, the R ecosystem lacks a simple and reliable library
to interface with Ollama, even though R is a popular and crucial tool
in statistics, data science, and various research domains
(<xref alt="Hill et al., 2024" rid="ref-Hill2024May" ref-type="bibr">Hill
et al., 2024</xref>).</p>
</sec>
<sec id="statement-of-need">
<title>Statement of Need</title>
<p><monospace>ollamar</monospace> addresses the limitations of
existing R libraries by providing a consistent API design that mirrors
official Ollama endpoints, enabling seamless integration across
programming environments. It also offers flexible output formats
supporting dataframes, JSON lists, raw strings, and text vectors,
allowing users to choose the format that best suits their needs. The
package further includes independent conversation management tools
that align with industry-standard chat formats, streamlining the
integration of locally deployed LLMs into R workflows. These features
make <monospace>ollamar</monospace> a versatile and user-friendly tool
for researchers and data scientists working with LLMs in R. It fills a
critical gap in the R ecosystem by providing a native interface to run
locally deployed LLMs, and is already being used by researchers and
practitioners
(<xref alt="Turner, 2024" rid="ref-Turner2024Aug" ref-type="bibr">Turner,
2024</xref>).</p>
</sec>
<sec id="design">
<title>Design</title>
<p><monospace>ollamar</monospace> implements a modular,
non-opinionated, and consistent approach that aligns with established
software engineering principles, where breaking down complex systems
into manageable components enhances maintainability, reusability, and
overall performance. It also avoids feature bloat, ensuring the
library remains focused on its core functionality. The key design
principles of <monospace>ollamar</monospace> are described below.</p>
<p><bold>Consistency and maintainability</bold>: It provides an
interface to the Ollama server and all API endpoints, closely
following the official API design, where each function corresponds to
a specific endpoint. This implementation ensures the library is
consistent with the official Ollama API and easy to maintain and
extend as new features are added to Ollama. It also makes it easy for
R users to understand how similar libraries (such as in Python and
JavaScript) work while allowing users familiar with other programming
languages to adapt to and use this library quickly. The consistent API
structure across languages facilitates seamless transitions and
knowledge transfer for developers working in multi-language
environments.</p>
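<p>As a brief illustration, the minimal sketch below assumes a running
Ollama server and an already-pulled model (the model name
<monospace>&quot;llama3.1&quot;</monospace> is only an example); each
function mirrors one official endpoint:</p>
<code language="R">library(ollamar)

test_connection()  # verify that the local Ollama server is reachable
list_models()      # mirrors the /api/tags endpoint (lists local models)

# mirrors the /api/generate endpoint; returns an httr2 response object
resp &lt;- generate("llama3.1", "Briefly explain what an LLM is.")</code>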
<p><bold>Consistent and flexible output formats</bold>: All functions
that call API endpoints return
<monospace>httr2::httr2_response</monospace> objects by default,
providing a consistent interface for flexible downstream processing. If
preferred, users have the option to specify different output formats
when calling the endpoints, such as dataframes
(<monospace>&quot;df&quot;</monospace>), lists (of JSON objects)
(<monospace>&quot;jsonlist&quot;</monospace>), raw strings
(<monospace>&quot;raw&quot;</monospace>), text vectors
(<monospace>&quot;text&quot;</monospace>), or request objects
(<monospace>&quot;req&quot;</monospace>). Alternatively, users can call
the <monospace>resp_process()</monospace> function to convert and
process the response object as needed. This flexibility allows users to choose
the format that best suits their needs, such as when working with
different data structures, integrating the output with other R
packages, or allowing parallelization via the popular
<monospace>httr2</monospace> library. It also allows users to easily
build applications or pipelines on top of the library, without being
constrained by a specific output format.</p>
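<p>For example, in the sketch below (same assumed example model), the
output format is selected per call, or a stored response object is
converted later with <monospace>resp_process()</monospace>:</p>
<code language="R">library(ollamar)

# default: an httr2_response object for flexible downstream processing
resp &lt;- generate("llama3.1", "Tell me a short fact about R.")

# request a specific output format directly
txt &lt;- generate("llama3.1", "Tell me a short fact about R.",
                output = "text")
dat &lt;- generate("llama3.1", "Tell me a short fact about R.",
                output = "df")

# or convert a saved response object as needed
resp_process(resp, "jsonlist")</code>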
<p><bold>Easy management of LLM conversation history</bold>: LLM APIs
typically expect conversation/chat history data as input, often as
nested lists or JSON objects. This data format is standard for
chat-based applications and APIs (not limited to Ollama), such as
those provided by OpenAI and Anthropic. <monospace>ollamar</monospace>
simplifies preparing and processing conversational data for input to
different LLMs, focusing on streamlining the workflow for the most
popular chat-based applications.</p>
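<p>A minimal sketch of this workflow (same assumed example model)
builds the history as a plain nested list and passes it to
<monospace>chat()</monospace>, which mirrors the
<monospace>/api/chat</monospace> endpoint:</p>
<code language="R">library(ollamar)

# chat history as role/content pairs: the nested-list format shared by
# chat APIs such as Ollama, OpenAI, and Anthropic
msgs &lt;- list(
  list(role = "system", content = "You are a concise assistant."),
  list(role = "user", content = "What is R primarily used for?")
)
reply &lt;- chat("llama3.1", msgs, output = "text")

# append the reply and a follow-up turn, then continue the conversation
msgs &lt;- c(msgs, list(
  list(role = "assistant", content = reply),
  list(role = "user", content = "Name one popular R package.")
))
chat("llama3.1", msgs, output = "text")</code>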
<p><bold>Automated regular testing</bold>:
<monospace>ollamar</monospace> is hosted on GitHub, where it undergoes
comprehensive continuous integration testing across multiple platforms
to ensure reliability, and is available through CRAN. Daily
automated quality checks maintain long-term stability, and scheduled
tests verify version compatibility.</p>
</sec>
<sec id="conclusion">
<title>Conclusion</title>
<p><monospace>ollamar</monospace> bridges a crucial gap in the R
ecosystem by providing seamless access to large language models
through Ollama. Its focus on consistency, flexibility, and
maintainability makes it a versatile tool for researchers and data
scientists working with LLMs in R. These design choices make
<monospace>ollamar</monospace> a user-friendly and reliable way to
integrate locally deployed LLMs into R workflows, accelerating
research and development in fields relying on R for data
analysis and machine learning.</p>
</sec>
<sec id="acknowledgements">
<title>Acknowledgements</title>
<p>This project was partially supported by the Canadian Social
Sciences &amp; Humanities Research Council Tri-Agency Funding (funding
reference: 192324).</p>
</sec>
</body>
<back>
<ref-list>
<title></title>
<ref id="ref-Turner2024Aug">
<element-citation publication-type="article-journal">
<person-group person-group-type="author">
<name><surname>Turner</surname><given-names>Stephen D.</given-names></name>
</person-group>
<article-title>biorecap: an R package for summarizing bioRxiv preprints with a local LLM</article-title>
<source>arXiv</source>
<year iso-8601-date="2024-08">2024</year><month>08</month>
<date-in-citation content-type="access-date"><year iso-8601-date="2024-08-24">2024</year><month>08</month><day>24</day></date-in-citation>
<pub-id pub-id-type="doi">10.48550/arXiv.2408.11707</pub-id>
</element-citation>
</ref>
<ref id="ref-Hill2024May">
<element-citation publication-type="article-journal">
<person-group person-group-type="author">
<name><surname>Hill</surname><given-names>Chelsey</given-names></name>
<name><surname>Du</surname><given-names>Lanqing</given-names></name>
<name><surname>Johnson</surname><given-names>Marina</given-names></name>
<name><surname>McCullough</surname><given-names>B. D.</given-names></name>
</person-group>
<article-title>Comparing programming languages for data analytics: Accuracy of estimation in Python and R</article-title>
<source>Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery</source>
<publisher-name>John Wiley &amp; Sons, Ltd</publisher-name>
<year iso-8601-date="2024-05">2024</year><month>05</month>
<date-in-citation content-type="access-date"><year iso-8601-date="2024-08-24">2024</year><month>08</month><day>24</day></date-in-citation>
<volume>14</volume>
<issue>3</issue>
<issn>1942-4787</issn>
<pub-id pub-id-type="doi">10.1002/widm.1531</pub-id>
<fpage>e1531</fpage>
</element-citation>
</ref>
<ref id="ref-Gruber2024Apr">
<element-citation publication-type="article-journal">
<person-group person-group-type="author">
<name><surname>Gruber</surname><given-names>Johannes B.</given-names></name>
<name><surname>Weber</surname><given-names>Maximilian</given-names></name>
</person-group>
<article-title>rollama: An R package for using generative large language models through Ollama</article-title>
<source>arXiv</source>
<year iso-8601-date="2024-04">2024</year><month>04</month>
<date-in-citation content-type="access-date"><year iso-8601-date="2024-08-24">2024</year><month>08</month><day>24</day></date-in-citation>
<pub-id pub-id-type="doi">10.48550/arXiv.2404.07654</pub-id>
</element-citation>
</ref>
<ref id="ref-Liu2024Aug">
<element-citation publication-type="article-journal">
<person-group person-group-type="author">
<name><surname>Liu</surname><given-names>Fei</given-names></name>
<name><surname>Kang</surname><given-names>Zejun</given-names></name>
<name><surname>Han</surname><given-names>Xing</given-names></name>
</person-group>
<article-title>Optimizing RAG techniques for automotive industry PDF chatbots: A case study with locally deployed Ollama models</article-title>
<source>arXiv</source>
<year iso-8601-date="2024-08">2024</year><month>08</month>
<date-in-citation content-type="access-date"><year iso-8601-date="2024-08-24">2024</year><month>08</month><day>24</day></date-in-citation>
<pub-id pub-id-type="doi">10.48550/arXiv.2408.05933</pub-id>
</element-citation>
</ref>
<ref id="ref-Shostack2024Mar">
<element-citation publication-type="article-journal">
<person-group person-group-type="author">
<name><surname>Shostack</surname><given-names>Adam</given-names></name>
</person-group>
<article-title>The boy who survived: Removing Harry Potter from an LLM is harder than reported</article-title>
<source>arXiv</source>
<year iso-8601-date="2024-03">2024</year><month>03</month>
<date-in-citation content-type="access-date"><year iso-8601-date="2024-08-24">2024</year><month>08</month><day>24</day></date-in-citation>
<pub-id pub-id-type="doi">10.48550/arXiv.2403.12082</pub-id>
</element-citation>
</ref>
<ref id="ref-Lytvyn2024Jun">
<element-citation publication-type="article-journal">
<person-group person-group-type="author">
<name><surname>Lytvyn</surname><given-names>Oleksandr</given-names></name>
</person-group>
<article-title>Enhancing propaganda detection with open source language models: A comparative study</article-title>
<source>Proceedings of the MEi:CogSci Conference</source>
<year iso-8601-date="2024-06">2024</year><month>06</month>
<date-in-citation content-type="access-date"><year iso-8601-date="2024-08-24">2024</year><month>08</month><day>24</day></date-in-citation>
<volume>18</volume>
<issue>1</issue>
<issn>2960-5911</issn>
<uri>https://journals.phl.univie.ac.at/meicogsci/article/view/822</uri>
</element-citation>
</ref>
<ref id="ref-Chan2024Aug">
<element-citation publication-type="article-journal">
<person-group person-group-type="author">
<name><surname>Chan</surname><given-names>Ryan Sze-Yin</given-names></name>
<name><surname>Nanni</surname><given-names>Federico</given-names></name>
<name><surname>Brown</surname><given-names>Edwin</given-names></name>
<name><surname>Chapman</surname><given-names>Ed</given-names></name>
<name><surname>Williams</surname><given-names>Angus R.</given-names></name>
<name><surname>Bright</surname><given-names>Jonathan</given-names></name>
<name><surname>Gabasova</surname><given-names>Evelina</given-names></name>
</person-group>
<article-title>Prompto: An open source library for asynchronous querying of LLM endpoints</article-title>
<source>arXiv</source>
<year iso-8601-date="2024-08">2024</year><month>08</month>
<date-in-citation content-type="access-date"><year iso-8601-date="2024-08-24">2024</year><month>08</month><day>24</day></date-in-citation>
<pub-id pub-id-type="doi">10.48550/arXiv.2408.11847</pub-id>
</element-citation>
</ref>
<ref id="ref-Rapp2024">
<element-citation publication-type="article-journal">
<person-group person-group-type="author">
<name><surname>Rapp</surname><given-names>Albert</given-names></name>
</person-group>
<article-title>Tidychatmodels: Chat with all kinds of AI models through a common interface</article-title>
<year iso-8601-date="2024">2024</year>
<uri>https://github.com/AlbertRapp/tidychatmodels</uri>
</element-citation>
</ref>
<ref id="ref-Brull2024">
<element-citation publication-type="article-journal">
<person-group person-group-type="author">
<name><surname>Brüll</surname><given-names>Eduard</given-names></name>
</person-group>
<article-title>Tidyllm: Tidy integration of large language models</article-title>
<source>CRAN: Contributed Packages</source>
<year iso-8601-date="2024">2024</year>
<uri>https://edubruell.github.io/tidyllm/</uri>
<pub-id pub-id-type="doi">10.32614/cran.package.tidyllm</pub-id>
</element-citation>
</ref>
</ref-list>
</back>
</article>
