Creating 10.21105.joss.07211.jats

openjournals · Jan 24, 2025 · 0ae4755 · 0ae4755
1 parent 423de6e
commit 0ae4755
Showing 1 changed file with 373 additions and 0 deletions.
diff --git a/joss.07211/paper.jats/10.21105.joss.07211.jats b/joss.07211/paper.jats/10.21105.joss.07211.jats
@@ -0,0 +1,373 @@
+<?xml version="1.0" encoding="utf-8" ?>
+<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20190208//EN"
+                  "JATS-publishing1.dtd">
+<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" dtd-version="1.2" article-type="other">
+<front>
+<journal-meta>
+<journal-id></journal-id>
+<journal-title-group>
+<journal-title>Journal of Open Source Software</journal-title>
+<abbrev-journal-title>JOSS</abbrev-journal-title>
+</journal-title-group>
+<issn publication-format="electronic">2475-9066</issn>
+<publisher>
+<publisher-name>Open Journals</publisher-name>
+</publisher>
+</journal-meta>
+<article-meta>
+<article-id pub-id-type="publisher-id">7211</article-id>
+<article-id pub-id-type="doi">10.21105/joss.07211</article-id>
+<title-group>
+<article-title>ollamar: An R package for running large language
+models</article-title>
+</title-group>
+<contrib-group>
+<contrib contrib-type="author">
+<contrib-id contrib-id-type="orcid">https://orcid.org/0000-0003-4590-7039</contrib-id>
+<name>
+<surname>Lin</surname>
+<given-names>Hause</given-names>
+</name>
+<xref ref-type="aff" rid="aff-1"/>
+</contrib>
+<contrib contrib-type="author">
+<contrib-id contrib-id-type="orcid">https://orcid.org/0009-0000-5659-9890</contrib-id>
+<name>
+<surname>Safi</surname>
+<given-names>Tawab</given-names>
+</name>
+<xref ref-type="aff" rid="aff-1"/>
+</contrib>
+<aff id="aff-1">
+<institution-wrap>
+<institution>Massachusetts Institute of Technology, USA</institution>
+</institution-wrap>
+</aff>
+</contrib-group>
+<pub-date date-type="pub" publication-format="electronic" iso-8601-date="2024-11-21">
+<day>21</day>
+<month>11</month>
+<year>2024</year>
+</pub-date>
+<volume>10</volume>
+<issue>105</issue>
+<fpage>7211</fpage>
+<permissions>
+<copyright-statement>Authors of papers retain copyright and release the
+work under a Creative Commons Attribution 4.0 International License (CC
+BY 4.0)</copyright-statement>
+<copyright-year>2025</copyright-year>
+<copyright-holder>The article authors</copyright-holder>
+<license license-type="open-access" xlink:href="https://creativecommons.org/licenses/by/4.0/">
+<license-p>Authors of papers retain copyright and release the work under
+a Creative Commons Attribution 4.0 International License (CC BY
+4.0)</license-p>
+</license>
+</permissions>
+<kwd-group kwd-group-type="author">
+<kwd>R</kwd>
+<kwd>large language models</kwd>
+<kwd>Ollama</kwd>
+<kwd>natural language processing</kwd>
+<kwd>artificial intelligence</kwd>
+</kwd-group>
+</article-meta>
+</front>
+<body>
+<sec id="summary">
+  <title>Summary</title>
+  <p>Large language models (LLMs) have transformed natural language
+  processing and AI applications across numerous domains. While
+  cloud-based LLMs are common, locally deployed models offer distinct
+  advantages in reproducibility, data privacy, security, and
+  customization. <monospace>ollamar</monospace> is an R package that
+  provides an interface to Ollama, enabling researchers and data
+  scientists to integrate locally-hosted LLMs into their R workflows
+  seamlessly. It implements a consistent API design that aligns with
+  other programming languages and follows established LLM usage
+  conventions. It further distinguishes itself by offering flexible
+  output formats and easy management of conversation history.
+  <monospace>ollamar</monospace> is maintained on GitHub and available
+  through the Comprehensive R Archive Network (CRAN), where it regularly
+  undergoes comprehensive continuous integration testing across multiple
+  platforms.</p>
+</sec>
+<sec id="state-of-the-field">
+  <title>State of the Field</title>
+  <p>The increasing importance of LLMs in various fields has created a
+  demand for accessible tools that allow researchers and practitioners
+  to leverage LLMs within their preferred programming environments.
+  Locally deployed LLMs offer advantages in terms of data privacy,
+  security, reproducibility, and customization, making them an
+  attractive option for many users
+  (<xref alt="Chan et al., 2024" rid="ref-Chan2024Aug" ref-type="bibr">Chan
+  et al., 2024</xref>;
+  <xref alt="Liu et al., 2024" rid="ref-Liu2024Aug" ref-type="bibr">Liu
+  et al., 2024</xref>;
+  <xref alt="Lytvyn, 2024" rid="ref-Lytvyn2024Jun" ref-type="bibr">Lytvyn,
+  2024</xref>;
+  <xref alt="Shostack, 2024" rid="ref-Shostack2024Mar" ref-type="bibr">Shostack,
+  2024</xref>). Currently, Ollama (https://ollama.com/) is one of the
+  most popular tools for running locally hosted LLMs, offering access to
+  a range of models with different sizes and capabilities. Several R
+  packages currently facilitate interaction with locally deployed LLMs
+  through Ollama, each with distinct approaches, capabilities, and
+  limitations.</p>
+  <p>The <monospace>rollama</monospace>
+  (<xref alt="Gruber &amp; Weber, 2024" rid="ref-Gruber2024Apr" ref-type="bibr">Gruber
+  &amp; Weber, 2024</xref>) and <monospace>tidyllm</monospace>
+  (<xref alt="Brüll, 2024" rid="ref-Brull2024" ref-type="bibr">Brüll,
+  2024</xref>) libraries focus on text generation, conversations, and
+  text embedding, but their core functions do not always or necessarily
+  mirror the official Ollama API endpoints, which lead to
+  inconsistencies and confusion for users familiar with the official
+  API. Additionally, these libraries may not support all Ollama
+  endpoints and features. Another popular R library is
+  <monospace>tidychatmodels</monospace>
+  (<xref alt="Albert, 2024" rid="ref-Rapp2024" ref-type="bibr">Albert,
+  2024</xref>), which allows users to chat with different LLMs, but it
+  is not available on CRAN, and therefore is not subject to the same
+  level of testing and quality assurance as CRAN packages.</p>
+  <p>All these libraries also adopt the tidyverse workflow, which some
+  may find restrictive, opinionated, or unfamiliar. While the tidyverse
+  is popular in the R community, it may not align with the programming
+  style or workflow of all users, especially those coming from other
+  programming languages or domains. This limitation can hinder the
+  accessibility and usability of these libraries for a broader audience
+  of R users. Thus, the R ecosystem lacks a simple and reliable library
+  to interface with Ollama, even though R is a popular and crucial tool
+  in statistics, data science, and various research domains
+  (<xref alt="Hill et al., 2024" rid="ref-Hill2024May" ref-type="bibr">Hill
+  et al., 2024</xref>).</p>
+</sec>
+<sec id="statement-of-need">
+  <title>Statement of Need</title>
+  <p><monospace>ollamar</monospace> addresses the limitations of
+  existing R libraries by providing a consistent API design that mirrors
+  official Ollama endpoints, enabling seamless integration across
+  programming environments. It also offers flexible output formats
+  supporting dataframes, JSON lists, raw strings, and text vectors,
+  allowing users to choose the format that best suits their needs. The
+  package also includes independent conversation management tools that
+  align with industry-standard chat formats, streamlining the
+  integration of locally deployed LLMs into R workflows. These features
+  make <monospace>ollamar</monospace> a versatile and user-friendly tool
+  for researchers and data scientists working with LLMs in R. It fills a
+  critical gap in the R ecosystem by providing a native interface to run
+  locally deployed LLMs, and is already being used by researchers and
+  practitioners
+  (<xref alt="Turner, 2024" rid="ref-Turner2024Aug" ref-type="bibr">Turner,
+  2024</xref>).</p>
+</sec>
+<sec id="design">
+  <title>Design</title>
+  <p><monospace>ollamar</monospace> implements a modular,
+  non-opinionated, and consistent approach that aligns with established
+  software engineering principles, where breaking down complex systems
+  into manageable components enhances maintainability, reusability, and
+  overall performance. It also avoids feature bloat, ensuring the
+  library remains focused on its core functionality. The key design
+  philosophy of <monospace>ollamar</monospace> are described below.</p>
+  <p><bold>Consistency and maintainability</bold>: It provides an
+  interface to the Ollama server and all API endpoints, closely
+  following the official API design, where each function corresponds to
+  a specific endpoint. This implementation ensures the library is
+  consistent with the official Ollama API and easy to maintain and
+  extend as new features are added to Ollama. It also makes it easy for
+  R users to understand how similar libraries (such as in Python and
+  JavaScript) work while allowing users familiar with other programming
+  languages to adapt to and use this library quickly. The consistent API
+  structure across languages facilitates seamless transitions and
+  knowledge transfer for developers working in multi-language
+  environments.</p>
+  <p><bold>Consistent and flexible output formats</bold>: All functions
+  that call API endpoints return
+  <monospace>httr2::httr2_response</monospace> objects by default, which
+  provides a consistent interface for flexible downstream processing. If
+  preferred, users have the option to specify different output formats
+  when calling the endpoints, such as dataframes
+  (<monospace>&quot;df&quot;</monospace>), lists (of JSON objects)
+  (<monospace>&quot;jsonlist&quot;</monospace>), raw strings
+  (<monospace>&quot;raw&quot;</monospace>), text vectors
+  (<monospace>&quot;text&quot;</monospace>), or request objects
+  (<monospace>&quot;req&quot;</monospace>). Alternatively, use the
+  <monospace>resp_process()</monospace> function to convert and process
+  the response object as needed. This flexibility allows users to choose
+  the format that best suits their needs, such as when working with
+  different data structures, integrating the output with other R
+  packages, or allowing parallelization via the popular
+  <monospace>httr2</monospace> library. It also allows users to easily
+  build applications or pipelines on top of the library, without being
+  constrained by a specific output format.</p>
+  <p><bold>Easy management of LLM conversation history</bold>: LLM APIs
+  often expect conversation/chat history data as input, often nested
+  lists or JSON objects. Note that this data format is standard for
+  chat-based applications and APIs (not limited to Ollama), such as
+  those provided by OpenAI and Anthropic. <monospace>ollamar</monospace>
+  simplifies preparing and processing conversational data for input to
+  different LLMs, focusing on streamlining the workflow for the most
+  popular chat-based applications.</p>
+  <p><bold>Automated regular testing</bold>:
+  <monospace>ollamar</monospace> is hosted on GitHub and available
+  through CRAN, where it undergoes comprehensive continuous integration
+  testing across multiple platforms to ensure reliability. Daily
+  automated quality checks maintain long-term stability, and scheduled
+  tests verify version compatibility.</p>
+</sec>
+<sec id="conclusion">
+  <title>Conclusion</title>
+  <p><monospace>ollamar</monospace> bridges a crucial gap in the R
+  ecosystem by providing seamless access to large language models
+  through Ollama. Its focus on consistency, flexibility, and
+  maintainability makes it a versatile and user-friendly tool for
+  researchers and data scientists working with LLMs in R. These design
+  choices ensure <monospace>ollamar</monospace> is a user-friendly and
+  reliable tool for integrating locally deployed LLMs into R workflows,
+  accelerating research and development in fields relying on R for data
+  analysis and machine learning.</p>
+</sec>
+<sec id="acknowledgements">
+  <title>Acknowledgements</title>
+  <p>This project was partially supported by the Canadian Social
+  Sciences &amp; Humanities Research Council Tri-Agency Funding (funding
+  reference: 192324).</p>
+</sec>
+</body>
+<back>
+<ref-list>
+  <title></title>
+  <ref id="ref-Turner2024Aug">
+    <element-citation publication-type="article-journal">
+      <person-group person-group-type="author">
+        <name><surname>Turner</surname><given-names>Stephen D.</given-names></name>
+      </person-group>
+      <article-title>biorecap: an R package for summarizing bioRxiv preprints with a local LLM</article-title>
+      <source>arXiv</source>
+      <year iso-8601-date="2024-08">2024</year><month>08</month>
+      <date-in-citation content-type="access-date"><year iso-8601-date="2024-08-24">2024</year><month>08</month><day>24</day></date-in-citation>
+      <pub-id pub-id-type="doi">10.48550/arXiv.2408.11707</pub-id>
+    </element-citation>
+  </ref>
+  <ref id="ref-Hill2024May">
+    <element-citation publication-type="article-journal">
+      <person-group person-group-type="author">
+        <name><surname>Hill</surname><given-names>Chelsey</given-names></name>
+        <name><surname>Du</surname><given-names>Lanqing</given-names></name>
+        <name><surname>Johnson</surname><given-names>Marina</given-names></name>
+        <name><surname>McCullough</surname><given-names>B. D.</given-names></name>
+      </person-group>
+      <article-title>Comparing programming languages for data analytics: Accuracy of estimation in Python and R</article-title>
+      <source>Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery</source>
+      <publisher-name>John Wiley &amp; Sons, Ltd</publisher-name>
+      <year iso-8601-date="2024-05">2024</year><month>05</month>
+      <date-in-citation content-type="access-date"><year iso-8601-date="2024-08-24">2024</year><month>08</month><day>24</day></date-in-citation>
+      <volume>14</volume>
+      <issue>3</issue>
+      <issn>1942-4787</issn>
+      <pub-id pub-id-type="doi">10.1002/widm.1531</pub-id>
+      <fpage>e1531</fpage>
+      <lpage></lpage>
+    </element-citation>
+  </ref>
+  <ref id="ref-Gruber2024Apr">
+    <element-citation publication-type="article-journal">
+      <person-group person-group-type="author">
+        <name><surname>Gruber</surname><given-names>Johannes B.</given-names></name>
+        <name><surname>Weber</surname><given-names>Maximilian</given-names></name>
+      </person-group>
+      <article-title>rollama: An R package for using generative large language models through Ollama</article-title>
+      <source>arXiv</source>
+      <year iso-8601-date="2024-04">2024</year><month>04</month>
+      <date-in-citation content-type="access-date"><year iso-8601-date="2024-08-24">2024</year><month>08</month><day>24</day></date-in-citation>
+      <pub-id pub-id-type="doi">10.48550/arXiv.2404.07654</pub-id>
+    </element-citation>
+  </ref>
+  <ref id="ref-Liu2024Aug">
+    <element-citation publication-type="article-journal">
+      <person-group person-group-type="author">
+        <name><surname>Liu</surname><given-names>Fei</given-names></name>
+        <name><surname>Kang</surname><given-names>Zejun</given-names></name>
+        <name><surname>Han</surname><given-names>Xing</given-names></name>
+      </person-group>
+      <article-title>Optimizing RAG techniques for automotive industry PDF chatbots: A case study with locally deployed Ollama models</article-title>
+      <source>arXiv</source>
+      <year iso-8601-date="2024-08">2024</year><month>08</month>
+      <date-in-citation content-type="access-date"><year iso-8601-date="2024-08-24">2024</year><month>08</month><day>24</day></date-in-citation>
+      <pub-id pub-id-type="doi">10.48550/arXiv.2408.05933</pub-id>
+    </element-citation>
+  </ref>
+  <ref id="ref-Shostack2024Mar">
+    <element-citation publication-type="article-journal">
+      <person-group person-group-type="author">
+        <name><surname>Shostack</surname><given-names>Adam</given-names></name>
+      </person-group>
+      <article-title>The boy who survived: Removing Harry Potter from an LLM is harder than reported</article-title>
+      <source>arXiv</source>
+      <year iso-8601-date="2024-03">2024</year><month>03</month>
+      <date-in-citation content-type="access-date"><year iso-8601-date="2024-08-24">2024</year><month>08</month><day>24</day></date-in-citation>
+      <pub-id pub-id-type="doi">10.48550/arXiv.2403.12082</pub-id>
+    </element-citation>
+  </ref>
+  <ref id="ref-Lytvyn2024Jun">
+    <element-citation publication-type="article-journal">
+      <person-group person-group-type="author">
+        <name><surname>Lytvyn</surname><given-names>Oleksandr</given-names></name>
+      </person-group>
+      <article-title>Enhancing propaganda detection with open source language models: A comparative study</article-title>
+      <source>Proceedings of the MEi:CogSci Conference</source>
+      <year iso-8601-date="2024-06">2024</year><month>06</month>
+      <date-in-citation content-type="access-date"><year iso-8601-date="2024-08-24">2024</year><month>08</month><day>24</day></date-in-citation>
+      <volume>18</volume>
+      <issue>1</issue>
+      <issn>2960-5911</issn>
+      <uri>https://journals.phl.univie.ac.at/meicogsci/article/view/822</uri>
+    </element-citation>
+  </ref>
+  <ref id="ref-Chan2024Aug">
+    <element-citation publication-type="article-journal">
+      <person-group person-group-type="author">
+        <name><surname>Chan</surname><given-names>Ryan Sze-Yin</given-names></name>
+        <name><surname>Nanni</surname><given-names>Federico</given-names></name>
+        <name><surname>Brown</surname><given-names>Edwin</given-names></name>
+        <name><surname>Chapman</surname><given-names>Ed</given-names></name>
+        <name><surname>Williams</surname><given-names>Angus R.</given-names></name>
+        <name><surname>Bright</surname><given-names>Jonathan</given-names></name>
+        <name><surname>Gabasova</surname><given-names>Evelina</given-names></name>
+      </person-group>
+      <article-title>Prompto: An open source library for asynchronous querying of LLM endpoints</article-title>
+      <source>arXiv</source>
+      <year iso-8601-date="2024-08">2024</year><month>08</month>
+      <date-in-citation content-type="access-date"><year iso-8601-date="2024-08-24">2024</year><month>08</month><day>24</day></date-in-citation>
+      <pub-id pub-id-type="doi">10.48550/arXiv.2408.11847</pub-id>
+    </element-citation>
+  </ref>
+  <ref id="ref-Rapp2024">
+    <element-citation publication-type="article-journal">
+      <person-group person-group-type="author">
+        <name><surname>Albert</surname><given-names>Rapp</given-names></name>
+      </person-group>
+      <article-title>Tidychatmodels: Chat with all kinds of AI models through a common interface</article-title>
+      <source></source>
+      <year iso-8601-date="2024">2024</year>
+      <volume></volume>
+      <issue></issue>
+      <uri>https://github.com/AlbertRapp/tidychatmodels</uri>
+      <fpage></fpage>
+      <lpage></lpage>
+    </element-citation>
+  </ref>
+  <ref id="ref-Brull2024">
+    <element-citation publication-type="article-journal">
+      <person-group person-group-type="author">
+        <name><surname>Brüll</surname><given-names>Eduard</given-names></name>
+      </person-group>
+      <article-title>Tidyllm: Tidy integration of large language models</article-title>
+      <source>CRAN: Contributed Packages</source>
+      <year iso-8601-date="2024">2024</year>
+      <uri>https://edubruell.github.io/tidyllm/</uri>
+      <pub-id pub-id-type="doi">10.32614/cran.package.tidyllm</pub-id>
+    </element-citation>
+  </ref>
+</ref-list>
+</back>
+</article>