
From building data pipelines to productionizing machine learning models, Kotlin can be a great choice for working with data:

  • Kotlin is concise, readable, and easy to learn.
  • Static typing and null safety help create reliable, maintainable code that is easy to troubleshoot.
  • Being a JVM language, Kotlin gives you great performance and the ability to leverage an entire ecosystem of tried-and-true Java libraries.
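As a small illustration of how static typing and null safety help with messy data, here is a sketch that uses only the Kotlin standard library. The input format (comma-separated `sensor, value` lines) and the names `Measurement`, `parse`, and `meanOfValid` are hypothetical and chosen only for this example.

```kotlin
// Hypothetical record: `value` is nullable because a reading may be missing or malformed.
data class Measurement(val sensor: String, val value: Double?)

// Parses a hypothetical "sensor, value" line.
fun parse(line: String): Measurement {
    val parts = line.split(",")
    // getOrNull() and toDoubleOrNull() return null instead of throwing,
    // so the compiler forces callers to handle the missing-value case.
    val value = parts.getOrNull(1)?.trim()?.toDoubleOrNull()
    return Measurement(sensor = parts[0].trim(), value = value)
}

// Mean of the valid readings, or null if there are none.
fun meanOfValid(lines: List<String>): Double? {
    val values: List<Double> = lines.map(::parse).mapNotNull { it.value } // drops nulls
    return values.takeIf { it.isNotEmpty() }?.average()
}

fun main() {
    val raw = listOf("a, 1.5", "b, n/a", "c, 2.5", "d")
    println("mean of valid readings: ${meanOfValid(raw)}")
}
```

Returning `Double?` rather than a sentinel such as `0.0` makes "no valid data" visible in the type, so callers cannot silently treat it as a real result.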

Interactive editors

Notebooks such as Kotlin Notebook, Jupyter Notebook, and Datalore provide convenient tools for data visualization and exploratory research. Kotlin integrates with these tools to help you explore data, share your findings with colleagues, or build up your data science and machine learning skills.

Kotlin Notebook

The Kotlin Notebook is a plugin for IntelliJ IDEA that allows you to create notebooks in Kotlin. It leverages the Kotlin kernel for executing the cells and harnesses the powerful Kotlin IDE support to offer real-time code insights. It is now the preferred method for working with Kotlin notebooks. Be sure to check out our blog post about it.

Kotlin Notebooks in Datalore

With Datalore, you can use Kotlin in the browser straight out of the box, no installation required. You can also collaborate on Kotlin notebooks in real time, get smart coding assistance when writing code, and share results as interactive or static reports. Check out a sample report.

Sign up and use Kotlin with a free Datalore Community account.

Jupyter Kotlin kernel

The Jupyter Notebook is an open-source web application that allows you to create and share documents (aka "notebooks") that can contain code, visualizations, and Markdown text. Kotlin-jupyter is an open-source project that brings Kotlin support to Jupyter Notebook.

Check out the Kotlin kernel's GitHub repository for installation instructions, documentation, and examples.

Libraries

The ecosystem of libraries for data-related tasks created by the Kotlin community is rapidly expanding. Here are some libraries that you may find useful:

Kotlin libraries

  • Kotlin DataFrame is a library for structured data processing. It aims to reconcile Kotlin's static typing with the dynamic nature of data by utilizing both the full power of the Kotlin language and the opportunities provided by intermittent code execution in Jupyter notebooks and REPLs.

  • Kandy is an open-source plotting library for the JVM written in Kotlin. It provides a powerful and flexible DSL for chart creation, along with seamless integration with Kotlin Notebook and Kotlin DataFrame.

  • Multik provides multidimensional arrays in Kotlin. The library offers a Kotlin-idiomatic, type- and dimension-safe API for mathematical operations over multidimensional arrays. Multik includes swappable JVM and native computational engines, as well as a combination of the two for optimal performance.

  • KotlinDL is a high-level deep learning API written in Kotlin and inspired by Keras. It offers simple APIs for training deep learning models from scratch, importing existing Keras models for inference, and using transfer learning to adapt existing pre-trained models to your tasks.

  • Kotlin for Apache Spark adds a missing layer of compatibility between Kotlin and Apache Spark. It allows Kotlin developers to use familiar language features such as data classes and lambda expressions, written either as simple expressions in curly braces or as method references.

  • kmath is an experimental library that was initially inspired by NumPy but has since evolved toward more flexible abstractions. It implements mathematical operations combined in algebraic structures over Kotlin types, defines APIs for linear structures, expressions, histograms, and streaming operations, and provides interchangeable wrappers over existing Java and Kotlin libraries, including ND4J, Commons Math, Multik, and others.

  • lets-plot is a plotting library for statistical data written in Kotlin. Lets-Plot is multiplatform and can be used not only on the JVM, but also with JS and Python.

  • kravis is another library for visualizing tabular data, inspired by R's ggplot.

  • londogard-nlp-toolkit is a library that provides utilities for natural language processing, such as word/subword/sentence embeddings, word frequencies, stopwords, stemming, and much more.
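Several of the libraries above, Kotlin for Apache Spark in particular, are built around everyday Kotlin features such as data classes, lambdas in curly braces, and method references. As a rough, library-free sketch of that style, the following uses only the Kotlin standard library's collection operations. It does not use the API of any library listed here, and the `Sale` type, the `isLarge` threshold, and the sample data are hypothetical.

```kotlin
// Hypothetical record type; a data class gets equals/hashCode/toString/copy for free.
data class Sale(val region: String, val product: String, val amount: Double)

// A plain top-level function, used below as a method reference.
// The 100.0 threshold is an arbitrary value for this example.
fun isLarge(sale: Sale): Boolean = sale.amount >= 100.0

// Total amount of large sales per region, with keys sorted by region name.
fun largeSalesByRegion(sales: List<Sale>): Map<String, Double> =
    sales
        .filter(::isLarge)                                      // method reference
        .groupBy { it.region }                                  // lambda in curly braces
        .mapValues { (_, group) -> group.sumOf { it.amount } }  // destructured map entry
        .toSortedMap()

fun main() {
    val sales = listOf(
        Sale("EU", "laptop", 1200.0),
        Sale("US", "mouse", 25.0),
        Sale("EU", "monitor", 300.0),
        Sale("US", "laptop", 1100.0),
    )
    println(largeSalesByRegion(sales)) // prints {EU=1500.0, US=1100.0}
}
```

Because `Sale` is statically typed, a misspelled property such as `it.regoin` is a compile-time error rather than a runtime failure deep inside a pipeline.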

Java libraries

Since Kotlin provides first-class interop with Java, you can also use Java libraries for data science in your Kotlin code. Here are some examples of such libraries:

  • DeepLearning4J - a deep learning library for Java

  • ND4J - an efficient matrix math library for the JVM

  • Dex - a Java-based data visualization tool

  • Smile - a comprehensive machine learning, natural language processing, linear algebra, graph, interpolation, and visualization system. Besides its Java API, Smile also provides a functional Kotlin API, along with Scala and Clojure APIs.

    • Smile-NLP-kt - a Kotlin rewrite of the Scala implicits for the natural language processing part of Smile in the format of extension functions and interfaces.
  • Apache Commons Math - a general math, statistics, and machine learning library for Java

  • NM Dev - a Java mathematical library that covers all of classical mathematics.

  • OptaPlanner - a solver utility for optimization planning problems

  • Charts - a scientific JavaFX charting library in development

  • Apache OpenNLP - a machine-learning-based toolkit for the processing of natural language text

  • CoreNLP - a natural language processing toolkit

  • Apache Mahout - a distributed framework for regression, clustering, and recommendation

  • Weka - a collection of machine learning algorithms for data mining tasks

  • Tablesaw - a Java dataframe. It includes a visualization library based on Plot.ly
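To show what this interop looks like in practice without pulling in any of the libraries above, here is a small sketch that calls a Java class from the JDK, `java.util.DoubleSummaryStatistics`, as a stand-in for a Java data library. The helper name `summarize` is hypothetical; the same calling conventions apply to third-party Java libraries.

```kotlin
import java.util.DoubleSummaryStatistics

// Collects summary statistics using a plain Java class.
fun summarize(values: List<Double>): DoubleSummaryStatistics {
    val stats = DoubleSummaryStatistics() // Java constructor, called without `new`
    values.forEach(stats::accept)         // Java method used as a Kotlin method reference
    return stats
}

fun main() {
    val stats = summarize(listOf(2.0, 4.0, 9.0))
    // Java getters such as getAverage() are exposed as Kotlin properties.
    println("count=${stats.count} min=${stats.min} max=${stats.max} avg=${stats.average}")
    // prints count=3 min=2.0 max=9.0 avg=5.0
}
```

No wrappers or bindings are needed: Java constructors, methods, and getter-based properties are available directly from Kotlin code.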