understanding-ai

A collection of AI-related resources

Reinforcement Learning

Transformers

Tooling

AI Coder

Languages

Open Letters

Guides and Principles

Notebooks

  • nbdev
  • Apache Zeppelin: Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala, Python, R and more.
  • Jupyter: JupyterLab is the latest web-based interactive development environment for notebooks, code, and data. Its flexible interface allows users to configure and arrange workflows in data science, scientific computing, computational journalism, and machine learning. A modular design invites extensions to expand and enrich functionality.
  • Google Colab: Colab, or "Colaboratory", lets you write and execute Python in your browser, with:
    • Zero configuration required
    • Access to GPUs free of charge
    • Easy sharing

Plugins

  • RISE: a Jupyter notebook extension that instantly turns your Jupyter notebook into a live reveal.js-based presentation.

IDE

  • Spyder: Spyder is a free and open source scientific environment written in Python, for Python, and designed by and for scientists, engineers and data analysts.

Courses

Collections

Current

Tutorials

Next

Interesting

Summer School

Python

SQL

Tutorials

Books

Videos

AI Training Data

  • Kaggle: code & data you need to do your data science work.
  • Tencent ML-Images: the largest open-source multi-label image database
  • ImageNet: an image database organized according to the WordNet hierarchy (currently only the nouns), in which each node of the hierarchy is depicted by hundreds and thousands of images.
  • MNIST: a good database for people who want to try learning techniques and pattern recognition methods on real-world data while spending minimal effort on preprocessing and formatting.
  • CIFAR-10 / CIFAR-100: labeled subsets of the 80 million tiny images dataset
  • Bloom: BLOOM (BigScience Language Open-science Open-access Multilingual) has 176 billion parameters and has been trained on 1.5 terabytes of text.
  • Google Open Dataset Explorer
  • Open Datasets UCI Machine Learning Repo
  • Hugging Face: build, train and deploy state-of-the-art models powered by the reference open source in machine learning.
  • COYO-700M: Large-scale Image-Text Pair Dataset
  • Stable Diffusion
  • Alpa: a system for training and serving gigantic machine learning models. Alpa makes training and serving large models like GPT-3 simple, affordable, accessible to everyone.
  • BERT: TensorFlow code and pre-trained models for BERT

AI news

Blogs

Online Training

Data Mining

  • RapidMiner: provides a data science platform to help you drive real business impact.
  • Orange: Open source machine learning and data visualization. Build data analysis workflows visually, with a large, diverse toolbox.
  • KNIME: lets anyone build and upskill on data science
  • SAS: a commercial analytics suite with data mining tools for business analysis

Archives

  • Papers with Code: The latest in machine learning
  • arXiv: a free distribution service and an open-access archive

Important Papers

Articles

Frameworks

GUIs

  • Gradio: an open-source Python library for building machine learning and data science demos and web applications.
  • Streamlit: turn data scripts into shareable web apps

Graphics

  • Bokeh: a Python library for creating interactive visualizations for modern web browsers.
  • Vega-Altair: a declarative statistical visualization library for Python.
  • Plotly: a Python graphing library for interactive, publication-quality graphs, including line plots, scatter plots, area charts, bar charts, error bars, box plots, histograms, heatmaps, subplots, multiple-axes, polar charts, and bubble charts.
  • pydeck: set of Python bindings for making spatial visualizations with deck.gl, optimized for a Jupyter environment.
  • Seaborn: a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics.
  • Matplotlib: a comprehensive library for creating static, animated, and interactive visualizations.

Python (and other)

  • NVIDIA NeMo: a toolkit for creating Conversational AI applications.
  • scikit-learn: simple and efficient tools for predictive data analysis
  • PyCaret: an open source, low-code machine learning library in Python that allows you to go from preparing your data to deploying your model within minutes in your choice of notebook environment
  • PyTorch: an open source machine learning framework that accelerates the path from research prototyping to production deployment
  • Imaginaire: a pytorch library that contains optimized implementation of several image and video synthesis methods developed at NVIDIA.
  • Pandas: a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language
  • NumPy: fundamental package for scientific computing with Python
  • TensorFlow: an end-to-end open source machine learning platform
  • Keras: an API designed for human beings, not machines. Keras follows best practices for reducing cognitive load
  • MXNet: an open source deep learning framework suited for flexible research prototyping and production.
  • Singa: focusing on distributed training of deep learning and machine learning models
  • VISSL: A library for state-of-the-art self-supervised learning
  • Theano: a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. It can use GPUs and perform efficient symbolic differentiation.
  • skorch: a scikit-learn compatible neural network library that wraps PyTorch
  • Chainer: a powerful, flexible, and intuitive Framework for Neural Networks
  • EvoTorch: designed to accelerate research and applications of Evolutionary Algorithms, with dedicated support for NeuroEvolution.
  • TensorStore: Library for reading and writing large multi-dimensional arrays.
  • LangChain: Building applications with LLMs through composability
  • Modin: Scale your Pandas workflows by changing a single line of code
  • Gensim: Topic Modelling for Humans
  • SentenceTransformers: a Python framework for state-of-the-art sentence, text and image embeddings.
  • fastText: a library for text classification and representation. It transforms text into continuous vectors that can later be used on any language related task.

Performance

  • fastprogress: a simple and flexible progress bar for Jupyter Notebook and console
  • tqdm: a fast, extensible progress bar for Python and CLI

Helper

  • bottleneck: Fast NumPy array functions written in C
  • PyYAML: PyYAML is a YAML parser and emitter for Python.
  • Jinja: Jinja is a fast, expressive, extensible templating engine. Special placeholders in the template allow writing code similar to Python syntax. Then the template is passed data to render the final document.
  • Jedi: Jedi is a static analysis tool for Python that is typically used in IDEs/editors plugins. Jedi has a focus on autocompletion and goto functionality. Other features include refactoring, code search and finding references.
  • Pillow: the friendly fork of PIL, the Python Imaging Library
  • Wordcloud: a little word cloud generator in Python
  • nltk: a leading platform for building Python programs to work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning, wrappers for industrial-strength NLP libraries, and an active discussion forum.
  • spaCy: a free open-source library for Natural Language Processing in Python. It features NER, POS tagging, dependency parsing, word vectors and more.
  • Beautiful Soup: a Python library for pulling data out of HTML and XML files. It works with your favorite parser to provide idiomatic ways of navigating, searching, and modifying the parse tree. It commonly saves programmers hours or days of work.
  • Selenium: the Selenium Python bindings provide a simple API for writing functional/acceptance tests using Selenium WebDriver. Through the Selenium Python API you can access all functionalities of Selenium WebDriver in an intuitive way.
  • Parsel: extract and remove data from HTML and XML using XPath and CSS selectors, optionally combined with regular expressions.
  • Scrapy: an open source and collaborative framework for extracting the data you need from websites in a fast, simple, yet extensible way.
  • Confusion Matrix in Python
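
To illustrate the confusion matrix entry above, here is a minimal sketch in plain Python with no third-party packages. The function name `confusion_matrix` and the sample labels are hypothetical and chosen only for illustration; they are not taken from the linked article or from any particular library. Rows are assumed to be true labels and columns predicted labels.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Count (true, predicted) pairs; rows are true labels, columns are predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # Counter returns 0 for pairs that never occur, so missing cells need no special case.
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


if __name__ == "__main__":
    # Hypothetical sample data for illustration only.
    y_true = ["cat", "cat", "dog", "dog", "dog"]
    y_pred = ["cat", "dog", "dog", "dog", "cat"]
    for row in confusion_matrix(y_true, y_pred, labels=["cat", "dog"]):
        print(row)
```

The diagonal cells hold the correctly classified samples, so summing them and dividing by the total count gives the accuracy.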

JavaScript

  • Danfo: an open source, JavaScript library providing high performance, intuitive, and easy to use data structures for manipulating and processing structured data.
  • TensorFlow.js: a library for machine learning in JavaScript

Java

  • Apache Flink ML: Machine learning library of Apache Flink
  • DL4J: the first commercial-grade, open-source, distributed deep-learning library written for Java and Scala
  • Weka: open source machine learning software that can be accessed through a graphical user interface, standard terminal applications, or a Java API
  • MOA: the most popular open source framework for data stream mining, with a very active growing community
  • DeepNetts: Java Deep Learning Library and Development Tool.
  • Neuroph: a Java framework that can be used for creating neural networks.
  • ND4J: scientific computing library for the JVM.
  • OpenNLP: a machine learning based toolkit for the processing of natural language text.
  • Stanford CoreNLP: natural language processing in Java
  • Smile: Statistical Machine Intelligence and Learning Engine
  • Tweety: a comprehensive collection of Java libraries for logical aspects of artificial intelligence and knowledge representation
  • ECJ: a Java-based Evolutionary Computation Research System
  • JGAP: a Genetic Algorithms and Genetic Programming package written in Java.
  • Arbiter: a tool dedicated to tuning (hyperparameter optimization) of machine learning models. Part of the DL4J Suite of Machine Learning / Deep Learning tools for the enterprise.
  • Jenetics: a Genetic Algorithm, Evolutionary Algorithm, Genetic Programming, and Multi-objective Optimization library, written in modern-day Java.
  • Dagli: Framework for defining machine learning models, including feature generation and transformations, as directed acyclic graphs (DAGs).
  • Tribuo: a machine learning library in Java that provides multi-class classification, regression, clustering, anomaly detection and multi-label classification. Tribuo provides implementations of popular ML algorithms and also wraps other libraries to provide a unified interface.
  • Spark MLlib: Apache Spark's scalable machine learning library.
  • Apache Mahout: a distributed linear algebra framework and mathematically expressive Scala DSL designed to let mathematicians, statisticians, and data scientists quickly implement their own algorithms. Apache Spark is the recommended out-of-the-box distributed back-end, or can be extended to other distributed backends.
  • Smile: a fast and comprehensive machine learning, NLP, linear algebra, graph, interpolation, and visualization system in Java and Scala.
  • Kotlin∇: a type-safe automatic differentiation framework in Kotlin. It allows users to express differentiable programs with higher-dimensional data structures and operators.
  • Facebook AI Tools: Cutting edge open source frameworks, tools, libraries, and models for research exploration to large-scale production deployment.
  • AWS DJL: an open-source, high-level, engine-agnostic Java framework for deep learning. DJL is designed to be easy to get started with and simple to use for Java developers. DJL provides a native Java development experience and functions like any other regular Java library.
  • Apache Ignite: an in-memory database that includes a machine learning framework.

Cheat Sheets

Misc

  • Weka: workbench for machine learning
  • OpenAI Gym: a toolkit for developing and comparing reinforcement learning algorithms.

Interesting

Practice

Github Projects

  • GPT-4 & LangChain: GPT-4 & LangChain chatbot for large PDF docs
  • Lit-LLaMA: an independent implementation of LLaMA that is fully open source under the Apache 2.0 license.

Podcasts

Movies

German

Newspaper

Studies

Blogs

Further

Hollywood

  • Eagle Eye
  • I, Robot
  • Her
  • Blade Runner
  • Blade Runner 2049
  • A.I. Artificial Intelligence
  • Ex Machina
  • 2001: A Space Odyssey
  • WarGames
  • 23
  • Moneyball