Getting started workflow toolings research

Antoni Ivanov edited this page Aug 15, 2023 · 19 revisions

Spring Framework

Spring is one of the most popular frameworks for building enterprise Java applications. The Spring team has invested considerable effort in making the developer experience smooth and efficient. This document gives a brief walkthrough of the getting-started workflow with Spring and highlights lessons that VDK can adopt.

Getting Started Workflow

Spring Initializr (choose your modules)

The starting point for most Spring projects, Spring Initializr - https://start.spring.io/ - provides a web-based interface for bootstrapping a new Spring application. Users select the desired build tool (Maven/Gradle), language (Java/Kotlin/Groovy), and dependencies/modules (Spring MVC for web apps, Spring Data for data access, Spring Security for authentication, etc.), and Initializr generates a project skeleton for them.

Modularity

Spring is designed in a modular fashion, so developers can pick and choose only what they need. Spring doesn't lock users into a particular way of doing things: it offers flexibility in the choice of tools, databases, and so on.

In VDK, plugins offer similar modularity, at least in theory.

Spring Boot

https://spring.io/guides/gs/spring-boot/

Spring Boot simplifies the bootstrapping process. It provides conventions for application setup and configuration, reducing the need for boilerplate code. It also offers an embedded server, so developers can run the application immediately without setting up an external server.

In VDK, quickstart-vdk serves the same purpose, at least in theory.

Spring Boot Actuator

Spring Boot Actuator is a sub-project of Spring Boot that provides production-ready features for monitoring and managing an application: health, metrics, info, and more. One of its more notable features is a set of endpoints that expose operational information, e.g., how the project is configured, which beans are active, project metrics, and health info.

Convention Over Configuration

Spring prioritizes conventions over explicit configuration. This reduces the amount of boilerplate and configuration code and streamlines the development process. Adopting a similar strategy can make VDK more user-friendly.

Validation and Autocompletion

Integrated development environments (IDEs) like IntelliJ IDEA and Eclipse support validation and autocompletion for Spring configurations.

Environment Profile Management

Spring allows multiple environment configurations, called profiles (such as 'dev' and 'prod'). This enables the same codebase to behave differently depending on the environment it is running in.

Simple start in IDE without extra plugins

The entry point is a plain main method:

	public static void main(String[] args) {
		SpringApplication.run(Application.class, args);
	}

Running this main method from the IDE starts the application; no specific command is required.

Ideas based on Spring

Introduce a VDK Initializer

Like Spring Initializr, provide a GUI or CLI tool that sets up the basic structure of a VDK project, pre-populated with sensible defaults.

Guided Workflow

A step-by-step process, with explanations, can guide new users through setup and deployment.

Config File Alternatives

Allow users to specify configuration in more widely adopted formats, such as YAML or TOML. These formats are less error-prone due to their structured nature.

Config Profiles

Like Spring's environment profiles, allow users to define configuration profiles. This way, configurations can be set once and reused across different environments.
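As a rough illustration of the idea, profiles can be modelled as layers over a shared base configuration. The sketch below is not existing VDK behaviour; the option names (`db_host`, `log_level`), the profile names, and the `resolve_config` helper are all hypothetical.

```python
from collections import ChainMap

# Hypothetical settings: these keys and values are placeholders, not real VDK options.
BASE = {"db_host": "localhost", "log_level": "INFO"}
PROFILES = {
    "dev": {"log_level": "DEBUG"},
    "prod": {"db_host": "db.example.com"},
}


def resolve_config(profile, overrides=None):
    """Layer explicit overrides over the chosen profile over the base defaults."""
    if profile not in PROFILES:
        raise KeyError(f"unknown profile: {profile!r}")
    # ChainMap looks keys up left to right, so earlier mappings win.
    return dict(ChainMap(overrides or {}, PROFILES[profile], BASE))
```

With this layering, a profile only needs to list the values that differ from the base, and the base doubles as the set of fallback defaults.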

Runtime Validation

Implement a runtime check for configurations, similar to Spring. Alert the user if there are unknown or deprecated properties.
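One possible shape for such a check is sketched below. It assumes VDK could expose a set of known option names and a mapping of deprecated names to their replacements; the names used here (`KNOWN_KEYS`, `DEPRECATED_KEYS`, `validate_config`) and the option names themselves are made up for illustration.

```python
import difflib

# Hypothetical option names, used only for illustration.
KNOWN_KEYS = {"db_host", "db_port", "log_level"}
DEPRECATED_KEYS = {"db_hostname": "db_host"}  # old name -> replacement


def validate_config(config):
    """Return a list of human-readable warnings for a config mapping."""
    warnings = []
    for key in config:
        if key in DEPRECATED_KEYS:
            warnings.append(f"'{key}' is deprecated; use '{DEPRECATED_KEYS[key]}'")
        elif key not in KNOWN_KEYS:
            # Suggest the closest known name, if any is similar enough.
            close = difflib.get_close_matches(key, sorted(KNOWN_KEYS), n=1)
            hint = f"; did you mean '{close[0]}'?" if close else ""
            warnings.append(f"unknown property '{key}'{hint}")
    return warnings
```

Returning warnings rather than raising lets the caller decide whether a misspelled key should fail the run or only be logged.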

Config/Metrics UI like Spring Boot Actuator

Conventions idea

Directory Structure

By defining a default directory structure for VDK projects, users can quickly set up and understand projects. For example:

/configs: for configuration files
/csv: for CSV files
/data: for input/output data
/steps/python: for defining individual Python tasks
/steps/sql: for defining individual SQL tasks

This layout is only illustrative; other directories may be needed.
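An initializer could create this layout in a few lines. The following is a minimal sketch that takes the illustrative directory names above at face value; `scaffold_job` and `DEFAULT_LAYOUT` are hypothetical names, not part of VDK.

```python
from pathlib import Path

# Hypothetical default layout, copied from the illustrative list above.
DEFAULT_LAYOUT = ("configs", "csv", "data", "steps/python", "steps/sql")


def scaffold_job(root, layout=DEFAULT_LAYOUT):
    """Create the default sub-directories under root and return their paths."""
    created = []
    for relative in layout:
        target = Path(root) / relative
        # exist_ok=True makes the call safe to repeat on an existing project.
        target.mkdir(parents=True, exist_ok=True)
        created.append(target)
    return created
```

For example, `scaffold_job("data_cleaning")` would create `data_cleaning/configs`, `data_cleaning/steps/sql`, and so on, relative to the current directory.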

Naming conventions

If a job is named "data_cleaning", VDK can automatically look for a configuration file named "data_cleaning.config", or for a section named "data_cleaning" in HOME/.vdk/config, or follow other similar conventions. Prefer well-known industry conventions over inventing new ones.
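A lookup by naming convention could be sketched as follows. The sketch assumes both files are INI-style and contain a section named after the job, which the text above does not specify; `find_job_config` is a hypothetical helper, and the home directory is passed in explicitly to keep the function easy to test.

```python
import configparser
from pathlib import Path


def find_job_config(job_name, job_dir, home):
    """Resolve a job's configuration by convention.

    Checks <job_dir>/<job_name>.config first, then <home>/.vdk/config,
    and returns the first [<job_name>] section found (or an empty dict).
    """
    candidates = [
        Path(job_dir) / f"{job_name}.config",
        Path(home) / ".vdk" / "config",
    ]
    for path in candidates:
        parser = configparser.ConfigParser()
        parser.read(path)  # files that do not exist are silently skipped
        if parser.has_section(job_name):
            return dict(parser[job_name])
    return {}
```

The job-local file wins over the per-user file, so a project can override a user-wide setting without editing anything outside its own directory.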

Sensible defaults

If a user doesn’t specify certain configuration parameters, VDK should fall back on sensible defaults.

Non CLI entry point option

A non-CLI entry point already exists in notebooks, but it does not provide a pure Python user experience.

For example, something like the following would be better:

if __name__ == '__main__':
    # Proposed: start the data job directly from Python, without the vdk CLI
    StandaloneDataJobFactory.run()