Getting started workflow toolings research
Spring is one of the most popular frameworks for building enterprise Java applications. The Spring team has invested considerable effort into making the developer experience smooth and efficient. This document provides a brief walkthrough of the getting started workflow with Spring and highlights lessons that can be adopted for VDK.
The starting point for most Spring projects, Spring Initializr - https://start.spring.io/ - provides a web-based interface for bootstrapping a new Spring application. Users select the desired build tool (Maven/Gradle), language (Java/Kotlin/Groovy), and dependencies/modules (Spring MVC for web apps, Spring Data for data access, Spring Security for authentication, etc.), and Initializr generates a project skeleton for them.
![](https://private-user-images.githubusercontent.com/2536458/260686075-ed63bc05-0507-4ed5-9a07-f1f46fed8c9c.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkzNjY0MjIsIm5iZiI6MTczOTM2NjEyMiwicGF0aCI6Ii8yNTM2NDU4LzI2MDY4NjA3NS1lZDYzYmMwNS0wNTA3LTRlZDUtOWEwNy1mMWY0NmZlZDhjOWMucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI1MDIxMiUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNTAyMTJUMTMxNTIyWiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9OGJiYTk4ZjEwOTBmZTMzNWM4MmM3ZmUwNzkyYWI5Y2NjODgwMzljODg5YTUxMGM5NjczZTRiNzkzOTBhZGMwNSZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QifQ.sDlQ7jsg1K-TZg83EMWgiFQDeS_SNPK97_E_jEEkk6o)
Spring is designed in a modular fashion, which lets developers pick only what they need. Spring doesn't lock users into a particular way of doing things; it offers flexibility in the choice of tools, databases, and so on.
In VDK, plugins offer similar modularity, at least in theory.
Spring Boot (https://spring.io/guides/gs/spring-boot/) simplifies the bootstrapping process. It provides conventions for application setup and configuration, reducing the need for boilerplate code. Spring Boot also offers an embedded server, so developers can run the application immediately without setting up an external server.
In VDK, quickstart-vdk serves the same purpose, at least in theory.
Spring Boot Actuator is a sub-project of Spring Boot that provides production-ready features to help monitor and manage an application: health, metrics, info, and more. One of its more notable features is a set of endpoints that expose operational information, for example how the project is configured, which beans are active, project metrics, and health info.
Spring prioritizes convention over configuration. This reduces the amount of boilerplate and configuration code and streamlines the development process. Adopting a similar strategy can make VDK more user-friendly.
Integrated development environments (IDEs) like IntelliJ IDEA and Eclipse support validation and autocompletion for Spring configurations.
Spring allows multiple environment configurations (such as 'dev' and 'prod'). This enables the same codebase to behave differently depending on the environment it is running in.
The entry point is a plain main method:
public static void main(String[] args) {
    SpringApplication.run(Application.class, args);
}
so the application starts like any other Java program and does not require a framework-specific command.
Like Spring Initializr, provide a GUI or CLI tool that sets up the basic structure of a VDK project, pre-populated with sensible defaults.
A step-by-step process, with explanations, can guide new users through setup and deployment.
Allow users to specify configuration in formats more widely adopted, like YAML or TOML. These formats are less error-prone due to their structured nature.
Like Spring's environment profiles, allow users to define configuration profiles. This way, configurations can be set once and reused across different environments.
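One possible shape for profile resolution is sketched below: shared values live under a `default` entry and each profile only lists what it overrides. The setting names, the `VDK_PROFILE` variable, and both helper functions are assumptions made for this example, not existing VDK features.

```python
import os

# Hypothetical settings: "default" holds shared values, profiles override them.
SETTINGS = {
    "default": {"db_url": "sqlite:///local.db", "log_level": "INFO"},
    "dev": {"log_level": "DEBUG"},
    "prod": {"db_url": "postgresql://db.internal/prod"},
}


def resolve_profile(settings: dict, profile: str) -> dict:
    """Merge the shared defaults with the overrides of the chosen profile."""
    if profile not in settings:
        raise KeyError(f"Unknown profile: {profile!r}")
    return {**settings["default"], **settings[profile]}


def active_profile(environ=os.environ) -> str:
    """Pick the profile name from the environment (VDK_PROFILE is an assumed name)."""
    return environ.get("VDK_PROFILE", "dev")


print(resolve_profile(SETTINGS, "prod"))
```

Keeping the overrides small makes it easy to see at a glance how 'prod' differs from 'dev'.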
Implement a runtime check for configurations, similar to Spring's. Alert the user if there are unknown or deprecated properties.
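A minimal sketch of such a check follows. It assumes the tool keeps a registry of known and deprecated option names; the names used here (`db_url`, `log_level`, `ingest_target`, `ingest_destination`) and the `check_config` function are hypothetical. The standard `difflib` module supplies a "did you mean" hint for likely typos.

```python
import difflib

# Hypothetical option names, for illustration only.
KNOWN_KEYS = {"db_url", "log_level", "ingest_target"}
DEPRECATED_KEYS = {"ingest_destination": "ingest_target"}  # old name -> replacement


def check_config(config: dict) -> list[str]:
    """Return human-readable warnings for unknown or deprecated keys."""
    warnings = []
    for key in config:
        if key in DEPRECATED_KEYS:
            warnings.append(
                f"'{key}' is deprecated; use '{DEPRECATED_KEYS[key]}' instead"
            )
        elif key not in KNOWN_KEYS:
            # Suggest the closest known key, if any is similar enough.
            close = difflib.get_close_matches(key, sorted(KNOWN_KEYS), n=1)
            hint = f"; did you mean '{close[0]}'?" if close else ""
            warnings.append(f"unknown property '{key}'{hint}")
    return warnings


for message in check_config({"log_levle": "DEBUG", "ingest_destination": "db"}):
    print("WARNING:", message)
```

Returning warnings instead of raising lets the caller decide whether a misspelled property should stop the job or only be logged.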
By defining a default directory structure for VDK projects, users can quickly set up and understand projects. For example:
/configs: for configuration files
/csv: for CSV files
/data: for input/output data
/steps/python: for individual Python tasks
/steps/sql: for individual SQL tasks
Other directories may be needed; this layout is only illustrative.
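A scaffolding command could create this layout in a few lines. The sketch below uses only the directory names from the list above; the `scaffold_job` function is hypothetical and does not correspond to an existing VDK command.

```python
import tempfile
from pathlib import Path

# Directory layout taken from the list above; it is illustrative only.
DEFAULT_LAYOUT = ["configs", "csv", "data", "steps/python", "steps/sql"]


def scaffold_job(root: Path, job_name: str) -> Path:
    """Create an empty job skeleton under root/job_name and return its path."""
    job_dir = root / job_name
    for relative in DEFAULT_LAYOUT:
        (job_dir / relative).mkdir(parents=True, exist_ok=True)
    return job_dir


# Demonstrate in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    job_dir = scaffold_job(Path(tmp), "data_cleaning")
    created = sorted(
        p.relative_to(job_dir).as_posix() for p in job_dir.rglob("*") if p.is_dir()
    )
    print(created)
```

Using `exist_ok=True` makes the command safe to re-run on a project that already has part of the structure.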
If a job is named "data_cleaning", VDK can automatically look for a configuration file named "data_cleaning.config", or for a section named "data_cleaning" in HOME/.vdk/config, or follow other similar conventions. Prefer well-known industry conventions over inventing new ones.
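The lookup order described here could be sketched as follows. This is only an illustration: it assumes both files are INI-formatted and contain a section named after the job, and `find_job_config` is a hypothetical helper, not part of VDK.

```python
import configparser
import tempfile
from pathlib import Path


def find_job_config(job_name: str, job_dir: Path, home: Path) -> dict:
    """Look up settings by convention: the job-local file first, then the shared one.

    Both files are assumed to be INI files with a section named after the job.
    """
    candidates = [job_dir / f"{job_name}.config", home / ".vdk" / "config"]
    for path in candidates:
        if not path.is_file():
            continue
        parser = configparser.ConfigParser()
        parser.read(path)
        if parser.has_section(job_name):
            return dict(parser[job_name])
    return {}  # nothing found; the caller falls back to built-in defaults


# Demonstrate with a temporary "home" directory that has only the shared file.
with tempfile.TemporaryDirectory() as tmp:
    home = Path(tmp)
    (home / ".vdk").mkdir()
    (home / ".vdk" / "config").write_text("[data_cleaning]\nlog_level = DEBUG\n")
    found = find_job_config("data_cleaning", job_dir=home / "job", home=home)
    print(found)
```

The job-local file wins when both exist, which mirrors the common convention that the most specific configuration takes precedence.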
If a user doesn't specify certain configuration parameters, VDK should fall back on sensible defaults.
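One way to express such a fallback chain is with `collections.ChainMap`, which searches several mappings in order. In this sketch the default values, the `effective_config` function, and the treatment of `VDK_`-prefixed environment variables as overrides are all assumptions made for the example.

```python
from collections import ChainMap

# Hypothetical built-in defaults.
DEFAULTS = {"log_level": "INFO", "retries": 3, "db_url": "sqlite:///local.db"}


def effective_config(user_config: dict, environ: dict) -> ChainMap:
    """Layer settings: environment overrides user config, which overrides defaults."""
    prefix = "VDK_"  # assumed prefix for this sketch
    env_overrides = {
        key[len(prefix):].lower(): value
        for key, value in environ.items()
        if key.startswith(prefix)
    }
    # ChainMap returns the value from the first mapping that contains the key.
    return ChainMap(env_overrides, user_config, DEFAULTS)


config = effective_config({"retries": 5}, {"VDK_LOG_LEVEL": "DEBUG"})
print(dict(config))
```

Since the defaults sit at the bottom of the chain, every known parameter always resolves to some value even when the user supplies nothing.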
A programmatic entry point already exists in notebooks, but it does not provide a pure Python user experience.
For example, something like the following would be better:
if __name__ == '__main__':
    StandaloneDataJobFactory.run()