Getting started workflow toolings research
Spring is one of the most popular frameworks for building enterprise Java applications. The Spring team has invested considerable effort into making the developer experience smooth and efficient. This document provides a brief walkthrough of the getting started workflow with Spring and highlights lessons that can be adopted for VDK.
The starting point for most Spring projects, Spring Initializr - https://start.spring.io/ - provides a web-based interface for bootstrapping a new Spring application. Users select the desired build tool (Maven/Gradle), language (Java/Kotlin/Groovy), and dependencies/modules (Spring MVC for web apps, Spring Data for data access, Spring Security for authentication, etc.), and Initializr generates a project skeleton for them.
![](https://private-user-images.githubusercontent.com/2536458/260686075-ed63bc05-0507-4ed5-9a07-f1f46fed8c9c.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkzNDY4NDEsIm5iZiI6MTczOTM0NjU0MSwicGF0aCI6Ii8yNTM2NDU4LzI2MDY4NjA3NS1lZDYzYmMwNS0wNTA3LTRlZDUtOWEwNy1mMWY0NmZlZDhjOWMucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI1MDIxMiUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNTAyMTJUMDc0OTAxWiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9ZGY5MmY1NWI3ODZiMzk2ZGVmNGQ2NjQzZjc5YmZiZDVkYjQ0MTYxZWU1ZWYwMTQ0YmY5OWUxYzljNTJjYjk4MiZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QifQ.A2OHkkQvaE5Sa7C4Ok2S9vCCzZxDOAJWFRWCFIm-pAQ)
Spring is designed in a modular fashion, which allows developers to pick and choose only what they need. Spring doesn't lock users into a particular way of doing things; it offers flexibility in choosing tools, databases, and so on.
In VDK, plugins offer similar modularity, at least in theory.
https://spring.io/guides/gs/spring-boot/
Spring Boot simplifies the bootstrapping process. It provides conventions for application setup and configuration, reducing the need for boilerplate code. Spring Boot also offers an embedded server so that developers can run the application immediately without external server setup.
In VDK, quickstart-vdk serves the same purpose, at least in theory.
Spring Boot Actuator is a sub-project of Spring Boot that provides production-ready features to help monitor and manage an application: health, metrics, info, and more. One of its more notable features is a set of endpoints that expose operational information, e.g. how the project is configured, which beans are active, metrics about the project, and health info.
Spring prioritizes conventions. This reduces the amount of boilerplate and configuration code, streamlining the development process. Adopting a similar strategy can make our tool more user-friendly.
Examples:
- Configuration conventions:
  - By default, Spring Boot looks for properties in files named application.properties or application.yml.
  - Profile-specific files follow the naming convention application-{profile}.properties.
  - Spring accepts properties from several sources, such as system properties, environment variables, command-line arguments, and property files. It resolves them in a defined order, so higher-priority sources override lower-priority ones.
  - Spring Boot provides conventions for configuring a data source by simply defining properties like spring.datasource.url and spring.datasource.username, without needing any additional configuration class. As long as the library is on the classpath, things just work.
- By following naming conventions and annotations, Spring automatically detects and registers beans and injects them, reducing manual wiring.
- With no explicit configuration, Spring Boot can automatically connect to a database (for example, an embedded one such as H2) when the right dependencies are on the classpath.
- Spring Boot can automatically start an embedded server with sensible default settings.
- JPA is another good example:
  - By default, each entity class in JPA corresponds to a table. The table's name is derived from the class name.
  - Each non-static, non-transient field in an entity class is mapped to a column.
  - By naming methods according to specific patterns, Spring Data JPA can infer the DB query (e.g. `findById(String id)`).
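To make the last convention concrete, here is a minimal Python sketch of how a query could be inferred from a method name alone. It is not how Spring Data JPA is implemented; the function `derive_query`, the snake_case column mapping, and the support for only the `findBy` prefix and the `And` keyword are all assumptions made for illustration.

```python
import re


def derive_query(method_name: str, table: str) -> str:
    """Translate a finder name such as 'findByLastNameAndAge' into SQL.

    Hypothetical helper: it only understands the 'findBy' prefix and the
    'And' keyword, and would mis-split a property whose name contains 'And'.
    """
    prefix = "findBy"
    if not method_name.startswith(prefix) or method_name == prefix:
        raise ValueError(f"unsupported method name: {method_name}")
    # Everything after the prefix is a list of property names joined by 'And'.
    properties = method_name[len(prefix):].split("And")
    # Convert CamelCase property names to snake_case column names.
    columns = [re.sub(r"(?<!^)(?=[A-Z])", "_", p).lower() for p in properties]
    where = " AND ".join(f"{column} = ?" for column in columns)
    return f"SELECT * FROM {table} WHERE {where}"


print(derive_query("findById", "customer"))
print(derive_query("findByLastNameAndAge", "customer"))
```

The point for VDK is the design choice rather than the parsing: a predictable naming pattern lets the tool do the wiring, so the user writes a name instead of configuration.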
Integrated development environments (IDEs) like IntelliJ IDEA and Eclipse support validation and autocompletion for Spring configurations.
Spring allows multiple environment configurations (like 'dev' and 'prod'). This enables the same codebase to behave differently based on the environment it's running in.
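The entry point is a plain main method, so the application starts like any other Java program and does not require a framework-specific command:
```java
public static void main(String[] args) {
    SpringApplication.run(Application.class, args);
}
```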
https://docs.spring.io/spring-framework/docs/3.0.0.M4/reference/html/ch11s02.html
Spring provides a consistent exception hierarchy across its modules. It translates exceptions from the various data access technologies (e.g., JDBC, Hibernate, JPA) into a common set of exceptions rooted at DataAccessException, so developers can handle data access errors in the same way regardless of the underlying technology.
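The same translation idea can be sketched in Python, which is where it would apply to VDK. The class names (`DataAccessError`, `ConstraintViolationError`, `QueryError`) and the `execute` helper below are hypothetical, and sqlite3 stands in for "some underlying driver" only because it ships with the standard library.

```python
import sqlite3


class DataAccessError(Exception):
    """Root of a technology-neutral hierarchy (hypothetical name)."""


class ConstraintViolationError(DataAccessError):
    """An integrity constraint, such as a unique key, was violated."""


class QueryError(DataAccessError):
    """The statement could not be executed, e.g. a missing table."""


def translate(exc: sqlite3.Error) -> DataAccessError:
    """Map a driver-specific exception onto the common hierarchy."""
    if isinstance(exc, sqlite3.IntegrityError):
        return ConstraintViolationError(str(exc))
    if isinstance(exc, sqlite3.OperationalError):
        return QueryError(str(exc))
    return DataAccessError(str(exc))


def execute(conn: sqlite3.Connection, sql: str, params=()):
    """Run a statement, raising only DataAccessError subclasses on failure."""
    try:
        return conn.execute(sql, params).fetchall()
    except sqlite3.Error as exc:
        # Keep the original exception as the cause for debugging.
        raise translate(exc) from exc
```

Callers then catch `DataAccessError` (or a subclass) without importing anything from the driver, and a second backend only needs its own `translate` function.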
https://spring.io/blog/2013/11/01/exception-handling-in-spring-mvc
Spring MVC has @ControllerAdvice and @ExceptionHandler annotations. With these, developers can handle exceptions globally across controllers or locally within a specific controller, respectively. This centralized approach helps in returning standardized error responses and reduces duplicate error-handling code.
While Spring provides its own set of exceptions and mechanisms, it also allows developers to define custom exceptions and handlers. This ensures that the framework can be tailored to specific needs.
Like Spring Initializr, provide a GUI or CLI tool that sets up the basic structure of a VDK project, pre-populated with sensible defaults.
A step-by-step process, with explanations, can guide new users through setup and deployment.
Allow users to specify configuration in more widely adopted formats, like YAML or TOML. These formats are less error-prone due to their structured nature.
Like Spring's environment profiles, allow users to define configuration profiles. This way, configurations can be set once and reused across different environments.
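A minimal sketch of what profile resolution could look like, assuming configuration has already been parsed into a nested mapping with a shared `default` section. The function `resolve_profile`, the section names, and the keys are illustrative assumptions, not existing VDK behaviour.

```python
from typing import Any, Mapping


def resolve_profile(config: Mapping[str, Mapping[str, Any]], profile: str) -> dict:
    """Overlay the selected profile on top of the shared 'default' section."""
    if profile not in config:
        raise KeyError(f"unknown profile: {profile}")
    merged = dict(config.get("default", {}))
    # Values from the profile win over the shared defaults.
    merged.update(config[profile])
    return merged


# Hypothetical configuration with two environment profiles.
CONFIG = {
    "default": {"db_url": "sqlite:///local.db", "log_level": "INFO"},
    "dev": {"log_level": "DEBUG"},
    "prod": {"db_url": "postgresql://db.example.com/jobs"},
}

print(resolve_profile(CONFIG, "dev"))
```

Each profile only lists what differs from the defaults, which keeps the per-environment sections short.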
Implement a runtime check for configuration, similar to Spring's, that alerts the user to unknown or deprecated properties.
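Such a check could be as small as the following sketch. The property names in `KNOWN_KEYS` and `DEPRECATED_KEYS` are made up for illustration, and `check_config` is a hypothetical helper rather than an existing VDK function. It uses the standard library's `difflib` to suggest the closest known name for a likely typo.

```python
import difflib

# Hypothetical property registry; a real tool would build this from its
# declared configuration definitions.
KNOWN_KEYS = {"db_url", "log_level", "schedule_cron"}
DEPRECATED_KEYS = {"db_uri": "db_url"}  # old name -> replacement


def check_config(config: dict) -> list:
    """Return human-readable problems found in a configuration mapping."""
    problems = []
    for key in config:
        if key in DEPRECATED_KEYS:
            problems.append(f"'{key}' is deprecated; use '{DEPRECATED_KEYS[key]}'")
        elif key not in KNOWN_KEYS:
            # Suggest the closest known key, if any is similar enough.
            hint = difflib.get_close_matches(key, sorted(KNOWN_KEYS), n=1)
            suffix = f"; did you mean '{hint[0]}'?" if hint else ""
            problems.append(f"unknown property '{key}'{suffix}")
    return problems


for problem in check_config({"db_uri": "sqlite:///x.db", "log_levl": "DEBUG"}):
    print(problem)
```

Returning a list instead of raising on the first problem lets the tool report everything in one pass, and the caller decides whether to warn or fail.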
Defining a default directory structure for VDK projects lets users set up and understand projects quickly. For example:
/configs: for configuration files
/csv: for CSV files
/data: for input/output data
/steps/python: for individual Python tasks
/steps/sql: for individual SQL tasks
This layout is only illustrative; other directories may be needed.
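A scaffolding helper for the illustrative layout above might look like this sketch. The function `scaffold` and the constant `DEFAULT_LAYOUT` are hypothetical names; only the directory names come from the list above.

```python
import tempfile
from pathlib import Path

# Directory names taken from the illustrative layout above.
DEFAULT_LAYOUT = ("configs", "csv", "data", "steps/python", "steps/sql")


def scaffold(root: Path, job_name: str) -> Path:
    """Create the default directory skeleton for a new job under root."""
    job_dir = root / job_name
    for relative in DEFAULT_LAYOUT:
        # exist_ok makes the helper safe to re-run on an existing project.
        (job_dir / relative).mkdir(parents=True, exist_ok=True)
    return job_dir


if __name__ == "__main__":
    # Demonstrate in a throwaway directory so nothing is left behind.
    with tempfile.TemporaryDirectory() as tmp:
        job = scaffold(Path(tmp), "data_cleaning")
        for path in sorted(job.rglob("*")):
            print(path.relative_to(job).as_posix())
```

A CLI or GUI front end, as suggested earlier, would only need to collect the job name and target directory and call a function like this.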
If a job is named "data_cleaning", VDK can automatically look for a configuration file named "data_cleaning.config", look at the section "data_cleaning" in HOME/.vdk/config, or follow other similar conventions. Prefer well-known industry conventions and avoid inventing new ones.
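The lookup described above can be sketched as follows. The file format is an assumption: both files are treated as INI files whose settings live in a section named after the job, which the text does not specify. `find_job_config` is a hypothetical helper.

```python
import configparser
from pathlib import Path


def find_job_config(job_name: str, job_dir: Path, home: Path) -> dict:
    """Return settings for job_name from the first conventional location.

    Search order (assumed): <job_dir>/<job_name>.config, then the
    [<job_name>] section of <home>/.vdk/config.
    """
    candidates = [job_dir / f"{job_name}.config", home / ".vdk" / "config"]
    for path in candidates:
        parser = configparser.ConfigParser()
        parser.read(path)  # read() silently skips files that do not exist
        if parser.has_section(job_name):
            return dict(parser[job_name])
    return {}  # nothing found; the caller falls back to defaults
```

Because the job name alone determines where to look, the user never has to pass a configuration path explicitly.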
If a user doesn't specify certain configuration parameters, VDK should fall back on sensible defaults.
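One way to combine defaults with overrides from several sources is a layered lookup, sketched here with `collections.ChainMap` from the standard library. The default values, the `VDK_` environment prefix, and the precedence order (command line over environment over file over defaults) are assumptions for illustration.

```python
import os
from collections import ChainMap

# Hypothetical built-in defaults.
DEFAULTS = {"log_level": "INFO", "db_url": "sqlite:///local.db"}
ENV_PREFIX = "VDK_"  # assumed prefix for environment overrides


def effective_config(file_values: dict, cli_values: dict, environ=os.environ) -> ChainMap:
    """Layer the sources so that the most specific one wins."""
    env_values = {
        key[len(ENV_PREFIX):].lower(): value
        for key, value in environ.items()
        if key.startswith(ENV_PREFIX)
    }
    # ChainMap searches its mappings left to right and returns the first hit.
    return ChainMap(cli_values, env_values, file_values, DEFAULTS)


config = effective_config(
    file_values={"log_level": "WARNING"},
    cli_values={},
    environ={"VDK_LOG_LEVEL": "DEBUG"},
)
print(config["log_level"], config["db_url"])
```

A key missing from every explicit source still resolves through `DEFAULTS`, so the user only configures what differs.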
Running a data job without a dedicated command already exists in Notebooks, but that doesn't provide a pure Python user experience. For example, the following would be better:
```python
if __name__ == '__main__':
    StandaloneDataJobFactory.run()
```
Django's settings.py system makes it easy to understand and configure an application. It provides a very structured way of setting up database connections, middleware, installed apps, and many other settings.
Flask uses an object-based configuration: the configuration is a dictionary-like object that can be loaded from regular Python files or objects.
This library allows for separation of the configuration parameters from the code, making it easier to manage. It can pull from environment variables or .ini files, providing type casting and defaults.
A configuration management tool for Python applications, supporting formats like TOML, YAML, JSON, and others. It allows for environment-specific settings and layered configurations.
Traitlets is a configuration system for Python applications used in the Jupyter ecosystem.
One of the primary features of Traitlets is that it provides dynamic type checking and offers a mechanism to observe and respond to changes in configuration values. Traitlets-based applications often allow configurations to be defined both via configuration files (typically in Python or JSON format) and via command-line arguments.
OmegaConf is a configuration management library for Python that supports structured and hierarchical configurations. It offers features like variable interpolation, merging of multiple configuration sources, and integration with typed data classes.