Vision Statement
Enterprise organizations invest enormous effort in managing their data. One of the largest costs of managing data is the development and maintenance of batch processing programs. Batch processing is the execution of one or more programs that take a set of data files as input, process the data, and produce output data files with minimal human intervention. Examples of batch processing include extract, transform, load (ETL) processes, processing monthly statements at banks, enriching existing data with a third-party tool, and exporting data into a format needed by another customer.
Many developers write these batch processing programs as standalone programs whose sole intent is to satisfy business requirements, with little thought given to reuse and extensibility. When a new batch processing program is needed, another stovepiped program is typically written and maintained. Consider the following common non-functional questions when judging how a batch process should work.
- What happens when you have 8 hours to execute a batch process that takes 16 hours?
- Do you have the capability to restart a batch process from a certain failure point?
- How quickly can the batch processing program adapt to change?
- What is the level of difficulty for someone new to maintain a batch processing program?
- How much time does it take your team to create a new batch process?
- How do you know whether a batch process finished or failed?
The Spring Batch project was started to provide the common plumbing that every batch processing job needs. The goals of Spring Batch include concurrent batch processing, the ability to restart after a failure, partial processing (skipping data), and reuse of common batch processing components. The same concerns apply when running a batch processing job against data stored in MarkLogic.
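To make two of these goals concrete, here is a minimal sketch of restart after failure and skipping bad records, written in plain Java. It is not Spring Batch's API: the class `RestartableBatch`, the `Checkpoint` holder, and `SkippableException` are hypothetical names invented for illustration, and the in-memory checkpoint stands in for job metadata that a real framework would persist.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Illustrative only: hypothetical names, not part of Spring Batch or MarkLogic Spring Batch.
class RestartableBatch {

    /** Stand-in for persisted job metadata: index of the first item not yet committed. */
    static class Checkpoint {
        int nextIndex = 0;
    }

    /** Signals a bad record that may be skipped instead of failing the whole job. */
    static class SkippableException extends RuntimeException {
        SkippableException(String message) {
            super(message);
        }
    }

    /**
     * Processes items in chunks, starting at checkpoint.nextIndex.
     * A chunk is appended to output and the checkpoint advanced only after every
     * item in the chunk has been processed or skipped, so any other exception
     * leaves the checkpoint at the start of the failed chunk.
     * Returns the number of items skipped during this run.
     */
    static <I, O> int run(List<I> items, Function<I, O> processor, List<O> output,
                          Checkpoint checkpoint, int chunkSize, int skipLimit) {
        int skipped = 0;
        while (checkpoint.nextIndex < items.size()) {
            int end = Math.min(checkpoint.nextIndex + chunkSize, items.size());
            List<O> chunk = new ArrayList<>();
            for (int i = checkpoint.nextIndex; i < end; i++) {
                try {
                    chunk.add(processor.apply(items.get(i)));
                } catch (SkippableException e) {
                    // Tolerate bad records up to the skip limit, then fail the job.
                    if (++skipped > skipLimit) {
                        throw e;
                    }
                }
            }
            output.addAll(chunk);          // "commit" the finished chunk
            checkpoint.nextIndex = end;    // record progress for a later restart
        }
        return skipped;
    }

    public static void main(String[] args) {
        List<String> input = List.of("1", "2", "x", "4", "5");
        List<Integer> output = new ArrayList<>();
        Checkpoint checkpoint = new Checkpoint();
        Function<String, Integer> parse = s -> {
            try {
                return Integer.parseInt(s);
            } catch (NumberFormatException e) {
                throw new SkippableException("not a number: " + s);
            }
        };
        int skipped = run(input, parse, output, checkpoint, 2, 1);
        System.out.println("skipped=" + skipped + " output=" + output);
    }
}
```

Calling `run` again with the same `Checkpoint` after a failure resumes at the first uncommitted chunk rather than at the beginning. The skip counter in this sketch is per run; a real framework would also have to decide whether skip counts survive a restart.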
The MarkLogic Spring Batch project extends Spring Batch to make it easier to write batch processing applications for data stored in MarkLogic.
The vision of the MarkLogic Spring Batch (MSB) project is to reduce the time needed to develop and maintain batch processing jobs for data stored in MarkLogic. The project aims to:
- Develop batch processing solutions that are reliable, robust, and high performing
- Reduce the amount of time it takes to build batch processing jobs with MarkLogic
- Simplify batch processing of data into, within, and out of MarkLogic
- Provide supplemental Java classes to the Spring Batch code base that are common across batch processing jobs written for MarkLogic
- Provide the ability to persist job metadata in MarkLogic, removing the need for a relational database
- Educate developers on best practices for executing batch processing jobs with MarkLogic
- Provide a library of out-of-the-box batch processing jobs that can be used as is, extended, or used as a baseline for a larger batch processing job
- Show how batch processing jobs can easily scale out to achieve maximum performance
- Educate developers in the use of MarkLogic Spring Batch through integration and unit tests
- Promote the Java surface area of MarkLogic to encourage development on the MarkLogic platform