This repository has been archived by the owner on Jul 21, 2020. It is now read-only.

Batch Processing

Axel Faust edited this page Oct 13, 2013 · 3 revisions

The Alfresco Enhanced Script Environment currently supports batch processing in server-side JavaScript based on Rhino in the Repository tier only. A Share- / Surf-specific batch processing capability may be added at a later time.

API

Batch processing is available through the root-level function executeBatch which can be used in the following variants:

 executeBatch(items, processItemCallback, threadCount, batchSize)
 executeBatch(workProviderCallback, processItemCallback, threadCount, batchSize)
 executeBatch(items, processItemCallback, threadCount, batchSize, beforeBatchCallback)
 executeBatch(workProviderCallback, processItemCallback, threadCount, batchSize, beforeBatchCallback)
 executeBatch(items, processItemCallback, threadCount, batchSize, beforeBatchCallback, afterBatchCallback)
 executeBatch(workProviderCallback, processItemCallback, threadCount, batchSize, beforeBatchCallback, afterBatchCallback)

A callback parameter can be provided as either a function (JavaScript or native Java method) or an object in the form of

 {
     fn: function,
     scope: object
 }

where scope can be used to define the execution scope (this) for the callback. The processItemCallback is the only callback that will be called with an argument: the single work item to process.
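As an illustration of the object form of a callback, the following minimal sketch passes a method together with the object it should run against. The collector object and its record method are hypothetical names invented for this example; only executeBatch and the fn / scope structure come from this page. The call is guarded so the snippet also loads outside the Repository-tier script environment.

```javascript
// Hypothetical collector whose method relies on "this".
var collector = {
    seen: [],
    record: function (item) {
        // "this" is the collector only because it is passed as the scope below.
        this.seen.push(String(item));
    }
};

// Object form of a callback: fn is the function, scope becomes its "this".
var processItemCallback = {
    fn: collector.record,
    scope: collector
};

// executeBatch only exists in the Repository-tier script environment,
// so the call is guarded to keep the snippet loadable elsewhere.
if (typeof executeBatch === 'function') {
    // Arguments: items, processItemCallback, threadCount, batchSize
    executeBatch(['a', 'b', 'c'], processItemCallback, 2, 10);
}
```

Passing collector.record as a bare function would presumably lose the intended this binding, which is the case the scope property is meant to cover. Because collector lives in the shared scope, concurrent calls to record depend on the best-effort locking described under Data / State Synchronization.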

Clients can provide the items to process either directly as a parameter or via a custom callback that provides the items in one or more chunks. The type and structure of the objects that can be provided as items, whether by parameter or as the return value of the callback, depend on the available "JavaScript object to work item" converters. Out of the box, the following are supported:

  • native JavaScript arrays
  • wrapped Java collections

Clients can specify any number of threads to execute the batch operation but should be aware that the maximum allowed number can be limited by the global configuration of the _script.batch.maxThreads_ property in alfresco-global.properties, which defaults to 2. This global thread limit has been put in place to reduce, to a certain degree, the chance of run-away scripts taking up significant CPU time.
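The work provider variant described above can be sketched as a closure that hands out successive slices of a larger array. This is only a sketch under assumptions: createChunkProvider is a hypothetical helper, and the convention that an empty array signals "no more work" is an assumption that this page does not confirm.

```javascript
// Hypothetical helper: returns a work provider that hands out successive chunks.
function createChunkProvider(allItems, chunkSize) {
    var offset = 0;
    return function workProviderCallback() {
        // ASSUMPTION: an empty array tells the batch processor that no work is left.
        var chunk = allItems.slice(offset, offset + chunkSize);
        offset += chunk.length;
        return chunk;
    };
}

function processItemCallback(item) {
    // Placeholder for the real per-item work.
}

// executeBatch only exists in the Repository-tier script environment.
if (typeof executeBatch === 'function') {
    // A threadCount of 2 stays within the default script.batch.maxThreads limit.
    executeBatch(createChunkProvider([1, 2, 3, 4, 5], 2), processItemCallback, 2, 2);
}
```

Each chunk is a native JavaScript array, which is one of the item types supported out of the box.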

Each thread in a batch process has access to a thread-local scope for storing variables. Variables in this scope are declared implicitly whenever an assignment is made to a variable that has neither been defined using the var keyword nor in any of the accessible scopes, i.e. closure / global scope. They are accessed / read like any other variable. This thread-local scope makes it possible for the beforeBatch callback to initialize / retrieve some data that will be needed in the actual processItem callback, and for the afterBatch callback to perform some cleanup / logging functionality.

Data / State Synchronization

The Rhino JavaScript engine does not provide any data or state synchronization mechanisms, as script execution is usually performed in a single thread context. When parallel batches are executed within a script, the callback functions all have access to the global / shared scope and may modify the same data structures or execute the same operations concurrently. This lack of synchronization can cause race conditions and non-atomic variable updates. To avoid synchronization issues, the batch processing function transparently wraps access to global / shared scope data and functions, and manages (shallow) read-write locks during variable access and modification as well as function calls. This is only a best-effort mechanism and does not guarantee correct and complete data / state synchronization. The assumptions made about function calls may cause locks to be obtained too aggressively, e.g. obtaining a full write lock where a read lock would have sufficed. This is an area of ongoing optimization and feedback is appreciated.

The data / state synchronization currently adheres to the following principles:

  • read access to a member of a script object is performed with a read lock on the object preventing concurrent modification but allowing concurrent access to its members
  • write access to a member of a script object is performed with an exclusive write lock on the object preventing any concurrent access to its members
  • an invocation of a function / operation member of a script object is performed with an exclusive write lock on the object and the this object preventing any concurrent access to its members

The following exceptions are made to the basic principles:

  • Alfresco and Spring Web Script processor extensions are assumed to be thread-safe and will never have any locking / synchronization applied to them. They are already shared among scripts executing in parallel and thus should have internal safeguards or not even any state at all. This basically applies to all root scope service objects of the Alfresco JavaScript API.
  • an invocation of a function / operation member with a trivial name implying read-only access is performed with a read lock on the object and the this object preventing concurrent modification but allowing concurrent access to its members / invocation. "Trivial" names are: get, is, has, find, toString, search, query, equals, hashCode, compareTo (this list may be further expanded).
  • an invocation of a function / operation member which matches a standard pattern of read-only access is performed with a read lock on the object and the this object preventing concurrent modification but allowing concurrent access to its members / invocation. "Standard" patterns are: get*, is*, has*, find*, search*, query* (this list may be further expanded). The character following the prefix must be upper-cased.
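
The name-based rules for function invocations can be summarized in a short sketch. This is not the actual implementation: lockModeForInvocation and the two constants are hypothetical names, the exemption for processor extensions is not modelled, and "upper-cased" is assumed to mean the ASCII letters A to Z.

```javascript
// Names listed on this page as "trivial" read-only names.
var TRIVIAL_READ_NAMES = ['get', 'is', 'has', 'find', 'toString', 'search',
    'query', 'equals', 'hashCode', 'compareTo'];

// Prefixes listed as "standard" patterns; the next character must be upper-case.
// ASSUMPTION: upper-case means ASCII A-Z.
var READ_PREFIX_PATTERN = /^(get|is|has|find|search|query)[A-Z]/;

// Hypothetical helper: returns 'read' or 'write' for an invoked member name.
function lockModeForInvocation(memberName) {
    if (TRIVIAL_READ_NAMES.indexOf(memberName) !== -1) {
        return 'read';
    }
    if (READ_PREFIX_PATTERN.test(memberName)) {
        return 'read';
    }
    // Default principle: function invocations take an exclusive write lock.
    return 'write';
}
```

Under these rules a call such as getName or hasNext would run with a read lock, while names such as getter, isolate or push fall back to the exclusive write lock because they match neither list.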

Examples
