The Data Processor is a library that processes large CSV files by splitting the records into batches and sending them to an API endpoint with concurrency control and retry logic. It is intended for bulk user data processing tasks that need error handling and logging.
- Processes CSV files in batches.
- Supports concurrent API requests with a configurable limit.
- Includes retry logic with exponential backoff for failed requests.
- Handles API rate limits with automatic retry after a delay.
- Outputs the results to JSON files (successful users, failed users, wallets, point allocations, transaction audits).
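
To illustrate the batching and concurrency features listed above, here is a minimal TypeScript sketch. The helper names `chunk` and `mapWithConcurrency` are hypothetical and are not part of this library's API; they only show one common way to split records into batches and cap the number of in-flight requests.

```typescript
// Hypothetical helper (not the library's API): split records into batches
// of at most `batchSize` items each.
function chunk<T>(items: T[], batchSize: number): T[][] {
  if (!Number.isInteger(batchSize) || batchSize <= 0) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// Hypothetical helper (not the library's API): run `worker` over `items`
// with at most `limit` calls in flight, preserving input order in the result.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit <= 0) {
    throw new RangeError("limit must be a positive integer");
  }
  const results = new Array<R>(items.length);
  let next = 0;

  // Each runner repeatedly claims the next unprocessed index until none remain.
  async function runner(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const runners = Array.from({ length: Math.min(limit, items.length) }, () => runner());
  await Promise.all(runners);
  return results;
}
```

With these helpers, a caller could split parsed CSV rows with `chunk(rows, 500)` and then send the batches through `mapWithConcurrency(batches, 5, sendBatch)`, where `sendBatch` stands in for whatever function performs the API call.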
Ensure you have Node.js and yarn installed on your system, then install the dependencies:
yarn install
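The value passed to `--authorization` is a Basic authorization header, built as follows:
- Calculate the MD5 hash of the password.
- Concatenate the username and the MD5-hashed password with a colon in between.
- Encode the concatenated string in Base64.
- Prefix the encoded string with `Basic `.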
For example, with username `store123` and password `Pass123`:
- MD5 hash of the password: `bdc87b9c894da5168059e00ebffb9077`
- Username and MD5-hashed password joined with a colon: `store123:bdc87b9c894da5168059e00ebffb9077`
- Base64 value of the concatenated string: `c3RvcmUxMjM6YmRjODdiOWM4OTRkYTUxNjgwNTllMDBlYmZmYjkwNzc=`
- Final value: `Basic c3RvcmUxMjM6YmRjODdiOWM4OTRkYTUxNjgwNTllMDBlYmZmYjkwNzc=`
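
The steps above can be sketched in TypeScript using Node's built-in `crypto` module and `Buffer`. The function names `md5Hex`, `encodeBasic`, and `buildAuthorizationHeader` are illustrative assumptions and do not come from this library.

```typescript
import { createHash } from "node:crypto";

// Illustrative helper: hex-encoded MD5 digest of a UTF-8 string.
function md5Hex(input: string): string {
  return createHash("md5").update(input, "utf8").digest("hex");
}

// Illustrative helper: join username and hashed password with a colon,
// Base64-encode the result, and prefix it with "Basic ".
function encodeBasic(username: string, passwordHash: string): string {
  const joined = `${username}:${passwordHash}`;
  return "Basic " + Buffer.from(joined, "utf8").toString("base64");
}

// Illustrative helper: full header value from a username and plain password.
function buildAuthorizationHeader(username: string, password: string): string {
  return encodeBasic(username, md5Hex(password));
}

// Reproduces the Base64 step of the worked example from its stated hash.
console.log(encodeBasic("store123", "bdc87b9c894da5168059e00ebffb9077"));
```

The resulting string is what would be passed, quoted, as the `--authorization` argument.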
Run the processor from the command line using the following command:
yarn start --input <path_to_csv_file> --authorization <api_token> --capillaryHost <api_host>
Option | Description | Type | Default | Required |
---|---|---|---|---|
`--input` | Input CSV file path | String | - | Yes |
`--authorization` | Authorization header | String | - | Yes |
`--capillaryHost` | Capillary host URL | String | - | Yes |
`--concurrency` | Number of concurrent requests | Number | 5 | No |
`--batchSize` | Number of items per batch | Number | 500 | No |
`--batchConcurrentLimit` | Number of items to process in parallel | Number | 100 | No |
`--retryAttempts` | Number of retry attempts | Number | 3 | No |
`--retryDelay` | Delay between retry attempts (ms) | Number | 1000 | No |
yarn start --input ./data/users.csv --authorization "Basic your_api_token" --capillaryHost "https://api.yourdomain.com"
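
As a rough illustration of how `--retryAttempts` and `--retryDelay` might combine with exponential backoff, consider the sketch below. It assumes the delay doubles after each failed retry and that the first call is not counted as a retry; the library's actual formula and counting may differ. `backoffDelay` and `withRetry` are hypothetical names.

```typescript
// Assumed backoff formula: delay before retry number `attempt` (1-based)
// is baseDelayMs * 2^(attempt - 1), e.g. 1000, 2000, 4000 ms.
function backoffDelay(attempt: number, baseDelayMs: number): number {
  return baseDelayMs * 2 ** (attempt - 1);
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical wrapper: call `task` once, then retry up to `retryAttempts`
// times, waiting with exponential backoff before each retry.
async function withRetry<T>(
  task: () => Promise<T>,
  retryAttempts = 3,
  retryDelayMs = 1000,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= retryAttempts; attempt++) {
    if (attempt > 0) {
      await sleep(backoffDelay(attempt, retryDelayMs));
    }
    try {
      return await task();
    } catch (error) {
      lastError = error; // remember the failure and try again
    }
  }
  // All attempts failed: surface the last error to the caller.
  throw lastError;
}
```

Under these assumptions, the defaults (3 retries, 1000 ms base delay) would wait 1000, 2000, and 4000 ms before giving up, so lowering `--retryDelay` shortens the worst-case time spent on a failing request.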
Upon successful processing, the following output files are generated in the `output` folder:
- `failed_users_logs.json`: Logs of users who failed processing.
- `successful_users_logs.json`: Logs of successfully processed users.
- `wallets.json`: Details of user wallets.
- `point_allocations.json`: Details of point allocations for users.
- `transaction_audits.json`: Transaction audit logs.
The library includes built-in error handling: when a request fails, the error is logged to the `error_logs` folder. An example error log entry:
{
"error": "Request failed after all retry attempts",
"details": "Error details..."
}
- Ensure your CSV file is properly formatted.
- Verify your API credentials before running the processor.
- Adjust concurrency settings based on your API limits.