Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: Modular backup implementation #37715

Merged
merged 8 commits into from
Nov 28, 2024
Merged

Conversation

sharat87
Copy link
Member

@sharat87 sharat87 commented Nov 26, 2024

Backup implementation changed to be modular, and made up of separate pieces. Each piece (link in a chain) is responsible for one component/functionality of all the data that's being backed up. This PR introduces the framework for this modularization. The next PR will finish migration to that architecture.

Automation

/test sanity

🔍 Cypress test results

Tip

🟢 🟢 🟢 All cypress tests have passed! 🎉 🎉 🎉
Workflow run: https://github.com/appsmithorg/appsmith/actions/runs/12063841586
Commit: 75c1d78
Cypress dashboard.
Tags: @tag.Sanity
Spec:


Thu, 28 Nov 2024 07:23:30 UTC

Communication

Should the DevRel and Marketing teams inform users about this change?

  • Yes
  • No

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced a modular backup system with BackupState, DiskSpaceLink, EncryptionLink, and ManifestLink classes to manage backup operations more efficiently.
    • Added command-line argument support for the backup command, enhancing flexibility.
    • Improved user interaction during backup restoration with prompts for encryption passwords.
  • Bug Fixes

    • Enhanced error handling and clarity in the restoration process, particularly for manifest file reading.
  • Documentation

    • Updated test structure to reflect new directory organization and improved focus on backup cleanup logic.

Copy link
Contributor

coderabbitai bot commented Nov 26, 2024

Walkthrough

This pull request introduces several key changes related to backup operations within the application. A new BackupState class is created to manage backup state, alongside several Link classes for modular handling of pre-backup, backup, and post-backup processes. The command-line interface has been updated to accept arguments directly. Additionally, modifications to test files reflect new directory structures and focus on verifying backup cleanup logic. The restoration process has also been enhanced for better user interaction and file management.

Changes

File Path Change Summary
app/client/packages/rts/src/ctl/backup/BackupState.ts Added BackupState class with properties for managing backup state, including args, initAt, and methods like isEncryptionEnabled().
app/client/packages/rts/src/ctl/backup/backup.test.ts Updated import paths and adjusted tests for backup cleanup logic, simplifying mocks and removing redundant tests.
app/client/packages/rts/src/ctl/backup/index.ts Modified BackupState constructor and run function to accept command-line arguments directly.
app/client/packages/rts/src/ctl/backup/links/DiskSpaceLink.ts Introduced DiskSpaceLink class with preBackup method for checking available disk space.
app/client/packages/rts/src/ctl/backup/links/EncryptionLink.ts Introduced EncryptionLink class for managing backup encryption, including user prompts for passwords.
app/client/packages/rts/src/ctl/backup/links/ManifestLink.ts Introduced ManifestLink class for handling backup metadata and writing to manifest.json.
app/client/packages/rts/src/ctl/backup/links/index.ts Added Link interface defining methods for backup operations and exported EncryptionLink and ManifestLink.
app/client/packages/rts/src/ctl/index.ts Updated backup.run() method to accept command-line arguments for enhanced flexibility.
app/client/packages/rts/src/ctl/restore.ts Enhanced user prompts and refined file handling during the backup restoration process.
deploy/docker/fs/opt/bin/ctl Modified execution command to include --enable-source-maps flag for improved readability.

Possibly related PRs

Suggested labels

Task

Suggested reviewers

  • pratapaprasanna
  • nidhi-nair
  • mohanarpit

💾 In the realm of backups, a state we now keep,
With BackupState ready, our data's in deep.
Links for the process, each step well-defined,
In this code, we trust, our backups aligned.
So raise up your flags, let the backups commence,
For with these changes, we build with confidence! 🎉


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@sharat87 sharat87 changed the title chore/backup-state-chain chore: Modular backup implementation Nov 26, 2024
@github-actions github-actions bot added the skip-changelog Adding this label to a PR prevents it from being listed in the changelog label Nov 26, 2024
@sharat87
Copy link
Member Author

/build-deploy-preview skip-tests=true

Copy link

Deploying Your Preview: https://github.com/appsmithorg/appsmith/actions/runs/12034930067.
Workflow: On demand build Docker image and deploy preview.
skip-tests: true.
env: ``.
PR: 37715.
recreate: .

Copy link

Deploy-Preview-URL: https://ce-37715.dp.appsmith.com

@sharat87
Copy link
Member Author

/build-deploy-preview skip-tests=true

Copy link

Deploying Your Preview: https://github.com/appsmithorg/appsmith/actions/runs/12048633207.
Workflow: On demand build Docker image and deploy preview.
skip-tests: true.
env: ``.
PR: 37715.
recreate: .

Copy link

Deploy-Preview-URL: https://ce-37715.dp.appsmith.com

@sharat87 sharat87 marked this pull request as ready for review November 28, 2024 06:43
@sharat87 sharat87 added the ok-to-test Required label for CI label Nov 28, 2024
@sharat87 sharat87 requested a review from mohanarpit November 28, 2024 06:43
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 11

🧹 Outside diff range and nitpick comments (10)
app/client/packages/rts/src/ctl/backup/index.ts (1)

312-319: Simplify 'removeOldBackups' method for clarity

Consider combining the .map functions to improve readability:

return Promise.all(
  backupFiles
    .sort()
    .reverse()
    .slice(backupArchivesLimit)
-   .map((file) => Constants.BACKUP_PATH + "/" + file)
-   .map(async (file) => fsPromises.rm(file)),
+   .map(async (file) => 
+     fsPromises.rm(Constants.BACKUP_PATH + "/" + file)
+   ),
);
app/client/packages/rts/src/ctl/backup/links/DiskSpaceLink.ts (2)

1-2: Consider using absolute imports for better maintainability

The relative imports could become problematic if the file structure changes.

-import { checkAvailableBackupSpace, getAvailableBackupSpaceInBytes } from "..";
-import type { Link } from ".";
+import { checkAvailableBackupSpace, getAvailableBackupSpaceInBytes } from "@rts/ctl/backup";
+import type { Link } from "@rts/ctl/backup/links";

4-11: Add class documentation and error handling

The class lacks documentation explaining its role in the backup chain.

+/**
+ * DiskSpaceLink validates available disk space before backup operations.
+ * Implements the Link interface as part of the backup chain.
+ */
 export class DiskSpaceLink implements Link {
app/client/packages/rts/src/ctl/backup/links/index.ts (1)

1-10: Consider enhancing interface documentation with JSDoc.

The interface design is clean and follows the Chain of Responsibility pattern well. Consider enhancing the documentation with JSDoc for better IDE integration and documentation generation.

+/**
+ * Represents a link in the backup chain, handling specific aspects of the backup process.
+ * Each link can participate in pre-backup, backup, and post-backup phases.
+ */
 export interface Link {
+  /**
+   * Called before the backup folder is created.
+   * Use this for any initialization or validation needed before backup starts.
+   */
   preBackup?(): Promise<void>;
 
+  /**
+   * Called after backup folder is created.
+   * Implement this to copy or create backup files in the backup folder.
+   * @throws {Error} If backup operation fails
+   */
   doBackup?(): Promise<void>;
 
+  /**
+   * Called after backup archive is created.
+   * Use this for cleanup or additional processing of the archive.
+   */
   postBackup?(): Promise<void>;
 }
app/client/packages/rts/src/ctl/backup/links/ManifestLink.ts (1)

1-22: Consider implementing a manifest schema and versioning

For a modular backup system, consider:

  1. Define a TypeScript interface for the manifest structure
  2. Add manifest schema version for future compatibility
  3. Consider adding backup timestamp and other metadata

Example interface suggestion:

interface ManifestData {
  schemaVersion: string;
  timestamp: string;
  appsmithVersion: string;
  dbName: string;
  // Add other relevant metadata
}
app/client/packages/rts/src/ctl/backup/BackupState.ts (3)

1-3: Consider adding an interface for better type safety

Adding an interface would make the contract more explicit and help with future extensions of the backup state.

+interface IBackupState {
+  args: string[];
+  initAt: string;
+  errors: string[];
+  backupRootPath: string;
+  archivePath: string;
+  encryptionPassword: string;
+  isEncryptionEnabled(): boolean;
+}
+
-export class BackupState {
+export class BackupState implements IBackupState {

13-20: Consider typing the constructor args parameter

The args parameter could benefit from a more specific type than string[].

+type BackupArgs = {
+  paths: string[];
+  encryption?: boolean;
+  compression?: boolean;
+};
+
-constructor(args: string[]) {
+constructor(args: BackupArgs) {

22-24: Make encryption check more explicit

The double negation pattern could be replaced with a more explicit check.

-isEncryptionEnabled() {
-  return !!this.encryptionPassword;
+isEncryptionEnabled(): boolean {
+  return this.encryptionPassword.length > 0;
app/client/packages/rts/src/ctl/index.ts (1)

Line range hint 1-70: Architecture feedback on modularization approach.

The command routing logic in this file remains clean and focused. The change to pass arguments to backup.run() aligns well with the modular architecture goals mentioned in the PR objectives. This separation of concerns will make it easier to extend backup functionality in future PRs.

app/client/packages/rts/src/ctl/backup/backup.test.ts (1)

130-174: Consider improving test descriptions for better clarity

The test descriptions could be more specific about the expected behavior. For example:

  • "Should remove oldest file when exceeding limit of 4"
  • "Should remove 3 oldest files when exceeding limit of 2"
  • "Should not remove files when at or below limit"
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 4727b4d and 75c1d78.

📒 Files selected for processing (10)
  • app/client/packages/rts/src/ctl/backup/BackupState.ts (1 hunks)
  • app/client/packages/rts/src/ctl/backup/backup.test.ts (2 hunks)
  • app/client/packages/rts/src/ctl/backup/index.ts (5 hunks)
  • app/client/packages/rts/src/ctl/backup/links/DiskSpaceLink.ts (1 hunks)
  • app/client/packages/rts/src/ctl/backup/links/EncryptionLink.ts (1 hunks)
  • app/client/packages/rts/src/ctl/backup/links/ManifestLink.ts (1 hunks)
  • app/client/packages/rts/src/ctl/backup/links/index.ts (1 hunks)
  • app/client/packages/rts/src/ctl/index.ts (1 hunks)
  • app/client/packages/rts/src/ctl/restore.ts (3 hunks)
  • deploy/docker/fs/opt/bin/ctl (1 hunks)
🔇 Additional comments (14)
app/client/packages/rts/src/ctl/backup/index.ts (2)

14-23: Modular backup implementation looks good

The introduction of the BackupState and Link classes effectively modularizes the backup process.


Line range hint 85-98: Error handling update is appropriate

Using state.args for the --error-mail flag check aligns with the new argument handling approach.

deploy/docker/fs/opt/bin/ctl (2)

6-9: LGTM! Clean formatting improvement.

The multi-line format enhances readability while maintaining the same functionality.


7-7: Verify source maps configuration in production.

Enabling source maps can impact performance. Ensure this is intentional for the production environment.

✅ Verification successful

Based on the search results, I can see that source maps are intentionally enabled in the production environment:

  1. The TypeScript configuration in app/client/tsconfig.json has "sourceMap": true which is standard for TypeScript projects
  2. The Node.js server startup script app/client/packages/rts/start-server.sh explicitly requires source map support
  3. This is consistent across other TypeScript configurations in the monorepo (RTS package, AST package, Cypress)

Source maps are correctly configured for production debugging.

The source map configuration is intentional and follows best practices for enterprise applications where production debugging capabilities are required. While there might be a minor performance impact, the benefits of being able to debug production issues outweigh the costs.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for source map related configurations and documentation

# Check if source maps are mentioned in documentation
rg -i "source.?map" -g "!*.{map,js,ts}"

# Check for any source map related configurations
rg -i "source.?map.*=(true|false)" -g "*.{json,js,ts,env}"

Length of output: 27532

app/client/packages/rts/src/ctl/backup/links/DiskSpaceLink.ts (1)

4-11: Verify Link interface implementation completeness

Let's ensure all required methods from the Link interface are implemented.

✅ Verification successful

Implementation is complete - all methods are optional

The Link interface defines three optional methods: preBackup?(), doBackup?(), and postBackup?(). The question mark in the method signatures indicates they are optional. DiskSpaceLink implements preBackup(), which is sufficient.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for Link interface definition to verify required methods
ast-grep --pattern 'interface Link {
  $$$
}'

Length of output: 969

app/client/packages/rts/src/ctl/backup/links/index.ts (1)

12-13: LGTM!

Clean exports following TypeScript best practices.

app/client/packages/rts/src/ctl/backup/links/ManifestLink.ts (3)

1-6: LGTM! Imports are well-organized and use modern async APIs.


7-8: LGTM! Clean class implementation with proper encapsulation.


10-21: ⚠️ Potential issue

Add error handling and input validation

The backup operation needs more robust error handling and validation:

Consider applying these improvements:

 async doBackup() {
+  try {
+    const dbUrl = utils.getDburl();
+    if (!dbUrl) {
+      throw new Error("Database URL is not configured");
+    }
+
     const version = await utils.getCurrentAppsmithVersion();
     const manifestData = {
       appsmithVersion: version,
-      dbName: utils.getDatabaseNameFromMongoURI(utils.getDburl()),
+      dbName: utils.getDatabaseNameFromMongoURI(dbUrl),
     };

     await fsPromises.writeFile(
-      path.join(this.state.backupRootPath, "/manifest.json"),
+      path.join(this.state.backupRootPath, "manifest.json"),
       JSON.stringify(manifestData, null, 2),
     );
+  } catch (error) {
+    throw new Error(`Failed to create manifest: ${error.message}`);
+  }
 }

Let's verify the database URL handling in other files:

app/client/packages/rts/src/ctl/backup/links/EncryptionLink.ts (1)

1-8: LGTM! Well-structured class declaration with proper TypeScript patterns

Good use of TypeScript features with proper encapsulation using private readonly state.

app/client/packages/rts/src/ctl/index.ts (1)

54-54: Verify argument sanitization in the backup module.

While passing command-line arguments directly to backup.run() provides flexibility, ensure proper argument validation is implemented in the backup module to prevent security issues.

✅ Verification successful

The previous attempt didn't yield results, so I'm trying to:

  1. Locate and examine the backup module implementation
  2. Find how the run function handles its arguments

This will help us verify if there's proper argument validation in place.


I see the backup module uses a chain pattern with a BackupState class, but need to verify how the arguments are validated through the chain of responsibility.


I can see the chain of responsibility pattern is used with three links (DiskSpace, Manifest, and Encryption). Let's examine their implementations to verify how they validate and process the arguments.


Command-line arguments are properly handled and validated

The implementation shows a secure chain of responsibility pattern where:

  • Arguments are encapsulated in a sealed BackupState class
  • Only --non-interactive flag is processed, which is safely checked
  • Disk space validation is performed before backup
  • Encryption is handled securely with password validation
  • No raw command-line arguments are directly executed
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if the backup module implements argument validation
# Expected: Find argument validation logic in the backup module

ast-grep --pattern 'function run(args: string[]) {
  $$$
  if ($$$) {
    $$$
  }
  $$$
}'

# Also check for any input validation utilities
rg -l 'validateArgs|sanitize|validate' app/client/packages/rts/src/ctl/backup/

Length of output: 173


Script:

#!/bin/bash
# Let's first find the backup module implementation
fd "backup" app/client/packages/rts/src/ctl/ --type f --exec cat {}

# Also check for any argument handling in the backup module
rg -A 5 "run\(.*\)" app/client/packages/rts/src/ctl/backup/

Length of output: 11188


Script:

#!/bin/bash
# Let's check the Link chain implementation and BackupState usage
rg -A 10 "const chain: Link\[\] = \[" app/client/packages/rts/src/ctl/backup/

# Also check for any Link interface/type definition
ast-grep --pattern 'interface Link {
  $$$
}'

Length of output: 1790


Script:

#!/bin/bash
# Let's check the implementation of the links that handle the backup process
cat app/client/packages/rts/src/ctl/backup/links/DiskSpaceLink.ts
cat app/client/packages/rts/src/ctl/backup/links/ManifestLink.ts
cat app/client/packages/rts/src/ctl/backup/links/EncryptionLink.ts

Length of output: 2138

app/client/packages/rts/src/ctl/backup/backup.test.ts (1)

2-10: LGTM: Clean import restructuring and mock implementation

The import paths have been properly updated to reflect the new directory structure, and the mock implementation is well-focused.

app/client/packages/rts/src/ctl/restore.ts (2)

69-72: LGTM: Improved password prompt implementation

The password prompt is well-implemented with proper security considerations using hideEchoBack.


154-158: Consider cleanup of backup env file

While using cp instead of mv is safer, ensure the backup file doesn't expose sensitive data indefinitely.

✅ Verification successful

Backup file cleanup is handled appropriately

The backup file is created as part of the restore process for safety, but it's not a security concern since:

  • The backup is created in the same protected /appsmith-stacks/configuration/ directory as the original
  • The file is only accessible within the container
  • The original docker.env already contains sensitive data and has the same access restrictions
  • The backup serves as a rollback mechanism in case the restore process fails
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if there are any leftover backup env files
find /appsmith-stacks/configuration -name "docker.env.*" -type f

Length of output: 132


Script:

#!/bin/bash
# Search for any cleanup or removal of backup files in the codebase
rg -A 5 "docker.env.*" 

# Search for any file cleanup patterns in restore functionality
ast-grep --pattern 'function restore($$$) { $$$ }'

Length of output: 12219

tty.isatty((process.stdout as any).fd)
) {
state.encryptionPassword = getEncryptionPasswordFromUser();
for (const link of chain) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

QQ: Do we expect to run all the links in the chain sequentially always? Or can they be run parallely?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's indeed some that "theoretically" can be run in parallel, but I'm not doing that now, and not expecting to solve it. Primarily because running these things in parallel means that the command output gets garbled and will become hard to troubleshoot, and the speed up benefits... I don't think it'll be much. The only place I see can benefit from parallel is running mongodump and pg_dump in parallel, but... that both doesn't seem worth it, and can only be temporary.

@sharat87 sharat87 merged commit 6a31cac into release Nov 28, 2024
45 checks passed
@sharat87 sharat87 deleted the chore/backup-state-chain branch November 28, 2024 07:35
github-actions bot pushed a commit to Zeral-Zhang/appsmith that referenced this pull request Dec 2, 2024
Backup implementation changed to be modular, and made up of separate
pieces. Each piece (link in a chain) is responsible for one
component/functionality of all the data that's being backed up. This PR
introduces the framework for this modularization. The next PR will
finish migration to that architecture.


## Automation

/test sanity

### 🔍 Cypress test results
<!-- This is an auto-generated comment: Cypress test results  -->
> [!TIP]
> 🟢 🟢 🟢 All cypress tests have passed! 🎉 🎉 🎉
> Workflow run:
<https://github.com/appsmithorg/appsmith/actions/runs/12063841586>
> Commit: 75c1d78
> <a
href="https://internal.appsmith.com/app/cypress-dashboard/rundetails-65890b3c81d7400d08fa9ee5?branch=master&workflowId=12063841586&attempt=1"
target="_blank">Cypress dashboard</a>.
> Tags: `@tag.Sanity`
> Spec:
> <hr>Thu, 28 Nov 2024 07:23:30 UTC
<!-- end of auto-generated comment: Cypress test results  -->


## Communication
Should the DevRel and Marketing teams inform users about this change?
- [ ] Yes
- [x] No


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

## Release Notes

- **New Features**
- Introduced a modular backup system with `BackupState`,
`DiskSpaceLink`, `EncryptionLink`, and `ManifestLink` classes to manage
backup operations more efficiently.
- Added command-line argument support for the backup command, enhancing
flexibility.
- Improved user interaction during backup restoration with prompts for
encryption passwords.

- **Bug Fixes**
- Enhanced error handling and clarity in the restoration process,
particularly for manifest file reading.

- **Documentation**
- Updated test structure to reflect new directory organization and
improved focus on backup cleanup logic.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ok-to-test Required label for CI skip-changelog Adding this label to a PR prevents it from being listed in the changelog
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants