TDR Redacted Files

This lambda is passed an object with an S3 key and bucket. It gets that object from S3 which returns a json object. The results key in that json is a list of file paths and if there is a file matching the redacted file pattern, either finds the original file or returns an error. If the file does not match the redacted file pattern, it returns nothing. The example input here is only part of the full json object but these are the only fields checked.

Given the following input:

{
  "results": [
    {
      "fileId": "079bc416-180c-45cc-a943-7c6d63c21d57",
      "originalPath": "/a/path/file.txt"
    },
    {
      "fileId": "9e31f5f3-7240-4802-9442-766307fc9501",
      "originalPath": "/a/path/file_R1.txt"
    },
    {
      "fileId": "2cfc0597-53d4-4a10-aa5a-e49be49aaa9b",
      "originalPath": "/a/path/file2_R.txt"
    },
    {
      "fileId": "4ad0037f-45ae-410a-9e0f-31bece7cef85",
      "originalPath": "/another/path/file3_R.txt"
    },
    {
      "fileId": "6de7cc09-0bf2-4216-ae78-29b8f9ef6220",
      "originalPath": "/another/path/file3.txt"
    },
    {
      "fileId": "13671f42-5b15-4e55-95e9-607185b84bbd",
      "originalPath": "/another/path/file3.doc"
    },
    {
      "fileId": "f3a6f37e-c0fb-4fdd-b5a0-fe6dd31e57cb",
      "originalPath": "/a/path/file4_R.doc"
    },
    {
      "fileId": "4509ffee-69d2-48da-b771-070a4d3a376d",
      "originalPath": "/a/path/file4_R.pdf"
    },
    {
      "fileId": "8cb1078a-f990-4875-81e1-c4120fdd01f2",
      "originalPath": "/a/path/file5.pdf"
    }
  ]
}

It will group the files by directory

Map(
  "/a/path" -> List("/a/path/file.txt", "/a/path/file_R1.txt", "/a/path/file2_R.txt", "/a/path/file4_R.doc", "a/path/file4_R.pdf", "/a/path/file5.pdf"), 
  "/another/path" -> List("/another/path/file3_R.txt", "/another/path/file3.txt", "/another/path/file3.doc")  
)

For the /a/path directory, it will filter out any file which matches the pattern _R\d*?$ without its file extension. This returns:

"/a/path/file4_R.pdf"
"/a/path/file4_R.doc"
"/a/path/file2_R.txt"
"/a/path/file_R1.txt"

It will then filter any redacted file names with the same name ignoring the file extension. This gives:

"/a/path/file4_R.pdf"
"/a/path/file4_R.doc"

These are returned with the error DuplicateFileName

The remaining redacted files are checked against the non redacted files for original file matches.

file2_R.txt needs to have a matching file called file2.xxx but this isn't in the original array so this returns an error of NoOriginalFile

file_R1.txt needs to have a matching file called file.xxx This is in the original array so this is returned as a matched pair.

For the /another/path folder, this redacted file is found:

"/another/path/file3_R.txt"

There is only one so there is no duplicate, so it then checks the original file list for a match. We are looking for a file called file3.xxx There are two files which match this, file3.txt and file3.doc We can't tell which of these was the original file, so we return an AmbiguousOriginalFile error.

The lambda then returns this json:

{
  "redactedFiles": [
    {
      "originalFileId": "079bc416-180c-45cc-a943-7c6d63c21d57",
      "originalFilePath": "/a/path/file.txt",
      "redactedFileId": "9e31f5f3-7240-4802-9442-766307fc9501",
      "redactedFilePath": "/a/path/file_R1.txt"
    }
  ],
  "errors": [
    {
      "fileId": "2cfc0597-53d4-4a10-aa5a-e49be49aaa9b",
      "cause": "NoOriginalFile"
    },
    {
      "fileId": "f3a6f37e-c0fb-4fdd-b5a0-fe6dd31e57cb",
      "cause": "DuplicateFileName"
    },
    {
      "fileId": "4509ffee-69d2-48da-b771-070a4d3a376d",
      "cause": "DuplicateFileName"
    },
    {
      "fileId": "4ad0037f-45ae-410a-9e0f-31bece7cef85",
      "cause": "AmbiguousOriginalFile"
    }
  ]
}

There is a LambdaRunner class which will take a json string and run the Lambda. This can be used to test various inputs.

Name		Name	Last commit message	Last commit date
Latest commit History 306 Commits
.github		.github
project		project
src		src
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
build.sbt		build.sbt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

TDR Redacted Files

About

Releases 89

Packages

Contributors 10

Languages

License

nationalarchives/tdr-redacted-files

Folders and files

Latest commit

History

Repository files navigation

TDR Redacted Files

About

Resources

License

Stars

Watchers

Forks

Releases 89

Packages 0

Contributors 10

Languages

Packages