🚚 Spike: Investigate adding AWS DataSync capability to Platform #1309

bagg3rs · 2023-08-29T15:26:16Z

User Story

As an Analytical Platform user
I want to sync unstructured data from a network share
So that we can perform NLP (Natural Language Processing) on that data to gain insight.

Slack thread

Value

We have had a few requests to have data synced to the S3 data warehouse for processing.
This is not a current capability of the Analytical Platform. Because the data sits on a legacy SMB file server (managed by a third party, which cannot be changed), we would need to create a swing location for this service to overcome network routing issues.

If our platform provides a feature that lets other teams set up and maintain their own sync transfer tasks, those teams can stop using suboptimal methods, e.g. Remote Desktop sessions or laptops, which have issues with going to sleep or terminating.

Questions / Assumptions / Hypothesis

Hypothesis

If we add the AWS DataSync service
Then teams can use this to easily import data, rather than using laptops and virtual desktop sessions.

Proposal

Deploy AWS DataSync in Modernisation Platform
Allow teams to manage their sync requirements
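To make the proposal more concrete, the sketch below assembles the request parameters a team would need for the three DataSync resources involved: an SMB source location (which DataSync requires to be reached through an agent, i.e. the swing location), an S3 destination location (which needs an IAM role DataSync can assume to write to the bucket), and a task linking the two. It is a minimal, hypothetical sketch: the `SyncRequest` class and the `*_kwargs` helper names are illustrative, every ARN, hostname and path is a placeholder, and the transfer and verify options are example choices rather than anything agreed in this thread. It does not call AWS itself. Note that, as far as we understand, the tags shown label the DataSync resources (e.g. the owning team); they do not by themselves attach metadata to the transferred objects.

```python
"""Hypothetical helper that builds keyword arguments for the boto3 DataSync
client methods create_location_smb, create_location_s3 and create_task.
Nothing here talks to AWS; all values are supplied by the caller."""
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SyncRequest:
    # Illustrative fields only; none of these names come from the issue.
    agent_arn: str          # DataSync agent running in the swing location
    smb_host: str           # must be resolvable from the agent's network
    smb_subdirectory: str   # share path exported by the SMB server
    smb_user: str
    smb_domain: str
    bucket_arn: str
    bucket_role_arn: str    # IAM role DataSync assumes to write to the bucket
    s3_prefix: str = "/"
    schedule: str | None = None  # e.g. "cron(0 2 * * ? *)"
    tags: dict[str, str] = field(default_factory=dict)


def _tag_list(tags: dict[str, str]) -> list[dict[str, str]]:
    # DataSync takes tags as a list of {"Key": ..., "Value": ...} entries;
    # sorting keeps the output deterministic.
    return [{"Key": k, "Value": v} for k, v in sorted(tags.items())]


def smb_location_kwargs(req: SyncRequest, password: str) -> dict:
    # Source: the SMB share, reached through the agent.
    return {
        "ServerHostname": req.smb_host,
        "Subdirectory": req.smb_subdirectory,
        "User": req.smb_user,
        "Domain": req.smb_domain,
        "Password": password,
        "AgentArns": [req.agent_arn],
        "MountOptions": {"Version": "AUTOMATIC"},
        "Tags": _tag_list(req.tags),
    }


def s3_location_kwargs(req: SyncRequest) -> dict:
    # Destination: a prefix in the target bucket.
    return {
        "S3BucketArn": req.bucket_arn,
        "Subdirectory": req.s3_prefix,
        "S3Config": {"BucketAccessRoleArn": req.bucket_role_arn},
        "Tags": _tag_list(req.tags),
    }


def task_kwargs(req: SyncRequest, source_arn: str,
                destination_arn: str, name: str) -> dict:
    kwargs = {
        "SourceLocationArn": source_arn,
        "DestinationLocationArn": destination_arn,
        "Name": name,
        # Example choice: copy only changed files and verify what was copied.
        "Options": {"TransferMode": "CHANGED",
                    "VerifyMode": "ONLY_FILES_TRANSFERRED"},
        "Tags": _tag_list(req.tags),
    }
    if req.schedule:
        # Without a schedule the task only runs when started explicitly.
        kwargs["Schedule"] = {"ScheduleExpression": req.schedule}
    return kwargs
```

In use, each dict would be passed to the matching boto3 DataSync client method, e.g. `client.create_location_smb(**smb_location_kwargs(req, password))`, reading `LocationArn` from the two location responses and feeding them into `task_kwargs`. Keeping the parameters in one small, team-owned structure is one way teams could manage their own sync requirements without touching the shared agent.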

Definition of done

AWS DataSync deployed and tested
Metadata applied to transferred data
Findings documented

Reference

How to write good user stories


bagg3rs · 2023-08-30T12:11:13Z

Slack discussion

jhpyke · 2023-10-24T10:32:00Z

Refinement (24/10/23): Look for a steer from Project Management on priority/delivery.

YvanMOJdigital · 2023-10-24T10:46:39Z

Iteration 4 objective: For this sprint we would like to know the estimated effort/time cost of delivering this functionality, as well as the potential compute costs the users would incur to meet their needs. We want both us and the users to be able to understand the cost/benefit of implementing the solution and doing the processing.

jhpyke · 2023-10-24T10:51:31Z

Refinement (24/10/23): Spike to be made on implementation (including what we'd need from Atos to implement it).

jhpyke · 2023-11-02T14:32:25Z

Looked at as of 02/11/23:

Identified the home of the target SMB server, a non-resolvable intranet address (http://dom1.infra.int/data/HQ/PGO/Shared/Group/Investigations/). Identified the intended PoC, as per the following diagram:

The final product may want to sync to a bucket in the Data Account directly, but getting the SMB connection/networking working seems to be the primary challenge with this ticket.
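Since networking is flagged as the main challenge, a quick check run from a host in the proposed swing location can separate the two likely failure modes: the SMB hostname not resolving there, or the name resolving but TCP port 445 (the standard SMB port) being blocked. This is a minimal sketch using only the Python standard library; the function name `check_smb_path` is hypothetical, and taking `dom1.infra.int` from the URL above as the SMB server's hostname is an assumption the thread does not confirm.

```python
import socket


def check_smb_path(host: str, port: int = 445, timeout: float = 3.0) -> str:
    """Hypothetical diagnostic: return "unresolvable", "unreachable" or
    "reachable" for a TCP connection to host:port (445 is the SMB port)."""
    try:
        # Step 1: can this network resolve the name at all?
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "unresolvable"
    # Step 2: try each resolved address until one accepts a TCP connection.
    for family, socktype, proto, _canonname, addr in infos:
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(addr)
                return "reachable"
        except OSError:
            # Refused, timed out or no route: try the next address.
            continue
    return "unreachable"
```

For example, `check_smb_path("dom1.infra.int")` run inside the swing location would return `"unresolvable"` if that network has no DNS route to the intranet, which points at DNS forwarding rather than firewall rules as the first thing to fix.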

julialawrence · 2023-11-21T12:09:36Z

Have heard from Atos that creation of the service account is chargeable, so now it's back with the requestor to reassess.

bagg3rs · 2023-12-07T10:09:49Z

Funding has been granted!

github-actions · 2024-02-06T01:47:27Z

This issue is being marked as stale because it has been open for 60 days with no activity. Remove stale label or comment to keep the issue open.

github-actions · 2024-02-13T01:47:40Z

This issue is being closed because it has been open for a further 7 days with no activity. If this is still a valid issue, please reopen it, Thank you!

bagg3rs · 2024-08-29T10:02:02Z

relates to #5175

bagg3rs added the Data Platform Core Infrastructure label Aug 29, 2023

moj-data-platform-robot added this to Analytical Platform Aug 29, 2023

jacobwoffenden added the data-platform-apps-and-tools label and removed the Data Platform Core Infrastructure label Sep 22, 2023

julialawrence changed the title from 🚚 Investigate adding AWS DataSync capability to Platform to 🚚 Spike: Investigate adding AWS DataSync capability to Platform Oct 26, 2023

julialawrence added the 🧑‍💻 Apps & Tools BAU (Epic #1827) label Oct 26, 2023

github-actions bot mentioned this issue in 🧑‍💻 Apps & Tools BAU #1827 Oct 26, 2023

julialawrence moved this to 🧐 To Do in Analytical Platform Oct 31, 2023

jhpyke moved this from 🧐 To Do to 💨 In Progress in Analytical Platform Nov 2, 2023

murad-ali-MoJ assigned jhpyke, julialawrence and murad-ali-MoJ Nov 6, 2023

julialawrence moved this from 💨 In Progress to ✋ Blocked in Analytical Platform Nov 9, 2023

jhpyke moved this from ✋ Blocked to 💨 In Progress in Analytical Platform Nov 13, 2023

julialawrence moved this from 💨 In Progress to ✋ Blocked in Analytical Platform Nov 21, 2023

github-actions bot added the stale label Feb 6, 2024

github-actions bot closed this as not planned Feb 13, 2024

jacobwoffenden moved this from 🚫 Blocked to 🎉 Done in Analytical Platform Feb 15, 2024
