🚚 Spike: Investigate adding AWS DataSync capability to Platform #1309
Comments
Slack discussion |
Refinement (24/10/23) Look for steer from Project Management on Priority/Delivery |
Iteration 4 objective: For this sprint we would like an estimate of the effort/time cost to deliver this functionality, as well as the potential compute costs users would incur to meet their needs. We want both us and the users to understand the cost/benefit of implementing the solution and doing the processing. |
Refinement (24/10/23): Spike to be made on implementation (and what we'd need from Atos to implement it) |
Looked at as of 02/11/23: Identified the home of the target SMB server, a non-resolvable intranet address (http://dom1.infra.int/data/HQ/PGO/Shared/Group/Investigations/). Identified the intended PoC as per the following diagram: The final product may wish to sync directly to a bucket in the Data Account, but getting the SMB connection/networking working seems to be the primary challenge with this ticket. |
Have heard from Atos that creation of the service account is chargeable, so now it's back with the requestor to reassess. |
Funding has been granted! |
This issue is being marked as stale because it has been open for 60 days with no activity. Remove stale label or comment to keep the issue open. |
This issue is being closed because it has been open for a further 7 days with no activity. If this is still a valid issue, please reopen it, Thank you! |
relates to #5175 |
User Story
As an Analytical Platform user
I want to sync unstructured data from a network share
So that we can perform NLP (Natural Language Processing) on that data to gain insight.
Slack thread
Value
We have had a few requests to have data synced to the S3 data warehouse for processing.
This is not a current capability of the Analytical Platform. Because the data sits on a legacy SMB file server (managed by a third party, which cannot be changed), we would need a "swing" location for this service to be created to overcome network routing issues. If we can provide a platform feature that lets other teams set up and maintain their own sync transfer tasks, teams can stop using suboptimal methods, e.g. Remote Desktop or laptops, which have issues around sleeping/terminating.
Questions / Assumptions / Hypothesis
Hypothesis
If we add the AWS DataSync service
Then teams can use this to easily import data, rather than using laptops and virtual desktop sessions.
Proposal
Deploy AWS DataSync in Modernisation Platform
Allow teams to manage their sync requirements
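As a rough illustration of what a team-managed sync might involve, here is a minimal Python sketch of wiring an SMB source to an S3 destination through the DataSync API. It is not a worked-out design: it assumes a DataSync agent with network access to the SMB server is already deployed (the "swing" location problem above), that an IAM role granting bucket access exists, and that a client such as `boto3.client("datasync")` is passed in. The helper names and all example values (hostnames, ARNs, task name) are hypothetical.

```python
from __future__ import annotations

from typing import Any, Optional


def build_smb_location_request(
    server_hostname: str,
    subdirectory: str,
    user: str,
    password: str,
    agent_arns: list[str],
    domain: Optional[str] = None,
) -> dict[str, Any]:
    """Build keyword arguments for DataSync's create_location_smb call."""
    request: dict[str, Any] = {
        "ServerHostname": server_hostname,
        "Subdirectory": subdirectory,  # share path, using forward slashes
        "User": user,
        "Password": password,  # assumption: fetched from a secrets store
        "AgentArns": agent_arns,  # agent that can reach the SMB server
    }
    if domain:
        request["Domain"] = domain
    return request


def build_s3_location_request(
    bucket_arn: str, prefix: str, access_role_arn: str
) -> dict[str, Any]:
    """Build keyword arguments for DataSync's create_location_s3 call."""
    return {
        "S3BucketArn": bucket_arn,
        "Subdirectory": prefix,
        "S3Config": {"BucketAccessRoleArn": access_role_arn},
    }


def create_sync_task(
    client: Any,
    smb_request: dict[str, Any],
    s3_request: dict[str, Any],
    task_name: str,
    schedule_expression: Optional[str] = None,
) -> str:
    """Create both locations and a task linking them; return the task ARN."""
    source_arn = client.create_location_smb(**smb_request)["LocationArn"]
    destination_arn = client.create_location_s3(**s3_request)["LocationArn"]
    task_request: dict[str, Any] = {
        "SourceLocationArn": source_arn,
        "DestinationLocationArn": destination_arn,
        "Name": task_name,
    }
    if schedule_expression:
        # Optional recurring schedule, e.g. a cron(...) expression
        task_request["Schedule"] = {"ScheduleExpression": schedule_expression}
    return client.create_task(**task_request)["TaskArn"]
```

Keeping the request builders separate from the API calls means the parameters can be reviewed or unit-tested without AWS access. The networking between the agent and the SMB server, which the spike identifies as the main challenge, is outside what this sketch covers.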
Definition of done
Reference
How to write good user stories