-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ees 4945 create data set version mappings 3 #5000
Ees 4945 create data set version mappings 3 #5000
Conversation
…d DataSetVersionMappingService implementation and tests.
bbe8b5a
to
fe8ed0a
Compare
…oNone if the service detects no likely candidates, and switched logic of completed mappings to indicate that AutoNone mappings are incomplete until the user confirms them.
… bring PublicDataDbContextModelSnapshot up-to-date
…keys rather than Lists that also contained elements with unique keys, to more easily work with JSON paths and JSON partial updates
…le elements, as currently these mostly only contain a single Label field. Updated PublicDataDbContextModelSnapshot to reflect JSON field mapping simplifications.
src/GovUk.Education.ExploreEducationStatistics.Public.Data.Model/DataSetVersionMapping.cs
Outdated
Show resolved
Hide resolved
src/GovUk.Education.ExploreEducationStatistics.Public.Data.Model/DataSetVersionMapping.cs
Outdated
Show resolved
Hide resolved
src/GovUk.Education.ExploreEducationStatistics.Public.Data.Model/DataSetVersionMapping.cs
Outdated
Show resolved
Hide resolved
...on.ExploreEducationStatistics.Public.Data.Processor/Services/DataSetVersionMappingService.cs
Outdated
Show resolved
Hide resolved
...on.ExploreEducationStatistics.Public.Data.Processor/Services/DataSetVersionMappingService.cs
Outdated
Show resolved
Hide resolved
...on.ExploreEducationStatistics.Public.Data.Processor/Services/DataSetVersionMappingService.cs
Outdated
Show resolved
Hide resolved
.ForEach(filterMapping => AutoMapParentAndOptions( | ||
parentMapping: filterMapping, | ||
parentCandidates: filtersPlan.Candidates, | ||
candidateOptionsSupplier: autoMappedCandidate => autoMappedCandidate.Options)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was going to say maybe we could do away with the candidateOptionsSupplier
argument, as you will always be getting the options on the candidate. But are we doing this to keep the flexibility for when we start mapping other facets, which may or may not have the same JSON structure?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah it's useful for FilterOptions because in order to work out which candidates are valid, you firstly have to see which Filter candidate is mapped to the FilterOption's owning Filter, and then draw the available FilterOption candidates from that. It's simpler with LocationOptions, because both Mappings and Candidates are grouped under their GeoLevels, so we already know which candidates should be available immediately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense!
|
||
if (matchingCandidate is not null) | ||
{ | ||
mapping.CandidateKey = matchingCandidate.Key; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line got me thinking...
When we do the automapping, and we find a potential match, this CandidateKey
is always just going to be identical to the source key. So it almost seems pointless setting it. You have the same amount of information by not having a CandidateKey
at all and just having Type
set to AutoMapped
.
However, I'm guessing you want this property CandidateKey
because the user might change it manually - in which case it would no longer be identical?
But if that's the case, then I'm wondering if it should then be called TargetKey
rather than CandidateKey
. Because 'Candidate' implies 'this could be an option'. Whereas, 'Target' implies 'this is the current option I have set it to, or it was automatically set to'. Just makes more sense to me I think?
At the same time, I get why you'd want to name it CandidateKey
to keep it consistent with the naming of the Candidates
array... And Candidates
does feel like the right naming for that bit of the JSON
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah it could go either way tbh. You're right that we always set it to the same key as the source during automapping, but we allow the user to update it as well. We might have brainier strategies that look at other metadata in the future and don't rely solely on identical keys to perform an automapping.
I could go either way namewise, but probably err on CandidateKey just so it's clearer what we're mapping to if that's OK with you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, all good with me! :) I could also go either way
|
||
public abstract class CreateMappingMiscTests( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can remove the abstract
here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, wonder if we should stick the three test classes that extend CreateMappingsTests
inside of CreateMappingsTests
? Just so that the nesting makes it a bit easier to digest?
Same with the three test classes that extend ApplyAutoMappingsTests
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update this one thanks. The remaining abstract ones in this class are all base classes for the test suites rather than test suites.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, for the second bit I was suggesting nesting the test classes inside of their abstract parent base class, just to make the discovery a little better? It will indent everything by 1 tab of course. Don't feel super strongly about it though
...onStatistics.Public.Data.Processor.Tests/Functions/ProcessNextDataSetVersionFunctionTests.cs
Outdated
Show resolved
Hide resolved
…tial mapping type for LocationOptions. Various simplifications and tidy-ups.
…-mappings-3' into EES-5113-investigate-partial-json-updates
…ay purposes in the front end during mapping process. Swapping Filter Key from being its Label to its column name (PublicId), again for the purposes of the displaying in the front end
…igate-partial-json-updates Ees 5113 investigate partial json updates
Overview
This PR handles the creation of the mapping "diffs" that let us map specific metadata elements of one data set version to the next. Currently these are Locations and Filters / Filter Options.
The original design of this diff structure was proposed in this document. It has changed slightly during the implementation here but not greatly.
In order to do this, this PR:
JSON example of generated mappings
Locations
Filters
Auto-mapping implementation
The auto-mapping process in this PR is very simple. When the empty mappings are first created in the
CreateMappingsFunction
, each mappable element from the source and the target data set versions is allocated a "Key", which is unique enough to identify it within its context in the JSON structure. For example, the Filter Option of "Free school meals" would be enough to uniquely identify that Filter Option within the context of its owning Filter, so it can use this as a Key, whereas Location Options require the use of any applicable Codes as well as a label to guarantee being able to uniquely identify it, and so we use the LocationMeta's RowKey for a Location Option'sKey
value.The auto-mapping process then visits each source element that needs to be mapped, and looks for a candidate element with a matching
Key
as itself. This again is done within the correct context of the source element. For example, when looking to map the Local Authority of "Sheffield", we would only be seeking a candidate Location Option with Key "Sheffield" from within the Local Authorities of the target data set version. Similarly, when seeking a matching key for a Filter Option, we would only be looking at the Filter Options that belong to a Filter that has been mapped already to the source Filter Option's owning Filter.MappingType.None vs AutoNone vs ManualNone?
When the mapping structure is first created, prior to running automappings, all of the potential mappings begin with a value of "None". This indicates that nothings been attempted with them yet.
When automappings have been run, the service will map all mappings to either AutoMapped where it can find a likely candidate, or AutoNone where it's found no likely candidate. We don't treat the AutoNone mappings as being "completed" mappings, but rather wait until the user confirms them at which point they become "ManualNone" and we consider this to be a "completed" mapping. The action in the UI is the "No mapping" link in the prototypes here:
Naming
Source and target data set versions
I refer in the code to "source" data set versions and "target" data set versions when inside the mapping code. Elsewhere these are generally referred to as "initial" and "next" versions, but source and target felt more natural in terms of the function of this code to me.
Source element
Source elements are mappable entities from the original data set version to the next. A source element could be a particular Location from the original (source) data set version for example, e.g. the Local Authority called "Sheffield".
Mappings
A mapping is a "Source" element e.g. "Sheffield", a "CandidateKey" which tells us the ID of another Location that we've found as an appropriate candidate to map to, and a "Type" which tells us if this was an automatic or a manual mapping.
Candidates
A candidate is an element from the target data set version that could be the target for a mapping e.g. the Local Authority "Sheffield" in the target version could be a candidate for a mapping of any Local Authority Locations in the source data set version.
Diffs to plans naming convention
I've moved away from "diffs" in favour of "plans". The backend and the user work to build a mapping plan for each facet of the data set versions that can be mapped (Locations / Filters / Filter Options) and at the end of the process, the plan will be carried out!
JSON structure differences from original proposal
Location mappings and candidates now grouped under their respective geographic levels
In the original proposal, we had the top-level location diff structure modelled as below, where mappings and targets (now candidates) were on the same level and each one had many elements under them grouped by geographic level:
I've now flipped this so that "mappings" and "candidates" (formerly "targets") are grouped under their respective geographic levels thusly:
Whilst working in amongst it, it just felt like it made more sense to have everything encapsulated in a level-by-level basis like this.