esites: Standardise and clean up New Zealand site records. #571

Open
rondlg opened this issue May 12, 2022 · 20 comments
Labels
data cleanup, esites (This issue relates or refers to the sites module)

Comments

@rondlg
Member

rondlg commented May 12, 2022

Generated from esites Working Group Meeting: #210

@rondlg
Member Author

rondlg commented May 12, 2022

From [CP] 13 April 2022

"Hi Wyatt,

Was hoping to give you an update sooner but the right “button was pressed” at the database users meeting and we have the beginnings of a reply!

So, nearly 2 weeks ago I took the plan and the questions regarding islands, use of Maori names, and PD2s/3s to the Regenstein curatorial group. While there was no strong objection to going forward there was also no commitment to approve that plan until we received input from New Zealand in regards to what they are doing and whether they thought this all made sense to them.

That same day I reached out to Te Papa and hadn’t heard a response. So I was wondering if you were going to poke me first or if I would get a response first and it was a little bit of both :)

So, Sharon, you bringing it up at the meeting yesterday activated the process!

So, this morning an e-mail did come in but only an acknowledgement that we would get a reply soon. So I guess that is positive in getting an idea of what NZ is doing and buy in from Pacific team.

One question we can resolve a bit here is a question we had on the diacritics. Wyatt had proposed to insert the diacritic a on the Maori name for the North Island. I had already previously entered it in the sites for those Anthro records from there - and that is reflected in the audit trail. But when North Island of NZ was being worked on in sites that diacritic was removed. So the question is whether there was a reason that diacritic came out - did it pose a problem for EMu etc. or was it removed for some other reason?

@sharon I wasn’t able to find time on your calendar yet for the non-NZ related / backend type questions I previously mentioned. Things pretty much exploded during SAA week and residual activity after.

I also think it makes sense for us (you, Wyatt, and I) to quickly touch base once we get word from Te Papa. Now that the e-mail I sent off nearly 2 weeks ago has been acknowledged, I’m hoping a response we can take action on is imminent!

Chris"

@rondlg
Member Author

rondlg commented May 12, 2022

13 April 2022

Diacritics were replaced.

@rondlg
Member Author

rondlg commented May 12, 2022

FROM [WG] 13 April 2022

For a refresher from my end: for the first step I am looking to merge all North Island site records into one, and all South Island records into one. The list of sites affected is below; anything with one of these in the parent hierarchy would also be affected. The exact summary string might depend on any restructuring of the top level country/island group records (separate conversation, I think). I made this list a couple of weeks ago, so any very recent changes might not be reflected.

For North Island, I want to merge all of the following site records into one record with an Island name of "North Island (Te Ika-a-Māui)". All the other records would get deleted after the merge since they are duplicates.

18712 Oceania, New Zealand, North Island. [LL] {linked to 359 catalogue, 134 CE, 1022+93* sites}
629987 Aotearoa (New Zealand Archipelago), North Island (Te Ika-a-Maui) {linked to 26 sites}
449559 Oceania, New Zealand, North Island. [LL] {linked to 2 CE}
363038 Oceania, New Zealand, North I. [LL] {linked to 1 catalogue, 2 CE, 4 sites}
287443 Oceania, New Zealand, North Id. [LL] {linked to 6 catalogue, 6 CE, 19 sites}

For South Island, a similar approach: merge into one record with an Island name of "South Island (Te Waipounamu)". Sharon fixed the one Malaysian record that was linked here.

18715 Oceania, New Zealand, South Island. [LL] { linked to 375 catalogue, 124 CE, 3 sites}
630400 South Island {linked to 1810 sites}
630402 South Island [Te Waipounamu] {linked to 17 sites}
631144 South Island {linked to 7 sites only}
630401 Aotearoa (New Zealand Archipelago), South Island (Te Waipounamu) {linked to 10 sites}
630521 Aotearoa (New Zealand Archipelago), Te Waipounamu (South Island) {linked to 1 site}
633824 Oceania, New Zealand, South Island. [LL] {linked to 126 sites}
287483 Oceania, New Zealand, South Is. [LL] {linked to 1 catalogue, 1 CE, 2 sites}
633823 Oceania, New Zealand, South I. [LL] {linked to 4 sites}
287466 Oceania, New Zealand, South Id. [LL] {linked to 7 catalogue, 7 CE, 7 sites}

-Wyatt
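
A rough way to write the merge plan down so it can be checked before anyone touches EMu. This is only a sketch: `MergePlan` and `cleaned_duplicates` are made-up names, nothing here talks to EMu, and the choice of 18712 and 18715 as the surviving master records is taken from later comments in this thread rather than from the lists above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MergePlan:
    """One master site record plus the duplicate IRNs to fold into it."""
    master_irn: int
    island_name: str
    duplicate_irns: list[int] = field(default_factory=list)

    def cleaned_duplicates(self) -> list[int]:
        """Drop repeated IRNs and the master itself, keeping list order."""
        seen = {self.master_irn}
        result = []
        for irn in self.duplicate_irns:
            if irn not in seen:
                seen.add(irn)
                result.append(irn)
        return result


# IRNs copied from the lists in this comment. The masters (18712, 18715)
# are an assumption based on later comments in the thread.
north = MergePlan(18712, "North Island (Te Ika-a-Māui)",
                  [629987, 449559, 363038, 287443])
south = MergePlan(18715, "South Island (Te Waipounamu)",
                  [630400, 630402, 631144, 630401, 630521,
                   633824, 287483, 633823, 287466])

for plan in (north, south):
    print(plan.master_irn, plan.island_name, plan.cleaned_duplicates())
```

Keeping the plan as data makes it easy to spot an IRN that was listed twice, or a master that slipped into its own duplicate list, before the merge is run.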

@rondlg added the esites and data cleanup labels May 12, 2022
@rondlg assigned rondlg and unassigned rondlg May 12, 2022
@CPhilippField

Sent inquiry to Arapata Hakiwai of Te Papa in regards to how Te Papa deals with data covered in this issue.

Arapata replied and forwarded the inquiry to Dougal Austin, who in turn shared it with Kirsty Cox. The reply was:

Tena koe Christopher,

Due to the technical nature of your inquiry about our database I checked with our Manager, Collections Information System, Kirsty Cox. And she responded with the following including advising on what our system does here at Te Papa and touching also on some of the difficulties involved:

Nga mihi

Dougal Austin

Kia ora Dougal

Regarding place names in Humanities records (including Mātauranga Māori) we primarily utilise the Getty's Thesaurus for Geographic Names (TGN). This of course doesn't always have a 'full' list of all Aotearoa/New Zealand place names and certainly doesn't always have the correct macronisation. However within our version of EMu we have corrected existing place names pulled through from TGN and added the correct macronisation. Note this is rather ad hoc and only when noticed by staff.

Also if you have a quick chat to Safua and Sean in the Pacific Cultures team we have done the same to Sāmoan place names and even manually added new place names that were not previously in our EMu Thesaurus module and added more specific latitude and longitude details to allow for more precise/accurate mapping of the collections.

Ideally we would be feeding these updates to Getty to revise TGN but we haven't had the resources to do so. I plan to get to this within the next financial year (hopefully!).

I should also add that apart from the manual updates to existing TGN terms in our Thesaurus module we haven't done a bulk update of the entire TGN dataset since we first acquired EMu in 2005, and it is now very out of date. This is of importance as this thesaurus is regularly updated every month. However to do this on a regular basis in EMu is very expensive - hence instead as a workaround we manually update and insert new place names. For example we requested that the Ngā Upoko Tukutuku thesaurus be imported and were quoted NZD$20k.

Regarding ISO 3166 country codes (which is what I believe Christopher is referring to) I don't believe the TGN complies with this as there is no mention of ISO compliance in their info page about TGN (https://www.getty.edu/research/tools/vocabularies/tgn/about.html). The TGN is however compliant with Linked Open Data (LOD) which basically allows for more seamless searching of the 'same' things online - Wikidata is an example of this.

The New Zealand Geographic Board (NZGB) has been officially updating names throughout New Zealand and the Chathams with dual/multiple names, which means that the Māori, English and Moriori names for a place are officially recognised. At Alexander Turnbull Library I was updating all our place names when they were officially updated by NZGB with the dual/multiple official names.

I hope this helps. If Christopher, Arapata or yourself have any further questions please don't hesitate to contact me.

Ngā mihi, nā
Kirsty Cox
Manager, Collections Information System

Category : EMu
Description :

Kia ora Kirsty,

This database inquiry from Chicago has come in via Arapata. It's quite detailed and I'm hoping you can help! But I do like where their thinking is at and maybe we should be doing some of these things if we are not already.

Ngā mihi

Dougal


From: Arapata Hakiwai <ArapataH@tepapa.govt.nz>
Sent: Thursday, May 19, 2022 8:47:46 PM
To: Christopher Philipp <cphilipp@fieldmuseum.org>; Dougal Austin <Dougala@tepapa.govt.nz>
Cc: Asha Nath <Asha.Nath@tepapa.govt.nz>
Subject: RE: Greetings and Database questions

Kia ora ra Christopher and sincere apologies for not getting back to you earlier. Your suggestions regarding the locations and their names look great and they certainly align with many of ours. I've included our senior curator Mātauranga Māori Dougal Austin into this email as he can advise you what classifications we use regarding the various names. We also adopt macrons on our Māori words as these are correct in how to write the names as well as their value in how they should be pronounced and accentuated.

Ngā mihi,

Arapata

Dr Arapata Hakiwai
Kaihautū
Museum of New Zealand Te Papa Tongarewa

Executive Assistant : Asha Nath

DDI: +64 4 381 7171 | Email: asha.nath@tepapa.govt.nz

@rondlg
Member Author

rondlg commented Jun 2, 2022

@CPhilippField we don't use the thesaurus for localities BUT if Te Papa are willing to share their data with us, between us we could work on getting it uploaded to sites. Just a thought.

@rondlg
Member Author

rondlg commented Jun 29, 2022

Hey @CPhilippField

While we wait to get more information from Te Papa on the correct forms for the name, can we go ahead as Wyatt suggests for merging the records?

@rondlg
Member Author

rondlg commented Jun 29, 2022

@CPhilippField

Hi Sharon, I think it is fine to merge the North Island records and the South Island records and to have them listed as:
North Island (Te Ika-a-Māui)
South Island (Te Waipounamu)
From their initial responses they were supportive of us using the Māori names and diacritics.

On that note could we also add the ā to all the Samoa and American Samoa records. I've already added it to the cultural attributions for Sāmoan and Māori to start.
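
For finding records that differ only by a macron (for example "Te Ika-a-Maui" vs "Te Ika-a-Māui", or "Samoa" vs "Sāmoa"), something like the following could help. It is a sketch using Python's standard `unicodedata` module; `fold_macrons` and `differ_only_by_diacritics` are hypothetical helper names, and the folded form is meant for comparison only, never for storage.

```python
import unicodedata


def fold_macrons(name: str) -> str:
    """Return name with combining marks removed, for comparison only."""
    decomposed = unicodedata.normalize("NFD", name)
    stripped = "".join(ch for ch in decomposed
                       if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)


def differ_only_by_diacritics(a: str, b: str) -> bool:
    """True when two names differ as written but match once folded."""
    a_nfc = unicodedata.normalize("NFC", a)
    b_nfc = unicodedata.normalize("NFC", b)
    return a_nfc != b_nfc and fold_macrons(a) == fold_macrons(b)


print(differ_only_by_diacritics("North Island (Te Ika-a-Maui)",
                                "North Island (Te Ika-a-Māui)"))
print(fold_macrons("Sāmoa"))
```

Note that this strips every combining mark, not just macrons, so it would also treat other accented spellings as matches.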

@wyattgaswick

Thank you Sharon for prodding me on this! Sorry for the delay.

North Island is restructured and partly merged - I don't have permissions to merge and delete some of these records, so I will have to leave that step in Sharon's hands. If there aren't any complaints in the next couple of days, I'll handle South Island in the same way. Details:

  • 18712 is the new restructured North Island master record. Summary string is a little awkward to my eyes but am not sure if there is a better way to structure the record:
    Oceania, New Zealand, Aotearoa (New Zealand Archipelago), North Island (Te Ika-a-Māui). [LL]

  • 629987 is merged & deleted

  • 363038, + 287443, + 449559 : I do not have permissions to merge & delete those because they are linked to CEs. Sharon, maybe you can merge these into 18712 and then delete?

Thanks!

@rondlg
Member Author

rondlg commented Aug 5, 2022

363038, + 287443, + 449559 are now merged into 18712

@wyattgaswick

18715 is the master site for South Island.
Oceania, New Zealand, Aotearoa (New Zealand Archipelago), South Island (Te Waipounamu). [LL]

Sharon : I couldn't merge&delete 287483 or 287466. Can you do those two for me? Thanks!

I suppose that next on the list is standardizing the regions! I'll compile a list of my suggested merge/deletes for that.

@rondlg
Member Author

rondlg commented Aug 16, 2022

@wyattgaswick 287483 and 287466 are merged and deleted.

@wyattgaswick

Next step : structuring the primary divisions within New Zealand. I'll list duplicates to combine after we're happy with how we want these...it's too much to do all in one chunk.

Main sources: [https://en.wikipedia.org/wiki/Regions_of_New_Zealand], [https://en.wikipedia.org/wiki/List_of_administrative_divisions_by_country]

Regions: pretty clean already, but need Māori names if we want to be consistent. There are 16 of these, each with the New Zealand country record (irn 4442) as political parent already. Each has either North Island (18712) or South Island (18715) as island parent. All have rank = "Region" already.

IRN      Current Name        Proposed Name
North Island:
336502   Wellington          Wellington (Te Whanga-nui-a-Tara)
336501   Manawatu-Wanganui   Manawatū-Whanganui
18927    Hawke's Bay         Hawke's Bay (Te Matau-a-Māui)
336498   Gisborne            Gisborne (Te Tai Rāwhiti)
336497   Bay of Plenty       Bay of Plenty (Te Moana-a-Toi)
336499   Waikato             [no change]
336496   Northland           Northland (Te Tai Tokerau)
4447     Auckland            Auckland (Tāmaki-makau-rau)
336500   Taranaki            [no change]
South Island:
633825   Southland           Southland (Murihiku)
593360   West Coast          West Coast (Te Tai Poutini)
19943    Otago               Otago (Ōtākou)
19507    Canterbury          Canterbury (Waitaha)
336508   Marlborough         Marlborough (Te Tauihu-o-te-waka)
336509   Tasman              Tasman (Te Tai-o-Aorere)
19700    Nelson              Nelson (Whakatū)

New Zealand Island groups outside of these primary regions: All will get New Zealand country record as parent. These would not be included in the term "Aotearoa" as I understand it, so they would have no island group parent (but please correct me if I am wrong). The IRNs listed are for the best existing match I can find to the end goal (please update me if I've missed a good master record candidate).

IRN        Current Rank       Current Name          Proposed Name
630983     Island Group       Kermadec Islands      Kermadec Islands (Rangitāhua)
631168     Island Group       Three Kings Islands   Three Kings Islands (Manawatāwhi)
630848     Island Group       Chatham Islands       Chatham Islands (Rēkohu) [this is the Moriori name; the Māori name is Wharekauri, but it seems like Moriori would be the best language to use]
Wikipedia lists the following 5 as "New Zealand Subantarctic Islands", but I would avoid using this term since it looks unnatural and we aren't using it so far.
630772     Island Group       Auckland Islands      Auckland Islands (Motu Maha)
628973     Island Group       Campbell Islands [currently wrong parent]   [no change]
386422     Precise Locality   Snares Islands        Snares Islands (Tini Heke)
[to add]   -                  -                     Antipodes Islands (Moutere Mahue)
[to add]   -                  -                     Bounty Islands (Moutere Hauriri)

Island not included here:
Stewart Island (Rakiura). It's part of the Southland region, so I would make it a child of that Region record. This would technically make it underneath South Island in the hierarchy...but otherwise I would have to make a second separate Southland record for on/off of South Island, which I would really prefer to avoid because it'll be confusing. If there is strong disagreement, I can implement a two-record Southland system.
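
To illustrate the one-record Southland option, here is a small sketch of how a summary string could fall out of the parent chain. It assumes a single parent per record, which is a simplification: the thread describes separate political and island parents. `Site` and `summary` are hypothetical names, and the trailing ". [LL]" seen in the real summary strings is left out because its meaning is not stated here.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Site:
    name: str
    parent: Optional[Site] = None

    def summary(self) -> str:
        """Join names from the top of the hierarchy down to this record."""
        names = []
        node: Optional[Site] = self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return ", ".join(reversed(names))


# Hypothetical single-parent chain built from names used in this thread.
oceania = Site("Oceania")
nz = Site("New Zealand", oceania)
aotearoa = Site("Aotearoa (New Zealand Archipelago)", nz)
south_island = Site("South Island (Te Waipounamu)", aotearoa)
southland = Site("Southland (Murihiku)", south_island)
stewart = Site("Stewart Island (Rakiura)", southland)

print(stewart.summary())
```

With one Southland record, Stewart Island simply inherits South Island in its summary; a two-record system would need a second Southland node with a different parent.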

Other things not included here that are part of the "Realm of New Zealand":
Ross Dependency (part of Antarctica)
Tokelau (country)
Niue (country)
Cook Islands (country)

Duplicates:
This is the next step if everyone is happy with the structure and name changes outlined above. Please suggest/comment otherwise! Changes can always be made later but it'd be nice to get it right from the start.

@rondlg
Member Author

rondlg commented Aug 26, 2022

Regions: Definitely include the Māori names
New Zealand Island groups outside of these primary regions: As these are island groups in and of themselves they can stay without parents, though they can themselves be parents and have sub-groups.
New Zealand Subantarctic Islands: We can do these as a separate Antarctica task.
Southland system: One record gets my vote.

@wyattgaswick

The name changes listed above have been made.

The new Stewart Island (Rakiura) record is 630451. There are two other Stewart Island records that I would merge & delete except one has a site identifier [KC98] so I am going to ask fishes/zoology permission first. That record is 593395 (Stewart Island, Precise Locality) which has the other duplicate as an island parent (630452, Island).

@rondlg
Member Author

rondlg commented Oct 11, 2022

630452 is merged

@wyattgaswick

Merging a few region/pd2 duplicates that I think are simple enough to be non-controversial. I don't have permissions and need IT help to merge the following duplicate records into the proper regions:

Region: 19507 Oceania, New Zealand, Aotearoa (New Zealand Archipelago), South Island (Te Waipounamu), Canterbury (Waitaha). [LL]

Duplicate to merge into 19507:
711854 [pd2] Oceania, New Zealand, Canterbury Region

Region: 633825 Oceania, New Zealand, Aotearoa (New Zealand Archipelago), South Island (Te Waipounamu), Southland (Murihiku). [LL]

Duplicate to merge into 633825:
428057 [Region] Oceania, New Zealand, Aotearoa (New Zealand Archipelago), South Island (Te Waipounamu), Southland Region (Murihiku). [LL]

Ty!

@rondlg
Member Author

rondlg commented Aug 15, 2023

@wyattgaswick You should have edit permissions to:

  • 19507
  • 711854
  • 633825
  • 428057

@wyattgaswick

Ah the thing is - I have permissions for the sites, but I don't have the permissions to edit all of the attached CEs etc that are affected during the merge.
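
A toy model of why site permissions alone aren't enough: a merge has to re-point every record linked to the duplicate, so it touches other modules too. This is not EMu's API or its actual merge logic, just an in-memory sketch; `merge_site`, the link table, and the record ids in it are all hypothetical.

```python
def merge_site(master_irn, duplicate_irn, links, editable_modules):
    """Re-point every record linked to duplicate_irn at master_irn.

    links maps (module, record_id) -> site IRN. Raises PermissionError
    before changing anything if a linked record sits in a module the
    user cannot edit. Returns the number of links that were re-pointed.
    """
    affected = [key for key, irn in links.items() if irn == duplicate_irn]
    blocked = sorted({module for module, _ in affected
                      if module not in editable_modules})
    if blocked:
        raise PermissionError(f"no edit rights in: {', '.join(blocked)}")
    for key in affected:
        links[key] = master_irn
    return len(affected)


# Hypothetical link table: (module, record id) -> site IRN.
links = {("esites", 1): 711854, ("CE", 7): 711854, ("CE", 8): 19507}

try:
    merge_site(19507, 711854, links, editable_modules={"esites"})
except PermissionError as err:
    print("blocked:", err)

print(merge_site(19507, 711854, links, editable_modules={"esites", "CE"}))
```

The first call fails without changing anything because one linked record is a CE; the second succeeds once CE edit rights are included.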

@rondlg
Member Author

rondlg commented Aug 15, 2023

I get it. Those records are now merged and the duplicates deleted.


3 participants