Code Table Request - Code Table Cleanup: taxonomic names in parts #7752

Open
1 of 8 tasks
Jegelewicz opened this issue May 3, 2024 · 14 comments
Labels
CodeTableCleanup: Our bad data leads to more bad data. Fix it!
Denormalizer: Issue is making data less-accessible
Priority - Wildfire Potential: ignore this at everyone's peril, may smolder for now ...

Comments

@Jegelewicz
Member

Jegelewicz commented May 3, 2024

Goal

Describe what you're trying to accomplish. This is the only necessary step to start this process. The Committee is available to assist with all other steps. Please clearly indicate any uncertainty or desired guidance if you proceed beyond this step.

Currently we have part names that are also taxa. It would be preferable to include these in identifications in order to increase discoverability.

Context

Describe why this new value is necessary and existing values are not.

Now that we have the ability to record more than one accepted identification on a catalog record, I propose that we add the identifications to the affected records and replace the part names with whole organism. The identification order should probably be determined by the collections using the part names.

Table

Code Tables are http://arctos.database.museum/info/ctDocumentation.cfm. Link to the specific table or value. This may involve multiple tables and will control datatype for Attributes. OtherID requests require BaseURL (and example) or explanation. Please ask for assistance if unsure.

https://arctos.database.museum/info/ctDocumentation.cfm?table=ctspecimen_part_name

Proposed Value

Proposed new value. This should be clear and compatible with similar values in the relevant table and across Arctos.

 current part name | new part name  | new part remark | add record identification           | add identification remark
-------------------+----------------+-----------------+-------------------------------------+----------------------------------------
 acanthocephala    | whole organism | acanthocephala  | Acanthocephala                      | from original part name acanthocephala
 cestode           | whole organism | cestode         | Cestoda                             | from original part name cestode
 ectoparasite      | whole organism | ectoparasite    | requires research to select a taxon | from original part name ectoparasite
 endoparasite      | whole organism | endoparasite    | requires research to select a taxon | from original part name endoparasite
 nematode          | whole organism | nematode        | Nematoda                            | from original part name nematode
 trematode         | whole organism | trematode       | Trematoda                           | from original part name trematode
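
As a rough illustration of what the table above proposes, the mapping can be written down as a small lookup. This is only a sketch: the field names (`new_part_name`, `new_part_remark`, and so on) and the `propose_change` helper are hypothetical and are not Arctos column names or code. Rows where the table says a taxon "requires research" are represented with `None`.

```python
from typing import Optional

# Current part name -> taxon proposed in the table above.
# None marks rows where the table says "requires research to select a taxon".
PROPOSED_TAXON: dict[str, Optional[str]] = {
    "acanthocephala": "Acanthocephala",
    "cestode": "Cestoda",
    "ectoparasite": None,
    "endoparasite": None,
    "nematode": "Nematoda",
    "trematode": "Trematoda",
}


def propose_change(part_name: str) -> Optional[dict]:
    """Return the proposed edits for one part, or None if the part is unaffected.

    The keys of the returned dict are illustrative labels only.
    """
    if part_name not in PROPOSED_TAXON:
        return None
    taxon = PROPOSED_TAXON[part_name]
    return {
        "new_part_name": "whole organism",
        "new_part_remark": part_name,
        "add_identification": taxon,
        "needs_research": taxon is None,
        "identification_remark": f"from original part name {part_name}",
    }


if __name__ == "__main__":
    for name in ("cestode", "ectoparasite", "skull"):
        print(name, "->", propose_change(name))
```

Note that the original part name survives only in the two remark fields, which is the point several commenters below object to.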

Alternatively, we could add associated species attributes.

For an example of how this would look (both as an identification and as associated species) see https://arctos.database.museum/guid/UTEP:Herp:11727

The first option is more powerful for taxonomy searching. Nobody is going to find the tick as long as the taxonomy is buried in a part name or an attribute.

Proposed Definition

Clear, complete, non-collection-type-specific functional definition of the value. Avoid discipline-specific terminology if possible, include parenthetically if unavoidable.

N/A

Collection type

Some code tables contain collection-type-specific values. collection_cde may be found from https://arctos.database.museum/home.cfm

N/A

Attribute Extras

Attribute data type

If the request is for an attribute, what values will be allowed?
free-text, categorical, or number+units depending upon the attribute (TBA)

N/A

Attribute controlled values

If the values are categorical (to be controlled by a code table), add a link to the appropriate code table. If a new table or set of values is needed, please elaborate.

N/A

Attribute units

If numerical values should be accompanied by units, provide a link to the appropriate units table.

N/A

Part preservation attribute effect on "tissueness"

If a new part preservation is requested, please add the effect it would have on "tissueness": No Influence, Allows, or Denies

N/A

Priority

Please describe the urgency and/or choose a priority-label to the right. You should expect a response within two working days, and may utilize Arctos Contacts if you feel response is lacking.

Example Data

Requests with clarifying sample data are generally much easier to understand and prioritize. Please attach or link to any representative data, in any form or format, which might help clarify the request.

@dustymc please provide a list of use

Available for Public View

Most data are by default publicly available. Describe any necessary access restrictions.

N/A

Helpful Actions

  • Add the issue to the Code Table Management Project.

  • Please reach out to anyone who might be affected by this change. Leave a comment or add this to the Committee agenda if you believe more focused conversation is necessary.

@ArctosDB/arctos-code-table-administrators

Approval

All of the following must be checked before this may proceed.

The How-To Document should be followed. Pay particular attention to terminology (with emphasis on consistency) and documentation (with emphasis on functionality). No person should act in multiple roles; the submitter cannot also serve as a Code Table Administrator, for example.

  • Code Table Administrator[1] - check and initial, comment, or thumbs-up to indicate that the request complies with the how-to documentation and has your approval
  • Code Table Administrator[2] - check and initial, comment, or thumbs-up to indicate that the request complies with the how-to documentation and has your approval
  • DBA - The request is functionally acceptable. The term is not a functional duplicate, and is compatible with existing data and code.
  • DBA - Appropriate code or handlers are in place as necessary. (ID_References, Media Relationships, Encumbrances, etc. require particular attention)

Rejection

If you believe this request should not proceed, explain why here. Suggest any changes that would make the change acceptable, alternate (usually existing) paths to the same goals, etc.

  1. Can a suitable solution be found here? If not, proceed to (2)
  2. Can a suitable solution be found by Code Table Committee discussion? If not, proceed to (3)
  3. Take the discussion to a monthly Arctos Working Group meeting for final resolution.

Implementation

Once all of the Approval Checklist is appropriately checked and there are no Rejection comments, or in special circumstances by decree of the Arctos Working Group, the change may be made.

  • Review everything one last time. Ensure the How-To has been followed. Ensure all checks have been made by appropriate personnel.

  • Add or revise the code table term/definition as described above. Ensure the URL of this Issue is included in the definition. URLs should be included as text, separated by spaced pipes. Do not include HTML in definitions.

Close this Issue.

DO NOT modify Arctos Authorities in any way before all points in this Issue have been fully addressed; data loss may result.

Special Exemptions

In very specific cases and by prior approval of The Committee, the approval process may be skipped, and implementation requirements may be slightly altered. Please note here if you are proceeding under one of these use cases.

  1. Adding an existing term to additional collection types may proceed immediately and without discussion, but doing so may also subject users to future cleanup efforts. If time allows, please review the term and definition as part of this step.
  2. The Committee may grant special access on particular tables to particular users. This should be exercised with great caution only after several smooth test cases, and generally limited to "taxonomy-like" data such as International Commission on Stratigraphy terminology.
@Jegelewicz Jegelewicz added Function-CodeTables CodeTableCleanup Our bad data leads to more bad data. Fix it! Priority - Wildfire Potential ignore this at everyone's peril, may smolder for now ... labels May 3, 2024
@Jegelewicz Jegelewicz added this to the Need More Information milestone May 3, 2024
@Jegelewicz Jegelewicz changed the title Code Table Request - Code Table Cleanup: taxonomic names Code Table Request - Code Table Cleanup: taxonomic names in parts May 3, 2024
@dustymc
Contributor

dustymc commented May 3, 2024

list of use

Data:

temp_tax.csv.zip

Summary:

 guid_prefix | count 
-------------+-------
 MVZ:Mamm    |   779
 UCM:Mamm    |    11
 NMMNH:Mamm  |   867
 ASNHC:Mamm  |   127
 ASNHC:Bird  |     7
 DMNS:Bird   |  1122
 UWYMV:Mamm  |     2
 NMU:Mamm    |  1911
 DGR:Mamm    |     5
 DMNS:Herp   |     1
 DMNS:Mamm   |  2322
 MLZ:Mamm    |    19
 CRCM:Mamm   |     1
 UAM:Mamm    | 11518
 MSB:Mamm    | 38311
 UWBM:Mamm   |    17
 MSB:Fish    |     2
 MVZ:Herp    |    82
 CHAS:Mamm   |     1
 MSB:Herp    |    33
 CHAS:Teach  |     1
 UMZM:Mamm   |     7
 UCM:Bird    |    17
 UMNH:Mamm   |   671
 MSB:Bird    |   715
 UTEP:Herp   |    12
 CHAS:Bird   |     2
 DGR:Bird    |     1
 UWYMV:Bird  |    21
 UMNH:Herp   |     4
 UCM:Herp    |     2
 KSB:Mamm    |   238
 UMZM:Bird   |    16
 ASNHC:Herp  |     7
 MVZ:Bird    |   280
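
For a quick sense of scale, the per-collection counts above can be totaled and ranked. The snippet below is a minimal sketch that copies the summary rows into a string and parses them; `SUMMARY` and `parse_counts` are names made up for this example, and nothing here queries Arctos.

```python
# Rows copied from the summary above, as "guid_prefix|count".
SUMMARY = """
MVZ:Mamm|779
UCM:Mamm|11
NMMNH:Mamm|867
ASNHC:Mamm|127
ASNHC:Bird|7
DMNS:Bird|1122
UWYMV:Mamm|2
NMU:Mamm|1911
DGR:Mamm|5
DMNS:Herp|1
DMNS:Mamm|2322
MLZ:Mamm|19
CRCM:Mamm|1
UAM:Mamm|11518
MSB:Mamm|38311
UWBM:Mamm|17
MSB:Fish|2
MVZ:Herp|82
CHAS:Mamm|1
MSB:Herp|33
CHAS:Teach|1
UMZM:Mamm|7
UCM:Bird|17
UMNH:Mamm|671
MSB:Bird|715
UTEP:Herp|12
CHAS:Bird|2
DGR:Bird|1
UWYMV:Bird|21
UMNH:Herp|4
UCM:Herp|2
KSB:Mamm|238
UMZM:Bird|16
ASNHC:Herp|7
MVZ:Bird|280
"""


def parse_counts(text: str) -> dict[str, int]:
    """Parse 'guid_prefix|count' lines into a dict."""
    counts = {}
    for line in text.strip().splitlines():
        prefix, count = line.split("|")
        counts[prefix.strip()] = int(count)
    return counts


counts = parse_counts(SUMMARY)
total = sum(counts.values())
largest = max(counts, key=counts.get)

print(total)                     # 59132
print(largest, counts[largest])  # MSB:Mamm 38311
```

The total of 59,132 is the same figure cited later in the thread, and roughly two thirds of the affected parts are in MSB:Mamm.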

Ping:

@jrdemboski
@keg34
@amgunderson
@jldunnum
@catherpes
@AdrienneRaniszewski
@campmlc
@jtgiermakowski
@ccwlobo
@adhornsby
@ccicero
@mkoo
@cjconroy
@atrox10
@jebrad
@ewommack
@mvzhuang
@acdoll
@lin-fred
@ebraker
@droberts49
@kderieg322079
@wellerjes
@rwilhoyt
@jessicatir

@cjconroy

cjconroy commented May 3, 2024

There seems to be a second change here: getting rid of the part ectoparasite and the part endoparasite. This list for MVZ mammals is not a list of taxa that are actually parasites. It is a list of all our endo- and ectoparasites. We've been using those parts for many years. I think relegating that to a remark is going to be missed by a lot of people.

@campmlc

campmlc commented May 3, 2024

Just curious why this is hurting anything? Why not have the more specific info when we have it? This affects our MSB collections in a major way. Not a good time to emphasize this.

@campmlc

campmlc commented May 3, 2024

Besides, these are just barely "taxonomic names". It's like having a part name "Chordata" or "Aves". This primarily affects MSB, and we have this level of specificity for a reason. It is not a good time for us to be tackling this particular issue. Surely there are other, easier projects to focus on.

@campmlc

campmlc commented May 3, 2024

I strongly oppose this because of "resources". We do not want to lose more specific info and mush it into uninformative blah without a good reason.

@jldunnum

jldunnum commented May 3, 2024 via email

@catherpes

catherpes commented May 3, 2024 via email

@Jegelewicz
Member Author

Quoting @campmlc: "Just curious why this is hurting anything?"

Arctos proposes to offer "research grade data". The list that Dusty provided represents at least 59,132 organisms that cannot be found in a taxonomic search and at least 59,132 whole organism parts that will not be found in a search for whole organisms.

Arctos also proposes to be based on the idea of data normalization. For me, recording a taxonomic classification in a part name is a clear violation of the effort to eliminate redundancy and inconsistent dependencies.

Anyone looking for ticks to study will never find any of the ticks included in this list unless they happen to know that there are two ways to find ticks in Arctos. In addition, they would need to sort through comments on thousands of "ectoparasite" parts to see if any of them are actually ticks. I think that is hurting everyone.

@campmlc

campmlc commented May 4, 2024

That isn't correct. The parasites are being cataloged as separate parasite records with embedded higher classification as we speak. But this happens after they are collected and cataloged as parts of the host. We need the specificity of these part names to facilitate the transfer and cataloging of the related whole organism records. We do not want to mush these all into some generic part name in the host record as that would impede this process. And it is actually possible to find cestodes and ticks from the host records now.

@campmlc

campmlc commented May 4, 2024

MSB has developed an entire protocol from field collection to host cataloging to parasite part identification to parasite cataloging that specifically results in independent catalog records linked by searchable relations. This process depends on the specificity of these host part names. What is proposed here would make this already challenging workflow even more challenging, for no purpose. I do not support this request.

@acdoll

acdoll commented May 6, 2024

Quoting @Jegelewicz: "Anyone looking for ticks to study will never find any of the ticks included in this list unless they happen to know that there are two ways to find ticks in Arctos."

Researchers have been finding and requesting these parts for the last 20 years. Is it easy and obvious? No, but they are discoverable. Can we make it easier for them? Yes, but there is a cost.

We only have a few thousand parasite parts and we don't have the resources available for making this change in a reasonable amount of time. UAM and MSB have tens of thousands of records with these parts; we can't expect them to make such a major change if they're not on board. It is a long-term goal to get all of our (DMNS) parasites cataloged as separate records, but that is not our priority right now. Unless there is some structural/stability issue for Arctos here, I don't see why this is a priority (wildfire?).

'Ectoparasite' and 'endoparasite' are such broad (para/polyphyletic) descriptions that, IMHO, they don't really qualify as taxonomic classifications. For many of our older records, that is often all we have recorded for the part, so adding a new identification for these is not helpful. Once we are able to examine and identify these, we will then catalog them and relate those records to the host.

As for the better defined parasite parts, e.g., nematode, moving the names into a remark on 'whole organism' seems to further bury these data, making them less discoverable. Also, for me personally, I don't like having them added to the identifications of the record - it seems to add unnecessary clutter and confusion to the host's record.

I'm open to other suggestions, but, as it stands, I would vote against getting rid of these parts for the time being.

@amgunderson
Contributor

amgunderson commented May 6, 2024

No, immediately no. Many of our parasites are cataloged in our entomology collection as unique records, readily discoverable and identified by experts to appropriate taxa. To make identifications of those parts would essentially duplicate those specimen records. The part in the mammal record indicates that parasites exist for that mammal specimen, making specimens with parasites easily discoverable. There is no problem here.

@campmlc

campmlc commented May 6, 2024

I just did a search as a public user, not logged in, for ticks and mites (Acari) in MSB:Para collections with mammal host records across Arctos. I used the public preset advanced bio geo (note we've requested an even more explicit preset for biotic relationships here: #7725).
I used the following terms: collection = MSB:Para, Any Taxon ID = Acari (subclass for ticks and mites), and Related Item Taxonomy = Mammalia, and I get 4,919 results, all of which are ID'd to some level of classification within Acari and have relationships of "parasite of" to a cataloged mammal record. See below.
If I log in and customize the selected profile by adding Identifiers to that selection, I can add identifier type = AF and get 133 ticks identified to species at MSB:Para that are related to UAM:Mamm hosts. This number will only increase as we continue to build these cross-institutional linkages. This is not a difficult or onerous process to explain to collection staff, researchers or the public.
Creating these relationships is ongoing - we are gradually working through our legacy "ectoparasite" and "endoparasite" host parts (note, we didn't even have these part names when I started in 2011 - parasite info was added to host remarks in various fields if it was added at all). We now have various grant-funded projects to catalog these related parasites in separate parasite collections (which also didn't exist prior to 2011), and we are slowly working through these, including linking to other collections such as UAM, DMNS, UNCG, and NMU. All our new incoming mammal parasites are getting cataloged in real time from the MSB Mamm records - but this depends on the presence of these specific parts and part attributes and container labels which allow us to find and distinguish the different parasite part types associated with each barcode. This is continually improving as we demonstrate the ability of Arctos to handle these types of data, resulting in new funding. This is an active and collaborative process, supported by multiple NSF initiatives, including funding provided explicitly for this purpose from MSB to Arctos.
Any changes at this point to part names etc. will only serve to obfuscate and impede this process and cause confusion and inconsistency. The existing legacy and current part names are a bridge to the gold standard of creating related catalog items. But they can also currently still be used to find parasites in the meantime through multiple pathways, and that is needed given the decades of legacy data and limited availability of resources to make these conversions.
This is also why we have been repeatedly and urgently requesting ways of making this process easier and more transparent: see #6249 #7675 #7725 #7726 as well as related issues all designed to improve these types of queries for both hosts and parasites, e.g. #6507, and urgent handwaving and calls to prevent loss of ability to create these relationships in the first place, e.g. #6738 and #5707 and all associated discussions. These are not being requested in isolation. And these issues heavily impact MSB and related collections trying to innovate and push the boundaries of Arctos as a leader in biodiversity collections management.

Here is the search I just did to find ticks in MSB collections that are parasites of Mammalia as a public user:
image

And the results:
Screenshot 2024-05-06 15 20 48

And results logged in for search on MSB AF records with related UAM mammal hosts of Acari (ticks and mites):

image

And results of above with the related records as urls:

image

@jldunnum

jldunnum commented May 6, 2024

The collections primarily affected by this issue have all weighed in now. They have determined that at this moment making this change would cause far more problems than solutions in terms of current data management and facilitating cataloging of parasite data in the not too distant future.
