[Feature Request]: Support converting Nominal-Text -> Nominal/Ordinal #1633

JorisGoosen · 2022-02-23T12:40:59Z

This has come up often, and we added some feedback for users on why a column can't be converted to another type in jasp-stats/jasp-desktop@769383d, for our internal issue https://github.com/jasp-stats/INTERNAL-jasp/issues/977

#1258 is perhaps related, as is #1581.

Now @EJWagenmakers has also asked me about it, and I think it shouldn't be very hard to do the following:

Support Nominal-Text -> Nominal/Ordinal
Here we drop the original strings (as found in, for instance, a CSV file) and assign an integer value based on the order at the moment of conversion. This will "lose" some information, but that is not so bad.

Converting to scalar would then still fail, because then even the labels would be lost, and I suppose that is not what one wants?
On the other hand, we could also show a message box asking the user whether they are OK with losing the data.
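
A minimal sketch in R of the proposed conversion, assuming the "order at the moment of conversion" is simply the current label order (taken here to be the order of first appearance). The names `textColumn`, `labels`, and `values` are illustrative and are not JASP identifiers.

```r
# Hypothetical nominal-text column, e.g. as read from a CSV file.
textColumn <- c("low", "high", "mid", "low", "high")

# Assumption: the label order at the moment of conversion is the
# order of first appearance.
labels <- unique(textColumn)

# Each string is replaced by the position of its label.
values <- match(textColumn, labels)

# The labels survive as a lookup from integer value back to text.
data.frame(text = textColumn, value = values, label = labels[values])
```

Converting further to scalar would keep only `values`, which is the step at which the labels would be lost.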

And also this: https://github.com/jasp-stats/INTERNAL-jasp/issues/1397

JoKeyser · 2022-02-23T15:43:12Z

Sorry if I'm just adding noise, as I'm out of my comfort zone, but maybe there is a better way than a conversion "based on the order at the moment of conversion"? It seems that this may lead to confusing effects if a user changes, adds, or removes strings.
Maybe there is some "canonical" conversion based on the strings' actual binary representation? If so, would that not be more stable?

vandenman · 2022-02-23T16:05:19Z

I'd do whatever R does when it converts character to factor:

```r
set.seed(123)
chars <- sample(letters, 5)
f <- factor(chars)
# alphabetical
print(data.frame(
  character = chars,
  factor    = f,
  integer   = as.integer(f)
), row.names = FALSE)
#>  character factor integer
#>          o      o       4
#>          s      s       5
#>          n      n       3
#>          c      c       1
#>          j      j       2

chars <- c("汉", "字", letters[seq(3, 1, -1)])
f <- factor(chars)
# no idea what determines the order for the Chinese characters
print(data.frame(
  character = chars,
  factor    = f,
  integer   = as.integer(f)
), row.names = FALSE)
#>  character factor integer
#>         汉     汉       5
#>         字     字       4
#>          c      c       3
#>          b      b       2
#>          a      a       1
```

> Maybe there is some "canonical" conversion based on the strings' actual binary representation

Perhaps that's what R does? I think it just sorts the unique values and uses that to assign integer values.
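
For character input, `factor()` documents its default levels as the unique values sorted into increasing order, and a short check can confirm this. The sketch below uses a made-up vector `x`. Note that sorting strings follows the collation rules of the current locale, so the order of non-ASCII characters can differ between machines.

```r
x <- c("o", "s", "n", "c", "j", "c")

# Default levels chosen by factor()
defaultLevels <- levels(factor(x))

# The same levels built by hand: unique values, sorted
manualLevels <- sort(unique(x))

identical(defaultLevels, manualLevels)
#> [1] TRUE

# The integer codes are positions in that sorted vector
identical(as.integer(factor(x)), match(x, manualLevels))
#> [1] TRUE
```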

JoKeyser · 2022-02-23T18:30:24Z

@vandenman Well, I meant a conversion based on the actual string "value", not its ordered position.
However, I now realize that this idea is limited by the fact that strings can be arbitrarily long, so there is no feasible conversion.
And my issue with using factor() is purely theoretical. It's probably best to use something simple like that and see whether any real-world problems arise.

JorisGoosen · 2022-02-23T21:34:04Z

> Sorry if I'm just adding noise, as I'm out of my comfort zone, but maybe there is a better way than a conversion "based on the order at the moment of conversion"? It seems that this may lead to confusing effects if a user changes, adds, or removes strings.
> Maybe there is some "canonical" conversion based on the strings' actual binary representation? If so, would that not be more stable?

Well, actually this is how we already use them inside analyses: if you change the order of the labels in the variables window, that changes the order in the resulting factor that is fed to the analysis, and thus anything in R that depends on it.

To make that a bit clearer: when we feed the nominal-text column to R now, it is in fact converted into a factor, running from 1 to columnLength in the exact order of the labels as seen in the variables window.

So just using that when converting to nominal and ordinal should be alright. It also lets users decide the order of their scales and the like, which I assume they want (and which we wouldn't get if we just ordered by the strings).
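
As a rough illustration of that behaviour (not JASP's actual code), the sketch below builds a factor from an explicit label order, as one might read it off the variables window. `labelOrder` and the example data are assumptions made for the sake of the example.

```r
answers <- c("agree", "disagree", "neutral", "agree")

# Hypothetical label order as arranged by the user
labelOrder <- c("disagree", "neutral", "agree")
f <- factor(answers, levels = labelOrder)
as.integer(f)
#> [1] 3 1 2 3

# Moving "agree" to the top changes the integer codes
reordered <- factor(answers, levels = c("agree", "disagree", "neutral"))
as.integer(reordered)
#> [1] 1 2 3 1

# For an ordinal column, the same order defines the ranking
ordinal <- factor(answers, levels = labelOrder, ordered = TRUE)
ordinal[1] > ordinal[2]
#> [1] TRUE
```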

shun2wang · 2022-02-23T23:50:02Z

> no idea what determines the order for the Chinese characters

That's OK. The order of Chinese characters is usually not considered because, in quantitative data analysis practice, Chinese characters are generally used as labels rather than treated as values. If ordering is to be considered, I would suggest ordering by value.

vandenman · 2022-02-24T06:50:12Z

> To make that a bit clearer: when we feed the nominal-text column to R now, it is in fact converted into a factor, running from 1 to columnLength in the exact order of the labels as seen in the variables window.

I think this makes sense.

> So just using that when converting to nominal and ordinal should be alright. It also lets users decide the order of their scales and the like, which I assume they want (and which we wouldn't get if we just ordered by the strings).

Also makes sense. There is one edge case though that I would check for. In R, this situation can occur:

```r
f <- factor(as.character(1:11))
f # order from sorting 1:11 as strings
#>  [1] 1  2  3  4  5  6  7  8  9  10 11
#> Levels: 1 10 11 2 3 4 5 6 7 8 9
fSorted <- factor(f, levels = sort(as.numeric(levels(f))))
fSorted # order from sorting 1:11 as numbers
#>  [1] 1  2  3  4  5  6  7  8  9  10 11
#> Levels: 1 2 3 4 5 6 7 8 9 10 11
```

where the default levels (first print) have order 1 10 11 2 ... because of string sorting. The second ordering 1 2 3 4 ... is probably closer to what people expect.
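
One possible way to get that second ordering by default is to sort numerically whenever every label parses as a number, and to fall back to string sorting otherwise. `defaultLevels` is a hypothetical helper, not an existing R or JASP function.

```r
defaultLevels <- function(x) {
  labels <- unique(as.character(x))
  labels <- labels[!is.na(labels)]
  # as.numeric() warns when a label is not a number, so silence that
  numbers <- suppressWarnings(as.numeric(labels))
  if (length(labels) > 0 && !anyNA(numbers)) {
    labels[order(numbers)]   # all labels numeric: sort by value
  } else {
    sort(labels)             # otherwise: plain string sort
  }
}

defaultLevels(as.character(11:1))
#>  [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10" "11"

levels(factor(c("10", "2", "1"), levels = defaultLevels(c("10", "2", "1"))))
#> [1] "1"  "2"  "10"
```

Mixed columns such as `c("1", "2", "other")` take the string-sort branch, so the helper never drops or rewrites a label.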

Also, I'd imagine this is just the default conversion from Nominal-Text to Nominal/Ordinal. Afterward, people should be able to change the order and labels in any way they want.

> That's OK. The order of Chinese characters is usually not considered because, in quantitative data analysis practice, Chinese characters are generally used as labels rather than treated as values. If ordering is to be considered, I would suggest ordering by value.

Sure, but the issue is that we need a consistent way to assign values to text. That text may consist of Chinese characters, Hebrew symbols, or who knows what kind of characters. However, initially, there is no value we can use to order by.
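
One way to make the default independent of the user's locale, sketched here as an assumption rather than as what JASP does, is to sort with `method = "radix"`, which R documents as using C-locale ordering for character vectors. The example sticks to ASCII labels, where that ordering is easy to verify.

```r
labels <- c("b", "A", "a", "B")

# Locale-dependent: the result can differ between systems
sort(labels)

# C-locale ordering: uppercase before lowercase, on every system
sort(labels, method = "radix")
#> [1] "A" "B" "a" "b"

# Use it as the default level order when converting
f <- factor(c("b", "a", "B"), levels = sort(labels, method = "radix"))
as.integer(f)
#> [1] 4 3 2
```

For labels in other scripts this should still give a reproducible order, though not necessarily a meaningful one, so being able to rearrange the labels afterwards remains important.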

tomtomme · 2024-05-24T17:09:47Z

@JorisGoosen
Nominal Text was axed, correct?
So this issue is now solved?

JorisGoosen · 2024-05-24T18:54:21Z

Indeed!

JorisGoosen added the Feature Request label Feb 23, 2022

JorisGoosen self-assigned this Feb 23, 2022

JorisGoosen mentioned this issue Feb 23, 2022

[Feature-Request]: export dataset from REDCap to JASP with labels #1581

Open

3 tasks

tomtomme added the Component: Data Editing label Jan 23, 2024

tomtomme mentioned this issue Jan 23, 2024

SPSS scalar column with labels become nominal-text, which is not practical #1323

Closed

tomtomme assigned vandenman Feb 12, 2024

tomtomme unassigned vandenman May 24, 2024

JorisGoosen closed this as completed May 24, 2024
