Cultures #5223
First of all, looking good impaktor! Glad to see your efforts over the last few years have been so fruitful, I think this easily quintuples the number of names we can generate in Pioneer! (You're upstaging the renderer rewrite PR in terms of line count, but I've only been working on that for six months so I don't really care :D)
I've certainly seen plenty of mismatched names in the 'real world', usually in US immigrants who keep the family surname but give their children "western" / "Christian" first names, so it's not as jarring as you think. I'd think maybe 5-10% of names should follow this pattern of using the location's primary culture for first name and a random minority culture for the last name, but that's not exactly low-hanging fruit.
I've been thinking about the idea of a generalized "early lua init" stage which loads a small subset of our lua scripts (pigui themes, custom system defs, etc.) for use in the game's startup procedures; I think providing a list of culture definitions to the c++ system generation stage would be a good candidate for this stage as well, and would allow C++ system and station generation to create the per-system weights as you described. I'm not sure it's something that will be realized during this release cycle, but it's room for future improvement. Just a random pre-review comment: instead of Also: 'Finish' for
Ha! I hadn't realized until now, that it's 8k lines, in spite of (most languages) only including 100 names per variable. I see Greek is the biggest, thanks to @jimishol. (I assume @nozmajner has us all floored in line counts when pushing models)
Fixed.
Fixed.
Yeah, combining different last and first names would of course be trivial. (I think in the second or third "Ender's Game" book, they live on a Portuguese/Norwegian colony, so you could even specify to mix in a specific way like that, according to some coupling matrix).
Interesting. As long as no other person tries to merge a save bump, there's no rush on this PR. Thanks for the comments!
Seconding the comment about "mismatched" names - with more and more people having a parent each of very different cultures, or moving elsewhere as a child, it's becoming more and more common to see people with "mismatched" names.
I shall miss "Vladimir Pidgeon" 😁
4b4d981 to 7bcbf3b
5afc0c1 to 401aa29
@Web-eWorks I think this is done (except for squashing the commits, which I can do). I haven't changed any code since you looked at it, apart from homogenizing the format, removing debug statements, and some documentation. I only have one concern, and that's the
I did not previously know that some languages and cultures had gendered surnames 🤔
It looks good to me. I'm not great at Lua but it all seems logical, well documented and full of good comments and function explanations so easy to follow along.
data/culture/culture.lua
Outdated
Culture.lookup = {}
print("Random generated names from:")
for k, v in pairs(Culture.weights) do
	Culture.lookup[v.lang.name] = v.lang
	print("* ", k, v.lang.code, v.lang.name, v.lang, v.weight)
end
I'd recommend removing this debug print for merge.
Upped the weight of "misc" (our current names, i.e. contributors / AUTHOR.txt + mixed bag of stuff), and lowered Gaelic. These are the weights (in percent) for each language, descending order:
This moves name generation to data/cultures/, so that first and last names match the language, as well as the sex of first and last name (e.g. for Russian and Greek). The old name gen is now a subset, as a "misc" language. This also brings contributor names up to date in misc.
(Needed for BBS refactor pioneerspacesim#5312, and Cultures based name gen pioneerspacesim#5223)
Introduction
I thought it time to move on this code, that's been festering in my git repo since 2016, and open up for comments and feedback. The idea behind this PR is to improve the name generator, so that:
1. First and last names match. No more "Nakamura Smith", or "Vladimir Pidgeon".
2. Don't spam with unique names. For many languages I could probably find 1000+ different male / female names; however, I want the player to learn which first/last names are typical for each language (and in all languages, 95%+ of people usually find their name in the top 100 most common). Thus I've (in most places) limited myself to the top 100 most common last and first names, giving 300 names per culture (male, female, surname).
3. The third, and main, idea was initially to have each system or station dominated by some language/culture, to give stations some more character (e.g. "Now I'm on a station where NPCs have Russian names"). I got stuck on this for many years, as I wanted to expose this to the custom system scripts; however, I've now decided to leave that to some future contributor, and just have random, but weighted, generated names on each station. Thus only points 1 and 2 are implemented here, and each individual is assigned a culture by some weights I pulled out of my hat, such that English, or Russian, is more likely than Gaelic.
And before someone objects: it's true that airports are "mixed", and our space stations are like airports, but when I'm at Oslo airport, it is probably 80%-90% Norwegian (plus Danes and Swedes). I think one could always have 10% of NPCs sampled from a "random" culture, on top of the dominant one, and Earth stations could of course be completely random, but that's the stuff I gave up on in point 3 above, so it's moot anyway.
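To make the "random, but weighted" idea concrete, here is a minimal sketch of a weighted pick. It is not the PR's actual code: the table layout, the numbers and the function name pickCulture are all made up for illustration. The same function also covers the per-station case, since a station would only need to hand in its own list.

```lua
-- Hypothetical default weights (illustrative numbers, not the PR's values).
local defaultWeights = {
    { name = "English", weight = 50 },
    { name = "Russian", weight = 30 },
    { name = "Gaelic",  weight = 20 },
}

-- Hypothetical per-station override: a mostly Russian station.
local stationWeights = {
    { name = "Russian", weight = 90 },
    { name = "Dutch",   weight = 10 },
}

-- Pick one entry with probability proportional to its weight.
-- `roll` is a number in [0, 1), e.g. the result of math.random().
local function pickCulture(list, roll)
    local total = 0
    for _, entry in ipairs(list) do
        total = total + entry.weight
    end
    local threshold = roll * total
    local running = 0
    for _, entry in ipairs(list) do
        running = running + entry.weight
        if threshold < running then
            return entry.name
        end
    end
    -- Only reached if `roll` was not below 1; fall back to the last entry.
    return list[#list].name
end

print(pickCulture(defaultWeights, math.random()))
print(pickCulture(stationWeights, math.random()))
```

Passing the roll in as an argument, rather than calling math.random() inside, keeps the function usable with whatever random source the caller already has.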
Example
Here I've prefixed each name with its language code, just to show more clearly what's going on:
[image: many_languages]
Implementation
I've added a data/culture folder where each culture adds its own rules and names, as well as a super class in data/cultures/common.lua that is inherited by each language file, and a data/cultures/cultures.lua file to expose them all to the outside world.
All language files are more or less identical, except ru.lua and el.lua, because they override the inherited lastname() function. Also, I've opted for Lua rather than JSON, as different languages might want to do different funky stuff. Perhaps these can be expanded to also include geographical locations, and such.
For now, I've just made the original name generator in libs/NameGen call the new methods:
where rand is optional in the former but not the latter, so it's just a wrapper, and, at the moment, the culture argument defaults according to probabilities specified in Culture.weights (but one can pass in a culture, e.g. "Spanish"). I think it might be almost trivial to have each space station define its own weight matrix and override the default in Culture.weights, somehow, so you could have a 90% Russian, 10% Dutch station, or similar.
To do
This will be Save bump, I assume, so no point in merging this until we get some other save bumping PRs.
Some languages (nb, dk, se, de, is, gd) have non-ASCII characters. Although having a "Gödel station" or "Schrödingers Village" would be cool, it would be hard to search for on a US keyboard. I thought I could replace those at run-time (at least if it's to be used in a station name, as opposed to an NPC name), but Lua encodes those characters as two chars? Either way, I didn't get anywhere, and I'd rather not just "remove" them from the name list.
NameGen:Surname(): Should this be expanded to take an isfemale variable (matters for Slavic and Greek)? It would require some changes on the C++ side (I think), as it's used by system generation code.
Should we keep the old names? I have made a misc.lua "culture" that's the old stuff, but I've excluded it from this PR for now.
Remove debug printout (like pre-pending name with language code)
Squash commits
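For readers wondering what "inherit the super class and override lastname()" could look like, and why an isfemale argument matters for the surname, here is a rough sketch. It is an assumption-laden illustration, not the code in this PR: the class name CultureName, the method signatures, the index argument (used instead of a random source to keep the example deterministic) and the one-name lists are all invented here.

```lua
-- Hypothetical base "class"; stands in for whatever common.lua provides.
local CultureName = {}
CultureName.__index = CultureName

function CultureName.New(def)
    return setmetatable(def, CultureName)
end

-- Default behaviour: the surname does not depend on sex.
function CultureName:lastname(isfemale, index)
    return self.surname[index]
end

function CultureName:fullname(isfemale, index)
    local first = isfemale and self.female or self.male
    return first[index] .. " " .. self:lastname(isfemale, index)
end

-- A language that uses the inherited behaviour unchanged.
local English = CultureName.New({
    male = { "John" }, female = { "Mary" }, surname = { "Smith" },
})

-- A language that overrides lastname(). The rule is deliberately
-- simplified: it only handles surnames like "Ivanov" -> "Ivanova".
local Russian = CultureName.New({
    male = { "Ivan" }, female = { "Anna" }, surname = { "Ivanov" },
})

function Russian:lastname(isfemale, index)
    local name = self.surname[index]
    if isfemale then
        return name .. "a"
    end
    return name
end

print(English:fullname(true, 1))
print(Russian:fullname(true, 1))
print(Russian:fullname(false, 1))
```

Because fullname() calls self:lastname(), the override defined on Russian is picked up automatically, while English falls through to the base version via the metatable.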
Closes #3601
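On the non-ASCII to-do item: Lua strings are plain byte sequences, so in a UTF-8 encoded file a letter like "ö" is stored as two bytes, which is why a per-character replacement sees "two chars". One possible way around it, sketched below, is to match whole multi-byte sequences with a pattern and look them up in a table. The mapping table and the name toAscii are hypothetical and only cover a handful of letters; the file is assumed to be saved as UTF-8.

```lua
-- Illustrative mapping only; extend as needed. Keys are UTF-8 strings.
local asciiMap = {
    ["ö"] = "o", ["Ö"] = "O",
    ["ä"] = "a", ["Ä"] = "A",
    ["å"] = "a", ["Å"] = "A",
    ["ü"] = "u", ["Ü"] = "U",
    ["ø"] = "o", ["Ø"] = "O",
    ["æ"] = "ae", ["ß"] = "ss",
}

-- One multi-byte UTF-8 sequence: a lead byte followed by its
-- continuation bytes. Plain ASCII bytes never match this pattern.
local multiByteChar = "[\194-\244][\128-\191]*"

-- Replace every mapped character; unmapped ones are left untouched,
-- because gsub keeps the original match when the table lookup is nil.
local function toAscii(name)
    return (name:gsub(multiByteChar, asciiMap))
end

print(toAscii("Gödel"))
print(toAscii("Schrödinger"))
```

This keeps the original spelling in the name lists and only derives an ASCII form where one is wanted, e.g. for a searchable station name.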