Inconsistent Bulgarian transcriptions #9
Could you move this over to the WikiPron issue tracker as well (cf.
https://github.com/kylebgorman/wikipron/issues/138) and follow the
procedure I suggested there?
1. We need to pick one as the UR. Does your informant have any opinions on
the subject?
2. Affricates should be done with the tie, I agree.
3. Let's assume light /l/ is the UR.
If these inconsistencies can't be resolved in time, either they're truly
random noise that will affect all participants equally or there are some
consistencies that systems will glom onto.
In the worst case we'll put out a post-competition revision of the data set
after the fact.
It is fascinating to me how Phoible continues to be unhelpful in resolving
these issues: https://phoible.org/languages/bulg1262.
…On Tue, Apr 14, 2020 at 1:27 PM Ben Peters ***@***.***> wrote:
Similarly to the recent discussion of Georgian, I have noticed various
inconsistencies in the Bulgarian data.
1. Alveolar vs. dental t/d: [d], [d̪], [t], and [t̪] all occur in the
data. There are some near-minimal pairs like туй -> [t̪ u j] and тук -> [t
u k]. My Bulgarian-speaking consultant does not think the alveolar-dental
pairs are allophones, and at any rate their usage seems to be random.
2. Affricates: both joined forms (like t͡s) and separated ones (like t
s) occur in the data.
3. Light vs. dark L: according to this reference
<http://www.personal.rdg.ac.uk/~llsroach/phon2/b_phon/b_phon.htm>, [l]
and [ɫ] are allophones, with [l] before front vowels and [ɫ] elsewhere.
However, their occurrence in the data appears to be basically random.
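One way to put numbers on the three points above is to count how often each variant occurs. The following is a minimal sketch, assuming the data is available as (word, pronunciation) pairs with space-separated phones; the function name `count_variants` and the grouping in `VARIANT_GROUPS` are illustrative assumptions, not part of the task tooling.

```python
from collections import Counter

# Variant pairs named in the report; this grouping is an illustrative assumption.
VARIANT_GROUPS = {
    "t": {"t", "t̪"},
    "d": {"d", "d̪"},
    "l": {"l", "ɫ"},
}


def count_variants(entries):
    """Count each variant per group across (word, pronunciation) pairs."""
    counts = {name: Counter() for name in VARIANT_GROUPS}
    for _word, pron in entries:
        for phone in pron.split():
            for name, variants in VARIANT_GROUPS.items():
                if phone in variants:
                    counts[name][phone] += 1
    return counts


# The two near-minimal pairs quoted in the report.
sample = [("туй", "t̪ u j"), ("тук", "t u k")]
counts = count_variants(sample)
```

A strongly lopsided count for a pair would suggest a clear majority form to normalize to, while a near-even split would be consistent with the variants being used at random.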
Thanks, I'll open an issue on WikiPron. My informant says it's alveodental. The reference I linked above labels them as dental, as does (Klagstad, 1958).
Affricates: both joined forms (like t͡s) and separated ones (like t s)
occur in the data.
Be careful with t + s: in some cases, it can also appear at morpheme
boundaries where it represents two phonemes (written as тс in Bulgarian
orthography). Examples include хърватски h ə r v ɑ t s k i and отсъстваха
o t̪ s ɤ s t̪ v ə x ə. When t + s corresponds to the letter ц, it is
monophonemic and is probably better transcribed as t͡s. It is true
that this is not applied consistently in the data. Examples for the
monophonemic ц from the training data include абстракция a p s t r a k
t͡s i j ə, болница b ɔ l n i t͡s ə, администрация ə d m i n i s t r a t s
i j ə and акцент ə k t s ɛ n t.
Please also note that, interestingly, the letter ч is never represented as
t͡ʃ in the training data, but always as t ʃ, although it is clearly a
single phoneme. Examples include вечер v ɛ t ʃ ɛ r and чужденец t ʃ u ʒ d
ɛ n ɛ t s. I suppose this might be because, unlike with t + s, it does
not contrast with a biphonemic sequence t + ʃ (i.e. тш in Bulgarian
orthography), because that is not a typical phoneme sequence in Bulgarian
and does not appear even once in the training data.
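The orthographic cue described above could be turned into a conservative normalization heuristic. The sketch below is an assumption-laden illustration, not the procedure used for the data: `join_affricates` is a hypothetical helper that always ties t + ʃ (on the premise that тш does not occur) and ties t + s only when the word contains no тс and the number of t + s sequences equals the number of ц letters. It also treats dental and alveolar t alike and collapses both into the tied form, which is itself a simplifying assumption.

```python
def join_affricates(word, pron):
    """Tie t+ʃ always, and t+s only when the spelling points to ц."""
    phones = pron.split()

    def is_t(phone):
        # Dental and alveolar t are treated alike here (an assumption).
        return phone in ("t", "t̪")

    n_ts_pairs = sum(
        1 for a, b in zip(phones, phones[1:]) if is_t(a) and b == "s"
    )
    spelling = word.lower()
    # Merge t+s only if every t+s sequence can be attributed to a ц.
    merge_ts = (
        n_ts_pairs > 0
        and "тс" not in spelling
        and n_ts_pairs == spelling.count("ц")
    )

    out = []
    i = 0
    while i < len(phones):
        nxt = phones[i + 1] if i + 1 < len(phones) else None
        if is_t(phones[i]) and nxt == "ʃ":
            out.append("t͡ʃ")
            i += 2
        elif is_t(phones[i]) and nxt == "s" and merge_ts:
            out.append("t͡s")
            i += 2
        else:
            out.append(phones[i])
            i += 1
    return " ".join(out)


fixed = join_affricates("акцент", "ə k t s ɛ n t")
kept = join_affricates("хърватски", "h ə r v ɑ t s k i")
```

Words that mix ц and тс, or where the counts disagree, are left untouched, so they would still need manual review.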
Sounds like manual human intervention will be required here. This may be
out of scope for the task.
Light vs. dark L: according to this reference, [l] and [ɫ] are allophones,
with [l] before front vowels and [ɫ] elsewhere. However, their occurrence
in the data appears to be basically random.
Regarding /l/, the situation is not so simple either. /l/ has two allophones,
[ɫ] and [l].
In the syllable onset, [ɫ] is used before /u/, /ɔ/, /a/, and /ɤ/, and [l]
is used before /i/ and /ɛ/. In addition, there is a palatal /lʲ/ which is a
phoneme on its own, and which can appear in the syllable onset before /u/,
/ɔ/, and /a/.
In the syllable coda, one would typically expect to find [ɫ] according to
the rule mentioned in the initial post. However, in some cases, one finds
[l] consistently in the coda, and these are words which have a palatal /lʲ/
in other Slavic languages. Examples include болница b ɔ l n i t͡s ə
(compare to Russian больница
<https://en.wiktionary.org/wiki/%D0%B1%D0%BE%D0%BB%D1%8C%D0%BD%D0%B8%D1%86%D0%B0>)
and актуалност ə k t u a l n o s t (compare to Russian актуальность
<https://en.wiktionary.org/wiki/%D0%B0%D0%BA%D1%82%D1%83%D0%B0%D0%BB%D1%8C%D0%BD%D0%BE%D1%81%D1%82%D1%8C>)
and писател p i s a t ɛ l (compare to Russian писатель
<https://en.wiktionary.org/wiki/%D0%BF%D0%B8%D1%81%D0%B0%D1%82%D0%B5%D0%BB%D1%8C>).
To conclude, it seems that while the phoneme /l/ has the two allophones [ɫ]
and [l], the phoneme /lʲ/ also has two allophones, namely [lʲ] and [l]. The
phone [l] is thus an allophone of two different phonemes: it is an
allophone of /l/ in syllable onsets, but an allophone of /lʲ/ in syllable
codas. Since the assumed phonemic distinction between /l/ ([ɫ]) and /lʲ/
([l]) in syllable codas is not reflected in writing, it cannot be easily
checked whether the data is consistent here.
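The onset rule, unlike the coda case, can be checked mechanically. Here is a rough sketch, assuming space-separated phones and treating any lateral directly followed by a vowel as an onset; `lateral_mismatches` and the two vowel sets are hypothetical names that cover only the vowels listed above, so reduced vowels such as ə are not checked.

```python
# Vowel classes as listed in the comment above; other vowels are not checked.
BACK_VOWELS = {"u", "ɔ", "a", "ɤ"}
FRONT_VOWELS = {"i", "ɛ"}


def lateral_mismatches(pron):
    """Return (index, found, next_vowel, expected) for onset laterals
    that contradict the front/back allophone rule."""
    phones = pron.split()
    found = []
    for i, (cur, nxt) in enumerate(zip(phones, phones[1:])):
        if cur == "l" and nxt in BACK_VOWELS:
            found.append((i, cur, nxt, "ɫ"))
        elif cur == "ɫ" and nxt in FRONT_VOWELS:
            found.append((i, cur, nxt, "l"))
    return found


# Coda [l] before a consonant is deliberately not flagged.
coda_case = lateral_mismatches("b ɔ l n i t͡s ə")
```

Laterals before a consonant or at the end of a word are skipped on purpose, since the /l/ versus /lʲ/ distinction there is not recoverable from the spelling.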
FWIW we have no desire to transcribe allophony for this task, but I do see
the issue with the conditional merger into [l] in syllable codas.
Thank you for the reports on both issues.