Inconsistent Bulgarian transcriptions #9

Open
bpopeters opened this issue Apr 14, 2020 · 4 comments

@bpopeters

Similarly to the recent discussion of Georgian, I have noticed various inconsistencies in the Bulgarian data.

  1. Alveolar vs. dental t/d: [d], [d̪], [t], and [t̪] all occur in the data. There are some near-minimal pairs like туй -> [t̪ u j] and тук -> [t u k]. My Bulgarian-speaking consultant does not think the alveolar-dental pairs are allophones, and at any rate their usage seems to be random.
  2. Affricates: both joined forms (like t͡s) and separated ones (like t s) occur in the data.
  3. Light vs. dark L: according to this reference, [l] and [ɫ] are allophones, with [l] before front vowels and [ɫ] elsewhere. However, their occurrence in the data appears to be basically random.
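To put rough numbers on the three points above, something like the following could be run over the data. This is only a sketch: it assumes each entry is a (word, pronunciation) pair with space-separated phones, and the `summarize` helper and the tiny `sample` list are made up for illustration, not part of any existing tool.

```python
from collections import Counter

DENTAL = "\u032a"  # combining bridge below, as in the dental t and d
TIE = "\u0361"     # combining tie bar, as in the joined affricates


def summarize(entries):
    """Count the competing symbols in (word, pronunciation) pairs."""
    counts = Counter()
    tracked = {"t", "d", "t" + DENTAL, "d" + DENTAL, "l", "ɫ"}
    for _word, pron in entries:
        phones = pron.split()
        for phone in phones:
            if phone in tracked:
                counts[phone] += 1
            elif TIE in phone:
                counts["joined affricate"] += 1
        # Adjacent t + s or t + ʃ, with or without the dental diacritic.
        for first, second in zip(phones, phones[1:]):
            if first.rstrip(DENTAL) == "t" and second in ("s", "ʃ"):
                counts["separated t+s or t+ʃ"] += 1
    return counts


# Hypothetical mini-sample built from transcriptions quoted in this thread.
sample = [
    ("туй", f"t{DENTAL} u j"),
    ("тук", "t u k"),
    ("болница", f"b ɔ l n i t{TIE}s ə"),
    ("акцент", "ə k t s ɛ n t"),
]

for symbol, n in sorted(summarize(sample).items()):
    print(symbol, n)
```

Note that the "separated" count does not tell you whether a given t + s is a split affricate or a genuine two-phoneme sequence; it only shows how often the pattern occurs.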
@bpopeters bpopeters changed the title Inconsist Bulgarian transcriptions Inconsistent Bulgarian transcriptions Apr 14, 2020
@kylebgorman
Collaborator

kylebgorman commented Apr 15, 2020 via email

@bpopeters
Author

Thanks, I'll open an issue on WikiPron.

My informant says it's alveodental. The reference I linked above labels them as dental, as does (Klagstad, 1958).

@besou

besou commented Apr 16, 2020

> Affricates: both joined forms (like t͡s) and separated ones (like t s) occur in the data.

Be careful with t + s: in some cases, it can also appear at morpheme boundaries, where it represents two phonemes (written as тс in Bulgarian orthography). Examples include хърватски h ə r v ɑ t s k i and отсъстваха o t̪ s ɤ s t̪ v ə x ə. When t + s corresponds to the letter ц, it is monophonemic and should probably be transcribed as t͡s. It is true that this is not applied consistently in the data. Examples of the monophonemic ц from the training data include абстракция a p s t r a k t͡s i j ə, болница b ɔ l n i t͡s ə, администрация ə d m i n i s t r a t s i j ə and акцент ə k t s ɛ n t.
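The distinction above can be roughly checked against the spelling. The sketch below is a heuristic under stated assumptions, not a definitive test: it assumes space-separated phones, treats every тс in the spelling as a legitimate two-phoneme t + s, and reports any separated t + s beyond that as a probable ц that was not joined. The function name `unjoined_ts` is made up for this example.

```python
DENTAL = "\u032a"  # combining bridge below (dental diacritic)


def unjoined_ts(word, pron):
    """Count separated t + s sequences not explained by a spelled тс."""
    phones = pron.split()
    separated = sum(
        1
        for first, second in zip(phones, phones[1:])
        if first.rstrip(DENTAL) == "t" and second == "s"
    )
    spelled_clusters = word.lower().count("тс")
    return max(0, separated - spelled_clusters)


# Examples quoted above: the first two have a real тс cluster,
# the last two spell the sound with ц.
print(unjoined_ts("хърватски", "h ə r v ɑ t s k i"))
print(unjoined_ts("отсъстваха", f"o t{DENTAL} s ɤ s t{DENTAL} v ə x ə"))
print(unjoined_ts("администрация", "ə d m i n i s t r a t s i j ə"))
print(unjoined_ts("акцент", "ə k t s ɛ n t"))
```

A nonzero result only says the entry deserves a look; spellings other than тс that might also yield a two-phoneme t + s are not considered here.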

Please also note that, interestingly, the letter ч is never represented as t͡ʃ in the training data, but always as t ʃ, although it is clearly a single phoneme. Examples include вечер v ɛ t ʃ ɛ r and чужденец t ʃ u ʒ d ɛ n ɛ t s. I suppose this might be because, unlike with t + s, it does not contrast with a biphonemic sequence t + ʃ (i.e. тш in Bulgarian orthography), because that is not a typical phoneme sequence in Bulgarian and does not appear even once in the training data.
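If the decision were to write ч as a single symbol, the rewrite could be a simple token merge, since a biphonemic t + ʃ reportedly never occurs in the training data. A minimal sketch, assuming space-separated phones; the helper name `join_tsh` is hypothetical, and the rule would need revisiting for data that does contain тш.

```python
TIE = "\u0361"     # combining tie bar
DENTAL = "\u032a"  # combining bridge below (dental diacritic)


def join_tsh(pron):
    """Merge each adjacent t + ʃ pair into one tied affricate token."""
    phones = pron.split()
    merged = []
    i = 0
    while i < len(phones):
        is_pair = (
            i + 1 < len(phones)
            and phones[i].rstrip(DENTAL) == "t"
            and phones[i + 1] == "ʃ"
        )
        if is_pair:
            merged.append("t" + TIE + "ʃ")
            i += 2  # skip both halves of the pair
        else:
            merged.append(phones[i])
            i += 1
    return " ".join(merged)


print(join_tsh("v ɛ t ʃ ɛ r"))
print(join_tsh("t ʃ u ʒ d ɛ n ɛ t s"))
```

The final t s in чужденец is deliberately left alone, because t + s needs the orthography-aware treatment discussed above.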

> Light vs. dark L: according to this reference, [l] and [ɫ] are allophones, with [l] before front vowels and [ɫ] elsewhere. However, their occurrence in the data appears to be basically random.

Regarding /l/, the situation is not so simple either. /l/ has two allophones, [ɫ] and [l].
In the syllable onset, [ɫ] is used before /u/, /ɔ/, /a/, and /ɤ/, and [l] is used before /i/ and /ɛ/. In addition, there is a palatal /lʲ/, which is a phoneme in its own right and can appear in the syllable onset before /u/, /ɔ/, and /a/.
In the syllable coda, one would typically expect to find [ɫ] according to the rule mentioned in the initial post. However, in some cases one finds [l] consistently in the coda, and these are words which have a palatal /lʲ/ in other Slavic languages. Examples include болница b ɔ l n i t͡s ə (compare Russian больница), актуалност ə k t u a l n o s t (compare Russian актуальность), and писател p i s a t ɛ l (compare Russian писатель).
To conclude, it seems that while the phoneme /l/ has the two allophones [ɫ] and [l], the phoneme /lʲ/ also has two allophones, namely [lʲ] and [l]. The phone [l] is thus an allophone of two different phonemes: it is an allophone of /l/ in syllable onsets, but an allophone of /lʲ/ in syllable codas. Since the assumed phonemic distinction between /l/ ([ɫ]) and /lʲ/ ([l]) in syllable codas is not reflected in writing, it cannot easily be checked whether the data is consistent here.
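The onset part of this rule is at least mechanically checkable. The sketch below assumes space-separated phones and only flags a lateral that is directly followed by one of the six vowels listed above; laterals before consonants or at the end of a word (the coda cases) are skipped, since the spelling cannot confirm them. Vowel symbols outside that list are also skipped rather than guessed at. The name `lateral_mismatches` is made up for this example.

```python
BACK_VOWELS = {"u", "ɔ", "a", "ɤ"}  # [ɫ] expected before these
FRONT_VOWELS = {"i", "ɛ"}           # [l] expected before these


def lateral_mismatches(pron):
    """Return (index, phone, next_phone) for laterals breaking the onset rule."""
    phones = pron.split()
    found = []
    for i, (phone, nxt) in enumerate(zip(phones, phones[1:])):
        if phone not in ("l", "ɫ"):
            continue  # the palatal lʲ is a separate token and is not checked
        if nxt in BACK_VOWELS:
            expected = "ɫ"
        elif nxt in FRONT_VOWELS:
            expected = "l"
        else:
            continue  # coda position or an unlisted vowel: no prediction
        if phone != expected:
            found.append((i, phone, nxt))
    return found


# болница has [l] before a consonant, so nothing is flagged.
print(lateral_mismatches("b ɔ l n i t\u0361s ə"))
# Hypothetical strings, not taken from the data, to show a flagged case.
print(lateral_mismatches("l a"))
print(lateral_mismatches("ɫ i"))
```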

@kylebgorman
Collaborator

kylebgorman commented Apr 16, 2020 via email
