Sometimes documents contain non-linguistic items that are not representative of the language we are sampling: for example, annotations, technical tags (such as HTML or Markdown), or chunks of text in mixed languages. The idea is to filter out as much of this noise as possible (it is not always possible to get rid of it completely).
The TalkBank files tend to have specific annotations that we might want to filter out so that only the text is extracted:
Example 1 Source file:
por fim (.) o gato é forçado a fugir a [=? em] frente ao cão (.) ao mesmo tempo que a ave (.) pode alimentar as suas crias (.) .
Example 2 Source file:
唔 識 .
唔 識 玩 ?
@Situation: the child's mother asks the child to thank the investigator
Example 3 Source file:
< Acmo nimolinia> [% whisper] .
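As a starting point for discussion, here is a minimal sketch of how the annotations in the three examples above could be stripped with regular expressions. The function name `clean_chat_line` is hypothetical, and the rules are assumptions drawn only from these examples: lines starting with `@` are treated as metadata and dropped, anything in square brackets (such as `[=? em]` or `[% whisper]`) is removed, pause markers like `(.)` are removed, and angle brackets are removed while the words inside them are kept. It does not cover the full annotation scheme, and it leaves the utterance terminator (`.` or `?`) in place.

```python
import re

def clean_chat_line(line: str) -> str:
    """Strip a few transcript annotations from one line (assumed rules, see above)."""
    # Assumption: lines starting with "@" (e.g. "@Situation: ...") are metadata, not speech.
    if line.lstrip().startswith("@"):
        return ""
    # Remove bracketed annotations such as "[=? em]" or "[% whisper]".
    text = re.sub(r"\[[^\]]*\]", " ", line)
    # Remove pause markers such as "(.)" or "(..)".
    text = re.sub(r"\(\.+\)", " ", text)
    # Drop the angle brackets that scope an annotation, keeping the words inside.
    text = text.replace("<", " ").replace(">", " ")
    # Collapse the whitespace left behind by the removals.
    return re.sub(r"\s+", " ", text).strip()

examples = [
    "por fim (.) o gato é forçado a fugir a [=? em] frente ao cão (.) .",
    "@Situation: the child's mother asks the child to thank the investigator",
    "< Acmo nimolinia> [% whisper] .",
]
for raw in examples:
    print(repr(clean_chat_line(raw)))
```

Whether bracketed material like `[=? em]` should simply be dropped, or instead replace the preceding word, is a design decision this sketch does not settle.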
@tsamardzic @christianbentz