[Deleted] Discuss language exercises based on Scribe usage #85

Closed
andrewtavis opened this issue Dec 30, 2021 · 13 comments
Labels
blocked, feature, help wanted, question, wontfix

Comments

@andrewtavis
Member

andrewtavis commented Dec 30, 2021

A long-term goal for Scribe is the inclusion of language exercises to practice words that a user has used Scribe for help with. This would need to be done in a way that ensures that the user’s typing data is not collected, but could be done via the commandBar. Information that is displayed to a user in the command bar could be saved in the app and then used in various memory-style games or spaced repetition flashcards. This feature should be set up so that the user can set a time limit and it doesn't become a distraction. The user should further be given the option to have words added automatically, with the default being that they are not.

Examples:

  • Words that have their gender shown could be saved and then shown to the user again (would need a feature to say that they’ve already been mastered)
  • Words that Conjugate has been used on could have their conjugations tested (testing the one selected would require collecting usage data and should be avoided)
  • Words that Translate is used on could have their translations tested
  • Words that Plural is used on could have their plurals tested

Related words could also be tested, with the potential for natural language processing over the collected words to check for topics that a user is interested in.

This issue is to discuss the possibility of this feature being added to Scribe, which will in turn lead to issues being created for the Language Practice project.
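
As a rough illustration of the idea above, here is a minimal sketch of a local word store with Leitner-style spaced repetition. Nothing here is Scribe code: `SavedWord`, `PracticeDeck`, and the `INTERVALS` values are all hypothetical names and numbers chosen for the example. It only stores what the command bar displayed (not what was typed), keeps automatic adding off by default, and marks a word as mastered after enough correct reviews. The session time limit is left out.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta

# Hypothetical review intervals in days, one per Leitner box (not from Scribe).
INTERVALS = [1, 3, 7, 14]


@dataclass
class SavedWord:
    word: str
    command: str  # e.g. "translate", "plural", "conjugate"
    answer: str   # the text the command bar displayed for this word
    box: int = 0
    due: date = field(default_factory=date.today)
    mastered: bool = False


class PracticeDeck:
    def __init__(self, auto_add: bool = False):
        # Automatic saving is off by default, as proposed in the issue.
        self.auto_add = auto_add
        self.words: dict[tuple[str, str], SavedWord] = {}

    def record(self, word, command, answer, today, user_confirmed=False):
        """Save displayed command bar output only if the user opted in."""
        if not (self.auto_add or user_confirmed):
            return False
        key = (word, command)
        if key not in self.words:
            self.words[key] = SavedWord(word, command, answer, due=today)
        return True

    def due_words(self, today):
        """Words that are not yet mastered and are due for review."""
        return [w for w in self.words.values()
                if not w.mastered and w.due <= today]

    def review(self, saved, correct, today):
        """Move the word up one box on success, back to box 0 on failure."""
        if correct:
            saved.box += 1
            if saved.box >= len(INTERVALS):
                saved.mastered = True
                return
        else:
            saved.box = 0
        saved.due = today + timedelta(days=INTERVALS[saved.box])
```

Keeping the store keyed on `(word, command)` means the same word can be practiced separately for its plural and its translation, which matches the per-command examples listed above.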

@andrewtavis andrewtavis added feature New feature or request help wanted Extra attention is needed question Further information is requested labels Dec 30, 2021
@andrewtavis andrewtavis changed the title Language exercises based on Scribe usage Discuss language exercises based on Scribe usage Dec 30, 2021
@andrewtavis
Member Author

andrewtavis commented Mar 19, 2022

Something to consider is that these exercises could be used for potentially beneficial purposes, similar to how Duolingo translations are used to translate actual documents. If Scribe adopted similar translation game structures, then these could be used to update Wikipedia or Wikidata's translations. Other methods could be devised where a user would check other features of open data. Users who are proven to be at a certain level would have their responses aggregated, and editors could then use this as a means of quality assurance to see where user responses suggest that a data point is potentially problematic.
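
To make the aggregation idea above a bit more concrete, here is a minimal sketch that flags stored values which qualified users consistently disagree with. Everything in it is an assumption for illustration: the function name `flag_disputed`, the 0 to 1 user level score, and the three thresholds are made up, and no real Wikidata or Scribe API is involved.

```python
from collections import Counter


def flag_disputed(responses, stored, min_level=0.8, min_votes=3, agreement=0.7):
    """Return {item_id: majority_answer} for items editors may want to check.

    responses: iterable of (item_id, user_level, answer) tuples
    stored:    {item_id: currently stored value}
    All thresholds are hypothetical defaults.
    """
    votes = {}
    for item_id, level, answer in responses:
        # Only count users who have shown they are at a sufficient level.
        if level >= min_level:
            votes.setdefault(item_id, Counter())[answer] += 1

    flagged = {}
    for item_id, counts in votes.items():
        total = sum(counts.values())
        if total < min_votes:
            continue  # too few qualified responses to say anything
        top_answer, top_count = counts.most_common(1)[0]
        disagrees = top_answer != stored.get(item_id)
        if disagrees and top_count / total >= agreement:
            flagged[item_id] = top_answer
    return flagged
```

The output is only a list of candidates for human review, so editors stay in control of any actual change to the data.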

@andrewtavis
Member Author

andrewtavis commented Mar 19, 2022

Spoken exercises are another thing that could be leveraged in open source to improve open spoken language processing. Datasets for this are currently not very robust, so Scribe could dramatically broaden these by allowing people to read things that are interesting to them out loud (based on user feedback and NLP over their writing), grading their pronunciations using automatic speech recognition, and then saving these in an open format. Anonymity would be key, with the sheer size of the data also being something to consider. At scale this could be very beneficial and allow diversification in speech recognition via such a large open corpus.

Specifically the user could also be prompted to read things in their native language from time to time, with this then serving as a basis for the corrections of second language speakers. With this in mind for speaking, a user could similarly be prompted with properties for their native language as a way to add or verify data (being shown a word and asked what its type/gender/conjugation is).

@thadguidry

Common Voice integration? https://commonvoice.mozilla.org/en/datasets

@andrewtavis
Member Author

andrewtavis commented Mar 19, 2022

I was actually just about to write you about the above via email :)

Thanks for the link. Do you think there's any value to the above ideas as far as integrating some of this into a pipeline where properties could be made and/or validated by asking advanced learners as well as native speakers to do some checks as a "cost"? It'd obviously be a ton of work to get this set up, and I'm still very new to some of the Wikidata processes so maybe this isn't a good direction, but it seems like this could help.

@thadguidry

thadguidry commented Mar 19, 2022

To me, some of that functionality might be better suited outside of Scribe as a keyboard enhancement? I could envision some human language curation working as a dedicated app, either mobile or desktop or both. Many in Wikidata already have plans for some of those kinds of apps down the road after Abstract Wikipedia and Wikifunctions come more into focus later this year and next. Your above comments are not completely settling in my brain, sorry! So a meetup might be better to help explain your thoughts to me or others.

@andrewtavis
Member Author

Was kind of stream of consciousnessing for a moment, so sorry from me too as some of it maybe isn't saying what I want it to. Would be happy to discuss it in more detail in a different forum. Definitely seems to be a bit out of scope at this point especially 😄

Generally the gist of what I'm getting at is: say that we know a user's native language as well as what they're learning. In that case they'd be shown exercises for what they're learning, and as they do those they'd also at times be presented with things for their native language where their responses would be actually checking or adding Wikidata information (verb added -> no conjugations yet -> check with someone to add them). Responses from strong learners could also function in a similar way, but they'd not be as certain.

Will read about Abstract Wikipedia and Wikifunctions now :)

@thadguidry

OK, now I understand more of what you mean. Basically using some human curation from a language learner to add back knowledge into Wikidata. Yeah, that's the sort of thing that Wikifunctions could help with a bit later on...with some apps that can use the Lexeme knowledge in Wikidata that is ever expanding from native knowledge folks.

@andrewtavis
Member Author

Exactly :)

I'm not sure about the exact planned structures of Abstract Wikipedia as I've only been looking at it for the last bit, but it could potentially also integrate with that as a health check of some of the functions. So an edit is made on Wikipedia, and that edit could then serve as a prompt where strong learners/native speakers in each language it's going to would be asked to translate it to check what's pushed to other Wikipedia versions.

@thadguidry

thadguidry commented Mar 20, 2022

Abstract Wikipedia is the idea of "generating" wiki pages for languages that do not have one yet. For instance, generating a Wikipedia page in the Igbo language for Mariah Carey using the English Wikipedia page for the artist. The pages of Abstract Wikipedia will be generated by Wikifunctions that are implemented, curated, and improved by human developers.

Human curation will mainly come into play in improving the Wikifunctions. Input, let's say in English, is fed through a Wikifunction to derive a close approximation or exactly the same meaning in Igbo. How close we get to the desired Igbo will depend on the quality of the Wikifunctions, Renderers, and Constructors...all of which could have a small human curation aspect, indeed. The idea of Abstract Wikipedia is machine-generated pages, generated mainly through Wikifunctions and Constructors, and then rendered. For instance, one plan we have is to have a Simple English Renderer...but we could have a Klingon one just as easily from the same Constructors. It's early days, but various devs are making some prototypes and the Wikimedia Foundation, with Denny Vrandečić, is leading the effort. You can keep up with development by reading the 2022 updates further down this page: https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates

@andrewtavis
Member Author

andrewtavis commented Mar 20, 2022

Ya I guess given the above and what I've further read the ideas of it working as a feedback to Wikidata don't make sense. Specifically if you assume the amount of data that'd be present by the time this feature is released there would be sufficient lexeme forms such that new ones would likely be difficult for even native speakers to understand (or just kind of random and a person wouldn't see value in it). Also the moderation that is currently present makes the health checks redundant in that with more contributors there will be more eyes to make sure everything's alright.

With Wikipedia I guess I didn't realize when reading about Abstract that it was fully function based such that there would be no modeling taking place. The above would make more sense as a validation step for machine-learning-based translations. Makes sense, and thanks for the further explanation :)

@andrewtavis
Member Author

andrewtavis commented Mar 20, 2022

So, staying with the original idea, the long-term goal of the language exercises would be:

  • Analyzing what a user types with a Scribe keyboard along with potentially some user input of what interests them
  • Determining the appropriate topics that the user would want to do exercises on
    • This then solves a common criticism of language apps where they're said to not help with what a learner specifically will need
  • Leveraging the Wikidata links within the data to then find sentences on interesting topics
  • Including also some refresher exercises for what the user recently needed help with

Will mark this as blocked just so we know that this is as of now not a priority, but again feedback is more than welcome 😊

@andrewtavis andrewtavis added the blocked Another issue is blocking label Mar 20, 2022
@andrewtavis
Member Author

@thadguidry, would it make sense in your opinion to put the above in the discussion and close this, and then from there make new issues when work on this is going to be done?

@thadguidry

Can you convert this whole issue into a discussion? That might be better. And it just stays around that way. Category of Idea might be fitting.

@scribe-org scribe-org locked and limited conversation to collaborators Mar 20, 2022
@andrewtavis andrewtavis converted this issue into a discussion Mar 20, 2022
@andrewtavis andrewtavis changed the title Discuss language exercises based on Scribe usage [deleted] Discuss language exercises based on Scribe usage Apr 8, 2023
@andrewtavis andrewtavis added the wontfix This will not be worked on label Apr 8, 2023
@andrewtavis andrewtavis changed the title [deleted] Discuss language exercises based on Scribe usage [Deleted] Discuss language exercises based on Scribe usage Feb 27, 2024
This issue was closed.

2 participants