NLP@20 class projects task list:

Most important: Work hard and have fun!

Task1(arXivAPP):

  • About: The arXiv dataset contains metadata for 1.7M+ scholarly papers across STEM.
  • Dataset URL: https://www.kaggle.com/Cornell-University/arxiv
  • Task: Build NLP applications on top of the arXiv dataset, including but not limited to question answering, knowledge graphs, visualization, recommendation, and survey systems.
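
As a possible starting point for Task1, the sketch below reads metadata records and filters them by subject category. It assumes the dataset is stored as one JSON object per line with `title`, `abstract`, and space-separated `categories` fields; check the actual file before relying on this. The sample records and the helper names `iter_papers` and `in_category` are made up for illustration.

```python
import json
from typing import Iterable, Iterator


def iter_papers(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one metadata record per non-empty JSON line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def in_category(paper: dict, prefix: str) -> bool:
    """True if any space-separated category of the paper starts with prefix."""
    return any(c.startswith(prefix) for c in paper.get("categories", "").split())


# Two made-up records standing in for lines of the real metadata file.
SAMPLE_LINES = [
    '{"id": "0001", "title": "Parsing with Transformers", '
    '"abstract": "We study parsing.", "categories": "cs.CL cs.LG"}',
    '{"id": "0002", "title": "Galaxy Clusters", '
    '"abstract": "We study clusters.", "categories": "astro-ph.GA"}',
]

cs_titles = [p["title"] for p in iter_papers(SAMPLE_LINES) if in_category(p, "cs.")]
print(cs_titles)
```

With the real data, `SAMPLE_LINES` would be replaced by an open file handle, so the 1.7M+ records are streamed rather than loaded into memory at once.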

Task2(SWI):

Task3(NLPBooks):

Task4(miniWatson):

  • About: Watson is a well-known IBM NLP project for open-domain question answering.
  • Reference URL: http://brenocon.com/watson_special_issue/
  • Task: Build a miniWatson for a specific domain: computer science, mathematics, ..., or your preferred domain.

Task5(AIWriter):

  • Task: Build an AI Writer that can pass the Turing test.
  • Task tips: Collect as many computer science research papers as possible and extract their sections (abstract, introduction, model, experiment, discussion, conclusion, etc.). Construct a large sentence bank from the extracted section contents. Build n-gram models over sentences (instead of words). Build sentence similarity measures to help find similar sentences. Given an input sentence and a section tag (for example, 'abstract' or 'introduction'), generate the whole section by following the input with similar sentences. The generated sections should pass the Turing test.
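
The tips above can be sketched in a few lines. This is only one possible reading: it uses a bigram model over whole sentences (each sentence points to the sentences that followed it in the bank) and Jaccard word overlap as the similarity measure. Both choices, and the names `SentenceBank`, `most_similar`, and `generate`, are assumptions made for illustration, not a prescribed design.

```python
import re
from collections import defaultdict


def tokens(sentence):
    """Lowercased set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def jaccard(a, b):
    """Word-overlap similarity between two sentences, in [0, 1]."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


class SentenceBank:
    def __init__(self):
        self.sentences = defaultdict(list)  # section tag -> sentences
        # (section tag, sentence) -> {following sentence: count}
        self.next_counts = defaultdict(lambda: defaultdict(int))

    def add_section(self, section, sentences):
        """Store one extracted section and count sentence-to-sentence bigrams."""
        self.sentences[section].extend(sentences)
        for cur, nxt in zip(sentences, sentences[1:]):
            self.next_counts[(section, cur)][nxt] += 1

    def most_similar(self, section, query):
        """Bank sentence in this section closest to the query, or None."""
        candidates = self.sentences.get(section)
        if not candidates:
            return None
        return max(candidates, key=lambda s: jaccard(query, s))

    def generate(self, section, seed, max_sentences=5):
        """Start from the seed, then follow the most frequent successors."""
        out = [seed]
        current = self.most_similar(section, seed)
        seen = {current}
        while current is not None and len(out) < max_sentences:
            followers = self.next_counts.get((section, current))
            if not followers:
                break
            nxt = max(followers, key=followers.get)
            if nxt in seen:  # avoid looping over the same sentences
                break
            out.append(nxt)
            seen.add(nxt)
            current = nxt
        return out


bank = SentenceBank()
bank.add_section("abstract", [
    "We propose a new parser for noisy text.",
    "The parser uses a neural scoring model.",
    "Experiments show improved accuracy.",
])
bank.add_section("abstract", [
    "Graph databases store linked data.",
    "We benchmark three query engines.",
])
draft = bank.generate("abstract", "We present a new parser for tweets.")
print(" ".join(draft))
```

Picking the single most frequent successor makes the output deterministic; sampling successors in proportion to their counts would give more varied text at the cost of reproducibility.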

Task6(ResearchGraph):

  • Task: Build a knowledge graph from given research papers.
  • Task tips: Collect as many computer science research papers as possible and extract their sections (abstract, introduction, model, experiment, discussion, conclusion, etc.). Identify important concepts, entities, and relations in the extracted section contents. Build knowledge graphs from the entities and relations. Build a visualization system for the KG to give interactive demonstrations.
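
To make the entity-and-relation step concrete, here is a minimal sketch that turns sentences into (head, relation, tail) triples and stores them as an adjacency list. The two regular-expression patterns are hypothetical stand-ins; a real project would more likely use trained entity and relation extractors. The names `PATTERNS`, `extract_triples`, and `build_graph` are illustrative only.

```python
import re
from collections import defaultdict

# Hypothetical surface patterns; each maps a sentence shape to a relation name.
PATTERNS = [
    (re.compile(r"^(?P<head>.+?) is an? (?P<tail>.+?)\.?$", re.I), "is_a"),
    (re.compile(r"^(?P<head>.+?) uses (?P<tail>.+?)\.?$", re.I), "uses"),
]


def extract_triples(sentence):
    """Return (head, relation, tail) triples matched in one sentence."""
    triples = []
    for pattern, relation in PATTERNS:
        match = pattern.match(sentence.strip())
        if match:
            head = match.group("head").strip().lower()
            tail = match.group("tail").strip().lower()
            triples.append((head, relation, tail))
    return triples


def build_graph(sentences):
    """Adjacency list: head entity -> list of (relation, tail entity)."""
    graph = defaultdict(list)
    for sentence in sentences:
        for head, relation, tail in extract_triples(sentence):
            graph[head].append((relation, tail))
    return dict(graph)


graph = build_graph([
    "BERT is a language model.",
    "BERT uses self-attention.",
    "Self-attention is an attention mechanism.",
])
for head, edges in graph.items():
    for relation, tail in edges:
        print(f"{head} --{relation}--> {tail}")
```

The adjacency list can be handed to a graph or visualization library of your choice for the interactive demonstration.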

Task7(HumanEye):

  • Task: Generate textual knowledge from visual content (images or videos).
  • Task tips: Collect as much visual content (images or videos) as possible. Build a generative model (the "human eye") that produces text from the visual content. The generator should pass the Turing test.

Task8(Brainstorm):

  • Task: Generate a thought path between any pair of concepts.
  • Task tips: Simulate how the human brain associates ideas. Given any pair of concepts, generate a sequence of thoughts that connects them. The thought path should be reasonable and should form a story that can pass the Turing test.
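
One simple baseline for connecting two concepts, offered here as an assumption rather than the intended method, is a breadth-first search over a concept association graph: the shortest chain of associations becomes the thought path. The toy graph and the function name `thought_path` are made up for illustration; turning the path into a readable story is a separate step.

```python
from collections import deque


def thought_path(graph, start, goal):
    """Shortest chain of associated concepts from start to goal, or None."""
    if start == goal:
        return [start]
    parents = {start: None}  # concept -> concept it was reached from
    queue = deque([start])
    while queue:
        concept = queue.popleft()
        for neighbor in graph.get(concept, ()):
            if neighbor in parents:
                continue
            parents[neighbor] = concept
            if neighbor == goal:
                # Walk back through the parents to rebuild the path.
                path = [goal]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


# Made-up directed association graph: concept -> associated concepts.
ASSOCIATIONS = {
    "coffee": ["caffeine", "morning"],
    "caffeine": ["alertness"],
    "alertness": ["programming"],
    "morning": ["sunrise"],
}

path = thought_path(ASSOCIATIONS, "coffee", "programming")
print(" -> ".join(path))
```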

Suggestions are welcome. Current tasks may be updated and new tasks may be added in the future.