Course Objective
The objective of the Knowledge and Data course is to acquaint students with methods and technologies for expressing knowledge and data, in particular on the Web. By the end of this course, students will have built an intelligent web application that queries and reasons over integrated knowledge obtained from various sources on the Web. All of this is grounded in formal logic.

Knowledge and understanding: at the end of the course, students will be familiar with:
- Theory of Data, Information and Knowledge
- Predictable inferencing and formal systems
- Linked Data and Knowledge Graphs
- Semantic Web technology stack (RDF, RDFS, OWL)
- Ontology Engineering
- Knowledge-driven Data Science

Application of knowledge and insights: students will be able to:
- Represent knowledge and data in various formalisms (RDF, RDFS, OWL)
- Implement basic (RDFS) reasoning
- Develop advanced knowledge models in RDFS and OWL
- Work with SPARQL for querying (distributed) knowledge graphs
- Integrate acquired knowledge in an intelligent, semantic, data-driven application

Judgement: students will be able to assess the value of available datasets and ontologies for web applications, and to choose the appropriate technology for a specific application.

Communication: students will be able to write a report about a developed application.

Learning skills: students will be able to acquire and apply knowledge and skills about fundamental knowledge representation concepts as well as state-of-the-art technology, both individually and in a group context.
Course Content
In this course, we study formalisms for representing knowledge and data, in particular when that knowledge and data are to be reused, e.g. published and consumed on the Web. We introduce the concept of Knowledge Graphs and the technologies and representation formats (RDF, RDFS, OWL) for expressing semantics and linked data in a web-accessible format, and we use the SPARQL query language to query this data. Finally, we build a data science application that uses integrated data for an intelligent task. Even though content on the Web is generally produced from structured data sources (databases), it is presented in a form meant for human consumption. Linked Data makes it possible to scale the walls of this siloed information space by reusing identifiers and vocabularies across datasets and by presenting the information in a form suited to machine consumption. Google, Bing and Yahoo already use this type of linked, structured information to improve web search and information retrieval. It also helps content providers, such as the BBC, to augment their content with content from other sources (e.g. MusicBrainz).
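To make the idea of reusing identifiers across datasets concrete, here is a minimal sketch in plain Python. It is not RDF or SPARQL; it only mimics the triple model with tuples. All identifiers, property names, and values (the `example.org` addresses, `features`, `genre`, and so on) are invented for illustration and do not come from any real dataset.

```python
# Each statement is a (subject, predicate, object) triple.
# Plain strings stand in for web identifiers; every value here is hypothetical.
ARTIST = "http://example.org/artist/1"
PROGRAMME = "http://example.org/programme/42"

# Dataset published by a hypothetical broadcaster.
broadcaster_data = {
    (ARTIST, "name", "Example Band"),
    (PROGRAMME, "features", ARTIST),
}

# Dataset published by a hypothetical music database.
# It reuses the same identifier for the artist.
music_db_data = {
    (ARTIST, "genre", "jazz"),
    (ARTIST, "formedIn", "1999"),
}


def match(graph, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def genres_featured_in(graph, programme):
    """Follow programme -> features -> artist -> genre across the graph."""
    return sorted(
        genre
        for _, _, artist in match(graph, s=programme, p="features")
        for _, _, genre in match(graph, s=artist, p="genre")
    )


# Because both sources use the same identifier, integrating them is a set union.
merged = broadcaster_data | music_db_data
```

Calling `genres_featured_in(merged, PROGRAMME)` finds the genre only because the two sources agree on the artist's identifier; the same call on `broadcaster_data` alone returns an empty list. A real application would typically use an RDF library and SPARQL rather than hand-written matching.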