
frbr: Issues in Cognitive Science talk 2011

csv2rdf4lod-automation is licensed under the [Apache License, Version 2.0](https://github.com/timrdf/csv2rdf4lod-automation/wiki/License)

This is the landing page for Tim Lebo's 2011 talk at RPI's Issues in Cognitive Science series.

Tim Lebo giving talk at Issues in Cognitive Science at RPI in 2011

Photo credit: Linyun Fu

Abstract

For those wishing to construct an effective visualization, an understanding of human cognition is essential. This understanding must be even more comprehensive -- and detailed -- when striving to develop an automated system to produce visualizations of a disparate body of data, for a diverse set of audiences viewing the resulting visual messages with a variety of goals within a variety of contexts. After a brief review of the ongoing objectives for such a system and why it should be based on the Resource Description Framework and Linked Data principles, I will present the data aggregation and incorporation methodology developed over the past year to address practical concerns for a wealth of real-life data. The remainder of the talk will reflect upon the sociological and psychological cognitive factors of the challenges we faced, the solutions we established, and the remaining concerns we have yet to address. http://bitly.com/lebo-cogsci-issues-2011 will be used to share additional resources for this talk.

Questions at the talk

All questions came from students in Nick Cassimatis' Human-Level Intelligence Laboratory.

  • Richard Caneba - Q: Do the subsequent teams have the original data available to them? A: Yes; the original data is actually part of the context (the variable *), because the implementer of f tends to use it to guide the encoding of K(ey).
  • Richard Caneba - Q: What happens if inconsistencies arise from multiple interpretations? Would humans or systems have to identify the inconsistencies? A: Multiple interpretations are acceptable and distinguishable, so a consumer is free to use (or not use) one or more of the elements that contribute to an inconsistency. Currently, there is no automated way to detect inconsistencies (i.e., it is up to the humans working with the results), but an automated system operating over the results of the current work would be reasonable.
  • J.R. Scally - Q: It appears that the value of f and G comes from repeated use of G; is there benefit if it is only used once? A: Yes, because the first team needs to establish K(ey) to do their job. From our experience using f and G, we have found that the approach leads us to good interpretations (i.e., getting through the "archeological stage") more quickly because it forces us to ask more insightful questions of the data.

Misc.

  • This presentation continues last year's talk (streaming video @ 36'30"; 12 MB pdf slides)
  • The homepage for the tool implementing the methodology discussed in this talk (function G and knowledge structure key).
  • The US/UK foreign aid/IT R&D example discussed in the talk is also described in this use case.
  • A somewhat complete list of publications relating to this work is at Publications.
  • The Mayan 2012 Calendar cartoon is copyright 2007 by Leigh Rubin (a blog post).

Talk Narrative

Whether observing "low-level" data or one of its "higher-level" visual portrayals, the cognitive tasks a human must perform to observe, decipher, and understand any sort of message are profoundly similar. We argue that context plays the single most important role in our unceasing endeavor to understand the barrage of information artifacts that enter our environment. Using a hypothetical "relay race" of visualization development teams, we highlight the importance of context and the essential need for data -- and visuals -- to explicitly embody the appropriate associations to maximize understanding while minimizing the time and resources used to achieve that understanding. To address the ubiquitous need for context, we propose a three-part contextualization paradigm, the notion of contextual depth, and the use of the web infrastructure to establish globally accessible identifiers for every element encoded as a datum or portrayed as a visual.

Further, we propose and describe a function G that can provide the appropriate context for any data element, given that it has the output key from the function f. While G is, and should be, implemented mechanically, f is not and will not be in the foreseeable future. f is an intensely complex cognitive process that spans both agent (psychological) and inter-agent (sociological) factors that need to be understood at the intra-agent (componential) and substrate (physiological) levels. While G and f are currently defined and broadly understood as data-processing phenomena, we draw an analogy to the functions h and i, which create and interpret visual artifacts, respectively, for human consumption. It is our hope that a detailed understanding of the functions G and f will lead naturally to an understanding of their h and i analogues.
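To make the division of labor concrete, here is a minimal, hypothetical sketch of how G might mechanically apply a key produced by f. The key layout, column names, and vocabulary URIs below are invented for illustration; they are not the actual encoding used by csv2rdf4lod.

```python
# Hypothetical sketch: f is a human act of interpretation that yields a key;
# G mechanically applies that key to contextualize each element of the data.
# The key layout and URIs below are invented for illustration only.

# f: an analyst studies the raw data (and its context, "*") and records what
# each column means, using globally resolvable identifiers.
interpretation_key = {
    "Country":  {"predicate": "http://example.org/vocab/country",  "kind": "uri"},
    "Quantity": {"predicate": "http://example.org/vocab/quantity", "kind": "literal"},
    "Agency":   {"predicate": "http://example.org/vocab/agency",   "kind": "uri"},
}

def G(row, key, row_uri):
    """Mechanically contextualize one row of data using the key produced by f."""
    triples = []
    for column, value in row.items():
        if column not in key:
            continue  # columns the analyst has not yet interpreted are skipped
        entry = key[column]
        if entry["kind"] == "uri":
            obj = "http://example.org/id/" + value.replace(" ", "_")
        else:
            obj = value
        triples.append((row_uri, entry["predicate"], obj))
    return triples

# One raw row ("data") becomes explicitly contextualized triples ("data'").
raw_row = {"Country": "United Kingdom", "Quantity": "12.5", "Agency": "DFID"}
for triple in G(raw_row, interpretation_key, "http://example.org/id/row/1"):
    print(triple)
```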

Perhaps the most concise illustration of the importance of context is shown in the following cartoon. When faced with a new visual, one must begin, and continue, to answer two types of questions. First, what are the things being discussed? Second, how do those things relate? Calvin, on the right, does not have access to the wealth of context available to the observer on the left and is less able to establish answers to a variety of essential questions. Unfortunately, in today's hyper-connected world created by the web, we no longer need to wait millennia to lose the required context, and the result is many effortful archeological endeavors that require significant resources with little guarantee of conclusive and accurate results.

Running out of rock -- and context.

In short, the objective of the visualization process is to change the behavior of some protagonist by using the visual medium to inform the observer about the world. Because an abundance of world-describing data exists, data themselves cannot be directly observed, and the visual modality is a predominant mode of human consumption, the need for efficient and effective visualization is paramount. Phrasing visualization as a behavior-controlling process in which animated light is an independent variable and human behavior is a dependent variable, every frame of a visualization can -- and should -- be considered an experiment. As an experiment, a visualization embodies a design selected to test a hypothesis that can -- and should -- be evaluated for correctness.

Before exploring the consequences of visualization-as-experiment, one must deal with the practical issues of how a visual artifact is constructed in the first place. Here, we offer the design of a hypothetical race that highlights the shortcomings of much of the data in the world, so that we can motivate the importance of their contextualized alternatives. Team A (comprising sub-teams 1, 2, and 3) and Team B (comprising sub-teams 4, 5, and 6) will each relay race to create three identical visual artifacts for Calvin's inspection. All sub-teams have access to three datasets provided by three distinct organizations, but each cannot begin working until the previous sub-team has completed its visual. All sub-teams also have access to the same set of APIs and tools, but do not have access to the application code created by previous sub-teams. So, in a sense, each sub-team is "starting fresh". The first sub-teams (1 and 4) both begin by ingesting the three datasets, which involves some archeological endeavor as described earlier. Only after reaching a sufficient level of understanding can they construct their visuals.

A race to visuals: with and without context.

Up until this point, both teams have performed identically. The one difference between Teams A and B is their treatment of data during and after the first sub-team performs its archeological endeavor. Although both teams accumulated and applied context to gain an understanding of the original data artifacts, only Team A codified these results and provided augmented alternatives (data') to the originals (data). The cascading effects of this small act become apparent when sub-teams 2 and 5 begin their legs of the relay: sub-team 2 moves directly to creating a visual with data', while sub-team 5 falls behind trying to figure out data. The same delay recurs when sub-team 6 finally starts, and it would continue to accumulate with the initiation of every new visualization effort that does not take advantage of data augmented with explicit context.
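To ground the contrast, here is a small illustrative sketch of the work sub-team 2 inherits versus the work sub-team 5 must repeat. The predicates and values are invented for illustration only.

```python
# Illustrative only: invented predicates and values. data' carries sub-team 1's
# interpretation along as explicit, globally identified context, so sub-team 2
# can select exactly what it needs without rediscovering what each column means.
data_prime = [
    ("http://example.org/id/row/1", "http://example.org/vocab/country",  "United Kingdom"),
    ("http://example.org/id/row/1", "http://example.org/vocab/quantity", "12.5"),
    ("http://example.org/id/row/1", "http://example.org/vocab/agency",   "DFID"),
]

def values_of(triples, predicate):
    """Sub-team 2's leg: select values by a known, explicit predicate."""
    return [obj for (_, pred, obj) in triples if pred == predicate]

print(values_of(data_prime, "http://example.org/vocab/quantity"))

# Sub-team 5, working from the raw rows alone, must first reconstruct a key
# (the archeological endeavor) before it can write the equivalent selection.
```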


QR Code for this web page

Subsequent questions

Paul H:

    1. How/what are f and G coded in?
    2. I really like the notion of contextual depth, so my question about who chooses the context was the reason for my interest, especially if it can be driven by user/task/question/etc. [varying contextual depths]
    • [The shallowest depth suggests] Country is of greatest import and is therefore driving the querying of the system (i.e., country questions are the task: "How much does each country spend on X?"), which makes the query: Country, Quantity, and Agency.
    • But if one asks the question with regard to Purpose ("How much does each country spend on doing X?"), then the query is Country, Quantity, and Purpose, so I figure Purpose moves up to the -1 level. (A hypothetical query sketch follows this list.)
    3. So... how do you think h comes out of f and G?
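As a hypothetical illustration of the point about contextual depth, the sketch below (using rdflib, with invented vocabulary URIs and data values) shows how changing the driving question from Agency to Purpose changes which dimensions are queried, while the contextualized data stay the same.

```python
# Hypothetical illustration of the "driving question" choosing the dimensions:
# vocabulary URIs and data values are invented, not taken from the actual datasets.
from rdflib import Graph, Literal, Namespace, URIRef

EX = Namespace("http://example.org/vocab/")
g = Graph()
row = URIRef("http://example.org/id/row/1")
g.add((row, EX.country,  URIRef("http://example.org/id/United_Kingdom")))
g.add((row, EX.quantity, Literal(12.5)))
g.add((row, EX.agency,   URIRef("http://example.org/id/DFID")))
g.add((row, EX.purpose,  URIRef("http://example.org/id/IT_R_and_D")))

# "How much does each country spend on X?" -> Country, Quantity, Agency
by_agency = """
SELECT ?country (SUM(?q) AS ?total) ?agency WHERE {
  ?r <http://example.org/vocab/country>  ?country ;
     <http://example.org/vocab/quantity> ?q ;
     <http://example.org/vocab/agency>   ?agency .
} GROUP BY ?country ?agency
"""

# "How much does each country spend on doing X?" -> Country, Quantity, Purpose
by_purpose = by_agency.replace("agency", "purpose")

print("by agency:",  list(g.query(by_agency)))
print("by purpose:", list(g.query(by_purpose)))
```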