-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathmics2017.tex
586 lines (530 loc) · 34.4 KB
/
mics2017.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
% This is sigproc-sp.tex -FILE FOR V2.6SP OF ACM_PROC_ARTICLE-SP.CLS
% OCTOBER 2002
%
% It is an example file showing how to use the 'acm_proc_article-sp.cls' V2.6SP
% LaTeX2e document class file for Conference Proceedings submissions.
%
%----------------------------------------------------------------------------------------------------------------
% This .tex file (and associated .cls V2.6SP) *DOES NOT* produce:
% 1) The Permission Statement
% 2) The Conference (location) Info information
% 3) The Copyright Line with ACM data
% 4) Page numbering
%
% However, both the CopyrightYear (default to 2002) and the ACM Copyright Data
% (default to X-XXXXX-XX-X/XX/XX) can still be over-ridden by whatever the author
% inserts into the source .tex file.
% e.g.
% \CopyrightYear{2003} will cause 2003 to appear in the copyright line.
% \crdata{0-12345-67-8/90/12} will cause 0-12345-67-8/90/12 to appear in the copyright line.
%
%
%---------------------------------------------------------------------------------------------------------------
% It is an example which *does* use the .bib file (from which the .bbl file
% is produced).
% REMEMBER HOWEVER: After having produced the .bbl file,
% and prior to final submission,
% you need to 'insert' your .bbl file into your source .tex file so as to provide
% ONE 'self-contained' source file.
%
% Questions regarding SIGS should be sent to
% Adrienne Griscti ---> griscti@acm.org
%
% Questions/suggestions regarding the guidelines, .tex and .cls files, etc. to
% Gerald Murray ---> murray@acm.org
%
% For tracking purposes - this is V2.6SP - OCTOBER 2002
\documentclass[12pt]{article}
\setlength{\oddsidemargin}{0in}
\setlength{\evensidemargin}{0in}
\setlength{\topmargin}{0in}
\setlength{\headheight}{0in}
\setlength{\headsep}{0in}
\setlength{\textwidth}{6in}
\setlength{\textheight}{9in}
\setlength{\parindent}{0in}
\usepackage{graphicx} %For jpg figure inclusion
\usepackage{times} %For typeface
\usepackage{epsfig}
\usepackage{color} %For Comments
%\usepackage[all]{xy}
\usepackage{float}
%\usepackage{subfigure}
\usepackage{hyperref}
\usepackage{url}
\usepackage{parskip}
%% Elena's favorite green (thanks, Fernando!)
\definecolor{ForestGreen}{RGB}{34,139,34}
\definecolor{BlueViolet}{RGB}{138,43,226}
\definecolor{Coquelicot}{RGB}{255, 56, 0}
\definecolor{Teal}{RGB}{2,132,130}
%Uncomment this if you want to show work-in-progress comments
\newcommand{\comment}[1]{{\bf \tt {#1}}}
% Uncomment this if you don't want to show comments
%\newcommand{\comment}[1]{}
\newcommand{\emcomment}[1]{\textcolor{ForestGreen}{\comment{Elena: {#1}}}}
\newcommand{\todo}[1]{\textcolor{blue}{\comment{To Do: {#1}}}}
\newcommand{\tscomment}[1]{\textcolor{Teal}{\comment{Tony: {#1}}}}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{document}
\pagestyle{plain}
%
% --- Author Metadata here ---
%\conferenceinfo{WOODSTOCK}{'97 El Paso, Texas USA}
%\setpagenumber{50}
%\CopyrightYear{2002} % Allows default copyright year (2002) to be
%over-ridden - IF NEED BE.
%\crdata{0-12345-67-8/90/01} % Allows default copyright data
%(X-XXXXX-XX-X/XX/XX) to be over-ridden.
% --- End of Author Metadata ---
\title{Improving Clojure Error Messages for Programming Novices with clojure.spec}
%\subtitle{[Extended Abstract \comment{DO WE NEED THIS?}]
%\titlenote{}}
%
% You need the command \numberofauthors to handle the "boxing"
% and alignment of the authors under the title, and to add
% a section for authors number 4 through n.
%
% Up to the first three authors are aligned under the title;
% use the \alignauthor commands below to handle those names
% and affiliations. Add names, affiliations, addresses for
% additional authors as the argument to \additionalauthors;
% these will be set for you without further effort on your
% part as the last section in the body of your article BEFORE
% References or any Appendices.
\author{
Myeongjae Song and Elena Machkasova \\
Computer Science Discipline \\
University of Minnesota Morris\\
Morris, MN 56267\\
songx823@morris.umn.edu, elenam@morris.umn.edu
}
\date{}
\maketitle
\thispagestyle{empty}
%\alcomment{Should these say @morris.umn.edu?}
\section*{\centering Abstract}
Functional programming paradigms have been getting mainstream attention recently because of their elegant concurrency
handling and conciseness of source code.
University of Minnesota, Morris has a strong tradition of using functional programming languages, in particular those
in the Lisp family, in the introductory CS curriculum.
A Lisp language is often the very first language that our majors encounter.
Clojure is a relatively new language in the Lisp family that is becoming increasingly popular nationally
and worldwide.
While it would be desirable to use Clojure as a beginner language in CS curriculum, there
are several challenges in adopting it for novice programmers.
One of the most significant challenges is its error message system which uses terminology and concepts
that are unfamiliar to beginners.
A current project at UMM aims at developing a system of beginner-friendly error messages for Clojure.
A recently added system of contracts in Clojure, known as clojure.spec, is the evidence that the Clojure community
at large is also interested in improving error reporting.
This system came out in summer 2016, and we have been exploring its ability to facilitate more robust and
flexible error messages, particularly ones aimed at helping beginners.
This paper describes our current system of providing beginner-friendly information in error messages
and presents our recent work
on switching to clojure.spec for this purpose.
\newpage
\setcounter{page}{1}
\section{Introduction}
Clojure is a relatively new programming language in the Lisp family, introduced by a developer Rich Hickey in 2007~\cite{Hickey:2008}.
It is a Lisp language with a dynamic type system.
It compiles to Java bytecodes and runs on the JVM. This means that a Clojure program can access any Java libraries.
It quickly gained a worldwide popularity in industry and has a large and very active community of developers.
As a Lisp language, Clojure follows functional methodology with an emphasis on immutability, functional abstraction, and recursion --
concepts that are essential in CS education, as discussed in~\cite{Felleisen:2004}.
As such, Clojure could be suitable as a first language taught to computer science students, continuing a rich tradition of MIT Scheme~\cite{Abelson},
on par with the Racket programming language, also in the Lisp category~\cite{htdp}.
However, there are several issues that make introducing Clojure to beginner programmers quite challenging.
The most notorious of these issues is the lack of beginner-friendly error messages.
Our research group at University of Minnesota, Morris (UMM) has been working on developing a system of error messages
that are helpful to beginners.
As our work on this system was progressing, Clojure has introduced a new feature that would allow us to interface
with Clojure core much easier.
This feature is a system of contracts for data validation, known as clojure.spec.
Thus our current goal is to transition our older ad-hoc system of collecting information about an error
to clojure.spec.
This paper presents a comparison between our two approaches and demonstrates our current progress
in this transition, including some challenges that we have overcome.
The paper proceeds as following: Section~\ref{sec:background} introduces Clojure and clojure.spec,
Section~\ref{sec:current} presents our system of error processing before clojure.spec,
Section~\ref{sec:spec-errors} discusses the new approach, and Section~\ref{sec:conclusion} summarizes our
current progress and discusses future work.
\section{Background}\label{sec:background}
This section is an introduction to Clojure and the clojure.spec library.
\subsection{Clojure}\label{sec:clojure}
Clojure is a Lisp-dialect with some distinct features in comparison to other popular languages, such as Java or Python.
One such feature is the use of prefix notation in Clojure syntax. In Clojure, every instruction or expression is enclosed in parentheses.
To invoke a function, you need to specify a function name right after the opening parenthesis and then list all arguments before the
closing parenthesis. Parentheses are even required for functions that do not take any arguments.
A simple addition of 3 and 4 in Clojure looks like:
\begin{verbatim}
(+ 3 4)
\end{verbatim}
In Clojure, \texttt{+} is a just function, not a special operator, and the same is true of what would be other common operators
in other languages. This syntax might look odd, but it is very efficient and easy to parse for machines
since the system does not need to
calculate operator precedence. Moreover, it easily accommodates a variable number of arguments, which means
you can provide any number of arguments to some functions, such as the {\tt +} function. For instance, both {\tt (+ 3 4 5 6)}
or {\tt (+ 3 4 5 6 10 15)} work.
%\emcomment{Give an example of a variable number of args}\tscomment{Resolved?}
%\emcomment{Need to explain in words what that means}
%\tscomment{Resolved I think}
Defining a function is very intuitive in Clojure. Inside a pair of parentheses, you list \texttt{defn}, the function name,
the argument list, and the expression.
An example of a function that multiplies a given number by two looks like:
\begin{verbatim}
(defn multiply-by-two [n] (* n 2))
\end{verbatim}
In this example:
\begin{itemize}
\item {\tt defn} indicates the declaration of a function.
\item {\tt multiply-by-two} is the function name.
\item \texttt{[n]} is the argument list: there is only one argument named {\tt n}.
\item {\tt (* n 2)} is the expression that multiplies n by 2. The result of this expression will be returned from the function.
\end{itemize}
Calling this function with an argument 3 would look like this: {\tt (multiply-by-two 3)}, and returns 6.
% The following are the function call and its result. {\tt =>} is just a notation that we are using to show
% what the function returns.
% \begin{verbatim}
% (multiply-by-two 3)
% => 6
% \end{verbatim}
% \emcomment{show the function call and its result. }
% \tscomment{Resolved?}
Another functional programming feature that Clojure adopted is immutable variables. In Clojure, once you bind a
name to a value, you cannot change what the name is bound to.
% \emcomment{It's not just that you can't change the value, you cannot change what the variable is bound to. }
% \tscomment{Resolved?}
This is the main source of confusion for developers from imperative programming backgrounds since they are used to
changing variables to store different values.
%\emcomment{immutable variables and data structures are actually slightly different, need to elaborate on this}
%\tscomment{Changed wording}
The main benefit of using immutable variables is minimizing side effects, which means
developers do not need to worry about variables being modified from unanticipated operations.
There are mutable variables in Clojure, but they need to be explicitly declared and modified with special syntax.
Thus, side effects are not as serious concern as in imperative programming languages.
% \emcomment{variables are immutable by default; there are mutable variables, and they need to be explicitly declared as such - need to clarify this,
% otherwise our discussion of an atom doesn't make sense}
% \tscomment{resolved}
Among various data structures that Clojure supports, sequences are unique in
the way they behave. In particular, a sequence can be lazy. In a lazy sequence, element(s) do not exist when the sequence is first created, but they
can potentially be computed later when they are needed.
% \emcomment{will be computed later as needed? }
% \tscomment{Resolved?}
In other words, lazy sequences are like a set of rules capable of generating as many elements as are needed.
Such sequences just exist as a lazy sequence type until some functions force them to be evaluated. Because of its laziness, it is possible
for a lazy sequence to represent an infinite sequence. The following is a function that returns a lazy sequence representing the
Fibonacci sequence. In this case, the starting values are {\tt n1} and {\tt n2}. Theoretically, it could generate elements of the Fibonacci
sequence indefinitely.
\begin{verbatim}
(defn fibonacci [n1 n2]
(lazy-seq (cons n1 (fibonacci n2 (+ n1 n2))))
\end{verbatim}
Another commonly used Clojure data structure is hash maps. It is essentially a collection of key - value pairs.
It is very useful when representing data with relationships. Following is an example of a hash map to
represent a college course: course number is paired with a course name.
The key and value are separated by a space, and each key - value pair is
separated by a comma.
\begin{verbatim}
{:csci3501 "Algorithms", :csci2101 "Data Structures"}
\end{verbatim}
In this example:
\begin{itemize}
\item The curly braces represent that this is a hash map.
\item {\tt :csci3501} and {\tt :csci2101} are the keys.
\item {\tt "Algorithms"} and {\tt "Data Structures"} are the values.
\end{itemize}
\subsection{Introduction to clojure.spec}\label{sec:intro-spec}
The cloure.spec library introduced in Clojure 1.9 in May 2016~\cite{spec}, and technically is still in development. It is a contract system that allows
Clojure developers to specify and validate expected data
%\emcomment{I'd say, expected data, not (just) data types}\tscomment{Resolved?}
at runtime. In addition to data validation, clojure.spec provides easy-to-parse error messages when the data is invalid.
To start using specs, we need to define a spec first. Each spec is just a specification for data.
You can use \texttt{s/def} to define specs.
The following are some simple spec definitions with Clojure predicates:
\begin{verbatim}
(s/def ::check-int integer?)
(s/def ::check-function ifn?)
\end{verbatim}
You can use the above spec definitions to check data types with {\tt s/valid?}, which returns a boolean.
If the data satisfies the spec, {\tt s/valid?} returns true. Otherwise, it returns false. For example,
{\tt (s/valid? ::check-int 3.5)} will return {\tt false} because {\tt 3.5} is not an integer number.
It is possible to assemble simple specs to create a more complicated spec. For instance, we can write a spec that verifies
the argument types of a function by combining \texttt{::check-int} and \texttt{::check-function} with \texttt{s/cat}
which stands for concatenation.
\begin{verbatim}
(s/def ::check-int-function
(s/cat :first-arg ::check-int
:second-arg ::check-function))
\end{verbatim}
If the above spec is attached to a function {\tt my-function} that takes two arguments,
{\tt (my-function 4 +)} will be valid
because the first argument {\tt 4} is an integer and the second argument {\tt +} is a function.
One really powerful feature of spec is the use of regular operators such as \texttt{s/cat}, \texttt{s/*}, and \texttt{s/?}.
As we have mentioned above, {\tt s/cat} stands for concatenation. \texttt{s/*} represents zero or more repetitions of a data pattern, and \texttt{s/?} represents zero or one of a data pattern.
Using these regular operators, we can describe a variety of data structures, including lazy sequences and nested data.
To define input and output specifications for functions, clojure.spec uses \texttt{s/fdef}. Even though it is possible to
define the output spec for a function, it is only usable in spec testing. For our purposes, we will be focused on the input
verification. To start spec input validation of a function, it should be turned on by \texttt{(stest/instrument `function-name)}.
Otherwise, the spec is completely ignored at runtime. Let's take a look at an example of \texttt{s/fdef}. The following
is a simplified definition of \texttt{even?} function and a possible function spec for it. Whenever \texttt{even?} function
is called, the spec is invoked and checks if the argument is an integer.
\begin{verbatim}
(defn even? [n] (= (mod n 2) 0))
(s/fdef even?
:args (s/cat ::check-int integer?))
;; turn on the function spec
(stest/instrument 'even?)
\end{verbatim}
The spec will allow the function to be called only when its parameter is an integer. Any other type of the parameter would
produce a run-time error.
\section{Clojure error messages}\label{sec:current}
\subsection{Overview of Clojure error messages and our approaches}
Since Clojure compiles to Java bytecodes and runs on the JVM, Clojure error messages are just Java exceptions
and uses Java datatypes and terminology.
For instance, {\tt (+ 2 true)} is an attempt to add a number and a boolean.
Since Java does not automatically convert booleans to numbers, this code results in an error:
\begin{verbatim}
ClassCastException java.lang.Boolean cannot be cast
to java.lang.Number
clojure.lang.Numbers.add (Numbers.java:128)
\end{verbatim}
This is quite cryptic for beginners since they are not familiar with a term ``class'' for a datatype,
``casting'' for type conversion, and ``exception'' for an error. Moreover, even if a term ``boolean''
has been introduced to them for true/false values, the prefix {\tt java.lang} does not
correspond to their experience. Also, note that the class {\tt Number} has a prefix {\tt clojure.lang}
which is different from {\tt java.lang} . To worsen the confusion, instead of referring
to the function {\tt +}, the error message refers to the method {\tt add}, and the class
in which it is defined is {\tt clojure.lang.Numbers}. Those familiar with Java may identify this
method as a static method (mostly based on the naming convention: {\tt Numbers} is pluralized).
For new Clojure programmers not familiar with Java, however, the message carries very little, if any, information.
Even if these new programmers memorize common error messages terminology after a while,
it would still not have any grounding in their experience.
Clearly, this situation is not acceptable in an introductory computer science class. Thus, one of the directions of our project is to
provide a set of error messages that would be more consistent and meaningful in the context of
novice programmers experience.
Our error messages translate standard Clojure error messages into terms that are less confusing for beginners.
For example, the above-mentioned expression {\tt (+ 2 true)} would result in the error message
\begin{verbatim}
In function "+" the second argument "true" must be a number,
but is a boolean.
\end{verbatim}
In this error message it is clear what function is being called, which argument is causing a problem,
what is expected, and what is given.
Our approach to replacing standard error messages has two main cases and a default case:
\begin{enumerate}
\item For commonly used functions we provide a direct assertion to check if we are passing the right number and types
of parameters, and report an error if we do not. These errors are more informative than default errors.
\item For other cases we perform a lookup of an error message in our ``dictionary'' using pattern-matching,
and replace the wording so that it is more understandable to beginners.
\item For rare cases that are not listed in the dictionary we have no choice but to report an error as it is.
\end{enumerate}
For this paper we are focusing on the first case: providing an assertion to check the number and type of the
arguments. We discuss how we have been handling this situation in the past and how using clojure.spec
allows us to collect the same information as before (and sometimes more) in a way that is less error-prone.
\subsection{Assertions for common Clojure functions}\label{sec:assert}
The approach to providing assertions to check function arguments that we used prior to
incorporating clojure.spec was to write our own function with the same name as a standard one,
provide a precondition for it to check if the arguments are valid, and if they are, call
the function provided by core Clojure. Here is an example of this approach used for
function {\tt +} that specifies that it can take any number of arguments, but all
of these arguments must be numbers:
\begin{verbatim}
(defn + [& args]
{:pre [(check-if-numbers? "+" args 1)]}
(apply clojure.core/+ args))
\end{verbatim}
In this example:
\begin{itemize}
\item the name of the function is {\tt +},
\item the {\tt [\& args]} is the arguments list. The {\tt \&} sign indicates that what follows is
a list of arguments of arbitrary length.
\item The next line (with {\tt :pre} in curly braces) is the precondition, followed by a list
(in square brackets) of conditions to check.
\item In this case there is only one condition to check. It is given by our own
function {\tt check-if-numbers?}
and passing to it {\tt args} (the arguments passed to {\tt +}), the name of the function we
are checking, which is ``+'', and the starting number of the arguments being checked
(in this case we are checking all of the arguments, so the starting number is {\tt 1}).
\item If {\tt check-if-numbers?} returns true, the body of the overwritten function {\tt +}
gets executed. If it return false, an assertion error is thrown by the precondition.
\item The last line {\tt (apply clojure.core/+ args)} applies the {\tt +} function in
\newline
{\tt clojure.core} to the list of arguments. One can think of {\tt clojure.core/+} as
a path to the function {\tt +} in the core package of Clojure.
\end{itemize}
Clojure preconditions don't record enough information to generate a helpful error message.
In particular, they don't record the name of the function they are attached to or the argument
number. Thus our custom-made predicates, such as {\tt check-if-numbers?}, record all
necessary information about failing arguments to augment the preconditions.
Before spec there was no good way of packaging necessary information into a failed assertion
exception itself, so we used a global variable (known as an {\it atom} in Clojure) to store this
information. The stored information included:
\begin{itemize}
\item The type that the predicate is checking for, such as a number in the case shown above,
\item The actual type of the argument,
\item The value of the argument,
\item The name of the function for which the assertion fails,
\item The number of the argument, such as first, second, etc.
\end{itemize}
%\emcomment{fix formatting!}
Revisiting the example above, the call {\tt (+ 2 true)} will make the predicate
\newline
{\tt check-if-numbers?} be called with the function name {\tt "+"}, the arguments list
{\tt (2, true)}, and the starting number of an argument: 1.
The predicate then checks the first argument in the list, {\tt 2}, and it is indeed a number.
Then a recursive call is made with the second argument, {\tt true}. The argument number is
incremented by 1 to indicate that it is a second argument. Since {\tt true} is not a number,
failure information is recorded in the global variable: the function name is {\tt +}, the ``offending
value'' is {\tt true} and its type is {\tt boolean}, the expected type is {\tt number}, and the argument
number is {\tt 2}.
After the information is recorded, the predicate returns {\tt false} to indicate that the precondition
has failed. Because the precondition failed, an exception {\tt AssertionError} is thrown to the
place of the program that has made the call to {\tt +} with a wrong argument.
The exception is then caught. Based on its type {\tt AssertionError}, it is passed to a handling function
that retrieves the information from the global variable and forms the error message:
\begin{verbatim}
In function "+" the second argument "true" must be a number,
but is a boolean.
\end{verbatim}
The global variable is then ``cleared'', i.e. the information is deleted from it. This is done so that this
information is not accidentally retrieved later when it would become irrelevant and confusing.
In addition to {\tt check-if-numbers?} function, we have similar predicates to check if an argument is
a function or a sequences or any other type that may be required. Note, however, that there are
not as many type requirements in Clojure as in a statically typed language, such as Java, so less than
a dozen such predicates are needed.
While this approach produces a correct error message, it has drawbacks: using a global variable to
pass this information from one part of the program to another may be problematic in a multithreaded
program. This process would also be fragile if multiple errors are taking place in a cascading fashion,
i.e. handling one error triggers another. The switch to clojure.spec eliminates potential inconsistencies
between the global variable and the error being handled.
\subsection{Lazy sequences and challenges in argument printing}
As mentioned in Section~\ref{sec:clojure}, Clojure has lazy sequences. Lazy sequences are values in
Clojure, so they can be passed to other functions unevaluated until their elements are needed.
This means that if we pass a lazy sequence (for instance, one generated by a function {\tt (constantly 5)} that
produces a lazy infinite sequence of 5s) instead of a number, the value will not be evaluated when an
error is detected, it will just be kept as a lazy sequence.
However,
one of the functions that forces an evaluation of a lazy sequence, is {\tt print}. Thus we have to be careful to not print
an infinite (or just very long) sequence accidentally when printing the error message.
We handle this issue with a preview function that evaluates only the first up to 10 elements of the sequence.
Switching from this approach to clojue.spec has created some issues that we will discuss in Section~\ref{sec:approach}
%\tscomment{Spec-specific issues (inline-functions) will be explained in 'Approaches to improve error messages'. I think
%just mentioning why checking lazy sequences is hard (we need to evaluate some for checking) is sufficient?}
\section{Improving error messages with clojure.spec}\label{sec:spec-errors}
\subsection{Benefits of using clojure.spec}
If there is no spec attached to a function, and the function fails,
it throws an error which has the information about the type of error, error message, and stack
trace. The value that caused the error is rarely given in the error message.
Let's say a user tried to add {\tt 2} to {\tt true}. Because addition only works for numbers,
{\tt true} of type boolean is the value that caused the error or the 'failing value' in this case.
Even when the failing value is given, we still need to parse the specific value out of
the error message string. The same problem occurs when it comes to the line number indicating where it failed.
Because an accurate line number is not given in the error message, users need to go through the stack trace to find
where the error actually happened. We used to handle these issues by providing assertions as we explained in Section~\ref{sec:assert}.
Unlike default Clojure errors, if a spec-attached function fails, an exception object
\newline
{\tt clojure.lang.ExceptionInfo} is thrown.
This exception object has a hash map of failure information, which can easily be
parsed with Clojure {\tt ex-data} function. A simplified hash map of a spec error
for the {\tt (even? false)} function call might look like:
\begin{verbatim}
{:clojure.spec/problems [:pred integer?, :val false, :in [0]}],
:clojure.spec/args (false),
:clojure.spec.test/caller {:file "spec_error.clj", :line 87}}
\end{verbatim}
The stored information included:
\begin{itemize}
\item {\tt :clojure.spec/problems} containing the predicate, failing value, and location of the value
(the argument number, in this case 0, which indicates that it is the first argument).
\item {\tt :clojure.spec/args} containing all the arguments of the function.
\item {\tt :clojure.spec.test/callers} containing the file name and line number.
\end{itemize}
One prerequisite to provide user-friendly error messages is having enough information about the function failure. Since
default Clojure error does not provide sufficient data such as the name of failing function or the failing value, our
research group had to use a global variable to store this information, as we explained in Section~\ref{sec:assert}.
With clojure.spec, we no longer need a global variable because the hash map of {\tt clojure.lang.ExeptoinInfo} object
has much more information about the
failure compared to the default Clojure error. The information that the embedded hash map provides, contains which predicate
failed, what the failing value was, and what the number of the argument was (if the function takes multiple arguments).
In addition, they can be easily parsed, as we mentioned earlier.
\subsection{Approaches to improve error messages}\label{sec:approach}
With clojure.spec, we have used two different approaches to improve error messages. One way is using a function spec.
If function specs are attached to the Clojure core functions, the spec error is thrown when some core function
call fails. Then, we can simply parse necessary information from the hash map embedded in the spec error, and
process it before we show them to the user. For instance, {\tt min} function
takes one or more numbers, and returns the smallest number. We can verify the input of the {\tt min} function by providing a
function spec like the following:
\begin{verbatim}
(s/fdef clojure.core/min
:args (s/cat :check-numbers (s/+ number?)))
\end{verbatim}
This function spec definition checks if {\tt clojure.core/min} function takes at least one argument
and if all arguments are of type {\tt number}.
The main advantage of using a function spec is that we do not need to overwrite Clojure core functions. We can simply attach our
function specs to the Clojure core functions. It is also very flexible in that it can be turned off if necessary.
We eventually decided to use another approach using spec assert, after we
discovered some issues with the simpler approach of just using a function spec.
One serious issue is that function specs do not work for inline functions, which are a type of function
that is inserted by the complier when they are used.
Many of Clojure core arithmetic functions are inline functions for better performance.
Beginners often use arithmetic functions, including {\tt +}, {\tt -}, and {\tt *},
so not being able to use specs for them is a huge downside of the function spec.
Moreover, function specs have problems dealing with lazy sequences. When a lazy sequence,
that has an error inside, is passed to the function as an argument, the Clojure default error is thrown.
For example, let's revisit the {\tt even?} function that we explained in Section 2.2.
The {\tt even?} function has an attached spec.
\begin{verbatim}
(even? (map even? [3 4 true]))
\end{verbatim}
The function {\tt map} normally takes two arguments, a function and a collection of elements.
What it does is simply apply the given function to each of the collection's elements.
If there is an expression {\tt (map even? [1 2 3])}, the {\tt even?} function is applied to 1, 2, and 3.
As a result of its computation, {\tt map} returns a lazy sequence of false, true, and false
because only 2 is an even number.
Let's go back to the above example using two {\tt even?} functions.
First, the argument of outer {\tt even?} is checked. Then, the lazy sequence {\tt (map even? [3 4 true])} is
evaluated in some internal Clojure spec checking space. The problem is the function spec for {\tt even?} does not exist
when the inner {\tt (even? true)} is evaluated. Therefore, the default {\tt java.lang.IllegalArgumentException} is thrown.
Because of the issues mentioned above, our research team is currently using {\tt s/assert} in lieu of the function spec.
This approach is very similar to our pre-spec approach that uses {\tt :pre} and requires overwriting
Clojure core functions. However, we can solve the issues concerning the inline functions in that
the {\tt s/assert} checks the overwritten function's input before the call to the core function is made.
Following is the overwritten {\tt min} with spec assert. The function {\tt do} just evaluates
the expressions in order, which means {\tt s/assert} is checked prior to the function call to {\tt clojure.core/min}.
\begin{verbatim}
(defn min [arg1 & args]
(do
(s/assert (s/cat :check-numbers (s/+ number?)) [arg1 args])
(apply clojure.core/min arg1 args)))
\end{verbatim}
Just like the function spec, {\tt s/assert} starts validating data once the switch is turned on by {\tt (s/check-asserts true)}
Because our {\tt s/assert} is embedded inside the overwritten function, lazy sequence checking is not
an issue anymore as in function spec.
\section{Conclusions and future work}\label{sec:conclusion}
\subsection{Conclusions}
Clojure.spec provides a more appropriate approach for generating meaningful error messages than our
previous ad-hoc approach. It automatically collects information necessary to provide useful information
about causes of errors. It is also significantly more flexible: we can easily control what kind of information
we present to programmers. This opens a possibility of creating multiple levels of error messages,
depending on programmers' experience.
In the process of working with clojure.spec we have resolved several technical challenges, such as
dealing with inlined functions and lazy sequences.
There are still a few unresolved issues, in particular related to multiple errors (errors that happen
when evaluating a failing value for another error).
However, with clojure.spec being still in development
we may just need to wait until they are resolved by the Clojure community.
Overall, using clojure.spec has a promise of creating a robust and customizable system of error
handling that relies on features embedded in the language itself, rather than in a separate system.
\subsection{Future work}
There are still many challenges to overcome before Clojure can be easily adopted
in an introductory CS classroom. One of the future directions of our work is to continue evaluating what
error messages are helpful to beginners. Another significant roadblock is a lack of a beginner-friendly IDE
for Clojure, despite the effort of the Clojure community to develop one. Another challenge is to figure out which
features of Clojure should be available to beginners: another relatively new Clojure feature, known as transducers,
makes certain expressions that beginners can write by mistake be syntactically valid, but almost certainly unintended
and confusing. Whether it is possible to ``hide'' these features from beginners, ideally by using clojure.spec,
is a subject of future work.
\bibliographystyle{acm}
\bibliography{mics2017}
\end{document}