Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"The clause ... cannot be reached" when matching on list patterns #405

Open
erszcz opened this issue Apr 2, 2022 · 10 comments
Open

"The clause ... cannot be reached" when matching on list patterns #405

erszcz opened this issue Apr 2, 2022 · 10 comments

Comments

@erszcz
Copy link
Collaborator

erszcz commented Apr 2, 2022

Even with #404 when running make gradualize we get this report:

ebin/constraints.beam: The clause on line 43 at column 1 cannot be reached

The problematic code from constraints.erl when minimised gives the following example:

-spec i([atom()]) -> ok.
i([]) -> ok;
i([Cs]) -> ok;
i([C1, C2 | Cs]) -> ok.

The thrown error is:

check_clause(_Env, [?type(none)|_], _ResTy, {clause, P, _Args, _Guards, _Block}, _) ->
    throw({type_error, unreachable_clause, P});

Which means that if the argument type is none when checking a clause, it's considered as unreachable. Now, let's see how the argument type is refined as we check subsequent clauses:

  1. We start with type(list, [atom()]) from the function spec.
  2. After the first clause matching on an empty list the type gets refined to type(nonempty_list, [atom()]). The pattern of the second clause is inferred as type(nonempty_list, [atom()]). This leads to the following refinement after the second clause (1st param is the type being refined, 2nd param is the second clause pattern type):
{trace,<0.86.0>,call,
    {typechecker,refine_ty,
        [{type,0,tuple,[{type,0,nonempty_list,[{type,0,atom,[]}]}]},
         {type,0,tuple,[{type,0,nonempty_list,[{type,0,atom,[]}]}]},
         #{{type,0,tuple,[{type,0,nonempty_list,[{type,0,atom,[]}]}]} => {}},
         {env,
             #{{i,1} =>
                   [{type,
                        {7,8},
                        bounded_fun,
                        [{type,
                             {7,8},
                             'fun',
                             [{type,
                                  {7,8},
                                  product,
                                  [{type,{7,9},list,[{type,{7,10},atom,[]}]}]},
                              {atom,{7,22},ok}]},
                         []]}]},
             #{},#{},
             #{module => list_exhaustiveness_checking_regressions,
               records => #{},types => #{}},
             false,false,true,30,
             [{type,
                  {7,8},
                  bounded_fun,
                  [{type,
                       {7,8},
                       'fun',
                       [{type,
                            {7,8},
                            product,
                            [{type,{7,9},list,[{type,{7,10},atom,[]}]}]},
                        {atom,{7,22},ok}]},
                   []]}]}]}}
{trace,<0.86.0>,return_from,{typechecker,refine_ty,4},{type,0,none,[]}}

This means that the input type to the third clause is type(none). Without using extra information about the list pattern length we won't be able to disambiguate between types of the second and third clauses of the function.

I'm not sure what the right thing to do is:

  1. Withhold from reporting a warning if we know the function has more than two clauses? We would report less errors, but at least we would not report false negatives.
  2. Do not report unreachable_clause at all? That would be a significant loss of useful functionality (this warning is one of the core benefits shown in Gradient's TypedServer demo) :(
@erszcz
Copy link
Collaborator Author

erszcz commented Apr 2, 2022

Apparently, this false negative is quite common in real world code. It pops up in Gradualizer results on two out of three OTP stdlib files which https://github.com/vrnithinkumar/ETC was benchmarked on: orddict and ordsets.

@erszcz
Copy link
Collaborator Author

erszcz commented Apr 3, 2022

This is quite a common pattern in general. Examples in Gradualizer itself:

$ make gradualize | grep "cannot be reached"
ebin/constraints.beam: The clause on line 43 at column 1 cannot be reached
ebin/gradualizer_db.beam: The clause on line 356 at column 1 cannot be reached
ebin/gradualizer_int.beam: The clause on line 175 at column 1 cannot be reached
ebin/typechecker.beam: The clause on line 767 at column 17 cannot be reached
make: *** [gradualize] Error 1

@erszcz
Copy link
Collaborator Author

erszcz commented Apr 3, 2022

It's possible to tell from the pattern if it matches on a fixed-size list or on a list of any length - this could be used to make the refinement more precise:

3> merl:quote("fun ([a]) -> ok end").
{'fun',1,
       {clauses,[{clause,1,
                         [{cons,1,{atom,1,a},{nil,1}}],
                         [],
                         [{atom,1,ok}]}]}}
4> merl:quote("fun ([a | L]) -> ok end").
{'fun',1,
       {clauses,[{clause,1,
                         [{cons,1,{atom,1,a},{var,1,'L'}}],
                         [],
                         [{atom,1,ok}]}]}}
5> merl:quote("fun ([a | R]) -> ok end").
{'fun',1,
       {clauses,[{clause,1,
                         [{cons,1,{atom,1,a},{var,1,'R'}}],
                         [],
                         [{atom,1,ok}]}]}}
6> merl:quote("fun ([a | _]) -> ok end").
{'fun',1,
       {clauses,[{clause,1,
                         [{cons,1,{atom,1,a},{var,1,'_'}}],
                         [],
                         [{atom,1,ok}]}]}}
7> merl:quote("fun ([a, b | _]) -> ok end").
{'fun',1,
       {clauses,[{clause,1,
                         [{cons,1,{atom,1,a},{cons,1,{atom,1,b},{var,1,'_'}}}],
                         [],
                         [{atom,1,ok}]}]}}
8> merl:quote("fun ([a, b]) -> ok end").
{'fun',1,
       {clauses,[{clause,1,
                         [{cons,1,{atom,1,a},{cons,1,{atom,1,b},{nil,1}}}],
                         [],
                         [{atom,1,ok}]}]}}

Let's write list(T, 4) for a 4 element long list of elements of type T and list(T) for an arbitrary length list of elements of type T. Then:

  • refine_ty(list(T), list(T, N)) -> list(T)
  • refine_ty(list(T, N), list(T, N)) -> none()
  • refine_ty(list(T, N), list(T)) -> none() -- this clause is not likely to be needed, as it's not possible to express the length constraint using Erlang's type syntax; see below for the rest of the rationale.

We'd still have a problem with refine_ty(nonempty_list(T), list(T, N)) as we don't know the length of the non-empty list :| Defining it to refine_ty(nonempty_list(T), list(T, N)) -> list(T) seems to be the best bet. Then, the original example:

-spec i([atom()]) -> ok.
i([]) -> ok;
i([Cs]) -> ok;
i([C1, C2 | Cs]) -> ok.

Would be refined as follows:

list(atom) -- we start with the spec
list(atom()) \ nil() -> nonempty_list(atom()) -- we refine with the first clause pattern []
nonempty_list(atom()) \ list(atom(), 1) -> list(atom()) -- we refine with the fixed-length pattern [Cs] from the second clause
list(atom()) \ list(atom()) -> none() -- we refine with the arbitrary (but at least 2 element long) length pattern [C1, C2 | Cs] from the third clause
none() -- all clauses are reachable and exhaust the input type list(atom)

Now back to why refine_ty(list(T, N), list(T)) -> none() is not likely to be needed -- for this clause to be useful, our list(T, N) type synthesised from a pattern would have to appear as a type refined with another type (i.e. 1st param of refine_ty). However, the synthesised types are usually used as the types we refine with (i.e. 2nd params of refine_ty), not as the types being refined. Of course, I don't have all the code paths in my head to guarantee this being the case - just thinking out loud.

@erszcz
Copy link
Collaborator Author

erszcz commented Apr 3, 2022

@zuiderkwast @berbiche WDYT?

@zuiderkwast
Copy link
Collaborator

zuiderkwast commented Apr 3, 2022

I think we have discussed this long ago: extending the type language with fixed-length lists.

I think we should avoid extending the type language seen by the user. We do need to distinguish between list patterns that exhausts a list type and one that doesn't though, internally.

A pattern like [A] is never exhaustive for any type, so it cannot be used for refinement nor for exhaustiveness, but [A | B] exhausts a non-empty list type if the variables are free. Can't we implement this without extending the type language with fixed-lengh lists?

@erszcz
Copy link
Collaborator Author

erszcz commented Apr 3, 2022

I think we should avoid extending the type language seen by the user. We do need to distinguish between list patterns that exhausts a list type and one that doesn't though, internally.

👍

Can't we implement this without extending the type language with fixed-lengh lists?

I'm not sure yet, but I think we can. We'll see if that pans out, but I hope so!

@erszcz
Copy link
Collaborator Author

erszcz commented Apr 5, 2022

Let's try another refinement example:

-spec j([atom()]) -> ok.
j([]) -> ok;
j([C1, C2 | _]) -> ok;
j([Cs]) -> ok.

The refinement would go as follows:

list(atom) -- we start with the spec
list(atom()) \ nil() -> nonempty_list(atom()) -- 1st clause
nonempty_list(atom()) \ nonempty_list(atom()) -> none() -- 2nd clause
none() \ list(atom(), 1) -> ?

This manifests in practice with (using erszcz@0b3bec6):

17:39:49 erszcz @ x6 : ~/work/erszcz/gradualizer (fix-list-exhaustiveness-checking-regressions-5 %)
$ nl -b a test/should_pass/list_exhaustiveness_checking_regressions.erl
     1	-module(list_exhaustiveness_checking_regressions).
     2
     3	-export([
     4	         %f/2, g/1, h/1,
     5	         i/1,
     6	         j/1
     7	        ]).
     8
     9	-spec i([atom()]) -> ok.
    10	i([]) -> ok;
    11	i([Cs]) -> ok;
    12	i([C1, C2 | Cs]) -> ok.
    13
    14	-spec j([atom()]) -> ok.
    15	j([]) -> ok;
    16	j([C1, C2 | _]) -> ok;
    17	j([Cs]) -> ok.
17:39:56 erszcz @ x6 : ~/work/erszcz/gradualizer (fix-list-exhaustiveness-checking-regressions-5 %)
$ ./bin/gradualizer test/should_pass/list_exhaustiveness_checking_regressions.erl
test/should_pass/list_exhaustiveness_checking_regressions.erl: The clause on line 17 at column 1 cannot be reached

What if we could store the length of the list type inferred from a list pattern?
Here, instead of using nonempty_list(A) we use list(A, 1+) for a list of length 1 or more and list(A, 1) for a list of length 1.

list(atom(), 0+) -- we start with the spec
list(atom(), 0+) \ list(top(), 0) -> list(atom(), 1+) -- 1st clause
list(atom(), 1+) \ list(atom(), 2+) -> list(atom(), 1) -- 2nd clause
list(atom(), 1) \ list(atom(), 1) -> none() -- 3rd clause
none() -- hooray \o/

@zuiderkwast
Copy link
Collaborator

zuiderkwast commented Apr 5, 2022

Yes, if we encode length in the list type, we could do exhaustiveness checking in this case.

If we don't do it, we can still create a sound type system and just not do exhaustiveness-checking in this case, if we consider exhaustiveness-checking a bonus, i.e. not a guarantee for all lists of clauses.

The error is that we perform the 2nd refinement. We shouldn't, because [C1, C2 | _] does not exhaust nonempty_list(atom()).

The example again:

-spec j([atom()]) -> ok.
j([]) -> ok;
j([C1, C2 | _]) -> ok;
j([Cs]) -> ok.

This is what we should do (if we don't introduce list(T, N+), etc.):

  1. list(atom) -- we start with the spec
  2. list(atom()) \ nil() -> nonempty_list(atom()) -- 1st clause
  3. nonempty_list(atom()) \ none() -> nonempty_list(atom()) -- 2nd clause: add_type_pat returns a tuple with none() as the 1st element (see Fix some self-gradualization errors #408 (comment)) and disables exhaustiveness checking for the list of clauses, since if we don't keep track of list lengths we can't know if the clauses are exhaustive.
  4. nonempty_list() \ none() -> nonempty_list() -- 3rd clause (same reasoning as 3)
  5. No exhaustiveness error, since we flagged the list of clauses as exhaustiveness unknown in step 3.

@erszcz
Copy link
Collaborator Author

erszcz commented Apr 5, 2022

Hmm, indeed, it might make sense to disable it for lists 🤔 Exhaustiveness checking makes the most sense for variant types anyway.

@zuiderkwast
Copy link
Collaborator

I think we can do exhaustiveness-checking for lists in some cases, but not always. For example, we can do it for [] and [_|_].

In general, I think we can disable exhaustiveness-checking whenever add_type_pat returns none() in the first tuple element. This is exactly when a pattern doesn't exhaust a type or we don't know if a pattern exhausts a type. Do you think that'd be correct?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants