Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a new %{^HOOK} hash, similar to %SIG, and add support for "require__before" and "require__after" hooks. #20637

Merged
merged 7 commits into from
Mar 18, 2023

Conversation

demerphq
Copy link
Collaborator

@demerphq demerphq commented Dec 21, 2022

This PR adds a new %{^HOOK} hash, similar to %SIG, but for hooking keywords. The hash is magic in a similar way as %SIG, in that it validates its contents on store and actually maps to internal variables which store the values instead. The first two members of the hash are require__before and require__after. Members are expected to be of the form $keyword . "__" . $phase, where $keyword is the keyword that executes the hook, and $phase is when the hook is executed (Currently either "before" or "after"). The initial user visible implementation includes support for a hook which is executed before require, and a hook which is executed after require. The require__before hook may return a callback that is also executed after the require, but can be constructed in such a way that it shares state with the before hook that executed. The before hooks execute before nearly any other require related logic, including checking if the required item was already required, and the after hooks executed after the require completed, regardless as to whether the require died or was successful.

With the required__before hook we could fix Devel::TraceUse to not alter the stack when it is in use. It could also be used to detect circular dependencies, log all used modules, or track the memory and time taken to require a given module (with or without its dependencies).

In order to implement this there are some nice goodies from an internal point of view. mortal_destructor_sv() and mortal_svfunc_x() (and their macro wrappers SAVEMORTALDESTRUCTOR_SV() and SAVEMORTALSVFUNC_X()) can be used to execute a callback at the end of the current statement (either Perl or C callback respective). Also there is the SvREFCNT_dec_set_NULL() macro which can be used to decrement an SV* pointer and set it to NULL in a single statement, and a related function SvREFCNT_dec_ret_NULL() which decrements an SV's refcount and returns NULL, which is used by the SvREFCNT_dec_set_NULL() macro but can also be helpful in cases where a function wants to do this to an argument and then return NULL. There are also various other minor improvements to regen/mg_vtable.pl (trimming trailing whitespace on generated documentation and renaming of some confusingly named internal variables) and improvements to the documentation of the related functions in pod/perlguts.pod.

Everything has documentation, including the new utility functions and macros.

@haarg @ap @book

@demerphq demerphq marked this pull request as ready for review December 23, 2022 10:49
perl.c Outdated Show resolved Hide resolved
pp_ctl.c Outdated Show resolved Hide resolved
@leonerd
Copy link
Contributor

leonerd commented Jan 3, 2023

I've always thought that the abuse of %SIG which __DIE__ and __WARN__ make is incredibly ugly; I would much have preferred those two to have been written ${^DIEHOOK} and ${^WARNHOOK} so that every element of %SIG really does relate to a real POSIX signal.

It's rather too late to change those now, but can we avoid adding any more? I.e. can this please be spelled ${^REQUIREHOOK} instead?

@demerphq
Copy link
Collaborator Author

demerphq commented Jan 3, 2023

I.e. can this please be spelled ${^REQUIREHOOK} instead?

FWIW, my original implementation worked like that. But it was more work and it felt unperlish to me so i switched to %SIG. While I agree with your sentiment, and likely I would agree that if $SIG{__WARN__} was ${^WARN} or something it might be better, but that boat has sailed and I think that having /some/ hooks in magic caret vars and /some/ in %SIG is worse than having them all in %SIG, the element of surprise is reduced at the very least. There is also a small advantage with %SIG that you can enumerate all the hooks that are in effect by walking %SIG, which has some level of utility and affords a certain sense of future portability. For instance if its in %SIG then a generic "disable all hooks" is just local %SIG=(); (ignoring @INC hooks), with hooks all over the place that becomes rather more difficult and impossible to future proof against.

There is another advantage at the implementations level that adding support for validating a new special key in %SIG is nearly trivial, but adding a new var with new magic to front an internal state is rather more complex and invasive. I found it interesting how much simpler the code was when it was based on %SIG.

So frankly I'd prefer that we don't use a caret var for this. If PSC decides we must then so be it, it can be done, but I don't think its as good an idea as it sounds at first. And I mean that literally, my unpushed original code for this used ${^REQUIRE_HOOK} much as you suggested, but after playing with it I came to the conclusion that it just felt weird to have $SIG{__WARN__} on one hand and ${^REQUIRE_HOOK} on the other.

@demerphq
Copy link
Collaborator Author

demerphq commented Jan 3, 2023

@leonerd one option might be to introduce a %{^HOOK} variable. We could mirror warn and die between the two hashes. Then use that in new hooks.

@leonerd
Copy link
Contributor

leonerd commented Jan 3, 2023

Ahyes; that sounds like a better forward plan; a new place to put hooks would be nice.

In that case, since we'd want to move WARN and DIE out of the current place anyway, it probably makes little difference whether we add REQUIRE in to SIG for now anyway. Might as well continue on this PR as it is for now and move all of them later on.

@ap
Copy link
Contributor

ap commented Jan 3, 2023

I've always thought that the abuse of %SIG which __DIE__ and __WARN__ make is incredibly ugly

I agree it would be cleaner to only have %SIG concerned with POSIX signals, and have those other hooks live elsewhere. But they do make a sort of sense to me as well, considering they too are a program interruption somewhat akin to a signal.

A __REQUIRE__ hook in %SIG is a very weird place for it though. The only reason to put it there is that __DIE__ and __WARN__ happen to be there already, which also happens to make the implementation easier. Without that path dependence, it would seem a bizarre design decision.

One option might be to introduce a %{^HOOK} variable. We could mirror warn and die between the two hashes. Then use that in new hooks.

This, however, I have a severe allergic reaction to. Please don’t. Code which uses the new interface would not work on older perls, not because older perls lack capabilities to support the feature being used, but simply because the way to access the capability changed name. To put it another way: because fashion. Worse, dealing with these hooks is relevant inside some rather plumbing-level modules, where forcing churn is just going to impose more forced perl version obsolescence on the ecosystem.

The same reasoning is going to apply in the future, of course, so adding a %{^HOOK} variable now – but for require only, with no mirroring of the %SIG hooks – seems like a good idea, if that seems like something there will be more use for later. (Btw, also, can we have lowercase keys in that case? (Or rather, simply use the name of the function being hooked, verbatim.) That %SIG keys are all uppercase makes sense, but IMO %{^HOOK} should not be beholden to that precedent.)

@demerphq
Copy link
Collaborator Author

demerphq commented Jan 4, 2023

@ap: to you it is a category error to put REQUIRE with DIE and WARN in %SIG. But i dont agree with you really, I see a "signal" generated when we are about to compile more code to be compatible with the notion of a generic signal, as much as DIE or WARN do anyway. Remember that require is triggered invisibly by use, and internally by using of various functionality or variables. So when someone "uses" code, having a "signal" triggered at the start of that process isn't that weird to me. Certainly to me it is not so obviously wrong that it seems like a "bizarre design decision". To me it seems a design decision well within the general concept of a signal triggered when perl does something invisibly under the hood. Especially as the require hook really is a true /signal/, unlike DIE or WARN, in that it cannot affect the execution of the require other than to throw an exception, both DIE and WARN allow you to modify the behavior of the die/warn functions. So they arent just signal handlers but also mutators.

The only reason you seem to be distinguishing DIE and WARN from REQUIRE is that DIE and WARN are part of an exception handling process. To me the thing that the %SIG keys have in common is that they can all be triggered automatically by perl or the system itself when things happen. Perl requires modules behind the scenes, just as it throws exceptions, and thus it seems to me that they have more in common than they differ, and in fact, that REQUIRE, not being a mutator, actually has more in common with other signal handlers than it does with DIE or WARN and that it is really those two which are exceptional. [1]

With regard to %{^HOOK} or whatever we call it (if we go ahead with it), i was thinking we would just "tie" the DIE and WARN keys in %SIG to their equivalents in %{^HOOK}, not necessarily removing the keys from %SIG until there has been a very very long deprecation cycle. The only thing that might break back compat is someone using only %{^HOOK} instead of %SIG, but that is the same as someone using defer{}. It would also be possible to create a shim to make %{^HOOK} work properly on older perls if we wanted.

But i take your point that introducing a %{^HOOK} variable may just end up making things worse not better, that was part of my reason for saying "the only thing worse than putting this in %SIG while WARN and DIE are there is to have a different hook not in SIG." The precedent has been established, no matter how explicable we find that precedent, and we should just stick with it.

[1] the return behavior of the REQUIRE hook is a bit exceptional. but the actual location of the hook to me not so much.

@demerphq
Copy link
Collaborator Author

demerphq commented Jan 4, 2023

@ap: Consider this case. opening a scalar file buffer auto-requires PerlIO::scalar.

$ ./perl -Ilib -le'$SIG{__REQUIRE__}= sub { print "__REQUIRE__: @_" }; open my $fh,"<", \"" or die;'
__REQUIRE__: PerlIO.pm
__REQUIRE__: PerlIO/scalar.pm
__REQUIRE__: XSLoader.pm
__REQUIRE__: strict.pm

To me this feels very much like what perl level %SIGnals do. Tell us when something internal has happened that we might want to know about.

@demerphq
Copy link
Collaborator Author

So are there any objections to this moving forward?

@demerphq
Copy link
Collaborator Author

This feels warnocked. I would like to merge this for the next release.

@haarg
Copy link
Contributor

haarg commented Jan 18, 2023

@ap: to you it is a category error to put __REQUIRE__ with __DIE__ and __WARN__ in %SIG. But i dont agree with you really, I see a "signal" generated when we are about to compile more code to be compatible with the notion of a generic signal, as much as __DIE__ or __WARN__ do anyway. Remember that require is triggered invisibly by use, and internally by using of various functionality or variables. So when someone "uses" code, having a "signal" triggered at the start of that process isn't that weird to me. Certainly to me it is not so obviously wrong that it seems like a "bizarre design decision". To me it seems a design decision well within the general concept of a signal triggered when perl does something invisibly under the hood. Especially as the require hook really is a true /signal/, unlike DIE or WARN, in that it cannot affect the execution of the require other than to throw an exception, both DIE and WARN allow you to modify the behavior of the die/warn functions. So they arent just signal handlers but also mutators.

But all of the other %SIG entries do change the execution, just by being installed. For POSIX signals, installing a %SIG entry prevents the system default signal handler from being executed.

The only reason you seem to be distinguishing DIE and WARN from REQUIRE is that DIE and WARN are part of an exception handling process. To me the thing that the %SIG keys have in common is that they can all be triggered automatically by perl or the system itself when things happen. Perl requires modules behind the scenes, just as it throws exceptions, and thus it seems to me that they have more in common than they differ, and in fact, that REQUIRE, not being a mutator, actually has more in common with other signal handlers than it does with DIE or WARN and that it is really those two which are exceptional. [1]

With regard to %{^HOOK} or whatever we call it (if we go ahead with it), i was thinking we would just "tie" the __DIE__ and __WARN__ keys in %SIG to their equivalents in %{^HOOK}, not necessarily removing the keys from %SIG until there has been a very very long deprecation cycle. The only thing that might break back compat is someone using only %{^HOOK} instead of %SIG, but that is the same as someone using defer{}. It would also be possible to create a shim to make %{^HOOK} work properly on older perls if we wanted.

But i take your point that introducing a %{^HOOK} variable may just end up making things worse not better, that was part of my reason for saying "the only thing worse than putting this in %SIG while WARN and DIE are there is to have a different hook not in SIG." The precedent has been established, no matter how explicable we find that precedent, and we should just stick with it.

[1] the return behavior of the __REQUIRE__ hook is a bit exceptional. but the actual location of the hook to me not so much.

I don't think that require being used "behind the scenes" is really justification for it being handled differently from hooking into any other CORE:: function. And I think the fact that __REQUIRE__ functions entirely differently from all the other %SIG entries means it doesn't belong there. The magic return value is different from any other %SIG entry, and I don't think the desire to wrap require justifies the special case.

The original problem leading to this PR is that wrapping CORE::GLOBAL::require is problematic. This also applies to a number of other CORE functions, such as open and system. Maintaining context and correct parsing is hard or impossible. So I'd like to see a more general solution for attaching before/after actions to core subs. A separate variable from %SIG seems more appropriate. Since the hooks that this PR wants to implement work differently from the existing __DIE__ and __WARN__ handlers work, I don't think they need to be moved or aliased into the new feature.

@haarg
Copy link
Contributor

haarg commented Jan 18, 2023

I'll add that while I'd like to see a more general solution for this type of hook, I think it's fine if there's an initial implementation that only works for require. But I'd like the shape of it to allow future extension, and I don't think %SIG is the right place for that.

@demerphq
Copy link
Collaborator Author

demerphq commented Jan 18, 2023

Hi @haarg. I have outlined why i dont think a separate variable is a good idea. So while I respect your opinion, if that is the strongest point you have here, then I think we will have to just agree to disagree. IMO The only thing worse than having __WARN__, __DIE__ and __REQUIRE__ be in %SIG is having __WARN__ and __DIE__ be in a different data structure from __REQUIRE__ itself. Having this live in %SIG means that %SIG=(); clears the hooks and restores "normal behavior". If we introduce a new variable for this, then coders need to know about both. I think that is clearly worse than whatever conceptual friction you have with putting this stuff alongside the other hooks in %SIG.

I actually do agree that it would have been better if Larry had put the DIE and WARN hooks somewhere else, and he had I would have put the REQUIRE hook in the same place, but that boat has sailed. Lets not make the problem worse.

Also, I do agree that I would not be pushing REQUIRE at all if it was sane to override require and other core ops properly. But it isn't and it seems likely it wont be for some time, if ever. So I think we should go ahead with REQUIRE as is.

@Leont
Copy link
Contributor

Leont commented Jan 18, 2023

I don't quite like it in %SIG either (and hate the existing exceptions anyway).

@demerphq
Copy link
Collaborator Author

@Leont yeah, I understand. I dont like a number of design decisions in Perl. :-) But once they are baked in we have to consider this from the point of view of "least worst" options, not as though we have a clear field. And I think that the least worst option here is just swallow our dislike of this. The other option opens up its own can of portability worms. Specifcally that code that currently exists and does:

local %SIG;

wll continue to work as it used to: disabling all signal handlers, OS level and Perl level.

If we introduce a new var that is no longer true, and it becomes yet another thing people need to deal with.

So the question here isnt "is this the ideal way to do this", it is "what is the least sucky way to do this". And IMO the objections about it being in %SIG are a matter of taste, which is important, but it lacks technical foundation.

@Grinnz
Copy link
Contributor

Grinnz commented Jan 18, 2023

I'm not sure what you mean. @haarg 's objections were entirely technical (and seem reasonable to me).

@karenetheridge karenetheridge marked this pull request as draft January 18, 2023 23:25
@wchristian
Copy link
Contributor

Gotta agree that rather than messing more with %SIG, which should be posix signals, the optimal solution here would be a completely separate mechanism for hooking perl core function, which could then also include die and warn and permit future hooking on those without the issues %SIG replacements present, maybe allowing phasing that out in some far future.

@demerphq demerphq marked this pull request as ready for review January 20, 2023 01:58
regen/mg_vtable.pl Outdated Show resolved Hide resolved
scope.c Outdated Show resolved Hide resolved
scope.c Outdated Show resolved Hide resolved
scope.c Outdated
void
Perl_mortal_destructor_x(pTHX_ DESTRUCTORFUNC_t f, void *p) {
PERL_ARGS_ASSERT_MORTAL_DESTRUCTOR_X;
mortal_destructor_sv((SV*)f,(SV *)p);
Copy link
Contributor

@Leont Leont Mar 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will treat f and p as refcounted, which is probably a bad idea.

Just using a different vtable is probably the easiest solution, this should be the simpler case anyway.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reworked the code a little bit to use SVFUNC_t which requires a SV * argument instead of DESTRUCTORFUNC_t which supports a void * argument. Also the code you quoted is/was just broken, the function needed to be wrapped in a SVt_IV and the f argument needed to be converted into an IV with PTR2IV(). But once I did that it seems work as expected and passes the tests I added to XS-APITest.

I am not really sure what we win by creating a new vtable to handle this and it would cost us one of the magic chars. It seems to me with the adjustments I just made it is fine as it is.

Can you review once more please?

@demerphq demerphq force-pushed the yves/require_hook_rebased branch 2 times, most recently from f882dcc to d848440 Compare March 17, 2023 13:15
@demerphq demerphq changed the title Add support for a __REQUIRE__ hook in %SIG. Add a new %{^HOOK} hash, similar to %SIG, and add support for "require__before" and "require__after" hooks. Mar 17, 2023
scope.h Outdated Show resolved Hide resolved
@haarg
Copy link
Contributor

haarg commented Mar 17, 2023

I generally like this. I'm not entirely certain about the interface - having two types of after hooks feels like it might indicate there's a better option. But that isn't a rejection.

But is this something that should be experimental?

@demerphq
Copy link
Collaborator Author

But is this something that should be experimental?

I don't see how that would be advantageous. This doesn't get in the way and its not a core part of the language nor does it affect code that already exists. (Like most use of experimental features.)

having two types of after hooks feels like it might indicate there's a better option

There aren't two types of after hook. The is a post-before hook and an after hook. They aren't the same thing. I explained why we have both in previous post, sharing state with /just/ an after hook is error prone and unreliable. On the other hand, @ap requested we provide a separate after hook for people who don't want to use a before hook and just want to do something after each require. I just gave him what he requested.

@iabyn
Copy link
Contributor

iabyn commented Mar 17, 2023 via email

I want to use these modules in other tests, so changing the name
makes sense.
and also SvREFCNT_dec_ret_NULL() which is used to implement
SvREFCNT_dec_set_NULL(). The set_NULL() macro is intended to
be used to replace code like this:

    if (sv) {
        SvREFCNT_dec_NN(sv);
        sv = NULL;
    }

The function form just facilitates it, and can be used in situations
where returning NULL after decrementing a refcount would be reduce
code complexity.
The function SAVEDESTRUCTOR_X() (save_destructor_x) can be used to
execute a C function at the end of the current psuedo-block. Prior to
this patch there was no "mortal" equivalent that would execute at the
end of the current statement. We offer a collection of functions which
are intended to free SV's at either point in time, but only support
callbacks at the end of the current pseudo-block.

This patch adds two such functions, "mortal_destructor_sv" which can be
used to trigger a perl code reference to execute at the end of the
current statement, and "mortal_svfunc_x" which can be used to trigger an
SVFUNC_t C function at the end of the current statement.

Both functions differ from save_destructor_x() in that instead of
supporting a void pointer argument they both require their argument to
be some sort of SV pointer. The Perl callback function triggered by
"mortal_destructor_sv" may be provided no arguments, a single argument
or a list of arguments, depending on the type of argument provided to
mortal_destructor_sv(): when the argument is a raw AV (with no SV ref
wrapping it), then the contents of the AV are passed in as a list of
arguments. When the argument is anything else but NULL, the argument is
provided as a single argument, and when it is NULL the perl function is
called with no arguments.

Both functions are implemented on top of a mortal SV (unseen by the
user) which has PERL_MAGIC_destruct magic associated with it, which
triggers the destructor behavior when the SV is freed.

Both functions are provided with macros to match the normal SAVExx()
API, with MORTALDESTRUCTOR_SV() wrapping mortal_destructor_sv() and
MORTALSVFUNC_X() wrapping mortal_svfunc_x().

The heart of this logic cribbed from Leon Timmermans' Variable-OnDestruct.
See the code at:

https://metacpan.org/dist/Variable-OnDestruct/source/lib/Variable/OnDestruct.xs#L6-17

I am very grateful to him for his help on this. Any errors or omissions
in this code are my fault, not his.
This defines a new magic hash C<%{^HOOK}> which is intended to be used for
hooking keywords. It is similar to %SIG in that the values it contains
are validated on set, and it is not allowed to store something in
C<%{^HOOK}> that isn't supposed to be there. Hooks are expected to be
coderefs (people can use currying if they really want to put an object
in there, the API is deliberately simple.)

The C<%{^HOOK}> hash is documented to have keys of the form
"${keyword}__${phase}" where $phase is either "before" or "after"
and in this initial release two hooks are supported,
"require__before" and "require__after":

The C<require__before> hook is called before require is executed,
including any @inc hooks that might be fired. It is called with the path
of the file being required, just as would be stored in %INC. The hook
may alter the filename by writing to $_[0] and it may return a coderef
to be executed *after* the require has completed, otherwise the return
is ignored.  This coderef is also called with the path of the file which
was required, and it will be called regardless as to whether the require
(or its dependencies) die during execution.  This mechanism makes it
trivial and safe to share state between the initial hook and the coderef
it returns.

The C<require__after> hook is similar to the C<require__before> hook
however except that it is called after the require completes
(successfully or not), and its return is ignored always.
@haarg
Copy link
Contributor

haarg commented Mar 17, 2023

I don't see how that would be advantageous. This doesn't get in the way and its not a core part of the language nor does it affect code that already exists. (Like most use of experimental features.)

It isn't syntax, but it's still a part of the language. There are plenty of other experimental things that have had warnings but no feature flag, because they didn't interfere with any existing syntax. postderef, const_attr, and builtin for example. smartmatch as well (not switch).

having two types of after hooks feels like it might indicate there's a better option

There aren't two types of after hook. The is a post-before hook and an after hook. They aren't the same thing.

Those are both "after" hooks, and I never said they were the same thing. I understand the need for carrying state between the hooks. Having both types feels a bit awkward, but I'm not sure there's a better option.

@demerphq demerphq merged commit 93f6f96 into blead Mar 18, 2023
@demerphq demerphq deleted the yves/require_hook_rebased branch March 18, 2023 12:58
@Leont
Copy link
Contributor

Leont commented Mar 19, 2023

Yeah, but SAVEMORTALIZESV() still uses the save stack. It's action is to
mortalise the SV in leave_scope() on scope exit

So I don't think SAVE should be in the names.

I think that is a strong argument for renaming them before the next major release.

@demerphq
Copy link
Collaborator Author

I think that is a strong argument for renaming them before the next major release.

Done already. I noted in reply to @leonerd's original request, which davem was replying to, but because he replied via email it was not logged in the right place.

(This is just for the record, you and I discussed this already in IRC.)

pjacklam pushed a commit to pjacklam/perl5 that referenced this pull request May 20, 2023
Historically we used to parse out the tests that we ran in t/harness
from the MANIFEST file.  At some point this changed and we started
consulting the disk using globs.  However because we do not use a
recursive search over the t/ directory it is quite possible that a new
directory of tests is added which actually never runs.

In Perl#20637 (comment) Tony
C noticed that I had added a new test file t/op/hook/require.t which is
in a new subdirectory t/op/hook/ which was unknown to t/harness and thus
not actually being run by make test_harness.  (This patch does NOT add
t/op/hoop to the list of directories to scan, I will do that in the PR.)

I then took the time to add code to detect if any other test files are
not being run, and it turns out that it is also the case for the new
t/class/ directory of tests and it is also the case for the tests for
test.pl itself, found in the t/test_pl directory.

This patch adds logic to detect if this happens and make t/harness die
if it finds a test file in the manifest which will not be detected by
the custom rules for finding test files that is used in t/harness.  It
does not die if t/harness finds tests that are not in MANIFEST, that
should be detected by a different test.

The level of complexity in finding and deciding the tests files that we
should run, and the differences between t/TEST and t/harness is fairly
high.  In the past Nicholas put some effort into unifying the logic, but
it seems since then we have drifted apart.  Even though t/harness uses
t/TEST and the _tests_from_manifest() function, for some time now it
has only used it to find which extensions to test, not which test
files to run.  I have *NOT* dug into whether t/TEST is also missing
test files that are in manifest.  That can happen in a follow up patch.

Long term we should unify all of this logic so that t/TEST and t/harness
run the same test files always, and that we will always detect
discrepancies between the MANIFEST and the tests we are running.  We do
not for instance test that they test the same things. :-) :-(
pjacklam pushed a commit to pjacklam/perl5 that referenced this pull request May 20, 2023
Historically we used to parse out the tests that we ran in t/harness
from the MANIFEST file.  At some point this changed and we started
consulting the disk using globs.  However because we do not use a
recursive search over the t/ directory it is quite possible that a new
directory of tests is added which actually never runs.

In Perl#20637 (comment) Tony
C noticed that I had added a new test file t/op/hook/require.t which is
in a new subdirectory t/op/hook/ which was unknown to t/harness and thus
not actually being run by make test_harness.  (This patch does NOT add
t/op/hoop to the list of directories to scan, I will do that in the PR.)

I then took the time to add code to detect if any other test files are
not being run, and it turns out that it is also the case for the new
t/class/ directory of tests and it is also the case for the tests for
test.pl itself, found in the t/test_pl directory.

This patch adds logic to detect if this happens and make t/harness die
if it finds a test file in the manifest which will not be detected by
the custom rules for finding test files that is used in t/harness.  It
does not die if t/harness finds tests that are not in MANIFEST, that
should be detected by a different test.

The level of complexity in finding and deciding the tests files that we
should run, and the differences between t/TEST and t/harness is fairly
high.  In the past Nicholas put some effort into unifying the logic, but
it seems since then we have drifted apart.  Even though t/harness uses
t/TEST and the _tests_from_manifest() function, for some time now it
has only used it to find which extensions to test, not which test
files to run.  I have *NOT* dug into whether t/TEST is also missing
test files that are in manifest.  That can happen in a follow up patch.

Long term we should unify all of this logic so that t/TEST and t/harness
run the same test files always, and that we will always detect
discrepancies between the MANIFEST and the tests we are running.  We do
not for instance test that they test the same things. :-) :-(
khwilliamson pushed a commit to khwilliamson/perl5 that referenced this pull request Jul 10, 2023
Historically we used to parse out the tests that we ran in t/harness
from the MANIFEST file.  At some point this changed and we started
consulting the disk using globs.  However because we do not use a
recursive search over the t/ directory it is quite possible that a new
directory of tests is added which actually never runs.

In Perl#20637 (comment) Tony
C noticed that I had added a new test file t/op/hook/require.t which is
in a new subdirectory t/op/hook/ which was unknown to t/harness and thus
not actually being run by make test_harness.  (This patch does NOT add
t/op/hoop to the list of directories to scan, I will do that in the PR.)

I then took the time to add code to detect if any other test files are
not being run, and it turns out that it is also the case for the new
t/class/ directory of tests and it is also the case for the tests for
test.pl itself, found in the t/test_pl directory.

This patch adds logic to detect if this happens and make t/harness die
if it finds a test file in the manifest which will not be detected by
the custom rules for finding test files that is used in t/harness.  It
does not die if t/harness finds tests that are not in MANIFEST, that
should be detected by a different test.

The level of complexity in finding and deciding the tests files that we
should run, and the differences between t/TEST and t/harness is fairly
high.  In the past Nicholas put some effort into unifying the logic, but
it seems since then we have drifted apart.  Even though t/harness uses
t/TEST and the _tests_from_manifest() function, for some time now it
has only used it to find which extensions to test, not which test
files to run.  I have *NOT* dug into whether t/TEST is also missing
test files that are in manifest.  That can happen in a follow up patch.

Long term we should unify all of this logic so that t/TEST and t/harness
run the same test files always, and that we will always detect
discrepancies between the MANIFEST and the tests we are running.  We do
not for instance test that they test the same things. :-) :-(
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.