Avoiding explicit NimMain - Improving Nim libraries experience #538

Open
mratsim opened this issue Oct 22, 2023 · 3 comments

@mratsim
Collaborator

mratsim commented Oct 22, 2023

Followup on the discussion at: https://discord.com/channels/371759389889003530/768367394547957761/1130053134727782410

Currently, using a Nim library usually requires calling NimMain to initialize global variables and the Nim runtime.

This is extra friction, especially when we want to replicate C libraries that don't require this.

Motivating example

For example, many scientific libraries autodetect support for CPU features and dispatch to a matching implementation. One way is through the compiler, by reusing the same function name with different target features:

  • A function for SSE
  • A function for AVX
  • ...

There are several ways to implement this. Agner Fog lists at least a few in section 13.5 of https://www.agner.org/optimize/optimizing_cpp.pdf (the list was posted as an image, which is not reproduced here).

And GCC function multiversioning: https://gcc.gnu.org/wiki/FunctionMultiVersioning

// Compile as C++ (g++): GCC supports this form of multiversioning only in C++.
#include <cassert>

__attribute__ ((target ("default")))
int foo ()
{
  // The default version of foo.
  return 0;
}

__attribute__ ((target ("sse4.2")))
int foo ()
{
  // foo version for SSE4.2
  return 1;
}

__attribute__ ((target ("arch=atom")))
int foo ()
{
  // foo version for the Intel ATOM processor
  return 2;
}

__attribute__ ((target ("arch=amdfam10")))
int foo ()
{
  // foo version for the AMD Family 0x10 processors.
  return 3;
}

int main ()
{
  int (*p)() = &foo;
  assert ((*p) () == foo ());
  return 0;
}

Current situation

We assume that we only want to ask for CPU capabilities once and not at each function call. Hence we need to:

  1. Call the CPU features detection function once.
  2. Either store the features detected in a global variable.
  3. Or store the correct functions to call, depending on the feature detected.
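The steps above can be sketched in C as follows. This is a minimal illustration, not code from any of the libraries mentioned: the names `detect_cpu_features`, `sum_ints`, `sum_avx2`, `FEAT_*` and `detect_calls` are hypothetical, the "AVX2" variant is only a stand-in that reuses the generic code, and detection uses the GCC/Clang builtins on x86 while reporting no features elsewhere.

```c
#include <stdint.h>

/* Hypothetical feature bits, named for illustration only. */
enum { FEAT_SSE42 = 1, FEAT_AVX2 = 2 };

/* Demonstration counter: how many times detection actually ran. */
static int detect_calls = 0;

/* Step 1: query the CPU. Falls back to "no features" off x86 or
   on compilers without the GCC-style builtins. */
static uint32_t detect_cpu_features(void) {
  uint32_t f = 0;
  detect_calls++;
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
  __builtin_cpu_init();
  if (__builtin_cpu_supports("sse4.2")) f |= FEAT_SSE42;
  if (__builtin_cpu_supports("avx2"))   f |= FEAT_AVX2;
#endif
  return f;
}

/* Two implementations behind one public name. */
static int sum_generic(const int *v, int n) {
  int s = 0;
  for (int i = 0; i < n; i++) s += v[i];
  return s;
}

/* Stand-in for a vectorized version; a real one would use intrinsics. */
static int sum_avx2(const int *v, int n) {
  return sum_generic(v, n);
}

/* Step 3: cache the chosen function so detection runs only once. */
typedef int (*sum_fn)(const int *, int);
static sum_fn sum_impl = 0;

/* The only function the library user sees. Not thread-safe as written. */
int sum_ints(const int *v, int n) {
  if (!sum_impl) {
    uint32_t f = detect_cpu_features();
    sum_impl = (f & FEAT_AVX2) ? sum_avx2 : sum_generic;
  }
  return sum_impl(v, n);
}
```

The lazy check in `sum_ints` costs a branch on every call and would need synchronization in threaded code, which is why initializing the pointer at library load time is attractive.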

But for a library provider, this backend part should ideally be hidden, with only the functions of interest to the user exposed, such as compute_matrix_multiplication or verify_cryptographic_signature.

Due to Nim globals being initialized in NimMain, this is currently not supported.
Furthermore, IIRC function multi-versioning will not work, even with codegenDecl for target attributes, because Nim will not compile functions with colliding C names.

A workaround is to use an __attribute__((constructor)) function, possibly __attribute__((constructor,used)) (in case of zealous dead-code elimination by LTO), for each global a library needs to initialize. However, this is limited to globals that don't require the Nim runtime (so seqs, strings and refs are excluded).
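As a rough sketch of that workaround in plain C, assuming GCC or Clang: the names `g_cpu_features`, `g_initialized`, `lib_init_cpu_features` and `lib_is_ready` are hypothetical, and the constant stored in the global is a placeholder for a real detection routine.

```c
#include <stdint.h>

/* Hypothetical library-private globals. They hold plain C data only,
   so no language runtime is needed to initialize them. */
static uint32_t g_cpu_features = 0;
static int g_initialized = 0;

/* `constructor` asks the loader to run this before main() (or when the
   shared library is loaded); `used` keeps the function even though
   nothing references it. GCC/Clang specific. */
__attribute__((constructor, used))
static void lib_init_cpu_features(void) {
  g_cpu_features = 0x1; /* placeholder for a real detection routine */
  g_initialized = 1;
}

/* Public entry point: reports whether load-time initialization ran. */
int lib_is_ready(void) {
  return g_initialized;
}

/* Public entry point: returns the cached feature bits. */
uint32_t lib_cpu_features(void) {
  return g_cpu_features;
}
```

A caller never sees the initializer: by the time `main` runs, `lib_is_ready()` already returns 1.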

Low-level - Unix

Looking at my library: https://github.com/mratsim/constantine/blob/67fbd8c/constantine/ethereum_bls_signatures.nim, compiled with --mm:arc and -d:UseMalloc --panics:on -d:noSignalHandler to ensure no runtime (allocator, exceptions, which all need an allocator, signals, ...), the NimMain-related functions are:

// @methereum_bls_signatures.nim.c
N_LIB_PRIVATE void PreMainInner(void) {
	// This is my CPU detection routine that fills my global variables
	atmplatformsatsisaatscpuinfo_x86dotnim_Init000();
}

N_LIB_PRIVATE int cmdCount;
N_LIB_PRIVATE char** cmdLine;
N_LIB_PRIVATE char** gEnv;
N_LIB_PRIVATE void PreMain(void) {
	atmdotdotatsdotdotatsdotdotatsdotdotatsdotchoosenimatstoolchainsatsnimminus1dot6dot12atslibatssystemdotnim_Init000();
	PreMainInner();
}

N_LIB_PRIVATE N_CDECL(void, NimMainInner)(void) {
	NimMainModule();
}

N_LIB_EXPORT N_CDECL(void, ctt_eth_bls_init_NimMain)(void) {
	void (*volatile inner)(void);
	PreMain();
	inner = NimMainInner;
	(*inner)();
}

N_LIB_PRIVATE N_NIMCALL(void, NimMainModule)(void) {
{
}
}
// @m..@s..@s..@s..@s.choosenim@stoolchains@snim-1.6.12@slib@ssystem.nim.c

static N_INLINE(void, initStackBottom)(void) {
}

N_LIB_PRIVATE N_NIMCALL(void, atmdotdotatsdotdotatsdotdotatsdotdotatsdotchoosenimatstoolchainsatsnimminus1dot6dot12atslibatssystemdotnim_Init000)(void) {
{
	initStackBottom();
}
}

As mentioned in https://discord.com/channels/371759389889003530/768367394547957761/1130212409496322098, one of the motivations for the explicit call was for the old GCs to determine the stack size, I assume for stack scanning of pointers. There are apparently other initialization routines as well (which ones?).

It's also interesting to note that nimbase.h defines

// https://github.com/nim-lang/Nim/blob/v2.0.0/lib/nimbase.h#L513
#define NIM_POSIX_INIT  __attribute__((constructor))

It is supposed to be used in cgen for PosixCDllMain / NimMainInit (shown in an image that is not reproduced here), but NimMainInit doesn't appear anywhere in my generated C code.

Low-level - Windows

MSVC provides a similar mechanism: https://github.com/supranational/blst/blob/f8af94a/src/cpuid.c#L47
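A cross-compiler wrapper around both mechanisms might look like the sketch below. The macro name `LOAD_TIME` and the functions `detect_features_at_load` and `features_ready` are hypothetical. The MSVC branch follows the commonly used `.CRT$XCU` pattern and is an untested assumption here: depending on optimization settings, the pointer may need extra measures to survive dead-code elimination.

```c
#if defined(_MSC_VER)
  /* MSVC: put a pointer to the function into the CRT initializer
     section, so the C runtime calls it during startup. */
  #pragma section(".CRT$XCU", read)
  #define LOAD_TIME(fn)                                   \
    static void fn(void);                                 \
    __declspec(allocate(".CRT$XCU"))                      \
    static void (*fn##_entry)(void) = fn;                 \
    static void fn(void)
#else
  /* GCC/Clang: constructor attribute, kept alive with `used`. */
  #define LOAD_TIME(fn) \
    __attribute__((constructor, used)) static void fn(void)
#endif

/* Hypothetical library state set at load time. */
static int g_features_ready = 0;

/* Runs before main() or at library load on supported compilers. */
LOAD_TIME(detect_features_at_load) {
  g_features_ready = 1; /* placeholder for real CPU detection */
}

/* Public query used by the rest of the library. */
int features_ready(void) {
  return g_features_ready;
}
```

Keeping the compiler-specific part in one macro means each load-time routine is written once, whichever toolchain builds the library.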

Questions

  1. Now that arc/orc are default, should we at least have the globals auto-initialized when they are built with ARC/ORC?
  2. In which scenario is NimMainInit built into a library, as this would solve 1?
@mratsim
Collaborator Author

mratsim commented Oct 22, 2023

Without passing the --noMain flag we have the following result:

Shared library

N_LIB_PRIVATE void PreMainInner(void) {
	atmdotdotatsconstantineatsplatformsatsisaatscpuinfo_x86dotnim_Init000();
}

N_LIB_PRIVATE int cmdCount;
N_LIB_PRIVATE char** cmdLine;
N_LIB_PRIVATE char** gEnv;
N_LIB_PRIVATE void PreMain(void) {
	atmdotdotatsdotdotatsdotdotatsdotdotatsdotchoosenimatstoolchainsatsnimminus1dot6dot12atslibatssystemdotnim_Init000();
	PreMainInner();
}

N_LIB_PRIVATE N_CDECL(void, NimMainInner)(void) {
	NimMainModule();
}

N_LIB_EXPORT N_CDECL(void, ctt_init_NimMain)(void) {
	void (*volatile inner)(void);
	PreMain();
	inner = NimMainInner;
	(*inner)();
}

N_LIB_PRIVATE void NIM_POSIX_INIT NimMainInit(void) {
	ctt_init_NimMain();
}

N_LIB_PRIVATE N_NIMCALL(void, NimMainModule)(void) {
{
}
}

This is almost the wanted result. Tested and confirmed that N_LIB_PRIVATE void NIM_POSIX_INIT NimMainInit does the right thing ™️.

The only issue is that NimMain is tagged N_LIB_EXPORT, and I don't think it should be.

Static library

N_LIB_PRIVATE void PreMainInner(void) {
	atmdotdotatsconstantineatsplatformsatsisaatscpuinfo_x86dotnim_Init000();
}

N_LIB_PRIVATE int cmdCount;
N_LIB_PRIVATE char** cmdLine;
N_LIB_PRIVATE char** gEnv;
N_LIB_PRIVATE void PreMain(void) {
	atmdotdotatsdotdotatsdotdotatsdotdotatsdotchoosenimatstoolchainsatsnimminus1dot6dot12atslibatssystemdotnim_Init000();
	PreMainInner();
}

N_LIB_PRIVATE N_CDECL(void, NimMainInner)(void) {
	NimMainModule();
}

N_CDECL(void, ctt_init_NimMain)(void) {
	PreMain();
	NimMainInner();
}

int main(int argc, char** args, char** env) {
	cmdLine = args;
	cmdCount = argc;
	gEnv = env;
	ctt_init_NimMain();
	return nim_program_result;
}

N_LIB_PRIVATE N_NIMCALL(void, NimMainModule)(void) {
{
}
}

That's not what we want.

@Araq
Member

Araq commented Oct 23, 2023

Somewhat related, a name like atmdotdotatsdotdotatsdotdotatsdotdotatsdotchoosenimatstoolchainsatsnimminus1dot6dot12atslibatssystemdotnim_Init000 is a bug.

@mratsim
Collaborator Author

mratsim commented Oct 24, 2023

For my use case, I have created a loadTime macro pragma that allows a proc to be called at program or library load time. It works whether the code is compiled to an application, a dynamic library or a static library.

Note: MSVC/VCC support is to be confirmed, and I am unsure about TCC.

https://github.com/mratsim/constantine/blob/40643f0/constantine/platforms/loadtime_functions.nim#L18-L51

import std/macros

const GCC_Compatible* = defined(gcc) or defined(clang) or
                        defined(llvm_gcc) or defined(icc)

macro loadTime*(procAst: untyped): untyped =
  ## This allows a function to be called at program or library load time
  ## Note: such a function cannot be dead-code eliminated.

  procAst.addPragma(ident"used")     # Remove unused warning
  procAst.addPragma(ident"exportc")  # Prevent the proc from being dead-code eliminated

  if GCC_Compatible:
    # {.pragma: gcc_constructor, codegenDecl: "__attribute__((constructor)) $# $#$#".}
    let gcc_constructor =
        nnkExprColonExpr.newTree(
          ident"codegenDecl",
          newLit"__attribute__((constructor)) $# $#$#"
        )
    procAst.addPragma(gcc_constructor) # Implement load-time functionality

    result = procAst

  elif defined(vcc):
    warning "CPU feature autodetection at Constantine load time has not been tested with MSVC"

    template msvcInitSection(procDef: untyped): untyped =
      let procName = astToStr(def)
      procDef
      {.emit:["""
      #pragma section(".CRT$XCU",read)
      __declspec(allocate(".CRT$XCU")) static int (*p)(void) = """, procName, ";"].}

    result = getAst(msvcInitSection(procAst))

  else:
    error "Compiler not supported."

@Araq wrote: "Somewhat related, a name like atmdotdotatsdotdotatsdotdotatsdotdotatsdotchoosenimatstoolchainsatsnimminus1dot6dot12atslibatssystemdotnim_Init000 is a bug."

It seems that two things create these kinds of proc names:

  • initStackBottom()
  • using var foo {.global.} = someProc()
