Convert a C header .h to an FFI (foreign function interface) .json data structure for the purposes of generating bindings to other languages.
For the differences between this project and the other GitHub project of the same name, https://github.com/rpav/c2ffi, see the section "Differences between https://github.com/rpav/c2ffi" below.
When creating applications (especially games) with higher level languages (such as C#, Java, Python), it's sometimes necessary to dip down into C for access to a native library with better raw performance and better portability of low-level APIs across various platforms. This works great; the problem is that maintaining the higher level language bindings by hand becomes time consuming, error-prone, and in some cases quite tricky, especially when the C library changes frequently.
Note that C++ or other low-level languages are not considered as part of the problem scope because they do not align to specific goals. Though, perhaps Zig or some other language may emerge in the future as superior to C for such goals. The goals are the following:
- Portability. For better or worse, C can be used as the industry's standard portable assembler even if technically speaking it is not. Writing a native library in the C language (with some constraints) and building it for multiple targets such as Windows, macOS, Linux, iOS, Android, etc, is the path of least resistance. This is especially true for more non-traditional targets such as RaspberryPi, WebAssembly or even consoles.
- Interoperability. The C language, specifically the usage of data structures and functions in limited scope, is a common denominator between C and higher level languages. This makes interaction between C and other languages not only correct but as fast and efficient as possible.
- Maintainability. Writing and maintaining a C code project is arguably simpler due to C being a relatively small language in comparison to C++/ObjectiveC. This makes the C language arguably easier to learn and work with, especially if limited in scope such as avoiding the use of function-like macros. This is important for open-source projects (in contrast to proprietary-enterprise-like projects) where one of the barriers to development is knowledge sharing at scale in a decentralized fashion.
Automate the first step of generating bindings for a higher level language by parsing a cross-platform C .h file using libclang and extracting out the minimal FFI (foreign function interface) data as .json.
Refer to the following 2 graphs as examples of an FFI between C and target language X.
Example diagram: platform specific.
graph LR
subgraph C library: Linux
C_HEADER(C header file <br> .h)
C_SOURCE(C/C++/ObjC source code <br> .c/.cpp/.m)
C_HEADER --- C_SOURCE
end
subgraph c2ffi: extract
EXTRACT_FFI_LINUX[Extract <br> FFI]
C_HEADER -.-> EXTRACT_FFI_LINUX
end
subgraph Artifacts: native library
C_COMPILED_LINUX(compiled C code <br> .so)
end
subgraph Artifacts: target-platform
C_HEADER -.-> C_COMPILED_LINUX
C_SOURCE -.-> C_COMPILED_LINUX
PLATFORM_FFI_LINUX(platform FFI <br> .json)
EXTRACT_FFI_LINUX -.-> PLATFORM_FFI_LINUX
end
subgraph Your bindgen tool
PLATFORM_FFI_LINUX --> X_CODE_GENERATOR[X language code <br> generator]
end
subgraph Your app
C_COMPILED_LINUX === X_SOURCE
X_CODE_GENERATOR -.-> X_SOURCE(X language source code)
end
Example diagram: cross-platform.
graph LR
subgraph C library
C_HEADER(C header file <br> .h)
C_SOURCE(C/C++/ObjC source code <br> .c/.cpp/.m)
C_HEADER --- C_SOURCE
end
subgraph c2ffi: extract
EXTRACT_FFI_WINDOWS[Extract <br> FFI]
EXTRACT_FFI_MACOS[Extract <br> FFI]
EXTRACT_FFI_LINUX[Extract <br> FFI]
C_HEADER -.-> |Windows| EXTRACT_FFI_WINDOWS
C_HEADER -.-> |macOS| EXTRACT_FFI_MACOS
C_HEADER -.-> |Linux| EXTRACT_FFI_LINUX
end
subgraph Native library
C_COMPILED_WINDOWS(compiled C code <br> .dll)
C_COMPILED_MACOS(compiled C code <br> .dylib)
C_COMPILED_LINUX(compiled C code <br> .so)
end
subgraph Artifacts: target-platform
C_HEADER -.-> |Windows| C_COMPILED_WINDOWS
C_SOURCE -.-> |Windows| C_COMPILED_WINDOWS
C_HEADER -.-> |macOS| C_COMPILED_MACOS
C_SOURCE -.-> |macOS| C_COMPILED_MACOS
C_HEADER -.-> |Linux| C_COMPILED_LINUX
C_SOURCE -.-> |Linux| C_COMPILED_LINUX
PLATFORM_FFI_WINDOWS(platform FFI <br> .json)
PLATFORM_FFI_MACOS(platform FFI <br> .json)
PLATFORM_FFI_LINUX(platform FFI <br> .json)
EXTRACT_FFI_WINDOWS -.-> |Windows| PLATFORM_FFI_WINDOWS
EXTRACT_FFI_MACOS -.-> |macOS| PLATFORM_FFI_MACOS
EXTRACT_FFI_LINUX -.-> |Linux| PLATFORM_FFI_LINUX
end
subgraph c2ffi: merge
MERGE_FFI["Merge platform FFIs to a cross-platform FFI"]
PLATFORM_FFI_WINDOWS -.-> |Any OS| MERGE_FFI
PLATFORM_FFI_MACOS -.-> |Any OS| MERGE_FFI
PLATFORM_FFI_LINUX -.-> |Any OS| MERGE_FFI
end
subgraph Artifacts: cross-platform
CROSS_FFI(Cross-platform FFI <br> .json)
MERGE_FFI -.-> CROSS_FFI
end
subgraph Your bindgen tool
CROSS_FFI --> X_CODE_GENERATOR[X language code <br> generator]
end
subgraph Your app
C_COMPILED_WINDOWS === |Windows| X_SOURCE
C_COMPILED_MACOS === |macOS| X_SOURCE
C_COMPILED_LINUX === |Linux| X_SOURCE
X_CODE_GENERATOR -.-> X_SOURCE(X language source code)
end
Differences between https://github.com/rpav/c2ffi
I originally had this project named as something different but then re-wrote it with tests under the name c2ffi as that accurately describes the project. Unfortunately it has the same name as another project (https://github.com/rpav/c2ffi) with similar goals. If someone has a better name I am open to suggestions. Perhaps c2ffix where the x is for cross-platform?
This project is different in the following ways:
- This project is licensed under MIT. The other project is licensed under GPL2.
- This project is written in C# and interacts with libclang over C interoperability; the other project is written in C++. Additionally, this project has a C# library via a NuGet package that contains the code for serializing and deserializing the model to/from .json.
- This project only supports C. The other one apparently supports C++, ObjC, etc.
- This project fully supports object-like macros by evaluating them as C++ expressions using auto.
- This project is intended to be used for generating a cross-platform FFI. Specific things which break portability, such as variadic functions and bit-fields, are not supported in this project by design. This project (extract step) outputs a .json file for each target platform (Clang target triple), then this program (merge step) merges these platform specific .json files into a cross-platform .json and checks that it is indeed cross-platform. If the merge step fails, there is likely something in the C code which makes it not portable between one or more target platforms. The other project does not check for cross-platform compatibility; that is left to the developer.
- This project supports various options for configuring the extract and merge steps, including skipping C declarations by using regular expression matching.
- This project also includes a brain dump of things which one should and should not do in C for interoperability (see next section).
- This project is used directly by another project, c2cs, to generate C# bindings.
c2ffi does not work for every C library. This is due to technical limitations: some usages of C are not appropriate for a cross-platform foreign function interface (FFI). Everything in the external linkage of the C API is subject to the following list for being "FFI Ready". Think of it as the checklist for creating a cross-platform C library for usage by other languages.
Note that the internals of the C library are irrelevant; this list does not apply to them. It is therefore possible to use C++/ObjectiveC behind an implementation file (.cpp or .m respectively) or to reference C++/ObjectiveC from a C implementation file (.c); all that c2ffi needs is the C header file (.h).
Supported | Description | Notes |
---|---|---|
✅ | Variable externs | 1, 3, 7 |
✅ | Function externs | 1, 3, 7 |
✅ | Function prototypes (a.k.a. function pointers) | 3, 7 |
✅ | Enums | 3 |
✅ | Structs | 2, 4, 7 |
✅ | Unions | 2, 4, 7 |
✅ | Opaque types | 2, 7 |
✅ | Typedefs (a.k.a. type aliases) | 2, 7 |
❌ | Function-like macros | 5 |
✅ | Object-like macros | 2, 6, 7 |
1: When declaring your external functions or variables, do set the default visibility explicitly. This is necessary because c2ffi configures libclang with the flag -fvisibility=hidden, which sets the visibility to hidden so that only the strict subset of functions and variables intended for FFI are extracted. Most C libraries will have an API_DECL macro object defined which can be redefined to also set the visibility. See ffi_helper.h for an example in C. You can also use the config .json file to define your API_DECL macro object.
Bad
#if defined(WIN32) || defined(_WIN32) || defined(__WIN32__)
    #define MY_API_DECL __declspec(dllexport) // no visibility explicitly set when using Clang, thus visibility is 'hidden'
#else
    #define MY_API_DECL extern // no visibility explicitly set, thus visibility is 'hidden'
#endif
...
MY_API_DECL void my_api_print_hello_world(void) // won't be extracted as part of FFI because the visibility is 'hidden'
{
    printf("Hello world!");
}
Good
#if defined(WIN32) || defined(_WIN32) || defined(__WIN32__)
    #if defined(__clang__)
        #define MY_API_DECL __declspec(dllexport) __attribute__ ((visibility("default")))
    #else
        #define MY_API_DECL __declspec(dllexport)
    #endif
#else
    #define MY_API_DECL extern __attribute__ ((visibility("default")))
#endif
...
MY_API_DECL void my_api_print_hello_world(void) // visibility is explicitly set, will be extracted as part of FFI
{
    printf("Hello world!");
}
2: Do use standard integer types from stdint.h such as int32_t, uint64_t, etc, which are portable. Do not use C's primitive integer types directly, such as unsigned long, as they are not guaranteed to be portable because they can have different bit widths on different target platforms.
Bad
unsigned long value; // is it 4 bytes or 8 bytes?
Good
#include <stdint.h>
...
uint32_t value; // 4 bytes
3: Do not use 64-bit enums because their size depends on the compiler; in C, enums are only well defined when their values fit in 32 bits or less.
Bad
enum MY_ENUM
{
MY_ENUM_LARGE_VALUE_1 = 0x1000000000000000,
MY_ENUM_LARGE_VALUE_2 = 0x2000000000000000,
};
Good
enum MY_ENUM
{
MY_ENUM_VALUE_1 = 0x1,
MY_ENUM_VALUE_2 = 0x2,
MY_ENUM_VALUE_MAX = 0x7FFFFFFF
};
4: Do not use bit fields in C. This is because bit fields may have different bit layouts across different compilers (e.g. GCC vs MSVC) which may break portability. Instead use bitmasks to get or set the bits of an integer yourself.
Bad
struct dob { // What is the sequential order of the struct's fields?
uint32_t date: 5;
uint32_t month: 4;
uint32_t year: 12;
};
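As a counterpart to the struct above, here is a minimal sketch of the bitmask approach. The bit positions, the macro names, and the dob_make / dob_date / dob_month / dob_year helpers are assumptions made for illustration only; they are not part of c2ffi. The point is that the layout is spelled out in code rather than left to the compiler.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical layout chosen for this sketch:
// bits 0-4 date, bits 5-8 month, bits 9-20 year.
#define DOB_DATE_SHIFT 0u
#define DOB_DATE_MASK 0x1Fu
#define DOB_MONTH_SHIFT 5u
#define DOB_MONTH_MASK 0xFu
#define DOB_YEAR_SHIFT 9u
#define DOB_YEAR_MASK 0xFFFu

typedef struct dob {
    uint32_t packed; // a single fixed-width integer; no bit fields
} dob;

// The struct is exactly one 32-bit integer wide.
static_assert(sizeof(dob) == sizeof(uint32_t), "dob must be 4 bytes");

// Packs the three values into one integer; out-of-range bits are masked off.
dob dob_make(uint32_t date, uint32_t month, uint32_t year)
{
    dob result;
    result.packed = ((date & DOB_DATE_MASK) << DOB_DATE_SHIFT)
                  | ((month & DOB_MONTH_MASK) << DOB_MONTH_SHIFT)
                  | ((year & DOB_YEAR_MASK) << DOB_YEAR_SHIFT);
    return result;
}

uint32_t dob_date(dob value)
{
    return (value.packed >> DOB_DATE_SHIFT) & DOB_DATE_MASK;
}

uint32_t dob_month(dob value)
{
    return (value.packed >> DOB_MONTH_SHIFT) & DOB_MONTH_MASK;
}

uint32_t dob_year(dob value)
{
    return (value.packed >> DOB_YEAR_SHIFT) & DOB_YEAR_MASK;
}
```

Because the shifts and masks are ordinary integer arithmetic on a uint32_t, a binding in another language can reproduce them without knowing anything about the C compiler that built the library.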
5: Function-like macros are only possible if the parameters' types can be inferred 100% of the time during preprocessing; otherwise they are not possible. This is not yet implemented.
Bad
#define SUM(a,b,c) a + b + c // What is the type of a?
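One way to sidestep the question is to expose an ordinary function with explicit types instead of the macro. The sketch below is only an illustration: my_api_sum and the choice of int32_t are assumptions, and the MY_API_DECL definition is a simplified version of the one shown under note 1.

```c
#include <stdint.h>

#if defined(WIN32) || defined(_WIN32) || defined(__WIN32__)
    #define MY_API_DECL __declspec(dllexport)
#else
    #define MY_API_DECL extern __attribute__ ((visibility("default")))
#endif

// Declaration as it would appear in the header: every parameter type and the
// return type are explicit, so nothing has to be inferred.
MY_API_DECL int32_t my_api_sum(int32_t a, int32_t b, int32_t c);

// Definition as it would appear in the implementation file.
int32_t my_api_sum(int32_t a, int32_t b, int32_t c)
{
    return a + b + c;
}
```

Unlike the macro, the function also behaves predictably inside larger expressions, since its result is a single value rather than the unparenthesized text a + b + c.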
6: Object-like macros have full support. The value type is determined by evaluating the value of the macro as a C++ expression using auto.
Acceptable
#define BUFFER_SIZE 1024 // Type is int16_t
7: Types must be reachable, directly or transitively, from a function extern, variable extern, or macro object so that they can be included as part of the FFI. If this is not the case, then the type is not used in the FFI and will not be extracted.
Support for generating the FFI of a C library for different target platforms using c2ffi is dependent on two things:
- A "Clang target triple" (a.k.a. "target platform"). Target platforms are identified by a string in a specific format of arch-vendor-os-environment and passed to Clang, which informs how to read C code.
- System C header .h files of the target platform. The root directory of where the files are located needs to be passed to Clang to read C code correctly. The files are often distributed and installed with a software development environment (SDE) or additional downloadable components to the SDE in the form of a software development kit (SDK). By default for Windows, macOS, and Linux, c2ffi will try to find these system headers automatically by searching common default locations.
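To make the arch-vendor-os-environment format concrete, here is a small sketch that splits a triple string into its components. The target_triple struct and target_triple_parse function are hypothetical names for this example; this is not how c2ffi or Clang parse triples. In particular, it does not normalize triples that omit a component (for example aarch64-linux-android, which has no vendor), whereas Clang does.

```c
#include <stdio.h>
#include <string.h>

// Hypothetical container for the components of a target triple.
typedef struct target_triple {
    char arch[32];
    char vendor[32];
    char os[32];
    char environment[32]; // empty when the triple has only three components
} target_triple;

// Splits "arch-vendor-os[-environment]" on '-' characters.
// Returns 1 when at least arch, vendor, and os were found, otherwise 0.
int target_triple_parse(const char* text, target_triple* out)
{
    memset(out, 0, sizeof(*out));
    int count = sscanf(
        text,
        "%31[^-]-%31[^-]-%31[^-]-%31[^-]",
        out->arch, out->vendor, out->os, out->environment);
    return count >= 3;
}
```

With this sketch, x86_64-pc-windows-msvc yields the arch x86_64, the vendor pc, the OS windows, and the environment msvc, while aarch64-apple-darwin yields three components and an empty environment.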
The following table demonstrates commonly used target platforms.
Open | OS | Arch | SDE | Clang Target Triple |
---|---|---|---|---|
🔓 | Windows | ARM64 | MinGW | aarch64-pc-windows-gnu |
🔓 | Windows | X64 | MinGW | x86_64-pc-windows-gnu |
🔓 | Windows | X86 | MinGW | i686-pc-windows-gnu |
🔒1 | Windows | ARM64 | MSVC | aarch64-pc-windows-msvc |
🔒1 | Windows | X64 | MSVC | x86_64-pc-windows-msvc |
🔒1 | Windows | X86 | MSVC | i686-pc-windows-msvc |
🔒2 | macOS | ARM64 | XCode | aarch64-apple-darwin |
🔒2 | macOS | X64 | XCode | x86_64-apple-darwin |
🔒2 | macOS | X86 | XCode | i686-apple-darwin |
🔓 | Linux (kernel) | ARM64 | CMake recommended | aarch64-unknown-linux-gnu |
🔓 | Linux (kernel) | X64 | CMake recommended | x86_64-unknown-linux-gnu |
🔓 | Linux (kernel) | X86 | CMake recommended | i686-unknown-linux-gnu |
🔒2 | iOS | ARM64 | XCode | aarch64-apple-ios |
🔒2 | iOS | X64 | XCode | x86_64-apple-ios |
🔒2 | tvOS | ARM64 | XCode | aarch64-apple-tvos |
🔒2 | tvOS | X64 | XCode | x86_64-apple-tvos |
🔒3 | Android | ARM64 | Android Studio | aarch64-linux-android |
🔒3 | Android | X64 | Android Studio | x86_64-linux-android |
Column | Notes |
---|---|
Open | If a target platform has a 🔓 here, the system headers can be distributed and installed under a free and open-source software (FOSS) license. If a target platform has a 🔒 here, the system headers cannot be distributed under a FOSS license. |
OS | The operating system of the target platform. |
Arch | The computer architecture (a.k.a instruction set architecture) of the target platform. |
SDE | The software development environment (SDE) required to build native libraries for the target platform. |
1: Microsoft does not allow open distribution of their software development kits (SDKs) due to their Microsoft Software License Terms. However, you can download and install the Windows SDK on your Windows development machine. You will find the important directories for the C headers at %ProgramFiles(x86)%\Windows Kits\10\Include. This effectively means that to generate the FFI for Windows target platforms, c2ffi must run from Windows with the Windows SDK installed.
2: Apple does not allow copy or usage of their software development kits (SDKs) on non-Apple hardware due to their service level agreement. You can download and install XCode through the App Store to gain access to the SDKs for macOS, iOS, tvOS, watchOS, or any other Apple target platform. This effectively means that to generate the FFI for Apple target platforms, c2ffi must run from macOS with XCode installed. Additional SDKs for each target platform (e.g. macOS, iOS, tvOS) are also installed through XCode.
3: Google does not allow copy or usage of their software development kits (SDKs) due to their Android Software Development Kit License Agreement. You can download and install Android Studio to gain access to the SDKs for Android. This effectively means that to generate the FFI for Android target platforms, c2ffi must run from Windows, macOS, or Linux with Android Studio installed. Additional SDKs for each target platform are also installed through Android Studio or equivalent.
Note that pointers such as void* can have different sizes across target computer architectures. E.g., X86 pointers are 4 bytes and X64 (as well as ARM64) pointers are 8 bytes. This means that the FFIs that c2ffi generates for 32-bit and 64-bit target platforms will have different return type sizes, parameter type sizes, or record sizes when using pointers. That being said, 64-bit word size is pretty ubiquitous on Windows these days, at least for gaming, as you can see from the Steam hardware survey where 64-bit is 99%+. Additionally, you can see that the "trend" is that 64-bit is becoming standard over time with 32-bit getting dropped. If you are planning on targeting modern machines, I would advise making your life simple and just forgetting about target platforms with 32-bit computer architectures such as X86 and ARM32.
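To illustrate how a pointer changes a record's size, here is a small sketch. The my_buffer struct and the my_buffer_expected_size helper are hypothetical, and the helper assumes natural alignment (each member aligned to its own size, with the struct padded to the alignment of its largest member), which holds for the common targets listed above but is not guaranteed by the C standard.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical record that contains a pointer.
typedef struct my_buffer {
    void* data;      // 4 bytes on 32-bit targets, 8 bytes on 64-bit targets
    uint32_t length; // always 4 bytes
} my_buffer;

// Computes the size my_buffer would have for a given pointer size,
// assuming natural alignment and trailing padding.
size_t my_buffer_expected_size(size_t pointer_size)
{
    size_t unpadded = pointer_size + sizeof(uint32_t);
    size_t alignment = pointer_size > sizeof(uint32_t) ? pointer_size : sizeof(uint32_t);
    // Round up to the next multiple of the struct's alignment.
    return (unpadded + alignment - 1) / alignment * alignment;
}
```

Under these assumptions the same declaration is 8 bytes with 4-byte pointers and 16 bytes with 8-byte pointers, which is the kind of difference that shows up between 32-bit and 64-bit platform FFIs.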
dotnet tool install bottlenoselabs.c2ffi.tool -g
Extract the platform specific FFI using a configuration .json file.
config-extract.json:
{
  "inputFilePath": "path/to/library/include/header.h",
  "userIncludeDirectories": [
    "path/to/other_library/include"
  ],
  "targetPlatforms": {
    "windows": {
      "x86_64-pc-windows-msvc": {},
      "aarch64-pc-windows-msvc": {}
    },
    "macos": {
      "aarch64-apple-darwin": {},
      "x86_64-apple-darwin": {}
    },
    "linux": {
      "x86_64-unknown-linux-gnu": {},
      "aarch64-unknown-linux-gnu": {}
    }
  }
}
Terminal:
c2ffi extract path/to/config-extract.json
NOTE: The targetPlatforms property in the config .json is a matrix of the operating systems on which to extract each Clang target triple. In other words, a Clang target triple is only extracted when c2ffi runs on the operating system it is listed under. For example, given the config .json above, when the current operating system is windows, only the x86_64-pc-windows-msvc and aarch64-pc-windows-msvc target triples will be extracted.
Once one or more FFI .json files have been extracted, merge them together into a cross-platform FFI .json file.
This step is necessary to verify that the platform specific FFIs are indeed cross-platform by checking that functions, types, bit-widths, etc, are all the same. If you plan on targeting only a specific platform, such as Windows only, you may wish to skip this step.
Terminal:
c2ffi merge --inputDirectoryPath /path/to/platform/ast --outputFilePath /path/to/cross-platform-ast.json
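The kind of check the merge step performs can be pictured with the sketch below. It is purely illustrative: the platform_record struct and find_first_mismatch function are hypothetical and do not reflect c2ffi's actual implementation (which is written in C#) or its .json schema. It compares the same record as seen from several target platforms and reports the first one that disagrees.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical summary of one record as extracted for one target platform.
typedef struct platform_record {
    const char* target_triple;
    const char* name;
    uint32_t size_of;  // size in bytes
    uint32_t align_of; // alignment in bytes
} platform_record;

// Returns the index of the first entry whose name, size, or alignment differs
// from entry 0, or -1 when every platform agrees.
int find_first_mismatch(const platform_record* records, size_t count)
{
    for (size_t i = 1; i < count; i++) {
        if (strcmp(records[i].name, records[0].name) != 0 ||
            records[i].size_of != records[0].size_of ||
            records[i].align_of != records[0].align_of) {
            return (int)i;
        }
    }
    return -1;
}
```

For example, a record containing a pointer would agree between x86_64-pc-windows-msvc and aarch64-apple-darwin but would be flagged when an i686 platform is added to the comparison.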