Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clang build support #132

Merged
merged 25 commits into from
Mar 27, 2024
Merged

Clang build support #132

merged 25 commits into from
Mar 27, 2024

Conversation

robambalu
Copy link
Collaborator

No description provided.

@robambalu robambalu requested a review from ngoldbaum March 4, 2024 21:54
…me clang warnings

Signed-off-by: Ambalu, Robert <robert.ambalu@point72.com>
AdamGlustein
AdamGlustein previously approved these changes Mar 5, 2024
Copy link
Collaborator

@AdamGlustein AdamGlustein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ngoldbaum
Copy link
Contributor

We should bring the github actions build changes over from #83, we can also update the build instructions to no longer reference installing g++ or using it on MacOS.

@timkpaine
Copy link
Member

timkpaine commented Mar 5, 2024

This code does not work on Apple Clang. The first of many errors (which do not occur in #83)

/Users/timkpaine/Developer/projects/point72/csp/cpp/csp/engine/Struct.h:345:9: error: static assertion failed due to requirement 'std::is_same<std::__bit_const_reference<std::vector<bool, std::allocator<bool>>>, bool>::value || std::is_same<std::vector<std::__bit_const_reference<std::vector<bool, std::allocator<bool>>>, std::allocator<std::__bit_const_reference<std::vector<bool, std::allocator<bool>>>>>, bool>::value'
        static_assert(std::is_same<V,ElemT>::value || std::is_same<std::vector<V>,ElemT>::value );

@timkpaine timkpaine marked this pull request as draft March 5, 2024 03:24
Signed-off-by: Rob Ambalu <robert.ambalu@RA7293-M1.saccap.int>
@timkpaine timkpaine added tag: wip PRs that are a work in progress - converted to drafts part: build Issues and PRs related to the build process labels Mar 17, 2024
robambalu and others added 6 commits March 18, 2024 14:27
Signed-off-by: Rob Ambalu <robert.ambalu@point72.com>
…out the --arch options in CMakeLists.txt, not sure what the implications are

Signed-off-by: Rob Ambalu <robert.ambalu@point72.com>
Signed-off-by: Rob Ambalu <robert.ambalu@point72.com>
Signed-off-by: Nathan Goldbaum <nathan.goldbaum@gmail.com>
Signed-off-by: Nathan Goldbaum <nathan.goldbaum@gmail.com>
Signed-off-by: Nathan Goldbaum <nathan.goldbaum@gmail.com>
@ngoldbaum
Copy link
Contributor

The last push adds support and tests for mac conda builds, updates the docs, and removes some remanants of using gcc in the mac builds. I don't know if there are additional fixes needed, I'll comment again once I'm done iterating on this.

Signed-off-by: Nathan Goldbaum <nathan.goldbaum@gmail.com>
@ngoldbaum ngoldbaum force-pushed the feature/clang_support branch from ffa8659 to c2d9f38 Compare March 19, 2024 17:10
@ngoldbaum
Copy link
Contributor

ngoldbaum commented Mar 19, 2024

It turns out building using the conda compilers on an intel mac requires setting the min required SDK version to 10.12 (conda-forge defaults to 10.9). I'm not sure how to do that on the github actions runner using a regular pip install invocation outside of the conda build infrastructure, so I'm punting on testing conda-based builds on intel macs for now. Otherwise this should be good to go as far as the build goes, I think.

Maybe we should delete the commented code for supporting cross-compiling on MacOS now arm64 runners are available and it's not needed on github actions?

I think the ld_classic thing is a workaround for a MacOS bug that is fixed in the latest MacOS builds and can also be safely deleted.

@ngoldbaum
Copy link
Contributor

Hmm, not sure what's up with this:

=================================== FAILURES ===================================
_________________________ TestEngine.test_threaded_run _________________________

self = <csp.tests.test_engine.TestEngine testMethod=test_threaded_run>

    def test_threaded_run(self):
        # simple test
        runner = csp.run_on_thread(
            csp.count, csp.timer(timedelta(seconds=1)), starttime=datetime(2021, 4, 23), endtime=timedelta(seconds=60)
        )
        res = runner.join()[0]
        self.assertEqual(len(res), 60)
        # ensure stopping doesnt try to access dead push input adapter
        runner.stop_engine()
    
        # realtime
        @csp.graph
        def g(count: int) -> csp.ts[int]:
            x = csp.count(csp.timer(timedelta(seconds=0.1)))
            stop = x == count
            stop = csp.filter(stop, stop)
    
            csp.stop_engine(stop)
            return x
    
        runner = csp.run_on_thread(g, 5, starttime=datetime.utcnow(), endtime=timedelta(seconds=60), realtime=True)
        res = runner.join()[0]
        self.assertEqual(len(res), 5)
    
        # midway stop
        runner = csp.run_on_thread(g, 50000, starttime=datetime.utcnow(), endtime=timedelta(minutes=30), realtime=True)
        import time
    
        time.sleep(1)
        runner.stop_engine()
        res = runner.join()[0]
>       self.assertLess(len(res), 1000)
E       AssertionError: 17999 not less than 1000

csp/tests/test_engine.py:1406: AssertionError

Only happens using the conda build environment on an ARM mac. I don't reproduce this failure on my M3 macbook pro using a conda-based build, so not sure why this is a problem on github actions.

Signed-off-by: Nathan Goldbaum <nathan.goldbaum@gmail.com>
Signed-off-by: Nathan Goldbaum <nathan.goldbaum@gmail.com>
@ngoldbaum
Copy link
Contributor

ngoldbaum commented Mar 19, 2024

@isuruf pointed out the remnants of the cross-compilation support was breaking the build for intel macs, so I've removed that and added back the intel mac tests.

Still not sure what's up with the test failure on ARM macs. I reduced the timeout so at least we don't have to wait 30 minutes for the test to fail...

…figure out why it fails on GH

Signed-off-by: Rob Ambalu <robert.ambalu@point72.com>
Signed-off-by: Rob Ambalu <robert.ambalu@point72.com>
@robambalu
Copy link
Collaborator Author

Have you figured out any way to reproduce this locally? If I run on my macbook it seems to pass fine.

Nope, I've only seen this behavior on the github actions runner 😢

adding some prints may help diagnose this too

I added some prints in this commit. You can see a failing run with those prints here - deeplinking into the log where the prints start.

So it looks like runner.stop_engine() does get called, and then a hang happens in the C++ code somewhere.

I added an fprintf call inside PyPushInputAdapter.pushPyTick:

diff --git a/cpp/csp/python/PyPushInputAdapter.cpp b/cpp/csp/python/PyPushInputAdapter.cpp
index 3b5f91b..26a9892 100644
--- a/cpp/csp/python/PyPushInputAdapter.cpp
+++ b/cpp/csp/python/PyPushInputAdapter.cpp
@@ -156,6 +156,7 @@ public:
         {
             if( !validatePyType( this -> dataType(), m_pyType.ptr(), value ) )
                 CSP_THROW( TypeError, "" );
+            fprintf(stderr, "PyPushInputAdapter.pushPyTick called\n");
             pushTick<T>( std::move(fromPython<T>( value, *this -> dataType() )), batch );
         }
         catch( const TypeError & )

And that seems to make the failures happen less often, but they still happen, and that fprintf does get called before the hang. I'm having a little trouble navigating the C++ internals - any other likely spots I can put fprintf calls to narrow down where the hang is happening?

@ngoldbaum that pushTick method should never block, so I assume its getting past that. It somewhat feels like the python gil may not be getting released at all or enough. Its really hard to debug without being able to reproduce outside of the github workflows. I'm going to try adding logs now as well.

@robambalu
Copy link
Collaborator Author

I added a step to the end of the action to trigger a remote debugging session and managed to reproduce this issue. It is a deadlock, here's the traceback from both threads:

Thread 0x000000016f90b000 (most recent call first):
  File "/Users/runner/work/csp/csp/csp/impl/wiring/runtime.py", line 125 in _run_engine
  File "/Users/runner/work/csp/csp/csp/impl/wiring/runtime.py", line 227 in run
  File "/Users/runner/work/csp/csp/csp/impl/wiring/runtime.py", line 241 in run
  File "/Users/runner/work/csp/csp/csp/impl/wiring/threaded_runtime.py", line 81 in _run
  File "/Users/runner/micromamba/envs/csp-dev/lib/python3.11/threading.py", line 982 in run
  File "/Users/runner/micromamba/envs/csp-dev/lib/python3.11/threading.py", line 1045 in _bootstrap_inner
  File "/Users/runner/micromamba/envs/csp-dev/lib/python3.11/threading.py", line 1002 in _bootstrap

Thread 0x00000001e0325000 (most recent call first):
  File "/Users/runner/micromamba/envs/csp-dev/lib/python3.11/threading.py", line 1139 in _wait_for_tstate_
lock
  File "/Users/runner/micromamba/envs/csp-dev/lib/python3.11/threading.py", line 1119 in join
  File "/Users/runner/work/csp/csp/csp/impl/wiring/threaded_runtime.py", line 92 in join
  File "/Users/runner/work/csp/csp/csp/tests/test_engine.py", line 1405 in test_threaded_run
  File "/Users/runner/micromamba/envs/csp-dev/lib/python3.11/unittest/case.py", line 579 in _callTestMethod
  File "/Users/runner/micromamba/envs/csp-dev/lib/python3.11/unittest/case.py", line 623 in run
  File "/Users/runner/micromamba/envs/csp-dev/lib/python3.11/unittest/case.py", line 678 in __call__
  File "/Users/runner/micromamba/envs/csp-dev/lib/python3.11/site-packages/_pytest/unittest.py", line 321 in runtest
  File "/Users/runner/micromamba/envs/csp-dev/lib/python3.11/site-packages/_pytest/runner.py", line 172 in
 pytest_runtest_call
  File "/Users/runner/micromamba/envs/csp-dev/lib/python3.11/site-packages/pluggy/_callers.py", line 102 i
n _multicall
  File "/Users/runner/micromamba/envs/csp-dev/lib/python3.11/site-packages/pluggy/_manager.py", line 119 i
n _hookexec
  File "/Users/runner/micromamba/envs/csp-dev/lib/python3.11/site-packages/pluggy/_hooks.py", line 501 in
__call__
  File "/Users/runner/micromamba/envs/csp-dev/lib/python3.11/site-packages/_pytest/runner.py", line 240 in
 <lambda>
  File "/Users/runner/micromamba/envs/csp-dev/lib/python3.11/site-packages/_pytest/runner.py", line 340
<further frames elided>

So it looks like for whatever reason engine.run() on the work thread never returns.

Happy to do further debugging here or walk someone else through how to set this up.

If you can show me how to set this up that would be great. Do you get a remote gdb session or pdb? I would likely need gdb for this, or the macos equivalent

…timing

Signed-off-by: Rob Ambalu <robert.ambalu@point72.com>
Signed-off-by: Rob Ambalu <robert.ambalu@point72.com>
Signed-off-by: Rob Ambalu <robert.ambalu@point72.com>
Signed-off-by: Rob Ambalu <robert.ambalu@point72.com>
…nts can be deferred indefinitely if timer executes before event ( wait will keep waiting until max time, even if an event is pending )

Simplify QueueWaiter ( remove it entirely ) and move condvar/mutex/blocking logic directly into SRMWLockFreeQueue so that we can check on m_head directly.
Revert all intermediate changes that were committed for testing / tracking down this issue

Signed-off-by: Rob Ambalu <robert.ambalu@point72.com>
… fix and attack the queue issue properly another day. fix cpp time test

Signed-off-by: Rob Ambalu <robert.ambalu@point72.com>
Signed-off-by: Rob Ambalu <robert.ambalu@point72.com>
@robambalu robambalu marked this pull request as ready for review March 27, 2024 13:09
@robambalu robambalu requested a review from czgdp1807 as a code owner March 27, 2024 13:09
@timkpaine timkpaine removed the tag: wip PRs that are a work in progress - converted to drafts label Mar 27, 2024
@timkpaine
Copy link
Member

@timkpaine
Copy link
Member

timkpaine commented Mar 27, 2024

full build isnt running pypi-oriented mac tests (just conda), testing in #168

timkpaine and others added 2 commits March 27, 2024 11:34
Signed-off-by: Tim Paine <timothy.paine@cubistsystematic.com>
clean up ci/cd code and ensure mac tests run
@timkpaine
Copy link
Member

actual full build this time (+ mac tests!) https://github.com/Point72/csp/actions/runs/8455095906

@robambalu
Copy link
Collaborator Author

curious why are we still using vcpkg for the full build if the conda builds are working ( and are significantly faster )

@ngoldbaum
Copy link
Contributor

The wheels need to be built using the toolchain on the manylinux image, so something like vcpkg is necessary. You can’t use the toolchain conda-forge is built with.

@timkpaine
Copy link
Member

timkpaine commented Mar 27, 2024

reasonable summary: https://www.pyopensci.org/python-package-guide/package-structure-code/publish-python-package-pypi-conda.html#publishing-your-package-in-a-community-repository-pypi-or-anaconda-cloud

tl;dr:

code -> wheels -> pypi
         -> sdist     -> pypi
                |
                |------> conda-forge build -> anaconda.org/conda-forge

pypi -> No C++ / system dependency assumptions
conda -> C++ / system dependencies can be specified as other conda packages

@timkpaine
Copy link
Member

gtg
image

@robambalu robambalu requested a review from timkpaine March 27, 2024 17:01
Copy link
Member

@timkpaine timkpaine left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@robambalu robambalu requested a review from AdamGlustein March 27, 2024 17:05
@robambalu robambalu merged commit 86f26ed into main Mar 27, 2024
50 checks passed
@robambalu robambalu deleted the feature/clang_support branch March 27, 2024 17:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
part: build Issues and PRs related to the build process
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants