LAPACK 3.11.0 #114

Open
wants to merge 26 commits into
base: main

Conversation

h-vetinari
Member

Builds on #96; testing if this helps with the MKL failures, see conda-forge/intel_repack-feedstock#44

@conda-forge-webservices

conda-forge-webservices bot commented Jul 22, 2024

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found it was in an excellent condition.

Member Author

@h-vetinari h-vetinari left a comment

As discussed in conda-forge/lapack-feedstock#67, this PR should be merged first to avoid a window where no blas implementation except netlib is on 3.11 (which would cause netlib to be installed everywhere during that time).

recipe/meta.yaml Outdated
{% set version_major = version.split(".")[0] %}
# blas_major denotes major infrastructural change to how blas is managed
{% set blas_major = "2" %}
# make sure we do not create colliding version strings of output "blas"
# for builds across lapack-versions within the same blas_major
- {% set blas_minor = build_num + 100 %}
+ {% set blas_minor = build_num + 300 %}
Member Author

I'm leaving this as is so we keep the + 200 for LAPACK 3.10, if we ever need to build that for whatever reason.
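
A minimal sketch of the numbering scheme discussed here, written in Python rather than the recipe's Jinja. The `OFFSETS` table and the `blas_version` helper are hypothetical names; only the `+ 100`, `+ 200` and `+ 300` offsets and `blas_major = "2"` come from the recipe and the comment above, and mapping `+ 100` to LAPACK 3.9 is an assumption.

```python
BLAS_MAJOR = "2"
# Assumed mapping of LAPACK release line -> offset added to the build number.
# +300 is what this PR sets for 3.11; +200 stays reserved for 3.10.
OFFSETS = {"3.9": 100, "3.10": 200, "3.11": 300}


def blas_version(lapack_line: str, build_num: int) -> str:
    """Return the version string of the "blas" output for one build."""
    return f"{BLAS_MAJOR}.{build_num + OFFSETS[lapack_line]}"


# The same build number on different LAPACK lines gives distinct versions,
# provided the build number stays below 100 within each line.
versions = {blas_version(line, n) for line in OFFSETS for n in range(100)}
assert len(versions) == 300

print(blas_version("3.10", 2))  # 2.202
print(blas_version("3.11", 2))  # 2.302
```

The scheme only stays collision-free while a single LAPACK line has fewer than 100 builds; beyond that, two lines would start to share `blas_minor` values.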

@h-vetinari
Member Author

Not sure what's happening with openblas on osx.

The following tests FAILED:
	  1 - LAPACK_Test_Summary (Failed)
	  2 - LAPACK-xlintsts_stest_in (Failed)
	  8 - LAPACK-xeigtsts_sec_in (Failed)
	 46 - LAPACK-xlintstc_ctest_in (Failed)

The LAPACK_Test_Summary stuff already showed up in #119, but the rest is new.

@h-vetinari
Member Author

I cannot seem to get rid of the channel confusion for blis on windows either; it pulls in a lapack that is too old:

+ conda install -c conda-forge/label/lapack_rc -c conda-forge 'libblas=*=*blis' 'libcblas=*=*blis' 'liblapack=*=*netlib' 'liblapacke=*=*netlib' --use-local --yes -p 'D:\bld\blas-split_1721704755901\_h_env'
Channels:
 - local
 - conda-forge/label/lapack_rc
 - conda-forge
Platform: win-64
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... done

[...]

The following NEW packages will be INSTALLED:

  libblas            conda-forge/win-64::libblas-3.9.0-23_win64_blis 
  libcblas           conda-forge/win-64::libcblas-3.9.0-23_win64_blis 
  liblapack          conda-forge/win-64::liblapack-3.9.0-5_hd5c7e75_netlib 
  liblapacke         conda-forge/win-64::liblapacke-3.9.0-5_hd5c7e75_netlib 

Pinning the version just leads to

LibMambaUnsatisfiableError: Encountered problems while solving:
  - package liblapacke-3.11.0-2_he5c8b5d_netlib requires libcblas 3.11.0.*, but none of the providers can be installed
  - package liblapack-3.11.0-2_he5c8b5d_netlib requires libblas 3.11.0.*, but none of the providers can be installed

Could not solve for environment specs
The following packages are incompatible
├─ libblas * *blis is requested and can be installed;
├─ libcblas * *blis is requested and can be installed;
├─ liblapack 3.11 2*netlib is not installable because it requires
│  └─ libblas 3.11.0.* , which conflicts with any installable versions previously reported;
└─ liblapacke 3.11 2*netlib is not installable because it requires
   └─ libcblas 3.11.0.* , which conflicts with any installable versions previously reported.
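
The conflict can be illustrated with a toy check. This is a sketch only and not how conda or libmamba resolve specs: `matches` is a hypothetical helper that handles just trailing-wildcard specs, and it assumes the only blis-flavoured `libblas`/`libcblas` builds visible to the solver are the 3.9.0 ones from the first log.

```python
def matches(version: str, spec: str) -> bool:
    """Check a version against a trailing-wildcard spec such as '3.11.0.*'."""
    if spec.endswith(".*"):
        prefix = spec[:-2]
        return version == prefix or version.startswith(prefix + ".")
    return version == spec


# Assumed to be the only blis builds the solver can see (from the first log).
available = {"libblas": ["3.9.0"], "libcblas": ["3.9.0"]}
# Requirements reported for the 3.11.0 netlib builds of liblapack/liblapacke.
required = {"libblas": "3.11.0.*", "libcblas": "3.11.0.*"}

unsatisfied = [
    name
    for name, spec in required.items()
    if not any(matches(version, spec) for version in available[name])
]
print(unsatisfied)  # ['libblas', 'libcblas']
```

Under that assumption, pinning `liblapack`/`liblapacke` to 3.11 cannot succeed until a blis-flavoured `libblas`/`libcblas` 3.11.0 build is reachable from the configured channels.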

@h-vetinari h-vetinari force-pushed the 3.11 branch 9 times, most recently from c0af6fa to 011a282 Compare August 18, 2024 08:13
@h-vetinari
Member Author

h-vetinari commented Aug 18, 2024

@martin-frbg, sorry for another ping, I'm completely stumped as to what's happening with openblas on osx-64 (against the LAPACK 3.11 test suite). Those failures were already there with 0.3.27, and have remained with 0.3.28.

I cannot make heads or tails of what's happening:

  7/107 Test   #8: LAPACK-xeigtsts_sec_in ...........***Failed    0.04 sec
Running: /Users/runner/miniforge3/conda-bld/blas-split_1723965414526/work/build/bin/xeigtsts
ARGS= OUTPUT_FILE;/Users/runner/miniforge3/conda-bld/blas-split_1723965414526/work/build/TESTING/sec.out;ERROR_FILE;/Users/runner/miniforge3/conda-bld/blas-split_1723965414526/work/build/TESTING/sec.out.err;INPUT_FILE;/Users/runner/miniforge3/conda-bld/blas-split_1723965414526/work/TESTING/sec.in
Test OUTPUT:
 ** On entry to STRSYLPSM� parameter number  1 had an illegal value
 ** On entry to STRSYLPSM� parameter number  2 had an illegal value
 ** On entry to STRSYLPSM� parameter number  3 had an illegal value
 ** On entry to STRSYLPSM� parameter number  4 had an illegal value
 ** On entry to STRSYLPSM� parameter number  5 had an illegal value
 ** On entry to STRSYLPSM� parameter number  7 had an illegal value
 ** On entry to STRSYLPSM� parameter number  9 had an illegal value
 ** On entry to STRSYLPSM� parameter number 11 had an illegal value

Test ERROR:
dyld[35081]: missing symbol called

Program received signal SIGABRT: Process abort signal.

I have no idea why the symbol name (according to the logs) seems corrupted? Presumably this is also the reason why the linker fails to find it... However, this only happens in this combination of libraries/platforms, so this is very strange.

The overall results are:

97% tests passed, 3 tests failed out of 107

Total Test time (real) =  75.48 sec

The following tests FAILED:
	  2 - LAPACK-xlintsts_stest_in (Failed)
	  8 - LAPACK-xeigtsts_sec_in (Failed)
	 46 - LAPACK-xlintstc_ctest_in (Failed)

			-->   LAPACK TESTING SUMMARY  <--
		Processing LAPACK Testing output found in the TESTING directory
SUMMARY             	nb test run 	numerical error   	other error  
================   	===========	=================	================  
REAL             	493230		2	(0.000%)	3870	(0.785%)	
DOUBLE PRECISION	1327825		0	(0.000%)	4888	(0.368%)	
COMPLEX          	472269		0	(0.000%)	4048	(0.857%)	
COMPLEX16         	787822		0	(0.000%)	4950	(0.628%)	

--> ALL PRECISIONS	3081146		2	(0.000%)	17756	(0.576%)

which is quite a bit worse (both overall and in detail, e.g. "other error") than a comparable result with another backend on osx:

blis on osx-64
100% tests passed, 0 tests failed out of 95

Total Test time (real) =   5.18 sec

			-->   LAPACK TESTING SUMMARY  <--
		Processing LAPACK Testing output found in the TESTING directory
SUMMARY             	nb test run 	numerical error   	other error  
================   	===========	=================	================  
REAL             	39516		0	(0.000%)	28	(0.071%)	
DOUBLE PRECISION	39516		0	(0.000%)	29	(0.073%)	
COMPLEX          	39516		0	(0.000%)	28	(0.071%)	
COMPLEX16         	39516		0	(0.000%)	29	(0.073%)	

--> ALL PRECISIONS	158064		0	(0.000%)	114	(0.072%)	

or (note also the wildly different number of tests 🤔)

openblas on linux-64
100% tests passed, 0 tests failed out of 107

Total Test time (real) =  63.93 sec

			-->   LAPACK TESTING SUMMARY  <--
		Processing LAPACK Testing output found in the TESTING directory
SUMMARY             	nb test run 	numerical error   	other error  
================   	===========	=================	================  
REAL             	1319227		1	(0.000%)	12	(0.001%)	
DOUBLE PRECISION	1327825		0	(0.000%)	12	(0.001%)	
COMPLEX          	776795		15	(0.002%)	20	(0.003%)	
COMPLEX16         	783874		15	(0.002%)	20	(0.003%)	

--> ALL PRECISIONS	4207721		31	(0.001%)	64	(0.002%)	
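
To compare the three runs on equal footing, the "other error" rates can be recomputed from the `ALL PRECISIONS` rows. The sketch below is only an illustration: the `ROW` pattern and `other_error_rate` are hypothetical helpers, and the rows are copied from the summaries above with tabs replaced by spaces.

```python
import re

# Matches: tests run, numerical errors (pct), other errors (pct).
ROW = re.compile(
    r"ALL PRECISIONS\s+(\d+)\s+(\d+)\s+\([\d.]+%\)\s+(\d+)\s+\([\d.]+%\)"
)


def other_error_rate(row: str) -> float:
    """Return the 'other error' count as a percentage of tests run."""
    m = ROW.search(row)
    if m is None:
        raise ValueError("not an ALL PRECISIONS summary row")
    tests, _numerical, other = map(int, m.groups())
    return 100.0 * other / tests


rows = {
    "openblas on osx-64": "--> ALL PRECISIONS 3081146 2 (0.000%) 17756 (0.576%)",
    "blis on osx-64": "--> ALL PRECISIONS 158064 0 (0.000%) 114 (0.072%)",
    "openblas on linux-64": "--> ALL PRECISIONS 4207721 31 (0.001%) 64 (0.002%)",
}
for name, row in rows.items():
    print(f"{name}: {other_error_rate(row):.3f}%")
```

The recomputed rates agree with the percentages printed by the test summary, so the gap between openblas on osx-64 and the other two runs is not a rounding artifact.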

There are hundreds of lines of logs along the lines of

 ** On entry to CGTSV  parameter number  <nr> had an illegal value
 *** Illegal value of parameter number  <nr> not detected by CGBTF2 ***
 *** CPP routines failed the tests of the error exits ***
 *** CPP drivers failed the tests of the error exits ***

I checked the commit history of the TESTING folder back to 3.11, but cannot find anything that stands out w.r.t. the errors that I see; then again, I've also not managed to reconstruct which test/function call produces the following kind of logs:

 ** On entry to CPBCONSafe minimumNon-unitConjugate transposeUpperNo transposeLower parameter number  5 had an illegal value

Not all symbols have the mojibake, but quite a number of them do (which was what I was trying to dig into, but I cannot tell where the symbol names get constructed, much less how they'd get messed up).

 ** On entry to CPPSVX� parameter number 1 had an illegal value
 ** On entry to CPBTRF����� parameter number  1 had an illegal value
 ** On entry to CPBTF2 parameter number  1 had an illegal value

Any ideas would be appreciated! 🙏
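
One way to narrow this down is to list every routine name in the logs that does not look like a plain identifier. This is a rough sketch under assumptions: `ENTRY`, `VALID` and `suspicious_names` are hypothetical helpers, and "healthy" is taken to mean an uppercase letter followed by uppercase letters, digits or underscores.

```python
import re

# The routine name sits between "On entry to " and " parameter number".
ENTRY = re.compile(r"\*\* On entry to (.*?)\s*parameter number\s+\d+")
# Assumption: a healthy routine name looks like CPBTF2 or STRSYL.
VALID = re.compile(r"[A-Z][A-Z0-9_]*")


def suspicious_names(log_lines: list[str]) -> list[str]:
    """Return routine names that fail the VALID pattern, in first-seen order."""
    seen: list[str] = []
    for line in log_lines:
        m = ENTRY.search(line)
        if m and not VALID.fullmatch(m.group(1)) and m.group(1) not in seen:
            seen.append(m.group(1))
    return seen


# "\ufffd" is the replacement character that shows up in the logs above.
sample = [
    " ** On entry to CPPSVX\ufffd parameter number 1 had an illegal value",
    " ** On entry to CPBTRF\ufffd\ufffd\ufffd\ufffd\ufffd parameter number  1 had an illegal value",
    " ** On entry to CPBTF2 parameter number  1 had an illegal value",
]
for name in suspicious_names(sample):
    print(repr(name))
```

Running this over the full test output would show which routines are affected and whether the garbage always starts right after the expected name, which might help locate where the name string is built.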

regro-cf-autotick-bot and others added 26 commits November 4, 2024 23:21
avoid picking up ambient python from somewhere else in the image

LAPACK_TESTING_USE_PYTHON only landed in 3.12, see Reference-LAPACK/lapack@d8f668c
3 participants