
Pharo CI stopped working - libgit errors #11481

Closed
JanBliznicenko opened this issue Jul 27, 2022 · 73 comments
Labels
Priority: Critical (to fix or review as soon as possible), Type: Bug

Comments

@JanBliznicenko
Contributor

JanBliznicenko commented Jul 27, 2022

I am not sure what the source of the problem is, but all Pharo CI/CD builds on Linux and Mac stopped working in the last 24 hours. Windows builds seem unaffected.
I have seen different errors: mostly IceGenericError: error reading from the zlib stream, but a few times IceGenericError: bad packet length.

MetacelloNotification: Loaded -> BaselineOfUMLProfiles-CompatibleUserName.1658909608 --- https://github.com/openponk/uml-profiles.git[master] --- https://github.com/openponk/uml-profiles.git[master]
I got an error while cloning: There was an authentication error while trying to execute the operation: . 
This happens usually because you didn't provide a valid set of credentials. 
You may fix this problem in different ways: 

1. adding your keys to ssh-agent, executing ssh-add ~/.ssh/id_rsa in your command line.
2. adding your keys in settings (open settings browser search for "Use custom SSH keys" and
add your public and private keys).
IceGenericError: error reading from the zlib stream
IceLibgitErrorVisitor>>visitGenericError:
IceLibgitErrorVisitor>>visitERROR:
LGit_GIT_ERROR>>acceptError:
[ :error |
		location exists ifTrue: [ location ensureDeleteAll ].
		error acceptError: (IceLibgitErrorVisitor onContext: self) ] in IceGitClone>>execute in Block: [ :error |...
FullBlockClosure(BlockClosure)>>cull:
Context>>evaluateSignal:
Context>>handleSignal:
LGit_GIT_ERROR(Exception)>>signal
LGit_GIT_ERROR class(LGitCallReturnHandler class)>>signalWith:
LGitReturnCodeEnum>>handleLGitReturnCode
LGitRepository(LGitExternalObject)>>withReturnHandlerDo:
LGitRepository>>clone:options:to:
LGitRepository>>clone:options:
[location ensureCreateDirectory.
	
	repo := LGitRepository on: location.
	cloneOptions := repo cloneOptionsStructureClass withCredentialsProvider: (IceCredentialsProvider defaultForRemoteUrl: url).

	"Keeping references, because if not the GC take them."
	checkoutOptions := cloneOptions checkoutOptions.
	callbacks := cloneOptions fetchOptions callbacks.
	callbacks transferProgress: IceGitTransferProgress new.
	
	checkoutOptions checkoutStrategy: LGitCheckoutStrategyEnum git_checkout_force.
	checkoutOptions progressCallback: IceGitCheckoutProgress new.

	repo clone: url options: cloneOptions.

	(LGitRemote of: repo named: 'origin')
		lookup;
		setUrl: url.
		
	] in IceGitClone>>execute in Block: [location ensureCreateDirectory....
FullBlockClosure(BlockClosure)>>on:do:
IceGitClone>>execute
IceRepositoryCreator>>cloneRepository
[
		self validate.
		self isCloning
			ifTrue: [ self cloneRepository ]
			ifFalse: [ self addLocalRepository ] ] in IceRepositoryCreator>>createRepository in Block: [...
FullBlockClosure(BlockClosure)>>on:do:
IceRepositoryCreator>>createRepository
[ repository := builder createRepository ] in MCGitHubRepository(MCGitBasedNetworkRepository)>>createIcebergRepositoryFor: in Block: [ repository := builder createRepository ]
FullBlockClosure(BlockClosure)>>on:do:
MCGitHubRepository(MCGitBasedNetworkRepository)>>createIcebergRepositoryFor:
[ ^ self createIcebergRepositoryFor: urlToUse ] in MCGitHubRepository(MCGitBasedNetworkRepository)>>createIcebergRepositoryWithFallbackFor:url: in Block: [ ^ self createIcebergRepositoryFor: urlToUse ]
FullBlockClosure(BlockClosure)>>on:do:
MCGitHubRepository(MCGitBasedNetworkRepository)>>createIcebergRepositoryWithFallbackFor:url:
[ | remote |
			remote := IceGitRemote url: remoteUrl.
			self createIcebergRepositoryWithFallbackFor: remote url: remoteUrl ] in MCGitHubRepository(MCGitBasedNetworkRepository)>>getOrCreateIcebergRepository in Block: [ | remote |...
OrderedCollection(Collection)>>detect:ifFound:ifNone:
OrderedCollection(Collection)>>detect:ifNone:
MCGitHubRepository(MCGitBasedNetworkRepository)>>getOrCreateIcebergRepository
3. using HTTPS instead SSH (Just use an url in the form https://etc.git)./ I will try to clone the HTTPS variant.

Error with status code 1:
653 travis_wait /home/runner/.smalltalkCI/helpers.sh

Source of the log: https://github.com/OpenPonk/class-editor/runs/7543017828?check_suite_focus=true

Probably related to pharo-vcs/iceberg#1600 and hpi-swa/smalltalkCI#562

@kasperosterbye
Contributor

I am bitten by this in another project too (pillar-markup/Microdown).

I believe it is related to #11222.

@badetitou
Member

Yes, it breaks all my projects and the Moose projects.

@guillep
Member

guillep commented Jul 29, 2022

Is this happening for GitHub actions only? or others too?

@JanBliznicenko
Contributor Author

Is this happening for GitHub actions only? or others too?

@guillep I only use GitHub Actions, unfortunately, but @badetitou said in hpi-swa/smalltalkCI#562 (comment) that GitLab works fine for him.

@badetitou
Member

badetitou commented Jul 29, 2022

Note that some GitHub Actions builds succeed intermittently (maybe when the baseline loads only a few dependencies).
This is the case for:

  • badetitou/Carrefour (binding between Famix and FAST)
  • moosetechnology/FAST-JAVA

@JanBliznicenko
Contributor Author

Most of my projects use Moose, and all of them fail consistently at the point where Moose is usually fetched, but that is probably because Moose is large. My small projects with few simple dependencies (and no Moose) mostly seem to work, but some of them do not (and those always fail at the same point as well).

@kasperosterbye
Contributor

I get it not when building, but when running this piece of code in a fresh image (before I apply my GitHub credentials):

IceGitHubAPI new in: [ :api |	50 timesRepeat:  [ api get: 'repos/pillar-markup/Microdown']	 ]

However, I am not sure it is related.

@badetitou
Member

Does it work after setting up credentials?

@kasperosterbye
Contributor

With credentials set, I can make around 5000 API calls per hour (unauthenticated requests are limited to 60 per hour), which so far has been enough. It is in the Iceberg wiki.

But I have no idea how to make sure the credentials are set in the CI. I also have no idea why the CI is suddenly broken.

@Rinzwind
Contributor

After doing this:

#(
	'https://github.com/SeasideSt/Seaside-Legacy30'
	'https://github.com/pharo-project/pharo-zeroconf.git'
	'https://github.com/pharo-project/pharo.git'
	'https://github.com/pharo-project/pharo-launcher.git'
	'https://github.com/pharo-project/pharo-vm.git'
	'https://github.com/pharo-project/pharo-site.git'
	'https://github.com/pharo-project/pharo-changelogs.git'
	'https://github.com/pharo-project/pharo-core.git'
) do: [ :url |
	Transcript show: url; cr.
	[
		IceRepositoryCreator new 
			url: url;
			location: FileLocator temp / ('IceRepositoryCreator Test ' , Time microsecondClockValue asString);
			createRepository
	] on: Error do: [ :error |
		Transcript show: error asString; cr ]
] separatedBy: [ Transcript cr ]

My Transcript shows:

https://github.com/SeasideSt/Seaside-Legacy30
IceGenericError: error reading from the zlib stream

https://github.com/pharo-project/pharo-zeroconf.git

https://github.com/pharo-project/pharo.git
IceGenericError: error reading from the zlib stream

https://github.com/pharo-project/pharo-launcher.git

https://github.com/pharo-project/pharo-vm.git
IceGenericError: bad packet length

https://github.com/pharo-project/pharo-site.git
IceGenericError: bad packet length

https://github.com/pharo-project/pharo-changelogs.git

https://github.com/pharo-project/pharo-core.git
IceGenericError: error reading from the zlib stream

So: 3 OK, 2 times ‘bad packet length’ and 3 times ‘error reading from the zlib stream’.

@badetitou
Member

I tried setting up the token as described in the Iceberg wiki and as proposed by @kasperosterbye in moosetechnology/Moose@30d5dc6.
It does not work.

(Yes, I revoked this token in my personal account right after the test.)

@badetitou
Member

badetitou commented Aug 1, 2022

The GitHub Actions virtual environment changed on 2022-07-24. Maybe that created the problem for us (an update of some library).
https://github.com/actions/virtual-environments/blob/main/images/linux/Ubuntu2004-Readme.md

@badetitou
Member

We also have this issue now:

MetacelloNameNotDefinedError: project group, or package named: 'FileTree' not found when used in requires: or includes: field of package: 'Metacello-FileTree' for version: baseline of BaselineOfMetacello.

@Rinzwind
Contributor

Rinzwind commented Aug 1, 2022

I suspect the problem has to do with the following point in the changes listed for libgit2 v1.0.1:

  • A bug where the smart HTTP transport could not read large data packets has been fixed. Previously, fetching from servers like Gerrit, that sent large data packets, would error.

The following gives v1.0.0 as the version of libgit2 used in Pharo:

LGitLibrary uniqueInstance version "=> #(1 0 0)"

Using that version of libgit2, the C program given below reproduces the error:

$ ./test 
libgit2 version: 1.0.0
cloning to: /tmp/libgit2_clone_test.uhfXQw
git_clone returned: -1
error message: error reading from the zlib stream

Upgrading to v1.0.1 fixes the error:

$ ./test 
libgit2 version: 1.0.1
cloning to: /tmp/libgit2_clone_test.TBNpFL
git_clone returned: 0

I used MacPorts to install the different versions of libgit2, making use of the instructions on the page “How to install an older version of a port”. The ‘Portfile’ for v1.0.1 can be found in: macports/macports-ports@abe3087564d83e7e. A diff for v1.0.0 is given below.

test.c:

#include <stdio.h>
#include <stdlib.h>   /* mkdtemp on glibc */
#include <unistd.h>   /* mkdtemp on macOS */
#include <git2/global.h>
#include <git2/clone.h>
#include <git2/errors.h>
#include <git2/repository.h>

int main() {
	int major, minor, rev;
	git_libgit2_init();
	if (git_libgit2_version(&major, &minor, &rev) != 0) {
		printf("git_libgit2_version failed\n");
		return 1;
	}
	printf("libgit2 version: %i.%i.%i\n", major, minor, rev);
	char dir[] = "/tmp/libgit2_clone_test.XXXXXX";
	if (mkdtemp(dir) == NULL) {
		printf("mkdtemp failed\n");
		return 1;
	}
	printf("cloning to: %s\n", dir);
	git_repository *repo = NULL;
	int value = git_clone(&repo, "https://github.com/SeasideSt/Seaside-Legacy30", dir, NULL);
	printf("git_clone returned: %i\n", value);
	if (value != 0) {
		/* Only consult the error state when the clone actually failed. */
		const git_error *error = git_error_last();
		if (error != NULL)
			printf("error message: %s\n", error->message);
	}
	git_repository_free(repo); /* no-op when repo is NULL */
	git_libgit2_shutdown();
	return (value == 0) ? 0 : 1;
}

Makefile:

test: test.c
	cc -I/opt/local/include -L/opt/local/lib test.c -o test -lgit2

Portfile diff:

diff --git a/devel/libgit2/Portfile b/devel/libgit2/Portfile
index 7c7d6c532e0..e8120559c40 100644
--- a/devel/libgit2/Portfile
+++ b/devel/libgit2/Portfile
@@ -9 +9 @@ PortGroup           legacysupport 1.0
-github.setup        libgit2 libgit2 1.0.1 v
+github.setup        libgit2 libgit2 1.0.0 v
@@ -27,3 +27,3 @@ homepage            https://libgit2.org/
-checksums           rmd160  3f9ced0e0dff170a8156e4aa8fcb0abce66c8f60 \
-                    sha256  5dae7cb32b6977cd95ed849d24f3692f0b7e9eb9b0ee9ffaa14baebb9cac76e1 \
-                    size    5304918
+checksums           rmd160  f46ca0500f159e058d854ed6aeb8d4418420b419 \
+                    sha256  1d3135077f7b0401c1172e41f561cadd06fd159e75aa24d710de1bd3a24b1440 \
+                    size    5302852
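Given this finding, a tool could warn before cloning when it detects an affected libgit2. The sketch below is hypothetical (none of these function names exist in Pharo or libgit2): it compares a runtime version triple, such as the one test.c obtains from git_libgit2_version or the one LGitLibrary uniqueInstance version answers in Pharo, against 1.0.1, the release whose changelog mentions the smart HTTP fix. It treats every earlier release as affected, which may be too conservative, since the changelog does not say when the bug was introduced.

```c
#include <assert.h>
#include <stdbool.h>

/* Lexicographic comparison of (major, minor, rev) against a minimum version. */
bool version_at_least(int major, int minor, int rev,
                      int min_major, int min_minor, int min_rev)
{
    assert(major >= 0 && minor >= 0 && rev >= 0);
    if (major != min_major)
        return major > min_major;
    if (minor != min_minor)
        return minor > min_minor;
    return rev >= min_rev;
}

/* Hypothetical check: assumes every release before 1.0.1 is affected by the
 * smart HTTP large-packet bug described in the libgit2 v1.0.1 changelog. */
bool has_large_packet_fix(int major, int minor, int rev)
{
    return version_at_least(major, minor, rev, 1, 0, 1);
}
```

In test.c, one could call has_large_packet_fix(major, minor, rev) right after git_libgit2_version and print a warning when it returns false, as it does for 1.0.0; 1.0.1 and later releases such as 1.1.0 pass.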

@badetitou
Member

badetitou commented Aug 2, 2022

I can confirm that version 1.0.0 is used in the CI
https://github.com/moosetechnology/Moose/runs/7626118950?check_suite_focus=true#step:4:65


@JanBliznicenko
Contributor Author

JanBliznicenko commented Aug 2, 2022

Wasn't libgit2 1.0.0 there the whole time? I wonder why it is a problem now. Maybe GitHub changed something about data transfers (since GitLab does not seem affected)?

@Rinzwind
Contributor

Rinzwind commented Aug 2, 2022

The description of the change in libgit2 I referred to says “fetching from servers like Gerrit, that sent large data packets, would error.” I assume GitHub just didn’t send large data packets before but now does.

@gcotelli
Member

gcotelli commented Aug 2, 2022

Sadly, the error seems to be somewhat random. I re-ran the failing builds: sometimes they fail, other times they don't. And for jobs with a build matrix, some configurations failed and others did not, even when the only difference was the packages loaded.

@gcotelli
Member

gcotelli commented Aug 2, 2022

I was able to reproduce the problem without GitHub Actions at all, by doing a Docker build on my laptop. The first two times it failed with IceGenericError: error reading from the zlib stream and IceGenericError: bad packet length, but the third time it worked.

@jbrichau

jbrichau commented Aug 3, 2022

As mentioned in #11481 (comment), libgit2 needs an update in the Pharo VM... Where does this need to be reported to get it done?

@Ducasse
Member

Ducasse commented Aug 3, 2022

Here, but the team is on vacation :(
I will see if we can do something with Christophe on Monday.
And yes, we will have to make sure that not everybody is on vacation at the same time.

Ducasse added the Priority: Critical label (to fix or review as soon as possible) on Aug 3, 2022
@Ducasse
Member

Ducasse commented Aug 3, 2022

It is a bit crazy that such changes are not backward compatible and impact everybody :(

@jbrichau

jbrichau commented Aug 3, 2022

@Ducasse thanks for the reply Stef. I suspected that France is in holiday mode right now ;-). Just wanted to make sure it reaches the right people.

@Ducasse
Member

Ducasse commented Aug 3, 2022

It does :) We got impacted too. I cannot build some of my projects, or even load the code :(

@guillep
Member

guillep commented Aug 23, 2022

Yes, the OSX problem is separate; see the issue here: #11561
PRs are open, and I'm waiting for the CI to run to check that everything is OK.

@gcotelli
Member

@guillep can we get a new Pharo 10 release once all the related issues are merged? I want to update our docker images for Pharo but prefer to base it on a tagged version (like v10.1.0) and not a commit hash.

@tesonep
Collaborator

tesonep commented Aug 24, 2022

@gcotelli The version v10.0.1 is ready

@Rinzwind
Contributor

@guillep @tesonep I just saw there was another ‘IceGenericError: bad packet length’ on our Jenkins server last night, so I was wondering what the status of this issue is with respect to Pharo 9? Is an update still coming?

Info from PharoDebug.log:

THERE_BE_DRAGONS_HERE
IceGenericError: bad packet length
31 August 2022 12:35:17.203046 am

VM: unix - x86_64 - linux-gnu - CoInterpreter * VMMaker-tonel.1 uuid: 365973b2-49a3-0d00-90e4-5907092bce84 Aug 23 2022
StackToRegisterMappingCogit * VMMaker-tonel.1 uuid: 365973b2-49a3-0d00-90e4-5907092bce84 Aug 23 2022
v9.0.17 - Commit: 9e4879f - Date: 2022-08-22 14:31:22 +0200

Image: Pharo9.0.0 [Build information: Pharo-9.0.0+build.1575.sha.9bb5f998e8a6d016ec7abde3ed09c4a60c0b4551 (64 Bit)]

@JanBliznicenko
Contributor Author

JanBliznicenko commented Sep 1, 2022

Actually, I still see random errors, even with Pharo 10 on Mac. At least they now happen only sometimes, not every time like before.
For example, yesterday I got:
IceGenericError: SecureTransport error: connection closed via error in https://github.com/OpenPonk/plugins/runs/8118540172?check_suite_focus=true

@guillep
Copy link
Member

guillep commented Sep 2, 2022

Hi @Rinzwind, the fix has not been backported to Pharo 9 yet. PR #11596 is on hold because it requires some more work. There is probably a mismatch between Pharo 9 and the version of NewTools it tries to install.

@JanBliznicenko Can you tell us whether the problem persists? Connection errors do happen from time to time...

@JanBliznicenko
Contributor Author

JanBliznicenko commented Sep 2, 2022

@guillep Unfortunately, it seems so. It is random, so it happens MUCH less often than before, but everything used to work fine before all these GitHub-related problems started.
This is another one, from today: https://github.com/OpenPonk/fsm-editor/runs/8152818612?check_suite_focus=true


@tesonep
Collaborator

tesonep commented Sep 2, 2022

Hi @JanBliznicenko, to minimize the noise, a good alternative may be to add this as a preInstall script:

Iceberg remoteTypeSelector: #httpsUrl

This way it will only use HTTPS and will not try SSH at all. Currently, it tries SSH first and, if that fails, retries with HTTPS.
I am not sure that will work better, but it will reduce the noise in the error output.

@Rinzwind
Contributor

Rinzwind commented Sep 2, 2022

Might be useful to someone else: to avoid the problem on our Jenkins server, which uses Debian, I now extended our build scripts to install the package ‘libgit2-1.1’, and to apply a patch to smalltalkCI like the one given below. The output then shows LGitLibrary uniqueInstance version = #(1 1 0).

diff --git a/pharo/run.sh b/pharo/run.sh
index c35c456..6dde247 100644
--- a/pharo/run.sh
+++ b/pharo/run.sh
@@ -334,6 +334,25 @@ pharo::run_script() {
 # Load project into Pharo image.
 ################################################################################
 pharo::load_project() {
+  pharo::run_script "
+    LGitLibrary compile: 'unix64LibraryName
+
+      \"Patched to try libgit2.so.1.1 first, see: https://github.com/pharo-project/pharo/issues/11481\"
+
+      ^ FFIUnix64LibraryFinder findAnyLibrary: #(
+        ''libgit2.so.1.1''
+        \"This name is wrong, but some versions of the VM has this library shipped with the bad name\"
+        ''libgit2.1.0.0.so''
+        ''libgit2.so.1.0.0''
+        ''libgit2.so.1.0''
+        ''libgit2.so.1.2''
+        ''libgit2.so.0.25.1'')'.
+    Smalltalk snapshot: true andQuit: true
+  "
+  pharo::run_script "
+    Transcript show: 'LGitLibrary uniqueInstance version = ' , LGitLibrary uniqueInstance version asString; cr.
+    Smalltalk snapshot: true andQuit: true
+  "
   pharo::run_script "
     | smalltalkCI |
     $(conditional_debug_halt)

@JanBliznicenko
Contributor Author

JanBliznicenko commented Sep 4, 2022

Hi @JanBliznicenko, to minimize the noise maybe a good alternative is to put as preInstall script:

Iceberg remoteTypeSelector: #httpsUrl

In this way it will just try to use HTTPS and don't try to use SSH. Now, it tries with SSH and if it fails retries with HTTPS. I am not sure if that will work better, but it will reduce the noise in the error.

Yes, that looks much better now, I have been postponing doing something with those warnings for years and it is actually simpler than I thought :)

Unfortunately, it does not solve the problem I have. It now really seems to be Mac-only. It happens to me about 50% of the time.
https://github.com/OpenPonk/class-editor/runs/8175042543?check_suite_focus=true
https://github.com/OpenPonk/fsm-editor/runs/8175046734?check_suite_focus=true
https://github.com/OpenPonk/OpenPonk-BPMN/runs/8175061785?check_suite_focus=true

@guillep
Member

guillep commented Sep 6, 2022

@pablo is working on issue #1612 to retry cloning automatically when there is a connection problem.
That should (hopefully) be the last issue required here, at least for a while :).

@Rinzwind
Contributor

Rinzwind commented Sep 6, 2022

@guillep I’m not sure you linked to the right issue (1612 in this repository is a pull request, ‘add window tiling shortcuts’). Edit: I hadn’t noticed the right issue is actually mentioned right above your message (so: pharo-vcs/iceberg#1612).

One question I still had: in LGitLibrary>>#unix64LibraryName, shouldn’t the library versions be ordered from highest to lowest? Otherwise, if, say, only libgit2.so.1.0 and libgit2.so.1.1 can be found, v1.0 is used while it would be better to use v1.1. In LGitLibrary>>#macLibraryName, fewer versions are given, but they are ordered from highest to lowest.
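To illustrate the point about ordering, here is a hypothetical C sketch (not Pharo's FFI library finder) of a first-match lookup. It assumes the finder returns the first candidate that is present, which is what the question above implies; the candidate names are those from the smalltalkCI patch earlier in this thread, re-sorted from highest to lowest version. Availability is passed in as a plain list so the example is self-contained; a real implementation would probe the system instead, for example with dlopen.

```c
#include <stddef.h>
#include <string.h>

/* Candidate sonames, highest version first, so the newest available wins. */
const char *const libgit2_candidates[] = {
    "libgit2.so.1.2",
    "libgit2.so.1.1",
    "libgit2.so.1.0.0",
    "libgit2.1.0.0.so",   /* misnamed 1.0.0 shipped with some VMs */
    "libgit2.so.1.0",
    "libgit2.so.0.25.1",
};
#define LIBGIT2_CANDIDATE_COUNT \
    (sizeof libgit2_candidates / sizeof libgit2_candidates[0])

/* Returns 1 if name occurs in the list of available libraries, 0 otherwise. */
static int is_available(const char *name, const char *const *available,
                        size_t available_count)
{
    for (size_t i = 0; i < available_count; i++)
        if (strcmp(name, available[i]) == 0)
            return 1;
    return 0;
}

/* First-match lookup: the order of candidates decides which library wins.
 * Returns NULL when no candidate is available. */
const char *pick_library(const char *const *candidates, size_t count,
                         const char *const *available, size_t available_count)
{
    for (size_t i = 0; i < count; i++)
        if (is_available(candidates[i], available, available_count))
            return candidates[i];
    return NULL;
}
```

With libgit2.so.1.0 and libgit2.so.1.1 both available, this ordering yields libgit2.so.1.1; listing libgit2.so.1.0 before libgit2.so.1.1 would make the same function return the older library, which is exactly the situation described above.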

@guillep
Copy link
Member

guillep commented Sep 7, 2022

@guillep I’m not sure you linked to the right issue (1612 in this repository is a pull request, ‘add window tiling shortcuts’). Edit: I hadn’t noticed the right issue is actually mentioned right above your message (so: pharo-vcs/iceberg#1612).

Oops, yes, different repositories :) I'll fix the link.

One question I still had: in LGitLibrary>>#unix64LibraryName, shouldn’t the library versions be ordered from highest to lowest? Otherwise, if, say, only libgit2.so.1.0 and libgit2.so.1.1 can be found, v1.0 is used while it would be better to use v1.1. In LGitLibrary>>#macLibraryName, fewer versions are given, but they are ordered from highest to lowest.

Yes, I think so!

@tesonep
Copy link
Collaborator

tesonep commented Sep 7, 2022

I have integrated a fix for the OSX connection problem into P11; later we are going to do a release of Iceberg and integrate it into P10. This version should improve the problem on OSX, as it retries when there is a network issue.
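The idea of retrying on a network issue can be sketched in C as a generic retry loop. This is not the Iceberg fix (which is Smalltalk code in pharo-vcs/iceberg); retry_operation and flaky_clone_stub are hypothetical names, and the stub merely simulates a clone that fails a few times before succeeding, following libgit2's convention of returning 0 on success and a negative code on failure.

```c
#include <assert.h>
#include <stdio.h>

/* An operation returns 0 on success and a negative code on failure. */
typedef int (*operation_fn)(void *ctx);

/* Calls op up to max_attempts times, stopping at the first success.
 * Returns the last result; *attempts_out (if non-NULL) gets the call count. */
int retry_operation(operation_fn op, void *ctx, int max_attempts,
                    int *attempts_out)
{
    assert(op != NULL && max_attempts > 0);
    int result = -1;
    int attempt = 0;
    while (attempt < max_attempts) {
        attempt++;
        result = op(ctx);
        if (result == 0)
            break;
        fprintf(stderr, "attempt %d failed (%d), %s\n", attempt, result,
                attempt < max_attempts ? "retrying" : "giving up");
    }
    if (attempts_out != NULL)
        *attempts_out = attempt;
    return result;
}

/* Hypothetical stand-in for a flaky clone: fails failures_left times, then
 * succeeds. A real caller would call git_clone here instead. */
struct flaky_stub {
    int failures_left;
};

int flaky_clone_stub(void *ctx)
{
    struct flaky_stub *stub = ctx;
    if (stub->failures_left > 0) {
        stub->failures_left--;
        return -1;
    }
    return 0;
}
```

A production version would probably retry only errors that look transient (such as "error reading from the zlib stream", "bad packet length" or the SecureTransport error seen in this thread) rather than authentication failures, wait between attempts, and delete the partially cloned directory before each retry, as the error handler of IceGitClone>>execute does in the stack trace at the top of this issue.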

@guillep
Member

guillep commented Oct 5, 2022

Hi all, is this problem finally fixed?

@badetitou
Copy link
Member

I still have problems with Pharo 9.
But everything is OK for Pharo 10.

@badetitou
Copy link
Member

Hmm... I'll check again and send you the trace if there is one.

@JanBliznicenko
Contributor Author

@guillep All my builds use Pharo 10 and are on Win, Linux and Mac and all are completely fine lately, thank you.

@Rinzwind
Contributor

Rinzwind commented Oct 6, 2022

I have not seen either of the two errors anymore (and the workaround for our build scripts has been removed from the scripts).

@guillep
Member

guillep commented Oct 10, 2022

Thank you all!

@badetitou yes, a more concrete case would help, because

  • we have backported all fixes to Pharo9 and Pharo10
  • we have even made a new release on Windows using the latest libgit2, fixing several pre-existing issues on Windows (related to proxies and HTTPS auth)

I'll close it. We can reopen a new case if needed.

guillep closed this as completed on Oct 10, 2022
@JanBliznicenko
Contributor Author

JanBliznicenko commented Oct 25, 2022

So, it seems sometimes I still get the error for my biggest projects. Like this: https://github.com/OpenPonk/class-editor/actions/runs/3321581366/jobs/5489424282
It is quite rare though, like once in 30 runs and it seems to happen only on Mac now.


10 participants