[bug] conan install sometimes fails offline, despite all recipes/packages in cache #15339

Closed
spiderkeys opened this issue Dec 23, 2023 · 28 comments · Fixed by #15516

@spiderkeys

spiderkeys commented Dec 23, 2023

Environment details

  • Operating System+version: Ubuntu 22.04 x86_64
  • Compiler+version: g++11
  • Conan version: 2.0.15
  • Python version: 3.10

We are using a private artifactory instance as our conan remote, with conan-center removed from the remotes list.

Steps to reproduce

Reproduction:
I'm still not sure how to reproduce this. Sometimes it happens, and sometimes it doesn't.

The only step I take on my end that causes it to occur is to disconnect my machine from the internet.

Notes:

  • All recipes and binaries are definitely in my local cache (due to earlier builds succeeding while online)
  • "--update" is not passed as an arg to conan install, so it shouldn't be reaching out to any remotes
  • Passing -nr while offline (while this error is occurring) allows for a successful build, which also shows that the recipes/package binaries are available locally
  • My conan install invocation (with -vvv passed in) is contained within the logs attached.
  • There is one log for the invocation with -nr and one without (the latter contains the error output)

Logs

install_log_with_nr.txt
install_log_without_nr.txt

Related slack thread:
https://cpplang.slack.com/archives/C41CWV9HA/p1703140106967259

@spiderkeys
Author

conanfile.py doesn't contain anything exotic. Note that a good number of the deps are custom recipes within our artifactory instance.

from conan import ConanFile
from conan.tools.cmake import CMake, cmake_layout
from conan.tools.cmake import CMakeDeps
from conan.tools.files import copy

import os

class TestProjectConan(ConanFile):
    name = "test_project"
    settings    = "os", "compiler", "build_type", "arch"
    generators  = "CMakeToolchain", "CMakeDeps", "VirtualRunEnv"

    options = {
        "with_option": [True, False],
    }

    default_options = {
        "with_option": False
    }


    def layout(self):
        cmake_layout(self)

    def generate(self):
        conanlibs_path = os.path.join( self.build_folder, "thirdparty" )
        for dep in self.dependencies.values():
            for libdir in dep.cpp_info.libdirs:
                copy(self, "*.so", libdir, conanlibs_path )
                copy(self, "*.so.*", libdir, conanlibs_path )

    def requirements(self):
        self.requires("connextdds/7.2.0")
        self.requires("rtiddsgen/7.2.0")
        self.requires("cli11/2.3.2")
        self.requires("asio/1.24.0")
        self.requires("concurrentqueue/1.0.3")
        self.requires("readerwriterqueue/1.0.6")
        self.requires("nlohmann_json/3.11.2")
        self.requires("spdlog/1.11.0")
        self.requires("imgui/1.89.9-docking")
        self.requires("implot/0.16-docking")
        self.requires("glfw/3.3.8")
        self.requires("boost/1.81.0")
        self.requires("yaml-cpp/0.8.0")
        self.requires("eigen/3.4.0")
        self.requires("mavlink_v2/2.0.0-mr")
        self.requires("woodall_serial/1.2.3")
        self.requires("hfsm2/2.3.2")
        self.requires("threepp/0.0.1-mr")

        if self.options.with_option:
            self.requires( "ffmpeg/5.0.3-mr" )
            self.requires( "sdl/2.26.1" )
            self.requires( "intel-media-driver/22.6.6-dev")
            self.requires( "nvidia-vaapi-driver/0.0.9-dev" )

@memsharded memsharded added this to the 2.1 milestone Dec 23, 2023
@memsharded
Member

Hi @spiderkeys

Thanks for your report.

I have been trying to reproduce it for a while without success.
I have also been trying to understand the traceback when it fails. From it I understand that:

  • _evaluate_download() is being called, which happens when a package is not in the local cache
  • For this to be called, a previous call to cache_latest_prev = self._cache.get_latest_package_reference(node.pref) needs to return None, as if that package reference were not installed in the cache
  • But subsequent calls with -nr show that it is in the cache

One possibility to explore is that this could be related to sqlite3 DB corruption or a similar issue. Some quick questions:

  • Is the failure deterministic (does it fail at every attempt), or does it sometimes eventually work, or fail randomly?
  • When you execute it with -nr and then execute it again without it, does it fail again?
  • Does it fail on different Ubuntu machines, or is it always the same physical machine? Does it always work fine on other platforms/OSs?
  • Any special Python installation details on that Ubuntu? Specifically, anything related to Python's sqlite

If these ideas don't get us anywhere, then maybe I could propose a patch with extra traces; if you could run with those traces, we could understand a bit more about the issue.
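In the meantime, if you want to rule out the DB-corruption hypothesis directly, the cache database can be inspected with Python's sqlite3 module. A minimal sketch, assuming the default Conan 2 home location and the packages table name of the 2.0.x cache DB:

import os
import sqlite3

# assumed default location of the Conan 2 cache database
db_path = os.path.expanduser("~/.conan2/p/cache.sqlite3")
with sqlite3.connect(db_path) as conn:
    # sqlite's built-in corruption check, plus a peek at the stored package references
    print(conn.execute("PRAGMA integrity_check").fetchone())
    for row in conn.execute("SELECT * FROM packages LIMIT 5"):
        print(row)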

@spiderkeys
Author

spiderkeys commented Dec 24, 2023

Hi @memsharded,

I'll try to answer your questions first; I've also found some form of clue/culprit that changes whether the install succeeds or not.

  • The failure is pretty deterministic: if it is currently failing and no action is taken other than repeatedly running conan install, it will fail every time. I don't know what action I took to make it start working again for a little while, nor what action made it start failing again, but it was probably some combination of cleaning local cmake folders, conan caches, etc.
  • Yes, it will fail after previously executing a successful -nr run
  • Yes, it has failed for some of my other team members as well, though they are similarly on Ubuntu 22.04, and we use ansible to set up our environments to be fairly identical with respect to conan
  • I personally install python via pyenv, but my other colleague who has encountered the issue does not, and just uses the apt-installed python

Just now I tried to pare down the problem a bit. First, I found that the error can be encountered simply by running:

conan graph info ./src/conanfile.py

With -vvv passed in, it fails at the same _evaluate_download() method that you identified.
From here, I took a couple of steps to try to clean everything up:

rm -rf src/build/
conan cache clean -vvv -s -b -d -t "*"
conan remove -c "*"

# ... reconnect to internet, do a successful conan install (so all packages get downloaded again), disconnect again
conan graph info ./src/conanfile.py
# error occurs

Next, I tried to reduce the conanfile.py to bare bones to see if it might be a particular package causing the issue.

  • I commented out everything except self.requires("nlohmann_json/3.11.2")
  • I ran the conan graph command again
  • The command succeeded

From there, I gradually added self.requires() back in for each original dependency.

  • Once I got to uncommenting self.requires("boost/1.81.0"), the command failed again
  • I commented out all requires except boost, and the command failed
  • I commented out boost and uncommented nlohmann_json again and the command succeeded.

So maybe it has something to do with the boost package? I acknowledge it could still be some issue with my local conan sqlite db relating to boost, but hopefully this can provide some additional insight.

@spiderkeys
Author

One more clue: boost and sdl2 are the only recipes in my conan package graph that specify dependencies with a version range, as far as I can tell:

Resolved version ranges
    openssl/[>=1.1 <4]: openssl/3.2.0
    zlib/[>=1.2.11 <2]: zlib/1.3

I now find that either package causes the install to fail, and if they are not specified as requires (and thus nothing in my graph has a version range to evaluate), the install succeeds.
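For context, those ranges come from the dependencies' own recipes rather than from my conanfile; a recipe declaring a range looks roughly like this (a sketch, not the literal conan-center boost recipe):

def requirements(self):
    # the range is resolved to a concrete version when the graph is computed,
    # which is why it shows up under "Resolved version ranges"
    self.requires("zlib/[>=1.2.11 <2]")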

@memsharded
Member

Thanks very much for the deep investigation and insights.
I will check it again, trying to focus on the transitive version ranges; that might be a good clue.

@memsharded
Member

No success.
I think it would be good to add some extra traces to the conan execution; let's see if we get some further info (though the related code seems quite narrow, no idea yet why this is happening...)

Something like this:

diff --git a/conan/internal/cache/db/cache_database.py b/conan/internal/cache/db/cache_database.py                       
index fffa02486..e48cb135e 100644                                                                                        
--- a/conan/internal/cache/db/cache_database.py                                                                          
+++ b/conan/internal/cache/db/cache_database.py                                                                          
@@ -28,6 +28,7 @@ class CacheDatabase:                                                                                   
                                                                                                                         
     def get_latest_package_reference(self, ref):                                                                        
         prevs = self.get_package_revisions_references(ref, True)                                                        
+        ConanOutput().info(f"CacheDatabase.get_latest_package_reference {ref}: obtained {prevs}")                       
         return prevs[0] if prevs else None                                                                              
                                                                                                                         
     def update_recipe_timestamp(self, ref):                                                                             
diff --git a/conans/client/graph/graph_binaries.py b/conans/client/graph/graph_binaries.py                               
index db3a9e78c..47a0e3123 100644                                                                                        
--- a/conans/client/graph/graph_binaries.py                                                                              
+++ b/conans/client/graph/graph_binaries.py                                                                              
@@ -197,12 +197,17 @@ class GraphBinariesAnalyzer(object):                                                               
             return                                                                                                      
                                                                                                                         
         # Obtain the cache_latest valid one, cleaning things if dirty                                                   
+        node.conanfile.output.info(f"Checking if {node} is in the cache")                                               
         while True:                                                                                                     
+            node.conanfile.output.info(f"Checking DB for {node.pref}")                                                  
             cache_latest_prev = self._cache.get_latest_package_reference(node.pref)                                     
+            node.conanfile.output.info(f"Obtained latest_prev for {node.pref}: {cache_latest_prev}")                    
             if cache_latest_prev is None:                                                                               
                 break                                                                                                   
             package_layout = self._cache.pkg_layout(cache_latest_prev)                                                  
+            node.conanfile.output.info(f"Checking if {node.pref} is dirty {cache_latest_prev}")                         
             if not self._evaluate_clean_pkg_folder_dirty(node, package_layout):                                         
+                node.conanfile.output.info(f"Checked {node.pref} not dirty")                                            
                 break                                                                                                   
                                                                                                                         
         if node.conanfile.upload_policy == "skip":                                                                      
@@ -215,8 +220,10 @@ class GraphBinariesAnalyzer(object):                                                                
             else:                                                                                                       
                 node.binary = BINARY_MISSING                                                                            
         elif cache_latest_prev is None:  # This binary does NOT exist in the cache                                      
+            node.conanfile.output.info(f"Cache latest_prev is None for {node.pref}, evaluate download")                 
             self._evaluate_download(node, remotes, update)                                                              
         else:  # This binary already exists in the cache, maybe can be updated                                          
+            node.conanfile.output.info(f"Cache latest_prev is not None for {node.pref}, checking if in cache")          
             self._evaluate_in_cache(cache_latest_prev, node, remotes, update)                                           

Do you think it would be possible to add those messages to your code and run again? Would you like a patch file, or a branch in the repo to run from source (pip install -e . is usually convenient)? Let me know, I am really puzzled now 😅 I need to know what is happening!

@spiderkeys
Author

A branch to pip install would be good!

@spiderkeys
Author

I manually made the logging changes above locally, and this is what I get for a conanfile.py that specifies SDL as a requirement, first without -nr and then with -nr.

Without -nr (command fails):

conan graph info ./src/conanfile.py

======== Computing dependency graph ========
Graph root
    conanfile.py (nadir_os/None): /home/spiderkeys/z/workspace/src/conanfile.py
Requirements
    libiconv/1.17#80da49fca2bee7160307d39a19feff20 - Cache
    libunwind/1.7.2#ff9d8d1b58dc3088f08815185905550f - Cache
    sdl/2.26.1#af76f51b1fadaffaa2fad915d90e9612 - Cache
    xz_utils/5.4.5#56e3eb071afe0b5021eaff18b9a6c193 - Cache
    zlib/1.3#a422e59075096a9f747cf7566fade8f3 - Cache
Build requirements
    meson/1.2.2#cbb4b8d147762c87b0f93927d8027136 - Cache
    ninja/1.11.1#de85db1490cc53d3353a536dadf1db50 - Cache
    pkgconf/2.1.0#9e71750d481accfc87b678b470a32956 - Cache
Resolved version ranges
    zlib/[>=1.2.11 <2]: zlib/1.3

======== Computing necessary packages ========
libiconv/1.17: Checking if libiconv/1.17 is in the cache
libiconv/1.17: Checking DB for libiconv/1.17:b647c43bfefae3f830561ca202b6cfd935b56205
CacheDatabase.get_latest_package_reference libiconv/1.17:b647c43bfefae3f830561ca202b6cfd935b56205: obtained [libiconv/1.17#80da49fca2bee7160307d39a19feff20:b647c43bfefae3f830561ca202b6cfd935b56205#fbd617c1c877b4e09c958d977c165986%1702487512.493]
libiconv/1.17: Obtained latest_prev for libiconv/1.17:b647c43bfefae3f830561ca202b6cfd935b56205: libiconv/1.17:b647c43bfefae3f830561ca202b6cfd935b56205
libiconv/1.17: Checking if libiconv/1.17:b647c43bfefae3f830561ca202b6cfd935b56205 is dirty libiconv/1.17:b647c43bfefae3f830561ca202b6cfd935b56205
libiconv/1.17: Checked libiconv/1.17:b647c43bfefae3f830561ca202b6cfd935b56205 not dirty
libiconv/1.17: Cache latest_prev is not None for libiconv/1.17:b647c43bfefae3f830561ca202b6cfd935b56205, checking if in cache
ninja/1.11.1: Checking if ninja/1.11.1 is in the cache
ninja/1.11.1: Checking DB for ninja/1.11.1:3593751651824fb813502c69c971267624ced41a
CacheDatabase.get_latest_package_reference ninja/1.11.1:3593751651824fb813502c69c971267624ced41a: obtained []
ninja/1.11.1: Obtained latest_prev for ninja/1.11.1:3593751651824fb813502c69c971267624ced41a: None
ninja/1.11.1: Cache latest_prev is None for ninja/1.11.1:3593751651824fb813502c69c971267624ced41a, evaluate download
ERROR: HTTPSConnectionPool(host='ourprivateartifactory.jfrog.io', port=443): Max retries exceeded with url: /artifactory/api/conan/private-conan/v1/ping (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f5c92df6860>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution'))

Unable to connect to private-conan=https://ourprivateartifactory.jfrog.io/artifactory/api/conan/private-conan
1. Make sure the remote is reachable or,
2. Disable it by using conan remote disable,
Then try again.

Output relating to ninja with -nr (the command succeeds):

libiconv/1.17: Cache latest_prev is not None for libiconv/1.17:b647c43bfefae3f830561ca202b6cfd935b56205, checking if in cache
ninja/1.11.1: Checking if ninja/1.11.1 is in the cache
ninja/1.11.1: Checking DB for ninja/1.11.1:3593751651824fb813502c69c971267624ced41a
CacheDatabase.get_latest_package_reference ninja/1.11.1:3593751651824fb813502c69c971267624ced41a: obtained []
ninja/1.11.1: Obtained latest_prev for ninja/1.11.1:3593751651824fb813502c69c971267624ced41a: None
ninja/1.11.1: Cache latest_prev is None for ninja/1.11.1:3593751651824fb813502c69c971267624ced41a, evaluate download
ninja/1.11.1: Compatible package ID 3593751651824fb813502c69c971267624ced41a equal to the default package ID: Skipping it.
xz_utils/5.4.5: Checking if xz_utils/5.4.5 is in the cache

@spiderkeys
Author

Ok, so digging deeper, here is the code path that I see being triggered differently when offline:

The various pieces of code that I added more instrumentation to:

    def _process_node(self, node, build_mode, remotes, update):
        # ...
        if node.conanfile.upload_policy == "skip":
            # Download/update shouldn't be checked in the servers if this is "skip-upload"
            # The binary can only be in cache or missing.
            if cache_latest_prev:
                conanfile.output.info(f"process F")
                node.binary = BINARY_CACHE
                node.prev = cache_latest_prev.revision
            else:
                conanfile.output.info(f"process G")
                node.binary = BINARY_MISSING
        elif cache_latest_prev is None:  # This binary does NOT exist in the cache
            conanfile.output.info(f"process H")
            node.conanfile.output.info(f"Cache latest_prev is None for {node.pref}, evaluate download")
            self._evaluate_download(node, remotes, update)
        else:  # This binary already exists in the cache, maybe can be updated
            conanfile.output.info(f"process I")
            node.conanfile.output.info(f"Cache latest_prev is not None for {node.pref}, checking if in cache")
            self._evaluate_in_cache(cache_latest_prev, node, remotes, update)

        # The INVALID should only prevail if a compatible package, due to removal of
        # settings in package_id() was not found
        if node.binary in (BINARY_MISSING, BINARY_BUILD):
            conanfile.output.info(f"process J")
            if node.conanfile.info.invalid and node.conanfile.info.invalid[0] == BINARY_INVALID:
                conanfile.output.info(f"process K")
                node.binary = BINARY_INVALID
    
    # ...

    def _evaluate_download(self, node, remotes, update):
        output = node.conanfile.output
        try:
            output.info("Download A")
            self._get_package_from_remotes(node, remotes, update)
            output.info("Download ~A")
        except NotFoundException:
            output.info("Download B")
            node.binary = BINARY_MISSING
        else:
            output.info("Download C")
            node.binary = BINARY_DOWNLOAD
        output.info("Download D")
    
    # ...

    # check through all the selected remotes:
    # - if not --update: get the first package found
    # - if --update: get the latest remote searching in all of them
    def _get_package_from_remotes(self, node, remotes, update):
        results = []
        pref = node.pref
        for r in remotes:
            try:
                info = node.conanfile.info
                latest_pref = self._remote_manager.get_latest_package_reference(pref, r, info)
                results.append({'pref': latest_pref, 'remote': r})
                if len(results) > 0 and not update:
                    break
            except NotFoundException:
                pass

        if not remotes and update:
            node.conanfile.output.warning("Can't update, there are no remotes defined")

        if len(results) > 0:
            remotes_results = sorted(results, key=lambda k: k['pref'].timestamp, reverse=True)
            result = remotes_results[0]
            node.prev = result.get("pref").revision
            node.pref_timestamp = result.get("pref").timestamp
            node.binary_remote = result.get('remote')
        else:
            node.binary_remote = None
            node.prev = None
            raise PackageNotFoundException(pref)

With -nr, I get:

ninja/1.11.1: Eval node
ninja/1.11.1: Checking if ninja/1.11.1 is in the cache
ninja/1.11.1: Checking DB for ninja/1.11.1:3593751651824fb813502c69c971267624ced41a
CacheDatabase.get_latest_package_reference ninja/1.11.1:3593751651824fb813502c69c971267624ced41a: obtained []
ninja/1.11.1: Obtained latest_prev for ninja/1.11.1:3593751651824fb813502c69c971267624ced41a: None
ninja/1.11.1: process H
ninja/1.11.1: Cache latest_prev is None for ninja/1.11.1:3593751651824fb813502c69c971267624ced41a, evaluate download
ninja/1.11.1: Download A
ninja/1.11.1: Download B
ninja/1.11.1: Download D
ninja/1.11.1: process J
ninja/1.11.1: post process A
ninja/1.11.1: Compatible package ID 3593751651824fb813502c69c971267624ced41a equal to the default package ID: Skipping it.

Without -nr I get:

ninja/1.11.1: Eval node
ninja/1.11.1: Checking if ninja/1.11.1 is in the cache
ninja/1.11.1: Checking DB for ninja/1.11.1:3593751651824fb813502c69c971267624ced41a
CacheDatabase.get_latest_package_reference ninja/1.11.1:3593751651824fb813502c69c971267624ced41a: obtained []
ninja/1.11.1: Obtained latest_prev for ninja/1.11.1:3593751651824fb813502c69c971267624ced41a: None
ninja/1.11.1: process H
ninja/1.11.1: Cache latest_prev is None for ninja/1.11.1:3593751651824fb813502c69c971267624ced41a, evaluate download
ninja/1.11.1: Download A
ERROR: HTTPSConnectionPool

From the above, it looks to me like what is happening is that this line throws an unhandled exception:

latest_pref = self._remote_manager.get_latest_package_reference(pref, r, info)

With -nr passed, the remotes list is empty, so this line is never reached, and we go on to happily use whatever compatible package is currently in the cache.
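The pattern is easy to demonstrate in isolation; here is a self-contained sketch (the exception classes are stand-ins for the Conan ones):

class NotFoundException(Exception):
    pass

class ConanConnectionError(Exception):
    pass

def get_latest_package_reference(pref, remote):
    # offline: name resolution fails before any HTTP response exists
    raise ConanConnectionError("Temporary failure in name resolution")

def get_package_from_remotes(pref, remotes):
    for r in remotes:
        try:
            return get_latest_package_reference(pref, r)
        except NotFoundException:
            pass  # only "not found" is handled; a connection error propagates
    return None  # with -nr the remotes list is empty, so no lookup ever happens

get_package_from_remotes("ninja/1.11.1", ["private-conan"])  # raises ConanConnectionError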

@spiderkeys
Author

If I change the exception handling to no-op on any exception, instead of just NotFoundException, the offline issue goes away:

    def _get_package_from_remotes(self, node, remotes, update):
        results = []
        pref = node.pref
        for r in remotes:
            try:
                info = node.conanfile.info
                latest_pref = self._remote_manager.get_latest_package_reference(pref, r, info)
                results.append({'pref': latest_pref, 'remote': r})
                if len(results) > 0 and not update:
                    break
            except:
                pass

Of course, this may not be the desired behavior. Maybe there is some group of connection-related exceptions that could be caught here explicitly.

@spiderkeys
Author

spiderkeys commented Dec 26, 2023

This works and seems like perhaps the better way to check for the connection-specific error:

            # assumes: from conans.errors import ConanConnectionError, NotFoundException
            try:
                info = node.conanfile.info
                latest_pref = self._remote_manager.get_latest_package_reference(pref, r, info)
                results.append({'pref': latest_pref, 'remote': r})
                if len(results) > 0 and not update:
                    break
            except ConanConnectionError:
                node.conanfile.output.warning("Failed to connect to remote - will evaluate packages in local cache")
            except NotFoundException:
                pass

@memsharded
Member

Ok, thanks very much for your detailed feedback and investigation again.
I think I am starting to understand what is happening.

The key is the line

ninja/1.11.1: Cache latest_prev is None for ninja/1.11.1:3593751651824fb813502c69c971267624ced41a, evaluate download

It appears in both cases: the successful -nr one and the failing one. This gives a hint in a completely different direction from the one I was considering before.

I'll try to summarize:

  • The ninja requirement is a tool-require, only really needed for building some packages from source
  • The ninja package binary is not installed in your cache
  • The computation of the package binaries for installation is as follows, assuming that we start with a full dependency graph (recipes and versions); see the toy sketch after this list:
    • The first step in the computation of existing packages is checking which binaries exist, with which revisions and package-ids. This computation happens from the leaves (packages without dependencies) to the root (the user conanfile), because the package-ids of dependencies can in turn affect the package-ids of their consumers.
    • The ninja binary is not in the cache, so Conan checks for it in the servers. Depending on the ninja binary found, it is possible to compute different package-ids (for example if binary compatibility is triggered). The resulting ninja package_id could be used by other consumers that decide to depend on the specific ninja binary.
    • When all the graph binaries and package-ids have been computed, depending also on the --build arguments, Conan can decide that it doesn't really need some binaries (marking them as Skip), and avoid actually downloading them.
    • It is impossible to know this in advance in the general case; it is a chicken-and-egg problem: it is necessary to know what needs to be built to know what can be skipped, and for that it is first necessary to compute the package_ids that will be used.
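As a toy model of that evaluation order (purely illustrative, not the actual Conan code):

# toy dependency graph: consumer -> dependencies, and the binaries present locally
graph = {"app": ["math"], "math": ["tool"], "tool": []}
cache = {"app", "math"}  # the "tool" binary was skipped earlier and never downloaded

def leaves_first(graph):
    # post-order traversal: dependencies are evaluated before their consumers
    seen, order = set(), []
    def visit(node):
        if node in seen:
            return
        seen.add(node)
        for dep in graph[node]:
            visit(dep)
        order.append(node)
    for node in graph:
        visit(node)
    return order

for node in leaves_first(graph):
    if node in cache:
        print(f"{node}: Cache")
    else:
        # this is the point where Conan asks the remotes for the binary, and
        # where an unreachable server turns into the reported error
        print(f"{node}: not in cache -> query remotes")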

So I am now inclined to think that this is expected behavior. If you force a download of the ninja binary to your cache, you will not see this error message again; it will work both with -nr and without it, and Conan will not even try to reach any remote.

Or, from another perspective: if something changes in some of your local recipes and that changes the build_mode of its dependencies, including ninja, then conan install --build=missing will fail to work offline, because ninja will not be in the cache. In theory it is the same case as above: the necessary dependency resources are not fully there.

Please let me know if this explanation helps a bit to understand the current behavior.

I will reproduce the scenario in my tests. Skipping all remote exceptions is not a possibility; hiding connection problems with the remotes is something that needs to be avoided, because it can silently produce confusing behavior (for example, after uploading a new version to the server, CI jobs not picking it up but not complaining either, even if the remote URL was incorrect or there was some network error).

@memsharded
Member

This test reproduces it:

# imports as used inside Conan's own test suite (paths as of the 2.0.x source tree)
import re

from conans.test.assets.genconanfile import GenConanfile
from conans.test.utils.tools import TestClient


def test_info_not_hit_server2():
    """
    https://github.com/conan-io/conan/issues/15339
    """
    c = TestClient(default_server_user=True)
    c.save({"tool/conanfile.py": GenConanfile("tool", "0.1"),
            "math/conanfile.py": GenConanfile("math", "0.1").with_tool_requires("tool/0.1"),
            "app/conanfile.py": GenConanfile("app", "0.1").with_requires("math/0.1")})
    c.run("create tool")
    c.run("create math")
    c.run("install app")
    c.run("upload * -r=default -c")
    c.run("remove * -c")
    c.run("install app")
    assert "Downloaded" in c.out
    c.run("cache clean -s -b -d -t *")
    # break the server to make sure it is not being contacted at all
    c.servers["default"] = None
    c.run("graph info app", assert_error=True)
    assert "ERROR: 'NoneType' object has no attribute 'fake_url'. [Remote: default]" in c.out

    c.run("graph info app -nr")
    assert re.search(r"Skipped binaries(\s*)tool/0.1", c.out)

@spiderkeys
Author

spiderkeys commented Dec 27, 2023

Thanks for the detailed description of what is happening. This makes some sense to me, in terms of why it is happening, from a mechanical perspective.

That said, I'm still a little confused about the current design being expected/desirable from a user experience perspective.

  • When all the graph binaries and package-ids have been computed, depending also on the --build arguments, Conan can decide that it doesn't really need some binaries (marking them as Skip), and avoid actually downloading them.
  • It is impossible to know this in advance in the general case; it is a chicken-and-egg problem: it is necessary to know what needs to be built to know what can be skipped, and for that it is first necessary to compute the package_ids that will be used.

While I think I generally follow that in this case ninja is a tool_requires and its binary isn't present in the cache because it was determined not to be needed (because packages were downloaded rather than built), how is it the case that the install can proceed with --build=missing still set, in combination with -nr? I don't think -nr provides any additional information to solve the chicken-and-egg problem, compared with a run that did not pass -nr. The only difference is that a remote is not reachable, so no new information is gained. It seems like a potentially common scenario: all dependencies can be downloaded from CCI or a user's private artifactory without requiring a build, the user then goes offline, and this issue arises (assuming any dependency in their graph specifies a tool_requires()).

In this scenario, why is an install that succeeds with -nr (which determines that a binary for ninja is not necessary because no local packages need to be rebuilt with it) any more or less correct than an install that succeeds while ignoring remote connectivity issues for a given remote? I guess another way to phrase this is: why should the -nr run ever succeed, if we say it is necessary to successfully reach a remote to correctly compute the complete dependency graph? Note that my default intuition/expectation about the situation is the complete reverse of the previous question: I would expect that, without any local changes to the cache/db or recipe/options, any install/graph info query previously run with a successful outcome could be run again offline with the same conclusion, since nothing has changed.

An explanation of the above notwithstanding, for our user experience I think I would prefer not to have situations where tool_requires() package builds/downloads are ever skipped; they should always be ensured to be present in the host's cache, if only to avoid this specific offline build error. Your suggestion to force a download of the tool package seems like the best way to achieve this. Is there a way to do this within the invocation of conan install, assuming the user is online and can thus build/download the tool package during a successful exchange with the remote?

Because these tool_requires are transitive, it would be nice if there were a blanket method of making sure they make it into the local cache, rather than manually hunting down and installing/building each one.

@memsharded
Member

While I think I generally follow that in this case ninja is a tool_requires and its binary isn't present in the cache because it was determined not to be needed (because packages were downloaded rather than built), how is it the case that the install can proceed with --build=missing still set, in combination with -nr? I don't think -nr provides any additional information to solve the chicken-and-egg problem, compared with a run that did not pass -nr. The only difference is that a remote is not reachable, so no new information is gained.

The behavior is different: if the package_id binary cannot be found in any remote, Conan will proceed with the current value, as if it existed, and if that value can finally be skipped, then good. But if the remote is available and not disabled, Conan will look for the existence of that package_id, because it can compute things better if it knows whether it is available, for example by falling back to some other compatible binary, which in turn can affect the resulting binaries of the consumers.

Because these tool_requires are transitive, it would be nice if there were a blanket method of making sure they make it into the local cache, rather than manually hunting down and installing/building each one.

There is already a conf for that: defining tools.graph:skip_binaries=False will force the download of the binaries.

We didn't select this as the default because many users were complaining about the extra transfer time and storage (and costs), so the current Conan default is to try to avoid transfers when possible.
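For reference, the conf can be passed per command or set persistently in the global.conf file of the Conan home (typically ~/.conan2/global.conf):

# per command:
conan install . -c tools.graph:skip_binaries=False

# or persistently, one line in ~/.conan2/global.conf:
tools.graph:skip_binaries=False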

@memsharded
Member

An explanation of the above notwithstanding, for our user experience I think I would prefer not to have situations where tool_requires() package builds/downloads are ever skipped; they should always be ensured to be present in the host's cache, if only to avoid this specific offline build error.

In any case, the intended and designed Conan logic is that "offline remotes" must be managed explicitly, not implicitly; -nr is a convenience to avoid temporarily disabling remotes, not a general strategy. If at some point some remotes are offline, the recommended approach is to explicitly inform Conan with conan remote disable. Otherwise, remotes are expected to be available and connected, and not being able to reach them is considered an error, not something to be silently skipped.

@spiderkeys
Author

spiderkeys commented Dec 27, 2023

Thanks, glad to know that there is an existing setting for it; I think it ultimately addresses what I want. I was typing out the response below, but I think the goal is achieved by having users locally configure their development machines with tools.graph:skip_binaries=False while configuring CI with True, so that the bandwidth efficiencies can still be realized.


I guess one more way to express our need as a user is that we want to be able to run the same build script invocation without the user needing to manually specify whether they are offline or not.

In reference to your description of the graph computation process (numbered for reference):

  • The computation of the package binaries for installation is as follows, assuming that we start with a full dependency graph (recipes and versions):
    1. The first step in the computation of existing packages is checking which binaries exist, with which revisions and package-ids. This computation happens from the leaves (packages without dependencies) to the root (the user conanfile), because the package-ids of dependencies can in turn affect the package-ids of their consumers.
    2. The ninja binary is not in the cache, so Conan checks for it in the servers. Depending on the ninja binary found, it is possible to compute different package-ids (for example if binary compatibility is triggered). The resulting ninja package_id could be used by other consumers that decide to depend on the specific ninja binary.
    3. When all the graph binaries and package-ids have been computed, depending also on the --build arguments, Conan can decide that it doesn't really need some binaries (marking them as Skip), and avoid actually downloading them.
    4. It is impossible to know this in advance in the general case; it is a chicken-and-egg problem: it is necessary to know what needs to be built to know what can be skipped, and for that it is first necessary to compute the package_ids that will be used.

It feels to me like there should be some way (an argument, config setting, etc.) to specify that, in step 2, if the remotes are not reachable, Conan should behave the same way as if -nr was passed and check whether what is available locally is enough to satisfy the dependency requirements. That said, if the remotes are reachable, they should be consulted, to appropriately accommodate whatever step 3 decides needs to be done from a --build=x perspective. This would be more of a best-effort approach: the remotes will be consulted if they can be, but if not, the computation can proceed as if there were no remotes. I would probably never use this proposed argument on CI, where I would prefer that it strictly access and evaluate the latest info in the remote repo, but I would use it in our local build script, to ensure that a run that succeeded in the past will succeed again after simply going offline with no other changes.

@memsharded
Member

It feels to me like there should be some way (an argument, config setting, etc.) to specify that, in step 2, if the remotes are not reachable, Conan should behave the same way as if -nr was passed and check whether what is available locally is enough to satisfy the dependency requirements. That said, if the remotes are reachable, they should be consulted, to appropriately accommodate whatever step 3 decides needs to be done from a --build=x perspective. This would be more of a best-effort approach: the remotes will be consulted if they can be, but if not, the computation can proceed as if there were no remotes. I would probably never use this proposed argument on CI, where I would prefer that it strictly access and evaluate the latest info in the remote repo, but I would use it in our local build script, to ensure that a run that succeeded in the past will succeed again after simply going offline with no other changes.

That could make sense, it is good feedback, thanks. I was actually thinking of ways to improve this UX, and this could be a possibility (I am still not fully discarding other automated flows for this).

@spiderkeys
Author

👍 Glad to help, and thanks again for helping me understand what was going on. Feel free to close this or leave it open as you see fit.

@memsharded
Member

I am experimenting in #15516 with the behavior that you suggest above, and considering the possible risks. It is not guaranteed to move forward; I am just trying things at this moment.

@memsharded
Member

In the end, #15516 added some clarifying error messages, including the -nr recommendation, but doing it automatically and not erroring had more risks than benefits.

@realbogart

@memsharded We just had the same issue now when the internet connection went down at the company. We don't want to disable the remote using -nr, since it should be required when packages are missing. But whenever everything is cached, we want our developers to be able to work without a connection. Is this possible?

@memsharded
Member

Hi @realbogart

@memsharded We just had the same issue now when the internet connection went down at the company. We don't want to disable the remote using -nr, since it should be required when packages are missing. But whenever everything is cached, we want our developers to be able to work without a connection. Is this possible?

That is the problem. Hiding or silencing connectivity problems and failing servers cannot be done in the general case; it is risky, and users could end up with obsolete builds and the like in their CI without noticing, just because Conan did not raise errors when a remote was unavailable.

For that reason, when the internet is down or something like that, the alternatives are opt-in: using -nr, or conan remote disable. These alternatives should be easy for developers if the internet is down, and they are shown in the error message.
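For example (the "*" pattern matches all configured remotes):

conan remote disable "*"   # going offline deliberately
conan remote enable "*"    # back online again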

Furthermore, it is important to highlight that when the packages are in the Conan cache, Conan will not even try to reach the servers, so Conan can work fully offline without -nr or disabling the remotes, as long as the packages are installed in the cache. It is true that Conan will not always download all binaries if they are not really necessary, to avoid large transfers. You can force downloading everything with -c tools.graph:skip_binaries=False.

@realbogart

realbogart commented Feb 20, 2024

Hi @memsharded,

Thank you for the quick reply.

Furthermore, it is important to highlight that when the packages are in the Conan cache, Conan will not even try to reach the servers, so Conan can work fully offline without -nr or disabling the remotes, as long as the packages are installed in the cache.

This is exactly what is not working for us. This is the command that we are trying to run:
conan install . -pr:a linux-clang-14-debug-x86_64

We also have all of the dependencies locked in conan.lock, which gets picked up automatically.

All packages are in the cache, and it works with the network enabled. We also pass no -r or -u that would make it access the remote.

It fails like this:

ERROR: HTTPSConnectionPool(host='<hidden>', port=443): Max retries exceeded with url: /artifactory/api/conan/conan-local/v1/ping (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f4ddba158a0>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution'))

Unable to connect to artifactory-local=<hidden>/artifactory/api/conan/conan-local
1. Make sure the remote is reachable or,
2. Disable it by using conan remote disable,
Then try again.

(I hid our URL in the message above using <hidden>)

We are currently on Conan version 2.0.16.

@memsharded
Member

memsharded commented Feb 20, 2024

This is exactly what is not working for us. This is the command that we are trying to run:

The typical situation is that you don't really have all the necessary packages installed in the cache.

Could you please run that command first with -c tools.graph:skip_binaries=False, then turn off the network and try again? If it still fails trying to connect to the network, that would need a separate new ticket, and we will have a look, because this shouldn't happen.

All packages are in the cache and it works with the network enabled. We also have no -r or -u that would make it access the remote.

If -r is not specified, Conan uses the defined and enabled remotes by default. So it will still try to connect to them if some package binary is not in the cache but is still needed while computing the dependency graph, even if it is later marked as skip.

@realbogart

Hi again, @memsharded.

I tried adding the argument you provided to my install command. This is the full command:
conan install . -pr:a linux-clang-14-debug-x86_64 -c tools.graph:skip_binaries=False

Everything works with the network on. If we disable it, it fails with the same error message as before.

@memsharded
Member

Everything works with the network on. If we disable it, it fails with the same error message as before.

Ok, then let's open a new ticket; this sounds like something different. Could you please create it?

  • Make sure that it is reported against the latest 2.1
  • Try to provide reproducible steps, like starting from a blank cache, configuring remotes, doing the installation of packages, etc.
  • Please provide the necessary files: conanfiles, profiles, etc. If it is possible to make it self-contained, e.g. creating packages from conan new templates, that helps a lot.

Thanks very much.

@realbogart

realbogart commented Feb 22, 2024

@memsharded While creating a smaller reproducible example, I managed to boil it down to a one-liner that fails with the network disabled:

conan install --requires libcurl/7.87.0@#3fa5ead82de3d84eb3bf43078a69c69b -pr:a <some_profile> -c tools.graph:skip_binaries=False -vvv

I created a ticket here:
#15739
