
Discrepancy between argopy.status() and status page #168

Closed
matdever opened this issue Jan 19, 2022 · 13 comments
Labels
invalid This doesn't seem right

Comments

@matdever

There seems to be a discrepancy between the status returned by the command argopy.status() and the status website https://argopy.statuspage.io/

MCVE Code Sample

import argopy
argopy.status()

Expected Output

Problem Description

The status webpage says all systems are operational, but the status returned by argopy is "offline". When I try to load data using the data fetcher, it returns a "timeout" error too.

---------------------------------------------------------------------------
TimeoutError                              Traceback (most recent call last)
<ipython-input-13-0d90945364fd> in <module>
----> 1 argo = ArgoDataFetcher().profile(WMO,1).to_xarray()

Versions

Output of `argopy.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.7.10 (default, Feb 26 2021, 10:16:00)
[Clang 10.0.0 ]
python-bits: 64
OS: Darwin
OS-release: 20.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_CA.UTF-8
LOCALE: en_CA.UTF-8
libhdf5: 1.10.6
libnetcdf: 4.6.1

argopy: 0.1.8
xarray: 0.18.2
scipy: 1.6.1
sklearn: 0.22.1
netCDF4: 1.5.7
dask: 2021.03.0
toolz: 0.11.1
erddapy: 1.1.1
fsspec: 0.8.3
gsw: 3.4.0
aiohttp: 3.7.4.post0
bottleneck: 1.3.2
cartopy: None
cftime: 1.5.1.1
conda: 4.11.0
distributed: 2021.03.0
IPython: 7.21.0
iris: None
matplotlib: 3.3.4
nc_time_axis: None
numpy: 1.19.2
pandas: 1.2.3
packaging: 20.9
pip: 21.2.2
PseudoNetCDF: None
pytest: 6.2.2
seaborn: 0.11.1
setuptools: 52.0.0.post20210125
sphinx: 3.5.2
zarr: None

@gmaze gmaze added the invalid This doesn't seem right label Jan 19, 2022
@gmaze
Member

gmaze commented Jan 19, 2022

It's hard to tell what's going on.

The status check behind the webpage is triggered every 5 minutes, but considering the delays for the monitoring tool to run and for the information to propagate through all online systems, you may see a lag between your own argopy.status() and the webpage.

I would say that the webpage reports the API status with about a 5-10 minute lag.
The reliable source of information is argopy.status(), since it refreshes every second by default and checks the API status directly.

We should expect discrepancies between the webpage and your argopy.status() when API downtimes last no longer than about 5 minutes, which I guess can happen regularly.
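To smooth over such short flaps on the client side, one could poll the status a few times over a window before concluding the API is down. A minimal sketch, not an argopy feature: the `check` callable is a hypothetical stand-in that could wrap `argopy.status()` in practice, and `sleep` is injectable for testing.

```python
import time

def status_over_window(check, attempts=5, interval=1.0, sleep=time.sleep):
    """Poll `check()` several times, spaced by `interval` seconds, and
    report 'online' if any attempt succeeds, so that transient flaps
    shorter than the window are not mistaken for real downtime."""
    results = []
    for i in range(attempts):
        results.append(bool(check()))
        if i < attempts - 1:
            sleep(interval)
    return "online" if any(results) else "offline"

# Example with a flaky checker that fails twice, then recovers:
flaky = iter([False, False, True, True, True])
print(status_over_window(lambda: next(flaky), sleep=lambda s: None))  # → online
```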

@matdever
Author

Thanks @gmaze. I just cannot figure out what is going on. It has been happening for days. I tried a bunch of different things (e.g. restarting the computer, the kernel, etc.) but I always get the same result.

The only solution I have found for now is to rsync the whole argo repository and rely on the localftp option... but it is not ideal given the size of the argo repository (that first sync is LONG!)

@gmaze
Member

gmaze commented Jan 20, 2022

This is indeed not satisfactory at all; we shall find a solution for a better experience.

Do you mean that you can't actually fetch data from the default 'erddap' data source?

@matdever
Author

Yes, I have not been able to fetch data since the erddap status went offline. And I cannot use argovis, as I'm looking for RBR CTD data (which is not flagged "1" yet).

But if I'm the only one experiencing this issue, then I know to troubleshoot locally. I'll investigate some more on what could be wrong.

@gmaze
Member

gmaze commented Jan 20, 2022

Are you sure you're not behind some firewall or proxy?

In this case, you may want to set the trust_env argopy option to True, in order to let our file system use your local environment variable settings to connect to the internet:

argopy.set_options(trust_env=True)

This will let argopy get proxy information from the HTTP_PROXY / HTTPS_PROXY environment variables. It can also get proxy credentials from a ~/.netrc file if present.
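For reference, the same environment-variable discovery can be observed with the standard library alone: `urllib.request.getproxies()` reads the same HTTP_PROXY / HTTPS_PROXY variables that aiohttp consults when trust_env is enabled. The proxy address below is of course hypothetical.

```python
import os
import urllib.request

# Hypothetical proxy address, standing in for whatever your network uses;
# normally this variable would be set by your OS or network administrator:
os.environ["HTTPS_PROXY"] = "http://proxy.example.com:8080"

# The standard library discovers proxies from these environment variables:
proxies = urllib.request.getproxies()
print(proxies["https"])  # → http://proxy.example.com:8080
```

If this prints a proxy you did not expect, that proxy is likely what the fetcher is (or should be) going through.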

@gmaze
Member

gmaze commented Jan 20, 2022

you may want to try:

import argopy
with argopy.set_options(trust_env=True):
    fs = argopy.stores.httpstore()
    uri = "https://github.com/euroargodev/argopy-data/raw/master/ftp/dac/csiro/5900865/5900865_prof.nc"
    ds = fs.open_dataset(uri)

@matdever
Author

Thank you @gmaze for taking the time to help.

The snippet of code you provided works well for me; I get all the data loaded. This code, on the other hand, returns the timeout error.

import argopy
from argopy import DataFetcher as ArgoDataFetcher
argopy.set_options(mode='expert')
argopy.set_options(src='erddap')
argopy.set_options(trust_env=True)
with argopy.set_options(trust_env=True):
    fs = argopy.stores.httpstore()
    uri = "https://github.com/euroargodev/argopy-data/raw/master/ftp/dac/csiro/5900865/5900865_prof.nc"
    ds = fs.open_dataset(uri)
    
argo = ArgoDataFetcher().profile(6903754,1).to_xarray()
---------------------------------------------------------------------------
TimeoutError                              Traceback (most recent call last)
<ipython-input-8-71dd2dafda3d> in <module>
      9     ds = fs.open_dataset(uri)
     10 
---> 11 argo = ArgoDataFetcher().profile(6903754,1).to_xarray()
     12 argo

/opt/anaconda3/lib/python3.7/site-packages/argopy/fetchers.py in to_xarray(self, **kwargs)
    373                 % ",".join(self.Fetchers.keys())
    374             )
--> 375         xds = self.fetcher.to_xarray(**kwargs)
    376         xds = self.postproccessor(xds)
    377         return xds

/opt/anaconda3/lib/python3.7/site-packages/argopy/data_fetchers/erddap_data.py in to_xarray(self, errors)
    417         if not self.parallel:
    418             if len(self.uri) == 1:
--> 419                 ds = self.fs.open_dataset(self.uri[0])
    420             else:
    421                 ds = self.fs.open_mfdataset(

/opt/anaconda3/lib/python3.7/site-packages/argopy/stores/filesystems.py in open_dataset(self, url, *args, **kwargs)
    385         # with self.fs.open(url) as of:
    386         #     ds = xr.open_dataset(of, *args, **kwargs)
--> 387         data = self.fs.cat_file(url)
    388         ds = xr.open_dataset(data, *args, **kwargs)
    389         if "source" not in ds.encoding:

/opt/anaconda3/lib/python3.7/site-packages/fsspec/asyn.py in wrapper(*args, **kwargs)
    116     def wrapper(*args, **kwargs):
    117         self = obj or args[0]
--> 118         return maybe_sync(func, self, *args, **kwargs)
    119 
    120     return wrapper

/opt/anaconda3/lib/python3.7/site-packages/fsspec/asyn.py in maybe_sync(func, self, *args, **kwargs)
     95         if inspect.iscoroutinefunction(func):
     96             # run the awaitable on the loop
---> 97             return sync(loop, func, *args, **kwargs)
     98         else:
     99             # just call the blocking function

/opt/anaconda3/lib/python3.7/site-packages/fsspec/asyn.py in sync(loop, func, callback_timeout, *args, **kwargs)
     66     if error[0]:
     67         typ, exc, tb = error[0]
---> 68         raise exc.with_traceback(tb)
     69     else:
     70         return result[0]

/opt/anaconda3/lib/python3.7/site-packages/fsspec/asyn.py in f()
     50             if callback_timeout is not None:
     51                 future = asyncio.wait_for(future, callback_timeout)
---> 52             result[0] = await future
     53         except Exception:
     54             error[0] = sys.exc_info()

/opt/anaconda3/lib/python3.7/site-packages/fsspec/implementations/http.py in _cat_file(self, url, **kwargs)
    150         kw.update(kwargs)
    151         logger.debug(url)
--> 152         async with self.session.get(url, **kw) as r:
    153             if r.status == 404:
    154                 raise FileNotFoundError(url)

/opt/anaconda3/lib/python3.7/site-packages/aiohttp/client.py in __aenter__(self)
   1115 
   1116     async def __aenter__(self) -> _RetType:
-> 1117         self._resp = await self._coro
   1118         return self._resp
   1119 

/opt/anaconda3/lib/python3.7/site-packages/aiohttp/client.py in _request(self, method, str_or_url, params, data, json, cookies, headers, skip_auto_headers, auth, allow_redirects, max_redirects, compress, chunked, expect100, raise_for_status, read_until_eof, proxy, proxy_auth, timeout, verify_ssl, fingerprint, ssl_context, ssl, proxy_headers, trace_request_ctx, read_bufsize)
    617                         continue
    618 
--> 619                     break
    620 
    621             # check response status

/opt/anaconda3/lib/python3.7/site-packages/aiohttp/helpers.py in __exit__(self, exc_type, exc_val, exc_tb)
    654 
    655         if exc_type is asyncio.CancelledError and self._cancelled:
--> 656             raise asyncio.TimeoutError from None
    657         return None
    658 

TimeoutError: 

@gmaze
Member

gmaze commented Jan 21, 2022

  • Setting an option can be done like:
argopy.set_options(trust_env=True)

OR

with argopy.set_options(trust_env=True):
    fs = argopy.stores.httpstore()

Using both is redundant.
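The semantics behind this can be sketched with a minimal re-implementation of the set-options pattern (argopy's actual class is more elaborate, and `OPTIONS` here is a stand-in for its internal options store, but the shape is the same, similar to xarray's set_options): the constructor applies the options immediately, and the context-manager exit restores the previous values.

```python
OPTIONS = {"trust_env": False}  # stand-in for the library's global options store

class set_options:
    """Apply options immediately; if used as a context manager,
    restore the previous values on exit."""
    def __init__(self, **kwargs):
        self._old = {k: OPTIONS[k] for k in kwargs}
        OPTIONS.update(kwargs)          # global form: effective right away
    def __enter__(self):
        return self
    def __exit__(self, *exc):
        OPTIONS.update(self._old)       # scoped form: rolled back on exit

set_options(trust_env=True)             # persists after the call
with set_options(trust_env=False):
    pass                                # False only inside this block
print(OPTIONS["trust_env"])  # → True: the scoped change was rolled back
```

This is why combining the plain call and the `with` block is redundant: the plain call already made the setting global, so the `with` block only re-applies and then restores it.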

@gmaze
Member

gmaze commented Jan 21, 2022

if

with argopy.set_options(trust_env=True):
    fs = argopy.stores.httpstore()
    uri = "https://github.com/euroargodev/argopy-data/raw/master/ftp/dac/csiro/5900865/5900865_prof.nc"
    ds = fs.open_dataset(uri)

works, there is no reason for other API fetchers to fail

You get a TimeoutError after:

argo = ArgoDataFetcher().profile(6903754,1).to_xarray()

Is this systematic?
If yes, I guess the issue is the specific access to the erddap, not to the internet in general.

@gmaze
Member

gmaze commented Jan 21, 2022

Can you try the following ?

  1. Fetch directly the data:
argo = ArgoDataFetcher().profile(6903754,1)
with argopy.set_options(trust_env=True):
    fs = argopy.stores.httpstore()
    ds = fs.open_dataset(argo.uri[0])

Here we try to directly fetch data from the erddap, but without the internal pre/post processing

  2. Go visit the erddap uri in your browser tab:
    From the request above, the URI is given by:
argo = ArgoDataFetcher().profile(6903754,1)
print(argo.uri)

This link should take you there as well

@matdever
Author

This is precisely what I was going to try. The URL gives me a timeout error as well. A friend of mine tried from a different computer (and network) and had no problems. I then used a VPN, and I no longer have the timeout error!

Is it possible the ERDDAP server would have blocked my IP based on my number of requests? I'm trying to characterize the compressibility error on the deployed RBR floats by comparing them with neighboring floats, so I do make a large number of requests in the process...

@gmaze
Member

gmaze commented Jan 24, 2022

I then used a VPN and I no longer have the timeout error!

Awesome ! So I guess we can close this issue then

Is it possible the ERDDAP server would have blocked my IP based on my number of requests?

I am not aware of such a limitation.
I'm also a very active user and I've never been blocked (although I have crashed the server quite a few times 😄)

by comparing them with neighbor floats

I've been thinking about a new method to be implemented in argopy to fetch profiles along a given trajectory (from another float, or an R/V transect for instance) within a given space/time distance.
Have you got something along these lines?
Let's talk about this in #169

@gmaze gmaze closed this as completed Jan 24, 2022
@matdever
Author

Confirmed - too many queries on the argo index file caused my IP to be blocked! The problem should be solved soon!
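When server-side blocking of heavy index queries is the cause, a simple client-side throttle can keep bulk fetch loops under whatever threshold triggers the block. A minimal sketch, not an argopy feature; the interval value and the commented fetch loop are illustrative, and the clock/sleep functions are injectable so the logic is testable.

```python
import time

class Throttle:
    """Enforce a minimum interval between successive requests."""
    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.clock = clock
        self.sleep = sleep
        self._last = None

    def wait(self):
        # Sleep just long enough so that successive calls are at least
        # `min_interval` seconds apart, then record the new timestamp.
        if self._last is not None:
            elapsed = self.clock() - self._last
            if elapsed < self.min_interval:
                self.sleep(self.min_interval - elapsed)
        self._last = self.clock()

# Usage sketch: at most one request per second against the server
# (`fetch_profile` and `wmo_list` are hypothetical):
#
# throttle = Throttle(1.0)
# for wmo in wmo_list:
#     throttle.wait()
#     fetch_profile(wmo)
```

A token bucket would allow short bursts as well, but a fixed minimum interval is usually enough to stay under request-rate limits.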
