When trying to run the "Loading paper text" chapter from the Primer, I run into an error indicating that it can't find "test.elicit.org". Since paper.parse_pdf depends on this remote resource to parse the PDF, it can't proceed at all.
Here's a full trace of what I see:
python recipes/paper_hello.py --paper papers/keenan-2018.pdf
/home/cass/src/ice/venv/lib/python3.11/site-packages/pydantic/_migration.py:283: UserWarning: `pydantic.generics:GenericModel` has been moved to `pydantic.BaseModel`.
warnings.warn(f'`{import_path}` has been moved to `{new_location}`.')
/home/cass/src/ice/venv/lib/python3.11/site-packages/pydantic/_internal/_config.py:334: UserWarning: Valid config keys have changed in V2:
* 'keep_untouched' has been renamed to 'ignored_types'
* 'fields' has been removed
warnings.warn(message, UserWarning)
Traceback (most recent call last):
File "/home/cass/src/ice/venv/lib/python3.11/site-packages/urllib3/connection.py", line 198, in _new_conn
sock = connection.create_connection(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/cass/src/ice/venv/lib/python3.11/site-packages/urllib3/util/connection.py", line 60, in create_connection
for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/cass/.pyenv/versions/3.11.0/lib/python3.11/socket.py", line 961, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
socket.gaierror: [Errno -2] Name or service not known
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/cass/src/ice/venv/lib/python3.11/site-packages/urllib3/connectionpool.py", line 793, in urlopen
response = self._make_request(
^^^^^^^^^^^^^^^^^^^
File "/home/cass/src/ice/venv/lib/python3.11/site-packages/urllib3/connectionpool.py", line 491, in _make_request
raise new_e
File "/home/cass/src/ice/venv/lib/python3.11/site-packages/urllib3/connectionpool.py", line 467, in _make_request
self._validate_conn(conn)
File "/home/cass/src/ice/venv/lib/python3.11/site-packages/urllib3/connectionpool.py", line 1099, in _validate_conn
conn.connect()
File "/home/cass/src/ice/venv/lib/python3.11/site-packages/urllib3/connection.py", line 616, in connect
self.sock = sock = self._new_conn()
^^^^^^^^^^^^^^^^
File "/home/cass/src/ice/venv/lib/python3.11/site-packages/urllib3/connection.py", line 205, in _new_conn
raise NameResolutionError(self.host, self, e) from e
urllib3.exceptions.NameResolutionError: <urllib3.connection.HTTPSConnection object at 0x751bb09bb4d0>: Failed to resolve 'test.elicit.org' ([Errno -2] Name or service not known)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/cass/src/ice/venv/lib/python3.11/site-packages/requests/adapters.py", line 589, in send
resp = conn.urlopen(
^^^^^^^^^^^^^
File "/home/cass/src/ice/venv/lib/python3.11/site-packages/urllib3/connectionpool.py", line 847, in urlopen
retries = retries.increment(
^^^^^^^^^^^^^^^^^^
File "/home/cass/src/ice/venv/lib/python3.11/site-packages/urllib3/util/retry.py", line 515, in increment
raise MaxRetryError(_pool, url, reason) from reason # type: ignore[arg-type]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='test.elicit.org', port=443): Max retries exceeded with url: /elicit-previews/james/oug-3083-support-parsing-arbitrary-pdfs-using/parse_pdf (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x751bb09bb4d0>: Failed to resolve 'test.elicit.org' ([Errno -2] Name or service not known)"))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/cass/src/ice/recipes/paper_hello.py", line 10, in <module>
recipe.main(answer_for_paper)
File "/home/cass/src/ice/ice/recipe.py", line 176, in main
defopt.run(
File "/home/cass/src/ice/venv/lib/python3.11/site-packages/defopt.py", line 348, in run
call = bind(
^^^^^
File "/home/cass/src/ice/venv/lib/python3.11/site-packages/defopt.py", line 255, in bind
call, rest = _bind_or_bind_known(*args, _known=False, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/cass/src/ice/venv/lib/python3.11/site-packages/defopt.py", line 203, in _bind_or_bind_known
args, rest = parser.parse_args(argv), []
^^^^^^^^^^^^^^^^^^^^^^^
File "/home/cass/.pyenv/versions/3.11.0/lib/python3.11/argparse.py", line 1862, in parse_args
args, argv = self.parse_known_args(args, namespace)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/cass/.pyenv/versions/3.11.0/lib/python3.11/argparse.py", line 1895, in parse_known_args
namespace, args = self._parse_known_args(args, namespace)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/cass/.pyenv/versions/3.11.0/lib/python3.11/argparse.py", line 2103, in _parse_known_args
start_index = consume_optional(start_index)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/cass/.pyenv/versions/3.11.0/lib/python3.11/argparse.py", line 2043, in consume_optional
take_action(action, args, option_string)
File "/home/cass/.pyenv/versions/3.11.0/lib/python3.11/argparse.py", line 1955, in take_action
argument_values = self._get_values(action, argument_strings)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/cass/.pyenv/versions/3.11.0/lib/python3.11/argparse.py", line 2485, in _get_values
value = self._get_value(action, arg_string)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/cass/.pyenv/versions/3.11.0/lib/python3.11/argparse.py", line 2518, in _get_value
result = type_func(arg_string)
^^^^^^^^^^^^^^^^^^^^^
File "/home/cass/src/ice/ice/recipe.py", line 181, in <lambda>
Paper: lambda path: Paper.load(Path(path)),
^^^^^^^^^^^^^^^^^^^^^^
File "/home/cass/src/ice/ice/paper.py", line 158, in load
paragraph_dicts = parse_pdf(file)
^^^^^^^^^^^^^^^
File "/home/cass/src/ice/ice/cache.py", line 28, in sync_wrapper
result = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/cass/src/ice/ice/paper.py", line 119, in parse_pdf
r = requests.post(PDF_PARSER_URL, files=files)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/cass/src/ice/venv/lib/python3.11/site-packages/requests/api.py", line 115, in post
return request("post", url, data=data, json=json, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/cass/src/ice/venv/lib/python3.11/site-packages/requests/api.py", line 59, in request
return session.request(method=method, url=url, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/cass/src/ice/venv/lib/python3.11/site-packages/requests/sessions.py", line 589, in request
resp = self.send(prep, **send_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/cass/src/ice/venv/lib/python3.11/site-packages/requests/sessions.py", line 703, in send
r = adapter.send(request, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/cass/src/ice/venv/lib/python3.11/site-packages/requests/adapters.py", line 622, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='test.elicit.org', port=443): Max retries exceeded with url: /elicit-previews/james/oug-3083-support-parsing-arbitrary-pdfs-using/parse_pdf (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x751bb09bb4d0>: Failed to resolve 'test.elicit.org' ([Errno -2] Name or service not known)"))
❓ Is there an alternative that folks recommend for PDF parsing here?
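The trace bottoms out in `socket.gaierror`, so the failure is a DNS lookup rather than anything specific to ICE. A minimal sketch for checking this independently, using only the standard library (the helper name `can_resolve` and its `resolver` parameter are illustrative, not part of ICE):

```python
import socket


def can_resolve(host: str, port: int = 443, resolver=socket.getaddrinfo) -> bool:
    """Return True if `host` resolves to at least one address.

    `resolver` defaults to socket.getaddrinfo; it is a parameter only so
    the lookup can be swapped out (hypothetical convenience, e.g. for tests).
    """
    try:
        return len(resolver(host, port)) > 0
    except socket.gaierror:
        # Same exception class that appears at the bottom of the trace above.
        return False


if __name__ == "__main__":
    # Host name taken from the error message in the trace.
    print(can_resolve("test.elicit.org"))
```

If this prints `False` on a machine with otherwise working DNS, the host name itself no longer resolves and retrying will not help.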
I've put a quick fix for this at https://github.com/TommyBark/ice/tree/fix-parse_pdf, using the pdfminer.six package.
The semantic chunking is not very reliable, since it is based on HTML parsing and not all PDFs work well with it, but it works as a proof of concept for the Factored Cognition Primer examples.
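For readers who want to see the general shape of a local replacement, here is a minimal sketch. It is not the code from the branch above: it assumes plain-text extraction via `pdfminer.high_level.extract_text` and splits paragraphs on blank lines instead of parsing HTML. The function names `split_paragraphs` and `parse_pdf_locally` are made up for illustration, and the return type (a list of strings) is an assumption that may not match what ICE's `parse_pdf` is expected to return.

```python
import re
from pathlib import Path


def split_paragraphs(text: str) -> list[str]:
    """Split extracted text into paragraphs on blank lines."""
    blocks = re.split(r"\n\s*\n", text)
    # Collapse line wraps and repeated spaces inside each block.
    paragraphs = [" ".join(block.split()) for block in blocks]
    # Drop blocks that were only whitespace.
    return [p for p in paragraphs if p]


def parse_pdf_locally(path: Path) -> list[str]:
    """Extract text from a PDF on disk, with no network call."""
    # Imported lazily so split_paragraphs can be used without pdfminer.six.
    from pdfminer.high_level import extract_text

    return split_paragraphs(extract_text(str(path)))


if __name__ == "__main__":
    # Path taken from the command in the original report.
    for paragraph in parse_pdf_locally(Path("papers/keenan-2018.pdf"))[:3]:
        print(paragraph)
```

Blank-line splitting is crude and will merge or break paragraphs on multi-column layouts, so treat the output as good enough for the Primer examples rather than as a faithful reconstruction of the paper's structure.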