cursor.fetchallarrow() followed by SegmentationFault #139

IceS2 · 2017-12-15T17:28:04Z

Hello guys, this is the first time I've posted an issue on a project, so I'm sorry if I'm doing it the wrong way. Please correct me if so (=

I'm trying to use turbodbc with pyarrow and I'm running into a segmentation fault issue.
I'm querying a SQL Server database using FreeTDS. After I assign cursor.fetchallarrow() to a variable, the process usually segfaults immediately. When it doesn't, it segfaults as soon as I try to do anything with that variable.
My python version and installed packages:

Python 3.6.3
ansible==2.4.0.0
asn1crypto==0.23.0
attrs==17.3.0
avro-python3==1.8.2
awscli==1.11.143
bcrypt==3.1.3
beautifulsoup4==4.6.0
boto==2.48.0
boto3==1.4.7
botocore==1.7.1
bs4==0.0.1
cached-property==1.3.0
certifi==2017.7.27.1
cffi==1.11.2
chardet==3.0.4
colorama==0.3.7
colorclass==2.2.0
configparser==3.5.0
cryptography==2.0.3
Cython==0.27.3
decorator==4.1.2
docker==2.5.1
docker-compose==1.15.0
docker-pycreds==0.2.1
dockerpty==0.4.1
docopt==0.6.2
docutils==0.14
formats==0.1.1
google-api-python-client==1.6.4
gspread==0.6.2
httplib2==0.10.3
idna==2.6
ipython==6.1.0
ipython-genutils==0.2.0
jedi==0.10.2
Jinja2==2.9.6
jmespath==0.9.3
jsonschema==2.6.0
MarkupSafe==1.0
mock==2.0.0
numpy==1.13.1
oauth2client==4.1.2
pandas==0.20.3
paramiko==2.3.1
pbr==3.1.1
pexpect==4.2.1
pickleshare==0.7.4
pluggy==0.6.0
prompt-toolkit==1.0.15
ptyprocess==0.5.2
py==1.5.2
pyarrow==0.7.1
pyasn1==0.3.7
pyasn1-modules==0.1.5
pybind11==2.2.1
pycairo==1.15.4
pycparser==2.18
pycrypto==2.6.1
Pygments==2.2.0
pymssql==2.1.3
PyMySQL==0.7.11
PyNaCl==1.1.2
pyOpenSSL==17.3.0
pytest==3.3.0
python-dateutil==2.6.1
pytz==2017.2
pywal==0.7.1
PyYAML==3.12
requests==2.18.4
rsa==3.4.2
s3transfer==0.1.10
simplegeneric==0.8.1
six==1.11.0
slacker==0.9.60
SQLAlchemy==1.1.13
texttable==0.8.8
tortilla==0.4.2
traitlets==4.3.2
turbodbc==2.4.1
ua-parser==0.7.3
Unidecode==0.4.21
uritemplate==3.0.0
urllib3==1.22
user-agents==1.1.0
wcwidth==0.1.7
websocket-client==0.44.0
xlrd==1.1.0

You can use the following code to try to reproduce the issue. I just removed the database credentials.

from turbodbc import connect, make_options

options = make_options(prefer_unicode=True)
connection = connect(driver='FreeTDS', server='<server>', port='<port>', database='<database>', uid='<uid>', pwd='<pwd>', turbodbc_options=options)

cursor = connection.cursor()
cursor.execute('select * from <table>')

table = cursor.fetchallarrow()


xhochy · 2017-12-15T19:10:49Z

Can you provide us with a backtrace related to the segfault?

On Linux you can get it with:

ulimit -c unlimited
<run python code>
gdb python core

In the resulting gdb prompt, enter bt full and paste the output here (please make sure that it does not contain credentials).
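If a core file is hard to get hold of, Python's standard-library faulthandler module can at least show which Python line was executing when the crash happened. The following is a minimal sketch, not part of the original report; the log file name is an arbitrary choice made for illustration. It only reports Python frames, so it complements the gdb backtrace rather than replacing it.

```python
import faulthandler
import os
import tempfile

# Hypothetical log location, chosen only for this sketch.
log_path = os.path.join(tempfile.gettempdir(), "turbodbc_segfault.log")

# The file object must stay open for as long as the handler is enabled.
crash_log = open(log_path, "w")

# On SIGSEGV (and SIGFPE, SIGABRT, SIGBUS, SIGILL) the interpreter writes
# the Python traceback of all threads to crash_log before the process dies.
faulthandler.enable(file=crash_log, all_threads=True)

# ... run the code that crashes here, e.g. cursor.fetchallarrow() ...
```

The same handler can be switched on without touching the script by running it as python -X faulthandler script.py, in which case the traceback goes to stderr.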

IceS2 · 2017-12-15T20:11:55Z

It seems I can't Oo... Any idea why?

$ ulimit -c unlimited
$ python test_turbodbc_pyarrow.py
[1]    26933 segmentation fault (core dumped)  python test_turbodbc_pyarrow.py
$ gdb python core
GNU gdb (GDB) 8.0.1
Copyright © 2017 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from python...(no debugging symbols found)...done.
/home/pablo/workspace/scratch/core: No such file or directory.
(gdb) bt full
No stack.
(gdb)

MathMagique · 2017-12-16T03:08:17Z

Hello @IceS2! Thanks for reporting! You did well :-).

I have a hunch that the prefer_unicode=True in combination with fetchallarrow() is the culprit here, as I fear that this code path is not properly implemented yet. Even though prefer_unicode=True is the recommended setting for MSSQL, please check whether the segmentation fault disappears if this option is set to False.

As a workaround, you could use fetchallnumpy() instead of fetchallarrow(). Performance is comparable, and fetchallnumpy() has full support for prefer_unicode=True.

xhochy · 2017-12-16T16:59:51Z

@IceS2 it could also be that your core is named core.26933 (taken from the message 26933 segmentation fault (core dumped)). Whether the numbered suffix is used depends on your distribution.
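Where the kernel puts a core dump is controlled by /proc/sys/kernel/core_pattern on Linux. As a small sketch (the helper name explain_core_pattern is made up for this example), the following reads that setting and says whether dumps are written as files or piped to a handler program:

```python
from pathlib import Path

CORE_PATTERN_FILE = Path("/proc/sys/kernel/core_pattern")


def explain_core_pattern(pattern):
    """Describe in one line what a core_pattern value means."""
    pattern = pattern.strip()
    if pattern.startswith("|"):
        # A leading pipe means the kernel hands the dump to a program
        # instead of writing a file into the working directory.
        parts = pattern[1:].split()
        handler = parts[0] if parts else "(no handler given)"
        return "piped to " + handler
    # Otherwise the value is a file name template, e.g. "core" or "core.%p".
    return "written as file: " + pattern


if CORE_PATTERN_FILE.exists():
    print(explain_core_pattern(CORE_PATTERN_FILE.read_text()))
```

If the dump is piped to systemd-coredump, no core file appears next to the script at all; in that case coredumpctl gdb should open the most recent dump in gdb.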

IceS2 · 2017-12-18T12:16:17Z

@MathMagique, @xhochy, sorry for the delayed answer. I wasn't near my computer this past weekend!
So, I've run the code again with prefer_unicode=False and the result was the same: [1] 23037 segmentation fault (core dumped), without any backtrace.

It seems to work with cursor.fetchallnumpy(). I was testing turbodbc because I'm experimenting with pyarrow and I need to do some batch extractions from a database. turbodbc into arrow table would be awesome!
My fallback plan is to work with SqlAlchemy and Pandas. Not sure how to transform the OrderedDict from cursor.fetchallnumpy() to a pyarrow table.

dirkjonker · 2017-12-18T16:49:01Z

What version of FreeTDS and unixODBC are you using? Can you test using the Microsoft ODBC driver for Linux instead of FreeTDS? See: https://docs.microsoft.com/en-us/sql/connect/odbc/linux-mac/installing-the-microsoft-odbc-driver-for-sql-server

IceS2 · 2017-12-18T18:40:50Z

Hey @dirkjonker, I've just tested using the Microsoft ODBC driver you mentioned. The result was the same [1] 3542 segmentation fault (core dumped)

The versions of the packages you asked about are:

extra/unixodbc 2.3.4-2
extra/freetds 1.00.44-1
local/msodbcsql 13.1.9.1-1

dirkjonker · 2017-12-19T08:10:17Z

That's too bad, sometimes switching the driver works to resolve this type of problem.

What types of columns are in the table you are selecting from?

xhochy · 2017-12-19T13:56:35Z

@IceS2 are you on Fedora 24+? There we have a known problem with pyarrow in combination with turbodbc.

xhochy · 2017-12-19T13:58:13Z

It can be fixed by also building pyarrow from source, which is not entirely simple: https://arrow.apache.org/docs/python/development.html#developing-on-linux-and-macos. Alternatively, we could continue to work on providing manylinux1 wheels for turbodbc: #108

Alternatively, using a conda based installation instead of a pip-based one will work.

IceS2 · 2017-12-19T15:10:45Z

@xhochy, I'm actually running Arch Linux!
Do you think it'd be fixed as well by building pyarrow from source? I could try that as soon as I get some "me time"

xhochy · 2017-12-19T22:14:07Z

@IceS2 It could be a possible fix. I guess the Fedora problem is due to Turbodbc being compiled with a different C++ ABI than the pyarrow wheel. Rebuilding both with the same ABI should fix the problems.
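A rough way to look for such a mismatch, offered as a heuristic sketch rather than a definitive test: since GCC 5, libstdc++ places the new std::string and std::list in the std::__cxx11 namespace, so a library compiled against the new ABI usually contains the text "cxx11" in its symbol names, while one built with the old ABI usually does not. The function names below are made up for this example.

```python
from pathlib import Path


def mentions_cxx11_abi(library_path):
    """Return True if the binary contains any 'cxx11' symbol text.

    Heuristic only: a library that never uses std::string or std::list
    in its symbols shows no marker under either ABI.
    """
    return b"cxx11" in Path(library_path).read_bytes()


def compare_libraries(paths):
    """Map each given shared-library path to its heuristic result."""
    return {str(path): mentions_cxx11_abi(path) for path in paths}
```

To use it, pass the paths of turbodbc's compiled extension module and of the Arrow shared library that ships inside the pyarrow package to compare_libraries(); the exact file names depend on the installation. If one reports True and the other False, the two were probably built with different ABI settings.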

IceS2 · 2018-01-05T18:38:33Z

Hey @xhochy, Sorry for the late answer. I had to work on other stuff first.
I'm back at turbodbc, but after I upgraded pyarrow to 0.8.0, I was getting an error with turbodbc saying I didn't have the pyarrow support installed. So I uninstalled turbodbc and tried to install it back with pip, but I'm getting error: command 'gcc' failed with exit status 1
Can you help me out? Thanks!

MathMagique · 2018-06-29T08:38:12Z

@IceS2 Hi again! Have you tried using more recent versions of turbodbc/pyarrow in the meantime? Does this fix things?

albertoRamon · 2018-09-18T22:04:29Z

Same error, at the same line (the last one):

from turbodbc import connect
import pyarrow
connection = connect(dsn='mysql_DNS_ANSI')
cursor = connection.cursor()
cursor.execute('SELECT col1 from test01;')
table = cursor.fetchallarrow()

Changing the last line to print cursor.fetchall() returns:

[[1L], [2L], [3L], [4L], [5L]]

Can be reproduced with this command:

docker run -it albertozgz/turbodbc_extrator:debian9 bash

(You only need to connect this Docker container to your database; I use MySQL 8.0.)

TIP1: table=cursor.fetchallnumpy() works fine
TIP2: tested both the ANSI and the Unicode driver
TIP3: tested fetchallarrow(adaptive_integers=True/False)
TIP4:

batches = cursor.fetcharrowbatches()
for batch in batches:
  print(batch)

segmentation fault (core dumped)

MathMagique · 2018-09-19T07:34:29Z

@xhochy Would you have the time to look at @albertoRamon's reproducing example, please?

xhochy · 2018-09-19T08:23:40Z

This is the same problem as above. Debian 9 builds by default with a different C++ ABI than the one the pyarrow wheels are built with. As long as we don't ship turbodbc manylinux1 wheels, these segfaults will persist.

MathMagique · 2018-09-19T09:04:01Z

Would it work to switch to the conda environment with our "blessed" builds?

xhochy · 2018-09-19T09:06:25Z

Yes, using pyarrow and turbodbc both from conda-forge will work. They are both built in the same consistent environment.

MathMagique · 2018-09-19T09:16:28Z

@albertoRamon Could you try using the turbodbc conda package, please? https://anaconda.org/conda-forge/turbodbc

albertoRamon · 2018-09-19T09:19:32Z

Yes of course

I can run any test you want me to try.
Or is the solution not to use Debian 9? (I tried Alpine 3.8 and Debian 10 and neither worked.)

MathMagique · 2018-09-19T09:33:24Z

Anything too modern will not work because the precompiled pyarrow wheel uses a "classic" version of the ABIs, while pip install turbodbc will compile stuff with the latest and greatest ABIs. Conda packages for turbodbc and pyarrow are built with consistent settings, and should work on any modern system.
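To see which channel each package in an environment came from, one option is to read the INSTALLER record in the package metadata. This is a sketch that assumes Python 3.8 or later; pip writes "pip" into that record, and conda-built packages typically record "conda" there, but treat the result as a hint rather than proof. The helper name installer_of is made up for this example.

```python
from importlib import metadata


def installer_of(distribution_name):
    """Return the recorded installer of a distribution.

    Returns None if the distribution is not installed and "unknown" if it
    is installed but carries no INSTALLER record.
    """
    try:
        dist = metadata.distribution(distribution_name)
    except metadata.PackageNotFoundError:
        return None
    installer = dist.read_text("INSTALLER")
    return installer.strip() if installer else "unknown"


for name in ("turbodbc", "pyarrow"):
    print(name, "->", installer_of(name))
```

If the two packages report different installers, or both report pip on a recent distribution, the ABI mismatch described above is a likely suspect.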

albertoRamon · 2018-09-19T21:52:45Z

@MathMagique @xhochy , Thanks
Your suggestion works fine

pip uninstall pyarrow
pip uninstall turbodbc

wget https://repo.continuum.io/miniconda/Miniconda2-latest-Linux-x86_64.sh
chmod +x Miniconda2-latest-Linux-x86_64.sh
./Miniconda2-latest-Linux-x86_64.sh
source ~/.bashrc

conda install -c conda-forge pyarrow
conda install -c conda-forge turbodbc

python:

table = cursor.fetchallarrow()
print table.num_rows

bash:> 5

If you think the best option for a production environment is to download the code from Git and compile it, I will be happy to modify the Dockerfile to do that.

BR

MathMagique · 2018-09-20T07:35:14Z

I would never download code from Git for production; if anything, download source packages from pypi.org. I'd suggest going down the conda route for production, however, as this has already solved the hassle of compiling stuff the right way.

MathMagique added the bug label Dec 16, 2017
