Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Concurrent invocations of puts seem to interfere with each other #7978

Open
rvprasad opened this issue Jul 20, 2019 · 42 comments
Open

Concurrent invocations of puts seem to interfere with each other #7978

rvprasad opened this issue Jul 20, 2019 · 42 comments

Comments

@rvprasad
Copy link

rvprasad commented Jul 20, 2019

Since I am not sure if this is a Crystal issue or Kemal issue, I have reported this to the Kemal team as well (kemalcr/kemal#543).

Description

A simple server responds to get requests with random numbers in JSON format. When the server is running on a Raspberry Pi and ab tool is used to issue sequential requests to the server in quick succession, the output of the server on standard output is garbled.

I am reporting the issue here as I am not sure if this is a crystal issue or a kemal issue.

Steps to Reproduce

  1. Execute the server following the instructions in README.md
  2. Execute ab -n 10 -c 1 http://127.0.0.1:1234/random?num=30

Expected behavior: The following or something similar should be seen in the terminal executing the server.

8.108ms
8.051ms
8.403ms
8.242ms
8.209ms
8.437ms
8.095ms
7.988ms
8.126ms
8.087ms

Actual behavior: The following is seen in the terminal executing the server.

-159659774543452127413503ApacheBench/2.3127.0.0.1:1234/random?num=30��<�pyR�p��%�h2.3
                                                                                     /get/random34
                                                                                                  /get/rams
-1358916334152017894057263��7R뱵��Al`|��34
                                          /get/random096127.0.0.1:41128	127.0.0.1
                                                                                 /get/random2.3ms
321435393.761ms
-0.000ms
-0.000ms
4076574188583769414937274��&��5���Qg����2.3
                                           /get/random34
                                                        /get/random30
                                                                     /get/random127.0.0.1:41152	127.0ms
-902563470739474328704365�^��b�Z'�v�;�ق2ms
-108291519172398526629057��1m�EU]�K�����9�2.3
                                             /get/random34
                                                          /get/randomms
0.000ms
-0.000ms

Reproduces how often: Always

Versions

Crystal 0.29.0 (2019-07-20)

Kemal 0.25.2

LLVM: 6.0.1
Default target: arm-unknown-linux-gnueabihf

Linux master10 4.14.34-hypriotos-v7+ #1 SMP Sun Apr 22 14:57:31 UTC 2018 armv7l GNU/Linux

Additional Information

The object file of the crystal compiler was compiled on Ubuntu/Laptop and then linked to create the executable on Hypriot/RaspberryPi.

@rvprasad rvprasad changed the title Concurrent invocations of puts seem to interfer with each other Concurrent invocations of puts seem to interfere with each other Jul 20, 2019
@rvprasad
Copy link
Author

I observed similar garbled output even after isolating puts into a dedicated fiber.

@asterite
Copy link
Member

Totally offtopic, but there's no need to do Random.new every time. You can just use the top-level rand method.

@asterite
Copy link
Member

I can't reproduce this on Mac OSX.

I don't think Crystal works fine on a Raspberry Pi. At least it's not something we officially support it so it'll be very hard for us to give help on this.

@rvprasad
Copy link
Author

I haven't observed this issue on Linux. Just on Raspberry Pi 3. Since Crystal works on Linux and RaspPi runs Linux, I figured Crystal would work on RaspPi if Crystal could be cross compiled to run on RaspPi (even if this configuration is not supported). Clearly, that is not the case :)

@girng
Copy link
Contributor

girng commented Jul 22, 2019

i feel like this is a unique and specific use-case. it also can't be reproduced on WSL or native Debian

@rvprasad
Copy link
Author

I am not sure if this an issue with cross compilation (that is causing the compiled program to output garbage) or the Crystal compiler on Raspberry Pi. Any suggestions to verify this is welcome.

So far, I have tried building Crystal on Raspberry Pi 3B by 1) using the cross-compiled compiler and 2) bootstrapping the compiler (crystal-lang/boostrap-script). First approach failed due to insufficient memory while the second failed due to failure in building ruby in the first stage of bootstrapping.

That said, I am looking for a way to compile Crystal programs to run on Raspberry Pi 3B. Doing so on Raspberry Pi would be ideal.

@asterite
Copy link
Member

@rvprasad Do yo know what architecture is Raspberry Pi running on? Or check if this gives true?

{{ puts flag?(:aarch64) }}

If so, #7983 might fix this issue.

@rvprasad
Copy link
Author

rvprasad commented Jul 23, 2019

Rasp Pi 3B uses Armv7 (v7l) processor. (The target triple is armv6-unknown-linux-gnueabihf.) So, the above test fails.

@rvprasad
Copy link
Author

I did some digging with gdb. For this, I changed

puts "%.3fms" % (elapsed_time.total_nanoseconds / 1e6)

to

tmp1 = "%.3fms" % (elapsed_time.total_nanoseconds / 1e6)
puts tmp1

When a single request was submitted to the server, I found all of the strings and puts output were well-formed.

(gdb) info locals
ret = 0x245be8
elapsed_time = {seconds = 0, nanoseconds = 190677}
tmp1 = 0x260e60
(gdb) print &(tmp1->c)
$5 = (UInt8 *) 0x260e6c "-0.000ms"
(gdb) print &(ret->c)
$9 = (UInt8 *) 0x245bf4 "[680012]"

However, while elapsed_time.total_nanoseconds was 190677, the second string was "-0.000ms". This is wrong.

When two concurrent requests were submitted to the server, ret was well-formed but tmp1 was not.

(gdb) info locals
ret = 0x245990
elapsed_time = {seconds = 0, nanoseconds = 191874}
tmp1 = 0x23de10
(gdb) print &(tmp1->c)
$10 = (UInt8 *) 0x23de1c "-501870678723681322067296"
(gdb) print &(ret->c)
$11 = (UInt8 *) 0x24599c "[926334]"

@rvprasad
Copy link
Author

Changing puts "%.3fms" % (elapsed_time.total_nanoseconds / 1e6) to puts "#{elapsed_time.total_nanoseconds / 1e6}ms" reduces the frequency of garbled output and the amount of garbled output in each instance.

@ysbaddaden
Copy link
Contributor

There is an issue with hard/soft floats on ARM 32 targets. Specifying the target CPU while compiling may help LLVM backend to generate proper code (or not).

@rvprasad
Copy link
Author

rvprasad commented Jul 24, 2019

@ysbaddaden if you are suggesting floating point calculations (not data movements) involving floating point values) pose issues on ARM 32 targets, then I doubt that is the case as I was able to generate garbled output with the below minimal program.

require "kemal"

get "/random" do |env|
  tmp1 = "%fms" % 3.333
  puts tmp1
  tmp2 = "%dms" % 3.333
  puts tmp2
  ""
end

Kemal.config.logging = false
Kemal.config.port = 1234
puts "Serving at 0.0.0.0:1234"
Kemal.run

Submitting a single request to the above server produces the following output:

$ ./server 
Serving at 0.0.0.0:1234
-158972576233719173059H�/getget

get/random/getget

/ws/randomms
3ms

To cross-compile the compiler,

  1. Ran ./bin/crystal build src/compiler/crystal.cr --warnings all --cross-compile --target "armv6-unknown-linux-gnueabihf" -s -D without_openssl -D without_zlib on a Ubuntu box.
  2. Copied crystal.o onto a Raspberry Pi 3B running Raspbian 9.9 (from Hypriot).
  3. Ran sudo cc 'crystal.o' -o '/usr/lib/crystal/bin/crystal' -rdynamic /usr/share/crystal/src/llvm/ext/llvm_ext.o `/usr/bin/llvm-config-6.0 --libs --system-libs --ldflags 2> /dev/null` -lstdc++ -lpcre -lm -lgc /usr/share/crystal/src/ext/libcrystal.a -levent -lrt -ldl -lpthread -L/usr/lib -L/usr/local/lib after copying crystal/src into /usr/share/crystal/src (following the instructions here).

It seems the format specifier for float (%f) is broken.

@straight-shoota
Copy link
Member

straight-shoota commented Jul 25, 2019

Well, the stringifier for floats uses float arithmetic and that might be broken as @ysbaddaden mentioned. Why do you doubt that this could be the issue? Your example does actually support that.

@asterite
Copy link
Member

For %f we call sprintf, there might be an issue there. But it's hard if we can't reproduce it on Mac and Linux.

@rvprasad
Copy link
Author

@straight-shoota TIL stringifying float values involves floating point calculations/arithmetic operations :) That said, it seems the floating point operations (e.g., the division operation worked fine) in the application code seems to work fine while the ones used to stringifying floats seem to be broken. Nevertheless, I will check this.

@ysbaddaden
Copy link
Contributor

Don't use floats, at all. Use integers instead —and don't round a float!

The problem with hard/soft floats is that the caller/called ABI doesn't expect the value in the same registers (cpu integers vs fpu registers), which results in garbage.

@rvprasad
Copy link
Author

The format_buf and float args to the invocation of LibC.snprintfat src/string/formatter.cr:265 are valid but the value filled into temp_buf is invalid. This is in line with @asterite's comment.

@rvprasad
Copy link
Author

@asterite I was able to reproduce the issue in a Raspbian container running on a Linux host. The instructions to use the Docker image to create a compiler is available in docker-files repo. Once you have built the compiler and shards in the container, the issue can be repro-ed by building and executing the example. You will need to install curl in the container (via apt install curl) to repro the issue.

@straight-shoota
Copy link
Member

Is the linux host running on armv6 as well or is it an x86 architecture? Raspbian has an x86 variant, so I'm not sure.

@rvprasad
Copy link
Author

The Linux host is a x86 machine while the Raspbian is running on armv7l emulator.

@ysbaddaden
Copy link
Contributor

Please read the following:

Likely a duplicate of #6954

All ARM isues are related to a hard/soft float issue. Specifying a CPU when compiling may fix the issue (it did for my qemu raspbian image) or not (fails on scaleway armv7 server, and reportedly rpi3).

Reproducing isn't an issue. It's quite easy to. The problem is understanding why LLVM won't enable hard floats when compiling. Maybe we miss some LLVM attributes on functions? I recently noticed that clang sets a bunch of them, related to FP and soft-float.

@konovod
Copy link
Contributor

konovod commented Jul 25, 2019

Maybe I'm saying something stupid, but for hard float you should specify cpu, fpu and float abi: -mcpu=cortex-m4 -mthumb -mfloat-abi=hard -mfpu=fpv4-sp-d16

@rvprasad
Copy link
Author

@konovod I tried doing this today but I gave up as I could not figure out how to pass these flags to LLVM compiler from the command line interface of crystal compiler.

@rvprasad
Copy link
Author

@ysbaddaden Thanks for the pointer. I read thru some of these while digging around. BTW, what is the best way to pass the compiler options mentioned by @konovod to LLVM via crystals CLI? That would allow more experimentation.

@asterite
Copy link
Member

The format_buf and float args to the invocation of LibC.snprintfat src/string/formatter.cr:265 are valid but the value filled into temp_buf is invalid.

What do you mean?

@ysbaddaden
Copy link
Contributor

ysbaddaden commented Jul 25, 2019

Maybe we just miss the "use-soft-float"="false" attribute on every functions? It should always be false, unless the target's arch is arm and the ABI is *eabi.

Maybe we miss the ability to trigger thumb mode; when -mthumb is specified clang changes the target from armv7-linux-gnueabihf to thumbv7-linux-gnueabihf for example (i.e. replaces arm with thumb).

@rvprasad
Copy link
Author

@asterite I wasn't sure at which program point were the memory addresses getting messed up. So, I dug around and found that format_buf (the format string) and float (the value to be printed) arguments passed to LibC.snprintf at src/string/formatter.cr:265 were indeed the values they should have been. Assuming the other input argument len was valid, the fault resulting in the garbled output is encountered sometime after the LibC.snprintf is invoked at src/string/formatter.cr:265. This kinda aligns with @ysbaddaden suggestion -- ABI mismatch between the app and libc.

@rvprasad
Copy link
Author

rvprasad commented Jul 26, 2019

The failure occurs only when the foo() is invoked within the context of a Kemal Handler (at call2 and not at call1). This seems to suggest the issue may be specific to Kemal and not related to ABI or hard/soft floats. [I have mentioned this in kemalcr/kemal/issues/543.]

require "kemal"

def foo() 
  tp1 = "%fms" % 3.333
  puts tp1
  tp2 = "%dms" % 3.333
  puts tp2
end 

foo()   # call1

get "/random" do |env|
  foo() # call2
  ""
end

Kemal.config.logging = false
Kemal.config.port = 1234
puts "Serving at 0.0.0.0:1234"
Kemal.run

Output upon requesting http://127.0.0.1:1234/random

root@6f9b4df16a4b:/t/t# ./t
3.333000ms
3ms
Serving at 0.0.0.0:1234
-158972575668831573782�{#/getget

get/random/getget

/ws/randomms
3ms

@asterite
Copy link
Member

What happens if you just use http server from the standard library? I don't think kemal is involved here.

@rvprasad
Copy link
Author

rvprasad commented Jul 26, 2019

The below snippet fails as well.

require "http/server"

def foo() 
  tp1 = "%fms" % 3.333
  puts tp1
  tp2 = "%dms" % 3.333
  puts tp2
end 

foo()   # call1

server = HTTP::Server.new do |ctxt|
  foo() # call2
  ctxt.response.print "Boo"
end

address = server.bind_tcp 1234
puts "Serving at #{address}"
server.listen

Output

3.333000ms
3ms
Serving at 127.0.0.1:1234
-158972575660657402946.

keep-alive

Keep-Alive

ConnectionHoidentity*/*ms
3ms

@asterite
Copy link
Member

Cool! Now you could try reducing it to using spawn and Fiber.yield in between calls, and spawning a lot of fibers and see if it still reproduces.

The less code we have that reproduces the problem, the easier to find the cause.

@asterite
Copy link
Member

(I'll send such code later today)

@asterite
Copy link
Member

@rvprasad Try this:

chan = Channel(Nil).new

1000.times do |i|
  spawn do
    tp1 = "%fms" % 3.333
    puts tp1
    Fiber.yield
    tp2 = "%dms" % 3.333
    puts tp2
    chan.send(nil)
  end
end

1000.times do
  chan.receive
end

@rvprasad
Copy link
Author

rvprasad commented Jul 26, 2019

@asterite Thanks. I was able to repro the issue in the Docker container.

spawn do
  tp1 = "%fms" % 3.333
  puts tp1
end

sleep(10)

@asterite
Copy link
Member

What output you get with that program?

@rvprasad
Copy link
Author

It produces the following output in a container.

root@4b026a742082:/t/t# crystal run -d s.cr 
Using compiled compiler at `.build/crystal'
-158972575643887689414`?� I�l@Z~?6�~΄/t/t//t/t@@ms

@asterite
Copy link
Member

But it works fine if you don't do it inside spawn?

@rvprasad
Copy link
Author

Yes.

Also, output is fine when spawn is replaced with 1.times.

@rvprasad
Copy link
Author

@ysbaddaden In the below snippet, foo prints 1.000000 2.000000 when invoked in mydo but it prints 0.000000 0.000000 when invoked in the fiber. Since the same code fragment provides both "correct" and "incorrect" outcomes, I doubt if this is an ABI related issue but...

The float argument to corresponding calls to LibC.snprintf in string/formatter.c in the above invocations are identical, i.e., 1.0 and 2.0. So, it seems the context swapping is interfering with the call to an external library function.

def foo
  i = {1.0, 2.0}
  printf("%f %f\n", *i)
end

def mydo(&x)
  x.call()
end

mydo(&->foo)
spawn foo

sleep(2)

Any ideas on what/where might be the issue?

@ysbaddaden
Copy link
Contributor

We save/restore FPU registers on ARM when switching fibers, and it leads to the same hard/soft float codegen bug.

@rvprasad
Copy link
Author

rvprasad commented Jan 1, 2020

I want to experiment with different LLVM compiler flags during code generation, e.g., related to FPU and NEON. Any ideas where and how these flags can be added/modified/specified?

@HCLarsen
Copy link
Contributor

I know this issue has been dormant for a couple of years, but I wanted to bring up that I have also experienced this, although it was much earlier this year.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants