Doesn't run on colab with Pygmalion-6B / results look different on Colab #14
Comments
I have the same issue when trying to run the 4chan model in colab. I get a "Memory cgroup out of memory" error when I look at the dmesg in the colab terminal.
I can confirm this issue. On colab, I can't load either pygmalion-2.7b or pygmalion-6b. The free colab has around 13GB RAM, while pygmalion-2.7b takes 6.8GB of RAM to load on my system (peak allocation). So it should in principle work.
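For reference, here is a minimal sketch of how peak RAM during loading can be measured on Linux (assuming transformers is installed; the model name is just the repo under discussion):

```python
# Minimal sketch: measure peak resident memory while loading a model on Linux.
import resource

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("PygmalionAI/pygmalion-2.7b")

# ru_maxrss is reported in kilobytes on Linux.
peak_gb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024**2
print(f"Peak RAM: {peak_gb:.1f} GB")
```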
Pygmalion-6B should be able to be loaded on colab too, since the colab notebooks of other projects are able to load it (namely the Pyg devs' own notebook, and KoboldAI's).
An anonymous 4chan user has kindly provided this notebook that allows the 6B model to be loaded in 8-bit mode: https://colab.research.google.com/github/81300/AI-Notebooks/blob/main/Colab-TextGen-GPU.ipynb I haven't tested it yet because Google is not giving me a free instance with a GPU.
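For reference, 8-bit loading with the transformers/bitsandbytes integration looks roughly like this (a sketch, not necessarily what that notebook does; it requires the bitsandbytes and accelerate packages):

```python
# Rough sketch of 8-bit loading via bitsandbytes; requires bitsandbytes
# and accelerate in addition to transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("PygmalionAI/pygmalion-6b")
model = AutoModelForCausalLM.from_pretrained(
    "PygmalionAI/pygmalion-6b",
    device_map="auto",  # let accelerate place layers on the GPU
    load_in_8bit=True,  # quantize weights to int8 at load time
)
```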
I still can't launch the "basic commands" colab even with the additional arguments. I tried a little more on the Anon's colab with a different character, and without fail, within the first 5 messages, I can always get the bot to sperg out in a normal chat. Wish I could test a local installation, but without a GPU I can only do colab.
Disabling text output streaming also helps. Another issue is that the Python environment is inconsistent if you don't make sure Conda gets activated before each command. The CPU RAM issues can be alleviated if we break the model into smaller shards, though I'm not sure yet of the negative side effects. I have updated that notebook, adding an option and script to do this within the runtime.
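Re-sharding can be done with stock transformers; a minimal sketch (paths are placeholders, and the one-time re-save still needs enough RAM to load the original checkpoint):

```python
# Minimal sketch: re-save a model with smaller checkpoint shards so that
# subsequent loads need less peak CPU RAM. Paths are placeholders.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("PygmalionAI/pygmalion-6b")
model.save_pretrained("pygmalion-6b-sharded", max_shard_size="2GB")

# Later, load from the sharded copy; shards are read one at a time.
model = AutoModelForCausalLM.from_pretrained("pygmalion-6b-sharded")
```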
@81300 thanks for looking into this. The most likely culprit is indeed the CUDA library. The debug preset generating different results implies that the logits are different, which can only happen if the internal calculations are performed differently (different precision?).
In any case, I don't know why, but the replies in your notebook are a LOT better now (although still different than running locally). I have sent 18 messages to Chiharu with the default settings and she didn't start ranting once.
My only remaining question is whether it is possible to get the exact same responses on colab and locally. The colab responses feel a bit worse than local: not nearly as bad as before, but still not as good. The debug preset is now included by default.
Yesterday I tried the colab again with Pyg-6B. The bot didn't start to rant in the first 5 messages, and the answers were "slightly passable" at first (I've never been able to run it locally, so I have no idea how that's supposed to look; my only point of comparison is CAI, to which Pyg is not even remotely close).
@oobabooga, with the deterministic preset I currently get the same results locally and on Colab.
However, there could potentially be other aspects at play than the inference settings, as described in [1] and [2]. Locally I could only test with an RTX 2000 series card, which, like the Tesla T4s I've been assigned each time on Colab, is on the Turing architecture. Perhaps cuDNN [3] behaves differently on your Ampere card. Now I feel like forcing the app on Colab, via Conda, to use different CUDA libraries than the ones preinstalled by Google is wrong, because the instance isn't exactly bare-metal: it runs in a Docker container [4] and the host has its own CUDA drivers. [1] https://pytorch.org/docs/stable/notes/randomness.html
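For anyone reproducing this, the reproducibility knobs described in [1] look like the following sketch; note that even with these set, identical results across different GPU architectures are not guaranteed:

```python
# Sketch of the reproducibility settings from the PyTorch randomness notes [1]:
# pin the RNG and request deterministic cuDNN kernels.
import torch

torch.manual_seed(0)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
torch.use_deterministic_algorithms(True)  # errors out on nondeterministic ops
```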
It is also possible that the divergent results are caused by
Here is the >Hi test on my RTX 3090 with the debug preset, for comparison. On my laptop, which has a Turing GPU, the results are the same as Colab and @81300's. The goal is to reproduce this >Hi result on any GPU.
Thanks. We still can't load Pygmalion-6B on Colab without 8-bit mode. To get around the memory issue, we could try DeepSpeed ZeRO-3 inference, in which case, if you launch a single process on a single GPU, the CPU RAM requirement should just be the size of the biggest shard in your model. Will update later on this. So I retested on Colab.
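A rough, untested sketch of what ZeRO-3 inference could look like via the Hugging Face integration (config values are illustrative; requires the deepspeed package, and the import path has moved to transformers.integrations in newer versions):

```python
# Rough sketch of ZeRO-3 inference with CPU parameter offload.
import deepspeed
from transformers import AutoModelForCausalLM
from transformers.deepspeed import HfDeepSpeedConfig

ds_config = {
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        # Keep parameters offloaded to CPU and fetched as needed.
        "offload_param": {"device": "cpu", "pin_memory": True},
    },
    "train_micro_batch_size_per_gpu": 1,
}

# Must be instantiated *before* from_pretrained so that weights are
# partitioned as they are loaded instead of materialized in full.
dschf = HfDeepSpeedConfig(ds_config)
model = AutoModelForCausalLM.from_pretrained("PygmalionAI/pygmalion-6b")
engine = deepspeed.initialize(model=model, config=ds_config)[0]
engine.module.eval()
```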
I also tried EleutherAI/gpt-neo-125M on a Tesla T4 with the task and Python snippet from the issue you linked, but my results were correct.
@81300 @waifusd, I have discovered the issue: it was the model. The model that I was using on my computer was the very first commit to the HuggingFace repository, which I downloaded on January 12th. The current commit to that repository (main branch) is different from the first one. This updated commit passes the >Hi test when executed in GPU-only mode, but fails in CPU, GPU+CPU, or 8-bit mode, generating the "what do you think of my setup?" response that we have been seeing. The first commit, on the other hand, passes the >Hi test in any mode and always yields the same responses. This is the response that I got on colab using this commit.
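If anyone wants to compare the two, transformers can pin a specific commit of the model repo via the revision argument (a sketch; the hash below is a placeholder, not the actual first commit):

```python
# Sketch: load a specific commit of the model repo for comparison.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "PygmalionAI/pygmalion-6b",
    revision="<first-commit-hash>",  # placeholder: any commit hash or branch name works
)
```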
You can try it in this notebook (which is the one I used for the screenshot above): https://colab.research.google.com/github/oobabooga/AI-Notebooks/blob/main/Colab-TextGen-GPU.ipynb In other words, it is now possible to get 1000x better responses on Colab.
@81300: you are right that the next step now would be to ditch 8-bit mode altogether, as this would probably make the model run a bit faster. I am also worried about the loading times on Colab. Ideally, it would be best to get the model working in less than 5 minutes instead of 12. |
Here are some comparisons between different branches of pygmalion-6b and different modes (GPU, GPU+CPU, and 8-bit): https://huggingface.co/PygmalionAI/pygmalion-6b/discussions/8#63d15cae119416cdbe15ae2e
Using the provided notebook and just changing the model to Pygmalion-6B instead of Pygmalion-1.3B generates the following tcmalloc error and the execution stops cold.