[DOCS] Correct typos in FAQ and improve readability #1487

Merged: 5 commits, Mar 19, 2024
38 changes: 19 additions & 19 deletions docs/FAQ.md
@@ -4,22 +4,22 @@

Gradio 4.18.0+ fails to stream audio from the UI; no audio is generated. Waiting for a bug fix: https://github.com/gradio-app/gradio/issues/7497.

Work-around: Use gradio 4.17.0 or lower:
Workaround: Use gradio 4.17.0 or lower:
```bash
pip uninstall gradio gradio_client -y
pip install gradio==4.17.0
```
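
As a quick sanity check after the downgrade (a minimal sketch, assuming a standard Python environment), confirm which versions the interpreter actually imports:
```bash
# Print the gradio and gradio_client versions seen by the running interpreter
python -c "import gradio, gradio_client; print(gradio.__version__, gradio_client.__version__)"
```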

### nginx and k8 multi-pod support
### nginx and K8s multi-pod support

Gradio 4.x.y fails to support k8 multi-pod use. Basically gradio client on one pod can't reach gradio server on nearby pod. See: https://github.com/gradio-app/gradio/issues/6920 and https://github.com/gradio-app/gradio/issues/7317.
Gradio 4.x.y fails to support K8s multi-pod use. Specifically, the Gradio client on one pod can't reach a Gradio server on a nearby pod. For more information, see https://github.com/gradio-app/gradio/issues/6920 and https://github.com/gradio-app/gradio/issues/7317.

Work-around: Use gradio 3.50.2 and gradio_client 0.6.1 by commenting-in/out relevant lines in `requirements.txt`, `reqs_optional/reqs_constraints.txt`, and comment-out `gradio_pdf` in `reqs_optional/requirements_optional_langchain.txt`, i.e.
Workaround: Use gradio 3.50.2 and `gradio_client` 0.6.1 by commenting in or out the relevant lines in `requirements.txt` and `reqs_optional/reqs_constraints.txt`, and by commenting out `gradio_pdf` in `reqs_optional/requirements_optional_langchain.txt`, i.e.:
```bash
pip uninstall gradio gradio_client gradio_pdf -y
pip install gradio==3.50.2
```
If you see spontaneous crashes via OS killer, then use gradio 3.50.1 instead:
If you experience spontaneous crashes via OS killer, then use gradio 3.50.1 instead:
```bash
pip uninstall gradio gradio_client gradio_pdf -y
pip install gradio==3.50.1
@@ -33,13 +33,13 @@ CUDA error: an illegal memory access was encountered

With the upgrade to llama_cpp_python 0.2.56 for faster performance and other bug fixes, thread safety is worse, so audio streaming and GGUF streaming cannot be done at the same time. See: https://github.com/ggerganov/llama.cpp/issues/3960.

A temporary work-around is present in h2oGPT, whereby XTTS model (not microsoft TTS model) and llama.cpp models are not used at same time. Leads to more delays in streaming for text+audio, but not too bad result.
A temporary workaround is present in h2oGPT, whereby the XTTS model (not the Microsoft TTS model) and llama.cpp models are not used at the same time. This leads to more delays in streaming for text + audio, but not too bad a result.

Other work-arounds:
Other workarounds:

* Work-around 1: Use inference server like oLLaMa, vLLM, gradio inference server, etc. as described [below](FAQ.md#running-ollama-vs-h2ogpt-as-inference-server).
* Workaround 1: Use an inference server such as oLLaMa, vLLM, or the Gradio inference server, as described [below](FAQ.md#running-ollama-vs-h2ogpt-as-inference-server).

* Work-around 2: Follow normal directions for installation, but replace 0.2.56 with 0.2.26, e.g. for CUDA with Linux:
* Workaround 2: Follow the normal installation directions, but replace 0.2.56 with 0.2.26, e.g. for CUDA on Linux:
```bash
pip uninstall llama_cpp_python llama_cpp_python_cuda -y
export LLAMA_CUBLAS=1
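# The rest of this block is not shown in this truncated diff view. As an
# illustrative sketch only (build flags are assumptions; follow the normal
# install docs for the exact command), the pinned version would be installed like:
CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama_cpp_python==0.2.26 --no-cache-dir
```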
@@ -51,7 +51,7 @@ Other work-arounds:

## Frequently asked questions

### Running oLLaMa vs. h2oGPT as inference server.
### Running oLLaMa vs. h2oGPT as inference server

* Run oLLaMa as a server for the h2oGPT frontend; a minimal sketch follows below.
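
A minimal sketch of the oLLaMa side only (the model name is an arbitrary example; how h2oGPT then attaches to the server via `--inference_server` is covered in the rest of this section):
```bash
# Pull a model and start the oLLaMa server (serves its API on http://localhost:11434 by default)
ollama pull llama3
ollama serve
```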

@@ -271,7 +271,7 @@ ulimit -n 1048576

export H2OGPT_LLAVA_MODEL=http://xxx.xxx.xxx.144:7860/
```
Be careful with gradio and secret files. h2oGPT sets `allowed_paths` to include `.`, unless public instance when `GPT_H2O_AI=1` is set. So if you put your key file in `.` and didn't set to be public instance, it'll be possible to access your key file even if have a soft link to secret location.
Exercise caution with Gradio and secret files. h2oGPT sets `allowed_paths` to include `.`, unless the instance is public (i.e. `GPT_H2O_AI=1` is set). So if you put your key file in `.` and did not set the instance to be public, it will be possible to access your key file, even if it is only a soft link to a secret location.
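
One simple precaution (a hypothetical layout, not something h2oGPT requires) is to keep key files outside the directory h2oGPT is launched from:
```bash
# Hypothetical file name and location: store the key outside `.` entirely
mkdir -p ~/secrets
mv ./my_api_key.txt ~/secrets/my_api_key.txt
```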

Then running:
```
@@ -1008,7 +1008,7 @@ For Twitter, one can right-click on Twitter video, copy video address, then past

For fast performance, one can use `distil-whisper/distil-large-v2` as the model, which is about 10x faster for similar accuracy.

In addition, faster_whisper package can be used if using large v2 or v3, which is about 4x faster and 2x less memory for similar accuracy.
In addition, the `faster_whisper` package can be used with large v2 or v3; it is about 4x faster and uses about 2x less memory for similar accuracy.
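
For instance (a rough sketch: the PyPI package is `faster-whisper`, and the ASR model flag shown is an assumption to be checked against the speech-to-text docs):
```bash
# Install the faster-whisper backend and select a large Whisper model for ASR
pip install faster-whisper
python generate.py --base_model=HuggingFaceH4/zephyr-7b-beta --asr_model=openai/whisper-large-v3
```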

### Voice Cloning

@@ -1427,15 +1427,15 @@ We take care of this for distilgpt2, but other similar models might fail in same

### Adding Models

You can choose any Hugging Face model or quantized GGUF model file in h2oGPT. Hugging Face models are automatically downloaded to the Hugging Face .cache folder (in home folder).
You can choose any Hugging Face model or quantized GGUF model file in h2oGPT. Hugging Face models are automatically downloaded to the Hugging Face `.cache` folder (in home folder).
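
If the home folder is not a good place for large downloads, the cache can be relocated with the standard Hugging Face environment variable (the path below is only an example):
```bash
# Relocate the Hugging Face download cache (example path)
export HF_HOME=/data/hf_cache
```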

#### Hugging Face

Hugging Face models are passed via `--base_model` in all cases, with fine control available via `hf_model_dict`.
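
For example (a minimal sketch; the `hf_model_dict` keys are assumed to be passed through to transformers' `from_pretrained`):
```bash
# Any Hugging Face model id works with --base_model; hf_model_dict adds finer control
python generate.py --base_model=h2oai/h2ogpt-4096-llama2-7b-chat \
    --hf_model_dict="{'trust_remote_code': True}"
```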

#### TheBloke

For models by [TheBloke](https://huggingface.co/TheBloke), h2oGPT tries to automatically handle all types of models (AWQ, GGUF, GGML, GPTQ, with or without safetensors) automatically all passed with `--base_model` only (CLI or UI both). For example, these models all can be passed just with `--base_model` without any extra model options:
For models by [TheBloke](https://huggingface.co/TheBloke), h2oGPT tries to automatically handle all types of models (AWQ, GGUF, GGML, and GPTQ, with or without [safetensors](https://huggingface.co/docs/safetensors/index#safetensors)), all passed using only the `--base_model` option (in both the CLI and UI). For example, the following models can all be passed with just `--base_model` and no additional model options:
```text
python generate.py --base_model=h2oai/h2ogpt-oig-oasst1-512-6_9b
python generate.py --base_model=TheBloke/Xwin-LM-13B-V0.1-GPTQ
@@ -1446,15 +1446,15 @@ python generate.py --base_model=TheBloke/zephyr-7B-beta-AWQ
python generate.py --base_model=zephyr-7b-beta.Q5_K_M.gguf --prompt_type=zephyr
python generate.py --base_model=https://huggingface.co/TheBloke/Llama-2-7b-Chat-GGUF/resolve/main/llama-2-7b-chat.Q6_K.gguf?download=true
```
Some are these are non-quantized models with links HF links, some specific files on local disk ending in `.gguf`. Given `TheBloke` HF names, if a quantized model, h2oGPT pulls the recommended model from his repository. You can also provide a resolved web link directly, or a file.
Some of these are non-quantized models referenced by their HF names, and some are specific files on local disk ending in `.gguf`. Given a `TheBloke` HF name for a quantized model, h2oGPT pulls the recommended file from that repository. You can also provide a resolved web link directly, or a file.

Watch out for typos. h2oGPT broadly detects if the URL is valid, but Hugging Face just returns a redirect for resolved links, leading to page containing `Entry not found` if one makes a mistake in the file name, e.g. `https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q6_K.gguffoo`.
Watch out for typos. h2oGPT broadly detects if the URL is valid, but Hugging Face just returns a redirect for resolved links, leading to a page containing `Entry not found` if there is a mistake in the file name, e.g. `https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q6_K.gguffoo`.

For AWQ, GPTQ, we try the required safe tensors or other options, and by default use transformers's GPTQ unless one specifies `--use_autogptq=True`.
For AWQ and GPTQ models, we try the required safetensors or other options, and by default use transformers' GPTQ unless one specifies `--use_autogptq=True`.

#### AWQ & GPTQ

For full control over AWQ, GPTQ models, one can use an extra `--load_gptq` and `gptq_dict` for GPTQ models or an extra `--load_awq` for AWQ models.
For full control over AWQ and GPTQ models, one can use an extra `--load_gptq` and `gptq_dict` for GPTQ models or an extra `--load_awq` for AWQ models.
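
Roughly (a sketch with assumed repository names and flag values; the subsections below give the exact forms):
```bash
# Assumed usage: point h2oGPT at the quantized weights explicitly
python generate.py --base_model=TheBloke/Llama-2-7b-Chat-GPTQ --load_gptq=model --use_safetensors=True --prompt_type=llama2
python generate.py --base_model=TheBloke/zephyr-7B-beta-AWQ --load_awq=model --use_safetensors=True --prompt_type=zephyr
```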

##### GPTQ

@@ -1489,7 +1489,7 @@ For full control (e.g. for non-TheBloke models), use `--base_model=llama` and sp

#### GGUF

GGUF models are supported (can run either CPU and GPU in same install), see installation instructions for installing the separate GPU and CPU packages.
GGUF (GPT-Generated Unified Format) models are supported (they can run on either CPU or GPU in the same install); see the installation instructions for installing the separate GPU and CPU packages.

GGUF using Mistral:
```bash
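# The original command is not shown in this truncated diff view. As an
# illustrative sketch only (file name and prompt_type are assumptions, following
# the zephyr GGUF example earlier in this FAQ):
python generate.py --base_model=https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/resolve/main/mistral-7b-instruct-v0.2.Q5_K_M.gguf --prompt_type=mistral
```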