143 lines
7.8 KiB
Plaintext
143 lines
7.8 KiB
Plaintext
---
|
|
title: FAQ
|
|
description: Frequently asked questions
|
|
---
|
|
|
|
### General
|
|
|
|
**Q: The trainer stopped and hasn't progressed in several minutes.**
|
|
|
|
> A: Usually an issue with the GPUs communicating with each other. See the [NCCL doc](nccl.qmd)
|
|
|
|
**Q: exitcode: -9**
|
|
|
|
> A: This usually happens when you run out of system RAM.
|
|
|
|
**Q: exitcode: -7 while using deepspeed**
|
|
|
|
> A: Try upgrading deepspeed w: `pip install -U deepspeed`
|
|
|
|
**Q: AttributeError: 'DummyOptim' object has no attribute 'step'**
|
|
|
|
**Q: ModuleNotFoundError: No module named 'mpi4py' using single GPU with deepspeed**
|
|
|
|
> A: You may be using deepspeed with single gpu. Please remove the `deepspeed:` section in the yaml file or `--deepspeed` CLI flag.
|
|
|
|
**Q: The codes is stuck on saving preprocessed datasets.**
|
|
|
|
> A: This is usually an issue with the GPU. This can be resolved through setting the os environment variable `CUDA_VISIBLE_DEVICES=0`. If you are on runpod, this is usually a pod issue. Starting a new pod should take care of it.
|
|
|
|
**Q: Received mismatch error on merge adapters / loading adapters between torch.Size of checkpoint and model.**
|
|
|
|
> A: This is likely due to vocab size mismatch. By default, Axolotl expands the model's embeddings if the tokenizer has more tokens than the model. Please use the `axolotl merge-lora` command to merge the adapters instead of using your own scripts.
|
|
|
|
> On the other hand, if the model has more tokens than the tokenizer, Axolotl does not shrink the model's embeddings unless `shrink_embeddings: true` is set in the config.
|
|
|
|
**Q: How to call Axolotl via custom python scripts?**
|
|
|
|
> A: Since Axolotl is just Python, please see `src/axolotl/cli/main.py` on how each command is called.
|
|
|
|
**Q: How to know the value to use for `fsdp_transformer_layer_cls_to_wrap`?**
|
|
|
|
> A: This is the class name of the transformer layer to wrap with FSDP. For example, for `LlamaForCausalLM`, the value is `LlamaDecoderLayer`. To find this for a specific model, check the model's `PreTrainedModel` definition and look for `_no_split_modules` variable in the `modeling_<model_name>.py` file within `transformers` library.
|
|
|
|
**Q: ValueError: Asking to pad but the tokenizer does not have a padding token. Please select a token to use as pad_token**
|
|
|
|
> A: This is because the tokenizer does not have a padding token. Please add a padding token to the tokenizer via:
|
|
|
|
> ```yaml
|
|
> special_tokens:
|
|
> # str. If you're not sure, set to same as `eos_token`.
|
|
> pad_token: "..."
|
|
> ```
|
|
|
|
**Q: `IterableDataset error` or `KeyError: 'input_ids'` when using `preprocess` CLI**
|
|
|
|
> A: This is because you may be using `preprocess` CLI with `pretraining_dataset:` or `skip_prepare_dataset: true` respectively. Please use `axolotl train` CLI directly instead as these datasets are prepared on demand.
|
|
|
|
**Q: vLLM is not working with Axolotl**
|
|
|
|
> A: We currently recommend torch 2.6.0 for use with `vllm`. Please ensure you use the right version. For Docker, please use the `main-py3.11-cu124-2.6.0` tag.
|
|
|
|
**Q: FA2 2.8.0 `undefined symbol` runtime error on CUDA 12.4**
|
|
|
|
> A: There seems to be a wheel issue with FA2 2.8.0 on CUDA 12.4. Try CUDA 12.6 instead or downgrade to FA2 2.7.4. Please refer to the upstream issue: https://github.com/Dao-AILab/flash-attention/issues/1717.
|
|
|
|
### Chat templates
|
|
|
|
**Q: `jinja2.exceptions.UndefinedError: 'dict object' has no attribute 'content' / 'role' / ____`**
|
|
|
|
> A: This means that the property mapping for the stated attribute does not exist when building `chat_template` prompt. For example, if `no attribute 'content'`, please check you have added the correct mapping for `content` under `message_property_mappings`.
|
|
|
|
**Q: `Empty template generated for turn ___`**
|
|
|
|
> A: The `content` is empty for that turn.
|
|
|
|
**Q: `Could not find content start/end boundary for turn __`**
|
|
|
|
> A: The specific turn's start/end could not be detected. Please ensure you have set the `eos_token` following your `chat_template`. Otherwise, this could be a `chat_template` which doesn't use proper boundaries for each turn (like system). On the rare occurrence, make sure your content is not `[[dummy_message]]`. Please let us know about this.
|
|
|
|
**Q: `Content end boundary is before start boundary for turn ___`**
|
|
|
|
> A: This is an edge case which should not occur. Please create an Issue if this happens.
|
|
|
|
**Q: `Content end boundary is the same as start boundary for turn ___. This is likely an empty turn.`**
|
|
|
|
> A: This is likely an empty turn.
|
|
|
|
**Q: The EOS token is incorrectly being masked or not being masked / `EOS token __ not found in chat template`.**
|
|
|
|
> A: There can be two reasons:
|
|
|
|
> 1. This is because of the mismatch between `tokenizer.eos_token` and EOS token in template. Please make sure to set `eos_token: ` under `special_tokens: ` to the same EOS token as in template.
|
|
|
|
> 2. The EOS token is not in the template. Please check if your template is correct. As an example, `phi_35` template does not use its dedicated EOS token `<|endoftext|>` at the end.
|
|
|
|
**Q: "`chat_template` choice is `tokenizer_default` but tokenizer's `chat_template` is null. Please add a `chat_template` in tokenizer config"**
|
|
|
|
> A: This is because the tokenizer does not have a chat template. Please add a chat template in the tokenizer config. See [chat_template](dataset-formats/conversation.qmd#chat-template) for more details.
|
|
|
|
**Q: The EOT token(s) are incorrectly being masked or not being masked / `EOT token __ not found in chat template`.**
|
|
|
|
> A: There can be two reasons:
|
|
|
|
> 1. The EOT token is different from the EOS token and was not specified under `eot_tokens: `. Please set `eot_tokens: ` to the same EOT token(s) as in template.
|
|
|
|
> 2. There is more than one EOT token per turn in the template. Please raise an issue with examples as we recognize this as an edge case.
|
|
|
|
**Q: `EOT token encoding failed. Please check if the token is valid and can be encoded.`**
|
|
|
|
> A: There could be some issue with the tokenizer or unicode encoding. Please raise an issue with examples with the EOT token & tokenizer causing the issue.
|
|
|
|
**Q: `EOT token __ is encoded as multiple tokens.`**
|
|
|
|
> A: This is because the EOT token is encoded as multiple tokens which can cause unexpected behavior. Please add it under `tokens: ` or (recommended) override unused added_tokens via `added_tokens_overrides: `.
|
|
|
|
**Q: `Conflict between train_on_eos and train_on_eot. eos_token is in eot_tokens and train_on_eos != train_on_eot`**
|
|
|
|
> A: This is because the EOS token is in the `eot_tokens: ` while mismatch between `train_on_eos: ` and `train_on_eot: `. This will cause one to override the other. Please ensure that `train_on_eos: ` and `train_on_eot: ` are the same or remove the EOS token from `eot_tokens: `.
|
|
|
|
**Q: If `eot_tokens: ` is not provided, what happens?**
|
|
|
|
> A: If `eot_tokens: ` is not provided, the default behavior is the same as before. EOS tokens used to delimit turns are masked/unmasked depending on whether the turn is trainable.
|
|
|
|
> Internally, `eot_tokens: tokenizer.eos_token` and `train_on_eot: train_on_eos` (which defaults to `turn`). This transition helps clarify the naming and behavior of EOT/EOS tokens.
|
|
|
|
**Q: `Data processing error: CAS service error`**
|
|
|
|
> A: Try disabling XET with `export HF_HUB_DISABLE_XET=1`
|
|
|
|
**Q: `torch._inductor.exc.LoweringException: NoValidChoicesError: No choices to select, please consider adding ATEN into max_autotune_gemm_backends config (defined in torch/_inductor/config.py) to allow at least one choice. `**
|
|
|
|
> A: Depending on the version of torch, you may need to include this in your YAML:
|
|
|
|
> ```yaml
|
|
> flex_attn_compile_kwargs:
|
|
> dynamic: false
|
|
> mode: max-autotune-no-cudagraphs
|
|
> ```
|
|
|
|
**Q: `ValueError("Backward pass should have cleared tracker of all tensors")`
|
|
|
|
> A: This may happen due to edge cases in using the modern OffloadActivations context manager for CUDA streams. If you encounter this error, you may have success using the naive implementation with `offload_activations: legacy` in your YAML.
|