---
title: FAQ
description: Frequently asked questions
---

### General

**Q: The trainer stopped and hasn't progressed in several minutes.**

> A: Usually an issue with the GPUs communicating with each other. See the [NCCL doc](nccl.qmd).

**Q: Exitcode -9**

> A: This usually happens when you run out of system RAM.

**Q: Exitcode -7 while using deepspeed**

> A: Try upgrading deepspeed with `pip install -U deepspeed`.

**Q: AttributeError: 'DummyOptim' object has no attribute 'step'**

**Q: ModuleNotFoundError: No module named 'mpi4py' using single GPU with deepspeed**

> A: You may be using deepspeed with a single GPU. Please remove the `deepspeed:` section from the YAML file or the `--deepspeed` CLI flag.
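
> For example, a minimal sketch of the line to remove; the config path shown here is only illustrative:

> ```yaml
> # Remove this line (or the --deepspeed CLI flag) for single-GPU runs.
> deepspeed: deepspeed_configs/zero2.json
> ```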

**Q: The code is stuck on saving preprocessed datasets.**

> A: This is usually a GPU issue and can often be resolved by setting the environment variable `CUDA_VISIBLE_DEVICES=0`. If you are on RunPod, this is usually a pod issue; starting a new pod should take care of it.

**Q: Received a `torch.Size` mismatch error between checkpoint and model when merging or loading adapters.**

> A: This is likely due to a vocab size mismatch. By default, Axolotl expands the model's embeddings if the tokenizer has more tokens than the model. Please use the `axolotl merge-lora` command to merge the adapters instead of using your own scripts.

> On the other hand, if the model has more tokens than the tokenizer, Axolotl does not shrink the model's embeddings unless `shrink_embeddings: true` is set in the config.
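
> A minimal sketch of opting in to shrinking the embeddings to match a smaller tokenizer:

> ```yaml
> # Allow Axolotl to shrink the model's embedding table when the tokenizer
> # has fewer tokens than the model.
> shrink_embeddings: true
> ```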

**Q: How to call Axolotl via custom python scripts?**

> A: Since Axolotl is just Python, please see `src/axolotl/cli/main.py` for how each command is called.

**Q: How to know the value to use for `fsdp_transformer_layer_cls_to_wrap`?**

> A: This is the class name of the transformer layer to wrap with FSDP. For example, for `LlamaForCausalLM` the value is `LlamaDecoderLayer`. To find this for a specific model, check the model's `PreTrainedModel` definition and look for the `_no_split_modules` variable in the `modeling_<model_name>.py` file within the `transformers` library.
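
> A minimal sketch, assuming the FSDP options live under `fsdp:`/`fsdp_config:` as in Axolotl's example configs:

> ```yaml
> fsdp:
>   - full_shard
>   - auto_wrap
> fsdp_config:
>   # Class name taken from the model's `_no_split_modules`
>   fsdp_transformer_layer_cls_to_wrap: LlamaDecoderLayer
> ```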

**Q: ValueError: Asking to pad but the tokenizer does not have a padding token. Please select a token to use as pad_token**

> A: This is because the tokenizer does not have a padding token. Please add a padding token to the tokenizer via:

> ```yaml
> special_tokens:
>   # str. If you're not sure, set to same as `eos_token`.
>   pad_token: "..."
> ```

### Chat templates

**Q: `jinja2.exceptions.UndefinedError: 'dict object' has no attribute 'content' / 'role' / ____`**

> A: This means that the property mapping for the stated attribute does not exist when building the `chat_template` prompt. For example, for `no attribute 'content'`, check that you have added the correct mapping for `content` under `message_property_mappings`.
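
> A hedged sketch, assuming a ShareGPT-style dataset whose turns store the speaker in `from` and the text in `value` (the dataset path is a placeholder):

> ```yaml
> datasets:
>   - path: your/dataset        # placeholder dataset path
>     type: chat_template
>     message_property_mappings:
>       role: from       # key in your data that holds the speaker role
>       content: value   # key in your data that holds the message text
> ```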

**Q: `Empty template generated for turn ___`**

> A: The `content` is empty for that turn.

**Q: `Could not find content start/end boundary for turn __`**

> A: The start/end of that turn's content could not be detected. Please ensure you have set the `eos_token` to match your `chat_template`. Otherwise, the `chat_template` may not use proper boundaries for some turns (such as system turns). In rare cases, make sure your content is not `[[dummy_message]]`. Please let us know if you run into this.

**Q: `Content end boundary is before start boundary for turn ___`**

> A: This is an edge case that should not occur. Please create an issue if this happens.

**Q: `Content end boundary is the same as start boundary for turn ___. This is likely an empty turn.`**

> A: This is likely an empty turn.

**Q: The EOS token is incorrectly being masked or not being masked / `EOS token __ not found in chat template`.**

> A: There can be two reasons:

> 1. There is a mismatch between `tokenizer.eos_token` and the EOS token in the template. Please make sure to set `eos_token:` under `special_tokens:` to the same EOS token as in the template (see the example after this list).

> 2. The EOS token is not in the template. Please check whether your template is correct. For example, the `phi_35` template does not use its dedicated EOS token `<|endoftext|>` at the end.
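
> A hedged example for reason 1, assuming a chatml-style template whose turns end with `<|im_end|>` (substitute your template's actual EOS token):

> ```yaml
> special_tokens:
>   eos_token: "<|im_end|>"
> ```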

**Q: "`chat_template` choice is `tokenizer_default` but tokenizer's `chat_template` is null. Please add a `chat_template` in tokenizer config"**

> A: This is because the tokenizer does not have a chat template. Please add a chat template to the tokenizer config. See [chat_template](dataset-formats/conversation.qmd#chat-template) for more details.

**Q: The EOT token(s) are incorrectly being masked or not being masked / `EOT token __ not found in chat template`.**

> A: There can be two reasons:

> 1. The EOT token is different from the EOS token and was not specified under `eot_tokens:`. Please set `eot_tokens:` to the same EOT token(s) as in the template (see the example after this list).

> 2. There is more than one EOT token per turn in the template. Please raise an issue with examples, as we consider this an edge case.
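
> A hedged example for reason 1, assuming the template ends turns with `<|im_end|>` (substitute your template's actual end-of-turn token(s)):

> ```yaml
> eot_tokens:
>   - "<|im_end|>"
> ```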

**Q: `EOT token encoding failed. Please check if the token is valid and can be encoded.`**

> A: There could be an issue with the tokenizer or with Unicode encoding. Please raise an issue with examples of the EOT token and tokenizer that cause the problem.

**Q: `EOT token __ is encoded as multiple tokens.`**

> A: The EOT token is encoded as multiple tokens, which can cause unexpected behavior. Please add it under `tokens:` or (recommended) override an unused added token via `added_tokens_overrides:`.
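
> A hedged sketch; the `<|eot|>` string and the added-token id `32001` are hypothetical, so check your tokenizer's added tokens for the real values:

> ```yaml
> # Option 1: add the EOT string as a new token.
> tokens:
>   - "<|eot|>"
> # Option 2 (recommended): override an unused added token with the EOT string.
> added_tokens_overrides:
>   32001: "<|eot|>"
> ```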

**Q: `Conflict between train_on_eos and train_on_eot. eos_token is in eot_tokens and train_on_eos != train_on_eot`**

> A: The EOS token is listed in `eot_tokens:` while `train_on_eos:` and `train_on_eot:` disagree, so one setting would override the other. Please ensure that `train_on_eos:` and `train_on_eot:` are the same, or remove the EOS token from `eot_tokens:`.
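
> A hedged example of making the two settings agree, assuming you want to train on the end-of-turn token of trainable turns:

> ```yaml
> # Keep the two settings in sync so neither silently overrides the other.
> train_on_eos: turn
> train_on_eot: turn
> ```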

**Q: If `eot_tokens:` is not provided, what happens?**

> A: If `eot_tokens:` is not provided, the default behavior is the same as before: EOS tokens used to delimit turns are masked/unmasked depending on whether the turn is trainable.

> Internally, `eot_tokens` is set to `tokenizer.eos_token` and `train_on_eot` to `train_on_eos` (which defaults to `turn`). This transition helps clarify the naming and behavior of EOT/EOS tokens.

**Q: `Data processing error: CAS service error`**

> A: Try disabling XET with `export HF_HUB_DISABLE_XET=1`.

**Q: `torch._inductor.exc.LoweringException: NoValidChoicesError: No choices to select, please consider adding ATEN into max_autotune_gemm_backends config (defined in torch/_inductor/config.py) to allow at least one choice.`**

> A: Depending on the version of torch, you may need to include this in your YAML:

> ```yaml
> flex_attn_compile_kwargs:
>   dynamic: false
>   mode: max-autotune-no-cudagraphs
> ```