diff --git a/.nojekyll b/.nojekyll
index 84a10c498..5673794d1 100644
--- a/.nojekyll
+++ b/.nojekyll
@@ -1 +1 @@
-f92aa588
\ No newline at end of file
+92924141
\ No newline at end of file
diff --git a/docs/faq.html b/docs/faq.html
index cd5cf252f..acc545f2b 100644
--- a/docs/faq.html
+++ b/docs/faq.html
@@ -601,6 +601,14 @@ gtag('config', 'G-9KYCVJBNMQ', { 'anonymize_ip': true});
+A: There seems to be a wheel issue with FA2 2.8.0 on CUDA 12.4. Try CUDA 12.6 instead or downgrade to FA2 2.7.4. Please refer to the upstream issue: https://github.com/Dao-AILab/flash-attention/issues/1717.
Q: Can we mix text and text+image datasets for VLM training?
+A: Yes, you can for newer VLM architectures. The ones that do not work are the LLaVA / Pixtral architectures. If you notice one not working, please let us know!
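For example, mixing a text-only dataset with an image+text dataset could look like the minimal sketch below. This assumes both datasets use the chat_template loader; the dataset paths are placeholders, and any other multimodal keys required by your model still apply.
# minimal sketch: mixing a text-only and an image+text dataset (paths are placeholders)
chat_template: qwen2_vl
datasets:
  # text-only conversations
  - path: your-org/text-chat-dataset
    type: chat_template
  # image + text conversations
  - path: your-org/multimodal-chat-dataset
    type: chat_template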
+
Q: Why is memory/max_* different from nvidia-smi?
+A: We use torch APIs to retrieve this information. See https://docs.pytorch.org/docs/stable/notes/cuda.html#cuda-memory-management for more information.
In this example, we have a default learning rate of 2e-5 across the entire model, but a separate learning rate
of 1e-6 for all the self-attention o_proj modules across all layers, and a learning rate of 1e-5 for the 3rd layer's
self-attention q_proj module.
We currently only support varying the learning rate. If you're interested in adding support for other parameters (e.g. weight_decay), we welcome PRs. See https://github.com/axolotl-ai-cloud/axolotl/blob/613bcf90e58f3ab81d3827e7fc572319908db9fb/src/axolotl/core/trainers/mixins/optimizer.py#L17
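A sketch of how such a config could look, assuming the lr_groups keys (name, modules, lr) handled by the linked optimizer mixin; the module path strings are illustrative:
learning_rate: 2e-5  # default learning rate for the rest of the model
lr_groups:
  - name: o_proj
    modules:
      - self_attn.o_proj  # every layer's self attention o_proj
    lr: 1e-6
  - name: layer_3_q_proj
    modules:
      - model.layers.2.self_attn.q_proj  # 3rd layer's self attention q_proj (0-indexed)
    lr: 1e-5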
Please see the examples folder for full configs.
-Some of our chat_templates have been extended to support broader dataset types. This should not break any existing configs.
As of now, we do not truncate or drop samples based on sequence_len, as each architecture has different ways of processing non-text tokens. We are looking for help on this.
base_model: meta-llama/Llama-3.2-11B-Vision-Instruct
@@ -757,6 +771,12 @@ Tip
chat_template: qwen2_vl # same as qwen2-vl
base_model: Qwen/Qwen3-VL-4B-Instruct
+
+chat_template: qwen2_vl # same as qwen2-vl
Please make sure to install num2words via pip3 install num2words==0.5.14
base_model: HuggingFaceTB/SmolVLM2-500M-Video-Instruct
Please uninstall causal-conv1d via pip3 uninstall -y causal-conv1d
base_model: LiquidAI/LFM2-VL-450M
Here is an example of a multi-modal dataset:
-[
- {
- "messages": [
- {
- "role": "system",
- "content": [
- {"type": "text", "text": "You are a helpful assistant."}
- ]
- },
- {
- "role": "user",
- "content": [
- {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg"},
- {"type": "text", "text": "Describe this image in detail."}
- ]
- },
- {
- "role": "assistant",
- "content": [
- {"type": "text", "text": "The image is a bee."}
- ]
- }
- ]
- }
-]
+[
+ {
+ "messages": [
+ {
+ "role": "system",
+ "content": [
+ {"type": "text", "text": "You are a helpful assistant."}
+ ]
+ },
+ {
+ "role": "user",
+ "content": [
+ {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg"},
+ {"type": "text", "text": "Describe this image in detail."}
+ ]
+ },
+ {
+ "role": "assistant",
+ "content": [
+ {"type": "text", "text": "The image is a bee."}
+ ]
+ }
+ ]
+ }
+]