FSDP1 -> FSDP2 (#2760)
* FSDP2 args migration implementation This commit implements the migration to FSDP2 arguments including: - FSDP2 support with LoRA training - DPO integration with FSDP2 - Model loading fixes and refactoring - CPU offloading and PEFT handling - Test updates and CI improvements - Bug fixes for dtype errors and various edge cases
This commit is contained in:
@@ -23,8 +23,6 @@ Axolotl supports several methods for multi-GPU training:
|
||||
|
||||
## DeepSpeed {#sec-deepspeed}
|
||||
|
||||
DeepSpeed is the recommended approach for multi-GPU training due to its stability and performance. It provides various optimization levels through ZeRO stages.
|
||||
|
||||
### Configuration {#sec-deepspeed-config}
|
||||
|
||||
Add to your YAML config:
|
||||
@@ -32,7 +30,6 @@ Add to your YAML config:
|
||||
```{.yaml}
|
||||
deepspeed: deepspeed_configs/zero1.json
|
||||
```
|
||||
|
||||
### Usage {#sec-deepspeed-usage}
|
||||
|
||||
```{.bash}
|
||||
@@ -75,9 +72,66 @@ ZeRO Stage 3 can be used for training on a single GPU by manually setting the en
|
||||
|
||||
:::
|
||||
|
||||
## FSDP {#sec-fsdp}
|
||||
## Fully Sharded Data Parallel (FSDP) {#sec-fsdp}
|
||||
|
||||
### Basic FSDP Configuration {#sec-fsdp-config}
|
||||
::: {.callout-note}
|
||||
|
||||
FSDP2 is recommended for new users. FSDP1 is deprecated and will be removed in an upcoming release of Axolotl.
|
||||
|
||||
:::
|
||||
|
||||
### Migrating from FSDP1 to FSDP2 {#sec-migrate-fsdp1-fsdp2}
|
||||
|
||||
To migrate your config from FSDP1 to FSDP2, you must use the `fsdp_version` top-level config field to specify the FSDP version, and
|
||||
also follow the config field mapping below to update field names.
|
||||
|
||||
#### Config mapping
|
||||
|
||||
FSDP1 | FSDP2
|
||||
-------- | --------
|
||||
fsdp_sharding_strategy | reshard_after_forward
|
||||
fsdp_backward_prefetch_policy | **REMOVED**
|
||||
fsdp_backward_prefetch | **REMOVED**
|
||||
fsdp_forward_prefetch | **REMOVED**
|
||||
fsdp_sync_module_states | **REMOVED**
|
||||
fsdp_cpu_ram_efficient_loading | cpu_ram_efficient_loading
|
||||
fsdp_state_dict_type | state_dict_type
|
||||
fsdp_use_orig_params | **REMOVED**
|
||||
|
||||
|
||||
For example, if you were using the following FSDP1 config:
|
||||
|
||||
```{.yaml}
|
||||
fsdp_version: 1
|
||||
fsdp_config:
|
||||
fsdp_offload_params: false
|
||||
fsdp_cpu_ram_efficient_loading: true
|
||||
fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
|
||||
fsdp_transformer_layer_cls_to_wrap: Qwen3DecoderLayer
|
||||
fsdp_state_dict_type: FULL_STATE_DICT
|
||||
fsdp_sharding_strategy: FULL_SHARD
|
||||
```
|
||||
|
||||
You can migrate to the following FSDP2 config:
|
||||
|
||||
```{.yaml}
|
||||
fsdp_version: 2
|
||||
fsdp_config:
|
||||
offload_params: false
|
||||
cpu_ram_efficient_loading: true
|
||||
auto_wrap_policy: TRANSFORMER_BASED_WRAP
|
||||
transformer_layer_cls_to_wrap: Qwen3DecoderLayer
|
||||
state_dict_type: FULL_STATE_DICT
|
||||
reshard_after_forward: true
|
||||
```
|
||||
|
||||
### FSDP1 (deprecated) {#sec-fsdp-config}
|
||||
|
||||
::: {.callout-note}
|
||||
|
||||
Using `fsdp` to configure FSDP is deprecated and will be removed in an upcoming release of Axolotl. Please use `fsdp_config` as above instead.
|
||||
|
||||
:::
|
||||
|
||||
```{.yaml}
|
||||
fsdp:
|
||||
@@ -89,6 +143,7 @@ fsdp_config:
|
||||
fsdp_transformer_layer_cls_to_wrap: LlamaDecoderLayer
|
||||
```
|
||||
|
||||
|
||||
## Sequence parallelism {#sec-sequence-parallelism}
|
||||
|
||||
We support sequence parallelism (SP) via the
|
||||
|
||||
Reference in New Issue
Block a user