README update
This commit is contained in:
@@ -24,3 +24,17 @@ plugins:
|
|||||||
|
|
||||||
diff_attention: true
|
diff_attention: true
|
||||||
```
|
```
|
||||||
|
|
||||||
|
Additional, optional arguments include:
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
# How often to log diffential attention-related metrics to wandb
|
||||||
|
diff_attn_log_every: 100
|
||||||
|
|
||||||
|
# How many differential attention layers to monitor (strided from 0..k..num_layers)
|
||||||
|
diff_attn_num_monitor_layers: 3
|
||||||
|
|
||||||
|
# How many steps to "warmup" the mixing parameter for the negative component of differential attention
|
||||||
|
# Follows a linear warmup schedule from 0 to 1; if not specified, the mixing component is set to 1
|
||||||
|
diff_attn_warmup_steps: 1000
|
||||||
|
```
|
||||||
|
|||||||
Reference in New Issue
Block a user