README update

Dan Saunders
2025-01-23 22:11:35 +00:00
parent 7145d52d99
commit 016ba124e4

@@ -24,3 +24,17 @@ plugins:
diff_attention: true
```
Additional, optional arguments include:
```yaml
# How often (in steps) to log differential attention-related metrics to wandb
diff_attn_log_every: 100
# How many differential attention layers to monitor (strided from 0..k..num_layers)
diff_attn_num_monitor_layers: 3
# Number of steps over which to warm up the mixing parameter for the negative component of differential attention
# Follows a linear warmup schedule from 0 to 1; if not specified, the mixing parameter is fixed at 1
diff_attn_warmup_steps: 1000
```
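To make the warmup behavior concrete, here is a minimal sketch of a linear schedule matching the description above. The function name `negative_mix_factor` and its signature are hypothetical and are not taken from the plugin's code; the plugin's actual implementation may differ (for example, in how it counts steps).

```python
from typing import Optional


def negative_mix_factor(step: int, warmup_steps: Optional[int]) -> float:
    """Hypothetical helper: mixing factor for the negative attention component.

    Assumes a linear ramp from 0 to 1 over `warmup_steps`, and a constant 1.0
    when `diff_attn_warmup_steps` is not set.
    """
    if not warmup_steps:
        # No warmup configured: the mixing parameter is fixed at 1.
        return 1.0
    # Clamp so the factor stays in [0, 1] before and after the warmup window.
    return min(1.0, max(0.0, step / warmup_steps))


if __name__ == "__main__":
    for step in (0, 250, 500, 1000, 2000):
        print(step, negative_mix_factor(step, warmup_steps=1000))
```

With `diff_attn_warmup_steps: 1000`, this sketch yields a factor of 0.5 at step 500 and stays at 1.0 from step 1000 onward.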