Wing Lian | f544ab2bed | don't compile deepspeed or bitsandbytes from source (#837) | 2023-11-08 19:49:55 -05:00
Wing Lian | 8b79ff0e94 | fix eval_steps to be a sane default (#797) | 2023-10-27 22:36:30 -04:00
  * fix eval_steps to be a sane default
  * update docs for fractional eval_steps
Wing Lian | 2d8def68dc | simplify by removing duplicate base_model_config (#772) | 2023-10-23 01:42:38 -04:00
Wing Lian | e50a64e85e | prepared dataset caching, other misc fixes (#665) | 2023-10-02 21:07:24 -04:00
  * prepared dataset caching, other misc fixes
  * also don't load from disk cache unless explicit
Wing Lian | d887ad86c3 | eval_table isn't quite stable enough to be in default llama configs (#637) | 2023-09-26 10:13:20 -04:00
Glavin Wiechert | 5b67ea98a6 | Add training callback to send predictions to WandB table (#521) | 2023-09-13 09:51:08 -04:00
  * WIP Add training callback to send predictions to WandB table
  * WIP improve wandb table reporting callback
  * WIP improve wandb table reporting callback (cont)
  * Add VSCode launching for debugging
  * Add tiny llama example
  * WIP attempt to improve post-eval prediction generation for table
  * WIP attempt to improve post-eval prediction generation for table - part 2
  * WIP batch generation
  * WIP attempt to handle sample_packing using position_ids for wandb prediction table
  * WIP add code for debugging
  * Fix sample_packing support for wandb prediction table
  * Clean up code for PR review
  * Add eval_table_size, eval_table_max_new_tokens configs & clean up code
  * Clean up PR, delete VSCode config, add tiny-llama example
  * Add eval_table_size, eval_table_max_new_tokens documentation. Fix linting/formatting