20 Commits

Author SHA1 Message Date
Wing Lian
d9b71edf84 bump transformers for fsdp-grad-accum fix, remove patch (#2079) 2024-11-19 02:23:09 -05:00
Wing Lian
9871fa060b optim e2e tests to run a bit faster (#2069) [skip ci]
* optim e2e tests to run a bit faster

* run prequant w/o lora_modules_to_save

* use smollm2
2024-11-18 12:35:31 -05:00
Wing Lian
d42f202046 Fsdp grad accum monkeypatch (#2064) 2024-11-15 19:11:04 -05:00
NanoCode012
5c7e89105d Fix: modelloader handling of model_kwargs load_in*bit (#1999)
* fix: load_in_*bit not properly read

* fix: load_*bit check

* fix: typo

* refactor: load * bit handling

* feat: add test dpo lora multi-gpu

* fix: turn off sample packing for dpo

* fix: missing warmup_steps

* fix: test to load in 8bit for lora

* skip 8bit lora on h100, add 4bit lora on h100 to multi gpu tests

* chore: reduce max_steps

---------

Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-10-30 14:41:34 -04:00
NanoCode012
2501c1a6a3 Fix: Gradient Accumulation issue (#1980)
* feat: support new arg num_items_in_batch

* use kwargs to manage extra unknown kwargs for now

* upgrade against upstream transformers main

* make sure trl is on latest too

* fix for upgraded trl

* fix: handle trl and transformer signature change

* feat: update trl to handle transformer signature

* RewardDataCollatorWithPadding no longer has max_length

* handle updated signature for tokenizer vs processor class

* invert logic for tokenizer vs processor class

* processing_class, not processor class

* also handle processing class in dpo

* handle model name w model card creation

* upgrade transformers and add a loss check test

* fix install of tbparse requirements

* make sure to add tbparse to req

* feat: revert kwarg to positional kwarg to be explicit

---------

Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-10-25 11:28:23 -04:00
Wing Lian
87455e7f32 swaps to use newer sample packing for mistral (#1773)
* swaps to use newer sample packing for mistral

* fix multipack patch test

* patch the common fa utils

* update for refactor of flash attn unpad

* remove un-needed drop attn mask for mistral

* bump transformers to main to pick up latest mistral fix for 12b and refactor of fa2

* update test
2024-07-23 01:41:11 -04:00
Wing Lian
98af5388ba bump flash attention 2.5.8 -> 2.6.1 (#1738)
* bump flash attention 2.5.8 -> 2.6.1

* use triton implementation of cross entropy from flash attn

* add smoke test for flash attn cross entropy patch

* fix args to xentropy.apply

* handle tuple from triton loss fn

* ensure the patch tests run independently

* use the wrapper already built into flash attn for cross entropy

* mark pytest as forked for patches

* use pytest xdist instead of forked, since cuda doesn't like forking

* limit to 1 process and use dist loadfile for pytest

* change up pytest for fixture to reload transformers w monkeypatch
2024-07-14 19:11:31 -04:00
Wing Lian
47e1916484 add tests so CI can catch updates where patches will break with unsloth (#1737) [skip ci] 2024-07-11 16:43:19 -04:00
Wing Lian
5370cedf0c support for gemma2 w sample packing (#1718) 2024-06-29 01:38:55 -04:00
Wing Lian
fe650dd326 make sure the CI fails when pytest script fails (#1669)
* make sure the pytest script fails

* make sure the defaults come through for tests

* make sure tensorboard is loaded for test assertion
2024-05-29 10:12:11 -04:00
Wing Lian
c10563c444 fix broken linting (#1541)
* chore: lint

* include examples in yaml check

* mistral decided to gate their models...

* more mistral models that were gated
2024-04-19 01:03:04 -04:00
Wing Lian
0f985e12fe more fixes 20240228 (#1342) [skip ci]
* add missing evals_per_epoch setting

* more pydantic fixes

* more fixes

* move test from normalization to validation

* increase eval size for sample packing tests
2024-02-28 12:57:45 -05:00
Wing Lian
5894f0e57e make mlflow optional (#1317)
* make mlflow optional

* fix xformers

don't patch swiglu if xformers not working
fix the check for xformers swiglu

* fix install of xformers with extra index url for docker builds

* fix docker build arg quoting
2024-02-26 11:41:33 -05:00
Wing Lian
8c2e05ade3 relora: magnitude pruning of the optimizer (#1245)
* magnitude pruning of the optimizer

* add alpaca chat template and fix relora patch

* fix handling of lora adapter for relora

* fix merge and save call

* fixes for 8-bit lora merge

* save intermediate checkpoint adapters

* auto merge

* fix eval check

* handle relora annealing

* fix anneal step logic

* chore: lint

* misc fix

* fix types

* Update tests/e2e/test_relora_llama.py

* check for safetensors saved from relora
2024-02-06 00:35:30 -05:00
Wing Lian
00568c1539 support for true batches with multipack (#1230)
* support for true batches with multipack

* patch the map dataset fetcher to handle batches with packed indexes

* patch 4d mask creation for sdp attention

* better handling for BetterTransformer

* patch general case for 4d mask

* setup forward patch. WIP

* fix patch file

* support for multipack w/o flash attention for llama

* cleanup

* add warning about bf16 vs fp16 for multipack with sdpa

* bugfixes

* add 4d multipack tests, refactor patches

* update tests and add warnings

* fix e2e file check

* skip sdpa test if not at least torch 2.1.1, update docs
2024-02-01 10:18:42 -05:00
Wing Lian
814aee6603 Phi2 multipack (#1173)
* phi2 multipack

* update validation and examples for phi

* more updates to phi examples

* make sure to use the correct collator for phi multipack

* phi needs attention mask now for multipack

* if the special token already exists in the tokenizer, don't require in lora modules to save

* fix qlora yml for phi, fix phi test validation

* test qlora too

* make sure flash attention is enabled for the test

* don't use remote code for phi anymore

* reduce sequence len for sample packing phi
2024-01-23 12:54:36 -05:00
Wing Lian
e799e08d3c Falcon embeddings (#1149) [skip docker]
* also fix multipack for falcon and add smoke tests

* make sure to handle special tokens and added tokens for lora

* fix reference to model_type

* fix tests for falcon

* fix stray typo

* fixes for smoke tests
2024-01-22 21:01:42 -05:00
Wing Lian
6910e6a8ca Multipack simplify for Mixtral (#1142) 2024-01-18 16:23:49 -05:00
Joe Cummings
1d70f24b50 Add shifted sparse attention (#973) [skip-ci]
* Add s2_attn to hijack flash code

* Refactor code to account for s2_attn

* Add test for models utils

* Add ``s2_attention`` option to llama configs

* Add ``s2_attention`` option to README config

* Format code to appease linter

* chore: lint

* Remove xpos and llama-landmark [bad merge]

* add e2e smoke tests for shifted sparse attention

* remove stray patch from merge

* update yml with link to paper for s2_attention/longlora

* fix assertion check for full fine tune

* increase sequence len for tests and PR feedback updates

* reduce context len to 16k for tests

* reduce batch size for larger context len and update test to check message

* fix test for message

---------

Co-authored-by: joecummings <jrcummings@devvm050.nha0.facebook.com>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-01-18 10:16:07 -05:00
Wing Lian
788649fe95 attempt to also run e2e tests that needs gpus (#1070)
* attempt to also run e2e tests that needs gpus

* fix stray quote

* checkout specific github ref

* dockerfile for tests with proper checkout

ensure wandb is disabled for docker pytests
clear wandb env after testing
make sure to provide a default val for pop
try skipping wandb validation tests
explicitly disable wandb in the e2e tests
explicitly report_to None to see if that fixes the docker e2e tests
split gpu from non-gpu unit tests
skip bf16 check in test for now
build docker w/o cache since it uses branch name ref
revert some changes now that caching is fixed
skip bf16 check if on gpu w support

* pytest skip for auto-gptq requirements

* skip mamba tests for now, split multipack and non packed lora llama tests

* split tests that use monkeypatches

* fix relative import for prev commit

* move other tests using monkeypatches to the correct run
2024-01-09 21:23:23 -05:00