Files
axolotl/tests
kallewoof 92ee4256f7 feature: raise on long sequence drop (#3321)
* feature: raise on long sequence drop

It is sometimes not desired that sequences are silently dropped from the dataset, especially when the dataset has been carefully crafted and pre-fitted for the training context. This would then suggest that an error occurred somewhere in the process. This feature adds a third value for excess_length_strategy called 'raise', which will raise a ValueError if a sequence is encountered that is too long and would have normally been dropped/truncated.

* tests: add excess_length_strategy tests

* doc: updated return value description for drop_long_seq_in_dataset

* add @enable_hf_offline

* fixed cfg modified after validate_config called

* hf offline fix

* fix tqdm desc when raise is used

* test: added test for non-batched case

* accidental code change revert

* test: use pytest.raises

* test: simplified drop_seq_len tests

* test: moved excess_length_strat test to test_data.py

---------

Co-authored-by: salman <salman.mohammadi@outlook.com>
2025-12-22 13:59:49 -05:00
..
2025-12-19 10:43:47 -05:00
2025-12-19 10:43:47 -05:00
2025-09-25 12:03:50 -04:00
2025-12-19 10:43:47 -05:00
2025-10-10 14:44:25 +01:00