* hf offline decorator for tests to workaround rate limits
* fail quicker so we can see logs
* try new cache name
* limit files downloaded
* phi mini predownload
* offline decorator for phi tokenizer
* handle meta llama 8b offline too
* make sure to return fixtures if they are wrapped too
* more fixes
* more things offline
* more offline things
* fix the env var
* fix the model name
* handle gemma also
* force reload of modules to recheck offline status
* prefetch mistral too
* use reset_sessions so hub picks up offline mode
* more fixes
* rename so it doesn't seem like a context manager
* fix backoff
* switch out tinyshakespeare dataset since it runs a py script to fetch data and doesn't work offline
* include additional dataset
* more fixes
* more fixes
* replace tiny shakespeaere dataset
* skip some tests for now
* use more robust check using snapshot download to determine if a dataset name is on the hub
* typo for skip reason
* use local_files_only
* more fixtures
* remove local only
* use tiny shakespeare as pretrain dataset and streaming can't be offline even if precached
* make sure fixtures aren't offline
improve the offline reset
try bumping version of datasets
reorder reloading and setting
prime a new cache
run the tests now with fresh cache
try with a static cache
* now run all the ci again with hopefully a correct cache
* skip wonky tests for now
* skip wonky tests for now
* handle offline mode for model card creation
* Extend MultiPackBatchSampler test to include shorter sequence length and drop long sequences filter
* Fix get_dataset_lengths for datasets that were previously filtered (e.g., with drop_long_seq_in_dataset)
* Update src/axolotl/utils/samplers/utils.py
Fix get_dataset_lengths for datasets that do not have position_ids or length attributes
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
---------
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
* Switch to parallel FFD bin packing algorithm.
Add support for packing in a distributed context.
Add packing efficiency estimate back.
* revert changes to distributed code
* chore: lint
* fix config w new params for packing test
* add sample_packing_group_size and sample_packing_bin_size to cfg schema
* fix lamdbda function
* fix sampler/dataloader calculations for packing
---------
Co-authored-by: dsesclei <dave@sescleifer.com>
* support for true batches with multipack
* patch the map dataset fetcher to handle batches with packed indexes
* patch 4d mask creation for sdp attention
* better handling for BetterTransformer
* patch general case for 4d mask
* setup forward patch. WIP
* fix patch file
* support for multipack w/o flash attention for llama
* cleanup
* add warning about bf16 vs fp16 for multipack with sdpa
* bugfixes
* add 4d multipack tests, refactor patches
* update tests and add warnings
* fix e2e file check
* skip sdpa test if not at least torch 2.1.1, update docs