Wing Lian
131afdbd89
add bf16 check ( #587 )
2023-09-17 13:49:03 -04:00
NanoCode012
00dce35fb2
Feat(data): Allow loading local csv and text ( #594 )
...
* Feat(data): Allow loading local csv and text
* chore: update readme for loading data
2023-09-17 11:32:27 -04:00
Wing Lian
b15b19eb8d
gather/broadcast the max value of the packing efficiency automatically ( #463 )
2023-09-17 11:08:18 -04:00
Wing Lian
ab534d75ba
don't add position_ids for evals ( #591 )
2023-09-16 16:11:57 -04:00
Wing Lian
21ec195c9f
optionally configure sample packing for evals ( #589 )
2023-09-16 00:09:48 -04:00
Wing Lian
62eaee7649
make phi training work with Loras ( #588 )
...
* validation for phi loras
* fix model config class check
* update readme for phi training
2023-09-15 20:51:55 -04:00
Jan Philipp Harries
be75668400
set fsdp state dict ( #584 )
...
Co-authored-by: Jan Philipp Harries <jphme@users.noreply.github.com>
2023-09-15 17:47:36 -04:00
Wing Lian
aeec7c4688
pop block_cls since it's not an actual kwarg
2023-09-15 15:54:06 -04:00
Wing Lian
360788296a
don't resize embeddings if it's already large enough ( #577 )
...
* don't resize embeddings if it's already large enough
* make sure to tie weights, even if we aren't resizing
2023-09-15 15:47:09 -04:00
Wing Lian
12a2dbbc2c
Support Sample packing for phi arch ( #586 )
...
* phi sequence packing
* sample packing fixes
* fix linting
* fix inference and phi e2e tests
* update phi example now that sample packing works
* wandb import keeps getting moved around
2023-09-15 15:46:54 -04:00
NanoCode012
3a2edc85c3
Feat(doc): Add features to doc ( #583 )
2023-09-16 01:14:15 +09:00
Wing Lian
f7a22632d7
support custom field for completion from yml ( #580 )
...
* support custom field for completion from yml
* remove legacy completion check and add doc
* update README docs
2023-09-15 07:48:21 -04:00
Doan Minh Phuong
1aa400721e
Fix Codellama examples ( #582 )
...
* Fix seq_len
* Update lora.yml
* Update qlora.yml
* Update lora.yml
* Update lora.yml
* Update qlora.yml
2023-09-15 04:19:13 -04:00
Wing Lian
8dcd40ac78
prevent cli functions from getting fired on import ( #581 )
2023-09-15 04:03:32 -04:00
Wing Lian
a5a625f47e
update support matrix with btlm and phi ( #579 )
2023-09-15 02:46:15 -04:00
Wing Lian
861cecac2a
refactor scripts/finetune.py into new cli modules ( #550 )
...
* refactor scripts/finetune.py into new cli modules
* continue to support scripts/finetune.py
* update readme with updated cli commands
* Update scripts/finetune.py
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
---------
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
2023-09-15 01:43:52 -04:00
Wing Lian
1078d3eae7
E2e passing tests ( #576 )
...
* run e2e tests after all other checks have passed
* tweak tests so they get run on PRs or push to main
* change dependent action for checking
* one test workflow to rule them all
* no need for custom action, just use needs
* whoops, python version should be a string
* e2e tests can run on any available gpu
2023-09-15 01:03:49 -04:00
Wing Lian
24146733db
E2e device cuda ( #575 )
...
* use torch.cuda.current_device() instead of local_rank
* ignore NVML errors for gpu stats
* llama lora packing e2e tests
2023-09-14 22:49:27 -04:00
Wing Lian
9218ebecd2
e2e testing ( #574 )
2023-09-14 21:56:11 -04:00
Wing Lian
228420972e
Phi examples ( #569 )
...
* add phi full ft example
* Add readme to point out that deepspeed should be used
* zero1 is better than zero2 for phi
2023-09-14 11:17:47 -04:00
Wing Lian
c6d870b91d
mypy wandb ignore ( #572 )
...
* mypy wandb ignore
* fix isort for wandb
2023-09-14 11:17:30 -04:00
Wing Lian
115795079d
remove columns after tokenizing for pretraining ( #571 )
2023-09-14 11:08:22 -04:00
Wing Lian
3b18c963cc
set auto for other params that hf trainer sets for ds. include zero1 json ( #570 )
2023-09-14 11:04:37 -04:00
Wing Lian
3fbde762ab
fix save_steps so it doesn't get duplicated ( #567 )
2023-09-13 20:40:33 -04:00
Wing Lian
f6060a664e
Model parallel ( #538 )
...
* model-parallel for single process
* fix device/device_map
* fix handling for device
2023-09-13 11:45:30 -04:00
Wing Lian
a4e1bb6606
let hf trainer handle torch compile ( #516 )
...
* let hf trainer handle torch compile
* remove torch compile checks, include option for backend
* suppress torch errors to get further
* require min torch version of 2.1.0 for torch compile to work
---------
Co-authored-by: Aman Karmani <aman@tmm1.net>
2023-09-13 11:42:12 -04:00
Wing Lian
36e53c7442
improve how we setup eval/save strategies and steps ( #547 )
...
* setup save and eval strategies to be consistent with trainer logic
* add comments
* better eval handling
2023-09-13 11:37:23 -04:00
Wing Lian
e7aa7b1a1e
gracefully handle length feature used for group by ( #565 )
2023-09-13 11:23:30 -04:00
Wing Lian
e5bb22a56b
add optimization for group-by-len ( #563 )
2023-09-13 10:57:12 -04:00
Wing Lian
fdb777bc06
check for the existence of the default accelerate config that can create headaches ( #561 )
2023-09-13 10:38:28 -04:00
Wing Lian
bf0804447c
fix wandb so mypy doesn't complain ( #562 )
...
* fix wandb so mypy doesn't complain
* fix wandb so mypy doesn't complain
* no need for mypy override anymore
2023-09-13 10:36:16 -04:00
Glavin Wiechert
5b67ea98a6
Add training callback to send predictions to WandB table ( #521 )
...
* WIP Add training callback to send predictions to WandB table
* WIP improve wandb table reporting callback
* WIP improve wandb table reporting callback (cont)
* Add VSCode launching for debugging
* Add tiny llama example
* WIP attempt to improve post-eval prediction generation for table
* WIP attempt to improve post-eval prediction generation for table - part 2
* WIP batch generation
* WIP attempt to handle sample_packing using position_ids for wandb prediction table
* WIP add code for debugging
* Fix sample_packing support for wandb prediction table
* Clean up code for PR review
* Add eval_table_size, eval_table_max_new_tokens configs & clean up code
* Clean up PR, delete VSCode config, add tiny-llama example
* Add eval_table_size, eval_table_max_new_tokens documentation. Fix linting/formatting
2023-09-13 09:51:08 -04:00
Jan Philipp Harries
2f586d18db
Fix pretraining with iterable/streaming Dataset ( #556 )
...
* return without packing prep/len
* fix remove columns
* fix encode arguments
* add error when max steps not set
* fix test
---------
Co-authored-by: Jan Philipp Harries <jphme@users.noreply.github.com>
2023-09-13 00:16:40 -04:00
Wing Lian
9845c5e12d
document that packaging needs to be installed before flash-attn ( #559 )
2023-09-12 12:18:30 -04:00
Wing Lian
772cd870d4
fix the sed command to replace the version w the tag
v0.3.0
2023-09-11 13:44:19 -04:00
Wing Lian
6c5fbe6223
add long_description for pypi push ( #555 )
2023-09-11 13:34:29 -04:00
Wing Lian
bcbc9597e9
replace tags, build dist for pypi publish ( #553 )
...
* replace tags, build dist for pypi publish
* missing trailing comma
2023-09-11 13:25:41 -04:00
The Objective Dad
6d57f2f0f0
ergonomic update to optimizer config doc ( #548 )
2023-09-11 12:35:45 -04:00
Wing Lian
20ed4c1f9e
pypi on tag push ( #552 )
2023-09-11 10:33:42 -04:00
Wing Lian
c5dedb17ad
remove with section, doesn't seem to work ( #551 )
2023-09-11 10:27:17 -04:00
Wing Lian
b56503d423
publish to pypi workflow on tagged release ( #549 )
2023-09-11 09:44:47 -04:00
Wing Lian
a94f9cb99e
fix for quant config from model ( #540 )
2023-09-10 12:40:52 -04:00
dongxiaolong
c1921c9acb
Update requirements.txt ( #543 )
...
fix fsdp
2023-09-08 16:07:11 -04:00
Wing Lian
0b4cf5bc8c
workaround for md5 variations ( #533 )
...
* workaround for md5 variations
* refactor the prepared hash too
2023-09-08 16:01:05 -04:00
SlapDrone
78ee2cdab2
add git environment variables to compose: avoid checkout failure error 128 on build ( #534 )
2023-09-08 15:59:49 -04:00
Wing Lian
34c0a86a11
update readme to point to direct link to runpod template, cleanup install instructions ( #532 )
...
* update readme to point to direct link to runpod template, cleanup install instructions
* default install flash-attn and auto-gptq now too
* update readme w flash-attn extra
* fix version in setup
2023-09-08 11:58:54 -04:00
The Objective Dad
5e2d8a42d9
Adding NCCL Timeout Guide ( #536 )
...
* fixes NCCL_P2P_LEVEL=NVL #429
* adding more insights into various values of NCCL_P2P_LEVEL
2023-09-08 11:57:47 -04:00
Wing Lian
e30f1e3cf7
Early stopping metric ( #537 )
...
* set early stopping metric to check
* tweak how load_best_model_at_end gets set for early stopping
* add validation for early stopping patience
* remove negation
* save results to metrics in callback
* move early stopping callback after the benchmark evals
* broadcast metrics so early stopping works
2023-09-08 11:57:02 -04:00
Wing Lian
343714972b
recommend padding when using sample packing ( #531 )
2023-09-06 17:00:21 -04:00
Wing Lian
245c5c41e2
log rank too ( #527 )
2023-09-06 08:37:51 -04:00