axolotl

Author	SHA1	Message	Date
Wing Lian	1078d3eae7	E2e passing tests (#576 ) * run e2e tests after all other checks have passed * tweak tests so they get run on PRs or push to main * change dependent action for checking * one test workflow to rule them all * no need for custom action, just use needs * whoops, python version should be a string * e2e tests can run on any available gpu	2023-09-15 01:03:49 -04:00
Wing Lian	24146733db	E2e device cuda (#575 ) * use torch.cuda.current_device() instead of local_rank * ignore NVML errors for gpu stats * llama lora packing e2e tests	2023-09-14 22:49:27 -04:00
Wing Lian	9218ebecd2	e2e testing (#574 )	2023-09-14 21:56:11 -04:00
Wing Lian	228420972e	Phi examples (#569 ) * add phi full ft example * Add readme to point out that deepspeed should be used * zero1 is better than zero2 for phi	2023-09-14 11:17:47 -04:00
Wing Lian	c6d870b91d	mypy wandb ignore (#572 ) * mypy wandb ignore * fix isort for wandb	2023-09-14 11:17:30 -04:00
Wing Lian	115795079d	remove columns after tokenizing for pretraining (#571 )	2023-09-14 11:08:22 -04:00
Wing Lian	3b18c963cc	set auto for other params that hf trainer sets for ds. include zero1 json (#570 )	2023-09-14 11:04:37 -04:00
Wing Lian	3fbde762ab	fix save_steps so it doesn't get duplicated (#567 )	2023-09-13 20:40:33 -04:00
Wing Lian	f6060a664e	Model parallel (#538 ) * model-parallel for single process * fix device/device_map * fix handling for device	2023-09-13 11:45:30 -04:00
Wing Lian	a4e1bb6606	let hf trainer handle torch compile (#516 ) * let hf trainer handle torch compile * remove torch compile checks, include option for backend * suppress torch errors to get further * require min torch version of 2.1.0 for torch compile to work --------- Co-authored-by: Aman Karmani <aman@tmm1.net>	2023-09-13 11:42:12 -04:00
Wing Lian	36e53c7442	improve how we setup eval/save strategies and steps (#547 ) * setup save end eval strategies to be consistent with trainer logic * add comments * better eval handling	2023-09-13 11:37:23 -04:00
Wing Lian	e7aa7b1a1e	gracefully handle length feature used for group by (#565 )	2023-09-13 11:23:30 -04:00
Wing Lian	e5bb22a56b	add optimization for group-by-len (#563 )	2023-09-13 10:57:12 -04:00
Wing Lian	fdb777bc06	check for the existence of the default accelerate config that can create headaches (#561 )	2023-09-13 10:38:28 -04:00
Wing Lian	bf0804447c	fix wandb so mypy doesn't complain (#562 ) * fix wandb so mypy doesn't complain * fix wandb so mypy doesn't complain * no need for mypy override anymore	2023-09-13 10:36:16 -04:00
Glavin Wiechert	5b67ea98a6	Add training callback to send predictions to WandB table (#521 ) * WIP Add training callback to send predictions to WandB table * WIP improve wandb table reporting callback * WIP improve wandb table reporting callback (cont) * Add VSCode launching for debugging * Add tiny llama example * WIP attempt to improve post-eval prediction generation for table * WIP attempt to improve post-eval prediction generation for table - part 2 * WIP batch generation * WIP attempt to handle sample_packing using position_ids for wandb prediction table * WIP add code for debugging * Fix sample_packing support for wandb prediction table * Clean up code for PR review * Add eval_table_size, eval_table_max_new_tokens configs & clean up code * Clean up PR, delete VSCode config, add tiny-llama example * Add eval_table_size, eval_table_max_new_tokens documentation. Fix linting/formatting	2023-09-13 09:51:08 -04:00
Jan Philipp Harries	2f586d18db	Fix pretraining with iterable/streaming Dataset (#556 ) * return without packing prep/len * fix remove columns * fix encode arguments * add error when max steps not set * fix test --------- Co-authored-by: Jan Philipp Harries <jphme@users.noreply.github.com>	2023-09-13 00:16:40 -04:00
Wing Lian	9845c5e12d	document that packaging needs to be installed before flash-attn (#559 )	2023-09-12 12:18:30 -04:00
Wing Lian	772cd870d4	fix the sed command to replace the version w the tag (tag: v0.3.0)	2023-09-11 13:44:19 -04:00
Wing Lian	6c5fbe6223	add long_description for pypi push (#555 )	2023-09-11 13:34:29 -04:00
Wing Lian	bcbc9597e9	replace tags, build dist for pypi publish (#553 ) * replace tags, build dist for pypi publish * missing trailing comma	2023-09-11 13:25:41 -04:00
The Objective Dad	6d57f2f0f0	ergonomic update to optimizer config doc (#548 )	2023-09-11 12:35:45 -04:00
Wing Lian	20ed4c1f9e	pypi on tag push (#552 )	2023-09-11 10:33:42 -04:00
Wing Lian	c5dedb17ad	remove with section, doesn't seem to work (#551 )	2023-09-11 10:27:17 -04:00
Wing Lian	b56503d423	publish to pypi workflow on tagged release (#549 )	2023-09-11 09:44:47 -04:00
Wing Lian	a94f9cb99e	fix for quant config from model (#540 )	2023-09-10 12:40:52 -04:00
dongxiaolong	c1921c9acb	Update requirements.txt (#543 ) fix fsdp	2023-09-08 16:07:11 -04:00
Wing Lian	0b4cf5bc8c	workaround for md5 variations (#533 ) * workaround for md5 variations * refactor the prepared hash too	2023-09-08 16:01:05 -04:00
SlapDrone	78ee2cdab2	add git environment variables to compose: avoid checkout failure error 128 on build (#534 )	2023-09-08 15:59:49 -04:00
Wing Lian	34c0a86a11	update readme to point to direct link to runpod template, cleanup install instructions (#532 ) * update readme to point to direct link to runpod template, cleanup install instructions * default install flash-attn and auto-gptq now too * update readme w flash-attn extra * fix version in setup	2023-09-08 11:58:54 -04:00
The Objective Dad	5e2d8a42d9	Adding NCCL Timeout Guide (#536 ) * fixes NCCL_P2P_LEVEL=NVL #429 * adding more insights into various values of NCCL_P2P_LEVEL	2023-09-08 11:57:47 -04:00
Wing Lian	e30f1e3cf7	Early stopping metric (#537 ) * set early stopping metric to check * tweak how load_best_model_at_end gets set for early stopping * add validation for early stopping patience * remove negation * save results to metrics in callback * move early stopping callback after the benchmark evals * broadcast metrics so early stopping works	2023-09-08 11:57:02 -04:00
Wing Lian	343714972b	recommend padding when using sample packing (#531 )	2023-09-06 17:00:21 -04:00
Wing Lian	245c5c41e2	log rank too (#527 )	2023-09-06 08:37:51 -04:00
Wing Lian	a546ca2813	misc fixes/improvements (#513 ) fix per pr feedback	2023-09-05 16:40:13 -04:00
Wing Lian	3355706e22	Add support for GPTQ using native transformers/peft (#468 ) * auto gptq support * more tweaks and add yml * remove old gptq docker * don't need explicit peft install for tests * fix setup.py to use extra index url install torch for tests fix cuda version for autogptq index set torch in requirements so that it installs properly move gptq install around to work with github cicd * gptq doesn't play well with sample packing * address pr feedback * remove torch install for now * set quantization_config from model config * Fix the implementation for getting quant config from model config	2023-09-05 12:43:22 -04:00
mhenrichsen	daa4faca12	Merge pull request #520 from bdashore3/sharegpt-fixes Allow for custom system prompts with ShareGPT	2023-09-05 09:02:55 +02:00
Aman Karmani	fc8766e502	reorg a bit	2023-09-05 02:21:24 +00:00
Aman Gupta Karmani	72a6fe1c1f	use flash_attn rmsnorm when available (#526 ) * use flash_attn xentropy when available * use flash_attn.ops.rms_norm when available * log when xentropy is not found * log how to install RMSNorm * add quotes so pip install works	2023-09-04 19:44:51 -04:00
Aman Gupta Karmani	5fe30b1497	use flash_attn xentropy when available (#525 ) * use flash_attn xentropy when available * log when xentropy is not found	2023-09-04 17:49:16 -04:00
Aman Gupta Karmani	44454ae4c4	move is_llama_derived_model into normalize_config (#524 )	2023-09-04 00:19:03 -04:00
Wing Lian	09f154397e	No gather single gpu (#523 ) * don't attempt to gather on multi-gpu * also check distributed status in bench callback	2023-09-03 23:24:28 -04:00
kingbri	995557bdf3	Prompters: ShareGPT: Allow for custom system prompts If a system prompt is present in a conversation, add it instead of using the default. Signed-off-by: kingbri <bdashore3@proton.me>	2023-09-01 13:53:05 -04:00
Maxime	1991946c5a	fix: bad dtype for full finetune (#504 ) * fix: bad dtype for full finetune * Update src/axolotl/utils/models.py Co-authored-by: Wing Lian <wing.lian@gmail.com> * Update models.py --------- Co-authored-by: Wing Lian <wing.lian@gmail.com>	2023-09-01 07:11:45 -07:00
NanoCode012	f51c9c56c6	Fix(doc): Inform Windows users to use WSL/docker (#518 )	2023-09-01 00:08:21 -07:00
Wing Lian	7710e81f50	log supervised token count (#448 )	2023-08-31 15:45:23 -07:00
Tom Jobbins	48434bec54	Debug tokenization output: Add ability to output text only (no tokens), and/or specify num samples to see (#511 )	2023-08-31 14:26:52 -07:00
Jan Philipp Harries	396a7a74fc	Added advanced DDP args (#515 ) * add ddp_config * add advanced ddp config * add ddp_config * add advanced ddp config --------- Co-authored-by: Jan Philipp Harries <jphme@users.noreply.github.com>	2023-08-31 10:37:47 -07:00
Wing Lian	b21e4a20fe	split train from other cli options (#503 )	2023-08-30 22:01:47 -07:00
Alpay Ariyak	42f9642792	Changed Bench Eval to report metrics correctly by split. Added total accuracy and renamed previously used bench_accuracy to bench_average_accuracy. (#512 ) * Added "eval_" prefix * Added total bench accuracy and renamed the previous one to bench_average_accuracy. Changed naming to use bench_split instead of always using eval_ prefix.	2023-08-30 22:00:50 -07:00

908 Commits