Update data.py for signature generation (#851)

* Update data.py

Change of conversation formatting type should also trigger updating the preprocessed dataset, so it should be part of the signature.

* chore: lint

---------

Co-authored-by: Wing Lian <wing.lian@gmail.com>
2023-11-15 11:12:32 -08:00
parent b33c1d55a2
commit 48630f5b34
1 changed file with 6 additions and 1 deletion
--- a/src/axolotl/utils/data.py
+++ b/src/axolotl/utils/data.py
@@ -99,7 +99,12 @@ def load_tokenized_prepared_datasets(
                str(cfg.sequence_len)
                + "@"
                + "|".join(
-                    sorted([f"{d.path}:{d.type}:{d.shards}" for d in cfg.datasets])
+                    sorted(
+                        [
+                            f"{d.path}:{d.type}:{d.shards}:{d.conversation}"
+                            for d in cfg.datasets
+                        ]
+                    )
                )
                + "|"
                + tokenizer_name
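
The effect of the change can be illustrated with a minimal, self-contained sketch. It assumes the concatenated string is hashed to form the cache key; the hash function (SHA-256 here), the helper names `dataset_signature` and `make_cfg`, and all sample values are assumptions made for illustration and are not taken from data.py.

```python
import hashlib
from types import SimpleNamespace


def dataset_signature(cfg, tokenizer_name):
    """Build a cache key from the settings that affect preprocessing.

    Mirrors the string built in the diff above. Hashing it with SHA-256
    is an assumption of this sketch, not necessarily what data.py does.
    """
    raw = (
        str(cfg.sequence_len)
        + "@"
        + "|".join(
            sorted(
                f"{d.path}:{d.type}:{d.shards}:{d.conversation}"
                for d in cfg.datasets
            )
        )
        + "|"
        + tokenizer_name
    )
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def make_cfg(conversation):
    """Hypothetical stand-in for the real config object (sample values only)."""
    dataset = SimpleNamespace(
        path="data/chat.jsonl",  # illustrative path
        type="sharegpt",         # illustrative dataset type
        shards=None,
        conversation=conversation,
    )
    return SimpleNamespace(sequence_len=2048, datasets=[dataset])


sig_a = dataset_signature(make_cfg("chatml"), "example-tokenizer")
sig_b = dataset_signature(make_cfg("llama-2"), "example-tokenizer")

# Different conversation formats now produce different signatures, so a
# previously prepared dataset is not reused after the format changes.
print(sig_a != sig_b)
```

Before this commit, both configurations in the sketch would have produced the same string, since `conversation` was not part of it, and the previously preprocessed dataset would have been picked up unchanged.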