* Add test cases to verify that the problem exists in the underlying
* Update the handle_long_sequences function to correctly use Map instead of filter for the truncation strategy. Also remove the minimal length filtering from the truncate_long_samples function, and run it separately and before.
* fix: refactor and add test truncate for non-input id fields
* fix: refactor long seq handling fn
* fix: refactor duplicate fn and simplify route
* add additional tests and make them work on mac
* handle logging exception on empty datasets
---------
Co-authored-by: 2ndset bot <bot@2ndset.ai>
Co-authored-by: NanoCode012 <nano@axolotl.ai>
Co-authored-by: Wing Lian <wing@axolotl.ai>