* feat: add pos id to flex attention for packing part 1
* feat: update to include sliding window mask patch
* fix: suppress MatMul8bitLt: inputs will be cast from warnings
* fix: remove redundant flex attention patch
* chore: update olmo docs
* feat: add validator patch for cross entropy
* feat: update cce to include olmo family
* chore: update docs following feedback
* feat: add olmo3 config
* fix: clarify 3 methods
* chore: add olmo to readme