allow remote data paths (#1278)

* allow remote data paths

* add docs about public url

* only allow https

* better docs

* better docs
Hamel Husain
2024-02-08 15:02:35 -08:00
committed by GitHub
parent 1daecd161e
commit 91cf4ee72c
2 changed files with 18 additions and 0 deletions

@@ -468,6 +468,14 @@ See [examples](examples) for quick start. It is recommended to duplicate and mod
dataset:
- path: s3://path_to_ds # Accepts folder with arrow/parquet or file path like above. Supports s3, gcs.
...
# Loading Data From a Public URL
# - URLs must use HTTPS, not HTTP, for security reasons.
# - The URL should be a direct link to the file you wish to load.
# - The file format defaults to `json` (which includes `jsonl`). For other formats, set the `ds_type` option accordingly.
dataset:
- path: https://some.url.com/yourdata.jsonl # Must be a direct HTTPS link to the data file.
ds_type: json # This is the default; see other options below.
```
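
The HTTPS-only rule described in the config comments above can be illustrated with a small check. This is a minimal sketch, not the project's actual implementation: the function name `check_public_dataset_url` is hypothetical, and it only covers the public-URL case (not `s3://` or `gcs://` paths).

```python
from urllib.parse import urlparse


def check_public_dataset_url(path: str) -> str:
    """Return `path` unchanged if it is an HTTPS URL, otherwise raise ValueError.

    Hypothetical helper for illustration only; it mirrors the documented rule
    that public dataset URLs must use HTTPS rather than HTTP.
    """
    parsed = urlparse(path)
    if parsed.scheme == "http":
        raise ValueError(f"Insecure URL, use https:// instead: {path}")
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError(f"Not a public HTTPS URL: {path}")
    return path


if __name__ == "__main__":
    # The example URL from the config above passes the check.
    print(check_public_dataset_url("https://some.url.com/yourdata.jsonl"))
```

A plain `http://` URL, or a path with another scheme, raises `ValueError` in this sketch, so a misconfigured `path` fails before any download is attempted.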
- loading