Format dataset types

NanoCode012
2023-05-21 23:28:06 +09:00
parent cba0048067
commit 857a80b70e

@@ -35,31 +35,31 @@ Go ahead and axolotl questions!!
 Have a dataset in one of the following format (JSONL recommended):
-- alpaca: instruction; input(optional)
+- `alpaca`: instruction; input(optional)
 ```json
 {"instruction": "...", "input": "...", "output": "..."}
 ```
-- jeopardy: question and answer
+- `jeopardy`: question and answer
 ```json
 {"question": "...", "category": "...", "answer": "..."}
 ```
-- oasst: instruction
+- `oasst`: instruction
 ```json
 {"INSTRUCTION": "...", "RESPONSE": "..."}
 ```
-- gpteacher: instruction; input(optional)
+- `gpteacher`: instruction; input(optional)
 ```json
 {"instruction": "...", "input": "...", "response": "..."}
 ```
-- reflection: instruction with reflect; input(optional)
+- `reflection`: instruction with reflect; input(optional)
 ```json
 {"instruction": "...", "input": "...", "output": "...", "reflection": "...", "corrected": "..."}
 ```
-- sharegpt: conversations
+- `sharegpt`: conversations
 ```json
 {"conversations": [{"from": "...", "value": "..."}]}
 ```
-- completion: raw corpus
+- `completion`: raw corpus
 ```json
 {"text": "..."}
 ```
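
As a rough illustration of the record shapes listed in this hunk, the sketch below checks a record against the keys each dataset type appears to need and serializes records as JSONL. The `REQUIRED_KEYS` table and the helpers `missing_keys` and `to_jsonl` are hypothetical names introduced here. The key sets are read off the examples (with `input` treated as optional where the list says so) and are an assumption, not axolotl's own validation logic.

```python
import json

# Keys each dataset type is ASSUMED to require, inferred from the examples
# in the README hunk. "input" is omitted where the list marks it optional.
# Illustrative only; this is not axolotl's validation code.
REQUIRED_KEYS = {
    "alpaca": {"instruction", "output"},
    "jeopardy": {"question", "category", "answer"},
    "oasst": {"INSTRUCTION", "RESPONSE"},
    "gpteacher": {"instruction", "response"},
    "reflection": {"instruction", "output", "reflection", "corrected"},
    "sharegpt": {"conversations"},
    "completion": {"text"},
}


def missing_keys(record, dataset_type):
    """Return the sorted assumed-required keys that are absent from record."""
    return sorted(REQUIRED_KEYS[dataset_type] - set(record))


def to_jsonl(records):
    """Serialize records as JSONL: one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


if __name__ == "__main__":
    record = {"instruction": "Add 2 and 3", "output": "5"}
    print(missing_keys(record, "alpaca"))
    print(to_jsonl([record]))
```

Running a check like this before training may catch a mismatch between the file contents and the declared dataset type, for example an `alpaca`-style `output` field in a file declared as `gpteacher`, which uses `response` instead.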