* adding schema to ebnf conversion script + documentation
* minor format fixes
Showing 7 changed files with 928 additions and 1 deletion.
@@ -0,0 +1,61 @@
# Custom JSON grammars

You can use custom grammars to constrain the output of a language model so that it generates only valid JSON objects. This is useful when the JSON must follow an application-specific structure, such as an HTTP request body or a shopping cart.

## Quickstart

A JSON schema can be written in several formats.
We give recommendations for two common ones: TypeScript interfaces and JSON Schema.

<details>
<summary> Example of a TypeScript schema for a Student object </summary>

```typescript
interface Student {
  name: string;
  age: number;
  is_student: boolean;
  courses: string[];
}
```
</details>

<details>
<summary> Example of a JSON schema for a Student object </summary>

```json
{
  "type": "object",
  "properties": {
    "name": {"type": "string"},
    "age": {"type": "number"},
    "is_student": {"type": "boolean"},
    "courses": {
      "type": "array",
      "items": {"type": "string"}
    }
  }
}
```
</details>

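To make the schema concrete, the following sketch checks a sample Student object against it. It is a hand-rolled check that covers only the keywords used in the example (`type`, `properties`, `items`), not a full JSON Schema validator. The helper name `conforms` and the sample values are made up for illustration.

```python
import json

# The Student schema from the example above.
STUDENT_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "number"},
        "is_student": {"type": "boolean"},
        "courses": {"type": "array", "items": {"type": "string"}},
    },
}


def conforms(value, schema):
    """Check `value` against the small schema subset used in this example."""
    kind = schema["type"]
    if kind == "string":
        return isinstance(value, str)
    if kind == "boolean":
        return isinstance(value, bool)
    if kind == "number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if kind == "array":
        return isinstance(value, list) and all(
            conforms(item, schema["items"]) for item in value
        )
    if kind == "object":
        props = schema.get("properties", {})
        # Only keys declared in the schema are checked; none are required.
        return isinstance(value, dict) and all(
            conforms(v, props[k]) for k, v in value.items() if k in props
        )
    raise ValueError(f"unsupported type: {kind}")


sample = json.loads(
    '{"name": "Ada", "age": 21, "is_student": true, "courses": ["math"]}'
)
print(conforms(sample, STUDENT_SCHEMA))
```

A grammar generated from the schema enforces the same shape during generation, so the model cannot emit a non-conforming object in the first place.
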
### From TypeScript

To generate a custom JSON grammar from a TypeScript schema, you can use [this online tool](https://grammar.intrinsiclabs.ai/) or [this TypeScript generator](https://github.com/IntrinsicLabsAI/gbnfgen) from Intrinsic AI. Then copy the resulting grammar into a text file and use it with the `IncrementalGrammarConstraint`.

### From JSON schemas

Alternatively, you can generate a custom JSON grammar from a JSON schema using the `json_schema_to_grammar.py` script, which is analogous to [the one in the llama.cpp repository](https://github.com/ggerganov/llama.cpp/blob/ab9a3240a9da941fdef5cd4a25f2b97c2f5a67aa/examples/json_schema_to_grammar.py).

To generate a grammar from a JSON schema, run the following command:

```bash
python3 json_schema_to_grammar.py -i schemas/product_catalog.json -o grammars/product_catalog.ebnf
```
The script reads a JSON schema file and writes the corresponding grammar (see the example schemas in `/schemas` and their grammars in `/grammars`). The generated grammar is in Extended Backus-Naur Form (EBNF) and can be used directly with the `IncrementalGrammarConstraint`.

Additional arguments let you specify the property order of the JSON object as well as string formatting parameters.

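As a rough illustration of what such a conversion involves, the sketch below builds the `*-kv` and `root` rules for a flat object with a caller-chosen property order. It is not the logic of `json_schema_to_grammar.py`: the function `flat_schema_to_rules` is hypothetical, it treats every property as required, it handles primitive types only, and it assumes the primitive rules (`string`, `number`, and so on) are defined elsewhere in the grammar. The schema is inferred from the rules in `product_catalog.ebnf`, which is also an assumption.

```python
import re


def flat_schema_to_rules(schema, prop_order):
    """Build `<name>-kv` and `root` rules for a flat object schema.

    Hypothetical helper: all properties are treated as required and must
    have a primitive type.
    """
    primitives = {"string", "number", "integer", "boolean"}
    rules = {}
    kv_names = []
    for prop in prop_order:
        json_type = schema["properties"][prop]["type"]
        if json_type not in primitives:
            raise ValueError(f"unsupported type for {prop!r}: {json_type}")
        # Mirror the dash-separated rule names seen in the example grammars
        # (for instance is_student -> is-student-kv).
        rule_name = re.sub(r"[^a-zA-Z0-9-]+", "-", prop) + "-kv"
        rules[rule_name] = f'"\\"{prop}\\"" space ":" space {json_type}'
        kv_names.append(rule_name)
    body = ' "," space '.join(kv_names)
    rules["root"] = f'"{{" space {body} "}}" space'
    return "\n".join(f"{name} ::= {rhs}" for name, rhs in sorted(rules.items()))


# Schema inferred from product_catalog.ebnf (an assumption).
product_schema = {
    "type": "object",
    "properties": {
        "productId": {"type": "integer"},
        "productName": {"type": "string"},
        "price": {"type": "number"},
    },
}
grammar = flat_schema_to_rules(
    product_schema, ["productId", "productName", "price"]
)
print(grammar)
```

Changing the order of the list passed as `prop_order` changes the order in which the `root` rule expects the keys, which is the effect a property-order argument has on the generated grammar.
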
11 changes: 11 additions & 0 deletions
examples/grammars/custom_json_grammars/grammars/product_catalog.ebnf
@@ -0,0 +1,11 @@
char ::= [^"\\] | "\\" (["\\/bfnrt] | "u" [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F])
decimal-part ::= [0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9])?)?)?)?)?)?)?)?)?)?)?)?)?)?)?
integer ::= ("-"? integral-part) space
integral-part ::= [0-9] | [1-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9])?)?)?)?)?)?)?)?)?)?)?)?)?)?)?
number ::= ("-"? integral-part) ("." decimal-part)? ([eE] [-+]? integral-part)? space
price-kv ::= "\"price\"" space ":" space number
productId-kv ::= "\"productId\"" space ":" space integer
productName-kv ::= "\"productName\"" space ":" space string
root ::= "{" space productId-kv "," space productName-kv "," space price-kv "}" space
space ::= " "?
string ::= "\"" char* "\"" space
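One way to sanity-check what this grammar accepts is to hand-translate it into a regular expression, which is possible here because none of the rules is recursive. The sketch below is such a translation, not something produced by the conversion script. The digit limits (`{0,15}` and `{1,16}`) are my reading of the nested optional groups, and the names `accepts` and `kv` are made up for illustration.

```python
import re

SPACE = r" ?"                               # space ::= " "?
INTEGRAL = r"(?:[1-9][0-9]{0,15}|[0-9])"    # integral-part (no leading zeros)
DECIMAL = r"[0-9]{1,16}"                    # decimal-part
INTEGER = "-?" + INTEGRAL + SPACE
NUMBER = (
    "-?" + INTEGRAL
    + r"(?:\." + DECIMAL + ")?"
    + r"(?:[eE][-+]?" + INTEGRAL + ")?"
    + SPACE
)
# char: any character except quote and backslash, or an escape sequence.
CHAR = r'(?:[^"\\]|\\(?:["\\/bfnrt]|u[0-9a-fA-F]{4}))'
STRING = '"' + CHAR + '*"' + SPACE


def kv(key, value_pattern):
    """Pattern for one `"key": value` pair, as in the *-kv rules."""
    return '"' + re.escape(key) + '"' + SPACE + ":" + SPACE + value_pattern


# root: the three properties are required and must appear in this order.
ROOT = re.compile(
    r"\{" + SPACE
    + kv("productId", INTEGER) + "," + SPACE
    + kv("productName", STRING) + "," + SPACE
    + kv("price", NUMBER)
    + r"\}" + SPACE
)


def accepts(text):
    """Return True if `text` is derivable from the root rule."""
    return ROOT.fullmatch(text) is not None


print(accepts('{"productId": 42, "productName": "Desk \\"Pro\\"", "price": -1.5e3}'))
print(accepts('{"productName": "Desk", "productId": 42, "price": 10}'))
```

The second call swaps two keys; since `root` fixes the key order, the grammar does not accept that string even though it is valid JSON.
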
16 changes: 16 additions & 0 deletions
examples/grammars/custom_json_grammars/grammars/student.ebnf
@@ -0,0 +1,16 @@
age-kv ::= "\"age\"" space ":" space number
age-rest ::= ( "," space courses-kv )?
boolean ::= ("true" | "false") space
char ::= [^"\\] | "\\" (["\\/bfnrt] | "u" [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F])
courses ::= "[" space (string ("," space string)*)? "]" space
courses-kv ::= "\"courses\"" space ":" space courses
decimal-part ::= [0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9])?)?)?)?)?)?)?)?)?)?)?)?)?)?)?
integral-part ::= [0-9] | [1-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9])?)?)?)?)?)?)?)?)?)?)?)?)?)?)?
is-student-kv ::= "\"is_student\"" space ":" space boolean
is-student-rest ::= ( "," space name-kv )? name-rest
name-kv ::= "\"name\"" space ":" space string
name-rest ::= ( "," space age-kv )? age-rest
number ::= ("-"? integral-part) ("." decimal-part)? ([eE] [-+]? integral-part)? space
root ::= "{" space (is-student-kv is-student-rest | name-kv name-rest | age-kv age-rest | courses-kv )? "}" space
space ::= " "?
string ::= "\"" char* "\"" space
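The `root` and `*-rest` rules above make every property optional while fixing the order in which properties may appear. The sketch below mirrors that rule structure to enumerate the key sequences the grammar accepts. The function names `rest` and `root_sequences` are hypothetical, and the model ignores the values and whitespace.

```python
# Property order as it appears in the root and *-rest rules of student.ebnf.
ORDER = ["is_student", "name", "age", "courses"]


def rest(i):
    """Key sequences that may follow ORDER[i], mirroring the *-rest rules."""
    if i == len(ORDER) - 1:
        # courses-kv is the last property, so nothing can follow it.
        return [[]]
    tails = rest(i + 1)
    # ( "," next-kv )? next-rest : either skip the next property or emit it,
    # then continue with whatever may follow it.
    return tails + [[ORDER[i + 1]] + tail for tail in tails]


def root_sequences():
    """All key sequences accepted by the root rule, including the empty object."""
    sequences = [[]]
    for i, key in enumerate(ORDER):
        sequences += [[key] + tail for tail in rest(i)]
    return sequences


for keys in root_sequences():
    print(keys)
```

Under this reading, the grammar accepts any subset of the four properties, but always in the order `is_student`, `name`, `age`, `courses`, which differs from the order in which the Student schema lists them.
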