Custom json grammars (#45)
* adding schema to ebnf conversion script + documentation

* minor format fixes
arinaruck authored Jun 9, 2024
1 parent 9c2aaf3 commit 308066c
Showing 7 changed files with 928 additions and 1 deletion.
4 changes: 3 additions & 1 deletion README.md
@@ -142,7 +142,9 @@ More details can be found in this [doc from llama-cpp](https://github.com/ggerga
Advanced grammar debugging guide can be found [here](docs/debugging_custom_grammars.md)

### Automatic Grammar Generation
Here is an awesome tool to generate grammars for you: [Grammar Builder](https://grammar.intrinsiclabs.ai/)

You can use custom grammars to constrain the output of a language model.
See the [documentation](examples%2Fgrammars%2Fcustom_json_grammars%2FREADME.md) on JSON-schema-to-grammar conversion to learn how to automatically create custom grammars for complex JSON objects.

### Grammar Collection

61 changes: 61 additions & 0 deletions examples/grammars/custom_json_grammars/README.md
@@ -0,0 +1,61 @@
# Custom JSON grammars

You can use custom grammars to constrain a language model so that it generates valid JSON objects. This is useful when you need JSON objects for specific applications, such as HTTP requests or shopping carts.

## Quickstart

There are multiple ways to represent JSON schemas.
We provide recommendations for two common formats: TypeScript and JSON.

<details>
<summary> Example of a TypeScript schema for a Student object </summary>

```typescript
interface Student {
    name: string;
    age: number;
    is_student: boolean;
    courses: string[];
}
```
</details>

<details>
<summary> Example of a JSON schema for a Student object </summary>

```json
{
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "number"},
        "is_student": {"type": "boolean"},
        "courses": {
            "type": "array",
            "items": {"type": "string"}
        }
    }
}
```
</details>
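
To make "valid" concrete, here is a minimal Python sketch that checks a parsed object against the Student schema above. The `conforms` helper is hypothetical and only understands the keywords used in this example (`type`, `properties`, `items`); it is not a full JSON Schema validator and is not part of this repository.

```python
import json

# The Student schema from the example above.
STUDENT_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "number"},
        "is_student": {"type": "boolean"},
        "courses": {"type": "array", "items": {"type": "string"}},
    },
}


def conforms(value, schema):
    """Return True if `value` matches the (very small) schema subset handled here."""
    kind = schema["type"]
    if kind == "object":
        props = schema.get("properties", {})
        # No "required" keyword in the example, so missing keys are allowed
        # and keys without a declared schema are ignored.
        return isinstance(value, dict) and all(
            conforms(value[key], props[key]) for key in value if key in props
        )
    if kind == "array":
        return isinstance(value, list) and all(
            conforms(item, schema["items"]) for item in value
        )
    if kind == "string":
        return isinstance(value, str)
    if kind == "boolean":
        return isinstance(value, bool)
    if kind == "number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    raise ValueError(f"unsupported type in this sketch: {kind}")


sample = json.loads(
    '{"name": "Ada", "age": 20, "is_student": true, "courses": ["algebra"]}'
)
print(conforms(sample, STUDENT_SCHEMA))
```

A grammar-constrained model enforces this kind of shape during generation, so a post-hoc check like this is mainly useful for debugging a grammar.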


### From TypeScript

To generate custom JSON grammars from TypeScript schemas, you can use [this online tool](https://grammar.intrinsiclabs.ai/) or [this TypeScript generator](https://github.com/IntrinsicLabsAI/gbnfgen) from Intrinsic AI. Then copy and paste the resulting grammar into a text file and use it with the `IncrementalGrammarConstraint`.
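
A saved grammar file might be wired into generation roughly as follows. This is a sketch under assumptions: the import paths and the `GrammarConstrainedLogitsProcessor` class are taken from the `transformers_cfg` package layout as we understand it, and `load_grammar` and `build_logits_processor` are hypothetical helper names introduced here for illustration.

```python
from pathlib import Path


def load_grammar(path):
    """Read a grammar file (for example, one saved from the online tool) as text."""
    return Path(path).read_text(encoding="utf-8")


def build_logits_processor(grammar_str, tokenizer, start_rule="root"):
    """Wrap a grammar string so it can be passed to `model.generate`."""
    # Imported lazily so load_grammar works even without the library installed.
    # Assumption: these modules and classes exist as named in transformers_cfg.
    from transformers_cfg.grammar_utils import IncrementalGrammarConstraint
    from transformers_cfg.generation.logits_process import (
        GrammarConstrainedLogitsProcessor,
    )

    constraint = IncrementalGrammarConstraint(grammar_str, start_rule, tokenizer)
    return GrammarConstrainedLogitsProcessor(constraint)
```

With a Hugging Face tokenizer and model already loaded, the returned processor would typically be passed as `logits_processor=[processor]` to `model.generate`.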


### From JSON schemas

Alternatively, you can generate custom JSON grammars from JSON schemas using the `json_schema_to_grammar.py` script, which is analogous to [the one in the llama.cpp repository](https://github.com/ggerganov/llama.cpp/blob/ab9a3240a9da941fdef5cd4a25f2b97c2f5a67aa/examples/json_schema_to_grammar.py).


To generate a grammar from a JSON schema, run the following command:

```bash
python3 json_schema_to_grammar.py -i schemas/product_catalog.json -o grammars/product_catalog.ebnf
```
This script generates a grammar from a JSON schema file (see the example schemas in `schemas/` and the corresponding grammars in `grammars/`). The generated grammar is in Extended Backus-Naur Form (EBNF) and can be used directly with the `IncrementalGrammarConstraint`.

Additional arguments let you specify the property order of the JSON object as well as string formatting parameters.
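
To give a feel for what the conversion produces, here is a deliberately simplified Python sketch. It handles only a flat object whose properties are all required primitives, and it is not the actual logic of `json_schema_to_grammar.py`. The function name `flat_schema_to_rules` and the sample schema are assumptions; the schema is inferred from the product catalog grammar in this commit. Shared rules such as `string`, `number`, `integer`, and `space` are left out.

```python
import json
import re

# JSON schema primitive types mapped to the rule names used in the example grammars.
PRIMITIVE_RULES = {
    "string": "string",
    "number": "number",
    "integer": "integer",
    "boolean": "boolean",
}


def flat_schema_to_rules(schema):
    """Return EBNF rule lines for a flat object with all properties required."""
    kv_names = []
    lines = []
    for prop, spec in schema["properties"].items():
        value_rule = PRIMITIVE_RULES[spec["type"]]
        # Rule names may not contain characters such as "_", so replace them.
        kv_name = re.sub(r"[^a-zA-Z0-9-]+", "-", prop) + "-kv"
        # First dumps() quotes the key; the second escapes it as a grammar literal.
        literal = json.dumps(json.dumps(prop))
        lines.append(f'{kv_name} ::= {literal} space ":" space {value_rule}')
        kv_names.append(kv_name)
    body = ' "," space '.join(kv_names)
    lines.append(f'root ::= "{{" space {body} "}}" space')
    return lines


# Hypothetical schema, reconstructed from the key-value rules of the grammar.
product_schema = {
    "type": "object",
    "properties": {
        "productId": {"type": "integer"},
        "productName": {"type": "string"},
        "price": {"type": "number"},
    },
}

for line in flat_schema_to_rules(product_schema):
    print(line)
```

Optional properties, arrays, and nested objects need additional rules, which is where the real script does most of its work.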

@@ -0,0 +1,11 @@
char ::= [^"\\] | "\\" (["\\/bfnrt] | "u" [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F])
decimal-part ::= [0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9])?)?)?)?)?)?)?)?)?)?)?)?)?)?)?
integer ::= ("-"? integral-part) space
integral-part ::= [0-9] | [1-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9])?)?)?)?)?)?)?)?)?)?)?)?)?)?)?
number ::= ("-"? integral-part) ("." decimal-part)? ([eE] [-+]? integral-part)? space
price-kv ::= "\"price\"" space ":" space number
productId-kv ::= "\"productId\"" space ":" space integer
productName-kv ::= "\"productName\"" space ":" space string
root ::= "{" space productId-kv "," space productName-kv "," space price-kv "}" space
space ::= " "?
string ::= "\"" char* "\"" space
16 changes: 16 additions & 0 deletions examples/grammars/custom_json_grammars/grammars/student.ebnf
@@ -0,0 +1,16 @@
age-kv ::= "\"age\"" space ":" space number
age-rest ::= ( "," space courses-kv )?
boolean ::= ("true" | "false") space
char ::= [^"\\] | "\\" (["\\/bfnrt] | "u" [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F])
courses ::= "[" space (string ("," space string)*)? "]" space
courses-kv ::= "\"courses\"" space ":" space courses
decimal-part ::= [0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9])?)?)?)?)?)?)?)?)?)?)?)?)?)?)?
integral-part ::= [0-9] | [1-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9])?)?)?)?)?)?)?)?)?)?)?)?)?)?)?
is-student-kv ::= "\"is_student\"" space ":" space boolean
is-student-rest ::= ( "," space name-kv )? name-rest
name-kv ::= "\"name\"" space ":" space string
name-rest ::= ( "," space age-kv )? age-rest
number ::= ("-"? integral-part) ("." decimal-part)? ([eE] [-+]? integral-part)? space
root ::= "{" space (is-student-kv is-student-rest | name-kv name-rest | age-kv age-rest | courses-kv )? "}" space
space ::= " "?
string ::= "\"" char* "\"" space
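
The `-rest` rules above make every property optional while fixing the key order to `is_student`, `name`, `age`, `courses`. The following Python sketch mirrors that rule chain to enumerate the key sequences that `root` accepts. The names `rest` and `root_sequences` are illustrative and not part of the conversion script.

```python
# Property order as encoded by the rule chain
# is-student-rest -> name-rest -> age-rest in the grammar above.
ORDER = ["is_student", "name", "age", "courses"]


def rest(keys):
    """Mirror a `-rest` rule: optionally emit keys[0], then continue with the tail."""
    if not keys:
        return [()]
    tails = rest(keys[1:])
    with_key = [(keys[0],) + tail for tail in tails]
    return with_key + tails


def root_sequences(order):
    """Mirror `root`: an optional group of alternatives, one per possible first key."""
    sequences = [()]  # the whole group is optional, so an empty object is accepted
    for index, key in enumerate(order):
        sequences += [(key,) + tail for tail in rest(order[index + 1:])]
    return sequences


accepted = root_sequences(ORDER)
print(len(accepted))
print(("name", "courses") in accepted)
print(("courses", "name") in accepted)
```

Every subset of the four keys is accepted, but only in the fixed order, so an object that lists `courses` before `name` cannot be generated under this grammar.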