Add Canonical Logo #262

Open · wants to merge 14 commits into base: main
4 changes: 2 additions & 2 deletions .github/workflows/build.yml
@@ -2,7 +2,7 @@ name: Mike deploy

on:
  push:
    branches: [ main ]
    branches: [ v0.9 ]

jobs:
  build_docs:
@@ -41,4 +41,4 @@ jobs:

      - name: Deploy with mike
        run: |
          mike deploy --push $(cat version.txt)
          mike deploy --push --update-aliases $(cat version.txt) latest
2 changes: 1 addition & 1 deletion build.sh
@@ -17,6 +17,6 @@ git branch gh-pages origin/gh-pages
echo "Listing branches"
git branch

mike deploy $currentVersion
mike deploy --update-aliases $(cat version.txt) latest

git checkout gh-pages
3 changes: 2 additions & 1 deletion docs/admin/kubernetes_deployment.md
@@ -7,9 +7,10 @@ Kubernetes version.
## Recommended Version Matrix
| Kubernetes Version | Recommended Istio Version |
| :---------- | :------------ |
| 1.20 | 1.9, 1.10, 1.11 |
| 1.21 | 1.10, 1.11 |
| 1.22 | 1.11, 1.12 |
| 1.23 | 1.12, 1.13 |
| 1.24 | 1.13, 1.14 |

## 1. Install Istio

3 changes: 2 additions & 1 deletion docs/admin/serverless.md
@@ -8,9 +8,10 @@ Kubernetes version.
## Recommended Version Matrix
| Kubernetes Version | Recommended Istio Version | Recommended Knative Version |
| :---------- | :------------ | :------------|
| 1.20 | 1.9, 1.10, 1.11 | 0.25, 0.26, 1.0 |
| 1.21 | 1.10, 1.11 | 0.25, 0.26, 1.0 |
| 1.22 | 1.11, 1.12 | 0.25, 0.26, 1.0 |
| 1.23 | 1.12, 1.13 | 1.0-1.4 |
| 1.24 | 1.13, 1.14 | 1.0-1.4 |

## 1. Install Istio
Please refer to the [Istio install guide](https://knative.dev/docs/admin/install/installing-istio).
152 changes: 152 additions & 0 deletions docs/blog/articles/2022-07-21-KServe-0.9-release.md
@@ -0,0 +1,152 @@
# Announcing: KServe v0.9.0

Today, we are pleased to announce the v0.9.0 release of KServe! [KServe](https://github.com/kserve) has now fully onboarded to [LF AI & Data Foundation](https://lfaidata.foundation) as an [Incubation Project](https://lfaidata.foundation/projects/kserve)!

In this release we are excited to introduce the new `InferenceGraph` feature, which the community has long asked for. Continuing the effort from the last release to unify the InferenceService API for deploying models on KServe and ModelMesh, ModelMesh is now fully compatible with the KServe InferenceService API!


## Introduce InferenceGraph

ML inference systems are becoming bigger and more complex, and they often consist of many models working together to make a single prediction.
Common use cases are image classification and multi-stage natural language processing pipelines. For example, an image classification pipeline needs to run top-level classification first, then run further downstream classification based on the previous prediction results.

KServe has the unique strength to build distributed inference graphs with its native integration of InferenceServices, a standard inference protocol for chaining models, and serverless auto-scaling capabilities. KServe leverages these strengths to build the InferenceGraph and enable users to deploy complex ML inference pipelines to production in a declarative and scalable way.


**InferenceGraph** is made up of a list of routing nodes, with each node consisting of a set of routing steps. Each step can route either to an InferenceService or to another node defined in the graph, which makes the InferenceGraph highly composable.
The graph router is deployed behind an HTTP endpoint and can be scaled dynamically based on request volume. The InferenceGraph supports four different types of routing nodes: **Sequence**, **Switch**, **Ensemble**, **Splitter**.

![InferenceGraph](../../modelserving/inference_graph/images/inference_graph.png)

- **Sequence Node**: Allows users to define multiple `Steps` with `InferenceServices` or `Nodes` as routing targets in a sequence. The `Steps` are executed in sequence, and the request/response from the previous step can be passed to the next step as input based on the configuration.
- **Switch Node**: Allows users to define routing conditions and select a `Step` to execute if it matches the condition. The response is returned as soon as the first step matching a condition is found. If no condition is matched, the graph returns the original request.
- **Ensemble Node**: A model ensemble requires scoring each model separately and then combining the results into a single prediction response. You can then use different combination methods to produce the final result. Multiple classification trees, for example, are commonly combined using a "majority vote" method. Multiple regression trees are often combined using various averaging techniques.
- **Splitter Node**: Allows users to split traffic across multiple targets using a weighted distribution.

```yaml
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "cat-dog-classifier"
spec:
  predictor:
    pytorch:
      resources:
        requests:
          cpu: 100m
      storageUri: gs://kfserving-examples/models/torchserve/cat_dog_classification
---
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "dog-breed-classifier"
spec:
  predictor:
    pytorch:
      resources:
        requests:
          cpu: 100m
      storageUri: gs://kfserving-examples/models/torchserve/dog_breed_classification
---
apiVersion: "serving.kserve.io/v1alpha1"
kind: "InferenceGraph"
metadata:
  name: "dog-breed-pipeline"
spec:
  nodes:
    root:
      routerType: Sequence
      steps:
        - serviceName: cat-dog-classifier
          name: cat_dog_classifier # step name
        - serviceName: dog-breed-classifier
          name: dog_breed_classifier
          data: $request
          condition: "[@this].#(predictions.0==\"dog\")"
```

Currently `InferenceGraph` is supported with the `Serverless` deployment mode. You can try it out following the [tutorial](https://kserve.github.io/website/master/modelserving/inference_graph/image_pipeline/).
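Once the graph is deployed, calling it is just an HTTP POST to the graph router endpoint. Below is a minimal sketch, assuming the router is reachable through the cluster ingress; the ingress address, host name, and payload layout are illustrative stand-ins for whatever your environment reports, not values confirmed by this release.

```python
import requests

# Illustrative values only: substitute the ingress address and the URL
# reported for the InferenceGraph in your cluster.
ingress = "http://<INGRESS_HOST>:<INGRESS_PORT>"
host_header = "dog-breed-pipeline.default.example.com"

# The Sequence root node forwards this request to cat-dog-classifier first,
# then to dog-breed-classifier when the routing condition matches.
payload = {"instances": [{"data": "<base64-encoded image bytes>"}]}

response = requests.post(ingress, json=payload, headers={"Host": host_header})
response.raise_for_status()
print(response.json())
```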


## InferenceService API for ModelMesh


The InferenceService CRD is now the primary interface for interacting with ModelMesh. Some changes were made to the InferenceService spec to better facilitate ModelMesh’s needs.

### Storage Spec

To unify how model storage is defined for both single and multi-model serving, a new storage spec was added to the predictor model spec. With this storage spec, users can specify a key inside a common secret holding config/credentials for each of the storage backends from which models can be loaded. Example:

```yaml
storage:
  key: localMinIO # Credential key for the destination storage in the common secret
  path: sklearn # Model path inside the bucket
  # schemaPath: null # Optional schema files for payload schema
  parameters: # Parameters to override the default values inside the common secret.
    bucket: example-models
```
Learn more [here](https://github.com/kserve/kserve/tree/release-0.9/docs/samples/storage/storageSpec).



### Model Status

For further alignment between ModelMesh and KServe, some additions to the InferenceService status were made. There is now a `Model Status` section which contains information about the model loaded in the predictor. New fields include:

- `states` - State information of the predictor's model.
    - `activeModelState` - The state of the model currently being served by the predictor's endpoints.
    - `targetModelState` - Set only when `transitionStatus` is not `UpToDate`, meaning that the target model differs from the currently-active model.
- `transitionStatus` - Indicates the state of the predictor relative to its current spec.
- `modelCopies` - Model copy information of the predictor's model.
- `lastFailureInfo` - Details about the most recent error associated with this predictor. Not all of the contained fields will necessarily have a value.
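As a rough sketch of how these fields might be read programmatically, the snippet below fetches an InferenceService with the Kubernetes Python client. The exact field path (`status.modelStatus.states.activeModelState`), namespace, and resource name are assumptions inferred from the field names above rather than a confirmed schema.

```python
from kubernetes import client, config

config.load_kube_config()
api = client.CustomObjectsApi()

# Fetch the InferenceService custom resource (namespace and name are illustrative).
isvc = api.get_namespaced_custom_object(
    group="serving.kserve.io",
    version="v1beta1",
    namespace="default",
    plural="inferenceservices",
    name="example-tensorflow-mnist",
)

# Assumed layout based on the fields described above.
model_status = isvc.get("status", {}).get("modelStatus", {})
states = model_status.get("states", {})
print("activeModelState:", states.get("activeModelState"))
print("transitionStatus:", model_status.get("transitionStatus"))
print("lastFailureInfo:", model_status.get("lastFailureInfo"))
```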

### Deploying on ModelMesh

For deploying InferenceServices on ModelMesh, the ModelMesh and KServe controllers will still require the user to specify the `serving.kserve.io/deploymentMode: ModelMesh` annotation.
A complete example of an InferenceService with the new storage spec is shown below:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: example-tensorflow-mnist
  annotations:
    serving.kserve.io/deploymentMode: ModelMesh
spec:
  predictor:
    model:
      modelFormat:
        name: tensorflow
      storage:
        key: localMinIO
        path: tensorflow/mnist.savedmodel
```

## Other New Features:

- Support [serving MLFlow model format](https://kserve.github.io/website/0.9/modelserving/v1beta1/mlflow/v2/) via MLServer serving runtime.
- Support [unified autoscaling target and metric fields](https://kserve.github.io/website/0.9/modelserving/autoscaling/autoscaling/) for InferenceService components with both Serverless and RawDeployment mode.
- Support [InferenceService ingress class and url domain template configuration](https://kserve.github.io/website/0.9/admin/kubernetes_deployment/) for RawDeployment mode.
- ModelMesh now has a default [OpenVINO Model Server](https://github.com/openvinotoolkit/model_server) ServingRuntime.


## What’s Changed?

- The KServe controller manager is changed from StatefulSet to Deployment to support HA mode.
- log4j security vulnerability fix
- Upgrade TorchServe serving runtime to 0.6.0
- Update MLServer serving runtime to 1.0.0

Check out the full release notes for [KServe](https://github.com/kserve/kserve/releases/tag/v0.9.0) and
[ModelMesh](https://github.com/kserve/modelmesh-serving/releases/tag/v0.9.0) for more details.

## Join the community

- Visit our [Website](https://kserve.github.io/website/) or [GitHub](https://github.com/kserve)
- Join the Slack ([#kserve](https://kubeflow.slack.com/archives/CH6E58LNP))
- Attend our community meeting by subscribing to the [KServe calendar](https://wiki.lfaidata.foundation/display/kserve/calendars).
- View our [community GitHub repository](https://github.com/kserve/community) to learn how to make contributions. We are excited to work with you to make KServe better and promote its adoption!

Thank you for contributing or checking out KServe!

– The KServe Working Group
2 changes: 1 addition & 1 deletion docs/get_started/README.md
@@ -19,6 +19,6 @@ The [Kubernetes CLI (`kubectl`)](https://kubernetes.io/docs/tasks/tools/install-
You can get started with a local deployment of KServe by using _KServe Quick installation script on Kind_:

```bash
curl -s "https://raw.github.com/kserve/kserve/release-0.8/hack/quick_install.sh" | bash
curl -s "https://raw.github.com/kserve/kserve/release-0.9/hack/quick_install.sh" | bash
```

4 changes: 2 additions & 2 deletions docs/modelserving/v1beta1/custom/custom_model/README.md
@@ -33,7 +33,7 @@ if __name__ == "__main__":
    model = AlexNetModel("custom-model")
    kserve.ModelServer().start([model])
```
The full code example can be found [here](https://github.com/kserve/kserve/tree/master/python/custom_model/model.py).
The full code example can be found [here](https://github.com/kserve/kserve/blob/release-0.9/python/custom_model/model.py).

## Build the custom image with Buildpacks
[Buildpacks](https://buildpacks.io/) allows you to transform your inference code into images that can be deployed on KServe without
@@ -75,7 +75,7 @@ class AlexNetModel(kserve.Model):
if __name__ == "__main__":
    kserve.ModelServer().start({"custom-model": AlexNetModel})
```
The full code example can be found [here](https://github.com/kserve/kserve/tree/master/python/custom_model/model_remote.py).
The full code example can be found [here](https://github.com/kserve/kserve/blob/release-0.9/python/custom_model/model_remote.py).

Modify the `Procfile` to `web: python -m model_remote` and then run the above `pack` command. It builds a serving image which launches each model as a separate Python worker, and the Tornado web server routes requests to the model workers by name.
2 changes: 1 addition & 1 deletion docs/modelserving/v1beta1/serving_runtime.md
@@ -25,7 +25,7 @@ After models are deployed with InferenceService, you get all the following serve
| ------------- | ------------- | ------------- | ------------- | ------------- | ------------- | ------------- |
| [Triton Inference Server](https://github.com/triton-inference-server/server) | [TensorFlow,TorchScript,ONNX](https://github.com/triton-inference-server/server/blob/r21.09/docs/model_repository.md)| v2 | :heavy_check_mark: | :heavy_check_mark: | [Compatibility Matrix](https://docs.nvidia.com/deeplearning/frameworks/support-matrix/index.html)| [Torchscript cifar](triton/torchscript) |
| [TFServing](https://www.tensorflow.org/tfx/guide/serving) | [TensorFlow SavedModel](https://www.tensorflow.org/guide/saved_model) | v1 | :heavy_check_mark: | :heavy_check_mark: | [TFServing Versions](https://github.com/tensorflow/serving/releases) | [TensorFlow flower](./tensorflow) |
| [TorchServe](https://pytorch.org/serve/server.html) | [Eager Model/TorchScript](https://pytorch.org/docs/master/generated/torch.save.html) | v1/v2 REST | :heavy_check_mark: | :heavy_check_mark: | 0.5.3 | [TorchServe mnist](./torchserve) |
| [TorchServe](https://pytorch.org/serve/server.html) | [Eager Model/TorchScript](https://pytorch.org/docs/master/generated/torch.save.html) | v1/v2 REST | :heavy_check_mark: | :heavy_check_mark: | 0.6.0 | [TorchServe mnist](./torchserve) |
| [SKLearn MLServer](https://github.com/SeldonIO/MLServer) | [Pickled Model](https://scikit-learn.org/stable/modules/model_persistence.html) | v2 | :heavy_check_mark: | :heavy_check_mark: | 1.0.1 | [SKLearn Iris V2](./sklearn/v2) |
| [XGBoost MLServer](https://github.com/SeldonIO/MLServer) | [Saved Model](https://xgboost.readthedocs.io/en/latest/tutorials/saving_model.html) | v2 | :heavy_check_mark: | :heavy_check_mark: | 1.5.0 | [XGBoost Iris V2](./xgboost) |
| [SKLearn ModelServer](https://github.com/kserve/kserve/tree/master/python/sklearnserver) | [Pickled Model](https://scikit-learn.org/stable/modules/model_persistence.html) | v1 | :heavy_check_mark: | -- | 1.0.1 | [SKLearn Iris](./sklearn/v1) |
@@ -76,7 +76,7 @@ class ImageTransformer(kserve.Model):
        return {"predictions": response.as_numpy("OUTPUT__0").tolist()}
```

Please see the code example [here](https://github.com/kserve/kserve/tree/release-0.8/python/custom_transformer).
Please see the code example [here](https://github.com/kserve/kserve/tree/release-0.9/python/custom_transformer).

### Transformer Server Entrypoint
For a single model, you simply create a transformer object and register it with the model server, as sketched below.
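Here is a minimal entrypoint sketch for that case; the `--predictor_host` flag, the `image_transformer` module name, and the import path are illustrative assumptions rather than the exact names used in the shipped sample.

```python
import argparse

import kserve

# Hypothetical module/class names; adjust to wherever your transformer class lives.
from image_transformer import ImageTransformer

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_name", default="cifar10",
                        help="Name the transformer registers with the model server.")
    parser.add_argument("--predictor_host", required=True,
                        help="host:port of the predictor this transformer calls.")
    args, _ = parser.parse_known_args()

    # Register the single transformer model and start the server.
    transformer = ImageTransformer(args.model_name, predictor_host=args.predictor_host)
    kserve.ModelServer().start([transformer])
```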
117 changes: 109 additions & 8 deletions docs/modelserving/v1beta1/triton/torchscript/README.md
@@ -219,6 +219,106 @@ Apply the gRPC `InferenceService` yaml and then you can call the model with `tri
kubectl apply -f torchscript_grpc.yaml
```


### Run a prediction with grpcurl

After the gRPC `InferenceService` becomes ready, [grpcurl](https://github.com/fullstorydev/grpcurl) can be used to send gRPC requests to the `InferenceService`.

```bash
# download the proto file
curl -O https://raw.github.com/kserve/kserve/master/docs/predict-api/v2/grpc_predict_v2.proto

# download the input json file
curl -O https://raw.github.com/kserve/website/triton-grpc/docs/modelserving/v1beta1/triton/torchscript/input-grpc.json

INPUT_PATH=input-grpc.json
PROTO_FILE=grpc_predict_v2.proto
SERVICE_HOSTNAME=$(kubectl get inferenceservice torchscript-cifar10 -o jsonpath='{.status.url}' | cut -d "/" -f 3)
```

The gRPC APIs follow the KServe [prediction V2 protocol](https://github.com/kserve/kserve/tree/master/docs/predict-api/v2).

For example, the `ServerReady` API can be used to check if the server is ready:

```bash
grpcurl \
-plaintext \
-proto ${PROTO_FILE} \
-H "Host: ${SERVICE_HOSTNAME}" \
${INGRESS_HOST}:${INGRESS_PORT} \
inference.GRPCInferenceService.ServerReady
```

Expected Output
```json
{
"ready": true
}
```

The `ModelInfer` API takes input following the `ModelInferRequest` schema defined in the `grpc_predict_v2.proto` file. Note that the input file differs from the one used in the previous `curl` example.

```bash
grpcurl \
-vv \
-plaintext \
-proto ${PROTO_FILE} \
-H "Host: ${SERVICE_HOSTNAME}" \
-d @ \
${INGRESS_HOST}:${INGRESS_PORT} \
inference.GRPCInferenceService.ModelInfer \
<<< $(cat "$INPUT_PATH")
```

==** Expected Output **==

```
Resolved method descriptor:
// The ModelInfer API performs inference using the specified model. Errors are
// indicated by the google.rpc.Status returned for the request. The OK code
// indicates success and other codes indicate failure.
rpc ModelInfer ( .inference.ModelInferRequest ) returns ( .inference.ModelInferResponse );

Request metadata to send:
host: torchscript-cifar10.default.example.com

Response headers received:
accept-encoding: identity,gzip
content-type: application/grpc
date: Fri, 12 Aug 2022 01:49:53 GMT
grpc-accept-encoding: identity,deflate,gzip
server: istio-envoy
x-envoy-upstream-service-time: 16

Response contents:
{
"modelName": "cifar10",
"modelVersion": "1",
"outputs": [
{
"name": "OUTPUT__0",
"datatype": "FP32",
"shape": [
"1",
"10"
]
}
],
"rawOutputContents": [
"wCwGwOJLDL7icgK/dusyQAqAD799KP8/In2QP4zAs7+WuRk/2OoHwA=="
]
}

Response trailers received:
(empty)
Sent 1 request and received 1 response
```

The content of the output tensor is encoded in the `rawOutputContents` field. It can be base64-decoded and loaded into a NumPy array with the given datatype and shape.
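For instance, the single FP32 output shown above can be recovered along these lines (a short sketch; the encoded string and shape are copied from the response above):

```python
import base64

import numpy as np

# Copied from the ModelInfer response above: one FP32 tensor of shape [1, 10].
raw = "wCwGwOJLDL7icgK/dusyQAqAD799KP8/In2QP4zAs7+WuRk/2OoHwA=="
shape = (1, 10)

# rawOutputContents holds the raw tensor bytes; decode and reinterpret as
# float32 (assuming the platform's native little-endian layout).
logits = np.frombuffer(base64.b64decode(raw), dtype=np.float32).reshape(shape)
print(logits)
```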

Alternatively, Triton also provides a [Python client library](https://pypi.org/project/tritonclient/) with many [examples](https://github.com/triton-inference-server/client/tree/main/src/python/examples) showing how to interact with the KServe V2 gRPC protocol.
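A short sketch of that route, assuming the predictor's gRPC port has been made reachable locally (for example with `kubectl port-forward`) so no ingress host header is needed; `OUTPUT__0` matches the response above, while the `INPUT__0` name, port, and input shape are assumptions taken from the cifar10 TorchScript example.

```python
import numpy as np
import tritonclient.grpc as grpcclient

# Assumes the gRPC port is forwarded to localhost:8001 (Triton's default);
# adjust the URL for your setup.
client = grpcclient.InferenceServerClient(url="localhost:8001")

# One 3x32x32 CIFAR-10 image as FP32, matching the assumed INPUT__0 tensor.
image = np.zeros((1, 3, 32, 32), dtype=np.float32)
infer_input = grpcclient.InferInput("INPUT__0", list(image.shape), "FP32")
infer_input.set_data_from_numpy(image)

# The client handles the V2 protocol details, including output reshaping.
result = client.infer(model_name="cifar10", inputs=[infer_input])
print(result.as_numpy("OUTPUT__0"))
```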


## Add Transformer to the InferenceService

`Triton Inference Server` expects tensors as input data, so a pre-processing step is often required before making the prediction call.
@@ -227,9 +327,10 @@ User is responsible to create a python class which extends from KServe `Model` base
The user is responsible for creating a Python class which extends the KServe `Model` base class: the `preprocess` handler converts the raw input format to tensor format according to the V2 prediction protocol, and the `postprocess` handler converts the raw prediction response into a more user-friendly response.

### Implement pre/post processing functions
```python

```python title="image_transformer_v2.py"
import kserve
from typing import List, Dict
from typing import Dict
from PIL import Image
import torchvision.transforms as transforms
import logging
@@ -253,10 +354,11 @@ def image_transform(instance):
    return res.tolist()


class ImageTransformer(kserve.Model):
    def __init__(self, name: str, predictor_host: str):
class ImageTransformerV2(kserve.Model):
    def __init__(self, name: str, predictor_host: str, protocol: str):
        super().__init__(name)
        self.predictor_host = predictor_host
        self.protocol = protocol

    def preprocess(self, inputs: Dict) -> Dict:
        return {
@@ -271,11 +373,10 @@ class ImageTransformer(kserve.Model):
        }

    def postprocess(self, results: Dict) -> Dict:
        # Here we reshape the data because Triton always returns a flattened 1D array as JSON unless binary output is explicitly requested.
        # Since we are not using the Triton Python client library, which takes care of the reshape, it is up to the user to reshape the returned tensor.
        return {output["name"]: np.array(output["data"]).reshape(output["shape"]) for output in results["outputs"]}
        return {output["name"]: np.array(output["data"]).reshape(output["shape"]).tolist()
                for output in results["outputs"]}
```
Please find [the code example](https://github.com/kserve/kserve/tree/release-0.8/docs/samples/v1beta1/triton/torchscript/image_transformer_v2) and [Dockerfile](https://github.com/kserve/kserve/blob/release-0.8/docs/samples/v1beta1/triton/torchscript/transformer.Dockerfile).
Please find [the code example](https://github.com/kserve/kserve/tree/release-0.9/docs/samples/v1beta1/triton/torchscript/image_transformer_v2) and [Dockerfile](https://github.com/kserve/kserve/blob/release-0.9/docs/samples/v1beta1/triton/torchscript/transformer.Dockerfile).

### Build Transformer docker image