This demo (based on @greyovo's Jupyter notebook) shows the inference results for the same text & image input across different models, including the original CLIP model and the ONNX quantized models. The test results on my local machine are as follows:

| Model | Result |
| ----- | ------ |
| CLIP | `[[6.1091479e-02 9.3267566e-01 5.3717378e-03 8.6108845e-04]]` |
| clip-image-encoder.onnx & clip-text-encoder.onnx | `[[6.1091259e-02 9.3267584e-01 5.3716768e-03 8.6109847e-04]]` |
| clip-image-encoder-quant-int8.onnx & clip-text-encoder-quant-int8.onnx | `[[4.703762e-02 9.391219e-01 9.90335e-03 3.93698e-03]]` |

The test input text is `["a tiger", "a cat", "a dog", "a bear"]` and the test image is as follows:
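
For reference, below is a minimal sketch of how such a comparison could be reproduced with the `clip` and `onnxruntime` packages. It is not the demo notebook itself: the image path, the ONNX input names, and the exact preprocessing are assumptions, and the encoder file names simply match the table above (swap in the `*-quant-int8.onnx` files to test the quantized models).

```python
# Sketch only: compare original CLIP probabilities with the exported ONNX encoders.
# Paths, input names, and dtypes are assumptions; adjust to your exported models.
import numpy as np
import onnxruntime as ort
import torch
import clip
from PIL import Image

labels = ["a tiger", "a cat", "a dog", "a bear"]
image_path = "test.jpg"  # placeholder path for the test image

# --- Original CLIP ---
model, preprocess = clip.load("ViT-B/32", device="cpu")
image = preprocess(Image.open(image_path)).unsqueeze(0)
text = clip.tokenize(labels)
with torch.no_grad():
    logits_per_image, _ = model(image, text)
    probs_clip = logits_per_image.softmax(dim=-1).numpy()

# --- ONNX encoders ---
img_sess = ort.InferenceSession("clip-image-encoder.onnx")
txt_sess = ort.InferenceSession("clip-text-encoder.onnx")
img_feat = img_sess.run(None, {img_sess.get_inputs()[0].name: image.numpy()})[0]
# The text encoder may expect int32 or int64 token ids; cast as needed.
txt_feat = txt_sess.run(None, {txt_sess.get_inputs()[0].name: text.numpy().astype(np.int64)})[0]

# Reproduce CLIP's similarity computation: L2-normalize, scale, softmax.
img_feat = img_feat / np.linalg.norm(img_feat, axis=-1, keepdims=True)
txt_feat = txt_feat / np.linalg.norm(txt_feat, axis=-1, keepdims=True)
logits = 100.0 * img_feat @ txt_feat.T  # ~100 is CLIP's learned logit scale after training
probs_onnx = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)

print("CLIP:", probs_clip)
print("ONNX:", probs_onnx)
```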