diff --git a/provider/docs/tutorial/notebooks/01-register-features.ipynb b/provider/docs/tutorial/notebooks/01-register-features.ipynb index a9b229e..307bbb4 100644 --- a/provider/docs/tutorial/notebooks/01-register-features.ipynb +++ b/provider/docs/tutorial/notebooks/01-register-features.ipynb @@ -2,6 +2,7 @@ "cells": [ { "cell_type": "markdown", + "metadata": {}, "source": [ "Copyright (c) Microsoft Corporation.\n", "Licensed under the MIT license.\n", @@ -12,64 +13,45 @@ "\n", "## Configure Feature Repo\n", "\n", - "The cell below connects to your feature store. The `registry_blob_url` should point to the location on blob where you want your feature repsository to be stored." - ], - "metadata": {} + "The cell below connects to your feature store. __You need to update the feature_repo/feature_store.yaml file so that the registry path points to your blob location__" + ] }, { "cell_type": "code", "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ - "from feast import FeatureStore, RepoConfig\n", - "from feast.registry import RegistryConfig\n", - "from feast_azure_provider.mssqlserver import MsSqlServerOfflineStoreConfig\n", - "from feast.infra.online_stores.redis import RedisOnlineStoreConfig\n", + "import os\n", + "from feast import FeatureStore\n", "from azureml.core import Workspace\n", "\n", - "# update this to your location on blob\n", - "registry_blob_url = \"https://.blob.core.windows.net///registry.db\"\n", - "\n", "# access key vault to get secrets\n", "ws = Workspace.from_config()\n", "kv = ws.get_default_keyvault()\n", "\n", "# update with your connection string\n", - "offline_conn_str = kv.get_secret(\"FEAST-SQL-CONN\")\n", - "online_conn_str = kv.get_secret(\"FEAST-REDIS-CONN\")\n", - "\n", - "# set RegistryConfig\n", - "reg_config = RegistryConfig(\n", - " registry_store_type=\"feast_azure_provider.registry_store.AzBlobRegistryStore\",\n", - " path=registry_blob_url,\n", - ")\n", - "\n", - "# set RepoConfig\n", - "repo_cfg = 
RepoConfig(\n", - " registry=reg_config,\n", - " project=\"production\",\n", - " provider=\"feast_azure_provider.azure_provider.AzureProvider\",\n", - " offline_store=MsSqlServerOfflineStoreConfig(connection_string=offline_conn_str),\n", - " online_store=RedisOnlineStoreConfig(connection_string=online_conn_str),\n", - ")\n", + "os.environ['SQL_CONN']=kv.get_secret(\"FEAST-SQL-CONN\")\n", + "os.environ['REDIS_CONN']=kv.get_secret(\"FEAST-REDIS-CONN\")\n", "\n", "# connect to feature store\n", - "store = FeatureStore(config=repo_cfg)" - ], - "outputs": [], - "metadata": {} + "fs = FeatureStore(\"./feature_repo\")" + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "## Define the data source (offline store)\n", "\n", "The data source refers to raw underlying data (a table in Azure SQL DB or Synapse SQL). Feast uses a time-series data model to represent data. This data model is used to interpret feature data in data sources in order to build training datasets or when materializing features into an online store." - ], - "metadata": {} + ] }, { "cell_type": "code", "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "from feast_azure_provider.mssqlserver_source import MsSqlServerSource\n", "\n", @@ -88,12 +70,11 @@ " event_timestamp_column=\"datetime\",\n", " created_timestamp_column=\"\",\n", ")" - ], - "outputs": [], - "metadata": {} + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "## Define Feature Views\n", "\n", @@ -106,12 +87,13 @@ "- Retrieval of features from the online store. Feature views provide the schema definition to Feast in order to look up features from the online store.\n", "\n", "__NOTE: Feast does not generate feature values. It acts as the ingestion and serving system. 
The data sources described within feature views should reference feature values in their already computed form.__" - ], - "metadata": {} + ] }, { "cell_type": "code", "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "from feast import Feature, FeatureView, ValueType\n", "from datetime import timedelta\n", @@ -139,12 +121,11 @@ " batch_source=customer_source,\n", " ttl=timedelta(days=2),\n", ")" - ], - "outputs": [], - "metadata": {} + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "# Define entities\n", "\n", @@ -158,22 +139,22 @@ "A related concept is an entity key. These are one or more entity values that uniquely describe a feature view record. In the case of an entity (like a driver) that only has a single entity field, the entity is an entity key. However, it is also possible for an entity key to consist of multiple entity values. For example, a feature view with the composite entity of (customer, country) might have an entity key of (1001, 5).\n", "\n", "Entity keys act as primary keys. They are used during the lookup of features from the online store, and they are also used to match feature rows across feature views during point-in-time joins." - ], - "metadata": {} + ] }, { "cell_type": "code", "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "from feast import Entity\n", "driver = Entity(name=\"driver\", join_key=\"driver_id\", value_type=ValueType.INT64)\n", "customer = Entity(name=\"customer_id\", value_type=ValueType.INT64)" - ], - "outputs": [], - "metadata": {} + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "## Feast `apply()`\n", "\n", @@ -183,41 +164,40 @@ "1. Feast will validate your feature definitions\n", "1. Feast will sync the metadata about Feast objects to the registry. If a registry does not exist, then it will be instantiated. The standard registry is a simple protobuf binary file that is stored on Azure Blob Storage.\n", "1. 
Feast CLI will create all necessary feature store infrastructure. The exact infrastructure that is deployed or configured depends on the provider configuration that you have set in feature_store.yaml." - ], - "metadata": {} + ] }, { "cell_type": "code", "execution_count": null, - "source": [ - "store.apply([driver, driver_fv, customer, customer_fv])" - ], + "metadata": {}, "outputs": [], - "metadata": {} + "source": [ + "fs.apply([driver, driver_fv, customer, customer_fv])" + ] } ], "metadata": { - "orig_nbformat": 4, + "interpreter": { + "hash": "1f420f8439dfed2bd66fe971ededeecdddcec354e785e62812183e5ad86a193f" + }, + "kernelspec": { + "display_name": "Python 3.8.11 64-bit ('feast-dev': conda)", + "name": "python3" + }, "language_info": { - "name": "python", - "version": "3.8.11", - "mimetype": "text/x-python", "codemirror_mode": { "name": "ipython", "version": 3 }, - "pygments_lexer": "ipython3", + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", "nbconvert_exporter": "python", - "file_extension": ".py" - }, - "kernelspec": { - "name": "python3", - "display_name": "Python 3.8.11 64-bit ('feast-dev': conda)" + "pygments_lexer": "ipython3", + "version": "3.8.11" }, - "interpreter": { - "hash": "1f420f8439dfed2bd66fe971ededeecdddcec354e785e62812183e5ad86a193f" - } + "orig_nbformat": 4 }, "nbformat": 4, "nbformat_minor": 2 -} \ No newline at end of file +} diff --git a/provider/docs/tutorial/notebooks/02-train-and-deploy-with-feast.ipynb b/provider/docs/tutorial/notebooks/02-train-and-deploy-with-feast.ipynb index 75a24b9..dc1aa2e 100644 --- a/provider/docs/tutorial/notebooks/02-train-and-deploy-with-feast.ipynb +++ b/provider/docs/tutorial/notebooks/02-train-and-deploy-with-feast.ipynb @@ -2,6 +2,13 @@ "cells": [ { "cell_type": "markdown", + "metadata": { + "nteract": { + "transient": { + "deleting": false + } + } + }, "source": [ "Copyright (c) Microsoft Corporation. Licensed under the MIT license.\n", "\n", @@ -14,127 +21,115 @@ "1. 
train a model using the offline store (using the feast function `get_historical_features()`)\n", "1. use the feast `materialize()` function to push features from the offline store to an online store (redis)\n", "1. Deploy the model to an Azure ML endpoint where the features are consumed from the online store (feast function `get_online_features()`)" - ], - "metadata": { - "nteract": { - "transient": { - "deleting": false - } - } - } + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "## Connect to Feature store\n", "\n", "Below we create a Feast repository config, which accesses the registry.db file and also provides the credentials to the offline and online storage.\n", "\n", - "__NOTE: You will need to provide the registry location on your blob storage__" - ], - "metadata": {} + "__You need to update the feature_repo/feature_store.yaml file so that the registry path points to your blob location__" + ] }, { "cell_type": "code", "execution_count": null, - "source": [ - "import os\n", - "from feast import FeatureStore, RepoConfig\n", - "from feast.registry import RegistryConfig\n", - "from feast_azure_provider.mssqlserver import MsSqlServerOfflineStoreConfig\n", - "from feast.infra.online_stores.redis import RedisOnlineStoreConfig\n", - "from azureml.core import Workspace\n", - "\n", - "ws = Workspace.from_config()\n", - "keyvault = ws.get_default_keyvault()\n", - "\n", - "# update this to your location on blob\n", - "feast_registry_path = \"https://.blob.core.windows.net///registry.db\"\n", - "\n", - "sql_conn_str = keyvault.get_secret('FEAST-SQL-CONN')\n", - "redis_conn_str = keyvault.get_secret('FEAST-REDIS-CONN')\n", - "orders_table = \"orders\"\n", - "driver_hourly_table = \"driver_hourly\"\n", - "customer_profile_table = \"customer_profile\"\n", - "\n", - "reg_config = RegistryConfig(\n", - " registry_store_type=\"feast_azure_provider.registry_store.AzBlobRegistryStore\",\n", - " path=feast_registry_path,\n", - ")\n", - "\n", - "repo_cfg = 
RepoConfig(\n", - " project = \"production\",\n", - " provider = \"feast_azure_provider.azure_provider.AzureProvider\",\n", - " registry = reg_config,\n", - " offline_store = MsSqlServerOfflineStoreConfig(connection_string=sql_conn_str),\n", - " online_store = RedisOnlineStoreConfig(connection_string=redis_conn_str)\n", - " )\n", - "\n", - "store = FeatureStore(config=repo_cfg)\n" - ], - "outputs": [], "metadata": { "collapsed": true, + "gather": { + "logged": 1627130565121 + }, "jupyter": { - "source_hidden": false, - "outputs_hidden": false + "outputs_hidden": false, + "source_hidden": false }, "nteract": { "transient": { "deleting": false } - }, - "gather": { - "logged": 1627130565121 } - } + }, + "outputs": [], + "source": [ + "import os\n", + "from feast import FeatureStore\n", + "from azureml.core import Workspace\n", + "\n", + "# access key vault to get secrets\n", + "ws = Workspace.from_config()\n", + "kv = ws.get_default_keyvault()\n", + "os.environ['SQL_CONN']=kv.get_secret(\"FEAST-SQL-CONN\")\n", + "os.environ['REDIS_CONN']=kv.get_secret(\"FEAST-REDIS-CONN\")\n", + "\n", + "# connect to feature store\n", + "fs = FeatureStore(\"./feature_repo\")" + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "### List the feature views\n", "\n", "Below lists the registered feature views." - ], - "metadata": {} + ] }, { "cell_type": "code", "execution_count": null, - "source": [ - "store.list_feature_views()" - ], + "metadata": {}, "outputs": [], - "metadata": {} + "source": [ + "fs.list_feature_views()" + ] }, { "cell_type": "markdown", - "source": [ - "## Load features into a pandas dataframe\n", - "\n", - "Below you load the features from the feature store into a pandas data frame." 
- ], "metadata": { "collapsed": true, + "gather": { + "logged": 1627130724228 + }, "jupyter": { - "source_hidden": false, - "outputs_hidden": false + "outputs_hidden": false, + "source_hidden": false }, "nteract": { "transient": { "deleting": false } - }, - "gather": { - "logged": 1627130724228 } - } + }, + "source": [ + "## Load features into a pandas dataframe\n", + "\n", + "Below you load the features from the feature store into a pandas data frame." + ] }, { "cell_type": "code", "execution_count": null, + "metadata": { + "collapsed": true, + "gather": { + "logged": 1626933777036 + }, + "jupyter": { + "outputs_hidden": false, + "source_hidden": false + }, + "nteract": { + "transient": { + "deleting": false + } + } + }, + "outputs": [], "source": [ - "sql_job = store.get_historical_features(\n", + "sql_job = fs.get_historical_features(\n", " entity_df=\"SELECT * FROM orders\",\n", " features=[\n", " \"driver_stats:conv_rate\",\n", @@ -148,40 +143,26 @@ "\n", "training_df = sql_job.to_df()\n", "training_df.head()" - ], - "outputs": [], - "metadata": { - "collapsed": true, - "jupyter": { - "source_hidden": false, - "outputs_hidden": false - }, - "nteract": { - "transient": { - "deleting": false - } - }, - "gather": { - "logged": 1626933777036 - } - } + ] }, { "cell_type": "markdown", - "source": [ - "## Train a model and capture metrics with MLFlow" - ], "metadata": { "nteract": { "transient": { "deleting": false } } - } + }, + "source": [ + "## Train a model and capture metrics with MLFlow" + ] }, { "cell_type": "code", "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "import mlflow\n", "import numpy as np\n", @@ -211,64 +192,64 @@ "# train the model\n", "with mlflow.start_run() as run:\n", " clf.fit(X_train, y_train)" - ], - "outputs": [], - "metadata": {} + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "## Prepare for deployment\n", "\n", "### Register model and the feature registry " - ], - "metadata": {} + ] }, { 
"cell_type": "code", "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "# register the model\n", "model_uri = \"runs:/{}/model\".format(run.info.run_id)\n", "model = mlflow.register_model(model_uri, \"order_model\")" - ], - "outputs": [], - "metadata": {} + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "### `materialize()` data into the online store (redis)" - ], - "metadata": {} + ] }, { "cell_type": "code", "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "from datetime import datetime, timedelta\n", "\n", "end_date = datetime.now()\n", "start_date = end_date - timedelta(days=365)\n", - "store.materialize(start_date=start_date, end_date=end_date)" - ], - "outputs": [], - "metadata": {} + "fs.materialize(start_date=start_date, end_date=end_date)" + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "## Set up deployment configuration\n", "\n", "__Note: You will need to set up a service principal (SP) and add that SP to your blob storage account as a *Storage Blob Data Contributor* role to authenticate to the storage containing the feast registry file.__\n", "\n", "Once you have set up the SP, populate the `AZURE_CLIENT_ID`, `AZURE_TENANT_ID`, `AZURE_CLIENT_SECRET` environment variables below." 
- ], - "metadata": {} + ] }, { "cell_type": "code", "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "from azureml.core.environment import Environment\n", "from azureml.core.webservice import AciWebservice\n", @@ -289,29 +270,29 @@ "\n", "# again ensure that the scoring environment has access to the registry file\n", "env.environment_variables = {\n", - " \"FEAST_SQL_CONN\": sql_conn_str,\n", - " \"FEAST_REDIS_CONN\": redis_conn_str,\n", - " \"FEAST_REGISTRY_BLOB\": feast_registry_path,\n", + " \"FEAST_SQL_CONN\": fs.config.offline_store.connection_string,\n", + " \"FEAST_REDIS_CONN\": fs.config.online_store.connection_string,\n", + " \"FEAST_REGISTRY_BLOB\": fs.config.registry.path,\n", " \"AZURE_CLIENT_ID\": \"\",\n", " \"AZURE_TENANT_ID\": \"\",\n", " \"AZURE_CLIENT_SECRET\": \"\"\n", "}" - ], - "outputs": [], - "metadata": {} + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "## Deploy model\n", "\n", "Next, you deploy the model to Azure Container Instance. Please note that this may take approximately 10 minutes." - ], - "metadata": {} + ] }, { "cell_type": "code", "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "import uuid\n", "from azureml.core.model import InferenceConfig\n", @@ -339,55 +320,53 @@ ")\n", "\n", "service.wait_for_deployment(show_output=True)" - ], - "outputs": [], - "metadata": {} + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "## Test service\n", "\n", "Below you test the service. The first score takes a while as the feast registry file is downloaded from blob. Subsequent runs will be faster as feast uses a local cache for the registry." 
- ], - "metadata": {} + ] }, { "cell_type": "code", "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "import json\n", "\n", "input_payload = json.dumps({\"driver\":50521, \"customer_id\":20265})\n", "\n", "service.run(input_data=input_payload)" - ], - "outputs": [], - "metadata": {} + ] } ], "metadata": { + "interpreter": { + "hash": "1f420f8439dfed2bd66fe971ededeecdddcec354e785e62812183e5ad86a193f" + }, + "kernel_info": { + "name": "newenv" + }, "kernelspec": { - "name": "python3", - "display_name": "Python 3.8.11 64-bit ('feast-test': conda)" + "display_name": "Python 3.8.11 64-bit ('feast-dev': conda)", + "name": "python3" }, "language_info": { - "name": "python", - "version": "3.8.11", - "mimetype": "text/x-python", "codemirror_mode": { "name": "ipython", "version": 3 }, - "pygments_lexer": "ipython3", + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", "nbconvert_exporter": "python", - "file_extension": ".py" - }, - "kernel_info": { - "name": "newenv" - }, - "nteract": { - "version": "nteract-front-end@1.0.0" + "pygments_lexer": "ipython3", + "version": "3.8.11" }, "microsoft": { "host": { @@ -396,10 +375,10 @@ } } }, - "interpreter": { - "hash": "613d966a601f3cb09e011da323334ad47ef40156c92ff5801f125852831662b0" + "nteract": { + "version": "nteract-front-end@1.0.0" } }, "nbformat": 4, "nbformat_minor": 2 -} \ No newline at end of file +} diff --git a/provider/docs/tutorial/notebooks/feature_repo/feature_store.yaml b/provider/docs/tutorial/notebooks/feature_repo/feature_store.yaml new file mode 100644 index 0000000..e7d30cc --- /dev/null +++ b/provider/docs/tutorial/notebooks/feature_repo/feature_store.yaml @@ -0,0 +1,11 @@ +registry: + registry_store_type: feast_azure_provider.registry_store.AzBlobRegistryStore + path: https://.blob.core.windows.net///registry.db +project: production +provider: feast_azure_provider.azure_provider.AzureProvider +online_store: + type: redis + connection_string: ${REDIS_CONN} 
+offline_store: + type: feast_azure_provider.mssqlserver.MsSqlServerOfflineStore + connection_string: ${SQL_CONN}
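
Note on the `${REDIS_CONN}` and `${SQL_CONN}` placeholders in the new `feature_store.yaml`: the notebooks set these environment variables from Key Vault before calling `FeatureStore("./feature_repo")`, so the ordering matters. The sketch below only illustrates the substitution idea, using the standard-library `os.path.expandvars` as a stand-in. It is not Feast's own loader, and the connection-string values are hypothetical placeholders, not real credentials.

```python
import os

# Hypothetical placeholder values; in the notebooks these come from
# kv.get_secret("FEAST-REDIS-CONN") and kv.get_secret("FEAST-SQL-CONN").
os.environ["REDIS_CONN"] = "localhost:6379,password=example"
os.environ["SQL_CONN"] = "mssql+pyodbc://user:example@server/db"

# Fragment of feature_store.yaml as added in this diff.
yaml_text = """\
online_store:
  type: redis
  connection_string: ${REDIS_CONN}
offline_store:
  type: feast_azure_provider.mssqlserver.MsSqlServerOfflineStore
  connection_string: ${SQL_CONN}
"""

# expandvars replaces ${NAME} with the value of the environment variable NAME.
# Variables that are not set are left in the text unchanged.
resolved = os.path.expandvars(yaml_text)
print(resolved)
```

If either variable is unset when the config is read, the literal `${...}` text would remain in the connection string under this stand-in, which is why the notebook cells export both variables before connecting to the feature store.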