Hedging: Adds Read hedging PREVIEW contracts #4598

Merged

NaluTripician
Contributor

@NaluTripician NaluTripician commented Jul 22, 2024

Pull Request Template

Description

Adds PREVIEW contracts for request hedging

Parallel Hedging APIs + Samples

When building a new CosmosClient, there is an option to include parallel hedging in that client.

CosmosClient client = new CosmosClientBuilder("connection string")
    .WithApplicationPreferredRegions(
        new List<string> { "East US", "Central US", "West US" })
    .WithAvailabilityStrategy(
        AvailabilityStrategy.CrossRegionHedgingStrategy(
            threshold: TimeSpan.FromMilliseconds(500),
            thresholdStep: TimeSpan.FromMilliseconds(100)))
    .Build();

or

CosmosClientOptions options = new CosmosClientOptions()
{
    AvailabilityStrategy = AvailabilityStrategy.CrossRegionHedgingStrategy(
        threshold: TimeSpan.FromMilliseconds(500),
        thresholdStep: TimeSpan.FromMilliseconds(100)),
    ApplicationPreferredRegions = new List<string>() { "East US", "West US", "Central US" },
};

CosmosClient client = new CosmosClient(
    accountEndpoint: "account endpoint",
    authKeyOrResourceToken: "auth key or resource token",
    clientOptions: options);

The examples above create a CosmosClient instance with an AvailabilityStrategy enabled at a 500ms threshold. This means that if a request takes longer than 500ms, the SDK will send a new request to the backend, following the order of the preferred regions list. If neither ApplicationRegion nor the ApplicationPreferredRegions list is set, an AvailabilityStrategy cannot be set. If no response has come back from the first hedge or the primary request after the step time, another parallel request is made to the next region. The SDK then returns the first response that comes back from the backend. The threshold parameter is required and can be set to any value greater than 0. The client-level AvailabilityStrategy can also be overridden at the request level by setting the AvailabilityStrategy on the RequestOptions object.

Note: ApplicationRegion or ApplicationPreferredRegions MUST be set to use Hedging
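
To make the timing concrete, here is a minimal sketch of when each request would be sent if no response ever arrives. `HedgeSchedule` and `Compute` are hypothetical names used only for illustration and are not part of the SDK. The sketch also assumes the first preferred region receives the primary request and the remaining regions receive hedges in list order.

```csharp
using System;
using System.Collections.Generic;

public static class HedgeSchedule
{
    // Hypothetical helper, not an SDK API. Returns the latest time (relative to the
    // start of the operation) at which each region's request is sent, assuming no
    // response comes back in the meantime.
    public static List<(string Region, TimeSpan SendAfter)> Compute(
        IReadOnlyList<string> preferredRegions,
        TimeSpan threshold,
        TimeSpan thresholdStep)
    {
        List<(string Region, TimeSpan SendAfter)> schedule =
            new List<(string Region, TimeSpan SendAfter)>();

        for (int i = 0; i < preferredRegions.Count; i++)
        {
            // Primary request goes out immediately; the first hedge after `threshold`;
            // each later hedge one `thresholdStep` after the previous hedge.
            TimeSpan offset = i == 0
                ? TimeSpan.Zero
                : threshold + TimeSpan.FromTicks(thresholdStep.Ticks * (i - 1));
            schedule.Add((preferredRegions[i], offset));
        }

        return schedule;
    }
}
```

With the values from the samples above (threshold 500ms, step 100ms, regions East US, Central US, West US), this gives East US at 0ms, Central US at 500ms, and West US at 600ms. A non-final response can move a hedge earlier than these times, as described under Design.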

Override AvailabilityStrategy:

// Send one request out with a more aggressive threshold
ItemRequestOptions requestOptions = new ItemRequestOptions()
{
    AvailabilityStrategy = AvailabilityStrategy.CrossRegionHedgingStrategy(
        threshold: TimeSpan.FromMilliseconds(100),
        thresholdStep: TimeSpan.FromMilliseconds(50))
};

Hedging can be enabled for all read requests: ReadItem, Queries (single and cross partition), ReadMany, and ChangeFeed. It is not enabled for write requests.

Diagnostics

When this feature is in use, the diagnostics data contains two new areas of note: Response Region and Hedge Context. Response Region shows the region that ultimately served the request. Hedge Context shows all the regions that requests were sent to.

Design

The SDK will send the first request to the primary region. If there is no response from the backend before the threshold time, the SDK will begin sending hedged requests to the regions in order of the ApplicationPreferredRegions list. After the first hedged request is sent out, the remaining hedged requests are fired off one by one, each after waiting the time specified in the threshold step. Once a response is received from one of the requests, the availability strategy checks whether the result is considered final. If the result is final, it is returned. If not, the SDK skips the remaining threshold/threshold step time and sends out the next hedged request. If all hedged requests have been sent out and no final response is received, the SDK returns the last response it received.

The AvailabilityStrategy operates at the RequestInvokerHandler level, meaning that each hedged request goes through its own handler pipeline, including the ClientRetryPolicy. As a result, the hedged requests are retried independently of each other. Note that each hedged request is restricted to the region it was sent to, so no cross-region retries are made, only local retries. The primary request is retried as normal.
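
The flow described above can be sketched in plain C# without any SDK dependency. This is an illustrative approximation, not the SDK's implementation: `HedgingSketch`, `ExecuteAsync`, `send`, and `isFinal` are hypothetical names, retries are left out, and an exception from any request simply propagates to the caller.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class HedgingSketch
{
    // Hypothetical helper, not an SDK API. regions[0] is treated as the primary region.
    public static async Task<T> ExecuteAsync<T>(
        IReadOnlyList<string> regions,
        Func<string, CancellationToken, Task<T>> send,
        Func<T, bool> isFinal,
        TimeSpan threshold,
        TimeSpan thresholdStep)
    {
        if (regions.Count == 0)
            throw new ArgumentException("At least one region is required.", nameof(regions));

        using (CancellationTokenSource cts = new CancellationTokenSource())
        {
            try
            {
                List<Task<T>> inFlight = new List<Task<T>>();
                T last = default(T);

                for (int i = 0; i < regions.Count; i++)
                {
                    inFlight.Add(send(regions[i], cts.Token));
                    bool isLastRegion = i == regions.Count - 1;

                    // Wait `threshold` after the primary and `thresholdStep` after each
                    // hedge. No timer is needed once every region has a request in flight.
                    Task timer = isLastRegion
                        ? null
                        : Task.Delay(i == 0 ? threshold : thresholdStep, cts.Token);

                    while (inFlight.Count > 0)
                    {
                        List<Task> waitOn = new List<Task>(inFlight);
                        if (timer != null) waitOn.Add(timer);

                        Task completed = await Task.WhenAny(waitOn);
                        if (completed == timer) break; // time is up: send the next hedge

                        Task<T> done = (Task<T>)completed;
                        inFlight.Remove(done);
                        last = await done;

                        if (isFinal(last)) return last; // first final response wins
                        if (!isLastRegion) break;       // non-final: hedge right away
                    }
                }

                // Every request finished without a final result: return the last one.
                return last;
            }
            finally
            {
                cts.Cancel(); // stop any requests and timers still running
            }
        }
    }
}
```

A caller would pass a `send` delegate that issues the read against a single region and an `isFinal` predicate based on the status codes listed below. Using `Task.WhenAny` over the in-flight requests plus a delay timer keeps the "wait for a response or for the step to elapse" decision in one place.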

Status Codes SDK Will Consider Final

| Status Code | Description |
| --- | --- |
| 1xx | 1xx status codes are considered final |
| 2xx | 2xx status codes are considered final |
| 3xx | 3xx status codes are considered final |
| 400 | Bad Request |
| 401 | Unauthorized |
| 404/0 | Not Found. 404/0 responses are final results because the document was not yet available after enforcing the consistency model |
| 409 | Conflict |
| 405 | Method Not Allowed |
| 412 | Precondition Failed |
| 413 | Request Entity Too Large |

All other status codes are treated as possible transient errors and will be retried with hedging.
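
As a rough illustration, the list above can be expressed as a single predicate. `HedgingFinality` and `IsFinalResult` are hypothetical names, not the SDK's internal code. The sketch assumes that a 404 with a non-zero sub-status code is not final, since only 404/0 is listed.

```csharp
public static class HedgingFinality
{
    // Hypothetical helper mirroring the status code list above.
    public static bool IsFinalResult(int statusCode, int subStatusCode)
    {
        // All 1xx, 2xx, and 3xx responses are final.
        if (statusCode >= 100 && statusCode < 400)
        {
            return true;
        }

        switch (statusCode)
        {
            case 400: // Bad Request
            case 401: // Unauthorized
            case 405: // Method Not Allowed
            case 409: // Conflict
            case 412: // Precondition Failed
            case 413: // Request Entity Too Large
                return true;
            case 404: // Not Found is final only with sub-status 0 (assumption for other sub-statuses)
                return subStatusCode == 0;
            default:  // Everything else is treated as possibly transient
                return false;
        }
    }
}
```

For example, `IsFinalResult(200, 0)` and `IsFinalResult(404, 0)` return true, while `IsFinalResult(503, 0)` returns false and would let hedging continue.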

Example Flow For Cross Region Hedging With 3 Regions

graph TD
    A[RequestMessage] <--> B[RequestInvokerHandler]
    B <--> C[CrossRegionHedgingStrategy]
    C --> E(PrimaryRequest)
    E --> F{time spent < threshold}

    F -- No --> I
    F -- Yes --> G[[Wait for response]]
    G -- Response --> H{Is Response Final}
    H -- Yes --> C
    H -- No --> I(Hedge Request 1)
    
    I --> J{time spent < threshold step}

    J -- No --> K(Hedge Request 2)
    J -- Yes --> M[[Wait for response]]
    M -- Response --> N{Is Response Final}
    N -- Yes --> C
    N -- No --> K

    K --> O[[Wait for response]]
    O -- Response --> P{Is Response Final}
    P -- Yes --> C
    P -- No, But this is the final hedge request --> C

    

@NaluTripician NaluTripician self-assigned this Jul 22, 2024
@NaluTripician NaluTripician marked this pull request as ready for review July 22, 2024 21:41
kundadebdatta
kundadebdatta previously approved these changes Aug 9, 2024
Member

@kundadebdatta kundadebdatta left a comment

LGTM.

@kirankumarkolli
Member

Update PR description with

  • API contracts.
  • Sample usage
  • Any troubleshooting guides/references

@NaluTripician NaluTripician changed the title [Internal] Hedging: Adds PREVIEW contracts Hedging: Adds PREVIEW contracts Aug 9, 2024
Member

@ealsur ealsur left a comment

Please check accessors on methods

@kirankumarkolli kirankumarkolli changed the title Hedging: Adds PREVIEW contracts Hedging: Adds Read hedging PREVIEW contracts Aug 15, 2024
@kirankumarkolli
Member

PR description builder pattern needs to include ApplicationRegion or PreferredRegions

Contributor

@philipthomas-MSFT philipthomas-MSFT left a comment

LGTM. No major qualms.

@microsoft-github-policy-service microsoft-github-policy-service bot merged commit d242b85 into master Aug 29, 2024
21 checks passed
@microsoft-github-policy-service microsoft-github-policy-service bot deleted the users/nalutripician/hedgingPreviewAPIs branch August 29, 2024 13:59
Labels
auto-merge Enables automation to merge PRs
Projects
Status: Done
5 participants