Skip to content

Prioritize system CA over bundled CA#40583

Closed
Kimahriman wants to merge 4 commits intogrpc:masterfrom
Kimahriman:ca-priority
Closed

Prioritize system CA over bundled CA#40583
Kimahriman wants to merge 4 commits intogrpc:masterfrom
Kimahriman:ca-priority

Conversation

@Kimahriman
Copy link
Contributor

Resolves #40580

Currently the priority order of loadings CAs for a client is something like:

  1. The certificate provided in the SSL channel credentials
  2. The value of GRPC_DEFAULT_SSL_ROOTS_FILE_PATH
  3. The result of calling the ssl_roots_override_cb if set
  4. The system CAs unless GRPC_NOT_USE_SYSTEM_SSL_ROOTS is set
  5. CAs shipped with gRPC native install

The problem is the different language clients use ssl_roots_override_cb as the mechanism to use the CAs shipped with gRPC through a language specific packaging mechanism. So Python and other derived languages can never use the system CAs, as ssl_roots_override_cb is always set to something.

This purposes to switch the order of 3 and 4, so system CAs are prioritized over those shipped with gRPC for different language clients. The system CAs are likely more up to date than the custom list shipped with the client. Additionally, companies can have custom CAs automatically installed on corporate systems that should just automatically work for gRPC TLS verification, but don't unless you manually specify the system CA in GRPC_DEFAULT_SSL_ROOTS_FILE_PATH.

The old behavior can be maintained by simply setting GRPC_NOT_USE_SYSTEM_SSL_ROOTS=true.

This was described well in #29682 (comment)

Example

Using examples/python/auth/token_based_auth_*.py in an ubuntu Docker container:

Update channel_credentials to not specify a root certificate:

    channel_credential = grpc.ssl_channel_credentials(
        # _credentials.ROOT_CERTIFICATE
    )

Load the custom CA into the system CA store

cp credentials/root.crt /usr/local/share/ca-certificates/
update-ca-certificates

Run the server and client:

python token_based_auth_server.py &
python token_based_auth_client.py

Expected behavior with this change

This works, successfully verifying the certificate:

INFO:__main__:Received message: message: "Hello, you!"

Actual behavior currently

I0000 00:00:1756469167.513885   36074 ssl_transport_security.cc:1883] Handshake failed with error SSL_ERROR_SSL: error:1000007d:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED: unable to get local issuer certificate
I0000 00:00:1756469167.515520   36075 ssl_transport_security.cc:1883] Handshake failed with error SSL_ERROR_SSL: error:1000007d:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED: unable to get local issuer certificate
ERROR:__main__:Received error: <_InactiveRpcError of RPC that terminated with:
        status = StatusCode.UNAVAILABLE
        details = "failed to connect to all addresses; last error: UNKNOWN: ipv4:127.0.0.1:50051: Ssl handshake failed (TSI_PROTOCOL_FAILURE): SSL_ERROR_SSL: error:1000007d:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED: unable to get local issuer certificate"
        debug_error_string = "UNKNOWN:Error received from peer  {grpc_status:14, grpc_message:"failed to connect to all addresses; last error: UNKNOWN: ipv4:127.0.0.1:50051: Ssl handshake failed (TSI_PROTOCOL_FAILURE): SSL_ERROR_SSL: error:1000007d:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED: unable to get local issuer certificate"}"

@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented Aug 30, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.

Copy link
Contributor

@pawbhard pawbhard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@pawbhard pawbhard added kokoro:run release notes: yes Indicates if PR needs to be in release notes labels Sep 4, 2025
@pawbhard
Copy link
Contributor

pawbhard commented Sep 4, 2025

@gtcooke94 , Please take a look at this PR

Copy link
Contributor

@pawbhard pawbhard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might need to update tests , I see may failure in our CI. I would let @gtcooke94 take a look at this.

Copy link
Contributor

@gtcooke94 gtcooke94 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to think about this more. This represents a significant behavior change to the C code that I don't think we can introduce without breaking semantic versioning. I also don't think we should introduce this change at all (it's intentional that a user setting certificate overrides takes precedence over the default system root cert).

I believe the issue here lies in how the python layering configures certificates, not in the C-Core behavior. Thus, this needs to be fixed in the Python layering. To be specific to your above comments, Python converts (5), the gRPC roots to (3), the override callback.

@matthewstevenson88
Copy link
Contributor

Deferring the review to @gtcooke94.

@matthewstevenson88 matthewstevenson88 removed their request for review September 4, 2025 14:23
@Kimahriman
Copy link
Contributor Author

We need to think about this more. This represents a significant behavior change to the C code that I don't think we can introduce without breaking semantic versioning. I also don't think we should introduce this change at all (it's intentional that a user setting certificate overrides takes precedence over the default system root cert).

I believe the issue here lies in how the python layering configures certificates, not in the C-Core behavior. Thus, this needs to be fixed in the Python layering. To be specific to your above comments, Python converts (5), the gRPC roots to (3), the override callback.

My very rough understanding, and let me know if any of this is incorrect, is that a user setting a certificate would be through (1) or (2). Someone writing custom C could theoretically provide (3) themselves but is that a real use case? That would be the main breaking change that would require someone to explicitly disable system certs (unless they really want the bundled ones for some reason). I'm assuming the main use case for (3) is to provide the bundled certs in a language package specific way. But users writing C code would actually get the system certs by default because (3) wouldn't be set, which is actually a disconnect between C and other languages

@gtcooke94 gtcooke94 requested a review from sergiitk September 4, 2025 16:25
@gtcooke94
Copy link
Contributor

gtcooke94 commented Sep 4, 2025

But users writing C code would actually get the system certs by default because (3) wouldn't be set, which is actually a disconnect between C and other languages

Can you expand on what you mean here?

As for whether or not (3) is a real use case, we can't really know - such is a challenge of working on OSS. Given that this ordering change is specifically for the callback which is exposed in C-Core APIs, I think we could technically change this without breaking commitments because C-Core APIs are not considered stable. On the other hand, changing the ordering of certificate preferences, even if only accessible via a C-Core API isn't so much a breaking compiler change, but rather a significant security difference.

Adding @sergiitk to this discussion as the Python lead, and I think the bug here lies in Python's integration with C-Core, not C-Core itself.

@ctiller
Copy link
Member

ctiller commented Sep 4, 2025

It might be worth considering adding a new API/parameter to the existing core API to express intent - "I'm setting an environment level preference" vs "I want to override the defaults for my application".

Such a thing would likely need updating the Python sources, and Ruby/PHP/etc.

It will be a significant behavior change for Python users however, and it's unclear to me how to safely do that.

@Kimahriman
Copy link
Contributor Author

But users writing C code would actually get the system certs by default because (3) wouldn't be set, which is actually a disconnect between C and other languages

Can you expand on what you mean here?

The easiest way to explain is by using the C++ example examples/cpp/auth. I'm having a little trouble getting it to work, as it currently gives me

root@dbcaf7edf57c:/data/examples/cpp/auth# ./ssl_client 
16: Invalid signature
Greeter received: RPC failed

during a normal run.

But if I do the same thing as the Python example and remove specifying a CA in the channel credentials on the client side:

  grpc::SslCredentialsOptions ssl_options;
  // ssl_options.pem_root_certs = LoadStringFromFile(kRootCertificate); <- commented this out
  // Create a channel with SSL credentials
  GreeterClient greeter(
      grpc::CreateChannel(target_str, grpc::SslCredentials(ssl_options)));

And then run as is I would get a certificate verification error as expected:

root@dbcaf7edf57c:/data/examples/cpp/auth# ./ssl_client 
14: failed to connect to all addresses; last error: UNKNOWN: ipv4:127.0.0.1:50051: Ssl handshake failed (TSI_PROTOCOL_FAILURE): SSL_ERROR_SSL: error:1000007d:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED: unable to get local issuer certificate
Greeter received: RPC failed

But if I then load the root.ca into the system certs, it is able to successfully verify the cert (but I still get the Invalid signature error, not sure if that's an issue with my setup or something wrong with that example now)

root@dbcaf7edf57c:/data/examples/cpp/auth# cp credentials/root.crt /usr/local/share/ca-certificates/
root@dbcaf7edf57c:/data/examples/cpp/auth# update-ca-certificates 
Updating certificates in /etc/ssl/certs...
rehash: warning: skipping ca-certificates.crt,it does not contain exactly one certificate or CRL
rehash: warning: skipping duplicate certificate in root.pem
1 added, 0 removed; done.
Running hooks in /etc/ca-certificates/update.d...
Processing triggers for ca-certificates-java (20240118) ...
Replacing debian:root.pem
done.
done.
root@dbcaf7edf57c:/data/examples/cpp/auth# ./ssl_client 
16: Invalid signature
Greeter received: RPC failed

So for some client languages, the system CAs are used by default, and for some the gRPC bundled CAs are used by default. Based on https://github.com/search?q=repo%3Agrpc%2Fgrpc%20grpc_set_ssl_roots_override_callback&type=code it looks like Python, Ruby, and PHP will use the gRPC bundled CAs, where clients like C++, Objective C, and maybe Node would use the system CAs by default.

@pawbhard pawbhard self-requested a review September 5, 2025 12:02
@gtcooke94
Copy link
Contributor

gtcooke94 commented Sep 5, 2025

I think we are on the same page - the existing ordering behavior (that we must keep) is that system roots are preferred over gRPC roots, and both are below the callback in preference order. It is a bug in the Python, Ruby, and PHP wrappers that the gRPC roots are preferred over system roots by setting the callback as the gRPC roots.

@Kimahriman
Copy link
Contributor Author

I think we are on the same page - the existing ordering behavior (that we must keep) is that system roots are preferred over gRPC roots, and both are below the callback in preference order. It is a bug in the Python, Ruby, and PHP wrappers that the gRPC roots are preferred over system roots by setting the callback as the gRPC roots.

Are you saying it's up to Python/Ruby/PHP to supply the system roots in the callback instead? Or that they should be using a different mechanism than the callback to supply the gRPC roots as a fallback? It seems that the callback is the method of supplying a language-specific location for the gRPC roots, so this PR would exactly fix the bug as you described it? With the only caveat being manual use of grpc_set_ssl_roots_override_callback by users in C code would break without disabling system roots.

@gtcooke94
Copy link
Contributor

This PR "fixes" the Python behavior by changing the Core C behavior which is what we cannot do.
The wrapping languages are incorrectly using this override by using it to provide the gRPC repo roots. While this callback is currently used to provide wrapped language roots, that is not it's only use.

I do wonder what would happen if we simply didn't override anything at all - would the C Core behavior properly fall back to the system then repo roots since Python is still just wrapping C-Core?
The python roots aren't special, they seem to be just the gRPC repo roots ref.

@Kimahriman
Copy link
Contributor Author

This PR "fixes" the Python behavior by changing the Core C behavior which is what we cannot do. The wrapping languages are incorrectly using this override by using it to provide the gRPC repo roots. While this callback is currently used to provide wrapped language roots, that is not it's only use.

I'm saying I'm guessing/assuming the whole point of the Core C behavior in question is to provide methods for wrapping languages to provide the same CAs in a package-dependent manner, so it seems odd to say that can't be changed if that is the bug. Direct use of the callback is possible and I'm kinda asking if that is a real use case anyone would actually use, and if the answer is yes there is a simple workaround when upgrading, just disable the system roots via GRPC_NOT_USE_SYSTEM_SSL_ROOTS=true, then the existing behavior would remain.

I do wonder what would happen if we simply didn't override anything at all - would the C Core behavior properly fall back to the system then repo roots since Python is still just wrapping C-Core? The python roots aren't special, they seem to be just the gRPC repo roots ref.

The problem is default C-Core system roots will likely only exist if you install grpc via a system package manager. If you simple pip install grpcio, then /usr/share/grpc/roots.pem isn't going to exist. That's why the callback exists, because you need to call Python code in order to extract the roots.pem from the installed Python package

@Kimahriman
Copy link
Contributor Author

Relevant issue with linked PRs indicating the callback was created exactly for this purpose: #4834

@gtcooke94
Copy link
Contributor

I do see what you're getting at and thanks for linking the original design, but I'm still very hesitant to change a behavior in a breaking way (even if it isn't intended to be used in that way). This is definitely in a murky place stability-wise since it is accessible via C-Core APIs. Pulling in @markdroth for his thoughts.

If we do decide that changing the ordering behavior is the route we want to take, in a best effort to not break anything, we'll need to put it behind a flag such that the default behavior does not change, and you can opt-in to the new ordering.

I'd still rather see this fixed at the wrapping layer if we can do that without changing the ordering in ssl_utils.cc.

@Kimahriman
Copy link
Contributor Author

If we do decide that changing the ordering behavior is the route we want to take, in a best effort to not break anything, we'll need to put it behind a flag such that the default behavior does not change, and you can opt-in to the new ordering.

The main awkward thing is there is already a flag to opt out of using system CAs. So also adding a flag to opt in to using them would be pretty confusing to a user. But it is an option.

I'd still rather see this fixed at the wrapping layer if we can do that without changing the ordering in ssl_utils.cc.

I just don't really see how this could be fixed at the wrapping layer, unless you "break" the behavior by respecting GRPC_NOT_USE_SYSTEM_SSL_ROOTS in each wrapping function and forcing Python/PHP/Ruby to provide their own way of finding the correct system CAs, and switching those languages to use the system CAs by default.

@gtcooke94
Copy link
Contributor

It is awkward that there is already a flag, but the new flag wouldn't specifically be opting into using system root certs at all, but rather opting into using system root certs above the certs in the callback in preference ordering. This would open up a way forward without causing a default behavior change.

To be clear, I agree with you, and I think this PR is what was intended when this override function was added, but somewhere along the way the intention got mixed up. However, I generally want to avoid a breaking security behavior change for anyone who happened to use this.

@Kimahriman
Copy link
Contributor Author

It is awkward that there is already a flag, but the new flag wouldn't specifically be opting into using system root certs at all, but rather opting into using system root certs above the certs in the callback in preference ordering. This would open up a way forward without causing a default behavior change.

Something like GRPC_PREFER_USE_SYSTEM_SSL_ROOTS? I can start looking into that unless you are waiting for other opinions.

@gtcooke94
Copy link
Contributor

How about GRPC_USE_SYSTEM_ROOTS_OVER_LANGUAGE_CALLBACK

I've discussed some with other contributors and we think this is the best approach to take - we can also mark the existing C API with more documentation saying what it is and basically not to use it and migrate to something else (if someone is manually setting roots via this callback, they should be able to use TlsCredentials instead as the intended way for a user to provide certificates).

Then, with enough time and notice, we can swap the ordering to what was intended (this PRs ordering).

@Kimahriman
Copy link
Contributor Author

Ok did my best to add a new config and test it

Copy link
Contributor

@gtcooke94 gtcooke94 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the work on this! LGTM from the security perspective.
Retriggering @pawbhard's review request for him to look at it from the config var perspective.

Copy link
Contributor

@pawbhard pawbhard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@anniefrchz anniefrchz mentioned this pull request Oct 3, 2025
asheshvidyut pushed a commit to asheshvidyut/grpc that referenced this pull request Oct 10, 2025
<!--

If you know who should review your pull request, please assign it to that
person, otherwise the pull request would get assigned randomly.

If your pull request is for a specific language, please add the appropriate
lang label.

-->

Resolves grpc#40580

Currently the priority order of loadings CAs for a client is something like:
1. The certificate provided in the SSL channel credentials
2. The value of `GRPC_DEFAULT_SSL_ROOTS_FILE_PATH`
3. The result of calling the `ssl_roots_override_cb` if set
4. The system CAs unless `GRPC_NOT_USE_SYSTEM_SSL_ROOTS` is set
5. CAs shipped with gRPC native install

The problem is the different language clients use `ssl_roots_override_cb` as the mechanism to use the CAs shipped with gRPC through a language specific packaging mechanism. So Python and other derived languages can never use the system CAs, as `ssl_roots_override_cb` is always set to something.

This purposes to switch the order of 3 and 4, so system CAs are prioritized over those shipped with gRPC for different language clients. The system CAs are likely more up to date than the custom list shipped with the client. Additionally, companies can have custom CAs automatically installed on corporate systems that should just automatically work for gRPC TLS verification, but don't unless you manually specify the system CA in `GRPC_DEFAULT_SSL_ROOTS_FILE_PATH`.

The old behavior can be maintained by simply setting `GRPC_NOT_USE_SYSTEM_SSL_ROOTS=true`.

This was described well in grpc#29682 (comment)

## Example
Using `examples/python/auth/token_based_auth_*.py` in an ubuntu Docker container:

Update `channel_credentials` to not specify a root certificate:
```python
    channel_credential = grpc.ssl_channel_credentials(
        # _credentials.ROOT_CERTIFICATE
    )
```

Load the custom CA into the system CA store
```bash
cp credentials/root.crt /usr/local/share/ca-certificates/
update-ca-certificates
```

Run the server and client:
```bash
python token_based_auth_server.py &
python token_based_auth_client.py
```

### Expected behavior with this change
This works, successfully verifying the certificate:
```
INFO:__main__:Received message: message: "Hello, you!"
```

### Actual behavior currently
```
I0000 00:00:1756469167.513885   36074 ssl_transport_security.cc:1883] Handshake failed with error SSL_ERROR_SSL: error:1000007d:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED: unable to get local issuer certificate
I0000 00:00:1756469167.515520   36075 ssl_transport_security.cc:1883] Handshake failed with error SSL_ERROR_SSL: error:1000007d:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED: unable to get local issuer certificate
ERROR:__main__:Received error: <_InactiveRpcError of RPC that terminated with:
        status = StatusCode.UNAVAILABLE
        details = "failed to connect to all addresses; last error: UNKNOWN: ipv4:127.0.0.1:50051: Ssl handshake failed (TSI_PROTOCOL_FAILURE): SSL_ERROR_SSL: error:1000007d:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED: unable to get local issuer certificate"
        debug_error_string = "UNKNOWN:Error received from peer  {grpc_status:14, grpc_message:"failed to connect to all addresses; last error: UNKNOWN: ipv4:127.0.0.1:50051: Ssl handshake failed (TSI_PROTOCOL_FAILURE): SSL_ERROR_SSL: error:1000007d:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED: unable to get local issuer certificate"}"
```

Closes grpc#40583

COPYBARA_INTEGRATE_REVIEW=grpc#40583 from Kimahriman:ca-priority 0bf8e76
PiperOrigin-RevId: 810473149
sreenithi pushed a commit to sreenithi/grpc that referenced this pull request Nov 26, 2025
<!--

If you know who should review your pull request, please assign it to that
person, otherwise the pull request would get assigned randomly.

If your pull request is for a specific language, please add the appropriate
lang label.

-->

Resolves grpc#40580

Currently the priority order of loadings CAs for a client is something like:
1. The certificate provided in the SSL channel credentials
2. The value of `GRPC_DEFAULT_SSL_ROOTS_FILE_PATH`
3. The result of calling the `ssl_roots_override_cb` if set
4. The system CAs unless `GRPC_NOT_USE_SYSTEM_SSL_ROOTS` is set
5. CAs shipped with gRPC native install

The problem is the different language clients use `ssl_roots_override_cb` as the mechanism to use the CAs shipped with gRPC through a language specific packaging mechanism. So Python and other derived languages can never use the system CAs, as `ssl_roots_override_cb` is always set to something.

This purposes to switch the order of 3 and 4, so system CAs are prioritized over those shipped with gRPC for different language clients. The system CAs are likely more up to date than the custom list shipped with the client. Additionally, companies can have custom CAs automatically installed on corporate systems that should just automatically work for gRPC TLS verification, but don't unless you manually specify the system CA in `GRPC_DEFAULT_SSL_ROOTS_FILE_PATH`.

The old behavior can be maintained by simply setting `GRPC_NOT_USE_SYSTEM_SSL_ROOTS=true`.

This was described well in grpc#29682 (comment)

## Example
Using `examples/python/auth/token_based_auth_*.py` in an ubuntu Docker container:

Update `channel_credentials` to not specify a root certificate:
```python
    channel_credential = grpc.ssl_channel_credentials(
        # _credentials.ROOT_CERTIFICATE
    )
```

Load the custom CA into the system CA store
```bash
cp credentials/root.crt /usr/local/share/ca-certificates/
update-ca-certificates
```

Run the server and client:
```bash
python token_based_auth_server.py &
python token_based_auth_client.py
```

### Expected behavior with this change
This works, successfully verifying the certificate:
```
INFO:__main__:Received message: message: "Hello, you!"
```

### Actual behavior currently
```
I0000 00:00:1756469167.513885   36074 ssl_transport_security.cc:1883] Handshake failed with error SSL_ERROR_SSL: error:1000007d:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED: unable to get local issuer certificate
I0000 00:00:1756469167.515520   36075 ssl_transport_security.cc:1883] Handshake failed with error SSL_ERROR_SSL: error:1000007d:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED: unable to get local issuer certificate
ERROR:__main__:Received error: <_InactiveRpcError of RPC that terminated with:
        status = StatusCode.UNAVAILABLE
        details = "failed to connect to all addresses; last error: UNKNOWN: ipv4:127.0.0.1:50051: Ssl handshake failed (TSI_PROTOCOL_FAILURE): SSL_ERROR_SSL: error:1000007d:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED: unable to get local issuer certificate"
        debug_error_string = "UNKNOWN:Error received from peer  {grpc_status:14, grpc_message:"failed to connect to all addresses; last error: UNKNOWN: ipv4:127.0.0.1:50051: Ssl handshake failed (TSI_PROTOCOL_FAILURE): SSL_ERROR_SSL: error:1000007d:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED: unable to get local issuer certificate"}"
```

Closes grpc#40583

COPYBARA_INTEGRATE_REVIEW=grpc#40583 from Kimahriman:ca-priority 0bf8e76
PiperOrigin-RevId: 810473149
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Python client cannot use system CAs

6 participants