Prioritize system CA over bundled CA#40583
Conversation
|
@gtcooke94 , Please take a look at this PR |
pawbhard
left a comment
There was a problem hiding this comment.
We might need to update tests , I see may failure in our CI. I would let @gtcooke94 take a look at this.
There was a problem hiding this comment.
We need to think about this more. This represents a significant behavior change to the C code that I don't think we can introduce without breaking semantic versioning. I also don't think we should introduce this change at all (it's intentional that a user setting certificate overrides takes precedence over the default system root cert).
I believe the issue here lies in how the python layering configures certificates, not in the C-Core behavior. Thus, this needs to be fixed in the Python layering. To be specific to your above comments, Python converts (5), the gRPC roots to (3), the override callback.
|
Deferring the review to @gtcooke94. |
My very rough understanding, and let me know if any of this is incorrect, is that a user setting a certificate would be through (1) or (2). Someone writing custom C could theoretically provide (3) themselves but is that a real use case? That would be the main breaking change that would require someone to explicitly disable system certs (unless they really want the bundled ones for some reason). I'm assuming the main use case for (3) is to provide the bundled certs in a language package specific way. But users writing C code would actually get the system certs by default because (3) wouldn't be set, which is actually a disconnect between C and other languages |
Can you expand on what you mean here? As for whether or not (3) is a real use case, we can't really know - such is a challenge of working on OSS. Given that this ordering change is specifically for the callback which is exposed in C-Core APIs, I think we could technically change this without breaking commitments because C-Core APIs are not considered stable. On the other hand, changing the ordering of certificate preferences, even if only accessible via a C-Core API isn't so much a breaking compiler change, but rather a significant security difference. Adding @sergiitk to this discussion as the Python lead, and I think the bug here lies in Python's integration with C-Core, not C-Core itself. |
|
It might be worth considering adding a new API/parameter to the existing core API to express intent - "I'm setting an environment level preference" vs "I want to override the defaults for my application". Such a thing would likely need updating the Python sources, and Ruby/PHP/etc. It will be a significant behavior change for Python users however, and it's unclear to me how to safely do that. |
The easiest way to explain is by using the C++ example during a normal run. But if I do the same thing as the Python example and remove specifying a CA in the channel credentials on the client side: grpc::SslCredentialsOptions ssl_options;
// ssl_options.pem_root_certs = LoadStringFromFile(kRootCertificate); <- commented this out
// Create a channel with SSL credentials
GreeterClient greeter(
grpc::CreateChannel(target_str, grpc::SslCredentials(ssl_options)));And then run as is I would get a certificate verification error as expected: But if I then load the So for some client languages, the system CAs are used by default, and for some the gRPC bundled CAs are used by default. Based on https://github.com/search?q=repo%3Agrpc%2Fgrpc%20grpc_set_ssl_roots_override_callback&type=code it looks like Python, Ruby, and PHP will use the gRPC bundled CAs, where clients like C++, Objective C, and maybe Node would use the system CAs by default. |
|
I think we are on the same page - the existing ordering behavior (that we must keep) is that system roots are preferred over gRPC roots, and both are below the callback in preference order. It is a bug in the Python, Ruby, and PHP wrappers that the gRPC roots are preferred over system roots by setting the callback as the gRPC roots. |
Are you saying it's up to Python/Ruby/PHP to supply the system roots in the callback instead? Or that they should be using a different mechanism than the callback to supply the gRPC roots as a fallback? It seems that the callback is the method of supplying a language-specific location for the gRPC roots, so this PR would exactly fix the bug as you described it? With the only caveat being manual use of |
|
This PR "fixes" the Python behavior by changing the Core C behavior which is what we cannot do. I do wonder what would happen if we simply didn't override anything at all - would the C Core behavior properly fall back to the system then repo roots since Python is still just wrapping C-Core? |
I'm saying I'm guessing/assuming the whole point of the Core C behavior in question is to provide methods for wrapping languages to provide the same CAs in a package-dependent manner, so it seems odd to say that can't be changed if that is the bug. Direct use of the callback is possible and I'm kinda asking if that is a real use case anyone would actually use, and if the answer is yes there is a simple workaround when upgrading, just disable the system roots via
The problem is default C-Core system roots will likely only exist if you install grpc via a system package manager. If you simple |
|
Relevant issue with linked PRs indicating the callback was created exactly for this purpose: #4834 |
|
I do see what you're getting at and thanks for linking the original design, but I'm still very hesitant to change a behavior in a breaking way (even if it isn't intended to be used in that way). This is definitely in a murky place stability-wise since it is accessible via C-Core APIs. Pulling in @markdroth for his thoughts. If we do decide that changing the ordering behavior is the route we want to take, in a best effort to not break anything, we'll need to put it behind a flag such that the default behavior does not change, and you can opt-in to the new ordering. I'd still rather see this fixed at the wrapping layer if we can do that without changing the ordering in |
The main awkward thing is there is already a flag to opt out of using system CAs. So also adding a flag to opt in to using them would be pretty confusing to a user. But it is an option.
I just don't really see how this could be fixed at the wrapping layer, unless you "break" the behavior by respecting |
|
It is awkward that there is already a flag, but the new flag wouldn't specifically be opting into using system root certs at all, but rather opting into using system root certs above the certs in the callback in preference ordering. This would open up a way forward without causing a default behavior change. To be clear, I agree with you, and I think this PR is what was intended when this override function was added, but somewhere along the way the intention got mixed up. However, I generally want to avoid a breaking security behavior change for anyone who happened to use this. |
Something like |
|
How about I've discussed some with other contributors and we think this is the best approach to take - we can also mark the existing C API with more documentation saying what it is and basically not to use it and migrate to something else (if someone is manually setting roots via this callback, they should be able to use Then, with enough time and notice, we can swap the ordering to what was intended (this PRs ordering). |
|
Ok did my best to add a new config and test it |
a0671bf to
363e21b
Compare
<!-- If you know who should review your pull request, please assign it to that person, otherwise the pull request would get assigned randomly. If your pull request is for a specific language, please add the appropriate lang label. --> Resolves grpc#40580 Currently the priority order of loadings CAs for a client is something like: 1. The certificate provided in the SSL channel credentials 2. The value of `GRPC_DEFAULT_SSL_ROOTS_FILE_PATH` 3. The result of calling the `ssl_roots_override_cb` if set 4. The system CAs unless `GRPC_NOT_USE_SYSTEM_SSL_ROOTS` is set 5. CAs shipped with gRPC native install The problem is the different language clients use `ssl_roots_override_cb` as the mechanism to use the CAs shipped with gRPC through a language specific packaging mechanism. So Python and other derived languages can never use the system CAs, as `ssl_roots_override_cb` is always set to something. This purposes to switch the order of 3 and 4, so system CAs are prioritized over those shipped with gRPC for different language clients. The system CAs are likely more up to date than the custom list shipped with the client. Additionally, companies can have custom CAs automatically installed on corporate systems that should just automatically work for gRPC TLS verification, but don't unless you manually specify the system CA in `GRPC_DEFAULT_SSL_ROOTS_FILE_PATH`. The old behavior can be maintained by simply setting `GRPC_NOT_USE_SYSTEM_SSL_ROOTS=true`. This was described well in grpc#29682 (comment) ## Example Using `examples/python/auth/token_based_auth_*.py` in an ubuntu Docker container: Update `channel_credentials` to not specify a root certificate: ```python channel_credential = grpc.ssl_channel_credentials( # _credentials.ROOT_CERTIFICATE ) ``` Load the custom CA into the system CA store ```bash cp credentials/root.crt /usr/local/share/ca-certificates/ update-ca-certificates ``` Run the server and client: ```bash python token_based_auth_server.py & python token_based_auth_client.py ``` ### Expected behavior with this change This works, successfully verifying the certificate: ``` INFO:__main__:Received message: message: "Hello, you!" ``` ### Actual behavior currently ``` I0000 00:00:1756469167.513885 36074 ssl_transport_security.cc:1883] Handshake failed with error SSL_ERROR_SSL: error:1000007d:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED: unable to get local issuer certificate I0000 00:00:1756469167.515520 36075 ssl_transport_security.cc:1883] Handshake failed with error SSL_ERROR_SSL: error:1000007d:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED: unable to get local issuer certificate ERROR:__main__:Received error: <_InactiveRpcError of RPC that terminated with: status = StatusCode.UNAVAILABLE details = "failed to connect to all addresses; last error: UNKNOWN: ipv4:127.0.0.1:50051: Ssl handshake failed (TSI_PROTOCOL_FAILURE): SSL_ERROR_SSL: error:1000007d:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED: unable to get local issuer certificate" debug_error_string = "UNKNOWN:Error received from peer {grpc_status:14, grpc_message:"failed to connect to all addresses; last error: UNKNOWN: ipv4:127.0.0.1:50051: Ssl handshake failed (TSI_PROTOCOL_FAILURE): SSL_ERROR_SSL: error:1000007d:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED: unable to get local issuer certificate"}" ``` Closes grpc#40583 COPYBARA_INTEGRATE_REVIEW=grpc#40583 from Kimahriman:ca-priority 0bf8e76 PiperOrigin-RevId: 810473149
<!-- If you know who should review your pull request, please assign it to that person, otherwise the pull request would get assigned randomly. If your pull request is for a specific language, please add the appropriate lang label. --> Resolves grpc#40580 Currently the priority order of loadings CAs for a client is something like: 1. The certificate provided in the SSL channel credentials 2. The value of `GRPC_DEFAULT_SSL_ROOTS_FILE_PATH` 3. The result of calling the `ssl_roots_override_cb` if set 4. The system CAs unless `GRPC_NOT_USE_SYSTEM_SSL_ROOTS` is set 5. CAs shipped with gRPC native install The problem is the different language clients use `ssl_roots_override_cb` as the mechanism to use the CAs shipped with gRPC through a language specific packaging mechanism. So Python and other derived languages can never use the system CAs, as `ssl_roots_override_cb` is always set to something. This purposes to switch the order of 3 and 4, so system CAs are prioritized over those shipped with gRPC for different language clients. The system CAs are likely more up to date than the custom list shipped with the client. Additionally, companies can have custom CAs automatically installed on corporate systems that should just automatically work for gRPC TLS verification, but don't unless you manually specify the system CA in `GRPC_DEFAULT_SSL_ROOTS_FILE_PATH`. The old behavior can be maintained by simply setting `GRPC_NOT_USE_SYSTEM_SSL_ROOTS=true`. This was described well in grpc#29682 (comment) ## Example Using `examples/python/auth/token_based_auth_*.py` in an ubuntu Docker container: Update `channel_credentials` to not specify a root certificate: ```python channel_credential = grpc.ssl_channel_credentials( # _credentials.ROOT_CERTIFICATE ) ``` Load the custom CA into the system CA store ```bash cp credentials/root.crt /usr/local/share/ca-certificates/ update-ca-certificates ``` Run the server and client: ```bash python token_based_auth_server.py & python token_based_auth_client.py ``` ### Expected behavior with this change This works, successfully verifying the certificate: ``` INFO:__main__:Received message: message: "Hello, you!" ``` ### Actual behavior currently ``` I0000 00:00:1756469167.513885 36074 ssl_transport_security.cc:1883] Handshake failed with error SSL_ERROR_SSL: error:1000007d:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED: unable to get local issuer certificate I0000 00:00:1756469167.515520 36075 ssl_transport_security.cc:1883] Handshake failed with error SSL_ERROR_SSL: error:1000007d:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED: unable to get local issuer certificate ERROR:__main__:Received error: <_InactiveRpcError of RPC that terminated with: status = StatusCode.UNAVAILABLE details = "failed to connect to all addresses; last error: UNKNOWN: ipv4:127.0.0.1:50051: Ssl handshake failed (TSI_PROTOCOL_FAILURE): SSL_ERROR_SSL: error:1000007d:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED: unable to get local issuer certificate" debug_error_string = "UNKNOWN:Error received from peer {grpc_status:14, grpc_message:"failed to connect to all addresses; last error: UNKNOWN: ipv4:127.0.0.1:50051: Ssl handshake failed (TSI_PROTOCOL_FAILURE): SSL_ERROR_SSL: error:1000007d:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED: unable to get local issuer certificate"}" ``` Closes grpc#40583 COPYBARA_INTEGRATE_REVIEW=grpc#40583 from Kimahriman:ca-priority 0bf8e76 PiperOrigin-RevId: 810473149
Resolves #40580
Currently the priority order of loadings CAs for a client is something like:
GRPC_DEFAULT_SSL_ROOTS_FILE_PATHssl_roots_override_cbif setGRPC_NOT_USE_SYSTEM_SSL_ROOTSis setThe problem is the different language clients use
ssl_roots_override_cbas the mechanism to use the CAs shipped with gRPC through a language specific packaging mechanism. So Python and other derived languages can never use the system CAs, asssl_roots_override_cbis always set to something.This purposes to switch the order of 3 and 4, so system CAs are prioritized over those shipped with gRPC for different language clients. The system CAs are likely more up to date than the custom list shipped with the client. Additionally, companies can have custom CAs automatically installed on corporate systems that should just automatically work for gRPC TLS verification, but don't unless you manually specify the system CA in
GRPC_DEFAULT_SSL_ROOTS_FILE_PATH.The old behavior can be maintained by simply setting
GRPC_NOT_USE_SYSTEM_SSL_ROOTS=true.This was described well in #29682 (comment)
Example
Using
examples/python/auth/token_based_auth_*.pyin an ubuntu Docker container:Update
channel_credentialsto not specify a root certificate:Load the custom CA into the system CA store
Run the server and client:
python token_based_auth_server.py & python token_based_auth_client.pyExpected behavior with this change
This works, successfully verifying the certificate:
Actual behavior currently