Fixes#24070
Use scalar ALU for 1 chacha block with rvv ALU simultaneously.
The tail elements(non-multiple of block length) will be handled by
the scalar logic.
Use rvv path if the input length > chacha_block_size.
And we have about 1.2x improvement comparing with the original code.
Reviewed-by: Hongren Zheng <i@zenithal.me>
Reviewed-by: Paul Dale <ppzgs1@gmail.com>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/24097)
In order to get asm code running on OpenBSD we must place
all constants into .rodata sections.
davidben@ also pointed out we need to adjust `x86_64-xlate.pl` perlasm
script to adjust read-olny sections for various flavors (OSes). Those
changes were cherry-picked from boringssl.
closes#23312
Reviewed-by: Richard Levitte <levitte@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/23997)
current `translate_msg()` function attempts to set `->msg_name`
(and `->msg_namelen`) with `BIO`'s peer name (connection destination)
regardless if underlying socket is connected or not. Such implementation
uncovers differences in socket implementation between various OSes.
As we have learned hard way `sendmsg()` and `sendmmsg()` on `OpenBSD`
and (`MacOS` too) fail to send messages with `->msg_name` being
set on connected socket. In such case the caller receives
`EISCON` errro.
I think `translate_msg()` caller should provide a hint to indicate
whether we deal with connected (or un-connected) socket. For
connected sockets the peer's name should not be set/filled
by `translate_msg()`. On the other hand if socket is un-connected,
then `translate_msg()` must populate `->msg_name` and `->msg_namelen`
members.
The caller can use `getpeername(2)` to see if socket is
connected. If `getpeername()` succeeds then we must be dealing
with connected socket and `translate_msg()` must not set
`->msg_name` and `->msg_namelen` members. If `getpeername(2)`
fails, then `translate_msg()` must provide peer's name (destination
address) in `->msg_name` and set `->msg_namelen` accordingly.
The propposed fix introduces `is_connected()` function,
which applies `getpeername()` to socket bound to `BIO` instance.
The `dgram_sendmmsg()` uses `is_connected()` as a hint
for `translate_msg()` function, so msghdr gets initialized
with respect to socket state.
The change also modifies existing `test/quic_client_test.c`
so it also covers the case of connected socket. To keep
things simple we can introduce optional argument `connect_first`
to `./quic_client_test` function. Without `connect_first`
the test run as usual. With `connect_first` the test creates
and connects socket first. Then it passes such socket to
`BIO` sub-system to perform `QUIC` connect test as usual.
Fixes#23251
Reviewed-by: Neil Horman <nhorman@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/23396)
The atomics fallbacks were using 'void *' as a generic transport for all
possible scalar and pointer types, with the hypothesis that a pointer is
as large as the largest possible scalar type that we would use.
Then enters the use of uint64_t, which is larger than a pointer on any
32-bit system (or any system that has 32-bit pointer configurations).
We could of course choose a larger type as a generic transport. However,
that only pushes the problem forward in time... and it's still a hack.
It's therefore safer to reimplement the fallbacks per type that atomics
are used for, and deal with missing per type fallbacks when the need
arrises in the future.
For test build purposes, the macro USE_ATOMIC_FALLBACKS is introduced.
If OpenSSL is configured with '-DUSE_ATOMIC_FALLBACKS', the fallbacks
will be used, unconditionally.
Fixes#24096
Reviewed-by: Neil Horman <nhorman@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/24123)
CLA: trivial
In the provider store API, it is not necessary to provide both open and
attach method at the same time and providing at least one of them is
enough. Adding some null pointer checks to prevent exceptions in case
of not providing both methods at the same time.
Reviewed-by: Tom Cosgrove <tom.cosgrove@arm.com>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/23703)
CRYPTO_atomic_add has a lock as a parameter, which is often ignored, but in
some cases (for example, when BROKEN_CLANG_ATOMICS is defined) it is required.
There is no easy way to determine if the lock is needed or not. The current
logic looks like this:
if defined(OPENSSL_THREADS) && !defined(CRYPTO_TDEBUG) && !defined(OPENSSL_SYS_WINDOWS)
if defined(__GNUC__) && defined(__ATOMIC_ACQ_REL) && !defined(BROKEN_CLANG_ATOMICS)
- It works without the lock, but in general the need for the
lock depends on __atomic_is_lock_free results
elif defined(__sun) && (defined(__SunOS_5_10) || defined(__SunOS_5_11))
- The lock is not needed (unless ret is NULL, which should never
happen?)
else
- The lock is required
endif
else
- The lock is not needed
endif
Adding such conditions outside of crypto.h is error-prone, so it is better to
always allocate the lock, otherwise CRYPTO_atomic_add may silently fail.
Fixes#23376.
CLA: trivial
Fixes: fc570b2605 ("Avoid taking a write lock in ossl_provider_doall_activated()")
Signed-off-by: Oleg Bulatov <oleg@bulatov.me>
Reviewed-by: Tim Hudson <tjh@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/24081)
Currently 20-test_dgst.t calls a quite bogus command:
$ openssl dgst -sha256 -hmac -macopt hexkey:FFFF test/data.bin test/data.bin
hexkey:FFFF: No such file or directory
HMAC-SHA2-256(test/data.bin)= b6727b7bb251dfa65846e0a8223bdd57d244aa6d7e312cb906d8e21f2dee3a57
HMAC-SHA2-256(test/data.bin)= b6727b7bb251dfa65846e0a8223bdd57d244aa6d7e312cb906d8e21f2dee3a57
805B632D4A730000:error:80000002:system library:file_ctrl:No such file or directory:crypto/bio/bss_file.c:297:calling fopen(hexkey:FFF, r)
805B632D4A730000:error:10080002:BIO routines:file_ctrl:system lib:crypto/bio/bss_file.c:300:
Does not check status code, discards stderr, and verifies the
checksums as per above. Note that the checksum is for the HMAC key
"-macopt", and `hexkey:FFFF` is attempted to be opened as a file.
See HMAC values for key `-macopt` and `hexkey:FFFF` using `openssl-mac`:
$ openssl mac -digest SHA256 -macopt hexkey:$(printf '%s' '-macopt' | xxd -p -u) -in ./test/data.bin HMAC
B6727B7BB251DFA65846E0A8223BDD57D244AA6D7E312CB906D8E21F2DEE3A57
$ openssl mac -digest SHA256 -macopt hexkey:FFFF -in ./test/data.bin HMAC
7C02D4A17D2560A5BB6763EDBF33F3A34F415398F8F2E07F04B83FFD7C087DAE
Fix this test case to actually use HMAC with hexkey:FFFF as intended.
Signed-off-by: Dimitri John Ledkov <dimitri.ledkov@surgut.co.uk>
Reviewed-by: Shane Lontis <shane.lontis@oracle.com>
Reviewed-by: Paul Dale <ppzgs1@gmail.com>
(Merged from https://github.com/openssl/openssl/pull/24068)
Since the addition of macos14 M1 runners in our CI jobs we've been
seeing periodic random failures in the test_threads CI job.
Specifically we've seen instances in which the shared pointer in the
test (which points to a monotonically incrementing uint64_t went
backwards.
From taking a look at the disassembled code in the failing case, we see
that __atomic_load_n when emitted in clang 15 looks like this
0000000100120488 <_ossl_rcu_uptr_deref>:
100120488: f8bfc000 ldapr x0, [x0]
10012048c: d65f03c0 ret
Notably, when compiling with gcc on the same system we get this output
instead:
0000000100120488 <_ossl_rcu_uptr_deref>:
100120488: f8bfc000 ldar x0, [x0]
10012048c: d65f03c0 ret
Checking the arm docs for the difference between ldar and ldapr:
https://developer.arm.com/documentation/ddi0602/2023-09/Base-Instructions/LDAPR--Load-Acquire-RCpc-Register-https://developer.arm.com/documentation/dui0802/b/A64-Data-Transfer-Instructions/LDAR
It seems that the ldar instruction provides a global cpu fence, not
completing until all writes in a given cpus writeback queue have
completed
Conversely, the ldapr instruction attmpts to achieve performance
improvements by honoring the Local Ordering register available in the
system coprocessor, only flushing writes in the same address region as
other cpus on the system.
I believe that on M1 virtualized cpus the ldapr is not properly ordering
writes, leading to an out of order read, despite the needed fencing.
I've opened an issue with apple on this here:
https://developer.apple.com/forums/thread/749530
I believe that it is not safe to issue an ldapr instruction unless the
programmer knows that the Local order registers are properly configured
for use on the system.
So to fix it I'm proposing with this patch that we, in the event that:
1) __APPLE__ is defined
AND
2) __clang__ is defined
AND
3) __aarch64__ is defined
during the build, that we override the ATOMIC_LOAD_N macro in the rcu
code such that it uses a custom function with inline assembly to emit
the ldar instruction rather than the ldapr instruction. The above
conditions should get us to where this is only used on more recent MAC
cpus, and only in the case where the affected clang compiler emits the
offending instruction.
I've run this patch 10 times in our CI and failed to reproduce the
issue, whereas previously I could trigger it within 5 runs routinely.
Reviewed-by: Tom Cosgrove <tom.cosgrove@arm.com>
Reviewed-by: Paul Dale <ppzgs1@gmail.com>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/23974)
SM2 requires that the public EC_POINT be present in a key when signing.
If its not there we crash on a NULL pointer. Add a check to ensure that
its present, and raise an error if its not
Reviewed-by: Paul Yang <kaishen.yy@antfin.com>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/23887)