qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PULL 082/116] target/i386: correct fix for pcmpxstrx substring sear


From: Philippe Mathieu-Daudé
Subject: Re: [PULL 082/116] target/i386: correct fix for pcmpxstrx substring search
Date: Mon, 15 Jun 2020 12:18:04 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.5.0

On 6/12/20 6:07 PM, Paolo Bonzini wrote:
> From: Joseph Myers <joseph@codesourcery.com>
> 
> This corrects a bug introduced in my previous fix for SSE4.2 pcmpestri
> / pcmpestrm / pcmpistri / pcmpistrm substring search, commit
> ae35eea7e4a9f21dd147406dfbcd0c4c6aaf2a60.
> 
> That commit fixed a bug that showed up in four GCC tests with one libc
> implementation.  The tests in question generate random inputs to the
> intrinsics and compare results to a C implementation, but they only
> test 1024 possible random inputs, and when the tests use the cases of
> those instructions that work with word rather than byte inputs, it's
> easy to have problematic cases that show up much less frequently than
> that.  Thus, testing with a different libc implementation, and so a
> different random number generator, showed up a problem with the
> previous patch.
> 
> When investigating the previous test failures, I found the description
> of these instructions in the Intel manuals (starting from computing a
> 16x16 or 8x8 set of comparison results) confusing and hard to match up
> with the more optimized implementation in QEMU, and referred to AMD
> manuals which described the instructions in a different way.  Those
> AMD descriptions are very explicit that the whole of the string being
> searched for must be found in the other operand, not running off the
> end of that operand; they say "If the prototype and the SUT are equal
> in length, the two strings must be identical for the comparison to be
> TRUE.".  However, that statement is incorrect.
> 
> In my previous commit message, I noted:
> 
>   The operation in this case is a search for a string (argument d to
>   the helper) in another string (argument s to the helper); if a copy
>   of d at a particular position would run off the end of s, the
>   resulting output bit should be 0 whether or not the strings match in
>   the region where they overlap, but the QEMU implementation was
>   wrongly comparing only up to the point where s ends and counting it
>   as a match if an initial segment of d matched a terminal segment of
>   s.  Here, "run off the end of s" means that some byte of d would
>   overlap some byte outside of s; thus, if d has zero length, it is
>   considered to match everywhere, including after the end of s.
> 
> The description "some byte of d would overlap some byte outside of s"
> is accurate only when understood to refer to overlapping some byte
> *within the 16-byte operand* but at or after the zero terminator; it
> is valid to run over the end of s if the end of s is the end of the
> 16-byte operand.  So the fix in the previous patch for the case of d
> being empty was correct, but the other part of that patch was not
> correct (as it never allowed partial matches even at the end of the
> 16-byte operand).  Nor was the code before the previous patch correct
> for the case of d nonempty, as it would always have allowed partial
> matches at the end of s.
> 
> Fix with a partial revert of my previous change, combined with
> inserting a check for the special case of s having maximum length to
> determine where it is necessary to check for matches.
> 
> In the added test, test 1 is for the case of empty strings, which
> failed before my 2017 patch, test 2 is for the bug introduced by my
> 2017 patch and test 3 deals with the case where a match of an initial
> segment at the end of the string is not valid when the string ends
> before the end of the 16-byte operand (that is, the case that would be
> broken by a simple revert of the non-empty-string part of my 2017
> patch).
> 
> Signed-off-by: Joseph Myers <joseph@codesourcery.com>
> Message-Id: <alpine.DEB.2.21.2006121344290.9881@digraph.polyomino.org.uk>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  target/i386/ops_sse.h                |  4 ++--
>  tests/tcg/i386/Makefile.target       |  3 +++
>  tests/tcg/i386/test-i386-pcmpistri.c | 33 ++++++++++++++++++++++++++++
>  3 files changed, 38 insertions(+), 2 deletions(-)
>  create mode 100644 tests/tcg/i386/test-i386-pcmpistri.c
> 
> diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h
> index 01d6017412..14f2b16abd 100644
> --- a/target/i386/ops_sse.h
> +++ b/target/i386/ops_sse.h
> @@ -2089,10 +2089,10 @@ static inline unsigned pcmpxstrx(CPUX86State *env, 
> Reg *d, Reg *s,
>              res = (2 << upper) - 1;
>              break;
>          }
> -        for (j = valids - validd; j >= 0; j--) {
> +        for (j = valids == upper ? valids : valids - validd; j >= 0; j--) {
>              res <<= 1;
>              v = 1;
> -            for (i = validd; i >= 0; i--) {
> +            for (i = MIN(valids - j, validd); i >= 0; i--) {
>                  v &= (pcmp_val(s, ctrl, i + j) == pcmp_val(d, ctrl, i));
>              }
>              res |= v;
> diff --git a/tests/tcg/i386/Makefile.target b/tests/tcg/i386/Makefile.target
> index 43ee2e181e..53efec0668 100644
> --- a/tests/tcg/i386/Makefile.target
> +++ b/tests/tcg/i386/Makefile.target
> @@ -10,6 +10,9 @@ ALL_X86_TESTS=$(I386_SRCS:.c=)
>  SKIP_I386_TESTS=test-i386-ssse3
>  X86_64_TESTS:=$(filter test-i386-ssse3, $(ALL_X86_TESTS))
>  
> +test-i386-pcmpistri: CFLAGS += -msse4.2
> +run-test-i386-pcmpistri: QEMU_OPTS += -cpu max

This test fails on our CI:
https://travis-ci.org/github/qemu/qemu/jobs/698006621#L4246

> +
>  #
>  # hello-i386 is a barebones app
>  #
> diff --git a/tests/tcg/i386/test-i386-pcmpistri.c 
> b/tests/tcg/i386/test-i386-pcmpistri.c
> new file mode 100644
> index 0000000000..1e81ae611a
> --- /dev/null
> +++ b/tests/tcg/i386/test-i386-pcmpistri.c
> @@ -0,0 +1,33 @@
> +/* Test pcmpistri instruction.  */
> +
> +#include <nmmintrin.h>
> +#include <stdio.h>
> +
> +union u {
> +    __m128i x;
> +    unsigned char uc[16];
> +};
> +
> +union u s0 = { .uc = { 0 } };
> +union u s1 = { .uc = "abcdefghijklmnop" };
> +union u s2 = { .uc = "bcdefghijklmnopa" };
> +union u s3 = { .uc = "bcdefghijklmnab" };
> +
> +int
> +main(void)
> +{
> +    int ret = 0;
> +    if (_mm_cmpistri(s0.x, s0.x, 0x4c) != 15) {
> +        printf("FAIL: pcmpistri test 1\n");
> +        ret = 1;
> +    }
> +    if (_mm_cmpistri(s1.x, s2.x, 0x4c) != 15) {
> +        printf("FAIL: pcmpistri test 2\n");
> +        ret = 1;
> +    }
> +    if (_mm_cmpistri(s1.x, s3.x, 0x4c) != 16) {
> +        printf("FAIL: pcmpistri test 3\n");
> +        ret = 1;
> +    }
> +    return ret;
> +}
> 




reply via email to

[Prev in Thread] Current Thread [Next in Thread]