Skip to content

Conversation

@alexdowad
Copy link
Contributor

On short strings, there is no difference in performance. However, for strings around 10,000 bytes long, the AVX2-accelerated function is about 55% faster than the SSE2-accelerated one.

FYA @cmb69 @Girgias @nikic @nielsdos

…ds for now)

On short strings, there is no difference in performance. However, for
strings around 10,000 bytes long, the AVX2-accelerated function is
about 55% faster than the SSE2-accelerated one.
@alexdowad
Copy link
Contributor Author

It would be better if the 'AVX2 resolver' stuff was implemented as well... this is just a quick change which I coded up in a few minutes, which boosts performance for AVX2-only builds.

Copy link
Member

@ndossche ndossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm, just one small nit


#define BLOCKCONV_LOAD(input) \
__m256i blconv_operand = _mm256_loadu_si256((__m256i*)(input)); \
__m256i blconv_mask = _mm256_cmpgt_epi8(blconv_threshold, _mm256_add_epi8(blconv_operand, blconv_offset));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Maybe this should be changed to lt instead of gt with the argument swapped in order to be symmetric with the SSE2 version.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😄 There is no such thing as _mm256_cmplt_epi8. That's why I had to change to cmpgt and swap the arguments to get the code to compile.

Ref: https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_cmplt_epi8

If we want to make it symmetric, then we would need to change the SSE2 version.

Copy link
Member

@Girgias Girgias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@alexdowad
Copy link
Contributor Author

Thanks for reviewing! Just merging now.

@alexdowad alexdowad closed this Feb 3, 2023
@alexdowad alexdowad deleted the stravx2 branch February 3, 2023 14:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants