0 votes
0 answers
78 views

I have a function that contains an OpenMP-parallelized for loop, which calls a callback at each iteration, similar to this: template<class Callback> void iterate(const Callback& callback, ...
tmlen
  • 9,230
2 votes
1 answer
134 views

I'm trying to perform an std::vector sum reduction with an OpenMP reduction declaration for it: // g++ -fopenmp MRE.cpp -o MRE #include <vector> #include <algorithm> #include <omp.h> ...
Stefan de Souza
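For the std::vector sum reduction described above, here is a minimal sketch of the usual `declare reduction` pattern. The reduction name `vec_plus` and the function `sum_rows` are hypothetical. The initializer gives each thread a zero-filled vector of the original's size, which is the identity for element-wise addition. Without `-fopenmp`, the pragmas are ignored and the loop simply runs serially.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical user-defined reduction: element-wise sum of std::vector<double>.
// omp_out is the partial result being accumulated into, omp_in the one merged in.
// Each private copy starts as a zero vector with the same size as the original.
#pragma omp declare reduction(vec_plus : std::vector<double> :               \
    std::transform(omp_out.begin(), omp_out.end(), omp_in.begin(),           \
                   omp_out.begin(), std::plus<double>()))                    \
    initializer(omp_priv = std::vector<double>(omp_orig.size(), 0.0))

// Adds every row of `rows` (each of length `width`) into one result vector.
std::vector<double> sum_rows(const std::vector<std::vector<double>>& rows,
                             std::size_t width) {
    std::vector<double> total(width, 0.0);
    const long count = static_cast<long>(rows.size());
    #pragma omp parallel for reduction(vec_plus : total)
    for (long i = 0; i < count; ++i) {
        for (std::size_t j = 0; j < width; ++j) {
            total[j] += rows[i][j];
        }
    }
    return total;
}
```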
1 vote
1 answer
99 views

The book Algorithms Illuminated Part 4 has the following definition to prove problems NP-hard: A problem A reduces to another problem B if an algorithm that solves B can be easily translated into one ...
MangoPizza
0 votes
1 answer
88 views

I can't understand if torch.scatter or torch.gather could be used to reduce values of a tensor according to a reduction function over specified indices. I've frequently used the torch_geometric.nn....
daqh
  • 146
1 vote
1 answer
66 views

Leaving out non-essential code: double* max_of_two(double *r,double *n); int npoints = 1000000; double* xy = (double*)malloc(2*npoints*sizeof(double)); double zero_coord[2] = {0.0,0.0}; double ...
Victor Eijkhout
0 votes
1 answer
27 views

OpenCL offers built-in/intrinsic "vector types" (see table 3 at the link), such as int4 or float2. It also defines binary and unary elementwise operators which accept these types, e.g. ...
einpoklum
  • 138k
2 votes
0 answers
43 views

I am trying to use a reduction algorithm like thrust::reduce for a sequence of matrices. Let's say I want to compute the product of N matrices: A1*A2*...*AN. I think a reduction algorithm would be great ...
Santiago
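Matrix multiplication is associative but not commutative, so a reduction over A1*A2*...*AN is safe only if it never swaps operands. The sketch below is plain C++ (no Thrust) and uses a hypothetical 2x2 integer matrix type. It shows an order-preserving tree reduction: at every level it multiplies adjacent pairs, left operand first, and the multiplications within a level are independent of each other. Whether a particular library's `reduce` guarantees this ordering should be checked in that library's own documentation.

```cpp
#include <array>
#include <cstddef>
#include <utility>
#include <vector>

using Mat2 = std::array<std::array<long long, 2>, 2>;  // hypothetical 2x2 matrix type

Mat2 mul(const Mat2& a, const Mat2& b) {
    Mat2 c{};  // zero-initialised
    for (int i = 0; i < 2; ++i)
        for (int j = 0; j < 2; ++j)
            for (int k = 0; k < 2; ++k)
                c[i][j] += a[i][k] * b[k][j];
    return c;
}

// Tree reduction that only multiplies adjacent partial products, left first.
// Associativity makes the regrouping harmless; operands are never reordered.
Mat2 ordered_tree_product(std::vector<Mat2> ms) {
    if (ms.empty()) return Mat2{{{1, 0}, {0, 1}}};  // identity for an empty product
    while (ms.size() > 1) {
        std::vector<Mat2> next;
        for (std::size_t i = 0; i + 1 < ms.size(); i += 2)
            next.push_back(mul(ms[i], ms[i + 1]));  // independent: parallelisable
        if (ms.size() % 2 == 1) next.push_back(ms.back());  // odd one carried over
        ms = std::move(next);
    }
    return ms[0];
}
```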
0 votes
0 answers
60 views

I've encountered a bug when using Clang[1] with libomp[2] whereby using omp_priv = omp_orig in the initializer of a custom OpenMP reduction silently gives erroneous output. For example: /* file.cpp */ ...
Anti Earth
  • 4,941
3 votes
1 answer
105 views

I am attempting to parallelise a calculation and consolidate the results into a matrix. A large number of calculations are performed and each one contributes to a summed matrix of all the results. ...
Neil Butcher
  • 1,074
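When every calculation adds into the same summed matrix, OpenMP 4.5 and later allow an array section in the `reduction` clause. Below is a minimal sketch that assumes each matrix is stored as a flat row-major vector of `n` doubles; `sum_matrices` is a hypothetical name.

```cpp
#include <cstddef>
#include <vector>

// Sums all matrices in `mats` (each n doubles, row-major) into `out`.
// reduction(+ : acc[:n]) gives every thread a private zeroed copy of
// acc[0..n) and adds the copies together at the end of the loop.
void sum_matrices(const std::vector<std::vector<double>>& mats,
                  std::size_t n, std::vector<double>& out) {
    out.assign(n, 0.0);
    double* acc = out.data();  // the array section is applied to a raw pointer
    const long count = static_cast<long>(mats.size());
    #pragma omp parallel for reduction(+ : acc[:n])
    for (long m = 0; m < count; ++m) {
        for (std::size_t k = 0; k < n; ++k) {
            acc[k] += mats[m][k];
        }
    }
}
```

Each thread holds its own copy of all `n` accumulators, so memory use grows with the thread count. For very large matrices, that cost is worth keeping in mind.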
0 votes
0 answers
54 views

I've run into the following problem while trying to parallelize my code. The simplified code looks like this: !$omp parallel do private(e, b0_vek) reduction(+:b_vek) schedule(static, chunk_elem) do e = 1, ...
user22593146
0 votes
0 answers
188 views

I am exploring a reduction from the general Independent Set Problem to the Independent Set Problem specifically for 3-colorable graphs. The goal is to demonstrate that the maximal independent set of a ...
Mason Kane
-1 votes
1 answer
103 views

The code solves the following equation: A1(y,bp,kp) = \sum_i ( B(y,yp_i) * C(yp_i,bp,kp) * \sum_j ( D(bpp_j,kpp_j,yp_i,bp,kp) * A0(yp_i,bpp,kpp) ) ). I have the following code with multiple do-loops. The purpose ...
Zillux
  • 1
0 votes
1 answer
101 views

Okay, so I'm just learning some lambda calculus and I came across this problem: "Perform reduction on this; if it cannot be reduced, then say it will diverge: (λy.(λx.xx)y)(λx.x)". These are the steps I ...
Priit
  • 1
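Assuming standard β-reduction, the term in this excerpt does not diverge. Each step substitutes the argument for the bound variable:

```latex
\begin{aligned}
(\lambda y.\,(\lambda x.\,x\,x)\,y)\,(\lambda x.\,x)
  &\to_\beta (\lambda x.\,x\,x)\,(\lambda x.\,x) && [\,y := \lambda x.\,x\,] \\
  &\to_\beta (\lambda x.\,x)\,(\lambda x.\,x)     && [\,x := \lambda x.\,x\,] \\
  &\to_\beta \lambda x.\,x                        && [\,x := \lambda x.\,x\,]
\end{aligned}
```

The term reaches the normal form λx.x. By contrast, (λx.xx)(λx.xx) reduces to itself forever and therefore diverges.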
-2 votes
1 answer
111 views

I'm following a previously answered question here about how to implement an all-reduce in CUDA, which links to a slide deck from NVIDIA. What I have works most of the time (when the input size is a ...
JakeTuero
  • 105
0 votes
2 answers
944 views

I am trying to write a CUDA kernel that finds the minimum and maximum index of the values in a 1D array that are greater than a particular threshold. Below is the pseudocode for doing the same on the CPU: int min_index = 0,...
Sampath
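As a CPU reference (for example, to check a CUDA kernel's output), the two indices can be computed with OpenMP's built-in `min` and `max` reductions, which C/C++ has supported since OpenMP 3.1. This sketch is an illustration rather than the asker's code. The function name and the `{-1, -1}` "not found" result are assumptions.

```cpp
#include <climits>
#include <utility>
#include <vector>

// Returns {first, last} index i with data[i] > threshold, or {-1, -1} if none.
// Private copies of min_index / max_index start at INT_MAX / INT_MIN and are
// combined with min / max when the loop ends.
std::pair<int, int> min_max_index_above(const std::vector<float>& data,
                                        float threshold) {
    int min_index = INT_MAX;
    int max_index = INT_MIN;
    const int n = static_cast<int>(data.size());
    #pragma omp parallel for reduction(min : min_index) reduction(max : max_index)
    for (int i = 0; i < n; ++i) {
        if (data[i] > threshold) {
            if (i < min_index) min_index = i;
            if (i > max_index) max_index = i;
        }
    }
    if (min_index == INT_MAX) return {-1, -1};  // nothing above the threshold
    return {min_index, max_index};
}
```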
2 votes
1 answer
335 views

I want to find the total sum, minimum and maximum (and their positions) in a matrix using openMP, and more specifically, the reduction clause. The problem I'm having is that I can't apply reduction ...
Pablo
  • 95
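Built-in reductions cover the sum, minimum and maximum, but not where the extremes occur. One common workaround is a user-defined reduction over a value/position struct. In this sketch the names `ValLoc`, `minloc`, `maxloc` and `matrix_stats` are hypothetical. Ties are resolved toward the earlier position, so the result does not depend on how iterations are scheduled.

```cpp
#include <vector>

// Extreme value plus its (row, col); row == -1 means "no candidate yet".
struct ValLoc {
    double value;
    int row;
    int col;
};

inline bool earlier(const ValLoc& a, const ValLoc& b) {
    return a.row < b.row || (a.row == b.row && a.col < b.col);
}
// An empty candidate always loses; ties go to the earlier row-major position,
// which keeps both combiners commutative and associative.
inline ValLoc pick_min(const ValLoc& a, const ValLoc& b) {
    if (a.row < 0) return b;
    if (b.row < 0) return a;
    if (a.value != b.value) return a.value < b.value ? a : b;
    return earlier(a, b) ? a : b;
}
inline ValLoc pick_max(const ValLoc& a, const ValLoc& b) {
    if (a.row < 0) return b;
    if (b.row < 0) return a;
    if (a.value != b.value) return a.value > b.value ? a : b;
    return earlier(a, b) ? a : b;
}

#pragma omp declare reduction(minloc : ValLoc : omp_out = pick_min(omp_out, omp_in)) \
    initializer(omp_priv = ValLoc{0.0, -1, -1})
#pragma omp declare reduction(maxloc : ValLoc : omp_out = pick_max(omp_out, omp_in)) \
    initializer(omp_priv = ValLoc{0.0, -1, -1})

struct MatrixStats {
    double sum;
    ValLoc min;
    ValLoc max;
};

// Sum, minimum and maximum (with positions) of a matrix.
MatrixStats matrix_stats(const std::vector<std::vector<double>>& m) {
    double sum = 0.0;
    ValLoc mn{0.0, -1, -1};
    ValLoc mx{0.0, -1, -1};
    const int rows = static_cast<int>(m.size());
    #pragma omp parallel for reduction(+ : sum) reduction(minloc : mn) reduction(maxloc : mx)
    for (int i = 0; i < rows; ++i) {
        const int cols = static_cast<int>(m[i].size());
        for (int j = 0; j < cols; ++j) {
            const ValLoc here{m[i][j], i, j};
            sum += m[i][j];
            mn = pick_min(mn, here);
            mx = pick_max(mx, here);
        }
    }
    return {sum, mn, mx};
}
```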
0 votes
0 answers
208 views

I am trying to implement a small library of mathematical functions for 32-bit floats (single precision) as part of one of my Java projects. When it comes to calculating the sine of very large ...
dananr
  • 1
1 vote
2 answers
257 views

I am trying to write a simple game and need to study some x86 assembly for vector operations. Using an xmm register as 4 packed single-precision floats, are there any aggregate operations? Such as: "...
wangjianyu
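SSE1 has no instruction that combines all four lanes of an xmm register. A horizontal reduction instead takes a shuffle followed by two operations (SSE3 later added `haddps`, which sums adjacent pairs). Below is a sketch using SSE1 intrinsics from `<xmmintrin.h>`, which compile to shufps, addps/maxps, movhlps and addss/maxss. It assumes an x86-64 target, and the helper names are hypothetical.

```cpp
#include <xmmintrin.h>  // SSE1 intrinsics (baseline on x86-64)

// Horizontal sum of the four floats in v = [a, b, c, d].
inline float hsum_ps(__m128 v) {
    __m128 shuf = _mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 3, 0, 1));  // [b, a, d, c]
    __m128 sums = _mm_add_ps(v, shuf);                              // [a+b, a+b, c+d, c+d]
    shuf = _mm_movehl_ps(shuf, sums);                               // low lane <- c+d
    sums = _mm_add_ss(sums, shuf);                                  // low lane = (a+b)+(c+d)
    return _mm_cvtss_f32(sums);
}

// Same pattern with max instead of add.
inline float hmax_ps(__m128 v) {
    __m128 shuf = _mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 3, 0, 1));
    __m128 maxs = _mm_max_ps(v, shuf);
    shuf = _mm_movehl_ps(shuf, maxs);
    maxs = _mm_max_ss(maxs, shuf);
    return _mm_cvtss_f32(maxs);
}
```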
-1 votes
1 answer
167 views

I got this bug while running SExtractor `----- SExtractor 2.28.0 started on 2023-12-04 at 16:30:12 with 1 thread Reading Neural Network Weights Error: SOM file not found: default.som ` But I didn't ...
Meee_
  • 1
0 votes
1 answer
183 views

I have the following problem statement: "Given an undirected graph, check whether a cycle of K nodes exists." I want to take any input and convert it to a Conjunctive Normal Form formula for ...
Josué Pedrajas
0 votes
1 answer
293 views

I'm trying to use the bkz_reduction function of the fplll library in my C++ program; however, I always get an "undefined reference to `fplll::bkz_reduction(fplll::ZZ_mat<__mpz_struct [1]>&...
user22749332
-4 votes
1 answer
60 views

#include <stdio.h> #include <omp.h> #define N 5 int X[N]; int main() { int num = 0; int moy = 0; // Initialize the array (you should populate it as needed) for (int i = ...
M V
  • 3
1 vote
0 answers
89 views

I have the following function that calculates the maximum value of 2D/3D arrays in a nested for loop. I used the reduction clause to gain some additional speedup; however, I am not getting a good speedup ...
Jamie
  • 527
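Below is a minimal sketch of a max reduction over a contiguous 2D array. It assumes row-major storage, and `max2d` is a hypothetical name. `collapse(2)` fuses the two loops, so iterations are shared across threads even when the outer dimension is small. A loop that does one comparison per element is often limited by memory bandwidth rather than arithmetic, which can cap the speedup regardless of the reduction clause.

```cpp
#include <cfloat>
#include <cstddef>
#include <vector>

// Maximum of an nx-by-ny array stored row-major in `a`.
// Each thread keeps a private running maximum; OpenMP combines them with max.
double max2d(const std::vector<double>& a, int nx, int ny) {
    double m = -DBL_MAX;
    #pragma omp parallel for collapse(2) reduction(max : m)
    for (int i = 0; i < nx; ++i) {
        for (int j = 0; j < ny; ++j) {
            const double v = a[static_cast<std::size_t>(i) * ny + j];
            if (v > m) m = v;
        }
    }
    return m;
}
```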
0 votes
1 answer
65 views

I am developing my own implementation of sparse BLAS functions for CSC storage formats. To do so, I created the following data structure: typedef struct SparseMatrixCSC { int m; // Number ...
Nicolas Venkovic
0 votes
1 answer
248 views

I am taking a Linear Algebra for Data Science class through DeepLearning.AI, and one of the exercises has a linear system of equation problem, where you define 3 functions (MultiplyRow, AddRows, ...
mush
  • 1
2 votes
3 answers
312 views

Let's say I have a 3D NumPy array with shape (27,27,27). I want to compress it to (9,9,9) by averaging every 3 elements along every axis simultaneously (e.g. turn each 3x3x3 block of pixels into 1x1x1). The ...
Lee Drake
1 vote
0 answers
283 views

What is the standard approach for doing a reduction operation such as computing the maximum, on an entire structured buffer in HLSL? Context: I have a HLSL RWStructuredBuffer which I want to normalize ...
Niels
  • 169
0 votes
1 answer
186 views

I am trying to parallelize the following function (an iterative solver) that has a while loop and a nested for loop inside. The code looks like: static const int nx = 128; static const int ny = 128; ...
Jamie
  • 527
-2 votes
1 answer
244 views

TLDR: I am trying to write a GPU code that computes a blockwise reduction on an array. The input looks like [block_0, trash_0, block_1, trash_1, ..., block_n, trash_n], and I want to compute block_0 + ...
s769
  • 3
1 vote
0 answers
30 views

Below are two code snippets in which I use an OpenMP reduction. In the first case (reduction variable total_num_sp_edges) I get the correct result every time, but in the second case, ...
Faysal
  • 35
0 votes
2 answers
285 views

I have to compute partial sums using a parallel reduction approach in C, but I have no idea how to go about it, so I need the community's guidance. What I need to achieve: for example, ...
Harsh Patel
  • 1,434
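If "partial sums" means running (prefix) sums, OpenMP 5.0 provides a scan construct for exactly this. Below is a minimal sketch, written in C++ rather than C and using a hypothetical function name. When built with `-fopenmp`, it needs a compiler that implements OpenMP 5.0 scans. Without `-fopenmp`, the pragmas are ignored and the loop runs serially with the same result.

```cpp
#include <cstddef>
#include <vector>

// Inclusive prefix sum: out[i] = in[0] + ... + in[i].
// reduction(inscan, + : running) plus the scan directive split each iteration
// into an input phase (above the directive) and a scan phase (below it).
std::vector<long long> inclusive_prefix_sum(const std::vector<int>& in) {
    std::vector<long long> out(in.size());
    long long running = 0;
    const long n = static_cast<long>(in.size());
    #pragma omp parallel for reduction(inscan, + : running)
    for (long i = 0; i < n; ++i) {
        running += in[i];                   // input phase
        #pragma omp scan inclusive(running)
        out[i] = running;                   // scan phase: prefix up to and including i
    }
    return out;
}
```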
1 vote
1 answer
100 views

The following Fortran code fails (it gives a random result), but replacing the call to mysum with abc=abc+1 gives the correct result. How can I make OpenMP recognize the reduction in a subprogram? program reduc ...
MRheinhardt
-1 votes
1 answer
353 views

I have the following example code: !$omp threadprivate(var) !$omp parallel do reduction(+:var) do var = var + compilated_floating_point_computation() end do !$omp end parallel do print *,var And ...
nadavhalahmi
0 votes
0 answers
49 views

I have to send a struct that contains, among other things, a dynamically allocated array of another struct. The receiver has to merge the received message with its data and then send the result to ...
Scotty
  • 43
2 votes
2 answers
124 views

I am searching for a suitable stream-based reduction operation to find the maximum difference in a list of doubles. (Please, no solutions with old-style nested for-loops...) Let's say my double list is List&...
Andy
  • 518
-1 votes
1 answer
57 views

In a set of tuples (pairs), like this one: s = {(1, 2), (3, 4), (1, 3), ('v', 'n'), ('v', 'k')} I would like to remove all pairs (a, b) and (a, c), so that the resulting set is: {(3, 4)} Is there a ...
Paul Jurczak
  • 8,650
1 vote
2 answers
93 views

I have a table with line numbers and either a "define" or an "undefine" event of an identifier. Example: line_no | def | undef -------------------- 1 | 'a' | NULL 2 | '...
Christoph
0 votes
1 answer
724 views

I am currently working on a Java problem I found online. We have an array with several thousand, if not millions, of entries. The goal is to efficiently compute the sum of the whole array. The ...
againeatingkirby
0 votes
1 answer
60 views

What does this do, and is there a simpler way to write it? Collection>>into: a2block | all pair | all := ([:allIn :each| allIn key key: allIn value. each]) -> (pair := nil -...
Jim Sawyer
8 votes
1 answer
4k views

Assume a 2*X (always 2 rows) PyTorch tensor: A = tensor([[ 1., 2., 2., 3., 3., 3., 4., 4., 4.], [43., 33., 43., 76., 33., 76., 55., 55., 55.]]) torch.unique(A, dim=1) will return: ...
ojipadeson
1 vote
0 answers
502 views

I have an array of tensors for a single image. I want to flatten the vectors and perform PCA on them. Below is the code that extracts the tensors for a single image: bottle_neck_model_tensors = ...
budding_star
0 votes
1 answer
895 views

I am trying to use thrust to reduce an array of 1M elements to a single value. My code is as follows: #include<chrono> #include<iostream> #include<thrust/host_vector.h> #include<...
thePhantom
1 vote
0 answers
491 views

I wrote a decision tree regressor from scratch in python. It is outperformed by the sklearn algorithm. Both trees build exactly the same splits with the same leaf nodes. BUT when looking for the best ...
Gianluca Armeli
0 votes
1 answer
77 views

Initially I had the loop import numpy datos = numpy.random.rand(1000,17) clusters = 250 n_variables = 17 centros = numpy.random.rand(clusters,n_variables) desviaciones = numpy.random.rand(n_variables)...
Jesus M.
0 votes
0 answers
70 views

I have a homework problem that I am finding difficult to begin. We are working on Karp (single-call) reductions to show intractability. For this assignment, the problem is intentionally vague. I was ...
enarm4
  • 1
1 vote
1 answer
101 views

My computer has 4 cores. I processed the stringList list on those 4 cores using the parallel method and called the reduce method with identity = "A". Normally, this list should be ...
AMZ
  • 416
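The `Stream.reduce(identity, accumulator)` contract requires `identity` to be a true identity for the accumulator. For string concatenation, that identity is "". The reason is that a parallel evaluation may start several partial reductions from it. The hypothetical C++ simulation below is not the Java Streams implementation, and its chunk count is chosen explicitly. It shows the effect: a non-identity value such as "A" appears once per chunk.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of a chunked reduce with op = string concatenation:
// split `items` into `chunks` contiguous pieces (chunks >= 1), fold each piece
// starting from `identity`, then concatenate the partial results in order.
std::string chunked_concat(const std::vector<std::string>& items,
                           const std::string& identity,
                           std::size_t chunks) {
    if (items.empty()) return identity;
    const std::size_t per = (items.size() + chunks - 1) / chunks;  // ceiling division
    std::string result;  // starts from "", the real identity of concatenation
    for (std::size_t start = 0; start < items.size(); start += per) {
        std::string partial = identity;  // every chunk starts from `identity`
        for (std::size_t i = start; i < start + per && i < items.size(); ++i) {
            partial += items[i];
        }
        result += partial;  // combine chunk results left to right
    }
    return result;
}
```

With identity "" every chunk count gives "abcd" for the input {"a", "b", "c", "d"}. With "A", one chunk gives "Aabcd" but four chunks give "AaAbAcAd".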
0 votes
1 answer
413 views

From this question and this question I managed to compile a minimal example of summing a vector into a single double inside OpenCL 1.2. /* https://suhorukov.blogspot.com/2011/12/opencl-11-atomic-...
Dávid Tóth
  • 3,315
1 vote
1 answer
957 views

Suppose there is a satisfiability problem (call it oscillating-CNF) where the input is a list of CNF clauses and we want to show that this problem is indeed NP-complete (by reducing CNF-SAT to ...
Danny Agir
0 votes
1 answer
339 views

I'm having trouble understanding the logic behind the "last warp loop unrolling" technique in NVIDIA's parallel reduction tutorial (available here). In the case of thread 31 (for which tid=31)...
Reza Namvar
1 vote
0 answers
264 views

In ManagmentSystem I have a basic descriptor setup, binding attributes by name to the Employee descriptor class in a loop. I want to do this dynamically, passing values to init to bind them to ...
Pepe
  • 59
