-1 votes
0 answers
38 views

I am running a straightforward parallelized computation using RcppEigen and OpenMP. Here is a minimal reproducible example of the code: // [[Rcpp::depends(RcppEigen)]] // [[Rcpp::plugins(openmp)]] #...
Ari.stat's user avatar
  • 469
2 votes
1 answer
147 views

Consider the following code: #pragma omp parallel for (int run = 0; run < 10; run++) { std::vector<int> out; #pragma omp for for (int i = 0; i < 1'000'000; i++) { ... } } ...
F.X.'s user avatar
  • 7,515
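A minimal sketch of the pattern described in the excerpt above (the loop body and the vector usage are placeholders, not the asker's actual code): an outer parallel region in which every thread declares its own out vector per run, and an inner omp for that splits the iterations of the inner loop among the team.

#include <vector>

int main() {
    #pragma omp parallel
    {
        for (int run = 0; run < 10; ++run) {
            // declared inside the parallel region, so each thread
            // gets its own vector for this run
            std::vector<int> out;
            // the inner worksharing loop shares the 1'000'000
            // iterations among the threads of the enclosing team,
            // with an implicit barrier at its end
            #pragma omp for
            for (int i = 0; i < 1'000'000; ++i) {
                out.push_back(i);   // placeholder per-thread work
            }
        }
    }
    return 0;
}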
Best practices
0 votes
3 replies
96 views

I'm struggling to finalise the design of my C++17 library. One of the primary goals is to use runtime polymorphism to allow users to extend or rewrite default features of the library for their own use ...
josh_eime's user avatar
  • 196
2 votes
1 answer
117 views

I'm one of the developers of the Lumen code: https://www.lumen-code.org/. It is a computational code for condensed matter physics simulations. We are replacing FORALL with DO CONCURRENT, since FORALL ...
attacc's user avatar
  • 21
0 votes
1 answer
95 views

The code below crashes with terminate called after throwing an instance of 'std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >' Aborted ...
user1407220's user avatar
1 vote
1 answer
114 views

So I'm working on some homework related to fixing some buggy code when I ran into an interesting problem. I don't think it was one of the intended bugs, because the lecturer was confused by it as well. ...
Tuned Rockets's user avatar
-1 votes
0 answers
34 views

Overview - simple actuarial model to perform a stochastic simulation on a group of individuals. How it works - In a nutshell, the sequential version works as follows: in each of the nRip iterations, ...
AM A's user avatar
  • 1
3 votes
2 answers
176 views

I am optimising code that is already parallelised but not very well optimised because of the very different durations of the operations among the threads, even though all threads have a similar task to do. So I ...
martinit18's user avatar
0 votes
0 answers
79 views

I'm working on porting the CAMB (Cosmological Boltzmann code) to run with hybrid CPU+GPU parallelization using OpenMP and OpenACC with NVIDIA HPC SDK compilers. The code works perfectly when compiled ...
Sbomba Sbomba's user avatar
0 votes
0 answers
82 views

I'm developing an Android app using the NDK that plays MIDI files with FluidSynth, integrated with a Java frontend. The app crashes with a native crash in libomp.so at __kmpc_barrier, and I suspect it'...
3 votes
2 answers
130 views

I have written some programs with the OMP reduction directive in Fortran and in C. But the data types were simple (int, float, fixed-size arrays, ...) and the reduction-identifiers used were implicitly ...
Stef1611's user avatar
  • 2,525
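For a user-defined type, OpenMP lets you declare your own reduction identifier with a combiner and an initializer. A minimal C++ sketch (the Accum struct, the accplus identifier, and the loop body are illustrative, not taken from the question):

#include <cstdio>

struct Accum { double sum = 0.0; long count = 0; };

// user-defined reduction: how two partial results are combined, and
// how each thread-private copy is initialized
#pragma omp declare reduction(accplus : Accum : \
        omp_out.sum += omp_in.sum, omp_out.count += omp_in.count) \
    initializer(omp_priv = Accum())

int main() {
    Accum a;
    #pragma omp parallel for reduction(accplus : a)
    for (int i = 0; i < 1000; ++i) {
        a.sum += i * 0.5;
        a.count += 1;
    }
    std::printf("sum=%f count=%ld\n", a.sum, a.count);
    return 0;
}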
2 votes
3 answers
110 views

We are working with the following code: int i, j, k; for (i = 2; i < n; i++){ // S1 for (j = 3; j < n - 3; j++){ // S2 for (k = 4; k < n - 4; k++){ // S3 A[...
user31223185's user avatar
1 vote
1 answer
138 views

I am currently trying to port a big portion of a Fortran code to GPU devices with OpenMP. I have a working version for AMD, specifically for the MI300A, which features unified shared memory. I ...
Giorgio Daneri's user avatar
1 vote
1 answer
166 views

My program needs to perform some heavy calculations on all widgets in the box. The calculations are repeated an appreciable number of times processing multiple variations of each widget. All of the ...
Mikhail T.'s user avatar
  • 4,266
0 votes
0 answers
102 views

I'm implementing a bitwise Radix Sort to sort an array of 64-bit unsigned integers (key_t is uint_fast64_t) that represent encoded points. I have two versions of the algorithm: One using OpenMP only ✅...
Padibel's user avatar
3 votes
1 answer
98 views

I am reading the OMP documentation (6.0) about atomic operations. As I went through the clauses, I read about write and update. I understand the difference between the two: write is used to atomically ...
GabrijelOkorn's user avatar
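A minimal illustration of the two clauses (generic variables, not from the question): atomic write stores a value atomically, while atomic update performs a read-modify-write of the same variable as one atomic operation.

int main() {
    int flag = 0;
    int counter = 0;
    #pragma omp parallel
    {
        // write: the stored value must not depend on the variable
        // being written
        #pragma omp atomic write
        flag = 1;

        // update: the variable is read and modified atomically
        #pragma omp atomic update
        counter += 1;
    }
    return 0;
}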
0 votes
2 answers
133 views

I am working on a project which performs extensive matrix operations and takes advantage of GPU offloading via OpenMP. Things work well for most of the code, except when I try to run some target-...
Giovanni La Mura's user avatar
0 votes
0 answers
78 views

I have a function that contains an OpenMP-parallelized for loop, which calls a callback at each iteration, similar to this: template<class Callback> void iterate(const Callback& callback, ...
tmlen's user avatar
  • 9,230
2 votes
1 answer
88 views

I have a function that uses an OpenMP-parallelized for loop in its implementation. It should be possible to turn on/off the parallelization at runtime. Currently it is like this: void iterate(bool ...
tmlen's user avatar
  • 9,230
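One common way to make the parallelization switchable at runtime without duplicating the loop is the if clause on the parallel construct; a sketch under the assumption that the iterations are independent (the function signature and the doubling are placeholders):

#include <cstddef>

void iterate(bool parallel, double* data, std::size_t n) {
    // if(parallel) is evaluated at runtime: when false, the region is
    // executed by a team of one thread, so the same code path serves
    // both the serial and the parallel case
    #pragma omp parallel for if(parallel)
    for (std::ptrdiff_t i = 0; i < static_cast<std::ptrdiff_t>(n); ++i) {
        data[i] *= 2.0;   // placeholder per-element work
    }
}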
1 vote
0 answers
120 views

I'm working on a C code application, running on Windows 10; my CPU is an i7-7500u with 2 cores. My C compiler is anaconda3/Library/bin/gcc.exe and I use the following flags: -O2 -march=native -...
AmnonJW's user avatar
  • 11
1 vote
2 answers
72 views

I'm trying to implement Conway's Game of Life in C, showing that it can scale with OpenMP. The results in the following picture are from running it on a socket of a machine with 64 cores equally ...
pettepiero's user avatar
0 votes
2 answers
44 views

I'm currently dealing with some benchmarks that are highly dependent on OpenMP settings. I noticed that the results can vary greatly when setting values for variables such as OMP_PLACES, OMP_PROC_BIND ...
Tara Tara's user avatar
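To see what a given combination of OMP_PLACES and OMP_PROC_BIND actually does, it can help to print the binding the runtime chose. A small sketch using the OpenMP 4.5 place-query routines (run it, for example, with OMP_PLACES=cores OMP_PROC_BIND=close):

#include <cstdio>
#include <omp.h>

int main() {
    #pragma omp parallel
    {
        // report which place each thread was bound to
        #pragma omp critical
        std::printf("thread %d of %d on place %d of %d\n",
                    omp_get_thread_num(), omp_get_num_threads(),
                    omp_get_place_num(), omp_get_num_places());
    }
    return 0;
}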
8 votes
0 answers
131 views

The OpenMP specification names the worksharing-loop construct with syntax #pragma omp for and the loop construct with syntax #pragma omp loop The description in the linked pages sounds a bit like ...
fuz's user avatar
  • 94.8k
0 votes
1 answer
89 views

I've been trying to get clang OpenMP GPU offloading working in the Docker Image and it builds fine at first (cmdline: ~/llvm_project/llvm/utils/docker:$ bash build_docker_image.sh --source nvidia-...
Florian Becker's user avatar
0 votes
0 answers
109 views

For example, when I multithread a "for loop" using OpenMP, running the iterations as different threads, how does it get translated to assembly? Also, can multithreaded code run on hardware ...
Ali Asgar 3's user avatar
3 votes
1 answer
108 views

I have found myself most confused by the result of my tests regarding the effects of different scheduling for #pragma omp parallel for. Basically my confusion sprouted from the following problem: I have ...
Yas Nas's user avatar
  • 41
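For reference, a minimal sketch contrasting the scheduling clauses (the trigonometric body is only a stand-in for iterations of uneven cost): schedule(static) assigns contiguous blocks of iterations up front and is cheapest, while schedule(dynamic, chunk) hands out chunks on demand and tolerates imbalance at the price of some runtime overhead.

#include <cmath>

void work(double* out, int n) {
    // dynamic scheduling: threads grab chunks of 64 iterations as they
    // finish earlier ones, which helps when iteration costs differ
    #pragma omp parallel for schedule(dynamic, 64)
    for (int i = 0; i < n; ++i) {
        out[i] = std::sin(i) * std::cos(i);   // uneven-cost stand-in
    }
}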
0 votes
0 answers
43 views

I'm encountering a strange issue where OpenMP parallel code crashes on certain Android devices when the APK contains large files (100MB+) in assets or res/raw. The crash disappears after removing ...
nan's user avatar
  • 1
1 vote
2 answers
159 views

I have some code that previously looked like: std::vector<MyObject> data; do { // May run 100 times or more prepare(); remaining = compute1(my_data); remaining -= compute2(my_data); ...
user2233709's user avatar
0 votes
0 answers
112 views

I use the following command to compile an executable file for Android: cmake \ -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \ -DANDROID_ABI=arm64-v8a \ -...
XUHAO77's user avatar
  • 11
0 votes
1 answer
109 views

I am having trouble identifying a problem in an OpenMP-parallelized Fortran code. In doing so, I created a small reproducible example that mimics the derived types I am using in the code. The small ...
luismsfern's user avatar
2 votes
0 answers
78 views

For a school assignment I need a precise definition of OpenMP. I find the Wikipedia article and other sources lacking. Specifically, I need to understand what the difference is between OpenMP and the OpenMP API. ...
GabrijelOkorn's user avatar
1 vote
0 answers
63 views

Recent OpenMP standards introduced task multi-dependencies. This feature was originally proposed in OmpSs if I'm not mistaken. I found an example of applying multi-dependencies to the assembly step of ...
IPribec's user avatar
  • 246
1 vote
1 answer
101 views

I'm developing an R package using Rcpp with OpenMP for parallelization. I'm facing an issue where threads are randomly spread across CPU cores, often leading to threads being scheduled primarily on ...
roh's user avatar
  • 11
1 vote
1 answer
105 views

According to OpenMP specification (2.1. Directive Format) Directives may not appear in constexpr functions or in constant expressions. Variadic parameter packs cannot be expanded into a directive or ...
Chameleon's user avatar
  • 2,239
0 votes
0 answers
47 views

In gcc (GCC) 14.2.0 the following program int main(int argc, char *argv[]) { #pragma omp parallel for for (int i{0}; i < 1; ++i) { } return 0; } does compile with g++ main.cpp (without ...
phinz's user avatar
  • 1,647
1 vote
0 answers
55 views

I'm using OpenMP with Fortran 08 (compiled with GFortran) to create nested parallel regions (in my case, a 3-level nest). Before running my executable, I set some Linux environment variables to ...
Frank's user avatar
  • 11
1 vote
0 answers
94 views

I wrote these two programs to test OMP PARALLEL: 1: my_idx is declared at the beginning of the program, and afterwards a private copy is made for each thread. program my_pgm use omp_lib implicit ...
Stef1611's user avatar
  • 2,525
0 votes
0 answers
46 views

I wrote the following program : program my_pgm use omp_lib implicit none CALL OMP_set_dynamic(.FALSE.) CALL OMP_set_num_threads(2) !$OMP PARALLEL DEFAULT(SHARED) ...
Stef1611's user avatar
  • 2,525
1 vote
1 answer
116 views

The task was to implement various matrix multiplication algorithms using OpenMP. It turned out that with num_threads(1), the program runs faster than with any other number of threads. Is this due to ...
Илья Анненков's user avatar
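As a point of comparison, a naive parallel matrix multiplication sketch (not the asker's code): for small matrices, thread start-up cost and memory bandwidth can dominate, which is one common reason num_threads(1) comes out ahead.

#include <vector>

void matmul(const std::vector<double>& A, const std::vector<double>& B,
            std::vector<double>& C, int n) {
    // collapse(2) distributes the whole i/j iteration space over the team
    #pragma omp parallel for collapse(2)
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) {
            double acc = 0.0;
            for (int k = 0; k < n; ++k)
                acc += A[i * n + k] * B[k * n + j];
            C[i * n + j] = acc;
        }
}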
2 votes
2 answers
97 views

I am writing some code in C in which I want to add the optional ability to have certain sections of the code accelerated using OpenMP, and with an additional optional ability to have them accelerated ...
Matthew G.'s user avatar
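A common pattern for making OpenMP acceleration optional (a sketch with a placeholder function and data): the pragma is ignored when the compiler is not invoked with OpenMP support, and the _OPENMP macro can guard any calls into the runtime library.

#ifdef _OPENMP
#include <omp.h>
#endif

// Without OpenMP support the pragma is ignored and the loop runs
// serially; the same source builds with or without -fopenmp.
void scale(double* v, int n, double f) {
    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
        v[i] *= f;
}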
4 votes
4 answers
428 views

I implemented a simple 2d separable convolution code: void SeparableConvolution2D(const float * restrict mI, float * restrict mO, float * restrict mB, int numRows, int numCols, const float * restrict ...
Royi's user avatar
  • 5,133
0 votes
1 answer
106 views

The following code gives multiple outputs Sum of first 100000 natural numbers: 5000050000 Sum of first 100000 natural numbers: 5000050000 Sum of first 100000 natural numbers: 5000050000 Sum of ...
Anjeneya Swami Kare's user avatar
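Seeing the same line printed several times usually means the print statement sits inside the parallel region and is executed by every thread. A minimal sketch, assuming that is the situation here, that computes the sum with a reduction and prints it once:

#include <cstdio>

int main() {
    long long sum = 0;
    #pragma omp parallel for reduction(+:sum)
    for (long long i = 1; i <= 100000; ++i)
        sum += i;
    // outside the parallel region (or inside an 'omp single' block)
    // the line is printed exactly once
    std::printf("Sum of first 100000 natural numbers: %lld\n", sum);
    return 0;
}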
2 votes
1 answer
134 views

I'm trying to perform an std::vector sum reduction with an OpenMP reduction declaration for it: // g++ -fopenmp MRE.cpp -o MRE #include <vector> #include <algorithm> #include <omp.h> ...
Stefan de Souza's user avatar
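For reference, the usual declare-reduction pattern for element-wise summation of std::vector (the identifier vec_plus and the vector length are illustrative):

#include <vector>
#include <algorithm>
#include <functional>

// combiner adds omp_in into omp_out element-wise; the initializer gives
// each private copy the same length as the original, zero-filled
#pragma omp declare reduction(vec_plus : std::vector<double> : \
        std::transform(omp_out.begin(), omp_out.end(), omp_in.begin(), \
                       omp_out.begin(), std::plus<double>())) \
    initializer(omp_priv = decltype(omp_orig)(omp_orig.size()))

int main() {
    std::vector<double> acc(8, 0.0);
    #pragma omp parallel for reduction(vec_plus : acc)
    for (int i = 0; i < 1000; ++i)
        acc[i % 8] += 1.0;
    return 0;
}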
0 votes
1 answer
53 views

I am getting the following results in roughly 20% of the executions with the given code, and the lastprivate clause is not working as expected. Please note that the end value of b should be 50; however, at times it is not ...
Muhammad Arshad Islam's user avatar
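For comparison, a minimal stand-alone lastprivate example (not the asker's code): after the loop, b must hold the value written by the sequentially last iteration, i.e. 50 here.

#include <cstdio>

int main() {
    int b = 0;
    // lastprivate(b): whichever thread executes the last iteration
    // (i == 49) copies its private b back to the original variable
    #pragma omp parallel for lastprivate(b)
    for (int i = 0; i < 50; ++i)
        b = i + 1;
    std::printf("b = %d\n", b);   // expected: 50
    return 0;
}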
1 vote
0 answers
81 views

Good morning, I was trying to parallelize a loop with OMP in a program I created for a flood simulator with the goal of achieving a performance boost. The loop I want to parallelize is as follows: #...
NaeSs's user avatar
  • 11
1 vote
3 answers
173 views

I'm exploring OpenMP in Fortran, using the Mandelbrot algorithm as an example: !$omp parallel do reduction(+:iters) do iy=0,30 do ix=0,70 c = cmplx(xmin+stepx*ix, ymin+stepy*iy, qp) ...
Raf's user avatar
  • 1,789
0 votes
1 answer
102 views

I'm trying to write to file a certain value, calculated with some parallelized operations using Eigen, in particular with the flag -fopenmp. I've written a code that works perfectly with -g -o, but I'...
Fra_liturri's user avatar
6 votes
3 answers
254 views

I want to use the inclusive scan operation in OpenMP to implement an algorithm. What follows is a description of my attempt at doing so, and failing to get more than a tepid speedup. The inclusive ...
smilingbuddha's user avatar
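The OpenMP 5.0 scan construct expresses an inclusive prefix sum directly; a minimal sketch (array contents and sizes are arbitrary):

#include <vector>
#include <cstdio>

int main() {
    const int n = 1000;
    std::vector<double> a(n, 1.0), prefix(n);
    double sum = 0.0;
    // reduction(inscan, ...) plus the scan directive: the statement
    // before the directive feeds the scan, the statement after it
    // reads the inclusive prefix for iteration i
    #pragma omp parallel for reduction(inscan, +:sum)
    for (int i = 0; i < n; ++i) {
        sum += a[i];
        #pragma omp scan inclusive(sum)
        prefix[i] = sum;
    }
    std::printf("prefix[n-1] = %g\n", prefix[n - 1]);
    return 0;
}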
1 vote
1 answer
66 views

Leaving out non-essential code: double* max_of_two(double *r,double *n); int npoints = 1000000; double* xy = (double*)malloc(2*npoints*sizeof(double)); double zero_coord[2] = {0.0,0.0}; double ...
Victor Eijkhout's user avatar
0 votes
1 answer
42 views

In OpenMP specification 6.0, the barrier construct "specifies an explicit barrier at the point at which the construct appears", and "all threads of the team that executes that binding ...
Ke Du's user avatar
  • 1
