7 votes
1 answer
154 views

I had written a method to update a HashMap: given an int id value (e.g. a client id) as the key, the method would check whether the key is present in the map and, if not, create an entry and associate a ...
Gautham M • 5,174
3 votes
1 answer
214 views

I have the following C code that I am testing to understand perf and caching. It sequentially accesses an array of doubles. // test.c #include <stdio.h> #include <stdlib.h> #include <...
user180574 • 6,244
2 votes
1 answer
78 views

I have an open source C/C++ program on Linux amd64 that processes a PDF input file. I did not write it myself, so I'm not familiar with its code. Processing a PDF file read from local disk ...
MrSnrub • 1,265
1 vote
1 answer
80 views

I'm experimenting with perf record --control to profile select sections of a program. Here's a Rust program that uses perf to profile the call to a function waste_time(): use libc; use log::info; use ...
Edd Barrett • 3,685
1 vote
0 answers
186 views

I’m using the PyTorch profiler to analyze sglang, and I noticed that in the CUDA timeline, some kernels show “Command Buffer Full”. This causes the cudaLaunchKernel time to become very long, as shown ...
plznobug • 143
0 votes
0 answers
36 views

I'm trying to profile a Python FastAPI application (which uses LangGraph) using Scalene on Windows. Since Scalene's Windows version doesn't support multithreading, I'm running it in WSL instead. When ...
Raffa50 • 23
2 votes
0 answers
63 views

I had a multi-process application to profile using perf with the following command: sudo perf record -a -g -F 99 -e cycles:u -- sleep 50000 & The sleep time is over 13 hours. The program should ...
Bartłomiej Dudek
1 vote
0 answers
85 views

This question came up while I was saving a large number of model-inferred embeddings to plain text. To do so, I needed to convert lists of float embeddings into strings, and I found this conversion to ...
K_Augus • 474
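A minimal timing harness for the float-to-string conversion described above, assuming each embedding is a plain Python list of floats and that a space-separated line is an acceptable output format (both assumptions; the helper names to_line_repr and to_line_fixed are hypothetical). It compares str(), which round-trips each float exactly, against a fixed number of significant digits, which is shorter but lossy.

```python
import timeit


def to_line_repr(embedding):
    # Hypothetical helper: str() yields the shortest text that round-trips each float exactly
    return " ".join(map(str, embedding))


def to_line_fixed(embedding, precision=6):
    # Hypothetical helper: fixed significant digits, shorter output but lossy beyond `precision`
    return " ".join(format(x, f".{precision}g") for x in embedding)


if __name__ == "__main__":
    sample = [i / 7 for i in range(768)]  # stand-in vector; real dimension is an assumption
    for fn in (to_line_repr, to_line_fixed):
        seconds = timeit.timeit(lambda: fn(sample), number=1000)
        print(f"{fn.__name__}: {seconds:.3f} s for 1000 vectors")
```

Which variant wins depends on the Python version and the vector sizes, so it is worth measuring on the real data rather than assuming.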
1 vote
0 answers
34 views

I am using perf to profile workloads on my system, and I need to track the memory traffic generated by my workload on each NUMA node. Currently, I only have perf results for LLC cache misses, which ...
smz • 515
1 vote
1 answer
51 views

I have a JSON file that contains profiling data that can be opened with Chrome's trace-viewer. I can do it manually by opening chrome://tracing, then selecting 'load' and then loading the JSON file. ...
Crumml • 81
0 votes
0 answers
108 views

We tried profiling a simple MAC operation using both RISC-V Vector (RVV) intrinsics and plain C code. Surprisingly, the C version performs better, even though the intrinsics code processes 16 ...
shreyas
0 votes
0 answers
100 views

I have only just started using heaptrack and cannot set up filtering by module. It is possible to do this from the GUI, like this Heap track, but the output is very noisy and this filter doesn't affect the other tabs. Is there ...
Александр Чулгарев
1 vote
1 answer
63 views

I have a Helidon app and would like to take CPU samples and/or start a CPU profiler. This does not work, although with the same setup it works for a simple (non-Helidon) app. Trying to start the CPU (and also ...
Itchy • 2,464
1 vote
1 answer
31 views

I am analyzing my numpy/python code by running it with "-m cProfile". Snakeviz shows as the entry with most time spent: 20895038 calls to ufunc_api.py:173(__call__) with the majority of the ...
j13r • 2,741
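When a single entry such as ufunc_api.py:173(__call__) dominates, pstats can report which functions call it and how often. A minimal sketch with hypothetical stand-in functions inner and outer; against the real profile, a restriction such as r"ufunc_api\.py:173" would be the analogous pattern (an assumption about how the entry is labelled).

```python
import cProfile
import io
import pstats


def inner(x):
    # Hypothetical hot function, standing in for the heavily called entry
    return x * x


def outer(n):
    # Hypothetical caller; inner() is actually invoked from the generator expression
    return sum(inner(i) for i in range(n))


def callers_report(pattern, func, *args):
    """Profile func(*args) and return pstats' 'was called by...' report for matching entries."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args)
    profiler.disable()
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    # A string restriction is a regex searched in "filename:lineno(function)"
    stats.sort_stats(pstats.SortKey.CUMULATIVE).print_callers(pattern)
    return buf.getvalue()


if __name__ == "__main__":
    print(callers_report(r"\(inner\)", outer, 1000))
```

If the profile was saved for Snakeviz with python -m cProfile -o <file>, pstats.Stats(<file>) loads the same data, so the print_callers call can be run on the existing output instead of re-profiling.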
1 vote
0 answers
176 views

On Linux, I often find myself perusing perf stat to figure out whether a code change improved things like cache miss rate. (I'm specifically interested in cache miss rates and page faults.) Now I'm ...
Marcus Müller
1 vote
1 answer
175 views

I'm developing a Flutter application that doesn't utilize emojis in any part of the UI or logic. However, upon profiling the app using Android Studio's Memory Profiler, I observed that androidx.emoji2....
Filip Golovic
1 vote
0 answers
70 views

I am trying to find bottlenecks in some shaders through NVIDIA Nsight Graphics. Right now I am focusing on trying to understand one result that seems impossible. The profiling UI shows that on each ...
Makogan • 10k
1 vote
1 answer
55 views

Suppose I have a function foo: void foo() { //do something } This function foo is called by other functions defined in other files. If gprof is enabled, it would do profiling activity and subroutines ...
surajrgupta
0 votes
0 answers
18 views

I'm trying to use dtrace to generate flamegraphs on MacOS. I've had quite a few problems but I've pinned down the root cause, though I still don't know how to solve it. Adding a bit of logging to my ...
ora • 1
0 votes
0 answers
64 views

I am in a PowerShell window on my PC and I am attempting to run 'dotnet-dsrouter android' in order to set up profiling of my app on my Android device. After hitting Enter, I am seeing several errors ...
George M Ceaser Jr
0 votes
0 answers
86 views

I have been trying to do some profiling on MacOS with dtrace, which will be used for a flamegraph. There's a guide I found helpful here. Despite following the guide, and reading another ...
ora • 1
1 vote
0 answers
32 views

I would like to analyze the CPU performance of my Flutter app on app launch using Dart DevTools. Currently I only get access to the Dart VM Service url (e.g. http://127.0.0.1:51378/36IZvt4H1b0=/) once ...
Dominik Roszkowski
2 votes
1 answer
72 views

I have 2 projects in C. They are both configured with CMake using similar CMakeLists.txt files. The first generates a gmon.out file normally, which can be read with gprof. The second generates an almost empty gmon....
Ilya Babakov
1 vote
0 answers
86 views

I have a bunch of source generators and they are all doing stuff in the background but as the codebase they operate on grows I want to make sure that they are still scaling sensibly. According to this ...
user3797758 • 1,113
0 votes
0 answers
74 views

I'm trying to profile my heap implementation, so I've created this script: class BinaryHeap { array; comparator; constructor(comparator) { this.array = []; this.comparator ...
Eric B • 373
0 votes
1 answer
76 views

In my SAS (version 9.4M8 for Windows) log, time elapsed (CPU time and real time) is by default displayed as x.xx seconds, i.e. to a precision of 10 milliseconds. In order to profile a program ...
h_bauer • 33
0 votes
1 answer
84 views

I'm trying to use KDB as a low-latency pub/sub message broker that persists all messages in a queryable format. However, I'm noticing the latency from when the tickerplant receives a message (i.e. ...
mchen • 10.3k
1 vote
0 answers
71 views

I'm trying to implement continuous profiling for our microservices running on ECS with Amazon Linux 2 hosts, but I'm running into persistent issues when trying to run profiling agents. I've tried ...
Byron Martinez
1 vote
0 answers
52 views

In order to verify that the Committed Size may be smaller than the Working Set, I did an experiment: first, I built a DLL project whose size after compilation is about 60 MB, then renamed this DLL into 5 ...
Nick • 11
2 votes
0 answers
69 views

I am trying to read cache events on an AMD Zen 2: L1d all read accesses, L1d all write accesses, L1d read misses (not shown below), and L1d write misses (not shown below). According to the perf_event_open(2) ...
onlycparra
2 votes
0 answers
89 views

I have a really big CUDA kernel which does a lot of work, for example: __global__ void bigkernel(args) { func1(); func2(); func3(); func4(); func5(); .... } I want to profile each one of those functions and ...
Rageristic
-1 votes
1 answer
72 views

I have this app which takes ~12 seconds to start and another 10 seconds to start accepting requests, which I would like to make faster, specifically for members of my team. I would be satisfied with ~3 ...
vmachacek • 583
1 vote
1 answer
643 views

I'm constructing an example that shows the effect of branch mispredictions. When using perf stat, I get the following results: Here, I can see some metrics counted twice, once for cpu_atom, and once ...
Osama Ahmad • 2,368
1 vote
0 answers
114 views

Very similar to this issue. I want to test the AMD profiling "Radeon Developer Panel" tool (v3.2.0.18) using the Vulkan vkcube test application. The connection is established (green light), ...
unvarnished
0 votes
1 answer
28 views

I've been trying to use cProfile to gauge any bottlenecks in my Python program. I've discovered that all but one of the function calls take about 4 seconds in total; however, by sorting the results by ...
Sofie Thomsen
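Sorting cProfile output by tottime and by cumulative time answers different questions: tottime is the time spent in a function's own body, while cumtime also includes everything it calls. A minimal sketch with hypothetical functions slow_leaf and wrapper, where the work sits in slow_leaf, so wrapper ranks high by cumtime but near zero by tottime.

```python
import cProfile
import io
import pstats


def slow_leaf(n=200_000):
    # Hypothetical hot spot: a pure-Python loop, so the time is charged to slow_leaf's own tottime
    total = 0
    for i in range(n):
        total += i * i
    return total


def wrapper():
    # Hypothetical caller: its cumtime includes slow_leaf, its own tottime stays near zero
    return slow_leaf()


def top_by(sort_key, func, limit=5):
    """Profile func() and return the top `limit` rows sorted by `sort_key`."""
    profiler = cProfile.Profile()
    profiler.runcall(func)
    buf = io.StringIO()
    pstats.Stats(profiler, stream=buf).sort_stats(sort_key).print_stats(limit)
    return buf.getvalue()


if __name__ == "__main__":
    print(top_by(pstats.SortKey.TIME, wrapper))        # own time (tottime)
    print(top_by(pstats.SortKey.CUMULATIVE, wrapper))  # including callees (cumtime)
```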
2 votes
0 answers
48 views

I am working with Profile-Guided Optimizations (PGO) with GCC, using -fprofile-generate to collect execution data and -fprofile-use to optimize your executable based on the collected profile (.gcda ...
Soma Pal
0 votes
0 answers
32 views

I am using pstats.Stats() for profiling purposes, I want to filter out statistics from a particular directory only, is there any way I can get the actual stats information from the Stats object to ...
Sanjith Kumar
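A minimal sketch of pulling the raw numbers out of a Stats object and keeping only functions defined under one directory. It relies on Stats.stats, a dict keyed by (filename, lineno, funcname) that CPython's pstats populates but does not document as public API, so treat that as an assumption; stats_under is a hypothetical helper name.

```python
import cProfile
import json
import os
import pstats


def stats_under(stats, directory):
    """Return {(filename, lineno, funcname): (cc, nc, tt, ct)} for functions defined under directory.

    Assumption: relies on the undocumented Stats.stats attribute, whose values are
    (primitive calls, total calls, tottime, cumtime, callers).
    """
    root = os.path.abspath(directory) + os.sep
    selected = {}
    for (filename, lineno, funcname), (cc, nc, tt, ct, _callers) in stats.stats.items():
        if filename == "~" or filename.startswith("<"):
            continue  # built-ins and synthetic/frozen code objects have no real file
        if os.path.abspath(filename).startswith(root):
            selected[(filename, lineno, funcname)] = (cc, nc, tt, ct)
    return selected


if __name__ == "__main__":
    profiler = cProfile.Profile()
    profiler.runcall(json.dumps, {"a": [1, 2, 3]})
    json_dir = os.path.dirname(json.__file__)  # keep only frames from the json package
    for key, (cc, nc, tt, ct) in stats_under(pstats.Stats(profiler), json_dir).items():
        print(key, f"calls={nc}", f"cumtime={ct:.6f}")
```

If printed output is enough, the documented route is a restriction: print_stats() treats a string argument as a regular expression searched in "filename:lineno(function)", so passing re.escape() of a directory path limits the listing to that directory.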
1 vote
0 answers
107 views

Is there a way to track the duration of file imports in a Node.js application? We have a TypeScript application running in VSCode, with the target set to ES2020 and the module format set to ESNext. ...
Lucky Degen
1 vote
1 answer
74 views

I am profiling some code using the cppgraphqlgen library, which uses C++20 coroutines extensively in its internals. I have profiled an application and found that I have some called-into methods that ...
Andrew Lipscomb
0 votes
0 answers
26 views

I would like to profile a unikernel I am developing. This is an x86_64 unikernel, written in Rust, which runs on an x86_64 host using QEMU to manage the KVM VM. I have enabled frame pointers. I don't ...
Ferdia McKeogh
1 vote
1 answer
45 views

I have a node program I'd like to profile. If I use --prof I can get a profile output (isolate-xxx) that I can process with node --prof-process. Unfortunately, since the program is fairly quick (...
Richard Wheeldon
0 votes
1 answer
84 views

Background Let's say I have a complex MPI program with multiple message passing events and computations. The communication pattern is that of bidirectional ring messaging as shown in the figure below. ...
Nitin Malapally
2 votes
0 answers
182 views

I tried to profile my app using Live Telemetry Profiler on Android Studio to see the CPU usage percentage. In previous Android Studio versions, I believe we could see it by pointing to the CPU usage ...
Kharda • 1,388
0 votes
0 answers
128 views

I'm using PyTorch Profiler to inspect inference performance on a Hugging Face Transformer (e.g., Qwen model). I have code that successfully captures operator-level profiling information (like aten::mm,...
AlexL • 1
2 votes
0 answers
129 views

I timed a fairly naive BLAS-like matrix multiplication (DGEMM) function: void dgemm_naive(const int M, const int N, const int K, const double alpha, const double *A, const int lda, ...
ligro • 29
0 votes
0 answers
116 views

I am trying to view MPS profiles in software other than Xcode, since Xcode requires an Apple account. Most other hardware (CPU, XPU, NVIDIA GPUs, etc.) allows easy profiling in torch. Can the profiles generated ...
1 vote
0 answers
57 views

I’m developing an app on .NET 8 for macOS and encountered an issue during App Review with feedback: "An error showed upon launch. The app cannot be opened because the developer cannot be verified. ...
jaroslavic
1 vote
1 answer
105 views

I have a program that launches a child process where the code I want to profile is running. While I can run the Debug > Performance Profiler > CPU Usage on the parent process, I can't seem to ...
Russel85
1 vote
0 answers
79 views

I have the profiling results of the inference of Meta's Llama 3.1 8B model. I deployed the model on the AI Accelerator. I managed to create a memory trace of the whole model from the Host to the ...
Sudais Alam
0 votes
1 answer
151 views

I am using Sentry for node.js. In that context, I have an express route (let's say /myRoute) that has two main branches within it, and the branch chosen to traverse is based on what is passed into ...
drmrbrewer • 13.4k
