6,075 questions
7
votes
1
answer
154
views
Java Map.computeIfAbsent consuming high memory [duplicate]
I wrote a method to update a HashMap. Given an int id value (e.g. a client id) as the key, the method checks whether the key is present in the map; if not, it creates an entry and associates a ...
3
votes
1
answer
214
views
Why does sequential array access have a high cache miss rate?
I have the following C code that I am testing to understand perf and caching. It sequentially accesses an array of doubles.
// test.c
#include <stdio.h>
#include <stdlib.h>
#include <...
2
votes
1
answer
78
views
Find out why program is slow processing files from network share using Valgrind
I have an open source C/C++ program on Linux amd64 that processes a PDF input file. I did not write it myself, so I'm not familiar with its code.
Processing a PDF file read from local disk ...
1
vote
1
answer
80
views
Perf callgraph output doesn't look as I would expect for a test program with a delay loop that should take near 100% of the time
I'm experimenting with perf record --control to profile select sections of a program.
Here's a Rust program that uses perf to profile the call to a function waste_time():
use libc;
use log::info;
use ...
1
vote
0
answers
186
views
Why does “Command Buffer Full” appear in PyTorch CUDA kernel launches?
I’m using the PyTorch profiler to analyze sglang, and I noticed that in the CUDA timeline, some kernels show “Command Buffer Full”. This causes the cudaLaunchKernel time to become very long, as shown ...
0
votes
0
answers
36
views
Scalene wsl no web UI
I'm trying to profile a Python FastAPI application (which uses LangGraph) using Scalene on Windows. Since Scalene's Windows version doesn't support multithreading, I'm running it in WSL instead.
When ...
2
votes
0
answers
63
views
Recovering a perf.data file with size field 0 after perf report terminated improperly
I had a multi-process application to profile using perf with the following command:
sudo perf record -a -g -F 99 -e cycles:u -- sleep 50000 &
The sleep time is over 13 hours. The program should ...
1
vote
0
answers
85
views
Fastest way to convert a float array to a string in Python
This question came up while I was saving a large number of model-inferred embeddings to plain text. To do so, I needed to convert lists of float embeddings into strings, and I found this conversion to ...
1
vote
0
answers
34
views
Tracking Per Channel Memory Traffic in AMD Zen 2 (Rome)
I am using perf to profile workloads on my system, and I need to track the memory traffic generated by my workload on each NUMA node. Currently, I only have perf results for LLC cache misses, which ...
1
vote
1
answer
51
views
How to create an HTML file with a link that automatically opens chrome://tracing with a particular JSON file?
I have a JSON file that contains profiling data that can be opened with Chrome's trace-viewer.
I can do it manually by opening chrome://tracing, then selecting 'load' and then loading the JSON file.
...
0
votes
0
answers
108
views
RISC-V vs C Code Comparison for Simple Multiply and Accumulate (MAC) Operation
We tried profiling a simple MAC operation using both RISC-V Vector (RVV) intrinsics and plain C code. Surprisingly, the C version performs better, even though the intrinsics code processes 16 ...
0
votes
0
answers
100
views
How can I filter by module in the heaptrack_gui profiler so that the whole GUI shows only my module?
I have only just started using heaptrack and cannot work out how to filter by module. It is possible to do this from the GUI,
but the output is very noisy and this filter doesn't affect the other tabs. Does there exist ...
1
vote
1
answer
63
views
CPU Sampling/Profiling of Helidon app in VisualVM
I have a Helidon app and would like to take CPU samples and/or start a CPU profiler. This does not work.
With the same setup, it works for a simple (non Helidon) app
Trying to start the CPU (and also ...
1
vote
1
answer
31
views
jax.numpy profiling: time spent in "ufunc_api.py:173(__call__)"
I am analyzing my numpy/python code by running it with "-m cProfile". Snakeviz shows as the entry with most time spent:
20895038 calls to ufunc_api.py:173(__call__) with the majority of the ...
1
vote
0
answers
176
views
What's the `perf stat` equivalent for macOS?
On Linux, I often find myself perusing perf stat to figure out whether a code change improved things like cache miss rate. (I'm specifically interested in cache miss rates and page faults.)
Now I'm ...
1
vote
1
answer
175
views
Why is EmojiCompat consuming significant retained memory in my Flutter Android app without explicit usage?
I'm developing a Flutter application that doesn't utilize emojis in any part of the UI or logic.
However, upon profiling the app using Android Studio's Memory Profiler, I observed that androidx.emoji2....
1
vote
0
answers
70
views
Understanding Nsight Graphics output
I am trying to find bottlenecks in some shaders through NVIDIA Nsight Graphics.
Right now I am focusing on trying to understand one result that seems impossible. The profiling UI shows that on each ...
1
vote
1
answer
55
views
How to exclude a function from Gprof activity during running
Suppose I have a function foo:
void foo()
{
//do something
}
This function foo is called by other functions defined in other files. If gprof is enabled, it would do profiling activity and subroutines ...
0
votes
0
answers
18
views
dtrace only samples `kernel_task`
I'm trying to use dtrace to generate flamegraphs on macOS. I've had quite a few problems, and although I've pinned down the root cause, I still don't know how to solve it.
Adding a bit of logging to my ...
0
votes
0
answers
64
views
Getting errors when starting dotnet-dsrouter
I am in a powershell window on my PC and I am attempting to run 'dotnet-dsrouter android' in order to set up profiling of my app on my android device.
After hitting enter, I am seeing several errors ...
0
votes
0
answers
86
views
Why doesn't dtrace produce symbols?
I have been trying to do some profiling on macOS with dtrace, with the output to be used for a flamegraph. I found a helpful guide here.
Despite following the guide, and reading another ...
1
vote
0
answers
32
views
Starting the Flutter app in profile mode with predefined Dart VM url
I would like to analyze the CPU performance of my Flutter app on app launch using Dart DevTools. Currently I only get access to the Dart VM Service url (e.g. http://127.0.0.1:51378/36IZvt4H1b0=/) once ...
2
votes
1
answer
72
views
Empty profile with -p on gcc [closed]
I have two projects in C. Both are configured with CMake using similar CMakeLists.txt files. The first generates a gmon.out file normally, and it can be viewed with gprof. The second generates a nearly empty gmon....
1
vote
0
answers
86
views
Profile C# Source Generators?
I have a bunch of source generators and they are all doing stuff in the background but as the codebase they operate on grows I want to make sure that they are still scaling sensibly.
According to this ...
0
votes
0
answers
74
views
0x, the flame graph generator tool for node, is missing a lot of data in the generated flame graph
I'm trying to profile my heap implementation, so I've created this script:
class BinaryHeap {
array;
comparator;
constructor(comparator) {
this.array = [];
this.comparator ...
0
votes
1
answer
76
views
SAS log cpu/real time display format
In my SAS (version 9.4M8 for Windows) log, time elapsed (CPU time and real time) is by default displayed as x.xx seconds, i.e. to a precision of 10 milliseconds. In order to profile a program ...
0
votes
1
answer
84
views
How to profile/monitor a KDB tickerplant to trace causes of a slow tickerplant?
I'm trying to use KDB as a low-latency pub/sub message broker that persists all messages in a queryable format.
However, I'm noticing the latency from when the tickerplant receives a message (i.e. ...
1
vote
0
answers
71
views
AWS ECS: runc create failed: unable to start container process: error during container init: open /proc/sys/net/ipv4/
I'm trying to implement continuous profiling for our microservices running on ECS with Amazon Linux 2 hosts, but I'm running into persistent issues when trying to run profiling agents. I've tried ...
1
vote
0
answers
52
views
Confusion about memory display in VMMap and Task Manager
To verify that the Committed Size may be smaller than the Working Set, I ran an experiment:
First, I built a DLL project whose size after compilation is about 60 MB, then renamed this DLL into 5 ...
2
votes
0
answers
69
views
Why does perf complain that it cannot open this L1 cache event on Zen 2?
I am trying to read cache events on an AMD Zen 2:
L1d all read accesses
L1d all write accesses
L1d read misses (not shown below)
L1d write misses (not shown below)
According to the perf_event_open(2) ...
2
votes
0
answers
89
views
How do I profile the inside of a CUDA kernel?
I have a really big CUDA kernel which does a lot of stuff, like:
__global__ void bigkernel(args)
{
func1();
func2();
func3();
func4();
func5();
....
}
I want to profile each one of those functions and ...
-1
votes
1
answer
72
views
How to profile startup of ASP.NET Core app locally?
I have an app that takes ~12 seconds to start and another 10 seconds to start accepting requests. I would like to make this faster, specifically for members of my team. I would be satisfied with ~3 ...
1
vote
1
answer
643
views
cpu_core vs cpu_atom in perf
I'm constructing an example that shows the effect of branch mispredictions. When using perf stat, I get the following results:
Here, I can see some metrics counted twice, once for cpu_atom, and once ...
1
vote
0
answers
114
views
Radeon Developer Panel does not detect application
Very similar to this issue.
I want to test the AMD profiling "Radeon Developer Panel" tool (v3.2.0.18) using the Vulkan vkcube test application. The connection is established (green light), ...
0
votes
1
answer
28
views
Meaning of cProfile Python output: myprogram.py:1(<module>)
I've been trying to use cProfile to gauge any bottlenecks in my Python program. I've discovered that all but one of the function calls take about 4 seconds in total, however by sorting the results by ...
2
votes
0
answers
48
views
Profile-Guided-Optimizations by GCC - Profile File Analysis
I am working with Profile-Guided Optimizations (PGO) with GCC, using -fprofile-generate to collect execution data and -fprofile-use to optimize the executable based on the collected profile (.gcda ...
0
votes
0
answers
32
views
Is there a way to get the directory information from a pstats.Stats object?
I am using pstats.Stats() for profiling purposes and want to keep only the statistics from a particular directory. Is there any way I can get the actual stats information from the Stats object to ...
1
vote
0
answers
107
views
Profiling of import times in NodeJS application
Is there a way to track the duration of file imports in a Node.js application?
We have a TypeScript application running in VSCode, with the target set to ES2020 and the module format set to ESNext.
...
1
vote
1
answer
74
views
C++ Profiling - Called method from coroutine function has a higher hit count than its caller
I am profiling some code using the cppgraphqlgen library, which uses C++20 coroutines extensively in its internals.
I have profiled an application and found that I have some called-into methods that ...
0
votes
0
answers
26
views
How can I get a FlameGraph for a custom unikernel running in a same-arch QEMU VM?
I would like to profile a unikernel I am developing. This is an x86_64 unikernel, written in Rust, which runs on an x86_64 host using QEMU to manage the KVM VM. I have enabled frame pointers.
I don't ...
1
vote
1
answer
45
views
How can I alter the polling interval when using node --prof?
I have a node program I'd like to profile. If I use --prof I can get a profile output (isolate-xxx) that I can process with node --prof-process. Unfortunately, since the program is fairly quick (...
0
votes
1
answer
84
views
Assessing the contribution of communication to the runtime of an MPI program
Background
Let's say I have a complex MPI program with multiple message passing events and computations. The communication pattern is that of bidirectional ring messaging as shown in the figure below.
...
2
votes
0
answers
182
views
CPU usage percentage is not showing in Android Studio's live telemetry profiling
I tried to profile my app using Live Telemetry Profiler on Android Studio to see the CPU usage percentage. In previous Android Studio versions, I believe we could see it by pointing to the CPU usage ...
0
votes
0
answers
128
views
How can I group PyTorch Profiler events by layer hierarchy when profiling a Hugging Face Transformer?
I'm using PyTorch Profiler to inspect inference performance on a Hugging Face Transformer (e.g., Qwen model). I have code that successfully captures operator-level profiling information (like aten::mm,...
2
votes
0
answers
129
views
Matrix multiply fastest with -O0 [duplicate]
I timed a fairly naive BLAS-like matrix multiplication (DGEMM) function:
void dgemm_naive(const int M, const int N, const int K, const double alpha,
const double *A, const int lda, ...
0
votes
0
answers
116
views
Can Apple Silicon GPU / MPS profiles be viewed in Chrome or another profile viewer (not Xcode)?
I am trying to view MPS profiles in software other than Xcode, which requires an Apple account.
Most other hardware (CPU, XPU, NVIDIA GPUs, etc.) allows easy profiling in torch.
Can the profiles generated ...
1
vote
0
answers
57
views
.NET 8: Developer Verification Error During App Store Review
I’m developing an app on .NET 8 for macOS and encountered an issue during App Review, with this feedback:
"An error showed upon launch. The app cannot be opened because the developer cannot be verified. ...
1
vote
1
answer
105
views
Is it possible to profile a .NET child process using Visual Studio's Profiling Tools?
I have a program that launches a child process where the code I want to profile is running.
While I can run the Debug > Performance Profiler > CPU Usage on the parent process, I can't seem to ...
1
vote
0
answers
79
views
Profiling of the Llama 3.1 8B model on an AI accelerator
I have the profiling results for inference with Meta's Llama 3.1 8B model. I deployed the model on the AI Accelerator. I managed to create a memory trace of the whole model from the Host to the ...
0
votes
1
answer
151
views
Sentry.getCurrentScope().setTransactionName(name) not having any effect
I am using Sentry for node.js. In that context, I have an express route (let's say /myRoute) that has two main branches within it, and the branch chosen to traverse is based on what is passed into ...