Newest 'profiling' Questions

7 votes

1 answer

154 views

Java Map.computeIfAbsent consuming high memory [duplicate]

I wrote a method to update a HashMap. Given an int id value (e.g. a client id) as key, the method checks whether the key is present in the map; if not, it creates an entry and associates a ...

Gautham M

5,174

asked yesterday

3 votes

1 answer

214 views

Why does sequential array access have a high cache miss rate?

I have the following C code that I am testing to understand perf and caching. It sequentially accesses an array of doubles. // test.c #include <stdio.h> #include <stdlib.h> #include <...

user180574

6,244

asked Nov 26 at 22:38

2 votes

1 answer

78 views

Find out why program is slow processing files from network share using Valgrind

I have an open source C/C++ program on Linux amd64 that processes a PDF input file. I did not write it myself, so I'm not familiar with its code. Processing a PDF file read from local disk ...

MrSnrub

1,265

asked Nov 8 at 20:49

1 vote

1 answer

80 views

Perf callgraph output doesn't look as I would expect for a test program with a delay loop that should take near 100% of the time

I'm experimenting with perf record --control to profile select sections of a program. Here's a Rust program that uses perf to profile the call to a function waste_time(): use libc; use log::info; use ...

Edd Barrett

3,685

asked Oct 31 at 15:31

1 vote

0 answers

186 views

Why does “Command Buffer Full” appear in PyTorch CUDA kernel launches?

I’m using the PyTorch profiler to analyze sglang, and I noticed that in the CUDA timeline, some kernels show “Command Buffer Full”. This causes the cudaLaunchKernel time to become very long, as shown ...

plznobug

143

asked Oct 23 at 12:36

0 votes

0 answers

36 views

Scalene wsl no web UI

I'm trying to profile a Python FastAPI application (which uses LangGraph) using Scalene on Windows. Since Scalene's Windows version doesn't support multithreading, I'm running it in WSL instead. When ...

Raffa50

23

asked Oct 10 at 6:52

2 votes

0 answers

63 views

Recovering a perf.data file with size field 0 after perf report terminated improperly

I had a multi-process application to profile using perf with the following command: sudo perf record -a -g -F 99 -e cycles:u -- sleep 50000 & The sleep time is over 13 hours. The program should ...

Bartłomiej Dudek

21

asked Sep 16 at 8:39

1 vote

0 answers

85 views

Fastest way to convert float array to string in Python

This question came up while I was saving a large number of model-inferred embeddings to plain text. To do so, I needed to convert lists of float embeddings into strings, and I found this conversion to ...

K_Augus

474

asked Sep 12 at 10:14
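For the float-to-string conversion described above, a minimal baseline sketch may help frame the comparison. The function names `floats_to_line` and `floats_to_line_repr` are hypothetical, and nothing here claims to be the fastest approach; it only shows two common stdlib-only variants that could be timed against each other with `timeit`.

```python
def floats_to_line(values, precision=6):
    """Join floats into one space-separated line using a fixed number
    of significant digits (lossy, but compact)."""
    fmt = f"%.{precision}g"
    return " ".join([fmt % v for v in values])


def floats_to_line_repr(values):
    """Round-trip-safe variant: repr() yields the shortest string that
    parses back to exactly the same float."""
    return " ".join(map(repr, values))
```

The `%g` variant trades precision for shorter output, while the `repr` variant keeps every float recoverable; which one is faster on a given dataset is something to measure rather than assume.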

1 vote

0 answers

34 views

Tracking Per Channel Memory Traffic in AMD Zen 2 (Rome)

I am using perf to profile workloads on my system, and I need to track the memory traffic generated by my workload on each NUMA node. Currently, I only have perf results for LLC cache misses, which ...

smz

515

asked Aug 20 at 19:51

1 vote

1 answer

51 views

How to create an HTML file with a link that automatically opens chrome://tracing with a particular JSON file?

I have a JSON file that contains profiling data that can be opened with Chrome's trace-viewer. I can do it manually by opening chrome://tracing, then selecting 'load' and then loading the JSON file. ...

Crumml

81

asked Jul 12 at 19:48

0 votes

0 answers

108 views

RISC-V vs C Code Comparison for Simple Multiply and Accumulate (MAC) Operation

We tried profiling a simple MAC operation using both RISC-V Vector (RVV) intrinsics and plain C code. Surprisingly, the C version performs better, even though the intrinsics code processes 16 ...

shreyas

1

asked Jul 7 at 11:53

0 votes

0 answers

100 views

How to set a module filter in the heaptrack_gui profiler so that the whole GUI shows only my module?

I have only just started using heaptrack and cannot set up filtering by module. It is possible to do from the GUI like this (Heap track), but the output is very noisy and this filter doesn't affect the other tabs. Does there exist ...

Александр Чулгарев

1

asked Jun 30 at 13:44

1 vote

1 answer

63 views

CPU Sampling/Profiling of Helidon app in VisualVM

I have a Helidon app and would like to take CPU samples and/or start a CPU profiler. This does not work. With the same setup, it works for a simple (non-Helidon) app. Trying to start the CPU (and also ...

Itchy

2,464

asked Jun 23 at 8:52

1 vote

1 answer

31 views

jax.numpy profiling: time spent in "ufunc_api.py:173(__call__)"

I am analyzing my numpy/python code by running it with "-m cProfile". Snakeviz shows as the entry with most time spent: 20895038 calls to ufunc_api.py:173(__call__) with the majority of the ...

j13r

2,741

asked Jun 10 at 9:30

1 vote

0 answers

176 views

What's the `perf stat` equivalent for MacOS?

On Linux, I often find myself perusing perf stat to figure out whether a code change improved things like cache miss rate. (I'm specifically interested in cache miss rates and page faults.) Now I'm ...

Marcus Müller

36.9k

asked Jun 1 at 21:59

1 vote

1 answer

175 views

Why is EmojiCompat consuming significant retained memory in my Flutter Android app without explicit usage?

I'm developing a Flutter application that doesn't utilize emojis in any part of the UI or logic. However, upon profiling the app using Android Studio's Memory Profiler, I observed that androidx.emoji2....

Filip Golovic

41

asked May 29 at 9:00

1 vote

0 answers

70 views

Understanding Nsight Graphics output

I am trying to find bottlenecks in some shaders through NVIDIA Nsight Graphics. Right now I am focusing on trying to understand one result that seems impossible. The profiling UI shows that on each ...

Makogan

10k

asked May 18 at 23:02

1 vote

1 answer

55 views

How to exclude a function from Gprof activity during running

Suppose I have a function foo: void foo() { //do something } This function foo is called by other functions defined in other files. If gprof is enabled, it would do profiling activity and subroutines ...

surajrgupta

75

asked May 11 at 17:55

0 votes

0 answers

18 views

dtrace only samples `kernel_task`

I'm trying to use dtrace to generate flamegraphs on MacOS. I've had quite a few problems but I've pinned down the root cause, though I still don't know how to solve it. Adding a bit of logging to my ...

ora

1

asked May 9 at 10:51

0 votes

0 answers

64 views

Getting errors when starting dotnet-dsrouter

I am in a powershell window on my PC and I am attempting to run 'dotnet-dsrouter android' in order to set up profiling of my app on my android device. After hitting enter, I am seeing several errors ...

George M Ceaser Jr

1,831

asked May 8 at 20:54

0 votes

0 answers

86 views

Why doesn't dtrace produce symbols?

I have been trying to do some profiling on MacOS with dtrace, which will be used for a flamegraph. I found a helpful guide here. Despite following the guide, and reading another ...

ora

1

asked May 8 at 12:16

1 vote

0 answers

32 views

Starting the Flutter app in profile mode with predefined Dart VM url

I would like to analyze the CPU performance of my Flutter app on app launch using Dart DevTools. Currently I only get access to the Dart VM Service url (e.g. http://127.0.0.1:51378/36IZvt4H1b0=/) once ...

Dominik Roszkowski

2,608

asked May 7 at 7:26

2 votes

1 answer

72 views

Empty profile with -p on gcc [closed]

I have 2 projects in C. They are both configured with CMake using similar CMakeLists.txt files. The first generates a gmon.out file normally. It can be viewed with gprof. The second generates a nearly empty gmon....

Ilya Babakov

33

asked May 3 at 21:54

1 vote

0 answers

86 views

Profile C# Source Generators?

I have a bunch of source generators and they are all doing stuff in the background but as the codebase they operate on grows I want to make sure that they are still scaling sensibly. According to this ...

user3797758

1,113

asked Apr 25 at 13:05

0 votes

0 answers

74 views

0x, the flame graph generator tool for node, is missing a lot of data in the generated flame graph

I'm trying to profile my heap implementation, so I've created this script: class BinaryHeap { array; comparator; constructor(comparator) { this.array = []; this.comparator ...

Eric B

373

asked Apr 23 at 18:31

0 votes

1 answer

76 views

SAS log cpu/real time display format

In my SAS (version 9.4M8 for Windows) log, time elapsed (CPU time and real time) is by default displayed as x.xx seconds, i.e. to a precision of 10 milliseconds. In order to profile a program ...

h_bauer

33

asked Apr 19 at 14:30

0 votes

1 answer

84 views

How to profile/monitor a KDB tickerplant to trace causes of a slow tickerplant?

I'm trying to use KDB as a low-latency pub/sub message broker that persists all messages in a queryable format. However, I'm noticing the latency from when the tickerplant receives a message (i.e. ...

mchen

10.3k

asked Apr 15 at 14:51

1 vote

0 answers

71 views

AWS ECS: runc create failed: unable to start container process: error during container init: open /proc/sys/net/ipv4/

I'm trying to implement continuous profiling for our microservices running on ECS with Amazon Linux 2 hosts, but I'm running into persistent issues when trying to run profiling agents. I've tried ...

Byron Martinez

11

asked Apr 9 at 14:07

1 vote

0 answers

52 views

Confusion about memory display in VMMap and Task Manager

In order to verify that the Committed Size may be smaller than the Working Set, I did an experiment: First, a dll project, the size after compilation is about 60M, then rename this dll into 5 ...

Nick

11

asked Apr 6 at 3:12

2 votes

0 answers

69 views

Why does perf complain that it cannot open this L1 cache event on Zen 2?

I am trying to read cache events on an AMD Zen2: L1d all read accesses, L1d all write accesses, L1d read misses (not shown below), L1d write misses (not shown below). According to the perf_event_open(2) ...

onlycparra

845

asked Mar 20 at 5:02

2 votes

0 answers

89 views

How do I profile the inside of a CUDA kernel?

I have a really big CUDA kernel which does a lot of stuff. Like __global__ void bigkernel(args) { func1(); func2(); func3(); func4(); func5(); .... } I want to profile each one of those functions and ...

Rageristic

21

asked Mar 18 at 9:17

-1 votes

1 answer

72 views

How to profile startup of ASP.NET Core app locally?

I have this app which takes ~12 seconds to start and another 10 seconds to start accepting requests. I would like to make this faster, specifically for members of my team. I would be satisfied with ~3 ...

vmachacek

583

asked Mar 18 at 1:03

1 vote

1 answer

643 views

cpu_core vs cpu_atom in perf

I'm constructing an example that shows the effect of branch mispredictions. When using perf stat, I get the following results: Here, I can see some metrics counted twice, once for cpu_atom, and once ...

Osama Ahmad

2,368

asked Mar 12 at 6:13

1 vote

0 answers

114 views

Radeon Developer Panel does not detect application

Very similar to this issue. I want to test the AMD profiling "Radeon Developer Panel" tool (v3.2.0.18) using the Vulkan vkcube test application. The connection is established (green light), ...

unvarnished

125

asked Mar 11 at 13:48

0 votes

1 answer

28 views

Meaning of cProfile Python output: myprogram.py:1(<module>)

I've been trying to use cProfile to gauge any bottlenecks in my Python program. I've discovered that all but one of the function calls take about 4 seconds in total, however by sorting the results by ...

Sofie Thomsen

1

asked Mar 8 at 22:37
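As a small illustration for the question above: in cProfile output, a `<module>` row stands for the top-level code of the file itself, so its cumulative time includes every function called from that top level. The sketch below assumes a made-up two-line program compiled under the file name `myprogram.py`; `SOURCE` and `profile_source` are hypothetical names introduced only for this example.

```python
import cProfile
import pstats

# A stand-in for the contents of myprogram.py (hypothetical program).
SOURCE = """
def work():
    return sum(i * i for i in range(10000))

work()
"""


def profile_source(source, filename="myprogram.py"):
    """Run the given source as top-level module code under cProfile
    and return the collected statistics as a pstats.Stats object."""
    code = compile(source, filename, "exec")
    profiler = cProfile.Profile()
    profiler.enable()
    exec(code, {"__name__": "__main__"})
    profiler.disable()
    return pstats.Stats(profiler)


if __name__ == "__main__":
    # The <module> row should appear with a cumulative time at least
    # as large as that of work(), because work() runs inside it.
    profile_source(SOURCE).sort_stats("cumulative").print_stats("myprogram")
```

Sorting by cumulative time therefore tends to put `<module>` near the top without it being a bottleneck on its own; the `tottime` column is the one that excludes time spent in callees.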

2 votes

0 answers

48 views

Profile-Guided-Optimizations by GCC - Profile File Analysis

I am working with Profile-Guided Optimizations (PGO) with GCC, using -fprofile-generate to collect execution data and -fprofile-use to optimize your executable based on the collected profile (.gcda ...

Soma Pal

21

asked Feb 24 at 19:59

0 votes

0 answers

32 views

Is there a way to get the directory information from pstats.Stats object

I am using pstats.Stats() for profiling and want to keep only the statistics from a particular directory. Is there any way I can get the actual stats information from the Stats object to ...

Sanjith Kumar

51

asked Feb 18 at 20:23
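One possible way to approach the directory filtering asked about above is sketched here. It relies on the `stats` attribute of `pstats.Stats`, a dict keyed by `(filename, lineno, funcname)`; as far as I know this attribute is an implementation detail rather than a documented interface, so treat that as an assumption. The helper name `stats_under_directory` is hypothetical.

```python
import cProfile
import os
import pstats


def stats_under_directory(stats, directory):
    """Return {(filename, lineno, funcname): (cc, nc, tt, ct)} for the
    functions whose source file lives under `directory`.

    `stats` is expected to be a pstats.Stats object (assumption: its
    `stats` dict maps function keys to (cc, nc, tt, ct, callers))."""
    root = os.path.abspath(directory) + os.sep
    selected = {}
    for func, (cc, nc, tt, ct, _callers) in stats.stats.items():
        filename = func[0]
        # Built-ins are recorded as "~" and synthetic code as "<...>".
        if filename == "~" or filename.startswith("<"):
            continue
        if os.path.abspath(filename).startswith(root):
            selected[func] = (cc, nc, tt, ct)
    return selected


if __name__ == "__main__":
    profiler = cProfile.Profile()
    profiler.enable()
    sorted(range(1000))
    profiler.disable()
    collected = pstats.Stats(profiler)
    for func, (cc, nc, tt, ct) in stats_under_directory(collected, os.getcwd()).items():
        print(func, ct)
```

If only printed output is needed, passing a regular expression that matches the directory to `print_stats` restricts the listing in a similar way without touching the underlying dict.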

1 vote

0 answers

107 views

Profiling of import times in NodeJS application

Is there a way to track the duration of file imports in a Node.js application? We have a TypeScript application running in VSCode, with the target set to ES2020 and the module format set to ESNext. ...

Lucky Degen

11

asked Feb 17 at 4:09

1 vote

1 answer

74 views

C++ Profiling - Called method from coroutine function has a higher hit count than its caller

I am profiling some code using the cppgraphqlgen library - which uses C++20 coroutines extensively in its internals. I have profiled an application and found that I have some called-into methods that ...

Andrew Lipscomb

1,078

asked Feb 3 at 21:01

0 votes

0 answers

26 views

How can I get a FlameGraph for a custom unikernel running in a same-arch QEMU VM?

I would like to profile a unikernel I am developing. This is an x86_64 unikernel, written in Rust, which runs on an x86_64 host using QEMU to manage the KVM VM. I have enabled frame pointers. I don't ...

Ferdia McKeogh

449

asked Jan 27 at 17:31

1 vote

1 answer

45 views

How can I alter the polling interval when using node --prof?

I have a node program I'd like to profile. If I use --prof I can get a profile output (isolate-xxx) that I can process with node --prof-process. Unfortunately, since the program is fairly quick (...

Richard Wheeldon

1,147

asked Jan 18 at 11:02

0 votes

1 answer

84 views

Assessing the contribution of communication to the runtime of an MPI program

Background Let's say I have a complex MPI program with multiple message passing events and computations. The communication pattern is that of bidirectional ring messaging as shown in the figure below. ...

Nitin Malapally

648

asked Jan 17 at 11:20

2 votes

0 answers

182 views

CPU usage percentage is not showing in Android Studio's live telemetry profiling

I tried to profile my app using Live Telemetry Profiler on Android Studio to see the CPU usage percentage. In previous Android Studio versions, I believe we could see it by pointing to the CPU usage ...

Kharda

1,388

asked Jan 9 at 2:19

0 votes

0 answers

128 views

How can I group PyTorch Profiler events by layer hierarchy when profiling a Hugging Face Transformer?

I'm using PyTorch Profiler to inspect inference performance on a Hugging Face Transformer (e.g., Qwen model). I have code that successfully captures operator-level profiling information (like aten::mm,...

AlexL

1

asked Jan 7 at 9:47

2 votes

0 answers

129 views

Matrix multiply fastest with -O0 [duplicate]

I timed a fairly naive BLAS-like matrix multiplication (DGEMM) function: void dgemm_naive(const int M, const int N, const int K, const double alpha, const double *A, const int lda, ...

ligro

29

asked Jan 1 at 18:31

0 votes

0 answers

116 views

Can Apple Silicon GPU / MPS profiles be viewed in Chrome or other profile viewer (not XCode)?

I am trying to view MPS in software that is not XCode for which an Apple account is needed. Most other hardware (CPU, XPU, NVIDIA GPUs, etc.) allow easy profiling in torch. Can the profiles generated ...

user14307376

asked Dec 16, 2024 at 13:44

1 vote

0 answers

57 views

.NET 8: Developer Verification Error During App Store Review

I’m developing an app on .NET8 for macOS and encountered an issue during App Review with feedback: "An error showed upon launch. The app cannot be opened because the developer cannot be verified. ...

jaroslavic

11

asked Dec 5, 2024 at 10:50

1 vote

1 answer

105 views

Is it possible to profile a .NET child process using Visual Studio's Profiling Tools?

I have a program that launches a child process where the code I want to profile is running. While I can run the Debug > Performance Profiler > CPU Usage on the parent process, I can't seem to ...

Russel85

11

asked Dec 3, 2024 at 1:04

1 vote

0 answers

79 views

Profiling of Llama 3.1 8B model on AI Accelerator

I have the profiling results of the inference of the Llama 3.1 8B model by Meta. I deployed the model on the AI Accelerator. I managed to create a memory trace of the whole model from the Host to the ...

Sudais Alam

11

asked Nov 28, 2024 at 11:42

0 votes

1 answer

151 views

Sentry.getCurrentScope().setTransactionName(name) not having any effect

I am using Sentry for node.js. In that context, I have an express route (let's say /myRoute) that has two main branches within it, and the branch chosen to traverse is based on what is passed into ...

drmrbrewer

13.4k

asked Nov 26, 2024 at 14:45
