37 questions
1
vote
1
answer
59
views
Dask-CUDA LocalCUDACluster on WSL2: NVML errors despite enable_nvml=False
I’m trying to set up a LocalCUDACluster on WSL2 (Ubuntu 22.04) from Windows 11 for GPU computations. The cluster starts and runs, but performance is ~10× slower than running directly on the GPU, and ...
0
votes
0
answers
116
views
Why would nvmlDeviceGetCount return 0 devices, if I have a usable GPU?
I'm invoking nvmlDeviceGetCount() on a system with 2 GPUs, and it returns a device count of 0 GPUs - with no error. Why would it say that?
Additional information:
CUDA version: 12.6.68 (and bundled ...
0
votes
1
answer
127
views
Load Balancing Challenges with NVIDIA GPUs in CCTV Video Decoding
We have a CCTV system where we use NVIDIA GPUs for video decoding. Our current requirement is to monitor GPU decoding and memory usage, and if the usage reaches 80%, we need to automatically switch ...
2
votes
1
answer
430
views
Memory leak using Nvidia's NVML
In a project I'm using the NVML library to get info about the GPU in a system. I use it to query the GPU name and GPU UUID. This happens cyclically, 6 to 8 times per minute.
I noticed a small memory leak which ...
0
votes
1
answer
2k
views
Nvidia NVML undefined symbol: nvmlDeviceGetComputeRunningProcesses_v3
./nvml_lib: symbol lookup error: ./nvml_lib: undefined symbol: nvmlDeviceGetComputeRunningProcesses_v3
nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA ...
0
votes
0
answers
200
views
Why does the NVML code below fail to compile even with the include directory for nvml.h added?
I am trying to build C code that uses NVML for an A5000 GPU. I found the following code on the internet:
#include <stdio.h>
#include <nvml.h>
///usr/include/hwloc/nvml.h
int main() {
...
1
vote
1
answer
739
views
How to understand the SmUtil returned by nvmlDeviceGetProcessUtilization?
I'm writing a program that monitors how processes use the GPU and I found an API provided by nvml, nvmlDeviceGetProcessUtilization.
According to the comment on this API, it reads recent utilization of ...
1
vote
1
answer
198
views
Why does nvmlDeviceGetTemperature only work in debug mode?
Using VS2022 the following code snippet works in debug mode but not in release mode:
nvmlInit();
nvmlDevice_t devH;
auto ret = nvmlDeviceGetHandleByIndex_v2(0, &devH);
if (ret != NVML_SUCCESS) ...
-1
votes
1
answer
154
views
Passing a struct containing a fixed-size array from C# to C++ to be populated returns odd values
I'm having trouble with the shape of a struct to pass to an NVML library function via P/Invoke. The struct contains a fixed-size array and some unsigned long longs.
I'm not encountering any compiler ...
0
votes
1
answer
199
views
Passing uint array to Dll to be populated (Nvidia NVML library)
I'm attempting to pass a uint array into the NVML function nvmlDeviceGetAccountingPids (doc here) from C#. Here's a minimal working sample of what I have so far:
{
public const string ...
0
votes
1
answer
246
views
Nvidia nvml nvmlUnit_t query
I'm finding that code examples for the NVML API for NVIDIA cards are really sparse.
Before any NVML calls could be made, CMake required:
target_link_libraries(04_nvml_testing "/usr/lib/x86_64-...
2
votes
0
answers
156
views
How to reset an NVIDIA GPU after an error in Go through golang-nvml
Is there any command to reset an NVIDIA GPU after an error has occurred, in Go through golang-nvml?
I only found the golang-nvml APIs for getting GPU info, like:
nvml.DeviceGetCount()
nvml....
1
vote
0
answers
222
views
How to check bus utilization / bus load for GPU during ML inference?
I am running an ML inference for image recognition on the GPU using onnxruntime and I am seeing an upper limit for how much performance improvement batching of images is giving me - there is reduction ...
3
votes
0
answers
829
views
Slurm not optimally allocating multiple GPUs
We are using Slurm 20.02 with NVML autodetect, and on some 8-GPU nodes with NVLink, 4-GPU jobs get allocated by Slurm in a surprising way that appears sub-optimal.
On a system with 8 Nvidia A40 GPUs, ...
2
votes
4
answers
19k
views
NVML: Driver/library version mismatch [closed]
I don't know why nvidia-smi doesn't work.
What do I need to do to fix it?
I think my library and driver versions match, but nvidia-smi doesn't recognize it.
test
5
votes
1
answer
1k
views
NVidia NVML nvmlDeviceGetMemoryInfo() loads and unloads nvapi64.dll immediately
I use some NVIDIA Management Library features to produce metrics in my application.
Every 1 second I call nvmlDeviceGetMemoryInfo() in a thread, and after a few minutes, in the output of Visual Studio,...
1
vote
1
answer
1k
views
Which function returns "nvmlDevice_t" type variable in cuda/nvml library?
I am working with GPUs and want to get their serial numbers. The NVIDIA Management Library has a function that I can use. The function prototype is:
nvmlReturn_t nvmlDeviceGetSerial ( ...
1
vote
1
answer
853
views
How do NVML and NVAPI compare?
I want to get some basic GPU data: name, RAM size, and do temperature monitoring.
From NVIDIA docs, it's not clear which library to use. Is NVAPI a legacy API which should be avoided?
3
votes
1
answer
3k
views
How to measure GPU usage per process in Windows using python?
I would like to measure the GPU usage per process as done in Windows taskmgr.exe, but I have encountered several problems when attempting to use the pyNVML library. As a result, I have a few questions....
2
votes
2
answers
648
views
GPU MHZ Utilization
I am developing a monitoring agent for GPU cards that is capable of providing real-time telemetry using CUDA and NVML libraries.
I want to understand a little more about GPU core operation vs how ...
4
votes
1
answer
1k
views
AMD's NVML counterpart (C++)
I would like to know what library AMD offers as a counterpart to NVIDIA's NVML. What I want is to get temperature, power usage, etc. in C++.
Best regards!
3
votes
1
answer
4k
views
NVIDIA-SMI, NVML, Power usage: [NOT SUPPORTED]
I tried to get current power usage with the following command in Windows 10 x64:
nvidia-smi.exe --format=csv,noheader --query-gpu=power.draw
And got the following result:
[Not Supported]
I checked it ...
3
votes
1
answer
8k
views
NVML library path
I compiled GROMACS 2016.3 using cmake (3.5.1) with the following flags:
cmake .. -DGMX_BUILD_OWN_FFTW=ON -DREGRESSIONTEST_DOWNLOAD=ON -DGMX_MPI=on -DGMX_GPU=on -DGMX_OPENMP=on -...
0
votes
0
answers
722
views
Error: bool undeclared (first use in the function). Have already included <stdbool.h> in the code
I'm trying to compile Perl bindings to the NVIDIA Management Library (NVML). When I use the makefile, the following errors appear:
/usr/lib/powerpc64le-linux-gnu/perl/5.22/CORE/handy.h:117:34: error: ‘bool’ ...
1
vote
1
answer
812
views
nvmlDeviceGetPowerManagementMode() always returning NVML_ERROR_INVALID_ARGUMENT?
I am writing code to periodically measure the power usage of an NVIDIA Tesla K20 GPU (Kepler architecture) using the NVML API.
Variables:
nvmlReturn_t result;
nvmlEnableState_t pmmode;
...
1
vote
1
answer
7k
views
NVML code doesn't compile
I am implementing an example program with nvml library as shown at https://devtalk.nvidia.com/default/topic/504951/how-to-call-nvml-apis-/
The program is as follows:
#include <stdio.h>
#...
0
votes
1
answer
1k
views
Using nvidia-smi what is the best strategy to capture power
I am using a Tesla K20c and measuring power with nvidia-smi while my application runs. My problem is that power consumption does not reach a steady state but keeps rising. For example, if my application runs ...
3
votes
0
answers
737
views
How can I get gpu utilization?
Previously, I tried NVML using the function nvmlDeviceGetUtilizationRates(). I tested it this way: while the collection is running, I execute a DFT (the kernel is organised as <7,32>) on Tesla ...
0
votes
2
answers
2k
views
NVML supported on Jetson TK1?
I installed NVML on a Jetson TK1 and compiled a CUDA program. Compilation shows no errors, but running the program produces the error
/NVML-installed-path/usr/src/gdk/nvml/lib//libnvidia-ml.so: ...
5
votes
1
answer
3k
views
Is there any way or even possible to get the overall utilization of a GPU during a period of time?
I am trying to get the information about the overall utilization of a GPU (mine is an NVIDIA Tesla K20, running on Linux) during a period of time.
By "overall" I mean something like, how many ...
0
votes
2
answers
4k
views
How to get GPU utilization rates? (NVML)
I need GPU information for testing my CUDA project.
I am using the NVML library, and I can successfully get temperature information.
However, NVML reports ERROR_NOT_SUPPORTED from nvmlDeviceGetUtilizationRates().
So ...
1
vote
1
answer
3k
views
Nvidia-smi showing fan speed as not available
My machine has an NVIDIA Tesla K20m GPU. I would like to know the GPU utilization, memory utilization, temperature and fan speed, so I used nvidia-smi to get the details. The nvidia-smi log is as follows:
==...
1
vote
1
answer
1k
views
NVML Power readings with nvmlDeviceGetPowerUsage
I'm running an application using the NVML function nvmlDeviceGetPowerUsage().
The problem is that I always get the same number for the different applications I'm running on a Tesla M2050.
Any ...
6
votes
4
answers
25k
views
Cannot run CUDA code that queries NVML - error regarding libnvidia-ml.so
Recently a colleague needed to use NVML to query device information, so I downloaded the Tesla development kit 3.304.5 and copied the file nvml.h to /usr/include. To test, I compiled the example code ...
2
votes
1
answer
7k
views
GPU Utilization
I have been using the NVML library to get the values of graphics and memory utilization for
the Rodinia benchmark suite. I observe that with different frequencies, the utilization of the same application ...
0
votes
1
answer
3k
views
nvidia-smi -ac equivalent in NVML
I learnt that nvidia-smi -ac can be used to change the clock
rate of GPU cores and memory. Is nvidia-smi built upon the NVML library?
What is its equivalent in NVML since I checked the document
http:/...
3
votes
2
answers
9k
views
NVML Header File Missing
I am trying to execute some CUDA code which happens to have some
NVML library functions like nvmlSystemGetDriverVersion.
But, when I try to compile the code it says nvml.h not found.
How should I ...