Author |
Message |
|
I have nvidia gtx 1650.
Maybe it is too old? |
|
|
|
Another got RuntimeError: CUDA out of memory. Tried to allocate 28.00 MiB (GPU 0; 4.00 GiB total capacity; 1.32 GiB already allocated; 1011.70 MiB free; 1.36 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Third workunit finished and validated in 1645 seconds. |
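For reference, the hint at the end of that error refers to an environment variable that PyTorch reads when it first initializes CUDA. Below is a minimal sketch of setting it from Python, assuming you control the script that imports torch. That is not the case for a stock GPUGRID task, where it would have to be set as a system environment variable instead. The helper name and the value 128 are only illustrative.

```python
import os


def set_alloc_conf(max_split_size_mb: int) -> str:
    """Set PYTORCH_CUDA_ALLOC_CONF; only effective before torch initializes CUDA."""
    if max_split_size_mb <= 0:
        raise ValueError("max_split_size_mb must be positive")
    value = f"max_split_size_mb:{max_split_size_mb}"
    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = value
    return value


# 128 is an arbitrary example value, not a recommendation from this thread.
print(set_alloc_conf(128))
```

Note that in this particular error the reserved amount (1.36 GiB) is barely above the allocated amount (1.32 GiB), so fragmentation may not be the cause here.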
|
|
jjch
Joined: 10 Nov 13 Posts: 101 Credit: 15,579,200,388 RAC: 3,887,859
|
First off, I would say that the Python apps seem to have a high error rate. I'm seeing about 40% failures on my Windows systems without finding a good reason why. There could be a cause for this, but it might also be normal.
The error you noted seems to come from variation in the amount of memory used on the GPU. I think the GTX 1650 should be adequate to run the Python apps, so it could be a problem with the Python app.
What might be happening is you are also using GPU memory for something else at the same time or prior to GPUgrid. Don't run any other GPU projects or play games etc.
I also noted some of your tasks failed where it looked like you were running out of system memory. 16GB is on the low side of what will work well with other things running.
I would suggest setting things up so you are only running one GPUgrid Python app, and then look at your system memory usage. I have seen it be around 10 GB, but it can be more.
Also check your available free disk space and the swap space you are using while you are monitoring it. Make sure you are not pushing the limits there and running out too.
|
|
|
Keith Myers
Joined: 13 Dec 17 Posts: 1341 Credit: 7,678,756,915 RAC: 13,455,284
|
There's a problem with how Windows allocates virtual memory for Python libraries.
Linux does not have the issue because it allocates memory differently.
See this message of mine.
https://www.gpugrid.net/forum_thread.php?id=5322&nowrap=true#58908 |
|
|
|
One also crashed because of CUDA out of memory. Tried to allocate 28.00 MiB (GPU 0; 4.00 GiB total capacity; 1.32 GiB already allocated; 1011.70 MiB free; 1.36 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
|
|
|
jjch
|
The GTX 1650 is a 4GB card so it should have plenty of memory for the Python app. There is something else going on there.
I'm not a CUDA expert but there could be a problem with your driver. It looked like the driver you have is the current version.
I would suggest doing a full uninstall and cleanup with DDU and then reinstalling the driver. If that still doesn't work, go back to the previous version and see if that helps.
It could just be a problem with the Python/PyTorch programs and their interaction with CUDA, or an error in the programming.
Other than that, I could only guess that you are having a problem with your card. Make sure it isn't overheating. Also, if you are overclocking, revert that to the stock settings.
|
|
|
|
From what I remember, the Python app was using more than 4GB of VRAM. It's definitely possible that 4GB isn't enough.
|
|
|
jjch
|
That would be an interesting development. From what I have been gathering the Python app is not putting much of a load on the GPU. Not quite sure about the actual memory usage.
I tried to find a reference in the forum on how much GPU memory is needed, but I only found one post that mentioned a GTX 980 Ti: ".... gpu memory usage is almost constant at 2.679MB"
If you find something that indicates they need 4 GB or more, I would like to see it. I don't know of a good way to check the GPU memory usage, because you have to catch it while the task is actually using it.
The error mentioned in this thread only asks for 28.00 MiB on top of the 1.36 GiB that was already reserved, and there is 1011.70 MiB free:
CUDA out of memory. Tried to allocate 28.00 MiB (GPU 0; 4.00 GiB total capacity; 1.32 GiB already allocated; 1011.70 MiB free; 1.36 GiB reserved in total by PyTorch)
That actually seems more like a memory error related to CUDA or the driver, not a limit of the card's memory capacity.
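To make the arithmetic in that message explicit, here is a rough Python sketch that pulls the sizes out of the error text and works out how much of the card is neither reserved by PyTorch nor reported as free. The function name and the "unaccounted" label are my own, and the regular expressions assume the message wording quoted in this thread.

```python
import re

# Conversion factors to MiB for the units PyTorch prints.
_UNITS = {"KiB": 1 / 1024, "MiB": 1.0, "GiB": 1024.0}
_SIZE = r"([\d.]+) (KiB|MiB|GiB)"


def parse_cuda_oom(message: str) -> dict:
    """Return the sizes named in a PyTorch CUDA OOM message, in MiB."""
    patterns = {
        "requested": rf"Tried to allocate {_SIZE}",
        "total": rf"{_SIZE} total capacity",
        "allocated": rf"{_SIZE} already allocated",
        "free": rf"{_SIZE} free",
        "reserved": rf"{_SIZE} reserved",
    }
    sizes = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, message)
        if match is None:
            raise ValueError(f"field not found: {name}")
        sizes[name] = float(match.group(1)) * _UNITS[match.group(2)]
    return sizes


msg = (
    "CUDA out of memory. Tried to allocate 28.00 MiB (GPU 0; 4.00 GiB total "
    "capacity; 1.32 GiB already allocated; 1011.70 MiB free; 1.36 GiB "
    "reserved in total by PyTorch)"
)
sizes = parse_cuda_oom(msg)
# Memory that is neither reserved by this PyTorch process nor reported free.
sizes["unaccounted"] = sizes["total"] - sizes["reserved"] - sizes["free"]
for name, mib in sizes.items():
    print(f"{name:>11}: {mib:8.1f} MiB")
```

By this reckoning, roughly 1.65 GiB of the 4 GiB is held outside PyTorch's allocator. That would be consistent with something else using the card at the same time (the CUDA context itself, the desktop, or another process), though the message alone cannot confirm which.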
|
|
|
Keith Myers
|
The memory utilization seems to be constant on my GPUs when they are running a Python task. It is currently using 3349MB out of the 8GB on the card.
You can see that with nvidia-smi in a Terminal.
Or if you want to watch it in real time, you can use this:
watch -n 1 nvidia-smi --query-gpu=name,pci.bus_id,utilization.gpu,utilization.memory,clocks.current.sm,power.draw,memory.used,pcie.link.gen.current,pcie.link.width.current --format=csv
which besides showing the amount of memory being used, also shows the memory bus and gpu utilization, clocks, watts and link width and speed. |
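Since `watch` is a Linux tool, a small Python wrapper around the same nvidia-smi query may be handy on Windows, where nvidia-smi normally ships with the NVIDIA driver as well. This is only a sketch: the function names are made up, it queries just three of the fields used above, and the sample line is invented so the parsing can be tried without a GPU.

```python
import csv
import io
import subprocess

QUERY = "name,memory.used,memory.total"


def parse_gpu_memory(csv_text: str) -> list:
    """Parse `nvidia-smi --format=csv,noheader,nounits` output (values in MiB)."""
    rows = []
    for fields in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if not fields:
            continue  # skip blank lines
        name, used, total = fields
        rows.append({"name": name, "used_mib": int(used), "total_mib": int(total)})
    return rows


def query_gpu_memory() -> list:
    """Run nvidia-smi once; needs an NVIDIA driver with nvidia-smi on PATH."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_memory(result.stdout)


# Invented sample line in the format nvidia-smi emits, so this runs without a GPU.
sample = "Example GPU, 3349, 8192\n"
for gpu in parse_gpu_memory(sample):
    print(f"{gpu['name']}: {gpu['used_mib']} / {gpu['total_mib']} MiB")
```

On a machine with a GPU, calling `query_gpu_memory()` in a loop with a short sleep gives much the same effect as the `watch` command.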
|
|
jjch
|
I found a few tasks running on my Windows servers and checked them with GPU-Z. The GPU memory used was between 2518 and 3287 MB. I think with that usage these should run OK on a 4GB card.
|
|
|