
Message boards : Graphics cards (GPUs) : GTX 970 switching to default clock value (1152MHz) after a while

Erich56
Joined: 1 Jan 15
Posts: 1132
Credit: 10,205,482,676
RAC: 29,855,510
Message 46470 - Posted: 8 Feb 2017 | 11:58:56 UTC
Last modified: 8 Feb 2017 | 12:24:17 UTC

After I had a similar problem with one of my GTX750ti cards some 4 weeks ago, today my GTX970 switched back to the default clock of 1152MHz while crunching a PABLO_adaptive_goal_KIX task.
Only after rebooting the PC could the clock value be increased again via NVIDIA Inspector. However, after a few minutes, the GPU clock fell back to 1152MHz.
At this clock, TDP usage is about 55% (whereas it was about 80-90% [depending on the specific task] when running at around 1380MHz).
And again, the clock value of 1152MHz cannot be changed manually in either direction, except by rebooting the system.
BTW, NVIDIA Inspector and GPU-Z show exactly the same values.

Both the GTX750ti and the GTX970 have been crunching for about 1 year.
Do I have to assume that after a crunching period of about 1 year, the cards become defective? BTW, I always took good care of the temperature situation; neither GPU ever got warmer than 60-63°C.

Has anyone else had the same kind of experience?
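
For anyone who wants to catch the moment the clock drops, a small logger may help. The sketch below is not from the thread. It assumes the `nvidia-smi` command-line tool that ships with the NVIDIA driver is on the PATH and that the card reports these query fields (some GeForce cards on Windows answer "[Not Supported]" for a few of them, which the parser maps to `None`). The helper names `parse_sample` and `read_sample` are made up for illustration.

```python
import subprocess

# Fields requested from nvidia-smi; availability depends on card and driver.
QUERY_FIELDS = "timestamp,clocks.sm,power.draw,temperature.gpu,utilization.gpu"


def _to_number(text, cast):
    """Convert a field to a number, or None for values like '[Not Supported]'."""
    try:
        return cast(text)
    except ValueError:
        return None


def parse_sample(line):
    """Parse one line produced with --format=csv,noheader,nounits."""
    fields = [f.strip() for f in line.split(",")]
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
    timestamp, sm_clock, power, temp, util = fields
    return {
        "timestamp": timestamp,
        "sm_clock_mhz": _to_number(sm_clock, int),
        "power_w": _to_number(power, float),
        "temp_c": _to_number(temp, int),
        "gpu_util_pct": _to_number(util, int),
    }


def read_sample(gpu_index=0):
    """Run nvidia-smi once and return the parsed sample for one GPU."""
    out = subprocess.check_output(
        [
            "nvidia-smi",
            f"--id={gpu_index}",
            f"--query-gpu={QUERY_FIELDS}",
            "--format=csv,noheader,nounits",
        ],
        text=True,
    )
    return parse_sample(out.strip().splitlines()[0])
```

Calling `read_sample()` every few seconds and appending the result to a file would give a timeline showing when the clock fell and what load, power and temperature looked like just before.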

PappaLitto
Joined: 21 Mar 16
Posts: 511
Credit: 4,672,242,755
RAC: 0
Message 46472 - Posted: 8 Feb 2017 | 13:15:21 UTC

Yes, it is very intermittent and random, and has occurred on several of my GPUs, but only Maxwell ones. Both of my 970s have dropped to 1164MHz multiple times, and sometimes they stay there until the computer is rebooted. I believe it has nothing to do with the GPU itself; it is probably a software issue that one of the scientists can investigate.

3de64piB5uZAS6SUNt1GFDU9d...
Joined: 20 Apr 15
Posts: 285
Credit: 1,102,216,607
RAC: 0
Message 46473 - Posted: 8 Feb 2017 | 14:21:12 UTC
Last modified: 8 Feb 2017 | 14:21:37 UTC

Can this be the (same) reason why the older 980ti surpasses the 1080, which should be noticeably faster just from its specification?

See also this posting:
http://www.gpugrid.net/forum_thread.php?id=4494&nowrap=true#46410
____________
I would love to see HCF1 protein folding and interaction simulations to help my little boy... someday.

PappaLitto
Joined: 21 Mar 16
Posts: 511
Credit: 4,672,242,755
RAC: 0
Message 46474 - Posted: 8 Feb 2017 | 14:27:50 UTC
Last modified: 8 Feb 2017 | 14:31:43 UTC

"Can this be the (same) reason why the older 980ti surpasses the 1080, which should be noticeably faster just from its specification?"

Joerg, this thread is about the GPU core downclocking for an unknown reason. The difference between the 980ti and the 1080 is mostly the RAM. Pascal has a lower IPC than Maxwell, so even though the core runs at almost 2GHz, its performance is very similar to that of the higher-IPC Maxwell. Because this project doesn't utilize the extra speed and bandwidth of the GDDR5X, it's mostly down to the core. The 980ti has 8 billion transistors while the GP104 die (GTX 1070 and 1080) has only 7.2 billion. Even if the performance of each transistor is ever so slightly higher than on the Maxwell chip, it's hard to make up for 800 million transistors. That, and there could potentially be more software optimization in the CUDA 6.5 app because it's been around longer.
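
The transistor gap mentioned above works out to a tenth of the larger die. A quick check, using only the two figures quoted in the post (the die name GM200 for the 980ti is added here for labelling and is not from the post):

```python
# Transistor counts as quoted in the post.
GM200_TRANSISTORS = 8.0e9  # GTX 980ti die
GP104_TRANSISTORS = 7.2e9  # GTX 1070 / 1080 die

deficit = GM200_TRANSISTORS - GP104_TRANSISTORS
deficit_pct = 100 * deficit / GM200_TRANSISTORS

print(f"{deficit / 1e6:.0f} million fewer transistors "
      f"({deficit_pct:.0f}% of the 980ti die)")
```

Transistor count is of course only a rough proxy for throughput, so this says nothing by itself about which card should be faster on a given app.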

Erich56
Joined: 1 Jan 15
Posts: 1132
Credit: 10,205,482,676
RAC: 29,855,510
Message 46475 - Posted: 8 Feb 2017 | 15:30:14 UTC - in response to Message 46472.

"Yes, it is very intermittent and random, and has occurred on several of my GPUs, but only Maxwell ones. Both of my 970s have dropped to 1164MHz multiple times, and sometimes they stay there until the computer is rebooted. I believe it has nothing to do with the GPU itself; it is probably a software issue that one of the scientists can investigate."

It's interesting to read that this phenomenon evidently does not occur only with my cards (which would have surprised me anyway).

When you say that the reduced GPU clock "sometimes" stays until the computer is rebooted - does this mean that sometimes you are able to raise the clock again without rebooting? In my case (with the GTX750ti as well as with the GTX970) there is no way to change the clock - neither up nor down - until the PC is rebooted.
Further, after rebooting the system, can you go back to the same high clock as before without the clock reverting to the default value for a lengthy time?
In my case, the clock reverts within a few minutes.

Final question: for how many months have your cards been crunching? I am asking because, as said before, with both of my cards this behaviour started after about a year. It never happened before, and this seems strange.

PappaLitto
Joined: 21 Mar 16
Posts: 511
Credit: 4,672,242,755
RAC: 0
Message 46476 - Posted: 8 Feb 2017 | 15:54:51 UTC - in response to Message 46475.

"When you say that the reduced GPU clock 'sometimes' stays until the computer is rebooted - does this mean that sometimes you are able to raise the clock again without rebooting? In my case (with the GTX750ti as well as with the GTX970) there is no way to change the clock - neither up nor down - until the PC is rebooted."

In my experience, it has returned to full boost before, after a seemingly random amount of time. Oftentimes, though, I will wait 8+ hours and it still won't come back.

"Further, after rebooting the system, can you go back to the same high clock as before without the clock reverting to the default value for a lengthy time?
In my case, the clock reverts within a few minutes."

Typically, when I restart, it will be at full boost instantly.

"Final question: for how many months have your cards been crunching? I am asking because, as said before, with both of my cards this behaviour started after about a year. It never happened before, and this seems strange."

Both of these cards have been crunching, not just GPUGrid, for well over a year. Temps are typically in the 60s and at most they barely hit 70. I truly think this is some type of software bug inside the application itself.

Erich56
Joined: 1 Jan 15
Posts: 1132
Credit: 10,205,482,676
RAC: 29,855,510
Message 46477 - Posted: 8 Feb 2017 | 16:58:34 UTC - in response to Message 46476.

"... I truly think this is some type of software bug inside the application itself."

When this problem first happened to one of my GTX750ti cards four weeks ago, the card was crunching BNBS tasks, which are known to put an extremely heavy load on the GPU.
When other tasks were running later on, the problem did not recur.

Hence, I was even more surprised now that the same thing happened on the GTX970 with a PABLO-adaptive task.

As said before, this has never occurred before with any of my 5 GPUs.

Could it really be some bug in the recent GPUGRID applications?

It would be great to find out whether other crunchers have experienced this problem too.

Erich56
Joined: 1 Jan 15
Posts: 1132
Credit: 10,205,482,676
RAC: 29,855,510
Message 46479 - Posted: 9 Feb 2017 | 6:23:50 UTC

When I took the first look at my PCs this morning, I noticed the following situation on both the one with the GTX970 and the one with the GTX750ti (both crunching a PABLO_adaptive_goal_KIX task):

In NVIDIA Inspector, the GPU clock was down at 540MHz(!), memory clock 2700MHz (default), GPU load 0, power between 84 and 89%. Changing the clock values with the sliders was not possible.

In GPU-Z, no values were shown at all for GPU clock, memory clock, GPU load, or Video Engine load (which normally is 0 anyway); "no values" means a "-" in the fields where values (or "0") are normally shown. Power consumption showed the same values as in the Inspector.

However, the "progress" column of the BOINC manager shows the percentage advancing, as it seems to me (but I might be mistaken) at about the same speed as usual.

What's going on with these two cards?

PappaLitto
Joined: 21 Mar 16
Posts: 511
Credit: 4,672,242,755
RAC: 0
Message 46520 - Posted: 17 Feb 2017 | 15:21:27 UTC

Is the problem still occurring? It keeps happening here, but only on my 970s. They are running a mix of Folding@home and GPUGrid, and they seem to drop to stock clock at random. When I restart the computer they almost always jump right back up to boost clock, but sometime later they drop down again. The two cards are on different drivers, so it could be an error that persists across driver versions. Sometimes when I start folding on top of GPUGrid the clock goes back up to boost, but sometimes it stays at stock. I am truly baffled.

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,206,655,749
RAC: 261,147
Message 46521 - Posted: 17 Feb 2017 | 15:54:29 UTC - in response to Message 46520.
Last modified: 17 Feb 2017 | 16:01:42 UTC

Try turning off graphics acceleration in your browser, and in Microsoft Excel too (if you have it).

Erich56
Joined: 1 Jan 15
Posts: 1132
Credit: 10,205,482,676
RAC: 29,855,510
Message 46569 - Posted: 1 Mar 2017 | 10:05:31 UTC - in response to Message 46521.

"Try turning off graphics acceleration in your browser, and in Microsoft Excel too (if you have it)."

I could not find such a setting in the MS Edge browser (the only one on this system).

Whenever crunching PABLO_adaptive_goal WUs, the card shows the same behaviour: the GPU clock falls back (not immediately, but after a while) to the default 1152MHz whenever it's overclocked beyond around 1250MHz :-(

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,206,655,749
RAC: 261,147
Message 46570 - Posted: 1 Mar 2017 | 12:08:55 UTC - in response to Message 46569.

"Try turning off graphics acceleration in your browser, and in Microsoft Excel too (if you have it)."

"I could not find such a setting in the MS Edge browser (the only one on this system).

Whenever crunching PABLO_adaptive_goal WUs, the card shows the same behaviour: the GPU clock falls back (not immediately, but after a while) to the default 1152MHz whenever it's overclocked beyond around 1250MHz :-("

Do you even browse the internet on this PC? :)
To turn off graphics acceleration in IE and Edge:
1) Press Windows key + R
2) Type inetcpl.cpl and press Enter
3) Click the Advanced tab
4) Scroll down to the "Accelerated graphics" settings
5) Check "Use software rendering instead of GPU rendering"
6) Click OK
7) Restart your PC

Erich56
Joined: 1 Jan 15
Posts: 1132
Credit: 10,205,482,676
RAC: 29,855,510
Message 46571 - Posted: 1 Mar 2017 | 15:24:11 UTC

Yes, this is the PC I do most of my Internet browsing on :-)

So I have now changed the settings according to your instructions and restarted the machine.
Let's wait and see whether this helps. I will report here.

Erich56
Joined: 1 Jan 15
Posts: 1132
Credit: 10,205,482,676
RAC: 29,855,510
Message 46572 - Posted: 1 Mar 2017 | 16:31:57 UTC - in response to Message 46571.
Last modified: 1 Mar 2017 | 16:42:52 UTC

"Let's wait and see whether this helps. I will report here."

Unfortunately, this change in the settings did not help here.
While crunching a PABLO_adaptive_goal, I had the GPU at about 1320MHz, power was about 77-85%, and after 20 minutes or so the GPU clock dropped to the default value of 1152MHz.

So either the GPU is defective, or this type of WU does something strange to the GPU.

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,206,655,749
RAC: 261,147
Message 46573 - Posted: 1 Mar 2017 | 21:35:28 UTC - in response to Message 46572.
Last modified: 1 Mar 2017 | 21:35:59 UTC

Have you tried to update the firmware (BIOS) of the card?

Erich56
Joined: 1 Jan 15
Posts: 1132
Credit: 10,205,482,676
RAC: 29,855,510
Message 46574 - Posted: 2 Mar 2017 | 5:38:27 UTC - in response to Message 46573.

"Have you tried to update the firmware (BIOS) of the card?"

No.

Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Message 46585 - Posted: 4 Mar 2017 | 5:37:04 UTC

I've seen my GTX 980 Ti GPUs sometimes crunch at non-boost 3D clocks (like 1150MHz), when they normally crunch at boost 3D clocks (like 1320MHz).

When they are in this "broken mode" while I'm crunching, if I run GPU-Z and click the question mark "?" to do the "PCI Express Render Test", it shows a graphic, and the GPU clock ramps back up to boost 3D clocks. Then when I close the GPU-Z window, it goes back down to non-boost 3D clocks, even though I'm still crunching GPUGrid tasks.

Long story short: I believe the driver is not correctly recognizing that a 3D compute app wants full boost 3D clocks. I'm not sure if there's a setting the app can make to do that, but I suspect it's a driver bug.

And it won't get fixed unless you can supply a completely reproducible case, report it, and get them to listen. I've tried to produce that reproducible example, but have never managed to. If you have one, please list the steps (all of them, details matter!) here.
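
As a starting point for such a report, the drops can at least be located in a log automatically. The following is only a sketch under assumptions: it expects a list of samples (dicts with `timestamp`, `sm_clock_mhz` and `gpu_util_pct` keys, a format invented here) collected by whatever monitoring tool is at hand, and the function name and thresholds are hypothetical.

```python
def find_downclock_events(samples, boost_mhz, tolerance_mhz=50, busy_pct=90):
    """Return the samples at which the clock left boost while the GPU was busy.

    samples: iterable of dicts with 'timestamp', 'sm_clock_mhz', 'gpu_util_pct'.
    Only the transition from boosted to non-boosted is recorded, so a long
    stretch at the lower clock counts as a single event.
    """
    events = []
    was_boosted = False
    for s in samples:
        boosted = s["sm_clock_mhz"] >= boost_mhz - tolerance_mhz
        busy = s["gpu_util_pct"] >= busy_pct
        if was_boosted and not boosted and busy:
            events.append(s)
        was_boosted = boosted
    return events


# Made-up example data: one drop under load, one drop because the GPU went idle.
log = [
    {"timestamp": "t0", "sm_clock_mhz": 1320, "gpu_util_pct": 98},
    {"timestamp": "t1", "sm_clock_mhz": 1320, "gpu_util_pct": 97},
    {"timestamp": "t2", "sm_clock_mhz": 1150, "gpu_util_pct": 99},
    {"timestamp": "t3", "sm_clock_mhz": 1150, "gpu_util_pct": 99},
    {"timestamp": "t4", "sm_clock_mhz": 1320, "gpu_util_pct": 98},
    {"timestamp": "t5", "sm_clock_mhz": 405, "gpu_util_pct": 0},
]
events = find_downclock_events(log, boost_mhz=1320)
```

Requiring high utilisation filters out the ordinary case where the clock falls because the card simply has nothing to do; what remains are the timestamps worth correlating with whatever else happened on the machine.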

Thanks,
Jacob

Jim1348
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Message 46590 - Posted: 4 Mar 2017 | 10:38:47 UTC - in response to Message 46585.

"When they are in this 'broken mode' while I'm crunching, if I run GPU-Z and click the question mark '?' to do the 'PCI Express Render Test', it shows a graphic, and the GPU clock ramps back up to boost 3D clocks. Then when I close the GPU-Z window, it goes back down to non-boost 3D clocks, even though I'm still crunching GPUGrid tasks.

Long story short: I believe the driver is not correctly recognizing that a 3D compute app wants full boost 3D clocks. I'm not sure if there's a setting the app can make to do that, but I suspect it's a driver bug."

I seem to recall a similar case, though I don't remember the card. The fix for me was simply to run Nvidia Inspector (which is based on GPU-Z), and just allow the clocks to remain at the default speed. It prevented the downclocking.

Erich56
Joined: 1 Jan 15
Posts: 1132
Credit: 10,205,482,676
RAC: 29,855,510
Message 46634 - Posted: 10 Mar 2017 | 16:59:42 UTC - in response to Message 46585.

"I've seen my GTX 980 Ti GPUs sometimes crunch at non-boost 3D clocks (like 1150MHz), when they normally crunch at boost 3D clocks (like 1320MHz).

When they are in this 'broken mode' while I'm crunching, if I run GPU-Z and click the question mark '?' to do the 'PCI Express Render Test', it shows a graphic, and the GPU clock ramps back up to boost 3D clocks. Then when I close the GPU-Z window, it goes back down to non-boost 3D clocks, even though I'm still crunching GPUGrid tasks.

Long story short: I believe the driver is not correctly recognizing that a 3D compute app wants full boost 3D clocks. I'm not sure if there's a setting the app can make to do that, but I suspect it's a driver bug."

A few minutes ago, the GPU clock again dropped to the default value, so I tried this thing with the "?" in GPU-Z.
Same behaviour as described by Jacob.

What comes to mind is that this problem only occurs with the GPUs in Windows 10 systems, never so far in XP.
So either this is indeed a bug in the newer drivers that come for Windows 10, or it has to do with the WDDM.

Loohi
Joined: 27 Aug 16
Posts: 16
Credit: 43,745,875
RAC: 0
Message 46663 - Posted: 15 Mar 2017 | 11:32:26 UTC

For what it's worth, I am experiencing exactly the same issue with my 970 on win 10. Sometimes the clock comes down because I start Chrome (which I can understand) but sometimes it just drops the clock for no reason. GPU usage is still pegged at 100% however.

I really have no clue.

Erich56
Joined: 1 Jan 15
Posts: 1132
Credit: 10,205,482,676
RAC: 29,855,510
Message 46666 - Posted: 15 Mar 2017 | 19:48:30 UTC

By now I am pretty much convinced that the problem has to do with:

1) Win10 itself, or
2) the WDDM overhead, or
3) the driver, or
4) a mix of the above

Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Message 46686 - Posted: 18 Mar 2017 | 6:54:45 UTC

In my mind, the possibilities are:

a) It's actually working correctly, in a situation where the code is moderately to heavily CPU-bound and there is no tangible benefit to running at a faster GPU clock

or

b) Stupid NVIDIA driver is stupid.

....

It is very likely to be b.
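
One way to gather evidence for either possibility might be to ask the driver why it is holding the clock down. On cards and drivers that support it, `nvidia-smi --query-gpu=clocks_throttle_reasons.active --format=csv,noheader` prints a hexadecimal bitmask. The sketch below decodes such a mask; the bit values are given as recalled from NVML's `nvml.h` and should be treated as an assumption to verify against the header for your driver version, and the short reason names are invented for readability.

```python
# Bit values as recalled from nvml.h (nvmlClocksThrottleReason*); verify locally.
THROTTLE_REASONS = {
    0x001: "gpu_idle",
    0x002: "applications_clocks_setting",
    0x004: "sw_power_cap",
    0x008: "hw_slowdown",
    0x010: "sync_boost",
    0x020: "sw_thermal_slowdown",
    0x040: "hw_thermal_slowdown",
    0x080: "hw_power_brake_slowdown",
    0x100: "display_clock_setting",
}


def decode_throttle_reasons(mask):
    """Turn a throttle-reason bitmask (int or hex string) into reason names."""
    if isinstance(mask, str):
        mask = int(mask.strip(), 16)  # accepts strings such as '0x0000000000000004'
    return [name for bit, name in sorted(THROTTLE_REASONS.items()) if mask & bit]


reasons = decode_throttle_reasons("0x0000000000000004")
```

An empty list while the card sits at the default clock under full load would be consistent with possibility b; a power, thermal or idle reason would point elsewhere.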

Erich56
Joined: 1 Jan 15
Posts: 1132
Credit: 10,205,482,676
RAC: 29,855,510
Message 46976 - Posted: 17 Apr 2017 | 7:08:40 UTC

Unfortunately, after the change from CUDA 6.5 to 8.0 a few days ago, and the compulsory driver update from 376.53 to 381.65, the problem has become even worse.
It looks as if crunching above the default clock of 1152MHz is no longer possible.

For example, I had recent tasks run at 1280MHz and around 77% TDP. After a few minutes, the clock fell back to 1152MHz, TDP around 55%.

Until a few months ago, it was no problem at all with this card to crunch at 1420MHz and about 95% TDP.
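
The numbers above show power falling much more than the clock does. A rough back-of-the-envelope check, assuming (this is not stated in the thread) that board power is dominated by dynamic power scaling as frequency times voltage squared:

```python
from math import sqrt

# Figures from the post: clock in MHz, power as a percentage of TDP.
clock_before, clock_after = 1280, 1152
tdp_before, tdp_after = 77, 55

clock_ratio = clock_after / clock_before   # how far the clock fell
power_ratio = tdp_after / tdp_before       # how far the power fell

# If P ~ f * V^2, the remaining power drop must come from a lower voltage.
voltage_ratio = sqrt(power_ratio / clock_ratio)

print(f"clock: {clock_ratio:.1%}, power: {power_ratio:.1%}, "
      f"implied voltage: {voltage_ratio:.1%} of the earlier value")
```

Under that simplified model the card would be running at a noticeably lower voltage as well as a lower clock, which fits a switch to a different performance state better than a small frequency trim. Static power and measurement noise are ignored, so the figure is indicative at best.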

What's going wrong?

Still today, I will have to update the driver on the PC with the GTX750ti, on which I have been experiencing the same problem for a while.
So I am afraid that the problem will become even worse there too.

Erich56
Joined: 1 Jan 15
Posts: 1132
Credit: 10,205,482,676
RAC: 29,855,510
Message 47161 - Posted: 3 May 2017 | 5:44:02 UTC

It would really be interesting to know how many other crunchers now have the same problem with a GTX970 in combination with the new software acemd_918.80 and the new driver 381.65.
Here, even mild overclocking makes the GPU jump back to default clock 1152MHz :-(((

Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Message 47162 - Posted: 3 May 2017 | 12:15:20 UTC

Have you tried using a different tool to do your overclocking? I overclock with MSI Afterburner, and highly recommend it.

Betting Slip
Joined: 5 Jan 09
Posts: 670
Credit: 2,498,095,550
RAC: 0
Message 47163 - Posted: 3 May 2017 | 13:22:58 UTC - in response to Message 47161.

"It would really be interesting to know how many other crunchers now have the same problem with a GTX970 in combination with the new software acemd_918.80 and the new driver 381.65.
Here, even mild overclocking makes the GPU jump back to default clock 1152MHz :-((("

I have one which is OC'd and doesn't display that behaviour.
