Message boards : Graphics cards (GPUs) : Ubuntu two GPU troubleshooting

Francois Normandin
Joined: 8 Mar 11
Posts: 71
Credit: 654,432,613
RAC: 0
Message 44672 - Posted: 13 Oct 2016 | 16:57:51 UTC

Hi!

Two GTX 980s on Ubuntu; I followed the FAQ guide and they work fine, but...
I get 4 GPU WUs, yet only one GPU is running, GPU (0).
I tried lowering CPU usage to 50% for the WCG WUs... no success.

Do I need to edit some kind of config file, or maybe connect GPU (1) to a monitor?

Thanks!


captainjack
Joined: 9 May 13
Posts: 171
Credit: 3,788,379,060
RAC: 17,774,024
Message 44674 - Posted: 13 Oct 2016 | 18:59:30 UTC

Francois,

When you start up BOINC, the Event Log should have 4 lines that look like this:

Thu 13 Oct 2016 01:45:20 PM CDT | | CUDA: NVIDIA GPU 0: GeForce GTX 970 (driver version 361.42, CUDA version 8.0, compute capability 5.2, 4095MB, 4004MB available, 3919 GFLOPS peak)
Thu 13 Oct 2016 01:45:20 PM CDT | | CUDA: NVIDIA GPU 1: GeForce GTX 970 (driver version 361.42, CUDA version 8.0, compute capability 5.2, 4096MB, 4004MB available, 3919 GFLOPS peak)
Thu 13 Oct 2016 01:45:20 PM CDT | | OpenCL: NVIDIA GPU 0: GeForce GTX 970 (driver version 361.42, device version OpenCL 1.2 CUDA, 4095MB, 4004MB available, 3919 GFLOPS peak)
Thu 13 Oct 2016 01:45:20 PM CDT | | OpenCL: NVIDIA GPU 1: GeForce GTX 970 (driver version 361.42, device version OpenCL 1.2 CUDA, 4096MB, 4004MB available, 3919 GFLOPS peak)


If not, try these steps:

Connect a monitor to the second GPU.

Start the computer.

If the "Nvidia X Server Settings" application is not installed on your system, install it from the Ubuntu Software installer.

You will need to modify some settings using the "Nvidia X Server Settings" software and save those settings to a configuration file. To save the settings to a configuration file, you have to start the software with root permissions.

Open a terminal session and key in the following: "sudo nvidia-settings" and press enter. It will ask for your system password.

In the upper left corner of the "Nvidia X Server Settings" window, click on the option for "X Server Display Configuration". It should show the display layout. One of the displays should show as disabled. Enable it, then click on "Save to X Configuration File" in the bottom right corner of the window.

Reboot the computer.

Check the BOINC Event Log and see if it shows that both GPUs are enabled. If yes, you should be able to use both GPUs. If not, post back here and let us know what you encountered.
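
For anyone who prefers to check the log text rather than read it by eye, here is a minimal Python sketch of the check described above. The regular expression is modeled only on the four Event Log lines quoted in this post, so other BOINC versions may format these lines differently, and the function name `detected_gpus` is made up for illustration.

```python
import re

# Pattern modeled on the Event Log lines quoted in this thread (assumption:
# other BOINC versions may word these lines differently).
GPU_LINE = re.compile(r"(CUDA|OpenCL): NVIDIA GPU (\d+): ([^(]+?) \(")


def detected_gpus(log_text):
    """Return {api: {gpu_index: model}} for every GPU line found in log_text."""
    found = {}
    for api, index, model in GPU_LINE.findall(log_text):
        found.setdefault(api, {})[int(index)] = model.strip()
    return found


# The four lines from the post above, used as sample input.
sample = """\
Thu 13 Oct 2016 01:45:20 PM CDT | | CUDA: NVIDIA GPU 0: GeForce GTX 970 (driver version 361.42, CUDA version 8.0, compute capability 5.2, 4095MB, 4004MB available, 3919 GFLOPS peak)
Thu 13 Oct 2016 01:45:20 PM CDT | | CUDA: NVIDIA GPU 1: GeForce GTX 970 (driver version 361.42, CUDA version 8.0, compute capability 5.2, 4096MB, 4004MB available, 3919 GFLOPS peak)
Thu 13 Oct 2016 01:45:20 PM CDT | | OpenCL: NVIDIA GPU 0: GeForce GTX 970 (driver version 361.42, device version OpenCL 1.2 CUDA, 4095MB, 4004MB available, 3919 GFLOPS peak)
Thu 13 Oct 2016 01:45:20 PM CDT | | OpenCL: NVIDIA GPU 1: GeForce GTX 970 (driver version 361.42, device version OpenCL 1.2 CUDA, 4096MB, 4004MB available, 3919 GFLOPS peak)
"""

gpus = detected_gpus(sample)
for api, cards in gpus.items():
    print(api, "sees", len(cards), "GPU(s):", cards)
```

To run it against a real log, pass the text of the Event Log to `detected_gpus`. On a repository install the client log is probably /var/lib/boinc-client/stdoutdae.txt, but that path is an assumption and may differ on your system.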

Francois Normandin
Message 44675 - Posted: 13 Oct 2016 | 19:37:20 UTC

Thanks a lot for the reply, Captain Jack.

Both of my GPUs show in the log at BOINC startup.
I suspended the WCG project and now both GPUs are working.
If I resume WCG, just one GPU is working.


Surely it is something about project priority in BOINC or in the project settings.



captainjack
Message 44676 - Posted: 13 Oct 2016 | 19:51:45 UTC

Francois,

I had the same thing happen with WCG recently. Some of the WCG tasks had a short deadline, so BOINC prioritized them ahead of the GPU tasks. I had to limit the WCG tasks until they finished so the GPU tasks would keep running. If I remember correctly, the high priority WCG tasks were FightAIDS@Home tasks.

Glad you found a way to get it going.

skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 44677 - Posted: 13 Oct 2016 | 20:47:38 UTC - in response to Message 44676.

Maybe a priority issue; possibly Beta CPU work or resends with lower return time?

On an 8 core CPU, most people would be running more than 2 GPU tasks and 2 CPU tasks, so you might want to check whether your preferences are being pulled from WCG, compare them against your computer's venue (general/global, home, school, work), and edit them if need be.

https://www.gpugrid.net/prefs.php?subset=global

PS. Remember to open Nvidia X Server Settings, go to the Thermal Settings of your GPUs, enable GPU Fan Settings and increase the fan speed to >70%.

BTW, anyone with a repository BOINC install who wants to edit their cc_config.xml file can do the following:
Open a Terminal (top-left icon {Search your computer}, type Terminal), then type sudo gedit /etc/boinc-client/cc_config.xml
Add the lines you want (such as use all GPUs) and then click Save (top right).
e.g.

<cc_config>
  <options>
    <use_all_gpus>1</use_all_gpus>
  </options>
</cc_config>

____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

Logan Carr
Joined: 12 Aug 15
Posts: 240
Credit: 64,069,811
RAC: 0
Message 44678 - Posted: 13 Oct 2016 | 20:51:40 UTC - in response to Message 44677.
Last modified: 13 Oct 2016 | 20:52:20 UTC

Remember to open Nvidia X Server Settings, go to the Thermal Settings of your GPUs, enable GPU Fan Settings and increase the fan speed to >70%.



Hi. Do you by chance recommend this for any GPU or only in this case?

Thanks.

EDIT:

I mean setting the fan to 70%.
____________
Cruncher/Learner in progress.

Francois Normandin
Message 44681 - Posted: 13 Oct 2016 | 22:00:00 UTC
Last modified: 13 Oct 2016 | 22:02:31 UTC

I changed "Store at least xx days of work" in the computing preferences to 0.1 and "Store additional work" to 1. Now everything is crunching.
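
As far as I know, the same two values can also be set locally through a global_prefs_override.xml file in the BOINC data directory. The sketch below only builds the XML text for that file. It assumes the element names `work_buf_min_days` and `work_buf_additional_days` correspond to the two settings named above, and the helper name `work_buffer_override` is made up for illustration.

```python
import xml.etree.ElementTree as ET


def work_buffer_override(min_days, additional_days):
    """Build the text of a global_prefs_override.xml holding only the two
    work-buffer settings (assumed element names, see the note above)."""
    root = ET.Element("global_preferences")
    ET.SubElement(root, "work_buf_min_days").text = str(min_days)
    ET.SubElement(root, "work_buf_additional_days").text = str(additional_days)
    return ET.tostring(root, encoding="unicode")


# Values from the post: store at least 0.1 days, plus 1 additional day.
override_xml = work_buffer_override(0.1, 1)
print(override_xml)
```

The client normally has to re-read its preferences (or be restarted) before a changed override file takes effect.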

I'm unable to control or find the fan option in Nvidia X Server...

(0) GPU load at 90%, 68C, fan at 50% (monitor connected), Gigabyte Windforce.
(1) GPU load at 70%, 68C, fan at 20%, EVGA SC ACX2.0.

The 2 WUs are not the same; maybe that explains why.
(0) is a SDOERR
(1) is a GERARD

captainjack
Message 44686 - Posted: 14 Oct 2016 | 0:22:48 UTC

Francois,

To enable the fan control option, you have to set Coolbits to 4 (or greater). Instructions on how to do that can be found in this thread:

http://www.gpugrid.net/forum_thread.php?id=4305&nowrap=true#43503

Once that is set, open up Nvidia X Server Settings, on the left pane right below GPU0 should be an option for "Thermal Settings". Click on Thermal Settings. In the middle of the right pane should be a check box for "Enable GPU Fan Settings". When you check that box, there will be a slider underneath where you can set % fan speed. Same for GPU1.

Hope that helps.

klepel
Joined: 23 Dec 09
Posts: 189
Credit: 4,727,913,360
RAC: 1,140,635
Message 44687 - Posted: 14 Oct 2016 | 0:23:05 UTC
Last modified: 14 Oct 2016 | 0:24:30 UTC

Your solution is under COOLBITS: http://www.gpugrid.net/forum_thread.php?id=3525&nowrap=true#33760
Sorry, I had not seen that it was already answered.

Francois Normandin
Message 44693 - Posted: 14 Oct 2016 | 2:46:00 UTC

Coolbits worked for the GPU0 fan adjustment but not for GPU1.

I connected a monitor to GPU1, but no change.

In the file I have these lines. Do I need to add something like another Identifier "Screen1"?

Section "Screen"
    Identifier "Screen0"
    Device "Device0"
    Monitor "Monitor0"
    DefaultDepth 24
    Option "Coolbits" "4"
    SubSection "Display"
        Depth 24
    EndSubSection
EndSection

Francois Normandin
Message 44694 - Posted: 14 Oct 2016 | 3:06:25 UTC

I got it... In Nvidia X Server Settings I needed to activate the 2nd monitor. This adds lines for Screen1 to the config file, so just adding Option "Coolbits" "4" to the Screen1 section activates the fan option.

How do I save the settings in Nvidia X Server Settings? Rebooting makes the fan setting return to default.
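
A quick way to confirm that every Screen section carries the Coolbits option is to scan the config file text. This is a minimal Python sketch under stated assumptions: the sample text mirrors the Screen0 section posted earlier in this thread, while the Screen1 section and its "Device1"/"Monitor1" names are hypothetical stand-ins for whatever nvidia-settings wrote on a given machine. The function name is also made up for illustration.

```python
import re

FLAGS = re.IGNORECASE | re.MULTILINE

# One match per Section "Screen" ... EndSection block.
SCREEN = re.compile(r'^\s*Section\s+"Screen"(.*?)^\s*EndSection', FLAGS | re.DOTALL)
IDENT = re.compile(r'^\s*Identifier\s+"([^"]+)"', FLAGS)
# Lines starting with '#' are comments and are deliberately not matched.
COOLBITS = re.compile(r'^\s*Option\s+"Coolbits"\s+"\d+"', FLAGS)


def screens_missing_coolbits(conf_text):
    """Return the identifiers of Screen sections that lack a Coolbits option."""
    missing = []
    for body in SCREEN.findall(conf_text):
        ident = IDENT.search(body)
        name = ident.group(1) if ident else "<unnamed>"
        if not COOLBITS.search(body):
            missing.append(name)
    return missing


# Sample text: Screen0 as posted above; Screen1 is a hypothetical second
# section that has not been given Coolbits yet.
sample_conf = '''
Section "Screen"
    Identifier "Screen0"
    Device "Device0"
    Monitor "Monitor0"
    DefaultDepth 24
    Option "Coolbits" "4"
    SubSection "Display"
        Depth 24
    EndSubSection
EndSection

Section "Screen"
    Identifier "Screen1"
    Device "Device1"
    Monitor "Monitor1"
    DefaultDepth 24
    SubSection "Display"
        Depth 24
    EndSubSection
EndSection
'''

print(screens_missing_coolbits(sample_conf))
```

Any identifier the function returns is a section where the fan controls for that GPU would most likely stay hidden.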

skgiven
Message 44698 - Posted: 14 Oct 2016 | 13:13:15 UTC - in response to Message 44694.

For reference (blower) designs and average single/dual fan GPUs, 70% fan is about right, but different tasks stress the cards in different ways and to different amounts. Also, the case design and ambient temps can change things a lot. If you have better designed cards and systems, then fan speed and temp control are less of an issue.
I've a dual fan Zotac card sitting in an open case and I need to have the fans at 77% to keep the temperature around 71C. GPU utilization is ~90% running a GERARD_CXCL12_

Every time you reboot you need to enable fan control again from X Server and raise it to whatever you think is appropriate.
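
Since the setting is lost on reboot, one option is to reapply it from a script after login instead of clicking through the GUI each time. The Python sketch below only builds the command lines. It assumes the driver exposes the `GPUFanControlState` and `GPUTargetFanSpeed` attributes to the `nvidia-settings -a` option (older drivers used a different fan attribute name), that Coolbits is already set for each screen, and that fan N belongs to GPU N, which may not hold for every card. The helper name `fan_command` is made up for illustration.

```python
import shlex


def fan_command(gpu_index, percent, fan_index=None):
    """Build an nvidia-settings command that enables manual fan control on
    one GPU and sets its target fan speed (attribute names are assumptions,
    see the note above)."""
    if not 0 <= percent <= 100:
        raise ValueError("fan speed must be between 0 and 100 percent")
    if fan_index is None:
        fan_index = gpu_index  # assumption: fan N belongs to GPU N
    return [
        "nvidia-settings",
        "-a", f"[gpu:{gpu_index}]/GPUFanControlState=1",
        "-a", f"[fan:{fan_index}]/GPUTargetFanSpeed={percent}",
    ]


# Print the commands for both GPUs at 70% so they can be reviewed or
# pasted into a startup script; nothing is executed here.
for gpu in (0, 1):
    print(shlex.join(fan_command(gpu, 70)))
```

The commands need a running X session, so a desktop "startup applications" entry is a more likely home for them than a boot-time system script.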

mmonnin
Joined: 2 Jul 16
Posts: 337
Credit: 7,741,017,411
RAC: 10,101,593
Message 44727 - Posted: 16 Oct 2016 | 17:49:43 UTC
Last modified: 16 Oct 2016 | 17:50:39 UTC

In BOINC Manager, how many CPUs does each GPU task show? 1 CPU + 1 NV? 0.5 CPU + 1 NV? If you haven't set up an app_config file, do so and lower the cpu_usage number to allow more GPU work to run at once. I have changed this number before, and CPU threads would stop/start depending on how many CPU cores were left over.

<cpu_usage>.25</cpu_usage>

NV settings in Linux will have to be reset after each restart, along with your OC.
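
For context, the cpu_usage line above sits inside a larger app_config.xml file. Here is a minimal Python sketch that builds such a file, assuming the usual app / gpu_versions layout. The application name `EXAMPLE_APP_NAME` is a placeholder, not a real GPUGRID app name. The real short name has to be looked up, for example in client_state.xml.

```python
from xml.sax.saxutils import escape

TEMPLATE = """\
<app_config>
  <app>
    <name>{name}</name>
    <gpu_versions>
      <gpu_usage>{gpu_usage}</gpu_usage>
      <cpu_usage>{cpu_usage}</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
"""


def build_app_config(app_name, cpu_usage, gpu_usage=1):
    """Return app_config.xml text reserving cpu_usage CPUs and gpu_usage GPUs
    per task of the named application."""
    return TEMPLATE.format(
        name=escape(app_name), gpu_usage=gpu_usage, cpu_usage=cpu_usage
    )


# EXAMPLE_APP_NAME is hypothetical; 0.25 is the value suggested in the post.
config_text = build_app_config("EXAMPLE_APP_NAME", 0.25)
print(config_text)
```

The file goes in the project's folder under the BOINC data directory, and the client has to re-read its config files (or be restarted) before it applies.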
