
Message boards : Number crunching : GTX 770 won't get work

Michael H.W. Weber
Joined: 9 Feb 16
Posts: 71
Credit: 607,916,391
RAC: 0
Message 47212 - Posted: 14 May 2017 | 7:40:21 UTC

I don't know what's wrong, but I haven't been able to get work on this system for weeks, although the server indicates that tasks are available, and I have completed a lot of tasks on this same machine before:

https://www.gpugrid.net/show_host_detail.php?hostid=342877

Michael.
____________
President of Rechenkraft.net - Germany's first and largest distributed computing organization.

Erich56
Joined: 1 Jan 15
Posts: 1132
Credit: 10,366,597,676
RAC: 28,877,974
Message 47215 - Posted: 14 May 2017 | 14:01:06 UTC
Last modified: 14 May 2017 | 14:03:24 UTC

Could you please tell us which OS this is, which graphics card (including driver version) and which crunching (acemd ...) software you are running?

When did the machine stop crunching? On April 14?

Michael H.W. Weber
Joined: 9 Feb 16
Posts: 71
Credit: 607,916,391
RAC: 0
Message 47222 - Posted: 15 May 2017 | 11:59:53 UTC - in response to Message 47215.
Last modified: 15 May 2017 | 12:00:52 UTC

> Could you please tell us which OS this is, which graphics card (including driver version) and which crunching (acemd ...) software you are running?

Ubuntu Linux 16.04 LTS x64, kernel: 4.4.0-75-generic
BOINC version: 7.6.31
CPU: GenuineIntel Intel(R) Core(TM) i7 CPU 870 @ 2.93GHz [Family 6 Model 30 Stepping 5] (8 processors)
GPU: NVIDIA GeForce GTX 770 (1998MB)
GPU driver: 375.39

Credits generated on this machine: 73,260,609

> When did the machine stop crunching? On April 14?

Unfortunately, I can't pin down the date precisely, but it has been a problem for several weeks (I can no longer find the last successfully processed task in the GPUGRID database for my user account).


Michael.

Seek the Truth: Jesus Is...
Joined: 21 Mar 15
Posts: 10
Credit: 48,092,354
RAC: 8,470
Message 47242 - Posted: 16 May 2017 | 20:34:30 UTC
Last modified: 16 May 2017 | 20:37:39 UTC

I have the very same issue (no longer getting any work), only with a different GPU.
I bought a new machine six weeks ago (4 April 2017).

Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz [Family 6 Model 94 Stepping 3]
(8 processors)
with
NVIDIA GeForce GTX 950M (2048MB) driver: 369.9

http://gpugrid.net/show_host_detail.php?hostid=421103

For about two weeks I was getting WUs just fine. Then suddenly, out of the blue (I made no changes) I was no longer getting ANY tasks.
I have tried just about everything I can: I set the computing preferences
store at least __ days and
store up to an additional ___ days
each to various larger values (e.g., 2, 3, 4, 6.6, etc.).
I have reset the project (several times)
I have even 'Removed' the project and then added it again (I did this twice).
ALL to no avail. I have sort of given up and have been using my GPU for another project.
However, I would prefer to be able to run GPUgrid tasks.
(Since this thread is about the GTX 770, perhaps I should post in a new thread.)

Thanks,
LP
____________
Essential biomedical science:
At fertilization, a new and unique member of the species homo sapiens is formed.
Abortion wounds the Mother, and kills a very tiny baby girl or baby boy.
Life!
Les P., PhD Prof. Engr.

Betting Slip
Joined: 5 Jan 09
Posts: 670
Credit: 2,498,095,550
RAC: 0
Message 47248 - Posted: 16 May 2017 | 21:37:55 UTC - in response to Message 47242.
Last modified: 16 May 2017 | 21:41:14 UTC

Update the driver and you should get the new app and WUs.

Michael H.W. Weber
Joined: 9 Feb 16
Posts: 71
Credit: 607,916,391
RAC: 0
Message 47258 - Posted: 17 May 2017 | 11:35:30 UTC - in response to Message 47248.

> Update the driver and you should get the new app and WUs.

In my case, the driver is certainly not the problem: it is the same driver I use for my GTX 970, and that machine receives one WU after another without any issues.
Moreover, this proprietary NVIDIA driver (375.39) is the latest one you get through the Ubuntu console update.

Michael.

Seek the Truth: Jesus Is...
Joined: 21 Mar 15
Posts: 10
Credit: 48,092,354
RAC: 8,470
Message 47263 - Posted: 17 May 2017 | 20:15:24 UTC - in response to Message 47248.
Last modified: 17 May 2017 | 20:21:48 UTC

The driver is not the problem in my case either.
First, it is the same driver I used during the first two weeks I had the PC (the first two weeks of April 2017), when I was getting WUs just fine.
Second, I am getting and running WUs from other projects (primarily Einstein-at-home).
Third, I did update drivers yesterday, and I am still not able to get any work (even when GPUGRID has generated plenty of tasks).

Moreover, I do wish that GPUGRID had better 'diagnostic' (or whatever) messages when no WUs are sent to a host even though work is available.

I really do wish some project administrator would see these messages!

Keeping my fingers crossed,
LP

Michael H.W. Weber
Joined: 9 Feb 16
Posts: 71
Credit: 607,916,391
RAC: 0
Message 47285 - Posted: 19 May 2017 | 13:42:44 UTC
Last modified: 19 May 2017 | 13:43:33 UTC

Today I updated to Linux kernel 4.4.0-78-generic keeping NVIDIA driver 375.39. Still no tasks for my GTX 770 even after resetting GPUGRID.

Curiously, when auto-updating the GTX 970 machine to the same kernel, the NVIDIA driver got additionally updated to version 375.51.

Michael.

Jim1348
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Message 47293 - Posted: 20 May 2017 | 12:51:50 UTC - in response to Message 47285.

> Today I updated to Linux kernel 4.4.0-78-generic keeping NVIDIA driver 375.39. Still no tasks for my GTX 770 even after resetting GPUGRID.

I know little about Linux drivers, except that they must be matched to the Linux version, and work for me when I get them from the Ubuntu software center. And even if the drivers are apparently installed properly, they must implement CUDA properly to work here.

Given that the GTX 770 is now an older card and you are now using the 375.39 drivers, it is much more likely that it is a Linux problem rather than a GPUGrid problem.

Michael H.W. Weber
Joined: 9 Feb 16
Posts: 71
Credit: 607,916,391
RAC: 0
Message 47294 - Posted: 20 May 2017 | 16:23:59 UTC - in response to Message 47293.

> Given that the GTX 770 is now an older card and you are now using the 375.39 drivers, it is much more likely that it is a Linux problem rather than a GPUGrid problem.

No, because otherwise the other machine (GTX 970) with the same Linux kernel would also not receive tasks. But it does on a daily basis.

I know for sure that the GTX 970 box does use CUDA 8.
I am not sure whether the GTX 770 machine does too (it was originally set up with CUDA 7), but I assume(d) that the auto-update (apt update & apt upgrade) also updates CUDA, just as it updates the NVIDIA drivers.

Michael.

Jim1348
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Message 47295 - Posted: 20 May 2017 | 17:20:48 UTC - in response to Message 47294.

You seem to be assuming that a Kepler card runs CUDA, or even has it available in those drivers, the same way a Maxwell card does, just because the Linux and driver version numbers are comparable for the two cards. I would not assume that.

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2346
Credit: 16,293,065,968
RAC: 7,113,235
Message 47296 - Posted: 20 May 2017 | 19:21:19 UTC

Please restart your PC, and check the first lines of the event log of BOINC manager for the GPU report.
It should look similar to this:

2017. 05. 17. 3:44:59 CUDA: NVIDIA GPU 0: GeForce GTX 1080 (driver version 382.05, CUDA version 8.0, compute capability 6.1, 4096MB, 3557MB available, 9654 GFLOPS peak)
2017. 05. 17. 3:44:59 OpenCL: NVIDIA GPU 0: GeForce GTX 1080 (driver version 382.05, device version OpenCL 1.2 CUDA, 8192MB, 3557MB available, 9654 GFLOPS peak)
Could you please post yours?
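Checking the report by eye works, but the fields that matter here (driver version, CUDA version, compute capability) can also be pulled out with a short Python sketch. It assumes the line format shown above (BOINC 7.x wording; other client versions may phrase it differently), and `parse_cuda_report` and `GpuReport` are illustrative names, not part of BOINC. The usage example feeds it the GTX 770 line posted later in this thread.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# Assumed format, taken from the BOINC 7.x startup lines quoted in this thread:
# "... CUDA: NVIDIA GPU 0: <name> (driver version X, CUDA version Y, compute capability Z, ...)"
CUDA_LINE = re.compile(
    r"CUDA: NVIDIA GPU (?P<index>\d+): (?P<name>.+?) \("
    r"driver version (?P<driver>[\d.]+), "
    r"CUDA version (?P<cuda>[\d.]+), "
    r"compute capability (?P<cc>\d+\.\d+)"
)


@dataclass
class GpuReport:
    index: int
    name: str
    driver: str
    cuda: str
    compute_capability: Tuple[int, int]


def parse_cuda_report(line: str) -> Optional[GpuReport]:
    """Extract the CUDA fields from one event-log line; None if it is not a CUDA report."""
    m = CUDA_LINE.search(line)
    if m is None:
        return None
    major, minor = (int(part) for part in m["cc"].split("."))
    return GpuReport(int(m["index"]), m["name"], m["driver"], m["cuda"], (major, minor))


if __name__ == "__main__":
    # The GTX 770 report line posted later in this thread.
    line = ("Mo 22 Mai 2017 11:06:31 CEST | | CUDA: NVIDIA GPU 0: GeForce GTX 770 "
            "(driver version 375.39, CUDA version 8.0, compute capability 3.0, "
            "1999MB, 1948MB available, 3693 GFLOPS peak)")
    print(parse_cuda_report(line))
```

The OpenCL line is deliberately ignored (it returns None), since it carries no compute capability.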

Michael H.W. Weber
Joined: 9 Feb 16
Posts: 71
Credit: 607,916,391
RAC: 0
Message 47306 - Posted: 22 May 2017 | 9:15:09 UTC - in response to Message 47296.

> Please restart your PC, and check the first lines of the event log of BOINC manager for the GPU report.
> It should look similar to this:
> 2017. 05. 17. 3:44:59 CUDA: NVIDIA GPU 0: GeForce GTX 1080 (driver version 382.05, CUDA version 8.0, compute capability 6.1, 4096MB, 3557MB available, 9654 GFLOPS peak)
> 2017. 05. 17. 3:44:59 OpenCL: NVIDIA GPU 0: GeForce GTX 1080 (driver version 382.05, device version OpenCL 1.2 CUDA, 8192MB, 3557MB available, 9654 GFLOPS peak)
> Could you please post yours?

Here it is:

Mo 22 Mai 2017 11:06:31 CEST | | CUDA: NVIDIA GPU 0: GeForce GTX 770 (driver version 375.39, CUDA version 8.0, compute capability 3.0, 1999MB, 1948MB available, 3693 GFLOPS peak)
Mo 22 Mai 2017 11:06:31 CEST | | OpenCL: NVIDIA GPU 0: GeForce GTX 770 (driver version 375.39, device version OpenCL 1.2 CUDA, 1999MB, 1948MB available, 3693 GFLOPS peak)

Michael.

Betting Slip
Joined: 5 Jan 09
Posts: 670
Credit: 2,498,095,550
RAC: 0
Message 47307 - Posted: 22 May 2017 | 10:40:38 UTC - in response to Message 47306.
Last modified: 22 May 2017 | 10:43:29 UTC

You have a CC 3.0 card and need a driver from before CUDA 8.0; then you will get the CUDA 6.5 app.

I use the 359.6 driver on Windows but don't know what the equivalent driver is for Linux.

If you look at my computers, one of them (with a 660 Ti) has an earlier driver, as it is also a CC 3.0 card.

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2346
Credit: 16,293,065,968
RAC: 7,113,235
Message 47308 - Posted: 22 May 2017 | 16:33:04 UTC - in response to Message 47307.

> You have a CC 3.0 card and need a driver from before CUDA 8.0; then you will get the CUDA 6.5 app.
According to the applications page, there's no CUDA6.5 client for Linux, so this won't work under Linux.

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2346
Credit: 16,293,065,968
RAC: 7,113,235
Message 47309 - Posted: 22 May 2017 | 17:05:50 UTC - in response to Message 47306.
Last modified: 22 May 2017 | 17:07:35 UTC

You should check the venue of your host #342877. Then check the "GPUGrid settings" in your profile: for that venue, "Use NVidia GPU" should be selected, and under "Run only the selected applications", "ACEMD long runs (8-12 hours on fastest GPU)" should be selected.

If the settings are OK and you still don't receive work, edit (or create) the cc_config.xml file in the BOINC data folder to include the work_fetch_debug option.
If there's no cc_config.xml, create one with the following content:

<cc_config>
  <log_flags>
    <work_fetch_debug>1</work_fetch_debug>
  </log_flags>
</cc_config>

If there's already a cc_config.xml, copy the following right after the opening <cc_config> tag:
<log_flags>
  <work_fetch_debug>1</work_fetch_debug>
</log_flags>

Then click Options -> Read config files, and update the GPUGrid project.
Then post us the messages in the event log after the line:
GPUGRID | update requested by user
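For anyone who prefers to script the cc_config.xml step, here is a hedged Python sketch using only the standard library. It assumes `<work_fetch_debug>1</work_fetch_debug>` is the flag form the client accepts and that you pass the path to cc_config.xml in your BOINC data directory (on Ubuntu's boinc-client package that is typically /var/lib/boinc-client, but check your own install). The helper name is hypothetical. Note that ElementTree does not keep XML comments when it rewrites the file.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def enable_work_fetch_debug(path: Path) -> None:
    """Create cc_config.xml if it is missing, then make sure <log_flags>
    contains <work_fetch_debug>1</work_fetch_debug>. Other options are kept;
    XML comments are dropped by ElementTree."""
    if path.exists():
        tree = ET.parse(path)
        root = tree.getroot()
        if root.tag != "cc_config":
            raise ValueError(f"unexpected root element <{root.tag}> in {path}")
    else:
        root = ET.Element("cc_config")
        tree = ET.ElementTree(root)

    # Reuse an existing <log_flags> block rather than adding a second one.
    log_flags = root.find("log_flags")
    if log_flags is None:
        log_flags = ET.SubElement(root, "log_flags")

    flag = log_flags.find("work_fetch_debug")
    if flag is None:
        flag = ET.SubElement(log_flags, "work_fetch_debug")
    flag.text = "1"

    tree.write(path, encoding="utf-8")
```

Call it as `enable_work_fetch_debug(Path("/var/lib/boinc-client/cc_config.xml"))` with write permission on that directory, then have the client re-read the file from the Manager menu or with `boinccmd --read_cc_config`. Running it twice is harmless; the flag is only added once.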

Michael H.W. Weber
Joined: 9 Feb 16
Posts: 71
Credit: 607,916,391
RAC: 0
Message 47310 - Posted: 22 May 2017 | 20:05:24 UTC
Last modified: 22 May 2017 | 20:09:39 UTC

As I said above, all the settings are correct, as that machine picked up tasks on a daily basis. Nothing was changed on the website's end.

Also, I have never used an .xml configuration file for GPUGRID.
What exactly is this work_fetch_debug about? Or does it just report in more detail what's actually happening when no tasks are received?

Something has changed at GPUGRID's end such that my card does not receive work anymore.
That card is listed as being supported by the project (GTX 770, CC/SM: 3.0).

This Linux machine completed tasks using CUDA 7.5. So, why can't I just downgrade to CUDA 7.5, reset the project and receive the former app, which worked perfectly?
It appears to me that the auto-downloaded CUDA 8 app just won't work under Linux when shader model 3 is in use. I have no idea why this should be the case, though...

Michael.

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2346
Credit: 16,293,065,968
RAC: 7,113,235
Message 47311 - Posted: 22 May 2017 | 22:42:51 UTC - in response to Message 47310.

> Also, I have never used an .xml configuration file for GPUGRID.
> What exactly is this work_fetch_debug about? Or does it just report in more detail what's actually happening when no tasks are received?
Exactly. You can set options in cc_config.xml that are not accessible through the GUI of the BOINC manager (this parameter does not change the settings of GPUGrid or of any other project). See the BOINC client configuration wiki for details. See this post about how to read the detailed info provided by work_fetch_debug.

> Something has changed at GPUGRID's end such that my card does not receive work anymore.
> That card is listed as being supported by the project (GTX 770, CC/SM: 3.0).

> This Linux machine completed tasks using CUDA 7.5. So, why can't I just downgrade to CUDA 7.5, reset the project and receive the former app, which worked perfectly?
The previous GPUGrid client was CUDA 6.5; your host has processed 126 (short runs) + 249 (long runs) of them.
The CUDA 6.5 client has been deprecated; there's only a CUDA 8.0 client available for Linux.
I think you wouldn't receive CUDA 6.5 tasks either.

> It appears to me that the auto-downloaded CUDA 8 app just won't work under Linux when shader model 3 is in use.
> I have no idea why this should be the case, though...
The single CUDA 8.0 task your host has received finished successfully, so the CUDA 8 app does work on your host.
We need to find out why your host either does not ask for new tasks or is not sent any.

BTW, how many GPU projects is your host attached to?

Michael H.W. Weber
Joined: 9 Feb 16
Posts: 71
Credit: 607,916,391
RAC: 0
Message 47313 - Posted: 23 May 2017 | 8:33:50 UTC - in response to Message 47309.
Last modified: 23 May 2017 | 8:42:49 UTC

> You should check the venue of your host #342877. Then check the "GPUGrid settings" in your profile: for that venue, "Use NVidia GPU" should be selected, and under "Run only the selected applications", "ACEMD long runs (8-12 hours on fastest GPU)" should be selected.
>
> If the settings are OK and you still don't receive work, edit (or create) the cc_config.xml file in the BOINC data folder to include the work_fetch_debug option.
> If there's no cc_config.xml, create one with the following content:
> <cc_config>
>   <log_flags>
>     <work_fetch_debug>1</work_fetch_debug>
>   </log_flags>
> </cc_config>
>
> If there's already a cc_config.xml, copy the following right after the opening <cc_config> tag:
> <log_flags>
>   <work_fetch_debug>1</work_fetch_debug>
> </log_flags>
>
> Then click Options -> Read config files, and update the GPUGrid project.
> Then post us the messages in the event log after the line:
> GPUGRID | update requested by user

Here are the messages:

Di 23 Mai 2017 10:27:18 CEST | | Re-reading cc_config.xml
Di 23 Mai 2017 10:27:18 CEST | | Config: GUI RPCs allowed from:
Di 23 Mai 2017 10:27:18 CEST | | log flags: file_xfer, sched_ops, task, work_fetch_debug
Di 23 Mai 2017 10:27:18 CEST | | [work_fetch] Request work fetch: Core client configuration
Di 23 Mai 2017 10:27:22 CEST | | [work_fetch] ------- start work fetch state -------
Di 23 Mai 2017 10:27:22 CEST | | [work_fetch] target work buffer: 86400.00 + 0.00 sec
Di 23 Mai 2017 10:27:22 CEST | | [work_fetch] --- project states ---
Di 23 Mai 2017 10:27:22 CEST | GPUGRID | [work_fetch] REC 6457.612 prio -1.000 can request work
Di 23 Mai 2017 10:27:22 CEST | | [work_fetch] --- state for CPU ---
Di 23 Mai 2017 10:27:22 CEST | | [work_fetch] shortfall 299527.82 nidle 0.00 saturated 24073.06 busy 0.00
Di 23 Mai 2017 10:27:22 CEST | GPUGRID | [work_fetch] share 0.000 account manager prefs
Di 23 Mai 2017 10:27:22 CEST | | [work_fetch] --- state for NVIDIA GPU ---
Di 23 Mai 2017 10:27:22 CEST | | [work_fetch] shortfall 84909.97 nidle 0.00 saturated 1490.03 busy 0.00
Di 23 Mai 2017 10:27:22 CEST | GPUGRID | [work_fetch] share 0.000 project is backed off (resource backoff: 693.27, inc 600.00)
Di 23 Mai 2017 10:27:22 CEST | | [work_fetch] ------- end work fetch state -------
Di 23 Mai 2017 10:27:22 CEST | | [work_fetch] No project chosen for work fetch
Di 23 Mai 2017 10:27:26 CEST | GPUGRID | update requested by user
Di 23 Mai 2017 10:27:26 CEST | | [work_fetch] Request work fetch: project updated by user
Di 23 Mai 2017 10:27:27 CEST | | [work_fetch] ------- start work fetch state -------
Di 23 Mai 2017 10:27:27 CEST | | [work_fetch] target work buffer: 86400.00 + 0.00 sec
Di 23 Mai 2017 10:27:27 CEST | | [work_fetch] --- project states ---
Di 23 Mai 2017 10:27:27 CEST | GPUGRID | [work_fetch] REC 6457.612 prio -1.000 can request work
Di 23 Mai 2017 10:27:27 CEST | | [work_fetch] --- state for CPU ---
Di 23 Mai 2017 10:27:27 CEST | | [work_fetch] shortfall 299556.98 nidle 0.00 saturated 24061.56 busy 0.00
Di 23 Mai 2017 10:27:27 CEST | GPUGRID | [work_fetch] share 0.000 account manager prefs
Di 23 Mai 2017 10:27:27 CEST | | [work_fetch] --- state for NVIDIA GPU ---
Di 23 Mai 2017 10:27:27 CEST | | [work_fetch] shortfall 84914.93 nidle 0.00 saturated 1485.07 busy 0.00
Di 23 Mai 2017 10:27:27 CEST | GPUGRID | [work_fetch] share 1.000
Di 23 Mai 2017 10:27:27 CEST | | [work_fetch] ------- end work fetch state -------
Di 23 Mai 2017 10:27:27 CEST | GPUGRID | [work_fetch] set_request() for NVIDIA GPU: ninst 1 nused_total 0.00 nidle_now 0.00 fetch share 1.00 req_inst 1.00 req_secs 84914.93
Di 23 Mai 2017 10:27:27 CEST | GPUGRID | [work_fetch] request: CPU (0.00 sec, 0.00 inst) NVIDIA GPU (84914.93 sec, 1.00 inst)
Di 23 Mai 2017 10:27:27 CEST | GPUGRID | Sending scheduler request: Requested by user.
Di 23 Mai 2017 10:27:27 CEST | GPUGRID | Requesting new tasks for NVIDIA GPU
Di 23 Mai 2017 10:27:29 CEST | GPUGRID | Scheduler request completed: got 0 new tasks
Di 23 Mai 2017 10:27:29 CEST | GPUGRID | No tasks sent
Di 23 Mai 2017 10:27:29 CEST | | [work_fetch] Request work fetch: RPC complete
Di 23 Mai 2017 10:27:34 CEST | | [work_fetch] ------- start work fetch state -------
Di 23 Mai 2017 10:27:34 CEST | | [work_fetch] target work buffer: 86400.00 + 0.00 sec
Di 23 Mai 2017 10:27:34 CEST | | [work_fetch] --- project states ---
Di 23 Mai 2017 10:27:34 CEST | GPUGRID | [work_fetch] REC 6457.612 prio 0.000 can't request work: scheduler RPC backoff (25.93 sec)
Di 23 Mai 2017 10:27:34 CEST | | [work_fetch] --- state for CPU ---
Di 23 Mai 2017 10:27:34 CEST | | [work_fetch] shortfall 299601.76 nidle 0.00 saturated 24047.75 busy 0.00
Di 23 Mai 2017 10:27:34 CEST | GPUGRID | [work_fetch] share 0.000 account manager prefs
Di 23 Mai 2017 10:27:34 CEST | | [work_fetch] --- state for NVIDIA GPU ---
Di 23 Mai 2017 10:27:34 CEST | | [work_fetch] shortfall 84921.89 nidle 0.00 saturated 1478.11 busy 0.00
Di 23 Mai 2017 10:27:34 CEST | GPUGRID | [work_fetch] share 0.000
Di 23 Mai 2017 10:27:34 CEST | | [work_fetch] ------- end work fetch state -------
Di 23 Mai 2017 10:27:34 CEST | | [work_fetch] No project chosen for work fetch
Di 23 Mai 2017 10:27:45 CEST | | Contacting account manager at https://bam.boincstats.com/
Di 23 Mai 2017 10:27:47 CEST | | Account manager: BAM! User: 3739, Michael H.W. Weber
Di 23 Mai 2017 10:27:47 CEST | | Account manager: BAM! Host: 653689
Di 23 Mai 2017 10:27:47 CEST | | Account manager: Number of BAM! connections for this host: 7467
Di 23 Mai 2017 10:27:47 CEST | | Account manager contact succeeded
Di 23 Mai 2017 10:28:00 CEST | | [work_fetch] Request work fetch: Backoff ended for GPUGRID
Di 23 Mai 2017 10:28:04 CEST | | [work_fetch] ------- start work fetch state -------
Di 23 Mai 2017 10:28:04 CEST | | [work_fetch] target work buffer: 86400.00 + 0.00 sec
Di 23 Mai 2017 10:28:04 CEST | | [work_fetch] --- project states ---
Di 23 Mai 2017 10:28:04 CEST | GPUGRID | [work_fetch] REC 6457.612 prio -1.000 can request work
Di 23 Mai 2017 10:28:04 CEST | | [work_fetch] --- state for CPU ---
Di 23 Mai 2017 10:28:04 CEST | | [work_fetch] shortfall 299801.40 nidle 0.00 saturated 23987.94 busy 0.00
Di 23 Mai 2017 10:28:04 CEST | GPUGRID | [work_fetch] share 0.000 account manager prefs
Di 23 Mai 2017 10:28:04 CEST | | [work_fetch] --- state for NVIDIA GPU ---
Di 23 Mai 2017 10:28:04 CEST | | [work_fetch] shortfall 84953.52 nidle 0.00 saturated 1446.48 busy 0.00
Di 23 Mai 2017 10:28:04 CEST | GPUGRID | [work_fetch] share 1.000
Di 23 Mai 2017 10:28:04 CEST | | [work_fetch] ------- end work fetch state -------
Di 23 Mai 2017 10:28:04 CEST | GPUGRID | [work_fetch] set_request() for NVIDIA GPU: ninst 1 nused_total 0.00 nidle_now 0.00 fetch share 1.00 req_inst 1.00 req_secs 84953.52
Di 23 Mai 2017 10:28:04 CEST | GPUGRID | [work_fetch] request: CPU (0.00 sec, 0.00 inst) NVIDIA GPU (84953.52 sec, 1.00 inst)
Di 23 Mai 2017 10:28:04 CEST | GPUGRID | Sending scheduler request: To fetch work.
Di 23 Mai 2017 10:28:04 CEST | GPUGRID | Requesting new tasks for NVIDIA GPU
Di 23 Mai 2017 10:28:05 CEST | GPUGRID | Scheduler request completed: got 0 new tasks
Di 23 Mai 2017 10:28:05 CEST | GPUGRID | No tasks sent
Di 23 Mai 2017 10:28:05 CEST | GPUGRID | [work_fetch] backing off NVIDIA GPU 740 sec
Di 23 Mai 2017 10:28:05 CEST | | [work_fetch] Request work fetch: RPC complete
Di 23 Mai 2017 10:28:10 CEST | | [work_fetch] ------- start work fetch state -------
Di 23 Mai 2017 10:28:10 CEST | | [work_fetch] target work buffer: 86400.00 + 0.00 sec
Di 23 Mai 2017 10:28:10 CEST | | [work_fetch] --- project states ---
Di 23 Mai 2017 10:28:10 CEST | GPUGRID | [work_fetch] REC 6457.612 prio 0.000 can't request work: scheduler RPC backoff (25.92 sec)
Di 23 Mai 2017 10:28:10 CEST | | [work_fetch] --- state for CPU ---
Di 23 Mai 2017 10:28:10 CEST | | [work_fetch] shortfall 299843.54 nidle 0.00 saturated 23976.43 busy 0.00
Di 23 Mai 2017 10:28:10 CEST | GPUGRID | [work_fetch] share 0.000 account manager prefs
Di 23 Mai 2017 10:28:10 CEST | | [work_fetch] --- state for NVIDIA GPU ---
Di 23 Mai 2017 10:28:10 CEST | | [work_fetch] shortfall 84959.51 nidle 0.00 saturated 1440.49 busy 0.00
Di 23 Mai 2017 10:28:10 CEST | GPUGRID | [work_fetch] share 0.000 project is backed off (resource backoff: 734.50, inc 600.00)
Di 23 Mai 2017 10:28:10 CEST | | [work_fetch] ------- end work fetch state -------
Di 23 Mai 2017 10:28:10 CEST | | [work_fetch] No project chosen for work fetch
Di 23 Mai 2017 10:28:37 CEST | | [work_fetch] Request work fetch: Backoff ended for GPUGRID
Di 23 Mai 2017 10:28:41 CEST | | [work_fetch] ------- start work fetch state -------
Di 23 Mai 2017 10:28:41 CEST | | [work_fetch] target work buffer: 86400.00 + 0.00 sec
Di 23 Mai 2017 10:28:41 CEST | | [work_fetch] --- project states ---
Di 23 Mai 2017 10:28:41 CEST | GPUGRID | [work_fetch] REC 6457.298 prio -1.000 can request work
Di 23 Mai 2017 10:28:41 CEST | | [work_fetch] --- state for CPU ---
Di 23 Mai 2017 10:28:41 CEST | | [work_fetch] shortfall 300039.40 nidle 0.00 saturated 23916.62 busy 0.00
Di 23 Mai 2017 10:28:41 CEST | GPUGRID | [work_fetch] share 0.000 account manager prefs
Di 23 Mai 2017 10:28:41 CEST | | [work_fetch] --- state for NVIDIA GPU ---
Di 23 Mai 2017 10:28:41 CEST | | [work_fetch] shortfall 84989.53 nidle 0.00 saturated 1410.47 busy 0.00
Di 23 Mai 2017 10:28:41 CEST | GPUGRID | [work_fetch] share 0.000 project is backed off (resource backoff: 704.19, inc 600.00)
Di 23 Mai 2017 10:28:41 CEST | | [work_fetch] ------- end work fetch state -------
Di 23 Mai 2017 10:28:41 CEST | | [work_fetch] No project chosen for work fetch

Remember: All GPUGRID projects are chosen for this machine.

Michael.

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2346
Credit: 16,293,065,968
RAC: 7,113,235
Message 47318 - Posted: 25 May 2017 | 12:35:09 UTC - in response to Message 47313.

Thank you for posting the details, but they didn't make me any wiser.

GPUGrid was in resource backoff (i.e. the client won't ask it for work), but the backoff ended at 10:28:00:

Di 23 Mai 2017 10:27:45 CEST | | Contacting account manager at https://bam.boincstats.com/
Di 23 Mai 2017 10:27:47 CEST | | Account manager: BAM! User: 3739, Michael H.W. Weber
Di 23 Mai 2017 10:27:47 CEST | | Account manager: BAM! Host: 653689
Di 23 Mai 2017 10:27:47 CEST | | Account manager: Number of BAM! connections for this host: 7467
Di 23 Mai 2017 10:27:47 CEST | | Account manager contact succeeded
Di 23 Mai 2017 10:28:00 CEST | | [work_fetch] Request work fetch: Backoff ended for GPUGRID
Di 23 Mai 2017 10:28:04 CEST | | [work_fetch] ------- start work fetch state -------
Di 23 Mai 2017 10:28:04 CEST | | [work_fetch] target work buffer: 86400.00 + 0.00 sec
Di 23 Mai 2017 10:28:04 CEST | | [work_fetch] --- project states ---
Di 23 Mai 2017 10:28:04 CEST | GPUGRID | [work_fetch] REC 6457.612 prio -1.000 can request work
Di 23 Mai 2017 10:28:04 CEST | | [work_fetch] --- state for CPU ---
Di 23 Mai 2017 10:28:04 CEST | | [work_fetch] shortfall 299801.40 nidle 0.00 saturated 23987.94 busy 0.00
Di 23 Mai 2017 10:28:04 CEST | GPUGRID | [work_fetch] share 0.000 account manager prefs

Di 23 Mai 2017 10:28:04 CEST | | [work_fetch] --- state for NVIDIA GPU ---
Di 23 Mai 2017 10:28:04 CEST | | [work_fetch] shortfall 84953.52 nidle 0.00 saturated 1446.48 busy 0.00
Di 23 Mai 2017 10:28:04 CEST | GPUGRID | [work_fetch] share 1.000

Di 23 Mai 2017 10:28:04 CEST | | [work_fetch] ------- end work fetch state -------
Di 23 Mai 2017 10:28:04 CEST | GPUGRID | [work_fetch] set_request() for NVIDIA GPU: ninst 1 nused_total 0.00 nidle_now 0.00 fetch share 1.00 req_inst 1.00 req_secs 84953.52
Di 23 Mai 2017 10:28:04 CEST | GPUGRID | [work_fetch] request: CPU (0.00 sec, 0.00 inst) NVIDIA GPU (84953.52 sec, 1.00 inst)
Di 23 Mai 2017 10:28:04 CEST | GPUGRID | Sending scheduler request: To fetch work.
Di 23 Mai 2017 10:28:04 CEST | GPUGRID | Requesting new tasks for NVIDIA GPU
Di 23 Mai 2017 10:28:05 CEST | GPUGRID | Scheduler request completed: got 0 new tasks
Di 23 Mai 2017 10:28:05 CEST | GPUGRID | No tasks sent
Di 23 Mai 2017 10:28:05 CEST | GPUGRID | [work_fetch] backing off NVIDIA GPU 740 sec


The BOINC manager asked for 84953.52 seconds of work for 1 NVIDIA GPU instance, but the project did not send any tasks (even though there was work in the queue).

Another project is using your GPU, so GPUGrid should be able to as well.
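The reading above (request sent, zero tasks returned) can be checked mechanically over a longer stretch of the event log. A rough Python sketch, assuming the event-log line formats shown in the log above; the regexes are written against those lines only, and the function name is illustrative:

```python
import re

# Patterns written against the event-log lines posted in this thread (BOINC 7.6.x).
REQUEST = re.compile(
    r"\| (?P<project>[^|]+?) \| \[work_fetch\] request: .*NVIDIA GPU \((?P<secs>[\d.]+) sec"
)
COMPLETED = re.compile(
    r"\| (?P<project>[^|]+?) \| Scheduler request completed: got (?P<n>\d+) new tasks"
)


def gpu_requests_vs_replies(lines, project="GPUGRID"):
    """Pair each NVIDIA GPU work request for `project` with the number of tasks
    the scheduler returned for it. Returns a list of (requested_seconds, tasks)."""
    pending = None
    pairs = []
    for line in lines:
        m = REQUEST.search(line)
        if m and m["project"] == project:
            pending = float(m["secs"])
            continue
        m = COMPLETED.search(line)
        if m and m["project"] == project and pending is not None:
            pairs.append((pending, int(m["n"])))
            pending = None
    return pairs
```

Fed the log above, it should yield two pairs, (84914.93, 0) and (84953.52, 0): the client did ask for GPU work both times, and the server answered with nothing, which points at the server side of the exchange.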

> Remember: All GPUGRID projects are chosen for this machine.

That's ok.
Do you have the "Use NVidia GPU" and the "Use Graphics Processing Unit (GPU) if available" selected in GPUGrid preferences?
Do you have at least 8 GB of disk space on the partition where the BOINC data directory resides?
How many other GPU projects is this host attached to?

You could try to increase the work buffer (it is set to 1 day now) for testing.
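As a worked check on what the buffer setting does, the NVIDIA GPU numbers in the log above fit a simple relation. This is an observation from this particular log (one GPU instance), not a statement of BOINC internals: the shortfall is the target buffer minus the time the GPU is already covered ("saturated"), and the amount requested equals the shortfall.

```latex
\begin{aligned}
T_{\text{buf}} &= 86400.00 + 0.00~\text{s} = 1~\text{day} \\
\text{shortfall} &= T_{\text{buf}} - t_{\text{sat}} = 86400.00 - 1446.48 = 84953.52~\text{s} \\
\text{req\_secs} &= \text{shortfall} = 84953.52~\text{s}
\end{aligned}
```

The other snapshots in the log satisfy the same sum (e.g. 84909.97 + 1490.03 = 86400.00). So a larger buffer should mainly raise the number of seconds requested; while the server keeps answering "No tasks sent", it is unlikely to change the outcome.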

If nothing works try the following:
1. detach this host from all projects
2. uninstall BOINC manager
3. restart your host
4. install BOINC manager
5. attach this host only to GPUGrid, and test it
6. attach this host to other projects one by one, only one per day.
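Steps 1 and 5 can also be done from a terminal with `boinccmd`. The Python sketch below only builds the command lines (a dry run) rather than executing them; the GPUGRID master URL and the account-key placeholder are assumptions to replace with the values from your own setup. Steps 2 to 4 (uninstall, restart, reinstall) stay manual and go between the two calls. Since the event log above shows this host syncing with the BAM! account manager, a project detached locally may be re-attached at the next BAM! sync, so it is probably worth detaching it in BAM! as well.

```python
import shlex

# ASSUMPTION: GPUGRID master URL; compare with `boinccmd --get_project_status`.
GPUGRID_URL = "https://www.gpugrid.net/"


def detach_commands(attached_urls):
    """Step 1: one detach call per currently attached project."""
    return [["boinccmd", "--project", url, "detach"] for url in attached_urls]


def attach_gpugrid_commands(account_key, gpugrid_url=GPUGRID_URL):
    """Step 5: attach only GPUGRID, then ask the client to contact its scheduler."""
    return [
        ["boinccmd", "--project_attach", gpugrid_url, account_key],
        ["boinccmd", "--project", gpugrid_url, "update"],
    ]


if __name__ == "__main__":
    # Hypothetical inputs: the second URL and the key are placeholders.
    for cmd in detach_commands([GPUGRID_URL, "https://example.org/other_project/"]):
        print(shlex.join(cmd))
    for cmd in attach_gpugrid_commands("YOUR_ACCOUNT_KEY"):
        print(shlex.join(cmd))
```

Printing instead of running keeps the sketch safe to try; each printed line can then be pasted into a shell once you have checked it.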

Michael H.W. Weber
Joined: 9 Feb 16
Posts: 71
Credit: 607,916,391
RAC: 0
Message 47321 - Posted: 25 May 2017 | 20:05:34 UTC - in response to Message 47318.

> Do you have the "Use NVidia GPU" and the "Use Graphics Processing Unit (GPU) if available" selected in GPUGrid preferences?

Yes.

> Do you have at least 8 GB of disk space on the partition where the BOINC data directory resides?

Will check, but would quite certainly say yes.

> How many other GPU projects is this host attached to?

Just PrimeGrid and GPUGRID, but even if I suspend PrimeGrid, the machine won't fetch work from GPUGRID.

> You could try to increase the work buffer (it is set to 1 day now) for testing.

I did, but it won't change the situation even if I set the work buffer to 10/10 days.

Michael.

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1620
Credit: 8,925,538,316
RAC: 18,542,176
Message 47322 - Posted: 25 May 2017 | 22:17:34 UTC - in response to Message 47321.

Getting new work is a two-part collaboration between your computer and the project server.

The first necessary condition is that your computer requests new work. The work fetch log has confirmed that your machine is requesting work for your NVidia GPU - job done. You can turn that logging off now, and save some disk space and processing cycles.

The second necessary condition is that the server responds by allocating new work - which it isn't. The question is - why not?

One more to check - are you allowing 'ACEMD long runs' (project preferences)? Short run jobs are as rare as hen's teeth these days.

After that, it's a question of verifying that your GPU's 'compute capability' and graphics driver together match the current minimum project requirements.
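To make this second condition concrete, it amounts to comparing the compute capability and driver version from the client's GPU report with the project's minimums. A small Python sketch follows. The minimum values in the usage example are placeholders, since this thread does not state GPUGRID's actual requirements, and the real decision is made by the server, so a local check like this can only rule the client side in or out.

```python
from typing import NamedTuple, Tuple


class Requirement(NamedTuple):
    min_compute_capability: Tuple[int, int]
    min_driver: Tuple[int, ...]


def parse_version(text: str) -> Tuple[int, ...]:
    """'375.39' -> (375, 39), so versions compare numerically, not as strings."""
    return tuple(int(part) for part in text.split("."))


def meets_requirement(compute_capability: Tuple[int, int], driver: str, req: Requirement) -> bool:
    """True if both the compute capability and the driver version reach the minimums."""
    return (compute_capability >= req.min_compute_capability
            and parse_version(driver) >= req.min_driver)


if __name__ == "__main__":
    # PLACEHOLDER minimums for illustration only; not GPUGRID's real values.
    example = Requirement(min_compute_capability=(3, 0), min_driver=(367, 0))
    # GTX 770 values from the GPU report posted earlier in this thread.
    print(meets_requirement((3, 0), "375.39", example))
```

Versions are compared as integer tuples because a plain string comparison would rank "375.9" above "375.39".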

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2346
Credit: 16,293,065,968
RAC: 7,113,235
Message 47323 - Posted: 25 May 2017 | 23:13:48 UTC - in response to Message 47322.
Last modified: 25 May 2017 | 23:41:34 UTC

After that, it's a question of verifying that your GPU's 'compute capability' and graphics driver together match the current minimum project requirements.
It's been done. Moreover, this host has already completed a single CUDA 8.0 task successfully, but the project hasn't sent it any more.

I experienced the same behavior once when I was trying my GTX 1080 under Linux. At the time I thought I'd messed something up while trying to get SWAN_SYNC working under Linux (which I couldn't). That host stopped receiving work, even though GPUGrid was the only project on it the whole time. Then I installed Windows 10 on the same hardware, and it received work, and it didn't stop receiving work after I turned SWAN_SYNC on.

Profile Michael H.W. Weber
Joined: 9 Feb 16
Posts: 71
Credit: 607,916,391
RAC: 0
Level
Lys
Message 47324 - Posted: 26 May 2017 | 3:38:05 UTC - in response to Message 47322.

One more to check - are you allowing 'ACEMD long runs' (project preferences)?

As said above: Yes, ALL GPUGRID subprojects are allowed for this machine.

Michael.

Jim1348
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Message 47325 - Posted: 26 May 2017 | 7:56:13 UTC - in response to Message 47324.

I would detach from everything and reattach only to GPUGrid as Retvari suggests. If that doesn't work, you have found the one hardware/software configuration that just doesn't obey the rules as we know them. It happens as you know.

Profile Michael H.W. Weber
Joined: 9 Feb 16
Posts: 71
Credit: 607,916,391
RAC: 0
Level
Lys
Message 47326 - Posted: 26 May 2017 | 12:58:31 UTC - in response to Message 47325.
Last modified: 26 May 2017 | 13:00:36 UTC

I would detach from everything and reattach only to GPUGrid as Retvari suggests.

I detached from GPUGRID, rebooted the system and re-attached to GPUGRID. No improvement.

If that doesn't work, you have found the one hardware/software configuration that just doesn't obey the rules as we know them. It happens as you know.

No, I neither know nor accept that. This is science, not homeopathy (although homeopathy at least offers the placebo effect, for those of us who are believers, which in principle can't be ruled out as something scientifically accessible, too, although we still haven't found a clue how that might be possible).

Michael.

Jim1348
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Message 47327 - Posted: 26 May 2017 | 13:51:02 UTC - in response to Message 47326.
Last modified: 26 May 2017 | 13:58:49 UTC

I am not sure you followed the instructions. Homeopathy is your idea, not mine.

However, if you're keen to spend more time on it, I would try earlier drivers (still CUDA 8). Nvidia may have introduced problems in the later ones; I have occasionally seen that happen on other projects.

Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Level
His
Message 47328 - Posted: 27 May 2017 | 1:55:26 UTC
Last modified: 27 May 2017 | 1:55:47 UTC

It's frustrating that the server doesn't give more details in its reply.

I think your problem can only be investigated further by the project admins, who really should throw us more bones in the server replies in the Event Log, to further explain WHY tasks were not given.

Betting Slip
Joined: 5 Jan 09
Posts: 670
Credit: 2,498,095,550
RAC: 0
Level
Phe
Message 47329 - Posted: 27 May 2017 | 7:11:49 UTC - in response to Message 47326.

I would detach from everything and reattach only to GPUGrid as Retvari suggests.

I detached from GPUGRID, rebooted the system and re-attached to GPUGRID. No improvement.

If that doesn't work, you have found the one hardware/software configuration that just doesn't obey the rules as we know them. It happens as you know.

No, I neither know nor accept that. This is science, not homeopathy (although homeopathy at least offers the placebo effect, for those of us who are believers, which in principle can't be ruled out as something scientifically accessible, too, although we still haven't found a clue how that might be possible).

Michael.


Are you sure it's not your work buffer or some other config in BOINC?

Profile Michael H.W. Weber
Joined: 9 Feb 16
Posts: 71
Credit: 607,916,391
RAC: 0
Level
Lys
Message 47330 - Posted: 27 May 2017 | 13:32:39 UTC - in response to Message 47329.

Are you sure it's not your work buffer or some other config in BOINC?

Yes, I am sure about that.
This machine simply stopped receiving work from one day to the next, without me having altered any of the BOINC or project settings.

Michael.

mindcrime
Joined: 27 Feb 14
Posts: 4
Credit: 121,376,887
RAC: 0
Level
Cys
Message 47332 - Posted: 27 May 2017 | 21:20:22 UTC

I have two 750 Tis on different machines at different physical locations, using the same BOINC and GPUGrid settings/prefs. I noticed one was getting GPUGrid work and the other wasn't. After a couple of days without work, I began to investigate. I found this thread, did some reading, and noticed the driver difference between the two. I updated the driver to the newest (382.33) and got work.

TL;DR update your drivers.

Profile Michael H.W. Weber
Joined: 9 Feb 16
Posts: 71
Credit: 607,916,391
RAC: 0
Level
Lys
Message 47333 - Posted: 28 May 2017 | 5:21:58 UTC - in response to Message 47323.

Moreover this host has already successfully completed a single CUDA8.0 task...

How did you actually find out about that?
I couldn't see any of the completed WUs in my client's history even before I started this thread.

Michael.

Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Level
His
Message 47334 - Posted: 28 May 2017 | 6:46:18 UTC

The first post has a link to the host. On that host's page, click Application Details to see the application details for that host.

Profile Michael H.W. Weber
Joined: 9 Feb 16
Posts: 71
Credit: 607,916,391
RAC: 0
Level
Lys
Message 47335 - Posted: 28 May 2017 | 10:54:17 UTC - in response to Message 47334.

First post has a link to a host. Then on that page, you can click Application Details, to see application details for that host.

Indeed. Never checked that link.
One question though: Why aren't all the tasks completed using CUDA 6.5 indicated as valid (although they all were valid)?

Michael.

Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Level
His
Message 47336 - Posted: 28 May 2017 | 11:14:07 UTC - in response to Message 47335.
Last modified: 28 May 2017 | 11:14:33 UTC

First post has a link to a host. Then on that page, you can click Application Details, to see application details for that host.

Indeed. Never checked that link.
One question though: Why aren't all the tasks completed using CUDA 6.5 indicated as valid (although they all were valid)?

Michael.


Your assumption, that they were all valid, seems invalid :)

From my experience, if a task is suspended+resumed, or stopped+resumed, then it has a chance of being invalid, even if you watched it complete without error. Something in the validator must not like the output, sometimes, when those scenarios happen.

Getting back on topic, I'm sure that GPUGrid changed their logic for deciding when to give hosts work, and I'm fairly certain that "driver version detected" plays a part in those criteria. I wonder if they screwed something up in the app version criteria for 700-series GPUs on Linux?

Also, can you see if you can upgrade your driver (I looked briefly and there might be a minor update available to you).

Profile Michael H.W. Weber
Joined: 9 Feb 16
Posts: 71
Credit: 607,916,391
RAC: 0
Level
Lys
Message 47338 - Posted: 29 May 2017 | 12:56:23 UTC - in response to Message 47336.
Last modified: 29 May 2017 | 13:02:57 UTC

Your assumption, that they were all valid, seems invalid :)

Not really. They generated some 73 million credits, so at least a few of them should have been OK. :)

The point is that not a single valid task is listed (and no invalid ones either).

Getting back on topic, I'm sure that GPUGrid changed their logic for deciding when to give hosts work, and I'm fairly certain that "driver version detected" plays a part in those criteria. I wonder if they screwed something up in the app version criteria for 700-series GPUs on Linux?

Also, can you see if you can upgrade your driver (I looked briefly and there might be a minor update available to you).

Two things:

(1) The proprietary NVIDIA driver is updated from time to time by Ubuntu's auto-update. I don't like to change this manually, since everything except GPUGRID works perfectly.
(2) This GTX 770 machine uses the same driver as my GTX 970 machine. The latter receives tasks daily, the former doesn't. So I don't really see why the current driver should be the problem, especially since, as stated above, even the GTX 770 completed a CUDA 8 WU successfully.

But why should I care?
It is not my project, and the GTX 770 will now contribute to another project until the GPUGRID team decides to do something to keep or increase their number of contributors.
I find it rather strange that, if I understand correctly, this topic has so far been discussed exclusively by volunteers. Thank you guys, I think you did your best.

Michael.

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1620
Credit: 8,925,538,316
RAC: 18,542,176
Level
Tyr
Message 47339 - Posted: 29 May 2017 | 13:53:00 UTC - in response to Message 47338.

The point is that not a single valid task is listed (and none invalid, too).

Don't worry about that. Task data is kept in a short-term 'transactional' database, and purged (to save space and processing time) when no longer needed - usually after 10 days or so.

The important scientific data is transferred to a long-term scientific database and kept indefinitely.

But from the same 'application details' link for your machine, we can see for cuda65 (long tasks):

Number of tasks completed 249
Max tasks per day 1
Number of tasks today 0
Consecutive valid tasks 0

'Max tasks per day' and 'consecutive valid tasks' together imply that your machine produced a considerable number of invalid tasks at some point: no shame in that, we all did the same thing when the cuda65 licence expired, but it shows the sort of inferences you can draw.
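Richard's inference can be illustrated with a small simulation. This is a sketch under an assumed update rule, not the actual BOINC or GPUGRID server code: the rule described in the docstring, the starting quota and `QUOTA_CAP` are all hypothetical. It shows how a long enough run of failed or invalid tasks by itself leaves exactly the numbers listed above, 'Max tasks per day 1' and 'Consecutive valid tasks 0', no matter how many tasks were completed before.

```python
from typing import Iterable, Tuple

# Hypothetical per-day ceiling; a real project configures its own value on the server.
QUOTA_CAP = 50


def apply_outcome(quota: int, consecutive_valid: int, outcome: str) -> Tuple[int, int]:
    """Update (max tasks per day, consecutive valid tasks) for one returned task.

    Assumed rule, for illustration only: a valid result extends the streak and doubles
    the quota up to QUOTA_CAP; an error or invalid result resets the streak to 0 and
    lowers the quota by one, but never below 1.
    """
    if outcome == "valid":
        return min(QUOTA_CAP, quota * 2), consecutive_valid + 1
    if outcome in ("error", "invalid"):
        return max(1, quota - 1), 0
    raise ValueError(f"unknown outcome: {outcome!r}")


def replay(outcomes: Iterable[str], quota: int = QUOTA_CAP, consecutive_valid: int = 0) -> Tuple[int, int]:
    """Run a sequence of task outcomes through apply_outcome, starting from a full quota."""
    for outcome in outcomes:
        quota, consecutive_valid = apply_outcome(quota, consecutive_valid, outcome)
    return quota, consecutive_valid


# 200 good tasks followed by a long run of failures, e.g. after an app version stops working.
print(replay(["valid"] * 200 + ["error"] * 60))  # (1, 0)
```

Under this assumed rule, a single valid result afterwards would lift the quota back to 2 and the streak to 1, so an app version stuck at (1, 0) has not returned a valid task since its failures.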

Two things:

(1) The proprietary NVIDIA driver is updated from time to time by Ubuntu's auto-update. I don't like to change this manually, since everything except GPUGRID works perfectly.
...

I'm not a Linux user, but I have read comments that Linux GPU drivers tend to be compiled against a specific Linux kernel. If your kernel also auto-updates, you may need to take precautions to ensure that your kernel and driver updates are kept in sync.
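On Linux, one quick way to see whether the running kernel and the NVIDIA driver are still in step after an automatic update is to check whether the driver's kernel module is loaded at all. A minimal Python sketch, assuming the proprietary NVIDIA driver, which exposes /proc/driver/nvidia/version while its module is loaded; the helper names are hypothetical. If the module was not rebuilt for a newly installed kernel, it cannot load and that file is absent. This does not by itself prove a mismatch, but it is a cheap first check.

```python
import platform
import re
from pathlib import Path
from typing import Optional

# Exposed by the proprietary NVIDIA driver only while its kernel module is loaded.
NVIDIA_VERSION_FILE = Path("/proc/driver/nvidia/version")


def parse_module_version(text: str) -> Optional[str]:
    """Extract the kernel-module version (e.g. '375.66') from the contents of that file."""
    match = re.search(r"Kernel Module\s+(\d+(?:\.\d+)+)", text)
    return match.group(1) if match else None


def driver_report() -> str:
    """Describe the running kernel and the NVIDIA kernel module it has loaded, if any."""
    kernel = platform.release()
    if not NVIDIA_VERSION_FILE.exists():
        return f"kernel {kernel}: NVIDIA kernel module not loaded"
    version = parse_module_version(NVIDIA_VERSION_FILE.read_text())
    return f"kernel {kernel}: NVIDIA kernel module {version or 'unknown'}"


if __name__ == "__main__":
    print(driver_report())
```

Running it after each system update and comparing the output with the driver version the package manager installed would show whether the new kernel actually picked up the new module.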

Profile Michael H.W. Weber
Joined: 9 Feb 16
Posts: 71
Credit: 607,916,391
RAC: 0
Level
Lys
Message 47342 - Posted: 30 May 2017 | 10:28:50 UTC - in response to Message 47339.
Last modified: 30 May 2017 | 10:36:55 UTC

But from the same 'application details' link for your machine, we can see for cuda65 (long tasks):

Number of tasks completed 249
Max tasks per day 1
Number of tasks today 0
Consecutive valid tasks 0

'Max tasks per day' and 'consecutive valid tasks' together imply that your machine produced a considerable number of invalid tasks at some point: no shame in that, we all did the same thing when the cuda65 licence expired, but it shows the sort of inferences you can draw.

Hm, I don't understand how that conclusion can be drawn. The GPUGRID server limits the number of tasks to two per day anyway, and the GTX 770 mostly got long runs, so it could rarely complete more than one per day. Moreover, I checked the machine and its output virtually every day throughout 2016 and early 2017. I rarely saw an invalid task, and when one occurred, I had caused it myself by accidentally updating the system, including the NVIDIA drivers, while DC work was running.
I must confess, though, that around the time when GPUGRID stopped sending tasks to my system, I had not checked regularly for probably a few weeks.

IF there had been many, many consecutive errors at the transition from CUDA 6.5 to 8.0, wouldn't it be possible that some flag is stored somewhere, locally on my machine or on the GPUGRID server, that marks my system as permanently unreliable? And that this flag somehow has not yet been removed and now causes WUs not to be sent? Hm, probably not the case either, as it completed a CUDA 8.0 task...


I'm not a Linux user, but I have read comments that Linux GPU drivers tend to be compiled against a specific Linux kernel. If your kernel also auto-updates, you may need to take precautions to ensure that your kernel and driver updates are kept in sync.

See, that is exactly why I am hesitant to install a more recent NVIDIA driver manually: when you use the console to update the whole system, everything is brought up to date in a coordinated (!) way. A new kernel is delivered together with the matching, tested GPU driver.

For now, I will just wait and see whether GPUGRID starts sending WUs to my GTX 770 again after a future system update with more recent drivers than the ones I currently use. Until then, I will support other DC projects.

Michael.

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2346
Credit: 16,293,065,968
RAC: 7,113,235
Level
Trp
Message 47345 - Posted: 30 May 2017 | 20:04:07 UTC - in response to Message 47342.
Last modified: 30 May 2017 | 20:05:15 UTC

IF there had been many, many consecutive errors at the transition from CUDA 6.5 to 8.0, wouldn't it be possible that some flag is stored somewhere, locally on my machine or on the GPUGRID server, that marks my system as permanently unreliable? And that this flag somehow has not yet been removed and now causes WUs not to be sent? Hm, probably not the case either, as it completed a CUDA 8.0 task...
This came to my mind too. Perhaps you should try to force BOINC to request a new host ID for your host. You can do this by stopping BOINC, editing client_state.xml, searching for <hostid>342877</hostid>, replacing the number with the ID of one of your previous hosts (or a random number, if you don't have an older host), saving client_state.xml, and restarting BOINC.
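The edit Retvari describes could be scripted roughly as below. This is a minimal sketch, assuming the BOINC client is already stopped and that you pass the path to your own client_state.xml (on Ubuntu's boinc-client package it typically lives under /var/lib/boinc-client/, but check your installation; you may also need root permissions to write it). The function name `replace_hostid` is hypothetical. It keeps a backup copy and replaces only the exact `<hostid>342877</hostid>` element, leaving the rest of the file untouched.

```python
import shutil
from pathlib import Path


def replace_hostid(state_file: str, old_id: int, new_id: int) -> int:
    """Replace <hostid>old_id</hostid> with <hostid>new_id</hostid> in client_state.xml.

    Only run this while the BOINC client is stopped: a running client rewrites the
    file periodically and would overwrite the edit. Returns the number of replacements.
    """
    path = Path(state_file)
    text = path.read_text(encoding="utf-8")
    old_tag = f"<hostid>{old_id}</hostid>"
    count = text.count(old_tag)
    if count == 0:
        raise ValueError(f"{old_tag} not found in {path}")
    # Keep an untouched copy next to the original (client_state.xml.bak) before writing.
    shutil.copy2(path, path.with_name(path.name + ".bak"))
    path.write_text(text.replace(old_tag, f"<hostid>{new_id}</hostid>"), encoding="utf-8")
    return count
```

For this host the call would be `replace_hostid(path_to_state_file, 342877, previous_id)`, where `previous_id` is a placeholder for the older host ID (or random number) you choose; then start BOINC again. Plain text replacement is used instead of an XML parser so that the file's existing layout is preserved.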

Profile Michael H.W. Weber
Joined: 9 Feb 16
Posts: 71
Credit: 607,916,391
RAC: 0
Level
Lys
Message 47346 - Posted: 31 May 2017 | 8:27:39 UTC

Thanks for this suggestion.

A second idea of mine: the client reports a GPU memory of 1998 MB instead of the expected 2048 MB.
What is the minimum V-RAM which GPUGRID requires to send tasks, is this value stored somewhere in the BOINC system files and can it be modified without causing trouble?

Michael.

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1620
Credit: 8,925,538,316
RAC: 18,542,176
Level
Tyr
Message 47347 - Posted: 31 May 2017 | 9:24:24 UTC - in response to Message 47346.

The information sent out to our clients (on Windows, at least) suggests that even the current cuda80 applications require 512 MB of GPU RAM - which sounds too small, and is probably a left-over from earlier, simpler days.

The effective limit will be stored in the plan class specification on the project server, and is not visible to volunteers like us. Because the problem is that work is simply not being sent to your computer, there's nothing useful you can do locally to specify how your BOINC client should run the tasks you haven't got. [I did look at the <coproc> option in cc_config.xml, but that doesn't allow you to modify the declared memory size]

The only other possibility that comes to mind is to leverage the Anonymous platform mechanism: re-define the project's own application in an app_info.xml file. That bypasses the plan_class checking at the server, but it's tricky and time-consuming to get right, and you probably won't think it's worth the effort.

Profile Michael H.W. Weber
Joined: 9 Feb 16
Posts: 71
Credit: 607,916,391
RAC: 0
Level
Lys
Message 47351 - Posted: 1 Jun 2017 | 9:26:42 UTC

Ok, so today Ubuntu has released an NVIDIA driver update to version 375.66.
The problem of not getting WUs still persists.

Michael.

P.S.: To my knowledge, GPUs with 1 GB of VRAM or less haven't received any work for quite a while.

liderbug
Joined: 29 Jul 16
Posts: 24
Credit: 80,719,054
RAC: 0
Level
Thr
Message 47734 - Posted: 3 Aug 2017 | 14:11:58 UTC

I think the bottom line is this: GPUGrid changed servers ("we'll be right back"), and when they did, something changed. I was averaging approx. 500,000 credits per day before the change; now: "... got 0 tasks ...".

From what I can see, GPUGrid is a small group of PhDs whose focus is the critical thinking about how "all-atom biomolecular simulations" work, not how the work is accomplished. First you open the box and take the hardware out, set it on the bench, take the power cord and ... "But my simulation formula?".

Dear Management, there are hundreds of 'us' out here who live for unpacking the box, writing the code, designing a database. biomolecWHO? Not our thing, and you can have 100% of the credit. I'm (we're) happy as I fall asleep thinking "they used my code" (and I'm the only one who will ever know). <smile> zzzzzzz....

liderbug
Joined: 29 Jul 16
Posts: 24
Credit: 80,719,054
RAC: 0
Level
Thr
Message 47735 - Posted: 3 Aug 2017 | 16:20:46 UTC - in response to Message 47351.
Last modified: 3 Aug 2017 | 16:21:57 UTC

Michael H.W. Weber:
Fedora 26, 20 GB memory, GTX 770 with 4 GB RAM, CUDA 8.0, NVIDIA driver 384.59 = "got 0 tasks".

Betting Slip
Joined: 5 Jan 09
Posts: 670
Credit: 2,498,095,550
RAC: 0
Level
Phe
Message 47740 - Posted: 4 Aug 2017 | 21:41:19 UTC - in response to Message 47735.

If you can go back to a driver that supports CUDA 6.5, you should get the old app, as I do on my 660 Ti (compute capability 3.0).

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2346
Credit: 16,293,065,968
RAC: 7,113,235
Level
Trp
Message 47741 - Posted: 4 Aug 2017 | 22:11:42 UTC - in response to Message 47740.

If you can go back to a driver that supports CUDA 6.5, you should get the old app, as I do on my 660 Ti (compute capability 3.0).
This solution only works for Windows hosts, as there's no CUDA 6.5 application for Linux (see the applications page).

Betting Slip
Joined: 5 Jan 09
Posts: 670
Credit: 2,498,095,550
RAC: 0
Level
Phe
Message 47743 - Posted: 5 Aug 2017 | 7:16:05 UTC - in response to Message 47741.

If you can go back to a driver that supports CUDA 6.5, you should get the old app, as I do on my 660 Ti (compute capability 3.0).
This solution only works for Windows hosts, as there's no CUDA 6.5 application for Linux (see the applications page).


Aah, sorry.
