New CUDA65 beta app

Message boards : News : New CUDA65 beta app

Author	Message
MJH Project administrator Project developer Project scientist Joined: 12 Nov 07 Posts: 696 Credit: 27,266,655 RAC: 0	Message 38148 - Posted: 29 Sep 2014 | 9:49:51 UTC
	Dear all, please give the new acemdbeta app, ver 845, a workout. It now supports all GPUs. It's Windows-only; if you don't get WUs, you'll need to update your driver. Matt

eXaPower Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0	Message 38150 - Posted: 29 Sep 2014 | 10:02:06 UTC Last modified: 29 Sep 2014 | 10:10:54 UTC
	Matt, is the 343.98 driver accepted? I've been trying to get beta tasks:
	14/09/29 06:12:36 | GPUGRID | No tasks are available for ACEMD beta version
	I have the correct configuration (run test applications and the beta app checked, not accepting other short or long tasks). I never update to WHQL drivers because they are limited in certain functional areas, unlike beta or developer drivers.

MJH Project administrator Project developer Project scientist Joined: 12 Nov 07 Posts: 696 Credit: 27,266,655 RAC: 0	Message 38152 - Posted: 29 Sep 2014 | 10:23:10 UTC - in response to Message 38150. Last modified: 29 Sep 2014 | 10:26:35 UTC
	Huh, yes. You should be getting something... According to the logs, your host #159309 was given work at 12:15 CEST.

eXaPower Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0	Message 38153 - Posted: 29 Sep 2014 | 10:49:17 UTC - in response to Message 38152.
	14/09/29 06:48:02 | GPUGRID | No tasks are available for ACEMD beta version
	Strange, I see no beta tasks running in BOINC Manager. I just tried again. If the driver is accepted, I will keep trying. Thanks for the help.
	14/09/29 06:50:55 | GPUGRID | No tasks are available for ACEMD beta version

MJH Project administrator Project developer Project scientist Joined: 12 Nov 07 Posts: 696 Credit: 27,266,655 RAC: 0	Message 38154 - Posted: 29 Sep 2014 | 10:52:33 UTC - in response to Message 38153.
	Now you should get something.

eXaPower Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0	Message 38157 - Posted: 29 Sep 2014 | 11:03:29 UTC - in response to Message 38156. Last modified: 29 Sep 2014 | 11:33:21 UTC
	I did, indeed.
	Update: (unknown error) - exit code -97 (0xffffff9f) after 8 s: The simulation has become unstable. Terminating to avoid lock-up (1). This is the first time I've had this during my time at GPUGRID. GPU1 temp was 58C. If you don't mind errors, I will try again.
	Update 2: same error. GPU usage goes to 90% for a few seconds, then drops to 0%, and then the task crashes.

Bedrich Hajek Joined: 28 Mar 09 Posts: 468 Credit: 8,515,572,716 RAC: 11,566,841	Message 38158 - Posted: 29 Sep 2014 | 11:14:37 UTC
	On the first test unit, I got an error.
	9/29/2014 7:13:41 AM | GPUGRID | Computation for task 21-MJHARVEY_TEST4000-0-10-RND0794_0 finished
	9/29/2014 7:13:41 AM | GPUGRID | Output file 21-MJHARVEY_TEST4000-0-10-RND0794_0_1 for task 21-MJHARVEY_TEST4000-0-10-RND0794_0 absent
	9/29/2014 7:13:41 AM | GPUGRID | Output file 21-MJHARVEY_TEST4000-0-10-RND0794_0_2 for task 21-MJHARVEY_TEST4000-0-10-RND0794_0 absent
	9/29/2014 7:13:41 AM | GPUGRID | Output file 21-MJHARVEY_TEST4000-0-10-RND0794_0_3 for task 21-MJHARVEY_TEST4000-0-10-RND0794_0 absent
	Name: 21-MJHARVEY_TEST4000-0-10-RND0794_0
	Workunit: 10123268
	Created: 29 Sep 2014 | 9:50:11 UTC
	Sent: 29 Sep 2014 | 11:10:35 UTC
	Received: 29 Sep 2014 | 11:13:11 UTC
	Server state: Over
	Outcome: Computation error
	Client state: Compute error
	Exit status: -97 (0xffffffffffffff9f) Unknown error number
	Computer ID: 127986
	Report deadline: 4 Oct 2014 | 11:10:35 UTC
	Run time: 4.10
	CPU time: 3.48
	Validate state: Invalid
	Credit: 0.00
	Application version: ACEMD beta version v8.45 (cuda65)
	Stderr output:
	<core_client_version>7.2.42</core_client_version>
	<![CDATA[
	<message>
	(unknown error) - exit code -97 (0xffffff9f)
	</message>
	<stderr_txt>
	# GPU [GeForce GTX 690] Platform [Windows] Rev [3212] VERSION [65]
	# SWAN Device 2 :
	# Name : GeForce GTX 690
	# ECC : Disabled
	# Global mem : 2048MB
	# Capability : 3.0
	# PCI ID : 0000:07:00.0
	# Device clock : 1019MHz
	# Memory clock : 3004MHz
	# Memory width : 256bit
	# Driver version : r343_98 : 34411
	# GPU 0 : 67C # GPU 1 : 42C # GPU 2 : 69C # GPU 3 : 70C
	# The simulation has become unstable. Terminating to avoid lock-up (1)
	</stderr_txt>
	]]>
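The report above gives exit status -97 in two hexadecimal forms. Both are the same negative number: the stderr `<message>` prints it as a 32-bit two's-complement value, and the result page prints it as a 64-bit one. A minimal Python sketch of the conversion follows. The helper name `as_unsigned_hex` is hypothetical.

```python
def as_unsigned_hex(code: int, bits: int) -> str:
    """Return the two's-complement hex form of a (possibly negative) exit code."""
    mask = (1 << bits) - 1  # e.g. 0xffffffff for 32 bits
    return f"0x{code & mask:0{bits // 4}x}"  # zero-pad to bits/4 hex digits


# Exit status -97 as shown in the stderr message (32-bit) and on the result page (64-bit)
print(as_unsigned_hex(-97, 32))  # 0xffffff9f
print(as_unsigned_hex(-97, 64))  # 0xffffffffffffff9f
```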

eXaPower Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0	Message 38163 - Posted: 29 Sep 2014 | 12:46:15 UTC Last modified: 29 Sep 2014 | 13:30:12 UTC
	Update #3: I've received 5 beta tasks. All have failed, and two caused a system hang (no error files).
	FATAL : Cuda driver error 702 in file 'swanlibnv2.cpp' in line 1965
	Simulation unstable. Flag 11 value 1
	# The simulation has become unstable. Terminating to avoid lock-up
	# The simulation has become unstable. Terminating to avoid lock-up (2)
	Simulation unstable. Flag 11 value 1
	# The simulation has become unstable. Terminating to avoid lock-up
	# The simulation has become unstable. Terminating to avoid lock-up (2)
	Update #4: Still failing on both cards with the same error: (unknown error) - exit code -97 (0xffffff9f)
	http://www.gpugrid.net/workunit.php?wuid=10099983
	This work unit has 3 Linux failures (all with GTX 780) and 2 Win8.1 failures.
	Update #5: I received 5 more beta tasks, for a total of ten; all failed with the same error number. All tasks started fine (90+% GPU usage, 14% MCU), with progress in .016 intervals, before failing. Wingmen with a Tesla K20c/GTX 780 (CC 3.5), along with a CC 3.0 wingman, also failed.

MJH Project administrator Project developer Project scientist Joined: 12 Nov 07 Posts: 696 Credit: 27,266,655 RAC: 0	Message 38164 - Posted: 29 Sep 2014 | 12:50:59 UTC - in response to Message 38163.
	Yes, looks like CUDA65 is bad on everything but GM204s. Ho hum.

Jacob Klein Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0	Message 38166 - Posted: 29 Sep 2014 | 13:18:43 UTC Last modified: 29 Sep 2014 | 13:19:08 UTC
	-97 error here, on my GTX 460:
	# Simulation unstable. Flag 11 value 1
	# The simulation has become unstable. Terminating to avoid lock-up
	# The simulation has become unstable. Terminating to avoid lock-up (2)
	=========================
	http://www.gpugrid.net/result.php?resultid=13149151
	Name: 43-MJHARVEY_TEST1999-1-10-RND5744_2
	Workunit: 10123176
	Created: 29 Sep 2014 | 11:40:28 UTC
	Sent: 29 Sep 2014 | 12:54:10 UTC
	Received: 29 Sep 2014 | 13:17:01 UTC
	Server state: Over
	Outcome: Computation error
	Client state: Compute error
	Exit status: -97 (0xffffffffffffff9f) Unknown error number
	Computer ID: 153764
	Report deadline: 4 Oct 2014 | 12:54:10 UTC
	Run time: 2.56
	CPU time: 0.00
	Validate state: Invalid
	Credit: 0.00
	Application version: ACEMD beta version v8.45 (cuda65)
	Stderr output:
	<core_client_version>7.4.22</core_client_version>
	<![CDATA[
	<message>
	(unknown error) - exit code -97 (0xffffff9f)
	</message>
	<stderr_txt>
	# GPU [GeForce GTX 460] Platform [Windows] Rev [3212] VERSION [65]
	# SWAN Device 1 :
	# Name : GeForce GTX 460
	# ECC : Disabled
	# Global mem : 1024MB
	# Capability : 2.1
	# PCI ID : 0000:07:00.0
	# Device clock : 1526MHz
	# Memory clock : 1900MHz
	# Memory width : 256bit
	# Driver version : r343_98 : 34411
	# Simulation unstable. Flag 11 value 1
	# The simulation has become unstable. Terminating to avoid lock-up
	# The simulation has become unstable. Terminating to avoid lock-up (2)
	</stderr_txt>
	]]>

Jacob Klein Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0	Message 38169 - Posted: 29 Sep 2014 | 13:30:10 UTC
	Yikes, I'm seeing these same errors on the short-run queue; I guess the CUDA65 app has been deployed there too?

MJH Project administrator Project developer Project scientist Joined: 12 Nov 07 Posts: 696 Credit: 27,266,655 RAC: 0	Message 38171 - Posted: 29 Sep 2014 | 13:49:11 UTC - in response to Message 38169.
	It was on acemdshort briefly. It is no longer. Matt

eXaPower Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0	Message 38172 - Posted: 29 Sep 2014 | 14:10:59 UTC
	14/09/29 09:48:39 | GPUGRID | No tasks are available for ACEMD beta version
	Has the beta app been pulled for non-CC 5.2 cards?

MJH Project administrator Project developer Project scientist Joined: 12 Nov 07 Posts: 696 Credit: 27,266,655 RAC: 0	Message 38173 - Posted: 29 Sep 2014 | 14:39:10 UTC - in response to Message 38172.
	Has the beta app been pulled for non-CC 5.2 cards?
	Yes, it's served its purpose there. The CUDA65 build is broken on non-5.2 cards. Matt

MJH Project administrator Project developer Project scientist Joined: 12 Nov 07 Posts: 696 Credit: 27,266,655 RAC: 0	Message 38175 - Posted: 29 Sep 2014 | 17:37:26 UTC
	846 on acemdbeta now. CUDA65 for sm 3.0 and higher. Matt

Jacob Klein Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0	Message 38176 - Posted: 29 Sep 2014 | 18:20:38 UTC
	My 2 GTX 660 Tis, and my GTX 460, in my main rig, are now successfully simultaneously crunching 3 ACEMD beta version 8.46 (cuda65) tasks. Thank you!

eXaPower Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0	Message 38177 - Posted: 29 Sep 2014 | 18:27:58 UTC - in response to Message 38175.
	So far, so good: progress advances in .004% intervals, reaching 1.000% in four minutes, for an estimated 24,000 s to complete.
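eXaPower's estimate is a straight linear extrapolation. If 1% takes four minutes (240 s), the whole task should take about 100 × 240 s = 24,000 s. The minimal sketch below assumes progress advances at a constant rate, although real tasks can speed up or slow down. The function name is hypothetical.

```python
def estimate_total_seconds(elapsed_s: float, fraction_done: float) -> float:
    """Linearly extrapolate total runtime from elapsed time and fraction complete."""
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_s / fraction_done


# 1.000% done after four minutes, as reported above
total = estimate_total_seconds(elapsed_s=4 * 60, fraction_done=0.01)
print(f"{total:,.0f} s")  # 24,000 s
```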

Jacob Klein Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0	Message 38178 - Posted: 29 Sep 2014 | 18:32:29 UTC
	Matt: I even think the canary behavior works better for me now. I tried the scenario where it was failing on the 8.41 app, and it now works without failure on the 8.46 beta app. Can you please explain, in detail, how the canary behavior was changed? How exactly does it behave in 8.46? Thanks, Jacob

Jim1348 Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0	Message 38179 - Posted: 29 Sep 2014 | 18:46:49 UTC
	Running fine after 25 minutes on a GTX 650 Ti. It will complete in 3 hours 16 minutes (344.11 driver, Win7 64-bit).

Jim1348 Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0	Message 38180 - Posted: 29 Sep 2014 | 22:17:47 UTC Last modified: 29 Sep 2014 | 22:19:25 UTC
	It completed OK on the GTX 650 Ti, but seems to be causing problems on some higher-end cards. But their versions of ACEMD probably have more changes than the one I got (8.46). http://www.gpugrid.net/workunit.php?wuid=10123336 I will be trying my GTX 660 Ti next on the same machine to see what happens.

biodoc Joined: 26 Aug 08 Posts: 183 Credit: 6,772,414,375 RAC: 5,454,422	Message 38181 - Posted: 29 Sep 2014 | 22:21:54 UTC
	All WUs have completed and validated so far on my GTX 980 with beta app versions 8.44, 8.45 and 8.46. I'm running Windows 8.1 and NVIDIA driver v. 344.16. http://www.gpugrid.net/results.php?hostid=142719

Bedrich Hajek Joined: 28 Mar 09 Posts: 468 Credit: 8,515,572,716 RAC: 11,566,841	Message 38182 - Posted: 29 Sep 2014 | 23:05:55 UTC
	All the beta units are finishing valid, though the output files are rather large: 44 MB.
	Name: 52-MJHARVEY_TEST4000-0-10-RND4601_3
	Workunit: 10123299
	Created: 29 Sep 2014 | 12:46:43 UTC
	Sent: 29 Sep 2014 | 19:41:27 UTC
	Received: 29 Sep 2014 | 22:41:52 UTC
	Server state: Over
	Outcome: Success
	Client state: Done
	Exit status: 0 (0x0)
	Computer ID: 127986
	Report deadline: 4 Oct 2014 | 19:41:27 UTC
	Run time: 6,363.92
	CPU time: 6,033.14
	Validate state: Valid
	Credit: 1,500.00
	Application version: ACEMD beta version v8.46 (cuda65)
	Stderr output:
	<core_client_version>7.2.42</core_client_version>
	<![CDATA[
	<stderr_txt>
	# GPU [GeForce GTX 690] Platform [Windows] Rev [3212] VERSION [65]
	# SWAN Device 1 :
	# Name : GeForce GTX 690
	# ECC : Disabled
	# Global mem : 2048MB
	# Capability : 3.0
	# PCI ID : 0000:04:00.0
	# Device clock : 1019MHz
	# Memory clock : 3004MHz
	# Memory width : 256bit
	# Driver version : r343_98 : 34411
	# GPU 0 : 63C # GPU 1 : 73C # GPU 2 : 74C # GPU 3 : 74C # GPU 0 : 64C # GPU 0 : 65C # GPU 0 : 66C # GPU 0 : 67C # GPU 0 : 68C # GPU 0 : 69C # GPU 0 : 70C # GPU 0 : 71C
	# Time per step (avg over 2500000 steps): 2.549 ms
	# Approximate elapsed time for entire WU: 6371.977 s
	# PERFORMANCE: 23558 Natoms 2.549 ns/day 0.000 ms/step 0.000 us/step/atom
	18:24:40 (5228): called boinc_finish
	</stderr_txt>
	]]>

eXaPower Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0	Message 38183 - Posted: 29 Sep 2014 | 23:25:59 UTC
	What does the ns/day performance figure mean? The number is the same as the time per step (ms).
	23558 Natoms 4.726 ns/day - GTX 650 Ti
	23558 Natoms 2.549 ns/day - GTX 690
	23558 Natoms 1.633 ns/day - GTX 980
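ns/day is conventionally the amount of simulated time produced per wall-clock day. Steps per day = 86,400,000 ms / (ms per step), and ns/day = steps per day × timestep (fs) / 10^6.

The thread does not say which timestep these work units use, so the sketch below assumes 4 fs purely for illustration. The GTX 690 log is self-consistent on timing: 2,500,000 steps × 2.549 ms ≈ 6,372 s, close to the reported 6371.977 s. The value printed as "ns/day" equals the time per step in ms, which suggests that field may be mislabelled rather than a true ns/day figure. Only the developers could confirm this.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000  # 86,400,000 ms in one day
FS_PER_NS = 1_000_000  # 1 ns = 1e6 fs


def ns_per_day(ms_per_step: float, timestep_fs: float) -> float:
    """Simulated nanoseconds produced per wall-clock day."""
    steps_per_day = MS_PER_DAY / ms_per_step
    return steps_per_day * timestep_fs / FS_PER_NS


# Time per step from the GTX 690 log; the 4 fs timestep is an ASSUMPTION,
# not a value reported anywhere in this thread.
print(round(ns_per_day(ms_per_step=2.549, timestep_fs=4.0), 1))  # 135.6
```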

TJ Joined: 26 Jun 09 Posts: 815 Credit: 1,470,385,294 RAC: 0	Message 38184 - Posted: 30 Sep 2014 | 8:31:12 UTC - in response to Message 38180.
	It completed OK on the GTX 650 Ti, but seems to be causing problems on some higher-end cards. But their versions of ACEMD probably have more changes than the one I got (8.46). http://www.gpugrid.net/workunit.php?wuid=10123336 I will be trying my GTX 660 Ti next on the same machine to see what happens.
	I think the errors on the higher-end cards were caused by outdated drivers, Jim. I had a lot of errors on my 780 Tis yesterday, but when I updated to the latest driver, they ran smoothly as usual again. The beta did okay on my 660, so your 660 Ti will do great as well.
	____________ Greetings from TJ

Jim1348 Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0	Message 38186 - Posted: 30 Sep 2014 | 10:28:53 UTC - in response to Message 38184.
	TJ, thanks, that is probably it. My GTX 660 Ti did finish fine; I will be trying a couple of GTX 750 Tis now just for fun.

Matt Joined: 11 Jan 13 Posts: 216 Credit: 846,538,252 RAC: 0	Message 38208 - Posted: 1 Oct 2014 | 1:46:11 UTC Last modified: 1 Oct 2014 | 1:49:43 UTC
	Just enabled Test Apps for my GTX 680 and GTX 780Ti cards. I'll check back in a while to see how they're doing. Edit: I saw that TJ recommended updating to the latest drivers. Is this the latest Beta or WHQL driver? I'm currently running 344.11. Thanks.

Jim1348 Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0	Message 38210 - Posted: 1 Oct 2014 | 2:03:50 UTC - in response to Message 38208.
	Edit: I saw that TJ recommended updating to the latest drivers. Is this the latest Beta or WHQL driver? I'm currently running 344.11. Thanks.
	344.11 works fine on my GTX 650 Ti and 660 Ti on the test apps. I am running it on my GTX 750 Ti also, but haven't picked up the new apps yet.

Matt Joined: 11 Jan 13 Posts: 216 Credit: 846,538,252 RAC: 0	Message 38213 - Posted: 1 Oct 2014 | 2:59:31 UTC
	Thanks, Jim1348. I'll stick with 344.11 for now, then.

Jacob Klein Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0	Message 38214 - Posted: 1 Oct 2014 | 3:45:25 UTC - in response to Message 38213.
	Thanks, Jim1348. I'll stick with 344.11 for now, then.
	What other options are there? :)

TJ Joined: 26 Jun 09 Posts: 815 Credit: 1,470,385,294 RAC: 0	Message 38216 - Posted: 1 Oct 2014 | 7:23:02 UTC - in response to Message 38208.
	Just enabled Test Apps for my GTX 680 and GTX 780Ti cards. I'll check back in a while to see how they're doing. Edit: I saw that TJ recommended updating to the latest drivers. Is this the latest Beta or WHQL driver? I'm currently running 344.11. Thanks.
	Hello Matt, yes, I am running 344.11, the latest WHQL driver. But to be clear, it was recommended by Matt from the project. The older driver I was using was a bit faster on Win7, as WDDM was introduced with Vista and cannot be switched off, but that is beyond the scope of this thread.
	____________ Greetings from TJ

MJH Project administrator Project developer Project scientist Joined: 12 Nov 07 Posts: 696 Credit: 27,266,655 RAC: 0	Message 38217 - Posted: 1 Oct 2014 | 8:46:10 UTC - in response to Message 38216. Last modified: 1 Oct 2014 | 8:47:07 UTC
	If I've got things right, the 65 apps shouldn't be sent any driver older than 343.00. The exception to that will be the Linux app, when that finally exists. That will give the WU out to any client that reports CUDA 6.5 capability, as only our patched client reports the driver version. Matt
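Matt's description amounts to a simple eligibility rule. When the client reports a driver version, require 343.00 or newer. When it does not (clients that report only CUDA capability), fall back to requiring CUDA 6.5.

The sketch below is a hypothetical illustration of that rule, not GPUGRID's scheduler code. All names are invented, and it assumes two-part version strings such as 344.11.

```python
from typing import Optional

MIN_DRIVER = (343, 0)  # "shouldn't be sent any driver older than 343.00"
MIN_CUDA = (6, 5)      # fallback when no driver version is reported


def parse_version(text: str) -> tuple:
    """Turn a dotted version string such as '344.11' into (344, 11)."""
    return tuple(int(part) for part in text.split("."))


def eligible_for_cuda65(driver_version: Optional[str],
                        cuda_version: Optional[str]) -> bool:
    """Hypothetical sketch of the CUDA 6.5 app eligibility rule described above."""
    if driver_version is not None:
        # Driver reported (e.g. by a patched client): gate on the driver.
        return parse_version(driver_version) >= MIN_DRIVER
    if cuda_version is not None:
        # No driver version: gate on the reported CUDA capability instead.
        return parse_version(cuda_version) >= MIN_CUDA
    return False  # nothing reported, so do not send the task


print(eligible_for_cuda65("344.11", None))  # True
print(eligible_for_cuda65("331.82", None))  # False (hypothetical 331-branch version)
print(eligible_for_cuda65(None, "6.5"))     # True
```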

TJ Joined: 26 Jun 09 Posts: 815 Credit: 1,470,385,294 RAC: 0	Message 38219 - Posted: 1 Oct 2014 | 8:58:34 UTC - in response to Message 38217. Last modified: 1 Oct 2014 | 9:00:13 UTC
	If I've got things right, the 65 apps shouldn't be sent any driver older than 343.00. The exception to that will be the Linux app, when that finally exists. That will give the WU out to any client that reports CUDA 6.5 capability, as only our patched client reports the driver version. Matt
	Well Matt, with driver 331 my 780 Tis on Win7 were a bit faster, but then I got CUDA65 tasks and they errored out. On your advice I updated the driver, and there were no more errors (one yesterday, but that had a different cause). But if you have made changes yesterday or today, then you are probably right.
	____________ Greetings from TJ

Retvari Zoltan Joined: 20 Jan 09 Posts: 2343 Credit: 16,201,255,749 RAC: 286	Message 38220 - Posted: 1 Oct 2014 | 8:58:46 UTC
	I think it's safe to promote the CUDA6.5 application to the long queue.

MJH Project administrator Project developer Project scientist Joined: 12 Nov 07 Posts: 696 Credit: 27,266,655 RAC: 0	Message 38221 - Posted: 1 Oct 2014 | 8:59:36 UTC - in response to Message 38220.
	Not just yet...

biodoc Joined: 26 Aug 08 Posts: 183 Credit: 6,772,414,375 RAC: 5,454,422	Message 38222 - Posted: 1 Oct 2014 | 9:28:39 UTC - in response to Message 38217.
	If I've got things right, the 65 apps shouldn't be sent any driver older than 343.00. The exception to that will be the Linux app, when that finally exists. That will give the WU out to any client that reports CUDA 6.5 capability, as only our patched client reports the driver version. Matt
	BOINC 7.4.22 (development version) now reports the driver version:
	Starting BOINC client version 7.4.22 for x86_64-pc-linux-gnu
	CUDA: NVIDIA GPU 0: GeForce GTX 780 Ti (driver version 343.22, CUDA version 6.5, compute capability 3.5, 3072MB, 2814MB available, 5345 GFLOPS peak)
	It shows up here too: http://www.gpugrid.net/show_host_detail.php?hostid=183991
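The BOINC 7.4.22 startup line quoted by biodoc carries the driver version, CUDA version and compute capability. These are the fields that matter for deciding which app version a host can receive. The minimal sketch below extracts them with a regular expression. It assumes exactly the line format shown above; other client versions may word it differently.

```python
import re

LINE = ("CUDA: NVIDIA GPU 0: GeForce GTX 780 Ti (driver version 343.22, "
        "CUDA version 6.5, compute capability 3.5, 3072MB, 2814MB available, "
        "5345 GFLOPS peak)")

# Pattern written for the single line format shown in the post above.
PATTERN = re.compile(
    r"NVIDIA GPU (?P<index>\d+): (?P<name>.+?) \("
    r"driver version (?P<driver>[\d.]+), "
    r"CUDA version (?P<cuda>[\d.]+), "
    r"compute capability (?P<cc>[\d.]+)"
)


def parse_gpu_line(line: str) -> dict:
    """Extract GPU index, name, driver, CUDA version and compute capability."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("unrecognised BOINC GPU line")
    return match.groupdict()


print(parse_gpu_line(LINE))
# {'index': '0', 'name': 'GeForce GTX 780 Ti', 'driver': '343.22', 'cuda': '6.5', 'cc': '3.5'}
```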

Jacob Klein Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0	Message 38224 - Posted: 1 Oct 2014 | 10:18:09 UTC Last modified: 1 Oct 2014 | 10:19:40 UTC
	MJH: I've been processing beta tasks, and although nearly all are successful for me on the 8.46 app, I did have a failure last night. This is on a completely stable Windows 8.1 Update 1 x64 machine, on one of my GTX 660 Ti GPUs, using the 344.11 driver. Any ideas?
	http://www.gpugrid.net/result.php?resultid=13154266
	Name: 79-MJHARVEY_TEST4001-2-10-RND8149_0
	Workunit: 10126844
	Created: 30 Sep 2014 | 18:55:59 UTC
	Sent: 1 Oct 2014 | 3:25:13 UTC
	Received: 1 Oct 2014 | 4:35:24 UTC
	Server state: Over
	Outcome: Computation error
	Client state: Compute error
	Exit status: -97 (0xffffffffffffff9f) Unknown error number
	Computer ID: 153764
	Report deadline: 6 Oct 2014 | 3:25:13 UTC
	Run time: 1,431.05
	CPU time: 384.63
	Validate state: Invalid
	Credit: 0.00
	Application version: ACEMD beta version v8.46 (cuda65)
	Stderr output:
	<core_client_version>7.4.22</core_client_version>
	<![CDATA[
	<message>
	(unknown error) - exit code -97 (0xffffff9f)
	</message>
	<stderr_txt>
	# GPU [GeForce GTX 660 Ti] Platform [Windows] Rev [3212] VERSION [65]
	# SWAN Device 2 :
	# Name : GeForce GTX 660 Ti
	# ECC : Disabled
	# Global mem : 3072MB
	# Capability : 3.0
	# PCI ID : 0000:08:00.0
	# Device clock : 1045MHz
	# Memory clock : 3004MHz
	# Memory width : 192bit
	# Driver version : r343_98 : 34411
	# GPU 0 : 69C # GPU 1 : 64C # GPU 2 : 69C # GPU 1 : 65C # GPU 1 : 66C # GPU 1 : 67C # GPU 0 : 70C
	# The simulation has become unstable. Terminating to avoid lock-up (1)
	# Attempting restart (step 5505000)
	# GPU [GeForce GTX 660 Ti] Platform [Windows] Rev [3212] VERSION [65]
	# SWAN Device 2 :
	# Name : GeForce GTX 660 Ti
	# ECC : Disabled
	# Global mem : 3072MB
	# Capability : 3.0
	# PCI ID : 0000:08:00.0
	# Device clock : 1045MHz
	# Memory clock : 3004MHz
	# Memory width : 192bit
	# Driver version : r343_98 : 34411
	# The simulation has become unstable. Terminating to avoid lock-up (1)
	</stderr_txt>
	]]>
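The log above shows a pattern that recurs in several failures in this thread. The app reports instability, makes one restart from an earlier step (5505000 here), and exits with -97 when the instability recurs.

The sketch below models that control flow as inferred from the logs only. It is not ACEMD's code. The one-restart limit, `SimulationUnstable` and `run_from` are assumptions made for illustration.

```python
EXIT_UNSTABLE = -97  # exit code reported for these failures in the thread


class SimulationUnstable(Exception):
    """Hypothetical signal that the integrator detected an unstable state."""

    def __init__(self, checkpoint_step: int):
        super().__init__(f"unstable; last checkpoint at step {checkpoint_step}")
        self.checkpoint_step = checkpoint_step


def run_with_restarts(run_from, start_step: int = 0, max_restarts: int = 1) -> int:
    """Run, restart from the last checkpoint on instability, give up after max_restarts.

    `run_from(step)` is a hypothetical callable that simulates from `step`,
    returns normally on success, and raises SimulationUnstable otherwise.
    Returns 0 on success or EXIT_UNSTABLE once the restart budget is spent.
    """
    step = start_step
    restarts = 0
    while True:
        try:
            run_from(step)
            return 0
        except SimulationUnstable as err:
            # Message text loosely modelled on the stderr output above.
            print("# The simulation has become unstable.")
            if restarts >= max_restarts:
                return EXIT_UNSTABLE
            restarts += 1
            step = err.checkpoint_step
            print(f"# Attempting restart (step {step})")


def always_unstable(step: int) -> None:
    # Stand-in that fails every time, like the task in the log above.
    raise SimulationUnstable(checkpoint_step=5505000)


print(run_with_restarts(always_unstable))  # returns -97 after one restart
```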

eXaPower Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0	Message 38225 - Posted: 1 Oct 2014 | 11:05:22 UTC Last modified: 1 Oct 2014 | 11:06:43 UTC
	Question: would a CUDA 4.2 long task running on one card slow down CUDA 6.5 short or beta tasks running on another card, or vice versa? I just noticed a CUDA 4.2 Noelia long task running alongside 6.5 beta and short tasks. The runtime for the long task is longer than normal: it usually takes ~40 h to complete, but at ~40 h the 4.2 task is only at 80%.

MJH Project administrator Project developer Project scientist Joined: 12 Nov 07 Posts: 696 Credit: 27,266,655 RAC: 0	Message 38226 - Posted: 1 Oct 2014 | 14:55:20 UTC - in response to Message 38225.
	Maybe, if the processes are competing for CPU. Matt

Jacob Klein Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0	Message 38227 - Posted: 1 Oct 2014 | 15:09:47 UTC Last modified: 1 Oct 2014 | 15:10:05 UTC
	Any idea why my task failed, 3 posts up?

eXaPower Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0	Message 38228 - Posted: 1 Oct 2014 | 16:12:36 UTC - in response to Message 38227. Last modified: 1 Oct 2014 | 16:13:52 UTC
	Any idea why my task failed, 3 posts up?
	Have you checked Event Viewer for any events at the time the task failed? Any kernel failures? Or database instances? If you have automatic Windows updates or automatic maintenance enabled, this can trigger random failures in other processes (or sometimes fault any heavy-usage process). Also, a security "audit" can trigger background-task (GPUGRID) failures.

Jacob Klein Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0	Message 38229 - Posted: 1 Oct 2014 | 16:28:57 UTC - in response to Message 38228.
	Thanks, but it was a couple of "simulation became unstable" errors, which I believe to be a problem with the GPUGrid application.

Retvari Zoltan Joined: 20 Jan 09 Posts: 2343 Credit: 16,201,255,749 RAC: 286	Message 38234 - Posted: 1 Oct 2014 | 17:08:17 UTC - in response to Message 38221.
	I think it's safe to promote the CUDA6.5 application to the long queue.
	Not just yet...
	Are we waiting for your GTX980 to arrive?

biodoc Joined: 26 Aug 08 Posts: 183 Credit: 6,772,414,375 RAC: 5,454,422	Message 38248 - Posted: 2 Oct 2014 | 8:59:52 UTC - in response to Message 38234.
	I think it's safe to promote the CUDA6.5 application to the long queue.
	Not just yet...
	Are we waiting for your GTX980 to arrive?
	It's probably due to me overclocking my GTX980. I had several tasks fail while I was at work yesterday. Since I clocked back, I've had 4 short run tasks complete successfully. My apologies for messing up the beta test.

Jacob Klein Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0	Message 38267 - Posted: 3 Oct 2014 | 1:47:09 UTC
	I had another failure where the simulation became unstable on an 8.46 CUDA 6.5 beta task.
	http://www.gpugrid.net/result.php?resultid=13161365
	I am not entirely convinced that the error is the fault of the task or the application. Perhaps the new 344.11 drivers push the GPUs even harder than previous drivers. I will do additional testing with Heaven to try to confirm.
	Thanks, Jacob
	Name: 30-MJHARVEY_TEST1999-5-10-RND7983_0
	Workunit: 10132352
	Created: 2 Oct 2014 | 16:49:51 UTC
	Sent: 2 Oct 2014 | 21:36:23 UTC
	Received: 2 Oct 2014 | 22:20:50 UTC
	Server state: Over
	Outcome: Computation error
	Client state: Compute error
	Exit status: -97 (0xffffffffffffff9f) Unknown error number
	Computer ID: 153764
	Report deadline: 7 Oct 2014 | 21:36:23 UTC
	Run time: 386.44
	CPU time: 100.75
	Validate state: Invalid
	Credit: 0.00
	Application version: ACEMD beta version v8.46 (cuda65)
	Stderr output:
	<core_client_version>7.4.22</core_client_version>
	<![CDATA[
	<message>
	(unknown error) - exit code -97 (0xffffff9f)
	</message>
	<stderr_txt>
	# GPU [GeForce GTX 660 Ti] Platform [Windows] Rev [3212] VERSION [65]
	# SWAN Device 2 :
	# Name : GeForce GTX 660 Ti
	# ECC : Disabled
	# Global mem : 3072MB
	# Capability : 3.0
	# PCI ID : 0000:08:00.0
	# Device clock : 1045MHz
	# Memory clock : 3004MHz
	# Memory width : 192bit
	# Driver version : r343_98 : 34411
	# GPU 0 : 69C # GPU 1 : 64C # GPU 2 : 70C # GPU 1 : 65C # GPU 1 : 66C # GPU 1 : 67C
	# BOINC suspending at user request (exit)
	# GPU [GeForce GTX 660 Ti] Platform [Windows] Rev [3212] VERSION [65]
	# SWAN Device 2 :
	# Name : GeForce GTX 660 Ti
	# ECC : Disabled
	# Global mem : 3072MB
	# Capability : 3.0
	# PCI ID : 0000:08:00.0
	# Device clock : 1045MHz
	# Memory clock : 3004MHz
	# Memory width : 192bit
	# Driver version : r343_98 : 34411
	# GPU 0 : 62C # GPU 1 : 58C # GPU 2 : 54C # GPU 0 : 63C # GPU 1 : 59C # GPU 2 : 55C # GPU 0 : 64C # GPU 1 : 60C # GPU 2 : 56C # GPU 0 : 65C # GPU 2 : 57C # GPU 0 : 66C # GPU 1 : 61C # GPU 2 : 58C # GPU 2 : 59C # GPU 0 : 67C # GPU 1 : 62C # GPU 2 : 60C # GPU 2 : 61C # GPU 0 : 68C # GPU 1 : 63C # GPU 2 : 62C # GPU 2 : 63C # GPU 0 : 69C # GPU 1 : 64C # GPU 2 : 64C # GPU 2 : 65C # GPU 0 : 70C # GPU 0 : 71C # GPU 1 : 65C # GPU 2 : 66C # GPU 1 : 66C # GPU 2 : 67C # GPU 0 : 72C # GPU 1 : 67C # GPU 2 : 68C
	# The simulation has become unstable. Terminating to avoid lock-up (1)
	# Attempting restart (step 12630000)
	# GPU [GeForce GTX 660 Ti] Platform [Windows] Rev [3212] VERSION [65]
	# SWAN Device 2 :
	# Name : GeForce GTX 660 Ti
	# ECC : Disabled
	# Global mem : 3072MB
	# Capability : 3.0
	# PCI ID : 0000:08:00.0
	# Device clock : 1045MHz
	# Memory clock : 3004MHz
	# Memory width : 192bit
	# Driver version : r343_98 : 34411
	# The simulation has become unstable. Terminating to avoid lock-up (1)
	</stderr_txt>
	]]>
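To check whether failures coincide with high temperatures, the per-GPU readings scattered through a stderr dump can be summarised. The minimal sketch below assumes readings always look like `# GPU <n> : <t>C`, as in the logs above. Device lines such as `# GPU [GeForce GTX 660 Ti]` contain no index and are skipped. The sample is a short excerpt of Jacob's log: its device line plus the last run of readings before the instability.

```python
import re
from collections import defaultdict

# Matches readings such as "# GPU 0 : 72C"; the device line "# GPU [..." has no index.
READING = re.compile(r"# GPU (\d+) : (\d+)C")


def max_temps(stderr_text: str) -> dict:
    """Return the highest temperature (deg C) logged for each GPU index."""
    peaks = defaultdict(int)
    for gpu, temp in READING.findall(stderr_text):
        peaks[int(gpu)] = max(peaks[int(gpu)], int(temp))
    return dict(peaks)


sample = (
    "# GPU [GeForce GTX 660 Ti] Platform [Windows] Rev [3212] VERSION [65]\n"
    "# GPU 0 : 70C # GPU 0 : 71C # GPU 1 : 65C # GPU 2 : 66C # GPU 1 : 66C "
    "# GPU 2 : 67C # GPU 0 : 72C # GPU 1 : 67C # GPU 2 : 68C"
)
print(max_temps(sample))  # {0: 72, 1: 67, 2: 68}
```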

TJ Joined: 26 Jun 09 Posts: 815 Credit: 1,470,385,294 RAC: 0	Message 38272 - Posted: 3 Oct 2014 | 8:43:55 UTC - in response to Message 38267.
	I don't think the 344.11 driver is pushing the cards harder: with this driver my 780 Tis are around 700 seconds slower than with the 331 driver I used until 309 September, when I was forced to update because I got errors with the new app. See earlier in this thread.
	____________ Greetings from TJ

Rion Family Joined: 13 Jan 14 Posts: 21 Credit: 15,404,426,517 RAC: 0	Message 38278 - Posted: 3 Oct 2014 | 15:08:16 UTC
	Hello. I have noticed that my dual GTX 780 machine has been getting mostly beta tasks lately: only 2 short runs and no long runs over the past few days. I even set my preferences to no beta and no other apps, but it is still pulling only beta tasks. My other 3 systems on the account (GTX 770 & 660) do not show any beta tasks. Just curious.

Jacob Klein Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0	Message 38279 - Posted: 3 Oct 2014 | 15:44:46 UTC
	I too seem to be getting only beta tasks, even though I've re-enabled all applications. Is the scheduler prioritizing the beta application somehow?

eXaPower Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0	Message 38281 - Posted: 3 Oct 2014 | 16:12:58 UTC - in response to Message 38272.
	I don't think the 344.11 driver is pushing the cards harder: with this driver my 780 Tis are around 700 seconds slower than with the 331 driver I used until 309 September, when I was forced to update because I got errors with the new app. See earlier in this thread.
	The 343 driver branch adds a lot of new technology for second-generation Maxwell (GM204): Dynamic Super Resolution, third-generation delta color compression, Multi-Pixel Programmable Sampling, NVIDIA VXGI (real-time voxel global illumination), VR Direct, Multi-Projection Acceleration, and Multi-Frame Sampled Anti-Aliasing (MFAA), with support for CSAA removed. HDMI 2.0 support was also added. I'd say this driver branch is not fully complete yet, and a couple more releases should bring out the driver's full potential. Considering that support for pre-Fermi cards was dropped, and how much SM, SMX and SMM differ, these first 343-branch drivers have room to be refined.

Jacob Klein Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0	Message 38310 - Posted: 5 Oct 2014 | 16:13:08 UTC - in response to Message 38178.
	Matt: I even think the canary behavior works better for me now. I tried the scenario where it was failing on the 8.41 app, and it now works without failure on the 8.46 beta app. Can you please explain, in detail, how the canary behavior was changed? How exactly does it behave in 8.46? Thanks, Jacob
	Any answer on this?

MJH Project administrator Project developer Project scientist Joined: 12 Nov 07 Posts: 696 Credit: 27,266,655 RAC: 0	Message 38311 - Posted: 5 Oct 2014 | 17:40:18 UTC - in response to Message 38310.
	Can you please explain, in detail, how the canary behavior was changed? How exactly does it behave in 8.46?
	It doesn't. I've disabled it altogether. I'm counting on the newer drivers to do a better job at recovering from deadlocks. Matt

Jacob Klein Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0	Message 38313 - Posted: 5 Oct 2014 | 18:01:12 UTC - in response to Message 38311.
	Sounds good to me. If you ever decide to re-add it or modify its functionality, please be sure to let us know. Thanks, Jacob

Beyond Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0	Message 38323 - Posted: 6 Oct 2014 | 15:40:56 UTC - in response to Message 38311.
	Can you please explain, in detail, how the canary behavior was changed? How exactly does it behave in 8.46?
	It doesn't. I've disabled it altogether. I'm counting on the newer drivers to do a better job at recovering from deadlocks. Matt
	Thanks much for disabling that feature; I've lost a lot of WUs to it :-)
