Error invoked kernel

Author	Message
_heinz Joined: 20 Sep 13 Posts: 16 Credit: 3,433,447 RAC: 0	Message 54377 - Posted: 19 Apr 2020 | 15:35:20 UTC
	# Engine failed: Error invoking kernel: CUDA_ERROR_LAUNCH_FAILED (719)
	Most WUs errored out on my 3 NVIDIA Titans. Any hints?

Retvari Zoltan Joined: 20 Jan 09 Posts: 2343 Credit: 16,228,265,968 RAC: 2,203,425	Message 54378 - Posted: 19 Apr 2020 | 18:33:03 UTC - in response to Message 54377. Last modified: 19 Apr 2020 | 19:18:08 UTC
	_heinz wrote: "# Engine failed: Error invoking kernel: CUDA_ERROR_LAUNCH_FAILED (719) Most WUs errored out on my 3 NVIDIA Titans. Any hints?"
	You also have
	# Engine failed: Particle coordinate is nan
	which is usually the result of too much overclocking, or of a failing memory chip on your card. See NaN on Wikipedia.
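For readers wondering why a single NaN is fatal, here is a minimal Python sketch. It has nothing to do with the actual acemd3 code; the function names `step` and `first_nan` and the toy 1-D "simulation" are made up for illustration. It only shows that once one value is corrupted (for example by an unstable clock or a bad memory cell), the NaN carries through every later arithmetic step.

```python
import math

def step(positions, velocities, dt=0.01):
    """Advance a toy 1-D particle system by one time step (hypothetical model)."""
    return [x + v * dt for x, v in zip(positions, velocities)]

def first_nan(positions):
    """Return the index of the first NaN coordinate, or None if all are numbers."""
    for i, x in enumerate(positions):
        if math.isnan(x):
            return i
    return None

positions = [0.0, 1.0, 2.0]
velocities = [0.5, float("nan"), -0.5]  # one corrupted value, e.g. from a bit error

# A few steps later the NaN is still there; arithmetic never "repairs" it.
for _ in range(3):
    positions = step(positions, velocities)

bad = first_nan(positions)
print("first NaN coordinate at particle index:", bad)
```

A real engine can only detect the NaN and abort, which is presumably what the "Particle coordinate is nan" message reports.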

_heinz Joined: 20 Sep 13 Posts: 16 Credit: 3,433,447 RAC: 0	Message 54383 - Posted: 20 Apr 2020 | 8:38:42 UTC
	Thank you, I will look into it and let the cards run at standard frequency for now.
	_heinz

_heinz Joined: 20 Sep 13 Posts: 16 Credit: 3,433,447 RAC: 0	Message 54411 - Posted: 22 Apr 2020 | 10:54:43 UTC
	I would try an app_config.xml:
	<app_config>
	  <app>
	    <name>acemd3</name>
	    <gpu_versions>
	      <cpu_usage>1.0</cpu_usage>
	      <gpu_usage>0.5</gpu_usage>
	    </gpu_versions>
	  </app>
	</app_config>
	but BOINC says:
	22.04.2020 12:31:02 | GPUGRID | Your app_config.xml file refers to an unknown application 'acemd3'. Known applications: None
	Can someone tell me the right name, please?

_heinz Joined: 20 Sep 13 Posts: 16 Credit: 3,433,447 RAC: 0	Message 54422 - Posted: 22 Apr 2020 | 14:38:53 UTC
	Admin, please delete the multiple messages.

Pop Piasa Joined: 8 Aug 19 Posts: 252 Credit: 458,054,251 RAC: 0	Message 54427 - Posted: 23 Apr 2020 | 1:11:05 UTC - in response to Message 54378. Last modified: 23 Apr 2020 | 1:13:09 UTC
	_heinz wrote: "# Engine failed: Error invoking kernel: CUDA_ERROR_LAUNCH_FAILED (719) Most WUs errored out on my 3 NVIDIA Titans. Any hints?"
	Retvari Zoltan said: "You also have # Engine failed: Particle coordinate is nan, which is usually the result of too much overclocking, or of a failing memory chip on your card. See NaN on Wikipedia."
	I don't think limiting your GPU usage will solve your errors. It's not a matter of percentage but of clock frequency that causes tasks to fail when the wrapper starts the GPU. Some tasks appear to be easier to crash than others.
	GPUGRID WUs are the most sensitive tasks I've seen to processor overclocking errors, and I had to slow my GTX 1060 down when I came here even though it ran games and other BOINC projects OK. My errors were hit and miss like yours, only not as many. They usually occurred at ~30 sec.
	Your base clock speed is 1000 MHz per https://www.geforce.com/hardware/desktop-gpus/geforce-gtx-titan-x/specifications

Aurum Joined: 12 Jul 17 Posts: 401 Credit: 16,755,010,632 RAC: 180,567	Message 54431 - Posted: 24 Apr 2020 | 14:43:05 UTC - in response to Message 54411.
	_heinz wrote: "I would try an app_config.xml:
	<app_config>
	  <app>
	    <name>acemd3</name>
	    <gpu_versions>
	      <cpu_usage>1.0</cpu_usage>
	      <gpu_usage>0.5</gpu_usage>
	    </gpu_versions>
	  </app>
	</app_config>
	but BOINC says: 22.04.2020 12:31:02 | GPUGRID | Your app_config.xml file refers to an unknown application 'acemd3'. Known applications: None
	Can someone tell me the right name, please?"
	Yours looks exactly like mine, except I only run one GG WU per GPU. BOINC always says that when you don't have an acemd3 WU downloaded.
	Wow! Eight duplicates. My record was three. No worries, it happens to us all and I don't know why.
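As a rough illustration of what the `<gpu_usage>` value in that file means: in BOINC a task that claims a fraction of a GPU lets several tasks share the card, so 0.5 allows two at once and 1.0 allows one. The Python sketch below parses the config from this thread and derives that number. The helper name `tasks_per_gpu` is made up for the example, and the floor-of-reciprocal rule is my simplified reading of how the client schedules, not a quote from BOINC's code.

```python
import xml.etree.ElementTree as ET

# The app_config.xml posted above, reproduced as a string.
APP_CONFIG = """
<app_config>
  <app>
    <name>acemd3</name>
    <gpu_versions>
      <cpu_usage>1.0</cpu_usage>
      <gpu_usage>0.5</gpu_usage>
    </gpu_versions>
  </app>
</app_config>
"""

def tasks_per_gpu(xml_text, app_name):
    """Return how many tasks of app_name may share one GPU, or None if not configured.

    Assumption: concurrency is floor(1 / gpu_usage).
    """
    root = ET.fromstring(xml_text.strip())
    for app in root.findall("app"):
        if app.findtext("name") == app_name:
            gpu_usage = float(app.findtext("gpu_versions/gpu_usage"))
            return int(1.0 / gpu_usage)
    return None

print(tasks_per_gpu(APP_CONFIG, "acemd3"))
```

With `<gpu_usage>1.0</gpu_usage>` the same function gives one task per GPU, which matches the "one GG WU per GPU" setup mentioned above.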

Richard Haselgrove Joined: 11 Jul 09 Posts: 1620 Credit: 8,880,156,640 RAC: 20,004,259	Message 54432 - Posted: 24 Apr 2020 | 18:09:37 UTC
	Just got a "# Engine failed: Particle coordinate is nan" error on WU 19441088 - as have all my wingmates. I'm pretty sure that will be a mistake in the data prepared for the run, nothing to do with unstable GPUs.

Keith Myers Joined: 13 Dec 17 Posts: 1341 Credit: 7,682,021,308 RAC: 13,131,179	Message 54433 - Posted: 24 Apr 2020 | 20:11:24 UTC - in response to Message 54432.
	Richard Haselgrove wrote: "Just got a "# Engine failed: Particle coordinate is nan" error on WU 19441088 - as have all my wingmates. I'm pretty sure that will be a mistake in the data prepared for the run, nothing to do with unstable GPUs."
	I concur. Not all NaN errors are the result of a misbehaving card. Sometimes the task is just badly formatted.
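The rule of thumb in the two posts above (if every host that tried the workunit failed, suspect the data rather than your own card) can be written down as a small Python sketch. The function name and the "at least two attempts" threshold are assumptions chosen for illustration; this is a heuristic, not anything the project itself applies.

```python
def likely_bad_workunit(attempt_succeeded):
    """Guess whether a workunit is faulty from its wingmates' results.

    attempt_succeeded: list of booleans, one per host that tried the
    workunit (True = finished without error).
    Assumption: at least two independent failures and no success at all
    point to the task data rather than to any single GPU.
    """
    return len(attempt_succeeded) >= 2 and not any(attempt_succeeded)

# Every host failed: probably the workunit.
print(likely_bad_workunit([False, False, False]))
# Someone else finished it: look at your own card first.
print(likely_bad_workunit([False, True]))
```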

Retvari Zoltan Joined: 20 Jan 09 Posts: 2343 Credit: 16,228,265,968 RAC: 2,203,425	Message 54434 - Posted: 24 Apr 2020 | 20:42:20 UTC - in response to Message 54433.
	Richard Haselgrove wrote: "Just got a "# Engine failed: Particle coordinate is nan" error on WU 19441088 - as have all my wingmates. I'm pretty sure that will be a mistake in the data prepared for the run, nothing to do with unstable GPUs."
	Keith Myers wrote: "I concur. Not all NaN errors are the result of a misbehaving card. Sometimes the task is just badly formatted."
	This task is a 2ph7A01_348_3-TONI_MDADpr4sp-7-10-RND7696
	So it's the 7th of 10 workunits. Perhaps the previous host made an error, which resulted in a permanent NaN error on all hosts.

Pop Piasa Joined: 8 Aug 19 Posts: 252 Credit: 458,054,251 RAC: 0	Message 54441 - Posted: 25 Apr 2020 | 17:22:51 UTC - in response to Message 54432.
	Richard Haselgrove wrote: "Just got a "# Engine failed: Particle coordinate is nan" error on WU 19441088 - as have all my wingmates. I'm pretty sure that will be a mistake in the data prepared for the run, nothing to do with unstable GPUs."
	Hi Richard Haselgrove,
	Your 1660-S is not overclocked, correct?
	Having looked at its task status, it seems we'll have to wait until that WU reaches the Apr 29 deadline on iBat's machine before seeing whether it crashes again.
	I've been getting more tasks lately which have crashed on 1 or 2 other hosts before being sent to mine. I noticed several of the error-prone machines were Science United hosts and a few were grcpool hosts. Fascinating.

Pop Piasa Joined: 8 Aug 19 Posts: 252 Credit: 458,054,251 RAC: 0	Message 54442 - Posted: 25 Apr 2020 | 17:43:59 UTC - in response to Message 54434.
	Retvari Zoltan wrote: "This task is a 2ph7A01_348_3-TONI_MDADpr4sp-7-10-RND7696 So it's the 7th of 10 workunits."
	Zoltan, I think you meant to write 8th of 10, as the first one is always named 0-10. Or am I confused? 🤔

Retvari Zoltan Joined: 20 Jan 09 Posts: 2343 Credit: 16,228,265,968 RAC: 2,203,425	Message 54444 - Posted: 25 Apr 2020 | 18:08:08 UTC - in response to Message 54442.
	Retvari Zoltan wrote: "This task is a 2ph7A01_348_3-TONI_MDADpr4sp-7-10-RND7696 So it's the 7th of 10 workunits."
	Pop Piasa wrote: "Zoltan, I think you meant to write 8th of 10, as the first one is always named 0-10. Or am I confused? 🤔"
	You're right, it's the 8th.
	Probably the host doing the 7th piece made an error. (That's what I should have posted to correctly include the number 7 in my post.)
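The off-by-one above is easy to trip over, so here is a small Python sketch of the counting. It assumes, based only on the one task name quoted in this thread and the remark that the first piece is named 0-10, that names end in `-<step>-<total>-RND<number>` with a zero-based step. The function name `chain_position` is made up, and the pattern is a guess rather than a documented GPUGRID format.

```python
import re

def chain_position(task_name):
    """Return (ordinal, total) for a chained task name.

    Assumes the name ends in "-<step>-<total>-RND<digits>" and that
    <step> counts from 0, so step 7 is the 8th piece.
    """
    match = re.search(r"-(\d+)-(\d+)-RND\d+$", task_name)
    if match is None:
        raise ValueError("unrecognised task name: " + task_name)
    step, total = int(match.group(1)), int(match.group(2))
    return step + 1, total

ordinal, total = chain_position("2ph7A01_348_3-TONI_MDADpr4sp-7-10-RND7696")
print("piece", ordinal, "of", total)
```

For the name quoted above this gives piece 8 of 10, which agrees with the correction in the post.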

Richard Haselgrove Joined: 11 Jul 09 Posts: 1620 Credit: 8,880,156,640 RAC: 20,004,259	Message 54447 - Posted: 26 Apr 2020 | 7:46:05 UTC - in response to Message 54441.
	Pop Piasa wrote: "Hi Richard Haselgrove, Your 1660-S is not overclocked, correct?"
	Correct. I gave that machine a complete motherboard/CPU/RAM transplant at the end of January, and fitted two brand-new, identical 1660-S GPUs. It's in a high-airflow case with a Corsair modular power supply.
	I can do basic hardware work on computers, but I'm not a hardware specialist, so I bought the motherboard bundle pre-assembled and tested from a local trade supplier, with the CPU cooler already attached.
	It ran on SETI until that project stopped sending out new work (bad timing on my part!), and started working here at the beginning of April.
	Application details | Tasks
	I think 4 errors, with 1167 completed tasks, indicates the machine is basically healthy. Two of the other errors reached the full 8 failures on all machines that attempted them, and one seems to have been a ghost that I never received.

robertmiles Joined: 16 Apr 09 Posts: 503 Credit: 755,434,080 RAC: 186,180	Message 54455 - Posted: 27 Apr 2020 | 17:10:39 UTC - in response to Message 54422.
	_heinz wrote: "Admin, please delete the multiple messages."
	If you try soon enough, you should be able to do part of the work yourself by editing all but one of them down to just one character. Making most of them the same single character is likely to trigger an automatic process for hiding duplicate messages.

Pop Piasa Joined: 8 Aug 19 Posts: 252 Credit: 458,054,251 RAC: 0	Message 54491 - Posted: 29 Apr 2020 | 3:18:46 UTC - in response to Message 54447.
	Richard, you have fewer errors than I do, I think.
	Looking at how long this WU runs before throwing an error, it probably is a bad egg. All the other defective ones I've gotten lasted around 10-20 seconds before bombing. The errors I saw from instability came a bit later, around 30 seconds. I figure that it takes about that long for the wrapper to get its task running on the GPU. (That's also the approximate timing of getting a mismatched GPU error on restarts, the cause of most of my errors.)
	I do recall that recently I showed 10 errors and 5 of them were bad WUs. I'll lay my nickel on WU 19441088 being another dud task. 💣

Pop Piasa Joined: 8 Aug 19 Posts: 252 Credit: 458,054,251 RAC: 0	Message 54501 - Posted: 30 Apr 2020 | 17:43:23 UTC - in response to Message 54491.
	Pop Piasa wrote: "I do recall that recently I showed 10 errors and 5 of them were bad WUs. I'll lay my nickel on WU 19441088 being another dud task. 💣"
	...And sure enough, it lasted no longer than 16 seconds before it choked on everybody's hosts.
	Heinz is getting errors at later stages of the tasks than we see when running bad WUs. https://www.gpugrid.net/results.php?hostid=159065
	I have had errors before that were caused by running short of memory, although I see that is not a problem in Heinz's case. I had 7 Rosetta threads and two GPUGRID wrappers running in 8 GB of RAM with an 8182 MB swapfile. Every time a Rosetta COVID task suddenly hogged memory, one of the wrappers would report that an output file could not be found (can't remember which) and throw an error. I've since increased to 12 GB of RAM and solved that issue.
	I had a PSU failure on my fast host today (a recycled 600W cheapo from the days of Molex connectors), and it makes me wonder if Heinz might have power issues with his 3 GTX Titans in one host. Just a thought, but if they're clocked higher than factory specs, IMHO that is the first thing to suspect. 🤔

Keith Myers Joined: 13 Dec 17 Posts: 1341 Credit: 7,682,021,308 RAC: 13,131,179	Message 54502 - Posted: 30 Apr 2020 | 19:35:00 UTC - in response to Message 54447.
	Richard Haselgrove wrote: "It ran on SETI until that project stopped sending out new work (bad timing on my part!)"
	Ha ha LOL. I did the same thing. Completely rebuilt and upgraded the 3900X host for Seti and put it back online a few days before Seti pulled the plug. Now it just sits there, idle, looking pretty.

Pop Piasa Joined: 8 Aug 19 Posts: 252 Credit: 458,054,251 RAC: 0	Message 54503 - Posted: 30 Apr 2020 | 20:37:48 UTC - in response to Message 54501.
	Pop Piasa wrote: "I have had errors before that were caused by running short of memory, although I see that is not a problem in Heinz's case. I had 7 Rosetta threads and two GPUGRID wrappers running in 8 GB of RAM with an 8182 MB swapfile. Every time a Rosetta COVID task suddenly hogged memory, one of the wrappers would report that an output file could not be found (can't remember which) and throw an error..."
	...Which made me curious what that particular host is running on the CPU. I see that _heinz has recently switched to running World Community Grid: https://boinc.netsoft-online.com/e107_plugins/boinc/get_user.php?cpid=5e024335320e436c4d050e073963e326
	Does anyone here know how much memory those tasks use? I found that LHC@home tasks were too memory hungry to run at 2 GB of RAM per thread. That might be an issue here.

Keith Myers Joined: 13 Dec 17 Posts: 1341 Credit: 7,682,021,308 RAC: 13,131,179	Message 54504 - Posted: 30 Apr 2020 | 21:45:10 UTC - in response to Message 54503.
	I recently discovered a website with the ability to dive deep into the data for all the BOINC projects. This page has the RAM requirements for all the projects' CPU apps.
	http://wuprop.boinc-af.org/results/ram.py

Retvari Zoltan Joined: 20 Jan 09 Posts: 2343 Credit: 16,228,265,968 RAC: 2,203,425	Message 54505 - Posted: 30 Apr 2020 | 22:58:22 UTC - in response to Message 54504. Last modified: 30 Apr 2020 | 22:59:09 UTC
	Keith Myers wrote: "I recently discovered a website with the ability to dive deep into the data for all the BOINC projects. This page has the RAM requirements for all the projects' CPU apps. http://wuprop.boinc-af.org/results/ram.py"
	I wondered how they got their data. I realized the answer when I browsed to the root of this site: http://wuprop.boinc-af.org/
	This is actually a BOINC project that collects data about the apps of other BOINC projects as you run it alongside them. Nice!
	It's 10 years old, and I can't recall ever hearing about it. I'm shocked.

Aurum Joined: 12 Jul 17 Posts: 401 Credit: 16,755,010,632 RAC: 180,567	Message 54508 - Posted: 1 May 2020 | 1:37:01 UTC - in response to Message 54504.
	Keith Myers wrote: "I recently discovered a website with the ability to dive deep into the data for all the BOINC projects. This page has the RAM requirements for all the projects' CPU apps. http://wuprop.boinc-af.org/results/ram.py"
	Nice page. Pretty close to my values. I always leave some headroom, e.g. LHC ATLAS needs 2 GB.
	Rosetta is a problem, as they stuff every project they have into one queue. Some need much more RAM than others, so the figure shown is clearly an average that would benefit from having its standard deviation listed. Most of the time 0.8 GB is enough, but a couple of projects use a good bit more, so reserve a minimum of 1 GB and it'll come out in the wash.
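To put numbers on the headroom advice, here is a back-of-the-envelope Python sketch. The function name `max_cpu_tasks` and the 2 GB reserved for the OS and GPU wrappers are assumptions picked for the example, not measured values; the 1 GB per Rosetta task is the reservation suggested in the post above.

```python
def max_cpu_tasks(total_ram_gb, per_task_gb, reserved_gb=0.0):
    """Return how many CPU tasks fit in RAM without swapping.

    total_ram_gb: installed RAM
    per_task_gb:  RAM to budget per task (average plus headroom)
    reserved_gb:  RAM kept back for the OS and other processes (assumed)
    """
    usable = total_ram_gb - reserved_gb
    if usable <= 0 or per_task_gb <= 0:
        return 0
    return int(usable // per_task_gb)

# 8 GB host, 1 GB per Rosetta task, 2 GB held back (hypothetical reserve)
print(max_cpu_tasks(8, 1.0, reserved_gb=2.0))
# Same budget after an upgrade to 12 GB
print(max_cpu_tasks(12, 1.0, reserved_gb=2.0))
```

Under these assumptions an 8 GB host fits six such tasks, so seven Rosetta threads plus two GPU wrappers would already be over budget, which is consistent with the out-of-memory errors described earlier in the thread.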

ServicEnginIC Joined: 24 Sep 10 Posts: 581 Credit: 9,815,362,024 RAC: 20,602,106	Message 54510 - Posted: 1 May 2020 | 10:20:33 UTC - in response to Message 54504.
	Keith Myers wrote: "I recently discovered a website with the ability to dive deep into the data for all the BOINC projects. This page has the RAM requirements for all the projects' CPU apps. http://wuprop.boinc-af.org/results/ram.py"
	Thank you for sharing this!
	Not only did I like it, but I also joined WUProp@Home and I'm running my first task...
	It seems to have very low resource demands:
	Application: Data collect version 4 4.25 (nci)
	Name: data_collect_v4_1586607902_360862
	State: Running (non-CPU-intensive)
	Received: Fri 01 May 2020 10:34:46 WEST
	Report deadline: Fri 08 May 2020 10:34:45 WEST
	Estimated computation size: 1,000 GFLOPs
	CPU time: 00:00:05
	CPU time since checkpoint: 00:00:00
	Elapsed time: 00:25:15
	Estimated time remaining: 05:25:27
	Fraction done: 7.202%
	Virtual memory size: 9.68 MB
	Working set size: 7.41 MB
	Directory: slots/4
	Process ID: 28681
	Progress rate: 16.560% per hour
	Executable: data_collect_v4_425_x86_64-pc-linux-gnu__nci

Pop Piasa Joined: 8 Aug 19 Posts: 252 Credit: 458,054,251 RAC: 0	Message 54568 - Posted: 3 May 2020 | 23:24:27 UTC
	Unfortunately, looking at his task list, it appears that _heinz has given up on GPUGRID. Regrettable, as he is a veteran cruncher.
	I see that some of his errors were detected memory leaks. That might tie in with what Zoltan wrote about failing memory being one cause of NaN errors:
	Retvari Zoltan wrote: "You also have # Engine failed: Particle coordinate is nan, which is usually the result of too much overclocking, or of a failing memory chip on your card."

Pop Piasa Joined: 8 Aug 19 Posts: 252 Credit: 458,054,251 RAC: 0	Message 54570 - Posted: 4 May 2020 | 2:27:47 UTC - in response to Message 54510. Last modified: 4 May 2020 | 2:39:13 UTC
	ServicEnginIC wrote: "Thank you for sharing this! Not only did I like it, but I also joined WUProp@Home and I'm running my first task... It seems to have very low resource demands:"
	That goes for me too! 👍👍
	The more of us that contribute, the more accurate the statistics.
	"(non-CPU-intensive)" = extra BOINC credit without stopping anything else to do it... Genius!
	And we get more stinkin' badges, hombres. 🥇😊

[AF] fansyl Joined: 26 Sep 13 Posts: 20 Credit: 1,714,356,441 RAC: 0	Message 54572 - Posted: 4 May 2020 | 11:43:13 UTC - in response to Message 54505. Last modified: 4 May 2020 | 11:44:30 UTC
	Keith Myers wrote: "I recently discovered a website with the ability to dive deep into the data for all the BOINC projects. This page has the RAM requirements for all the projects' CPU apps. http://wuprop.boinc-af.org/results/ram.py"
	Retvari Zoltan wrote: "I wondered how they got their data. I realized the answer when I browsed to the root of this site: http://wuprop.boinc-af.org/ This is actually a BOINC project that collects data about the apps of other BOINC projects as you run it alongside them. Nice! It's 10 years old, and I can't recall ever hearing about it. I'm shocked."
	WUProp is a tool developed by a developer from L'Alliance Francophone.
	Feel free to visit our forum, and especially the dedicated thread: https://forum.boinc-af.org/index.php?topic=3438.new;topicseen#new

Post to thread

Message boards : Number crunching : Error invoked kernel

	About	Science	Volunteers	Performance	Forum	Join us	Donate

Author	Message
_heinz Send message Joined: 20 Sep 13 Posts: 16 Credit: 3,433,447 RAC: 0 Level Scientific publications	Message 54377 - Posted: 19 Apr 2020 \| 15:35:20 UTC
	# Engine failed: Error invoking kernel: CUDA_ERROR_LAUNCH_FAILED (719) most Wu errored out on My 3 NVIDIA Titan any hints ? ____________
	ID: 54377 \| Rating: 0 \| rate: / Reply Quote

Retvari Zoltan Send message Joined: 20 Jan 09 Posts: 2343 Credit: 16,228,265,968 RAC: 2,203,425 Level Scientific publications	Message 54378 - Posted: 19 Apr 2020 \| 18:33:03 UTC - in response to Message 54377. Last modified: 19 Apr 2020 \| 19:18:08 UTC
	# Engine failed: Error invoking kernel: CUDA_ERROR_LAUNCH_FAILED (719) most Wu errored out on My 3 NVIDIA Titan any hints ? You have also # Engine failed: Particle coordinate is nan which is usually the result of to much overclocking, or your card has a failing memory chip. NaN on Wikipedia
	ID: 54378 \| Rating: 0 \| rate: / Reply Quote

_heinz Send message Joined: 20 Sep 13 Posts: 16 Credit: 3,433,447 RAC: 0 Level Scientific publications	Message 54383 - Posted: 20 Apr 2020 \| 8:38:42 UTC
	Thank you, I will lookup und let run the cards still in standard frequency. _heinz ____________
	ID: 54383 \| Rating: 0 \| rate: / Reply Quote

_heinz Send message Joined: 20 Sep 13 Posts: 16 Credit: 3,433,447 RAC: 0 Level Scientific publications	Message 54411 - Posted: 22 Apr 2020 \| 10:54:43 UTC
	I would try a app_config.xml <app_config> <app> <name>acemd3</name> <gpu_versions> <cpu_usage>1.0</cpu_usage> <gpu_usage>0.5</gpu_usage> </gpu_versions> </app> </app_config> but BOINC says: 22.04.2020 12:31:02 \| GPUGRID \| Your app_config.xml file refers to an unknown application 'acemd3'. Known applications: None Can someone tell me the right name please. ____________
	ID: 54411 \| Rating: 0 \| rate: / Reply Quote

_heinz Send message Joined: 20 Sep 13 Posts: 16 Credit: 3,433,447 RAC: 0 Level Scientific publications	Message 54422 - Posted: 22 Apr 2020 \| 14:38:53 UTC
	admin pleasedelete the multiple messages
	ID: 54422 \| Rating: 0 \| rate: / Reply Quote

Pop Piasa Send message Joined: 8 Aug 19 Posts: 252 Credit: 458,054,251 RAC: 0 Level Scientific publications	Message 54427 - Posted: 23 Apr 2020 \| 1:11:05 UTC - in response to Message 54378. Last modified: 23 Apr 2020 \| 1:13:09 UTC
	Retvari Zoltan said: # Engine failed: Error invoking kernel: CUDA_ERROR_LAUNCH_FAILED (719) most Wu errored out on My 3 NVIDIA Titan any hints ? You have also # Engine failed: Particle coordinate is nan which is usually the result of to much overclocking, or your card has a failing memory chip. NaN on Wikipedia I don't think limiting your GPU usage will solve your errors. it's not a matter of percentage, but a matter of frequency that is causing tasks to fail when the wrapper starts the GPU. Some appear to be easier to crash than others. GPUGRID WUs are the most sensitive tasks I've seen to processor overclocking errors and I had to slow my GTX 1060 down when I came here even though it ran games and other BOINC projects OK. My errors were hit and miss like yours only not as many. they usually occurred at ~30 sec. Your base clock speed is 1000MHz per [url]https://www.geforce.com/hardware/desktop-gpus/geforce-gtx-titan-x/specifications [/url]
	ID: 54427 \| Rating: 0 \| rate: / Reply Quote

Aurum Send message Joined: 12 Jul 17 Posts: 401 Credit: 16,755,010,632 RAC: 180,567 Level Scientific publications	Message 54431 - Posted: 24 Apr 2020 \| 14:43:05 UTC - in response to Message 54411.
	I would try a app_config.xml <app_config> <app> <name>acemd3</name> <gpu_versions> <cpu_usage>1.0</cpu_usage> <gpu_usage>0.5</gpu_usage> </gpu_versions> </app> </app_config> but BOINC says: 22.04.2020 12:31:02 \| GPUGRID \| Your app_config.xml file refers to an unknown application 'acemd3'. Known applications: None Can someone tell me the right name please. Yours looks exactly like mine except I only run one GG WU per GPU. BOINC always says that when you don't have a acemd3 WU downloaded. Wow! Eight duplicates. My record was three. No worries, it happens to us all and I don't know why.
	ID: 54431 \| Rating: 0 \| rate: / Reply Quote

Richard Haselgrove Send message Joined: 11 Jul 09 Posts: 1620 Credit: 8,880,156,640 RAC: 20,004,259 Level Scientific publications	Message 54432 - Posted: 24 Apr 2020 \| 18:09:37 UTC
	Just got a "# Engine failed: Particle coordinate is nan" error on WU 19441088 - as have all my wingmates. I'm pretty sure that will be a mistake in the data prepared for the run, nothing to do with unstable GPUs.
	ID: 54432 \| Rating: 0 \| rate: / Reply Quote

Keith Myers Send message Joined: 13 Dec 17 Posts: 1341 Credit: 7,682,021,308 RAC: 13,131,179 Level Scientific publications	Message 54433 - Posted: 24 Apr 2020 \| 20:11:24 UTC - in response to Message 54432.
	Just got a "# Engine failed: Particle coordinate is nan" error on WU 19441088 - as have all my wingmates. I'm pretty sure that will be a mistake in the data prepared for the run, nothing to do with unstable GPUs. I concur. Not all NaN errors are the result of a misbehaving card. Sometimes the task is just badly formatted.
	ID: 54433 \| Rating: 0 \| rate: / Reply Quote

Retvari Zoltan Send message Joined: 20 Jan 09 Posts: 2343 Credit: 16,228,265,968 RAC: 2,203,425 Level Scientific publications	Message 54434 - Posted: 24 Apr 2020 \| 20:42:20 UTC - in response to Message 54433.
	Just got a "# Engine failed: Particle coordinate is nan" error on WU 19441088 - as have all my wingmates. I'm pretty sure that will be a mistake in the data prepared for the run, nothing to do with unstable GPUs. I concur. Not all NaN errors are the result of a misbehaving card. Sometimes the task is just badly formatted. This task is a 2ph7A01_348_3-TONI_MDADpr4sp-7-10-RND7696 So it's the 7th of 10 workunits. Perhaps the previous host made an error, which resulted in a permanent NaN error on all hosts.
	ID: 54434 \| Rating: 0 \| rate: / Reply Quote

Pop Piasa Send message Joined: 8 Aug 19 Posts: 252 Credit: 458,054,251 RAC: 0 Level Scientific publications	Message 54441 - Posted: 25 Apr 2020 \| 17:22:51 UTC - in response to Message 54432.
	Just got a "# Engine failed: Particle coordinate is nan" error on WU 19441088 - as have all my wingmates. I'm pretty sure that will be a mistake in the data prepared for the run, nothing to do with unstable GPUs. Hi Richard Haselgrove, Your 1660-S is not overclocked; Correct? It looks like we'll have to wait until that WU reaches the Apr 29 deadline on iBat's machine (after viewing it's task status), before seeing if it crashes again. I've been getting more tasks lately which have crashed on 1 or 2 other hosts before being sent to mine. I noticed several error prone machines were Science United and a few were grcpool hosts. Fascinating.
	ID: 54441 \| Rating: 0 \| rate: / Reply Quote

Pop Piasa Send message Joined: 8 Aug 19 Posts: 252 Credit: 458,054,251 RAC: 0 Level Scientific publications	Message 54442 - Posted: 25 Apr 2020 \| 17:43:59 UTC - in response to Message 54434.
	This task is a 2ph7A01_348_3-TONI_MDADpr4sp-7-10-RND7696 So it's the 7th of 10 workunits. Zoltan, I think you meant to write 8th of 10, as the first one is always named 0-10. Or am I confused? 🤔
	ID: 54442 \| Rating: 0 \| rate: / Reply Quote

Retvari Zoltan Send message Joined: 20 Jan 09 Posts: 2343 Credit: 16,228,265,968 RAC: 2,203,425 Level Scientific publications	Message 54444 - Posted: 25 Apr 2020 \| 18:08:08 UTC - in response to Message 54442.
	This task is a 2ph7A01_348_3-TONI_MDADpr4sp-7-10-RND7696 So it's the 7th of 10 workunits. Zoltan, I think you meant to write 8th of 10, as the first one is always named 0-10. Or am I confused? 🤔 You're right, it's the 8th. Probably the host doing the 7th piece made an error. (that's what I should post to correctly include the number 7 in my post)
	ID: 54444 \| Rating: 0 \| rate: / Reply Quote

Richard Haselgrove Send message Joined: 11 Jul 09 Posts: 1620 Credit: 8,880,156,640 RAC: 20,004,259 Level Scientific publications	Message 54447 - Posted: 26 Apr 2020 \| 7:46:05 UTC - in response to Message 54441.
	Hi Richard Haselgrove, Your 1660-S is not overclocked; Correct? Correct. I gave that machine a complete motherboard/CPU/RAM transplant at the end of January, and fitted two brand-new, identical, 1600-S GPUs. It's in a high airflow case with a Corsair modular power supply. I can do basic hardware work on computers, but I'm not a hardware specialist, so I bought the motherboard bundle pre-assembled and tested from a local trade supplier, with CPU cooler ready attached. It ran on SETI until that project stopped sending out new work (bad timing on my part!), and started working here at the beginning of April. Application details Tasks I think 4 errors, with 1167 completed tasks, indicates the machine is basically healthy. Two of the other errors reached the full 8 failures on all machines that attempted them, and one seems to have been a ghost that I never received.
	ID: 54447 \| Rating: 0 \| rate: / Reply Quote

robertmiles Send message Joined: 16 Apr 09 Posts: 503 Credit: 755,434,080 RAC: 186,180 Level Scientific publications	Message 54455 - Posted: 27 Apr 2020 \| 17:10:39 UTC - in response to Message 54422.
	admin pleasedelete the multiple messages If you try soon enough, you should be able to do part of the work by editing all but one of them down to just one character. Making most of them the same single character is likely to trigger an automatic process for hiding duplicate messages.
	ID: 54455 \| Rating: 0 \| rate: / Reply Quote

Pop Piasa Send message Joined: 8 Aug 19 Posts: 252 Credit: 458,054,251 RAC: 0 Level Scientific publications	Message 54491 - Posted: 29 Apr 2020 \| 3:18:46 UTC - in response to Message 54447.
	Richard, you have less errors than I do, I think. Looking at how long this WU runs before throwing an error, It probably is a bad egg. I see all the other defective ones I've gotten last around 10-20 seconds before bombing. The errors I saw from instability were a bit later, around 30 seconds. I figure that it takes about that long for the wrapper to get its task running on the GPU. (That's also the approximate timing of getting a mismatched GPU error on restarts, the cause of most of my errors.) I do recall that recently I showed 10 errors and 5 of them were bad WUs. I'll lay my nickel on WU 19441088 being another dud task.💣
	ID: 54491 \| Rating: 0 \| rate: / Reply Quote

Pop Piasa Send message Joined: 8 Aug 19 Posts: 252 Credit: 458,054,251 RAC: 0 Level Scientific publications	Message 54501 - Posted: 30 Apr 2020 \| 17:43:23 UTC - in response to Message 54491.
	I do recall that recently I showed 10 errors and 5 of them were bad WUs. I'll lay my nickel on WU 19441088 being another dud task.💣 ...And sure enough, it lasted no longer than 16 seconds before it choked on everybody's hosts. Heinz is getting errors at later stages of the tasks than we are experiencing them when running bad WUs. https://www.gpugrid.net/results.php?hostid=159065 I have had errors before that were caused by running short of memory, although I see that is not a problem in Heinz's case. I had 7 Rosetta threads and two GPUGRID wrappers running in 8 GB of ram with 8182MB swapfile. Every time a Rosetta COVID task would suddenly hog memory, one of the wrappers would give a message that an output file could not be found (can't remember which) and throw an error. I've since increased to 12GB of ram and solved that issue. I had a PSU failure on my fast host today (a recycled 600W cheapo from the days of molex connectors) and it makes me wonder if Heinz might have power issues with his 3 GTX Titans in one host. Just a thought, but if they're clocked higher than factory specs IMHO that is the first thing to suspect. 🤔
	ID: 54501 \| Rating: 0 \| rate: / Reply Quote

Keith Myers Send message Joined: 13 Dec 17 Posts: 1341 Credit: 7,682,021,308 RAC: 13,131,179 Level Scientific publications	Message 54502 - Posted: 30 Apr 2020 \| 19:35:00 UTC - in response to Message 54447.
	It ran on SETI until that project stopped sending out new work (bad timing on my part!) Ha ha LOL. I did the same thing. Rebuilt completely/upgraded the 3900X host for Seti and put it back online a few days before Seti pulled the plug. Now it just sits there, idle, looking pretty.
	ID: 54502 \| Rating: 0 \| rate: / Reply Quote

Pop Piasa Send message Joined: 8 Aug 19 Posts: 252 Credit: 458,054,251 RAC: 0 Level Scientific publications	Message 54503 - Posted: 30 Apr 2020 \| 20:37:48 UTC - in response to Message 54501.
	I have had errors before that were caused by running short of memory, although I see that is not a problem in Heinz's case. I had 7 Rosetta threads and two GPUGRID wrappers running in 8 GB of ram with 8182MB swapfile. Every time a Rosetta COVID task would suddenly hog memory, one of the wrappers would give a message that an output file could not be found (can't remember which) and throw an error... ...Which made me curious what that particular host is running on the CPU. I see that _heinz has recently switched to running World Community Grid- https://boinc.netsoft-online.com/e107_plugins/boinc/get_user.php?cpid=5e024335320e436c4d050e073963e326 Does anyone here know how much memory those tasks use? I found that LHC@home tasks were too memory hungry to run at 2GB of ram per thread. That might be an issue here.
	ID: 54503 \| Rating: 0 \| rate: / Reply Quote

Keith Myers Send message Joined: 13 Dec 17 Posts: 1341 Credit: 7,682,021,308 RAC: 13,131,179 Level Scientific publications	Message 54504 - Posted: 30 Apr 2020 \| 21:45:10 UTC - in response to Message 54503.
	I recently discovered a website with the ability to dive deep into the data for all the BOINC projects. This page has the RAM requirements for all the projects' CPU apps. http://wuprop.boinc-af.org/results/ram.py
	ID: 54504 \| Rating: 0 \| rate: / Reply Quote

Retvari Zoltan Send message Joined: 20 Jan 09 Posts: 2343 Credit: 16,228,265,968 RAC: 2,203,425 Level Scientific publications	Message 54505 - Posted: 30 Apr 2020 \| 22:58:22 UTC - in response to Message 54504. Last modified: 30 Apr 2020 \| 22:59:09 UTC
	I recently discovered a website with the ability to dive deep into the data for all the BOINC projects. This page has the RAM requirements for all the projects' CPU apps. http://wuprop.boinc-af.org/results/ram.py I wondered how they got their data. I realized the answer when I browsed to the root of this site: http://wuprop.boinc-af.org/ This is actually a BOINC project that collects data about the apps of other BOINC projects as you run it alongside them. Nice! It's 10 years old, and I can't recall ever hearing about it. I'm shocked.
	ID: 54505 \| Rating: 0 \| rate: / Reply Quote

Aurum Send message Joined: 12 Jul 17 Posts: 401 Credit: 16,755,010,632 RAC: 180,567 Level Scientific publications	Message 54508 - Posted: 1 May 2020 \| 1:37:01 UTC - in response to Message 54504.
	I recently discovered a website with the ability to dive deep into the data for all the BOINC projects. This page has the RAM requirements for all the projects' CPU apps. http://wuprop.boinc-af.org/results/ram.py Nice page. Pretty close to my values. I always leave some headroom, e.g. LHC ATLAS needs 2 GB. Rosetta is a problem as they stuff every project they have into one queue. Some need much more RAM than others, so this is clearly an average that would benefit from knowing its standard deviation. Most of the time 0.8 GB is enough, but a couple of projects use a good bit more, so reserve a minimum of 1 GB and it'll come out in the wash.
	ID: 54508 \| Rating: 0 \| rate: / Reply Quote
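
The headroom rule of thumb in the post above can be sketched as a quick budget check. This is only an illustration: the 2 GB (LHC ATLAS) and 1 GB (Rosetta) figures come from the post, while the function name `fits_in_ram` and the 2 GB `reserve_gb` set aside for the OS and GPU wrappers are assumptions made up for this example.

```python
# Per-task RAM budget in GB. The two figures are the ones quoted in the post;
# they are rules of thumb, not measured limits.
RAM_PER_TASK_GB = {"LHC ATLAS": 2.0, "Rosetta": 1.0}


def fits_in_ram(task_counts, total_ram_gb, reserve_gb=2.0):
    """Return (needed_gb, fits) for a planned mix of concurrent CPU tasks.

    reserve_gb is a hypothetical allowance for the OS and other processes.
    """
    needed = sum(RAM_PER_TASK_GB[name] * n for name, n in task_counts.items())
    return needed, needed + reserve_gb <= total_ram_gb


# Seven Rosetta tasks on an 8 GB host versus a 12 GB host.
for total in (8.0, 12.0):
    needed, ok = fits_in_ram({"Rosetta": 7}, total_ram_gb=total)
    print(f"{total:.0f} GB host: need {needed:.1f} GB for tasks, fits={ok}")
```

Under these assumed numbers, seven Rosetta tasks do not fit on an 8 GB host once the reserve is counted, but they do fit in 12 GB, which matches the experience reported earlier in the thread.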

ServicEnginIC Send message Joined: 24 Sep 10 Posts: 581 Credit: 9,815,362,024 RAC: 20,602,106 Level Scientific publications	Message 54510 - Posted: 1 May 2020 \| 10:20:33 UTC - in response to Message 54504.
	I recently discovered a website with the ability to dive deep into the data for all the BOINC projects. This page has the RAM requirements for all the projects' CPU apps. http://wuprop.boinc-af.org/results/ram.py Thank you for sharing this! Not only did I like it, but I also joined WUProp@Home and I'm running my first task... It seems to be very light on resources:
	Application: Data collect version 4 4.25 (nci)
	Name: data_collect_v4_1586607902_360862
	State: Running (non-CPU-intensive)
	Received: Fri 01 May 2020 10:34:46 WEST
	Report deadline: Fri 08 May 2020 10:34:45 WEST
	Estimated computation size: 1,000 GFLOPs
	CPU time: 00:00:05
	CPU time since checkpoint: 00:00:00
	Elapsed time: 00:25:15
	Estimated time remaining: 05:25:27
	Fraction done: 7.202%
	Virtual memory size: 9.68 MB
	Working set size: 7.41 MB
	Directory: slots/4
	Process ID: 28681
	Progress rate: 16.560% per hour
	Executable: data_collect_v4_425_x86_64-pc-linux-gnu__nci
	ID: 54510 \| Rating: 0 \| rate: / Reply Quote

Pop Piasa Send message Joined: 8 Aug 19 Posts: 252 Credit: 458,054,251 RAC: 0 Level Scientific publications	Message 54568 - Posted: 3 May 2020 \| 23:24:27 UTC
	Unfortunately, it appears that _heinz has given up on GPUGRID, looking at his task list. Regrettable, as he is a veteran cruncher. I see that some of his errors were detected memory leaks. That might hint at what Zoltan wrote about failing memory being one cause of NaN errors. You have also # Engine failed: Particle coordinate is nan which is usually the result of too much overclocking, or your card has a failing memory chip.
	ID: 54568 \| Rating: 0 \| rate: / Reply Quote
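
To illustrate why a single corrupted value is enough to kill a whole task, here is a toy sketch of NaN propagation. It is not ACEMD code: the Euler `step` function, the `check_finite` helper and all the numbers are hypothetical, and the error text only loosely mimics the message quoted in the thread.

```python
import math


def step(positions, velocities, dt):
    """Advance each coordinate by one explicit Euler step (toy integrator)."""
    return [x + v * dt for x, v in zip(positions, velocities)]


def check_finite(positions):
    """Raise if any coordinate is NaN."""
    for i, x in enumerate(positions):
        if math.isnan(x):
            raise FloatingPointError(f"Particle coordinate is nan (index {i})")


positions = [0.0, 1.0, 2.0]
# One velocity is corrupted, standing in for a bad memory cell or an unstable clock.
velocities = [0.1, float("nan"), 0.3]

# Any arithmetic involving NaN yields NaN, so the bad value spreads to the position.
positions = step(positions, velocities, dt=0.5)

try:
    check_finite(positions)
except FloatingPointError as err:
    print(f"Engine failed: {err}")
```

Once a NaN appears it never turns back into a number, so a simulation can only detect it and abort, which is why such tasks error out instead of finishing with slightly wrong results.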

Pop Piasa Send message Joined: 8 Aug 19 Posts: 252 Credit: 458,054,251 RAC: 0 Level Scientific publications	Message 54570 - Posted: 4 May 2020 \| 2:27:47 UTC - in response to Message 54510. Last modified: 4 May 2020 \| 2:39:13 UTC
	ServicEnginIC wrote: Thank you for sharing this! Not only did I like it, but I also joined WUProp@Home and I'm running my first task... It seems to be very light on resources: That goes for me too! 👍👍 The more of us that contribute, the more accurate the statistics. "(non-CPU-intensive)" = extra BOINC credit without stopping anything else to do it... Genius! And we get more stinking badges, guys. 🥇😊
	ID: 54570 \| Rating: 0 \| rate: / Reply Quote

[AF] fansyl Send message Joined: 26 Sep 13 Posts: 20 Credit: 1,714,356,441 RAC: 0 Level Scientific publications	Message 54572 - Posted: 4 May 2020 \| 11:43:13 UTC - in response to Message 54505. Last modified: 4 May 2020 \| 11:44:30 UTC
	I recently discovered a website with the ability to dive deep into the data for all the BOINC projects. This page has the RAM requirements for all the projects' CPU apps. http://wuprop.boinc-af.org/results/ram.py I wondered how they got their data. I realized the answer when I browsed to the root of this site: http://wuprop.boinc-af.org/ This is actually a BOINC project that collects data about the apps of other BOINC projects as you run it alongside them. Nice! It's 10 years old, and I can't recall ever hearing about it. I'm shocked. WUProp is a tool developed by an Alliance Francophone developer. Feel free to visit our forum, especially the dedicated thread: https://forum.boinc-af.org/index.php?topic=3438.new;topicseen#new ____________
	ID: 54572 \| Rating: 0 \| rate: / Reply Quote