
Message boards : Number crunching : All tasks failed with Exit status 195 (0xc3) EXIT_CHILD_FAILED

Greg _BE
Joined: 30 Jun 14
Posts: 135
Credit: 121,589,325
RAC: 27,413
Message 59018 - Posted: 23 Jul 2022 | 21:57:32 UTC
Last modified: 23 Jul 2022 | 22:01:54 UTC

There seems to be a bug in these tasks.
I'm seeing a 100% failure on my system and the wingmen behind me.
Windows 10 or 11 does not make a difference.
A Linux user also has this.
One of my tasks had 4-5 failures behind me.
On another task my first wingman failed, but he runs a 780, which lacks something in its firmware/software needed to run these tasks.
I have a 1080 and it failed. The last person had a 1050 and it ran ok.

I don't get what is going on and why this was not picked up in testing.

I find this to be a common error message in the stderr file: OSError: [WinError 1455] The paging file is too small for this operation to complete. Error loading "D:\data\slots\1\lib\site-packages\torch\lib\cudnn_cnn_infer64_8.dll" or one of its dependencies.
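
A minimal Python sketch for spotting this error in a saved stderr dump. The function name `find_pagefile_errors` is hypothetical and not part of BOINC or GPUGrid; it only looks for the `[WinError 1455]` text quoted above and pulls out the DLL path when one is named.

```python
import re


def find_pagefile_errors(stderr_text):
    """Return the DLL paths named on WinError 1455 lines of a stderr dump.

    A line that carries the error but names no DLL yields None in the list.
    """
    hits = []
    for line in stderr_text.splitlines():
        if "[WinError 1455]" not in line:
            continue
        match = re.search(r'Error loading "([^"]+)"', line)
        hits.append(match.group(1) if match else None)
    return hits


# The error line quoted in this post, used as sample input.
sample = r'''OSError: [WinError 1455] The paging file is too small for this operation to complete. Error loading "D:\data\slots\1\lib\site-packages\torch\lib\cudnn_cnn_infer64_8.dll" or one of its dependencies.'''

print(find_pagefile_errors(sample))
```

An empty list means the paging file error does not appear in the text, so a failure would have some other cause.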

Keith Myers
Joined: 13 Dec 17
Posts: 1352
Credit: 7,748,783,831
RAC: 11,873,712
Message 59019 - Posted: 23 Jul 2022 | 23:18:25 UTC - in response to Message 59018.

That is a problem with Windows and how it reserves memory when loading all the Python DLLs.

Linux does not have the issue.

See this message of mine. https://www.gpugrid.net/forum_thread.php?id=5322&nowrap=true#58908


The solution is to increase the size of your paging file.

jjch
Joined: 10 Nov 13
Posts: 101
Credit: 15,620,008,144
RAC: 4,718,646
Message 59020 - Posted: 24 Jul 2022 | 2:19:47 UTC - in response to Message 59019.
Last modified: 24 Jul 2022 | 2:29:42 UTC

I had to go back 6 tasks to find the one that failed with the paging file error. More recent tasks are having a different problem running out of memory somewhere.

Your system looks like it has 48GB of physical memory, so that should be sufficient to run the GPUgrid Python tasks unless there is a conflict with something else.

I have a Server running Win Server 2012 with the same amount of physical memory. The swap file is still set at "Automatically manage paging file size for all drives"

I left this one that way since it was working OK. With one GPUgrid Python task running it shows Currently allocated at 12800 MB, which is typical.

Check the free space available on your swap drive and make sure it has a minimum of 16GB available. If you have plenty of space there then I would suggest you set the swap space separately.

I have found that sometimes it seems the Automatic isn't fast enough so try setting it to System managed size first. If that doesn't help then set it to Custom size.

You might need to play with the sizing a bit, but you can try Initial size 16384 and Maximum size 24576 or more.

The last 5 tasks are failing with various not enough memory errors but the first traceback is something I have been seeing with a lot of the tasks failing.

Just make sure you are not running anything that is tying up too much memory and not leaving enough available for GPUgrid.

Other than that, there could be an internal error in the GPUgrid Python tasks causing it.

Greg _BE
Message 59025 - Posted: 24 Jul 2022 | 8:12:21 UTC - in response to Message 59020.
Last modified: 24 Jul 2022 | 8:17:11 UTC

I have a whole HDD set aside for BOINC with 303GB of space left.
All the data files are there.
I run FAH plus all the projects you see in my profile here.
I am just around 73% memory usage.

Disk setting is leave 20GB free
Memory setting is computer in use 90%
Not in use 98%
Leave non GPU in memory (yes)
Page/Swap use at most 90%

You would think with these settings it has more than enough space to do what it needs to do.

According to BoincTasks, the current task uses 1932 MB physical and 3632 MB virtual memory.
BOINC says the virtual size is 3.55 GB and the working set is 1.89 GB.


Checked again after maxing everything out and this error keeps repeating:
OSError: [WinError 1455] The paging file is too small for this operation to complete. Error loading "D:\data\slots\1\lib\site-packages\torch\lib\cudnn_cnn_infer64_8.dll" or one of its dependencies.
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "D:\data\slots\1\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "D:\data\slots\1\lib\multiprocessing\spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)

As for paging size, this seems to be an error in the code; I've opened up BOINC's limits to the max. I think there was a similar teething error with the Python CPU tasks at RAH, though not about paging size.


And after adjustments I get this:
Detected memory leaks!
Dumping objects ->
..\api\boinc_api.cpp(309) : {13199} normal block at 0x000001B0A0972890, 8 bytes long.
Data: < > 00 00 94 A0 B0 01 00 00
..\lib\diagnostics_win.cpp(417) : {11918} normal block at 0x000001B0A0998B40, 1080 bytes long.
Data: <<j 4 > 3C 6A 00 00 CD CD CD CD 34 01 00 00 00 00 00 00
..\zip\boinc_zip.cpp(122) : {397} normal block at 0x000001B0A09708F0, 260 bytes long.
Data: < > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
{383} normal block at 0x000001B0A096AA80, 52 bytes long.
Data: < r > 01 00 00 00 72 00 CD CD 00 00 00 00 00 00 00 00
{378} normal block at 0x000001B0A096ABD0, 43 bytes long.
Data: < p > 01 00 00 00 70 00 CD CD 00 00 00 00 00 00 00 00
{373} normal block at 0x000001B0A096AD90, 44 bytes long.
Data: < > 01 00 00 00 00 00 CD CD B1 AD 96 A0 B0 01 00 00
{368} normal block at 0x000001B0A096AD20, 44 bytes long.
Data: < A > 01 00 00 00 00 00 CD CD 41 AD 96 A0 B0 01 00 00
Object dump complete.
09:46:01 (13124): wrapper (7.9.26016): starting
09:46:01 (13124): wrapper: running python.exe (run.py)
Detected memory leaks!
Dumping objects ->
..\api\boinc_api.cpp(309) : {13134} normal block at 0x0000023C80BA32A0, 8 bytes long.
Data: < R < > 00 00 52 82 3C 02 00 00
..\lib\diagnostics_win.cpp(417) : {11853} normal block at 0x0000023C80BCF400, 1080 bytes long.
Data: <$2 P > 24 32 00 00 CD CD CD CD 50 01 00 00 00 00 00 00
..\zip\boinc_zip.cpp(122) : {397} normal block at 0x0000023C80BA3C60, 260 bytes long.
Data: < > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
{383} normal block at 0x0000023C80B9AA70, 52 bytes long.
Data: < r > 01 00 00 00 72 00 CD CD 00 00 00 00 00 00 00 00
{378} normal block at 0x0000023C80B9AC30, 43 bytes long.
Data: < p > 01 00 00 00 70 00 CD CD 00 00 00 00 00 00 00 00
{373} normal block at 0x0000023C80B9A840, 44 bytes long.
Data: < a < > 01 00 00 00 00 00 CD CD 61 A8 B9 80 3C 02 00 00
{368} normal block at 0x0000023C80B9A990, 44 bytes long.
Data: < < > 01 00 00 00 00 00 CD CD B1 A9 B9 80 3C 02 00 00
Object dump complete.

But then it goes on to start running.

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1620
Credit: 8,976,102,259
RAC: 18,018,150
Message 59026 - Posted: 24 Jul 2022 | 10:12:46 UTC

I posted some screenshots of paging file settings in message 58934. I'd had similar failures with only 8 GB system RAM installed: with 16 GB and those settings, the Python app ran, though it's not a very efficient use of that particular machine.

Greg _BE
Message 59027 - Posted: 24 Jul 2022 | 11:01:44 UTC - in response to Message 59026.

I've searched Windows and the net for how to do that, and nothing matches those screenshots; nothing from the net matches my Win 10 64-bit software either.

Can you tell me how to get to the tabs you did the screenshot of?

Greg _BE
Message 59028 - Posted: 24 Jul 2022 | 12:41:23 UTC

Found this info in boinc_task_state.xml

<project_master_url>https://www.gpugrid.net/</project_master_url>
<result_name>e00028a00502-ABOU_rnd_ppod_expand_demos6_again2-0-1-RND4470_2</result_name>
<checkpoint_cpu_time>31287.720000</checkpoint_cpu_time>
<checkpoint_elapsed_time>15281.828158</checkpoint_elapsed_time>
<fraction_done>0.059200</fraction_done>
<peak_working_set_size>2470195200</peak_working_set_size>
<peak_swap_size>6816833536</peak_swap_size>
<peak_disk_usage>17117387104</peak_disk_usage>

I am assuming these huge values are in bytes?
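
A small Python sketch for reading these values, assuming they are byte counts (which matches the 17GB figure given later in the thread). The helper names `peak_sizes`, `to_gb` and `to_gib` are made up for illustration; only the three `<peak_*>` tags come from the file excerpt above.

```python
import re

# The three size fields copied from the boinc_task_state.xml excerpt above.
FRAGMENT = """
<peak_working_set_size>2470195200</peak_working_set_size>
<peak_swap_size>6816833536</peak_swap_size>
<peak_disk_usage>17117387104</peak_disk_usage>
"""


def peak_sizes(fragment):
    """Map each <peak_*> tag in the fragment to its integer value."""
    pairs = re.findall(r"<(peak_\w+)>(\d+)</\1>", fragment)
    return {name: int(value) for name, value in pairs}


def to_gb(n_bytes):
    """Decimal gigabytes (10**9 bytes)."""
    return n_bytes / 1e9


def to_gib(n_bytes):
    """Binary gibibytes (2**30 bytes), the unit Windows usually labels 'GB'."""
    return n_bytes / 2**30


for name, size in peak_sizes(FRAGMENT).items():
    print(f"{name}: {to_gb(size):.2f} GB ({to_gib(size):.2f} GiB)")
```

Read this way, the peak disk usage is a little over 17 GB, the peak swap size is under 7 GB, and the peak working set is under 2.5 GB.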

Richard Haselgrove
Message 59029 - Posted: 24 Jul 2022 | 13:59:55 UTC - in response to Message 59027.

All these low-level Windows management tools have barely changed since Windows NT 4 days, but the roadmap for finding them changes every time. The ones I posted were from Windows 7, but here's the routing for Windows 11 - split the difference...

[Screenshots of the Windows 11 dialogs appeared here in the original post.]

For the final one, unset the first and third ('Automatic' and 'System' management), and set 'Custom' to open up all the options.

Greg _BE
Message 59030 - Posted: 24 Jul 2022 | 15:25:56 UTC - in response to Message 59029.

After a little trial and error I found a way to that location.
Set it to 144MB (3x physical) to start and gave it 154MB max.
We'll see if this helps anything.

Keith Myers
Message 59031 - Posted: 24 Jul 2022 | 16:57:00 UTC - in response to Message 59030.

That's way undersized. It should be GBs, not MBs.

From your task data: <peak_disk_usage>17117387104</peak_disk_usage>

That is 17GB of disk usage.

I would set 17GB (17000MB) for the initial size and double it, 34GB (34000MB), for the max size.
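
The rule of thumb above can be sketched in Python. This is only an illustration of the arithmetic in this post (round the peak to whole GB, enter it in MB, double it for the maximum); the function name `pagefile_sizes_mb` is hypothetical and the rule itself is a forum suggestion, not an official GPUGrid requirement.

```python
def pagefile_sizes_mb(peak_bytes):
    """Initial and maximum paging file sizes in MB, per the rule of thumb above.

    The Windows dialog takes megabytes, so a size meant as GB has to be
    multiplied by 1000 before it is typed in.
    """
    peak_gb = round(peak_bytes / 1e9)   # 17117387104 bytes is about 17 GB
    initial_mb = peak_gb * 1000         # value for the "Initial size (MB)" box
    maximum_mb = 2 * initial_mb         # value for the "Maximum size (MB)" box
    return initial_mb, maximum_mb


print(pagefile_sizes_mb(17117387104))
```

For the task quoted here this gives 17000 and 34000, the two numbers suggested above.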

Greg _BE
Message 59032 - Posted: 24 Jul 2022 | 21:54:03 UTC - in response to Message 59031.
Last modified: 24 Jul 2022 | 21:58:31 UTC

oh! thanks...will make the change
170000 and 340000

Greg _BE
Message 59033 - Posted: 25 Jul 2022 | 20:37:44 UTC

Well that seems to have solved the problem on my Win10 machine.
2 tasks run and completed ok.

Thanks Keith!

Curious, though: why, if it has too much space, does it error out, but only here and not in other projects?

Keith Myers
Message 59034 - Posted: 26 Jul 2022 | 2:17:38 UTC - in response to Message 59033.

Go back and read this post of mine.

https://www.gpugrid.net/forum_thread.php?id=5322&nowrap=true#58908

It only affects projects that use PyTorch on Windows, because PyTorch ships large DLLs that Windows MUST reserve a lot of memory for.

I don't think there are any other BOINC projects that use PyTorch.

So they are not affected.

Greg _BE
Message 59035 - Posted: 26 Jul 2022 | 18:57:46 UTC - in response to Message 59034.
Last modified: 26 Jul 2022 | 19:50:53 UTC

I have never heard of that. I wondered what that was.
So after reading that, it explains why no Python GPU, or anything on GPU, is used at my oldest project, RAH. They have Python CPU tasks to run, generated by an external client, but that's about it for us BOINC users. They keep all the really interesting stuff in-house for the AI system.

Keith Myers
Message 59036 - Posted: 27 Jul 2022 | 0:40:49 UTC
Last modified: 27 Jul 2022 | 0:55:40 UTC

Once again, GPUGrid is on the cutting edge of GPU science for BOINC projects with its machine learning and AI development. They were the first BOINC project to use GPUs. I like that they are still pushing the envelope.

The only other machine learning BOINC project I know about is MLC@home and they only use cpus now. Had a gpu app a few years ago but I don't think they are producing any tasks for gpus currently.

Greg _BE
Message 59037 - Posted: 27 Jul 2022 | 19:36:56 UTC

I like projects that push the boundaries and look for stuff that has not been done before, either in code or in ideas of what to send out for crunching.

Jim1348
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Message 59038 - Posted: 27 Jul 2022 | 21:19:42 UTC

Today, it is impossible for a human to take into account the results, even limited to the most important data, for millions of known molecules. The second objective of this project is to radically change the approach developing artificial intelligence and optimization methods in order to explore efficiently the highly combinatorial molecular space.

https://quchempedia.univ-angers.fr/athome/about.php

QuChemPedIA is an AI project, though CPU only. And it works best with Linux. You can use Windows with VirtualBox, but there are a lot of stuck work units you have to deal with.

Greg _BE
Message 59045 - Posted: 28 Jul 2022 | 18:02:08 UTC - in response to Message 59038.
Last modified: 28 Jul 2022 | 18:02:28 UTC

I know it, and for that exact reason and other technical errors, I gave up.
I can't get it to run stably on my Windows system, so forget it.

The GPUs get enough action with this project, PrimeGrid and FAH, as well as Einstein.

I think I am attached to enough projects to keep this system busy all the time it runs (16 hours a day).

Greg _BE
Message 59046 - Posted: 29 Jul 2022 | 8:46:48 UTC

so...a new wrinkle.
I have two tasks running at the same time and RAH is complaining about disk space with the CPU Python.
I've maxed out the upper value.
rosetta python projects needs 3624.20MB more disk space. You currently have 15449.28 MB available and it needs 19073.49 MB.

So what do I have to do? I suppose I will have to restrict this project to 1 GPU in order to solve this disk space problem?
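
The numbers in that message can be checked with a short Python sketch. It assumes BOINC's "MB" here means 1024 x 1024 bytes, in which case 19073.49 MB works out to roughly 20 GB (decimal). The helper names `shortfall_mb` and `free_mb` are hypothetical; `shutil.disk_usage` only reports what the drive has free, not what BOINC's own disk limits allow.

```python
import shutil

MIB = 1024 * 1024  # assumption: BOINC's "MB" in this message is 2**20 bytes


def shortfall_mb(available_mb, needed_mb):
    """How much more space is needed, or 0.0 when there is already enough."""
    return max(0.0, needed_mb - available_mb)


def free_mb(path):
    """Free space on the drive holding `path`, in the same MB unit."""
    return shutil.disk_usage(path).free / MIB


# Figures quoted in the message: available and needed space.
print(f"shortfall: {shortfall_mb(15449.28, 19073.49):.2f} MB")
print(f"free on this drive: {free_mb('.'):.0f} MB")
```

If the drive itself has far more free space than the "available" figure, the limit is coming from BOINC's disk settings rather than from the disk.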

Richard Haselgrove
Message 59047 - Posted: 29 Jul 2022 | 9:21:50 UTC - in response to Message 59046.

Disk space limits can be solved by tweaking BOINC's limits.

They're quite separate and distinct from the memory (RAM) problems you were having here earlier.

Greg _BE
Message 59048 - Posted: 29 Jul 2022 | 15:52:47 UTC - in response to Message 59047.

Ok thanks...fixed

Greg _BE
Message 59050 - Posted: 30 Jul 2022 | 8:18:40 UTC

New problem... I stopped last night with 98% done and about an hour and a half to go on the task. I do all the normal shutdown procedures: suspend all computing, shut down the client, exit the program. When I restarted this morning the task had gone to hell: time to finish 159 days, 2% done, and time remaining counting UP and not down.

CPU time: 6d 11:39:36
CPU time since checkpoint: 00:14:10
Elapsed time: 3d 06:14:26
Estimated time remaining: 159d 17:47:48
Fraction done: 2.000%

Now after several restarts the time remaining goes down, but still 159 days.


I had another task that was also close to done, but the server considered it timed out. I guess I missed the deadline.

I'll let this task run for a bit longer, but to me it looks all messed up.
I don't see anything wrong in stderr or boinc_task_state
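
The 159-day figure is consistent with a simple linear extrapolation from the elapsed time and the reported fraction done. The Python sketch below shows that arithmetic; it is an assumption about where the number comes from, not a description of BOINC's actual estimator, and the function names are made up for illustration.

```python
def naive_remaining_seconds(elapsed_seconds, fraction_done):
    """Linear extrapolation: if `fraction_done` took `elapsed_seconds`,
    the rest is assumed to proceed at the same rate."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_seconds * (1 - fraction_done) / fraction_done


def format_duration(seconds):
    """Format seconds the way the task properties show them: Nd HH:MM:SS."""
    days, rest = divmod(int(round(seconds)), 86400)
    hours, rest = divmod(rest, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{days}d {hours:02d}:{minutes:02d}:{secs:02d}"


elapsed = 3 * 86400 + 6 * 3600 + 14 * 60 + 26   # "3d 06:14:26" from above
print(format_duration(naive_remaining_seconds(elapsed, 0.02)))
print(format_duration(naive_remaining_seconds(elapsed, 0.98)))
```

With the fraction done reset to 2%, three days of elapsed time extrapolate to about 159 days and 17 hours remaining, which is what the task properties show. With the real 98% it comes to well under two hours.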

Greg _BE
Message 59051 - Posted: 30 Jul 2022 | 10:01:14 UTC - in response to Message 59050.

It settled down now. 47 minutes left.

Greg _BE
Message 59052 - Posted: 30 Jul 2022 | 10:05:52 UTC

Different question: when looking at the CPU% in the BoincTasks program, why do I see 197% and 131% CPU usage? Is that just how these tasks work?
I thought CPU was for control and guidance only? This almost looks like it is processing as well.

Keith Myers
Message 59053 - Posted: 31 Jul 2022 | 1:19:35 UTC - in response to Message 59052.

It is normal for tasks to temporarily revert to 2% completion upon restart.

But within just a few minutes they jump back to the completion percentage they had reached when they were stopped.

And then continue on till finish.

At least that is what they always do on all my Linux hosts.

But I have seen similar comments from others running Windows. Probably best not to chance stopping them on Windows.

The application does in fact use the cpu. Quite a bit in fact. The task will jump back and forth from running on the cpu to a quick spurt on the gpu and then back to the cpu.

The tasks spawn 32 individual python processes on the cpu so you are really using more than 100% of a single cpu core. That is what BoincTasks is detecting and showing.

From Message 59980:
The reason Reinforcement Learning agents do not currently use the whole potential of the cards is because the interactions between the AI agent and the simulated environment are performed on CPU while the agent "learning" process is the one that uses the GPU intermittently.

Billy Ewell 1931
Joined: 22 Oct 10
Posts: 40
Credit: 1,556,114,862
RAC: 2,449,432
Message 59057 - Posted: 4 Aug 2022 | 17:43:39 UTC - in response to Message 59018.

The failure rate on the GPU tasks has reached the point where I feel it is a waste to even try to explain the processes of the failures: 97 out of 101 tasks have failed on either a GTX 1060 or an RTX 3080 and I aborted the RTX task after it wasted 5 days+ of running time, exceeded the return time limit, and still had double-digit days remaining. The three tasks that succeeded used only about 1800 to 3500 seconds of run time.

My patience has expired and I am terminating tasking on Grid for a couple of weeks or so and perhaps the problem can be solved using internal GPUs.

Added Comment: Just for the hell of it, I downloaded a new task just now on the GTX 1060 machine and the initial time to compute was shown as 30 DAYS; OH SURE!!! This does not constitute a sound confidence builder.

Billy Ewell 1931 (Yes, my year of birth)



Keith Myers
Message 59058 - Posted: 4 Aug 2022 | 21:20:06 UTC - in response to Message 59057.

Sorry to see you go.

The estimated time to complete values can be completely ignored at GPUGrid.

BOINC has no mechanism for computing the time remaining for tasks with this dual CPU-GPU nature, so it cannot estimate the time to complete correctly.

On modern GPUs of at least the Pascal generation, the tasks complete well within the standard 5-day deadline. Typical compute times are around 20 minutes to 12 hours.

Windows needs to be set up correctly however to run these tasks properly.

The Windows pagefile size needs to be increased to around 35-50GB for the tasks to run and finish properly.
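
A rough way to check the current setting from Python, as a sketch only. It calls the Win32 `GlobalMemoryStatusEx` function through `ctypes` and treats the commit limit minus physical RAM as an approximation of the paging file size. That subtraction is an assumption made here for illustration (the reported limit can be per-process and smaller than the system-wide one), and the helper names `approx_pagefile_bytes` and `meets_target` are hypothetical. On non-Windows systems the function simply returns None.

```python
import ctypes
import sys


class MEMORYSTATUSEX(ctypes.Structure):
    """Layout of the Win32 MEMORYSTATUSEX structure."""
    _fields_ = [
        ("dwLength", ctypes.c_uint32),
        ("dwMemoryLoad", ctypes.c_uint32),
        ("ullTotalPhys", ctypes.c_uint64),
        ("ullAvailPhys", ctypes.c_uint64),
        ("ullTotalPageFile", ctypes.c_uint64),   # commit limit, not file size
        ("ullAvailPageFile", ctypes.c_uint64),
        ("ullTotalVirtual", ctypes.c_uint64),
        ("ullAvailVirtual", ctypes.c_uint64),
        ("ullAvailExtendedVirtual", ctypes.c_uint64),
    ]


def approx_pagefile_bytes():
    """Commit limit minus physical RAM on Windows; None on other systems."""
    if sys.platform != "win32":
        return None
    status = MEMORYSTATUSEX()
    status.dwLength = ctypes.sizeof(MEMORYSTATUSEX)
    if not ctypes.windll.kernel32.GlobalMemoryStatusEx(ctypes.byref(status)):
        raise ctypes.WinError()
    return max(0, status.ullTotalPageFile - status.ullTotalPhys)


def meets_target(pagefile_bytes, target_gb=35):
    """True when the paging file is at least `target_gb` decimal gigabytes."""
    return pagefile_bytes >= target_gb * 1e9


size = approx_pagefile_bytes()
if size is None:
    print("Not running on Windows; nothing to check.")
else:
    print(f"approx. paging file: {size / 1e9:.1f} GB, ok={meets_target(size)}")
```

The default target of 35 GB is the lower end of the range given above. The Virtual Memory dialog remains the place to read and change the actual setting.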

Greg _BE
Message 59059 - Posted: 4 Aug 2022 | 22:36:10 UTC - in response to Message 59057.

Billy, scroll down this thread a bit.
There is a post where Keith gives some upper and lower limits to the page file size. This cleared things up for me really fast.

I run a 1080 and a 1050 and once I did the page file setting I have never had an error on either card. Run time is about 3 days on these cards, but I am sharing them with Folding At Home, so that might slow things down a bit.

Keith Myers
Message 59060 - Posted: 5 Aug 2022 | 3:32:42 UTC - in response to Message 59059.

Thanks for the confirmation Greg that the Python tasks CAN in fact be properly run to completion well within their deadlines AS LONG as Windows is configured correctly.

Glad to hear you are successfully processing this new work and contributing to cutting edge science.

kotenok2000
Joined: 18 Jul 13
Posts: 78
Credit: 131,375,138
RAC: 911,430
Message 59072 - Posted: 6 Aug 2022 | 20:28:50 UTC - in response to Message 59045.
Last modified: 6 Aug 2022 | 20:29:26 UTC

Try installing BOINC on Rocky Linux 8 in VMware Workstation Player. It is free for home use.

Greg _BE
Message 59075 - Posted: 7 Aug 2022 | 13:54:34 UTC - in response to Message 59060.

Just chugging along now. Once that swap space issue was taken care of, no problems. This is a Win10 machine with AMD Ryzen.

God is Love, Jesus proves...
Joined: 23 Mar 15
Posts: 1
Credit: 21,695,263
RAC: 5,680
Message 59089 - Posted: 9 Aug 2022 | 19:05:17 UTC

Adria, please fix the bug in your WUs.
error code 195

Richard Haselgrove
Message 59090 - Posted: 9 Aug 2022 | 19:10:10 UTC - in response to Message 59089.

(unknown error) - exit code 195 (0xc3)</message>
<stderr_txt>
23:23:38 (19824): wrapper (7.9.26016): starting
23:23:38 (19824): wrapper: running bin/acemd3.exe (--boinc --device 0)
ACEMD failed:
Error invoking kernel: CUDA_ERROR_LAUNCH_FAILED (719)
23:50:50 (19824): bin/acemd3.exe exited; CPU time 1611.078125

A GeForce GTX 1660 Ti should be OK: check your drivers.

Billy Ewell 1931
Message 59103 - Posted: 12 Aug 2022 | 1:58:25 UTC - in response to Message 59059.

Keith: Thanks for the input but I am personally cautious in changing items for fear I will screw up what I cannot fix.

Here are the current paging file settings. They are on automatic and I have changed nothing so far. This is as currently specified:

Minimum allowed----16 MB

Recommended--------4957 MB

Currently----------45056 MB

As I understand it, the suggestion is that I unclick the automatic setting option and set the Minimum to 35 and the others to ?????.

Await your reply: Bill

kotenok2000
Message 59106 - Posted: 12 Aug 2022 | 12:21:16 UTC - in response to Message 59103.

Try setting it to 51200.

Keith Myers
Message 59108 - Posted: 12 Aug 2022 | 15:31:56 UTC - in response to Message 59103.

Those settings pages are denominated in MB, not GB, and the Python tasks need sizes in the GB range.

So you need to multiply your 35 by 1000, in other words enter 35000 MB.

Billy Ewell 1931
Message 59135 - Posted: 19 Aug 2022 | 17:59:12 UTC

Keith Myers and Kotenok2000:

Once I reset the pagefiles to the recommended values I have processed bunches of tasks without a skip. Thanks for the great advice. BET

The bottom number is 35000MB and the top is 51200MB.

It would seem practical to me for the admins/techs to incorporate the pagefiles criteria in such a way that all contributors will find it easy to find the instructions and likewise easy to modify their machines.

Keith Myers
Message 59137 - Posted: 19 Aug 2022 | 23:05:57 UTC

I just made a post in the FAQ section about the pagefile mod needed for Python tasks.

Just need an admin to make it sticky.

Life v lies: Dont be a DN...
Joined: 14 Feb 20
Posts: 16
Credit: 27,395,983
RAC: 643
Message 59151 - Posted: 23 Aug 2022 | 16:31:52 UTC

If the GPUGrid project is willing to ask for and accept in-kind donations of people's GPU time, then GPUGrid has an obligation to do what it can to resolve problematic tasks and code.

If WUs require mods to the defaults in config files, etc., people should NOT have to hunt around in forum posts to glean a solution.

BOINC Manager does have a Notices tab, and it is negligent of GPUGrid not to post the needed instructions there, or at least a direct link to the specific forum post, for resolution,
...in particular when the problem is not isolated to just a few PCs.

Other projects DO extend the courtesy of communicating via the Notices tab.

LLP, PhD, Prof. Engr.
