Message boards : Number crunching : something changed linux: chemestry all erroring out
Author | Message |
---|---|
I re-enabled quantum chemistry on 3 of my Linux systems for the first time in over a month and it appears all are going to error out.
<core_client_version>7.8.3</core_client_version> <![CDATA[ <message> process exited with code 195 (0xc3, -61)</message> <stderr_txt> 01:21:33 (24790): wrapper (7.7.26016): starting 01:21:33 (24790): wrapper (7.7.26016): starting 01:21:33 (24790): wrapper: running ../../projects/www.gpugrid.net/Miniconda3-4.3.30-Linux-x86_64.sh (-b -u -p /var/lib/boinc-client/projects/www.gpugrid.net/miniconda) Python 3.6.3 :: Anaconda, Inc. cat: /var/lib/boinc-client/projects/www.gpugrid.net/miniconda/.messages.txt: No such file or directory 01:21:45 (24790): miniconda-installer exited; CPU time 10.905773 01:21:45 (24790): wrapper: running /var/lib/boinc-client/projects/www.gpugrid.net/miniconda/bin/python (pre_script.py) CondaError: FileNotFoundError(2, 'No such file or directory') CondaError: FileNotFoundError(2, 'No such file or directory') CondaError: FileNotFoundError(2, 'No such file or directory') Traceback (most recent call last): File "pre_script.py", line 13, in <module> raise Exception("Error installing h5py") Exception: Error installing h5py 01:25:43 (24790): $PROJECT_DIR/miniconda/bin/python exited; CPU time 90.551737 01:25:43 (24790): app exit status: 0x1 01:25:43 (24790): called boinc_finish(195) </stderr_txt> ]]>
| |
ID: 49426 | Rating: 0 | rate: / Reply Quote | |
All QC tasks error on my 2 Linux boxes, one with Opteron 1210 and the other AMD E-450, both running SuSE Linux Leap 42.3. | |
ID: 49431 | Rating: 0 | rate: / Reply Quote | |
All QC tasks error on my 2 Linux boxes, one with Opteron 1210 and the other AMD E-450, both running SuSE Linux Leap 42.3. They might error at first but if you let it run it might end up working. | |
ID: 49434 | Rating: 0 | rate: / Reply Quote | |
These were 16 thread dual Xeon and at one time I recall having 14 threads assigned. Now I see only 4 and something is not working as before. These are small systems with 32gb flash drives and running 16.10 but that was not a problem before. There's a bug in the app, which prevents more than 1 task starting simultaneously. When the task started successfully, you can start another task manually. Since there's no automated way for this, you should pause all but one task before you shut down your computer, then start them one by one. The other option is to limit the concurrently running QC apps to 1. Since this app uses only 4 threads (cores) you should utilize your other CPU cores with a different project. To do this you should create / modify your app_config.xml file in the projects\www.gpugrid.net folder. <app_config>
<app>
<name>QC</name>
<max_concurrent>1</max_concurrent>
</app>
<app_version>
<app_name>QC</app_name>
<plan_class>mt</plan_class>
<avg_ncpus>4</avg_ncpus>
</app_version>
</app_config> | |
ID: 49435 | Rating: 0 | rate: / Reply Quote | |
All QC tasks error on my 2 Linux boxes, one with Opteron 1210 and the other AMD E-450, both running SuSE Linux Leap 42.3. I've run nine of them, all failed. Tullio | |
ID: 49436 | Rating: 0 | rate: / Reply Quote | |
All QC tasks error on my 2 Linux boxes, one with Opteron 1210 and the other AMD E-450, both running SuSE Linux Leap 42.3. I've had 9 fail consecutively and eventually they start running without error. Leave it running for say an hour and see if it works. It all has to do with the consecutive start bug. | |
ID: 49438 | Rating: 0 | rate: / Reply Quote | |
These were 16 thread dual Xeon and at one time I recall having 14 threads assigned. Now I see only 4 and something is not working as before. These are small systems with 32gb flash drives and running 16.10 but that was not a problem before. I wonder if this is similar to a problem I have at MilkyWay. My AMD S9100 runs FP64 at 2600 flops, triple what an HD7950 can do (717), but the more concurrent programs I run, exponentially more "invalid" work units are generated. I keep the number of concurent work units to only 3 to prevent too many invalids from haveing to be farmed out to other wingmen. OTOH, my genuine HD7950s can process 5 with not a single invalid. Other users report the same exact problem with S9150. I will try your app_config with 14 thread instead of 4 and see what happens. I have 14 available. | |
ID: 49443 | Rating: 0 | rate: / Reply Quote | |
Nope, did not work for me. I suspended all projects then allowed only one 14 cpu task to run and it quit with an error. I then allowed another still same problem. I changed the 4 above to 14 as that had worked before and issued a "read app_config", saw the 14 show up but it crashed. Looks like same error as before, this computer http://www.gpugrid.net/results.php?hostid=472211 | |
ID: 49444 | Rating: 0 | rate: / Reply Quote | |
I've downloaded 2 tasks on the HP laptop which has no GPU board, so it won't download GPU tasks. The first erred after 1'51". It is running a single core Atlas task with VirtualBox. | |
ID: 49445 | Rating: 0 | rate: / Reply Quote | |
I've downloaded 2 tasks on the HP laptop which has no GPU board, so it won't download GPU tasks. The first erred after 1'51". It is running a single core Atlas task with VirtualBox. Is VirtualBox required for this project? Looking around I don't see a list of applications with requirements like at other projects. I might have overlooked the app list. Maybe that is my problem | |
ID: 49446 | Rating: 0 | rate: / Reply Quote | |
ID: 49447 | Rating: 0 | rate: / Reply Quote | |
https://www.gpugrid.net/apps.php Thanks Richard! The hyperlink at "... has two types of apps: ACEMD ...." is not obvious in Edge on the system I am using. I even looked in the "Donate" page for the application list. When I looked for it useing Firefox on my Linux system that has a better monitor the hyperlink showed up immediately. Earlier I had found gpugrid was listed as using VirtualBox but this problem system was Linux and not Windows. Looking through that list you found for me, I do not see VirtualBox listed explicitly. Maybe the failures are because I enabled "test apps" ? The last time I ran QC successfully was when i was using GRCPOOL which manages configurations. I no longer use GRCPOOL and am setting parameters myself through venue which is not possible when using that pool manager. | |
ID: 49452 | Rating: 0 | rate: / Reply Quote | |
All LHC@home projects use VirtualBox save SixTrack. I am running only Atlas@home on my 3 PCs. two Linux and one Windows 10 since all other VirtualBox projects fail, for reasons nobody explained to me. I was one of the Alpha tester at Test4Theory@home on invitation by dr.Ben Segal of CERN, a member of the Internet Hall of Fame. | |
ID: 49456 | Rating: 0 | rate: / Reply Quote | |
Same problem over here. | |
ID: 49559 | Rating: 0 | rate: / Reply Quote | |
Please provide the id of a failed task. We have been debugging one specific issue (see the other thread). | |
ID: 49561 | Rating: 0 | rate: / Reply Quote | |
Please provide the id of a failed task. 3 examples of an i5 machine: http://www.gpugrid.net/result.php?resultid=17650661 http://www.gpugrid.net/result.php?resultid=17650648 http://www.gpugrid.net/result.php?resultid=17650310 3 examples of an i7 machine: http://www.gpugrid.net/result.php?resultid=17465735 http://www.gpugrid.net/result.php?resultid=17465712 http://www.gpugrid.net/result.php?resultid=17464261 We have been debugging one specific issue (see the other thread). Well, these errors occurred this morning around 10:30 am (i5 machine). So, if you haven't fixed anything thereafter, the bug is still alive... Please note that most wing-men attempts to complete these tasks successfully failed as well. In two cases, however, the task was completed by another machine (often after multiple failures), but I could not identify a similarity in these machines which is making them unique compared to my own settings or those of the other failing computers. Which of the many other threads relevant to this issue exactly do you mean? Michael. ____________ President of Rechenkraft.net - Germany's first and largest distributed computing organization. | |
ID: 49562 | Rating: 0 | rate: / Reply Quote | |
Michael asked Which of the many other threads relevant to this issue exactly do you mean? https://www.gpugrid.net/forum_thread.php?id=4750&nowrap=true#49560 | |
ID: 49563 | Rating: 0 | rate: / Reply Quote | |
So, where is the ultimate solution to the problem? | |
ID: 49564 | Rating: 0 | rate: / Reply Quote | |
So, where is the ultimate solution to the problem? Sounds like its on server/admin side and nothing something a user can fix since tasks were canceled. | |
ID: 49565 | Rating: 0 | rate: / Reply Quote | |
Looks like the problem is solved. Both of my machines now return valid CPU QC results. | |
ID: 49610 | Rating: 0 | rate: / Reply Quote | |
There's a bug in the app, which prevents more than 1 task starting simultaneously. To do this you should create / modify your app_config.xml file in the projects\www.gpugrid.net folder.Have you ever tried using??? <fetch_minimal_work>1</fetch_minimal_work> Fetch one job per device (see --fetch_minimal_work). --fetch_minimal_work Fetch only 1 job per device (CPU, GPU). Used with --exit_when_idle, the client will process one job per device, then exit. [/code] | |
ID: 51193 | Rating: 0 | rate: / Reply Quote | |
Message boards : Number crunching : something changed linux: chemestry all erroring out