Author |
Message |
|
I've been crunching quietly on my Linux VM until today.
Now, suddenly, this message: Quantum Chemistry needs 13195.37MB more disk space. You currently have 15414.86 MB available and it needs 28610.23 MB.
30 GB for a WU?? Are you kidding? |
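For anyone who wants to check the numbers in that message against their own disk, here is a minimal Python sketch. The helper names `shortfall_mb` and `free_mb` are made up for illustration, and treating "MB" as 2**20 bytes is an assumption. The figure BOINC reports as available may also be lower than the raw free space, since it depends on the client's disk-usage preferences.

```python
import shutil

# Assumption: "MB" in the client message means 2**20 bytes.
BYTES_PER_MB = 1024 * 1024


def shortfall_mb(required_mb: float, available_mb: float) -> float:
    """Return how many more MB are needed (0.0 if there is already enough)."""
    return max(0.0, required_mb - available_mb)


def free_mb(path: str = ".") -> float:
    """Raw free space, in MB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / BYTES_PER_MB


if __name__ == "__main__":
    # Figures quoted in the message above: needs 28610.23 MB, has 15414.86 MB.
    print(f"short by {shortfall_mb(28610.23, 15414.86):.2f} MB")
    print(f"{free_mb('.'):.2f} MB free on the current filesystem")
```

With the quoted figures the difference is 28610.23 - 15414.86 = 13195.37 MB, which matches what the client printed.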
|
|
|
In a previous message, Stefan mentioned that these new WUs are much larger simulations that require much larger data sets. It's just the nature of the beast. |
|
|
|
OK, I resized the VM disk; now I have 40 GB free.
But the WUs are stuck at 10%: "Waiting to run" |
|
|
|
OK, I resized the VM disk; now I have 40 GB free.
But the WUs are stuck at 10%: "Waiting to run"
These WUs use an extremely large amount of memory. Most likely they have completely filled your RAM and are spilling into swap (on the hard drive), so they cannot run at full speed. I suggest everyone simply wait; unless you have more RAM, there is no way to speed this up. |
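To see whether a Linux host really is in that state (RAM nearly full and swap in use), something like the sketch below could help. It only reads the standard `MemTotal`, `MemAvailable`, `SwapTotal` and `SwapFree` fields of `/proc/meminfo`. The function names and the 5% threshold are hypothetical choices for illustration, not anything the project publishes.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field name: value in kB}."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def swap_used_kb(info: dict) -> int:
    """Swap currently in use, in kB."""
    return info.get("SwapTotal", 0) - info.get("SwapFree", 0)


def looks_memory_starved(info: dict, min_available_fraction: float = 0.05) -> bool:
    """Rough heuristic (an assumption): swap is in use AND little RAM is available."""
    total = info.get("MemTotal", 0)
    available = info.get("MemAvailable", 0)
    if total <= 0:
        return False
    return swap_used_kb(info) > 0 and available / total < min_available_fraction


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as fh:
            current = parse_meminfo(fh.read())
    except OSError:
        print("/proc/meminfo is not readable on this system")
    else:
        print("swap used (kB):", swap_used_kb(current))
        print("memory starved?", looks_memory_starved(current))
```

If swap use stays at zero while a task is slow, memory pressure is probably not the explanation and it is worth looking elsewhere.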
|
|
tullio
Joined: 8 May 18 Posts: 190 Credit: 104,426,808 RAC: 0 Level
Scientific publications
|
I have aborted one which seemed to go on forever. Now I have another running and one waiting to run. Let's hope they complete.
Tullio
____________
|
|
|
|
These WUs use an extremely large amount of memory. Most likely they have completely filled your RAM and are spilling into swap (on the hard drive), so they cannot run at full speed.
6 GB of RAM is not enough for a single WU?
I think they have to work on the app code...
|
|
|
tullio
|
I also have 8 GB of RAM on the laptop. Two tasks have completed so far, and another is running. I have aborted one which was going on endlessly.
Tullio |
|
|
mmonnin
Joined: 2 Jul 16 Posts: 337 Credit: 7,617,757,013 RAC: 10,860,147 Level
Scientific publications
|
These need 30 GB of disk space and 6 GB of memory? Wow. |
|
|
|
So my Ryzen 7 1700 system runs 4 of these WUs at once. I had 8 GB of RAM and ran the old WUs with no problem: 100% CPU usage and ~6 GB of RAM usage.
With these new WUs I was getting extremely low CPU usage with 4 running, and they maxed out my RAM. So I switched to 16 GB of RAM. It now uses ~13 GB of RAM, but I still have very low CPU usage. Why is this? It's not maxing out RAM, so is it reading from the SSD constantly? Swap is completely empty as well. I would assume the application would keep all the vital data in RAM until it couldn't anymore. |
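One thing that might be worth checking here (a guess, not a diagnosis) is whether the cores are idle because the tasks are waiting on disk reads and writes. On Linux, the aggregate `cpu` line at the top of `/proc/stat` carries an iowait counter as its fifth value, so sampling it twice gives a rough share of time spent waiting on I/O. The function names below are made up for illustration.

```python
import time


def parse_cpu_line(line: str) -> list:
    """Turn the aggregate 'cpu ...' line of /proc/stat into a list of tick counters."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(value) for value in fields[1:]]


def iowait_fraction(before: list, after: list) -> float:
    """Share of elapsed CPU time spent in iowait between two samples."""
    deltas = [b - a for a, b in zip(before, after)]
    # Only the first eight counters are summed; guest time is already part of user/nice.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return deltas[4] / total  # index 4 is iowait


def read_cpu_times(path: str = "/proc/stat") -> list:
    """Read the current aggregate CPU counters (Linux only)."""
    with open(path) as fh:
        return parse_cpu_line(fh.readline())


if __name__ == "__main__":
    try:
        first = read_cpu_times()
        time.sleep(1)
        second = read_cpu_times()
    except (OSError, ValueError):
        print("/proc/stat is not available on this system")
    else:
        print(f"iowait share over 1 s: {iowait_fraction(first, second):.1%}")
```

A high iowait share together with low CPU usage would point at the scratch disk rather than at RAM; a low one would suggest some other cause.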
|
|
tullio
|
I get 197% CPU usage if no other task is running, and 173% if a GPU task is running alongside. I have only two-core CPUs, both on the Linux laptop and on the Linux SUN workstation. The Windows 10 PC has 4 cores, but all GPU tasks overheat its GTX 1050 Ti, which is not overclocked. Einstein@home GPU tasks run fine on it.
Tullio
____________
|
|
|
|
OK. I went from 6 to 8 GB on the VM.
Now the WU doesn't "pause", but it still remains at 10% until the remaining time runs out. When the time reaches 0, the WU immediately jumps to 100% and continues to crunch... and CPU use is 0%. |
|
|
tullio
|
Toni in a News post has explained why the task progress remains at 10%. But my tasks end and report when they finish.
Tullio
____________
|
|
|
|
Toni in a News post has explained why the task progress remains at 10%. But my tasks end and report when they finish.
Tullio
OK, I've read the post, and I'll leave the WU crunching.
But I have this error:
09:28:46 (2923): wrapper (7.7.26016): starting
09:28:46 (2923): wrapper: running /usr/bin/flock (/var/lib/boinc-client/projects/www.gpugrid.net/miniconda.lock -c "/bin/bash ./miniconda-installer.sh -b -u -p /var/lib/boinc-client/projects/www.gpugrid.net/miniconda &&
/var/lib/boinc-client/projects/www.gpugrid.net/miniconda/bin/conda install -m -y -n qmml2 --override-channels -c defaults -c gpugrid --file requirements.txt ")
Python 3.6.5 :: Anaconda, Inc.
==> WARNING: A newer version of conda exists. <==
current version: 4.5.4
latest version: 4.5.11
Please update conda by running
$ conda update -n base conda
09:29:31 (2923): /usr/bin/flock exited; CPU time 39.730884
09:29:31 (2923): wrapper: running /var/lib/boinc-client/projects/www.gpugrid.net/miniconda/envs/qmml2/bin/python (run.py)
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
libpcm.so.1 00007F1FC12EE24F Unknown Unknown Unknown
libpthread-2.19.s 00007F1FDC054330 Unknown Unknown Unknown
libpthread-2.19.s 00007F1FDC0533AD read Unknown Unknown
core.so 00007F1FCAA5F503 _ZN3psi4PSIO2rwEm Unknown Unknown
core.so 00007F1FCA561B28 _ZN3psi8DiskDFJK1 Unknown Unknown
core.so 00007F1FCA55E3DB _ZN3psi8DiskDFJK1 Unknown Unknown
core.so 00007F1FCA44FD23 _ZN3psi2JK7comput Unknown Unknown
core.so 00007F1FCA20A9FC Unknown Unknown Unknown
core.so 00007F1FCA1E5092 Unknown Unknown Unknown
core.so 00007F1FCA1EDC89 Unknown Unknown Unknown
core.so 00007F1FC8878881 Unknown Unknown Unknown
core.so 00007F1FC84A99B6 Unknown Unknown Unknown
python3.6 00007F1FDC595B94 _PyCFunction_Fast Unknown Unknown
python3.6 00007F1FDC6257CE Unknown Unknown Unknown
python3.6 00007F1FDC647CBA _PyEval_EvalFrame Unknown Unknown
python3.6 00007F1FDC620459 PyEval_EvalCodeEx Unknown Unknown
python3.6 00007F1FDC621376 Unknown Unknown Unknown
python3.6 00007F1FDC59599E PyObject_Call Unknown Unknown
python3.6 00007F1FDC649470 _PyEval_EvalFrame Unknown Unknown
python3.6 00007F1FDC620459 PyEval_EvalCodeEx Unknown Unknown
python3.6 00007F1FDC621376 Unknown Unknown Unknown
python3.6 00007F1FDC59599E PyObject_Call Unknown Unknown
python3.6 00007F1FDC649470 _PyEval_EvalFrame Unknown Unknown
python3.6 00007F1FDC620A9E PyEval_EvalCodeEx Unknown Unknown
python3.6 00007F1FDC621376 Unknown Unknown Unknown
python3.6 00007F1FDC59599E PyObject_Call Unknown Unknown
python3.6 00007F1FDC649470 _PyEval_EvalFrame Unknown Unknown
python3.6 00007F1FDC61EDAE Unknown Unknown Unknown
python3.6 00007F1FDC61F941 Unknown Unknown Unknown
python3.6 00007F1FDC625755 Unknown Unknown Unknown
python3.6 00007F1FDC648A7A _PyEval_EvalFrame Unknown Unknown
python3.6 00007F1FDC61EA94 Unknown Unknown Unknown
python3.6 00007F1FDC61F941 Unknown Unknown Unknown
python3.6 00007F1FDC625755 Unknown Unknown Unknown
python3.6 00007F1FDC648A7A _PyEval_EvalFrame Unknown Unknown
python3.6 00007F1FDC620459 PyEval_EvalCodeEx Unknown Unknown
python3.6 00007F1FDC6211EC PyEval_EvalCode Unknown Unknown
python3.6 00007F1FDC69B9A4 Unknown Unknown Unknown
python3.6 00007F1FDC69BDA1 PyRun_FileExFlags Unknown Unknown
python3.6 00007F1FDC69BFA4 PyRun_SimpleFileE Unknown Unknown
python3.6 00007F1FDC69FA9E Py_Main Unknown Unknown
python3.6 00007F1FDC5674BE main Unknown Unknown
libc-2.19.so 00007F1FDBC9CF45 __libc_start_main Unknown Unknown
python3.6 00007F1FDC64E773 Unknown Unknown Unknown
18:01:43 (1686): wrapper (7.7.26016): starting
18:01:43 (1686): wrapper (7.7.26016): starting
18:01:43 (1686): wrapper: running /var/lib/boinc-client/projects/www.gpugrid.net/miniconda/envs/qmml2/bin/python (run.py)
18:29:54 (1686): $PROJECT_DIR/miniconda/envs/qmml2/bin/python exited; CPU time 5398.666724
13:57:32 (1694): wrapper (7.7.26016): starting
13:57:32 (1694): wrapper (7.7.26016): starting
14:46:12 (1694): called boinc_finish(0)
</stderr_txt>
<message>
upload failure: <file_xfer_error>
<file_name>6952_16_18_20_23_47359a98_n00001-SDOERR_SELE2-0-1-RND9339_0_1</file_name>
<error_code>-131 (file size too big)</error_code>
</file_xfer_error> |
|
|
Jim1348
Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0 Level
Scientific publications
|
</stderr_txt>
<message>
upload failure: <file_xfer_error>
<file_name>6952_16_18_20_23_47359a98_n00001-SDOERR_SELE2-0-1-RND9339_0_1</file_name>
<error_code>-131 (file size too big)</error_code>
</file_xfer_error>
They have already been notified.
http://www.gpugrid.net/forum_thread.php?id=4785&nowrap=true#50349
|
|
|
|
They have already been notified.
http://www.gpugrid.net/forum_thread.php?id=4785&nowrap=true#50349
And no answer.... |
|
|
Zalster
Joined: 26 Feb 14 Posts: 211 Credit: 4,496,324,562 RAC: 0 Level
Scientific publications
|
So I just discovered something interesting.
I'm running 4 GPU tasks along with 2 CPU work units, with 64 GB of RAM in the machine.
Virtual memory for the GPU tasks is 32 GB each...
I'm not sure how much virtual memory the CPU tasks are using, but even if it's, say, 45 GB, I can see why I'm running out of space on my SSD.
4 x 32 = 128 GB. The SSD is only 120 GB total. So yeah, I think it's time to upgrade the SSD...
____________
|
|
|
tullio
|
Virtual memory size is 4.15 GB on my Linux box with 8 GB of RAM, but more than 700 GB of a 1 TB disk is available to BOINC. I had SSDs on my HP laptop and they all failed. I installed a hybrid disk on it with 8 MB SSD out of 1 TB and it works.
Tullio |
|
|
Zalster
|
Virtual memory size is 4.15 GB on my 8 GB RAM on a Linux box. But more than 700 GB are available to BOINC of a 1 TB disk. I had SSDs on my HP laptop and they all failed. I installed a hybrid disk on it with 8 MB SSD out of 1 TB and it works.
Tullio
I'm getting old, haha... My memory is faulty. It turns out that I had already upgraded that SSD to a 500 GB one. So that leaves the question of why it's running out of space. I turned off 1 QC work unit, leaving only 1 running along with the 4 GPU tasks. That's better for the CPU temps: they were spiking to 100C, and now they are back down to 60C-70C.
____________
|
|
|
Stefan (Project administrator, Project developer, Project tester, Project scientist)
Joined: 5 Mar 13 Posts: 348 Credit: 0 RAC: 0 Level
Scientific publications
|
A single WU will only use 4 GB of RAM. If it uses significantly more than that (e.g. 6 GB), report it to me, because it might be a bug in the software.
Other than that, yes, I mentioned that some of the WUs I have now submitted might use up to 50 GB of scratch space on the disk. There is not much I can do about it other than simply not simulating them. |
|
|
|
Turns out that I had already upgraded that SSD to a 500GB. So that leaves the question of why it's running out of space.
New beta WUs on my Linux VM:
<![CDATA[
<message>
Maximum disk usage exceeded
</message> |
|
|
Stefan
|
Right, we have now increased the limit. It should not happen much anymore. |
|
|
tullio
|
First error
Stderr output
<core_client_version>7.8.3</core_client_version>
<![CDATA[
<message>
Disk usage limit exceeded</message>
<stderr_txt> |
|
|
Zalster
|
Yes, I just checked my error numbers: 120 total, up from 55 before. So a lot of work units errored out before I started getting ones that complete. I'm guessing they errored before he made the change. I will keep an eye on results to see if the new parameters make a difference to completion and validation.
Thanks Stefan
____________
|
|
|
Stefan
|
Yes, please inform me if they still crash due to disk space limitations (not due to upload file size limit, that's a different issue). |
|
|
|
Yes, please inform me if they still crash due to disk space limitations (not due to upload file size limit, that's a different issue).
I have 45 GB of free space on my VM for BOINC, but I get this message:
Quantum chemistry needs 11087.61Mb more disk space |
|
|
Stefan
|
Yes, we now require 60 GB of disk space for one WU. |
|
|
Toni (Volunteer moderator, Project administrator, Project developer, Project tester, Project scientist)
Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0 Level
Scientific publications
|
To clarify: only a few WUs will actually use that disk space. But we need to set a maximum.
In summary, the resource usage for each QC task is:
* 4 GB memory max
* 4 threads max (fewer if not available)
* Large-ish (up to 60 GB, likely much less) temporary disk space while running
Additionally:
* Moderate (~3 GB) disk space for downloading and storing the app (can be reclaimed by resetting the project) |
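The limits in this summary can be turned into a rough worst-case estimate of how many QC tasks a host could run at once. The Python sketch below is only that: the constants are the figures from the post above, while the function name, the `app_installed` flag, and the decision to ignore the thread count (tasks can run with fewer than 4 threads) are assumptions made for illustration. It also ignores the memory the operating system and other tasks need, so treat the result as an upper bound.

```python
# Figures from the summary above.
QC_MEM_GB = 4        # maximum memory per task
QC_SCRATCH_GB = 60   # maximum temporary disk space per task
APP_STORE_GB = 3     # approximate one-time space for the downloaded app


def max_concurrent_qc(ram_gb: float, free_disk_gb: float,
                      app_installed: bool = False) -> int:
    """Worst-case number of QC tasks that fit in the given RAM and free disk."""
    usable_disk = free_disk_gb if app_installed else free_disk_gb - APP_STORE_GB
    by_memory = int(ram_gb // QC_MEM_GB)
    by_disk = int(max(0.0, usable_disk) // QC_SCRATCH_GB)
    return min(by_memory, by_disk)


if __name__ == "__main__":
    # A host with 16 GB of RAM and 500 GB of free disk: memory is the limit.
    print(max_concurrent_qc(16, 500))
    # A VM with 8 GB of RAM and 45 GB free, app already installed: disk is the limit.
    print(max_concurrent_qc(8, 45, app_installed=True))
```

Because the 60 GB figure is a maximum that only a few WUs are expected to reach, actual usage should usually be lower than this estimate implies.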
|
|
|
OK, I'll wait for a Windows version, hoping it will need less space... |
|
|