Message boards : Server and website : New acemd version under test

Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 9 Dec 08
Posts: 1006
Credit: 5,068,599
RAC: 0
Message 52562 - Posted: 4 Sep 2019 | 14:19:32 UTC

The acemd3 app is again under test. It should work on Windows (including RTX!).

Billy Ewell 1931
Joined: 22 Oct 10
Posts: 34
Credit: 967,276,174
RAC: 610,853
Message 52563 - Posted: 4 Sep 2019 | 15:08:30 UTC

Reactivated my RTX 2080 on an i7 running Windows 10, and unfortunately a non-acemd3 task downloaded and errored out after 8 seconds. I have excluded all GPUGrid tasks except acemd3 until conditions change; hopefully I did it correctly.

_Ryle_
Joined: 7 Jun 09
Posts: 24
Credit: 1,138,093,416
RAC: 102,609
Message 52564 - Posted: 4 Sep 2019 | 15:26:53 UTC

Thanks Toni, I'm looking forward to its release. I hope the Linux version will also be released at that time. :)

eXaPower
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Message 52565 - Posted: 4 Sep 2019 | 16:05:25 UTC - in response to Message 52562.

The acemd3 app is again under test. It should work on Windows (including RTX!).


Windows 8.1, RTX 2080 Ti: error at the start of the WU. The WU loops until suspend/resume is used, and the error message appears each time the WU restarts.

http://www.gpugrid.net/result.php?resultid=21341937
http://www.gpugrid.net/result.php?resultid=21341954

Problem signature:
Problem Event Name: BEX64
Application Name: acemd3.exe
Application Version: 0.0.0.0
Application Timestamp: 5d6535ed
Fault Module Name: ucrtbase.DLL
Fault Module Version: 10.0.17134.12
Fault Module Timestamp: 587decd7
Exception Offset: 000000000006e75e
Exception Code: c0000409
Exception Data: 0000000000000007
OS Version: 6.3.9600.2.0.0.768.101
Locale ID: 1033
Additional Information 1: 723f
Additional Information 2: 723ff68f3f17ee5cfa26fbef8ee09749
Additional Information 3: 096f
Additional Information 4: 096f337e301f747985865265c5b96cfe
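The Exception Code c0000409 in this signature is the NTSTATUS value STATUS_STACK_BUFFER_OVERRUN, which Windows also uses to report fail-fast exceptions; in that case the Exception Data field typically carries the fail-fast code, and 7 is FAST_FAIL_FATAL_APP_EXIT. With ucrtbase.DLL as the faulting module, that combination usually points to the application calling abort() rather than to a real stack overrun, though this is an interpretation, not something the report itself states. The minimal Python sketch below only splits the code into its NTSTATUS fields; its lookup tables deliberately contain just these two values.

```python
# Sketch: split the Windows "Exception Code" (an NTSTATUS value) into its fields.
# Only the two constants relevant to this crash report are named here.
NTSTATUS_NAMES = {0xC0000409: "STATUS_STACK_BUFFER_OVERRUN"}
FAST_FAIL_CODES = {7: "FAST_FAIL_FATAL_APP_EXIT"}
SEVERITY = {0: "success", 1: "informational", 2: "warning", 3: "error"}


def decode_ntstatus(code: int) -> dict:
    """Decode the bit fields of a 32-bit NTSTATUS value."""
    return {
        "severity": SEVERITY[(code >> 30) & 0x3],  # bits 31-30
        "customer": bool((code >> 29) & 0x1),      # bit 29: customer-defined code
        "facility": (code >> 16) & 0xFFF,          # bits 27-16
        "code": code & 0xFFFF,                     # bits 15-0
        "name": NTSTATUS_NAMES.get(code, "unknown"),
    }


# Values copied from the problem signature above
exception_code = int("c0000409", 16)
exception_data = int("0000000000000007", 16)

print(decode_ntstatus(exception_code))
print(FAST_FAIL_CODES.get(exception_data, "unknown"))
```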

[PUGLIA] kidkidkid3
Joined: 23 Feb 11
Posts: 79
Credit: 953,228,044
RAC: 1,959,858
Message 52566 - Posted: 4 Sep 2019 | 16:57:56 UTC - in response to Message 52565.
Last modified: 4 Sep 2019 | 16:59:07 UTC

Hi all,
these are my settings:

ACEMD short runs (2-3 hours on fastest card): yes
ACEMD long runs (8-12 hours on fastest GPU): yes
ACEMD3 Beta: yes
Quantum Chemistry (CPU): no
Quantum Chemistry (CPU, beta): no
Python Runtime: no

Currently I have only got a long-run WU, no acemd3 ... with 4 in the queue!
Any suggestions?

Thanks in advance
K.

Edit: now 0 WUs.
____________
Dreams do not always come true. But not because they are too big or impossible. Why did we stop believing.
(Martin Luther King)

klepel
Joined: 23 Dec 09
Posts: 189
Credit: 4,196,461,293
RAC: 1,617,116
Message 52567 - Posted: 4 Sep 2019 | 18:47:39 UTC

Task http://www.gpugrid.net/result.php?resultid=21342341

It errored out immediately:
Stderr output

<core_client_version>7.14.2</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -59 (0xffffffc5)</message>
<stderr_txt>
# GPU [GeForce RTX 2070] Platform [Windows] Rev [3212] VERSION [80]
# SWAN Device 0 :
# Name : GeForce RTX 2070
# ECC : Disabled
# Global mem : 8192MB
# Capability : 7.5
# PCI ID : 0000:1F:00.0
# Device clock : 1815MHz
# Memory clock : 7001MHz
# Memory width : 256bit
# Driver version : r430_00 : 43160
#SWAN: FATAL: cannot find image for module [.nonbonded.cu.] for device version 750

</stderr_txt>
]]>
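The last line of this stderr ties back to the Capability line above it: "device version 750" appears to encode compute capability 7.5 (Turing, the RTX 20xx series), and "cannot find image" means the binary carries no kernel image the driver can load for that architecture. The VERSION [80] header suggests a CUDA 8.0 build, and native sm_75 code needs CUDA 10.0 or later, so such a build can only run on an RTX card if it embeds PTX that the driver can JIT-compile. The Python sketch below just recovers the capability from the message; the major * 100 + minor * 10 encoding is inferred from this single log and is an assumption, not a documented SWAN format.

```python
import re


def parse_swan_device_version(line: str):
    """Return (major, minor) from a SWAN '... for device version NNN' message.

    Assumes the encoding major * 100 + minor * 10, inferred from this log
    (Capability 7.5 <-> device version 750); not a documented format.
    """
    match = re.search(r"device version (\d+)", line)
    if match is None:
        return None
    version = int(match.group(1))
    return version // 100, (version % 100) // 10


msg = "#SWAN: FATAL: cannot find image for module [.nonbonded.cu.] for device version 750"
print(parse_swan_device_version(msg))  # (7, 5)
```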

Keith Myers
Joined: 13 Dec 17
Posts: 1284
Credit: 4,927,306,959
RAC: 6,461,380
Message 52568 - Posted: 4 Sep 2019 | 20:24:24 UTC

You need to have the new acemd3 app enabled and the "run test applications" setting turned on. Did you get the new wrapper app for Windows (acemd3 version 2.05)?

Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 9 Dec 08
Posts: 1006
Credit: 5,068,599
RAC: 0
Message 52569 - Posted: 4 Sep 2019 | 21:17:02 UTC - in response to Message 52568.
Last modified: 4 Sep 2019 | 21:17:57 UTC

The app (v206) is out for Linux and Windows.

There has been a problem with units with -1-3- in their name (solved).

The scheduler will need improvements. Right now I've seen some cases of the cuda 92 app being sent to RTXes (such cases error out with "gpu architecture").

mmonnin
Joined: 2 Jul 16
Posts: 332
Credit: 3,772,896,065
RAC: 4,765,302
Message 52570 - Posted: 4 Sep 2019 | 22:14:57 UTC - in response to Message 52569.

The app (v206) is out for Linux and Windows.

There has been a problem with units with -1-3- in their name (solved).

The scheduler will need improvements. Right now I've seen some cases of the cuda 92 app being sent to RTXes (such cases error out with "gpu architecture").


One of the bad ones:
https://www.gpugrid.net/result.php?resultid=21342392

Three others completed successfully on Linux.

One of my PCs ran under the cuda80 plan class and another under the cuda100 plan class, both with Pascal cards. I guess the plan class is determined by the CUDA version the driver supports.
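mmonnin's guess and Toni's note about cuda92 reaching RTX cards fit one rule of thumb: a plan class is only usable if the driver supports its CUDA version and the build contains native code for the card's architecture. The Python sketch below is a hypothetical chooser, not the GPUGRID scheduler; the plan-class names mirror this thread, the architecture table lists the first CUDA toolkit with native support for each compute capability, and PTX JIT compilation is ignored.

```python
# Hypothetical plan-class chooser for illustration only; it is NOT the
# GPUGRID scheduler. Plan classes are listed newest first.
PLAN_CLASSES = [("cuda100", (10, 0)), ("cuda92", (9, 2)), ("cuda80", (8, 0))]

# Oldest CUDA toolkit that can emit native code for each compute capability
# (only the architectures mentioned in this thread).
MIN_TOOLKIT_FOR_CC = {
    (6, 1): (8, 0),   # Pascal, e.g. GTX 10xx
    (7, 0): (9, 0),   # Volta
    (7, 5): (10, 0),  # Turing, e.g. RTX 20xx
}


def pick_plan_class(cc, driver_cuda):
    """Newest plan class the driver can run that also has native code for cc."""
    needed = MIN_TOOLKIT_FOR_CC.get(cc)
    if needed is None:
        return None
    for name, toolkit in PLAN_CLASSES:
        if needed <= toolkit <= driver_cuda:
            return name
    return None


# Two Pascal cards whose drivers support different CUDA versions
print(pick_plan_class((6, 1), (9, 1)))   # cuda80
print(pick_plan_class((6, 1), (10, 1)))  # cuda100
# This sketch never gives an RTX card cuda92 or cuda80
print(pick_plan_class((7, 5), (10, 1)))  # cuda100
print(pick_plan_class((7, 5), (9, 2)))   # None
```

Tuples are compared element by element, so (9, 2) < (10, 0) behaves like version ordering without any string parsing.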

Bedrich Hajek
Joined: 28 Mar 09
Posts: 467
Credit: 8,195,021,966
RAC: 10,422,078
Message 52571 - Posted: 4 Sep 2019 | 22:52:04 UTC

I had 2 cuda(100) units succeed and 1 fail. I also had 2 cuda(92) fail. The last unit I received was a cuda(92).

http://www.gpugrid.net/results.php?hostid=494023&offset=0&show_names=0&state=0&appid=32

The scheduler is still a problem.

Bedrich Hajek
Joined: 28 Mar 09
Posts: 467
Credit: 8,195,021,966
RAC: 10,422,078
Message 52572 - Posted: 5 Sep 2019 | 1:49:43 UTC - in response to Message 52571.

The -2-3- units on cuda100 are the combination that finishes successfully. I had one more unit that was valid; the others failed.

I had 2 cuda(100) units succeed and 1 fail. I also had 2 cuda(92) fail. The last unit I received was a cuda(92).

http://www.gpugrid.net/results.php?hostid=494023&offset=0&show_names=0&state=0&appid=32

The scheduler is still a problem.

Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 9 Dec 08
Posts: 1006
Credit: 5,068,599
RAC: 0
Message 52574 - Posted: 5 Sep 2019 | 7:05:30 UTC - in response to Message 52572.

I'm fixing things incrementally. Failing stuff may be resent and succeed.

Azmodes
Joined: 7 Jan 17
Posts: 34
Credit: 1,371,429,518
RAC: 0
Message 52575 - Posted: 5 Sep 2019 | 8:39:40 UTC
Last modified: 5 Sep 2019 | 8:57:04 UTC

Not getting any tasks on Linux, nothing but errors on RTX on Windows so far.

I have a Windows system with both GTX and RTX cards in it. Do I have to exclude non-ACEMD3 tasks for the RTX via cc_config?

EDIT: Looks like I've been getting some new tasks on Linux too, but they're erroring out:
http://www.gpugrid.net/result.php?resultid=21343717
http://www.gpugrid.net/result.php?resultid=21341872

Two more were validated on the same system (very short, though?).
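One way to do what Azmodes asks (and what Billy describes doing) is BOINC's cc_config.xml `<exclude_gpu>` option, which keeps a given project app off a specific GPU through its `<url>`, `<device_num>`, `<type>` and `<app>` child elements, one entry per excluded app. The Python sketch below only writes such a file; the project URL, device number and app short names are placeholders to check in your own BOINC client, and an existing cc_config.xml should be merged by hand rather than overwritten.

```python
import xml.etree.ElementTree as ET

# All three values are placeholders to adapt; none are confirmed in this thread.
PROJECT_URL = "http://www.gpugrid.net/"        # must match the URL BOINC shows for the project
RTX_DEVICE_NUM = 1                             # BOINC device number of the RTX card
APPS_TO_EXCLUDE = ["acemdshort", "acemdlong"]  # hypothetical short names of the non-acemd3 apps


def build_cc_config(url: str, device_num: int, apps: list) -> str:
    """Build a cc_config.xml with one <exclude_gpu> entry per excluded app."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    for app in apps:
        exclude = ET.SubElement(options, "exclude_gpu")
        ET.SubElement(exclude, "url").text = url
        ET.SubElement(exclude, "device_num").text = str(device_num)
        ET.SubElement(exclude, "type").text = "NVIDIA"
        ET.SubElement(exclude, "app").text = app
    return ET.tostring(root, encoding="unicode")


print(build_cc_config(PROJECT_URL, RTX_DEVICE_NUM, APPS_TO_EXCLUDE))
```

Listing the apps to exclude, rather than the one to allow, means a new non-acemd3 app added later would still need its own entry.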

Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 9 Dec 08
Posts: 1006
Credit: 5,068,599
RAC: 0
Message 52576 - Posted: 5 Sep 2019 | 9:00:22 UTC - in response to Message 52575.
Last modified: 5 Sep 2019 | 9:02:35 UTC

Errors with "nelems != 1" have been solved; they should go away sooner or later. Please ignore them.

All tests are very short (a few minutes) so as not to waste your time. They are nonetheless very important, because they let me see the behavior across many realistic card/app combinations.

klepel
Joined: 23 Dec 09
Posts: 189
Credit: 4,196,461,293
RAC: 1,617,116
Message 52577 - Posted: 5 Sep 2019 | 15:49:54 UTC

Both of my Windows 10 computers with latest-generation Nvidia cards receive the following application: Long runs (8-12 hours on fastest card) v9.23 (cuda80). Both fail immediately:
http://www.gpugrid.net/results.php?hostid=504655
http://www.gpugrid.net/results.php?hostid=512242

biodoc
Joined: 26 Aug 08
Posts: 183
Credit: 6,493,864,375
RAC: 2,796,812
Message 52578 - Posted: 5 Sep 2019 | 16:28:18 UTC

I completed 27 ACEMD v2.06 (cuda100) tasks without an error on Linux.

http://www.gpugrid.net/results.php?userid=5539

[AF>Libristes] hermes
Joined: 11 Nov 16
Posts: 26
Credit: 710,087,297
RAC: 0
Message 52579 - Posted: 5 Sep 2019 | 16:29:44 UTC - in response to Message 52577.
Last modified: 5 Sep 2019 | 16:31:31 UTC

I had 4 WUs today; all 4 are OK.
Well done Toni!

On Arch Linux [5.2.11-zen1-1-zen|libc 2.29 (GNU libc)]
NVIDIA GeForce RTX 2080 Ti (4095MB) driver: 435.21

One of them, a toni_test ;-)

<core_client_version>7.14.2</core_client_version>
<![CDATA[
<stderr_txt>
15:29:34 (15838): wrapper (7.7.26016): starting
15:29:34 (15838): wrapper (7.7.26016): starting
15:29:34 (15838): wrapper: running acemd3 (--boinc input --device 0)
15:31:06 (15838): acemd3 exited; CPU time 63.446112
15:31:06 (15838): called boinc_finish(0)

</stderr_txt>
]]>
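The wrapper lines above are enough to confirm that these test units are short: acemd3 ran from 15:29:34 to 15:31:06, about a minute and a half of wall time, with about 63 s of CPU time. The Python sketch below pulls both numbers out of such a log; it assumes the `HH:MM:SS (pid):` prefix format seen here and a task that does not cross midnight.

```python
import re
from datetime import datetime

# The wrapper output from the stderr above
LOG = """\
15:29:34 (15838): wrapper (7.7.26016): starting
15:29:34 (15838): wrapper (7.7.26016): starting
15:29:34 (15838): wrapper: running acemd3 (--boinc input --device 0)
15:31:06 (15838): acemd3 exited; CPU time 63.446112
15:31:06 (15838): called boinc_finish(0)
"""


def wall_and_cpu_seconds(log: str):
    """Return (wall-clock seconds, CPU seconds) from a BOINC wrapper log."""
    stamps = re.findall(r"^(\d\d:\d\d:\d\d) \(\d+\):", log, re.MULTILINE)
    cpu = re.search(r"CPU time ([\d.]+)", log)
    if not stamps or cpu is None:
        raise ValueError("no timestamps or CPU time found")
    start = datetime.strptime(stamps[0], "%H:%M:%S")
    end = datetime.strptime(stamps[-1], "%H:%M:%S")
    # Assumes the task did not run past midnight
    wall = (end - start).total_seconds()
    return wall, float(cpu.group(1))


wall, cpu = wall_and_cpu_seconds(LOG)
print(wall, cpu)  # 92.0 63.446112
```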

Keith Myers
Joined: 13 Dec 17
Posts: 1284
Credit: 4,927,306,959
RAC: 6,461,380
Message 52580 - Posted: 5 Sep 2019 | 20:58:27 UTC

Still waiting on some new apps to go along with new work to test. No luck so far.

[AF>Libristes] hermes
Joined: 11 Nov 16
Posts: 26
Credit: 710,087,297
RAC: 0
Message 52581 - Posted: 6 Sep 2019 | 6:04:07 UTC - in response to Message 52580.

One failed:

https://www.gpugrid.net/result.php?resultid=21349753

<core_client_version>7.14.2</core_client_version>
<![CDATA[
<message>
process exited with code 195 (0xc3, -61)</message>
<stderr_txt>
07:54:11 (1654): wrapper (7.7.26016): starting
07:54:11 (1654): wrapper (7.7.26016): starting
07:54:11 (1654): wrapper: running acemd3 (--boinc input --device 0)
EXCEPTIONAL CONDITION: /home/user/conda/conda-bld/acemd3_1566914012210/work/src/mdio/bincoord.c, line 193: "nelems != 1"
07:54:14 (1654): acemd3 exited; CPU time 1.975979
07:54:14 (1654): app exit status: 0x86
07:54:14 (1654): called boinc_finish(195)

</stderr_txt>
]]>
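The numbers in these failures are the same values printed in different widths: BOINC shows 195 next to 0xc3 and -61, and -61 is simply 195 read as a signed 8-bit integer, just as klepel's "exit code -59 (0xffffffc5)" earlier in the thread is 0xffffffc5 read as a signed 32-bit integer. The wrapper's "app exit status: 0x86" is 134; read either as a raw POSIX wait status or with the shell-style 128 + 6 convention, it points to signal 6, SIGABRT on Linux, which fits the failed "nelems != 1" check aborting acemd3. How the wrapper formats that status is an assumption here; the Python sketch below only does the arithmetic.

```python
def as_signed(value: int, bits: int) -> int:
    """Reinterpret an unsigned integer of the given width as two's complement."""
    value &= (1 << bits) - 1
    if value >= 1 << (bits - 1):
        value -= 1 << bits
    return value


def decode_wait_status(status: int):
    """Split a raw POSIX wait status (assumed) into (signal number, core dumped)."""
    return status & 0x7F, bool(status & 0x80)


print(as_signed(0xFFFFFFC5, 32))  # -59, the "exit code -59 (0xffffffc5)" case
print(as_signed(0xC3, 8))         # -61, the "code 195 (0xc3, -61)" case
print(decode_wait_status(0x86))   # (6, True): signal 6 is SIGABRT on Linux
```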
