GTX 970, 2 WU (Long), 2 CPU.


Author	Message
Francois Normandin | Joined: 8 Mar 11 | Posts: 71 | Credit: 654,432,613 | RAC: 0	Message 41485 - Posted: 5 Jul 2015 | 18:43:29 UTC
	I average 19 hours each for 2 WUs (long ones) on my GTX 970, with a dedicated CPU core for each. Does anyone know if this seems like OK performance?
	Task: GERARD_FXCXCL12_LIG_1035426
	Also running Rosetta@home on 6 cores.
	GPU load 84%. Overclocked to 1440 MHz, stable. +150 MHz on memory.
	Windows 7 64-bit, FX-8350 at 4 GHz, Asus M5A97 R2.0, 8 GB RAM.

Retvari Zoltan | Joined: 20 Jan 09 | Posts: 2343 | Credit: 16,206,655,749 | RAC: 261,147	Message 41487 - Posted: 6 Jul 2015 | 8:38:17 UTC - in response to Message 41485. Last modified: 6 Jul 2015 | 8:45:12 UTC
	Quote: "I average 19 hours each for 2 WUs (long ones) on my GTX 970, with a dedicated CPU core for each. Does anyone know if this seems like OK performance? GERARD_FXCXCL12_LIG_1035426. GPU load 84%. Overclocked to 1440 MHz, stable. +150 MHz on memory. Windows 7 64-bit, FX-8350 at 4 GHz, Asus M5A97 R2.0, 8 GB RAM."
	19 hours is a bit too long for this GPU; it should be around 12 hours (even on a WDDM OS like Windows 7).
	Quote: "Also running Rosetta@home on 6 cores."
	Rosetta@home has the most demanding CPU app in terms of CPU and memory usage (bandwidth and working set size), so this could be why the GPU tasks take this long to finish on your host. If the Rosetta@home project usually grants less credit for finished tasks than your host claims, it is a sign that the host is overcommitted. See your host vs my laptop (recently my laptop runs only CPU tasks).
	As GPU tasks are more rewarding, I usually prioritize them (i.e. I reduce the number of running CPU tasks until the GPU tasks no longer suffer from a lack of bandwidth). 8 CPU cores need a lot of RAM bandwidth, so this CPU's dual-channel memory controller could be a serious bottleneck when all cores run many instances of the same demanding application (like Rosetta@home's). If you have only one RAM module in this host, I suggest adding an identical one to enable dual-channel memory (as it will really double the RAM's bandwidth).
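The dual-channel point can be illustrated with a back-of-the-envelope calculation. This is only a sketch: the thread does not state the memory speed, so `ASSUMED_MT_S` (DDR3-1600) is an assumption, and the figures are theoretical peaks rather than measured bandwidth.

```python
def peak_bandwidth_gb_s(mega_transfers_per_s: float, channels: int,
                        bus_width_bits: int = 64) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (1 GB = 10^9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return mega_transfers_per_s * 1e6 * bytes_per_transfer * channels / 1e9


# Assumption: DDR3-1600 modules. The thread does not say which speed is installed.
ASSUMED_MT_S = 1600
CPU_CORES = 8  # FX-8350, as stated in the first post

single = peak_bandwidth_gb_s(ASSUMED_MT_S, channels=1)
dual = peak_bandwidth_gb_s(ASSUMED_MT_S, channels=2)

print(f"single channel: {single:.1f} GB/s, {single / CPU_CORES:.1f} GB/s per core")
print(f"dual channel:   {dual:.1f} GB/s, {dual / CPU_CORES:.1f} GB/s per core")
```

Under that assumption, the per-core share of peak bandwidth is small when all 8 cores run memory-hungry tasks, which is consistent with the suggestion to reduce the number of CPU tasks.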

Trotador | Joined: 25 Mar 12 | Posts: 103 | Credit: 13,920,977,393 | RAC: 9,448,390	Message 41488 - Posted: 6 Jul 2015 | 9:20:46 UTC
	Not sure, but I think he is saying that 19 hours is the time for running two WUs at the same time. If that is the case, it would not be that bad, but I am just guessing.

Francois Normandin | Joined: 8 Mar 11 | Posts: 71 | Credit: 654,432,613 | RAC: 0	Message 41489 - Posted: 6 Jul 2015 | 10:16:03 UTC
	Yes, my bad. Two WUs running at the same time complete in 19-20 hours of work (10 hours/WU).
	On the CPU, 2 cores are at around 50%, 2 cores at around 90%, and the last 4 cores at 99%-100% (the CPU seems to feed the card; CPU usage 80%).
	RAM is 2 x 4 GB.
	I will test later whether Rosetta@home is hurting anything on the GPUGRID side.
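The "10 hours/WU" figure follows from dividing the wall-clock time by the number of tasks sharing the GPU. A minimal sketch of that arithmetic, using only the numbers quoted in the thread (the function name is made up for illustration):

```python
def effective_hours_per_task(wall_hours: float, concurrent_tasks: int) -> float:
    """Average GPU time attributable to one task when several run side by side."""
    return wall_hours / concurrent_tasks


SINGLE_TASK_ESTIMATE_H = 12.0  # Retvari Zoltan's estimate for one task at a time

for wall_hours in (19.0, 20.0):
    per_task = effective_hours_per_task(wall_hours, concurrent_tasks=2)
    print(f"{wall_hours:.0f} h for 2 WUs -> {per_task:.1f} h per WU, "
          f"{24 / per_task:.2f} WUs per day "
          f"(vs {24 / SINGLE_TASK_ESTIMATE_H:.2f} at 12 h per WU)")
```

This is why 19-20 hours for a pair is in line with the roughly 12 hours expected for a single task on this card.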

Trotador | Joined: 25 Mar 12 | Posts: 103 | Credit: 13,920,977,393 | RAC: 9,448,390	Message 41491 - Posted: 6 Jul 2015 | 11:05:50 UTC
	It would be nice if some people with the right cards could run this same exercise on the GTX 980 and GTX 980 Ti, i.e. two simultaneous GERARD WUs on a single card. The results would be interesting for re-evaluating the cost/efficiency relationship among these three card models.

Retvari Zoltan | Joined: 20 Jan 09 | Posts: 2343 | Credit: 16,206,655,749 | RAC: 261,147	Message 41492 - Posted: 6 Jul 2015 | 11:37:05 UTC - in response to Message 41489.
	Then it's ok. :)

Francois Normandin | Joined: 8 Mar 11 | Posts: 71 | Credit: 654,432,613 | RAC: 0	Message 41496 - Posted: 6 Jul 2015 | 18:49:07 UTC
	Thanks, Trotador. If I remember correctly, I went from 71% GPU load to 84%, so maybe a 10% gain?
	My only problem is that GPUGRID lets me download just two WUs at a time, so when one finishes, I have to wait for its upload and the download of the next one before crunching again at (full) GPU load.
	Thanks, Retvari Zoltan.

skgiven (Volunteer moderator, Volunteer tester) | Joined: 23 Apr 09 | Posts: 3968 | Credit: 1,995,359,260 | RAC: 0	Message 41504 - Posted: 8 Jul 2015 | 5:33:56 UTC - in response to Message 41496. Last modified: 8 Jul 2015 | 6:08:59 UTC
	For reference/comparison:
	*970 on WinXP (one task at a time):
	Task	Run time (s)	Credit
	GERARD_FXCXCL12_LIG_6644051-0-1-RND1102_1	34,159.06 (~9.5h)	255,000.00
	NOELIA_ETQunboundx2-0-2-RND2064_0	14,912.33 (~4.25h)	75,000.00
	970 on Win7 (two tasks at a time)†:
	Task	Run time (s)	Credit
	GERARD_FXCXCL12_LIG	75,167.70 (~20.9h)	255,000.00
	NOELIA_ETQunboundx1	23,301.72 (~6.5h*)	75,000.00
	Slower system (CPU/system bus), only 90% power, 85% GPU usage.
	† I have these GPUs throttled to 80% power, though they sneak an extra 5% (roughly 7% slower but using 15% less energy each [based on two-tasks-at-a-time runtimes]; ~9% more efficient, on top of any overall performance gain from running 2 tasks at a time).
	Note that running two tasks at a time doesn't (and can't) make up for the WDDM overhead.
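The numbers in the post above can be cross-checked with a short calculation. This is a sketch, not skgiven's own method: it assumes the Win7 run time is the wall-clock time of each of the two concurrent tasks, and it reads "~9% more efficient" as roughly 9% less energy per task. The two hosts also differ (the Win7 one is described as slower and throttled), so the comparison is only indicative.

```python
SECONDS_PER_DAY = 86_400


def effective_seconds_per_task(runtime_s: float, concurrent_tasks: int) -> float:
    """Run time attributable to one task when several share the GPU."""
    return runtime_s / concurrent_tasks


def credit_per_day(credit: float, effective_s: float) -> float:
    """Credit earned per day if the GPU only ran tasks of this type."""
    return credit * SECONDS_PER_DAY / effective_s


GERARD_CREDIT = 255_000.00
xp_single = effective_seconds_per_task(34_159.06, concurrent_tasks=1)
win7_pair = effective_seconds_per_task(75_167.70, concurrent_tasks=2)

print(f"WinXP, 1 task:  {xp_single:,.2f} s/task, "
      f"{credit_per_day(GERARD_CREDIT, xp_single):,.0f} credit/day")
print(f"Win7,  2 tasks: {win7_pair:,.2f} s/task, "
      f"{credit_per_day(GERARD_CREDIT, win7_pair):,.0f} credit/day")

# Power throttling: about 85% power and about 7% longer run time (from the post).
relative_power = 0.85
relative_runtime = 1.07
relative_energy_per_task = relative_power * relative_runtime
energy_saving_pct = (1 - relative_energy_per_task) * 100
print(f"energy per task: {relative_energy_per_task:.4f} of unthrottled "
      f"({energy_saving_pct:.1f}% less)")
```

Even with two tasks at a time, the Win7 host needs more seconds per GERARD task than the WinXP host running one, which matches the remark about WDDM overhead.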

eXaPower | Joined: 25 Sep 13 | Posts: 293 | Credit: 1,897,601,978 | RAC: 0	Message 41505 - Posted: 8 Jul 2015 | 10:26:31 UTC - in response to Message 41504. Last modified: 8 Jul 2015 | 11:14:15 UTC
	PCIe 2.0 x4 vs. PCIe 3.0 x8, GTX 970, one task at a time, Win8.1, for reference/comparison:
	My 970 is pushed to its ACEMD-stable OC limit: 1519 MHz for NOELIAs and 1506 MHz for GERARDs. No CPU tasks are running, nor is SWAN enabled. Everything is the same except PCIe width:
	3.0 x8 lane NOELIA_ETQunbound: 13258 sec (135 W), 73% core
	2.0 x4 lane NOELIA_ETQunbound: 16903 sec (120 W), 73% core
	NOELIAs are >20% slower on 2.0 x4.
	GPU core temps rose another 4C with 3.0 x8 compared to 2.0 x4 (30C ambient): a +20C delta between core and ambient. The average delta used to be +14 to 18C, depending on ambient. If ambient is 20-25C, the GPU core would be 35-45C, depending on ACEMD WU type.
	2.0 x4 lane GERARD_FXCXCL: 46500 sec (140 W), 72% core
	3.0 x8 lane GERARD_FXCXCL: estimated 36500 sec (160 W), 78% core
	2.0 x4 is >20% slower again for GERARD.
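The ">20% slower" remark can be quantified from the run times quoted above. The sketch below also multiplies the reported wattage by the run time to get a rough energy per task. It assumes the wattage is an average over the whole run, which the post does not state, and the 3.0 x8 GERARD run time is itself an estimate.

```python
# (task, link) -> (run time in seconds, reported power in watts), from the post.
runs = {
    ("NOELIA_ETQunbound", "3.0 x8"): (13_258, 135),
    ("NOELIA_ETQunbound", "2.0 x4"): (16_903, 120),
    ("GERARD_FXCXCL", "3.0 x8"): (36_500, 160),  # run time is an estimate
    ("GERARD_FXCXCL", "2.0 x4"): (46_500, 140),
}


def slowdown(task: str) -> float:
    """Fractional increase in run time on 2.0 x4 relative to 3.0 x8."""
    fast_s, _ = runs[(task, "3.0 x8")]
    slow_s, _ = runs[(task, "2.0 x4")]
    return slow_s / fast_s - 1


def energy_wh(task: str, link: str) -> float:
    """Approximate energy per task in watt-hours (power assumed constant)."""
    seconds, watts = runs[(task, link)]
    return watts * seconds / 3600


for task in ("NOELIA_ETQunbound", "GERARD_FXCXCL"):
    print(f"{task}: {slowdown(task) * 100:.1f}% longer on 2.0 x4; "
          f"{energy_wh(task, '3.0 x8'):.0f} Wh on 3.0 x8 vs "
          f"{energy_wh(task, '2.0 x4'):.0f} Wh on 2.0 x4")
```

Under these assumptions the narrower link draws less power but takes long enough that each task costs more energy overall.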
