Author |
Message |
Zalster Send message
Joined: 26 Feb 14 Posts: 211 Credit: 4,496,324,562 RAC: 0 Level
Scientific publications
|
Target in sight....
Locking on to his tail pipes!!
Firing bananas!!
Reargunner to pilot.. fast moving object on our 6
Pilot to reargunner... Let loose with the oil slick and smoke!!
____________
|
|
|
Keith Myers Send message
Joined: 13 Dec 17 Posts: 1354 Credit: 7,801,839,872 RAC: 9,169,925 Level
Scientific publications
|
Go get Bob. |
|
|
|
Am I correct that this project runs with no cache at all? Even after increasing the resource share to 100 because SETI@Home is gone, I get:
Wed 01 Apr 2020 08:35:55 AM EDT | GPUGRID | [sched_op] NVIDIA GPU work request: 340500.27 seconds; 0.00 devices
Wed 01 Apr 2020 08:35:56 AM EDT | GPUGRID | Scheduler request completed: got 0 new tasks
Wed 01 Apr 2020 08:35:56 AM EDT | GPUGRID | [sched_op] Server version 613
Wed 01 Apr 2020 08:35:56 AM EDT | GPUGRID | No tasks sent
Wed 01 Apr 2020 08:35:56 AM EDT | GPUGRID | This computer has reached a limit on tasks in progress
This is with only two active tasks, a queue of one and one uploading, set for two days cache.
Another issue that is going to become prevalent with the influx of new power hosts is the size of the uploaded result files (3MB for one of them) choking the upload server. |
|
|
|
this project has a limit of 2 WU per GPU, and a max of 16 total in progress.
you might be able to get around that with Pandora's box, but I know Toni has said before not to try to get around the limits so maybe we can be nice here.
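As a rough illustration of the limit described above, here is a minimal sketch. It assumes the server simply applies 2 tasks per reported GPU with a cap of 16 per host; the function name and the exact server logic are assumptions, not something the project has published here.

```python
def max_tasks_in_progress(gpu_count: int, per_gpu_limit: int = 2, host_cap: int = 16) -> int:
    """Tasks a host may hold at once under the assumed rule:
    per_gpu_limit tasks for each reported GPU, never more than host_cap."""
    return min(gpu_count * per_gpu_limit, host_cap)


if __name__ == "__main__":
    # 2 GPUs -> 4 tasks, 8 GPUs -> 16 tasks, anything above 8 GPUs stays at 16.
    for gpus in (2, 4, 8, 10, 64):
        print(gpus, "GPUs ->", max_tasks_in_progress(gpus), "tasks")
```

Under this assumption, a host that reports 8 or more GPUs already sits at the 16-task ceiling.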
____________
|
|
|
|
Perfect...thanks Ian.
I wonder if using the previous spoof client to give all the computers 8 GPUs for the maximum of 16 cached each would be frowned upon... lol. Spoofing within policy. :^) |
|
|
|
it does work. When I first attached to the project I was still running the old spoofed client with 64 GPUs and it gave me the max 16.
I don't think anyone would be the wiser if you put the GPU count to 8. I have a legitimate 10-GPU system running (but only 8 are assigned to GPUGRID), and the stderr txt file doesn't report how many GPUs the system has like SETI does.
____________
|
|
|
Zalster Send message
Joined: 26 Feb 14 Posts: 211 Credit: 4,496,324,562 RAC: 0 Level
Scientific publications
|
Not used to seeing any others here, lol. Well, guess it's something to get used to. Just read the entire thread. Welcome everyone...
____________
|
|
|
Keith Myers Send message
Joined: 13 Dec 17 Posts: 1354 Credit: 7,801,839,872 RAC: 9,169,925 Level
Scientific publications
|
I just set the Pandora config to 2X the number of gpus in the host. Same as what the project default is.
I had issues sometimes avoiding EDF on tasks when I was spoofed on gpus and carried 16 tasks in the cache.
Basically the project runs on a "turn one in, get one" mechanism. I do want to try to stay at the 6 or 8 cache level because I have had many times where stuck uploads prevented replenishing the cache, and I still want to crunch tasks while waiting for the server disk congestion to clear the uploads. |
|
|
Keith Myers Send message
Joined: 13 Dec 17 Posts: 1354 Credit: 7,801,839,872 RAC: 9,169,925 Level
Scientific publications
|
Except it stopped working overnight and my cache fell and wasn't being replenished because it never asked for work.
????? |
|
|
Freewill Send message
Joined: 18 Mar 10 Posts: 20 Credit: 32,717,282,894 RAC: 126,410,111 Level
Scientific publications
|
Hi Guys,
I've just started one PC on here and processed a few tasks. It looks like the GTX 1070Ti card gets more points/time than the RTX 2070 Super running "New version of ACEMD v2.10." Neither card is starved on the PCIe interface throughput. Has anyone else noticed that? If so, it seems like I should put my slower GPUs on this project and faster ones on E@H. |
|
|
|
you've only submitted a handful of tasks, and the tasks being distributed now can be a bit variable for runtime and credit received. I would give it more time and then check the averages after both cards have submitted a couple hundred tasks. what motherboard are you running with that system? which slots are the cards in?
the two cards also have similar CUDA core counts: the 2070S has only 128 more cores than the 1070ti, though the RTX cards in general have 2x the SM count since they run 64 cores per SM vs the 128 cores per SM that Pascal had. it can come down to the application code also. it's possible the acemd3 app scales more linearly with straight core count than with SM count, which might be why they perform similarly, whereas petri's SETI code seemed to scale more with SM count, which is why his code ran better on the 2070 (36 SMs, 2304 cores) than on a 1080ti (28 SMs, 3584 cores).
just some things to keep in mind.
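The core-count arithmetic above can be checked with a short sketch. The SM counts for the 2070 and 1080ti come from the post; the 19 SMs for the 1070 Ti and 40 SMs for the 2070 Super are taken from the public spec sheets rather than from this thread, and the table layout is only an illustration.

```python
# CUDA cores per SM by architecture, as stated in the post.
CORES_PER_SM = {"Pascal": 128, "Turing": 64}

# (architecture, SM count). The 1070 Ti and 2070 Super SM counts are from
# public spec sheets, not from the thread.
CARDS = {
    "GTX 1070 Ti": ("Pascal", 19),
    "GTX 1080 Ti": ("Pascal", 28),
    "RTX 2070": ("Turing", 36),
    "RTX 2070 Super": ("Turing", 40),
}


def cuda_cores(card: str) -> int:
    """Total CUDA cores = SM count x cores per SM for that architecture."""
    arch, sm_count = CARDS[card]
    return sm_count * CORES_PER_SM[arch]


if __name__ == "__main__":
    for name in CARDS:
        print(f"{name}: {CARDS[name][1]} SMs, {cuda_cores(name)} cores")
    print("2070 Super minus 1070 Ti:",
          cuda_cores("RTX 2070 Super") - cuda_cores("GTX 1070 Ti"), "cores")
```

So the two cards are within 128 cores of each other even though the Turing card has roughly twice as many SMs.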
____________
|
|
|
Freewill Send message
Joined: 18 Mar 10 Posts: 20 Credit: 32,717,282,894 RAC: 126,410,111 Level
Scientific publications
|
Thanks for the points, Ian. I am seeing that each new task has a different run time on the same card. And, the 2070 Super was finishing much faster, but the first cases on each were 4.6 pts/sec for the 2070 and 8.0 pts/sec for the 1070Ti. I'll keep watching to get more signal/noise. :) |
|
|
|
I'm dabbling a bit with some further power reductions and efficiency boost.
took my 7x2070 system, which was running all cards already power limited to 165W (stock 175W).
Old settings:
7x RTX 2070
power limited all cards to 165W
+75 core OC
+300 mem OC
New settings:
power limited all cards to 150W (9.1% reduction)
+100 core OC
+400 mem OC
and comparing the averaged data from valid results (discarding statistical outliers in both cases), it looks like overall production dropped by only 2%, while I reduced power draw by about 9% and temps of the cards also dropped by about 5-6C across the board, which will be welcomed as we move into the summer months. testing +125 core right now with the same 150W PL, and tomorrow I'll try to squeeze +600mem on top of that to try to claw back that 2%, if i can.
I'll probably dabble around with this on the 10x2070 system also after I receive the 2 special risers I need (in the mail from China, ETA unknown). I'm running that system also power limited a tiny bit (higher end cards 185W TDP stock, PL'd to 175W atm).
More efficiency can be had by power limiting deeper, but I don't really want to give up too much raw performance.
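To put numbers on the trade-off, here is a minimal sketch of the efficiency arithmetic using the figures from this post (165W to 150W, production down about 2%). It treats the power limit as the actual draw per card, which is an assumption; the function names are made up for illustration.

```python
def power_reduction(old_watts: float, new_watts: float) -> float:
    """Fractional cut in power, e.g. 0.09 for a 9% reduction."""
    return 1.0 - new_watts / old_watts


def relative_efficiency(old_watts: float, new_watts: float, production_ratio: float) -> float:
    """Work per watt after the change, relative to before.
    production_ratio is new output / old output (0.98 for a 2% drop)."""
    return production_ratio / (new_watts / old_watts)


if __name__ == "__main__":
    cut = power_reduction(165, 150)
    gain = relative_efficiency(165, 150, 0.98)
    print(f"power cut: {cut:.1%}")
    print(f"work per watt vs. before: {gain:.3f}x")
```

With these inputs the cards do roughly 7.8% more work per watt, assuming each card really draws its limit.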
____________
|
|
|
Keith Myers Send message
Joined: 13 Dec 17 Posts: 1354 Credit: 7,801,839,872 RAC: 9,169,925 Level
Scientific publications
|
I'm still waiting on another flowmeter I ordered from China over a month ago. Other than China post saying it is in the system, no further progress.
Sure hope that this new one lasts longer than all the other ones I've tried and had fail very fast. |
|
|
|
testing +125 core right now with the same 150W PL, and tomorrow I'll try to squeeze +600mem on top of that to try to claw back that 2%, if i can.
+125 core/+400 mem got me back that 2%. so now it's performing the same at 150W as it did at 165W (x7), with cooler temps and about 100W cut off the system power draw. win-win if it can stay stable. it's run for 2 days now at 150W so at least the +100/+400 and the +125/+400 settings seem stable.
I run fan speeds static at 75% for all cards, temps range from about 50C on the coolest card, to 60C on the hottest card.
trying +125core/+600mem now to see if it speeds up or not. memory speeds aren't really throttled by power limit, but the increased power required on the mem OC might cause the core clocks to drop and might drop performance. I'll evaluate the results tomorrow.
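The "about 100W" figure follows from the per-card limits. A minimal sketch, assuming every card runs at its power limit and ignoring PSU losses and the rest of the system (the helper name is hypothetical):

```python
def card_power_saved_w(n_cards: int, old_limit_w: int, new_limit_w: int) -> int:
    """Total watts saved across all cards if each one sits at its power limit."""
    return n_cards * (old_limit_w - new_limit_w)


if __name__ == "__main__":
    # 7 cards, each dropped from 165W to 150W.
    print(card_power_saved_w(7, 165, 150), "W saved at the cards")
```

Draw measured at the wall would differ somewhat from this card-level total.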
____________
|
|
|
|
+125/+600 showed a very slight decrease in production, probably due to the power situation I mentioned in my previous post. I did see a very slight bump in average core clock speeds (visually) when I reduced the mem OC from 600 to 400. It doesn't seem that GPUGRID benefits much from memory OC.
so i think PL 150W, +125core/+400mem is a nice setting for these 2070 cards.
____________
|
|
|
|
Low credits yesterday?
Looking at most hosts they showed about 20-30% reduction in credits yesterday as compared to the past few weeks. Was the project down part of the day yesterday or something? Or did they have a string of low paying WUs or something? I don’t check my systems as diligently since they have been so stable, but I don’t see that any of them had any issues yesterday.
____________
|
|
|
Zalster Send message
Joined: 26 Feb 14 Posts: 211 Credit: 4,496,324,562 RAC: 0 Level
Scientific publications
|
Don't know. I have all computers down until I find a new job. Hope all is well. TTYL
____________
|
|
|
Zalster Send message
Joined: 26 Feb 14 Posts: 211 Credit: 4,496,324,562 RAC: 0 Level
Scientific publications
|
So I restarted a computer back here. Only a 2 GPU machine. It tried to run Python and failed miserably. So now it's running ACEMD. Temp on the top GPU is 52 C. Will need to keep an eye on that. I have Einstein set as backup. It's putting out a small amount of heat. Hope it will help move the cold air out of the main room. Will see.
Z
____________
|
|
|
Keith Myers Send message
Joined: 13 Dec 17 Posts: 1354 Credit: 7,801,839,872 RAC: 9,169,925 Level
Scientific publications
|
I would avoid the experimental python tasks for now.
Maybe in a few months, the admins and scientists will figure out a workable configuration set for all hosts.
You can use the Pandora client configuration file to up your cache to the maximum 16 allowed.
This is my pandora_config file snippet for this project. Courtesy of Ian.
project: https://www.gpugrid.net/
gpu_serverside_limit: 2
gpu_spoof_tasks: 16
gpu_limit: 16
request_min_cooldown: 180
|
|
|
|
FYI, you can go beyond 16, barring any other issues like the absurdly long run times preventing downloading more.
but under normal circumstances, you can download as many as you like. I had my faster hosts doing a cache of 80 (with the same pandora settings format keith listed there, just swap "16" with your target cache size).
but beware, this project gives a +50% bonus for tasks returned within 24hrs, so it's detrimental to cache tasks longer than this. play it safe. if you're not running the Python tasks, targeting a 0.75 day return time is pretty safe and gives you a nice buffer for when the project goes down occasionally.
I'm still running both task types, but it takes time for things to even out.
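One way to pick a target cache size from a return-time goal is sketched below. It assumes tasks are processed first-in first-out, that average runtime is stable, and that turnaround is roughly cache size times runtime divided by the number of tasks running at once. The function names and the example runtimes are hypothetical, not measured values from this thread.

```python
import math

SECONDS_PER_DAY = 86_400


def max_cache_for_turnaround(avg_runtime_s: float, concurrent_tasks: int,
                             target_days: float = 0.75) -> int:
    """Largest number of tasks (running + queued) that should still be
    returned within target_days, under the FIFO assumption above."""
    budget_s = target_days * SECONDS_PER_DAY
    return math.floor(budget_s * concurrent_tasks / avg_runtime_s)


def earns_bonus(turnaround_days: float, window_days: float = 1.0) -> bool:
    """True if a task comes back inside the 24-hour bonus window."""
    return turnaround_days <= window_days


if __name__ == "__main__":
    # Hypothetical host: 4 GPUs, one task per GPU, 1-hour average runtime.
    print(max_cache_for_turnaround(avg_runtime_s=3600, concurrent_tasks=4))
    print(earns_bonus(0.75))
```

The result would be the number to swap in for "16" in the config, if the runtime estimate holds.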
____________
|
|
|
Zalster Send message
Joined: 26 Feb 14 Posts: 211 Credit: 4,496,324,562 RAC: 0 Level
Scientific publications
|
I would avoid the experimental python tasks for now.
Maybe in a few months, the admins and scientists will figure out a workable configuration set for all hosts.
You can use the Pandora client configuration file to up your cache to the maximum 16 allowed.
This is my pandora_config file snippet for this project. Courtesy of Ian.
project: https://www.gpugrid.net/
gpu_serverside_limit: 2
gpu_spoof_tasks: 16
gpu_limit: 16
request_min_cooldown: 180
Thanks, Ian and Keith. For now I'm leaving it as is. I swapped out the intake fan in the back for a be quiet 3 140mm and ordered a be quiet 120mm for the front intake fan. Hopefully that will be enough to move some air and keep the top GPU's temps down. The bottom GPU is only at 38C.
____________
|
|
|
Keith Myers Send message
Joined: 13 Dec 17 Posts: 1354 Credit: 7,801,839,872 RAC: 9,169,925 Level
Scientific publications
|
Yes, my slowest host has a turnaround time of 0.43 days.
Well within the 24 hour bonus return window. That's with a 16 task cache while also running Einstein and Milkyway gpu tasks concurrently.
The project basically runs on a 1-for-1 return/replacement mechanism.
I don't have 10 gpus running like Ian on a single host though.
Max is 4 gpus and mostly 3 on most hosts.
16 task cache is fine for me. |
|
|
|
the Python tasks are reaching insane levels of credit reward.
with the last round of Python tasks, I was able to determine that credit reward was ONLY a function of runtime and peak flops reported from the device. I also determined that faking your peak flops value to be higher would result in more credit. third, there seems to be a fail-safe credit limit: if your combined credit reward comes out greater than about 1,750,000, you get a default penalty value instead, regardless of runtime or flops. at first this penalty was 20.83 credits, then it increased to 34,722.22 credits when the task estimate changed from 3,000 to 5,000,000 GFlop.
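A quick check on the two penalty values quoted above: both work out to the task estimate divided by about 144. This is only an observation from the numbers in this post; whether the server actually computes the fallback credit this way is an assumption.

```python
def implied_divisor(task_estimate_gflop: float, penalty_credit: float) -> float:
    """Ratio between the task's GFlop estimate and the fallback credit awarded."""
    return task_estimate_gflop / penalty_credit


if __name__ == "__main__":
    # (task estimate in GFlop, penalty credit) pairs quoted in the post.
    for estimate, penalty in ((3_000, 20.83), (5_000_000, 34_722.22)):
        print(f"{estimate} GFlop / {penalty} credits = {implied_divisor(estimate, penalty):.2f}")
```

If that pattern holds, the fallback credit scales directly with the GFlop estimate and ignores runtime and peak flops.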
on the last round of python tasks last week, a 2080ti would earn about 100cred/second of runtime. that's corresponding to a gpu peak_flops value of about 14 TFlops.
on this current round, a 2080ti is earning about 1000credit/second of run time (and with a bit more variance than before). easily 100x the credit reward of an MDAD task per unit time.
another observation is that for some unknown reason, some hosts are earning disproportionately less credit per unit time, and that's already excluding tasks that hit the 34,722 barrier. It seems that peak flops and run time aren't the only factors anymore, but I can't determine exactly why some hosts are getting HUGE reward, and some are not.
for example,
1. both of my 2080ti hosts are earning about 1000cred/sec on the tasks that stay below the penalty threshold.
2. my 2070 system which has about half of the peak_flops value, earns about 350cred/sec (less than the half you'd expect based on peak flops).
3. my GTX 1660 Super (in the RTX 3070 host as device 1) is earning about 71cred/sec, which is about the same as it earned before when it was using the peak flops value from the RTX 3070. but that's WAY low even if the new tasks were using the per-device flops value (which they weren't before).
So, I'm enjoying the credits, but obviously something isn't really being calculated fairly here since credit earned isn't consistent with work performed across all hosts.
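The mismatch in the examples above can be expressed as a simple proportionality check. This sketch assumes credit per second scales linearly with peak flops (the model from the previous round of tasks) and uses the rounded figures from this post; the function name is made up for illustration.

```python
def expected_rate(reference_rate: float, flops_ratio: float) -> float:
    """Credit/sec expected for a card if credit scaled linearly with peak flops.
    flops_ratio is the card's peak flops divided by the reference card's."""
    return reference_rate * flops_ratio


if __name__ == "__main__":
    rate_2080ti = 1000.0      # cred/sec, from the post
    observed_2070 = 350.0     # cred/sec, from the post
    expected_2070 = expected_rate(rate_2080ti, 0.5)  # "about half" the peak flops
    print(f"expected for 2070: {expected_2070:.0f} cred/sec")
    print(f"observed / expected: {observed_2070 / expected_2070:.0%}")
```

On those numbers the 2070 host earns about 70% of what a purely flops-proportional scheme would predict.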
____________
|
|
|
Keith Myers Send message
Joined: 13 Dec 17 Posts: 1354 Credit: 7,801,839,872 RAC: 9,169,925 Level
Scientific publications
|
I sure hope they can properly debug this python app. Would be nice to have an alternate application and task source other than acemd3.
I hope they can sort out the credit situation also. I have no complaints with the acemd3 credit awards. |
|
|
|
They’ve already made an improvement to the app, at least in getting the efficiency back up. With the last round of Python, it was similar to the Einstein GW app where the overall speed (GPU utilization) seemed to depend on the CPU speed, and it used a lot more GPU memory, and very little GPU PCIE. However with this latest round of Python tasks, they are back to the same basic setup as the MDAD tasks with low GPU mem use, good 95+% GPU utilization even on slow CPUs, and PCIe use is back up to the same as MDAD. So at least that’s better.
The MDAD credit scheme seems to work well IMO. I too hope they figure it out. Maybe they aren’t super concerned with credit on these since they are beta tasks.
Also still waiting on the app for ampere support :(
____________
|
|
|