I don't normally complain about credits, but what is fair is fair, and this is not. My computer 74707 finished a unit within 24 hours and didn't get the 50% bonus. See link below.
http://www.gpugrid.net/workunit.php?wuid=3002036
|
|
|
skgiven (Volunteer moderator, Volunteer tester)
Joined: 23 Apr 09 | Posts: 3968 | Credit: 1,995,359,260 | RAC: 0
|
This is a known problem with the new server/credit system.
When you are sent a resend, and both you and the first recipient return the task around the same time, more than 5 days after the original send date, the credit is averaged. My take on this is that it's a default routine in the server code, and would require significant testing, modification and debugging to sort out. It strikes me that the code is still very much CPU-centric. This was looked into, but at the time there was no easy solution.
This situation does not occur very often; probably <<1% of tasks/credit are affected this way. In your case a GTX 590 did not return the task until after 5 days, probably a seasonal/holiday thing. With the old system they would not have received credit at all. It might also be the case that this requires the original recipient to return/report the task before you, so the window of opportunity for this to occur is quite small.
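To illustrate, my guess at that default routine, as a rough Python sketch (hypothetical, not the actual server code):

    def grant_credit(results):
        # When two copies of a workunit are returned around the same time,
        # the server appears to average the claimed credits and grant the
        # same figure to both, wiping out any speed bonus.
        claimed = [r.claimed_credit for r in results]
        granted = sum(claimed) / len(claimed)  # the averaging step
        for r in results:
            r.granted_credit = granted
        return granted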
____________
FAQ's
HOW TO:
- Opt out of Beta Tests
- Ask for Help |
|
|
Rayzor
Joined: 19 Jan 11 | Posts: 13 | Credit: 294,225,579 | RAC: 0
|
skgiven,
Thanks for explaining the error situation. I lost credit on this one too:
http://www.gpugrid.net/workunit.php?wuid=2981843 |
|
|
Toni (Volunteer moderator, Project administrator, Project developer, Project tester, Project scientist)
Joined: 9 Dec 08 | Posts: 1006 | Credit: 5,068,599 | RAC: 0
|
Thanks for reporting. As skgiven explained, it's a problem which occurs when a slow card computes a result very close to the deadline, so the task is recreated. Even if the second user completes the task in time, they get the same credit as the first.
I'm looking at this, but the fix involves a change in the validation state machine and is therefore a very delicate matter.
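Roughly, each returned result steps through validation states like those below, and granting credit is tied to the transitions, which is why touching them is risky. (A simplified, hypothetical Python sketch, not the project's actual code.)

    from enum import Enum, auto

    class ValidateState(Enum):
        INIT = auto()          # result returned, awaiting validation
        VALID = auto()         # accepted; credit is granted on this edge
        INVALID = auto()       # rejected, no credit
        INCONCLUSIVE = auto()  # needs another result to compare against

    def validate(state, check_passed):
        # One transition step. A fix would have to touch the INIT -> VALID
        # edge (where credit is computed) without disturbing the rest.
        if state is ValidateState.INIT:
            return ValidateState.VALID if check_passed else ValidateState.INVALID
        return state  # terminal states stay put in this sketch
|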
|
|
|
Well, it happened to me again! See link below.
http://www.gpugrid.net/workunit.php?wuid=3103283 |
|
|
|
Still occurring :(
This issue has been known for some time; please correct it.
WU here
Thanks. |
|
|
|
What kind of points are you supposed to get from the long runs?
My 470 @ 800 core usually churns through the WUs in 4.2-4.5 hours, while the oddballs take 3.8 or 5.5 hours. I don't care about the points that much, but I was just wondering :D
Looks pretty much like this always.
E: oh skgiven, I noticed you have 470s too. Do you have an opinion on the "best" driver for a 470 when running GPUGrid?
I'm using 285.62 currently and it seems to be working very nicely indeed.
And I know, I know: if it ain't broke, don't fix it! :D |
|
|
skgiven (Volunteer moderator, Volunteer tester)
Joined: 23 Apr 09 | Posts: 3968 | Credit: 1,995,359,260 | RAC: 0
|
Exactly! The best driver is the one you have, if it works ;P
260.99 is one of the earliest drivers that works with the present apps - I use it on one system.
Typically, driver updates contain code to better support games and to fix the odd bug. While there might occasionally be an improvement from a driver, more often the improvements come from the developers' apps. Quite often the extra driver code actually slows the drivers down, so the earlier the driver the better, unless the driver brings known improvements (faster or more stable). In the past it's been shown that you can gain a few percent in performance by using earlier drivers.
Obviously, if you game, read the driver release notes and see if there is anything in the update for you specifically.
Don't forget to check your monitor too. There is no point updating to gain improvements at a resolution your monitor can't reach.
If you don't game then don't update, because it's probably not going to improve performance and there is always a chance that it will destabilize your system.
The more updates you install, the closer you are to a failure, and when it arrives it might require substantial maintenance: uninstall, driver sweep, reinstall, and you may even need to reinstall the operating system.
Sometimes, if you have a card-specific issue, there might be a fix:
[GeForce GTX 590]: The GPU fan randomly spikes to 100% and then takes a few seconds to return to normal.
Then there are some things that get taken away:
[NVIDIA Control Panel]: With extended mode enabled, there is no edit and delete option on the NVIDIA Control Panel -> Change Resolution -> Customize window after creating a custom resolution.
You would need to trawl through the release notes for such info.
295.73 is the latest driver. I know there are some reports of issues with this driver (well, when it was in its beta form), so I'm going to test it now. Some of the things that come with it I view with caution, e.g. 'new PhysX software'. It's more likely to be NVIDIA getting ready for GF600 than helping anyone now!
____________
FAQ's
HOW TO:
- Opt out of Beta Tests
- Ask for Help |
|
|
|
I've been getting slightly weird points from the long runs.
Instead of the usual 35811 or something close to that, I've been getting this kind of points.
Not that I care much, but someone else might be seeing something similar and might care, so I figured I'd report it here. Although I forgot to put my 470 back to crunching clocks before I left for the weekend, so it's at stock, that shouldn't affect the points, only the runtime, AFAIK. Sorry, I'm on my phone and can't rotate the screenshot; I took it from the phone browser.
|
|
|
|
It happened to me again. See the link below:
http://www.gpugrid.net/workunit.php?wuid=3336817
|
|
|
|
It happened again! See link:
http://www.gpugrid.net/workunit.php?wuid=3365773 |
|
|
|
Here is the fifth one (the third in the last month or so) for which I didn't receive the bonus for finishing a task within 24 hours. Maybe this problem is more common than stated.
http://www.gpugrid.net/workunit.php?wuid=3376850 |
|
|
|
This seems to be related to resent WUs? Perhaps it is taking the credits from the guy above. Just a hint for the admins ;)
____________
DSKAG Austria Research Team: http://www.research.dskag.at
|
|
|
|
And now they are credited less than before?
http://www.gpugrid.net/result.php?resultid=5347036
9 May 2012 | 10:04:38 UTC 10 May 2012 | 5:11:01 UTC
67,666.98 / 86,850.00
http://www.gpugrid.net/result.php?resultid=5351603
10 May 2012 | 23:33:00 UTC 11 May 2012 | 20:04:18 UTC
9 May 2012 | 10:04:38 UTC
72,725.98 / 57,900.00
The second is a bit slower than the other, but well within 24 hrs.
Has the crediting formula changed, or did I misunderstand it before: received and sent back within 24 hrs gives a 50% bonus, and within 48 hrs a 25% bonus? Or have I missed some news regarding this? |
|
|
|
And now they are credited less than before?
http://www.gpugrid.net/result.php?resultid=5347036
9 May 2012 | 10:04:38 UTC 10 May 2012 | 5:11:01 UTC
67,666.98 / 86,850.00
http://www.gpugrid.net/result.php?resultid=5351603
10 May 2012 | 23:33:00 UTC 11 May 2012 | 20:04:18 UTC
9 May 2012 | 10:04:38 UTC
72,725.98 / 57,900.00
The second is a bit slower than the other, but well within 24 hrs.
Has the crediting formula changed, or did I misunderstand it before: received and sent back within 24 hrs gives a 50% bonus, and within 48 hrs a 25% bonus? Or have I missed some news regarding this?
Your second workunit is a reissued one (the 5-day deadline was over), and you had bad luck with it: the first host returned it before your host did, and this canceled the 24 h bonus on your host too. This is a known bug on the server side. |
|
|
|
And when will this known bug be fixed?
To prevent this from recurring, you'll have to check the WU deadline before going on. |
|
|
|
Four months later and still no fix.
It seems rather disrespectful to the volunteer crunchers, imo, to remain that inactive.
First I look for an interesting project (medical mostly) and then I get competitive. DC costs me about 60% of my annual electricity bill, so I expect the people behind this project to do their best to keep it running as it should, but this doesn't give me a good feeling.
I just lost only one bonus, but it's more the inactivity that makes me question whether to stay with this project or just hop off.
And let's be clear on one thing: it's not our responsibility to take care of the software for a well-running project. We can only donate our idle CPU time.
|
|
|
skgiven (Volunteer moderator, Volunteer tester)
Joined: 23 Apr 09 | Posts: 3968 | Credit: 1,995,359,260 | RAC: 0
|
This credit glitch is down to BOINC's new (>2 years old) credit system and the BOINC server software. GPUGrid is a production research project, not a developer of server software or credit systems. It therefore relies on the server kit to have as few such issues as possible. A solution was looked for, but not found. It's not likely to be fixed until new server-side software containing a fix is used.
____________
FAQ's
HOW TO:
- Opt out of Beta Tests
- Ask for Help |
|
|
|
So just check whether the first distribution of the WU wasn't too long ago when your system starts on it.
|
|
|
|
And when will this known bug be fixed?
To prevent this from recurring, you'll have to check the WU deadline before going on.
Complain long enough and loud enough, and it will get fixed, sooner or later.
I doubt that very much. After getting some more info from my teammates, it's clear the people behind this project lack respect for its participants, imo. And that's more of a disappointment than the loss of credits. Bye. |
|
|
|
And when will this known bug be fixed?
To prevent this from recurring, you'll have to check the WU deadline before going on.
Complain long enough and loud enough, and it will get fixed, sooner or later.
I doubt that very much. After getting some more info from my teammates, it's clear the people behind this project lack respect for its participants, imo. And that's more of a disappointment than the loss of credits. Bye.
This is an annoying bug from the cruncher's point of view, but it has very little impact on the project's performance.
Just look at it from the project's point of view:
The first host returned the result after the 5-day deadline, but the workunit was still active (because of the reissue), so the first result was validated and credit was granted, without any bonus, to the first host. After that, the second host's work became redundant; the second result has no use for the project, so its value is zero credits. The project could cancel the workunit on the second host once the first one returns the result; in that case no credit at all would be granted to the second host, and that would be a much bigger disappointment for the cruncher. So for these 'bad luck' workunits you should take the normal credit as compensation for your efforts; since the result is redundant, it's quite logical not to receive a bonus for it. We are here for making progress, not for credits. Credits are meaningless in the end.
I think there is a workaround for this problem: long workunits should not be sent to hosts with a low RAC, or to hosts with more than 2 user-canceled workunits. This would increase the overall throughput of the long queue. I quite often get workunits that have been resent 6 times. If the 6th resend fails too, the workunit 'dies', and all the previous work on it is lost. |
|
|
skgiven (Volunteer moderator, Volunteer tester)
Joined: 23 Apr 09 | Posts: 3968 | Credit: 1,995,359,260 | RAC: 0
|
I also have a few ideas for workarounds, but I don't expect the researchers will have time, especially while developing new apps.
Basically, no resends! Tasks that fail are 'rebuilt' rather than rescheduled, in such a way that they are treated as new, high-priority tasks (see the sketch below). These would only go out to clients that can return them in a shorter time. The server would still need to keep track of the tasks, and if they fail, say, 5 times, not rebuild them. After each rebuild a task could have a shorter deadline. Repeated rebuilds would not count towards errors.
As BOINC works on a per-app basis, these could even go into an opt-in resends/rebuilds feeder, with some additional credit bonus for returning these WUs. Preferably the opt-in would only be available to systems with high success rates.
The 7.0.2x clients should automatically report completed work for GPUGrid, reducing this problem.
I will move these posts later, to the appropriate thread.
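Something like this, as a rough sketch of the rebuild idea (all names and limits below are hypothetical, not BOINC server code):

    MAX_REBUILDS = 5

    def handle_failed_task(wu, fast_hosts):
        # Rebuild a failed task as a brand-new, high-priority task with a
        # shorter deadline, instead of rescheduling the old copy.
        if wu.rebuild_count >= MAX_REBUILDS:
            wu.state = "dead"  # give up after repeated failures
            return None
        new_task = wu.clone()                  # treated as new: no error history
        new_task.priority = "high"
        new_task.deadline_days = max(5 - wu.rebuild_count, 1)  # shrinking deadline
        new_task.eligible_hosts = fast_hosts   # only clients with fast turnaround
        wu.rebuild_count += 1
        return new_task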
____________
FAQ's
HOW TO:
- Opt out of Beta Tests
- Ask for Help |
|
|
|
These types of priority/scheduling schemes were debugged in the 1960s/70s in computer science departments around the globe.
That the research teams involved are not aware of the decades-old elegant solutions is but a black mark on their inter-departmental liaison skills/efforts.
And that contributors have to flail around in the dark to re-invent the obvious is a frustration for me personally, having learned these paradigms almost from infancy. |
|
|
ritterm
Joined: 31 Jul 09 | Posts: 88 | Credit: 244,413,897 | RAC: 0
|
It doesn't sound the same to me, but did I just become a victim of this same problem?
Workunit 3445340 |
|
|
|
It doesn't sound the same to me, but did I just become a victim of this same problem?
Workunit 3445340
No, I've received the same credit (53700) as you have.
It seems that these MJHARVEY_MJH120523 workunits give less credit than expected.
5411968
5411890
5411871
5412050 |
|
|
ritterm
Joined: 31 Jul 09 | Posts: 88 | Credit: 244,413,897 | RAC: 0
|
No, I've received the same credit (53700) as you have.
It seems that these MJHARVEY_MJH120523 workunits give less credit than expected.
Okay, thanks, RZ. I don't remember getting any MJHARVEYs in the past and had nothing to compare it to. |
|
|
|
Here is number 6 for me:
http://www.gpugrid.net/workunit.php?wuid=3440185
|
|
|
|
And here is number 7:
http://www.gpugrid.net/workunit.php?wuid=3499241 |
|
|
|
I don't know whether this is the same issue, but I just completed http://www.gpugrid.net/workunit.php?wuid=3734690
As you can see, I was the third computer to receive this task, and I completed it just before 24 hours from when it was originally sent, but I was the first (and only) one to complete it.
No bonus points awarded... any ideas?
|
|
|
skgiven (Volunteer moderator, Volunteer tester)
Joined: 23 Apr 09 | Posts: 3968 | Credit: 1,995,359,260 | RAC: 0
|
I have no idea why the bonus was not awarded.
The task was even returned within 24h of originally being sent to the first client.
Some glitch in the crediting system.
Are these resends of a previous batch still in the system?
____________
FAQ's
HOW TO:
- Opt out of Beta Tests
- Ask for Help |
|
|
|
Are these resends of a previous batch still in the system?
I don't know. Is there a way to tell?
It is an odd one, isn't it?
This task was created on 25 September, but was not sent until 25 Oct.
Unless the creation date for the task got messed up (possible, since the dates are exactly 1 month apart) |
|
|
|
There is nothing unusual about this. The NATHAN_RPS units are worth 60,900 credits with the 24-hour bonus. They take a little over 5 hours to complete on my computers. See the examples below:
http://www.gpugrid.net/workunit.php?wuid=3786901
http://www.gpugrid.net/workunit.php?wuid=3786456
http://www.gpugrid.net/workunit.php?wuid=3785935
http://www.gpugrid.net/workunit.php?wuid=3785733
You aren't being short-changed. |
|
|
|
It's not one of those units though; it took 46,225 sec to complete.
It's weird: if you look at the creation date of the workunit, it was over a month ago, yet it was only sent out for the first time 2 days ago.
Anyway, I don't really mind; I just thought I would let the admins know in case there is a wider issue with a batch of old workunits appearing in the system.
|
|
|
TheFiend
Joined: 26 Aug 11 | Posts: 99 | Credit: 2,500,112,138 | RAC: 0
|
It's not one of those units though.
It is one of those units..... and you got full bonus for it. |
|
|
skgiven (Volunteer moderator, Volunteer tester)
Joined: 23 Apr 09 | Posts: 3968 | Credit: 1,995,359,260 | RAC: 0
|
I see what the issue is; it ran using the old CUDA 3.1 app:
I2R96-NATHAN_RPS1120528-65-166-RND2961
5970734 128855 26 Oct 2012 | 7:03:20 UTC 26 Oct 2012 | 20:16:32 UTC Completed and validated 46,225.16 1,619.07 60,900.00 Long runs (8-12 hours on fastest card) v6.16 (cuda31)
There are several NATHAN tasks with different run times:
I4R49-NATHAN_RPS1120801_N-59-111-RND7357_0 3777104 20 Oct 2012 | 9:22:23 UTC 20 Oct 2012 | 20:05:32 UTC Completed and validated 36,696.43 1,252.14 91,200.00 Long runs (8-12 hours on fastest card) v6.16 (cuda42)
I1R25-NATHAN_RPS1120913_respawn2-23-100-RND7674_0 3773979 18 Oct 2012 | 6:06:18 UTC 18 Oct 2012 | 17:48:51 UTC Completed and validated 40,229.50 1,414.98 101,400.00 Long runs (8-12 hours on fastest card) v6.16 (cuda42)
I3R143-NATHAN_RPS1120528-75-166-RND3071_0 3721689 27 Oct 2012 | 5:53:08 UTC 27 Oct 2012 | 13:04:28 UTC Completed and validated 24,182.60 924.84 60,900.00 Long runs (8-12 hours on fastest card) v6.16 (cuda42)
This must have been one of the last type, but it just ran slower on the 3.1 app.
____________
FAQ's
HOW TO:
- Opt out of Beta Tests
- Ask for Help |
|
|
|
Forgive my stupidity, but do you get the bonus for your GPU completing the task within 24 hours of run time, or for completing it within 24 hours from the date it was downloaded? |
|
|
skgiven (Volunteer moderator, Volunteer tester)
Joined: 23 Apr 09 | Posts: 3968 | Credit: 1,995,359,260 | RAC: 0
|
The 24 h window runs from download time to report time.
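In other words, the bonus tiers described earlier in this thread work out like this (a hypothetical Python sketch; the server's exact bookkeeping may differ):

    from datetime import datetime, timedelta

    def bonus_multiplier(downloaded, reported):
        # 50% bonus within 24 h of download, 25% within 48 h, none after.
        elapsed = reported - downloaded
        if elapsed <= timedelta(hours=24):
            return 1.50
        if elapsed <= timedelta(hours=48):
            return 1.25
        return 1.00

    # Example: downloaded 9 May 10:04:38 UTC, reported 10 May 5:11:01 UTC
    print(bonus_multiplier(datetime(2012, 5, 9, 10, 4, 38),
                           datetime(2012, 5, 10, 5, 11, 1)))  # 1.5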
____________
FAQ's
HOW TO:
- Opt out of Beta Tests
- Ask for Help |
|
|
klepel
Joined: 23 Dec 09 | Posts: 189 | Credit: 4,720,986,736 | RAC: 1,886,867
|
Just got one from the ultra-long NOELIAs "18x30_17-NOELIA_hfXA_long-0-2-RND2102_1"
(http://www.gpugrid.net/workunit.php?wuid=3975643)
A little bit frustrating after 11.4 hours of crunching and constant babysitting of the upload. But we keep our spirits up!
|
|
|
skgiven (Volunteer moderator, Volunteer tester)
Joined: 23 Apr 09 | Posts: 3968 | Credit: 1,995,359,260 | RAC: 0
|
It looks like we are over the hill with this batch of tasks; the number of tasks on the server has been falling for a couple of days, which means we are more than halfway through the batch. In about a week we should start to see the last of the batch, and within a fortnight most resends will be back. At that stage we might see some other tasks while the researchers analyse the performance of this new type of research method. After that, a similar or modified version of this type of batch might turn up, or not, depending on the method's performance...
____________
FAQ's
HOW TO:
- Opt out of Beta Tests
- Ask for Help |
|
|
zioriga
Joined: 30 Oct 08 | Posts: 46 | Credit: 502,232,425 | RAC: 3,757,737
|
My "Long runs" cases:
Task 6295312 - received 5 Jan 2013 | 4:09:59 UTC -
reported 5 Jan 2013 | 19:03:15 UTC
that is only 15 hours for 90100 credits
task 6291504 - received 4 Jan 2013 | 4:19:15 UTC -
reported 4 Jan 2013 | 21:31:16 UTC
that is a few minutes over 17 hours for 135150 credits
task 6281411 - received 31 Dec 2012 | 23:02:42 UTC
reported 1 Jan 2013 | 23:06:06 UTC
that is just over 24 hours for 112625 credits
Every task used close to 36,000 seconds (35,735 - 35,965).
There is no correlation between those cases!
|
|
|
TheFiend
Joined: 26 Aug 11 | Posts: 99 | Credit: 2,500,112,138 | RAC: 0
|
My "Long runs" cases:
Task 6295312 - received 5 Jan 2013 | 4:09:59 UTC -
reported 5 Jan 2013 | 19:03:15 UTC
that is only 15 hours for 90100 credits
task 6291504 - received 4 Jan 2013 | 4:19:15 UTC -
reported 4 Jan 2013 | 21:31:16 UTC
that is a few minutes over 17 hours for 135150 credits
task 6281411 - received 31 Dec 2012 | 23:02:42 UTC
reported 1 Jan 2013 | 23:06:06 UTC
that is just over 24 hours for 112625 credits
Every task used close to 36,000 seconds (35,735 - 35,965).
There is no correlation between those cases!
The two WUs that have lower credit occurred for the following reasons:
Task 6295312 - it was a reissued WU and somebody else returned the same WU before you, so you didn't qualify for the 24 h bonus.
Task 6281411 - you missed the 24-hour bonus deadline by 4 minutes. |
|
|
|
This has happened again, see link:
http://www.gpugrid.net/workunit.php?wuid=4432240
The 8th time for me, though it's been almost a year since the last time.
|
|
|
|
This time, I didn't get the bonus for 2 units, on 2 computers, on the same day. See links below:
http://www.gpugrid.net/workunit.php?wuid=4479055
http://www.gpugrid.net/workunit.php?wuid=4477522
|
|
|
|
Same here, there must be a bug:
http://www.gpugrid.net/result.php?resultid=6911246
WU completed within 24 hrs, yet only 111k vs the usual 167k.
Given that BOINC uses MySQL, I would have thought it would be pretty straightforward to rectify? |
|
|
|
In all 3 cases you fell victim to this old bug:
TheFiend, a few posts above yours, wrote: It was a reissued WU and somebody else returned the same WU before you, so you didn't qualify for the 24 h bonus.
Sorry!
MrS
____________
Scanning for our furry friends since Jan 2002 |
|
|
|
Hmm, not very reassuring: a year and a half since the first post and no solution.
|
|
|
|
And here we go again:
http://www.gpugrid.net/workunit.php?wuid=4495567
|
|
|
|
I think you could minimize the chance of this happening to your hosts by setting your workunit queue as low as possible (0.1 or 0.01 days).
Maybe you have done so already, because I see your hosts have only as many workunits in progress as they have GPUs.
It used to happen to my hosts as well, but I don't mind. This is not a big deal. |
|
|
|
I already have it set at 0.05 days, which means that there is a little more than 1 hour between the time the current unit finishes and the new unit starts crunching, and some of these units take up to 12 hours to complete on a Windows 7 computer. I don't think cutting the margin down further is going to help the situation that much. |
|
|
|
If one sees the bonus credit as an actual "bonus", things are not that bad. Sure, it's never nice not to get the bonus despite deserving it, but the actual credit calculation was done for the base credits without the bonus (actually, there's another bonus for choosing the long runs over the short runs).
MrS
____________
Scanning for our furry friends since Jan 2002 |
|
|
|
I thought I'd resurrect a thread from the past about a problem that still annoys me.
People get credit for late-finishing tasks, while other people who were issued those tasks don't get the full bonus for finishing them within 24 hours, just because the first person finished before the second, though after the 5-day deadline.
Yes, this is a minor issue, but it is woefully unfair and annoying. It makes the rules meaningless, and it makes the project look bad.
name e26s102_e17s57p0f25-GIANNI_D3C36bCHL1-0-1-RND0879
application Long runs (8-12 hours on fastest card)
created 28 Aug 2016 | 5:46:39 UTC
canonical result 15258535
granted credit 351,400.00
minimum quorum 1
initial replication 1
max # of error/total/success tasks 7, 10, 6
Task | Computer | Sent | Time reported or deadline | Status | Run time (sec) | CPU time (sec) | Credit | Application
15258535 231723 28 Aug 2016 | 10:16:46 UTC 3 Sep 2016 | 6:53:42 UTC Completed and validated 479,614.86 32,224.01 351,400.00 Long runs (8-12 hours on fastest card) v8.48 (cuda65)
15266311 263612 2 Sep 2016 | 11:24:37 UTC 3 Sep 2016 | 10:32:59 UTC Completed and validated 64,702.78 64,575.41 351,400.00 Long runs (8-12 hours on fastest card) v8.48 (cuda65)
http://www.gpugrid.net/workunit.php?wuid=11709767 |
|
|
|
It happened on one of my hosts lately, and I've checked my other hosts for such tasks. While doing this I came to the conclusion that my previous advice (to set a very short queue) was wrong: if you want to minimize the number of such tasks on your host, you should set the cache size to the time it takes to process a workunit (or longer, but that doesn't matter). In this case there's a chance that the first host returns the result while the resend is still sitting in your queue, and the server will cancel it on your host. This way you give the original host another 8-12 hours to process the workunit. This can be done only on the fastest hosts: if you have a GPU which finishes a task in just under 24 h, you can't have a +1-task queue, because you'd miss the deadline for the full bonus.
This is an annoying bug, but it affects everyone to the same extent, so it does not cause an imbalance in credit earnings.
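A little arithmetic on that last point (a hypothetical helper, just to illustrate the advice; not project code):

    def max_safe_queue_days(task_hours, bonus_window_hours=24):
        # How long a task can sit in the queue and still be reported
        # within the full-bonus window.
        slack_hours = bonus_window_hours - task_hours
        return max(slack_hours / 24.0, 0.0)

    print(max_safe_queue_days(12))  # 0.5 -> half a day of queue is still safe
    print(max_safe_queue_days(23))  # ~0.04 -> no room for a +1 task queue
|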
|
|
|
I noticed one task about a week back that I knew went over the 5-day period, and I saw that it still got credit. I was confused, but since it only happened once, I thought there might just be a reporting issue, and that the credit might even be removed later. Now I see that the project gives credit when a task is returned late, and I probably stole some points from someone who got the task after mine reached the 5 days and before I reported. I will keep this in mind on my weaker systems and make sure that if a task will not complete in time, I dump it early and let someone else get full credit. I just dumped one today after a day of running, because it was going to take about 6 days. Had I known in the past that the project would still give credit, I would have dumped fewer, but not with this credit-averaging oddity in play as well.
____________
1 Corinthians 9:16 "For though I preach the gospel, I have nothing to glory of: for necessity is laid upon me; yea, woe is unto me, if I preach not the gospel!"
Ephesians 6:18-20, please ;-)
http://tbc-pa.org |
|
|
|
Here is another example:
http://www.gpugrid.net/workunit.php?wuid=11707537
15254583 358064 25 Aug 2016 | 15:03:32 UTC 30 Aug 2016 | 15:03:32 UTC Timed out - no response 0.00 0.00 --- Long runs (8-12 hours on fastest card) v8.48 (cuda65)
15262169 158961 30 Aug 2016 | 15:34:38 UTC 4 Sep 2016 | 17:01:02 UTC Completed and validated 148,016.14 32,485.84 351,400.00 Long runs (8-12 hours on fastest card) v8.48 (cuda65)
15269533 263612 4 Sep 2016 | 16:30:31 UTC 4 Sep 2016 | 20:11:04 UTC Aborted by user 10,646.69 10,617.28 --- Long runs (8-12 hours on fastest card) v8.48 (cuda65)
If a cruncher misses the 5-day deadline, that task should be canceled before being sent out to the next host. Since there is no point in redundancy, I aborted the task so I could crunch the next one.
|
|
|
|
Bedrich,
When cruncher A misses the deadline, and cruncher B gets their copy, cruncher A's task is intentionally not cancelled. This is because it is possible that A will be Completed and Validated, even before B starts the task, and there is support for the server cancelling B's task in that situation.
If anything, I'd like to see one of the tasks get cancelled, when the other has been Completed and Validated, regardless of whether it has been started. If GPUGrid isn't doing that, they should! |
|
|
Beyond
Joined: 23 Nov 08 | Posts: 1112 | Credit: 6,162,416,256 | RAC: 0
|
Bedrich,
When cruncher A misses the deadline, and cruncher B gets their copy, cruncher A's task is intentionally not cancelled. This is because it is possible that A will be Completed and Validated, even before B starts the task, and there is support for the server cancelling B's task in that situation.
If anything, I'd like to see one of the tasks get cancelled, when the other has been Completed and Validated, regardless of whether it has been started. If GPUGrid isn't doing that, they should!
We get really really whiny when running tasks are cancelled. Few projects do it anymore. ;-)
I'd say cancel the WU that's past 5 days. The deadline is clear. Violators get madame guillotine! |
|
|
|
Bedrich,
When cruncher A misses the deadline, and cruncher B gets their copy, cruncher A's task is intentionally not cancelled. This is because it is possible that A will be Completed and Validated, even before B starts the task, and there is support for the server cancelling B's task in that situation.
If anything, I'd like to see one of the tasks get cancelled, when the other has been Completed and Validated, regardless of whether it has been started. If GPUGrid isn't doing that, they should!
We get really really whiny when running tasks are cancelled. Few projects do it anymore. ;-)
I'd say cancel the WU that's past 5 days. The deadline is clear. Violators get madame guillotine!
I agree with Jacob on this, or else give both full credit. What you are saying is that letting someone else's computer do 5 days and 3 hours' worth of work is less important than letting yours do 22 hours' worth. And on top of that, it is more important that you not waste 3 hours' worth of work (assuming it starts as soon as it is downloaded and doesn't sit in a queue) than the larger amount of the two.
Bottom line: if I have a WU that the server takes back forcibly after a few hours, when it wasn't nearly finished or even half done, because someone who had been working on it for almost a week returned it faster, then give it to them.
Given that the 5-day deadline exists so the students doing the work can use our results, the faster a result can be returned, the better. Why make me keep it for another day when someone just gave it back to you complete? Just get it onto the server and start working on the results the project wants, instead of waiting for mine and rejecting the one already there.
The only issue here is the credit. If you can live with the credit glitch, then the way it's done now is fine, and cancelling the second one after the first comes in is more desirable for the project. I can see the argument for cancelling the first one as the second is being sent out at exactly the 5-day deadline: there is, in fact, a stated deadline, new action is being taken to fix the missed deadline, and the project does need the WUs back in a reasonable amount of time, with 5 days set as that time. Students need to make progress on their school work, and a week is a good time to wait for results. That makes sense. But then waiting a potential second week for a second volunteer to finish work someone else handed back after a week and a few hours makes less sense. |
|
|
|
The way the BOINC manager and the task scheduler work is at odds with the way you want tasks to be canceled (1), but the project could still modify its app to achieve the desired behavior (2).
(1) Cancelling a task from the server requires communication between the server and the BOINC manager, but it's the BOINC manager's job to initiate this communication, and it does so only when needed. The reasons for calling the server (requesting new work to fill the queue, uploading result files, reporting results) do not include a task merely being overdue; only a task that has not started by its deadline is acted on (in that case the BOINC manager cancels it on its own, then reports it to the server, as that now fits the reasons for calling the server).
(2) The app could be modified to assess its own progress (against real-world time, not processing time) and cancel itself if progress is too slow (either because the GPU is too slow, or because the task is not allowed to run often enough to meet the 5-day deadline), as in the sketch below.
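A minimal sketch of idea (2), assuming a hypothetical progress hook inside the app (not actual app code):

    import time

    def should_self_abort(fraction_done, started_at, deadline_at, now=None):
        # Extrapolate the wall-clock finish time from progress so far,
        # and bail out if it would land past the deadline.
        now = time.time() if now is None else now
        if fraction_done <= 0.0:
            return False  # nothing to extrapolate from yet
        eta = started_at + (now - started_at) / fraction_done
        return eta > deadline_at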
Some of your suggestions would require time travel under the given circumstances, as the workflow of this annoying behavior is the following:
1. the slow host requests work from the scheduler
2. the scheduler assigns the work to the host and sends the files
3. the host begins processing
4. after 1 day the bonus is reduced to 25%
5. after 2 days the bonus is reduced to 0%
6. after 5 days the workunit is sent out to another host
7. the slow host finishes the task and sends in the result, so the credit given for the task has 0% bonus; it is accepted by the server because the task is still active, as the second host has not finished it
8. the fast host finishes the task and sends in the result, but as the credit is already assigned to this result, the fast host receives the same amount as the slow host (0% bonus), as there can be only one credit figure assigned to a given workunit |
|
|
|
I agree with Jacob on this, or else give both full credit. What you are saying is that letting someone else's computer do 5 days and 3 hours' worth of work is less important than letting yours do 22 hours' worth. And on top of that, it is more important that you not waste 3 hours' worth of work (assuming it starts as soon as it is downloaded and doesn't sit in a queue) than the larger amount of the two.
Let's just extend the deadlines and be honest about this. It's a slippery slope: if you reward tardiness, you are encouraging it.
|
|
|
skgiven (Volunteer moderator, Volunteer tester)
Joined: 23 Apr 09 | Posts: 3968 | Credit: 1,995,359,260 | RAC: 0
|
IMO GPUGrid should keep looking for a fix/workaround; perhaps enable partial job-progress reporting and use the trickle-up system to delay a resend when steady progress (within reason) is observed, or alter the minimum quorum when there is no reported progress after 48 h, to ensure full credit. A sketch of the idea follows below.
If a task is sitting doing nothing on someone's computer for 2 days without starting, just abort it and block the host with a Notice saying why.
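For example (a hypothetical server-side hook, just to show the shape of the idea; the thresholds are made up):

    def should_delay_resend(hours_since_sent, hours_since_last_trickle,
                            last_reported_fraction):
        # Hold off reissuing an overdue task while the original host is
        # still reporting steady progress via trickle-up messages.
        if hours_since_sent < 120:           # 5-day deadline not yet reached
            return True
        if hours_since_last_trickle > 48:    # no reported progress: reissue
            return False
        return last_reported_fraction > 0.5  # grace if more than half done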
____________
FAQ's
HOW TO:
- Opt out of Beta Tests
- Ask for Help |
|
|
Beyond
Joined: 23 Nov 08 | Posts: 1112 | Credit: 6,162,416,256 | RAC: 0
|
Let's just extend the deadlines and be honest about this. It's a slippery slope: if you reward tardiness, you are encouraging it.
At some projects, if a WU is being run but is going to be late, a post on the forum will get an extension. I'm all for that, but it takes admin action, and that's not going to happen here. Heck, we can't even get a fault-tolerant app or a 3rd queue. As you say, extending the deadline would work fine and I'm all for it. If the admins don't want to extend it, the deadline should be enforced, IMO. All of us know (or should know) the deadline. If your host can't make the deadline, abort the WU ASAP (hopefully we can all do simple math) and let a faster host run it. |
|
|
|
Beyond:
Not everybody micromanages like you assume.
In my case, I am attached to 60 projects, do work for about 15 of them, and have 9 long-running (~80 days average per task) RNA World VM tasks, all on the same machine, working on about 20 tasks at a time (9 RNA World, 6 CPU, 4 GPU, 1 non-CPU-intensive).
I also game a lot, and I keep the GPU suspended a lot when I want the room to be quiet (since it's also my home office, where I do my day job). So sometimes my PC may approach, or even slightly miss, a GPUGrid deadline. So be it.
What any project should aim for (and what BOINC's scheduler tries to satisfy for all projects), is to not waste resources.
So, when a task goes beyond deadline:
- If it hasn't been started yet on Client A, the client will abort it.
- If it HAS been started on Client A, the client will keep working on it, unless the server says abort. The server will also assign it to Client B.
- If Client A or Client B reports "completed" and it has passed validation (including any wingmen or quorum validation), then the server should tell any remaining clients to cancel. (Not sure if it's coded this way, but it should be, with regard to not wasting resources.)
PS: If you look at the "sched_reply_www.gpugrid.net.xml" scheduler reply file, you'll see:
<next_rpc_delay>3600.000000</next_rpc_delay>
... which means, according to https://boinc.berkeley.edu/trac/wiki/ProjectOptions
... that the project is being pinged every hour, to support cancelling when necessary.
I know you're saying "But Client A had its chance, it should just give up", but what if it was at 99.5% completion? Try to think about it from a resources perspective.
So, try not to micromanage :) And try not to be whiny about credits. Just get the job done as well as you can, meeting your own personal needs. Mostly I make the deadlines, but I miss deadlines sometimes - that's my situation. I don't worry about it. |
|
|
Beyond
Joined: 23 Nov 08 | Posts: 1112 | Credit: 6,162,416,256 | RAC: 0
|
Jacob, if you read the thread you'll see that I'm just agreeing with Bedrich, and I still do. I also offered some possible solutions. The project supposedly needs fast result turnaround, unlike most BOINC projects, so comparisons to them are not very useful. As far as maximizing resources goes, the largest gain would be fixing the app to be fault-tolerant. Yesterday I lost 15 WUs to a power outage, most of which were far along towards completion. That's in the neighborhood of 250 hours of GPU time wasted. The insult about credit whining is just lame, and I expect more of you than that. The insult about micromanaging is really about making sure that WUs finish by deadline. Apparently the BOINC people haven't done their job well enough to ensure that, since it's not happening. It doesn't take long to glance at BoincTasks to make sure that things are going correctly: perhaps 15 seconds a few times a day. Jacob, perhaps disagreeing with someone else doesn't have to include personal attacks. |
|
|
|
I'm sorry. It wasn't meant to be a personal attack, honestly. I was just responding to the "we get whiny" post that I believe you posted earlier.
It is unfortunate that GPUGrid tasks aren't more fault-tolerant to power outages. Worse yet, GPUGrid's deficiency here can cause other tasks to fail too - and I have some tasks that are approaching 400 days of run time! :)
It's BOINC's job to try to make tasks meet deadline, and it does a great job of it when given the correct data for estimation. GPUGrid still uses a project-wide "Duration correction factor" (cringe), <rsc_fpops_est> values that are often generic and sometimes inaccurate (cringe), and only 2 real applications (buckets) to lump their tasks into (cringe). BOINC can only do so much to guess how long a task will take based on this info, and I believe it is doing the best it can.
I agree with you that GPUGrid should make their tasks more fault-tolerant to power outages. And I think I agree with you that, if they're going to keep this "bonus credit" system (which I care nothing for), then they should correct it to work properly for the crunchers that receive resends (i.e.: if cruncher B completes before cruncher A, give B the bonus credit; if cruncher A completes before cruncher B, at least give cruncher B some credit for the time spent before the server cancelled their task).
Are we having fun yet? Smile. My wife says I lack tact. I apologize for lacking tact in my prior post. |
|
|
Beyond
Joined: 23 Nov 08 | Posts: 1112 | Credit: 6,162,416,256 | RAC: 0
|
I'm sorry. It wasn't meant to be a personal attack, honestly. I was just responding to the "we get whiny" post that I believe you posted earlier.
It is unfortunate that GPUGrid tasks aren't more fault-tolerant to power outages. Worse yet, GPUGrid's deficiency here can cause other tasks to fail too - and I have some tasks that are approaching 400 days of run time! :)
It's BOINC's job to try to make tasks meet deadline, and it does a great job of it when given the correct data for estimation. GPUGrid still uses a project-wide "Duration correction factor" (cringe), <rsc_fpops_est> values that are often generic and sometimes inaccurate (cringe), and only 2 real applications (buckets) to lump their tasks into (cringe). BOINC can only do so much to guess how long a task will take based on this info, and I believe it is doing the best it can.
I agree with you that GPUGrid should make their tasks more fault-tolerant to power outages. And I think I agree with you that, if they're going to keep this "bonus credit" system (which I care nothing for), then they should correct it to work properly for the crunchers that receive resends (i.e.: if cruncher B completes before cruncher A, give B the bonus credit; if cruncher A completes before cruncher B, at least give cruncher B some credit for the time spent before the server cancelled their task).
Are we having fun yet? Smile. My wife says I lack tact. I apologize for lacking tact in my prior post.
Thanks Jacob. I think it was Joe Biden who said his greatest lesson in working effectively in the Senate was to stick to the issues and not attribute imagined motivations to other people. I try to keep bringing that to mind, with limited success. I know that BOINC is pretty much doing what it can; I was just feeling grouchy because I felt I wasn't being understood. As you say, sometimes when a GPUGrid task fails it brings down other tasks with it. I think Climate Prediction was also having that problem, if I remember correctly. An improperly formed WU can also occasionally cause a machine to reset; not the usual result, but a number of us have seen it happen. Most recently I had it happen with one of the bad ADRIAs (which also caused the WU on the 2nd GPU to fail). For me the biggest improvement GPUGrid could make is to increase the fault tolerance of their application. The waste summed over all the affected users has to be astronomical. I know my case is unusual, but upwards of 250 hours in a single power glitch? How does that help anyone (rhetorical question)?
The good news is that Zoltan posted something that I hope will help. It still does not in any way excuse the bad app behavior, but you do what you can. If I may repost his helpful tip for others who may not have seen it:
Have you tried turning off write caching for your disks?
1. Windows key + R
2. devmgmt.msc <ENTER>
3. Disk drives
4. Select your BOINC disk (double-click)
5. Policies tab
6. Un-check (both) write-caching option(s)
7. OK
8. Close Device Manager
Another way to get to the disk policies is through Control Panel / Administrative Tools / Computer Management / Device Manager / Disk Drives.
|
|
|
Erich56
Joined: 1 Jan 15 | Posts: 1132 | Credit: 10,277,932,676 | RAC: 29,168,538
|
After a quick review of all the above comments, my question is:
which time span is taken to determine the 24-hour or 48-hour limit for the extra bonus: from the time the WU was downloaded vis-a-vis the time the finished WU is uploaded?
If so, then in the BOINC computing preferences, under "store at least ... days of work", the smallest value possible should be entered, I guess.
I am asking this because one of my hosts (GTX 750 Ti), for example, crunches the current GERARD tasks in about 23.4 hours.
However, I did NOT get the 24-hour bonus, probably because, as a result of my settings (0.2 days), this WU was downloaded a few hours before the previous WU was finished and uploaded.
Is this right thinking, or am I missing something? |
|
|
|
After a quick review of all the above comments, my question is:
which time span is taken to determine the 24-hour or 48-hour limit for the extra bonus: from the time the WU was downloaded vis-a-vis the time the finished WU is uploaded?
Yes: uploaded and reported, so the <report_results_immediately> option should be turned on in the cc_config.xml file (see the example below).
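For reference, a minimal cc_config.xml with that option set looks like this (the standard BOINC client configuration format):

    <cc_config>
      <options>
        <report_results_immediately>1</report_results_immediately>
      </options>
    </cc_config>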
If so, then in the BOINC computing preferences, under "store at least ... days of work", the smallest value possible should be entered, I guess.
Yes, so you can set it to 0. In this case the manager will ask for a new task only when the current one is finished, so there will be a little idle time between the finished task and the new one. It is advisable to have one other GPU project on such hosts, with 0 resource share, to avoid the host sitting completely idle when there is no available work from GPUGrid.
I am asking this because one of my hosts (GTX 750 Ti), for example, crunches the current GERARD tasks in about 23.4 hours.
However, I did NOT get the 24-hour bonus, probably because, as a result of my settings (0.2 days), this WU was downloaded a few hours before the previous WU was finished and uploaded.
Is this right thinking, or am I missing something?
You're right. |
|
|
|
I wanted to clarify a couple of things, based on what I know.
Setting <report_results_immediately> (a setting that affects all attached projects) to 1 in cc_config.xml is NOT necessary if you have a new enough client, because GPUGrid tasks are already configured (by the project) to be reported immediately, as can be seen by inspecting a COPY of client_state.xml and seeing <report_immediately/> set for each GPUGrid <result> (task). It's actually best NOT to set <report_results_immediately>, so results from other projects can be sent in a single scheduler request, thus easing network traffic to those other projects.
Regarding a cache setting of "Store at least: 0 days": the BOINC client scheduler is actually set up to handle it more gracefully. It doesn't let you run completely out of work before asking for more. Instead, it should start asking for work when you have about 3 minutes of work left. I had David make this change a while back, because I told him it sometimes took about 3 minutes to ask enough projects for work when attached to several projects that don't have any. So, with that 0 setting, you can expect it to start asking when a resource would go idle within 3 minutes. :) And yes, it's best to be attached to other GPU projects, in case GPUGrid doesn't have any work available; that way your GPU doesn't sit idle/wasted.
Regards,
Jacob |
|
|
|
... GPUGrid tasks are already configured (by the project) to be reported immediately, as can be seen by inspecting a COPY of client_state.xml and seeing <report_immediately/> set for each GPUGrid <result> (task).
Thanks Jacob, I didn't know that it'd been set by the project. |
|
|
|
This is ridiculous: getting a task 4 seconds after it was completed by the previous host:
http://www.gpugrid.net/workunit.php?wuid=11716407
15272295 369296 7 Sep 2016 | 0:22:56 UTC 12 Sep 2016 | 0:23:07 UTC Completed and validated 305,201.46 69,056.07 351,400.00 Long runs (8-12 hours on fastest card) v8.48 (cuda65)
15276588 263612 12 Sep 2016 | 0:23:11 UTC 12 Sep 2016 | 0:24:13 UTC Aborted by user 0.00 0.00 --- Long runs (8-12 hours on fastest card) v8.48 (cuda65)
There was no point in crunching it.
|
|
|
|
Here is an example of the opposite: finishing after the 24-hour bonus deadline and still getting the 24-hour bonus:
name 1tfj-SDOERR_OPMcharmm1-0-1-RND6735
application Long runs (8-12 hours on fastest card)
created 21 Sep 2016 | 14:15:30 UTC
canonical result 15295403
granted credit 368,676.00
minimum quorum 1
initial replication 2
max # of error/total/success tasks 7, 10, 6
Task | Computer | Sent | Time reported or deadline | Status | Run time (sec) | CPU time (sec) | Credit | Application
15295403 30790 21 Sep 2016 | 18:00:15 UTC 22 Sep 2016 | 13:50:55 UTC Completed and validated 62,910.69 61,672.50 368,676.00 Long runs (8-12 hours on fastest card) v8.48 (cuda65)
15295404 172813 21 Sep 2016 | 16:16:28 UTC 24 Sep 2016 | 10:49:59 UTC Completed and validated 198,940.38 51,013.57 368,676.00 Long runs (8-12 hours on fastest card) v8.48 (cuda65)
http://www.gpugrid.net/workunit.php?wuid=11731674
This, I find humorous!
For the record, my computer finished within the 24-hour deadline; the other person's didn't, in case you didn't notice. |
|
|