Message boards : Graphics cards (GPUs) : Aborted by project... redundant result

JAMC
Joined: 16 Nov 08
Posts: 28
Credit: 12,688,454
RAC: 0
Message 4953 - Posted: 27 Dec 2008 | 17:43:05 UTC

First one of these I have seen-
http://www.gpugrid.net/result.php?resultid=183670
10 hours of crunching down the toilet...

Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 4955 - Posted: 27 Dec 2008 | 18:04:18 UTC - in response to Message 4953.
Last modified: 27 Dec 2008 | 18:08:33 UTC

First one of these I have seen-
http://www.gpugrid.net/result.php?resultid=183670
10 hours of crunching down the toilet...


I hate that too ...

This is another case where I think the credit policy designed way back when does not make as much sense as I would like. I know it cannot be used for anything ... but, it is the currency by which we gain acknowledgement of our contributions ... to lose a task through no fault of your own stinks.

One of the reasons I like CPDN: you get paid for trickles as you complete phases of work, so that even if it dies ... well, you get paid for the effort to the last "payday" (Trickle).

This is one of the reasons I stopped doing work for Sztaki and am not enthused about Orbit even though Orbit, for one, is the type of project I really want to support.

Heck, I am only here because I want to use my GPU and to exercise this option and have no enthusiasm for supporting SaH ... for too many reasons to go into here ...

Desti
Joined: 10 Jul 07
Posts: 19
Credit: 1,272,950
RAC: 0
Message 4957 - Posted: 27 Dec 2008 | 18:19:48 UTC

Are the workunits resent some hours before the deadline is over?
____________
Linux Users Everywhere @ BOINC

Paul D. Buck
Message 4960 - Posted: 27 Dec 2008 | 19:06:44 UTC - in response to Message 4957.

Are the workunits resent some hours before the deadline is over?

No ... it shouldn't ...

HOWEVER, let us say that the task does not come in by the deadline ... the work gets re-issued ... THEN, in this case 7 hours late, the task is returned by the first participant. The second participant gets, ahem, cheated ...

I know that the project can fiddle with the settings to minimize this, like changing the speed at which the tasks are re-issued so that there is more of a delay.

Again, this is a place where the interests of the participant community are not taken into account well by the BOINC System design. I think most of us are willing to accept a certain "loss rate" of tasks that don't complete for one reason or another ... but ... in this case it was, well, one of those losses that seems a little unfair ...

This can be especially annoying in that BOINC is designed to not contact the server that often, so the second participant started the task and wasted time on work that was then canceled.
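
To put rough numbers on the re-issue timing described above, here is a minimal sketch. It assumes a hypothetical `reissue_delay_h` grace period (an illustrative name, not a real BOINC option) that the server waits after the deadline before sending the work to a second host.

```python
def second_copy_issued(lateness_h: float, reissue_delay_h: float) -> bool:
    """Return True if the task is re-issued before the late result arrives.

    lateness_h: hours after the deadline at which the first host reports.
    reissue_delay_h: hypothetical grace period the server waits after the
        deadline before sending the task to a second host.
    """
    return lateness_h > reissue_delay_h


# The case from this thread: the first result came in 7 hours late.
for delay in (0, 6, 12):
    issued = second_copy_issued(lateness_h=7, reissue_delay_h=delay)
    print(f"delay={delay:>2}h -> second copy issued (and later redundant): {issued}")
```

A longer delay avoids the redundant copy in this case, but it also means the project waits longer for results when the first host never reports at all.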

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 4998 - Posted: 28 Dec 2008 | 13:23:00 UTC

As Paul already said, what we're seeing here is the following: someone went past the deadline, the WU got reissued, and the first guy returned the WU first and got credit for it. The WU was aborted by the server on JAMC's PC. So far everything is alright - it's better to waste 10h of work than to let JAMC crunch along and let him waste e.g. 16h for the full WU.

What is not correct is that JAMC doesn't get credit for the time he already spent. It could help to create a new rule:
- if a WU is cancelled by the server the client should tell the server how much work was already done
- credits are awarded according to this percentage

- to avoid cheating it may be necessary to upload the current results and to compare them against the validated result
- problem: I guess only the final result is uploaded and the steps in between are not saved (huge amount of data and I/O activity)
- it's getting clumsy.. maybe a checksum could be generated every 1 or 10% and be submitted? These could be used to check if host 2 really calculated xx% of the WU, which was aborted by the server
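
A minimal sketch of the checksum idea above, under stated assumptions: the client reports one digest per 10% of progress, and the server holds the matching digests from the validated result. All names here (`checkpoint_digest`, `verified_fraction`, `partial_credit`) are hypothetical and are not part of BOINC.

```python
import hashlib


def checkpoint_digest(state: bytes) -> str:
    """Digest of the simulation state at one checkpoint (assumed to be bytes)."""
    return hashlib.sha256(state).hexdigest()


def verified_fraction(reported: list[str], reference: list[str]) -> float:
    """Fraction of the WU confirmed by matching digests, counted from the start.

    Counting stops at the first mismatch, so a host cannot claim progress
    beyond the point where its results diverge from the validated ones.
    """
    if not reference:
        return 0.0
    matched = 0
    for mine, ref in zip(reported, reference):
        if mine != ref:
            break
        matched += 1
    return matched / len(reference)


def partial_credit(full_credit: float, reported: list[str], reference: list[str]) -> float:
    """Credit awarded in proportion to the verified part of the WU."""
    return full_credit * verified_fraction(reported, reference)


# Ten checkpoints in the validated result; the aborted host reached six of them.
reference = [checkpoint_digest(f"step-{i}".encode()) for i in range(10)]
reported = reference[:6]
print(partial_credit(100.0, reported, reference))
```

This only works if the computation is deterministic enough that two hosts produce identical intermediate states, which is an assumption and may not hold for GPU work.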

MrS
____________
Scanning for our furry friends since Jan 2002

Beyond
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Message 5002 - Posted: 28 Dec 2008 | 14:26:34 UTC
Last modified: 28 Dec 2008 | 14:28:24 UTC

This is really pretty simple. If you issue a WU to someone and they crunch it in good faith they need to get "paid". You do NOT need to "pay" the user who returned it late as they broke the contract you had with them, that's up to you. Simple contract law, if you want to look at it in legal terms. If you want to look at it in terms of good will, you will destroy your relationship with your users if you fail to "pay" them for work you asked them to do and they worked on in a timely manner. IMO a duplicate WU here and there isn't going to kill you. Don't allow the server to cancel WUs that have already been started, and allow credit for duplicate WUs.

ExtraTerrestrial Apes
Message 5009 - Posted: 28 Dec 2008 | 14:53:34 UTC - in response to Message 5002.

IMO a duplicate WU here and there isn't going to kill you. Don't allow the server to cancel WUs that have already been started, and allow credit for duplicate WUs.


You'd trade off performance for better user relations. It's not a bad solution, but it's not ideal either. I still prefer my solution - if it can be implemented without too much hassle.

MrS

Beyond
Message 5010 - Posted: 28 Dec 2008 | 14:59:42 UTC - in response to Message 5009.

What is not correct is that JAMC doesn't get credit for the time he already spent. It could help to create a new rule:
- if a WU is cancelled by the server the client should tell the server how much work was already done
- credits are awarded according to this percentage

You'd trade off performance for better user relations. It's not a bad solution, but it's not ideal either.
I still prefer my solution - if it can be implemented without too much hassle.

MrS

It sounded difficult to do but if you CAN do it without too much trouble it's a good solution IMO.

Paul D. Buck
Message 5017 - Posted: 28 Dec 2008 | 16:10:16 UTC - in response to Message 4998.
Last modified: 28 Dec 2008 | 16:11:04 UTC

As Paul already said, what we're seeing here is the following: someone went past the deadline, the WU got reissued, and the first guy returned the WU first and got credit for it. The WU was aborted by the server on JAMC's PC. So far everything is alright - it's better to waste 10h of work than to let JAMC crunch along and let him waste e.g. 16h for the full WU.

... MrS


I think a rule like that can be implemented on the server without too much trouble, though I am not sure if the canceled result record contains the information ... no, it does not appear to have the run time saved ...

Anyway, my feeling is that we don't have to get too complex here, a partial payment, even if it is only 25% of the norm for work started but not needed is a token of good faith.

The problem is whether it can be determined that the task was started by the participant. What I see in the database does not indicate that the information is preserved. As you say, it should be ... but probably isn't.

This is one of those areas where I have been yelling at the ceiling for years ... and writing about. The developers and in many cases the project staffs pay no attention to the interests and desires of the participant community on which BOINC depends.

Which, I feel, is one of the reasons that the number of participants has not really been growing that much. When I stopped doing BOINC about three years ago we had a million something participants ... we have 1.5 million roughly at this time. A very anemic growth ... with the number of long term projects rising, well, even with the increases in speed of processing the ability to get the work done is not really improving ...

Anyway, I am not saying that this is the case here ... just that the design of BOINC assumes that the participant will take all manner of abuse and that the risk / reward calculus is meaningless and that participants love abuse ...

Forgive the rant ...

And, again, I am not saying that GPU Grid is in any way the owner of this issue ... I think it is an inherent flaw in the design decisions made while finalizing BOINC ... note that the expected average runtime for tasks when these decisions were made was 2-6 hours ... with 2 hours being more the norm ... I have been noticing that most work now issued is 6-40 hours, with at least 3 projects issuing tasks that have one to two hundred hours of run time as the norm, and only one of those uses the "trickle" system which was intended for tasks with runtimes in this realm ...

Nognlite
Joined: 9 Nov 08
Posts: 69
Credit: 25,106,923
RAC: 0
Message 5022 - Posted: 28 Dec 2008 | 19:19:37 UTC - in response to Message 5017.
Last modified: 28 Dec 2008 | 19:19:56 UTC

I agree with you, Paul, about the "time served" issue. I've had very few WUs that I've gotten no credit for, but I am totally in concurrence.

Even though the total number of users has only increased 500k in three years, Moore's Law has doubled CPU and in some cases tripled GPU performance. I will take into account my personal situation.

Up until last year (Dec 2007) I had been running the same rig since I joined BOINC in 2005. It took me 3 years to get to 75,000 credits. With my new rig, in one year I have gone from 75,000 to 2.7 million credits. That is 35x the processing power in a 3:1 ratio, or about 100x in a 1:1. So even if 500k users are new and they use a computer that is only 50x faster, then that is the equivalent of 25,000,000 users at a 1:1 ratio of when BOINC started, not including the original 1,000,000 users. As long as we geeks keep donating large FLOPS and GFLOPS, BOINC doesn't care about what credit or acknowledgement the user would like, just as long as their projects get completed and they get their money.
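
The arithmetic in the paragraph above can be checked with a short worked example. The figures are the ones quoted in the post; the function names are only illustrative.

```python
def speedup_ratios(old_credits: float, old_years: float, new_total: float) -> tuple[float, float]:
    """Compare one year on the new rig against the old rig's output.

    Returns (ratio against the whole old period, ratio against one old year).
    """
    gained = new_total - old_credits      # credits earned in one year on the new rig
    per_old_year = old_credits / old_years
    return gained / old_credits, gained / per_old_year


def equivalent_old_users(new_users: int, speedup: float) -> float:
    """How many original-speed users the new users would correspond to."""
    return new_users * speedup


three_to_one, one_to_one = speedup_ratios(75_000, 3, 2_700_000)
print(three_to_one)   # one new year vs. three old years
print(one_to_one)     # one new year vs. one old year
print(equivalent_old_users(500_000, 50))
```

With these inputs the year-on-year figure comes out a little above the rounded "100x" in the post, and 500k users at 50x gives the 25,000,000 equivalent users quoted.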

I never got a thank you for your support but when I went off line for 2-3 years I got e-mails saying please come back. They don't know what they have until it is gone.

Pat

Stefan Ledwina
Joined: 16 Jul 07
Posts: 464
Credit: 240,582,380
RAC: 4,767,526
Message 5023 - Posted: 28 Dec 2008 | 19:26:03 UTC

Hmm.. I always thought that if the server sends a request to BOINC to delete a redundant result, BOINC would only delete it if it hasn't started crunching. Otherwise it would run it to completion and the user should also get the credit for it.

At least that is how it worked a few months ago. Don't know if it was changed or it is maybe a bug?
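
A minimal sketch of the two behaviours being discussed, assuming the client receives a "redundant" notice and a hypothetical project-side flag `abort_if_started`. The names are illustrative only and do not reflect the actual BOINC client code.

```python
from enum import Enum


class Action(Enum):
    DELETE = "delete the unstarted task"
    RUN_TO_COMPLETION = "keep crunching and report for credit"
    ABORT = "abort the running task"


def on_redundant_notice(started: bool, abort_if_started: bool = False) -> Action:
    """Decide what to do with a task the server no longer needs.

    started: whether the client has already begun crunching the task.
    abort_if_started: hypothetical project setting; False models the
        behaviour remembered in the post above, True models what
        happened to the task that started this thread.
    """
    if not started:
        return Action.DELETE
    return Action.ABORT if abort_if_started else Action.RUN_TO_COMPLETION


print(on_redundant_notice(started=False))
print(on_redundant_notice(started=True))
print(on_redundant_notice(started=True, abort_if_started=True))
```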
____________

pixelicious.at - my little photoblog

ExtraTerrestrial Apes
Message 5025 - Posted: 28 Dec 2008 | 22:37:01 UTC - in response to Message 5023.

Don't know if it was changed or it is maybe a bug?


I could imagine it's a setting in the BOINC software. GPU-Grid is speed / latency limited, so they want results back fast.. that's why I suppose cancelling a WU which was already started is by design and not a bug.

MrS

Paul D. Buck
Message 5026 - Posted: 29 Dec 2008 | 0:44:48 UTC - in response to Message 5022.

I agree with you, Paul, about the "time served" issue. I've had very few WUs that I've gotten no credit for, but I am totally in concurrence.

Even though the total number of users has only increased 500k in three years, Moore's Law has doubled CPU and in some cases tripled GPU performance. I will take into account my personal situation.

Up until last year (Dec 2007) I had been running the same rig since I joined BOINC in 2005. It took me 3 years to get to 75,000 credits. With my new rig, in one year I have gone from 75,000 to 2.7 million credits. That is 35x the processing power in a 3:1 ratio, or about 100x in a 1:1. So even if 500k users are new and they use a computer that is only 50x faster, then that is the equivalent of 25,000,000 users at a 1:1 ratio of when BOINC started, not including the original 1,000,000 users. As long as we geeks keep donating large FLOPS and GFLOPS, BOINC doesn't care about what credit or acknowledgement the user would like, just as long as their projects get completed and they get their money.

I never got a thank you for your support but when I went off line for 2-3 years I got e-mails saying please come back. They don't know what they have until it is gone.

Pat


Pat,

Yes, your basic math is correct. BUT, the models have increased in complexity and run times. My newest rig is i7 based: 8 virtual CPUs, and each of those (L3 cache? clock?) is more capable than my not that old Q9300 ... regardless ... were I able to get the preference page to accept changes, I would be able to show my signature showing that I have nearly 40 projects currently going.

Stefan just below has 38 inactive and 24 active ... On my tracking chart I have 70+ to which I have contributed, are active, or to which my benchmark participant has contributed to in their history.

What I am trying to say is that the demand, I think, is growing faster than the productivity curve of us geeks updating our systems.

Though I do grant you that the power of these systems is such that astounding leaps in CS score are possible. I have made over 30K here at GPU Grid and my GPU is nothing special at all as these things go. I was looking at my April printout and since then I have nearly doubled (for example) my Rosetta contribution, and I have not had them even as a priority. Still, my basic point remains. BOINC was intended to capture idle CPU of large masses of computers and it has not done that.

Heck, nearly half, if memory serves, of the SaH Classic folks did not make the transition to BOINC ...

And given the commentary of many in the SaH forums I am not so sure how common the adoption of GPU computing will be ... will it be only a sub-set of us geeks, a large sub-set, what? No one knows for sure ...

I am wild about it for a couple reasons, one is that my wife is paring down my farm from 10 to who knows where she will stop ... (at 5 right now) ... so this gives me a way to increase computing power and tasks in flight over base machines, and the ability to gradually increase processing power over time on an incremental basis without having to scrap whole machines, or the major core of them.

Paul D. Buck
Message 5027 - Posted: 29 Dec 2008 | 0:45:47 UTC - in response to Message 5025.

Don't know if it was changed or it is maybe a bug?


I could imagine it's a setting in the BOINC software. GPU-Grid is speed / latency limited, so they want results back fast.. that's why I suppose cancelling a WU which was already started is by design and not a bug.

MrS


It may not be a bug to the project, but it may be a bug to the participant ... :)

ExtraTerrestrial Apes
Message 5033 - Posted: 29 Dec 2008 | 10:14:36 UTC - in response to Message 5027.

It may not be a bug to the project, but it may be a bug to the participant ... :)


It's a feature not perfectly implemented yet ;)

MrS
