Message boards : Graphics cards (GPUs) : Redundent Result

mclaver
Joined: 9 Mar 09
Posts: 25
Credit: 3,321,711,931
RAC: 0
Level
Arg
Message 8041 - Posted: 1 Apr 2009 | 19:46:01 UTC

Why do I have a bunch of "Redundant result" entries?

Task ID | Work unit ID | Sent | Time reported or deadline | Server state | Outcome | Client state | CPU time (sec) | Claimed credit | Granted credit
 |  | 1 Apr 2009 8:28:46 UTC | 1 Apr 2009 19:25:33 UTC | Over | Redundant result | Cancelled by server | 0.00 | --- | ---
475819 | 350586 | 1 Apr 2009 1:27:08 UTC | 1 Apr 2009 11:19:36 UTC | Over | Redundant result | Cancelled by server | 0.00 | --- | ---
475532 | 350440 | 1 Apr 2009 14:00:10 UTC | 6 Apr 2009 14:00:10 UTC | In progress | --- | New | --- | --- | ---
475146 | 350206 | 31 Mar 2009 20:32:21 UTC | 1 Apr 2009 14:00:10 UTC | Over | Success | Done | 1,362.38 | 2,883.44 | 4,613.50
473604 | 349200 | 1 Apr 2009 3:41:34 UTC | 6 Apr 2009 3:41:34 UTC | In progress | --- | New | --- | --- | ---
473175 | 349017 | 31 Mar 2009 14:07:04 UTC | 1 Apr 2009 1:27:08 UTC | Over | Redundant result | Cancelled by server | 0.00

Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Level
Val
Message 8043 - Posted: 1 Apr 2009 | 19:55:24 UTC

The most common cause is that these tasks are flawed and were canceled because someone else already ran them into the wall. So, they got canceled to save you the trouble ...

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Message 8044 - Posted: 1 Apr 2009 | 19:56:18 UTC - in response to Message 8041.

That's interesting: these tasks have initial replication 2 or 3. And as soon as the 1st result is in the others are canceled.

Looks like the project has enough GPUs that plenty of WUs are started in parallel. But they need to get results back quickly to finish these WUs, which is why they went for a higher initial replication.
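
A minimal sketch of the behaviour described above, under a deliberately simplified model and not the actual BOINC server code: a workunit is sent to several hosts (the initial replication), and when the first result comes in, copies that have not started yet are marked "Redundant result" / "Cancelled by server". The names `Task`, `WorkUnit`, `issue` and `report_result`, the host names, and the choice to leave already-started copies alone are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    host: str
    started: bool = False
    outcome: str = "---"        # "---" until the task is over
    client_state: str = "New"


@dataclass
class WorkUnit:
    wu_id: int
    tasks: list = field(default_factory=list)


def issue(wu_id, hosts):
    """Create one task per host (initial replication = len(hosts))."""
    return WorkUnit(wu_id, [Task(host) for host in hosts])


def report_result(wu, host):
    """First result in: mark it done and cancel copies that never started."""
    for task in wu.tasks:
        if task.host == host:
            task.outcome, task.client_state = "Success", "Done"
        elif not task.started and task.outcome == "---":
            # Assumption: only unstarted copies are cancelled.
            task.outcome = "Redundant result"
            task.client_state = "Cancelled by server"
    return wu


# Hypothetical workunit with initial replication 3.
wu = issue(1, ["host_a", "host_b", "host_c"])
wu.tasks[0].started = True
report_result(wu, "host_a")
for task in wu.tasks:
    print(task.host, "|", task.outcome, "|", task.client_state)
```

In this model the cancelled copies cost the other hosts no computing time, which matches the 0.00 CPU time shown for the "Redundant result" rows in the task list.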

MrS
____________
Scanning for our furry friends since Jan 2002

mclaver
Joined: 9 Mar 09
Posts: 25
Credit: 3,321,711,931
RAC: 0
Level
Arg
Message 8045 - Posted: 1 Apr 2009 | 21:28:23 UTC - in response to Message 8044.

I am new to GPUGRID; I just joined on March 9th with a GTX 260. Did I do anything wrong? It looks like I only had three successes today, whereas I had been averaging 4 a day before.

- Mitch

Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Level
Val
Message 8049 - Posted: 2 Apr 2009 | 4:56:55 UTC - in response to Message 8045.

I am new to GPUGRID; I just joined on March 9th with a GTX 260. Did I do anything wrong? It looks like I only had three successes today, whereas I had been averaging 4 a day before.

Which of your two systems do you want to discuss?

One has not returned errors but, through no fault of its own, had a lot of tasks canceled.

The other system is having errors and missed deadlines.

One looks to me like it is working and the other isn't.

As for the 260: based on *MY* personal experience, you can expect between 2 and 4 tasks per day to be downloaded and processed. Depending on when results are reported, you can see your daily number vary between 1 and 7 ... While I am up and in the computer room I force an update to push my work up, and I am slowly moving to 6.6.20 and, as I do, adding the "report results immediately" flag to do this more auto-magically ...

Anyway, I am confused ...

ignasi
Joined: 10 Apr 08
Posts: 254
Credit: 16,836,000
RAC: 0
Level
Pro
Message 8056 - Posted: 2 Apr 2009 | 10:12:11 UTC - in response to Message 8044.

That's interesting: these tasks have initial replication 2 or 3. And as soon as the 1st result is in the others are canceled.


That's right. It improves WU turnaround times a little for us, although we are going to try different approaches, as this "redundant result" thing causes too much confusion for users.

thanks,
ignasi

mclaver
Joined: 9 Mar 09
Posts: 25
Credit: 3,321,711,931
RAC: 0
Level
Arg
Message 8061 - Posted: 2 Apr 2009 | 12:07:43 UTC - in response to Message 8049.

The one the question is about is the one that is working, FOX-AMD-X4-940. The other one is not working because I tried to connect an 8600 GTS to GPUGRID and did not realize it was not supported.

It looks like I am still getting redundant results today, so the problem has not gone away.

Scott Brown
Joined: 21 Oct 08
Posts: 144
Credit: 2,973,555
RAC: 0
Level
Ala
Message 8066 - Posted: 2 Apr 2009 | 15:02:30 UTC - in response to Message 8061.

The other one is not working because I tried to connect an 8600 GTS to GPUGRID and did not realize it was not supported.


The 8600GTS is a supported card. It is not recommended because of its speed, but 32-shader cards will easily meet the new 5-day deadline in a single-core machine and, unless the shaders are clocked very slowly (I'd say under 1000), should also have no problems in a dual-core. In an i7 such as yours, it will not be able to consistently meet deadlines as a single card because of the 4+ workunits downloaded, but paired with another card it should have no problems.
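
A rough back-of-the-envelope check of the deadline argument above, assuming that all queued tasks are downloaded at the same moment and run one after another on a single GPU. The 40 hours per task is a made-up figure for illustration, not a measured runtime for any card, and `tasks_meeting_deadline` is a hypothetical helper.

```python
DEADLINE_HOURS = 5 * 24  # the 5-day deadline mentioned above


def tasks_meeting_deadline(queued_tasks, hours_per_task,
                           deadline_hours=DEADLINE_HOURS):
    """Count how many queued tasks finish in time when run back to back."""
    finished_in_time = 0
    for position in range(1, queued_tasks + 1):
        # The n-th task in the queue completes after n * hours_per_task.
        completion_time = position * hours_per_task
        if completion_time <= deadline_hours:
            finished_in_time += 1
    return finished_in_time


HYPOTHETICAL_HOURS_PER_TASK = 40  # assumption, not a benchmark

print(tasks_meeting_deadline(1, HYPOTHETICAL_HOURS_PER_TASK))  # 1
print(tasks_meeting_deadline(4, HYPOTHETICAL_HOURS_PER_TASK))  # 3
```

With these assumed numbers a single queued task finishes comfortably, while the fourth of four tasks downloaded together would miss the deadline. That is the effect of a larger work queue on a slow card.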


mclaver
Joined: 9 Mar 09
Posts: 25
Credit: 3,321,711,931
RAC: 0
Level
Arg
Message 8080 - Posted: 2 Apr 2009 | 19:37:41 UTC - in response to Message 8066.

When I tried the 8600 GTS I only got "Client detached" errors. I had no successes over two days, so I disconnected it and attached it to SETI instead. I have 2 8600 GTS, 1 8500 GT, and 8400 GS on SETI. I have the 260 GTX on GPUGRID, and it was working fine until yesterday and today, when I started to get a lot of "Redundant result" messages mixed in with successes.

Here are my two days of trying the 8600 GTS; I had no successes.

Task ID | Work unit ID | Sent | Time reported or deadline | Server state | Outcome | Client state | CPU time (sec) | Claimed credit | Granted credit
 |  | 22 Mar 2009 6:41:22 UTC | 23 Mar 2009 14:09:26 UTC | Over | Client detached | New | 0.00 | --- | ---
431335 | 325033 | 21 Mar 2009 22:25:14 UTC | 23 Mar 2009 14:09:26 UTC | Over | Client detached | New | 0.00 | --- | ---
430860 | 324820 | 22 Mar 2009 11:43:41 UTC | 23 Mar 2009 14:09:26 UTC | Over | Client detached | New | 0.00 | --- | ---
430235 | 324507 | 22 Mar 2009 2:03:37 UTC | 23 Mar 2009 14:09:26 UTC | Over | Client detached | New | 0.00 | --- | ---
429495 | 324141 | 21 Mar 2009 18:40:09 UTC | 22 Mar 2009 11:43:41 UTC | Over | Redundant result | Cancelled by server | 0.00 | --- | ---
429222 | 323988 | 21 Mar 2009 16:43:48 UTC | 23 Mar 2009 14:09:26 UTC | Over | Client detached | New | 0.00 | --- | ---
429087 | 323904 | 21 Mar 2009 15:46:10 UTC | 23 Mar 2009 14:09:26 UTC | Over | Client detached | New | 0.00 | --- | ---
429085 | 323903 | 21 Mar 2009 15:44:55 UTC | 21 Mar 2009 18:40:09 UTC | Over | Client error | Compute error | 301.04 | 2,883.90 | ---
429031 | 323870 | 21 Mar 2009 15:22:47 UTC | 21 Mar 2009 16:43:47 UTC | Over | Client error | Compute error | 51.56 | 2,478.99 | ---
428976 | 323844 | 21 Mar 2009 15:45:34 UTC | 22 Mar 2009 6:41:22 UTC | Over | Client error | Compute error | 986.61 | 2,960.09 | ---
428714 | 323679 | 21 Mar 2009 15:46:10 UTC | 23 Mar 2009 14:09:26 UTC | Over | Client detached | New | 0.00 | ---

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Message 8086 - Posted: 2 Apr 2009 | 21:00:07 UTC - in response to Message 8080.

OK, to point it out more clearly: the redundant results are intentional, they're nothing to worry about.

MrS
____________
Scanning for our furry friends since Jan 2002

mclaver
Joined: 9 Mar 09
Posts: 25
Credit: 3,321,711,931
RAC: 0
Level
Arg
Message 8124 - Posted: 3 Apr 2009 | 13:56:06 UTC - in response to Message 8086.

I understand they are intentional. I was only curious because all of a sudden I seemed to get a lot of them. Is a WU marked "Redundant result" before you ever start processing it, while you are processing it, or after you have finished processing it?

- Mitch

Scott Brown
Joined: 21 Oct 08
Posts: 144
Credit: 2,973,555
RAC: 0
Level
Ala
Message 8125 - Posted: 3 Apr 2009 | 14:33:08 UTC - in response to Message 8124.

I understand they are intentional. I was only curious because all of a sudden I seemed to get a lot of them. Is a WU marked "Redundant result" before you ever start processing it, while you are processing it, or after you have finished processing it?

- Mitch


Before.

mclaver
Joined: 9 Mar 09
Posts: 25
Credit: 3,321,711,931
RAC: 0
Level
Arg
Message 8143 - Posted: 3 Apr 2009 | 20:39:33 UTC - in response to Message 8125.

Then I really don't care :)

Thanks,
Mitch

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Message 8144 - Posted: 3 Apr 2009 | 20:43:06 UTC

Ignasi & GDF,

with the higher initial replication it sometimes happens that a WU is crunched by 2 hosts at the same time. I can see an interesting opportunity here:

- make the server check for such WUs
- compare their results
-> if they're identical: great
-> if there are differences:
* trace the WU more closely
* does it error shortly afterwards?
* for some of them: issue both results as seeds for the following WUs and observe if the results converge
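
The "compare their results" step could be sketched roughly as follows. This assumes each result can be reduced to a list of floating-point values (for example one number per saved frame) and that agreement within a tolerance counts as "identical". Both are my assumptions, not how GPUGRID results are actually stored or checked, and `first_difference` is a hypothetical helper.

```python
import math


def first_difference(result_a, result_b, rel_tol=1e-6, abs_tol=0.0):
    """Return the index where two results first disagree, or None if they agree."""
    for index, (a, b) in enumerate(zip(result_a, result_b)):
        if not math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol):
            return index
    if len(result_a) != len(result_b):
        # One result is longer: they disagree right after the common part.
        return min(len(result_a), len(result_b))
    return None


# Hypothetical values standing in for two hosts' results of the same WU.
host_1 = [1.0, 2.0, 3.0]
host_2 = [1.0, 2.5, 3.0]
print(first_difference(host_1, host_1))  # None
print(first_difference(host_1, host_2))  # 1
```

Knowing where the two results first part ways is what would allow the WU to be traced more closely from that point on.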

Maybe you already tested this carefully and extensively. And as I understand GDF is quite confident in the error finding mechanisms.

But I think it would be quite interesting to see how reliable the GPU calculations are in the real world, the wild west of overclocking country. And as long as you don't issue new WUs this error checking and tracing is basically free.

MrS
____________
Scanning for our furry friends since Jan 2002

GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1957
Credit: 629,356
RAC: 0
Level
Gly
Message 8223 - Posted: 5 Apr 2009 | 18:12:26 UTC - in response to Message 8144.

We have done a couple of weeks of tests with redundant results and I did not like it much. It provides a better return time, but it generates confusion. We are implementing a better, cleverer way to do it that does not waste so many resources and guarantees better balancing between WUs belonging to the same batch.

As for replication for validation, it is not easy to create a general validator for MD (if it is possible at all), and in fact it would not even be that useful, as we are practically validating by hand when we do the analysis every week.

gdf

GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1957
Credit: 629,356
RAC: 0
Level
Gly
Message 8224 - Posted: 5 Apr 2009 | 18:14:12 UTC - in response to Message 8223.

I forgot.
We have implemented and are testing a remote submission mechanism for BOINC and GPUGRID. It seems to work very well and will in the future provide load balancing of workunits (as mentioned in the previous message).

GDF

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Message 8227 - Posted: 5 Apr 2009 | 18:43:11 UTC - in response to Message 8224.

OK.. thanks for the answer!

MrS
____________
Scanning for our furry friends since Jan 2002
