Client detached by itself?

Message boards : Graphics cards (GPUs) : Client detached by itself?

Author	Message
X-Files 27 Send message Joined: 11 Oct 08 Posts: 95 Credit: 68,023,693 RAC: 0 Level Scientific publications	Message 15837 - Posted: 19 Mar 2010 \| 19:49:58 UTC
	Two WUs (2017815 and 2016525) are marked as client detached, but BoincTasks has a record that they completed successfully. What's going on?

X-Files 27 Send message Joined: 11 Oct 08 Posts: 95 Credit: 68,023,693 RAC: 0 Level Scientific publications	Message 16133 - Posted: 2 Apr 2010 \| 22:27:24 UTC Last modified: 2 Apr 2010 \| 22:41:38 UTC
	It's happening again! There are no error messages in the logs.
	Edit: Something is wrong with the server. Here's what happened on my side after a WU finished successfully and uploaded fine:
	04/02/2010 6:33:05 PM GPUGRID Computation for task p12-IBUCH_05012_pYEEI_100319-3-80-RND5598_1 finished
	04/02/2010 6:33:06 PM GPUGRID Starting a158-TONI_HERG77a-17-100-RND0346_3
	04/02/2010 6:33:06 PM GPUGRID Starting task a158-TONI_HERG77a-17-100-RND0346_3 using acemd2 version 603
	04/02/2010 6:33:06 PM GPUGRID Sending scheduler request: To fetch work.
	04/02/2010 6:33:06 PM GPUGRID Requesting new tasks for GPU
	04/02/2010 6:33:07 PM GPUGRID Started upload of p12-IBUCH_05012_pYEEI_100319-3-80-RND5598_1_0
	04/02/2010 6:33:07 PM GPUGRID Started upload of p12-IBUCH_05012_pYEEI_100319-3-80-RND5598_1_1
	04/02/2010 6:33:11 PM GPUGRID Scheduler request completed: got 0 new tasks
	04/02/2010 6:33:11 PM GPUGRID Message from server: No work sent
	04/02/2010 6:33:11 PM GPUGRID Message from server: No work is available for ACEMD - GPU molecular dynamics
	04/02/2010 6:33:11 PM GPUGRID Message from server: No work is available for ACEMD beta version
	04/02/2010 6:33:11 PM GPUGRID Message from server: (reached limit of 4 GPU tasks in progress)
	04/02/2010 6:33:11 PM GPUGRID Message from server: Project has no jobs available
	04/02/2010 6:33:26 PM GPUGRID Finished upload of p12-IBUCH_05012_pYEEI_100319-3-80-RND5598_1_0
	04/02/2010 6:33:26 PM GPUGRID Started upload of p12-IBUCH_05012_pYEEI_100319-3-80-RND5598_1_2
	04/02/2010 6:33:29 PM GPUGRID Finished upload of p12-IBUCH_05012_pYEEI_100319-3-80-RND5598_1_1
	04/02/2010 6:33:29 PM GPUGRID Started upload of p12-IBUCH_05012_pYEEI_100319-3-80-RND5598_1_3
	04/02/2010 6:33:43 PM GPUGRID Finished upload of p12-IBUCH_05012_pYEEI_100319-3-80-RND5598_1_2
	04/02/2010 6:33:43 PM GPUGRID Started upload of p12-IBUCH_05012_pYEEI_100319-3-80-RND5598_1_7
	04/02/2010 6:33:44 PM GPUGRID Finished upload of p12-IBUCH_05012_pYEEI_100319-3-80-RND5598_1_7
	04/02/2010 6:33:45 PM GPUGRID Finished upload of p12-IBUCH_05012_pYEEI_100319-3-80-RND5598_1_3
	04/02/2010 6:33:46 PM GPUGRID Sending scheduler request: To report completed tasks.
	04/02/2010 6:33:46 PM GPUGRID Reporting 1 completed tasks, requesting new tasks for CPU and GPU
	04/02/2010 6:33:51 PM GPUGRID Scheduler request completed: got 1 new tasks
	04/02/2010 6:33:53 PM GPUGRID Started download of h200-TONI_CAPBIND1-13-LICENSE
	Can someone take a look at what's happening? I also had this issue on the other machine yesterday.
	WUs: 2079037 2078443 2077898 2076113

Stoneageman Send message Joined: 25 May 09 Posts: 224 Credit: 34,057,374,498 RAC: 186 Level Scientific publications	Message 16135 - Posted: 3 Apr 2010 \| 0:47:40 UTC - in response to Message 16133.
	Just had to do a manual update to get more tasks. Something is not quite right!

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 16136 - Posted: 3 Apr 2010 \| 1:10:27 UTC - in response to Message 16135.
	In the BOINC Advanced View, under Activity, is activity set to "always available"? I think this changed by itself on one of my systems a couple of days ago; I don't know why. Other systems were OK, so it may have been a network issue or a freak event. You may want to check all your network settings, if you have not already done so.

X-Files 27 Send message Joined: 11 Oct 08 Posts: 95 Credit: 68,023,693 RAC: 0 Level Scientific publications	Message 16137 - Posted: 3 Apr 2010 \| 1:30:48 UTC Last modified: 3 Apr 2010 \| 1:56:59 UTC
	All my systems are fine. Like I said, something is happening on the server. To make it clear:
	Server (as shown on the website): says client detached
	Client (using BOINC Manager or BoincTasks): says it's running and ready to run
	- Doing a manual update to sync with the server changes nothing
	- The client uploads a completed WU and the server accepts it, but the website still says client detached
	Now, where does my completed WU go? Oblivion? I think it's similar to a "ghost WU" (listed on the website but not on the client) but the opposite (listed on the client but not on the website).
	Edit: this is the only error I've found during the time window (timestamps are UTC-4):
	02-Apr-2010 15:51:33 [GPUGRID] Sending scheduler request: To fetch work.
	02-Apr-2010 15:51:33 [GPUGRID] Requesting new tasks for GPU
	02-Apr-2010 15:56:49 [---] Project communication failed: attempting access to reference site
	02-Apr-2010 15:56:50 [---] Internet access OK - project servers may be temporarily down.
	02-Apr-2010 15:56:52 [GPUGRID] Scheduler request failed: Timeout was reached
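	For anyone who wants to line up their own client log with the server side, here is a rough Python sketch that picks out "Scheduler request failed" entries from lines shaped like the excerpt above and converts the timestamps to UTC. The function name `failed_requests_utc` and the default offset of -4 hours are my own assumptions for illustration; it is not part of BOINC, and it only understands the `DD-Mon-YYYY HH:MM:SS [project] message` layout shown in this post.

```python
import re
from datetime import datetime, timedelta, timezone

# English month abbreviations, mapped by hand to avoid locale-dependent parsing.
MONTHS = {name: number for number, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}

# Matches lines such as:
# 02-Apr-2010 15:56:52 [GPUGRID] Scheduler request failed: Timeout was reached
LINE_RE = re.compile(
    r"^(\d{2})-([A-Za-z]{3})-(\d{4}) (\d{2}):(\d{2}):(\d{2}) \[([^\]]*)\] (.*)$")


def failed_requests_utc(lines, utc_offset_hours=-4):
    """Return (utc_time, project, message) for each failed scheduler request.

    utc_offset_hours is the local offset of the log timestamps (assumed -4 here).
    """
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    hits = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # not in the expected layout
        day, mon, year, hh, mm, ss, project, message = match.groups()
        if mon not in MONTHS or "Scheduler request failed" not in message:
            continue
        local = datetime(int(year), MONTHS[mon], int(day),
                         int(hh), int(mm), int(ss), tzinfo=local_tz)
        hits.append((local.astimezone(timezone.utc), project, message))
    return hits
```

	Feeding it the five lines above should leave only the 15:56:52 timeout entry, with its time shifted to UTC so it can be compared against the server logs.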

MarkJ Volunteer moderator Volunteer tester Send message Joined: 24 Dec 08 Posts: 738 Credit: 200,909,904 RAC: 0 Level Scientific publications	Message 16138 - Posted: 3 Apr 2010 \| 8:21:35 UTC
	Do you use an account manager? There was a similar complaint from someone recently, but it turned out to be their account manager settings. ____________ BOINC blog

X-Files 27 Send message Joined: 11 Oct 08 Posts: 95 Credit: 68,023,693 RAC: 0 Level Scientific publications	Message 16140 - Posted: 3 Apr 2010 \| 8:46:30 UTC - in response to Message 16138.
	> Do you use an account manager? There was a similar complaint from someone recently but it turned out to their account manager settings.
	Nope, not using one.

Toni Volunteer moderator Project administrator Project developer Project tester Project scientist Send message Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0 Level Scientific publications	Message 16149 - Posted: 3 Apr 2010 \| 17:32:48 UTC - in response to Message 16140. Last modified: 3 Apr 2010 \| 17:46:53 UTC
	I'm trying to figure out what's wrong. It may have to do with http://boinc.bio.wzw.tum.de/boincsimap/forum/viewtopic.php?f=3&t=853&start=15 The server logs say that your CPID changed at some point. Are you running multiple projects and, if so, do the e-mail addresses match?

X-Files 27 Send message Joined: 11 Oct 08 Posts: 95 Credit: 68,023,693 RAC: 0 Level Scientific publications	Message 16150 - Posted: 3 Apr 2010 \| 18:17:50 UTC - in response to Message 16149.
	> The server logs say that your CPID changed at some point. Are your running multiple projects and, if so, do the e-mail addresses match?
	I'm running WCG for CPU and SETI as backup for GPU. The same email address is used for all 3 projects. And I think the CPID only changes if I change my email, which I didn't. It's mind-boggling! Now I have to read the link you posted to gain some insight.

X-Files 27 Send message Joined: 11 Oct 08 Posts: 95 Credit: 68,023,693 RAC: 0 Level Scientific publications	Message 16151 - Posted: 3 Apr 2010 \| 19:00:54 UTC
	Can you confirm the time I got from "Scheduler request failed: Timeout was reached" and the server logs that say my CPID has changed? It could be that the server incremented my rpcno while the client did not. This is based on the 2 posts I read at the link you provided:
	> p.s.: Scheduler request failed: Timeout was reached might cause it (even though I do not think so), the server increments the RPC seqno, the client does not because it assumes that the server didn't receive the request. On a fetch request this causes "ghost WUs", WUs that you have in your list in your server side profile but never had them on your BOINC host.
	> We see a 'hanging' TCP connection, that somehow stays open, or is reopened by chance, that will then sport a different 'server contacts #' flag, which is why the server discards the host. So, if you are a candidate for a hanging connection, you will need to contact the server in the meantime to 'qualify' for a detach; The detach itself will take place on an update to the server, ... ok... now it fits... but only in a very rare among the rare conditions the detach will be executed on the result-report-update send to the server. Meaning, the pick-up of the hanging connection actually takes place in the connection designated to tell the server about the finished results. Very, very, very rare.
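	To make the suspected sequence easier to follow, here is a toy Python model of it. Everything in it (`ToyServer`, `ToyClient`, `simulate`, and the rule that any mismatch means "detached") is a hypothetical simplification of what the quoted posts describe. It is not the actual BOINC scheduler or client code, which may behave differently.

```python
from dataclasses import dataclass


@dataclass
class ToyServer:
    """Hypothetical scheduler that tracks the seqno it expects from one host."""
    expected_seqno: int = 0
    detached: bool = False

    def handle(self, client_seqno: int) -> None:
        # Assumption: any mismatch makes the server discard (detach) the host.
        if client_seqno != self.expected_seqno:
            self.detached = True
        # The server counts every request it receives, answered or not.
        self.expected_seqno += 1


@dataclass
class ToyClient:
    """Hypothetical client that only advances its seqno on a reply."""
    seqno: int = 0

    def request(self, server: ToyServer, reply_lost: bool = False) -> bool:
        server.handle(self.seqno)
        if reply_lost:
            # Timeout: the client assumes the request never arrived,
            # so it keeps its old sequence number.
            return False
        self.seqno += 1
        return True


def simulate() -> list:
    """Run a normal request, a timed-out request, then another normal one."""
    server, client = ToyServer(), ToyClient()
    history = []
    for reply_lost in (False, True, False):
        client.request(server, reply_lost)
        history.append((client.seqno, server.expected_seqno, server.detached))
    return history
```

	In this model the timed-out request itself causes no visible problem. The two counters drift apart by one, and the mismatch is only noticed on the next contact, which would fit a detach showing up on a later update or result report.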

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 16152 - Posted: 3 Apr 2010 \| 20:01:12 UTC - in response to Message 16151.
	So it was an odd networking problem in which a network service may have failed (probably on the server, but perhaps on a router). If it was client side a simple restart would have resolved it, for subsequent tasks. Perhaps this resulted in a partial database entry or a corruption.

Toni Volunteer moderator Project administrator Project developer Project tester Project scientist Send message Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0 Level Scientific publications	Message 16160 - Posted: 4 Apr 2010 \| 18:52:56 UTC - in response to Message 16152.
	Did you try to stop and restart the client?

X-Files 27 Send message Joined: 11 Oct 08 Posts: 95 Credit: 68,023,693 RAC: 0 Level Scientific publications	Message 16161 - Posted: 4 Apr 2010 \| 20:27:45 UTC - in response to Message 16160.
	yes I did. ____________

Toni Volunteer moderator Project administrator Project developer Project tester Project scientist Send message Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0 Level Scientific publications	Message 16167 - Posted: 5 Apr 2010 \| 10:11:56 UTC - in response to Message 16161.
	> Can you confirm the time I got from "Scheduler request failed: Timeout was reached" and the server logs that say my CPID has changed?
	Yes. Is the problem solved now?

X-Files 27 Send message Joined: 11 Oct 08 Posts: 95 Credit: 68,023,693 RAC: 0 Level Scientific publications	Message 16169 - Posted: 5 Apr 2010 \| 10:45:55 UTC - in response to Message 16167.
	> > Can you confirm the time I got from "Scheduler request failed: Timeout was reached" and the server logs that say my CPID has changed?
	> Yes. Is the problem solved now?
	Actually, it resolved by itself. The client crunched the 4 WUs that had been detached and then got a new one (like normal crunching). The only thing lost is the crunching time. Luckily, I caught it on the 2nd rig, so I aborted the ghost WU and started anew. So the cause was the rpcno not being in sync? Maybe it's a good idea to enable re-sends?

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 16175 - Posted: 5 Apr 2010 \| 17:17:39 UTC - in response to Message 16169.
	Would resetting the project not do that?
