Message boards : Number crunching : ERR: cudart64_80.dll all nulls. Should be just a link to the real one
Got the error message as shown below. I did a binary compare of the file against the same DLL on another computer, and the problem file's content was all "0" (null bytes).

Directory of C:\ProgramData\BOINC\projects\www.gpugrid.net
06/18/2019  11:06 PM        366,016 _cudart64_80.dll
06/18/2019  11:12 PM    145,769,016 _cufft64_80.dll
06/18/2019  11:07 PM      1,262,080 _tcl86.dll
06/18/2019  11:07 PM        112,640 _zlib1.dll

IMHO, the DLLs in the slots should just be links to the actual DLLs in the project directory. Something is wrong, kaput or not kosher. Anyway, I replaced the nulled-out DLL with a good one and am keeping my fingers crossed.

Does anyone know what happened to the BOINC website? If it is down for maintenance, they should have put up a redirect to an info page. I wanted to ask whether the message below was from their program. I think it is a Windows 10 error and not from their image check.
ID: 52129
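For anyone wanting to repeat that binary check, here is a minimal Python sketch. It is a hypothetical helper, not part of BOINC or GPUGrid, and the function names are made up for illustration. It reports whether a file is entirely null bytes and whether it starts with "MZ", the two-byte DOS signature every Windows DLL or EXE begins with. A nulled-out file fails both tests, which would be enough for Windows to refuse to load it as a DLL.

```python
"""Hypothetical helper: detect null-filled DLLs and compare two copies byte for byte."""
import filecmp
import sys

CHUNK = 1 << 20  # read 1 MiB at a time; _cufft64_80.dll alone is ~145 MB


def is_all_nulls(path):
    """True if the file is non-empty and every byte is 0x00."""
    size = 0
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK):
            size += len(chunk)
            if chunk.count(0) != len(chunk):
                return False  # found at least one non-zero byte
    return size > 0  # an empty file is not counted as "all nulls"


def has_mz_header(path):
    """Windows PE images (DLL/EXE) begin with the DOS signature b"MZ"."""
    with open(path, "rb") as f:
        return f.read(2) == b"MZ"


def same_content(a, b):
    """Byte-for-byte comparison; shallow=False forces a real content check."""
    return filecmp.cmp(a, b, shallow=False)


def main(paths):
    for p in paths:
        status = "ALL NULLS" if is_all_nulls(p) else "has data"
        header = "MZ ok" if has_mz_header(p) else "no MZ header"
        print(f"{p}: {status}, {header}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Running it against both the project copy (C:\ProgramData\BOINC\projects\www.gpugrid.net\_cudart64_80.dll) and the copy in the task's slot directory under C:\ProgramData\BOINC\slots\ would show which one is damaged; same_content() does the equivalent of the binary compare against another machine's copy.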
There is a message in the SETI@home forums regarding BOINC: Dr. Anderson is quoted as saying the BOINC server is dead. They expect to replace it sometime tomorrow; it's the weekend and no one is there to do it.
ID: 52132
It is the deployment policy of this project that the CUDA DLLs are copied, rather than linked, before each task starts. Many projects do this so that they can download a fully-versioned reference file but restore it to the generic name on copy, as required for the dynamic linking to work:

<file_ref>
<file_name>_cudart64_80.dll</file_name>
<open_name>cudart64_80.dll</open_name>
<copy_file/>
</file_ref>
<file_ref>
<file_name>_cufft64_80.dll</file_name>
<open_name>cufft64_80.dll</open_name>
<copy_file/>
</file_ref>

Note that in the GPUGrid case, the renaming is very subtle: it simply removes the leading underscore. If the file copy on your machine resulted in a nulled image, then something is wrong with your hardware.
ID: 52133
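To illustrate the copy-then-rename step described in that post, here is a rough Python sketch. It is not BOINC's actual code (the client is written in C++); the function names and the hash check are assumptions for illustration only. It reads <file_ref> entries like the ones quoted, copies each entry marked <copy_file/> from the project directory into a slot directory under its <open_name>, and compares SHA-256 hashes of source and destination so that a bad copy, such as a nulled image, would be caught immediately.

```python
"""Hypothetical sketch of copy_file staging; not BOINC's real implementation."""
import hashlib
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path


def parse_file_refs(xml_text):
    """Return (file_name, open_name, copy_file) for each <file_ref> in a fragment."""
    # The fragment has several top-level <file_ref> elements, so wrap it in a root.
    root = ET.fromstring(f"<refs>{xml_text}</refs>")
    refs = []
    for ref in root.iter("file_ref"):
        file_name = ref.findtext("file_name")
        open_name = ref.findtext("open_name") or file_name
        refs.append((file_name, open_name, ref.find("copy_file") is not None))
    return refs


def sha256_of(path):
    """Hash a file in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def stage_copies(xml_text, project_dir, slot_dir):
    """Copy each copy_file ref into slot_dir under its open_name, then verify it."""
    staged = []
    for file_name, open_name, copy in parse_file_refs(xml_text):
        if not copy:
            continue  # linked (non-copied) files are outside this sketch
        src = Path(project_dir) / file_name
        dst = Path(slot_dir) / open_name  # e.g. _cudart64_80.dll -> cudart64_80.dll
        shutil.copyfile(src, dst)
        if sha256_of(src) != sha256_of(dst):
            raise OSError(f"copy of {src} to {dst} does not match the source")
        staged.append(dst)
    return staged
```

With the fragment quoted above, parse_file_refs() returns the two DLL entries with the underscore dropped from the open name, which is exactly the "very subtle" rename.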
> Message in the Seti@home forums regarding BOINC. Dr. Anderson is quoted as saying the server for BOINC is dead. They look to replace it tomorrow sometime. Weekend and no one there to do it.

David said "hopefully we will have a new one up tomorrow" (today is Monday, June 24).
ID: 52134
delete
ID: 52135
> I think it is a windows 10 error and not from their image check.

It does look like a Windows error rather than an application message.

> If the file copy on your machine resulted in a nulled image, then something is wrong with your hardware

Your other two machines don't appear to have the same issue, so I would suspect a hardware or OS corruption issue on that one machine. Your PCs appear identical, so perhaps try swapping the RAM etc. into another PC to see if the issue follows the hardware. Are there any errors in the Event Log, or has any software changed, such as your antivirus (AV)? If your AV has been updated, try whitelisting your BOINC data directory. A 0xc000012f error in Windows can point to software driver issues, corruption, or failed hardware.
ID: 52136
> I think it is a windows 10 error and not from their image check.

I've been looking at this. Each GTX 1070 works fine in other systems, and both GTX 1070s work fine for SETI, Einstein, etc., but NOT for GPUGrid.

Symptom: I put the second GTX 1070 in and start BOINC. The project start is delayed 60 seconds to allow some debugging; otherwise the computer freezes. I get to look at the event messages, and sure enough, things do not look good (remember, this was working fine before I returned the GTX 1070 to the system):

1 6/27/2019 11:17:35 AM Starting BOINC client version 7.14.2 for windows_x86_64
2 6/27/2019 11:17:35 AM log flags: file_xfer, sched_ops, task
3 6/27/2019 11:17:35 AM Libraries: libcurl/7.47.1 OpenSSL/1.0.2g zlib/1.2.8
4 6/27/2019 11:17:35 AM Data directory: C:\ProgramData\BOINC
5 6/27/2019 11:17:35 AM Running under account frick
6 6/27/2019 11:17:35 AM [error] Couldn't parse account file account_www.gpugrid.net.xml
7 6/27/2019 11:17:36 AM [error] Couldn't parse statistics_www.gpugrid.net.xml
8 6/27/2019 11:17:38 AM CUDA: NVIDIA GPU 0: GeForce GTX 1070 (driver version 430.64, CUDA version 10.2, compute capability 6.1, 4096MB, 3556MB available, 6852 GFLOPS peak)
9 6/27/2019 11:17:39 AM CUDA: NVIDIA GPU 1: GeForce GTX 1070 (driver version 430.64, CUDA version 10.2, compute capability 6.1, 4096MB, 3556MB available, 6463 GFLOPS peak)
10 6/27/2019 11:17:39 AM OpenCL: NVIDIA GPU 0: GeForce GTX 1070 (driver version 430.64, device version OpenCL 1.2 CUDA, 8192MB, 3556MB available, 6852 GFLOPS peak)
11 6/27/2019 11:17:39 AM OpenCL: NVIDIA GPU 1: GeForce GTX 1070 (driver version 430.64, device version OpenCL 1.2 CUDA, 8192MB, 3556MB available, 6463 GFLOPS peak)
12 GPUGRID 6/27/2019 11:17:39 AM [error] Project GPUGRID is in state file but no account file found
13 6/27/2019 11:17:39 AM [error] Application acemdlong outside project in state file
14 6/27/2019 11:17:39 AM [error] Application acemdshort outside project in state file

I stop BOINC (this is not necessary if GPUGrid is suspended), go to \ProgramData\BOINC and examine those two files:

account_www.gpugrid... is full of nulls
statistics_www.gpugrid…. is full of nulls

These files have been rewritten by, it seems, cs_account.cpp, a module in the BOINC client; at least, that is where I traced the error messages to. If I could run the client under VS2017, I could possibly help debug the problem. It could be hardware, but since both GTX 1070s seem to work fine with other GPU projects, the code is suspect.

I replaced those two files with copies from another system that also has a GTX 1070, with the same result: both were rewritten full of nulls when the BOINC client started up.

Could one of the GTX 1070s be attempting to process the checkpoint file that was created by the other GTX 1070? There are differences: one is an SC model, the other is plain jane. GPUGrid makes more demands on the hardware than SETI, so the hardware problem may not show up easily or frequently.

[EDIT] With SETI running, I reboot the system. This ensures that SETI will continue running and GPUGrid will have to wait its turn. The account file did not get rewritten with nulls. That indicates that cs_account.cpp probably did not rewrite it, so the project executable must have attempted a download of the account, and either that works or the nulls happen then. Also, statistics_www.gpugrid is not the same as the one I had copied from another computer (which is understandable), so there must be a problem elsewhere. However, the event messages show a problem with the statistics file (but not the account file), which is strange, as I don't see anything unusual when looking at the XML:

6/27/2019 11:58:27 AM [error] Couldn't parse statistics_www.gpugrid.net.xml

I keep temperatures low with EVGA Precision X, but sometimes that program does not start with Windows. I will probably pull the 1070 and use it elsewhere. I am thinking there is a hardware problem or an incompatibility of some type. I only noticed this problem after the Microsoft 1903 feature update.
[EDIT-2] I want to clarify something: I dragged the "good" account and statistics files from another system that has a GTX 1070 and dropped them into the BOINC ProgramData directory. The files are good; they are only filled with nulls after I start the client, and that happens when GPUGrid is NOT suspended. In other words, the nulls occur (whether from hardware, software or whatever) when the project code starts executing.
ID: 52153
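A quick way to check the two files the client complains about is sketched below in Python. It is a hypothetical diagnostic, not part of BOINC, and the function names and default project name are assumptions. For a project's account_ and statistics_ files in the BOINC data directory it reports "all nulls", "unparseable" (Python's ElementTree rejects it) or "ok". BOINC uses its own XML parser, so "ok" here does not guarantee the client will accept the file, which matters given the statistics file looked fine by eye yet still failed to parse.

```python
"""Hypothetical diagnostic for null-filled or unparseable BOINC project XML files."""
import xml.etree.ElementTree as ET
from pathlib import Path


def classify_xml(path):
    """Return 'all nulls', 'unparseable', or 'ok' for one XML file."""
    data = Path(path).read_bytes()
    if data and data.count(0) == len(data):
        return "all nulls"
    try:
        ET.fromstring(data)  # an empty file also raises ParseError
    except ET.ParseError:
        return "unparseable"
    return "ok"


def scan_data_dir(data_dir, project="www.gpugrid.net"):
    """Classify account_<project>.xml and statistics_<project>.xml if present."""
    results = {}
    for prefix in ("account_", "statistics_"):
        path = Path(data_dir) / f"{prefix}{project}.xml"
        if path.exists():
            results[path.name] = classify_xml(path)
    return results


if __name__ == "__main__":
    # Data directory taken from the client's startup log.
    print(scan_data_dir(r"C:\ProgramData\BOINC"))
```

Running it once before starting the client and again right afterwards would show whether the files are rewritten as nulls at startup, without having to open them by hand.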
Sounds like you have spent some time on this!
ID: 52155
> Have you tried a check disk on the hard drive?

Pretty sure this is a hardware problem. I discovered that it can quickly reboot and pick up where it left off, so I don't notice the problem. I ran that x86 memtest, swapped out the video board, and got a flashlight and looked for bad capacitors. I checked the power supply fan. No obvious problem, and I know what bad caps look like.

I'm doing a second check disk, but unaccountably there is no display of the check disk information, only a very dim raster. I know check disk is working because I can see the LED blink and hear clicking noises from the disk. Do you know of any way to check the results of chkdsk /f /r after the system boots?

This is definitely a hardware problem, as BOINC was not even running and the system reset the instant I logged in remotely, which was totally unexpected. By "reset" I mean it simply turned off, as if the power cord had been pulled, at the exact instant I pressed Return after entering my password.
ID: 52156
> Do you know of any way to check the results of the chkdsk /f/r after the system boots?

Chkdsk results can be found in the Event Viewer, in the Application log (hopefully your Windows starts). This website describes the process of finding chkdsk results: https://support.4it.com.au/article/how-to-extract-the-check-disk-chkdsk-logs-from-event-viewer-on-windows/

The sudden power outages can cause disk corruption. Also check your Event Log for unusual restarts of your PC. The power supply is the usual suspect, but I have seen a bad motherboard, bad RAM or a bad CPU cause reboots. Good luck.
ID: 52157
Replaced the power supply, and things are back to normal.
ID: 52163