Message boards : Graphics cards (GPUs) : One of my GPUs stopped crunching
Author | Message |
---|---|
Hello, I have a Win 7 x64 machine with a GTX780Ti (GPU 0) and a GTX670 (GPU 1); I don't run 24/7. Yesterday both cards were crunching. This morning only GPU 0 is crunching after startup; I tried restarting, with no difference. | |
ID: 37224 | Rating: 0 | |
There is a bug in BOINC that occurs when you have two Nvidia cards in one machine, but since it worked before, that is unlikely unless you changed BOINC versions. There is another bug where BOINC SOMETIMES feeds the cache of one GPU but not the cache of a second GPU in the same machine. The only way I know of to fix either is to put one GPU on one project and the other GPU on a different project. In your case a sub-project just might work: maybe you could put the faster GPU on the long units and the slower GPU on the short units? The way to do that is through some exclude lines in a cc_config.xml file, something like this, which excludes the project POEM from GPU 1: <exclude_gpu> <url>http://boinc.fzk.de/poem/</url> <device_num>1</device_num> </exclude_gpu> Now the actual details on how to exclude GPU 0 from the short units, or GPU 1 from the long units, are beyond my knowledge, but I do believe it can be done. Maybe PM Jacob Klein, as he is a whiz with those things. | |
ID: 37226 | Rating: 0 | |
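The per-app split suggested in the reply above (faster card on long runs, slower card on short runs) might look roughly like the following cc_config.xml. This is only a sketch under assumptions: it takes the app short names to be acemdshort and acemdlong and the project URL to be http://www.gpugrid.net, as they appear in the configs posted elsewhere in this thread, and it assumes BOINC numbers the GTX780Ti as device 0 and the GTX670 as device 1. Check the startup lines of the event log for the device numbers and the project URL your client actually uses.

```xml
<cc_config>
  <options>
    <use_all_gpus>1</use_all_gpus>
    <!-- Assumed: device 0 is the faster card; keep it off the short runs -->
    <exclude_gpu>
      <url>http://www.gpugrid.net</url>
      <device_num>0</device_num>
      <app>acemdshort</app>
    </exclude_gpu>
    <!-- Assumed: device 1 is the slower card; keep it off the long runs -->
    <exclude_gpu>
      <url>http://www.gpugrid.net</url>
      <device_num>1</device_num>
      <app>acemdlong</app>
    </exclude_gpu>
  </options>
</cc_config>
```

Leaving out the `<app>` element excludes the device from the whole project, which is what the POEM snippet quoted above does.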
Err, I just found the problem, though I'm very confused. I added a POEM@HOME app_config file yesterday, as that project has issues when multiple GPUs are used, so I added an exclude_gpu entry. I excluded GPU 1 because GPU 0 produced the fewest errors on POEM. Here is my app_config for POEM: | |
ID: 37227 | Rating: 0 | |
I deleted the POEM app_config and I'm running 2 GPUGRID tasks again. | |
ID: 37228 | Rating: 0 | |
Hi Scalextrix, | |
ID: 37229 | Rating: 0 | |
Yes, the concurrent value was set to 0 temporarily. I didn't get time to say earlier that when I created the app_config file and tried to get POEM to run only on GPU 0, BOINC didn't respect that and still ran tasks on GPU 1 as well. | |
ID: 37230 | Rating: 0 | |
captainjack, can you post how your cc_config.xml is set up for POEM, please? I just got a GPU task for POEM and it's using GPU 1 even though my app_config is set to exclude that GPU. | |
ID: 37231 | Rating: 0 | |
Sorry, I am on the road and can't get to my PC. You should be able to go to the POEM message boards and find a cc_config from either skgiven or Jacob Klein that will work for you. | |
ID: 37232 | Rating: 0 | |
exclude_gpu is a cc_config.xml option, and should not be used within an app_config.xml file.
<cc_config>
<log_flags>
<!-- The 3 flags that are on by default are: file_xfer, sched_ops, task -->
<file_xfer>1</file_xfer>
<file_xfer_debug>0</file_xfer_debug>
<sched_ops>1</sched_ops>
<sched_op_debug>0</sched_op_debug>
<task>1</task>
<task_debug>0</task_debug>
<unparsed_xml>1</unparsed_xml>
<work_fetch_debug>1</work_fetch_debug>
<rr_simulation>0</rr_simulation>
<rrsim_detail>0</rrsim_detail>
<cpu_sched>0</cpu_sched>
<cpu_sched_debug>0</cpu_sched_debug>
<cpu_sched_status>0</cpu_sched_status>
<coproc_debug>0</coproc_debug>
<mem_usage_debug>0</mem_usage_debug>
<checkpoint_debug>0</checkpoint_debug>
<http_debug>0</http_debug>
<http_xfer_debug>0</http_xfer_debug>
<network_status_debug>0</network_status_debug>
<scrsave_debug>1</scrsave_debug>
<notice_debug>0</notice_debug>
<android_debug>0</android_debug>
<app_msg_receive>0</app_msg_receive>
<app_msg_send>0</app_msg_send>
<async_file_debug>0</async_file_debug>
<benchmark_debug>0</benchmark_debug>
<dcf_debug>0</dcf_debug>
<disk_usage_debug>0</disk_usage_debug>
<priority_debug>0</priority_debug>
<gui_rpc_debug>0</gui_rpc_debug>
<heartbeat_debug>0</heartbeat_debug>
<poll_debug>0</poll_debug>
<proxy_debug>0</proxy_debug>
<slot_debug>0</slot_debug>
<state_debug>0</state_debug>
<statefile_debug>0</statefile_debug>
<suspend_debug>0</suspend_debug>
<time_debug>0</time_debug>
<trickle_debug>0</trickle_debug>
</log_flags>
<options>
<!-- =================================================== TESTING OPTIONS =================================================== -->
<!--
<start_delay>20</start_delay>
<ncpus>12</ncpus>
<exclusive_app>NotepadTest01.exe</exclusive_app>
<exclusive_gpu_app>NotepadTest02.exe</exclusive_gpu_app>
-->
<!-- =================================================== REGULAR OPTIONS =================================================== -->
<report_results_immediately>0</report_results_immediately>
<fetch_on_update>0</fetch_on_update>
<max_event_log_lines>50000</max_event_log_lines>
<max_file_xfers>10</max_file_xfers>
<max_file_xfers_per_project>4</max_file_xfers_per_project>
<exclusive_app>iRacingSim.exe</exclusive_app>
<exclusive_app>iRacingSim64.exe</exclusive_app>
<exclusive_app>Aces.exe</exclusive_app>
<exclusive_app>TmForever.exe</exclusive_app>
<exclusive_app>TmForeverLauncher.exe</exclusive_app>
<!-- ===================================================== SETUP GPUS ====================================================== -->
<use_all_gpus>1</use_all_gpus>
<!-- =========================================== SETUP GPU 0: GeForce GTX 660 Ti [eVGA FTW] =========================================== -->
<!--
<ignore_nvidia_dev>0</ignore_nvidia_dev>
-->
<!-- Exclude World Community Grid's "Help Conquer Cancer" GPU app (hcc1) on main display - makes graphics slow, even on 660 Ti -->
<!-- Commenting out, for now, since this round of hcc1 is completed, and next round may not exhibit the issue. -->
<!--
<exclude_gpu>
<url>http://www.worldcommunitygrid.org</url>
<device_num>0</device_num>
<app>hcc1</app>
</exclude_gpu>
-->
<!-- Exclude several projects, since work from other GPU projects should give enough work to keep this GPU busy. -->
<!-- Commenting out, because POEM is often out of work, and GPUGrid sometimes does run out. -->
<!-- Commenting back in, because 7.4.0 work fetch erroneously fetches work for backup projects even when no GPUs are idle. -->
<!-- Commenting out, since all 3 GPUs can now work on main projects. -->
<!--
<exclude_gpu>
<url>http://einstein.phys.uwm.edu/</url>
<device_num>0</device_num>
</exclude_gpu>
<exclude_gpu>
<url>http://albert.phys.uwm.edu/</url>
<device_num>0</device_num>
</exclude_gpu>
<exclude_gpu>
<url>http://setiathome.berkeley.edu/</url>
<device_num>0</device_num>
</exclude_gpu>
<exclude_gpu>
<url>http://setiweb.ssl.berkeley.edu/beta/</url>
<device_num>0</device_num>
</exclude_gpu>
<exclude_gpu>
<url>http://milkyway.cs.rpi.edu/milkyway/</url>
<device_num>0</device_num>
</exclude_gpu>
-->
<!-- =========================================== SETUP GPU 1: GeForce GTX 460 =========================================== -->
<!--
<ignore_nvidia_dev>1</ignore_nvidia_dev>
-->
<!-- Exclude POEM's "POEM++ OpenCL version" GPU app (poemcl) from a second heterogeneous GPU, since it does not work properly -->
<!-- Also exclude POEM's Test Project, which has the same issue -->
<!-- Note: Although 320.18 drivers successfully run smalltest_3, the drivers still do not work right with POEM. -->
<!-- Note: Also, it appears that running POEM only on the GTX 460, does not work. So, it must run on the GTX 660 Ti! -->
<!-- Note: Tested their new OpenCL application on 3/22/2014; still does not start when running only on the GTX 460. So, it must run on the GTX 660 Ti! -->
<!-- Commenting out, to more easily test how the issue affects my new arrangement of 3 GPUs -->
<!-- 20140624 Commenting back in, as it's still bugged, even in the new 3-GPU system -->
<exclude_gpu>
<url>http://boinc.fzk.de/poem/</url>
<device_num>1</device_num>
<app>poemcl</app>
</exclude_gpu>
<exclude_gpu>
<url>http://int-boinctest.int.kit.edu/poem/</url>
<device_num>1</device_num>
<app>poemcl</app>
</exclude_gpu>
<!-- Exclude World Community Grid's "Help Conquer Cancer" GPU app (hcc1) on main display - makes graphics slow, even on 660 Ti -->
<!-- Commenting out, for now, since this round of hcc1 is completed, and next round may not exhibit the issue. -->
<!--
<exclude_gpu>
<url>http://www.worldcommunitygrid.org</url>
<device_num>1</device_num>
<app>hcc1</app>
</exclude_gpu>
-->
<!-- Reminder: For GPUGrid.net, if going to run 2-tasks-on-1-GPU, exclude this GPU (it only has 1 GB memory) -->
<!-- Commenting out, decided to include this GPU and run 1 task per GPU. -->
<!--
<exclude_gpu>
<url>http://www.gpugrid.net</url>
<device_num>1</device_num>
</exclude_gpu>
-->
<!-- Exclude several projects, since work from other GPU projects should give enough work to keep this GPU busy. -->
<!-- Commenting out, because POEM is often out of work, and GPUGrid sometimes does run out. -->
<!-- Commenting back in, because 7.4.0 work fetch erroneously fetches work for backup projects even when no GPUs are idle. -->
<!-- Commenting out, since all 3 GPUs can now work on main projects. -->
<!--
<exclude_gpu>
<url>http://einstein.phys.uwm.edu/</url>
<device_num>1</device_num>
</exclude_gpu>
<exclude_gpu>
<url>http://albert.phys.uwm.edu/</url>
<device_num>1</device_num>
</exclude_gpu>
<exclude_gpu>
<url>http://setiathome.berkeley.edu/</url>
<device_num>1</device_num>
</exclude_gpu>
<exclude_gpu>
<url>http://setiweb.ssl.berkeley.edu/beta/</url>
<device_num>1</device_num>
</exclude_gpu>
<exclude_gpu>
<url>http://milkyway.cs.rpi.edu/milkyway/</url>
<device_num>1</device_num>
</exclude_gpu>
-->
<!-- =========================================== SETUP GPU 2: GeForce GTX 660 Ti [MSI OC] =========================================== -->
<!--
<ignore_nvidia_dev>2</ignore_nvidia_dev>
-->
<!-- Exclude POEM's "POEM++ OpenCL version" GPU app (poemcl) from a second heterogeneous GPU, since it does not work properly -->
<!-- Also exclude POEM's Test Project, which has the same issue -->
<!-- Note: Although 320.18 drivers successfully run smalltest_3, the drivers still do not work right with POEM. -->
<!-- Note: Also, it appears that running POEM only on the GTX 460, does not work. So, it must run on the GTX 660 Ti! -->
<!-- Note: Tested their new OpenCL application on 3/22/2014; still does not start when running only on the GTX 460. So, it must run on the GTX 660 Ti! -->
<!-- Commenting out, to more easily test how the issue affects my new arrangement of 3 GPUs -->
<!-- 20140624 Commenting back in, as it's still bugged, even in the new 3-GPU system -->
<exclude_gpu>
<url>http://boinc.fzk.de/poem/</url>
<device_num>2</device_num>
<app>poemcl</app>
</exclude_gpu>
<exclude_gpu>
<url>http://int-boinctest.int.kit.edu/poem/</url>
<device_num>2</device_num>
<app>poemcl</app>
</exclude_gpu>
<!-- Exclude several projects, since work from other GPU projects should give enough work to keep this GPU busy. -->
<!-- Commenting out, because POEM is often out of work, and GPUGrid sometimes does run out. -->
<!-- Commenting back in, because 7.4.0 work fetch erroneously fetches work for backup projects even when no GPUs are idle. -->
<!-- Commenting out, since all 3 GPUs can now work on main projects. -->
<!--
<exclude_gpu>
<url>http://einstein.phys.uwm.edu/</url>
<device_num>2</device_num>
</exclude_gpu>
<exclude_gpu>
<url>http://albert.phys.uwm.edu/</url>
<device_num>2</device_num>
</exclude_gpu>
<exclude_gpu>
<url>http://setiathome.berkeley.edu/</url>
<device_num>2</device_num>
</exclude_gpu>
<exclude_gpu>
<url>http://setiweb.ssl.berkeley.edu/beta/</url>
<device_num>2</device_num>
</exclude_gpu>
<exclude_gpu>
<url>http://milkyway.cs.rpi.edu/milkyway/</url>
<device_num>2</device_num>
</exclude_gpu>
-->
</options>
</cc_config>
-------------------------------------- app_config.xml for GPUGrid --------------------------------------
<!-- GPUGrid.net -->
<!-- GPU tasks do properly use higher process and thread priorities, compared to CPU tasks. -->
<!-- GPU tasks sometimes use CPU and sometimes don't, depending on the type of GPU the task runs on. -->
<!-- Recommend 1 gpu_usage, if user also has CPU projects. -->
<!-- Recommend 0.001 cpu_usage, but might try 0.5, since if 2 are running, I KNOW the Kepler is using CPU -->
<!-- Also might try 1 cpu_usage, so as not to overcommit per Task Manager's CPU Utilization -->
<!-- Although x-at-a-time provides the best per-task-throughput, it ends up using a lot more CPU -->
<!-- Switching to 0.4995, such that if an 8-CPU MT job is running, 2 GPUGrid jobs and 1 0.001 GPU job can all run together -->
<!-- 0.5 cpu_usage so that 2+ GPU tasks will intentionally reserve at least 1 core -->
<!-- 1.0 cpu_usage because, when SETI tasks run on 3rd GPU reserving a core, they still aren't getting enough CPU -->
<!-- 0.5 cpu_usage because REC calculations and Process Explorer agree that CPU projects can get more cycles this way (3150cyc * 6inst, vs 2900cyc * 7inst)-->
<!-- 0.2 cpu_usage because 334.67 drivers make the Kepler no longer utilize a full core. Not sure if bug or not. -->
<!-- 0.4 cpu_usage to reflect what I actually see, and to reserve a core when 0.4 + 0.4 + 0.3 > 1.0 -->
<!-- 0.5 cpu_usage so, when a 1-CPU task is running on GTS 240, 2 GPUGrid tasks will still reserve core, keeping CPU always slightly undercommitted -->
<!-- 1.0 cpu_usage, in attempt to bolster and increase throughput for GPUGrid tasks, and keep GPU clocked at maximum boost -->
<!-- 0.5 cpu_usage, to better load CPU -->
<!-- 1.0 cpu_usage, for better throughput -->
<!-- 0.5 cpu_usage, to better load CPU -->
<!-- 1.0 cpu_usage, for better throughput -->
<!-- 0.2 cpu_usage, to match what I actually see getting used -->
<!-- 1.0 cpu_usage, for better throughput -->
<!-- 0.4 cpu_usage, now that 3 GPUs are running this project, as a better compromise between throughput and CPU load -->
<!-- 1.0 cpu_usage, for better throughput -->
<!-- 0.666 cpu_usage, to reserve at least 1 CPU when 2+ tasks are running -->
<!-- 0.666 gpu_usage, to not allow 2-GPUGrid-on-1-GPU, but to allow GPUGrid+Poem on 1 GPU -->
<!-- 1 gpu_usage, because otherwise work fetch fetches too much, making my low-cache settings irrelevant -->
<!-- 0.667 cpu_usage, so that when 3 tasks are running, 2 CPUs are reserved, as a better compromise -->
<app_config>
<!-- Short runs (2-3 hours on fastest card) -->
<app>
<name>acemdshort</name>
<max_concurrent>0</max_concurrent>
<gpu_versions>
<gpu_usage>1</gpu_usage>
<cpu_usage>0.667</cpu_usage>
</gpu_versions>
</app>
<!-- Long runs (8-12 hours on fastest card) -->
<app>
<name>acemdlong</name>
<max_concurrent>0</max_concurrent>
<gpu_versions>
<gpu_usage>1</gpu_usage>
<cpu_usage>0.667</cpu_usage>
</gpu_versions>
</app>
<!-- ACEMD beta version -->
<app>
<name>acemdbeta</name>
<max_concurrent>0</max_concurrent>
<gpu_versions>
<gpu_usage>1</gpu_usage>
<cpu_usage>0.667</cpu_usage>
</gpu_versions>
</app>
</app_config>
-------------------------------------- app_config.xml for POEM@Home --------------------------------------
<!-- POEM@Home -->
<app_config>
<!-- POEM++ OpenCL version
My research indicates that increasing to 5-at-a-time provides the best per-task-throughput.
But, since each task utilizes a full core during its increased length, x-at-a-time ends up using a lot more CPU.
So... I thought about doing less at a time.
I'm now using 3-at-a-time as a happy middle-ground.
3/22/2014: With new OpenCL release, the GPU Usage is almost high enough to do only 2-at-a-time. Still recommend 3-at-a-time.
Actually, 3 now causes a stutter in the display. But even 1 does. So, staying at 3-at-a-time.
Additional testing shows that, when only running POEM OpenCL GPU, 1 task does not stutter, but 2 does. Switching to 1-at-a-time.
6/17/2014: GPU Usage indicates that 2-at-a-time will appropriately saturate the GPU with no other tasks, but need 3-at-a-time to saturate with other tasks.
-->
<app>
<name>poemcl</name>
<max_concurrent>0</max_concurrent>
<gpu_versions>
<gpu_usage>0.333</gpu_usage>
<cpu_usage>1</cpu_usage>
</gpu_versions>
</app>
</app_config>
| |
ID: 37233 | Rating: 0 | |
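For the two-GPU machine in the original question, the long cc_config.xml above can be cut down to just the part that matters. The following is a minimal sketch, not a tested configuration: it reuses the POEM URL and the poemcl app name from the post above and assumes the card to keep POEM off is device 1.

```xml
<cc_config>
  <options>
    <use_all_gpus>1</use_all_gpus>
    <!-- Assumed: device 1 is the GPU that should not run POEM's OpenCL app -->
    <exclude_gpu>
      <url>http://boinc.fzk.de/poem/</url>
      <device_num>1</device_num>
      <app>poemcl</app>
    </exclude_gpu>
  </options>
</cc_config>
```

The file goes in the BOINC data directory as cc_config.xml. Restarting the client after editing it is the safest way to make sure the exclusion is picked up; the event log should then show whether the exclusion was read at startup.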
Thanks Jacob, I had not realized that exclude_gpu must be in cc_config, not app_config. I'll try that today and I'm certain that will resolve my issues. Thanks also to captainjack. | |
ID: 37236 | Rating: 0 | |