skgiven (Volunteer moderator, Volunteer tester):
For those interested in buying a CUDA card or adding one to a GPU project, I collected some reported Boinc GPU ratings, added some I tested, and created a Boinc GFLOPS performance list.
Note: these should all be natively clocked scores only!
CUDA card list with Boinc ratings in GFLOPS
The following are mostly compute capability 1.1:
GeForce 8400 GS PCI 256MB, est. 4GFLOPS
GeForce 8400 GS PCIe 256MB, est. 5GFLOPS
GeForce 8500 GT 512MB, est. 5GFLOPS
Quadro NVS 290 256MB, est. 5GFLOPS
GeForce 8600M GS 256MB, est. 5GFLOPS
GeForce 8600M GS 512MB, est. 6GFLOPS
Geforce 8500 GT, 512MB PCIe, 6GFLOPS
GeForce 9600M GT 512MB, est. 14GFLOPS
GeForce 8600 GT 256MB, est. 14GFLOPS
GeForce 8600 GT 512MB, est. 15GFLOPS
GeForce 9500 GT 512MB, est. 15GFLOPS
GeForce 8600 GTS 256MB, est. 18GFLOPS
GeForce 9600 GT 512MB, est. 34GFLOPS
GeForce 9600 GT 512MB, est. 37GFLOPS
GeForce 8800 GTS, 640MB, est. 41GFLOPS [compute capability 1.0]
Geforce 9600 GSO, 768MB (DDR2) 46GFLOPS
Geforce 9600 GSO, 384MB (DDR3) 48GFLOPS
GeForce 8800 GT 512MB, est. 60GFLOPS
GeForce 8800 GTX 768MB, est. 62GFLOPS [compute capability 1.0] (OC)?
GeForce 9800 GT 1024MB, est. 60GFLOPS
GeForce 9800 GX2 512MB, est. 69GFLOPS
GeForce 8800 GTS 512MB, est. 77GFLOPS
GeForce 9800 GTX 512MB, est. 77GFLOPS
GeForce 9800 GTX+ 512MB, est. 84GFLOPS
GeForce GTX 250 1024MB, est. 84GFLOPS
Compute capability 1.3:
GeForce GTX 260 896MB (192sp), est. 85GFLOPS
Tesla C1060 1024MB, est. 93GFLOPS (only)?
GeForce GTX 260 896MB, est. 100GFLOPS
GeForce GTX 260 896MB, est. 104GFLOPS (OC)?
GeForce GTX 260 896MB, est. 111GFLOPS (OC)?
GeForce GTX 275 896MB, est. 123GFLOPS
GeForce GTX 285 1024MB, est. 127GFLOPS
GeForce GTX 280 1024MB, est. 130GFLOPS
GeForce GTX 295 896MB, est. 106GFLOPS (X2=212)?
You should also note the following if you’re buying a new card or thinking about attaching it to a CUDA project:
Different cards have different numbers of shaders (the more the better)!
Different speeds of shader and RAM will affect performance (these are sometimes factory overclocked, and different manufacturers using the same GPU chipset and speed can tweak out slightly different performance)!
Some older cards use DDR2 while newer cards predominantly use DDR3 (DDR3 is about 20% to 50% faster, but it varies; faster is better)!
The amount of RAM (typically 256MB, 384MB, 512MB, 768MB, 896MB and 1GB) will significantly affect performance (more is better)!
Some older cards may be PCI, not PCI-E (PCI-E is faster)!
Mismatched pairs of PCI-E cards will likely underperform.
If you overclock your graphics card, you will probably get more performance, but you might get more errors and you will reduce the life expectancy of the card, motherboard and PSU - you probably know this already ;)
If you have a slower card (say under 10GFLOPS) don’t attach it to GPU-Grid; you are unlikely to finish any tasks in time, so you will not produce any results or get any points. You may wish to attach to another project that uses a longer return deadline (Aqua-GPU for example). With a 20GFLOPS card most tasks will probably time out. Even with a 9600 GT (about 35GFLOPS) your computer would need to be on most of the time to get a good success/failure ratio.
Please post your NATIVELY CLOCKED Boinc GFLOPS ratings here, or any errors, to help create a more complete list.
You can find them here: open Boinc (Advanced View) and select the Messages tab; about the 12th line down it will say CUDA device... or No CUDA devices found. Include the card name, compute capability (1.0, 1.1 or 1.3 for example), RAM and est. GFLOPS. Even if it is already on the list, it will confirm the ratings and help other people decide which graphics card they want to get.
PS. If you want more details about an NVIDIA card look here, http://en.wikipedia.org/wiki/Comparison_of_Nvidia_Graphics_Processing_Units
Thanks,
____________
FAQ's
HOW TO:
- Opt out of Beta Tests
- Ask for Help |
|
|
|
Hi,
that's quite some work you put into collecting this. Let me add a few points / comments:
- we have a comparable list here, including prices (somewhat outdated) and some power consumption numbers
- we found GT200 to be 41% faster per GFLOP than G9x, so the BOINC benchmark underestimates this influence (it could not possibly reflect it correctly unless it used the actual GPU-Grid code)
- that 8800GTX is probably not OC'ed, as it has more raw power than a 9800GT
- 9800GX2 would also get that value times 2 (2 chips)
- the Tesla 1060 is just a GT200 in a slightly different config, so that score is reasonable
- GPU-Grid is not terribly limited by gpu memory speed, so DDR2 / GDDR3 doesn't matter much.. and any card with DDR2 is likely too slow anyway
- the amount of GPU memory does not affect GPU-Grid performance and likely will not for a long time (currently 70 - 100 MB used). See e.g. the 9600GSO (384 / 368) or the 9800GTX+ / GT250 (512 / 1024)
- PCIe speed does not matter as long as it doesn't get extremely slow
- any card with PCI is likely too slow for GPU-Grid anyway
- "mismatched" pairs (I'd call them mixed ;) of PCIe cards do not underperform. Folding@home has this problem, but it has not been reported here, even in G9x / GT200 mixes
- if you overclock you will get more performance
- just increasing clock speed does not decrease GPU lifetime much. Temperature is much more important, so if you increase the fan speed slightly you'll easily offset any lifetime losses due to higher frequencies. Just don't increase GPU voltage!
MrS
____________
Scanning for our furry friends since Jan 2002 |
|
|
skgiven:
Thanks for your input MrS. You made many good points, and well spotted with the mistakes:
The GeForce 9800 GX2 has 2 X 69GFLOPS = 138GFLOPS,
GPU-Grid performance will not improve with more RAM (GPU-GRID uses 70-100MB), and different card pairings do not impair GPU-GRID performance.
The lower rated CUDA capable cards are listed for reference; I did not mean to suggest anyone should use a DDR2 or PCI card (5GFLOPS) on GPU-GRID. Don't do it!
Can I ask you to clarify something regarding the G200 GPU core range?
The NVIDIA GeForce GTX 250, despite appearing to be part of the 200 range, actually uses a G92 core (it’s almost identical to the GeForce 9800 GTX+), so am I correct in thinking that Boinc rates this correctly as 85GFLOPS, and that the card's name is just an oddity/misnomer?
The NVIDIA GeForce GTX 260 (192sp), on the other hand, does use a G200 core (as does the denser 260, and the 270, 275, 280, 285, 290 and 295 cards). So does Boinc underrate this GTX 260 (192sp) as an 85GFLOPS card?
Would it be more accurate for Boinc to rate this card as 85 x 1.41 = 120GFLOPS?
____________
FAQ's
HOW TO:
- Opt out of Beta Tests
- Ask for Help |
|
|
|
Hi,
The NVIDIA GeForce GTX 250, despite appearing to be part of the 200 range, actually uses a G92 core (it’s almost identical to the GeForce 9800 GTX+), so am I correct in thinking that Boinc rates this correctly as 85GFLOPS, and that the card's name is just an oddity/misnomer?
It's actually the GTS 250, not GTX 250. NVidia apparently thinks this single unassuming letter is enough for people to realize that what they are going to buy is performance-wise identical to the 9800GTX+. Or they just want to screw customers into thinking the GTS 250 is more than it actually is.
The NVIDIA GeForce GTX 260 (192sp), on the other hand, does use a G200 core (as does the denser 260, and the 270, 275, 280, 285, 290 and 295 cards). So does Boinc underrate this GTX 260 (192sp) as an 85GFLOPS card?
Would it be more accurate for Boinc to rate this card as 85 x 1.41 = 120GFLOPS?
In short: yes :D
You can see in the post I linked to that the GTS 250 is theoretically capable of 705 GFlops, whereas "GTX 260 Core 192" is rated at 715 GFlops. So the BOINC benchmark is quite accurate in reproducing this.
However, due to advanced functionality in the G200 design GPU-Grid can extract more real-world-GFlops from G200 than from G92 (these 41%). You could say the GTX 260 and all other G200-based cards deserve their rating to be multiplied by 1.41.
And since the BOINC benchmark uses different code it cannot reproduce this accurately, or at all. If it were changed to include this effect it might become inaccurate for SETI or Aqua, as in their case G92 and G200 may be equally fast on a "per theoretical GFlop" basis. That's why I think a single benchmark number is not any more useful than the theoretical values.
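As an aside, here is a minimal sketch (in Python, not from the original posts) of the adjustment described above, using only the 1.41 factor quoted in this thread:

# Sketch: make BOINC "est. GFLOPS" comparable across cards for GPU-Grid.
# Assumption from this thread: G200 (compute capability 1.3) cards do ~41% more
# real GPU-Grid work per BOINC GFLOP than G92 cards.
G200_FACTOR = 1.41

def gpugrid_effective_gflops(boinc_est_gflops, compute_capability):
    # Scale CC 1.3 cards so they can be compared with CC 1.1 cards.
    if compute_capability >= 1.3:
        return boinc_est_gflops * G200_FACTOR
    return boinc_est_gflops

print(gpugrid_effective_gflops(85, 1.3))   # GTX 260 (192sp): ~120 effective GFLOPS
print(gpugrid_effective_gflops(84, 1.1))   # 9800 GTX+: stays at 84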
MrS
____________
Scanning for our furry friends since Jan 2002 |
|
|
|
I think I found the reading Boinc uses for its GFLOPS count...
From CUDA-Z:
32-bit Integer: 120753 Miop/s
From Boinc:
6/20/2009 12:55:51 CUDA device: GeForce GTX 260 (driver version 18585, CUDA version 1.3, 896MB, est. 121GFLOPS)
Bob
|
|
|
skgiven:
Perhaps you are correct?
I only have a GeForce GTX 260 (192). Boinc rates it as 85GFLOPS.
CUDA-Z, Performance, 32-bit Integer, rates it as about 86000 Miop/s (but it fluctuates).
Is your card overclocked, as the other GTX 260 cards I listed were between 100 and 111GFLOPS? |
|
|
Ross*:
The influence of the CPU
If you look at the top computers that are crunching GPUGRID, the CPU times are low, between 1000 and 2000 for 4000 to 5000 credits.
Most are i7 CPUs and are using a couple of 295s.
So has anyone done some research into what CPU setup is doing the best?
While the GPU cards are a known factor "we found GT200 to be 41% faster per GFLOP than G9x, so the BOINC benchmark underestimates this influence (it could not possibly reflect it correctly unless it used the actual GPU-Grid code)"
There are some huge differences in the amount of CPU time to do WUs.
Assuming the WU is done within 24hrs, I have had differences of 1000 CPU time for 4500 credits to 6000 for 4500 credits.
Ross
____________
|
|
|
|
Is your card overclocked, as the other GTX 260 cards I listed were between 100 and 111GFLOPS?
Very much so...
current clocks
702/1566/1107
I'm tempted to use the voltage tuner to up the speed more though :) Temps at 67c (fan @ 70%) so lots of room there...
Bob |
|
|
skgiven:
Below is an updated CUDA Performance Table for cards on GPU-GRID, with reported Boinc GPU ratings, and amended ratings for G200 cores (in brackets) -only for compute capable 1.3 cards. (MrS calculated that G200 core CUDA cards operate at 141% efficiency compared to the reported Boinc GFLOPS).
This is a guide to Natively clocked card performance on GPU-GRID only (not for other projects)!
The following are mostly compute capability (CC) 1.1:
Don’t use with GPU-GRID, won’t finish in time!
GeForce 8400 GS PCI 256MB, est. 4GFLOPS
GeForce 8400 GS PCIe 256MB, est. 5GFLOPS
GeForce 8500 GT 512MB, est. 5GFLOPS
Quadro NVS 290 256MB, est. 5GFLOPS
GeForce 8600M GS 256MB, est. 5GFLOPS
GeForce 8600M GS 512MB, est. 6GFLOPS
Geforce 8500 GT, 512MB PCIe, 6GFLOPS
Not Recommended for GPU-GRID, unless on 24/7
GeForce 9600M GT 512MB, est. 14GFLOPS
GeForce 8600 GT 256MB, est. 14GFLOPS
GeForce 8600 GT 512MB, est. 15GFLOPS
GeForce 9500 GT 512MB, est. 15GFLOPS
GeForce 8600 GTS 256MB, est. 18GFLOPS
Entry Performance cards for GPU-GRID
GeForce 9600 GT 512MB, est. 34GFLOPS
GeForce 9600 GT 512MB, est. 37GFLOPS
GeForce 8800 GTS, 640MB, est. 41GFLOPS [CC 1.0]
Geforce 9600 GSO, 768MB (DDR2) 46GFLOPS
Geforce 9600 GSO, 384MB (DDR3) 48GFLOPS
Average Performance Cards for GPU-GRID
GeForce 8800 GT 512MB, est. 60GFLOPS
GeForce 8800 GTX 768MB, est. 62GFLOPS [CC 1.0]
GeForce 9800 GT 1024MB, est. 60GFLOPS
Good Performance Cards for GPU-GRID
GeForce 8800 GTS 512MB, est. 77GFLOPS
GeForce 9800 GTX 512MB, est. 77GFLOPS
GeForce 9800 GTX+ 512MB, est. 84GFLOPS
GeForce GTX 250 1024MB, est. 84GFLOPS
Compute capability 1.3 [mostly]:
High End Performance Cards for GPU-Grid
GeForce GTX 260 896MB (192sp), est. 85GFLOPS (120)
Tesla C1060 1024MB, est. 93GFLOPS (131)
GeForce GTX 260 896MB, est. 100GFLOPS (141)
GeForce GTX 275 896MB, est. 123GFLOPS (173)
GeForce GTX 285 1024MB, est. 127GFLOPS (179)
GeForce GTX 280 1024MB, est. 130GFLOPS (183)
GeForce 9800 GX2 512MB, est. 138GFLOPS [CC 1.1]
GeForce GTX 295 896MB, est. 212GFLOPS (299)
I would speculate that given the 41% advantage in using compute capable 1.3 (G200) cards, GPU-GRID would be likely to continue to support these cards’ advantageous instruction sets.
For those that have compute capable 1.0/1.1 cards and 1.3 cards and participate in other GPU projects, it would make sense to allocate your 1.3 cards to GPU-GRID.
____________
FAQ's
HOW TO:
- Opt out of Beta Tests
- Ask for Help |
|
|
skgiven:
current clocks 702/1566/1107
I'm tempted to use the voltage tuner to up the speed more though :) Temps at 67c (fan @ 70%) so lots of room there...
Bob
I would be happy enough with that performance - it's about the same as a natively clocked GeForce GTX 275!
If you are going to up the Voltage, select No New Tasks, and finish your existing work units first ;) |
|
|
|
Ross, please don't start the same discussion in 2 different threads!
popandbob wrote: I'm tempted to use the voltage tuner to up the speed more though :) Temps at 67c (fan @ 70%) so lots of room there...
You may want to take a look here.
MrS
____________
Scanning for our furry friends since Jan 2002 |
|
|
|
CUDA device: GeForce GTX 285 (driver version 18585, compute capability 1.3, 2048MB, est. 136GFLOPS)
This is an EVGA "FTW Edition", or factory overclocked to:
Core - 702MHz
Shader - 1584MHz
Memory - 2448MHz
|
|
|
skgiven:
The influence of the CPU. If you look at the top computers that are crunching GPUGRID, the CPU times are low, between 1000 and 2000 for 4000 to 5000 credits. Most are i7 CPUs and are using a couple of 295s.
So has anyone done some research into what CPU setup is doing the best?
Yes, I did a bit of research into this and found some interesting results!
Ultimately any given work unit will require a set amount of CPU processing, and the overall time to complete this CPU processing will vary with different CPU performance (or even an overclocked CPU). So, on the face of it, the faster the CPU, the faster you will complete a CUDA work unit (everything else being equal).
However, typical WU completion times of systems with fast CPUs vs. slow CPUs are not massively different. This is because the typical amount of CPU usage (running GPU-GRID) is only about 0.12 for a good CPU (the CPU runs 12% of the time), and because most systems are reasonably well balanced in terms of hardware.
Even if a slow CPU (Celeron 440) ran GPU-GRID 40% of the time, there would still be plenty of unused CPU time. It wouldn’t quite be the bottleneck you might think, because the CPU is continuously doing small amounts of processing, waiting nanoseconds and then doing more small amounts... It does not have to run all the CPU calculations from start to finish before running the GPU CUDA calculations, or vice versa, so there is not a massive bottleneck with slower CPUs. The entire architecture of the CPU is not being exploited/stressed 100% of the time. My guess is that the differences (in terms of getting through a single GPU-GRID work unit on an average card) between an i7 and a Celeron 440 would be mainly down to FSB speed, cache and instruction sets rather than CPU frequency or having 4 cores, and it would not be much!
If you take an extreme example of a Celeron 440 with a GTX 295, the Video card is obviously going to fly through its calculations, and ask more of the CPU than a GeForce GT 9600 would, over any given time. Obviously not many people are going to have such an unbalanced system, so it would be difficult to compare the above to a Q9650 (same socket) and a GTX 295.
Add another GTX 295 and the Celeron 440 would probably struggle to compute for 4 GPU tasks.
A Q9650 on the other hand would do just fine.
If you had a Q9650 and just the GT 9600, the impact on the CPU by running GPU-GRID would be negligible – but again this would be an imbalanced system (just like having an i7 with 512MB RAM would)!
Moving back into the world of common sense systems, most people with quad core (or better) CPU systems that crunch GPU-GRID WUs also crunch CPU tasks, such as WCG, Climate Change... So the research I did was to work out whether it was overall more beneficial to use one less CPU core when crunching such tasks alongside GPU-GRID tasks. I actually looked at a more extreme example than GPU-GRID tasks. Aqua was running tasks that required 0.46 CPUs + 1 CUDA. As I was using a quad core, this actually meant that Aqua would use 46% of one core plus the graphics card. After comparing the credit I would get for the Aqua WU with the credit for a WCG work unit taking approximately the same time to complete, I did find that it was beneficial to manually configure Boinc to use 3 cores, and basically leave one for Aqua. I found that when doing this there was also some improvement in the throughput of the other 3 cores! So Aqua sped up noticeably (on a card with either 60 or 77GFLOPS) and the other 3 WCG tasks sped up slightly, offsetting some of the loss of the 4th core.
Given the variety of CUDA cards, CPU’s, Projects and Work Units, you would probably have to do the analysis yourself, on your system.
I would guess that if you had a low end quad CPU and a GTX 295 you would be better to use no more than 3 CPU cores for crunching other projects and leave one CPU core free to facilitate the CPU processing of the GPU-GRID WUs. At the minute you would probably need to do this manually by suspending and resuming Boinc CPU tasks. But if you had a GeForce 9600 and a Q9650, disabling a CPU core would overall reduce your contributions.
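A minimal sketch of the kind of trade-off calculation described above; the per-task credits and run times used are illustrative placeholders rather than measurements from this thread, so substitute your own figures:

# Sketch: compare daily credit with 4 CPU cores crunching vs 3 cores + 1 left free for the GPU task.
# All task figures below are hypothetical examples, not measured values.

def credits_per_day(tasks):
    # tasks: list of (credits_per_task, hours_per_task, concurrent_count)
    return sum(n * credits * 24.0 / hours for credits, hours, n in tasks)

four_cores = credits_per_day([(100, 8.0, 4),     # four CPU-project tasks
                              (3800, 11.0, 1)])  # GPU task, slowed by a busy CPU
three_cores = credits_per_day([(100, 7.5, 3),    # remaining CPU tasks often run a little faster
                               (3800, 9.0, 1)])  # GPU task with a core to itself

print(four_cores, three_cores)  # whichever is higher is the better configuration for your system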
|
|
|
MarkJ (Volunteer moderator, Volunteer tester):
I've used (so far) 3 cards with BOINC.
9800GT 512Mb, 60Gflops reported by BOINC (as you'd expect same as 1Mb card)
GTS250 512Mb, 84Gflops reported by BOINC
GTX260 896Mb (216 shaders), 96Gflops reported by BOINC
All cards are stock speeds. The only one that appears to be different to the list above is the GTX260.
____________
BOINC blog |
|
|
|
GTX260 896Mb (216 shaders), 96Gflops reported by BOINC
We established that 85 GFlops is quite correct for the GTX 260 Core 192. Scaling just the number of shaders up should result in 85*216/192 = 95.6 GFlops, which is just what you're getting. The 100 are likely obtained from a mildly factory overclocked card.
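The same scaling written out as a quick sketch, using only the figures quoted in this exchange:

# Scale the GTX 260 Core 192 rating up to 216 shaders, assuming the BOINC
# estimate is proportional to shader count at equal clock speeds.
core192_rating = 85
print(round(core192_rating * 216 / 192, 1))  # 95.6, matching the reported 96 GFLOPS at stock speeds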
MrS
____________
Scanning for our furry friends since Jan 2002 |
|
|
skgiven:
9800GT 512Mb, 60Gflops reported by BOINC (as you'd expect same as 1Mb card)
I know you meant 1GB, and that you know GPU-GRID uses between 70MB and 100MB [MrS]; so for anyone else reading this, whether you have 256MB or 1GB it should not make any difference for GPU-GRID.
Thanks for your confirmations, especially the 260 (216) :)
Someone's bound to have a Quadro, come on, own up!
____________
FAQ's
HOW TO:
- Opt out of Beta Tests
- Ask for Help |
|
|
skgiven:
GeForce 8800 GT 512MB, est. 60GFLOPS
GeForce 9800 GT 512MB, est. 60GFLOPS
GeForce 9800 GT 1GB, est. 60GFLOPS
Reasons, 8800GT and 9800GT are almost identical,
512MB or 1GB makes no difference when crunching for GPU-GRID.
|
|
|
fractal:
GeForce 8800 GT 512MB, est. 60GFLOPS
GeForce 9800 GT 512MB, est. 60GFLOPS
GeForce 9800 GT 1GB, est. 60GFLOPS
Reasons, 8800GT and 9800GT are almost identical,
512MB or 1GB makes no difference when crunching for GPU-GRID.
The only expected difference would be power consumption and the transition from 65nm to 55nm. |
|
|
|
I have got 141Gflops on my GTX 275 with the shader domain overclocked up to 1700MHz. |
|
|
|
8800GT and 9800GT both never went to 55 nm officially. It's the same G92 chip. Really the only difference is that 9800GT supports hybrid power whereas 8800GT doesn't. Oh, and 9 sells better than 8, of course. There's got to be progress, after all!
MrS
____________
Scanning for our furry friends since Jan 2002 |
|
|
skgiven:
Updated Boinc GFLOPS performance list.
Again, these are hopefully ALL Native scores only!
Your card might vary by a few GFLOPS due to different timings, manufacturers and versions of the card (they change the GPU, Memory and Shader clocks a bit).
First a Note on Compute Capable Requirements:
GPUGrid now only supports Compute Capable 1.1 and above (1.3).
Anyone with a 1.0 Compute Capable card will not be able to contribute!
Unfortunately that excludes the following cards,
GeForce 8800 GTS, 640MB, est. 41GFLOPS (compute capability 1.0)
GeForce 8800 GTX 768MB, est. 62GFLOPS (compute capability 1.0)
I would suggest that a minimum spec is 30GFLOPS.
So nothing below 30GFLOPS is listed this time.
CUDA CARD LIST WITH BOINC RATINGS IN GFLOPS
The following are mostly compute capability 1.1:
(check versions for obsolete G80 GPU versions)
GeForce 9600 GT 512MB, est. 34GFLOPS to 37GFLOPS
Geforce 9600 GSO, 768MB (DDR2) est. 46GFLOPS
Geforce 9600 GSO, 384MB (DDR3) est. 48GFLOPS
GeForce 8800 GT 512MB, est. 60GFLOPS
GeForce 9800 GT 512MB, est. 60GFLOPS
GeForce 9800 GT 1024MB, est. 60GFLOPS
GeForce 8800 GTS 512MB, est. 77GFLOPS
GeForce 9800 GTX 512MB, est. 77GFLOPS
GeForce 9800 GTX+ 512MB, est. 84GFLOPS
GeForce GTX 250 1024MB, est. 84GFLOPS
GeForce GTX 260 896MB (192sp), est. 85GFLOPS
GeForce 9800 GX2 512MB, est. 138 GFLOPS
COMPUTE CAPABILITY 1.3:
Tesla C1060 est. 93GFLOPS (131)
GeForce GTX 260 est. 96GFLOPS to 111GFLOPS (135 to 156)
GeForce GTX 275 est. 123GFLOPS (173)
GeForce GTX 285 est. 127GFLOPS (179)
GeForce GTX 280 est. 130GFLOPS (183)
GeForce GTX 295 est. 212GFLOPS (299)
(1.41% Improvement Factor, for being 1.3 Capable)
I did not include RAM with the 1.3 capable cards, it’s irrelevant for GPUGRID; all are 896MB+
I left the RAM with the 1.1 cards, to help distinguish between the many models.
If you have a Compute Capable 1.1 or above NVIDIA card Not on this list, that is either Natively clocked or Factory Overclocked please add it to this post.
If you have a self overclocked card, post it with the native and overclocked ratings.
With details this time (You can Use GPU-Z):
For Example,
NVIDIA GeForce GTS 250:
<Card Manufacturer>
GPU G92, 128 Shaders
Native Clock rates; GPU 745MHz, Memory 1000MHz, Shader 1848 MHz
84GFLOPS
http://www.techpowerup.com/gpuz/hkf67/
Thanks, |
|
|
|
In your list of 1.1 cards: "GTX 250" should read "GTS 250" and the "GTX 260 896MB (192sp)" is capability 1.3.
MrS
____________
Scanning for our furry friends since Jan 2002 |
|
|
skgiven:
In your list of 1.1 cards: "GTX 250" should read "GTS 250" and the "GTX 260 896MB (192sp)" is capability 1.3.
MrS
Thanks again,
Corrected CUDA CARD LIST WITH BOINC RATINGS IN GFLOPS
The following are mostly compute capability 1.1:
(check versions for obsolete G80 GPU versions)
GeForce 9600 GT 512MB, est. 34GFLOPS to 37GFLOPS
Geforce 9600 GSO, 768MB (DDR2) est. 46GFLOPS
Geforce 9600 GSO, 384MB (DDR3) est. 48GFLOPS
GeForce 8800 GT 512MB, est. 60GFLOPS
GeForce 9800 GT 512MB, est. 60GFLOPS
GeForce 9800 GT 1024MB, est. 60GFLOPS
GeForce 8800 GTS 512MB, est. 77GFLOPS
GeForce 9800 GTX 512MB, est. 77GFLOPS
GeForce 9800 GTX+ 512MB, est. 84GFLOPS
GeForce GTS 250 1024MB, est. 84GFLOPS
GeForce 9800 GX2 512MB, est. 138 GFLOPS
COMPUTE CAPABILITY 1.3:
GeForce GTX 260(192sp) est. 85GFLOPS (120)
Tesla C1060 est. 93GFLOPS (131)
GeForce GTX 260 est. 96GFLOPS to 111GFLOPS (135 to 156)
GeForce GTX 275 est. 123GFLOPS (173)
GeForce GTX 285 est. 127GFLOPS (179)
GeForce GTX 280 est. 130GFLOPS (183)
GeForce GTX 295 est. 212GFLOPS (299)
(1.41% Improvement Factor, for being 1.3 Capable)
____________
FAQ's
HOW TO:
- Opt out of Beta Tests
- Ask for Help |
|
|
|
Nothing else to complain about ;)
MrS
____________
Scanning for our furry friends since Jan 2002 |
|
|
MarkJ:
COMPUTE CAPABILITY 1.3:
GeForce GTX 260(192sp) est. 85GFLOPS (120)
Tesla C1060 est. 93GFLOPS (131)
GeForce GTX 260 est. 96GFLOPS to 111GFLOPS (135 to 156)
GeForce GTX 275 est. 123GFLOPS (173)
GeForce GTX 285 est. 127GFLOPS (179)
GeForce GTX 280 est. 130GFLOPS (183)
GeForce GTX 295 est. 212GFLOPS (299)
(1.41% Improvement Factor, for being 1.3 Capable)
The GTX275 is coming up as 120 Gflops (stock speed) on my one machine that has it. What is the number in brackets after the Gflops?
____________
BOINC blog |
|
|
|
...
GeForce GTX 295 est. 212GFLOPS (299)
(1.41% Improvement Factor, for being 1.3 Capable)
Actually a factor of 1.41 or 141% or +41%, which ever you prefer. It makes the BOINC-GFlop ratings comparable between GT200 and older cards.
MrS
____________
Scanning for our furry friends since Jan 2002 |
|
|
=Lupus=:
Just for the notes:
Nforce 700-series onboard graphics:
CUDA 1.1 because of the onboard GeForce 8200/8300, but with 3 GFlops and I think 16 SPs it is an extreme NO-GO for GPUGrid. But it works well on SETI.
=Lupus= |
|
|
skgiven:
I think I found the reading Boinc uses for its GFLOPS count...
From CUDA-Z:
32-bit Integer: 120753 Miop/s
From Boinc:
6/20/2009 12:55:51 CUDA device: GeForce GTX 260 (driver version 18585, CUDA version 1.3, 896MB, est. 121GFLOPS)
Bob
Boinc 6.6.36 for windows_x86_64:
CUDA device: GeForce GTX 260 (driver version 18618, compute capability 1.3, 896MB, est. 104GFLOPS)
CUDA-Z 0.5.95:
32-bit Integer 115236 Miop/s
They now seem a bit disparate, but 115 is perhaps more realistic than Boinc's estimate!
____________
FAQ's
HOW TO:
- Opt out of Beta Tests
- Ask for Help |
|
|
|
GeForce GTX 260(192sp) est. 85GFLOPS (120)
Tesla C1060 est. 93GFLOPS (131)
GeForce GTX 260 est. 96GFLOPS to 111GFLOPS (135 to 156)
GeForce GTX 275 est. 123GFLOPS (173)
GeForce GTX 285 est. 127GFLOPS (179)
GeForce GTX 280 est. 130GFLOPS (183)
GeForce GTX 295 est. 212GFLOPS (299)
Is it correct that the 280 is (slightly) faster than the 285? From the board specs, the 285 should be a bit faster... |
|
|
|
Is ACEMD bandwidth bound? It seems like it should be getting closer to peak flops. Or does GPUGrid use double precision? If so, these numbers are impressive. |
|
|
Home PC:
Not sure about the Tesla C1060 numbers above, AFAIK all C1060s have 4096MB of GDDR3.
My C1060s report (standard from the factory), no OC as :
CUDA device: Tesla C1060 (driver version 19038, compute capability 1.3, 4096MB, est. 111GFLOPS)
"compute capability 1.3" is the core architecture version. |
|
|
|
GeForce GTX 250 1024MB, est. 84GFLOPS
It's a GTS-250 not a GTX-250.
But my numbers match up -->
MSI GTS-250, 185.18.36 driver, Linux 2.6.28-15 64b @ default clocks(760/1150)
Wed 30 Sep 2009 11:43:46 AM CDT NVIDIA GPU 0: GeForce GTS 250 (driver version 0, CUDA version 2020, compute capability 1.1, 511MB, est. 84GFLOPS)
XFX sp216 GTX-260, 190.36 driver, Linux 2.6.28-15 64b @ default clocks(576/999)
Thu 01 Oct 2009 07:55:04 PM CDT NVIDIA GPU 0: GeForce GTX 260 (driver version 0, CUDA version 2030, compute capability 1.3, 895MB, est. 96GFLOPS)
Same card with GPU @ 650 (linked shaders) mem @ 1100
Thu 01 Oct 2009 08:08:46 PM CDT NVIDIA GPU 0: GeForce GTX 260 (driver version 0, CUDA version 2030, compute capability 1.3, 895MB, est. 108GFLOPS)
____________
- da shu @ HeliOS,
"A child's exposure to technology should never be predicated on an ability to afford it." |
|
|
skgiven:
Sorry, but after about 2 hours we can't edit the typos on this board (so, as was pointed out before, and corrected, it is a GTS 250, CUDA 1.1 capable, and 84GFlops – says so on my tin)!
The list is just a rough guide.
Different cards will be clocked slightly differently, so you should expect the odd anomaly, such as the listed GTX 280 being faster than a GTX 285. Factory clock settings are not all identical, and later edition cards typically find some performance advantage. There are plenty of factory overclocked cards out there, and RAM is an easy target.
My GTX260 is rated as 104GFlops, right in the middle of the range I listed – it has plenty of headroom for overclocking; http://www.techpowerup.com/gpuz/w6pes/
If you have an overclocked card you can list it, and the details might help others select a card or clock it.
The Tesla C1060 might cost a lot, but it is not as useful for this project as you might think; the project uses less than 256MB RAM, so the 4GB is not really beneficial here.
____________
FAQ's
HOW TO:
- Opt out of Beta Tests
- Ask for Help |
|
|
skgiven:
Overclocking my Palit GTX 260 (216)
I used Riva Tuner to up the speeds of my GTX 260 from the modest stock settings of GPU 625, Mem 1100, Shaders 1348. Boinc rates that at 104 GFLOPS!
Seeing plenty of headroom, given the low temperatures, I first popped it up to GPU 666, Mem 1201, Shaders 1436. This gave me a Boinc GFLOPS rating of 111, about as good as it gets for a factory overclocked GTX 260, at an extra £20 or so.
I left the fan control on Auto, and the system seemed happy to let the fan speed sit at 40%, about 3878 rpm. The temp rose from 60 degrees C to 68 degrees and then stabilized when running Milkyway@home.
I was happy enough with that, so I upped it again to GPU 702, Mem 1230, Shaders 1514. Just for reference, that's about a 12% increase all round and gave me 117 GFLOPS. With no change in temps I upped it again,
GPU 740, Mem 1269, Shaders 1596.
That is an 18% increase, and a Boinc rating of 123GFLOPS, which equates to the same performance as a GTX 275!
Presently the GPU seems stable and is capable of running GPUGRID (65 degrees) and Milkyway@home (68 degrees); good enough for me. The temps and fan speed did not rise any further, but when I upped it again (by about 5 percent) the system became a bit unstable, so I backed off to the above settings.
I noticed that the amount of CPU that Boinc reports GPUGRID needs rose from 19% to 23%. This is on a Phenom II 940 overclocked from 3GHz to 3.3GHz.
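The ratings above track the shader clock almost exactly; a small sketch, assuming the Boinc estimate simply scales with shader frequency, reproduces them:

# Stock settings for this GTX 260 (216sp): shaders at 1348 MHz, rated 104 GFLOPS.
stock_gflops, stock_shader_mhz = 104, 1348
for shader_mhz in (1436, 1514, 1596):
    print(shader_mhz, round(stock_gflops * shader_mhz / stock_shader_mhz))
# Prints roughly 111, 117 and 123 GFLOPS - the same figures reported above.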
____________
FAQ's
HOW TO:
- Opt out of Beta Tests
- Ask for Help |
|
|
skgiven:
The GPU 740, Mem 1269, Shaders 1596 settings were fine for Windows but not for GPUGRID. I backed off to 117 Gflops, and that seems stable. |
|
|
|
@SKGiven: this is such a great piece of work you've put up here. It should be a sticky thread because it's so informative. I confirmed that you had good data by comparing your numbers to my 9800 GTX+ and 9800 GT...dead-nuts-on.
So I did some research on NVIDIA's site, EVGA's site, eBay, and e-comm vendors. My conclusion: if you have double width room in your rig, the GTX 275 is the bang-for-the-buck champ, by virtue of being virtually the same performance as the 280 or 285, but can be had for $200 (EVGA GTX 275 after rebate from Microcenter.com), plus upgrading the PSU if necessary ($59 after rebate for a Corsair 550W from newegg.com). $260 bucks for ~170 Gflops......wow! (To do the same with the GTX 295 would be over $600... for +67% performance, you'd pay +130%.)
Forget putting any money or KWh running extra CPUs of any sort....the game is to find places to stuff these boards into PCs of family and friends. "Here, let me upgrade your graphics card...." |
|
|
|
@SKGiven, great work. I've had some people asking me about what they want to buy, this may help them make a decision.
Here's a few to confirm your readings, native settings
I have 4 computers with one 8800GT each. Two at the moment have an older client and don't report this info.
GeForce 8800 GT (driver version 19038, compute capability 1.1, 512MB, est. 60GFLOPS)
GeForce 8800 GT (driver version 19038, compute capability 1.1, 512MB, est. 60GFLOPS)
I have an additional one that is factory overclocked. It runs just as cool as the others; XFX did a good job on building these. It has a higher reading because of the overclocking. Throwing this in just for comparison.
GeForce 8800 GT (driver version 19038, compute capability 1.1, 512MB, est. 65GFLOPS)
One thing about the XFX 8800GT is they are single slot width boards. I chose them for two reasons: I needed single width slot boards, and they come with a double lifetime warranty. They also only take one six pin power connector.
One thing to note about upgrading: be sure your power supply has enough power and you have the required power connector(s) for these cards. Also check whether you have room for a double wide card or not. Most manufacturers will list these requirements on their website under the product description or technical specs. |
|
|
skgiven:
Thanks for your +ve contributions and the rare compliments. It was worth the effort because this thread has already had over 4400 hits. I am sure a few people went away slightly wiser and have kept or increased their contribution to GPUGRID because of what is here. Keep chipping in with the sound advice.
Good choices of cards and advice from Cheech Wizard and Krunchin-Keith. Size matters as do the Amps!
When the next NVIDIA line up hits the shops I hope people will report their findings too, and perhaps soon there will be ATI cards to add as well! In the meantime perhaps a few people could list their rare or OC cards & specs (factory or personal efforts)?
Thanks, |
|
|
|
@SKG: just put in the GTX 275 (stock, 892 MB, not overclocked). Your number is 127 reported (x1.41 for actual GFlops.) BOINC reported 125 to me...so it's another good data point I can vouch for, in addition to the 9800 GTX+ and the 9800 GT.
Yay! Another 170+ GFlops for GPUGrid. Now I have to find a home for the 9800 GTX+. Have a couple of relatives who are already letting me run BOINC on their desktops. Just need to pick up another Corsair 550 and let one of them let me do a little quick surgery. My 3-card total will be 310+ GFlops.
Can't wait to see what kind of numbers the 300 series will turn in...and what it will do to the prices of the GTX 295.
Keep it going, SKG! It's all 'cause of this thread, man! |
|
|
|
And to further underscore the validity of some of this data:
Now that my new GTX 275 has crunched and reported its first GPUGrid work unit, I went into my account history to compare. I was able to find 3 work units of identical size (3977 credits claimed/5369 awarded) that had been done by my 3 different boards. Took run time (sec)/3600 = hours each. These numbers, by the way, are in line with what I've observed.
To wit:
GTX 275: 7.25 hours (173 Gflops, per this thread)
9800 GTX+ 14.5 hours (84 Gflops)
9800 GT 20.38 hours (60 Gflops)
Any way that you normalize these 3, you'll find that the results are quite linear. Bottom line: double your Gflop rating, double your work. Sounds obvious, I know, but I thought I'd throw some empirical data out here to further validate SKGiven's table. I was not sure that the 1.41 factor for compute capability 1.3 vs 1.1 would hold true, but it seems to. |
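A minimal sketch of that normalisation, using only the run times and (1.41-adjusted) ratings quoted above; the product of effective GFLOPS and hours comes out nearly constant for these identically sized work units:

# Effective GFLOPS (x1.41 applied for the CC 1.3 card) and run time in hours.
cards = {"GTX 275": (173, 7.25), "9800 GTX+": (84, 14.5), "9800 GT": (60, 20.38)}
for name, (gflops, hours) in cards.items():
    print(name, round(gflops * hours))
# ~1254, ~1218 and ~1223 GFLOPS-hours - within a few percent, i.e. close to linear scaling.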
|
|
skgiven:
Thanks Cheech Wizard,
I think the GTX 275 cards are the best value for money, at the minute. So, good purchase.
The 1.41 improvement factor was all Extra Terrestrial Apes' work. I added it to the list as it allows people to compare Compute Capable 1.1 and CC 1.3 cards, and see how they match up where it matters, crunching here on GPUGRID. I doubt that anyone reading this thread would want to buy a high spec CC1.1 card now, and people know to avoid the old CC1.0 cards.
Your data is good stuff. Confirmations like that make it clear how much better the 1.3 cards are.
I would guess that prices might drop a bit just before Christmas or for the sales. You might see some 300s on sale before then, but who knows, it could be next year. I wonder if there is a CC1.4 or CC1.5 on the horizon?
I checked the Boinc rating of an ION recently. It is only 6GFlops! Your card is about 30 times as fast. Mind you, it is better than an 8600M at 5GFlops and good enough for HD even on an Atom 330 system. But it's just not a cruncher.
The Corsair 550s are good kit too, well worth the money.
|
|
|
skgiven:
This is nicked from another GPUGrid thread, but it is relevant here too.
tomba reported GT 220 specs:
GPU 1336MHz
GPU RAM 789MHz
GPU Graphics 618MHz
With 48 Shaders, 1024MB, est. 23GFLOPS
Wait for it, Compute Capable 1.2.
I did not even think that existed!
First there was 1.0, then there was 1.1, then 1.3 and now there is 1.2.
What strange number system you have, Mr. Wolf - Sort of like your GPU names.
Aaahhh bite me.
Funnies aside, it appears to respond to present work units similarly to CC 1.3 cards, getting through work units relatively faster than CC 1.1 cards. In this case I guess it is about 23 x 1.3, so it gets through about 30% extra.
So it has an effective Boinc GFlop value of about 30GFlops – not bad for a low end card, and it just about scrapes in as a card worth adding to GPUGrid, if the system is on quite frequently and you don’t get lumbered with a 120h task!
The letter box was open, but this one was snuck under the door all the same! |
|
|
MarkJ:
I thought I'd throw the following in now that BOINC has standardized the formula between the different brands of cards. You'll notice that they are now shown as "GFLOPS peak". The startup info is from BOINC 6.10.17 and none of the cards listed are overclocked.
Sulu:
31/10/2009 9:38:55 PM NVIDIA GPU 0: GeForce GTX 295 (driver version 19062, CUDA version 2030, compute capability 1.3, 896MB, 596 GFLOPS peak)
31/10/2009 9:38:55 PM NVIDIA GPU 1: GeForce GTX 295 (driver version 19062, CUDA version 2030, compute capability 1.3, 896MB, 596 GFLOPS peak)
Chekov:
31/10/2009 9:30:14 PM ATI GPU 0: ATI Radeon HD 4700/4800 (RV740/RV770) (CAL version 1.3.145, 1024MB, 1000 GFLOPS peak)
Maul:
31/10/2009 2:53:28 AM NVIDIA GPU 0: GeForce GTX 260 (driver version 19062, CUDA version 2030, compute capability 1.3, 896MB, 537 GFLOPS peak)
31/10/2009 2:53:28 AM NVIDIA GPU 1: GeForce GTX 260 (driver version 19062, CUDA version 2030, compute capability 1.3, 896MB, 537 GFLOPS peak)
Spock:
31/10/2009 9:58:51 PM NVIDIA GPU 0: GeForce GTX 275 (driver version 19062, CUDA version 2030, compute capability 1.3, 896MB, 674 GFLOPS peak)
____________
BOINC blog |
|
|
|
It's interesting that those figures don't correspond with the ones in koschi's FAQ. I had hoped that the new BOINC detection mechanism would eliminate the confusion between what I've called 'BOINC GFlops' and 'marketing GFlops', but it seems we now have a third unit of measurement.
Anyone got a formula to consolidate them? |
|
|
|
@Richard: well, I can report one data point for a possible conversion factor: the same GTX 275 (referenced above), reported as 125 GFlops by my BOINC 6.6.36 client, which would be 176GFlops using the 1.41x factor for CC 1.3 (per above in this thread), is reported by 6.10.17 as 700 GFlops peak.
So does the 6.6.x number x 5.6 = the 6.10.17 number? Does the 6.10.17 number / 3.98 = the old number adjusted for CC 1.3 (in other words, does the new rating system discern between CC 1.1 and CC 1.3 and adjust its rating accordingly)? It will take others reporting old numbers vs 6.10.x numbers to establish consistency/linearity (or lack thereof).
Jeez... just when it looked like we had this straight! |
|
|
|
@MarkJ: just to clarify, and correct me if I'm wrong, in your sample data above, user Sulu reports 596 GFlops peak per core for his GTX 295 (1192 total for one card), whereas user Maul's system has a pair of GTX 260s, and is reporting 537 GFlops each. |
|
|
skgiven:
MarkJ , thanks for the update info.
I updated two of my Boinc clients from 6.10.6 to 6.10.17.
I noted that my 64-bit Vista Ultimate version was not detected on the web site and it tried to give me the x86 client! On the other hand it spotted my Windows 7 64-bit system and allocated the correct x64 client.
My GTS 250 used to be reported as 84GFlops; now it is reported as 473GFlops.
My GTX 260 used to be reported as 104GFlops; now it is reported as 582GFlops.
Obviously the 1.41 factor for Compute Capable 1.3 cards has not been fully reflected here, though it has been to a smaller extent, and the GFlops are still Boinc GFlops, as they do not match the industry standard.
Koschi has a GTS 250 at 705 and a GTX 260 at 804
|
|
|
skgiven:
I think the new rating system is to allow for the comparison of NVidia and ATI cards.
Several new ATI cards have been released recently along with entry NVidia cards such as the GT220, with its CC1.2 GPU.
So the new system is to prevent the picture becoming cloudier, especially when the G300 range hits the shelves.
With the new Boinc GFlops rating system, it would seem that a CC 1.2 card will have a factor of about 1.2 and a CC 1.3 card a factor of about 1.3. If so, that would make good sense, but it does need to be verified. That is an open invitation. |
|
|
|
here is my data
11/3/2009 11:33:39 Starting BOINC client version 6.10.17 for windows_intelx86
11/3/2009 11:33:39 log flags: file_xfer, sched_ops, task
11/3/2009 11:33:39 Libraries: libcurl/7.19.4 OpenSSL/0.9.8k zlib/1.2.3
11/3/2009 11:33:39 Data directory: C:\Documents and Settings\All Users\Application Data\BOINC
11/3/2009 11:33:39 Running under account User
11/3/2009 11:33:39 Processor: 2 GenuineIntel Intel(R) Core(TM)2 Duo CPU E6750 @ 2.66GHz [x86 Family 6 Model 15 Stepping 11]
11/3/2009 11:33:39 Processor: 4.00 MB cache
11/3/2009 11:33:39 Processor features: fpu tsc sse sse2 mmx
11/3/2009 11:33:39 OS: Microsoft Windows XP: Professional x86 Edition, Service Pack 3, (05.01.2600.00)
11/3/2009 11:33:39 Memory: 2.00 GB physical, 4.85 GB virtual
11/3/2009 11:33:39 Disk: 232.88 GB total, 196.46 GB free
11/3/2009 11:33:39 Local time is UTC -6 hours
11/3/2009 11:33:40 NVIDIA GPU 0: GeForce GTX 260 (driver version 19107, CUDA version 2030, compute capability 1.3, 896MB, 510 GFLOPS peak)
____________
|
|
|
MarkJ:
@MarkJ: just to clarify, and correct me if I'm wrong, in your sample data above, user Sulu reports 596 GFlops peak per core for his GTX 295 (1192 total for one card), whereas user Maul's system has a pair of GTX 260s, and is reporting 537 GFlops each.
Yep thats correct.
The GTX 260s are almost a year old now, so probably not the most recent design. The GTX 295s are the single PCB version, so may be different to the dual PCB version.
____________
BOINC blog |
|
|
skgiven:
My ATI HD 4850 is now rated. Boinc says this 800 shader device with a core @ 625MHz and 512MB DDR3 offers up 1000 GFlops! That's within 20% of a GTX295. Excellent value for money, if it can be hooked up here. It fairly zips through the Folding@home tasks, but as for GPUGRID, the proof will be in the pudding.
I guess the HD 5970 (when released) will weigh in at around 5000GFlops! With its two 40nm cores @ 725Mhz, 2x1600shaders and 2GB DDR5 @4GHz. |
|
|
skgiven:
Well, so much for the comparison list of cards and performance.
It now looks like the only cards capable of consistently completing tasks are the G200 GPU based cards. Given the increase in task length, this really narrows the range to 5 expensive top end cards:
GTX260 216sp, GTX275, GTX280, GTX285 and the GTX295
Task failure rates for the G92 cards are now so high that for many it is not worth bothering. I have retired one card from the project, an 8800 GTS 512MB. That only leaves my GTX260 and GTS250. The GTX260 is running very well indeed, thank you, but as for the GTS250? It lost a total of 45h run time last week due to task failures. So 25% of the time it was running was wasted! That is a top of the range G92 card, and the GPU sits at 65 degrees C and is backed by a stable Q9400 @ 3.5GHz.
Perhaps some people with lower end G200 cards (GeForce 210/220) might still white-knuckle it for several days to get through the odd task, and the mid range GeForce GT 240 and GeForce GTS 240 cards can still contribute, but they are a bit tame!
If the G300 series does not turn up, it's ATI or bye-bye project time!
|
|
|
tomba:
This is nicked from another GPUGrid thread, but it is relevant here too.
tomba reported GT 220 specs:
GPU 1336MHz
GPU RAM 789MHz
GPU Graphics 618MHz
With 48 Shaders, 1024MB, est. 23GFLOPS
This is tomba. I've been running my GT 220 24/7 for three months.
BOINC sees:
05/12/2009 11:19:05 NVIDIA GPU 0: GeForce GT 220 (driver version 19107, CUDA version 2030, compute capability 1.2, 1024MB, 128 GFLOPS peak)
|
|
|
skgiven:
Thanks for posting these details.
The discrepancy between the two figures is due to the Boinc version. The older Boinc versions used a different system to calculate the GPU performance.
So the old value of 28GFlops equates to the new value of 127GFlops for the GeForce GT 220 cards. |
|
|
|
Thanks for posting these details.
The discrepancy between the two figures is due to the Boinc version. The older Boinc versions used a different system to calculate the GPU performance.
So the old value of 28GFlops equates to the new value of 127GFlops for the GeForce GT 220 cards.
The old, lower GFlops figure is always shown as "est. nn GFLOPS"
The new, higher figure will always be shown as "nnn GFLOPS peak"
Other things (clock rate etc.) being equal, the 'peak' figure will always be 5.6 times the 'est.' figure. (source: [19310], coproc.h) |
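For anyone trying to reconcile the two units, here is a sketch that reproduces the figures reported in this thread. The shaders x shader clock x 2 formula is an inference from those reported numbers rather than something taken from the BOINC source, the 5.6 divisor is the one quoted above from coproc.h, and the 1242 MHz shader clock assumed for the GT200 cards is the stock value (the GTS 250 clock comes from earlier in the thread):

# Sketch: relate "GFLOPS peak" (BOINC 6.10.x) to the older "est. GFLOPS" figure.
def peak_gflops(shaders, shader_clock_mhz):
    return shaders * shader_clock_mhz / 1000.0 * 2   # 2 ops per shader per clock (inferred)

def est_gflops(peak):
    return peak / 5.6                                # factor quoted from coproc.h above

for name, shaders, mhz in [("GTS 250", 128, 1848),          # shader clock reported earlier in the thread
                           ("GTX 260 216sp", 216, 1242),    # assumed stock shader clock
                           ("GTX 295, per GPU", 240, 1242)]:
    p = peak_gflops(shaders, mhz)
    print(name, round(p), round(est_gflops(p)))
# ~473/84, ~537/96 and ~596/106 - matching the peak and est. values reported in this thread.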
|
|
|
Below is an updated CUDA Performance Table for cards on GPU-GRID, with reported Boinc GPU ratings, and amended ratings for G200 cores (in brackets) -only for compute capable 1.3 cards. (MrS calculated that G200 core CUDA cards operate at 141% efficiency compared to the reported Boinc GFLOPS).
This is a guide to Natively clocked card performance on GPU-GRID only (not for other projects)!
[snip]
I would speculate that given the 41% advantage in using compute capable 1.3 (G200) cards, GPU-GRID would be likely to continue to support these cards’ advantageous instruction sets.
For those that have compute capable 1.0/1.1 cards and 1.3 cards and participate in other GPU projects, it would make sense to allocate your 1.3 cards to GPU-GRID.
But what other GPU projects are currently capable of using 1.1 cards? My search for them has not found any I'm interested in that will use the G105M
card on my laptop, and I've tried just about all of those suggested except SETI@home. Due to driver availability problems, it's currently limited to CUDA 2.2; the Nvidia site says the 190.* series is NOT suitable for this card. |
|
|
skgiven:
You could try Einstein, but don't expect too much from that project. In a way your card would be suited to it; it won't stress the GPU too much! |
|
|
|
I recently found that the 195.62 driver solves the problem with CUDA level for that card, for MOST laptops including that one. It now has both Collatz and Einstein workunits in its queue. For Einstein, helping them develop their CUDA software currently looks more likely than speeding up the workunits very soon. |
|
|
skgiven:
The following GeForce cards are the present mainstream choice,
GT 220 (GT216, 40nm, Compute Capable 1.2): 128 Boinc GFlops peak
GT 240 (GT215, 40nm, Compute Capable 1.2): 257 Boinc GFlops peak
GTX 260 Core 216 (GT200b, 55nm, Compute Capable 1.3): 582 Boinc GFlops peak
GTX 275 (GT200b, 55nm, Compute Capable 1.3): 674 Boinc GFlops peak
GTX 285 (GT200b, 55nm, Compute Capable 1.3): 695 Boinc GFlops peak
GTX 295 (GT200b, 55nm, Compute Capable 1.3): 1192 Boinc GFlops peak
You should note the EXACT GPU core before buying one:
GT200b is not a GT200
55nm is not 65nm
If it is not EXACTLY as above, don’t get it!
The GT220 is slow but it will get through the tasks in time.
|
|
|
skgiven:
Details of a GT240 GV-N240D3-1GI made by GIGABYTE:
PCIE2.0 1GB DDR3/128BIT Dual Link DVI-I/D-Sub/HDMI
Compute Capable 1.2, 280 Boinc GFlops
Full Specs:
GPU: GT215 Revision: A2 Technology: 40 nm Die Size: 727 mm² BIOS Version: 70.15.1E.00.00 Device ID: 10DE - 0CA3 Bus Interface: PCI-E x16 @ x16 Subvendor: Gigabyte (1458) ROPs: 8 Shaders: 96 (DX 10.1) Pixel Fillrate: 4.8 GPixel/s Texture Fillrate: 19.2 GTexel/s Memory Type: DDR3 Bus Width: 128 bit Memory Size: 1024 MB Bandwidth: 25.6 GB/s Driver: nvlddmkm 8.17.11.9562 (ForceWare 195.62) / 2008 R2 GPU Clock: 600 MHz 800 MHz 1460 MHz Default Clock: 600 MHz 800 MHz 1460 MHz
Comments,
WRT crunching for GPUGrid, this offers up just under half the GFlops power of a GTX260 216sp card.
In terms of power consumption it is much more efficient and does not require any special connectors. It should therefore appeal to people that don’t want to buy an expensive PSU at the same time as forking out for a new GPU, or a completely new computer. This particular card is also short and should fit many more computers as a result.
The one I am testing benefits from a large fan blowing directly onto it, from the front bottom of the case, and having the 3 blanking plates beneath it at the rear removed. The result: it is running GPUGrid at an amazingly cool 37 degrees C!
- GPU Load @ 69% and Memory Controller @ 34% running an m3-IBUCH_min_TRYP task.
So with the low power requirements and its 40nm core it runs cool and would no doubt be very quiet. As for power consumption, my system's total draw when crunching on 4 CPU cores @ 100% moved from 135W to 161W when I started running GPUGrid on top of that. So it only added another 26W.
The card also drops its power usage considerably when not in use. I guess it uses about 10W when idle.
As for real world crunching performance (completed vs crashed tasks), only time will tell, but it seems to have a lot going for it despite lacking the performance capabilities of top GTX GT200b cards. |
|
|
CTAPbIi:
eVGA GTX275 OC'ed up to 702/1584/1260 gives me 760 GFlops
____________
|
|
|
skgiven:
The GT240 took about 14h to complete its first task. There were no problems. Unfortunately there was no bonus either, due to the nature of the work unit. At that rate you would get about 6500points per day, but if you got early finish bonuses (the next task should) you would probably get about 7500 or 8000points per day with this card.
1641191 55914 17 Dec 2009 19:47:06 UTC 18 Dec 2009 13:38:51 UTC Completed and validated 51,800.35 12,269.58 2,831.97 3,823.15 Full-atom molecular dynamics v6.71 (cuda23) |
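The daily figure is just the granted credit scaled up to 24 hours of run time; a quick sketch using the result line above:

# Granted credit and run time (seconds) from the GT 240 result quoted above.
granted_credit, run_time_s = 3823.15, 51800.35
print(round(granted_credit * 86400 / run_time_s))  # ~6377 points/day, roughly the 6500/day estimated above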
|
|
|
The GT240 took about 14h to complete its first task. There were no problems. Unfortunately there was no bonus either, due to the nature of the work unit. At that rate you would get about 6500points per day, but if you got early finish bonuses (the next task should) you would probably get about 7500 or 8000points per day with this card.
1641191 55914 17 Dec 2009 19:47:06 UTC 18 Dec 2009 13:38:51 UTC Completed and validated 51,800.35 12,269.58 2,831.97 3,823.15 Full-atom molecular dynamics v6.71 (cuda23)
Hi SKGiven
You did get a bonus: claimed was, as above, 2831 and granted was 3823.
I would like to know if you can use this card to crunch while you are using your computer and in particular whether you can watch video at the same time or does it make video stutter.
Thanks for the info.
____________
Radio Caroline, the world's most famous offshore pirate radio station.
Great music since April 1964. Support Radio Caroline Team -
Radio Caroline |
|
|
skgiven:
You are correct, I did not read it properly! After a quick look at the sent and due dates I jumped to the wrong conclusion; a few days ago a task was sent to me on the same day it was due for return, so no bonus was granted, and this one looked like it was doing the same thing.
The GT240 card completed another task, one of the longer GIANNI_VIL tasks. It took 23h. So the card should bring home about 6649 points a day, as long as it keeps working perfectly!
People should note that this card has a 600MHz core and 96 shaders. Most other cards have a 550MHz core, and some cards also purport to have 112 shaders. I have not yet come across a card with both a 600MHz core and 112 shaders. The RAM frequencies also vary by about 20% over the range of cards.
If people can report their card's performance and details (points per day, GFlops, shader count and frequencies) then an exact performance table could be produced. I only have one of these cards at the minute so I can't do this.
WRT watching videos and crunching:
After noticing that playing video coincided with one GPUGrid work unit failure last week, I decided to disable GPU use while using the computer with my GTS250 (G92b core), to see if that improved things. I think it has; no failures since, but I expect some other measures the techs introduced have helped things too. G92 core cards are subject to a CUDA bug. So, for anyone who has one, I would suggest that as a general rule you do not use the GPU when watching videos, or even when using the system; it would be even more annoying than usual to find that an unwanted website commercial caused a task to fail after 9h.
The GT240 uses a GT215 core which does not presently suffer from such problems. For testing purposes I enabled GPU crunching while the computer was in use and then watched part of a 1080p movie. Using GPU-Z, I could see that the GPU RAM usage went up by 8MB and that the GPU usage did fluctuate a little; however, the film's quality was perfectly normal. There were no graphics or sound glitches. The test system had a 2.2GHz Phenom/Opteron core with 4GB RAM at its disposal and was using all 4 cores to crunch on Boinc at the same time. So I was very impressed with the card's performance, as well as Boinc's: being able to crunch CPU and GPU tasks simultaneously while the system played a 1080p movie.
When watching video with my old 8800GTS 512 (G92 core), and trying to run GPUGrid at the same time, this was not the case!
With low quality video there was no discernible difference.
With medium quality video there was the occasional jump, but hardly noticeable.
With TV quality, video interruptions were a bit too frequent for my liking.
For HD / 1080p and better, I just could not watch the video and run GPUGrid at the same time. |
|
|
|
Well, I've got the first 240 going on a system with Win 7 and a Core Duo 4300 with only 2GB of RAM.
iPlayer stutters with this card running GPUGrid, so I had to set it to run only when the computer has been idle for 2hrs, so I'm concerned I won't be getting my bonus :(
If GPUGrid would give me the option to only have one WU at a time I might get my bonus. Grrrr
____________
Radio Caroline, the world's most famous offshore pirate radio station.
Great music since April 1964. Support Radio Caroline Team -
Radio Caroline |
|
|
|
Well, I've got the first 240 going on a system with Win 7 and a Core Duo 4300 with only 2GB of RAM.
iPlayer stutters with this card running GPUGrid, so I had to set it to run only when the computer has been idle for 2hrs, so I'm concerned I won't be getting my bonus :(
If GPUGrid would give me the option to only have one WU at a time I might get my bonus. Grrrr
I've found that if you set the sum of Connect Every and Additional Work Buffer low enough, you can get the bonus. 0.3 days is low enough to get the bonus with my 9800 GT. That usually gives me only one workunit except when the first one is at least half done; then two. Running nearly 24 hours a day.
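As a rough illustration of why a small buffer helps (this is just back-of-the-envelope arithmetic, not project code, and the bonus cutoff used below is an assumed example value rather than an official GPUGrid figure): a new task has to wait behind whatever work is already buffered before it starts, so the smaller the buffer, the sooner the task is returned.

def estimated_return_days(queued_work_days, task_runtime_days):
    # A new task waits behind the buffered work already queued, then runs to completion.
    return queued_work_days + task_runtime_days

buffer_days = 0.1 + 0.2            # Connect Every + Additional Work Buffer, roughly the 0.3 days above
task_runtime_days = 14.0 / 24.0    # e.g. a card taking about 14h per task
bonus_cutoff_days = 2.0            # assumed cutoff, for illustration only

ret = estimated_return_days(buffer_days, task_runtime_days)
print(f"Estimated return in {ret:.2f} days -> bonus: {ret < bonus_cutoff_days}")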
|
|
|
|
The setting for additional work was 0.25; I lowered it to 0.1 and left connect blank.
____________
Radio Caroline, the world's most famous offshore pirate radio station.
Great music since April 1964. Support Radio Caroline Team -
Radio Caroline |
|
|
skgivenVolunteer moderator Volunteer tester
Send message
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level
Scientific publications
|
Betting Slip, I am a bit surprised to hear that your media player stutters.
I tested my card and there was no stuttering when playing a 1080p movie, but my systems design is different (AMD quad, 4GB).
Perhaps the problem is iPlayer, the system, an update, a monitor driver, or a scan that was running?
Do you run CPU tasks as well?
I did note in the past that occasionally some tasks do interfere with normal system use, but only at certain times; when a task is completing or uploading it uses more resources than normal (Hard Disk, RAM and Internet).
I like the low power design. Your GPU is probably doing over ten times the work of an i7.
What are the specs of your card (GPU-Z and Boinc)?
PS. You could try setting additional work to 0.00 (might work, but keep an eye on it - you might not get any new tasks)! |
|
|
|
Unfortunately the computer is remote and I haven't got ready access to it. I have lowered additional work to 0.05.
I will be installing 2 more of these cards on local computers, one with 1GB DDR3 and one with 512MB DDR5, so I will be able to give you more up-to-date info on these particular cards.
I am also going to add another 2GB of RAM to the remote machine, because with Win 7 x64 it could be short of memory, hence the slight stutter.
____________
Radio Caroline, the world's most famous offshore pirate radio station.
Great music since April 1964. Support Radio Caroline Team -
Radio Caroline |
|
|
|
24/12/2009 11:44:04 Starting BOINC client version 6.10.13 for windows_x86_64
24/12/2009 11:44:05 Processor: 4 GenuineIntel Intel(R) Core(TM)2 Quad CPU Q9300 @ 2.50GHz [Intel64 Family 6 Model 23 Stepping 7]
24/12/2009 11:44:05 Processor: 3.00 MB cache
24/12/2009 11:44:05 Processor features: fpu tsc pae nx sse sse2 pni
24/12/2009 11:44:05 OS: Microsoft Windows 7: Enterprise x64 Edition, (06.01.7600.00)
24/12/2009 11:44:05 Memory: 4.00 GB physical, 8.00 GB virtual
24/12/2009 11:44:05 Disk: 368.10 GB total, 325.36 GB free
24/12/2009 11:44:05 Local time is UTC +0 hours
24/12/2009 11:44:06 NVIDIA GPU 0: GeForce GT 240 (driver version 19107, CUDA version 2030, compute capability 1.2, 512MB, est. 46GFLOPS)
This is what BOINC says for GT 240 512MB DDR5
____________
Radio Caroline, the world's most famous offshore pirate radio station.
Great music since April 1964. Support Radio Caroline Team -
Radio Caroline |
|
|
dsSend message
Joined: 27 Nov 09 Posts: 1 Credit: 116,640 RAC: 0 Level
Scientific publications
|
from log file:
NVIDIA GPU 0: GeForce 9800 GTX+ (driver version unknown, CUDA version 2030, compute capability 1.1, 512MB, 470 GFLOPS peak)
I ran the "autodetect" in The Nvidia X-server settings/configuration tool and it upped the GPU speed a little, to 800 MHz, from 756MHz.
(driver is 190.53)
|
|
|
skgivenVolunteer moderator Volunteer tester
Send message
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level
Scientific publications
|
Thanks for posting your info, |
|
|
|
How practical would it be to persuade the BOINC developers to add code that also reports the GPU chip type during BOINC startup? |
|
|
skgivenVolunteer moderator Volunteer tester
Send message
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level
Scientific publications
|
It would be a useful tool.
If it can be done by GPU-Z there is no reason why something similar could not be included in Boinc. Perhaps they could speak to the GPU-Z developers and include it as an optional add-on during installation? You would have to contact the Boinc developers at Berkeley to make any such suggestion.
http://boinc.berkeley.edu/dev/
|
|
|
|
1/15/2010 2:07:52 PM Starting BOINC client version 6.10.18 for windows_x86_64
1/15/2010 2:07:52 PM Processor: 2 GenuineIntel Intel(R) Core(TM)2 Duo CPU E7500 @ 2.93GHz [Intel64 Family 6 Model 23 Stepping 10]
1/15/2010 2:07:52 PM Processor: 3.00 MB cache
1/15/2010 2:07:52 PM Processor features: fpu tsc pae nx sse sse2 pni
1/15/2010 2:07:52 PM OS: Microsoft Windows 7: Ultimate x64 Edition, (06.01.7600.00)
1/15/2010 2:07:52 PM Memory: 4.00 GB physical, 8.00 GB virtual
1/15/2010 2:07:52 PM NVIDIA GPU 0: GeForce GTX 260 (driver version 19562, CUDA version 3000, compute capability 1.3, 896MB, 605 GFLOPS peak)
This is what Boinc says for my card. It is a Zotac GTX 260 (216 shaders), factory OC'ed to 650MHz core / 1050MHz memory / 1400MHz shader.
Also, I'm new here; it's been about a week. Can you explain why Boinc reports more than the values on this page's list? For example, the list shows about 100 but Boinc says 605. |
|
|
skgivenVolunteer moderator Volunteer tester
Send message
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level
Scientific publications
|
The more recent versions of Boinc have an updated GPU rating system.
You would probably need to read this whole thread to appreciate that, and it is getting rather long! |
|
|
skgivenVolunteer moderator Volunteer tester
Send message
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level
Scientific publications
|
A set of Notebook GPU Comparison charts
http://www.notebookcheck.net/Mobile-Graphics-Cards-Benchmark-List.844.0.html
|
|
|
MC707 Send message
Joined: 6 Jun 09 Posts: 1 Credit: 0 RAC: 0 Level
Scientific publications
|
NVIDIA GPU 0: GeForce 9400 GT (driver version unknown, CUDA version 2020, compute capability 1.1, 1023MB, 29 GFLOPS peak)
Well, that's my crappy 9400 GT. I post it since I don't see it in the first post! My 2 cents
Regards
____________
|
|
|
liveonc Send message
Joined: 1 Jan 10 Posts: 292 Credit: 41,567,650 RAC: 0 Level
Scientific publications
|
Processor: 4 GenuineIntel Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz [Family 6 Model 15 Stepping 11] OC'ed to 3.00GHz
Processor: 4.00 MB cache
Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 syscall nx lm vmx tm2 pbe
OS: Microsoft Windows 7: x64 Edition, (06.01.7600.00)
Memory: 4.00 GB physical, 8.00 GB virtual
Disk: 310.33 GB total, 276.25 GB free
NVIDIA GPU 0: GeForce GTX 260 (driver version 19562, CUDA version 3000, compute capability 1.3, 896MB, 629 GFLOPS peak)
NVIDIA GPU 1: GeForce GTX 260 (driver version 19562, CUDA version 3000, compute capability 1.3, 896MB, 629 GFLOPS peak)
OC'ed with Gainward Expertool to 3D Core Clock 676MHz Memory Clock 1150MHz Shader Clock 1455MHz
____________
|
|
|
ftpd Send message
Joined: 6 Jun 08 Posts: 152 Credit: 328,250,382 RAC: 0 Level
Scientific publications
|
6-3-2010 12:59:04 Starting BOINC client version 6.10.36 for windows_intelx86
6-3-2010 12:59:04 log flags: file_xfer, sched_ops, task
6-3-2010 12:59:04 Libraries: libcurl/7.19.7 OpenSSL/0.9.8l zlib/1.2.3
6-3-2010 12:59:04 Data directory: C:\Documents and Settings\All Users\Application Data\BOINC
6-3-2010 12:59:04 Running under account Administrator
6-3-2010 12:59:06 Processor: 2 AuthenticAMD AMD Athlon(tm) Dual Core Processor 5200B [Family 15 Model 107 Stepping 2]
6-3-2010 12:59:06 Processor: 512.00 KB cache
6-3-2010 12:59:06 Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 htt pni cx16 syscall nx lm svm rdtscp 3dnowext 3dnow
6-3-2010 12:59:06 OS: Microsoft Windows XP: Professional x86 Edition, Service Pack 3, (05.01.2600.00)
6-3-2010 12:59:06 Memory: 3.48 GB physical, 6.81 GB virtual
6-3-2010 12:59:06 Disk: 232.88 GB total, 202.96 GB free
6-3-2010 12:59:06 Local time is UTC +1 hours
6-3-2010 12:59:09 NVIDIA GPU 0: GeForce GTX 260 (driver version 19621, CUDA version 3000, compute capability 1.3, 896MB, 596 GFLOPS peak)
____________
Ton (ftpd) Netherlands |
|
|
ftpd Send message
Joined: 6 Jun 08 Posts: 152 Credit: 328,250,382 RAC: 0 Level
Scientific publications
|
8-3-2010 10:32:55 Starting BOINC client version 6.10.36 for windows_intelx86
8-3-2010 10:32:55 log flags: file_xfer, sched_ops, task
8-3-2010 10:32:55 Libraries: libcurl/7.19.7 OpenSSL/0.9.8l zlib/1.2.3
8-3-2010 10:32:55 Data directory: C:\Documents and Settings\All Users\Application Data\BOINC
8-3-2010 10:32:55 Running under account ton
8-3-2010 10:32:59 Processor: 8 GenuineIntel Intel(R) Xeon(R) CPU E5420 @ 2.50GHz [Family 6 Model 23 Stepping 10]
8-3-2010 10:32:59 Processor: 6.00 MB cache
8-3-2010 10:32:59 Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 nx lm vmx tm2 dca pbe
8-3-2010 10:32:59 OS: Microsoft Windows XP: Professional x86 Edition, Service Pack 3, (05.01.2600.00)
8-3-2010 10:32:59 Memory: 3.00 GB physical, 5.84 GB virtual
8-3-2010 10:32:59 Disk: 232.87 GB total, 198.75 GB free
8-3-2010 10:32:59 Local time is UTC +1 hours
8-3-2010 10:32:59 NVIDIA GPU 0: GeForce GTX 295 (driver version 19634, CUDA version 3000, compute capability 1.3, 896MB, 596 GFLOPS peak)
8-3-2010 10:32:59 NVIDIA GPU 1: GeForce GTX 295 (driver version 19634, CUDA version 3000, compute capability 1.3, 896MB, 596 GFLOPS peak)
Another machine!
____________
Ton (ftpd) Netherlands |
|
|
liveonc Send message
Joined: 1 Jan 10 Posts: 292 Credit: 41,567,650 RAC: 0 Level
Scientific publications
|
Processor: 2 GenuineIntel Intel(R) Core(TM)2 Duo CPU E8500 @ 3.16GHz [Family 6 Model 23 Stepping 10]
Processor: 6.00 MB cache
Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr
OS: Linux: 2.6.31-14-generic
Memory: 1.96 GB physical, 5.75 GB virtual
Disk: 581.16 GB total, 547.52 GB free
Local time is UTC +1 hours
NVIDIA GPU 0: GeForce GTX 260 (driver version unknown, CUDA version 3000, compute capability 1.3, 895MB, 607 GFLOPS peak)
This one is a factory OC'ed XFX GeForce® 260 GTX 896MB DDR3 Black Edition (GX-260N-ADBF) running Mint Linux 8 64 bit.
____________
|
|
|
ftpd Send message
Joined: 6 Jun 08 Posts: 152 Credit: 328,250,382 RAC: 0 Level
Scientific publications
|
8-3-2010 12:02:28 Starting BOINC client version 6.10.36 for windows_intelx86
8-3-2010 12:02:28 log flags: file_xfer, sched_ops, task
8-3-2010 12:02:28 Libraries: libcurl/7.19.7 OpenSSL/0.9.8l zlib/1.2.3
8-3-2010 12:02:28 Data directory: C:\Documents and Settings\All Users\Application Data\BOINC
8-3-2010 12:02:28 Running under account christa
8-3-2010 12:02:34 Processor: 2 GenuineIntel Intel(R) Core(TM)2 Duo CPU E8200 @ 2.66GHz [Family 6 Model 23 Stepping 6]
8-3-2010 12:02:34 Processor: 6.00 MB cache
8-3-2010 12:02:34 Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 nx lm vmx smx tm2 pbe
8-3-2010 12:02:34 OS: Microsoft Windows XP: Professional x86 Edition, Service Pack 3, (05.01.2600.00)
8-3-2010 12:02:34 Memory: 1.98 GB physical, 3.82 GB virtual
8-3-2010 12:02:34 Disk: 232.87 GB total, 215.08 GB free
8-3-2010 12:02:34 Local time is UTC +1 hours
8-3-2010 12:02:38 NVIDIA GPU 0: Quadro FX 1700 (driver version 19187, CUDA version 2030, compute capability 1.1, 512MB, 59 GFLOPS peak)
I do not use this card for GPUGrid. Too slow.
A Collatz WU takes 30 min on the GTX 295 and 4.3 hours on the FX 1700.
____________
Ton (ftpd) Netherlands |
|
|
ftpd Send message
Joined: 6 Jun 08 Posts: 152 Credit: 328,250,382 RAC: 0 Level
Scientific publications
|
9-3-2010 16:26:54 Starting BOINC client version 6.10.36 for windows_intelx86
9-3-2010 16:26:54 log flags: file_xfer, sched_ops, task
9-3-2010 16:26:54 Libraries: libcurl/7.19.7 OpenSSL/0.9.8l zlib/1.2.3
9-3-2010 16:26:54 Data directory: C:\Documents and Settings\All Users\Application Data\BOINC
9-3-2010 16:26:54 Running under account Administrator
9-3-2010 16:26:54 Processor: 8 GenuineIntel Intel(R) Xeon(R) CPU E5420 @ 2.50GHz [Family 6 Model 23 Stepping 6]
9-3-2010 16:26:54 Processor: 6.00 MB cache
9-3-2010 16:26:54 Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 nx lm vmx tm2 dca pbe
9-3-2010 16:26:54 OS: Microsoft Windows XP: Professional x86 Edition, Service Pack 3, (05.01.2600.00)
9-3-2010 16:26:54 Memory: 3.25 GB physical, 6.34 GB virtual
9-3-2010 16:26:54 Disk: 232.88 GB total, 205.44 GB free
9-3-2010 16:26:54 Local time is UTC +1 hours
9-3-2010 16:26:54 NVIDIA GPU 0: Quadro FX 570 (driver version 19187, CUDA version 2030, compute capability 1.1, 256MB, 29 GFLOPS peak)
I do not use this card for gpugrid. Too slow!!
____________
Ton (ftpd) Netherlands |
|
|
ftpd Send message
Joined: 6 Jun 08 Posts: 152 Credit: 328,250,382 RAC: 0 Level
Scientific publications
|
10-3-2010 15:12:58 Starting BOINC client version 6.10.36 for windows_intelx86
10-3-2010 15:12:59 log flags: file_xfer, sched_ops, task
10-3-2010 15:12:59 Libraries: libcurl/7.19.7 OpenSSL/0.9.8l zlib/1.2.3
10-3-2010 15:12:59 Data directory: C:\Documents and Settings\All Users\Application Data\BOINC
10-3-2010 15:12:59 Running under account Administrator
10-3-2010 15:13:11 Processor: 8 GenuineIntel Intel(R) Xeon(R) CPU E5420 @ 2.50GHz [Family 6 Model 23 Stepping 6]
10-3-2010 15:13:11 Processor: 6.00 MB cache
10-3-2010 15:13:11 Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 nx lm vmx tm2 dca pbe
10-3-2010 15:13:11 OS: Microsoft Windows XP: Professional x86 Edition, Service Pack 3, (05.01.2600.00)
10-3-2010 15:13:11 Memory: 3.25 GB physical, 6.34 GB virtual
10-3-2010 15:13:11 Disk: 232.88 GB total, 205.13 GB free
10-3-2010 15:13:11 Local time is UTC +1 hours
10-3-2010 15:13:14 NVIDIA GPU 0: GeForce GTS 250 (driver version 19621, CUDA version 3000, compute capability 1.1, 1024MB, 470 GFLOPS peak)
Just replaced the Quadro FX 570 card with a GTS 250.
____________
Ton (ftpd) Netherlands |
|
|
liveonc Send message
Joined: 1 Jan 10 Posts: 292 Credit: 41,567,650 RAC: 0 Level
Scientific publications
|
Starting BOINC client version 6.10.36 for x86_64-pc-linux-gnu
log flags: file_xfer, sched_ops, task
Libraries: libcurl/7.19.5 OpenSSL/0.9.8g zlib/1.2.3.3 libidn/1.15
Data directory: /var/lib/boinc-client
Processor: 2 GenuineIntel Intel(R) Pentium(R) Dual CPU E2180 @ 2.00GHz [Family 6 Model 15 Stepping 13]
Processor: 1.00 MB cache
Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl est tm2 ssse3 cx16 xtpr pdcm la
OS: Linux: 2.6.31-20-generic
Memory: 1.95 GB physical, 5.70 GB virtual
Disk: 223.62 GB total, 206.62 GB free
Local time is UTC +1 hours
NVIDIA GPU 0: GeForce 8800 GT (driver version unknown, CUDA version 2030, compute capability 1.1, 255MB, 336 GFLOPS peak)
____________
|
|
|
skgivenVolunteer moderator Volunteer tester
Send message
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level
Scientific publications
|
Boinc GFlops Peak
People should note that the calculated Boinc GFlops peak is not always a fair reflection of a card's performance. It is a rough guide only.
For example, I have a GT240 in a system with a Q6600 overclocked to 3GHz. It has a GFlops rating of 299 but takes longer to complete a similar task than one completed on another GT240 with a lower GFlops rating of 288, on a natively clocked 2.2GHz quad Opteron system. Despite having a more highly overclocked GPU, shaders and CPU, and even a bigger fan, the card with a rating of 299GFlops is actually 16% slower (at best) than the 288GFlops rated card.
Why? Because the 288 card has DDR5 RAM ;) |
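For anyone wondering how big the memory difference is, here is a minimal sketch of the standard bandwidth arithmetic (effective memory clock times bus width). The clock figures are assumed typical GT240 reference values, not measurements from the two cards above; check your own card with GPU-Z.

def bandwidth_gb_s(effective_clock_mhz, bus_width_bits):
    # bits per second -> bytes per second -> GB/s
    return effective_clock_mhz * 1e6 * bus_width_bits / 8 / 1e9

gddr3 = bandwidth_gb_s(2000, 128)   # assumed ~2000MHz effective DDR3, 128-bit bus
gddr5 = bandwidth_gb_s(3400, 128)   # assumed ~3400MHz effective DDR5, 128-bit bus
print(f"GT240 DDR3: {gddr3:.1f} GB/s, GT240 DDR5: {gddr5:.1f} GB/s "
      f"(~{(gddr5 / gddr3 - 1) * 100:.0f}% more bandwidth)")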
|
|
|
Comparison as far as it goes.
These cards both did the same WU. That's the only information I have.
# There is 1 device supporting CUDA
# Device 0: "GeForce GTX 285"
# Clock rate: 1.48 GHz
# Total amount of global memory: 2147155968 bytes
# Number of multiprocessors: 30
# Number of cores: 240
# Time per step: 23.829 ms
# Approximate elapsed time for entire WU: 20254.498 s
Above card on this system:
GenuineIntel
Intel(R) Core(TM)2 Duo CPU E8500 @ 3.16GHz [Family 6 Model 23 Stepping 6]
Number of processors 2
Coprocessors NVIDIA GeForce GTX 285 (2047MB) driver: 19562
Operating System Microsoft Windows XP
Professional x64 Edition, Service Pack 2, (05.02.3790.00)
BOINC client version 6.10.36
Memory 4094.2 MB
Cache 6144 KB
Measured floating point speed 3934.24 million ops/sec
Measured integer speed 11620.4 million ops/sec
Average upload rate 24.79 KB/sec
Average download rate 139.59 KB/sec
Average turnaround time 0.99 days
___________________________________________________________________________
# There is 1 device supporting CUDA
# Device 0: "GeForce GT 240"
# Clock rate: 1.46 GHz
# Total amount of global memory: 1073741824 bytes
# Number of multiprocessors: 12
# Number of cores: 96
# Time per step: 64.259 ms
# Approximate elapsed time for entire WU: 54620.303 s
Above card on this system:
GenuineIntel
Pentium(R) Dual-Core CPU E6300 @ 2.80GHz [Intel64 Family 6 Model 23 Stepping 10]
Number of processors 2
Coprocessors NVIDIA GeForce GT 240 (1024MB) driver: 19621
Operating System Microsoft Windows 7
Enterprise x64 Edition, (06.01.7600.00)
BOINC client version 6.10.18
Memory 4095.18 MB
Cache 2048 KB
Swap space 8188.51 MB
Total disk space 540.88 GB
Free Disk Space 489.41 GB
Measured floating point speed 3165.21 million ops/sec
Measured integer speed 9662.96 million ops/sec
Average upload rate 29.66 KB/sec
Average download rate 198.21 KB/sec
Average turnaround time 0.82 days
____________
Radio Caroline, the world's most famous offshore pirate radio station.
Great music since April 1964. Support Radio Caroline Team -
Radio Caroline |
|
|
ftpd Send message
Joined: 6 Jun 08 Posts: 152 Credit: 328,250,382 RAC: 0 Level
Scientific publications
|
13-4-2010 9:58:53 Starting BOINC client version 6.10.45 for windows_intelx86
13-4-2010 9:58:53 log flags: file_xfer, sched_ops, task
13-4-2010 9:58:53 Libraries: libcurl/7.19.7 OpenSSL/0.9.8l zlib/1.2.3
13-4-2010 9:58:53 Data directory: C:\Documents and Settings\All Users\Application Data\BOINC
13-4-2010 9:58:54 Running under account ton
13-4-2010 9:58:55 Processor: 8 GenuineIntel Intel(R) Xeon(R) CPU E5420 @ 2.50GHz [Family 6 Model 23 Stepping 10]
13-4-2010 9:58:55 Processor: 6.00 MB cache
13-4-2010 9:58:55 Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 nx lm vmx tm2 dca pbe
13-4-2010 9:58:55 OS: Microsoft Windows XP: Professional x86 Edition, Service Pack 3, (05.01.2600.00)
13-4-2010 9:58:55 Memory: 3.25 GB physical, 6.34 GB virtual
13-4-2010 9:58:55 Disk: 232.87 GB total, 193.59 GB free
13-4-2010 9:58:55 Local time is UTC +2 hours
13-4-2010 9:58:55 NVIDIA GPU 0: GeForce GTX 480 (driver version 19741, CUDA version 3000, compute capability 2.0, 1536MB, 1345 GFLOPS peak)
____________
Ton (ftpd) Netherlands |
|
|
ftpd Send message
Joined: 6 Jun 08 Posts: 152 Credit: 328,250,382 RAC: 0 Level
Scientific publications
|
15-4-2010 15:47:35 Starting BOINC client version 6.10.45 for windows_intelx86
15-4-2010 15:47:35 log flags: file_xfer, sched_ops, task
15-4-2010 15:47:35 Libraries: libcurl/7.19.7 OpenSSL/0.9.8l zlib/1.2.3
15-4-2010 15:47:35 Data directory: C:\Documents and Settings\All Users\Application Data\BOINC
15-4-2010 15:47:35 Running under account Administrator
15-4-2010 15:47:35 Processor: 8 GenuineIntel Intel(R) Xeon(R) CPU E5420 @ 2.50GHz [Family 6 Model 23 Stepping 6]
15-4-2010 15:47:35 Processor: 6.00 MB cache
15-4-2010 15:47:35 Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 nx lm vmx tm2 dca pbe
15-4-2010 15:47:35 OS: Microsoft Windows XP: Professional x86 Edition, Service Pack 3, (05.01.2600.00)
15-4-2010 15:47:35 Memory: 3.25 GB physical, 6.34 GB virtual
15-4-2010 15:47:35 Disk: 232.88 GB total, 204.28 GB free
15-4-2010 15:47:35 Local time is UTC +2 hours
15-4-2010 15:47:35 NVIDIA GPU 0: GeForce GTX 470 (driver version 19741, CUDA version 3000, compute capability 2.0, 1280MB, 1089 GFLOPS peak)
15-4-2010 15:47:36 Version change (6.10.43 -> 6.10.45)
____________
Ton (ftpd) Netherlands |
|
|
skgivenVolunteer moderator Volunteer tester
Send message
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level
Scientific publications
|
Thanks Ton,
The GTX480 appears to be 23.5% faster than the GTX470 (just going by the reported GFlops peak rating).
The GTX480 costs about 40% more, so in terms of purchase value for money the GTX470 would seem to be a better choice.
In terms of running costs, longevity, overclocking and real task completion rates, we will have to wait and see. |
|
|
ftpd Send message
Joined: 6 Jun 08 Posts: 152 Credit: 328,250,382 RAC: 0 Level
Scientific publications
|
Kev,
The prices in Holland are Euro 549 for the GTX480 and Euro 399 for the GTX470, including 19% VAT.
On Monday I will be sending 1 GTX 480 and 1 GTX 470 over to Barcelona, so the guys can develop!!!
____________
Ton (ftpd) Netherlands |
|
|
|
This is nicked from another GPUGrid thread, but it is relevant here too.
tomba reported GT 220 specs:
GPU 1336MHz
GPU RAM 789MHz
GPU Graphics 618MHz
With 48 Shaders, 1024MB, est. 23GFLOPS
Wait for it, Compute Capable 1.2.
I did not even think that existed!
First there was 1.0, then there was 1.1, then 1.3 and now there is 1.2.
What strange number system you have, Mr. Wolf - Sort of like your GPU names.
Aaahhh bite me.
Funnies aside, it appears to respond to present work units similarly to CC1.3 cards, getting through work units relatively faster than CC1.1 cards. In this case I guess it is about 23 GFlops × 1.3, so it gets through about 30% extra.
So it has an effective Boinc GFlops value of about 30 GFlops – not bad for a low-end card, and it just about scrapes in as a card worth adding to GPUGrid, if the system is on quite frequently and you don’t get lumbered with a 120h task!
The letter box was open, but this one was snuck under the door all the same!
I am running a GT220 under Ubuntu 10.04 - 64-bit
When I upgraded from 9.10 to 10.04, I finally got a newer NVIDIA driver than 185.x.x. It upgraded to 195.36.15. Now I do not get computational errors with Collatz, but have not been able to get any work, yet, for GPUGrid.
The BOINC Manager shows:
Mon 03 May 2010 02:05:32 PM EDT
NVIDIA GPU 0: GeForce GT 220 (driver version unknown, CUDA version 3000, compute capability 1.2, 1023MB, 131 GFLOPS peak)
for the card's info. I am told I needed 1.3 to run Milkyway, so that is out.
I just wonder how well it will run GG tasks. I also wonder why the
"driver version unknown" is shown. I think it did show 185.x.x before the
upgrade to Ubuntu 10.04.
Well above it shows a listing for 23 GFLOPS "est.".
The manager shows 131 GFLOPS peak. So what is going on with the difference?
Also I saw a thread showing a 9500 GT or 9400 doing GG. I have several PCI-only
computers, like a P4 server, that can only use a 9500/9400. I know that they
are SLOW cards, but are they too slow to do the GPU work? Even a card
like a 9400/9500 would crunch faster than the P4 does via CPU only.
Well, I am new to GPU, since my first successful GPU task ran a few days ago,
when I upgraded to the Ubuntu 10.04. So, I do not know the ins-and-outs of
GPU and what projects run what cards. I got the GT220 due to a budget that had
to be kept to when buying my 4-core system [Phenom 9650]. It came down to
PSU and drive size vs. GPU card size. This is my main computer and default
video editing one. So I needed the drive size, and got a 600W PSU for later
upgrades.
So any help/advice about GPU projects and tasks, would be helpful.
|
|
|
ftpd Send message
Joined: 6 Jun 08 Posts: 152 Credit: 328,250,382 RAC: 0 Level
Scientific publications
|
Welcome to the club.
So you can run SETI@home, Collatz and dnetc. These are GPU-related projects.
Success.
Just saw that you already run the jobs.
____________
Ton (ftpd) Netherlands |
|
|
roundup Send message
Joined: 11 May 10 Posts: 63 Credit: 10,115,872,311 RAC: 50,669,683 Level
Scientific publications
|
Hi all,
what card would you recommend for a mobile device?
GTX285M? Does this one have CC 1.1 or 1.3?
Is there any potential for OCing with mobile versions?
Greetings and thanks in advance! |
|
|
skgivenVolunteer moderator Volunteer tester
Send message
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level
Scientific publications
|
It is generally not recommended to use a laptop GPU for crunching with; they tend to overheat and stop working!
The GeForce GTX 285M was released at the end of January this year. It uses a 55nm G92b core (CC1.3), has 1024MB DDR3 at 2040MHz, the 16GPU cores are clocked at 576MHz, and the 128 shaders at 1500MHz. It has a Manufacturer GFlops rating of 576 (not Boinc rating). The TDP is about 80W.
If you are determined to get a laptop with a GPU for crunching you might want to consider this alternative card:
GeForce GTS 260M: GT215 40nm (CC1.2), 1024MB DDR5 at 3600MHz, 8 GPU cores at 550MHz, 96 shaders at 1375MHz, 57.6GB/s memory bandwidth (128-bit GDDR5). Manufacturer GFlops rating of 396.
The TDP is only 38W so it is less likely to overheat (40nm core, DDR5). |
|
|
roundup Send message
Joined: 11 May 10 Posts: 63 Credit: 10,115,872,311 RAC: 50,669,683 Level
Scientific publications
|
Thanks for all the useful information, skgiven!
I need to buy a new notebook anyway (surely with an i7 CPU and ~6GB RAM), so the idea was to take one with a GPU that is suitable for GPU crunching.
I did not know that the power consumption - and therefore the temperature issues - differs that much between the GTX285M and GTS260M.
Thanks again :-) |
|
|
skgivenVolunteer moderator Volunteer tester
Send message
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level
Scientific publications
|
The power consumption of the GTS260M (similar specs to a GT240) is half that of the GTX285M:
The GTX285M also uses a 55nm core design and DDR3, while the GTS260M uses a 40nm core design and DDR5 (so the GTS260M produces less heat).
Although the GTX285M is more powerful, it would be more likely to run hot and cause problems for the laptop; it might be able to run at 90deg C but the rest of the laptop would probably not be, especially for hours at a time.
It is just my opinion (and I have no real-world experience of either card) but I would opt for the safer side and get the GTS260M, if I wanted to crunch on a laptop, which I don't. It is probably much cheaper too; the GTX285M is really NVidia's top laptop gaming card, so it's going to cost. It's also 2-year-old technology revamped.
Most of us buy a laptop to last several years. Although I would not let it stop me buying a laptop now, I would expect some form of Fermi to show up in a laptop within a year.
I'm guessing you want an i7 laptop to crunch CPU tasks!
Not sure that it's a good idea to crunch both GPU and CPU tasks on a laptop - that's a lot of heat to try to get rid of. If you do try to crunch both, get a cooling tray to set it on. |
|
|
|
Didn't check the mobile GPU specs, but if SK isn't totally wrong (which would be unusual) I'd strongly suggest the GTS260M. Apart from what has been said, the 3-year-old G92 (a or b) chip is CUDA compatibility level 1.1, so the new 40nm chip is considerably faster at GPU-Grid due to its CUDA compatibility level of 1.2. This will make up for more than the 25% disadvantage in raw FLOPS, hence the card will be faster, cheaper and cooler!
MrS
____________
Scanning for our furry friends since Jan 2002 |
|
|
skgivenVolunteer moderator Volunteer tester
Send message
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level
Scientific publications
|
Sorry, stupid mistake. The GTX285M is, as you say MrS, CC1.1 (I must have got the GTX285 stuck in my head), and I went off on a tangent with that too!
Just one important thing to watch out for:
There is a GTX 260M with a 55nm G92 core (CC1.1) - avoid at all costs,
and there is the GTS 260M with a 40nm GT215 core (CC1.2) - the one to get, should you decide to GPU crunch on the laptop. |
|
|
skgivenVolunteer moderator Volunteer tester
Send message
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level
Scientific publications
|
There is another card that is almost as good as the GTS260M,
The GeForce GTS 250M
8 cores at 500MHz GT215 (40nm), 1024MB GDDR5, 96 shaders at 1250MHz, 360 GFlops, 28W TDP.
Basically it's the same card, just clocked lower. |
|
|
roundup Send message
Joined: 11 May 10 Posts: 63 Credit: 10,115,872,311 RAC: 50,669,683 Level
Scientific publications
|
Thanks again for the helpful explanation. One more question concerning notebook versions:
What are the differences between the GTS 260M and GTS 360M? Just the slightly higher clocked shaders in the 360M?
I cannot find any further differences here:
http://en.wikipedia.org/wiki/Comparison_of_NVIDIA_Graphics_Processing_Units#GeForce_300M_.283xxM.29_series (scroll up to see the GTS260 data).
The NVIDIA Spec pages also do not show any further differences:
GTS 260M:
http://www.nvidia.co.uk/object/product_geforce_gts_260m_uk.html
GTS 360M:
http://www.nvidia.co.uk/object/product_geforce_gts_360m_uk.html
:-/ |
|
|
skgivenVolunteer moderator Volunteer tester
Send message
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level
Scientific publications
|
YES, the only difference is the slightly faster clocks:
GTS360M GT215 40nm 550MHz GPU, 1436MHz shaders, 3600MHz RAM, 413 NVidia GFlops
GTS260M GT215 40nm 550MHz GPU, 1375MHz shaders, 3600MHz RAM, 413 NVidia GFlops
Basically the 300 series cards are rebranded 200 series cards. There are some dubious naming differences here and there; things to watch out for.
The NVidia site does not give much info!
Your wiki link shows the sheer number of yesteryear cards they rebrand and re-release. |
|
|
roundup Send message
Joined: 11 May 10 Posts: 63 Credit: 10,115,872,311 RAC: 50,669,683 Level
Scientific publications
|
... an interesting way to make customers believe that there are new products on the table.
Thanks again, skgiven :-) |
|
|
|
Yeah, it's gotten kind of disgusting. Every couple of months they take the same cards and give them a new name with a higher number to make them appear better. OK, that's a little exaggerated, but not that far from the truth either..
MrS
____________
Scanning for our furry friends since Jan 2002 |
|
|
skgivenVolunteer moderator Volunteer tester
Send message
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level
Scientific publications
|
What bugs me even more is that NVidia’s own specification pages don’t even tell you the core type!
Instead you get fobbed off with false technical waffle such as "Vibrant Multimedia".
It's just as well the specs can be found on wiki - otherwise buying NVidia cards would be a lucky dip. |
|
|
|
Makes you wonder what's more vibrant: 16 or 32 shaders? :p
MrS
____________
Scanning for our furry friends since Jan 2002 |
|
|
liveonc Send message
Joined: 1 Jan 10 Posts: 292 Credit: 41,567,650 RAC: 0 Level
Scientific publications
|
There are useless ways of rebranding, playing word games, and setting the clocks slightly higher or slightly lower. But Nvidia spends more money on software, so why not do something useful with that?
There must be some way, for example, of reprogramming that 3D Vision to also support dual view. Then 2 different people could use the same screen at the same time without each using half the screen. Just use two pairs of shutter glasses, half the refresh rate, and two keyboards and mice. Heck, if Philips, Samsung, Sony, etc. want to add an extra bundle with their 3D, all it takes is 2 headphones to have two people watch two different channels.
____________
|
|
|
|
LOL! Now that's creative. At my work we're mostly using 2 19" screens per person.. mainly because we have them anyway and this gives more useable space than buying a new one. Having 2 people share a monitor would make work a little more.. intimate.
BTW: you'd also want to make sure you're running at least 120 Hz. Splitting 60 Hz would be quite unpleasant ;)
MrS
____________
Scanning for our furry friends since Jan 2002 |
|
|
skgivenVolunteer moderator Volunteer tester
Send message
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level
Scientific publications
|
A comparative look at some of the Fermi cards and G200 cards
The following GeForce cards are the present mainstream choice, with reference clocks,
GT 220 GT216 40nm Compute Capable 1.2 128 BoincGFlops peak
GT 240 GT215 40nm Compute Capable 1.2 257 BoincGFlops peak
GTX 260 GT200b 55nm Compute Capable 1.3 596 BoincGFlops peak (sp216)
GTX 275 GT200b 55nm Compute Capable 1.3 674 BoincGFlops peak
GTX 285 GT200b 55nm Compute Capable 1.3 695 BoincGFlops peak
GTX 295 GT200b 55nm Compute Capable 1.3 1192 BoincGFlops peak
GTX 480 GF100 40nm Compute Capable 2.0 1345 BoincGFlops peak
GTX 470 GF100 40nm Compute Capable 2.0 1089 BoincGFlops peak
GTX 465 GF100 40nm Compute Capable 2.0 855 BoincGFlops peak
Two new cards will work here soon,
-edit; working now, but not fully optimized just yet.
GTX 460 GF104 40nm Compute Capable 2.1 907 BoincGFlops peak (768MB)
GTX 460 GF104 40nm Compute Capable 2.1 907 BoincGFlops peak (1GB)
Values are approximate, and unconfirmed here. There will be a large performance variety among GTX 460 cards, as many do not follow the reference design!
It is expected that a GTX 475 following the above GF104 architecture will be released in the autumn - it should have a full complement of 384 shaders and use all 8 GPU cores.
Also expected is the release of several so-called ‘low end’ Fermis:
In August two cards are expected based on the GF106 architecture (GTS450 and possibly GTS455).
Then in September a GF108 card is due out.
The GF106 and GF108 will bring DX11 and Fermi architecture to mid/low end cards. |
|
|
skgivenVolunteer moderator Volunteer tester
Send message
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level
Scientific publications
|
People might want to note that the scarce 1GB version of the GTX 460 is about 10% faster than the 768MB version, going by recent Betas.
Relative to CC2.0 and CC1.3 cards, and compared to their Compute Capability and reference GFlops peak, the GTX460 cards (CC2.1) are presently underperforming by approximately one third. This will likely change in the next few months with new drivers and app refinements. So this ballpark correction factor is temporary. |
|
|
jlhalSend message
Joined: 1 Mar 10 Posts: 147 Credit: 1,077,535,540 RAC: 0 Level
Scientific publications
|
Boinc GFLOPS ???
I don't know where to find this.
Boinc (latest Win x64 version) messages give only the peak GFLOPS for my cards:
230 (yes, two hundred and thirty) GFLOPS peak each!!!
2 x Gigabyte 9600GT TurboForce NX96T1GHP, 64 shaders, 1GB GDDR3
Where may I find the Boinc GFlops? |
|
|
skgivenVolunteer moderator Volunteer tester
Send message
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level
Scientific publications
|
Boinc Manager (Advanced View), Messages Tab, 13th line down. |
|
|
|
Boinc (last Win x64 version) messages give only the peak GFLOPS
That's the value we're talking about. The word "BOINC" is added to this number because there are different ways to obtain such a number.
MrS
____________
Scanning for our furry friends since Jan 2002 |
|
|
WerkstattSend message
Joined: 23 May 09 Posts: 121 Credit: 335,191,133 RAC: 152,503 Level
Scientific publications
|
Two new cards will work here soon,
GTX 460 GF104 40nm Compute Capable 2.1 907 BoincGFlops peak (768MB)
GTX 460 GF104 40nm Compute Capable 2.1 907 BoincGFlops peak (1GB)
Values are approximate, and unconfirmed here. There will be a large performance variety among GTX 460 cards, as many do not follow the reference design!
Hi,
since I plan to replace my GTX260/192 with a GTX460 in autumn, this info is of high interest to me.
Could owners of this card type please post their experience here? The most expensive card is not always the fastest, nor the cheapest the slowest.
Regards,
Alexander |
|
|
skgivenVolunteer moderator Volunteer tester
Send message
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level
Scientific publications
|
There is a GTX460 thread here.
This is just a GFlops comparison thread, so we can compare relative card performances, and discuss things such as Compute Capability and Correction Factors. When the GTX460 becomes optimised I will rebuild a table of recommended cards and their relative performances, including correction factors.
|
|
|
jlhalSend message
Joined: 1 Mar 10 Posts: 147 Credit: 1,077,535,540 RAC: 0 Level
Scientific publications
|
Boinc Manager (Advanced View), Messages Tab, 13th line down.
That's exactly what I said: under Win x64 you only get peak GFLOPS, without the word Boinc.
So, for a Gigabyte 9600GT NX96T1GHP Rev 3.0, Boinc says 230 GFlops.
Cheers. |
|
|
skgivenVolunteer moderator Volunteer tester
Send message
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level
Scientific publications
|
jlhal, I was just confirming that the place to read the Boinc value is within Boinc. This is different to NVidia's theoretical Shader Processing Rate of 312 GigaFlops, which is not an applicable reference to go by when comparing cards for crunching here. Also of note on that line is the Compute Capability, which calls for a correction factor when comparing cards from different generations. The Operating System, CPU and configurations in place contribute to performance too. So the Boinc GFlops rating acts as a raw guide, and when combined with the Compute Capability gives a more accurate performance picture.
Take this card for example,
08/08/2010 23:37:44 NVIDIA GPU 3: GeForce GT 240 (driver version 25896, CUDA version 3010, compute capability 1.2, 475MB, 307 GFLOPS peak)
It is overclocked and has a peak GFlops rating (according to Boinc) of 307, but it also has a Compute Capability (CC) of 1.2. This presently means it tends to perform about 30% faster than an equal card with a CC of 1.1.
CC1.3 cards are slightly faster again, but not much. The Fermi cards are all CC 2.0 or 2.1 (so far), but these cards can still be further optimized for crunching, so we will look at the Correction Factors again, hopefully in the autumn. The purpose is to give people a clear picture of card performance across the NVidia range and allow people to make better informed decisions as to which card they select WRT crunching here.
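For those wondering where the two numbers come from, here is a small sketch of the usual arithmetic. It assumes NVidia's quoted shader rate counts 3 flops per shader per clock (MAD + MUL) while the Boinc peak counts 2, which is the commonly given explanation for these pre-Fermi cards, and it uses reference shader clocks (factory OC cards will differ).

def gflops(shaders, shader_clock_mhz, flops_per_clock):
    return shaders * shader_clock_mhz / 1000.0 * flops_per_clock

# 9600 GT reference: 64 shaders at 1625MHz
print("9600 GT, NVidia-style (3/clk):", gflops(64, 1625, 3))   # ~312, the quoted shader rate
print("9600 GT, Boinc-style  (2/clk):", gflops(64, 1625, 2))   # ~208, as reported by Boinc
# GT 240: 96 shaders; a ~1600MHz (overclocked) shader clock would explain the 307 rating above
print("GT 240,  Boinc-style  (2/clk):", gflops(96, 1600, 2))   # ~307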
|
|
|
|
This is different to NVidia's theoretical Shader Processing Rate of 312 GigaFlops, which is not an applicable reference to go by when comparing cards for crunching here.
[provocative]Not any less useful than the BOINC rating without correction factors, isn't it?[/provocative]
MrS
____________
Scanning for our furry friends since Jan 2002 |
|
|
skgivenVolunteer moderator Volunteer tester
Send message
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level
Scientific publications
|
I looked at 4 cards, all on Win XP Pro and all crunching TONI_CAPBIND tasks, to re-evaluate/confirm the Compute Capable correction factors.
CC1.1
9600GT (234 Boinc GFlops peak)
h232f99r449-TONI_CAPBINDsp2-50-100-RND6951_0 (RunTime 60600, points 6,803.41) Host
CC1.2
GT240 (307 Boinc GFlops peak)
h232f99r516-TONI_CAPBINDsp2-50-100-RND8103_0 (RunTime 34981, points 6,803.41) Host
CC1.3
GTX260 (659 Boinc GFlops peak)
h232f99r91-TONI_CAPBINDsp2-64-100-RND9447_0 (RunTime 15081, points 6,803.41) Host
CC2.0
GTX470 (1261 Boinc GFlops peak)
f192r291-TONI_CAPBINDsp1-62-100-RND4969_1 (RunTime 8175, points 6,803.41) Host
CC1.1
9600GT (86400/60600)*6803.415=9699 (Average Credits per Day running these tasks)
9699/230=42.20
CC Correction Factor = 42.20/42.20=1.00
CC1.2
GT240 (86400/34981)*6803.415=16804
16804/307=54.74
CC Correction Factor = 54.74/42.20=1.30
CC1.3
GTX260 (86400/15081)*6803.41=38976
38976/659=59.14
CC Correction Factor = 59.14/42.20=1.40
CC2.0
GTX470 (86400/8175)*6803.41=71904
71904/1261=57.02 (35.5%)
CC Correction Factor = 57.02/42.20=1.35
CC2.1
GTX460 and GTS450
CC Correction Factor = roughly 0.90
Comparison of Optimized and Recommended Cards, with CC Correction Factors (in brackets):
GT 220 GT216 40nm Compute Capable 1.2 128 BoincGFlops peak (173)
GT 240 GT215 40nm Compute Capable 1.2 257 BoincGFlops peak (347)
GTX 260-216 GT200b 55nm Compute Capable 1.3 596 BoincGFlops peak (834)
GTX 275 GT200b 55nm Compute Capable 1.3 674 BoincGFlops peak (934)
GTX 285 GT200b 55nm Compute Capable 1.3 695 BoincGFlops peak (973)
GTX 295 GT200b 55nm Compute Capable 1.3 1192 BoincGFlops peak (1669)
GTX 480 GF100 40nm Compute Capable 2.0 1345 BoincGFlops peak (1816)
GTX 470 GF100 40nm Compute Capable 2.0 1089 BoincGFlops peak (1470)
GTX 465 GF100 40nm Compute Capable 2.0 855 BoincGFlops peak (1154)
GTX 460 GF104 40nm Compute Capable 2.1 907 BoincGFlops peak 768MB (816)
GTX 460 GF104 40nm Compute Capable 2.1 907 BoincGFlops peak 1GB (816)
Only reference specs are listed. The two GTX 460 cards are Recommended, but the applications are not yet fully capable of supporting these cards, hence the low correction factor. The limitations of this table are accuracy and lifetime; comparable but different systems (CPUs) were used, not all cards used the same drivers, only one task type was looked at, and only the Fermi cards ran the v6.11 app (6.13 for the GTX460 and GTS450). |
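For anyone who wants to check the arithmetic, this short sketch (not project code) reproduces the correction factors above from the quoted runtimes and credits; it uses the 230 GFlops figure the 9600GT calculation above uses, and the printed credits/day may differ from the post by a few points due to rounding.

tasks = {
    # card: (Boinc GFlops peak, runtime in seconds, points granted)
    "9600GT (CC1.1)": (230, 60600, 6803.41),
    "GT240 (CC1.2)":  (307, 34981, 6803.41),
    "GTX260 (CC1.3)": (659, 15081, 6803.41),
    "GTX470 (CC2.0)": (1261, 8175, 6803.41),
}

def credits_per_day(runtime_s, points):
    return 86400.0 / runtime_s * points

# Credits per day per GFlops of the CC1.1 card is the baseline (factor 1.00).
baseline = credits_per_day(60600, 6803.41) / 230

for card, (peak, runtime, points) in tasks.items():
    cpd = credits_per_day(runtime, points)
    print(f"{card}: {cpd:7.0f} credits/day, CC correction factor {cpd / peak / baseline:.2f}")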
|
|
|
Thanks for the effort, looks good to me!
MrS
____________
Scanning for our furry friends since Jan 2002 |
|
|
|
Hi there, my Boinc and GPU setup reports this:
11/21/2010 4:12:53 PM NVIDIA GPU 0: GeForce 9600 GT (driver version 26306, CUDA version 3020, compute capability 1.1, 512MB, 208 GFLOPS peak)
It's now November 2010, so hopefully I can get a new NVidia card with a higher "GFLOPS peak". |
|
|
|
GeForce GTX 580 (driver version 26309, CUDA version 3020, compute capability 2.0, 1536MB, 1741 GFLOPS peak)
It's OC'ed to 850MHz |
|
|
|
On standard clock rate:
GeForce GTX 580 (driver version 26309, CUDA version 3020, compute capability 2.0, 1536MB, 1581 GFLOPS peak)
|
|
|
MarkJ Volunteer moderator Volunteer tester Send message
Joined: 24 Dec 08 Posts: 738 Credit: 200,909,904 RAC: 0 Level
Scientific publications
|
GTX570 (standard version, default clocks)
GeForce GTX 570 (driver version 26309, CUDA version 3020, compute capability 2.0, 1248MB, 1405 GFLOPS peak) |
|
|
cavehSend message
Joined: 3 Oct 10 Posts: 2 Credit: 6,322 RAC: 0 Level
Scientific publications
|
EVGA GTX 570 superclocked (012-P3-1572-AR) with factory clocking:
GPU 797
memory 975
shader 1594
driver 26309, 1248MB, 1530 GFLOPS peak.
|
|
|
|
Finally I received the two Asus GTX 580 boards I had ordered beginning of December. I have just installed them and they are running now on stock speeds, no OC. We will see how they impact the RAC. They run on desktops with 980X CPU running at 4.2 Ghz.
Driver 266.58
Processor clock: 1'564 Mhz
Graphic clock: 782 Mhz
Memory clock : 2'004 Mhz x2 (4'008 Mhz)
Memory: 1.5 Gb DDR5
Gflop: 1'602 peak.
|
|
|
|
I've got a question...
I've got an XUbuntu Linux i7-860 with a GTS-250. Boinc runs on it w/ access to 7 threads (pseudo cores, HT quad) because the machine has some other server like functions it needs to do so a 'core/thread' is left open.
Then for totally non-BOINC reasons I was testing some low-end vid cards in the 2nd, empty PCIe slot of this machine last night. The cards being tested were all under $50 and under 50W: an 8400GS, a GeForce 210 and a GT-430.
I noticed that BOINC recognized and would run PrimeGrid & Collatz on the GT430 but with low-power or non-3d clocks being reported. I upgraded the driver today from a 195.36.xx out of the repositories to a 270.18.xx beta driver from a PPA and the GT-430 now reports proper clocks. I re-enabled GPUGrid on this machine and the GTS-250 is working on a WU that it looks like it might finish by the deadline ~5 days away.
Finally to the question... Is there a chance in he** of the GT-430 ever starting and completing a GPUGrid WU within the deadline with its BOINC-reported 179 GFLOP rating? If so, I can leave GPUGrid on this machine. The little, low-profile GT-430 is just filling an otherwise empty PCIe slot, and since it's so small/low it doesn't cause any temp problems for the GTS-250 behind it.
Thanx, Skip
BOINC log startup entries:
Sat 26 Feb 2011 03:42:42 PM CST Processor: 8 GenuineIntel Intel(R) Core(TM) i7 CPU 860 @ 2.80GHz [Family 6 Model 30 Stepping 5]
Sat 26 Feb 2011 03:42:42 PM CST Processor: 8.00 MB cache
Sat 26 Feb 2011 03:42:42 PM CST Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni dtes64 monito
Sat 26 Feb 2011 03:42:42 PM CST OS: Linux: 2.6.32-28-generic
Sat 26 Feb 2011 03:42:42 PM CST Memory: 3.83 GB physical, 2.27 GB virtual
Sat 26 Feb 2011 03:42:42 PM CST Disk: 14.76 GB total, 11.05 GB free
Sat 26 Feb 2011 03:42:42 PM CST Local time is UTC -6 hours
Sat 26 Feb 2011 03:42:42 PM CST NVIDIA GPU 0: GeForce GTS 250 (driver version unknown, CUDA version 4000, compute capability 1.1, 1023MB, 470 GFLOPS peak)
Sat 26 Feb 2011 03:42:42 PM CST NVIDIA GPU 1: GeForce GT 430 (driver version unknown, CUDA version 4000, compute capability 2.1, 512MB, 179 GFLOPS peak)
____________
- da shu @ HeliOS,
"A child's exposure to technology should never be predicated on an ability to afford it." |
|
|
skgivenVolunteer moderator Volunteer tester
Send message
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level
Scientific publications
|
If I was you I would pull the apparently unreliable and slow 150W GTS250 and test the 49W (if non-OEM) GT430 to find out for sure. The other cards would be of no use. Post back how you get on - it would be good to know either way.
179 GFlops peak:
In theory, a GT430 is supposed to be 268.8 GFlops peak, but because it's CC2.1 it's more likely to behave at GPUGrid as if it has 64 CUDA cores, which would give you 179 GFlops peak - Boinc must be reading this from the new drivers.
The actual performance of your GT430 card running the 6.13 app will be largely down to your Linux setup/configuration; only trying it will tell. I know that a GT240 outperforms a GTS250 at GPUGrid running the 6.12 apps (and previous apps), and that many GTS250s are unreliable, but I also know that the GT240 takes a huge performance hit when trying to run the 6.13 app (I expect this is the case for CC1.1 cards as well). Hopefully the GT430 will not see this hit, being a Fermi. So, my guess is that it could complete an average task inside 2 days, if the system was optimized as well as possible to run GPUGrid tasks. |
|
|
|
So much has changed - give it try and report back :)
MrS
____________
Scanning for our furry friends since Jan 2002 |
|
|
|
So much has changed - give it try and report back :)
MrS
Errors out right away...
http://www.gpugrid.net/result.php?resultid=375025
____________
- da shu @ HeliOS,
"A child's exposure to technology should never be predicated on an ability to afford it." |
|
|
|
On standard clock rate (607MHz/1215Mhz/1707MHz)
GeForce GTX 590 (driver version 26791, CUDA version 3020, compute capability 2.0, 1536MB, 1244 GFLOPS peak)
There are two of these on one card of course. :) |
|
|
|
what about a solution like GTX295 + GTX480 ? I already own the 480 and i can buy the gtx295 for a very low price... |
|
|
|
I'm not sure if that will require waiting for a major change in BOINC - the ability to set up a separate job queue for each of multiple unlike GPUs, and the ability of the BOINC clients to make separate job requests for each queue. There are so few people with each configuration of multiple unlike GPUs on the same computer that I don't consider it likely that the GPUGRID software will be changed to be able to have a workunit use two unlike GPUs at once. |
|
|
skgivenVolunteer moderator Volunteer tester
Send message
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level
Scientific publications
|
Already works with 6.13. |
|
|
|
ok thank you :) |
|
|
|
Nvidia ------- driver ---- CUDA --- compute -- RAM - GFLOPS
Geforce: --- version: - version: - capability: --- MB ----peak:
ION ----------- 26658 --- 3020 ------- 1.2 -------- 412 ----- 17
8600 GS ---- 26658 --- 3020 ------- 1.1 -------- 500 ----- 38
GT 430 ------ 26658 --- 3020 ------- 2.1 -------- 993 ---- 201
GTX 260 ---- 26658 --- 3020 ------- 1.3 -------- 896 ---- 477
GTX 560 Ti - 26666 --- 3020 ------- 2.1 ------ 1024 ---- 901
ION and 8600 GS: too slow for GPUGrid.
GTX 260 with 192 shaders: doesn't go.
____________
[SG-ATA]Rolf, formerly known as [SG]Rolf, also known as
[NL-ATA]Rolf, formerly known as [Nordlichter]Rolf, also known as
[FT/ TL-ATA]Rolf, formerly known as [FT/ TL]Rolf, also known as
[P3D-ATA]Rolf, formerly known as [P3D]Rolf |
|
|
|
So much to go through. Didn't know if you had these official BOINC speeds posted for the GTX 460. I have the 1GB model at 605 GFLOPS peak and the 768MB (Superclock from EVGA) at 684GFLOPS peak. I hope this helps. Let me know if you need/want more info.
____________
|
|
|
skgivenVolunteer moderator Volunteer tester
Send message
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level
Scientific publications
|
The update is just to include the more recent cards.
Relative Comparison of Recommended Cards, with approximated CC Correction Factor values (in brackets):
GTX 590 GF110 40nm Compute Capable 2.0 2488 BoincGFlops peak (3359)
GTX 580 GF110 40nm Compute Capable 2.0 1581 BoincGFlops peak (2134)
GTX 570 GF110 40nm Compute Capable 2.0 1405 BoincGFlops peak (1896)
GTX 480 GF100 40nm Compute Capable 2.0 1345 BoincGFlops peak (1816)
GTX 295 GT200b 55nm Compute Capable 1.3 1192 BoincGFlops peak (1669)
GTX 470 GF100 40nm Compute Capable 2.0 1089 BoincGFlops peak (1470)
GTX 465 GF100 40nm Compute Capable 2.0 855 BoincGFlops peak (1154)
GTX 560 Ti GF114 40nm Compute Capable 2.1 1263 BoincGFlops peak (1136)
GTX 285 GT200b 55nm Compute Capable 1.3 695 BoincGFlops peak (973)
GTX 275 GT200b 55nm Compute Capable 1.3 674 BoincGFlops peak (934)
GTX 260-216 GT200b 55nm Compute Capable 1.3 596 BoincGFlops peak (834)
GTX 460 GF104 40nm Compute Capable 2.1 907 BoincGFlops peak 768MB (816)
GTX 460 GF104 40nm Compute Capable 2.1 907 BoincGFlops peak 1GB (816)
GTX 550 Ti GF116 40nm Compute Capable 2.1 691 BoincGFlops peak (622)
GTS 450 GF106 40nm Compute Capable 2.1 601 BoincGFlops peak (541)
This update is based on the previous table produced, and is by performance, highest first.
I have only included CC1.3, CC2.0 and CC2.1 cards, but the comparison is originally based on CC1.1 cards.
Only reference specs are listed, and only for cards optimized by the recommended methods. As usual, there are accuracy limitations and its lifetime is limited by the apps/drivers in use. New cards were just added, rather than a new survey made. Comparable but different systems (CPUs) were used, not all cards used the same drivers, and only one task type was looked at. Some of these comparisons are adapted from when we used the 6.11 app but the correction factors are still valid.
At some stage I will try to look at these cards again, running the long tasks 6.13, to produce a new relative performance table.
Correction Factors Used
CC1.1 = 1.00
CC1.2 = 1.30
CC1.3 = 1.40
CC2.0 = 1.35
CC2.1 = 0.90
Thanks for the posts, |
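To make the table above easier to check, here is the arithmetic behind the bracketed values as a small sketch (a handful of cards only; corrected value = Boinc GFlops peak × the CC correction factor listed above, so the odd value may differ by a point from the table through rounding).

cc_factor = {1.1: 1.00, 1.2: 1.30, 1.3: 1.40, 2.0: 1.35, 2.1: 0.90}

cards = [
    # (name, compute capability, Boinc GFlops peak)
    ("GTX 590", 2.0, 2488),
    ("GTX 580", 2.0, 1581),
    ("GTX 295", 1.3, 1192),
    ("GTX 460 1GB", 2.1, 907),
    ("GTS 450", 2.1, 601),
]

for name, cc, peak in sorted(cards, key=lambda c: c[2] * cc_factor[c[1]], reverse=True):
    print(f"{name}: {peak} GFlops peak -> ~{peak * cc_factor[cc]:.0f} corrected")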
|
|
jlhalSend message
Joined: 1 Mar 10 Posts: 147 Credit: 1,077,535,540 RAC: 0 Level
Scientific publications
|
1st system Win 7 Pro x64:
Gigabyte GTX460 1GB
GeForce GTX 460 (driver version 27061, CUDA version 4000, compute capability 2.1, 962MB , 641 GFLOPS peak)
2nd system Xubuntu 11.04 AMD64:
Gigabyte GTX460SO 1GB
GeForce GTX 460 (driver version unknown , CUDA version 4000, compute capability 2.1, 1024MB , 730 GFLOPS peak)
____________
Lubuntu 16.04.1 LTS x64 |
|
|
jlhalSend message
Joined: 1 Mar 10 Posts: 147 Credit: 1,077,535,540 RAC: 0 Level
Scientific publications
|
1st system Win 7 Pro x64:
Gigabyte GTX460 1GB
GeForce GTX 460 (driver version 27061, CUDA version 4000, compute capability 2.1, 962MB , 641 GFLOPS peak)
2nd system Xubuntu 11.04 AMD64:
Gigabyte GTX460SO 1GB
GeForce GTX 460 (driver version unknown , CUDA version 4000, compute capability 2.1, 1024MB , 730 GFLOPS peak)
Minor precisions:
1st system Win 7 Pro x64: Gigabyte GTX460OC 1GB
GeForce GTX 460OC(driver version 2.70.61, CUDA version 4000, compute capability 2.1, 962MB (?) , 641 GFLOPS peak)
2nd system Xubuntu 11.04 AMD64: Gigabyte GTX460SO 1GB
GeForce GTX 460 SO(driver version 270.41.06 , CUDA version 4000, compute capability 2.1, 1024MB , 730 GFLOPS peak)
____________
Lubuntu 16.04.1 LTS x64 |
|
|
|
I suspect that the 962 MB is 1 GB minus whatever Windows 7 has reserved for its own use. |
|
|
|
I run GPUgrid on a GT 430 under Ubuntu 10.04.2. WUs take about 32 hrs. so there's no problem getting them returned in time.
Thu 02 Jun 2011 07:48:48 AM CDT NVIDIA GPU 0: GeForce GT 430 (driver version unknown, CUDA version 4000, compute capability 2.1, 1023MB, 45 GFLOPS peak)
|
|
|
skgivenVolunteer moderator Volunteer tester
Send message
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level
Scientific publications
|
Hi Kate,
45 GFlops peak for a GT430 does not sound right, more like your Ion ;)
A GT430 should be reporting as about 180 GFlops peak, and that's after considering it can't use all the cuda cores, otherwise it would be about 270.
Perhaps you are using a very old version of Boinc (for Linux systems we cannot see the Boinc version)? |
|
|
|
Hi skgiven,
I'm using BOINC 6.10.17, so not all that old. Maybe I'll upgrade to 6.10.58 one of these days and see whether that makes a difference in the GFLOPs calculation.
I noticed on an earlier message in this thread that somebody was reporting 179 GFLOPs for this card. I had been startled when I first ran BOINC on it and saw the 45 GFLOPs (you're right -- not much different from my ION 2, which shows 39). The GT 430 runs GPUgrid WUs a little over 3 times as fast as the ION 2, so there is indeed something wrong with the GFLOPs calculation for the 430. It's slow -- but not THAT slow.
In any case, I'm pretty happy with the GT 430 for my purposes. It's not going to set any speed records, but it runs cool (nvidia-smi is showing it at 54 C right now while working at 98% capacity on a GPUgrid WU), and doesn't draw much power. It's double-precision, so I can run Milky Way. And it's just fine for testing and debugging CUDA code for Fermi cards.
Kate |
|
|
|
I'm using BOINC 6.10.17, so not all that old. Maybe I'll upgrade to 6.10.58 one of these days and see whether that makes a difference in the GFLOPs calculation.
Kate
Yes, that will explain it. The corrected GFLOPs calculation for Fermi-class cards (assuming 32 cores per multiprocessor, instead of 8) wasn't introduced until v6.10.45 - your version 6.10.17 dates back to October 2009, which is long before details of the Fermi range were available.
Note that your GT 430 actually has 48 cores per MP, so the calculation will still be a bit on the low side. BUT: - that only affects the cosmetic display of speed in the message log at startup. It doesn't affect the actual processing speed in any way. |
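To make the cores-per-MP point concrete, here is a minimal sketch of that style of estimate, assuming the GT 430 reference figures of 2 multiprocessors and a 1400MHz shader clock; it reproduces the 45, 179 and ~269 GFLOPS figures mentioned in this thread.

def peak_gflops(mps, cores_per_mp, shader_clock_ghz):
    # Boinc-style estimate: cores x shader clock x 2 flops per clock
    return mps * cores_per_mp * shader_clock_ghz * 2

gt430_mps, gt430_clock_ghz = 2, 1.4   # assumed reference values

print("Old client (8 cores/MP): ", peak_gflops(gt430_mps, 8, gt430_clock_ghz))    # ~45
print("Fixed client (32/MP):    ", peak_gflops(gt430_mps, 32, gt430_clock_ghz))   # ~179
print("Actual GF108 (48/MP):    ", peak_gflops(gt430_mps, 48, gt430_clock_ghz))   # ~269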
|
|
|
Yes, that will explain it. The corrected GFLOPs calculation for Fermi-class cards (assuming 32 cores per multiprocessor, instead of 8) wasn't introduced until v6.10.45 - your version 6.10.17 dates back to October 2009, which is long before details of the Fermi range were available.
Note that your GT 430 actually has 48 cores per MP, so the calculation will still be a bit on the low side. BUT: - that only affects the cosmetic display of speed in the message log at startup. It doesn't affect the actual processing speed in any way.
Thanks for the clear explanation. So BOINC v6.10.17 calculates correctly for the ION (which does have only 8 cores per MP) but not for the GT 430. Well, this gives me a reason (if only cosmetic) to upgrade to 6.10.58. |
|
|
|
It's double-precision, so I can run Milky Way.
You can - just don't do it, if you can help it ;)
Your "not-so-fast" card only delivers 1/12th of its SP performance under DP. Any ATI card is going to walk all over it. I think GPU-Grid and Einstein can make much better use of this (SP) power.
MrS
____________
Scanning for our furry friends since Jan 2002 |
|
|
DagorathSend message
Joined: 16 Mar 11 Posts: 509 Credit: 179,005,236 RAC: 0 Level
Scientific publications
|
Fri 03 Jun 2011 02:34:07 PM MDT Starting BOINC client version 6.12.26 for x86_64-pc-linux-gnu
.
.
.
Fri 03 Jun 2011 02:34:07 PM MDT NVIDIA GPU 0: GeForce GTX 570 (driver version unknown, CUDA version 3020, compute capability 2.0, 1279MB, 1425 GFLOPS peak)
|
|
|
skgivenVolunteer moderator Volunteer tester
Send message
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level
Scientific publications
|
The update is just to add three cards (GTX 560 Ti 448, GTX 560 and GT 545)
Relative Comparison of Recommended Cards, with approximated CC Correction Factor values (in brackets):
GTX 590 GF110 40nm Compute Capable 2.0 2488 GFlops peak (3359)
GTX 580 GF110 40nm Compute Capable 2.0 1581 GFlops peak (2134)
GTX 570 GF110 40nm Compute Capable 2.0 1405 GFlops peak (1896)
GTX 560 Ti 448 GF110 40nm Compute Capable 2.0 1311 GFlops peak (1770)
GTX 480 GF100 40nm Compute Capable 2.0 1345 GFlops peak (1816)
GTX 295 GT200b 55nm Compute Capable 1.3 1192 GFlops peak (1669)
GTX 470 GF100 40nm Compute Capable 2.0 1089 GFlops peak (1470)
GTX 465 GF100 40nm Compute Capable 2.0 855 GFlops peak (1154)
GTX 560 Ti GF114 40nm Compute Capable 2.1 1263 GFlops peak (1136)
GTX 285 GT200b 55nm Compute Capable 1.3 695 GFlops peak (973)
GTX 560 GF114 40nm Compute Capable 2.1 1075 GFlops peak (967)
GTX 275 GT200b 55nm Compute Capable 1.3 674 GFlops peak (934)
GTX 260-216 GT200b 55nm Compute Capable 1.3 596 GFlops peak (834)
GTX 460 GF104 40nm Compute Capable 2.1 907 GFlops peak 768MB (816)
GTX 460 GF104 40nm Compute Capable 2.1 907 GFlops peak 1GB (816)
GTX 550 Ti GF116 40nm Compute Capable 2.1 691 GFlops peak (622)
GTS 450 GF106 40nm Compute Capable 2.1 601 GFlops peak (541)
GT 545 GF116 40nm Compute Capable 2.1 501 GFlops peak (451)
This update is based on the previous table, and is ordered by performance after considering compute capability, highest first. I have only included CC1.3, CC2.0 and CC2.1 cards, but the comparison was originally based on CC1.1 cards. The bracketed value is simply the GFlops peak multiplied by the correction factor (see the short sketch after the factor list below).
Only reference clocks are listed, and only for cards optimized by the recommended methods. As usual, there are accuracy limitations and the table's lifetime is limited by the apps/drivers in use. New cards were added based on GFlops peak and CC, rather than resurveying different cards (this has been demonstrated to be reliable). Comparable but different systems (CPUs) were used, not all cards used the same drivers, and only one task type was looked at. Some of these comparisons are adapted from when we used the 6.11 app, but the correction factors are still valid.
When a new app is released I will review these cards/values for consistency; normally the cards' relative performances don't change too much, but on at least 2 occasions things did change in the past. For CC2.0 and CC2.1 it's less likely. When the next generation of NVidia turns up things will probably start changing, and obviously an ATI app would add a few cards.
Correction Factors Used
CC1.1 = 1.00
CC1.2 = 1.30
CC1.3 = 1.40
CC2.0 = 1.35
CC2.1 = 0.90
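For anyone who wants to reproduce the bracketed figures, a minimal Python sketch; only the GFlops peak values and factors quoted above are used, the card list is just a subset for illustration, and the output matches the table to within a point of rounding:

# CC correction factors from the list above
cc_factor = {"1.3": 1.40, "2.0": 1.35, "2.1": 0.90}

# (card, compute capability, GFlops peak) - a few rows from the table above
cards = [("GTX 590", "2.0", 2488), ("GTX 580", "2.0", 1581),
         ("GTX 295", "1.3", 1192), ("GTX 560 Ti", "2.1", 1263), ("GTS 450", "2.1", 601)]

# Corrected value = GFlops peak * CC factor, i.e. the number shown in brackets
for name, cc, gflops in sorted(cards, key=lambda c: c[2] * cc_factor[c[1]], reverse=True):
    print(name, round(gflops * cc_factor[cc]))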
Thanks for the posts,
____________
FAQ's
HOW TO:
- Opt out of Beta Tests
- Ask for Help |
|
|
|
ATI means only GCN 1D Cards?
____________
|
|
|
|
ATI is not clear yet, since the development of the app is not finished.
MrS
____________
Scanning for our furry friends since Jan 2002 |
|
|
nenymSend message
Joined: 31 Mar 09 Posts: 137 Credit: 1,308,230,581 RAC: 0 Level
Scientific publications
|
I did a comparison of long-run tasks crunched by a GTX560Ti CC2.1 (host ID 31329, 875MHz) and a GTX570 (host ID 101638, 730MHz). The host with the GTX570 is running Linux, all cores free for GPUGRID, SWAN_SYNC set. The host with the GTX560Ti is running Win XP x64, no CPU core free, SWAN_SYNC not used.
Task type ............ Run time (GTX560Ti / GTX570) ..... Theoretical RAC (GTX560Ti / GTX570) ..... Ratio (run time)
PAOLA ................ 61 500 / 35 600 .................. 126 400 / 219 000 ....................... 1.74
NATHAN_CB1 ........... 25 700 / 14 700 .................. 120 400 / 210 500 ....................... 1.75
GIANNI ............... 54 200 / 30 000 .................. 116 200 / 201 000 ....................... 1.8
NATHAN_FAX4 .......... 74 800 / 34 000 ................... 82 470 / 181 440 ....................... 2.2
What to think about it? Do PAOLA, NATHAN_CB1 and GIANNI tasks use all the CUDA cores of the GTX560Ti CC2.1, or is something else going on? |
|
|
skgivenVolunteer moderator Volunteer tester
Send message
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level
Scientific publications
|
The GTX560Ti has 384 shaders, of which only 256 are usable by GPUGrid, due to the science requirements/project's app and the GPU architecture.
A reference GTX560Ti has its GPU @ 822MHz.
A reference GTX570 has its GPU at 732MHz and 480 shaders.
(480 × 732) / (256 × 822) = 1.67. Your slightly higher ratio can be explained by SWAN_SYNC being used on Linux.
All this has been known for some time, hence my table with correction factors:
Correction Factors Used
CC1.1 = 1.00
CC1.2 = 1.30
CC1.3 = 1.40
CC2.0 = 1.35
CC2.1 = 0.90
1.35/0.9 = 1.5, i.e. only 2/3 of the shaders are usable.
Take GPU frequency into the equation and you can see it all adds up:
822/732 = 1.12, and 1.5 × 1.12 = 1.68
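As a quick cross-check, the same arithmetic as a small Python sketch, using only the shader counts and reference clocks quoted above:

# GTX570: 480 shaders at a 732MHz reference GPU clock, all usable here (CC2.0)
# GTX560Ti: 384 shaders at an 822MHz reference GPU clock, only 256 usable here (CC2.1)
print(round((480 * 732) / (256 * 822), 2))    # ~1.67, from usable shaders x GPU clock

# The same thing via the correction factors: 1.35/0.90 = 1.5, scaled by the GPU clock ratio
print(round((1.35 / 0.90) * (822 / 732), 2))  # ~1.68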
____________
FAQ's
HOW TO:
- Opt out of Beta Tests
- Ask for Help
|
|
|
|
Is it time for another update of this table, especially with a new batch of cards out to replace most of the 200 series cards (which are increasingly hard to find)? |
|
|
skgivenVolunteer moderator Volunteer tester
Send message
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level
Scientific publications
|
The GeForce 600 series cards are about 60% more efficient in terms of performance per Watt, compared to the GeForce 500 series.
There are several issues.
The range is quite limited - we only have 3 high end cards (690, 680 and 670). These are all CC3.0 and all expensive.
At the low end there is the GT 640 (a recent release), and some GK OEM cards.
We are still lacking the mid range cards, which tend to make up a large portion of cards here. When they turn up we might be in a position to start comparing performances and price.
The present cards have a variety of clock rates, making relative comparisons somewhat vague, and I cannot see the GPU frequency, as it is not reported in the task's 'Stderr output' file when run with the 4.2 app, and the 600 cards can't run tasks on the 3.1 app.
The old issue of system setup is still there, but now has an additional dimension; the same GPU in a low end system with a low clock CPU is going to underperform compared to a well optimized system with a high end CPU. Now we also have to consider PCIE3 vs PCIE2 performance. Thus far this has not been done here (anyone with a PCIE3 system, please post up PCIE3 vs PCIE2 performances).
There is also the issue of so many different task types. The performance of each varies when comparing the 500 to 600 series cards. I think it's likely that GPUGrid will continue to run many experiments, so there will continue to be a variety of tasks with different performances. This will make a single table more difficult and may necessitate maintenance for accuracy.
____________
FAQ's
HOW TO:
- Opt out of Beta Tests
- Ask for Help |
|
|
|
In the past there hasn't been much of a performance difference between PCIe 2 at 16x and 8x lanes, so I suspect 16 lanes at PCIe 2 or 3 won't matter much either. We'd "need" significantly faster GPUs (or smaller WUs) to make the difference count more.
MrS
____________
Scanning for our furry friends since Jan 2002 |
|
|
skgivenVolunteer moderator Volunteer tester
Send message
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level
Scientific publications
|
Going back to the GTX200 series, that was the unmeasured speculation - x16 vs x8 didn't matter. With the Fermi cards it was measured. Zoltan said he saw a ~10% drop in performance from PCIE2 x16 to x8 on a GTX480, for some tasks. For x4 vs x16 the drop was more like 20%. ref. I think I eventually came to a similar conclusion and did some measurements.
With a new app, new performances, and new cards it's worth a revisit. That said, present tasks utilize the GPU more on the new app. I think GPU memory might be better utilized for some tasks and, generally speaking, the CPU is used less. While tasks will change, this suggests PCIE matters less at present. How much that offsets the increase in GF600 GPU performance remains to be seen/measured.
What I'm not sure about is the controller. How much actual control does it have over the lanes? Can it allocate more than 8 lanes when other lanes are free; is a 16 lane slot absolutely limited to 8 lanes when the first slot is occupied, for example (on an x16, x8 board but both having x16 connectors)?
____________
FAQ's
HOW TO:
- Opt out of Beta Tests
- Ask for Help |
|
|
|
What I'm not sure about is the controller. How much actual control does it have over the lanes? Can it allocate more than 8 lanes when other lanes are free; is a 16 lane slot absolutely limited to 8 lanes when the first slot is occupied, for example (on an x16, x8 board but both having x16 connectors)?
It could be limited by the motherboard as well, but according to ark.intel.com, both Ivy Bridge CPU lines, the socket 1155 CPUs and the socket 2011 CPUs, have only one PCIe 3.0 x16 lane, configurable in three variations: 1x16, 2x8, or 1x8 & 2x4. There are simply not enough pins on the CPU socket to support more PCIe lanes. So if a MB had two (or more) real PCIe 3.0 x16 connectors, it would have to have one (or more) PCIe 3.0 x16 bridge chips (like the one the GTX 690 has on board). I don't know how the AMD CPUs support PCIe 3.0, but I think they are the same. |
|
|
skgivenVolunteer moderator Volunteer tester
Send message
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level
Scientific publications
|
I thought 2011 had 40 PCIE3 lanes? That should be enough for 2*16. I know 1155 can only support 1GPU@x16 or two @x8, even if it is PCIE3 (by Bios hack/upgrade), though the on die IB effort should be faster.
AMD CPU's don't support PCIE3 at all! Only their GPU's are PCIE3.
____________
FAQ's
HOW TO:
- Opt out of Beta Tests
- Ask for Help |
|
|
|
Sorry, I've put (and I've taken in consideration) a bad link in my previous post:
The Intel Core i7-3770K is actually socket 1155 too, so it has only one PCIe 3.0 x16 lane. That's my fault.
Socket 2011 CPUs (Core i7-3820, Core i7 3930K, Core i7-3960X) have two x16, and one x8 lane.
Here is a PDF explaining Socket 2011. See page 10. |
|
|
5potSend message
Joined: 8 Mar 12 Posts: 411 Credit: 2,083,882,218 RAC: 0 Level
Scientific publications
|
Unfortunately 2011 is no longer (with SB) officially PCIe 3.0. Apparently there's a problem with the timing in the controller between different batches of chips.
What this means is that it's all about luck whether or not your CPU + GPU will accept the hack or not. This has been confirmed by NVIDIA.
Would post the link, but their forums have been down for some time now for maintenance. |
|
|
|
Is it time to add the 600 series cards to this list? 690s are EXPENSIVE, and difficult to find in stock at some dealers. 680s and 670s seem to be readily available and folks are reporting success in running all these cards on GPUGrid.
As we have discussed in private messages, compared with a used GTX480 at about $200, a new GTX670 at about $400 or a new GTX680 at about $500 is an extremely expensive upgrade that, until we can quantify the performance increase, MAY NOT SEEM WORTH THE PRICE. |
|
|
|
Ok I went back and saw your earlier reply to this same question, and somewhat understand your reluctance to generate and maintain the data as there are several other factors involved other than simply the GPU card.
Still, an effort at this - with the caveats that PCIE3.0 vs PCIE2.0, older versus newer machines, and relative CPU performance levels all push the results one way or the other - could benefit everyone who is considering an upgrade. |
|
|
|
Why not list the GTX 6nn data available currently, but clearly mark it preliminary? That should at least be better than no data at all. |
|
|
JStateson Send message
Joined: 31 Oct 08 Posts: 186 Credit: 3,400,138,164 RAC: 866,012 Level
Scientific publications
|
The update is just to add three cards (GTX 560 Ti 448, GTX 560 and GT 545)
Relative Comparison of Recommended Cards, with approximated CC Correction Factor values (in brackets):
GTX 590 GF110 40nm Compute Capable 2.0 2488 GFlops peak (3359)
GTX 580 GF110 40nm Compute Capable 2.0 1581 GFlops peak (2134)
GTX 570 GF110 40nm Compute Capable 2.0 1405 GFlops peak (1896)
GTX 460 GF104 40nm Compute Capable 2.1 907 GFlops peak 768MB (816)
GTX 460 GF104 40nm Compute Capable 2.1 907 GFlops peak 1GB (816)
If I understand the above, then would it be fair to state that a dual gtx460 like this would not be as efficient as a single gtx570?
ie: 816 x 2 is less than 1896.
So if the gtx460 has 336 cores, then only about 2/3 of them are used in gpugrid? The gtx570 has 480 cores. Are all 480 used? Why is there a statement in this thread that only 256 cores can be used? Both eVga boards are about the same price. I was thinking about getting that dualie 460, but not if the 570 is clearly better. By better I mean 2x as fast according to your CC corrected performance.
How do your statistics compare to prime or milkyway?
thanks for looking! |
|
|
skgivenVolunteer moderator Volunteer tester
Send message
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level
Scientific publications
|
A single reference GTX570 was around 16% faster than two reference GTX460's, when I did the table. The situation hasn't changed much, though we are using a different app, and running different tasks now.
I don't like the look of that extra long bespoke GTX460 dualie; it would be restricted to EATX cases and the like.
As far as I know, none of the other projects, including PG and MW, suffer from the same issues with CC2.1 cards. ATI/AMD cards are much better for MW anyway.
A GTX570 uses all 480 cores here (CC2.0). If a GTX670 is too expensive, wait a couple of weeks and get a GTX660Ti if it checks out. Even though the GTX570 is a good card and you might be able to get it for a good price, in a couple of weeks the GTX660Ti should prove to be a better card in terms of performance per outlay and running cost.
____________
FAQ's
HOW TO:
- Opt out of Beta Tests
- Ask for Help |
|
|
JStateson Send message
Joined: 31 Oct 08 Posts: 186 Credit: 3,400,138,164 RAC: 866,012 Level
Scientific publications
|
A GTX570 uses all 480cores here (CC2.0). If a GTX670 is too expensive wait a couple of weeks and get a GTX660Ti if it checks out.
I had a gtx670 for 24 hours before passing it to my son. It would not run DVDFab's bluray copy program unlike the gtx570 or the gtx460 or any earlier CUDA boards. The nVidia forum has been down for almost a month so I am in the dark as to why it would not work and DVDFab had no time frame for getting their product to work with kepler. I have a single slot "Galaxy Razor" gtx460 that overheats badly in an EATX case even after underclocking it all the way down. eVga's dualie is going for 230 after rebate so I may go for it.
Thanks! |
|
|
|
I have a single slot "Galaxy Razor" gtx460 that overheats badly in an EATX case even after underclocking it all the way down. eVga's dualie is going for 230 after rebate so I may go for it.
If your single GPU GTX 460 overheats, a dual GPU GTX 460 will overheat even faster, because both cards exhaust the hot air inside the case and the latter has a higher TDP. I think DVDFab will support the GTX 6xx very soon, so don't buy outdated technology for crunching (or you'll pay the price difference in electricity costs). |
|
|
JStateson Send message
Joined: 31 Oct 08 Posts: 186 Credit: 3,400,138,164 RAC: 866,012 Level
Scientific publications
|
A single-slot GPU is not the same as a single GPU. The "Galaxy Razor" takes up exactly 1 PCIe slot and its fan simply blows air around the inside of the case. The dual slot ones exhaust the air thru the 2nd slot (usually but not always) and have much more efficient cooling.
I once saw my Galaxy Razor selling for 4x what I paid for it on eBay. It is also possible to plug it into a notebook's "express card" slot as shown here. It works fine for gaming but 24/7 CUDA is a problem with PrimeGrid even at normal clock speed.
The 2x card I may replace it with will put out more heat, as you say, but I suspect it has more efficient cooling.
I do have a 3 slot gtx570, made by Asus, and it runs really cool 24/7 on any boinc project. However, it does take 3 slots.
I have an extensive collection of movies, mostly Blu-ray, and the nVidia GPUs make a significant difference in copy speed. |
|
|
|
A single-slot GPU is not the same as a single GPU.
I know.
The dual slot ones exhaust the air thru the 2nd slot (usually but not always) and have much more efficient cooling.
As you say, this is true for radial fan type cooling like the EVGA GeForce GTX 570's, which blows the air in line with the PCB, directly out of the case. The EVGA GeForce GTX 460 2Win, by contrast, has 3 axial fans blowing air through the heatsink's fins at right angles to the PCB. In addition it has a very limited rear exhaust grille (it's more than what the single-slot card has, but you need positive air pressure inside your case to move the hot air through that little grille, because the 3 fans do not move the air towards it).
The 2x card I may replace it with will put out more heat, as you say, but I suspect it has more efficient cooling.
Of course the dual slot cooler removes the heat from the chip more efficiently than a single slot cooler, but not from the case (if it isn't the radial type). After a short time it can't cool itself with the hot air emitted into the case. You have to have fans blowing cool air from the outside directly at this GPU's 3 coolers to make them work well, because they don't move the hot air out of the case. It will also make your CPU and PSU run hotter.
I do have a 3 slot gtx570, made by Asus, and it runs really cool 24/7 on any boinc project. However, it does take 3 slots.
This should be the ENGTX570 DCII, which has a far better cooler fin surface to power ratio, and a rear air exhaust grille 3 times larger than the EVGA GeForce GTX 460 2Win's. |
|
|
|
Don't buy GTX460 for GPU-Grid now, it's simply not efficient enough any more for 24/7 crunching. I'd rather think about replacing the cooler on your current card with some aftermarket design. That will probably also use 3 slots, but be strong & quiet.
MrS
____________
Scanning for our furry friends since Jan 2002 |
|
|
M_MSend message
Joined: 11 Nov 10 Posts: 9 Credit: 53,476,066 RAC: 0 Level
Scientific publications
|
Currently, best price/performance/watt would be GTX660Ti, which has same 1344 shaders as GTX670 but is around 20% cheaper... |
|
|
|
Would the lower memory bandwidth of the 660Ti (144GB/s, 192 bit) compared to the 670 (198GB/s, 256 bit) make a difference to the performance on GPUGrid?
|
|
|
|
After a short time it can't cool itself with the hot air emitted into the case. You have to have fans blowing cool air from the outside directly at this GPU's 3 coolers to make them work well, because they don't move the hot air out of the case. It will also make your CPU and PSU run hotter.
Cases are dumb... http://www.skipsjunk.net/gallery/texas12.html
____________
- da shu @ HeliOS,
"A child's exposure to technology should never be predicated on an ability to afford it." |
|
|
|
So far GPU-Grid hasn't depended much on memory bandwidth, so the performance difference should be a few percent at most, probably less. I don't have hard numbers at hand, though.
MrS
____________
Scanning for our furry friends since Jan 2002 |
|
|
|
Currently, best price/performance/watt would be GTX660Ti, which has same 1344 shaders as GTX670 but is around 20% cheaper...
So in the real world, with everything else being equal, will the gtx 660Ti finish the same type Wu as fast as the gtx 670?
How is that for a dumb and simple question?
|
|
|
|
Currently, best price/performance/watt would be GTX660Ti, which has same 1344 shaders as GTX670 but is around 20% cheaper...
So in the real world, with everything else being equal, will the gtx 660Ti finish the same type Wu as fast as the gtx 670?
How is that for a dumb and simple question?
Not dumb at all. If not the same or faster (the 660Ti is clocked faster than a 670) then it will be very close to it. Especially if you overclock the memory a bit to get some of the bandwidth back.
My understanding is that GPUGrid does not pass that much information over the PCIE lanes, so I would be surprised if a 660Ti ends up being much if any slower than a 670.
http://www.gpugrid.net/result.php?resultid=6043519 - my only result with my 660Ti so far - I note though the CPU time and GPU time are the same, not sure what is going on with that.....?
|
|
|
|
So in the real world, with everything else being equal, will the gtx 660Ti finish the same type Wu as fast as the gtx 670?
A very valid question. But like Simba I'll only go as far as stating "very close".
The problem is finding configurations which are similar enough to judge this. OS, driver, PCIe speed, CPU speed and even other running applications affect the actual runtimes (to a varying degree). And the real elephant in the room: GPU clock speed, which isn't reported by BOINC.
MrS
____________
Scanning for our furry friends since Jan 2002 |
|
|
skgivenVolunteer moderator Volunteer tester
Send message
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level
Scientific publications
|
The reporting is also somewhat down to the app; it's reported, albeit badly, by 3.1 but not by 4.2:
a 3.1 run,
Stderr output
<core_client_version>7.0.28</core_client_version>
<![CDATA[
<stderr_txt>
# Using device 0
# There is 1 device supporting CUDA
# Device 0: "GeForce GTX 470"
# Clock rate: 1.36 GHz
# Total amount of global memory: 1341718528 bytes
# Number of multiprocessors: 14
# Number of cores: 112
MDIO: cannot open file "restart.coor"
# Time per step (avg over 1500000 steps): 22.644 ms
# Approximate elapsed time for entire WU: 33966.235 s
called boinc_finish
</stderr_txt>
]]>
a 4.2 run
Stderr output
<core_client_version>7.0.28</core_client_version>
<![CDATA[
<stderr_txt>
MDIO: cannot open file "restart.coor"
# Time per step (avg over 3750000 steps): 7.389 ms
# Approximate elapsed time for entire WU: 27707.531 s
called boinc_finish
</stderr_txt>
]]>
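If anyone wants to pull these figures out of a batch of task pages automatically, here is a minimal Python sketch; it only assumes the '# key: value' lines shown in the two dumps above, nothing about the apps themselves:

import re

def parse_stderr(text):
    # Extract the figures printed in Stderr dumps like the two above; missing keys are simply absent.
    patterns = {
        "clock_ghz": r"Clock rate:\s*([\d.]+)\s*GHz",            # only the 3.1 app prints this
        "ms_per_step": r"Time per step[^:]*:\s*([\d.]+)\s*ms",
        "elapsed_s": r"elapsed time for entire WU:\s*([\d.]+)\s*s",
    }
    out = {}
    for key, pat in patterns.items():
        m = re.search(pat, text)
        if m:
            out[key] = float(m.group(1))
    return out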
I had a look at Zoltan's GTX670 times, as he reported his clocks, and compared them to my 660Ti times over several different task types. His system was consistently 9.5% faster. The first thing I would look at here is XP vs W7; I'm not sure what the difference is now - it used to be >11% better for XP. Assuming that is still the case, and there is no noticeable performance gain from PCIE3 over PCIE2 x16, then I would say the cards perform very closely here; the GTX670 is about 2% faster after adjusting for the clocks. So the 670 would just about edge it in overall throughput, but the 660Ti uses slightly less power and costs a fair amount less. Of course if XP isn't >11% faster than W7, then the 670 is significantly faster...
____________
FAQ's
HOW TO:
- Opt out of Beta Tests
- Ask for Help |
|
|
|
Yeah, this makes judging real world performance harder, since we seldom know the real clock speed a GPU is running at.
MrS
____________
Scanning for our furry friends since Jan 2002 |
|
|
Beyond Send message
Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0 Level
Scientific publications
|
Don't buy GTX460 for GPU-Grid now, it's simply not efficient enough any more for 24/7 crunching. I'd rather think about replacing the cooler on your current card with some aftermarket design. That will probably also use 3 slots, but be strong & quiet. MrS
Are the GPUGrid apps still only using 2/3 of the GTX 460 shaders, or has that been fixed?
Edit: Must be fixed since the GPU utilization is showing as 95% with 6 CPU WUs running (on an X6). Big improvement. |
|
|
|
Are the GPUGrid apps still only using 2/3 of the GTX 460 shaders, or has that been fixed?
Edit: Must be fixed since the GPU utilization is showing as 95% with 6 CPU WUs running (on an X6). Big improvement.
It can't be fixed by GPUGrid, because it comes from the chip's superscalar architecture (= there are 50% more shaders than shader feeding units on a CC2.1 and CC3.0 chip). GPU utilization is not equivalent to shader utilization; most likely it is showing the utilization of the shader feeding units. The good news is that you can overclock the shaders (of a CC2.1 card) more, because the power consumption inflicted by the GPUGrid client on the whole shader array is less than the maximum. |
|
|
|
Zoltan is right. Although it could theoretically be fixed by some funny re-writing of the code, the result would likely be slower overall.
MrS
____________
Scanning for our furry friends since Jan 2002 |
|
|
|
Another mobile card for comparison:
2.7.2013 1:14:29 | | CUDA: NVIDIA GPU 0: GeForce GTX 560M (driver version 320.49, CUDA version 5.50, compute capability 2.1, 1536MB, 1244MB available, 625 GFLOPS peak)
|
|
|
skgivenVolunteer moderator Volunteer tester
Send message
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level
Scientific publications
|
Firstly note that this set of data is from WUprop and has flaws, but it at least doesn't appear to include erroneous data. It won't be as good as accurate reports from crunchers of their specific cards running apps on similar setups.
Long runs on GeForce 600 series GPU's, Windows app
GPU Computation time (minutes) (min - max) CPU usage (min-max)
NVIDIA GeForce GT 610 10,256.3 (7,321.1-13,191.4) 1.8% (1.3%-2.3%)
NVIDIA GeForce GT 630 4,672.1 (4,672.1-4,672.1) 9.7% (9.7%-9.7%)
NVIDIA GeForce GT 640 2,032.9 (1,713.9-2,839.3) 94.8% (17.1%-99.8%)
NVIDIA GeForce GTX 650 1,725.0 (1,622.7-2,047.0) 99.2% (98.6%-99.6%)
NVIDIA GeForce GTX 650Ti 1,237.7 (518.5-1,914.5) 91.7% (58.8%-99.9%)
NVIDIA GeForce GTX 660 784.6 (352.9-1,045.9) 97.3% (47.6%-100.3%)
NVIDIA GeForce GTX 660Ti 659.5 (312.9-1,348.0) 99.2% (83.0%-102.4%)
NVIDIA GeForce GTX 670 593.9 (455.3-992.8) 98.6% (90.7%-100.2%)
NVIDIA GeForce GTX 680 595.8 (471.4-899.8) 98.4% (80.3%-101.2%)
Long runs on GeForce 500 series cards Windows app
NVIDIA GeForce GTX 550Ti 1,933.7 (1,510.4-2,610.4) 14.8% (3.0%-23.3%)
NVIDIA GeForce GTX 560 1,253.3 (1,090.0-1,820.8) 20.3% (6.0%-27.3%)
NVIDIA GeForce GTX 560Ti 1,001.7 (710.2-2,011.6) 18.4% (6.4%-37.1%)
NVIDIA GeForce GTX 570 870.6 (691.5-1,743.7) 20.2% (5.5%-36.3%)
NVIDIA GeForce GTX 580 711.0 (588.8-1,087.6) 18.8% (9.2%-32.5%)
As this is 'Windows' it includes ~25% XP systems and 75% Vista+W7+W8
The GT 610 and 630 are Fermi, the rest of the 600's are GK.
Long runs on GeForce 500 series GPU's, Linux app
NVIDIA GeForce GTX 570 797.9 (712.8-966.7) 15.7% (8.5%-18.8%)
NVIDIA GeForce GTX 580 351.3 (351.3-351.3) 5.3% (5.3%-5.3%)
Long runs on GeForce 600 series GPU's, Linux app
NVIDIA GeForce GTX 650Ti 1106.2 (986.9-1324.4) 97.7% (94.5%-98.5%)
NVIDIA GeForce GTX 650TiBOOST 774.6 (769.2-780.5) 99.1% (98.8%-99.4%)
NVIDIA GeForce GTX 660 718.5 (651.3-874.1) 89.6% (86.1%-95.1%)
NVIDIA GeForce GTX 660Ti 587.1 (541.2-717.2) 94.9% (90.9%-99.6%)
NVIDIA GeForce GTX 670 533.9 (494.6-639.1) 99.4% (98.7%-99.7%)
NVIDIA GeForce GTX 680 471.9 (450.1-562.4) 98.7% (97.2%-99.5%)
This data will include different types of work from the Long queue. It's probably skewed by the misreporting of GPU's (when there are two GPU's in a system only one is reported but it's reported twice).
____________
FAQ's
HOW TO:
- Opt out of Beta Tests
- Ask for Help |
|
|
|
Hi, good work, very interesting, congratulations.
This helps with the GPU changes I'm preparing; going by these results (if I've read them correctly), on Linux the GTX 580 is the best by a landslide, and Linux is the best platform. Greetings. |
|
|
skgivenVolunteer moderator Volunteer tester
Send message
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level
Scientific publications
|
Going by the results below, on Linux a GTX660 is almost as fast as a GTX580 (roughly 4% slower; a quick averaging of the run times is sketched after the task lists), but costs less to buy and less to run:
http://www.gpugrid.net/hosts_user.php?sort=expavg_credit&rev=0&show_all=0&userid=25378
GTX580 v GTX660
GTX580:
I92R6-NATHAN_KIDKIXc22_2-0-41-RND9338_0 4565518 4 Jul 2013 | 4:35:14 UTC 4 Jul 2013 | 19:59:48 UTC Completed and validated 38,258.75 2,732.66 138,300.00 Long runs (8-12 hours on fastest card) v6.18 (cuda42)
I57R9-NATHAN_KIDKIXc22_2-0-41-RND6129_0 4565155 3 Jul 2013 | 21:57:40 UTC 4 Jul 2013 | 13:01:29 UTC Completed and validated 37,291.86 2,869.94 138,300.00 Long runs (8-12 hours on fastest card) v6.18 (cuda42)
I20R7-NATHAN_KIDKIXc22_1-7-41-RND9392_0 4564152 3 Jul 2013 | 11:34:28 UTC 4 Jul 2013 | 2:37:22 UTC Completed and validated 37,936.45 2,874.42 138,300.00 Long runs (8-12 hours on fastest card) v6.18 (cuda42)
GTX660:
I63R3-NATHAN_KIDKIXc22_2-0-41-RND7987_0 4565212 4 Jul 2013 | 3:43:54 UTC 4 Jul 2013 | 19:38:15 UTC Completed and validated 39,537.31 39,500.98 138,300.00 Long runs (8-12 hours on fastest card) v6.18 (cuda42)
I38R7-NATHAN_KIDKIXc22_2-0-41-RND4496_0 4564951 3 Jul 2013 | 23:05:39 UTC 4 Jul 2013 | 15:02:38 UTC Completed and validated 39,331.53 39,307.00 138,300.00 Long runs (8-12 hours on fastest card) v6.18 (cuda42)
I21R5-NATHAN_KIDKIXc22_2-0-41-RND8033_0 4564771 3 Jul 2013 | 16:44:48 UTC 4 Jul 2013 | 8:38:10 UTC Completed and validated 39,480.50 39,443.40 138,300.00 Long runs (8-12 hours on fastest card) v6.18 (cuda42)
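A quick averaging of those run times (a small Python sketch, using only the six figures listed above):

gtx580 = [38258.75, 37291.86, 37936.45]   # run times (s) of the three GTX580 tasks above
gtx660 = [39537.31, 39331.53, 39480.50]   # run times (s) of the three GTX660 tasks above

avg580 = sum(gtx580) / len(gtx580)
avg660 = sum(gtx660) / len(gtx660)
print(round(avg660 / avg580 - 1, 3))      # ~0.043, i.e. the GTX660 is roughly 4% slower here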
The WUProp results below are probably misleading (contain some oddities),
Long runs on GeForce 500 series GPU's, Linux app
NVIDIA GeForce GTX 570 797.9 (712.8-966.7) 15.7% (8.5%-18.8%)
NVIDIA GeForce GTX 580 351.3 (351.3-351.3) 5.3% (5.3%-5.3%)
____________
FAQ's
HOW TO:
- Opt out of Beta Tests
- Ask for Help |
|
|
|
Hello: I'm sorry, but I don't understand these latest data; I think they have changed. The GTX 580 figures don't add up.
On the other hand, I hadn't considered the much higher CPU usage of the GTX600 series compared to the almost zero usage of the GTX500 series.
Edit: Now I see that they do add up, thanks |
|
|
skgivenVolunteer moderator Volunteer tester
Send message
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level
Scientific publications
|
Yeah, I accidentally listed results from a 650TiBoost on Linux (16% slower) that I was using to check against just in case there was any issue with the results below (as two cards were in use).
I've cleared the list up now, thanks.
____________
FAQ's
HOW TO:
- Opt out of Beta Tests
- Ask for Help |
|
|
|
Hello: I want to stress the high CPU consumption (90-100%) of the GTX600 cards, against the 5-30% CPU of the GTX500 cards.
The extra wear and energy consumption surely cancels out the energy efficiency advantage of the 600 series. |
|
|
skgivenVolunteer moderator Volunteer tester
Send message
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level
Scientific publications
|
You are certainly correct that a GK600 card will use 1 CPU core/thread while a GF500 will only use ~15% of a CPU core/thread, however the power difference does not cancel itself out.
I've measured this and GF400 and GF500 cards use ~75% of their TDP at GPUGrid while the GK600 cards use ~90%.
So, a GTX580 will use ~244W × 0.75 = 183W and a GTX660 will use 140W × 0.9 = 126W. The difference is 57W. The difference between my i7-3770K using 6 CPU threads and 7 CPU threads is ~4W to 7W (dependent on the app). So by running a GTX660 you save at least 50W. I'm not belittling the loss of a CPU thread (though it's actually ~85% of a thread, and perhaps mostly polling) as I know it can be used to crunch CPU projects; however, on multicore processors the more of the CPU you use, the smaller the increase in CPU performance you get (diminishing returns).
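To make the arithmetic explicit, a small Python sketch; the TDPs, the measured TDP fractions and the 4-7W CPU thread figure are all taken from this post:

gtx580_draw = 244 * 0.75      # ~183W at GPUGrid load (GF400/GF500 run at ~75% of TDP)
gtx660_draw = 140 * 0.90      # ~126W at GPUGrid load (GK600 runs at ~90% of TDP)
gpu_saving = gtx580_draw - gtx660_draw

cpu_penalty = 7               # worst case of the measured 4-7W for the extra CPU thread
print(round(gpu_saving), round(gpu_saving - cpu_penalty))   # 57, 50 -> "at least 50W" saved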
____________
FAQ's
HOW TO:
- Opt out of Beta Tests
- Ask for Help |
|
|
|
SK is right here: especially with HT you're not losing a full core. Depending on the project, that additional logical core would only have increased CPU throughput by 15 - 30% of a physical core. Hence the low additional power used.
MrS
____________
Scanning for our furry friends since Jan 2002 |
|
|
skgivenVolunteer moderator Volunteer tester
Send message
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level
Scientific publications
|
So, for potential buyers seeking advice, is it already possible to make a side-by-side comparison between GK104's and GK110's in points per Watt?
A Titan is roughly 50% faster than a GTX680.
GTX680 has a TDP of 195W and the Titan's TDP is 250W. That said you would really need people measuring their power usage and posting it along with run times and clocks.
____________
FAQ's
HOW TO:
- Opt out of Beta Tests
- Ask for Help |
|
|
TrotadorSend message
Joined: 25 Mar 12 Posts: 103 Credit: 13,920,977,393 RAC: 1,390,776 Level
Scientific publications
|
A Titan is roughly 50% faster than a GTX680.
GTX680 has a TDP of 195W and the Titan's TDP is 250W. That said you would really need people measuring their power usage and posting it along with run times and clocks.
A GTX690 has a TDP of 300W and it provides around 100% more work than a GTX680. So it would have better point per watt ratio than a Titan, am I correct? |
|
|
skgivenVolunteer moderator Volunteer tester
Send message
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level
Scientific publications
|
A GTX690 does around 80% more work than a GTX680; each GTX690 core is about 90% as fast as a GTX680.
Assuming it used 300W (or an equal proportion of it compared to the other GK104's and the GK110 GPU's) and a GTX Titan used 250W, then it's reasonably accurate to say that a GTX690 is overall equally as efficient as a Titan (for crunching here, in terms of performance per Watt).
GTX680 - 100% for 195W (0.51)
Titan - 150% for 250W (0.60)
GTX690 - 180% for 300W (0.60)
If a GTX680 used 195W to do a certain amount of crunching, the Titan or the GTX690 would do the same amount of work for about 167W.
Both the Titan and the GTX690 are about 17% more efficient than the GTX680.
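The same performance-per-Watt arithmetic as a small Python sketch; the relative throughput and TDP figures are the ones quoted above, with TDP standing in for actual draw:

# Relative throughput (GTX680 = 1.0) and TDP in Watts, from the figures above
cards = {"GTX680": (1.00, 195), "GTX Titan": (1.50, 250), "GTX690": (1.80, 300)}

baseline = cards["GTX680"][0] / cards["GTX680"][1]
for name, (perf, tdp) in cards.items():
    print(name, round(perf / tdp / baseline, 2))   # 1.0, 1.17, 1.17 (efficiency relative to the GTX680)

print(round(250 / 1.5))   # ~167W: what a Titan (or GTX690) needs to match a 195W GTX680's output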
To bring in the purchase factor,
In the UK a GTX690 costs the same as a Titan (~£770), but would do 20% more work. A GTX780 on the other hand costs ~£500, but might bring around 90% of the performance of the Titan (would need someone to confirm this).
____________
FAQ's
HOW TO:
- Opt out of Beta Tests
- Ask for Help |
|
|
MJHProject administrator Project developer Project scientist Send message
Joined: 12 Nov 07 Posts: 696 Credit: 27,266,655 RAC: 0 Level
Scientific publications
|
I've measured our 4xTitan E5 systems at about 1000W (4.5A rms @ 220Vac) under ACEMD load. (GPUs generally don't hit max power when running ACEMD).
MJH |
|
|
skgivenVolunteer moderator Volunteer tester
Send message
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level
Scientific publications
|
The GK104 GPU's tend to operate at around 90% of their TDP, but there is variation; my GTX660 is presently at 97% TDP and my GTX660Ti is at 88% (both running NOELIA Betas).
Taking an estimated 100W off for the other components (CPU, motherboard, HDD and RAM), that's 900W at the wall. So with a 90% efficient PSU the GPU's are drawing just over 200W each - more like 80% of their TDP. That's definitely a bit less than the GK104's, and closer to the ~75% of TDP that the Fermis use.
Anyway, the benefits of the Titan are FP64, faster single GPU performance and better cooling (as in, you could realistically use 4 in the one system).
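A small sketch of the back-of-envelope figures above; the 100W for the other components and the 90% PSU efficiency are the estimates from this post, not measurements:

wall_power = 1000       # W, measured for the 4x Titan E5 system under ACEMD
other_parts = 100       # W, estimated CPU + motherboard + HDD + RAM
psu_efficiency = 0.90   # assumed
titan_tdp = 250         # W

per_gpu = (wall_power - other_parts) * psu_efficiency / 4
print(round(per_gpu), round(100 * per_gpu / titan_tdp))   # ~202W each, ~81% of TDP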
____________
FAQ's
HOW TO:
- Opt out of Beta Tests
- Ask for Help |
|
|
flashawkSend message
Joined: 18 Jun 12 Posts: 297 Credit: 3,572,627,986 RAC: 0 Level
Scientific publications
|
On my rigs, 2 GTX680's with an FX-8350 with all 8 cores at 100% pulls 630 watts water cooled. When they were on air, they were using 640 watts.
When I'm using only 2 cores to feed the GPU's, the computer used 521 watts on water and 530 watts on air. |
|
|
|
On my rigs, 2 GTX680's with an FX-8350 with all 8 cores at 100% pulls 630 watts water cooled. When they were on air, they were using 640 watts.
When I'm using only 2 cores to feed the GPU's the computer used 521 on water and 530 watts on air
That seems like a lot of power. Is it the 680s or the 8350 that makes it that high? Just curious. |
|
|
flashawkSend message
Joined: 18 Jun 12 Posts: 297 Credit: 3,572,627,986 RAC: 0 Level
Scientific publications
|
The 8350 has a TDP of 125w while the 680 is 195w x2 |
|
|
|
In games Titan typically uses ~200 W, which matches pretty well what we estimated here. GTX680 has a TDP of 195 W, but a power target of 170 W. That's why in games it consumes typically 170 W - the turbo mode boosts just high enough to hit that number, but not any higher. So if your card is cooled well and doesn't hit the maximum turbo bin, you know it's hitting 170 W. If it's not boosting at all it's above 170 W.
@Flashhawk: I hope you've got at least an 80+ Gold PSU in that box? Otherwise such an investment should pay for itself quickly.
MrS
____________
Scanning for our furry friends since Jan 2002 |
|
|
flashawkSend message
Joined: 18 Jun 12 Posts: 297 Credit: 3,572,627,986 RAC: 0 Level
Scientific publications
|
They all have Seasonic 1050 watt Gold PSU's (just upgraded the last two), I haven't had any problems with them yet. The only problem I've had is the 36 hour TDR bug on one machine, I was pretty baffled over that one until I heard about it in another forum. |
|
|
|
It would be appreciated if you guys could update the OP with all the newer cards. |
|
|
skgivenVolunteer moderator Volunteer tester
Send message
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level
Scientific publications
|
I'm working on it...
____________
FAQ's
HOW TO:
- Opt out of Beta Tests
- Ask for Help |
|
|
ZarckSend message
Joined: 16 Aug 08 Posts: 145 Credit: 328,473,995 RAC: 0 Level
Scientific publications
|
On other Boinc project websites there is a performance ranking of GPUs; why not on GPUGRID?
http://setiathome.berkeley.edu/gpu_list.php
http://registro.ibercivis.es/gpu_list.php
http://albert.phys.uwm.edu/gpu_list.php
http://boinc.freerainbowtables.com/distrrtgen/gpu_list.php
http://boinc.thesonntags.com/collatz/gpu_list.php
Etc.
@+
*_*
____________
|
|
|
skgivenVolunteer moderator Volunteer tester
Send message
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level
Scientific publications
|
This has been discussed at GPUGrid and privately during the last site revamp.
The GPUGrid site is based on and adapted from a Boinc template, so it has the same template page, http://www.gpugrid.net/gpu_list.php
The problem is the way the performance is calculated - the results are junk. If you used them they would mislead you into believing some older cards were much better than they are. GPUGrid doesn't link to the page, to prevent people being misled.
Anyone buying a GPU should choose from the GK110 cards (GTX Titan, GTX780), or the GK104 GTX700 and GTX600 ranges,
FAQ - Recommended GPUs for GPUGrid crunching - It's up to date.
If you have an older card and it's on the list you can still use it, but we would not recommend buying an older card to specifically use for here.
____________
FAQ's
HOW TO:
- Opt out of Beta Tests
- Ask for Help |
|
|
|
In addition to what SK said: look at SETI, there is
6. (0.348) ATI Radeon HD 5500/5600 series (Redwood)
7. (0.345) ATI Radeon HD 5700 series (Juniper)
8. (0.275) ATI Radeon HD 5800 series (Cypress)
Which totally doesn't make sense at all. An HD5870 has about 7.7 times the raw horse power of a HD5550. And most of this difference should translate over to Seti real world performance. So the Seti list is off by a factor of 9.8, an entire order of magnitude. Pure junk. It would be much more useful to just list the maximum single precision flops, as imperfect as they are.
MrS
____________
Scanning for our furry friends since Jan 2002 |
|
|
|
Like anything else, the statistics are only as good as the data put into it.
After seeing this discussion, I went and implemented this page over at PrimeGrid. PrimeGrid diverged from the standard BOINC server many years ago, so updating it with new standard features is more work than it should be. I can't just load the latest and greatest from git.
I used Milkyway's version of this page as the basis for PrimeGrid's (it's the first version I found on Google.) Therefore, I'm not sure if what I found is common to all BOINC projects or just them, but I found two major errors in the way the statistics were produced.
The first was that the first result for each GPU was added to the list differently from all the rest. Its time was divided by the fpops estimate while all the subsequent result times weren't. They all should be divided. The second error -- far more serious -- was that ALL results were included. That includes faulty results, which may return with only a few seconds of elapsed time. That's going to skew the data significantly if a GPU with a bad driver or something similar is returning hundreds of bad results. Changing the DB query to only read validated results solves that problem.
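A sketch of those two fixes in plain Python, over an illustrative list of result records; the field names here are made up for illustration, the real BOINC tables differ:

from collections import defaultdict

def gpu_speed_table(results):
    # results: dicts with hypothetical fields "gpu_model", "elapsed_s", "fpops_est", "validated"
    per_gpu = defaultdict(list)
    for r in results:
        if not r["validated"]:                 # drop faulty/invalid results entirely
            continue
        # normalise every result's elapsed time by its task's fpops estimate
        per_gpu[r["gpu_model"]].append(r["elapsed_s"] / r["fpops_est"])
    # lower normalised time = faster GPU
    return sorted(((gpu, sum(t) / len(t)) for gpu, t in per_gpu.items()), key=lambda x: x[1])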
The end result looks very accurate. http://www.primegrid.com/gpu_list.php
You'll notice that on our PPS Sieve application, the Nvidia GPUs are a lot faster than the AMD GPUs. This is an integer calculation, and team green is giving team red a sound thrashing on that application.
The two GFN projects, however, are all double precision floating point, and AMD really shines there. The $300 7970 GPUs run at about the same speed as the $600 GTX TITAN.
The numbers shown in those tables seem to be an accurate representation of actual speed.
This assumes that you have accurate fpops estimates for all workunits. We've got fairly good fpops estimates at PrimeGrid, so the results of this GPU analysis are likely pretty good. If, however, the workunits are such that it's difficult or impossible to predict the fpops you're not going to get a good analysis of the GPU speeds.
This isn't intended to be an argument; I'm merely pointing out that it may be possible for you to produce an accurate version of the table. I wholeheartedly agree that an inaccurate table is worse than useless.
Mike |
|
|
jlhalSend message
Joined: 1 Mar 10 Posts: 147 Credit: 1,077,535,540 RAC: 0 Level
Scientific publications
|
Hi Michael and everybody !
Just one silly question: why are bi-GPU cards (i.e. 690 and 590) treated as single GPUs?
Does this just take into account the performance of a single chip on the card?
In my view, maybe the performance should be calculated on a "GPU card" basis as a whole?
Cheers
____________
Lubuntu 16.04.1 LTS x64 |
|
|
|
Hi Michael and everybody !
Just one silly question: why are bi-GPU cards (i.e. 690 and 590) treated as single GPUs?
Does this just take into account the performance of a single chip on the card?
In my view, maybe the performance should be calculated on a "GPU card" basis as a whole?
Cheers
If you're playing a game, multiple GPUs (whether on one card or many cards) can be combined together with SLI or Crossfire to increase game performance. However, for crunching, neither CUDA nor OpenCL provide the ability to combine the computational abilities.
Although in theory it's possible for the application to be specially written to use multiple GPUs simultaneously, it's a lot more complicated -- and sometimes it's close to impossible.
So, from BOINC's perspective, a 690 is no different than a pair of 680. There's no easy way to use it as a single GPU for crunching.
The statistics page is dynamically generated directly from the results database; it doesn't have any knowledge of dual GPUs, overclocking, etc. As far as the server is concerned, a host with a 690 GPU has a pair of single-chip 690s rather than a single dual-chip 690.
|
|
|
jlhalSend message
Joined: 1 Mar 10 Posts: 147 Credit: 1,077,535,540 RAC: 0 Level
Scientific publications
|
So, from BOINC's perspective, a 690 is no different than a pair of 680.
I agree, this is what we can see in hosts descriptions...
BOINC knows the card type, and there are not so many bi-GPU cards, so the performance could be calculated on a per-card basis.
Of course there is an exception for those using only 1 chip of the 2: an odd number of GPUs between the brackets in the description.
____________
Lubuntu 16.04.1 LTS x64 |
|
|
skgivenVolunteer moderator Volunteer tester
Send message
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level
Scientific publications
|
However, for crunching, neither CUDA nor OpenCL provide the ability to combine the computational abilities.
Although in theory it's possible for the application to be specially written to use multiple GPUs simultaneously, it's a lot more complicated -- and sometimes it's close to impossible.
My understanding is that outside the Boinc environment the ACEMD app can facilitate the use of multiple GPU's for the one WU - but correct me if I'm wrong.
____________
FAQ's
HOW TO:
- Opt out of Beta Tests
- Ask for Help |
|
|
|
My understanding is that outside the Boinc environment the ACEMD app can facilitate the use of multiple GPU's for the one WU - but correct me if I'm wrong.
Yes, I remember this as well. But anyhow, I think it makes sense to treat dual GPU cards in the same way as others and leave it to the user to figure out that they've got double the throughput if they actually use both chips. It's the same as "BOINC can't tell you how fast your CPU will be, unless it knows how many cores you are going to use."
@Michael: thanks for joining the discussion! Your data certainly looks far more sane than what we usually see on such pages. Did your bug fixes already find their way back into the BOINC root branch? It certainly sounds like a vast improvement for everyone :)
However, there are still quite a few values in that table which don't make sense: GTX680 faster than Titan and GTX780, GTX660Ti faster than GTX670, GTX760 faster than.. you get the point. From quickly looking at these values there seems to be at least about 10% noise in the data. As you said, "the statistics are only as good as the data put into it". So for comparisons within one chip generation I'd rather revert to maximum SP-Flops and DP-Flops. And when in doubt, maybe because cards differ significantly in memory bandwidth, look for actual user provided data (because there the actual clock speeds etc. are known).
But for comparing different card generations and vendors, or to get the broad picture, this table is certainly useful! In those cases the maximum Flops are not worth much.
MrS
____________
Scanning for our furry friends since Jan 2002 |
|
|
|
@Michael: thanks for joining the discussion! You data certainly looks far more sane than what we usually see on such pages. Did your bug fixes already find their way back into the BOINC root branch? It certainly sounds like a vast improvement for everyone :)
Short answer, no. I didn't take it from the BOINC repository, so I'm not at all sure how close what I have is to the latest and greatest. I'd be happy to share if anyone wants to look at it. (I wasn't even aware of this page until I read about it in this thread.)
However, there are still quite a few values in that table which don't make sense: GTX680 faster than Titan and GTX780, GTX660Ti faster than GTX670, GTX760 faster than.. you get the point. From quickly looking at these values there sseems to be about at least 10% noise in the data. As you said "the statistics are only as good as the data put into it". So for comparisons with one chip generation I'd rather revert to maximum SP-Flops and DP-Flops. And when in doubt, maybe because cards differ significantly in memory bandwidth, look for actual user provided data (because here the actual clock speeds etc. are known).
Indeed, I've noticed that too. Some of the unusual results are correct, some are noise. I added a disclaimer up at top that the results should be taken with a grain of salt. Kepler cards (6xx), for example, are horrible on our Genefer app and correctly rank below the 5xx (and maybe 4xx) cards. That's accurate. Some rankings aren't correct. The results are normalized against the fpops estimate, and those were adjusted earlier this year. Some tasks take months to validate, so there may be some WUs still in there with bad fpops, and that would artificially inflate the speed of some GPUs. It's not perfect, but it's pretty good. (The three apps where PrimeGrid uses GPUs are either all integer math, or all double precision apps, whereas games use single precision math. The GPUs are designed for gaming, so the speed of the GPUs in our apps doesn't always correlate to the published and expected speeds.)
(I lowered the threshold for minimum data points per GPU so that more GPUs would be included in the list, but that may allow more noise into the data. Perhaps adding a noise filter such as skipping any test that's more than x% away from the average would help too.)
What I find most useful about that page is that it helps answer a fairly common question I get: "How come my brand new laptop is estimating 60 days for XXX?". Most people don't realize how huge a difference there is in GPU speeds, and expect the GT x20 GPU (or worse, GT x20M GPU) to be fast because it's new. On the chart, they can see that it's 20 times slower than the "big boys". They can look at the chart and quickly get a good sense of which ones are fast and which are slow -- and that a "slow" GPU is VERY slow. |
|
|
|
Another source of error in the data is that BOINC only reports the first GPU if a system contains more than one GPU from the same vendor. Tasks that run on 'hidden' GPUs will count as running on the GPU that is visible. Even more confusing, if the task is suspended and restarted, BOINC will often start it on a different GPU, so the run time will be some combination of the speeds on each GPU.
Also, the run time is gathered from the result table and the GPU type from the host table. If the GPU has been upgraded since the result was returned, the speed will be erroneously attributed to the new GPU.
It's not a perfect analysis. |
|
|
|
Hello: The final performance of a task run under BOINC is also heavily influenced by the CPU the GPU is paired with.
A GTX770 mounted with an FX8350 is not comparable to a GTX780 paired with a Phenom II. |
|
|
|
With all the discussions on card performance, I started collecting data from the tasks page and putting it together for my own personal interest. What are the differences generation to generation, card to card, clock speeds, cuda, etc. The new debugging helped a lot to know which card data came from. So before posting the analysis, here are the conditions:
* Windows Only, Long Tasks Only, No Beta Tasks, No SDOERR Tasks.
* Any task that paused and restarted on the same card type was kept (e.g. GTX680, Pause, Restart GTX680). Even if it was a different physical card. If the task restarted on a different card (e.g. GTX680, Pause, GTX460), it was eliminated.
* Memory size was grouped/changed when the sizes were effectively the "same". So all 1023MB entries were changed to 1024MB to keep the number of combinations down.
* Tried to get at least 30 runs for each card for a fairly valid comparison.
* All Analysis Y axis is GPU Run Time (sec).
So here is the overall filtered data set.
As noted before, the 768MB cards take a hit. It was not just the NOELIA runs; the SANTI and NATHAN batches also had runs which slowed significantly. From this point forward, all 768MB cards are filtered out. Here is the same chart less the 768MB cards.
While reviewing the data within Compute Capability classes, Device Clock had the biggest impact. The 2.1 cards' mix of 4xx and 5xx had odd results. The 3.5 cards had very little variation as I only pulled from one or two users. Here are the results for just the 2.0 and 3.0 cards.
For looking at interactions, here are just the 3.0 capability cards. This is the part of the data set where I had the most items per group, so it shows the interactions best.
Here you can see the strong influence of device clock. Memory width only showed a strong interaction in this group, which I am assuming is due to the 650Ti having a 128 bit width and skewing the review. Memory clock has a little impact, the NOELIA's are the longest runs, and CUDA version and the ACEMD Long app version had no significant impacts.
Regards,
Jeremy |
|
|
|
What shocked me at first look at your graphs is that the GTX 770 is slower at higher clocks than the GTX 680 (while these two cards are basically the same), so some other factor (CPU, CPU clock, RAM clock, PCIe bandwidth) must be skewing the data they're based on. |
|
|
|
I looked at just the 680 and 770 data separately. The 680 data came from both XP32bit and XP64bit. The 770 data was Win7 64 bit only. That was the only face value difference I could see. One other thing on the Device Clock reporting. So for my GTX460 that I have set to 820MHz
# Device clock : 1640MHz
GPU-Z shows Default at 1520 and actual as 1640.
For my GTX680 that I have set to a +26MHz
# Device clock : 1124MHz
GPU-Z shows Default as 1124, but actual as 1150MHz.
Strange....I had looked at this a few days ago and had thought even on the 680 it was reporting the actual. Guess not.
On top of this, this GTX680 is staying at a Boost of ~1201MHz most of the time.
So I do not know how the GTX770 is performing, but there is definitely some wiggle room in the device clock reporting accuracy.
So my current data set is confounded between Device Clock and OS.
|
|
|
|
Nice stats, but you must assign the 560Ti with 1280MB memory to the 2.0 cards, because it is a 570 (nearly in performance too) with some (defective?) deactivated shader cores.
____________
DSKAG Austria Research Team: http://www.research.dskag.at
|
|
|
skgivenVolunteer moderator Volunteer tester
Send message
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level
Scientific publications
|
A nice overall look at field performance variation.
You did well to remove the GTX460 from the chart - there are actually 5 different versions of the GTX460 with native performance variation of 601 to 1045 GFlops and bandwidth from 86 to 115GB/s.
460 (Jul ’10) 1GB 336:56:32 – let’s call this the basal reference model.
460 SE (Nov ’10) 228 shaders
460 OEM (Oct ‘10) As reference but reduced clocks
460 (Jul ’10) 768MB, 336:56:24
460 v2 (Sep ’11) higher clocks, 336:56:24
So, GPU identification may well be the issue in some cases, and as pointed out, especially the GTX560Ti (384 vs 448, and CC2.1 vs CC2.0).
Last time I looked, and every time for years, there was an ~11% performance difference between Linux or XP and Vista/W7/W8.
If you just look at GTX670's there is a performance range of ~10% between the reference models and the top bespoke cards.
Different system architectures could also influence performance by around the same amount; an average CPU with DDR2 and PCIE 1.1 say at X4 would cause a drop from a top system of around 10% or so.
Then there is downclocking, overclocking, recoverable errors...
____________
FAQ's
HOW TO:
- Opt out of Beta Tests
- Ask for Help |
|
|
|
dskagcommunity, That was not confusing at all. :)
I see I had pulled data from two different computers which as you noted had different memory configs also. Just changed the below capability after you pointed it out.
Card_______Mem___Capability__Device clock___Mem clock___Mem width
GTX560Ti___1280___2_________1520_________1700_______320
GTX560Ti___1024___2.1________1644_________2004_______256
http://www.gpugrid.net/result.php?resultid=7267807
http://www.gpugrid.net/result.php?resultid=7274793
Initially I did not take the capability from the posts since I was getting the info manually by review of work units. Had later gone to http://en.wikipedia.org/wiki/CUDA, but got it a bit wrong for this mixture.
Thank you.
|
|
|
TJSend message
Joined: 26 Jun 09 Posts: 815 Credit: 1,470,385,294 RAC: 0 Level
Scientific publications
|
Yes absolutely a neat overview.
Two remarks from me though. If I see correctly, the 770 is only a little faster than the 660; in my case the 770 is doing almost 35% better than my 660. But I have a lot of problems with some tasks on the 660.
The GPU clock in the stderr output file is not the same as the actual clock speed while running. I have watched this closely over the last weeks: for most WUs my 770 runs faster than the reported clock and the 660 lower, when watching GPU-Z, Asus GPU Tweak and Precision X.
____________
Greetings from TJ |
|
|
skgivenVolunteer moderator Volunteer tester
Send message
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level
Scientific publications
|
Yes absolutely a neat overview.
Two remarks from me though. If I see correctly, the 770 is only a little faster than the 660; in my case the 770 is doing almost 35% better than my 660. But I have a lot of problems with some tasks on the 660.
The presented data is an overview of what's happening. You cannot use such measurements to accurately determine relative GPU performances - there are too many ways a system or configuration can throw the results off. If your data set for each GPU type is small (fewer than several hundred cards) it's inherently prone to such variation; three bad setups would significantly skew the results if the data set was 10 or 20 cards per type. Micro-analysing overview results is fickle and subject to continuous change, but it can help in understanding things. It might be useful to remove the obviously erroneous results, i.e. the downclocks and bad setups.
The GPU clock in the stderr output file is not the same as the actual clock speed while running. I have watched this closely over the last weeks: for most WUs my 770 runs faster than the reported clock and the 660 lower, when watching GPU-Z, Asus GPU Tweak and Precision X.
Even for one type of GPU there is a range of around 10% performance variation from various vendors, and that's without individuals overclocking their factory-overclocked cards. The clock speed when crunching is the boost speed (for those cards that do boost). This is typically higher at GPUGrid than the average boost and often the max boost speed. It is different from what is quoted in the stderr file. My GTX660Ti operates at around 1200MHz (without messing with it), but the stderr file reports it as operating at 1110MHz, which is the manufacturer's quoted boost speed. So: base core clock 1032MHz, boost clock 1111MHz, actual clock 1200MHz when crunching. There is an 8% difference between what I think is the boost clock (1111MHz) when using full power, and the running clock (which rises until the power target is reached or something like heat or reliable voltage stops it rising). Anyway, the actual boost is subject to change (voltage, temps...) and some bespoke cards have a higher boost delta than others.
I'm not sure whether the quoted reference GFlops are taken from the core clock, the average boost, or the 'max' boost. Whichever way it is taken, comparisons would still be slightly inaccurate.
- just for ref - GTX660; Base Clock 980MHz, Boost Clock 1033MHz, Actual clock 1071MHz.
____________
FAQ's
HOW TO:
- Opt out of Beta Tests
- Ask for Help |
|
|
TJSend message
Joined: 26 Jun 09 Posts: 815 Credit: 1,470,385,294 RAC: 0 Level
Scientific publications
|
I'm not sure whether the quoted reference GFlops are taken from the core clock, the average boost, or the max boost. Whichever way it is taken, comparisons would still be slightly inaccurate.
That is what I would say.
Comparison is overall quite hard; as you said yourself, it depends on the setup (and what counts as a good setup versus a wrong/faulty one), the type of OS, RAM, cooling and so on.
Then a 650 is not the same as a 780. Very likely the 780 will be faster, but is it more efficient? That depends on how you look at it and what you are comparing.
If you want to compare scientifically, you need exactly the same setup with only one variable factor, and that is the graphics card. The WUs need to be exactly the same as well. That can never be done in the BOINC world.
Nevertheless, the overview Jeremy has made gives nice insight.
____________
Greetings from TJ |
|
|
skgivenVolunteer moderator Volunteer tester
Send message
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level
Scientific publications
|
Another minor factor here is that different types of WU utilize the MCU to different extents. This means that running WUs that heavily utilize the MCU on memory-bandwidth-constrained cards would reduce their relative performance, while running WUs that don't tax the MCU as much would make such MCU-constrained cards appear faster. The problem with this situation is that we don't know which type of WU GPUGrid will be running in the future, so it's difficult to know which card will be relatively better in terms of performance per cost. I've seen the MCU load on my GTX660Ti as high as 44% and also down in the low 30's.
If you take a GTX660Ti and a GTX660, the theoretical performance difference is 30.7% for reference cards; however, the actual performance difference is more like 20%. Both cards have a bandwidth of 144.2GB/s but the 660 has fewer shaders to feed, so its bandwidth per shader is higher. My 660Ti also boosts much higher than a reference 660, 1202MHz vs 1071MHz, so this exacerbates the bandwidth limitation.
My MCU loads are 38% and 28% while running the Beta WUs. Roughly speaking, the GTX660Ti appears to be taking a 10% hit due to its bandwidth restrictions.
____________
FAQ's
HOW TO:
- Opt out of Beta Tests
- Ask for Help |
|
|
|
Nice work, Jeremy! I'm wondering how to make the best use of this. I don't know what you're doing exactly, but do you think it would be feasible to implement this as part of the GPU-Grid homepage, provided the project staff themselves are interested? I'd imagine data polling maybe once a day and then presenting the user different filters on what to show. The obvious benefit would be that this is live data, so it's by definition current and doesn't have to be updated manually (just checked for sanity every now and then).
And some comments regarding your current work. I'd prefer "projected RAC" or "maximum RAC" or "maximum credits per day" instead of run times for the following reasons:
- WUs differ in runtime and get their credits adjusted accordingly. By looking only at task lengths you introduce unnecessary noise into your data, because you neglect how much work the WUs contained (which the credits tell you)
- to take this into account you'd measure "credits per time interval", so just use 1 day, as this coincides with what's used for the RAC we all know
- on the current scale we can see the differences between slow cards, but everything faster just looks like a wash in this plot; I'd argue that the performance of the current cards is actually more important
-> invert the y-value, credits per time instead of time
And while showing scatter plots is nice to get a quick feeling for the noise of a data set, they're bad for presenting averages of tightly clustered data. We could use "value +/- standard deviation" instead.
I suppose the OS difference could be filtered out: either generate separate graphs for Win XP + Linux and Win Vista+7+8 or normalize for any of them (use that 11% difference SK mentioned). This and calculating RAC should make your data significantly more precise. And everyone please remember that we're not shooting for the moon here, but rather "numbers as good as we can get them automatically".
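To make that concrete, here is a minimal sketch (plain Python, with hypothetical field names rather than any real GPUGrid export format) of turning per-task records into credits per day and reporting mean +/- standard deviation per card model; the 11% OS factor is the rough figure mentioned above, not a measured constant:

    # Minimal sketch: per-card "credits per day" with mean +/- standard deviation.
    # Field names (card, os, credit, run_seconds) are hypothetical examples,
    # not the actual GPUGrid export format.
    from statistics import mean, stdev
    from collections import defaultdict

    OS_FACTOR = {"WinXP": 1.0, "Linux": 1.0, "Win7": 1.11}  # assumed ~11% penalty for Vista/7/8

    tasks = [
        {"card": "GTX 660",   "os": "Win7",  "credit": 150000, "run_seconds": 30000},
        {"card": "GTX 660",   "os": "Linux", "credit": 180000, "run_seconds": 28000},
        {"card": "GTX 650Ti", "os": "Win7",  "credit": 150000, "run_seconds": 46000},
    ]

    per_card = defaultdict(list)
    for t in tasks:
        credits_per_day = t["credit"] / t["run_seconds"] * 86400
        # Normalise Vista/7/8 results up to the XP/Linux level (rough, assumed factor).
        per_card[t["card"]].append(credits_per_day * OS_FACTOR.get(t["os"], 1.0))

    for card, values in sorted(per_card.items()):
        spread = stdev(values) if len(values) > 1 else 0.0
        print(f"{card}: {mean(values):,.0f} +/- {spread:,.0f} credits/day (n={len(values)})")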
And for the GTX650 Ti you can see two distinct groups. I suppose one is the GTX650 Ti and the other one the GTX650 Ti Boost. Hurray for nVidia's marketing team coming up with these extremely sane and intuitive names... I'm not sure there's anything you could do to separate the results of these two different cards.
MrS
____________
Scanning for our furry friends since Jan 2002 |
|
|
TJSend message
Joined: 26 Jun 09 Posts: 815 Credit: 1,470,385,294 RAC: 0 Level
Scientific publications
|
- WUs differ in runtime and get their credits adjusted accordingly. By looking only at task lengths you introduce unnecessary noise into your data, because you neglect how much work the WUs contained (which the credits tell you)
You probably know better than I do, ETA, how the credit system works, but I see the following:
A Noelia WU uses the card better than the Nathan, Santi and SDoer WUs at the moment, with 95-97% GPU usage; it is faster (by around 3000 seconds in my case) but gives more credit. So task length cannot be used on its own for comparison here, but neither can run time.
____________
Greetings from TJ |
|
|
jlhalSend message
Joined: 1 Mar 10 Posts: 147 Credit: 1,077,535,540 RAC: 0 Level
Scientific publications
|
Good job indeed, everybody who gathered the information for comparison.
One suggestion: there is a CPU benchmark in BOINC, why not a GPU benchmark?
____________
Lubuntu 16.04.1 LTS x64 |
|
|
|
Good job indeed, everybody who gathered the information for comparison.
One suggestion: there is a CPU benchmark in BOINC, why not a GPU benchmark?
The CPU benchmarks are a joke. They don't give you a good indication of how well a CPU runs a real application. Different CPUs have different strengths and weaknesses, and some apps will run great on CPU X and terribly on CPU Y, while other apps will have completely opposite behavior.
With GPUs it's much worse than with CPUs, as the architectures of different GPUs are completely different and the performance on different types of math varies a lot more than it does with CPUs. A standard BOINC GPU benchmark would therefore be even more worthless than the current CPU benchmarks.
What you ideally would want (if it's possible) is to build a benchmark suite directly into your application so you can run it in benchmark mode on various hardware. (My apps tend to be of the "Repeat this big loop N times" type, where the time for each iteration is relatively constant, so benchmarks just consist of timing the first N iterations.)
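As a rough illustration of that "time the first N iterations" idea, a minimal, hypothetical sketch (not the actual acemd or BOINC code) might look like this:

    # Hypothetical sketch of an app-internal benchmark: time the first N iterations
    # of the main compute loop and report an average ms/step. Not actual acemd code.
    import time

    def compute_step(state):
        # Placeholder for one iteration of the real science kernel.
        return state + 1

    def run(total_steps, benchmark_steps=1000):
        state = 0
        start = time.perf_counter()
        for step in range(total_steps):
            state = compute_step(state)
            if step + 1 == benchmark_steps:
                elapsed = time.perf_counter() - start
                # This line could be written to stderr and returned with the result.
                print(f"# Benchmark: {elapsed / benchmark_steps * 1000:.3f} ms/step "
                      f"(avg over {benchmark_steps} steps)")
        return state

    if __name__ == "__main__":
        run(total_steps=10000)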
If you really want to get good data, you can have every result perform a benchmark and return the benchmark result along with the science result. Depending on how the app works, you could even gather the benchmark data while the real app is running, so there doesn't need to be any wasted time. The server could then gather solid data on exactly how fast each app works on specific GPUs. |
|
|
|
Mike wrote: If you really want to get good data, you can have every result perform a benchmark and return the benchmark result along with the science result. Depending on how the app works, you could even gather the benchmark data while the real app is running, so there doesn't need to be any wasted time. The server could then gather solid data on exactly how fast each app works on specific GPUs.
There is something like this already present in the acemd client, as there is a line at the end of the stderr stating:
# Time per step (avg over 12000000 steps): 2.104 ms |
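For anyone who wants to pull that figure out of the stderr automatically, a small sketch along these lines should work; the pattern is written against the single example line above, so treat it as an assumption rather than a guaranteed format:

    # Sketch: extract the average time per step from an acemd-style stderr dump.
    # The pattern is based only on the example line quoted above.
    import re

    STDERR_SAMPLE = "# Time per step (avg over 12000000 steps): 2.104 ms"

    pattern = re.compile(r"# Time per step \(avg over (\d+) steps\):\s*([\d.]+) ms")

    match = pattern.search(STDERR_SAMPLE)
    if match:
        steps, ms_per_step = int(match.group(1)), float(match.group(2))
        print(f"{ms_per_step} ms/step over {steps} steps")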
|
|
|
MrS,
Thank you for the feedback. Sifting the data took the longest. Basically I started with the top 10 users and machines and worked down until I got enough records, then had to review individual work units for the machines that had different types of cards. I ended up with a little over a thousand records over several hours. There is so much info in this forum on the different cards, but I was curious to see some more filtering. Automation would be great; however, I am not able to do that.
I do not have a complete understanding of how the credits are determined, just that there are bonuses for a 24-hour turnaround. Given the size of personal queues, card speeds, and even the time allotted to crunching, I did not use this metric because of its large potential for skew. I think I heard there is something like a 25-50% bonus for credit on fast returns. For better or worse, that is why I looked at just GPU time.
You are right that the different WUs involve different amounts of calculation, but I am not sure how to normalize that out. The NOELIAs, for instance, differ. Probably the best solution would be uniform WU naming from the scientists for easier filtering; I guess the credits could help pinpoint things here. For example, the WUs with two numerical digits in the name before NOELIA are the longest-running ones - probably a different class of analysis being done. I cannot sort by credit alone, as they get 180,000 when returned in under 24 hours but 150,000 at 26 hours. Well, while writing this, I realise I could back-calculate a normalized credit score using the Sent and Reported times, if I understood the complete bonus system and it is only based on time.
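As a rough illustration of that back-calculation, assuming the bonus really is time-based only - say +50% inside 24 hours and +25% inside 48 hours, which would match the 180,000 vs 150,000 figures above but is an assumption here:

    # Sketch: strip an assumed time-based bonus (+50% < 24h, +25% < 48h) off the
    # granted credit to get a comparable "base" credit per work unit.
    from datetime import datetime

    def base_credit(granted, sent, reported):
        turnaround_h = (reported - sent).total_seconds() / 3600
        if turnaround_h < 24:
            bonus = 1.50
        elif turnaround_h < 48:
            bonus = 1.25
        else:
            bonus = 1.00
        return granted / bonus

    # Example using the figures quoted above: 180,000 returned inside 24 hours and
    # 150,000 returned at ~26 hours both normalise to the same 120,000 base credit.
    fmt = "%Y-%m-%d %H:%M"
    print(base_credit(180000, datetime.strptime("2013-08-01 10:00", fmt),
                      datetime.strptime("2013-08-02 08:00", fmt)))   # 120000.0
    print(base_credit(150000, datetime.strptime("2013-08-01 10:00", fmt),
                      datetime.strptime("2013-08-02 12:00", fmt)))   # 120000.0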
There were a few other plots I had created but did not post, because the process was rather slow for me (turn the plot into an image, find a place to save pics online without ads in the created link, manually look at the site's code to get the link, and then paste it here in the forum). Plus I was not sure about the length of the post.
I would probably try this again if the data collection were a bit easier, and if the stderr had an average actual boost clock rather than the device default. Maybe add that output on each temperature line, and then an average in the stderr summary. I would also include a median analysis with 95% confidence intervals; most of the data is non-normal, which skews averages.
Anyway, it looks like there are a few additional ways to clean up the data set for a new look. I would really like a way to know the actual running boost clocks though, as that was a strong interaction.
Regards,
Jeremy
|
|
|
|
Mike wrote: If you really want to get good data, you can have every result perform a benchmark and return the benchmark result along with the science result. Depending on how the app works, you could even gather the benchmark data while the real app is running, so there doesn't need to be any wasted time. The server could then gather solid data on exactly how fast each app works on specific GPUs.
There is something like this already present in the acemd client, as there is a line at the end of the stderr stating:
# Time per step (avg over 12000000 steps): 2.104 ms
The problem with stderr is that if a user has BOINC set up to stop when CPU usage goes above a certain percentage -- and that's the default in at least some versions of the BOINC client -- stderr may exceed the maximum allowed size of the stderr.txt file, which results in the beginning of the file being truncated. Combined with the fact that BOINC will switch the task between GPUs, unless you can see the entire stderr you can't be certain that the task ran entirely on the GPU you think it ran on.
Also, if there's more than one GPU in the system, you need to have the app tell you which GPU was running the task. |
|
|
skgivenVolunteer moderator Volunteer tester
Send message
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level
Scientific publications
|
I asked for a benchmark app a few years back. There were several reasons.
One was to allow us to test our clocks, report performances, and get an accurate comparison of different GPU's. Another reason was because it could be used by the numerous review sites that now tend to show folding performances.
I also thought it could be used as a reset mechanism: when you stop getting WUs because you fail too many tasks (runaway failures), you would be allowed to run the benchmark app (no credits), and once you did this successfully you would be allowed to get normal tasks again. I think it would have needed its own queue, so the answer was no.
Jeremy, the actual boost clocks of GPUs are a big consideration. While most cards seem to run higher than what the manufacturers say, that's not always the case, and isn't the case for reference mid-range cards.
With high end cards it appears that most cards (of a given model) boost to around the same clock frequencies. For example non-reference GTX660Ti's tend to boost to around 1200MHz, 8% higher than the quoted Boost by the manufacturer and over 13% higher than the supposed Max Boost of a reference card. However we can't be sure if that's the case when looking at the data; some cards might not boost, and cards boost to different extents based on different circumstances, the power target, temperature, GPU usage...
My reference GTX660 only boosts to 1071MHz (which is in fact 13MHz below the reference Max Boost, and why I suggest getting a non-reference model that clocks a bit higher).
My Gigabyte 670 is in a Linux system, and is supposedly running at a reference boost of 1058MHz, but I don't know if that's what it's really running at. I would like to see the real boost clock when crunching a WU recorded in the stderr, especially on Linux, as XServer tells me the clock is 750MHz (which is wrong), the stderr is almost blank, and there aren't any handy little apps such as GPU-Z, Afterburner, Precision...
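On Linux, one way to spot-check the running clocks (assuming nvidia-smi from a reasonably recent driver, which supports these query fields) is something like the following sketch; if the fields aren't recognised, plain "nvidia-smi -q -d CLOCK" shows the same information:

    # Sketch: ask the driver for the clocks the GPU is actually running at.
    # Assumes nvidia-smi is on PATH and the driver supports these query fields.
    import subprocess

    def current_clocks():
        out = subprocess.run(
            ["nvidia-smi",
             "--query-gpu=name,clocks.current.graphics,clocks.current.memory",
             "--format=csv,noheader"],
            capture_output=True, text=True, check=True)
        return out.stdout.strip().splitlines()

    for line in current_clocks():
        print(line)   # e.g. "GeForce GTX 670, 1058 MHz, 3004 MHz" (illustrative)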
It would be useful to know what other people's cards boost to while running GPUGrid WUs...
____________
FAQ's
HOW TO:
- Opt out of Beta Tests
- Ask for Help |
|
|
skgivenVolunteer moderator Volunteer tester
Send message
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level
Scientific publications
|
Relative performances of high end and mid-range reference Keplers at GPUGrid:
100% GTX Titan
90% GTX 780
77% GTX 770
74% GTX 680
59% GTX 670
58% GTX 690 (each GPU)
55% GTX 660Ti
53% GTX 760
51% GTX 660
43% GTX 650TiBoost
33% GTX 650Ti
This is meant as a general guide but it should be reasonably accurate (within a few percent). The figures are based on actual results but they should still be perceived as estimates; there are some potential unknowns, unexpected configurations, app specific effects, bespoke cards, OC's, bottlenecks... and the table is subject to change (WU and app type). Note that lots of Kepler cards have non-reference clocks, so expect line variations of over 10% (the GTX660 performance range could be from 50 to 56% of a Titan).
____________
FAQ's
HOW TO:
- Opt out of Beta Tests
- Ask for Help
|
|
|
5potSend message
Joined: 8 Mar 12 Posts: 411 Credit: 2,083,882,218 RAC: 0 Level
Scientific publications
|
Graph looks about right. And yes, boost clocks make a complete difference in actual GPU performance. So much so that the table presented, and pretty much all tables in general as far as GPUs go, are wobbly at best.
And downright lying on their side at worst. |
|
|
jlhalSend message
Joined: 1 Mar 10 Posts: 147 Credit: 1,077,535,540 RAC: 0 Level
Scientific publications
|
Relative performances of high end and mid-range reference Keplers at GPUGrid:
100% GTX Titan
90% GTX 780
77% GTX 770
74% GTX 680
59% GTX 670
55% GTX 660Ti
53% GTX 760
51% GTX 660
43% GTX 650TiBoost
33% GTX 650Ti
Well, what about the GTX 690 (even if you consider only 1 of 2 GPU chips)?
____________
Lubuntu 16.04.1 LTS x64 |
|
|
|
Here is your answer.
A GTX690 does around 80% more work than a GTX680; each GTX690 core is about 90% as fast as a GTX680.
Assuming it used 300W (or an equal proportion of it compared to the other GK104 and the GK110 GPUs) and a GTX Titan used 250W, then it's reasonably accurate to say that a GTX690 is overall equally as efficient as a Titan (for crunching here, in terms of performance per Watt).
GTX680 - 100% for 195W (0.53)
Titan - 150% for 250W (0.60)
GTX690 - 180% for 300W (0.60)
If a GTX680 used 195W to do a certain amount of crunching, the Titan or the GTX690 would do the same amount of work for 167W.
Both the Titan and the GTX690 are about 12% more efficient than the GTX680.
To bring in the purchase factor,
In the UK a GTX690 costs the same as a Titan (~£770), but would do 20% more work. A GTX780 on the other hand costs ~£500, but might bring around 90% of the performance of the Titan (would need someone to confirm this).
____________
DSKAG Austria Research Team: http://www.research.dskag.at
|
|
|
5potSend message
Joined: 8 Mar 12 Posts: 411 Credit: 2,083,882,218 RAC: 0 Level
Scientific publications
|
My 780 is neck and neck with a titan. Like I said, it all depends on clock speed. |
|
|
jlhalSend message
Joined: 1 Mar 10 Posts: 147 Credit: 1,077,535,540 RAC: 0 Level
Scientific publications
|
Here is your answer
...
Thanks, I saw this already. Just wondered why it did not appear in skgiven's list, but I presume this is because he does not consider bi-GPU cards as a whole...
____________
Lubuntu 16.04.1 LTS x64 |
|
|
|
OK :)
____________
DSKAG Austria Research Team: http://www.research.dskag.at
|
|
|
oprSend message
Joined: 24 May 11 Posts: 7 Credit: 93,272,937 RAC: 0 Level
Scientific publications
|
Hello folks, this is a question I posted a few days earlier to a moderator and he has already answered me, thanks, but anyway:
I'm running the application: Long runs (8-12 hours on fastest card) 8.14 (cuda55). The workunit name is 29x17-SANTI RAP74wtCUBIC-19-34-RND6468_0. Boinc manager says that it takes 92 hrs, and that seems to be correct. I don't know much about these things; I have Windows 7 Pro 64, a GeForce GT 430 (96 CUDA cores, it says somewhere), but the computer itself is some years old: a Pentium 4 (at least some sort of virtual dual-processor) at 3.4 GHz with 3 GB RAM. The motherboard is an Asus P5GD2-X, BIOS updated to 0601.
So is it OK that it takes so long? I think these GPUGrid things were faster at some point. I reinstalled Boinc manager carefully, not in service or protected application mode. Could it be that this "old iron" works better with older graphics card drivers, for example? Boinc seems to detect the graphics card correctly anyway. So are there any good tricks to go "full speed", or is this just OK?
Ps. The workunit is now completed and it did take that time. But those of you with the same thoughts, try Einstein@home for a few GPU workunits, because at least with my old computer and little GPU card those WUs went through fast, with quite a big estimated GFLOPS size.
(There seems to be a lot of good info on this forum, thanks everyone.)
Regards, opr. |
|
|
|
Yep, I have a couple of those still and will be upgrading soon. Set it to only run Short runs on that computer (30hrs). 96 cores ain't much when you see a GeForce Titan with 2688 (and a 780Ti on the way!)
You can run Linux to get better results, though I don't know if you'd bother with that. |
|
|
|
I am combining 2 questions that I have into one here:
kingcarcas,
Are you saying that Linux is better for number crunching than say, WIN 7-64? I am thinking about building a dedicated BOINC machine. Running Ubuntu would save me $200 on a Windows OS.
Also, anyone: what would the performance of a single GTX780 be compared to the performance of two 760's? An EVGA 780 w/ ACX cooling can be had at Newegg for $500. Two EVGA GTX 760's w/ ACX cooling can be had for the exact same price of $500. The number of CUDA cores is identical: 1152 (x2) for the 760's, and 2304 for the 780.
Clock speeds of the 2 models are within 10% of each other, but the 780 uses a 384-bit memory bus, whereas the 760's have a 256-bit bus. So would a single 780 be overall much faster? Only running 1 GPU would almost certainly be easier. |
|
|
matlockSend message
Joined: 12 Dec 11 Posts: 34 Credit: 86,423,547 RAC: 0 Level
Scientific publications
|
Are you saying that Linux is better for number crunching than say, WIN 7-64? I am thinking about building a dedicated BOINC machine. Running Ubuntu would save me $200 on a Windows OS.
Yes, Linux is at least 10% faster than Windows 7. Look at my setup, as I have almost a 300k RAC with just a GTX 660. If it's a dedicated machine, there should be no hesitation to use Linux.
Also, anyone, what would the performance of a single GTX780 be compared to performance of two 760's? An EVGA 780 w/ ACX cooling can be had at Newegg for $500. Two EVGA GTX 760's w/ ACX cooling can be had for the exact same price of $500. The number of CUDA cores is identical: 1152 (x2) for the 760's, and 2304 for the 780.
Others here can tell you about the performance difference, but another thing to consider is the power consumption and running cost of two GPUs vs one. 340W(170W x 2) vs 250W. |
|
|
|
Thanks for that very helpful info matlock. I will download Ubuntu and install it on a spare drive so I can get familiar with it. As for the GPUs, I hadn't thought to consider the power factor. I'm going to guess that to get the most work done the smarter option would be one 780 instead of two 760's, and it does have the faster 384-bit memory bus. But I am most certainly not an expert in these matters, only a quick learner. I would appreciate any thoughts from others regarding two 760's or one 780 for GPUGrid work. |
|
|
matlockSend message
Joined: 12 Dec 11 Posts: 34 Credit: 86,423,547 RAC: 0 Level
Scientific publications
|
Let me know if you need any help with Ubuntu or another distribution. I recommend Lubuntu though. It's the same as Ubuntu but doesn't have all the flash and bloat of Gnome3, as it comes with LXDE. LXDE is a very simple and lightweight desktop environment. If you need a few more features, go for MATE Desktop(a Gnome2 fork) which is easy to install on top of Lubuntu. Everything you need should be available through the synaptic package manager, which is a graphical front-end to apt.
A single 780 would also allow you to add another card later.
I haven't had a failed task for quite a while, but what timing. When I posted I had a RAC of 296k and it just dropped to 282k. Oh well, it will climb back. |
|
|
|
Matlock I will remember to ask you if I need assistance. Buying the hardware is some months off, but I think I'll download that Lubuntu you mentioned onto a spare drive I have and then boot into it and play around with it, see how I like it.
Right now I am using an i7-3770k, OC'd for the winter to a very stable 4.5GHz, Vcore at 1.256v. It puts out 9-10 deg C more heat that way, as opposed to my summer speed of 4.3GHz, Vcore 1.120v! 32 gigs of RAM and I'm thinking I could steal 16GB of that for the new machine because Windows Task Manager always says I have 24GB available. Using an EVGA Superclocked GTX770 with ACX cooler for the GPU. |
|
|
matlockSend message
Joined: 12 Dec 11 Posts: 34 Credit: 86,423,547 RAC: 0 Level
Scientific publications
|
I have a new recommendation for new Linux users, and that is Linux Mint. I had a chance to use it this weekend and I'm impressed. It puts more polish on Ubuntu.
Try Mint with MATE: http://www.linuxmint.com/edition.php?id=134
|
|
|
klepelSend message
Joined: 23 Dec 09 Posts: 189 Credit: 4,732,827,502 RAC: 880,227 Level
Scientific publications
|
Hi all,
I am trying to install a BOINC version which recognizes CUDA on Puppy Linux 5.7 on a USB stick (4 GB). I do not have much experience with Linux at all; however, I got BOINC running with the boinc-6.2.15.pet from the user “Wulf-Pup”. It works great on CPU WUs; however, it does not see my CUDA card (a GTX 8400 as a test). So I was wondering if anybody has experience with Puppy Linux and BOINC.
I tried Dotsch/UX as well: It worked but Climateprediction filled the USB stick (8 GB), got stuck and I was never able to format it again. The start process was also quite long and error prone.
I know it works with Ubuntu, but installing it on a USB stick does not convince me.
My final goal is to install BOINC on a sleek Linux on a 16 GB USB stick (USB 3.0), so I would be able to run a high-end graphics card with Linux instead of my W7 environment on the same computer.
Thanks. |
|
|
DagorathSend message
Joined: 16 Mar 11 Posts: 509 Credit: 179,005,236 RAC: 0 Level
Scientific publications
|
Try HOW TO - Install GPUGRID on a USB stick.
I installed Xubuntu (Ubuntu with a much lighter desktop) on a USB 2.0 stick using Unetbootin. It worked but using the USB stick as a substitute for an HDD gave very slow disk reads/writes and slow boots. USB 3.0 is faster than 2.0 and might be adequate but I would be prepared to configure to use some HDD space for permanent storage. But if you're going to do that, then it seems to me you may as well install Linux in a dual-boot configuration alongside Windows.
If you really must run Windows on that machine along with Linux and if you have a Win install disk then the best configuration, if you want to crunch GPUgrid on Linux, is to just erase Windows from the disk, install Linux, then install VirtualBox (free) on Linux, then install Windows in a Virtual Box virtual machine. That way you can have Windows and Linux running simultaneously rather than having to boot back and forth between them. The virtual Windows won't run quite as fast as real Windows but, depending on the apps you intend to run, that might not be a problem.
I don't see any advantage to having a bootable USB stick to boot a machine that already has an HDD with enough spare room for a dual-boot setup, unless you need to boot to Linux only infrequently, perhaps.
____________
BOINC <<--- credit whores, pedants, alien hunters |
|
|
skgivenVolunteer moderator Volunteer tester
Send message
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level
Scientific publications
|
Update to add the GTX Titan Black, the GTX 780 Ti and the 750 Ti
Relative performances of high end and mid-range reference Keplers at GPUGrid:
114% GTX Titan Black
112% GTX 780Ti
100% GTX Titan
90% GTX 780
77% GTX 770
74% GTX 680
59% GTX 670
58% GTX 690 (each GPU)
55% GTX 660Ti
53% GTX 760
51% GTX 660
47% GTX 750Ti
43% GTX 650TiBoost
33% GTX 650Ti
This is meant as a general guide but it should be reasonably accurate (within a few percent). The figures are based on actual results but they should still be perceived as estimates; there are some potential unknowns, unexpected configurations, app specific effects, bespoke cards, OC's, bottlenecks... and the table is subject to change (WU and app type). Note that lots of Kepler cards have non-reference clocks, so expect line variations of over 10% (the GTX660 performance range could be from 50 to 56% of a Titan).
____________
FAQ's
HOW TO:
- Opt out of Beta Tests
- Ask for Help
|
|
|
MJHProject administrator Project developer Project scientist Send message
Joined: 12 Nov 07 Posts: 696 Credit: 27,266,655 RAC: 0 Level
Scientific publications
|
Hi,
The 750Ti ought to come in at about 0.50.
Matt |
|
|
skgivenVolunteer moderator Volunteer tester
Send message
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level
Scientific publications
|
Hi Matt,
I've changed the table a couple of times but it's based on what I can see from others, and the best comparison I can find now is as follows:
Both Win7 systems with comparable CPU's.
GTX750Ti
818x-SANTI_MAR419cap310-29-84-RND6520_0 5288118 16 Mar 2014 | 14:01:45 UTC 16 Mar 2014 | 18:26:07 UTC Completed and validated 14,621.79 4,475.01 18,300.00 Short runs (2-3 hours on fastest card) v8.15 (cuda60)
http://www.gpugrid.net/result.php?resultid=7938386
GTX660
425x-SANTI_MAR419cap310-23-84-RND9668_0 5283371 15 Mar 2014 | 10:36:31 UTC 15 Mar 2014 | 17:31:57 UTC Completed and validated 12,503.16 2,083.05 18,300.00 Short runs (2-3 hours on fastest card) v8.15 (cuda60)
http://www.gpugrid.net/result.php?resultid=7931978
The most recent driver (used) is supposed to increase boost for the Maxwells, but I've got little to go on, and the non-reference cards range from <1089MHz to 1255MHz boost (15%). I expect Alexander has a good card based on the reported non-boost speeds. Then there are all the other performance unknowns: what else the system is doing, system spec, temps, and the big if - memory controller load.
Update!
I think I've found the issue; running Einstein iGPU WU's at the same time as NVidia GPUGrid WU's - It's known to reduce GPUGrid WU performance (but also depends on CPU usage and other settings). So the GTX750Ti performance I've included is probably less than what it could be.
Update 2
GTX750Ti (no Einstein iGPU app)
401x-SANTI_MAR423cap310-29-84-RND5514_1 5289992 17 Mar 2014 | 6:14:56 UTC 17 Mar 2014 | 10:04:35 UTC Completed and validated 13,649.27 3,335.68 18,300.00 Short runs (2-3 hours on fastest card) v8.15 (cuda60)
http://www.gpugrid.net/result.php?resultid=7941692
Performance was 9.5% faster than the first WU when not running the iGPU Einstein app, making the GTX750Ti 9% slower than a GTX660.
So,
114% GTX Titan Black
112% GTX 780Ti
100% GTX Titan
90% GTX 780
77% GTX 770
74% GTX 680
59% GTX 670
58% GTX 690 (each GPU)
55% GTX 660Ti
53% GTX 760
51% GTX 660
47% GTX 750Ti
43% GTX 650TiBoost
33% GTX 650Ti
____________
FAQ's
HOW TO:
- Opt out of Beta Tests
- Ask for Help
|
|
|
Beyond Send message
Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0 Level
Scientific publications
|
So in short: does the 750 Ti currently have the highest ratio of performance to energy used? |
|
|
|
So in short: does the 750 Ti currently have the highest ratio of performance to energy used?
It's a wash between 750s and 780s when you factor in the cost of the host systems needed to support an equal amount of GPU compute capacity. The 750 configuration has a slightly higher capital cost, which is about balanced by its reduced operational cost.
Matt |
|
|
Jim1348Send message
Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0 Level
Scientific publications
|
So in short: does the 750 Ti currently have the highest ratio of performance to energy used?
My GTX 750 Ti draws about 52 watts on the Shorts, while my GTX 660 pulls the full TDP of 140 watts (both as measured by GPU-Z). And the times on the 750 Ti are only a little longer, with a Nathan short taking typically 2 hours 54 minutes on the 750 Ti and 2 hours 37 minutes on the 660, or about as skgiven noted above. (The 660 is clocked at 1000 MHz base, running at 1136 MHz, and the 750 Ti is 1072/1215 MHz).
In short, it is a great card, just in time for the summer. |
|
|
skgivenVolunteer moderator Volunteer tester
Send message
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level
Scientific publications
|
It's a pity Maxwell's didn't also arrive in larger guises. Cards as capable as the GTX770 or GTX780Ti but based on the Maxwell architecture using the existing, mature 28nm process would certainly be a winner. There is definitely plenty of scope for mid range cards too. Perhaps NVidia missed a trick here? Maybe the predicted GTX750 and GTX750Ti performances were not so high, or maybe NVidia didn't want to interfere with the existing production or plans for GTX780Ti, Titan Black and Titan Z by releasing mid to high end 28nm Maxwell cards. If so that would amount to withholding technology for 'business' reasons. Hopefully there is still time for more 28nm Maxwell's - something that performs to the standards of a GTX770 or 780 but uses much less power would go down well over the summer...
Rumour has it that the 20nm Maxwell's might not be ready this year, but even if 20nm card do arrive in the 3rd quarter, availability could be a problem (as seen before) and the inevitable expense will put many off. Until at least this time next year they are likely to be quite expensive. Unless you are interested in low to mid end cards (GTX750Ti), now isn't a great time to buy especially if your electric costs are high.
The release of the 28nm Maxwell's and continued release of high end 28nm Kepler's suggests NVidia is continuing to adapt the strategy for releasing cards. Way back in the days of the GF200 cards, a flagship GTX280 was released along with a slightly lesser version (260). Then these were added to and revision models released under the same series (55nm versions of the GTX285, GTX275 and GTX260). This model continued with the release of the GTX480 and GTX470, changed slightly with the release of the slightly modified architectures and refined process (release of the GTX580 but still on 40nm). The GF600's were also reincarnated (GF700) and their designs tweaked in the same way the GF500's were an upgrade to the GF400's, but it's been a bit more drawn out and the GF700 range has or is getting more high end cards to choose from.
The 28nm Maxwells completely reverse the original model: new architecture, but the same 28nm process, and instead of a flagship release, a low to medium end release (which should keep OEMs happy).
Rumour has it that a high end 20nm Maxwell might make it out late this year, but if that's the case it's not great for us. It will be interesting to see if smaller 20nm Maxwells make it onto the market earlier than the high end 'flagship' cards, but I'm hoping for something now and a few 28nm Maxwell runs might be feasible.
____________
FAQ's
HOW TO:
- Opt out of Beta Tests
- Ask for Help |
|
|
Jim1348Send message
Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0 Level
Scientific publications
|
It's a pity Maxwell's didn't also arrive in larger guises. Cards as capable as the GTX770 or GTX780Ti but based on the Maxwell architecture using the existing, mature 28nm process would certainly be a winner. There is definitely plenty of scope for mid range cards too. Perhaps NVidia missed a trick here?
Exactly, except that I don't think they are missing a trick as much as keeping their options open. If the 20nm process does not arrive soon enough (whatever that means), then I have no doubt that we will see more Maxwells at 28nm. The GTX 750 Ti was just testing the waters.
|
|
|
|
... |
|
|
|
Hello,
I'm thinking of pairing my 650Ti with another low-mid card and I'm torn between the 750Ti and the 660. Their performance GPUGRID-wise is at about the same level, but I'm thinking maybe the 660's 50% wider memory bus gives it an edge over the 750Ti, at least for some types of WUs. On the other hand, the 750Ti's power consumption (60W) is less than half of the 660 (140W), greatly reducing power cost and heat emission. The purchase cost difference is ~15 euro for me, the 660 being the more expensive, and I don't find it a decisive factor. What do you guys say?
Also: my motherboard has its second PCIE-16 slot at 4x. How much would it affect the performance of my 650Ti?
Thanks,
Vagelis
____________
|
|
|
skgivenVolunteer moderator Volunteer tester
Send message
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level
Scientific publications
|
While the GTX660 is 5 or 10% faster, the GTX750Ti is the better card because it's newer (more future-proofed) and uses much less electricity, which in turn generally means it's cooler, quieter and better for the rest of your system.
The 660 only has a 50% wider memory bus on paper. The 750Ti's cache size is larger and I think it's not as constrained as the previous generations due to the GPU's architecture. It's not super-scalar for a start.
If I were you I would consider selling your 1GB GTX650Ti (110W TDP) and getting two 2GB GTX750Ti's (60W).
The GTX750Ti is really an upgrade for the GTX650Ti, but does a lot more work (about 40% more).
The SP GFlops/W of the GTX750Ti is 21.8. The only Kepler's that come close to that are the high end GK110 cards:
Titan (18.0), 780Ti (20.2), Titan Black (20.5) and Titan Z (21.7).
Which in itself strongly suggests that these GPU's would be exceptional Maxwell candidates at 28nm...
Of course these are very expensive cards and well out of most people's price bracket. However, the performance of two GTX750Ti's lies between a GTX780 and a Titan, but the two GTX750Ti's would cost a lot less to buy and a bit less to run.
PCIE x4 is unlikely to make a difference for a low-medium end card. You might see a bit of loss on a very high end card (GK110, especially a Titan Z) but if you wanted a card like that you would be using a high end motherboard.
____________
FAQ's
HOW TO:
- Opt out of Beta Tests
- Ask for Help |
|
|
|
Thanks for your thorough response, skgiven! I hadn't thought of the dual 750Ti solution and God, is it tempting!!
____________
|
|
|
tomba Send message
Joined: 21 Feb 09 Posts: 497 Credit: 700,690,702 RAC: 0 Level
Scientific publications
|
53% GTX 760
51% GTX 660
Since the middle of April my:
GTX 660 has done 92 WUs for an average of 11526 credits per hour
GTX 760 has done 116 WUs for an average of 13796 credits per hour.
I reckon the 760 beats the 660 by ~20% vs. your 2% (but then I may be comparing your apples with my pears…)
What do you think?
|
|
|
skgivenVolunteer moderator Volunteer tester
Send message
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level
Scientific publications
|
I think you may be right, but I was trying to compare reference cards.
The 53% and 51% are against a GTX Titan, which would make my estimation of the 760 ~4% faster than a 660 (and with a lot of variation); (53/51)*100%=104%.
Note that lots of Kepler cards have non-reference clocks, so expect line variations of over 10% (the GTX660 performance range could be from 50 to 56% of a Titan).
I may have been comparing non-reference 660's (back then the app didn't report the clocks). However, your 760 might be better than average, your 760 setup might be better, your 660 might be a real reference model or the 660 setup maybe isn't great/it's not boosting/cpu availability is poor...
Did the recent driver changes now allow greater boosting on the 760?
Maybe the app now works better for the 760, or the sample data I had just wasn't great. We would need to compare your 660 to other 660's to get an idea of its performance. Ditto for the 760's.
Another noteworthy performance variation stems from different task types which utilize the GPU differently. This can result in different 'relative' performances across some cards.
Typically, there is a task type drift which is down to the different usages of memory interface width/Bandwidth (Frame Buffer in MSI Afterburner V 3.0.0) and the L2 cache factor.
Some examples of L2 cache:
GTX750Ti 2MB (GM107)
GTX780, GTX Titan 1536K (GK110)
GTX770, GTX680, GTX670 512K (GK104)
GTX660Ti 384K (GK104-300-KD-A2)
GTX660 384K (GK106-400-A1)
GTX650Ti 256K (GK107)
Performance comparison aside, a 760 is a 'slightly' newer version/model of the 660 (akin to the difference between a GTX460 and a GTX560).
____________
FAQ's
HOW TO:
- Opt out of Beta Tests
- Ask for Help |
|
|
tomba Send message
Joined: 21 Feb 09 Posts: 497 Credit: 700,690,702 RAC: 0 Level
Scientific publications
|
114% GTX Titan Black
112% GTX 780Ti
100% GTX Titan
90% GTX 780
77% GTX 770
74% GTX 680
59% GTX 670
58% GTX 690 (each GPU)
55% GTX 660Ti
53% GTX 760
51% GTX 660
47% GTX 750Ti
43% GTX 650TiBoost
33% GTX 650Ti
Hi Skgiven,
This analysis is nine months old. Any chance of an update, to include the GPUs marked in red below?
Thanks, Tom
|
|
|
skgivenVolunteer moderator Volunteer tester
Send message
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level
Scientific publications
|
211% GTX Titan Z (both GPU's)
116% GTX 690 (both GPU's)
114% GTX Titan Black
112% GTX 780Ti
109% GTX 980
100% GTX Titan
93% GTX 970
90% GTX 780
77% GTX 770
74% GTX 680
59% GTX 670
55% GTX 660Ti
53% GTX 760
51% GTX 660
47% GTX 750Ti
43% GTX 650TiBoost
37% GTX 750
33% GTX 650Ti
This is a quick estimate but it should be accurate to within a few percent and serve as a reasonable guide on actual performance at GPUGrid.
The GTX900 series cards are the best choice for most GPUGrid crunchers due to their newer series (more future-proofed), compute capabilities, price and relatively lower power usage for the performance. That said, the GTX 780Ti, 750Ti, the Titans, the 690 and the 660Ti still offer reasonable performance/Watt, and power costs vary greatly worldwide.
____________
FAQ's
HOW TO:
- Opt out of Beta Tests
- Ask for Help
|
|
|
tomba Send message
Joined: 21 Feb 09 Posts: 497 Credit: 700,690,702 RAC: 0 Level
Scientific publications
|
Thanks! My pen is poised over my cheque book :) |
|
|
Beyond Send message
Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0 Level
Scientific publications
|
If power efficiency is an issue, go for a Maxwell-based GPU. |
|
|
eXaPowerSend message
Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0 Level
Scientific publications
|
With the upcoming release of the GTX960 (specs are all over the place and still unknown), if it has 10/11/12 SMM with a 125W power limit then the performance/wattage ratio will be top notch. GM200 could come as two different boards - a full-fat and a cut-down version (similar to Kepler's GK110 GTX 780). This coming year for GPU cards will be a good one.
Skgiven:
If time permits for a new thread, would it be possible to create a table of per-core power usage and per-SMM/SMX wattage efficiency / total power / runtime ratios (similar to the one in the Maxwell Now thread)? |
|
|
skgivenVolunteer moderator Volunteer tester
Send message
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level
Scientific publications
|
Performance GPU Power GPUGrid Performance/Watt
211% GTX Titan Z (both GPUs) 375W 141%
116% GTX 690 (both GPUs) 300W 97%
114% GTX Titan Black 250W 114%
112% GTX 780Ti 250W 112%
109% GTX 980 165W 165%
100% GTX Titan 250W 100%
93% GTX 970 145W 160%
90% GTX 780 250W 90%
77% GTX 770 230W 84%
74% GTX 680 195W 95%
59% GTX 670 170W 87%
55% GTX 660Ti 150W 92%
53% GTX 760 130W 102%
51% GTX 660 140W 91%
47% GTX 750Ti 60W 196%
43% GTX 650TiBoost 134W 80%
37% GTX 750 55W 168%
33% GTX 650Ti 110W 75%
Note that these are estimates and that I’ve presumed Power to be the TDP as most cards boost to around that, for at least some tasks here.
I don’t have a full range or cards to test against every app version or OS so some of this is based on presumptions based on consistent range observations of other cards. I’ve never had a GTX750Ti, GTX750, 690, 780, 780Ti or any of the Titan range to compare, but I have read what others report. While I could have simply listed the GFLOPS/Watt for each card that would only be theoretical and ignores discussed bottlenecks (for here) such as the MCU load.
The GTX900 series cards can be tuned A LOT - either for maximum throughput or less power usage / coolness / performance per Watt:
For example, with a GTX970 at ~108% TDP (157W) I can run @1342MHz GPU and 3600MHz GDDR or at ~60% TDP (87W) I can run at ~1050MHz and 3000MHz GDDR, 1.006V (175W at the wall with an i7 crunching CPU work on 6 cores).
The former does more work, is ~9% faster than stock.
The latter is more energy efficient, uses 60% stock power but does ~ 16% less work than stock or ~25% less than with OC'ed settings.
At 60% power but ~84% performance the 970 would be 34% more efficient in terms of performance/Watt. On the above table that would be ~214% the performance/Watt efficiency of a Titan.
I expected the 750Ti and 750 Maxwells to also boost further/use more power than their reference specs suggest, but Beyond pointed out that although they do auto-boost they don't use any more power here (60W). It's likely that they can also be underclocked for better performance/Watt, coolness or to use less power.
PM me with errors/ corrections.
____________
FAQ's
HOW TO:
- Opt out of Beta Tests
- Ask for Help |
|
|
|
Thanks. It looks like a GTX 750Ti is currently the best replacement for my GT440; anything higher would require a new PSU and would probably trip the circuit breaker frequently.
This may change when more 900 series boards become available. |
|
|
skgivenVolunteer moderator Volunteer tester
Send message
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level
Scientific publications
|
Thanks. It looks like a GTX 750Ti is currently the best replacement for my GT440; anything higher would require a new PSU and would probably trip the circuit breaker frequently.
This may change when more 900 series boards become available.
Most GTX750Ti's have a 6-pin power connector, so while they are rated as 60W TDP I expect many can use more power, if it's available. What's actually observed while crunching is key. This may throw/skew the performance/Watt rating substantially. Different card versions are built & tuned by manufacturers in different ways, some aim for efficiency, some for performance and some for cost.
While a GTX960 may well be on the horizon, it's likely to have a TDP of around 120 or 125W, with the GDDR amount and clocks determining a slightly higher or lower TDP. Such a card probably wouldn't fit your power requirements. I'm not sure when lesser cards will be introduced and there is little talk of them. The GTX750Ti was very successful, so NVidia might keep production going for a while longer.
If you look at NVidia's range of GPUs there are two large TDP gaps: first between the 750Ti (60W) and the 192-bit 760 (130W), and from there to the 256-bit GTX760 (170W). The latter gap was recently filled by the GTX970 (145W) and 980 (165W), so it would be reasonable to presume the former gap (60W - 130W) will be filled by Maxwells.
You have to go as far back as the 75W GDDR5 version of the GT 640 and the GTX650Ti (110W) to find anything between 60W and 130W. So it's a gap that I expect NVidia to fill out with Maxwell's. The question is when? My guess is that it's only a few months away, but it may or may not contain a revised 750Ti model.
If you do go for a GTX750Ti make sure you get a 2GB version and keep an eye on its power usage; you can always force the GPU to run at a lower clock to use less power.
____________
FAQ's
HOW TO:
- Opt out of Beta Tests
- Ask for Help |
|
|
eXaPowerSend message
Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0 Level
Scientific publications
|
Be aware that the GTX900 series cards can be tuned A LOT; either for maximum performance or performance/Watt. For example, with a GTX970 at ~108% TDP (157W) I can run @1342MHz GPU and 3500MHz GDDR or at ~60% TDP (87W) I can run at ~1050MHz and 3000MHz GDDR, 1.006V (175W at the wall with an i7 crunching CPU work). The former does more work, the latter is more energy efficient; ~27% faster or in theory ~30% more efficient (probably much more).
This information will help during the summer for dense systems with no air conditioner cooling. Lower temperatures = longevity. For winter, the Maxwell can be tuned for max performance, while the summer season sees higher efficiency and lower temps with slightly longer work unit runtimes. |
|
|
Beyond Send message
Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0 Level
Scientific publications
|
Thanks. It looks like a GTX 750Ti is currently the best replacement for my GT440; anything higher would require a new PSU and would probably trip the circuit breaker frequently.
Most GTX750Ti's have a 6-pin power connector, so while they are rated as 60W TDP I expect many can use more power, if it's available. What's actually observed while crunching is key. This may throw/skew the performance/Watt rating substantially. Different card versions are built & tuned by manufacturers in different ways, some aim for efficiency, some for performance and some for cost.
I'm running a lot of 750Ti cards (14 of them). The fastest stock 750Ti I know of is the PNY OC which has stock OC clocks of 1201/1281MHz (Afterburner reports 1346-1359MHz when running GPUGrid) and 3004/6008 memory:
http://www.pny.com/gtx_750_ti_2048mb_oc_pcie
Kill-a-watt reading is plus 60 watts after adding the GPU and running GPUGRID (94% usage). Some lower usage WUs draw less. Interesting that this model has no 6 pin connector, all power is from the PCIe bus even though it's probably the fastest 750Ti available. The EVGA OC Superclocked is almost as fast and also has no 6 pin connector. The fan on the EVGA is larger but both run at about the same temps. The 2 fan EVGA ACX runs cooler and does have the 6 pin. It's also just slightly slower than the PNY. I'm running 4+ each of the above 3 models and would recommend any of them without reservation. If installing in an environment where temps are a serious problem I'd recommend the EVGA ACX as its cooling is way more than normally needed. Also have 1 ASUS 750Ti OC - 2 fan model. It's considerably slower than any of my other cards, won't OC very much at all, runs at the same temp as the PNY and EVGA single fan models and has a very odd placement for the 6 pin connector. ASUS not recommended...
Here's a typical stderr (SDOERR_BARNA5 at 94%) from a PNY housed in a tiny ITX case (running PCIe 2.0 x8). Notice temps are very good even with the ITX case and the small PNY fan (fan speed is 38% right now on a GERARD, probably a little higher for the SDOERR):
<stderr_txt>
# GPU [GeForce GTX 750 Ti] Platform [Windows] Rev [3212] VERSION [65]
# SWAN Device 0 :
# Name : GeForce GTX 750 Ti
# ECC : Disabled
# Global mem : 2048MB
# Capability : 5.0
# PCI ID : 0000:01:00.0
# Device clock : 1280MHz
# Memory clock : 3004MHz
# Memory width : 128bit
# Driver version : r343_00 : 34465
# GPU 0 : 55C
# GPU 0 : 56C
# GPU 0 : 57C
# GPU 0 : 58C
# GPU 0 : 59C
# GPU 0 : 60C
# GPU 0 : 61C
# GPU 0 : 62C
# GPU 0 : 63C
# Time per step (avg over 3750000 steps): 11.679 ms
# Approximate elapsed time for entire WU: 43795.551 s
# PERFORMANCE: 87466 Natoms 11.679 ns/day 0.000 ms/step 0.000 us/step/atom |
|
|
tomba Send message
Joined: 21 Feb 09 Posts: 497 Credit: 700,690,702 RAC: 0 Level
Scientific publications
|
This information will help during the summer for dense systems with no air conditioner cooling. Lower temperatures = longevity. For winter, the Maxwell can be tuned for max performance, while the summer season sees higher efficiency and lower temps with slightly longer work unit runtimes.
Hear, hear! Through the summer my main rig ran with a 770 and a 660 and I often had to shut off one of them because of critical PCIe heat.
In September I replaced the 660 with a 750ti. No more heat problems.
|
|
|
eXaPowerSend message
Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0 Level
Scientific publications
|
The GTX 960 (Ti) will feature three different dies (8 SMM / 10 SMM / 12 SMM). The 8 SMM version could be 70-100W TDP, with Ti variants being 100-130W TDP.
[8 SMM: full or cut GM206] [10 SMM: a full GM206 or a cut GM204, same as the GTX 970M] [12 SMM: cut GM204, same as the GTX 980M]
The 8 SMM version's performance is within ~10% of a 170W TDP GK104 (1152 CUDA cores) GTX 760. The 10 SMM is within ~10-20% of a full GK104 (1536 CUDA cores) at 195/230W TDP. The 12 SMM is within ~5% of a cut GK110 GTX 780 at 225W TDP (2304 CUDA cores).
Future Maxwell cards will excel at GPUGRID. Depending upon the final release specs, two, three or four GM206 cards on a motherboard would have similar power consumption to one GK104 or two GK106 cards. Once the GM206/cut GM204 are released, they could put some Kepler boards out to pasture as far as the performance/Watt ratio is concerned.
-Subject to change-
GTX960-75TDP [8SMM/1024CUDA] 9.375 Watts per SMM @ 0.073 watt per core
GTX960(ti)-100TDP [10SMM/1280CUDA] 10.000 Watts per SMM @ 0.078 watt per core
GTX960(ti)-125TDP [12SMM/1536CUDA] 10.416 Watts per SMM @ 0.081 watt per core
Reference rated TDP Wattage per Fermi 32coreSM/ Kelper 192coreSMX/ Maxwell 128coreSMM
GTX580-244TDP [16SM/512cores] 15.25 watts per SM @ 0.47 watt per core
GTX760-170TDP [6SMX/1152cores] 28.33 watts per SMX @ 0.147 watt per core
GTX660ti-145TDP [7SMX/1344cores] 20.71 watts per SMX @ 0.107 watt per core
GTX660-140TDP [5SMX/960cores] 28 watts per SMX @ 0.145 watt per core
GTX680-195TDP [8SMX/1536cores] 24.37 watts per SMX @ 0.126 watt per core
GTX780-225TDP [12SMX/2304cores] 18.75 watts per SMX @ 0.097 watt per core
GTX780Ti-250TDP [15SMX/2880cores] 16.66 watts per SMX @ 0.086 watt per core
GTX750-55TDP [4SMM/512cores] 13.75 watts per SMM @ 0.107 watt per core
GTX750Ti-60TDP [5SMM/640cores] 12 watts per SMM @ 0.093 watt per core
GTX970-145TDP [13SMM/1664cores] 11.15 watts per SMM @ 0.087 watt per core
GTX980-170TDP [16SMM/2048cores] 10.62 watts per SMM @ 0.082 watt per core
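The per-SM(M) and per-core figures are simply TDP divided by the unit counts; a trivial helper, should anyone want to extend the list:

    # Trivial helper: watts per SM/SMX/SMM and watts per CUDA core from TDP.
    def power_per_unit(name, tdp_w, sm_count, core_count):
        return (name, tdp_w / sm_count, tdp_w / core_count)

    for row in [("GTX750Ti", 60, 5, 640), ("GTX970", 145, 13, 1664), ("GTX980", 170, 16, 2048)]:
        name, per_sm, per_core = power_per_unit(*row)
        print(f"{name}: {per_sm:.2f} W per SMM, {per_core:.3f} W per core")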
|
|
|
|
Thanks. It looks like a GTX 750Ti is currently the best replacement for my GT440; anything higher would require a new PSU and would probably trip the circuit breaker frequently.
This may change when more 900 series boards become available.
Most GTX750Ti's have a 6-pin power connector, so while they are rated as 60W TDP I expect many can use more power, if it's available. What's actually observed while crunching is key. This may throw/skew the performance/Watt rating substantially. Different card versions are built & tuned by manufacturers in different ways, some aim for efficiency, some for performance and some for cost.
If you do go for a GTX750Ti make sure you get a 2GB version and keep an eye on its power usage; you can always force the GPU to run at a lower clock to use less power.
Will a GTX750Ti (2GB) run without anything connected to the 6-pin power connector if I don't try to make it run faster than usual?
I bought one and tried to install it, but found that the computer doesn't have any power connectors that aren't already in use.
Do you know of any source of power cable splitters that would allow me to have a hard disk and the GTX750Ti share a power cable from the PSU? Probably also a few extenders for such power cables. Needs to be a source that will ship to the US.
Also, is there any comparison available for the crunch rates of a GTX750Ti and a GTX560? If the GTX750Ti has a high enough crunch rate, I'm thinking of moving nearly all my GPUGRID work to the GTX750Ti on one computer, and all my BOINC GPU work requiring double precision to the computer with the GTX560.
|
|
|
skgivenVolunteer moderator Volunteer tester
Send message
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level
Scientific publications
|
Not sure about installing the GPU without using a 6-pin connector, but in theory it should work. Until you start crunching it would not need much power, and you could always power cap it using MSI Afterburner (or similar) before launching Boinc.
You would need 2 free molex sockets for this,
http://www.amazon.com/StarTech-com-6-Inch-Express-Adapter-LP4PCIEXADAP/dp/B0007RXDDM/ref=sr_1_1?s=electronics&ie=UTF8&qid=1421399317&sr=1-1&keywords=molex+to+6-pin
____________
FAQ's
HOW TO:
- Opt out of Beta Tests
- Ask for Help |
|
|
Jim1348Send message
Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0 Level
Scientific publications
|
Will a GTX750Ti (2GB) run without anything connected to the 6-pin power connector if I don't try to make it run faster than usual?
I bought one and tried to install it, but found that the computer doesn't have any power connectors that aren't already in use.
Do you know of any source of power cable splitters that would allow me to have a hard disk and the GTX750Ti share a power cable from the PSU? Probably also a few extenders for such power cables. Needs to be a source that will ship to the US.
My Asus GTX 750 Tis will not run at all without the PCIe power cable connected. There is an LED on the card that stays red until you plug it in, and then it turns green.
Newegg has lots of PCIe power adapters.
http://www.newegg.com/Product/ProductList.aspx?Submit=ENE&DEPA=0&Order=BESTMATCH&Description=pcie+power+adapter&N=-1&isNodeId=1 |
|
|
|
Not sure about installing the GPU without using a 6-pin connector, but in theory it should work. Until you start crunching it won't need much power, and you could always power-cap it using MSI Afterburner (or similar) before launching Boinc.
You would need 2 free molex sockets for this,
http://www.amazon.com/StarTech-com-6-Inch-Express-Adapter-LP4PCIEXADAP/dp/B0007RXDDM/ref=sr_1_1?s=electronics&ie=UTF8&qid=1421399317&sr=1-1&keywords=molex+to+6-pin
That may be PART of a solution. There are no free molex sockets.
After writing my request, I decided to check newegg.com and found that they have enough variety in their internal power cables and adapters that I may be able to put together a five-part solution from items they sell, after I check the computer again to make sure what type of connectors are used on the hard drive power cables.
|
|
|
tomba Send message
Joined: 21 Feb 09 Posts: 497 Credit: 700,690,702 RAC: 0 Level
Scientific publications
|
Will a GTX750Ti (2GB) run without anything connected to the 6-pin power connector if I don't try to make it run faster than usual?
I've been running an EVGA 750ti for three months. It's the non-factory OCed version and it does not have an additional power connection.
I OCed it to the level of the factory OCed version and it's happy, 24/7. It runs at a constant 72C.
|
|
|
skgivenVolunteer moderator Volunteer tester
Send message
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level
Scientific publications
|
Updated with the GTX960 (approximate values):
Performance GPU Power GPUGrid Performance/Watt
211% GTX Titan Z (both GPUs) 375W 141%
116% GTX 690 (both GPUs) 300W 97%
114% GTX Titan Black 250W 114%
112% GTX 780Ti 250W 112%
109% GTX 980 165W 165%
100% GTX Titan 250W 100%
93% GTX 970 145W 160%
90% GTX 780 250W 90%
77% GTX 770 230W 84%
74% GTX 680 195W 95%
64% GTX 960 120W 134%
59% GTX 670 170W 87%
55% GTX 660Ti 150W 92%
53% GTX 760 130W 102%
51% GTX 660 140W 91%
47% GTX 750Ti 60W 196%
43% GTX 650TiBoost 134W 80%
37% GTX 750 55W 168%
33% GTX 650Ti 110W 75%
Throughput performances and Performances/Watt are relative to a GTX Titan.
Note that these are estimates and that I’ve presumed Power to be the TDP as most cards boost to around that, for at least some tasks here.
I don’t have a full range of cards to test against every app version or OS, so some of this is based on presumptions drawn from consistent observations across a range of other cards. I’ve never had a GTX750Ti, GTX750, 690, 780, 780Ti or any of the Titan range to compare, but I have read what others report. While I could have simply listed the GFLOPS/Watt for each card, that would only be theoretical and would ignore previously discussed bottlenecks (for here) such as the MCU load, which differs by series.
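For what it's worth, the Performance/Watt column above can be reproduced from the Performance column and the assumed TDPs, both taken relative to the 250W GTX Titan. A small sketch of that calculation (values copied from the table; results land within a point of the listed figures due to rounding):

# Relative performance/Watt = (performance vs Titan) / (TDP / Titan's 250W TDP).
TITAN_TDP = 250  # W

cards = [
    # (name, performance relative to Titan in %, assumed TDP in W)
    ("GTX 980",    109, 165),
    ("GTX 970",     93, 145),
    ("GTX 960",     64, 120),
    ("GTX 750 Ti",  47,  60),
    ("GTX 680",     74, 195),
]

for name, perf, tdp in cards:
    perf_per_watt = perf * TITAN_TDP / tdp
    print(f"{name:>10}: {perf_per_watt:3.0f}% of a Titan's performance/Watt")
# -> ~165%, ~160%, ~133%, ~196%, ~95%, matching the table above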
The GTX900 series cards can be tuned A LOT - either for maximum throughput or less power usage / coolness / performance per Watt:
For example, with a GTX970 at ~108% TDP (157W) I can run @1342MHz GPU and 3600MHz GDDR or at ~60% TDP (87W) I can run at ~1050MHz and 3000MHz GDDR, 1.006V (175W at the wall with an i7 crunching CPU work on 6 cores).
The former does more work, is ~9% faster than stock.
The latter is more energy efficient, uses 60% stock power but does ~ 16% less work than stock or ~25% less than with OC'ed settings.
At 60% power but ~84% performance the 970 would be 34% more efficient in terms of performance/Watt. On the above table that would be ~214% the performance/Watt efficiency of a Titan.
I expected the 750Ti and 750 Maxwells to also boost further/use more power than their reference specs suggest, but Beyond pointed out that although they do auto-boost they don't use any more power here (60W). It's likely that they can also be underclocked for better performance/Watt, coolness or to use less power.
The GTX960 should also be very adaptable towards throughput or performance/Watt.
PM me with errors/ corrections.
____________
FAQ's
HOW TO:
- Opt out of Beta Tests
- Ask for Help |
|
|
|
Thanks for this update skgiven :)
____________
[CSF] Thomas H.V. Dupont
Founder of the team CRUNCHERS SANS FRONTIERES 2.0
www.crunchersansfrontieres |
|
|
|
BOINC has a strange way of calculating compute performance.
The theoretical peak is calculated this way: shaders x shader clock (MHz) x 2 (because of ADD + MADD)
For example GTX 280: 240 x 1296 x 2 = 622,080 MFLOPS => 622.08 GFLOPS
GTX 580: 512 x 1544 x 2 = 1,581,056 MFLOPS => 1,581.056 GFLOPS
GTX 780 Ti: 2880 x 925 x 2 = 5,328,000 MFLOPS => 5,328 GFLOPS
For Radeon cards (AMD, OpenCL):
Radeon HD5870: 320 (5D) x 850 x 2 x 5 = 2,720 GFLOPS (x5 for the 5-wide vector units)
Radeon HD6970: 384 (4D) x 880 x 2 x 4 = 2,703.36 GFLOPS
Radeon HD7970: 2048 x 1050 x 2 = 4,300.8 GFLOPS (scalar architecture)
Radeon R9 290x: 2816 x 1000 x 2 = 5632 GFLOPS |
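A quick Python sketch of that peak-GFLOPS formula, using the specs quoted above (the vector-width factor only applies to the older VLIW Radeons):

# Theoretical single-precision peak: shaders * shader clock (MHz) * 2 (MADD + ADD),
# times the vector width for VLIW Radeons (5D/4D); result converted to GFLOPS.
def peak_gflops(shaders, clock_mhz, vector_width=1):
    return shaders * clock_mhz * 2 * vector_width / 1000.0

print(peak_gflops(240, 1296))      # GTX 280       -> 622.08
print(peak_gflops(2880, 925))      # GTX 780 Ti    -> 5328.0
print(peak_gflops(320, 850, 5))    # HD 5870 (5D)  -> 2720.0
print(peak_gflops(2048, 1050))     # HD 7970 (GCN) -> 4300.8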
|
|
tomba Send message
Joined: 21 Feb 09 Posts: 497 Credit: 700,690,702 RAC: 0 Level
Scientific publications
|
The single-width Galaxy GTX 750 Ti Razor is a new one on me!
My mobo has six PCIe slots, and with two double-width GPUs installed I have two spare single slots. Or perhaps I should sell my two GPUs and install six of these??!!
I don't find them on sale Stateside, neither amazon.com nor newegg.com, but amazon UK and France have them, albeit quite expensive.
Any experience of this GPU out there? |
|
|
|
The single-width Galaxy GTX 750 Ti Razor is a new one on me!
My mobo has six PCIe slots, and with two double-width GPUs installed I have two spare single slots. Or perhaps I should sell my two GPUs and install six of these??!!
I don't find them on sale Stateside, neither amazon.com nor newegg.com, but amazon UK and France have them, albeit quite expensive.
Any experience of this GPU out there?
I have no experience with them, but something you should check: Is there adequate space between the single slots that all 6 cards can get adequate air for cooling?
Also, this card appears to be designed to blow air around within the case, rather than blowing it out of the case. Graphics cards designed that way tend to run hotter.
|
|
|
Matt Send message
Joined: 11 Jan 13 Posts: 216 Credit: 846,538,252 RAC: 0 Level
Scientific publications
|
Just picked up an EVGA GTX 750 Ti FTW for this host and I'm really impressed with it so far. The boost clock is running at 1320MHz and on one of the current GERARD_CXCL12 Long WUs it's at 42C (default fan curve) at 88% GPU Usage and ~67% GPU Power. I keep a window in that room open just a little (it's -20C outside), but even when that room is warmed up I have yet to see the card hit 50C.
I'm using rough numbers, but it seems that this card may be at least twice as energy efficient as my 780 Ti Superclocked cards. |
|
|
|
skgiven:
Thanks for putting together this reference chart, and keeping it up to date. It helped me make a decision on my purchase (below).
I currently have 3 GPUs in the rig:
GTX 660 Ti - eVGA 3GB FTW
GTX 660 Ti - MSI 3GB N660TI TF 3GD5/OC
GTX 460 - eVGA 1GB SC
I'll be pulling the GTX 460 out, and replacing it with...
GTX 970 - eVGA 4GB FTW ACX 2.0 04G-P4-2978-KR
http://www.amazon.com/gp/product/B00OSS0AG4/
$369.99
... for better GPUGrid capabilities, for better performance in iRacing, and to better support my next purchase, the Oculus Rift DK2. I know about the 3.5 GB memory oddity, and I also know that there's a "FTW+" card, but it requires an 8-pin that I can't supply. :) I'm already tapped out with 4 real 6-pins, and 2 other 6-pins made from 4 molexs. Pushing 860 watts, 24/7, on a 5-year-old 1000 watt supply, in a 5-year-old Dell XPS 730x... is fun!
Thanks again for helping me decide,
Jacob |
|
|
BarryAZSend message
Joined: 16 Apr 09 Posts: 163 Credit: 920,875,294 RAC: 0 Level
Scientific publications
|
I've been deploying 750ti and 750 cards of late. I also have a couple of 650 leftovers in the mix.
I like the price/performance/power combination for the 750ti and 750.
It seems anything with more performance costs quite a bit more AND requires significantly more power to run.
In a number of my setups, I've been upgrading older systems running HD 4850's with the 750Ti, which has shifted me from Moowrap and Milkyway to GPUGrid -- it has also resulted in a lower power load (down from 120W to 80W).
|
|
|
Erich56Send message
Joined: 1 Jan 15 Posts: 1132 Credit: 10,688,805,840 RAC: 23,784,839 Level
Scientific publications
|
Due to the single slot requirement in one of my computers, I have spotted this card:
Gainward GeForce GTX 750 (Single-Slot), 2GB GDDR5
Shader-Units/TMUs/ROPs: 512/32/16 • Computing: 1044GFLOPS (Single), 33GFLOPS (Double)
In view of what I have read above, it would seem to be a good choice for grid computing, am I right?
I guess this card can be overclocked? |
|
|
TJSend message
Joined: 26 Jun 09 Posts: 815 Credit: 1,470,385,294 RAC: 0 Level
Scientific publications
|
sorry, folks, for my numerous postings with same content. There was a technical problem.
sorry again for my multiple postings - no idea what my browser is doing with me
Can happen sometimes, no worries. Skgiven or ETA can remove them.
Edit: it took 4 minutes to post this message. Such a slow forum can cause these multiple posts.
Edit 2: and the edit went through in just a second....
____________
Greetings from TJ |
|
|
BarryAZSend message
Joined: 16 Apr 09 Posts: 163 Credit: 920,875,294 RAC: 0 Level
Scientific publications
|
The slow posting may be a leftover from the disconnect yesterday.
I am still seeing slow response from this site.
That could be a case of backlog with folks reporting in previously completed work and grabbing new work.
Or it *could* be something of an ongoing and unresolved 'pipeline' issue.
I've seen no comment regarding this from the admin folks though.
sorry, folks, for my numerous postings with same content. There was a technical problem.
sorry again for my multiple postings - no idea what my browser is doing with me
Can happen sometimes, no worries. Skgiven or ETA can remove them.
Edit: it took 4 minutes to post this message. Such a slow forum can cause these multiple posts.
Edit 2: and the edit went through in just a second....
|
|
|
skgivenVolunteer moderator Volunteer tester
Send message
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level
Scientific publications
|
Updated with the GTX Titan X (estimated values based on limited results):
Performance GPU Power GPUGrid Performance/Watt
211% GTX Titan Z (both GPUs) 375W 141%
156% GTX Titan X 250W 156%
116% GTX 690 (both GPUs) 300W 97%
114% GTX Titan Black 250W 114%
112% GTX 780Ti 250W 112%
109% GTX 980 165W 165%
100% GTX Titan 250W 100%
93% GTX 970 145W 160%
90% GTX 780 250W 90%
77% GTX 770 230W 84%
74% GTX 680 195W 95%
64% GTX 960 120W 134%
59% GTX 670 170W 87%
55% GTX 660Ti 150W 92%
53% GTX 760 130W 102%
51% GTX 660 140W 91%
47% GTX 750Ti 60W 196%
43% GTX 650TiBoost 134W 80%
37% GTX 750 55W 168%
33% GTX 650Ti 110W 75%
Throughput performances and Performances/Watt are relative to a GTX Titan.
Note that these are estimates and that I’ve presumed Power to be the TDP as most cards boost to around that, for at least some tasks here.
I don’t have a full range of cards to test against every app version or OS, so some of this is based on presumptions drawn from consistent observations across a range of other cards. I’ve never had a GTX750Ti, GTX750, 690, 780, 780Ti or any of the Titan range to compare, but I have read what others report. While I could have simply listed the GFLOPS/Watt for each card, that would only be theoretical and would ignore previously discussed bottlenecks (for here) such as the MCU load, which differs by series.
The GTX900 series cards can be tuned A LOT - either for maximum throughput or less power usage / coolness / performance per Watt:
For example, with a GTX970 at ~108% TDP (157W) I can run @1342MHz GPU and 3600MHz GDDR or at ~60% TDP (87W) I can run at ~1050MHz and 3000MHz GDDR, 1.006V (175W at the wall with an i7 crunching CPU work on 6 cores).
The former does more work, is ~9% faster than stock.
The latter is more energy efficient, uses 60% stock power but does ~ 16% less work than stock or ~25% less than with OC'ed settings.
At 60% power but ~84% performance the 970 would be 34% more efficient in terms of performance/Watt. On the above table that would be ~214% the performance/Watt efficiency of a Titan.
I expected the 750Ti and 750 Maxwells to also boost further/use more power than their reference specs suggest, but Beyond pointed out that although they do auto-boost they don't use any more power here (60W). It's likely that they can also be underclocked for better performance/Watt, coolness or to use less power.
The GTX960 should also be very adaptable towards throughput or performance/Watt.
PM me with errors/ corrections.
____________
FAQ's
HOW TO:
- Opt out of Beta Tests
- Ask for Help |
|
|
|
Thanks for this update skgiven :)
____________
[CSF] Thomas H.V. Dupont
Founder of the team CRUNCHERS SANS FRONTIERES 2.0
www.crunchersansfrontieres |
|
|
tomba Send message
Joined: 21 Feb 09 Posts: 497 Credit: 700,690,702 RAC: 0 Level
Scientific publications
|
Hi skgiven,
Thanks for the update. I note that the 750Ti continues to sit 4 percentage points behind the 660.
That does not reflect my experience...
I've logged all my completed WUs for over a year now and one of the items I calculate is credits per run-time hour. I've just done an analysis of my 750Ti vs. my 660 since the start of this year. The 750Ti sits in my AMD 8-core main rig with a 770, and is connected to video. The 660 sits undisturbed, upstairs, in a Pentium 8-core. The numbers are:
So my 750Ti is marginally better than my 660, not 4 percentage points worse.
|
|
|
|
Hi:
This is still running after 24h:36m - 72.5%, with 8h:06m still to run on my GTX650Ti......
Is this abnormal?
Thanks
Task 14103116 | Work unit 10868065 | Computer 190176 | Sent: 12 Apr 2015 10:11:00 UTC | Deadline: 17 Apr 2015 10:11:00 UTC | Status: In progress | Run time (sec): --- | CPU time (sec): --- | Credit: --- | Application: Long runs (8-12 hours on fastest card) v8.47 (cuda65) |
|
|
|
No it's perfectly normal. These are very long units and I would say any card below a 660 or 660ti would not finish within 24hrs |
|
|
|
This is still running after 24h:36m - 72.5%, with 8h:06m still to run on my GTX650Ti......
Is this abnormal?
You're asking in the wrong thread.
These are fairly long workunits, so it's normal that these take more than 24h on a GTX650Ti. |
|
|
|
Thanks: which is the correct thread? |
|
|
|
"Number Crunching" |
|
|
|
Many thanks, Betting Slip! Boy, did you ever stir some memories with the team name "Radio Caroline"!! :)- |
|
|
|
I wish that the web site would allow us to mine the dataset, as apparently some have been able to do.
I wish we could see the differences in runtimes over a long period for the GPUs that we have versus the ones we may buy sometime.
For example, I recently added an EVGA GTX960 (4GB RAM, about USD $240).
I wish to compare the performance of this pair on GPU work units directly.
At this point I only have the GTX960, two GTX760s, and one GTX550.
____________
|
|
|
skgivenVolunteer moderator Volunteer tester
Send message
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level
Scientific publications
|
Robert, while running NOELIA_ETQ_bound tasks your GTX960 is ~11.7% faster than your GTX760.
There isn't a GTX 550, but there is a GTX 550 Ti: 192 CUDA cores, GF116 (40nm), Compute Capability 2.1, 691 GFLOPS peak (622 with the correction factor applied, albeit back in 2012).
The performance of a GTX550Ti would be around 15% of the original GTX Titan, or to put it another way your GTX960 would be ~4 times as fast as the GTX550Ti.
____________
FAQ's
HOW TO:
- Opt out of Beta Tests
- Ask for Help |
|
|
xixouSend message
Joined: 8 Jun 14 Posts: 18 Credit: 19,804,091 RAC: 0 Level
Scientific publications
|
10-06-15 09:59:20 | | CUDA: NVIDIA GPU 0: GeForce GTX TITAN X (driver version 353.12, CUDA version 7.5, compute capability 5.2, 4096MB, 3065MB available, 6611 GFLOPS peak)
10-06-15 09:59:20 | | OpenCL: NVIDIA GPU 0: GeForce GTX TITAN X (driver version 353.12, device version OpenCL 1.2 CUDA, 12288MB, 3065MB available, 6611 GFLOPS peak)
|
|
|
skgivenVolunteer moderator Volunteer tester
Send message
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level
Scientific publications
|
This was updated with the GTX 980Ti using limited results of one task type only (subsequent observations show that performance of different cards varies by task type; with some jobs scaling better than others):
Performance GPU Power GPUGrid Performance/Watt
211% GTX Titan Z (both GPUs) 375W 141%
156% GTX Titan X 250W 156%
143% GTX 980Ti 250W 143%
116% GTX 690 (both GPUs) 300W 97%
114% GTX Titan Black 250W 114%
112% GTX 780Ti 250W 112%
109% GTX 980 165W 165%
100% GTX Titan 250W 100%
93% GTX 970 145W 160%
90% GTX 780 250W 90%
77% GTX 770 230W 84%
74% GTX 680 195W 95%
64% GTX 960 120W 134%
59% GTX 670 170W 87%
55% GTX 660Ti 150W 92%
53% GTX 760 130W 102%
51% GTX 660 140W 91%
47% GTX 750Ti 60W 196%
43% GTX 650TiBoost 134W 80%
37% GTX 750 55W 168%
33% GTX 650Ti 110W 75%
Throughput performances and Performances/Watt are relative to a GTX Titan.
Note that these are estimates and that I’ve presumed Power to be the TDP as most cards boost to around that, for at least some tasks here. Probably not the case for GM200 though (post up your findings).
When doing this I didn’t have a full range of cards to test against every app version or OS, so some of it is based on presumptions drawn from consistent observations across a range of other cards. I’ve never had a GTX750, 690, 780, 780Ti, 980Ti or any of the Titan range to compare, but I have read what others report. While I could have simply listed the GFLOPS/Watt for each card, that would only be theoretical and would ignore previously discussed bottlenecks (for here) such as the MCU load, which differs by series.
The GTX900 series cards can be tuned A LOT - either for maximum throughput or less power usage / coolness / performance per Watt:
For example, with a GTX970 at ~108% TDP (157W) I can run @1342MHz GPU and 3600MHz GDDR or at ~60% TDP (87W) I can run at ~1050MHz and 3000MHz GDDR, 1.006V (175W at the wall with an i7 crunching CPU work on 6 cores).
The former does more work, is ~9% faster than stock.
The latter is more energy efficient, uses 60% stock power but does ~ 16% less work than stock or ~25% less than with OC'ed settings.
At 60% power but ~84% performance the 970 would be 34% more efficient in terms of performance/Watt. On the above table that would be ~214% the performance/Watt efficiency of a Titan.
I expected the 750Ti and 750 Maxwells to also boost further/use more power than their reference specs suggest, but Beyond pointed out that although they do auto-boost they don't use any more power here (60W). It's likely that they can also be underclocked for better performance/Watt, coolness or to use less power.
The GTX960 should also be very adaptable towards throughput or performance/Watt but may not be the choicest of cards in that respect.
Note that system setup and configuration can greatly influence performance and performance varies with task types/runs.
PM me with errors/ corrections.
____________
FAQ's
HOW TO:
- Opt out of Beta Tests
- Ask for Help |
|
|
|
Thanks for this new update skgiven :)
____________
[CSF] Thomas H.V. Dupont
Founder of the team CRUNCHERS SANS FRONTIERES 2.0
www.crunchersansfrontieres |
|
|
|
Hi,
Just a question: is GPUGrid SP or DP?
I thought it was SP, but I have done some math and it seems to be DP.
I ran some long tasks of ~5,000,000 GFLOP each on my GTX 750 Ti OC in 108,000 seconds, which works out to ~46 GFLOPS - the DP peak of my card.
Am I right? If yes, why do the Titan and Titan Black, which have strong DP, seem to be weaker on GPUGrid than high-end GTX 900 cards which have 10 times less DP?
As I planned to buy a Titan Black for GPUGrid, I'm very interested if you have the explanation.
Thanks ! |
|
|
Matt Send message
Joined: 11 Jan 13 Posts: 216 Credit: 846,538,252 RAC: 0 Level
Scientific publications
|
I can confirm that this project is indeed SP. |
|
|
|
Just a question: is GPUGrid SP or DP?
The GPUGrid app does most of its calculations in SP, on the GPU. The rest (in DP) is done on the CPU.
I thought it was SP, but I have done some math and it seems to be DP.
I ran some long tasks of ~5,000,000 GFLOP each on my GTX 750 Ti OC in 108,000 seconds, which works out to ~46 GFLOPS - the DP peak of my card.
Am I right?
No. For a more elaborated answer perhaps you should share your performance calculation, and we could try to figure out what's wrong with it.
... why do the Titan and Titan Black, which have strong DP, seem to be weaker on GPUGrid than high-end GTX 900 cards which have 10 times less DP?
A Titan Black nearly equals a GTX 780Ti from GPUGrid's point of view.
DP cards like the Titan Black usually have lower clocks than their gaming card equivalent, and/or the ECC makes the card's RAM run slower.
As I planned to buy a Titan Black for GPUGrid,...
Don't buy a DP card for GPUGrid. A Titan X is much better for this project, or a GTX980Ti, as it has a higher performance/price ratio than a Titan X.
I'm very interested if you have the explanation.
Now you have it too. :)
Thanks !
You're welcome. |
|
|
|
For a more elaborated answer perhaps you should share your performance calculation, and we could try to figure out what's wrong with it.
I am currently running a long task ( https://www.gpugrid.net/result.php?resultid=14262417 ). BOINC estimates its size at 5 000 000 GFLOP, and ETA is a bit more than 20 hours.
GPU is a GTX 750 Ti, GPU load is 90-91%. I don't run any task on CPU (i7 2600K) in order to evaluate the speed of the card on this task.
If I am right, 5 000 000 divided by 72 000 (number of seconds in 20 hours) = 69 and GTX 750 Ti is given for 1 300 GFLOPS (well, a bit more, as mine is a bit overclocked).
I expected 5 000 000 / 1 300 = 3850 seconds (a few more, because I suppose CPU, which run slower than GPU, is a bottleneck), so what's wrong with my GPUGrid understanding ? |
|
|
|
For a more elaborated answer perhaps you should share your performance calculation, and we could try to figure out what's wrong with it.
I am currently running a long task ( https://www.gpugrid.net/result.php?resultid=14262417 ). BOINC estimates its size at 5 000 000 GFLOP, and ETA is a bit more than 20 hours.
GPU is a GTX 750 Ti, GPU load is 90-91%. I don't run any task on CPU (i7 2600K) in order to evaluate the speed of the card on this task.
If I am right, 5 000 000 divided by 72 000 (number of seconds in 20 hours) = 69 and GTX 750 Ti is given for 1 300 GFLOPS (well, a bit more, as mine is a bit overclocked).
I expected 5 000 000 / 1 300 = 3850 seconds (a few more, because I suppose CPU, which run slower than GPU, is a bottleneck), so what's wrong with my GPUGrid understanding ?
For a typical computer, the CPU runs at about 4 times the clock speed of the GPU. However, typical GPU programs are capable of using many of the GPU cores at once. GPUs have a varying number of GPU cores - usually more for the more expensive GPUs, currently with a maximum of about 3000 GPU cores per GPU. A GTX 750 Ti has 640 GPU cores. A GTX Titan Z board has 5760 GPU cores, but only because it uses 2 GPUs.
A CPU can have multiple CPU cores, with the number being as high as 12 for the most expensive CPUs. However, BOINC CPU workunits usually use only one CPU core each. |
|
|
|
I am currently running a long task ( https://www.gpugrid.net/result.php?resultid=14262417 ). BOINC estimates its size at 5 000 000 GFLOP, and ETA is a bit more than 20 hours.
GPU is a GTX 750 Ti, GPU load is 90-91%. I don't run any task on CPU (i7 2600K) in order to evaluate the speed of the card on this task.
If I am right, 5 000 000 divided by 72 000 (number of seconds in 20 hours) = 69 and GTX 750 Ti is given for 1 300 GFLOPS (well, a bit more, as mine is a bit overclocked).
I expected 5 000 000 / 1 300 = 3850 seconds (a few more, because I suppose CPU, which run slower than GPU, is a bottleneck), so what's wrong with my GPUGrid understanding ?
If the only factors playing the key roles were the tasks' computational load and the cards' performance, then I guess the tasks' computation times would be roughly equal to [task GFLOP] / [card GFLOP/s]. But of course, this is not an ideal world :)
Off the top of my head I can think of the following factors:
Each task type's degree of parallelism. The power of GPUs is in the sheer number of computing cores; the more cores a task can utilize, the closer its performance will get to the theoretical levels.
Each task type's need for main memory accesses. The GPU can perform its operations much quicker if it doesn't need to access the card's main memory.
It follows from the above that the size of the in-GPU cache and its speed can have a major effect on the actual performance.
The PCI-Express performance. The card eventually finishes with the computational work it has been assigned and then a) the results of the computation must be transferred back to the host and b) new computational work must be transferred from the host to the card. During these times, the card sits idle, or at least is not too busy doing real work.
At least for GPUGrid, the CPU has to do some significant work too. For my (new!) GTX 970, the acemd process consumes ~25% of a logical core. This means a) there must be some CPU resources available, and b) the CPU has to provide some "acceptable" levels of performance. Doing other CPU-bound computational work (e.g. CPU BOINC tasks) can have a significant effect.
System memory. Especially if other CPU-bound work is running, all these tasks will need to frequently access the main memory and this access has specific limited performance. Many tasks needing access simultaneously effectively lower the memory access bandwidth for each task. The increased load on the memory controller also effectively increases memory latency.
These are just factors that I could readily think of, there may be others too that come into play (e.g. CPU cache size and speed). Computational performance is a difficult, complex topic. High Performance Computing especially more so! I'm telling you this as a professional software engineer who has spent a lot of time improving the performance of software...
____________
|
|
|
|
Thank you, so my idea of a bottleneck was not so far from the truth (it seems to be MANY bottlenecks instead !).
It would be great to know what exactly slows down the GPU computation and how we can improve the speed of the card.
I run BOINC on an SSD, which is far faster than a classic HDD. Maybe using a RAMDisk would improve the total speed of the task a bit more?
For the RAM, would using non-ECC, high-frequency, low-latency RAM help?
And for the GPU itself, does memory bandwidth heavily affect the computation? High-end GPUs use a 384-bit memory bus, and past high-end cards used buses as wide as 512 bits (GTX 280...).
Does compute capability affect the speed?
It is very sad to see that average computation speed is so far from theoretical GFLOPS peak. This is a waste of computational time and energy. |
|
|
|
I run BOINC on an SSD, which is far faster than a classic HDD. Maybe using a RAMDisk would improve the total speed of the task a bit more?
It won't be noticeable, as the speed of the HDD(SSD) subsystem matters only when a task is starting/ending/making a checkpoint.
For the RAM, would using non-ECC, high-frequency, low-latency RAM help?
Yes.
And for the GPU itself, does memory bandwidth heavily affect the computation? High-end GPUs use a 384-bit memory bus, and past high-end cards used buses as wide as 512 bits (GTX 280...).
Not heavily, but it's noticeable (up to ~10%). The WDDM overhead is much more of a bottleneck, especially for high-end cards.
My GTX 980 in a Windows XP host is only 10% slower than a GTX980Ti in a Windows 8 host.
Does compute capability affect the speed?
The actual compute capability which the client is written to use matters.
The GPUGrid client uses the most recent CUDA version among BOINC projects, as it is written for CUDA 6.5.
It is very sad to see that average computation speed is so far from theoretical GFLOPS peak.
This is a waste of computational time and energy.
It's not as far as you think.
In your calculation you took the 5.000.000 GFLOPs from BOINC manager / task properties, but this value is incorrect, as well as the result.
I suppose from the task's runtime that you had a GERARD_FXCXCL12. This workunit gives 255000 credits, including the 50% bonus for fast return, so the basic credits given for the FLOPs is 170000. 1 BOINC credit is given for 432 GFLOPs, so the actual GFLOPs needed by the task is 73.440.000. Let's do your calculation again with this number (which is still an estimate).
73.440.000 GFLOPs / 1.300 GFLOPS = 56492.3 seconds =15h 41m 32.3s
73.440.000 GFLOPs / 72.000 sec = 1020 GFLOPS
FLOPS stands for FLOating Point Operations Per Second (it's the speed of computation).
FLOPs stands for FLOating Point OPerations (it's the total number of operations done).
The 1 BOINC credit for 432 GFLOPs comes from the definition of BOINC credits.
It says that 200 cobblestones (=credits) are given for 1 day work on a 1000 MFLOPS computer.
1000 MFLOPS = 1 GFLOPS.
200 credits for 24h at 1 GFLOPS
200 credits for 86400sec at 1 GFLOPS
200 credits for 86400 GFLOPs
1 credit for 432 GFLOPs |
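A minimal Python sketch of that arithmetic, using the same figures (the 432 GFLOPs-per-credit constant follows from the cobblestone definition above; the 1,300 GFLOPS peak and 72,000 s runtime are from the earlier GTX 750 Ti example):

# BOINC cobblestone definition: 200 credits per day on a 1 GFLOPS machine.
GFLOP_PER_CREDIT = 86400 / 200          # = 432 GFLOPs per credit

granted_credit = 255_000                # includes the 50% fast-return bonus
base_credit = granted_credit / 1.5      # = 170,000
task_gflop = base_credit * GFLOP_PER_CREDIT   # ~73,440,000 GFLOPs

card_peak_gflops = 1_300                # GTX 750 Ti, slightly overclocked
ideal_runtime_s = task_gflop / card_peak_gflops   # ~56,492 s (15h 41m)

actual_runtime_s = 72_000               # the ~20 hours observed
effective_gflops = task_gflop / actual_runtime_s  # ~1,020 GFLOPS

print(f"Task size:        {task_gflop:,.0f} GFLOPs")
print(f"Runtime at peak:  {ideal_runtime_s:,.0f} s")
print(f"Effective speed:  {effective_gflops:,.0f} GFLOPS")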
|
|
|
For the RAM, using a non-ECC high frequency low latency RAM ?
Yes.
Especially for the graphics board RAM; much less for the CPU RAM (they are separate except for some low-end graphics boards).
|
|
|
|
The PCI-Express performance. The card eventually finishes with the computational work it has been assigned and then a) the results of the computation must be transferred back to the host and b) new computational work must be transferred from the host to the card. During these times, the card sits idle, or at least is not too busy doing real work.
Not always true for CUDA workunits. With recent Nvidia GPUs, it's possible for CUDA workunits to transfer data to and from the graphics memory at the same time that the GPU is performing calculations on something already in the graphics memory. This requires starting some of the kernels asynchronously, though. I don't know if GPUGRID offers any workunits that do this, or if this is also possible for OpenCL workunits.
|
|
|
|
Not always true for CUDA workunits. With recent Nvidia GPUs, it's possible for CUDA workunits to transfer data to and from the graphics memory at the same time that the GPU is performing calculations on something already in the graphics memory. This requires starting some of the kernels asynchronously, though. I don't know if GPUGRID offers any workunits that do this, or if this is also possible for OpenCL workunits.
Indeed I would think that not all the card's memory is being accessed by the GPU at the same time, so some part(s) of it could be updated without stopping the GPU. But to avoid data corruption you would need exclusive locks at some level (range of addresses, banks, whatever). Depending mostly on timing and I would guess the cleverness of some algorithm deciding which parts of the memory it would make available for external changes, these changes could happen without the GPU stopping at all. With such schemes however, you generally get better latency (in our case, the CPU applying the changes it wants with a shorter delay), but also lower overall throughput (both the CPU and GPU access fewer memory addresses over time).
____________
|
|
|
skgivenVolunteer moderator Volunteer tester
Send message
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level
Scientific publications
|
Rambling way off topic!
____________
FAQ's
HOW TO:
- Opt out of Beta Tests
- Ask for Help |
|
|
TJSend message
Joined: 26 Jun 09 Posts: 815 Credit: 1,470,385,294 RAC: 0 Level
Scientific publications
|
Rambling way off topic!
Absolutely, and it is making this thread too long.
____________
Greetings from TJ |
|
|
Lluis Send message
Joined: 22 Feb 14 Posts: 26 Credit: 672,639,304 RAC: 0 Level
Scientific publications
|
I've a GTX660 (i5; Windows 7 64-bit) that I use only for crunching and I'm thinking about upgrading to a GTX970.
Is it worth buying a GTX970 G1 Gaming, or is the GTX 970 W.F. (€30 cheaper) preferable?
In Skgiven's table the GTX 980 has a performance of 109% while the GTX 970 has 93%.
Is it worth buying a GTX 980, or is a GTX 970 (34% cheaper) better value?
Thanks for your advice. |
|
|
skgivenVolunteer moderator Volunteer tester
Send message
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level
Scientific publications
|
A GTX 970 is a significant upgrade from a GTX660 (80% faster for here), while only using an additional 5W power.
The GTX980 is a bigger GPU and uses 25W more than your GTX660 (if that might be an issue). It's about 17% faster than a GTX970, but costs 34% more, so while it's a bit more powerful it's not quite as good value for money.
It's down to personal choice, but I would go for the 970 and save the €30 too.
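As a rough illustration of that value comparison (relative performance figures from the table earlier in the thread, and the ~34% price difference quoted in the question; the actual prices are whatever you find locally):

# Performance relative to a GTX Titan, from the comparison table in this thread.
perf_970, perf_980 = 93, 109

# Normalised prices: the GTX 980 was quoted as costing ~34% more than the GTX 970.
price_970, price_980 = 1.00, 1.34

value_970 = perf_970 / price_970
value_980 = perf_980 / price_980
print(f"GTX 970: {value_970:.1f}  GTX 980: {value_980:.1f}  (performance per unit cost)")
# -> 93.0 vs ~81.3: the 970 gives ~14% more performance per unit cost,
#    while the 980 gives ~17% more absolute throughput.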
____________
FAQ's
HOW TO:
- Opt out of Beta Tests
- Ask for Help |
|
|
|
8/1/2015 7:38:20 AM | | CUDA: NVIDIA GPU 0: GeForce GTX 780 (driver version 353.62, CUDA version 7.5, compute capability 3.5, 3072MB, 2956MB available, 4698 GFLOPS peak)
|
|
|
|
To make a change so that 2 or more video cards run in the same computer:
In the event log, find the data directory.
8/1/2015 7:38:20 AM | | Starting BOINC client version 7.4.42 for windows_x86_64
8/1/2015 7:38:20 AM | | log flags: file_xfer, sched_ops, task
8/1/2015 7:38:20 AM | | Libraries: libcurl/7.39.0 OpenSSL/1.0.1j zlib/1.2.8
8/1/2015 7:38:20 AM | | Data directory: C:\ProgramData\BOINC
C:programfiles(x86)/BOINC
<cc_config>
<use_all_gpus>1</use_all_gpus>
</cc_config>
Simply make sure the config file says to use all GPUs.
|
|
|
|
To make a change so that 2 or more video cards run in the same computer:
In the event log, find the data directory.
8/1/2015 7:38:20 AM | | Starting BOINC client version 7.4.42 for windows_x86_64
8/1/2015 7:38:20 AM | | log flags: file_xfer, sched_ops, task
8/1/2015 7:38:20 AM | | Libraries: libcurl/7.39.0 OpenSSL/1.0.1j zlib/1.2.8
8/1/2015 7:38:20 AM | | Data directory: C:\ProgramData\BOINC
C:programfiles(x86)/BOINC
<cc_config>
<use_all_gpus>1</use_all_gpus>
</cc_config>
Simply make sure the config file says to use all GPUs.
My computer (running 64-bit Windows 7) doesn't have a C:programfiles(x86)/BOINC directory. It does, however, have a C:/Program Files/BOINC directory. What file within that directory should get the cc_config addition? Is anything else needed if the file doesn't already exist? |
|
|
|
Google "BOINC client configuration"
Click the link that comes up
If the webserver is down, then on the Google results page, click the little down-arrow, and then click Cached.
... and try to keep this thread on-topic, please. |
|
|
|
I run 64 bit server 2008. I have both 32 bit and 64 bit boinc folders.
The config file is in both folders. Here is my 64 bit boinc folder.
<cc_config>
<use_all_gpus>1</use_all_gpus>
</cc_config>
Both Boinc folders have the "Use All Gpus" in the config file. |
|
|
|
http://www.overclock.net/t/827904/how-to-multi-gpus-on-boinc |
|
|
Lluis Send message
Joined: 22 Feb 14 Posts: 26 Credit: 672,639,304 RAC: 0 Level
Scientific publications
|
This was updated with the GTX 980Ti using limited results of one task type only (subsequent observations show that performance of different cards varies by task type; with some jobs scaling better than others):
Performance GPU Power GPUGrid Performance/Watt
211% GTX Titan Z (both GPUs) 375W 141%
156% GTX Titan X 250W 156%
143% GTX 980Ti 250W 143%
116% GTX 690 (both GPUs) 300W 97%
114% GTX Titan Black 250W 114%
112% GTX 780Ti 250W 112%
109% GTX 980 165W 165%
100% GTX Titan 250W 100%
93% GTX 970 145W 160%
90% GTX 780 250W 90%
77% GTX 770 230W 84%
74% GTX 680 195W 95%
64% GTX 960 120W 134%
59% GTX 670 170W 87%
55% GTX 660Ti 150W 92%
53% GTX 760 130W 102%
51% GTX 660 140W 91%
47% GTX 750Ti 60W 196%
43% GTX 650TiBoost 134W 80%
37% GTX 750 55W 168%
33% GTX 650Ti 110W 75%
I tried to reply to message 41294 in the thread "NVidia GPU Card comparisons in GFLOPS peak" but I don't know how (it seems I only have the option of doing that in private). As the intention is to make a public request and share a little information, I'm replying in this thread instead. I beg your pardon for the inconvenience.
In some messages, like this one, Skgiven has done a comparison between different types of graphics cards (thanks for the work!).
I just upgraded from a Gigabyte GTX 660 OC to a Palit GTX 1070 Dual. From the first results it seems the GTX 1070 performs at about 265% of the GTX 660. Comparing with this table, that would put the GTX 1070 at roughly 135% (relative to the Titan), a little slower than the GTX 980Ti.
Do you think that is true?
As the configuration of my computer is not optimized, and I bought the Palit because of its size (I didn't know its performance), I'd very much appreciate an updated comparison that includes the Pascal generation.
Thank you for your enlightening posts!
|
|
|
skgivenVolunteer moderator Volunteer tester
Send message
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level
Scientific publications
|
I suggest that when people populate this thread with Pascal performances they focus on long tasks and stipulate the task type, their specs and setup; especially how they are using their CPU. Ideally people would test when not using their CPU, or with only 1 or 2 CPU apps running on, say, a quad-core/8-thread type setup. State if you are using multiple GPUs and what PCIe version is in use. It might be an idea to focus on the 50ns SDOERR_CASP tasks first and then look at the PABLO tasks separately, as performances appear to differ significantly for these task types.
____________
FAQ's
HOW TO:
- Opt out of Beta Tests
- Ask for Help |
|
|
Lluis Send message
Joined: 22 Feb 14 Posts: 26 Credit: 672,639,304 RAC: 0 Level
Scientific publications
|
I suggest that when people populate this thread with Pascal performances they focus on long tasks and stipulate the task type, their specs and setup; especially how they are using their CPU. Ideally people would test when not using their CPU, or with only 1 or 2 CPU apps running on, say, a quad-core/8-thread type setup. State if you are using multiple GPUs and what PCIe version is in use. It might be an idea to focus on the 50ns SDOERR_CASP tasks first and then look at the PABLO tasks separately, as performances appear to differ significantly for these task types.
Is that the idea? If not, please post an example of how to do it.
*****************************************
CPU: i5-4570, 8 GB RAM; 4 cores / 4 threads; Windows 10 64-bit; GPU: Palit GTX 1070 Dual, 8 GB; Driver R375.70
Task: 15544310; Work unit: 11940364; Long runs v9.14 (cuda80):
No other BOINC work running at the same time. GPU usage: around 77%
Name: ...PABLO_SH2TRIPEP_Y_TRI_1-0-1-RND7830_0 ;
Exec. time: 15,998.75 ; CPU time: 15,858.73; Credit: 145,800
Time per step (avg over 12500000 steps):1.279 ms; PERFORMANCE: 25921 Natoms 1.279 ns/day
Task: 15546704; Work unit: 11942322; Long runs v9.14 (cuda80):
No other BOINC work running at the same time. GPU usage: around 77%
Name: ...PABLO_SH2TRIPEP_Q_TRI_1-0-1-RND3682_0 ;
Exec. time: 15,941.08 ; CPU time: 15,810.36; Credit: 145,800
Time per step (avg over 12500000 steps):1.275 ms; PERFORMANCE:25926 Natoms 1.275 ns/day
/////////////////////////////////////////////////////////////////////////////////////////////////
Task: 15553499; Work unit: 11941970; Long runs v9.14 (cuda80):
Three Rosetta tasks running at the same time. GPU usage: around 77%
Name: ... PABLO_SH2TRIPEP_H_TRI_1-0-1-RND1508_1;
Exec. time: 16,137.84 ; CPU time: 15,991.84; Credit: 145,800
Time per step (avg over 12500000 steps):1.290 ms; PERFORMANCE: 25911 Natoms 1.290 ns/day
/////////////////////////////////////////////////////////////////////////////////////////////////
Task: 15552458; Work unit: 11947387; Long runs v9.14 (cuda80):
Three Rosetta tasks running at the same time. GPU usage: around 77%
Name: ...SDOERR_CASP22S20M_crystal_contacts_50ns_a3D_2-0-1-RND9834_0;
Exec. time: 15,512.56 ; CPU time: 15,386.86; Credit: 137,850
Time per step (avg over 12500000 steps):1.240 ms; PERFORMANCE:24496 Natoms 1.240 ns/day
|
|
|
skgivenVolunteer moderator Volunteer tester
Send message
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level
Scientific publications
|
That's the idea.
Just tested using zero, one, two and three CPU cores on my A6-3500 tri-core AMD CPU @2.1GHz:
The 1060-3GB GPU utilization went from around 90% [Linux] to ~80%, ~60% and all the way down to ~36% while running a long PABLO_SH2 task. PCIe Bandwidth Utilization (PCIE2X16) went from ~27% to ~24% to ~19% and then to ~10% (varying from 6% to 14%).
GPU and PCIe utilization while a number of the tri-core's CPU cores crunch CPU projects, and the resulting % increase in runtime:
CPUs Used %GPU Utilization %PCIe Utilization %Increase in runtime
0 90 27 0
1 80 24 12.5
2 60 19 50
3 36 10 250
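One way to read that table: if task throughput scales roughly with GPU utilization, the expected runtime increase is (baseline utilization / observed utilization) - 1. A sketch of that simple model; it matches the 1- and 2-core rows, while the measured 3-core case above came out worse still, presumably because of extra CPU/memory contention:

# Rough model: runtime scales as 1 / GPU utilization, relative to the idle-CPU baseline.
baseline_util = 0.90   # GPU utilization with no CPU tasks running

for cpu_tasks, util in [(1, 0.80), (2, 0.60), (3, 0.36)]:
    predicted_increase = baseline_util / util - 1
    print(f"{cpu_tasks} CPU task(s): ~{predicted_increase:.1%} longer runtime predicted")
# -> 12.5%, 50.0%, 150.0% (the 3-core case was measured at ~250% in the table above)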
____________
FAQ's
HOW TO:
- Opt out of Beta Tests
- Ask for Help |
|
|
Lluis Send message
Joined: 22 Feb 14 Posts: 26 Credit: 672,639,304 RAC: 0 Level
Scientific publications
|
Two more tasks without other usage of the CPU.
CPU: i5-4570, 8 GB RAM; 4 cores / 4 threads; Windows 10 64-bit; GPU: Palit GTX 1070 Dual, 8 GB; Driver R375.70
By the way, can anybody explain to me what "Natoms" and "ns/day" mean? I have found this general reference but, if possible, I would like a more explicit one.
The "ns/day" figure gives the rate of the simulation - the higher the better. The "Natoms" figure gives the size of the system - the greater the number of atoms, the slower the simulation, in a not-quite linear relationship
Task: 15555405; Work unit: 11949955; Long runs v9.14 (cuda80):
No other work running at the same time. GPU usage: around 78%
Name: ... SDOERR_CASP1XX_crystal_ss_contacts_50ns_a3D_2-0-1-RND8379_0
Exec. time: 15,496.20 CPU time: 15,337.38; Credit: 137,850
Time per step (avg over 12500000 steps):1.239 ms; PERFORMANCE: 24496 Natoms; 1.239 ns/day
Task: 15555324; Work unit: 11949887; Long runs v9.14 (cuda80):
No other work running at the same time. GPU usage: around 78%
Name: ... SDOERR_CASP1XX_crystal_ss_contacts_50ns_a3D_0-0-1-RND5885_0
Exec. time: 15,417.61 CPU time: 15,275.11 Credit: 137,850
Time per step (avg over 12500000 steps): 1.233 ms; PERFORMANCE: 24496 Natoms; 1.233 ns/day
|
|
|
skgivenVolunteer moderator Volunteer tester
Send message
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level
Scientific publications
|
OT. "Natoms" and "ns/day".
Number of Atoms: number of atoms in the model (in silico molecular simulation).
Nanoseconds per day: the amount of simulated time (of the moving proteins & other molecules) covered per day of computation.
Obviously molecules move fast; molecular reactions tend to take place on timescales between picoseconds and microseconds.
It's also usually the case that the researcher's name (eg. Pablo) is included in the work units name as well as some reference to the molecule or molecular region being studied (eg. SH2 is a domain/region of a protein).
____________
FAQ's
HOW TO:
- Opt out of Beta Tests
- Ask for Help |
|
|
eXaPowerSend message
Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0 Level
Scientific publications
|
Task: 15544310; Work unit: 11940364; Long runs v9.14 (cuda80):
No others BOINC works at the same time. GPU Usage: around 77%
Name: ...PABLO_SH2TRIPEP_Y_TRI_1-0-1-RND7830_0 ;
Exec. time: 15,998.75 ; CPU time: 15,858.73; Credit: 145,800
Time per step (avg over 12500000 steps):1.279 ms; PERFORMANCE: 25921 Natoms 1.279 ns/day
Number of Atoms: number of atoms in the model (in silico molecular simulation).
Nano Seconds per day: time the model (moving proteins & other molecules) are observed.
Obviously molecules move fast; molecular reactions tend to take place between pico and micro seconds.
It's also usually the case that the researcher's name (eg. Pablo) is included in the work units name as well as some reference to the molecule or molecular region being studied (eg. SH2 is a domain/region of a protein).
FYI: in the "Maxwell now" thread, while I was comparing cards' ns/day, Matt said:
Actually, there's a bug there.
The time reported isn't the daily rate, but the iteration time. That's inversely proportional to the daily rate, so use 1000/[value] instead.
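So to compare cards you can do the division Matt suggests (steps per second), and if you also assume the simulation timestep you can recover a real ns/day figure. A hedged sketch - the 4 fs timestep here is an assumption for illustration, not something reported in the task output:

# Convert the reported "ns/day" field (really the iteration time in ms per step)
# into comparable throughput numbers. The 4 fs timestep is an assumed value.
def steps_per_second(ms_per_step):
    return 1000.0 / ms_per_step               # Matt's "1000/[value]" metric

def ns_per_day(ms_per_step, timestep_fs=4.0):
    steps_per_day = 86400.0 / (ms_per_step / 1000.0)
    return steps_per_day * timestep_fs * 1e-6  # fs -> ns

print(steps_per_second(1.279))   # ~782 steps/s for the GTX 1070 example above
print(ns_per_day(1.279))         # ~270 ns/day, if the timestep really is 4 fs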
Just tested using zero, one, two and three CPU cores on my A6-3500 tri-core AMD CPU @2.1GHz:
The 1060-3GB GPU utilization went from around 90% [Linux] to ~80%, ~60% and all the way down to ~36% while running a long PABLO_SH2 task. PCIe Bandwidth Utilization (PCIE2X16) went from ~27% to ~24% to ~19% and then to ~10% (varying from 6% to 14%).
That's a big performance loss - I can only imagine how much a multi-GPU setup would struggle when CPU compute is running on this generation. Hopefully Zen rectifies any bottlenecks. (I'll get a Zen if it includes more PCIe lanes than the Intel X99 platform. I want to build another 6-GPU Win8.1 system if I can get Nvidia drivers to cooperate; lately the r370 and r375 drivers won't allow more than 4 GPUs to run. Crunchers, PM me if you want to buy golden-clocked GTX 970s that are stable at 1.5GHz here. A mixed Pascal and Maxwell system won't work on the GPUGRID platform because the apps are of different CUDA generations, 6.5 vs. 8.0. I hope there's a fix on the chopping block.)
|
|