GDF (Volunteer moderator, Project administrator, Project developer, Project tester, Volunteer developer, Volunteer tester, Project scientist)
Joined: 14 Mar 07 · Posts: 1957 · Credit: 629,356 · RAC: 0
Does anybody have source code for nanosleep on Windows? Contact me by PM.
gdf
Hi GDF,
do you know that the current Windows sleep implementation causes problems, or is it still a supposition?
If it's the latter, it might be a good idea to test it by writing the time at which the thread actually wakes up into a log file and analysing the variance of these times. The test machine would have to be "loaded normally" though, with at least ncpus-1 BOINC tasks.
MrS
____________
Scanning for our furry friends since Jan 2002
Hi GDF,
do you know that the current Windows sleep implementation causes problems, or is it still a supposition?
If it's the latter, it might be a good idea to test it by writing the time at which the thread actually wakes up into a log file and analysing the variance of these times. The test machine would have to be "loaded normally" though, with at least ncpus-1 BOINC tasks.
MrS
It is likely, as it is the only difference. We may change the way we do this. We want to use the extra CPU as well... but we cannot lose performance on the GPU.
g
I asked a programmer friend about this. He took a look at how nanosleep is implemented in Cygwin and came to the conclusion that it rounds up to full milliseconds (which may well be more than 1 ms), and that this seems to be a limitation of the Windows scheduler/kernel.
MrS
____________
Scanning for our furry friends since Jan 2002
Any news about this?
Thanks
I've been thinking about this issue a bit more.
To my knowledge the problem looks like this: you tell your GPU-managing thread to sleep for 1 ms, and after that time it is woken up and polls the GPU to see whether it's ready yet. This "1 ms sleep" is implemented by the Windows-intrinsic Sleep function. The problem is that this sleep only offers a minimum resolution of 1 ms, and depending on the other running threads it may be even less accurate. The reason no better solution seems to exist is that the wake-up has to be managed by the scheduler, and from what my friend told me it seems Windows cannot handle times shorter than 1 ms; that's why there is no proper nanosleep for Windows (yet).
If I'm right, we'd need some other solution, and it would be in NVIDIA's interest to solve this issue, as most GPGPU programs would eventually run into this bottleneck: either use a CPU core completely or suffer a substantial performance loss.
And 1 ms really is critical. If you increased step times to ~100 ms (not talking about GPUGRID specifically), the performance loss due to the sleep issue would be small, percentage-wise. But then you'd get a totally choppy, laggy Windows experience; the computer would not be usable any more.
The ideal 25 fps correspond to 40 ms per frame, so we'd need step times of less than 40 ms. At 40 ms, losing 2 or 3 ms every now and then due to the sleep already starts to hurt, and the overhead only gets worse as cards become faster.
So what could be done? Some ideas:
- MS or you: somehow implement an accurate sleep / nanosleep function.
- MS and/or NVIDIA: provide a means for the GPU to actively tell the CPU that it's finished; the need to poll the GPU seems fundamentally wrong.
- NVIDIA: somehow modify the driver so that CUDA code can really run in parallel with conventional GPU operation, avoiding the screen-refresh issue completely (which in turn would allow step times of, e.g., seconds as opposed to the current milliseconds). Probably difficult: handling the GPU-internal scheduling and resource share dynamically for several concurrent apps.
- you: implement a user-controlled switch to return to the old method of constant polling. Could be beneficial for users with an OC'd GTX 280 and the upcoming 30% performance increase.
- you: if you use the CPU core for some calculations anyway, it may be much easier to implement proper GPU polling and to avoid a performance loss on the GPU.
- you: let's say you observe that on an interactively used Windows system a typical GPU needs on average 60 ms per step, and 99.9% of all steps need more than, say, 50 ms. Then you could use the regular sleep until 50 ms have passed and then start polling the GPU constantly. Benefit: very low loss in GPU performance. Trade-off: more CPU usage than 6.45. Possibly difficult: determining the spread of step times accurately enough and making the algorithm robust enough that it's not confused by things like GPUGRID + gaming.
MrS
____________
Scanning for our furry friends since Jan 2002
We have now implemented the last solution for Windows (not yet uploaded) and run some tests on it with Stefan.
GDF
Very nice, my card says thanks in advance ;)
MrS
____________
Scanning for our furry friends since Jan 2002