1000FPS CHALLENGE

11 posts • Page 1 of 1
Marisa Kirisame
Deuced Up
Posts: 152
Joined: Sat Sep 21, 2013 10:52 pm


I have an i5-6500 CPU @ 3.2GHz (3.6GHz peak) with an HD 530 GPU built into it. I am using Mesa 11.3.0-git i965 drivers. Despite Wikipedia and MesaMatrix saying the hardware has GL 4.3 support, a lack of GLSL 4.30 support means I only get OpenGL Core 4.2. EDIT: OpenGL 4.3 support is now in Mesa Git. I am now on 12.1.0-git.

The challenge is to render a 512x64x512 VXL file at >1000FPS on the target hardware. You will need to produce source code that I can compile.
There are no limits on how you go about completely ruining the quality, but if you simply raytrace 1 pixel, then while it will technically pass the challenge, it won't be a very impressive "entry".
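For a sense of scale (my own arithmetic, not part of the original challenge text): 1000FPS leaves a 1ms frame budget, and a full map is over 16 million voxels.

```c
#include <stdint.h>

/* total voxels in a W x H x D VXL map */
uint64_t voxel_count(uint64_t w, uint64_t h, uint64_t d)
{
	return w * h * d;
}

/* frame budget in microseconds for a given FPS target */
uint64_t frame_budget_us(uint64_t fps)
{
	return 1000000ULL / fps;
}
```

For 512x64x512 that's 16,777,216 voxels to get through in 1000 microseconds, which is why brute force isn't going to cut it.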

In case it wasn't glaringly obvious, I run Linux. C or C++ is fine, just make sure the damn thing compiles in GCC. You probably don't want to use any other languages here, speed is what matters.

Intel GPUs don't really reach 1000FPS at high resolutions. Sure, you *can* reach that speed if you minimise the program window, but that's about as good as it gets.

The full fillrate gives about 850FPS at 1280x720. If we minimise glxgears, we get about 1900FPS. So we'll do this at a different resolution. We only care about the "real" FPS figures here, not the no-blit ones.
- 1280x720: 850FPS real, 1900FPS no-blit EDIT: Actually, a simple test which draws one triangle and uses usleep(100) yields ~1500FPS real. This category is now somewhat plausible.
- 800x600: 2200FPS real, 4600FPS no-blit
- 640x480: 3350FPS real, 6450FPS no-blit
- 640x360: 4100FPS real, 8100FPS no-blit

Any resolution in that list that isn't 1280x720 will do.
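A quick sanity check on those numbers (my own arithmetic, not the OP's measurements): the "real" figures all work out to roughly the same pixel throughput, around 0.8-1.1 Gpx/s, so the smaller modes don't buy extra fill rate, only less work per frame.

```c
#include <stdint.h>

/* pixels pushed per second at a given resolution and frame rate */
uint64_t fill_rate(uint64_t w, uint64_t h, uint64_t fps)
{
	return w * h * fps;
}
```

For example, 800x600 at 2200FPS is about 1.06 Gpx/s, while 640x360 at 4100FPS is about 0.94 Gpx/s.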

Here's a template that may or may not work:
Code:
// requirements: SDL2, libepoxy
// suggested libs: datenwolf's linmath.h
// compiling on OSes that don't hate developers (i.e. anything that isn't Windows):
// cc -O1 -g -o challenge main.c `sdl2-config --cflags --libs` -lepoxy

#include <string.h>
#include <stdlib.h>
#include <stdint.h>
#include <stdio.h>
#include <errno.h>

#include <math.h>
#include <unistd.h> // usleep()

#include <epoxy/gl.h>
#include <SDL.h>

#define INIT_WIDTH 800
#define INIT_HEIGHT 600

SDL_Window *window;
SDL_GLContext context;

int main(int argc, char *argv[])
{
	(void)argc;
	(void)argv;
	SDL_Init(SDL_INIT_TIMER | SDL_INIT_VIDEO | SDL_INIT_NOPARACHUTE);

	// GL attributes must be set *before* the window is created
	SDL_GL_SetAttribute(SDL_GL_RED_SIZE, 8);
	SDL_GL_SetAttribute(SDL_GL_GREEN_SIZE, 8);
	SDL_GL_SetAttribute(SDL_GL_BLUE_SIZE, 8);
	SDL_GL_SetAttribute(SDL_GL_DEPTH_SIZE, 24);
	SDL_GL_SetAttribute(SDL_GL_DOUBLEBUFFER, 1);

	// highest compat version: 3.0
	// lowest core version: 3.1
	// highest core version: currently 4.2
	// if you want a core context, set the version to 3.1 or higher
	// if you want a compat context, set the version to 3.0 or lower
	SDL_GL_SetAttribute(SDL_GL_CONTEXT_MAJOR_VERSION, 3);
	SDL_GL_SetAttribute(SDL_GL_CONTEXT_MINOR_VERSION, 0);
	SDL_GL_SetAttribute(SDL_GL_CONTEXT_PROFILE_MASK, SDL_GL_CONTEXT_PROFILE_COMPATIBILITY);
	//SDL_GL_SetAttribute(SDL_GL_CONTEXT_MINOR_VERSION, 1);
	//SDL_GL_SetAttribute(SDL_GL_CONTEXT_PROFILE_MASK, SDL_GL_CONTEXT_PROFILE_CORE);

	window = SDL_CreateWindow("1000FPS Challenge",
		SDL_WINDOWPOS_UNDEFINED, SDL_WINDOWPOS_UNDEFINED,
		INIT_WIDTH, INIT_HEIGHT,
		SDL_WINDOW_OPENGL);
	SDL_assert_release(window != NULL);

	context = SDL_GL_CreateContext(window);
	SDL_assert_release(context != NULL);

	SDL_GL_SetSwapInterval(0); // disable vsync

	// set shit up here

	Uint32 nextframe_time = SDL_GetTicks() + 1000;
	int frame_counter = 0;
	int running = 1;
	while(running) {

		// draw shit here

		SDL_GL_SwapWindow(window);
		usleep(1000);
		//sched_yield();
		Uint32 curframe_time = SDL_GetTicks();
		frame_counter++;
		if(curframe_time >= nextframe_time) {
			printf("%4d FPS\n", frame_counter);
			frame_counter = 0;
			nextframe_time += 1000;
		}

		SDL_Event ev;
		while(SDL_PollEvent(&ev)) {
			switch(ev.type) {
				case SDL_QUIT:
					running = 0;
					break;
			}
		}
	}

	return 0;
}
Note that the forums appear to change hard tabs to 3-wide soft tabs, so you may want to fix that.

WARNING: The NVIDIA shader compiler is extremely lax and lets you get away with stuff that is blatantly invalid GLSL. The Mesa shader compiler, on the other hand, actually requires you to do things properly. For instance, if you don't provide a #version directive, it will assume you want GLSL 1.10, which is what the actual specification mandates. Another example: "float k = texture(tex, tc0);" will not compile, because texture() returns a vec4; use "float k = texture(tex, tc0).r;" instead. Make sure you print the shader logs to the console so I can send them straight back at you when your shaders inevitably fail.
Marisa Kirisame


I don't consider this a "win", as I only get this FPS by flying around the border of the map, but it IS pretty close:

[screenshot]

UPDATE 1: Progress. This time I'm not just walking around the sides of the map.

If I replace the frag shader with a simple passthrough I get this:

[screenshot]

CPU-side frustum culling is available if necessary, but it's currently buggy.

Fun thing: I managed to improve performance by maxing out at 12 vertices/3 faces in the geometry shader instead of going with 24 vertices/6 faces (you will never be able to see more than 3 faces of a cube). I was explicitly told by someone who works on the driver I use that a 1->24 expansion in the geometry shader is a terrible idea. Turns out it's slightly less terrible than one would think, but reducing it to 1->12 does help a lot.
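The at-most-3-faces observation is easy to convince yourself of: a face of an axis-aligned cube can only point toward the camera if the eye is beyond that face's plane, and on each axis the eye can be beyond at most one of the two opposing planes. A minimal CPU-side sketch (my own helper for illustration, not code from the repo):

```c
/* Returns which faces of an axis-aligned box [lo, hi] can face a viewer
 * at position eye, writing face ids into out[]: axis*2 for the -axis
 * face, axis*2+1 for the +axis face. At most 3 ids are produced. */
int visible_faces(const float eye[3], const float lo[3], const float hi[3],
	int out[3])
{
	int n = 0;
	for(int axis = 0; axis < 3; axis++) {
		if(eye[axis] < lo[axis])
			out[n++] = axis * 2;     /* -axis face visible */
		else if(eye[axis] > hi[axis])
			out[n++] = axis * 2 + 1; /* +axis face visible */
		/* eye inside this axis's slab: neither face of the pair shows */
	}
	return n;
}
```

A camera diagonally off a corner sees 3 faces; a camera straight above a cube sees only 1, so emitting 12 vertices always covers the worst case.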
longbyte1
Deuced Up
Posts: 336
Joined: Sun Jul 21, 2013 7:27 pm


you're trying to beat your own challenge? You have extremely advanced OpenGL at your disposal, you can take advantage of that and parallelize the heck out of the shaders so the CPU does minimal work. Which makes the challenge somewhat pointless.
LeCom


Source pls
You're always trying to push OpenGL polygon rendering; if we have something that beats everything imaginable on hardware with OpenGL>2.0, we don't really care about whether it's raycasted or polygon-rendered, CPU-based or shader-based, or w/e.
And why did the Iceball renderer suck so hard, if you can pull off so many FPS?
Marisa Kirisame


https://github.com/iamgreaser/1kfps-cha ... ee/geomver - yes, you WILL want the geomver branch. It's GL 3.2 Core. Unfortunately it seems to wedge on my HD 3000, and my Radeon 6700M drivers have decided to forget that they can do anything more than 2.1. Fortunately it works just fine on llvmpipe, and gets about 24fps on that laptop.

You'll need SDL2 and libepoxy. Oh, and a suitable map. I use mesa.vxl mostly for testing.

cc -O1 -g -o 1kfps main.c `sdl2-config --cflags --libs` -lepoxy && LIBGL_ALWAYS_SOFTWARE=yes ./1kfps mesa.vxl

I got CPU-side frustum culling behaving properly, so now I get about 1600fps on mesa.vxl if I don't do anything at all.

WASD+lctrl+space keys move the camera, arrow keys rotate the camera.

The reason for the geomver branch is that the master branch uses a compute shader approach which unfortunately gets stuck at about 500FPS.
longbyte1 wrote:
you're trying to beat your own challenge? You have extremely advanced OpenGL at your disposal, you can take advantage of that and parallelize the heck out of the shaders so the CPU does minimal work. Which makes the challenge somewhat pointless.
Seeing as it's so super easy, I take it you have an entry for me already?

No entry? Go write one. No excuses. No "I can't OpenGL". Just do it. If you need a hand with a few things, people here can help (e.g. the VXL format isn't the most obvious thing to implement).
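On the VXL point: the fiddly part is the run-length span encoding, where the length of each span's bottom colour run has to be inferred from the chunk count and the next span's header. Here's a geometry-only column decoder following the community format documentation; it's a sketch (colour handling omitted), not the loader from the repo:

```c
#include <stdint.h>
#include <string.h>

/* Decode one column of an AoS-style VXL span stream into a 64-entry
 * solidity array (index = z). The map defaults to solid; the spans
 * only carve out the air runs. Returns the number of bytes consumed. */
size_t decode_column(const uint8_t *v, uint8_t solid[64])
{
	const uint8_t *start = v;
	int z = 0;
	memset(solid, 1, 64);
	for(;;) {
		int chunks = v[0];     /* 4-byte chunks in this span, 0 = last */
		int top_start = v[1];  /* first voxel of the top colour run */
		int top_end = v[2];    /* last voxel of the top run, inclusive */
		int i, top_len, bottom_len, bottom_end;

		for(i = z; i < top_start; i++)
			solid[i] = 0;      /* carve the air above this span */

		top_len = top_end - top_start + 1;
		z = top_end + 1;
		if(chunks == 0) {
			/* last span: header chunk + one colour chunk per top voxel */
			v += 4 * (top_len + 1);
			break;
		}
		/* bottom run length is whatever chunks remain after header + top */
		bottom_len = (chunks - 1) - top_len;
		(void)bottom_len;
		v += chunks * 4;       /* jump to the next span's header */
		bottom_end = v[3];     /* next span's air-start byte */
		z = bottom_end;        /* bottom run ends here; air resumes */
	}
	return (size_t)(v - start);
}
```

Everything between a span's top run and its bottom run stays solid (it's just hidden), which is what makes the format compact.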

My intention was to get people interested and actually trying it, rather than pointing out the obvious fact that 0.75 is ruined. You should still try it anyway. See how far you can push your hardware. If you don't reach 1000FPS, well, the only true failure is not to even try, because by not trying, nobody gets to see that something is hard.

If you have any ideas for joke entries, implement them, then post screenshots and, when you are happy with what you have so far, sauce. Do not post ideas for entries; I don't care unless you actually manage to implement something that can be shown in a screenshot.
LeCom wrote:
You're always trying to push OpenGL polygon rendering;
I try to encourage OpenGL in general because there's a lot you can do with it, and even a GMA 4500 is usually somewhat decent. But to be blunt, that's a piece of hardware that actually benefits from raytracing an SVO rather than spamming polygons. And the best place to run that raytracer is on the GPU, which has, I think, 5 cores with 4 threads each.
LeCom wrote:
if we have something that beats everything imaginable on hardware with OpenGL>2.0, we don't really care about whether it's raycasted or polygon-rendered, CPU-based or shader-based, or w/e.
And you'd probably be content with 120FPS. But this challenge is about pushing the shit out of your (well OK, my) GPU by any means.
LeCom wrote:
And why did the Iceball renderer suck so hard, if you can pull off so many FPS?
It doesn't. It gets about 180FPS with the toon shader enabled, and about 240FPS with it disabled. I think my laptop was getting about 100FPS with the toon shader on, but it's been a while. My Pi 3 doesn't do well with shaders on, but it does get about 20FPS with everything disabled.

The software one sucked ass. The OpenGL one was actually decent. The raytraced one isn't complete (pkg/gm/rt/) but is worth having a look at if you have GL 2.1 support - just remember that you can downscale, disable lighting, and disable reflections.

If you don't even have GL 2.1 support on Linux, please demote your potato to a doorstop. Even the GMA 3150 supports 2.1 on Linux, and that is a complete and utter potato, and you should still demote that potato to a doorstop.
Marisa Kirisame


I'm now trying to get something simple together in Vulkan. Well, OK, I've managed to clear the screen to a non-black colour and get a swapchain working. (This is under validation, by the way.) Getting my first triangle is proving to be difficult, though. My barriers aren't quite set up properly, so no triangle actually shows up.

And I am pretty much at about 1000 lines now.

I'd like to see if I can do better under Vulkan than under OpenGL. Considering that the renderer only does one or two runs of draw calls, probably not. But if I can get per-chunk frustum culling happening in a compute shader pass, I can do several indirect draws in a single call, so it might be faster? (I recall that the number of draw calls is a notably harsh bottleneck in my renderers.)
longbyte1


Wait what? 1000 lines and not even a triangle drawn? Blue_Surprised2
Marisa Kirisame


Yeap.

Also, since then I've managed to get a triangle, but I don't consider this a success because, well, this is the vertex shader:
Code:
#version 450

out gl_PerVertex {
        vec4 gl_Position;
};

const vec3[] parray = vec3[](
        vec3( 0.0f,  0.5f,  0.0f),
        vec3( 0.5f, -0.5f,  0.0f),
        vec3(-0.5f, -0.5f,  0.0f));

void main()
{
        gl_Position = vec4(parray[gl_VertexIndex%3], 1.0);
}
...yeah, I don't have vertex inputs working properly yet.

For some reason vkCmdUpdateBuffer and vkCmdFillBuffer both crash.
longbyte1


rip vulkan
Marisa Kirisame


GOT A TRIANGLE. You need to use vkBindBufferMemory to tie a VkBuffer object to a VkDeviceMemory object.

The validation layer currently doesn't tell you this.
Marisa Kirisame


Started to make this work on Vulkan

No depth buffer

No culling

90FPS

Fuck

(Under similar conditions, the GL version gets ~230FPS)

Perhaps I should take that advice of rendering to a texture and then blitting it to the swapchain surface, instead of trying to do everything in sRGB.

EDIT: Got it running at about 260FPS for the full map; let's see how well frustum culling will work.