Am I missing something that makes the video novel?
I figured it wasn't, given that you were showcasing a GL project. But it's nonetheless disappointing, as someone curious whether the language helped in indirect ways with how you structured your project, and whether you feel you could scale it up to something closer to production-ready. That did seem to be the goal of Jai when I last looked into its development some 4 years ago.
https://old.reddit.com/r/Kha/comments/8hjupc/how_the_heck_is...
Doesn’t even have to do that, this is child’s play for a compute shader. The CPU can go take a 16 millisecond nap and let the GPU do all the work.
You really can achieve amazing stuff with plain OpenGL optimized for your rendering needs. With today's GPU acceleration capabilities we could have town-building games with huge map resolutions and millions of entities. Instead it's mostly only used to make fancy graphics.
Actually I am currently trying to build something like that [1]. A big big world with hundreds of millions of sprites is achievable and runs smoothly; video RAM is the limit. Admittedly it is not optimized to display those hundreds of millions of sprites all at once, maybe just a few million. That would be a bit too chaotic for a game anyway, I guess.
1000% agree.
I recently took it upon myself to see just how far I can push modern hardware with some very tight constraints. I've been playing around with a 100% custom 3D rasterizer which operates purely on the CPU. For reasonable scenes (<10k triangles) and resolutions (720~1080p), I have been able to push over 30fps with a single thread. On a 5950x, I was able to support over 10 clients simultaneously without any issues. The GPU in my workstation is just moving the final content to the display device via whatever means necessary. The machine generating the frames doesn't even need a graphics device installed at all...
To be clear, this is exceptionally primitive graphics capability, but there are many styles of interactive experience that do not demand 4k textures, global illumination, etc. I am also not fully extracting the capabilities of my CPU. There are many optimizations (e.g. SIMD) that could be applied to get even more uplift.
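The commenter hasn't shared their code, but the core of a CPU rasterizer like this is typically an edge-function test per pixel. Here is a minimal single-triangle sketch in Python (entirely my own construction, all names mine, not the commenter's implementation):

```python
# Edge-function rasterization: a pixel center is inside a triangle
# when it lies on the same side of all three (CCW-ordered) edges.

def edge(ax, ay, bx, by, px, py):
    # Signed area of the parallelogram (a->b, a->p):
    # positive when p is to the left of edge a->b.
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

def rasterize_triangle(v0, v1, v2, width, height):
    covered = []
    # Only scan the triangle's bounding box, clamped to the framebuffer.
    min_x = max(int(min(v0[0], v1[0], v2[0])), 0)
    max_x = min(int(max(v0[0], v1[0], v2[0])), width - 1)
    min_y = max(int(min(v0[1], v1[1], v2[1])), 0)
    max_y = min(int(max(v0[1], v1[1], v2[1])), height - 1)
    for y in range(min_y, max_y + 1):
        for x in range(min_x, max_x + 1):
            px, py = x + 0.5, y + 0.5   # sample at the pixel center
            inside = (edge(*v0, *v1, px, py) >= 0 and
                      edge(*v1, *v2, px, py) >= 0 and
                      edge(*v2, *v0, px, py) >= 0)
            if inside:
                covered.append((x, y))
    return covered

pixels = rasterize_triangle((0, 0), (4, 0), (0, 4), 8, 8)
print(len(pixels))  # 10 pixel centers fall inside this half-square
```

A real renderer would add depth interpolation, clipping, and SIMD over multiple pixels at once, but the inner test stays this simple, which is why a single thread can go so far.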
One fun thing I discovered is just how low latency a pure CPU rasterizer can be compared to a full CPU-GPU pipeline. I have CPU-only user-interactive experiences that can go from input event to final output frame in under 2 milliseconds. I don't think even games like Overwatch can react to user input that quickly.
What features does your renderer support in terms of shading and texturing? Are you writing this all in a high-level language, e.g. C, or assembler? If assembler, what CPUs and features are you targeting?
And of course, why?
i'm definitely going to have to test that! always trying to minimize input delay
Thus games that ship to a schedule are hugely incentivized to favor making smaller play spaces with more authored detail, since that controls all the outcomes and reduces the technical dependencies of how scenes are authored.
There is a more philosophical reason to go in that direction too: Simulation building is essentially the art of building Plato's cave, and spending all your time on making the cave very large and the puppets extremely elaborate is a rather dubious idea.
Although, there's a few space 4x games that try this "everything is simulated" kind of approach and succeed. Allowing AI control of everything the player doesn't want to manage themselves is one nice way of dealing with it. See: https://store.steampowered.com/app/261470/Distant_Worlds_Uni...
What made it of course was the art. An army of digital illustrators working by hand to create bitmaps that pop.
One pseudo 2.5d game I'm playing now is Iridion 2 GBA (2003). You can see the care taken with the art design team, pure lovers of the genre ;)
200000 * 200 * 2 = 80M tris/sec
200000 * 200 * 32x32px = 40 Gpix/sec (if no occlusion culling)
Neither of those numbers are particularly huge for modern GPUs.
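Writing out that arithmetic (my own restatement of the numbers above, assuming two triangles and a 32x32 texel footprint per sprite):

```python
# Back-of-the-envelope throughput for 200k sprites at 200 fps.
sprites, fps = 200_000, 200

tris_per_sec = sprites * fps * 2            # two triangles per quad
pixels_per_sec = sprites * fps * 32 * 32    # no occlusion culling assumed

print(tris_per_sec)    # 80_000_000   -> 80M triangles/sec
print(pixels_per_sec)  # 40_960_000_000 -> ~41 Gpix/sec of fill rate
```

For comparison, mid-range desktop GPUs advertise fill rates well above 100 Gpix/sec, which is why neither figure is close to a hardware ceiling.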
I'd wager that a compute shader + mesh shader based version of this could hit 2M sprites at 200 fps, though at some point we'd have to argue about what counts as "cheating" - if I do a clustered occlusion query that results in my pipeline discarding an invisible batch of 128 sprites, does that still count as "rendering" them?
but yes, that's cheating, since it's impractical to work with
edit: oh they do rabbits in the video as well what a bunny coincidence
edit2: the goroutines weren't drawcalling btw, they were just moving the rabbits. The drawcalls were still made using a regular for loop, in case you were wondering.
How are you finding working with it? Have you done a similar thing in C++ to compare the results and the process of writing it?
200k at 200fps on an 8700k with a 1070 seems like a lot of rabbits. Are there similar benchmarks to compare against in other languages?
this is just a test of opengl; C++ should have exactly the same performance, considering my cpu usage is only 7% while gpu usage is 80%. but the process of writing it is infinitely better than in C++, since i never got C++ to compile a hardware accelerated bunnymark.
the only bunnymarks i'm aware of are slow https://www.reddit.com/r/Kha/comments/8hjupc/how_the_heck_is...
which is why i wrote this, to see how fast it could go.
My guess is that the rendering is not the hardest part, although it's kinda cool.
Is it faster to render two triangles with slightly less area, or one triangle with slightly more area, to draw the same sprite?
Second, modern GPUs render pixels in groups of 2x2 up to 8x8 "tiles". If even one pixel from such a group is part of a triangle, the entire group gets shaded. When two triangles form a quad, the entire area along the diagonal "seam" is shaded twice. The smaller your quads, the more overhead.
Also see https://www.saschawillems.de/blog/2016/08/13/vulkan-tutorial...
Edit: okay, surely with modern architectures there is no pixel write, thanks to some early alpha cut, but you still have to fetch the texture to decide that, so texture fetch (memory bandwidth) will bottleneck first. I guess.
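To put a rough number on that seam overhead, here's a toy count (entirely my own construction, not from the thread) of how many 2x2 pixel quads straddle the diagonal of an n x n sprite and therefore get shaded by both triangles:

```python
def seam_overdraw(n):
    # Split an n x n sprite into two triangles along the diagonal
    # (crudely: a pixel belongs to triangle A when x + y <= n - 2),
    # then count 2x2 quads containing pixels from BOTH triangles.
    doubled = 0
    for qy in range(0, n, 2):
        for qx in range(0, n, 2):
            classes = {(x + y <= n - 2)
                       for x in (qx, qx + 1) for y in (qy, qy + 1)}
            if len(classes) == 2:   # quad touched by both triangles
                doubled += 1
    total_quads = (n // 2) ** 2
    return doubled / total_quads

print(seam_overdraw(32))  # 0.0625 -> ~6% of quads shaded twice
print(seam_overdraw(8))   # 0.25  -> smaller sprites waste relatively more
```

The fraction of double-shaded quads grows as sprites shrink, which matches the point above: the smaller the quad, the worse the per-quad overhead.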
I don't think it will matter much at the 200k or 400k level. The math is probably easier on humans if you think of the sprites as rectangular (so two triangles), but you could in principle make each sprite a single triangle and texture-map a rectangular sub-area of that triangle in a shader.
The CPU work would be O(n) and the rendering/GPU work O(m*k), where n is the number of bunnies, m is the display resolution and k is the size of our bunny sprite.
The advantage of this (in real applications utterly useless[1]) method is that CPU work only increases linearly with the number of bunnies, you get to discard bunnies you don't care about really early in the process, and GPU work is constant regardless of how many bunnies you add.
It's conceptually similar to rendering voxels, except you're not tracing rays deep, but instead sweeping wide.
As long as your GPU is fine with sampling that many surrounding pixels, you're exploiting the capabilities of both your CPU and GPU quite well. The CPU work can also be parallelized: each thread operates on a subset of the bunnies and on its own texture, and only in the final step are the textures combined into one (which can also be done in parallel!). I wouldn't be surprised if modern CPUs could handle millions of bunnies while modern GPUs would just shrug, as long as the sprite is small.
[1] In reality you don't have sprites at constant sizes and also this method can't properly deal with transparency of any kind. The size of your sprites will be directly limited by how many surrounding pixels your shader looks up during rendering, even if you add support for multiple sprites/sprite sizes using other channels on your textures.
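The two-pass idea above can be sketched end to end. This is my own toy restatement (all names hypothetical), simulating both the O(n) CPU splat and what the per-pixel "shader" pass would do:

```python
# Pass 1 (CPU, O(n)): mark each bunny's top-left corner in a grid.
# Pass 2 ("GPU", O(m * k)): every output pixel scans the K x K
# neighborhood of marks that could cover it.

K = 4  # sprite is K x K pixels; also the shader's lookup radius

def splat(bunnies, width, height):
    marks = [[0] * width for _ in range(height)]
    for (bx, by) in bunnies:                 # O(n) CPU pass
        marks[by][bx] = 1
    frame = [[0] * width for _ in range(height)]
    for y in range(height):                  # O(m * K^2) "shader" pass
        for x in range(width):
            for dy in range(K):
                for dx in range(K):
                    my, mx = y - dy, x - dx
                    if 0 <= mx < width and 0 <= my < height and marks[my][mx]:
                        frame[y][x] = 1      # some sprite covers this pixel
    return frame

frame = splat([(0, 0), (10, 10)], 16, 16)
print(sum(map(sum, frame)))  # 2 sprites * 4x4 pixels = 32 covered pixels
```

Note how the footnote's limitations show up directly: the mark grid can hold only one sprite per cell, there's no alpha blending, and the sprite size is baked into the shader's lookup radius K.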
i got 200k sprites at 200fps on a 1070 (while recording). i'm not sure anyone could survive that many vampires
Do you have the code somewhere? I would like to see how it's made.
Example (not mine): https://www.shadertoy.com/view/tlB3zK
Curious how you are passing the data to the GPU - are you having a single dynamic vertex buffer that is uploaded each frame?
Is the vertex data a single position and the GPU is generating the quad from this?
for this bunnymark i have 1 VBO containing my 200k bunnies array (just positions), and 1 VBO containing just the 6 verts required to render a quad. turns out the VAO can just read from both of them like that. the processing is all on the CPU and just overwrites the bunnies VBO each frame
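That two-VBO setup is classic instanced rendering: the quad VBO advances per vertex, the bunny VBO advances per instance (attribute divisor 1), and the vertex shader adds them. A conceptual Python sketch of what the GPU effectively computes per instanced draw (my own illustration, not the commenter's code):

```python
# Two CCW triangles forming a unit quad -- the shared 6-vert VBO.
QUAD = [(0, 0), (1, 0), (1, 1),
        (0, 0), (1, 1), (0, 1)]

def expand_instances(bunny_positions, size):
    # Emits the 6 final vertices per bunny, mimicking
    # glDrawArraysInstanced(GL_TRIANGLES, 0, 6, len(bunny_positions)):
    # the outer loop is the instance attribute (divisor 1), the inner
    # loop is the shared quad attribute (divisor 0).
    out = []
    for (bx, by) in bunny_positions:
        for (qx, qy) in QUAD:
            out.append((bx + qx * size, by + qy * size))
    return out

verts = expand_instances([(0, 0), (100, 50)], 32)
print(len(verts))   # 2 bunnies * 6 verts = 12
print(verts[6])     # first vert of the second bunny: (100, 50)
```

On the GPU this expansion never materializes in memory; only the small per-instance position buffer needs re-uploading each frame, which is why CPU usage stays so low.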
Using some slight shader/buffer trickery, and depending on what you're trying to do (as is always the case with games & rendering at this scale), you can easily get multiples of that -- and still stay >100FPS.
I agree, more of this approach is great. And I am totally flabbergasted at how abysmally poor the performance is with SpriteRenderer, Unity's built-in sprite rendering component.
That said, it's doable to get relatively high-performance with existing engines -- and the benefits they come with -- even if you can definitely, easily even, do better by "going direct".