Model 3 Step 1/1.5 and Step 2 Video Board Differences

MetalliC
Posts: 24
Joined: Mon Jul 29, 2024 1:10 pm

Re: Model 3 Step 1/1.5 and Step 2 Video Board Differences

Post by MetalliC »

gm_matthew wrote: Fri May 23, 2025 4:15 pm Model 3 performs hidden surface removal using the z-buffer; here's a link to the Pro-1000 product description, look for "hidden surface removal" on page 21. The way it works is that Earth sends the 1/z value of each pixel to be rendered to the depth buffer 3D-RAM and if there are no other pixels in front of it, the 3D-RAM updates the z-buffer and sends a signal back to Earth to render the pixel. If there is another pixel in front of the one to be rendered, nothing happens.
By HSR one usually means things like the "Early-Z" test: https://www.khronos.org/opengl/wiki/Early_Fragment_Test

In case you're not aware: in the canonical rendering pipeline the depth test sits at the very end, right before the fragment is written to the frame buffer (with optional blending).
That means a fragment is first fully shaded and textured, and only then may it turn out that it fails the depth test and has to be discarded, which looks like a HUGE waste of performance, above all fill rate.
That's why, starting in the second half of the '90s, GPU developers began implementing various optimizations like Early-Z or PowerVR's approach.

My point is that the Real3D Pro-1000 / Model 3 may have something similar.
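
To make the difference concrete, here is a minimal sketch of late versus early depth testing in a software rasterizer. The names and structure are my own illustration, not Real3D's actual design; it assumes a smaller z value means closer to the camera.

/* Minimal sketch of late vs. early depth testing. Illustrative only. */

typedef struct { float z; /* plus color, uv, ... */ } Fragment;

/* stand-in for the expensive shading/texturing work */
static unsigned shade_and_texture(const Fragment *f) { (void)f; return 0xFFFFFFu; }

/* Canonical pipeline: shade first, depth-test at the very end.
   All shading work is wasted whenever the fragment turns out hidden. */
void late_z(Fragment f, float *zbuf, unsigned *fb, int idx)
{
    unsigned color = shade_and_texture(&f);   /* expensive, done unconditionally */
    if (f.z < zbuf[idx]) {                    /* depth test only here */
        zbuf[idx] = f.z;
        fb[idx]   = color;
    }                                         /* else: the shading was for nothing */
}

/* Early-Z: reject against the z-buffer first, shade only the survivors. */
void early_z(Fragment f, float *zbuf, unsigned *fb, int idx)
{
    if (f.z >= zbuf[idx])                     /* hidden: skip all shading */
        return;
    zbuf[idx] = f.z;
    fb[idx]   = shade_and_texture(&f);
}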
gm_matthew wrote: Fri May 23, 2025 4:15 pm However, this doesn't improve performance because each pixel has to be checked one by one, and it is almost certainly not performing any kind of occlusion culling.
Are you sure?
It would be strange if hardware with advanced features like model LOD switching and blending, "culling nodes", etc. didn't have relatively trivial occlusion culling via the view frustum...

gm_matthew wrote: Fri May 23, 2025 4:15 pm EDIT: Forgot to mention that the Real3D Pro-1000 has a fillrate of 50 megapixels/s per pixel processor (page 12 of the Pro-1000 product description linked above); we’ve always assumed that Model 3 Step 1.0 matches this, and that Step 1.5 and Step 2.x are clocked higher.
It is possible that the "pixel processor" processes 2 fragments in parallel per clock, which is a common thing in GPU design.

BTW, which chip exactly do they call the "pixel processor", Mars or Earth?
MetalliC
Posts: 24
Joined: Mon Jul 29, 2024 1:10 pm

Re: Model 3 Step 1/1.5 and Step 2 Video Board Differences

Post by MetalliC »

gm_matthew wrote: Sat May 24, 2025 1:32 pm no 3D renderer is 100% efficient in terms of fillrate (except perhaps PowerVR which uses a very different implementation).
All modern GPUs use various HSR optimizations, and they are (almost) as efficient at this as the good old PowerVR 2 ;)
gm_matthew wrote: Sat May 24, 2025 1:32 pm I doubt that Sega would have accepted a fillrate of 33 megapixels/s for Model 3 given that the Model 2 manages 50 megapixels/s (which I have confirmed from similar testing in MAME). Heck, the Model 1 manages 36 megapixels/s and has to run games at only 30 fps.
Yep, it's to be expected that Sega wanted the Model 3 video board to hit at least the same 50 Mpix, but that doesn't mean it had to have a 50 MHz clock.
gm_matthew wrote: Sat May 24, 2025 1:32 pm The point is that just because an ASIC is receiving a particular input frequency doesn't necessarily mean that it has to be running at that frequency. For example, the PowerVR chip used by Dreamcast receives 33.333 MHz but runs at 100 MHz.
Meanwhile, in reality the Dreamcast's HOLLY chipset had a 100 MHz main clock input (and also a 54 MHz pixel clock for the CRTC):
dc_clock.png
gm_matthew
Posts: 46
Joined: Wed Nov 08, 2023 2:10 am

Re: Model 3 Step 1/1.5 and Step 2 Video Board Differences

Post by gm_matthew »

MetalliC wrote: Sat May 24, 2025 3:14 pm By HSR one usually means things like the "Early-Z" test: https://www.khronos.org/opengl/wiki/Early_Fragment_Test

In case you're not aware: in the canonical rendering pipeline the depth test sits at the very end, right before the fragment is written to the frame buffer (with optional blending).
That means a fragment is first fully shaded and textured, and only then may it turn out that it fails the depth test and has to be discarded, which looks like a HUGE waste of performance, above all fill rate.
That's why, starting in the second half of the '90s, GPU developers began implementing various optimizations like Early-Z or PowerVR's approach.

My point is that the Real3D Pro-1000 / Model 3 may have something similar.
AFAIK the original ATI Radeon (released in April 2000) was one of the first consumer immediate-mode GPUs to implement early depth testing (HyperZ), and Nvidia didn't start doing it until the GeForce3 (February 2001). The GeForce2 series in particular was notorious for being inefficient especially in 32-bit color, since it did not perform any kind of early depth testing.
MetalliC wrote: Sat May 24, 2025 3:14 pm Are you sure?
It would be strange if hardware with advanced features like model LOD switching and blending, "culling nodes", etc. didn't have relatively trivial occlusion culling via the view frustum...
One of the Real3D Pro-1000/Model 3's key features is culling objects that it determines to be completely outside the viewing frustum; this is performed by Mercury. The feature is not perfect and sometimes culls objects that should be visible: for example, in The Lost World's attract mode the T-rex's leg is culled even though it should be visible. This is fully emulated in Supermodel.

By occlusion culling I meant the hardware detecting polygons that would be completely hidden from view by other polygons and not rendering them at all; I am very sure that Model 3 does not do this.
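
Just to make the distinction concrete, here is a minimal sketch of the kind of per-object frustum rejection described above; the plane representation and names are my own illustration, not what Mercury actually does. Occlusion culling would instead require knowing what other geometry sits in front of the object, which is a different and much harder problem.

/* Bounding-sphere vs. view-frustum test: the per-object rejection
   described above. Plane layout and names are illustrative only. */

typedef struct { float x, y, z; } Vec3;
typedef struct { Vec3 n; float d; } Plane;     /* n.p + d >= 0 means "inside" */

static float dot3(Vec3 a, Vec3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

/* Returns 1 if the sphere lies completely outside one of the six planes
   and can therefore be culled. */
int sphere_outside_frustum(Vec3 center, float radius, const Plane planes[6])
{
    for (int i = 0; i < 6; i++)
        if (dot3(planes[i].n, center) + planes[i].d < -radius)
            return 1;   /* entirely on the negative side of this plane */
    return 0;
}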
MetalliC wrote: Sat May 24, 2025 3:14 pm It is possible that the "pixel processor" processes 2 fragments in parallel per clock, which is a common thing in GPU design.

BTW, which chip exactly do they call the "pixel processor", Mars or Earth?
Each pixel processor (Earth) definitely only renders one pixel per cycle. Earth has a 48-bit output to three 3D-RAM chips, 16 bits each. Each pixel is made up of 24 bits for RGB color, plus 22 bits of edge-crossing metadata. Jupiter can use this metadata to perform post-processing to achieve anti-aliasing and translucency; check out this page for more details.
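
Purely as an illustration of how 24 + 22 bits could ride on a 48-bit path split across three 16-bit chips: the actual Earth/3D-RAM bit layout is not documented in this thread, so the packing below is hypothetical.

#include <stdint.h>

/* Hypothetical packing only: 24 bits of RGB plus 22 bits of per-pixel
   metadata in a 48-bit word, emitted as three 16-bit writes (one per
   3D-RAM chip). The real bit layout may differ. */
uint64_t pack_pixel(uint32_t rgb24, uint32_t meta22)
{
    return ((uint64_t)(meta22 & 0x3FFFFFu) << 24) | (rgb24 & 0xFFFFFFu);
}

void split_to_3dram(uint64_t word48, uint16_t out[3])
{
    out[0] = (uint16_t)( word48        & 0xFFFF);
    out[1] = (uint16_t)((word48 >> 16) & 0xFFFF);
    out[2] = (uint16_t)((word48 >> 32) & 0xFFFF);
}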
gm_matthew
Posts: 46
Joined: Wed Nov 08, 2023 2:10 am

Re: Model 3 Step 1/1.5 and Step 2 Video Board Differences

Post by gm_matthew »

MetalliC wrote: Sat May 24, 2025 3:35 pm All modern GPUs use various HSR optimizations, and they are (almost) as efficient at this as the good old PowerVR 2 ;)
The point is that whatever a GPU's theoretical fillrate, if it renders nothing but tiny polygons it is not going to come anywhere close to that figure, even taking T&L out of the equation. Voodoo 1 can theoretically render up to 50 megapixels/s but only up to 1.9 million 10-pixel polygons per second (source), which is 19 megapixels/s.

Of course modern GPUs can perform triangle setup much more quickly but even they won't come close to their theoretical fillrates if they're drawing nothing but tiny polygons. We don't know exactly how long Model 3 takes to set up each polygon, but it's certainly not going to be only 2-3 cycles per polygon.
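
As a back-of-envelope illustration of why setup cost eats into fillrate: all the numbers below are placeholders, not measured Model 3 or Voodoo figures.

#include <stdio.h>

int main(void)
{
    /* Hypothetical renderer: 1 pixel per cycle, fixed setup cost per polygon. */
    double clock_hz        = 50e6;   /* 50 MHz, 1 pixel/cycle peak = 50 Mpix/s */
    double setup_cycles    = 20.0;   /* assumed per-polygon setup cost */
    double pixels_per_poly = 10.0;   /* tiny polygons */

    double cycles_per_poly = setup_cycles + pixels_per_poly;
    double polys_per_sec   = clock_hz / cycles_per_poly;
    double effective_fill  = polys_per_sec * pixels_per_poly;

    /* With these numbers: ~1.7 M polys/s and ~17 Mpix/s, far below the
       50 Mpix/s peak, even with nothing else acting as a bottleneck. */
    printf("%.2f Mpolys/s, %.1f Mpix/s effective\n",
           polys_per_sec / 1e6, effective_fill / 1e6);
    return 0;
}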
MetalliC wrote: Sat May 24, 2025 3:35 pm Meanwhile, in reality the Dreamcast's HOLLY chipset had a 100 MHz main clock input (and also a 54 MHz pixel clock for the CRTC).
Okay, I stand somewhat corrected on this one. My source was this post; I interpreted it as saying that the 33 MHz crystal was driving the PowerVR chip directly but the diagram you posted proves otherwise.
Ian
Posts: 55
Joined: Wed Nov 08, 2023 10:26 am

Re: Model 3 Step 1/1.5 and Step 2 Video Board Differences

Post by Ian »

The PowerVR was an interesting architecture; fillrate is almost constant because the hardware only draws the top pixels, so there is no overdraw. Like the Model 3, the entire scene graph must be passed to the hardware, which then splits the work up into tiles. Within each tile it works out, per pixel, which is the top pixel to draw. It can also composite alpha from back to front per pixel, so there are no transparency composition errors, which are a common problem even in modern games.

This also made Dreamcast emulation quite difficult, because normal hardware just cannot do sorting at the pixel level like this. There are order-independent algorithms out there, but they are all pretty expensive.
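
Roughly how that per-pixel resolve works, in a very simplified sketch (my own illustration, not the actual PowerVR hardware interface; the per-pixel depth sort of translucent fragments is omitted for brevity):

/* Per-pixel resolve in a tile-based deferred renderer, as described above:
   find the nearest opaque fragment, shade it once, then blend translucent
   fragments over it. Types and helpers are illustrative stand-ins. */

typedef struct { float z; int translucent; /* material/uv refs, ... */ } Frag;

static unsigned shade(const Frag *f)                   { (void)f; return 0x808080u; } /* stub */
static unsigned background_color(void)                 { return 0u; }                 /* stub */
static unsigned blend_over(unsigned src, unsigned dst) { return (src >> 1) + (dst >> 1); } /* crude stub */

void resolve_pixel(const Frag *frags, int n, unsigned *out_color)
{
    /* 1. Depth pass: find the nearest opaque fragment covering this pixel. */
    const Frag *top = NULL;
    for (int i = 0; i < n; i++)
        if (!frags[i].translucent && (top == NULL || frags[i].z < top->z))
            top = &frags[i];

    /* 2. Shade only that fragment: zero opaque overdraw. */
    unsigned color = top ? shade(top) : background_color();

    /* 3. Composite the translucent fragments in front of it, back to front
          (the per-pixel depth sort they need is omitted here). */
    for (int i = 0; i < n; i++)
        if (frags[i].translucent && (top == NULL || frags[i].z < top->z))
            color = blend_over(shade(&frags[i]), color);

    *out_color = color;
}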
gm_matthew
Posts: 46
Joined: Wed Nov 08, 2023 2:10 am

Re: Model 3 Step 1/1.5 and Step 2 Video Board Differences

Post by gm_matthew »

Ian wrote: Sat May 24, 2025 4:53 pm This also made Dreamcast emulation quite difficult, because normal hardware just cannot do sorting at the pixel level like this. There are order-independent algorithms out there, but they are all pretty expensive.
Indeed! I took a quick look at the PowerVR2 emulation in MAME yesterday and it's very complex.
MetalliC
Posts: 24
Joined: Mon Jul 29, 2024 1:10 pm

Re: Model 3 Step 1/1.5 and Step 2 Video Board Differences

Post by MetalliC »

gm_matthew wrote: Sat May 24, 2025 4:48 pm The point is that whatever a GPU's theoretical fillrate, if it renders nothing but tiny polygons it is not going to come anywhere close to that figure, even taking T&L out of the equation. Voodoo 1 can theoretically render up to 50 megapixels/s but only up to 1.9 million 10-pixel polygons per second (source), which is 19 megapixels/s.

Of course modern GPUs can perform triangle setup much more quickly but even they won't come close to their theoretical fillrates if they're drawing nothing but tiny polygons. We don't know exactly how long Model 3 takes to set up each polygon, but it's certainly not going to be only 2-3 cycles per polygon.
Guess why it was only up to 1.9M tri/sec for Voodoo 1? Have you read at least a bit of the manual you posted the link to? ;)
Spoiler: it's an old immediate-mode renderer with triangle setup done on the CPU, which means that to draw each triangle you have to write a lot of interpolation coefficients to Voodoo registers.
And it's also a PCI device, which means you can do at most 33M/66M 32-bit register writes per second, depending on the bus clock.
So if each triangle setup requires, let's say, 32 register writes, you'll be able to draw only 66M / 32 ≈ 2 million.
And that's not related to GPU performance at all :lol:

BTW, in the real world you'd also need to calculate all these interpolation parameters on the CPU, so more probably that was the actual bottleneck back in the day.
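
The same estimate as code, just to make the arithmetic explicit; the 32-writes-per-triangle figure is the assumption from above, not a measured value.

#include <stdio.h>

int main(void)
{
    /* Bus-limited triangle rate: PCI allows roughly one 32-bit write per
       bus clock, i.e. ~33M or ~66M writes/s at 33/66 MHz. */
    double writes_per_sec = 66e6;   /* 66 MHz PCI case */
    double writes_per_tri = 32.0;   /* assumed register writes per triangle */

    printf("~%.1f M triangles/s, bus-limited\n",
           writes_per_sec / writes_per_tri / 1e6);   /* ~2.1 M */
    return 0;
}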
gm_matthew wrote: Sat May 24, 2025 4:48 pm Okay, I stand somewhat corrected on this one. My source was this post; I interpreted it as saying that the 33 MHz crystal was driving the PowerVR chip directly but the diagram you posted proves otherwise.
Is something wrong with you, that you're taking information from 20-year-old forum posts instead of the Dreamcast schematics and service manuals available these days? :ugeek:
MetalliC
Posts: 24
Joined: Mon Jul 29, 2024 1:10 pm

Re: Model 3 Step 1/1.5 and Step 2 Video Board Differences

Post by MetalliC »

Anyway, all this talk about clocks and speeds has gone too far.

All I actually wanted to say is: if we assume that Hikaru's GPU really is a Real3D product, it looks like a fairly modest evolutionary improvement over the Model 3 video board:
- a slightly faster clock: 41 MHz instead of 33 MHz
- slightly improved texturing: "free" trilinear filtering, programmable texture instructions
- notably improved lighting: 4 light sources per polygon, with several attenuation modes and spot-type lights, instead of only 1 light per viewport
- improved fog with a "depth cue" mode
- a number of more minor changes.
So, nothing totally new or revolutionary in the "rendering core", just evolutionary improvements.

And of course, it's worth mentioning the totally new "frontend" Antarctic ASIC, which makes programming the device much easier and more comfortable compared to Model 3.
Bart
Site Admin
Posts: 181
Joined: Tue Nov 07, 2023 5:50 am

Re: Model 3 Step 1/1.5 and Step 2 Video Board Differences

Post by Bart »

MetalliC wrote: Fri May 23, 2025 1:08 pm
Bart wrote: Fri May 23, 2025 4:22 am It doesn't fit the timeline. I've spoken to a few people who used to work at Real3D, some overlapping the Model 3 era, and no one has ever mentioned a post-Model 3 project with Sega. Real3D was pretty much finished by 1999 and in the year or two prior, they were trying to move into consumer PC graphics with that Intel partnership.
I see your point.
But, as was noted above, most of the Hikaru GPU ASICs have relatively low part numbers (608x), which fits somewhere in the late 1997 to early 1998 period.
And all but one of them run at a relatively low 41.6 MHz clock.
All of that makes me think it was actually designed in 1997 or a bit later, which fits the timeline: work on the Model 3 Step 2 video board was finished around mid-summer 1997, so Real3D and Sega may have had the time and resources to work on a (slightly) improved GPU.
Bart wrote: Fri May 23, 2025 4:22 am Why would either company keep the partnership a secret?
Because it is pretty common practice. It's business.
Because it is / will be your product under your brand name, and you don't want to promote some partner company (for free).
Or it may be vice versa: you may want to use the partner company's name to promote your product, and most likely that will not be free for you :)

Anyway, in very many cases we weren't aware of who actually designed one or another chipset in some gaming device until it was decapped and examined.
Bart wrote: Fri May 23, 2025 4:22 am Maybe it really is a Sega custom design. Didn't Konami and a few others bang out custom ASICs a couple years earlier?
I don't believe it can be a fully custom design; there is no point in creating some unique, totally new 3D GPU from scratch. It would cost too many resources and take too much time, especially for a company that is not fully focused on this area (3D hardware development).

But using some licensed IP? Yes, I think that was possible.
Maybe, as you suggested, it wasn't a huge modification to the Pro-1000 (only the front end), and either Sega themselves or a third party did it after being granted access to Pro-1000 IP. I'm not sure how extensive the co-development of Model 3 was, but Model 3 games seem to have all their code written by Sega. The Model 2 partnership was likely very deep (the Model 2 programming manuals appear to be written by Sega and are terrible).

Pretty interesting and I wish we could get the full story!

Btw Matthew: I'm in New York these days for an indeterminate amount of time, but next time I'm in Nevada I can try to check the crystals on my VF3 board. We may have high-quality board photos from Abelardo, though.
gm_matthew
Posts: 46
Joined: Wed Nov 08, 2023 2:10 am

Re: Model 3 Step 1/1.5 and Step 2 Video Board Differences

Post by gm_matthew »

MetalliC wrote: Sun May 25, 2025 10:49 am Guess why it was only up to 1.9M tri/sec for Voodoo 1? Have you read at least a bit of the manual you posted the link to? ;)
Spoiler: it's an old immediate-mode renderer with triangle setup done on the CPU, which means that to draw each triangle you have to write a lot of interpolation coefficients to Voodoo registers.
And it's also a PCI device, which means you can do at most 33M/66M 32-bit register writes per second, depending on the bus clock.
So if each triangle setup requires, let's say, 32 register writes, you'll be able to draw only 66M / 32 ≈ 2 million.
And that's not related to GPU performance at all :lol:

BTW, in the real world you'd also need to calculate all these interpolation parameters on the CPU, so more probably that was the actual bottleneck back in the day.
Actually I have read a fair bit of that manual, enough to know that you only need to update the vertex coordinates (6 registers) and colors (9 registers) when rendering untextured polygons without alpha blending or z-buffering, which is what the 1.9M figure is for. But the point I'm trying to make is that whether it's from the time required for triangle setup, additional cycles required per scanline, or even just the time it takes to transmit the vertex data, the fillrate for drawing tiny polygons is going to be lower than for drawing larger polygons on pretty much any GPU.

And yes, I know that in practice the polygon rate of Voodoo (and indeed any card without hardware T&L) is almost always going to be limited by the CPU.
MetalliC wrote: Sun May 25, 2025 11:31 am Anyway, all this talk about clocks and speeds has gone too far.

All I actually wanted to say is: if we assume that Hikaru's GPU really is a Real3D product, it looks like a fairly modest evolutionary improvement over the Model 3 video board:
- a slightly faster clock: 41 MHz instead of 33 MHz
- slightly improved texturing: "free" trilinear filtering, programmable texture instructions
- notably improved lighting: 4 light sources per polygon, with several attenuation modes and spot-type lights, instead of only 1 light per viewport
- improved fog with a "depth cue" mode
- a number of more minor changes.
So, nothing totally new or revolutionary in the "rendering core", just evolutionary improvements.

And of course, it's worth mentioning the totally new "frontend" Antarctic ASIC, which makes programming the device much easier and more comfortable compared to Model 3.
What doesn't make sense to me is Hikaru's graphics chips running at only 41.6 MHz when the "less powerful" NAOMI/Dreamcast runs its PowerVR2 chip at 100 MHz, or Model 3's graphics chips running at only 33 MHz when the Model 2 runs at 50 MHz. But I'm done with talking about clock speeds and fillrates because it's not really contributing anything useful.

But I can definitely believe that Real3D could have designed, or at least helped design, Hikaru's graphics chipset; perhaps one day we might know for sure.