Temple demo deep dive
For the Kyro II it was agreed that STM would produce a demo with ImgTec support to show off the hardware. The result was Temple (variously also known as TempleDemo and TempleMark), which could be run either as a looping demo (the default) or as a benchmark to compare other hardware against the Kyro II.
Planning for this demo began at the end of November 2000, with the Marketing Requirements Document (MRD) setting out a highly ambitious schedule of getting specification agreement by 20th December 2000, a beta available by 26th January 2001 and release on 16th February 2001. However, it would take until February 2001 before a decision on what the demo would be was actually made. Multiple ideas and themes were suggested by STM and ImgTec, including:
City Flypast
A remote-controlled helicopter flies around a city. It could be on a “surveillance mission”, so it could zoom in on various buildings/rooms in the city to showcase various graphical effects. A city was chosen to promote high overdraw.
Wander About the Human Body
A camera flies around the internal organs of a human body. This makes modelling easier in the sense that they do not need to be realistic, but will require many polygons to produce good models.
Space Shuttle Re-entering Earth’s Atmosphere
The space shuttle has trouble re-entering, and needs a speed boost to land safely. The big “KYRO” button is pressed to boost the speed... hence letting the space shuttle land safely and saving the day. Other card logos are shown as “sponsorship” adverts on the shuttle, which all get blown/burnt away when KYRO kicks in. Demo requires human facial animation.
Thunderbirds Style
This is unspecific as to content, merely an idea to reduce artwork overhead. Animation requires a lot of work, so if going with Gerry Anderson style puppets, the graphics still look “cool”, convincing, etc. – only rendering puppets. These require very simplistic animation, yet can still carry out effects as required (gloss, smoke, etc.). This also permits a demo to be created in “installments”, to keep people returning to the web site for the next episode.
Spy Idea
First-person view of moving around an office environment, searching for clues, etc. Lots of overdraw (due to partitions in an office), and an opportunity to use the “X-Ray” specs. Also interactive.
Blade Runner
Fixed camera path flies around a city, using blade-runner style dark environment. Various “KYRO” neon signs are seen, along with broken (and sparking?) neon signs for the competitors. Big opportunity to be heavy on the style and effects.
Temple
A temple filled with columns is flown around (possibly a first-person style wander), reaching the central pool of water. Floating above it is a vase/artifact/etc. surrounded by bright lights. This is struck by lightning and explodes, revealing its secret contents – the KYRO II chip. Should show off overdraw (due to columns), nice blending effects and any technical demo code we can recycle.
There was a lot of discussion about the required implementation details to ensure that Kyro would run the demo faster than the competition, particularly the use of at least 5 layers of multi-texturing given the competition's capabilities:
- GeForce 2: 2 layers of multi-texture
- Radeon: 3 layers of multi-texture
- NV20 [GeForce 3]: 4 layers of multi-texture [as rumoured at the time, final hardware had 4 render pipelines with 2 texture mapping units per pipeline, the same as the GeForce 2]
- Matrox G400: 2 layers of multi-texture + bump map
By using 5+ textures it would ensure that the GeForce 2 & 3 and the G400 would have to do three render passes, and the others two, both of which would be slower than the Kyro, which could handle up to 8 textures in a single pass, albeit at an incremental cost per texture.
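The pass counts above come down to simple ceiling division: a card that can blend N textures per pass needs ceil(layers / N) passes. A minimal sketch of that arithmetic (the per-pass texture counts are those listed above):

```c
#include <assert.h>

/* Number of render passes needed to apply `layers` texture layers on
   hardware that can blend `textures_per_pass` textures in one pass. */
static int passes_needed(int layers, int textures_per_pass)
{
    /* ceiling division: e.g. 5 layers on 2-texture hardware = 3 passes */
    return (layers + textures_per_pass - 1) / textures_per_pass;
}
```

With 5 layers, a 2-texture card (GeForce 2, G400) needs 3 passes, a 3-texture Radeon needs 2, and the Kyro's 8-texture capability keeps it to a single pass.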
The temple idea was chosen as the most realistic to achieve given the timescales and limited resources available at the time, although the final result differed somewhat from the original idea as expanded upon:
The story is someone looking for the secrets of the ancients (dark’n’moody as required). An ancient (Roman?) temple is approached, flying up the steps. The camera flies around the inside of the temple (filled with columns), until reaching a central point. Here is a pool of water with an artifact on a pedestal sticking out of the water. The central area should be nicely lit with as many real-time lights as we can cope with, showing real-time shadows dancing on the floor. The artifact is struck by lightning and explodes, revealing its secret contents – the KYRO II chip. Could hence turn the pool opaque red as the lightning strikes to match the Kyro logo – also providing a useful place to hide pieces of the artifact...
Extras we could insert given time:
- A spotlight from a torch as we move around the temple.
- Zoom in on a pillar to see hieroglyphs with the ST logo in them.
- KYRO chip could be contained in a gem (ruby to match logo?) – maybe show prism effects (various alpha overlays – mainly artwork overhead)
- Coins are hidden inside the artifact – ST logo on one side, KYRO on the reverse?
- Artifact: Could be a sarcophagus (matches dark theme), or an amphora. If we use a jug/vase, we ensure even further overdraw by requiring the inside/back of the object to be drawn (we’ll nip the camera over the top to show the inside).
Pros
- Columns promote overdraw if there are multiple rows of them (e.g. 5 deep surrounding the pool/item of interest).
- By using slim columns, so that a single column won’t blot out a large portion of the screen by itself, we prevent any PVS/BSP related tricks being used (as the gain from such a mechanism would be tiny – probably slower than direct rendering on an average card).
- Gladiator was popular this year, so ancient is possibly still “cool” with marketing?
- Columns are simply replicated – they are rotated to prevent obvious duplication to the eye. Hence a major reduction in artwork.
- Multi textures & texture coordinates can help make columns look different (also simply rotating each column a random amount before placement – they are not obviously cloned).
- Central artifact is the other “showcase” item – hence mainly floor, column, pool and artifact need to be built.
- Concentrate effects on artifact (umpteen multi-layer texturing to approach photo realism if possible).
- Hence use detail textures, environment mapping, specular masks, etc. to show off effects – these are all legitimate uses of multitexturing (requiring more than 4 textures).
- We can insert additional work into the demo once the bare bones are complete. Hence the demo is “scaleable” and “extensible” according to schedule.
- It seems to promote many creative ideas from various people, and open many options.
- Dark and moody to match new artwork.
Some concept and storyboard artwork:








Due to the short timescales the demo was built on a highly modified version of the codebase that was used for VillageMark. An internal review revealed that the "engine" (more of a framework) had a number of limitations:
- The source is contained within a single .h file and single .c file.
- The engine itself is very basic.
- The engine is very explicit.
- The models (from 3ds max) are stored in a .h file, so it is not easy to change the model and re-run the demo.
- Only three levels of MT (MultiTexturing) supported.
- No Bump Mapping is supported.
- No stencil buffers are used.
- Only one light is used.
But also the following advantages:
- The engine comes with some utility code which will give the same look and feel to the product as the other IMG demos.
- The engine copes with h/w T&L.
A significant amount of effort went into expanding the engine's functionality during production.
The demo took inspiration from ancient Egypt, and some of the reference shots used during development were screengrabs from the 1999 movie The Mummy.
Some work-in-progress screenshots:

















The demo showed a journey through an Egyptian temple built into the side of a hill, starting outside and moving inside through several different rooms designed with heavy overdraw and up to 6 texture layers.
From the user guide:
Demo Walk-Through
Initially you are presented with the Kyro II logo. Once the demo has completed loading from disc, the screen fades up from black to show the ancient pyramids.
The sky has 5 layers of texture to give an impression of depth (via cloud parallax) – no additional polygons are required for these layers. The pyramids and the floor are textured using 3 layers, including an animated light map representing the shadows cast by the moving clouds. The other two maps are a detail map (showing stone grain when the geometry is close to the camera) and a diffuse colour map.
We then approach the entrance to the temple via a short corridor. Portaling is used to prevent unnecessary overdraw by not rendering the inside of the temple until it is visible, at which point we cease to render the temple exterior.
The column room contains multiple pillars, showing off:
- Slight overdraw
- DXT1 textures (everywhere, except flames and environment mask in bowls)
- Environment mapping (bowls)
- Real-time lighting (look at roof)
- Particles (alpha blending)
- Multi-texture
The bowls next to each pillar use environment map, specular mask, bump map and diffuse textures. The specular mask is used to show a stippled surface to the metal.
The camera flies through the room and down another corridor (at which point, portalling kicks in again, hiding the geometry in the pillar room and revealing the geometry in the pool room).
The pool room (which is in a more advanced stage of completion compared with the pillar room) shows off:
- Multi-texture (5 layers in pool, 6 on floor)
- Environment mapping and masking (pool floor)
- Bump mapping (pool steps, columns, flooring)
- Real-time lighting (flickering light on walls)
- DXT1 textures (all, except pool transparency, flames and environment mask)
The 6 layers of texture on the pool floor are:
- Diffuse texture
- Environment map
- Environment mask
- Light map
- Detail map
- Bump map
The environment mask is shown off best when the camera spins around the pool. The reflections on the floor are limited to the circular marble inlay and the centre of the flagstones due to wear and tear. Note that the masking is carried out with multiple textures instead of high definition polygons (the pool floor is made up of one large map (1024x1024) up to the edge of the pillars).
The 5 layers on the pool are:
- Caustics
- Refraction map
- Procedural alpha map (blends refraction to reflection)
- Reflection map
- Pool scum
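The procedural alpha layer effectively performs a per-pixel linear interpolation between the refraction and reflection maps. The demo's exact blend operations aren't documented here, so the following is only an illustrative sketch of that interpolation:

```c
#include <assert.h>

/* Blend refraction towards reflection using a procedural alpha value,
   as the pool's middle layer does between the two maps around it. */
static float pool_blend(float refraction, float reflection, float alpha)
{
    /* alpha = 0 shows pure refraction, alpha = 1 pure reflection */
    return refraction * (1.0f - alpha) + reflection * alpha;
}
```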
The camera then flies into another portaling corridor, finishing in the chip room. The cat statues cast real-time shadows from the moving will-o’-the-wisp lights, which are represented by small particle systems. The statues have lightning leaping between them, with the Kyro chip spinning above the group of statues. This scene uses a high number of polygons – over 2750 faces in the chip alone. The chip surface is bump mapped, environment mapped and masked – the lettering is glossy compared to the chip surface, and the chip pins are metallic.
The camera zooms into the chip to show the Kyro logo, finally fading to black. The demo then loops back to the initial pyramid scene.
Benchmarking
As previously mentioned, the demo could also be run in a benchmark mode that would run a single pass of the demo, rendering each frame as quickly as possible and then writing timing details to a CSV file for comparison with runs on other hardware.
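The benchmark output can be pictured as something like the following sketch – the column names and layout are hypothetical, as the actual CSV format isn't described here:

```c
#include <stdio.h>

/* Write per-frame timings to a CSV file for later comparison.
   Field names are hypothetical; frame times are in milliseconds. */
static int write_timings_csv(const char *path,
                             const float *frame_ms, int frames)
{
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    fprintf(f, "frame,ms,fps\n");
    for (int i = 0; i < frames; i++)
        fprintf(f, "%d,%.3f,%.1f\n", i, frame_ms[i], 1000.0f / frame_ms[i]);
    fclose(f);
    return 0;
}
```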
The demo was also highly processor speed dependent. On a 500MHz P3 the Kyro could achieve 93% of the performance of the Kyro II, but with a 1GHz processor the Kyro could only manage 83-84% of the Kyro II's performance. Some benchmark values obtained from competitors with Temple Demo v1.0.3 and different operating systems:



Development
The development environment utilised the following:
- Microsoft Windows 2000 (SP1 or higher)
- Microsoft Visual Studio 6.0 SP5 (with processor pack) or SP6
- Microsoft DirectX7a SDK
- Microsoft Visual SourceSafe source control
Additional tools used:
- Autodesk 3D Studio Max 4
- Adobe Photoshop
- UPX - Executable compression
Artwork
The modelling was done using 3D Studio Max 4, and textures were developed with Adobe Photoshop. All textures were saved as bitmaps (BMP), with the intention that most would be converted to DXT1 format (DDS) where possible for use. The 3D scene used the PowerVR exporter for 3DS Max, which exports the scene's meshes, camera, lights and animations to a single C source file and header.
The exported file contained the following:
- 1750 animation frames
- 735 meshes (no instancing, all mesh vertices exported in world space)
- 102 materials
- 34 lights
- 1 camera
As the framerate was set at 30 frames per second, the total runtime was 58.3333 seconds (1750 / 30). Why such an arbitrary value was used is anyone's guess.
The exported C file defines structures and arrays of floats representing different attributes. For example, every mesh has a visibility array indicating whether it is hidden or rendered on each frame. However, as the exporter only appears to export floats, this boolean state is represented by 0.0f or 1.0f – 4 bytes per value, so 735 meshes * 1750 animation frames * 4 bytes = 5,145,000 bytes, or 4.9MB of data just for visibility. One advantage, though, is that this data is highly repetitive, so it compressed incredibly well when UPX was used.
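The arithmetic above can be checked directly; for comparison, a bit-packed encoding (which the exporter did not use) would have needed only around 157KB:

```c
#include <assert.h>

#define NUM_MESHES 735
#define NUM_FRAMES 1750

/* Bytes used by the exporter's float-per-mesh-per-frame visibility data. */
static long visibility_bytes_as_floats(void)
{
    return (long)NUM_MESHES * NUM_FRAMES * (long)sizeof(float);
}

/* Bytes needed if visibility were packed one bit per mesh per frame
   (hypothetical alternative, not what the exporter produced). */
static long visibility_bytes_as_bits(void)
{
    return ((long)NUM_MESHES * NUM_FRAMES + 7) / 8;
}
```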
Likewise, the camera was defined with arrays, each containing 1750 values, so its position and the coordinates it was looking at were defined per frame. This was probably done by PowerVR to ease the computational load on weaker processors rather than interpolating along splines.
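Such a per-frame camera track amounts to parallel arrays indexed by frame number – a plain lookup rather than a spline evaluation. A hypothetical sketch of how the exported data might be consumed (the array names are invented, not the exporter's):

```c
#include <assert.h>

#define NUM_FRAMES 1750

/* Hypothetical per-frame camera position track, as parallel float arrays
   in the style of the exporter's output. */
static float cam_pos_x[NUM_FRAMES];
static float cam_pos_y[NUM_FRAMES];
static float cam_pos_z[NUM_FRAMES];

/* Fetch the camera position for a frame: a plain array lookup with no
   interpolation, which is why every frame had to be exported. */
static void camera_position(int frame, float *x, float *y, float *z)
{
    *x = cam_pos_x[frame];
    *y = cam_pos_y[frame];
    *z = cam_pos_z[frame];
}
```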
Code structure
The demo was developed in pure C (no C++) with a small amount of inline assembly. As is typical for larger C applications, the codebase was split into modules, each consisting of a single header and source file. All public functions in a module are prefixed with the module name.
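A hypothetical module following this convention might look like the following – the "Flames" module and its functions are invented for illustration, they are not from the demo's source:

```c
/* A hypothetical module in the demo's style: one header plus one source
   file, with every public function carrying the Flames_ prefix. */
static int flames_count; /* module-private state, no prefix needed */

int Flames_Init(int count)
{
    flames_count = count;
    return 0; /* 0 on success, matching a typical C convention */
}

int Flames_Count(void)
{
    return flames_count;
}

void Flames_Shutdown(void)
{
    flames_count = 0;
}
```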
The code was designed to be compiled into a single executable with no external dependencies, meaning all textures and other resources were embedded. However, this made the executable over 27MB in size, which was especially significant for electronic distribution over the ISDN and dial-up internet connections common at the time (up to 128kbps). To address this, the executable was made available compressed with UPX, which, thanks to the significant amount of repeated and highly compressible data, reduced the size to around 6MB.
Downloads
Download v1.0.7 Unofficial Hornet build
A 24+ year old rendering bug
Whilst running the last release of the demo (v1.0.6) on modern hardware, a very obvious rendering error occurs with the fire particle systems. The bottom of the particle system is visible but is then cut off by an "invisible" box. This exact effect occurs on hardware from both Intel and Nvidia (AMD not tested), so a vendor-specific driver issue can be largely discounted.

Reviewing the rendering code for the fire particle systems reveals an interesting mechanism. Each fire system is rendered to its own 64x64 texture, which is then rendered as a billboard in the scene.
Where to begin looking? Fortunately the source code has a #define called PARTICLESDEBUG that is commented out. Enabling this renders a solid black quad to the texture before rendering the particles. When recompiled and run this displays the particle systems as intended, albeit with a solid background which fills the billboard.
This suggests a subtle problem with the rendering of the particle system, one that doesn't present on Kyro hardware but does on other hardware.
The code has a fairly typical WinMain function that initialises the application and creates the window and the Direct3D and DirectDraw devices, then loops: processing Windows messages, calling the framework function D3DFERender, and finally swapping the buffers (fullscreen flip or windowed blit).
The D3DFERender function call graph (with some logic) is as follows, where indented functions are called from the previous level function:

if(clearFlags) d3dDevice->Clear() // clear according to config
RenderScene
    Particles_RenderAllTextures // render particle systems to textures
        foreach fire texture {
            Particles_RenderToTexture(fire_system)
                d3dDevice->Clear() // clear color buffer (0x0)
                Particles_Draw
        }
    UpdateScene
        Anim_Update
        Camera_UpdateCamera
        RainCyl_Update
        Cull_CullGrid
        Geometry_Update
        Lights_UpdateLights
        Pool_Update
        Shadows_Update
        Lightning_Update
        Droplets_Update
    d3dDevice->Clear() // clear z-buffer (1.0f) and stencil (0x0)
    Materials_RenderMeshes // render opaque geometry
    // render transparent geometry
    Pool_Render
    Droplets_Draw
    Shadow_RenderShadows
    Lightning_Render
    Particles_DrawAll
        foreach fire system {
            render particle billboards
            render moving systems
        }
    D3DTPrint3DDisplayWindow // render text panels
    Particles_UpdateAll
Of particular note are the three potential calls to the IDirect3DDevice7 Clear method every frame. The first is never actually called, as the clearFlags variable is not initialised with any value.
The next call is made when rendering each particle system to its texture, clearing only the color buffer. Finally, Clear is called again prior to the main rendering calls, where only the depth (z) and stencil buffers are cleared as an optimisation.
When rendering the particle systems, the device state D3DRENDERSTATE_ZENABLE is true but D3DRENDERSTATE_ZWRITEENABLE is false (disabled), and D3DRENDERSTATE_ZFUNC is the default D3DCMP_LESSEQUAL. This means that the particles are rendered with respect to the values in the depth buffer, but never update/overwrite them. This will become important later.
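The effect of these states can be modelled in a few lines: with LESSEQUAL testing enabled but depth writes disabled, a particle pixel survives only if its depth is less than or equal to whatever is already in the buffer, and the buffer is never updated. A minimal simulation of that logic (not the demo's code):

```c
#include <assert.h>

/* Simulate a D3DCMP_LESSEQUAL depth test with z-writes disabled.
   Returns 1 if the fragment passes (would be drawn), 0 otherwise.
   The buffered depth is deliberately never modified. */
static int depth_test_no_write(float fragment_z, const float *buffer_z)
{
    return fragment_z <= *buffer_z ? 1 : 0;
}
```

Against a freshly cleared buffer (1.0f, farthest) any fragment passes; against a stale depth of 0 left over from the previous frame's text panels, every fragment is rejected.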
Another important fact is that whilst each particle system gets its own 64x64 texture to render to, this is only the color buffer. When made the active rendertarget, it will utilise the device-created depth buffer, in this case the 64x64 top-left corner of it.
The final important fact is that the call to D3DTPrint3DDisplayWindow renders the text panels as triangles with a depth value of 0. The range of the Direct3D depth buffer is 0 (nearest) to 1 (farthest), so the text is rendered at the minimum depth value any pixel could possibly have.
As can be seen in the call graph above, because of the order of the Clear calls, the depth buffer will contain the depth values of the preceding frame when rendering the particle systems. In the close-up of a frame below, the depth buffer area used for the particles would be the area in the red rectangle, while the green rectangle represents the area of the text panel, which is depth 0. Therefore it is only inside the red rectangle and outside the green rectangle that the pixels of the particles will compare against whatever non-zero depth value is in that area and potentially be rendered.

As the particles are bright at the bottom (due to the additive blend mode) but fade individually as they move higher, we see the cut-off bottom of the particle system but nothing at the top, where the particles have faded out.
Reading through the source control history in the header comments of each file reveals that this bug was introduced as part of the v1.0.4 release. Prior to this, the clearBuffer variable in the D3DFERender function was set to clear the z-buffer, but this was removed with the comment "No longer clear Z-buffer in D3D shell, as we're doing it in the main code!", meaning the call to Clear in the RenderScene function. Clearly, the call in the particle rendering code was forgotten about.
Reinstating a clear of the z-buffer, in either the D3DFERender function's Clear call or the particle rendering's call to Clear, resolves the problem.
So why wasn't this detected in 2001 when released? It can only be assumed that v1.0.4 and higher weren't tested on hardware from other vendors. By v1.0.3 most of the major issues had been ironed out and review testing/benchmarking done, so minor bug fixes didn't necessarily demand exhaustive testing.
But why didn't it show on Kyro hardware too? On Kyro there is no z-buffer surface in video memory matching the dimensions of the rendertarget, only a hardware z-buffer matching the tile size within the chip*. As each tile is processed there is an implicit (or rather driver-implemented) clear to the last clear value requested by the application before processing and rendering the triangles for that tile. In this case that was 1.0f for depth, so the particles would render correctly to their texture on Kyro.
*The driver can though capture the z-buffer values upon completion of the tile processing and store it in a system memory buffer that does match the rendertarget dimensions.