GeForce 3

The GeForce3 video card was unveiled on February 22, 2001 at Macworld Expo in Tokyo. Anyone could attend remotely, as video of the event was posted on the Internet. John Carmack of id Software, the "living legend" among graphics-engine developers, praised the new chip's capabilities but remarked that it should not have been called "GeForce3": the number of innovations had passed the critical point at which a product deserves a new name rather than the next serial number. Why?
GeForce 3 logo

A brand new chip
Because when developing the GeForce3, nVidia's engineers took a new approach. Previously, performance was improved by brute force: computing power grew, frequencies rose, pipelines and computing units doubled, while the architecture as a whole stayed the same. Only the fillrate increased, that is, the speed at which pixels fill a 3D scene. But since chip manufacturing technology has its limits, nVidia's graphics cards soon fell out of balance: memory bus bandwidth became insufficient, and further performance gains were held back by the low data rate.
This became especially noticeable after the release of the ATI Radeon. It was based on the Rage6C chip, which has half as many 3D pipelines as the GeForce2 (NV15). Yet the Radeon was by no means twice as slow as its opponent, and in some cases came close to it - all because ATI's engineers had focused from the start on improving balance and eliminating bottlenecks in the architecture.
GeForce3 was the first serious attempt to improve the architecture to achieve a better balance. In addition, new blocks have appeared in it, giving developers the opportunity to use many previously inaccessible features.

GeForce 3 Specifications
 
Name: GeForce 3
Core: NV20
Process technology (µm): 0.15
Transistors (million): 60
Core frequency (MHz): 200
Memory frequency (MHz, DDR): 230 (460 effective)
Memory bus and type: 128-bit DDR
Bandwidth (GB/s): 7.3
Pixel pipelines: 4
TMUs per pipeline: 2
Textures per clock: 8
Textures per pass: 4
Vertex pipelines: 1
Pixel Shaders: 1.1
Vertex Shaders: 1.1
Fill rate (Mpix/s): 800
Fill rate (Mtex/s): 1600
DirectX: 8.0
Anti-aliasing (max): MS 4x
Anisotropic filtering (max): 8x
Memory: 64 / 128 MB
Interface: AGP 4x
RAMDAC: 350 MHz

 

The GPU is the Graphics Processing Unit
When the GeForce256 was announced, nVidia claimed it was the world's first GPU. However, unlike a central processing unit (CPU), it offered practically no real programmability. It could, of course, perform a certain set of operations, but that set was hard-wired by the chip's designers rather than defined by programmers. The hardware coordinate-transformation and lighting operations (the T&L block) and texture combining (the NSR rasterizer block) did not always suit the tasks game developers set themselves, which is why full-fledged T&L support remained rare.
With the release of the GeForce3, the situation changed radically. The nfiniteFX technology (from the words Infinite and Effects) provided two new mechanisms - the Vertex Processor and the Pixel Processor - each of which allowed developers, using a set of low-level commands, to create all kinds of special effects, the number of which (nVidia claimed) was infinite.
The Vertex Processor worked at the stage of converting the coordinates of triangle vertices into a form suitable for further processing. With the vertex processor's instruction set, developers could create lighting effects, morphing, keyframe animation, and more. To produce one animation phase, for example, they only had to specify the start and end coordinates, and the GPU would calculate the rest. The example is rather simplified, but the idea, I think, is clear.
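The keyframe idea above can be sketched in a few lines. This is an illustrative Python model of the computation a vertex program would perform per vertex (the function names are mine, not nVidia's), not actual vertex-shader code:

```python
# Keyframe interpolation as the vertex processor would compute it:
# the application supplies start and end vertex positions plus a blend
# factor t, and the hardware works out every in-between position.

def lerp_vertex(start, end, t):
    """Linearly interpolate one vertex between two keyframes (0 <= t <= 1)."""
    return tuple(a + (b - a) * t for a, b in zip(start, end))

def interpolate_mesh(start_verts, end_verts, t):
    """Blend an entire mesh between two key poses."""
    return [lerp_vertex(a, b, t) for a, b in zip(start_verts, end_verts)]

# Halfway between a vertex at the origin and one moved to (2, 4, 0):
frame = interpolate_mesh([(0.0, 0.0, 0.0)], [(2.0, 4.0, 0.0)], 0.5)
print(frame)  # [(1.0, 2.0, 0.0)]
```

The point of moving this to the GPU is that only `t` and the two key poses cross the bus each frame, instead of a full set of recomputed vertices.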
Also, for the first time, a 3D accelerator could work not only with polygons but also with curved surfaces of second and higher order. The developer no longer had to worry about splitting a curved surface into its constituent triangles - the Vertex Processor did this with a specified degree of accuracy.
The Pixel Processor was a further development of NSR (nVidia Shading Rasterizer) technology. The process of applying multiple textures (to obtain lighting or volume effects, say) became fully programmable, allowing up to eight different textures to be combined in the pipeline. In addition, the pixel processor could work with textures that encoded not just color but other surface properties, such as reflectivity or relief. This was especially useful for simulating water: the GeForce3 could render a water surface in real time, as demonstrated in benchmarks optimized for it.
All of the above features were exposed through the DirectX 8 command set, developed specifically for the new generation of 3D accelerators. Video cards without full hardware DirectX 8 support (which at the time meant everything except the GeForce3) could perform vertex operations on the CPU, but pixel operations were unavailable to them. The OpenGL interface was not left out either: all the necessary extensions were soon added to it.

Lightspeed Memory Architecture
Lightspeed Memory Architecture (LMA), another novelty that debuted in the GeForce3, is a set of technologies designed to compensate for the (technologically unavoidable) low performance of the local video memory bus. The main drawback of nVidia's pre-GF3 graphics chips was the imbalance between the graphics core and the memory subsystem. Speed in 16-bit video modes was much higher than in 32-bit ones, because the memory simply could not cope with the load placed on it when delivering huge amounts of data. The GeForce3 introduced several mechanisms to raise memory efficiency.
The Crossbar Memory Controller is a completely new way of organizing the memory bus. The GeForce3 included not one 128-bit controller but four 32-bit ones, each working independently of its neighbors. Why? To optimize memory accesses. If the chip requested two 32-bit blocks that were not adjacent in memory, it received them immediately, from two different controllers; it no longer had to wait for one full 256-bit word to be read and then another.
Such an architecture was especially important when processing highly detailed three-dimensional scenes, where each object consists of many small triangles only a couple of pixels in size. Fetching a texture for two pixels no longer meant pushing large amounts of idle data across the bus.
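A rough way to see the benefit is to count wasted bits per access. This is a simplified model with assumed burst sizes (one full-width burst for the monolithic controller, a quarter-width burst for each narrow one), not a description of the actual NV20 controller:

```python
# Why the crossbar helps small, scattered reads: a controller must move
# its whole burst for every request, so a narrow controller wastes far
# fewer bits when the request itself is tiny.

WIDE_BURST = 256    # bits per access, single 128-bit DDR controller (assumed)
NARROW_BURST = 64   # bits per access, one of four 32-bit DDR controllers (assumed)

def wasted_bits(request_bits, burst):
    """Bits fetched but never used when one request is served in full bursts."""
    bursts = -(-request_bits // burst)   # ceiling division
    return bursts * burst - request_bits

# Two scattered 32-bit texture reads, as for two tiny triangles:
requests = [32, 32]
wide = sum(wasted_bits(r, WIDE_BURST) for r in requests)
narrow = sum(wasted_bits(r, NARROW_BURST) for r in requests)
print(wide, narrow)  # 448 64
```

Under this model the monolithic controller moves 448 useless bits for the two reads, the crossbar only 64 - the imbalance the text describes.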
In addition to the new controller, large caches were used that could satisfy requests from the running pipelines without touching the bus. We will not speculate on the exact cache sizes or the mechanisms of their operation, but their presence is indirectly indicated by the chip's large transistor count.
The Visibility Subsystem is a set of methods for improving the efficiency of working with the Z-buffer. As you know, each object on screen has a third coordinate, Z, which determines its distance from the observer. Objects are drawn on screen in accordance with it, starting with the most distant. Obviously, some of them are completely obscured by others and are entirely invisible, yet the 3D accelerator still spends precious time "honestly" filling them with textures. This phenomenon is called overdraw. The overdraw coefficient in the games of that time ranged from 1.3 to 3.5, depending on the number of objects.
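The overdraw problem is easy to demonstrate with a toy Z-buffer. In this sketch (illustrative Python, not hardware behavior) every submitted fragment is shaded before the depth test, so a pixel covered by three back-to-front surfaces is textured three times even though only one survives:

```python
# Minimal Z-buffer: each fragment is (x, y, depth, color).
# Shading happens for every fragment; the depth test only decides
# which result is kept - that wasted shading work is overdraw.

def render(fragments, width, height):
    zbuf = [[float("inf")] * width for _ in range(height)]
    color = [[None] * width for _ in range(height)]
    shaded = 0
    for x, y, z, col in fragments:   # submitted back-to-front
        shaded += 1                  # texturing/shading cost paid here
        if z < zbuf[y][x]:           # keep only the nearest surface
            zbuf[y][x] = z
            color[y][x] = col
    return color, shaded

# Three surfaces stacked on one pixel: overdraw factor = 3.
frags = [(0, 0, 3.0, "far"), (0, 0, 2.0, "mid"), (0, 0, 1.0, "near")]
img, shaded = render(frags, 1, 1)
print(img[0][0], shaded)  # near 3
```

Occlusion culling aims to skip the shading step for fragments that are guaranteed to fail the depth test, cutting `shaded` toward the number of visible pixels.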
ATI was the first to use Z-buffer optimization, calling the technology HyperZ. Thanks to it, the Radeon could compete with the GeForce2 at high resolutions. The GeForce3 employed the same kind of optimization: Z-Occlusion Culling, a mechanism that discards invisible objects and excludes them from processing.

The Z-buffer compression mechanism used in the Radeon was also implemented in the GeForce3 core. According to the chip's developers, it reduced the amount of data in the buffer by a factor of four.
Fast clearing (Fast Z-clear), however, remained exclusive to the Radeon: nVidia deemed it unnecessary to implement anything similar. Then again, nVidia knows best.

GeForce 3

Even faster anti-aliasing
The most noticeable drawback of images generated by a 3D accelerator is the "aliasing" effect on the edges of the triangles that make up the scene. It is unavoidable: the monitor screen is a rectangular matrix of pixels, so lines drawn on it come out jagged. To smooth object edges, you can either increase the resolution (making the pixels smaller) or apply FSAA (Full Screen Anti-Aliasing), in which the colors of neighboring pixels are averaged across the whole screen. All previous 3D accelerators used supersampling - a head-on solution: the image is rendered into a buffer at a resolution increased several times vertically, horizontally, or both, and the final color of each pixel (sample) is obtained by averaging the colors of neighboring pixels in the enlarged image. Two unpleasant effects result: first, the image is noticeably blurred; second, speed drops sharply, since the 3D accelerator is effectively running at the higher resolution.
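Supersampling in miniature looks like this. The sketch below (illustrative Python, 2x2 ordered-grid downsampling assumed) averages every 2x2 block of a double-resolution buffer into one output pixel, turning a jagged diagonal edge into a gradient:

```python
# 2x supersampling: render at twice the width and height, then
# average each 2x2 block of the high-resolution buffer down to
# one final pixel.

def downsample_2x2(hi):
    """Average every 2x2 block of a grayscale buffer into one pixel."""
    h, w = len(hi), len(hi[0])
    return [[(hi[y][x] + hi[y][x + 1] + hi[y + 1][x] + hi[y + 1][x + 1]) / 4
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]

# A hard diagonal edge between black (0) and white (255):
hi_res = [
    [0,   0,   0,   255],
    [0,   0,   255, 255],
    [0,   255, 255, 255],
    [255, 255, 255, 255],
]
print(downsample_2x2(hi_res))  # [[0.0, 191.25], [191.25, 255.0]]
```

The boundary pixels land on intermediate values, which is exactly the smoothing effect - and exactly why the whole scene must be rendered at 4x the pixel count first.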
The GeForce3 introduced two new mechanisms. The first is an algorithm with the unpronounceable name Quincunx. The essence is to use not a simple 2- or 4-sample mask of neighboring pixels, but a 5-sample one: a 3x3 block is taken, and the resulting value is calculated from five, not nine, of the pixels in that block. As a result, at the cost of 2-sample FSAA you got roughly the quality of 4-sample FSAA.
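The five-tap pattern the article describes can be sketched as a filter over a 3x3 neighborhood: the center pixel plus its four diagonal corners (the classic quincunx arrangement, like the five pips on a die). This sketch assumes equal weights for all five taps, which is a simplification of the hardware's actual weighting:

```python
# Quincunx filtering (simplified): of the 3x3 neighborhood around a
# pixel, only the center and the four diagonal corners contribute.

def quincunx(img, x, y):
    """Average the center pixel with its four diagonal neighbors."""
    taps = [(0, 0), (-1, -1), (1, -1), (-1, 1), (1, 1)]
    vals = [img[y + dy][x + dx] for dx, dy in taps]
    return sum(vals) / len(vals)

img = [
    [10, 10, 10],
    [10, 60, 10],
    [10, 10, 10],
]
print(quincunx(img, 1, 1))  # (60 + 4 * 10) / 5 = 20.0
```

Five taps instead of nine is the source of the bargain: near-4x quality for roughly 2x cost.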
The second algorithm is multisampling, which nVidia called HRAA (High-Resolution Anti-Aliasing). It likewise builds a higher-resolution image, except that the colors of pixels destined to merge into one are not recalculated every time. If the group of pixels to be collapsed lies entirely inside a triangle, it is filled with a single color value; if the group lies on a triangle boundary, it is calculated in the usual way. Thus the amount of computation under HRAA fell several-fold, while quality improved, since textures inside triangles are not blurred.

The GeForce3 was a revolution in the world of video accelerators: many innovations that radically improved the chip's architecture, a wealth of new features for developers, high performance that overclocking could raise a little further, and well-debugged drivers.

Max Payne