GeForce 8800 GTX


After a long period in which computer platforms had improved only gradually, the release of the 8800 series made a real splash. This video card is still widely used, so starting with this series we will talk about video accelerators in the present tense.


The NVIDIA GeForce 8800 can truly be considered the revolutionary platform of its day. An event of this scale does not happen every year; in significance it can be compared only with the release of the Intel Core architecture and of the Microsoft Windows Vista operating system.

NVIDIA GeForce 8800 architecture 


When work on a new generation of 3D architecture began back in the summer of 2002, NVIDIA engineers were given a number of key requirements. In addition to the classic challenge of building a faster GPU with improved image quality, the design had to handle physics effects and intensive floating-point calculations. At the same time, in collaboration with Microsoft, the goal was to give GPU pipelines new capabilities for working with streams and geometry and to define the key functions of the next generation of DirectX (DirectX 10 for Windows Vista), all while delivering maximum performance in applications that use legacy OpenGL, DirectX 9, and earlier versions of DirectX.

The development of the GeForce 8800 architecture resulted in two chips: the high-end GeForce 8800 GTX and the more "modest" GeForce 8800 GTS. The first representatives of the new architecture to appear on store shelves will be video cards based on the GeForce 8800 GTX chip, so today's story is mainly about their features. The GeForce 8800 GTX chips are the industry's first DirectX 10 compliant solutions, and they support Extreme High Definition (XHD) screen resolutions with high performance even in the "heaviest" maximum-quality modes.

An interesting fact should be noted: the announcement of the GeForce 8800 architecture practically coincides with NVIDIA's new market strategy, which is shifting toward the promotion of platform solutions. This does not mean that the new graphics cards will not work with chipsets from other companies, but NVIDIA promises maximum results from platforms that combine GeForce 8800 GTX video cards, including SLI configurations, with the new high-end nForce 600 series chipsets. Simultaneously with the GeForce 8800 graphics, NVIDIA announced the nForce 680i SLI chipset for the Intel platform, and there is little doubt that similar solutions for AMD processors will follow over time.

Perhaps the strongest impression of the new architecture is the unified nature of its pipelines. All theoretical disputes about whether unified pipelines could be implemented in the foreseeable future end today: the GeForce 8800 family has a powerful parallel architecture of unified shaders built from 128 separate, fully independent stream processors with a clock frequency of up to 1.35 GHz. Each stream processor can be dynamically reassigned to vertex, pixel, geometry, or physics operations, which allows peak utilization of GPU resources and balanced, flexible processing of shader tasks.


Let's take a look at the block diagram of the GeForce 8800 GTX chip; we will return to this illustration often today. The single-core design of the GeForce 8800 GTX allows significant performance gains in today's applications and lets the chip scale up the shader operations that will be used most intensively in future games.


To explain the essence of the unified pipeline architecture more clearly, let's first recall how the classic pipeline model works. It processes a stream of shader data (attributes, indices, commands and textures) sent by the central processor to the graphics chip. The main processing stages, namely vertex shaders, pixel shaders, rasterization and the final writing of pixels to the frame buffer, are carried out in a fixed linear sequence. GeForce 7 class chips use many physical pipelines at each main stage, with up to 200 successive pipeline stages for pixel shader processing alone.

Unlike the classic discrete design, the unified pipeline and shader architecture of the GeForce 8800 makes it possible to significantly reduce the number of pipeline stages involved and to change the linear sequence of data processing into a more cyclical one. Incoming data is fed to the input of the unified shader module, the output is written to registers, and it is then fed back to the input of the module for the next processing operation. In the illustration below, the classic pipelines that process discrete shader types are shown in different colors.


The hardware shader modules built into the GeForce 8800 architecture are expected to be in particular demand in DirectX 10 3D games, where the architecture benefits most from balanced load distribution and efficient chip loading. Of course, the unified shader architecture of the GeForce 8800 is also effective with OpenGL and with DirectX 9 and earlier, since no fixed number of unified shaders is reserved for pixel or vertex processing under any API model.

Let's try to visualize the advantages of the unified shader architecture with the following example. Suppose a scene requires intensive geometry rendering, that is, heavy vertex shader processing; in that case performance is limited by the number of vertex units. The scenario below it, which requires more complex processing of lighting effects on water, instead demands more intensive pixel shader work, and here maximum performance is limited by the number of pixel shader processors. In both cases the chip is far from fully loaded and far from using power economically, since part of it sits idle one way or another.


With a unified shader architecture, not only does chip utilization increase, but so does performance, because resources are fully redistributed to whichever task is currently required: the processing of pixel or vertex shaders.
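The difference between the two designs can be sketched with a toy model. The workload figures and the 8/24 split of vertex and pixel units below are assumptions chosen purely for illustration; they do not describe any real chip.

```python
def fixed_pipeline_time(vertex_work, pixel_work, vertex_units, pixel_units):
    """Frame time when each unit type can only run its own kind of shader."""
    # Both groups work in parallel, so the slower group sets the frame time.
    return max(vertex_work / vertex_units, pixel_work / pixel_units)


def unified_pipeline_time(vertex_work, pixel_work, total_units):
    """Frame time when any unit can take any kind of shader work."""
    return (vertex_work + pixel_work) / total_units


def utilization(total_work, total_units, frame_time):
    """Fraction of the available unit-cycles that did useful work."""
    return total_work / (total_units * frame_time)


# Hypothetical workloads in abstract "unit-cycles": (vertex, pixel).
SCENES = {"geometry-heavy": (900, 300), "pixel-heavy": (100, 1100)}
VERTEX_UNITS, PIXEL_UNITS = 8, 24  # assumed fixed split, illustration only
TOTAL_UNITS = VERTEX_UNITS + PIXEL_UNITS

for name, (v, p) in SCENES.items():
    fixed = fixed_pipeline_time(v, p, VERTEX_UNITS, PIXEL_UNITS)
    unified = unified_pipeline_time(v, p, TOTAL_UNITS)
    print(f"{name}: fixed {fixed:.1f} cycles "
          f"({utilization(v + p, TOTAL_UNITS, fixed):.0%} busy), "
          f"unified {unified:.1f} cycles "
          f"({utilization(v + p, TOTAL_UNITS, unified):.0%} busy)")
```

In this model the fixed design is always limited by its busier group, while the unified design spreads the same total work across every unit.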


The unified stream processor (SP) cores of the GeForce 8800 chip are general-purpose processors for floating-point data. They can process geometry, vertex and pixel shaders as well as physics effects, with no distinction between them.

Unified Stream Processor (SP) Architecture

So the long-standing dream of flexibly parallelizing shader operations is realized in the GeForce 8800 architecture with a set of scalar stream processors. They process incoming data streams and generate output streams, which in turn can be passed to other SPs for further processing. Grouped together, these processor engines provide impressive parallel processing power.

The illustration below shows the balanced design of the GeForce 8800 architecture, in which a block of SP stream processors is combined with a cache and with Texture Filtering (TF) and Texture Addressing (TA) blocks. Imagine 128 such universal stream processors grouped into "subsets": that is how many the GeForce 8800 GTX chip contains.

 


Why a scalar architecture? In the early stages of developing the GeForce 8800 architecture, NVIDIA engineers analyzed hundreds of shader programs and concluded that a traditional vector architecture uses computing resources less efficiently than a scalar design of processor modules, especially when processing complex mixed shaders that combine vector and scalar instructions. Moreover, it is quite difficult to compile and process scalar calculations efficiently on vector pipelines. Despite the fact

Traditional vector graphics chips, from both NVIDIA and ATI, implement shaders in hardware with support for dual instruction issue. Modern ATI chips with a "3+1" design can execute either a single four-element vector instruction or a pair consisting of a three-element vector instruction and a scalar instruction. NVIDIA GeForce 6x and GeForce 7x series chips support paired execution of 3+1 and 2+2 instructions, but they too fall well short of the efficiency of the GeForce 8800 architecture, which can keep its scalar modules loaded with scalar instructions at 100% efficiency. Note that vector shader code is converted into scalar operations directly by the GeForce 8800 chip. Thus,

Lumenex module - high quality anti-aliasing, HDR and anisotropic filtering

The NVIDIA Lumenex module, implemented in the GeForce 8800 series chips, takes high-quality anti-aliasing (AA) and anisotropic filtering (AF) to a new level. The new anti-aliasing technology uses both coverage and geometric samples and is called Coverage Sampling Antialiasing (CSAA). It adds four new multisampled anti-aliasing modes for single-GPU video cards: 8x, 8xQ, 16x and 16xQ.

Each of the new AA modes is activated from the NVIDIA driver control panel by selecting the option called Enhance the Application Setting. For the new AA modes to initialize, some level of AA must first be enabled in the game's own settings so that the application correctly allocates and configures its AA rendering surfaces. If the game does not support AA, the user can select the Override Any Applications Setting mode in the NVIDIA driver control panel. This works, but not in every case.


In many games, the new 16x mode will deliver a frame rate comparable to the standard 4x multisampling mode, but with significantly higher picture quality. Below is an example of how CSAA 16x mode compares with standard 4x multisampled AA.

GeForce 8800 series chips support HDR (High Dynamic Range) rendering with up to 128-bit precision: not only the FP16 (64-bit color) mode but also the FP32 (128-bit color) mode, and both can be combined with multisampled anti-aliasing. This makes it possible to achieve realistic lighting and shadow effects while preserving high dynamic range and detail in both the darkest and the brightest objects. The screenshot below of the face of NVIDIA's new virtual top model, Adrianne Curry, is an excellent illustration of the level of realism achieved by the NVIDIA Lumenex engine in GeForce 8800 chips.

The illustration below shows an example of Anisotropic Filtering (AF), which gives greater clarity and sharpness to objects that are viewed at an acute angle and/or recede into the distance. Combined with trilinear mipmapping (textures whose resolution changes with distance), anisotropic filtering reduces distortion and makes the picture much clearer. In the illustration below: on the left, isotropic trilinear mipmapping; on the right, anisotropic trilinear mipmapping.


It should be remembered that anisotropic filtering is very sensitive to memory bus bandwidth, especially at high AF levels. For example, the 16xAF mode means 16 bilinear reads for each of two adjacent mipmap levels (a total of 128 memory accesses), plus the work of computing the final per-pixel texture color. Solutions based on GeForce 8800 chips received a new option in the AF control panel, called Angular LOD Control, which has two modes: Quality and High Quality. Pictured below: GeForce 7 AF (left) versus GeForce 8 with Angular LOD Texture Filtering set to High Quality (right).
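The figure of 128 memory accesses can be reproduced with a short calculation. This is a sketch under one assumption not spelled out in the text: that each bilinear read fetches a 2x2 block of four texels, which is how bilinear filtering is normally defined.

```python
TEXELS_PER_BILINEAR_READ = 4   # assumption: a bilinear sample blends a 2x2 texel block
MIP_LEVELS_FOR_TRILINEAR = 2   # two adjacent mipmap levels, as stated in the text


def af_texel_reads(af_level):
    """Worst-case texel reads per pixel for a given anisotropic filtering level."""
    bilinear_reads = af_level * MIP_LEVELS_FOR_TRILINEAR
    return bilinear_reads * TEXELS_PER_BILINEAR_READ


for level in (2, 4, 8, 16):
    print(f"{level}x AF: up to {af_texel_reads(level)} texel reads per pixel")
```

The count grows linearly with the AF level, which is why the highest levels put the most pressure on memory bandwidth.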


 NVIDIA Quantum Effects Technology - Physical Effects

The new NVIDIA Quantum Effects technology allows you to simulate and render many new physical effects with the new generation GeForce 8800.


The 128 stream processors of the GeForce 8800 GTX provide enough floating-point power for a range of new realistic game effects such as haze, fire and explosions, as well as realistic simulation of moving hair, fur and water. Of course, the most interesting game effects based on the emulation of physical phenomena will appear only after DirectX 10 games are released.

PureVideo and PureVideo HD

NVIDIA PureVideo HD technology, familiar from all modern NVIDIA graphics cards, is also integrated into the GeForce 8800 chips. It provides high-quality, smooth playback of HD video content from HD DVD and Blu-ray media with minimal use of CPU resources. PureVideo HD is a complete hardware and software solution that supports the HD video formats H.264, VC-1, WMV/WMV-HD and MPEG-2 HD. In addition, GeForce 8800 chips support PureVideo technology for standard-definition WMV and MPEG-2 formats. AACS-protected content from Blu-ray or HD DVD media can be played on GeForce 8800-based systems using AACS-compatible players such as those from CyberLink, InterVideo and Nero. All GeForce 8800 cards support HDCP for Blu-ray Disc and HD DVD,

Extreme High Definition Gaming support

All GeForce 8800 cards support Extreme High Definition (XHD) gaming, and games can run in widescreen modes up to 2560x1600, which is described as seven times the picture quality of a 1080i HD TV and twice that of 1080p HD. It should be added that the dual-link DVI interface of the GeForce 8800 GTX card delivers XHD gaming at a resolution of 2560x1600 with high frame rates.
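The comparison with 1080p can be checked by counting pixels. This sketch assumes that "1080p" means a 1920x1080 frame; it does not attempt to reproduce the 1080i comparison, which depends on assumptions the text does not state.

```python
def pixel_count(width, height):
    """Number of pixels in one frame at the given resolution."""
    return width * height


xhd = pixel_count(2560, 1600)      # XHD mode named in the text
full_hd = pixel_count(1920, 1080)  # assumed meaning of "1080p"
ratio = xhd / full_hd

print(f"2560x1600: {xhd:,} pixels")
print(f"1920x1080: {full_hd:,} pixels")
print(f"ratio: {ratio:.2f}x")
```

By pixel count alone, 2560x1600 carries a little under twice as many pixels as a 1920x1080 frame.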

Decoupled shader math, branching, and Early-Z

Texture addressing, fetching and filtering take a certain number of GPU cycles. If a texture must be fetched and filtered before the next rendering operation in a particular shader can proceed, the latency of this process (for example, with 16x AF) can significantly slow down the GPU. The GeForce 8800 architecture includes a mechanism for "hiding" texture fetch latency by executing a number of independent math operations at the same time. In the GeForce 7 pixel pipeline, texture address calculation is interleaved with FP shader math operations in the Shader Unit 1 module; in the GeForce 8800, shader math and texture operations are decoupled and run independently, which removes this problem.

 


Another important aspect that directly affects the overall performance of the graphics system, especially when processing complex DX10 shaders, is the efficiency of branching. Unlike GeForce 7 series chips, which are tuned for typical DirectX 9 shaders, the GeForce 8800 architecture is designed to process complex DX10 shaders and branches at a granularity of 16 pixels (threads), in some cases 32 pixels.

As for the Z-buffer, the GeForce 8800 GTX chips cull pixels by depth four times faster than the GeForce 7900 GTX, so the GPU is able to handle complex situations at the level of each individual pixel. Z-comparisons for each pixel are performed in the raster operations (ROP) module. To increase performance, GeForce 8800 chips support Early-Z technology, which determines the Z-values of pixels before they enter the pixel shader pipeline. This improves performance because a number of obviously unnecessary operations are never performed. An example of Early-Z operation is shown in the figures below.
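The effect of Early-Z can be sketched with a small software model that counts how often the pixel shader runs. This is only an illustration of the idea: the `render` function, the 4x4 target and the two layers are hypothetical and do not model the actual hardware.

```python
def render(fragments, width, height, early_z):
    """Return (depth_buffer, shader_invocations) for a list of (x, y, z) fragments."""
    depth = [[float("inf")] * width for _ in range(height)]
    shaded = 0
    for x, y, z in fragments:
        if early_z and z >= depth[y][x]:
            continue  # hidden fragment rejected before the pixel shader runs
        shaded += 1   # stand-in for the expensive pixel shader
        if z < depth[y][x]:  # late depth test, performed after shading
            depth[y][x] = z
    return depth, shaded


# Two hypothetical full-screen layers on a 4x4 target; the near layer is drawn first.
near = [(x, y, 0.2) for y in range(4) for x in range(4)]
far = [(x, y, 0.8) for y in range(4) for x in range(4)]

depth_early, with_early_z = render(near + far, 4, 4, early_z=True)
depth_late, without_early_z = render(near + far, 4, 4, early_z=False)

print("shader runs with Early-Z:   ", with_early_z)
print("shader runs without Early-Z:", without_early_z)
```

Both runs produce the same depth buffer, but with Early-Z the hidden far layer never reaches the shader. In this model the saving depends on drawing near surfaces first.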

 


GeForce 8800 GTX: design and performance

Let's return to the general block diagram of the GeForce 8800 GTX and look first at the host interface module. It consists of buffers that receive the commands, vertex data and textures sent from the CPU through the graphics driver for GPU processing. The next module is the input assembler block, which collects vertex data from the buffers and converts it to FP32 format, while generating identifiers in parallel to mark repeated operations on vertices and primitives.


The GeForce 8800 GTX chip includes 128 stream processors (SPs), each of which can be assigned to any particular shader operation, while the output of one SP can be redirected to the input of another.

As mentioned at the beginning, each stream processor of the GeForce 8800 GTX chip operates at a clock frequency of 1.35 GHz and can issue a scalar MAD and a scalar MUL operation together, which adds up to a raw performance of about 520 gigaflops (billion FP operations per second). This is a peak figure rather than sustained performance, since 100% efficiency is implied only for purely scalar shaders. Even so, the GeForce 8800 GTX has an advantage over GPUs with vector hardware shaders when processing mixed scalar and vector code, because of the issue restrictions described above (3+1, 2+2, etc.). Texture filtering modules, completely independent of stream processors and operating at the frequency of the GPU core (575 MHz for the GeForce 8800 GTX),

 


Bilinear FP16 texture filtering on the GeForce 8800 runs at 32 pixels per clock (almost 5 times faster than GeForce 7x), and anisotropic FP16 2:1 filtering at 16 pixels per clock. Given the core clock frequency of 575 MHz, a simple calculation shows that the total throughput for bilinearly filtered and 2:1 bilinear anisotropically filtered texels is 575 MHz x 32 = 18.4 billion texels per second.
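The texel-rate arithmetic can be written out as follows. The figures are the ones quoted in the text; the function and variable names are illustrative.

```python
CORE_CLOCK_HZ = 575_000_000  # 575 MHz core clock, from the text


def texel_rate(clock_hz, texels_per_clock):
    """Peak filtered texels per second at a given per-clock throughput."""
    return clock_hz * texels_per_clock


at_32_per_clock = texel_rate(CORE_CLOCK_HZ, 32)
at_16_per_clock = texel_rate(CORE_CLOCK_HZ, 16)  # the FP16 2:1 anisotropic rate

print(f"32 per clock: {at_32_per_clock / 1e9:.1f} billion texels/s")
print(f"16 per clock: {at_16_per_clock / 1e9:.1f} billion texels/s")
```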

The GeForce 8800 GTX chip has six raster operation (ROP) sections. Each section can process 4 pixels (16 sub-pixel samples), for a total of up to 24 pixels per clock with both color and Z-processing. If only Z-processing is performed, the new design reaches up to 192 samples per clock (at one sample per pixel) or 48 pixels per clock (with 4x multisampled anti-aliasing).
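These ROP figures follow from a few multiplications and divisions. The sketch below uses only numbers given in the article (including the 575 MHz core clock); the names are illustrative.

```python
CORE_CLOCK_MHZ = 575
ROP_SECTIONS = 6
PIXELS_PER_SECTION = 4
Z_ONLY_SAMPLES_PER_CLOCK = 192

# Color + Z: every section handles 4 pixels per clock.
pixels_per_clock = ROP_SECTIONS * PIXELS_PER_SECTION

# Peak pixel fill rate in megapixels per second.
fill_rate_mpix = pixels_per_clock * CORE_CLOCK_MHZ


def z_only_pixels_per_clock(samples_per_pixel):
    """Z-only pixel throughput when each pixel carries several samples."""
    return Z_ONLY_SAMPLES_PER_CLOCK // samples_per_pixel


print("pixels per clock (color + Z):", pixels_per_clock)
print("fill rate, Mpix/s:", fill_rate_mpix)
print("Z-only pixels per clock at 4x MSAA:", z_only_pixels_per_clock(4))
```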

The ROP subsystem of the GeForce 8800 supports multisampled, supersampled and transparency adaptive anti-aliasing, while the new anti-aliasing modes (8x, 8xQ, 16x and 16xQ) provide the best quality available on modern single-chip GPUs.

The ROP subsystem also supports frame buffer blending with FP16 and FP32 render targets. Up to eight render targets (MRT, Multiple Render Targets) can be written simultaneously, each in a different color format, and this feature is also supported in DX10.

A key element is the memory controller. The GeForce 8800 GTX chips have six memory controller partitions, each with a 64-bit interface, for a combined memory bus width of 384 bits. This configuration carries 768 MB of high-speed memory, and DDR1, DDR2, DDR3, GDDR3 and GDDR4 memory types are supported. GeForce 8800 GTX video cards are equipped with GDDR3 memory with a default clock speed of 900 MHz (1800 MHz DDR), which at a 384-bit interface width gives a throughput of up to 86.4 GB/s.
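The bandwidth figure can be derived from the bus width and the memory clock. A minimal sketch using the numbers from the text, with illustrative names:

```python
MEMORY_PARTITIONS = 6
BITS_PER_PARTITION = 64
MEMORY_CLOCK_HZ = 900_000_000  # 900 MHz GDDR3
TRANSFERS_PER_CLOCK = 2        # double data rate: 1800 MHz effective

bus_width_bits = MEMORY_PARTITIONS * BITS_PER_PARTITION

# Bytes per second = transfers per second * bytes moved per transfer.
bytes_per_second = MEMORY_CLOCK_HZ * TRANSFERS_PER_CLOCK * bus_width_bits // 8

print("bus width:", bus_width_bits, "bits")
print(f"peak bandwidth: {bytes_per_second / 1e9:.1f} GB/s")
```

The division by 8 converts bits to bytes, which is why the result is expressed in gigabytes rather than gigabits per second.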

In this article we will not dwell on the DirectX 10 Shader Model 4 specifications or on their differences from DirectX 9 Shader Model 3; a separate article on our site, and most likely more than one, will be devoted to that. For today it is enough to note that the unified shader architecture of the GeForce 8800 is fully compatible with DirectX 10 and also delivers excellent results in modern DirectX 9 and OpenGL games. Let's briefly go over the key DirectX 10 technologies implemented in the GeForce 8800.


NVIDIA GeForce 8800 GTX specifications

Name: GeForce 8800 GTX
Core: G80
Process technology (µm): 0.09
Transistors (million): 681
Core frequency (MHz): 575
Memory frequency (MHz): 900 (1800 DDR)
Bus and memory type: GDDR3, 384-bit
Bandwidth (GB/s): 86.4
Unified shader units: 128
Unified shader unit frequency (MHz): 1350
TMUs (total): 32
ROPs: 24
Textures per clock: 32
Textures per pass: 32
Shader Model: 4.0
Fill rate (Mpix/s): 13800
Fill rate (Mtex/s): 18400
DirectX: 10.0
Anti-aliasing (max): SS & MS, 16x
Anisotropic filtering (max): 16x
Memory (MB): 768
Interface: PCI-E
RAMDAC (MHz): 2x400
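The shader figures in these specifications tie back to the raw performance of about 520 gigaflops quoted in the text. The sketch below assumes the usual counting convention, which the article does not spell out: a MAD (multiply-add) counts as two floating-point operations and a MUL as one.

```python
STREAM_PROCESSORS = 128
SHADER_CLOCK_MHZ = 1350

FLOPS_PER_MAD = 2  # assumption: multiply + add counted as two operations
FLOPS_PER_MUL = 1  # assumption: a single multiply counted as one operation

flops_per_sp_per_clock = FLOPS_PER_MAD + FLOPS_PER_MUL

# Peak throughput in MFLOPS, then converted to GFLOPS.
peak_mflops = STREAM_PROCESSORS * SHADER_CLOCK_MHZ * flops_per_sp_per_clock
peak_gflops = peak_mflops / 1000

print(f"peak shader throughput: {peak_gflops:.1f} GFLOPS")
```

Under these assumptions the result rounds to the "about 520 gigaflops" figure, which is a theoretical peak for purely scalar code.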

 

Stream Output, implemented in the GeForce 8800, is an important feature of DirectX 10. It allows data generated by geometry (or vertex) shaders to be written to memory buffers and then read back into the front of the GPU pipeline for further processing.

By the way, the previously mentioned DirectX 10 geometry shaders, which the GeForce 8800 supports in hardware, are ultimately intended to improve the quality of animation and the realism of facial expressions, to simulate fluid physical processes, and to handle many other geometric operations.


In general, the GeForce 8800 series chips combined with the DX10 API offer a wide range of capabilities for processing game objects in batches, as well as for building massive scenes by rendering a large number of dynamic objects with a single driver call (Geometry Instancing).

An excellent example of the new DirectX 10 capabilities of the GeForce 8800 family is the way realistic hair is now created and animated. Under DirectX 9, the physical simulation of "natural" hair was handled mainly by the CPU, as were the interpolation and tessellation of the control points of the physical model. With DirectX 10, the physical simulation of the hair is left to the GPU, while the interpolation and tessellation of control points are handled by the geometry shader.


In fact, the GeForce 8800 GTX 3D accelerator not only opened a new era in 3D graphics processing but also set a new benchmark for the other components of a modern PC. The performance of 3D graphics accelerators pulls further ahead of central processors every year, and this inevitably affects overall system performance. Any powerful 3D accelerator needs a powerful central processor, and in the case of the GeForce 8800 GTX the CPU needed to be at least on par with the top-end processor of the day, the Intel QX6700, and even that could easily prove insufficient to feed data to such a powerful video subsystem.

The GeForce 8800 GTX could rightfully be called the fastest single video card in the world at that time. In most test applications it won convincingly over its direct competitor from ATI, and for a long time it remained the top solution for enthusiasts.