
GeForce 6800 Ultra

In April 2004, NVIDIA introduced the new NV40 GPU. Its predecessor, the NV30, which was supposed to help the company crush ATI Technologies, then dominant in the DirectX 9 sector, had failed to put NVIDIA in the lead. The architecture nurtured in NVIDIA's laboratories exceeded the requirements of the DirectX 9 standard, but it proved very costly in transistor count and very vulnerable in the form in which it first saw the light, the GeForce FX 5800 Ultra. The shortcomings of the GPU architecture, together with the reliance on high-frequency GDDR2 memory and a 128-bit memory bus, meant that NVIDIA's new product trailed ATI's RADEON 9700 PRO by a wide margin in many tasks.

NVIDIA quickly corrected its mistake by releasing an improved version of the NV30, the NV35, which proved much more viable. Nevertheless, the sting of the NV30 defeat was great: the company stopped production of the NV30 after only a few tens of thousands of chips, and later removed all mention of the NV30 from its corporate website altogether.

A year passed. During this time the company released a number of fairly successful products, and the NV35 was replaced by the NV38. The Detonator driver line was retired in favor of ForceWare drivers, whose shader code compiler significantly improved the performance of NVIDIA video adapters in games using DirectX 9 pixel shaders. Even so, the best choice for enthusiasts remained video adapters based on ATI Technologies GPUs: first the RADEON 9800 PRO, then the RADEON 9800 XT.

Of course, NVIDIA's engineering team did not sit idly by all this time; the company was preparing to take revenge on its competitor. Rumors about the monster NV40 GPU began to appear long before it was first embodied in silicon: sixteen rendering pipelines, support for pixel and vertex shaders version 3.0, and other innovations. The circulating rumors were very diverse, up to "reliable information" that the new product would use GDDR3 memory operating at 1600 MHz, with a core frequency of up to 600 MHz.

This time the company acted more cautiously and thoroughly, trying its best to avoid a repeat of the NV30 situation. Notably, it decided to drop the "FX" letters entirely from the names of future NV40-based video adapters: they became the GeForce 6800 Ultra and GeForce 6800.


When developing the next-generation GPU, NVIDIA could not ignore its previous experience: a situation in which a competitor's GPUs outperform its own in almost every price category, despite NVIDIA's formal technological superiority, did not suit the company at all. Therefore, in addition to further expanding functionality and introducing Shader Model 3.0 support, the closest attention during the creation of the NV40 was paid to improving performance and, especially, to strengthening the traditionally weak points of the CineFX architecture.

NVIDIA GeForce 6800/6800 Ultra Pixel Pipelines
With the NV40 and the CineFX 3.0 architecture, one could not speak of pixel pipelines in the traditional sense. Starting with the NV30 and the CineFX architecture, NVIDIA's GPUs had, instead of several "independent" pixel pipelines, one "wide" pixel pipeline in which several pixels were processed at the same time. The NV40 inherited the NV3x architecture, but of course with significant improvements and additions.

The NV40 pixel pipeline was "widened" by a factor of four compared to the NV35 pipeline: now 16 pixels could be processed simultaneously, for a maximum output rate of 16 pixels per clock. With more than one texture applied, pixel output slowed; with two textures, for example, processing the same 16 pixels took two clocks, i.e. 8 pixels per clock. When working only with the Z-buffer or stencil buffer, the NV40 pipeline, like the NV35's, "accelerated", and the fourfold overall scaling applied here too: the GPU could now output a maximum of 32 Z values per clock. So the NV40 had plenty of brute force: we can safely speak of a fourfold increase in fill rate compared to the NV35.
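The fill-rate arithmetic above can be sketched in a few lines (a sketch only; the 400 MHz core clock is taken from the spec table at the end of this article):

```python
CORE_MHZ = 400      # GeForce 6800 Ultra core clock
PIXEL_PIPES = 16    # pixels processed per clock in NV40's "wide" pipeline

def pixel_fill_rate(textures_per_pixel=1):
    """Pixel output rate in Mpix/s.

    With one texture, all 16 pipes emit a pixel every clock; with N
    textures, the same 16 pixels occupy the pipeline for N clocks.
    """
    return CORE_MHZ * PIXEL_PIPES // max(1, textures_per_pixel)

print(pixel_fill_rate(1))   # 6400 Mpix/s, matching the spec table
print(pixel_fill_rate(2))   # 3200 Mpix/s (8 pixels per clock)
print(CORE_MHZ * 32)        # Z/stencil-only mode: 32 values per clock
```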

Along with the "expansion" of the pixel pipeline, NVIDIA also increased the processing power of the pixel processor. Firstly, the number of available temporary registers, the weakest point of the GeForce FX pixel processors owing to the structure of the NV3x and NV40 pixel pipeline, was increased; now complex pixel shaders running at full 32-bit precision should no longer bring the pixel processor to its knees.
Secondly, the number of full-fledged ALUs (arithmetic logic units) operating on pixel components was apparently doubled in the NV40. More precisely, the two types of NV35 FP ALUs, "full" and "simplified" (which had themselves replaced the NV30's integer ALUs), became uniformly full ALUs that perform operations of any complexity at the same speed. Here is how NVIDIA showed the advantage of CineFX 3.0, with twice the number of ALUs, over "traditional" architectures:


So, the NV40's ALUs could perform up to 8 operations on the components of one pixel per clock, and since the NV40 pipeline processed 16 pixels simultaneously, the chip in total carried 32 full floating-point ALUs capable of up to 128 operations on pixel components per clock.
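The totals quoted above follow directly from the per-pixel figures; a quick sanity check:

```python
PIXELS_PER_CLOCK = 16   # NV40 "wide" pipeline width
ALUS_PER_PIXEL = 2      # two full FP ALUs per pixel in CineFX 3.0
COMPONENTS = 4          # each ALU operates on up to 4 pixel components (RGBA)

ops_per_pixel = ALUS_PER_PIXEL * COMPONENTS       # 8 component ops per pixel
total_alus = PIXELS_PER_CLOCK * ALUS_PER_PIXEL    # 32 FP ALUs on the chip
ops_per_clock = total_alus * COMPONENTS           # 128 component ops per clock

print(ops_per_pixel, total_alus, ops_per_clock)   # 8 32 128
```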

Pixel Shaders 3.0
Support for Pixel Shader Model 3.0, introduced in the NV40, meant first of all support for dynamic loops and branches in pixel shaders. The decision about which branch of a shader to execute for a given pixel could now be made during shader execution: the variables controlling the flow of the shader could change at run time rather than being predetermined constants, as with static branches and loops.
Obviously, when version 2.0 shaders were executed, the new NV40 functionality did not manifest itself in any way; only the speed of the pixel processor mattered there.
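The static-versus-dynamic distinction can be illustrated with a toy emulation (plain Python, not real shader code; the lighting terms are invented for the example):

```python
# Static branching (SM 2.0 style): the branch condition is a constant fixed
# for the whole draw call, so every pixel takes the same path.
def sm2_shader(pixel, use_specular):
    color = pixel["diffuse"]
    if use_specular:                 # same decision for all pixels in the pass
        color += pixel["specular"]
    return color

# Dynamic branching (SM 3.0 style): the condition depends on a value
# computed per pixel during shader execution.
def sm3_shader(pixel):
    color = pixel["diffuse"]
    if pixel["n_dot_l"] > 0.0:       # decided individually for each pixel
        color += pixel["specular"]
    return color

lit   = {"diffuse": 0.3, "specular": 0.5, "n_dot_l": 0.8}
unlit = {"diffuse": 0.3, "specular": 0.5, "n_dot_l": -0.2}
print(sm3_shader(lit), sm3_shader(unlit))   # lit gets specular, unlit does not
```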

NVIDIA HPDR - images become more realistic
Previous-generation NVIDIA GPUs could not output data from the pixel shader to multiple buffers simultaneously (Multiple Render Targets) or render to a floating-point buffer (FP Render Target). ATI's chips had supported these functions from the start, distinguishing them favorably from NVIDIA's GPUs.

The NV40 finally brought full support for both Multiple Render Targets and FP Render Targets, which allowed the company's marketers to coin a new term: NVIDIA HPDR. Behind this abbreviation, which stands for High-Precision Dynamic-Range, lies the ability to build scenes with a high dynamic range of illumination (HDRI, High Dynamic Range Imaging).

NVIDIA used the 16-bit OpenEXR format developed by Industrial Light & Magic (ILM). In the 16-bit OpenEXR ("half") representation, one bit is assigned to the sign, five bits to the exponent, and ten bits to the mantissa. The dynamic range of this representation spans about nine orders of magnitude, from roughly 6.1*10^-5 to 6.55*10^4.
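The bit layout described above can be decoded by hand; the sketch below is an illustrative decoder (Python's `struct` module understands the same half-precision layout natively via the `"e"` format character, which we use as a cross-check):

```python
import struct

def decode_half(bits):
    """Decode a 16-bit half-float: 1 sign bit, 5 exponent bits, 10 mantissa bits."""
    sign = (bits >> 15) & 0x1
    exp  = (bits >> 10) & 0x1F
    man  = bits & 0x3FF
    if exp == 0:                      # subnormal numbers
        value = man * 2.0 ** -24
    elif exp == 0x1F:                 # infinity / NaN
        value = float("inf") if man == 0 else float("nan")
    else:                             # normal numbers: implicit leading 1
        value = (1 + man / 1024.0) * 2.0 ** (exp - 15)
    return -value if sign else value

# Largest normal half: exponent 30, mantissa all ones -> 65504 (~6.55e4)
print(decode_half(0x7BFF))
# Smallest positive normal half: 2^-14 (~6.1e-5)
print(decode_half(0x0400))
# Cross-check against the standard library's half-precision codec
print(struct.unpack("<e", (0x7BFF).to_bytes(2, "little"))[0])
```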

Serious Sam 2 HDR


The process of building and displaying an HDR image on the NV40 GPU was divided into three phases:
Light Transport - calculating the scene with a high dynamic range of illumination and storing the lighting information for each pixel in a buffer using the floating-point OpenEXR data format. NVIDIA emphasized that the NV40 supported floating-point data at every stage of building an HDR scene, which guaranteed minimal loss of accuracy:
- calculations in floating-point shaders,
- floating-point texture filtering,
- operations with buffers that use a floating-point data representation.
Tone Mapping - converting the high-dynamic-range image to a low-dynamic-range format (RGBA or sRGB).
Color and Gamma Correction - converting the image into the color space of the display device: a CRT or LCD monitor, etc.
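The last two phases can be sketched in a few lines. The sketch below uses Reinhard's global operator, a common textbook choice for tone mapping, not NVIDIA's specific pipeline:

```python
def tone_map(luminance):
    """Reinhard operator: compress HDR luminance into the [0, 1) range."""
    return luminance / (1.0 + luminance)

def gamma_correct(c, gamma=2.2):
    """Encode a linear [0, 1] color value for a typical CRT/LCD display."""
    return c ** (1.0 / gamma)

# HDR pixel luminances spanning several orders of magnitude,
# from a dim interior to direct sunlight:
for lum in (0.05, 1.0, 50.0, 5000.0):
    ldr = tone_map(lum)
    print(f"{lum:8.2f} -> tone-mapped {ldr:.4f} -> display {gamma_correct(ldr):.4f}")
```

Note how the operator preserves detail at the dark end while smoothly compressing arbitrarily bright values into the displayable range, which is exactly what a fixed 8-bit-per-channel pipeline cannot do.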

So, with the advent of the NV40 and HPDR technology, high-dynamic-range imaging, one step closer to photorealistic graphics in game worlds, became available not only to owners of ATI video cards but also to NVIDIA fans.

Vertex Pipelines, Vertex Shaders 3.0
Having strengthened the NV40's pixel processor, NVIDIA did not forget the "geometric muscle" of the new GeForces. The new chip carried twice as many vertex pipelines: six versus three in the GeForce FX 5950 Ultra. New games featured ever more complex models and scenes with growing polygon counts, so the doubled peak geometry performance of the new GPUs did not go unclaimed.
Along with performance, the functionality of the NV40's vertex processors also grew: NVIDIA announced full support for vertex Shader Model 3.0. As with pixel shaders, vertex shader length was now practically unlimited (bounded only by the DirectX Shader Model 3.0 specification), and shaders could contain truly dynamic branches and loops, with the decision about which code to execute for a particular vertex made during shader execution rather than at compilation.

Vertex Stream Frequency Divider
Another interesting feature of the NV40 vertex processors. Using this "frequency divider", the vertex processors could read data from streams and update the input parameters of the vertex shader not for every processed vertex, as before, but less often, at an adjustable frequency.

NVIDIA gave an example of this feature in use: by reading data (for example, animation data) from a stream at a certain frequency, you can take a single data set defining the geometry of a model, say a soldier, and create a whole army of soldiers that are not identical "clones": each differs from the others, with a unique appearance and unique animation.
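The soldier example above boils down to reading two streams at different rates. The sketch below models that idea in plain Python; the function and data names are illustrative, not a real graphics API:

```python
def assemble_vertices(mesh, instance_data, verts_per_instance):
    """Pair each vertex with instance data that advances 1/N as often.

    The mesh stream is read for every vertex; the instance stream's
    'frequency divider' makes it advance only once per verts_per_instance
    vertices, so one model is drawn once per instance record.
    """
    for i in range(len(mesh) * len(instance_data)):
        vertex = mesh[i % len(mesh)]                        # every vertex
        instance = instance_data[i // verts_per_instance]   # divided rate
        yield vertex, instance

soldier_mesh = ["v0", "v1", "v2"]        # one shared model's vertices
army = [{"pos": (0, 0)}, {"pos": (5, 0)}, {"pos": (10, 0)}]

for v, inst in assemble_vertices(soldier_mesh, army, len(soldier_mesh)):
    print(v, inst["pos"])                # 9 vertices, 3 distinct soldiers
```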

UltraShadow II
First announced with the NV35 GPU, UltraShadow technology, now bearing the index "II", carried over to the NV40. The essence of the technology was unchanged: when computing dynamic shadows using the stencil buffer, it was possible to specify boundary Z values (depth bounds) beyond which shadows from a light source are not calculated. This saved computation and improved performance in scenes using real-time shadow calculation.
NVIDIA illustrated UltraShadow II with a diagram showing the boundary values (zmin and zmax) beyond which the stencil buffer is not computed:


The ability to set boundary conditions for shadow calculation, coupled with the NV40's already-mentioned ability to "accelerate" when filling the stencil and Z-buffers (outputting 32 values per clock instead of 16), promised a serious advantage over competitors in games making heavy use of stencil-based dynamic shadows.
An example of a game that uses dynamic shadow calculation, and whose gameplay is literally "built" on shadows, is Doom 3.

Programmable video processor
The home computer had long been positioned as a universal entertainment platform, so there was little point for developers in segmenting GPUs intended for personal computers by classes of typical tasks: the entire line of chips was endowed with the fullest possible set of functions.
The NV40 carried a programmable video processor designed to encode and decode video streams and perform various operations on them. S3 had announced something similar earlier with its DeltaChrome VPU, and ATI RADEON 9500/9600/9700/9800 video adapters could also decode video using the power of their pixel processors. The NV40 video processor, NVIDIA VP, had the following features:
Adaptive deinterlacing
High-quality scaling and filtering
Block artifact removal
Built-in TV encoder
Color space conversion
Frame rate conversion
Gamma correction
Noise reduction
HDTV support (720p, 1080i, 480p, CGMS modes)
Hardware audio/video synchronization
MPEG-1/2/4 encoding and decoding support
WMV9/H.264 decoding support

Full-screen anti-aliasing: a new level of quality
One of the significant differences between ATI's R3x0 family and NVIDIA's GeForce FX chips was their approach to full-screen anti-aliasing. The GeForce FX supported multisampling, supersampling, and combinations thereof with the traditional subpixel arrangement on an ordered orthogonal grid (Ordered Grid), while ATI's GPUs used multisampling with subpixels arranged on a rotated grid.


At the same cost, full-screen anti-aliasing with subpixels arranged on a rotated grid provided noticeably higher-quality smoothing of polygon edges than the method using the traditional subpixel arrangement.
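The reason is easy to demonstrate numerically. In the sketch below (sample positions are illustrative examples of the two grid types, not the exact hardware patterns), a rotated 4x grid places its subpixels at four distinct heights, so a near-horizontal edge sweeping through a pixel produces five coverage levels, while an ordered 2x2 grid has only two distinct heights and thus three levels:

```python
# Four subpixel positions (x, y) inside a unit pixel for each grid type.
ordered = [(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75)]
rotated = [(0.375, 0.125), (0.875, 0.375), (0.125, 0.625), (0.625, 0.875)]

def coverage_levels(samples, steps=64):
    """Distinct coverage counts as a horizontal edge sweeps down the pixel."""
    levels = set()
    for i in range(steps + 1):
        edge_y = i / steps                      # edge position within pixel
        levels.add(sum(1 for _, y in samples if y < edge_y))
    return sorted(levels)

print(coverage_levels(ordered))   # [0, 2, 4]: only 3 gradation steps
print(coverage_levels(rotated))   # [0, 1, 2, 3, 4]: 5 gradation steps
```

More gradation steps along near-horizontal and near-vertical edges, the worst case for aliasing, is exactly where the rotated grid earns its quality advantage at the same sample count.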

After the failure of the NV30, NVIDIA made every effort to regain lost ground, and succeeded with the release of the NV40, also known as the GeForce 6800. The card was very capable and much more productive than the FX 5900, due in part to its considerable transistor count (222 million). The NV45, also sold as a GeForce 6800, was essentially an NV40 with an AGP-to-PCI Express bridge, which let the card support the new interface standard and, in addition, SLI. SLI technology allowed two PCI Express GeForce 6 graphics cards to be combined for increased performance. The chip contained 222 million transistors and was made on a 0.13-micron process. The power requirements were colossal for the time: a power supply rated at 480 W or higher was recommended.

Specifications NVIDIA GeForce 6800 Ultra


Name: GeForce 6800 Ultra
Core: NV40/NV45
Process technology (µm): 0.13
Transistors (million): 222
Core frequency (MHz): 400
Memory frequency (MHz, DDR): 550 (1100)
Memory type and bus: GDDR3, 256-bit
Bandwidth (GB/s): 35.2
Pixel pipelines: 16
TMUs per pipeline: 1
Textures per clock: 16
Textures per pass: 16
Vertex pipelines: 6
Pixel Shaders: 3.0
Vertex Shaders: 3.0
Fill Rate (Mpix/s): 6400
Fill Rate (Mtex/s): 6400
DirectX: 9.0c
Anti-Aliasing (max): SS&MS, 8x
Anisotropic Filtering (max): 16x
Memory (MB): 256
Interface: AGP/PCI-E
RAMDAC (MHz): 2x400


With this release, NVIDIA finally escaped the painful swamp of the NV30 line and delivered a truly advanced graphics accelerator to make the heart of any NVIDIA gamer beat faster. The NV40, by tradition, became the ancestor of a new family of NVIDIA graphics chips. Having built strong relationships with game and software developers while adapting the NV30 architecture, the company designed its next GPU, the NV40, not "at random" but with an eye to developers' requirements and wishes. Its flagship, the GeForce 6800 Ultra, was the fastest gaming accelerator of its day, able to literally tear through the games of that time and those released later.

Far Cry