
Deferred Shading is a rendering technique (as opposed to regular forward rendering) which has become popular over the last few years. Games like the upcoming Battlefield 3, Crysis 2 and numerous others use it, so I wanted to give it a try and implement a simple deferred renderer myself in DirectX 10.

GitHub Repository and Visual Studio 2010 files:
GitHub Repository here

Images for all the render targets/textures: Diffuse, View-space Normals, Depth, Ambient Occlusion, Composite with Ambient Occlusion, Composite without AO.

What is Deferred Shading?

If you are reading this chances are you already know what deferred shading is and how it works, so I won’t go into too much detail. Wikipedia has a good overview and there are other places which can describe it better.

Basically, deferred shading involves rendering your view-space normals, depth buffer, specular/diffuse, ambient occlusion, etc. into separate textures and then using them in a later pass to create the final composite image (by rendering a full-screen quad). There are several advantages and some disadvantages (increased memory bandwidth due to multiple render targets, difficulty handling transparency and anti-aliasing, etc.)


I started off by modifying an existing DirectX 10 project. "Tutorial10" from Microsoft's DirectX SDK Samples was used as a starting point, which uses the DXUT library to handle setup and input (think GLUT for OpenGL). My scene itself is simple (just one character mesh) as I mainly wanted to focus on the rendering part (I might add a more complex scene later). The steps I'm going to cover are:

  1. Setup the Multiple Render Targets and full-screen quad.
  2. Call simple pass (which uses the geometry shader) to render view-space normals and depth into a texture.
  3. Call pass to generate the AO texture from the previously generated normals and depth.
  4. Gaussian Blur passes to smooth out the noise from the AO texture.
  5. View Texture (either the composite lit one or one of the others)

1. Setting up Render Targets:

I’ll just show how to setup the multiple render targets for the first pass, and then you can figure out the setup for the AO textures. First we declare all the variables we need:

// The Multiple Render Targets
ID3D10Texture2D*                    _mrtTex;     // Texture array for the MRTs
ID3D10RenderTargetView*             _mrtRTV;     // Render target view for the MRTs
ID3D10ShaderResourceView*           _mrtSRV;     // Shader resource view for the MRTs
ID3D10EffectShaderResourceVariable* _mrtTextureVariable = NULL; // for sending the MRTs to the shader
ID3D10Texture2D*                    _mrtMapDepth;// Depth texture
ID3D10DepthStencilView*             _mrtDSV;     // Depth stencil view

These are the minimum variables you need. You need the textures (ID3D10Texture2D), the RenderTargetViews, the ShaderResourceViews and the EffectShaderResourceVariable (to send in the texture to the shader). You also need a DepthStencilView for the depth.

The setup is done in the SetupMRTs(..) function:

// Sets up the Multiple Render Targets
HRESULT SetupMRTs(ID3D10Device* pd3dDevice) {
    HRESULT hr;

    // Create the depth stencil texture (one slice per render target).
    D3D10_TEXTURE2D_DESC dstex;
    ZeroMemory( &dstex, sizeof(dstex) );
    dstex.Width = _width * TEXSCALE;
    dstex.Height = _height * TEXSCALE;
    dstex.MipLevels = 1;
    dstex.ArraySize = NUMRTS;
    dstex.SampleDesc.Count = 1;
    dstex.SampleDesc.Quality = 0;
    dstex.Format = DXGI_FORMAT_D32_FLOAT;
    dstex.Usage = D3D10_USAGE_DEFAULT;
    dstex.BindFlags = D3D10_BIND_DEPTH_STENCIL;
    dstex.CPUAccessFlags = 0;

    _mrtMapDepth = NULL;
    V_RETURN( pd3dDevice->CreateTexture2D( &dstex, NULL, &_mrtMapDepth ) );

    // Create the depth stencil view for the MRTs
    D3D10_DEPTH_STENCIL_VIEW_DESC DescDS;
    ZeroMemory( &DescDS, sizeof(DescDS) );
    DescDS.Format = dstex.Format;
    DescDS.ViewDimension = D3D10_DSV_DIMENSION_TEXTURE2DARRAY;
    DescDS.Texture2DArray.FirstArraySlice = 0;
    DescDS.Texture2DArray.ArraySize = NUMRTS;
    DescDS.Texture2DArray.MipSlice = 0;

    _mrtDSV = NULL;
    V_RETURN( pd3dDevice->CreateDepthStencilView( _mrtMapDepth, &DescDS, &_mrtDSV ) );

    // Create the render target texture array
    // (bindable as both render target and shader resource)
    dstex.Format = DXGI_FORMAT_R16G16B16A16_UNORM;
    dstex.BindFlags = D3D10_BIND_RENDER_TARGET | D3D10_BIND_SHADER_RESOURCE;
    _mrtTex = NULL;
    V_RETURN( pd3dDevice->CreateTexture2D( &dstex, NULL, &_mrtTex ) );

    // Create one render target view over the whole array
    D3D10_RENDER_TARGET_VIEW_DESC DescRT;
    ZeroMemory( &DescRT, sizeof(DescRT) );
    DescRT.Format = dstex.Format;
    DescRT.ViewDimension = D3D10_RTV_DIMENSION_TEXTURE2DARRAY;
    DescRT.Texture2DArray.FirstArraySlice = 0;
    DescRT.Texture2DArray.ArraySize = NUMRTS;
    DescRT.Texture2DArray.MipSlice = 0;
    V_RETURN( pd3dDevice->CreateRenderTargetView( _mrtTex, &DescRT, &_mrtRTV ) );

    // Create the shader resource view for the texture array
    D3D10_SHADER_RESOURCE_VIEW_DESC SRVDesc;
    ZeroMemory( &SRVDesc, sizeof( SRVDesc ) );
    SRVDesc.Format = dstex.Format;
    SRVDesc.ViewDimension = D3D10_SRV_DIMENSION_TEXTURE2DARRAY;
    SRVDesc.Texture2DArray.ArraySize = NUMRTS;
    SRVDesc.Texture2DArray.FirstArraySlice = 0;
    SRVDesc.Texture2DArray.MipLevels = 1;
    SRVDesc.Texture2DArray.MostDetailedMip = 0;
    _mrtSRV = NULL;
    V_RETURN( pd3dDevice->CreateShaderResourceView( _mrtTex, &SRVDesc, &_mrtSRV ) );

    return S_OK;
}
I know the above might look overwhelming at first, but it's the standard way of creating a render target in DX10. The first thing we do is create the depth texture and its Depth Stencil View, making sure we specify the ArraySize (the number of render targets). Also make sure the ViewDimension is a TEXTURE2DARRAY dimension (D3D10_DSV_DIMENSION_TEXTURE2DARRAY for the depth stencil view, D3D10_RTV_DIMENSION_TEXTURE2DARRAY for the render target view), since we have an array of textures (in HLSL the variable will be declared as Texture2DArray). If we were using just one render target we would use the ..._TEXTURE2D dimensions instead (as for the Ambient Occlusion textures).

Next we create the textures themselves, the RenderTargetView and the ShaderResourceView. Make sure the BindFlags are D3D10_BIND_RENDER_TARGET | D3D10_BIND_SHADER_RESOURCE, indicating that the textures will be used both as render targets and as shader resources.
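For the single-target AO texture the setup follows the same pattern minus the array machinery. Here is a sketch, assuming it reuses the MRT dimensions and format from the code above (the `_ao*` variable names are hypothetical, not taken from the actual project):

```cpp
// Single render target for the AO pass: Texture2D instead of Texture2DArray.
D3D10_TEXTURE2D_DESC aoDesc;
ZeroMemory( &aoDesc, sizeof(aoDesc) );
aoDesc.Width = _width * TEXSCALE;
aoDesc.Height = _height * TEXSCALE;
aoDesc.MipLevels = 1;
aoDesc.ArraySize = 1;                              // just one slice
aoDesc.SampleDesc.Count = 1;
aoDesc.Format = DXGI_FORMAT_R16G16B16A16_UNORM;
aoDesc.Usage = D3D10_USAGE_DEFAULT;
aoDesc.BindFlags = D3D10_BIND_RENDER_TARGET | D3D10_BIND_SHADER_RESOURCE;
V_RETURN( pd3dDevice->CreateTexture2D( &aoDesc, NULL, &_aoTex ) );

// Note the TEXTURE2D (not TEXTURE2DARRAY) view dimensions.
D3D10_RENDER_TARGET_VIEW_DESC aoRTDesc;
ZeroMemory( &aoRTDesc, sizeof(aoRTDesc) );
aoRTDesc.Format = aoDesc.Format;
aoRTDesc.ViewDimension = D3D10_RTV_DIMENSION_TEXTURE2D;
aoRTDesc.Texture2D.MipSlice = 0;
V_RETURN( pd3dDevice->CreateRenderTargetView( _aoTex, &aoRTDesc, &_aoRTV ) );

D3D10_SHADER_RESOURCE_VIEW_DESC aoSRDesc;
ZeroMemory( &aoSRDesc, sizeof(aoSRDesc) );
aoSRDesc.Format = aoDesc.Format;
aoSRDesc.ViewDimension = D3D10_SRV_DIMENSION_TEXTURE2D;
aoSRDesc.Texture2D.MipLevels = 1;
V_RETURN( pd3dDevice->CreateShaderResourceView( _aoTex, &aoSRDesc, &_aoSRV ) );
```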

2. Rendering View-Space Normals and Depth:

In the main draw function (OnD3D10FrameRender), before rendering to the textures we first save the old viewport and render targets:

// Save the old RT and DS buffer views
ID3D10RenderTargetView* apOldRTVs[1] = { NULL };
ID3D10DepthStencilView* pOldDS = NULL;
pd3dDevice->OMGetRenderTargets( 1, apOldRTVs, &pOldDS );

// Save the old viewport
D3D10_VIEWPORT OldVP;
UINT cRT = 1;
pd3dDevice->RSGetViewports( &cRT, &OldVP );

/** Start rendering to all the textures **/

Next we are going to look at the RenderTextures(..) function:

// Renders all the textures:
// -Diffuse
// -Normals
// -Position
// -Depth
void RenderTextures( ID3D10Device* pd3dDevice ) {
  // Set a new viewport for rendering to texture(s)
  D3D10_VIEWPORT SMVP;
  SMVP.Height = _height * TEXSCALE;
  SMVP.Width = _width * TEXSCALE;
  SMVP.MinDepth = 0;
  SMVP.MaxDepth = 1;
  SMVP.TopLeftX = 0;
  SMVP.TopLeftY = 0;
  pd3dDevice->RSSetViewports( 1, &SMVP );

  float ClearColor[4] = { 0.0f, 0.125f, 0.3f, 1.0f };

  // Clear the render targets and the depth buffer
  pd3dDevice->ClearRenderTargetView( _mrtRTV, ClearColor );
  pd3dDevice->ClearDepthStencilView( _mrtDSV, D3D10_CLEAR_DEPTH, 1.0, 0 );

  // Set the input layout
  pd3dDevice->IASetInputLayout( g_pVertexLayout );

  // Set all the render targets (a single view onto the texture array)
  ID3D10RenderTargetView* aRTViews[ 1 ] = { _mrtRTV };
  UINT numRenderTargets = sizeof( aRTViews ) / sizeof( aRTViews[0] );
  pd3dDevice->OMSetRenderTargets( numRenderTargets, aRTViews, _mrtDSV );

  // Get the technique
  D3D10_TECHNIQUE_DESC techDesc;
  g_pTechnique->GetDesc( &techDesc );

  // Send in the camera matrices
  g_pProjectionVariable->SetMatrix( ( float* )g_Camera.GetProjMatrix() );
  g_pViewVariable->SetMatrix( ( float* )g_Camera.GetViewMatrix() );
  g_pWorldVariable->SetMatrix( ( float* )&g_World );

  /** Render the mesh ***/
  UINT Strides[1];
  UINT Offsets[1];
  ID3D10Buffer* pVB[1];
  pVB[0] = g_Mesh.GetVB10( 0, 0 );
  Strides[0] = ( UINT )g_Mesh.GetVertexStride( 0, 0 );
  Offsets[0] = 0;
  pd3dDevice->IASetVertexBuffers( 0, 1, pVB, Strides, Offsets );
  pd3dDevice->IASetIndexBuffer( g_Mesh.GetIB10( 0 ), g_Mesh.GetIBFormat10( 0 ), 0 );

  SDKMESH_SUBSET* pSubset = NULL;
  D3D10_PRIMITIVE_TOPOLOGY PrimType;
  ID3D10ShaderResourceView* pDiffuseRV = NULL;

  for( UINT subset = 0; subset < g_Mesh.GetNumSubsets( 0 ); ++subset ) {
    pSubset = g_Mesh.GetSubset( 0, subset );
    PrimType = g_Mesh.GetPrimitiveType10( ( SDKMESH_PRIMITIVE_TYPE )pSubset->PrimitiveType );
    pd3dDevice->IASetPrimitiveTopology( PrimType );

    pDiffuseRV = g_Mesh.GetMaterial( pSubset->MaterialID )->pDiffuseRV10;
    g_ptxDiffuseVariable->SetResource( pDiffuseRV );

    g_pTechnique->GetPassByIndex( 2 )->Apply( 0 );
    pd3dDevice->DrawIndexed( ( UINT )pSubset->IndexCount, 0, ( UINT )pSubset->VertexStart );
  }
} // End RenderTextures

Next we are going to look at the HLSL shader code for rendering to the multiple targets:

NOTE: Before you start coding in HLSL (in Visual Studio), make sure you set up the debugger for it. It saved me a lot of time and headache. Thanks to Bobby Anguelov for the instructions.

/******* Multiple Render Target Functions ***************/

// Geometry Shader input - vertex shader output
struct GS_IN {
    float4 Pos  : POSITION;  // Projected position
    float4 PosWV: TEXCOORD1; // World-view (view-space) position
    float3 Norm : NORMAL;    // View-space normal
    float2 Tex  : TEXCOORD0; // Texture coord
};

// Pixel Shader in - Geometry Shader out
struct PS_MRT_INPUT {
    float4 Pos  : SV_POSITION;
    float4 PosWV: TEXCOORD1;    // World-view position
    float3 Norm : NORMAL;       // View-space normal
    float2 Tex  : TEXCOORD0;
    uint RTIndex : SV_RenderTargetArrayIndex; // which render target to write to (0-3)
};

// Simple Vertex Shader for MRT
GS_IN VS_MRT( VS_INPUT input ) {
    GS_IN output = (GS_IN)0;
    input.Pos += input.Norm * Puffiness;
    output.PosWV = mul( float4(input.Pos, 1), World );
    output.PosWV = mul( output.PosWV, View );
    output.Pos = mul( output.PosWV, Projection );
    // Normals stay in view space - they are NOT multiplied by the projection matrix
    output.Norm = mul( input.Norm, (float3x3)World );
    output.Norm = mul( output.Norm, (float3x3)View );
    output.Tex = input.Tex;
    return output;
}

// Geometry Shader for MRT: replicate each triangle once per render target
// 0 = diffuse color
// 1 = normals
// 2 = position
// 3 = depth
[maxvertexcount(12)]
void GSMRT( triangle GS_IN input[3], inout TriangleStream<PS_MRT_INPUT> CubeMapStream ) {
    for( int f = 0; f < 4; ++f ) {
        PS_MRT_INPUT output;
        output.RTIndex = f;   // selects the array slice this copy rasterizes into
        for( int v = 0; v < 3; v++ ) {
            output.Pos   = input[v].Pos;    // position
            output.PosWV = input[v].PosWV;  // view-space position
            output.Norm  = input[v].Norm;   // normal
            output.Tex   = input[v].Tex;
            CubeMapStream.Append( output );
        }
        CubeMapStream.RestartStrip();
    }
} // end of geometry shader

float4 PSMRT( PS_MRT_INPUT input ) : SV_Target {
    if (input.RTIndex == 0) {            // diffuse
        return g_txDiffuse.Sample( samLinear, input.Tex );
    } else if (input.RTIndex == 1) {     // normal
        // convert normal to texture space [-1;+1] -> [0;1]
        float3 normal = input.Norm * 0.5 + 0.5;
        return float4(normal, 1.0);
    } else if (input.RTIndex == 2) {     // position - not actually used
        return input.PosWV;
    } else {                             // depth
        float normalizedDistance = input.Pos.z / input.Pos.w;
        normalizedDistance = 1.0f - normalizedDistance;
        return float4(normalizedDistance, normalizedDistance, normalizedDistance, normalizedDistance);
    }
}

I am just going to focus on the geometry shader (GSMRT()) and pixel shader (PSMRT()). The code above renders four values: the diffuse colour (from the texture), the view-space normals, the view-space position, and the depth. I don't actually use the view-space position since I reconstruct the position from depth later on, but I thought I'd leave it in there.

The geometry shader is essential for rendering to multiple targets this way. The "RTIndex" variable tells the pixel shader which target to render to. Otherwise we just pass the information through (normals, UV values, position, etc.)

In the pixel shader we read the RTIndex value and write out the corresponding value. Hopefully the code is comprehensible. Now you can go ahead and create the final render (composite texture), or apply other effects first. I thought I'd give Screen-Space Ambient Occlusion a try.

3. Ambient Occlusion & Gaussian Blur:

Ambient Occlusion is a popular technique for adding realism to scenes. It accounts for the attenuation of light due to occlusion. Screen-Space Ambient Occlusion (SSAO) has become the most popular and cost-effective way of implementing AO. The videogame Crysis was the first to implement it, and there have been multiple variants since. I implemented the SSAO algorithm from the GameDev article "A Simple and Practical Approach to SSAO". I won't go over the shader code here since that article has a pretty good explanation.

To render Ambient Occlusion the function RenderAmbientOcclusion() is called. The first step is to change the render target, and then we render a full-screen quad. The first pass generates the AO texture, but it has a lot of noise. To "smooth out" the noise we apply a Gaussian blur. The blur shaders were taken from here (they basically sample around the current pixel and average to smooth things out – once horizontally and then once vertically).

SSAO requires a lot of computation – my framerate drops from 60 to ~11 when I turn on AO (roughly ~23 without the Gaussian blur). There shouldn't be such a huge drop; the texture lookup for the random vector is the bottleneck, and I can probably optimize it further.

Finally we can go on to create the final render.

4. Viewing Composite Texture:

To view the composite rendering we once again render a full-screen quad and pass in all the textures (diffuse, depth, normals, ambient occlusion). NOTE: The light variables are hard-coded into the shader (ideally you’d want to send them in).

The final shader is simple enough:

// Pixel Shader for rendering the full-screen quad
// TexToRender: 0 = diffuse
//              1 = normal (put specular in .w?)
//              2 = position
//              3 = depth
//              4 = ambient occlusion
//              5 = composite
float4 PSQuad( PS_INPUT input ) : SV_Target {
    // Diffuse
    float4 diffuse = _mrtTextures.Sample( samPoint, float3(input.Tex, 0) );
    if (TexToRender == 0)
        return diffuse;

    // Normals: convert back from texture space [0;1] -> [-1;+1]
    float4 normals = _mrtTextures.Sample( samPoint, float3(input.Tex, 1) );
    normals = (normals - 0.5) * 2.0;
    if (TexToRender == 1)
        return normals;

    // Depth
    float4 depth = _mrtTextures.Sample( samPoint, float3(input.Tex, 3) );
    // Discard background pixels (return the clear color)
    if (depth.x == 0.0)
        return float4( 0.0f, 0.125f, 0.3f, 1.0f );
    if (TexToRender == 3)
        return depth;

    // Position, reconstructed from the depth value
    depth = 1.0 - depth;    // depth was stored as 1 - z/w
    float4 H = float4(input.Tex.x * 2.0 - 1.0,
                     (1.0 - input.Tex.y) * 2.0 - 1.0,
                      depth.x, 1.0);
    float4 D = mul(H, ProjectionInverse);
    float4 position = D / D.w;
    if (TexToRender == 2)
        return position;

    // Ambient occlusion
    float4 ao = float4(1.0, 1.0, 1.0, 1.0);
    if (UseAO == true) {
        ao = _aoTexture.Sample( samLinear, input.Tex );
        if (TexToRender == 5)
            return ao;
    }

    // Otherwise calculate the light value - Phong shading
    float3 lightDir = vLightPos -;
    float3 eyeVec   =;   // eye is at the origin in view space
    float3 N        = normalize(;
    float3 E        = normalize(eyeVec);
    float3 L        = normalize(lightDir);
    float3 reflectV = reflect(-L, N);

    // Diffuse term
    float4 dTerm = diffuse * max(dot(N, L), 0.0);

    // Specular term (computed but not used in the final render)
    float4 specular = matSpecular * pow(max(dot(reflectV, E), 0.0), matShininess);

    float4 outputColor = dTerm;// + specular;
    if (UseAO == true) {
        outputColor = outputColor * ao;
    }

    return outputColor;
}

First I read in all the different textures, and depending on which texture the user wants to view, that value is returned. If the composite texture is to be viewed, Phong shading is performed. Though I calculate the specular value, I don't use it in my final rendering since it gives an overly plastic look. Finally, if ambient occlusion is turned on, the colour is darkened accordingly.


I went over how to set up multiple render targets and write to them, how to render a full-screen quad and ambient occlusion, and how to create a composite texture. If there is something in my code you don't understand, or you have suggestions for improvement, please feel free to ask/let me know.

I will probably try to create a better scene and come up with better images. I might also try implementing shadow mapping since it shouldn’t be that difficult to include it in a deferred renderer.

NOTE: When you exit my DirectX program it gives errors – I realize I don't release all the resources, but it should be OK.



References:

  1. Debugging HLSL
  2. A Simple and Practical Approach to SSAO
  3. Gaussian Blur Filter Shader

Comments on: "Deferred Shading in DirectX 10 (with Ambient Occlusion)" (4)

  1. […] Deferred Shading in Direct3D 10 with ambient occlusion. Visual Studio 2010 files available. […]

  2. Marcos Eike said:

    Cool! The tutorial is easier!

  3. What is commonly understood as Multiple Render Target functionality is not what you are using. MRT in DX10–11 is achieved by binding multiple render target views with OMSetRenderTargets() and using SV_TargetX to declare multiple outputs of the pixel shader.

    What you are doing is setting a single render target (which happens to be an array of 2D textures) and using the geometry shader to duplicate each drawn primitive. Each duplicate of a given primitive is then rasterized with a different branch taken in the PS, filling a different slice of the 2D texture array.

    This is really unnecessary, since you do additional work in the GS and rasterize 4 primitives instead of one. Maybe this specific process is somehow optimized in hardware and is not as slow as I think it is (compared to using traditional MRT), but still – it is not the way it's meant to be used.

    On a side note, the technique you're using is by some called "layered rendering" – the term unfortunately refers to more than one thing, but here is a link that uses it as I think of it: (third example from the top, OpenGL).
    It may be used for rather cool stuff like generating an environment cubemap in a single pass (

    • Thanks for the comment – I think I actually tried doing it the way you mentioned and I couldn’t get it to work, so I adapted the “CubeMapGS” example which used it this way. Your comment might explain the performance drop due to the GS rasterizing 4 primitives.
