TGFC Lifestyle - Powered by Discuz! Board

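Title: [News] KILLZONE 2 really does prove how powerful Cell is

Author: west2046 Time: 2009-2-10 22:45 Title: KILLZONE 2 really does prove how powerful Cell is

Reposted from http://bbs.levelup.cn/showtopic-756780.aspx

KILLZONE 2 really does prove how powerful Cell is
I once watched a video presentation by GG's technical director: a single KZ2 scene has two hundred light sources, and the technical director demonstrated it himself on a dev kit. To get two hundred light sources they must be using deferred rendering. What is even more surprising is that they manage MSAA together with deferred rendering, which only DX10.1 graphics cards can do. KZ2 is probably the first game with deferred rendering plus MSAA. We know that RSX cannot do DX10 effects, so this is undoubtedly done by CELL. Current technology does not yet let a GPU emulate effects; you can only program control over its built-in effects. For newly added effects, whenever a new DIRECTX version adds them you have to switch to a new graphics card that supports that DIRECTX version. So the PS3 design idea is to use a CPU with high floating-point throughput to emulate effects. That is why KZ2 means more to the PS3 than just being a big title: it is a technology-showcase game that proves the PS3's own strength. KZ2's various effects, lighting and shadows, physics, and particles are mainly done on Cell. Reportedly it only uses 6 SPUs, yet it manages this many effects in one scene, with characters, physics, lighting, explosions, particles and dust all at once, while still reaching 720p 30fps.
http://forum.beyond3d.com/showpost.php?p=1018853&postcount=1
The link is GG's technical interview, which mentions KZ2's deferred rendering with MSAA.

[ Last edited by west2046 on 2009-2-10 22:47 ]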

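Author: czzj12345 Time: 2009-2-10 22:50

So it's from LU

Author: 倍舒爽 Time: 2009-2-10 22:52

Looks like more than 2xAA to me....
Very sharp too..

Pity Sony says the engine won't be licensed out... how dumb is that

[ Last edited by 倍舒爽 on 2009-2-10 22:55 ]

Author: 无敌JJ Time: 2009-2-10 22:55

I hear KZ2 actually doesn't hit 30FPS. :D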

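Author: west2046 Time: 2009-2-10 22:57

Quote:

Originally posted by 倍舒爽 on 2009-2-10 22:52
Looks like more than 2xAA to me....
Very sharp too..

Pity Sony says the engine won't be licensed out... how dumb is that

This engine probably gets its efficiency the same way the GT engine does: third parties couldn't afford to work with it!

Author: 先手必胜 Time: 2009-2-10 23:05

Quote:

Originally posted by 无敌JJ on 2009-2-10 22:55
I hear KZ2 actually doesn't hit 30FPS. :D

Was it the boss who was going to rent out the whole machine room who said that? Still hasn't rented it? The retail version is almost out, and I hear copies have already leaked early!! :D

Author: 真理捍卫者 Time: 2009-2-10 23:07

Quote:

Originally posted by 无敌JJ on 2009-2-10 22:55
I hear KZ2 actually doesn't hit 30FPS. :D

Do Sony haters only ever flame with things other people said?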

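Author: 无敌JJ Time: 2009-2-10 23:09

Quote:

Originally posted by 真理捍卫者 on 2009-2-10 23:07

Do Sony haters only ever flame with things other people said?

Someone compared the RE5 and KZ2 demos and found that KZ2's frame rate clearly isn't as solid as RE5's,

so use your imagination. :D

I'm only relaying someone else's facts.

Author: 先手必胜 Time: 2009-2-10 23:27

Quote:

Originally posted by 无敌JJ on 2009-2-10 23:09

Someone compared the RE5 and KZ2 demos and found that KZ2's frame rate clearly isn't as solid as RE5's,

so use your imagination. :D

I'm only relaying someone else's facts.

"Someone said" again? Don't always believe what other people say; trying it yourself is what's convincing!!
What other people say isn't necessarily fact, is it? :D

Author: tobewind Time: 2009-2-10 23:36

Players can't tell from the picture which xx effects or how many xx light sources were used. If the picture is pleasing to the eye, I really can't be bothered with comparisons. Speaking of which, how many of the people doing the comparing are actually "playing" the games... When you play, are you constantly gloating that there are 200 mighty light sources on screen...

Author: 19xx Time: 2009-2-10 23:39

Didn't the OP "commit suicide"?

How come he's back from the dead?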

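Author: 里昂2236 Time: 2009-2-10 23:55

Quote:

Originally posted by 无敌JJ on 2009-2-10 23:09

Someone compared the RE5 and KZ2 demos and found that KZ2's frame rate clearly isn't as solid as RE5's,

so use your imagination. :D

I'm only relaying someone else's facts.

KZ2's frame rate indeed isn't as high as BH5's~~ because, sorry to say, BH5 is a 60fps game

Author: 倍舒爽 Time: 2009-2-11 00:20

Quote:

Originally posted by 里昂2236 on 2009-2-10 23:55

KZ2's frame rate indeed isn't as high as BH5's~~ because, sorry to say, BH5 is a 60fps game

bh5 is 60fps?

You've played the PC version??

kz2's frame rate is more stable than bio5's no matter what, OK? Talking only about the PS3 version, of course..

Author: WhinStone Time: 2009-2-11 02:51

Why dig up old news from '07 now?

Author: 深蓝LWL1123 Time: 2009-2-11 07:41

KZ2's frame rate is very stable, with no dropped frames and no screen tearing...
It certainly hits 30FPS. As for the far-fetched idea of a CPU rendering DX10.1, forget it
The architectures are completely different, and it's supposed to render DX10? INTEL and AMD might as well give up and die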

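Author: 级替四 Time: 2009-2-11 08:39

200 light sources, dream on

Author: mapledot Time: 2009-2-11 08:41

Quote:

Originally posted by tobewind on 2009-2-10 23:36
Players can't tell from the picture which xx effects or how many xx light sources were used. If the picture is pleasing to the eye, I really can't be bothered with comparisons. Speaking of which, how many of the people doing the comparing are actually "playing" the games... When you play, are you constantly gloating that there are 200 mighty light sources on screen...

The picture is clearly far more realistic and impressive than in a game without light sources

Author: 级替四 Time: 2009-2-11 08:48

This game's outstanding graphics come from the dev team's meticulous work; put bluntly, enough money was thrown at it.

Author: mapledot Time: 2009-2-11 08:51

Hope MS throws money at a couple of games too, for the good of players

Author: 陈惯吸 Time: 2009-2-11 08:55

200 light sources, interesting

Author: liuyicheng Time: 2009-2-11 08:55

200 light sources

DX10.1

Author: huihuihui Time: 2009-2-11 10:44

Making console games is still better: no need to worry about compatibility

Author: arthurking Time: 2009-2-11 10:50

Why are there cows flying in the sky? Because Sony fans on the ground are blowing them up there with their bragging!

That said, I played the KZ2 demo and thought the graphics were pretty poor. So that's all DX10.1 and 200 light sources get you?

Author: 千山暮雪 Time: 2009-2-11 10:56

Cell was always powerful; pity it got married to the brain-dead RSX

Author: carnon Time: 2009-2-11 11:07

http://www.youtube.com/watch?v=VEQlDZh3JQs&feature=related

The source is probably this video; watch from 4:40

[ Last edited by carnon on 2009-2-11 11:10 ]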

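Author: 倍舒爽 Time: 2009-2-11 11:16

Quote:

Originally posted by carnon on 2009-2-11 11:07
 http://www.youtube.com/watch?v=VEQlDZh3JQs&feature=related

The source is probably this video; watch from 4:40

oh yeah, so this thing exists...

A link someone posted earlier:
http://game.ali213.net/viewthrea ... p;extra=&page=1

It mentions:
Deferred Rendering
   Deferred rendering can be called the direction future games are heading. The principle is to first render the information for all polygonal objects in the scene, such as position, surface normals, and the various textures, into a G-Buffer, deferring the lighting step.

   Deferred rendering avoids wasted rendering work (the renderer doing computations that turn out to be useless) and improves efficiency when there is a large amount of complex, time-consuming pixel shading.

   Deferred rendering makes it possible to create large numbers of point lights and produce realistic lighting, improving the realism of the image; it also avoids lighting points that are not visible, which saves resources. However, deferred rendering is not a good fit for DX9: on current hardware it has to sacrifice MSAA (multisample anti-aliasing) (S.T.A.L.K.E.R. and Ghost Recon Advanced Warfighter, which use deferred rendering, cannot support multisample anti-aliasing), whereas on the new DX10 hardware this is not a problem.

This time I'm firmly backing kz2...

Who exactly is bragging, and how, still needs to be judged clearly....

[ Last edited by 倍舒爽 on 2009-2-11 11:19 ]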
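
To make the quoted description a bit more concrete, here is a minimal sketch of the two passes in plain Python. It is only an illustration under assumptions: the `GBufferTexel` layout, the `point_light` falloff model and the two-pixel "scene" are hypothetical, and none of it is taken from KZ2 or from the quoted article.

```python
import math
from dataclasses import dataclass


@dataclass
class GBufferTexel:
    # Per-pixel surface data written by the geometry pass (hypothetical layout).
    position: tuple   # world-space position (x, y, z)
    normal: tuple     # unit surface normal
    albedo: float     # grey-scale diffuse reflectance


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def point_light(texel, light_pos, intensity):
    # Lambertian point light with inverse-square falloff (assumed lighting model).
    to_light = tuple(l - p for l, p in zip(light_pos, texel.position))
    dist2 = dot(to_light, to_light)
    if dist2 == 0.0:
        return 0.0
    inv_len = 1.0 / math.sqrt(dist2)
    n_dot_l = max(0.0, dot(texel.normal, tuple(c * inv_len for c in to_light)))
    return texel.albedo * intensity * n_dot_l / dist2


def lighting_pass(gbuffer, lights):
    # Deferred step: every visible pixel is lit once per light using only the
    # G-buffer contents; the scene geometry is not touched again.
    return [sum(point_light(t, pos, i) for pos, i in lights) for t in gbuffer]


# Result of a geometry pass for two visible pixels (made-up values).
gbuffer = [
    GBufferTexel((0.0, 0.0, 0.0), (0.0, 1.0, 0.0), 0.5),
    GBufferTexel((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), 0.5),
]
lights = [((0.0, 1.0, 0.0), 1.0), ((1.0, 2.0, 0.0), 4.0)]
image = lighting_pass(gbuffer, lights)
```

The cost of the lighting pass grows with (visible pixels x lights) rather than with (polygons x lights), which is why a large number of point lights becomes affordable.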

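Author: ffcactus Time: 2009-2-11 11:28

Quote:

Originally posted by 级替四 on 2009-2-11 08:48
This game's outstanding graphics come from the dev team's meticulous work; put bluntly, enough money was thrown at it.

You don't know squat.

Author: dogsoldier Time: 2009-2-11 11:31

I'm an RF, but I've got no hang-ups

.............................................................................................................

I watched the demo's graphics and I'm not excited at all, really

Author: 变色龙 Time: 2009-2-11 11:31

I remember seeing the 200 light sources in some Killzone 2 PDF; anyone interested can go look for it

Author: dogsoldier Time: 2009-2-11 11:38

Quote:

Originally posted by 里昂2236 on 2009-2-10 23:55

KZ2's frame rate indeed isn't as high as BH5's~~ because, sorry to say, BH5 is a 60fps game

Heh, bio5 is 30fps, and KZ2 has the same frame rate as the 360 version of bio5, which means it's more stable than the PS3 version of bio5

Author: 老江湖 Time: 2009-2-11 12:08

Quote:

Originally posted by dogsoldier on 2009-2-11 11:31
I'm an RF, but I've got no hang-ups

.............................................................................................................

I watched the demo's graphics and I'm not excited at all, really

What can you do; the "god machine" has only ever had lousy FPS games, so when a fairly good one occasionally turns up it gets deified

Author: Minstrel_boy Time: 2009-2-11 12:18

Waiting for Tianshi to show up and out-argue the lot of you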

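Author: wetwet Time: 2009-2-11 12:32

A programming question this deep is hardly something the likes of me, or of you, can settle by arguing.

[ Last edited by wetwet on 2009-2-11 12:33 ]

Author: 老江湖 Time: 2009-2-11 12:36

Quote:

Originally posted by wetwet on 2009-2-11 12:32
A programming question this deep is hardly something the likes of me, or of you, can settle by arguing.

What the god machine really wants to say is: I can already emulate DX11, and I can keep on emulating forever

Author: wetwet Time: 2009-2-11 12:38

Quote:

Originally posted by 老江湖 on 2009-2-11 12:36

What the god machine really wants to say is: I can already emulate DX11, and I can keep on emulating forever

To players all of this is just passing clouds.
We just look at the graphics and how the game feels.
If it feels good it's good, if it's bad it's bad
No need to pay any attention to this kind of news.

[ Last edited by wetwet on 2009-2-11 12:40 ]

Author: edenfu Time: 2009-2-11 16:26

My biggest impression of the kz2 demo is that there are lots of particle effects and they are used well, which adds to the atmosphere

Author: hpkiller Time: 2009-2-11 16:32

Quote:

Originally posted by dogsoldier on 2009-2-11 11:31
I'm an RF, but I've got no hang-ups

.............................................................................................................

I watched the demo's graphics and I'm not excited at all, really

Quote only.....

Author: legendkang Time: 2009-2-11 16:46

We know that RSX cannot do DX10 effects, so this is undoubtedly done by CELL. Current technology does not yet let a GPU emulate effects; you can only program control over its built-in effects. For newly added effects, whenever a new DIRECTX version adds them you have to switch to a new graphics card that supports that DIRECTX version. --- This is plainly the 360!

Author: hudihutian Time: 2009-2-11 18:05

Is that so unusual?

http://research.scea.com/ps3_deferred_shading.pdf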


Deferred Pixel Shading on the PLAYSTATION®3
Alan Heirich and Louis Bavoil

Abstract— This paper studies a deferred pixel shading algorithm
implemented on a Cell/B.E.-based computer entertainment
system.
The pixel shader runs on the Synergistic Processing Elements
(SPEs) of the Cell/B.E. and works concurrently with the GPU to
render images.  The system's unified memory architecture allows
the Cell/B.E. and GPU to exchange data through shared textures.
The SPEs use the Cell/B.E. DMA list capability to gather
irregular fine-grained fragments of texture data generated by the
GPU.  They return resultant shadow textures the same way.  The
shading computation ran at up to 85 Hz at HDTV 720p
resolution on 5 SPEs and generated 30.72 gigaops of
performance.  This is comparable to the performance of the
algorithm running on a state of the art high end GPU.  These
results indicate that the Cell/B.E. can effectively enhance the
throughput of a GPU in this hybrid system by alleviating the
pixel shading bottleneck.

Index Terms—Computer Graphics, HDTV, Parallel Algorithms,
Rendering
I.  INTRODUCTION
The current trend toward multi-core microprocessor
architectures has led to performance gains that exceed
the predictions of Moore's law.  Multiple cores first
became prevalent as fragment processors in graphics
processing units (GPUs).  More recently the CPUs for
computer entertainment systems and desktop systems have
embraced this trend.  In particular the Cell/B.E. processor
developed jointly by IBM, Sony and Toshiba contains up to
nine processor cores with a high concentration of floating
point performance per chip unit area.

We have explored the potential of the Cell/B.E. for
accelerating graphical operations in the PLAYSTATION®3
computer entertainment system.  This system combines the
Cell/B.E. with a state of the art GPU in a unified memory
architecture.  In this architecture both devices share access to
system memory and to graphics memory.  As a result they can
share data and processing tasks.

We explored moving pixel shader computations from the GPU
to the Cell/B.E. to create a hybrid real time rendering system.
Our initial results are encouraging and we find benefits from
the higher clock rate of the Cell/B.E. and the more flexible
programming model.  We chose an extreme test case that
stresses the memory subsystem and generates a significant
amount of DMA waiting.  Despite this waiting the algorithm
scaled efficiently with speedup of 4.33 on 5 SPEs.  This
indicates the Cell/B.E. can be effective in speeding up this sort
of irregular fine-grained shader.  These results would carry
over to less extreme shaders that have more regular data
access patterns.

Alan Heirich is with the Research and Development department of Sony
Computer Entertainment America, Foster City, California.
Louis Bavoil is with Sony Computer Entertainment America R&D and the
University of Utah, School of Computing, Salt Lake City, UT (e-mail:
bavoil@sci.utah.edu).

The next two sections of this paper introduce the graphical
problems we are solving and describe related work.  We next
describe the architecture of the computer entertainment system
under study and performance measurements of the pixel
shader.  We study the performance of that shader on a test
image and compare it to the performance of a high-end state
of the art desktop GPU, the NVIDIA GeForce 7800 GTX.
Our results show the delivered performance of the Cell/B.E.
and GPU were similar even though we were only using a
subset of the Cell/B.E. SPEs.  We finish with some
concluding remarks.
II.  PIXEL SHADING ALGORITHMS
We study variations of a Cone Culled Soft Shadow algorithm
[3].  This algorithm belongs to a class of algorithms known as
shadow mapping algorithms [15].  We first review the basic
algorithm then describe some variations.
A.  Soft Shadows
Soft shadows are an integral part of computing global
illumination solutions.  Equation (1) describes an image with
soft shadows in which, for every pixel, the irradiance L
arriving at a visible surface point from an area light source is


$$ L = \int_{\Omega_{light}} \left[ \frac{E_{light} \cos\theta_l \cos\theta_i}{\pi r^2} \right] V \, d\Omega \qquad (1) $$

In this equation $\Omega_{light}$ is the surface of the area light and $d\Omega$ is
the differential of surface area.  $E_{light}$ is the light emissivity per
unit area, and $\theta_l$, $\theta_i$ are the angles of exitance and incidence of
a ray of length $r$ that connects the light to the surface point.  $V$
is the geometric visibility along this ray, either one or zero.
The distance term $1 / (\pi r^2)$ reflects the reduction in subtended
solid angle that occurs with increasing distance.  This
expression assumes that the material surface is diffuse
(Lambertian).

When $V = 1$ and $\Omega_{light}$ has area $d\Omega$ this equation describes
diffuse local illumination from a point light as is typically
computed by GPUs using rasterization.  When this equation is
expanded recursively in E (by treating each surface point as a
source of reflected light) the result is a restriction of the
Rendering Equation of global illumination  [12] to diffuse
surfaces.
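
As a small illustration of equation (1), the sketch below evaluates its integrand for one light patch and sums patches to approximate the integral. This is not code from the paper: the function names, the dictionary layout of a patch and the simple Riemann-sum estimate are assumptions made for the example.

```python
import math


def point_light_irradiance(e_light, d_omega, cos_theta_l, cos_theta_i, r, visible=True):
    """Integrand of equation (1) for one small light patch of area d_omega.

    e_light      -- emissivity per unit area of the light
    cos_theta_l  -- cosine of the exitance angle at the light
    cos_theta_i  -- cosine of the incidence angle at the surface point
    r            -- length of the ray joining light and surface point
    visible      -- geometric visibility V (True -> 1, False -> 0)
    """
    v = 1.0 if visible else 0.0
    return e_light * cos_theta_l * cos_theta_i / (math.pi * r * r) * v * d_omega


def area_light_irradiance(patches):
    # Riemann-sum estimate of the integral: add the contribution of each patch.
    return sum(point_light_irradiance(**p) for p in patches)
```

With a single fully visible patch this reduces to the point-light case described above; an occluded patch (`visible=False`) contributes nothing, which is where soft shadows come from once many patches are summed.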
B.  Cone Culled Soft Shadows

Equation (1) is traditionally solved by offline methods like ray
tracing.  Stochastic ray tracing samples the integrand at
various points on Ω and accumulates the result into L.  The
CCSS algorithm takes an analogous approach, rendering from
the light and gathering the radiance from the resulting
fragments into pixels.

The CCSS algorithm consists of fragment generation steps
and a pixel shading step.  We have implemented fragment
generation on the GPU and pixel shading on the Cell/B.E.
The GPU is programmed in OpenGL-ES using Cg version 1.4
for shaders.  Fragments are rendered into OpenGL-ES
Framebuffer Object texture attachments using one or more
render targets.  These textures are then detached from the
Framebuffer Objects and used as input to the pixel shading
step.  The pixel shading step returns a shadow texture which is
then bound to the GPU for final rendering.

The algorithm is not physically correct and we accept many
approximations for the sake of real time performance.  Lights
are assumed to be spherical which simplifies the gathering
step.  Light fragments for each pixel are culled against conical
frusta rooted at the pixel centroid.  These frusta introduce
geometric distortions due to their mismatch with the actual
light frustum.

The culling step uses one square root and two divisions per
pixel.  No acceleration structure is used so the algorithm is
fully dynamic and requires no preprocessing.  The algorithm
produces high quality shadows.  It renders self-shadowed
objects more robustly than conventional shadow mapping
without requiring a depth bias or other parameters.

1)  Eye Render

The first fragment generation step captures the locations of
pixel centroids in world space.  This is done by rendering
from the eye view using a simple fragment shader that
captures transformed x, y and z for each pixel.  We capture z
rather than obtaining it from the Z buffer in order to avoid
imprecision problems that can produce artifacts.  We use the
depth buffer in the conventional way for fragment visibility
determination.

If this is used as a base renderer (in addition to rendering
shadows) then the first step also captures a shaded
unshadowed color image.  This unshadowed image will later
be combined with the shadow texture to produce a shadowed
final image.  For some shaders, such as approximate indirect
illumination, this step can also capture the surface normal
vectors at the pixel location.

2)  Light Render

The second fragment generation step captures the locations
and alpha values (transparency) of fragments seen from the
light.  For each light, for each shadow frustum, the scene is
rendered using the depth buffer to capture the first visible
fragments. The positions and alphas of the fragments are
generated by letting the rasterizer interpolate the original
vertex attributes. For some shaders, including colored
shadows and approximate indirect illumination, this step also
captures fragment colors.

3)  Pixel Shading

In the third step, performed on the Cell/B.E., light fragments
are gathered to pixels for shading.  Pixels are represented in
an HDTV resolution RGBA texture that holds (x,y,z) and a
background flag for each pixel.  Light fragments are contained
in one (or more) square textures.

Pixel shading proceeds in three steps:  gathering the kernel of
fragments for culling; culling these fragments against a
conical frustum; and finally computing a shadow value from
those fragments that survived culling.

4)  Fragment Gather

For each pixel, for each light, the pixel location (x,y,z) is
projected into the light view (x',y',z').  A kernel of fragments
surrounding location (x',y',0) in the light texture is gathered
for input to the culling step.  Figure 1 illustrates this projection
and the surrounding kernel.

It is not necessary to sample every location in the kernel, and
performance gains can be realized by subsampling strategies.
In our present work we are focused on system throughput and
so we use a brute-force computation over the entire kernel.
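
The brute-force gather can be sketched as follows. This is a hedged illustration rather than the paper's SPE code: it assumes the pixel has already been projected to integer light-texture coordinates `(u, v)`, and the policy of skipping texels outside the texture is an assumption, since the paper does not state one.

```python
def gather_kernel(light_texture, u, v, kernel=3):
    """Return the kernel x kernel block of fragments centred on texel (u, v).

    light_texture is a list of rows. Texels outside the texture are skipped
    (an assumed border policy).
    """
    half = kernel // 2
    height, width = len(light_texture), len(light_texture[0])
    fragments = []
    for dv in range(-half, half + 1):
        for du in range(-half, half + 1):
            x, y = u + du, v + dv
            if 0 <= x < width and 0 <= y < height:
                fragments.append(light_texture[y][x])
    return fragments
```

Every fragment returned here would then go through the cone-culling step; a subsampling strategy would simply visit fewer `(du, dv)` offsets.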

5)  Cone Culling

For each pixel, for each light, a conical frustum is constructed
tangent to the spherical light with its apex at the pixel centroid
as illustrated in figure 2.  The gathered fragments are tested
for inclusion in the frustum using an efficient point-in-cone
test.

The point-in-cone test performs these computations at each
pixel:

axis       = light.centroid - pixel.centroid
alength^2  = axis . axis
cos^2 θ    = alength^2 / (light.radius^2 + alength^2)
na         = normalize(axis)

Figure 1 (kernel lookup). The pixel is projected from the
world into the light plane, which is equivalent to finding the
nearest fragment F in the light view to the ray from the pixel
to the light center.  In this example fragment F blocks the ray
from the light to the pixel, and we say F shadows the pixel.

Figure 2 (cone culling).  Computing the shadow intensity at a
pixel in a cone with the apex at the pixel and tangent to the
light sphere. The fragments of the light view are fetched in a
kernel centered at the projection of the cone axis over the
light plane.  Fragments are tested for visibility using an
efficient point-in-cone test.

The point-in-cone test then performs these computations for each
fragment:

fe          = fragment.centroid - pixel.centroid
axisDotFe   = na . fe
direction   = (axisDotFe > 0)
flength^2   = fe . fe
inside      = (cos^2 θ * flength^2 <= axisDotFe^2)
pointInCone = direction && inside

(An expression for cos^2 θ that more accurately reflects the
tangency between the cone and sphere is
(alength^2 - light.radius^2) / alength^2.)
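
The per-pixel and per-fragment computations above translate almost line for line into code. The following is a scalar Python sketch for readability, not the SIMD implementation measured in the paper; the helper names `dot`, `sub`, `cone_setup` and `point_in_cone` are chosen for this example.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def cone_setup(light_centroid, light_radius, pixel_centroid):
    # Per-pixel work: cone axis, its squared length, and cos^2 of the half angle.
    axis = sub(light_centroid, pixel_centroid)
    alength2 = dot(axis, axis)
    cos2_theta = alength2 / (light_radius ** 2 + alength2)
    inv_len = alength2 ** -0.5
    na = tuple(c * inv_len for c in axis)   # normalized axis
    return na, cos2_theta


def point_in_cone(fragment_centroid, pixel_centroid, na, cos2_theta):
    # Per-fragment work: only dot products, multiplies and comparisons.
    fe = sub(fragment_centroid, pixel_centroid)
    axis_dot_fe = dot(na, fe)
    direction = axis_dot_fe > 0          # fragment lies on the light side
    flength2 = dot(fe, fe)
    inside = cos2_theta * flength2 <= axis_dot_fe ** 2
    return direction and inside
```

Comparing squared quantities is what keeps the inner loop free of square roots and divisions: the angle test cos θ_fragment >= cos θ is rewritten as cos^2 θ * |fe|^2 <= (na . fe)^2, with the sign check handled separately by `direction`.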

6)  Computing new shadow values

The final step is to compute shadow values from the
fragments that survived the culling step.  Here we describe
three such shading computations, and others are possible.  We
present detailed performance measurements of the
monochromatic shader in section 5.  We have implemented
substantial portions of the other shaders on the Cell/B.E. and
GPU to verify proof-of-concept.
a)  Monochromatic soft shadows

We can compute monochromatic soft shadows from
translucent surfaces by using a generalization of the
Percentage Closer Filtering algorithm  [14].  Among the
fragments that survived cone culling we compute the mean
alpha (transparency) value. The resulting shadow factor is
one minus this mean.  At pixels where no fragments survived
culling the shadow factor is one. Test images for this shader
appear in figure 3.
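
A minimal sketch of this shadow-factor rule, with a hypothetical function name and a plain list of alpha values standing in for the fragments that survived culling:

```python
def monochromatic_shadow_factor(surviving_alphas):
    """Shadow factor from the alpha values of fragments that survived culling."""
    if not surviving_alphas:
        return 1.0                      # no fragment survived: factor is one
    mean_alpha = sum(surviving_alphas) / len(surviving_alphas)
    return 1.0 - mean_alpha
```

For example, two surviving fragments with alpha 0.5 and 0.0 give a factor of 0.75.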
b)  Colored soft shadows

We can obtain colored shadows by including the colors of the
translucent fragments and of the light source.  In addition to
computing the mean alpha value we also compute the mean
RGB for the fragments.  This requires gathering twice as
much fragment data for the shading computation.  We
multiply these quantities with the light source color to obtain a
colored shadow factor.  At pixels where no fragments
survived culling the shadow factor is one.
c)  Approximate indirect illumination

It is worth noting that an approximate indirect illumination
component can be computed similarly to Frisvad et al.'s
Direct Radiance Map algorithm [5].  This requires accounting
for a transport path from light source to fragment to pixel.
This estimate is approximate because it does not account for
occluding objects between the fragment and the pixel and also
because it only samples a limited kernel of fragments.

Assuming the fragment materials are diffuse (Lambertian), the
irradiance at the fragment can be estimated during the light
render step proportional to the cosine of the incident angle at
the fragment.  The subsequent reflected radiance at the pixel is
this irradiance times the cosine of the incident angle at the
pixel.  This radiance can be estimated during the pixel shading
step if we have the surface normal at the pixel.  This surface
normal can be generated during the eye render step.

This computation requires more DMA traffic to accommodate
the pixel normals. Since this is not part of the gathered
fragment data it can be accommodated efficiently using
predetermined transfers of large blocks of data.

Figure 3: some test images of complex models rendered using the monochromatic shader.  (Left) the dandelion is a challenging
test for shadow algorithms.  The algorithm correctly reproduced the fine detail at the base of the plant as well as the internal
self-shadowing within the leaves.  (Right) a tree model with over 100,000 polygons rendered above a grass colored surface.

III. RELATED WORK
There is an extensive existing literature on shadow
algorithms.  For a recent survey of real-time soft shadow
algorithms see [6].  For a broad review of traditional
shadow algorithms see [16].

The most efficient shadow algorithms work in image space
to compute the shading for each pixel with respect to a set
of point lights.  The original image-space algorithm for
point lights is shadow mapping [15].  In this algorithm the
visible surface of each pixel is transformed into the view of
the light and then compared against the first visible surface
as seen from the light.  If the first visible surface lies
between the transformed pixel and the light then the
transformed pixel is determined to be in shadow.

Traditional shadow mapping produces "hard" shadows that
are solid black with jagged edges.  They suffer from many
artifacts including surface acne (false self-shadowing due to
Z imprecision) and aliasing from imprecision in sampling
the light view.

The Percentage Closer Filtering algorithm [14] is
implemented in current GPUs to reduce jagged shadow
edges.  This algorithm averages the results of multiple
depth tests within a pixel to produce fractional visibility for
pixels on shadow boundaries.  This has the effect of
softening shadow boundaries but since it is a point light
algorithm it does not produce the wide penumbrae that
characterize shadows from area lights.
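
For comparison with the CCSS approach, here is a rough sketch of percentage closer filtering. It follows the common formulation that averages depth tests over neighbouring shadow-map texels; the function name, the clamped border handling and the `bias` parameter are assumptions for illustration and are not taken from [14] or from any GPU's implementation.

```python
def percentage_closer(shadow_map, u, v, pixel_depth, kernel=3, bias=0.0):
    """Fraction of depth tests in a kernel x kernel window that find the pixel lit.

    shadow_map[y][x] holds the depth of the first surface seen from the light.
    With kernel=1 this reduces to the hard shadow-map test.
    """
    half = kernel // 2
    height, width = len(shadow_map), len(shadow_map[0])
    lit = tests = 0
    for dv in range(-half, half + 1):
        for du in range(-half, half + 1):
            x = min(max(u + du, 0), width - 1)    # clamp to the map edge
            y = min(max(v + dv, 0), height - 1)
            tests += 1
            if pixel_depth - bias <= shadow_map[y][x]:
                lit += 1
    return lit / tests
```

The result is a fractional visibility on shadow boundaries, but because every test is still against a single point light the penumbra width is set by the kernel size rather than by the size of the light.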

Adaptive Shadow Maps  [4,13] address the problem of
shadow map aliasing by computing the light view at
multiple scales of resolution.  The multiresolution map is
stored in the form of a hierarchical adaptive grid.  This
approach can be costly because the model must be rendered
multiple times from the light view, once for each scale of
resolution.

Layered Depth Interval maps [2] combine shadow maps
taken from multiple points on the light surface.  These are
resolved into a single map that represents fractional
visibility at multiple depths.  In practice four discrete
depths were sufficient to produce complex self-shadowing
in foliage models.  This method produces soft shadows at
interactive rates but is costly because it requires multiple
renders per light.  It does not address translucency.

The irregular Z-buffer [11] has been proposed for hardware
realization for real-time rendering.  It causes primitives to
be rasterized at points specified by a BSP tree rather than
on a regular grid.  As a result it can eliminate aliasing
artifacts due to undersampling.  This is similar to Alias-free
Shadow Maps [1].

Jensen and Christensen extended photon mapping [10] by
prolongating the rays shot from the lights and storing the
occluded hit points in a photon map which is typically a kd-
tree. When rendering a pixel x the algorithm looks up the
nearest photons around x and counts the numbers of
shadow photons ns and  illumination photons ni in the
neighborhood.  The shadow intensity is then estimated as  V
= ni / (ns + ni). Our algorithm uses similar concepts to
gather fragments and shade pixels, and in addition works
with translucent materials.

Figure 4: the PLAYSTATION®3 architecture.  The 3.2 GHz Cell/B.E. contains a Power Architecture processor (the PPE) and
seven Synergistic Processing Elements (SPEs) each consisting of a Synergistic Processing Unit (SPU), 256 KB local store (LS),
and a Memory Flow Controller (MFC).  These processors are connected to each other and to the memory, GPU and peripherals
through a 153.6 GB/s Element Interconnect Bus (EIB).  The Cell/B.E. uses Extreme Data Rate (XDR) memory which has a peak
bandwidth of 25.6 GB/s.  The GPU interface (IOIF) to the EIB provides 20 GB/s in and 15 GB/s out.  Memory accesses by the
Cell/B.E. to GPU memory pass through the EIB, IOIF and GPU.  Access by the GPU to XDR pass through the IOIF, EIB and
MIC.

IV. PLAYSTATION®3 SYSTEM

Figure 4 shows a diagram of the PLAYSTATION®3
computer entertainment system and its 3.2 GHz Cell/B.E.
multiprocessor CPU.  The Cell/B.E. consists of an IBM
Power Architecture core called the PPE and seven SPEs.
(While the Cell/B.E. architecture specifies eight SPEs our
system uses Cell/B.E.s with seven functioning SPEs in
order to increase manufacturing yield.)  The processors are
connected to each other and to system memory through a
high speed Element Interconnect Bus (EIB).  This bus is
also connected to an interface (IOIF) to the GPU and
graphics memory.  This interface translates memory
accesses in both directions, allowing the PPE and SPEs
access to graphics memory and providing the GPU with
access to system memory.  This feature makes the system a
unified memory architecture since graphics memory and
system memory both are visible to all processors within a
single 64-bit address space.

The PPE is a two way in order super-scalar Power
Architecture core with a 512 KB level 2 cache.  The SPEs
are excellent stream processors with a SIMD (single
instruction, multiple data) instruction set and with 256 KB
local memory each.  SIMD instructions operate on 16-byte
registers and load from and store to the local memory. The
registers may be used as four 32-bit integers or floats, eight
halfwords, or sixteen individual bytes.  DMA (direct
memory access) operations explicitly control data transfer
among SPE local memories, the PPE level 2 cache, system
memory, and graphics memory.  DMA operations can chain
up to 2048 individual transfers in size multiples of eight
bytes.

The system runs a specialized multitasking operating
system.  The Cell/B.E. processors are programmed in C++
and C with special extensions for SIMD operations.  We
used the GNU toolchain g++, gcc and gdb.  The GPU is
programmed using the OpenGL-ES graphics API and the
Cg shader language.

The Cell/B.E. supports a rich variety of communication and
synchronization primitives and programming constructs.
Rather than describe these here we refer the interested
reader to the publicly available Cell/B.E. documentation
[7]-[9].
V. RESULTS
We implemented the CCSS algorithm as described in
section 2 using the monochromatic pixel shader described
in II.5 and II.6.a.  We implemented it in hybrid form on the
computer entertainment system using the Cell/B.E. and
GPU, and also on a standalone high end GPU for
comparison.

On the Cell/B.E. we measured performance in three stages:
fragment rendering, shadow generation, and final draw.
Times and performance measurements are shown in tables
1 through 4.

        Eye render   Light render   1-SPE   5-SPEs   Draw
time    10.11        3.29           50.47   11.65    5.6

Table 1: Performance of stages of the algorithm.  All times
are in milliseconds.  The eye and light render stages are
performed on the GPU as is the final draw.  Pixel shading
is performed on the SPEs.  We measured the time for pixel
shading using from 1 to 5 SPEs.  The results showed good
parallel speedup.  Detailed measurements of pixel shading
are given in tables 2 and 3.
A.  Cell/B.E. Software Implementation
Eye and light fragments are rendered to OpenGL-ES
Framebuffer Object texture attachments.  We used 32 bit
float RGBA textures for all data.  The textures for these
attachments may be allocated in linear, swizzled or tiled
formats in either GPU or system memory.  We
experimented with all combinations of texture format and
location in order to find the combination that gave the best
performance.

GPU performance is highest rendering to native tiled
format in GPU memory.  The performance advantage is
high enough that it is worth rendering in tiled format and
then reformatting the data to linear allocation for processing
by the Cell/B.E.  In order to minimize the latencies incurred
by the SPEs in accessing this data we reformat the data into
system memory rather than GPU memory.

The key to running any algorithm on the SPEs is to develop
a streaming formulation in which data can be moved
through the processor in blocks.  We move eye data in
scanline order and double buffer the scanline input.  While
one scanline of pixels is being processed we prefetch the
next scanline.  As each scanline is completed it is written to
the shadow texture.  We have measured the DMA waiting
for the scanline data and it was negligible.
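
The double-buffering pattern can be modelled as below. This is only a structural sketch in Python: `fetch`, `shade` and `store` are hypothetical callbacks, and on an SPE the fetch would be an asynchronous DMA transfer that genuinely overlaps with shading, whereas here it is an ordinary function call.

```python
def process_scanlines(fetch, shade, store, num_scanlines):
    """Double-buffered loop: scanline i+1 is requested before scanline i is shaded."""
    if num_scanlines == 0:
        return
    buffers = [fetch(0), None]            # buffer 0 holds the first scanline
    for i in range(num_scanlines):
        current = i % 2
        if i + 1 < num_scanlines:
            # On an SPE this would be an asynchronous DMA get; here it is a
            # plain call, so the overlap is only modelled, not real.
            buffers[1 - current] = fetch(i + 1)
        store(i, shade(buffers[current]))
```

Only two scanline buffers are ever live, which matters when the whole working set has to fit in a 256 KB local store.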

For every pixel of input we generate a series of DMA
transactions to gather the necessary light fragments.  The
source address for each transaction is a location inside the
light fragment buffer.  We compute this address by
applying a linear transform (matrix multiplication) to the
eye data (x,y,z) to obtain a light coordinate (x',y',z').

These transactions are bundled into long DMA lists.  By
having multiple DMA lists in flight concurrently we buffer
fragment data in order to minimize DMA waiting.  We
experimented with the number and size of the DMA lists in
order to minimize runtime.  We found that having four
DMA lists was optimal and that larger numbers did not
reduce the runtime.  We found similarly that fetching 128
pixels per DMA list was optimal and that longer DMA lists
did not reduce runtime.

We parallelized the computation across multiple SPEs by
distributing scanlines to processors.  This is straightforward
and provides balanced workloads.  We scheduled tasks
using an event queue abstraction provided by the operating
system that is based on one of the Cell/B.E.
synchronization primitives, the mailbox.  We measured the
cost of this abstraction at less than 100 microseconds per
frame.  When running in parallel on multiple SPEs the
individual processors completed their work within 100
microseconds of each other.

Each SPE computes a set of scanlines for the shadow
texture.  They deliver their result directly into GPU
memory in order to minimize the final render time.
B.  Measurements
We validated the correctness of the implementation by
rendering a variety of models under different conditions.
We then made detailed measurements of performance and
scaling of the tree model in figure 3.  These measurements
appear in tables 2 and 3.  All of our measurements used a
single light source.  The tree model contains over 100,000
polygons.  The performance of the shading computation is
independent of the time required to generate the fragments,
and thus is independent of the geometric complexity of the
model.

             1-SPE   2-SPEs   3-SPEs   4-SPEs   5-SPEs
Full         50.47   28.86    16.78    13.25    11.65
Hz           19      34       59       75       85
Speedup      1       1.75     3.01     3.81     4.33
Scaling      1       0.87     1.00     0.95     0.87
No waiting   41.97   21.05    14.09    10.63    8.56
Speedup      1       1.99     2.98     3.95     4.90
Scaling      1       1.00     0.99     0.99     0.98
Table 2: Parallel performance of the pixel shading
computation.  All times are in milliseconds.  Images were
rendered at HDTV 720p resolution (1280x720 pixels).  The
tree was rendered with data-dependent optimizations
disabled in order to obtain worst-case times.  The image
was rendered using the full algorithm ("full") and with the
DMA fragment gather operation disabled ("no waiting").
The computation was exactly the same in both cases, but in
the "no waiting" case the shader processed uninitialized
fragment data.  The speedup and scaling efficiency was
evaluated in all cases.  These results show that the
computation speeds up almost perfectly but that substantial
time is lost waiting for the gather operation.  Further
information about the DMA costs appears in table 3.
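
The derived rows of table 2 can be reproduced from the measured times. The sketch below assumes the usual definitions (frame rate = 1000 / time in ms, speedup = 1-SPE time / n-SPE time, scaling = speedup / n); the paper does not spell these out, but they match the published figures, and the constant and function names are chosen for this example.

```python
FULL_MS = [50.47, 28.86, 16.78, 13.25, 11.65]        # "Full" row, 1..5 SPEs
NO_WAIT_MS = [41.97, 21.05, 14.09, 10.63, 8.56]      # "No waiting" row


def scaling_table(times_ms):
    rows = []
    for n, t in enumerate(times_ms, start=1):
        speedup = times_ms[0] / t
        rows.append({
            "spes": n,
            "hz": 1000.0 / t,            # frames per second
            "speedup": speedup,
            "scaling": speedup / n,      # parallel efficiency
        })
    return rows
```

Under these definitions the 5-SPE full run comes out at a speedup of about 4.33 and roughly 85 Hz, and the 4-SPE no-waiting speedup at about 3.95.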

                 1-SPE     2-SPEs    3-SPEs     4-SPEs     5-SPEs
Wait time        8.50      7.81      2.69       2.62       3.09
% waiting        17        27        16         20         27
DMA GB/s         2.53      4.43      7.62       9.66       10.98
DMA per second   42.47 M   74.27 M   127.73 M   161.76 M   183.97 M
Table 3: DMA costs on different numbers of SPEs. All
times are in milliseconds.  The algorithm spent
considerable time waiting for the results of the DMA
fragment gather operation ("wait time").  Expressed as a
percentage of the pixel shading computation, the
monochromatic shader spent between 17 and 27 percent
waiting for fragment DMA.  This explains the deviation
from ideal scaling in table 2.  The Cell/B.E. sustained 10.98
GB/s of DMA traffic using packet sizes that were
predominantly 48 bytes in length, and over 183 mega-transactions
(M = 1024^2) per second.
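
The wait-time figures appear consistent with simply subtracting the "no waiting" times from the "full" times of table 2. The sketch below checks that reading; treating the wait time as this difference is an assumption, since the paper does not say how it was measured.

```python
FULL_MS = [50.47, 28.86, 16.78, 13.25, 11.65]        # table 2, "Full" row
NO_WAIT_MS = [41.97, 21.05, 14.09, 10.63, 8.56]      # table 2, "No waiting" row


def dma_wait(full_ms, no_wait_ms):
    """Return (wait time in ms, wait as a rounded percentage of the full time)."""
    result = []
    for full, no_wait in zip(full_ms, no_wait_ms):
        wait = full - no_wait
        result.append((round(wait, 2), round(100.0 * wait / full)))
    return result
```

Running `dma_wait(FULL_MS, NO_WAIT_MS)` gives wait times and percentages that line up with the first two rows of table 3.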

All images were rendered at HDTV 720p resolution,
1280x720 pixels.  We used lightmap resolution of
1024x1024 in our experiments and a 3x3 fragment kernel.
In order to ensure that we measured worst-case
performance we disabled optimizations that skipped
background pixels and transparent fragments.  We
measured performance on one to five SPEs.  In our tests the
other two SPEs were in use by graphics and operating
system services.
C.  Data Analysis
Tables 1 and 2 show that the shading calculation can be
sped up to meet any realistic performance requirement.
The monochromatic shader ran at 85 Hz using 5 SPEs and
at 34 Hz using 2 SPEs.  Videogames are typically rendered
at 30 or 60 frames per second.  Shading calculations should
generally run at these rates, but for shadow generation it is
possible to use lower frame rates without affecting image
quality.  It would also be possible to use shadows generated
at 720p resolution with a base image rendered at a higher
1080p resolution (1920x1080 pixels).

Table 3 analyzes the time spent waiting for DMA
transactions to complete.  This was as much as 27% of the
total time.  Note that if we were able to remove all of this
DMA waiting the performance on 5 SPEs would reach 116
frames per second as indicated by the “no waiting” data in
table 1.
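
The frame rates quoted in this section follow directly from the per-frame shading times given in the GPU comparison (11.65 ms with DMA waiting, 8.56 ms without). A minimal sketch of the conversion, with an illustrative helper name:

```python
def frames_per_second(frame_ms):
    """Convert a per-frame time in milliseconds to frames per second."""
    return 1000.0 / frame_ms


FULL_MS = 11.65        # 5 SPEs, full algorithm
NO_WAITING_MS = 8.56   # 5 SPEs, DMA fragment gather disabled

# The text quotes these rates truncated to whole frames per second.
print(int(frames_per_second(FULL_MS)))
print(int(frames_per_second(NO_WAITING_MS)))
```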

While it is difficult to observe the DMA behavior directly
we can reason about the bottlenecks in our computation.
Every DMA transaction costs the memory system at least
eight cycles of bandwidth no matter how small the
transaction. Thus 400 M transactions per second is an
upper limit of the system memory performance.  The shader
generated 183.97 M DMA transactions per second which
does not approach the limits of the memory system.  Most
of these were 48-byte gathers of light view fragments,
while the rest were block transfers of entire scanlines 20
KB in size.
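
The 400 M figure can be reproduced with simple arithmetic. The sketch below assumes a 3.2 GHz clock, which is not stated in this section, and treats M as 10^6 for the limit; Table 3 defines M as 1024^2, which shifts the utilization only slightly.

```python
CLOCK_HZ = 3.2e9              # assumption: 3.2 GHz clock, not given in this section
CYCLES_PER_TRANSACTION = 8    # "at least eight cycles" per DMA transaction
MEASURED_M_PER_S = 183.97     # Table 3, 5 SPEs

# Upper bound on transactions per second if each one costs eight cycles.
limit_m_per_s = CLOCK_HZ / CYCLES_PER_TRANSACTION / 1e6

# Fraction of that bound used by the shader (ignoring the M = 1024^2 nuance).
utilization = MEASURED_M_PER_S / limit_m_per_s

print(f"limit: {limit_m_per_s:.0f} M transactions/s")
print(f"shader uses about {utilization:.0%} of the limit")
```

Under these assumptions the shader uses a little under half of the transaction budget, which matches the statement that it does not approach the limit.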

We profiled the runtime code to measure the number of
SIMD operations that were spent in DMA address
calculations.  The results appear in table 4.  We found that
we were spending between 14% and 17% of operations
supporting the DMA gather operation.

DMA addressing   Shading      Total        DMA percentage
16,358,400       79,718,400   96,076,800   17

Table 4: Results of run-time profiling.  These figures count
the number of SIMD instructions executed per frame for
both shaders in the inner loop and DMA addressing
calculations.  They do not include the cost of scalar code
that controls the outer loop.  The number of operations is
four times the number of instructions.  The last column
shows the percentage of SIMD operations that were spent
computing addresses for the DMA gather.
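
The columns of Table 4 can be checked against one another. The sketch below recomputes the total, the DMA percentage, and the operation count under the caption's rule of four operations per SIMD instruction; the variable names are illustrative.

```python
# SIMD instructions per frame, copied from Table 4.
DMA_ADDRESSING = 16_358_400
SHADING = 79_718_400

total_instructions = DMA_ADDRESSING + SHADING
dma_percentage = 100.0 * DMA_ADDRESSING / total_instructions

# The caption states that each SIMD instruction counts as four operations.
total_operations = 4 * total_instructions

print(total_instructions)          # matches the "Total" column
print(round(dma_percentage))       # matches the "DMA percentage" column
print(total_operations)
```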

We also measured the time to execute the scalar control
logic and perform the DMA for the eye render fragments in
order to better estimate the cost of shaders with scanline
order data access.  These DMA operations are for an entire
scanline at a time, 20 KB in size.  Each frame reads
and writes each scanline once for a total of 28.125
megabytes of DMA activity using two transactions.  On one
SPE this required 2.13 ms of time yielding an effective
transfer rate of over 12.89 GB/s.  For shaders with scanline
order access, it should be possible to read as much as five
times as much scanline data without exhausting the overall
DMA bandwidth or the number of DMA transactions.
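
The scanline figures above are mutually consistent, as the following sketch shows. It assumes 16 bytes per pixel, which is inferred from a 20 KB scanline at 1280 pixels and is not stated explicitly, and it uses binary units (1 MB = 1024^2 bytes, 1 GB = 1024^3 bytes).

```python
WIDTH, HEIGHT = 1280, 720
BYTES_PER_PIXEL = 16   # assumption: inferred from 20 KB per 1280-pixel scanline
SCANLINE_DMA_MS = 2.13 # measured time on one SPE

scanline_bytes = WIDTH * BYTES_PER_PIXEL    # 20 KB
frame_bytes = scanline_bytes * HEIGHT * 2   # each scanline is read once and written once

frame_mb = frame_bytes / 1024**2
rate_gb_per_s = (frame_bytes / 1024**3) / (SCANLINE_DMA_MS / 1000.0)

print(f"{scanline_bytes / 1024:.0f} KB per scanline")
print(f"{frame_mb} MB of DMA per frame")
print(f"{rate_gb_per_s:.2f} GB/s effective transfer rate")
```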
D.  Comparison to GeForce 7800 GTX GPU
We implemented the same algorithm on a high-end, state-of-the-art
GPU, the NVIDIA GeForce 7800 GTX, running in a
Linux workstation.  This GPU has 24 fragment shader
pipelines running at 430 MHz and processes 24 fragments
in parallel.  By comparison, the 5 SPEs that we used process
20 pixels in parallel in quad-SIMD form.

The GeForce required 11.1 ms to complete the shading
operation.  In comparison the Cell/B.E. required 11.65 ms
including the DMA waiting time, and would require only
8.56 ms if the DMA waiting were eliminated.  The
performance of the Cell/B.E. with 5 SPEs was thus
comparable to one of the fastest GPUs currently available,
even though our implementation spent 27% of its time
waiting for DMA.  Results would presumably be even
better on 7 SPEs, or on fewer SPEs if we could reduce or
eliminate the DMA waiting.
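
The comparison can be restated as ratios of the three timings quoted above. This is just arithmetic on the reported numbers, with illustrative variable names:

```python
GPU_MS = 11.1            # GeForce 7800 GTX
CELL_MS = 11.65          # Cell/B.E., 5 SPEs, including DMA waiting
CELL_NO_WAIT_MS = 8.56   # Cell/B.E., 5 SPEs, DMA waiting eliminated

# How much slower the Cell/B.E. is with DMA waiting included.
slowdown_with_wait = CELL_MS / GPU_MS

# How much faster it would be than the GPU without DMA waiting.
speedup_without_wait = GPU_MS / CELL_NO_WAIT_MS

# Share of the Cell/B.E. frame time spent waiting for DMA.
wait_fraction = (CELL_MS - CELL_NO_WAIT_MS) / CELL_MS

print(f"with waiting:    {slowdown_with_wait:.2f}x the GPU time")
print(f"without waiting: {speedup_without_wait:.2f}x faster than the GPU")
print(f"time waiting:    {wait_fraction:.0%}")
```

The difference between the two Cell/B.E. timings is 3.09 ms, the same as the 5-SPE wait time in Table 3, and it rounds to the 27% quoted in the text.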
VI.  REMARKS
We have explored moving pixel shaders from the GPU to
the Cell/B.E. processor of the PLAYSTATION®3
computer entertainment system.  Our initial results are
encouraging as they show it is feasible to attain scalable
speedup and high performance even for shaders with
irregular fine-grained data access patterns.  Removing the
computation from the GPU effectively increases the frame
rate, or more likely, the geometric complexity of the models
that can be rendered in real time.

We can also conclude that the performance of the Cell/B.E.
is superior to a current state-of-the-art high-end GPU in that
we achieved comparable performance despite performance
limitations and despite using only part of the available
processing power.  Our current implementation loses
substantial performance due to DMA waiting.  This results
from the fine-grained irregular access to memory and is
specific to the type of shaders we have chosen to
implement.  We have explored shaders based on shadow
mapping [15] which require evaluating GPU fragments
generated from multiple viewpoints.  These multiple
viewpoints are related to each other by a linear viewing
transformation.  Gathering the data from these multiple
viewpoints requires fine-grained irregular memory access.
This represents worst-case behavior for any memory
system.
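
To illustrate why this gather is irregular, the sketch below maps one eye-view point into the light view and lists the byte offsets a 3x3 fragment kernel would touch. It is not the paper's implementation: the 4x4 row-major matrix layout, the [-1, 1] coordinate range, the 16-byte fragment size (inferred from 48-byte gathers of three fragments), and all function names are assumptions made for illustration.

```python
LIGHTMAP_SIZE = 1024   # lightmap resolution used in the experiments
KERNEL = 3             # 3x3 fragment kernel
FRAGMENT_BYTES = 16    # assumption: one 48-byte gather covers 3 fragments


def transform(m, p):
    """Apply a 4x4 row-major matrix to a 3D point, then divide by w."""
    x, y, z = p
    out = [m[r][0] * x + m[r][1] * y + m[r][2] * z + m[r][3] for r in range(4)]
    w = out[3]
    return (out[0] / w, out[1] / w, out[2] / w)


def gather_offsets(eye_to_light, eye_point):
    """Byte offsets of the three row gathers for one eye-view point."""
    lx, ly, _ = transform(eye_to_light, eye_point)
    # Map coordinates in [-1, 1] to texel indices.
    u = int((lx * 0.5 + 0.5) * (LIGHTMAP_SIZE - 1))
    v = int((ly * 0.5 + 0.5) * (LIGHTMAP_SIZE - 1))
    # Clamp so the 3x3 kernel stays inside the lightmap.
    u0 = min(max(u - 1, 0), LIGHTMAP_SIZE - KERNEL)
    v0 = min(max(v - 1, 0), LIGHTMAP_SIZE - KERNEL)
    return [((v0 + row) * LIGHTMAP_SIZE + u0) * FRAGMENT_BYTES
            for row in range(KERNEL)]


IDENTITY = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
print(gather_offsets(IDENTITY, (0.0, 0.0, 0.0)))
```

With a real viewing transformation, neighbouring eye-view pixels generally map to scattered lightmap positions, so consecutive gathers land at unrelated addresses; that is the fine-grained irregular access the text describes.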
REFERENCES
[1]  Timo Aila and Samuli Laine, “Alias-Free Shadow Maps,”  in Proc.
Rendering Techniques 2004: 15th  Eurographics  Workshop on
Rendering, 2004, pp. 161-166.
[2]  Maneesh Agrawala, Ravi Ramamoorthi,  Alan Heirich and Laurent
Moll, “Efficient Image-Based Methods for Rendering Soft
Shadows,” in Proc. ACM SIGGRAPH, 2000, pp. 375-384.
[3]  Louis Bavoil and Claudio T. Silva, “Real-Time Soft Shadows with
Cone Culling,” ACM SIGGRAPH Sketches and Applications, 2006.
[4]  Randima Fernando, Sebastian Fernandez, Kavita Bala and Donald P.
Greenberg, “Adaptive Shadow Maps,” in Proc. ACM SIGGRAPH,
2001, pp. 387-390.
[5]  J. R. Frisvad and R. R. Frisvad and N. J. Christensen and P. Falster,
“Scene independent real-time indirect illumination,” in Proc.
Computer Graphics International, 2005, pp. 185-190.
[6]  Jean-Marc Hasenfratz, Marc Lapierre, Nicolas Holzschuch and
Francois Sillion, “A survey of Real-Time Soft Shadows Algorithms,”
Computer Graphics Forum, vol. 22, no. 4, 2003, pp. 753-774.
[7]  IBM, Sony and Toshiba, “Cell Broadband Engine Architecture
version 1.0,” August 8, 2005.
[8]  IBM, Sony and Toshiba, “SPU  Assembly Language Specification
version 1.3,” October 20, 2005.
[9] IBM, Sony and Toshiba, “SPU C/C++ Language Extensions version
2.1,” October 20, 2005.
[10]  Henrik Wann Jensen and Per H. Christensen, “Efficient Simulation
of Light Transport in Scenes with Participating Media Using Photon
Maps,” in Proc. ACM SIGGRAPH, 1998, pp. 311-320.
[11]  Gregory S. Johnson, Juhyun Lee, Christopher A. Burns and William
R. Mark, “The irregular Z-buffer: Hardware acceleration for irregular
data structures,”  ACM Transactions on Graphics, vol. 24, no. 4,
2005, pp. 1462-1482.
[12]  James T. Kajiya, “The Rendering Equation,” in  Proc. ACM
SIGGRAPH, 1986, pp. 143-150.
[13]  Aaron Lefohn, Shubhabrata Sengupta, Joe M. Kniss, Robert Strzodka
and John D. Owens, “Dynamic Adaptive Shadow Maps on Graphics
Hardware,”  ACM SIGGRAPH Conference Abstracts and
Applications, 2005.
[14]  William T. Reeves, David H. Salesin and Robert L. Cook,
“Rendering Antialiased Shadows with Depth Maps,” in Proc. ACM
SIGGRAPH, 1987, pp. 283-291.
[15]  Lance Williams, “Casting Curved Shadows on Curved Surfaces,” in
Proc. ACM SIGGRAPH, 1978, pp.  270-274.
[16]  Andrew Woo, Pierre Poulin and  Alain Fournier, “A Survey of
Shadow Algorithms,” IEEE Computer Graphics & Applications, vol.
10, no. 6, pp. 13-32.

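Author: ryuetsuya Time: 2009-2-11 18:11

Such little kids, so easy to fool....

Author: ffcactus Time: 2009-2-11 18:18

Quote:

Originally posted by ryuetsuya at 2009-2-11 18:11
Such little kids, so easy to fool....

The human trafficker has come to watch the show too.

Author: 无敌JJ Time: 2009-2-11 18:45

Strange, why is nobody bringing up ray tracing anymore? :D

Author: thl Time: 2009-2-11 19:08

KZ2's frame rate is definitely higher than BH5's. Anyone who says it isn't probably either has no PS3 or is a Sony hater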

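Author: YOUYUCAO Time: 2009-2-11 20:52

Quote:

Originally posted by thl at 2009-2-11 19:08
KZ2's frame rate is definitely higher than BH5's. Anyone who says it isn't probably either has no PS3 or is a Sony hater

61 frames~~~~~~~~~~

Author: tobewind Time: 2009-2-11 21:28

Quote:

Originally posted by mapledot at 2009-2-11 08:41

The graphics are obviously far more realistic and impressive than a game with no light sources

Of course having light sources or not makes a big difference. The point is that KZ2 brags about the number of lights: look, 200 light bulbs!

Sony has always loved playing this kind of numbers game. It used to be xxxx polygons per second (in parentheses: theoretical value)

Author: 必杀式球喀臂 Time: 2009-2-11 21:57

Quote:

Originally posted by thl at 2009-2-11 19:08
KZ2's frame rate is definitely higher than BH5's. Anyone who says it isn't probably either has no PS3 or is a Sony hater

Isn't it just 30 frames...

Author: Johnny Time: 2009-2-11 22:06

How can you even tell there are 200 point lights?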

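Author: Peee2 Time: 2009-2-12 00:02

After an in-depth analysis of KZ2, the authoritative UK computer technology magazine 3D WORLD confirmed that the largest number of light sources in a single KZ2 scene is around 350..

http://www.gamezine.co.uk/news/g ... cene---$1267347.htm

[ Last edited by Peee2 at 2009-2-12 00:06 ]

Author: 级替四 Time: 2009-2-12 00:41

Quote:

Originally posted by Peee2 at 2009-2-12 00:02
After an in-depth analysis of KZ2, the authoritative UK computer technology magazine 3D WORLD confirmed that the largest number of light sources in a single KZ2 scene is around 350..

http://www.gamezine.co.uk/news/g ... killzone-2-scene---$1267347.ht ...

I'm fainting.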

Author: 倍舒爽 Time: 2009-2-12 04:38

Quote:

Originally posted by hudihutian at 2009-2-11 18:05
Is that so unusual?

http://research.scea.com/ps3_deferred_shading.pdf

Deferred Pixel Shading on the PLAYSTATION®3
1

Abstract— This paper studies a deferred pixel shading algorithm
impl ...

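All I noticed was: The system's unified memory architecture allows
the Cell/B.E. and GPU to exchange data through shared textures.

Isn't that the XO (Xbox 360)??

Also, does the PS3 version of that “Ghost Recon Advanced Warfighter” use this rendering method too??
Its graphics are as rotten as a pile of dung..

[ Last edited by 倍舒爽 at 2009-2-12 04:40 ]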

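Author: west2046 Time: 2009-2-12 08:02

Quote:

Originally posted by Peee2 at 2009-2-12 00:02
After an in-depth analysis of KZ2, the authoritative UK computer technology magazine 3D WORLD confirmed that the largest number of light sources in a single KZ2 scene is around 350..

http://www.gamezine.co.uk/news/g ... killzone-2-scene---$1267347.ht ...

So the opening post actually undersold it!!!!

Author: superjay Time: 2009-2-12 08:35

In that video, the GG technical director counts up to 230 light sources, and the frame rate is in the single digits

Actual gameplay will not reach anything as extreme as 200, but it is still an interesting technique

Author: zhangjingy Time: 2009-2-12 08:54

Quote:

Originally posted by Peee2 at 2009-2-12 00:02
After an in-depth analysis of KZ2, the authoritative UK computer technology magazine 3D WORLD confirmed that the largest number of light sources in a single KZ2 scene is around 350..

http://www.gamezine.co.uk/news/g ... killzone-2-scene---$1267347.ht ...

The PS3's hardware is just too powerful!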

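Author: ffcactus Time: 2009-2-12 09:01

Quote:

Originally posted by Peee2 at 2009-2-12 00:02
After an in-depth analysis of KZ2, the authoritative UK computer technology magazine 3D WORLD confirmed that the largest number of light sources in a single KZ2 scene is around 350..

http://www.gamezine.co.uk/news/g ... killzone-2-scene---$1267347.ht ...

Now those low-grade MS fans will just have to shut up.

Author: arthurking Time: 2009-2-12 11:02

So what even if someone came out and said KZ2 has 1000 light sources? The graphics are still that crappy!

Author: 大头木 Time: 2009-2-12 11:06

Why use so many light sources?

Author: ffcactus Time: 2009-2-12 11:08

Quote:

Originally posted by arthurking at 2009-2-12 11:02
So what even if someone came out and said KZ2 has 1000 light sources? The graphics are still that crappy!

Did you come to the TV forum just to humiliate yourself?

Author: 无敌JJ Time: 2009-2-12 11:18

Quote:

Originally posted by ffcactus at 2009-2-12 11:08

Did you come to the TV forum just to humiliate yourself?

He is obviously here to humiliate you. You're really dense if you can't even see that.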

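Author: jfb Time: 2009-2-12 11:28

It uses NV's low-end GPU and IBM's low-end Cell. All the bragging only shows that NV's 128-bit G70 is not bad, and that IBM's first-generation Cell, though far behind IBM's second-generation Cell in performance, is decent too.

It also proves that PS3 development is very easy: others take a long time to develop with 5 light sources, while the PS3 needs only a few years for several hundred, so its development speed is already several times that of comparable games. Poor PS3 graphics can no longer be blamed on development difficulty.

Author: dogsoldier Time: 2009-2-12 11:34

Quote:

Originally posted by Peee2 at 2009-2-12 00:02
After an in-depth analysis of KZ2, the authoritative UK computer technology magazine 3D WORLD confirmed that the largest number of light sources in a single KZ2 scene is around 350..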

http://www.gamezine.co.uk/news/game-types/shooter/the-most-light-sources-in-one-killzone-2-scene---$1267347.ht ...

Page has moved
Sorry, this page either does not exist or it has been moved, updated or deleted. Please try one of the following links:

Author: 无敌JJ Time: 2009-2-12 11:46

Quote:

Originally posted by dogsoldier at 2009-2-12 11:34

Page has moved
Sorry, this page either does not exist or it has been moved, updated or deleted. Please try one of the following links:

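Author: superjay Time: 2009-2-12 11:51

Quote:

Originally posted by ffcactus at 2009-2-12 11:08

Did you come to the TV forum just to humiliate yourself?

For the founder of so many crackpot theories, I didn't expect you to still talk so arrogantly, oozing a kind of ignorant confidence

Have you actually watched the video where GG's technical director presents KZ2's light sources? Don't tell me you can't understand English

With those so-called 200 light sources, Killzone 2's frame rate has already dropped to single digits, and here you are making a fuss over 350 light sources

Ignorance is truly frightening

Author: wetwet Time: 2009-2-12 12:16

So what if it's 200?
So what if it's 1000000?
For players, how the picture on the TV feels is what counts.
For developers, sales are what count.
They get high on seeing 200, and even higher on seeing 350.....
Can Sony fans do nothing but fantasize over these numbers?.....
They won't properly watch the videos or read the articles (maybe they aren't able to...), and just fantasize over pictures and things other people have translated.
They don't even know whether it's correct....
How pathetic....

Author: lili2k2 Time: 2009-2-12 12:23

This reminds me of 2 TFLOPS, dual 1080p, 240 frames ...............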

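Author: 无敌JJ Time: 2009-2-12 12:37

Quote:

Originally posted by superjay at 2009-2-12 11:51

For the founder of so many crackpot theories, I didn't expect you to still talk so arrogantly, oozing a kind of ignorant confidence

Have you actually watched the video where GG's technical director presents KZ2's light sources? Don't tell me you can't understand English

With those so-called 200 light sources, Killzone 2's frame rate has already dropped to single digits, and you are still ...

I wonder what kind of scene that is. Does nothing in it need to move at all?

Author: zafm0861 Time: 2009-2-12 12:40

Quote:

Originally posted by superjay at 2009-2-12 11:51

For the founder of so many crackpot theories, I didn't expect you to still talk so arrogantly, oozing a kind of ignorant confidence

Have you actually watched the video where GG's technical director presents KZ2's light sources? Don't tell me you can't understand English

With those so-called 200 light sources, Killzone 2's frame rate has already dropped to single digits, and you are still ...

You only just found out that the great ff doesn't understand English?
And he has made a fool of himself plenty of times
For example http://club.tgfc.com/viewthread.php?tid=6045947&highlight=

[ Last edited by zafm0861 at 2009-2-12 12:41 ]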

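Author: 倍舒爽 Time: 2009-2-12 12:50

Quote:

Originally posted by dogsoldier at 2009-2-12 11:34

Page has moved
Sorry, this page either does not exist or it has been moved, updated or deleted. Please try one of the following links:

The link is still there... I can see it~~
But you can't see the magazine's content, only a citation of it..

As for what these so-called 200 light sources are about?
And what about the making-of video, where showing 200 light sources drops the frame rate to single digits?
Judge for yourselves..

My view is that displaying and activating 200 light sources at once and a scene containing 200 light sources are two different concepts~
The former, as seen in the making-of video, is shown without any masking...

In the PGR3 development video there is also a segment that switches from the car's viewpoint to a bird's-eye view of the whole city~ to show how complex the city's construction is~
When it switches to the bird's-eye view the frame rate is also in single digits~~ but that doesn't mean you can't tour the whole city~

The demo is right here, and you can count them if you want~ there definitely aren't 200...

Of course, I can't rule out that the dev team had a motive to blur the concept of a scene containing 200 light sources with that of displaying 200 light sources simultaneously..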

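Author: ffcactus Time: 2009-2-12 13:01

Quote:

Originally posted by superjay at 2009-2-12 11:51

For the founder of so many crackpot theories, I didn't expect you to still talk so arrogantly, oozing a kind of ignorant confidence

Have you actually watched the video where GG's technical director presents KZ2's light sources? Don't tell me you can't understand English

With those so-called 200 light sources, Killzone 2's frame rate has already dropped to single digits, and you are still ...

Source, please, for the claim that Killzone 2's frame rate drops to single digits with the so-called 200 light sources.

Author: wetwet Time: 2009-2-12 13:07

Quote:

Originally posted by ffcactus at 2009-2-12 13:01

Source, please, for the claim that Killzone 2's frame rate drops to single digits with the so-called 200 light sources.

Go do whatever it is you should be doing...
Without someone translating, you wouldn't understand it even if I gave it to you...
Whichever graphics you find pretty and whichever game you find fun, nobody is stopping you.
As for technology, you don't even work in this field, and you don't have the aptitude....

[ Last edited by wetwet at 2009-2-12 13:10 ]

Author: ffcactus Time: 2009-2-12 13:09

Quote:

Originally posted by wetwet at 2009-2-12 13:07

Go do whatever it is you should be doing...
Without someone translating, you wouldn't understand it even if I gave it to you...
Whichever graphics you find pretty and whichever game you find fun, nobody is stopping you.
As for technology, you just don't have the aptitude....

Fine, only you MS fans who shoot your mouths off have the aptitude. Happy now?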

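Author: wetwet Time: 2009-2-12 13:13

Quote:

Originally posted by ffcactus at 2009-2-12 13:09

Fine, only you MS fans who shoot your mouths off have the aptitude. Happy now?

I wonder who shoots his mouth off more...
Hyping games, hyping tech: where did all those crackpot theories and laughable technical claims come from?

Author: ffcactus Time: 2009-2-12 13:16

Quote:

Originally posted by wetwet at 2009-2-12 13:13

I wonder who shoots his mouth off more...
Hyping games, hyping tech: where did all those crackpot theories and laughable technical claims come from?

I don't have time to talk rubbish with an idiot like you. What I want is the source of the evidence.
But I figure you lot can't come up with anything anyway; all you can do is talk rubbish.

[ Last edited by ffcactus at 2009-2-12 13:18 ]

Author: wetwet Time: 2009-2-12 13:19

Quote:

Originally posted by ffcactus at 2009-2-12 13:16

I don't have time to talk rubbish with an idiot like you. What I want is the source of the evidence.

Who's the real idiot here...
You've reposted so much so-called evidence yourself. Did you understand any of it?
If I gave it to you, would I have to translate it for you too...

Go ahead and believe it's 200000 light sources on screen at once.
Nobody will argue with you.

[ Last edited by wetwet at 2009-2-12 13:20 ]

Author: ffcactus Time: 2009-2-12 13:21

Quote:

Originally posted by wetwet at 2009-2-12 13:19

Who's the real idiot here...
You've reposted so much so-called evidence yourself. Did you understand any of it?
If I gave it to you, would I have to translate it for you too...

I'm asking for the source of this claim that the frame rate drops to single digits at 200 light sources.
If you don't want to talk rubbish, then provide it on behalf of that coward SUPER. Even just a link would do.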

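Author: 倍舒爽 Time: 2009-2-12 13:21

Quote:

Originally posted by ffcactus at 2009-2-12 13:01

Source, please, for the claim that Killzone 2's frame rate drops to single digits with the so-called 200 light sources.

Source my foot. If you're stupid, don't show it off here. Haven't you embarrassed yourself enough??
Can't you go watch the video yourself??
What? You don't even know where the video is?
Replying without reading the thread, you haven't even earned the right to flame...

Go cool off somewhere...

Author: wetwet Time: 2009-2-12 13:24

Quote:

Originally posted by ffcactus at 2009-2-12 13:21

I'm asking for the source of this claim that the frame rate drops to single digits at 200 light sources.
If you don't want to talk rubbish, then provide it on behalf of that coward SUPER. Even just a link would do.

It's that very video. Watch further in.
Nobody is going to translate it for you.
If you can't understand it, that's your own problem.

Stop embarrassing yourself...

Author: ffcactus Time: 2009-2-12 13:25

Quote:

Originally posted by 倍舒爽 at 2009-2-12 13:21

Source my foot. If you're stupid, don't show it off here. Haven't you embarrassed yourself enough??
Can't you go watch the video yourself??
What? You don't even know where the video is?
Replying without reading the thread, you haven't even earned the right to flame...

Go cool off somewhere...

Then point out where exactly in this thread, including the links given in it, it says the frame rate drops to single digits at 200 light sources.
Where is the video?

Author: 无敌JJ Time: 2009-2-12 13:26

The great FF has poor English, so he blames the lack of a source.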

Author: dogsoldier Time: 2009-2-12 13:31

By now you should know about Killzone 2's deferred rendering and its talent for multiple lights. Most games have around four light sources, so how many does Killzone 2 have?

In a previous video interview, Killzone 2 development director Arjan Brussee demonstrated a scene featuring 230 different light sources, including multiple lights on flying sentry bots.
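About Killzone's amazing deferred rendering technology and multiple light sources: most games have 4 light sources, so how many does Killzone have?
In the earlier interview, Arjan Brussee demonstrated a scene containing more than 230 light sources, including several lights on the airborne sentry bots.

Later 3DWorld Magazine revealed that 350 light sources exist in one scene........ the rest is all filler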

I thought that number was the game's pinnacle; however, in a highly recommended
3DWorld Magazine feature (available in UK shops now), it's revealed that there are an incredible 350 light sources in one Killzone 2 scene.

But what do the numerals mean when it's the end product that matters? Well, I'm playing through the game now and it really is striking how each level is lit in the game - it gives the game's aesthetic both realism and artistic polish.

When you realise that the Helghast's glowing eyes cast light on both themselves and other objects, you'll appreciate GG's attention to detail in Killzone 2's lighting engine.

Look out for a Gamezine.co.uk Killzone 2 review coming to a website near you soon. For now, find all you need to know on our Killzone 2 game page.

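Does anyone else think this game's marketing approach is seriously flawed?

Author: ffcactus Time: 2009-2-12 13:35

Quote:

Originally posted by dogsoldier at 2009-2-12 13:31
By now you should know about Killzone 2's deferred rendering and its talent for multiple lights. Most games have around four light sources, so how many does Killzone 2 have?

In a previous video in ...

The game's marketing may be a bit off, but the game itself achieves extremely high quality, and the gameplay is quite good too.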

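Author: wetwet Time: 2009-2-12 13:43

I have no idea how awesome 200 light sources in a whole scene is really supposed to be.
All I know is that if you open an Unreal 3 scene in the editor, it has a very high light density too.
And it's not very likely that 200 appear on screen at once...

Right now it's only a demo.
How good the game's quality and gameplay really are will have to wait until the final release.
Besides, only those who have played it have the right to speak...
What's the use of what the marketing says?

In the end it comes down to the reception and sales after launch.
For now it's all just hot air. Only the Sony fans are fantasizing about it...
Apparently the ones who haven't played it fantasize the hardest...

[ Last edited by wetwet at 2009-2-12 13:44 ]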

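Author: ffcactus Time: 2009-2-12 13:52

Quote:

Originally posted by 无敌JJ at 2009-2-10 22:55
I hear KZ2 doesn't actually reach 30 FPS. :D

Quote:

Originally posted by 级替四 at 2009-2-11 08:39
200 light sources? Dream on

Quote:

Originally posted by arthurking at 2009-2-11 10:50
Why are there cows flying in the sky? Because Sony fans on the ground are blowing them up there!
That said, I played the KZ2 demo and found the graphics pretty crappy. So that's all DX10.1 and 200 light sources amount to?

Quote:

Originally posted by 老江湖 at 2009-2-11 12:36
What the god machine really wants to say is: I can already emulate DX11, and I can keep on emulating forever

Quote:

Originally posted by tobewind at 2009-2-11 21:28
Of course having light sources or not makes a big difference. The point is that KZ2 brags about the number of lights: look, 200 light bulbs!
Sony has always loved playing this kind of numbers game. It used to be xxxx polygons per second (in parentheses: theoretical value)

Quote:

Originally posted by arthurking at 2009-2-12 11:02
So what even if someone came out and said KZ2 has 1000 light sources? The graphics are still that crappy!

Clearly, the most ill-mannered and ill-bred ones are these MS fans. You could very well go be their lecturer.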

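Author: 必杀式球喀臂 Time: 2009-2-12 14:00

Quote:

Originally posted by ffcactus at 2009-2-12 13:35

The game's marketing may be a bit off, but the game itself achieves extremely high quality, and the gameplay is quite good too.

Is 无机酸 filling in the blanks in his head again? If you're broke, why not hurry up and earn some money to buy a god machine to play with? Poor people are so pitiful

Author: 必杀式球喀臂 Time: 2009-2-12 14:01

Quote:

Originally posted by ffcactus at 2009-2-12 13:52

Clearly, the most ill-mannered and ill-bred ones are these MS fans. You could very well go be their lecturer.

Far better than an illiterate like you who can't even understand English

Author: ffcactus Time: 2009-2-12 14:14

Quote:

Originally posted by 必杀式球喀臂 at 2009-2-12 14:00

Is 无机酸 filling in the blanks in his head again? If you're broke, why not hurry up and earn some money to buy a god machine to play with? Poor people are so pitiful

You're the ones filling in the blanks in your heads. I wonder whether you've seen IGN's score.

Author: 必杀式球喀臂 Time: 2009-2-12 14:16

Quote:

Originally posted by ffcactus at 2009-2-12 14:14

You're the ones filling in the blanks in your heads. I wonder whether you've seen IGN's score.

Changing the subject now? What about the score? Switching the topic to scores again? So numbers are all a poor illiterate can read? Tell me, doesn't your family feel sad that you're this useless? 无机酸 is poor, uneducated, and can't earn money. How pitiful

Author: ffcactus Time: 2009-2-12 14:20

Quote:

Originally posted by 必杀式球喀臂 at 2009-2-12 14:16

Changing the subject now? What about the score? Switching the topic to scores again? So numbers are all a poor illiterate can read? Tell me, doesn't your family feel sad that you're this useless? 无机酸 is poor, uneducated, and can't earn money. How pitiful

Don't the score and the review comments say this game is outstanding?
Stop showing off how low you can go.

Author: 必杀式球喀臂 Time: 2009-2-12 14:23

Quote:

Originally posted by ffcactus at 2009-2-12 14:20

Don't the score and the review comments say this game is outstanding?
Stop showing off how low you can go.

Have you played it, 无机酸? If you haven't, aren't you just filling in the blanks in your head?? Poor people really are useless, only able to show how low they can go. Why not study more and earn money to buy a PS3? Though I suppose that's as far as your ability goes: a poor man who can't even afford a PS3 and can only imagine it in his head

Author: ffcactus Time: 2009-2-12 14:25

Quote:

Originally posted by 必杀式球喀臂 at 2009-2-12 14:23

Have you played it, 无机酸? If you haven't, aren't you just filling in the blanks in your head?? Poor people really are useless, only able to show how low they can go. Why not study more and earn money to buy a PS3? Though I suppose that's as far as your ability goes: a poor man who can't even afford a PS3 and can only imagine it in his head

I haven't played it, but IGN and the other major professional game review outlets have. I haven't played it, yet quoting the comments of experts who have is surely better than you talking rubbish here. And you call that filling in the blanks in my head. You really have no lower limit.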

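Author: ffcactus Time: 2009-2-12 14:28

http://www.youtube.com/watch?v=VEQlDZh3JQs&feature=related
This video?

I get it now. Some people probably saw the game running slowly in the video while someone was adding light sources in a DEBUG test,
and so assumed the frame rate drops to single digits when there are 200-plus light sources.

[ Last edited by ffcactus at 2009-2-12 14:30 ]

Author: zhangjingy Time: 2009-2-12 14:54

Quote:

Originally posted by ffcactus at 2009-2-12 14:28
 http://www.youtube.com/watch?v=VEQlDZh3JQs&feature=related
This video?

I get it now. Some people probably saw the game running slowly in the video while someone was adding light sources in a DEBUG test,
and so assumed the frame rate drops to single digits when there are 200-plus light sources.
: ...

Take it seriously and you've already lost.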

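Author: 倍舒爽 Time: 2009-2-12 15:20

Quote:

Originally posted by ffcactus at 2009-2-12 14:28
 http://www.youtube.com/watch?v=VEQlDZh3JQs&feature=related
This video?

I get it now. Some people probably saw the game running slowly in the video while someone was adding light sources in a DEBUG test,
and so assumed the frame rate drops to single digits when there are 200-plus light sources.
: ...

Just because of that line, “that ‘s because this is a slow version”???
Have you not thought about what that line really means?? You really think it's just the literal meaning??? You really think it's a slow version??

Forget it. You even call 级替四 an MS fan, so what is there to say.. I'm not mocking your IQ, there's no need to; you're simply far, far too stupid~

How did 级替四 express his admiration for GT5P's graphics back then??
He himself spent enormous effort and time in 3D software, over and over, building a car body model, and explained to everyone how many resources and how much rendering time it took to get that result~
And it still fell far short of GT5P's real-time rendered graphics in beauty and realism..... That trick of offering Buddha borrowed flowers was played just right.. Can you see that from the literal words?? Blockhead!!!
Now that is a Sony fan!

You and 天师, you two religious fanboys, should go home, kill a chicken, burn incense, and pray to the gods for the PS3 to trample Microsoft and topple Nintendo. As stupid as you are, why bother hanging around forums??

Really, the single word “scram” would have been enough for you..
I told you all this out of kindness. If you don't get it and won't listen, carry on being the stupid pig you are..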

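Author: 无敌JJ Time: 2009-2-12 15:25

We mustn't hurt the great FF's pride too much, or where will we get our laughs if he stops coming to the forum?

Author: 老江湖 Time: 2009-2-12 15:39

As long as FF is around, the fun is around

Author: 银色黎明 Time: 2009-2-12 15:46

Supporting the RFs~~

Author: west2046 Time: 2009-2-12 16:32

Quote:

Originally posted by 倍舒爽 at 2009-2-12 15:20

Just because of that line, “that ‘s because this is a slow version”???
Have you not thought about what that line really means?? You really think it's just the literal meaning??? You really think it's a slow version??

Forget it. You even call 级替四 an MS fan, so what is there to say.. I'm not ...

That's not nice!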

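Author: ffcactus Time: 2009-2-12 17:10

Quote:

Originally posted by 倍舒爽 at 2009-2-12 15:20

Just because of that line, “that ‘s because this is a slow version”???
Have you not thought about what that line really means?? You really think it's just the literal meaning??? You really think it's a slow version??

Forget it. You even call 级替四 an MS fan, so what is there to say.. I'm not ...

Don't tell me you're the same sort as the MS fans. What 级替四 did in the past is none of my damn business.
So what does “that ‘s because this is a slow version” mean? You tell me. I myself didn't come across that line. Worried I had missed something, I even made a point of watching that painfully laggy video, and I found nowhere that says “200 light sources drops it to a few frames”. Could it be that you're filling in the blanks in your head too?

Author: zhangjingy Time: 2009-2-12 17:19

The poster above is getting serious again. Why bother.

Author: 无敌JJ Time: 2009-2-12 17:29

I notice KZ2's marketing is pretty numbers-driven,

because no other top-tier game has said how many light sources it uses, so the moment they mention it, theirs looks like an especially large number.

What if Gears of War 2 actually used 190,

Author: wyp Time: 2009-2-12 18:26

Quote:

Originally posted by 无敌JJ at 2009-2-12 17:29
I notice KZ2's marketing is pretty numbers-driven,

because no other top-tier game has said how many light sources it uses, so the moment they mention it, theirs looks like an especially large number.

What if Gears of War 2 actually used 190,

Honestly, Gears of War 2's lighting really isn't short of this one.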

Welcome to TGFC Lifestyle (http://bbs.tgfcer.com/)