


[News] XBOX720 memory details revealed

The usual place has posted an update

Durango Memory System Overview
We have read multiple replies and discussions about Durango’s memory system around the internet, so we would like to share this information with all of you. In this article we describe the different types of memory that Durango has and how these memories work together with the rest of the system.
The central elements of the Durango memory system are the north bridge and the GPU memory system. The memory system supports multiple clients (for example, the CPU and the GPU), coherent and non-coherent memory access, and two types of memory (DRAM and ESRAM).
Memory clients
The following diagram shows the Durango memory clients with the maximum available bandwidth on every path.




Memory
As you can see on the right side of the diagram, the Durango console has:
  • 8 GB of DRAM.
  • 32 MB of ESRAM.
DRAM
The maximum combined read and write bandwidth to DRAM is 68 GB/s (gigabytes per second). In other words, the sum of read and write bandwidth to DRAM cannot exceed 68 GB/s. You can realistically expect that about 80 – 85% of that bandwidth will be achievable (54.4 GB/s – 57.8 GB/s).
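The achievable range quoted above is the 68 GB/s peak scaled by the 80 – 85% efficiency figures. A minimal Python sketch of that arithmetic follows; the function and variable names are illustrative only and do not come from any SDK.

```python
# Peak combined read + write DRAM bandwidth quoted in the article, in GB/s.
DRAM_PEAK_GBPS = 68.0

def achievable_range(peak_gbps, low=0.80, high=0.85):
    """Return the (low, high) realistic bandwidth for a given peak."""
    return peak_gbps * low, peak_gbps * high

low_gbps, high_gbps = achievable_range(DRAM_PEAK_GBPS)
print(f"{low_gbps:.1f} GB/s - {high_gbps:.1f} GB/s")  # 54.4 GB/s - 57.8 GB/s
```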
DRAM bandwidth is shared between the following components:
  • CPU
  • GPU
  • Display scan out
  • Move engines
  • Audio system
ESRAM
The maximum combined ESRAM read and write bandwidth is 102 GB/s. Its high bandwidth and lower latency make ESRAM a really valuable memory resource for the GPU.
ESRAM bandwidth is shared between the following components:
  • GPU
  • Move engines
  • Video encode/decode engine
System coherency
There are two types of coherency in the Durango memory system:
  • Fully hardware coherent
  • I/O coherent
The two CPU modules are fully coherent. The term fully coherent means that the CPUs do not need to explicitly flush in order for the latest copy of modified data to be available (except when using Write Combined access).
The rest of the Durango infrastructure (the GPU and I/O devices such as audio and the Kinect Sensor) is I/O coherent. The term I/O coherent means that those clients can access data in the CPU caches, but that their own caches cannot be probed.
When the CPU produces data, other system clients can choose to consume that data without any extra synchronization work from the CPU.
The total coherent bandwidth through the north bridge is limited to about 30 GB/s.
The CPU requests do not probe any other non-CPU clients, even if the clients have caches. (For example, the GPU has its own cache hierarchy, but the GPU is not probed by the CPU requests.) Therefore, I/O coherent clients must explicitly flush modified data for any latest-modified copy to become visible to the CPUs and to the other I/O coherent clients.
The GPU can perform both coherent and non-coherent memory access. Coherent read-bandwidth of the GPU is limited to 30 GB/s when there is a cache miss, and it’s limited to 10 – 15 GB/s when there is a hit. A GPU memory page attribute determines the coherency of memory access.
The CPU
The Durango console has two CPU modules, and each module has its own 2 MB L2 cache. Each module has four cores, and each of the four cores in each module also has its own 32 KB L1 cache.
When a local L2 miss occurs, the Durango console probes the adjacent L2 cache via the north bridge. Since there is no fast path between the two L2 caches, to avoid cache thrashing, it’s important that you maximize the sharing of data between cores in a module, and that you minimize the sharing between the two CPU modules.
Typical latencies for local and remote cache hits are shown in this table.
Remote L2 hit    approximately 100 cycles
Remote L1 hit    approximately 120 cycles
Local L1 hit     3 cycles for 64-bit values, 5 cycles for 128-bit values
Local L2 hit     approximately 30 cycles
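
To get a feel for why sharing across modules is costly, the table can be turned into a rough weighted-average estimate. The sketch below is only an illustration: the access mixes are hypothetical numbers chosen for the example, and the model ignores misses that go all the way to DRAM.

```python
# Approximate latencies from the table above, in CPU cycles.
LATENCY_CYCLES = {
    "local_l1": 3,     # 64-bit loads; 128-bit loads take 5 cycles
    "local_l2": 30,
    "remote_l2": 100,
    "remote_l1": 120,
}

def average_latency(hit_fractions):
    """Weighted average latency for a mix of hit types (fractions sum to 1)."""
    total = sum(hit_fractions.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("hit fractions must sum to 1")
    return sum(LATENCY_CYCLES[kind] * frac
               for kind, frac in hit_fractions.items())

# Hypothetical mixes: data kept inside one module vs. shared across modules.
same_module = {"local_l1": 0.90, "local_l2": 0.10}
cross_module = {"local_l1": 0.90, "remote_l2": 0.05, "remote_l1": 0.05}

print(f"same module:  {average_latency(same_module):.1f} cycles")
print(f"cross module: {average_latency(cross_module):.1f} cycles")
```

With these made-up mixes, moving only 10% of accesses from the local L2 to the remote caches more than doubles the average latency, which is the effect the advice above is meant to avoid.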

Each of the two CPU modules connects to the north bridge by a bus that can carry up to 20.8 GB/s in each direction.
From a program standpoint, normal x86 ordering applies to both reads and writes. Stores are strongly ordered (becoming visible in program order with no explicit memory barriers), and reads are out of order.
Keep in mind that if the CPU uses Write Combined memory writes, then a memory synchronization instruction (SFENCE) must follow to ensure that the writes are visible to the other client devices.
The GPU
The GPU can read at 170 GB/s and write at 102 GB/s through multiple combinations of its clients. Examples of GPU clients are the Color/Depth Blocks and the GPU L2 cache.
The GPU has a direct non-coherent connection to the DRAM memory controller and to ESRAM. The GPU also has a coherent read/write path to the CPU’s L2 caches and to DRAM.
For each read and write request from the GPU, the request uses one path depending on whether the accessed resource is located in “coherent” or “non-coherent” memory.
Some GPU functions share a lower-bandwidth (25.6 GB/s), bidirectional read/write path. Those GPU functions include:
  • Command buffer and vertex index fetch
  • Move engines
  • Video encoding/decoding engines
  • Front buffer scan out
As the GPU is I/O coherent, data in the GPU caches must be flushed before that data is visible to other components of the system.
The available bandwidth and requirements of other memory clients limit the total read and write bandwidth of the GPU.
This table shows an example of the maximum memory-bandwidths that the GPU can attain with different types of memory transfers.

Source memory   Destination memory   Maximum read bandwidth (GB/s)   Maximum write bandwidth (GB/s)   Maximum total bandwidth (GB/s)
ESRAM           ESRAM                51.2                            51.2                             102.4
ESRAM           DRAM                 68.2*                           68.2                             136.4
DRAM            ESRAM                68.2                            68.2*                            136.4
DRAM            DRAM                 34.1                            34.1                             68.2

* Although ESRAM has 102.4 GB/s of bandwidth available, in a transfer case, the DRAM bandwidth limits the speed of the transfer.
ESRAM-to-DRAM and DRAM-to-ESRAM scenarios are symmetrical.
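The table values are consistent with a very simple model: a copy within one memory splits that memory's peak evenly between reads and writes, while a copy between two memories runs at the speed of the slower one. The Python sketch below encodes that model. It is inferred from the table rather than taken from any documentation, and the function name is hypothetical.

```python
# Peak combined read + write bandwidth per memory, in GB/s (from the table).
PEAK_GBPS = {"ESRAM": 102.4, "DRAM": 68.2}

def copy_bandwidth(src, dst):
    """Return (read, write, total) GB/s for a GPU copy from src to dst."""
    if src == dst:
        # Reads and writes hit the same memory, so they split its peak evenly.
        rate = PEAK_GBPS[src] / 2
    else:
        # Different memories: the slower one caps both sides of the copy.
        rate = min(PEAK_GBPS[src], PEAK_GBPS[dst])
    return rate, rate, rate * 2

for src in ("ESRAM", "DRAM"):
    for dst in ("ESRAM", "DRAM"):
        read, write, total = copy_bandwidth(src, dst)
        print(f"{src:>5} -> {dst:<5} read {read:5.1f}  "
              f"write {write:5.1f}  total {total:5.1f}")
```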
Move engines
The Durango console has 25.6 GB/s of read and 25.6 GB/s of write bandwidth shared between:
  • Four move engines
  • Display scan out and write-back
  • Video encoding and decoding
The display scan out consumes a maximum of 3.9 GB/s of read bandwidth (multiply 3 display planes × 4 bytes per pixel × HDMI limit of 300 megapixels per second), and display write-back consumes a maximum of 1.1 GB/s of write bandwidth (multiply 30 bits per pixel × 300 megapixels per second).
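The two products in parentheses can be checked directly. The sketch below assumes decimal units (1 GB = 10^9 bytes). The write-back product matches the quoted 1.1 GB/s; the scan-out factors as listed multiply to 3.6 GB/s, somewhat below the quoted 3.9 GB/s maximum, and the article does not say what accounts for the difference.

```python
HDMI_PIXELS_PER_SECOND = 300e6  # HDMI limit quoted in the article

# Scan out: 3 display planes, 4 bytes per pixel.
scan_out_bytes = 3 * 4 * HDMI_PIXELS_PER_SECOND

# Write-back: 30 bits per pixel, converted to bytes.
write_back_bytes = 30 / 8 * HDMI_PIXELS_PER_SECOND

print(f"scan out:   {scan_out_bytes / 1e9:.2f} GB/s")     # 3.60 GB/s
print(f"write-back: {write_back_bytes / 1e9:.3f} GB/s")   # 1.125 GB/s
```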
You may wonder what happens when the GPU is busy copying data and a move engine is told to copy data from one type of memory to another. In this situation, the memory system of the GPU shares bandwidth fairly between source and destination clients. The maximum bandwidth can be calculated by using the peak-bandwidth diagram at the start of this article.


Durango Memory System Example

This whole-system example demonstrates what the memory bandwidth might look like when the whole system is working under a typical load (these numbers are only predictions, not measured numbers).
This example assumes what’s expected to be a typical CPU load and a maximum GPU load:
  • Three display planes are enabled at 1080p resolution.
  • Display write-back is writing a 1080p image at 60 FPS.
  • Move engines are idle.
  • Read bandwidth of the command buffer and index buffer is 4 GB/s.
  • Regular GPU rendering consumes the rest of the available bandwidth.
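
For a sense of scale, the display-related items in this list can be estimated from the per-pixel sizes given in the overview (4 bytes per pixel for scan out, 30 bits per pixel for write-back). The sketch below is a rough estimate only: the 60 FPS rate is stated for write-back, and applying the same rate to scan out is an assumption made here.

```python
WIDTH, HEIGHT = 1920, 1080   # 1080p
FPS = 60                     # stated for write-back; assumed for scan out

pixels_per_second = WIDTH * HEIGHT * FPS

# Three display planes at 4 bytes per pixel.
scan_out_gbps = 3 * 4 * pixels_per_second / 1e9

# Display write-back at 30 bits per pixel.
write_back_gbps = 30 / 8 * pixels_per_second / 1e9

print(f"scan out:   {scan_out_gbps:.2f} GB/s")    # 1.49 GB/s
print(f"write-back: {write_back_gbps:.2f} GB/s")  # 0.47 GB/s
```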




This diagram shows our prediction of the typical bandwidth for the north bridge clients and the typical available bandwidth for the GPU clients (which are shown in blue).
Let’s start by describing the CPU. Although each CPU module can request up to 20.8 GB/s of bandwidth for read and for write, the typical bandwidth you should expect for the CPU is 4 GB/s per CPU module per direction, or about 16 GB/s altogether.
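The total quoted for the CPU is just the per-link figure multiplied across two modules and two directions. A small Python sketch of that sum, with the peak computed the same way for comparison (the constant names are illustrative):

```python
MODULES = 2
DIRECTIONS = 2               # read and write
PEAK_PER_LINK_GBPS = 20.8    # per module, per direction
TYPICAL_PER_LINK_GBPS = 4.0  # per module, per direction

peak_total = MODULES * DIRECTIONS * PEAK_PER_LINK_GBPS
typical_total = MODULES * DIRECTIONS * TYPICAL_PER_LINK_GBPS

print(f"peak {peak_total:.1f} GB/s, typical {typical_total:.1f} GB/s "
      f"({typical_total / peak_total:.0%} of peak)")
```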
You can expect typical bandwidth to be around 3 GB/s per direction for the audio, HDD, camera, and USB clients.
The Kinect Sensor is the main consumer of the bandwidth. For example, peak bandwidth to and from the HDD is only about 50 MB/s, so the HDD cannot be seen as a major bandwidth consumer.
Because the GPU is usually pushed to the maximum, you can expect typical coherent bandwidth to be about 25 GB/s. However, this amount depends on how many resources are made snoopable.
Currently, we are not able to tell exactly how much of that access will hit the CPU’s caches and how much will go to DRAM. So as we said above, this figure is highly speculative at the moment.
The estimated 25 GB/s of bandwidth for coherent memory access does not account for the non-coherent memory access of the GPU.
The coherent bandwidth that can flow through the north bridge is limited to 30 GB/s. Under typical conditions, this limit shouldn’t cause you problems. But during a high load on the coherent memory traffic, the north bridge might become saturated. Once the north bridge becomes saturated, you may notice increased latencies for memory access.
CPU memory access that is Write Combined does not fall under this limitation nor does GPU memory access that is non-coherent.
Finally let’s compute how much bandwidth is left for the non-coherent GPU access to consume. Let’s assume that:
  • The sum of bandwidth from the north bridge to DRAM is 25 GB/s.
  • Some portion of the GPU coherent bandwidth misses the L2 caches.
  • Non-coherent CPU bandwidth is 3 GB/s.
This leaves 42 GB/s of DRAM bandwidth available to the GPU clients.

[ This post was last edited by DVDRiP on 2013-3-14 01:10 ]

So low performance is confirmed?!



Quote:
Originally posted by KoeiSangokushi on 2013-3-13 23:07
So low performance is confirmed?!
They might as well just call this xbox the kinectbox


I can't follow this. Could the tech experts explain it?

Quote:
Originally posted by DVDRiP on 2013-3-13 23:12

They might as well just call this xbox the kinectbox
XBOX SURFACE, more like
The DATA MOVE ENGINE does almost nothing; Microsoft has taken a huge loss

This is incredibly weak. Is it going to be delayed?

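Quote:
Originally posted by 首斩破沙罗 on 2013-3-13 23:13
I can't follow this. Could the tech experts explain it?
Main memory bandwidth is only 38.6% of the PS4's
The DATA MOVE ENGINE that Microsoft put so much painstaking effort into has almost no effect at all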

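Quote:
Originally posted by akilla on 2013-3-13 23:15
This is incredibly weak. Is it going to be delayed?
Delay this piece of junk? They could launch it right after E3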

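Quote:
Originally posted by DVDRiP on 2013-3-13 23:20

Delay this piece of junk? They could launch it right after E3
If they dare to sell it for 250 US dollars as a seamless successor to the XBOX360, it still has a chance :D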

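Quote:
Originally posted by KoeiSangokushi on 2013-3-13 23:13

XBOX SURFACE, more like
The DATA MOVE ENGINE does almost nothing; Microsoft has taken a huge loss
That's because the DMEs are idle in this scenario. With everything fully loaded, it's this diagram:
[Attached diagram not available]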

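posted by wap, platform: Huawei (C8950D)

Low bandwidth, low RPM, power draw under 100 W, and a case that should be about the same size as the wiiu's: this lays the groundwork for taking over the living room.

This post was last edited by 大头木 on 2013-3-13 23:25 via the mobile version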

Quote:
Originally posted by 讴歌123 on 2013-3-13 23:23

That's because the DMEs are idle in this scenario. With everything fully loaded, it's this diagram:
515362
The OP is so mean

Heh, stealing Nintendo's lunch..

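Quote:
Originally posted by 讴歌123 on 2013-3-13 23:23

That's because the DMEs are idle in this scenario. With everything fully loaded, it's this diagram:
515362
By the way, why are there 4 MOVE ENGINEs?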

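Judging from the diagram in post #10, Microsoft's ESRAM and DATA MOVE ENGINE are really impressive
I wouldn't be surprised if game developers later complain that the PS4's memory latency is too high and holds it back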
