Where is the progress in RAM and why should it be overclocked?

Hi GT! We all love new hardware: it's nice to work on a fast computer and not stare at progress bars and hourglasses. With processors and video cards everything is more or less clear: a new generation arrives and you get your 10-20-30-50% of extra performance. With RAM, things are not so simple.

Where is the progress in memory modules, why does the price per gigabyte barely fall, and how can you please your computer? All of that is in our hardware primer.

DDR4

The DDR4 memory standard has several advantages over DDR3: higher maximum frequencies (and hence bandwidth), lower voltage (and heat dissipation), and, of course, double the capacity per module.

The Electronic Industries Alliance's committee for engineering standardization of semiconductor products (better known as JEDEC) makes sure that your Kingston RAM works with an ASUS or Gigabyte motherboard and that everyone plays by the same rules. For electrical characteristics, physical dimensions and connectors the rules are strict (understandably: physical compatibility has to be guaranteed), but for operating frequencies, module capacities and latencies they leave some leeway: if you want to do better, go ahead, as long as users have no problems at standard settings.

This is exactly how DDR3 modules faster than 1600 MHz and DDR4 modules faster than 3200 MHz came about: they exceed the base specifications and can run both at “standard” parameters compatible with all motherboards and with extreme profiles (XMP), which are factory-tested and flashed into the module (in its SPD chip) so that the BIOS can apply them.

Progress

Major improvements in this area proceed in several directions at once. First, the memory chip manufacturers themselves (Hynix, Samsung, Micron and Toshiba) keep improving the internal architecture of their chips within the same process node. From revision to revision the internal layout is refined to ensure uniform heating and reliable operation.

Secondly, memory is slowly moving to new process nodes. Unfortunately, improvements cannot come here as quickly as they have for video card and CPU makers over the past 10 years: a drastic reduction in the size of the working elements, that is, transistors, would require a corresponding reduction in operating voltages, which are limited by the JEDEC standard and by the memory controllers built into CPUs.

So what remains is not only to shrink the process node but, in parallel, to raise the operating speed of each chip, which requires a corresponding increase in voltage. As a result, both the frequency and the capacity of a single module grow.

There are many examples of such development. In 2009-2010 the normal choice was between 2 and 4 GB per module of DDR3 at 1066 MHz or 1333 MHz (both made on a 90 nm process). Today the dying standard offers 1600, 1866, 2000 and even 2133 MHz on modules of 4, 8 and 16 GB, and inside they are already at 32, 30 and even 28 nm.

Unfortunately, such an upgrade costs a lot of money (primarily for research, equipment purchases and debugging the production process), so don't expect a radical drop in the price of 1 GB of RAM before DDR5 is released; and then production prices double yet again.

The price of improvements, overclocking and the search for balance

Growing capacity and speed directly affect another RAM parameter: latencies (also known as timings). Chips running at high frequencies still cannot violate the laws of physics, and the various operations (locating data on the chip, reading, writing, refreshing a cell) take a certain amount of time. Shrinking the process node pays off, and timings grow more slowly than operating frequencies, but you still have to keep a balance between linear read speed and response time.

For example, the memory may work on both the 2133 MHz and 2400 MHz profiles with the same timings (15-15-15-29). In this case overclocking is justified: at a higher frequency, a delay of the same number of cycles takes less absolute time, so you increase not only the linear read speed but also the response speed. But if the next step (2666 MHz) requires raising the timings by 1-2 or even 3 units, it is worth thinking twice. Let's do some simple calculations.

We divide the operating frequency by the first timing (CAS). The higher the ratio, the better:

2133/15 = 142.2

2400/15 = 160

2666/16 = 166.625

2666/17 = 156.823

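The resulting value X goes into the denominator: 1 second / (X × 1,000,000) is the delay in seconds. (Strictly speaking, DDR memory transfers data twice per clock cycle, so the absolute delay is twice that, but this does not affect the comparison.) In other words, the higher the number, the shorter the delay between the memory controller's request and the data coming back.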

As the calculations show, the biggest gain comes from going from 2133 to 2400 MHz with the same timings. Raising the CAS timing by 1 cycle for stable operation at 2666 MHz still gives an advantage (though a smaller one), and if your memory runs at the higher frequency only with timings raised by 2 units, responsiveness will even drop slightly relative to 2400 MHz.
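The same arithmetic is easy to script. Below is a minimal Python sketch, not part of the original measurements; the function names are made up for illustration. It computes the frequency-to-CAS ratio used above and, for reference, the absolute CAS latency in nanoseconds, assuming the quoted figure is the DDR effective rate (two transfers per clock).

```python
def cas_ratio(data_rate_mhz, cas):
    """Ratio used in the article: higher means a faster response."""
    return data_rate_mhz / cas


def cas_latency_ns(data_rate_mhz, cas):
    """Absolute CAS latency in nanoseconds.

    Assumption: data_rate_mhz is the DDR "effective" rate, so the real
    clock is half of it and one cycle lasts 2000 / data_rate_mhz ns.
    """
    return cas * 2000.0 / data_rate_mhz


# The four modes compared in the text: (effective rate, CAS timing)
modes = [(2133, 15), (2400, 15), (2666, 16), (2666, 17)]

for rate, cas in modes:
    print(f"{rate} MHz CL{cas}: ratio {cas_ratio(rate, cas):.3f}, "
          f"latency {cas_latency_ns(rate, cas):.2f} ns")
```

A lower latency in nanoseconds always corresponds to a higher ratio, so either number can be used to rank candidate settings.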

The reverse is also true: if the modules refuse to go any higher in frequency (that is, you have found the limit of your particular kit), you can try to win a little “free” performance by reducing the latencies.

In reality there are a few more factors, but even these simple calculations will help you avoid botching a memory overclock: there is no point in squeezing the maximum frequency out of the modules if the results end up worse than at moderate settings.

Practical use of memory overclocking

On the software side, the main winners are tasks that use memory constantly, not in streaming-read mode but by pulling random data. That is, games, Photoshop and all sorts of programming tasks.

On the hardware side, systems with graphics integrated into the processor (and lacking their own video memory) gain significantly both from lower latencies and from higher operating frequencies: a simple controller and low bandwidth often become the bottleneck of an integrated GPU. So if your favorite “Tanks” barely crawl on the built-in graphics of an old computer, you know what you can try to improve the situation.

Mainstream

Strange as it may seem, average users benefit most from such improvements. Of course, overclockers, professionals and gamers with fat wallets get their extra 0.5% of performance from extreme modules with extremely high frequencies, but their market share is small.

What is under the hood?

The white aluminum heat spreaders are fairly easy to remove. Step zero: ground yourself on a heating radiator or some other grounded metal object and let the static drain away. We don't want a silly accident to kill the memory module, do we?

Step one: warm up the memory module with a hair dryer or with an intensive read-write load (in the latter case you need to quickly shut down the PC, unplug it and remove the RAM while it is still hot).

Step two: find the side without a sticker and gently pry up the heat spreader with something in the center and along the edges. You can use the circuit board as a fulcrum for the lever, but with caution. Choose the point of support carefully and try not to press on fragile components. It is better to follow the principle “slowly but surely”.

Step three: open the heat spreader and release the latches. Here they are, the precious chips, soldered on one side. The manufacturer is Micron; the chips are marked 6XA77 D9SRJ.

8 chips of 1 GB each; factory profile: 2400 MHz @ CL16.

True, you should not remove the heat spreaders at home: break the seal and you can kiss your lifetime warranty¹ goodbye. Besides, the stock heat spreaders do their job perfectly well.

Let's try to measure the effect of overclocking using the HyperX Fury HX426C16FW2K4/32 kit as an example. Decoding the name gives us the following: HX4 means DDR4, 26 means a factory frequency of 2666 MHz, C16 means CL16 latency. Next comes the color code of the heat spreaders (white in our case) and the kit description K4/32: a set of 4 modules with a total capacity of 32 GB. So it is already clear that the RAM is slightly overclocked at the factory: instead of the standard 2400 MHz, a 2666 MHz profile is flashed with the same timings.
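The decoding described above can be sketched in a few lines of Python. This is only an illustration: the pattern is inferred from the single part number discussed in this article, the field names are invented, the block between the CAS value and the kit suffix is kept as an opaque code, and the two-digit frequency table is an assumption limited to frequencies mentioned in the text.

```python
import re

# Pattern inferred only from the one part number in the article;
# group names and the layout of the middle block are assumptions.
PART_RE = re.compile(
    r"^HX(?P<gen>\d)"         # memory generation: 4 -> DDR4
    r"(?P<freq>\d{2})"        # first two digits of the frequency: 26 -> 2666 MHz
    r"C(?P<cas>\d{2})"        # CAS latency: 16 -> CL16
    r"(?P<style>[A-Z0-9]+?)"  # series / heat spreader color code, kept opaque
    r"K(?P<count>\d)"         # number of modules in the kit
    r"/(?P<total_gb>\d+)$"    # total kit capacity in GB
)

# Hypothetical lookup table covering only frequencies named in the article.
FREQ_MHZ = {"21": 2133, "24": 2400, "26": 2666}


def decode_part_number(part):
    """Split a kit part number into the fields described in the text."""
    m = PART_RE.match(part.replace(" ", ""))
    if m is None:
        raise ValueError(f"unrecognised part number: {part!r}")
    return {
        "type": f"DDR{m['gen']}",
        "frequency_mhz": FREQ_MHZ.get(m["freq"]),  # None if not in the table
        "cas": int(m["cas"]),
        "style_code": m["style"],
        "modules": int(m["count"]),
        "total_gb": int(m["total_gb"]),
    }


info = decode_part_number("HX426C16FW2K4/32")
print(info)
print("GB per module:", info["total_gb"] // info["modules"])
```

Other part numbers may not follow this layout, so the function raises an error rather than guessing when the pattern does not match.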

Besides the aesthetic pleasure of contemplating four “Snow Whites” in your PC case, this kit offers a hefty 32 gigabytes of memory and is aimed at users of ordinary processors who do not particularly indulge in CPU overclocking. Modern Intel CPUs without the K suffix have lost every way to get free performance and gain practically nothing from memory faster than 2400 MHz.

We took two computers as test benches. One is based on the Intel Core i7-6800K and an ASUS X99 motherboard (it represents the enthusiast platform with a four-channel memory controller); the second has a Core i5-7600 inside (this one stands in for mainstream hardware with integrated graphics and no overclocking). On the first we will check the overclocking potential of the memory, and on the second we will measure real performance in games and work software.

Overclocking potential

With the standard JEDEC profiles and the factory XMP profile, the memory offers the following modes:

DDR4-2666 CL15-17-17 @ 1.2V

DDR4-2400 CL14-16-16 @ 1.2V

DDR4-2133 CL12-14-14 @ 1.2V

It is easy to see that the timings of the 2400 MHz profile make the memory less responsive than the 2133 and 2666 MHz profiles:

2133/12 = 177.75

2400/14 = 171.428

2666/15 = 177.733

Attempts to run the memory at 2900 MHz with timings raised to 16-17-18, 17-18-18, 17-19-19, and even with the voltage raised to 1.3 V, led nowhere. Without serious load the computer works, but Photoshop, the archiver or the benchmark spit out errors or crash the system with a BSOD. It seems the frequency potential of the modules is exhausted, and the only thing left for us is to reduce the latencies.

The best result achieved with the test kit of 4 modules was 2666 MHz with CL13-14-13 timings. This significantly increases the speed of access to random data (2666/13 = 205.07) and should show a good improvement in the game benchmark. In dual-channel mode memory overclocks better: experts from the oclab managed to bring a kit of two 16 GB modules to 3000 MHz @ CL14-15-15-28 with the voltage raised to 1.4 V, an excellent result.

Full-scale tests

For our i5 with integrated graphics we chose GTA V as the benchmark. The game is not young; it uses the DirectX 11 API, which is long established and well polished in Intel's drivers; it likes to eat RAM and loads the system on all fronts: GPU, CPU, RAM and disk reads. A classic. In addition, GTA V uses so-called deferred rendering, which makes frame time less dependent on scene complexity, so the test procedure is cleaner and the results clearer.

For the average FPS we take values from the normal course of the game: flying an airplane, driving through the city and eliminating enemies all have a uniform load profile. From such scenes (discarding the best and worst 1% of results from the data set) we get the average in-game FPS.
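The averaging just described can be sketched as follows. This is a minimal Python illustration assuming the FPS samples are already collected in a list; the function name is hypothetical and the sample values are invented purely to show the mechanics, they are not measured data.

```python
def trimmed_mean_fps(samples, trim_percent=1):
    """Average FPS after discarding the lowest and highest trim_percent."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Number of samples to drop at each end (integer arithmetic, rounds down)
    k = len(ordered) * trim_percent // 100
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


# Invented example: 98 steady frames plus one stutter and one spike
samples = [30.0] * 98 + [5.0, 120.0]
print(trimmed_mean_fps(samples))
```

With fewer than 100 samples and a 1% trim nothing is discarded, so the function falls back to a plain average.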

We will measure FPS drops in the same way, using scenes with explosions and complex effects (the waterfall under the bridge, sunset landscapes).

Stutters and unpleasant freezes on abrupt changes of environment (switching from one test scene to another) happen even on the monstrous GTX 1080 Ti. We will try to note them but will not count them in the results: this does not occur in the game itself and is probably a flaw of the benchmark.

Demo stand configuration

CPU: Intel Core i5-7500 (4c4t @ 3.8 GHz)

GPU: Intel HD530

RAM: 32 GB HyperX Fury White (2133 MHz CL12, 2666 MHz CL15 and 2666 MHz CL13)

MB: ASUS B250M

SSD: Kingston A400 240 GB

First, let's set the standard XMP profile: 2666 MHz with 15-17-17 timings. The built-in GTA V benchmark produces identical FPS and identical drops at minimum and medium settings at 720p: in most scenes the counter hovers around 30-32, and in heavy scenes and when switching from one location to another the FPS sags.

The reason is obvious: the GPU has enough compute power, but the rasterization units simply cannot assemble and draw more frames per second. At the “high” graphics settings the results deteriorate rapidly: the game runs straight into the modest computational capabilities of the integrated graphics.

2133 MHz CL12

The GPU has no memory of its own and has to go to system memory constantly. DDR4 bandwidth in dual-channel mode at 2133 MHz is 64 bits (8 bytes) × 2,133,000,000 transfers per second × 2 channels, about 34 GB/s, minus small (up to 10%) overhead losses.

For comparison, the memory bandwidth of the most modest discrete NVIDIA card, the GT 1030, is 48 GB/s, and that of the GTX 1050 Ti (which easily delivers 60 FPS in GTA V at maximum settings in Full HD) is already 112 GB/s.
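The peak-bandwidth estimate is simple enough to script. The Python sketch below is only an illustration (the function name is made up); it assumes a 64-bit bus per channel, as in the calculation above, and ignores the overhead losses. The two discrete-card figures are the ones quoted in the text.

```python
def peak_bandwidth_gb_s(effective_mhz, channels=2, bus_bits=64):
    """Theoretical peak memory bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_bits // 8
    transfers_per_second = effective_mhz * 1_000_000
    return bytes_per_transfer * transfers_per_second * channels / 1e9


# Bandwidth figures for the discrete cards, as quoted in the article
discrete_cards = {"entry-level discrete card": 48, "GTX 1050 Ti": 112}

for mhz in (2133, 2400, 2666):
    bw = peak_bandwidth_gb_s(mhz)
    print(f"DDR4-{mhz}, dual channel: {bw:.1f} GB/s")
    for name, card_bw in discrete_cards.items():
        print(f"  {bw / card_bw:.0%} of the {name}")
```

Passing channels=4 gives the figure for a four-channel platform with the same modules.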

In the background you can see the same waterfall under the bridge, dropping the FPS in the in-game benchmark.

The benchmark result dropped to 28 FPS on average, and the lags on location changes and explosions turned from barely noticeable dips into unpleasant microfreezes.

2666 MHz CL13

Lowering the timings has significantly reduced the time spent waiting for a response from memory, and we already have the stock results at this frequency, so we can compare three benchmark runs and get a clear picture. The bandwidth at 2666 MHz is 21.3 GB/s × 2 channels, roughly 40 GB/s after overhead, comparable to NVIDIA's entry-level card.

The maximum FPS barely grew (0.1 is not meaningful and is within measurement error): here we still run into the modest capabilities of the ROPs. But all the drops became less noticeable. In the waterfall scenes the result did not change because of the high computational load; in all the others, that is, during loading, explosions and other joys that slowed down the video core, FPS grew by 10-15% on average. Instead of 25-27 frames in eventful episodes we get a confident 28-29. Overall, the game started to feel much more comfortable.

TL;DR and results

You cannot judge the speed of RAM by frequency alone. DDR4 has fairly high latencies in clock cycles, so, other things being equal, choose memory that not only meets your hardware's needs for operating frequency and capacity but also has good timings.

The tests showed that computers based on Intel Core i-series processors with integrated graphics get a noticeable performance boost from high-speed memory with low latency. The video core has no memory of its own for storing and processing data and uses system memory instead, so it responds very well (up to a point) to higher frequencies and lower timings, since the time to draw a frame with many objects depends directly on memory access speed.

And most importantly! The Fury lineup comes in several colors: white, red and black. You can choose memory that is not only fast but also matches the rest of your components, as the HyperPC experts do.

Kirchhoff's law and a bit of school-education magic suggest that memory with black heat spreaders will run somewhat cooler than the other options. And for those who don't believe in holy Physics, there is a wonderful proof on the MEPI educational channel.

With mainstream solutions everything is clear, but in the top segment, where every PC is a small work of art, the use of HyperX memory and drives from the ordinary product lines is something of a mark of quality. Every custom project has to take many factors into account: thermal loads, the wishes of a capricious client, airflow distribution and acoustics (a powerful computer and a quiet powerful computer are tasks that differ in complexity by an order of magnitude). HyperPC constantly improve their processes and stay faithful to reliable components, hence the excellent results in their unique builds. But if you prefer a self-built machine to ready-made computers, a similar kit or individual HyperX Fury DDR4 modules can be purchased from the Ulmart chain.

That's all, but we are not saying goodbye. A cool summer calls for hot topics: subscribe to our blog so that nothing interesting passes you by.

¹ - Due to the peculiarities of Russian legislation, the "lifetime" warranty is valid for only 10 years from the date of purchase. Still, at the current pace of technology, 10 years is a long time for computer hardware, and the law may change by then.
