Heat death, how to kill your gpu in less than a year
(Note, I have references here to limiting fan speed to 85%. I've changed my limits to 65% as of 02/2014. It's too much trouble to change every reference here. But, I did change the config file. Ron)
I've been mining Litecoin off and on since April 2013.
I'm using gpus. I wanted to post some info about killing (or not) your
gpu fans. Basically, if the fans fail, your gpu is out of commission as
a miner. The problem is that gpus are designed to run a few hours per
day at heavy load. They are not designed for 24/7 operation with the
fans at 100%. There is a little blurb about this in the cgminer gpu
readme file. I have personal knowledge that running the fans at high
speed on a couple of particular cards will kill them in less than a
year. I've had this happen on the MSI 7850 cards with the Twin Frozr
III cooling system. RMAing these is a pain and I have to pay shipping
so I'm phasing these out of my system. I think their 7870 cards have
the same design. (Actually MSI's warranty procedure is fairly painless,
but uninstalling and packing and shipping and reinstalling is a pain.)
Not only that, I lose mining capacity in the meantime.

In my opinion, what you want to do is run the card at a heavy but reasonable
pace. I have my cards set to throttle up the fan and throttle back the gpu clock as
needed to maintain 85 deg C. I have my upper fan limit set to 85%, as
suggested in the cgminer gpu readme. The electronics of the card should
be able to run at 85 deg C for a very long time. However, the FAN WILL
NOT RUN AT EVEN 85%, much less at 100% for a very long time. The fan
is the weak link. So, unless you want to be RMAing cards until the
factory tells you to stop, or want to prematurely junk cards that are
out of warranty, you HAVE TO PROTECT THE FAN(S) FROM PREMATURE DEATH.

I should say that I have my cards in a conventional computer enclosure. I
may experiment with open rigs later, but that's not the situation I'm
in now.

I'm totally making the following numbers up. But, say
the fans were designed to run for 4 hours per day at high speed. If
you're running 24/7, which is 6 TIMES more, the card will fail 6 TIMES
sooner than its original design life. If the design life is 3 years,
that works out to a failure after about 6 months, which is about what I
have seen.
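The made-up arithmetic above can be written out as a tiny calculation. This is only a sketch of the same rough scaling (life shrinks in proportion to the extra hours per day); the function name is hypothetical and the numbers are the invented ones from the text, not measured fan data.

```python
def expected_fan_life_months(design_life_years, design_hours_per_day, actual_hours_per_day):
    """Scale a fan's design life by how much harder it is actually run."""
    # How many times more hours per day the fan runs than it was designed for.
    duty_ratio = actual_hours_per_day / design_hours_per_day
    # Convert the design life to months and shrink it by that ratio.
    return design_life_years * 12 / duty_ratio


# 3 year design life at 4 hours/day, but run 24 hours/day.
print(expected_fan_life_months(3, 4, 24))  # prints 6.0
```

Real fan wear is not this linear, but it shows why 24/7 operation eats the design life so quickly.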
The problem is the air gap between the cards. Most
cards have the fan intake on the side. When you put two cards side by
side, there is very little air gap between cards to allow air flow. If
the motherboard is vertical, as in a tower case, any card which is on
top of another will not be happy. According to the ATX spec which I
looked up, there should be 1.6" between two double width cards (circuit
cards). If the cards are almost 1.6" thick, there will be NO ROOM for
air intake for the fans. This will make the fans ramp up to maximum
velocity striving to draw in some air and maintain the temperature
target. Thus, the fans wear out. The thinner the card you can get and
the more efficient its cooling system, the longer the fans will last.
Generally, the bottom card next to the power supply will have more air
flow anyway. If you have a card that you know is weak, you can stick it
down there. That way, it can work with less fan velocity.
When I was looking around for new cards, I found one Gigabyte card that had
specs of 1.7" thickness. That one may not even fit in the motherboard
beside another card. Even if it did, it would get no air. I'm not
buying that one.
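To compare cards before buying, the air gap can be estimated from the spec sheet. This is a rough sketch that assumes the 1.6" spacing quoted above and treats the listed thickness as the whole card, which simplifies things. The function name and the 1.3" example are hypothetical; 1.5" and 1.7" are the figures mentioned in this post.

```python
ATX_SLOT_PITCH_IN = 1.6  # spacing between two double-width cards, as quoted in the post


def intake_gap_inches(card_thickness_in, slot_pitch_in=ATX_SLOT_PITCH_IN):
    """Air gap left beside a card's fans; a negative value means the card will not fit."""
    return round(slot_pitch_in - card_thickness_in, 2)


for thickness in (1.3, 1.5, 1.7):
    gap = intake_gap_inches(thickness)
    verdict = "does not fit" if gap < 0 else f"{gap:.2f} in of intake gap"
    print(f'{thickness:.1f}" card: {verdict}')
```

A card at the 1.5" limit leaves only about a tenth of an inch for intake, which is why thinner is much better.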
I'm phasing out the MSI 7850 Twin Frozr and 7870
Twin Frozr cards. They are very thick. When in adjacent slots, they
have very little space between them and they kill their fans quickly.
I have some Asus 7850 cards with their DirectCU cooling system. These
cards are noticeably thinner than the MSIs and the fans run noticeably
slower, say in the 40-50% range, to keep the card cool even when they're
adjacent. Even then, there is a visibly larger air gap between them. I
haven't owned them long enough to know when they fail. But, since the
fans are running much slower, it should be much later. If you give them
an even bigger air gap, they love it. I currently have one running
overclocked from 860 MHz original to 1000 MHz, with several inches of
air next to it, running at a cool 70 deg C and 20% fan speed (which is
the lower fan limit in my config).

In considering new cards, I would stick to
manufacturers with a 3 year warranty or more. That pretty much limits
things to Asus and MSI as far as I know. XFX has some sort of lifetime
warranty but it seems like there are lots of catches when you read the
fine print. If you want to put cards adjacent to each other in a
motherboard, don't get anything thicker than 1.5" (rounded off number).
Thinner is much better. I would never buy a card for mining without
Based on reading specs only, not experience yet, I like the new Asus and MSI R9 200 series cards.
The Asus (R9 270) cards have the DirectCU II cooling system.
The MSI (R9 270) cards have the Twin Frozr IV cooling system, a version upgrade compared to the cards I own.

I did a subjective comparison of an older MSI 7850 and an Asus 7850 by
running the fans on each one up to 100% one at a time. To me,
subjectively, the MSI card seemed louder. I did not do any
measurements. However, if you keep the fans at 30% - 50%, this will be a
Note that I'm NOT recommending that you LIMIT the fan to
50%. I'm still setting my upper fan limit to 85%, and cgminer will bump
it to 100% if the temperature exceeds the overheat number.
I AM recommending that you buy a thinner card and give it enough air space, if
possible, or overclock it less, so the fan never has to exceed 50% or so
to keep the card cool.
If you look at your card in gpuz, you can
find the default clock frequency, always a good number to keep in
mind. You can also look at the specs. If you're running Linux, and the
ATI driver, you can issue this command:
aticonfig --adapter=all --odgc
odgc stands for overdrive get clocks. It will give you a list of the
current clock settings and loads. It will also show you the adjustable
range for that card. I don't know how rigid that is, but it is at least
The Asus 7850's, for example, have a default clock
of 860 MHz. They have an adjustable range of 300 - 1050. I normally
run them at the max of that range, adjacent to each other, and they have
no problems at all. The fans on the hottest card are running about
50%. That's not as good as 30%, but it's much better than 85%.
Note also that I have my minimum gpu engine speed set to 300 MHz. YES, I'm
giving it permission to clock down that low. I want cgminer to do
whatever it takes to avoid frying the card's guts. Wearing out the fans
is one thing. Wearing out the circuit card or components is quite
another. I've seen an example of this underclocking in action. Today I
had a fan failure on an MSI 7850. I was looking at GPUz and noticed
the card clocking at 500 MHz. The fan was still at 85%, and the typical
clock speed would have been around 1000 MHz.
Well, I discovered
that one fan had failed, essentially half the cooling system. So, the
card was severely underclocking to keep the temperature below 85 deg C.
That's exactly what I wanted it to do. This is an advantage of using
the gpu-engine command in cgminer. It can overclock and underclock, if
you let it, to protect the card. It will completely shut down the card
if it exceeds the temp-cutoff number.
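To make the behavior concrete, here is a simplified sketch of this kind of temperature governor. It is NOT cgminer's actual algorithm. The function name, the step sizes and the exact order of decisions are my own assumptions for illustration; only the limits (target 84, hysteresis 3, overheat 90, cutoff 95, fan 20-65, engine 300-1050) come from the config in this post.

```python
def governor_step(temp_c, fan_pct, clock_mhz, *, target=84, hysteresis=3,
                  overheat=90, cutoff=95, fan_range=(20, 65),
                  clock_range=(300, 1050), fan_step=5, clock_step=10):
    """Return (fan_pct, clock_mhz, enabled) for the next control interval."""
    fan_min, fan_max = fan_range
    clock_min, clock_max = clock_range
    if temp_c >= cutoff:
        return 100, clock_min, False   # too hot: shut the card down
    if temp_c >= overheat:
        return 100, clock_min, True    # assumed emergency response: full fan, lowest clock
    if temp_c > target:
        if fan_pct < fan_max:
            # First try more fan, but never past the upper fan limit.
            return min(fan_pct + fan_step, fan_max), clock_mhz, True
        # Fan is already at its limit, so give up clock speed instead.
        return fan_pct, max(clock_mhz - clock_step, clock_min), True
    if temp_c < target - hysteresis:
        if clock_mhz < clock_max:
            # Cool enough: win back clock speed first.
            return fan_pct, min(clock_mhz + clock_step, clock_max), True
        # Already at full clock, so let the fan slow down.
        return max(fan_pct - fan_step, fan_min), clock_mhz, True
    return fan_pct, clock_mhz, True    # inside the hysteresis band: leave everything alone


# A card with a failed fan: hot, fan pinned at its limit, so the clock drops.
print(governor_step(88, 65, 1000))  # prints (65, 990, True)
```

Called repeatedly, a loop like this would walk the clock down step by step, which matches the severe underclocking I saw on the card with the dead fan.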
This turned out to be longer than I thought, so I don't know what I'll post later.
For those who are interested, here's my config file. By the way, for those
still running cgminer from the command line or a batch file, it's
totally worth learning the config file. It's much easier to configure
then. You don't have to write it from scratch. Get it running with a
command first in interactive mode. Then use the Settings, Write command
to write out a config file with the current settings. You can then
edit that. It's best to start with a multi gpu setup so you know how
that's structured in the file. I've had problems doing gpu engine and
fan control with multiple independent cgminer instances. I recommend
you run all gpus from one cgminer.

In this config file, the last value on each multi-parameter line belongs to
the card driving my monitor, so it's different. If you
cannot tell which card is which, drive the fans on each one to 100% one
at a time while the others are at 20% or so. Physically touch the FRAME
of the card and feel which one has vibration. DO NOT TOUCH THE FAN
WITH YOUR FINGER OR STICK ANYTHING INTO THE BLADE. Note which slot in
the config file controls that card and which instance of gpuz (for
example) monitors that card. Note which items can take multiple
parameters and which cannot. For example, most of the temperature
limits take multiple parameters, but temperature hysteresis does not.
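Since every multi-parameter line needs one value per card, a quick check can catch a line that was missed when the number of cards changes. This is a sketch only. The function name is hypothetical, and the key list is simply the multi-value lines that appear in my config, not an official list from cgminer. I believe cgminer accepts a single value for all cards on some options, so this check is stricter than cgminer itself.

```python
# Per-card lines as they appear in the config in this post (not an official list).
PER_GPU_KEYS = [
    "intensity", "vectors", "worksize", "kernel", "lookup-gap",
    "thread-concurrency", "shaders", "gpu-engine", "gpu-fan",
    "gpu-memclock", "gpu-memdiff", "gpu-powertune", "gpu-vddc",
    "temp-cutoff", "temp-overheat", "temp-target",
]


def mismatched_keys(config, gpu_count):
    """Return {key: value_count} for per-card lines that do not have gpu_count values."""
    bad = {}
    for key in PER_GPU_KEYS:
        if key in config:
            count = len(str(config[key]).split(","))
            if count != gpu_count:
                bad[key] = count
    return bad


# Example: a third card was added but the gpu-fan line was forgotten.
config = {
    "gpu-engine": "300-1050,300-1050,300-1000",
    "gpu-fan": "20-65,20-65",
    "temp-hysteresis": "3",  # single-value option, not checked
}
print(mismatched_keys(config, 3))  # prints {'gpu-fan': 2}
```

The dictionary could also come from loading the config file with Python's json module, as long as the file is valid JSON.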
Note that many of these parameters were generated by cgminer when
writing the file and I have not edited them. I have set the lines that I
edited to bold. If you add or delete cards, you will have to add or delete values on every multi-parameter line, whether you originally edited it or not.

{
"pools" : [
    {
        "url" : "stratum+tcp://ltc-eu.give-me-coins.com:3334",
        "user" : "USERNAME",
        "pass" : "PASSWORD"
    }
],
"gpu-reorder" : true,
"intensity" : "19,19,15",
"vectors" : "1,1,1",
"worksize" : "256,256,256",
"kernel" : "scrypt,scrypt,scrypt",
"lookup-gap" : "0,0,0",
"thread-concurrency" : "0,0,8192",
"shaders" : "0,0,0",
"gpu-engine" : "300-1050,300-1050,300-1000",
"gpu-fan" : "20-65,20-65,20-65",
"gpu-memclock" : "0,0,0",
"gpu-memdiff" : "0,0,0",
"gpu-powertune" : "0,0,0",
"gpu-vddc" : "0.000,0.000,0.000",
"temp-cutoff" : "95,95,95",
"temp-overheat" : "90,90,90",
"temp-target" : "84,84,84",
"api-mcast-port" : "4028",
"api-port" : "4028",
"auto-fan" : true,
"auto-gpu" : true,
"expiry" : "120",
"gpu-dyninterval" : "7",
"gpu-platform" : "0",
"gpu-threads" : "1",
"hotplug" : "5",
"log" : "1",
"no-pool-disable" : true,
"queue" : "1",
"scan-time" : "30",
"scrypt" : true,
"temp-hysteresis" : "3",
"shares" : "0",
"kernel-path" : "/usr/local/bin"
}
I hope this is helpful to others with similar issues and concerns.