Hashboard Failure Patterns: What Miner Logs Are Really Telling You

A beginner-safe diagnostic guide to Antminer and Whatsminer hashboard symptoms, miner logs, missing chips, EEPROM faults, thermal errors, voltage problems, and repair-shop escalation.

A miner log is not just error noise.

It is the machine telling you where the failure pattern probably starts: power, cooling, control board communication, hashboard detection, chip chain, EEPROM, or temperature sensing.

The mistake is reading every low-hashrate event as “bad hashboard.” Sometimes the board is bad. Sometimes the board is only the messenger.

The direct answer

Hashboard failures usually show up as missing chains, partial chip counts, zero ASIC detection, abnormal temperatures, EEPROM errors, bad-chip messages, nonce/pattern errors, or voltage-related instability.

But beginner-safe diagnosis should start before chip-level repair:

Save the log.
Power down safely.
Inspect fans, airflow, dust, cables, connectors, and PSU symptoms.
Reseat known-safe cables only after unplugging.
Compare whether the failure follows a board, cable, slot, PSU, or control board path.
Escalate chip-level, voltage-domain, BGA, or fixture work to trained repair.

Do not start with a heat gun. Start with evidence.

Antminer vs Whatsminer logs

Antminer and Whatsminer failures can look similar from the outside, but the diagnostic language is different.

Bitmain / Antminer

Kernel-log pattern reading

Antminers usually point you toward strings like chain[x] only has y chips, No hashboard found, EEPROM read failed, over-temp messages, fan errors, or chip/nonce errors.

MicroBT / Whatsminer

Error-code pattern reading

Whatsminers use structured numeric codes surfaced through the web interface, WhatsMinerTool, and the API. Slot references like SM0, SM1, and SM2 help point to the affected hashboard position.

The point is the same either way: read the pattern before replacing parts.

The official-support pattern is boring for a reason

Bitmain’s official ANTMINER troubleshooting material follows the same first-pass logic repair shops teach: check the log, identify the affected chain or section, power down, replug/reseat cables, then escalate if the issue persists.

That sounds basic because it is basic. It is also where a lot of bad repairs are avoided.

For hashboard signal abnormalities, incomplete chip detection, PIC/temp-sensor/EEPROM issues, and abnormal temperature faults, the safe first moves are not chip replacement. They are:

capture the exact kernel log
power cycle cleanly if the error looks transient
reseat ribbon/data/power cables with the miner unplugged
verify fan and PSU behavior
use official firmware or SD recovery only when firmware/corruption is a realistic suspect
stop if the failure repeats after basic checks

If the same board keeps losing chips, sensors, EEPROM identity, or voltage stability, it becomes repair-bench work.

The seven hashboard failure buckets

Most field symptoms fit into one of these buckets.

1. Missing chips or partial chain detection

Common log shapes:

chain[2] only has 54 chips
chain[1] find 0 chips
ASIC NG=(x)
found 27 chips

What it usually means:

one board is not detecting all chips
a signal chain is breaking partway through the board
a voltage domain is weak or missing
a bad chip is stopping communication downstream
a cable/connector problem is making the board look dead

Operator translation: the miner may still run, but hashrate usually drops in proportion to the missing board or missing chips.
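To make the "in proportion" idea concrete, here is a minimal sketch that pulls chip counts out of log text and estimates the share of chips missing. The regular expression only covers the two log shapes quoted above, and the figure of 76 chips per board is a hypothetical placeholder; use the real count for your exact model.

```python
import re

# Modeled on the two log shapes quoted above; real firmware wording varies.
CHIP_LINE = re.compile(r"chain\[(\d+)\] (?:only has|find) (\d+) chips")

def chip_counts(log_text):
    """Return {chain_index: detected_chip_count} for every matching log line."""
    counts = {}
    for match in CHIP_LINE.finditer(log_text):
        counts[int(match.group(1))] = int(match.group(2))
    return counts

def missing_fraction(counts, expected_per_board, board_count):
    """Rough share of chips missing across the whole miner (0.0 to 1.0).

    Chains that never appear in the log are assumed healthy here.
    """
    total_expected = expected_per_board * board_count
    missing = sum(max(expected_per_board - found, 0) for found in counts.values())
    return missing / total_expected

sample = "chain[2] only has 54 chips\nchain[1] find 0 chips\n"
counts = chip_counts(sample)
print(counts)  # {2: 54, 1: 0}
# 76 chips per board is a hypothetical value, not a spec for any model.
print(round(missing_fraction(counts, expected_per_board=76, board_count=3), 3))  # 0.43
```

The result is only a first-pass estimate of lost hashrate. It says nothing about why the chips are missing.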

Beginner-safe first moves:

save the log before rebooting repeatedly
power off and unplug
inspect ribbon cables and power connectors
check for burnt connectors, dust mats, corrosion, and loose plugs
compare whether the same chain fails after reseating

Repair-shop escalation:

test fixture diagnosis
voltage-domain checks
signal tracing
chip-level isolation
BGA/chip replacement

2. Zero ASIC / no hashboard found

Common log shapes:

No hashboard found
chain[0] find 0 chips
chain[1] find 0 chips
chain[2] find 0 chips

This can be worse than a partial-chain error because the control board may not be communicating with one board, or with any of them.

Likely causes:

disconnected or damaged data cable
control board issue
board power not present
EEPROM or PIC-related identification fault on certain models
board-side signal problem
PSU/power delivery issue

Beginner-safe first moves:

confirm the control board boots
check whether all boards are missing or only one board is missing
inspect cables and ports
look for power connector damage
avoid hot-plugging boards or cables

Important: if all boards vanish at once, do not assume all boards died together. Think PSU, control board, firmware, or common cabling first.
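The "all boards versus one board" split can be written down as a small decision helper. This is a sketch under stated assumptions: the input is a hypothetical mapping of chain index to detected chip count that you have already read off the log, and `board_count` is something you supply, not something the miner reports.

```python
def zero_chip_scope(counts, board_count):
    """Classify which boards report zero chips.

    counts maps chain index -> detected chips (hypothetical input format).
    board_count is how many boards the miner should have.
    """
    dead = sorted(chain for chain, found in counts.items() if found == 0)
    if not dead:
        return "no zero-chip chains"
    if len(dead) == board_count:
        return "all boards missing: check PSU, control board, firmware, common cabling first"
    return f"chain(s) {dead} missing: check that board's cable, connector, and power path"

print(zero_chip_scope({0: 0, 1: 0, 2: 0}, board_count=3))
print(zero_chip_scope({0: 76, 1: 0, 2: 76}, board_count=3))
```

The returned text is a pointer for where to look first, not a verdict on any part.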

3. Temperature faults that point at the board

Common messages:

ERROR_TEMP_TOO_HIGH
chip temp exceeds threshold
temp sensor read failed
127C
-40C
fan lost

A real over-temp event can damage chips, solder joints, thermal pads, and connectors. But an impossible reading like 127°C or -40°C may indicate a sensor/read problem instead of actual heat.
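A short sketch of that believability check follows. It treats the two values quoted above as sensor suspects and everything else as a real reading. The `limit_c` threshold is model-specific and the 95 used in the example is a hypothetical number, not a manufacturer limit.

```python
# Values quoted above as impossible readings; treated as sensor/read-path suspects.
IMPOSSIBLE_READINGS = {127, -40}

def classify_temp(reading_c, limit_c):
    """Sort one temperature reading into a first-pass bucket.

    limit_c is the model's over-temperature threshold. It must come from the
    manufacturer's documentation, not from this sketch.
    """
    if reading_c in IMPOSSIBLE_READINGS:
        return "implausible: suspect sensor or read path"
    if reading_c >= limit_c:
        return "over-temp: check fans, airflow, dust, ambient"
    return "within limit"

# limit_c=95 is a hypothetical example threshold.
for reading in (127, -40, 98, 71):
    print(reading, "->", classify_temp(reading, limit_c=95))
```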

Likely causes:

blocked intake or exhaust
failed fan or wrong fan type
dust-packed heatsinks
missing or degraded thermal material
failed temperature sensor path
firmware/sensor read issue
board damage from prior overheating

Beginner-safe first moves:

check fan RPM and fan direction
verify all fans are detected
inspect airflow path
clean dust safely
confirm ambient temperature
do not keep rebooting into thermal shutdown

If a board overheats immediately while others are normal, the board needs deeper inspection.

4. EEPROM / board identity faults

Common messages:

EEPROM read failed
EEPROM data error
EEPROM corrupted
parser EEPROM error
SM0 detect EEPROM error

EEPROM-type errors can make a board appear missing or mismatched because the miner cannot read what the board is, what model it belongs to, or how it should be initialized.

Likely causes:

corrupted EEPROM data
poor connector/contact
firmware issue
adapter board/cable issue on Whatsminer
bad EEPROM component or board-side damage

Beginner-safe first moves:

do not randomly flash firmware from sketchy sources
verify correct firmware for the exact model
inspect connectors and cables
check whether the error follows the board or slot

EEPROM repair/programming is not beginner work unless you already have the proper tools and known-good files.

5. Voltage-domain and power instability

Voltage faults are where beginners get into trouble.

Symptoms can include:

repeated set_voltage messages
power voltage errors
high hardware errors
chip detection changing between boots
one board unstable while others are normal
all boards sagging together

The key split:

One board or one domain looks wrong → likely board-side domain/regulator/chip issue.
All boards look weak together → think PSU, input voltage, power cable, or facility power quality first.

ZeusBTC’s S19 repair guide describes domain voltage, boost-circuit, LDO, and chip signal checks as repair-bench work using proper tools. That is useful technician context, but it is not a beginner checklist.

Beginner-safe first moves:

check input voltage and breaker/PDU health
inspect PSU and power cables
compare whether all chains fail together or one board fails alone
avoid probing live high-current boards without training

Repair-shop escalation:

multimeter checks under controlled conditions
oscilloscope/signal work
test fixture
voltage-domain isolation
regulator/chip replacement

6. Pattern/nonce/chip quality errors

Some boards detect chips but still fail pattern tests or return bad work.

Common shapes:

Pattern NG
nonce error
high HW errors
chip X disabled
B_AXPCS

This can indicate a degraded chip, signal-quality issue, unstable voltage domain, bad thermal contact, or firmware/tuning pushing weak silicon too hard.

Beginner-safe first moves:

return to stock or conservative firmware settings if overclocked
verify cooling and ambient temperature
check whether errors increase under heat
compare the board against known-good operating conditions

Do not jump from nonce errors straight to chip replacement. Confirm power, cooling, and tuning first.

7. Mimics: PSU, fans, sensors, and control boards

Some of the most expensive wrong repairs start here.

A PSU problem can make every board look unstable. A fan fault can trigger thermal shutdown that looks like board failure. A control board or cable issue can make a good hashboard appear missing. A bad temperature reading can trigger a protective shutdown before the miner ever hashes.

Before calling a hashboard dead, ask:

Did all boards fail at once?
Did the fault follow a board, cable, slot, or PSU?
Are fans reporting correctly?
Is input voltage stable?
Are temperatures believable?
Did the issue start after firmware changes?

This is how you avoid replacing the wrong thing.

Whatsminer codes worth recognizing

Whatsminer’s error system is more structured than Antminer logs. Your exact code list depends on model and firmware, but common patterns include:

1xx — fan errors
2xx — power supply / power protection errors
30x / 32x / 35x / 360 — temperature sensor or temperature protection issues
41x / 42x / 45x — EEPROM detection/parser/transfer errors
53x — SMx hashboard not found
54x — chip ID reading errors
55x — bad chip indicators
56x / 57x / 58x — balance, transfer, or reset-type hashboard issues

A code like 531 points you toward SM1, the middle hashboard slot in many Whatsminer references. That does not automatically mean the board is dead. It means that slot/board path needs diagnosis.
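The code families listed above can be turned into a small lookup for quick triage. This sketch transcribes only the buckets named in this article. It is a reading aid, not a vendor reference, and the slot mapping is applied only to the 53x family because that is the only one the article ties to SM0/SM1/SM2.

```python
# Buckets transcribed from the list above. Exact codes depend on model and
# firmware, so unknown codes fall through to a "check the reference" result.
TEMP = "temperature sensor or temperature protection"
EEPROM = "EEPROM detection/parser/transfer error"
HASHBOARD_MISC = "balance, transfer, or reset-type hashboard issue"

TENS_BUCKETS = {
    30: TEMP, 32: TEMP, 35: TEMP,
    41: EEPROM, 42: EEPROM, 45: EEPROM,
    53: "hashboard not found",
    54: "chip ID reading error",
    55: "bad chip indicator",
    56: HASHBOARD_MISC, 57: HASHBOARD_MISC, 58: HASHBOARD_MISC,
}

def describe_code(code):
    """Return (bucket, slot); slot is None unless the code is 530, 531, or 532."""
    if 100 <= code <= 199:
        return ("fan error", None)
    if 200 <= code <= 299:
        return ("power supply / power protection error", None)
    if code == 360:
        return (TEMP, None)
    bucket = TENS_BUCKETS.get(code // 10)
    if bucket is None:
        return ("not in this list: check the reference for your model and firmware", None)
    slot = None
    if code // 10 == 53 and code % 10 in (0, 1, 2):
        slot = f"SM{code % 10}"
    return (bucket, slot)

print(describe_code(531))  # ('hashboard not found', 'SM1')
```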

Whatsminer field cheat sheet

530–532
SM0/SM1/SM2 hashboard not found. Start with cable/adapter contact before calling the board dead.

350–360
Temperature protection. Check ambient temperature, airflow, fans, and sensor communication.

410–422
EEPROM detection or parser errors. Check contacts and model/firmware match before deeper repair.

PSU/fan symptoms
Unstable voltage or stuck fans can restart a miner or lower hashrate in ways that mimic board failure.

The beginner-safe diagnostic flow

Use this order before any board-level repair.

Step 1: Save the evidence

Screenshot the dashboard. Copy the kernel log or error-code page. Note uptime, ambient temperature, pool status, fan speeds, and which board/slot is affected.
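If you prefer a structured note over a loose screenshot folder, the items above fit in one small record. This is a minimal sketch; the field names and the `Evidence` class are hypothetical, chosen only to mirror the list in this step.

```python
import json
from dataclasses import dataclass, asdict, field
from datetime import datetime, timezone

@dataclass
class Evidence:
    """One snapshot of what the miner showed before anything was touched."""
    miner_label: str
    affected_slot: str
    uptime_s: int
    ambient_c: float
    pool_status: str
    fan_rpm: list
    log_excerpt: str
    # Timestamp is filled in automatically (UTC, ISO 8601).
    captured_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

def to_json(evidence):
    """Serialize the snapshot so it can be saved or sent to a repair shop."""
    return json.dumps(asdict(evidence), indent=2)

note = Evidence(
    miner_label="rack3-unit7",          # hypothetical label
    affected_slot="chain[2]",
    uptime_s=5400,
    ambient_c=31.5,
    pool_status="connected",
    fan_rpm=[5880, 5910, 0, 5900],
    log_excerpt="chain[2] only has 54 chips",
)
print(to_json(note))
```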

Step 2: Separate network from hardware

Pool errors and network failures can show zero accepted work while the miner itself is fine. If the miner is hashing locally but not submitting shares, that is not a hashboard failure.

Step 3: Power cycle once, not forever

A clean shutdown and restart can clear transient read errors. Reboot loops are not diagnosis. If the same chain, EEPROM, fan, or temperature error returns, stop treating it like a glitch.
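One way to hold yourself to "once, not forever" is to compare what you saw before and after a single clean power cycle. The sketch below assumes you record each error as a hypothetical `(kind, location)` pair rather than a raw log line, since chip counts and wording can shift between boots.

```python
def recurring_errors(before_reboot, after_reboot):
    """Return errors seen both before and after one clean power cycle.

    Each error is a (kind, location) tuple you write down yourself,
    e.g. ("partial_chain", "chain[2]"). The labels are not firmware output.
    """
    return sorted(set(before_reboot) & set(after_reboot))

before = [("partial_chain", "chain[2]"), ("temp_read_failed", "chain[0]")]
after = [("partial_chain", "chain[2]")]

repeat = recurring_errors(before, after)
print(repeat)  # [('partial_chain', 'chain[2]')]
if repeat:
    print("Same fault returned: stop rebooting and move on to the next checks.")
```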

Step 4: Check cooling

Fans, airflow, dust, intake restrictions, exhaust recirculation, and ambient temperature can all create board symptoms. As a working habit, keep chip temperatures within the model’s safe operating range and do not keep running into thermal protection.

Step 5: Check power

One unstable PSU or bad input circuit can create multiple fake board failures. If every chain gets weird together, think common power before individual boards. In Whatsminer troubleshooting references, unstable input voltage and PSU fan/power faults can show up as restarts, low hashrate, or hashboard-looking failures.

Step 6: Check cables and connectors

Power off and unplug first. Then inspect and reseat only what is safe: data cables, fan plugs, power connectors, and obvious loose connections. Look for heat discoloration or melted plastic. If contacts are dirty or corroded, clean them carefully with an appropriate electronics-safe process and let everything dry fully before applying power.

Step 7: Use official firmware or SD recovery only when it fits

Firmware or SD-card recovery can make sense for EEPROM/corruption symptoms, control-board software issues, or after a failed update. Use official firmware for the exact model and variant. Do not random-flash firmware as a generic repair move.

Step 8: See if the fault follows the part

If you are trained and the model allows safe controlled swapping, check whether the fault follows a board, slot, cable, or control-board port. If you are not trained, stop here and document the pattern for repair.
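The logic of a controlled swap is simple enough to write out. This sketch assumes a two-slot swap with a known-good board and takes the slots that show the fault afterwards as input. The function name and slot labels are hypothetical.

```python
def swap_verdict(original_slot, swapped_with_slot, faulty_slots_after):
    """Interpret a controlled swap of the boards in two slots.

    original_slot: slot that showed the fault before the swap.
    swapped_with_slot: slot whose known-good board was exchanged with it.
    faulty_slots_after: slots showing the fault after the swap.
    """
    after = set(faulty_slots_after)
    if not after:
        return "fault cleared: suspect connector contact; keep monitoring"
    if after == {swapped_with_slot}:
        return "fault followed the board: document it for bench repair"
    if after == {original_slot}:
        return "fault stayed with the slot: suspect cable, port, or control board"
    return "inconclusive: more than one path is affected"

# Board from slot 0 moved to slot 1; the fault now appears in slot 1.
print(swap_verdict(0, 1, {1}))
```

Whatever the verdict, write it down with the log excerpt. A repair shop can use "fault followed the board from slot 0 to slot 1" far more easily than "board seems bad."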

Step 9: Escalate to bench diagnosis

Chip-level diagnosis needs the right tools: test fixture, known-good PSU/cables, multimeter, oscilloscope, ESD process, thermal process, and model-specific repair knowledge. For Whatsminer hashboard localization, shops may use fixture-style commands and signal checks to isolate RST/CLK/ASIC paths; that is not beginner work.

When to stop and send it out

Stop before you turn a repairable board into scrap.

Send it to a trained repair bench when:

one board repeatedly finds partial chips
EEPROM errors survive cable/firmware sanity checks
voltage-domain faults are suspected
connectors are burned
a board overheats immediately
chip loss is progressive across boots or days
multiple chips are disabled or failing pattern tests
the board needs BGA/chip replacement
you do not have a fixture or ESD-safe setup

There is no shame in stopping. Good diagnosis includes knowing where the safe line is.
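The "progressive chip loss" trigger in the list above is easy to miss when you only look at the latest boot. A minimal sketch, assuming you keep a hand-recorded history of detected chip counts for one chain, oldest first:

```python
def is_progressive_loss(chip_history):
    """True when the count never rises and ends lower than it started.

    chip_history: detected chip counts for one chain, oldest first.
    """
    if len(chip_history) < 2:
        return False
    pairs = zip(chip_history, chip_history[1:])
    never_rises = all(later <= earlier for earlier, later in pairs)
    return never_rises and chip_history[-1] < chip_history[0]

# Example counts are illustrative, not taken from a real miner.
print(is_progressive_loss([76, 76, 74, 70]))  # True
print(is_progressive_loss([76, 76, 76]))      # False
```

A count that bounces up and down between boots returns False here. That pattern is still a fault, but it points more toward an intermittent contact or unstable power than toward steady degradation.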

The Orange Signal rule of thumb

Read miner logs like an operator, not a gambler.

If all boards fail together, suspect common power/control issues first.
If one board fails consistently, isolate board/cable/slot before assuming chip failure.
If temperature readings are impossible, think sensor/read path.
If fans are wrong, fix cooling before diagnosing chips.
If EEPROM fails, do not random-flash your way into a worse problem.
If voltage domains are involved, it is bench work.
If chip replacement is involved, it is not beginner work.
Monitor hardware errors and temperature behavior daily on production miners.
Clean dust on a schedule; gentle compressed air and ESD discipline beat emergency rework.
Do not run a miner with progressive chip loss just to “see if it holds.” That can turn one fault into cascading damage.

The best repair techs are not the ones who touch the hot air station first. They are the ones who know when not to.

FAQ

Does “chain only has Y chips” always mean a bad chip?

No. It often points toward a chip-chain problem, but the cause can be a signal issue, voltage-domain issue, cable/connector fault, or board-side component problem. The log tells you where to look, not what to replace.

Does “no hashboard found” mean all hashboards are dead?

Usually no. If multiple boards disappear at once, check common causes first: control board, firmware, PSU, power delivery, and data cabling.

Are Whatsminer error codes easier than Antminer logs?

They are more structured. Whatsminers use numeric codes and SM0/SM1/SM2 slot references, while Antminers rely more heavily on kernel-log strings. Both still require pattern-based diagnosis.

Can beginners reseat cables and clean dust?

Yes, if the miner is powered off, unplugged, cooled down, and handled with basic ESD care. Beginners should not probe live boards or attempt chip-level rework without training.

What is the most common misdiagnosis?

Calling a board dead when the real issue is PSU instability, fan failure, temperature sensing, cabling, or control-board communication.

Sources and live references

ZeusBTC — Antminer S19 Hash Board Repair Guide: https://www.zeusbtc.com/manuals/Antminer-S19-Hash-Board-Repair-Guide.asp
ZeusBTC — Antminer S19j Pro Aluminum Hashboard Repair Guide: https://www.zeusbtc.com/articles/information/4676-antminer-s19j-pro-aluminum-hashboard-repair-guide
D-Central — Antminer S21 Maintenance & Repair Guide: https://d-central.tech/manuals/antminer-s21-maintenance-repair-guide/
D-Central — Antminer S21 Repair Guide: Common Issues and Solutions: https://d-central.tech/antminer-s21-repair-guide-common-issues-and-solutions/
D-Central — EEPROM Error Fix for Antminer: https://d-central.tech/eeprom-error-fix-antminer/
D-Central — Antminer Error Code Reference: https://d-central.tech/antminer-error-code-reference/
D-Central — Whatsminer Error Code Reference: https://d-central.tech/whatsminer-error-code-reference/
ZeusBTC — Whatsminer fault codes and solutions: https://www.zeusbtc.com/articles/asic-miner-troubleshooting/1688-how-to-deal-with-fault-codes-of-whatsminer-series
ZeusBTC — Whatsminer troubleshooting and solutions: https://www.zeusbtc.com/articles/asic-miner-troubleshooting/1683-whatsminer-troubleshooting-and-solutions
Bitmain Support — Troubleshooting and solutions for ANTMINER failures: https://support.bitmain.com/hc/en-us/articles/18237912339097-Troubleshooting-and-solutions-for-ANTMINER-failures
Bitmain Support — Common Problems and Solutions for ANTMINER 19 Series: https://support.bitmain.com/hc/en-us/articles/4406645619097-Common-Problems-and-Solutions-for-ANTMINER-19-series
