ASIC Repair Training
Hashboard Failure Patterns: What Miner Logs Are Really Telling You
A beginner-safe diagnostic guide to Antminer and Whatsminer hashboard symptoms, miner logs, missing chips, EEPROM faults, thermal errors, voltage problems, and repair-shop escalation.
A miner log is not just error noise.
It is the machine telling you where the failure pattern probably starts: power, cooling, control board communication, hashboard detection, chip chain, EEPROM, or temperature sensing.
The mistake is reading every low-hashrate event as “bad hashboard.” Sometimes the board is bad. Sometimes the board is only the messenger.
The direct answer
Hashboard failures usually show up as missing chains, partial chip counts, zero ASIC detection, abnormal temperatures, EEPROM errors, bad-chip messages, nonce/pattern errors, or voltage-related instability.
But beginner-safe diagnosis should start before chip-level repair:
- Save the log.
- Power down safely.
- Inspect fans, airflow, dust, cables, connectors, and PSU symptoms.
- Reseat known-safe cables only after unplugging.
- Compare whether the failure follows a board, cable, slot, PSU, or control board path.
- Escalate chip-level, voltage-domain, BGA, or fixture work to trained repair.
Do not start with a heat gun. Start with evidence.
Antminer vs Whatsminer logs
Antminer and Whatsminer failures can look similar from the outside, but the diagnostic language is different.
Bitmain / Antminer
Kernel-log pattern reading
Antminers usually point you toward strings like chain[x] only has y chips, No hashboard found, EEPROM read failed, over-temp messages, fan errors, or chip/nonce errors.
MicroBT / Whatsminer
Error-code pattern reading
Whatsminers use structured numeric codes surfaced through the web interface, WhatsMinerTool, and API. Slot references like SM0, SM1, and SM2 help point to the affected hashboard position.
The point is the same either way: read the pattern before replacing parts.
The official-support pattern is boring for a reason
Bitmain’s official ANTMINER troubleshooting material follows the same first-pass logic repair shops teach: check the log, identify the affected chain or section, power down, replug/reseat cables, then escalate if the issue persists.
That sounds basic because it is basic. It is also where a lot of bad repairs are avoided.
For hashboard signal abnormalities, incomplete chip detection, PIC/temp-sensor/EEPROM issues, and abnormal temperature faults, the safe first moves are not chip replacement. They are:
- capture the exact kernel log
- power cycle cleanly if the error looks transient
- reseat ribbon/data/power cables with the miner unplugged
- verify fan and PSU behavior
- use official firmware or SD recovery only when firmware/corruption is a realistic suspect
- stop if the failure repeats after basic checks
If the same board keeps losing chips, sensors, EEPROM identity, or voltage stability, it becomes repair-bench work.
The seven hashboard failure buckets
Most field symptoms fit into one of these buckets.
1. Missing chips or partial chain detection
Common log shapes:
chain[2] only has 54 chips
chain[1] find 0 chips
ASIC NG=(x)
found 27 chips
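If you collect logs from more than a few machines, a small parser shows which chain is short without reading every line by eye. This is a minimal Python sketch. It assumes the log text matches the two chain-indexed shapes quoted above, and wording differs across models and firmware. The per-chain chip count (76 below) is a placeholder: replace it with the number for your exact model. `chip_counts` and `short_chains` are hypothetical helper names, not part of any vendor tool.

```python
import re

# Hypothetical patterns for the two chain-indexed Antminer log shapes quoted above.
# Exact wording varies by model and firmware; extend these for your logs.
ONLY_HAS = re.compile(r"chain\[(\d+)\]\s+only has\s+(\d+)\s+chips")
FIND = re.compile(r"chain\[(\d+)\]\s+find\s+(\d+)\s+chips")


def chip_counts(log_text: str) -> dict[int, int]:
    """Return {chain_index: detected_chips} for every matching log line.

    If a chain appears more than once (for example across retries), the last value wins.
    """
    counts: dict[int, int] = {}
    for line in log_text.splitlines():
        for pattern in (ONLY_HAS, FIND):
            match = pattern.search(line)
            if match:
                counts[int(match.group(1))] = int(match.group(2))
    return counts


def short_chains(counts: dict[int, int], expected: int) -> dict[int, int]:
    """Return only the chains reporting fewer chips than the expected per-chain count."""
    return {chain: n for chain, n in counts.items() if n < expected}


if __name__ == "__main__":
    sample = "chain[2] only has 54 chips\nchain[1] find 0 chips"
    counts = chip_counts(sample)
    print(counts)  # {2: 54, 1: 0}
    # 76 is a placeholder; use the per-chain chip count for your exact model.
    print(short_chains(counts, expected=76))
```

Lines like `found 27 chips` or `ASIC NG=(x)` carry no chain index, so this sketch ignores them on purpose. Read those in context.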
What it usually means:
- one board is not detecting all chips
- a signal chain is breaking partway through the board
- a voltage domain is weak or missing
- a bad chip is stopping communication downstream
- a cable/connector problem is making the board look dead
Operator translation: the miner may still run, but hashrate usually drops in proportion to the missing board or missing chips.
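The operator translation above is simple arithmetic. Here is a rough sketch. It assumes every detected chip contributes equally to hashrate. Real boards also depend on frequency, tuning, and domain behavior, so treat the result as a sanity check, not a prediction. The function name is hypothetical.

```python
def estimated_hashrate(nominal_ths: float, chips_per_chain: int, detected: list[int]) -> float:
    """Rough hashrate estimate assuming every detected chip contributes equally.

    nominal_ths: the miner's rated hashrate with all chains healthy.
    chips_per_chain: expected chips on each hashboard (model-specific).
    detected: detected chip count for each chain, e.g. [76, 54, 76].
    """
    expected_total = chips_per_chain * len(detected)
    if expected_total == 0:
        return 0.0
    # Clamp each chain so a miscounted log line cannot inflate the estimate.
    healthy = sum(min(n, chips_per_chain) for n in detected)
    return nominal_ths * healthy / expected_total
```

The following inputs are placeholders, not a specific model's specs. A 95 TH/s rating, 76 chips per chain, and detected counts of `[76, 54, 76]` give roughly 85.8 TH/s. A fully missing middle board (`[76, 0, 76]`) gives about two thirds of nominal, roughly 63.3 TH/s. If the real hashrate is far below the estimate, something beyond missing chips is probably involved.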
Beginner-safe first moves:
- save the log before rebooting repeatedly
- power off and unplug
- inspect ribbon cables and power connectors
- check for burnt connectors, dust mats, corrosion, and loose plugs
- compare whether the same chain fails after reseating
Repair-shop escalation:
- test fixture diagnosis
- voltage-domain checks
- signal tracing
- chip-level isolation
- BGA/chip replacement
2. Zero ASIC / no hashboard found
Common log shapes:
No hashboard found
chain[0] find 0 chips
chain[1] find 0 chips
chain[2] find 0 chips
This can be worse than a partial chain error because the control board may not be communicating with one board — or any boards.
Likely causes:
- disconnected or damaged data cable
- control board issue
- board power not present
- EEPROM or PIC-related identification fault on certain models
- board-side signal problem
- PSU/power delivery issue
Beginner-safe first moves:
- confirm the control board boots
- check whether all boards are missing or only one board is missing
- inspect cables and ports
- look for power connector damage
- avoid hot-plugging boards or cables
Important: if all boards vanish at once, do not assume all boards died together. Think PSU, control board, firmware, or common cabling first.
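Writing the "all boards vs one board" question down as a rule makes sure it gets asked every time. This sketch assumes you already have per-chain chip counts from the kernel log. The function name and messages are hypothetical.

```python
def zero_detection_scope(counts: dict[int, int]) -> str:
    """Classify a zero-chip pattern by how many chains are affected.

    counts: {chain_index: detected_chips} taken from the kernel log.
    """
    if not counts:
        return "no chain data: check that the control board boots and logs"
    missing = sorted(chain for chain, n in counts.items() if n == 0)
    if len(missing) == len(counts):
        # Every chain at zero at once points at a shared path, not three dead boards.
        return "all chains missing: check PSU, control board, firmware, common cabling first"
    if missing:
        return f"chains {missing} missing: inspect that board's cable, port, and power connector"
    return "all chains detected"
```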
3. Temperature faults that point at the board
Common messages:
ERROR_TEMP_TOO_HIGH
chip temp exceeds threshold
temp sensor read failed
127C
-40C
fan lost
A real over-temp event can damage chips, solder joints, thermal pads, and connectors. But an impossible reading like 127°C or -40°C may indicate a sensor/read problem instead of actual heat.
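The "is this reading believable?" check can be made explicit. Here is a minimal sketch. The plausible band and the 5 °C margin below ambient are assumptions to tune for your site and model, not vendor limits. The sketch also assumes a running, heat-producing board should not read well below room air.

```python
# Values like these often appear when a sensor cannot be read, not when a board is truly that hot or cold.
SENTINELS_C = {127.0, -40.0}
PLAUSIBLE_MIN_C = -20.0   # assumption: adjust to your site
PLAUSIBLE_MAX_C = 120.0   # assumption: adjust to your model
BELOW_AMBIENT_MARGIN_C = 5.0  # assumption: allowance for sensor tolerance


def classify_temp(reading_c: float, ambient_c: float) -> str:
    """Flag readings that point at the sensor/read path rather than real heat."""
    if reading_c in SENTINELS_C or not (PLAUSIBLE_MIN_C <= reading_c <= PLAUSIBLE_MAX_C):
        return "suspect sensor/read path"
    if reading_c < ambient_c - BELOW_AMBIENT_MARGIN_C:
        # A running board sitting well below room temperature is not physically believable.
        return "suspect sensor/read path"
    return "plausible reading"
```

A "plausible" result only means the number is believable. It still has to be compared against the model's own thermal limits.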
Likely causes:
- blocked intake or exhaust
- failed fan or wrong fan type
- dust-packed heatsinks
- missing or degraded thermal material
- failed temperature sensor path
- firmware/sensor read issue
- board damage from prior overheating
Beginner-safe first moves:
- check fan RPM and fan direction
- verify all fans are detected
- inspect airflow path
- clean dust safely
- confirm ambient temperature
- do not keep rebooting into thermal shutdown
If a board overheats immediately while others are normal, the board needs deeper inspection.
4. EEPROM / board identity faults
Common messages:
EEPROM read failed
EEPROM data error
EEPROM corrupted
parser EEPROM error
SM0 detect EEPROM error
EEPROM-type errors can make a board appear missing or mismatched because the miner cannot read what the board is, what model it belongs to, or how it should be initialized.
Likely causes:
- corrupted EEPROM data
- poor connector/contact
- firmware issue
- adapter board/cable issue on Whatsminer
- bad EEPROM component or board-side damage
Beginner-safe first moves:
- do not randomly flash firmware from sketchy sources
- verify correct firmware for the exact model
- inspect connectors and cables
- check whether the error follows the board or slot
EEPROM repair/programming is not beginner work unless you already have the proper tools and known-good files.
5. Voltage-domain and power instability
Voltage faults are where beginners get into trouble.
Symptoms can include:
- repeated set_voltage messages
- power voltage errors
- high hardware errors
- chip detection changing between boots
- one board unstable while others are normal
- all boards sagging together
The key split:
- One board or one domain looks wrong → likely board-side domain/regulator/chip issue.
- All boards look weak together → think PSU, input voltage, power cable, or facility power quality first.
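The key split above compares boards against each other. It does not judge any single reading on its own. This sketch assumes you have one voltage value per board, from firmware or from a technician's controlled measurement. The nominal value and tolerance are model-specific and are left for you to supply. Do not take live measurements on high-current boards to feed this unless you are trained to.

```python
def voltage_split(readings: dict[str, float], nominal: float, tolerance: float) -> str:
    """Apply the one-board-vs-all-boards split to per-board voltage readings.

    readings: {board_label: volts}; labels such as "SM1" are illustrative.
    nominal, tolerance: model-specific values you must supply.
    """
    off = sorted(b for b, v in readings.items() if abs(v - nominal) > tolerance)
    if readings and len(off) == len(readings):
        return "all boards out of tolerance: check PSU, input voltage, power cable, facility power"
    if off:
        return f"{off} out of tolerance: likely board-side domain/regulator/chip issue"
    return "all boards within tolerance"
```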
ZeusBTC’s S19 repair guide describes domain voltage, boost-circuit, LDO, and chip signal checks as repair-bench work using proper tools. That is useful technician context, but it is not a beginner checklist.
Beginner-safe first moves:
- check input voltage and breaker/PDU health
- inspect PSU and power cables
- compare whether all chains fail together or one board fails alone
- avoid probing live high-current boards without training
Repair-shop escalation:
- multimeter checks under controlled conditions
- oscilloscope/signal work
- test fixture
- voltage-domain isolation
- regulator/chip replacement
6. Pattern/nonce/chip quality errors
Some boards detect chips but still fail pattern tests or return bad work.
Common shapes:
Pattern NG
nonce error
high HW errors
chip X disabled
B_AXPCS
This can indicate a degraded chip, signal-quality issue, unstable voltage domain, bad thermal contact, or firmware/tuning pushing weak silicon too hard.
Beginner-safe first moves:
- return to stock or conservative firmware settings if overclocked
- verify cooling and ambient temperature
- check whether errors increase under heat
- compare the board against known-good operating conditions
Do not jump from nonce errors straight to chip replacement. Confirm power, cooling, and tuning first.
7. Mimics: PSU, fans, sensors, and control boards
Some of the most expensive wrong repairs start here.
A PSU problem can make every board look unstable. A fan fault can trigger a thermal shutdown that looks like board failure. A control board or cable issue can make a good hashboard appear missing. A bad temperature reading can trigger a protective shutdown before the miner ever hashes.
Before calling a hashboard dead, ask:
- Did all boards fail at once?
- Did the fault follow a board, cable, slot, or PSU?
- Are fans reporting correctly?
- Is input voltage stable?
- Are temperatures believable?
- Did the issue start after firmware changes?
This is how you avoid replacing the wrong thing.
Whatsminer codes worth recognizing
Whatsminer’s error system is more structured than Antminer logs. Your exact code list depends on model and firmware, but common patterns include:
- 1xx — fan errors
- 2xx — power supply / power protection errors
- 30x / 32x / 35x / 360 — temperature sensor or temperature protection issues
- 41x / 42x / 45x — EEPROM detection/parser/transfer errors
- 53x — SMx hashboard not found
- 54x — chip ID reading errors
- 55x — bad chip indicators
- 56x / 57x / 58x — balance, transfer, or reset-type hashboard issues
A code like 531 points you toward SM1, the middle hashboard slot in many Whatsminer references. That does not automatically mean the board is dead. It means that slot/board path needs diagnosis.
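Whatsminer codes are grouped by their leading digits, so a first-pass lookup is just integer division. This sketch encodes only the ranges listed above. The exact list depends on model and firmware, so any other code is reported as unlisted rather than guessed. Slot decoding is limited to 530 through 532, following the SM0/SM1/SM2 reading of the 53x family. The function name is hypothetical.

```python
def whatsminer_bucket(code: int) -> str:
    """Map a Whatsminer error code to the broad buckets summarized in this guide.

    Confirm against the code list for your exact model and firmware before acting.
    """
    hundreds, tens = code // 100, code // 10
    if hundreds == 1:
        return "fan"
    if hundreds == 2:
        return "power supply / power protection"
    if tens in (30, 32, 35) or code == 360:
        return "temperature sensor / protection"
    if tens in (41, 42, 45):
        return "EEPROM detection / parser / transfer"
    if code in (530, 531, 532):
        # Last digit taken as the SM slot, as in "531 points toward SM1".
        return f"hashboard not found (SM{code % 10})"
    if tens == 53:
        return "hashboard not found"
    if tens == 54:
        return "chip ID read"
    if tens == 55:
        return "bad chip"
    if tens in (56, 57, 58):
        return "balance / transfer / reset"
    return "unlisted: check the model-specific reference"
```

The bucket tells you which path to diagnose first. It does not tell you which part to replace.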
Whatsminer field cheat sheet
SM0/SM1/SM2 hashboard not found. Start with cable/adapter contact before calling the board dead.
Temperature protection. Check ambient temperature, airflow, fans, and sensor communication.
EEPROM detection or parser errors. Check contacts and model/firmware match before deeper repair.
Unstable voltage or stuck fans can restart a miner or lower hashrate in ways that mimic board failure.
The beginner-safe diagnostic flow
Use this order before any board-level repair.
Step 1: Save the evidence
Screenshot the dashboard. Copy the kernel log or error-code page. Note uptime, ambient temperature, pool status, fan speeds, and which board/slot is affected.
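Step 1 is easier to follow consistently when the evidence always lands in the same shape. This minimal sketch assumes you copy the values by hand or pull them from your own tooling. The field names are hypothetical and do not follow any Bitmain or MicroBT schema.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class FaultSnapshot:
    """Evidence from Step 1; field names are illustrative, not a vendor schema."""
    model: str
    affected: str          # board/slot, e.g. "chain[2]" or "SM1"
    uptime_s: int
    ambient_c: float
    pool_status: str
    fan_rpm: list[int]
    log_excerpt: str
    captured_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def save_snapshot(snap: FaultSnapshot, path: str) -> None:
    """Write the snapshot as JSON so it survives reboots and can travel with the board."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(asdict(snap), fh, indent=2)
```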
Step 2: Separate network from hardware
Pool errors and network failures can show zero accepted work while the miner itself is fine. If the miner is hashing locally but not submitting shares, that is not a hashboard failure.
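The network-versus-hardware split comes down to two numbers: local hashrate and accepted shares over the same window. Here is a sketch. The 10% shortfall threshold is an arbitrary assumption, not a vendor figure, and the function name is hypothetical.

```python
def network_or_hardware(local_ths: float, accepted_delta: int, expected_ths: float) -> str:
    """Separate 'not submitting' from 'not hashing', per Step 2.

    local_ths: hashrate the miner reports locally.
    accepted_delta: accepted shares gained over the observation window.
    expected_ths: rated hashrate for this model.
    """
    if local_ths > 0 and accepted_delta == 0:
        return "hashing locally but no accepted shares: check pool, network, worker config"
    if local_ths < expected_ths * 0.9:  # assumption: a 10% shortfall is worth investigating
        return "local hashrate low: continue hardware checks"
    return "hashing and submitting normally"
```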
Step 3: Power cycle once, not forever
A clean shutdown and restart can clear transient read errors. Reboot loops are not diagnosis. If the same chain, EEPROM, fan, or temperature error returns, stop treating it like a glitch.
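You can check mechanically whether an error came back after one clean restart. Reduce both logs to coarse signatures, ignoring timestamps, and intersect them. This sketch uses hypothetical, deliberately loose patterns. Extend them with the exact strings your model logs.

```python
import re

# Hypothetical signature patterns; the lookbehinds keep "temp" from matching inside "attempt".
SIGNATURES = {
    "chip_count": re.compile(r"chain\[(\d+)\].*?(?:only has|find)\s+\d+\s+chips"),
    "eeprom": re.compile(r"EEPROM", re.IGNORECASE),
    "temp": re.compile(r"(?<![a-z])temp", re.IGNORECASE),
    "fan": re.compile(r"(?<![a-z])fan", re.IGNORECASE),
}


def signatures(log_lines: list[str]) -> set[str]:
    """Reduce raw log lines to coarse error signatures."""
    found: set[str] = set()
    for line in log_lines:
        for name, pattern in SIGNATURES.items():
            m = pattern.search(line)
            if m:
                # Keep the chain index for chip-count errors so you know which board recurred.
                found.add(f"{name}:{m.group(1)}" if m.groups() else name)
    return found


def survived_reboot(before: list[str], after: list[str]) -> set[str]:
    """Signatures present both before and after one clean power cycle."""
    return signatures(before) & signatures(after)
```

A non-empty result means the fault is no longer a glitch. Stop rebooting and move on to cooling, power, and cabling.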
Step 4: Check cooling
Fans, airflow, dust, intake restrictions, exhaust recirculation, and ambient temperature can all create board symptoms. As a working habit, keep chip temperatures within the model's safe operating range and do not keep running into thermal protection.
Step 5: Check power
One unstable PSU or bad input circuit can create multiple fake board failures. If every chain gets weird together, think common power before individual boards. On Whatsminer troubleshooting references, unstable input voltage and PSU fan/power faults can show up as restarts, low hashrate, or hashboard-looking failures.
Step 6: Check cables and connectors
Power off first. Then inspect and reseat only what is safe: data cables, fan plugs, power connectors, and obvious loose connections. Look for heat discoloration or melted plastic. If contacts are dirty or corroded, clean them carefully using an appropriate electronics-safe process, and let everything dry fully before powering on.
Step 7: Use official firmware or SD recovery only when it fits
Firmware or SD-card recovery can make sense for EEPROM/corruption symptoms, control-board software issues, or after a failed update. Use official firmware for the exact model and variant. Do not random-flash firmware as a generic repair move.
Step 8: See if the fault follows the part
If you are trained and the model allows safe controlled swapping, check whether the fault follows a board, slot, cable, or control-board port. If you are not trained, stop here and document the pattern for repair.
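For trained techs doing controlled swaps, the bookkeeping is the part that gets lost. This sketch assumes you record which board serial sat in which slot for each run, and which slot reported the fault each time. Slot and serial labels are hypothetical.

```python
def fault_follows(before: dict[str, str], after: dict[str, str],
                  bad_slot_before: str, bad_slot_after: str) -> str:
    """Decide whether a fault followed the board or stayed with the slot.

    before/after: {slot: board_serial} for the run before and after the swap.
    bad_slot_before/bad_slot_after: the slot reporting the fault in each run.
    """
    if before[bad_slot_before] == after[bad_slot_before]:
        # The suspect slot kept the same board, so the swap tells you nothing.
        return "inconclusive: the suspect slot kept the same board"
    board_before = before[bad_slot_before]
    board_after = after[bad_slot_after]
    if board_before == board_after:
        return f"fault followed board {board_before}"
    if bad_slot_before == bad_slot_after:
        return f"fault stayed with slot {bad_slot_before}: check cable, port, control board"
    return "fault moved to a different board and slot: recheck cabling and logs"
```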
Step 9: Escalate to bench diagnosis
Chip-level diagnosis needs the right tools: test fixture, known-good PSU/cables, multimeter, oscilloscope, ESD process, thermal process, and model-specific repair knowledge. For Whatsminer hashboard localization, shops may use fixture-style commands and signal checks to isolate RST/CLK/ASIC paths; that is not beginner work.
When to stop and send it out
Stop before you turn a repairable board into scrap.
Send it to a trained repair bench when:
- one board repeatedly finds partial chips
- EEPROM errors survive cable/firmware sanity checks
- voltage-domain faults are suspected
- connectors are burned
- a board overheats immediately
- chip loss is progressive across boots or days
- multiple chips are disabled or failing pattern tests
- the board needs BGA/chip replacement
- you do not have a fixture or ESD-safe setup
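Progressive chip loss is easiest to spot when chip counts are recorded on every boot. This sketch assumes you have a list of detected chip counts for one chain, oldest first. The two-drop threshold is an arbitrary starting point, not a vendor rule.

```python
def is_progressive_loss(history: list[int], min_drops: int = 2) -> bool:
    """True if a chain's chip count never recovers and drops at least min_drops times.

    history: detected chips for one chain, oldest boot first.
    min_drops: hypothetical threshold for how many decreases count as a trend.
    """
    drops = 0
    for prev, cur in zip(history, history[1:]):
        if cur > prev:
            return False  # the count recovered, so this is not a one-way decline
        if cur < prev:
            drops += 1
    return drops >= min_drops
```

A `True` result is a reason to stop running the board and send it to the bench. It is not a diagnosis of which chip failed.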
There is no shame in stopping. Good diagnosis includes knowing where the safe line is.
The Orange Signal rule of thumb
Read miner logs like an operator, not a gambler.
- If all boards fail together, suspect common power/control issues first.
- If one board fails consistently, isolate board/cable/slot before assuming chip failure.
- If temperature readings are impossible, think sensor/read path.
- If fans are wrong, fix cooling before diagnosing chips.
- If EEPROM fails, do not random-flash your way into a worse problem.
- If voltage domains are involved, it is bench work.
- If chip replacement is involved, it is not beginner work.
- Monitor hardware errors and temperature behavior daily on production miners.
- Clean dust on a schedule; gentle compressed air and ESD discipline beat emergency rework.
- Do not run a miner with progressive chip loss just to “see if it holds.” That can turn one fault into cascading damage.
The best repair techs are not the ones who touch the hot air station first. They are the ones who know when not to.
FAQ
Does “chain only has Y chips” always mean a bad chip?
No. It often points toward a chip-chain problem, but the cause can be a signal issue, voltage-domain issue, cable/connector fault, or board-side component problem. The log tells you where to look, not what to replace.
Does “no hashboard found” mean all hashboards are dead?
Usually no. If multiple boards disappear at once, check common causes first: control board, firmware, PSU, power delivery, and data cabling.
Are Whatsminer error codes easier than Antminer logs?
They are more structured. Whatsminers use numeric codes and SM0/SM1/SM2 slot references, while Antminers rely more heavily on kernel-log strings. Both still require pattern-based diagnosis.
Can beginners reseat cables and clean dust?
Yes, if the miner is powered off, unplugged, cooled down, and handled with basic ESD care. Beginners should not probe live boards or attempt chip-level rework without training.
What is the most common misdiagnosis?
Calling a board dead when the real issue is PSU instability, fan failure, temperature sensing, cabling, or control-board communication.
Sources and live references
- ZeusBTC — Antminer S19 Hash Board Repair Guide: https://www.zeusbtc.com/manuals/Antminer-S19-Hash-Board-Repair-Guide.asp
- ZeusBTC — Antminer S19j Pro Aluminum Hashboard Repair Guide: https://www.zeusbtc.com/articles/information/4676-antminer-s19j-pro-aluminum-hashboard-repair-guide
- D-Central — Antminer S21 Maintenance & Repair Guide: https://d-central.tech/manuals/antminer-s21-maintenance-repair-guide/
- D-Central — Antminer S21 Repair Guide: Common Issues and Solutions: https://d-central.tech/antminer-s21-repair-guide-common-issues-and-solutions/
- D-Central — EEPROM Error Fix for Antminer: https://d-central.tech/eeprom-error-fix-antminer/
- D-Central — Antminer Error Code Reference: https://d-central.tech/antminer-error-code-reference/
- D-Central — Whatsminer Error Code Reference: https://d-central.tech/whatsminer-error-code-reference/
- ZeusBTC — Whatsminer fault codes and solutions: https://www.zeusbtc.com/articles/asic-miner-troubleshooting/1688-how-to-deal-with-fault-codes-of-whatsminer-series
- ZeusBTC — Whatsminer troubleshooting and solutions: https://www.zeusbtc.com/articles/asic-miner-troubleshooting/1683-whatsminer-troubleshooting-and-solutions
- Bitmain Support — Troubleshooting and solutions for ANTMINER failures: https://support.bitmain.com/hc/en-us/articles/18237912339097-Troubleshooting-and-solutions-for-ANTMINER-failures
- Bitmain Support — Common Problems and Solutions for ANTMINER 19 Series: https://support.bitmain.com/hc/en-us/articles/4406645619097-Common-Problems-and-Solutions-for-ANTMINER-19-series