PCB Debug
19 Sep 2019
The most important step in debugging is finding the correct root cause.
The following list of possible sources of problems can be used as a check list to review at each phase of a PCB project:
- Problem in the conceptual design: even if your plan is executed perfectly, the plan is flawed.
- Problem in the part selection: you chose the wrong part. The design is implemented perfectly, but the BOM is incorrect. You did not read the datasheet very well and did not specify the correct part.
- Problem in the schematic: you wired the parts up incorrectly. The layout was executed exactly as per the schematic, but the schematic was wrong.
- Problem in the connectivity of the layout: when you routed a trace, it did not make contact with the pad to complete the circuit.
- Excessive noise picked up due to a signal or power integrity layout issue. Maybe you routed a trace over a gap in the return path creating too much switching noise that caused a false trigger.
- Wrong part assembled to the board.
- You have a missing part.
- Bad part assembled to the board. Your parts were possibly ESD damaged before you got them, or thermally damaged by a soldering iron that was too hot, or the whole batch of parts you purchased was bad.
- Manufacturing defect in the fab. Maybe the trace between two pads has a defect causing it to be an open.
- Error in the assembly. You could have a cold solder joint between a lead of the IC and its pad.
- Error in the test set up. Maybe the input voltage to the SMPS needed to be 7 V and instead, you applied 5 V.
- Error in the measurements.
- Error in the code.
After the board is assembled to some degree and something is not working as expected, there are four processes that can be used to try to find the root cause:
- Check the power rails. Start at the source of the power and measure each node. Verify that the expected DC voltage AND the expected ripple noise level are present. When possible, measure the current draw from the power supply and compare it to what you expect. Try to isolate some of the circuits on the board using jumpers to narrow down the problem.
- Check the signal paths at the inputs and outputs of all the signal pins. Trace the signal path through each device. Verify the frequency and peak-to-peak values. Check for continuity of the signal path, from the TX to the RX and through the entire circuit. Isolate the circuit when possible.
- Check the noise on signal lines. If there is a quiet low and high that can be used, you can measure the cross talk to these quiet lines directly. Assemble the power rail components and test the voltage.
- Monitor the power rail noise while the part operates. Often the voltage noise on the power rail is a useful diagnostic for what might be going wrong in the circuit. Knowing the expected power budget will give you insight into what is “normal” and what might be an indication of a problem somewhere.
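One way to make the current-draw comparison concrete is to write down an expected power budget and compare the measured supply current against it. The sketch below is only an illustration: the block names, the current values, and the 20% tolerance are all assumptions, not values from any particular board.

```python
# Hypothetical power budget: expected current draw (mA) per circuit block.
# All names and numbers here are illustrative assumptions.
EXPECTED_MA = {
    "microcontroller": 20.0,
    "power_led": 5.0,
    "sensor": 1.5,
}


def check_current(measured_ma, enabled_blocks, tolerance=0.20):
    """Compare a measured supply current with the budget for the enabled blocks.

    Returns (expected_ma, ok), where ok is True when the measurement is
    within +/- tolerance (a fraction) of the expected total.
    """
    expected_ma = sum(EXPECTED_MA[name] for name in enabled_blocks)
    ok = abs(measured_ma - expected_ma) <= tolerance * expected_ma
    return expected_ma, ok


if __name__ == "__main__":
    # With isolation jumpers open, only the microcontroller and LED are powered.
    expected, ok = check_current(26.0, ["microcontroller", "power_led"])
    print(f"expected {expected:.1f} mA, measured 26.0 mA, within budget: {ok}")
```

Enabling one block at a time with the isolation jumpers and repeating the check can help narrow down which block is drawing more current than planned.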
Here is a check list of actions to consider taking at bring up:
- Before assembly:
- Do a visual inspection of the board to look for obvious defects like broken traces, shorted traces, or missing pads.
- Measure the resistance between different power rails and ground with a simple 2-wire DMM to verify their isolation is larger than 1 MegOhm.
- After each component is assembled:
- Do a visual inspection to look for obvious bad solder joints.
- When some joints look a little suspicious, like a dull gray instead of shiny, apply solder flux to the leads and reflow the solder so that it clearly wets the leads and the pads.
- Any solder shorts can be cleaned up by reflowing and letting the surface tension pull the excess solder to either lead. Alternatively, the excess solder can be removed with copper braid.
- Check for any missing solder joints or for pads that do not look like they have adequate solder, especially on a part with many leads.
- Generally, it is easier to assemble the smaller parts first, then the bigger ones.
- Be aware that the hot air gun may damage plastic connectors already placed on the board.
- Always double check the orientation of parts that are polarity sensitive.
- DO NOT solder any parts on the board while it is powered on.
- Silk screen markings can make it so much easier to double check you are adding the correct part. Add the component name, the component value, and the component polarity often, in close proximity to the pads.
- The power path:
- Check the current draw from the power supply. This is one reason to use isolation jumpers to turn off some parts of the circuit so that the power draw can be measured. This is done using an external power supply that allows measuring the current draw. Is it what you expect?
- Trace the DC voltage level at each stage of the circuit. Is the DC level being distributed where it needs to be?
- Look at the noise on a scope to verify the voltage level is what is expected at each stage and the noise is low.
- Verify the decoupling capacitors are positioned in close proximity with low loop inductance to the IC they are decoupling.
- The signal paths:
- For each component, verify the signal you expect to see is what is coming out and it is connected between each output and into each input. Use the sharp tip of the 10x probe to probe the leads of packages or pads of components, or untented vias.
- For digital signals, select one as a trigger and look at the corresponding reaction in other pins.
- Consider modifying the micro code to change the signals on other pins.
- Measure the clock frequency to verify it is operating at the expected frequency.
- Trace the routing from an output to an input. If the routing is incorrect, pull a lead from the pad and connect it to the right place with AWG30 wire. This is called an Engineering Change Wire or a Green Wire, or a White wire change.
- Quiet high and quiet low noise:
- Write the microcode to make one line switch and use this as the trigger. This pin will trigger the scope so you can look for synchronous switching noise on the power rail and ground pins.
- A quiet LOW pin is an I/O that is set by the microcode you write to be outputting a LOW. Normally, this voltage will be 0 V. Any voltage measured on this lead is cross talk from other signals switching.
- A quiet HIGH is an I/O that is set by the microcode you write to be outputting a HIGH. Normally, this voltage will be the Vcc rail voltage. This is a direct connection from the power rail on the die to the lead on the board. This is a direct measure of the power rail voltage noise on the die.
- Monitor the quiet LOW and quiet HIGH to check the switching noise when other I/O switch.
- If you can verify a specific part is not reacting with the correct outputs or responses to inputs:
- Inspect its solder joints.
- Consider reflowing any suspect joints.
- Verify it is placed on the board correctly.
- Verify with the schematic that the pin connectivity is correct.
- Verify the correct power and signals are going into the device.
- If the device is connected correctly and assembled correctly, it could be a bad part. It could have been damaged by ESD or thermally in the assembly operation.
- Replace a bad part after you have exhausted other possible explanations. Maybe it was shipped as a bad part, maybe it suffered ESD damage, or maybe it was thermally damaged.
- Using a hot air gun, slowly heat up the part and its pads and pull it off the board using tweezers when the solder joints have melted.
- Clean up the pads with a copper solder wick and soldering iron, using solder flux to help the wick suck up the solder.
- After the pads are cleaned, add new solder paste, reflow it so that it flows over the pads, and place the replacement part on the pads.
- Hold it in place while reflowing using the hot air gun.
- Repeat the test for the correct power and signals.
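The quiet LOW and quiet HIGH measurements in the check list can be reduced to one or two numbers per line, for example the peak-to-peak noise and the worst-case deviation from the ideal level. This is a minimal sketch, assuming the scope capture has already been exported as a list of voltages. The sample values and the 3.3 V rail are made up for illustration.

```python
def noise_summary(samples_v, nominal_v):
    """Summarize noise on a quiet line from a list of sampled voltages.

    nominal_v is the ideal level: 0.0 for a quiet LOW, the rail voltage
    for a quiet HIGH. Returns peak-to-peak noise and the worst-case
    deviation from nominal, both in volts.
    """
    peak_to_peak = max(samples_v) - min(samples_v)
    worst_deviation = max(abs(v - nominal_v) for v in samples_v)
    return peak_to_peak, worst_deviation


if __name__ == "__main__":
    # Made-up captures; real data would come from the scope.
    quiet_low = [0.00, 0.02, -0.05, 0.11, 0.01]
    quiet_high = [3.30, 3.25, 3.38, 3.18, 3.31]

    for name, samples, nominal in (
        ("quiet LOW", quiet_low, 0.0),
        ("quiet HIGH", quiet_high, 3.3),
    ):
        p2p, worst = noise_summary(samples, nominal)
        print(f"{name}: {p2p * 1000:.0f} mV peak-to-peak, "
              f"{worst * 1000:.0f} mV worst-case deviation")
```

Recording these numbers with and without other I/O switching gives a simple before-and-after comparison of the switching noise.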
Three specific troubleshooting tricks
There is no substitute for reading the manuals or datasheets of all your parts. The better you understand your system, the better you can identify the small hints when it is not working as you expect and guess a possible root cause. There are three tricks you can sometimes use to help you find the root cause:
Trick 1: If you wanted to re-create the problem, how would you do that?
If you expect to see a 5 V signal coming out of a pin, but you see a 2.5 V signal, how could you make this happen on purpose? Maybe there is a voltage divider. This could be created by having your digital pin set as a pull-up with a 10k resistor on your board as a load.
Look for a 10k load on the output and check your code to see if the digital pin is set as an OUTPUT or has its pull-up enabled.
When you see a specific behavior, think about how you could recreate it if you had to, and then check the product for those features you would have added to create this behavior.
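The arithmetic behind the 2.5 V example is a plain resistive divider. The sketch below assumes an ideal pull-up to the 5 V rail and a load from the pin to ground. The 10k pull-up value is an assumption chosen to reproduce the symptom; the actual pull-up resistance of a given part should be taken from its datasheet.

```python
def divider_output(v_rail, r_pullup_ohm, r_load_ohm):
    """Voltage at a pin pulled up to v_rail through r_pullup_ohm
    with r_load_ohm connected from the pin to ground."""
    return v_rail * r_load_ohm / (r_pullup_ohm + r_load_ohm)


def pullup_for_observed(v_rail, v_observed, r_load_ohm):
    """Solve the divider for the pull-up resistance that would
    produce v_observed across a known load."""
    return r_load_ohm * (v_rail - v_observed) / v_observed


if __name__ == "__main__":
    # A 10k pull-up into a 10k load halves the rail voltage.
    print(divider_output(5.0, 10_000, 10_000))
    # Working backwards from the symptom: what pull-up gives 2.5 V?
    print(pullup_for_observed(5.0, 2.5, 10_000))
```

Working backwards from the observed voltage to an implied resistance can point at which component, or which pin setting, to inspect first.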
Trick 2: Have you encountered this problem before?
Keep a journal, a list of the problems you have found: the symptoms, indicators, hints for the problem, and their root cause.
Each time you encounter a problem that has happened before, add it to this list.
You should have a list of symptoms and causes. If you see oscillations on the output of an LDO, this usually means you forgot the 22 uF decoupling capacitor, or the capacitor is not large enough.
When possible, take a functioning board and cause known problems, like remove the filter or decoupling capacitors, replace a resistor value with a 0 Ohm jumper, or pull one of the signal pins off the pad. Observe the signature of the failure on the board and make note of the signature and its root cause. Add this example to your list.
It is especially useful to make these perturbations in a simulation tool. This way you can observe patterns in the signals when there is a problem such as poor termination, or excessive cross talk to an adjacent trace.
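The journal described above can be a paper notebook or a text file, but a small searchable structure makes it easier to look up past symptoms. This is a minimal sketch with a hypothetical `JournalEntry` record and a keyword search. The example entries are paraphrased from symptoms mentioned in this post.

```python
from dataclasses import dataclass


@dataclass
class JournalEntry:
    symptom: str      # what was observed
    root_cause: str   # what turned out to be wrong
    fix: str = ""     # what resolved it


def find_entries(journal, keyword):
    """Return entries whose symptom mentions keyword (case-insensitive)."""
    keyword = keyword.lower()
    return [entry for entry in journal if keyword in entry.symptom.lower()]


journal = [
    JournalEntry(
        symptom="Oscillation on the LDO output",
        root_cause="Output decoupling capacitor missing or too small",
        fix="Fit the capacitor value recommended in the datasheet",
    ),
    JournalEntry(
        symptom="Scope signal changes when the cable is jiggled",
        root_cause="Broken test cable",
        fix="Replace the cable",
    ),
]

if __name__ == "__main__":
    for entry in find_entries(journal, "ldo"):
        print(f"{entry.symptom} -> {entry.root_cause}")
```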
Trick 3: Round up the usual suspects
There are probably ten common problems that are likely the root cause of most problems:
- ESD damage to the part - replace the part.
- The part was thermally damaged during soldering - replace the part.
- Bad solder joint - check under a microscope - reflow with a liberal amount of solder flux, remove excess shorting solder.
- Wrong capacitor or resistor value - remove the part and check with a meter.
- Not plugged in, possibly a bad connection, or power supply not on.
- LED assembled in the wrong polarity - test the LED with an LED tester.
- Design error in the layout. Trace the signal paths to verify they connect where they are supposed to.
- A jumper flag is not connected correctly.
- The test leads are not making good contact to the pads - indicated by intermittent voltage on the scope.
- The testing cable is broken - indicated by the scope signal changes when the cable is jiggled.
Each time you encounter a problem and find the correct root cause, go ahead and add it to this list to help you in the future.
Coding Issues
Debugging software can be notoriously difficult. Some problems are purely hardware related or purely software related, but some are both. These involve the interactions of the code running the hardware, often referred to as firmware.
The first step is to try and separate the problem.
Is there a working hardware evaluation platform you can use to debug the software?
Can you emulate your hardware system on a software platform?
Is there some simple code you can run that will test the hardware? The simpler the better.
Is there a golden system you can test your hardware on and look for specific signals?
Can you develop the right test vectors to test the hardware as it operates under some code?
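Test vectors can be kept as plain data: a list of input patterns paired with the outputs the hardware should produce. The sketch below is illustrative only. `apply_inputs` stands in for a hypothetical routine that drives the board and reads the result back. Here it is replaced by a software model of a 2-input AND gate so the example runs without hardware.

```python
def run_test_vectors(apply_inputs, vectors):
    """Apply each input pattern and collect the vectors that fail.

    apply_inputs: callable taking a tuple of input bits and returning
                  the observed output bit (hypothetical hardware hook).
    vectors:      list of (inputs, expected_output) pairs.
    Returns a list of (inputs, expected, observed) for every mismatch.
    """
    failures = []
    for inputs, expected in vectors:
        observed = apply_inputs(inputs)
        if observed != expected:
            failures.append((inputs, expected, observed))
    return failures


def and_gate_model(inputs):
    """Software stand-in for the device under test: a 2-input AND gate."""
    a, b = inputs
    return a & b


AND_VECTORS = [
    ((0, 0), 0),
    ((0, 1), 0),
    ((1, 0), 0),
    ((1, 1), 1),
]

if __name__ == "__main__":
    failures = run_test_vectors(and_gate_model, AND_VECTORS)
    print("all vectors passed" if not failures else f"failures: {failures}")
```

Running the same vectors against a software model, a golden system, and the board under test helps separate hardware problems from software problems.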
Testing to a spec
In this phase, the board is tested to see if it “works”. This is a very vague term. While it is common to refer to the board as “working” or “not working”, this is ambiguous.
You are really verifying a specific spec, like:
- The LED turned on.
- The output signal on one pin was a square wave with 5 V peak-to-peak and 1 kHz with a 40% duty cycle.
- The system booted.
- The IDE software saw the device.
But have you really tested all the functionality of the product?
When you get to the point where a few tests you perform give the result you expect, so that the board appears to be “working”, you can move on to the testing to the spec phase. This is where you perform all the verification tests you can think of to check that the product meets the spec.
In this phase we do characterization, extract performance metrics and figures of merit that describe the performance, and evaluate the performance margin. We can also determine how to optimize the design, increasing the performance margins to make it more robust, and what the additional costs are. This is evaluating the “bang for the buck”.
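A spec item like the square-wave example above can be checked automatically once the waveform has been captured. The sketch below is a rough illustration, assuming uniformly sampled data and a clean two-level signal. The midpoint threshold method, the 100 kHz sample rate, and the synthetic waveform are all assumptions.

```python
def measure_square_wave(samples_v, sample_rate_hz):
    """Estimate peak-to-peak voltage, frequency, and duty cycle.

    Assumes a clean two-level waveform, uniformly sampled, containing at
    least two rising edges. The threshold is the midpoint of min and max.
    """
    v_min, v_max = min(samples_v), max(samples_v)
    threshold = (v_min + v_max) / 2
    high = [v > threshold for v in samples_v]

    # Indices where the signal crosses from low to high.
    rising = [i for i in range(1, len(high)) if high[i] and not high[i - 1]]
    if len(rising) < 2:
        raise ValueError("need at least two rising edges")

    # Measure over a whole number of periods, first to last rising edge.
    first, last = rising[0], rising[-1]
    periods = len(rising) - 1
    frequency_hz = periods * sample_rate_hz / (last - first)
    duty = sum(high[first:last]) / (last - first)
    return v_max - v_min, frequency_hz, duty


if __name__ == "__main__":
    # Synthetic 1 kHz, 0 V to 5 V, 40% duty wave sampled at 100 kHz.
    one_period = [5.0] * 40 + [0.0] * 60
    samples = [0.0] * 10 + one_period * 5
    vpp, freq, duty = measure_square_wave(samples, 100_000)
    print(f"{vpp:.1f} V peak-to-peak, {freq:.0f} Hz, {duty:.0%} duty cycle")
```

Comparing the three measured values against the spec limits, with a stated tolerance, turns “working” into a pass or fail result and shows how much margin is left.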