Friday, March 26, 2010

Debugging early boot problems in Linux

Got this from Here. Posting only the portion I was interested in.

Accessing the printk buffer after a silent hang on boot.

This can also be useful when debugging power management (that's what I was doing when I wrote this).

Sometimes, if the kernel hangs early in the boot process, you get no messages on the console before the hang. Sometimes it's just "Uncompressing Linux.......". That's it.

However, there may still be messages in the printk buffer, which can give you an idea of where the problem is. The kernel starts putting messages into the printk buffer as soon as it starts. They stay buffered there until the console code has a chance to initialize the console device (often the serial port for embedded devices). Even though these messages are not printed before the hang, it is still possible in some circumstances to dump the printk buffer and see the messages.

When the kernel crashes as soon as it starts booting, without printing anything, it's possible to dump the printk buffer from the boot loader. To do so, you have to know how your boot loader maps memory compared to the kernel, i.e. which physical address corresponds to a given kernel virtual address.

U-boot example on an OMAP OSK based board

When the kernel crashes, reset the board and drop to the boot loader prompt. It's important that you don't power cycle the board, since the buffer lives in RAM and its contents may be lost when power is removed.

In the build tree of the kernel that just crashed, run

grep __log_buf System.map

or, on a running system booted with the same kernel build,

grep __log_buf /proc/kallsyms

This showed

$ grep __log_buf System.map
c01eca94 b __log_buf

This is a kernel virtual address. In our case, it maps to physical address 0x101eca94. So I reset the target board and used the following at the boot loader prompt:

OMAP5912 OSK # md 0x101eca94
101eca94: 4c3e353c 78756e69 72657620 6e6f6973    <5>Linux version
101ecaa4: 362e3220 2d36312e 2d336372 70616d6f     2.6.16-rc3-omap
101ecab4: 73282031 406a6172 72616862 762e7461    1 (sraj@bharat.v
...
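The translation from the System.map address to the address given to md can be sketched in a few lines of Python. This is only an illustration, not part of the original procedure: the names find_symbol and virt_to_phys are made up here, and the PAGE_OFFSET and PHYS_OFFSET values are assumptions inferred from the two addresses shown above (c01eca94 and 0x101eca94). Check them against your own board and kernel configuration.

```python
# Assumed memory layout, inferred from the addresses in this post.
# Verify both values for your own board before relying on them.
PAGE_OFFSET = 0xC0000000  # kernel virtual address where RAM is mapped
PHYS_OFFSET = 0x10000000  # physical address where RAM starts


def find_symbol(system_map_text, name):
    """Return the address of `name` from System.map-style text, or None."""
    for line in system_map_text.splitlines():
        fields = line.split()
        # Each System.map line looks like: "<address> <type> <symbol>"
        if len(fields) == 3 and fields[2] == name:
            return int(fields[0], 16)
    return None


def virt_to_phys(vaddr):
    """Translate a linearly mapped kernel virtual address to physical."""
    if vaddr < PAGE_OFFSET:
        raise ValueError("address is below PAGE_OFFSET, not linearly mapped")
    return vaddr - PAGE_OFFSET + PHYS_OFFSET


if __name__ == "__main__":
    sample_map = "c01eca94 b __log_buf\n"
    vaddr = find_symbol(sample_map, "__log_buf")
    print(hex(virt_to_phys(vaddr)))  # prints 0x101eca94
```

The result is the address to pass to md at the boot loader prompt. On a board whose RAM starts elsewhere, only PHYS_OFFSET should need to change.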

Similarly, the following steps show printk_buf (usually what got printed just before the reset):

grep printk_buf System.map

This shows the link address of printk_buf, e.g.:

c01ec5f4 b printk_buf.5

The address c01ec5f4 is a kernel virtual address, so first find the physical address it maps to (here, 0x101ec5f4). Then, after resetting the target board but without letting it try to boot again, run the following at the boot loader prompt:

OMAP5912 OSK # md 101ec5f4
101ec5f4: 463e363c 69656572 6920676e 2074696e    <6>Freeing init
101ec604: 6f6d656d 203a7972 0a4b3239 73797300    memory: 92K..sys
101ec614: 296d6574 63000a2e 6b636f6c 73642220    tem)...clock "ds
101ec624: 72657070 226b635f 0a29000a 29736500    pper_ck"..)..es)
101ec634: 000a000a 766e6567 313d7a73 35303133    ....genvsz=13105
101ec644: 6f202c36 6e65646c 3d7a7376 33353536    6, oldenvsz=6553
101ec654: 73202c32 65666f7a 3d74766e 30313331    2, szofenvt=1310
101ec664: 000a3635 00000000 00000000 00000000    56..............

And you see the printk buffer that never got flushed to the UART. Knowing what's there can be very useful in debugging.
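The ASCII column that md prints is handy, but for a long dump it can be easier to capture the serial output and rebuild the text from the hex words. Below is a minimal sketch of that, assuming the md output format shown above (an address, a colon, then up to four 32-bit words per line) and a little-endian target, which is what the dumps in this post imply. The function name decode_md is hypothetical.

```python
import re
import struct

# One md output line: 8 hex digits, a colon, then one to four 32-bit words.
# Anything after the words (the ASCII column) is ignored.
MD_LINE = re.compile(r"^\s*([0-9a-fA-F]{8}):((?:\s+[0-9a-fA-F]{8}){1,4})")


def decode_md(dump):
    """Rebuild the raw bytes from captured `md` output (little-endian words)."""
    raw = bytearray()
    for line in dump.splitlines():
        match = MD_LINE.match(line)
        if not match:
            continue  # skip prompts, blank lines and "..." markers
        for word in match.group(2).split():
            # md prints each word as a number, so the bytes in memory are
            # in reverse order on a little-endian machine.
            raw += struct.pack("<I", int(word, 16))
    return bytes(raw)


if __name__ == "__main__":
    sample = """OMAP5912 OSK # md 0x101eca94
101eca94: 4c3e353c 78756e69 72657620 6e6f6973    <5>Linux version
101ecaa4: 362e3220 2d36312e 2d336372 70616d6f     2.6.16-rc3-omap
101ecab4: 73282031 406a6172 72616862 762e7461    1 (sraj@bharat.v
"""
    print(decode_md(sample).decode("ascii", errors="replace"))
```

Run on the first dump from this post, it prints the start of the kernel banner: <5>Linux version 2.6.16-rc3-omap1 (sraj@bharat.v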