Re: [Ltib] Puzzle: Kernel hangs at NFS while booting


From: Stuart Hughes
Subject: Re: [Ltib] Puzzle: Kernel hangs at NFS while booting
Date: Mon, 02 Jun 2008 09:01:19 +0100

Hi Chip,

What you have looks close to working. The only thing I can suggest (which you may have already tried) is to mount the actual NFS directory from some other machine, just to make sure the export itself is working.
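
As a coarse first check before attempting a real mount from another machine, a short script along these lines can confirm that the server is answering on the conventional rpcbind and NFS ports. This is only a sketch: the function names are made up for illustration, ports 111 and 2049 are the conventional defaults rather than anything stated in this thread, and it probes TCP only, whereas the kernel's root-NFS client may well be using UDP.

```python
import socket

def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False

def check_nfs_services(host, ports=None, timeout=2.0):
    """Map each service name to a flag saying whether its port accepted a connection."""
    if ports is None:
        # Assumed conventional ports; mountd is normally assigned dynamically
        # by rpcbind, so it is not probed here.
        ports = {"rpcbind": 111, "nfs": 2049}
    return {name: tcp_port_open(host, port, timeout) for name, port in ports.items()}
```

Running `check_nfs_services("192.168.41.101")` from a workstation on the same subnet gives a quick yes/no per service. A successful result does not prove the export is mountable, so an actual mount of the directory from that machine is still the better test.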

However, I think there is some kernel/dtb misconfiguration.  Your best bet would be to ask on the linuxppc kernel mailing list.

Regards, Stuart

On Mon, 2008-06-02 at 01:44 -0500, Chip Webb wrote:

Hello LTIB Gurus!

I have a custom board with an MPC8544E processor and am bringing it up for the first time.
U-boot seems to work fine, and I can TFTP the kernel and DTB file from the server as well.

However, the system hangs when trying to connect to the NFS server, and that is a puzzle.
I'm pretty sure it isn't the server, because we use the same NFS server and directory to boot
a similar (vendor-supplied and known-good) evaluation board. I'm also pretty sure that
it's related to the Ethernet configuration for my board, because I can see the server
responding to ARP requests from the kernel running on my custom board.

This processor has two eTSEC devices, 1 and 3. Their PHY addresses
are 1 and 0 respectively, which may be part of the issue. We got it right in U-Boot
and believe that we also have the appropriate settings in the device tree, but
we don't know for sure at this point. (The ASCII DTS is included below.)

Any pointers on how best to debug this would be greatly appreciated!

Below is a snapshot of the salient messages printed during system bootup:

Linux version 2.6.23 (address@hidden) (gcc version 4.1.2) #13 Mon May 19 14:02:58 CDT 2008
console [udbg0] enabled
setup_arch: bootmem
Found FSL PCI host bridge at 0x00000000e0008000.Firmware bus number: 0->0
arch: exit
Zone PFN ranges:
  DMA             0 ->   196608
  Normal     196608 ->   196608
Movable zone start PFN for each node
early_node_map[1] active PFN ranges
    0:        0 ->   196608
Built 1 zonelists in Zone order.  Total pages: 195072
Kernel command line: root=/dev/nfs rw nfsroot=192.168.41.101:/nfs/mss ip=192.168.41.44:192.168.41.101:192.168.40.1:255.255.254.0:mss:eth0:off console=ttyS0,115200
mpic: Setting up MPIC " OpenPIC  " version 1.2 at e0040000, max 1 CPUs
mpic: ISU size: 4, shift: 2, mask: 3
mpic: Initializing for 60 sources
PID hash table entries: 4096 (order: 12, 16384 bytes)
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
Memory: 775168k/786432k available (3536k kernel code, 10792k reserved, 140k data, 128k bss, 152k init)
Mount-cache hash table entries: 512
NET: Registered protocol family 16

PCI: Probing PCI hardware
PCI: Cannot allocate resource region 0 of device 0000:00:12.0
Generic PHY: Registered new driver
NET: Registered protocol family 2
IP route cache hash table entries: 32768 (order: 5, 131072 bytes)
TCP established hash table entries: 131072 (order: 8, 1048576 bytes)
TCP bind hash table entries: 65536 (order: 6, 262144 bytes)
TCP: Hash tables configured (established 131072 bind 65536)
TCP reno registered
klips_info:ipsec_init: KLIPS startup, Openswan KLIPS IPsec stack version: 3.0.12
NET: Registered protocol family 15
klips_info:ipsec_alg_init: KLIPS alg v=0.8.1-0 (EALG_MAX=255, AALG_MAX=251)
klips_info:ipsec_alg_init: calling ipsec_alg_static_init()
ipsec_aes_init(alg_type=15 alg_id=12 name=aes): ret=0
ipsec_aes_init(alg_type=14 alg_id=9 name=aes_mac): ret=0
ipsec_3des_init(alg_type=15 alg_id=3 name=3des): ret=0
talitos: rng des/3des aes md5 sha1
io scheduler noop registered
io scheduler anticipatory registered (default)
io scheduler deadline registered
io scheduler cfq registered
Generic RTC Driver v1.07
Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing enabled
serial8250.0: ttyS0 at MMIO 0xe0004500 (irq = 42) is a 16550A
console handover: boot [udbg0] -> real [ttyS0]
serial8250.0: ttyS1 at MMIO 0xe0004600 (irq = 42) is a 16550A
RAMDISK driver initialized: 16 RAM disks of 32768K size 1024 blocksize
loop: module loaded
Intel(R) PRO/1000 Network Driver - version 7.3.20-k2-NAPI
Copyright (c) 1999-2006 Intel Corporation.
Gianfar MII Bus: probed
eth0: Gianfar Ethernet Controller Version 1.3-skbr, 00:e0:0c:02:00:fd
GFAR: SKB Handler initialized at CPU#0(max=32)
eth0: MTU = 1500 (frame size=1514, truesize=1800)
eth0: Running with NAPI enabled
eth0: 64/64 RX/TX BD ring size
e100: Intel(R) PRO/100 Network Driver, 3.5.23-k4-NAPI
e100: Copyright(c) 1999-2006 Intel Corporation
Marvell 88E1101: Registered new driver
Marvell 88E1112: Registered new driver
Marvell 88E1111: Registered new driver
Marvell 88E1145: Registered new driver
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
i2c /dev entries driver
TCP cubic registered
Initializing XFRM netlink socket
NET: Registered protocol family 1
NET: Registered protocol family 17
IP-Config: Complete:
      device=eth0, addr=192.168.41.44, mask=255.255.254.0, gw=192.168.40.1,
     host=mss, domain=, nis-domain=(none),
     bootserver=192.168.41.101, rootserver=192.168.41.101, rootpath=
Looking up port of RPC 100003/2 on 192.168.41.101
PHY: e0024520:01 - Link is Up - 100/Full
rpcbind: server 192.168.41.101 not responding, timed out
Root-NFS: Unable to get nfsd port number from server, using default
Looking up port of RPC 100005/1 on 192.168.41.101
rpcbind: server 192.168.41.101 not responding, timed out
Root-NFS: Unable to get mountd port number from server, using default
mount: server 0.0.0.16 not responding, timed out
Root-NFS: Server returned error -5 while mounting /nfs/mss
VFS: Unable to mount root fs via NFS, trying floppy.
VFS: Cannot open root device "nfs" or unknown-block(2,0)
Please append a correct "root=" boot option; here are the available partitions:
Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(2,0)
Rebooting in 180 seconds..
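
To rule out a plain addressing mistake in the kernel command line above, the ip= option can be split into its fields (client, server, gateway, netmask, hostname, device, autoconf) and checked for subnet consistency. The following is a minimal sketch; the helper names are hypothetical and it assumes IPv4 with a dotted netmask, as used in this boot log.

```python
import ipaddress

# Field order of the kernel's ip= option.
IP_FIELDS = ("client", "server", "gateway", "netmask", "hostname", "device", "autoconf")

def parse_ip_param(value):
    """Split the value of the ip= option into named fields."""
    parts = value.split(":")
    # Trailing fields may be omitted; pad them with empty strings.
    parts += [""] * (len(IP_FIELDS) - len(parts))
    return dict(zip(IP_FIELDS, parts))

def same_subnet(cfg):
    """Report whether the server and gateway lie inside the client's subnet."""
    net = ipaddress.ip_network(f"{cfg['client']}/{cfg['netmask']}", strict=False)
    return {
        "network": str(net),
        "server_on_link": ipaddress.ip_address(cfg["server"]) in net,
        "gateway_on_link": ipaddress.ip_address(cfg["gateway"]) in net,
    }

cfg = parse_ip_param(
    "192.168.41.44:192.168.41.101:192.168.40.1:255.255.254.0:mss:eth0:off"
)
print(same_subnet(cfg))
```

For the values in this log the client, server and gateway all fall inside 192.168.40.0/23, so the board should reach the server directly on the local link and the addressing itself does not look like the culprit.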

I have run Wireshark to capture the packets sent from the new custom board to the server.
When I do this, I see that the kernel sends out an ARP request for the server's IP,
to which the server properly responds. After several seconds, the kernel sends some
more ARP requests, and the server responds again. It would appear that my kernel is not seeing
the ARP responses.
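
One thing worth confirming from the capture is that the ARP replies are addressed to the MAC the kernel reports for eth0 in the boot log (00:e0:0c:02:00:fd). The sketch below decodes a raw Ethernet frame and checks exactly that. It is an illustration only: the function names are invented, it handles untagged IPv4-over-Ethernet ARP frames only, and getting raw frame bytes out of the capture file is left to whatever tool is at hand.

```python
import struct

ETHERTYPE_ARP = 0x0806
ARP_OPS = {1: "request", 2: "reply"}

def fmt_mac(raw):
    return ":".join(f"{b:02x}" for b in raw)

def fmt_ip(raw):
    return ".".join(str(b) for b in raw)

def classify_arp(frame):
    """Describe a frame if it carries IPv4-over-Ethernet ARP, else return None."""
    if len(frame) < 42:  # 14-byte Ethernet header + 28-byte ARP body
        return None
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    if ethertype != ETHERTYPE_ARP:
        return None
    htype, ptype, hlen, plen, oper = struct.unpack("!HHBBH", frame[14:22])
    if (htype, ptype, hlen, plen) != (1, 0x0800, 6, 4):
        return None
    sha, spa, tha, tpa = struct.unpack("!6s4s6s4s", frame[22:42])
    return {
        "op": ARP_OPS.get(oper, f"unknown({oper})"),
        "eth_dst": fmt_mac(dst),
        "eth_src": fmt_mac(src),
        "sender_ip": fmt_ip(spa),
        "target_ip": fmt_ip(tpa),
        "target_mac": fmt_mac(tha),
    }

def reply_addressed_to(frame, board_mac):
    """True if the frame is an ARP reply whose Ethernet destination is board_mac."""
    info = classify_arp(frame)
    return bool(info) and info["op"] == "reply" and info["eth_dst"] == board_mac.lower()
```

If the replies carry the expected destination MAC and the kernel still keeps retrying, the problem is more likely on the receive side of the board (MAC/PHY configuration or interrupts) than in what the server sends.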

The ethernet entries in my device tree are as follows (I have commented out the
secondary interface to make it as simple as possible):

        address@hidden {
            #address-cells = <1>;
            #size-cells = <0>;
            device_type = "mdio";
            compatible = "gianfar";
            reg = <24520 20>;
            phy0: address@hidden {
                interrupt-parent = <1>;
                interrupts = <a 1>;
                reg = <1>;
                device_type = "ethernet-phy";
                linux,phandle = <2>;
            };
/*
            phy1: address@hidden {
                interrupt-parent = <1>;
                interrupts = <a 1>;
                reg = <0>;
                device_type = "ethernet-phy";
                linux,phandle = <3>;
            };
*/
        };

        address@hidden {
            #address-cells = <1>;
            #size-cells = <0>;
            device_type = "network";
            model = "eTSEC";
            compatible = "gianfar";
            reg = <24000 1000>;
            local-mac-address = [00 00 00 00 00 00];
            interrupts = <1d 2 1e 2 22 2>;
            interrupt-parent = <&mpic>;
            phy-handle = <&phy0>;
        };
/*
        address@hidden {
            #address-cells = <1>;
            #size-cells = <0>;
            device_type = "network";
            model = "TSEC";
            compatible = "gianfar";
            reg = <26000 1000>;
            local-mac-address = [00 00 00 00 00 00];
            interrupts = <1f 2 20 2 21 2>;
            interrupt-parent = <&mpic>;
            phy-handle = <&phy1>;
        };
*/

I found debug statements in U-Boot's common/ft_build.c and enabled them with #define DEBUG, which
prints the following just before the kernel is booted (it looks OK to me):

        address@hidden {
            #address-cells = <1>;
            #size-cells = <0>;
            device_type = "mdio";
            compatible = "gianfar";
            reg = <1>;
            address@hidden {
                interrupt-parent = <1>;
                interrupts = <1>;
                reg = <1>;
                device_type = "ethernet-phy";
                linux,phandle = <2>;
            };
        };
        address@hidden {
            #address-cells = <1>;
            #size-cells = <0>;
            device_type = "network";
            model = "eTSEC";
            compatible = "gianfar";
            reg = <1>;
            local-mac-address = [00 e0 0c 02 00 fd];
            interrupts = [00 00 00 1d 00 00 00 02 00 00 00 1e 00 00 00 02 00 00 00 22 00 00 00 02];
            interrupt-parent = <1>;
            phy-handle = <2>;
        };
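
The interrupts property is printed as a byte string in the dump above but written as cells in the DTS, which makes the two hard to compare by eye. A small sketch like the following decodes the bytes into 32-bit big-endian cells and compares them with the DTS values. The helper names are hypothetical, and it assumes that cells in the DTS are bare hexadecimal, as the values `1d` and `a` suggest.

```python
import struct

def cells_from_bytes(hex_bytes):
    """Decode a space-separated hex byte string into 32-bit big-endian cells."""
    raw = bytes.fromhex(hex_bytes)  # whitespace between bytes is ignored
    if len(raw) % 4:
        raise ValueError("property length is not a multiple of 4 bytes")
    return list(struct.unpack(f">{len(raw) // 4}I", raw))

def cells_from_dts(text):
    """Parse cells written as bare hex numbers, as inside <...> in the DTS."""
    return [int(tok, 16) for tok in text.split()]

blob = cells_from_bytes(
    "00 00 00 1d 00 00 00 02 00 00 00 1e 00 00 00 02 00 00 00 22 00 00 00 02"
)
dts = cells_from_dts("1d 2 1e 2 22 2")

# Group the cells into pairs of two, as they are laid out in the DTS.
pairs = list(zip(blob[0::2], blob[1::2]))
print(blob == dts)  # True
print(pairs)        # [(29, 2), (30, 2), (34, 2)]
```

The decoded cells match the DTS exactly, so the interrupts property for the eTSEC node is reaching the kernel as written. The same comparison cannot be made for the PHY node from this dump, because its interrupts value is shown only as `<1>`.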

So anyway, it's a mystery to me why the kernel doesn't see the ARP responses and continue
mounting the NFS file system.

Any pointers that you can provide would be greatly appreciated!

Thanks!

Chip Webb

