[Qemu-devel] Questions about networking
From: Peter Niessen
Subject: [Qemu-devel] Questions about networking
Date: Tue, 3 Aug 2010 10:13:26 +0200
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.10) Gecko/20100520 SUSE/3.0.5 Thunderbird/3.0.5
Dear List,
I'm trying to set up a testbed for batch systems using qemu-kvm. So far,
I've created two machines, a master ("torque") and an execution host
("mom") for use with torque. I'm using the following command lines to
start up the virtual machines:
qemu-kvm -smp 2 -m 768 -hda ./torque.qcow2 \
  -net nic,vlan=1,macaddr=52:54:00:12:34:56 \
  -net nic,vlan=2,macaddr=52:54:00:12:34:57 \
  -net user,vlan=2 \
  -net socket,vlan=1,listen=localhost:1234 \
  -redir tcp:26022::22 -nographic -daemonize

qemu-kvm -smp 2 -m 768 -hda ./mom.qcow2 \
  -net nic,vlan=1,macaddr=52:54:00:12:34:58 \
  -net socket,vlan=1,connect=localhost:1234 \
  -nographic -daemonize
which I took from http://www.h7.dion.ne.jp/~qemu-win/HowToNetwork-en.html.
Everything works fine: I can see the internet from "mom" via "torque",
NFS-mount the users' home directories from "torque" on "mom", and
resolve users via NIS.
Here's the ifconfig of the nodes:
torque:~ # ifconfig
eth0 Link encap:Ethernet HWaddr 52:54:00:12:34:56
inet addr:192.168.42.250 Bcast:192.168.42.255 Mask:255.255.255.0
inet6 addr: fe80::5054:ff:fe12:3456/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:707 errors:0 dropped:0 overruns:0 frame:0
TX packets:1873 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:44388 (43.3 Kb) TX bytes:2539091 (2.4 Mb)
Interrupt:11 Base address:0x2000
eth1 Link encap:Ethernet HWaddr 52:54:00:12:34:57
inet addr:10.0.2.15 Bcast:10.0.2.255 Mask:255.255.255.0
inet6 addr: fe80::5054:ff:fe12:3457/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:69 errors:0 dropped:0 overruns:0 frame:0
TX packets:88 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:7837 (7.6 Kb) TX bytes:13548 (13.2 Kb)
Interrupt:10 Base address:0xc000
And "mom":
mom:~ # ifconfig
eth0 Link encap:Ethernet HWaddr 52:54:00:12:34:58
inet addr:192.168.42.1 Bcast:192.168.42.255 Mask:255.255.255.0
inet6 addr: fe80::5054:ff:fe12:3458/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:1888 errors:0 dropped:0 overruns:0 frame:0
TX packets:752 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:2514373 (2.3 Mb) TX bytes:60325 (58.9 Kb)
Interrupt:11 Base address:0x2000
The ping times between the servers are the following:
torque:~ # ping mom
PING mom.qemu (192.168.42.1) 56(84) bytes of data.
64 bytes from mom.qemu (192.168.42.1): icmp_seq=1 ttl=64 time=39.6 ms
64 bytes from mom.qemu (192.168.42.1): icmp_seq=2 ttl=64 time=39.4 ms
64 bytes from mom.qemu (192.168.42.1): icmp_seq=3 ttl=64 time=39.7 ms
64 bytes from mom.qemu (192.168.42.1): icmp_seq=4 ttl=64 time=39.8 ms
64 bytes from mom.qemu (192.168.42.1): icmp_seq=5 ttl=64 time=39.8 ms
64 bytes from mom.qemu (192.168.42.1): icmp_seq=6 ttl=64 time=39.8 ms
64 bytes from mom.qemu (192.168.42.1): icmp_seq=7 ttl=64 time=39.8 ms
Do these times make sense?
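To make the question concrete, the round-trip times can be summarized with a short script. This is only a sketch: the sample text is copied from the ping output above, and the names PING_OUTPUT and rtt_values are made up for illustration.

```python
import re
import statistics

# Ping replies copied from the output above (torque -> mom).
PING_OUTPUT = """\
64 bytes from mom.qemu (192.168.42.1): icmp_seq=1 ttl=64 time=39.6 ms
64 bytes from mom.qemu (192.168.42.1): icmp_seq=2 ttl=64 time=39.4 ms
64 bytes from mom.qemu (192.168.42.1): icmp_seq=3 ttl=64 time=39.7 ms
64 bytes from mom.qemu (192.168.42.1): icmp_seq=4 ttl=64 time=39.8 ms
64 bytes from mom.qemu (192.168.42.1): icmp_seq=5 ttl=64 time=39.8 ms
64 bytes from mom.qemu (192.168.42.1): icmp_seq=6 ttl=64 time=39.8 ms
64 bytes from mom.qemu (192.168.42.1): icmp_seq=7 ttl=64 time=39.8 ms
"""


def rtt_values(text):
    """Return every 'time=<n> ms' value in the ping output as a float."""
    return [float(value) for value in re.findall(r"time=([\d.]+) ms", text)]


rtts = rtt_values(PING_OUTPUT)
rtt_min = min(rtts)
rtt_avg = statistics.mean(rtts)
rtt_max = max(rtts)
print(f"replies={len(rtts)} min={rtt_min:.1f} avg={rtt_avg:.1f} max={rtt_max:.1f} ms")
```

The values are tightly clustered just under 40 ms, so the latency looks like a constant delay rather than jitter.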
However, batch operations are not working properly. Jobs start fine and
produce the right output, but when it comes to tidying up, the "mom"
machine can't contact the "torque" machine:
Aug 3 10:10:26 mom pbs_mom: LOG_ERROR::Operation now in progress (115)
in scan_for_exiting, cannot connect to port 1023 in client_to_svr -
connection refused
Aug 3 10:10:27 mom pbs_mom: LOG_ERROR::Operation now in progress (115)
in scan_for_exiting, cannot connect to port 1023 in client_to_svr -
connection refused
Aug 3 10:10:28 mom pbs_mom: LOG_ERROR::Operation now in progress (115)
in scan_for_exiting, cannot connect to port 1023 in client_to_svr -
connection refused
Aug 3 10:10:29 mom pbs_mom: LOG_ERROR::Operation now in progress (115)
in scan_for_exiting, cannot connect to port 1023 in client_to_svr -
connection refused
Aug 3 10:10:29 mom pbs_mom: LOG_ERROR::Operation now in progress (115)
in scan_for_exiting, cannot connect to port 1023 in client_to_svr -
connection refused
Aug 3 10:10:29 mom pbs_mom: LOG_ERROR::Operation now in progress (115)
in scan_for_exiting, cannot connect to port 1023 in client_to_svr -
connection refused
Aug 3 10:10:30 mom pbs_mom: LOG_ERROR::Operation now in progress (115)
in scan_for_exiting, cannot connect to port 1023 in client_to_svr -
connection refused
Aug 3 10:10:31 mom pbs_mom: LOG_ERROR::Operation now in progress (115)
in scan_for_exiting, cannot connect to port 1023 in client_to_svr -
connection refused
Aug 3 10:10:32 mom pbs_mom: LOG_ERROR::Operation now in progress (115)
in scan_for_exiting, cannot connect to port 1023 in client_to_svr -
connection refused
Aug 3 10:10:33 mom pbs_mom: LOG_ERROR::Operation now in progress (115)
in scan_for_exiting, cannot connect to port 1023 in client_to_svr -
connection refused
Aug 3 10:10:34 mom pbs_mom: LOG_ERROR::Operation now in progress (115)
in scan_for_exiting, cannot connect to port 1023 in client_to_svr -
connection refused
At this time, tcpdump on the "torque" machine says:
10:10:17.072582 IP mom.qemu.1023 > torque.qemu.pbs: Flags [S], seq
25915729, win 5840, options [mss 1460,sackOK,TS val 719328 ecr
0,nop,wscale 6], length 0
10:10:17.072647 IP torque.qemu.pbs > mom.qemu.1023: Flags [S.], seq
18959859, ack 25915730, win 5792, options [mss 1460,sackOK,TS val 756722
ecr 719328,nop,wscale 6], length 0
10:10:17.152568 IP mom.qemu.1023 > torque.qemu.pbs: Flags [R], seq
25915730, win 0, length 0
10:10:18.084234 IP mom.qemu.1023 > torque.qemu.pbs: Flags [S], seq
41724490, win 5840, options [mss 1460,sackOK,TS val 720340 ecr
0,nop,wscale 6], length 0
10:10:18.084297 IP torque.qemu.pbs > mom.qemu.1023: Flags [S.], seq
34766899, ack 41724491, win 5792, options [mss 1460,sackOK,TS val 757734
ecr 720340,nop,wscale 6], length 0
10:10:18.163568 IP mom.qemu.1023 > torque.qemu.pbs: Flags [R], seq
41724491, win 0, length 0
10:10:19.095909 IP mom.qemu.1023 > torque.qemu.pbs: Flags [S], seq
57533379, win 5840, options [mss 1460,sackOK,TS val 721352 ecr
0,nop,wscale 6], length 0
10:10:19.095947 IP torque.qemu.pbs > mom.qemu.1023: Flags [S.], seq
50574033, ack 57533380, win 5792, options [mss 1460,sackOK,TS val 758745
ecr 721352,nop,wscale 6], length 0
10:10:19.175628 IP mom.qemu.1023 > torque.qemu.pbs: Flags [R], seq
57533380, win 0, length 0
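The timing of the trace above can be extracted with a small script. This is a minimal sketch, not a general tcpdump parser: the lines are copied from the capture with the TCP options trimmed, and the names TRACE, parse and synack_to_rst_delays_ms are hypothetical. It measures, on "torque", how long after each SYN-ACK the RST from "mom" arrives.

```python
import re

# Lines copied from the tcpdump capture above, TCP options trimmed.
TRACE = """\
10:10:17.072582 IP mom.qemu.1023 > torque.qemu.pbs: Flags [S], seq 25915729
10:10:17.072647 IP torque.qemu.pbs > mom.qemu.1023: Flags [S.], seq 18959859, ack 25915730
10:10:17.152568 IP mom.qemu.1023 > torque.qemu.pbs: Flags [R], seq 25915730
10:10:18.084234 IP mom.qemu.1023 > torque.qemu.pbs: Flags [S], seq 41724490
10:10:18.084297 IP torque.qemu.pbs > mom.qemu.1023: Flags [S.], seq 34766899, ack 41724491
10:10:18.163568 IP mom.qemu.1023 > torque.qemu.pbs: Flags [R], seq 41724491
10:10:19.095909 IP mom.qemu.1023 > torque.qemu.pbs: Flags [S], seq 57533379
10:10:19.095947 IP torque.qemu.pbs > mom.qemu.1023: Flags [S.], seq 50574033, ack 57533380
10:10:19.175628 IP mom.qemu.1023 > torque.qemu.pbs: Flags [R], seq 57533380
"""

LINE_RE = re.compile(r"^(\d+):(\d+):(\d+\.\d+) IP \S+ > \S+: Flags \[([^\]]+)\]")


def parse(trace):
    """Return (seconds_since_midnight, flags) for each recognized line."""
    events = []
    for line in trace.splitlines():
        match = LINE_RE.match(line)
        if match:
            hours, minutes, seconds, flags = match.groups()
            timestamp = int(hours) * 3600 + int(minutes) * 60 + float(seconds)
            events.append((timestamp, flags))
    return events


def synack_to_rst_delays_ms(events):
    """Pair each SYN-ACK ('S.') with the next RST ('R'); return delays in ms."""
    delays = []
    synack_time = None
    for timestamp, flags in events:
        if flags == "S.":
            synack_time = timestamp
        elif flags == "R" and synack_time is not None:
            delays.append((timestamp - synack_time) * 1000.0)
            synack_time = None
    return delays


delays = synack_to_rst_delays_ms(parse(TRACE))
for delay in delays:
    print(f"SYN-ACK -> RST: {delay:.1f} ms")
```

For the three attempts shown, the RST arrives roughly 80 ms after the SYN-ACK, about twice the ping round-trip time measured earlier. In every case "torque" does answer the SYN, and the reset comes from the "mom" side.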
netstat says:
torque:~ # netstat | grep 1023
tcp 0 0 torque.qemu:1023 mom.qemu:pbs_mom TIME_WAIT
tcp 0 0 torque.qemu:1023 mom.qemu:pbs_mom TIME_WAIT
Might the performance of my internal network connection
(192.168.42.0/24) be insufficient?
Thanks for your help,
Cheers, Peter.
------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------
Forschungszentrum Juelich GmbH
52425 Juelich
Sitz der Gesellschaft: Juelich
Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498
Vorsitzender des Aufsichtsrats: MinDirig Dr. Karl Eugen Huthmacher
Geschaeftsfuehrung: Prof. Dr. Achim Bachem (Vorsitzender),
Dr. Ulrich Krafft (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt,
Prof. Dr. Sebastian M. Schmidt
------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------