[WBEL-users] kswapd issues, on latest whitebox updates.
Mark Reynolds
mark@reynolds.net.au
Mon, 1 Mar 2004 23:47:56 +0800
I have been trying to solve a problem where the load average
sits permanently at 1.00 whenever free memory drops below
10 meg, even though CPU usage is low. kswapd seems to get
into a never-ending loop. This is on a stock standard squid
server with heaps of memory (640 meg).
A well-respected kernel hacker has asked me:
"Does whiteboxlinux have exactly the same config options
as the rhel3 kernel?"
I'm not even sure that is a question that we will be able
to get an answer to, as it seems possible to me that
redhat could publish their sources, and not the compile
time options.
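
One way to approach the config question, as a rough sketch: Red Hat style
kernel packages normally install their build configuration as
/boot/config-<version>, so two such files can be compared directly. The
function name config_diff and the second path in the usage line are
placeholders, and I am assuming both files use the usual "CONFIG_FOO=y" and
"# CONFIG_FOO is not set" line format.

```shell
# Sketch: list kernel config options that differ between two config files.
# Assumes the usual "CONFIG_FOO=y" / "# CONFIG_FOO is not set" line format.
config_diff() {
    tmp_a=$(mktemp) || return 1
    tmp_b=$(mktemp) || { rm -f "$tmp_a"; return 1; }
    # Keep only option lines and sort them so ordering differences are ignored.
    grep -E '^(CONFIG_|# CONFIG_)' "$1" | sort > "$tmp_a"
    grep -E '^(CONFIG_|# CONFIG_)' "$2" | sort > "$tmp_b"
    # diff exits 0 when the option sets match, 1 when they differ.
    diff "$tmp_a" "$tmp_b"
    status=$?
    rm -f "$tmp_a" "$tmp_b"
    return $status
}

# Usage (second path is a placeholder for a config taken from a RHEL3 box):
#   config_diff "/boot/config-$(uname -r)" /path/to/rhel3-config
```

No output and exit status 0 would suggest the two kernels were built with the
same options.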
I have attached a fuller outline below, in case anyone
else using whitebox can reproduce this problem, or wants to
try. I would appreciate some help on this, as my sort of
setup is not unusual at all, which worries me a little.
--
Mark Reynolds
Managing Director Reynolds Technology Pty Ltd
Support 1902 291 089 http://www.reynolds.net.au
Admin 08 9474 1211 mailto:mark@reynolds.net.au
Fax 08 9474 9592 PO Box 945 South Perth 6951 WA
Sales 1300 656 424 19 Lyall St South Perth 6151 WA
ABN 73 078 831 740 ACN 078 831 740
The most obvious fault, reproducible on a daily
basis, is on my latest squid proxy server, serving a small
ISP. Squid itself is nothing new to me, as I have been using
it on redhat servers for years.
Anyway, this is a whitebox dist server, running all the
latest updates that have been published.
2.4.21 kernel, with very standard squid config. 2 IDE drives,
one for boot & logs, other for cache. Both seem to be in
DMA mode OK. 640meg ram, of which squid consumes 110 meg
at the moment, but is growing. While I know squid is a
resource hog, I have the same setup on much lesser hardware.
This is a stock standard Intel 1RU rack-mounted server, with an
Intel CPU (the dmesg output below reports a Celeron (Coppermine)).
Anyway, the problem is that after a few or many hours, or maybe
even a day, the load average goes up from about 0.1 to 1.00
and stays there until squid is stopped and restarted.
The trigger seems to be when free memory goes below 10 meg.
Sometimes, when free memory goes back above 10 meg, the load
goes down.
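
To check whether the load really tracks the 10 meg threshold, something like
the following could log both values once a minute. This is only a sketch:
the function name sample_mem_load and the log path in the usage line are
placeholders, and it assumes /proc/meminfo has a "MemFree:" line in kB and
that date supports +%s.

```shell
# Print one sample line: "<epoch seconds> <free kB> <1-minute load average>".
# Optional arguments point at saved copies of the /proc files.
sample_mem_load() {
    meminfo=${1:-/proc/meminfo}
    loadavg=${2:-/proc/loadavg}
    # MemFree is reported in kB in the second field.
    free_kb=$(awk '/^MemFree:/ { print $2 }' "$meminfo")
    # The first field of /proc/loadavg is the 1-minute load average.
    load1=$(cut -d' ' -f1 "$loadavg")
    echo "$(date +%s) $free_kb $load1"
}

# Usage (log path is a placeholder):
#   while true; do sample_mem_load >> /tmp/memload.log; sleep 60; done
```

Lining the log up against the times the load sticks at 1.00 should show
whether free memory dropping under roughly 10240 kB comes first.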
mjt spotted that "ps axf | grep D" always shows kswapd in the
D (uninterruptible sleep) state. Tasks in that state count
towards the load average, so kswapd seems to be the cause of
the problem.
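
A plain "grep D" also matches the header line and anything with a capital D
in its command line, such as "squid -D". A slightly tighter filter, as a
sketch, looks only at the STAT column. The function name dstate_only is a
placeholder, and it assumes the "PID TTY STAT TIME COMMAND" layout that
ps ax prints here.

```shell
# Keep the header line plus any process whose STAT column starts with D.
# Reads "ps ax" style output (PID TTY STAT TIME COMMAND) on stdin.
dstate_only() {
    awk 'NR == 1 || $3 ~ /^D/'
}

# Usage:
#   ps ax | dstate_only
```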
squid does not take the server into harddisk swap, as there
is heaps of memory available.
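
For what it is worth, most of the "used" memory in the top output below is
buffers and page cache rather than process memory: 7480k free + 112696k buff
+ 379512k cached comes to 499688k. A small sketch that does the same sum from
/proc/meminfo follows. The function name available_kb is a placeholder, and
the result is only a rough upper bound, since not all cache can be reclaimed
at any given moment.

```shell
# Sum MemFree + Buffers + Cached (in kB) from a meminfo-style file.
# The anchored pattern does not match SwapCached.
available_kb() {
    awk '/^(MemFree|Buffers|Cached):/ { sum += $2 } END { print sum + 0 }' \
        "${1:-/proc/meminfo}"
}

# Usage:
#   available_kb
```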
I understand that many of the kswapd options are tunable,
but I have no idea at all what I should try.
I found this page, and thought I'd give its settings a try.
# cat kswapd.sh
#!/bin/bash
# http://www.sonarnerd.net/projects/linux/
echo "50 256 0 0 500 3000 75 25 0" > /proc/sys/vm/bdflush
echo "256 16 4" > /proc/sys/vm/kswapd
echo "1 10 75" > /proc/sys/vm/pagecache
echo "0 0" > /proc/sys/vm/pagetable_cache
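
Before overwriting these values it is probably worth recording the current
ones, so they can be put back if the new settings make things worse. A
minimal sketch, assuming the same four files as the script above; the
function name save_vm_tunables and the output path in the usage line are
placeholders, and files this kernel does not expose are simply skipped.

```shell
# Print echo commands that would restore the current vm tunables.
# Optional argument: directory holding the tunables (default /proc/sys/vm).
save_vm_tunables() {
    vm_dir=${1:-/proc/sys/vm}
    for name in bdflush kswapd pagecache pagetable_cache; do
        f="$vm_dir/$name"
        # Skip knobs this kernel does not expose.
        [ -r "$f" ] || continue
        # The unquoted $(cat ...) collapses tabs and newlines to single spaces.
        printf 'echo "%s" > %s\n' "$(echo $(cat "$f"))" "$f"
    done
}

# Usage (output path is a placeholder):
#   save_vm_tunables > /root/vm-restore.sh
```

Running the saved file with sh as root should then restore the old values.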
Should I wait for the next kernel version?
The thing is, my setup is pretty boring, and yet these
issues only seemed to start for me with redhat9, and
whitebox/RHEL type servers. Everything redhat 7 and
before was always rock solid for me. So I'm wondering
if I'm doing something wrong, or why the redhat 2.4 kernels
have gone flaky under high load / high memory use
of late?
Thanks for any direction, thoughts, or suggestions you
may be able to provide.
--
Mark Reynolds
# hdparm /dev/hda
/dev/hda:
multcount = 16 (on)
IO_support = 0 (default 16-bit)
unmaskirq = 0 (off)
using_dma = 1 (on)
keepsettings = 0 (off)
readonly = 0 (off)
readahead = 8 (on)
geometry = 621/128/63, sectors = 5008500, start = 0
[root@proxy root]# hdparm /dev/hdc
/dev/hdc:
multcount = 16 (on)
IO_support = 0 (default 16-bit)
unmaskirq = 0 (off)
using_dma = 1 (on)
keepsettings = 0 (off)
readonly = 0 (off)
readahead = 8 (on)
geometry = 4976/128/63, sectors = 40132503, start = 0
[root@proxy root]#
# df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/hda3 1559620 1023944 456448 70% /
/dev/hda1 97570 23770 68762 26% /boot
/dev/hdc1 19748292 8737044 11011248 45% /cache
none 321276 0 321276 0% /dev/shm
# ps axf | grep D
PID TTY STAT TIME COMMAND
5 ? DW 0:01 [kswapd]
14327 pts/1 S 0:00 | \_ grep D
14290 ? S 0:00 squid -D
14292 ? S 1:39 \_ (squid) -D
# top
13:46:11 up 5 days, 19:29, 3 users, load average: 1.04, 0.99, 0.75
35 processes: 33 sleeping, 2 running, 0 zombie, 0 stopped
CPU states: cpu user nice system irq softirq iowait idle
total 2.5% 0.0% 1.1% 0.1% 0.9% 0.3% 94.6%
Mem: 642552k av, 635072k used, 7480k free, 0k shrd, 112696k buff
462964k actv, 112972k in_d, 3228k in_c
Swap: 818488k av, 6304k used, 812184k free 379512k cached
PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU COMMAND
14292 squid 15 0 86948 84M 2000 S 3.3 13.5 1:38 0 squid
2946 root 15 0 572 308 192 R 0.1 0.0 0:03 0 sshd
1 root 15 0 112 84 60 S 0.0 0.0 0:04 0 init
2 root 15 0 0 0 0 SW 0.0 0.0 0:00 0 keventd
3 root 15 0 0 0 0 SW 0.0 0.0 0:00 0 kapmd
4 root 34 19 0 0 0 SWN 0.0 0.0 0:00 0 ksoftirqd/0
7 root 25 0 0 0 0 SW 0.0 0.0 0:00 0 bdflush
5 root 15 0 0 0 0 DW 0.0 0.0 0:01 0 kswapd
6 root 15 0 0 0 0 SW 0.0 0.0 0:00 0 kscand
8 root 15 0 0 0 0 SW 0.0 0.0 0:00 0 kupdated
9 root 25 0 0 0 0 SW 0.0 0.0 0:00 0 mdrecoveryd
13 root 15 0 0 0 0 SW 0.0 0.0 0:01 0 kjournald
69 root 25 0 0 0 0 SW 0.0 0.0 0:00 0 khubd
860 root 15 0 0 0 0 SW 0.0 0.0 0:00 0 kjournald
888 root 15 0 0 0 0 SW 0.0 0.0 0:04 0 kjournald
1496 root 15 0 212 180 132 S 0.0 0.0 0:00 0 syslogd
1500 root 15 0 52 4 0 S 0.0 0.0 0:00 0 klogd
1623 root 16 0 240 104 0 S 0.0 0.0 0:01 0 sshd
1646 root 15 0 1064 532 384 S 0.0 0.0 0:00 0 sendmail
1655 smmsp 15 0 788 264 256 S 0.0 0.0 0:00 0 sendmail
1665 root 15 0 160 144 92 S 0.0 0.0 0:00 0 crond
1687 root 21 0 52 4 0 S 0.0 0.0 0:00 0 mingetty
1744 root 15 0 404 4 0 S 0.0 0.0 0:02 0 sshd
1746 root 15 0 232 4 0 S 0.0 0.0 0:02 0 bash
1797 root 15 0 48 4 0 S 0.0 0.0 0:00 0 mingetty
2948 root 15 0 752 692 520 S 0.0 0.1 0:00 0 bash
7308 root 15 0 988 720 100 S 0.0 0.1 0:12 0 zebra
7311 root 15 0 676 372 228 S 0.0 0.0 0:35 0 ospfd
7317 root 15 0 1948 988 224 S 0.0 0.1 0:08 0 bgpd
9668 root 15 0 440 216 60 S 0.0 0.0 0:00 0 sshd
9670 root 15 0 240 156 4 S 0.0 0.0 0:00 0 bash
14290 root 24 0 1620 1620 1328 S 0.0 0.2 0:00 0 squid
14294 squid 25 0 272 272 228 S 0.0 0.0 0:00 0 unlinkd
14295 squid 15 0 572 572 492 S 0.0 0.0 0:03 0 diskd
# ps axf
PID TTY STAT TIME COMMAND
1 ? S 0:04 init
2 ? SW 0:00 [keventd]
3 ? SW 0:00 [kapmd]
4 ? SWN 0:00 [ksoftirqd/0]
7 ? SW 0:00 [bdflush]
5 ? RW 0:01 [kswapd]
6 ? SW 0:00 [kscand]
8 ? SW 0:00 [kupdated]
9 ? SW 0:00 [mdrecoveryd]
13 ? SW 0:01 [kjournald]
69 ? SW 0:00 [khubd]
860 ? SW 0:00 [kjournald]
888 ? SW 0:04 [kjournald]
1496 ? S 0:00 syslogd -m 0
1500 ? S 0:00 klogd -x
1623 ? S 0:01 /usr/sbin/sshd
1744 ? S 0:02 \_ sshd: root@pts/0
1746 pts/0 S 0:02 | \_ -bash
2946 ? S 0:03 \_ sshd: root@pts/1
2948 pts/1 S 0:00 | \_ -bash
14332 pts/1 R 0:00 | \_ ps axf
9668 ? S 0:00 \_ sshd: root@pts/2
9670 pts/2 S 0:00 \_ -bash
1646 ? S 0:00 sendmail: accepting connections
1655 ? S 0:00 sendmail: Queue runner@01:00:00 for /var/spool/clientmqueue
1665 ? S 0:00 crond
1687 tty2 S 0:00 /sbin/mingetty tty2
1797 tty1 S 0:00 /sbin/mingetty tty1
7308 ? S 0:12 zebra -d
7311 ? S 0:36 ospfd -d
7317 ? S 0:08 bgpd -d
14290 ? S 0:00 squid -D
14292 ? S 1:55 \_ (squid) -D
14294 ? S 0:00 \_ (unlinkd)
14295 ? S 0:04 \_ diskd 14635008 14635009 14635010
[root@proxy root]# dmesg
Linux version 2.4.21-9.0.1.EL (buildsys@builder) (gcc version 3.2.3 20030502 (Red Hat Linux 3.2.3-20)) #1 Thu Feb 19 20:23:14 CST 2004
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 0000000027fc0000 (usable)
BIOS-e820: 0000000027fc0000 - 0000000027ff8000 (ACPI data)
BIOS-e820: 0000000027ff8000 - 0000000028000000 (ACPI NVS)
BIOS-e820: 00000000fec00000 - 00000000fec01000 (reserved)
BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
BIOS-e820: 00000000fff00000 - 0000000100000000 (reserved)
0MB HIGHMEM available.
639MB LOWMEM available.
On node 0 totalpages: 163776
zone(0): 4096 pages.
zone(1): 159680 pages.
zone(2): 0 pages.
Kernel command line: ro root=LABEL=/
Initializing CPU#0
Detected 534.559 MHz processor.
Console: colour VGA+ 80x25
Calibrating delay loop... 1064.96 BogoMIPS
Memory: 640048k/655104k available (1526k kernel code, 12492k reserved, 1089k data, 164k init, 0k highmem)
zapping low mappings.
Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes)
Inode cache hash table entries: 65536 (order: 7, 524288 bytes)
Mount cache hash table entries: 512 (order: 0, 4096 bytes)
Buffer cache hash table entries: 65536 (order: 6, 262144 bytes)
Page-cache hash table entries: 262144 (order: 8, 1048576 bytes)
CPU: L1 I cache: 16K, L1 D cache: 16K
CPU: L2 cache: 128K
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU: After generic, caps: 0383f9ff 00000000 00000000 00000000
CPU: Common caps: 0383f9ff 00000000 00000000 00000000
CPU: Intel Celeron (Coppermine) stepping 03
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Checking 'hlt' instruction... OK.
POSIX conformance testing by UNIFIX
Process timing init...done.
mtrr: v1.40 (20010327) Richard Gooch (rgooch@atnf.csiro.au)
mtrr: detected mtrr type: Intel
PCI: PCI BIOS revision 2.10 entry at 0xfda95, last bus=0
PCI: Using configuration type 1
PCI: Probing PCI hardware
PCI: Using IRQ router PIIX [8086/7110] at 00:07.0
Limiting direct PCI/PCI transfers.
isapnp: Scanning for PnP cards...
isapnp: No Plug & Play device found
Linux NET4.0 for Linux 2.4
Based upon Swansea University Computer Society NET3.039
Initializing RT netlink socket
apm: BIOS version 1.2 Flags 0x0b (Driver version 1.16)
Total HugeTLB memory allocated, 0
Starting kswapd
VFS: Disk quotas vdquot_6.5.1
aio_setup: num_physpages = 40944
aio_setup: sizeof(struct page) = 56
Hugetlbfs mounted.
Detected PS/2 Mouse Port.
pty: 2048 Unix98 ptys configured
Serial driver version 5.05c (2001-07-08) with MANY_PORTS MULTIPORT SHARE_IRQ SERIAL_PCI ISAPNP enabled
ttyS0 at 0x03f8 (irq = 4) is a 16550A
ttyS1 at 0x02f8 (irq = 3) is a 16550A
Real Time Clock Driver v1.10e
NET4: Frame Diverter 0.46
RAMDISK driver initialized: 256 RAM disks of 8192K size 1024 blocksize
Uniform Multi-Platform E-IDE driver Revision: 7.00beta4-2.4
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
PIIX4: IDE controller at PCI slot 00:07.1
PIIX4: chipset revision 1
PIIX4: not 100% native mode: will probe irqs later
ide0: BM-DMA at 0xffa0-0xffa7, BIOS settings: hda:DMA, hdb:pio
ide1: BM-DMA at 0xffa8-0xffaf, BIOS settings: hdc:DMA, hdd:pio
hda: QUANTUM FIREBALL EL2.5A, ATA DISK drive
blk: queue c0417f00, I/O limit 4095Mb (mask 0xffffffff)
hdc: QUANTUM FIREBALLP AS20.5, ATA DISK drive
blk: queue c04183cc, I/O limit 4095Mb (mask 0xffffffff)
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
ide1 at 0x170-0x177,0x376 on irq 15
hda: attached ide-disk driver.
hda: task_no_data_intr: status=0x51 { DriveReady SeekComplete Error }
hda: task_no_data_intr: error=0x04 { DriveStatusError }
hda: 5008500 sectors (2564 MB) w/418KiB Cache, CHS=621/128/63, UDMA(33)
hdc: attached ide-disk driver.
hdc: host protected area => 1
hdc: 40132503 sectors (20548 MB) w/1902KiB Cache, CHS=39813/16/63, UDMA(33)
ide-floppy driver 0.99.newide
Partition check:
hda: hda1 hda2 hda3
hdc: [PTBL] [4976/128/63] hdc1
ide-floppy driver 0.99.newide
md: md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
Initializing Cryptographic API
NET4: Linux TCP/IP 1.0 for NET4.0
IP: routing cache hash table of 8192 buckets, 64Kbytes
TCP: Hash tables configured (established 262144 bind 65536)
Linux IP multicast router 0.06 plus PIM-SM
Initializing IPsec netlink socket
NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
RAMDISK: Compressed image found at block 0
Freeing initrd memory: 161k freed
VFS: Mounted root (ext2 filesystem).
Journalled Block Device driver loaded
kjournald starting. Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
Freeing unused kernel memory: 164k freed
usb.c: registered new driver usbdevfs
usb.c: registered new driver hub
usb-uhci.c: $Revision: 1.275 $ time 20:39:53 Feb 19 2004
usb-uhci.c: High bandwidth mode enabled
PCI: Found IRQ 10 for device 00:07.2
usb-uhci.c: USB UHCI at I/O 0xef80, IRQ 10
usb-uhci.c: Detected 2 ports
usb.c: new USB bus registered, assigned bus number 1
hub.c: USB hub found
hub.c: 2 ports detected
usb-uhci.c: v1.275:USB Universal Host Controller Interface driver
usb.c: registered new driver hiddev
usb.c: registered new driver hid
hid-core.c: v1.8.1 Andreas Gal, Vojtech Pavlik <vojtech@suse.cz>
hid-core.c: USB HID support drivers
mice: PS/2 mouse device common for all mice
EXT3 FS 2.4-0.9.19, 19 August 2002 on ide0(3,3), internal journal
Adding Swap: 818488k swap-space (priority -1)
kjournald starting. Commit interval 5 seconds
EXT3 FS 2.4-0.9.19, 19 August 2002 on ide0(3,1), internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
EXT3 FS 2.4-0.9.19, 19 August 2002 on ide1(22,1), internal journal
EXT3-fs: mounted filesystem with ordered data mode.
IA-32 Microcode Update Driver: v1.11 <tigran@veritas.com>
SCSI subsystem driver Revision: 1.00
inserting floppy driver for 2.4.21-9.0.1.EL
Floppy drive(s): fd0 is 1.44M
FDC 0 is a post-1991 82077
Intel(R) PRO/100 Network Driver - version 2.3.30-k1
Copyright (c) 2003 Intel Corporation
PCI: Found IRQ 5 for device 00:0c.0
PCI: Sharing IRQ 5 with 00:0d.0
divert: allocating divert_blk for eth0
e100: selftest OK.
e100: eth0: Intel(R) PRO/100 Network Connection
Hardware receive checksums enabled
cpu cycle saver enabled
PCI: Found IRQ 5 for device 00:0d.0
PCI: Sharing IRQ 5 with 00:0c.0
divert: allocating divert_blk for eth1
e100: selftest OK.
e100: eth1: Intel(R) PRO/100 Network Connection
Hardware receive checksums enabled
cpu cycle saver enabled
divert: freeing divert_blk for eth0
divert: freeing divert_blk for eth1
ip_tables: (C) 2000-2002 Netfilter core team
Intel(R) PRO/100 Network Driver - version 2.3.30-k1
Copyright (c) 2003 Intel Corporation
PCI: Found IRQ 5 for device 00:0c.0
PCI: Sharing IRQ 5 with 00:0d.0
divert: allocating divert_blk for eth0
e100: selftest OK.
e100: eth0: Intel(R) PRO/100 Network Connection
Hardware receive checksums enabled
cpu cycle saver enabled
PCI: Found IRQ 5 for device 00:0d.0
PCI: Sharing IRQ 5 with 00:0c.0
divert: allocating divert_blk for eth1
e100: selftest OK.
e100: eth1: Intel(R) PRO/100 Network Connection
Hardware receive checksums enabled
cpu cycle saver enabled
ip_tables: (C) 2000-2002 Netfilter core team
e100: eth0 NIC Link is Up 10 Mbps Half duplex
ip_tables: (C) 2000-2002 Netfilter core team
ip_conntrack version 2.1 (5118 buckets, 40944 max) - 304 bytes per conntrack