[WBEL-users] SCSI tape drive woes

Bob Ramstad rramstad@alum.mit.edu
Thu, 19 Aug 2004 16:46:44 -0700


I've got a bit of an odd one here.

One WBEL3 system at home is running an LSI Logic SCSI board with a
fairly recent Seagate Scorpion DDS-4 autochanger.  Getting it to work
was trivial...  I had to enable multi LUN support, and put

alias scsi_hostadapter sym53c8xx

into modules.conf and then run mkinitrd, but once I did that and
installed mtx, I was off to the races and had no problems loading
tapes, dumping files to them, and all that good stuff.

Oddly, the Seagate Scorpion doesn't support DDS-1 60 meter tapes, so I
wanted to hook up an old "Archive Python" DAT to a WBEL3 system here
at work.  I have an ASUS PCI-SC200 PCI board, a cable and an active
terminator which I've used with the drive before.  I added

alias scsi_hostadapter ncr53c8xx

into modules.conf and ran mkinitrd.  So far so good: on reboot,
/proc/scsi/scsi showed the drive correctly

[root@johnny-cash root]# cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 05 Lun: 00
  Vendor: ARCHIVE  Model: Python 01931-XXX Rev: 5.63
  Type:   Sequential-Access                ANSI SCSI revision: 02

and mt was able to report basic drive status

[root@johnny-cash root]# mt -f /dev/st0 status
SCSI 2 tape drive:
File number=0, block number=0, partition=0.
Tape block size 512 bytes. Density code 0x13 (DDS (61000 bpi)).
Soft error count since last status=0
General status bits on (41010000):
 BOT ONLINE IM_REP_EN
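
For anyone who wants to check the status word by hand, here is a
minimal Python sketch of the decoding.  It assumes the bit values of
the GMT_* macros in the Linux <linux/mtio.h> header; the table and the
decode_gmt name are only for illustration and are not part of mt.

```python
# Flag masks assumed from the GMT_* macros in <linux/mtio.h>.
GMT_BITS = [
    (0x80000000, "EOF"),
    (0x40000000, "BOT"),
    (0x20000000, "EOT"),
    (0x10000000, "SM"),
    (0x08000000, "EOD"),
    (0x04000000, "WR_PROT"),
    (0x01000000, "ONLINE"),
    (0x00800000, "D_6250"),
    (0x00400000, "D_1600"),
    (0x00200000, "D_800"),
    (0x00040000, "DR_OPEN"),
    (0x00010000, "IM_REP_EN"),
    (0x00008000, "CLN"),
]


def decode_gmt(status):
    """Return the names of the flags set in an mt general status word."""
    return [name for mask, name in GMT_BITS if status & mask]


# The status word from the mt output above.
print(" ".join(decode_gmt(0x41010000)))  # BOT ONLINE IM_REP_EN
```

In other words, the drive reports itself online and at the beginning
of the tape, with immediate report mode enabled and no write-protect
or door-open flags set.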

So, before trying to read my old tapes, I tossed in a new blank 60 m
DDS-1 tape and tried to write to it, and immediately got massive
errors:

Aug 19 16:07:02 johnny-cash kernel: ncr53c810-0:5: ERROR (a0:0) (8-0-0) (0/3) @ (mem d139448:80080000).
Aug 19 16:07:02 johnny-cash kernel: ncr53c810-0: regdump: da 00 00 03 47 00 05 1f 75 08 05 00 80 00 0f 02.
Aug 19 16:07:02 johnny-cash kernel: ncr53c810-0: have to clear fifos.
Aug 19 16:07:04 johnny-cash kernel: Attached scsi tape st0 at scsi0, channel 0, id 5, lun 0
Aug 19 16:07:35 johnny-cash kernel: ncr53c810-0-<5,*>: FAST-10 SCSI 6.7 MB/s (150 ns, offset 8)
Aug 19 16:07:35 johnny-cash kernel: ncr53c810-0:5: ERROR (a0:0) (3-a7-0) (48/13) @ (script 1b8:6a5e0000).
Aug 19 16:07:35 johnny-cash kernel: ncr53c810-0: script cmd = 785d8400
Aug 19 16:07:35 johnny-cash kernel: ncr53c810-0: regdump: da 10 80 13 47 48 05 1f 02 03 05 a7 80 01 07 00.
Aug 19 16:07:35 johnny-cash kernel: ncr53c810-0: have to clear fifos.
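
As a sanity check on the negotiation line in that log, the reported
rate is consistent with the reported period.  This is a rough Python
sketch that assumes an 8-bit (narrow) bus and one transfer per period;
the function name is made up for the example.

```python
def sync_rate_mb_s(period_ns, bus_width_bits=8):
    """Peak synchronous transfer rate in MB/s for a given period.

    Assumes one transfer of bus_width_bits per period (narrow bus by
    default).
    """
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * 1000.0 / period_ns


# 150 ns is the period shown in the kernel log above.
print(round(sync_rate_mb_s(150), 1))  # 6.7
```

So the driver and the drive did agree on a synchronous transfer
period before the errors showed up.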

Since then I've tried reading a couple of different tapes and writing
to a couple of different blanks, all with the same massive errors.

Research around the net suggests that I have some sort of hardware or
configuration problem.  I've checked and rechecked everything on that
front, and haven't gotten anywhere.  I've also tried passing various
arguments to the module in an attempt to get it to do something
simpler, safer or more intelligent, with no luck.  This exact drive
was running perfectly on a Red Hat 7.2 box as recently as 18 months
ago.

One other oddity: ever since I added the driver to /etc/modules.conf,
I have been seeing errors during the kudzu process.  They go by very
quickly, and say something like

modprobe: modprobe can't locate module block-major-135

except that each number is listed four to six times, and the numbers
run from around 65 through 135 or so.  The output does not end up in
any of the system logs.

I've got to believe that someone out there is successfully running an
old SCSI board like this one -- the ncr53c810 was included on many
motherboards, for example -- and hopefully they can shed some light.

I'd really hate to have to pull WBEL3 off this box and revert to some
weird old version of Red Hat just to get this working... but I've also
heard rumors that the 2.4 kernel series has some rather glaring SCSI
bugs and I'm wondering if it is something like that which is
ultimately causing the problem.  I suppose I could build a more recent
2.4 kernel and install it and see what happens...  that seems more
appealing than starting over.

I gotta believe I'm just missing something, but it's truly weird that
the system at home was so easy to get going, and this system is so
difficult.  I guess another possibility is to try to buy a cable which
would allow me to use this drive temporarily on that other system.  I
hesitate to do this, as this drive is ancient and only runs at super
slow Fast-10 speed.  I don't even know if I can successfully mix it
with a 160 device, but presumably I could, given that SCSI is supposed
to allow this sort of thing.

-- Bob