[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
x86 assembler in Bash
From: |
cLIeNUX user |
Subject: |
x86 assembler in Bash |
Date: |
Fri, 16 Feb 2001 06:44:32 -0000 |
ABOUT
shasm is an assembler written in GNU Bash Version 2, which may work in
other recent unix-style "shell" command interpreters. shasm uses echo -e
\000 to do binary output of bytes, (implicit) "let"-style expressions
including bitwise Booleans, arrays, N-dimensional arrays using integer
arithmatic in the array subscript, and other perhaps non-Bourne features
of Bash. It's probably a trivial port to pdksh or zsh. shasm does NOT call
externals such as sed, dd, expr and so on. All it needs is the shell,
although it does need a cushy shell.
My target is building an all-asm Forth-like programming language, and
later an operating system, on x86 PC's. That's what I'm going to code for.
That means LOTS of x86 stuff is missing. No floats. No mmx and so on. The
stuff I will cover is the stuff for writing an OS. Compared to figuring
out how to do integer output and so on in Bash, extending ./CPU's/i386 or
porting to another CPU should be a tolerable task. As in GNU gas, the
assembler directives part of shasm is machine-independant, and further, it
is sequestered from the x86 stuff in shasm.
See also: ./docs/design and the scripts themselves.
Rick Hohensee www.clienux.com humbubba@smart.net
jan 2001 Maryland
feb 15 2001
I haven't run any shasm-assembled programs yet. The branch resolver is
acting like it works, the listing output is pretty good, and various
instructions and addressing modes work. "copy" has some problems, and
there's about 20 instructions I do want to do that I haven't yet, but
they're all stubbed out.
...............................................................
...............................................................
February 15 2001 RIGHTS
This version of shasm is hereby released into the public domain by the
author, Rick Hohensee. (Richard Allen Hohensee, Md. USA).
Subsequent versions of shasm by me may or may not not be released to the
public domain. If it doesn't say public domain, it isn't. If this release
is in a "megamail" message, it pertains to all subfiles of the message.
Relicensing and modifying this version of shasm is invited, with proper
authorship aknowledgement, particularly as pertains to concepts herein, as
opposed to implementation details.
Rick Hohensee www.clienux.com humbubba@smart.net
.......................................................................
.......................................................................
Why? Why, Rick? WHY???
The build dependancies of an all-shasm program are as minimal
as possible for CPU-specific code. I'm writing a
Forth-like language in asmacs/shasm, which shasm will
make very portable.
It just continuously cracks me up.
It occupies what was a vacant peg on the unix tools pegboard,
a vacancy for something very basic and versatile.
I know the shell and x86 asm for other reasons, so combining them
was not an enormous job.
The asmacs names are helpful.
Interactivity is helpful when hand-coding something.
Forth guys don't like big complex toolchains.
shasm might be useful for...
teaching. Big time. Self-teaching. Add some notes.
converting between Intel, gas, and Plan9 assembly syntaxes
various bootstrap situations. You can have a shasm for
a new CPU quickly.
hand-optimizing otherwise mostly high-level language programs
scripts that generate images on the fly
scripts that generate anything on the fly, e.g....
font editing
algorithmic graphic art
interactive test data generation
shasm doesn't particularly lend itself to...
disassembly of binary code
currently prevailing software marketing methods
cooperation with complex toolchains, with fancy linkers,
debuggers and so on. It could, but that stuff won't be
by me. shasm does however interact seamlessly with the
arbitrary command by the usual pipes and so on.
shasm might become...
broader in machine support
a compiler. BCPL looks like a short hop, for example. The asmacs
m4 macros that preceded shasm are a bit more H3sm-like,
as another example.
a compiled program or suite of commands. shasm doesn't use eval.
a gas back-end
part of a utility to edit arbitrary existing binary files.
Rick Hohensee www.clienux.com humbubba@smart.net
:; cLIeNUX /dev/tty2 22:07:20 /
:;
..................................................................
..................................................................
# shasm main
# machine-independant stuff; basic constants, directives...
# although...This is little-endian, which isn't machine independant. Enjoy.
# branch resolver state
unset Lname
declare -a Lname # array of label names
unset there
declare -ia there # label addresses
unset Lcount
declare -i Lcount # number of labels
unset here
declare -i here # the current assembly address
# this is how we do binary output in sh
declare -a octalbyte=( \
000 001 002 003 004 005 006 007 010 011 012 013 014 015 016 017 \
020 021 022 023 024 025 026 027 030 031 032 033 034 035 036 037 \
040 041 042 043 044 045 046 047 050 051 052 053 054 055 056 057 \
060 061 062 063 064 065 066 067 070 071 072 073 074 075 076 077 \
100 101 102 103 104 105 106 107 110 111 112 113 114 115 116 117 \
120 121 122 123 124 125 126 127 130 131 132 133 134 135 136 137 \
140 141 142 143 144 145 146 147 150 151 152 153 154 155 156 157 \
160 161 162 163 164 165 166 167 170 171 172 173 174 175 176 177 \
200 201 202 203 204 205 206 207 210 211 212 213 214 215 216 217 \
220 221 222 223 224 225 226 227 230 231 232 233 234 235 236 237 \
240 241 242 243 244 245 246 247 250 251 252 253 254 255 256 257 \
260 261 262 263 264 265 266 267 270 271 272 273 274 275 276 277 \
300 301 302 303 304 305 306 307 310 311 312 313 314 315 316 317 \
320 321 322 323 324 325 326 327 330 331 332 333 334 335 336 337 \
340 341 342 343 344 345 346 347 350 351 352 353 354 355 356 357 \
360 361 362 363 364 365 366 367 370 371 372 373 374 375 376 377 )
declare -a hex[]=(\
00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f \
10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f \
20 21 22 23 24 25 26 27 28 29 2a 2b 2c 2d 2e 2f \
30 31 32 33 34 35 36 37 38 39 3a 3b 3c 3d 3e 3f \
40 41 42 43 44 45 46 47 48 49 4a 4b 4c 4d 4e 4f \
50 51 52 53 54 55 56 57 58 59 5a 5b 5c 5d 5e 5f \
60 61 62 63 64 65 66 67 68 69 6a 6b 6c 6d 6e 6f \
70 71 72 73 74 75 76 77 78 79 7a 7b 7c 7d 7e 7f \
80 81 82 83 84 85 86 87 88 89 8a 8b 8c 8d 8e 8f \
90 91 92 93 94 95 96 97 98 99 9a 9b 9c 9d 9e 9f \
a0 a1 a2 a3 a4 a5 a6 a7 a8 a9 aa ab ac ad ae af \
b0 b1 b2 b3 b4 b5 b6 b7 b8 b9 ba bb bc bd be bf \
c0 c1 c2 c3 c4 c5 c6 c7 c8 c9 ca cb cc cd ce cf \
d0 d1 d2 d3 d4 d5 d6 d7 d8 d9 da db dc dd de df \
e0 e1 e2 e3 e4 e5 e6 e7 e8 e9 ea eb ec ed ee ef \
f0 f1 f2 f3 f4 f5 f6 f7 f8 f9 fa fb fc fd fe ff )
declare -i charcount
opnote () { # takes a string
if test "$pass" = "2" ;then
let charcount=32-$charcount"&"31
while test $charcount -ne 0
do
let charcount=$charcount-1
echo -ne " " >> a.list
done
echo -e " " $* >> a.list
fi
}
bytes () { # Outputs lsByte
let here="$here+$#"
for a in $* ;do
echo -en \\${octalbyte[$a&0xff]} >> a.out
echo -n ${hex[$a]}" " >> a.list
charcount=$charcount+3
done
}
duals () {
let here="$here+$#*2"
for a in $* ;do
echo -en \\${octalbyte[$a&0xff]} >> a.out
echo -en ${hex[$a&0xff]}" " >> a.list
echo -en \\${octalbyte[$((a>>8))&0xff]} >> a.out
echo -en ${hex[$((a>>8))&0xff]}" " >> a.list
charcount=$charcount+6
done
}
quads () {
let here="$here+$#*4"
for a in $* ;do
echo -en \\${octalbyte[$a&0xff]} >> a.out
echo -en ${hex[$a&0xff]}" " >> a.list
echo -en \\${octalbyte[$((a>>8))&0xff]} >> a.out
echo -en ${hex[$((a>>8))&0xff]}" " >> a.list
echo -en \\${octalbyte[$((a>>16))&0xff]} >> a.out
echo -en ${hex[$((a>>16))&0xff]}" " >> a.list
echo -en \\${octalbyte[$((a>>24))&0xff]} >> a.out
echo -en ${hex[$((a>>24))&0xff]}" " >> a.list
charcount=$charcount+12
done
}
hexquad () { # big-endian non-spaced hex 4-byte int for $here
echo -en ${hex[$1>>24&0xff]} >> a.list
echo -en ${hex[$1>>16&0xff]} >> a.list
echo -en ${hex[$1>>8&0xff]} >> a.list
echo -en ${hex[$1&0xff]}" " >> a.list
}
herelist () { # assumes beginning of line is now
if test "$pass" = "2" ;then
hexquad $here
fi
}
ascii () {
if test $pass = 2
then
herelist >> a.list
echo "ASCII " $1 >> a.list
echo -e $1 >> a.out
here=$here+${#1}
else
here=$here+${#1}
fi
}
L () { # handle a label
if test "$1" = "h" ;then
echo "\n\n\n
L mylabel
is how you do labels in shasm. No colon. No lexer. L is for the usual
jumptarget: type label. You can also assign shell variables using $here.\n"
elif test "$pass" = "1" ;then
Lname[$Lcount]=$1
there[$Lcount]=$here
let Lcount=$Lcount+1
else
herelist
echo " (O) "$1 >> a.list
fi
}
branch () { # branch labelname branchsize
if test "$1" = "h" ; then echo "\n\n\n
The branch resolver. Called by branching opers, which pass this the label
name and branch size.\n\n"
elif test "$pass" = "1"
then # on pass 1 just skip the branch byte/dual/quad
here=$here+$2
else # Pass 2 is on us. Pass 1 was L.
let labcount=0
for lab in ${Lname[*]} ;do
if test $lab = $1 ;then
let relative=${there[$labcount]}-$here-$2
case $2 in
1)bytes $relative ;;
2)duals $relative ;;
4)quads $relative ;;
*) echo "
Wow. How did you manage this?"
;;
esac
break # out of the *LOOP* and thus the routine
fi
let labcount=$labcount+1
done
fi
}
fillthru () { # takes an integer expression. Does hex too.
if test "$1" = "h" ; then echo -e "\n\n
this is your .org directive.
\n\n"
else
herelist
let tempint=$1
if test $pass -eq 1 ;then
let here=$tempint
else
echo -e " fill through "$1" \n...\n..\n." >> a.list
if test $tempint -gt $here ;then
while test $here -le $tempint ;do
echo -en "\000" >> a.out
let here=$here+1
done
else
echo -e "fillthru is absolute.
You gave a negative, less than current. No good.
Do the arithmatic yourself.\n\n"
fi
fi
fi
}
ab () { # assemble bytes. pass-sensitive.
if test "$pass" = "2" ;then
bytes $*
else
let here=$here+$#
fi
}
ao () { # output one octal char as a byte
if test "$pass" = "2" ;then
echo -en \\$1 >> a.out
echo -en $1" " >> a.list
let here=$here+1
charcount=$charcount+4
else
let here=$here+1
fi
}
ad () { # assemble duals. pass-sensitive.
if test "$pass" = "2" ;then
duals $*
else
let here=$here+$#*2
fi
}
aq () { # assemble quads. pass-sensitive.
if test "$pass" = "2" ;then
quads $*
else
let here=$here+$#*4
fi
}
ac () { # assemble cells. pass-cell-sensitive.
if test "$pass" = "2" ;then
if test "$cell" = "2" ;then
duals $*
else
quads $*
fi
else
let here=$here+$#*$cell
fi
}
usage () {
echo -e "\n\n\n\n\n\n\n\n\n\n
The shasm command should be followed by the name of one existing file to
assemble. shasm will execute that file as a shell script. This has
security ramifications for root on multi-user systems. You can also
. main
with no args and all the shasm routines will be in your shell state
as shell commands. In that case you'll get this message anyway.
Bash set does a nice job of indenting code, by the way.
\n
Output is the files a.out and a.list in the current working directory.
\n\n
"
}
main () {
let here=0
if test $# -ne 1 || ! test -f $1 ;then
usage
else
# could loop over $* here
. machine # symlinked to machine/i386
pass=1
. $1
let here=0
pass=2
. $1
fi
}
. machine
main $* # take a list of files?
# Rick Hohensee www.clienux.com humbubba@smart.net
# jan/feb 2001
#...............................................................
#...............................................................
# test
# demo nonsense code, Whitman's sampler of instructions and whatnot
# that seem to be working currently.
fillthru 0x2ff
L bla
ascii " Oh wow. Oh wowowowowow."
fillthru 0x3ff
testAND A to C
ifzero 100
copy 0x400 to SP
push C
OR A from BP
jump bla
#.............................................................
#.............................................................
a.list doctored a bit for mailing
00000000 fill through 0x2ff
...
..
.
00000300 (O) bla
00000300 ASCII Oh wow. Oh wowowowowow.
00000318 fill through 0x3ff
...
..
.
00000400 85 301 testAND A to C
00000402 0f 84 ifzero 100
00000404 27 324 00 04 00 00 copy 0x400 to SP
0000040a 121 push C
0000040b 09 30 OR A from BP
0000040d e9 ed fe ff ff jump bla
...................................................................
...................................................................
## 80386 support for shasm see Intel's 386INTEL.TXT et cetera
__=_ # cosmetic. modestring divider.
cell="4" # global, 4 or 2. 386 or real
#pass=2 # debugging thingies
LAAETTR= # Left As An Excercise...
e () { # echo abbreviation for testing
echo $*
}
size () { # sh equivalent of a macro.
size[$side]=$1
}
type () { #
mode[$side]=$1
}
octacode () { # octal register char per occurance
let FR=$side*2
RFR=${registers[$side]}
register[$FR+$RFR]=$1
if test ${registers[$side]} = 0 # arrayed string arithmatic. Ick.
then # You CAN declare -ia
registers[$side]=1
else
registers[$side]=2
fi
}
# disambiguate base reg
getbase () { # takes $source/$dest. gives $base oct
if test -n "$indexi" # if *2^ set an index
then
base=${register[$indexi^1]} # base is not index
else # otherwise be arbitrary
base=${register[$1*2]}
# if registers=2 else (leave) index=4 ???
index=${register[$1*2+1]}
fi
}
# determine hi 2 bits of modR/M byte
# set modRM accordingly,
# appends to follow (SIBdisp)
modRMhi () { # take $source or $dest as per memref
if test "${mode[$1]}" = "dire" -o "${mode[$1]}" = ""
then
mo=3 # register-direct, no SIB = 3
elif ! test -z "${number[$1]}"
then
if test "${size[$1]}" = 1
then
mo=1 # indirect byte displacement = 1, SIB
else
mo=2 # indirect cell displacement = 2, SIB
fi
else
#############################LAAETTR
mo=0 # indirect no displacement = 0, SIB
fi
}
# SIB maybe, and a displacement maybe.
# This assembles them.
SIBdisp () { # takes a $source/dest per memref=source/dest
if test "$mo" != "3" ;then # SIB?
ao $scale${register[$indexi]}$base # SIB
# displacements
if test "$mo" = "1" ;then # byte
ab ${number[$1]}
fi
if test "$mo" = "2" ;then # cell
ac ${number[$1]}
fi
fi
}
# modR/M and SIB and displacement and scale encode
modSIBdis () { # takes $source/$dest of memref and the off register/code
getbase $1 # there is no memref in direct-direct. hmmmm.
modRMhi $1
modRM=$mo$2$base # mid and low
ao $modRM # assemble octal
SIBdisp $1 # maybe SIB, maybe displacement
}
segment () { # more "macro"'s
octacode $1
size 2
type segm
}
specialC () { #
octacode $1
type speC
}
specialD () { #
octacode $1
type speD
}
specialT () { #
octacode $1
type speT
}
small () { #
octacode $1
type dire
size 1
}
# if test $cell = 4
parse () { #
if test "$1" = "h" ; then echo -e "\n HELP STUFF "
else
## initial oper state
source=0
register[0]="" register[1]="" register[2]="" register[3]=""
mode[0]="" mode[1]=""
size[0]=$cell size[1]=$cell
shift="0" index="" indexi=""
number[0]="" number[1]=""
registers[0]=0 registers[1]=0
side=0 # left=0, right=1.
let sides=1
scale=0
for arg in $*
do
if test "$wasshifter" = "yes" # preempt the rest if last was *2^
then
wasshifter="no"
let shift=$arg
scale=$arg
else
# Bash "set" indents nicely. I don't here.
case $arg in
to) sides=2 ; source=0 ; dest=1 ; side=1 ;;
A) octacode "0" ;; C) octacode "1" ;;
D) octacode "2" ;; B) octacode "3" ;;
SP) octacode "4" ;; BP) octacode "5" ;;
SI) octacode "6" ;; DI) octacode "7" ;;
+|@) mode[$side]="memo" ;;
from) sides=2 ; side=1 ; dest=0 ; source=1 ;;
byte) size "1" ;; dual) size "2" ;; quad) size "4" ;;
CS) segment 1 ;; DS) segment 3 ;;
SS) segment 2 ;; ES) segment 0 ;;
FS) segment 4 ;; GS) segment 5 ;;
AL) small 0 ;; CL) small 1 ;; DL) small 2 ;; BL) small 3 ;;
AH) small 4 ;; CH) small 5 ;; DH) small 6 ;; BH) small 7 ;;
CR0) specialC 0 ;; CR2) specialC 2 ;; CR3) specialC 3 ;;
DR0) specialD 0 ;; DR1) specialD 1 ;; DR2) specialD 2 ;;
DR3) specialD 3 ;; DR6) specialD 6 ;; DR7) specialD 7 ;;
TR6) specialT 6 ;; TR7) specialT 7 ;;
"*2^") let indexi=$side*2+${registers[$side]}-1
index=${register[$indexi]}
wasshifter="yes"
mode[$side]="memo" ;;
*) number[$side]=$arg ;;
esac # end tokens case-switch
fi # end *2^ short-circuit
done # end args loop, resume reasonable indentation.
minsize=4 # default is 4, not $cell
if test "${size[0]}" = 1 \
-o "${size[1]}" = 1 ;then
minsize=1
elif test "${size[0]}" = 2 \
-o "${size[1]}" = 2 ;then
minsize=2
fi
# accumulate a case switch string
# start with sides, minsize and sourcesize
modestring=$sides$__$minsize$__${size[$source]}
# disambiguate source mode
if ! test -z "${mode[$source]}" ;then
modestring=$modestring$__${mode[$source]}
elif test "${registers[$source]}" = 2 ;then
modestring=$modestring$__"memo"
elif ! test -z "${number[$source]}" ;then
modestring=$modestring$__"imme"
else
modestring=$modestring$__"dire"
fi
# dest size/mode if it exists
if test "$sides" = 2 ;then
modestring=$modestring$__${size[$dest]}
if ! test -z "${mode[$dest]}" ;then
modestring=$modestring$__${mode[$dest]}
elif test "${registers[$dest]}" = 2 ;then
modestring=$modestring$__"memo"
else
modestring=$modestring$__"dire"
fi
fi
fi # end of ~help
} ####### end of parse #######
##############
#####
## prefixes and other one-zies
#
lock () { # prefix
if test "$1" = "h" ; then echo -e "\n\nIntel LOCK\n
SMP instruction atomicity extender.\n"
else
herelist
ab 0xf0
opnote lock $*
fi
}
repeating () { # prefix
if test "$1" = "h" ; then echo -e "\n\nIntel REP\n
repeat folowing instruction until CL is 0\n"
else
herelist
ab 0xf3
opnote repeating $*
fi
}
repeatnon0 () { # prefix
if test "$1" = "h" ; then echo -e "\n\nIntel REPNZ
repeat following instruction while ZF and CL are not 0\n"
else
herelist
ab 0xf2
opnote repeatnon0 $*
fi
}
otheroperandsize () { # prefix
if test "$1" = "h" ; then echo -e "\n\n\n\n
The following instruction is to be interpreted at the opposite operand
size from what the current default is, as per this segment\'s
descriptor.\n\n"
else
herelist
ab 0x66
opnote otheroperandsize $*
fi
}
otheraddresssize () { # prefix
if test "$1" = "h" ; then echo -e "\n\n\n\n
following instruction is to be interpreted at the opposite address size
from what the current default is, as per this segment\'s descriptor.\n"
else
herelist
ab 0x67
opnote otheraddresssize $*
fi
}
CS () { # prefix
if test "$1" = "h" ; then echo -e "\n\n
Following instruction is to use segment CS\n"
else
herelist
ab 0x2e
opnote CS $*
fi
}
SS () { # prefix
if test "$1" = "h" ; then echo -e "\n
\nFollowing instruction is to use segment SS\n"
else
herelist
ab 0x36
opnote SS $*
fi
}
DS () { # prefix
if test "$1" = "h" ; then echo -e "\n
\nFollowing instruction is to use segment DS\n"
else
herelist
ab 0x3e
opnote DS $*
fi
}
ES () { # prefix
if test "$1" = "h" ; then echo -e "\n
\nFollowing instruction is to use segment ES\n"
else
herelist
ab 0x26
opnote ES $*
fi
}
FS () { # prefix
if test "$1" = "h" ; then echo -e "\n
\nFollowing instruction is to use segment FS\n"
else
herelist
ab 0x64
opnote FS $*
fi
}
GS () { # prefix
if test "$1" = "h" ; then echo -e "\n
\nFollowing instruction is to use segment GS\n"
else
herelist
ab 0x65
opnote GS $*
fi
}
# ## ## ## ## ## ## ## ## ## edit +330 machine ## ## ## ## ## ## ##
# The first hairy one. Several others are minor
# mods to this one; OR, add...
AND () { #
if test "$1" = "h" ;then echo -e "\n\n
Boolean bitwise AND. Result is true only if A AND B are true.
one-bit results (truth table) with input bits A and B
B
1 0
_|_______________
|
0 | 0 0
A |
1 | 0 1
\n"
else
herelist
parse $*
case "$modestring" in
# immediate byte to A
1_1*) ab 0x24
ab ${number[$source]} ;;
# immediate cell to A
1*) ab 0x25
ac ${number[$source]} ;;
# immediate source byte to byte r/m
2_1_1_imme_1_dire | 2_1_1_imme_1_memo)
ab 0x80
modSIBdis $dest 4
ab ${number[$source]} ;;
# immediate cell to r/m cell
2_[24]_[24]_imme_[24]_dire | 2_[24]_[24]_imme_[24]_memo)
ab 0x81
modSIBdis $dest 4
ac ${number[$source]} ;;
# immediate source byte to cell r/m
2_1_1_imme_[24]_dire | 2_1_1_imme_[24]_memo)
ab 0x83
modSIBdis $dest 4
ab ${number[$source]} ;;
# reg byte source to byte r/m
2_1_1_dire_1_dire | 2_1_1_dire_1_memo)
ab 0x20
modSIBdis $dest ${register[$source]} ;;
# register cell to r/m cell
2_[24]_[24]_dire_[24]_dire | 2_[24]_[24]_dire_[24]_memo)
ab 0x21
modSIBdis $dest ${register[$source]} ;;
# byte r/m source to byte reg
# reg-reg already decoded as 0x20.
# I think that's OK.
2_1_1_memo_1_dire)
ab 0x22
modSIBdis $source ${register[$dest]} ;;
# source r/m cell to cell reg
# reg-reg already decoded as 0x21.
# I think that's OK.
2_[24]_[24]_memo_[24]_dire)
ab 0x23
modSIBdis $source ${register[$dest]} ;;
*)
echo -e "\n\nAND doesn't support
" $modestring " mode. " ;;
esac
opnote AND $*
fi
}
GDT () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel SGDT\n
store contents of Global Descriptor Table Register to memory at physical
address. Crucial to protected mode.\n"
else
herelist
parse $*
ab 0x0f 0x01
ac ${number[$source]} # physical address
opnote GDT $*
fi
}
IDT () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel SIDT\n
store contents of Interrupt Descriptor Table Register to memory at
physical address. Crucial to protected mode.\n"
else
herelist
parse $*
ab 0x0f 1
ac ${number[$source]} # physical address
opnote IDT $*
fi
}
LDT () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel SLDT\n
store Local Descriptor Table Register to memory physical address.\n\n"
else
herelist
parse $*
ab 0x0f 0
ac ${number[$source]} # physical address
opnote LDT $*
fi
}
LS1bit () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel BSF\n
Find least significant ON-bit. 10 clocks +. Result is the number
of leading 0 bits.\n\n"
else
herelist
parse $*
ab 0x0f 0xbc
modSIBdis $source ${register[$dest]}
opnote LS1bit $*
fi
}
MS1bit () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel BSR\n
0F r32,r/m32 10+3n Bit scan reverse on r/m cell
Find most significant ON-bit. 10 + \(3 x offbits\) clocks.
Flags effected: Zero\n\n"
else
herelist
ab 0x0f 0xbd
parse $*
modSIBdis $source ${register[$dest]}
opnote MS1bit $*
fi
}
XOR () { #
if test "$1" = "h" ; then echo -e "\n
Boolean bitwise Exclusive-OR. Result is true if exclusively A OR B is
true. A XOR 1 toggles A, for example.
one-bit results (truth table) with input bits A and B
B
1 0
_|_______________
|
0 | 1 0
A |
1 | 0 1
\n\n"
else
herelist
parse $*
case $modestring in
# immediate byte to A
1_1*) ab 0x34
ab ${number[$source]} ;;
# immediate cell to A
1*) ab 0x35
ac ${number[$source]} ;;
# immediate source byte to byte r/m
2_1_1_imme_1_dire | 2_1_1_imme_1_memo)
ab 0x80
modSIBdis $dest 6
ab ${number[$source]} ;;
# immediate cell to r/m cell
2_[24]_[24]_imme_[24]_dire | 2_[24]_[24]_imme_[24]_memo)
ab 0x81
modSIBdis $dest 6
ac ${number[$source]} ;;
# immediate source byte to cell r/m
2_1_1_imme_[24]_dire | 2_1_1_imme_[24]_memo)
ab 0x83
modSIBdis $dest 6
ab ${number[$source]} ;;
# reg byte source to byte r/m
2_1_1_dire_1_dire | 2_1_1_dire_1_memo)
ab 0x30
modSIBdis $dest ${register[$source]} ;;
# register cell to r/m cell
2_[24]_[24]_dire_[24]_dire | 2_[24]_[24]_dire_[24]_memo)
ab 0x31
modSIBdis $dest ${register[$source]} ;;
# byte r/m source to byte reg
# reg-reg already decoded as 0x20.
# I think that's OK.
2_1_1_memo_1_dire)
ab 0x32
modSIBdis $source ${register[$dest]} ;;
# source r/m cell to cell reg
# reg-reg already decoded as 0x21.
# I think that's OK.
2_[24]_[24]_memo_[24]_dire)
ab 0x33
modSIBdis $source ${register[$dest]} ;;
*)
echo -e "\n\nXOR doesn't support
" $modestring " mode. " ;;
esac
opnote XOR $*
fi
}
OR () { #
if test "$1" = "h" ; then echo -e "\n\n
Boolean bitwise OR. AKA "inclusive OR". If either source bit, A OR B, is
1, then result is 1.
one-bit results (truth table) with input bits A and B
B
1 0
_|_______________
|
0 | 1 0
A |
1 | 1 1
\n"
else
herelist
parse $*
case $modestring in
# immediate byte to A
1_1*) ab 0x0c
ab ${number[$source]} ;;
# immediate cell to A
1*) ab 0x0d
ac ${number[$source]} ;;
# immediate source byte to byte r/m
2_1_1_imme_1_dire | 2_1_1_imme_1_memo)
ab 0x80
modSIBdis $dest 1
ab ${number[$source]} ;;
# immediate cell to r/m cell
2_[24]_[24]_imme_[24]_dire | 2_[24]_[24]_imme_[24]_memo)
ab 0x81
modSIBdis $dest 1
ac ${number[$source]} ;;
# immediate source byte to cell r/m
2_1_1_imme_[24]_dire | 2_1_1_imme_[24]_memo)
ab 0x83
modSIBdis $dest 1
ab ${number[$source]} ;;
# reg byte source to byte r/m
2_1_1_dire_1_dire | 2_1_1_dire_1_memo)
ab 0x08
modSIBdis $dest ${register[$source]} ;;
# register cell to r/m cell
2_[24]_[24]_dire_[24]_dire | 2_[24]_[24]_dire_[24]_memo)
ab 0x09
modSIBdis $dest ${register[$source]} ;;
# byte r/m source to byte reg
# reg-reg already decoded as 0x20.
# I think that's OK.
2_1_1_memo_1_dire)
ab 0x0a
modSIBdis $source ${register[$dest]} ;;
# source r/m cell to cell reg
# reg-reg already decoded as 0x21.
# I think that's OK.
2_[24]_[24]_memo_[24]_dire)
ab 0x08
modSIBdis $source ${register[$dest]} ;;
*)
echo -e "\n\nOR doesn't support
" $modestring " mode. " ;;
esac
opnote OR $*
fi
}
add () { #
if test "$1" = "h" ; then echo -e "\n\n
add without including the carry (flag) bit. \n "
else
herelist
parse $*
case $modestring in
# immediate byte to A
1_1*) ab 0x04
ab ${number[$source]} ;;
# immediate cell to A
1*) ab 0x05
ac ${number[$source]} ;;
# immediate source byte to byte r/m
2_1_1_imme_1_dire | 2_1_1_imme_1_memo)
ab 0x80
modSIBdis $dest 0
ab ${number[$source]} ;;
# immediate cell to r/m cell
2_[24]_[24]_imme_[24]_dire | 2_[24]_[24]_imme_[24]_memo)
ab 0x81
modSIBdis $dest 0
ac ${number[$source]} ;;
# immediate source byte to cell r/m
2_1_1_imme_[24]_dire | 2_1_1_imme_[24]_memo)
ab 0x83
modSIBdis $dest 0
ab ${number[$source]} ;;
# reg byte source to byte r/m
2_1_1_dire_1_dire | 2_1_1_dire_1_memo)
ab 0x00
modSIBdis $dest ${register[$source]} ;;
# register cell to r/m cell
2_[24]_[24]_dire_[24]_dire | 2_[24]_[24]_dire_[24]_memo)
ab 0x01
modSIBdis $dest ${register[$source]} ;;
# byte r/m source to byte reg
# reg-reg already decoded as 0x20.
# I think that's OK.
2_1_1_memo_1_dire)
ab 0x02
modSIBdis $source ${register[$dest]} ;;
# source r/m cell to cell reg
# reg-reg already decoded as 0x21.
# I think that's OK.
2_[24]_[24]_memo_[24]_dire)
ab 0x03
modSIBdis $source ${register[$dest]} ;;
*)
echo -e "\n\nadd doesn't support
" $modestring " mode. " ;;
esac
opnote add $*
fi
}
addwithcarry () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel ADC\n
add including the pre-existing carry (flag) bit.\n\n"
else
herelist
parse $*
case $modestring in
# immediate byte to A
1_1*) ab 0x14
ab ${number[$source]} ;;
# immediate cell to A
1*) ab 0x15
ac ${number[$source]} ;;
# immediate source byte to byte r/m
2_1_1_imme_1_dire | 2_1_1_imme_1_memo)
ab 0x80
modSIBdis $dest 2
ab ${number[$source]} ;;
# immediate cell to r/m cell
2_[24]_[24]_imme_[24]_dire | 2_[24]_[24]_imme_[24]_memo)
ab 0x81
modSIBdis $dest 2
ac ${number[$source]} ;;
# immediate source byte to cell r/m
2_1_1_imme_[24]_dire | 2_1_1_imme_[24]_memo)
ab 0x83
modSIBdis $dest 2
ab ${number[$source]} ;;
# reg byte source to byte r/m
2_1_1_dire_1_dire | 2_1_1_dire_1_memo)
ab 0x10
modSIBdis $dest ${register[$source]} ;;
# register cell to r/m cell
2_[24]_[24]_dire_[24]_dire | 2_[24]_[24]_dire_[24]_memo)
ab 0x11
modSIBdis $dest ${register[$source]} ;;
# byte r/m source to byte reg
# reg-reg already decoded as ??
# I think that's OK.
2_1_1_memo_1_dire)
ab 0x12
modSIBdis $source ${register[$dest]} ;;
# source r/m cell to cell reg
# reg-reg already decoded as ??
# I think that's OK.
2_[24]_[24]_memo_[24]_dire)
ab 0x13
modSIBdis $source ${register[$dest]} ;;
*)
echo -e "\n\n\naddwithcarry doesn't support
" $modestring " mode. " ;;
esac
opnote addwithcarry $*
fi
}
biton () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel BTS\n
Save bit in carry flag and set addressed bit to 1 in source value. 6
clocks.\n\n"
else
herelist
parse $*
case $modestring in
# immediate cell to r/m cell
2_[24]_[24]_imme_[24]_dire | 2_[24]_[24]_imme_[24]_memo)
ab 0x0f 0xba
modSIBdis $dest 5
ac ${number[$source]} ;;
# register cell to r/m cell
2_[24]_[24]_dire_[24]_dire | 2_[24]_[24]_dire_[24]_memo)
ab 0x0f 0xab
modSIBdis $dest ${register[$source]} ;;
*)
echo -e "\n\nbiton doesn't support
" $modestring " mode. " ;;
esac
opnote biton $*
fi
}
bitoff () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel BTR\n
Save bit in carry flag and reset to 0.\n\n"
else
herelist
parse $*
case $modestring in
# immediate cell to r/m cell
2_[24]_[24]_imme_[24]_dire | 2_[24]_[24]_imme_[24]_memo)
ab 0x0f 0xba
modSIBdis $dest 6
ac ${number[$source]} ;;
# register cell to r/m cell
2_[24]_[24]_dire_[24]_dire | 2_[24]_[24]_dire_[24]_memo)
ab 0x0f 0xb3
modSIBdis $dest ${register[$source]} ;;
*)
echo -e "\n\nbitoff doesn't support
" $modestring " mode. \n" ;;
esac
opnote bitoff $*
fi
}
call () { # jsr, splice, #
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel CALL\n\n\n
Jump to the immediately following address or other value, normally that
of a subroutine, stacking a frame to return to or occurance of the return
(Intel RET) instruction. Frames vary widely by type of call on 386+. There
are intersegment jumps, which are selectors defining gates of various
types in (32 bit) protected mode. call or similar is also known as jsr or
gosub on other machines. The variants of call usually require a FAR
syntactic spamatazoan in other assemblers. shasm syntaxes for the more
mutated forms of call are...
Call intersegment to full pointer given, shasm syntax...
segment offset
call dual 0xxxx to quad 0xxxxx
LAAETTR (gas doesn't do segments explicitly either, IIRC. Nor do most
other CPUs, BTW.)\n\n"
else
herelist
parse $*
case $modestring in
1_[24]_[24]_imme) # same segment relative
ab 0xe8
branch $1 $cell ;;
# same segment from reg
1_[24]_[24]_dire)
ab 0xff
modSIBdis $source 2 ;;
# other segment from mem
1_[24]_[24]_memo)
ab 0xff
modSIBdis $source 3 ;;
2_[24]_[2]_imme_[24]_imme) # possible??????
ab 0x9a
ad ${number[$source]}
ac ${number[$dest]} ;;
*)
echo -e "\n\ncall doesn't support
" $modestring " mode. " ;;
esac
opnote call $*
fi
}
clearswitched () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel CLTS\n
clear the task-switched flag in EFLAGS. Ha3sm doesn't use the 386
task-switch facilities, BTW. Most 386 unices do, I think.\n\n"
else
herelist
ab 0x0f 6
opnote clearswitched $*
fi
}
copy () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel MOV\n
catch-all.\n\n\n"
else
herelist
parse $*
case $modestring in
*speC) ab 0x0f 0x22
modSIBdis $source ${register[$dest]} ;;
*speD) ab 0x0f 0x23
modSIBdis $source ${register[$dest]} ;;
*speT) ab 0x0f 0x26
modSIBdis $source ${register[$dest]} ;;
*speC_*) ab 0x0f 0x20
modSIBdis $dest ${register[$source]} ;;
*speD_*) ab 0x0f 0x21
modSIBdis $dest ${register[$source]} ;;
*speT_*) ab 0x0f 0x24
modSIBdis $dest ${register[$source]} ;;
2_1_1_memo_1_dire)
if test "${registers[$dest]}" = "0";then
ab 0xa0
ab ${number[$source]}
else
ab 0x8a
modSIBdis $source ${register[$dest]}
fi ;;
# source r/m cell to cell reg
# reg-reg already decoded as ??
# I think that's OK.
2_[24]_[24]_memo_[24]_dire)
if test "${registers[$dest]}" = "0";then
ab 0xa1
ac ${number[$source]}
else
ab 0x8b
modSIBdis $source ${register[$dest]}
fi ;;
# immediate cell to r/m cell
2_[24]_[24]_imme_[24]_dire)
if test "${registers[$dest]}" = "0";then
ab 0xa1
ac ${number[$source]}
else
ao 27${register[source]}
modSIBdis $dest 2
ac ${number[$source]}
fi ;;
2_1_1_dire_1_memo)
if test "${registers[$dest]}" = "0";then
ab 0xa2
ab ${number[$dest]}
else
ab 0x88
modSIBdis $dest ${register[$source]}
fi ;;
# register cell to r/m cell
2_[24]_[24]_dire_[24]_memo)
if test "${registers[$dest]}" = "0";then
ab 0xa3
ac ${number[$dest]}
else
ab 0x89
modSIBdis $dest ${register[$source]}
fi ;;
# immediate source byte to byte r/m
2_1_1_imme_1_dire)
ao 26${register[source]}
modSIBdis $dest 2
ab ${number[$source]} ;;
2_1_1_imme_1_memo)
ab 0xc6
modSIBdis $dest 2 # 2 is a don't care I hope
ab ${number[$source]} ;;
# immediate cell to r/m cell
2_[24]_[24]_imme_[24]_memo)
ab 0xc7
modSIBdis $dest 2
ac ${number[$source]} ;;
# r/m to segment reg
2_?_?_memo_?_segm)
ab 0x8d
modSIBdis $dest 2
ac ${number[$source]} ;;
# segment reg to r/m
2_?_?_segm_?_memo)
ab 0x8c
modSIBdis $dest 2
ac ${number[$source]} ;;
*)
echo -e "\n\ncopy doesn't support " \
$modestring " mode. " ;;
esac
opnote copy $*
fi
}
copyextend () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel MOVSX\n
copy, sign-extending the destination.\n\n"
else
herelist
parse $*
case $modestring in
2_1_1_memo*) ab 0x0f 0xbe
modSIBdis $source ${register[$dest]} ;;
2_2_2_memo*) ab 0x0f 0xbf
modSIBdis $source ${register[$dest]} ;;
*)
echo -e "\n\ncopyextend doesn't support
" $modestring " mode. " ;;
esac
opnote copyextend $*
fi
}
copy0extend () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel MOVZX\n
copy, filling the high-order bits of the destination with zeros. Much
different than a less-than-whole-register copy.\n\n\n"
else
herelist
parse $*
case $modestring in
2_1_1_memo*) ab 0x0f 0xb6
modSIBdis $source ${register[$dest]} ;;
2_2_2_memo*) ab 0x0f 0xb7
modSIBdis $source ${register[$dest]} ;;
*)
echo -e "\n\ncopy0extend doesn't support
" $modestring " mode. " ;;
esac
opnote copy0extend $*
fi
}
downroll () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel ROR\n
down-significance roll, rotate. "source register" must be CL, the
roll amount. \n\n\n"
else
herelist
parse $*
case $modestring in
2_[24]_[24]_imme_[24]_memo)
ab 0xc1
modSIBdis $dest 1
ab ${number[$source]} ;;
1_1_1_memo)
ab 0xd0
modSIBdis $dest 1 ;;
2_1_1_dire*) # source is CL
ab 0xd2
modSIBdis $dest 1 ;;
2_1_1_imme_1_memo)
ab 0xc0
modSIBdis $dest 1
ab ${number[$source]} ;;
1_[24]_[24]_memo)
ab 0xd1
modSIBdis $dest 1 ;;
2_[24]_[24]_dire*) # source is CL
ab 0xd3
modSIBdis $dest 1 ;;
*)
echo -e "\n\ndownroll doesn't support
" $modestring "mode. " ;;
esac
opnote downroll $*
fi
}
downrollcarry () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel RCR\n
down-significance roll, rotate. The carry bit is part of the roll.\n\n\n"
else
herelist
parse $*
case $modestring in
2_[24]_[24]_imme_[24]_memo)
ab 0xc1
modSIBdis $dest 3
ab ${number[$source]} ;;
1_1_1_memo)
ab 0xd0
modSIBdis $dest 3 ;;
2_1_1_dire*) # source is CL
ab 0xd2
modSIBdis $dest 3 ;;
2_1_1_imme_1_memo)
ab 0xc0
modSIBdis $dest 3
ab ${number[$source]} ;;
1_[24]_[24]_memo)
ab 0xd1
modSIBdis $dest 3 ;;
2_[24]_[24]_dire*) # source is CL
ab 0xd3
modSIBdis $dest 3 ;;
*)
echo -e "\n\ndownrollcarry doesn't support
" $modestring " mode. " ;;
esac
opnote downrollcarry $*
fi
}
downshift () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel SAR\n
down-significance bitshift.\n"
else
herelist
parse $*
case $modestring in
2_[24]_[24]_imme_[24]_memo)
ab 0xc1
modSIBdis $dest 7
ab ${number[$source]} ;;
1_1_1_memo)
ab 0xd0
modSIBdis $dest 7 ;;
2_1_1_dire*) # source is CL
ab 0xd2
modSIBdis $dest 7 ;;
2_1_1_imme_1_memo)
ab 0xc0
modSIBdis $dest 7
ab ${number[$source]} ;;
1_[24]_[24]_memo)
ab 0xd1
modSIBdis $dest 7 ;;
2_[24]_[24]_dire*) # source is CL
ab 0xd3
modSIBdis $dest 7 ;;
*)
echo -e "\n\n\ndownshift doesn't support
" $modestring " mode. " ;;
esac
opnote downshift $*
fi
}
extendAtoD () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel CWD\n
Sign-extend A into A:D, i.e. D becomes all the same as the sign bit of
A.\n\n"
else
herelist
ab 0x99
opnote extendAtoD $*
fi
}
decreasing () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel STD\n
Set memory segment loop (string) operations direction flag to
towards-lower-addresses. Segment ops then will traverse the segments
high-to-low. \n"
else
herelist
ab 0xfd
opnote decreasing $*
fi
}
decrement () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel DEC\n
decrement. 2 clocks.\n\n\n"
else
herelist
parse $*
case $modestring in
1_1*)
ab 0xfe
modSIBdis $source 1 ;;
1_[24]_memo)
ab 0xff
modSIBdis $source 1 ;;
1_[24]_dire)
ao 11${register[$source]} ;;
*)
echo -e "\n\ndecrement doesn't support
" $modestring " mode. " ;;
esac
opnote decrement $*
fi
}
escape () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel IRET\n
Interrupt return. Various stack effects per system state.\n"
else
herelist
ab 0xcf
opnote escape $*
fi
}
farSS () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel LSS\n
Load SS:r32 with pointer from memory\n"
else
herelist
parse $*
ab 0x0f 0xb2
modSIBdis $source $dest
opnote farSS $*
fi
}
farDS () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel LDS\n
Load DS:r32 with pointer from memory\n\n"
else
herelist
parse $*
ab 0xc5
modSIBdis $source $dest
opnote farDS $*
fi
}
farES () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel LES\n
Load ES:register with pointer from memory\n\n"
else
herelist
parse $*
ab 0xc4
modSIBdis $source $dest
opnote farES $*
fi
}
farFS () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel LFS\n
Load FS:register with pointer from memory\n\n"
else
herelist
parse $*
ab 0x0f 0xb4
modSIBdis $source $dest
opnote farFS $*
fi
}
farGS () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel LGS\n
Load GS:r32 with pointer from memory\n\n"
else
herelist
parse $*
ab 0x0f 0xb4
modSIBdis $source $dest
opnote farGS $*
fi
}
exchange () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel XCHG\n
exchange 2 values in one, usually 3 clock, instruction.\n\n\n"
else
herelist
parse $*
case $modestring in
1*) ao 22${register[$source]} ;;
2_1*) ab 0x86
modSIBdis $source $dest ;;
2_[24]*)
ab 0x87
modSIBdis $source $dest ;;
*) echo -e "\n\nexchange doesn't support
" $modestring " mode. " ;;
esac
opnote exchange $*
fi
}
ifbit () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel BT\n
Save bit in carry flag. Bit position to act on is \"source\" argument
in shasm.\n\n"
else
herelist
parse $*
case $modestring in
2_?_?_dire*)
ab 0x0f 0xa3
modSIBdis $dest ${register[$source]} ;;
2_?_?_imme*)
ab 0x0f
modSIBdis $dest 4
ab ${number[$source]} ;;
*)
echo -e "\n\nifbit doesn't support " $modestring " mode.
Supported modes are 2_?_?_imme* and
2_?_?_dire*. " ;;
esac
opnote ifbit $*
fi
}
signextend () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel CBW/CWDE\n
If cell = 2, make all bits of DX the same as the most
significant bit of AX, i.e. sign-extend AX into DX.
If cell = 4, sign-extend AX within A (EAX). 3 clocks.\n\n\n"
else
herelist
ab 0x98
opnote signextend $*
fi
}
clearcarry () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel CLC\n
unset the carry flag. Make it 0.\n\n\n"
else
herelist
ab 0xf8
opnote clearcarry $*
fi
}
enter () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel ENTER\n
Enter is a complex instruction for creating a lexical level frame for
lexical block languages like Pascal. See also: leave.
...................................................................
enter ( 16bit framesize, 8bit lexical levels)
{
level = level MOD 32 // level is byte 4 of instr encoding
Push BP
frame-ptr = SP // frame-ptr is a hardware temp var
if level > 0
{
for (i = 1 TO (level - 1))
{
BP = BP - 4
Push value _at_ BP
}
Push frame-ptr
}
BP = frame-ptr // BP is now old
SP = SP - ZeroExtend(First operand)
}
...................................................................
enter can take up to 139 clocks, depending on the levels argument. The
most interesting things I see about enter is that it uses two internal
variables that aren't registers, and that it does a looping dereference
over an array of up to 31 pointers. That is, it collects dispersed values.
It does all this atomically, which is important when molesting the return
stack. enter/leave is what gives BP it's framepointer designation. They
are the only instructions that use BP implicitly.
In shasm syntax levels is source, frame size is dest, i.e. source and dest
don't mean what they usually do
e.g.
enter 200 to 3 \n\n"
else
herelist
parse $*
ab 0xc8
ad ${number[$source]}
ab ${number[$dest]}
opnote enter $*
fi
}
flags () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel LAHF\n
copy AH into FLAGS, which is low dual of EFLAGS.\h\h"
else
herelist
ab 0x9f
opnote flags $*
fi
}
testAND () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel TEST\n
Do an AND and set the flags accordingly, but don't actually assert the
result value on either of the arguments.\n\n"
else
herelist
parse $*
case $modestring in
# immediate byte to A
1_1*) ab 0xa8
ab ${number[$source]} ;;
# immediate cell to A
1*) ab 0xa9
ac ${number[$source]} ;;
# immediate source byte to byte r/m
2_1_1_imme_1_dire | 2_1_1_imme_1_memo)
ab 0xf6
modSIBdis $dest 0
ab ${number[$source]} ;;
# immediate cell to r/m cell
2_[24]_[24]_imme_[24]_dire | 2_[24]_[24]_imme_[24]_memo)
ab 0xf7
modSIBdis $dest 0
ac ${number[$source]} ;;
# reg byte source to byte r/m
2_1_1_dire_1_dire | 2_1_1_dire_1_memo)
ab 0x84
modSIBdis $dest ${register[$source]} ;;
# register cell to r/m cell
2_[24]_[24]_dire_[24]_dire | 2_[24]_[24]_dire_[24]_memo)
ab 0x85
modSIBdis $dest ${register[$source]} ;;
*)
echo -e "\n\ntestAND doesn't support
" $modestring " mode. " ;;
esac
opnote testAND $*
fi
}
testsubtract () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel CMP\n
Do an AND and set the flags accordingly, but don't actually assert the
result value on either of the arguments.\n\n"
else
herelist
parse $*
case "$modestring" in
# immediate byte to A
1_1*) ab 0x3c
ab ${number[$source]} ;;
# immediate cell to A
1*) ab 0x3d
ac ${number[$source]} ;;
# immediate source byte to byte r/m
2_1_1_imme_1_dire | 2_1_1_imme_1_memo)
ab 0x80
modSIBdis $dest 7
ab ${number[$source]} ;;
# immediate cell to r/m cell
2_[24]_[24]_imme_[24]_dire | 2_[24]_[24]_imme_[24]_memo)
ab 0x81
modSIBdis $dest 7
ac ${number[$source]} ;;
# immediate source byte to cell r/m
2_1_1_imme_[24]_dire | 2_1_1_imme_[24]_memo)
ab 0x83
modSIBdis $dest 7
ab ${number[$source]} ;;
# reg byte source to byte r/m
2_1_1_dire_1_dire | 2_1_1_dire_1_memo)
ab 0x38
modSIBdis $dest ${register[$source]} ;;
# register cell to r/m cell
2_[24]_[24]_dire_[24]_dire | 2_[24]_[24]_dire_[24]_memo)
ab 0x39
modSIBdis $dest ${register[$source]} ;;
# byte r/m source to byte reg
# reg-reg already decoded as 0x20.
# I think that's OK.
2_1_1_memo_1_dire)
ab 0x3a
modSIBdis $source ${register[$dest]} ;;
# source r/m cell to cell reg
# reg-reg already decoded as 0x21.
# I think that's OK.
2_[24]_[24]_memo_[24]_dire)
ab 0x38
modSIBdis $source ${register[$dest]} ;;
*)
echo -e "\n\ntestsubtract doesn't support
" $modestring " mode. " ;;
esac
opnote testsubtract $*
fi
}
# Got milk?
storemachinestatusdual () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel SMSW\n
Store machine status dual to EA dual
Legacy 286 thing. Can save a byte or two on a bootsector.\n\n"
else
herelist
parse $*
# check possible
ab 0x0f 1
modSIBdis $source 4
opnote storemachinestatusdual $*
fi
}
increasing () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel CLD\n
Setstring operations direction flag to toward-higher-addresses\n\n"
else
herelist
ab 0xfc
opnote increasing $*
fi
}
increment () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel INC\n
add 1 to whatever\n\n"
else
herelist
parse $*
case $modestring in
1_1*)
ab 0xfe
modSIBdis $source 0 ;;
1_[24]_memo)
ab 0xff
modSIBdis $source 6 ;;
1_[24]_dire)
ao 10${register[$source]} ;;
*)
echo -e "\n\nincrement doesn't support
" $modestring " mode. " ;;
esac
opnote increment $*
fi
}
interrupts () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel STI\n
Allow external hardware to interrupt the CPU.\n\n"
else
herelist
ab 0xf3
opnote interrupts $*
fi
}
subtract () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel SUB\n
Subtract without including borrow (carry) bit.\n\n"
else
herelist
parse $*
case $modestring in
# immediate byte to A
1_1*) ab 0x2c
ab ${number[$source]} ;;
# immediate cell to A
1*) ab 0x2d
ac ${number[$source]} ;;
# immediate source byte to byte r/m
2_1_1_imme_1_dire | 2_1_1_imme_1_memo)
ab 0x80
modSIBdis $dest 5
ab ${number[$source]} ;;
# immediate cell to r/m cell
2_[24]_[24]_imme_[24]_dire | 2_[24]_[24]_imme_[24]_memo)
ab 0x81
modSIBdis $dest 5
ac ${number[$source]} ;;
# immediate source byte to cell r/m
2_1_1_imme_[24]_dire | 2_1_1_imme_[24]_memo)
ab 0x83
modSIBdis $dest 5
ab ${number[$source]} ;;
# reg byte source to byte r/m
2_1_1_dire_1_dire | 2_1_1_dire_1_memo)
ab 0x28
modSIBdis $dest ${register[$source]} ;;
# register cell to r/m cell
2_[24]_[24]_dire_[24]_dire | 2_[24]_[24]_dire_[24]_memo)
ab 0x29
modSIBdis $dest ${register[$source]} ;;
# byte r/m source to byte reg
# reg-reg already decoded as 0x28.
# I think that's OK.
2_1_1_memo_1_dire)
ab 0x2a
modSIBdis $source ${register[$dest]} ;;
# source r/m cell to cell reg
# reg-reg already decoded as 0x29.
# I think that's OK.
2_[24]_[24]_memo_[24]_dire)
ab 0x28
modSIBdis $source ${register[$dest]} ;;
*)
echo -e "\n\nsubtract doesn't support
" $modestring " mode. " ;;
esac
opnote subtract $*
fi
}
jump () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel JMP\n
Partial support here.
Unconditional branch. Various modes.\n\n\n"
else
herelist
parse $*
case $modestring in
1_1*)
ab 0xeb
branch $1 1 ;;
1_[24]_[24]_imme)
ab 0xe9
branch $1 $cell ;;
1_[24]_[24]_dire)
ab 0xff
modSIBdis $source 4 ;;
*)
echo -e "\n\njump doesn't support
" $modestring " mode. " ;;
esac
opnote jump $*
fi
}
ifzero () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel JZ\n
Branch if 0. Very frequently occuring instruction.\n\n "
else
herelist
parse $*
case $modestring in
1_1*)
ab 0x74
branch $1 1 ;;
1_[24]*)
ab 0x0f 0x84
branch $1 $cell ;;
*)
echo -e "\n\nifzero doesn't support
" $modestring " mode. " ;;
esac
opnote ifzero $*
fi
}
ifnonzero () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel JNE/JNZ\n
branch if zero flag contains zero, meaning that a recent operation did not
result in a zero.\n\n"
else
herelist
parse $*
case $modestring in
1_1*)
ab 0x75
branch $1 1 ;;
1_[24]*)
ab 0x0f 0x85
branch $1 $cell ;;
*)
echo -e "\n\nifnonzero doesn't support
" $modestring " mode. " ;;
esac
opnote ifnonzero $*
fi
}
linear () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel LEA\n
Store effective address for memory reference in register. This does the
address arithmatic and leaves the result of that, and doesn't fetch the
referenced object.\n\n"
else
herelist
ab 0x8d
modSIBdis $source ${register[$dest]}
opnote linear $*
fi
}
leave () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\tIntel LEAVE\n
exuent a Pascal-style module frame. see also: enter\n\n"
else
herelist
ab 0xc9
opnote leave $*
fi
}
limit () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\tIntel LSL\n
load the limit value from a segment descriptor.\n"
else
herelist
ab 0x0f 3
modSIBdis $source ${register[$dest]}
opnote limit $*
fi
}
lookup () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\tIntel XLATB\n
Set AL to memory byte DS:[BX + unsigned AL]. One-byte instruction.\n"
else
herelist
ab 0xd7
opnote lookup $*
fi
}
loop () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\tIntel LOOP\n
branch short if CL is not 0. I don't know if forward branches are
possble.\n\n"
else
herelist
ab 0xe2
ab ${number[$source]}
opnote loop $*
fi
}
loopz () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel LOOPZ\n
branch short if CL is not 0 AND zero flag is true.\n\n"
else
herelist
ab 0xe1
ab ${number[$source]}
opnote loopz $*
fi
}
loopnz () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel LOOPNZ\n
branch short if CL is not 0 AND zero flag is false.\n\n"
else
herelist
ab 0xe0
ab ${number[$source]}
opnote loopnz $*
fi
}
loadmachinestatusdual () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel LMSW\n
286 control register shortcut\n\n"
else
herelist
ab 0x0f 1
madSIBdis $source 6
opnote loadmachinestatusdual $*
fi
}
multiply () {
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel IMUL\n
F6 /5 r/m8 9-14/12-17 AX^[ AL * r/m byte
F7 /5 r/m32 9 -38/12-41 EDX:A ^[ A * r/m cell
0F/r r32,r/m32 9-38/12-41 cell register ^[ cell
register * r/m cell
6B /r ib r16,r/m16,imm8 9-14/12-17 dual register ^[ r/m16 *
sign-extended immediate byte
6B /r ib r32,r/m32,imm8 9-14/12-17 cell register ^[ r/m32 *
sign-extended immediate byte
6B /r ib r16,imm8 9-14/12-17 dual register ^[ dual
register * sign-extended
immediate byte
6B /r ib r32,imm8 9-14/12-17 cell register ^[ cell
register * sign-extended
immediate byte
69 /r iw r16,r/m16,imm16 9-22/12-25 dual register ^[ r/m16 *
immediate dual
69 /r immcell r32,r/m32,imm329-38/12-41 cell register ^[ r/m32 *
immediate cell
69 /r iw r16,imm16 9-22/12-25 dual register ^[ r/m16 *
immediate dual
69 /r immcellr32,imm32 9-38/12-41 cell register ^[ r/m32 *
immediate cell
9 to 41 clocks.
"
else
herelist
parse $*
case $modestring in
*)
echo -e "\n\nmultiply doesn't support " $modestring " mode. " ;;
esac
opnote multiply $*
fi
}
negate () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel NEG\n
F6 /3r/m82/6 Two's complement negate r/m byte
F7 /3r/m32 2/6
Two's complement negate, 2 or 6 clocks. simple NOT, then increment.
"
else
herelist
parse $*
case $modestring in
1*)
ab 0xf6
modSIBdis $source 3 ;;
2*)
ab 0xf7
modSIBdis $source 3 ;;
*)
echo -e "\n\nnegate doesn't support
" $modestring " mode. " ;;
esac
opnote negate $*
fi
}
nocarry () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel CLC\n
unset carry flag. To 0.\n\n"
else
herelist
ab 0xf8
opnote nocarry $*
fi
}
nop () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel NOP\n
Do nothing. This actually is the OR A with A version of OR.\n\n"
else
herelist
ab 0x90
opnote nop $*
fi
}
NOT () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel NOT\n
Boolean bitwise not. Invert all the bits. All zeros become ones and
vice-versa.\n\n"
else
herelist
parse $*
case $modestring in
1_1_1_memo)
ab 0xf6
modSIBdis $source 2 ;;
1_[24]_[24]_memo)
ab 0xf7
modSIBdis $source 2 ;;
*)
echo -e "\n\nNOT doesn't support
" $modestring " mode. " ;;
esac
opnote NOT $*
fi
}
nointerrupts () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel CLI\n
disable external hardware interrupts to the CPU.
seful at boot time maybe and for profound state-changes like
process-switches.\n\n"
else
herelist
ab 0xfa
opnote nointerrupts $*
fi
}
overflowtrap () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel INTO\n
cause invocation of a trap handler IF overflow bit is set.\n\n"
else
herelist
ab 0xce
opnote overflowtrap $*
fi
}
priviledge () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel ARPL\n
Adjust requested priviledge level of r/m16 to not less than RPL of r16
\n\n"
else
herelist
ab 0x63
modSIBdis $dest ${register[$source]}
opnote priviledge $*
fi
}
pullcore () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel POPA\n
copy (pop) DI, SI, BP, SP, B, D, C, and A off the stack.
adjusting stack pointer SP accordingly.\n\n"
else
herelist
ab 0x61
opnote pullcore $*
fi
}
pull () { partial # unstack #
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel POP\n
Copy top of stack into operand, adjusting stack pointer SP accordingly.
\n\n"
else
herelist
parse $*
case $modestring in
1_[24]_[24]_dire)
ab 0x8f
modSIBdis $source 0 ;; # must be DI
1_1_1_imme)
ab 0x6a ${number[$source]} ;;
1_[24]_[24]_dire)
ao 13${register[$source]} ;;
*segm)
if test ${register[$source]} = "0" ;then # ES
ab 0x07
elif test ${register[$source]} = "2" ;then # SS
ab 0x17
elif test ${register[$source]} = "3" ;then # DS
ab 0x1f
elif test ${register[$source]} = "4" ;then # FS
ab 0x0f 0xa1
elif test ${register[$source]} = "5" ;then # GS
ab 0x0f 0xa9
else
echo -e "\n\npush doesn't support
" $modestring " mode. "
fi ;;
*)
echo -e "\n\npull doesn't support
" $modestring " mode. " ;;
esac
opnote pull $*
fi
}
pullflags () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel POPF\n
copy top of stack into flags reg, adjusting stack pointer SP
accordingly.\n\n"
else
herelist
ab 0x9d
opnote pullflags $*
fi
}
pushcore () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel PUSHA\n
copy the eight main regs onto the stack, adjusting stack pointer SP
accordingly.\n\n"
else
herelist
ab 0x60
opnote pushcore $*
fi
}
pushflags () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel PUSHF\n
copy FLAGS onto the top of stack, adjusting stack pointer SP
accordingly.\n\n"
else
herelist
ab 0x9c
opnote pushflags $*
fi
}
push () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel PUSH\n
copy operand onto top of stack, adjusting stack pointer SP
accordingly.\n\n"
else
herelist
parse $*
case $modestring in
1_[24]_[24]_imme)
ab 0x68
ac ${number[$source]} ;;
1_1_1_imme)
ab 0x6a ${number[$source]} ;;
1_[24]_[24]_dire)
ao 12${register[$source]} ;;
*segm)
if test ${register[$source]} = "0" ;then # ES
ab 0x06
elif test ${register[$source]} = "1" ;then # CS
ab 0x0e
elif test ${register[$source]} = "2" ;then # SS
ab 0x16
elif test ${register[$source]} = "3" ;then # DS
ab 0x1e
elif test ${register[$source]} = "4" ;then # FS
ab 0x0f 0xa0
elif test ${register[$source]} = "5" ;then # GS
ab 0x0f 0xa8
else
echo -e "\n\npush doesn't support
" $modestring " mode. "
fi ;;
*)
echo -e "\n\npush doesn't support
" $modestring " mode. " ;;
esac
opnote push $*
fi
}
quadextend () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel CDQ\n
sign-extend A to D:A
\n\n"
else
herelist
ab 0x99
opnote quadextend $*
fi
}
readable () {
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel VERR\n
0F 00 /4 r/m16 pm=10/11 Set ZF=1 if segment can be read,
selector in r/m16
0F 00 /5 r/m16 pm=15/16 Set ZF=1 if segment can be written,
selector in r/m16
Set zero flag to true if segment of given selector can be written.\n\n"
else
herelist
parse $*
case $modestring in
*)
echo -e "\n\nreadable doesn't support
" $modestring " mode. " ;;
esac
opnote readable $*
fi
}
recieve () {
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel IN\n
E4 ib AL,imm8 12,pm=6*/26** Input byte from immediate port
into AL
E5 ib A,imm812,pm=6*/26** Input cell from immediate port
into A
EC AL,DX 13,pm=7*/27** Input byte from port DX into AL
ED A,DX 13,pm=7*/27** Input cell from port DX into A
Input from port\n\n"
else
herelist
parse $*
case $modestring in
*)
echo -e "\n\nrecieve doesn't support
" $modestring " mode. " ;;
esac
opnote recieve $*
fi
}
return () { partial support, near #
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel RET\n
C3 10+m Return (near) to caller
CB 18+m,pm=32+m Return (far) to caller, same
privilege
CB pm=68 Return (far), lesser privilege,
-------> switch stacks
C2 iw imm16 10+m Return (near), pop imm16 bytes of
parameters
CA iw imm16 18+m,pm=32+m Return (far), same privilege, pop
imm16 bytes
Return from a call. Various stack frames by call type.\n\n"
else
herelist
parse $*
case $modestring in
1*)
ab 0xc2
ad ${number[$source]} ;;
*)
ab 0xc3 ;;
esac
opnote return $*
fi
}
rights () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel LAR\n
MORE TEXT HERE
r16 becomes r/m16 masked by FF00 \n\n"
else
herelist
ab 0x0f 2
modSIBdis $source ${register[$dest]}
opnote rights $*
fi
}
setGDT () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel LGDT\n
Load pointer at memory operand into Global Descriptor Table Register. This
and setIDT are the only instructions that always interpret an address as
physical, since they set up the memory protection scheme.\n\n"
else
herelist
ab 0x0f 0x01
modSIBdis $source 2
opnote setGDT $*
fi
}
setIDT () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel LIDT\n
Load pointer at memory operand into Interrupt Descriptor Table Register.
This and setGDT are the only instructions that always interpret an address
as physical, since they set up the memory protection scheme.\n\n"
else
herelist
ab 0x0f 0x01
modSIBdis $source 3
opnote setIDT $*
fi
}
setLDT () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel LLDT\n
Load pointer at memory operand into Local Descriptor Table Register.\n\n"
else
herelist
ab 0x0f 0x00
modSIBdis $source 2
opnote setLDT $*
fi
}
savetask () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel STR\n
Load EA dual into task register. Ha3sm doesn't use the 386+ task handling
facilities. Most 386 unices do I think.\n\n"
else
herelist
ab 0x0f 0x00
modSIBdis $source 1
opnote savetask $*
fi
}
send () { HORKED
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel OUT\n
HORKED
EE DX,AL 11,pm=5*/25** Output byte AL to port number in
DX
EF DX,A 11,pm=5*/25** Output cell AL to port number
in DX
Output to a port number\n\n"
else
herelist
parse $*
case $modestring in
# immediate byte to A
1_1*) ab 0xe6
ab ${number[$source]} ;;
# immediate cell to A
1*) ab 0xe7
ac ${number[$source]} ;;
*)
echo -e "\n\nsend doesn't support
" $modestring " mode. " ;;
esac
opnote send $*
fi
}
setcarry () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel STC\n
assert carry=true, 1.\n\n"
else
herelist
ab 0xf9
opnote setcarry $*
fi
}
setflags () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel SAHF\n
copy AH to the FLAGS register-half.\n\n"
else
herelist
ab 0x9e
opnote setflags $*
fi
}
sleep () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel HLT\n
halt processor until next hardware interrupt. Be nice to your CPU.\n\n"
else
herelist
ab 0xf4
opnote sleep $*
fi
}
subtractborrow () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel SBB\n
Subtract with borrow\n\n"
else
herelist
parse $*
case $modestring in
# immediate byte to A
1_1*) ab 0x1c
ab ${number[$source]} ;;
# immediate cell to A
1*) ab 0x1d
ac ${number[$source]} ;;
# immediate source byte to byte r/m
2_1_1_imme_1_dire | 2_1_1_imme_1_memo)
ab 0x80
modSIBdis $dest 3
ab ${number[$source]} ;;
# immediate cell to r/m cell
2_[24]_[24]_imme_[24]_dire | 2_[24]_[24]_imme_[24]_memo)
ab 0x81
modSIBdis $dest 3
ac ${number[$source]} ;;
# immediate source byte to cell r/m
2_1_1_imme_[24]_dire | 2_1_1_imme_[24]_memo)
ab 0x83
modSIBdis $dest 3
ab ${number[$source]} ;;
# reg byte source to byte r/m
2_1_1_dire_1_dire | 2_1_1_dire_1_memo)
ab 0x18
modSIBdis $dest ${register[$source]} ;;
# register cell to r/m cell
2_[24]_[24]_dire_[24]_dire | 2_[24]_[24]_dire_[24]_memo)
ab 0x19
modSIBdis $dest ${register[$source]} ;;
# byte r/m source to byte reg
# reg-reg already decoded as
# I think that's OK.
2_1_1_memo_1_dire)
ab 0x1a
modSIBdis $source ${register[$dest]} ;;
# source r/m cell to cell reg
# reg-reg already decoded as ?
# I think that's OK.
2_[24]_[24]_memo_[24]_dire)
ab 0x18
modSIBdis $source ${register[$dest]} ;;
*)
echo -e "\n\nsubtractborrow doesn't support
" $modestring " mode. " ;;
esac
opnote subtractborrow $*
fi
}
task () {
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel LTR\n
Load EA dual into task register\n\n"
else
herelist
ab 0x0f 0
modSIBdis $source 3
opnote task $*
fi
}
# ###### this REALLY _IS_ 3 distinct instructions
# well, all distinct opcodes are, but these three are pretty different.
trap () {
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel INT\n
CC 3 33 Interrupt 3--trap to debugger
CDibimm8 37 Interrupt numbered by immediate
CE Fail:3,pm=3;
"
else
herelist
parse $*
case $modestring in
*)
echo -e "\n\ntrap doesn't support " $modestring " mode. " ;;
esac
opnote trap $*
fi
}
multiply () {
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel MUL\n
F6 /4AL,r/m8 9-14/12-17Unsigned multiply (AX ^[ AL * r/m byte)
F7 /4A,r/m329-38/12-41Unsigned multiply (EDX:A ^[ A * r/m
cell)
"
else
herelist
parse $*
case $modestring in
*)
echo -e "\n\nmultiply doesn't support " $modestring " mode. " ;;
esac
opnote multiply $*
fi
}
unbit () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel BTC\n
Save specified bit of operand into carry flag and complement it in operand.
\n\n"
else
herelist
parse $*
case $modestring in
2_[24]_[24]_dire*)
ab 0x0f 0xbb
modSIBdis $dest ${register[$source]} ;;
2_1_1_imme*)
ab 0x0f 0xba
modSIBdis $dest 7 ;;
*)
echo -e "\n\nunbit doesn't support
" $modestring " mode. " ;;
esac
opnote unbit $*
fi
}
invertcarry () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel CMC\n
flip the carry bit\n\n"
else
herelist
ab 0xf5
opnote invertcarry $*
fi
}
signeddivide () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel IDIV\n
19 to 43 clocks.
"
else
herelist
parse $*
case $modestring in
1_1*)
ab 0xf6
modSIBdis $source 7 ;;
1_[24]*)
ab 0xf7
modSIBdis $source 7 ;;
*)
echo -e "\n\nsigneddivide doesn't support
" $modestring " mode. " ;;
esac
opnote signeddivide $*
fi
}
unsigneddivide () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel DIV\n
"
else
herelist
parse $*
case $modestring in
# immediate byte to A
1_1*) ab 0x3c
ab ${number[$source]} ;;
# immediate cell to A
1*) ab 0x3d
ac ${number[$source]} ;;
# immediate source byte to byte r/m
2_1_1_imme_1_dire | 2_1_1_imme_1_memo)
ab 0x80
modSIBdis $dest 7
ab ${number[$source]} ;;
# immediate cell to r/m cell
2_[24]_[24]_imme_[24]_dire | 2_[24]_[24]_imme_[24]_memo)
ab 0x81
modSIBdis $dest 7
ac ${number[$source]} ;;
# immediate source byte to cell r/m
2_1_1_imme_[24]_dire | 2_1_1_imme_[24]_memo)
ab 0x83
modSIBdis $dest 7
ab ${number[$source]} ;;
# reg byte source to byte r/m
2_1_1_dire_1_dire | 2_1_1_dire_1_memo)
ab 0x38
modSIBdis $dest ${register[$source]} ;;
# register cell to r/m cell
2_[24]_[24]_dire_[24]_dire | 2_[24]_[24]_dire_[24]_memo)
ab 0x39
modSIBdis $dest ${register[$source]} ;;
# byte r/m source to byte reg
# reg-reg already decoded as 0x20.
# I think that's OK.
2_1_1_memo_1_dire)
ab 0x3a
modSIBdis $source ${register[$dest]} ;;
# source r/m cell to cell reg
# reg-reg already decoded as 0x21.
# I think that's OK.
2_[24]_[24]_memo_[24]_dire)
ab 0x38
modSIBdis $source ${register[$dest]} ;;
*)
echo "BONK" ;;
esac
opnote unsigneddivide $*
fi
}
uprollcarry () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel RCL\n
up-significance bit roll including carry in the ring of bits rolled
"
else
herelist
parse $*
case $modestring in
2_[24]_[24]_imme_[24]_memo)
ab 0xc1
modSIBdis $dest 2
ab ${number[$source]} ;;
1_1_1_memo)
ab 0xd0
modSIBdis $dest 2 ;;
2_1_1_dire*) # source is CL
ab 0xd2
modSIBdis $dest 2 ;;
2_1_1_imme_1_memo)
ab 0xc0
modSIBdis $dest 2
ab ${number[$source]} ;;
1_[24]_[24]_memo)
ab 0xd1
modSIBdis $dest 2 ;;
2_[24]_[24]_dire*) # source is CL
ab 0xd3
modSIBdis $dest 2 ;;
*)
echo -e "\n\nuprollcarry doesn't support
" $modestring " mode. " ;;
esac
opnote uprollcarry $*
fi
}
uproll () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel ROL\n
"
else
herelist
parse $*
case $modestring in
2_[24]_[24]_imme_[24]_memo)
ab 0xc1
modSIBdis $dest 0
ab ${number[$source]} ;;
1_1_1_memo)
ab 0xd0
modSIBdis $dest 0 ;;
2_1_1_dire*) # source is CL
ab 0xd2
modSIBdis $dest 0 ;;
2_1_1_imme_1_memo)
ab 0xc0
modSIBdis $dest 0
ab ${number[$source]} ;;
1_[24]_[24]_memo)
ab 0xd1
modSIBdis $dest 0 ;;
2_[24]_[24]_dire*) # source is CL
ab 0xd3
modSIBdis $dest 0 ;;
*)
echo -e "\n\nuproll doesn't support
" $modestring " mode. " ;;
esac
opnote uproll $*
fi
}
upshift () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel SAL\n
up-significance bitshift. zeros roll in on low-significance end, bits are lost
on the high-significance end.\n\n"
else
herelist
parse $*
case $modestring in
2_[24]_[24]_imme_[24]_memo)
ab 0xc1
modSIBdis $dest 1
ab ${number[$source]} ;;
1_1_1_memo)
ab 0xd0
modSIBdis $dest 1 ;;
2_1_1_dire*) # source is CL
ab 0xd2
modSIBdis $dest 1 ;;
2_1_1_imme_1_memo)
ab 0xc0
modSIBdis $dest 1
ab ${number[$source]} ;;
1_[24]_[24]_memo)
ab 0xd1
modSIBdis $dest 1 ;;
2_[24]_[24]_dire*) # source is CL
ab 0xd3
modSIBdis $dest 1 ;;
*)
echo -e "\n\nupshift doesn't support
" $modestring " mode. " ;;
esac
opnote upshift $*
fi
}
widedownshift () {
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel SHRD\n
0Fr/m32,r32,imm8 3/7 r/m32 gets SHR of r/m32 concatenated with r32
0Fr/m32,r32,CL 3/7 r/m32 gets SHR of r/m32 concatenated with r32
down-significance bitshift of a composite operand made of ??????????
"
else
herelist
parse $*
case $modestring in
*)
echo -e "\n\nwidedownshift doesn't support
" $modestring " mode. " ;;
esac
fi
opnote widedownshift $*
}
wideupshift () {
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel SHLD\n
composite up-significance bitshift.
"
else
herelist
parse $*
case $modestring in
*)
echo -e "\n\nwideupshift doesn't support
" $modestring " mode. " ;;
esac
opnote wideupshift $*
fi
}
within () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel BOUND\n
62/r BOUND r32,m32&32 10
Check if r32 is within bounds, (passes test). Bounds are adjacent
32 bit values in memory.\n\n"
else
herelist
parse $*
ab 0x62
modSIBdis $source ${register[$dest]}
opnote within $*
fi
}
writeable () {
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel VERW\n
Test if current process is allowed to write to given ???????
"
else
herelist
opnote writeable $*
fi
}
####################
#### segment (string) ops want very badly to be macros. Enjoy.
fill () {
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel STOSD\n
AAm8 4 Store AL in byte ES:[DI], update DI
AB STOSD 4
Store A in cell ES:[DI], update DI\n\n"
else
herelist
opnote fill $*
fi
}
recieves () {
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel INSB\n
6Cr/m8,DX 15,pm=9*/29** Input byte from port DX into ES:DI
6Dr/m32,DX15,pm=9*/29** Input cell from port DX into ES:DI
"
else
herelist
opnote recieves $*
fi
}
segmentcopy () {
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel MOVSD\n
"
else
herelist
opnote segmentcopy $*
fi
}
segmentcompare () {
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel SCASB\n
AEm8 7 Compare bytes AL-ES:[DI], update DI
AFm32 7 Compare cells A-ES:[DI], update DI
"
else
herelist
opnote segmentcompare $*
fi
}
segmentcopy () {
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel MOVSB\n
"
else
herelist
opnote segmentcopy $*
fi
}
segmentifsubtract () {
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel CMPS\n
A6 m8,m8 10 Compare bytes ES:[DI] (second
operand) with [SI] (first
operand)
A7 m32,m32 10 Compare cells ES:[DI]
(second operand) with [SI]
(first operand)
"
else
herelist
opnote segmentifsubtract $*
fi
}
segmentifsubtractbytes () {
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel CMPSB\n
"
else
herelist
opnote segmentifsubtractbytes $*
fi
}
sendsegment () {
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel OUTS\n
6E DX,r/m8 14,pm=8*/28** Output byte [SI] to port in DX
6F DX,r/m32 14,pm=8*/28** Output cell [SI] to port in DX
"
else
herelist
parse $*
case $modestring in
*)
echo -e "\n\nsendsegment doesn't support " $modestring " mode. " ;;
esac
opnote sendsegment $*
fi
}
# Note to self: BLINK STUPID!
ifnocarry () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel JAE\n
"
else
herelist
parse $*
case $modestring in
1_1*)
ab 0x73
branch $1 1 ;;
1_[24]*)
ab 0x0f 0x83
branch $1 $cell ;;
*)
echo -e "\n\nifzero doesn't support
" $modestring " mode. " ;;
esac
opnote ifnocarry $*
fi
}
ifcarry () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel JB\n
jump if the carry bitflag is set, 1.\n\n"
else
herelist
parse $*
case $modestring in
1*)
ab 0x72
branch $1 1 ;;
1_[24]*)
ab 0x0f 0x82
branch $1 $cell ;;
*)
echo -e "\n\nifcarry doesn't support
" $modestring " mode." ;;
esac
opnote ifcarry $*
fi
}
ifC0 () { #
if test "$1" = "h" ; then echo -e "\n\t\t\t\t\tIntel JCXZ\n
jump if C register is zero.\n\n"
else
herelist
parse $*
case $modestring in
1*)
ab 0xe3
branch $1 1 ;;
*)
echo -e "\n\nifC0 doesn't support
" $modestring " mode." ;;
esac
opnote ifC0 $*
fi
}
# --# x86 too-ugly-to-live stuff. AAA and friends
# ."CPU'/386_de-emphasized"
#................................................................
#................................................................
shasm is character-by-character from-scratch new. It's not based on any
existing assembler. It's based on the old Intel text file 386INTEL.TXT,
i.e. it's based on the 80386 itself. I suspect it's the first utterly new
x86 assembler in quite a while. I'm not an 8086 guy. I hate the pig. I had
Commodore stuff when the 8086 was prevalent.
Recent Forths and BCPL are built on the concept of a cell, an integer the
size of an address. The 80386 can be looked at the same way vis-a-vis the
real/pmode issues of the machine. shasm takes advantage of this. I'm not
familiar with IA64, but the concept of a cell is a very general and
forward-compatible thing. The 386 is more flexible than the cell concept
as usually needed, because most things can assume the cellsize of the
machine is a constant, and it isn't on x86.
The generality of cells and what I hope is the generality of the asmacs
oper names makes shasm code look general. shasm looks like COBOL. How
general is it? Is there a truly portable assembler lurking in here? Is the
386 a useful subset of other desktop CPU's? How many Bash routines on top
of shasm constitutes a compiler?
Assembling the bitfields, bytes and cells of machine code is not what the
unix shell is designed for. A hallmark of unix is simple specialized tools
that do one thing well. That's proven to be an excellent design guide, but
it's subject to some variation. If you must have the costs of a particular
thing, and typically you must incur the cost of a command interpreter on a
computer, then the best thing a thing can be is as versatile as possible
per cost. This is why people carry Leatherman's. And money. Very versatile
stuff, money. This is the great strength of BASIC. It's horrid, but it can
do just about anything. The value of versatility is why unix shells have
accreted features over the years, and shasm shows that Bash 2.x is capable
of just about anything, given a long enough period of time to get it done.
Relative to the functionality of a classico Bourne sh, the addenda
required for complete generality are trivial, and thus a bargain. In view
of the rich interactive conveniences of Bash, echo -e and let are highly
cost-effective. The default interface to the system is the place where
de-specialization makes the most sense. The fact that shasm allows
arbitrary development with just a shell shows this.
An advantage of the shell is ease of implementing a parser. sh case
statements take globbing patterns as switch strings, for example. shasm
therefor has a very accepting syntax. In particular, Intel, AT&T and Plan9
syntaxes can all probably be converted to shasm with simple ed scripts.
.....................................................................
.....................................................................
How shasm works docs/design
shasm is a collection of shell routines that append arbitrary binary data
to a file named a.out. There are numerous routines for making that data
x86 machine code. shasm itself is written entirely in GNU Bash. shasm
makes each individual opcode assembler an actual shell subroutine
(function), and thus the file to be assembled (if you use an input file)
is actually a shell script, as opposed to something parsed by one. This is
a result of the pathologically un-orthogonal x86 instruction set, and the
desire to avoid extranaeity. You basically have to give a lot of brains to
each instruction to assemble x86, so you might as well make them routines.
This means we want the syntax of a shasm assembly source file to map to
the syntax of shell commands without too many contortions. For example, we
don't want to implement macros, which requires parsing the whole file in
ways I am not sure the shell can do, and which would be pathologically
slow anyway. More pathologically slow, that is. Not to mention, shell
"macro expansion" is available anyway, sortof. Here's the syntax of a
shell command or subroutine...
(one shell command)
[assignments] command [ arguments ]
One shasm instruction has this syntax...
(one shasm instruction)
opcode [ arguments ]
If the opcode is a command, which it is, we're almost done. We can juggle
the arguments to our heart's content. The only limitation is we can't use
shell meta-ops like < & and so on in shasm tokens. Prefixes are the issue.
We make them commands also. CS, SS and similar exist as commands and as
defined argument tokens representing register operands. Shell syntax
disambiguates them by context for us. Keep them on separate lines, or use
semicolon. That's right, semicolon is as per usual for the shell, and so
are # comments and everything else native to your friendly neighborhood
unix command interpreter.
If shasm didn't have a branch resolver it wouldn't even be worthy of the
term assembler. (Not that it is anyway, but...) Forward branches have to
be dealt with after they are resolved. Issue. How do we take two passes
over the same shell script, with different actions on pass 1 and pass 2?
We call the actual assembler script by sourcing it from main(), twice, and
pass-sensitive actions are in a few pass-sensitive routines. The
pass-sensitive state, the list of branches to resolve, is global to main()
and it's progeny. All opcodes and so on can use the same few output
routines, which can do one thing on pass 1 and finish up on pass 2.
If you are using shasm interactively, without an assembly script, you
yourself have to maintain the pass variable by hand.
Opcodes do the syntax-checking they need. The very minimum they need.
Error checking is what a.list is for anyway. Prefixes just get assembled
without checking. There is no inter-op checking, and we've arbitrarily
made prefixes distinct ops.
The listing of values that are created as octal strings are just output to
a.list in octal. A clump of 3 characters in a.list is an octal byte. Going
from octal or integer back to a string is tricky, and lo and behold,
likely not worth it. The values in question are composed octally anyway,
(the 386 modR/M and SIB bytes and many oper bytes) so they might as well
be viewed in octal. Other bytes are 2 hexadecimal characters. If you look
at the code for the modR/M and SIB stuff, you'll notice that octal values
are actually built up as text strings of ascii representions of the octal
digits. This is the form in which all bytes are presented to the various
binary outputters based on echo -e.
On a P166 one fancy-mode instruction invoked as a command takes .03
seconds or more. At that rate it would take, oh, 3 hours to assemble Linux
from gcc-produced shasm source, if there were such a bizarre thing. Just
to assemble it, not compile the C. gas is designed to serve gcc. shasm
isn't, so for what shasm is intended for that's plenty good enough. It
should take about 2 minutes to assemble Ha4sm when it's re-written in
shasm. I looked at using a "string" as an IO buffer, but at a glance it
didn't seem to make a noticeable difference. Anyway, figure 100 times
slower than gas. Well-optimized machine code is often worth that. The
interactivity of shasm is worth much more than that, if you're hand-coding
something.
The address mode parser builds globbing pattern match strings of the form
1_2_4_imme_2_dire
for each instruction encountered that calls parse(). Instructions
that have several addressing modes all call parse(). The modestring is
used in a case switch in the instruction which implements the specifics of
that instruction. The cases map roughly to the various main opcode bytes
that a particular instruction can start with. The case thing isn't
particularly efficient or clear, but it does map to how the 386 decodes
things reasonably well. Another method might be better for another CPU, or
for x86, but the modestrings thing is OK. Shell case constructs are very
flexible because each case test string is a globbing pattern. The format
of the modestring is...
#ofoperands_smallestoperandsize_sourcesize_sourcetype_destsize_desttype
so...
2_[24]_4_????_4_dire
means "a 2-operand instruction with a smallest operand of 2 or 4 bytes,
with a 4-byte source operand of any type and a 4-byte destination operand
of type register-direct" .
The [24] in the above example modestring is a glob pattern for "2 or 4".
That's the range of address cell sizes on 386+. The concept of a cell is
quite valuable, being what recent Forths, BCPL, and my H3sm language are
based on. This gives these languages, and maybe shasm, a bit of platform
independance. ON x86 you need platform-independance on one platform, pmode
and real. The concept of a cell might also be forward-compatible to
post-IA32 devices, for example.
The wholesale renaming of 386 opcodes is helpful, in my experience.
Assembly language mnemonics have remained more cryptic than need be over
the last 20 years. What a verbose name does is moves the comment into the
name, so to speak, which eliminates some mental indirection. The names
used in x86 shasm arose while writing the Janet_Reno bootsector for x86 as
m4 macros called "asmacs". (Janet_Reno by the way is a bootsector, just
the bootsector, that gets into pmode and can call AT BIOS routines in
pmode interrupt handlers.) I suspect that looking at generic-ized names
for x86 instructions may suggest a subset of the x86 instruction set that
is somewhat portable. Assembly language directives, as oposed to opcodes,
(the stuff in shasm main basically) are already portable. shasm main is
little-endian, but the generalization of that is trivial.
Shasm's general from/to syntax wasn't too tough. It involves some short
arrays. Instructions operands come in with a leftness or rightness, and
then "to" or "from" assigns source and dest to the values of left and
right, as the case may be. Thereafter things can refer to $source or $dest
and not care about left and right, but rather care about what they need
to.
a.list resembles GNU gas -anl output. (Highly recommended, BTW.) The
implementation oddities there are the aformentioned octal cheat, and
left-justifying the text column, the rightmost column. That uses a
per-line character counter and a Boolean mask.
There's some extra added lameness in the source/dest indirection
"pointers" vis-a-vis integers and array subscripts. I do some "math" with
if/then and strings. That can and should probably be cured with
declare -ia , an array of integers.
The branch resolver traverses the list of labels doing name matching. I
suspect that can be improved quite a bit. H3sm doesn't do that, IIRC.
Rick Hohensee www.clienux.com humbubba@smart.net
jan/feb 2001
............................................................................
............................................................................
<html><head><title>seedoc of cLIeNUX shasm</title></head>
<h1>shasm</h1>
<h3>NAME</h3>
shasm - binary file assembler written entirely in the GNU Bash shell
<h3>PAGE DATE</h3>
Jan. 2001
<h3>INTERFACE</h3>
<em>shasm filename</em>
<br><br>
or interactively, <br>
<br><em>. shasm</em>
<br>
<br> output is to the created files a.out and a.list
<h3>DESCRIPTION</h3>
<h4>Good news</h4>
shasm is a trivially extensible, utterly flexible collection of unix
shell routines for assembling arbitrary binary files, including 80386
machine language programs. You probably already know a lot about how it
works. shasm can be run on any computer you can get <em>Bash</em> or
similar installed on. It uses only the shell. shasm provides gobs of
user-feedback.
<h4>Bad News</h4>
shasm is about 100 times slower than gas. The functionality provided is
just what I need to assembly a particular programming language on 386+.
shasm needs a fairly featureful shell; ash won't cut it.
<p>
<h4>local-scope jargoneering</h4>
<em>bytes</em>, <em>duals</em> and <em>quads</em> are integers of 1, 2
and 4 bytes respectively. An <em>oper</em> is the first byte or the
characteristic byte pair of a specific x86 machine instruction. An
<em>argument</em> is any syntactic modifier to an instruction other than
prefixes. Arguments in shasm are separated by spaces. An <em>operand</em>
is the actual value the instruction will act on at runtime, as defined by
the arguments. Note that I've based this on "instruction" without
defining that. It usually means the thing there is an Intel name for, but
not always. A <em>macro</em> in shasm is a shell expression or routine
additional to what shasm provides.
<h4>amble</h4>
<em>shasm</em> is an assembler written in the GNU Bash unix-style command
interpreter "shell" to make my <em>H3sm</em> 3-stack programming language
maximally portable. The initial shasm is for that purpose, and is for a
subset of the x86 instruction set which is most useful for systems
programming languages and operating systems. A side-effect of that goal is
that shasm is a flexible and relatively easy-to-use means of creating any
kind of arbitrary binary file. Because it's a set of scripts, the scripts
themselves are the executable, the authoratative documentation, and are
part of the user interface. shasm does not use anything external to the
shell, such as sed, dd and so on. Most assemblers these days are geared to
run in the background supporting a high-level language. shasm is more for
coding machine language directly, interactively.
<p>
The initial shasm provides machine code assembly for a subset of the x86
instruction set. shasm implements a non-cryptic set of names for x86
instructions that I find helpful called <em>asmacs</em>. If you prefer
Intel names you can easily transliterate them back in, for the most part.
There are a few names that don't map one-to-one though.
<p>
Shasm interprets the file to be assembled as a shell script. The opcodes
in shasm are shell subroutines (functions), and any routine in shasm, and
any functionality of Bash, is available throughout the assembly source.
<h4>shell argument syntax</h4>
shasm's syntax is actually the behavior of shell argument processing.
Usually one machine instruction and it's arguments are assembled by one
shell routine and it's following arguments. An operator is followed by
space-separated arguments. Arguments to the operator can be shell
expressions if they are contiguous or quoted. The exception is that
instruction prefixes are handled like separate instructions by shasm.
Instruction delimiting and argument delimiting are thus shell-style.
Instructions are separated by ends of lines, and lines may be continued
with a terminating <em>\</em> or subdivided with <em>;</em> as per usual
in the shell. Arguments are separated by spaces, also as is typical in the
shell. shasm itself doesn't do any character-by-character parsing/lexing,
so some things that are usually prefixes in other assemblers are separate
tokens in shasm. For example, there are two separators for the source and
destination sides of an instruction's arguments. They are <em>to</em> and
<em>from</em>. These are the equivalent of a comma in e.g. GNU gas, and
must be separated from other arguments by spaces.
<p>
The most important variable is <em>here</em>, which is the current
assembly address. here is equivalent to period in other assemblers (and is
degenerately analagous to HERE in Forth). here is a declared integer. You
can use here in Bash expressions as you see fit. <em>L</em> is the label
specifier, and <em>fillto</em> is equivalent to the .org directive of
other assemblers. fillto fills from here to the address specified with
zero-bytes.
<p>
The high-level utility of the shell provides many other features typical
of assemblers implicitly. Examples:
<ul> <li> <em>. <filename></em> is your .include directive.
</li>
<li><em>MOV () { copy $* ; } </em> renames an opcode or
shasm routine.
</li>
<li>Shell routines more complex than the preceeding constitute "macros".
</li>
<li>Suffixes and other constructs are implicit to shell string concatenation.
</li>
<li><em>declare -i pi=314159</em> declares an integer constant.
</li>
<li><em>echo</em> can send arbitrary progress info to the user anytime
</li>
<li>shell conditional and looping constructs can control assembly
</li>
</ul>
<p>
I use the terms "byte", "dual" and "quad" for integers of 1, 2 and 4
bytes. The directives <em>bytes</em>, <em>duals</em> and <em>quads</em>
assemble integers literally. They take one or more numeric or expression
arguments, as is typical for shell commands. Bash and other recent
unix-like shells provide a rich set of operators, but expression syntax is
tricky. shasm itself is full of examples. For each argument to e.g.
"bytes", one integer of the size specified (a byte in this case) is
appended to the assembly. Arguments with larger values than the type being
appended are truncated, low-significance end surviving the truncate. For
x86 there are also operand qualifiers called <em>byte</em>, <em>dual</em>
and <em>quad</em>. These are not directives.
<p>
Assembler directives are machine-independant. shasm is therefor split
into two scripts; the main one and the one for the CPU in question.
Currently you have one choice of CPU; x86. shasm has no linker, sections,
or debugging functionality. Please let me know if any of that changes.
<p>
The L style labels are for branch resolution. If you want to label some
point in the assembly for other uses do
<pre>
mydatalabel=$here
</pre>
and be careful with name conflicts. The <em>ascii</em> directive assembles
a string.
<p>
A shasm opcode writes to two output files, <em>a.out</em> and
<em>a.list</em>. a.out is the raw binary assembly, and a.list is a
hexadecimal/octal listing. An item in a.list of the form 234 is a byte in
octal, whereas 22 is hex. The 386 modR/M and SIB bytes get built as octal
and might as well be displayed that way. Hopefully by the time you read
this there will be some <em>ELF</em> goodies in the shasm package for
running or linking shasm-generated code, and perhaps a libsys.a as shasm
source.
<p>
A shell is roughly 100 times slower than compiled C at low-level stuff.
I've just tried to avoid making shasm unnecessarily worse than that. More
importantly, I don't see any data capacities in shasm that are likely to
be exceeded by any reasonable file of code. I do think shasm makes machine
language less daunting, and may be useful for playing around with other
types of binary data files.
<p>
<a href=file://localhost/help/see/shasmx86.1.html> x86 shasm</a> has it's
own seedoc for operator syntax and so on.
...................................................................
...................................................................
<html><head><title>seedoc for the x86 specifics of shasm</title></head>
<h1>shasm 80386 specifics </h1>
The initial target of <em>shasm</em> is the Intel family of processors,
primarily the 80386. The assembled instructions for these machines have a
complex format that can involve: various instruction prefixes; one or two
instruction bytes; zero, one, or two argument qualifier bytes; and zero,
one or two register, literal number or memory reference arguments. There
can be two register arguments to one instruction, but one x86 instruction
can only have one memory reference. A single memory reference can consist
of up to two registers and two literal values.
<P>
Every shasm x86 routine name is the assembler of that instruction, the
interactive help prompt for that instruction or directive, and the
assembly lister of that instruction. If you have sourced shasm into your
shell state you can do <em>L help</em> or <em>copy help</em> from the
shell prompt for examples.
<h4>to/from syntax</h4>
shasm has two oper argument separators; <em>to</em>, and <em>from</em>.
This means the source and destination operands can be in either
source/dest or dest/source order for any oper that cares.
<p>
<h4>asmacs</h4>
shasm uses a non-cryptic set of instruction names I call <em>asmacs</em>.
I find it helpful. The Intel names are in the help prompts. As far as I am
concerned, the usual style of opcode naming is a strange habitual
anachronism. Yes, there is something frustrating about typing a dozen or
so characters for one opcode, but as far as I can see it's typing well
spent. The typing of machine code is not it's primary difficulty. The
stuff is plenty cryptic without names like "LMSW". When you see a pig like
<em>loadmachinestatusword</em>, ask yourself how often you plan to use
that instruction.
<p>
I'm interested in the OS-related functionality of the 80386. Early shasms
by me will address that, but won't address later architectures, floating
point coprocessors, and what I consider to be legacy functionality like
the x86 ASCII instructions. The important thing, and the very problematic
thing, is the ornate 386 instruction format. It goes something like
this... <pre>
[] means optional. not to scale
|<----- prefixes ---->|<- operator ->|<------ mode --------->|dis][im]
[rep/lp][oas][oos][seg]oper [ oper2 ][ modeR/M [ SIB ]][dis][im]
[..|...|... [..|...|...]]
[m |r |R/M [s |i i|b b]]
[o |e o| [h |n n|a a]]
[d |gop| [i |d d|s s]]
[e |irc| [f |e e|e e]]
[ |s o| [t |x x| ]]
[ |t d| [ | | ]]
[ |e e| ]
[ |r | ]
[ ]
|<---------------------bytes-------------------------------->| various
</pre>
<em>dis</em> is an optional displacement of a memory reference, and
<em>im</em> is an optional immediate value. Either may be 1, 2 or 4 bytes
in size. The only thing that's not optional is the first operator byte,
"oper". Oper is 0x90 for NOP, "no operation", for example. NOP is a
one-byte instruction. In order to need all the above fields in one
instruction you would have to be doing something very bizarre, like
deliberately trying to use all the above fields in one instruction just
for amusement. Real world instructions vary from one to 6 or so bytes
usually. Average in protected mode is probably 3 or so. The problem is
that a very general-purpose instruction like <em>copy</em> (Intel MOV)
might have cases that together use all the above fields, and shasm has to
figure out which possibility is intended.
<p>
Prefixes are simple. shasm cheats and handles them as separate one-byte
opcodes. In a few cases they leave some switches set to control what
follows, but usually they just happen independantly. Syntactically, they
are separate opcodes, and must be on separate logical shell lines. That's
the cost of the cheat. This is the case for the segment register prefixes
<em>CS</eM>, <em>SS</eM>, <em>DS</eM>, <em>ES</eM>, <em>FS</eM>, and
<em>GS</eM>, the default cellsize switches <em> otheroperandsize</em> and
<em>otheraddresssize</em>, for <em>lock</em>, and for the loop prefixes
<em>repeating</em> and friends. If you want them on the same line with the
instruction they control, use semicolon.
<p>
The shasm keyword that assembles a particular <em>oper</em> is a named
shell routine. Everything after oper is derived from arguments to that
routine. This is where we must confront the syntax issues. We have to name
registers, differentiate between register contents themselves and the
memory locations they might be pointing at, differentiate between
addresses and literal numbers, specify the sizes of numbers, and so on. We
have to do all this in the shell. One welcome bit of Forth-like simplicity
I cling to for this is that shasm parses tokens whole. Shasm tokens are
separated by spaces. It doesn't use character-wise prefixes or suffixes.
The shasm scanner is the shell, the tokenizer is shell argument handling,
and there is no lexer. There is a fair bit of grammar to parse though,
thanks to the rich history of the 386. Shell expressions can be whatever
the shell allows, but as oper arguments they must look like single tokens
to shasm.
<p>
<h4>registers</h4>
The main x86 register names in shasm are <em>A</em>, <em>B</em>,
<em>C</em>, <em>D</em>, <em>SP</em>, <em>BP</em>, <em>SI</em> and
<em>DI</em>. Names of sizes of things in shasm are in terms of bytes.
register name sub-register qualifiers are <em>byte</em>,
<em>dual</em> and <em>quad</em>. For example, AH in the usual usage can be
"hbyte A" in shasm. EAX is "quad A", but can usually be specified as just
A. A sub-register spec usually sets the size for the whole instruction. A
full-size value at the current assembly size, which on x86 may alternately
be 16 or 32 bits, is called a <em>cell</em>. "cell" is in fact the
assembly mode global variable, and $cell will be 2 or 4.
<p>
Special registers have fairly typical names like TR6. AH and friends are
avalable as such also.
<h4>memory addressing</h4>
x86 memory references can consist of a displacement, a base register, an
index register, and a scale to upshift the index register by. The math
that represents that is...
<pre>
displacement + base register value + index register value << scale
</pre>
"displacement" is a signed literal. "scale" can be absent, 1, 2 or 3,
i.e. index multiplied by 2, 4 or 8. That can all be done quickly to
generate one effective address on 386+. That's just the address; that's
not taking into account the segment prefix and the size of the item in
memory to be acted upon. The variations and sub-cases of this format are
the addressing modes of the x86. Complex addressing modes are assembled
into the modR/M and SIB bytes. An instruction can only have one mode byte,
and thus only one memory reference.
<p>
An assembler is supposed to allow you to not think about obscura like
the mode and SIB bytes. What you do have to think about is what your
arguments to a particular machine instruction are. By this I mean the
actual objects the instruction acts on at runtime, not syntactical
arguments to the instruction's shasm name. On x86 it's useful to think of
machine instruction arguments as source and destination. If you have two
arguments you need to separate them, and indicate which is source and
which is dest. That's what to/from does.
<p>
From there the oper can further break down the instruction. Memory
references can be seen as such if they have at least one of <em>@</em>,
<em>+</em> or <em>*2^</em> ("...times two to the...") in shasm token form,
i.e. space-separated. An operand containing two registers is also
recognized as a memory reference operand. + and @ can be anywhere on the
same side of the to/from as the memory reference operand. *2^ is more
syntactically specific, and is the only shasm token that asserts a
positional grammatical relationship within one side of the "to" or "from".
The token immediately preceding *2^ must be the index register that will
be shifted, and the token immediately following the *2^ must be or
evaluate to 0, 1, 2 or 3.
<p>
The following increments a memory cell
pointed at by the contents of DI, not the contents of DI itself.
<pre>
increment @ DI
</pre>
All this memref construction is address arithmatic, which is always at
the prevailing oper address size. The default operand size is assumed if
an oper doesn't specify a sub-register argument. If the byte keyword are
seen the instruction is byte size. Usually if you want a 2 byte action
you'll use the otheroperandsize prefix. "byte" is an oper argument.
<em>bytes</em> is a shasm directive. There is no name conflict here
because of the "s", but even if they were the same string, the shell would
interpret them as appropriate by context, since command arguments may be
any string, and are handled as strings. This means that for example the
segment registers can have the same names as prefix operators and as oper
register name arguments, e.g. "DS". If DS is alone on a shell logical
line, it's a prefix. If it's an argument to an oper, it's the register as
an operand of the instruction as a register, not a segment spec.
<P>
..........................................................................
..........................................................................
[Prev in Thread] |
Current Thread |
[Next in Thread] |
- x86 assembler in Bash,
cLIeNUX user <=