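The instructions provided by SSE operate on packed single-precision floating-point values. Like MMX, SSE adds new instructions for loading data into the XMM registers, retrieving data from them, and performing arithmetic on the packed floating-point values they hold. SSE instructions come with two suffixes: one is the PS suffix, as in the MOVAPS instruction discussed below...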
Instruction | Description |
---|---|
MOVAPS | Move four aligned, packed single-precision values to an XMM register or memory |
MOVUPS | Move four unaligned, packed single-precision values to an XMM register or memory |
MOVSS | Move a single-precision value to memory or to the low doubleword (low 32 bits) of an XMM register |
MOVLPS | Move two single-precision values to memory or to the low quadword (low 64 bits) of an XMM register |
MOVHPS | Move two single-precision values to memory or to the high quadword (high 64 bits) of an XMM register |
MOVLHPS | Move two single-precision values from the low quadword of an XMM register to the high quadword |
MOVHLPS | Move two single-precision values from the high quadword of an XMM register to the low quadword |

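
To make the table more concrete, here is a small Python model of how several of these moves rearrange the four 32-bit elements of an XMM register. It is only a sketch: a register is modelled as a 4-element list with index 0 as the lowest element, and the function names are invented for illustration rather than taken from any library.

```python
# Toy model: an XMM register is a list of four floats, index 0 = lowest 32 bits.

def movss_reg(dst, src):
    """MOVSS register-to-register: replace only the low element of dst.
    (The memory-load form zeroes the upper three elements instead.)"""
    return [src[0]] + dst[1:]

def movlps_load(dst, mem2):
    """MOVLPS from memory: load two floats into the low quadword, keep the high one."""
    return list(mem2) + dst[2:]

def movhps_load(dst, mem2):
    """MOVHPS from memory: load two floats into the high quadword, keep the low one."""
    return dst[:2] + list(mem2)

def movlhps(dst, src):
    """MOVLHPS: low quadword of src goes to the high quadword of dst."""
    return dst[:2] + src[:2]

def movhlps(dst, src):
    """MOVHLPS: high quadword of src goes to the low quadword of dst."""
    return src[2:] + dst[2:]

a = [1.0, 2.0, 3.0, 4.0]
b = [5.0, 6.0, 7.0, 8.0]
print(movlhps(a, b))    # [1.0, 2.0, 5.0, 6.0]
print(movhlps(a, b))    # [7.0, 8.0, 3.0, 4.0]
print(movss_reg(a, b))  # [5.0, 2.0, 3.0, 4.0]
```

In every case only part of the destination changes, which is why these instructions are useful for assembling a 128-bit value from smaller pieces.
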
Opcode | Mnemonic | Description |
---|---|---|
0F 28 /r | MOVAPS xmm1, xmm2/m128 | Move packed single-precision floating-point values from xmm2/m128 to xmm1, that is, from an XMM register or a 128-bit memory location into an XMM register. |
0F 29 /r | MOVAPS xmm2/m128, xmm1 | Move packed single-precision floating-point values from xmm1 to xmm2/m128, that is, from an XMM register into an XMM register or a 128-bit memory location. |

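The mnemonics above use Intel assembly syntax, in which the source and destination operands appear in exactly the opposite order from AT&T syntax.
The page linked for the MOVAPS instruction gives a more detailed description than the table above. It states that when the source or destination operand is a memory location, that location must be aligned on a 16-byte boundary; otherwise a general-protection exception (#GP) is raised. The earlier article "Assembly Data Handling (4): Data Handling Conclusion" included a program, ssefloat.s, that used the MOVUPS instruction to operate on unaligned memory data. If the MOVUPS instructions in ssefloat.s are changed to MOVAPS, the program dies at run time with a "segmentation fault".
The gas assembler provides the .align directive, which aligns a memory location on the specified number of bytes, as in the following snippet: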
.section .data
.align 16
value1:
.float 12.34, 2345.543, -3493.2, 0.4491
.section .text
.globl _start
_start:
movaps value1, %xmm0
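
The alignment rule comes down to simple address arithmetic: an address is 16-byte aligned when its low four bits are zero, and .align 16 advances the location counter to the next such address. The following Python sketch illustrates this; the helper names are hypothetical, and 0x80490a4 is a made-up address (0x80490c0 is the address gdb reports for result later in this article).

```python
def is_aligned(addr, boundary=16):
    """True if addr is a multiple of boundary (boundary must be a power of two)."""
    return addr & (boundary - 1) == 0

def align_up(addr, boundary=16):
    """Smallest address >= addr that is a multiple of boundary."""
    return (addr + boundary - 1) & ~(boundary - 1)

print(is_aligned(0x80490c0))     # True
print(is_aligned(0x80490a4))     # False
print(hex(align_up(0x80490a4)))  # 0x80490b0
```

An operand at an address for which is_aligned returns False is one that MOVAPS would reject and MOVUPS would accept.
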
Instruction | Description |
---|---|
ADDPS | Add two packed single-precision values. |
SUBPS | Subtract two packed single-precision values. |
MULPS | Multiply two packed single-precision values. |
DIVPS | Divide two packed single-precision values. |
RCPPS | Compute the approximate reciprocal of a packed single-precision value. |
SQRTPS | Compute the square root of a packed single-precision value. |
RSQRTPS | Compute the approximate reciprocal of the square root of a packed single-precision value. |
MAXPS | Compute the element-wise maximum of two packed single-precision values. |
MINPS | Compute the element-wise minimum of two packed single-precision values. |
ANDPS | Compute the bitwise logical AND of two packed values. Pseudo-expression: `Destination[0..127] = Destination[0..127] & Source[0..127];` |
ANDNPS | Compute the bitwise logical AND NOT of two packed values: the destination operand is inverted first and then ANDed with the source operand. Pseudo-expression: `Destination[0..127] = ~Destination[0..127] & Source[0..127];` |
ORPS | Compute the bitwise logical OR of two packed values. Pseudo-expression: `Destination[0..127] = Destination[0..127] \| Source[0..127];` |
XORPS | Compute the bitwise logical exclusive-OR of two packed values. Pseudo-expression: `Destination[0..127] = Destination[0..127] xor Source[0..127];` |

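
The four logical instructions work on the raw bits of the register, not on the numeric values of the floats. As a rough illustration, the Python sketch below applies the pseudo-expressions from the table to a single 32-bit element. The sign-bit masks used in the example are a common idiom and an assumption of this sketch; they do not come from the table.

```python
import struct

def bits(x):
    """Bit pattern of x as an IEEE 754 single-precision value (unsigned 32-bit int)."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def from_bits(u):
    """Inverse of bits(): reinterpret a 32-bit pattern as a float."""
    return struct.unpack("<f", struct.pack("<I", u & 0xFFFFFFFF))[0]

# One 32-bit element of each instruction; the hardware does this across all 128 bits.
def andps(dst, src):
    return dst & src

def andnps(dst, src):
    return (~dst & 0xFFFFFFFF) & src

def orps(dst, src):
    return dst | src

def xorps(dst, src):
    return dst ^ src

SIGN = 0x80000000  # only the sign bit set

x = bits(-93.2)
print(from_bits(andnps(SIGN, x)))  # ~SIGN & x clears the sign bit: absolute value
print(from_bits(xorps(x, SIGN)))   # flipping the sign bit negates the value
print(xorps(x, x))                 # XOR of a value with itself is all zero bits
```
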
# ssemath.s - An example of using SSE arithmetic instructions
.section .data
.align 16
value1:
.float 12.34, 2345., -93.2, 10.44
value2:
.float 39.234, 21.4, 100.94, 10.56
.section .bss
.lcomm result, 16
.section .text
.globl _start
_start:
nop
movaps value1, %xmm0
movaps value2, %xmm1
addps %xmm1, %xmm0
sqrtps %xmm0, %xmm0
maxps %xmm1, %xmm0
movaps %xmm0, result
movl $1, %eax
movl $0, %ebx
int $0x80
$ as -gstabs -o ssemath.o ssemath.s
$ ld -o ssemath ssemath.o
$ gdb -q ssemath
Reading symbols from /root/asm_example/adv/ssemath...done.
(gdb) b _start
Breakpoint 1 at 0x8048074: file ssemath.s, line 13.
(gdb) r
Starting program: /root/asm_example/adv/ssemath
Breakpoint 1, _start () at ssemath.s:13
13 nop
(gdb) s
14 movaps value1, %xmm0
(gdb) s
15 movaps value2, %xmm1
(gdb) s
17 addps %xmm1, %xmm0
(gdb) print $xmm0
$1 = {v4_float = {12.3400002, 2345, -93.1999969, 10.4399996}, ...
(gdb) print $xmm1
$2 = {v4_float = {39.2340012, 21.3999996, 100.940002, 10.5600004}, ...
(gdb)

17 addps %xmm1, %xmm0
(gdb) s
18 sqrtps %xmm0, %xmm0
(gdb) print $xmm0
$1 = {v4_float = {51.5740013, 2366.3999, 7.74000549, 21} ...
(gdb)

18 sqrtps %xmm0, %xmm0
(gdb) s
19 maxps %xmm1, %xmm0
(gdb) print $xmm0
$2 = {v4_float = {7.18150425, 48.6456566, 2.78208661, 4.5825758}, ...
(gdb)

19 maxps %xmm1, %xmm0
(gdb) s
20 movaps %xmm0, result
(gdb) print $xmm0
$3 = {v4_float = {39.2340012, 48.6456566, 100.940002, 10.5600004}, ...
(gdb) s
22 movl $1, %eax
(gdb) x/4f &result
0x80490c0
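
The values gdb prints at each step can be cross-checked with a small Python model of the three arithmetic instructions. This is a sketch under stated assumptions: single precision is emulated by rounding through struct, the function names are invented for this example, and special cases such as NaN are not modelled.

```python
import math
import struct

def f32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def addps(dst, src):
    """Element-wise single-precision addition."""
    return [f32(d + s) for d, s in zip(dst, src)]

def sqrtps(src):
    """Element-wise single-precision square root."""
    return [f32(math.sqrt(s)) for s in src]

def maxps(dst, src):
    """Element-wise maximum: keep dst only where it is strictly greater."""
    return [d if d > s else s for d, s in zip(dst, src)]

value1 = [f32(v) for v in (12.34, 2345.0, -93.2, 10.44)]
value2 = [f32(v) for v in (39.234, 21.4, 100.94, 10.56)]

xmm0 = addps(value1, value2)
print(xmm0)                 # after addps
xmm0 = sqrtps(xmm0)
print(xmm0)                 # after sqrtps
xmm0 = maxps(xmm0, value2)
print(xmm0)                 # after maxps, the value stored to result
```

Python prints more digits than gdb does, but the numbers agree with the v4_float values in the session above to single-precision accuracy.
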
Instruction | Description |
---|---|
CMPPS | Compare packed single-precision values. |
CMPSS | Compare scalar single-precision values. |
COMISS | Compare scalar single-precision values and set the EFLAGS register. |
UCOMISS | Compare scalar single-precision values (including invalid values such as NaN) and set the EFLAGS register. |

CMPPS imp, source, destination
imp value | Comparison performed |
---|---|
0 | Equal |
1 | Less than |
2 | Less than or equal |
3 | Unordered: tests whether either operand is an invalid floating-point value |
4 | Not equal |
5 | Not less than |
6 | Not less than or equal |
7 | Ordered: tests whether both operands are valid floating-point values |

CMPPS $0, %xmm1, %xmm0
switch (imp) {
    case 0: Operator = OperatorEqual; break;
    case 1: Operator = OperatorLessThan; break;
    case 2: Operator = OperatorLessOrEqual; break;
    case 3: Operator = OperatorUnordered; break;
    case 4: Operator = OperatorNotEqual; break;
    case 5: Operator = OperatorNotLessThan; break;
    case 6: Operator = OperatorNotLessOrEqual; break;
    case 7: Operator = OperatorOrdered; break;
}
CMP0 = Destination[0..31] Operator Source[0..31];
CMP1 = Destination[32..63] Operator Source[32..63];
CMP2 = Destination[64..95] Operator Source[64..95];
CMP3 = Destination[96..127] Operator Source[96..127];
if (CMP0 == true) Destination[0..31] = 0xFFFFFFFF; else Destination[0..31] = 0;
if (CMP1 == true) Destination[32..63] = 0xFFFFFFFF; else Destination[32..63] = 0;
if (CMP2 == true) Destination[64..95] = 0xFFFFFFFF; else Destination[64..95] = 0;
if (CMP3 == true) Destination[96..127] = 0xFFFFFFFF; else Destination[96..127] = 0;
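
The pseudocode translates almost directly into a runnable model. The Python sketch below is one way to write it, assuming registers are lists of four floats with index 0 as the lowest element; the names PREDICATES and cmpps are invented for this example. Python's own comparison rules for NaN happen to match the table: every ordinary comparison involving NaN is false, so the "not" predicates come out true.

```python
import math

def _unordered(a, b):
    """True if at least one operand is NaN."""
    return math.isnan(a) or math.isnan(b)

# imp value -> comparison applied as (destination element, source element)
PREDICATES = {
    0: lambda d, s: d == s,               # equal
    1: lambda d, s: d < s,                # less than
    2: lambda d, s: d <= s,               # less than or equal
    3: lambda d, s: _unordered(d, s),     # unordered
    4: lambda d, s: not (d == s),         # not equal
    5: lambda d, s: not (d < s),          # not less than
    6: lambda d, s: not (d <= s),         # not less than or equal
    7: lambda d, s: not _unordered(d, s)  # ordered
}

def cmpps(imp, src, dst):
    """Operand order follows the AT&T form: CMPPS imp, source, destination.
    Returns the new destination as four 32-bit masks."""
    op = PREDICATES[imp]
    return [0xFFFFFFFF if op(d, s) else 0 for d, s in zip(dst, src)]

nan = float("nan")
dst = [1.0, 2.0, 3.0, 1.0]
src = [2.0, 2.0, 2.0, nan]
print([hex(m) for m in cmpps(1, src, dst)])  # ['0xffffffff', '0x0', '0x0', '0x0']
print([hex(m) for m in cmpps(5, src, dst)])  # ['0x0', '0xffffffff', '0xffffffff', '0xffffffff']
```

Note that the result is a mask per element rather than a flag: the all-ones and all-zero patterns are meant to be combined with the bitwise instructions such as ANDPS.
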
Pseudo-instruction | Description |
---|---|
CMPEQPS | Equal |
CMPLTPS | Less than |
CMPLEPS | Less than or equal |
CMPUNORDPS | Unordered: tests whether either operand is an invalid floating-point value |
CMPNEQPS | Not equal |
CMPNLTPS | Not less than |
CMPNLEPS | Not less than or equal |
CMPORDPS | Ordered: tests whether both operands are valid floating-point values |

# ssecomp.s - An example of using SSE comparison instructions
.section .data
.align 16
value1:
.float 12.34, 2345., -93.2, 10.44
value2:
.float 12.34, 21.4, -93.2, 10.45
.section .bss
.lcomm result, 16
.section .text
.globl _start
_start:
nop
movaps value1, %xmm0
movaps value2, %xmm1
cmpeqps %xmm1, %xmm0
movaps %xmm0, result
movl $1, %eax
movl $0, %ebx
int $0x80
$ as -gstabs -o ssecomp.o ssecomp.s
$ ld -o ssecomp ssecomp.o
$ gdb -q ssecomp
Reading symbols from /root/asm_example/adv/ssecomp...done.
(gdb) b _start
Breakpoint 1 at 0x8048074: file ssecomp.s, line 13.
(gdb) r
Starting program: /root/asm_example/adv/ssecomp
Breakpoint 1, _start () at ssecomp.s:13
13 nop
(gdb) s
14 movaps value1, %xmm0
(gdb) s
15 movaps value2, %xmm1
(gdb) s
17 cmpeqps %xmm1, %xmm0
(gdb) s
18 movaps %xmm0, result
(gdb) s
20 movl $1, %eax
(gdb) x/4x &result
0x80490c0
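
For readers without a debugger at hand, the comparison in ssecomp.s can be modelled in a few lines of Python. This is an illustrative sketch, not a capture of gdb output: the values are rounded to single precision through struct, and the function name cmpeqps simply mirrors the pseudo-instruction.

```python
import struct

def f32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def cmpeqps(src, dst):
    """Model of CMPEQPS source, destination: all-ones mask where elements are equal."""
    return [0xFFFFFFFF if d == s else 0 for d, s in zip(dst, src)]

value1 = [f32(v) for v in (12.34, 2345.0, -93.2, 10.44)]
value2 = [f32(v) for v in (12.34, 21.4, -93.2, 10.45)]

result = cmpeqps(value2, value1)  # models: cmpeqps %xmm1, %xmm0
print(" ".join(f"0x{m:08x}" for m in result))
# 0xffffffff 0x00000000 0xffffffff 0x00000000
```

Only the first and third elements of value1 and value2 are equal, so only those two positions of the model's result are set to all ones.
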
Instruction | Description |
---|---|
PAVGB | Compute the average of packed unsigned byte integers |
PAVGW | Compute the average of packed unsigned word integers |
PEXTRW | Copy a word from an MMX or XMM register to a general-purpose register |
PINSRW | Copy a word from a general-purpose register to an MMX or XMM register |
PMAXUB | Compute the maximum of packed unsigned byte integers |
PMAXSW | Compute the maximum of packed signed word integers |
PMINUB | Compute the minimum of packed unsigned byte integers |
PMINSW | Compute the minimum of packed signed word integers |
PMULHUW | Multiply packed unsigned word integers and store the high 16 bits of each result in the destination register |
PSADBW | Compute the sum of the absolute differences of packed unsigned byte integers |

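
Two entries in this table are easier to understand with numbers. The Python sketch below models PAVGB and PMULHUW on plain lists of unsigned integers (index 0 is the lowest element); the function names are chosen for this example only. PAVGB rounds its average up by adding 1 before halving, and PMULHUW keeps only the upper 16 bits of each 32-bit product.

```python
def pavgb(dst, src):
    """Rounded average of unsigned bytes: (a + b + 1) >> 1 for each pair."""
    return [(d + s + 1) >> 1 for d, s in zip(dst, src)]

def pmulhuw(dst, src):
    """High 16 bits of the 32-bit unsigned product of each pair of words."""
    return [((d * s) >> 16) & 0xFFFF for d, s in zip(dst, src)]

print(pavgb([0, 1, 255, 10], [0, 2, 255, 13]))       # [0, 2, 255, 12]
print(pmulhuw([0xFFFF, 0x8000, 2], [0xFFFF, 2, 3]))  # [65534, 1, 0]
```
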
if (OperandSize == 64) {
    // PSADBW with 64-bit operands:
    Temporary0 = GetAbsoluteValue(Destination[0..7] - Source[0..7]);
    Temporary1 = GetAbsoluteValue(Destination[8..15] - Source[8..15]);
    Temporary2 = GetAbsoluteValue(Destination[16..23] - Source[16..23]);
    Temporary3 = GetAbsoluteValue(Destination[24..31] - Source[24..31]);
    Temporary4 = GetAbsoluteValue(Destination[32..39] - Source[32..39]);
    Temporary5 = GetAbsoluteValue(Destination[40..47] - Source[40..47]);
    Temporary6 = GetAbsoluteValue(Destination[48..55] - Source[48..55]);
    Temporary7 = GetAbsoluteValue(Destination[56..63] - Source[56..63]);
    Destination[0..15] = CalculateSum(Temporary0...Temporary7);
    Destination[16..63] = 0;
} else {
    // PSADBW with 128-bit operands:
    Temporary0 = GetAbsoluteValue(Destination[0..7] - Source[0..7]);
    Temporary1 = GetAbsoluteValue(Destination[8..15] - Source[8..15]);
    Temporary2 = GetAbsoluteValue(Destination[16..23] - Source[16..23]);
    Temporary3 = GetAbsoluteValue(Destination[24..31] - Source[24..31]);
    Temporary4 = GetAbsoluteValue(Destination[32..39] - Source[32..39]);
    Temporary5 = GetAbsoluteValue(Destination[40..47] - Source[40..47]);
    Temporary6 = GetAbsoluteValue(Destination[48..55] - Source[48..55]);
    Temporary7 = GetAbsoluteValue(Destination[56..63] - Source[56..63]);
    Temporary8 = GetAbsoluteValue(Destination[64..71] - Source[64..71]);
    Temporary9 = GetAbsoluteValue(Destination[72..79] - Source[72..79]);
    Temporary10 = GetAbsoluteValue(Destination[80..87] - Source[80..87]);
    Temporary11 = GetAbsoluteValue(Destination[88..95] - Source[88..95]);
    Temporary12 = GetAbsoluteValue(Destination[96..103] - Source[96..103]);
    Temporary13 = GetAbsoluteValue(Destination[104..111] - Source[104..111]);
    Temporary14 = GetAbsoluteValue(Destination[112..119] - Source[112..119]);
    Temporary15 = GetAbsoluteValue(Destination[120..127] - Source[120..127]);
    Destination[0..15] = CalculateSum(Temporary0...Temporary7);
    Destination[16..63] = 0;
    Destination[64..79] = CalculateSum(Temporary8...Temporary15);
    Destination[80..127] = 0;
}
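
The same computation fits in a few lines of Python, which may make the structure easier to see: the bytes are processed in groups of eight, and each group yields one sum. In this sketch (the function name and list representation are assumptions of the example) operands are lists of 8 or 16 unsigned bytes, and the return value holds one sum per 64-bit half; in the real register each sum sits in the low 16 bits of its half and the remaining bits are zero.

```python
def psadbw(dst, src):
    """Sum of absolute differences of unsigned bytes, one sum per group of 8 bytes.
    dst and src are lists of 8 (MMX) or 16 (XMM) values in the range 0..255."""
    if len(dst) != len(src) or len(dst) not in (8, 16):
        raise ValueError("operands must both hold 8 or 16 bytes")
    return [
        sum(abs(d - s) for d, s in zip(dst[i:i + 8], src[i:i + 8]))
        for i in range(0, len(dst), 8)
    ]

# 64-bit operands: eight differences of 5 each.
print(psadbw([10, 20, 30, 40, 50, 60, 70, 80],
             [15, 15, 35, 35, 55, 55, 75, 75]))  # [40]

# 128-bit operands: two independent sums.
print(psadbw(list(range(16)), [0] * 16))         # [28, 92]
```

The largest possible sum is 8 * 255 = 2040, which is why 16 bits per result are always enough.
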