SSE2, yes, is available in 32-bit mode.
it is not used by default by C compilers targeting 32-bit x86, though.
x86-64 made it mandatory, and the 64-bit ABIs use it by default for
floating point.
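a quick way to see what a given compiler setup is doing is sketched below. it reports whether the build is allowed to emit SSE2 and how float intermediates are evaluated. it assumes the C99 `FLT_EVAL_METHOD` macro and compiler-predefined macros (`__SSE2__` on GCC/Clang, `_M_X64` / `_M_IX86_FP` on MSVC); the function names are made up for illustration. with GCC, a 32-bit build can usually be switched over with `-msse2 -mfpmath=sse`.

```c
#include <float.h>

/* Older or non-C99 headers may not define this; treat it as "unknown". */
#ifndef FLT_EVAL_METHOD
#define FLT_EVAL_METHOD (-1)
#endif

/* Returns 1 if the compiler was allowed to emit SSE2 instructions for this
   build, 0 otherwise. This is a compile-time property of the build, not a
   check of the CPU the program happens to run on. */
int build_targets_sse2(void)
{
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
    return 1;
#else
    return 0;
#endif
}

/* Describes how floating-point intermediates are evaluated in this build. */
const char *fp_eval_description(void)
{
    switch (FLT_EVAL_METHOD) {
    case 0:
        return "float/double evaluated in their own precision (typical of SSE2 math)";
    case 1:
        return "float evaluated as double";
    case 2:
        return "everything evaluated as long double (typical of x87 math)";
    default:
        return "indeterminate";
    }
}
```

on a typical x86-64 build one would expect `build_targets_sse2()` to return 1 and the first description; a default 32-bit build may well report the x87-style evaluation instead.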
> I don't recall x86's mode-switching semantics off the top of my head,
> but I do believe that it is possible to run 64-bit instructions in
> 32-bit mode. The problem is the C ABI, particularly register saves and
> restores, don't handle 64-bit stuff properly when compiled with 32-bit
> targets for compilers.
not exactly...
64-bit operations can only be used (at all) in "long mode", which can
also run 32-bit operations. otherwise, one is in "legacy mode", which
only has 32-bit (and 16-bit) instructions.
it is, however, possible to create a 32-bit OS in long mode (mostly the
same as before), which could in turn run 64-bit code in processes.
however, running in long mode, one can no longer make use of VM86
(virtual 8086) mode, segmented addressing, or several other rarely-used
features (TSS-based processes anyone?...), and several instructions are
officially dropped (IIRC, they were dropped from the Opteron, but Intel
partly re-added them in their implementation, and AMD followed Intel's
lead AFAIK).
I am not currently aware of any OSes which have done the above.
AFAIK, the OS would be mostly the same as in 32-bit legacy mode, apart
from needing to use new page-tables and a few other things.
> SSE2 dates back to something like the Pentium II, so it's not a 64-bit
> mode thing. Although I think SSE2 itself may only be limited to single
> precision floats.
no. SSE was added in the Pentium III, and SSE2 (and later extensions),
IIRC, in the Pentium 4 and AMD Athlon 64 lines.
Pentium II only had MMX, which was considerably worse (64-bit vectors of
bytes and shorts, aliased onto the FPU registers), and which was rarely
used as most people would rather have a working FPU than lame byte vectors.
SSE added new registers (XMM) and single-precision float operations,
both scalar and vector.
SSE2 added scalar and vector doubles, and most MMX instructions were
retrofitted onto the XMM registers (allowing byte and word-vector
operations and similar).
SSE3/... mostly added more modest extensions.
XOP (AMD) and AVX (Intel) add a much more drastic set of new features
(including 3-5 register instruction forms and, with AVX, 256-bit YMM
registers, which can hold 4 doubles or 8 floats).
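to make the SSE2 additions above a bit more concrete, here is a minimal sketch using the SSE2 intrinsics from `<emmintrin.h>`: one packed-double add, and one add of eight 16-bit words (the MMX-style operation moved onto an XMM register). the function names are hypothetical, and the scalar fallback is only there so the code still builds when SSE2 is not enabled.

```c
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics */
#endif

/* out[i] = a[i] + b[i] for two doubles; a single packed-double add (ADDPD)
   when SSE2 is enabled at compile time. */
void add2_f64(const double a[2], const double b[2], double out[2])
{
#if defined(__SSE2__)
    __m128d va = _mm_loadu_pd(a);            /* unaligned load of 2 doubles */
    __m128d vb = _mm_loadu_pd(b);
    _mm_storeu_pd(out, _mm_add_pd(va, vb));
#else
    out[0] = a[0] + b[0];
    out[1] = a[1] + b[1];
#endif
}

/* out[i] = a[i] + b[i] for eight 16-bit words; a single PADDW on an XMM
   register when SSE2 is enabled at compile time. */
void add8_i16(const int16_t a[8], const int16_t b[8], int16_t out[8])
{
#if defined(__SSE2__)
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_add_epi16(va, vb));
#else
    for (int i = 0; i < 8; i++)
        out[i] = (int16_t)(a[i] + b[i]);
#endif
}
```

the unaligned load/store forms are used so callers need not worry about 16-byte alignment; aligned variants exist as well.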
> Poking around the hotspot source code for Java 7 does indicate that sse2
> support is in the JIT, although I don't know the entire set of
> circumstances that triggers it.
probably CPU support.
there are few good reasons not to use it anymore.
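for what a runtime CPU-support check can look like from C, here is a small sketch. it is not taken from HotSpot (whose actual logic I have not checked); it just uses the GCC/Clang builtins `__builtin_cpu_init` and `__builtin_cpu_supports`, and the function name is made up. on other compilers or architectures it simply reports "unknown".

```c
/* Returns 1 if the running CPU reports SSE2, 0 if it does not, and -1 if
   this build has no way to ask (non-x86 target or unsupported compiler). */
int cpu_has_sse2(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__i386__) || defined(__x86_64__))
    __builtin_cpu_init(); /* harmless if the CPU info is already set up */
    return __builtin_cpu_supports("sse2") ? 1 : 0;
#else
    return -1;
#endif
}
```

a code generator could then pick its SSE2 path when this returns 1 and fall back to x87 otherwise.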
now as for performance (x87 vs SSE), there are tradeoffs either way:
naive scalar SSE can be slower than well-generated x87, but is generally
faster than naive x87.
vector SSE is, at this point, generally somewhat faster than trying to
use x87 in most cases. a few operations, such as vector dot-product, are
faster on x87 unless SSE3 or SSE4 instructions are used. vector SSE
was slightly slower than x87 on the Pentium III.
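the dot-product case is awkward because the multiply is vertical but the sum is horizontal. a rough sketch of a 4-float dot product using only original SSE instructions is below (the function name is hypothetical, with a scalar fallback for non-SSE builds). the two shuffles plus adds at the end are the part that SSE3's HADDPS or SSE4.1's DPPS would shorten.

```c
#if defined(__SSE__)
#include <xmmintrin.h> /* original SSE intrinsics */
#endif

/* Returns a[0]*b[0] + a[1]*b[1] + a[2]*b[2] + a[3]*b[3]. */
float dot4(const float a[4], const float b[4])
{
#if defined(__SSE__)
    /* p = { p0, p1, p2, p3 }, the four element-wise products */
    __m128 p = _mm_mul_ps(_mm_loadu_ps(a), _mm_loadu_ps(b));
    /* hi = { p2, p3, p2, p3 } */
    __m128 hi = _mm_movehl_ps(p, p);
    /* s = { p0+p2, p1+p3, ... } */
    __m128 s = _mm_add_ps(p, hi);
    /* s1 = lane 1 of s broadcast to every lane */
    __m128 s1 = _mm_shuffle_ps(s, s, _MM_SHUFFLE(1, 1, 1, 1));
    /* lane 0 becomes (p0+p2) + (p1+p3) */
    return _mm_cvtss_f32(_mm_add_ss(s, s1));
#else
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
#endif
}
```

note that the SSE path sums the products in a different order than the scalar expression, so results can differ in the last bits for inputs that are not exactly representable.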
x87 still has a few features which SSE doesn't, such as the
trigonometric instructions (FSIN, FCOS, ...), so x87 can still be used
for these (computing them manually in software is slower).