My experience is similar and I agree. I see a lot of incorrect
assumptions made by people who "know assembly language" from the days of
the 68K or 386 about the performance of code on modern descendants of
those processors. The core assembly may be the same, but the machines
underneath bear next to no resemblance to their forebears.
The biggest incorrect assumption I see about assembly is the idea that
any C code could be re-written in assembly to make it faster.
Occasionally it makes sense to do this, but only occasionally.
I think knowledge of assembly is more important for smaller processors -
if you are using bigger and more complex processors, you can usually
ignore the low-level details. But on smaller devices, understanding the
assembly - and in particular, understanding the processor architecture -
can make a significant difference to the size and performance of your
code. Obvious examples are that if you know your processor has only
single-precision hardware floating point, don't use doubles if you can avoid
them. If your chip is 16-bit, don't use 32-bit types if you don't need
them.
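A minimal C sketch of both points (the function names are mine, picked for illustration): the `f` suffixes keep the arithmetic in single precision, and the fixed-width types from `<stdint.h>` keep a 16-bit core working at its native width.

```c
#include <stdint.h>

/* With the 'f' suffixes everything stays single-precision. Writing
   1.8 and 32.0 instead would promote the arithmetic to double and,
   on a chip with only single-precision FP hardware, drag in slow
   software double-precision routines. */
float c_to_f(float c)
{
    return c * 1.8f + 32.0f;
}

/* On a 16-bit core a uint16_t add is one native instruction; a
   32-bit accumulator would need multi-word instruction sequences. */
uint16_t sum16(const uint16_t *buf, uint16_t n)
{
    uint16_t s = 0;
    for (uint16_t i = 0; i < n; i++)
        s += buf[i];
    return s;
}
```

On a desktop compiler both versions behave identically; the difference only shows up in code size and cycle counts on the small target.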
Less obvious examples would be knowing the number of pointer registers
available, and taking that into account in your code, or knowing the reach
of the "static pointer + index" addressing modes and letting that guide
whether a temporary array goes on the stack or in static storage.
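To make the stack-versus-static choice concrete, here is a hedged sketch (names are mine, and the codegen effect is target-dependent): the two functions are identical except for where the temporary lives, which changes the addressing modes the compiler can use.

```c
#include <stdint.h>

#define N 8

/* Temporary on the stack: addressed as stack/frame pointer + offset. */
uint16_t halve_sum_stack(const uint8_t *in)
{
    uint8_t tmp[N];
    uint16_t s = 0;
    for (int i = 0; i < N; i++)
        tmp[i] = in[i] >> 1;
    for (int i = 0; i < N; i++)
        s += tmp[i];
    return s;
}

/* Same temporary in static storage: addressed as fixed address + index,
   which on some small cores maps onto a shorter "static pointer + index"
   addressing mode -- at the cost of reentrancy. */
uint16_t halve_sum_static(const uint8_t *in)
{
    static uint8_t tmp[N];
    uint16_t s = 0;
    for (int i = 0; i < N; i++)
        tmp[i] = in[i] >> 1;
    for (int i = 0; i < N; i++)
        s += tmp[i];
    return s;
}
```

Which one is smaller or faster depends entirely on the architecture; the point is that knowing the addressing modes lets you make the choice deliberately rather than by accident.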
And of course when things don't work, or don't work fast enough, it can
be useful to examine the generated assembly code.
I also think that a background in assembly programming gives a developer
better insight into what is happening under the hood, and the resulting
code is often more efficient. But there is a danger that people get too
carried away, and write code full of "micro-optimisations" which are
detrimental to the clarity, correctness or maintainability of the code,
which are unnecessary with modern tools, and may even be pessimisations
on newer processors. There is a balance to be struck.