My Best Teaching Is One-on-One

One-on-one is my best.

Of course, I team teach and do special lessons, etc.

Naturally, I also team-teach with other teachers and give special lessons.

But my best work in the classroom is after the lesson is over --
going one-on-one,
helping individual students with their assignments.

But the work I consider most meaningful comes after the lecture is over --
going one-on-one,
helping individual students study their assignments.

It's kind of like with computer programs, walking the client through hands-on.
The job isn't really done until the customer is using the program.

Well, in computer-program terms, I think it's like having the client try out the finished product hands-on.
In the same way, a product that isn't being used isn't really a product yet.

Wednesday, June 28, 2017

Your Memory Map is No Longer Trivia

You think you don't really care about how applications and such are laid out in your computer's memory, but you should.

I have worked up a little programming exercise, in my programming-is-fun blog (which I really wish I had more time for), to help you examine your computer's memory map. It explains a little about what memory layout means. If you aren't familiar with the problem, or just want to look at a trivial bit of C language code that can show something about your computer's layout, take a look at that.

[JMR201707011001: 

gallier2 points out something I tend to forget about: pmap will give you much more information than the little (emphasis on little) program I wrote, posted, and linked above -- stkreveal.c.
man pmap
It's a useful tool, and I think it's in Cygwin as well. If you don't understand what it's telling you, reading about and playing with that little stkreveal.c program might help.

(I need to spend more time working with the low level OS tools.)

]

You may recall some horror stories about stack smashing and stack crashing in the distant past -- maybe even within the last ten years. You may remember that these describe techniques that let someone access the stuff on your computer without your permission. You may remember feeling relieved when the various vendors said the problems were solved (for some domain of the problems).

Recently, one of the companies currently working in computer security decided to revisit the problem. This time, the easy crashes and smashes were quite well protected against, but they found some new ways to get around the protections.

I got the news on the openbsd misc user list today. And I found the report here.

(Together with the Kaby Lake and Skylake problems, I got motivated to write this rant and the programming rant.)

At first, I was wondering why they were misspelling "crash". But they were just having a little fun, and pointing out that the existing protections are not sufficient. (If you wonder why they can joke about something like this, you have to understand that waiting all day for a program to break something can get a little boring.)

If waiting all day makes it sound like the problem isn't too bad, don't worry: some of what they found works in less than a minute.

Okay. Worry a little.

Most of the vendors have been implementing mitigation techniques, and they aren't hard. For one, guard pages don't consume memory, whether 4K or 1M; they only consume mapping table entries (which Intel has been delinquent in giving us enough of).

Those techniques aren't perfect, either, but they help. Your average $Kr!p+ k!DDi35 may not have enough patience to use them, so you probably only have to worry about government security organizations and organized crime. (Organized crime doesn't get the tech until a little after the government, usually, anyway.)

Part of my purpose in this rant is to tell anyone who might be wondering, why I don't have a lot of positive thoughts for either Intel or Microsoft.

This problem has been known for a long time. Fixing it is not hard. I'll explain that in another rant, maybe today, maybe later. But it means the processors you make can't be quite as fast. And it means the OS and applications you make can't have quite as many features.

And that means there can be something besides price and apparent ubiquity to differentiate the competition's product from yours. It gives the competition more room to compete with you on their terms instead of yours.

(It would mean that Intel wouldn't be able to just buy up all the best semiconductor engineers to keep them off the competition's payrolls. And it would mean that Microsoft's sales department couldn't run their engineering.)

(And it would mean you couldn't just smooth talk your customers and invite them out for a game of golf and a visit to the nearest mosh pit to seal your deal. You'd have to compete on meaningful functionality.)

If you've already read my Memory Layout rant, here's what the "Stack Clash" business is, in the overview. (If you haven't and are lost, go read that.) First, an early 32-bit addressing CPU might have memory laid out something like this:



0x000FFFFF
  stack (dynamic variables, stack frames, return pointers)
0x000FFxxx ← SP
gap
  heap (malloc()ed variables, etc.)
  statically allocated variables
0x00080000
  application code
0x00040000
  operating system code, variables, etc.
0x00000000


[JMR201707011035:

To make this really clear, I am intending, by heap, to include everything allocated by mmap() and brk() and such, as well.

]

That's way over-simplified, but note that the same problem remains. And faster processors can eat up memory faster, so the extra memory doesn't really help protect things.

A slightly more modern, 32-bit map might look something like this:


0x0FFFFFFF
  stack (dynamic variables, stack frames, return pointers)
0x0FFxxxxx ← SP
gap
  guard page (Access to this page triggers OS responses.)
  gap
  heap (malloc()ed variables, etc.)
  statically allocated variables
0x08000000
  application code
0x04000000
  operating system code, variables, etc.
0x00000000



This is also still way over-simplified, but the gaps are mostly mapped without physical memory, and so is much of the application and OS space. Accessing those gap spaces allows the OS to add more memory in some cases and to terminate renegade processes in others. If the guard page is accessed, the OS can be pretty sure the application is out of control.

This is much improved, and it is the way many 32-bit OSes were mapped ten years ago. But it can be a little tight, motivating us to use a small guard page to avoid wasting address space.
The small guard page is an important part of the problems the Stack Clash uncovered. If a program has large enough local variables -- particularly, variables larger than the guard page -- it can sometimes be induced to allocate one of those large variables without hitting the guard page.

And there are similar problems that opening up the memory map makes a little easier to deal with. So, we'd prefer something like this:



0xFFFFFFFF
  stack (dynamic variables, stack frames, return pointers)
0xFxxxxxxx ← SP
gap
  guard page (Access to this page triggers OS responses.)
  gap
  heap (malloc()ed variables, etc.)
  statically allocated variables
0x80000000
  application code
0x40000000
  operating system code, variables, etc.
0x00000000



You can see how this gives lots more room. In particular, with this kind of map, we can usually use 1M guard pages, which are much harder to force a program to miss.

Taking this to 64-bit CPUs, you might think the addressing ranges pretty nearly completely mitigate the problems, but let's see what Intel, the motherboard vendors, and the OS vendors have given us. It looks something like this:


0x00007FFFFFFFFFFF
  stack (dynamic variables, stack frames, return pointers)
0x00007Fxxxxxxxxxx ← SP
gap
  guard page (Access to this page triggers OS responses.)
  gap
  heap (malloc()ed variables, etc.)
  statically allocated variables
0x0000400000000000
  application code
0x0000200000000000
  operating system code, variables, etc.
0x0000000000000000



That's roomy, but what we want, of course, is more like this:


0x7FFFFFFFFFFFFFFF
  stack (dynamic variables, stack frames, return pointers)
0x7FFFxxxxxxxxxxxx ← SP
gap
  guard page (Access to this page triggers OS responses.)
  gap
  heap (malloc()ed variables, etc.)
  statically allocated variables
0x4000000000000000
  application code
0x2000000000000000
  operating system code, variables, etc.
0x0000000000000000



You want to get each major block in memory as far away from every other as you can. But Intel says that practical considerations give them an excuse to scrimp on address decoding and claim higher processor speeds.

(Higher processor speeds than their competitors so they can maintain their stranglehold on certain sectors of the CPU market, and use that stranglehold to keep pushing relentlessly at the rest of the semiconductor market.)

I'm not explaining how Microsoft fits into this, but a little thought should produce the obvious.

[JMR201707011014:

We in the industry have been far too long designing to the black hat skills of yesterday. 1M guard pages are better than 4K guard pages, but they really aren't enough, either. (I will refrain from explaining why here, since I am not inclined to educate the black-hats. People who figure these things out on their own tend to behave more responsibly with the knowledge.)

] 

I have now posted an outline of a different sort of solution, and a discussion of how to go one step further in hardware, to really protect the return addresses.



(OT, but I'm getting a little tired of the way Google's javascript gadgetry keeps mishandling characters used in XML tags when I try to edit things like the above as HTML. If it gets scrambled, that's probably why. And I do need to start using my off-line tools and quit using their on-line tools, to avoid the problems altogether. Or maybe Google didn't want me talking about government security organizations, since that's the paragraph that seemed to break the round-trip editing.)