I have worked up a little programming exercise to help you examine your computer's memory map, over on my programming-is-fun blog (which I really wish I had more time for). It explains a little about what memory layout means. If you aren't familiar with the problem, or just want to look at a trivial little bit of C code that can show you something about your computer's layout, take a look at that.
[JMR201707011001:
gallier2 points out something I tend to forget about: pmap will give you much more information than the little (emphasis on little) program I wrote, posted, and linked above -- stkreveal.c.
man pmap. It's a useful tool. And I think it's in Cygwin, as well. If you don't understand what it's telling you, reading about and playing with that little stkreveal.c program might help. (There's a tiny sketch of roughly what pmap reports just after this note.)
(I need to spend more time working with the low level OS tools.)
]
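For the curious, here is a minimal, Linux-only sketch of roughly what pmap is reporting: it just copies the kernel's own view of the process's memory map out of /proc/self/maps. (The file name and its format are Linux conventions; on other systems you would reach for pmap, procstat, or vmmap instead.)

/* dumpmaps.c: a minimal, Linux-only sketch of roughly what pmap reports.
 * It simply copies /proc/self/maps (the kernel's view of this process's
 * memory map) to stdout.  Build with something like: cc -o dumpmaps dumpmaps.c
 */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    FILE *maps = fopen("/proc/self/maps", "r");
    if (maps == NULL) {
        perror("fopen /proc/self/maps");
        return EXIT_FAILURE;
    }

    char line[512];
    while (fgets(line, sizeof line, maps) != NULL)
        fputs(line, stdout);   /* one region per line: range, permissions, backing object */

    fclose(maps);
    return EXIT_SUCCESS;
}

Each line gives an address range, its permissions, and what backs it (a file, or pseudo-entries like [heap] and [stack]), which is essentially the same information pmap formats more readably.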
You may recall some horror stories about stack smashing and stack crashing in the distant past -- maybe even within the last ten years. You may remember that they describe techniques for getting at the stuff on your computer without your permission. You may remember feeling relieved when the various vendors said the problems were solved (for some domain of the problems).
Recently, one of the companies that is currently investigating computer security decided to revisit the problem. This time, the easy crashes and smashes are quite well protected, but they found some new ways to get around the protections.
I got the news on the OpenBSD misc mailing list today. And I found the report here.
(Together with the Kaby Lake and Skylake problems, I got motivated to write this rant and the programming rant.)
At first, I was wondering why they were misspelling "crash". But they were just having a little fun, and pointing out that the existing protections are not sufficient. (If you wonder why they can joke about something like this, you have to understand that waiting all day for a program to break something can get a little boring.)
If waiting all day sounds like the problem isn't too bad, don't worry: some of what they found works in less than a minute.
Okay. Worry a little.
Most of the vendors have been implementing mitigation techniques, and they aren't hard. For one thing, guard pages don't consume physical memory, whether they are 4K or 1M. They only consume mapping table entries (which Intel has been delinquent in giving us enough of).
Those techniques aren't perfect, either, but they help. Your average $Kr!p+ k!DDi35 may not have enough patience to pull the new attacks off, so you probably only have to worry about government security organizations and organized crime. (Organized crime doesn't get the tech until a little after the government, usually, anyway.)
Part of my purpose in this rant is to tell anyone who might be wondering, why I don't have a lot of positive thoughts for either Intel or Microsoft.
This problem has been known for a long time. Fixing it is not hard. I'll explain that in another rant, maybe today, maybe later. But it means the processors you make can't be quite as fast. And it means the OS and applications you make can't have quite as many features.
And that means there can be something besides price and apparent ubiquity to differentiate the competition's product from yours. It gives the competition more room to compete with you on their terms instead of yours.
(It would mean that Intel wouldn't be able to just buy up all the best semiconductor engineers, to keep them off of the competitions' payroll. And it would mean that Microsoft's sales department couldn't run their engineering.)
(And it would mean you couldn't just smooth talk your customers and invite them out for a game of golf and a visit to the nearest mosh pit to seal your deal. You'd have to compete on meaningful functionality.)
If you've already read my Memory Layout rant, here's what the "Stack Clash" business is, in the overview. (If you haven't and are lost, go read that.) First, an early 32-bit addressing CPU might have memory laid out something like this:
0x000FFFFF
stack (dynamic variables, stack frames, return pointers)
0x000FFxxx ← SP
gap
heap (malloc()ed variables, etc.)
statically allocated variables
0x000FF000
application code
0x00080000
operating system code, variables, etc.
0x00000000
[JMR201707011035:
To make this really clear, I am intending, by heap, to include everything allocated by mmap() and brk() and such, as well.
]
That's way over-simplified, but note that the essential problem is already there: the stack grows down toward the heap, with nothing but that gap between them. And faster processors can eat up memory faster, so the extra memory doesn't really help protect things.
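If you want to see roughly where these regions land on your own machine, here is a minimal sketch in the spirit of stkreveal.c. (The exact addresses, and even the ordering, will vary with the OS, the compiler, and address-space randomization.)

/* layout_peek.c: print rough addresses of code, statics, heap, and stack.
 * A minimal sketch in the spirit of stkreveal.c; actual addresses, and even
 * their ordering, depend on the OS, the compiler, and address randomization.
 */
#include <stdio.h>
#include <stdlib.h>

int a_static_variable = 42;              /* statically allocated */

int main(void)
{
    int a_stack_variable = 0;            /* lives on the stack */
    void *a_heap_block = malloc(16);     /* lives on the heap */

    /* Casting a function pointer to void * is not strictly portable C,
     * but it works on the usual desktop and server platforms. */
    printf("code   (main)             : %p\n", (void *) main);
    printf("static (a_static_variable): %p\n", (void *) &a_static_variable);
    printf("heap   (malloc result)    : %p\n", a_heap_block);
    printf("stack  (a_stack_variable) : %p\n", (void *) &a_stack_variable);

    free(a_heap_block);
    return 0;
}

On a layout like the one sketched above, the code address prints lowest and the stack address highest, with the statics and the heap in between.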
A slightly more modern, 32-bit map might look something like this:
0x0FFFFFFF
stack (dynamic variables, stack frames, return pointers)
0x0FFxxxxx ← SP
gap
guard page (Access to this page triggers OS responses.)
gap
heap (malloc()ed variables, etc.)
statically allocated variables
0x08000000
application code
0x04000000
operating system code, variables, etc.
0x00000000
This is also still way over-simplified, but the gaps are mostly mapped without physical memory, and so is much of the application and OS space. Accessing those gap spaces allows the OS to add more memory in some cases and terminate renegade processes in others. If the guard page is accessed, the OS can be pretty sure the application is out of control.
This is much improved, and it is the way many 32-bit OSes were mapped ten years ago. But address space can be a little tight, which motivates using a small guard page to avoid wasting it.
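The guard page itself is nothing exotic. It's just a page mapped with no access permissions, so that any load or store into it faults and the OS gets control. Here is a minimal POSIX-flavored sketch of setting one up by hand with mmap() and PROT_NONE; the kernel arranges the real stack guard differently, but the principle is the same.

/* guard_sketch.c: a do-it-yourself guard page, as a minimal POSIX sketch.
 * One PROT_NONE page; any access to it raises SIGSEGV, which is the kind
 * of "OS response" described above.
 */
#define _DEFAULT_SOURCE            /* for MAP_ANONYMOUS under strict modes */
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    long page = sysconf(_SC_PAGESIZE);

    /* One page with no read, write, or execute permission. */
    char *guard = mmap(NULL, (size_t) page, PROT_NONE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (guard == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    printf("guard page at %p; touching it would kill this process\n",
           (void *) guard);

    /* Uncomment to see the OS response (a SIGSEGV):
     * guard[0] = 1;
     */

    munmap(guard, (size_t) page);
    return 0;
}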
The small guard page is an important part of the problems the Stack Clash uncovered. If a program has large enough local variables, in particular variables larger than the guard page, it can sometimes be induced to allocate one of them without ever touching the guard page.
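Here is a sketch of the shape of the code pattern involved, assuming a single 4K guard page. It is an illustration, not exploit code; the point is only that the stack pointer can step over a small guard page in one jump.

/* clash_shape.c: the shape of the problem, not an exploit.
 * With only a 4 KiB guard page (an assumption here), a local variable much
 * larger than that can move the stack pointer past the guard page in a
 * single jump, so the first real write may land on mapped heap pages
 * instead of faulting.
 */
#include <string.h>

#define ASSUMED_GUARD_SIZE 4096    /* assumption: one 4 KiB guard page */

void deep_frame(const char *untrusted, size_t len)
{
    /* Much larger than the guard page: allocating it can step right
     * over the guard without ever touching it. */
    char big_local[32 * ASSUMED_GUARD_SIZE];

    if (len > sizeof big_local)
        len = sizeof big_local;

    /* If the stack pointer has already jumped past the guard page,
     * this write can land on heap-backed pages (a stack/heap clash)
     * rather than triggering the OS response. */
    memcpy(big_local, untrusted, len);
}

int main(void)
{
    const char msg[] = "harmless demo input";
    deep_frame(msg, sizeof msg);   /* a benign call; the danger is in how it can be driven */
    return 0;
}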
And there are similar problems that opening up the memory map makes a little easier to deal with. So, we'd prefer something like this:
0xFFFFFFFF
stack (dynamic variables, stack frames, return pointers)
0xFxxxxxxx ← SP
gap
guard page (Access to this page triggers OS responses.)
gap
heap (malloc()ed variables, etc.)
statically allocated variables
0x80000000
application code
0x40000000
operating system code, variables, etc.
0x00000000
You can see how this gives lots more room. In particular, with this kind of map, we can usually use 1M guard pages, which are much harder to force a program to miss.
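Bigger guard pages are not the only mitigation in the toolbox, either. Compilers have also picked up stack probing (GCC and Clang call the switch -fstack-clash-protection): the generated prologue touches a large frame one page at a time, so even a frame bigger than the guard page cannot jump over it. Here is a rough hand-written sketch of the idea, assuming 4K pages; real compilers emit this themselves, and do it more carefully.

/* probe_sketch.c: a hand-written sketch of the stack-probing idea.
 * Real compilers emit the probes in the function prologue; this only
 * illustrates the principle, assuming 4 KiB pages.
 */
#include <string.h>

#define ASSUMED_PAGE_SIZE 4096

void uses_big_frame(const char *src, size_t len)
{
    char big_local[64 * 1024];     /* much bigger than a small guard page */

    /* Touch the frame one page at a time, from the highest address down
     * (the direction the stack grows).  If the guard page lies anywhere
     * under this frame, one of these stores hits it and the OS steps in.
     * The volatile pointer keeps the compiler from discarding the probes. */
    volatile char *probe = big_local;
    for (size_t off = sizeof big_local; off > 0; off -= ASSUMED_PAGE_SIZE)
        probe[off - 1] = 0;

    if (len > sizeof big_local)
        len = sizeof big_local;
    memcpy(big_local, src, len);
}

int main(void)
{
    const char msg[] = "demo";
    uses_big_frame(msg, sizeof msg);
    return 0;
}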
Taking this to 64-bit CPUs, you might think the addressing ranges pretty nearly completely mitigate the problems, but let's see what Intel, the motherboard vendors, and the OS vendors have given us. It looks something like this:
0x00007FFFFFFFFFFF
stack (dynamic variables, stack frames, return pointers)
0x00007Fxxxxxxxxxx ← SP
gap
guard page (Access to this page triggers OS responses.)
gap
heap (malloc()ed variables, etc.)
statically allocated variables
0x0000400000000000
application code
0x0000200000000000
operating system code, variables, etc.
0x0000000000000000
That's roomy, but what we want, of course, is more like this:
0x7FFFFFFFFFFFFFFF
stack (dynamic variables, stack frames, return pointers)
0x7FFFxxxxxxxxxxxx ← SP
gap
guard page (Access to this page triggers OS responses.)
gap
heap (malloc()ed variables, etc.)
statically allocated variables
0x4000000000000000
application code
0x2000000000000000
operating system code, variables, etc.
0x0000000000000000
You want to get each major block in memory as far away from every other block as you can. But Intel says that practical considerations give them an excuse to scrimp on address decoding and claim higher processor speeds.
(Higher processor speeds than their competitors so they can maintain their stranglehold on certain sectors of the CPU market, and use that stranglehold to keep pushing relentlessly at the rest of the semiconductor market.)
I'm not explaining how Microsoft fits into this, but a little thought should produce the obvious.
[JMR201707011014:
We in the industry have been far too long designing to the black hat skills of yesterday. 1M guard pages are better than 4K guard pages, but they really aren't enough, either. (I will refrain from explaining why here, since I am not inclined to educate the black-hats. People who figure these things out on their own tend to behave more responsibly with the knowledge.)
]
(OT, but I'm getting a little tired of the way Google's javascript gadgetry keeps mishandling characters used in XML tags when I try to edit things like the above as HTML. If it gets scrambled, that's probably why. And I do need to start using my off-line tools and quit using their on-line tools, to just avoid the problems altogether. Or maybe Google didn't want me talking about government security organizations, since that's the paragraph that seemed to break the round-trip editing.)
Not really clear what your problem is.
As for the gap between the stack and the heap, it is not empty. All the memory-mapped shared objects are mapped there; this means that even if you were allocating on the stack variables so big that they encompass the guard pages, you would still hit write-protected pages.
You can check with the pmap command (Unix and Linux) what the memory space of the process looks like.
Between the heap and the stack ...
... depends on what you are calling the heap.
And that is part of the reason we have trouble with things like this, because what one person calls heap is another person's something else.
In the above, I'm considering everything not on the stack as on the heap, to simplify the explanation. (I don't know about you, but I run into trouble when I try to explain everything at once.)
I mentioned shared objects in the later post, BTW:
http://defining-computers.blogspot.com/2017/06/keeping-return-address-stack-separate.html
That post will explain more about why I am ranting on this subject.
And thanks for mentioning pmap. It would be useful to include in these rants.