How I love it when people without deep knowledge of a subject write authoritative-sounding articles.
Without guarantee of completeness, to avoid the spread of misinformation:
- IPv6 fragmentation has nothing to do with some "minimum payload size" (whatever that is). There simply is no fragmentation being done by routers; the sender can still fragment however it pleases, and presumably will do so whenever it has to send a packet that doesn't fit through the path MTU.
- The endpoints use Packet Too Big ICMPv6 messages to determine the _path_ MTU, which is different from just "the MTU".
- With IPv4, the sender chooses whether a router will fragment when the packet exceeds the next-hop MTU or whether the router should drop the packet and send a Fragmentation Needed ICMP message - where the latter again is used for path MTU discovery.
- Path MTU discovery is useful because it allows the sending IP implementation to push the chunking higher up the stack when the sending higher-level protocol has the capability (as is the case with TCP, but not with UDP, for example), which tends to produce lower overhead. Unfortunately, some clueless firewall administrators, such as those responsible for AWS EC2, filter all ICMP because for some unknown reason they consider it bad, thus breaking PMTUD, which can lead to hanging TCP connections.
- TCP sequence numbers are for bytes, always, with the special case of SYN and FIN also counting as "bytes" in the sequence, but never for segments.
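The last point can be sketched in a few lines. This is a toy illustration of how sequence numbers advance, not a real TCP implementation; `next_seq` is a made-up helper name:

```python
# Toy sketch: TCP sequence numbers advance per byte of payload, and
# SYN/FIN each consume one sequence number as if they were a byte.
# Segment boundaries play no role at all.

def next_seq(seq, payload_len, syn=False, fin=False):
    """Return the sequence number of the first byte after this segment."""
    return (seq + payload_len + int(syn) + int(fin)) % 2**32  # 32-bit wraparound

seq = 1000                          # arbitrary initial sequence number
seq = next_seq(seq, 0, syn=True)    # SYN counts as one "byte" -> 1001
seq = next_seq(seq, 500)            # 500 payload bytes        -> 1501
seq = next_seq(seq, 500)            # another 500 bytes        -> 2001
seq = next_seq(seq, 0, fin=True)    # FIN counts as one "byte" -> 2002
print(seq)  # 2002
```

This is also why a pure ACK segment does not advance the peer's sequence number: it carries no payload bytes, and neither SYN nor FIN is set.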
This is an important part of the way I learn. I will read something and then explain it to someone else. It makes me think deeper about the issue as I form the words, and it gives me a great chance to get corrected when I am making unfair assumptions. I always preface these conversations with "as I understand it" or "from what I read" or some other disclaimer. I used to have a coworker who would give me soooo much guff about these disclaimers, since I'd drop so many of them in one of these conversations. I just felt it was important to make it clear I wasn't coming from a place of authority, and more from the perspective of a guy who is bumbling through it and trying to figure out what the hell is going on.
It's the way I learn, too.. But it doesn't mean I write things as if I were knowledgeable.
I'm a total noob, yet the first few paragraphs made me cringe because I felt there were some odd things. It wouldn't have bothered me if there wasn't this "A periodical about best practices and advanced techniques in Objective-C"..
Or using the word "great contributors", etc. I mean, one has to be humble because unless one really knows his stuff, he shouldn't talk that way.
If the writing style was more in the "I'm learning and journaling my progress", it would've been more than okay, and knowledgeable people wouldn't have a problem with it.
I was in forums learning to design my PCBs. I'd post my design and ask for feedback, and people who'd spent 30+ years doing this would comment on what I thought was a nice design and find a thousand flaws in it. And I got back to work, iteration after iteration.. Until these really great guys who do that for a living would say "Beautiful work".
Had I posted something like "advanced PCB design" in the "this is how it's done" way, they'd have ignored me and I would've stayed more ignorant than I still am.
There was a question on the Python mailing list asking how long it takes to say that one knows how to program. People with 40+ years actively programming said: I'll let you know when I'm there.
Humility goes a long way. Heck, even when I read things on the nmap mailing list, I don't get the feeling that they consider they know more than you do, even though they really, really know their stuff.
Could you post an example of the "I'm learning and journaling my progress" writing style? I'd like to start doing this and I don't want to come off as an expert on things I'm just learning.
One thing is to not publish it -- a learning journal is probably much more important for you to write than for anyone to read. Then give yourself a couple of years or decades of learning time, and if you still want to write about it, what you wrote as a beginner will give you valuable insights into the beginner's mind, things you have probably forgotten.
And of course you can publish it (might be good for feedback), just state that it's a learning journal, not "best practices".
Great idea. I have a notebook where I write down ideas for companies, things I think about. I think it is a really, really good practice to write it down..
The reason I'm saying that is that human beings have selective memory. They tend to remember things they did the right way, they remember their good ideas, times they were right, etc.
I used to note my ideas that would seem genius.. And then I'd look at them a couple months later and it's humbling. How stupid could I be.
But there is a good thing about this: It taught me a valuable lesson.. It taught me to focus on real needs, and not some fancy thoughts I have at 3AM. Like real needs.
And I know that at an early stage, one needs to let go of critical things and be open and not dismiss ideas, etc.. But it's just that some ideas are plain stupid and I had plenty of those.
I write them down, then cross things out: not a real need, not a problem. Now I'm thinking about an idea that I'd use if it were available. And I'm not the only one.
I was on the forum chat, and I'd send pictures and they'd help me see, they'd open my eyes and explain things. Why 90° tracks are a no-no, etc.
People are tremendously generous with their time as long as your attitude is okay. I learned orders of magnitude more on the internet than in college, and still do every minute I'm online.
It's nice to put disclaimers in there, but if it's the first time a person has heard the information, the disclaimer is basically ignored. Because what are you going to do when you have to troubleshoot a tcp connection or write an application? Go back and find a book on tcp and learn the whole thing from the beginning? Unlikely, as you already have what you consider to be knowledge about tcp. Even if you don't consider it to be authoritative, you probably have just enough to get in trouble.
That's why I find the whole "blind leading the blind" way of teaching to be counter-productive. Not that it's really serious or anything; nobody's going to lose a leg if you screw up your tcp connection. But when extended to other more serious topics, it can be dangerous to teach things to people if you're not sure about the subject matter. For example, something as simple as jumping a car battery actually isn't simple at all (when done correctly).
I think there are arguments for both sides. If you know something very well, you might overlook pitfalls when explaining it to someone who doesn't have your background, while someone who has only just grasped it might be well aware of what confuses a newbie. So I think there is some value in inexperienced people writing about what they learn and how they learn it, with appropriate disclaimers, as long as they still try not to state as fact anything they aren't actually sure about.
And I also think, to a degree, it actually is the responsibility of the reader to judge what to use that supposed new knowledge for. Trying to make sense of tcpdump output when debugging some application software bug? Why not? Writing an IP stack? I hope any sensible person would pick up a book and some RFCs first. Unreliable knowledge can still be useful and harmless in figuring things out, you just shouldn't use it to try and build things.
Then again, practically, we can observe that people do build systems without ever having looked into the relevant standards, and I would actually argue the effects are worse than one might superficially think. If you look at how ridiculously insecure the web and web browsers are, for example - how did that come to be? I would think one major factor is exactly that people didn't (and often still don't) read the relevant standards, something as apparently uncritical as the HTML spec or the HTTP spec, but instead just wrote what they thought was HTML, and wrote books about what they thought was HTML, and so on. The result was a need for browser vendors to accommodate all this crap out there that isn't HTML or HTTP, but that people still expect to be rendered by their browser in some way or another. So, due to end-user market share pressure, we now have security vulnerabilities in browsers that exist only because fixing them would break stuff that no one really ever had any right to expect to work - but they thought they were just creating some totally uncritical website using the "knowledge" they learned from some other clueless person, and those security problems can have quite serious consequences.
I think it's more likely that mistakes will happen if someone believes they know what they are doing. But how do they know if they know what they are doing or not? That's where "sensible person" becomes subjective to me.
I wrote an IP stack, of sorts, and used Wikipedia to do it. I'm aware that it's probably crappy, but only because it was basically designed to be. If I had tried to design it well, I might lead myself to believe I had done it correctly, for example because I found no problems with it in my testing. But as you're aware, there's plenty of problems with tcp/ip stacks that only come up as edge cases. So even if I was being sensible I might end up with shitty code and push it into a product, and then we're screwed. But if I had learned the stack properly, I wouldn't be in that mess.
A kind of solution lies in forums like HN, though. Sure, the posts are fallible and are often upvoted merely because they are perceived as authoritative. But we have the comments section, and knowledgeable persons who can speak up and educate. So it may not matter at all who's teaching, as long as somebody picks up the slack.
I guess my point is: The reasonable (and responsible) thing to do when you actually build something (rather than just learn about something out of curiosity or to be able to use the understanding in troubleshooting) is to read the primary sources, the standard documents, and in particular to be aware that whatever you learned from hearsay is not reliable enough to actually build a product on if there is an option to get your hands on the primary source. Especially with internet technology, we are in the great position that W3C recommendations and RFCs are freely available for everyone, so there isn't really much of a reason not to read them.
That might not be quite enough for a really good implementation, but overall software quality would be a hell of a lot better if everyone did that. It's just amazing, when you look at websites and also emails, how many people just make up how they think things work rather than reading the standards that are only a google search away.
When it comes to complicated subjects like the internals of TCP/IP, they may do more harm than good. Volume 1 of "TCP/IP Illustrated" is, I think, 700 pages long, and not without reason. When one writes condensed articles like this it makes sense to stay high-level, because the moment you get into discussing the SYN/ACK handshake, you are in danger of leaving large gaps in your explanation or making unreasonable stretches to complete the picture.
On a side note, it's funny that most of the paragraphs in the original article end with a link to Wikipedia - as a reader, I can go there myself, so what good do those links do me?
"We’ve created a new public repository on GitHub that contains all current and past objc.io articles. If you find any mistakes or have suggestions for improvements, please don’t hesitate to file issues, or even better: submit a pull request!"
Make a pull request so that people like me can learn about networking too.
Anybody writing routing code would be foolish to use this, or any other "simplified" article as a protocol reference. But, I'll agree that it is presented in such a way (and with enough technical detail) that any technical errors should be corrected.
If I had deep knowledge in this area, I'd probably applaud the effort and send corrections, rather than criticize.
I'll give them the benefit of the doubt and say he got his terms wrong. The IPv6 RFC states that IPv6 requires a minimum MTU of 1280 bytes. I guess that's what he meant.
IPv6 requires that every link in the internet have an MTU of 1280 octets or greater. On any link that cannot convey a 1280-octet packet in one piece, link-specific fragmentation and reassembly must be provided at a layer below IPv6.
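The practical consequence of that 1280-octet floor is easy to work out from the fixed header sizes. A sender that never exceeds 1280 bytes per packet never depends on fragmentation on any conforming IPv6 path (the constant names below are just illustrative):

```python
# Back-of-the-envelope arithmetic from the quoted requirement: every
# IPv6 link must carry a 1280-octet packet, so payloads that fit under
# that ceiling need no fragmentation at all.
IPV6_MIN_MTU = 1280   # octets, per the IPv6 spec
IPV6_HEADER  = 40     # fixed IPv6 header size
UDP_HEADER   = 8
TCP_HEADER   = 20     # minimum, without options

max_udp_payload = IPV6_MIN_MTU - IPV6_HEADER - UDP_HEADER
max_tcp_payload = IPV6_MIN_MTU - IPV6_HEADER - TCP_HEADER
print(max_udp_payload)  # 1232
print(max_tcp_payload)  # 1220
```

That 1232-byte figure is, for instance, the EDNS buffer size many DNS-over-UDP deployments settled on, precisely so replies fit through any conforming IPv6 path without fragmentation.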
Which wouldn't really make me any more confident in the reliability of the whole thing?! Confusing lower-level fragmentation and reassembly with IPv6 fragmentation is not exactly a mistake you'd be likely to make when you understand what that actually means, I would think.
Let's not guess, and instead help correct the article, shall we? Documentation, manpages, textbooks and programming books can all contain errors.
On the flip side, I'd like to see more people help critique these articles so newbies like me can get the most out of them (though I already took computer networks...).
It would be one you would make as you were learning it, which is honestly what half the blogs that cross here are. It just happens that the neat thing this person learned today was about some networking protocols they use every day.
Just out of curiosity, what do you do? Is this knowledge germane to where you work? I've just recently become interested in this stuff, so I'm curious to get a lay of the land.
I do ... software development? ;-) There isn't really any particular category for what I do, though I tend to do more of the lower-level/backend stuff of projects, and knowing how the stuff that you build on works internally certainly is useful in optimizing and debugging.
As for getting an understanding of how TCP/IP works, I think Stevens' classic TCP/IP Illustrated still is a good book to get started, even if somewhat dated in some details (no IPv6, in particular), but the general principles still apply. Though maybe there are newer equally good books around that I just don't know about?
Well, yeah, it's a book by the same name, but is it the same quality (well, it's not just the same name, of course, but a new author obviously can make a big change in quality, in either direction ;-)?
In any case, my recommendation was referring to the old edition by Stevens alone, no clue about the new one, though at least the newly covered material seems appropriate to me.
(For anyone who might not be aware: W. Richard Stevens died in 1999, so the new edition is by a different author, though apparently based on the old material.)
I have not worked through the entirety of the original, which I also own, but the new one seems pretty good to me so far. But you may have a different opinion as somebody who knows much more than me about TCP/IP.
I'm also going to cut him some slack with the bit about giving each segment a unique number. While formally the sequence number identifies each byte of data, it really is about providing heuristics to identify and correct for out of order, fragmented, missing & duplicate segments. It is important that it be about bytes, particularly for things like SACK, but if you are trying to simplify things you might describe it as being about the segments.
Sure, nothing wrong with simplifying things, but "Both ends are sending sackOK. This will enable Selective Acknowledgement. It switches the sequence numbers and acknowledgment number to use byte range instead of TCP segment numbers." is just flat-out wrong, and in particular suggests that "numbered segments" is not a simplification but an actual fact about how the thing works.
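To make the distinction concrete, here is a toy sketch of what SACK actually reports: byte ranges above the cumulative ACK, with segment boundaries playing no role. `sack_blocks` is a hypothetical helper for illustration, not a real TCP API:

```python
# Toy sketch: a SACK-capable receiver reports which *byte* ranges beyond
# the cumulative ACK it holds, regardless of how those bytes were split
# into segments on the wire.

def sack_blocks(cum_ack, received_ranges):
    """Merge received (start, end) byte ranges above cum_ack into SACK blocks."""
    blocks = []
    for start, end in sorted(received_ranges):
        if end <= cum_ack:
            continue  # already covered by the cumulative ACK
        start = max(start, cum_ack)
        if blocks and start <= blocks[-1][1]:
            # contiguous or overlapping with the previous block: merge
            blocks[-1] = (blocks[-1][0], max(blocks[-1][1], end))
        else:
            blocks.append((start, end))
    return blocks

# Bytes 0-999 are acked cumulatively; segments carrying bytes 2000-2999
# and 3000-3499 arrived out of order, while 1000-1999 is still missing.
print(sack_blocks(1000, [(2000, 3000), (3000, 3500)]))  # [(2000, 3500)]
```

Because the blocks are byte ranges, the sender knows exactly which bytes (1000-1999 here) to retransmit, and is free to re-segment them however it likes.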
And regarding dropping the ICMP message about fragmentation... good firewall implementations have the firewall discover the MTU behind it and advertise THAT.. even better, they might hide the hops behind it.
There is no such thing as an "MTU behind it", there is a separate path MTU for each and every ordered address pair, more or less (and that's not even static, obviously).
And obviously I was talking about packet filters, not about some kind of application firewall, which obviously doesn't have anything to do with filtering of packets anyhow.
What an unpleasant attitude. The author clearly made a lot of effort and as far as I know it's all accurate. If you know better then say so pleasantly.