Hacker News
If everything in Unix development is POSIX, what the hell does C library do? (thejaywalker.net)
36 points by jaywalker on Jan 2, 2011 | 22 comments


There's something a bit strange about the approach of this post that suggests a confusion of concepts (though I'm not saying there's anything factually incorrect about it).

The standard C library and the POSIX API are two quite different things.

The standard C library is a library of functions defined by the various C language standards. These functions should be available on any C language implementation regardless of the platform (Unix/POSIX, Windows, embedded RTOS, micro-controller with no OS, etc.).

The POSIX API is a platform specification for Unix-like platforms (though the API is sometimes available non-natively on non-Unix-like platforms). It is comparable to the Win32 API on Windows. It has no particular connection to the C language other than that C is the language in which the API is provided.

Perhaps one reason for this confusion is that today many people are familiar with higher-level programming environments like Java or Python, which provide much richer standard libraries than C and C++ do. They include things like multithreading and network communications. C programmers need to use platform-specific APIs like POSIX or Win32 to access these services. Also note that pretty much any higher-level language running on a modern OS ultimately uses the platform system APIs to provide such services, because the OS kernel doesn't allow user-space programs access to them other than through a system call mechanism (which the platform APIs typically wrap).

One other possible source of confusion that the post doesn't mention is that on Unix the library one links to for both the C standard library functions and the Unix/POSIX system APIs is called libc (on Linux it's usually glibc). Despite the name, this is not the same thing as the C standard library.


So, here's a question I have as a Win32 developer, which may tie in with the whole what-is-POSIX topic. What does the API documentation mean when it refers to C calls like "fputs(3)" or "lseek(2)" (as in the example here: http://linux.die.net/man/3/fseek )? How are these different from 'normal' fputs() and lseek()?

I've never seen the One True Explanation of this anywhere, probably because it's something that Linux/BSD/POSIX/whatever people learn on day 1. The question seems to be strangely un-Googleable.


Those are Unix manpage section numbers. fseek(3) means fseek in section 3 of the manpages. This can help to differentiate between a command-line program and a C function of the same name, for example. See http://www.december.com/unix/ref/mansec.html


Section 2 is for system calls; in other words, libc functions that are just thin wrappers around the system call interface provided by the kernel.

Section 3 is for library APIs. It typically includes the C standard library functions in addition to pages for other libraries installed on the system.

If you look at your examples, you will find that fputs and fseek are declared in stdio.h, which is a C standard library header, whereas lseek is declared in unistd.h, which is a Unix/POSIX header.

lseek is a low-level call to the kernel's filesystem interface. You can't access a file more directly unless you write a kernel module or modify the kernel. fseek and fputs wrap the low-level filesystem interface in a stream abstraction.

Also fseek and fputs should be available on any platform with a C compiler (including Windows) whereas lseek is only normally available on Unix/POSIX platforms.


Although others have explained things a bit, I'll try making an analogy to Win32: Section 2 ("system calls") contains functions which are traditionally serviced directly by the kernel. In Win32, this would mean that they (historically) correspond more-or-less directly to an Nt*() function in ntdll (or a kernel entry point). Section 3 ("library functions") contains all other library functions, including ones serviced by third-party libraries.


Not sure where you got the idea that fork() isn't a system call. It certainly is on my computer.

I think the overall sentiment is right on, but I'm not sure I get the last question. POSIX is a superset, yes, but it doesn't provide replacements for the C library. Perhaps it would be better phrased as "Why limit yourself to only standard C?"


I caught the same thing, but learned something interesting while double checking I wasn't crazy (from man 2 fork):

Since version 2.3.3, rather than invoking the kernel's fork() system call, the glibc fork() wrapper that is provided as part of the NPTL threading implementation invokes clone(2) with flags that provide the same effect as the traditional system call. The glibc wrapper invokes any fork handlers that have been established using pthread_atfork(3).

At least on my Ubuntu box.

  andrew-think ⚑ ~ strace -f sh -c 'sh &' 2>&1 | egrep -A1 'clone|fork'
  clone(Process 2646 attached
  child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7ff002bac9d0) = 2646


Ah, but just because glibc doesn't call the fork() system call doesn't mean the kernel doesn't have one. :) I bet you could still get to it by calling something like __glibc_old_fork(), or at the very least a manual syscall(SYS_fork) should work.

POSIX itself doesn't even use the words system call, only function, to describe them.


Yep:

  $ grep fork /usr/include/asm/unistd_64.h     
  #define __NR_fork				57
  __SYSCALL(__NR_fork, stub_fork)
  #define __NR_vfork				58
  __SYSCALL(__NR_vfork, stub_vfork)
  
  $ strace -f ruby -e 'syscall 57' 2>&1 | egrep 'fork|clone'
  fork(Process 4138 attached (waiting for parent)

  $ strace -f ruby -e 'fork' 2>&1 | egrep 'fork|clone'
  execve("/home/andrew/.rvm/rubies/ree-1.8.7-2010.02/bin/ruby", ["ruby", "-e", "fork"], [/* 47 vars */]) = 0
  clone(Process 4145 attached


It so happens that the Linux kernel has a system call named fork. But nothing states that this system call has to conform to the POSIX specification of fork(). If it does, it does so for convenience.

What makes Linux as an operating system conform to POSIX is the API, implemented in C. In this API, the function fork() has to do what the POSIX specification states. As stated above, in recent versions of glibc this fork() does not even use the fork system call.

As POSIX is a superset of the C standard library, it does provide a replacement. The C standard library exists to provide the very basic functionality in it, in environments and operating systems that are not POSIX compliant.

The C standard library is part of the C implementation. The POSIX API, implemented in whatever language, is part of the operating system.


Ah, fork(). One of my favorite C calls. Not.

That looks to be some questionable phrasing from the article.

Yes, fork() is a system call on many platforms, but not on all.

And fork() is not ubiquitous. If anything, it's one of the more common, and can be one of the more intractable, sources of porting problems within C code.

fork() does rather more than many C programmers might realize. About 5 or 10% of the calls I've encountered in the many C applications I've ported use fork() for most or all of what it can do. The remaining calls throw that context away, and can often be equally or better served by vfork()/exec() or some other C call.

Those applications that use fork() for what it can do can be, and usually are, more difficult to port to the various platforms that lack copy-on-write virtual memory.


vfork is probably worse because you have to be very careful if exec fails. Calling a function to log the error is likely to mess up the parent's memory.


vfork's restrictions mean that, in most cases where fork is used in Unix development beyond the trivial fork/exec combination, it won't be the proper choice.

Every single major operating system except one has fork. Colour me not very concerned.


thanks for correcting me on fork(); not sure what I was thinking when I wrote that statement :)


Well, I guess I should have waited for this discussion to reach a conclusion :) I read in Advanced Linux Programming (the famous ALP book) that fork(), as well as creating threads via the pthread library, uses clone() in the background. Anyway, the discussion here is quite informative!


One good example of restricting yourself to the standard C library is the Lua reference implementation: because of that, it can compile for lots of platforms (well, 32-bit and above) without a problem.

Another one is zlib, and there are lots more.


As a result, there are features missing that you might find in other implementations; e.g. standard Lua has I/O but cannot create directories. But getting halfway there on all platforms is better than getting all the way on one platform and leaving you to work out how to fix it to run on yours.


Yes, I agree, but back in the Apple ][ DOS era (before ProDOS) there weren't even directories.

As an example that went in the other direction: the Common Lisp language specification! It even has host, path, and version number for describing file names, but I think this now hampers the language's image when one starts coding in it for the first time.


Certainly, but you don't have to go as far back as that for an example: just compare the drive letters in Windows to the lack of them in Unix.

It works very well in Lua's favour to stick to just the C library given the desire to use it in embedded computers. However, it's valid to wonder why a basic feature supported by 99% of computers is not available out of the box; you judge things based on what's available now, not the past. I should think the C library would be quite different had it been designed 20 years later.


I do not wish to be a troll, but most of the information in that post is plain wrong. For one, fork() is indeed a system call. If you want a good understanding of the true essence of POSIX (the UNIX API), I recommend reading Advanced Programming in the UNIX Environment.


The definition of exactly what is a system call and what is not varies by platform. For example, some OSes may implement sleep() as a system call; others may implement it using nanosleep() or even alarm() and sigwait(). Even fork() has been implemented as a library function, e.g. in Cygwin. What method is used to implement low-level library functions like this should be considered an implementation detail.


Maybe you wish to elaborate on what else is wrong with the article, other than a nitpick about what the author admitted was a spur-of-the-moment mistake?



