Anything that provides access to GPU command queues is welcome. It's been clear for a while that OpenGL and D3D are ill-suited to modern ways of thinking about GPUs. This also explicitly supports multithreaded clients, each with their own command queues.
My concern is drivers: drivers for video cards on OS X haven't been as good as Windows drivers. That, and of course the specter of another platform-specific API. This invites a comparison with Mantle. I don't think either Metal or Mantle will "win", but they're good prototypes for the next generation of GPU APIs.
I'm daring to post from a position of ignorance, trusting intuition to be correct. But this is a framework for shipping (graphical and other) workloads over to the GPU. Surely it isn't mobile-only?
Just for instance, Apple sell an extremely expensive workstation with two very powerful graphics cards. Providing better ways to use a Mac Pro's GPUs is a good way to sell more Mac Pros, and a way to unlock more computational power as the power of CPU cores has levelled off.
I agree with you. I'll bet this is coming to OS X sooner rather than later, and people are largely missing its significance as a GPGPU programming environment.
Despite Apple's best efforts, OpenCL uptake seems to be sluggish. CUDA continues to dominate developer mindshare, by providing a far better language, API, and toolchain.
Compare the C++11 subset supported by the Metal shading language to the device language of CUDA C++. Templates are a huge feature. Ahead-of-time compilation is huge (vs shipping strings to the driver like OpenCL). It retains the basic workgroup structure of OpenCL with local and global memory, so it looks feasible to map to NVIDIA and AMD hardware. Is there really anything PowerVR specific in here? People seem to be inferring an awful lot from the name, but nothing sticks out at first glance.
The features of the shading language would make porting applications from CUDA less painful. If they went all in on Xcode dev tools to make it a rival to Nsight for profiling/debugging, maybe those Mac Pro GPUs wouldn't seem so neglected.
> Despite Apple's best efforts, OpenCL uptake seems to be sluggish. CUDA continues to dominate developer mindshare, by providing a far better language, API, and toolchain.
i'm afraid i have to disagree with you there. over the past 5 years, CUDA popularity has peaked and is actually starting to decline. i would cite my source for that, but i'm on my phone.
aot vs online compilation is another kettle of fish. aot isn't necessarily better, although it is a more attractive offer for developers if they don't wish to ship kernel source. regardless, OpenCL 2.0 has SPIR, an llvm-ir dialect that addresses this issue.
templates are not a big deal. GPGPU silicon is not well suited for complex computation (for some values of "complex"), at least. i really wish C++ language features wouldn't creep into kernel languages. it's not going to be good.
if metal can replace OpenGL, then maybe i can get behind this. the failure of longs peak really set its relative demise in motion. the API is in serious need of work.
For me, AOT compilation is not about shipping binaries vs. source. The software I write is mostly used for internal research, so it's not going anywhere. It's the difference between getting compilation errors from 'make' and having to pull them out of the OpenCL C API at runtime. Setting a breakpoint in an OpenCL kernel... I remember Intel's stuff basically working fine, but you had to pass a vendor-specific option to hint at the file path. In all these little ways, the workflow is just sucky. Much of it can be worked around, but I'm too lazy to write more application code to do all the chores that the CUDA toolchain takes care of already. I'm glad to hear that 2.0 is fixing this.
What's your alternative to templates for generic code, exactly? C macros? Scripted pre-processing that further screws up the already marginal tool support for debugging and profiling? Copy+paste? Templates are completely orthogonal to "complex computation" -- I just want to use a device function on different data types without run-time overhead.
On the topic of complex computation, I'm constantly surprised at the kind of features NV adds to CUDA and how well they actually work. I'm also surprised at the kinds of things people do on the hardware. If someone implements a high performance lock-free data structure on the GPU, you can't look at that and say oh, that's too complex, you shouldn't do that.
Also, until we're all working on computers that look like the PS4 with a unified global memory, there's a huge incentive to cram the awkward bits of your program onto the GPU any way you can, even if it drags a bit, because that's where the data is.
NVIDIA has really gone to some extremes. malloc in device functions. vtable support. Dynamic parallelism. Metal has none of this stuff.
<EDIT: Just now saw your other replies downthread. I'm leaving this comment because it reflects my personal experience and opinions, but don't feel like you need to repeat yourself to clarify your position re: templates, etc>
What we know: the stated goal of Metal is exactly the same as the goal of Mantle: reducing CPU overhead.
Hypothesis 1: Metal aims to replicate that for iOS, while Mantle can be used in the Mac Pro (which uses ATI cards)
Hypothesis 2: Metal could wrap around Mantle on OSX and some other similar interface on iOS where Mantle is not available, for a unified Apple interface without having to write their own ATI drivers
I suspect Apple wants to hedge their bets between ATI and Nvidia and thus will not support Mantle (or CUDA, which exists on OS X but appears to receive no love from Apple).
Sorry, I was unintentionally vague, I meant without having to write their own version of Mantle. The other upside being if Metal wraps around Mantle, they can still expose Mantle to software that targets Mantle.
The name "Metal" gives a hint, though -- the API provides better performance by getting you closer to the metal. The metal in the A7 is different than the metal in those Mac Pros.
The GPUs in iOS devices are not made by Apple or exclusive to their platform. They are PowerVR GPUs designed by Imagination Technologies (and fabricated by Samsung). Many Android devices feature the same GPU designs that Apple uses.
But you can't replace the GPU and Apple has full source code to the driver stack (and probably wrote most of it themselves instead of using the source from ImgTec).
The only place where this is likely true is for their Intel graphics drivers - Intel is probably the only graphics company on the planet that treats their graphics hardware as something less than the most cherished of all trade secrets.
No, you can bet damned good money they get a "manual" and a binary blob from their GPU vendor and they have to do the same reverse engineering as the Mesa folk do, only they have the added advantage of not needing to recontribute any of their changes publicly.
I doubt OS X contains much if any code that Apple doesn't have the source to. This isn't so much speculation as common sense.
Apple as a company is the very embodiment of control. Ever since Jobs took the reins back, they have held the entire production pipeline of their products in an iron grip. I wouldn't imagine software, especially driver code that has such a massive impact on user experience, to be any different.
GPU vendors allow game engine developers of a reasonable size access to the source of their drivers, why do you think that a company much more powerful (and actually a customer) would be denied the same access?
The implication that they probably (re)wrote most of it themselves wouldn't surprise me much either. They have already shown that low-level engineering is neither beneath them nor beyond their capability. They produced their own ARM chips well ahead of the other vendors (even beating out veterans in that space like Qualcomm) and have continued to show they want to control everything except maybe the fab.
So let's be honest, the only thing that is likely to be true is that they have access to everything and probably influence the development of PowerVR hardware significantly.
> The only place where this is likely true is for their Intel graphics drivers
false. i cannot cite a source for this, but i know they have direct access to current nvidia and AMD driver source trees, which they themselves extend/modify.
If you have access to OpenGL/CL, Metal isn't going to give you much (other than perhaps a prettier API). Since most PC/Mac games are written in OpenGL/CL, it will only make things slower.
The Metal API only really gives you extra perf if you're currently using SceneKit.
Well that doc is in the iOS Developer Library, so I'd have to assume it's iOS only. I read through a chunk of it, no mention of OSX (or, for that matter, iOS). Quite possible it's rolling out on iOS and will be expanded in reach to cover OSX in the future.
It's somewhat scary to see new graphics APIs being introduced.
The fragmentation of OpenGL is enough of a headache, but at least it offers some semblance of "write once, run anywhere." The introduction of Mantle and Metal, plus the longstanding existence of Direct3d, makes me worry that OpenGL will get no love. And then we'll have to write our graphics code, three, four, or goodness knows how many times.
I know: It's not realistic to expect "write once, run anywhere" for any app that pushes the capabilities of the GPU. But what about devs like me (of whom there are many) who aren't trying to achieve AAA graphics, but just want the very basics? For us, "write once, run anywhere" is very attractive and should be possible. I can do everything I want with GL 2.1: I don't need to push a massive number of polys, I don't need huge textures, and I don't need advanced lighting.
That worked out very well on Windows with DirectX. The engines clearly provided an equal experience for both OpenGL and DirectX, not favoring the platform specific API at all.
True, but as with advanced rendering techniques, not everyone wants to use off-the-shelf engines. In a simple application, there's something to be said for writing your own, simple graphics code.
> OpenGL 5 will happen in the next two years (and GLES 4) and it will take the wind out of all these alt-language sails.
2 years? in two years' time nvidia will have their next line of GPUs out (pascal), intel will have launched knights landing (the many-core xeon), and the next version of OS X will be released. 2 years is a long time, and it will already be behind the curve from its announcement. khronos have a history of stagnation and disappointment.
> I can't see Khronos making the same fuck up twice, as they did with Longs Peak.
I'm all for deprecating old OpenGL features. Old apps won't fail to run for a while yet. When they do, the community will no doubt create a compatibility layer like they did with DosBox.
In my opinion, this is great news for OpenGL, it brings competition.
The problem with OpenGL is legacy code and committee hell.
If something works, like precompiled shaders, OpenGL committee will include it in the future spec.
I use OpenGL for very simple things. OpenGL needs to lose weight and get slim. I am certainly not satisfied with OGL 2.1. It is a design that does not make sense anymore with current hardware (it did about 10 years ago).
Actually, the name Metal reminded me of S3 MeTaL, one of the APIs during the time when each and every 3D accelerator manufacturer had their own API ;).
And now we are at a state where almost all OS vendors and some GPU manufacturers have their own API. I'm only waiting for Nvidia to announce their own API. They did buy 3dfx back in the day, so they have (bought) some experience with their own APIs.
Glide is not remotely relevant to anything anymore, and hasn't been for years. Besides, NV have much more recent experience with their own APIs, in form of CUDA and Cg.
I like the general trend indicated by this and Mantle and DX12, but the return to full-on platform fragmentation is a bit depressing.
You will need more modern stuff than OpenGL 2.1, because OpenGL 2.1 uses a really deprecated model. You'd really want to use the new rendering system based on VBOs and shaders instead of glBegin().
I'm pretty sure they didn't. My concern though is that OpenGL will fade away.
OpenGL isn't exactly a product. There's no company that "writes" the OpenGL software. Rather, it's a specification published by a consortium. "Writing" the OpenGL libraries is a task that each GPU maker does independently. So you have a bunch of different implementations of the same API.
For a long time, GPU makers have focused their attention on DirectX and done a lackluster job with their OpenGL implementations. If APIs like DirectX and Metal continue to proliferate, there will be less and less time and less incentive to maintain a good OpenGL implementation.
You can thank AMD for this one. This is exactly why I supported Mantle initially - not necessarily because I thought Mantle will replace DirectX and OpenGL, but because it would push all the others to adopt similar changes to their APIs.
And this is exactly what happened, first OpenGL (through AMD's extension for now at least), then Microsoft with DirectX 12 [1], and now Apple, too.
Before you get too excited, though, remember Mantle "only" improved the overall performance of Battlefield 4 by about 50%. It can probably get better than that, but don't expect "10x" improvement or anything close to it.
From what I understand, it's more accurate to say that it reduces CPU use by about 50% (rather than improving performance). If CPU is not your bottleneck, you won't see a performance improvement. If it is (which is common these days), then you'll have twice as many free CPU cycles on your main thread, which will translate into however much improvement your GPU can muster.
The 10x improvement that Apple claims is presumably rooted in Mantle's design. The Mantle docs also claim an order of magnitude improvement in draw calls.
Sadly BF4 is kind of an optimal case for an API like Mantle. Their rendering workload is heavily CPU-bound, so Mantle produces big gains. A lot of other game engines will see much smaller improvements because all the weight is on the GPU, or on systems Mantle doesn't touch, like physics.
Any renderer which isn't CPU (read: driver) bound doesn't take performance seriously or hasn't been optimized yet.
(Not that that refutes what you're saying, I'm just saying that most serious engines are in the same category as BF4 when it comes to gains from better driver APIs)
This to me is more surprising than Swift. But it will make for difficult platform decisions. But since there are 4 game engines already working on it (Unreal hasn't committed), maybe it's not a bad idea at all.
Sorry everybody, I was thinking of the slide of engines / companies[1]. I am aware Epic makes the Unreal Engine, but wasn't aware that the Zen Garden on Metal demo was in fact UE4.
You can still use OpenGL directly. I believe Metal is aimed more at engines, so they aren't providing an abstraction of an abstraction of a system call.
This seems to be Apple's answer to Google's RenderScript. It is too bad big companies (Google, Apple) are developing their own GPU software stacks instead of building upon and furthering existing frameworks such as OpenCL. OpenCL desperately needs a kick in order to catch up with CUDA. Instead they are focusing on things like SYCL, hoping to catch up with already superior projects such as C++ AMP. OpenCL should rather fix its poor specification and get implementers on the same page about it. The mobile community could have been a driving force. Instead, frustrated with what OpenCL is, mobile decided to roll their own. As always.
> But yes, OpenCL with its basic C API is way behind what CUDA offers in terms of language support.
strange, language support is really the only thing that CUDA doesn't have over OpenCL. there are C++ (and python, Java, various others) bindings for host code, at least. if you're looking to use device intrinsics in your kernel code (at the cost of portability) then blame nvidia for not exposing it (and for their lack of support for OpenCL in general).
> Maybe SPIR will fix it, but it remains to be seen if anyone on HPC will care.
I guess "language support" was referring to the restrictions within the kernel language. And CUDA is better there - they support templates, you can typecheck a CUDA kernel call. With OpenCL you have to jump through hoops to get that. While some projects have managed to do it, it could be better still.
What bugs me about OpenCL is the intentional vagueness of the specification that gives every implementer the freedom to do whatever they want with the result that performance portability is often difficult to achieve.
templates are not something that (outside of simple uses) you'd want to use in your kernels. regardless, nvidia should be pushing their improvements through to OpenCL by exposing extensions. they might get adopted into the core profile.
> What bugs me about OpenCL is the intentional vagueness of the specification that gives every implementer the freedom to do whatever they want with the result that performance portability is often difficult to achieve.
well, that flexibility is required for OpenCL to be meaningful. that's where the variation in the hardware platforms exists. it's what differentiates compute devices. if that vagueness wasn't there, then we couldn't have things like OpenCL on FPGAs (altera, xilinx)
as for your statement on performance portability, perhaps that is an issue (but that's entirely dependent on the type of problem you're trying to compute). but something i don't understand is this:
you could have picked a proprietary API to do your compute. but say you choose CL. you optimize for your hardware, then what do you know - it's not really that fast on other hardware. but you're entirely overlooking the biggest boon here - your code ran on the other hardware in the first place. getting performant code is now only a matter of optimizing for that piece of hardware.
you could argue that's entirely too complicated, but that's what we have been doing already with our regular C/C++ programs (SSE/AVX/SMP...)
Templates are an essential tool to write type-independent algorithms. They enable meta-programming, an invaluable tool to provide flexible yet efficient active libraries to users. They allow automated kernel-space exploration. So templates are exactly what you want.
I understand the need for a standard that supports various different architectures, even architectures that might not exist yet. I guess I just dislike the way the did it. Compared to other standards (that also leave various things to the implementer), I think they did a poor job. They should have defined the semantics and the types better. The entire buffer mapping for example is a huge mess. Nvidia went ahead and fitted pinned memory in there somewhere. Others didn't, with the result that the meaning of the code changes completely depending on which library you link against.
I'm not arguing against OpenCL here, I'm saying they could do even better. It should not be too much effort too. And if companies like Apple and Google would have chimed in, we would have pretty awesome OpenCL standard and implementations today.
As for your argument about hand-optimization: C++ library implementers [0,1] (and compiler vendors probably too) found abstractions, tricks and tools that give performance portability today. They are of course domain-specific but it is possible.
> Templates are an essential tool to write type-independent algorithms. They enable meta-programming, an invaluable tool to provide flexible yet efficient active libraries to users. They allow automated kernel-space exploration. So templates are exactly what you want.
but OpenCL C only has primitive types. templates become more useful when you have classes, but bringing classes to the GPU is.. well, less than optimal.
> Compared to other standards (that also leave various things to the implementer), I think they did a poor job
i don't know what your complaints are exactly, but i don't share your opinions - i think OpenCL is almost as flexible as it needs to be.
> The entire buffer mapping for example is a huge mess
i disagree. clCreateBuffer creates a buffer, clEnqueue(Read|Write)Buffer reads or writes to it. you can do more advanced transfers with the *rect variants, but you kind of probably know what you're doing at that point.
you want pinned memory? call clCreateBuffer with CL_MEM_ALLOC_HOST_PTR. and instead of Enqueue(Read|Write) use Enqueue(Map|Unmap). whether or not you get pinned memory is up to the runtime (and nvidia's runtime does not guarantee it - it's an impossible guarantee to make).
> Others didn't, with the result that the meaning of the code changes completely depending on which library you link against
as mentioned, use map/unmap. it works on all the runtimes, and at least isn't any slower than read/write. as for what library you link to, that's also a moot point - we have ICDs now, you link to a shim layer that dynamically links the appropriate run time during context creation (you can have several OpenCL platforms on one machine).
> As for your argument about hand-optimization: C++ library implementers [0,1] (and compiler vendors probably too) found abstractions, tricks and tools that give performance portability today. They are of course domain-specific but it is possible.
i haven't looked into either of your links in detail, but the various BLAS/LAPACK libraries that already exist, which are far more mature (and more widely used), would almost certainly be a better choice. lots of these already work on GPUs and are optimized to death by beings who think in assembly.. most of them are in fortran, as well (although they have front ends for several languages).
at the kernel level, fortran isn't really that different to C. it has a power operator, and do loops instead of for loops. you certainly will not be reading from files inside a kernel, so.. why do you want to use fortran? i can't tell you who, but one of the big OpenCL vendors is actually working on fortran OpenCL kernels, as a direct result of SPIR.
if you want C++ in your kernels.. well, you're going to have a bad time if you want performance.
using OpenCL is not like OpenMP, you don't just add a few pragmas and you're set. C code needs to be rewritten for OpenCL. this is largely copy and paste, due to the syntax being so similar, assuming you have mathsy things in your kernels, but the same is true for fortran. replace fortran's array-index parentheses with square brackets, replace power operators with the built-in power functions, etc.
porting legacy (usually fortran) codes to OpenCL/CUDA is actually what i do for a living.
The 'fill command buffers with commands' approach they adopted is basically an alternate way of achieving the same goals as AZDO. You can view AZDO GL as just filling a command buffer for you behind the scenes (with a few other quirks)
Indeed, it also explains why they put the A7 in all their devices (instead of recycling the A6 as they did previous years). An Apple TV with A7 and Metal will be competitive with Xbox360 / PS3 / Wii U spec-wise. And without a fan!
Open GL ES is not going away, just a more performant option is available. Also, Metal is supported by at least two of the most popular engines (Unity and UE4), so they should handle most of the cross-platform work, provided you are using one of those.
Being "another walled garden thing" will hardly be its death knell. Proprietary single-vendor APIs are already common on games consoles, and it hardly seems to have caused a problem. Look at DirectX, even; not quite single-GPU-vendor, but hardly renowned for its portability, and popular nonetheless. These things will succeed if the platform is popular, and other factors are pretty irrelevant.
I suspect iOS is popular enough to make this work.
I don't mean it won't be used. You are right, for example PS4 uses its own PSSL. But it's still a failure, because it's bad for developers who have to support multiple APIs and bad for users who likely won't get some titles for their platform because developers have no resources to support multiple APIs.
> But it's still a failure, because it's bad for developers who have to support multiple APIs and bad for users who likely won't get some titles for their platform because developers have no resources to support multiple APIs.
That doesn't make it a failure, that makes it something you don't like.
This could easily be very big. If it helps developers make faster/better looking games on iOS they'll do it, especially if it's supported by middlware (since most devs probably don't make their own engine).
If that happens to make it harder to port games to Android (or at least get them to look as good), so much the better for Apple.
I don't like it because it multiplies exclusive titles. I consider it a failure. Normally authors should aim to reach the widest audience, not to exclude users because they use a different OS. This API will proliferate the latter.
A failure is something which doesn't succeed in achieving its goals. We often approximate this by looking at how widely used something is. For an API, this measure seems reasonable.
It has nothing to do with your views on vendor lock-in.
Direct3D also causes lock-in. That doesn't make it a failure.
It's not a failure. It's an option for developers who are constrained by the fundamental performance impact of a generic OpenGL layer when it sits between you and the hardware. The reason this is good is because it's platform specific, so it can make assumptions about hardware that Apple is in full control of.
I'm sure the big engines will just support multiple code paths where needed. And any existing developer has the option to stick with GL.
Three things I suppose, coming at it from having worked on medium-to-largeish game development efforts:
0. Graphics APIs come far down the list of things people think about when planning a project. Platform support (or lack thereof) is driven by other concerns. It's a business or political decision, based on platform popularity (hence my original comment), not a technical one. Options for handling a skill shortfall include hiring more people or paying somebody else to do it.
1. Going purely by revenue and ability to attract new fanboys to the system, the bulk of development resource supply is not actually terribly interested in cross-platform graphics APIs. People using graphics middleware will use graphics middleware. Developers writing their own technology (and the people writing the graphics middleware) would actually rather have N simpler APIs for N platforms than a single complicated one that tries to support everything. That is then pretty much everybody in this space covered.
(OK, yes - Direct3D11 is not especially simple, though I think it's simpler than OpenGL. But Windows is kind of popular. See point 1.)
2. The bulk of your average game's code is non-graphical, and the vast majority of the graphics code is not API-oriented or is shaders. (So, more shader languages is not a good thing, but you have options. See, e.g., http://aras-p.info/blog/2014/03/28/cross-platform-shaders-in... - it doesn't seem to have been a big issue on the multi-platform projects I've worked on.)
The whole point is to be closer to the "metal". Doing that across GPU vendors would be, well, OpenGL. It's the same as the native interfaces for the PS4 and other consoles - it's supposed to be platform/GPU specific. There's a limit to how much performance a general purpose graphics abstraction can enable.
For many developers, the cross-platform API that they care about is Unity or some equivalent. They will continue to be cross-platform and get a free speedup from the removal of an unnecessary layer of abstraction.
I really don't see this being the case. Any competently designed graphics engine has an abstraction layer between the app's graphics routines end the platform's graphics API. While it's not trivial, another API shouldn't be something that prevents an app from being ported to other platforms.
There wasn't much of a perceived interest in gaming on the Mac and Linux platforms. The Mac and Linux graphics drivers were awful. A vicious cycle, which we're (hopefully) seeing start to break now.
Mac is supported by at least some major games, which is a step forward. The Steam box should speed things along nicely for Linux, hopefully.
As pjmlp points out, games were successfully ported to different consoles, it's just that Mac and Linux weren't seen as worthwhile targets.
I suspect porting a Windows game to PS3 would be much harder than porting to Mac, but developers managed.
Saw lots of HN news on Apple. I own absolutely no Apple devices: while they are well made, the software ecosystem that comes with them is similar to what Microsoft had before, i.e. essentially closed, and that is important to me. It's why I own zero Apple devices.
Am I alone here? I'm running Linux everywhere from home to my office for years, and my tablet/cellphone is Android-based.
for what reason does my comment above deserve down-votes? could those down-voters explain why it should be down-voted? having down-vote rights does not mean you can abuse them, shame on you HN
You are attempting to hijack a thread about Apple's new Graphics API, Metal. I downvoted you for this and because your post adds nothing to the discussion.
still, the silent down-vote without any warning, notice, or explanation sucks. that reminds me that you are all just like the YC founder himself, who has a really low EQ, IMO.
Can HN provide a link to close account? I know this is off-topic still, but feel free to down-vote as many times as you want.
again, how can I close my account, searched around and found nothing for that.
https://developer.apple.com/library/prerelease/ios/documenta...
The shading language appears to be precompiled (which is sorely missing from OpenGL) and based somewhat on C++11.