And there you have it. Perhaps the greatest thing to happen to RISC-V since the ...

dragontamer · on Sept 14, 2020

RISC-V still seems too ad-hoc to me, and really new. Hard to say where it'd go for now.

I know momentum is currently towards ARM over POWER, but... OpenPOWER is certainly a thing, and has IBM / Red Hat support. IBM may be expensive, but they already were proven "fair partners" in the OpenPOWER initiative and largely supportive of OSS / Free Software.

ChuckMcM · on Sept 14, 2020

I would love OpenPOWER to succeed. I just don't see the 48 pin QFP version that costs < $1 and powers billions of gizmos. For me the ARM ecosystem's biggest win is that it scales from really small (M0/M0+) to really usefully big (A78) and has many points between those two architectures.

I don't see OpenPOWER going there, but I can easily see RISC-V going there. So, for the moment, that is the horse I'm betting on.

dragontamer · on Sept 14, 2020

Not quite 48-pin QFP chips, but 257-pin embedded is still smaller than Rasp. Pi. (Just searched what NXP's newest Power-chip is, and its a S32R274: 2MB 257-pin BGA. Definitely "embedded" size, but not as small as Cortex-M0)

To be honest, I don't think that NVidia/ARM will screw over their Cortex-M0 or Cortex-M0+ customers over. I'm more worried about the higher-end, whether or not NVidia will "play nice" with its bigger rivals (Apple, Intel, AMD) in the datacenter.

exikyut · on Sept 14, 2020

The FS32R274VCK2VMM appears to be the cheapest in this series; Digi-Key have it for $30, NXP has it for "$13 @ 10K". This is for a 200MHz part.

https://www.nxp.com/part/FS32R274VCK2VMM

https://www.digikey.com/product-detail/en/nxp-usa-inc/FS32R2...

The two related devkits list for $529 and $4,123: https://www.digikey.com/products/en/development-boards-kits-...

--

Those processors make quite a few reference to an "e200", which I think is the CPU architecture. I discovered that Digi-Key lists quite a few variants of this under Core Processor; and checking the datasheets of some random results suggests that they are indeed Power architecture parts.

https://www.digikey.com/products/en/integrated-circuits-ics/...

The cheapest option appears to be the $2.67@1000, up-to-48MHz SPC560D40L1B3E0X with 256KB ECC RAM.

Selecting everything >100MHz finds the $7.10@1000 SPC560D40L1B3E0X, an up-to-120MHz part that adds 1MB flash (128KB ECC RAM).

Restricting to >=200MHz finds the $13.32@500 SPC5742PK1AMLQ9R has which has dual cores at 200MHz, 384KB ECC RAM and 2.5MB flash, and notes core lock-step.

--

After discovering the purpose of the "view prices at" field, the landscape changes somewhat.

https://www.digikey.com/products/en/integrated-circuits-ics/...

The SPC574S64E3CEFAR (https://www.st.com/resource/en/datasheet/spc574s64e3.pdf) is 140MHz, has 1.5MB code + 64KB data flash and 96KB+32KB data RAM, and is available for $14.61 per 1ea.

The SPC5744PFK1AMLQ9 (https://www.nxp.com/docs/en/data-sheet/MPC5744P.pdf) is $20.55@1, 200MHz, 2.5MB ECC flash, 384KB ECC RAM, and has two cores that support lockstep.

The MPC5125YVN400 (https://www.nxp.com/docs/en/product-brief/MPC5125PB.pdf) is $29.72@1, 400MHz, supports DDR2@200MHz (only has 32KB onboard (S)RAM), and supports external flash. (I wonder if you could boot Linux on this thing?)

rvense · on Sept 14, 2020

These are all basically ten-year-old parts, aren't they?

ChuckMcM · on Sept 14, 2020

Yes but hey, the core ARM ISA is like 40 years old. The key is that they are in fact "low cost SoCs" which is not something I knew existed :-).

Its really too bad the dev boards are so expensive but I get you need a lot of layers to route that sort of BGA.

rvense · on Sept 15, 2020

Sure, the ARM ISA is old, but a few things have happened in microarchitecture since then. I wouldn't be rushing to use a 10-year-old ARM over a newer one. The Cortex cores are pretty great compared to ARM9 or whatever.

dragontamer · on Sept 15, 2020

The ARM Cortex-M3 was released in 2006 and is still a popular microcontroller core. Microcontrollers have a multi-decade long lifespan. (I'm still seeing new 8051-based designs...)

There are still new chips using the Cortex-M3 today. Microcontroller devs do NOT want to be changing their code that often.

New chips move the core to lower-and-cheaper process nodes (and lower the power consumption), while otherwise retaining the same overall specifications and compatibility.

dragontamer · on Sept 14, 2020

They're all low end embedded parts with highly integrated peripherals. Basically: a microcontroller.

No different than say, Cortex-M0 or M0+ in many regards (although ARM scales down to lower spec'd pieces).

_ofdw · on Sept 14, 2020

I miss DIP chips that would fit on breadboards. I don't have steady enough hands to solder QFP onto a PCB, and I'm too cheap to buy an oven :(

Ecco · on Sept 14, 2020

You’re supposed to drag-solder those. Look it up on YouTube, it’s super easy. The hardest part is positioning the chip, but it’s actually easier than with an oven, because you can rework it if you only solder one or two pins :)

IshKebab · on Sept 14, 2020

There's an even easier way than drag soldering - just solder it normally, without worrying about bridges. You can put tons of solder on it.

The use some desoldering braid to soak up the excess solder. It will remove all the bridges and leave perfect joints.

_ofdw · on Sept 14, 2020

Wow, just looked up a video and some guy did an 0.5mm pitch chip pretty darn quickly. Thank you!

Ecco · on Sept 14, 2020

You’re welcome! Also, flux. Lots of it. Buy some good one, and use tons of it. Then clean the hell out of your PCB!

ChuckMcM · on Sept 14, 2020

https://www.digikey.com/catalog/en/partgroup/smt-breakout-pc... can help. I have hand soldered STM32s to this adapter (use flux, a small tip)

foldr · on Sept 14, 2020

A cheap toaster oven or hot air tool works fine for these. Or, as others have said, a regular soldering iron with lots of flux.

awill · on Sept 14, 2020

the war is over. Arm has won. That dominance will take a long time to fade. AWS and Apple's future is Arm.

darksaints · on Sept 14, 2020

OpenPOWER is pretty awesome but would be nowhere near as awesome as an OpenItanium. IMHO, Itanium was always mismarketed and misoptimized. It made a pretty good server processor, but not so good that enterprises were willing to migrate 40 year old software to run on it.

In mobile form, it would have made a large leap in both performance and battery life. And it would have been a fairly easy market to break into: the average life of a mobile device is a few years, not a few decades. Recompilation and redistribution of software is the status quo.

anarazel · on Sept 14, 2020

IMO VLIW is an absurdly bad choice for a general purpose processor. It requires baking in a huge amount of low level micro-architectural details into the compiler / generated code. Which obviously leads to problems with choosing what hardware generation to optimize for / not being able to generate good code for future architectures.

And the compiler doesn't even come close to having as much information as the CPU has. Which basically means that most of the VLIW stuff just ends up needing to be broken up inside the CPU for good performance.

dragontamer · on Sept 14, 2020

VLIW was the best implementation (20 years ago) of instruction level parallelism.

But what have we learned in these past 20 years?

* Computers will continue to become more parallel -- AMD Zen2 has 10 execution pipelines, supporting 4-way decode and 6-uop / clock tick dispatch per core, with somewhere close to 200 registers for renaming / reordering instructions. Future processors will be bigger and more parallel, Ice Lake is rumored to have over 300-renaming registers.

* We need assembly code that scales to all different processors of different sizes. Traditional assembly code is surprisingly good (!!!) at scaling, thanks to "dependency cutting" with instructions like "xor eax, eax".

* Compilers can understand dependency chains, "cut them up" and allow code to scale. The same code optimized for Intel Sandy Bridge (2011-era chips) will continue to be well-optimized for Intel Icelake (2021 era) ten years later, thanks to these dependency-cutting compilers.

I think a future VLIW chip can be made that takes advantage of these facts. But it wouldn't look like Itanium.

----------

EDIT: I feel like "xor eax, eax" and other such instructions for "dependency cutting" are wasting bits. There might be a better way for encoding the dependency graph rather than entire instructions.

Itanium's VLIW "packages" is too static.

I've discussed NVidia's Volta elsewhere, which has 6-bit dependency bitmasks on every instruction. That's the kind of "dependency graph" information that a compiler can provide very easily, and probably save a ton on power / decoding.

jabl · on Sept 14, 2020

I agree there is merit in the idea of encoding instruction dependencies in the ISA. There have been a number of research projects in this area, e.g. wavescalar, EDGE/TRIPS, etc.

It's not only about reducing the need for figuring out dependencies at runtime, but you could also partly reduce the need for the (power hungry and hard to scale!) register file to communicate between instructions.

floatboth · on Sept 15, 2020

Main lesson: we failed to make all the software JIT-compiled or AOT-recompiled-on-boot or something, that would allow retargeting the optimizations for the new generation of a VLIW CPU. Barely anyone even tried. Well I guess in the early 2000s there was this vision that everything would be Java, which is JIT, but lol

dragontamer · on Sept 15, 2020

Your point seems invalid, in the face of a large chunk of HPC (neural nets, matrix multiplication, etc. etc.) getting rewritten to support CUDA, which didn't even exist back when Itanium was announced.

VLIW is a compromise product: its more parallel than a traditional CPU, but less parallel than SIMD/GPUs.

And modern CPUs have incredibly powerful SIMD engines: AVX2 and AVX512 are extremely fast and parallel. There are compilers that auto-vectorize code, as well as dedicated languages (such as ipsc) which work for SIMD.

Encoders, decoders, raytracers, and more have been rewritten for Intel AVX2 SIMD instructions, and then re-rewritten for GPUs. The will to find faster execution has always existed, but unfortunately, Itanium failed to perform as well as its competition.

floatboth · on Sept 16, 2020

I'm not talking about rewrites and GPUs. I'm saying we do not have dynamic recompilation of everything. As in – if we would have ALL binaries that run on the machine (starting with the kernel) stored in some portable representation like wasm (or not-fully-portable-but-still-reoptimizable like llvm bitcode) and recompiled with optimization for the current exact processor when starting. Only that would solve the "new generation of VLIW-CPU needs very different compiler optimizations to perform, oops all your binaries are for first generation and they are slow now" problem.

GPUs do work like this – shaders recompiled all the time – so VLIW was used in GPUs (e.g. TeraScale). But on CPUs we have a world of optimized, "done" binaries.

ATsch · on Sept 14, 2020

All of this hackery with hundreds of registers just to continue to make a massively parallel computer look like an 80s processor is what something like Itanium would have prevented. Modern processors ended up becoming basically VLIW anyway, Itanium just refused to lie to you.

dragontamer · on Sept 14, 2020

When standard machine code is written in a "Dependency cutting" way, then it scales to many different reorder registers. A system from 10+ years ago with only 100-reorder registers will execute the code with maximum parallelism... while a system today with 200 to 300-reorder buffers will execute the SAME code with also maximum parallelism (and reach higher instructions-per-clock tick).

That's why today's CPUs can have 4-way decoders and 6-way dispatch (AMD Zen and Skylake), because they can "pick up more latent parallelism" that the compilers have given them many years ago.

"Classic" VLIW limits your potential parallelism to the ~3-wide bundles (in Itanium's case). Whoever makes the "next" VLIW CPU should allow a similar scaling over the years.

-----------

It was accidental: I doubt that anyone actually planned the x86 instruction set to be so effectively instruction-level parallel. Its something that was discovered over the years, and proven to be effective.

Yes: somehow more parallel than the explicitly parallel VLIW architecture. Its a bit of a hack, but if it works, why change things?

anarazel · on Sept 14, 2020

I don't understand how an increase, including the implied variability, of CPU internal parallelism and VLIW benefits go together?

dragontamer · on Sept 14, 2020

I'm talking about a mythical / mystical VLIW architecture. Obviously, older VLIW designs have failed in this regards... but I don't necessarily see "future" VLIW processors making the same mistake.

Perhaps from your perspective, a VLIW architecture that fixes these problems wouldn't necessarily be VLIW anymore. Which... could be true.

moonchild · on Sept 14, 2020

Have you seen the mill cpu?

javajosh · on Sept 14, 2020

Has anyone?

dralley · on Sept 14, 2020

At the rate they're going, all the patents they've been filing will be expired by the time they get a chip out the door.

KMag · on Sept 14, 2020

> And the compiler doesn't even come close to having as much information as the CPU has.

Unless your CPU has a means for profiling where your pipeline stalls are coming from, combined with dynamic recompilation/reoptimization similar to IBM's project DAISY or HP's Dynamo.

It's not going to do well as out-of-order CPUs that make instruction re-optimization decisions for every instruction, but I wouldn't rule out software-controlled dynamic re-optimization getting most of the performance benefits of out-of-order execution with a much smaller power budget, due to not re-doing those optimization calculations for every instruction. There are reasons most low-power implementations are in-order chips.

csharptwdec19 · on Sept 14, 2020

I feel like what you describe is possible. When I think of what Transmeta was able to accomplish in the early 2000s just with CMS, certainly so.

darksaints · on Sept 14, 2020

Traditional compiler techniques may have struggled with maintaining code for different architectures, but a lot has changed in the last 15 years. The rise of widely used IR languages has led to compilers that support dozens of architectures and hundreds of instruction sets. And they are getting better all the time.

The compiler has nearly all of the information that the CPU has, and it has orders of magnitude more. At best, your CPU can think a couple dozen cycles ahead of what it is currently executing. The compiler can see the whole program, can analyze it using dozens of methodologies and models, and can optimize accordingly. Something like Link Time Optimization can be done trivially with a compiler, but it would take an army of engineers decades of work to be able to implement in hardware.

dragontamer · on Sept 14, 2020

> At best, your CPU can think a couple dozen cycles ahead of what it is currently executing.

The 200-sized reorder buffer says otherwise.

Loads/stores can be reordered for 200+ different concurrent objects on modern Intel skylake (2015 through 2020) CPUs. And its about to get a bump to 300+ sized reorder buffers in Icelake.

Modern CPUs are designed to "think ahead" almost the entirety of DDR4 RAM Latency, allowing reordering of instructions to keep the CPU pipes as full as possible (at least, if the underlying assembly code has enough ILP to fill the pipelines while waiting for RAM).

> Something like Link Time Optimization can be done trivially with a compiler, but it would take an army of engineers decades of work to be able to implement in hardware.

You might be surprised at what the modern Branch predictor is doing.

If your "call rax" indirect call constantly calls the same location, the branch predictor will remember that location these days.

KMag · on Sept 14, 2020

With proper profiling (say, reservoir sampling of instructions causing pipeline stalls), and dynamic recompilation/reoptimization like IBM's project DAISY / HP's Dynamo, you may get performance near a modern out-of-order desktop processor at the power budget of a modern in-order low-power chip.

You get instructions scheduled based on actual dynamically measured usage patterns, but you don't pay for dedicated circuits to do it, and you don't re-do those calculations in hardware for every single instruction executed.

It's not a guaranteed win, but I think it's worth exploring.

dragontamer · on Sept 14, 2020

But once you do that, then you hardware optimize the interpreter, and then its no longer called a "dynamic recompiler", but instead a "frontend to the microcode". :-)

KMag · on Sept 14, 2020

No doubt there is still room for a power-hungry out-of-order speed demon of an implementation, but you need to leave the door open for something with approximately the TDP of a very-low-power in-order-processor with performance closer to an out-of-order machine.

branko_d · on Sept 14, 2020

Neo: What are you trying to tell me? That I can dodge "call rax"?

Morpheus: No, Neo. I'm trying to tell you that when you're ready, you won't need "call rax".

---

Compiler has access to optimizations that are at the higher level of abstraction than what CPU can do. For example, the compiler can eliminate the call completely (i.e. inline the function), or convert a dynamic dispatch into static (if it can prove that an object will always have a specific type at the call site), or decide where to favor small code over fast code (via profile-guided optimization), or even switch from non-optimized code (but with short start-up time) to optimized code mid-execution (tiered compilation in JITs), move computation outside loops (if it can prove that the result is the same in all iterations), and many other things...

saagarjha · on Sept 14, 2020

There is no way a compiler can do anything for an indirect call that goes one way for a while and the other afterwards. A branch predictor can get both with if not 100% accuracy about as close to it as you can possibly get.

branko_d · on Sept 15, 2020

Sure.

My point was simply that the compiler may be in position to disprove the assumption that this call is in fact dynamic (it may actually be static) or that it has to be a call in the first place (and inline the function instead).

I'm certainly not arguing against branch predictors.

formerly_proven · on Sept 14, 2020

> The compiler has nearly all of the information that the CPU has, and it has orders of magnitude more.

The CPU has something the compiler can never have.

Runtime information.

That's why VLIW works great for DSP which is 99.9 % fixed access patterns, while being bad for general purpose code.

drivebycomment · on Sept 14, 2020

Itanuim deserved its fiery death and resurrection doesn't make any sense whatsoever. It's a dead end architecture, and humanity gained (by freeing up valuable engineering power to other more useful endeavors) when it died.

darksaints · on Sept 14, 2020

Itanium was an excellent idea that needed investment in compilers. Nobody wanted to make that investment because speculative execution got them 80% of the way there without the investment in compilers. But as it turns out, speculative execution was a phenomenally bad idea, and patching its security vulnerabilities has set back processor performance to the point where VLIW seems like a good idea again. We should have made those compiler improvements decades ago.

dragontamer · on Sept 14, 2020

NVidia Volta: https://arxiv.org/pdf/1804.06826.pdf

Each machine instruction on NVidia Volta has the following information:

* Reuse Flags

* Wait Barrier Mask

* Read/Write barrier index (6-bit bitmask)

* Read Dependency barriers

* Stall Cycles (4-bit)

* Yield Flag (1-bit software hint: NVidia CU will select new warp, load-balancing the SMT resources of the compute unit)

Itanium's idea of VLIW was commingled with other ideas; in particular, the idea of a compiler static-scheduler to minimize hardware work at runtime.

To my eyes: the benefits of Itanium are implemented in NVidia's GPUs. The compiler for NVidia's compiler-scheduling flags has been made and is proven effective.

Itanium itself: the crazy "bundling" of instructions and such, seems too complex. The explicit bitmasks / barriers of NVidia Volta seems more straightforward and clear in describing the dependency graph of code (and therefore: the potential parallelism).

----------

Clearly, static-compilers marking what is, and what isn't, parallelizable, is useful. NVidia Volta+ architectures have proven this. Furthermore, compilers that can emit such information already exist. I do await the day when other architectures wake up to this fact.

StillBored · on Sept 14, 2020

GPU's, aren't general purpose compute. EPIC did fairly well with HPC/etc style applications as well, it was everything else that was problematic. So, yes there are a fair number of workload and microarch decision similarities. But right now, those workloads tend to be better handled with a GPU style offload engine (or as it appears the industry is slowly moving, possibly a lot of fat vector units attached to a normal core).

dragontamer · on Sept 14, 2020

I'm not talking about the SIMD portion of Volta.

I'm talking about Volta's ability to detect dependencies. Which is null: the core itself probably can't detect dependencies at all. Its entirely left up to the compiler (or at least... it seems to be the case).

AMD's GCN and RDNA architecture is still scanning for read/write hazards like any ol' pipelined architecture you learned in college. The NVidia Volta thing is new, and probably should be studied from a architectural point of view.

Yeah, its a GPU-feature on NVidia Volta. But its pretty obvious to me that this explicit dependency-barrier thing could be part of a future ISA, even one for traditional CPUs.

rrss · on Sept 14, 2020

FWIW, this article suggests the static software scheduling you are describing was introduced in Kepler, so it's probably at least not entirely new in Volta:

https://www.anandtech.com/show/5699/nvidia-geforce-gtx-680-r...

> NVIDIA has replaced Fermi’s complex scheduler with a far simpler scheduler that still uses scoreboarding and other methods for inter-warp scheduling, but moves the scheduling of instructions in a warp into NVIDIA’s compiler. In essence it’s a return to static scheduling

and I think this is describing more or less the same thing in Maxwell: https://github.com/NervanaSystems/maxas/wiki/Control-Codes

dragontamer · on Sept 14, 2020

I appreciate the info. Apparently NVidia has been doing this for more years than I expected.

StillBored · on Sept 14, 2020

I think your conflating OoO and speculative execution. It was OoO which the itanium architects (apparently) didn't think would work as well as it did. OoO and being able to build wide superscaler machines, which could dynamically determine instruction dependency chains is what killed EPIC.

Speculative execution is something you would want to do with the itanium as well, otherwise the machine is going to be stalling all the time waiting for branches/etc. Similarly, later itaniums went OoO (dynamically scheduled) because it turns out, the compiler can't know runtime state..

https://www.realworldtech.com/poulson/

Also while googling for that, ran across this:

https://news.ycombinator.com/item?id=21410976

PS: speculative execution is here to stay, it might be wrapped in more security domains and/or its going to just be one more nail in the business model of selling shared compute (something that was questionably from the beginning).

xgk · on Sept 14, 2020

   questionably from the beginning

Agreed. If you look at what's the majority of compute loads (e.g. Instagram, Snap, Netflix, HPC) then that's (a) not particularly security critical, and (b) so big that the vendors can split their workload in security critical / not security critical, and rent fast machines for the former, and secure machines for the latter.

I wonder which cloud provider is the first to offer this in a coherent way.

Quequau · on Sept 14, 2020

I dimly recall reading an interview with one of Intel's Sr. Managers on the Itanium project where he explained his thoughts on why Itanium failed.

His explanation centred on the fact that Intel decided early on that Itanium would only ever be an ultra high end niche product and only built devices which Intel could demand very high prices for. This in turn meant that almost no one outside of the few companies who were supporting Itanium development and certainly not most of the people who were working on other compilers and similar developer tools at the time, had any interest in working on Itanium because they simply could not justify the expense of obtaining the hardware.

So all the organic open source activity that goes on for all the other platforms which are easily obtainable by pedestrian users simply did not go on for Itanium. Intel did not plan on that up front (though in hindsight it seemed obvious) and by the time that was widely recognised within the management team no one was willing to devote the sort of scale of resources that were required for serious development of developer tools on a floundering project.

jabl · on Sept 14, 2020

> Itanium was an excellent idea that needed investment in compilers.

ISTR that Intel & HP spent well over a $billion on VLIW compiler R&D, with crickets to show for it all.

How much are you suggesting should be spent this time for a markedly different result?

drivebycomment · on Sept 14, 2020

By late 2000s, instruction scheduling research was largely considered done and dusted, with papers like:

https://dl.acm.org/doi/book/10.5555/923366 https://dl.acm.org/doi/10.1145/349299.349318

and many, many others (it produced so many PhDs in 90s). And, needless to say, HP and Intel hired so many excellent researchers during the heydays of Itanium. So I don't know on what basis you think there wasn't enough investment. So I have no choice but to assume you're ignorant of the actual history here, both in academics and industry.

It turns out instruction scheduling can not overcome the challenge of variable memory and cache latency, and branch prediction, because all of those are dynamic and unpredictable, for "integer" application (i.e. bulk of the code running on the CPUs of your laptop and cell phones). And, predication, which was one of the "solutions" to overcome branch misprediction penalties, turns out to be not very efficient, and is limited in its application.

For integer applications, it turns out the instruction level parallelism isn't really the issue. It's about how to generate and maintain as many outstanding cache misses at a time. VLIW turns out to be insufficient and inefficient for that. Some minor attempts are addressing that through prefetches and more elaborate markings around load/store all failed to give good results.

For HPC type workload, it turns out data parallelism and thread-level parallelism are much more efficient way to improve the performance, and also makes ILP on a single instruction stream play only a very minor role - GPUs and ML accelerators demonstrate this very clearly.

As for the security and the speculative execution, speculative execution is not going anywhere. Naturally, there are many researches around this like:

https://ieeexplore.ieee.org/abstract/document/9138997 https://dl.acm.org/doi/abs/10.1145/3352460.3358306

and while it will take a while before the real pipeline implements ideas like above thus we may continue to see some smaller and smaller vulnerabilities as the industry collectively plays whack-a-mole game, I don't see a world where the top of the line general-purpose microprocessor giving up on speculative execution, as the performance gain is simply too big.

I have yet to meet any academics or industry processor architects or compiler engineer who think VLIW / Itanium is the way to move forward.

This is not to say putting as much work to the compiler is a bad idea, as nVidia has demonstrated. But what they are doing is not VLIW.

xigency · on Sept 14, 2020

> I never liked Softbank owning it, but hey someone has to.

I understand what you're saying and this seems to be the prevailing pattern but I really don't understand it. ARM could easily be a standalone company. For some reason, mergers are in.

ChuckMcM · on Sept 14, 2020

I would like to understand what you know about ARM that it's board of director's didn't (doesn't) know? In my experience companies merge when they are forced to, not because they want to.

I have always assumed that their shareholders were offered so much of a premium on their shares that they chose to sell them rather than hold onto them. Clearly based on their fiscal 2015 results[1] they were a going concern.

[1] https://www.arm.com/company/news/2016/02/arm-holdings-plc-re...

xigency · on Sept 15, 2020

I don't have any specific knowledge of the ARM sale but I've seen other mergers.

Clearly, if a buyer is willing to pay a premium for shares, it's because they believe the company is worth that premium. If the shareholders were optimistic, they might consider that as a signal of the company's underlying value and choose not to sell.

Sometimes activist shareholders will pressure a company to sell, like in the case of the Whole Foods sale to Amazon. Jana Partners owned ~8% and they motivated Whole Foods to look for buyers. They dumped their shares after the sale was announced and before it was executed. Whether that was the best for the other 92% of shareholders, the company's employees, and its customers is another question entirely.

My question is really outside of that. There will always be banks, investors, and companies that are motivated to consolidate and integrate companies to be increasingly competitive to the point of being anti-competitive. What is the counter force that helps maintain moderately sized companies that are profitable on their own?

Koshkin · on Sept 14, 2020

The next Apple machine I am going to buy will be using RISC-V cores.

saagarjha · on Sept 14, 2020

I suspect you'll be waiting for quite a long time, if not forever.

dmix · on Sept 14, 2020

FWIW Apple isn't even listed on RISC-V's membership page:

https://riscv.org/membership/members/

While some like Google and Alibaba are listed as platinum founding members.

nickik · on Sept 14, 2020

I agree that Apple wont do RISC-V but you don't need to be a member to us it.

gumby · on Sept 14, 2020

Because of this transaction? I’m sure this deal will have absolutely no impact on apple’s deal with ARM.

Apple could of course Afford to invest in RISC-V (and surely has played with it internally) but they have enough control of their future under the current arrangement that it will be a long long time before they feel any need to switch — 15 years at least.

acomjean · on Sept 14, 2020

Apple and Nvidia don't seem to see eye to eye. Nvidia doesn't support macs (CUDA support was pulled a year or two ago) and apples don't include Nvidia cards.

This could change.

gumby · on Sept 14, 2020

This is because Nvidia screwed apple (from Apple’s POV) years ago with some bad GPUs to the point where Apple flat out refuses to source Nvidia parts. I don’t know the internal details of course just the public ones so can’t say if Apple is being petty or if Nvidia burned the bridge while the problem was unfolding.

Given that the CEO was the supply chain guy at the time I suspect the latter, as I’d imagine he’d be more dispassionate than Jobs.

In any case I seriously doubt nvidia could, much less would benefit from cancelling Apple’s agreement.

TomVDB · on Sept 14, 2020

> ... to the point where Apple flat out refuses to source Nvidia parts.

I've seen this argument made before.

It would be a valid point if Apple stopped using Nvidia GPUs in 2008 (they did), and then never used them again. And yet, 4 years later, they used Nvidia GPUs on the 2012 MacBook Retina 15" on which I'm typing this.

ksec · on Sept 14, 2020

And the 2012 GeForce GPU had GPU panic issues and let say higher chances of GPU failures.

And then that was that. We haven't seen Nvidia GPU again.

TomVDB · on Sept 14, 2020

I haven’t seen any of those in 8 years, but I’ll take you at your word...

That said: AMD GPUs have also had their share of issues on MacBooks.

delfinom · on Sept 15, 2020

AMD doesn't grossly lie about their thermal specifications like Nvidia consistently does. To the point engineers can't design shit properly. It's one thing to make mistakes in engineering, that can be smoothed out with cash. It's another to outright lie and cover up.

gumby · on Sept 14, 2020

Thanks for this correction!

CamperBob2 · on Sept 14, 2020

ROFL. Why not wait for the upcoming Josephson junction memristors, while you're at it?