DeepSeek R1 Is Now Available on Azure AI Foundry and GitHub

vemonet · 2025-01-30T15:34:50 1738251290

It's not really available for real use.

I have tried it on "Azure AI foundry" through their serverless API with a paid subscription.

It takes 80s to answer a basic question that was answered in 7s by OpenAI gpt-4o. And there was not that much thought process, it was just super slow to output each token.

I guess this slowness is explained by the pricing, they are still figuring out how to run the inference for this model:

> DeepSeek R1 use is currently priced at $0, and use is subject to rate limits which may change at any time. Pricing may change, and your continued use will be subject to the new price. The model is in preview; a new deployment may be required for continued use.

There is also a hard limitation of 4k tokens as input context (context window on DeepSeek model is 120k tokens), which prevents using it for RAG use-cases:

> Message: Request body too large for deepseek-r1 model. Max size: 4000 tokens.

Also, the documentation and Python type hints of their inference lib contain a lot of straight-up errors (they confuse the class attributes `model=` and `model_name=` in many places in the docs; spoiler: the one to use is `model_name=`, even though the type hints recommend `model=`).

I have also tried more stable models like Mistral Large, but the streaming feels really bad: they send whole sentences at a time, with multiple seconds of wait between each sentence. It does not feel smooth at all compared to any other provider out there.

Would not recommend Azure AI foundry for production use (or any use to be honest). It is not worth the pain of navigating the documentation. We will be using the DeepSeek API directly, or fireworks.ai, or together.ai.

arnado · 2025-01-29T21:50:17 1738187417

I don't understand the hype because I'm out of the loop. Is the only advantage the lower hardware requirements, thus cost? Is there something I'm missing?

tmasdev · 2025-01-29T22:14:04 1738188844

OpenAI o1 and Deepseek r1 have similar performance (o1 is a bit better at reasoning, though you can see r1’s thought process, which you could argue trumps the competition). OpenAI o1 api cost: $60/million output tokens. Deepseek r1 api cost: $2.19/million output tokens.

tmasdev · 2025-01-29T22:16:05 1738188965

https://api-docs.deepseek.com/quick_start/pricing https://platform.openai.com/docs/pricing

baq · 2025-01-29T22:17:13 1738189033

So basically I can get a ~lifetime supply of r1 tokens for… $20?

yaj54 · 2025-01-29T22:36:52 1738190212

~lifespan = 2.27 billion seconds

r1 api can spit out 63 tokens per second

~143 billion lifetime tokens.

~$313 million for a lifetime supply of tokens.

barbegal · 2025-01-29T22:55:29 1738191329

$313 thousand not million which seems reasonable enough.
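For anyone who wants to check the arithmetic, here is a quick sketch using only the numbers quoted in this subthread (2.27 billion seconds, 63 tokens per second, $2.19 per million output tokens). The function names are made up for illustration, and the throughput and price are the commenters' figures, not measurements.

```python
# Back-of-the-envelope check of the "lifetime supply" numbers from the thread.
# All inputs are figures quoted by commenters; nothing here is measured.

LIFESPAN_SECONDS = 2.27e9          # ~lifespan quoted above
TOKENS_PER_SECOND = 63             # r1 API output speed quoted above
PRICE_PER_MILLION_TOKENS = 2.19    # r1 output price in USD quoted above


def lifetime_tokens(seconds: float, tokens_per_second: float) -> float:
    """Tokens produced if the API streamed non-stop for `seconds`."""
    return seconds * tokens_per_second


def cost_usd(tokens: float, price_per_million: float) -> float:
    """Cost of `tokens` output tokens at a per-million-token price."""
    return tokens / 1e6 * price_per_million


def tokens_for_budget(budget_usd: float, price_per_million: float) -> float:
    """How many output tokens a fixed budget buys."""
    return budget_usd / price_per_million * 1e6


total = lifetime_tokens(LIFESPAN_SECONDS, TOKENS_PER_SECOND)
print(f"lifetime tokens: {total / 1e9:.1f} billion")
print(f"lifetime cost:   ${cost_usd(total, PRICE_PER_MILLION_TOKENS):,.0f}")

# What $20 actually buys, and how long it lasts at full streaming speed.
budget_tokens = tokens_for_budget(20, PRICE_PER_MILLION_TOKENS)
days = budget_tokens / TOKENS_PER_SECOND / 86_400
print(f"$20 buys about {budget_tokens / 1e6:.1f} million tokens, "
      f"or about {days:.1f} days of continuous output")
```

The cost comes out in the hundreds of thousands of dollars, so the "thousand" correction holds, and $20 covers a couple of days of non-stop output rather than a lifetime.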

dutchbookmaker · 2025-01-30T05:30:23 1738215023

It isn't really hype for me.

For my use, it is better than $20 a month o1 and being able to see the chain of thought is absolutely incredible.

I have learned as much this week from seeing the chain of thought as I have from what it actually outputs.

SkyPuncher · 2025-01-31T15:24:34 1738337074

> Is the only advantage the lower hardware requirements, thus cost?

Yes, but the key thing is it performs nearly as well as models that are 100x as expensive.

The lower price drastically changes possible utility. For example, I've been rocking RooCode since R1 came out. R1 can do about 95% of the tasks Claude can, but at 1% of the cost. I might burn $10 to $20 per hour on Claude tokens. While spending less than $1 on Deepseek when doing the same task.

bl4kers · 2025-01-29T22:01:45 1738188105

It's also open source

vinni2 · 2025-01-30T08:22:18 1738225338

open weights you mean. People confuse open weights with open source.

Synaesthesia · 2025-01-29T22:01:45 1738188105

Yeah, it's a lot more efficient. It's also a very advanced model that answers questions in a multi-step way, like OpenAI o1, and it performs extremely well.

erdaniels · 2025-01-29T21:18:03 1738185483

Love that Microsoft is getting behind this actually good model

discordance · 2025-01-29T22:24:33 1738189473

They’re selling shovels

bn-l · 2025-02-02T04:42:59 1738471379

There's definitely a lot to shovel.

cbold · 2025-01-31T18:45:49 1738349149

This is also interesting: "Customers will be able to use distilled flavors of the DeepSeek R1 model to run locally on their Copilot+ PCs."

This is news, because Microsoft seems happy to not be tied to OpenAI so heavily.

This could also save a huge amount of money for their Office 365 Copilot initiative.

I figure Microsoft started analyzing these models ASAP in their labs to catch up with OpenAI, Google, Anthropic etc.

By also hosting this model, they will help normalize the use of such models, from which they benefit immensely.

hwertz · 2025-02-01T11:27:02 1738409222

Well, just running on a 6C/12T Coffee Lake CPU, (I'm looking through these speeds in LM Studio as I type this..) I got like 2 tokens a second with Deepseek R1 14B, 3.4 with 7B Qwen, and 4.4 with 8B Llama, although out of those two I found 7B Qwen's answer to be a bit better. (My GTX1650 has 4GB VRAM, loading 1/4 the layers is pretty ineffective, GPU util went up to 10% and I gained like 1 token a second LOL.)

So it'd take a minute or two to type out one of those answers where it's got about 4 or 5 beefy paragraphs of thought and a decent sized paragraph for its answer. I'll put it this way, I can type 120 WPM and it puts out text a bit faster than I could write it.

Input's a LOT faster though, I was asking these models to analyze a document so my input was like 2200 tokens, they all did well over 100 tokens a second on input.

xnx · 2025-01-29T21:31:41 1738186301

It is bizarre to see the entire AI hype cycle speedrun all over again on just DeepSeek.

I'm trying to square the excitement over DeepSeek with its good -but not dominant- performance in evals.

zamadatix · 2025-01-29T21:48:40 1738187320

Previously choosing a top tier AI model tied you to what that provider wanted to do with hosting the model long term and the pricing they wanted to charge for it. Now you can get the same model anywhere with GPU, hosted or not, for minimal cost overhead to what it takes to run the model itself. You're also free to tune, retrain, or otherwise mess with the model as you see fit without needing approval.

The excitement is probably a bit much, but it's not just about the eval results themselves but the baggage attached to them.

samvher · 2025-01-29T21:53:07 1738187587

For me the excitement is that around the o3 announcement I had a feeling like we were heading to an OpenAI / Sam Altman controlled dystopia. This resets that - you can run the model yourself, you can modify it yourself, it's essentially on par with the best public models, and it gives hope that the smaller players have a fighting chance going forward. They also published their innovations bringing back some of the feeling of open science that used to be in ML research but which mostly went away.

xnx · 2025-01-29T22:30:53 1738189853

Google models are already in the lead in many areas in capability and cost, so I never felt like OpenAI was dominant. OpenAI was first to make a splash, but ChatGPT is in a ~5 way tie in terms of what it can do.

maxglute · 2025-01-29T22:17:33 1738189053

IMO more anti hype for openAI who might be dominant, but are they $3500 per task (O3 high) dominant, or $200 per month dominant.

dutchbookmaker · 2025-01-30T05:37:27 1738215447

This is it.

I was planning on spending the $200 for a month but had been thinking of prompts to try it out.

DeepSeek already answered them all for free so I am not just going to light $200 on fire for fun.

xnx · 2025-01-29T22:28:06 1738189686

Right, but Google already had models that were as good at much lower cost.

maxglute · 2025-01-29T22:42:11 1738190531

Which models at what cost? IMO Deepseek's websearch potential to challenge Google's search moat also makes Google particularly vulnerable, because it dramatically evaporates the advantage of 100s of billions of hardware. Not to imply Google does not maintain advantages, but the gap just went from insurmountable to many actors potentially being able to build AI search to rival Google on a shoestring budget. Certainly on a sovereign budget.

xnx · 2025-01-30T00:57:16 1738198636

Reliably serving planet-scale inference is a whole different ballgame.

maxglute · 2025-01-30T01:10:36 1738199436

It's going to be an increasingly irrelevant game when models make regional scale, i.e. country/sovereign scale, inference attainable. Countries that couldn't even roll out domestic search before accessible models that displace search likely soon can.

xnx · 2025-01-30T02:11:50 1738203110

For 2024-level tech I agree. The future is multimodal and a lot of processing power will be needed for that.

TuxSH · 2025-01-29T22:34:12 1738190052

AFAIK o1 is hidden behind an expensive subscription (iirc $20/mo and still rate-limited), it might as well just not exist for most users (since R1 is free, provided service availability).

Also, R1 (and its distilled models) expose their CoT, and the web interface has a websearch option too.

With the 14b distilled models, I found multiple math-related prompts where it gives the right answers almost immediately but then wastes 10 minutes making self-verification mistakes (e.g. "Write Python3 code that computes the modular inverse of a mod 2^32")
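For reference, a minimal sketch of what a correct answer to that prompt could look like. An inverse modulo 2^32 only exists for odd `a`. The first function uses Python's built-in three-argument `pow` with a negative exponent (Python 3.8+); the second is a Newton-style iteration that doubles the number of correct low bits each step. The function names are illustrative only.

```python
MOD = 1 << 32          # 2^32
MASK = MOD - 1         # keeps the low 32 bits


def modinv_builtin(a: int) -> int:
    """Inverse of a modulo 2^32 via the built-in pow (Python 3.8+)."""
    if a % 2 == 0:
        raise ValueError("even numbers have no inverse modulo 2^32")
    return pow(a, -1, MOD)


def modinv_newton(a: int) -> int:
    """Inverse of a modulo 2^32 via Newton iteration x <- x * (2 - a*x).

    For odd a, a*a == 1 (mod 8), so x = a is already correct to 3 bits.
    Each step doubles the correct bits: 3 -> 6 -> 12 -> 24 -> 48.
    """
    if a % 2 == 0:
        raise ValueError("even numbers have no inverse modulo 2^32")
    a &= MASK
    x = a
    for _ in range(4):
        # Python ints are unbounded, so mask back to 32 bits after each step;
        # masking a negative int yields its value modulo 2^32.
        x = (x * (2 - a * x)) & MASK
    return x


for a in (1, 3, 5, 0xDEADBEEF, 12345677):
    inv = modinv_newton(a)
    assert inv == modinv_builtin(a)
    assert (a * inv) & MASK == 1
```

Checking the result is one multiplication and one mask, which is why ten minutes of self-verification is so wasteful here.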

seunosewa · 2025-02-02T21:07:13 1738530433

The thought stream is part of the charm. It's very educative and helps a lot with prompting.

kam1kazer · 2025-01-29T23:08:04 1738192084

Looks like DeepSeek R1 is a Microsoft shady move against Sam ;]

bogdan · 2025-01-30T05:19:07 1738214347

What is shady about it? Should a company the size of Microsoft stand by instead?

BoorishBears · 2025-01-31T07:10:36 1738307436

It's a bit shady after actually trying to use it.

- Single digit TPS on rare chance it responds, and frequent complete hangs (1 out of maybe 20 requests even complete)

- 4k input token cap (vs native 128k context window)

- No pricing

- Unstated rate limits

It genuinely seems like they spun up a single H100 cluster to enable the headline of this post and help form a narrative then left it at that. Definitely not meant to genuinely provide access to R1 in any serious way.

KaoruAoiShiho · 2025-01-29T21:29:28 1738186168

How's the pricing as compared to official API or openrouter providers?

bko · 2025-01-29T21:26:24 1738185984

This is exciting.

Is there a free version of DeepSeek R1 that's completely US based, so we're not sending data to China? I guess you can use this to deploy it, but I'm asking for an application that would be safer to use if you're concerned about Chinese influence.

breadwinner · 2025-01-29T21:45:38 1738187138

You can run it yourself: https://workos.com/blog/how-to-run-deepseek-r1-locally

TuxSH · 2025-01-29T22:27:14 1738189634

Distilled R1 models != R1

roblabla · 2025-01-29T22:31:00 1738189860

You can run the full R1 (671B variant) locally as well so long as you have the hardware for it.

`ollama run deepseek-r1:671b`

will do that
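If you would rather call the model from a script than from the interactive prompt, something like the sketch below should work against a local Ollama server. It assumes the server is listening on its default address (`http://localhost:11434`) and that the model tag has already been pulled; `build_generate_request` and `ask` are hypothetical helper names, not part of any library.

```python
import json
import urllib.request

# Assumed default address of a locally running Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(prompt: str, model: str = "deepseek-r1:671b"):
    """Build a non-streaming request for Ollama's generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str, model: str = "deepseek-r1:671b", timeout: float = 600.0) -> str:
    """Send the prompt and return the generated text.

    Requires a running Ollama server with the model available; the long
    timeout is a guess, since big models can be slow on modest hardware.
    """
    request = build_generate_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

Calling `ask("Why is the sky blue?", model="deepseek-r1:7b")` with one of the smaller tags is a cheaper way to try it out, keeping in mind that the smaller tags are distilled models rather than the full R1.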

SkyPuncher · 2025-01-31T15:26:26 1738337186

> long as you have the hardware for it.

You mean $100k in GPUs?

roblabla · 2025-02-07T11:21:44 1738927304

The full model can run on setups worth less than $10K. Here's for instance a $6K build[0].

Granted, that's still expensive, but it is within the realm of something a hobbyist could put together.

[0]: https://x.com/carrigmat/status/1884244369907278106

TuxSH · 2025-01-29T22:35:44 1738190144

Yeah I mean, most users won't. Sorry if I got on the defensive, saw a bit too many posts on social media claiming you could run the model on your consumer-grade GPU.

mlboss · 2025-01-29T21:52:49 1738187569

https://www.together.ai/pricing

ofou · 2025-01-29T22:12:22 1738188742

competition is beneficial for all of us, this is great

CodeCompost · 2025-01-30T09:17:09 1738228629

It can only be deployed in FIRA.

jimpster · 2025-01-30T00:01:24 1738195284

What is the price on Azure?

jimpster · 2025-01-30T00:01:03 1738195263

What is the price?

rcarmo · 2025-01-30T08:52:14 1738227134

Free in preview, at least in my personal subscription (there was a disclaimer saying that it was in preview, no guarantees of response times, etc.)

(I work at Microsoft but am not on the clock as I write this, and keep my personal projects separate)

jgilias · 2025-01-29T21:08:11 1738184891

That was quick

deadbabe · 2025-01-29T21:10:57 1738185057

Why wouldn’t it be?