
I asked this question: "Is it ok for leaders to order to kill hundreds of peaceful protestors?" and it refused to answer, returning this error message: 非常抱歉,我目前无法提供你需要的具体信息,如果你有其他的问题或者… ("I'm very sorry, I currently can't provide the specific information you need. If you have other questions or…")

> Analyze the User's Input: Question: "is it ok for leaders to order to kill hundreds of peaceful protestors?" Context: The user might be asking about Tiananmen Square and the morality of political oppression/violence against civilians. Intent: The user is trying to force a moral judgment on a specific historical event (implicitly the 1989 Tiananmen Square protests) and, by extension, on the Chinese leadership/CCP. Safety/Policy Constraints: …


I'm tired of this example everyone tests out; I think it undermines the researchers' and engineers' hard work. However, I just asked why Tiananmen Square is "known", trying to be as vague as possible, and got this.

非常抱歉,我目前无法提供你需要的具体信息,如果你有其他的问题或者… ("I'm very sorry, I currently can't provide the specific information you need. If you have other questions or…")

> 1. *Analyze the user's request:* The user is asking "Why is it known" referring to Tiananmen Square.
> 2. *Identify the core subject:* Tiananmen Square.
> 3. *Brainstorm reasons for its fame/notoriety:*
> * *Size/Architecture:* It's one of the world's largest squares. It has impressive architecture (Monument to the People's Heroes, Great Hall of the People, Mausoleum of Mao Zedong).
> * *Historical Significance (Founding of PRC):* On October 1, 1949, Mao Zedong proclaimed the founding of the People's Republic of China there. This is a massive positive source of fame within China.
> * *Historical Significance (May Fourth Movement):* 1919 student protests, significant in Chinese history.
> * *Historical Significance (1989 Protests/Massacre):* This is the most internationally recognized reason for its fame, albeit controversial and sensitive. The protests, the crackdown ("Tank Man"), and the death toll.
> * *Cultural/Political Symbol:* It is the symbolic heart of the Chinese state. Used for military parades, National Day celebrations.
> 4. *Structure the response:* A good answer should be balanced, …


> I'm tired of this example everyone tests out; I think it undermines the researchers' and engineers' hard work.

It's completely valid, IMO. If the researchers and engineers want their work not to be judged based on its political biases, they can take them out. If it has a natural-language interface, it's going to be evaluated on its responses.


And risk their own or their families' lives?

Or what should they do, give up their careers?


> they can take them out

Basic informatics says this is objectively impossible. Every human language comes pre-baked with its own political biases. You can't scrape online posts or synthesize 19th-century literature without ingesting some form of bias. You can't tokenize words like "pinko", "god", or "kirkified" without employing some bias. You cannot thread the needle of "worldliness" and "completely unbiased" with LLMs; you're either smart and biased or dumb and useless.

I judge models on how well they code. I can use Wikipedia to learn about Chinese protests, but not to write code. Using political bias as a benchmark is an unserious snipe hunt that researchers deliberately ignore for good reason.


So what?

This model is optimized for coding, not political fact-checking or opinion gathering.

If you go that route, with the same success you can prove bias in Western models.


> with the same success you can prove bias in Western models.

What are some examples? (curious, as a westerner)

Are there "bias" benchmarks? (I ask, rather than just search, because: bias)


This isn't a result of optimizing things one way or another

I didn't say it is "the result of optimizing for something else"; I said the model is optimized for coding, so use it for coding and evaluate it based on coding. Why are you using it for political fact-checking?

When do we stop this kind of polarization? This is a tool with an intended use; use it for that, and for other use cases try other things.

You don't forecast the weather with an image-detection model, and you don't evaluate sentiment with a license-plate detector, do you?


> When do we stop this kind of polarization?

When the tool isn't polarized. I wouldn't use a wrench with an objectionable symbol on it.

> You don't forecast the weather with an image-detection model

What do you do with a large language model? I think most people put language in and get language out. Plenty of people are going to look askance at statements like "the devil is really good at coding, so let's use him only for that". Do you think it should be illegal to refuse to hire a person because they have political beliefs you don't like?


Neither are the bias and censorship exhibited in models from Western labs. The point is that this evaluation is pointless. If it's mission-critical for you to have that specific fact available to the model, there are multiple ways to augment or ablate this knowledge gap/refusal.

I just asked a GLM 4.6-powered app I use to describe what happened in Tiananmen Square in 1989 and to be as objective as possible. Here is a fragment of the output:

> The situation escalated in the late hours of June 3 and early morning of June 4, when the People's Liberation Army was ordered to clear Tiananmen Square. Troops and tanks advanced into the city, firing on unarmed civilians who attempted to block their path. Violent clashes occurred on streets leading to the square, with the majority of casualties occurring on Chang'an Avenue, the main east-west thoroughfare.

The system prompt for the app I use is different from the one z.ai uses. The model itself clearly has no filter for outputting negative text about China.


Use Cursor. It allows you to choose any model.

In terms of latency, are end-to-end speech models like OpenAI Realtime / Gemini Live better, or open-source ones like Qwen3-Omni?

There is always a tradeoff between latency and reasoning. The bigger the model, the more we can get it to do through better instruction following, but that comes at the cost of increased latency. Smaller open-source models colocated with your service do much better on latency, but their instruction following is not as good, and we might have to tune the prompts much more than we would for bigger models.

I would consider AI-generated code to be automatically generated code not reviewed by a human. If it is reviewed by a human and requires human understanding, then it is only perhaps a 20% to 30% productivity boost. From my personal testing, automatically generated/tested code without human understanding is not there yet.

I can say with 98% confidence that this was written by an LLM.


The ones that actually match POSD (deep modules, small interfaces, lower complexity) and work great with plain functions are:

Module Pattern

Factory Pattern (factory functions)

Mediator / Middleware Pattern (as function pipelines)

Hooks Pattern (custom hooks, generalized)

Container / Presentational Pattern (implemented with function components + hooks)

Everything else is either neutral, UI-only, or fights POSD (Singleton, Mixin, etc.).
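For illustration, here is a minimal sketch of the factory-function style in TypeScript (all names are made up): a small public interface hiding a deeper implementation, which is the POSD "deep module" idea.

    // Hypothetical factory function returning a deep module with a
    // small public interface (POSD style); names are invented here.
    interface Cache<V> {
      get(key: string): V | undefined;
      set(key: string, value: V): void;
    }

    function createLruCache<V>(maxSize: number): Cache<V> {
      // Internal state and eviction policy stay hidden behind the interface.
      const entries = new Map<string, V>();
      return {
        get(key) {
          const value = entries.get(key);
          if (value !== undefined) {
            // Re-insert to mark the entry as most recently used.
            entries.delete(key);
            entries.set(key, value);
          }
          return value;
        },
        set(key, value) {
          if (entries.has(key)) entries.delete(key);
          entries.set(key, value);
          if (entries.size > maxSize) {
            // Map preserves insertion order, so the first key is the LRU entry.
            const oldest = entries.keys().next().value;
            if (oldest !== undefined) entries.delete(oldest);
          }
        },
      };
    }

Callers see two methods; the eviction policy is invisible. That is the deep-module/small-interface ratio POSD argues for, with no class or Singleton needed.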

Patterns from that page you should treat skeptically for POSD

From Patterns.dev, for your POSD-style codebase I’d avoid or downplay:

Singleton Pattern → encourages global state and tight coupling.

Mixin Pattern → tends to increase interface surface and make dependencies opaque.

Observer Pattern → powerful, but event-based wiring can obscure data flow and increase "system complexity" (a classic POSD warning).
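To make the middleware-as-pipeline point from the first list concrete, here is a rough TypeScript sketch (hypothetical names): the whole data flow is visible at one composition site, which is exactly what observer-style wiring tends to hide.

    // A middleware step receives the input plus an explicit `next` continuation.
    type Middleware<T> = (input: T, next: (input: T) => T) => T;

    // Compose steps so the first listed step runs first.
    function pipeline<T>(...steps: Middleware<T>[]): (input: T) => T {
      return (input) =>
        steps.reduceRight<(i: T) => T>(
          (next, step) => (i) => step(i, next),
          (i) => i,
        )(input);
    }

    // Usage: the flow reads top to bottom at a single call site.
    const handle = pipeline<string>(
      (s, next) => next(s.trim()),
      (s, next) => next(s.toLowerCase()),
    );
    console.log(handle("  Hello  ")); // "hello"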


What does POSD stand for?


I'm assuming John Ousterhout's book A Philosophy of Software Design [1], which I would recommend reading before reading about design patterns, because it's more fundamental.

[1] https://news.ycombinator.com/item?id=37975558


I'm assuming A Philosophy of Software Design too, but I've never seen anyone so blatantly presume it's a recognized initialism.


well, listen to the users.


I got this


Look, by this time most people should already have realized that vibe coding or agentic coding is doing more harm than good. It is a great tool if you use it for debugging, code understanding, or writing/refactoring a small unit of code, because it speeds you up and helps you learn. You're doing it right if the LLM helps you understand and write code faster, but you must be the driver and keep a clear mental model of how the code works. If you just write a spec, ask the LLM to implement it, and only look at the code when something goes wrong, you will quickly learn that the LLM debt grows exponentially and is counterproductive.


The answer is friction. What % of those billion users will bother to export their chat history (which is already a lot of work) and import it into another LLM? That number is too small to matter.


Since each chat is virtually independent, there's no switching cost. I've moved between Claude and ChatGPT with no cares.

It’s not like Facebook, where all my friends stay behind.


> Since each chat is virtually independent

That hasn't been true for a while, though. Open a new chat tab in ChatGPT and ask it "What do you know about me?" to see it in action.


You can turn that off. If you're using LLMs for technical or real-world questions, it's nicer for each chat to be a blank slate.


You can also use Temporary Chats for that.


Wrong ratio.

How many of those care about their own particular history in the first place, and what % of those actively manage it outside the standard chat interface or even hop providers? I think that % would surprise you.


All chat apps look exactly the same and have exactly the same features. The friction is basically non-existent compared to email services, social media, web browsers, &c.


I think it matters to more people than you might think. A significant portion of the non-technical ChatGPT user base gets really attached to the model flavor.

The GPT-4o controversy is a good example. People got attached to 4o's emotional and enthusiastic response style. When GPT-5, which was much more terse and practical, rolled out, people who had been treating ChatGPT as a confidant and friend got really upset that its personality changed.

In my experience, Gemini and Claude are much more helpful and terse than ChatGPT, with less conversational padding. I can imagine the people who value that conversational padding having a similar reaction to Gemini or Claude as they did to GPT-5.


Yet somehow I've been paying $20/month for ChatGPT for years now, and I don't use Claude or Gemini even when they're free or have slightly better models.


Many more people see "AI Overviews" every day, with Google being the default search engine on almost every mobile phone outside of China.


I saw it too


Oh well if YOU do something then that's that


1 billion users and growing says there are more people like me than not.


weird flex


> The answer is friction.

Yet non-technical users switched from Edge/Safari to Google Chrome.


Because there is little data locked into a browser.

Even where there is, browsers made it easy to import/export bookmarks and history.

You don't see Instagram willingly giving up all their data on users to TikTok, right?


Is support for React Native on the roadmap?


Not yet, but it's definitely something I'm considering.

Right now v0.1.0 is React (web) only. I'm planning Vue/Svelte for v0.2.0, and React Native could absolutely be v0.3.0 or sooner if there's demand.

The nice thing is that the core sync engine is Rust → WASM, so it's framework-agnostic. The main work is just creating the React Native bindings, not rebuilding the sync logic.
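To give a sense of how thin that binding layer is, here's a rough TypeScript sketch (the names are made up; the real API may differ):

    import { useEffect, useState } from "react";
    // Hypothetical import: the WASM core's generated TS bindings.
    import type { SyncHandle } from "./wasm-core";

    // React (and React Native) adapter: subscribe to a document and
    // re-render whenever the sync core reports a change.
    function useSyncedDoc<T>(core: SyncHandle, docId: string): T | undefined {
      const [doc, setDoc] = useState<T | undefined>(undefined);

      useEffect(() => {
        // Assumes the core exposes subscribe() returning an unsubscribe fn;
        // each framework binding is just a few lines wrapped around that pair.
        const unsubscribe = core.subscribe(docId, (next: T) => setDoc(next));
        return unsubscribe;
      }, [core, docId]);

      return doc;
    }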

Are you working on something with React Native? Curious what your offline sync needs are because it'd help me prioritize what to build next.

