I'm beginning to think that this might reflect a significant capability gap between MS and OpenAI as organizations. ChatGPT obviously didn't demonstrate problems at this level, and I assume they're using a similar, if not identical, model. There must be significant discrepancies in how those two teams are handling it.
Of course, OpenAI should be cooperating closely with the Bing team, but MS probably doesn't have deep expertise in the ins and outs of the model. They seem to comparatively lack the understanding needed to debug or update it. The best they can do is prompt engineering, or perhaps asking the OpenAI team nicely, since they're not in the same org. MS has significant influence over OpenAI, but Bing's director likely cannot mandate what OpenAI prioritizes.
I don't think this reflects any gap between MS's and OpenAI's capabilities. I speculate the differences come down to the following:
1. Despite its ability, ChatGPT was heavily policed and restricted - it was a closed model in a simple interface with no internet access or real-time search.
2. GPT in Bing is arguably a much better product in terms of features - and more features mean more potential issues.
3. Despite having far more features, I speculate the Bing team didn't get enough time to polish them, partly because of the push to win the race to be first out (which imo is a totally valid concern - Bing will never get another chance at a good share of search if it releases a similar product after Google).
4. I speculate that the model Bing is using differs from the one that powered ChatGPT. The difference could be a model trained on different data, a smaller model that's easier to scale, a lot of caching, etc.
TL;DR: I highly doubt it's a cultural issue. You notice the difference because Bing is offering a much more feature-rich product, didn't get enough time to refine it, and is trying to reach a bigger scale than ChatGPT while sustaining that growth without burning through its compute budget.
Bing AI is taking in much more context data, IIUC. ChatGPT was prepared with fine-tuning and an engineered prompt, and then only had to interact with the user. Bing AI, I believe, takes the text of several webpages (or at least summarised extracts of them) as additional context, which probably amounts to more input than a user would usually give it and is essentially uncontrolled. It may just be that their influence over its behaviour is reduced because their own input accounts for a smaller share of the bot's context.
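To make the dilution concrete, here's a toy sketch of the idea. All token counts are invented for illustration - nobody outside MS knows the real figures:

```python
# Hypothetical illustration of context dilution. The token counts below are
# made up; they are NOT actual ChatGPT or Bing numbers.

def context_shares(segments):
    """Return each segment's fraction of the total prompt tokens."""
    total = sum(segments.values())
    return {name: tokens / total for name, tokens in segments.items()}

# A ChatGPT-style turn: engineered prompt plus a short user message.
chat_style = context_shares({"system_prompt": 500, "user_message": 100})

# A Bing-style turn: the same prompt now shares the window with
# several retrieved webpage extracts full of uncontrolled text.
search_style = context_shares(
    {"system_prompt": 500, "user_message": 100, "webpage_extracts": 3000}
)

print(f"prompt share, chat-style:   {chat_style['system_prompt']:.0%}")    # 83%
print(f"prompt share, search-style: {search_style['system_prompt']:.0%}")  # 14%
```

Under these made-up numbers, the engineered prompt drops from ~83% of the context to ~14% once search results are stuffed in, which would plausibly weaken its steering effect on the model's behaviour.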
Also, by the time ChatGPT really broke through into public consciousness, it had already had a lot of people interacting with its web API, providing good RLHF training data.
It could be that Microsoft just rushed things after the success of ChatGPT. I can’t imagine that no one at Microsoft was aware that Sydney could derail the way it does, but management put on pressure to still launch it (even if only in beta for the moment). If OpenAI hadn’t launched ChatGPT, Microsoft might have been more cautious.