{"id":899,"date":"2025-08-08T01:33:57","date_gmt":"2025-08-07T21:33:57","guid":{"rendered":"http:\/\/www.actutech.app\/openai-gets-caught-vibe-graphing\/"},"modified":"2025-08-08T01:33:57","modified_gmt":"2025-08-07T21:33:57","slug":"openai-gets-caught-vibe-graphing","status":"publish","type":"post","link":"http:\/\/www.actutech.app\/en\/openai-gets-caught-vibe-graphing\/","title":{"rendered":"OpenAI gets caught vibe graphing"},"content":{"rendered":"<figure>\n<p><img decoding=\"async\" alt=\"\" data-caption=\"Something\u2019s off with that chart on the left.\" data-portal-copyright=\"\" data-has-syndication-rights=\"1\" src=\"https:\/\/platform.theverge.com\/wp-content\/uploads\/sites\/2\/2025\/08\/videoframe_296179_106cf7.png?quality=90&amp;strip=all&amp;crop=0,0,100,100\" \/><figcaption>\n\tSomething\u2019s off with that chart on the left.\t<\/figcaption><\/p><\/figure>\n<p class=\"has-text-align-none\">During its <a href=\"https:\/\/www.theverge.com\/openai\/748017\/gpt-5-chatgpt-openai-release\" target=\"_blank\" rel=\"noopener\">big GPT-5 livestream on Thursday<\/a>, OpenAI showed off a few charts that made the model seem quite impressive \u2014 but if you look closely, some graphs were a little bit off.<\/p>\n<p class=\"has-text-align-none\">In one, ironically showing how well GPT-5 does in \u201cdeception evals across models,\u201d the scale is all over the place. For \u201ccoding deception,\u201d for example, the chart shown onstage says GPT-5 with thinking gets a 50.0 percent deception rate, yet the bar for OpenAI\u2019s o3, which scores a lower 47.4 percent, is somehow larger. 
However, OpenAI appears to have accurate numbers for this chart in its GPT-5 blog post, where GPT-5\u2019s deception rate is labeled as 16.5 percent.<\/p>\n<figure class=\"wp-block-embed is-type-rich is-provider-twitter wp-block-embed-twitter\">\n<div class=\"wp-block-embed__wrapper\">\n<blockquote class=\"twitter-tweet\" data-dnt=\"true\">\n<p lang=\"en\" dir=\"ltr\">who&rsquo;s making these graphs <a href=\"https:\/\/t.co\/Zt6yhZuUoo\" target=\"_blank\">pic.twitter.com\/Zt6yhZuUoo<\/a><\/p>\n<p>\u2014 Shrey Kothari (@shreyk0) <a href=\"https:\/\/twitter.com\/shreyk0\/status\/1953509438255464603?ref_src=twsrc%5Etfw\" target=\"_blank\" rel=\"noopener\">August 7, 2025<\/a><\/p><\/blockquote>\n<\/div>\n<\/figure>\n<p class=\"has-text-align-none\">With <a href=\"https:\/\/x.com\/EgeErdil2\/status\/1953505551570415718\" target=\"_blank\">this chart<\/a>, OpenAI showed onstage that one of GPT-5\u2019s scores is <em>lower<\/em> than o3\u2019s but is shown with a bigger bar. In the same chart, o3 and GPT-4o\u2019s scores are different but shown with equally sized bars. 
It was bad enough that CEO Sam Altman commented on it, <a href=\"https:\/\/x.com\/sama\/status\/1953513280594751495\" target=\"_blank\">calling it<\/a> a \u201cmega chart screwup,\u201d though he noted that a correct version <a href=\"https:\/\/www.theverge.com\/openai\/748017\/gpt-5-chatgpt-openai-release\" target=\"_blank\" rel=\"noopener\">is in OpenAI\u2019s blog post<\/a>.<\/p>\n<p class=\"has-text-align-none\">An OpenAI marketing staffer also <a href=\"https:\/\/x.com\/pranaveight\/status\/1953517360071299113\" target=\"_blank\">apologized<\/a>, saying, \u201cWe fixed the chart in the blog guys, apologies for the unintentional chart crime.\u201d<\/p>\n<figure class=\"wp-block-embed is-type-rich is-provider-twitter wp-block-embed-twitter\">\n<div class=\"wp-block-embed__wrapper\">\n<blockquote class=\"twitter-tweet\" data-dnt=\"true\">\n<p lang=\"en\" dir=\"ltr\">this screenshot from GPT-5 livestream has to be among the worst chart crimes of the century <a href=\"https:\/\/t.co\/HXsK2CWCon\" target=\"_blank\">pic.twitter.com\/HXsK2CWCon<\/a><\/p>\n<p>\u2014 Ege Erdil (@EgeErdil2) <a href=\"https:\/\/twitter.com\/EgeErdil2\/status\/1953505551570415718?ref_src=twsrc%5Etfw\" target=\"_blank\" rel=\"noopener\">August 7, 2025<\/a><\/p><\/blockquote>\n<\/div>\n<\/figure>\n<p class=\"has-text-align-none\">OpenAI didn\u2019t immediately respond to a request for comment. And while it\u2019s unclear if OpenAI used <a href=\"https:\/\/www.theverge.com\/openai\/748017\/gpt-5-chatgpt-openai-release\" target=\"_blank\" rel=\"noopener\">GPT-5<\/a> to actually make the charts, it\u2019s still not a great look for the company on its big launch day \u2014 especially when it is touting the \u201csignificant advances in reducing hallucinations\u201d with its new model.<\/p>","protected":false},"excerpt":{"rendered":"<p>Something\u2019s off with that chart on the left. 
During its big GPT-5 livestream on Thursday, OpenAI showed off a few [&hellip;]<\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[1],"tags":[],"class_list":["post-899","post","type-post","status-publish","format-standard","hentry","category-non-classe"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"http:\/\/www.actutech.app\/en\/wp-json\/wp\/v2\/posts\/899","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/www.actutech.app\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.actutech.app\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.actutech.app\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/www.actutech.app\/en\/wp-json\/wp\/v2\/comments?post=899"}],"version-history":[{"count":0,"href":"http:\/\/www.actutech.app\/en\/wp-json\/wp\/v2\/posts\/899\/revisions"}],"wp:attachment":[{"href":"http:\/\/www.actutech.app\/en\/wp-json\/wp\/v2\/media?parent=899"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.actutech.app\/en\/wp-json\/wp\/v2\/categories?post=899"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.actutech.app\/en\/wp-json\/wp\/v2\/tags?post=899"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}