During Tesla’s “We, Robot” event last week, which TechCrunch covered late into the night, sources on the ground sent me a handful of videos of the automaker’s Optimus humanoid robots walking around the party, dancing, mixing drinks, and talking to guests. Most, if not all, of those who attended the affair are Tesla investors and […]
Business Read on TechCrunch
For a while now, companies like OpenAI and Google have been touting advanced "reasoning" capabilities as the next big step in their latest artificial intelligence models. Now, though, a new study from six Apple engineers shows that the mathematical "reasoning" displayed by advanced large language models can be extremely brittle and unreliable in the face of seemingly trivial changes to common benchmark problems. The fragility highlighted in these new results helps support previous research suggesting that LLMs' use of probabilistic pattern matching is missing the formal understanding of underlying concepts needed for truly reliable mathematical reasoning capabilities. "Current LLMs are not capable of genuine logical reasoning," the researchers hypothesize based on these results. "Instead, they attempt to replicate the reasoning steps observed in their training data." In "GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models" (currently available as a pre-print paper), the six Apple researchers start with GSM8K's standardized set of over 8,000 grade-school level mathematical word problems, which is often used as a benchmark for modern LLMs' complex reasoning capabilities. They then take the novel approach of modifying a portion of that testing set to dynamically replace certain names and numbers with new values, so a question about Sophie getting 31 building blocks for her nephew in GSM8K could become a question about Bill getting 19 building blocks for his brother in the new GSM-Symbolic evaluation.
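The name-and-number substitution described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the researchers' actual code: the template wording, the `PEOPLE` and `RELATIVES` lists, the number ranges, and the function names `make_variant` and `make_variants` are all hypothetical.

```python
import random

# Hypothetical template: placeholders mark the names and numbers to resample.
# The wording is invented for illustration and is not taken from GSM8K.
TEMPLATE = (
    "{name} buys {total} building blocks for {possessive} {relative} "
    "and gives away {given}. How many building blocks are left?"
)

# Assumed pools of replacement values (not from the paper).
PEOPLE = [("Sophie", "her"), ("Bill", "his"), ("Maya", "her"), ("Omar", "his")]
RELATIVES = ["nephew", "brother", "sister", "cousin"]


def make_variant(rng):
    """Fill the template with freshly sampled values and compute the answer."""
    name, possessive = rng.choice(PEOPLE)
    relative = rng.choice(RELATIVES)
    total = rng.randint(10, 40)
    given = rng.randint(1, total - 1)  # keep the answer strictly positive
    question = TEMPLATE.format(
        name=name,
        possessive=possessive,
        relative=relative,
        total=total,
        given=given,
    )
    return question, total - given


def make_variants(count, seed=0):
    """Generate `count` variants reproducibly from a fixed seed."""
    rng = random.Random(seed)
    return [make_variant(rng) for _ in range(count)]


if __name__ == "__main__":
    for question, answer in make_variants(3):
        print(question, "->", answer)
```

Because the underlying arithmetic is identical across variants, any difference in a model's accuracy between them would come from the surface changes rather than from the difficulty of the problem.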
Politics Read on Ars Technica
The gamification of Star Wars' idea of Force powers has led to some pretty weird moments over the years.
Entertainment Read on Gizmodo
Hackers released a collection of leaked data from Pokémon game developer Game Freak over the weekend, including personal information about employees. Game Freak, which develops the main lineup of Pokémon video games, confirmed the breach in a statement, saying (per a machine translation from Japanese) that it was the result of “unauthorized access to our servers by a third party” and dated back to August of 2024. Game Freak said the leaked personal information, which it characterizes as names and company email addresses, included around 2,600 items. As Polygon notes, however, the breach appears to include much more than employee information. Redditors and others say they’ve unearthed source code from previous games as well as unused...
Business Read on The Verge
Josh Buckley, the former CEO of Product Hunt, is aiming to raise a fourth $250 million fund for his venture capital firm, Buckley Ventures, according to a regulatory filing. Buckley’s ambitions for this fund are significantly lower than for his previous one. He sought to raise a $500 million third fund in February 2022, right […]
Business Read on TechCrunch
The Malaysian police have issued a blue notice to Interpol, asking the policing service for the whereabouts of an American couple regarding the death of Dutch model Ilana Smit in Kuala Lu
Crime and Courts Read on NL Times
A New York judge recently called out an expert witness for using Microsoft's Copilot chatbot to inaccurately estimate damages in a real estate dispute that partly depended on an accurate assessment of damages to win. In an order Thursday, judge Jonathan Schopf warned that "due to the nature of the rapid evolution of artificial intelligence and its inherent reliability issues," any use of AI should be disclosed before testimony or evidence is admitted in court. Admitting that the court "has no objective understanding as to how Copilot works," Schopf suggested that the legal system could be disrupted if experts started overly relying on chatbots en masse. His warning came after an expert witness, Charles Ranson, dubiously used Copilot to cross-check calculations in a dispute over a $485,000 rental property in the Bahamas that had been included in a trust for a deceased man's son. The court was being asked to assess if the executrix and trustee (the deceased man's sister) breached her fiduciary duties by delaying the sale of the property while admittedly using it for personal vacations.
Business Read on Ars Technica
Ward Christensen, co-inventor of the computer bulletin board system (BBS), has died at age 78 in Rolling Meadows, Illinois. He was found deceased at his home on Friday after friends requested a wellness check. Christensen, along with Randy Suess, created the first BBS in Chicago in 1978, leading to an important cultural era of digital community-building that presaged much of our online world today. In the 1980s and 1990s, BBSes introduced many home computer users to multiplayer online gaming, message boards, and online community building in an era before the Internet became widely available to people outside of science and academia. They also gave rise to the shareware gaming scene that led to companies like Epic Games today. Friends and associates remember Christensen as humble and unassuming, a quiet innovator who never sought the spotlight for his groundbreaking work. Despite creating one of the foundational technologies of the digital age, Christensen maintained a low profile throughout his life, content with his long-standing career at IBM and showing no bitterness or sense of missed opportunity as the Internet age dawned.
Science Read on Ars Technica
Apple has released the first trailer for the second season of Silo, and it looks like the season will tell us what happens to protagonist Juliette after the jaw-dropping cliffhanger at the end of season one. The show, based on a series of books by Hugh Howey, is about a community of 10,000 people living in an underground silo that’s intended to protect them from dangerous conditions aboveground. If you’ve been meaning to see the first season and haven’t yet, you probably shouldn’t watch this new trailer; as you might have guessed, it has quite a few mysteries that are fun to experience for yourself. Silo’s second season premieres on Apple TV Plus on November 15th. The season will have 10 episodes, with a new episode each week. The season...
Entertainment Read on The Verge Tech
Writer-executive producers Matthew Scott Kane and David A. Goodman on their new 1980s-set horror series, coming October 18 to Peacock.
Entertainment Read on Gizmodo