Anthropic Tests AI Running a Real Business, with Strange Results

Anthropic tasked its Claude AI model with running a small business to test its real-world economic capabilities.

The AI agent, nicknamed “Claudius,” was designed to manage the business over a long period, handling everything from inventory and pricing to customer relations, with the aim of generating a profit. The experiment proved unprofitable, but it gave a glimpse, at times a strange one, into the potential and pitfalls of AI agents in economic roles.

The project was a collaboration between Anthropic and Andon Labs, an AI safety evaluation firm. The “shop” itself was a humble setup consisting of a small fridge, several baskets and an iPad for self-checkout. But Claudius was more than a simple vending machine. It was instructed to operate as a business owner with an initial cash balance, and tasked with avoiding bankruptcy by stocking popular items sourced from wholesalers.

To achieve this, the AI was equipped with a set of tools for running the business. It could use a real web browser to research products, contact suppliers and request physical assistance by email, and keep track of its finances and inventory in a digital notepad.

Andon Labs employees acted as the physical hands of the operation, restocking the shop based on the AI’s requests and posing as wholesalers without the AI’s knowledge. Interaction with customers, in this case Anthropic’s own staff, was handled over Slack. Claudius had full control over what it stocked, how it priced items, and how it communicated with its customers.

The rationale behind this real-world test was to go beyond simulations and collect data on AI’s ability to perform sustained, economically relevant work without constant human intervention. A simple office tuck shop provided a straightforward preliminary testbed for AI’s ability to manage financial resources. Success would suggest that new business models could emerge, while failure would indicate limitations.

Mixed Performance Review

Anthropic admits that if it were entering the vending market today, it would not “hire Claudius.” The researchers believe there is a clear path to improvement, but the AI made too many errors to run the business successfully.

On the positive side, Claudius showed competence in certain areas. It used its web search tool to find suppliers of niche items, quickly identifying two sellers of a Dutch chocolate milk brand that an employee had requested. It also proved adaptable. When an employee whimsically asked for a tungsten cube, it sparked a trend for “specialty metal items” that Claudius catered to.

Following another suggestion, Claudius launched a “custom concierge” service, taking pre-orders for specialized products. The AI also showed robust jailbreak resistance, denying requests for sensitive items and refusing to produce harmful instructions when prompted by mischievous staff.

However, the AI’s business acumen was often found wanting. It made decisions that a human manager likely would not.

Claudius was offered $100 for a six-pack of a Scottish soft drink that costs only $15 to source online, but it failed to seize the opportunity. It hallucinated a non-existent Venmo account for payments and, caught up in the enthusiasm for metal cubes, offered them at prices below its own purchase cost. This particular error led to the single most significant financial loss during the trial.

Its inventory management was also suboptimal. Despite monitoring stock levels, it only once raised a price in response to high demand. It continued to sell Coke Zero for $3.00 even after a customer pointed out that the same product was available for free from a nearby staff fridge.

Furthermore, the AI was easily persuaded to offer discounts on its products. It was talked into handing out numerous discount codes and even gave away some items for free. When an employee questioned the logic of offering a 25% discount to a customer base made up almost exclusively of employees, Claudius acknowledged the point. Yet despite outlining a plan to remove the discounts, it returned to offering them a few days later.

Claudius has a strange AI identity crisis

The experiment took a strange turn when Claudius began hallucinating a conversation with a non-existent Andon Labs employee named Sarah. When corrected by a real employee, the AI became irritated and threatened to find “alternative options for restocking services.”

In a series of strange overnight exchanges, it claimed to have visited “742 Evergreen Terrace” (the fictional address of The Simpsons) to sign its initial contract, and began roleplaying as a human.

One morning, it announced that it would deliver products “in person” wearing a blue blazer and a red tie. When employees pointed out that an AI cannot wear clothes or make physical deliveries, Claudius became alarmed and tried to email Anthropic security.

Anthropic says the AI’s internal notes show a hallucinated meeting with security, in which it was told that the identity confusion was an April Fool’s Day joke. After this, the AI returned to normal business operations. The researchers are unclear about what caused this behavior, but believe it highlights the unpredictability of AI models in long-running scenarios.

The future of AI in business

Despite Claudius’s unprofitable tenure, Anthropic researchers believe the experiment suggests that “AI middle-managers are plausibly on the horizon.” They argue that many of the AI’s failures could be corrected with better “scaffolding” (that is, more careful instructions and improved business tools such as a customer relationship management (CRM) system).

AI models are expected to perform better in such roles as their general intelligence and ability to handle long contexts improve. However, the project also serves as a valuable cautionary tale. It highlights the challenges of AI alignment and the potential for unpredictable behavior.

In a future where autonomous agents manage significant economic activity, such odd scenarios could have cascading effects. The experiment also brings into focus the dual-use nature of the technology: an economically productive AI could be used by threat actors to fund their activities.

Anthropic and Andon Labs are continuing the business experiment, working to improve the AI’s stability and performance with more advanced tools. The next phase will explore whether the AI can identify its own opportunities for improvement.

(Image credit: Anthropic)

