Manus’ ex-lead dropped function calling — what it changes for e-commerce

Last updated: 3 April 2026 · Reading time: 11 min

Quick summary: The Manus backend lead replaced function calling with code eval after 2 years of developing AI agents. His post on r/LocalLLaMA gathered 1,927 upvotes and 409 comments. This technical shift is transforming the way agents interact with e-commerce catalogs. Here’s what that means in practice for your stack.

Why did an AI agent expert abandon function calling?

On March 12, 2026, the backend lead of Manus — one of the most followed AI agent projects — published a post on r/LocalLLaMA that shook the technical community. After 2 years of building agents with function calling, he explained why he abandoned the approach. The result: 1,927 upvotes and 409 comments within a few days.

To understand the scope of this shift, you need to understand what function calling is. It is the standard mechanism by which an LLM (large language model) interacts with the outside world. The model has a catalog of predefined functions — for example search_product(query, category, price_max) — and generates a structured call with the right parameters.
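
Concretely, such a declaration typically looks like the following sketch, written in the JSON-schema style used by the major providers (exact field names vary by vendor; this definition is illustrative):

# Hypothetical declaration of search_product in the JSON-schema style
# accepted by most LLM providers (field names vary by vendor)
search_product_tool = {
    "name": "search_product",
    "description": "Search the product catalog",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {"type": "string"},
            "category": {"type": "string"},
            "price_max": {"type": "number"},
        },
        "required": ["query"],
    },
}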

This pattern has dominated the ecosystem since 2023. OpenAI, Anthropic, Google: every LLM provider offers function calling as the main method for building agents. Frameworks like LangChain, CrewAI and AutoGen are built around this paradigm.

The limits identified by the lead of Manus

The developer’s observation is precise: after thousands of iterations in production, function calling presents structural frictions:

  • Rigidity of the schema: each new capability requires defining a new function, with its typed parameters, documentation and tests. The agent is limited to the catalog of functions the developer anticipated
  • Combinatorial explosion: when an e-commerce agent must handle filtering, sorting, comparison, discount calculation, stock checking and recommendation, the number of necessary functions grows exponentially
  • Cumulative latency: each function call represents a round trip between the LLM and the host system. A complex journey may require 8 to 12 sequential calls
  • Context loss: between two calls, the LLM must reconstruct the conversation context. Intermediate results are serialized then deserialized, losing information at each step

The key point of his analysis: function calling turns the LLM into a dispatcher. It chooses which function to call and with what parameters, but it cannot compose original actions. It is a telephone switchboard operator in a world that needs a developer.

1,927 upvotes on r/LocalLLaMA for the Manus backend lead’s post (March 12, 2026)

How does the alternative work: code eval?

The alternative proposed by the Manus lead is radical in its simplicity: instead of defining a catalog of functions, you let the LLM generate code that is executed directly. The model writes Python (or JavaScript), the code runs in a controlled environment, and the result is returned to the LLM.

A concrete example: product search

With classic function calling, an e-commerce agent must call a series of functions:

// Classic function calling: 3 sequential calls
search_products(query="waterproof coat", category="men")
filter_results(min_rating=4.0, max_price=150)
sort_results(field="price", order="asc")

With code eval, the LLM writes directly:

# Code eval: one execution
results = db.query("""
    SELECT * FROM products
    WHERE category = 'men'
      AND description LIKE '%waterproof%'
      AND rating >= 4.0
      AND price <= 150
    ORDER BY price ASC
""")

A single block of code replaces three function calls. The LLM freely composes its logic instead of dividing it into atomic calls.

Community validation

The reaction on r/LocalLLaMA was immediate and massive. The most upvoted comment (211 upvotes) reported a convergent experience:

“I experimented with Python code eval as my only tool… it worked remarkably well.” r/LocalLLaMA · 211 upvotes · Top comment

Another very popular comment went further:

“JIT natural language to sed awk regex was the real superpower all along.” r/LocalLLaMA · Top comment

The underlying idea: LLMs are already excellent code generators. Asking them to generate executable code is more natural than asking them to choose from a menu of predefined functions. It plays to their primary strength: generating structured text.

Why it works: The theory behind code eval

Function calling forces the LLM to operate in a discrete action space: it chooses among N functions. Code eval opens a continuous action space: it freely composes instructions. The difference is fundamental. An agent with 50 predefined functions can perform 50 actions. An agent with a Python interpreter can perform an unlimited number of combinations.

In practice, this means a code eval e-commerce agent can:

  • Combine filters that the developer did not anticipate: “All coats with a rating-to-price ratio above the category median” (see the sketch after this list)
  • Generate dynamic analyses: calculate a personalized relevance score for each product in real time
  • Adapt its strategy to the context: write a different workflow depending on the complexity of the user request
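
To make the first bullet concrete, here is a minimal sketch of the median-ratio filter, assuming a pandas environment; the file and column names are assumptions:

import pandas as pd

# Illustrative catalog load; the file and column names are assumptions
products = pd.read_json("catalog.json")
coats = products[products["category"] == "coats"]

# Keep the coats whose rating-to-price ratio beats the category median
ratio = coats["rating"] / coats["price"]
above_median = coats[ratio > ratio.median()]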

What are the benefits for e-commerce agents?

For an e-retailer, the transition from function calling to code eval has direct implications for the quality of the agent experience and the speed of development.

1. Radically finer product recommendations

A function calling agent recommends a product because it matches predefined criteria (category, price, rating). A code eval agent can compose sophisticated recommendation logic in real time: weight the rating by review recency, cross-reference availability with delivery time, and integrate the customer’s browsing history — all in a single block of code generated on the fly.
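
Here is a hedged sketch of what such a generated block might look like; the data sources, column names and thresholds are assumptions, not Manus code:

import pandas as pd

products = pd.read_json("catalog.json")
reviews = pd.read_json("reviews.json")

# Weight each review by recency: newer reviews count more
reviews["age_days"] = (pd.Timestamp.now() - pd.to_datetime(reviews["date"])).dt.days
reviews["weight"] = 1 / (1 + reviews["age_days"] / 90)

# Recency-weighted average rating per product
weighted = reviews.groupby("product_id").apply(
    lambda g: (g["rating"] * g["weight"]).sum() / g["weight"].sum()
)

# Cross availability with delivery time, then rank
products["score"] = products["product_id"].map(weighted)
shortlist = products[products["in_stock"] & (products["delivery_days"] <= 3)]
top5 = shortlist.sort_values("score", ascending=False).head(5)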

2. A massive reduction in latency

Instead of 8 to 12 sequential function calls for a complete recommendation journey, the agent generates a single block of code that executes the entire logic. The latency perceived by the user drops significantly. For an e-commerce chatbot, that is the difference between a 3-second response and an 800-millisecond response.

3. Reduced maintenance cost

Function calling requires maintaining a catalog of documented, tested and versioned functions. Every new feature — a new filter, a new sort criterion, a new promo rule — involves server-side code. With code eval, the LLM adapts to the available context. You only need to expose the data and a basic API; it composes the rest.

4. Instant adaptation to unforeseen demands

The typical case: a customer asks “Compare these 3 products taking into account free returns and the eco-responsibility of the materials”. With function calling, this request fails if no compare_products_with_return_policy_and_sustainability function exists. With code eval, the agent writes comparison code on the fly, querying the available structured data.
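
A minimal sketch of the comparison code the agent might generate; the field names and the ranking heuristic are assumptions:

# Hypothetical on-the-fly comparison of a shortlist of products
def compare(products: list[dict]) -> list[dict]:
    rows = []
    for p in products:
        rows.append({
            "name": p["name"],
            "price": p["price"],
            "free_returns": p.get("return_policy") == "free",
            "eco_material": p.get("material") in {"organic cotton", "recycled polyester"},
        })
    # Free returns first, then eco-friendly materials, then ascending price
    return sorted(rows, key=lambda r: (-r["free_returns"], -r["eco_material"], r["price"]))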

409 comments on the r/LocalLLaMA post — proof of the intensity of the debate in the AI community

What are the risks of this approach?

The debate on r/LocalLLaMA was lively, and the community identified real risks. One of the most shared comments was direct:

“The OP’s post is a psyop to give your LLM agent full rights to your terminal.” r/LocalLLaMA · Warning comment

The tone is provocative, but the substance is relevant. Letting an LLM generate and run code in production raises serious questions.

Security: expanded attack surface

With function calling, the agent calls functions that you have written, tested and validated. The scope of action is defined in advance. With code eval, the LLM can generate code you have never seen. The risks are concrete:

  • Malicious code injection: a user who manipulates the prompt can cause the LLM to generate destructive code
  • Unauthorized access: generated code could attempt to access system resources (files, network, databases) outside the perimeter
  • Data exfiltration: the code could send sensitive data (supplier prices, margins, customer data) to an external endpoint

The answer: sandboxing

The technical solution is sandboxing: run the generated code in an isolated container with strict permissions. Best practices identified by the community (a minimal sketch follows the list):

  • Ephemeral container: each execution runs in a container destroyed after use
  • Minimal permissions: read-only access to product data, no outgoing network access
  • Strict timeout: 5 seconds of execution maximum to block infinite loops
  • Output validation: the result of the code is checked (type, size, format) before being sent back to the LLM
  • Import allowlist: only authorized modules (pandas, json, math) can be imported
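
Here is a minimal sketch of these rules, using a throwaway subprocess as a stand-in for an ephemeral container; the helper names and limits are illustrative:

import json
import subprocess

ALLOWED_IMPORTS = {"pandas", "json", "math"}   # import allowlist
TIMEOUT_S = 5                                  # strict timeout
MAX_OUTPUT_BYTES = 100_000                     # output size cap

def run_generated_code(code: str) -> dict:
    # Reject any import outside the allowlist before execution
    for line in code.splitlines():
        stripped = line.strip()
        if stripped.startswith(("import ", "from ")):
            module = stripped.split()[1].split(".")[0]
            if module not in ALLOWED_IMPORTS:
                raise ValueError(f"import '{module}' not allowed")
    # Run in a separate interpreter ("-I" is isolated mode); a real
    # deployment would use a network-less, ephemeral container instead
    proc = subprocess.run(
        ["python3", "-I", "-c", code],
        capture_output=True, text=True, timeout=TIMEOUT_S,
    )
    if proc.returncode != 0:
        raise RuntimeError(proc.stderr[:500])
    # Validate the output: bounded size, parseable JSON
    if len(proc.stdout) > MAX_OUTPUT_BYTES:
        raise ValueError("output too large")
    return json.loads(proc.stdout)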

Reliability: the LLM can be wrong

An LLM can generate code that looks correct but contains subtle errors. A discount calculation with incorrect rounding, a filter that excludes valid products, a query that ignores pre-order products. With function calling, functions are tested individually. With code eval, each execution is potentially unique.

The mitigation: automatic validation tests on the code output. If a product recommendation comes back empty while the catalog contains 2,000 references, the system triggers a fallback to deterministic logic.
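
A minimal sketch of such a validation gate; the function and variable names are illustrative:

def validated(results: list, catalog_size: int, fallback: list) -> list:
    # An empty recommendation against a non-empty catalog is a red flag:
    # return the deterministic fallback (e.g. best-sellers) instead
    if catalog_size > 0 and not results:
        return fallback
    return results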

Observability: debugging the unpredictable

One advantage of function calling: each call is traced, logged, reproducible. With code eval, the generated code differs on every run. Debugging a problem reported by a customer means looking for code that no longer exists. The solution: systematically log the generated code, its inputs, its outputs and its execution time.
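
A minimal logging sketch, assuming an append-only JSONL file; the field names are assumptions:

import json
import time
import uuid

def log_execution(code: str, inputs: dict, output, duration_ms: float) -> None:
    record = {
        "run_id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "code": code,                        # the exact generated code
        "inputs": inputs,
        "output_preview": str(output)[:1000],
        "duration_ms": round(duration_ms),
    }
    # Append-only log: every run stays reproducible after the fact
    with open("agent_runs.jsonl", "a") as f:
        f.write(json.dumps(record) + "\n")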

On r/ArtificialIntelligence, a post with 532 upvotes and 419 comments warned about the risks of autonomous AI agents in general: “AI agents today are far more dangerous than you think.” This message of caution applies directly to code eval in e-commerce production.

What does “building apps for a changing world” mean?

Alongside the function calling debate, another Reddit thread struck a chord with the e-commerce community. On r/vibecoding, a post with 234 upvotes and 521 comments posed the question head-on: “We’re building apps for a world that’s about to stop using them.”

“We’re building apps for a world that’s about to stop using them.” r/vibecoding · 234 upvotes · 521 comments

The idea: classic e-commerce interfaces — category pages, faceted filters, product pages, the cart — are designed for humans who browse. When an AI agent sits between the consumer and the catalog, these interfaces become secondary. The agent interacts directly with the data, not the layout.

The invisible interface: when the agent replaces the browser

A code eval AI agent interacts with your catalog in a radically different way from a human visitor:

  • It queries your REST API or your structured data feed (Schema.org, Google Merchant Feed)
  • It writes its own filtering and aggregation queries — it composes the business logic instead of being limited to the filters you planned
  • It synthesizes reviews, specifications and conditions of sale into a single response for the user

This means that investment in product page design, however important for human traffic, must be supplemented by investment in the data layer. Both are necessary. Retailers that expose rich structured data, well-documented APIs and comprehensive product feeds are the ones agents recommend.

The parallel with SEO: from visible to machine-usable

History repeats itself. In 2010, e-retailers who thought “my site is beautiful, therefore it will rank well” were overtaken by those who understood technical SEO: semantic markup, internal linking, performance. In 2026, the same shift is occurring: sites structured for agents gain the advantage over sites designed only for human eyes.

The r/vibecoding post generated 521 comments — even more than the Manus thread. The question touches a nerve: a significant part of the technical community realizes that software architecture will be redesigned around agent interaction, and that this shift is already underway.

How can you apply these lessons to your e-commerce stack?

The shift from function calling to code eval carries direct lessons for any e-retailer who integrates, or plans to integrate, AI agents into their customer journey. Here are 5 concrete actions.

Action 01
Expose full structured data

A code eval agent composes its own queries. The more complete and structured your data (Schema.org Product, Offer, AggregateRating), the more relevant the recommendations the agent can build. Each missing attribute is a comparison criterion lost to a competitor who provides it.
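
As an illustration, here is a Schema.org Product payload expressed as a Python dict ready to serialize as JSON-LD; the attribute values are invented:

import json

product_jsonld = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Men's waterproof coat",
    "material": "Recycled polyester",
    "offers": {
        "@type": "Offer",
        "price": "129.00",
        "priceCurrency": "EUR",
        "availability": "https://schema.org/InStock",
    },
    "aggregateRating": {
        "@type": "AggregateRating",
        "ratingValue": "4.4",
        "reviewCount": "212",
    },
}
print(json.dumps(product_jsonld, indent=2))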

Action 02
Document your API for agents

If you have a REST API (WooCommerce, Shopify, PrestaShop), document it clearly in your llms.txt file. Specify endpoints, response formats and filter parameters. An agent that understands your API can write optimal code to query your catalog.

Action 03
Secure agent interactions

Adopt a hybrid approach: function calling for critical operations (payment, order modification, stock management) and code eval for exploratory operations (search, comparison, recommendation). Financial transactions stay within a controlled perimeter.

Action 04
Enrich the product feed

Code eval agents use data feeds (Google Merchant Center, Meta Product Feed) as a primary source. Enrich your feeds with every available attribute: material, certifications, delivery time, return policy, standardized sizing. A comprehensive feed is the best investment for agentic visibility.

Action 05
Monitor agent interactions

AI agents leave traces: specific user agents (GPTBot, ClaudeBot, PerplexityBot), characteristic request patterns, rapid access sequences. Set up dedicated monitoring to understand how agents use your catalog: which pages, which data, how often.
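
A minimal sketch of such a filter over raw access-log lines; the signature list is deliberately short and non-exhaustive:

from collections import Counter

AGENT_SIGNATURES = ("GPTBot", "ClaudeBot", "PerplexityBot")

def agent_hits(log_lines: list[str]) -> Counter:
    # Count hits per known AI-agent user-agent substring
    hits = Counter()
    for line in log_lines:
        for sig in AGENT_SIGNATURES:
            if sig in line:
                hits[sig] += 1
    return hits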

The hybrid approach: the recommended strategy

The Reddit debate does not impose a binary choice. The reality in e-commerce production is a spectrum (a routing sketch follows the list):

  • Critical operations (payment, stock, customer data): function calling with strict validation
  • Exploratory operations (search, recommendation, comparison): code eval with a sandbox
  • Analytical operations (reporting, trends, segmentation): batch code eval, outside real time
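
A sketch of this segmentation as a simple router; the helpers are placeholders, not a production implementation:

CRITICAL_INTENTS = {"payment", "order_update", "stock_change"}

def call_vetted_function(intent: str, payload: dict):
    # Placeholder: dispatch to a tested, versioned server-side function
    raise NotImplementedError(intent)

def run_in_sandbox(code: str):
    # Placeholder: execute in an ephemeral container with a strict timeout
    raise NotImplementedError

def route(intent: str, payload: dict):
    if intent in CRITICAL_INTENTS:
        return call_vetted_function(intent, payload)
    return run_in_sandbox(payload["generated_code"])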

This segmentation lets you exploit the flexibility of code eval where it creates value, while keeping control where security requires it.

What this means for your digital strategy

The Manus lead’s pivot is a strong signal: AI agents are evolving toward more autonomy. They are capable of generating their own logic for interacting with your catalog. E-retailers who prepare gain a structural advantage: their data is exploited in depth, their products are recommended first, their catalog becomes the reference that agents consult.

Function calling dominated agent building for 3 years. Code eval opens a new chapter. For e-commerce, the lesson is the same as in SEO: the best strategy is to make your data so complete and structured that any agent, whatever its internal workings, finds what it is looking for with you first.

Prepare your catalog for AI agents

Free audit of your e-commerce stack: structured data, product feed, API, agent visibility. Results in 48 hours.

Book a free audit

Frequently asked questions

What is function calling in AI?

Function calling is a mechanism where a language model (LLM) generates a structured call to a predefined function with typed parameters. The agent has a catalog of available functions and chooses which one to call depending on the context. It has been the dominant way of building AI agents since 2023.

What is code eval as an alternative?

Code eval consists of letting the LLM directly generate executable code (Python, JavaScript) instead of calling predefined functions. The code is then executed in a controlled environment (sandbox). This approach offers more flexibility because the agent freely composes its actions.

Is code eval riskier than function calling?

Code eval introduces a broader attack surface: the LLM can potentially generate malicious or unexpected code. Sandboxing (isolated containers, restricted permissions) is essential. In e-commerce production, additional validation layers make it possible to control each execution.

How does this development impact e-commerce SEO?

AI agents are becoming more autonomous in the way they interact with catalogs. A code eval agent can write its own filtering and aggregation queries. Sites with complete structured data (Schema.org, REST API, product feed) are better exploited by these agents.

Should you rebuild your e-commerce agents with code eval?

The hybrid approach is recommended: keep function calling for critical operations (payment, stock management) and use code eval for exploratory tasks (product comparison, trend analysis). The choice depends on the level of control each operation requires.

What does “building apps for a changing world” mean in e-commerce?

A post on r/vibecoding (234 upvotes) warned: classic interfaces (category pages, faceted filters, shopping carts) could be bypassed by AI agents that interact directly with the data. E-retailers benefit from exposing their data via APIs and structured formats, in addition to the visual interface.

Stéphane Jambu

SEO & AI Engineer

Engineer by training, I manage 1,300+ semantic clusters deployed for 650+ e-commerce and B2B clients from Southeast Asia. What sets me apart: I demonstrate. First call = live audit of your site.

Follow on LinkedIn
