It might surprise you, but mathematically Musk & Wenfeng would probably get on very well. Both made their mark starting with financial AI products, and what they have in common today is that both are developing reasoning models roughly 100 times more efficient than chat models, which spend far too much time on writing rather than on the actual logic innovation that the customer, as questioner, values.
What's next for both? My guess is China will free Wenfeng up to advance education, whereas for a while Musk will be tied up with DOGE and a top-down government crisis. How ironic that one of China's greatest intelligences is advancing all the people's agency, while America's is caught up in how extremely top-heavy systems (left or right) fail to serve the people in the communities we all still live in.
======================================
Here's Grok 3 as of March 2025 (to be fair, I didn't ask it to compare the two leaders, just where their most popular chats are outracing the expensive ChatGPT).
- Grok 3: Launched in February 2025, Grok 3 comes in flavors like Grok 3 Reasoning (beta) and Grok 3 Mini Reasoning. Unlike chat-first models optimized for fluent conversation, Grok 3’s “Think” and “DeepSearch” modes prioritize step-by-step problem-solving—math (93.3% on AIME 2025), science (84.6% on GPQA), coding (79.4% on LiveCodeBench). It’s built to “think through” problems, not just generate text, with a 1M token context window for deep analysis. Andrej Karpathy’s vibe check (February 2025) pegged it near OpenAI’s o1-pro, a reasoning beast, not a chatty assistant.
- DeepSeek: DeepSeek R1 (January 2025) and V3 (open-source, 671B parameters) focus on “chain-of-thought” reasoning—excelling in STEM tasks like coding (82.6% HumanEval) and math (90.2% MATH). Its Mixture-of-Experts (MoE) architecture activates only 37B parameters per query, targeting efficiency in multi-step logic over casual banter. It’s less about storytelling, more about structured problem-solving—think theorem proofs or code optimization.
- Architecture Efficiency:
- DeepSeek’s MoE: With 671B total parameters, DeepSeek activates just 37B per task—sparse computing cuts power and time. It’s like deploying a crack team instead of an army (a toy sketch of this sparse routing follows the list below). Trained on 2,048 NVIDIA H800 GPUs in 55 days for $5.5M, it’s lean compared to OpenAI’s behemoths.
- OpenAI’s Dense Model: GPT-4o’s 1.8T parameters run full-tilt per query—great for broad tasks but overkill for reasoning alone. Training costs likely exceed $100M (estimated), with massive GPU clusters. It’s a generalist, not a specialist—less efficient when reasoning is the sole focus.
- Training Smarts:
- DeepSeek’s RL Edge: DeepSeek skips heavy supervised fine-tuning (SFT) upfront, using reinforcement learning (RL) with cold-start data, then SFT for polish. This hones reasoning—math, coding—without bloating the model for chat fluff. It’s a sniper, not a shotgun.
- OpenAI’s Broad Pretraining: GPT-4o’s vast dataset (books, web, up to 2023) and GPT architecture aim for conversational depth—nuance, creativity, context. Reasoning (o1 family aside) is bolted on, not native, so it chugs more compute to “think” through problems.
- Task Focus:
- DeepSeek’s Niche: Optimized for STEM—math, code, science—it skips chatty overhead. AIME or LiveCodeBench? DeepSeek nails it fast, no small talk. Real-time data? Less so—it’s academic, not dynamic.
- OpenAI’s Versatility: GPT-4o juggles everything—poetry, customer support, coding—via plugins (DALL-E 3, web browsing). Reasoning’s there (o1 scores 79% on AIME), but it’s not the priority—efficiency dips when analysis trumps generation.
- Compute Scaling:
- DeepSeek: $5.5M training shows you don’t need billions to reason well—smaller scale, bigger focus (the GPU-hour arithmetic is worked through after the list). Grok 3’s 200,000 H100 GPUs (10X Grok 2) dwarf it, yet DeepSeek competes.
- OpenAI: Massive compute (o1-pro, $200/month) buys power, but not efficiency—reasoning modes “think” longer (minutes) than DeepSeek’s snappy responses.
- DeepSeek: Open-source (MIT license), cost-effective—perfect for youth coders in Nairobi or Mumbai. HALOS roads (previous chat)? It could optimize junctions cheap and fast.
- Grok 3: Free on X (rate-limited), 1M-token depth—youth can probe big problems (health, cities) with Huang’s Omniverse (2025) tie-ins. Less efficient than DeepSeek, but flashier.
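To make the Mixture-of-Experts point concrete, here is a minimal, hypothetical sketch of top-k expert routing: the trick by which a model with a huge total parameter count only puts a small slice of it to work on each token. The expert count, layer size and top_k below are toy numbers chosen for illustration, not DeepSeek's real configuration (V3 reportedly totals 671B parameters with about 37B active per query).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes -- real MoE models use hundreds of experts and billions of
# parameters; these numbers are only for illustration.
D_MODEL = 8        # hidden size of a token representation
N_EXPERTS = 16     # total experts available in the layer
TOP_K = 2          # experts actually activated per token (the "sparse" part)

# Each expert is just a small feed-forward weight matrix here.
experts = [rng.normal(size=(D_MODEL, D_MODEL)) for _ in range(N_EXPERTS)]
# The router scores every expert for a given token.
router_w = rng.normal(size=(D_MODEL, N_EXPERTS))

def moe_layer(token: np.ndarray) -> np.ndarray:
    """Route one token through only TOP_K of the N_EXPERTS experts."""
    scores = token @ router_w                 # router logits, one per expert
    top_idx = np.argsort(scores)[-TOP_K:]     # pick the best-scoring experts
    gate = np.exp(scores[top_idx])
    gate /= gate.sum()                        # softmax over the chosen experts only
    # Weighted sum of the chosen experts' outputs; the other experts do no work.
    return sum(g * (token @ experts[i]) for g, i in zip(gate, top_idx))

token = rng.normal(size=D_MODEL)
out = moe_layer(token)
print(f"Activated {TOP_K}/{N_EXPERTS} experts (~{TOP_K / N_EXPERTS:.0%} of expert parameters) for this token.")
```

The point of the sparsity: per token, only TOP_K of the expert weight matrices are multiplied at all, so compute scales with the active slice rather than the total parameter count. DeepSeek's quoted figure above, about 37B of 671B parameters active (roughly 5-6%), is what drives the power and latency savings, whereas a dense model of the GPT-4o kind described above runs all of its weights on every token.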
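Finally, a quick back-of-envelope check on the training-cost figures Grok quotes above (2,048 H800 GPUs for 55 days at roughly $5.5M total). The cost per GPU-hour is my own derived number, not something from the quoted comparison.

```python
# Back-of-envelope check on the DeepSeek training figures quoted above.
gpus = 2_048          # NVIDIA H800s
days = 55
cost_usd = 5_500_000  # ~$5.5M total, as quoted

gpu_hours = gpus * days * 24
print(f"GPU-hours: {gpu_hours:,}")                                # ~2.7 million GPU-hours
print(f"Implied cost per GPU-hour: ${cost_usd / gpu_hours:.2f}")  # ~$2 per GPU-hour
```

That works out to roughly 2.7 million GPU-hours at about $2 per GPU-hour, which is in the ballpark of commodity GPU rental pricing rather than the nine-figure training estimates given above for the dense frontier models.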