
David Borish's Blog


February 7, 2025

Decoding DeepSeek: The $720M Reality Behind the $5M Myth and the Innovations that Rattled the Industry


In a development that mirrors the early stages of the TikTok controversy, US lawmakers are now pushing for an immediate ban of DeepSeek on government devices, citing national security concerns. This move comes amid growing scrutiny of the company's remarkable technical achievements and contested claims about its development costs. As we unpack DeepSeek's journey from a quantitative trading firm's side project to a major AI player, a complex picture emerges of technical innovation, strategic GPU acquisition, and mounting regulatory challenges.


The story of DeepSeek's rise begins well before its public emergence. High-Flyer, its parent company, made a series of strategic moves in GPU acquisition that would later prove crucial. In 2021, they built what would become China's largest A100 cluster, comprising 10,000 GPUs, carefully timing this expansion before export controls took effect. By late 2023, they had secured an additional 2,000 H800 GPUs, completing this purchase just before these units were banned. Today, their infrastructure spans approximately 50,000 GPUs, distributed across trading, research, and AI model training operations.


DeepSeek's claim of developing their model for just $5 million has raised significant skepticism in both the AI and financial communities, and for good reason. A detailed analysis of their infrastructure reveals the true scale of their investment: their original 2021 purchase of 10,000 A100 GPUs alone cost between $100 million and $150 million, while their strategic acquisition of 2,000 H800 GPUs in late 2023 added another $50-60 million. When accounting for their approximately 38,000 remaining GPUs, even with conservative pricing estimates, the hardware costs alone exceed $450 million. Factor in the necessary cooling systems, power infrastructure, and data center costs, which typically add 30-40% to the total, and DeepSeek's true infrastructure investment likely falls between $590 million and $720 million. This suggests that the publicized $5 million figure refers only to the incremental costs of training the specific model, such as power consumption and engineering time, while omitting the massive infrastructure investment that made it possible.
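
As a rough sanity check, the arithmetic behind that estimate fits in a few lines of Python. The A100 and H800 figures come from the analysis above; the line item for the remaining ~38,000 GPUs is an assumption chosen to be consistent with the stated $450 million hardware floor, not a reported number.

```python
# Back-of-the-envelope reconstruction of the estimate above.
a100 = (100e6, 150e6)    # 10,000 A100s, purchased in 2021
h800 = (50e6, 60e6)      # 2,000 H800s, purchased in late 2023
other = (300e6, 305e6)   # ~38,000 remaining GPUs (assumed conservative pricing)

hw_low = a100[0] + h800[0] + other[0]    # $450M
hw_high = a100[1] + h800[1] + other[1]   # $515M

# Cooling, power, and data center build-out typically add 30-40%.
total_low, total_high = hw_low * 1.30, hw_high * 1.40
print(f"${total_low / 1e6:.0f}M to ${total_high / 1e6:.0f}M")  # ~$585M to $721M
```

Rounded, that is the $590-720 million range cited above.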


DeepSeek's Technical Innovations: A Deep Dive

While the controversy over costs has dominated headlines, DeepSeek's genuine technical innovations deserve careful examination. The company has made several breakthrough advances that have significantly pushed the boundaries of efficient AI model training and deployment.


Revolutionary Attention Mechanism

DeepSeek's Multi-head Latent Attention (MLA) architecture represents a fundamental rethinking of how large language models process information. Traditional attention mechanisms require enormous memory to cache the keys and values that track relationships between different parts of a text. MLA achieves a remarkable 80-90% memory saving compared to standard attention mechanisms through a novel approach that compresses these relationships into a latent space. This isn't just an incremental improvement; it's a paradigm shift that enables the processing of much longer text sequences while using significantly fewer computational resources.
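
To make the idea concrete, here is a toy sketch of the latent-compression trick: cache one small vector per token instead of full per-head keys and values, then expand it at attention time. The dimensions are illustrative, not DeepSeek's actual configuration, and the sketch omits the RoPE handling discussed next.

```python
import torch
import torch.nn as nn

d_model, d_latent, n_heads, d_head = 4096, 512, 32, 128

down_kv = nn.Linear(d_model, d_latent, bias=False)         # compress: token -> latent
up_k = nn.Linear(d_latent, n_heads * d_head, bias=False)   # expand: latent -> keys
up_v = nn.Linear(d_latent, n_heads * d_head, bias=False)   # expand: latent -> values

x = torch.randn(1, 1024, d_model)   # (batch, seq_len, hidden)
kv_latent = down_kv(x)              # the KV cache stores only this latent
k = up_k(kv_latent).view(1, 1024, n_heads, d_head)
v = up_v(kv_latent).view(1, 1024, n_heads, d_head)

# Per-token cache size: standard attention stores K and V for every head.
standard_cache = 2 * n_heads * d_head   # 8,192 values per token
mla_cache = d_latent                    # 512 values per token
print(f"cache saving: {1 - mla_cache / standard_cache:.0%}")  # ~94% at these sizes
```

The exact saving depends on the chosen latent size; the 80-90% figure reported for MLA corresponds to a somewhat larger latent than this toy example uses.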


The implementation of MLA required solving complex technical challenges, particularly in the integration with rotary positional embeddings (RoPE). DeepSeek's engineers developed a sophisticated approach that maintains positional information accuracy while working within their compressed attention framework. This breakthrough alone has implications far beyond their own models, potentially offering a path forward for the entire field of large language models.


Advanced Mixture of Experts Architecture

DeepSeek's implementation of Mixture of Experts (MoE) architecture pushes technical boundaries in ways that even their competitors haven't attempted. While companies like Google and Microsoft typically use 8-16 experts with 2 active at once, DeepSeek developed a system using 256 experts with only 8 active at any given time—a 32:1 ratio that represents a dramatic leap forward in model efficiency.


This wasn't simply a matter of scaling up existing approaches. DeepSeek developed:


A novel routing mechanism that replaces the standard auxiliary loss approach (sketched in code after this list)

Advanced load balancing techniques that ensure efficient utilization across all experts

A custom implementation that enables the model to span a 600B+ parameter space while activating only 37B parameters for any given computation
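
Here is a minimal sketch of what 256-expert, top-8 routing with bias-based balancing can look like. The balance_bias term stands in for the auxiliary-loss-free load balancing described above, nudging assignment toward underused experts without adding a loss term; this is an illustration of the pattern, not DeepSeek's actual code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

n_experts, top_k, d_model = 256, 8, 1024

router = nn.Linear(d_model, n_experts, bias=False)
balance_bias = torch.zeros(n_experts)   # adjusted online from observed expert load

def route(x):                            # x: (n_tokens, d_model)
    scores = router(x)
    _, idx = (scores + balance_bias).topk(top_k, dim=-1)  # bias affects selection only
    weights = F.softmax(scores.gather(-1, idx), dim=-1)   # mixing weights from raw scores
    return weights, idx

tokens = torch.randn(16, d_model)
weights, idx = route(tokens)
print(idx.shape)   # (16, 8): each token activates 8 of 256 experts
```

Each token's hidden state is then sent only to its eight selected experts and recombined with the softmax weights, which is how a 600B+ parameter model can run a forward pass that touches only 37B parameters.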


Infrastructure Optimization

Perhaps most impressively, DeepSeek achieved remarkable efficiency gains through low-level optimizations that few companies have attempted:


Custom CUDA Implementation: Rather than relying on standard libraries, DeepSeek developed custom implementations below the CUDA layer, allowing for more precise control over GPU resources.

Direct SM Scheduling: Their engineers created a custom streaming multiprocessor scheduling system that optimizes how computational tasks are distributed across GPU cores.

Communications Architecture: Instead of using the standard NVIDIA Collective Communications Library (NCCL), DeepSeek developed a custom communications scheduling system that better suits their specific architecture.

PTX-Level Programming: The team optimized at the PTX (Parallel Thread Execution) level—essentially writing GPU assembly code—to squeeze maximum performance from their hardware.


These optimizations weren't just technical exercises. They've resulted in concrete performance improvements, enabling DeepSeek to achieve inference costs of $2 per million tokens—a fraction of what competitors typically charge.


The Warning Signs: A Pattern of Underestimation

On July 1st, 2024, I published the article "China's Recent AI Surge Challenges US Dominance: A Wake-Up Call for the West," highlighting China's rapid advancement in AI capabilities. The piece noted how Chinese models, particularly Alibaba's Qwen series, were beginning to dominate international benchmarks. Despite presenting clear evidence of China's progress, the article faced significant skepticism, with some dismissing it as propaganda.


Just two months later, on August 30th, 2024, a follow-up article "China's AI Takes the Lead: A Second Wake-Up Call for the West" documented China's continued acceleration, with Alibaba's Qwen2-VL outperforming OpenAI's GPT-4V. These warnings about China's AI capabilities being underestimated went largely unheeded by the broader tech community and media.


The Regulatory Response

The situation has escalated rapidly in February 2025, with bipartisan legislation introduced for a government-wide ban. Several federal agencies have already taken preemptive action to restrict usage, following similar moves by multiple countries internationally. The discovery of potential data sharing with China Mobile has only intensified these concerns.


DeepSeek's technical prowess is particularly evident in their cost efficiency, achieving $2 per million tokens compared to competitors' substantially higher rates. They've successfully worked around the H800's limited interconnect bandwidth through innovative optimization techniques and developed novel approaches to memory management. However, these achievements come with significant challenges: capacity limitations have forced them to suspend API registrations, their inference serving capability remains limited, and they face increasingly restricted GPU access due to export controls.


The company's journey reflects a broader pattern in Chinese tech companies' expansion into Western markets: initial success followed by mounting regulatory scrutiny. As with TikTok before it, DeepSeek's government device ban may presage broader restrictions. Yet regardless of regulatory outcomes, their technical contributions to AI efficiency and training methodology will likely influence the industry for years to come.


DeepSeek's story ultimately serves as a crucial case study in the complex interplay between technical innovation, market dynamics, and geopolitical tensions in the AI race. While their cost claims may be disputed, their technical achievements and the regulatory response they've prompted offer important lessons about the future of global AI development and competition.



February 4, 2025

Deep Research: OpenAI's Latest Tool Reshapes Professional Research



OpenAI has introduced Deep Research, their latest development in AI-assisted research capabilities. The system combines OpenAI's O3 reasoning model with web-search capabilities, creating a tool that performs comprehensive research tasks with remarkable efficiency and accuracy.


The system represents a significant step forward in AI capabilities, showing performance metrics approximately ten times better than GPT-4's. When tested on Humanity's Last Exam, the recent benchmark I wrote about in "Humanity's Last Exam: The Ultimate Test of AI's Academic Capabilities," which evaluates knowledge across numerous academic subjects, Deep Research achieved notably higher success rates than its predecessors. This improvement stems from its ability to actively search and analyze web content, process information from multiple sources, and generate detailed, cited research papers.




What sets Deep Research apart is its ability to function at a level comparable to professional research analysts. Traditional research analysts typically come equipped with advanced degrees, years of professional experience, and expertise in specific methodologies. They spend years developing skills in data analysis, statistical interpretation, and academic writing. Deep Research has shown proficiency in many of these areas, particularly in gathering and synthesizing information, analyzing data, and producing well-documented reports.


The practical applications of Deep Research are already evident in various fields. In healthcare, it has proven valuable in analyzing treatment options and providing evidence-based recommendations. A notable example comes from Felipe Millan, OpenAI's government go-to-market lead, who used the system to research treatment options for his wife's cancer diagnosis. The tool provided comprehensive analysis of various studies and treatment approaches, offering insights that helped inform their medical decisions.


In the business sector, Deep Research excels at market analysis and trend identification. The system can process vast amounts of data about consumer behavior, market adoption rates, and industry trends, delivering insights that typically require teams of analysts and considerable time to produce. Its ability to analyze complex market data and generate actionable insights has significant implications for business strategy and decision-making.

The economic impact of Deep Research is substantial. According to OpenAI, the system can already handle a meaningful percentage of economically valuable work across various complexity levels. Its performance on high-value tasks, while still developing, suggests significant potential for growth and improvement.


Currently, Deep Research is available to Pro users at $200 monthly, with a limit of 100 queries per month. This limitation reflects the compute-intensive nature of the tool, which combines advanced model capabilities with extensive web searching and data processing. OpenAI plans to expand access to Team and Enterprise users in the future.


The implications of Deep Research extend beyond its immediate capabilities. The tool represents a significant advancement in AI's ability to conduct independent research, synthesize information, and generate insights. While it currently serves as a powerful complement to human researchers rather than a replacement, its development suggests expanding possibilities for AI in complex research tasks.


What distinguishes Deep Research from traditional research tools is its ability to process information in real-time, verify data across multiple sources, and make autonomous decisions about research directions. The system maintains comprehensive tracking of sources and citations, ensuring transparency and reliability in its findings.
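
OpenAI hasn't published Deep Research's internals, but the behavior described above maps onto a familiar agentic pattern: search, read, decide the next direction, and keep every source. The sketch below illustrates that loop in generic terms; search_web, fetch_page, and llm are hypothetical stand-ins, and none of this is OpenAI's actual implementation.

```python
def deep_research(question, llm, search_web, fetch_page, max_steps=20):
    """Generic research-agent loop: search, read, re-plan, then write a cited report."""
    notes, sources = [], []
    query = question
    for _ in range(max_steps):
        for url in search_web(query)[:3]:            # gather a few candidate sources
            page = fetch_page(url)
            notes.append(llm(f"Extract facts relevant to: {question}\n\n{page}"))
            sources.append(url)                      # track every citation
        # Let the model choose the next research direction, or stop.
        decision = llm(
            f"Question: {question}\nNotes so far: {notes}\n"
            "Reply 'NEXT: <new query>' to keep searching, or 'DONE'."
        )
        if decision.strip().startswith("DONE"):
            break
        query = decision.split("NEXT:", 1)[-1].strip()
    report = llm(
        f"Write a report answering: {question}\n"
        f"Cite only these sources: {sources}\nNotes: {notes}"
    )
    return report, sources
```

The essential properties the article highlights (multi-source verification, autonomous choice of direction, and citation tracking) all live in that loop rather than in any single model call.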


For perspective, human research analysts with comparable skills command significant compensation in today's market. Senior research analysts in financial services or technology sectors typically earn base salaries ranging from $85,000 to $150,000 annually, with total compensation including bonuses potentially reaching $200,000 or more. In specialized fields like healthcare or biotechnology, where expertise in both research methodology and domain knowledge is crucial, salaries can exceed $175,000. Management consulting firms often pay their research analysts between $90,000 and $160,000, while market research analysts in technology companies earn between $80,000 and $130,000 annually. These figures reflect the high value placed on skilled professionals who can conduct thorough research, analyze complex data, and provide actionable insights, capabilities that Deep Research now demonstrates at a fraction of the cost.


The development of Deep Research reflects a broader trend in the evolution of AI capabilities. As these systems become more sophisticated in handling complex research tasks, they are likely to transform how we approach knowledge discovery and synthesis across academic disciplines, business sectors, and scientific fields.


Looking ahead, the role of AI in research and analysis will likely continue to expand. Deep Research demonstrates that AI can effectively handle many aspects of professional research work, from initial data gathering to final analysis and presentation. While human expertise remains crucial, particularly in interpreting results and making strategic decisions, tools like Deep Research are poised to significantly enhance research capabilities across all fields of study.





January 30, 2025

Prompts Aren't Enough: U.S. Copyright Office Issues Major AI Copyright Guideline



The U.S. Copyright Office has released a comprehensive report that establishes clear guidelines on the relationship between artificial intelligence and copyright protection, marking a significant development in how AI-generated works will be treated under copyright law.


In the report released January 2025, the Copyright Office maintains that human authorship remains a fundamental requirement for copyright protection, while providing detailed guidance on how AI-assisted works may qualify for protection. The report, titled "Copyright and Artificial Intelligence, Part 2: Copyrightability," comes after extensive consultation with over 10,000 commenters representing diverse stakeholders from all 50 states and 67 countries.


A key finding is that prompts alone, the text instructions given to AI systems, do not provide sufficient creative control to warrant copyright protection. However, the Office acknowledges that AI can be used as a tool in creating copyrightable works when there is substantial human creative input and control over the expressive elements.


The report outlines several important principles:


Works generated entirely by AI without meaningful human creative input cannot receive copyright protection

Using AI as an assistive tool does not disqualify a work from copyright protection

Original expression created by human authors remains protected even when combined with AI-generated content

The evaluation of human contribution must be assessed on a case-by-case basis


"Copyright protects the original expression in a work created by a human author, even if the work also includes AI-generated material," the report states, while emphasizing that "copyright does not extend to purely AI-generated material, or material where there is insufficient human control over the expressive elements."


The Office explicitly rejected calls for new legislation or sui generis rights for AI-generated works, finding that existing copyright law principles are sufficient to address current challenges. This decision comes despite some stakeholders arguing that protecting AI-generated works could promote innovation and creativity.


The report also addresses international developments, noting that while different countries are taking varied approaches, there appears to be an emerging consensus on the requirement for human authorship. The Office plans to continue monitoring technological and legal developments to determine if any adjustments to these guidelines become necessary.


To assist creators and businesses, the Copyright Office will provide ongoing guidance through additional registration materials and updates to its Compendium of U.S. Copyright Office Practices. The report represents a balanced approach that aims to preserve the incentives for human creativity while acknowledging the growing role of AI in creative processes.


This guidance comes at a crucial time as AI tools become increasingly sophisticated and widely available, providing much-needed clarity for creators, businesses, and the legal community on how copyright law will apply to works created with AI assistance.


Click here to read the full report: Copyright and Artificial Intelligence

