Notes

27.02.2025

07.03.2025

12.03.2025

  1. 15 Mind-Blowing AI Statistics Everyone Must Know About Now

    • 34 million AI-Generated Images Created Daily
    • 71% Of Social Media Images Now AI-Generated
    • Deepfake Fraud Attempts Surge To 6.5% Worldwide
    • Tech Giants Investing $320 Billion In AI Development For 2025
    • Global AI Services Market To Reach $243 Billion This Year
    • 97% Of Leaders Investing In AI Report Positive Return On Investment
    • 25% Of Enterprises Will Deploy AI Agents This Year
    • Healthcare AI Market Valued At $38.7 Billion, Doubling Since 2023
    • Adoption Gap: 81% Of Workers Still Not Using AI Tools
    • Trust Divide: AI Acceptance High In India (77%) and China (72%), Low In America (32%)
    • AI Influencer Economy Approaching $7 Billion Valuation
    • Data Centers To Consume 5% Of U.S. Power, Doubling By 2030
    • 30% Of New Smartphones To Feature On-Device GenAI
    • 50% Of Companies That Use AI Incorporate Open-Source Solutions
    • Nearly Half Of Tech Leaders (49%) Say AI Is Now Fully Integrated Into Business Strategy
  2. Open R1 for Students - Hugging Face NLP Course
  3. Reinforcement Learning Course

13.03.2025

I’ve gathered valuable insights that have helped me succeed in solving complex problems during coding interviews. Here’s what I’ve learned:

1. If the Input Array is Sorted

When you are given a sorted array, there are two go-to techniques that will make your job easier: binary search and two pointers.

Binary Search

  • When to use it: Binary search is the optimal solution for searching for an element or solving problems related to finding a position in a sorted array.

  • How it works: Binary search works by repeatedly dividing the search interval in half. It starts by comparing the target value to the element in the middle. If they are not equal, it eliminates half of the search space.

  • Key advantage: It operates in O(log n) time complexity, which is much faster than a linear scan (O(n)).
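
A minimal Python sketch of the idea (not part of the original note): classic binary search over a sorted list.

```python
def binary_search(nums, target):
    """Return the index of target in the sorted list nums, or -1 if absent."""
    lo, hi = 0, len(nums) - 1
    while lo <= hi:
        mid = (lo + hi) // 2          # middle of the current search interval
        if nums[mid] == target:
            return mid
        elif nums[mid] < target:
            lo = mid + 1              # discard the left half
        else:
            hi = mid - 1              # discard the right half
    return -1

# Example: binary_search([1, 3, 5, 7, 9], 7) -> 3
```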

Two Pointers

  • When to use it: If the problem asks to find pairs or elements satisfying certain conditions (e.g., sum equals to a target), two pointers is your best choice for a sorted array.

  • How it works: You initiate two pointers—one starting from the beginning of the array and the other from the end—and move them toward each other based on the condition you’re trying to satisfy.

  • Key advantage: The time complexity of this approach is O(n), and it’s very efficient for problems involving pairs or finding relationships between elements in sorted arrays.
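
A quick sketch of the sorted-array two-pointer pattern (not from the original note): find a pair that sums to a target.

```python
def pair_with_sum(nums, target):
    """Find a pair in a sorted list that sums to target; O(n) two-pointer scan."""
    left, right = 0, len(nums) - 1
    while left < right:
        s = nums[left] + nums[right]
        if s == target:
            return nums[left], nums[right]
        elif s < target:
            left += 1     # need a larger sum: move the left pointer right
        else:
            right -= 1    # need a smaller sum: move the right pointer left
    return None

# Example: pair_with_sum([1, 2, 4, 6, 10], 8) -> (2, 6)
```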


2. If Asked for All Permutations/Subsets

When you’re asked to generate all permutations or subsets of a given set, the solution is usually based on backtracking.

Backtracking

  • When to use it: Backtracking is a powerful technique for solving problems where you have to explore all possibilities or combinations in a constrained space, like generating all permutations or subsets.

  • How it works: It involves exploring a potential solution space by building solutions incrementally, then abandoning a solution as soon as it is determined that it cannot be extended to a valid solution (this is known as ā€œpruningā€).

  • Key advantage: This method ensures you explore all possible combinations or permutations and works efficiently by reducing the search space through pruning.
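
As a small illustration (not from the original note), here is the standard backtracking pattern for generating all subsets; permutations follow the same choose/explore/un-choose shape.

```python
def subsets(nums):
    """Generate all subsets of nums via backtracking."""
    result, path = [], []

    def backtrack(start):
        result.append(path[:])            # record the current partial solution
        for i in range(start, len(nums)):
            path.append(nums[i])          # choose
            backtrack(i + 1)              # explore
            path.pop()                    # un-choose (backtrack)

    backtrack(0)
    return result

# Example: subsets([1, 2]) -> [[], [1], [1, 2], [2]]
```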


3. If Given a Tree

When you’re working with a tree, whether it’s a binary tree or a general tree, there are two main strategies you should consider: Depth-First Search (DFS) and Breadth-First Search (BFS).

DFS

  • When to use it: DFS is most useful when you need to explore all possible paths (e.g., for finding paths, maximum depths, or solving tree traversal problems).

  • How it works: DFS explores as deep as possible down a branch before backtracking. It can be implemented recursively or using a stack.

  • Key advantage: It allows you to explore deeply nested nodes and works well for problems like pathfinding or tree traversal.

BFS

  • When to use it: BFS is ideal for level-order traversals, finding the shortest path in unweighted graphs, or problems where you need to explore nodes level by level.

  • How it works: BFS explores all neighbors of a node at the present depth level before moving on to nodes at the next depth level.

  • Key advantage: BFS guarantees that you find the shortest path in an unweighted graph and is useful for problems that require finding nodes at a certain level.
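
A minimal sketch of both traversals on a binary tree (not from the original note): recursive DFS for maximum depth, and queue-based BFS for level order.

```python
from collections import deque

class TreeNode:
    def __init__(self, val, left=None, right=None):
        self.val, self.left, self.right = val, left, right

def max_depth_dfs(node):
    """DFS: recurse down each branch before backtracking."""
    if node is None:
        return 0
    return 1 + max(max_depth_dfs(node.left), max_depth_dfs(node.right))

def level_order_bfs(root):
    """BFS: visit nodes level by level using a queue."""
    levels, queue = [], deque([root] if root else [])
    while queue:
        level = []
        for _ in range(len(queue)):       # everything currently in the queue is one level
            node = queue.popleft()
            level.append(node.val)
            if node.left:
                queue.append(node.left)
            if node.right:
                queue.append(node.right)
        levels.append(level)
    return levels
```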


4. If Given a Graph

Graphs, whether directed, undirected, weighted, or unweighted, require two basic traversal techniques:

DFS

  • When to use it: Use DFS when you need to explore all possible paths in a graph (like finding connected components or cycles).

  • How it works: DFS can be implemented recursively or with an explicit stack. It explores deeper into the graph before backtracking.

  • Key advantage: It’s suitable for problems where deep exploration is needed, such as topological sorting or cycle detection.

BFS

  • When to use it: BFS is best when you’re looking for the shortest path, level-order traversal, or connected components.

  • How it works: BFS uses a queue to explore nodes level by level. It’s optimal for problems where the shortest path or level information is important.

  • Key advantage: BFS guarantees finding the shortest path in an unweighted graph and is ideal for problems like shortest-path algorithms.
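
A small sketch for graphs (not from the original note): BFS over an adjacency list with a visited set, which gives the shortest path length in an unweighted graph.

```python
from collections import deque

def bfs_shortest_path(graph, start, goal):
    """Return the number of edges on a shortest path from start to goal
    in an unweighted adjacency-list graph, or -1 if goal is unreachable."""
    queue, visited = deque([(start, 0)]), {start}
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in graph.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, dist + 1))
    return -1

# Example: bfs_shortest_path({'A': ['B'], 'B': ['C'], 'C': []}, 'A', 'C') -> 2
```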


5. If Given a Linked List

When given a linked list, two pointers is the most efficient technique.

Two Pointers

  • When to use it: Two pointers are used for problems such as detecting cycles in a linked list, finding the middle node, or reversing a linked list.

  • How it works: You initialize two pointers—slow and fast—where the slow pointer advances one step at a time, and the fast pointer advances two steps. This is helpful in problems involving the middle of a list or detecting loops.

  • Key advantage: This approach reduces time complexity and works effectively for problems requiring traversal with conditions on the steps.
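
Two classic slow/fast-pointer routines as a sketch (not from the original note): finding the middle node and detecting a cycle.

```python
class ListNode:
    def __init__(self, val, nxt=None):
        self.val, self.next = val, nxt

def middle_node(head):
    """When the fast pointer reaches the end, the slow pointer is at the middle."""
    slow = fast = head
    while fast and fast.next:
        slow = slow.next
        fast = fast.next.next
    return slow

def has_cycle(head):
    """Floyd's cycle detection: the fast pointer meets the slow one iff there is a loop."""
    slow = fast = head
    while fast and fast.next:
        slow, fast = slow.next, fast.next.next
        if slow is fast:
            return True
    return False
```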


6. If Recursion is Banned

Recursion can often be replaced by using a stack to simulate the recursion process.

Stack

  • When to use it: Use a stack to replace recursion for problems like depth-first search or when the recursive solution leads to a stack overflow due to deep recursion.

  • How it works: You maintain a stack of nodes or elements to process, and the algorithm processes them in a controlled, iterative manner.

  • Key advantage: It helps avoid the pitfalls of deep recursion and stack overflows while achieving the same result.
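
A minimal sketch (not from the original note) of replacing recursive DFS with an explicit stack; it assumes binary-tree nodes with val/left/right fields, like the TreeNode shape used earlier.

```python
def dfs_iterative(root):
    """Pre-order tree traversal with an explicit stack instead of recursion."""
    order, stack = [], [root] if root else []
    while stack:
        node = stack.pop()
        order.append(node.val)
        # Push the right child first so the left child is processed first (LIFO order).
        if node.right:
            stack.append(node.right)
        if node.left:
            stack.append(node.left)
    return order
```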


7. If You Must Solve In-Place

If the problem requires in-place operations, meaning you cannot use extra memory, you’ll rely on techniques like swapping values or storing multiple values in a single pointer.

Swap Corresponding Values

  • When to use it: This is useful for problems like reversing an array or swapping elements to solve problems like sorting.

  • How it works: You swap two values in-place without needing extra space.

  • Key advantage: It minimizes space complexity to O(1) while performing the operation efficiently.
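
A tiny sketch of the swap technique (not from the original note): reversing an array in place with O(1) extra space.

```python
def reverse_in_place(arr):
    """Reverse an array in place by swapping symmetric positions."""
    left, right = 0, len(arr) - 1
    while left < right:
        arr[left], arr[right] = arr[right], arr[left]   # swap corresponding values
        left += 1
        right -= 1
    return arr

# Example: reverse_in_place([1, 2, 3, 4]) -> [4, 3, 2, 1]
```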

Store Multiple Values in the Same Pointer

  • When to use it: This is typically used in problems like sorting or rearranging arrays where you can overwrite existing values as you progress.

  • How it works: You can overwrite or re-use memory locations to store intermediate values, which reduces memory usage.

  • Key advantage: It saves space and allows in-place modification of data structures.


8. If Asked for Maximum/Minimum Subarray/Subset/Options

Dynamic programming (DP) is your go-to approach for solving these problems efficiently.

Dynamic Programming

  • When to use it: DP is used for problems where the problem can be broken down into overlapping subproblems that can be solved optimally. It’s particularly useful for problems involving finding the maximum or minimum values in a sequence.

  • How it works: DP stores intermediate results to avoid redundant calculations. It can be implemented in both bottom-up (iterative) or top-down (recursive with memoization) approaches.

  • Key advantage: DP reduces time complexity from exponential to polynomial time in many cases, which makes it ideal for optimization problems.
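
A compact example of this (not from the original note): Kadane's algorithm, the standard DP for the maximum-sum contiguous subarray.

```python
def max_subarray_sum(nums):
    """Kadane's algorithm: O(n) DP for the maximum-sum contiguous subarray."""
    best = current = nums[0]
    for x in nums[1:]:
        # Either extend the previous subarray or start fresh at x.
        current = max(x, current + x)
        best = max(best, current)
    return best

# Example: max_subarray_sum([-2, 1, -3, 4, -1, 2, 1, -5, 4]) -> 6
```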


9. If Asked for Top/Least K Items

For problems where you need to find the top K or least K elements, heaps or QuickSelect are the most efficient choices.

Heap

  • When to use it: Heaps are optimal for finding the k-th largest or smallest elements in an unsorted list.

  • How it works: A heap is a binary tree where each parent node is either larger or smaller than its child nodes. For top K or least K elements, a heap can be used to efficiently extract the largest or smallest elements.

  • Key advantage: Heaps provide an O(log n) time complexity for insertions and deletions, making them efficient for dynamic data.

QuickSelect

  • When to use it: QuickSelect is an efficient algorithm for finding the k-th largest or smallest element without fully sorting the array.

  • How it works: QuickSelect partitions the array and selects a subset of elements to recursively find the desired element in O(n) time on average.

  • Key advantage: QuickSelect is faster than sorting the entire array when you only need to find the k-th element.
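
Both ideas in a short Python sketch (not from the original note): a size-k min-heap for the top K elements, and a simple QuickSelect for the k-th smallest.

```python
import heapq
import random

def top_k(nums, k):
    """Keep a min-heap of size k: O(n log k) for the k largest elements."""
    heap = []
    for x in nums:
        heapq.heappush(heap, x)
        if len(heap) > k:
            heapq.heappop(heap)       # evict the smallest of the current top k
    return sorted(heap, reverse=True)

def quickselect(nums, k):
    """Return the k-th smallest element (1-indexed) in O(n) average time."""
    pivot = random.choice(nums)
    lows = [x for x in nums if x < pivot]
    highs = [x for x in nums if x > pivot]
    pivots = [x for x in nums if x == pivot]
    if k <= len(lows):
        return quickselect(lows, k)
    if k <= len(lows) + len(pivots):
        return pivot
    return quickselect(highs, k - len(lows) - len(pivots))

# Example: top_k([5, 1, 9, 3, 7], 2) -> [9, 7]; quickselect([5, 1, 9, 3, 7], 2) -> 3
```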


10. If Asked for Common Strings

For problems that involve finding common strings or substrings, maps or tries are the best solutions.

Map

  • When to use it: A hash map can be used to track occurrences of strings, helping find common substrings or strings with certain properties.

  • How it works: You store the string’s frequency or occurrence in a map, and then easily retrieve values based on certain conditions.

  • Key advantage: Maps provide O(1) time complexity for lookups, making them highly efficient for these problems.

Trie

  • When to use it: A Trie is optimal for problems involving string matching or prefix-based queries.

  • How it works: A Trie is a tree-like data structure used for storing a dynamic set of strings, which is useful for finding common substrings or prefixes.

  • Key advantage: It allows for fast lookups and prefix-based searches in O(n) time complexity.
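
A minimal sketch of both structures (not from the original note): a Counter-based lookup for common strings and a tiny Trie for prefix queries.

```python
from collections import Counter, defaultdict

def common_strings(a, b):
    """Hash-map approach: strings that appear in both lists."""
    counts = Counter(a)
    return [s for s in b if counts[s] > 0]

class Trie:
    """Minimal trie supporting insert and prefix search."""
    def __init__(self):
        self.children = defaultdict(Trie)
        self.is_word = False

    def insert(self, word):
        node = self
        for ch in word:
            node = node.children[ch]
        node.is_word = True

    def starts_with(self, prefix):
        node = self
        for ch in prefix:
            if ch not in node.children:
                return False
            node = node.children[ch]
        return True

# Example: t = Trie(); t.insert("apple"); t.starts_with("app") -> True
```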


11. General Strategy

For problems not falling under any specific category, you can often use maps/sets for O(1) time complexity and O(n) space, or sorting for O(n log n) time complexity and O(1) space (depending on the problem).

Maps/Sets

  • When to use it: Maps and sets are used to store unique elements or key-value pairs. These structures give you efficient access to elements and allow you to check for membership in constant time.

  • Key advantage: They provide fast operations for insertion, deletion, and lookup, making them ideal for many problems involving unique elements or memberships.

Sorting

  • When to use it: Sorting is useful when you need to reorder data to apply algorithms like binary search or when working with problems like finding the kth smallest/largest element.

  • Key advantage: Sorting provides a way to convert a complex problem into a more manageable one, and many algorithms like binary search work well with sorted data.

14.03.2025

Can Zenoh revive Autonomous Vehicle Platooning?

First what is Autonomous Vehicle Platooning?

Rather than having a fleet of autonomous vehicles, you have ONE autonomous vehicle that acts as a "master", and other "automated" vehicles that act as followers.

Imagine a convoy of trucks, where the first truck has all the sensors and intelligence, and communicates instructions to its followers, which then "brake" or steer the vehicle.

This is platooning, a leader — and followers.

There are tons of theoretical advantages to using a platooning solution: reduced drag, shorter following distances, no need to equip the entire fleet of trucks with LiDARs, …

The problem is, it never really worked.
It can work from a "prototype" perspective, but (to my knowledge) I don't think the autonomous truck world ever adopted platooning as a solution.

One of the reasons is latency. You can't risk sending a "brake" instruction 1 second too late when a convoy of trucks drives at 90 km/h. It's way too risky.

So what could you do?

You could try and reduce that communication latency.

Zenoh?

Sounds both Biblical and Star Wars-like.

Whoever named it is a genius.

And what is it? It’s a better middleware for ROS.

ROS — a very brief definition would be: ROS is a middleware framework that helps algorithms communicate. We can have an object detector receiving images from multiple cameras and forwarding objects to a Trajectory Planner. It turns independent algorithms into a system.

To explain why Zenoh is a good idea, let me share a simple graph decomposing ROS into 4 main parts: Nodes, Tools, Robotics, and Ecosystem.

According to this:

  • ROS = Nodes Management + Robotics Customization + Tools + Ecosystem

But the "nodes" part is NOT really ROS. It's a standard protocol, like TCP for ROS1, and something called DDS for ROS2.

And what is Zenoh? A replacement for TCP and DDS in the "Nodes" part.

This means when you use it, nothing visibly changes: you still have Gazebo, messaging, etc… but under the hood, the communication protocol changes.

This is an under-the-hood modification.

But it’s very powerful, because while DDS (what ROS2 uses by default) was built for wired robotics, Zenoh is built for wireless robotics.

And when you replace the default ROS protocol, you turn a wired robot into a wireless robot.

And I think this is a solution to revive autonomous vehicle Platooning…

But you could also enable tons of other wireless applications, like Fleet Navigation, Drone Delivery, V2X, and many others…

17.03.2025

  • MLOps Basics - The goal of the series is to understand the basics of MLOps like model building, monitoring, configurations, testing, packaging, deployment, CI/CD, etc.

21.03.2025

24.03.2025

  • GenAI Agents - a great resource for:
    1/ learning
    2/ building
    3/ sharing AI Agents

    ranging from simple conversational bots to complex, multi-agent systems.

    Key features:
    🎓 Learn to build GenAI agents from beginner to advanced levels
    🧠 Explore a wide range of agent architectures and applications
    📚 Step-by-step tutorials and comprehensive documentation
    🛠️ Practical, ready-to-use agent implementations
    🌟 Regular updates with the latest advancements in GenAI
    🤝 Share your own agent creations with the community

  • Agentic RAG
    Native RAG
    In Native RAG, the most common implementation nowadays, the user query is processed through a pipeline that includes retrieval, reranking, synthesis, and generation of a response.

    This process leverages retrieval and generation-based methods to provide accurate and contextually relevant answers.

    Agentic RAG
    Agentic RAG is an advanced, agent-based approach to question answering over multiple documents in a coordinated manner. It involves comparing different documents, summarizing specific documents, or comparing various summaries.

    Agentic RAG is a flexible framework that supports complex tasks requiring planning, multi-step reasoning, tool use, and learning over time.

    Key Components and Architecture

    • Document Agents: Each document is assigned a dedicated agent capable of answering questions and summarizing within its own document.

    • Meta-Agent: A top-level agent manages all the document agents, orchestrating their interactions and integrating their outputs to generate a coherent and comprehensive response (a minimal code sketch of this pattern follows at the end of this note).

    Features and Benefits

    • Autonomy: Agents act independently to retrieve, process, and generate information.

    • Adaptability: The system can adjust strategies based on new data and changing contexts.

    • Proactivity: Agents can anticipate needs and take preemptive actions to achieve goals.

    Applications

    Agentic RAG is particularly useful in scenarios requiring thorough and nuanced information processing and decision-making.

    A few days ago, I discussed how the future of AI lies in AI Agents. RAG is currently the most popular use case, and with an agentic architecture, you will supercharge RAG!
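
A minimal sketch of the document-agent / meta-agent pattern described above. `call_llm` is a hypothetical helper standing in for any LLM client; this is not the API of a specific agent framework.

```python
def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

class DocumentAgent:
    """One agent per document: answers questions and summarizes within it."""
    def __init__(self, name: str, text: str):
        self.name, self.text = name, text

    def answer(self, question: str) -> str:
        return call_llm(f"Using only this document:\n{self.text}\n\nAnswer: {question}")

class MetaAgent:
    """Top-level agent that routes the question to every document agent
    and synthesizes their answers into one response."""
    def __init__(self, agents: list[DocumentAgent]):
        self.agents = agents

    def answer(self, question: str) -> str:
        partials = [f"[{a.name}] {a.answer(question)}" for a in self.agents]
        return call_llm(
            "Combine these per-document answers into one coherent response:\n"
            + "\n".join(partials) + f"\n\nQuestion: {question}"
        )
```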

25.03.2025

  • Top AI/ML research papers from the last two weeks:
    • Transformers without Normalization
    • Block Diffusion
    • Compute Optimal Scaling of Skills
    • DAPO: An Open-Source LLM RL System at Scale
    • Teaching LLMs How to Learn with Contextual Fine-Tuning
    • GR00T N1
    • Why the Brain Cannot Be a Digital Computer
    • RWKV-7 "Goose" with Expressive Dynamic State Evolution
    • Why Do Multi-Agent LLM Systems Fail?
    • Chain-of-Thought Reasoning In The Wild Is Not Always Faithful
    • Light-R1
    • Where do Large Vision-Language Models Look at when Answering Questions?
    • Improving Planning of Agents for Long-Horizon Tasks
    • UniCombine
    • How much do LLMs learn from negative examples?
    • Tokenize Image as a Set
    • Search-R1
    • Measuring AI Ability to Complete Long Tasks
    • Does Your VLM Get Lost in the Long Video Sampling Dilemma?
    • Unified Autoregressive Visual Generation and Understanding with Continuous Tokens
    • Personalize Anything for Free with Diffusion Transformer
    • The KoLMogorov Test: Compression by Code Generation
    • Optimizing ML Training with Metagradient Descent
    • DeepMesh
    • Thinking Machines
    • A Review of DeepSeek Models
    • A Survey on Efficient Reasoning
    • Agentic Memory for LLM Agents
  • GenAI-Showcase/notebooks/agents at main · mongodb-developer/GenAI-Showcase - Jupyter Notebooks demonstrating how to build AI agents using various frameworks and MongoDB Atlas as the vector store and memory provider.
  • Which Agentic Framework should I use?

27.03.2025

28.03.2025

  • Alibaba released Qwen2.5-Omni-7B, a new multimodal AI capable of processing text, images, audio, and video simultaneously while being efficient enough to run directly on consumer hardware like smartphones and laptops.
  • The Model Context Protocol (aka MCP) is a way to provide tools and context to the LLM; a minimal server sketch follows after this list. From the MCP docs:
      MCP is an open protocol that standardizes how applications provide context to LLMs. Think of MCP like a USB-C port for AI applications. Just as USB-C provides a standardized way to connect your devices to various peripherals and accessories, MCP provides a standardized way to connect AI models to different data sources and tools.
    
  • microsoft/playwright-mcp: Playwright Tools for MCP is a Model Context Protocol (MCP) server that enables large language models to interact with web pages using Playwright, bypassing the need for screenshots or visually-tuned models.
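
A hedged sketch of an MCP tool server, based on the MCP Python SDK's FastMCP helper; treat the exact import path and method names as assumptions and check the SDK docs for your version.

```python
# Hedged sketch of an MCP tool server using the MCP Python SDK's FastMCP helper
# (the exact API may differ between SDK versions -- check the docs).
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-tools")

@mcp.tool()
def add(a: int, b: int) -> int:
    """Add two numbers (exposed to the LLM as a callable tool)."""
    return a + b

if __name__ == "__main__":
    mcp.run(transport="stdio")   # clients such as Claude Desktop connect over stdio
```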

30.03.2025

  • NVIDIA Dynamo is a high-throughput low-latency inference framework designed for serving generative AI and reasoning models in multi-node distributed environments.

01.04.2025

  • Dell and NVIDIA explaining the value of local GPU compute across all industries
  • Anthropic just unlocked an AI's brain, and found at least six things you'll want to know…
    1. Plans rhyming words in advance before writing poetry.
    2. Uses a universal "language of thought" across languages:
      1. When asked for antonyms in English, French, and Chinese, the same core features activate—with only the final output differing based on language.
    3. Solves math problems like humans do:
      1. One part of Claude's brain carefully counts the ones place (like knowing 6+9=15, so the answer ends in 5).
      2. While another roughly estimates the total (like "that's around 90-something").
    4. Forms multi-hop reasoning (connecting Dallas → Texas → Austin) "in its head." They also found Claude sometimes tries to deceive its users when faced with conflicting goals:
      • Claude maintains a "known entity" feature that represents whether it knows about a topic.
      • When Claude hallucinates, it's often because the "known entity" feature incorrectly activates on a topic it doesn't fully understand (same, bro).
      • Apparently, Claude only recognizes and refuses harmful requests when it reaches the end of a sentence—explaining why some jailbreaks still work.

    The researchers even caught Claude working backward from human-provided answers to fabricate plausible calculations (which they call "motivated reasoning"). That means language models can appear to "reason" when what they're actually doing is working backward from conclusions rather than following logical steps forward.

02.04.2025

04.04.2025

  • MCP-Use is the open source way to connect any LLM to MCP tools and build custom agents that have tool access, without using closed source or application clients.

09.04.2025

  • Browser MCP - MCP server for your browser. Browser MCP allows users to connect AI apps to their browsers to automate tasks. Automation happens locally on users' machines, resulting in better performance without network latency. Browser activity stays on-device and isn't sent to remote servers. Browser MCP uses existing browser profiles, keeping users logged into all of their services. It avoids CAPTCHAs by using real browser fingerprints.
  • If you were building a Q&A feature (or chatbot) based on very long documents (like books), what evals would you focus on?
    1. Two metrics
      • Faithfulness: Grounding of answers in document’s content. Not to be confused with correctness—an answer can be correct (based on updated information) but not faithful to the document. Sub-metric: Precision of citations
      • Helpfulness: Usefulness (directly addresses the question with enough detail and explanation) and completeness (does not omit important details); an answer can be faithful but not helpful if too brief or doesn’t answer the question
      • Evaluate separately: Faithfulness = binary label -> LLM-evaluator; Helpfulness = pairwise comparisons -> reward model (a minimal LLM-judge sketch for faithfulness follows after this list)
    2. How to build robust evals
      • Use LLMs to generate questions from the text
      • Evals should evaluate positional robustness (i.e., have questions at the beginning, middle, and end of text)
    3. Potential challenges
      • Open-ended questions may have no single correct answer, making reference-based evals tricky. For example: What is the theme of this novel?
      • Questions should be representative of prod traffic, with a mix of factual, inferential, summarization, and definitional questions.
    4. Benchmark Datasets
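
A minimal sketch of the binary faithfulness judge mentioned above; the prompt wording and the `call_llm` helper are illustrative assumptions, not a specific library's API.

```python
# Illustrative sketch of a binary faithfulness check with an LLM judge.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

def is_faithful(answer: str, source_passage: str) -> bool:
    """Return True if the judge says the answer is grounded in the passage."""
    verdict = call_llm(
        "Judge whether the ANSWER is fully supported by the PASSAGE.\n"
        "Reply with exactly YES or NO.\n\n"
        f"PASSAGE:\n{source_passage}\n\nANSWER:\n{answer}"
    )
    return verdict.strip().upper().startswith("YES")
```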

14.04.2025

1️⃣ CUDA
Parallel computing platform and API to accelerate computation on NVIDIA GPUs.

Keypoints:
↳ Kernels - C/C++ functions.
↳ Thread - executes the kernel instructions.
↳ Block - groups of threads.
↳ Grid - collection of blocks.
↳ Streaming Multiprocessor (SM) - processor units that execute thread blocks.

When a CUDA program invokes a kernel grid, the thread blocks are distributed to the SMs.

CUDA follows the SIMT (Single Instruction Multiple Threads) architecture to execute threads logic and uses a Barrier to gather and synchronize Threads.
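
A small Python illustration of the kernel/grid/block/thread vocabulary above, using Numba's CUDA target rather than native C/C++ CUDA; the library choice and array sizes are my assumptions, and it needs the `numba` package plus an NVIDIA GPU.

```python
# Illustration of the grid/block/thread hierarchy using Numba's CUDA target
# (an assumption on my side -- the note itself describes C/C++ CUDA kernels).
import numpy as np
from numba import cuda

@cuda.jit
def add_kernel(x, y, out):
    i = cuda.grid(1)          # global thread index = blockIdx.x * blockDim.x + threadIdx.x
    if i < x.size:            # guard: the grid may be larger than the data
        out[i] = x[i] + y[i]

n = 1_000_000
x = np.ones(n, dtype=np.float32)
y = np.ones(n, dtype=np.float32)

d_x = cuda.to_device(x)                    # copy inputs to the GPU
d_y = cuda.to_device(y)
d_out = cuda.device_array_like(x)

threads_per_block = 256                                        # threads grouped into a block
blocks_per_grid = (n + threads_per_block - 1) // threads_per_block  # blocks form the grid
add_kernel[blocks_per_grid, threads_per_block](d_x, d_y, d_out)     # SMs execute the thread blocks

out = d_out.copy_to_host()
```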

2️⃣ cuDNN
Library with highly tuned implementations for standard routines such as:
↳ forward and backward convolution
↳ attention
↳ matmul, pooling, and normalization - which are used in all NN Architectures.

3️⃣ TensorRT
If we unpack a model architecture, we have multiple layer types, operations, layer connections, activations, etc. Imagine an NN architecture as a complex Graph of operations.

TensorRT can:
↳ Scan that graph
↳ Identify bottlenecks
↳ Optimize
↳ Remove, merge layers
↳ Reduce layer precisions
↳ Many other optimizations.

4️⃣ TensorRT-LLM
Inference Engine that brings the TensorRT Compiler optimizations to Transformer-based models.

Covers the advanced and custom requirements for LLMs, such as:
↳ KV Caching
↳ Inflight Batching
↳ Optimized Attention Kernels
↳ Tensor Parallel
↳ Pipeline Parallel.

5️⃣ Triton Inference Server
An open source, high-performance, and secure serving system for AI Workloads. Devs can optimize their models, define serving configurations in Protobuf Text files, and deploy.

It supports multiple framework backends, including:
↳ Native PyTorch, TensorFlow
↳ TensorRT, TensorRT-LLM
↳ Custom BLS (Business Logic Scripting) with Python Backends

6️⃣ NVIDIA NIM
Set of plug-and-play inference microservices that package up multiple NVIDIA libraries and frameworks, highly tuned for serving LLMs at production cluster and datacenter scale.

It has:
↳ CUDA, cuDNN
↳ TensorRT
↳ Triton Server
↳ Many other libraries - baked in.

NIM provides the optimal serving configuration for an LLM.

7️⃣ Dynamo Inference Framework
The newest inference framework for accelerating and scaling GenAI workloads.

Composed of modular blocks, robust and scalable.

Implements:
↳ Elastic compute - GPU Planner
↳ KV Routing, Sharing, and Caching
↳ Disaggregated Serving of Prefill and Decode.

21.04.2025

These 7 GitHub repos are the ones I always recommend:

  1. All algorithms implemented in Python (199k stars)
    https://lnkd.in/gdeUgjsi

  2. Awesome Machine Learning (67.6k stars)
    https://lnkd.in/gSGVvmBB

  3. Machine Learning interviews from FAANG (10.6k stars)
    https://lnkd.in/gBhPEN6f

  4. Machine Learning cheat sheet (7.6k stars)
    https://lnkd.in/gHtehPv7

  5. Guide for ML / AI technical interviews (6k stars)
    https://lnkd.in/grGHzGqm

  6. Complete System Design (4.3k stars)
    https://lnkd.in/ggkqm9US

  7. 65 Machine Learning interview questions (3.4k stars)
    https://lnkd.in/g4b3T6xx

22.04.2025

29.04.2025

02.05.2025

05.05.2025

  • Microsoft Phi 4 Reasoning Models:
    • Chain-Of-Thought Reasoning: These models follow a clear, step-by-step approach to problem-solving, making their logic transparent and more reliable compared to other compact models that often rely on quick guesses.
    • High-Quality Training: Phi-4 Mini was trained on 1M synthetic math questions from DeepSeek R1. The larger models were trained on curated web content and OpenAI’s o3-mini demos.
    • Big Context Window: Supports 32K tokens by default and can be extended to 64K, making it well-suited for long documents such as legal cases, financial reports, or dense academic papers.
  • Amazon's Nova Premier is built to teach

06.05.2025

07.05.2025

  • Scaling Long Context and RAG: Insights from Google DeepMind (the Google AI "Release Notes" podcast episode on long context). The "Release Notes Podcast: Long Context and RAG" episode by Google DeepMind dives deep into scaling context windows and improving long-context models. Hosted by Logan Kilpatrick, Google DeepMind's Nikolay Savinov explores key challenges and advancements in this area, with a focus on Retrieval-Augmented Generation (RAG) and long-context models.
    You will learn:
    • Why token count matters and how models handle multi-million-token inputs
    • The core differences between RAG pipelines and long-context architectures
    • How DeepMind evaluates long-context recall beyond standard benchmarks
    • What’s changed since the 1.5 Pro release in terms of attention and inference
    • Tips for structuring inputs, optimizing infrastructure, and managing cost
    • Where long context fits into agent architectures and reasoning workflows
  • The workflow to make videos:
    1. GPT-4o wrote the script (I made some tweaks)
    2. @krea_ai generated the starting image
    3. @elevenlabsio made the audio
    4. @hedra_labs animated it

08.05.2025

09.05.2025

Build DeepSeek from Scratch

13.05.2025

  • ZeroSearch: Incentivize the Search Capability of LLMs without Searching (Alibaba-NLP/ZeroSearch)
    • We propose ZeroSearch, a novel reinforcement learning framework that incentivizes the search capability of LLMs without interacting with a real search engine, by using simulated searches during training.
    • Through supervised fine-tuning, we transform the LLM into a retrieval module capable of generating both relevant and noisy documents in response to a query. We further introduce a curriculum rollout mechanism to progressively elicit the model’s reasoning ability by exposing it to increasingly challenging retrieval scenarios.
    • We conduct extensive experiments on both in-domain and out-of-domain datasets. Results show that ZeroSearch outperforms real search engine-based models while incurring zero API cost. Moreover, it generalizes well across both base and instruction-tuned LLMs of various sizes and supports different reinforcement learning algorithms.
  • Building LLMs from the Ground Up: A 3-hour Coding Workshop
  • Hugging Face has released nanoVLM, a compact PyTorch-based framework that lets you train a vision-language model from scratch in just 750 lines of code. It’s designed to be readable, modular, and easy to extend, making it ideal for learning, prototyping, or research. The model uses a SigLIP-B/16 vision encoder and a SmolLM2 language decoder. All code is open on GitHub and the Hugging Face Hub.
  • llama.cpp now supports vision input, letting you run multimodal models locally using llama-mtmd-cli or llama-server. Models like Gemma 3, Qwen2.5 VL, and SmolVLM are already supported, and you can enable vision with a simple -hf flag or load your own projector file if needed. What's nice is that vision is now built directly into the server—no extra hacks or plugins. This makes it easier to manage, faster to update, and cleaner to use across different tools.

14.05.2025

  • Sakana AI unveiled Continuous Thought Machines (CTMs), a new type of model that makes AI more brain-like by allowing it to "think" step-by-step over time instead of making instant decisions like current AI systems do.

    Example code: continuous-thought-machines/examples/01_mnist.ipynb at main · SakanaAI/continuous-thought-machines

    1. Internal Recurrence (aka ā€œThought Stepsā€)

    Think of this as the model taking time to internally reflect or think, even if the input is static (like an image). Each "tick" is one such thought step.

    • Unlike RNNs or Transformers which move through time/data,

    • CTM has internal ticks (e.g., T=75), meaning it reflects T times internally on a given input before responding.

    Analogy: Like solving a maze — you stop, stare at the image, and think step-by-step what path to take before actually moving.

2. Neuron-Level Models (Private MLPs per Neuron)

Each neuron in CTM **remembers the last M steps** of its own activity and has a **private MLP** to decide its next activation. This is **very different** from traditional models where all neurons share the same update rules.

- Input goes into a **"synapse" MLP** to produce pre-activations.

- Each neuron takes a **history of its pre-activations** and processes it using its own MLP to get its output (post-activation).

Analogy: Imagine each neuron is a tiny agent with its own memory and rulebook. It watches its past behavior and decides what to do next, independently.

3. Synchronization as the Representation

Instead of just using the current neuron activations, CTM **tracks how neuron activations synchronize over time**.

- At each tick, it computes **how pairs of neurons are co-activated** over time.

- This creates a **synchronization matrix**, which becomes the actual representation used to **read data** (attention) and **make predictions**.

**Analogy: Imagine a team solving a puzzle. The more certain pairs of people "think in sync" over time, the more important their coordination is. That synchronization becomes the team's strategy.**

Pseudocode

### Step 1 to T (Internal Ticks, like 75 thought steps):

At each tick:

1. **Build pre-activations** using the synapse MLP based on current state and attention over image.
    
2. Each neuron:
    
    - Looks at its own history of pre-activations.
        
    - Runs its **own private MLP** to decide its new post-activation.
        
3. Collect post-activations for all neurons → build a time history.
    
4. Compute **synchronization matrix** (how neuron pairs are co-active).
    
5. Use part of synchronization matrix to:
    
    - Read data (attention queries)
        
    - Predict output logits (dog/cat probabilities).
        
6. Store prediction & certainty for current tick.
    

Repeat this T times.

Final Step: Choose Which Tick(s) to Learn From

  • Compute loss and certainty at each tick.

  • Instead of just using the final tick, use:

    • t₁ = tick with lowest loss

    • t₂ = tick with highest certainty

  • Final loss: average of these two → encourages learning from both strong predictions and confident thoughts.
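
A much-simplified numpy sketch of the tick loop and synchronization matrix described above; the shapes, update rules, and the "private MLP" stand-ins are assumptions for readability, not the paper's exact equations.

```python
# Toy illustration of the CTM tick loop (simplified, not the paper's exact math).
import numpy as np

rng = np.random.default_rng(0)
D, T, M = 8, 75, 5                        # neurons, internal ticks, history length

W_syn = rng.normal(size=(D, D)) * 0.1     # stands in for the shared "synapse" MLP
W_neuron = rng.normal(size=(D, M)) * 0.1  # one private weight vector per neuron

post = np.zeros(D)
history = np.zeros((D, M))                # each neuron's last M pre-activations

for t in range(T):
    pre = np.tanh(W_syn @ post)                                 # 1. shared synapse step
    history = np.roll(history, -1, axis=1)
    history[:, -1] = pre                                        # 2. append to per-neuron history
    post = np.tanh(np.einsum("dm,dm->d", W_neuron, history))    # 3. private per-neuron update
    sync = np.outer(post, post)                                 # 4. toy synchronization matrix
    logits = sync.mean(axis=1)                                  # 5. read a prediction off it (toy)
# 6. In the real model, a loss and certainty are stored per tick, and the final
#    loss averages the lowest-loss tick and the most-certain tick.
```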

15.05.2025

  • As large language models (LLMs) become more deeply embedded in applications, ensuring their safe and secure operation is critical. Meta's LlamaFirewall (PurpleLlama/LlamaFirewall at main · meta-llama/PurpleLlama) is an open-source guardrail framework designed to serve as a final layer of defense against the security risks that come with deploying AI agents. It addresses challenges such as prompt injection, agent misalignment, and unsafe code generation, providing developers with the tools to build robust and secure AI systems.

20.05.2025

23.05.2025

  • The State of LLM Reasoning Model Inference - An article that explores recent research advancements in reasoning-optimized LLMs, with a particular focus on inference-time compute scaling.
  • [Why We Think | Lil'Log](https://lilianweng.github.io/posts/2025-05-01-thinking/) - Lilian Weng explores how giving LLMs extra "thinking time" and enabling them to show intermediate steps (like Chain-of-Thought) significantly improves their ability to solve complex problems.
  • LMCache - An LLM serving engine extension to reduce TTFT and increase throughput, especially under long-context scenarios.

28.05.2025

  • BAGEL is an open-source foundation model trained on diverse interleaved multimodal data, outperforming peers in reasoning, manipulation, and understanding → read the paper
  • Claude Opus 4 & Sonnet 4 by Anthropic introduces extended thinking and hybrid modes that allow parallel tool use, memory retention via local files, and state-of-the-art results on SWE-bench and agent workflows → read more
  • Claude Code by Anthropic
    Now GA with IDE integrations, background GitHub tasks, and a full SDK for custom agents. Extends Claude’s capabilities into hands-on dev tooling → read more
  • Gemma 3n by Google introduces a mobile-first, multimodal model designed for local inference with a 4B memory footprint and dynamic submodel creation for latency-quality tradeoffs → read more
  • Reward Reasoning Model by Microsoft Research and Tsinghua University proposes chain-of-thought reward modeling with test-time compute adaptation, enabling better alignment through self-evolved reasoning → read the paper
  • R3: Robust Rubric-Agnostic Reward Models introduces interpretable, generalizable reward modeling without fixed rubrics, improving alignment flexibility and transparency → read the paper
  • Panda is a pretrained model on synthetic chaotic systems that generalizes to real-world dynamics, even predicting PDEs with no retraining → read the paper
  • AceReason-Nemotron by Nvidia demonstrates that large-scale RL can outperform distillation in reasoning for both math and code, using curriculum-style training → read the paper
  • Neurosymbolic Diffusion Models improves symbolic reasoning accuracy by modeling dependencies through discrete diffusion, achieving better calibration and generalization → read the paper
  • MMaDA combines diffusion-based reasoning with unified chain-of-thought fine-tuning and a new RL algorithm (UniGRPO), outperforming SDXL and LLaMA-3 in multiple tasks → read the paper
  • UniVG-R1 reinforces visual grounding with CoT and difficulty-aware reinforcement learning, achieving top scores on multiple video/image grounding tasks → read the paper.
  • Web-Shepherd introduces a step-level reward model for web navigation, significantly improving trajectory evaluation accuracy and cost-efficiency → read the paper
  • Toto by Datadog a decoder-only foundation model with 151 million parameters for time series forecasting using observability metrics → read the paper

29.05.2025

  • How to build almost ANY LiDAR Object Detector! It's about building detector frameworks, rather than detector algorithms.

    Take, for example, VoxelNet.
    This pioneering approach comes directly from Apple's Project Titan lab, and most of the algorithms you see today come partly from this one.

    How does VoxelNet work? It "voxelizes" the point cloud (transforms the points into Minecraft-style voxels), and then uses 3D Convolutional Neural Networks to learn the features and process them.

    But this approach is slow.

    And this is when, in 2018, Holger Caesar and his team of researchers invented another algorithm: PointPillars.

    This approach makes VoxelNet real-time.

    How?

    By converting the 3D problem (3D Voxels, 3D CNNs, etc…), into a 2D problem (2D Pillars, 2D CNNs, …). Like a miracle, the pillar technique worked and made the algorithm real-time.

    But one question remains…
    How do you learn something as wide as 3D Deep Learning?

    Things like Voxelization (VFE) and 3D CNNs (3D Backbones) used in VoxelNet, but also 2D Transformations (used in PointPillars), or PointNets (PFE), and others…

    The truth is that the field was essentially using and re-using the same 11 blocks over and over again.
    And using these 11 blocks, you could build almost ANY architecture!

    Almost every algorithm you will see uses a combination of these 11 blocks, whether it's VoxelNet, PointPillars, SECOND, or more recent ones.

02.06.2025

05.06.2025

  • [RLHF 101: A Technical Tutorial on Reinforcement Learning from Human Feedback – Machine Learning Blog | ML@CMU | Carnegie Mellon University](https://blog.ml.cmu.edu/2025/06/01/rlhf-101-a-technical-tutorial-on-reinforcement-learning-from-human-feedback/)

11.06.2025

  • Why DeepSeek models are good at reasoning
  • SkyReels-V2 - An open-source video generation tool that generates cinematic videos. Supports script-to-video, lip-sync, music, LoRA effects, and storyboards. Runs locally and doesn't enforce any length restrictions.

17.06.2025

  • Local Deep Researcher - This walkthrough shows how to run DeepSeek-R1 locally using Ollama, load its 14B distilled model, and test JSON-mode outputs. You'll build a self-contained research agent that performs web search, summarization, and iterative reflection.
  • How to Choose Large Language Models: A Developer's Guide to LLMs - Learn how to evaluate LLMs using real-world benchmarks, open-source leaderboards, and your own data. This guide compares proprietary and open models, shows how to run Granite with Ollama, and walks through RAG setups.

19.06.2025

25.06.2025

  • [Basic facts about GPUs | Damek Davis' Website](https://damek.github.io/random/basic-facts-about-gpus/) This article discusses how GPUs work, covering topics like compute and memory hierarchy, performance regimes, strategies for increasing performance, and more.

27.06.2025

30.06.2025

04.07.2025

  • https://github.com/SakanaAI/treequest - A flexible answer tree search library featuring AB-MCTS, useful for (but not limited to) LLM inference-time scaling.
  • https://www.dyad.sh/ - Free, local, open-source alternative to Lovable / v0 / Bolt.

08.07.2025

  • A prompt that writes your Entire Business Plan in minutes.
  • Research that you need to know
    • New Methods for Boosting Reasoning in Small and Large Models from Microsoft Research
      • rStar-Math: Brings deep reasoning capabilities to small models (1.5B–7B parameters) using:
        • Monte Carlo Tree Search (MCTS),
        • Process-level supervision via preference modeling,
        • Iterative self-improvement cycles.
      • Logic-RL framework: Rewards the model only if both the reasoning process and the final answer are correct.
      • LIPS: Blends symbolic reasoning with LLM capabilities (neural reasoning) for inequality proofs.
      • Chain-of-Reasoning (CoR): Unifies reasoning across natural language, code, and symbolic math, dynamically blending all three aspects for cross-domain generalization.
    • R3GAN: It derives a regularized relativistic GAN loss that leads to stability and convergence, removing the need for heuristics and allowing the use of modern architectures
    • Transformers without Normalization: Meta proposes Dynamic Tanh (DyT) – a super simple and efficient function that mimics how normalization works.
      • DyT works just as well as normalization layers (or even better)
      • It doesn’t need extra calculations
      • Requires less tuning
      • Works for images, language, supervised learning, and even self-supervised learning

16.07.2025

  • A Reddit user deposited $400 into Robinhood, then let ChatGPT pick option trades. 100% win rate over 10 days. He uploads spreadsheets and screenshots with detailed fundamentals, options chains, technical indicators, and macro data, then tells each model to filter that information and propose trades that fit strict probability-of-profit and risk limits. He still places and closes orders manually but plans to keep the head-to-head test running for 6 months. This is his prompt:

    System Instructions You are ChatGPT, Head of Options Research at an elite quant fund. Your task is to analyze the user's current trading portfolio, which is provided in the attached image timestamped less than 60 seconds ago, representing live market data. Data Categories for Analysis Fundamental Data Points: Earnings Per Share (EPS) Revenue Net Income EBITDA Price-to-Earnings (P/E) Ratio Price/Sales Ratio Gross & Operating Margins Free Cash Flow Yield Insider Transactions Forward Guidance PEG Ratio (forward estimates) Sell-side blended multiples Insider-sentiment analytics (in-depth) Options Chain Data Points: Implied Volatility (IV) Delta, Gamma, Theta, Vega, Rho Open Interest (by strike/expiration) Volume (by strike/expiration) Skew / Term Structure IV Rank/Percentile (after 52-week IV history) Real-time (< 1 min) full chains Weekly/deep Out-of-the-Money (OTM) strikes Dealer gamma/charm exposure maps Professional IV surface & minute-level IV Percentile Price & Volume Historical Data Points: Daily Open, High, Low, Close, Volume (OHLCV) Historical Volatility Moving Averages (50/100/200-day) Average True Range (ATR) Relative Strength Index (RSI) Moving Average Convergence Divergence (MACD) Bollinger Bands Volume-Weighted Average Price (VWAP) Pivot Points Price-momentum metrics Intraday OHLCV (1-minute/5-minute intervals) Tick-level prints Real-time consolidated tape Alternative Data Points: Social Sentiment (Twitter/X, Reddit) News event detection (headlines) Google Trends search interest Credit-card spending trends Geolocation foot traffic ([http://Placer.ai](https://t.co/vJwNz3ugSB)) Satellite imagery (parking-lot counts) App-download trends (Sensor Tower) Job postings feeds Large-scale product-pricing scrapes Paid social-sentiment aggregates Macro Indicator Data Points: Consumer Price Index (CPI) GDP growth rate Unemployment rate 10-year Treasury yields Volatility Index (VIX) ISM Manufacturing Index Consumer Confidence Index Nonfarm Payrolls Retail Sales Reports Live FOMC minute text Real-time Treasury futures & SOFR curve ETF & Fund Flow Data Points: SPY & QQQ daily flows Sector-ETF daily inflows/outflows (XLK, XLF, XLE) Hedge-fund 13F filings ETF short interest Intraday ETF creation/redemption baskets Leveraged-ETF rebalance estimates Large redemption notices Index-reconstruction announcements Analyst Rating & Revision Data Points: Consensus target price (headline) Recent upgrades/downgrades New coverage initiations Earnings & revenue estimate revisions Margin estimate changes Short interest updates Institutional ownership changes Full sell-side model revisions Recommendation dispersion Trade Selection Criteria Number of Trades: Exactly 5 Goal: Maximize edge while maintaining portfolio delta, vega, and sector exposure limits. Hard Filters (discard trades not meeting these): Quote age ≤ 10 minutes Top option Probability of Profit (POP) ≄ 0.65 Top option credit / max loss ratio ≄ 0.33 Top option max loss ≤ 0.5% of $100,000 NAV (≤ $500) Selection Rules Rank trades by model_score. Ensure diversification: maximum of 2 trades per GICS sector. Net basket Delta must remain between [-0.30, +0.30] Ɨ (NAV / 100k). Net basket Vega must remain ≄ -0.05 Ɨ (NAV / 100k). In case of ties, prefer higher momentum_z and flow_z scores. Output Format Provide output strictly as a clean, text-wrapped table including only the following columns: Ticker Strategy Legs Thesis (≤ 30 words, plain language) POP Additional Guidelines Limit each trade thesis to ≤ 30 words. 
Use straightforward language, free from exaggerated claims. Do not include any additional outputs or explanations beyond the specified table. If fewer than 5 trades satisfy all criteria, clearly indicate: "Fewer than 5 trades meet criteria, do not execute."
    

17.07.2025

  • To be a better programmer, write little proofs in your head - One trick that helps developers write code faster and more accurately is to sketch proofs in your head when you're working on something difficult. Doing this without interrupting flow takes a lot of practice, but once you get good at it, you'll find a surprising amount of code will work on the first or second try. This article demonstrates a few examples of how this technique can be applied.

18.07.2025

22.07.2025

24.07.2025

31.07.2025

  • [The Anatomy of a Modern LLM: A complete walkthrough of the… by Damian Tran, Jul 2025, Medium](https://medium.com/@damianvtran/the-anatomy-of-a-modern-llm-0347afd72514)

04.08.2025

07.08.2025

13.08.2025

19.08.2025

21.08.2025

27.08.2025

02.09.2025

09.09.2025

11.09.2025

18.09.2025

29.09.2025

14.10.2025

23.10.2025

24.10.2025

31.10.2025