![[6f74d4fa-992f-4e2b-97d6-f3b26762151b.mp3]]

阿里开源QwenLong-L1长文本推理模型，支持128K上下文窗口

简报：

阿里通义千问团队开源QwenLong-L1框架，推出首个通过强化学习训练的长文本推理模型QwenLong-L1-32B
该模型支持最高131072 tokens的上下文窗口，在7个长文本DocQA基准测试中表现超越OpenAI-o3-mini等旗舰模型
采用渐进式上下文扩展技术，结合GRPO和DAPO算法及混合奖励函数，显著提升长文本推理准确性
模型具备"翻书回溯"能力，能有效过滤干扰信息并整合关键数据进行多步推理
开源内容包括32B参数模型、优化训练数据集和创新强化学习方法

Alibaba Open-Sources QwenLong-L1 Long-Context Reasoning Model, Supporting a 128K-Token Context Window

Brief:

Alibaba's Tongyi Qianwen (Qwen) team has open-sourced the QwenLong-L1 framework and released QwenLong-L1-32B, which it describes as the first long-context reasoning model trained with reinforcement learning. 📚
The model supports a context window of up to 131,072 tokens and, across seven long-context DocQA benchmarks, outperforms flagship models such as OpenAI o3-mini. 🚀
It uses progressive context scaling, combined with the GRPO and DAPO algorithms and a hybrid reward function, to substantially improve long-context reasoning accuracy.
The model can "flip back through the book": it filters out distracting information and integrates key facts for multi-step reasoning. 🧠
The open-source release includes the 32B-parameter model, an optimized training dataset, and the new reinforcement learning method.
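To make the GRPO component concrete, here is a minimal sketch of group-relative advantages: several answers are sampled for the same prompt, each receives a reward, and each advantage is that reward standardized against the group's mean and standard deviation. The hybrid reward shown (the larger of a rule-based exact-match check and an LLM-judge score) is an assumption for illustration; `rule_reward`, `hybrid_reward`, the judge scores, and the sample answers are hypothetical, not QwenLong-L1's released code.

```python
import statistics


def rule_reward(answer: str, reference: str) -> float:
    """Hypothetical rule-based check: 1.0 on a case-insensitive exact match, else 0.0."""
    return 1.0 if answer.strip().lower() == reference.strip().lower() else 0.0


def hybrid_reward(answer: str, reference: str, judge_score: float) -> float:
    """Assumed hybrid reward: the larger of the rule-based score and a judge model's score."""
    return max(rule_reward(answer, reference), judge_score)


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: standardize each reward within its sampled group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Four sampled answers to one long-document question, with hypothetical judge scores.
reference = "42 million"
samples = [("42 million", 0.0), ("about 40 million", 0.7), ("12 million", 0.0), ("42 Million", 0.0)]
rewards = [hybrid_reward(ans, reference, judge) for ans, judge in samples]
advantages = group_advantages(rewards)
print(rewards)  # → [1.0, 0.7, 0.0, 1.0]
print([round(a, 3) for a in advantages])  # above-average answers get positive advantages
```

Because advantages are relative to the group, no separate value network is needed; answers better than their siblings are reinforced and worse ones are discouraged.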

Reinforcement /ˌriːɪnˈfɔːrs·mənt/

n. 加强；增援；强化

▶ "The company invested in the reinforcement of its cybersecurity system."

[例句] 公司投资加强了其网络安全系统。

◼ 词根分析

re-

再，重新

inforce

加强，强化

◼ 衍生词

reinforce (v.) 加强；增强

reinforced (adj.) 加强的；加固的

大模型长文本生成能力普遍不足，最大输出长度被过度宣传

简报：

最新研究论文《LIFEBENCH: Evaluating Length Instruction Following in Large Language Models》提出全新基准测试集LIFEBENCH，评估26个大语言模型在长度指令遵循方面的表现
测试结果显示，当被明确要求生成特定长度文本时，大多数模型表现糟糕，存在生成内容不足、重复啰嗦或直接拒绝生成等问题
LIFEBENCH基准测试集涵盖问答、摘要、推理和创意生成四类自然语言生成任务，包含短输入(<2000字)和长输入(>2000字)场景
研究指出当前大模型在长文本生成任务中普遍表现不佳，最大输出长度存在过度宣传现象

Large Language Models Generally Lack Long-Text Generation Capabilities, Maximum Output Length Overhyped

Brief:

A recent paper, LIFEBENCH: Evaluating Length Instruction Following in Large Language Models, introduces LIFEBENCH, a new benchmark that measures how well large language models follow length instructions, and uses it to evaluate 26 models.
Most models perform poorly when explicitly asked to produce text of a specific length, often generating too little content, repetitive or verbose output, or refusing outright.
LIFEBENCH covers four categories of natural language generation tasks (question answering, summarization, reasoning, and creative writing) and includes both short-input (<2000 words) and long-input (>2000 words) scenarios.
The study concludes that current large language models generally underperform on long-text generation and that their advertised maximum output lengths are overstated. 📊💻😕
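A minimal sketch of what checking a length instruction involves, assuming word counts come from whitespace splitting and that instructions take three forms (exactly, at most, or at least N words). The function names, the 10% tolerance, and the signed relative-deviation measure are hypothetical illustrations, not LIFEBENCH's official metric.

```python
def length_deviation(text: str, target: int) -> float:
    """Signed relative deviation of the output's word count from the target."""
    return (len(text.split()) - target) / target


def follows_length_instruction(text: str, target: int, mode: str, tolerance: float = 0.10) -> bool:
    """Hypothetical compliance check for 'equal', 'at_most', and 'at_least' constraints."""
    n = len(text.split())
    if mode == "equal":
        return abs(n - target) <= tolerance * target
    if mode == "at_most":
        return n <= target
    if mode == "at_least":
        return n >= target
    raise ValueError(f"unknown mode: {mode}")


# A 'write 100 words' request answered with only 60 words falls short.
short_answer = " ".join(["word"] * 60)
print(length_deviation(short_answer, 100))  # → -0.4
print(follows_length_instruction(short_answer, 100, "equal"))  # → False
```

A negative deviation captures the "insufficient content" failure described above; a large positive one captures padding.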

Verbose /vɜːrˈboʊs/

adj. 冗长的；啰嗦的

▶ "His verbose style makes the report difficult to read."

[例句] 他的报告由于文风冗长而难以阅读。

◼ 词根分析

verb-

词语，言语（word）

-ose

多…的

◼ 衍生词

verbosity (n.) 冗长；赘言

波士顿动力Atlas机器人升级3D感知能力，实现精准工业操作

简报：

波士顿动力Atlas机器人升级3D空间感知和实时物体追踪能力，可自主执行复杂工业任务
演示视频显示Atlas能360度转动头部识别掉落零件，并在人为干扰下精准完成放置操作
新系统融合2D与3D感知技术、物体位姿追踪及基于物理特性的精确校准方案
官方YouTube视频获十余万观看，网友称赞其环境适应能力和纠错功能
技术细节显示机器人需解决金属零件反光、低对比度物体识别等工业场景挑战

Boston Dynamics Atlas Robot Upgrades 3D Perception for Precision Industrial Operations

Brief:

Boston Dynamics' Atlas robot has upgraded its 3D spatial perception and real-time object tracking capabilities, enabling it to autonomously perform complex industrial tasks. 🤖
A demonstration video shows Atlas rotating its head 360 degrees to identify fallen parts and accurately complete placement tasks even under human interference.
The new system integrates 2D and 3D perception technologies, object pose tracking, and precise calibration based on physical properties.
The official YouTube video has drawn over 100,000 views, with viewers praising the robot's adaptability to its environment and its error-correction abilities. 👍
Technical details reveal that the robot must overcome challenges in industrial settings, such as reflections from metal parts and recognition of low-contrast objects. 🔧
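Object pose tracking typically ends in a coordinate transform: perception estimates a part's pose in the camera frame, and the controller needs it in the robot's base frame. The sketch below composes 4×4 homogeneous transforms in plain Python; the frame names, the "camera x-axis points ahead" convention, and the numbers are hypothetical and not taken from Boston Dynamics' system.

```python
import math

Matrix = list[list[float]]


def pose(yaw: float, x: float, y: float, z: float) -> Matrix:
    """Homogeneous transform: rotation about z by `yaw` radians, then translation."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [
        [c, -s, 0.0, x],
        [s, c, 0.0, y],
        [0.0, 0.0, 1.0, z],
        [0.0, 0.0, 0.0, 1.0],
    ]


def compose(a: Matrix, b: Matrix) -> Matrix:
    """Matrix product a @ b: maps b's child frame into a's parent frame."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


# Hypothetical setup: the head camera is turned 90 degrees left of the robot base,
# 1.5 m up, and the tracker reports a part 1 m along the camera's x-axis.
base_T_camera = pose(math.pi / 2, 0.0, 0.0, 1.5)
camera_T_part = pose(0.0, 1.0, 0.0, 0.0)
base_T_part = compose(base_T_camera, camera_T_part)
print([round(base_T_part[i][3], 3) for i in range(3)])  # → [0.0, 1.0, 1.5]
```

Turning the head changes only `base_T_camera`; the same composition then places a dropped part correctly in the base frame, whichever way the head faces.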

Autonomously /ɔːˈtɒn.ə.məs.li/

adv. 自动地；自主地

▶ "The robot is capable of working autonomously for several hours."

[例句] 该机器人能够自主工作数小时。

◼ 词根分析

auto-

自己

-nomous/-nomy

法则，管理

◼ 衍生词

autonomous (adj.) 自主的

autonomy (n.) 自主，自治

NUS邵林团队提出机器人装配技能学习框架Manual2Skill，获RSS 2025收录

简报：

新加坡国立大学邵林团队提出Manual2Skill框架，使机器人能通过视觉说明书自主理解并执行家具装配任务
该方法基于视觉语言模型(VLMs)，解决了复杂长时程任务中人类演示数据和训练样本稀缺的问题
研究弥合了抽象指令与物理执行之间的鸿沟，显著提升机器人在真实操作场景中的实用性
论文已被机器人领域顶级会议Robotics: Science and Systems XXI(RSS 2025)接收

NUS Team Led by Lin Shao Proposes Robot Assembly Skill Learning Framework Manual2Skill, Accepted to RSS 2025

Brief:

Lin Shao's team at the National University of Singapore introduces Manual2Skill, a framework that enables robots to understand and carry out furniture assembly tasks autonomously from visual instruction manuals 📖.
Built on vision-language models (VLMs), the approach addresses the scarcity of human demonstration data and training samples for complex, long-horizon tasks 🤖.
The work bridges the gap between abstract instructions and physical execution, making robots markedly more practical in real-world manipulation settings 💪.
The paper has been accepted to Robotics: Science and Systems XXI (RSS 2025), a top-tier robotics conference.
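One way to picture the step from abstract instructions to an executable plan: if a VLM turns a manual's pages into a hierarchy of subassemblies, the robot can build each subassembly only after all its components are ready, which is a post-order traversal. The tree format and part names below are hypothetical; this illustrates hierarchical assembly ordering in general, not Manual2Skill's released code.

```python
from dataclasses import dataclass, field


@dataclass
class Assembly:
    """A node in a hypothetical assembly hierarchy: a raw part or a subassembly."""
    name: str
    children: list["Assembly"] = field(default_factory=list)


def assembly_order(node: Assembly) -> list[str]:
    """Post-order traversal: every subassembly is listed after all of its components."""
    steps: list[str] = []
    for child in node.children:
        steps.extend(assembly_order(child))
    if node.children:  # leaves are raw parts, so only composite nodes need an assembly step
        steps.append(node.name)
    return steps


# Hypothetical stool parsed from a manual: a frame of four legs, topped with a seat.
stool = Assembly("stool", [
    Assembly("frame", [Assembly(f"leg{i}") for i in range(1, 5)]),
    Assembly("seat"),
])
print(assembly_order(stool))  # → ['frame', 'stool']
```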

Scarcity /ˈsker·sə·ti/

n. 稀缺，缺乏

▶ "The scarcity of clean drinking water is a major concern in many countries."

[例句] 清洁饮用水的稀缺在许多国家是一个重大问题。

◼ 词根分析

scarce

缺乏的

-ity

名词后缀

◼ 衍生词

scarce (adj.) 稀有的

腾讯ARC Lab推出Video-Holmes测试，所有大模型视频推理能力均未达标

简报：

腾讯ARC Lab和香港城市大学联合推出Video-Holmes基准测试，专门评估多模态大模型的复杂视频推理能力
测试内容包括"推理杀人凶手"、"解析作案意图"等高难度任务，要求模型主动思考视觉信息并关联分散线索
所有参与测试的大模型在各项指标（社会推理、意图与动机链、时间因果推理等）中均未达到及格标准
Video-Holmes已上线GitHub和HuggingFace平台，提供一键测评功能供研究者挑战

相关链接：

视频推理界的"福尔摩斯测试"：所有大模型，统统不及格

Tencent ARC Lab Launches Video-Holmes Test, All Large Models Fail to Meet Video Reasoning Standards

Brief:

Tencent ARC Lab, in collaboration with City University of Hong Kong, introduces Video-Holmes, a benchmark designed to evaluate the complex video reasoning capabilities of multimodal large models 🧠
The tasks are deliberately hard, such as "identifying the murderer" and "analyzing the motive for a crime," and require models to actively reason over visual information and connect scattered clues 🔍
Every model tested fell short of a passing score on every dimension, including social reasoning, intent and motive chaining, and temporal causal reasoning 📉
Video-Holmes is available on GitHub and Hugging Face, with one-click evaluation for researchers who want to take on the challenge

上海人工智能实验室开源Linear-MoE框架，实现线性序列建模与混合专家高效结合

简报：

上海人工智能实验室团队最新成果Linear-MoE首次系统性地实现了线性序列建模与混合专家(MoE)的高效结合
该研究开源了完整技术框架，包括Modeling和Training两大部分，支持层间混合架构
近期广受好评的MiniMax-01模型和腾讯混元TurboS模型均属于Linear-MoE架构
线性序列建模技术具有线性时间复杂度的训练和恒定内存占用的推理优势
项目填补了Linear-MoE架构开源实现的空白，为下一代基础模型研发提供工具

Shanghai AI Lab Open-Sources Linear-MoE Framework, Achieving Efficient Integration of Linear Sequence Modeling and Mixture of Experts

Brief:

The team at Shanghai AI Laboratory has unveiled their latest achievement, Linear-MoE, marking the first systematic integration of linear sequence modeling with Mixture of Experts (MoE) for enhanced efficiency. 🚀
The team has open-sourced a complete technical framework with two parts, Modeling and Training, that supports inter-layer hybrid architectures.
The recently acclaimed MiniMax-01 model and Tencent's Hunyuan TurboS model both follow the Linear-MoE architecture. 🌟
Linear sequence modeling technology offers the advantages of linear time complexity in training and constant memory usage during inference.
The project fills the gap in open-source implementations of the Linear-MoE architecture, providing a valuable tool for the development of next-generation foundational models. 💡
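The "linear-time training, constant-memory inference" claim comes from rewriting attention as a recurrence. In unnormalized causal linear attention, the state S_t = S_{t-1} + k_t v_tᵀ has a fixed size (d_k × d_v) however long the sequence grows, and each output is o_t = S_tᵀ q_t. The pure-Python sketch below checks that this recurrent form matches the quadratic "attend to every earlier token" form. It deliberately omits normalization, gating, and decay, which practical linear-sequence-modeling layers usually add, and it is not Linear-MoE's code.

```python
import random

Vector = list[float]
Matrix = list[list[float]]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def rand_matrix(rows: int, cols: int) -> Matrix:
    """Random rows standing in for per-token query, key, or value vectors."""
    return [[random.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def linear_attention_recurrent(qs: Matrix, ks: Matrix, vs: Matrix) -> Matrix:
    """O(n) time and O(d_k * d_v) memory: carry one fixed-size state across tokens."""
    d_k, d_v = len(ks[0]), len(vs[0])
    state = [[0.0] * d_v for _ in range(d_k)]  # S_t, shape d_k x d_v, independent of n
    outputs = []
    for q, k, v in zip(qs, ks, vs):
        for i in range(d_k):  # S_t = S_{t-1} + k_t v_t^T
            for j in range(d_v):
                state[i][j] += k[i] * v[j]
        # o_t = S_t^T q_t
        outputs.append([sum(q[i] * state[i][j] for i in range(d_k)) for j in range(d_v)])
    return outputs


def linear_attention_quadratic(qs: Matrix, ks: Matrix, vs: Matrix) -> Matrix:
    """O(n^2) reference: o_t = sum over s <= t of (q_t . k_s) v_s."""
    d_v = len(vs[0])
    outputs = []
    for t, q in enumerate(qs):
        out = [0.0] * d_v
        for s in range(t + 1):
            w = dot(q, ks[s])
            out = [o + w * x for o, x in zip(out, vs[s])]
        outputs.append(out)
    return outputs


random.seed(0)
n, d_k, d_v = 6, 3, 2
qs, ks, vs = rand_matrix(n, d_k), rand_matrix(n, d_k), rand_matrix(n, d_v)
rec = linear_attention_recurrent(qs, ks, vs)
quad = linear_attention_quadratic(qs, ks, vs)
max_gap = max(abs(a - b) for ra, rb in zip(rec, quad) for a, b in zip(ra, rb))
print(max_gap)  # on the order of floating-point rounding error
```

During inference only `state` has to be kept between tokens, which is why memory stays constant as the context grows; standard softmax attention instead keeps a key/value cache that grows with every token.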

Systematic /ˌsɪs.təˈmæt.ɪk/

adj. 系统的；有条理的

▶ "The company needs a systematic approach to improve its product quality."

[例句] 该公司需要一种系统的方法来提升产品质量。

◼ 词根分析

system

系统

-atic

……的（形容词后缀）

◼ 衍生词

systematically (adv.) 系统地

system (n.) 系统

阿里巴巴开源WebAgent AI智能体，具备多步推理能力

简报：

阿里巴巴于2025年5月30日在GitHub开源自主搜索AI智能体WebAgent
WebAgent具备端到端自主信息检索与多步推理能力，可模拟人类网络搜索行为
该智能体由WebDancer（训练框架）和WebWalker（基准测试）两部分组成
核心功能包括主动搜索学术数据库、筛选文献、整合观点并生成研究报告
WebDancer框架采用创新的数据合成方法，结合监督微调和强化学习技术

Alibaba Open-Sources WebAgent AI Agent with Multi-Step Reasoning Capabilities

Brief:

Alibaba open-sourced its autonomous search AI agent, WebAgent, on GitHub on May 30, 2025
WebAgent features end-to-end autonomous information retrieval and multi-step reasoning abilities, mimicking human web search behavior 🕵️‍♂️
The agent consists of two components: WebDancer (training framework) and WebWalker (benchmarking tool)
Core functionalities include actively searching academic databases, filtering literature, integrating perspectives, and generating research reports 📚
The WebDancer framework employs an innovative data synthesis method, combining supervised fine-tuning and reinforcement learning techniques 💡
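Multi-step web search agents are commonly structured as a ReAct-style loop: the model alternates between deciding on an action (such as a search), reading the result, and eventually answering. The sketch below shows only that control flow, with a scripted policy standing in for the LLM and an in-memory dictionary standing in for a search engine; `scripted_policy`, `fake_search`, `CORPUS`, and the `("search" | "answer", text)` action format are hypothetical, not WebAgent's API.

```python
from typing import Callable

# Hypothetical in-memory corpus standing in for a real search backend.
CORPUS = {
    "QwenLong-L1 context window": "QwenLong-L1-32B supports up to 131,072 tokens.",
}


def fake_search(query: str) -> str:
    """Stub search tool: exact-key lookup in the toy corpus."""
    return CORPUS.get(query, "no results")


def scripted_policy(question: str, history: list[tuple[str, str]]) -> tuple[str, str]:
    """Stand-in for the LLM: search once, then answer from the latest observation."""
    if not history:
        return ("search", "QwenLong-L1 context window")
    return ("answer", history[-1][1])


def run_agent(question: str, policy: Callable, max_steps: int = 5) -> str:
    """Loop: ask the policy for an action, execute tool calls, stop on an answer."""
    history: list[tuple[str, str]] = []  # (query, observation) pairs
    for _ in range(max_steps):
        action, arg = policy(question, history)
        if action == "answer":
            return arg
        if action == "search":
            history.append((arg, fake_search(arg)))
    return "gave up after max_steps"


print(run_agent("How long a context does QwenLong-L1 support?", scripted_policy))
```

The step cap matters in practice: a policy that keeps searching without converging must still terminate.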

Benchmarking /ˈbɛn(t)ʃˌmɑːrkɪŋ/

n. 基准测试；标杆管理

▶ "Benchmarking helps companies improve their performance by comparing processes and metrics to industry bests."

[例句] 基准测试通过将流程和指标与行业最佳进行比较，帮助企业提升业绩。

◼ 词根分析

bench + mark

长凳 + 标记（基准，标准）

-ing

名词后缀，表示动作或过程

◼ 衍生词

benchmark (n./v.) 基准；以…为标准

华为发布准万亿MoE模型Pangu Ultra，全流程基于昇腾NPU训练

简报：

华为盘古团队发布参数规模7180亿的Pangu Ultra MoE模型，全流程在昇腾NPU上训练
采用256个路由专家架构，每个token激活8个专家，总参数量718B，激活量39B
创新性提出DSSN稳定架构和TinyInit小初始化方法，实现10T tokens数据长期稳定训练
使用EP group loss负载优化方法，保证专家负载均衡并提升领域特化能力
预训练阶段昇腾Atlas 800T A2万卡集群MFU提升至41%，推理吞吐达35K Tokens/s

Huawei Releases Near-Trillion-Parameter MoE Model Pangu Ultra, Fully Trained on Ascend NPU

Brief:

Huawei's Pangu team unveils Pangu Ultra MoE, a 718-billion-parameter model trained end to end on Ascend NPUs. 🚀
It uses 256 routed experts and activates 8 of them per token, for 718B total parameters and 39B activated parameters.
It introduces the DSSN stable architecture and the TinyInit small-initialization method, achieving stable long-run training on 10T tokens of data. 💡
An EP group loss keeps expert load balanced while strengthening domain-specific specialization.
During pre-training, MFU on a 10,000-NPU Ascend Atlas 800T A2 cluster rose to 41%, and inference throughput reached 35K tokens/s. 📈
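The "256 routed experts, 8 active per token" design is what keeps per-token compute far below the model's total size: about 39B of 718B parameters, roughly 5.4%, are used for each token. The sketch below shows generic top-k routing with softmax weights renormalized over the chosen experts; the gating function and the random logits are assumptions, and Pangu Ultra MoE's actual router (including any shared experts and the EP group loss) is not reproduced here.

```python
import math
import random


def top_k_route(logits: list[float], k: int) -> list[tuple[int, float]]:
    """Pick the k highest-scoring experts and softmax their logits so the weights sum to 1."""
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in chosen)  # subtract the max for numerical stability
    exps = {i: math.exp(logits[i] - m) for i in chosen}
    total = sum(exps.values())
    return [(i, exps[i] / total) for i in chosen]


NUM_EXPERTS, TOP_K = 256, 8  # figures from the brief above
random.seed(0)
router_logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]  # stand-in for a learned router
routes = top_k_route(router_logits, TOP_K)
print([i for i, _ in routes])  # indices of the 8 experts this token is sent to
print(round(sum(w for _, w in routes), 6))  # → 1.0
print(f"active share of parameters: {39 / 718:.1%}")  # → active share of parameters: 5.4%
```

Each token's output is the weighted sum of its 8 selected experts' outputs; if routing concentrates on a few experts, those devices become bottlenecks, which is the problem load-balancing losses such as the EP group loss are meant to address.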

Expert /ˈek·spɜːt/

n. 专家

▶ "She is an expert in artificial intelligence and robotics."

[例句] 她是人工智能和机器人学方面的专家。

◼ 词根分析

ex-

外，超出

-pert（peritus）

有经验的

◼ 衍生词

expertise (n.) 专门知识

expertly (adv.) 熟练地

目录

阿里开源QwenLong-L1长文本推理模型，支持128K上下文窗口

Alibaba Open-Sources QwenLong-L1 Long-Context Reasoning Model, Supporting a 128K-Token Context Window

大模型长文本生成能力普遍不足，最大输出长度被过度宣传

Large Language Models Generally Lack Long-Text Generation Capabilities, Maximum Output Length Overhyped

波士顿动力Atlas机器人升级3D感知能力，实现精准工业操作

Boston Dynamics Atlas Robot Upgrades 3D Perception for Precision Industrial Operations

NUS邵林团队提出机器人装配技能学习框架Manual2Skill，获RSS 2025收录

NUS Team Led by Lin Shao Proposes Robot Assembly Skill Learning Framework Manual2Skill, Accepted to RSS 2025

腾讯ARC Lab推出Video-Holmes测试，所有大模型视频推理能力均未达标

Tencent ARC Lab Launches Video-Holmes Test, All Large Models Fail to Meet Video Reasoning Standards

上海人工智能实验室开源Linear-MoE框架，实现线性序列建模与混合专家高效结合

Shanghai AI Lab Open-Sources Linear-MoE Framework, Achieving Efficient Integration of Linear Sequence Modeling and Mixture of Experts

阿里巴巴开源WebAgent AI智能体，具备多步推理能力

Alibaba Open-Sources WebAgent AI Agent with Multi-Step Reasoning Capabilities

华为发布准万亿MoE模型Pangu Ultra，全流程基于昇腾NPU训练

Huawei Releases Near-Trillion-Parameter MoE Model Pangu Ultra, Fully Trained on Ascend NPU