2025-06-02
Brief News

目录

AI Agent Zochi Independently Completes Paper Accepted by Top Conference ACL, Scoring in Top 8.2%
Alibaba Open-Sources Autonomous Search AI Agent WebAgent with Multi-Step Reasoning Capabilities
Large Models Underperform in Encrypted Data Evaluation, Qwen3 Accuracy Below 10%
Huawei Releases 718 Billion Parameter Pangu Ultra MoE Model, Fully Trained on Ascend NPU
Baidu PaddlePaddle Launches AIGC Application Development Bootcamp to Boost AI Innovation in Media Industry
Boston Dynamics Atlas Robot Upgrades 3D Perception, Precisely Completes Industrial Tasks
DeepSeek-R1 0528 Version Released as Open Source, Coding Skills Rival Claude 4

AI智能体Zochi独立完成论文被顶会ACL录用,评分位列前8.2%

简报:

  • Intology AI开发的博士级智能体Zochi独立完成的研究论文被自然语言处理领域顶级会议ACL 2025主会议录用
  • Zochi是首个能独立完成从假设提出到论文发表全流程的AI系统,论文平均得分7.67分
  • 论文提出突破大模型安全的Tempest框架,通过"多轮对话树搜索"实现97%的越狱攻击成功率
  • ACL主会议录用率仅约20%,Zochi论文评分位列投稿前8.2%,达到博士级科研水平
  • Intology已开放Zochi的Beta测试注册

相关链接:

AI Agent Zochi Independently Completes Paper Accepted by Top Conference ACL, Scoring in Top 8.2%

Newsletter:

  • Zochi, a PhD-level AI agent developed by Intology AI, independently completed a research paper that was accepted to the main conference of ACL 2025, a top-tier venue in natural language processing 🌟
  • Zochi is the first AI system able to independently carry out the entire process from hypothesis formulation to paper publication; the paper received an average review score of 7.67
  • The paper proposes Tempest, a framework for breaking through large-model safety measures, which reaches a 97% jailbreak success rate using "multi-turn dialogue tree search" 🔍
  • The ACL main conference accepts only about 20% of submissions; Zochi's paper scored in the top 8.2% of submissions, demonstrating PhD-level research capability 🎓
  • Intology has opened Beta testing registration for Zochi

Related Links:

Formulation /ˌfɔːr.mjəˈleɪ.ʃən/
n. 表达;配方;公式化
"The formulation of the new policy took several months of debate."
[例句] 新政策的制定经过了几个月的讨论。
词根分析
form-
形式,形成
-ation
行为或过程
衍生词
formulate (v.) 构想,规划
formulative (adj.) 形成的;表述的

阿里巴巴开源自主搜索AI智能体WebAgent,具备多步推理能力

简报:

  • 阿里巴巴于2025年5月30日在GitHub开源自主搜索AI智能体WebAgent,具备端到端自主信息检索与多步推理能力
  • WebAgent能模拟人类网络行为,主动搜索多个学术数据库,筛选相关文献并进行深入分析和总结
  • 系统包含WebDancer(智能体训练框架)和WebWalker(LLM基准测试)两大组件
  • WebDancer框架采用创新的数据合成方法,通过短推理和长推理两种方式生成训练数据
  • 该技术可整合不同文献观点,为用户提供全面精准的研究报告

相关链接:

Alibaba Open-Sources Autonomous Search AI Agent WebAgent with Multi-Step Reasoning Capabilities

Newsletter:

  • On May 30, 2025, Alibaba open-sourced WebAgent, an autonomous search AI agent, on GitHub; it offers end-to-end autonomous information retrieval and multi-step reasoning. 🌐
  • WebAgent can mimic human online behavior, proactively search multiple academic databases, filter relevant literature, and conduct in-depth analysis and summarization. 📚
  • The system comprises two main components: WebDancer (an agent training framework) and WebWalker (an LLM benchmark).
  • The WebDancer framework employs an innovative data synthesis method, generating training data through both short and long reasoning approaches.
  • This technology can integrate diverse perspectives from literature, providing users with comprehensive and accurate research reports. 🚀

Related Links:

Mimic /ˈmɪm.ɪk/
v. 模仿;n. 模仿者
"She can mimic the accent of almost anyone she meets."
[例句] 她几乎能模仿她遇到的任何人的口音。
词根分析
mim-
模仿
-ic
形容词/名词后缀
衍生词
mimicry (n.) 模仿

大模型在加密数据评测中表现不佳,Qwen3准确率不足10%

简报:

  • 上海AI Lab等机构联合推出CipherBank评测,测试大语言模型在密码学解密任务中的表现
  • 评测结果显示当前大模型在解密任务上整体表现不佳,最优模型准确率未过半
  • Claude-3.5-Sonnet和o1表现最佳,DeepSeek系列略优于通用模型
  • GPT-4o、Gemini等模型表现平庸,Qwen2.5、Llama3.1等开源模型表现较差
  • 最新Qwen3系列模型表现不佳,30B和32B模型准确率均未超过10%

相关链接:

Large Models Underperform in Encrypted Data Evaluation, Qwen3 Accuracy Below 10%

Newsletter:

  • Shanghai AI Lab and other institutions have jointly released the CipherBank benchmark, which tests how large language models perform on cryptographic decryption tasks
  • The results show that current large models perform poorly on decryption overall; even the best model's accuracy stays below 50%
  • Claude-3.5-Sonnet and o1 perform best, and the DeepSeek series slightly outperforms general-purpose models
  • GPT-4o, Gemini, and similar models show mediocre results, while open-source models such as Qwen2.5 and Llama3.1 perform poorly
  • The latest Qwen3 series also underperforms: neither the 30B nor the 32B model exceeds 10% accuracy 😕
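
To make the task concrete, here is a minimal Python sketch of a decryption task and how accuracy could be scored. The brief does not say which ciphers CipherBank uses or how it scores answers, so both the Caesar shift and the exact-match metric below are assumptions chosen purely for illustration; `caesar_shift` and `exact_match_accuracy` are hypothetical helper names.

```python
import string


def caesar_shift(text, shift):
    """Shift ASCII letters by `shift` positions; leave other characters unchanged."""
    out = []
    for ch in text:
        if ch in string.ascii_lowercase:
            out.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        elif ch in string.ascii_uppercase:
            out.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
        else:
            out.append(ch)
    return "".join(out)


def exact_match_accuracy(predictions, references):
    """Fraction of predictions identical to their reference (assumed metric)."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(p == r for p, r in zip(predictions, references))
    return hits / len(references)


plaintext = "Meet at the old bridge at 9."
ciphertext = caesar_shift(plaintext, 3)    # what a model would be shown
recovered = caesar_shift(ciphertext, -3)   # what a correct answer looks like
print(ciphertext)
print(recovered)
```

Under an exact-match metric, a model is credited only when the whole recovered string equals the reference plaintext, so a single wrong character counts as a miss.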

Related Links:

Decryption /diːˈkrɪp.ʃən/
n. 解密
"The file cannot be opened without decryption."
[例句] 没有解密,无法打开该文件。
词根分析
de-
去掉,解除
crypt
隐藏
衍生词
decrypt (v.) 解密(动词)

华为发布7180亿参数Pangu Ultra MoE模型,全流程在昇腾NPU训练

简报:

  • 华为盘古团队发布参数规模7180亿的Pangu Ultra MoE模型,全流程在昇腾NPU上训练
  • 模型采用256个路由专家,每个token激活8个专家,总参数量718B,激活量39B
  • 团队提出DSSN稳定架构和TinyInit小初始化方法,实现10T tokens数据长期稳定训练
  • 采用EP group loss负载优化方法,保证专家负载均衡并提升领域特化能力
  • 预训练阶段昇腾Atlas 800T A2万卡集群MFU提升至41%,后训练阶段单节点吞吐达35K Tokens/s

相关链接:

Huawei Releases 718 Billion Parameter Pangu Ultra MoE Model, Fully Trained on Ascend NPU

Newsletter:

  • Huawei's Pangu team has unveiled Pangu Ultra MoE, a model with a staggering 718 billion parameters, trained end to end on Ascend NPUs. 🚀
  • The model uses 256 routed experts and activates 8 experts per token, for 718B total parameters and 39B activated parameters.
  • The team proposes the DSSN stable architecture and the TinyInit small-initialization method, enabling stable long-run training on 10T tokens of data. 📈
  • It adopts an EP group loss for load optimization, which keeps expert load balanced and improves domain specialization.
  • In pre-training, MFU on a 10,000-card Ascend Atlas 800T A2 cluster was raised to 41%; in post-training, single-node throughput reached 35K tokens/s. 💻
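
The "256 routed experts, 8 activated per token" design can be illustrated with a generic top-k routing sketch in Python. This is not Huawei's implementation: the softmax-over-selected-logits gating, the random router scores, and the name `route_token` are assumptions for illustration. Only the numbers 256, 8, 718B, and 39B come from the brief.

```python
import math
import random

NUM_EXPERTS = 256  # routed experts, per the brief
TOP_K = 8          # experts activated per token, per the brief


def route_token(router_logits, top_k=TOP_K):
    """Pick the top_k highest-scoring experts and normalize their weights.

    Generic top-k gating (an assumption, not the Pangu router):
    softmax is taken over the selected logits only.
    """
    if len(router_logits) < top_k:
        raise ValueError("need at least top_k experts")
    # Indices of the top_k largest router scores.
    chosen = sorted(
        range(len(router_logits)),
        key=lambda i: router_logits[i],
        reverse=True,
    )[:top_k]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(router_logits[i] for i in chosen)
    exps = [math.exp(router_logits[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


rng = random.Random(0)  # stand-in scores; a real router computes these from the token
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
assignment = route_token(logits)

# Share of parameters touched per token, using the brief's figures (in billions).
active_fraction = 39 / 718
print(len(assignment), "experts selected")
print(f"{active_fraction:.1%} of parameters activated per token")
```

The point of the sparse design is visible in the last line: each token uses only a small fraction of the total parameters, which is why a 718B-parameter model can run with roughly 39B parameters of compute per token.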

Related Links:

Staggering /ˈstæɡər·ɪŋ/
adj. 惊人的;令人震惊的
"The company suffered a staggering loss of $10 million last year."
[例句] 这家公司去年遭受了高达一千万美元的惊人损失。
词根分析
stagger
蹒跚,摇晃
-ing
形容词后缀
衍生词
stagger (v.) 蹒跚,动摇;使震惊
staggeringly (adv.) 惊人地,令人难以置信地

百度飞桨推出AIGC应用开发实战营,助力传媒行业AI创新

简报:

  • 百度飞桨和文心大模型联合推出「AIGC应用开发实战营」项目,聚焦游戏、营销、影视三大开发方向
  • 活动提供技术指导、创新陪跑、资源支持和孵化推广等权益,帮助开发者实现AI应用落地
  • 3月25日至27日每晚19:00举办直播分享,揭秘AIGC应用开发技巧和行业解决方案
  • 参与者可获得官方技术指导、大模型开发文档和产业课程集锦包等资源
  • 活动面向企业和开发者,获奖应用将获得品牌包装和流量推广支持

相关链接:

Baidu PaddlePaddle Launches AIGC Application Development Bootcamp to Boost AI Innovation in Media Industry

Newsletter:

  • Baidu PaddlePaddle and the Wenxin (ERNIE) large model have jointly launched the "AIGC Application Development Bootcamp", focusing on three development tracks: gaming, marketing, and film & television 🎮🎥
  • The event offers technical guidance, innovation coaching, resource support, and incubation and promotion to help developers bring AI applications to life 🚀
  • Live sharing sessions will be held at 19:00 each evening from March 25 to 27, covering AIGC application development tips and industry solutions 📡
  • Participants can access official technical guidance, large model development documentation, and a curated package of industry courses
  • The event is open to enterprises and developers; winning applications will receive brand packaging and traffic promotion support

Related Links:

Bootcamp /ˈbuːtˌkæmp/
n. 训练营
"She learned programming at a coding bootcamp last summer."
[例句] 她去年夏天在一个编程训练营学习了编程。
词根分析
boot
靴子(尤指军靴)
camp
营地
衍生词
coding bootcamp (n.) 编程训练营

波士顿动力Atlas机器人升级3D感知能力,精准完成工业任务

简报:

  • 波士顿动力Atlas机器人完成重磅升级,新增3D空间感知和实时物体追踪能力,可自主执行复杂工业任务
  • 在汽车工厂演示中,Atlas能360度转动头部识别掉落零件,即使人类故意移动装置也能精准完成装配
  • 技术升级包括2D与3D感知融合、物体位姿追踪及基于物理特性的精确校准方案
  • 官方YouTube演示视频获十余万观看量,网友称赞其环境感知和纠错能力
  • 机器人需解决金属零件反光、低对比度物体识别等工业环境特有挑战

相关链接:

Boston Dynamics Atlas Robot Upgrades 3D Perception, Precisely Completes Industrial Tasks

Newsletter:

  • Boston Dynamics' Atlas robot has received a major upgrade that adds 3D spatial perception and real-time object tracking, enabling it to carry out complex industrial tasks autonomously. 🤖
  • In a car factory demonstration, Atlas rotates its head 360 degrees to spot dropped parts and still completes assembly accurately even when a person deliberately moves the fixture. 🔧
  • The technical upgrades include fusion of 2D and 3D perception, object pose tracking, and a precise calibration scheme based on physical properties.
  • The official YouTube demonstration video has drawn over 100,000 views, with viewers praising its environmental awareness and error-correction capabilities. 👍
  • The robot still needs to overcome industrial environment challenges, such as reflections from metal parts and low-contrast object recognition.

Related Links:

Precisely /prɪˈsaɪs.li/
adv. 精确地;恰好地
"We don't know precisely how the accident happened."
[例句] 我们并不确切知道事故是如何发生的。
词根分析
precis-
精确
-ly
副词后缀
衍生词
precise (adj.) 精确的;准确的
precision (n.) 精确;精准

DeepSeek-R1 0528版本开源,编程能力直逼Claude 4

简报:

  • DeepSeek于5月28日凌晨在Hugging Face开源R1-0528版本,模型权重已公开但模型卡暂未更新
  • 新版本在LiveCodeBench基准测试中表现接近OpenAI o3-mini(High)和o4-mini(Medium)
  • 编程能力显著提升,实测显示在前端编码等任务上超越Claude 4 Sonnet
  • 支持单任务30-60分钟的长时思考,网友实测思考时间可达25分钟
  • 在AIME 2025数学测试中正确率从70%提升至87.5%,接近OpenAI o3的88.9%
  • 采用MIT开源协议,允许商用和模型蒸馏,API价格仅为o3的3.3%

相关链接:

DeepSeek-R1 0528 Version Released as Open Source, Coding Skills Rival Claude 4

Newsletter:

  • DeepSeek open-sourced R1-0528 on Hugging Face in the early hours of May 28; the model weights are public, though the model card has not yet been updated.
  • On the LiveCodeBench benchmark, the new version performs close to OpenAI's o3-mini (High) and o4-mini (Medium).
  • Coding shows significant improvement: hands-on tests show it surpassing Claude 4 Sonnet on tasks such as front-end coding. 💻
  • It supports extended thinking of 30-60 minutes on a single task; users have measured thinking times of up to 25 minutes. ⏳
  • Accuracy on the AIME 2025 math test rose from 70% to 87.5%, close to OpenAI o3's 88.9%.
  • It is released under the MIT open-source license, which permits commercial use and model distillation, and its API is priced at just 3.3% of o3's. 💰

Related Links:

Significant /sɪɡˈnɪf.ɪ.kənt/
adj. 重要的;意义重大的
"The discovery of penicillin was one of the most significant events in medical history."
[例句] 青霉素的发现是医学史上最重要的事件之一。
词根分析
sign-
记号,标志
-fic-
做,制造
衍生词
significance (n.) 重要性,意义
significantly (adv.) 显著地

本文作者:topwind

版权声明:本博客所有文章除特别声明外,均采用 BY-NC-SA 许可协议。转载请注明出处!