NVIDIA's top AI-agent researcher: "We need to tell the real progress from the media hype"

Link: NVIDIA Japan Blog — Eureka! A breakthrough from NVIDIA Research brings a fresh approach to robot learning | An NVIDIA AI agent leverages LLMs to automatically generate reward algorithms that train robots to accomplish complex tasks.
Jim Fan @DrJimFan

Can GPT-4 teach a robot hand to do pen spinning tricks better than you do? I'm excited to announce Eureka, an open-ended agent that designs reward functions for robot dexterity at super-human level. It's like Voyager in the space of a physics simulator API!

Eureka bridges the gap between high-level reasoning (coding) and low-level motor control. It is a "hybrid-gradient architecture": a black-box, inference-only LLM instructs a white-box, learnable neural network. The outer loop runs GPT-4 to refine the reward function (gradient-free), while the inner loop runs reinforcement learning to train a robot controller (gradient-based). We are able to scale up Eureka thanks to IsaacGym, a GPU-accelerated physics simulator that speeds up reality by 1000x.

On a benchmark suite of 29 tasks across 10 robots, Eureka rewards outperform expert human-written ones on 83% of the tasks, by a 52% improvement margin on average. We were surprised that Eureka is able to learn pen spinning tricks, which are very difficult even for CGI artists to animate frame by frame!

Eureka also enables a new form of in-context RLHF, which can incorporate a human operator's feedback in natural language to steer and align the reward functions. It can serve as a powerful co-pilot for robot engineers designing sophisticated motor behaviors.

As usual, we open-source everything! You're all welcome to check out our video gallery and try the codebase today: eureka-research.github.io Paper: arxiv.org/abs/2310.12931 Code: github.com/eureka-researc… Deep dive with me: 🧵

2023-10-21 00:59:43
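Jim's description of the "hybrid-gradient architecture" maps onto a fairly simple loop. Here is a minimal sketch, assuming hypothetical helpers (query_llm, train_policy, evaluate_policy, summarize) supplied by the caller; none of these names come from the actual Eureka codebase linked above.

```python
def eureka_loop(task_description, env, query_llm, train_policy,
                evaluate_policy, summarize, iterations=5, samples_per_iter=16):
    """Sketch of Eureka's outer (LLM) / inner (RL) loop.

    query_llm, train_policy, evaluate_policy, and summarize are hypothetical
    stand-ins supplied by the caller, not the actual Eureka API.
    """
    best_reward_code, best_score = None, float("-inf")
    feedback = ""  # textual summary of the previous round's training stats

    for _ in range(iterations):
        # Outer loop (gradient-free): GPT-4 writes candidate reward functions
        # as executable code, conditioned on the env source and past feedback.
        candidates = [query_llm(task_description, env.source_code, feedback)
                      for _ in range(samples_per_iter)]

        results = []
        for reward_code in candidates:
            # Inner loop (gradient-based): train a policy with RL against the
            # candidate reward inside the GPU-accelerated simulator.
            policy, stats = train_policy(env, reward_code)
            # Fitness comes from the task's ground-truth metric,
            # not from the candidate reward itself.
            results.append((evaluate_policy(env, policy), reward_code, stats))

        score, reward_code, stats = max(results, key=lambda r: r[0])
        if score > best_score:
            best_score, best_reward_code = score, reward_code
        # "Reward reflection": feed training statistics back to the LLM
        # so the next round of reward candidates can improve on this one.
        feedback = summarize(stats)

    return best_reward_code
```

The in-context RLHF Jim mentions would enter the same way: a human operator's natural-language feedback is appended to `feedback` before the next outer-loop query.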
Jim Fan @DrJimFan

@NVIDIA Sr. Research Manager & Lead of Embodied AI (GEAR Lab). Creating foundation models for Humanoid Robots & Gaming. @Stanford Ph.D. @OpenAI's first intern.

jimfan.me

Jim Fan @DrJimFan

Ummm ... why is this a surprise? Transformers are not elixirs. Machine learning 101: you gotta cover the test distribution in training! LLMs work so well because they are trained on (almost) the entire distribution of text tasks that we care about. That's why data quality is the number-1 priority: garbage in, garbage out. Most LLM efforts these days go into data cleaning & annotation.

2023-11-06 09:13:24
anton @abacaj

New paper by Google provides evidence that transformers (GPT, etc) cannot generalize beyond their training data pic.twitter.com/UE4toEdZVa

2023-11-06 02:51:32
Jim Fan @DrJimFan

This paper is equivalent to: Try to train ViTs only on datasets of dogs & cats. Use 100B dog/cat images and 1T parameters! Now see if it can recognize airplanes - surprise, it can't!

2023-11-06 09:27:09

A Google paper showed that GPT cannot generalize beyond its training data

Why is this a surprise? Transformers are not elixirs. It is machine learning basics that they only work within the range of their training data.
Garbage in, garbage out. Data quality is the number-one priority for LLMs (large language models), and most LLM efforts these days go into data preprocessing and annotation.
This paper is like saying, "We trained image recognition on 100 billion dog/cat images with a trillion parameters, and it can't recognize airplanes!"
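The test-distribution point is easy to demonstrate outside of LLMs. A minimal numpy sketch (toy data, not from the paper in question): a flexible model fit on x in [0, 1] interpolates fine but is wildly wrong at x = 5.

```python
import numpy as np

# Toy illustration of "cover the test distribution in training":
# a model fit on x in [0, 1] fails badly when asked about x = 5.
rng = np.random.default_rng(0)
x_train = rng.uniform(0.0, 1.0, 200)
y_train = np.sin(2 * np.pi * x_train)          # true function

coeffs = np.polyfit(x_train, y_train, deg=9)   # flexible in-distribution fit

x_in, x_out = 0.5, 5.0
print(np.polyval(coeffs, x_in), np.sin(2 * np.pi * x_in))    # close match
print(np.polyval(coeffs, x_out), np.sin(2 * np.pi * x_out))  # wildly off
```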

Link: GIGAZINE — OpenAI aims for a leap in artificial general intelligence development with "Q*", a math-capable AI said to be its fourth breakthrough; possibly a factor in the CEO Altman ouster. OpenAI is reported to have acknowledged that it is pursuing a new AI development project called "Q*". Q* aims to improve mathematical reasoning ability, could bring a breakthrough to artificial general intelligence (AGI) research, and has been suggested as one factor behind the November 2023 ouster of CEO Sam Altman.
Jim Fan @DrJimFan

FAQ:
- Is LLM + Search a good way to solve tasks with correct answers, like math and coding? Yes.
- Is this Q*? Doesn't matter. Everyone should learn how AlphaGo works. It's a masterpiece.
- Does scaling this up give us AGI? No.
- Does it justify the extreme hype and AI fear-mongering in the past week? Hell no.
- What's still missing from AGI? A combination of new sample-efficient architectures, self-improvement mechanisms, world modeling, synthetic data, embodiment, multimodal, and scaling up. Yeah, a lot more to be done. A LOT more. I won't lose my job as an AI researcher any time soon 😆

2023-11-28 00:38:15
Jim Fan @DrJimFan

In my decade spent on AI, I've never seen an algorithm that so many people fantasize about. Just from a name, no paper, no stats, no product. So let's reverse-engineer the Q* fantasy. VERY LONG READ:

To understand the powerful marriage between Search and Learning, we need to go back to 2016 and revisit AlphaGo, a glorious moment in AI history. It's got 4 key ingredients:

1. Policy NN (Learning): responsible for selecting good moves. It estimates the probability of each move leading to a win.
2. Value NN (Learning): evaluates the board and predicts the winner from any given legal position in Go.
3. MCTS (Search): stands for "Monte Carlo Tree Search". It simulates many possible sequences of moves from the current position using the policy NN, and then aggregates the results of these simulations to decide on the most promising move. This is the "slow thinking" component that contrasts with the fast token sampling of LLMs.
4. A groundtruth signal to drive the whole system. In Go, it's as simple as the binary label "who wins", which is decided by an established set of game rules. You can think of it as a source of energy that *sustains* the learning progress.

How do the components above work together? AlphaGo does self-play, i.e. playing against its own older checkpoints. As self-play continues, both the Policy NN and Value NN are improved iteratively: as the policy gets better at selecting moves, the value NN obtains better data to learn from, and in turn it provides better feedback to the policy. A stronger policy also helps MCTS explore better strategies. That completes an ingenious "perpetual motion machine". In this way, AlphaGo was able to bootstrap its own capabilities and beat the human world champion, Lee Sedol, 4-1 in 2016. An AI can never become super-human by imitating human data alone.

Now let's talk about Q*. What are the corresponding 4 components?

1. Policy NN: this will be OAI's most powerful internal GPT, responsible for actually implementing the thought traces that solve a math problem.
2. Value NN: another GPT that scores how likely each intermediate reasoning step is to be correct. OAI published a paper in May 2023 called "Let's Verify Step by Step", coauthored by big names like @ilyasut @johnschulman2 @janleike: arxiv.org/abs/2305.20050. It's much less known than DALL-E or Whisper, but gives us quite a lot of hints. This paper proposes "Process-supervised Reward Models", or PRMs, which give feedback for each step in the chain-of-thought. In contrast, "Outcome-supervised Reward Models", or ORMs, only judge the entire output at the end. ORMs are the original reward model formulation for RLHF, but they are too coarse-grained to properly judge the sub-parts of a long response. In other words, ORMs are not great for credit assignment. In RL literature, we call ORMs "sparse reward" (only given once at the end), and PRMs "dense reward" that smoothly shapes the LLM toward our desired behavior.
3. Search: unlike AlphaGo's discrete states and actions, LLMs operate on a much more sophisticated space of "all reasonable strings". So we need new search procedures. Expanding on Chain of Thought (CoT), the research community has developed a few nonlinear CoTs:
- Tree of Thought: literally combining CoT and tree search: arxiv.org/abs/2305.10601 @ShunyuYao12
- Graph of Thought: yeah, you guessed it already. Turn the tree into a graph and voilà! You get an even more sophisticated search operator: arxiv.org/abs/2308.09687
4. Groundtruth signal: a few possibilities: (a) each math problem comes with a known answer; OAI may have collected a huge corpus from existing math exams or competitions. (b) The ORM itself can be used as a groundtruth signal, but then it could be exploited and "lose energy" to sustain learning. (c) A formal verification system, such as Lean Theorem Prover, can turn math into a coding problem and provide compiler feedback: lean-lang.org

And just like AlphaGo, the Policy LLM and Value LLM can improve each other iteratively, as well as learn from human expert annotations whenever available. A better Policy LLM will help the Tree of Thought search explore better strategies, which in turn collects better data for the next round. @demishassabis said a while back that DeepMind's Gemini will use "AlphaGo-style algorithms" to boost reasoning. Even if Q* is not what we think it is, Google will certainly catch up with their own. If I can think of the above, they surely can.

Note that what I described is just about reasoning. Nothing says Q* will be more creative in writing poetry, telling jokes @grok, or role playing. Improving creativity is a fundamentally human thing, so I believe natural data will still outperform synthetic data there. I welcome any thoughts or feedback!!

2023-11-25 02:15:50
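To make the Policy-LLM / PRM / search triad concrete, here is a minimal, hypothetical sketch of PRM-guided tree search over reasoning steps. None of these callables come from OpenAI or the papers above; the caller supplies the policy LLM (propose_steps), the PRM (score_step), and a solution check.

```python
import heapq

def prm_guided_search(problem, propose_steps, score_step, is_solution,
                      beam_width=4, expand_k=4, max_depth=8):
    """Hypothetical sketch: beam-style tree search over reasoning chains.

    propose_steps(problem, chain, k) -> k candidate next steps (Policy LLM).
    score_step(problem, chain, step) -> P(step is correct) (PRM, dense reward).
    is_solution(problem, chain)      -> True when the chain solves the problem.
    """
    beam = [(0.0, [])]  # (negated cumulative PRM score, chain of steps so far)
    for _ in range(max_depth):
        candidates = []
        for neg_score, chain in beam:
            if is_solution(problem, chain):
                return chain
            for step in propose_steps(problem, chain, expand_k):
                # Dense per-step feedback, "Let's Verify Step by Step" style,
                # rather than a single outcome-level (ORM) score at the end.
                s = score_step(problem, chain, step)
                candidates.append((neg_score - s, chain + [step]))
        if not candidates:
            break
        # Prune to the most promising partial chains (the "tree" part).
        beam = heapq.nsmallest(beam_width, candidates, key=lambda c: c[0])
    return min(beam, key=lambda entry: entry[0])[1]
```

The groundtruth signal (a known answer, or Lean-style formal verification) does not appear in the search itself; it is what you would train score_step against, and what lets the policy and value models keep improving each other AlphaGo-style.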

Summary

FAQ
Will scaling Q* up turn it into AGI (artificial general intelligence)? - No
Was the fear-mongering AI hype justified? - No
What's still missing from AGI? - A self-improvement mechanism, world modeling, and countless other things

Yann LeCun @ylecun

Current LLMs are trained on text data that would take 20,000 years for a human to read. And still, they haven't learned that if A is the same as B, then B is the same as A. Humans get a lot smarter than that with comparatively little training data. Even corvids, parrots, dogs, and octopuses get smarter than that very, very quickly, with only 2 billion neurons and a few trillion "parameters."

2023-11-24 01:33:33
Yann LeCun @ylecun

Animals and humans get very smart very quickly with vastly smaller amounts of training data. My money is on new architectures that would learn as efficiently as animals and humans. Using more data (synthetic or not) is a temporary stopgap made necessary by the limitations of our current approaches.

2023-11-23 15:29:43

Summary

Current LLMs are trained on text that would take a human 20,000 years to read, and they still can't even recognize that if "A = B" then "B = A".
Humans, and even crows and dogs, get far smarter with 2 billion neurons and a few trillion "parameters".

Brett Adcock @adcock_brett

Figure-01 has learned to make coffee ☕️ Our AI learned this after watching humans make coffee This is end-to-end AI: our neural networks are taking video in, trajectories out Join us to train our robot fleet: figure.ai/careers pic.twitter.com/Y0ksEoHZsW

2024-01-07 22:26:59
Yuke Zhu @yukez

2 years ago I was shopping for a coffee machine at Target. I found a perfect Keurig not for me but for my robot: - Round tray to insert a K-cup; - Lid open/close w/ weak forces; - Coffee out w/ one button click. There's no magic. Human ingenuity is behind every robot's success. twitter.com/yifengzhu_ut/s…

2024-01-08 07:23:28
Yifeng Zhu 朱毅枫 @yifengzhu_ut

If you want to learn more about how the task has motivated a line of research in manipulation, see the list: - VIOLA: ut-austin-rpl.github.io/VIOLA/ - HYDRA: sites.google.com/view/hydra-il-… - AWE: lucys0.github.io/awe/ - HITL-TAMP: hitltamp.github.io - MimicGen: mimicgen.github.io twitter.com/yifengzhu_ut/s…

2024-01-08 02:54:19

Summary

The coffee machine has a tray that makes it easy to insert a cup, a lid that opens and closes with weak force, and brews coffee with a single button press.
It is that human ingenuity that made it possible for the robot too. There is no magic.

Jim Fan @DrJimFan

One word: Copilot. I believe this is the best principle the AI community should follow. twitter.com/karpathy/statu…

2024-01-09 03:32:51
Andrej Karpathy @karpathy

e/ia - Intelligence Amplification
- Does not seek to build a superintelligent God entity that replaces humans.
- Builds "bicycle for the mind" tools that empower and extend the information processing capabilities of humans.
- Of all humans, not a top percentile.
- Faithful to computer pioneers Ashby, Licklider, Bush, Engelbart, ...

2024-01-08 11:11:10

Summary

e/ia - Intelligence Amplification
Does not aim to build a superintelligent god that replaces humans
Builds a "bicycle for the mind" not for the "top" but for all humans

In one word, the principle the AI community should follow is "copilot" - AI as an assistant to humans

Link: ひとり構造改革 — What is Mobile ALOHA? Humanizing robots through advances in AI. Today, advances in AI have dramatically improved what robots can do, integrating them ever more deeply into our lives. One system drawing attention is "Mobile ALOHA". This innovative robot system goes beyond a mere machine, offering human-like flexibility and adaptability. This article looks at how Mobile ALOHA mimics human movement and judgment, and explores its potential applications in daily life and business.
Jim Fan @DrJimFan

A very important clarification: the impressive cooking skills are remotely controlled by HUMANs. ALOHA is NOT independently autonomous. Think of ALOHA as a well-made sports car hardware. The superb racing skill is displayed by a human behind the wheel, not a self-driving AI. There's a bit of imitation learning in the paper, but it is nowhere close to generalization to arbitrary kitchens, objects, cooking recipes, or language commands. We are still far, far away from having a fully autonomous robot chef or maid. I am excited by the new research, but we need to tell the real progress from the media hype.

2024-01-06 01:05:23
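The "bit of imitation learning" Jim mentions is behavior cloning: regressing the human teleoperator's actions from logged observations. A minimal PyTorch sketch with made-up shapes and random stand-in data (not the Mobile ALOHA codebase):

```python
import torch
import torch.nn as nn

# Minimal behavior-cloning sketch: learn to map observations collected
# during human teleoperation to the operator's actions. The shapes and
# dataset here are hypothetical placeholders.
OBS_DIM, ACT_DIM = 64, 14  # e.g. proprioception features -> 14 joint targets

policy = nn.Sequential(
    nn.Linear(OBS_DIM, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, ACT_DIM),
)
opt = torch.optim.Adam(policy.parameters(), lr=1e-4)

# Stand-in for logged teleoperation demonstrations.
obs = torch.randn(1024, OBS_DIM)
act = torch.randn(1024, ACT_DIM)

for epoch in range(10):
    pred = policy(obs)
    loss = nn.functional.mse_loss(pred, act)  # imitate the human's actions
    opt.zero_grad()
    loss.backward()
    opt.step()
```

A policy trained this way only covers states near its demonstrations, which is exactly why, as Jim notes, it is nowhere close to generalizing to arbitrary kitchens, objects, or recipes.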
Jim Fan @DrJimFan

What did I tell you a few days ago? 2024 is the year of robotics. Mobile-ALOHA is an open-source robot hardware that can do dexterous, bimanual tasks like cooking a meal (with human teleoperation). Very soon, hardware will no longer bottleneck us on the quest for human-level, generally capable robots. The brain will be.

This work is done by 3 researchers with an academic budget. What an incredible job! Stanford rocks! Congrats to @zipengfu @tonyzzhao @chelseabfinn

Academia is no longer the place for the biggest frontier LLMs, simply because of resource constraints. But robotics levels the playing field a bit between academia and industry, at least in the near term. More affordable hardware is the inevitable trend. Advice for aspiring PhD students: embrace robotics - less crowded, more impactful.

Website: mobile-aloha.github.io
Hardware assembly tutorial (oh yes, we need more of these!): docs.google.com/document/d/1_3…
Codebase: github.com/MarkFzp/mobile…

2024-01-05 01:05:11

Summary

With Mobile-ALOHA, hardware will no longer be the bottleneck for humanoid robot development. The brain will be.
ALOHA is not autonomous. It is hardware like a well-made sports car; the skill is displayed by a human, not by AI. Truly autonomous robots are still a long way off.
I'm excited by the new research, but we need to tell the real progress from the media hype.