James Lee’s Post


Director

LLMs can help juniors or beginners learn how to code, and they can even speed up code development for experienced engineers.

Andrew Ng

Founder of DeepLearning.AI; Managing General Partner of AI Fund; Founder and CEO of Landing AI

On Father’s Day last weekend, I sat with my daughter to help her practice solving arithmetic problems. To give her practice problems, I used OpenDevin, an open-source agentic coding framework, to write a Python script that generated questions she enjoyed answering at her own pace. OpenDevin wrote the code much faster than I could have and genuinely improved my and my daughter’s day. Six months ago, coding agents were a novelty. They still frequently fail to deliver, but I find that they’re now working well enough that they might be genuinely useful to more and more people!

Given a coding problem that’s specified in a prompt, the workflow for a coding agent typically goes something like this: use a large language model (LLM) to analyze the problem and potentially break it into steps, write code for each step, test the code, and iteratively use any errors discovered to ask the coding agent to refine its answer. But within this broad framework, a huge design space and numerous innovations are available to experiment with. I’d like to highlight a few papers that I find notable:

- “AgentCoder: Multiagent-Code Generation with Iterative Testing and Optimisation,” Huang et al. (2024)
- “LDB: A Large Language Model Debugger via Verifying Runtime Execution Step by Step,” Zhong et al. (2024)
- “SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering,” Yang et al. (2024)

How can we test the code without requiring the user to write test cases? In a multi-agent system, each “agent” is an LLM prompted to play a particular role. An interesting result from AgentCoder shows that having separate agents for writing code and generating tests results in better performance than letting a single agent do both tasks. This is presumably because, if the agent writing the code is also responsible for writing the tests, the tests might be influenced by the code and fail to consider corner cases that the code does not cover.

When people think of testing code, many initially think of output testing, in which we check whether the code produces the correct outputs for a specific set of test inputs. If the code fails a test, an LLM can be prompted to reflect on why the code failed and then to try to fix it. In addition to output testing, the LDB method is helpful: it steps through the code and presents to the LLM the values of variables at intermediate steps of execution, to see if the LLM can spot exactly where the error is. This mimics how a human developer might step through the code to see where one of the computational steps went wrong, and so pinpoint and fix the problem.

A lot of agentic workflows mimic human workflows. Similar to other work in machine learning, if humans can do a task, then trying to mimic humans makes development much easier compared to inventing a new process. [...] [Reached LinkedIn’s length limit. Rest of text here: https://lnkd.in/gNkEC9gS]
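
For a sense of how small the arithmetic-practice task was, here is a minimal sketch of the kind of script a coding agent might produce for such a request. The function name and the interactive loop are illustrative assumptions, not OpenDevin’s actual output.

import random

def make_question(max_value=20):
    # Pick two operands and an operator; keep subtraction non-negative for practice.
    a, b = random.randint(0, max_value), random.randint(0, max_value)
    op = random.choice(["+", "-"])
    if op == "-" and b > a:
        a, b = b, a
    answer = a + b if op == "+" else a - b
    return f"{a} {op} {b} = ?", answer

if __name__ == "__main__":
    # Ask questions one at a time so the learner can work at her own pace.
    while True:
        question, answer = make_question()
        reply = input(question + "  (or 'q' to quit) ")
        if reply.strip().lower() == "q":
            break
        print("Correct!" if reply.strip() == str(answer) else f"Not quite; the answer is {answer}.")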
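The overall loop described above, with AgentCoder-style separation between a coder agent and a tester agent, might look roughly like the sketch below. This is a rough illustration of the idea, not the paper’s exact architecture; call_llm and run_tests are hypothetical stand-ins for whatever LLM client and sandboxed test runner you use.

def call_llm(prompt: str) -> str:
    # Hypothetical wrapper around your LLM provider of choice.
    raise NotImplementedError

def run_tests(code: str, tests: str) -> tuple[bool, str]:
    # Hypothetical sandboxed runner: executes the tests against the code
    # and returns (passed, error_log).
    raise NotImplementedError

def solve(problem: str, max_iterations: int = 5) -> str:
    # Tester agent: writes tests from the problem statement alone,
    # so they are not biased by the generated code.
    tests = call_llm(f"Write unit tests for this problem:\n{problem}")
    # Coder agent: writes an initial solution.
    code = call_llm(f"Write Python code that solves this problem:\n{problem}")
    for _ in range(max_iterations):
        passed, errors = run_tests(code, tests)
        if passed:
            return code
        # Feed the failures back to the coder agent and ask for a revision.
        code = call_llm(
            f"Problem:\n{problem}\n\nCode:\n{code}\n\n"
            f"These tests failed:\n{errors}\n\nRevise the code."
        )
    return code

Keeping the test-writing prompt free of the generated code is the design choice the AgentCoder result points to: the tests then probe the problem statement rather than echoing the solution’s blind spots.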
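The LDB idea, giving the LLM intermediate variable values rather than only the final failing output, can be approximated in plain Python with sys.settrace. This is only an illustration of collecting that runtime information, not the paper’s implementation.

import sys

def trace_locals(func, *args, **kwargs):
    # Run func and record local variable values at each line event
    # (i.e., just before each line of func executes).
    snapshots = []

    def tracer(frame, event, arg):
        if event == "line" and frame.f_code is func.__code__:
            snapshots.append((frame.f_lineno, dict(frame.f_locals)))
        return tracer

    sys.settrace(tracer)
    try:
        result = func(*args, **kwargs)
    finally:
        sys.settrace(None)
    return result, snapshots

def buggy_average(xs):
    total = 0
    for x in xs:
        total += x
    return total / (len(xs) + 1)  # Off-by-one bug the trace makes visible.

result, snapshots = trace_locals(buggy_average, [2, 4, 6])
for lineno, local_vars in snapshots:
    print(lineno, local_vars)
# The line-by-line locals (total growing to 12 before the return, yet a wrong
# result of 3.0) are the kind of evidence an LDB-style debugger would hand to
# the LLM so it can localize the faulty step.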

Open Model Bonanza, Private Benchmarks for Fairer Tests, More Interactive Music Generation, Diffusion + GAN


deeplearning.ai

