Video 1: Introduction to AI Agents for Research
This introduction is a high level overview of AI agents as they pertain to research.
Agentic AI is an asynchronous course offered through PREDOC.org, moderated by Janani Sekar, Pre-Doctoral Fellow at the Center for Applied Artificial Intelligence at the University of Chicago Booth School of Business. The course equips participants with practical tools to design and implement capable agentic workflows for research, drawing on insights from CAAI staff and faculty. Learners may watch the videos sequentially or explore individual modules independently, and are invited to follow along with the accompanying slides to deepen their engagement.
Video 1: Introduction to AI Agents for Research
Video 2: How to use Agentic AI
Video 3: Background and Essential Technical Concepts
Video 4: AI Agents and Their Advanced Capabilities
Video 5: Types of Agents for Research
Video 6: Vibe Coding for Researchers
Video 7: Custom Agents
Video 8: Takeaways and Review
Janani Sekar:
Okay. Hello everyone, and thanks so much for being here. This is going to be the first video in this series of a total of eight video modules, and this first introduction is going to be a high level overview of AI agents, especially as they pertain to research. And I know that PREDOC.org encompasses a variety of different academic research disciplines, whether you're coming to this from an economics or a business or a social science background, hopefully there is some takeaway that you find relatable and useful. I've tried to keep a lot of what we discuss here as general as possible so that it is going to be useful to you regardless of the type of research you do day-to-day. My name is Janani. I am a pre-doc at the University of Chicago Booth School of Business. I work at the Center for Applied Artificial Intelligence here. And I wanted to shout out to folks at both of those places for their support and helping make this video series possible.
And also, of course, to PREDOC.org for being such a great collection of resources for all of us. So to dive right into what we're going to cover over the next eight video modules, we're going to be discussing agentic AI as it pertains to research, but also how to be careful and mindful as we're using these tools. So we'll be discussing how AI agents are different from the traditional chatbots or language models we already use. When people say AI agent, what does that mean? Is that different from GPT or Claude? How do we leverage or use these agents for research tasks like lit reviews and coding? What are some of the things to avoid when we're using agents? And so as researchers, we care about things like replicability and transparency and reporting the right results. And so it's important to make sure when we're using black box technologies or technologies where we don't know all of their inner workings, we are very clear about where and how we've used them.
And finally, whether you come to this series with extensive programming knowledge or a pretty limited programming background, you should be able to access and use these technologies equally. One of the nice things about being able to turn natural language into coded projects is that really it doesn't matter if you don't have a wealth of programming knowledge, you're going to be able to benefit from AI agents nonetheless. So for some brief motivation on how this series came about and why it's so important that we look at this right now, especially over the last year, we're seeing a lot of literature documenting the labor market changes that are occurring because of AI agents and language models. And we see that these things are driving productivity gains, particularly in computational and mathematical and educational disciplines. So pretty much the exact type of work that all of us do as pre-docs.
And that is very evident in this figure here. This is from the Anthropic Economic Index. This is a report compiled by the folks that make the language model Claude. And the source for this, as well as all of the references that I cite and mention throughout this presentation, will be available in the slides, which hopefully are being made available to you, but they'll also be visible in the reference slide at the end of the presentation, which you should be able to see at the end of video eight. Okay. So we've established that a lot of our peers are adopting these tools and that our field is moving in this direction. So hopefully you feel and understand the importance of familiarizing yourself with these tools. We said that AI agents might be a little bit different from the language models like GPT and Claude that we're used to using on the day-to-day.
What exactly makes them different and what do folks mean when they say AI agents? Well, this is a nice definition and quote from the Kempner Institute for AI at Harvard, and I think it summarizes really well what AI agents do that typical language models can't, which is effectively that you can think of an agent as a layer on top of a language model that is able to execute complicated workflows that might involve multiple different steps without necessarily having a back and forth conversation with you at every step of the way to make that happen. And one of the key tools that allows agents to do this is APIs. We'll discuss these in more detail, but they are external connections to other applications or services that might allow an agent to do things like access your calendar, write a PDF, run some code and produce a figure, or search the internet.
And these are the types of things that really power them up over language models or at least very simple language models like the GPT and the Claude that we were using two or three years ago. Okay.
So, for a little bit more evidence on what agents can do that language models can't, let's focus on what language models can do that we all know that they can do. Chances are that you've used a language model either to explain code or explain maybe a mathematical or statistical model. You can use language models to translate code. So if you have a replication package written in R, but you want to see that in MATLAB, that's something that you can ask a language model to do. You can ask them to debug code, make sense of errors. You can ask them to make tables and figures for papers. And so these are all things that other pre-docs that I know are using language models to do.
But agents with all of their power-ups can go even beyond this. And so by being able to do things like search the internet, use external tools, and act without step-by-step back and forth instructions, the workflows that we can use agents for change entirely. So one really great example that I always like to talk through is imagine that you wanted a personal website, a pretty helpful thing to have, maybe when you're applying to your PhDs. And if you had to ask a language model to do this, you might say, "I need a personal website." It'll give you some code, you'll look at that code, paste it into an editor. It may or may not work. You'll go back, it'll give you some more code, you go back and forth. With an agent that is optimized for building websites, all you have to say is, "Build me a personal website," maybe give it some details about what you want included on this website, and it'll build it from end to end and, in the end, deliver you a link, hopefully to a deployed website that you can click on and view as a user, without you having to test the code yourself and then figure out how to publish the website.
So it just makes this process a little less tedious.
There are multiple different types of agents. There's going to be an entire video module in this series later discussing the different types of agents, but as one perspective on what the different types of agents out there are, this is a nice breakdown from OpenAI. They break agents down into agents that manipulate data, agents that take actions like accessing calendars or sending emails and texts, and agents that manage other agents. So when you have a research agent, you can think of that as an agent that is delegating tasks to other agents that it is in charge of. So an agent to search for literature, an agent to read the literature, an agent to synthesize through writing into paragraphs, and then report all of that back to the final user. And so this is just one way to think about the different types of agents out there.
Like I said, we'll see others in a little bit, but hopefully this orients you a little bit towards the type of things that these tools can do. Here's an example of the diagram of an agent. Again, we won't think about this too much, but as a user, you're going to pass a prompt in. Some stuff is going to happen and the result is going to come back out. For a concrete example of this, suppose that you want an agent to deploy a survey that you write to Qualtrics. As the user, you might say, "Hey, can you build me a survey to see if people are more likely to click on job posting A or B?" Your agent takes in that question, it processes that information, including doing stuff like storing some information, planning the survey structure, talking to external tools like Qualtrics, and then it's going to get back to you and say, "Hey, I've managed to deploy this survey. Here's a link to it."
And you can see that this workflow is a much more complicated involved workflow than what you might be able to accomplish with a standard language model.
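As a rough illustration of the diagram described above (prompt in, some planning and tool use in the middle, result out), here is a minimal sketch in Python. Everything in it is hypothetical: the tool functions, the hard-coded plan, and the placeholder link are stand-ins, and no real survey platform is contacted. In a real agent, a language model would decide which tool to call at each step.

```python
from typing import Callable, Dict, List

# Hypothetical tools: plain functions standing in for real integrations.
def plan_survey(question: str) -> str:
    """Pretend planning step: turn the research question into a survey outline."""
    return f"Outline: show posting A and posting B, ask which one gets the click. Goal: {question}"

def deploy_survey(outline: str) -> str:
    """Pretend deployment step: a real agent would call a survey platform's API here."""
    return "https://example.com/survey/123"  # placeholder link, not a real survey

TOOLS: Dict[str, Callable[[str], str]] = {
    "plan_survey": plan_survey,
    "deploy_survey": deploy_survey,
}

def run_agent(prompt: str) -> Dict[str, object]:
    """Run a fixed two-step plan and keep a log of every tool call.

    The plan is hard-coded here so the control flow is easy to follow;
    a real agent would let the language model choose the next step.
    """
    plan: List[str] = ["plan_survey", "deploy_survey"]
    memory: List[str] = []  # stored intermediate results
    current = prompt
    for tool_name in plan:
        current = TOOLS[tool_name](current)  # output of one step feeds the next
        memory.append(f"{tool_name} -> {current}")
    return {"answer": f"I've deployed the survey. Here's a link: {current}", "log": memory}

result = run_agent("Are people more likely to click on job posting A or B?")
print(result["answer"])
for entry in result["log"]:
    print(entry)
```

The log is the part worth noticing: the user sends one prompt and gets one answer, but several intermediate steps happened in between without any back and forth.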
Janani Sekar:
Okay. Hello everyone, and welcome back to the series on Agentic AI. In this module, I'm very excited to talk to you all about exactly how to use Agentic AI, but also what are some caveats and things that we should be wary of as we are using these tools. So coming in, one framework that I find pretty helpful for thinking about the different tasks that AI agents can do is this idea of which tasks can we augment with AI and which tasks can we automate with AI? And as researchers, a lot of the tasks that we do involve things like critical thinking and writing and programming. And as a bit of a spoiler, most of these tasks can be augmented instead of being fully automated. And so what does it look like? What do I mean when I say tasks that are augmented or automated?
Can we put a little bit more color to this? Absolutely. Some examples of augmented tasks include things like programming. When you ask an agent or you ask a language model to run a regression, it is on you to human supervise that a little bit by being specific about the structural form of that model that you're trying to run or that regression. Otherwise, you might get an agent or a program that makes certain assumptions that you may or may not agree with or had planned to make. Writing is pretty similar. If you've ever tried to use deep research or one of those agents to write an outline for a paper or even write a paragraph for your paper directly, you've probably experienced that it's not necessarily the best quality. And if you didn't review that outline before telling it to continue writing, what you might get is pretty bad.
And we don't want that type of thing to make it into a paper or even a draft. We definitely don't want to put that in front of our PIs. And then in critical thinking, a lot of folks, especially with newer models that have stronger reasoning capacities, use these agents to poke holes in their logic, to use them as a sounding board, to give them some writing and say, "Hey, is there anything that I'm missing here? Any information that we should explore further?" And this is also going to be a heavily human involved task. If you don't have your logic written down, then your agent is obviously not going to be able to identify gaps in it. In contrast, here are some things that we can maybe automate. The one on the left is something that I personally do all the time. So installing software and checking software, including doing things like submitting jobs on your compute cluster. I just gave my Claude Code, which is a desktop coding agent, the documentation for Mercury, the Booth compute cluster.
And now when I tell Claude, "Hey, can you submit this job to Mercury," which is our compute cluster at Booth, it's able to do that immediately. Data maintenance is another example. Suppose that you are repeatedly fetching data from the Federal Reserve or some other database at regular daily intervals, and you've already done that once or twice. You have a script that works, you know that it works, you don't need to keep checking it. You can outsource that to an agent, just make sure that it runs that program once a day to get you that data. So if you can think of any other tasks that you can automate, I definitely encourage you to try. I would say the main thing that you should be mindful of here is, can I do this task with very, very little or minimal human oversight? There are a lot of papers that discuss this augmented, automated space of tasks that AI can do in research.
I'll highlight a few here, Korinek 2023 and 2025. Both are really, really great resources. They also come with cookbooks, so actual implementation plans. If you are new to this space and you want to see how to use an AI agent to do a literature review or write a draft, definitely check that out. The other source that I wanted to flag here is Dell in 2025. The paper is called Deep Learning for Economists. And it's actually not about AI agents. It's more about how we can use machine learning methods to advance economics research, but there is a brief discussion of things like good prompting strategies in this paper. And what I really like about this paper is it breaks down the insides and the theory behind how language models and agents work. So if you want to know more about these tools and you want to feel less like they're impenetrable black boxes, this is a really great paper to read.
It will also be linked and referenced at the very, very end of this presentation. So we said the tasks that we do as researchers are difficult to automate fully. Why exactly is that? And what are some things that go wrong when we try to automate tasks that we shouldn't? Well, think about things like literature reviews and synthesis. What we find is that when we outsource these tasks to agents, sometimes they might return spurious papers. We have other things that can go wrong in the OCR space. Sometimes an entire document is left out, or parts of the document aren't done correctly. We have to human verify what happens. We can try to label data sources or annotate data using language models and AI. Things often go wrong when we do that. We'll talk about what exactly goes wrong there in a little bit more detail on the next slide.
And the same kind of pitfalls arise when we get models to interpret results. And what exactly are these pitfalls? What types of errors do agents make in this space? There's a little bit more literature that I wanted to reference that discusses this in more detail. Again, I'm not going to summarize all of these papers in great detail, but the goal is rather to direct you all to them so that, if you're interested in understanding what goes wrong here, both empirically and theoretically, and what other people in this discipline have found as best practices for using these models, you know where to look. I'm going to shout out a few papers right now. The first is Bowman et al. in 2025. Specifically, this is a great paper which documents how changing the prompt with which we talk to an agent or a language model for annotating our data produces wildly different results.
So imagine that you're using a language model to try and tell you, is this news positive, negative, or neutral about this company, to try and use that as a way to predict stock price movements. Depending on how you phrase that prompt, the language model's annotation of that data point might be entirely different, which means that we are not creating findings or presenting findings that are very robust to the choice of words with which we communicate. That is obviously not a really great position to end up in. So you should be mindful of trying to test out different prompts and seeing how much that changes your results. And in fact, as you do that and you see that your results do change, there is also a body of literature that discusses how to debias your labels in this process. So you can use your language models and your agents to label your data if you do have some human validated ground truth.
And if you want to understand how to better do this in a statistically sound and mathematically defensible way, there are two really great papers that I've mentioned over here. And then the third category of paper that I will highlight is that bit about using LLMs to interpret results. So you might say, "That sounds like a terrible idea to use an LLM to annotate my data to begin with. I would never do that. I wouldn't even let an LLM or an agent run a regression for me. I'm still trusting myself more. I'm going to do that by myself. All I'm going to do is give it the output table and ask it, 'What does it mean if this was the sign on my coefficient?'" You can't even do that without validating the results because there is a lot of evidence that shows that LLMs and agents will systematically overgeneralize the findings from an analysis or from a model in a way that actually has meaningful impacts on the statistical significance of those results.
So we don't want problems with making causal statements when those things don't exist. We don't want problems with interpreting the sign of our coefficients in the wrong way. So please be very mindful when you think about using your language models and agents for these tasks.
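One simple way to act on the advice about testing different prompts is to label the same items under several prompt wordings and measure how often the labels agree, both with each other and with a human-validated subset. The sketch below uses made-up labels for five headlines; the prompt names and data are hypothetical, and it only computes raw agreement rates. It is not the debiasing procedure from the papers mentioned above.

```python
from itertools import combinations
from typing import Dict, List

# Hypothetical model labels for the same five headlines under three prompt wordings.
labels_by_prompt: Dict[str, List[str]] = {
    "prompt_a": ["positive", "negative", "neutral", "positive", "negative"],
    "prompt_b": ["positive", "negative", "positive", "positive", "neutral"],
    "prompt_c": ["positive", "neutral", "neutral", "positive", "negative"],
}
# Hypothetical human-validated labels for the same five headlines.
human_labels: List[str] = ["positive", "negative", "neutral", "positive", "negative"]

def agreement(a: List[str], b: List[str]) -> float:
    """Share of items on which two label lists agree."""
    if len(a) != len(b):
        raise ValueError("label lists must have the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

# How much do the labels move when only the prompt wording changes?
pairwise = {
    (p, q): agreement(labels_by_prompt[p], labels_by_prompt[q])
    for p, q in combinations(sorted(labels_by_prompt), 2)
}
# How close is each prompt's labeling to the human-validated labels?
accuracy = {p: agreement(labels, human_labels) for p, labels in labels_by_prompt.items()}

for pair, share in pairwise.items():
    print(pair, round(share, 2))
for prompt_name, share in accuracy.items():
    print(prompt_name, "vs human:", round(share, 2))
```

Low pairwise agreement is a signal that downstream results may depend on prompt wording, which is exactly the situation in which reporting results across prompts matters.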
So if it is this challenging and there are this many things that can go wrong when we're using agents for science and scientific discovery and research, why even bother? Well, it's not to say that they can't help us. It's more to say that there are principles for responsible use that we should keep in our heads as we use these tools. And this is a really nice framework from an AI initiative at the University of Cambridge on responsible LLM use for science. And the three main pillars that they highlight are data quality. So making sure that the language models that you use were trained on good data, making sure that you're using language models that were evaluated and were validated against what we call benchmarks. And so benchmarks are large data sets that have been curated by domain experts to assess a model's ability on a particular task.
And so there are benchmarks for medical science and benchmarks for legal work and benchmarks for how good a model is at doing the SAT or doing math or solving proofs. And if you're using your AI agent or language model in one of those domains, you can defend the fact that the information it provides is of reasonable quality if it performs well on an agreed upon benchmark in that domain. And finally, transparency: as long as you report the use of AI and are transparent about where you used AI in the work that you produce and that you put out into the world, it is going to be okay to use these tools. All right, fabulous. So if there are any takeaways that I have for you all from this module, it is not to say that we should not use AI.
It's not to say that these mistakes are the end of the world and that that should preclude us from exploring this new technology, but rather that with new technology also comes new categories of error and we need to know how to navigate this landscape and be sophisticated users that understand the limitations as well as the power of this incredible, incredible set of tools.
Janani Sekar:
All right. Hello everyone, and welcome to the third module in this Agentic AI for Research series. This video is going to be all about giving you the essential technical concepts required to understand the inner workings of an AI agent. So we've been saying things like AI agents have access to APIs and external tools. We've been saying that AI agents are reasoning engines using language models with other things built on top of them, the ability to execute multi-step workflows. What exactly does all of that mean? Well, let's get started with the external tools bit. So we'll first talk about APIs or application programming interfaces, which are basically protocols that allow software applications to communicate with each other. And specifically by communicate, I mean sending and receiving data. So when you have an agent that is querying a database like the Federal Reserve for inflation, or you have an agent that is writing a PDF or searching the internet, that is going to require that agent to connect to some other software application or tool or website.
And that happens through an API. And so if it's helpful to see an analogy, IBM proposes thinking about this as a restaurant. This is not my analogy. This is written by computer scientists who understand this very well. If you want to see the original reference for this, that will also be linked at the end of the presentation. For me personally, more so than even this analogy, I think doing an example can be a really helpful way to put concreteness to this. And so imagine that we have an agent that we want to communicate with the Spotify API. And by that, I mean we want data about songs that Spotify has in its database. So the way that we would go about doing this is we would get ourselves what we call an API key or authentication token. This is basically proof that we have the authorization to access data from Spotify.
You just need to log into the website and you can get yourself one of these. The next thing that we'll do is we'll tell our agent to write a structured query to get data from Spotify. This would involve basically saying, "I want data on Song X and the fields or the parameters that I want include things like artist, album, release date." The request will be written in a specific format, typically a format called JSON. It'll get sent over the internet. The Spotify API is going to receive that request. It's going to pass it to Spotify, query a database, get the information that you're looking for, format that response in a similar format and send it back to you, the user, along with a code that indicates whether that interaction was a success or not. And you've probably seen these codes before, even in your regular internet browsing, where a 200 is typically a success code, and a 404 is a classic error: page not found, resource not found, data not found.
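To make the request and response cycle concrete, here is a small self-contained sketch that imitates it entirely in memory. It does not call Spotify or reproduce Spotify's actual API: the function `fake_music_api`, the field names, the song record, and the key `demo-key` are all hypothetical. Real services also commonly expect the token in a request header rather than in the request body; it is placed in the body here only to keep the example short.

```python
import json
from typing import Tuple

# Hypothetical in-memory "database" standing in for the service's catalog.
SONGS = {
    "song-x": {
        "artist": "Example Artist",
        "album": "Example Album",
        "release_date": "2020-01-01",
    },
}
VALID_KEY = "demo-key"  # placeholder, not a real credential

def fake_music_api(request_json: str) -> Tuple[int, str]:
    """Toy API endpoint: takes a JSON request, returns (status code, JSON body)."""
    request = json.loads(request_json)
    if request.get("api_key") != VALID_KEY:
        return 401, json.dumps({"error": "unauthorized"})  # missing or bad key
    song = SONGS.get(request.get("song_id"))
    if song is None:
        return 404, json.dumps({"error": "song not found"})
    fields = request.get("fields", list(song))
    return 200, json.dumps({name: song[name] for name in fields if name in song})

# The structured query: which song, and which fields we want back.
request_json = json.dumps(
    {"api_key": VALID_KEY, "song_id": "song-x", "fields": ["artist", "album", "release_date"]}
)
status, body = fake_music_api(request_json)
if status == 200:
    print(json.loads(body))
else:
    print("request failed with status", status)

# Asking for a song that is not in the catalog produces the classic 404.
missing_status, _ = fake_music_api(json.dumps({"api_key": VALID_KEY, "song_id": "no-such-song"}))
print(missing_status)
```

Checking the status code before using the body is the habit to carry over to real APIs, since a failed request can still return a well-formed JSON error message.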
And this is how we know whether or not that interaction was successful. So we've now talked about APIs, how agents can access external tools, which is what gives them this leg up over standard language models, but not to ignore the language models themselves. Like I said, these are really the backbone of the agent. These are the reasoning and the brain behind that agent and how it plans tasks and how it executes tasks. And we've all seen and worked with language models at this point. We know what Claude and GPT are. To peel back a little bit of the mystery since people have asked me about this, I thought I would quickly touch on how language models work from a technical sense, and then we can get into the discussion of what they can do for us as agents. When we build language models, they are just neural networks, but they are trained on massive amounts of data, typically the entire internet.
And the way that they're trained is they see millions and billions of documents. And as they see those documents and the words in those documents, they learn which words occur next to each other in the same context, which teaches them how words relate to each other, the meanings of words, the similarities of words, which then allows them to learn a probability distribution over this entire vocabulary, over this entire space of different words. And so what happens is as a user like you or me asks a language model a question, all the language model is doing when it's generating an answer is using its probability distribution to generate the most likely, the most probable sequence of text to respond to that question given the words or the text in that question. It just keeps generating an answer or response word by word until the next most probable action is to stop generating text because it's reached the end of a thought or the idea or it's answered your question.
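The word-by-word generation described above can be sketched with a toy example. The probability table below is invented for illustration, and it only conditions on the previous word; a real language model learns its probabilities from data, conditions on the entire preceding text, and often samples from the distribution rather than always taking the single most likely word.

```python
from typing import Dict, List

# Hypothetical next-word probabilities; a real model learns these from training data.
NEXT_WORD_PROBS: Dict[str, Dict[str, float]] = {
    "<start>": {"unemployment": 0.6, "inflation": 0.4},
    "unemployment": {"rose": 0.5, "fell": 0.3, "<end>": 0.2},
    "inflation": {"rose": 0.7, "<end>": 0.3},
    "rose": {"last": 0.6, "<end>": 0.4},
    "fell": {"last": 0.5, "<end>": 0.5},
    "last": {"month": 0.9, "<end>": 0.1},
    "month": {"<end>": 1.0},
}

def generate(max_words: int = 10) -> List[str]:
    """Greedy generation: repeatedly pick the most probable next word.

    Generation stops when the most probable next step is the special
    "<end>" token, or when max_words is reached.
    """
    words: List[str] = []
    current = "<start>"
    for _ in range(max_words):
        options = NEXT_WORD_PROBS[current]
        next_word = max(options, key=options.get)  # most probable continuation
        if next_word == "<end>":
            break
        words.append(next_word)
        current = next_word
    return words

sentence = " ".join(generate())
print(sentence)  # unemployment rose last month
```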
So pretty complicated in code and in math, but not super complicated in logic. So what kinds of capabilities become exposed when you have a model that is designed to do such a thing? So we know that obviously language understanding is the main one. It's also the main goal behind why these models were developed, but alongside an understanding of language, we also get things like the ability to problem solve, the ability to generate code, the ability to translate, not just between languages, but between formats, between tones, between audiences, and really this is what allows agents to do a wide range of cognitive work. And so there are things that language models on their own don't do very well, and this is where having access to tools and agentic capabilities is going to take them to the next level. If you've ever used a language model to do coding, you know that it doesn't have access to your local data files, the files on your computer, it doesn't know what's in your downloads directory versus in your documents directory, doesn't have access to all of the columns in your table.
So if you tell it to run a regression, it might use the wrong file path, it might use the wrong column names. If you pasted that in directly to your R or your Stata or your Python interpreter, it probably wouldn't even work. You might have to go back and forth a little bit. The same thing is also true about more complicated tasks like making plots or figures. So how do we get language models to work more effectively for us even when we have agents? So there is an entire field, entire corner of this literature, I should say, that is dedicated to helping us use agents and language models effectively by giving them the right kinds of prompts, by asking them the right kinds of questions. So you can think about prompt engineering as getting AI to work for you by asking it or communicating to it in the right way.
Over the next few slides, I'm going to go over some best principles for prompting and getting good quality outputs from language models. These are strategies that are assembled by two papers that I think are really good reads, Liu et al. and Wei et al. in 2023 and 2022 respectively. These will also be linked at the very end of this presentation. I'm not going to spend too much time on any one of these slides. If you'd like to read the prompts over or test them out, feel free to pause the video and look closer, but I'm just going to go through each of these best practices as quickly as I can. So the number one thing is to be specific. We generally want to avoid saying things like analyze this data. We instead want to do more of saying things like, analyze this data set, which has this column, this column and this column.
By analysis, I mean I want you to do one, two, and three. Next is providing context. So also pretty self-explanatory, similar to the last idea. Maybe tell the model a little bit more about the nature of your data and what types of columns it has, whether those are numerical or categorical or text data, how many rows do you have? What do those rows represent? And so on and so forth. This one I will spend a few minutes on. Specifying the output format can be a really good way to guarantee some degree of data quality. This can be something as simple as write me a 200 word paragraph or give me two sentences or give me an outline or use bullets, but it can also be a lot more detailed, like you are asking a language model about employees in your organization and you want to make sure that when it responds, each response has an employee name, age, ID, department, and the like.
This would not be something that I think we would typically do as researchers, but let me give you an example where you might use a structured output when you're annotating data. You want the actual annotation or the response from the model. You want some level of magnitude, so how much does this label or annotation apply to this data? And then you might want some confidence. So how much, how sure are you in this label or annotation that you've given me? So you can do a three-way structured output that gives you both your data, but also some metadata about the model's response. And the way that we would typically do this is using a Python library called Pydantic. I will leave this slide up for a little bit so that you can pause the video and look at the documentation for Pydantic if you're interested in understanding this in a little bit more detail.
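A three-part structured output like the one described above might look roughly like the following with Pydantic. This is a sketch under assumptions: the class name `Annotation`, the three sentiment labels, and the 0 to 1 ranges for magnitude and confidence are illustrative choices, and the raw responses are hand-written strings standing in for what a model might return.

```python
import json
from typing import Literal

from pydantic import BaseModel, Field, ValidationError

class Annotation(BaseModel):
    """One model annotation plus metadata about it (field choices are illustrative)."""
    label: Literal["positive", "negative", "neutral"]  # the annotation itself
    magnitude: float = Field(ge=0.0, le=1.0)           # how strongly the label applies
    confidence: float = Field(ge=0.0, le=1.0)          # how sure the model says it is

def parse_annotation(raw_response: str) -> Annotation:
    """Parse a JSON string from the model and validate it against the schema."""
    return Annotation(**json.loads(raw_response))

# A well-formed response passes validation.
good = parse_annotation('{"label": "positive", "magnitude": 0.7, "confidence": 0.9}')
print(good.label, good.magnitude, good.confidence)

# A response with an unknown label and an out-of-range confidence is rejected.
try:
    parse_annotation('{"label": "great", "magnitude": 0.7, "confidence": 1.4}')
    rejected = False
except ValidationError:
    rejected = True
print("malformed response rejected:", rejected)
```

Validating every response this way means malformed annotations fail loudly at parse time instead of silently entering the dataset. Note that a model's self-reported confidence is just another output and may need its own validation against human labels.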
For some additional strategies, few-shot prompting, using examples, is a really good way to get the model to answer in a format that is closer to what you're expecting. You should try to structure your examples from least complicated to most complicated in my experience. There's also some trial and error to be done here in terms of how many examples is sufficient and how many examples is too much such that your model gets confused. That definitely just is situational and comes from some trial and error. Chain of thought is explicitly telling your model to think. A very similar strategy involves what we call role prompting. So saying you are a teacher or you are a research assistant, or you are an econometrician, or you are a financial advisor. And what this does is somehow inside a model that already had this knowledge just unlocks that knowledge. And so telling a model to think like an econometrician somehow automatically makes it better at running a diff-in-diff.
We don't fully understand why, but we know that this is true in practice, so you should definitely try it. Specifying constraints is another one that I'll spend a little bit of time on. I find that it is helpful to tell a model what not to do, and giving it this laundry list of instructions on how not to behave actually makes sure that you constrain the space of responses and the volatility in the responses that you might get. So also worth flagging. Some common challenges that I've experienced, that a lot of my peers have experienced, and that people have documented in the literature are order effects. So this is what I said earlier when I said the order of examples when you're few-shot prompting can really make a difference. Framing effects, are you framing your question in a positive or negative way and how does that change your model's response?
How detailed or verbose is your prompting? Detail is generally good. There is also such a thing as too much. And keep in mind that things that might seem very clear to you can actually be interpreted ambiguously by your model. And so obviously we can't anticipate all of these ahead of time. The best way to see what prompting strategy really works is to try a bunch of things and then compare results. And especially if your analysis for whatever research task you're working on is not robust to different types of prompts, you want to make sure that you report all of those results and you see if there are any underlying patterns in prompting that cause the responses to be more similar or more different.
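The prompting principles from this module (a role, a specific task, context, few-shot examples ordered from simple to complex, explicit constraints, an output format, and an instruction to reason step by step) can be combined into one reusable template. The helper below is a hypothetical sketch, not a template from the cited papers; the wording of each section is an assumption and should be varied when checking robustness.

```python
from typing import List, Tuple

def build_prompt(
    role: str,
    task: str,
    context: str,
    examples: List[Tuple[str, str]],
    constraints: List[str],
    output_format: str,
) -> str:
    """Assemble one prompt string from the pieces discussed in this module."""
    lines = [f"You are {role}.", "", f"Task: {task}", f"Context: {context}", ""]
    lines.append("Examples:")
    for text, label in examples:  # few-shot examples, simplest first
        lines.append(f"- Input: {text} -> Output: {label}")
    lines.append("")
    lines.append("Do not:")
    for rule in constraints:  # explicit constraints on behavior
        lines.append(f"- {rule}")
    lines.append("")
    lines.append(f"Output format: {output_format}")
    lines.append("Think step by step, then give the final label on the last line.")
    return "\n".join(lines)

prompt = build_prompt(
    role="an econometrician",
    task="Classify each headline as positive, negative, or neutral for the company named.",
    context="Headlines are short English news titles; each row is one headline about one firm.",
    examples=[
        ("Firm X beats earnings expectations", "positive"),
        ("Firm Y recalls product but raises guidance", "neutral"),
    ],
    constraints=["invent facts that are not in the headline", "output more than one label"],
    output_format="one word: positive, negative, or neutral",
)
print(prompt)
```

Keeping the pieces as separate arguments makes it easy to change one element at a time, for example the order of the examples or the framing of the task, and then compare how the outputs change.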
Janani Sekar:
Hey everyone. Welcome to the fourth video in this Agentic AI series. This is the video where we're actually going to get into the weeds of the capabilities of AI agents. So I'm very excited. Let's dive right in. We've talked a lot about what transforms language models into agents. A lot of that is the ability to do multi-step tasks, also the ability to interact with external tools, to access real-time information, to search the internet. So let's prime ourselves with another example of a workflow with and without an agent. Suppose that we wanted to know unemployment last month and we asked an LLM that didn't have access to the internet. So think about this as GPT from say two years ago where it knew some information, but it couldn't cite its sources. We wouldn't know if that information was up to date or not or where it was coming from.
In the best case scenario, you ask a model, what was unemployment last month? And it says, "I don't know." The worst situation is when it tells you information that sounds plausible very confidently, but that information is just false. And that is what we call a hallucination. And that can be a pretty dangerous thing if you aren't aware that your model has hallucinated, you aren't aware that this model say doesn't have information in its knowledge base beyond 2023. With an agent, this problem gets mitigated because today's GPT has access to search and we would consider that to be an agentic capability. What GPT would do now if you asked it for unemployment last month is it would run a web search for current unemployment rate. It would access the Bureau of Labor Statistics website. It would look at what was unemployment as of January 2026, and it will send you that information.
It will return that information to you along with the link and the citation for where it came from so that you can go back and verify that that is true. And this really matters for research because timeliness is important, accuracy is important. Being able to cite our sources obviously all really matters. Here's a quick example of what a workflow for me looked like with and without an agent. I'm mostly writing code. I'm not doing a whole lot of querying for unemployment. But without an agent, if I had to write some code, I would ask a model to say maybe merge two datasets. It would write me some code. I would paste that into my editor. It would error out. I would paste the error into the model. It would try to fix it. I would put it back. New error, we go back and forth until it works.
It's not a bad process. It is a little bit faster than if I had to figure out how to write that code by myself, but honestly, not by a whole lot. With an agent, I can just drag and drop dataset one and dataset two along with a quick request to merge, and pretty quickly I'll get back my merged data. Now, if you wanted to be even better about making sure that this merge went well, you could add some color to this prompt, like we talked about in the previous module that might involve saying things like, "The key that you should merge on is this column," or, "Feel free to drop these rows of data, or if you encounter this error, address it in this way." But even without that, an agent should have the ability to see when it runs into an error and try to figure out the solution from there without being prompted by you, the user.
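As a minimal sketch of the kind of merge described above, here is what "merge on this column, and tell me what didn't match" can look like in plain Python. The column names, the toy data, and the helper `merge_on_key` are all hypothetical and only illustrate the idea of naming the key and handling problem rows explicitly; an agent would more likely reach for a data library.

```python
import csv
import io


def merge_on_key(left_rows, right_rows, key):
    """Inner-join two lists of dict rows on `key`, refusing duplicate keys on the right."""
    right_index = {}
    for row in right_rows:
        if row[key] in right_index:
            raise ValueError(f"duplicate key in right dataset: {row[key]!r}")
        right_index[row[key]] = row

    merged, unmatched = [], []
    for row in left_rows:
        match = right_index.get(row[key])
        if match is None:
            unmatched.append(row[key])   # keep track of rows we had to drop
        else:
            merged.append({**row, **match})
    return merged, unmatched


# Hypothetical toy data standing in for "dataset one" and "dataset two".
dataset_one = list(csv.DictReader(io.StringIO("id,wage\n1,20\n2,25\n3,30\n")))
dataset_two = list(csv.DictReader(io.StringIO("id,state\n1,IL\n2,WI\n")))

merged, unmatched = merge_on_key(dataset_one, dataset_two, key="id")
print(merged)
print("unmatched keys:", unmatched)
```

Returning the unmatched keys alongside the merged rows is one way to make the "feel free to drop these rows" instruction visible instead of silent.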
So we've talked a lot about agents and having access to tools and being able to connect to external applications. What is the whole space of tools that agents can access? The answer is that it is a very, very wide space that I couldn't even really begin to break down in its entirety. But as researchers, the two collections of tools that I wanted to flag for us are data tools, including access to APIs like the Federal Reserve, but also the ability to read CSVs and data files and query databases, and a second group of tools that we will call action tools, which involve things like writing, executing code, accessing files on your computer, running jobs on your compute cluster, generating figures, saving those figures to a directory, maybe even LaTeXing information for you, that type of thing. We've also talked a lot about agents as coding agents or coding tools.
I want to emphasize that agents can do a lot more than just write code. They can write in English, they can build websites and deploy websites. They can help us with tasks like outlining a literature review or putting together slides. This is a quick example of a similar workflow to the previous slide, but for developing a personal website. I'm not going to read out this dialogue, but again, feel free to pause and compare and contrast the left and the right side.
We'll finally talk about the architecture of an agent in more detail. This is a figure from module one where maybe it made a little bit less sense, but now that you have the technical jargon to understand what I mean when I say reasoning engine or API, let's tackle the figure on the right. We start with our prompt. This user says to this agent, which has access to the Qualtrics API, I want a survey to see if people are more likely to click on job posting A or job posting B. From there, this agent, which is powered by GPT-5, is going to store in its memory the fact that this particular user has information on posting A and posting B. And from there, it is going to plan the structure of the survey to get information on which stimulus people like more. It's going to write those questions and call or make a connection to the Qualtrics API, where that survey is going to get published to your Qualtrics account.
From there, it's going to return a link or maybe some affirmation that that survey question was written to Qualtrics and give you something that is a lot closer to a finished working product. So the core of the agent, remember, is the orchestration engine, that is the LLM, and this is going to handle most of the tasks that the agent does, including planning which order to execute in, what tools it should be calling, if at all, how to interpret the results from that API or from that tool call, and how to handle errors and try again if that doesn't exactly work as planned. All right, the second part of the agent is, of course, the memory, and there are two types of memory that I want to call out here. So the first type of memory is just like humans, short-term memory. This is things that you might've talked to the agent about in that session.
So things like, "When you query data, make sure you're only giving me data after 2010." The second type is long-term memory, which is things that you can configure your agent with. So when I say that my agent, my Claude Code desktop agent, for example, has access to my files on my computer and the documentation for my compute cluster, this lives in the agent's long-term memory. Anytime I initiate a session with this agent, all of this gets loaded into its memory so that it doesn't have to be repeated by me over and over again. And then finally, remember there are a whole bunch of tools that agents have access to. I'll flag a few. Search we've already talked about, file writing tools we've already talked about, and coding environments themselves are also tools. This is how GPT today sometimes is able to run a simple Python script and tell you the results without you having to copy paste that into an external editor and run it yourself.
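The three pieces described above (an orchestration engine, two kinds of memory, and tools) can be sketched in a few lines of Python. This is a toy under loud assumptions: the `plan` method is a hard-coded stand-in for the language model, both tools return canned or trivial results, and every name here is hypothetical. It is only meant to show how planning, tool calls, error handling, and memory fit together.

```python
from dataclasses import dataclass, field


def search_tool(query):
    """Stand-in for a web search tool; returns a canned string."""
    return f"(pretend search results for: {query})"


def calculator_tool(expression):
    """Stand-in for a code-execution tool: adds numbers separated by '+'."""
    return str(sum(float(part) for part in expression.split("+")))


TOOLS = {"search": search_tool, "calculator": calculator_tool}


@dataclass
class Agent:
    long_term_memory: list                                   # loaded at the start of every session
    short_term_memory: list = field(default_factory=list)    # this session only

    def plan(self, request):
        """Toy stand-in for the LLM: decide which tool to call and with what input."""
        if any(ch.isdigit() for ch in request):
            return "calculator", request
        return "search", request

    def run(self, request):
        self.short_term_memory.append(f"user: {request}")
        tool_name, argument = self.plan(request)
        try:
            result = TOOLS[tool_name](argument)
        except Exception as error:
            # Handle the failed tool call and try again with a different tool.
            self.short_term_memory.append(f"error from {tool_name}: {error}")
            tool_name, result = "search", TOOLS["search"](argument)
        self.short_term_memory.append(f"{tool_name}: {result}")
        return result


agent = Agent(long_term_memory=["(hypothetical) documentation for my compute cluster"])
print(agent.run("2 + 3"))
print(agent.run("GDP in 2020"))   # calculator fails on this, so the agent falls back to search
```

In a real agent the `plan` step is a call to a language model and the tools are real API clients, but the loop of plan, call, interpret, and retry has the same shape.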
Are agents always necessary? What is the value add of an agent? Is every task going to benefit from having an agent? The answer is generally no. Here are a few cases I think where agents can add a lot of value over a standard language model. Anytime timeliness is important, anytime a workflow has multiple steps, anytime we need some sort of external tool, anytime we need to send or receive data, really, we should and can use an agent. But just because we can does not mean that we should. There are also plenty of examples I can think of where fancy agents are not necessary. This is anything that really involves technical questions that we're reasonably confident a model can help us with. You just want to be reminded of the equation for the standard normal CDF. That is something that I'm sure a language model like GPT or Claude can do without having an agentic connection to some math textbook.
For simple one or two lines of code, it's a similar sort of thinking. For quick verbal or grammar checks, there's also really no need for an agent. But I'll leave it to you to discern for which of your use cases you want to use an agent versus just approach a regular language model.
Janani Sekar:
Hi, everyone. Welcome back. This is video five. This is all about different types of agents that can help us with research. I'm going to present a perspective on different categories of agents and feel free to think while you're listening to this video about which ones will be more or less helpful to you. Obviously, depending on the nature of your work, you're going to find some of this more interesting than the rest. Okay. So this is a perspective that we already saw, I want to say in module one, but this is from OpenAI and it was a breakdown of some of the different types of agents out there. There were data agents and action agents and orchestration agents or agents that managed other agents. I think this is a reasonable categorization. I'll present a different one based off of a lot of what I'm seeing in the social science literature, as well as doing in my own life and my day-to-day research, starting with deep research agents.
So deep research is this idea that there are agents that can do comprehensive literature reviews and then synthesize all of that information and give it to you in a way that ideally you could paste it into a paper. Are we there yet? I think absolutely not. Could we be there in a few months or years? I think definitely. I don't really know why these are called deep research as opposed to just research, but I am not the person to ask about why this is the case. These are a few of the different deep research agents available on the market. Pretty much every language model provider's got one these days. This table specifically is from CoreNet 2025, and I think it does a nice job of helping you navigate, depending on your use case, which deep research agent you should use. Notably, there is pretty significant heterogeneity in how quickly they do this querying or searching of the literature.
There is also definitely heterogeneity in price and know that some of the paid deep research models are actually pretty expensive to run. So if you're just trying to get a feel for whether or not this type of agent is even something that you would find useful, I would highly recommend that you do start with one of the free ones. It's going to be a slightly more accessible place to start. Okay. So deep research agents can have pretty high value add, but with some limitations. Know that when you run a deep research query about some topic, you may or may not have information that is entirely credible. So oftentimes I've gotten information from Wikipedia or blog posts and not always from peer-reviewed journal articles. And that's always something that you have to look out for because the information can be presented as legitimate and it's on you to look at what those links are and confirm whether or not it is.
The writing might not be good, so this is why I said we're not quite at copy-pastable literature review level yet. Oftentimes I find that even if the outline and the sources are of good quality, the way they're synthesized is not necessarily the way that I would choose to do it. So be prepared to rewrite or reorganize the way that this information is presented. And going back to a pretty recurring problem from that prompt engineering discussion in module three, know that we do expose ourselves to false claims and overgeneralization, even to the extent of falsely causal statements. So keep in mind all of this when you're using deep research agents, and as long as you know to look out for these problems, you can use them pretty effectively, at least for collecting information. So what are some best practices to mitigate these risks? Definitely always be suspicious.
Try to specify filters for quality. Say that you want information from certain journals or from certain authors. Ask for nuance to maintain some standard of paper or data that's reported to you. So things like, I want effect sizes and confidence intervals. I want papers that report these things in their tables. You can even take a paper that you like on a topic and ask the model to find more literature that's similar to that. That's a way to guarantee some quality or some on-topic-ness. And one thing that's worked really well for me in practice is to start very broad and then slowly narrow things down. So when I say start broad, for example, you might do something like give me an overview of the literature on carbon taxation, see what the model gives you. Take the one or two papers that you really liked from that deep literature query or deep research query, and then pass it back in and say, "Now I want more that is pretty similar to this. Please think about this problem in this direction."
Okay. The next category of agent that we will talk about is, of course, the ever-present coding agent. If you are a Python person, you've probably worked with or tested out one of these. They're also actually pretty good with R in my experience. I have not messed around with them a whole lot for Stata or MATLAB or any of the other statistical programming languages that we might use, but I would imagine that there is a coding agent out there that is reasonably good at any of these languages. And if you're a Python person, we've got Cursor and GitHub Copilot and Claude Code. And of course, this process of giving an agent natural language instructions, so just English instructions on what you want your code to do and then getting back out working code, is called vibe coding. The appeal is that this definitely reduces the barriers to writing good code.
You no longer have to be a computer scientist or have extensive programming knowledge or know specific syntax to be able to write code. And it allows us to focus on logic rather than get into the weeds of, oh no, I meant to put a colon here and that's why this code didn't work and that's why I got this error. The downside is that if we just let a coding agent go off and write us 5,000 lines of code without checking intermittently, then when it does inevitably break, when things stop working, we will not know where to start looking to try and fix it. And this is a meme that I saw circulating on Twitter a few weeks ago, and I think it is most definitely true if you let your vibe coding agent just get away from you without supervising it or paying attention to what it's writing.
So here are some examples of where vibe coding can go wrong. I will not read them all out loud, but again, feel free to pause the video here to look at each of these scenarios or examples in a little bit more detail to try and see if you can anticipate why it might present a problem.
Here are some other failure modes that I've noticed. These are worth calling out. I've noticed that language models are typically trained on historical code. So even if they have access to the internet and have agentic capabilities like the latest versions of GPT and Claude, they will still sometimes try to load libraries or packages that are old and outdated, which will cause other parts of your code to break because I think they prefer to rely on their internal knowledge base over doing an internet search to the extent possible. So look out for that. Anticipate that as a type of error that might occur. I definitely see all sorts of inefficient and badly written things. This is what we refer to as AI slop. You might get code that works, but just barely because it runs really slowly. You might get code that's just written in a way that's very difficult to follow that doesn't really adhere to good code writing practices or it's not commented and so you don't know what's going on where.
And then there are user specific things. So getting a language model or agent to work with your compute cluster or with the coding guidelines that your lab uses might require some additional work on your end. Best practices for mitigating these failure modes are, of course, planning. So one big thing to do is to write pseudocode or logic, and that way your agent doesn't have to come up with both the logic and the code from scratch. If you give it the logic in English, it's going to be a lot easier to translate that into code. A similar thing goes for debugging. Ask your model to explain at each step when it generates code what it's doing there. This is also going to limit the burden on your part to have to understand the code line by line if you're getting with every 10 lines of code that get generated, a high level overview of what each section does.
And that way when things break, you can roll back to the most recent working version and then restart from there. Ask the agent to catch mistakes as they happen. So say, when you write code to clean this dataset, also write code to check that there are no duplicates, check that the orders of magnitude are right. So if your column has data that is all decimals and then all of a sudden you're seeing 10 to the four, that's maybe a sign that things have gone wrong. And if your agent has internet search capabilities, ask it to give you documentation about the libraries and packages and syntax that it's using, and also feel free to give it documentation. So this is a link to my compute cluster. This is a link to this library that I'm using. Can you make sure that you're using the most recent version of this and that it's not outdated?
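The "catch mistakes as they happen" checks mentioned above, no duplicates and sensible orders of magnitude, can be sketched as two small helper functions. The function names, the toy rows, and the `max_gap` threshold are assumptions for illustration; the right threshold depends on your data.

```python
import math
import statistics


def find_duplicates(rows, key):
    """Return the key values that appear more than once."""
    seen, duplicates = set(), []
    for row in rows:
        if row[key] in seen:
            duplicates.append(row[key])
        seen.add(row[key])
    return duplicates


def order_of_magnitude(value):
    """Power of ten of a nonzero number, e.g. 0.3 -> -1 and 10000 -> 4."""
    return math.floor(math.log10(abs(value)))


def magnitude_outliers(values, max_gap=2):
    """Flag nonzero values whose order of magnitude is far from the typical one."""
    nonzero = [v for v in values if v != 0]
    typical = statistics.median([order_of_magnitude(v) for v in nonzero])
    return [v for v in nonzero if abs(order_of_magnitude(v) - typical) > max_gap]


# Hypothetical cleaned dataset: a column of decimals with one suspicious entry.
rows = [
    {"id": 1, "share": 0.12},
    {"id": 2, "share": 0.34},
    {"id": 2, "share": 0.29},
    {"id": 3, "share": 10000.0},
]

print("duplicate ids:", find_duplicates(rows, "id"))
print("suspicious values:", magnitude_outliers([row["share"] for row in rows]))
```

Asking the agent to write checks like these next to the cleaning code means a problem shows up at the step where it was introduced, not several hundred lines later.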
Okay. So we've now talked about coding agents and deep research. The next category of agents I'll very briefly touch on is creative agentic platforms. We are seeing platforms like Lovable and Gamma and NotebookLM. These are agentic platforms where you can talk to a chatbot in natural language and get the tool to generate for you things like slides, graphics, visuals, even videos. They're helpful for creating a rough outline or diagram. I think we're not yet at the point where you might be comfortable using one of these tools to generate slides for a talk, but we are definitely at the point where we can use these tools to generate the preliminary version of slides for a talk that we might have to go in and fill in ourselves. And the final category will, of course, be vibe-coded custom tools. So remember, we can do things like turn English into code, so if there isn't an agent for something that we want to do, for example, you want to work with Qualtrics or you want to work with Spotify, we can build one.
There is no Spotify API agent that exists on the market that you can buy a subscription to, but with a Spotify API key and an understanding of how APIs work, you have all of the tools that you need to prompt a coding agent to build for you your own custom agent that can access the Spotify API. So I want to leave this here because I think this is going to be something very important and we will actually discuss this in more detail two videos from now.
Janani Sekar:
Hi everyone, and welcome to video six. This module is going to be all about vibe coding for researchers. We spent a small portion of the previous video, I know, discussing how to vibe code. This video is going to be a little bit more guiding you through workflows and examples where you can use coding agents in your work. Before that though, I also know that some of you have told me that you are nervous about presenting AI-generated work in a professional context or to your PI or to an advisor. I totally understand that. If that is the case and you're not yet at the point where you feel comfortable in your ability to validate agentic outputs or vibe code something that you would present professionally, I do not want you to do that. I do, however, want to give you a few examples of places that you can consider vibe coding outside of your professional life to just try and gain some familiarity and comfort with doing this because I think it's such a fun and accessible way to use agentic tools.
So we'll start with personal academic websites. I know I've mentioned this a few times at this point throughout this video series. One way to do this is to have a coding agent build a personal website for you. Another way to do it is actually to use one of the creative tools that I mentioned above, one of those creative agents like Lovable or Gamma. They're actually capable of generating websites as well without you even having to check the code. So all of the code there is hidden under the hood. The interface works with a chatbot where you tell it what your website preferences are and it'll kind of iterate with you until it gives you a website that you are semi-satisfied with. The only thing that I'll say is that they will usually cost somewhere around $10 to $20 a month, typically with a subscription-based model, whereas vibe coding agents do have free tiers.
And so if you're willing to try and inspect the code yourself, doing this purely with a coding agent might be a more economical way to approach this. Experiment interfaces, this is one that I want to spend a few minutes talking about. Personally, this is where most of my vibe coding experience professionally has been. Particularly if you are bored of Qualtrics and you do have to make surveys for work and you are tired of that drag and drop survey flow interface or that randomization logic that is difficult and annoying to implement, I do highly encourage you to try using another platform, a maybe more coding heavy platform that doesn't have to feel so coding heavy anymore because you have an agentic interface like Cursor or Claude Code to guide you through it. So two of these platforms are Streamlit and oTree. These are both Python libraries for building interactive websites.
oTree was actually designed by researchers, I think psychology researchers, if I'm not mistaken. And so it is specifically made for doing interactive decision-making tasks. So playing prisoner's dilemmas with two people or having two people talk to each other, teacher-learner type of interactions. oTree is going to be really great for that. Streamlit is another more generalizable library in Python for building interactive websites, and it integrates well with Prolific and MTurk. Both of these tools do, in my experience, and I absolutely recommend that you test them out. Again, if you're nervous about doing this professionally, feel free to take an old Qualtrics survey that you might've programmed and try to get an agent to translate it into Streamlit or oTree using Python for you, see how well that works, and see if that's something you're willing to try professionally in the future after that. Data dashboards are another really great application.
These render figures that are very, very nice to look at and that are also interactive. So figures that you can click into to see the numbers or to see the breakdown or the distribution. And Streamlit, the same platform I mentioned above for experimental interfaces, also works for building interactive dashboards. There is an equivalent in R called Shiny. And I've seen a lot of people actually take papers that they really like and try to summarize the results of that paper in a more user-friendly, more general audience way using one of these data dashboards that they vibe coded with a single natural language prompt to a coding agent. And finally, I'm going to be a little bit repetitive because I know I mentioned in the last module that we would talk about building custom coding agents at some point. We're not going to do that just yet, but skip ahead to the next video if you are itching to find out how we might be able to do that.
Okay. What we'll do now is we will talk through an agentic workflow for some tasks that might be relevant to a researcher like me or you. And let's set the scene. We have some survey data that we've collected and we want to analyze the effect of some stimulus. So we had a treatment group that saw the stimulus and a control group that did not. And we want to explore our data and maybe get some summary statistics, validate the data, see what's going on here, make sure that we do some quality control, and we can write a detailed prompt to our agent, our coding agent, to tackle this task. We don't actually have to paste in the data itself because the agent has access to the file system on your computer. This is definitely true if you use a tool like Cursor or Claude Code or GitHub Copilot.
The agent is going to run this code. You should, of course, at every step of the way, be checking the outputs, making sure that they make sense, and then you'll move into asking the agent to do your primary analysis, which will actually be running some sort of economic model, estimating some parameter. And once again, at every step of the way, you should be validating the results, asking the agent for explanations for the code that it is writing and making sure that what you are seeing is in line with what you are expecting to see. And then once you've got your preliminary analysis done, maybe you've run a model or two, you've got a table with those results, you might want to do some additional robustness checks or validation, maybe add some covariates, include fixed effects. And this is also something that your agent should be able to build on with very limited prompting.
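To make the workflow above concrete, here is a minimal sketch of the first two stages: validate the data, then estimate a treatment effect as a simple difference in means with an unpooled standard error. The responses are made-up numbers, and `validate_groups` and `difference_in_means` are hypothetical helpers, not a prescribed analysis. A real project would likely move on to a regression with covariates and fixed effects.

```python
import math
import statistics


def validate_groups(rows):
    """Quality control: every row must be assigned to treatment (1) or control (0)."""
    bad = [row for row in rows if row["treated"] not in (0, 1)]
    if bad:
        raise ValueError(f"{len(bad)} rows have an invalid treatment flag")


def difference_in_means(treated, control):
    """Estimate the treatment effect and its standard error (unequal variances)."""
    effect = statistics.mean(treated) - statistics.mean(control)
    standard_error = math.sqrt(
        statistics.variance(treated) / len(treated)
        + statistics.variance(control) / len(control)
    )
    return effect, standard_error


# Hypothetical survey responses: outcome on a 1-7 scale, treated = saw the stimulus.
rows = [
    {"treated": 1, "outcome": 4}, {"treated": 1, "outcome": 5},
    {"treated": 1, "outcome": 6}, {"treated": 1, "outcome": 5},
    {"treated": 1, "outcome": 5}, {"treated": 0, "outcome": 3},
    {"treated": 0, "outcome": 4}, {"treated": 0, "outcome": 3},
    {"treated": 0, "outcome": 4}, {"treated": 0, "outcome": 3},
]

validate_groups(rows)
treated = [row["outcome"] for row in rows if row["treated"] == 1]
control = [row["outcome"] for row in rows if row["treated"] == 0]
effect, standard_error = difference_in_means(treated, control)
print(f"effect = {effect:.2f}, standard error = {standard_error:.2f}")
```

Checking a simple estimate like this by hand is also a quick way to confirm that whatever model the agent runs next is in line with what you expect to see.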
Finally, you can also ask your agent to package your results. It should be able to take the results from your regression tables or from the code to run those models and turn those into figures based off of your preferences for color, axis labels and the like. Maybe not quite for a paper, but definitely good enough to show your PI. Okay. So suppose that I want to create an online behavioral experiment where I am studying how people make decisions under uncertainty. I'm going to do this in Streamlit as opposed to in Qualtrics. And this is a true experiment that I programmed. So this is a real prompt that I used and I had to go through some trial and error with the prompt a little bit, but this was enough to get me a skeleton of this project. And then I went through and I changed a few things around.
So I said, I wanted a consent form, an instructions page, practice rounds, and a main experiment. We wanted participants to see two lotteries with probabilities and payoffs. We wanted to allow the participant to choose one of those payoffs, show them the outcome, and also store variables in my database, including what choice they made, how long it took them to make that choice and things like that. And eventually I had to go back and add some constraints to this because you can see how this part of the prompt is the most general and maybe not fleshed out. So we want to limit the payoffs to a certain range. We want the probabilities to not be entirely random. We want them to come from some distribution. Things like this, you do have to go back and workshop, but a prompt like this will be more than enough to get you set up with the skeleton of an online experiment in Streamlit.
At that point, you might not even need the agent to make those edits if you gain some sense of what's going on in that code, you can make those edits yourself.
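The core logic of the experiment described above, two lotteries per round, constrained payoffs and probabilities, and a stored record of the choice and response time, can be sketched without any web framework. This is plain Python, not Streamlit code, and the payoff range, the probability grid, and all function names are assumptions for illustration. The `choose` argument stands in for the participant's click.

```python
import random
import time

PROBABILITY_GRID = [0.1, 0.25, 0.5, 0.75, 0.9]   # assumed: probabilities are not fully random


def make_lottery(rng, min_payoff=1, max_payoff=20):
    """Draw one lottery with a payoff limited to an assumed range."""
    return {
        "payoff": rng.randint(min_payoff, max_payoff),
        "probability": rng.choice(PROBABILITY_GRID),
    }


def run_trial(rng, choose):
    """Show two lotteries, record the choice and response time, and resolve the outcome."""
    lotteries = {"A": make_lottery(rng), "B": make_lottery(rng)}
    start = time.monotonic()
    choice = choose(lotteries)                     # stands in for the participant's click
    response_time = time.monotonic() - start
    chosen = lotteries[choice]
    won = rng.random() < chosen["probability"]
    return {
        "lotteries": lotteries,
        "choice": choice,
        "response_time": response_time,
        "outcome": chosen["payoff"] if won else 0,
    }


def pick_higher_expected_value(lotteries):
    """A simulated participant who always picks the higher expected payoff."""
    return max(lotteries, key=lambda name: lotteries[name]["payoff"] * lotteries[name]["probability"])


record = run_trial(random.Random(0), pick_higher_expected_value)
print(record)
```

Passing in a seeded random number generator keeps a simulated session reproducible, which helps when you go back to workshop the constraints.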
One thing that I wanted to flag is that sources of unstructured data like text and images can be very, very powerful inputs to economic models. So even in a standard regression, if you have a way to numerically represent an image, that is something that you can include in the right-hand side of your model. One big barrier though to using these data sources is that processing unstructured data typically requires advanced computational knowledge. Oftentimes we use neural networks to turn text or images into vectors, into numbers that we can use in economic models. One advantage of vibe coding is that this type of data transformation is no longer off limits to someone that doesn't have that computational knowledge. So the ability to vibe code just strips away this computational overhead because you can just vibe code the pipeline that takes your image, your unstructured data, and gets you some tabular structure.
And so one example application might be that you have satellite imagery of agricultural areas, you want to predict crop yields, you can vibe code a processing pipeline to turn those images into structured data that you can now include in an economic model. So the main takeaway that you should have for this module is that coding agents are not going to replace human expertise. When you vibe code an application, when you vibe code a model, a data pipeline, you'll still need to understand what's happening at every step of the way and take the steps to make sure that you understand what's happening every step of the way. The primary thing that this is going to save you is all of that time and energy that you would've spent learning and understanding and familiarizing yourself with syntax. And so one perspective that I was given, and I don't know how true this is for you, is to treat the agent as an undergrad RA.
So it is going to help you get your tasks done, but no way is it going to be able to replace your expertise and your understanding of the problem that you're working on.
Janani Sekar:
Hi everyone, and welcome to video seven. As promised, this will be the module where we finally talk about how to use AI agents, coding agents in particular, to build your own custom agents. And we will do, as promised, a more in-depth demo here. All the materials will be shared. Okay, so the thesis of this module is going to be that if we can vibe code and we can turn natural language, whether it's spoken or written language, into working code, then we can build agents for tasks that do not already exist. So remember I mentioned earlier, imagine that you want to interact with the Spotify API and get data from Spotify, or you want data from the Bureau of Labor Statistics, or you want data from some other database on the internet. We can instruct a coding agent to take a language model, connect it to these APIs, and build us a custom tool for some repetitive task that we find ourselves doing often.
Typically, anything that is repetitive that we know that we can review and quality control reasonably well is the type of thing that we want to outsource to an agent. So like I said, we're going to go through an example. This example is coded entirely through natural language. It's vibe coded in Google Colab. So Google Colab is an interactive, collaborative platform. It works in the browser and it is for Python code. Because it's by Google, it comes by default with Gemini integrated, which means that there is a chatbot on the side that you can interact with the entire time that you are programming in Google Colab these days. It is also free. At least there's a pretty generous free tier, so it's a great place to start if you are interested in beginning your vibe coding journey. So this agent that I built for this demo is an economic agent using FRED, which is the St. Louis Federal Reserve's open-access API.
You do have to go in and make an account and obtain an API key, but once you do that, it is pretty straightforward. So let's open up this demo. I wanted to store the conversation that I used to make this demo work, but it seems as if that's been cleared. So, well, let's run this notebook step by step so that I can show you how this custom agent works. And then at the very end, I'll show you how you can use the embedded coding agent in the notebook to build on what might already be here. So the prompt that I used to ask Gemini to do this was, I said, I already went to the FRED website and I got myself an API key. Using this API key, I want to be able to query the Federal Reserve's data, pass in requests to my agent, and have an agent that can go through, access that API and give me the data that I'm looking for.
And I didn't write any of the code that you're seeing here. This was all done by this model. Over here, the model has told us that the reason for this code here is it's going to install some libraries in Python. It's then doing some setup of our API key. I had to put my API key in this secrets section of my notebook that is basically going to obscure the actual letters and numbers of the key itself because it is a sensitive secure access token, but it's going to give my model and this coding session that I'm running access to that login information. The agent then created a function in Python to get data from this API, get a series of information. So a series might be CPI, it could be inflation, it could be any other data that is found in this database. And so we'll run this cell to load that function.
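For readers following along without the notebook, here is a minimal sketch of what a "get a series from FRED" function can look like using only the standard library. This is not the code Gemini generated in the demo. It assumes the public FRED `series/observations` endpoint with the `series_id`, `api_key`, `file_type`, and `observation_start` parameters, and the sample payload values below are made up purely to show the parsing step.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

FRED_OBSERVATIONS_URL = "https://api.stlouisfed.org/fred/series/observations"


def build_fred_url(series_id, api_key, start_date=None):
    """Build the request URL for one series, e.g. series_id='UNRATE'."""
    params = {"series_id": series_id, "api_key": api_key, "file_type": "json"}
    if start_date:
        params["observation_start"] = start_date   # format: YYYY-MM-DD
    return f"{FRED_OBSERVATIONS_URL}?{urlencode(params)}"


def parse_observations(payload):
    """Turn a FRED JSON payload into (date, value) pairs, skipping missing values ('.')."""
    return [
        (obs["date"], float(obs["value"]))
        for obs in payload["observations"]
        if obs["value"] != "."
    ]


def get_series(series_id, api_key, start_date=None):
    """Fetch and parse a series. Needs network access and a real API key."""
    with urlopen(build_fred_url(series_id, api_key, start_date)) as response:
        return parse_observations(json.load(response))


# Offline illustration with a made-up payload in the same shape as the API response.
sample_payload = {
    "observations": [
        {"date": "2024-01-01", "value": "3.7"},
        {"date": "2024-02-01", "value": "."},
    ]
}
print(build_fred_url("UNRATE", "YOUR_API_KEY", start_date="2010-01-01"))
print(parse_observations(sample_payload))
```

In practice the key should come from a secrets store or an environment variable, as in the notebook, and never be pasted into the code itself.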
We then need access to a language model. So remember that an AI agent is powered by a language model reasoning engine. Apologies. We got disconnected there for a second, but as I was saying, the core reasoning engine of any agent, including this one, is going to be a language model. And because this is a Google environment with a Google LLM-based coding agent, it has decided that this FRED agent is also going to be driven by Gemini. So the language model that we are going to use for our FRED agent, just like the language model that is being used here in this notebook, is going to be Gemini. And so in order to access Gemini in our code, we're going to need another API key because Gemini is also a software application. And to talk to this language model from this programming notebook, we are going to need an API key.
Fortunately, I've also gotten one of those already and stuck it in my notebook secrets. We will have to run this function. This is going to initialize this model. Fingers crossed that this works. That did not work. All right, let's see why that doesn't exist. I see, it's because it's called Google API Key. Let's fix it. Let's give the model access to it and then fix it.
I can't type that for some reason. So let's just make a new one called Google API Key. And we're going to copy paste what's here and put it in here because it's going to be the same. And sorry, friends, I'm going to obscure that from you so that you are not able to get into my Google account. And I'm going to run that again. And now that worked because this notebook has access to my Google API Key. Let's run this. What the coding agent has decided to do here is set the Gemini model inside this reasoning engine to be Gemini 1.5. That is a pretty old version of this model, and that is A-okay. It is going to be good enough for querying the Federal Reserve. Now we're building the agentic logic. So this is where it is defining the ability of the agent to access the Federal Reserve tool.
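The failure here is a name mismatch: the code asks for a secret under one name while the secret is stored under another. A small lookup helper that fails with a readable hint makes this kind of bug quicker to spot. This is only a sketch: the helper name and the plain-dictionary store are assumptions, and in Colab the value would instead come from the notebook's secrets panel (for example through `google.colab.userdata.get`).

```python
import os


def _normalize(name):
    """Compare names while ignoring case, spaces, and underscores."""
    return name.replace("_", "").replace(" ", "").lower()


def load_api_key(name, store=None):
    """Return the secret stored under `name`, or raise an error that suggests a near-miss."""
    store = os.environ if store is None else store
    if name in store:
        return store[name]
    # Look for a stored key that differs only in case or separators
    similar = [key for key in store if _normalize(key) == _normalize(name)]
    hint = f" Did you mean {similar[0]!r}?" if similar else ""
    raise KeyError(f"No secret named {name!r}.{hint}")
```

With a store of `{"GOOGLE_API_KEY": "..."}`, asking for `"google api key"` raises an error that points at `GOOGLE_API_KEY` rather than failing silently later.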
And so the description of the tool is one that fetches economic data series, and it's going to always get us an economic series with a start date and an end date. It's also got a prompt for getting that series. And so this prompt was not written by me. In this case, it was written by the agent because this all happened automatically and agentically. Okay. So that happened and now this agent does a bunch of things. Notice how it calls the tool, it gets a response, it looks at that response, and then it returns to the user either that the data was successfully returned and it displays it, or it tells the user, "Hey, I ran into one of many errors and unfortunately I was not able to find the data that you were looking for." Let's test out this agent. I did this earlier today and it looks like it worked.
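The loop just described (describe the tool, let the model choose the call, run it, check the response, report data or an error) can be sketched in a few lines. Everything below is illustrative rather than the notebook's actual code: `fake_model` stands in for the Gemini call, `fake_tool` stands in for the FRED request, and the returned value is a made-up placeholder.

```python
# Tool description the model would see (wording is illustrative)
FRED_TOOL = {
    "name": "get_fred_series",
    "description": "Fetches an economic data series from FRED between a start and end date.",
    "parameters": ["series_id", "start", "end"],
}


def run_agent(question, choose_call, tool):
    """One pass of the loop: the model picks arguments, the tool runs, the result is checked."""
    try:
        args = choose_call(question, FRED_TOOL)  # stands in for the LLM call
        data = tool(**args)
    except Exception as exc:  # report any failure to the user instead of crashing
        return {"ok": False, "message": f"Could not fetch data: {exc}"}
    if not data:
        return {"ok": False, "message": "The request succeeded but returned no data."}
    return {"ok": True, "series_id": args["series_id"], "data": data}


# Stand-ins so the sketch runs without an API key or a network connection
def fake_model(question, tool_spec):
    series = "UNRATE" if "unemployment" in question.lower() else "GDP"
    return {"series_id": series, "start": "2010-01-01", "end": None}


def fake_tool(series_id, start, end):
    # Placeholder data, not real FRED values
    return [("2010-01-01", 9.8)] if series_id == "UNRATE" else []
```

Running `run_agent("Show me unemployment since 2010", fake_model, fake_tool)` returns a success result for the `UNRATE` series, while a tool that raises an exception produces the "could not fetch data" branch.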
Let's make sure it works again. I said, "What is US GDP since 2010?" And that is a 404 error because Gemini 1.5 is no longer available. So let's see if we can find out if there's another model that is available instead. Maybe we can try Gemini 2.5 Flash. Maybe that'll work instead.
This is a good lesson in vibe debugging. That didn't work either. Okay, so let's get the model to maybe agentically explain to us what's going on. If it takes a really long time to think, I might cut the video here and then restart when it's done thinking. So just a heads-up that that might be happening in a few seconds. Okay, so coming back, you can see in this window that the model finished thinking, and it said that Gemini 2.5 flash latest is not a valid model name either. It's using Gemini 1.5 flash, which apparently should resolve the issue. Let's see if that works. We can hit this accept and run button. That did not work either. So let's tell the model what to do.
I'm actually just going to give it the direct error message because it tells us what to do and it says call list models to see the list of available models. I don't know how to call list models because I don't exactly know how that function works. I don't know if it takes inputs. I don't know what its outputs are. So I'm going to let the coding agent try and address this for me. Once again, I might cut the video while it's thinking. Just know that that is coming up. As I said, I wasn't sure how to use that function earlier, but it seems like my coding agent has experience doing this, so let's run this to see if it can. Perfect. Okay. So now we've got a list of all of the models that we can use. Sure enough, 1.5 is not one of them, but 2.5 flash is.
I had tried that earlier, but maybe I had "latest" at the end, so let's see if this works this way.
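One way to avoid guessing model names is to pick from the listing itself. The sketch below assumes the listing gives names prefixed with `models/` (as the `google.generativeai` package's `list_models()` does through each model's `.name`); the helper and the exact model ID strings are hypothetical.

```python
def pick_model(available, preferred):
    """Return the first preferred model that appears in the available list."""
    # Listings typically prefix names with "models/"; compare on the bare name
    bare = {name.split("/", 1)[-1]: name for name in available}
    for want in preferred:
        if want in bare:
            return bare[want]
    raise ValueError(f"None of {preferred} are available; choices: {sorted(bare)}")


# Example listing, assumed for illustration only
available = ["models/gemini-2.5-flash", "models/gemini-2.5-pro"]
chosen = pick_model(available, ["gemini-1.5-flash", "gemini-2.5-flash"])
```

Here the retired first choice is skipped and `chosen` falls back to the second preference, which is the same fix made by hand in the demo.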
Perfect. And that looks like it worked. It fetched data successfully. There is a start date, but there's no end date because we are going since 2010. So up to now means that there is no end date, and that looks like a series ending with July 2025 as the last value. This is the cell where I listed out the models. I'm not going to delete that. It's kind of out of place right now, but we'll leave it in for reference. And we can also test out unemployment. Similarly, see that we are passing in, in natural language, in just plain English, the data that we're looking for, and our agent is able to parse this and then figure out what column or what economic series to query from the Federal Reserve. So I think this is a pretty cool thing that we don't have to know that UNRATE is how the Federal Reserve stores unemployment.
We can just say, "Show me unemployment," and the agent is going to be able to translate that into the data that it has to fetch. We can do CPI similarly. This is going to work well for us. And then finally, I also told Gemini to do this earlier today, which was to take the data that we've queried, so to get GDP, to get unemployment, to get CPI, and then to make me some visualizations. So this is a lot of code. I did not write any of it. It was just a prompt that I asked the model, and it wrote me some code to visualize these series. Let's see what that looks like. And so there's unemployment, there's CPI, there is GDP.
We've got those COVID spikes. Everything looks pretty much like we'd expect it to, and it was not all that difficult to go from our API key to our agentic interface, to even getting visuals of these economic variables. So thank you all for watching this demo. Hopefully this agentic interface and this process makes a little bit more sense now that you've seen a concrete example of it. Hopefully some of that live debugging, which I didn't intend for, was actually additive and interesting to see. And I will see you all in the last video, which is the conclusion, and I will do my best to make sure that we make this material and this notebook available to you all also.
Janani Sekar:
Okay. Hi everyone. Welcome to the eighth and final video in this series. You made it to the end. We're just going to do some quick review of what we've covered and some takeaways. Hopefully you understand why we should care about AI, LLMs and Agentic tools. You understand ways in which they can improve our research output and our productivity. You understand how machine learning methods, things like unstructured data can be incorporated into the type of research that we do. It says economics research here, but you know I more broadly mean business, statistics, any type of work that involves running models and crunching numbers. And you also at the same time see the caveats about using LLMs and AI agents in day-to-day work. So when we're using language models and generative AI to label our data, that might be biased. When we use these models to interpret our results, they might overstate our conclusions.
So all things to bear in mind. Hopefully you understand some technical concepts like APIs, how language models actually work. You remember some of those prompt engineering best practices. No worries if you don't remember; all of the references will be included at the end of this module. You understand when to use agents and when you don't actually need an agent. Remember that if you're just asking a language model about some mathematical concept that you're trying to review, one that you have reason to believe the language model knows very well, you don't need to build out a custom agent for that. There are different types of agents. We talked about deep research. We talked about vibe coding tools. We talked about creative agents like Lovable or Gamma, and we talked about your own custom tools as well. Some parting advice I've got for you: remember that verification is always non-negotiable, especially because we are predominantly in this augmented task space as opposed to this automated task space.
Agents can accelerate our work, but they will never replace our judgment. Actually, I'll never say never. At least in the short run, they will not replace our judgment. We should not treat agents as black boxes. We shouldn't just ask them questions, take the results and go forward with them. It is on us to know what they do, especially when we're building our own vibe coded agents. We can easily see that when we look at our code, like that Colab notebook we looked at in the last module. When we vibe code in general, we can ask the agent to tell us what it's doing at each step of the way so that we understand the processing steps that it's taking. Know that prompt engineering matters. What you ask your agent, how you ask your agent that question, the order in which you show it examples, the roles that you tell it to take on, all of these things affect the quality of our results.
Remember to choose the right tool. Remember when you're doing deep research, there was that table from CoreNet25 of all of the different deep research agents you could choose from. Remember that not all tasks will require agents. Remember that there are cases where you want to vibe code your own agent, put in that fixed cost versus cases when you don't want to. And remember to always be transparent and use good research practices, document AI assistance in your work. Always tell your PI if you used AI to prepare some memo so that they are not accountable for your AI use. Be clear about what the agents did when you're presenting anything versus what you used your human judgment for. This is that set of references. I just wanted to thank you all for being here with me. Shout out to all of these folks once again. I hope you learned something from this series.
I hope that you are curious to read and learn more about some of these incredible tools. There is a new language model or agent on the internet and on Twitter and on the market pretty much every day these days. And it's really great to be able to stay on top of this stuff and be able to participate in conversations about where this field is heading. I think it's a very exciting time to be a pre-doc and hopefully you do too. And thanks again for being here. Hope you enjoyed it.
This introduction is a high level overview of AI agents as they pertain to research.
Video 1: Introduction to AI Agents for Research
Learn how to use AI efficiently and responsibly by differentiating between tasks that can be augmented and automated by agents.
Video 2: How to Use Agentic AI
This video is about the essential technical concepts required to understand the inner workings of an AI agent.
Video 3: Background and Essential Technical Concepts
Dive into the capabilities of AI agents by exploring their impact on common workflows used by social science researchers.
Video 4: AI Agents and Their Advanced Capabilities
Consider how understanding the advantages of different types of agents can enhance your research.
Video 5: Types of Agents for Research
A crash course in vibe coding through agentic tools.
Video 6: Vibe Coding for Researchers
A primer on building custom agents and tasks.
Video 7: Custom Agents and More Demos
Eighth and final video in this series about AI, LLMs and Agentic tools.
Video 8: Takeaways and Review