
The Pre-Doctoral Path
Stephen (00:02):
So to start the pre-workshop today, what I'd like to do is make sure that everyone here is on the same page as to what pre-docs are, why they might be of interest to you. And even speaking a little bit more generally about what a research career in quantitative social science looks like. There's a lot of hidden curriculum in this process. There's information that everybody should have and be aware of so that they can be sure that they are making their decisions and applications strategically. There are also some other pieces of information that we just want to make sure are widely distributed to people, some myths that we want to disabuse people of as well that are out there about what a research career looks like and what it might look like in particular for you.
(00:46):
So today in particular, we'll be talking about what quantitative social science even is, what questions comprise it, what academic areas. We'll be talking about what a research career looks like, give you some idea of the day-to-day of what that might look like for you, how to prepare for it, what the different pathways to a research career are, and then switch to uppercase PREDOC, predoc.org, making sure that you know particularly the resources through our organization, what they are and how you can take advantage of them. And then as I said, you are welcome to put questions in the Q&A as I'm talking. I probably won't be able to check that as I'm presenting, but afterwards I will be sure to take your questions and answer as many of those as I can.
(01:31):
So let's start by defining what quantitative social science is. This is an area where I think we're disabusing people of some of the myths out there, because I think when people think about quantitative social science, they think not unreasonably about economics, which certainly counts as one of the major quantitative social sciences. And when they think of economics, they start to think about the stock market and to think about banking. There are economists who think about the stock market. There are economists who think about banking, but it is hardly what most quantitative social scientists are working on. It's an incredibly broad field that asks very relevant questions to what people care about, ripped-from-the-headlines sort of issues that can make a real impact in the world on questions that matter for our wellbeing. So I've put five questions here that I took from actual research projects that I'm familiar with, people working in economics departments, sociology departments, business schools, and I'm just going to read through them quickly here to show you some of the diversity of topics and questions that people are addressing.
(02:30):
How can we get people to adopt clean energy technologies? We have technologies and methods out there that we know can address important concerns about climate change, but people aren't doing them. How can we present them in ways that people are actually going to adopt these solutions? How do we combat fake news, especially in the age of generative AI? Why is US healthcare so expensive compared to other similar economies? How do we fix it, since it is? Can artificial intelligence make better decisions about criminal justice than judges, when we think about things like bail decisions or recidivism? How does poverty affect people's ability to think, affect cognition? These are not necessarily questions that I think a lot of people would consider when they're thinking about your standard economist or quantitative social scientist, but these are very much the questions that are being addressed and the currents of issues, passions that are being pursued by people who are in research careers in these fields.
(03:28):
So when we're thinking about the academic fields involved, especially if you're thinking about it through the lens of your own undergraduate major, yes, you're going to find economists, but also psychology, operations, finance, econometrics and statistics, accounting, marketing. These are all areas that are contributing in different ways to quantitative social science. And I will assume many quantitative social scientists see fairly low barriers between these different areas. Economists are regularly taking ideas from psychology and applying them to their models. Sociologists want to take quantitative approaches, so they're talking to statisticians and econometricians so that they can have sort of causally relevant results as they're putting together their research and their results. So these are not sort of very narrow and isolated fields. These are people in a very wide-ranging conversation with each other, all trying to address the major social issues of the day.
(04:23):
So say you are interested in pursuing answering questions like these, what does the career actually look like? What are your options? It's a highly entrepreneurial field. This is a great sort of area to get into if you are a smart person who doesn't like being told what to do. Take a faculty member, for instance. They are not coming into work every day working on their research because they have a boss who's telling them to, because they need to get a report filed by Friday. They're getting up in the morning defining their own questions and pursuing the things that matter to them because they matter to them. And that's sort of a strong entrepreneurial theme to the sort of work that we're talking about. So you can break it down the way that many entrepreneurial sort of careers are broken down. The strategy here would be coming up with your own research questions.
(05:10):
No one is going to tell you what to pursue. A hugely important part of being a researcher is the ability to come up with important questions and convince other people that those questions are important, just as answering those questions is. There is an operational component as well. If you go back 15, 20 years and look at the quantitative social scientists, it was a much more sort of isolated experience. You might have two or three other people you're working with, often your peers, sort of at the same level as you in your professional development, all working together on a project. The past 10 years in particular, that's changed in some pretty dramatic ways. We're actually going to talk about that more when we talk about the advent of lowercase pre-doc, but much more than it used to be, quantitative social scientists are working in teams, working in labs.
(05:59):
The questions that they're asking have gotten so ambitious. The data sets that they're working on have gotten so large, the compute they have has gotten so powerful. They need a certain number of hands on keyboards just to wrangle everything that's going on and wrap their arms around the questions they're trying to ask, which is very exciting, that we're able finally to answer and tackle those questions that decades ago just had to be theory. But it also makes for a different mode of working than we've seen before and an importance of being able to work with other people and work in teams. There is a product that you are producing as part of a research career. Often, in a purely academic setting, that's a paper that you're going to publish in an academic journal. But many people are also interested in reaching out to policymakers. So they may be writing op-eds, they may be writing white papers trying to get industries or policymakers to adopt the answers that they've come up with.
(06:53):
If they're doing very applied research, they may be producing data sets or tools that will be used by many people in education or again in policy. So there are different products that people are creating depending on the point of their career that they're in and what their interests are. Marketing is an important part of all this because once you have created your product, you need to convince other people that it's usable and relevant. And that's not just that you got the right answer. That's important. You do need to be able to prove that you've thought rigorously about how you're answering your question, but you also need to convince people, as I said before, that the question you asked is important, that you found something that's impactful, whether that's in policy or in social science more generally. The audience you choose is up to you. You may want to convince other academics, if you yourself are an academic, that the question that you've chosen is important and other people should be working on it and thinking about it the way that you are. And maybe you're trying to sort of cross the line out of academia into the real world and saying, I've found an important consequence of some policy.
(07:58):
I need people to pay attention to it. I think we need to make an important change that will improve people's lives. There are different pathways that that can take depending on the audience that you choose. Regardless of the pathway that you choose, the bottom line of this career is that you are trying to create ideas of enduring impact, and that impact may be with other researchers. It may be in the real world and affecting real people, but it's about this creation of ideas that have enduring impact. I hope I've been able to convince you about some of the personal benefits of pursuing a research career and why it might be rewarding. There are also more material benefits. I think one of those myths that I was talking about earlier is that if people sign up for a research career, particularly an academic research career, they need to sign on to some sort of vow of poverty.
(08:46):
That is not the case in these quantitative social science fields. I mean, are you making fund manager at Goldman money? Maybe not. I don't expect million dollar bonuses for faculty. But as you can see here on this figure from some of the faculty salaries that are common in business schools, people are hardly struggling. It is a comfortable life and you can expect to be making good money as a professor. And sort of think of that as a floor. If you're going into some of the many industry opportunities or even some of the policy opportunities for people doing research, I'm fairly confident they'll be able to beat academia.
(09:25):
So let's talk about some of the pathways that people take to these research careers, and in particular in this figure, if we go back in time, I'd concentrate on this longer line here, going straight from the bachelor's to the PhD. This used to be the most common pathway. People would finish a bachelor's degree. If they felt properly sort of prepared at that time, they were curious about getting into research, they wanted to take some next steps, maybe they'd done a few part-time RA-ships as an undergraduate, they would just apply straight to a PhD, and many people would get in that way and many people still do. If they felt they needed some more preparation, there weren't a whole lot of opportunities on how to do that. Often you'd go for a terminal master's degree, which can be a very good option, but tends to be very expensive.
(10:07):
People who were interested in very applied questions often would go into industry and then come back, and that pathway also still exists. So all of these are still very valid pathways. There are still people who go straight from bachelor's to PhD, still people who take the master's route, people who work in industry perhaps to discern what they really want to do, to make some money, to get access to some real-world data and figure out how things work out there, and then go back to their PhD. What's relatively new, past 10 years or so, is this tile here, the pre-doc tile, and that's what I want to make sure people are aware of and talk about a little bit more. A pre-doc, lowercase pre-doc, is a full-time research assistantship that typically lasts for two years and is post-baccalaureate. So once you have earned your bachelor's degree or your master's degree, you then go on and work as a full-time research assistant, often for a professor at a university, sometimes, say, for an economist, maybe at the Federal Reserve. It could be in a think tank.
(11:05):
So there are a few different options, but you're working for a PhD researcher and it's very much in this apprenticeship-style model where you're able to see the whole research process, contribute to it substantively, and that helps you both to prepare some skills and to discern whether this is in fact the thing that you want to do. So of all these different pathways, why might the pre-doc pathway be the one that you want to pursue? It's a chance for you to preview the research environment. Thinking about a research career is not a good way to figure out if you want to do it. This is not something where you can sit back and try and think about abstractions, or just talk to people, read about it, to figure out if you are well suited to a research career and if it's something you want to do. You really have to just dive in and do it.
(11:47):
And that's how you find out if you have the passion, the aptitude, and figure out what exactly it is that you might want to do as a researcher, and a pre-doc is a really good way to do that. If you do this work for two years, you're going to know one way or the other if it is something that you want to keep on doing, and you'll be able to see that work being done at a very high level from the inside, in this apprenticeship-style model. You're gaining particular discrete skills that are valuable for graduate school, especially coding skills. Many pre-docs are working on highly empirical projects and doing data cleaning and analysis, but those skills unsurprisingly also translate well to industry. So if you do figure out as part of doing the pre-doc that going on to a PhD isn't what you want to do, you still gain valuable skills, and crucially, this is all while getting paid.
(12:31):
Pre-docs are jobs. You are hired into them. That's sort of the contrast to the master's degree, the other pathway here to get some more experience. In a master's program, you are paying your school tuition to get that degree. In this case, your school is paying you to work for them. It is an important way to gain some credentials for the PhD application process. I said before, and it's true, it's not necessary to do a pre-doc in order to get into a PhD program. But for many top PhD programs, it is becoming more and more common to take the predoctoral route. I'd say more students than not who are matriculating to these programs have done a pre-doc. And in particular, it's a useful way to get some letters of reference from well-known professors for the PhD application process. A letter of reference really has an outsized importance in the PhD application process.
(13:23):
So being able to break into that network and have a letter writer who really knows the other people serving on other admissions committees at these PhD programs can be really important and useful. So as I said before, it's a particularly good idea to do a pre-doc if you don't yet know if you want to do a PhD. This is a great way to find out and discern. It's also good for people who maybe are confident that they want to do a PhD, but they know they need a little bit more preparation. Maybe they need a few more letters. This is a wonderful way to get those sorts of credentials.
(13:54):
So how is it that you prepare as undergraduates to apply to these sorts of positions? There's going to be variance depending on the roles that you're applying to and the faculty member or PI, principal investigator, that you may be working for, but there are some commonalities. So you'll need some level of coding ability. The big three languages are R, Stata, or Python. And what I want to emphasize there is the "or", because I often hear people ask me, because they'll see those three in job descriptions for instance, and they'll say, do I need to have mastered all three of these languages? And typically the answer is no. When I'm working with hiring managers, they want to find someone who has deep competence in one of these languages, because typically if you have that, you can pick up the syntax and everything you need from another language relatively easily.
(14:39):
That's true even if the person you're working with, say, is a Stata user and you have the most experience in R. So I would not advocate becoming a jack of all trades here. I would advocate becoming a master of one. You will need some statistics and math background: probability, econometrics, some skill in causal inference. You'll probably be doing a lot of data cleaning and some analysis, running regressions, in these sorts of jobs. So those should be things that you are comfortable with. In terms of the amount of math that you need, again, there's the potential for variance here, but one thing that I want to sort of tease apart is, I know that there are a lot of, I don't want to say rumors. It's true, especially for people applying to PhD programs in economics, some proof-based math is important. Real analysis in particular is a real strong signal-sending course, and I think the misconception out there is that people need a similar level of mathematical preparation to be applying to pre-docs, and that's not the case.
(15:37):
You do not need to have taken real analysis to be ready for most pre-docs that are out there. Nobody on your first day is going to hand you a page of partial differential equations and ask you to solve them. Indeed, I know many pre-docs who have gone the pre-doc route because they haven't had a chance to take real analysis. They didn't have enough time during their undergraduate to do so and they want to take it at a reduced tuition as a pre-doc. It's a very normal course to take. So don't count yourself out because you do not yet have that experience. Doing the pre-doc can be a way for you to get that experience. Hiring managers often want to see that you have some kind of research experience. If you have a conventional part-time research assistantship with a professor, that's great. That's sort of the gold standard. It's very easy to explain that experience and why it's relevant to applying to a pre-doc. But there are plenty of other pathways where you can demonstrate that you have research experience. It might be an internship in industry that's research intensive, especially if you're working for somebody with a PhD. It could be advanced coursework that you've taken that maybe had a paper that you had to write at the end that could be a writing sample you could use to show that you've done some research.
(16:52):
And then if you're able to do a degree thesis, like a BA thesis, that's a wonderful way to show that you have some research experience and that can be a really good way to demonstrate that. Another area that I want to talk about that's been getting more traction as the years have gone on is some level of professional skills, especially in replication. As I mentioned before, pre-docs more and more are working on teams, working in labs. So to the degree that you can show that you can effectively work in a team, that you can communicate in a team, that you can document and comment your work effectively, if you have experience using Git or other management software, those are all really important things, as more and more PIs are finding that they need to turn over their teams and say, oh, someone just left me a Dropbox folder without any commented code in it and I need to spend two months figuring out what this was.
(17:49):
So those sorts of skills are also increasingly valued in the application process. So where do you get all these skills? I imagine you can get many of them locally at your own institutions, but there are some other programs that I think you should be aware of. I'm not going to go through everything on this list. And if you see the link at the bottom, on the predoc.org website we keep a list of these sorts of programs that people may want to apply to, and we try and keep that up to date so that people can find opportunities outside of our own organization. But take a look at these. The Expanding Diversity in Economics program at the Becker Friedman Institute is a wonderful summer program that I think will start recruiting, open their application again, in the fall. And that's sort of if you're at the very early stage, just starting to think about a potential research career. If you're on the other end, later in your undergraduate experience, and you need a kind of accelerator to get you ready, the American Economic Association Summer Training Program is another excellent resource to get you up to speed and get you skilled up.
(18:50):
There is some strategy to the application process to pre-docs. There is some information here that's important as you're considering how and when to apply, and I want to make sure that that is all clear to you. So on predoc.org we have our opportunities page where you can find full-time opportunities. There are still a few that have a summer 2024, so sort of a now start date. But generally speaking, we are not in one of the application cycles right now. There are two main application cycles. There's one in the fall and there's one in the spring. For the one in the fall, I start to see stuff coming up around early to mid-September, decisions being made by November, maybe early December, to start the subsequent summer. So these are people who are applying in the final year of their degree program. And then there's a smaller cycle that runs in the spring that's starting around late February for most people, and people are really trying to wrap up by April 15th on that one.
(19:49):
The first one in the fall, I noted that it's larger. It also tends to have more established faculty in it. That's when you're going to find most positions for faculty, or faculty and economists, other positions that are already established and looking for work. In the spring, you often find, and that's very common, brand new faculty. So people who've just been hired, they're finishing their PhDs. They haven't actually started at the university yet, but they have been offered somebody to work with them and they need to do the hiring before they get started in the summer. So those tend to be younger researchers earlier in their careers. There's no harm in applying in both cycles if you need to. One of the strategies to consider: I know some people are interested in applying to PhD programs and then applying to pre-docs to hedge their bets on the PhD applications.
(20:39):
If you apply to pre-docs in the fall, that's not really going to help, because you're going to have to make decisions on any pre-doc offers before you've heard back from PhD programs. The PhD programs tend to send out their notices in the spring. If however you apply in the spring cycle, you could ostensibly apply to PhDs in the fall, apply to pre-docs in the spring, and have both pieces of information at the same time to help you make that decision. So that is an option as you're considering in which cycle to apply and how. As you are applying, one common aspect of the application process is a data task. That's going to look different from employer to employer. Sometimes you'll have one data task to complete that makes you eligible to work with many different faculty at an institution. Sometimes individual faculty will have a very bespoke task that they want you to complete that's particular to the research that they do.
(21:29):
We have a few practice tasks with suggested solutions up on our website. And of course there is the one that you had the chance to review for today that we'll be doing a walkthrough of in R and Stata later today. Depending on how your materials are reviewed and how you do in the data task, you may be invited to an interview. Hopefully the people interviewing you will give you some idea of what to prepare. But if they don't, if there was a data task, you should be prepared to discuss that data task. You should be prepared to discuss why you made certain decisions and assumptions. If it's been a few weeks since you submitted it (these processes can take a little time), go back, review what you submitted, be ready to talk about it. Be prepared to discuss your own work. If you had a writing sample, if you were talking about a bachelor's thesis, anticipate getting questions on those.
(22:18):
And if you have the opportunity, this is potentially a lot of work depending on how many interviews you have, but if you know the person who's going to interview you and you can take a look at some of their work, that can also be a real bonus to you in the interview process, particularly looking at their working papers. So often you'll see that their websites are divided into their published papers and their working papers. The published papers are done. There's no more work being done on those. The working papers are what you as a pre-doc would be more likely to contribute to. So having even some cursory knowledge of that project could be useful to you in the interview if you have the time to look at those things. If you are in the happy position of having multiple offers, a couple of things that I always suggest, talk to current pre-docs in that program.
(23:02):
They're going to have valuable feedback for you. And talk through logistics before you sign on the dotted line. If you need extra support, if you're moving or something like that, that's something you want to talk about before accepting the position, while you still have some leverage. I want to talk briefly about the application to the PhD program. This is not something we're going to go into a huge amount of detail on today, but I do want to talk a little bit about what the PhD actually looks like. So one misconception out there is that the PhD is five to six years of coursework and then you graduate. There are actually only about two years of coursework, divided differently depending on the field that you're in. But often there's some set of exams at the end of year one or year two that you need to complete to sort of advance to candidacy, as they say.
(23:49):
And once you've done that, you're working much more independently. You aren't taking classes anymore. You can if you really want to, but you're sort of done with that phase of your career. You are now acting as a junior researcher. You are working on your own questions, developing them, answering them, finding colleagues to work with as co-authors, collaborating with faculty, collaborating with other students, and then preparing that product that you're creating, these papers that you're writing, for the job market, academic or otherwise. So most of your experience as a PhD student is actually sort of working as a junior researcher more than sitting in a formal classroom and taking classes. In the PhD program, you should anticipate full tuition support and a living stipend. So this is substantively different from many undergraduate and master's programs where you may need to take out loans to pay tuition. In this case, you are being paid a stipend and you should not be paying tuition, except maybe some small fees once you reach the dissertation stage, the end stage where you're acting more independently, to establish your residence. But those are sort of pro forma fees and you have the stipend to offset those.
(24:56):
Switching over to uppercase PREDOC, the predoc.org consortium. There are a few programs I want to make sure that you're aware of. We have a number of educational initiatives. The main one is our summer course. It is a synchronous remote online course that we've been running the past several summers. It's an opportunity for people to be exposed to the breadth of social science research, develop some skills, particularly in Stata coding, have some professional development. And then there is a capstone conference, an in-person conference that we run in Washington DC, where people have the chance to present their projects. We just finished that for this year. We have a number of asynchronous courses that we run, some on preparing for the GRE, an exam that's often needed for PhD applications, some on getting the basics of coding in R and Stata. We're working on putting up a Python module as well.
(25:45):
And the thing you're in right now, the pre-workshop, this is one of our major initiatives. I mentioned our page on full-time research opportunities. We also have a page on part-time RA-ships if you're looking for something outside your own institution. We have regular office hours and advising. So we have PhD students, pre-docs, faculty come in and basically do an "ask me anything" for an hour on Zoom. So those are things you can very easily sign up for and address your questions to different audiences. And our website in general, we've done our best to put a large amount of information on there that we hope will be useful to you as you consider pursuing a research career. So if you haven't already, I recommend that you sign up for our newsletter, so you hear about all these opportunities and know when to sign up.
(26:30):
This QR code will take you to the signup sheet. And with that, I see that there has been activity in the Q&A, so let me open that and awkwardly try and read and talk at the same time over the next couple minutes before we get ready for our next panel. So there's a question here about the postdoc between PhD and assistant professor in the quantitative social sciences. I wouldn't say it's nearly as common as in the physical sciences that you get your PhD and then you work as something called a postdoc for a few years and then you go on to work as a professor. It does happen sometimes. Sometimes I've seen people who are doing so well on the academic job market that they have the option of doing a postdoc at some prestigious place and then they can wait a couple years before they start their faculty position. Some people want to do the postdoc because they need a couple years to finish up their projects before they go on to the job market. So they exist. But I wouldn't say that they are a common feature of the quantitative social sciences, at least not yet. We'll see how this sort of lab structure continues to develop. Continuing to skim here. Oh, this is a great question. So people who do a pre-doc and then decide that a PhD is not what they want to do, what is it that they tend to go on into?
(27:48):
We call these positions pre-docs for a reason. Many people do them with the intent of going on to a PhD, and most do in fact end up matriculating to PhD programs. But there are always some every year who figure out that this is not what they want to do. And that's one of the benefits of doing a pre-doc, figuring out in the second year of your job and not the fourth year of your PhD program that this actually wasn't what you wanted to do. For people who do go on to, say, industry positions, often they're going on to consulting, they're working in data-intensive fields. What they tell me when I follow up with them afterwards is they say, I have maybe a slight edge in terms of my technical skills, just in terms of the practice I've had the past couple of years. The thing that I really have that my colleagues don't have as much is the ability to be critical about the tasks that I'm given.
(28:37):
The other people are often just sort of regression-running machines. They're given a ticket, they fulfill the ticket. I, as someone who has some knowledge of the research process, of setting up experiments, things like that, I can actually see the question behind the question and I can do what the ticket tells me to do, but then I can be critical about my results and sort of take several next steps. And that's a real advantage to me in these sorts of careers. So the pre-doc experience is not built to prepare people for industry roles per se, but often the experience that people get ends up being valuable to them. I'm going to answer, I think, one more question here, and the rest I will type out answers to, just because I want to give a couple minutes pause before our next panel. And I'm seeing here a question.
(29:23):
Are pre-docs open to international students as well, or do they require citizenship status? This is something that you should check with individual employers. So generally speaking, most employers are able to support visas of some kind, but that will vary from institution to institution. Maybe it's only OPT and the STEM extension. Maybe they can do OPT but not the STEM extension, so then they flip to the J-1. Maybe people do the H-1B or don't do the H-1B. So these are things I'd encourage you to check, particularly if you're interested in working at the Federal Reserve. There are some Federal Reserve banks that are able to sponsor visas. There are some Federal Reserve banks that have in the past and don't anymore. I know there's one that's having conversations about whether they might bring that back. So it changes from year to year, and I would say it's important to check. So I know that there are a good number of other questions in here. I will start to type in answers and make those available to folks afterwards. Let's take a 90-second pause for anybody who needs to stand up, stretch their legs, get some water or something, and then we can get our next panel started.
How to Apply to a Pre-Doc
Rupsha:
Okay, great. As I mentioned, Pascal will be on in around 15 minutes, but we're going to get started without him. Hi everyone. Welcome to the pre-workshop. My name is Rupsha Debnath and I'm one of the coordinators for the workshop this year. I graduated in economics from UC Berkeley in 2022, and I was a pre-doc for Peter Ganong and Pascal Noel, who will be on this call, at Chicago Booth for the last two years. Starting this fall, I'm going to start my PhD in economics at Harvard. Thank you for joining us today at this workshop and for this first panel. This panel is focused specifically on how to apply to and prepare for a pre-doc position. All of the panelists here today are well versed in how the application goes, either on the administering or the hiring side, like John and Pascal, or on the applicant side, like Theo and Austin, and I'm sure it'll be a very insightful conversation. So I'll ask a couple of questions to get us started, and then we'll turn to the audience through the Q&A chat box, which you can put your questions into, and field a couple questions from there. To begin, I might've given a little bit of a spoiler, but what are all of your names and your current titles? If you could introduce yourselves: where do you work, what program are you affiliated with, and what are your interests? John, can we start with you?
John:
Yeah, I'd be happy to start. My name is John Johnson and I lead research support. I'm the senior director for research support at Northwestern University's Kellogg School of Management. That's our business school. So there are a lot of research-enabling activities that go into that, but one of our key activities here is to help our faculty recruit a cohort of pre-docs every year, and we also develop a lot of programming to help those pre-docs, I guess, improve their productivity and things like that while they're here.
Rupsha:
Theo?
Theo:
Hi everyone. My name is Theo and I am a pre-doc at Booth working for professors Pascal Noel and Peter Ganong. So this is how I know Rupsha. At Booth we're called RPs, or research professionals, but the role is the same as what a standard pre-doc would be at a lot of institutions. I majored in environmental studies and political science, and so I came into economics and pre-docs in a way that was maybe a little bit different than a standard approach where you take a math and economics undergrad experience. And so I'm happy to answer questions about what that was like, if anyone on this call is not an econ major or not a mathematics major.
Rupsha:
Austin.
Austin:
Hi everyone. I'm Austin. I did a bachelor's and master's at the University of Toronto in economics and then a two-year pre-doc at Opportunity Insights. Now I'm going to the PhD program at Harvard. Great.
Rupsha:
Pascal, who's just joined, you just missed one question, which was to introduce yourselves, so great timing. If you could introduce yourself, what program you're affiliated with, and what your interests are.
Pascal:
Great. So I'm Pascal Noel and I'm a finance professor at the University of Chicago's Booth School of Business. I think that's it. And have had a group of awesome pre-docs who have worked for me, including Rupsha and Theo.
Rupsha:
And as for the panel today, I'm arranging all the questions in the way one would approach the application process. So say I've decided to apply to a pre-doc position, what's next? Where could students start looking for available pre-doc postings, and when should they start applying to these jobs? And very specifically for Theo and Austin, if you could explain how that process went for you. Theo, do you want to go ahead?
Theo:
Sure. So again, I mentioned I came at this from a little bit of a different background than maybe others had come from. So at the end of my senior year, I'd already decided on a job that I was doing for my year right after undergrad, which was not a pre-doc, it was actually an education role. But near the end of that time, I decided I wanted to do a pre-doc and wanted to think about going to economics graduate school, and had spoken to some people, and I began the process in earnest, I would say, in the fall following my graduation. But if you are in undergrad, in the fall of your senior year, I'd start by just compiling a very large list of programs that you would be interested in. Now, there's a lot of places that you can look for that. One is predoc.org, but also there's a huge set of resources on Twitter and econ_ra. There's also a bunch of faculty resources, so if you're at an institution that has faculty, they often know about them. So tapping into your network of people is a great way to find opportunities that people in your life who you trust know might be good for you.
So I created an incredibly long list. I think my rule was, if I could see myself doing this for two years, I would add that to this long list. You definitely don't want to apply to programs or things that you're not interested in. So there were some internships in industry that I was not super interested in, and that type of thing, or specific research areas I wasn't interested in. And then it was just a process in the fall of creating this incredibly large spreadsheet of existing pre-docs and then all the pieces that are required. So there's a lot of overlap between what's required from pre-doc to pre-doc, but making sure that you have all of those pieces set up is I think really 60% of the game. My understanding is that there's another cycle in the spring, but I would say the primary timeline for me was getting started in September and starting to see postings, and then applying in October and November and then getting data tasks and whatever as those go on. And I'm sure we can talk about that a little bit more.
Austin:
Thanks Theo, can I continue?
Rupsha:
Yeah, go ahead.
Austin:
Okay. So my approach was different from Theo's. I think I prefer Theo's approach ex post. What I did was that I already knew I was interested in a PhD, and so I applied to every single pre-doc position I came across. That, I think, ex post was not a good idea because, first of all, in my experience the people who are evaluating the applications kind of look for fit. So for instance, in my undergraduate education, I kind of focused on courses in empirical micro. So when I applied to macro positions, I didn't hear from them at all. The other thing is that the data task can take quite a bit of your time. And in my experience, if the position is already one week old or more, I basically have a 0% chance of hearing back. That's for positions posted by individual professors. If the posting is by an institution, for instance by the Chicago program or the Stanford program, then they tend to have a clear deadline and it's not so important that you apply as quickly as possible.
Rupsha:
Great. For Pascal and John, how many programs do you see your pre-docs, or the pre-docs you've seen in your programs, apply to? And as Austin said a little bit about research fit, how does that play into the kinds of positions people are applying to?
Austin:
You want to go, John?
John:
Yeah,
Pascal:
I'll go. Yeah, that's great. So I would defer to Theo and Austin about questions around how many programs to apply to or people typically apply to because I don't have a great sense of that, but I think it is quite a lot. And in terms of research fit, I would say, so I help run the applied micro search out of Booth. And so even though I'm a professor in the finance group, a lot of my research is across disciplines and when we search, we're searching together with people in the micro group and the macro group at Booth. And so really looking for people that have a broad range of interest or we're hiring for professors that have a broad range of interests. And then in terms of the particular fit of the candidates with the positions, there's basically two levels. The first level is we just want to know people's background and in general their interest and ability to do a pre-doctoral position.
And once you get past that, then there's a little bit of allocation of if somebody is more interested in household finance and they're more likely maybe to end up with me. And if somebody is more interested in industrial organization, they're probably more likely to end up with another professor in our search. And so I think that it is important to at least have some sense that you're going to gain some tools from the particular pre-doc position. But in general, I don't think that it is necessary that the pre-doc wants to exactly do the particular topic that the researcher is working on, number one, because the researcher's topics are going to move around over the year or two of the position. And also I think the biggest thing is to gain some skills and those skills are going to be almost always broadly applicable regardless of just the particular research question that was being tackled in the position.
John:
Yeah, I think I would echo a lot of what Pascal has said there. With our group at Northwestern, we recognize that it is kind of hard out there for graduates and people pursuing pre-docs and PhD programs, and we've intentionally adopted this strategy of trying to make it as easy as possible for people to apply. And in part it is a recognition, like Pascal says, that with some exceptions, for most of the positions that we're filling we're looking for pretty generalizable skills. There may be some situations where someone really insists on a certain type of statistical model, or on being able to program in a particular Python framework, or something like that. But in general, most of what we are looking for on behalf of people are things like intellectual curiosity, evidence of resilience, attention to detail. So we will offer a data exercise that comes sort of partway through, after we've done some interviewing, and we try to do exercises where we'll see, okay, so how careful are you when you're confronted with data that you've never seen before?
Do you not just do what we ask you to do, but build some tests in and things of that sort? Because a lot of times I think the people that we hire end up being surprised. They thought they wanted to go into applied micro and then they discovered that there was a position in accounting that turned out to be rather interesting for them and a good growth opportunity for them. I think the other thing I'll just throw out there is, I know you applicants are quite diverse from each other, and there's a significant number of you, I'm sure, who are interested in a field like economics but may not have nailed down your area of specialty yet. And you want to use this time to explore, and maybe you work in one area, but you go and attend seminars in another. I think those are just wonderful things to do with this time that will ultimately help you when you do have to make a decision about where you're going to specialize.
Rupsha:
Great. So that was sort of step zero of the application process, figuring out what to apply to. Let's move on to preparation prior to the application. And I'm seeing a couple of questions in the q and a about this already. Let's start with Pascal.
Speaker 6:
This is the advisory council for our parents to ensure safe on campus. Please use the QR code on the back.
Rupsha:
Sorry, not sure what that was. For Pascal and John, what courses do you look for on pre-doc applications? Or, rephrased another way, on successful applications, what do you see as far as math courses, econ courses? Yeah,
Pascal:
I can start again. The first thing we look at is overall GPA, because people will come into pre-doc positions with different types of experience, and somebody who's been able to be successful, even at an unrelated class, like say it's German or something like that, shows some amount of grit, some amount of ability to follow directions and work independently. And so that's really important. And then it's always nice to see the largest amount of math and econ classes, but I would say those are not necessary. We've had fantastic pre-docs who actually didn't have very much math and econ, but had wanted to pursue this more and took some math and/or econ classes while they were pre-docs, and were doing this as a way to learn about whether this was something they wanted to do. And those candidates invariably will have had strong records in other areas of academics.
I think the first point is that if you're late in your academic career and you decide you want to do this, then for us, and I think maybe for others, it's not that there's some checkboxes where you have to have had X, Y, Z classes. But if you're early on and you want to do your most to make yourself as attractive as possible, or want to figure out what kinds of classes you would need to take if you wanted to go to grad school as a way of figuring out whether this is even a good path, then that means going through intermediate micro and macro, and taking, if possible, the more math-intensive versions of those classes, as well as econometrics. And then on math, probably the two most important ones above multivariable calculus would be linear algebra and real analysis.
John:
For our part, we look at your transcript and we look at it carefully, but there are no binary decisions about we're looking for certain things or not. And I think part of that is, as Pascal says, some people know from their first year of undergraduate that they're passionate about economics. And some people like myself, I discovered that late in my third year of undergraduate, and so my transcript would look a lot different than someone who made that decision early on. I think that the thing that we are, the attributes that I mentioned earlier about genuine intellectual curiosity and resilience and the general good work ethic, we look at the transcript through that lens. And so if on your personal statement, which we ask for and we put a lot of weight on, probably more so than the transcript, if your personal statement says one thing and your transcript says something else that raises questions.
There are certainly plenty of successful candidates that we have where I say, okay, this person looks pretty good, but they're going to need more advanced math before they apply to their doctoral program. And often that's as much about us preparing to advise that person, if they end up working with us, as it is about selecting, like we'd rather have this person who's already had the math than this person who hasn't. We're thinking about, okay, if this person comes here, then they're definitely going to need to take real analysis or linear algebra or any number of things like that.
Rupsha:
And same question, but from the other side. For Theo and Austin, what courses did you find to be the most helpful during your RA experience? Were there courses you took beforehand that were super helpful? Not the courses you were taking while you were an RA, this is pre-RA application. What courses were most helpful during your RAship?
Theo:
I can take this one first. The two things I'll say, which I think are maybe surprising answers, are first that RAships have a lot of writing in them, even more so than I think they have math in them. And so communicating well, and taking classes that allow you to communicate well, are important and have been crucial to, I think, my ability to work as a pre-doc. And so I don't think it's worth discounting them. If you're a person who loves math, that's great, but there should be also some ability to communicate, in writing particularly, but also verbally. So that's the first surprising answer. The second thing is, I am in a pre-doc and I had a year between undergrad and this pre-doc, and I think, if you have taken fewer math classes, you can use that as an opportunity to learn more.
The point being, your education doesn't have to end when you get your bachelor's degree. If you're going to do a PhD, it certainly won't, but you can continue to do work that will prepare you for your pre-doc after you've graduated. So I did a fair bit of work learning how to be a better programmer and also learning more econometrics in that year between, and I mean, I don't know if that was helpful in terms of my application, but I do know that that was helpful in terms of me doing the job. And so we're all in different situations, but I think beyond just the math that you can take in undergrad, those are the two things I'd say that maybe other people might not tell you.
Austin:
Thanks. For me, yeah, echoing Theo, I'm not sure if particular courses were important for my application, but in terms of doing the job, I found courses on causal inference really helpful. So in particular, if your university offers a course that involves the textbook Mostly Harmless Econometrics, or something at that level, that would be really helpful to take. Other courses that I found really useful were courses that required me to write a research paper from start to finish. So this means collecting my own data, doing the analysis myself, writing the results myself. It can be a term paper, it can be a thesis, but basically that process helps you really understand the complications of research, and I think it helped me to add value in my job.
Rupsha:
Great. So Austin mentioned this a little bit, but how does independent research or even previous RAships, if it does, make a difference in the application? From Theo and Austin's perspective, does it make you better at your job or make your application stronger? And from Pascal and John's perspective, are you looking for a lot of independent research or RAships under faculty in their undergrad institutions, or does this not matter as much? John, do you want to go ahead?
John:
Sure. Yeah, I guess I don't mind saying, and this may be a bit of a provocative statement, I don't think we care so much about your RA experiences, and we certainly see that among applicants broadly there are a lot of them. I mean, I think maybe our median applicant had something like three different RA jobs at some point over undergraduate or perhaps a master's degree that they've done. And it's so hard to evaluate across the different things. I think what we focus on instead is, what do you bring to the table now? And that could be a skill that you developed and honed as a research assistant, or it might be something that you got through some other means. Maybe you were fortunate enough to have courses with really wonderful instructors who taught you some good programming habits or things like that. But we're often thinking about what this person is prepared to do on day one.
And I think, as Theo mentioned, communication is an often under-appreciated attribute. You obviously need to be able to express yourself very clearly in writing, but I think one of the things that we're looking at throughout the interview process is, we're trying to simulate what verbal communication looks like. How well do you understand what we're really asking? And in particular, a thing that I think people maybe don't think about so much: when you go into an interview, often people are thinking, I need to display to you all of the things that I know, that's what we're after. And in fact, often what we are after is, we're trying to engineer situations where we might challenge something that you think, or we may offer feedback on the data exercise that you submitted, and we want to see how you assimilate that feedback. Do you take it well? Are you defensive about that? Are you able to understand what we're saying and translate that into, oh, well, if you do it that way, then here are the implications of that? So I would encourage all of you to think about how, whether that's through a research assistantship or any other experience, you can find experiences that will allow you that kind of rich practice with verbal communication.
Pascal:
I totally agree with what John said. We also don't look much, if at all, at prior research experience particularly, but we care about whatever experience you have and what you've learned from it. And I would say, in terms of advice, that the reason probably to do an undergrad RAship is to learn about whether this is something that you would enjoy doing and want to spend a year or two of your life doing, and then possibly more if you go on to a PhD, rather than as preparation for a pre-doc job.
Theo:
So you mentioned two things, independent research and RAships, and I think independent research is definitely underrated and RAships are overrated, in that your job if you are an academic is to do independent research, I mean with collaborators, but to do research. And I've done some independent research this past year and I think it's been hugely helpful in terms of thinking about whether this is something I want to do, and also hugely helpful in thinking about how research is done. And so if you have the opportunity to do it, I would recommend doing it, and I wish I had done it more in undergrad. And I think also, if you are thinking about being interested in economics, focusing on an economics topic is a good way to do it. I didn't really do a lot of standard RAships, and I think the experiences that I had, some of which were working on a political campaign, were really helpful to this job, just in terms of existing in an office space and then also a lot of the communication skills and, yeah, general office skills.
I think I learned in those jobs almost as well as I would've in an RAship. And my other sense is that my friends who did RAships had really different undergrad RAships than what a pre-doc RAship is, just because typically, and I went to a smaller liberal arts school, so the nature of research was different, they're working for just a summer, and there's really only so much that can be picked up over the course of the summer in terms of onboarding onto complicated projects and really making an impact. And so to go back to what I said, I would really recommend independent research if you have time and if you're thinking about doing a pre-doc, and I would say broaden your horizons in terms of what you want to do with your summers. You can do an undergrad RAship, if that's exciting. But also don't think that that's the key to success or the only way to learn about what it's like to be.
Austin:
Yeah, I agree with everything that's been said here. I think both independent research and RAships are very helpful, but not essential. Let me just say what I think both my experiences in these domains brought to the table. I think for RAships, if you're at a point in your undergraduate career where you can still do an RAship, it's useful. So for instance, if you are a senior, that's maybe a bit late if you're thinking about applying to pre-docs at this point, but suppose that you're earlier on in your undergrad career, I would try to find RAship positions where you can learn something and add value. For instance, a position where you're basically doing data entry is perhaps not a great use of your time, but a position where you're actually engaging with, for instance, the research design, data processing, and analysis design, I think that's very helpful and mirrors what you'll be doing in a pre-doc. I'll also encourage you to do independent research. As Theo has mentioned, this is really useful in trying to understand whether you enjoy doing research at all. If at the end of your undergrad thesis or some paper you're like, okay, I'm glad that it's done and I didn't enjoy that, then perhaps this is not the path for you.
Rupsha:
Great. So that was sort of the preparation side of the application process. I'll look for Q&A questions if they're more on the preparation side, but let's move on to the next part, that is, the materials for the application itself. From most applications I see, it's a resume, a cover letter or a personal statement, as John mentioned, either letters of rec (for some Fed jobs, they use letters of rec) or, as I see for a lot of them, two or three references, a coding or a writing sample, and a transcript. So those are mainly the components of an application, but jump in if I missed something. So let's handle each of them separately. Cover letter or personal statement: John mentioned that it was a pretty important part of the Northwestern pre-doc application. So what are some things to highlight in it? How much time should you spend on it and on personalizing it for people? Let's start with John, but everyone can give their thoughts on how they did it or how they usually see it.
John:
Yeah, absolutely. I think we always stress to potential applicants that if there's one thing that you're going to spend time on in applying to our program, for sure take the time to in some way personalize your cover letter to the position, to whatever it is that you know about the position that you're applying to. In general, what we are looking for is hopefully no more than two pages, a fairly brief, succinct way to articulate this. We want to know, why do you want to do a pre-doc? What is really driving you? What do you think your interests are? What do you think you are going to get out of this experience of coming here and working with us?
And that's about it. So the customization could be pretty simple. It might be, have you spent a few minutes thinking about, in our case, what it would be like to be at Northwestern University? Are there particular people that you hope to network with or get to know? Are there people here who do research that you have found inspiring in some way? And I also encourage you, please forgive the language, but please don't attempt to bullshit anybody by name-dropping things that you haven't thought about, because when the interview question comes up later on and they ask you about that person, that could be a real black mark against you. But yeah, I would say spend, I don't know, five or 10 minutes thinking about how you would craft the statement for the university that you're applying to.
Pascal:
I can piggyback on that. Yeah, I again agree with basically everything John said. The only things I would add are, number one, probably the thing that we get the most out of the personal statement is just writing ability and communication, which is something that Theo also emphasized before. And so almost more than the substance, it's the ability to communicate ideas, and so personal statements that have typos or have sentences that aren't complete or things like that, or are rambling, are themselves red flags. But it also is an opportunity, which is, if you work hard on your personal statement and you get the writing right, that's going to be a strong indication that you're able to communicate, and that's going to be a very helpful signal in the interview process. And then the second thing is just to very much emphasize what John said, which is that it's helpful to have a story for why you're applying to this particular position.
I think this is the opposite of the advice I would give for a PhD application, where I would say it's either neutral or negative to try to tailor your application to the particular school you're applying to. And the reason is that when you're applying to a PhD, you have no idea exactly what you're going to be working on or what you're going to be studying, and you might end up in any of the different fields that the institution is offering. For example, if you're doing an economics PhD, you could do an environmental paper or you could do a labor paper, you could do a macro paper, you could do a finance paper. And so specifying that upfront is not helpful, whereas for a pre-doc, almost always you're applying for one or two or three faculty members or a group that's doing a particular kind of thing. And so then getting some sense that you're at least interested in that thing, or that you've read the website so that you know this is the thing that they're hiring for, is helpful.
Theo:
Well, I think Pascal and John will be more helpful on this question, so I'll just add two points. The first is, I always view my cover letter, for econ jobs and non-econ jobs, as a chance to explain the rest of your application and put it all in context. And so that's a frame that you can put yourself in. You have a lot of pieces and they don't talk to each other. And your cover letter or your personal statement is your chance to say, this is how my transcript fits with my writing sample. And it doesn't have to be this perfect, you've-consolidated-your-entire-life-around-one-goal thing, but a chance to put everything in conversation is, I think, key. And then the second thing: I got advice a while ago on cover letters, which is that you can have too many rounds of proofreading, where you over-sanitize the thing and it becomes very boring and not your voice. You lose your voice if you overwrite it. And so I would say have some rounds of proofreading. You definitely want to make sure there's no typos, but retaining some of your voice and some of you in the cover letter, I always think, is helpful. And so both of those things together I think are what I can add to this conversation.
Austin:
I don't have much to add to this. I used my cover letter as an opportunity to elaborate on what exactly I did during my RA positions and my independent research because, as I mentioned, there's a wide diversity of RA positions and some of these positions are perhaps a bit far from what you might be doing in a pre-doc or a PhD program.
Rupsha:
Great. Okay. So the next part is a writing or a coding sample. I think most jobs require it, but correct me if I'm wrong, let's go back in the order that we did the last question in. So let's start with Austin. What did you submit for your coding or writing sample? Was it a thesis or was it from a class or a project and yeah, how do you approach what you choose?
Austin:
Yeah, so for my writing sample, I submitted my undergrad thesis. By the time I applied, I had already completed it, so that was helpful for me. I think a lot of people are in the position of still writing their thesis at the point of application. And so it might be useful if you have a term paper, or if you have some sort of sketch of the thesis, that's also fine. Basically what they're looking for is your ability to analyze, to think through the design and the challenges that you're facing empirically. And it doesn't need to be perfect, but basically that would be useful. For coding, well, basically every position I applied to had a data task, so there was no one coding sample that I was able to submit everywhere. And so the data tasks took quite a bit of my time, and I would recommend that people set aside some time. Basically just make sure that on the academic side you've done your assignments and so on. You don't want to be rushing both at the same time.
Our opportunity insights, basically what the pre-doc do is that we received a lot of applications and the pre-doc help to vet the first few rounds. And what we saw for the coding samples is that quite a lot of people seem to be submitting samples that are consistent with rough work, bunch of areas that you should be catching that should be obvious if you were able to invest the time in that data task. And I would recommend that instead of applying to every job and doing each data task mediocre, I would focus on the drop that you're really interested in. Do those well so that you have a good shot at it. Theo, do you want to go next?
Theo:
Yeah, so I think we might talk again about the data task, which is like an assignment. Basically you're asked to do a little bit of analysis. This is advice I've gotten from faculty as well. There's often a note on these, "we recommend you take three hours to do this," and I basically don't think anyone spends just three hours on them, and I don't think it's a good idea to. Now maybe John and Pascal will say something different, but I think if this is a job you're interested in, you should take the time to do the data task well and ignore the recommendation. I mean, there's obviously a limit, and some of them say, "we need this back in three days," which is a commitment device, but you should take the time you need to get it done and not just stop yourself at three hours.
So that's on the data task. We might talk about that later. For my writing sample, I submitted a term paper I wrote for an international political economy course that I thought was well written, and I made that choice over papers that had a little bit more data but that I thought were worse written. And so, I don't know whether incorrectly or correctly, I believed that a better-written paper was a stronger foot forward than a paper that was a little bit closer to economics but not as well written. And then for my coding sample, I hadn't done a lot of coding. I submitted some work I did for an ecology class, so it was about pea plants. So for the few jobs that required a coding sample, I submitted data analysis I'd done about growth rates of pea plants. My understanding is they want to see your coding ability. They don't care if it's specifically focused on economics, but again, I'm not the best person to ask about that final part.
Rupsha:
Pascal, do you want to go next?
Pascal:
Sure. So I agree with a lot of what Theo said, in the sense that for the writing sample, the quality of the writing is more important than the particular topic. And it also ties back to something that Theo said before about the cover letter, which is that if your writing sample is about pea plants, then it's useful in your cover letter to explain why you might have a writing sample about pea plants but you're interested in an economics pre-doc or whatever pre-doc you're applying to. And so you can use your cover letter to good effect in explaining that. And then the readers will mostly focus on the writing sample from that perspective of, I'm just learning about this person's writing skills. If you have more research-focused writing samples tied to economics, that's helpful in the sense that oftentimes in an interview we might talk about a research product that you've done.
And if you have a writing sample that's about independent research, then it's an easy way to learn about your interests and your ability to think about a research problem. And so this is an easy way to communicate that, but not the only way to communicate that. So if you have a writing sample that's not related to research, but you're really interested in some economics research topic and you've read a couple of recent academic papers, then we could talk about that, and you could try to emphasize that in your cover letter or bring it up in the interview. So again, I don't think the particular topic is that important, but giving some ropes that you could pull on, or that the interviewer or the reviewer could pull on, that show some interest in economics research is very useful and can be accomplished in a variety of ways. And then on the coding sample, I would say what Austin said on the coding task is important: do the coding task well. And then for a non-task-related coding sample, I wouldn't care that much about the topic, just about the quality of the code.
John:
Our position on this is probably in the minority. We do not ask for coding samples or writing samples, nor will we accept one if, as sometimes happens, people try to give them to us. And I think part of that's just the arithmetic: we have a lot of applicants and we're trying to spend our time on activities that we think will reveal a lot of information about them. And we just happen to not think that reading one of your papers is a particularly revealing use of our time. We do put a lot of emphasis on how you will do with the data exercise that we give you, and in part I think we try to apply that same principle for you. I mean, Theo is absolutely right. If you're going to do a data exercise, take the time to do it well. We try to do you a favor by asking questions that we don't think should take a person more than an hour or so to do well.
So if you wanted to really overdo it, you'd hopefully not spend more than two hours of your life. We recognize that you're applying to not just us, but you're applying to Stanford and Booth and Berkeley and all these other places too, and you can't spend your whole life doing this stuff. But within that, I think what we're looking for, and what we would look for if we did want to look at a coding sample from you, is really evidence of critical thinking. We also notice if you seem to have good coding habits. Again, that's not for any kind of selection mechanism, like, oh, we want the people who code well and we don't want the people who don't. We're thinking more, again, about how we would advise you when you get here, what kind of help you will need in order to really thrive in this situation. Some people just don't have the good fortune to have been taught good habits by whoever taught them how to do stuff with Python or Stata. Some people just sort of got lousy habits that are passed down from one RA to the next over the years. And so we don't really judge you on that. And then we try to figure out what sorts of things you will need to learn in order to do the work well once you're here, if that makes sense.
Rupsha:
Great. And the last thing in terms of preparation is the recommendation letter or the references that you include in your application. What is the most important thing in figuring out who to ask for to be your reference or to write you a letter? And what do you think is the best process to seek these recommendations and what kind of recommendation letters help the most? So let's again go back in the order we did the last question in John, if you'd like to start.
John:
Oh yeah, happy to. I think without question, my advice to you would be, if you're seeking a recommendation letter or a reference from somebody, pick somebody who knows you well. That is the most important thing. Someone who has observed you, hopefully in some revealing situations, where they've actually observed your work ethic or your ability to think creatively or to solve a problem or to work collaboratively. If you have people like that, that's who I would say you should focus all of your energy on. In our program, again, we don't ask you to submit letters of recommendation, in part because, as with the other things, if I get three letters from every one of our applicants, it's very difficult for me to get a good amount of information from those. So we tend to go for references near the end of the process, when you're already on a short list and we might want to distinguish you from a couple of other people. But the things we're looking for are someone who knows you a bit and can speak to the things that are going to make you either likely to be successful as a scholar or successful in a situation where I have to trust you with my research product.
Rupsha:
Pascal, do you want to go next?
Pascal:
I agree with John, somebody who knows you well rather than somebody who is famous or whatever. You choose somebody who knows you well, and that's very, very helpful.
Rupsha:
Austin, Theo, you want to,
Theo:
I'll just answer the part about the process of going about asking. If you decide to apply to PhD programs, those require letters of recommendation. And so the first thing I'll say is that I guess it's easier for someone to write a letter of recommendation if they've already done it once. And so use this as an opportunity, if you're getting people to write letters of recommendation, to start that process. I'm applying to PhD programs, and two out of the three people who wrote me letters of recommendation for this job are going to be letter writers for the PhD programs. So that's just one thing to think about: it's not only going to help you now but can help you later. And the second thing is, I always approached it by trying to find a time to meet one-on-one with the faculty member. I went to a smaller school and this was possible, but approaching them in office hours or at some point where you can meet them and ask them face-to-face, I think, is helpful. And then the third thing is, I think, if you can, provide them with some materials. Different faculty members or professors have different opinions about this, but anything ranging from a few sentences about how you are conceptually framing yourself within the application to even just your personal statement can really help them back up your personal statement rather than having them kind of guess what you see yourself as. You don't want your recommendation to say one thing and your personal statement to say something else. And so getting everyone on the same page is important.
Austin:
Thank you. Yeah, I agree with everything that has been said. I think one useful way to get faculty members to know you well, and also to have a recommendation that is pertinent to this job, is perhaps to do a research assistantship or independent research where you have an advisor. Those would, I think, probably be really informative. I haven't read any letters myself, so I have no idea how it looks from their side, but those were the letters that I solicited for the application process.
Rupsha:
Great. Okay, so that was the application. Let's move on to when, hopefully, you've gotten a callback from a university or a faculty member after applying and submitting these letters and applications and so forth. The post-application process is usually a data task, which we've heard extensively about, and there'll be another part of today where we'll walk through a data task. So we'll talk more about the data task then. Let's talk about the interview. How should applicants prepare for an interview? What are the kinds of questions that are asked of you during the interview? I know they vary depending on who's doing the interview and the kind of job you applied for, but what should you brush up on (tech skills, your resume), and what kind of qualities do you want to shine a light on? Let's start with the person who did my interview when I was a pre-doc. Pascal, I remember my interview. So Pascal. Yeah,
Pascal:
Great. So I think usually you'll have some version of "why are you interested in this position?" and "tell us about your background." And so you should be prepared to give an explanation for why you want to work as a pre-doc in general, and ideally why this particular position is a reasonable one for you. We understand that you're applying to lots of positions, so you don't need to say this is the only position that you care about, but why would this be something that you'd actually be interested in doing? And then, sort of to Theo's point about the cover letter, help explain to us the story, the trajectory of how you got here and how your preparation fits in. Then oftentimes, for positions that ask for a writing sample or a research sample, there might be questions and discussion about that, and you should be prepared to talk about anything that you submit.
So you should be prepared to talk about anything in your cover letter, anything on your resume, anything on your transcript, and anything in any writing sample that you provide. And then oftentimes there might be questions, especially if you go further on in the interview process. Sometimes we will do, and I know others do second round or third round interviews, and you might have more specific questions or questions that go deeper to try to test economic intuition or say, okay, if you were presented with a data set that had this as the rows and this as the columns, how would you analyze it? So thinking a little bit on your feet, which is hard to prepare for, but just know that that kind of thing might be coming.
Rupsha:
John, do you want to go ahead?
John:
Yeah, I'll go. Pascal's absolutely right. Every single time, I think, you're going to be asked to tell in a clear way what you're interested in and how you got here. The materials will be different from job to job, but anything that you submit, whether it's a data exercise or whatever, is fair game to be talked about. And I think that's especially true nowadays, maybe not so much a few years ago. I think now there's a lot more thinking, if you submit a writing sample or a coding exercise, about how much of this you authored versus a generative AI tool, or whether your roommate may have helped you, or things like that. I'll offer a couple of other things that I think you should be prepared for. I know we will ask a lot about these. So one is fairly simple.
One is specifically, what do you hope to achieve with this pre-doc position? And we're really thinking about, have you really imagined coming here and what you would do with your time here? Have you thought through what would be the best use of your extracurricular time? A lot of people will say they want to take classes and so forth, but I'm always intrigued when people have spent some time thinking about how they're going to get better at being a researcher as opposed to getting more experience at being a student. The second thing I'll say, and take this with a grain of salt, because other places may not be looking for this so much, is that we are often looking for self-awareness. I think a lot of applicants come into an interview thinking that every question is like a quiz and you're supposed to demonstrate how excellently you can answer these questions and how much you already know and how good you are.
And I don't want to discount that entirely. There's certainly an art to selling yourself in any kind of interview, but we'll try to ask some questions where the answer is going to be you revealing what you don't know and what you hope to learn. We're also very much looking for people with a growth mindset. We want people who are capable and who are going to have a really awesome trajectory after they get here. And so I would just advise a lot of you: don't think about trying to put up the most perfect version of you. Put up the most honest version of you, and let us see that you're also still interested in growing and learning and getting better at things that you don't think you're that good at yet. So
Rupsha:
Great. We're actually at time, but for Theo and Austin, any last comments even maybe it's the interview or any last comments about the application itself? Yeah, let's start with Austin.
Austin:
Basically what I wanted to say has been covered. Essentially my interview was very focused on my writing sample, and in particular some of the choices that I'd made and some of the weaknesses in the writing sample. So you should be prepared to recognize the weaknesses in your own research. Of course it's not going to be perfect, so try to explain how you've tried to tackle the issues that you faced. Yeah,
Theo:
A few small things. I'd say know your own materials well, but also, if you've gotten sent any materials, make sure you have read through the paper they sent you and really know what's going on in that paper. If there's a video online where someone gives a talk about that paper, you can watch that talk. Make sure you have all the first-impression things down. So dress well for your interview, make sure your Zoom connection works for your interview, all of those things. It's even worth sometimes, especially before you're doing all of these interviews, doing one or two mock interviews where you have a friend or a parent ask you questions, just to practice what those are like. The last thing I'd say is feel free, if you don't know the answer to a question, to pause for a second and take time to think. I don't think that's ever really looked bad in an interview, and it can help you a lot. So take your time with answering the questions rather than thinking, if I don't know the answer, I need to fill up the 10 seconds of airtime with gibberish before I really get to my answer.
Rupsha:
Great. Thank you all for joining today. I hope this was helpful to the audience. There are a couple of questions in the Q&A, which Stephen has been diligently answering throughout the panel, but if you missed parts of this or want to rewatch, it'll be uploaded onto the pre-doc website. So thank you again, and we'll be back for the second panel right after this. Thank you.
How to Succeed as a Pre-Doc
Zerah:
All right. Hi everyone. For this panel, we are going to be talking about how to succeed in a pre-doc. We have amazing panelists here with us today to give us insight on that topic, and like I said earlier, towards the end we'll allocate around 10 to 15 minutes just to open up to audience Q&A. I know you guys have a lot of questions and I want to make sure we have all of those answered as well. So to start off, I'll just go around and we'll have everyone introduce themselves to our audience, and I'll just start. So I'm Zerah. I was a pre-doc at Yale in the Tobin Center and I just wrapped up around two weeks ago, and this coming fall I'll be starting my PhD in political science at UCLA. So I'll just call people according to how you show up on my Zoom screen. So the first one is Ethan, if you want to go next and introduce yourself.
Ethan:
Yeah. Hi, my name is Ethan Deemer. I'm a pre-doc at the Zell/Lurie Real Estate Center at the University of Pennsylvania and I'm in my second year of doing this, so next year I'll be applying to grad school.
Zerah:
Awesome. Zach, you're next.
Zach:
Hi folks. I'm Zach Bleemer. I'm an assistant professor of economics at Princeton University, in the Industrial Relations Section. I work with a bunch of pre-docs as an empirical microeconomist, mostly focused on questions around higher education and economic mobility in the US.
Zerah:
Awesome. Aryan, do you want to go next?
Aryan:
Hi everyone. I am a pre-doc also at Yale. I just started a month ago, and I'm working for Winnie van Dijk here and then Scott Nelson at the University of Chicago.
Zerah:
Great. Anya, do you want to go last?
Anya:
Sure. I'm Anya Samek. I'm an associate professor at the University of California San Diego. I have been working with pre-docs for the last 15 years or so, and I do work in experimental economics, mostly field experiments in education, health and charitable giving.
Zerah:
Awesome. So we'll get started, and I'll start off by saying one thing that people are commonly curious about in a pre-doc position is the technical skills, specifically coding, that you need to succeed. So in your case or in your RAs' case, what are the programming languages or software that you commonly use or have encountered so far? We'll just go to Ethan first. Yeah,
Ethan:
So I think a pretty common one that I use, and that I've heard a lot of other folks across the field use, is Stata, and Python is a good core coding language. But one thing that I don't hear a lot about that I'm actually using quite frequently in my job is how to code in Linux and use Bash. If you're submitting any type of code to a server, which a lot of larger universities seem to be doing nowadays for large-scale jobs, having that might be a really strong differentiator when you're applying to programs. This is something I didn't know about until I started this position, so that's something I wish I had known.
Zach:
There's going to be a lot of variety. Oh, sorry.
Zerah:
No, go ahead. Yeah, I want this to be...
Zach:
There's going to be a lot of variety across pre-docs, so different faculty code in different languages. My sense is the most common right now are Stata and R, though it depends on field. There are a lot of people working in Python and MATLAB as well. It's not at all unusual to have people who use other languages; Bash and Unix are one. A lot of people use ArcGIS and related software. My sense is, as well, there are two different questions: how you want to pitch yourself to get a job as a pre-doc, and then what skills you want to build to be an optimal pre-doc. To me, as I'm hiring pre-docs, experience in a specific coding language is a lot less important than having shown an interest in and developed some experience in at least one coding language. I primarily work in R, and I have a preference for people who work in R, but if people come in and they have experience in Python, that's just fine. It tends to be a lot easier to switch from one language to another than it is to just learn coding from scratch.
Zerah:
Yeah. Aryan, do you want to go next?
Aryan:
Sure. So in my role, I would say, just as Zach pointed out, I've sort of gotten to see the breadth of the different languages. I have one PI who works exclusively in Stata, one who works exclusively in R, and we have a data partner who works exclusively in Python. So I think having knowledge of one language and being able to write code is important, but a really valuable skill above that, I would say, is to have some sort of basic familiarity with the whole gamut so you can at least read somebody else's code.
Anya:
My turn?
Zerah:
Yeah,
Anya:
So we do exclusively Stata. I know a lot of people do R as well. One thing to note: we actually do a coding test when we hire pre-docs, and the coding test I use is to take one of my papers that's already published in the JPE that comes with the data file, and I ask people to replicate the results in the paper and to tell me some interpretation of the results. So I think in addition to knowing the language, what I really look for is people being able to understand the intuition, which is missing for a lot of people. What I don't want is an RA that comes in and presents regression output to me but has no idea how they got there, because that's where you might end up with regression output that has a lot of mistakes and you don't realize there's a mistake. But I can clearly see it: that coefficient estimate is way too large or way too small or negative, and I've done that analysis before and I can see why.
So what I really look for is people that understand the data intuitively, can I take a data set, calculate some means, look at some trends, do I understand the data first and then I do the regression with the data. And so what I actually look for is I just hired a pre-doc a couple of years ago out of Cornell who had not taken a lot of data analysis classes and actually was not kind of up to date on every single language, but what I do is when they replicate my regressions in that particular paper, there's a footnote that explains what are the fixed effects that we're using and what are some of the controls that we're using that a lot of students miss. And so if you miss those, your replication does not produce the same coefficient estimates as the paper and I tell you where the paper is so you can look at it.
And if you don't miss those, then I think, look, this is a person that has good attention to detail and also understands intuition. And so I actually look for people that are able to understand the intuition and find the details. So the student I hired did not format her table in exactly the way that it was formatted in the paper, but she understood the intuition behind the table and she understood how to look for the details. So this is what I really look for: you should understand how to do the analysis using R or Stata or what have you. And I assume that many of you who are at top universities are well equipped to be able to do that, but the intuition is kind of that last piece that really needs to be there, and that's really helpful for you going forward.
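The habit described above (look at group means first, then run the regression, and notice what the fixed effects change) can be sketched in a few lines. This is only an illustration under stated assumptions: the data are made-up toy values, the function names are hypothetical, and none of it comes from the paper or the coding test mentioned in the discussion.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy data as (group, x, y); not taken from any paper.
rows = [
    ("A", 1.0, 2.0), ("A", 2.0, 3.0), ("A", 3.0, 4.0),
    ("B", 4.0, 1.0), ("B", 5.0, 2.0), ("B", 6.0, 3.0),
]


def ols_slope(xs, ys):
    """Slope from a one-regressor OLS fit of y on x (with an intercept)."""
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def group_means(rows):
    """Step 1: understand the data first. Mean of x and y within each group."""
    by_group = defaultdict(list)
    for g, x, y in rows:
        by_group[g].append((x, y))
    return {
        g: (mean([x for x, _ in vals]), mean([y for _, y in vals]))
        for g, vals in by_group.items()
    }


def within_slope(rows):
    """Step 2: slope with group fixed effects, via demeaning within each group."""
    means = group_means(rows)
    xs = [x - means[g][0] for g, x, _ in rows]
    ys = [y - means[g][1] for g, _, y in rows]
    return ols_slope(xs, ys)


pooled = ols_slope([x for _, x, _ in rows], [y for _, _, y in rows])
within = within_slope(rows)

print(group_means(rows))
print(f"pooled slope: {pooled:.3f}, within-group slope: {within:.3f}")
```

In this toy data the pooled slope is slightly negative while the within-group slope is positive, so leaving out the fixed effects flips the sign of the estimate. A replication that skips a footnoted fixed effect can miss the published coefficient in the same way, and the group means printed first are what make the discrepancy easy to explain.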
Zach:
I'll just quickly second that. I do exactly the same thing in my hiring process: have people replicate an old paper and talk me through the findings, or in some cases not even replicate it, just look at the figures and try to explain to me what went into producing these figures and what you would have to do in order to produce them yourself. I find that intuition is much more indicative of a strong pre-doc than coding experience per se. Though as a sophomore or junior in college, as you're sort of deciding what to beef up on most, I think both focusing on coding languages and also taking econometrics courses or background courses in economics that give you a sense of what empirical economic research looks like would be really helpful.
Ethan:
If I could also add one more thing at the cap of this: going into my pre-doc, I thought I had a really strong command of the coding languages that I knew, and then once I started, I realized how foolish I was for thinking that. A great deal of what I can do now, I learned within the first three months of my job, and I think all of the other people who work in my office have reported very similar experiences. So this is maybe less of a technical skill, but come in willing to learn as you go with some of the more complicated aspects of Stata or Python or R, for instance. This is as much a learning experience as it is coming in knowing what you have to know.
Zerah:
Yeah, just building off of that, knowing that our audience varies in comfort with coding, how can prospective pre-docs or even current pre-docs gain proficiency or comfort in areas where they might be lacking, such as coding, statistical methods, and all of that? Anyone can go ahead.
Aryan:
I think there's some wonderful practice coding exercises online. I also think it might be worthwhile if you've taken a statistics course or data science course to maybe reach out to a faculty member there and see if they have anything on their hands. But what I at least found particularly useful was for these coding exercises, sort of doing it on my own but then spending a lot of time reading the posted code by other people. I think you can try something dozens of times and feel like you're doing okay, but then the moment you see a better way to do it, that's probably the best way to improve your skills.
Zach:
The act of becoming an academic economist is an act of constantly pretending that you're already an academic economist and doing all of the work that an academic economist does. And so as an undergraduate, to me, the best preparation that someone can go through toward becoming a pre-doc is writing a senior thesis or otherwise conducting some kind of independent research project that forces you to sit there with some kind of dataset and answer a question using that dataset, which tends to only be possible using some programming language. Then you become a pre-doc and you are, first, an immediate part of some set of large-scale economic research projects, after which you go and become a graduate student, in which you just pretend to be an economist for six years, producing the same kind of work that you're going to be producing for the rest of your life once you are an economist. And so the earlier that you start just pretending and producing empirical economic research, I think the faster you'll develop the skills that are required to conduct that research at the highest level.
Zerah:
Anya, do you want to add on to that?
Anya:
I think Zach gave a really good answer. I don't have a ton more to add.
Zerah:
Alright, so let's switch gears a bit. We discussed coding skills, but that's not the only component of a pre-doc, right? So in your opinion, which other skills (this could be other technical skills or even soft skills like teamwork and communication) do you think are most critical to develop during a pre-doc? Ethan, do you want to go?
Ethan:
Yeah, I think something that was really hard for me coming in, and this is a bit of a soft skill, was being willing to be wrong about things. When I was applying for a pre-doc, as Zach said, I had to sort of pretend like I was already acting as an academic economist, which I definitely was not at that stage and I still am not. So it's definitely important to be willing to put yourself forward in that manner, but then to be willing to figure out where you have to build up some of your deficiencies in skill or in knowledge. I found in my first month I was trying to put my best foot forward a little too hard with my PIs, and that made it really hard for me to actually figure out what I needed help with. And so being willing to admit when you don't understand something was very hard for me, and I don't think it's a very easy thing to do in general, but it's something that very early on will help you get better at things that you're not currently good at.
Zach:
A lot of academic research is really boring, and some of it doesn't require very much brainpower, just a willingness to put in the time to, for example, scrape a bunch of data or digitize a bunch of data or do what feels like an extremely repetitive process over and over and over again for data that come in a lot of different formats and have to be read in a bunch of different ways. At least some subset of a pre-doc's time is going to be spent on these activities. But I think it's really worth emphasizing that some subset of all academics' time is spent on these activities. It doesn't stop when you're not a pre-doc anymore. That's part of knowledge production, in the same way that in a wet lab part of what you're doing is just titrating all day. We have versions of titration in our science as well, and it's not so much a skill as, I think, it's really important to develop a willingness to do tasks that you might feel are below you. I constantly have to do tasks like this, and I think everyone in academic production has to do that. But it's more than a willingness. I think it also requires a trust that these activities are worth it, not just for your time but for science: to put in a lot of boring work where, in some cases, the only reason it hasn't been done before is exactly that it's so boring. But on the other side, it can lead to really important insight.
Anya:
I think communication is really, really important. So when I have students come in and start a job, whether it's an RA or a pre-doc, I think they really struggle with the importance of constantly keeping in the loop with the professor. So if I send an email to my pre-doc, I expect her to respond the same day to say either, "Hey, I'll look into this," or "I don't understand." I think what often happens with students is they get nervous talking to faculty, and so if there's something they don't understand, they just kind of hide. I've seen that also with PhD students, so this is not just a pre-doc problem. But I think really making sure that you communicate well and respond to every email saying, "I understand, I got it, I'll work on it by this Friday," or "I understand you need this by tomorrow, but you also assigned me this other task, could I do it on Thursday?"
These are really important things that will go a long way toward making sure that you're doing a good job. Asking questions is really good, but I think spending some time online researching it before you ask the question is also very good. So being able to motivate yourself to kind of solve problems. So communication's one, and I think problem solving is the other big one: can I solve a problem myself? Can I be motivated to do this task until it's finished even though I don't have a bunch of interim deadlines? These are all skills that are really important for a pre-doc, but also skills that you continue to develop during the pre-doc that are going to be super important if you end up going into academia, because in academia you need to be very self-motivated, and you need to communicate with co-authors, people in the department, students and so on. And these are all things that you want to work on.
Aryan:
Yeah, I agree with what Anya said. I think one of the biggest skills I had to pick up was learning how to write and communicate clearly. I think in most other writing forms you do before your pre-doc, you're sort of trying to put together a succinct argument as quickly as possible, but with a pre-doc, you really need to be careful about it. What are the things you still have doubts about and what are the nuances that you think you and the PI need to think through? In a lot of cases, you're reporting to them about data or about sources that you've seen and they haven't. And so you need to, while being respectful of their time, be able to communicate all of the nuances that are important.
Zerah:
Great. Yeah, I went through all of that during my pre-doc, so I agree fully with what all of you said. So another aspect of the pre-doc that I personally think is very important, especially to those who are thinking of going to a PhD program, is having that close faculty mentor relationship. So for some who may not have had the chance to form those relationships with faculty in their undergrad or even master's program, through a pre-doc you're able to build that and network with other faculty, which can ultimately lead to a stellar letter of recommendation if you choose the academic route or an amazing reference point if you choose to go a non-academic route. So for you, what are some ways that you can effectively build a professional relationship either with your advisor, your research team, or even just other pre-docs? Aryan, do you want to go?
Aryan:
Yes. I have to give a lot of credit to my PIs here in that I think they really took the initiative in setting a culture which really allowed us to build a strong relationship where I hope a couple years from now they'll be able to write me a recommendation. And a big part of that was when I came in having conversations about me, about what I hope to get from the experience, what sort of structure would be valuable and what sort of things they could do to invest in me. I think there's a lot of variance in terms of what you can get from your PI with that, but just having those conversations up front I think allowed us to create a schedule where now we're able to talk almost weekly about those sort of things and I know how much I can ask them to do to help me on that front.
Zerah:
Zach, do you want to go?
Zach:
I think in building a relationship with a PI, the most important thing for a pre-doc is going to be enthusiasm for the PI's research agenda and the research that you are participating in. There are lots of economic researchers who are conducting all sorts of economic research. I think it's important to sort of place yourself, orient yourself in a position where you can share the enthusiasm of the research team that you're part of in the work that you're doing. That enthusiasm is going to improve the quality of the work in part through this motivation channel that Anya was talking about. I think it will also smooth your relationship with the PI who, remember, has chosen this research agenda from nowhere, has chosen this among all possible research agendas to devote their livelihood to, and is looking to you to help promote that research agenda. And so I think building a relationship out of, at least in part, enthusiasm for the work is going to lead to, for example, not just participating in specific projects with the PI, but maybe then starting new projects with the PI, or the PI sending you in different directions to identify potentially frontier research topics that you want to pursue going into graduate school.
I think that kind of professional relationship can be very fruitful.
Zerah:
Anya, do you want to go next?
Anya:
I think Zach has it exactly right. So you should also be selecting PIs on the basis of your interests. And so if you're very interested in behavioral economics, you want to work with someone that's doing behavioral economics, not someone that's doing macroeconomics, right? So I think for the most part, as a really young person, I feel like many of you are probably interested in a lot of other issues. I have had students who come and work for me who end up... a lot of the time, actually, when we do field experiments and behavioral econ, a lot of my students go on and do development economics. So that's very different, but they're still very engaged in my research, and I absolutely think it's great that they want to go on and be development economists. But they still need to... I mean, I think the pre-doc is a good time to find out what you are interested in, but you should at least feel like you have some interest in exploring the area that you are applying for. So resist the urge to apply to every single pre-doc if some of them don't interest you.
Ethan:
Yeah. I also think, building off of what Anya just said about coming in knowing, I came into my pre-doc pretty dead set on, I know that I want to do urban economics, and it's still very much one of my interests. But working with my PIs, one of whose focus is very strongly on transport economics and transport in cities, has really helped me evolve my interests, and as a result, I think I've become generally more interested in the field than I already thought I was coming in. So being willing to let your interests evolve and, as Anya said, come in to do behavioral and then end up doing development out of interest in the experimental side of things. Also talking with other pre-docs at the same time. I'm very fortunate to work on a team of three other pre-docs for one PI. It's kind of weird how it's split up, but we work very collaboratively with one another, and all of our interests have grown, and we have been able to work together to communicate. These are the people that will likely be your future colleagues if you go into economic research. So leveraging the entire team, not just the PIs, but also your fellow pre-docs, is a really, really nice benefit of the job.
Aryan:
If I can just add one more thing, I think Anya and Zach had talked about how they structure their interviews to in some ways model the sort of work they expect you to be doing. And so as you're doing the coding interview, you should also be thinking about: is this something that I would want to be doing? Are these the sort of questions that interest me? Is this the sort of data I would like to be working with? Because I think that gives you a picture of what you're setting yourself up for in those two years.
Zerah:
Yeah, this is all amazing advice and I'm sure everyone here is appreciating that. So another thing about a pre-doc program is that it's very unique in that we have a lot of resources at our disposal. So I don't know about everyone else's affiliation here, but for Tobin we have many resources like seminars, classes, study groups, all that. So what are the resources that are available in your respective pre-doc programs, and how has that helped you in your position, in developing your own research agenda, or even deciding your future plans? So Ethan, do you want to go first?
Ethan:
Yeah, so I can give a broad overview of what we have. I think the most surprising thing that I did not expect to get out of the pre-doc is the departmental seminars that my department puts on every week. There are more generally, but one or two that I can make time to go to, seminars where people will come and present on their ongoing research or research they are currently finishing up. And one of the most interesting aspects of that is one of those seminars is where the job market candidates, people who are looking to be hired by the university, come and present their research. And that's a great opportunity to sort of see the cutting edge of what current outgoing PhD students have been working on. So seeing the type of stuff that I'll be working on within the next five years, that's really useful. Also, I came into my pre-doc without a strong math background, so I was able to take real analysis last semester. That's something that I know is pretty commonly recommended for applying to an economics PhD. And then I'm going to be taking a few PhD-level courses this coming semester in areas of my interest. So seminars and classes are big, and then I'll echo myself from earlier about the other people on your floor or working with your professors.
Zach:
These opportunities are going to vary a lot between pre-docs, but I suspect that they'll all have the same flavor. It's pretty common for pre-docs to take a couple of classes while they're at these universities. These are classes that in many cases would've been available during undergrad. They might be things that you just didn't get to or specific topics that are available at one but not another institution. You should think of all of this as the kind of career development that is all just part of this greater process of becoming an economist. These are all classes that you might take during grad school, though in some cases you take them earlier, or things that you wanted to learn before you go into the core econometric sequence or economic sequence of the first year of the PhD. So there's going to be courses available, there's going to be seminars available, and then you're living at a university. And so there's all sorts of talks and activities around the university that pre-docs participate in. And I think you can just think of all of those as just additional varieties of economics professional development.
Zerah:
Anya, do you want to go next?
Anya:
Yeah, so I think these vary a lot. We actually have trouble giving students opportunities to take classes. We have some rules about how much time you have to be a staff member in order to receive tuition remission, and so it ends up being quite expensive at UCSD to try to take a class. A lot of my students take real analysis from Harvard's extension program. That's a really common class that students will take while they're doing their pre-doc. A lot of my students are doing a pre-doc precisely because they have not taken as much math as they would have liked to prepare themselves for the PhD. So I think actually when you come into the pre-doc, you should really be thinking about what is the thing that's missing from my application being the best it can be for a PhD program. So for some it's research experience.
Maybe as an undergrad, I wasn't sure I wanted to do a PhD, so I didn't really work in a research lab. And obviously the pre-doc is great for that. For some people it's, well, I'm missing a couple classes, can't really afford to do another two years of school, but I'm going to do this and try to take some classes on the side. Maybe I want to have a really good independent research paper to submit as part of my application package in the next year or two. So I'm going to spend time on the side working on that. So it really should be carefully thought out: what is it that's missing? You can talk to your advisors, you can talk to other professors from your undergrad. Try to get a sense of what you could do to give yourself the best shot at a good PhD program.
Zerah:
Aryan, I know you're new, but I know Tobin has a lot of seminars, so do you want to talk about that a little bit?
Aryan:
Yeah, I think you might actually be more qualified to jump in if I'm missing something there, but I think we have sort of the standard things when it comes to seminars with faculty and visitors, graduate students, job market candidates. Pre-docs here are allowed to take courses at Yale, and I know a number of them do, even if that just means auditing a course with an economist who you really like and always wanted to take a class with. And then we have a few pre-doc specific things. We have pre-doc specific seminars when it comes to getting training on good coding practices, working in different languages like LaTeX, or even what the different opportunities are if you don't want to go into graduate school after the pre-doc. And then we have a pre-doc reading group where we read public policy econ papers every week. And so there's a few pre-doc specific programs.
Zerah:
Yeah, that's a very good overview of Tobin's many resources. So I'm assuming right now that our audience are mostly juniors, seniors or even master's students who are looking to possibly apply to pre-docs in the near future. So what advice do you have for someone who is transitioning from an undergraduate or master's program to a pre-doc environment? And for our two pre-docs on the panel, what was the most shocking part about transitioning to a pre-doc? Do we want to go with our faculty first? How about we go with Zach first?
Zach:
Well, I think the thing to keep in mind is that pre-docs are jobs. So I was a pre-doc, or what is now called a pre-doc, at the New York Fed in 2013 to 15, and it was the first job I'd ever had. There's a boss, you're working on a set of projects, you have hours and those hours end. And so after five o'clock we were totally free. That's also true for pre-docs here. Eventually, toward the end of the pre-doc, people can get pulled into co-authored projects. Maybe you're working on some economic research in your so-called off time, but especially at the beginning and in many cases throughout the pre-doc, there's a sense in which you have on hours in which you're expected to be working pretty hard on a given research agenda and you have off hours that belong to you. And I think just learning time management skills around both that dichotomy and, as Anya said, this sort of open-endedness of the economic research process is a big change from the very structured time that students tend to live in.
Zerah:
Anya, do you want to jump in and add your thoughts?
Anya:
Well, for structuring time, I think really being motivated to get the work done that you need to do is what's most important in the pre-doc. So officially, our pre-docs have set hours like what Zach is saying, but I think if you need to be off certain times but you can get your work done other times, that's totally fine in my lab, as long as the work is done. We obviously don't expect people to work crazy hours. I have just noted generally the way that the academic... I mean, we are talking about pre-docs here, but I think many of you might be interested in the full kind of academic trajectory. You do a lot of work before you have tenure, so there's a lot of work involved. Typically, the pre-doc is quite intensive. The first year of the PhD is extremely intensive with a lot of math and studying and study groups.
And I did not have a social life my first year as a PhD student. Second year of a PhD, typically a lot more kind of field courses, but still a lot of work. The dissertation, definitely a lot of work. The first five years of a tenure track professor position, very, very hard. Once you have tenure, that's right. Now that I have tenure, I work significantly less. And you have to really want to do it. You really have to be invested for 10 years to be working very hard. It's just kind of how it works. I mean, I still work a lot. I have a lot of projects. I'm publishing a lot, but I'm not as invested in spending my whole life working. So if that's not what you want, don't do it. If you don't see yourself as someone that wants to go in and spend 10 years plugging away... Sometimes I think, oh, maybe... I live right now in kind of a party area of San Diego.
It's called Pacific Beach, and I see all these 20-somethings playing ping pong in their yard and going surfing every day. And I'm like, huh, I never did that. I spent my entire day studying, so maybe I should have, but, you know, whatever. I didn't want to. I wanted the path that I took, but you've got to really want the path that you take. I'm also the director of our PhD program right now in the business school at UCSD, and I've had students that don't do well in it. They tell me, well, I didn't do well in these classes. I really wanted to have a social life. I didn't really think I wanted to study as much. And I said, the PhD is not for you, then. This is going to be more of the same. And so you really have to want a life where your life revolves around your work for a while. That's sort of the name of the game in academia. Maybe it's not right, but that's how it's structured.
Zerah:
Okay, well, we'll go to our pre-doc now. So Ethan, do you want to go ahead?
Ethan:
Yeah, so the question is also about what was the most surprising shift. So I think the big thing is, and this is maybe hinting a little bit at Anya's point, even though there is that strict delineation of, I come into work at eight and I leave at... now I leave at four because I'm studying for the GRE, but it's essentially a nine to five, how much time I spend outside of work thinking about my own research and my own research interests. So in college, this is getting at a broader point, but in college my schedule was very loose and classes changed from semester to semester, sometimes quarter to quarter. And coming into this job, it was such a weird shift going to, this is my strict set of work hours, and then outside of this I have to have the self-drive to either determine to do it myself or fill out my time in my own way.
So having that level of control outside of my work hours was a bit strange. And figuring out how to make myself dinner while also doing all the other things that I need to do on a day-to-day basis, that was a bit jarring, but also freeing in a way. So it is good to have that time. While I'm making salmon, I can also think about, oh, the labor effects of the way that the salmon was fished. I don't know, that's a weird example, but I think time flexibility changes. It doesn't go away, but it changes in a way that is very different from undergrad.
Aryan:
I think the sort of big difference between college and what you get in your pre-doc is here you have to take a lot more ownership and authority over your decisions, I think, in how you want to spend your time. When you're in college, it sort of is your entire world and a lot of the choices are made for you, and you do what you do because it's part of some pre-charted path: problem sets to do next week, I need to take this class for a major, whatever that is. And for people who choose to go on to some other career paths, a lot of those choices continue to be made for you. But with the pre-doc, you do have all this free time, but also you have something you're incredibly passionate about. So I think a lot of us do choose to spend, say, our evenings or our weekends working on independent research or thinking about questions or preparing to be an academic. But then that's, I think for the first time in your life, a choice you have to fully own because you can choose to do anything else.
Zerah:
And I want to build off of this, I think it's very important. What are the most important steps for you that you should take or you should have taken in the first few months of your pre-doc to set yourself up for success, not only in this program but beyond in your future? So actually I'm going to go back to you Aryan in your first few months. Do you want to jump in?
Aryan:
I mean, I suppose I'm as excited about the answers as everybody listening, but I think, yeah, to a certain degree, I also perhaps fell into the same trap as Ethan, just thinking that you sort of have to hit a home run on every task in your first few weeks or in your first month. And I think what's most important in the long run of the pre-doc is doing thoughtful and careful work. And so that means really resisting the urge to try to turn something around very quickly because you think that would be impressive or to try to take a shortcut because you think that'd be impressive. Sort of building those habits early on of taking your time and having systems to double check your work to make sure that there aren't errors, that things haven't slipped by you. Those are the sort of things which I think I still have to build these first few months.
Zerah:
Ethan, do you want to jump in?
Ethan:
Yeah, I'll build off of that. Everything that Aryan said, I completely agree with, having now this be my second year. I do a two-year program, so the second-year pre-docs in my department all kind of got together and did a training module for the first-year pre-docs coming in. And the biggest thing we emphasized is, take your time, take your time. No aspect of applying for and getting a pre-doc is a test, but I felt a bit of pressure, as if I was constantly being tested by my PI. They're there to push you, but they're not there to test you, and they're going to be upfront with you about their expectations. And knowing that and working under that assumption is very, very important, because taking that extra time to set up a nice code preamble matters. I use the same code preamble for all my Stata code.
And had I set that up very early on, it would've made downstream things way faster, way more organized, and I would've been much better at communicating. But I felt this sort of internal pressure, which I thought was external, to be fast and prompt and direct, and that really made it sloppier for myself in the future. And so this goes back to being willing to make mistakes and being willing to say, oops, I messed up. If you make a mistake, shoot your PI an email and say, I messed this up, I need to fix it. And they're always going to appreciate that rather than you just hoping it falls by the wayside. So make mistakes and go slow are two big parts of what would've made this job easier.
Zach:
I don't know if I have so much to add on this one. So I agree with these guys. They're a little closer to the beginnings of their pre-doc.
Zerah:
Awesome. Anya, do you want to add or do you want to move on?
Anya:
Sure. So I'll give some advice my PhD advisor gave me during my PhD. He said it's okay to make each mistake one time. So I tell this to all my pre-docs; I think it's really important. I did lab experiments during my PhD and I did these in this program called z-Tree, where there's a main computer, there's a bunch of little computers, and the little computers write the data of the decisions the kids, the students, make to the big computer. And I had put the data file in the trash before starting the session for some reason, totally stupid. I was so scared to tell my PhD advisor that I did this. It was my first experiment, the first session I'd ever run. And he just said, you know what? It's okay to make every mistake once, just don't do it again. So I like what Ethan was saying about making mistakes, and if you make a mistake, let your PI know right away, but then make sure it doesn't happen again. So learn from the mistakes, that's really, really important. Learn to save your work, make duplicate copies of it, et cetera. That I think is super important. It's advice my advisor gave me 20 years ago now, and I find it really, really useful.
Zerah:
Awesome. I agree with all of that. So let's talk about next steps with a pre-doc. Obviously a pre-doc in the name itself is generally designed to help you to prepare for a PhD specifically in economics. But for some people like me, I figured out that my research interests were more aligned with political science. So the question that I know most of our audience might be wondering is that would you recommend doing a pre-doc to someone if they were on the fence about doing a PhD in economics or just in general a PhD? Let's go with Zach first.
Zach:
Yeah, so I think that doing a pre-doc is a really good way to learn what it's like to be a PhD economist, both in seeing how the PI's life looks and, as I said earlier, sort of pretending to be a PI yourself for those two years. It's worth emphasizing that pre-docs are underpaid, grad students are underpaid. If you're really on the fence, you have to recognize that you're taking an eight-year pay cut in return for learning or deciding to be an academic economist. It's a decision that I made and lots of academics made. I know a few academics who regret that decision, but I think there's this trade-off where it's true that you can learn a lot in a pre-doc, but also you have to already be, to some degree, pretty committed to this track. And while there are pre-docs who go on to law school or med school, pre-docs who go on to PhD programs outside of economics, and pre-docs who just go back to industry, those are clear losses in the sense that the pre-doc is not there to help you get into a better law school, and in most cases it's not going to get you a great industry job, and it's relatively low paying in the meantime.
So I think you just have to be willing to make that jump in return for learning a lot about what it's like to be an academic and opening academic doors for yourself.
Zerah:
Anya, do you want to jump in?
Anya:
Yeah, so I will say actually there's kind of this misconception maybe, and it's partly driven by professors that if you go do a PhD, it means you definitely have to become an academic economist. I think this is what most of your advisors will want for you because that's their job and that's what they feel happy with. I think it's also what you want to write in your PhD application that you want to be an academic economist. Ultimately, this is sort of what our profession wants, but I've had students who decide they don't want to be an academic economist, so they go into industry for a while and then pretty soon realize that most of their bosses are PhD economists and then they go on and do a PhD. So you shouldn't think of it as like I'm doing a PhD, I want to be an academic.
You should think of it as, I'm doing a PhD, I want to expand my knowledge and be able to investigate new problems, and maybe that means I'm going to become an academic, but maybe that means I'm going to go to Vanguard and study retirement planning decisions of our clients. And that's what one of my students went on to do. Now she's at Airbnb. But there's a lot of PhD economists out there in a lot of different professions, and I think that's really actually different for economics versus engineering. So my sister did a PhD in engineering at Stanford and went into industry and she said that wasn't really needed for her. She was overqualified for the particular industry that she picked. She's not doing research. But in econ I do see a lot of PhD economists in a lot of higher-up roles in government, in think tanks, in academia, but also industry. And so you may end up deciding that academia is not for you and that's okay. But I agree with Zach, don't go and do this if you don't want to do a PhD at all. If you're sure you don't want to do a PhD, you'll be underpaid and you'll have to work really hard. You'll get some training of course, but you might do better as an intern or kind of a junior employee at the firm of your choice.
Zerah:
Aryan, do you want to jump in? Oh, Ethan, go ahead.
Ethan:
Oh, sorry. No, Aryan, go ahead.
Aryan:
I would say, broad strokes, I agree with Zach. This is really best suited if you want to go get a PhD, but I would say if you are still sort of on the fence and say you've written a senior thesis and you sort of like this, but you're not sure, I think a pre-doc is a really, really cool opportunity. I think it's an excellent job. I think relative to what people our age get to do in other industries, you get far more responsibility. You get the opportunity to sort of sit at the vanguard of human knowledge on some specific topic and you get to work with a really cool community of other pre-docs. And so if you can sort of stomach that pay cut you have to take for those two years, I think it's an excellent opportunity. I think it's a great way to figure out if this is what you're truly called to.
Ethan:
Yeah, I completely agree with Aryan. I think one of the things that is neat about a pre-doc, especially in economics, is that economics is, in my opinion, fundamentally a thought methodology and a set of tools. As a result, a lot of other fields have borrowed from those methodologies and tools. So if you come into a pre-doc and you're pretty sure you want to do a PhD in economics, but you're not a hundred percent sure if that's your real passion, you could... some people from my cohort are applying to marketing PhDs and finance PhDs, and though those are related, they're not exactly equal PhDs. But I personally, and this is with rose-colored glasses because I don't have the counterfactual, am very, very glad that I decided to do a pre-doc instead of going and doing anything in industry. I knew that I wanted to do a PhD pretty solidly, but as I've done my pre-doc, I've actually realized that I'm much more infatuated with academic research than I thought I was coming into it. So the opposite can happen, where you aren't a hundred percent sure you like academia and it transitions you into saying, wow, I wouldn't want to do anything else. So it's a good opportunity to see what you do and don't like. If you're on the fence, I think it can be useful.
Zerah:
Alright, so we are nearing the end. I think we have around eight minutes left. So for our panel, before I open up to Q&A, because I think we have a few questions, do you have any advice for our future pre-docs? And then I'll go to you, Aryan.
Aryan:
You said, do we have any advice for prospective pre-docs? Is that right?
Zerah:
Yeah, yeah.
Aryan:
Yeah. The one thing which I think I realized looking back, but didn't see looking forward when I was in that position, was that when you talk to other people about pre-docs or you sort of read the resources online, there are often these absolutes which are given about what's needed to get a pre-doc. You need to have taken certain classes, you need to have a particular background. But when you think about it from the perspective of the person who's hiring you, and I'd love to hear from Zach and Anya if I've got this right, they're really tasking you to take a large responsibility in the research agenda, more than you would likely get in other places, and they're allowing you to do that with very little oversight, and so reasonably there's some risk aversion on their part. And so to the degree that you can sort of put together a portfolio which shows them, hey, I'm not a risky pick, I have some ability to code and understand things and pick up the skills, and you wouldn't really be taking a gamble in picking me, I think you can get a pre-doc independent of these specific absolutes.
Zach:
So what I would keep in mind is that empirical social science as facilitated by computers and large data sets is very young, and there's just tremendous insight, very low-hanging fruit, still to be gained by the conduct of social science. And American economics is really at the vanguard of a wide variety of topics that social scientists are studying in the US. I think it's a very exciting team to be part of. Pre-docs are sort of at the first stage on their journey toward being part of that team, a cohort of social scientists who are pushing the frontier of our knowledge about people, and they're both beginning that journey personally in participating in that research agenda and also directly contributing to our stock of knowledge about human behavior. And so I think that's part of what Aryan is getting at here, and just the enthusiasm and the responsibility of science. It's also relatively weighty. What pre-docs do is extremely important. It would be impossible to do a lot of the science that economists are currently producing without pre-docs. And so I think it's a very exciting kind of job and it's worth not losing sight of that excitement.
Ethan:
Ethan, you want to go? Yeah, yeah. This is mainly geared to people who are currently applying for pre-docs, and that's: persevere. When I was applying, and I think someone asked a question a little bit ago about how many pre-docs to apply for, I admittedly applied to over 40 pre-docs. I realized halfway through that my resume was not computer readable, so maybe half of those were actually just getting thrown into a waste bin somewhere because of my resume. In any case, keep pushing forward. The beauty of the pre-doc is that it tends to be a good fit for the type of research you're interested in, and it gives you an opportunity to work with people who are doing what you want to do after you've gone through the program. So it is worth a little bit of perseverance now to find that right fit and find the right people that you can learn from. Zach noted that this is a chance for you to learn what you are going to be spending the rest of your life doing, possibly, hopefully. So take the time now and it'll pay dividends in the future.
Zerah:
Anya, do you have some thoughts on this?
Anya:
I've given a lot of feedback already. I was just typing answers to different...
Zerah:
Yeah, so I see the Q&A is being answered already. Let me just take a look to see if there are any questions.
Ethan:
I will say I saw one, I think Charlie asked about the resume thing. I throw that out there because it was a consideration I didn't even think of. Take the resume that you're using right now, just Google an online resume reader, and put it in. It will show you what a computer is seeing on the output, and it turns out that with the way I formatted mine, it couldn't see any of my work experience. So it was essentially just getting thrown in as a string of random numbers. That's worth taking a moment to do. Also look up computer-readable resume formats. Those are different from what you would hand to a person who was reading it one-on-one.
Zerah:
Okay, I think we've gone through all of the Q&A, unless I'm not seeing something. Stephen, am I missing anything?
Stephen:
I don't think so. Let me vamp here on logistics for just about a minute in case that gives us one or two more questions. So we're going to have an actual break this time. The past five-minute breaks were mostly jokes, but now we truly will break for 10 minutes, until 1:50 Central time (I'm in Central time), which is 2:50 Eastern time. At 2:50 Eastern time, you can sort yourselves into breakout rooms. We'll have one for R and one for Stata, and you can join those to see the sample data task that we sent you ahead of time being solved for you. You'll be able to watch that for an hour. After that, at 3:50 Eastern, we'll have another 10-minute break, at which point we'll go into our job fair at Gather. I sent, I believe, all of you the link to the Gather, but if you registered for the workshop this morning, you may not have gotten it.
So here it is in the chat. This is how you will join the Gather at four o'clock Eastern. That job fair does not require you to have a resume ready. Nobody's going to be interviewing you. It's a chance for you to go around to different employers, ask questions about those programs, and find out more about what they might be like. So: 10-minute break; at 2:50, the data task walkthroughs, which will last an hour; another 10-minute break; then get yourselves over to Gather for the last hour, 4:00 to 5:00 PM, which is the career fair. And now we're up to eight questions in the Q&A. So anything for the last few seconds?
Zerah:
I guess one question, just want to put this out there. Would you recommend someone who has been in industry for a year or two to go to a pre-doc, I'm assuming is the question.
Zach:
Yeah, so I've had multiple pre-docs who graduated college, went to industry, were disillusioned, and then applied to pre-docs as a path back toward a PhD. I think that was a good idea. I think it would be pretty difficult to stay in industry and then go directly into a PhD unless you have a lot of research experience and connection to faculty from your undergraduate institution. So to the degree that you actually want to sort of hop back onto the academic train, I think pre-docs are a pretty good way of doing that. And I think the programming skills that people develop, at least in some industry jobs, especially in economic consulting, are very valuable as a pre-doc.
Zerah:
Any other thoughts on that? I think we hit time and I want to make sure everyone gets a break. So thank you everyone. I want to thank our panelists, Ethan, Aryan, Zach and Anya for taking their time to share their knowledge and experience with everyone here. Thank you to our audience who had a lot of questions and was very productive. So I think I'll hand it off to Stephen now.
How to Solve a Data Task in R
Rupsha:
Thank you! You just saved my life. Yes, thank you.
Rupsha:
Oh, okay. It'll be fine at the end. I'll go over the rest. So if you're joining the recording, I have already done one question. If you don't know what's happening, skip to the end, where I'll go over the first part again and retell everything that I did. And thank you for re-upping that, that's really helpful. So we're doing the second question: among women older than 25, which groups of people had the biggest changes in labor force participation? And this is the group by education and female labor force participation. Now, I hinted at why I use this wrapper around group, and that is because you want to reduce the amount of time you're taking up doing these plots and copy-pasting. So I'm just going to change education to race, which is one of them here, and then just run it through, and this will automatically give me race.
So if you're noticing, all of the labor force participation numbers come up by race and by year, which is lovely, but we see these annoying little white spaces and we want to take these out as well. So what we're going to do is, at the end of the chunk, we're going to do a filter and take out any white spaces. Let's do it like this and then double check if that works. Perfect. This is just for the plot: you don't want to have a weird nondescript race, like a phantom race, that has some sort of labor force participation. So let's plot this, and we see that this is a plot which looks great. Let's add a little bit of transparency in the lines so that we can check what is happening, and then we see, great.
So it looks like the highest uptick is in two or more races, which you can come back to. Then it's labor force participation for black women. And then what else were we seeing? The red line is Asian women, and then Hispanic women have a general upward trend but more fluctuations. And then Native American women have quite a bit of fluctuation, and white women have a downward trend but are generally stable through time. So what I just said is sort of the general scheme of what you're going to include in your answer. Go through each race and propose reasons why this is true, why some races might have labor force participation that fluctuates more, et cetera. So in your answer, write everything. And one nice thing is we've set ourselves up to sort of automate this entire process. So I'm just going to make a function that plots all of this for me. So let's do a female LFP group, and then let's do a function, and let's do group, and then open this up, include this, take out these brackets, and then do a return. Let's do a print for the plot itself.
So this one will save you a lot of time and space and if you want to seem a little cool with your answers, it's a good thing. And now that we have the function we just put in race and it'll do all this math for us and output race, pretty cool. And then say I want to do education. Oh, if I could spell education, education, this'll do the same thing for education. And what else do we have? We have age. Let's see if that works because we haven't done that yet. Age, perfect. I'll leave the summarizing to you, but that should be, yep. And then income percentile. Now I don't remember seeing a column like that, but I do remember a quantile yes. Income quantile. So it's not exactly percentile, but you can use income quantile as a group and that should be fine as well.
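A minimal sketch of the kind of reusable grouping function described here, assuming the dplyr package. The toy data and the names `female_25_plus`, `in_lfp`, `weight`, and `female_lfp_group` are hypothetical stand-ins, not the actual columns or code from the session, and this version returns the summary table rather than printing a plot.

```r
library(dplyr)

# Toy stand-in for the survey extract; every column name here is hypothetical.
female_25_plus <- data.frame(
  year   = c(1994, 1994, 1994, 1994, 1995, 1995, 1995, 1995),
  race   = c("A", "A", "B", "", "A", "A", "B", "B"),
  in_lfp = c(1, 0, 1, 1, 1, 1, 0, 1),
  weight = c(1, 3, 2, 1, 2, 2, 1, 1)
)

# Weighted labor force participation rate (percent) by year and one grouping column.
female_lfp_group <- function(data, group) {
  data %>%
    filter(.data[[group]] != "") %>%                    # drop blank group labels
    group_by(across(all_of(c("year", group)))) %>%
    summarize(lfpr = 100 * weighted.mean(in_lfp, weight, na.rm = TRUE),
              .groups = "drop")
}

lfp_by_race <- female_lfp_group(female_25_plus, "race")
```

Swapping `"race"` for another column name reuses the same pipeline, which is the copy-paste saving the walkthrough is after.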
Great. And that's giving you that the value of female labor force participation is pretty high for the 80 to 100 income quantile. I'm not going to go over all of the nooks and crannies of what the answer's going to be, but this should be how you approach each problem. And then another one is college. I think it should be the same as education, but it's worth looking at that group as well. Yeah, great. So that's question two. If there are no questions, I'm going to go over to question three. Use the data to examine trends among women older than 25 for each of the following factors: wage and salary income, social insurance income, education attainment. Great. So this would be a place where it would be hard for someone without a data dictionary to figure out which one is wage and salary income and which one is social insurance income.
There are three income columns, and it's a bit unfortunate that we don't have a data dictionary, but part of this is figuring it out: a lot of your pre-doc work is going to involve very untidy data. So I'm just going to use my best guess and say that income total will be your total income. So that'll include your salary, your capital gains, some sort of gifts, other non-wage income from things. So I'm not going to use this, but rather use income as the income column. Now, it sounds strange to think so much about which column to use when one just says income, but these can have significant differences among each other given what is being used to calculate each column. And then inc SS, I would assume, is social security income. So let's make a plot for these. For the basic plot, we're going to use the female 25-plus data, because it's again focusing on women older than 25. And then I'm going to take out year 2024.
This might seem like a strange thing, but 2024 isn't over yet, which, as unfortunate as it is, is true. So most of the income column likely is not going to be populated, and you can check that for yourself if you look at 2024. Actually, I'm not going to do that because that's going to take time. But homework: double check that year 2024 doesn't have income at all. And then let's calculate the median. Okay, so when we're using the data to examine trends among women for wage and salary income, your mind might just go to calculating the average and then plotting it. But there can be different ways of aggregating. And one very important thing that most economists use over the average is the median. A median is often more indicative of how much a typical person is making in an economy. So to stand out from the crowd, you want to make sure that you're including both means and medians, and then explain the differences you see and why you might see those differences. So you first do a group by, sorry, group by year, and then do a summarize: income mean, let's do weighted mean. Same thing.
And then you do income median, and then do weighted... no. So when you do a median, you also want to do a weighted median. Everything has to be weighted; you cannot not use weighted stuff. So you want to use weighted quantile, which is the R function. And then do the same thing, except quantile means you can calculate many different probabilities, not just the median, so you want to specify that it's the median. And then the same thing with na.rm, and then, yeah. Oh, actually I should take this hundred out because it's just... and then this is income. Bad catch, it should be income. Great. So let's see if this gives us our desired value.
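As a rough illustration of the weighted mean and weighted median by year, here is a sketch assuming dplyr. The session uses a packaged weighted-quantile function that is not named precisely in the recording, so `weighted_median` below is a hypothetical base-R helper written only for this example, and the toy numbers are invented.

```r
library(dplyr)

# Hypothetical helper (not the packaged function used in the session):
# the smallest value at which the cumulative weight share reaches 50%.
weighted_median <- function(x, w) {
  keep <- !is.na(x) & !is.na(w)          # drop missing values, like na.rm = TRUE
  x <- x[keep]
  w <- w[keep]
  ord <- order(x)
  cum_share <- cumsum(w[ord]) / sum(w)
  x[ord][which(cum_share >= 0.5)[1]]
}

# Invented toy data; column names are assumptions.
incomes <- data.frame(
  year   = c(2000, 2000, 2000, 2001, 2001, 2001),
  income = c(10000, 20000, 90000, 12000, 22000, 100000),
  weight = c(2, 1, 1, 1, 1, 2)
)

income_by_year <- incomes %>%
  group_by(year) %>%
  summarize(income_mean   = weighted.mean(income, weight, na.rm = TRUE),
            income_median = weighted_median(income, weight))
```

In this toy data the weighted mean sits well above the weighted median in both years, because a few large incomes pull the mean up. That gap is the thing the walkthrough asks you to explain.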
Perfect. Okay, so we're seeing mean income and median income for people in the sample, well, in the US, not just in the sample, for each of the years. So let's plot this. Plot income. Let's do plot, and I'm not going to do it here, but I'm going to do a geom line. There are nicer ways to do this, but I'm going to just try to brute force this because of time. I can show you the function I have, and it's going to be up on the website so you can screenshot it or use it as an exercise right after this. But I'm just going to show you what... let me add another geom line. This should work. Maybe not, but...
Rupsha:
Let's see what happens. Great. Okay, let me add a color red. Let's do C.
Perfect. Okay, so ignore this y-axis. This is both incomes. I'll clean this up and just copy-paste the function I created for this. But we see that the mean is in red and the median is in black, and there's a significant difference between the two. Now, it follows many of the similar patterns that we see. In 2008 you see a slight dip here, but then in 2008 this one shows a bigger dip, which is what happened. People who were hardest hit were people who weren't making a lot of money, and this small dip is actually understating the fact that there was a huge effect of the Great Recession. And then there's the weird COVID time, where there was a massive drop-off post 2020, and this shows a bigger drop-off than here. So you want to tease out these differences between the mean and the median and sort of explain why you're seeing what you're seeing, right?
So this was wage and salary income. Now I forget who asked, but I think it was Ahmed who asked, can we show the differences in labor force participation by education? And this is a great place to show these differences, because income varies by race, income varies by age, income varies by education status. So here, extra points if you show, hey, we're seeing this trend, but actually this is going to be very different if we break it down by these groups. So include that as part of your answer, and it's a good thing that you're analyzing things on the fly and not just doing what's asked of you. Social insurance income: so this will be the same thing, except I'm just going to change this column to inc SS, which was the one we discussed, and then let's see if this worked.
Great. Okay, so you see some very, very interesting things. We see that when we plot the average inc SS, it's like $3,000, $4,000, and then we see the median is just zero. That hasn't changed at all over the years. And this is the difference between mean and median. This exemplifies what the differences are. It looks like the median person did not receive social insurance. And this is not even a clean average, because it's pulled down by the people who got no income. It is the average social insurance income of everyone, regardless of whether or not they got any. So you might think, okay, this is sort of helpful, but maybe I want to do this conditionally. Say I want to just consider people given that they did get social insurance income; then what is the median or mean? So you add another filter, let me just do it here, and say inc SS is not zero.
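The contrast between the unconditional and conditional summaries can be sketched as follows, assuming dplyr. The column names `inc_ss` and `weight`, the helper `weighted_median`, and the toy values are all hypothetical and chosen only to make the zero-heavy pattern visible.

```r
library(dplyr)

# Hypothetical helper: smallest value where the cumulative weight share reaches 50%.
weighted_median <- function(x, w) {
  ord <- order(x)
  x[ord][which(cumsum(w[ord]) / sum(w) >= 0.5)[1]]
}

# Invented toy data: most people receive no social insurance income.
ss <- data.frame(
  year   = rep(2000, 5),
  inc_ss = c(0, 0, 0, 8000, 12000),
  weight = c(1, 1, 1, 1, 1)
)

summarize_ss <- function(data) {
  data %>%
    group_by(year) %>%
    summarize(ss_mean   = weighted.mean(inc_ss, weight),
              ss_median = weighted_median(inc_ss, weight))
}

everyone   <- summarize_ss(ss)                        # zeros included
recipients <- summarize_ss(filter(ss, inc_ss != 0))   # conditional on receiving any
```

With the zeros included, the median is zero and the mean is dragged down; after filtering to recipients, both statistics describe people who actually received the income.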
And let's see now how these plots change. Great. Okay, now these plots have changed quite a bit, and we see that the trends are very much similar, which is what we'd expect even from the last one. But during the Great Recession you see this massive uptick in taking up social insurance. And then social insurance income has been steadily rising over time, from 1994 to now. So you want to explain why you're seeing what you're seeing, as usual, and do these weird, not weird, but do these different options, just to show that you're actually thinking about it and not just doing what we call regression monkeying: regression monkeying the answers and just doing it like a robot, right? Education attainment. So what was this question? Trends in education attainment. Got it. So let's consider education. This will be a grouped situation. So let's do female 25 plus, and then let's do group by year and education.
And then... so now we want to see the trends in education attainment. So how many people are getting, what percentage of people are getting a bachelor's degree, a master's degree, et cetera. Now, very crucially, we still want to consider those weights. Do not forget about those weights. They will render your answer useless if you don't use them in survey data. So in the summarize, we want to use the weights. So let's do a weighted count of how many people are getting... how much these responses are weighted, or how representative of the overall population they are. So let's do a sum of the weights, and then let's mutate to calculate the percentage of people getting each degree, by taking the weighted count over the sum of the weighted counts, so over the whole population. So this would be how many people are getting X degree, representative of the whole population.
So it should be indicative of how many people, in absolute terms, are getting an X degree in the United States, and then a mutate that multiplies the percentage by a hundred. And then, very specifically, this should be by year. So let's do a by year and then let's see if this worked. Great. We're seeing 19%, 7%. Let's plot this, and I think it'll be more insightful than just seeing the numbers. As I said, tables are not really helpful. A plot is the best thing to do. So ggplot, aesthetic X is year, Y is equal to percentage. And then let's do color, the grouping variable, as education. And then let's do geom line and geom point, just to add a couple of points, and then let's do the theme. Again, you can clean this up in your own time. I'm just trying to expedite this as much as I can. Okay, great.
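A sketch of the weighted education shares by year, assuming dplyr and ggplot2. The data frame `edu`, its column names, and its values are invented for illustration, and `group_by(year)` is used here in place of whatever per-year option was typed in the session.

```r
library(dplyr)
library(ggplot2)

# Invented toy data; column names are assumptions.
edu <- data.frame(
  year      = c(1994, 1994, 1994, 2020, 2020, 2020),
  education = c("High school", "Bachelor's", "High school",
                "High school", "Bachelor's", "Bachelor's"),
  weight    = c(3, 1, 1, 1, 2, 2)
)

edu_share <- edu %>%
  group_by(year, education) %>%
  summarize(weighted_count = sum(weight), .groups = "drop") %>%  # weighted head count
  group_by(year) %>%                                             # shares within each year
  mutate(percentage = 100 * weighted_count / sum(weighted_count)) %>%
  ungroup()

p <- ggplot(edu_share, aes(x = year, y = percentage, color = education)) +
  geom_line() +
  geom_point() +
  theme_minimal()
```

The key design point is that the denominator is the sum of weights within a year, so the shares for each year add up to 100.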
So we see that the highest is obviously high school diploma, and I would glean from the data that this is talking about terminal degrees for education. We see that this is steadily going down over time, which, by my priors, would be right. And then we see that bachelor's degrees are increasing over time, and then master's degrees, and so on and so forth. So this makes sense, and this is exactly what they're looking for. And then you can break this down by race, by age, by whatever. So it's important to do some extra bits, not just what's asked for. Great. Ahmed, do you want to ask a question? I'm going to keep coding while you ask your question so we can get ahead.
Ahmed:
Yes please. I'm sorry, just a very quick thing. It was about income, I think that was in subsection B or A. Does it make sense to transform incomes using logs? I'm not sure about the range of the income, but if we use natural logs, will it make any difference? Or, since you use the weighted mean at some point, will the trend not look much different? I'm not sure. Thank you.
Rupsha:
Yeah, that's a good question. So part of economics is also making choices depending on how readable it is. So if you have something like $10,000, a log of $10,000 is approximately, like, three, right? 10,000, no, four, four. Don't quote me on that, off the cuff. So a log of that is four, but what does it mean to plot that? Regardless of what the log is, your y-axis is going to be from one to 10. If it's someone who's not an economist, who's not a mathematician, who doesn't understand logs, they're looking at your plot and they're like, what does it mean for income to be four? Versus if they look at a plot that says dollar amounts: I can read that, or my grandma can read that, and be like, oh, this is $10,000. It makes sense that in 1994 it was $10,000, versus four log points, which is pretty hard to figure out. So this is a really good question, and it is trying to get at this: you want to make sure that what you're doing is comprehensible to the person who is reading your data task. So you don't want to do something fancy just for the sake of doing it. Logs are helpful when you're doing regressions, when you need to scale down something and it's all over the place. But for this, not helpful at all. So good question; I would say I would not do that.
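For reference, a quick check of the off-the-cuff arithmetic: the value of four corresponds to the base-10 log, while the natural log that the question asked about is about 9.21. The variable names below are only for illustration.

```r
income <- 10000

log10_income <- log10(income)  # base-10 log: 4
ln_income    <- log(income)    # natural log: about 9.21

print(c(log10 = log10_income, natural = ln_income))
```

Either way, the axis would read in log units rather than dollars, which is the readability concern being raised.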
Great. Question four is: which year had the steepest increase in female labor force participation relative to the previous year, and what factors are driving this pattern? So you want to do a percentage change. To calculate it, I'm going to use that female labor force rate that we calculated in the earlier bit; it's not in the recording yet, but it will be at the end when I go back over it. So in this we have a year column and an LFPR column, and we're going to take this LFPR and do a lag on it. So in 1995 you're going to get the 1994 value, just for math's sake. And then we're going to do a percentage change of this. So let's do another mutate: the change is LFPR minus LFPR lag, over LFPR lag. So, you know, your usual X new minus X old, over X old. So LFPR change, this is giving you the change percent.
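The lag-and-percentage-change step might look like the sketch below, assuming dplyr. The data frame `female_lfpr`, its column names, and its values are invented, and the rows are deliberately out of order to show that the lag only means "previous year" once the data is sorted by year.

```r
library(dplyr)

# Invented toy data, intentionally not in chronological order.
female_lfpr <- data.frame(
  year = c(1996, 1994, 1995, 1997),
  lfpr = c(51, 50, 50.5, 52.53)
)

lfpr_by_year <- female_lfpr %>%
  arrange(year) %>%                                        # sort so lag() is the prior year
  mutate(lfpr_lag    = lag(lfpr),                          # previous year's rate
         lfpr_change = 100 * (lfpr - lfpr_lag) / lfpr_lag) # (new - old) / old, in percent

# Year with the largest year-over-year percentage increase.
steepest_year <- lfpr_by_year$year[which.max(lfpr_by_year$lfpr_change)]
```

The first year has no predecessor, so its lag and change are `NA`, and `which.max()` skips it.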
And then let's see, you can do a little bit of trickery here and then just see: you see 1997 is the highest change in the positive direction. But again, it's economics, it's always better to plot. So we're going to do a very basic plot, and I'm actually going to cheat a little bit and just copy from what I have, because this should be pretty straightforward. Now I'm doing the same geom line, geom point, and then plotting this column, the change percentage column. And we see, yes, 1997 does have the highest change in female labor force participation. And then you have to explain why you think so, and what the factors are. There are a lot of reasons. There were a lot of welfare programs enacted at the time, a lot of EITC changes. So you want to craft an answer: look at econ papers and stuff and try to figure out why this was the case. Yes?
Ahmed:
Yes. So for the last question, do you need to sort the year first?
Rupsha:
Do you need to sort year first in this one? In this one
Ahmed:
Because it would be like chronologically
Rupsha:
Ah, yeah!
Ahmed:
Yeah.
Rupsha:
Great. That's a brilliant question. Yes, so you're exactly right. It was convenient for me that it was already ordered by year, but you're absolutely correct. You've always got to arrange first. So that was a great call: arrange by year first, which puts this in ascending order for you, and then you do the calculation. Great, great, great call. Perfect. Okay, so the next question is how has labor force participation changed for college educated and non-college educated women? If you remember, we made our lives super easy by making a function for question two that just took in a group and then outputted a plot. So we're just going to run this, and for the recording, that was the first question I worked on. We're just going to run this, and this is our answer for question five. So if you want to make your life super easy, make these functions: it's going to really expedite your process, it's going to take you less time to write out these answers, and it's going to look much better. So look at question two.
So that's our question five answer. Obviously you have to explain what you're seeing. Yeah, and then you can always split it up by race, et cetera, et cetera. So that's always a given, so you can do additional things. This is just the basics of what they want. Okay, question six: create an alternative measure of labor force participation that excludes individuals from the labor force if they're self-employed in their main job. That is, LFP is equal to zero if self-employed in main job. This is very, very nicely given. Using the new measure, describe how labor force participation for college educated and non-college educated women has evolved since 1994. Please provide graphs and/or tables to support your answer. Okay, great. So let's take our original data frame.
Female data, I think it was called. Yeah. And then, if you see, we'd already made this dummy variable, and now they're just asking us to change this dummy variable so that if they're self-employed, they're not in the labor force. So let's do just that: mutate, and then in LFP is equal to ifelse, if... oh, now let's figure out, I'm not just going to copy what's in the question. We're going to have to figure out which column even tells us who is self-employed. So we see we have one column that is self-employed. So let's see what's in the column. Okay, not, and then this phantom operator. So if self-employed is equal to this, then it's zero; otherwise... Now, this is a dummy already, so you don't want to say if it's zero then one. You want to say if they're self-employed, it's zero; otherwise it's just the dummy itself. We're just making one additional change to the already existing dummy, so we don't want to erase any data that's already in the dummy. If we put one, this would just say if they're self-employed, it's zero, otherwise they're one. So we don't want to put a one in there. We want to put in LFP again.
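The point about keeping the existing dummy in the "otherwise" branch can be sketched as follows, assuming dplyr. The column names `in_lfp` and `self_employed` and the label `"Self-employed"` are hypothetical; the real column and its labels would need to be checked in the data.

```r
library(dplyr)

# Invented toy data; names and labels are assumptions.
female_data <- data.frame(
  in_lfp        = c(1, 1, 0, 1, 0),
  self_employed = c("Self-employed", "Not self-employed", "Not self-employed",
                    "Self-employed", "Self-employed")
)

female_data <- female_data %>%
  mutate(
    # New measure: self-employed count as out of the labor force,
    # everyone else keeps their existing value.
    in_lfp_new   = ifelse(self_employed == "Self-employed", 0, in_lfp),
    # The mistake warned about: hard-coding 1 overwrites existing zeros.
    in_lfp_wrong = ifelse(self_employed == "Self-employed", 0, 1)
  )
```

The third person is not self-employed and not in the labor force. The correct version leaves them at 0, while the hard-coded version flips them to 1.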
Great. So now we do what we've been doing, what we're masters at now, which is calculating LFP. So let's do it by college, because it says college educated. And then let's do summarize. Let's do LFP. Oh, I'm just going to copy because it's the same.
Rupsha:
There we go.
Right. Weighted mean LFP. Actually, I'm going to make this a new dummy just so that it doesn't cause any issues. So let's do LFPR new. And what's nice is we can also calculate the weighted mean of the old one, because it's helpful to have the comparison. And let's just do this for the new one. Perfect. Okay, great. So this looks great. We now need to plot this. Let's do plot college, then female self-employed, then ggplot, then let's do year, new, and then color college, and then let's copy. Oh, what happened? Color is equal to college, maybe this color is messing it up. Yep. Perfect. Okay, so what do we see? When you use LFP zero for self-employed, this essentially means you're highlighting only the people who are in wage or salary jobs. So you're only including traditional working environments: if you're working for a corporation, if you're working for anything with a boss and you're getting paid. So this is good, and you can include an explanation for this in this question. But the seventh question is what is important: how does our labor market analysis change when we use the new measure? So for this, you want to show both lines; it's not really helpful if you don't. So I will leave that up to you, but I'm going to include what the plot will look like. I'm just going to paste this, and essentially I'm taking this data frame that I made to calculate the LFP and stuff, and, okay, Stephen just left a comment in the chat that says, looking forward to seeing you.
Okay, so if you have to leave, you can go ahead, but I'm just going to wrap this up. The female LFP plot with college and these alternate measures looks like this. So you see this difference: this is the original measure, which included the self-employed in the labor force, and this is the measure without the self-employed people in the labor force. And this is split by college and no degree. So question seven does not tell you that you have to make a plot, right? It's not written there. But how are you going to compare these if you're not making a plot? The thing you should think about the most is comparing and contrasting these two different things that you just calculated, and the best way to look at the differences is through a plot. So it's the most straightforward way to answer this question, and even if you don't include an answer, it's going to give you a lot of points to just include this plot, because you are at least thinking about what the differences are.
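One way to put both measures on a single plot is sketched below, assuming ggplot2. The data frame `lfp_compare` and every number in it are made up purely so the example runs; they are not results from the session's data.

```r
library(ggplot2)

# Made-up placeholder values, only to make the plot runnable.
lfp_compare <- data.frame(
  year     = rep(c(1994, 2004, 2014), times = 2),
  college  = rep(c("College", "No degree"), each = 3),
  lfpr     = c(80, 81, 80, 60, 59, 57),   # original measure
  lfpr_new = c(74, 75, 75, 55, 54, 53)    # self-employed counted as out of the labor force
)

# Color separates the education groups; line type separates the two measures.
p <- ggplot(lfp_compare, aes(x = year, color = college)) +
  geom_line(aes(y = lfpr,     linetype = "Original measure")) +
  geom_line(aes(y = lfpr_new, linetype = "Excluding self-employed")) +
  labs(y = "Female LFPR (%)", linetype = "Measure", color = "Education")
```

Having both line types in the same panel makes the gap between the measures, and how it differs by education group, visible at a glance.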
So that concludes part one, and then you should just include your answer of what the differences are and what you're seeing. We are short on time, it's like 2:50 already, and part two is telework. Now, if you had done this data task before, you might've noticed that there were a couple of issues with the questions in this part, mostly because there were some data issues and it was sort of hard to answer. Now what do you do in that situation? Say you got a data set and you have a couple of questions, and it looks like you should be able to answer these questions straight off, but you start doing the data task and you're like, oh, I actually cannot answer these questions. What you do is you try your best to answer them. This came up in a question, if you were here before.
The question that I got from William is: can you just say "I don't know" for an answer on a data task? Don't say "I don't know." Say, this is why I think this was not answerable, because of these reasons, and this is my best attempt at answering them. And I think that'll look good in the data task: even though you can't answer, you at least tried, because research is just that process. You don't know the answers and you're just trying; even though there are data issues, it's fine, we'll figure it out. So make some assumptions, see what you can do with the data, and just try it. It's okay if it's not possible, but don't leave it blank. That's my pro tip. And okay, it's 3:51.
So this is the end of the data task. I'm sorry we couldn't get to telework, but you essentially should be able to answer this. And for questions one and three, try your best. One pro tip I will give you for these questions is... I'm just going to copy-paste my working for this, even though it's not ideal... is this. So I'm just going to walk you through what the idea is. Since you're trying to figure out, for each individual person, whether or not they're teleworking, there's a column in the data frame that has the IDs for each person in the data, right? CPSIDP. So you want to use this column to figure out whether each person is teleworking or not. And now there are four columns, like COVID telework, telework now, telework before, telework difference. So you want to figure out what is happening in these columns.
So I just looked at the unique combinations of these columns and looked at what's in them. And you see that it says, did not telework in the past week, had telework in the past week, telework prior to Feb 2020, blah, blah, blah. And then this column says "did not telework between 21-22 and did telework due to covid." So if you're making a column or a dummy variable that says, did you telework or not, clearly the answer isn't that easy. It can't just be using one column and seeing whether or not they telework. You need to include the other columns and use a combination of these columns to figure out whether or not they telework. So this is clear, because if they had telework in the past week, this column isn't populated for that. So if you didn't include these people, that's a grave misstep.
So you want to say: if this column includes telework from blah, blah, blah, or if telework now said had telework in the past week, then telework is equal to one; otherwise it's zero, right? So this is essentially what I did here, and then I picked those people who did telework according to my understanding of what telework was, and I picked them by their CPS ID. So you want to make sure you use the ID and get the unique people who did telework, and then I joined it back to the data. So you don't need to worry about this too much, but you just want to try to answer this the best you can, even though there isn't an answer, because it's a little unanswerable given the data. Great. So I am going to go back and redo the first question and a bit of the second question. I'll take one last question, William.
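A rough sketch of that idea, assuming dplyr: flag telework from a combination of columns, take the unique person IDs, and join the flag back. The column names (`person_id`, `telework_now`, `telework_covid`) and the response labels are hypothetical simplifications of what the walkthrough describes, and which responses count as telework is exactly the kind of assumption you would state in your answer.

```r
library(dplyr)

# Invented toy data with one row per person-month; names and labels are assumptions.
cps <- data.frame(
  person_id      = c(1, 1, 2, 3, 3, 4),
  telework_now   = c("Had telework in the past week", "",
                     "Did not telework in the past week", "", "", ""),
  telework_covid = c("", "", "", "Did telework due to COVID", "", "Did not telework")
)

# Unique people who teleworked according to either column.
teleworkers <- cps %>%
  filter(telework_now == "Had telework in the past week" |
           telework_covid == "Did telework due to COVID") %>%
  distinct(person_id) %>%
  mutate(ever_telework = 1)

# Join the person-level flag back to every row; unmatched people get 0.
cps <- cps %>%
  left_join(teleworkers, by = "person_id") %>%
  mutate(ever_telework = ifelse(is.na(ever_telework), 0, ever_telework))
```

Relying on only one of the two columns would miss either person 1 or person 3 in this toy data, which is the misstep the walkthrough warns about.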
William:
Oh yeah, I was just wondering. So I was expecting you'd pull out some advanced econometric methods. So for these data tasks you just do basic plotting, no econometrics,
Rupsha:
Most of them will not have econometrics. There might be some regressions, but a lot of this data task is diagnostic. Essentially, a lot of your work as a pre-doc is just going to be diagnostics. What is in the data? What is your understanding of the data? Half of the time, I mean maybe more than half of the time, you're going to be doing this kind of work; you're going to try to figure out what the data is telling you. And a weighted mean is significantly econometrics-heavy. Just because there's a function that says weighted mean doesn't mean it's an easy process to figure out what a mean is. And maybe in your work, this might be the plot, but you might have to write a three-page paper on what the weighted mean essentially is and whether it's encapsulating the US population well enough. So it might seem easy, but this is very important work. You might have to do a lot of regressions, you will have to do a lot of regressions, but for this it's mostly going to be diagnostics, maybe some regressions. So prepare for mostly diagnostics and good plots; make good plots and that should be the end of it. Yeah.
William:
Okay. No worries. Thank you so much.
Rupsha:
Yeah. Okay, so for those who've hung back, I'm just going to do the first question, walk through it, and then call it a day. So question one, we're going to calculate the female labor force participation. Before that I've set my working directory and loaded a couple of packages, and then I've read in the data that was part of the data task packet that was given to you. And then because this focuses on female labor force participation, we're going to filter this raw CPS data to sex equals female. And you can look at the data first and double-check what column is there. It could be gender, it could be sex. Figure out what the column names are, check your data, read your tables. That's very, very important. And then I'm going to filter that to just female and then make a dummy variable, since everything is to do with labor force participation and it's going to be helpful. Make a dummy variable that's equal to one or zero depending on whether LFP is in labor force or not.
So what is LFP? A data dictionary wasn't given to us. So LFP is just, I would assume, labor force participation, but you can double-check what this column includes. And we see that in the console. It's either not in labor force, or in labor force, or just nothing. So I make this dummy variable and it has one or zero. And then we take that variable and we group by to calculate what the labor force participation rate is. We do a group by year because the question is how it evolved over time. And then we summarize to get the LFPR. Now, this is census data. One important column is the weight. A lot of people would likely ignore it, but if you want to be correct on your answers, you need it. If you don't include the weight, likely all of your answers are going to be wrong.
That's just a given, because you will be calculating the LFPR for just the people in this sample, which is not what we want. We want it for the United States. The census data is indicative of the entire United States. They don't survey everyone in the United States, they just survey some people to figure out these rates, and they have these weights that make them representative of the whole country. So we do a weighted mean, and we include that dummy variable, attach the weights, and then set na.rm to remove the NAs, and then multiply by a hundred because it's a percentage, and then we just plot it. So let's just run this. And then we see female labor force participation, and we see that it rose up to 2000, fell a little bit, ticked up a little bit more, and then with the Great Recession came down till 2013 and then sort of stayed stable, and then fell again during COVID and then rose again. So you want to include in your answer whatever I just said, trying to figure out why what is happening in the graph is happening. So the second question is, among women older than 25, which groups of people had the biggest changes in labor force participation since 1994? So we take age not equal to the less-than-25 group. How did I figure that out? Just take this, do age, and then do a unique, and then we see one that says less than 25. It's all in groups, so you just drop the one that's less than 25.
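To make the point about weights concrete, here is a small sketch of a weighted labor force participation rate by year next to the unweighted one. It is written in Python rather than the R used in the session, and the respondents and weights are invented purely for illustration.

```python
from collections import defaultdict

# Hypothetical respondents: (year, in-labor-force dummy, survey weight).
sample = [
    (1994, 1, 1000.0),
    (1994, 0, 3000.0),
    (1994, None, 500.0),  # missing status, skipped (the role na.rm plays)
    (2024, 1, 2000.0),
    (2024, 1, 1000.0),
    (2024, 0, 1000.0),
]


def lfpr_by_year(rows, use_weights=True):
    """Return {year: participation rate in percent}, dropping missing values."""
    num = defaultdict(float)
    den = defaultdict(float)
    for year, dummy, weight in rows:
        if dummy is None:
            continue
        w = weight if use_weights else 1.0
        num[year] += w * dummy
        den[year] += w
    return {year: 100 * num[year] / den[year] for year in sorted(den)}


weighted = lfpr_by_year(sample)
unweighted = lfpr_by_year(sample, use_weights=False)
print(weighted)
print(unweighted)
```

In this toy sample the unweighted rate for 1994 is 50 percent while the weighted rate is 25 percent, because the non-participant stands in for three times as many people. That gap is why ignoring the weight column gives answers about the sample rather than about the country.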
And then we just get the 25-plus people. And then I made a function. Maybe I covered this before I started recording, but it takes the female LFP data, groups by year and then by the group that you're interested in, so let's say race. And then this wrapper essentially takes the character vector that I input in the function and converts it into a column. So it just does a group by and then a summarize. So it's the same thing except it just calculates it by groups, and then it removes some unnecessary things and then it just plots it. So you give it the 25-plus data and then this function, and then when you just input a group, it'll plot by race, by education, by age, by income quantile, et cetera.
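The wrapper idea, one function that takes the grouping column's name as a string and repeats the same weighted calculation per group, might look like this in outline. This is a Python sketch, not the R function from the session; the field names (`in_lf`, `weight`, `race`, `educ`) and the rows are hypothetical, and the plotting step is left out.

```python
from collections import defaultdict


def lfpr_by_group(rows, group):
    """Weighted LFPR in percent for each (year, group value) pair.

    `group` is the name of the column to split on, passed as a string.
    """
    num = defaultdict(float)
    den = defaultdict(float)
    for row in rows:
        key = (row["year"], row[group])
        num[key] += row["weight"] * row["in_lf"]
        den[key] += row["weight"]
    return {key: 100 * num[key] / den[key] for key in den}


# Hypothetical rows for women aged 25 and over.
women_25_plus = [
    {"year": 1994, "race": "A", "educ": "HS", "in_lf": 1, "weight": 2.0},
    {"year": 1994, "race": "A", "educ": "BA", "in_lf": 0, "weight": 2.0},
    {"year": 1994, "race": "B", "educ": "HS", "in_lf": 1, "weight": 1.0},
    {"year": 1994, "race": "B", "educ": "BA", "in_lf": 1, "weight": 3.0},
]

by_race = lfpr_by_group(women_25_plus, "race")
by_educ = lfpr_by_group(women_25_plus, "educ")
print(by_race)
print(by_educ)
```

Switching from race to education only changes the string argument, which is what makes the function reusable across the demographic groups the question asks about.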
So yeah, I think I should be done, but I should have included all of the other questions in the recording. But yeah, that's about it for the data task walkthrough. Thank you for the people who hung back. That's very nice. And I think Stephen's hosting a career fair and that's very... you'll meet a lot of people from different institutions like Yale, et cetera. I think I just know the Yale people are coming and I think that's a very, very good place to sort of get situated and talk to people who are doing, who are hiring at pre-doc jobs. So feel free to head there. I'll hang back for two minutes, but if not, I'm just going to start recording.
How to Solve a Data Task in STATA
Eva:
Hi everyone, my name is Eva. I'm an RA at the Federal Reserve, and I'm going to be doing a demonstration of the 2024 practice data task in STATA. So I have STATA open. Here's my console. We can also see my data editor as well as my do-file. So I'll just start off by briefly walking through what I have here in my setup. So I like to always have a comment block at the top that just talks about the context of the code or the task. What was it for? What was the main output, the main goal of it? What's in the data file? For example, here I say that this code includes my analysis and visualizations. I also talk a little bit about the structure of the code and about anything that I read in, any DTAs or CSVs, as well as anything that's output.
So for here I say that anything that gets output would be just the visualizations, like the graphs or tables needed to answer each of the questions. And then I also talk about the technical information. So I like to think of this part as anything that, if I came back to this code six months later, would help me remember why I made certain decisions. So this includes any assumptions or any kind of technical information as well as any links. I put this link here. So if I were to come back to this later, or if someone else were to come back to my code, they would probably know to go to this link, and that would give some kind of explanation for why I apply inverse probability weights.
And then I will go over these directories. So STATA has two different types of macros. There's globals and locals. I mostly work with globals just because it's easier. Usually I find that with locals it's difficult sometimes to run one section of the code. Usually I would have to run it from the top or run it from a place, a good stopping place. So I just find it easier to work with globals. So I'm going to create a global, this name is pretty arbitrary, it's straightforward, it's just called desktop. So this is kind of just my desktop setup for my Mac. It would look different for a pc, but usually for a Mac it looks something like users and then your name and then whether you're on desktop downloads or documents. Then in the comments I put here just what is this? What exactly is this file path?
So I'm saying this is the desktop directory on my Mac. And then I tell for people if someone else is working with my code on a different computer, or even I think even if you have the same file system, oftentimes you do need to change this. So then I'll say change to your home directory. So this comment, it's mostly for the person who is going to be running my code if I'm working with them. And then next I have the global main path here. So I call it main path just because it's the path that I'm working with most commonly. So you can see how instead of having it like this, I have this desktop global. So without it it would look something like this.
So you can see that this is a much longer file path than this. So part of the main reason for creating this desktop global is that I think it's just cleaner and easier to edit. So that's a little bit about how these globals kind of stack on top of each other. But for the main path, usually for specific projects or even a data task, I like having a folder. Usually I'll have sub folders called data, doc, and program. So I'm just saying here that this is the directory where my files for this data task are stored. On my desktop I have a folder called Pre-doc and then a sub folder called 2024. And then within that another sub folder called data task. And then I'll have these three sub directories. So again, in my comments, I'm letting the person who's going to be running my code know how to change it.
So I'm saying, for example, they might have a different spelling on their desktop, maybe it could be called pre-doc one, or something like that. I'm just letting them know. So basically, change this file path to wherever their files for this data task are stored. And then again, this is just another brief explanation, but I think the most important thing to know with the directories is that it's a little bit neater to avoid writing out the entire directory like this each time. And also, if someone wants to run my code on their computer and I have a bunch of file paths, instead of needing to change the file path each time, they could just change the globals, and hopefully you'll see what I mean as we go through the questions. But the last one is, I know that I'm going to be exporting a bunch of graphs, so I want a subdirectory for the graphs. And again, to avoid having to write out the directory each time, I just have a global called graphs. And when we get to question one, we'll see how this works.
So the next thing is any packages that need to be installed. Usually these are the standard packages that I use in almost every do-file that I write. So I know off the bat that I'm going to need them. So I wrote these ahead of time, but also sometimes I don't know that I need a package until I'm halfway through the code. So when that happens, usually I'll install it just where it is. But then as I go back and kind of clean up the code, I'll make sure to add it to this top part here.
And the last thing is some globals for graph settings. I'll actually come back to this, but for now I know I want to set graphics on so that I can demonstrate, and then I'll come back to this part when we get to the graphs. Okay, so here I have a section for part one, question one. So I'm going to start off by just reading in the data. So we see here this main path global that came from here. So what I'm basically saying is to pull from this main path, this directory. Again, on my desktop I have a folder called Pre-doc, 2024, data task. So from this directory, go into the sub folder called data, and that's where my dataset is located. So I'm basically reading in the data, and again, as you can see, it's easier and cleaner to have this global than to write out this entire directory. And that way, again, going back to this comment here, if someone else were to run my code and let's say I have this command several times, or I reference this directory several times, they don't need to change the code each time. As long as they change main path here and then they run this and then they run the code, this line will automatically adjust.
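The stacking of path globals described above, one base path that every other path is built from, can be illustrated with a short sketch. It is in Python rather than Stata, and the user name, folder spellings, graphs location, and file name are all hypothetical; only the layering idea comes from the session.

```python
from pathlib import PurePosixPath

# Hypothetical base location: change this one line to your own home directory.
DESKTOP = PurePosixPath("/Users/yourname/Desktop")

# Everything else is built on top of the base, so nothing below needs editing.
MAIN_PATH = DESKTOP / "predoc" / "2024" / "data_task"
DATA_DIR = MAIN_PATH / "data"
GRAPHS_DIR = MAIN_PATH / "graphs"  # assumed location for exported figures

dataset = DATA_DIR / "cps.dta"  # hypothetical file name
figure = GRAPHS_DIR / "q1_female_lfp.png"  # hypothetical file name

print(dataset)
print(figure)
```

A collaborator on a different machine edits `DESKTOP` once and every later reference follows, which is the same benefit the speaker describes for changing the main path global and rerunning.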
Okay, so let's start question one. So the first question asks, how has female labor force participation or female LFP evolved since 1994? Please provide graphs or tables to support your answer. So the first thing I like to do is just kind of make a brief mental plan to myself just thinking about how I'm going to answer this question, what graphs do I need, what kind of analysis, what kind of variables to work with? So I'm thinking for this question, I'm going to want a graph that has something to do with the year, something to do with the year variable and the LFP variable. So that's why I'm just going to order and sort it. So if I want to go into the data editor to take a look at it, it would just be easier. And I'm also going to run this tabulation here to get a sense of the structure of the LFP variable.
So if I run this command, we see that if LFP is not missing, it's either zero or one, which means that it's a binary variable. It's also called the dummy variable. And what that basically means is that the variable only has two values, and usually it's something like zero means no one means yes, or it could be the other way around or it could mean something like zero is no, it doesn't apply to them. One is yes, it does apply to them, meaning the observation. Another way it's often used is for something like sex, it could be like zero is female, one is male, which is actually what we do have here. I was just verifying that LFP is a binary variable. So let's talk a little bit about why that's important. So for this question, I'm thinking I want a graph where the Y axis has the labor force participation rate.
So essentially, in order to get that, I need to know what percent of women were in the labor force. So for what percent of women does one apply, or, I don't know if it's grammatically or technically correct to say, what percent of women answered one. So I'm just thinking, to get the LFP rate for women, I need to know what percent were in the labor force. So for what percent is LFP one in each year? So the way that it works in STATA when you have a binary variable is, to understand the percent of people that answered one, or the percent of observations that have one, you take the mean of the dummy variable. So we want to take the mean of LFP. So one way you could do this is to just use this tabstat command. It's going to take a while to run, so we can talk about the weights in the meantime. So in my comment block at the top, I had put a link that takes you to some of the documentation about the CPS weights. So the CPS, oops, actually, let's just go back to it.
Yeah, so here I put this link. So CPS uses sampling weights. So from that I know that I would need to use either analytic weights or inverse probability weights in Stata. If you're not familiar with weights, I would definitely recommend reading the Stata documentation on it. There's also some documentation, I think by Reed College, that's really helpful, but basically if the dataset has sample weights, we need to use either inverse probability weights or analytic weights. So the syntax of the weighting in Stata is always going to be the same. Whenever you have a command that allows you to use weights, it's going to be brackets, and then you would specify what type of weights. So aweight means analytic weights. That's the type of weight I'm using. Then equals the weight variable, which is this variable here.
So for tabulations where I'm just getting the mean, kind of descriptive statistics or just a descriptive summary, aweights and pweights will give you the same thing. So pweight means inverse probability weights. So I could also do pweights and I would get the exact same thing. But the main difference is, if I were to do regressions, pweights would give you more accurate standard errors. But since I'm not doing a regression, I'm just kind of looking at means and medians right now, I could use either one, and I'm just used to using aweights. So that's the only reason why I'm doing it here instead of pweights. Okay, so let's go back to this tabstat. So basically what this is telling me is the labor force participation rate for women in each year. So this 0.59 is telling me that in 1994, 59% of women were in the labor force. So this is helpful to know. From this we could get a sense of the trend, but it is definitely easier to visualize in a graph. And even though we have this, this is not going to allow me to make a graph. So basically what I want to do is make my dataset look like this. So I can do that with the collapse command.
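Two of the claims above can be checked with a few lines of arithmetic: the mean of a 0/1 variable is the share of ones, and rescaling the weights leaves a weighted mean unchanged. The sketch below is Python, not Stata, with invented values. Stata's documentation describes analytic weights as being rescaled to sum to the number of observations; that detail is treated as an assumption here, and the sketch says nothing about standard errors, which is where the weight types differ.

```python
import math


def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical 0/1 labor-force dummy and sampling weights.
lfp = [1, 0, 1, 1, 0]
wt = [1500.0, 2500.0, 1000.0, 3000.0, 2000.0]

# Unweighted mean of a dummy = share of observations equal to one.
unweighted_share = sum(lfp) / len(lfp)

# Weighted mean with the raw sampling weights.
raw = weighted_mean(lfp, wt)

# Same weights rescaled to sum to the number of observations.
scale = len(wt) / sum(wt)
rescaled = weighted_mean(lfp, [w * scale for w in wt])

print(unweighted_share, raw, rescaled)
```

The raw and rescaled results agree up to floating-point rounding because the scale factor cancels between numerator and denominator, so for a plain mean the choice only matters once standard errors come into play.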
So before I run it, let's make sure we know what the dataset looks like. We can see for specific observation, it will tell us what year they were in the CPS and whether they were in the labor force. What I want the dataset to look like in order for it to look like this, I need to get it down to the year level rather than the individual level. So let's run the collapse command. Again, I'm adding weights and I'm specifying the mean because I want the mean of LFP.
Eva:
Actually, to make it easier, let me just drop male.
Eva:
Okay, so we'll see that basically our dataset now looks exactly like what we saw when we ran the tabstat command. So now this is just telling me the mean of the LFP variable with the correct weights for each year. So this is telling me that in 1994, 59% of women were in the labor force. So from here, I could easily produce a graph. One thing to note is that when I'm graphing, I like my Y axis to be a percentage, so I actually want to manipulate the data even further to change this into a percentage. Let me just run this command here.
What I'm basically doing is I'm going to multiply LFP by a hundred to express it as a percentage, and then I'm going to round it to two decimal places. So 0.01 means to the hundredths place, which is two decimal places. So we'll see that it's now expressed as a percentage. For some reason in the data editor, it's never actually shown as rounded, but if I were to run a tabulation or produce a graph, the rounding would show. So if I just did something like tab LFP if year equals 1994, we'll see that it is rounded the way that I want. It just for some reason doesn't look like that in the dataset. But okay, so now that we have this, we can go ahead and make the graph. Okay, so this is the graph that I get. I think the most important thing to note is that the question is specifically asking us about changes over time. The question asked how has LFP evolved or changed over time?
That tells me that we should do a line graph. And in STATA, if I wanted to have more than one line per graph, which we'll get to in question two, the graph command is a little bit different. But for something where I just want one line, I'm just going to use the twoway line code. And if you're not familiar with the syntax, don't worry too much about it. You don't have to have it memorized. I definitely don't. Whenever I make graphs, either I have to look up the documentation each time for syntax like this, or I will go into an old do-file where I have a similar graph and then copy and paste it to modify the code. So I think the main thing is really just knowing for this type of analysis that you want a line graph, and then knowing that you want specifically a line graph with only one line.
So then going from there, knowing that it would have to be this twoway line, and then going into the documentation to try to figure out all the details. So yeah, I think as long as you're able to do that, you should be fine. But let's just go over a little bit about what these globals are. So let's go back up to what I had here. We skipped over it in setup, but I have this global called pastel one, and it's just saying pink times 0.4. If I were to write it out like this, there's nothing wrong with that. The global is not even much shorter than having it like this. What this is basically saying is that I want my line to be pink and I want an intensity of 0.4, but the reason why I have it in a global, again, is the same reason why I had globals for directories.
If I wanted several graphs where the line would be pink, but let's say I'm doing it for a conference and then I find out that for the conference slides you can only use black, or for a journal submission it can only be blue or something, instead of going to every single graph through my code and changing it, I would just need to change the global to something like this, blue, and then rerun the code, and it would automatically change all of the graphs. So again, it's the same logic as having globals for directories, but I just wanted to point out that globals don't have to be only for file paths. They can also be for graph details. Same thing with title size. I have a global here and I'm really just replacing it that way. If I want to change my title sizes, then I can again just change the global. The other thing to note is, usually with graph commands, when I'm specifying several things, I just find it really hard to read when there's so much on one line in this command box. So I'm just going to split it up with these green slash marks.
Eva:
Actually, I think we can do X title, Y title.
Eva:
So this just makes it a little bit easier to follow. I can see how we have one line for the title, one line for the axis titles, and then one line for the intervals and the labels, the X labels. So I think it's just a little bit neater and easier to read, and it will still run with the green slashes. The closest thing I can think of is, if you've ever used SAS, it's kind of like the semicolons, where it's basically telling STATA that even though this part is on a different line now, it's still within the same command block. So it's just a little bit easier to read.
I think that's it for the code for the graph. We can talk a little bit about graphing practices. So one thing to note, I think, for the axes, for the min and max, I try to make them reasonable and as close as I can to the lowest and highest values on the graph. Here we see the lowest point is a little bit below 57.5%, and the highest point is about 61.25%. So I'm thinking I also want consistent intervals. I want 0.5, so I don't want to go up here and stop it at 61.25. So that's how I came to choose 57 and 61.5. I think sometimes it's just trial and error. Sometimes I'll choose an interval and it ends up being way too low or way too high, so I kind of adjust it after I see the graph.
But I think the main thing is just having consistent intervals and then having them be reasonable, just getting them as close as you can to the minimum and maximum value, but while still being able to have consistent intervals. The other thing to note is because I'm doing intervals of 0.5, I don't want to have something like 57 with no decimals, just 57 and then have 57.5 and then 58. I like having the same decimal places on all of the values, so that's why I went and added it here. And then the last thing is the axis titles. I think for the Y axis or even the x axis kind of depends on the graph you're making, but try to specify the unit. So for the X axis it's a little bit more simple, it's just the unit is a year, whereas the Y axis, I'm specifying that these are percents, not like dollars or number of people or whatever. But no, it's percent. And then for the title of the graph, I like to just try to make it clean and simple, kind of just as short as possible, but still clear enough so that you can easily tell what the graph is. For this one, it's pretty straightforward. It's just female LFP 1994 to 2024. But just keeping that in mind, just making sure that the titles are clear, but also not too long.
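The axis rule described here (snap the minimum down and the maximum up to the nearest multiple of the interval, then print every label with the same number of decimals) can be written as a small helper. This is a Python sketch, not Stata graph syntax; the function name is hypothetical, and the inputs use the approximate low and high values mentioned in the session.

```python
import math


def axis_labels(lowest, highest, step):
    """Evenly spaced axis labels that cover [lowest, highest].

    The ends are snapped outward to multiples of `step`, and every label
    is formatted with one decimal so 57 appears as 57.0 next to 57.5.
    """
    start = math.floor(lowest / step) * step
    end = math.ceil(highest / step) * step
    count = round((end - start) / step)
    return [f"{start + i * step:.1f}" for i in range(count + 1)]


# Approximate extremes of the plotted series, in percent.
labels = axis_labels(57.4, 61.25, 0.5)
print(labels)
```

With these inputs the labels run from 57.0 to 61.5 in steps of 0.5, matching the range the speaker settled on by trial and error.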
Okay, I think that's it for the graph. I think the last thing is, let's export it. So we'll go back to how, at the top, I had this global for graphs. So I'm basically telling STATA, export this graph here with this name into this folder that I specified. I lied. That was the second-to-last thing. The last thing is actually the preserve and restore command. So right now we see that our dataset looks like this. Somehow male came back. Sorry. Okay. Yeah. So this is what our dataset looks like for question one. So this dataset, it's not even a dataset, but the data in this kind of form would not work for question two. So I would need to read in the initial DTA file again, but I don't want to do that for all 10 questions. So that's why I have this preserve and collapse command. So essentially what it's doing is, actually, let's just run it to see what it does. Let's read in the initial data again. So yeah, we see here this is what the initial DTA file looks like. So when I do the preserve and collapse, let's run it.
Eva:
I think I need to re-share.
Eva:
Okay, yeah. So we see with the preserve and collapse, the graph still came, but when we go to our dataset, it's still the initial DTA file. So basically because what the command is saying is it's saying preserve what we have here, preserve this initial dataset, do the collapse and the rounding in the background, do the graph and then restore it back to this initial dataset. So that way our data is back in shape for question two, and I can just add preserve and restore each time. I'm not going to need to read in this dataset like 20 times. So let's go over some ways that we can interpret this graph. So there's not really a right or wrong answer. I think the only wrong answer would be if you said something like labor force participation stayed at one point for all 30 years, which clearly it's not true.
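The preserve-then-restore pattern can be mimicked outside Stata to show what it buys you. The sketch below is Python, with a hypothetical `Workspace` class standing in for the dataset in memory and invented rows; it illustrates the idea of snapshot, collapse, and roll back, and is not a description of how Stata implements the commands.

```python
import copy
from contextlib import contextmanager


class Workspace:
    """Holds the dataset currently in memory (a list of row dicts)."""

    def __init__(self, data):
        self.data = data


@contextmanager
def preserved(ws):
    """Snapshot the data on entry and put it back on exit."""
    snapshot = copy.deepcopy(ws.data)
    try:
        yield ws
    finally:
        ws.data = snapshot


def collapse_mean_by_year(rows):
    """Reduce person-level rows to one weighted-mean row per year."""
    totals = {}
    for row in rows:
        num, den = totals.get(row["year"], (0.0, 0.0))
        totals[row["year"]] = (num + row["wt"] * row["lfp"], den + row["wt"])
    return [{"year": y, "lfp": n / d} for y, (n, d) in sorted(totals.items())]


# Hypothetical person-level data.
ws = Workspace([
    {"year": 1994, "lfp": 1, "wt": 1.0},
    {"year": 1994, "lfp": 0, "wt": 1.0},
    {"year": 2024, "lfp": 1, "wt": 2.0},
])

with preserved(ws):
    ws.data = collapse_mean_by_year(ws.data)  # destructive, like collapse
    collapsed = ws.data  # a graph would be drawn from this

print(collapsed)
print(len(ws.data))  # person-level rows are back
```

Inside the block the data is at the year level, and once the block ends the three person-level rows are available again for the next question without rereading the file.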
We see all of these ups and downs, but other than that, I think you could point out patterns that you find interesting. So one thing you could do is you could probably start off by looking at the whole thing and saying in 1994 and 2024, it wasn't that different. In other words, our endpoint, our end year was pretty similar to the initial year, but in between quite a few things happened. It definitely wasn't consistent. So we can point out kind of a lowest point in 2021 and we can try to contextualize that. We see in 2021, female LFP was at 57.5 and just if someone were to tell me that, I would be like, so is that high? Is that low? I don't know. But one thing that we could do to kind of add more context to it is compare it to another time period. So in 2021 it makes sense because of Covid that it was the lowest across these 30 years, but we can also compare it to the great recession in 2007 and we can say something like 57.5 is pretty low because it's even lower than 1994 and it's much lower than what it was during the great recession.
Another thing that we could do is talk about inflection points. So you can say something like LFP had increased from 1994 up until year 2000, and then started a decrease again. And then talk about why that might've happened, what happened around the year 2000 that led to this decrease. Again, we could also talk about rates of changes. You could say in 1996, LFP began to rise much more sharply until this point 1998 or 1997, and then talk about what went on in those years that might've led to this drastic increase. Same thing here from 2020 to 2021, there's a pretty sharp decrease and then another sharp increase afterwards. So those are just some things that you can point out, but I think the most important thing is to show that you can think about different patterns and contextualize it with economic context or somehow tie it back to econ theory.
I know that was a lot of information, so let's go over some of the main points for question one. So we talked about how in STATA, if you have a binary variable and you want to know the percentage of observations with a certain value, then you would need to take the mean of the binary variable. So you can use a Stata command that gives you the mean, for example tabstat or collapse mean. For this question in particular, we needed to know the percentage of women that were in the labor force. So it's the percentage of observations who are women who have one for the LFP variable. So that gives us the labor force participation rate, and we did that by collapsing with mean. We also talked about how, if you're doing something just for descriptive statistics, with aweights and pweights you would get the same thing, but if you were to do a regression or something more complex, then you would need pweights for more accurate standard errors.
But in this case, because it's sample weights and we're just doing means and medians, we can just use aweights. We also saw how we collapse the data from the individual level down to the year level. So by collapsing it, we saw for each year what the LFP for women was, which again is the mean of the LFP binary variable for each year for women. We also saw how the preserve and restore command allows us to avoid needing to read in the data each time. So that's it for the graph details. We also talked a little bit about graph practices. So stuff like having consistent intervals, being mindful about the min and max that you set for the axis labels, and also having clear titles and specifying the units. And then we also talked about practices for the code for the graphs. So we talked about how, when a command spans multiple lines, splitting that chunk of code just makes it a little bit easier to read.
We also talked about how you could use globals for the graph options. What I mean by that is just anything that comes after the comma in the graph command. So yeah, just pointing out that the globals don't have to be only for directories. You can also use them for graph colors or title sizes. And then we also talked about how, because this question is asking us about changes over time and time trends, we want to use a line graph, and in this case we want just one line on the graph. So that's an important distinction.
And then we talked about just some ways that you can interpret the graph. Again, there's no right answer, but you should definitely explain why you think these patterns happen. So kind of thinking about the economic or the historic context during those years. And some things that you could discuss are any peaks or lowest points or inflection points, also any rates of change, any patterns that you think are interesting. So yeah, thanks for bearing with me through question one. I know that was a lot of information. A lot of these main points also apply to the other questions, so I'm not going to be as detailed, and the other questions will definitely go quicker.
Okay, so question two asks, among women older than 25, which groups of people had the biggest changes in LFP since 1994? So when I see groups, I'm thinking kind of demographic groups. So let's just take a look at the variables we have here. We see race, education, college age, income, whether they're employed. So all of these are potential demographic groups. But I think for this question, the key groups would be race, education, and probably age, maybe income, but definitely I think at least race, education, age. So that's how I started off this question. So let's just do a few of these graphs. I think for race, I'm not going to run these lines because it takes a while, but I essentially was kind of just getting a sense of the sample size in each year for different racial groups, and I decided to actually remove the group two or more. And also Native American due to the small sample sizes. I want one line for each of those four races on the same line graph. So as I mentioned in question one, the graph command is a little bit different when you have more than one line per graph. So here I'm just using the graph two way, I had already collapsed it and I also restricted it to age greater than or equals to two because that means age over 25. So let's graph this and see what it gives us.
Okay, so this is what the graph looks like. So just from looking at this graph alone, I'm not too sure how I want to go about answering the question yet. So I'm going to move on. So let's do this for age group.
Okay. So I think here, this is kind of on its own pretty interesting. We see that the trend is, it's more similar. I think for ages 25 through 54, we can see they're all close together and generally we see this increase then decrease, and then it's kind of unsteady. Whereas for the age group 55 through 64, it's not super steady, but it's almost like overall there is an increase at least until 2010, and then it kind of tapers off and increases more slowly. But there's also in 1994, it starts at 48% and then it goes up to 60%. Whereas for all the other years, the difference between 2024 and 1994 is not as large. So I think this is something I think I would say probably just based on this alone, the age group 55 through 64 has the greatest change, but I would actually want to verify this by looking at percentage changes between 2024 and 1994.
But for now, let's just move on to another demographic group, which would be education. And again, it's the same kind of thing with the line graph. Just having these graphs where I have multiple lines on the same axis, that just makes it easier to compare. I think if I were to produce, for example, for something like age group, if I had four different line graphs, you might still be able to see the trend, but it would not be as easy to compare where it's clear distinction, we have this gap here on the same access. So keeping that in mind, let's go on to doing the education class.
Eva:
So I think this is also pretty interesting. For any group that has above a high school diploma, we kind of see this downward trend. But for groups with the high school diploma, it's almost stagnant. There's a slight increase, but I think I'd have to look at the percentage changes. So from this graph alone, I think the key takeaway would be that women over 25 who have at least a high school diploma had greater changes in labor force participation. These are all three of the graphs that I've created so far for this question, and I'm going to look at them and try to get a sense of how I might interpret them. The question asks which groups of people had the biggest changes, and that's actually pretty ambiguous, because it doesn't specify what it means by "biggest change" or "change," and it also doesn't specify any time periods.
So "biggest change" could mean one group had the most fluctuations between 1994 and 2024, or it could mean that this point in 1994 and this point in 2024 show the most drastic change. So I'm going to start from there. I'm going to define the biggest change as the largest change between 2024 and 1994. Because the starting points in 1994 are different for each of the racial groups, as well as for the different age groups and education groups, I need percentage changes. So I'm going to start by looking at some percentage changes. Let's actually run this code. Let's collapse it for education, and then I'll do the rounding.
Okay. So basically for each year, for each of the six education groups, I have the female labor force participation for that year. To get the percentage change for, let's say, the group with less than a high school diploma between 2024 and 1994, I would take the new value, the LFP in 2024, minus the initial value, the LFP in 1994, divided by the initial. And then I don't need to multiply it by, actually, I think I do still need to multiply it by a hundred. But yeah, the percentage change formula is new minus initial, divided by initial. So let's go here and start from there. Oops. So it is in a loop, but let's ignore that for now. Let's run just this part.
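The two readings of "biggest change" can give different answers. The Python sketch below is only an illustration: the two series are invented, and summing absolute year-to-year moves is just one possible way to measure "fluctuation," not something defined in the exercise.

```python
def endpoint_pct_change(series):
    """Percentage change between the first and last value of a series."""
    first, last = series[0], series[-1]
    return (last - first) * 100 / first


def total_fluctuation(series):
    """Sum of absolute period-to-period moves, in percentage points."""
    return sum(abs(b - a) for a, b in zip(series, series[1:]))


# Invented LFP rates (percent) for two hypothetical groups.
steady_riser = [50.0, 52.0, 54.0, 56.0]
zigzag = [50.0, 55.0, 49.0, 51.0]

for name, series in [("steady_riser", steady_riser), ("zigzag", zigzag)]:
    print(name, endpoint_pct_change(series), total_fluctuation(series))
```

In this made-up data, the steady group has the larger start-to-end change while the zigzag group has the larger fluctuation, so the ranking depends on which definition is chosen.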
Eva:
I do need to sort it first. Okay, and then let's just run this part.
Eva:
Okay, so here I'm saying for each year in 1994 and 2024, create this new variable called LFP year. So we have this here, LFP 1994 and LFP 2024, and then set it equal to this variable if year equals this year. So it is in a loop, but let's just look at 1994 for now and ignore 2024. For each education group, create this variable called LFP 1994, which we have here, and set it equal to the LFP variable if year equals 1994. So if year equals 1994, this variable is going to equal this value, and then it does the same for 2024. That's kind of how it works. The reason why it's missing here is because I had this condition, if year equals 1994. We have this for all six of the education groups. And here for 2024, you'll see that we only have the value when the year equals 2024. So the next line is going to fix that.
Eva:
So let's run this next line. Okay.
Eva:
So what I did is I wanted this variable to be the same for each year within one education group. So for the group that has less than a high school diploma, for each year from 1994 to 2024, it's the same value, and then the same for 2024. With this ereplace command, I basically took the mean of what it was before, because before I ran that line we just had this 28.5, and then it was all missing for the rest of the years for this education group. So the mean of this education group, less than a high school diploma, is 28.75. Basically I just replaced the variable with its mean. And the same here for the group that has a high school diploma. Initially we just had 56.91 for 1994 and then all missing, but same thing, I replaced it with the mean, and then I did the same for 2024. The reason why I did that is because I now want a variable here called percentage change that's going to be this variable minus this one, divided by the initial. So we do that, and hopefully you can see what I mean. So now I'm going to generate this variable called percentage change.
So now I have, for this high school diploma group, the percentage change in LFP between 2024 and 1994. We have this here, and you'll see that it differs once it's a new education group. When I was going through the steps, I was thinking, I want a variable that looks like this. I want this percentage change variable. So first I had to create LFP 1994 and then LFP 2024. But once I did that, I had a bunch of missing values. So that's why I did the ereplace and eventually ended up getting to the percentage change. So let's see what's next.
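The three steps described here (create the year-specific variables, fill in the missing rows with the group mean, then compute the percentage change) can be sketched in Python as follows. This is only an analogue of the logic, not the Stata code run in the session, and every number and field name in it is invented for illustration.

```python
# Hypothetical collapsed data: one row per (education group, year).
# All LFP values here are invented.
rows = [
    {"educ": "Less than HS", "year": 1994, "lfp": 30.0},
    {"educ": "Less than HS", "year": 2009, "lfp": 32.0},
    {"educ": "Less than HS", "year": 2024, "lfp": 33.0},
    {"educ": "HS diploma", "year": 1994, "lfp": 56.0},
    {"educ": "HS diploma", "year": 2009, "lfp": 55.0},
    {"educ": "HS diploma", "year": 2024, "lfp": 49.0},
]

# Step 1: lfp_1994 and lfp_2024 are filled only on the matching year's row
# and are missing (None) on every other row.
for row in rows:
    for year in (1994, 2024):
        row[f"lfp_{year}"] = row["lfp"] if row["year"] == year else None

# Step 2: replace each variable with its within-group mean. Each group has
# exactly one non-missing value, so the mean simply copies that value to
# every row of the group.
for year in (1994, 2024):
    key = f"lfp_{year}"
    by_group = {}
    for row in rows:
        if row[key] is not None:
            by_group.setdefault(row["educ"], []).append(row[key])
    for row in rows:
        values = by_group[row["educ"]]
        row[key] = sum(values) / len(values)

# Step 3: percentage change = (new - initial) / initial * 100.
for row in rows:
    row["pct_change"] = (row["lfp_2024"] - row["lfp_1994"]) / row["lfp_1994"] * 100

for row in rows:
    print(row["educ"], row["year"], round(row["pct_change"], 2))
```

Every row within a group ends up with the same percentage change, which is what makes the later "which group has the max" lookup possible.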
Eva:
Replacing it. Okay, so
Eva:
From here, there aren't that many observations, so I could easily eyeball it and see what the max is. But just to make things easier, and this is a practice that I follow because sometimes I want the code to apply to a larger data set, instead of eyeballing it I try to write code that can be automated to fit all different types of data. So let's run this. Now I'm creating this variable that just tells me the max percentage change, and I see that it's actually 17.17. So now I want to figure out which group has this percentage change. We can see here it's some college, no degree, but we can also do it through tabulations.
That's kind of what this does. I'm saying if the percentage change equals the max, tell me which education group it is, which is what we just saw, and that's what comes out here. And then again, I'm just doing a quick summary tabulation here. So now I can see that when we look at just education groups, the group that had the greatest change in LFP was the group that has some college but no degree. So that's one way that I can start to answer the question. And I'm going to do this for the other demographic groups as well. For example, I also did it for race and age, and here we'll see that I have Hispanic and age 55 through 64. So that's one way that I can go about answering the question. But again, another way that you could answer it is to talk about which groups had the most fluctuations, or you could do both.
Okay, so question three says to use the data to examine trends among women older than 25 for three different variables. The first one is wage and salary income, the second is social insurance income, and the third is education attainment. So let's actually start from the second one, social insurance income. The first thing that I think about anytime I'm working with income is whether I should use the mean or the median. In this case, I'm going to go with the median. I usually do go with the median because the mean can be very sensitive to the tail ends of the distribution. With income, we don't have any negatives in this data set, but usually there are. So the lows can be very low, whereas the highs can be very high.
The mean could be skewed either left or right, so I usually use the median to get a better sense of the income for the typical person. So let's just start there. This is how I would do the median. I would do collapse with median, because collapse is not limited only to the mean, which is what we've been seeing so far. So I would do collapse median, also with the weights, on this variable, social insurance income, for women over the age of 25, for each year. Let's go ahead and do that and see what the graph looks like.
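The "compute the max, then ask which group matches it" step can be sketched in Python like this. The 17.17 figure for some college, no degree is the one quoted in the session; the other two values are invented placeholders, and the dictionary layout is only an assumption for illustration.

```python
# Percentage change by education group. Only the 17.17 value comes from
# the session; the other numbers are invented placeholders.
pct_change = {
    "Less than HS": 10.0,
    "HS diploma": -12.5,
    "Some college, no degree": 17.17,
}

# Compute the maximum once, then keep every group that matches it.
# Returning a list means ties are reported rather than silently dropped.
max_change = max(pct_change.values())
top_groups = [group for group, value in pct_change.items() if value == max_change]

print(max_change, top_groups)
```

Note that a plain maximum picks the largest positive change. If a large decline should also count as a "big change," comparing absolute values (`abs(value)`) would be the variant to use.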
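As a rough illustration of why the median is less sensitive to the tails, and of what a weighted median does, here is a small Python sketch. The incomes and weights are invented, and the rule used (the smallest value at which cumulative weight reaches half the total) is one common definition; Stata's handling of ties may differ.

```python
def weighted_median(values, weights):
    """Smallest value at which the cumulative weight reaches half the total."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2
    running = 0.0
    for value, weight in pairs:
        running += weight
        if running >= half:
            return value
    raise ValueError("no observations")


# Invented incomes with one very large value in the upper tail.
incomes = [0, 10_000, 20_000, 30_000, 1_000_000]

mean_income = sum(incomes) / len(incomes)
equal_weight_median = weighted_median(incomes, [1, 1, 1, 1, 1])

# Survey weights: here the first respondent stands in for three people.
survey_median = weighted_median(incomes, [3, 1, 1, 1, 1])

print(mean_income)          # 212000.0
print(equal_weight_median)  # 20000
print(survey_median)        # 10000
```

The single very high income pulls the mean far above what a typical person in this made-up sample earns, while the median barely notices it.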
Eva:
It seems like I need to re-share the screen. Okay, here it is.
Eva:
So the x-axis is not fixed, but basically this value is zero. We see that the median is just zero for all of the years. There are two different ways to resolve this. The first way is a little bit more straightforward: just use the mean instead and say the median is zero. But I think a better answer would be to explain why the median would be zero. That would show that you can think about data and, when something doesn't look right, be able to interpret why. In this case, the reason why the median is zero is that most people do not have social security income. Only a very small part of the sample would, and most people who are not yet at retirement age do not have social security income yet, so they would have zero. So if most people in our dataset have zero, it makes sense that the median is zero. In that case, we actually want to do the conditional median. What that means is we want to look at the median only for people who do have social security income. It would be the exact same code; I would just have this added condition. So let's graph this and see.
Okay, so here we have a more sensible trend that we can work with, and we can see that over the years it's increased. This is already adjusted to 2023 dollars, so we know that in real terms we still see an increase. So that's how we can start looking at trends for social insurance income. But let's also look at education attainment, the third variable.
We also see here that there's a pretty steady increase in the percent of women over 25 who have a college degree. One thing to note here is that I used the college variable rather than the education variable. The education variable is not a binary variable; it has six different values, but those values are not very meaningful. Six doesn't mean there are six of some quantity. It's just the number we gave to the group with the highest education level. So we would not be able to collapse it, because collapsing by mean or median would only be useful if the number actually corresponded to a meaningful quantity. So in this case, I used college because it's a binary variable, and collapsing it at least lets me look at this trend of what percent of women have a college degree. So we see a steady increase, and now let's go back and look at the median for wage income. Again, I did include the mean on here, but I'm only going to be working with the median, essentially.
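Here is a minimal Python sketch of the difference between the overall median and the conditional median. The income values are invented; the only point is that when more than half of the observations are zero, the overall median has to be zero.

```python
from statistics import median

# Invented social insurance incomes: most people receive nothing.
ss_income = [0, 0, 0, 0, 0, 0, 12_000, 15_000, 18_000]

# Overall median: more than half of the values are zero, so it is zero.
overall_median = median(ss_income)

# Conditional median: keep only people who actually receive some income.
recipients = [income for income in ss_income if income > 0]
conditional_median = median(recipients)

print(overall_median)      # 0
print(conditional_median)  # 15000
```

The conditional median answers a different question (the typical amount among recipients), so it is worth stating that restriction whenever the graph is presented.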
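The reasoning about binary versus categorical variables can be illustrated with a short Python sketch. All values are invented, and the six-level coding is only an assumed stand-in for the education variable described above.

```python
# Invented binary indicator: 1 = has a college degree, 0 = does not.
college = [1, 0, 0, 1, 1, 0, 0, 0]

# The mean of a 0/1 variable is the share of ones, so it is interpretable.
share_college = sum(college) / len(college) * 100
print(share_college)  # 37.5

# Invented category codes from 1 to 6, where the numbers are only labels.
educ_code = [1, 6, 3, 3, 2, 5]

# The arithmetic works, but the result is not any actual category and
# does not measure a quantity of education.
mean_code = sum(educ_code) / len(educ_code)
print(round(mean_code, 2))  # 3.33
```

This is why collapsing the binary college variable by its mean gives a usable "percent with a degree" series, while collapsing the six-level code does not.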
Eva:
Okay, so we see here that the median wage income in 1994 is pretty drastically different from 2024, or sorry, 2023, which makes sense. And we see dips after recessionary periods, which all makes sense. During COVID there's also a drastic dip. So these are the three graphs that we have so far for question three. What the question is asking is, based on these graphs and the trends that we see, what do we think is explaining the patterns that we saw in questions one and two? So that's the breakdown by demographic groups in question two and the overall trends in question one. If we see a dip in certain years, or a steady increase, or any fluctuations, we want to look into where we might also see it in these factors. So I think one good approach is just comparing and seeing if there are any similarities in the trends.
So this is labor force participation for women over 25, and we can see it's not super similar, but there are some similarities with the trend for the median wage income. For social security income, we don't really see many similarities. I think we'd have to look much closer. But here we do see that there's a similar trend between education and social security income. So I think one way to start is to think about the similarity between these two trends as well as these two trends. That's just a hint to get you started.