An AI Professor at Harvard: ChatLTV

This last semester, I ran a few generative AI experiments in my Harvard Business School (HBS) class. The largest experiment was the creation of an AI faculty co-pilot to help me teach my MBA entrepreneurship course, Launching Tech Ventures (LTV). This blog post will share some of the findings from that experiment.

Course Background

LTV is a course I created thirteen years ago in partnership with my colleague Professor Tom Eisenmann. This year, there were three sections of the course with over 80 Harvard MBA students each, for a total of roughly 250 students. The course is taught using the case method and has a fair amount of analytical work associated with it. All the cases focus on pre-product-market-fit tech startups and put the students in the shoes of founders who have to make difficult decisions, with imperfect data, that will make or break their startup.

As part of the course material, I have written over 50 HBS cases and teaching notes, two books, and numerous book chapters. There are also dozens of PowerPoint slide decks and Excel spreadsheets that we use in the course. Thus, a large corpus of material tied to the course has built up over those thirteen years. I also created an online version of the course last year, which required me to write out a precise transcript, including a detailed glossary of the various frameworks and acronyms the course covers, and to record a large number of video interviews with case protagonists. Finally, I've been writing blog posts about entrepreneurship for nearly 19 years. The importance of this large corpus will become clear shortly.

A final important piece of background: the course has a dedicated Slack workspace, and we require our students to post reflections in Slack as part of their grade; we also use Slack to share various course materials. This use of Slack had two benefits: first, we had three years' worth of Q&A content from the course Slack; second, Slack was already an established part of the student workflow.

Building a Generative AI Chatbot: ChatLTV

With a small team, we developed a Slack-based chatbot called ChatLTV, which served as a faculty co-pilot throughout the semester. ChatLTV was trained on the entire corpus of my course — including all the case studies, teaching notes, books, blog posts, and historical Slack Q&A mentioned above — as well as selected and curated publicly available material. In total, the corpus contained roughly 200 documents and 15 million words.

We embedded ChatLTV into the course Slack in the form of a Slack app, allowing each of our 250 students to engage with the chatbot either privately or publicly. If the student chose for the engagement to be private, only the student and the faculty could see the interactions. If posted publicly, everyone in either the section or the full course could see the interaction.

Our technical approach was to respond to a student's query by providing an LLM (in our case, OpenAI's GPT-4) with two pieces of information: (a) the question being asked, and (b) relevant context the LLM could use to answer the question. The relevant context was retrieved from the corpus, which was stored in a vector database (in our case, Pinecone). The most relevant content chunks were then served to the LLM via OpenAI's API. This technique is known as Retrieval Augmented Generation (RAG) and is a useful way to improve the reliability and accuracy of the LLM's responses. We used LangChain as a middleware tool to simplify the ChatLTV code base and take advantage of some useful services that sped up development.
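For the technically curious, here is a rough sketch in Python of what the retrieval step looks like. The index name, embedding model, and metadata fields below are illustrative assumptions, not the actual ChatLTV implementation (which used LangChain as middleware and, as described later, Azure OpenAI rather than the OpenAI endpoint directly):

```python
# A minimal sketch of the retrieval step: embed the question, then pull the
# most relevant content chunks from the vector database. Index name, embedding
# model, and metadata fields are illustrative assumptions, not ChatLTV's actual setup.
from openai import OpenAI
from pinecone import Pinecone

openai_client = OpenAI()                      # reads OPENAI_API_KEY from the environment
pc = Pinecone(api_key="YOUR_PINECONE_KEY")
index = pc.Index("chatltv-corpus")            # hypothetical index name

def retrieve_chunks(question: str, top_k: int = 5) -> list[dict]:
    """Return the top_k most relevant content chunks for a student question."""
    embedding = openai_client.embeddings.create(
        model="text-embedding-ada-002",
        input=question,
    ).data[0].embedding
    results = index.query(vector=embedding, top_k=top_k, include_metadata=True)
    # Each chunk is assumed to carry its text and source document title as metadata.
    return [m["metadata"] for m in results["matches"]]
```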

An architecture diagram for ChatLTV is shown below:

[Figure: ChatLTV architecture diagram]

After a great deal of trial and error (see testing below), we settled on the following system prompt, which instructs the LLM to answer using the relevant chunked content as context: “You are a world-class algorithm to answer questions in a specific format. You use the context provided to answer the question and list your sources in the format specified. Do not make up answers.”
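To make that concrete, the generation step might look roughly like the sketch below, reusing the retrieve_chunks helper and OpenAI client from the earlier sketch. The way the chunks and sources are formatted into the user message is an assumption, not the exact ChatLTV format:

```python
# Illustrative generation step built around the system prompt quoted above.
# Chunk/source formatting and model parameters are assumptions.
SYSTEM_PROMPT = (
    "You are a world-class algorithm to answer questions in a specific format. "
    "You use the context provided to answer the question and list your sources "
    "in the format specified. Do not make up answers."
)

def answer_question(question: str) -> str:
    chunks = retrieve_chunks(question)        # from the retrieval sketch above
    context = "\n\n".join(f"[Source: {c['title']}]\n{c['text']}" for c in chunks)
    response = openai_client.chat.completions.create(
        model="gpt-4",
        temperature=0,                        # keep answers close to the source material
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}\n\n"
                                        "List the sources you used as 'Sources: ...'."},
        ],
    )
    return response.choices[0].message.content
```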

Since HBS holds the copyright on much of the content being used, it was paramount that we ensure the content would not flow into the public domain. Rather than using the OpenAI APIs directly, we used Microsoft's Azure OpenAI Service for both development and production. Leveraging Azure allowed us to take advantage of its security, privacy, and compliance benefits, and guaranteed that the data fed into the service would not be used to retrain models made available to others. The content itself is stored in the Pinecone vector database, which is SOC 2 Type II compliant, and only relevant segments of content (e.g., a few paragraphs of a particular case or teaching note) are sent to the Azure OpenAI Service, depending on the query being made. During the course of our development, Harvard made a private LLM available to faculty, and we anticipate porting ChatLTV over to it.
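With the newer OpenAI Python client, pointing the same code at Azure OpenAI is largely a configuration change. A sketch, with hypothetical endpoint, API version, and deployment names:

```python
# Sketch of pointing the pipeline at Azure OpenAI Service instead of the public
# OpenAI endpoint. The endpoint, API version, and deployment name are placeholders.
from openai import AzureOpenAI

azure_client = AzureOpenAI(
    api_key="YOUR_AZURE_OPENAI_KEY",
    api_version="2023-05-15",
    azure_endpoint="https://your-resource.openai.azure.com",
)

# With Azure, `model` refers to the name of your GPT-4 deployment.
response = azure_client.chat.completions.create(
    model="gpt-4-chatltv",                    # hypothetical deployment name
    messages=[{"role": "user", "content": "Ping"}],
)
```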

The total ChatLTV code base was roughly 8000 lines for the backend (including 800 lines for RAG, 900 lines for content indexing, and the remainder for backend APIs, tests, and deployment code). We also created a content management system (CMS), a simple web-based application of about 9000 lines, that allowed faculty to add or delete content and observe student queries. The importance of the CMS will become clear later. The code was written over the late spring and summer and took roughly 2-3 person-months. If written today, with the rapid improvement in the underlying development tools, the code base would be substantially smaller (perhaps half the size) and the person-months similarly fewer.

We also made GPT-4 available to the students in the course Slack alongside ChatLTV, so students could use the public chatbot or the course chatbot, depending on their needs. In addition to its answer, ChatLTV included the source documents in its Slack reply so that students could see the underlying reference material.
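For illustration, posting an answer and its sources back into the Slack thread might look something like the sketch below, using the slack_sdk library; the bot token, channel handling, and message formatting are assumptions:

```python
# Sketch of posting an answer plus its source documents back into Slack as a
# threaded reply. Token, channel/thread IDs, and formatting are illustrative.
from slack_sdk import WebClient

slack = WebClient(token="xoxb-your-bot-token")

def post_answer(channel: str, thread_ts: str, answer: str, sources: list[str]) -> None:
    source_lines = "\n".join(f"• {s}" for s in sources)
    slack.chat_postMessage(
        channel=channel,
        thread_ts=thread_ts,                  # reply in-thread to the student's question
        text=f"{answer}\n\n*Sources:*\n{source_lines}",
    )
```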

Training and Testing the GPT, Adding Admin Content

Given the inherently probabilistic, nondeterministic nature of LLMs and the large body of text involved in the inputs and outputs, the development of an LLM app is an iterative process. We created test data sets (roughly 500 Q&A queries) to manually test ChatLTV and built an evaluation function that gave the development team feedback on the quality of the responses.
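One common way to structure such an evaluation function is to have an LLM grade each response against a ground-truth answer from the test set. A sketch, with a hypothetical grading prompt and test-set file, reusing the answer_question function from the earlier sketch:

```python
# Sketch of an evaluation loop: grade each ChatLTV response against a
# ground-truth answer using GPT-4 as the judge. The grading prompt and
# test-set format are hypothetical.
import json
from openai import OpenAI

client = OpenAI()

def score_response(question: str, ground_truth: str, bot_answer: str) -> int:
    """Return a 1-5 quality score for the bot's answer versus the ground truth."""
    grading_prompt = (
        "Rate how well the candidate answer matches the reference answer on a "
        "1-5 scale. Reply with the number only.\n\n"
        f"Question: {question}\nReference: {ground_truth}\nCandidate: {bot_answer}"
    )
    reply = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": grading_prompt}],
    )
    return int(reply.choices[0].message.content.strip())

# Run the test set (hypothetical file of {question, answer} pairs) and report the average.
with open("test_set.json") as f:
    test_set = json.load(f)
scores = [score_response(t["question"], t["answer"], answer_question(t["question"]))
          for t in test_set]
print(f"Average quality score: {sum(scores) / len(scores):.2f}")
```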

In addition to the manual testing noted above, we ran an automated evaluation that used OpenAI to compare outputs to the ground-truth data and generate a quality score. The mix of manual and automated testing allowed us to iterate on our prompts (i.e., prompt engineering) and content indexing. It also revealed an important feature that we later added: a repository of admin content. We realized in our informal user testing (i.e., discussions with prospective users, drawing on numerous students from past classes) that students would want to ask administrative questions about the course, such as “When is assignment #2 due?”, “How do I schedule office hours with Jeff?”, and “What are the parameters for the final project?” Although this information is always provided to students over the course of a semester, students (shockingly!) sometimes forget the details and don’t know where to look for them. Thus, we created a set of content that we labeled “Course Admin” (e.g., “LTV Grading Rubric”, “LTV Writing Assignment 1”) and programmed the RAG algorithm to review that corpus first when providing answers.
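One simple way to have the retrieval step check the “Course Admin” corpus first is a filtered query over the vector database, falling back to the full corpus when nothing relevant turns up. A sketch, assuming each chunk carries a hypothetical category metadata tag and reusing the client and index from the retrieval sketch above:

```python
# Sketch of prioritizing the "Course Admin" corpus: query admin-tagged chunks
# first, and fall back to the full corpus if nothing sufficiently relevant is found.
# The `category` metadata field and score threshold are assumptions.
def retrieve_with_admin_priority(question: str, top_k: int = 5) -> list[dict]:
    embedding = openai_client.embeddings.create(
        model="text-embedding-ada-002",
        input=question,
    ).data[0].embedding

    # Pass 1: restrict retrieval to admin documents (e.g., "LTV Grading Rubric").
    admin = index.query(
        vector=embedding,
        top_k=top_k,
        filter={"category": {"$eq": "course_admin"}},
        include_metadata=True,
    )
    if admin["matches"] and admin["matches"][0]["score"] > 0.85:
        return [m["metadata"] for m in admin["matches"]]

    # Pass 2: fall back to the full course corpus.
    full = index.query(vector=embedding, top_k=top_k, include_metadata=True)
    return [m["metadata"] for m in full["matches"]]
```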

Results: Student Experience

We launched the chatbot at the start of the semester in early September and used it throughout the semester, ending just last week. From my standpoint, the experiment was a smashing success. Throughout the semester, students found ChatLTV to be an invaluable resource for course preparation. They used the chatbot to ask clarifying and evaluative questions about case studies, analysis, acronyms, and a full range of administrative matters. Students expressed a high level of interest and excitement for the chatbot and described it as a valuable tool for enhancing their learning experience. A few quotes from a post-course evaluation:

I loved it — I found that I could use it to check my answers but more importantly understand if my methodology was directionally correct, which helped me get farther in my case prep. I loved that I could use it almost like a professor by my side as I worked through the questions, and I feel like it definitely helped me learn the content better.

It was nice to have a walled garden of content that we could know and trust to be used in tandem with other resources, including ChatGPT.

Over half our students — roughly 170 — made over 3000 queries of ChatLTV over the course of the semester. The course has 28 sessions, 24 of which are cases (the rest are exercises). Thus, roughly 130 queries were made per case. When surveyed, nearly 40% of the students who used the chatbot gave it a quality score of “4” or “5”. The usage and quality were frankly higher than I had anticipated. I was thrilled with both.

Interestingly, of the over 3000 queries, only a dozen or so were made in the public channel rather than the private channel on Slack. In other words, more than 99% of queries were made privately: our students preferred not to let their peers see what they were asking in advance of their case preparation.

Results: Faculty Experience

Perhaps most surprising to me over the course of this semester was the faculty experience. I had two fears: (1) ChatLTV would see no usage after all this work, or (2) students would use ChatLTV in a way that diminished the quality of the in-class conversation (e.g., getting “the answers” from the chatbot and spitting them back in rote fashion). The latter was not at all the case. In fact, the quality of the in-class case conversation was excellent. Students appeared to have used the chatbot to prepare effectively for the case discussion and advance their understanding of the material, as noted in the student quotes above. When students offered answers to analytical questions that ChatLTV had helped them with, the faculty could push them to deconstruct their assumptions, methodology, and strategic implications rather than spend class time “doing the math”.

Most interestingly, as a faculty member, I had a unique window into what my students were asking about before walking into the classroom. Each morning before class, I would inspect the admin CMS to see which queries had been made by which students (typically the night before — ChatLTV usage seemed to be most active between 10pm and 2am!). From that resource, I had a unique opportunity to peer inside their minds and appreciate where they were in terms of their comfort with and knowledge of the material for that day and beyond.

A few examples will illustrate this point:

  • One student, let’s call him Jay, is an introvert. I didn’t see his hand up as much as others’ over the course of the semester. I wondered about his level of engagement with the material, but I was concerned that if I cold-called him, I might embarrass him if he was unprepared or lacked command of the content. I noticed one morning that he had made numerous very thoughtful queries of ChatLTV about the day’s case. Based on his Q&A with the chatbot and his progression, I was confident I could cold-call him that morning. I did. He crushed the opening, getting the class conversation off to a fantastic start.
  • I noticed that another student, let’s call her Nikki, was asking ChatLTV many clarifying questions before class: “What does OTE mean?”, “What does CPI mean?”, “What does WAU mean?” Nikki is a non-native English speaker who had previously worked at a Fortune 500 company, not someone who had been immersed in Startupland before HBS. Some of the acronyms may have been hard for her to grasp and were getting in the way of her learning journey. I asked ChatLTV in the public Slack channel to provide a list of the top 15 acronyms from the course and detailed definitions of each, which it accurately produced (thanks to the admin content glossary). As a follow-on, one wise-guy student asked in the public chat for ChatLTV to come up with a catchy and funny way to remember the acronyms. Unfortunately, ChatLTV’s humor was not much better than “Dad jokes” level. Either way, I saw a subsequent reduction in Nikki’s acronym queries and instead more advanced, sophisticated queries.
  • Another student, let’s call her Mary, was frequently asking ChatLTV the morning before class to summarize the day’s case. I worried that perhaps Mary was not reading the cases and using the chatbot as a crutch. But I found her hand was up frequently and her comments were excellent, demonstrating command of the material. I asked her about it (in as nonconfrontational a manner as I could muster). She shared that because she was a young Mom, her sleep was highly variable and she was not in control of her time. To compensate, she prepared for cases many days in advance. The morning of the class, she liked to ask ChatLTV to help refresh her memory regarding the key case facts and issues so that she was ready to go each day. I no longer worried that Mary was taking shortcuts.

These and countless other examples demonstrated that ChatLTV was a useful tool not just for our students, but for me as a faculty member trying to meet my students where they were at any given moment to assist them in their individualized learning journeys.

Bonus: HBS LTV Project Feedback, a custom GPT

OpenAI launched a powerful new feature a few weeks ago called custom GPTs. With no code, anyone can create a customized version of ChatGPT, trained and tuned for a particular skill.

At the end of the semester (i.e., last weekend), I decided to create a custom GPT called “HBS LTV Feedback”, a critical academic evaluator to provide feedback on LTV final course papers and startup ideas. The final project requires students to apply a course tool to a startup of their choice, often their own, and write their reflections and takeaways from the experience.

Typically, students work in teams of two. Thus, with over 250 students, we will receive around 125 papers. We grade them all, but historically we simply don’t have the time to provide them with tangible written feedback on the quality of their paper or the quality of the idea. Thus, a custom GPT project evaluator.

It took me less than two hours to set up and train the custom GPT and zero lines of code. The functionality is ridiculously easy to use. As one of my genAI portfolio company founders likes to put it, “English is the cool new programming language for software.”

The results were excellent. I had to prompt the GPT to be tougher and more critical than its instincts might normally be (LLMs are way softer than HBS professors — in the face of widespread grade inflation, we still grade on the same forced curve from decades and decades ago). Students seemed happy with the results. Two feedback comments that were indicative:

Thank you so much for sending feedback. Honestly, will incorporate this feedback into the document ASAP since I am actually going to use this for real tests to validate the idea and business model over the next couple of months.

Thanks for the feedback and explanation – both human and in silico. A couple of the GPT points are pretty helpful! Especially the 2 areas for improvement for the project and startup idea.

One of my HBS faculty colleagues joked that this represented a historic moment as he believes no faculty in the 100+ year history of HBS has ever provided tangible, written, constructive feedback on final papers to each student. I don’t know if that’s entirely true, but using AI is going to make that much more routine and effective in the future.

Conclusion

This semester was a fun experiment. There is an enormous amount of usage of AI across the HBS faculty and curriculum and the school is racing ahead to embrace the tools even more on behalf of our students. Hopefully, this write-up will inspire other faculty around the world to run their own experiments.

Thank Yous

The ChatLTV project team consisted of Saswat Panda, Chiyoung Kim, Laura Whitmer, and Robin Lobo. Saswat was a total hero in writing every line of code. Special thank you to HBS’ administrative and IT leaders for allowing me to run this experiment and taking the risks associated with it, particularly Prof Mitch Weiss. Also thanks to my LTV faculty colleagues Lindsay Hyde and Christina Wallace. Finally, thank you to our 250 LTV students from 2023 as well as the 2000+ students who have taken the course over the last 13 years. None of us would be here if not for you.
