Elsa Li — Portfolio

Welcome

Welcome to my portfolio! I'm a software engineer passionate about natural language processing, machine learning, and building impactful solutions.

Outside of school, I love tinkering with mechanical keyboards, and I'm an especially big fan of perfumes! I'm always looking for new keyboards and fragrances to add to my ever-growing collection >:)

Professional Experience

Computer Vision Team Lead

American Red Cross | September 2025 - Present

Lead a 5-person team building a computer vision detection system using neural networks to identify vulnerable buildings in Indonesia from 360° Mapillary imagery
Benchmarked the model at 75% accuracy

Software Engineer Intern

Yujun Venture Capital Management | July 2025 - August 2025

Built a Selenium web scraper to consolidate public company information, accelerating pre-investment research by 15%
Applied the scraper to analyze market trends across perfume companies, presenting key findings to the internship team and investors

Software Engineer Intern

The Nature Conservancy | May 2025 - July 2025

Developed an original RAG-based LLM system using ChromaDB and Qwen to process 2,000+ agroforestry research papers, reducing a year-long literature-review process for domain-specific queries
Created a new benchmarking dataset of ~7,000 unique questions for agroforestry RAG systems
Designed a regex-based text segmentation system to optimize article preprocessing, reducing token input by 30%

Software Engineer Intern

Shenzhen Lanyang Technology | May 2024 - July 2024

Patented and led development of a novel image-comparison algorithm with 70% accuracy for fraudulent LPG tank detection using Python and OpenCV, preventing the illegal sale of hazardous units
Reduced implementation costs by 80%, making safe tank detection accessible to low-income families in China

Research Experience

Computer Vision Research Assistant

Lab for Cognition and Attention in Time and Space (Lab for CATS) @ Harvey Mudd College | September 2025 - Present

Analyzing behavior of visual depth models when applied to forced perception videos
Interested in working on monocular depth detection and leveraging adversarial attacks on vision models

NLP Research Assistant

Workflows for Humanistic Inference of Statistical Knowledge (WHISK) Lab @ Harvey Mudd College | January 2025 - May 2025

Implemented supervised ML models (MultinomialNB, ComplementNB, Logistic Regression) to analyze perfume reviews
Visualized data using confusion matrices and analyzed precision, recall, and F1 scores
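As a small worked example of the metrics named above, the sketch below derives per-class precision, recall, and F1 from paired true and predicted labels. The labels are invented for illustration and are not data from the perfume-review project; `per_class_metrics` is a hypothetical helper name.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred):
    """Return {label: (precision, recall, f1)} from paired labels."""
    # Counting (true, predicted) pairs gives the confusion matrix in sparse form.
    confusion = Counter(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    metrics = {}
    for label in labels:
        tp = confusion[(label, label)]
        fp = sum(confusion[(other, label)] for other in labels if other != label)
        fn = sum(confusion[(label, other)] for other in labels if other != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[label] = (precision, recall, f1)
    return metrics


# Hypothetical sentiment labels, made up for this example.
y_true = ["positive", "positive", "negative", "negative", "positive", "negative"]
y_pred = ["positive", "negative", "negative", "positive", "positive", "negative"]
metrics = per_class_metrics(y_true, y_pred)
```

Precision asks how many predictions of a class were right, recall asks how many true members of the class were found, and F1 is their harmonic mean.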

Affiliations

Sponsors for Educational Opportunity (SEO) - First Year Academy

Selected out of 1,200 applicants to participate in software engineering pre-professional development training | February 2025 - August 2025

About Me

I'm a software engineer with experience in natural language processing, machine learning, and full-stack development. I'm passionate about leveraging AI and data-driven solutions to solve real-world problems.

My work ranges from building RAG-based LLM systems for environmental research to developing patented computer vision algorithms for safety applications. I've been fortunate to work with organizations like The Nature Conservancy, Yujun Venture Capital, and Shenzhen Lanyang Technology.

Projects

Indonesia Building Vulnerability CV

Leading a 5-person team building a neural-network detection system that identifies vulnerable buildings in Indonesia from 360° Mapillary imagery — 75% benchmarked accuracy.

Python, Neural Networks, Computer Vision, Mapillary

VC Research Web Scraper

Selenium scraper that consolidates public-company data for a venture-capital team, accelerating pre-investment research by 15% and surfacing perfume-market trends for an investor presentation.

Python, Selenium, Web Scraping, Market Analysis

Agroforestry RAG System

RAG-based LLM system processing 2,000+ research papers, automating literature review for agroforestry queries.

Python, ChromaDB, Qwen, NLP

LPG Tank Fraud Detection

Patented computer vision algorithm with 70% accuracy for detecting fraudulent liquefied petroleum gas tanks using OpenCV.

Python, OpenCV, Computer Vision, Patent

EcoBuddies

Won Best Use of Streamlit out of 50 teams at Caltech Hacktech — full-stack interactive environmentalism game.

Python, Streamlit, Gemini API

Rainforest Revival

Won Most Unique Game out of 24 teams at SEO First Year Academy Summer Program.

Python, Pygame, Gemini API

Digidraw

Web app for digitally coordinating college room-draw plans. Used by 900+ students at Harvey Mudd to plan housing each year.

Web App, Full-Stack, 900+ users

Indonesia Building Vulnerability CV

A computer vision detection system built with the American Red Cross to identify structurally vulnerable buildings in Indonesia using 360° street-level imagery from Mapillary. The model helps humanitarian teams prioritize disaster-preparedness efforts by surfacing at-risk structures at scale.

Organization

American Red Cross | September 2025 - Present

Role

Computer Vision Team Lead — leading a 5-person team through data pipeline design, model training, and evaluation.

Key Achievements

Architected and now lead a neural-network pipeline for building-vulnerability classification on 360° imagery
Benchmarked the system at 75% accuracy against held-out ground truth
Coordinating a 5-person team across data collection, preprocessing, modeling, and evaluation

Technologies Used

Python, Neural Networks, Computer Vision, Mapillary 360°, PyTorch

Impact

The detection system gives the Red Cross a scalable way to survey building stock in at-risk regions of Indonesia, work that would otherwise require extensive on-the-ground survey teams. Identifying vulnerable structures early enables better targeting of retrofits, evacuation planning, and disaster-response resources.

VC Research Web Scraper

A Selenium-based web scraper built for Yujun Venture Capital Management to consolidate public-company information for the firm's pre-investment research process. The tool was also applied to the perfume industry to surface market trends for a pitch to the internship team and investors.

Organization

Yujun Venture Capital Management | July 2025 - August 2025

Key Achievements

Built a Selenium scraper consolidating public-company information across multiple sources
Accelerated pre-investment research by 15%
Analyzed market trends across perfume companies and delivered key findings in a final presentation to the team and investors

Technologies Used

Python, Selenium, Web Scraping, Data Analysis

Impact

Replaced a manual information-gathering workflow with an automated pipeline, freeing analysts to focus on interpretation instead of collection. The perfume-market analysis demonstrated the scraper's applicability to thesis-driven research and informed the firm's discussion of that sector.

Agroforestry RAG System

A sophisticated Retrieval-Augmented Generation (RAG) system developed at The Nature Conservancy to automate literature review processes for agroforestry research. The system processes over 2,000 research papers to provide domain-specific answers to complex queries.

Organization

The Nature Conservancy | May 2025 - July 2025

Key Achievements

Processed 2,000+ agroforestry research papers using RAG architecture
Created benchmarking dataset of ~7,000 unique questions for system evaluation
Developed regex-based text segmentation for optimized article preprocessing
Automated literature review process, significantly reducing research time

Technologies Used

Python, ChromaDB, Qwen LLM, RAG Architecture, NLP, Regex

Technical Architecture

The system uses ChromaDB as a vector database to store embeddings of research papers. When a query is received, the RAG system retrieves the most relevant document chunks and uses the Qwen language model to generate comprehensive, contextually accurate answers. The regex-based preprocessing pipeline ensures clean, well-structured text for optimal retrieval performance.
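The flow described above (segment text, retrieve the closest chunks, hand them to the model) can be sketched in a few lines. This is a dependency-free illustration, not the project's code: bag-of-words cosine similarity stands in for the ChromaDB embedding search, a blank-line split stands in for the actual segmentation rules, and the function names, sample text, and prompt wording are all assumptions. The call to Qwen itself is left out.

```python
import math
import re
from collections import Counter


def segment(text):
    """Split text into paragraph chunks on blank lines (a simple regex rule)."""
    return [chunk.strip() for chunk in re.split(r"\n\s*\n", text) if chunk.strip()]


def vectorize(text):
    """Bag-of-words counts: a toy stand-in for a learned embedding."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query."""
    query_vec = vectorize(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, vectorize(c)), reverse=True)
    return ranked[:k]


def build_prompt(query, context_chunks):
    """Assemble the text that would be sent to the language model."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


# Toy text for illustration only, not taken from the real corpus.
paper = """Alley cropping plants rows of trees between crops to reduce soil erosion.

Silvopasture combines trees with grazing livestock on the same land.

Windbreaks are tree rows that shelter fields from wind damage."""

question = "How do trees and livestock share land?"
chunks = segment(paper)
top = retrieve(question, chunks, k=1)
prompt = build_prompt(question, top)
```

A real vector store would replace `vectorize` and `cosine` with stored embeddings and a nearest-neighbor query; the surrounding steps keep the same shape.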

Benchmarking Dataset

To evaluate system performance, I created a comprehensive dataset of approximately 7,000 unique questions covering various aspects of agroforestry. This dataset serves as a robust benchmark for testing retrieval accuracy and answer quality, and can be used to compare different RAG implementations.
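One common way to score retrieval against a question set like this is hit rate at k: the share of questions whose expected source shows up in the top k results. The sketch below assumes each benchmark item pairs a question with an expected document ID, which may not match the real dataset's format; the corpus, the word-overlap retriever, and all names are hypothetical.

```python
def hit_rate_at_k(benchmark, retrieve_ids, k=3):
    """Fraction of questions whose expected source appears in the top-k results."""
    if not benchmark:
        return 0.0
    hits = sum(
        1 for question, expected_id in benchmark
        if expected_id in retrieve_ids(question, k)
    )
    return hits / len(benchmark)


# Hypothetical three-document corpus, invented for this example.
corpus = {
    "doc1": "alley cropping reduces soil erosion",
    "doc2": "silvopasture combines trees and livestock",
    "doc3": "windbreaks shelter fields from wind",
}


def overlap_retriever(question, k):
    """Rank documents by how many words they share with the question."""
    words = set(question.lower().split())
    ranked = sorted(
        corpus,
        key=lambda doc_id: len(words & set(corpus[doc_id].split())),
        reverse=True,
    )
    return ranked[:k]


benchmark = [
    ("what reduces soil erosion", "doc1"),
    ("which system combines trees and livestock", "doc2"),
    ("what improves crop pollination", "doc3"),  # no word overlap, so a miss at k=1
]
score_at_1 = hit_rate_at_k(benchmark, overlap_retriever, k=1)
score_at_3 = hit_rate_at_k(benchmark, overlap_retriever, k=3)
```

Because the scorer only needs a function from question to ranked IDs, the same benchmark can compare different retrievers side by side. Answer quality needs a separate measure that this sketch does not cover.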

Impact

This system dramatically accelerates the research process for agroforestry experts at The Nature Conservancy, enabling them to quickly find relevant information across thousands of papers. The automated literature review capability supports more efficient decision-making for conservation and sustainable agriculture initiatives.

LPG Tank Fraud Detection

A patented computer vision algorithm developed to detect fraudulent liquefied petroleum gas (LPG) tanks, preventing the illegal sale of hazardous units. This project achieved 70% accuracy in identifying counterfeit tanks while reducing implementation costs by 80%, making safety technology accessible to low-income families in China.

Organization

Shenzhen Lanyang Technology | May 2024 - July 2024

Key Achievements

Patented novel image comparison algorithm for fraud detection
Achieved 70% accuracy in identifying fraudulent LPG tanks
Reduced implementation costs by 80% compared to existing solutions
Made safe tank detection accessible to low-income families in China
Prevented sale of hazardous counterfeit units

Technologies Used

Python, OpenCV, Computer Vision, Image Processing, Patent

Technical Approach

The algorithm uses OpenCV image-comparison techniques to analyze LPG tank features and identify discrepancies that indicate counterfeit products. The system compares tank characteristics including markings, serial numbers, manufacturing details, and physical features against a database of authentic tank specifications.
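To make the idea of comparing a candidate against a reference concrete, here is a deliberately simple sketch. It is not the patented algorithm and does not use OpenCV: it treats images as small grayscale grids (nested lists) and flags a candidate when its average per-pixel difference from the reference exceeds a threshold. The metric, the threshold of 20, and the function names are all assumptions chosen for illustration.

```python
def mean_abs_diff(image_a, image_b):
    """Average per-pixel absolute difference between two same-sized grayscale grids."""
    same_shape = len(image_a) == len(image_b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(image_a, image_b)
    )
    if not same_shape:
        raise ValueError("images must have the same dimensions")
    pixels = sum(len(row) for row in image_a)
    if pixels == 0:
        raise ValueError("images must not be empty")
    total = sum(
        abs(pa - pb)
        for row_a, row_b in zip(image_a, image_b)
        for pa, pb in zip(row_a, row_b)
    )
    return total / pixels


def looks_mismatched(candidate, reference, threshold=20.0):
    """Flag a candidate whose difference from the reference exceeds the threshold."""
    return mean_abs_diff(candidate, reference) > threshold


# Tiny 3x3 grids with made-up intensity values (0 to 255).
reference = [[200, 200, 50], [200, 50, 50], [50, 50, 50]]
similar = [[198, 203, 52], [199, 48, 51], [50, 53, 49]]
different = [[50, 50, 200], [50, 200, 200], [200, 200, 200]]

similar_flagged = looks_mismatched(similar, reference)
different_flagged = looks_mismatched(different, reference)
```

A production system would first align the images and compare specific features such as markings rather than raw pixels, since lighting and camera angle alone can move a pixel-level score.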

Problem Statement

Counterfeit LPG tanks pose serious safety risks, as they may lack proper safety mechanisms and can lead to explosions or gas leaks. However, traditional detection methods were prohibitively expensive for many households. This project aimed to develop an affordable, accessible solution that could be widely deployed.

Cost Reduction Innovation

By leveraging computer vision and standard imaging equipment, the solution reduced costs by 80% compared to existing hardware-based authentication systems. This dramatic cost reduction was crucial for making the technology accessible to low-income families who are most vulnerable to purchasing counterfeit products.

Social Impact

This project directly contributes to public safety by preventing the distribution of dangerous counterfeit LPG tanks. By making detection technology affordable and accessible, it protects vulnerable communities from the risks associated with substandard gas containers. The patent ensures the technology can be commercialized and deployed at scale.

EcoBuddies

EcoBuddies is an interactive full-stack chatbot application designed to promote environmental awareness and sustainable living. The project won Best Use of Streamlit at Caltech's premier hackathon, Hacktech 2025.

Award

🏆 Best Use of Streamlit - Hacktech 2025 (Caltech)

Project Overview

EcoBuddies integrates Google's Gemini LLM with Streamlit's web framework to create an engaging conversational interface that educates users about environmental topics, sustainability practices, and climate action. The application makes learning about environmentalism accessible and interactive through AI-powered conversations.

Technologies Used

Python, Streamlit, Google Gemini API, LLM Integration

Key Features

Full-stack chatbot application with intuitive user interface
Integration with Google Gemini LLM for intelligent conversations
Real-time responses to environmental queries and sustainability questions
Streamlit web framework for rapid development and deployment
Educational content delivery through conversational AI

Technical Implementation

The application leverages Streamlit's component-based architecture to create a responsive and interactive web interface. The Google Gemini API powers the conversational capabilities, providing accurate and contextually relevant responses to user queries about environmental topics.

Hackathon Success

Winning Best Use of Streamlit at Caltech's Hacktech 2025 demonstrated the application's effective use of modern web frameworks and AI technology. The judges recognized the project's clean implementation, user-friendly design, and impactful mission of promoting environmental awareness.

Rainforest Revival

Rainforest Revival is an innovative game that combines entertainment with environmental education. The project features three distinct mini-games and an integrated AI character chatbot, creating a unique gaming experience that raises awareness about rainforest conservation.

Award

🏆 Most Unique Game - SEO First Year Academy Summer Program

Project Overview

Developed during the SEO First Year Academy Summer Program, Rainforest Revival stands out for its creative combination of traditional game mechanics with cutting-edge AI technology. The game provides an engaging platform for learning about environmental conservation.

Technologies Used

Python, Pygame, Google Gemini API, Game Development

Game Components

Three Mini-Games: Each focuses on different aspects of rainforest ecosystems and conservation challenges
AI Character Chatbot: Integrated Google Gemini LLM provides an interactive guide that responds to player questions
Educational Content: Environmental information seamlessly woven into gameplay
Interactive Storytelling: Dynamic narrative that responds to player choices

Technical Implementation

Built using the Pygame framework for game mechanics and graphics, with Google Gemini API integration for the chatbot feature. The three mini-games were designed to be both entertaining and educational, each highlighting different conservation themes.

Why "Most Unique"

The project earned the Most Unique Game award for its innovative combination of traditional 2D gaming with advanced AI conversation capabilities. The seamless integration of an LLM-powered chatbot into a Pygame environment was particularly noteworthy.

Digidraw

Digidraw is a web application that digitally coordinates the annual room-draw (housing selection) process at Harvey Mudd College. It replaces an error-prone paper-and-whiteboard workflow with a shared interface where students can plan, preview, and finalize their room picks in real time.

Scale

Used by 900+ students

Repository

github.com/tomqlam/roomdraw

What It Does

Lets students visualize every dorm and open room during the draw
Coordinates pick order so multiple students can plan against the same live state
Reduces miscommunication and last-minute conflicts compared to the legacy manual process

Tech Stack

Frontend: React (JavaScript)
Backend: Go with the Gin web framework, containerized with Docker/Podman
Database: PostgreSQL storing suites, users, rooms, and a transaction log for auditability
Authentication: Google OAuth
External services: BunnyNet CDN for suite-design image hosting, SMTP for email notifications
Tooling: Python and Jupyter notebooks for database seeding and test data; rate-limiting and request-queuing middleware to handle draw-day load
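For readers unfamiliar with rate limiting, a token bucket is one widely used approach. The sketch below is in Python for consistency with the other examples here, although the backend itself is written in Go; which algorithm the real middleware uses is not stated, so the class, its parameters, and the injected clock are assumptions.

```python
import time


class TokenBucket:
    """Allow short bursts up to `capacity`, then a steady `refill_per_second` rate."""

    def __init__(self, capacity, refill_per_second, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.clock = clock  # injectable so the example can run without sleeping
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self):
        """Return True and spend a token if the request may proceed."""
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# A fake clock keeps the demonstration deterministic.
fake_now = [0.0]
bucket = TokenBucket(capacity=2, refill_per_second=1, clock=lambda: fake_now[0])

burst = [bucket.allow() for _ in range(3)]  # third request exceeds the burst size
fake_now[0] = 1.0                           # one second later, one token is back
after_wait = bucket.allow()
```

In a web server, a middleware would typically keep one bucket per user or IP address and reject or queue a request when `allow()` returns False.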

Impact

Adopted campus-wide, Digidraw has become the standard tool students use during room draw each year — streamlining a high-stakes event that previously depended on in-person coordination and shared spreadsheets.

Polygraph Lie Detection

A multimodal lie detection system that analyzes videos using both natural language processing and computer vision techniques. The system combines sentiment analysis from speech and visual cues to identify potential deception indicators.

Project Overview

This project explores the intersection of NLP and computer vision by implementing a comprehensive system that processes video content to detect deception through multimodal analysis.

Technologies Used

Python, NLP, Computer Vision, Google Cloud Speech API, Sentiment Analysis

System Components

NLP Sentiment Analysis: Analyzes speech for linguistic patterns and emotional indicators
Computer Vision Analysis: Examines facial expressions and behavioral patterns
Speech-to-Text: Uses Google Cloud Speech API for transcription
Multimodal Integration: Combines NLP and CV insights for comprehensive assessment

Technical Implementation

The system processes videos by extracting audio and visual streams. Google Cloud Speech API converts speech to text for NLP sentiment analysis, while computer vision algorithms analyze facial expressions and behavioral cues simultaneously.

Multimodal Approach

By combining NLP sentiment analysis and computer vision, the system leverages multiple information sources. This approach provides more comprehensive detection than single-modality systems by analyzing both verbal and non-verbal communication.

Research Purpose

This project serves as an exploration of multimodal AI techniques. It's designed for educational and research purposes to understand the capabilities and challenges of combining NLP and computer vision for complex analysis tasks.

Get In Touch

I'm currently looking for software engineering internships for summer 2027, and I would love to get in contact to chat more!
