Shashank Shekhar Singh
ML Engineer & Open Source Developer
GSoC '24 '25 • LFX '24 • SoB '24 '25 • C4GT '25 • AmEx '25 • Blue Yonder '24
About Me
Turning research into real-world impact
4th-year B.Tech student at IIT (BHU) Varanasi, building at the intersection of Machine Learning, NLP, and Open Source. From Google Summer of Code to American Express, I turn research into real-world impact.
I've worked across American Express, Google Summer of Code, Linux Foundation, Summer of Bitcoin, Code for GovTech, and Blue Yonder, spanning NLP, Computer Vision, RAG systems, and Data Science. I also built CodeGraphContext, an open-source tool with 3K+ GitHub stars that indexes code into knowledge graphs for AI assistants.
Education
Tech Stack
Tools of the trade
Languages
ML & AI
Tools & Platforms
Domains
Experience
Where I've worked
American Express
ML Intern (2025)
- Built an end-to-end NLP tool that clusters 40,000+ monthly survey comments into meaningful topics in under a minute.
- Created a keyword-aware topic mapping system using unsupervised techniques to surface latent issues and sentiments.
Google Summer of Code — HumanAI
ML Intern (2025)
- Improved CRNN-based OCR with beam search decoding, attention mechanisms, and language models, reaching 97% accuracy.
- Deployed the pipeline as a web app for real-time inference and digitization of archival documents.
Summer of Bitcoin — Braidpool
NLP Intern (2025)
- Developed a RAG system tailored for Bitcoin development, reducing hallucinations in API usage.
- Integrated domain-specific retrieval over Bitcoin APIs, developer discussions, and protocol standards.
Code for GovTech — Pratham Books
ML Intern (C4GT 2025)
- Developed multilingual voice and text search combining speech-to-text with GPT-based tag generation.
- Implemented hybrid semantic-keyword search, enabling discovery across 53K+ storybooks in 330+ languages.
Google Summer of Code — HumanAI
ML Intern (2024)
- Implemented a CRNN architecture for text recognition in historical Spanish documents.
- Applied data augmentation and fine-tuned model parameters to improve accuracy.
Linux Foundation Mentorship (LFX)
ML Intern (2024)
- Developed NLP-driven anonymization pipelines to redact sensitive entities in telecom datasets.
- Automated preprocessing, entity masking, and validation using scalable open-source frameworks.
Summer of Bitcoin — Libbitcoin
Software Development Intern (2024)
- Designed and implemented a SQLite database in C++ for libbitcoin, optimizing storage and retrieval.
- Developed scripts and APIs to efficiently query and manage blockchain data.
Blue Yonder
ML/DS Intern (2024)
- Developed a fashion-trend prediction model using transfer learning on historical sales and social media data.
- Engineered features such as seasonal trends, color, category, and style to improve model performance.
Projects
Things I've built
From knowledge graph engines to OCR pipelines — building tools that solve real problems.
CodeGraphContext
A CLI toolkit and MCP server that indexes local code into a knowledge graph for AI assistants. Reached 50K+ downloads and 3K+ GitHub stars within 8 months.
Graph-RAG Travel Recommender
Travel assistant and itinerary planner built on the Google Maps and Wikipedia APIs, with Gemini generating Cypher queries against a Neo4j knowledge graph in a RAG pipeline.
Tabular Data Analysis Tool
Multilingual Flask chatbot built on the Gemini API and SQLite3 for analyzing images of handwritten or printed tables, supporting multiple schemas and CRUD operations. Hosted on GCP.
Dark Pattern Detection
Transformer-based detection system (BERT, XLNet, RoBERTa) to identify and categorize dark patterns in text, extended with a browser plugin for real-time visual highlighting.
More
Beyond the code
Open Source
Leadership
Honours & Achievements
Contact
Let's connect
Interested in collaborating, have an ML idea, or just want to say hi? I'm always open to interesting conversations and opportunities.