IIT (BHU) Varanasi • B.Tech 2026

Shashank Shekhar Singh

ML Engineer & Open Source Developer

GSoC '24 '25 • LFX '24 • SoB '24 '25 • C4GT '25 • AmEx '25 • Blue Yonder '24

About Me

Turning research into real-world impact

4th-year B.Tech student at IIT (BHU) Varanasi, building at the intersection of Machine Learning, NLP, and Open Source. From Google Summer of Code to American Express, I turn research into real-world impact.

I've worked across American Express, Google Summer of Code, Linux Foundation, Summer of Bitcoin, Code for GovTech, and Blue Yonder — spanning NLP, Computer Vision, RAG systems, and data science. I also built CodeGraphContext, an open-source tool with 3K+ stars that indexes code into knowledge graphs for AI assistants.

Education

B.Tech in Mechanical Engineering, IIT (BHU), Varanasi
8.37 | 2022 - 2026
CBSE (XII), Pragati Public School, Kota
93.60% | 2022
CBSE (X), Little Flower House, Varanasi
97.20% | 2020

8+ Internships
3K+ GitHub Stars
300+ Contributions
7+ Hackathon Wins

Tech Stack

Tools of the trade

Languages

Python, C, C++, JavaScript, SQL, Bash, HTML, LaTeX

ML & AI

PyTorch, TensorFlow, Keras, Scikit-learn, Hugging Face, OpenCV, LangChain, PySpark

Tools & Platforms

Git/GitHub, Docker, Linux, Flask, FastAPI, SQLite3, Neo4j, Jupyter, Kaggle, GCP

Domains

Machine Learning, NLP / LLMs, RAG & Agents, Computer Vision, Generative AI, Data Science, Zero/Few-shot Learning

Experience

Where I've worked

American Express

ML Intern

Gurgaon|May 2025 — Jul 2025
  • Built an end-to-end NLP tool to cluster 40,000+ monthly survey comments into meaningful topics in under 1 minute.
  • Created a keyword-aware topic mapping system using unsupervised techniques to surface latent issues and sentiments.

Google Summer of Code — HumanAI

ML Intern (2025)

Remote|May 2025 — Jul 2025
  • Improved CRNN-based OCR with beam search decoding, attention mechanisms, and language models, reaching 97% accuracy.
  • Deployed the pipeline as a web app for real-time inference and digitization of archival documents.

Summer of Bitcoin — Braidpool

NLP Intern (2025)

Remote|May 2025 — Jul 2025
  • Developed a RAG system tailored for Bitcoin development, reducing hallucinations in API usage.
  • Integrated domain-specific retrieval over Bitcoin APIs, dev discussions, and protocol standards.

Code for GovTech — Pratham Books

ML Intern (C4GT 2025)

Remote|May 2025 — Jul 2025
  • Developed multilingual voice/text search combining speech-to-text with GPT tag generation.
  • Implemented hybrid semantic-keyword search enabling discovery in 330+ languages across 53K+ storybooks.

Google Summer of Code — HumanAI

ML Intern (2024)

Remote|May 2024 — Jul 2024
  • Implemented CRNN architecture for text recognition in historical Spanish documents.
  • Applied data augmentation and fine-tuned model parameters for enhanced accuracy.

Linux Foundation Mentorship (LFX)

ML Intern

Remote|May 2024 — Jul 2024
  • Developed NLP-driven anonymization pipelines to redact sensitive entities in telecom datasets.
  • Automated pre-processing, entity masking, and validation using scalable open-source frameworks.

Summer of Bitcoin — Libbitcoin

Software Development Intern (2024)

Remote|May 2024 — Jul 2024
  • Designed and implemented a SQLite database in C++ for libbitcoin, optimizing storage and retrieval.
  • Developed scripts and APIs to efficiently query and manage blockchain data.

Blue Yonder

ML/DS Intern

Hyderabad|May 2024 — Jul 2024
  • Developed a Fashion Trend prediction model using Transfer Learning on historical sales and social media.
  • Engineered features like seasonal trends, color, category, and style to improve model performance.

Projects

Things I've built

From knowledge graph engines to OCR pipelines — building tools that solve real problems.

Featured

CodeGraphContext

A powerful CLI toolkit & MCP server that indexes local code into a knowledge graph for AI assistants. Scaled to 50K+ downloads and 3K+ GitHub stars within 8 months.

Python, Neo4j, MCP, Knowledge Graph
Featured

Graph-RAG Travel Recommender

Travel assistant & itinerary planner using Google Maps and Wikipedia APIs, with Gemini generating Cypher commands to query a Neo4j knowledge graph via a RAG pipeline.

RAG, Neo4j, Gemini, Google Maps API

Tabular Data Analysis Tool

Multilingual Flask chatbot using Gemini API and SQLite3 for analyzing handwritten/printed table images, supporting multiple schemas and CRUD operations. Hosted on GCP.

Flask, Gemini API, SQLite3, GCP

Dark Pattern Detection

Transformer-based detection system (BERT, XLNet, RoBERTa) to identify and categorize dark patterns in text, extended with a browser plugin for real-time visual highlighting.

BERT, RoBERTa, XLNet, Browser Extension

More

Beyond the code

Leadership

Head of COPS IG, IIT BHU
Jul 2024 — May 2025
Led the Machine Learning group with 20 AI/ML research students. Spoke at workshops, mentored students, and organized ML tracks.
Core Team — Robotics Club, IIT BHU
Jul 2023 — Present
Coordinated Mazex and the ROS Installation fest. Conducted workshops on AI/ML and Computer Vision.

Honours & Achievements

First prize — NLP.py competition at Optima, IIT Kharagpur
National Winner — Crystal Ball event by Blue Yonder
Best Use of Knowledge Graph RAG — Hypermode AI Challenge
First runner-up — Ikigai's Low Code ML Competition at IIM Ahmedabad
First runner-up — TechTrove at IIM Trichy
First runner-up — Asteroid Venture, NSSC, IIT Kharagpur
Second runner-up — Byte Synergy 2.0 by Zense, IIIT Bangalore
Finalist — Dark Patterns Buster Hackathon (DPBH-2023) at IIT BHU
Kaggle Dataset Contributor — 4000+ views, Official Trending Datasets
Problem Setter — DeepLearn 2024, 11th Intl. School on Deep Learning

Contact

Let's connect

Interested in collaborating, have an ML idea, or just want to say hi? I'm always open to interesting conversations and opportunities.