Pretrained a 150M-parameter Korean language model fully from scratch using only a single Nvidia RTX 3090 GPU. Built a tokenizer with SentencePiece, used a LatentMoE architecture, gathered the datasets, and fine-tuned the result. Open-sourced on Huggingface and Github.
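For the tokenizer step, here is a minimal sketch of training a Korean SentencePiece model. The corpus filename, vocab size, and other settings are placeholder assumptions, not the actual configuration I used.

```python
# Minimal SentencePiece training sketch (filenames and sizes are assumptions).
import sentencepiece as spm

spm.SentencePieceTrainer.train(
    input="corpus_ko.txt",        # one sentence per line, raw Korean text
    model_prefix="ko_tokenizer",  # writes ko_tokenizer.model / ko_tokenizer.vocab
    vocab_size=32000,             # assumed size; should match the model's embedding table
    model_type="bpe",             # BPE subwords; unigram is another common choice
    character_coverage=0.9995,    # keep rare Hangul/Hanja characters in the vocab
)

sp = spm.SentencePieceProcessor(model_file="ko_tokenizer.model")
print(sp.encode("안녕하세요, 한국어 언어 모델입니다.", out_type=str))
```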
Tried to mount a Llama3-2B model on an iOS phone. Got a bunch of errors about dynamic libraries, Metal, and memory allocation. Millions of failures, but I learned valuable things. Also built a text classification model and an extractive summarization model.
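For the extractive summarization part, here is a rough sketch of one common approach, scoring sentences by their TF-IDF weight. This is only an illustration of the general technique, not the pipeline I actually used.

```python
# Extractive summarization sketch: keep the top-scoring sentences by TF-IDF weight.
from sklearn.feature_extraction.text import TfidfVectorizer
import numpy as np

def extractive_summary(text: str, num_sentences: int = 2) -> str:
    # Naive sentence split; a real pipeline would use a proper sentence tokenizer.
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    if len(sentences) <= num_sentences:
        return text
    # Score each sentence by the sum of its TF-IDF term weights.
    tfidf = TfidfVectorizer().fit_transform(sentences)
    scores = np.asarray(tfidf.sum(axis=1)).ravel()
    # Keep the top-scoring sentences in their original order.
    top = sorted(np.argsort(scores)[-num_sentences:])
    return ". ".join(sentences[i] for i in top) + "."
```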
Normal vs. defect sealant classification problem. Accuracy skyrocketed to 98% using a Vision Transformer; I also tried TinyVGG, but a problem with its training procedure left that run at only 60% accuracy.
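As a sketch of the ViT approach, this is roughly what fine-tuning a pretrained Vision Transformer for a two-class defect task looks like. The dataset path, batch size, and hyperparameters are assumptions for illustration only.

```python
# Fine-tuning a pretrained ViT-B/16 for normal/defect classification (settings are assumed).
import torch
from torch import nn
from torchvision import datasets, models

device = "cuda" if torch.cuda.is_available() else "cpu"

weights = models.ViT_B_16_Weights.IMAGENET1K_V1
model = models.vit_b_16(weights=weights)
model.heads = nn.Linear(model.hidden_dim, 2)  # replace the head: two classes, normal / defect
model = model.to(device)

# ImageFolder expects one subdirectory per class, e.g. sealant/train/normal, sealant/train/defect.
data = datasets.ImageFolder("sealant/train", transform=weights.transforms())
loader = torch.utils.data.DataLoader(data, batch_size=32, shuffle=True)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(5):
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        loss = loss_fn(model(images), labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```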
Help! I accidentally built GPT from scratch! It was the first paper I read and the second AI model I built (the first was ViT). I'm learning most things by building them from scratch.
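To give a flavor of the from-scratch build, here is a toy sketch of the core piece of a GPT: a single decoder block with causal self-attention. The dimensions are illustrative placeholders, not the configuration of the model I trained.

```python
# One GPT-style decoder block: causal self-attention + MLP, each with a residual connection.
import torch
from torch import nn

class GPTBlock(nn.Module):
    def __init__(self, d_model: int = 256, n_heads: int = 4):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Causal mask: True entries are blocked, so each token only attends to earlier tokens.
        seq_len = x.size(1)
        mask = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=x.device), diagonal=1
        )
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=mask)
        x = x + attn_out               # residual around attention
        x = x + self.mlp(self.ln2(x))  # residual around the feed-forward MLP
        return x

x = torch.randn(2, 16, 256)    # (batch, sequence, embedding)
print(GPTBlock()(x).shape)     # torch.Size([2, 16, 256])
```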