FastText Working and Implementation
Word embeddings have become an important part of modern natural language processing, but traditional approaches like Word2Vec struggle with out-of-vocabulary words and morphologically rich languages. FastText addresses these limitations through a subword-based approach that captures semantic meaning at the character level while maintaining computational efficiency.
Understanding FastText Architecture
FastText extends the Skip-gram and CBOW models by representing words as bags of character n-grams rather than atomic units. This fundamental shift allows the model to generate embeddings for previously unseen words and capture morphological relationships between related terms.
The Subword Approach
Traditional word embedding models treat each word as an indivisible token. FastText breaks words into character n-grams, enabling it to understand word structure and meaning at a granular level.
Consider the word "running":
- 3-grams: <ru, run, unn, nni, nin, ing, ng>
- 4-grams: <run, runn, unni, nnin, ning, ing>
- 5-grams: <runn, runni, unnin, nning, ning>
The angle brackets indicate word boundaries, helping the model distinguish between subwords that appear at different positions.
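To make the subword idea concrete, the following illustrative sketch (not part of the FastText library itself) generates boundary-marked character n-grams for a word; with minn=3 it reproduces the 3-grams listed above.
Python
def char_ngrams(word, minn=3, maxn=6):
    # Wrap the word in boundary markers so prefixes and suffixes stay distinct
    token = f"<{word}>"
    ngrams = []
    for n in range(minn, maxn + 1):
        for i in range(len(token) - n + 1):
            ngrams.append(token[i:i + n])
    return ngrams

print(char_ngrams("running", minn=3, maxn=3))
# ['<ru', 'run', 'unn', 'nni', 'nin', 'ing', 'ng>']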
Hierarchical Softmax Optimization
FastText can use hierarchical softmax in place of the standard softmax for computational efficiency. Rather than computing probabilities over the entire vocabulary, it arranges words as leaves of a binary tree; the probability of a word is the product of the binary decisions made at the internal nodes along the path from the root to that leaf (a usage sketch follows the list below).
Key advantages of hierarchical softmax:
- Reduces time complexity from O(V) to O(log V) where V is vocabulary size
- Uses Huffman coding to optimize frequent word access
- Maintains prediction accuracy while significantly improving training speed
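In the fastText Python API the loss function is selectable. The short sketch below, which assumes a plain-text corpus such as the training_data.txt file created in Step 2, switches skip-gram training to hierarchical softmax via loss='hs'.
Python
import fasttext

# Skip-gram with hierarchical softmax ('hs') instead of the default
# negative sampling; each update costs O(log V) rather than O(V)
model_hs = fasttext.train_unsupervised(
    'training_data.txt',   # plain-text corpus, one sentence per line
    model='skipgram',
    loss='hs'
)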
Step-by-Step Implementation
Step 1: Installing and Importing FastText
Install FastText using pip (pip install fasttext) and import it:
Python
import fasttext
Note: Use numpy==1.24.4 for compatibility with FastText
Step 2: Creating Training Data
- Prepares example sentences related to royalty, exercise and reading.
- Writes each sentence in lowercase into a text file for FastText training.
Python
def create_sample_data():
    # Sample sentences for training
    sentences = [
        "The king rules the kingdom",
        "The queen helps the king",
        "Running is good exercise",
        "The runner runs fast",
        "Walking is healthy activity",
        "The walker walks slowly",
        "Reading books is fun",
        "The reader reads daily"
    ]
    # Save to text file (one sentence per line)
    with open('training_data.txt', 'w') as f:
        for sentence in sentences:
            f.write(sentence.lower() + '\n')  # Convert to lowercase
    print("Training data created in 'training_data.txt'")

create_sample_data()
Output:
Training data created in 'training_data.txt'
Step 3: Training a Basic FastText Model
- Trains a skipgram model using FastText on the created text file.
- Saves the trained word vector model to a .bin file.
Python
def train_simple_model():
    # Train skipgram model (predicts context from target word)
    model = fasttext.train_unsupervised(
        'training_data.txt',  # Input file
        model='skipgram',
        dim=50,       # Embedding dimension
        epoch=10,     # Number of training iterations
        minCount=1,   # Minimum word frequency
        minn=3,       # Minimum character n-gram length
        maxn=6        # Maximum character n-gram length
    )
    model.save_model('word_vectors.bin')
    print("Model trained and saved as 'word_vectors.bin'")
    return model

model = train_simple_model()
Output:
Model trained and saved as 'word_vectors.bin'
Step 4: Getting Word Vectors
- Retrieves vector representations of words using the trained model.
- Shows vector values and shapes for words seen during training; a genuine out-of-vocabulary (OOV) example follows the output.
Python
def get_word_embeddings(model):
    king_vector = model.get_word_vector('king')
    print(f"Vector for 'king': {king_vector[:5]}...")
    print(f"Vector shape: {king_vector.shape}")
    # 'kingdom' also occurs in the training data; the same call works for
    # unseen words because vectors are composed from character n-grams
    kingdom_vector = model.get_word_vector('kingdom')
    print(f"Vector for 'kingdom': {kingdom_vector[:5]}...")
    return king_vector, kingdom_vector

king_vec, kingdom_vec = get_word_embeddings(model)
Output:
Vector for 'king': [-0.0001826 -0.00033079 0.0004302 0.00088911 -0.00164602]...
Vector shape: (50,)
Vector for 'kingdom': [ 0.00122273 0.00092931 -0.00018005 -0.00013839 -0.00051276]...
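Note that 'kingdom' actually appears in the training sentences, so it is not a genuine OOV case. The sketch below uses 'kingly', an arbitrarily chosen word that does not occur in training_data.txt, to show that a vector is still produced from shared character n-grams; get_subwords reveals which subwords contribute.
Python
# 'kingly' never occurs in the training file, so its vector is composed
# purely from character n-grams it shares with words like 'king'
oov_vector = model.get_word_vector('kingly')
print(oov_vector.shape)  # (50,), same dimensionality as in-vocabulary words

# Inspect the subwords (and their indices) used to build the representation
subwords, indices = model.get_subwords('kingly')
print(subwords[:5])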
Step 5: Finding Similar Words
- Uses the model to find top-k words most similar to a given query word.
- Displays similar words along with their similarity scores.
Python
def find_similar_words(model, word, k=3):
    print(f"\nWords similar to '{word}':")
    try:
        neighbors = model.get_nearest_neighbors(word, k)
        for i, (similarity, similar_word) in enumerate(neighbors, 1):
            print(f"{i}. {similar_word}: {similarity:.4f}")
    except Exception as e:
        print(f"Error: {e}")

find_similar_words(model, 'king')
find_similar_words(model, 'running')
Output:
Words similar to 'king':
1. walks: 0.2693
2. running: 0.1971
3. queen: 0.1912
Words similar to 'running':
1. runner: 0.4778
2. the: 0.3344
3. runs: 0.2653
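The scores reported by get_nearest_neighbors are cosine similarities between word vectors. As a sanity check, the same quantity can be recomputed by hand with NumPy (a minimal sketch reusing the model trained above):
Python
import numpy as np

def cosine_similarity(a, b):
    # Cosine of the angle between two embedding vectors
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

run_vec = model.get_word_vector('running')
runner_vec = model.get_word_vector('runner')
print(f"cosine(running, runner) = {cosine_similarity(run_vec, runner_vec):.4f}")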
Step 6: Text Classification Implementation
- Creates labeled movie review data with __label__ prefixes for classification.
- Stores the data in movie_reviews.txt.
Python
def create_classification_data():
    reviews = [
        ("This movie is amazing and fun", "positive"),
        ("Great acting and story", "positive"),
        ("Excellent film with good plot", "positive"),
        ("Wonderful cinematography", "positive"),
        ("Terrible movie very boring", "negative"),
        ("Bad acting and poor story", "negative"),
        ("Worst film ever made", "negative"),
        ("Boring and predictable plot", "negative")
    ]
    with open('movie_reviews.txt', 'w') as f:
        for text, label in reviews:
            f.write(f"__label__{label} {text.lower()}\n")
    print("Classification data created in 'movie_reviews.txt'")

create_classification_data()
Output:
Classification data created in 'movie_reviews.txt'
Step 7: Training Text Classifier
- Trains a FastText supervised model for sentiment classification.
- Saves the trained model to a file named text_classifier.bin.
Python
def train_text_classifier():
    classifier = fasttext.train_supervised(
        'movie_reviews.txt',
        epoch=25,       # Number of training iterations
        lr=0.1,         # Learning rate
        wordNgrams=2,   # Use word bigrams as additional features
        verbose=2
    )
    classifier.save_model('text_classifier.bin')
    print("Classifier trained and saved")
    return classifier

classifier = train_text_classifier()
Output:
Classifier trained and saved
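Before making predictions, the classifier can be scored with the test method, which returns the number of examples together with precision and recall at k=1. Here it is run on the training file itself purely for illustration, since this toy example has no separate test set.
Python
# Evaluate on a labeled file in the same __label__ format
n_examples, precision, recall = classifier.test('movie_reviews.txt')
print(f"Examples: {n_examples}, Precision@1: {precision:.4f}, Recall@1: {recall:.4f}")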
Step 8: Making Predictions
Python
def test_classifier(classifier):
    test_sentences = [
        "This is a fantastic movie",
        "Boring and terrible film",
        "Great story and acting",
        "Worst movie I have seen"
    ]
    print("\nClassification Results:")
    print("-" * 40)
    for sentence in test_sentences:
        labels, probabilities = classifier.predict(sentence, k=1)
        predicted_label = labels[0].replace('__label__', '')
        confidence = probabilities[0]
        print(f"Text: '{sentence}'")
        print(f"Prediction: {predicted_label} (confidence: {confidence:.4f})\n")

test_classifier(classifier)
Output:
Predicted label and confidence for each of the four test sentences (final classification results).
Edge Cases
- Character encoding issues: FastText requires consistent UTF-8 encoding across training and inference data. Mixed encodings can lead to inconsistent subword generation (a safeguard is sketched after this list).
- Optimal n-gram range: The choice of minimum and maximum n-gram lengths depends on the target language. For English, 3-6 character n-grams typically work well, while morphologically rich languages may benefit from longer ranges.
- Training data quality: FastText is sensitive to preprocessing decisions. Inconsistent tokenization or normalization can degrade model quality, particularly for subword-based features.
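For the encoding issue in particular, a simple safeguard is to pass an explicit UTF-8 encoding whenever training files are written or read, as in this minimal sketch:
Python
# Write (and later read) training files with an explicit UTF-8 encoding
with open('training_data.txt', 'w', encoding='utf-8') as f:
    f.write("the king rules the kingdom\n")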
Practical Applications
FastText excels in scenarios that require robust handling of morphological variations and out-of-vocabulary words. It's particularly effective for:
- Multilingual applications where training data may be limited for some languages
- Domain-specific text with specialized vocabulary not found in general corpora
- Real-time systems requiring fast inference and low memory overhead
- Text classification tasks where subword information provides discriminative features
The library's combination of efficiency and linguistic sophistication makes it a valuable tool for production NLP systems, especially when dealing with diverse or evolving vocabularies where traditional word-level approaches fall short.
Advantages and Limitations
Key Advantages
- OOV handling: Generates embeddings for unseen words through subword information
- Morphological awareness: Captures relationships between word variants (run, running, runner)
- Computational efficiency: Fast training and inference through hierarchical softmax
- Language flexibility: Works well with morphologically rich languages
Limitations
- Memory overhead: Requires more storage than traditional embeddings due to subword information
- Hyperparameter sensitivity: N-gram range (minn, maxn) significantly affects performance
- Limited semantic depth: May not capture complex semantic relationships as well as transformer-based models