FastText Working and Implementation
Word embeddings have become an important part of modern natural language processing, but traditional approaches like Word2Vec struggle with out-of-vocabulary words and morphologically rich languages. FastText addresses these limitations through a subword-based approach that captures semantic meaning at the character level while maintaining computational efficiency.
Understanding FastText Architecture
FastText extends the Skip-gram and CBOW models by representing words as bags of character n-grams rather than atomic units. This fundamental shift allows the model to generate embeddings for previously unseen words and capture morphological relationships between related terms.
The Subword Approach
Traditional word embedding models treat each word as an indivisible token. FastText breaks words into character n-grams, enabling it to understand word structure and meaning at a granular level.
Consider the word "running":
- 3-grams: <ru, run, unn, nni, nin, ing, ng>
- 4-grams: <run, runn, unni, nnin, ning, ing>
- 5-grams: <runn, runni, unnin, nning, ning>
The angle brackets indicate word boundaries, helping the model distinguish between subwords that appear at different positions.
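To make the subword idea concrete, the following illustrative sketch (not part of the FastText library itself) generates boundary-marked character n-grams for a word; with minn=3 it reproduces the 3-grams listed above.
Python
def char_ngrams(word, minn=3, maxn=6):
    # Wrap the word in boundary markers so prefixes and suffixes stay distinct
    token = f"<{word}>"
    ngrams = []
    for n in range(minn, maxn + 1):
        for i in range(len(token) - n + 1):
            ngrams.append(token[i:i + n])
    return ngrams

print(char_ngrams("running", minn=3, maxn=3))
# ['<ru', 'run', 'unn', 'nni', 'nin', 'ing', 'ng>']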
Hierarchical Softmax Optimization
FastText can use hierarchical softmax in place of the standard softmax for computational efficiency. Rather than computing probabilities over the entire vocabulary, it arranges words as leaves of a binary tree; the probability of a word is the product of the binary decisions made at the internal nodes along the path from the root to that leaf (a usage sketch follows the list below).
Key advantages of hierarchical softmax:
- Reduces time complexity from O(V) to O(log V) where V is vocabulary size
- Uses Huffman coding to optimize frequent word access
- Maintains prediction accuracy while significantly improving training speed
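In the fastText Python API the loss function is selectable. The short sketch below, which assumes a plain-text corpus such as the training_data.txt file created in Step 2, switches skip-gram training to hierarchical softmax via loss='hs'.
Python
import fasttext

# Skip-gram with hierarchical softmax ('hs') instead of the default
# negative sampling; each update costs O(log V) rather than O(V)
model_hs = fasttext.train_unsupervised(
    'training_data.txt',   # plain-text corpus, one sentence per line
    model='skipgram',
    loss='hs'
)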
Step-by-Step Implementation
Step 1: Installing and Importing FastText
Install FastText using pip (pip install fasttext) and import it:
Python
import fasttext
Note: Use numpy==1.24.4 for compatibility with FastText
Step 2: Creating Training Data
- Prepares example sentences related to royalty, exercise and reading.
- Writes each sentence in lowercase into a text file for FastText training.
Python
def create_sample_data():
    # Sample sentences for training
    sentences = [
        "The king rules the kingdom",
        "The queen helps the king",
        "Running is good exercise",
        "The runner runs fast",
        "Walking is healthy activity",
        "The walker walks slowly",
        "Reading books is fun",
        "The reader reads daily"
    ]
    # Save to text file (one sentence per line)
    with open('training_data.txt', 'w') as f:
        for sentence in sentences:
            f.write(sentence.lower() + '\n')  # Convert to lowercase
    print("Training data created in 'training_data.txt'")

create_sample_data()
Output:
Training data created in 'training_data.txt'
Step 3: Training a Basic FastText Model
- Trains a skipgram model using FastText on the created text file.
- Saves the trained word vector model to a .bin file.
Python
def train_simple_model():
    # Train skipgram model (predicts context from target word)
    model = fasttext.train_unsupervised(
        'training_data.txt',  # Input file
        model='skipgram',
        dim=50,       # Embedding dimension
        epoch=10,     # Number of training iterations
        minCount=1,   # Minimum word frequency
        minn=3,       # Minimum character n-gram length
        maxn=6        # Maximum character n-gram length
    )
    model.save_model('word_vectors.bin')
    print("Model trained and saved as 'word_vectors.bin'")
    return model

model = train_simple_model()
Output:
Model trained and saved as 'word_vectors.bin'
Step 4: Getting Word Vectors
- Retrieves vector representations of words using the trained model.
- Shows vector values and shapes for words seen during training; a genuine out-of-vocabulary (OOV) example follows the output.
Python
def get_word_embeddings(model):
    king_vector = model.get_word_vector('king')
    print(f"Vector for 'king': {king_vector[:5]}...")
    print(f"Vector shape: {king_vector.shape}")
    # 'kingdom' also occurs in the training data; the same call works for
    # unseen words because vectors are composed from character n-grams
    kingdom_vector = model.get_word_vector('kingdom')
    print(f"Vector for 'kingdom': {kingdom_vector[:5]}...")
    return king_vector, kingdom_vector

king_vec, kingdom_vec = get_word_embeddings(model)
Output:
Vector for 'king': [-0.0001826 -0.00033079 0.0004302 0.00088911 -0.00164602]...
Vector shape: (50,)
Vector for 'kingdom': [ 0.00122273 0.00092931 -0.00018005 -0.00013839 -0.00051276]...
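Note that 'kingdom' actually appears in the training sentences, so it is not a genuine OOV case. The sketch below uses 'kingly', an arbitrarily chosen word that does not occur in training_data.txt, to show that a vector is still produced from shared character n-grams; get_subwords reveals which subwords contribute.
Python
# 'kingly' never occurs in the training file, so its vector is composed
# purely from character n-grams it shares with words like 'king'
oov_vector = model.get_word_vector('kingly')
print(oov_vector.shape)  # (50,), same dimensionality as in-vocabulary words

# Inspect the subwords (and their indices) used to build the representation
subwords, indices = model.get_subwords('kingly')
print(subwords[:5])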
Step 5: Finding Similar Words
- Uses the model to find top-k words most similar to a given query word.
- Displays similar words along with their similarity scores.
Python
def find_similar_words(model, word, k=3):
    print(f"\nWords similar to '{word}':")
    try:
        neighbors = model.get_nearest_neighbors(word, k)
        for i, (similarity, similar_word) in enumerate(neighbors, 1):
            print(f"{i}. {similar_word}: {similarity:.4f}")
    except Exception as e:
        print(f"Error: {e}")

find_similar_words(model, 'king')
find_similar_words(model, 'running')
Output:
Words similar to 'king':
1. walks: 0.2693
2. running: 0.1971
3. queen: 0.1912
Words similar to 'running':
1. runner: 0.4778
2. the: 0.3344
3. runs: 0.2653
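The scores reported by get_nearest_neighbors are cosine similarities between word vectors. As a sanity check, the same quantity can be recomputed by hand with NumPy (a minimal sketch reusing the model trained above):
Python
import numpy as np

def cosine_similarity(a, b):
    # Cosine of the angle between two embedding vectors
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

run_vec = model.get_word_vector('running')
runner_vec = model.get_word_vector('runner')
print(f"cosine(running, runner) = {cosine_similarity(run_vec, runner_vec):.4f}")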
Step 6: Text Classification Implementation
- Creates labeled movie review data with __label__ prefixes for classification.
- Stores the data in movie_reviews.txt.
Python
def create_classification_data():
    reviews = [
        ("This movie is amazing and fun", "positive"),
        ("Great acting and story", "positive"),
        ("Excellent film with good plot", "positive"),
        ("Wonderful cinematography", "positive"),
        ("Terrible movie very boring", "negative"),
        ("Bad acting and poor story", "negative"),
        ("Worst film ever made", "negative"),
        ("Boring and predictable plot", "negative")
    ]
    with open('movie_reviews.txt', 'w') as f:
        for text, label in reviews:
            f.write(f"__label__{label} {text.lower()}\n")
    print("Classification data created in 'movie_reviews.txt'")

create_classification_data()
Output:
Classification data created in 'movie_reviews.txt'
Step 7: Training Text Classifier
- Trains a FastText supervised model for sentiment classification.
- Saves the trained model to a file named text_classifier.bin.
Python
def train_text_classifier():
    classifier = fasttext.train_supervised(
        'movie_reviews.txt',
        epoch=25,       # Number of training iterations
        lr=0.1,         # Learning rate
        wordNgrams=2,   # Use word bigrams as additional features
        verbose=2
    )
    classifier.save_model('text_classifier.bin')
    print("Classifier trained and saved")
    return classifier

classifier = train_text_classifier()
Output:
Classifier trained and saved
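Before making predictions, the classifier can be scored with the test method, which returns the number of examples together with precision and recall at k=1. Here it is run on the training file itself purely for illustration, since this toy example has no separate test set.
Python
# Evaluate on a labeled file in the same __label__ format
n_examples, precision, recall = classifier.test('movie_reviews.txt')
print(f"Examples: {n_examples}, Precision@1: {precision:.4f}, Recall@1: {recall:.4f}")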
Step 8: Making Predictions
Python
def test_classifier(classifier):
    test_sentences = [
        "This is a fantastic movie",
        "Boring and terrible film",
        "Great story and acting",
        "Worst movie I have seen"
    ]
    print("\nClassification Results:")
    print("-" * 40)
    for sentence in test_sentences:
        labels, probabilities = classifier.predict(sentence, k=1)
        predicted_label = labels[0].replace('__label__', '')
        confidence = probabilities[0]
        print(f"Text: '{sentence}'")
        print(f"Prediction: {predicted_label} (confidence: {confidence:.4f})\n")

test_classifier(classifier)
Output:
Predicted label and confidence for each of the four test sentences (final classification results).
Edge Cases
- Character encoding issues: FastText requires consistent UTF-8 encoding across training and inference data. Mixed encodings can lead to inconsistent subword generation (a safeguard is sketched after this list).
- Optimal n-gram range: The choice of minimum and maximum n-gram lengths depends on the target language. For English, 3-6 character n-grams typically work well, while morphologically rich languages may benefit from longer ranges.
- Training data quality: FastText is sensitive to preprocessing decisions. Inconsistent tokenization or normalization can degrade model quality, particularly for subword-based features.
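For the encoding issue in particular, a simple safeguard is to pass an explicit UTF-8 encoding whenever training files are written or read, as in this minimal sketch:
Python
# Write (and later read) training files with an explicit UTF-8 encoding
with open('training_data.txt', 'w', encoding='utf-8') as f:
    f.write("the king rules the kingdom\n")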
Practical Applications
FastText excels in scenarios that require robust handling of morphological variations and out-of-vocabulary words. It's particularly effective for:
- Multilingual applications where training data may be limited for some languages
- Domain-specific text with specialized vocabulary not found in general corpora
- Real-time systems requiring fast inference and low memory overhead
- Text classification tasks where subword information provides discriminative features
The library's combination of efficiency and linguistic sophistication makes it a valuable tool for production NLP systems, especially when dealing with diverse or evolving vocabularies where traditional word-level approaches fall short.
Advantages and Limitations
Key Advantages
- OOV handling: Generates embeddings for unseen words through subword information
- Morphological awareness: Captures relationships between word variants (run, running, runner)
- Computational efficiency: Fast training and inference through hierarchical softmax
- Language flexibility: Works well with morphologically rich languages
Limitations
- Memory overhead: Requires more storage than traditional embeddings due to subword information
- Hyperparameter sensitivity: N-gram range (minn, maxn) significantly affects performance
- Limited semantic depth: May not capture complex semantic relationships as well as transformer-based models