Questions tagged [clustering]
Cluster analysis is the task of partitioning data into subsets of objects according to their mutual "similarity," without using preexisting knowledge such as class labels. [Clustered-standard-errors and/or cluster-samples should be tagged as such; do NOT use the "clustering" tag for them.]
                4,044 questions
            
            
    0 votes, 0 answers, 15 views
        
    Pattern analysis for time between events data
                I am trying to subset data based on a pattern of "strings" or clusters of food deliveries to young that I see in my data (see plots labeled 2, 4, 5, 6, and 8 in the figure below for the most ...
            
        
       
    
    0 votes, 0 answers, 24 views
        
    How to identify and quantify main tendencies across participants from cluster membership heatmaps?
                I'd appreciate your thoughts on the following problem.
I've created a heatmap plot (attached) showing the cluster membership ratio for each participant (in separate subplots) and condition (η).
Now, I'...
            
        
       
    
    1 vote, 1 answer, 112 views
        
    Examining country-level effects based on individual-level data combined with country-level data
                I am new to working with country-level effects in comparative OLS regression with individual-level data. Are there any good resources for this?
Suppose my dependent variable is social integration (an ...
            
        
       
    
    0 votes, 0 answers, 42 views
        
    Are there clustering algorithms or preprocessing strategies tailored for zero-inflated and continuous data types?
                I am currently working on a project where I need to assign customers across N recipes before A/B testing so that each customer's KPIs are balanced across recipes (to reduce pre-test bias).
Dataset ...
            
        
       
    
    0 votes, 0 answers, 52 views
        
    How to perform clustering on heavily right-skewed and zero-inflated data
                I am currently working on clustering continuous variables (such as AOV, RPV, and conversions (conversion/visits)). The variables are heavily right-skewed with long tails and one variable is dominated ...
            
        
       
    
    3 votes, 1 answer, 118 views
        
    Bayesian Clustering with a Finite Gaussian Mixture Model with Missing Data
                I would like to perform clustering with a finite Gaussian mixture model; however, I have missing data (some features are missing at random). I am using variational inference to fit my Bayesian GMM. Is ...
            
        
       
    
    2 votes, 0 answers, 65 views
        
    Estimating number of clusters using Scikit Bayesian GMM
                I am generating clustering data using the Bayesian mixture of Gaussian models described in Bishop's Pattern Recognition and Machine Learning textbook, with model parameters drawn from the following ...
            
        
       
    
    1 vote, 1 answer, 59 views
        
    Mixture-Based Clustering for Ordered Stereotype Model - Distance Scores
                I have an ordinal survey data set with 5 variables, each with 3 category levels, e.g., 5 health variables rated 1-3 (good-moderate-poor).
I want to row-cluster the different responses. I also want to determine whether ...
            
        
       
    
    1 vote, 0 answers, 52 views
        
    Are equal and diagonal variance matrices implicitly assumed in k-means clustering?
                When applying k-means clustering, I understand that the goal is to partition the dataset by assigning each point to its nearest cluster center. However, I’ve come across statements that k-means can be ...
            
        
       
    
    1 vote, 0 answers, 71 views
        
    "How to validate if a dataset has natural clusters?"
                I've recently learnt unsupervised learning methods such as KMeans and DBSCAN.
While working on this dataset, I applied KMeans clustering but faced the following issues: The Elbow Method showed no ...
            
        
       
    
    0 votes, 1 answer, 58 views
        
    Data cross validation to predict label from cluster analysis [closed]
                My project has the following steps:
Use the elbow method to determine the features and number of clusters for k-means.
Run kmeans on the data (with determined features and n clusters), and gives the ...
            
        
       
    
    0 votes, 0 answers, 28 views
        
    What is the interval of values of the CDbw index for clustering internal evaluation?
                I'm currently studying the CDbw (Composed Density between and within clusters) index, which is a metric designed for internal clustering evaluation.
The original article of this index was published in ...
            
        
       
    
    0 votes, 0 answers, 72 views
        
    How can UMAP improve HDBSCAN clustering results when it also uses nearest neighbors (i.e., clustering) internally?
                I went through UMAP's official documentation, which says that HDBSCAN, being a density-based algorithm, suffers from the curse of dimensionality and that reducing dimensions with UMAP can improve the results. But! ...
            
        
       
    
    0 votes, 0 answers, 48 views
        
    Cluster Trajectories in LCGA and GMM: Stable Levels vs. Directional Trends
                I am currently performing latent class growth analysis (LCGA) and growth mixture modeling (GMM) to identify distinct subgroups within my study population based on the longitudinal trajectories of a ...
            
        
       
    
    4 votes, 2 answers, 112 views
        
    Clustering based on the longitudinal trajectory of a single continuous variable
                I am currently working on a longitudinal dataset in which I aim to cluster individuals based on the trajectory of a single continuous variable measured repeatedly across time (e.g., daily values). The ...
            
        
       
     
        