Questions tagged [clustering]
Cluster analysis is the task of partitioning data into subsets of objects according to their mutual "similarity," without using preexisting knowledge such as class labels. [Clustered-standard-errors and/or cluster-samples should be tagged as such; do NOT use the "clustering" tag for them.]
4,044 questions
0 votes, 0 answers, 15 views
Pattern analysis for time between events data
I am trying to subset data based on a pattern of "strings" or clusters of food deliveries to young that I see in my data (see plots labeled 2, 4, 5, 6, and 8 in the figure below for the most ...
0 votes, 0 answers, 24 views
How to identify and quantify main tendencies across participants from cluster membership heatmaps?
I'd appreciate your thoughts on the following problem.
I've created a heatmap plot (attached) showing the cluster membership ratio for each participant (in separate subplots) and condition (η).
Now, I'...
1 vote, 1 answer, 112 views
Examining country-level effects based on individual-level data combined with country-level data
I am new to working with country-level effects in comparative OLS regression with individual-level data. Are there any good resources for this?
Suppose my dependent variable is social integration (an ...
0 votes, 0 answers, 42 views
Are there clustering algorithms or preprocessing strategies tailored for zero-inflated and continuous data types?
I am currently working on a project where I need to assign customers across N recipes before A/B testing, such that the KPIs of the customers are balanced across recipes (to reduce pre-test bias).
Dataset ...
0 votes, 0 answers, 52 views
How to perform clustering on heavily right-skewed and zero-inflated data
I am currently working on clustering continuous variables (such as AOV, RPV, and conversions (conversions/visits)). The variables are heavily right-skewed with long tails, and one variable is dominated ...
3 votes, 1 answer, 118 views
Bayesian Clustering with a Finite Gaussian Mixture Model with Missing Data
I would like to perform clustering with a finite Gaussian Mixture model, however, I have missing data (some features are missing at random). I am using Variational Inference to fit my Bayesian GMM. Is ...
2 votes, 0 answers, 65 views
Estimating number of clusters using Scikit Bayesian GMM
I am generating clustering data using the Bayesian mixture of Gaussian models described in Bishop's Pattern Recognition and Machine Learning textbook, with model parameters drawn from the following ...
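A common way to let scikit-learn's `BayesianGaussianMixture` suggest a number of clusters is to fit it with more components than you expect, using a Dirichlet-process prior on the mixture weights, and then count the components whose fitted weight is not negligible. The sketch below assumes that approach; the synthetic three-blob data, the `0.01` weight threshold, the prior value, and the helper name `effective_components` are illustrative assumptions, not the asker's setup or Bishop's exact model.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture


def effective_components(X, max_components=10, weight_threshold=0.01, seed=0):
    """Hypothetical helper: fit a truncated Dirichlet-process GMM and count
    the components whose posterior weight exceeds `weight_threshold`."""
    bgmm = BayesianGaussianMixture(
        n_components=max_components,  # truncation level (an upper bound), not the estimate
        weight_concentration_prior_type="dirichlet_process",
        weight_concentration_prior=0.01,  # small value lets unused components shrink toward zero weight
        max_iter=1000,
        random_state=seed,
    )
    bgmm.fit(X)
    weights = bgmm.weights_
    return int(np.sum(weights > weight_threshold)), weights


# Synthetic data (assumption): three well-separated 2-D Gaussian blobs
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [8.0, 0.0], [0.0, 8.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(200, 2)) for c in centers])

k, weights = effective_components(X)
print("effective components:", k)
print("sorted weights:", np.round(np.sort(weights)[::-1], 3))
```

The threshold is a judgment call: how many weights end up non-negligible depends on the concentration prior, the truncation level, and how well separated the true clusters are.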
1 vote, 1 answer, 59 views
Mixture-Based Clustering for Ordered Stereotype Model - Distance Scores
I have an ordinal survey data set with 5 variables, each with 3 category levels, e.g., 5 health variables ranked 1-3 (good, moderate, poor).
I want to row-cluster the responses, but I also want to determine whether ...
1 vote, 0 answers, 52 views
Are equal and diagonal variance matrices implicitly assumed in k-means clustering?
When applying k-means clustering, I understand that the goal is to partition the dataset by assigning each point to its nearest cluster center. However, I’ve come across statements that k-means can be ...
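The description above (assign each point to its nearest cluster center) is the assignment step of Lloyd's algorithm. The minimal NumPy sketch below is written from scratch for illustration, not taken from any library; it makes explicit that every cluster is scored with the same unweighted squared Euclidean distance. In the Gaussian-mixture view, this corresponds to every cluster sharing one isotropic covariance σ²I (a special case of equal, diagonal covariance) with hard assignments, in the small-variance limit. The synthetic two-blob data is an assumption.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    # Initialize centers with k distinct data points
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Assignment step: squared Euclidean distance, the same metric for every cluster
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: each center moves to the mean of its assigned points
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Synthetic data (assumption): two well-separated 2-D blobs of 50 points each
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, size=(50, 2)),
               rng.normal(10.0, 1.0, size=(50, 2))])
labels, centers = kmeans(X, k=2)
print(centers)
```

Because the distance ignores per-cluster scale and orientation, elongated or differently sized clusters tend to be split or merged in ways a full-covariance mixture model would avoid.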
1 vote, 0 answers, 71 views
How to validate if a dataset has natural clusters?
I've recently learnt unsupervised learning methods such as KMeans and DBSCAN.
While working on this dataset, I applied KMeans clustering but faced the following issues: The Elbow Method showed no ...
0 votes, 1 answer, 58 views
Data cross validation to predict label from cluster analysis [closed]
My project has the following steps:
Use the elbow method to determine the features and number of clusters for k-means.
Run k-means on the data (with the determined features and n clusters), and gives the ...
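The elbow method in step 1 amounts to fitting k-means for a range of k and looking for the value after which the within-cluster sum of squares (scikit-learn's `inertia_`) stops dropping sharply. A minimal sketch, assuming scikit-learn's `KMeans`; the synthetic four-blob data and the range of k are illustrative assumptions, and the sketch covers only the choice of k, not feature selection.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic data (assumption): four well-separated 2-D Gaussian blobs
rng = np.random.default_rng(0)
centers = np.array([[0, 0], [6, 0], [0, 6], [6, 6]], dtype=float)
X = np.vstack([c + rng.normal(scale=0.6, size=(100, 2)) for c in centers])

ks = range(1, 9)
inertias = []
for k in ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(km.inertia_)  # within-cluster sum of squared distances

for k, w in zip(ks, inertias):
    print(f"k={k}: inertia={w:.1f}")
```

On data like this, the drop in inertia from k=3 to k=4 is large and the drop after k=4 is small, which is the "elbow". On real data the bend is often much less clear, which is one reason the method can be inconclusive.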
0 votes, 0 answers, 28 views
What is the interval of values of the CDbw index for clustering internal evaluation?
I'm currently studying the CDbw (Compose Density between and within clusters) index, which is a metric designed for internal clustering evaluation.
The original article of this index was published in ...
0 votes, 0 answers, 72 views
How can UMAP improve HDBSCAN clustering results when it also uses nearest neighbors (i.e., clustering) internally?
I went through UMAP's official documentation, which says that HDBSCAN, being a density-based algorithm, suffers from the curse of dimensionality, and that reducing dimensions with UMAP can improve the results. But! ...
0 votes, 0 answers, 48 views
Cluster Trajectories in LCGA and GMM: Stable Levels vs. Directional Trends
I am currently performing latent class growth analysis (LCGA) and growth mixture modeling (GMM) to identify distinct subgroups within my study population based on the longitudinal trajectories of a ...
4 votes, 2 answers, 112 views
Clustering based on the longitudinal trajectory of a single continuous variable
I am currently working on a longitudinal dataset in which I aim to cluster individuals based on the trajectory of a single continuous variable measured repeatedly across time (e.g., daily values). The ...