Questions tagged [clustering]

Cluster analysis is the task of partitioning data into subsets of objects according to their mutual "similarity," without using preexisting knowledge such as class labels. [Clustered-standard-errors and/or cluster-samples should be tagged as such; do NOT use the "clustering" tag for them.]

0 votes
0 answers
15 views

Pattern analysis for time between events data

I am trying to subset data based on a pattern of "strings" or clusters of food deliveries to young that I see in my data (see plots labeled 2, 4, 5, 6, and 8 in the figure below for the most ...
thegrayson
0 votes
0 answers
24 views

How to identify and quantify main tendencies across participants from cluster membership heatmaps?

I'd appreciate your thoughts on the following problem. I've created a heatmap plot (attached) showing the cluster membership ratio for each participant (in separate subplots) and condition (η). Now, I'...
maria mystakidou
1 vote
1 answer
112 views

Examining country-level effects based on individual-level data combined with country-level data

I am new to working with country-level effects in comparative OLS regression with individual-level data. Are there any good resources for this? Suppose my dependent variable is social integration (an ...
Olestan
  • 51
0 votes
0 answers
42 views

Are there clustering algorithms or preprocessing strategies tailored for zero-inflated and continuous data types?

I am currently working on a project where I need to assign customers across N recipes before A/B testing such that the KPIs for each customer are balanced across recipes (to reduce pre-test bias). Dataset ...
Rishab
  • 1
0 votes
0 answers
52 views

How to perform clustering on heavily right-skewed and zero-inflated data

I am currently working on clustering continuous variables (such as AOV, RPV, and conversions (conversion/visits)). The variables are heavily right-skewed with long tails and one variable is dominated ...
Rishab
  • 1
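
One preprocessing option often tried for variables like these can be sketched in plain Python: record whether a value is exactly zero, and compress the long right tail with `log1p`. This is only an illustration under stated assumptions, not something taken from the question or its answers; the function name `zero_flag_and_log` and the sample `aov` values are hypothetical.

```python
from math import log1p

def zero_flag_and_log(values):
    """Turn each non-negative value into (is_zero flag, log1p-transformed value)."""
    out = []
    for v in values:
        if v < 0:
            # log1p is only used here for non-negative, right-skewed quantities
            raise ValueError("this sketch assumes non-negative values")
        out.append((1 if v == 0 else 0, log1p(v)))
    return out

# Hypothetical, made-up values: many zeros plus a long right tail
aov = [0.0, 0.0, 12.5, 40.0, 2500.0]
features = zero_flag_and_log(aov)
```

The flag keeps the zero mass visible as its own feature, while the log scale stops a handful of very large values from dominating a distance-based method. Whether this suits a given dataset is a judgment call.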
3 votes
1 answer
118 views

Bayesian Clustering with a Finite Gaussian Mixture Model with Missing Data

I would like to perform clustering with a finite Gaussian Mixture model; however, I have missing data (some features are missing at random). I am using Variational Inference to fit my Bayesian GMM. Is ...
Tom
  • 1,112
2 votes
0 answers
65 views

Estimating number of clusters using Scikit Bayesian GMM

I am generating clustering data using the Bayesian mixture of Gaussian models described in Bishop's Pattern Recognition and Machine Learning textbook, with model parameters drawn from the following ...
PJB
  • 21
1 vote
1 answer
59 views

Mixture-Based Clustering for Ordered Stereotype Model - Distance Scores

I have an ordinal survey data set with 5 variables and 3 category levels, e.g. 5 health variables ranked 1-3 (good-moderate-poor). I want to row-cluster different responses. But I also want to determine whether ...
EB3112
  • 264
1 vote
0 answers
52 views

Are equal and diagonal variance matrices implicitly assumed in k-means clustering?

When applying k-means clustering, I understand that the goal is to partition the dataset by assigning each point to its nearest cluster center. However, I’ve come across statements that k-means can be ...
EngineerMathlover
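
The assignment rule described in the excerpt above can be sketched in a few lines of plain Python. This is a minimal illustration, not code from the question; the function names `assign_to_nearest` and `update_centers` and the toy data are assumptions. Note that plain Euclidean distance weights every feature equally, which is the property usually behind statements about k-means and equal, spherical variances.

```python
from math import dist  # Euclidean distance, Python 3.8+

def assign_to_nearest(points, centers):
    """Return, for each point, the index of its closest center."""
    return [
        min(range(len(centers)), key=lambda j: dist(p, centers[j]))
        for p in points
    ]

def update_centers(points, labels, centers):
    """Move each center to the mean of its assigned points.

    A center with no assigned points is left where it was.
    """
    new_centers = []
    for j, old in enumerate(centers):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if not members:
            new_centers.append(old)
            continue
        new_centers.append(
            tuple(sum(m[d] for m in members) / len(members) for d in range(len(old)))
        )
    return new_centers

# Toy data (made up): two obvious groups in 2-D
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
centers = [(0.0, 0.0), (5.0, 5.0)]

labels = assign_to_nearest(points, centers)         # one assignment step
centers = update_centers(points, labels, centers)   # one update step
```

Repeating the two steps until the labels stop changing gives the usual Lloyd-style iteration.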
1 vote
0 answers
71 views

How to validate if a dataset has natural clusters?

I've recently learnt unsupervised learning methods such as KMeans and DBSCAN. While working on this dataset, I applied KMeans clustering but faced the following issues: The Elbow Method showed no ...
ssmalik
  • 41
0 votes
1 answer
58 views

Data cross validation to predict label from cluster analysis [closed]

My project has the following steps: Use elbow method to determine the features and number of clusters for kmeans. Run kmeans on the data (with determined features and n clusters), and gives the ...
Xin Niu
  • 103
0 votes
0 answers
28 views

What is the interval of values of the CDbw index for clustering internal evaluation?

I'm currently studying the CDbw (Compose Density between and within clusters) index, which is a metric designed for internal clustering evaluation. The original article on this index was published in ...
DavideChicco.it
0 votes
0 answers
72 views

How can UMAP improve HDBSCAN clustering results when it also uses nearest neighbors (i.e., clustering) internally?

I went through UMAP's official documentation, which says that HDBSCAN, being a density-based algorithm, suffers from the curse of dimensionality and that reducing dimensions with UMAP can improve the results. But! ...
Shradha
0 votes
0 answers
48 views

Cluster Trajectories in LCGA and GMM: Stable Levels vs. Directional Trends

I am currently performing latent class growth analysis (LCGA) and growth mixture modeling (GMM) to identify distinct subgroups within my study population based on the longitudinal trajectories of a ...
Konstantinos Gkirgkiris
4 votes
2 answers
112 views

Clustering based on the longitudinal trajectory of a single continuous variable

I am currently working on a longitudinal dataset in which I aim to cluster individuals based on the trajectory of a single continuous variable measured repeatedly across time (e.g., daily values). The ...
Konstantinos Gkirgkiris
