You may have heard about Segment Anything or Depth Anything. Now there’s also Match Anything, available in the Transformers library. Read more below:
Exciting model addition to Hugging Face Transformers: MatchAnything is now available! 🔥

MatchAnything is a strong universal image matching model, pre-trained at large scale on data spanning different imaging modalities. This allows it to generalize remarkably well to unseen multi-modality matching and registration tasks.

Image matching has many applications: image stitching (think of the "panorama" feature on your phone), merging satellite images for display on Google Earth or Street View, stitching together medical images from a scan, and more.

The key contribution of the MatchAnything paper is its pre-training framework. The authors collect a massive, diverse dataset synthesized with cross-modal stimulus signals: multi-view images with 3D reconstructions, large-scale unlabelled video sequences, and vast single-image datasets. They also add synthetic data produced with image generation techniques (like style transfer and depth estimation). Training on such diverse data teaches the model to recognize fundamental, appearance-insensitive structures.

The authors applied their framework to train 2 popular image matching models: EfficientLoFTR and ROMA. EfficientLoFTR was recently integrated into the Transformers library, which is why the weights released by the MatchAnything authors are directly usable there (a quick usage sketch is included at the end of this post). The model is Apache 2.0 licensed, which means you can use it for commercial purposes too!

Big kudos to Steven Bucaille for making image matching models more easily accessible to the community.

Resources:
- model: https://lnkd.in/eba9Fukx
- docs: https://lnkd.in/eba9Fukx
- demo: https://lnkd.in/ekruNM7W
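
Here is a minimal sketch of what matching two images could look like with the Transformers keypoint-matching API. The class names follow the pattern used by other keypoint-matching models in the library (SuperGlue, LightGlue); the checkpoint id, image file names, and threshold below are assumptions on my part, so check the model card and docs linked above for the exact values.

import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForKeypointMatching

# Checkpoint id is an assumption; see the model link above for the released weights.
checkpoint = "zju-community/matchanything_eloftr"
processor = AutoImageProcessor.from_pretrained(checkpoint)
model = AutoModelForKeypointMatching.from_pretrained(checkpoint)

# Two views of the same scene, e.g. consecutive frames for a panorama (example file names).
image1 = Image.open("view_left.jpg")
image2 = Image.open("view_right.jpg")
images = [image1, image2]

# The processor prepares the image pair for the model.
inputs = processor(images, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Post-processing keeps only confident matches (example threshold) and maps
# keypoints back to original image coordinates.
image_sizes = [[(img.height, img.width) for img in images]]
matches = processor.post_process_keypoint_matching(outputs, image_sizes, threshold=0.2)

for kp0, kp1, score in zip(
    matches[0]["keypoints0"], matches[0]["keypoints1"], matches[0]["matching_scores"]
):
    print(f"({kp0[0]}, {kp0[1]}) <-> ({kp1[0]}, {kp1[1]}), score {score:.2f}")

The matched keypoint pairs can then be fed to a homography or pose estimator (e.g. with OpenCV) to actually stitch or register the images.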