Duan et al., 2022

Position-aware image captioning with spatial relation

Document ID: 12743885603719737125
Author: Duan Y; Wang Z; Wang J; Wang Y; Lin C
Publication year: 2022
Publication venue: Neurocomputing

Snippet

Image captioning aims to generate a natural-language description of a given image. The problem can be addressed by learning semantic information about visual objects and generating descriptions from the extracted embeddings. However, the spatial relationship between visual objects …

Classifications

- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
        - G06F17/20—Handling natural language data
        - G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
          - G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
          - G06F17/30781—Information retrieval; Database structures therefor; File system structures therefor of video data
            - G06F17/30784—Information retrieval; Database structures therefor; File system structures therefor of video data using features automatically derived from the video content, e.g. descriptors, fingerprints, signatures, genre
    - G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
      - G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
        - G06K9/00442—Document analysis and understanding; Document recognition
        - G06K9/20—Image acquisition
          - G06K9/34—Segmentation of touching or overlapping patterns in the image field
            - G06K9/342—Cutting or merging image elements, e.g. region growing, watershed, clustering-based techniques
        - G06K9/36—Image preprocessing, i.e. processing the image information without deciding about the identity of the image
          - G06K9/46—Extraction of features or characteristics of the image
        - G06K9/62—Methods or arrangements for recognition using electronic means
          - G06K9/6201—Matching; Proximity measures
            - G06K9/6202—Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
          - G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
          - G06K9/6267—Classification techniques
            - G06K9/6268—Classification techniques relating to the classification paradigm, e.g. parametric or non-parametric approaches
          - G06K9/68—Methods or arrangements for recognition using electronic means using sequential comparisons of the image signals with a plurality of references in which the sequence of the image signals or the references is relevant, e.g. addressable memory
      - G06K2209/00—Indexing scheme relating to methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    - G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
      - G06N3/00—Computer systems based on biological models
        - G06N3/02—Computer systems based on biological models using neural network models
      - G06N5/00—Computer systems utilising knowledge based models
      - G06N99/00—Subject matter not provided for in other groups of this subclass
        - G06N99/005—Learning machines, i.e. computer in which a programme is changed according to experience gained by the machine itself during a complete run
    - G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
      - G06T7/00—Image analysis

Similar Documents

Publication	Publication Date	Title
CN111858954B (en)	2022-12-13	Task-oriented text-generated image network model
Gu et al.	2019	Unpaired image captioning via scene graph alignments
Chen et al.	2022	D3Net: A unified speaker-listener architecture for 3d dense captioning and visual grounding
Kim et al.	2019	Semantic sentence matching with densely-connected recurrent and co-attentive information
Woo et al.	2018	Linknet: Relational embedding for scene graph
Chowdhury et al.	2025	Envqa: Improving visual question answering model by enriching the visual feature
Li et al.	2021	Transformer-based language-person search with multiple region slicing
Duan et al.	2022	Position-aware image captioning with spatial relation
Hu et al.	2021	A novel visual representation on text using diverse conditional gan for visual recognition
Lu et al.	2023	Prediction calibration for generalized few-shot semantic segmentation
Al Badarneh et al.	2025	An ensemble model with attention based mechanism for image captioning
Liu et al.	2024	Content-guided spatial–spectral integration network for change detection in HR remote sensing images
Zhao et al.	2022	Aligned visual semantic scene graph for image captioning
CN117131923B (en)	2025-08-12	A backdoor attack method and related device for cross-modal learning
Wang et al.	2022	Spatial-semantic collaborative graph network for textbook question answering
Yang et al.	2025	Pseudo-label enhancement for weakly supervised object detection using self-supervised vision transformer
Zhao et al.	2025	Grad-eclip: Gradient-based visual and textual explanations for clip
Lin et al.	2024	Decoupling foreground and background with Siamese ViT networks for weakly-supervised semantic segmentation
Zheng et al.	2020	BLAN: Bi-directional ladder attentive network for facial attribute prediction
Shao et al.	2022	Multi-stream feature refinement network for human object interaction detection
Xu et al.	2023	Panel-page-aware comic genre understanding
Zhu et al.	2024	Multi-modal large language model enhanced pseudo 3d perception framework for visual commonsense reasoning
Cao et al.	2025	Co-dance with Ambiguity: An Ambiguity-Aware Facial Expression Recognition Framework for More Robustness
Yellinek et al.	2025	3vl: Using trees to improve vision-language models’ interpretability
Li et al.	2021	Diversified text-to-image generation via deep mutual information estimation