Web Science and Digital Libraries Research Group

Showing posts with the label Table Detection

2023-08-10: A Study on Reproducibility and Replicability of Table Structure Recognition Methods

By Kenny Ajayi - August 10, 2023

Introduction

Concerns about the reproducibility, replicability, and generalizability (RR&G) of findings have gained substantial attention in the social and behavioral sciences and, more recently, in artificial intelligence (AI). These concerns have evolved over the past decade and have been recognized in top-tier journals. Inconsistent terminology has led to the adoption of the precise definitions from Goodman et al. (2016) [7]: reproducibility refers to consistent computational results under the same conditions, replicability to consistent results on similar datasets, and generalizability to consistent results across different experimental contexts. Reproducibility studies in AI have mostly targeted empirical and computational AI, focusing on open datasets, code availability, and metadata documentation. However, efforts towards the replicability of AI research have remained...

2023-01-10: A Summary of "Multi-Type-TD-TSR -- Extracting Tables from Document Images using a Multi-stage Pipeline for Table Detection and Table Structure Recognition: from OCR to Structured Table Representations"

By Kenny Ajayi - January 10, 2023

Figure 1: Detecting tables and extracting table cell structures in a document image (Fig 2 in Fischer et al.)

In the past decades, several works have been published on detecting and extracting tables, both from in-text tables and from tables appearing in born-digital or scanned PDF documents [Pyreddi et al.]. Early work focused on heuristics, such as character alignment in table images, to extract tables [Pyreddi et al.]. More recent work detects the corners of table cells in document images and infers their connectivity [Seo et al.] (see Figure 2).

Figure 2: Detecting table cell corners (Fig 2 in...

2022-12-29: A Summary of "CascadeTabNet: An approach for end to end table detection and structure recognition from image-based documents"

By Kenny Ajayi - December 29, 2022

Figure 1: Traditional and Deep Learning Approaches to Table Recognition (Hashmi et al.)

Table recognition refers to the process of using optical character recognition (OCR) and machine learning (ML) models to identify the rows, columns, and individual text cells of tables in digital documents, whether born-digital or scanned PDFs. The task has been studied for more than two decades as a way to automatically extract textual information from a variety of tables [Kieninger et al., Wei et al.]. Automatic table recognition can be very challenging because tables have different structures, data types, and misaligned data e...