Posts

Showing posts with the label DDU

2020-04-16: Visual Data Analysis with Streaming-hub

In my previous post, I elaborated on how dataset metadata could be standardized so that researchers can efficiently discover and reuse data already collected for past studies. Adopting such a standard brings a host of benefits to research communities, such as simplified data sharing, massively collaborative research, and automated data pre-processing. However, formulating and adopting such a standard could take years, if not decades, unless 1) the public recognizes that its practical benefits outweigh the initial hassle of transition, and 2) tools and libraries are built to ease workflows after the transition. My previous post tries to address the first concern by introducing DFS and DDU. In this post, I describe our work toward addressing the second concern.

2019-06-03: Metadata on Datasets Saves You Time

When I joined ODU in Spring 2019, my first task was to explore datasets in digital libraries, hoping to find ways to help users discover data and to help data find its way to users. This led to some interesting findings that I will elaborate on in this post. First things first, let's take a look at which tools and platforms currently attempt to make it easier for users to find and visualize data. A quick Google search led me to this awesome GitHub repository, which contains a list of topic-centric public dataset repositories. This collection proved useful for gathering the types of dataset descriptions available at present. The first dataset collection I explored was Kaggle. There, the most upvoted dataset (as of May 31, 2019) was a CSV file on the topic "Credit Card Fraud Detection". Taking a quick look at the data, I found that only the first two columns provide a textual description of their content; the rest do not. Since I'm not the mai...