[re-search] why the hell use google to search open datasets?

Several days late, and merely a lurker on this list, but here are my thoughts:

I think this was (unsurprisingly) a missed opportunity to standardize on
something like Datasheets for Datasets.

On the one hand, cool: research and datasets are ostensibly easier to
find. On the other hand, attempts to make Google a one-stop shop for
research are risky. Google probably wants to corner the market on being the
first AND last stop when people are looking for information, but I'm
skeptical that this business goal would actually improve academic research
and knowledge. There's already abundant evidence that many people
(especially college students who don't work with their library) start and
end their research on Google rather than using scholarly databases (insert
gripe about hard-to-access publicly-funded research here), which limits the
potential quality of their research results.

Extending those "lazy" research processes to datasets and data analysis,
especially without exhaustive datasheets for the datasets, seems risky to
me. It's encouraging that datasets have dates associated with them, but I'm
curious where the descriptions come from. If the datasheets for these
datasets are essentially scraped and generated algorithmically, are they of
any better quality than the "knowledge block" blurbs that show up in some
Google search results? How can we validate their accuracy?
Easier-to-find data can easily translate into better data for data
analysts, but it can just as easily translate into really off-base analyses
where someone ran a basic keyword search for a complex dataset and
neglected to research the dataset's limitations. If the interface made this
kind of information, and other valuable context, more visible, I'd be much
more encouraged: What does the data cover? Who collected it, and for what
purpose? What features exist in the data? Which fields were collected and
which were derived? What assumptions were made when collecting the data?
But it doesn't.

Thanks for sharing this with the list, Geert!

- Sarah Moir
(tech writer at an enterprise big data software company)

