[re-search] why the hell use google to search open datasets?

Sarah Moir smoir at iwu.edu
Wed Jan 29 08:24:05 CET 2020


I'm several days late and merely a lurker on this list, but here are my
thoughts on this:

I think this was a missed opportunity (unsurprisingly) to standardize on
something like Datasheets for Datasets
<https://arxiv.org/pdf/1803.09010.pdf>.

On the one hand, cool: this ostensibly makes research and datasets easier to
find. On the other hand, attempts to make Google a one-stop shop for
research are risky. Google probably wants to corner the market on being the
first AND last stop when people are looking for information, but I'm
skeptical that this business goal would actually improve academic research
and knowledge. Already there's bountiful evidence
that many people (especially college students who don't work with their
library) start and end their research on Google rather than using scholarly
databases (insert gripe about hard-to-access publicly-funded research
here), which limits the potential quality of their research results.

Extending those "lazy" research processes to datasets, and data analysis,
especially without exhaustive datasheets for the datasets, seems risky to
me. It's encouraging that datasets have dates associated with them, but I'm
curious where the descriptions come from. If the datasheets for these
datasets are essentially algorithmically scraped and generated, are they any
better in quality than the "knowledge block" blurbs that show up in some
Google search results? How can we validate their accuracy?
Easier-to-find data can easily correlate to better data for data analysts,
but it can just as easily correlate to really-off-base analyses where
someone did a basic keyword search for a complex dataset and neglected to
research the dataset's limitations. If the interface made context like the
following more visible, I'd be much more encouraged: what does the data
cover? who collected the data? for what purpose? what features exist in the
data? which fields were collected and which were derived? what assumptions
were made when collecting the data? But it doesn't.
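
To make those questions concrete, here is a minimal sketch in Python of the
kind of record I have in mind. The class name, field names, and helper
function are my own hypothetical shorthand for the questions above; they are
not the schema from the Datasheets for Datasets paper, and not anything
Google's dataset search is known to expose.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class Datasheet:
    """Hypothetical record of dataset context; all names are illustrative."""
    coverage: str = ""        # what does the data cover?
    collected_by: str = ""    # who collected the data?
    purpose: str = ""         # for what purpose?
    features: List[str] = field(default_factory=list)          # features in the data
    collected_fields: List[str] = field(default_factory=list)  # fields gathered directly
    derived_fields: List[str] = field(default_factory=list)    # fields computed from others
    assumptions: List[str] = field(default_factory=list)       # assumptions during collection


def missing_context(sheet: Datasheet) -> List[str]:
    """Return the names of the fields that were left empty."""
    return [f.name for f in fields(sheet) if not getattr(sheet, f.name)]


# Placeholder values only, not a real dataset.
sheet = Datasheet(coverage="placeholder coverage", collected_by="placeholder org")
print(missing_context(sheet))
```

A search interface could show something like the output of missing_context
next to each result, so that a reader sees which parts of the context are
absent before starting an analysis. An empty field here only means nobody
filled it in; a dataset may legitimately have no derived fields.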

Thanks for sharing this with the list, Geert!

- Sarah Moir
(tech writer at an enterprise big data software company)

On Fri, Jan 24, 2020 at 12:48 AM Geert Lovink <geert at xs4all.nl> wrote:

> my gut feeling says that we should ban this product… what do you think?
> https://blog.google/products/search/discovering-millions-datasets-web/