Google Releases Data Search Engine In Pilot Program
Google announced a search engine on Wednesday called Dataset Search, designed to help scientists, data journalists, data geeks, or anyone who is curious find data on a variety of topics.
The tool is still in a pilot phase. It finds datasets where they are hosted, relying on schema.org and other metadata standards that publishers can add to pages describing their datasets. It aims to improve the discovery of datasets in fields such as the life sciences, social sciences, machine learning, and civic and government data.
For instance, someone who wants to analyze daily weather records can search for them directly. Generic queries like that work well, but more specific ones do not: a search for “daily weather in Huntington Beach, California,” for example, returns nothing.
Perhaps more specific searches like this will improve as more publishers add their datasets to the index.
Google developed guidelines that help dataset providers describe their data in a way that Google and other search engines can understand.
The guidelines include information about datasets such as who created them, when the data was published, how the data was collected, what the terms are for using the data, and more.
Any publisher can add datasets to the index by marking up the web pages where the data is published. Examples of content that qualifies include a file in a proprietary format that contains data, a structured object with data in some other format that you might want to load into a special tool for processing, and images that capture data.
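As a rough illustration of that markup, the Python sketch below builds a schema.org `Dataset` description covering the kinds of information the guidelines mention (who created the data, when it was published, how it was collected, and the terms of use) and wraps it in a JSON-LD `<script>` tag a publisher could place on a dataset page. JSON-LD is one of the formats schema.org markup can be written in. The helper names `dataset_jsonld` and `to_script_tag` and every example value are hypothetical, and which properties a given search engine requires or recommends is an assumption to check against its own documentation.

```python
import json


def dataset_jsonld(name, description, creator, date_published,
                   measurement_technique, license_url, download_url,
                   encoding_format="text/csv"):
    """Build a schema.org Dataset description as a plain dict.

    Hypothetical helper: property names follow the schema.org Dataset
    vocabulary, but required/recommended fields vary by search engine.
    """
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        # Who created the data.
        "creator": {"@type": "Organization", "name": creator},
        # When the data was published (ISO 8601 date string).
        "datePublished": date_published,
        # How the data was collected.
        "measurementTechnique": measurement_technique,
        # Terms for using the data.
        "license": license_url,
        # Where the data file itself can be downloaded.
        "distribution": [{
            "@type": "DataDownload",
            "encodingFormat": encoding_format,
            "contentUrl": download_url,
        }],
    }


def to_script_tag(metadata):
    """Serialize metadata as a JSON-LD <script> block for a dataset page."""
    body = json.dumps(metadata, indent=2)
    # "</" only occurs inside JSON strings; rewriting it as "<\/" (a valid
    # JSON escape) keeps a value from prematurely closing the <script> element.
    body = body.replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


if __name__ == "__main__":
    # Hypothetical dataset; all values are illustrative only.
    meta = dataset_jsonld(
        name="Daily weather observations (example)",
        description="Hypothetical daily temperature and precipitation records.",
        creator="Example Weather Service",
        date_published="2018-01-15",
        measurement_technique="Automated ground weather stations",
        license_url="https://creativecommons.org/licenses/by/4.0/",
        download_url="https://example.org/data/daily-weather.csv",
    )
    print(to_script_tag(meta))
```

Keeping the metadata as a plain dict and serializing it in one place makes it easy to generate the markup for many dataset pages from an existing catalog.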
Google says Dataset Search is one of a series of projects aimed at bringing datasets more prominently into its products. The company also recently made it easier to discover tabular data in search, using the same metadata, along with the linked tabular data, to answer queries directly in search results. That initiative focuses more on news organizations and data journalists, while Dataset Search can be useful to a much broader audience.