Google released a beta version of its dataset search a couple of days ago (Dataset Search Link), allowing users to search publicly available datasets effectively and efficiently. With a bit of Google-fu (a quick graphic guide), or even regular Google skills, datasets that might otherwise be buried in a web page are now more readily discoverable.
Conversely, better standardization of your own datasets increases the likelihood that they will reach a broader audience in the near future. Google has set forth guidelines that you should consider following to improve their exposure.
Example Dataset Search
Say we’re interested in finding databases that might contain information on crop allocation in California, and we aren’t aware of any well-known sources. (If you already know of a database with the data you need, just head there first.) With a simple search, we can identify potentially relevant datasets and explore their contents. In this specific example, we discovered that the National Agricultural Statistics Service has a raster dataset called the Cropland Data Layer. It shows annual crop-specific land cover for the entire continental United States, and it was created using moderate-resolution satellite imagery and validated/corrected with agricultural ground truth. (Notably, CropScape has its own web-based GIS viewer, shown below.)
Database Guidelines Overview
As previously mentioned, Google has established guidelines that increase the likelihood that your dataset will be found via this new tool. Notably, they recommend that your site be marked up using JSON-LD (Microdata and RDFa are also supported). For help, use the standardized Schema.org vocabulary to ensure you have the proper setup for your website’s markup (i.e. the HTML code describing a webpage that can be “read” by Google and web browsers). Dataset-specific vocabulary is shown here and also appears in the examples in the Google Datasets Guide.
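To make the markup recommendation more concrete, here is a minimal sketch of what a JSON-LD description of a dataset might look like, generated with Python's standard library. The property names (`name`, `description`, `url`, `keywords`, `creator`) come from the Schema.org Dataset vocabulary, but the helper functions, the dataset itself, and the example.org URL are all hypothetical and only for illustration. Check the Google Datasets Guide for the properties it currently requires or recommends.

```python
import json


def build_dataset_jsonld(name, description, url, keywords, creator_name):
    """Return a schema.org Dataset description as a plain dict (hypothetical helper)."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "keywords": keywords,
        "creator": {"@type": "Organization", "name": creator_name},
    }


def to_script_tag(jsonld):
    """Wrap the JSON-LD in the script tag that goes in the page's HTML."""
    body = json.dumps(jsonld, indent=2)
    # Escape "</" so a string value cannot close the script tag early;
    # "\/" is a valid JSON escape for "/".
    body = body.replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# All values below are made up for illustration.
example = build_dataset_jsonld(
    name="Example County Crop Acreage",
    description="Hypothetical annual crop acreage by county.",
    url="https://example.org/datasets/crop-acreage",
    keywords=["agriculture", "crops", "land cover"],
    creator_name="Example Agricultural Office",
)

print(to_script_tag(example))
```

The printed `<script>` element would be pasted into the HTML of the page that describes the dataset.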
Google has created a Structured Data Markup Helper to help generate the necessary HTML code. They note that you can test your dataset markup with the following tool, either by pasting the code snippet for your webpage or by fetching the URL itself (link to tool). If you need additional help, Google’s forums are a valuable resource.
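Before reaching for Google's testing tool, a quick local sanity check can catch obvious omissions. The sketch below is not a substitute for that tool: it only pulls JSON-LD blocks out of a page's HTML and reports which fields are missing from the first Dataset it finds. The class and function names are hypothetical, and treating `name` and `description` as the required fields is an assumption that should be verified against the current Google Datasets Guide.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of every application/ld+json script tag."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._chunks = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._chunks = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append(json.loads("".join(self._chunks)))
            self._in_jsonld = False


def missing_dataset_fields(html, required=("name", "description")):
    """Return missing fields for the first Dataset block, or None if there is none.

    The default `required` tuple is an assumption, not an official list.
    """
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    datasets = [
        block for block in parser.blocks
        if isinstance(block, dict) and block.get("@type") == "Dataset"
    ]
    if not datasets:
        return None
    return [field for field in required if not datasets[0].get(field)]


# A made-up page whose markup lacks a description.
page = (
    '<html><head><script type="application/ld+json">'
    '{"@context": "https://schema.org/", "@type": "Dataset", "name": "X"}'
    "</script></head><body></body></html>"
)
print(missing_dataset_fields(page))
```

An empty list means the assumed required fields are present; `None` means no Dataset markup was found on the page at all.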