Big datasets, small code chunks, and why I use Google Earth Engine

December 17, 2021

Have you ever found yourself in the midst of an analysis when suddenly, out of nowhere, it happens? That tiny, dreaded pinwheel appears, signaling that your application has stopped responding. Yes, that's right: they call it the spinning wheel of death. Your application freezes. Everything fades. Did it save?! You clutch your stress ball, watching helplessly as your computer approaches molten temperatures and begins to sputter uncanny, otherworldly sounds. WHIRRRRRRR. Your fate seems to rest on that miniature pinwheel, whose outcome is no more predictable than a Magic Eight Ball's. Your chances of recovery are somewhere between “Outlook not so good” and “Don’t count on it.”

If this sounds like your recurring nightmare, then you’ve likely worked with large geospatial datasets.

As a PhD Candidate in Environmental Science, Policy & Management at Berkeley, I routinely use satellite imagery to understand how land cover change is affecting animal movement and species interactions. This requires obtaining, processing, and storing a large amount of remote sensing data. For years, I thought crashing applications were simply part of the job. That is, until I learned about Google Earth Engine.

Google’s Earth Engine (GEE) is an online, cloud-based platform for processing geographic data. Officially unveiled in 2010, GEE has rapidly revolutionized the remote sensing world. The platform was designed to store and process petabytes of geospatial data. GEE’s collection of ready-to-use data products eliminates the need to download and store satellite images on a laptop, or to convert file formats and geographic projections. Instead, enormous geospatial datasets can be accessed and integrated in a couple of lines of code. The best part? While the platform itself isn’t open source, it is free for research, education, and nonprofit use. Without license fees, even beginner coders can conduct planetary-scale analyses in a user-friendly and reproducible manner.

For today’s blog post, I’d like to share the reasons why I am learning to use Google Earth Engine and how you can get started too.

Big data, fast

GEE is both fast and big. First, its online code editor is designed for you to pull in datasets, process them, and save the results to the cloud. Computations are therefore performed not by your own computer, but by many distributed Google servers working in parallel. This is great for anyone trying to work on an outdated laptop (me).
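To make the idea of splitting work across servers concrete, here is a toy Python sketch. It is not how Earth Engine is implemented internally, only an illustration of the divide-and-combine pattern: an “image” (a small 2D list) is cut into tiles, each tile’s sum and pixel count are computed independently, with threads standing in for separate servers, and the partial results are merged into one mean. All function names here are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_tiles(image, tile_rows):
    """Split a 2D list of pixel values into horizontal strips of tile_rows rows."""
    return [image[i:i + tile_rows] for i in range(0, len(image), tile_rows)]


def partial_stats(tile):
    """Return (sum, count) for one tile; these partials combine exactly across tiles."""
    values = [v for row in tile for v in row]
    return sum(values), len(values)


def parallel_mean(image, tile_rows=2, workers=4):
    """Mean of all pixels, computed tile by tile on a pool of worker threads."""
    tiles = split_into_tiles(image, tile_rows)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(partial_stats, tiles))  # one (sum, count) per tile
    total = sum(s for s, _ in results)
    count = sum(n for _, n in results)
    return total / count


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13, 14, 15]]
print(parallel_mean(image))  # 8.0
```

Combining sums and counts, rather than averaging the per-tile means, keeps the result exact even when tiles differ in size, as the final one-row tile does here.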

Second, GEE’s public data catalog is vast. You can access over 200 datasets with more than 5 million images. It includes both raw satellite data (e.g., Landsat, MODIS, Sentinel, ALOS PALSAR) and derived satellite products: datasets that have already been processed to identify land cover type or global surface water, or to compute vegetation indices such as the Normalized Difference Vegetation Index (NDVI). These public datasets are updated and error-checked regularly. A major advantage is that you can quickly process many files over long timeframes, stretching back to 1972 (the start of the Landsat record). This capability has made time-series analysis much easier, enabling a better understanding of how our climate, forests, and oceans have changed over time. It has even changed how we measure crop yields!
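NDVI itself is a simple band ratio, NDVI = (NIR − Red) / (NIR + Red), computed pixel by pixel from near-infrared and red reflectance, and it ranges from −1 to 1. The pure-Python sketch below shows that arithmetic on a tiny, made-up 2×2 patch; in Earth Engine the same calculation is usually done with ee.Image.normalizedDifference on the appropriate bands. The helper names and reflectance values are illustrative only.

```python
def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red).

    Returns None where both reflectances are zero, since the ratio is undefined.
    """
    denom = nir + red
    if denom == 0:
        return None
    return (nir - red) / denom


def ndvi_grid(nir_band, red_band):
    """Apply ndvi() pixel by pixel to two equally sized 2D lists."""
    return [
        [ndvi(n, r) for n, r in zip(nir_row, red_row)]
        for nir_row, red_row in zip(nir_band, red_band)
    ]


# Hypothetical surface reflectance values (0-1) for a 2x2 patch
nir = [[0.5, 0.4], [0.3, 0.0]]
red = [[0.1, 0.1], [0.3, 0.0]]
print(ndvi_grid(nir, red))
```

Dense green vegetation reflects strongly in the near-infrared and absorbs red light, so it pushes NDVI toward 1, while bare or equally reflective surfaces sit near 0.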

Accessibility

While leading geospatial software packages, such as Esri’s ArcGIS and ENVI, are unmatched in computational features and functions, they come with a hefty price tag. GEE, on the other hand, is free for research, education, and nonprofit use, and available to anyone with a Google account and an internet connection. This accessibility has enabled nonprofits and researchers to conduct spatial analyses without paying license fees. So, if you might lose access to licensed geospatial software in the coming year, GEE might be a helpful tool to learn and add to your arsenal.

GEE has two interfaces. The first is a graphical user interface, which lets you quickly visualize and explore datasets. It is a point-and-click platform that allows you to do basic geometric and image calculations, and to save, export, and share your work. The second is the GEE code editor, which is the primary platform for any custom analysis. A major consideration is that using GEE’s full functionality in this interface requires some coding experience, because the code editor uses JavaScript. Thus, if you know some JavaScript, you are bound to hit the ground running. If not, the platform provides many tutorials and videos on how to get started in the code editor, as well as an official guide to help users learn and implement functions.

Is JavaScript not for you? Don’t worry, you can also access GEE using Python. With the Python API, you can use Python syntax to access GEE functionality. If that still doesn’t work for you, check out this new R package, which exposes GEE functions in R syntax by wrapping the Python API.

Ease of use

GEE’s online interface is laid out much like other coding environments, with a code editor for inputs and a console for outputs (Figure 1). The interface allows you to write, save, and manage code in the “script manager.” In the “asset manager,” you can upload and store your own shapefiles, CSV files, GeoTIFFs, and more. Uploaded assets can then be called and incorporated into any analysis alongside data from the data catalog. A suite of geometry tools also lets you drop markers and draw polygons with ease. Finally, any outputs can be visualized on the map and managed in the layer manager. Best of all, you can share the browser link to send a copy of your code to any collaborator. Outputs can be exported to Google Drive and then downloaded to your computer. (Note that if you have limited cloud storage, you may run into trouble saving exports.)

Figure 1. Google Earth Engine’s code editor interface.

Note also that GEE is probably better suited to image (raster) analyses than to vector-based analyses. If you mainly work with vector data such as points, lines, and polygons (say you need to intersect several polygons), it’s best to stay within a desktop GIS. However, if you are extracting image data for several polygons over time, then GEE is the way to go.
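To illustrate what extracting image data to polygons involves, here is a minimal pure-Python sketch of a zonal mean. It assumes the polygons have already been rasterized into a grid of zone IDs that lines up pixel for pixel with the image; in Earth Engine itself you would typically reach for ee.Image.reduceRegions with a mean reducer instead. The function name zonal_mean and the sample values are hypothetical.

```python
from collections import defaultdict


def zonal_mean(values, zones, nodata=None):
    """Mean pixel value per zone ID.

    values: 2D list of pixel values (e.g., NDVI); pixels equal to nodata are skipped.
    zones:  2D list of the same shape; each cell holds the ID of the polygon it
            falls in, or None if it lies outside every polygon.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value_row, zone_row in zip(values, zones):
        for value, zone in zip(value_row, zone_row):
            if zone is None or value == nodata:
                continue  # outside all polygons, or missing data
            sums[zone] += value
            counts[zone] += 1
    return {zone: sums[zone] / counts[zone] for zone in sums}


# Hypothetical NDVI values and rasterized polygon IDs ("A" and "B")
values = [[0.2, 0.4, 0.6], [0.8, None, 0.1]]
zones = [["A", "A", "B"], ["A", "B", None]]
print(zonal_mean(values, zones))
```

Repeating this for every image date in a time series is exactly the kind of many-pixels, many-images job that benefits from GEE’s servers.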

Reproducible workflows

GEE is highly reproducible, as it operates via a script that can be shared, rerun, and automatically saved. No more crashing applications or tedious user interfaces with hundreds of buttons.

Once you familiarize yourself with the GEE layout and begin running basic commands, you might want to consider integrating GEE into a hybrid workflow. I find GEE most useful as a rapid prototyping tool for geospatial analyses. For example, I might upload a file of my research site locations, use GEE to create a 20-year time series of NDVI across my sampling sites, then export this dataset to my computer for further manipulation and visualization in R or Python. GEE can reduce what was once a multi-hour task to a few minutes.
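The export-then-analyze step of this hybrid workflow might look something like the sketch below. It assumes, hypothetically, that the GEE export is a CSV with one row per site and image date and columns named site, date, and ndvi; a short inline string stands in for the downloaded file so the example runs on its own.

```python
import csv
import io
from collections import defaultdict

# Stand-in for a CSV exported from GEE; column names and values are hypothetical.
EXPORTED = """site,date,ndvi
north,2019-06-01,0.62
north,2019-07-01,0.58
north,2020-06-01,0.55
south,2019-06-01,0.31
south,2020-06-01,0.35
"""


def annual_mean_ndvi(csv_text):
    """Return {(site, year): mean NDVI} from rows with site, date, and ndvi columns."""
    totals = defaultdict(lambda: [0.0, 0])  # (site, year) -> [running sum, count]
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["site"], int(row["date"][:4]))  # year taken from YYYY-MM-DD
        totals[key][0] += float(row["ndvi"])
        totals[key][1] += 1
    return {key: s / n for key, (s, n) in totals.items()}


for (site, year), mean in sorted(annual_mean_ndvi(EXPORTED).items()):
    print(site, year, round(mean, 3))
```

With a real export you would pass in the contents of the downloaded file (or load it with a library such as pandas); the grouping logic stays the same.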

But GEE cannot do everything. In my explorations, I have also found that several data manipulation and visualization techniques are better suited to other platforms. For instance, GEE’s graphical output is pretty basic (Figure 2). Thus, I think your best bet is a hybrid workflow: use GEE for basic exploration of very large datasets and leave the rest to your favorite coding platform.

Figure 2. A basic example of GEE’s graphical output (annual NDVI profiles of classified land cover types in Sonoma County, CA).

Growing user community

Another point in favor of learning GEE is that its user community is growing. Today, the GEE developer forum is an active place to find quick answers to your questions, making the platform significantly easier to learn than it was even 3 years ago. Additionally, if the change detection or classification algorithms you want are not among GEE’s built-in functions, users often post custom workarounds on the forums. As this shared content grows, scientists are pushing the boundaries of remote sensing in exciting new directions, such as developing methods to identify classification errors in imagery or combining multiple satellite sources to reduce error from cloud cover.
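Combining several satellite sources to get around clouds often comes down to building a per-pixel composite. As one common approach (not necessarily what the studies mentioned above do), the toy Python sketch below drops cloud-flagged observations and takes the median of whatever clear values remain; Earth Engine’s own ImageCollection.median() computes a per-pixel median over a collection. The function name and inputs are hypothetical, and the sketch assumes the observations are already co-registered and calibrated to comparable values, which real multi-sensor work has to handle explicitly.

```python
from statistics import median


def cloud_free_median(stack, cloud_masks):
    """Per-pixel median over a stack of co-registered observations.

    stack:       list of 2D lists, one per acquisition (possibly from different sensors)
    cloud_masks: matching list of 2D lists of booleans, True where the pixel is cloudy
    Pixels that are cloudy in every observation come back as None.
    """
    rows, cols = len(stack[0]), len(stack[0][0])
    composite = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            # Keep only the observations where this pixel is cloud-free
            clear = [img[r][c] for img, mask in zip(stack, cloud_masks) if not mask[r][c]]
            out_row.append(median(clear) if clear else None)
        composite.append(out_row)
    return composite


# Three hypothetical 1x3 acquisitions with their cloud flags
stack = [[[0.5, 0.9, 0.3]], [[0.6, 0.2, 0.4]], [[0.7, 0.95, 0.5]]]
masks = [[[False, True, True]], [[False, False, True]], [[False, True, True]]]
print(cloud_free_median(stack, masks))
```

The median is a common choice here because a single missed cloud or shadow in one acquisition shifts it far less than it would shift a mean.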

Final thoughts

Whether you have been thinking of learning to use GEE for geospatial image analysis or only just heard of it now, I encourage you to give it a try this semester! Overall, I have found GEE to be a rapid and reproducible tool for analysis of large-scale spatial datasets. While GEE can’t do everything, it does the big stuff really fast. To get started, sign up for a free developer account here: https://earthengine.google.com/signup/ (you’ll usually hear back in 2-3 days). And stay tuned for future workshops on Google Earth Engine from the D-Lab!

Amy Van Scoyoc