Cloud Optimized GeoTIFF (COG) Overview | Introduction

Overview

Cloud Optimized GeoTIFF (COG) relies on two complementary technologies.

The first is GeoTIFF's ability to store not just the raw pixels of an image but to organize those pixels in particular ways.
The second is HTTP GET range requests, which let a client request only the parts of a file that it needs.

Organizing the GeoTIFF in this way makes it easy for range requests to select exactly the parts of the file that the client needs to process.

Organization of GeoTIFF

COG uses two main techniques to organize data: tiling and overviews. The data is also compressed so that it travels more efficiently over the network.

Tiling creates internal tiles inside the image instead of storing the data as simple strips. With strips, reading a small area means reading every strip that crosses it, and each strip spans the full width of the image. With tiles, only the tiles that overlap the area of interest need to be read, so a specific area can be accessed much more quickly.
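To make the difference concrete, here is a minimal Python sketch that counts how much data each layout must read for one 256 x 256 pixel window. The image size, the 256-pixel tiles and the one-row strips are illustrative assumptions (uncompressed 8-bit pixels), and tiles_for_window and strips_for_window are hypothetical helpers, not part of any library.

```python
def tiles_for_window(x0, y0, width, height, tile_w, tile_h):
    """Return (col, row) for every tile that overlaps the pixel window."""
    cols = range(x0 // tile_w, (x0 + width - 1) // tile_w + 1)
    rows = range(y0 // tile_h, (y0 + height - 1) // tile_h + 1)
    return [(c, r) for r in rows for c in cols]


def strips_for_window(y0, height, rows_per_strip):
    """Return the index of every strip that overlaps the window's rows.
    A strip always spans the full image width, whatever the window's x range."""
    return list(range(y0 // rows_per_strip, (y0 + height - 1) // rows_per_strip + 1))


# Hypothetical 10000 x 10000 single-band, 8-bit, uncompressed image.
IMAGE_W = 10_000
TILE = 256
x0, y0, w, h = 4096, 4096, 256, 256   # the window we want to read

tiles = tiles_for_window(x0, y0, w, h, TILE, TILE)
tile_bytes = len(tiles) * TILE * TILE

strips = strips_for_window(y0, h, rows_per_strip=1)   # assume one row per strip
strip_bytes = len(strips) * IMAGE_W

print(f"tiled:   {len(tiles)} tile(s), {tile_bytes:,} bytes")   # tiled:   1 tile(s), 65,536 bytes
print(f"striped: {len(strips)} strips, {strip_bytes:,} bytes")  # striped: 256 strips, 2,560,000 bytes
```

Moving the window so it straddles tile boundaries raises the tile count to at most four (262,144 bytes), still roughly a tenth of what the striped layout must read.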

Overviews are downsampled copies of the same image. A downsampled image is "zoomed out" from the original: it has far less detail (one of its pixels may stand for 100 or even 1000 pixels of the original), but it is also much smaller. A GeoTIFF usually contains several overviews to match different zoom levels. They let the server respond faster, because it can return the precomputed overview values directly instead of working out how to represent 1000 original pixels as one. The trade-off is that overviews increase the overall file size.
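The size trade-off can be estimated with a little arithmetic. Assuming power-of-two overviews (factors 2, 4, 8, ...) generated until the smallest one fits in a single 256-pixel tile, which is a common convention rather than a requirement, this Python sketch lists the levels for a hypothetical 10000 x 10000 image and the extra pixels they add. overview_levels is a hypothetical helper.

```python
def overview_levels(width, height, tile_size=256):
    """Return (factor, width, height) for power-of-two overviews, stopping
    once an overview fits inside a single tile (an assumed convention)."""
    levels = []
    factor = 2
    while True:
        w = -(-width // factor)    # ceiling division: partial pixels round up
        h = -(-height // factor)
        levels.append((factor, w, h))
        if w <= tile_size and h <= tile_size:
            return levels
        factor *= 2


levels = overview_levels(10_000, 10_000)
for factor, w, h in levels:
    print(f"1/{factor}: {w} x {h}, one pixel covers {factor * factor} full-resolution pixels")

# Extra pixels stored in overviews, relative to the full-resolution image.
extra = sum(w * h for _, w, h in levels) / (10_000 * 10_000)
print(f"overviews add about {extra:.0%} more pixels")  # overviews add about 33% more pixels
```

Each level has a quarter of the pixels of the one before it, so the overhead approaches 1/4 + 1/16 + 1/64 + ... = 1/3 of the original pixel count (before compression). At factor 32, one overview pixel covers 32 x 32 = 1024 original pixels, the "1000 pixels" case described above.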

Together with compression of the data, tiling and overviews are general best practices that let software access imagery quickly, which usually gives a better user experience. They matter even more for making HTTP GET range requests work efficiently.
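With COMPRESS=DEFLATE (the option used in the creation commands further down), each tile is compressed on its own as a zlib stream, so a client can fetch a single tile with a range request and decompress it independently. The Python sketch below applies the standard zlib module to a hypothetical 256 x 256 gradient tile to show how compression shrinks the number of bytes a range request has to move. The tile contents are made up for illustration.

```python
import zlib

# Hypothetical 256 x 256 single-band 8-bit tile: a smooth gradient, similar to
# continuous imagery where neighbouring pixels have close values.
TILE = 256
raw = bytes(((x + y) // 4) % 256 for y in range(TILE) for x in range(TILE))

# Compress the tile independently, as each TIFF tile is stored.
compressed = zlib.compress(raw, 6)

print(f"raw tile: {len(raw):,} bytes, compressed: {len(compressed):,} bytes")

# Decompression restores the exact pixel values (DEFLATE is lossless).
restored = zlib.decompress(compressed)
```

Real imagery usually compresses less well than this synthetic gradient, so treat the ratio as illustrative only.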

HTTP GET Range Requests

HTTP/1.1 introduced a powerful feature: range requests, which a client can use when it requests data from a server with a GET request. If the server includes Accept-Ranges: bytes in its response headers, the client may request any byte ranges of the file it wants. This is often called "byte serving", and the Wikipedia article on it explains in detail how it works. The client asks the server for just the bytes it needs. The technique is widely used on the web, for example by video services, so that clients can work with a file without downloading all of it.
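Byte serving itself is simple. The Python sketch below shows both sides under simplifying assumptions: range_header builds the client's Range header (byte ranges are inclusive at both ends), and serve_range is a toy, in-memory stand-in for a server that handles a single start-end range and answers with 206 Partial Content or 416 Range Not Satisfiable. fetch_range shows how the same header would be sent with urllib; it is not called here. All three helper names are hypothetical.

```python
import re
from urllib.request import Request, urlopen


def range_header(offset, length):
    """Client side: Range header value for `length` bytes starting at `offset`."""
    return f"bytes={offset}-{offset + length - 1}"   # end byte is inclusive


def fetch_range(url, offset, length):
    """Client side over the network (not called in this sketch)."""
    req = Request(url, headers={"Range": range_header(offset, length)})
    with urlopen(req) as resp:
        return resp.status, resp.read()   # 206 if the server honoured the range


def serve_range(data, header):
    """Toy server: answer one 'bytes=start-end' range against in-memory data.
    Returns (status, headers, body)."""
    full = (200, {"Accept-Ranges": "bytes"}, data)
    m = re.fullmatch(r"bytes=(\d+)-(\d*)", header)
    if m is None:
        return full                        # unsupported form: send the whole file
    start = int(m.group(1))
    end = int(m.group(2)) if m.group(2) else len(data) - 1
    if end < start:
        return full                        # invalid range spec: ignore it
    if start >= len(data):
        return 416, {"Content-Range": f"bytes */{len(data)}"}, b""
    end = min(end, len(data) - 1)          # clamp to the end of the file
    headers = {
        "Accept-Ranges": "bytes",
        "Content-Range": f"bytes {start}-{end}/{len(data)}",
    }
    return 206, headers, data[start:end + 1]


file_bytes = bytes(range(256)) * 4         # a hypothetical 1 KiB "file"
status, headers, body = serve_range(file_bytes, range_header(512, 16))
print(status, headers["Content-Range"], len(body))   # 206 bytes 512-527/1024 16
```

Suffix ranges such as bytes=-500 and multi-range requests are not handled here; the toy simply returns the whole file with status 200 in those cases.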

Range requests are optional, so servers are not required to implement them. However, most cloud providers (Amazon, Google, Microsoft, OpenStack, etc.) support them in their object storage services, so most data stored in the cloud can be served with range requests.

Integration

With both technologies introduced, it becomes clear how they work together: the tiles and overviews of a GeoTIFF are laid out in the file in a known structure, so range requests can fetch just the relevant parts.

Overviews come into play when a client wants to render a quick view of the entire image. Instead of downloading every pixel, the client requests a smaller, precomputed overview. Thanks to the structure of the GeoTIFF file, a server that supports HTTP range requests can easily return just that part of the file.
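How a client might pick an overview can be sketched as a small function. Assuming the client already knows the full-resolution width and the available overview factors (in a real file these come from the overview image directories), the hypothetical pick_level below returns the coarsest level that still has at least as many columns as the display needs.

```python
def pick_level(full_width, factors, target_width):
    """Return the decimation factor of the coarsest level whose width is still
    at least `target_width`. Factor 1 stands for the full-resolution image."""
    best = 1
    for f in sorted({1, *factors}):
        level_width = -(-full_width // f)   # ceiling division, like overview sizes
        if level_width >= target_width:
            best = f                         # widths shrink as f grows, keep the last fit
    return best


# Hypothetical 10000-pixel-wide image with overviews at 2, 4, ..., 64,
# rendered as an 800-pixel-wide preview.
factor = pick_level(10_000, [2, 4, 8, 16, 32, 64], target_width=800)
print(factor)   # 8
```

For the 800-pixel preview it picks the 1/8 overview (1250 x 1250 pixels), only 1/64 of the full-resolution pixel count; the 1/16 overview (625 pixels wide) would be too coarse.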

Tiling is useful when only part of the image needs to be processed or visualized, whether at an overview resolution or at full resolution. The data of each tile is stored together in one place in the file, so a range request can retrieve exactly the tiles it needs.
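In a tiled TIFF, the TileOffsets and TileByteCounts tags record where each tile starts and how many bytes it occupies, with tiles numbered left to right, top to bottom. The Python sketch below turns a set of wanted tiles into HTTP byte ranges and merges ranges that touch, a common optimization that reduces the number of requests. The 4 x 4 tile grid, the tile sizes and the back-to-back layout are made-up assumptions, and tile_index and byte_ranges are hypothetical helpers.

```python
def tile_index(col, row, tiles_across):
    """Tiles are numbered left to right, top to bottom within one image level."""
    return row * tiles_across + col


def byte_ranges(tiles, offsets, byte_counts, tiles_across, max_gap=0):
    """Map (col, row) tiles to inclusive (start, end) byte ranges, merging
    ranges that are contiguous or separated by at most `max_gap` bytes."""
    spans = sorted(
        (offsets[i], offsets[i] + byte_counts[i] - 1)
        for i in (tile_index(c, r, tiles_across) for c, r in tiles)
    )
    merged = []
    for start, end in spans:
        if merged and start <= merged[-1][1] + 1 + max_gap:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))   # extend previous range
        else:
            merged.append((start, end))
    return merged


# Hypothetical 4 x 4 tile grid: compressed tiles of varying size, written
# back to back after a 1000-byte header area.
counts = [300, 280, 310, 290, 305, 295, 288, 312,
          301, 299, 287, 311, 296, 304, 292, 308]
offsets = []
pos = 1000
for n in counts:
    offsets.append(pos)
    pos += n

wanted = [(1, 1), (2, 1), (1, 2), (2, 2)]   # a 2 x 2 block in the middle
for start, end in byte_ranges(wanted, offsets, counts, tiles_across=4):
    print(f"Range: bytes={start}-{end}")
```

The row-1 pair and the row-2 pair are not adjacent in the file, so this prints two ranges, bytes=2485-3067 and bytes=3681-4266. Passing max_gap=1024 merges them into one request that also downloads the 613 bytes of tiles (3, 1) and (0, 2) in between, trading a little extra data for one fewer round trip.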

A GeoTIFF that is not "cloud optimized" with tiling and overviews can still be used remotely, but the client must download the entire file, or at least more data than it actually needs.

Advantages

More and more geospatial data is moving to the cloud ☁️, and most of it is kept in cloud object storage such as S3 or Google Cloud Storage. Traditional GIS file formats can easily be stored in the cloud, but they are not efficient for serving web map tiles or for fast data processing. Usually the data must first be downloaded elsewhere and then converted into a more suitable format or loaded into memory.

Cloud Optimized GeoTIFF makes geospatial workflows on cloud services practical by applying a few simple techniques that make data streaming more efficient. Online imagery platforms such as Planet Platform and GBDX use this approach to serve imagery quickly. Software that uses COG can cut execution time by retrieving only the data it needs.

Many newer geospatial systems, such as GeoTrellis, Google Earth Engine, and IDAHO, have also built the COG approach into their architecture: each processing node works quickly by retrieving only the parts of a COG file it needs.

COG does not change the existing GeoTIFF standard or introduce a new file format: a COG is a regular GeoTIFF. Existing software can read COGs without any modification. It does not need to support streaming; it can simply download and read the whole file.

Hosting data as Cloud Optimized GeoTIFFs in the cloud can greatly reduce the number of copies in circulation. Online software can stream the data instead of keeping its own copy, which is the common pattern today, and is more efficient as a result. Data providers also no longer need to offer the data in several formats, because both old and new software can read COGs: they maintain a single version, and many online tools can use it at the same time without extra copies or downloads.

Quick Start

Introduction

This tutorial explains how developers can use and produce Cloud Optimized GeoTIFF.

Reading

The simplest way to read a COG is GDAL's VSI Curl feature (the /vsicurl/ virtual file system); see the section "How to read it with GDAL" on the GDAL Wiki. Since most geospatial software already uses GDAL as a dependency, going through GDAL is the fastest way to add COG reading support.
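Whatever library is used, a remote reader starts the same way: it fetches the beginning of the file with a range request and parses the TIFF header to learn the byte order and where the first image file directory (IFD) is. GDAL's /vsicurl/ handles this internally; the Python sketch below is not GDAL code, only a standard-library illustration of that first step, and read_tiff_header is a hypothetical name. Because a COG keeps its IFDs near the start of the file, that first request often covers them too.

```python
import struct


def read_tiff_header(head):
    """Parse a classic TIFF (8-byte) or BigTIFF (16-byte) header.
    Returns (struct byte-order prefix, is_bigtiff, offset of the first IFD)."""
    order = {b"II": "<", b"MM": ">"}.get(head[:2])   # little- or big-endian
    if order is None:
        raise ValueError("not a TIFF file")
    (magic,) = struct.unpack(order + "H", head[2:4])
    if magic == 42:                                  # classic TIFF: 32-bit offsets
        (ifd,) = struct.unpack(order + "I", head[4:8])
        return order, False, ifd
    if magic == 43:                                  # BigTIFF: 64-bit offsets at byte 8
        (ifd,) = struct.unpack(order + "Q", head[8:16])
        return order, True, ifd
    raise ValueError(f"unexpected TIFF version number {magic}")


# Hypothetical little-endian classic TIFF header whose first IFD is at byte 8.
sample = b"II" + struct.pack("<HI", 42, 8)
print(read_tiff_header(sample))   # ('<', False, 8)
```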

All of Planet's data is already in COG format, and Planet has a tutorial on downloading part of an image. Most of the tutorial explains how to use the Planet API, but it also shows how gdalwarp can extract a single area of interest from a large COG file.

Creating

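The GDAL wiki page about COG also explains how to generate one ("How to generate it with GDAL"). COPY_SRC_OVERVIEWS=YES only copies overviews that already exist in the source, so first add them to in.tif with gdaladdo (GDAL 3.1 and later also offer a dedicated COG output driver, -of COG, that builds tiles and overviews in one step):

$ gdaladdo -r average in.tif 2 4 8 16
$ gdal_translate in.tif out.tif -co TILED=YES -co COPY_SRC_OVERVIEWS=YES -co COMPRESS=DEFLATE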

Or use the rio-cogeo plugin:

$ rio cogeo create in.tif out.tif --cog-profile deflate

Other geospatial software should also be able to add appropriate overviews and tiling.

Validation

Use the rio-cogeo plugin:

$ rio cogeo validate test.tif
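As a rough idea of what a validator inspects, the Python sketch below walks the chain of IFDs in an in-memory classic TIFF and flags two basic problems: image levels stored in strips instead of tiles (no TileWidth/TileLength tags, codes 322 and 323) and, as a warning, a file with only one IFD, i.e. no overviews. It is a simplified, hypothetical check that treats every extra IFD as an overview and ignores masks and BigTIFF; it is not how rio-cogeo works. Real validation checks more, including that the IFDs come before the image data as the COG layout expects. make_tiff only builds minimal fake files to exercise the check.

```python
import struct

TILE_WIDTH, TILE_LENGTH = 322, 323   # TIFF tag codes


def walk_ifds(buf):
    """Yield (ifd_offset, set of tag codes) for each IFD of a classic TIFF."""
    order = {b"II": "<", b"MM": ">"}[buf[:2]]
    (offset,) = struct.unpack_from(order + "I", buf, 4)   # first IFD offset
    seen = set()
    while offset and offset not in seen:                  # 0 ends the chain
        seen.add(offset)
        (n,) = struct.unpack_from(order + "H", buf, offset)
        tags = {struct.unpack_from(order + "H", buf, offset + 2 + 12 * i)[0]
                for i in range(n)}
        yield offset, tags
        (offset,) = struct.unpack_from(order + "I", buf, offset + 2 + 12 * n)


def quick_cog_check(buf):
    """Return a list of problems found (empty if none)."""
    ifds = list(walk_ifds(buf))
    problems = []
    if any(not {TILE_WIDTH, TILE_LENGTH} <= tags for _, tags in ifds):
        problems.append("an image level is stored in strips, not tiles")
    if len(ifds) < 2:
        problems.append("no overviews")
    return problems


def make_tiff(ifd_tag_sets):
    """Build a minimal little-endian TIFF whose IFDs hold only the given tag
    codes with zeroed values: enough to walk the IFD chain, not to read pixels."""
    out = bytearray(b"II" + struct.pack("<HI", 42, 8))    # first IFD right after header
    for i, tags in enumerate(ifd_tag_sets):
        n = len(tags)
        last = i + 1 == len(ifd_tag_sets)
        next_off = 0 if last else len(out) + 2 + 12 * n + 4
        out += struct.pack("<H", n)
        for code in sorted(tags):
            out += struct.pack("<HHII", code, 3, 1, 0)    # SHORT, count 1, value 0
        out += struct.pack("<I", next_off)
    return bytes(out)


tiled = make_tiff([{256, 257, 322, 323}, {256, 257, 322, 323}])   # full res + 1 overview
striped = make_tiff([{256, 257, 273, 278}])                        # StripOffsets, RowsPerStrip
print(quick_cog_check(tiled))     # []
print(quick_cog_check(striped))   # ['an image level is stored in strips, not tiles', 'no overviews']
```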

References

https://www.cogeo.org/