Google Cloud Storage (GCS) is an object storage service that can be used to store any type of data.
In this tutorial, we will learn how to work with Google Cloud Storage in a Java application, and perform common operations like uploading and downloading files.
If you just want to see the code, you can go to the java-gcp-examples GitHub repo.
Creating a Bucket
You can think of a bucket as the container that all your files and data get stored in.
To create a new bucket, navigate to the GCP console. You may need to enable billing first:
Next, you can create a new bucket:
While creating a bucket, you’ll need to choose configuration options for your bucket:
Some of the important options are:
- Name - this is the identifier you use when accessing your bucket. The name of your bucket must be unique across all of Google Cloud.
- Storage location - When you create a bucket, you can choose its geographic location. GCS is a distributed system, and objects are stored in different locations around the world. The location of a bucket determines where its objects are stored.
- Storage class - this determines how your data is stored, and lets you trade storage cost against access cost. For example, the standard storage class has the highest storage price, but no retrieval fee or minimum storage duration. Colder classes like coldline and archive are cheaper to store, but charge for retrieval and have a minimum storage duration, so they suit data that is rarely accessed.
Once you’ve created a bucket, you can view its files and folders on the GCP console:
You can even manually upload files using the UI.
Installing Dependencies
For running the Java code in this example, we’ll be using a standard Maven project structure.
First, we need to install the Google Cloud CLI.
Once installed, we can run gcloud auth application-default login to update our authentication details.
After you log in successfully, the gcloud CLI stores your authentication information in a default location on your computer, which is then used by our Java application.
Note: using the application-default login command is only recommended for development purposes. For production applications, you should use a service account.
Next, we need to install the Java library. For this, we can add the following dependencies to our pom.xml file:
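To see which credentials the application is likely to pick up, here is a minimal sketch that uses only the JDK. It assumes the usual lookup order of Application Default Credentials: the GOOGLE_APPLICATION_CREDENTIALS environment variable first, then the file written by the gcloud login command. The file path below is the assumed default on Linux and macOS, and the CredentialCheck class and describeSource method are hypothetical names used only for this illustration.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class CredentialCheck {

    // Describes which local credential source would be found first.
    // envValue: value of GOOGLE_APPLICATION_CREDENTIALS (may be null).
    // wellKnownFile: where gcloud is assumed to store the user credentials.
    public static String describeSource(String envValue, Path wellKnownFile) {
        if (envValue != null && !envValue.isEmpty()) {
            return "GOOGLE_APPLICATION_CREDENTIALS -> " + envValue;
        }
        if (Files.exists(wellKnownFile)) {
            return "gcloud user credentials -> " + wellKnownFile;
        }
        return "no local credentials found";
    }

    public static void main(String[] args) {
        // Assumed default location on Linux and macOS; Windows uses a folder under %APPDATA%.
        Path wellKnownFile = Paths.get(System.getProperty("user.home"),
                ".config", "gcloud", "application_default_credentials.json");
        System.out.println(
                describeSource(System.getenv("GOOGLE_APPLICATION_CREDENTIALS"), wellKnownFile));
    }
}
```

This only inspects the local machine. On Google Cloud infrastructure, credentials can also come from an attached service account, which this sketch does not check.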
<dependency>
  <groupId>com.google.cloud</groupId>
  <artifactId>google-cloud-storage</artifactId>
  <version>2.22.2</version>
</dependency>
<dependency>
  <groupId>com.google.guava</groupId>
  <artifactId>guava</artifactId>
  <version>31.1-jre</version>
</dependency>
Uploading Files
Now that we have our bucket created, we can start uploading files to it.
First, let’s create a sample file to upload. We’ll create a file called sample.txt which contains the text hello world:
echo "hello world" > /tmp/sample.txt
Next, we can write the Java code to upload this file in our App class:
package com.sohamkamani.storage;

import com.google.cloud.storage.BlobId;
import com.google.cloud.storage.BlobInfo;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;
import java.io.IOException;
import java.nio.file.Paths;

public class App {
  private static String projectId = "sohamkamani-demo";
  private static String bucketName = "sohamkamani-demo-bucket";
  private static String objectName = "sample.txt";

  public static void main(String[] args) throws Exception {
    uploadFile();
  }

  // upload file to GCS
  public static void uploadFile() throws IOException {
    // Create a new GCS client
    Storage storage = StorageOptions.newBuilder().setProjectId(projectId).build().getService();
    // The blob ID identifies the blob we are about to create.
    // It consists of a bucket name and an object name
    BlobId blobId = BlobId.of(bucketName, objectName);
    BlobInfo blobInfo = BlobInfo.newBuilder(blobId).build();
    // The filepath on our local machine that we want to upload
    String filePath = "/tmp/sample.txt";
    // upload the file and print the status
    storage.createFrom(blobInfo, Paths.get(filePath));
    System.out
        .println("File " + filePath + " uploaded to bucket " + bucketName + " as " + objectName);
  }
}
We can run this code by executing the following command:
mvn -DMAIN_CLASS=com.sohamkamani.storage.App assembly:single && java -jar target/java-gcp-examples-1.0-SNAPSHOT-jar-with-dependencies.jar
Once the code has executed, we should see the following output:
File /tmp/sample.txt uploaded to bucket sohamkamani-demo-bucket as sample.txt
Now, we can check the GCP console to see if the file was uploaded successfully:
Updating Files
We can also update files that are already present in our bucket.
In this case, we can run the same code as before, but change the contents of the file that we are uploading:
echo "hello world 2" > /tmp/sample.txt
If we run our code now, we should get the same output as before, but the file contents would be replaced by the new version.
To keep track of different versions of the same file, every object in a GCS bucket has a generation number.
We can see the generation number of our file on the GCP console:
The generation number will change each time we upload a file to the same object location. The MD5 hash, on the other hand, will only change if the contents of the file itself have changed.
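To illustrate why the hash depends only on the contents, here is a small local sketch that uses the JDK's MessageDigest class. It does not call GCS at all. The Md5Demo class and md5Base64 method are hypothetical names, and the base64 encoding is an assumption chosen to match how the Cloud Storage API reports MD5 hashes.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

public class Md5Demo {

    // Computes the MD5 digest of the given text and encodes it as base64.
    public static String md5Base64(String contents) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(contents.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(digest);
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a standard algorithm name that every Java platform must support
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String first = md5Base64("hello world\n");
        String reupload = md5Base64("hello world\n");   // same bytes uploaded again
        String changed = md5Base64("hello world 2\n");  // updated contents

        System.out.println("same contents, same hash: " + first.equals(reupload)); // true
        System.out.println("new contents, same hash: " + first.equals(changed));   // false
    }
}
```

Uploading identical bytes twice would therefore produce a new generation number each time, while the hash stays the same.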
The Blob Instance
The Blob class represents an object stored in our bucket. In almost all cases, we’ll be using this class to interact with our files on GCS.
Although we didn’t use it in our previous example, the storage.createFrom method returns a Blob instance.
For all operations dealing with existing files, like downloading, deleting, or reading contents, we first need to get the Blob instance for the file that we want to interact with.
Downloading Files
Let’s try to download the same file that we just uploaded.
Here, we’ll fetch the Blob instance using the bucket name and object name, and then use it to download our file:
package com.sohamkamani.storage;

import com.google.cloud.storage.Blob;
import com.google.cloud.storage.BlobId;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;
import java.io.IOException;
import java.nio.file.Paths;

public class App {
  private static String projectId = "sohamkamani-demo";
  private static String bucketName = "sohamkamani-demo-bucket";
  private static String objectName = "sample.txt";

  public static void main(String[] args) throws Exception {
    downloadFile();
  }

  // download file from GCS
  public static void downloadFile() throws IOException {
    // we'll download the same file to another file path
    String filePath = "/tmp/sample_downloaded.txt";
    // Create a new GCS client and get the blob object from the blob ID
    Storage storage = StorageOptions.newBuilder().setProjectId(projectId).build().getService();
    BlobId blobId = BlobId.of(bucketName, objectName);
    Blob blob = storage.get(blobId);
    // download the file and print the status
    blob.downloadTo(Paths.get(filePath));
    System.out.println("File " + objectName + " downloaded to " + filePath);
  }
}
Once we run this code, we should see the following output:
File sample.txt downloaded to /tmp/sample_downloaded.txt
And we should see the file appear in /tmp/sample_downloaded.txt.
Reading Files
We can also read the contents of a file directly into memory, without saving it to disk:
// read contents of the file
public static void readFile() throws IOException {
  Storage storage = StorageOptions.newBuilder().setProjectId(projectId).build().getService();
  BlobId blobId = BlobId.of(bucketName, objectName);
  Blob blob = storage.get(blobId);
  // read the contents of the file and print it
  // (decode as UTF-8 explicitly; requires java.nio.charset.StandardCharsets)
  String contents = new String(blob.getContent(), StandardCharsets.UTF_8);
  System.out.println("Contents of file " + objectName + ": " + contents);
}
If we run this code, it’ll print the contents of the file:
Contents of file sample.txt: hello world
Deleting Files
Finally, we can also delete files from our bucket by calling the blob.delete() method:
public static void deleteFile() throws IOException {
  // Create a new GCS client and get the blob object from the blob ID
  Storage storage = StorageOptions.newBuilder().setProjectId(projectId).build().getService();
  BlobId blobId = BlobId.of(bucketName, objectName);
  Blob blob = storage.get(blobId);
  // delete the file and print the status
  blob.delete();
  System.out.println("File " + objectName + " deleted from bucket " + bucketName);
}
Running this code should give us the following output:
File sample.txt deleted from bucket sohamkamani-demo-bucket
Note that this code is not safe to run twice. If the file does not exist, storage.get returns null, and calling blob.delete() throws a NullPointerException:
Exception in thread "main" java.lang.NullPointerException: Cannot invoke "com.google.cloud.storage.Blob.delete(com.google.cloud.storage.Blob$BlobSourceOption[])" because "blob" is null
at com.sohamkamani.storage.App.deleteFile(App.java:67)
at com.sohamkamani.storage.App.main(App.java:23)
So, make sure to check if the file exists before trying to delete it:
public static void deleteFile() throws IOException {
  Storage storage = StorageOptions.newBuilder().setProjectId(projectId).build().getService();
  BlobId blobId = BlobId.of(bucketName, objectName);
  Blob blob = storage.get(blobId);
  // storage.get returns null if the object does not exist
  if (blob == null) {
    System.out.println("File " + objectName + " does not exist in bucket " + bucketName);
    return;
  }
  blob.delete();
  System.out.println("File " + objectName + " deleted from bucket " + bucketName);
}
How Folders Work
Cloud storage uses a flat namespace. This means that folders don’t actually exist, and all files are stored directly within the bucket.
You can still upload a file to, say, my_folder/sample.txt, but the path is just a part of the file name, and doesn’t actually create any folder structure.
This is the same code as the upload file example, but with a folder prefix added to the file name:
public static void uploadFile() throws IOException {
  Storage storage = StorageOptions.newBuilder().setProjectId(projectId).build().getService();
  // adding a folder prefix to the file name
  String prefixedName = "my_folder/" + objectName;
  BlobId blobId = BlobId.of(bucketName, prefixedName);
  BlobInfo blobInfo = BlobInfo.newBuilder(blobId).build();
  String filePath = "/tmp/sample.txt";
  storage.createFrom(blobInfo, Paths.get(filePath));
  // print the full object name, including the folder prefix
  System.out
      .println("File " + filePath + " uploaded to bucket " + bucketName + " as " + prefixedName);
}
Despite this, the Cloud Storage API gives us an illusion of a traditional file system for convenience. For example, we can still list files within a folder.
Listing Files in a Folder
We can list files just like we would in our local file system, using the storage.list method:
// list all files in a folder or bucket
// (BlobListOption here is com.google.cloud.storage.Storage.BlobListOption)
public static void listFiles() throws IOException {
  // Create a new GCS client
  Storage storage = StorageOptions.newBuilder().setProjectId(projectId).build().getService();
  System.out.println("Files in bucket " + bucketName + ":");
  // list the blobs at the top level of the bucket
  for (Blob blob : storage
      .list(bucketName, BlobListOption.currentDirectory(), BlobListOption.prefix(""))
      .iterateAll()) {
    System.out.println(blob.getName());
  }
}
This will give us the output:
Files in bucket sohamkamani-demo-bucket:
sample.txt
my_folder/
The options we pass to the storage.list method are required to get the result in this format:
- BlobListOption.prefix("") - this is the prefix that we want to list files for. In this case, we want to list all files in the root directory, so we pass an empty string.
- BlobListOption.currentDirectory() - this means that we only want to list files in the current directory. If we didn’t pass this option, we would get a list of all files in the bucket, including files in subdirectories. Since we’ve added this option, we’ll only get the files in the root directory, along with the my_folder directory name.
If we want to list the files within the my_folder directory, we can pass BlobListOption.prefix("my_folder/") instead, after which we’ll get the following output:
Files in bucket sohamkamani-demo-bucket:
my_folder/sample.txt
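To make the folder illusion more concrete, here is a rough sketch of how a "current directory" listing can be derived from a flat list of object names using only a prefix and the / delimiter. This is a local simulation written with plain JDK collections, not the actual GCS implementation, and the FlatListing class and its list method are hypothetical names. The ordering (files first, then folder names) is an assumption chosen to mirror the output shown earlier.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class FlatListing {

    // Emulates a "current directory" listing over a flat collection of object names.
    public static List<String> list(Collection<String> objectNames, String prefix) {
        List<String> objects = new ArrayList<>();
        Set<String> folders = new LinkedHashSet<>(); // de-duplicates folder names
        for (String name : new TreeSet<>(objectNames)) { // iterate in sorted order
            if (!name.startsWith(prefix)) {
                continue; // outside the requested "folder"
            }
            String rest = name.substring(prefix.length());
            int slash = rest.indexOf('/');
            if (slash < 0) {
                objects.add(name); // object sits directly under the prefix
            } else {
                // deeper object: collapse it into a single folder entry
                folders.add(prefix + rest.substring(0, slash + 1));
            }
        }
        List<String> result = new ArrayList<>(objects);
        result.addAll(folders);
        return result;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("sample.txt", "my_folder/sample.txt");
        System.out.println(list(names, ""));           // [sample.txt, my_folder/]
        System.out.println(list(names, "my_folder/")); // [my_folder/sample.txt]
    }
}
```

No folder object exists anywhere in this sketch. The my_folder/ entry appears only because at least one object name contains that prefix.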
Conclusion
In this post, we learned how to work with Google Cloud Storage in Java, including how to set up your bucket, upload and download files, and list files in a folder.
For the most part, we can treat a GCS bucket as a regular file system, and use the same methods to interact with it. However, we must remember that under the hood, GCS is an object storage service, and not a file system. This means that there are some differences in how it works, and we need to be aware of them.
For example, since there is no inherent folder structure, there are major performance differences between moving or deleting a folder in a local file system, and doing the same in GCS.
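As a rough illustration of where that cost comes from, the sketch below models a bucket as a flat map of object names to contents and "moves" a folder by rewriting every matching name. It is a local simulation that assumes a plain flat namespace, not code that talks to GCS, and the FolderMove class and moveFolder method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FolderMove {

    // "Moves" a folder by renaming every object under oldPrefix.
    // Returns the number of objects that had to be touched.
    public static int moveFolder(Map<String, String> bucket, String oldPrefix, String newPrefix) {
        // Collect matching names first, so the map is not modified while iterating over it
        List<String> toMove = new ArrayList<>();
        for (String name : bucket.keySet()) {
            if (name.startsWith(oldPrefix)) {
                toMove.add(name);
            }
        }
        for (String name : toMove) {
            String contents = bucket.remove(name);
            bucket.put(newPrefix + name.substring(oldPrefix.length()), contents);
        }
        return toMove.size();
    }

    public static void main(String[] args) {
        Map<String, String> bucket = new TreeMap<>();
        bucket.put("sample.txt", "hello world");
        bucket.put("my_folder/a.txt", "a");
        bucket.put("my_folder/b.txt", "b");

        int touched = moveFolder(bucket, "my_folder/", "archive/");
        System.out.println("objects touched: " + touched); // objects touched: 2
        System.out.println(bucket.keySet()); // [archive/a.txt, archive/b.txt, sample.txt]
    }
}
```

The work grows with the number of objects under the prefix, whereas a local file system can usually rename a directory in a single operation.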
If you want to know more about how Cloud Storage works, you can read the official documentation.
If you want to try out all the examples listed here, you can find the code in the java-gcp-examples repo on GitHub.