What are the best storage options for your K8s cluster?

Where this fits in K8s strategy

Work out the right data storage tools and methods for your cluster

Why it’s important

Is everything in your apps stateless? Probably not. You need to store your application data somewhere.

Kubernetes doesn’t store data itself

Containers running in Kubernetes are stateless by default i.e. they don’t keep data once they’re gone.

Well, they can store data. But only for as long as the host pod lasts. That data gets stored in scratch space: space on the node (the VM or server running the pod) set aside for temporary data.

Fine when the app’s processing transactional data.
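To make the scratch space idea concrete, here is a minimal sketch of a pod that mounts an emptyDir volume, which is Kubernetes' built-in volume type for temporary data. The pod name, image and mount path are placeholders chosen for illustration, not values from any real setup. The script only builds the manifest and prints it as JSON, which kubectl accepts in place of YAML.

```python
import json


def scratch_pod(name: str, image: str) -> dict:
    """Build a Pod manifest whose container writes to an emptyDir volume."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": "app",
                    "image": image,
                    # Hypothetical mount path inside the container.
                    "volumeMounts": [
                        {"name": "scratch", "mountPath": "/tmp/scratch"}
                    ],
                }
            ],
            # emptyDir lives on the node and is deleted when the pod is removed.
            "volumes": [{"name": "scratch", "emptyDir": {}}],
        },
    }


# "scratch-demo" and the busybox image are assumptions for this example.
manifest = scratch_pod("scratch-demo", "busybox:1.36")
print(json.dumps(manifest, indent=2))
```

Anything the container writes under the mount path disappears with the pod, which is exactly the limitation described above.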

But what about when you need re-accessible data?

With the default setup, that option’s not available. At that point, you need to consider dedicated storage options.

3 ways to package and store cluster data

These are the storage types that can serve your containers as Persistent Volumes (PV). The PV is Kubernetes’ abstraction for a piece of storage that outlives any single pod. Pods ask for one through a Persistent Volume Claim (PVC).
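As a rough illustration of how a container gets hold of a Persistent Volume, the sketch below builds a Persistent Volume Claim (PVC), which is the request a pod makes for storage, plus a pod that mounts it. The claim name, storage class "standard", size, image and mount path are all assumptions; the storage classes actually available depend on your cluster.

```python
import json


def volume_claim(name: str, storage_class: str, size: str) -> dict:
    """Build a PersistentVolumeClaim requesting `size` from `storage_class`."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            # ReadWriteOnce: the volume is mounted read-write by a single node.
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


def pod_using_claim(pod_name: str, image: str, claim_name: str) -> dict:
    """Build a Pod that mounts the claim at a hypothetical path, /data."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": pod_name},
        "spec": {
            "containers": [
                {
                    "name": "app",
                    "image": image,
                    "volumeMounts": [{"name": "data", "mountPath": "/data"}],
                }
            ],
            "volumes": [
                {"name": "data", "persistentVolumeClaim": {"claimName": claim_name}}
            ],
        },
    }


# All names and sizes below are placeholders for illustration.
pvc = volume_claim("app-data", "standard", "10Gi")
pod = pod_using_claim("app", "busybox:1.36", pvc["metadata"]["name"])
for doc in (pvc, pod):
    print(json.dumps(doc, indent=2))
```

Unlike scratch space, data written under /data stays in the volume when the pod is deleted, so a replacement pod using the same claim can pick it up again.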

Block Storage

In a nutshell: data stored in very distributed form

Mechanics: data broken down into fixed-size blocks, each with its own ID, and distributed across a SAN

Best use cases: rapid access to many streams of data like transactions

Benefits: fast processing of data like transactions and e-commerce activity

Risks: complex distribution pattern requires prior planning

Works best with data that is distributed and needs speed – containers

Associated terms: SAN

Object Storage

In a nutshell: use it to build your data lake

Mechanics: data broken down into objects stored in single repository

Best use cases: streaming video, big data (exabyte level), IoT data

Benefits: handles large loads of unstructured data in scalable way

Risks: high latency slows down database work, not great with data that changes often

Works best with data that doesn’t update all the time – like video

Associated terms: CDN integration

File Storage

In a nutshell: traditional method of data storage

Mechanics: whole files stored in directories like on your PC desktop

Best use cases: file collaboration, long-term backups and archiving

Benefits: simple to work out and implement – it’s like having office folders

Risks: higher latency (slow) for data retrieval, often tied to one OS

Works best with data that is structured i.e. files and folders

Associated terms: NAS, NTFS / NFS systems

Jargon buster

NAS = Network Attached Storage – single storage device serving multiple clients

CDN = Content Delivery Network – caches content as close to the requester as possible to cut latency

SAN = Storage Area Network – multiple storage devices creating a data storage network

Which is the preferred Kubernetes storage form?

Block storage gets a lot of mentions when it comes to Kubernetes. Makes sense.

Kubernetes is about orchestrating containers i.e. distributed services. And block storage is ideal for very distributed data storage. What a winning combo!

But wait, it’s not over. There is a bit more to storage than that.

Using more than one storage form is not unusual

Chances are your applications will call for more than one storage form.

For example, you’d use block storage for services that need fast access to data. Data that users rarely request can go into file storage for archiving and slower retrieval.
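A mixed setup like this could be sketched as two claims against two different storage classes. The class names "fast-block" and "shared-file" are hypothetical, as are the claim names and sizes; substitute whatever classes your cluster or cloud provider exposes. The access modes reflect a common pattern rather than a rule: block-backed volumes are usually mounted by one node at a time, while file shares can often be mounted by many.

```python
import json


def claim(name: str, storage_class: str, access_mode: str, size: str) -> dict:
    """Build a PersistentVolumeClaim for the given storage class."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": [access_mode],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


# Hypothetical storage classes: one block-backed, one file-backed.
claims = [
    # Fast access for the service's live data.
    claim("orders-live", "fast-block", "ReadWriteOnce", "20Gi"),
    # Slower, shareable storage for rarely requested archives.
    claim("orders-archive", "shared-file", "ReadWriteMany", "200Gi"),
]

# Volume entries a pod spec could use to mount both claims side by side.
pod_volumes = [
    {
        "name": c["metadata"]["name"],
        "persistentVolumeClaim": {"claimName": c["metadata"]["name"]},
    }
    for c in claims
]

for c in claims:
    print(json.dumps(c, indent=2))
```

Keeping the storage class as a parameter means the same application manifest can point at different storage forms per environment without other changes.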

Now, tactical advice regarding storage setup
