Where this fits in K8s strategy
Work out the right data storage tools and methods for your cluster
Why it’s important
Unless every part of your app is stateless, you need to store application data somewhere.
Kubernetes doesn’t store data itself
Containers running in Kubernetes are ephemeral by default, i.e. they don’t keep data around.
Well, they can. But only for as long as the host pod lasts. That data gets stored in scratch space: the space on the node (VM) set aside for temporary data.
Fine when the app only needs that data while it’s being processed.
But what about when you need re-accessible data?
With the default setup, that option’s not available. At that point, you need to consider dedicated storage options.
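To see that pod-lifetime scratch space in concrete terms: one way Kubernetes exposes it is an emptyDir volume, which is deleted when the pod goes away. Here's a minimal sketch that builds such a pod manifest as a Python dict and prints it as JSON (kubectl accepts JSON manifests as well as YAML). The pod name, image and mount path are made-up placeholders, not anything your cluster requires.

```python
import json


def scratch_pod(name: str, image: str, mount_path: str) -> dict:
    """Build a Pod manifest whose only volume is an emptyDir (pod-lifetime scratch space)."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": "app",
                    "image": image,
                    # Mount the scratch volume inside the container.
                    "volumeMounts": [{"name": "scratch", "mountPath": mount_path}],
                }
            ],
            # emptyDir: created with the pod, removed with the pod.
            "volumes": [{"name": "scratch", "emptyDir": {}}],
        },
    }


if __name__ == "__main__":
    # "scratch-demo", "busybox" and "/scratch" are illustrative values only.
    print(json.dumps(scratch_pod("scratch-demo", "busybox", "/scratch"), indent=2))
```

Anything the container writes under the mount path disappears with the pod, which is exactly why re-accessible data needs something more.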
3 ways to package and store cluster data
These are the storage types that will serve your containers as Persistent Volumes (PV). A PV is how Kubernetes represents a piece of storage in the cluster, with a lifecycle independent of any single pod.
Block Storage
In a nutshell: data split into blocks and spread across storage devices
Mechanics: data broken down into blocks, each with its own ID, and distributed across a SAN
Best use cases: rapid access to many streams of data, like transactions
Benefits: fast processing of data like transactions and e-commerce activity
Risks: complex distribution pattern requires prior planning
Works best with data that is distributed and needs speed, as containerized services often do
Associated terms: SAN
Object Storage
In a nutshell: use it to build your data lake
Mechanics: data broken down into objects stored in a single repository
Best use cases: streaming video, big data (exabyte level), IoT data
Benefits: handles large loads of unstructured data in a scalable way
Risks: high latency slows down database work; not great with data that changes often
Works best with data that doesn’t update all the time – like video
Associated terms: CDN integration
File Storage
In a nutshell: traditional method of data storage
Mechanics: whole files stored in directories like on your PC desktop
Best use cases: file collaboration, long-term backups and archiving
Benefits: simple to work out and implement – it’s like having office folders
Risks: higher latency (slower data retrieval), often tied to one operating system
Works best with data that is structured i.e. files and folders
Associated terms: NAS, NTFS / NFS systems
Jargon buster
NAS = Network Attached Storage: a single storage device serving multiple clients
CDN = Content Delivery Network: serves content from as close to the requester as possible to cut latency
SAN = Storage Area Network: multiple storage devices forming a data storage network
Which is the preferred Kubernetes storage form?
Block storage gets a lot of mentions when it comes to Kubernetes. Makes sense.
Kubernetes is about orchestrating containers i.e. distributed services. And block storage is ideal for very distributed data storage. What a winning combo!
But wait, it’s not over. There is a bit more to storage than that.
Using more than one storage form is not unusual
Chances are your applications will call for more than one storage form.
For example, you’d use block storage for services that need fast access to data. Data that users rarely request can go into file storage for archiving and slower retrieval.
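As a rough illustration of mixing storage forms, the rules of thumb from the three profiles above can be written as a tiny lookup. This is a deliberately simplified sketch: the `Service` fields, the service names and the order of the checks are assumptions made for the example, and real decisions will weigh more than two yes/no questions.

```python
from dataclasses import dataclass


@dataclass
class Service:
    name: str
    needs_fast_access: bool      # e.g. transactions, e-commerce activity
    unstructured_at_scale: bool  # e.g. video, big data, IoT data


def suggest_storage(service: Service) -> str:
    """Return 'block', 'object' or 'file' using this article's rules of thumb."""
    if service.needs_fast_access:
        return "block"
    if service.unstructured_at_scale:
        return "object"
    # Everything else: structured files, archives, rarely requested data.
    return "file"


if __name__ == "__main__":
    # Hypothetical services inside one application.
    services = [
        Service("checkout", needs_fast_access=True, unstructured_at_scale=False),
        Service("video-library", needs_fast_access=False, unstructured_at_scale=True),
        Service("archive", needs_fast_access=False, unstructured_at_scale=False),
    ]
    for s in services:
        print(f"{s.name}: {suggest_storage(s)}")
```

One application, three services, three storage forms. That's the normal case, not the exception.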
Now, tactical advice regarding storage setup
- Pick the right storage type for the data you’re serving (as mentioned above)
- Make sure the persistent (permanent) storage is decoupled from the containers
- Learn how to set up the persistent volume (PV) from PureStorage
- Pick the right storage plugins – follow Blocks and Files for more on that
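To show what "decoupled from the containers" can look like, here's a minimal sketch. In Kubernetes, a workload typically asks for a PV through a PersistentVolumeClaim (PVC), and the pod only refers to that claim by name. The code builds both manifests as Python dicts and prints them as JSON. The claim name, size, image and the `fast-block` storage class are hypothetical placeholders; the storage classes that actually exist depend on your cluster and its storage plugins.

```python
import json


def volume_claim(name: str, size: str, storage_class: str) -> dict:
    """Build a PersistentVolumeClaim asking for `size` of storage from `storage_class`."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteOnce"],   # mounted read-write by a single node
            "storageClassName": storage_class,  # hypothetical class name
            "resources": {"requests": {"storage": size}},
        },
    }


def pod_using_claim(name: str, image: str, claim_name: str, mount_path: str) -> dict:
    """Build a Pod that mounts the claim; the pod knows the claim's name, not the storage behind it."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": "app",
                    "image": image,
                    "volumeMounts": [{"name": "data", "mountPath": mount_path}],
                }
            ],
            "volumes": [
                {"name": "data", "persistentVolumeClaim": {"claimName": claim_name}}
            ],
        },
    }


if __name__ == "__main__":
    # All names and values below are illustrative assumptions.
    claim = volume_claim("orders-data", "10Gi", "fast-block")
    pod = pod_using_claim("orders", "example/orders-service:1.0", "orders-data", "/data")
    for manifest in (claim, pod):
        print(json.dumps(manifest, indent=2))
```

Because the pod only references the claim, you can delete and recreate the pod and the data behind the claim stays put.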