Peer Fusion Big Data Solutions
Peer Fusion Supra-Clusters™ are designed for the massive scaling of capacity, resiliency, and performance with simple administration.
Supra-Clusters™ virtualize a group of clusters into a single storage pool. They export a mount point that provides access to every storage element within the pool: the entirety of the pool, a specific cluster, or a specific peer within a cluster. Every component of a Supra-Cluster™ is accessible through a standard path name.
There is virtually no limit to the number of clusters that can be aggregated into a Supra-Cluster™. Thousands of clusters, each with its own peers, provide tens of thousands of storage units with the enormous computational power required to utilize their data.
Each peer in a Supra-Cluster™ provides a local mount point through which client applications read the local data. The entire Supra-Cluster™ is also available through mount points, so that client applications running on peers can store data.
Data can be laid out across the clusters in different ways to optimize access by the client applications. All data layouts preserve the resiliency levels configured for the clusters. Data can be dispersed across clusters as large contiguous data sets to ensure their localization. Data can be localized to a single peer to ensure optimal access to data sets of arbitrary size. Data can be striped across clusters to ensure a balanced capacity and network load.
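The striped layout can be illustrated with a generic round-robin sketch. This is not the Peer Fusion layout algorithm, which is not described here; the function names, the fixed stripe size, and the round-robin order are all assumptions made for illustration.

```python
def stripe(data: bytes, cluster_count: int, stripe_size: int) -> list:
    """Cut `data` into fixed-size stripes and deal them round-robin to clusters.

    Hypothetical helper: returns one list of stripes per cluster.
    """
    if cluster_count < 1 or stripe_size < 1:
        raise ValueError("cluster_count and stripe_size must be positive")
    layout = [[] for _ in range(cluster_count)]
    for index, offset in enumerate(range(0, len(data), stripe_size)):
        layout[index % cluster_count].append(data[offset:offset + stripe_size])
    return layout


def unstripe(layout: list) -> bytes:
    """Reassemble the original bytes by reading the stripes back in deal order."""
    pieces = []
    longest = max((len(stripes) for stripes in layout), default=0)
    for row in range(longest):
        for stripes in layout:
            if row < len(stripes):
                pieces.append(stripes[row])
    return b"".join(pieces)
```

Because consecutive stripes land on different clusters, both the stored bytes and the network traffic of a large sequential transfer are spread roughly evenly, which is the balance that striping aims for.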
The Peer Fusion high-performance codec is able to ingest several gigabytes of data per second per peer. Encoding and decoding requests are distributed to each CPU core for parallel processing. Whether just one byte or a gigabyte, data is directly encoded as it is ingested without the need for buffering or temporary replication.
Several levels of resiliency can be configured, depending on the number of peer failures per cluster that must be tolerated. For example, it is possible to configure a cluster to tolerate the loss of as few as one peer or as many as forty percent of the peers in the cluster.
Client applications can write to and read from failed peers without disruption. On-the-fly data regeneration allows client applications to complete I/O transactions as long as the peer failure count is below the maximum configured.
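As a small worked example of the resiliency limits, the sketch below computes how many peer failures a cluster of a given size could tolerate at the forty percent ceiling, and whether I/O can still complete. The function names are hypothetical, and two details are assumptions: the percentage is rounded down to a whole peer, and a failure count equal to the configured maximum is treated as still tolerated.

```python
def max_tolerated_failures(peer_count: int, percent: int = 40) -> int:
    """Largest whole number of peers within `percent` of the cluster (rounded down)."""
    if peer_count < 1:
        raise ValueError("peer_count must be positive")
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    # Integer arithmetic avoids floating-point rounding surprises.
    return peer_count * percent // 100


def io_can_complete(failed_peers: int, tolerated_failures: int) -> bool:
    """True while the failure count stays within the configured tolerance."""
    return 0 <= failed_peers <= tolerated_failures
```

Under these assumptions a 25-peer cluster configured at the ceiling tolerates 10 failed peers, and an eleventh failure would exceed the configured tolerance.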
With erasure coding, large cost savings can be achieved compared with keeping the standard three copies of the data for resiliency. Making three copies of data represents 200% overhead, whereas erasure coding overhead is typically not more than 50%.
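The overhead figures can be checked with a short calculation. The sketch assumes the common model in which data is split into a number of data fragments plus additional parity fragments, so that overhead is the ratio of parity to data; the 10 + 5 split in the usage note is an illustrative choice, not a documented Peer Fusion configuration.

```python
from fractions import Fraction


def replication_overhead(copies: int) -> Fraction:
    """Extra storage relative to the data itself when keeping `copies` full copies."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return Fraction(copies - 1)


def erasure_overhead(data_fragments: int, parity_fragments: int) -> Fraction:
    """Extra storage relative to the data itself for a data + parity fragment scheme."""
    if data_fragments < 1 or parity_fragments < 0:
        raise ValueError("need at least one data fragment and no negative parity")
    return Fraction(parity_fragments, data_fragments)


def as_percent(overhead: Fraction) -> str:
    """Format an overhead ratio as a whole-number percentage."""
    return f"{float(overhead) * 100:.0f}%"
```

Here `as_percent(replication_overhead(3))` gives `200%`, while `as_percent(erasure_overhead(10, 5))` gives `50%`: for every 100 TB of data, three copies consume 300 TB of raw capacity and the 10 + 5 scheme consumes 150 TB.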
A Peer Fusion cluster configuration file is small and simple. The gateway configuration file specifies the resources to use and the default settings. The peer configuration file is even simpler: it contains only the minimum information needed for cluster discovery, and the peer receives the remaining configuration parameters from the gateways.
The clusters and peers appear as mount points and are POSIX filesystem compliant. Most operating system utilities are compatible (e.g. mkdir, rmdir, ls, cp, stat, touch, rm), which eases the learning process.
The clusters and peers appear as filesystems and are VFS compliant. Most operating system and library calls are compatible (e.g. mkdir, rmdir, opendir, readdir, open, read, write, close, stat, truncate, unlink, sync), which eases client application development and integration.
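Because access goes through ordinary filesystem calls, a client needs no special library. The sketch below exercises the calls listed above through Python's standard `os` module. The function name and the file and directory names are hypothetical, and a temporary directory stands in for a real mount point so that the example runs anywhere.

```python
import os
import tempfile


def exercise_mount(mount_point: str) -> int:
    """Run a create/write/read/truncate/delete cycle under `mount_point`.

    Returns the file size reported by stat() after the write.
    """
    directory = os.path.join(mount_point, "demo_dir")
    path = os.path.join(directory, "sample.dat")
    payload = b"hello, cluster"

    os.mkdir(directory)                                  # mkdir
    fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o644)  # open
    try:
        os.write(fd, payload)                            # write
        os.fsync(fd)                                     # flush to storage
    finally:
        os.close(fd)                                     # close

    size = os.stat(path).st_size                         # stat
    with os.scandir(directory) as entries:               # opendir / readdir
        names = [entry.name for entry in entries]
    assert names == ["sample.dat"]

    fd = os.open(path, os.O_RDONLY)
    try:
        assert os.read(fd, size) == payload              # read
    finally:
        os.close(fd)

    os.truncate(path, 0)                                 # truncate
    os.unlink(path)                                      # unlink
    os.rmdir(directory)                                  # rmdir
    return size


if __name__ == "__main__":
    # Stand-in for a real mount point; substitute the actual path when available.
    with tempfile.TemporaryDirectory() as stand_in:
        print(exercise_mount(stand_in))
```

Pointing `exercise_mount` at a pool, cluster, or peer mount point instead of the temporary directory would run the same cycle against that storage, assuming the calling user has write permission there.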