eIGeR: data storage infrastructure

In the area of data storage, the key benefits of the eIGeR project are easily accessible, redundant storage capacity for scientific computation and the opportunity to archive important data reliably over the long term. Within the project, storage with a total capacity of around 10 PB will be procured to serve the needs of the Czech scientific and academic community. It will combine disk arrays with tape libraries or equivalent equipment such as virtual tape libraries (VTL); about one tenth of the capacity will be on disk and the rest on tape libraries or alternative technologies. The individual data centers will be distributed around the country and connected by a high-capacity network (CESNET Association plans to build three data centers during the project, in Pilsen, Pardubice, and Brno). Mutual backup and replication of data across these geographically separated sites will make the storage system highly robust and reliable. Technology will be procured gradually, starting in the first year of the project, and capacity will be increased incrementally up to the target total. After the project finishes, the sustainability phase will ensure that outdated storage equipment is replaced with modern equipment.
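The disk/tape split stated above follows from simple arithmetic. The sketch below assumes decimal units (1 PB = 1,000 TB) and applies only the "one tenth on disk" rule; the function name `tier_split` is hypothetical and not part of any project tooling.

```python
TOTAL_TB = 10_000  # ~10 PB target capacity, assuming 1 PB = 1,000 TB


def tier_split(total_tb: int, disk_share: tuple = (1, 10)) -> dict:
    """Split a total capacity into disk and tape tiers (hypothetical helper)."""
    num, den = disk_share
    disk = total_tb * num // den      # one tenth of the capacity on disk arrays
    return {"disk": disk, "tape": total_tb - disk}  # the rest on tape or equivalent


print(tier_split(TOTAL_TB))
```

With these assumptions, roughly 1 PB sits on disk and 9 PB on tape, a 1:9 ratio between the fast tier and the archive tier.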

We plan to offer storage capacity to users through a Hierarchical Storage Management (HSM) solution, a concept supported by all leading storage system vendors. HSM combines large capacity on magnetic tape with disk arrays placed in front of it. Users work with the system through these disk arrays and get the illusion of practically unlimited disk capacity, while less frequently used files are automatically moved to tape. When such data is needed, it is automatically moved back from tape to disk; the user notices some delay on the first access to migrated data, but can then work with it at full speed.
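The migrate-and-recall behaviour described above can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not how any vendor's HSM works internally: the class `ToyHSM`, the idle-time threshold `migrate_after`, and the explicit `now` timestamps are all hypothetical, and real HSM products typically leave a stub on disk and may keep tape copies of files that are still cached.

```python
from dataclasses import dataclass, field


@dataclass
class ToyHSM:
    """Hypothetical two-tier HSM model: a disk cache in front of a tape tier."""

    migrate_after: float  # seconds without access before a file is migrated to tape
    disk: dict = field(default_factory=dict)         # file name -> data held on disk
    tape: dict = field(default_factory=dict)         # file name -> data held on tape
    last_access: dict = field(default_factory=dict)  # file name -> last access time

    def write(self, name: str, data: bytes, now: float) -> None:
        # New or updated data always lands on the disk tier first.
        self.tape.pop(name, None)
        self.disk[name] = data
        self.last_access[name] = now

    def migrate(self, now: float) -> list:
        # Move files that have been idle for at least `migrate_after` to tape.
        moved = []
        for name in list(self.disk):
            if now - self.last_access[name] >= self.migrate_after:
                self.tape[name] = self.disk.pop(name)
                moved.append(name)
        return moved

    def read(self, name: str, now: float) -> tuple:
        # Transparent recall: a migrated file is moved back to disk on first access.
        recalled = name not in self.disk
        if recalled:
            self.disk[name] = self.tape.pop(name)  # raises KeyError for unknown files
        self.last_access[name] = now
        return self.disk[name], recalled

    def listing(self) -> list:
        # Users see a single namespace regardless of which tier holds the data.
        return sorted(set(self.disk) | set(self.tape))
```

The `recalled` flag marks the slow first access the text mentions; a second read of the same file returns `False` because the data is already back on disk.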

To meet higher demands for protection against data loss, we plan to operate three geographically separated HSM systems. This will make it possible, for example, to store sensitive data in two copies on each of them. The storage sites will be installed close to CESNET network points of presence at the Association members' locations. In this way, the whole system will also withstand the physical destruction of one of the sites, e.g. by a natural disaster.
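The replication example above (three sites, two copies on each) can be checked with a short sketch. It assumes the three sites are the planned Pilsen, Pardubice and Brno data centers and treats "two copies per site" as a fixed example policy; the helper names are hypothetical.

```python
# Site names follow the three data centers planned by CESNET; two copies per site
# is the example policy from the text, not a fixed rule.
SITES = ("Pilsen", "Pardubice", "Brno")
COPIES_PER_SITE = 2


def place_replicas(sites, copies_per_site):
    """List every (site, copy number) location that holds one replica."""
    return [(site, copy) for site in sites for copy in range(copies_per_site)]


def copies_left_after_loss(replicas, lost_sites):
    """Count replicas stored outside the destroyed sites."""
    return sum(1 for site, _ in replicas if site not in lost_sites)


def raw_capacity_needed(data_pb, sites, copies_per_site):
    """Raw capacity consumed when every site keeps `copies_per_site` copies."""
    return data_pb * len(sites) * copies_per_site


replicas = place_replicas(SITES, COPIES_PER_SITE)
worst_single_loss = min(copies_left_after_loss(replicas, {s}) for s in SITES)
print(len(replicas), worst_single_loss, raw_capacity_needed(1, SITES, COPIES_PER_SITE))
```

Losing any single site still leaves four copies, but each PB of data protected this way consumes 6 PB of raw capacity, which is why such a policy suits sensitive data rather than every file.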

The target user groups are primarily large research teams whose existing or planned instruments will produce extreme data volumes in the near future. Not all such groups plan to build large-capacity storage of their own; however, the distributed storage infrastructure built by CESNET Association will also allow relatively easy integration of groups that do operate their own data storage. Another important user group will be teams working with extensive simulations and mathematical modeling. In collaboration with computing centers, CESNET will also provide storage capacity to new or small research groups without their own IT equipment. The distributed storage environment will also be adapted to the needs of distributed scientific groups with varying levels of collaboration. Provision of storage and archiving services will play an important role.

Planned data storage use

Storage element
Data storage oriented mainly toward large data volumes, high throughput and data transfer. It implements protocols such as SRM, GridFTP and RFIO, and is used mainly in grid systems.
File system
Data storage provided in the form of a file system. We plan to support access protocols such as FTP and SCP (both easily usable by Windows users) as well as the rsync protocol. One part of this service could be a file system used for user backups. This could take the form either of data storage serving as a back end for backup software (in which case the storage will provide the server part of the backup system) or of plain backup space, leaving backup operations entirely in the users' hands.
Block access
Alongside the services listed above, we plan to offer a limited group of users block access to storage capacity based on iSCSI. We do not rule out other block-access protocols either, such as native Fibre Channel (e.g. FC over DWDM), FCoE or FCIP.
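The rsync protocol listed under File system suits backups because it transfers only the parts of a file that changed. Its key ingredient, as described by rsync's authors, is a weak checksum that can be "rolled" one byte along the data in constant time, so matching blocks can be found at any offset. The sketch below is a simplified illustration of that checksum only; production rsync also uses a strong hash to confirm matches and differs in implementation details, and the function names here are hypothetical.

```python
M = 1 << 16  # modulus of the weak checksum components


def weak_checksum(block: bytes) -> tuple[int, int]:
    """Return (a, b) for a block: a = sum of bytes, b = position-weighted sum."""
    n = len(block)
    a = sum(block) % M
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def rolling_checksums(data: bytes, n: int) -> list[int]:
    """Weak checksum of every length-n window of data, in O(len(data)) total."""
    if len(data) < n:
        return []
    a, b = weak_checksum(data[:n])
    sums = [a + (b << 16)]
    for k in range(1, len(data) - n + 1):
        x_out, x_in = data[k - 1], data[k + n - 1]
        a = (a - x_out + x_in) % M  # drop the leaving byte, add the entering one
        b = (b - n * x_out + a) % M  # every remaining byte loses one unit of weight
        sums.append(a + (b << 16))
    return sums
```

Each window's checksum is derived from the previous one instead of being recomputed, which is what lets the receiver scan a whole file for blocks it already holds.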



Last change: 21.6.2013