Driven by the demands of modern, ambitious astronomical applications that are both data-intensive and computationally expensive, the Shanghai Astronomical Observatory (SHAO) has long given high priority to continuously improving the capacity and quality of its high-performance computing (HPC) facilities, and has devoted significant resources to that goal. As a key node in the HPC network of the Chinese Academy of Sciences, the computing support department of the observatory is in charge of supporting, hosting and running HPC equipment of heterogeneous performance that meets the differing requirements of specific computational projects. Two categories of computing service are generally provided. At the top level are large-scale, high-performance, massive-storage facilities open to all qualified users in the observatory. Typical machines include two distributed blade clusters (Intel® Xeon® processors, InfiniBand interconnect, network I/O) that together provide 1,112 cores with a peak performance of 20.96 TFLOPS. There is also a 256-core SGI UV2000 machine with 1 TB of shared memory, which provides a good platform for multi-threaded applications. All public HPC resources come with a rich set of installed free and commercial software, job management systems and tools that enable efficient use under a large number of concurrent accesses. In contrast to these centralized public services, SHAO also helps host HPC machines that are purchased and used exclusively by observatory-based research groups or individuals. These machines are funded by a large number of research grants obtained by our scientists, and our professional, well-trained and helpful technicians offer high-quality personalized support for them. Key projects supported by these diverse and targeted services include numerical cosmology, galaxy formation and evolution, black hole physics, planetary interior processes, spacecraft orbit determination, VLBI technologies and deep-space exploration.
Owing to the great diversity of machines in service, from desk-side servers (from vendors such as Dell, Dawning, HP and IBM) to medium-scale clusters, it is a challenging task to maintain a stable hardware and software environment that ensures continuous operation. Overall, we have maintained 85% availability of our computing services as a result of outstanding teamwork.