Which Compression Format Should I Use for HDFS?

Which compression format you should use depends on your application. Do you want to maximize the speed of your application, or are you more concerned with keeping storage costs down? In general, you should try different strategies for your application and benchmark them with representative datasets to find the best approach. For large, unbounded files, like […]
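
As a rough illustration of the "benchmark with representative data" advice, here is a minimal sketch that times a few codecs and reports their compression ratios. It uses the codecs in Python's standard library (gzip, bz2, lzma) as stand-ins, not the Hadoop codec classes, and the `benchmark` helper and the sample log line are hypothetical; substitute a sample of your own data.

```python
import bz2
import gzip
import lzma
import time


def benchmark(data: bytes) -> dict:
    """Return {codec name: (compression ratio, seconds taken)} for `data`."""
    # Standard-library codecs used as stand-ins for illustration only.
    codecs = {
        "gzip": gzip.compress,
        "bz2": bz2.compress,
        "lzma": lzma.compress,
    }
    results = {}
    for name, compress in codecs.items():
        start = time.perf_counter()
        compressed = compress(data)
        elapsed = time.perf_counter() - start
        results[name] = (len(data) / len(compressed), elapsed)
    return results


# Hypothetical sample: replace with a representative slice of real data.
sample = b"2024-01-01 INFO request served in 12 ms\n" * 50_000

for name, (ratio, seconds) in benchmark(sample).items():
    print(f"{name}: ratio {ratio:.1f}x, {seconds:.3f} s")
```

Highly repetitive input like this sample flatters every codec, so the numbers only become meaningful once the sample resembles production data.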

Why Is a Block in Hadoop Distributed Filesystem (HDFS) So Large?

HDFS blocks are large compared to disk blocks, and the reason is to minimize the cost of seeks. By making a block large enough, the time to transfer the data from the disk can be made significantly longer than the time to seek to the start of the block. Thus the time to transfer […]
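
The seek-versus-transfer argument can be made concrete with a little arithmetic. The sketch below assumes illustrative figures that are not taken from the text above (a 10 ms seek time, a 100 MB/s transfer rate, and a target of keeping seek time to 1% of transfer time), and the function name is hypothetical.

```python
def block_size_for_seek_overhead(seek_time_s: float,
                                 transfer_rate_bytes_per_s: float,
                                 max_seek_fraction: float) -> float:
    """Block size (bytes) at which seek time equals the given fraction of transfer time."""
    # The transfer must last long enough that the seek is only a small fraction of it.
    required_transfer_time_s = seek_time_s / max_seek_fraction
    return transfer_rate_bytes_per_s * required_transfer_time_s


MB = 10**6

# Assumed figures: 10 ms seek, 100 MB/s transfer, seek limited to 1% of transfer time.
size = block_size_for_seek_overhead(0.010, 100 * MB, 0.01)
print(f"block size: {size / MB:.0f} MB")
```

Under these assumptions the block needs to be on the order of 100 MB; a faster disk or a stricter seek budget pushes the figure higher.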