A namenode can eat up memory, since it keeps a reference to every block of every file in memory. It’s difficult to give a precise formula, since memory usage depends on the number of blocks per file, the filename length, and the number of directories in the filesystem; it can also change from one Hadoop release to another. The default of 1,000 MB of namenode memory is normally enough for a few million files, but as a rule of thumb for sizing purposes you can conservatively allow 1,000 MB per million blocks of storage.
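To make the rule of thumb concrete, here is a rough sizing sketch. The cluster figures (node count, disk per node, block size, replication factor) are hypothetical values chosen only to illustrate the arithmetic; substitute your own.

```shell
#!/bin/sh
# Hypothetical cluster figures, for illustration only.
nodes=200                   # number of datanodes (assumed)
disk_mb_per_node=4000000    # about 4 TB of disk per node, in MB (assumed)
block_mb=128                # HDFS block size in MB (assumed)
replication=3               # replication factor (assumed)

# Upper bound on the number of distinct blocks the cluster has room for:
# each block occupies block_mb on each of its replicas.
blocks=$(( nodes * disk_mb_per_node / (block_mb * replication) ))

# Rule of thumb: 1,000 MB per million blocks, i.e. 1 MB per 1,000 blocks,
# rounded up to the next whole MB.
heap_mb=$(( (blocks + 999) / 1000 ))

echo "blocks=$blocks heap_mb=$heap_mb"
```

With these assumed figures the cluster has room for about 2 million blocks, so the estimate comes out at a little over 2,000 MB of namenode memory.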
You can increase the namenode’s memory without changing the memory allocated to other Hadoop daemons by setting HADOOP_NAMENODE_OPTS in hadoop-env.sh. This variable lets you pass extra options to the namenode’s JVM, including the option that sets the memory size. For example, with a Sun JVM, -Xmx2000m sets the namenode’s maximum heap size to 2,000 MB.
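In hadoop-env.sh, the setting described above might look like the following fragment. It is a minimal sketch that assumes you want to keep whatever options the variable already holds; the 2,000 MB figure is the example value from the text, not a recommendation.

```shell
# hadoop-env.sh (fragment)
# Add a maximum heap size of 2,000 MB to the namenode's JVM options,
# keeping any options that were already set in the variable.
export HADOOP_NAMENODE_OPTS="-Xmx2000m ${HADOOP_NAMENODE_OPTS:-}"
```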
If you change the namenode’s memory allocation, don’t forget to do the same for the secondary namenode (using the HADOOP_SECONDARYNAMENODE_OPTS variable), since its memory requirements are comparable to the primary namenode’s. In this case, you will probably also want to run the secondary namenode on a different machine. There are corresponding environment variables for the other Hadoop daemons, so you can customize their memory allocations if desired.
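The matching change for the secondary namenode could be sketched as follows, again in hadoop-env.sh. The heap value is illustrative and is assumed to mirror the 2,000 MB given to the namenode in the example above.

```shell
# hadoop-env.sh (fragment)
# Give the secondary namenode the same maximum heap as the namenode,
# keeping any options that were already set in the variable.
export HADOOP_SECONDARYNAMENODE_OPTS="-Xmx2000m ${HADOOP_SECONDARYNAMENODE_OPTS:-}"
```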