Caching dir entries twice #3155

Closed

snajpa opened this issue Mar 4, 2015 · 5 comments

snajpa (Contributor) commented Mar 4, 2015

Is there anything we can do about the dentry cache, so that we don't have everything cached twice?

I know vm.vfs_cache_pressure can be raised to put pressure on the VFS to evict dentries early, but even so the dentry cache on my systems is about 3.5 GB on average, while the dnode_t slab is about 12 GB (on one example machine).

I'd like to have either the dnode cache or the dentry cache, not both at once.

Is this possible with the Linux VFS? If not, feel free to close this issue :) But that would be a shame.
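
For reference, this is the knob in question (a minimal sketch; 200 is only an illustration, and values above the default of 100 make the kernel reclaim dentries and inodes more aggressively):

  # check the current value (default 100)
  sysctl vm.vfs_cache_pressure

  # bias reclaim toward dentries/inodes on the running system
  sysctl -w vm.vfs_cache_pressure=200

  # persist the setting across reboots
  echo 'vm.vfs_cache_pressure = 200' >> /etc/sysctl.conf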

On a related note, on the same server dmu_buf_impl_t is about 7 GB. The reason I'm investigating this at all is that a shift in load on one of our servers a few hours ago caused the ARC to shrink to a third of its original size; most of the RAM went into the SLAB and doesn't appear to be reclaimable.
I have yet to see whether it will shrink any time soon, but this is really asking for a system reset, since the machine isn't all that usable with such a small ARC.
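
For the record, the ARC's current and target sizes can be watched through the arcstats kstat, e.g.:

  # size = current ARC size, c = target size, c_max = upper bound (bytes)
  grep -E '^(size|c|c_max) ' /proc/spl/kstat/zfs/arcstats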

snajpa (Contributor, Author) commented Mar 4, 2015

[root@node10.prg.vpsfree.cz]
 ~ # slabtop -s c -o
 Active / Total Objects (% used)    : 112688074 / 150675499 (74.8%)
 Active / Total Slabs (% used)      : 10639229 / 10640344 (100.0%)
 Active / Total Caches (% used)     : 219 / 390 (56.2%)
 Active / Total Size (% used)       : 49650004.94K / 59513018.12K (83.4%)
 Minimum / Average / Maximum Object : 0.02K / 0.39K / 4096.00K

  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME                   
18442188 14957495  81%    0.84K 2049132        9  16393056K dnode_t                
695536 695536 100%   16.00K 695536        1  11128576K zio_buf_16384          
17614152 13129349  74%    0.50K 2201769        8   8807076K zio_buf_512            
22816534 13926940  61%    0.30K 1755118       13   7020472K dmu_buf_impl_t         
597110 597107  99%    8.00K 597110        1   4776880K size-8192              
14132474 13896299  98%    0.22K 831322       17   3325288K dentry                 
8231724 7157507  86%    0.30K 685977       12   2743908K arc_buf_hdr_t          
21596537 8771653  40%    0.06K 366043       59   1464172K size-64                
11670244 10999332  94%    0.10K 315412       37   1261648K sa_cache               
8118270 6887670  84%    0.12K 270609       30   1082436K size-128               
480348 480345  99%    1.15K 160116        3    640464K nfs_inode_cache        
4756840 1008133  21%    0.09K 118921       40    475684K arc_buf_t              
9741872 9741872 100%    0.03K  86981      112    347924K size-32                
570052 563797  98%    0.55K  81436        7    325744K radix_tree_node        
1069720 336034  31%    0.19K  53486       20    213944K size-192               
4030428 4027288  99%    0.04K  43809       92    175236K l2arc_buf_hdr_t        
830186 776994  93%    0.20K  43694       19    174776K vm_area_struct         
187360 182882  97%    0.72K  37472        5    149888K proc_inode_cache       
2068186 1799206  86%    0.06K  35054       59    140216K range_seg_cache        
 73308  41028  55%    1.00K  18327        4     73308K zio_buf_1024           
522960 522951  99%    0.12K  17432       30     69728K nfs_page               
 45156  45129  99%    1.09K  15052        3     60208K ext4_inode_cache       
 13264  12999  98%    2.75K   6632        2     53056K task_struct            
 29380  14714  50%    1.50K   5876        5     47008K zio_buf_1536           
893431 824105  92%    0.05K  11603       77     46412K anon_vma_chain         
168240 138647  82%    0.25K  11216       15     44864K filp                   
 11201  10975  97%    4.00K  11201        1     44804K zio_buf_4096           
 42360  37119  87%    1.00K  10590        4     42360K size-1024              
 37048  36228  97%    0.94K   9262        4     37048K nfs_write_data         
  7426   7196  96%    4.00K   7426        1     29704K size-4096              
427966 392731  91%    0.05K   5558       77     22232K anon_vma               
 28926  27330  94%    0.65K   4821        6     19284K inode_cache            
  7212   6968  96%    2.06K   2404        3     19232K sighand_cache          
 23575  22743  96%    0.75K   4715        5     18860K sock_inode_cache       
  8758   6571  75%    2.00K   4379        2     17516K zio_buf_2048           
  6504   4028  61%    2.50K   2168        3     17344K zio_buf_2560           
   964    962  99%   10.00K    964        1     15424K zio_buf_10240          
 15256  14897  97%    1.00K   3814        4     15256K size-1024(UBC)         
  1837   1829  99%    5.00K   1837        1     14696K zio_buf_5120           
  7138   6174  86%    2.00K   3569        2     14276K size-2048              
  3450   2586  74%    3.00K   1725        2     13800K zio_buf_3072           
 33580  25703  76%    0.38K   3358       10     13432K ip_dst_cache           
   829    829 100%   12.00K    829        1     13264K zio_buf_12288          
 14229  14138  99%    0.84K   1581        9     12648K shmem_inode_cache      
 13329  12391  92%    0.81K   1481        9     11848K task_xstate            
 11720  11416  97%    0.88K   2930        4     11720K UNIX                   
 76005  75899  99%    0.14K   2815       27     11260K sysfs_dir_cache        
131758 129737  98%    0.07K   2486       53      9944K Acpi-Operand           
  1221   1220  99%    6.00K   1221        1      9768K zio_buf_6144           
  4476   4288  95%    2.00K   2238        2      8952K size-2048(UBC)         
 17744  15713  88%    0.50K   2218        8      8872K size-512               
   132    132 100%   64.00K    132        1      8448K size-65536             
  7343   6964  94%    1.06K   1049        7      8392K signal_cache           
  5065   4762  94%    1.38K   1013        5      8104K mm_struct              
  3810   3669  96%    1.75K   1905        2      7620K TCP                    
   941    941 100%    7.00K    941        1      7528K zio_buf_7168           
  1831   1824  99%    3.50K   1831        1      7324K zio_buf_3584           
   397    397 100%   14.00K    397        1      6352K zio_buf_14336          
   370    357  96%   16.00K    370        1      5920K size-16384             
 49913  49661  99%    0.10K   1349       37      5396K buffer_head            
   661    607  91%    8.00K    661        1      5288K zio_buf_8192           
 26000  18588  71%    0.19K   1300       20      5200K cred_jar               
  1232   1129  91%    4.00K   1232        1      4928K size-4096(UBC)         
  9208   5614  60%    0.50K   1151        8      4604K zio_data_buf_512       
  7735   7403  95%    0.53K   1105        7      4420K idr_layer_cache        
  5907   4930  83%    0.69K    537       11      4296K files_cache            
  1840   1840 100%    2.00K    920        2      3680K zio_data_buf_2048      
 17020  15401  90%    0.19K    851       20      3404K eventpoll_epi          
  2779   2730  98%    1.06K    397        7      3176K RAWv6                  
  1710   1253  73%    1.50K    342        5      2736K zio_data_buf_1536      
  2556   2544  99%    0.88K    639        4      2556K RAW                    
  9210   6349  68%    0.25K    614       15      2456K size-256               
[root@node10.prg.vpsfree.cz]
 ~ # cat /proc/meminfo 
MemTotal:       263970936 kB
MemFree:        25372844 kB
Buffers:          150968 kB
Cached:         98602652 kB
SwapCached:      2195272 kB
MemCommitted:   424673280 kB
VirtualSwap:           0 kB
Active:         35112456 kB
Inactive:       120746064 kB
Active(anon):   30447984 kB
Inactive(anon): 28609428 kB
Active(file):    4664472 kB
Inactive(file): 92136636 kB
Unevictable:     1411680 kB
Mlocked:         1411696 kB
SwapTotal:      33445756 kB
SwapFree:       26979132 kB
Dirty:             22400 kB
Writeback:         78096 kB
AnonPages:      56403760 kB
Mapped:          4423928 kB
Shmem:           1925928 kB
Slab:           61559612 kB
SReclaimable:    4545896 kB
SUnreclaim:     57013716 kB
KernelStack:      101072 kB
PageTables:      1039312 kB
NFS_Unstable:     163584 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    165431224 kB
Committed_AS:   161971844 kB
VmallocTotal:   34359738367 kB
VmallocUsed:     3597304 kB
VmallocChunk:   34123480124 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:        5564 kB
DirectMap2M:     2058240 kB
DirectMap1G:    266338304 kB

snajpa (Contributor, Author) commented Mar 4, 2015

[attached graph: memory-day]

ryao (Contributor) commented Mar 10, 2015

The dnode cache is called a cache because it uses the SPL SLAB allocator via the kmem_cache_* functions. The term dnode in ZFS means "DMU object node"; a dnode is essentially an abstract inode. The dnodes in the "dnode cache" should roughly correspond to what the Linux kernel would consider in-memory inodes. That said, there is a risk of long-lived SLAB objects keeping whole slabs from being reclaimed, which can cause the cache to use more RAM than necessary. The solution for this on Illumos was to implement the ability to defragment the dnode slab cache. We don't presently have that functionality implemented in ZoL.
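
One way to spot that kind of fragmentation (a sketch; this assumes a ZoL build where the SPL exposes its cache statistics under /proc/spl/kmem/slab, and dnode_t can also be watched via slabtop as in the output above):

  # SPL-managed caches: compare objects in use against total slab footprint
  cat /proc/spl/kmem/slab

  # a low ACTIVE/OBJS ratio for dnode_t means partially-empty slabs
  # that cannot be handed back to the kernel
  slabtop -s c -o | grep dnode_t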

dweeezil (Contributor) commented

I also tend to think this and related problems are likely caused by fragmentation. If I fill the dnode cache by traversing lots of inodes and then immediately apply memory pressure, the cache frees up nicely. On one of my customers' rsync servers recently, the dnode_t cache's active/size ratio was only 35% and it was never being freed (the system was spending lots of time in the arc_adapt thread).
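
A rough way to reproduce that pattern (a sketch; /tank/large/tree is a hypothetical path, and drop_caches=2 asks the kernel to reclaim dentries and inodes, which in turn lets ZFS release the corresponding dnodes):

  # fill the dnode cache by walking a large directory tree
  find /tank/large/tree -xdev > /dev/null

  # apply memory pressure: 2 = reclaim dentry and inode caches
  sync
  echo 2 > /proc/sys/vm/drop_caches

  # the dnode_t cache should now shrink if its slabs are not fragmented
  slabtop -s c -o | grep dnode_t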

behlendorf (Contributor) commented

If needed, the dnode cache can now be limited in size by setting the zfs_arc_dnode_limit module option.
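
For example (a sketch; the 4 GiB figure is only an illustration, and this assumes the parameter is runtime-writable on your build):

  # cap ARC dnode usage at ~4 GiB (value in bytes) at runtime
  echo 4294967296 > /sys/module/zfs/parameters/zfs_arc_dnode_limit

  # or make it persistent across module reloads
  echo 'options zfs zfs_arc_dnode_limit=4294967296' >> /etc/modprobe.d/zfs.conf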
