1. 27 May, 2019 1 commit
    • J. R. Okajima's avatar
      aufs: bugfix, protect creating XINO from concurrent mounts · abf61326
      J. R. Okajima authored
      
      
      At the mount-time, XINO files are created and removed very soon.  When
      multiple mounts with being given the same XINO file path executed in
      parallel, some of them may fail in creating XINO due to EEXIST.
      Introducing a local mutex, make it serialized.
      By default, XINO is created at the top dir of the first writable
      branch.  In this case, the new mutex won't be used.
      
      Obviously this is an unnecessary overhead when the XINO file path is not
      same, and such lock should be done by inode_lock() for the parent dir.
      Actually au_xino_create2() behaves in this manner.  Then why didn't I
      apply the same way to this au_xino_create()?  It is just because of my
      laziness.  Calling VFS filp_open() here is easy for me.
      Reported-by: default avatarKirill Kolyshkin <kolyshkin@gmail.com>
      Signed-off-by: default avatarJ. R. Okajima <hooanon05g@gmail.com>
      (cherry picked from commit 30d9273f2a1ce331b1a79b770bdb4c493919a673)
      abf61326
  2. 09 Mar, 2019 21 commits
    • J. R. Okajima's avatar
      aufs: fhsm (file-based hierarchical storage management) · 3be35a97
      J. R. Okajima authored
      
      
      This feature automatically handles MVDOWN in other commits.
      In user-space, a daemon monitors the free space of the branch and issues
      MVDOWN ioctl automatically when necessary. The main role is in
      user-space and several options are implemented.
      For a branch to join the FHSM circle, a new attribute 'fhsm' should be
      specified.
      
      See also the document in this commit.
      Signed-off-by: default avatarJ. R. Okajima <hooanon05g@gmail.com>
      3be35a97
    • J. R. Okajima's avatar
      aufs: ioctl, mvdown 1/2, body · bc962de7
      J. R. Okajima authored
      
      
      Another ioctl feature, move-down.
      The behaviour is, as you can guess, the opposite of copy-up.
      The feature called FHSM (file-based hierarchical storage management, in
      later commit) uses this ioctl aggressively.
      
      See also the document in this commit.
      Signed-off-by: default avatarJ. R. Okajima <hooanon05g@gmail.com>
      bc962de7
    • J. R. Okajima's avatar
      aufs: ioctl, ibusy · 8ea77cf9
      J. R. Okajima authored
      
      
      Because of some inode is in use, the deletion of a branch can fail.
      For those who wants to test the inode is busy or not, aufs provides an
      ioctl, and a utility 'aubusy' in aufs-util.git.
      Signed-off-by: default avatarJ. R. Okajima <hooanon05g@gmail.com>
      8ea77cf9
    • J. R. Okajima's avatar
      aufs: debugfs interface · 0f0e211c
      J. R. Okajima authored
      
      
      Aufs provides some info via debugfs such as
      - the branch path
      - the current number of pseudo-links
      - the size and the number of consumed blocks by XINO, XIB and XIGEN.
      Signed-off-by: default avatarJ. R. Okajima <hooanon05g@gmail.com>
      0f0e211c
    • J. R. Okajima's avatar
      aufs: dirren 1/6, inum-list of the renamed dir in a branch · 53c63b41
      J. R. Okajima authored
      
      
      This commits brings a list of the inode numbers which indicates the
      logically renamed dir into a branch. The list will be referred in
      lookup, and its lifetime is equivalent to the branch's, ie. the list is
      loaded/created in adding a branch, and stored/deleted in deleting a
      branch. The simple storing happens in remounting and unmounting aufs
      too.
      Signed-off-by: default avatarJ. R. Okajima <hooanon05g@gmail.com>
      53c63b41
    • J. R. Okajima's avatar
      aufs: branch management, modify the permission and attribute · 6462d8eb
      J. R. Okajima authored
      
      
      The permissions and attributes of a branch can be modified dynamically.
      See also the document in this commit.
      Signed-off-by: default avatarJ. R. Okajima <hooanon05g@gmail.com>
      6462d8eb
    • J. R. Okajima's avatar
      aufs: branch management, delete 2/3, body · b79632ef
      J. R. Okajima authored
      
      
      Delete a branch which is not busy.
      Aufs judges the branch is deletable by testing the opened files, the
      cached dentries and inodes. Even if a directory is in use, as long as
      the same named entry exist on another branch, then the branch is
      deletable.
      Signed-off-by: default avatarJ. R. Okajima <hooanon05g@gmail.com>
      b79632ef
    • J. R. Okajima's avatar
      aufs: remount 4/5, refresh the internal branch array · fb70a624
      J. R. Okajima authored
      
      
      Maintain the internal array including corresponding XINO file and sysfs
      entries.
      Signed-off-by: default avatarJ. R. Okajima <hooanon05g@gmail.com>
      fb70a624
    • J. R. Okajima's avatar
      aufs: new inode · f3457eb3
      J. R. Okajima authored
      
      
      As a part of looking-up, construct a virtual inode.
      After branch-management (add/del branches), the inode has to be
      refreshed to represent a revealed file.
      Signed-off-by: default avatarJ. R. Okajima <hooanon05g@gmail.com>
      f3457eb3
    • J. R. Okajima's avatar
      aufs: hnotify 1/3, headers · 9dc58c2b
      J. R. Okajima authored
      
      
      This is the hardest test to support UDBA (users' direct branch access).
      It uses 'fsnotify' internally.  Detecting UDBA, decrements the
      generation of the cached aufs objects.  In the next access to the file,
      aufs detects the generation is obsoleted and tries refreshing it.
      Eventually aufs cache will be updated to latest status.
      
      The fsnotify is set on the cached dirs on the non-RR branches.
      The RR (real readonly) branches will never be modified and it is
      unnecessary to set fsnotify for them.
      
      This commit is for the declarations mainly, and the body parts will be
      in succeeding commits.
      
      This feature is compiled only when CONFIG_AUFS_HNOTIFY is enabled.
      See also the document in this commit.
      Signed-off-by: default avatarJ. R. Okajima <hooanon05g@gmail.com>
      9dc58c2b
    • J. R. Okajima's avatar
      aufs: export via NFS 1/2, body · c470ddd6
      J. R. Okajima authored
      
      
      Implement exporting via NFS.
      The file handle is rather large (40 bytes at most + the file handle on a
      branch).
      The non-virtual filesystems can use an anonymous (disconnected) dentry
      as long as the inode is identified, but aufs needs a dentry with dinfo
      which is usually constructed.  So aufs has to find or generate the
      normal dentry from the file handle in decoding.  Eg. in aufs, there
      should never be the anonymous dentry.
      
      In decoding the file handle, if both of the dentry and the inode which
      are corresponding the file handle are still in cache, then they are
      returned immediately.  Otherwise aufs has to find the cached parent dir
      from the file handle.  If the parent dir is not cached either, the aufs
      tries these steps.
      - decode the branch fs's file handle and get the parent dir
      - generate the path of the parent dir on the branch
      - convert the branch path to aufs's path
      - lookup the inode number under the aufs' path
      The last one is the slowest case.
      
      exportfs_decode_fh() (actually reconnect_path()) acquires mutex, and
      this behaviour violates the locking order between aufs si_rwsem.  This
      is not a problem since internal exportfs_decode_fh() is called for the
      branch fs.
      Simply use lockdep_off/on to silence the lockdep message.
      
      See also the document in later commit.
      This is compiled only when CONFIG_AUFS_EXPORT is enabled.
      Signed-off-by: default avatarJ. R. Okajima <hooanon05g@gmail.com>
      c470ddd6
    • J. R. Okajima's avatar
      aufs: DIO and dynamically customize address_space_operations · 5b336293
      J. R. Okajima authored
      
      
      As a result of branch management, the virtual inode may point a
      different real inode from it used to. And aufs has to maintain its
      address_space_operations, since its definition may affect the
      behaviour.
      I know some people (including grsec-patch) doesn't like a non-const
      address_space_operations, but in order to keep the consistency of the
      behaviour, the correct address_space_operations is important.
      
      See also the document in this commit.
      Signed-off-by: default avatarJ. R. Okajima <hooanon05g@gmail.com>
      5b336293
    • J. R. Okajima's avatar
      aufs: writable branch select policy 2/2, variations · 2ef06b5a
      J. R. Okajima authored
      
      
      Several policies to select one among multiple writable branches.
      See also the document in previous commit.
      Signed-off-by: default avatarJ. R. Okajima <hooanon05g@gmail.com>
      2ef06b5a
    • J. R. Okajima's avatar
      aufs: copy-up 4/7, body · f63b4f2f
      J. R. Okajima authored
      
      
      The functions for
      - create the copy-up target file
      - copy filedata
      - copy metadata
      
      In copying filedata, I had tried splice_direct() instead of repeating
      read/write. Surprisingly, I could not see a big difference. So let's
      keep this approach for a while. Someday SEEK_DATA/SEEK_HOLE become more
      popular, it may help optimizing this read/write.
      Signed-off-by: default avatarJ. R. Okajima <hooanon05g@gmail.com>
      f63b4f2f
    • J. R. Okajima's avatar
      aufs: copy-up 3/7, internal file I/O · e989fe7b
      J. R. Okajima authored
      
      
      The internal file read/write for copy-up in kernelspace.
      Signed-off-by: default avatarJ. R. Okajima <hooanon05g@gmail.com>
      e989fe7b
    • J. R. Okajima's avatar
      aufs: writable branch 1/3, white-out · 72a970fc
      J. R. Okajima authored
      
      
      The writable branch prepares a few files and dirs for whiteouts.
      For branch filesystems which doesn't support link(2), there is "nolwh"
      attribute. On the branch which is specified this attribute, aufs never
      try link(2) for whitout and always creat(2) it.
      Signed-off-by: default avatarJ. R. Okajima <hooanon05g@gmail.com>
      72a970fc
    • J. R. Okajima's avatar
      aufs: xino truncation · 2a150b32
      J. R. Okajima authored
      
      
      As mentioned earlier, sometimes the size of XINO file is a problem.
      Aufs has a feature to truncate it asynchronously using workqueue. But it
      may not be so effective in some cases, and you may want to stop
      discontiguous distribution of the inode numbers on branch fs.
      See also the log in another commit.
      Signed-off-by: default avatarJ. R. Okajima <hooanon05g@gmail.com>
      2a150b32
    • J. R. Okajima's avatar
      aufs: sysfs interface · a3adef77
      J. R. Okajima authored
      
      
      The branch path can be much longer and it is not suitable to print via
      /proc/mounts as a part of mount options. Aufs can show it either
      separately via sysfs or /proc/mounts (as a part of mount options).
      This approach affects the lifetime of aufs objects and sbinfo contains
      kobject (in another commit). Theoretically user can disable
      CONFIG_SYSFS, but the lifetime management is always necessary. So
      supporting sysfs is split into two files, sysaufs.c and sysfs.c.
      sysaufs.c is always compiled, but sysfs.c is compiled only when
      CONFIG_SYSFS is enabled.
      Signed-off-by: default avatarJ. R. Okajima <hooanon05g@gmail.com>
      a3adef77
    • J. R. Okajima's avatar
      aufs: xino 2/2, callers · 8fe49c5d
      J. R. Okajima authored
      
      
      XINO and XIB files are read and written frequently after unlinked, and
      it means that the remote filesystems are not suitable for them.
      Additionally aufs shows their metadata via debugfs (in later commit).
      To make it easier to do this, aufs expects branch filesystems to
      maintain their i_size and i_blocks. And it means some filesystem are not
      suitable for XINO.
      Signed-off-by: default avatarJ. R. Okajima <hooanon05g@gmail.com>
      8fe49c5d
    • J. R. Okajima's avatar
      aufs: xino 1/2, core · 6fe05098
      J. R. Okajima authored
      
      
      XINO and XIB files are to maintain the inode numbers in aufs
      (cf. struct.txt and aufs manual in aufs-util.git).
      
      XINO file contains just a sequence of the inode numbers, and their
      offset in the file is real_inum x sizeof(inum).  So the size is limited
      by s_maxbytes of the filesystem where XINO file is located.  In order to
      support the larger inum, aufs stores XINO files as an internal array.
      
      Sometimes the size of XINO file can be a problem, ie. too big,
      particularly when XINO files are located on tmpfs. In this case, another
      separate patch tmpfs-ino.patch in aufs4-standalone.git is recommended
      (as well as vfs-ino.patch). The patch makes tmpfs to maintain inode
      number within itself and suppress its discontiguous distribution.
      
      See also the document in next commit.
      Signed-off-by: default avatarJ. R. Okajima <hooanon05g@gmail.com>
      6fe05098
    • J. R. Okajima's avatar
      aufs: readonly branch 1/2, definition · b7051459
      J. R. Okajima authored
      
      
      The branch object is managed by the sbinfo object as an element of its
      internal array. The iinfo and dinfo objects contain the branch id, and
      it will be used to implement the correct order in branch management
      (add/del).
      
      See also the documents in this commit.
      Signed-off-by: default avatarJ. R. Okajima <hooanon05g@gmail.com>
      b7051459