1. 09 Mar, 2019 40 commits
    • aufs: directory op · 83ab62e6
      J. R. Okajima authored
      
      This is very similar to file operations, including re-open after branch
      management. The major part of readdir(3) is split out into another
      object called VDIR (in an earlier commit).
      Signed-off-by: J. R. Okajima <hooanon05g@gmail.com>
    • aufs: atomic_open 5/5, the body aufs_atomic_open() · dc25d54f
      J. R. Okajima authored
      
      ->atomic_open() is another monster (the other is ->rename() of course).
      It performs look-up, create, and open in a single method.
      Strictly speaking, the behaviour is not atomic from the branch fs's
      point of view, although atomicity is kept from aufs's. This is a
      second-best approach. For details, refer to the design document in an
      earlier commit.
      
      The simple list 'si_aopen' could be put into the aufs inode object, but
      I don't think that is a good idea because there are far more inodes
      than super_blocks. Putting it into the inode would, I am afraid, only
      add needless memory pressure.
      Signed-off-by: J. R. Okajima <hooanon05g@gmail.com>
    • aufs: atomic_open 4/5, introduce au_aopen_or_create() · 47170bb7
      J. R. Okajima authored
      
      This new function au_aopen_or_create() tries calling the branch fs's
      ->atomic_open() first. If it is not set, it calls vfs_create() instead.
      Putting this behaviour into aufs's add_simple() allows much of the code
      to be shared.
      Signed-off-by: J. R. Okajima <hooanon05g@gmail.com>
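The fallback order described in this commit can be modelled in userspace roughly as follows. This is only a sketch, not the aufs code: `BranchFs`, `aopen_or_create()`, and the returned strings are hypothetical names chosen for illustration. Only the order (try the branch's ->atomic_open() if it is set, otherwise create) comes from the commit message.

```python
from typing import Callable, Optional


class BranchFs:
    """Hypothetical stand-in for a branch filesystem's inode operations."""

    def __init__(self, atomic_open: Optional[Callable[[str], str]] = None):
        # None models a branch fs that does not set ->atomic_open().
        self.atomic_open = atomic_open

    def create(self, name: str) -> str:
        # Models the vfs_create() path: the entry is created but not opened.
        return f"created:{name}"


def aopen_or_create(branch: BranchFs, name: str) -> str:
    """Try the branch's atomic_open first; fall back to a plain create."""
    if branch.atomic_open is not None:
        return branch.atomic_open(name)
    return branch.create(name)


plain = BranchFs()
with_aopen = BranchFs(atomic_open=lambda name: f"opened:{name}")
print(aopen_or_create(plain, "a.txt"))
print(aopen_or_create(with_aopen, "a.txt"))
```

Keeping the decision in one helper is what lets every caller share the same path, which is the point the commit message makes about add_simple().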
    • aufs: atomic_open 3/5, pass h_file to au_do_open() · 51a7f431
      J. R. Okajima authored
      
      Extend do_open_dir(), au_do_open_nondir() and au_do_open() to receive an
      additional parameter h_file, which is a file object already opened by
      the branch fs's ->atomic_open(). With this design, aufs doesn't have to
      duplicate much code into a new aufs_atomic_open() (in a later commit),
      and can simply share it.
      Signed-off-by: J. R. Okajima <hooanon05g@gmail.com>
    • aufs: atomic_open 2/5, introduce vfsub_atomic_open() · be3f6e60
      J. R. Okajima authored
      
      Following the design in another commit, aufs calls the branch fs's
      ->atomic_open() if it exists. Ideally it would be better to call
      VFS:do_last, lookup_open() or atomic_open, but that is very hard for
      aufs. This implementation is far from the best.
      Signed-off-by: J. R. Okajima <hooanon05g@gmail.com>
    • aufs: file op · a3387b95
      J. R. Okajima authored
      
      Implement several f_op functions for non-dir.
      Signed-off-by: J. R. Okajima <hooanon05g@gmail.com>
    • aufs: file op, mmap · 8d319094
      J. R. Okajima authored
      
      For details, read the document in this commit.
      I don't like this approach, but there is currently no other way. It
      seems, though, that UnionMount is trying to add siblings of f_dentry
      and d_inode for linux-4.0 or later. That may give aufs another option
      too.
      
      A finfo object that has ever been mmapped is excluded from refreshing
      (based upon fi_mmapped). Otherwise we may corrupt the process memory
      space.
      Signed-off-by: J. R. Okajima <hooanon05g@gmail.com>
    • aufs: file op, internal re-open when write · beeeb01e
      J. R. Okajima authored
      
      By default, aufs doesn't copy-up the file in open(2).
      The file write operation is one of the triggers of the copy-up.
      Although I understand that O_RDWR or O_WRONLY should trigger the
      copy-up, it is not a good idea for the case of open(O_RDWR) +
      mmap(MAP_PRIVATE). In this case, none of the changes are written back
      to the file on disk, and the copy-up is entirely meaningless.
      In other words, aufs postpones the copy-up as long as possible.
      
      This design also applies to file operations after branch management.
      Some of the opened files need to be refreshed after branches are added,
      deleted, or modified. E.g. aufs detects the newly revealed file of the
      same name, opens it, and closes the old one internally while the
      virtual file is kept open.
      Signed-off-by: J. R. Okajima <hooanon05g@gmail.com>
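The postponed copy-up described above can be illustrated with a small userspace model. It is a sketch under stated assumptions: `LazyCopyUpFile` and its two paths are hypothetical, and ordinary files in a temporary directory stand in for the read-only lower branch and the writable upper branch. It only shows the idea that opening and reading never copy, while the first write does.

```python
import os
import shutil
import tempfile


class LazyCopyUpFile:
    """Hypothetical model: read from the lower branch until the first write."""

    def __init__(self, lower_path: str, upper_path: str):
        self.lower_path = lower_path
        self.upper_path = upper_path
        self.copied_up = False

    def _current(self) -> str:
        return self.upper_path if self.copied_up else self.lower_path

    def read(self) -> bytes:
        with open(self._current(), "rb") as fh:
            return fh.read()

    def write(self, data: bytes) -> None:
        # The first write triggers the copy-up; opening alone never does.
        if not self.copied_up:
            shutil.copyfile(self.lower_path, self.upper_path)
            self.copied_up = True
        with open(self.upper_path, "ab") as fh:
            fh.write(data)


workdir = tempfile.mkdtemp()
lower = os.path.join(workdir, "lower.txt")
upper = os.path.join(workdir, "upper.txt")
with open(lower, "wb") as fh:
    fh.write(b"base")

f1 = LazyCopyUpFile(lower, upper)
f1.read()
exists_after_read = os.path.exists(upper)    # nothing has been copied yet
f1.write(b"+new")
exists_after_write = os.path.exists(upper)   # the write performed the copy-up
```

The lower file is never modified in this model, which mirrors why a read-only branch can stay read-only.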
    • aufs: file op, open non-dir · b5bd8ebf
      J. R. Okajima authored
      
      Implement f_op->open() for non-directory.
      Signed-off-by: J. R. Okajima <hooanon05g@gmail.com>
    • aufs: xattr and acl · a76ab411
      J. R. Okajima authored
      
      Support for XATTR and ACL including several branch attributes to ignore
      the copy error around XATTR and ACL.
      
      NFS always sets MS_POSIXACL regardless of its mount option 'noacl'.
      When MS_POSIXACL is set, generic_permission() calls check_acl() (via
      acl_permission_check()) and gets -EOPNOTSUPP because the NFS branch is
      mounted as 'noacl'.
      In aufs, h_permission() should not call generic_permission() in this
      case.
      A similar thing happens when copying-up XATTR: vfs_getxattr_alloc()
      returns -EOPNOTSUPP.
      
      See also the document in this commit.
      Signed-off-by: J. R. Okajima <hooanon05g@gmail.com>
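The "ignore the copy error" behaviour can be sketched as below. This is a userspace model with assumed names: `copy_xattrs()`, the `ignore_unsupported` flag, and `noacl_branch()` are hypothetical and do not correspond to the actual aufs branch attributes. It only shows an -EOPNOTSUPP from the source branch being tolerated when the branch is configured that way.

```python
import errno
from typing import Callable, Dict


def copy_xattrs(read_xattrs: Callable[[], Dict[str, bytes]],
                dst: Dict[str, bytes],
                ignore_unsupported: bool) -> int:
    """Copy xattrs into dst; return 0 on success or a negative errno."""
    try:
        attrs = read_xattrs()
    except OSError as e:
        if e.errno == errno.EOPNOTSUPP and ignore_unsupported:
            return 0  # the branch cannot do XATTR; treat as nothing to copy
        return -e.errno
    dst.update(attrs)
    return 0


def noacl_branch() -> Dict[str, bytes]:
    # Models a branch that rejects the request with EOPNOTSUPP.
    raise OSError(errno.EOPNOTSUPP, "operation not supported")
```

With the flag set, a copy-up from such a branch proceeds without the extended attributes instead of failing as a whole.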
    • aufs: udba=none has no ->getattr() 2/2, refresh inode_operations · e4dbdb8c
      J. R. Okajima authored
      
      An enhancement for udba=none if possible.
      The condition is the same as in the 'no ->d_revalidate()' patch series.
      Refresh i_op in all cached inodes at remount time.
      Signed-off-by: J. R. Okajima <hooanon05g@gmail.com>
    • aufs: udba=none has no ->getattr() 1/2, inode_operations array · 46b85e53
      J. R. Okajima authored
      
      An enhancement for udba=none if possible.
      The condition is the same as in the 'no ->d_revalidate()' patch series.
      In this mode, i_op->getattr() is less important, and the default VFS
      handler is enough.
      Signed-off-by: J. R. Okajima <hooanon05g@gmail.com>
    • aufs: inode op, get/set attributes · 25f10296
      J. R. Okajima authored
      
      Implement i_op->get/setattr().
      setattr() is another trigger of the copy-up. The file may or may not be
      opened. Some of the sub-functions are shared with the XATTR operations
      in a later commit.
      Signed-off-by: J. R. Okajima <hooanon05g@gmail.com>
    • aufs: inode op, add, link · 39391364
      J. R. Okajima authored
      
      Implement i_op->link().
      As aufs supports 'pseudo-link', aufs_link() can create the link without
      copying-up. When aufs_link() does have to copy-up, the name of the
      target file is used as-is, and it is pseudo-linked. In other words,
      calling link(2) after the copy-up is unnecessary.
      
      See also struct.txt in previous commit.
      Signed-off-by: J. R. Okajima <hooanon05g@gmail.com>
    • aufs: inode op, rename 2/2, body · 2902df92
      J. R. Okajima authored
      
      Implement i_op->rename().
      This is a big monster and I don't like it.
      
      In this version, the copy-up always happens.
      Signed-off-by: J. R. Okajima <hooanon05g@gmail.com>
    • aufs: inode op, rename 1/2, intro · 2a422a58
      J. R. Okajima authored
      
      Implement i_op->rename().
      This is a big monster and I don't like it.
      
      In order to call d_move() within the aufs lock section,
      FS_RENAME_DOES_D_MOVE is set in fstype.fs_flags.
      Signed-off-by: J. R. Okajima <hooanon05g@gmail.com>
    • aufs: inode op, del, rmdir · 5755f006
      J. R. Okajima authored
      
      Implement i_op->rmdir() with support for logical deletion by whiteout,
      including all children.
      
      As struct.txt in an earlier commit describes, the target dir is renamed
      to a whiteout-ed temporary unique name in rmdir(2), and then removed
      asynchronously by the system global workqueue.
      Signed-off-by: J. R. Okajima <hooanon05g@gmail.com>
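The "rename now, remove later" idea in this commit can be modelled in userspace as follows. This is a sketch only: a background thread with a queue stands in for the kernel workqueue, and `lazy_rmdir()`, the `.wh.` prefix, and the temporary name layout are assumptions made for illustration rather than the names aufs actually generates.

```python
import os
import queue
import shutil
import tempfile
import threading
import uuid

removal_queue: "queue.Queue[str]" = queue.Queue()


def worker() -> None:
    # Stand-in for the workqueue: removes renamed directories in the background.
    while True:
        path = removal_queue.get()
        shutil.rmtree(path, ignore_errors=True)
        removal_queue.task_done()


def lazy_rmdir(path: str) -> str:
    """Rename the dir to a unique hidden name, then queue the real removal."""
    parent = os.path.dirname(path)
    tmp = os.path.join(parent, f".wh.{uuid.uuid4().hex}")  # assumed name layout
    os.rename(path, tmp)     # the directory vanishes from its name at once
    removal_queue.put(tmp)   # the actual deletion happens later
    return tmp


threading.Thread(target=worker, daemon=True).start()
root = tempfile.mkdtemp()
victim = os.path.join(root, "dir")
os.makedirs(os.path.join(victim, "child"))
tmp_name = lazy_rmdir(victim)
gone_immediately = not os.path.exists(victim)
removal_queue.join()         # wait for the background removal to finish
```

The caller only pays for a single rename; the cost of walking and deleting the children is moved off the rmdir(2) path.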
    • aufs: inode op, del, unlink · 4dc51987
      J. R. Okajima authored
      
      Implement i_op->unlink() with support for logical deletion by whiteout.
      Signed-off-by: J. R. Okajima <hooanon05g@gmail.com>
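Logical deletion by whiteout can be illustrated with sets of names standing in for directories. This is a minimal sketch, assuming one writable upper branch and one read-only lower branch; `WH_PREFIX`, `unlink()`, and `visible()` are hypothetical helpers, and the prefix value is an assumption used only for the example.

```python
WH_PREFIX = ".wh."  # assumed whiteout name prefix, for illustration only


def visible(upper: set, lower: frozenset) -> set:
    """Names a user would see in the merged view."""
    whiteouts = {n[len(WH_PREFIX):] for n in upper if n.startswith(WH_PREFIX)}
    real_upper = {n for n in upper if not n.startswith(WH_PREFIX)}
    return real_upper | (set(lower) - whiteouts)


def unlink(upper: set, lower: frozenset, name: str) -> None:
    """Logically delete name: drop it from upper, mask any lower copy."""
    if name not in visible(upper, lower):
        raise FileNotFoundError(name)
    upper.discard(name)
    if name in lower:
        # The lower branch is read-only, so hide the entry instead of removing it.
        upper.add(WH_PREFIX + name)
```

For example, unlinking a file that exists only on the lower branch leaves the lower branch untouched and adds a single marker to the upper one.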
    • aufs: inode op, add an entry · 78db268b
      J. R. Okajima authored
      
      Here are the inode operations that add an entry: i_op->create(),
      symlink(), mkdir(), mknod(), and tmpfile().
      Obviously they return EOPNOTSUPP when the target branch fs doesn't
      support the operation.
      Signed-off-by: J. R. Okajima <hooanon05g@gmail.com>
    • aufs: inode op, for symlink · 7213fdc8
      J. R. Okajima authored
      
      Implement dir.i_op->get_link().
      I am afraid there may be cases where updating the inode atime is
      skipped. This means the aufs inode atime may be silently reverted to the
      real fs inode atime, but I don't think it is a big problem.
      Signed-off-by: J. R. Okajima <hooanon05g@gmail.com>
    • aufs: inode op, lookup · ed4978f6
      J. R. Okajima authored
      
      Implement dir.i_op->lookup() and ->permission().
      Signed-off-by: J. R. Okajima <hooanon05g@gmail.com>
    • aufs: dentry op, opt-out ->d_revalidate() · ab70d2ad
      J. R. Okajima authored
      
      Optimize out ->d_revalidate() if possible.
      If the refresh fails, ->d_revalidate() remains. In this case, aufs has
      two types of dentries: one has ->d_revalidate(), the other doesn't. I am
      afraid it will confuse me someday.
      Signed-off-by: J. R. Okajima <hooanon05g@gmail.com>
    • aufs: dentry op · 5531888f
      J. R. Okajima authored
      
      Implement d_op->d_revalidate().
      This is another core part of UDBA (cf. lookup.txt in another commit).
      Signed-off-by: J. R. Okajima <hooanon05g@gmail.com>
    • aufs: superblock op · 3cb74551
      J. R. Okajima authored
      
      Implement basic sb_op->show_options(), statfs() and sync_fs() simply.
      - show_options() doesn't print the default values (AuOpt_Def).
      - statfs() will have an option to summarize the numbers from branches
        (in later commit).
      Signed-off-by: J. R. Okajima <hooanon05g@gmail.com>
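The "don't print the default values" rule for show_options() can be sketched as below. The option names and default values in `DEFAULTS` are hypothetical placeholders, not the contents of AuOpt_Def, and `show_options()` here is a plain Python function rather than the superblock operation.

```python
# Hypothetical defaults, chosen only to make the example concrete.
DEFAULTS = {"udba": "reval", "xino": "auto", "sum": False}


def show_options(current: dict) -> str:
    """Return a comma-separated option string, omitting default values."""
    parts = []
    for key in sorted(current):
        value = current[key]
        if key in DEFAULTS and DEFAULTS[key] == value:
            continue  # unchanged from the default: do not print it
        parts.append(key if value is True else f"{key}={value}")
    return ",".join(parts)


print(show_options({"udba": "none", "xino": "auto", "sum": True}))
```

Printing only the differences keeps the mount option line short and makes a non-default setting easy to spot.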
    • aufs: remount 5/5, refresh all · 2b77ce86
      J. R. Okajima authored
      
      Call the functions in previous commits to refresh all.
      Signed-off-by: J. R. Okajima <hooanon05g@gmail.com>
    • aufs: remount 4/5, refresh the internal branch array · fb70a624
      J. R. Okajima authored
      
      Maintain the internal array, including the corresponding XINO file and
      sysfs entries.
      Signed-off-by: J. R. Okajima <hooanon05g@gmail.com>
    • aufs: remount 3/5, refresh the opened files · 97be350d
      J. R. Okajima authored
      
      As a part of branch-management, aufs maintains all cached inodes,
      dentries, and opened files on remount.
      This commit handles the opened files by counting them and generating an
      array of their pointers. I don't like such an array approach, but I
      don't have another idea.
      Signed-off-by: J. R. Okajima <hooanon05g@gmail.com>
    • aufs: remount 2/5, refresh the cached dentries (using d_walk()) · 3d06348e
      J. R. Okajima authored
      
      As a part of branch-management, aufs maintains all cached inodes,
      dentries, and opened files on remount.
      This commit handles the cached dentries by calling the VFS internal
      function d_walk().
      Signed-off-by: J. R. Okajima <hooanon05g@gmail.com>
    • aufs: remount 1/5, an array of the cached inodes · 9dfa0bd3
      J. R. Okajima authored
      
      As a part of branch-management, aufs maintains all cached inodes,
      dentries, and opened files on remount.
      This commit handles the cached inodes by counting them and generating
      an array of their pointers. I don't like such an array approach, but I
      don't have another idea.
      Signed-off-by: J. R. Okajima <hooanon05g@gmail.com>
    • aufs: virtual or vertical directory 2/2, body · d11c974d
      J. R. Okajima authored
      
      It is hard to implement readdir(3) for the aufs virtual directory.
      It has to consider every whiteout in a single directory, as well as the
      (first) opaque marker (diropq).
      This implementation consumes a lot of memory, and I'd suggest trying
      RDU (readdir in userspace) in a later commit.
      
      See also the document in previous commit.
      Signed-off-by: J. R. Okajima <hooanon05g@gmail.com>
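A rough userspace model of the merge that a virtual directory has to perform is shown below. It is a sketch under assumptions: `merged_readdir()` is a hypothetical helper, directory listings are plain lists ordered from the topmost branch down, and the marker names in `WH_PREFIX` and `DIROPQ` are assumed values for illustration. It shows why every name, including whited-out ones, has to be remembered while reading.

```python
from typing import Dict, List

WH_PREFIX = ".wh."         # assumed whiteout prefix
DIROPQ = ".wh..wh..opq"    # assumed opaque-marker name


def merged_readdir(branches: List[List[str]]) -> List[str]:
    """Merge directory listings, topmost branch first."""
    seen: Dict[str, bool] = {}  # name -> True if visible, False if whited out
    for entries in branches:
        # Real entries first: the topmost occurrence of a name wins.
        for name in entries:
            if name != DIROPQ and not name.startswith(WH_PREFIX):
                seen.setdefault(name, True)
        # Then whiteouts: they hide the same name in every lower branch.
        for name in entries:
            if name != DIROPQ and name.startswith(WH_PREFIX):
                seen.setdefault(name[len(WH_PREFIX):], False)
        if DIROPQ in entries:
            break  # opaque directory: ignore all branches below this one
    return [name for name, shown in seen.items() if shown]
```

The `seen` table grows with the total number of names across all branches, which matches the memory cost the commit message warns about.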
    • aufs: virtual or vertical directory 1/2, intro · 47118316
      J. R. Okajima authored
      
      This commit just prepares for the succeeding commit; it is split out to
      keep the size of a single commit down.
      Signed-off-by: J. R. Okajima <hooanon05g@gmail.com>
    • aufs: finfo for directory · 4dcf89d5
      J. R. Okajima authored
      
      Extend finfo to support directories.
      For readdir(3), see VDIR and RDU in later commits.
      Signed-off-by: J. R. Okajima <hooanon05g@gmail.com>
    • aufs: finfo core · c3e1eecf
      J. R. Okajima authored
      
      The structure is very similar to iinfo and dinfo (in previous commits).
      This commit is for non-dir files. For a directory, see later commit.
      Signed-off-by: J. R. Okajima <hooanon05g@gmail.com>
    • aufs: new inode · f3457eb3
      J. R. Okajima authored
      
      As a part of looking-up, construct a virtual inode.
      After branch-management (add/del branches), the inode has to be
      refreshed to represent a revealed file.
      Signed-off-by: J. R. Okajima <hooanon05g@gmail.com>
    • aufs: mount/unmount · 8d4d76d3
      J. R. Okajima authored
      
      Now aufs becomes mountable with very few features.
      Signed-off-by: J. R. Okajima <hooanon05g@gmail.com>
    • aufs: hnotify 3/3, callers · f8ec4890
      J. R. Okajima authored
      
      In order to prevent aufs itself from firing the notify event, the
      hnotify feature can be suspended and resumed. Suspend and resume are
      combined with the mutex lock/unlock for the parent dir.
      
      See also previous commits.
      Signed-off-by: J. R. Okajima <hooanon05g@gmail.com>
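Pairing suspend/resume with the parent directory's lock can be sketched with a context manager. This is an illustrative userspace model only: `WatchedDir` and `locked_and_suspended()` are hypothetical names, and a boolean flag stands in for the real notification machinery.

```python
import threading
from contextlib import contextmanager


class WatchedDir:
    """Hypothetical parent directory with a lock and a notification switch."""

    def __init__(self):
        self.lock = threading.Lock()
        self.suspended = False
        self.events = []

    def notify(self, what: str) -> None:
        # Events raised while suspended were caused by the union itself: drop them.
        if not self.suspended:
            self.events.append(what)


@contextmanager
def locked_and_suspended(d: WatchedDir):
    """Take the directory lock and suspend notification as one step."""
    with d.lock:
        d.suspended = True
        try:
            yield d
        finally:
            d.suspended = False  # resume before the lock is released


parent = WatchedDir()
with locked_and_suspended(parent):
    parent.notify("internal create")   # ignored
parent.notify("external create")       # recorded
```

Tying the two together means a caller cannot forget to resume notification when it unlocks the directory.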
    • aufs: hnotify 2/3, body · 6b81a214
      J. R. Okajima authored
      
      The feature is built from two layers. One is the generic interface, and
      the other is the concrete implementation. This is rather historical.
      Originally aufs implemented this feature based upon 'inotify'. Later
      'fsnotify' made 'inotify' obsolete. During the transition period, these
      two layers were introduced to support both 'inotify' and 'fsnotify'.
      Currently only 'fsnotify' is supported, but the layers are kept for
      future use.
      
      This feature is compiled only when CONFIG_AUFS_HNOTIFY is enabled.
      See also the document in previous commit.
      Signed-off-by: J. R. Okajima <hooanon05g@gmail.com>
    • aufs: hnotify 1/3, headers · 9dc58c2b
      J. R. Okajima authored
      
      This is the hardest test to support UDBA (users' direct branch access).
      It uses 'fsnotify' internally. On detecting UDBA, it decrements the
      generation of the cached aufs objects. On the next access to the file,
      aufs detects that the generation is obsolete and tries refreshing it.
      Eventually the aufs cache will be updated to the latest status.
      
      The fsnotify is set on the cached dirs on the non-RR branches.
      The RR (real readonly) branches will never be modified and it is
      unnecessary to set fsnotify for them.
      
      This commit is for the declarations mainly, and the body parts will be
      in succeeding commits.
      
      This feature is compiled only when CONFIG_AUFS_HNOTIFY is enabled.
      See also the document in this commit.
      Signed-off-by: J. R. Okajima <hooanon05g@gmail.com>
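The generation scheme described above (invalidate on notification, refresh lazily on the next access) can be modelled as follows. It is a sketch with assumed names: `Super`, `CachedObject`, and `on_direct_change()` are hypothetical, and comparing the object's generation against a superblock-wide one is an assumption about how "obsolete" is detected.

```python
class Super:
    """Hypothetical superblock-wide state holding the current generation."""

    def __init__(self):
        self.generation = 1


class CachedObject:
    """Hypothetical cached object that remembers the generation it was built at."""

    def __init__(self, sb: Super, loader):
        self.sb = sb
        self.loader = loader
        self.generation = sb.generation
        self.value = loader()
        self.refreshes = 0

    def on_direct_change(self) -> None:
        # Called by the notification handler: mark this object as stale.
        self.generation -= 1

    def get(self):
        # Lazy refresh: done only on the next access after invalidation.
        if self.generation != self.sb.generation:
            self.value = self.loader()
            self.generation = self.sb.generation
            self.refreshes += 1
        return self.value
```

The notification handler stays cheap because it only touches a counter; the expensive re-read is deferred until something actually uses the object.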
    • aufs: export via NFS 2/2 · a030bc2a
      J. R. Okajima authored
      
      The main part is in previous commit.
      This commit handles the generation of aufs objects, to make sure the
      inode in the file handle is still valid.
      In order not to confuse NFSD, various operations return ESTALE to NFSD
      where they used to return EBUSY.
      
      See also the document in this commit.
      Signed-off-by: J. R. Okajima <hooanon05g@gmail.com>
    • aufs: export via NFS 1/2, body · c470ddd6
      J. R. Okajima authored
      
      Implement exporting via NFS.
      The file handle is rather large (40 bytes at most + the file handle on a
      branch).
      Non-virtual filesystems can use an anonymous (disconnected) dentry as
      long as the inode is identified, but aufs needs a dentry with dinfo,
      which is constructed in the usual look-up.  So aufs has to find or
      generate the normal dentry from the file handle in decoding.  I.e. in
      aufs, there should never be an anonymous dentry.
      
      In decoding the file handle, if both the dentry and the inode
      corresponding to the file handle are still in cache, then they are
      returned immediately.  Otherwise aufs has to find the cached parent dir
      from the file handle.  If the parent dir is not cached either, aufs
      tries these steps.
      - decode the branch fs's file handle and get the parent dir
      - generate the path of the parent dir on the branch
      - convert the branch path to aufs's path
      - look up the inode number under aufs's path
      The last one is the slowest case.
      
      exportfs_decode_fh() (actually reconnect_path()) acquires a mutex, and
      this behaviour violates the locking order with respect to aufs's
      si_rwsem.  This is not a problem since the internal
      exportfs_decode_fh() is called for the branch fs.
      Simply use lockdep_off/on to silence the lockdep message.
      
      See also the document in later commit.
      This is compiled only when CONFIG_AUFS_EXPORT is enabled.
      Signed-off-by: J. R. Okajima <hooanon05g@gmail.com>
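The three-level decoding order described in this commit (cached dentry, cached parent dir, slow path through the branch) can be sketched as a lookup cascade. Everything here is a hypothetical userspace model: the handle is a plain dict with assumed keys `ino` and `parent_ino`, the caches are dicts, and `slow_path` stands in for the four-step branch lookup. Raising ESTALE when nothing is found is an assumption made for the example.

```python
import errno
import os
from typing import Callable, Dict, Optional, Tuple


def decode_fh(fh: Dict[str, int],
              dentry_cache: Dict[int, str],
              dir_cache: Dict[int, dict],
              slow_path: Callable[[Dict[str, int]], Optional[str]]
              ) -> Tuple[str, str]:
    """Return (path, how) where how names the level that resolved the handle."""
    # 1. Both the dentry and the inode are still cached.
    hit = dentry_cache.get(fh["ino"])
    if hit is not None:
        return hit, "cached"
    # 2. The parent directory is still cached: search it by inode number.
    parent = dir_cache.get(fh["parent_ino"])
    if parent is not None:
        for name, ino in parent["children"].items():
            if ino == fh["ino"]:
                return parent["path"] + "/" + name, "parent-cached"
    # 3. Slowest case: rebuild the path by going through the branch.
    path = slow_path(fh)
    if path is None:
        raise OSError(errno.ESTALE, os.strerror(errno.ESTALE))
    return path, "slow"
```

Ordering the checks from cheapest to most expensive means the slow branch walk only runs when neither cache can answer.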