1. 09 Mar, 2019 39 commits
    • J. R. Okajima's avatar
      aufs: fuse branch (including poll(2)) · 08ecca6f
      J. R. Okajima authored
      
      
      Fuse doesn't want the callers to access the inode attributes without
      issuing stat, and it is not assured that they are valid after lookup or
      iget().
      The inode attribute is critical for aufs, and aufs decided to call stat
      every time for fuse.
      Of course, it makes aufs slow. But when the branch fs is not fuse, stat
      is not called.
      
      Currently, only FUSE implements ->poll(), and aufs supports it.
      Signed-off-by: default avatarJ. R. Okajima <hooanon05g@gmail.com>
      08ecca6f
    • J. R. Okajima's avatar
      aufs: hfsplus branch · 5bb71622
      J. R. Okajima authored
      
      
      Special support for filesystems which acquires an inode mutex at final
      closing a file, eg, hfsplus.
      This trick is very simple and stupid, just to open the file before really
      necessary open to tell hfsplus that this is not the final closing.
      The caller should call au_h_open_pre() after acquiring the inode mutex,
      and au_h_open_post() after releasing it.
      Signed-off-by: default avatarJ. R. Okajima <hooanon05g@gmail.com>
      5bb71622
    • J. R. Okajima's avatar
      aufs: loopback-mounted branch · 9512a2c3
      J. R. Okajima authored
      
      
      It is ok that the branch is loopback-mounted.
      But it had a problem if the backend fs-image is placed on another
      branch, and aufs had prohibited such nested branch for a long time due
      to the recursive lookup by 'loopN' daemon.
      I don't remember when it was, but the daemon stopped such recursive
      behaviour, but aufs is still prohibitting the nested branch via
      loopback-mount.
      Upon the request from users, aufs will allow the loopback-mounted branch
      on another branch by another patch (aufs4-loopback.patch in
      aufs4-standalone.git).
      Signed-off-by: default avatarJ. R. Okajima <hooanon05g@gmail.com>
      9512a2c3
    • J. R. Okajima's avatar
      aufs: fhsm (file-based hierarchical storage management) · 3be35a97
      J. R. Okajima authored
      
      
      This feature automatically handles MVDOWN in other commits.
      In user-space, a daemon monitors the free space of the branch and issues
      MVDOWN ioctl automatically when necessary. The main role is in
      user-space and several options are implemented.
      For a branch to join the FHSM circle, a new attribute 'fhsm' should be
      specified.
      
      See also the document in this commit.
      Signed-off-by: default avatarJ. R. Okajima <hooanon05g@gmail.com>
      3be35a97
    • J. R. Okajima's avatar
      aufs: ioctl, mvdown 1/2, body · bc962de7
      J. R. Okajima authored
      
      
      Another ioctl feature, move-down.
      The behaviour is, as you can guess, the opposite of copy-up.
      The feature called FHSM (file-based hierarchical storage management, in
      later commit) uses this ioctl aggressively.
      
      See also the document in this commit.
      Signed-off-by: default avatarJ. R. Okajima <hooanon05g@gmail.com>
      bc962de7
    • J. R. Okajima's avatar
      aufs: ioctl, rdu (readdir in userspace) · 681a21c3
      J. R. Okajima authored
      
      
      For a directory which has millions of files, aufs VDIR consumes
      much memory. In this case, RDU (readdir(3) in user-space) is definitely
      better.
      If you enable CONFIG_AUFS_RDU at compiling aufs, install libau.so from
      aufs-util.git, and set some environment variables, then you can use this
      feature. When readdir(3) in libau.so receives an aufs dir, it issues
      ioctl(2) instead of regular readdir(3).
      All merging and whiteout handling are done in userspace.
      Signed-off-by: default avatarJ. R. Okajima <hooanon05g@gmail.com>
      681a21c3
    • J. R. Okajima's avatar
      aufs: ioctl, wbr_fd · de0ee5c9
      J. R. Okajima authored
      
      
      Provide a file descriptor corresponding the specified writable branch.
      The file descriptor will be used from user-space such as FHSM and
      libau.so. For details, see aufs-util.git.
      Signed-off-by: default avatarJ. R. Okajima <hooanon05g@gmail.com>
      de0ee5c9
    • J. R. Okajima's avatar
      aufs: debug print by MagicSysRq · b41567b1
      J. R. Okajima authored
      
      
      Print the current data status for debugging.
      The trigger key is a module parameter and you can freely change. The
      default is 'a' of course.
      Signed-off-by: default avatarJ. R. Okajima <hooanon05g@gmail.com>
      b41567b1
    • J. R. Okajima's avatar
      aufs: debugfs interface · 0f0e211c
      J. R. Okajima authored
      
      
      Aufs provides some info via debugfs such as
      - the branch path
      - the current number of pseudo-links
      - the size and the number of consumed blocks by XINO, XIB and XIGEN.
      Signed-off-by: default avatarJ. R. Okajima <hooanon05g@gmail.com>
      0f0e211c
    • J. R. Okajima's avatar
      aufs: dirren 1/6, inum-list of the renamed dir in a branch · 53c63b41
      J. R. Okajima authored
      
      
      This commits brings a list of the inode numbers which indicates the
      logically renamed dir into a branch. The list will be referred in
      lookup, and its lifetime is equivalent to the branch's, ie. the list is
      loaded/created in adding a branch, and stored/deleted in deleting a
      branch. The simple storing happens in remounting and unmounting aufs
      too.
      Signed-off-by: default avatarJ. R. Okajima <hooanon05g@gmail.com>
      53c63b41
    • J. R. Okajima's avatar
      aufs: file op, open non-dir · b5bd8ebf
      J. R. Okajima authored
      
      
      Implement f_op->open() for non-directory.
      Signed-off-by: default avatarJ. R. Okajima <hooanon05g@gmail.com>
      b5bd8ebf
    • J. R. Okajima's avatar
      aufs: xattr and acl · a76ab411
      J. R. Okajima authored
      
      
      Support for XATTR and ACL including several branch attributes to ignore
      the copy error around XATTR and ACL.
      
      NFS always sets MS_POSIXACL regardless its mount option 'noacl.'
      When MS_POSIXACL is set, generic_permission() calls check_acl() (via
      acl_permission_check()) and gets -EOPNOTSUPP because the NFS branch is
      mounted as 'noacl.'
      In aufs, h_permission() should not call generic_permission() in this
      case.
      The similar thing happens in coping-up XATTR. vfs_getxattr_alloc()
      returns -EOPNOTSUPP.
      
      See also the document in this commit.
      Signed-off-by: default avatarJ. R. Okajima <hooanon05g@gmail.com>
      a76ab411
    • J. R. Okajima's avatar
      aufs: inode op, rename 2/2, body · 2902df92
      J. R. Okajima authored
      
      
      Implement i_op->rename().
      This is a big monster and I don't like it.
      
      In this version, the copy-up always happen.
      Signed-off-by: default avatarJ. R. Okajima <hooanon05g@gmail.com>
      2902df92
    • J. R. Okajima's avatar
      aufs: inode op, del, unlink · 4dc51987
      J. R. Okajima authored
      
      
      Implement i_op->unlink() with supporting logical deletion by whiteout.
      Signed-off-by: default avatarJ. R. Okajima <hooanon05g@gmail.com>
      4dc51987
    • J. R. Okajima's avatar
      aufs: inode op, add an entry · 78db268b
      J. R. Okajima authored
      
      
      Here are entry adding inode operations, i_op->create(), symlink(),
      mkdir(), mknod(), and tmpfile().
      Obviously they return EOPNOTSUPP when the target branch fs doesn't
      support the operation.
      Signed-off-by: default avatarJ. R. Okajima <hooanon05g@gmail.com>
      78db268b
    • J. R. Okajima's avatar
      aufs: virtual or vertical directory 1/2, intro · 47118316
      J. R. Okajima authored
      
      
      This commit is just to prepare for the succeeding commit, and split to
      suppress the size of a single commit.
      Signed-off-by: default avatarJ. R. Okajima <hooanon05g@gmail.com>
      47118316
    • J. R. Okajima's avatar
      aufs: finfo core · c3e1eecf
      J. R. Okajima authored
      
      
      The structure is very similar to iinfo and dinfo (in previous commits).
      This commit is for non-dir files. For a directory, see later commit.
      Signed-off-by: default avatarJ. R. Okajima <hooanon05g@gmail.com>
      c3e1eecf
    • J. R. Okajima's avatar
      aufs: hnotify 2/3, body · 6b81a214
      J. R. Okajima authored
      
      
      The feature is constructed by two layers. One is generic interface, and
      the other is exact implementation. This is rather historical. Originally
      aufs implemented this feature based upon 'inotify.' Later 'fsnotify'
      made 'inotify' obsolete. During the transition period, these two layers
      were introduced to support both of 'inotify' and 'fsnotify.' Currently
      only 'fsnotify' is supported, but the layers are kept for the future
      use.
      
      This feature is compiled only when CONFIG_AUFS_HNOTIFY is enabled.
      See also the document in previous commit.
      Signed-off-by: default avatarJ. R. Okajima <hooanon05g@gmail.com>
      6b81a214
    • J. R. Okajima's avatar
      aufs: export via NFS 1/2, body · c470ddd6
      J. R. Okajima authored
      
      
      Implement exporting via NFS.
      The file handle is rather large (40 bytes at most + the file handle on a
      branch).
      The non-virtual filesystems can use an anonymous (disconnected) dentry
      as long as the inode is identified, but aufs needs a dentry with dinfo
      which is usually constructed.  So aufs has to find or generate the
      normal dentry from the file handle in decoding.  Eg. in aufs, there
      should never be the anonymous dentry.
      
      In decoding the file handle, if both of the dentry and the inode which
      are corresponding the file handle are still in cache, then they are
      returned immediately.  Otherwise aufs has to find the cached parent dir
      from the file handle.  If the parent dir is not cached either, the aufs
      tries these steps.
      - decode the branch fs's file handle and get the parent dir
      - generate the path of the parent dir on the branch
      - convert the branch path to aufs's path
      - lookup the inode number under the aufs' path
      The last one is the slowest case.
      
      exportfs_decode_fh() (actually reconnect_path()) acquires mutex, and
      this behaviour violates the locking order between aufs si_rwsem.  This
      is not a problem since internal exportfs_decode_fh() is called for the
      branch fs.
      Simply use lockdep_off/on to silence the lockdep message.
      
      See also the document in later commit.
      This is compiled only when CONFIG_AUFS_EXPORT is enabled.
      Signed-off-by: default avatarJ. R. Okajima <hooanon05g@gmail.com>
      c470ddd6
    • J. R. Okajima's avatar
      aufs: DIO and dynamically customize address_space_operations · 5b336293
      J. R. Okajima authored
      
      
      As a result of branch management, the virtual inode may point a
      different real inode from it used to. And aufs has to maintain its
      address_space_operations, since its definition may affect the
      behaviour.
      I know some people (including grsec-patch) doesn't like a non-const
      address_space_operations, but in order to keep the consistency of the
      behaviour, the correct address_space_operations is important.
      
      See also the document in this commit.
      Signed-off-by: default avatarJ. R. Okajima <hooanon05g@gmail.com>
      5b336293
    • J. R. Okajima's avatar
      aufs: writable branch select policy 1/2, core · 60b24eed
      J. R. Okajima authored
      
      
      Aufs can have multiple writable branches, and there are several
      policies to select one among them.
      This commit implements default "top-down-parent" for both of
      creating-policy and copyup-policy.
      
      See also the document in this commit.
      Signed-off-by: default avatarJ. R. Okajima <hooanon05g@gmail.com>
      60b24eed
    • J. R. Okajima's avatar
      aufs: copy-up 3/7, internal file I/O · e989fe7b
      J. R. Okajima authored
      
      
      The internal file read/write for copy-up in kernelspace.
      Signed-off-by: default avatarJ. R. Okajima <hooanon05g@gmail.com>
      e989fe7b
    • J. R. Okajima's avatar
      aufs: copy-up 1/7, attributes · f7f1bacc
      J. R. Okajima authored
      
      
      Copy the inode attributes between branches.
      See also the document in this commit.
      Signed-off-by: default avatarJ. R. Okajima <hooanon05g@gmail.com>
      f7f1bacc
    • J. R. Okajima's avatar
      aufs: pin or lock the parent dir and the child on a branch · 9f923257
      J. R. Okajima authored
      
      
      To create/delete/rename files including copy-up, aufs acquires several
      locks on the branch fs internally. These lock/unlock operations are
      consolidated into struct au_pin in this commit.
      au_pin handles
      - LOCKDEP class
      - re-validate/verify
      - suspend/resume HNOTIFY
      
      See also lookup.txt in later commit.
      Signed-off-by: default avatarJ. R. Okajima <hooanon05g@gmail.com>
      9f923257
    • J. R. Okajima's avatar
      aufs: pseudo-link and procfs support · c7ae8357
      J. R. Okajima authored
      
      
      Aufs pseudo-link (plink) represents a virtual hardlink across the
      branches. To implement the plink maintenance mode, aufs uses procfs.
      See also the document in this commit.
      
      There is an external user-space utility called 'auplink' in
      aufs-util.git, which has these features.
      - 'list' shows the pseudo-linked inode numbers and filenames.
      - 'cpup' copies-up all pseudo-link to the writable branch.
      - 'flush' calls 'cpup', and then 'mount -o remount,clean_plink=inum'
      Signed-off-by: default avatarJ. R. Okajima <hooanon05g@gmail.com>
      c7ae8357
    • J. R. Okajima's avatar
      aufs: white-out 1/2 · 2a7e7277
      J. R. Okajima authored
      
      
      The whiteout represents a logical deletion.
      Although the document in this commit mentioned about rmdir(2) and
      rename(2) for dir, this commit doesn't contain such functions. They will
      be added in later commits.
      Signed-off-by: default avatarJ. R. Okajima <hooanon05g@gmail.com>
      2a7e7277
    • J. R. Okajima's avatar
      aufs: sysfs interface · a3adef77
      J. R. Okajima authored
      
      
      The branch path can be much longer and it is not suitable to print via
      /proc/mounts as a part of mount options. Aufs can show it either
      separately via sysfs or /proc/mounts (as a part of mount options).
      This approach affects the lifetime of aufs objects and sbinfo contains
      kobject (in another commit). Theoretically user can disable
      CONFIG_SYSFS, but the lifetime management is always necessary. So
      supporting sysfs is split into two files, sysaufs.c and sysfs.c.
      sysaufs.c is always compiled, but sysfs.c is compiled only when
      CONFIG_SYSFS is enabled.
      Signed-off-by: default avatarJ. R. Okajima <hooanon05g@gmail.com>
      a3adef77
    • J. R. Okajima's avatar
      aufs: xino 1/2, core · 6fe05098
      J. R. Okajima authored
      
      
      XINO and XIB files are to maintain the inode numbers in aufs
      (cf. struct.txt and aufs manual in aufs-util.git).
      
      XINO file contains just a sequence of the inode numbers, and their
      offset in the file is real_inum x sizeof(inum).  So the size is limited
      by s_maxbytes of the filesystem where XINO file is located.  In order to
      support the larger inum, aufs stores XINO files as an internal array.
      
      Sometimes the size of XINO file can be a problem, ie. too big,
      particularly when XINO files are located on tmpfs. In this case, another
      separate patch tmpfs-ino.patch in aufs4-standalone.git is recommended
      (as well as vfs-ino.patch). The patch makes tmpfs to maintain inode
      number within itself and suppress its discontiguous distribution.
      
      See also the document in next commit.
      Signed-off-by: default avatarJ. R. Okajima <hooanon05g@gmail.com>
      6fe05098
    • J. R. Okajima's avatar
      aufs: workqueue · f04356cb
      J. R. Okajima authored
      
      
      Aufs uses the workqueue both synchronously and asynchronously.
      For sync-use-case, aufs uses its own specific wkq since doesn't want to
      be disturbed by other tasks on the system. For async-use-case, aufs uses
      the system global workqueue.
      Aufs has to prevent itself to being unmounted during the async-task is
      queued.
      
      See also the document in this commit.
      Signed-off-by: default avatarJ. R. Okajima <hooanon05g@gmail.com>
      f04356cb
    • J. R. Okajima's avatar
      aufs: readonly branch 2/2, callers · 66bc346d
      J. R. Okajima authored
      
      
      For details, see previous commit.
      Signed-off-by: default avatarJ. R. Okajima <hooanon05g@gmail.com>
      66bc346d
    • J. R. Okajima's avatar
      aufs: readonly branch 1/2, definition · b7051459
      J. R. Okajima authored
      
      
      The branch object is managed by the sbinfo object as an element of its
      internal array. The iinfo and dinfo objects contain the branch id, and
      it will be used to implement the correct order in branch management
      (add/del).
      
      See also the documents in this commit.
      Signed-off-by: default avatarJ. R. Okajima <hooanon05g@gmail.com>
      b7051459
    • J. R. Okajima's avatar
      aufs: infos generation 2/2 · f2ade007
      J. R. Okajima authored
      
      
      The generation of iinfo and dinfo inherit sbinfo's.
      Also iinfo generation tracks the branch inode's generation to test the
      matching after the branch management.
      Signed-off-by: default avatarJ. R. Okajima <hooanon05g@gmail.com>
      f2ade007
    • J. R. Okajima's avatar
      aufs: infos generation 1/2, dcsub · fa25dbe6
      J. R. Okajima authored
      
      
      The generation of aufs objects will be updated by the branch
      management (add/del branches, by later commit), and aufs will refresh
      the objects based upon the generation.
      In refreshing dinfos, aufs will find all ancestors of the given dentry
      and store the pointers in dynamically allocated memory. I don't think it
      beautiful, but I don't have any other idea.
      Signed-off-by: default avatarJ. R. Okajima <hooanon05g@gmail.com>
      fa25dbe6
    • J. R. Okajima's avatar
      aufs: sbinfo core · 3742849b
      J. R. Okajima authored
      
      
      The structure is very similar to inode and dentry infos (in previous
      commits), but the internal array is for 'struct au_branch' instead of
      'superblock.' Additionally the lifetime of 'struct au_sbinfo' is managed
      by kobject since it will be connected to sysfs by later commit.
      Signed-off-by: default avatarJ. R. Okajima <hooanon05g@gmail.com>
      3742849b
    • J. R. Okajima's avatar
      aufs: dinfo core · ee166183
      J. R. Okajima authored
      
      
      The structure is very similar to aufs inode info (in previous commit).
      Signed-off-by: default avatarJ. R. Okajima <hooanon05g@gmail.com>
      ee166183
    • J. R. Okajima's avatar
      aufs: debug print · 786c90cb
      J. R. Okajima authored
      
      
      Print various info about aufs inode. This feature is enabled when
      CONFIG_AUFS_DEBUG and the module parameter 'debug' are set.
      Signed-off-by: default avatarJ. R. Okajima <hooanon05g@gmail.com>
      786c90cb
    • J. R. Okajima's avatar
      aufs: module parameter, debug · 6011ea2b
      J. R. Okajima authored
      
      
      This parameter is available only when CONFIG_AUFS_DEBUG is enabled, and
      its default value is 0. Setting 1 will enable the various verifications
      and debug outputs.
      Signed-off-by: default avatarJ. R. Okajima <hooanon05g@gmail.com>
      6011ea2b
    • J. R. Okajima's avatar
      aufs: iinfo core · a3d2caf2
      J. R. Okajima authored
      
      
      See the documents in this commit.
      Signed-off-by: default avatarJ. R. Okajima <hooanon05g@gmail.com>
      a3d2caf2
    • J. R. Okajima's avatar
      aufs: kmalloc/kfree wrappers · c7926831
      J. R. Okajima authored
      
      
      Very basic, simple and stupid wrappers for kmalloc family.
      It tries kfree_ruc() as possible, with hoping a better performance.
      Signed-off-by: default avatarJ. R. Okajima <hooanon05g@gmail.com>
      c7926831