Commit 3be35a97 authored by J. R. Okajima

aufs: fhsm (file-based hierarchical storage management)



This feature automates the MVDOWN operation introduced in other commits.
In user-space, a daemon monitors the free space of each branch and issues
the MVDOWN ioctl automatically when necessary. The main role is played in
user-space, where several options are implemented.
For a branch to join the FHSM circle, the new attribute 'fhsm' must be
specified.

See also the document in this commit.
Signed-off-by: J. R. Okajima <hooanon05g@gmail.com>
parent 08363511
# Copyright (C) 2011-2019 Junjiro R. Okajima
File-based Hierarchical Storage Management (FHSM)
----------------------------------------------------------------------
Hierarchical Storage Management (or HSM) is a well-known feature in the
storage world. Aufs provides this feature on a per-file basis with multiple
writable branches, based upon the principle of "Colder, the Lower".
Here "colder" refers to the less frequently used files, and "lower"
refers to the position in the vertical order of the stacked branches.
These multiple writable branches are prioritized, i.e. the topmost one
should be the fastest drive and be used heavily.
o Characters in aufs FHSM story
- aufs itself and a new branch attribute.
- a new ioctl interface to move-down and to establish a connection with
the daemon ("move-down" is a converse of "copy-up").
- userspace tool and daemon.
The userspace daemon establishes a connection with aufs and waits for
the notification. The notified information is very similar to struct
statfs containing the number of consumed blocks and inodes.
When the consumed blocks/inodes of a branch exceed the user-specified
upper watermark, the daemon activates its move-down process until the
consumed blocks/inodes drop to the user-specified lower watermark.
The actual move-down is done by aufs based upon the request from
user-space since we need to maintain the inode number and the internal
pointer arrays in aufs.
Currently aufs FHSM handles regular files only. Additionally they
must be neither hard-linked nor pseudo-linked.
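The watermark behaviour described above can be pictured with a small
user-space sketch. It is not taken from the aufs daemon: the struct and
function names are hypothetical, and the watermarks are assumed here to be
percentages of the branch capacity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-branch usage report; not the aufs struct layout. */
struct usage {
	uint64_t total;    /* total blocks (or inodes) on the branch */
	uint64_t consumed; /* blocks (or inodes) currently in use */
};

/* Hypothetical watermarks, expressed as a percentage of 'total'. */
struct watermark {
	unsigned int upper; /* start moving down above this */
	unsigned int lower; /* stop moving down at or below this */
};

/*
 * Decide whether the move-down process should be running.
 * 'active' is the current state; keeping it between the two
 * watermarks gives the start/stop hysteresis.
 */
static bool fhsm_should_move(const struct usage *u,
			     const struct watermark *wm, bool active)
{
	assert(wm->lower <= wm->upper);

	/* compare consumed/total with percent/100 without dividing */
	if (u->consumed * 100 > (uint64_t)wm->upper * u->total)
		return true;	/* above the upper watermark */
	if (u->consumed * 100 <= (uint64_t)wm->lower * u->total)
		return false;	/* reached the lower watermark */
	return active;		/* in between: keep the current state */
}
```

With upper=90 and lower=70, a branch at 91% starts the move-down, and it
keeps running through 80% until the usage drops to 70%.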
o Cooperation of aufs and the user-space daemon
While the userspace daemon holds the connection, aufs sends a
small notification to it whenever aufs writes something into a
writable branch. But this may be costly since aufs issues statfs(2)
internally. So the user can specify a new option to cache the
info. Actually the notification is controlled by these factors.
+ the specified cache time.
+ whether aufs internally classifies the write as "force".
Until the specified time expires, aufs doesn't send the info
except in the forced cases. When aufs decides to force, the info is
always notified to userspace.
For example, the number of free inodes is generally large enough and
a shortage of them rarely happens. So aufs doesn't force the
notification when creating a new file, directory and others. This is
the typical case in which aufs doesn't force.
When aufs writes actual file data and the file consumes new
blocks, aufs forces the notification.
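The two factors above (cache time and "force") can be sketched as a small
decision function. This is a user-space illustration only, assuming a
per-branch cache record and timestamps in seconds; the names are
hypothetical and the kernel code works with jiffies instead.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-branch cache state; names are illustrative only. */
struct notify_cache {
	bool has_info;           /* unread info is already queued */
	unsigned long last_stat; /* when the info was last refreshed (sec) */
};

/*
 * Return true when a write should refresh the cached info and notify
 * the daemon: the write is forced, nothing is queued yet, or the cache
 * time has expired. Otherwise the notification is suppressed.
 */
static bool should_notify(struct notify_cache *c, bool force,
			  unsigned long now, unsigned long cache_sec)
{
	assert(c);

	if (!force && c->has_info && now - c->last_stat <= cache_sec)
		return false;	/* still cached, and not important enough */

	c->last_stat = now;
	c->has_info = true;
	return true;
}
```

Creating an empty file would call this with force=false, while a write
that consumes new blocks would pass force=true and always get through.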
o Interfaces in aufs
- New branch attribute.
+ fhsm
Specifies that the branch is managed by the FHSM feature. In other words,
it is a participant in the FHSM.
When nofhsm is set on the branch, it will not be the source/target
branch of the move-down operation. This attribute is set
independently from the coo and moo attributes, and if you want full
FHSM, you should specify them as well.
- New mount option.
+ fhsm_sec
Specifies the time in seconds during which the less important info is
not notified.
- New ioctl.
+ AUFS_CTL_FHSM_FD
Creates a new file descriptor from which userspace can read the
notification (a subset of struct statfs) sent by aufs.
- Module parameter 'brs'
It has to be set to 1. Otherwise the new mount option 'fhsm_sec' is
ignored.
- mount helpers /sbin/mount.aufs and /sbin/umount.aufs
When there are two or more branches with the fhsm attribute,
/sbin/mount.aufs invokes the user-space daemon and /sbin/umount.aufs
terminates it. As a result of remounting and branch-manipulation, the
number of branches with the fhsm attribute can drop to one. In this case,
/sbin/mount.aufs will terminate the user-space daemon.
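A daemon could consume the notification descriptor roughly as sketched
below. The descriptor is assumed to have been obtained already through
the AUFS_CTL_FHSM_FD ioctl on the aufs root directory; the record type
here is only a stand-in for struct aufs_stbr, whose real layout comes
from the aufs header, and the function name is hypothetical.

```c
#include <assert.h>
#include <poll.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Stand-in for struct aufs_stbr. The field names and sizes are
 * placeholders for illustration, not the real aufs layout.
 */
struct demo_stbr {
	uint64_t blocks_free; /* hypothetical field */
	uint64_t inodes_free; /* hypothetical field */
	int16_t brid;         /* branch id */
};

/*
 * Wait until 'fd' becomes readable, then read up to 'max' whole
 * records. Returns the number of records, or -1 on error or timeout.
 */
static int read_notifications(int fd, struct demo_stbr *recs, int max,
			      int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	ssize_t n;

	assert(recs && max > 0);

	if (poll(&pfd, 1, timeout_ms) <= 0)
		return -1;	/* error or timeout */
	if (!(pfd.revents & POLLIN))
		return -1;

	/* a single read(2) returns an array of records */
	n = read(fd, recs, sizeof(*recs) * (size_t)max);
	if (n < 0)
		return -1;
	return (int)(n / (ssize_t)sizeof(*recs));
}
```

The buffer should have room for one record per fhsm branch, since a read
into a buffer that is too small is expected to fail rather than return a
partial array.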
Finally the move-down operation is done in these steps in kernel-space.
- make sure that,
+ no one else is using the file.
+ the file is not hard-linked.
+ the file is not pseudo-linked.
+ the file is a regular file.
+ the parent dir is not opaqued.
- find the target writable branch.
- make sure the file is not whiteout-ed by the upper (than the target)
branch.
- make the parent dir on the target branch.
- mutex lock the inode on the branch.
- unlink the whiteout on the target branch (if exists).
- lookup and create the whiteout-ed temporary name on the target branch.
- copy the file as the whiteout-ed temporary name on the target branch.
- rename the whiteout-ed temporary name to the original name.
- unlink the file on the source branch.
- maintain the internal pointer array and the external inode number
table (XINO).
- maintain the timestamps and other attributes of the parent dir and the
file.
And of course, in every step, an error may happen. So the operation
should restore the original file state after an error happens.
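The core of the steps above (copy under a temporary name, rename to the
original name, unlink the source, and undo on error) can be illustrated
with a user-space analogy. This is only a sketch with ordinary files and
hypothetical function names; it does not handle whiteouts, locking, XINO
or timestamps, which the kernel implementation has to maintain.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Copy all bytes from 'in' to 'out'; returns 0 on success, -1 on error. */
static int copy_fd(int in, int out)
{
	char buf[4096];
	ssize_t n;

	while ((n = read(in, buf, sizeof(buf))) > 0) {
		char *p = buf;

		while (n > 0) {
			ssize_t w = write(out, p, (size_t)n);

			if (w < 0)
				return -1;
			p += w;
			n -= w;
		}
	}
	return n < 0 ? -1 : 0;
}

/*
 * Move 'src' to 'dst' via the temporary name 'tmp', which must be on
 * the same filesystem as 'dst'. On any error before the final unlink,
 * the source is left untouched and the temporary file is removed.
 */
static int move_down(const char *src, const char *tmp, const char *dst)
{
	struct stat st;
	int in, out, err;

	assert(src && tmp && dst);

	in = open(src, O_RDONLY);
	if (in < 0)
		return -1;
	/* accept only regular files which are not hard-linked */
	if (fstat(in, &st) < 0 || !S_ISREG(st.st_mode) || st.st_nlink != 1) {
		close(in);
		return -1;
	}
	out = open(tmp, O_WRONLY | O_CREAT | O_EXCL, st.st_mode & 0777);
	if (out < 0) {
		close(in);
		return -1;
	}
	err = copy_fd(in, out);
	close(in);
	if (close(out) < 0)
		err = -1;
	if (!err)
		err = rename(tmp, dst);	/* publish under the real name */
	if (err) {
		unlink(tmp);		/* restore: drop the partial copy */
		return -1;
	}
	return unlink(src);		/* finally remove the source */
}
```

Using a temporary name means that a reader never sees a half-copied file
under the original name, and a failure at any step leaves the source as
it was.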
......@@ -96,6 +96,15 @@ config AUFS_XATTR
branch attributes for EA.
See detail in aufs.5.
config AUFS_FHSM
bool "File-based Hierarchical Storage Management"
help
Hierarchical Storage Management (or HSM) is a well-known feature
in the storage world. Aufs provides this feature as file-based
with multiple branches.
These multiple branches are prioritized, i.e. the topmost one
should be the fastest drive and be used heavily.
config AUFS_RDU
bool "Readdir in userspace"
help
......
......@@ -30,6 +30,7 @@ aufs-$(CONFIG_AUFS_EXPORT) += export.o
aufs-$(CONFIG_AUFS_XATTR) += xattr.o
aufs-$(CONFIG_FS_POSIX_ACL) += posix_acl.o
aufs-$(CONFIG_AUFS_DIRREN) += dirren.o
aufs-$(CONFIG_AUFS_FHSM) += fhsm.o
aufs-$(CONFIG_AUFS_RDU) += rdu.o
aufs-$(CONFIG_AUFS_DEBUG) += debug.o
aufs-$(CONFIG_AUFS_MAGIC_SYSRQ) += sysrq.o
......@@ -38,6 +38,11 @@ static void au_br_do_free(struct au_branch *br)
AuRwDestroy(&wbr->wbr_wh_rwsem);
}
if (br->br_fhsm) {
au_br_fhsm_fin(br->br_fhsm);
au_kfree_try_rcu(br->br_fhsm);
}
key = br->br_dykey;
for (i = 0; i < AuBrDynOp; i++, key++)
if (*key)
......@@ -136,6 +141,12 @@ static struct au_branch *au_br_alloc(struct super_block *sb, int new_nbranch,
goto out_hnotify;
}
if (au_br_fhsm(perm)) {
err = au_fhsm_br_alloc(add_branch);
if (unlikely(err))
goto out_wbr;
}
root = sb->s_root;
err = au_sbr_realloc(au_sbi(sb), new_nbranch, /*may_shrink*/0);
if (!err)
......@@ -148,8 +159,8 @@ static struct au_branch *au_br_alloc(struct super_block *sb, int new_nbranch,
if (!err)
return add_branch; /* success */
out_wbr:
au_kfree_rcu(add_branch->br_wbr);
out_hnotify:
au_hnotify_fin_br(add_branch);
out_xino:
......@@ -1282,6 +1293,7 @@ int au_br_mod(struct super_block *sb, struct au_opt_mod *mod, int remount,
aufs_bindex_t bindex;
struct dentry *root;
struct au_branch *br;
struct au_br_fhsm *bf;
root = sb->s_root;
bindex = au_find_dbindex(root, mod->h_root);
......@@ -1303,11 +1315,21 @@ int au_br_mod(struct super_block *sb, struct au_opt_mod *mod, int remount,
if (br->br_perm == mod->perm)
return 0; /* success */
/* pre-allocate for non-fhsm --> fhsm */
bf = NULL;
if (!au_br_fhsm(br->br_perm) && au_br_fhsm(mod->perm)) {
err = au_fhsm_br_alloc(br);
if (unlikely(err))
goto out;
bf = br->br_fhsm;
br->br_fhsm = NULL;
}
if (au_br_writable(br->br_perm)) {
/* remove whiteout base */
err = au_br_init_wh(sb, br, mod->perm);
if (unlikely(err))
- goto out;
+ goto out_bf;
if (!au_br_writable(mod->perm)) {
/* rw --> ro, file might be mmapped */
......@@ -1344,12 +1366,25 @@ int au_br_mod(struct super_block *sb, struct au_opt_mod *mod, int remount,
}
}
if (unlikely(err))
- goto out;
+ goto out_bf;
if (au_br_fhsm(br->br_perm)) {
if (!au_br_fhsm(mod->perm)) {
/* fhsm --> non-fhsm */
au_br_fhsm_fin(br->br_fhsm);
au_kfree_rcu(br->br_fhsm);
br->br_fhsm = NULL;
}
} else if (au_br_fhsm(mod->perm))
/* non-fhsm --> fhsm */
br->br_fhsm = bf;
*do_refresh |= need_sigen_inc(br->br_perm, mod->perm);
br->br_perm = mod->perm;
goto out; /* success */
out_bf:
au_kfree_try_rcu(bf);
out:
AuTraceErr(err);
return err;
......
......@@ -43,6 +43,16 @@ struct au_xino {
struct kref xi_kref;
};
/* File-based Hierarchical Storage Management */
struct au_br_fhsm {
#ifdef CONFIG_AUFS_FHSM
struct mutex bf_lock;
unsigned long bf_jiffy;
struct aufs_stfs bf_stfs;
int bf_readable;
#endif
};
/* members for writable branch only */
enum {AuBrWh_BASE, AuBrWh_PLINK, AuBrWh_ORPH, AuBrWh_Last};
struct au_wbr {
......@@ -93,6 +103,7 @@ struct au_branch {
au_lcnt_t br_count; /* in-use for other */
struct au_wbr *br_wbr;
struct au_br_fhsm *br_fhsm;
#ifdef CONFIG_AUFS_HFSNOTIFY
struct au_br_hfsnotify *br_hfsn;
......@@ -318,5 +329,24 @@ static inline int au_sbr_whable(struct super_block *sb, aufs_bindex_t bindex)
#define WbrWhMustAnyLock(wbr) AuRwMustAnyLock(&(wbr)->wbr_wh_rwsem)
#define WbrWhMustWriteLock(wbr) AuRwMustWriteLock(&(wbr)->wbr_wh_rwsem)
/* ---------------------------------------------------------------------- */
#ifdef CONFIG_AUFS_FHSM
static inline void au_br_fhsm_init(struct au_br_fhsm *brfhsm)
{
mutex_init(&brfhsm->bf_lock);
brfhsm->bf_jiffy = 0;
brfhsm->bf_readable = 0;
}
static inline void au_br_fhsm_fin(struct au_br_fhsm *brfhsm)
{
mutex_destroy(&brfhsm->bf_lock);
}
#else
AuStubVoid(au_br_fhsm_init, struct au_br_fhsm *brfhsm)
AuStubVoid(au_br_fhsm_fin, struct au_br_fhsm *brfhsm)
#endif
#endif /* __KERNEL__ */
#endif /* __AUFS_BRANCH_H__ */
......@@ -728,6 +728,7 @@ int cpup_entry(struct au_cp_generic *cpg, struct dentry *dst_parent,
if (cpg->len == -1)
force = !!i_size_read(h_inode);
}
au_fhsm_wrote(sb, cpg->bdst, force);
}
if (do_dt)
......
......@@ -202,7 +202,7 @@ out:
}
static void au_write_post(struct inode *inode, struct file *h_file,
- struct au_write_pre *wpre)
+ struct au_write_pre *wpre, ssize_t written)
{
struct inode *h_inode;
......@@ -211,6 +211,10 @@ static void au_write_post(struct inode *inode, struct file *h_file,
h_inode = file_inode(h_file);
inode->i_mode = h_inode->i_mode;
ii_write_unlock(inode);
/* AuDbg("blks %llu, %llu\n", (u64)blks, (u64)h_inode->i_blocks); */
if (written > 0)
au_fhsm_wrote(inode->i_sb, wpre->btop,
/*force*/h_inode->i_blocks > wpre->blks);
fput(h_file);
}
......@@ -283,7 +287,7 @@ static ssize_t aufs_write(struct file *file, const char __user *ubuf,
goto out;
err = vfsub_write_u(h_file, buf, count, ppos);
- au_write_post(inode, h_file, &wpre);
+ au_write_post(inode, h_file, &wpre, err);
out:
si_read_unlock(inode->i_sb);
......@@ -371,7 +375,7 @@ static ssize_t aufs_write_iter(struct kiocb *kio, struct iov_iter *iov_iter)
goto out;
err = au_do_iter(h_file, MAY_WRITE, kio, iov_iter);
- au_write_post(inode, h_file, &wpre);
+ au_write_post(inode, h_file, &wpre, err);
out:
si_read_unlock(inode->i_sb);
......@@ -426,7 +430,7 @@ aufs_splice_write(struct pipe_inode_info *pipe, struct file *file, loff_t *ppos,
goto out;
err = vfsub_splice_from(pipe, h_file, ppos, len, flags);
- au_write_post(inode, h_file, &wpre);
+ au_write_post(inode, h_file, &wpre, err);
out:
si_read_unlock(inode->i_sb);
......@@ -454,7 +458,7 @@ static long aufs_fallocate(struct file *file, int mode, loff_t offset,
lockdep_off();
err = vfs_fallocate(h_file, mode, offset, len);
lockdep_on();
- au_write_post(inode, h_file, &wpre);
+ au_write_post(inode, h_file, &wpre, /*written*/1);
out:
si_read_unlock(inode->i_sb);
......@@ -513,7 +517,8 @@ static ssize_t aufs_copy_file_range(struct file *src, loff_t src_pos,
a_src.h_file = au_read_pre(src, /*keep_fi*/1, AuLsc_FI_2);
err = PTR_ERR(a_src.h_file);
if (IS_ERR(a_src.h_file)) {
- au_write_post(a_dst.inode, a_dst.h_file, &wpre);
+ au_write_post(a_dst.inode, a_dst.h_file, &wpre,
+ /*written*/0);
goto out_si;
}
}
......@@ -531,7 +536,7 @@ static ssize_t aufs_copy_file_range(struct file *src, loff_t src_pos,
dst_pos, len, flags);
out_file:
- au_write_post(a_dst.inode, a_dst.h_file, &wpre);
+ au_write_post(a_dst.inode, a_dst.h_file, &wpre, err);
fi_read_unlock(src);
au_read_post(a_src.inode, a_src.h_file);
out_si:
......@@ -679,7 +684,7 @@ static int aufs_fsync_nondir(struct file *file, loff_t start, loff_t end,
goto out_unlock;
err = vfsub_fsync(h_file, &h_file->f_path, datasync);
- au_write_post(inode, h_file, &wpre);
+ au_write_post(inode, h_file, &wpre, /*written*/0);
out_unlock:
si_read_unlock(inode->i_sb);
......
// SPDX-License-Identifier: GPL-2.0
/*
* Copyright (C) 2011-2019 Junjiro R. Okajima
*/
/*
* File-based Hierarchical Storage Management
*/
#include <linux/anon_inodes.h>
#include <linux/poll.h>
#include <linux/seq_file.h>
#include <linux/statfs.h>
#include "aufs.h"
static aufs_bindex_t au_fhsm_bottom(struct super_block *sb)
{
struct au_sbinfo *sbinfo;
struct au_fhsm *fhsm;
SiMustAnyLock(sb);
sbinfo = au_sbi(sb);
fhsm = &sbinfo->si_fhsm;
AuDebugOn(!fhsm);
return fhsm->fhsm_bottom;
}
void au_fhsm_set_bottom(struct super_block *sb, aufs_bindex_t bindex)
{
struct au_sbinfo *sbinfo;
struct au_fhsm *fhsm;
SiMustWriteLock(sb);
sbinfo = au_sbi(sb);
fhsm = &sbinfo->si_fhsm;
AuDebugOn(!fhsm);
fhsm->fhsm_bottom = bindex;
}
/* ---------------------------------------------------------------------- */
static int au_fhsm_test_jiffy(struct au_sbinfo *sbinfo, struct au_branch *br)
{
struct au_br_fhsm *bf;
bf = br->br_fhsm;
MtxMustLock(&bf->bf_lock);
return !bf->bf_readable
|| time_after(jiffies,
bf->bf_jiffy + sbinfo->si_fhsm.fhsm_expire);
}
/* ---------------------------------------------------------------------- */
static void au_fhsm_notify(struct super_block *sb, int val)
{
struct au_sbinfo *sbinfo;
struct au_fhsm *fhsm;
SiMustAnyLock(sb);
sbinfo = au_sbi(sb);
fhsm = &sbinfo->si_fhsm;
if (au_fhsm_pid(fhsm)
&& atomic_read(&fhsm->fhsm_readable) != -1) {
atomic_set(&fhsm->fhsm_readable, val);
if (val)
wake_up(&fhsm->fhsm_wqh);
}
}
static int au_fhsm_stfs(struct super_block *sb, aufs_bindex_t bindex,
struct aufs_stfs *rstfs, int do_lock, int do_notify)
{
int err;
struct au_branch *br;
struct au_br_fhsm *bf;
br = au_sbr(sb, bindex);
AuDebugOn(au_br_rdonly(br));
bf = br->br_fhsm;
AuDebugOn(!bf);
if (do_lock)
mutex_lock(&bf->bf_lock);
else
MtxMustLock(&bf->bf_lock);
/* sb->s_root for NFS is unreliable */
err = au_br_stfs(br, &bf->bf_stfs);
if (unlikely(err)) {
AuErr1("FHSM failed (%d), b%d, ignored.\n", err, bindex);
goto out;
}
bf->bf_jiffy = jiffies;
bf->bf_readable = 1;
if (do_notify)
au_fhsm_notify(sb, /*val*/1);
if (rstfs)
*rstfs = bf->bf_stfs;
out:
if (do_lock)
mutex_unlock(&bf->bf_lock);
au_fhsm_notify(sb, /*val*/1);
return err;
}
void au_fhsm_wrote(struct super_block *sb, aufs_bindex_t bindex, int force)
{
int err;
struct au_sbinfo *sbinfo;
struct au_fhsm *fhsm;
struct au_branch *br;
struct au_br_fhsm *bf;
AuDbg("b%d, force %d\n", bindex, force);
SiMustAnyLock(sb);
sbinfo = au_sbi(sb);
fhsm = &sbinfo->si_fhsm;
if (!au_ftest_si(sbinfo, FHSM)
|| fhsm->fhsm_bottom == bindex)
return;
br = au_sbr(sb, bindex);
bf = br->br_fhsm;
AuDebugOn(!bf);
mutex_lock(&bf->bf_lock);
if (force
|| au_fhsm_pid(fhsm)
|| au_fhsm_test_jiffy(sbinfo, br))
err = au_fhsm_stfs(sb, bindex, /*rstfs*/NULL, /*do_lock*/0,
/*do_notify*/1);
mutex_unlock(&bf->bf_lock);
}
void au_fhsm_wrote_all(struct super_block *sb, int force)
{
aufs_bindex_t bindex, bbot;
struct au_branch *br;
/* exclude the bottom */
bbot = au_fhsm_bottom(sb);
for (bindex = 0; bindex < bbot; bindex++) {
br = au_sbr(sb, bindex);
if (au_br_fhsm(br->br_perm))
au_fhsm_wrote(sb, bindex, force);
}
}
/* ---------------------------------------------------------------------- */
static __poll_t au_fhsm_poll(struct file *file, struct poll_table_struct *wait)
{
__poll_t mask;
struct au_sbinfo *sbinfo;
struct au_fhsm *fhsm;
mask = 0;
sbinfo = file->private_data;
fhsm = &sbinfo->si_fhsm;
poll_wait(file, &fhsm->fhsm_wqh, wait);
if (atomic_read(&fhsm->fhsm_readable))
mask = EPOLLIN /* | EPOLLRDNORM */;
if (!mask)
AuDbg("mask 0x%x\n", mask);
return mask;
}
static int au_fhsm_do_read_one(struct aufs_stbr __user *stbr,
struct aufs_stfs *stfs, __s16 brid)
{
int err;
err = copy_to_user(&stbr->stfs, stfs, sizeof(*stfs));
if (!err)
err = __put_user(brid, &stbr->brid);
if (unlikely(err))
err = -EFAULT;
return err;
}
static ssize_t au_fhsm_do_read(struct super_block *sb,
struct aufs_stbr __user *stbr, size_t count)
{
ssize_t err;
int nstbr;
aufs_bindex_t bindex, bbot;
struct au_branch *br;
struct au_br_fhsm *bf;
/* except the bottom branch */
err = 0;
nstbr = 0;
bbot = au_fhsm_bottom(sb);
for (bindex = 0; !err && bindex < bbot; bindex++) {
br = au_sbr(sb, bindex);
if (!au_br_fhsm(br->br_perm))
continue;
bf = br->br_fhsm;
mutex_lock(&bf->bf_lock);
if (bf->bf_readable) {
err = -EFAULT;
if (count >= sizeof(*stbr))
err = au_fhsm_do_read_one(stbr++, &bf->bf_stfs,
br->br_id);
if (!err) {
bf->bf_readable = 0;
count -= sizeof(*stbr);
nstbr++;
}
}
mutex_unlock(&bf->bf_lock);
}
if (!err)
err = sizeof(*stbr) * nstbr;
return err;
}
static ssize_t au_fhsm_read(struct file *file, char __user *buf, size_t count,
loff_t *pos)
{
ssize_t err;
int readable;
aufs_bindex_t nfhsm, bindex, bbot;
struct au_sbinfo *sbinfo;
struct au_fhsm *fhsm;
struct au_branch *br;
struct super_block *sb;
err = 0;
sbinfo = file->private_data;
fhsm = &sbinfo->si_fhsm;
need_data:
spin_lock_irq(&fhsm->fhsm_wqh.lock);
if (!atomic_read(&fhsm->fhsm_readable)) {
if (vfsub_file_flags(file) & O_NONBLOCK)
err = -EAGAIN;
else
err = wait_event_interruptible_locked_irq
(fhsm->fhsm_wqh,
atomic_read(&fhsm->fhsm_readable));
}
spin_unlock_irq(&fhsm->fhsm_wqh.lock);
if (unlikely(err))
goto out;
/* sb may already be dead */
au_rw_read_lock(&sbinfo->si_rwsem);
readable = atomic_read(&fhsm->fhsm_readable);
if (readable > 0) {
sb = sbinfo->si_sb;
AuDebugOn(!sb);
/* exclude the bottom branch */
nfhsm = 0;
bbot = au_fhsm_bottom(sb);
for (bindex = 0; bindex < bbot; bindex++) {
br = au_sbr(sb, bindex);
if (au_br_fhsm(br->br_perm))
nfhsm++;
}
err = -EMSGSIZE;
if (nfhsm * sizeof(struct aufs_stbr) <= count) {
atomic_set(&fhsm->fhsm_readable, 0);
err = au_fhsm_do_read(sbinfo->si_sb, (void __user *)buf,
count);
}
}
au_rw_read_unlock(&sbinfo->si_rwsem);
if (!readable)
goto need_data;
out:
return err;
}
static int au_fhsm_release(struct inode *inode, struct file *file)
{
struct au_sbinfo *sbinfo;
struct au_fhsm *fhsm;
/* sb may already be dead */
sbinfo = file->private_data;
fhsm = &sbinfo->si_fhsm;
spin_lock(&fhsm->fhsm_spin);
fhsm->fhsm_pid = 0;
spin_unlock(&fhsm->fhsm_spin);
kobject_put(&sbinfo->si_kobj);
return 0;
}
static const struct file_operations au_fhsm_fops = {
.owner = THIS_MODULE,
.llseek = noop_llseek,
.read = au_fhsm_read,
.poll = au_fhsm_poll,
.release = au_fhsm_release
};
int au_fhsm_fd(struct super_block *sb, int oflags)
{
int err, fd;
struct au_sbinfo *sbinfo;
struct au_fhsm *fhsm;
err = -EPERM;
if (unlikely(!capable(CAP_SYS_ADMIN)))
goto out;
err = -EINVAL;
if (unlikely(oflags & ~(O_CLOEXEC | O_NONBLOCK)))
goto out;
err = 0;
sbinfo = au_sbi(sb);
fhsm = &sbinfo->si_fhsm;
spin_lock(&fhsm->fhsm_spin);
if (!fhsm->fhsm_pid)
fhsm->fhsm_pid = current->pid;
else
err = -EBUSY;
spin_unlock(&fhsm->fhsm_spin);
if (unlikely(err))
goto out;
oflags |= O_RDONLY;
/* oflags |= FMODE_NONOTIFY; */
fd = anon_inode_getfd("[aufs_fhsm]", &au_fhsm_fops, sbinfo, oflags);
err = fd;
if (unlikely(fd < 0))
goto out_pid;
/* succeed regardless 'fhsm' status */
kobject_get(&sbinfo->si_kobj);
si_noflush_read_lock(sb);
if (au_ftest_si(sbinfo, FHSM))
au_fhsm_wrote_all(sb, /*force*/0);
si_read_unlock(sb);
goto out; /* success */
out_pid:
spin_lock(&fhsm->fhsm_spin);
fhsm->fhsm_pid = 0;
spin_unlock(&fhsm->fhsm_spin);
out:
AuTraceErr(err);
return err;
}
/* ---------------------------------------------------------------------- */
int au_fhsm_br_alloc(struct au_branch *br)
{
int err;
err = 0;
br->br_fhsm = kmalloc(sizeof(*br->br_fhsm), GFP_NOFS);
if (br->br_fhsm)
au_br_fhsm_init(br->br_fhsm);
else
err = -ENOMEM;
return err;
}
/* ---------------------------------------------------------------------- */
void au_fhsm_fin(struct super_block *sb)
{
au_fhsm_notify(sb, /*val*/-1);
}
void au_fhsm_init(struct au_sbinfo *sbinfo)
{
struct au_fhsm *fhsm;
fhsm = &sbinfo->si_fhsm;
spin_lock_init(&fhsm->fhsm_spin);
init_waitqueue_head(&fhsm->fhsm_wqh);
atomic_set(&fhsm->fhsm_readable, 0);
fhsm->fhsm_expire
= msecs_to_jiffies(AUFS_FHSM_CACHE_DEF_SEC * MSEC_PER_SEC);
fhsm->fhsm_bottom = -1;
}
void au_fhsm_set(struct au_sbinfo *sbinfo, unsigned int sec)
{
sbinfo->si_fhsm.fhsm_expire
= msecs_to_jiffies(sec * MSEC_PER_SEC);
}
void au_fhsm_show(struct seq_file *seq, struct au_sbinfo *sbinfo)
{
unsigned int u;
if (!au_ftest_si(sbinfo, FHSM))
return;
u = jiffies_to_msecs(sbinfo->si_fhsm.fhsm_expire) / MSEC_PER_SEC;
if (u != AUFS_FHSM_CACHE_DEF_SEC)
seq_printf(seq, ",fhsm_sec=%u", u);
}
......@@ -96,7 +96,7 @@ out:
static int au_cmoo(struct dentry *dentry)
{
- int err, cmoo;
+ int err, cmoo, matched;
unsigned int udba;
struct path h_path;
struct au_pin pin;
......@@ -111,6 +111,8 @@ static int au_cmoo(struct dentry *dentry)
struct inode *delegated;
struct super_block *sb;
struct au_sbinfo *sbinfo;
struct au_fhsm *fhsm;
pid_t pid;
struct au_branch *br;
struct dentry *parent;
struct au_hinode *hdir;
......@@ -127,6 +129,16 @@ static int au_cmoo(struct dentry *dentry)
sb = dentry->d_sb;
sbinfo = au_sbi(sb);
fhsm = &sbinfo->si_fhsm;
pid = au_fhsm_pid(fhsm);
rcu_read_lock();
matched = (pid
&& (current->pid == pid
|| rcu_dereference(current->real_parent)->pid == pid));
rcu_read_unlock();
if (matched)
goto out;
br = au_sbr(sb, cpg.bsrc);
cmoo = au_br_cmoo(br->br_perm);
if (!cmoo)
......
......@@ -48,6 +48,7 @@ static int epilog(struct inode *dir, aufs_bindex_t bindex,
IMustLock(dir);
au_dir_ts(dir, bindex);
inode_inc_iversion(dir);
au_fhsm_wrote(sb, bindex, /*force*/0);
return 0; /* success */
}
......@@ -776,6 +777,7 @@ int aufs_link(struct dentry *src_dentry, struct inode *dir,
/* some filesystem calls d_drop() */
d_drop(dentry);
/* some filesystems consume an inode even hardlink */
au_fhsm_wrote(sb, a->bdst, /*force*/0);
goto out_unpin; /* success */
out_revert:
......
......@@ -437,6 +437,8 @@ static int do_rename(struct au_ren_args *a)
/* remove whtmp */
if (a->thargs)
au_ren_del_whtmp(a); /* ignore this error */
au_fhsm_wrote(a->src_dentry->d_sb, a->btgt, /*force*/0);
}
err = 0;
goto out_success;
......
......@@ -110,6 +110,7 @@ out:
long aufs_ioctl_dir(struct file *file, unsigned int cmd, unsigned long arg)
{
long err;
struct dentry *dentry;
switch (cmd) {
case AUFS_CTL_RDU:
......@@ -129,6 +130,14 @@ long aufs_ioctl_dir(struct file *file, unsigned int cmd, unsigned long arg)
err = au_brinfo_ioctl(file, arg);
break;
case AUFS_CTL_FHSM_FD:
dentry = file->f_path.dentry;
if (IS_ROOT(dentry))
err = au_fhsm_fd(dentry->d_sb, arg);
else
err = -ENOTTY;
break;
default:
/* do not call the lower */
AuDbg("0x%x\n", cmd);
......
......@@ -61,7 +61,12 @@ static int find_lower_writable(struct au_mvd_args *a)
bindex = a->mvd_bsrc;
bbot = au_sbbot(sb);
if (a->mvdown.flags & AUFS_MVDOWN_FHSM_LOWER)
- ; /* re-commit later */
+ for (bindex++; bindex <= bbot; bindex++) {
+ br = au_sbr(sb, bindex);
+ if (au_br_fhsm(br->br_perm)
+ && !sb_rdonly(au_br_sb(br)))
+ return bindex;
+ }
else if (!(a->mvdown.flags & AUFS_MVDOWN_ROLOWER))
for (bindex++; bindex <= bbot; bindex++) {
br = au_sbr(sb, bindex);
......
......@@ -31,6 +31,7 @@ enum {
Opt_diropq_a, Opt_diropq_w,
Opt_warn_perm, Opt_nowarn_perm,
Opt_wbr_copyup, Opt_wbr_create,
Opt_fhsm_sec,
Opt_verbose, Opt_noverbose,
Opt_sum, Opt_nosum, Opt_wsum,
Opt_dirperm1, Opt_nodirperm1,
......@@ -97,6 +98,12 @@ static match_table_t options = {
{Opt_ignore_silent, "nodirren"},
#endif
#ifdef CONFIG_AUFS_FHSM
{Opt_fhsm_sec, "fhsm_sec=%d"},
#else
{Opt_ignore, "fhsm_sec=%d"},
#endif
{Opt_diropq_a, "diropq=always"},
{Opt_diropq_a, "diropq=a"},
{Opt_diropq_w, "diropq=whiteouted"},
......@@ -205,6 +212,9 @@ static match_table_t brattr = {
/* general */
{AuBrAttr_COO_REG, AUFS_BRATTR_COO_REG},
{AuBrAttr_COO_ALL, AUFS_BRATTR_COO_ALL},
#ifdef CONFIG_AUFS_FHSM
{AuBrAttr_FHSM, AUFS_BRATTR_FHSM},
#endif
#ifdef CONFIG_AUFS_XATTR
{AuBrAttr_ICEX, AUFS_BRATTR_ICEX},
{AuBrAttr_ICEX_SEC, AUFS_BRATTR_ICEX_SEC},
......@@ -682,6 +692,9 @@ static void dump_opts(struct au_opts *opts)
AuDbg("copyup %d, %s\n", opt->wbr_copyup,
au_optstr_wbr_copyup(opt->wbr_copyup));
break;
case Opt_fhsm_sec:
AuDbg("fhsm_sec %u\n", opt->fhsm_second);
break;
case Opt_dirren:
AuLabel(dirren);
break;
......@@ -1174,6 +1187,20 @@ int au_opts_parse(struct super_block *sb, char *str, struct au_opts *opts)
pr_err("wrong value, %s\n", opt_str);
break;
case Opt_fhsm_sec:
if (unlikely(match_int(&a->args[0], &n)
|| n < 0)) {
pr_err("bad integer in %s\n", opt_str);
break;
}
if (sysaufs_brs) {
opt->fhsm_second = n;
opt->type = token;
} else
pr_warn("ignored %s\n", opt_str);
err = 0;
break;
case Opt_ignore:
pr_warn("ignored %s\n", opt_str);
/*FALLTHROUGH*/
......@@ -1292,6 +1319,10 @@ static int au_opt_simple(struct super_block *sb, struct au_opt *opt,
au_fset_opts(opts->flags, REFRESH_DYAOP);
break;
case Opt_fhsm_sec:
au_fhsm_set(sbinfo, opt->fhsm_second);
break;
case Opt_diropq_a:
au_opt_set(sbinfo->si_mntflags, ALWAYS_DIROPQ);
break;
......@@ -1511,7 +1542,7 @@ static int au_opt_xino(struct super_block *sb, struct au_opt *opt,
int au_opts_verify(struct super_block *sb, unsigned long sb_flags,
unsigned int pending)
{
- int err;
+ int err, fhsm;
aufs_bindex_t bindex, bbot;
unsigned char do_plink, skip, do_free, can_no_dreval;
struct au_branch *br;
......@@ -1542,6 +1573,7 @@ int au_opts_verify(struct super_block *sb, unsigned long sb_flags,
" by the permission bits on the lower branch\n");
err = 0;
fhsm = 0;
root = sb->s_root;
dir = d_inode(root);
do_plink = !!au_opt_test(sbinfo->si_mntflags, PLINK);
......@@ -1604,6 +1636,11 @@ int au_opts_verify(struct super_block *sb, unsigned long sb_flags,
spin_unlock(&dentry->d_lock);
}
if (au_br_fhsm(br->br_perm)) {
fhsm++;
AuDebugOn(!br->br_fhsm);
}
if (skip)
continue;
......@@ -1627,6 +1664,20 @@ int au_opts_verify(struct super_block *sb, unsigned long sb_flags,
else
au_fclr_si(sbinfo, NO_DREVAL);
if (fhsm >= 2) {
au_fset_si(sbinfo, FHSM);
for (bindex = bbot; bindex >= 0; bindex--) {
br = au_sbr(sb, bindex);
if (au_br_fhsm(br->br_perm)) {
au_fhsm_set_bottom(sb, bindex);
break;
}
}
} else {
au_fclr_si(sbinfo, FHSM);
au_fhsm_set_bottom(sb, -1);
}
return err;
}
......
......@@ -159,7 +159,7 @@ struct au_opt {
int udba;
struct au_opt_wbr_create wbr_create;
int wbr_copyup;
- /* add more later */
+ unsigned int fhsm_second;
};
};
......
......@@ -82,6 +82,8 @@ int au_si_alloc(struct super_block *sb)
sbinfo->si_wbr_copyup_ops = au_wbr_copyup_ops + sbinfo->si_wbr_copyup;
sbinfo->si_wbr_create_ops = au_wbr_create_ops + sbinfo->si_wbr_create;
au_fhsm_init(sbinfo);
sbinfo->si_mntflags = au_opts_plink(AuOpt_Def);
sbinfo->si_xino_jiffy = jiffies;
......
......@@ -281,6 +281,8 @@ static int aufs_show_options(struct seq_file *m, struct dentry *dentry)
AuUInt(RDBLK, rdblk, sbinfo->si_rdblk);
AuUInt(RDHASH, rdhash, sbinfo->si_rdhash);
au_fhsm_show(m, sbinfo);
AuBool(DIRREN, dirren);
AuBool(SUM, sum);
/* AuBool(SUM_W, wsum); */
......@@ -822,6 +824,7 @@ static int aufs_remount_fs(struct super_block *sb, int *flags, char *data)
au_dy_arefresh(do_dx);
}
au_fhsm_wrote_all(sb, /*force*/1); /* ?? */
aufs_write_unlock(root);
out_mtx:
......@@ -1004,6 +1007,7 @@ static void aufs_kill_sb(struct super_block *sb)
if (sbinfo) {
au_sbilist_del(sb);
aufs_write_lock(sb->s_root);
au_fhsm_fin(sb);
if (sbinfo->si_wbr_create_ops->fin)
sbinfo->si_wbr_create_ops->fin(sb);
if (au_opt_test(sbinfo->si_mntflags, UDBA_HNOTIFY)) {
......
......@@ -53,6 +53,21 @@ static inline int au_plink_hash(ino_t ino)
return ino % AuPlink_NHASH;
}
/* File-based Hierarchical Storage Management */
struct au_fhsm {
#ifdef CONFIG_AUFS_FHSM
/* allow only one process who can receive the notification */
spinlock_t fhsm_spin;
pid_t fhsm_pid;
wait_queue_head_t fhsm_wqh;
atomic_t fhsm_readable;
/* these are protected by si_rwsem */
unsigned long fhsm_expire;
aufs_bindex_t fhsm_bottom;
#endif
};
struct au_branch;
struct au_sbinfo {
/* nowait tasks in the system-wide workqueue */
......@@ -95,6 +110,9 @@ struct au_sbinfo {
/* most free space */
struct au_wbr_mfs si_wbr_mfs;
/* File-based Hierarchical Storage Management */
struct au_fhsm si_fhsm;
/* mount flags */
/* include/asm-ia64/siginfo.h defines a macro named si_flags */
unsigned int si_mntflags;
......@@ -181,9 +199,14 @@ struct au_sbinfo {
* if it is false, refreshing dirs at access time is unnecessary
*/
#define AuSi_FAILED_REFRESH_DIR 1
- /* add later */
+ #define AuSi_FHSM (1 << 1) /* fhsm is active now */
#define AuSi_NO_DREVAL (1 << 2) /* disable all d_revalidate */
#ifndef CONFIG_AUFS_FHSM
#undef AuSi_FHSM
#define AuSi_FHSM 0
#endif
static inline unsigned char au_do_ftest_si(struct au_sbinfo *sbi,
unsigned int flag)
{
......@@ -263,6 +286,43 @@ int au_wbr_do_copyup_bu(struct dentry *dentry, aufs_bindex_t btop);
/* mvdown.c */
int au_mvdown(struct dentry *dentry, struct aufs_mvdown __user *arg);
#ifdef CONFIG_AUFS_FHSM
/* fhsm.c */
static inline pid_t au_fhsm_pid(struct au_fhsm *fhsm)
{
pid_t pid;
spin_lock(&fhsm->fhsm_spin);
pid = fhsm->fhsm_pid;
spin_unlock(&fhsm->fhsm_spin);
return pid;
}
void au_fhsm_wrote(struct super_block *sb, aufs_bindex_t bindex, int force);
void au_fhsm_wrote_all(struct super_block *sb, int force);
int au_fhsm_fd(struct super_block *sb, int oflags);
int au_fhsm_br_alloc(struct au_branch *br);
void au_fhsm_set_bottom(struct super_block *sb, aufs_bindex_t bindex);
void au_fhsm_fin(struct super_block *sb);
void au_fhsm_init(struct au_sbinfo *sbinfo);
void au_fhsm_set(struct au_sbinfo *sbinfo, unsigned int sec);
void au_fhsm_show(struct seq_file *seq, struct au_sbinfo *sbinfo);
#else
AuStubVoid(au_fhsm_wrote, struct super_block *sb, aufs_bindex_t bindex,
int force)
AuStubVoid(au_fhsm_wrote_all, struct super_block *sb, int force)
AuStub(int, au_fhsm_fd, return -EOPNOTSUPP, struct super_block *sb, int oflags)
AuStub(pid_t, au_fhsm_pid, return 0, struct au_fhsm *fhsm)
AuStubInt0(au_fhsm_br_alloc, struct au_branch *br)
AuStubVoid(au_fhsm_set_bottom, struct super_block *sb, aufs_bindex_t bindex)
AuStubVoid(au_fhsm_fin, struct super_block *sb)
AuStubVoid(au_fhsm_init, struct au_sbinfo *sbinfo)
AuStubVoid(au_fhsm_set, struct au_sbinfo *sbinfo, unsigned int sec)
AuStubVoid(au_fhsm_show, struct seq_file *seq, struct au_sbinfo *sbinfo)
#endif
/* ---------------------------------------------------------------------- */
static inline struct au_sbinfo *au_sbi(struct super_block *sb)
......
......@@ -152,6 +152,7 @@ static int au_cpdown_dir(struct dentry *dentry, aufs_bindex_t bdst,
au_set_ibbot(inode, bdst);
au_set_h_iptr(inode, bdst, au_igrab(h_inode),
au_hi_flags(inode, /*isdir*/1));
au_fhsm_wrote(dentry->d_sb, bdst, /*force*/0);
goto out; /* success */
/* revert */
......
......@@ -581,6 +581,8 @@ static void reinit_br_wh(void *arg)
wbr_wh_write_unlock(wbr);
au_hn_inode_unlock(hdir);
di_read_unlock(a->sb->s_root, AuLock_IR);
if (!err)
au_fhsm_wrote(a->sb, bindex, /*force*/0);
out:
if (wbr)
......@@ -668,6 +670,8 @@ static int link_or_create_wh(struct super_block *sb, aufs_bindex_t bindex,
/* return this error in this context */
err = vfsub_create(h_dir, &h_path, WH_MASK, /*want_excl*/true);
if (!err)
au_fhsm_wrote(sb, bindex, /*force*/0);
out:
wbr_wh_read_unlock(wbr);
......@@ -792,9 +796,10 @@ struct dentry *au_wh_create(struct dentry *dentry, aufs_bindex_t bindex,
wh_dentry = au_wh_lkup(h_parent, &dentry->d_name, au_sbr(sb, bindex));
if (!IS_ERR(wh_dentry) && d_is_negative(wh_dentry)) {
err = link_or_create_wh(sb, bindex, wh_dentry);
- if (!err)
+ if (!err) {
au_set_dbwh(dentry, bindex);
- else {
+ au_fhsm_wrote(sb, bindex, /*force*/0);
+ } else {
dput(wh_dentry);
wh_dentry = ERR_PTR(err);
}
......