bf bf film bf rajwap.biz tamil pussy fingering desi pron video coxhamster.mobi sexy girl video open xvideoimage pornfactory.info bihari sexy video download سكس كلاسيك مترجم pornclassic.info سكس عراقي hindixxvidio porno-zona.com dise murga indian old women nude rajwap.biz srxyvidio www sexy videoes com indiansexmms.me sxsi bf مصريات شراميط pornvuku.net نيك احلي كس malligi hotel donfreeporn.mobi bengalixxxvideo tsmil sex youjizz.sex perfect girl xxx hd any porn porntubemovs.info xxx thumbzilla نيك طيز مترجم 3gpjizz.info سكس فى السجن xossip bengali kamporn.mobi indian sex play forced indian sex video indiansexgate.mobi sleeping girl xnxx classic video com 2beeg.me indiangirlssexvideos

引子

在分布式系统中,保证客户端的缓存一致性非常重要。

假设有两个客户端Client A和Client B。集群的文件系统中有这么一个目录:/a/b/c/d/。

Client A创建文件/a/b/c/d/a.txt成功,同时缓存了这个目录。

Client B移动目录’d’到/a/下面。目录变成/a/b/c/和/a/d/。

Client A尝试创建文件/a/b/c/d/b.txt。

为了性能考虑,Client A会之间从缓存中读取目录信息而不是每次都通过网络询问。

Client A从缓存中得知,目录/a/b/c/d/存在,于是获取’d’的Inode并发送创建文件请求(在目录’d’下创建b.txt)。服务器响应请求并在/a/d/下创建b.txt。

很明显,这种行为是错误的,因为目录/a/b/c/d/已经不再存在。

经过测试,Ceph很好地解决了这一问题。

代码分析

首先看open函数:

int Client::open(const char *relpath, int flags, const UserPerm &perms,
                 mode_t mode, int stripe_unit, int stripe_count,
                 int object_size, const char *data_pool)
{
  ldout(cct, 3) << "open enter(" << relpath << ", " << ceph_flags_sys2wire(flags) << "," << mode << ")" << dendl;
  std::lock_guard lock(client_lock);
  tout(cct) << "open" << std::endl;
  tout(cct) << relpath << std::endl;
  tout(cct) << ceph_flags_sys2wire(flags) << std::endl;

  if (unmounting)
    return -ENOTCONN;

  Fh *fh = NULL;//file handle for any open file state

#if defined(__linux__) && defined(O_PATH)
  /* When the O_PATH is being specified, others flags than O_DIRECTORY
   * and O_NOFOLLOW are ignored. Please refer do_entry_open() function
   * in kernel (fs/open.c). */
  if (flags & O_PATH)
    flags &= O_DIRECTORY | O_NOFOLLOW | O_PATH;
#endif

  filepath path(relpath);
  InodeRef in;
  bool created = false;
  /* O_CREATE with O_EXCL enforces O_NOFOLLOW. */
  bool followsym = !((flags & O_NOFOLLOW) || ((flags & O_CREAT) && (flags & O_EXCL)));///能不能是符号链接
  int r = path_walk(path, &in, perms, followsym, ceph_caps_for_mode(mode));

  if (r == 0 && (flags & O_CREAT) && (flags & O_EXCL))
    return -EEXIST;

#if defined(__linux__) && defined(O_PATH)
  if (r == 0 && in->is_symlink() && (flags & O_NOFOLLOW) && !(flags & O_PATH))
#else
  if (r == 0 && in->is_symlink() && (flags & O_NOFOLLOW))
#endif
    return -ELOOP;

  if (r == -ENOENT && (flags & O_CREAT))
  {
    filepath dirpath = path;
    string dname = dirpath.last_dentry();
    dirpath.pop_dentry();
    InodeRef dir;
    r = path_walk(dirpath, &dir, perms, true,
                  cct->_conf->client_permissions ? CEPH_CAP_AUTH_SHARED : 0);
    if (r < 0)
      goto out;
    if (cct->_conf->client_permissions)
    {
      r = may_create(dir.get(), perms);
      if (r < 0)
        goto out;
    }
    r = _create(dir.get(), dname.c_str(), flags, mode, &in, &fh, stripe_unit,
                stripe_count, object_size, data_pool, &created, perms);
  }
  if (r < 0)
    goto out;

  if (!created)
  {
    // posix says we can only check permissions of existing files
    if (cct->_conf->client_permissions)
    {
      r = may_open(in.get(), flags, perms);
      if (r < 0)
        goto out;
    }
  }

  if (!fh)
    r = _open(in.get(), flags, mode, &fh, perms);
  if (r >= 0)
  {
    // allocate a integer file descriptor
    ceph_assert(fh);
    r = get_fd();
    ceph_assert(fd_map.count(r) == 0);
    fd_map[r] = fh;
  }

out:
  tout(cct) << r << std::endl;
  ldout(cct, 3) << "open exit(" << path << ", " << ceph_flags_sys2wire(flags) << ") = " << r << dendl;
  return r;
}

其中path_walk函数用于寻找路径,返回最终的InodeRef。

int Client::path_walk(const filepath &origpath, InodeRef *end,
                      const UserPerm &perms, bool followsym, int mask)
{
  filepath path = origpath;
  InodeRef cur;
  if (origpath.absolute())
    cur = root;
  else
    cur = cwd;
  ceph_assert(cur);

  ldout(cct, 10) << __func__ << " " << path << dendl;

  int symlinks = 0;

  unsigned i = 0;
  while (i < path.depth() && cur)//执行parse_bits()返回bits.size()
  {
    int caps = 0;
    const string &dname = path[i];
    ldout(cct, 10) << " " << i << " " << *cur << " " << dname << dendl;
    ldout(cct, 20) << "  (path is " << path << ")" << dendl;
    InodeRef next;
    if (cct->_conf->client_permissions)
    {
      int r = may_lookup(cur.get(), perms);
      if (r < 0)
        return r;
      caps = CEPH_CAP_AUTH_SHARED;
    }

    /* Get extra requested caps on the last component */
    if (i == (path.depth() - 1))
      caps |= mask;
    int r = _lookup(cur.get(), dname, caps, &next, perms);
    if (r < 0)
      return r;
    // only follow trailing symlink if followsym.  always follow
    // 'directory' symlinks.
    if (next && next->is_symlink())
    {
      symlinks++;
      ldout(cct, 20) << " symlink count " << symlinks << ", value is '" << next->symlink << "'" << dendl;
      if (symlinks > MAXSYMLINKS)
      {
        return -ELOOP;
      }

      if (i < path.depth() - 1)
      {
        // dir symlink
        // replace consumed components of path with symlink dir target
        filepath resolved(next->symlink.c_str());
        resolved.append(path.postfixpath(i + 1));
        path = resolved;
        i = 0;
        if (next->symlink[0] == '/')
        {
          cur = root;
        }
        continue;
      }
      else if (followsym)
      {
        if (next->symlink[0] == '/')
        {
          path = next->symlink.c_str();
          i = 0;
          // reset position
          cur = root;
        }
        else
        {
          filepath more(next->symlink.c_str());
          // we need to remove the symlink component from off of the path
          // before adding the target that the symlink points to.  remain
          // at the same position in the path.
          path.pop_dentry();
          path.append(more);
        }
        continue;
      }
    }
    cur.swap(next);
    i++;
  }
  if (!cur)
    return -ENOENT;
  if (end)
    end->swap(cur);
  return 0;
}

其中_lookup函数用于寻找下一级目录。

int Client::_lookup(Inode *dir, const string &dname, int mask, InodeRef *target,
                    const UserPerm &perms)
{
  int r = 0;
  Dentry *dn = NULL;

  if (dname == "..")
  {
    if (dir->dentries.empty())//xlist<Dentry *> dentries; // if i'm linked to a dentry.缓存不存在
    {
      MetaRequest *req = new MetaRequest(CEPH_MDS_OP_LOOKUPPARENT);
      filepath path(dir->ino);
      req->set_filepath(path);

      InodeRef tmptarget;
      int r = make_request(req, perms, &tmptarget, NULL, rand() % mdsmap->get_num_in_mds());

      if (r == 0)
      {
        Inode *tempino = tmptarget.get();
        _ll_get(tempino);
        *target = tempino;
        ldout(cct, 8) << __func__ << " found target " << (*target)->ino << dendl;
      }
      else
      {
        *target = dir;
      }
    }
    else//缓存存在
      *target = dir->get_first_parent()->dir->parent_inode; //dirs can't be hard-linked
    goto done;
  }

  if (dname == ".")
  {
    *target = dir;
    goto done;
  }

  if (!dir->is_dir())
  {
    r = -ENOTDIR;
    goto done;
  }

  if (dname.length() > NAME_MAX)
  {
    r = -ENAMETOOLONG;
    goto done;
  }

  if (dname == cct->_conf->client_snapdir &&
      dir->snapid == CEPH_NOSNAP)
  {
    *target = open_snapdir(dir);
    goto done;
  }

  if (dir->dir &&//Inode -> Dir //if i am a dir
      dir->dir->dentries.count(dname))//存在名为dname的子目录dentry unordered_map<string, Dentry*>
  {
    dn = dir->dir->dentries[dname];

    ldout(cct, 20) << __func__ << " have dn " << dname << " mds." << dn->lease_mds << " ttl " << dn->lease_ttl
                   << " seq " << dn->lease_seq
                   << dendl;

    if (!dn->inode || dn->inode->caps_issued_mask(mask, true))//dentry未link(子目录不存在),或子目录存在并有满足mask的cap
    {
      // is dn lease valid?
      utime_t now = ceph_clock_now();
      if (dn->lease_mds >= 0 &&
          dn->lease_ttl > now &&
          mds_sessions.count(dn->lease_mds))//dentry未撤销,未过期
      {
        MetaSession &s = mds_sessions.at(dn->lease_mds);
        if (s.cap_ttl > now &&
            s.cap_gen == dn->lease_gen)//MetaSession的cap未过期,且此dentry的lease来自这个MetaSession
        {
          // touch this mds's dir cap too, even though we don't _explicitly_ use it here, to
          // make trim_caps() behave.
          dir->try_touch_cap(dn->lease_mds);
          goto hit_dn;
        }
        ldout(cct, 20) << " bad lease, cap_ttl " << s.cap_ttl << ", cap_gen " << s.cap_gen
                       << " vs lease_gen " << dn->lease_gen << dendl;.
      }
      // dir shared caps?
      if (dir->caps_issued_mask(CEPH_CAP_FILE_SHARED, true))
      {
        if (dn->cap_shared_gen == dir->shared_gen &&
            (!dn->inode || dn->inode->caps_issued_mask(mask, true)))
          goto hit_dn;
        if (!dn->inode && (dir->flags & I_COMPLETE))
        {
          ldout(cct, 10) << __func__ << " concluded ENOENT locally for "
                         << *dir << " dn '" << dname << "'" << dendl;
          return -ENOENT;
        }
      }
    }
    else
    {
      ldout(cct, 20) << " no cap on " << dn->inode->vino() << dendl;
    }
  }
  else
  {
    // can we conclude ENOENT locally?
    if (dir->caps_issued_mask(CEPH_CAP_FILE_SHARED, true) &&
        (dir->flags & I_COMPLETE))
    {
      ldout(cct, 10) << __func__ << " concluded ENOENT locally for " << *dir << " dn '" << dname << "'" << dendl;
      return -ENOENT;
    }
  }

  r = _do_lookup(dir, dname, mask, target, perms);//向MDS查询
  goto done;

hit_dn:
  if (dn->inode)
  {
    *target = dn->inode;
  }
  else
  {
    r = -ENOENT;
  }
  touch_dn(dn);

done:
  if (r < 0)
    ldout(cct, 10) << __func__ << " " << *dir << " " << dname << " = " << r << dendl;
  else
    ldout(cct, 10) << __func__ << " " << *dir << " " << dname << " = " << **target << dendl;
  return r;
}

_lookup函数解析

首先,特判下一级目录为”..”和”.”的情况:

如果是”..”,查上级目录dir的dentry缓存。如果不存在,向MDS查询;如果存在,直接返回父目录。

如果是”.”,直接返回此目录。

1.排除掉一些错误情况后,看dir是否是目录且存在名为dname的dentry?

如果存在,下一步;

如果不存在,查询上级目录是否有CAP_FILE_SHARED。根据上一篇文章所述,如果上级目录拥有此CAP,说明它的目录信息是正确的,此时没查到说明本身就是没有,返回“不存在”。如果没有此CAP,向MDS查询。

2.dentry未link到Inode 或 已经link并且对应Inode有满足mask的cap?

如果是,下一步;

如果不是,向MDS查询。

3.dentry的lease未撤销,未过期且mds_sessions中有dentry的lease_mds

如果是,下一步;

如果不是,跳转到5。

4.MetaSession的cap未过期且cap_gen==lease_gen?(gen大概就是类似指纹一样的东西)

如果是,缓存命中,touch cap和mds;

如果不是,下一步。

5.上级目录是否有CAP_FILE_SHARED?

如果是,下一步;

如果不是,向MDS查询。

6.dn->cap_shared_gen == dir->shared_gen

如果是,缓存命中。和之前同理,有cap且指纹相同能证明本地缓存的正确性。

如果不是,dentry未link则返回不存在,已link则向MDS查询。

总结

回到最开始的问题,当Client B移动目录’d’到/a/的时候,MDS发送消息撤销了Client A上’d’的dentry的lease,使得lease_mds = -1而验证无法通过,同时撤销相关caps。从而保证客户端缓存一致性。

Latest posts by 青鸟千秋 (see all)

CephFS:什么是CAPS?

在Ceph中,他们建立了一套独特的系统来管理客户端对Inode的操作权限,称为capabilities,简称CAPS。与其他网络文件系统(例如NFS或SMB)的主要区别之一是授予...

阅读全文

Ceph集群快速搭建教程

本文的目标是在三台虚拟机上搭建Ceph集群,其中一台作为整个集群的服务器,另外两台只安装ceph-client作为客户端,可以挂载CephFS。 虚拟机系统:CentOS 7.6 ...

阅读全文

Win10 17074麦克风没声音bug解决方案

今天笔记本麦克风突然没声音重装声卡驱动也不好使 百度“win10麦克风没声音”无果,突发奇想:是不是微软这家伙的锅? 改成“win10 17074麦克风没声音”——竟然...

阅读全文

欢迎留言