SEEK_HOLE and SEEK_DATA: Efficiently Archiving/Copying Large Sparse Files
Published: 2019-05-11

How do you efficiently archive a very large sparse file, say 1TB? The sparse file may contain only a small amount of data, say 32MB.

SEEK_HOLE and SEEK_DATA

The SEEK_HOLE/SEEK_DATA functionality does the trick: it lets `tar` and `cp` handle large sparse files very efficiently.

`lseek` with `SEEK_HOLE` moves the file offset to the next hole at or after the supplied offset and returns that position. `lseek` with `SEEK_DATA` moves the file offset to the start of the next non-hole (data) region at or after the supplied offset and returns that position.
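To make the two calls concrete, here is a minimal C sketch, assuming Linux and glibc. The helper names `region_at` and `print_layout` are hypothetical, made up for this illustration. `region_at` relies on the rule that `SEEK_DATA` returns the supplied offset unchanged when that offset already points to data; otherwise the offset sits inside a hole that ends where the next data begins (or at end of file).

```c
#define _GNU_SOURCE            /* glibc only exposes SEEK_DATA/SEEK_HOLE with this */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SEEK_DATA              /* Linux values, in case the headers hide them */
#define SEEK_DATA 3
#endif
#ifndef SEEK_HOLE
#define SEEK_HOLE 4
#endif

/* Describe the region that contains byte `off`.
 * On success returns 0 and sets *is_data (1 = data, 0 = hole) and *end
 * (first offset past the region). Returns -1 on error; errno == ENXIO
 * means `off` is at or past end of file. */
static int region_at(int fd, off_t off, int *is_data, off_t *end)
{
    struct stat st;
    if (fstat(fd, &st) == -1)
        return -1;
    if (off >= st.st_size) {
        errno = ENXIO;
        return -1;
    }
    off_t d = lseek(fd, off, SEEK_DATA);   /* next data at or after off */
    if (d == -1 && errno != ENXIO)
        return -1;
    if (d == off) {                        /* off already points to data */
        off_t h = lseek(fd, off, SEEK_HOLE);
        if (h == -1)
            return -1;
        *is_data = 1;
        *end = h;                          /* data runs up to the next hole */
    } else {                               /* off is inside a hole */
        *is_data = 0;
        *end = (d == -1) ? st.st_size : d; /* ENXIO: the hole runs to EOF */
    }
    return 0;
}

/* Print the data/hole layout of `path`, one region per line. */
static int print_layout(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;
    off_t off = 0, end = 0;
    int is_data = 0;
    while (region_at(fd, off, &is_data, &end) == 0) {
        printf("%s [%lld, %lld)\n", is_data ? "data" : "hole",
               (long long)off, (long long)end);
        off = end;
    }
    int done = (errno == ENXIO);           /* ENXIO here just means EOF */
    close(fd);
    return done ? 0 : -1;
}
```

For example, `print_layout("pmem-1")` would print one `data` or `hole` line per region. On a file system that does not track holes, the kernel reports the whole file as a single data region.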

More on SEEK_HOLE:

SEEK_HOLE is supported since Linux 3.1. From the lseek(2) man page:

Seeking file data and holes
    Since version 3.1, Linux supports the following additional values for whence:

    SEEK_DATA
        Adjust the file offset to the next location in the file greater than or equal to offset containing data. If offset points to data, then the file offset is set to offset.

    SEEK_HOLE
        Adjust the file offset to the next hole in the file greater than or equal to offset. If offset points into the middle of a hole, then the file offset is set to offset. If there is no hole past offset, then the file offset is adjusted to the end of the file (i.e., there is an implicit hole at the end of any file).
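The implicit hole at the end of every file gives a cheap way to ask whether a file has any real hole: `SEEK_HOLE` from offset 0 stops before the end of the file only if there is one. Below is a minimal sketch, assuming Linux; `has_real_hole` is a hypothetical name. On a file system without hole support, the kernel reports only the implicit end-of-file hole, so this returns 0 even for a file that is logically sparse.

```c
#define _GNU_SOURCE            /* glibc only exposes SEEK_HOLE with this */
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SEEK_HOLE              /* Linux value, in case the headers hide it */
#define SEEK_HOLE 4
#endif

/* Return 1 if `path` has a real hole, 0 if its only hole is the implicit
 * one at end of file, or -1 on error. */
static int has_real_hole(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;
    struct stat st;
    int rc = -1;
    if (fstat(fd, &st) == 0) {
        if (st.st_size == 0) {
            rc = 0;            /* empty file: SEEK_HOLE would fail with ENXIO */
        } else {
            off_t h = lseek(fd, 0, SEEK_HOLE);   /* first hole at or after 0 */
            if (h != -1)
                rc = (h < st.st_size);           /* stopped before EOF: real hole */
        }
    }
    close(fd);
    return rc;
}
```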

If you are curious about the corresponding changes inside the Linux kernel, check .

Tools/library support

I dug a little into the libarchive source tree and found that this support was probably added in this commit:

commit d216d028a78e56a37bab9e42a2f17f28714a6535
Author: Michihiro NAKAJIMA
Date:   Tue Feb 2 06:09:17 2010 -0500

    Determine sparse through API such as lseek(HOLE).

    SVN-Revision: 1856

After that, there were bug fixes. For example:

$ git show b76da87985101f7acdcc0d84490bb4f6a736d210
commit b76da87985101f7acdcc0d84490bb4f6a736d210
Author: Michihiro NAKAJIMA
Date:   Sat Feb 25 18:38:13 2012 +0900

    Fix a wrong check on a result of lseek.

diff --git a/libarchive/archive_read_disk_entry_from_file.c b/libarchive/archive_read_disk_entry_from_file.c
index 8fcd0ab..0fef3c7 100644
--- a/libarchive/archive_read_disk_entry_from_file.c
+++ b/libarchive/archive_read_disk_entry_from_file.c
@@ -1033,7 +1033,7 @@ setup_sparse(struct archive_read_disk *a,
                         goto exit_setup_sparse;
                 }
                 off_e = lseek(*fd, off_s, SEEK_HOLE);
-                if (off_s == (off_t)-1) {
+                if (off_e == (off_t)-1) {
                         if (errno == ENXIO) {
                                 off_e = lseek(*fd, 0, SEEK_END);
                                 if (off_e != (off_t)-1)
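The fixed check is easier to see in a whole loop. The following is a minimal C sketch in the spirit of that code, not libarchive's actual `setup_sparse()`; `data_extents` and `struct extent` are hypothetical names. It alternates `SEEK_DATA` and `SEEK_HOLE`, tests the value it just got back (`off_e`; the buggy version tested `off_s`), and mirrors the patch's `SEEK_END` fallback when `SEEK_HOLE` fails with `ENXIO`.

```c
#define _GNU_SOURCE            /* glibc only exposes SEEK_DATA/SEEK_HOLE with this */
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SEEK_DATA              /* Linux values, in case the headers hide them */
#define SEEK_DATA 3
#endif
#ifndef SEEK_HOLE
#define SEEK_HOLE 4
#endif

struct extent {
    off_t start;               /* first data byte */
    off_t len;                 /* bytes of data before the next hole */
};

/* Store up to `max` data extents of `path` in `out`.
 * Returns how many were stored, or -1 on error. */
static int data_extents(const char *path, struct extent *out, int max)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;
    int n = 0;
    off_t off_s = 0;
    while (n < max) {
        off_s = lseek(fd, off_s, SEEK_DATA);
        if (off_s == (off_t)-1) {
            if (errno != ENXIO)
                n = -1;                /* a real error */
            break;                     /* ENXIO: only a hole remains */
        }
        off_t off_e = lseek(fd, off_s, SEEK_HOLE);
        if (off_e == (off_t)-1) {      /* check off_e, not off_s */
            if (errno == ENXIO)
                off_e = lseek(fd, 0, SEEK_END);
            if (off_e == (off_t)-1) {
                n = -1;
                break;
            }
        }
        out[n].start = off_s;
        out[n].len = off_e - off_s;
        n++;
        off_s = off_e;                 /* continue searching after this extent */
    }
    close(fd);
    return n;
}
```

An `ENXIO` from `SEEK_DATA` is not treated as an error here: it means the rest of the file, possibly all of it, is a hole.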

My guess is that it works starting from v3.0.4, the first release tagged after that bug fix:

$ git show v3.0.4 | head -n4
tag v3.0.4
Tagger: Andres Mejia
Date:   Wed Mar 28 09:53:16 2012 -0400

$ git show v3.0.3 | head -n4
commit e235511e964cf8b13bf49a1e343bfdc5c11014da
Author: Tim Kientzle
Date:   Fri Jan 13 00:32:07 2012 -0500

Tests on the 3.6.5 and 2.6.32 kernels

On Fedora 17 with 3.6.5 kernel:

[zma@office t]$ uname -a
Linux office 3.6.5-1.fc17.x86_64 #1 SMP Wed Oct 31 19:37:18 UTC 201
[zma@office tmp]$ ls -lh pmem-1
-rw-rw-r-- 1 zma zma 1.0T Nov  7 20:14 pmem-1
[zma@office tmp]$ time tar cSf pmem-1.tar pmem-1
real    0m0.003s
user    0m0.003s
sys     0m0.000s
[zma@office tmp]$ time cp pmem-1 pmem-1-copy
real    0m0.020s
user    0m0.000s
sys     0m0.003s
[zma@office tmp]$ ls -lh pmem*
-rw-rw-r-- 1 zma zma 1.0T Nov  7 20:14 pmem-1
-rw-rw-r-- 1 zma zma 1.0T Nov  7 20:15 pmem-1-copy
-rw-rw-r-- 1 zma zma  10K Nov  7 20:15 pmem-1.tar
[zma@office tmp]$ mkdir t
[zma@office tmp]$ cd t
[zma@office t]$ time tar xSf ../pmem-1.tar
real    0m0.003s
user    0m0.000s
sys     0m0.002s
[zma@office t]$ ls -lha
total 8.0K
drwxrwxr-x   2 zma  zma  4.0K Nov  7 20:16 .
drwxrwxrwt. 35 root root 4.0K Nov  7 20:16 ..
-rw-rw-r--   1 zma  zma  1.0T Nov  7 20:14 pmem-1

For comparison, on Fedora 12 with a 2.6.32 kernel:

$ du -hs sparse-1
0   sparse-1
$ ls -lha sparse-1
-rw-rw-r-- 1 user1 user1 1.0T 2012-11-03 11:17 sparse-1
$ time tar cSf sparse-1.tar sparse-1
real    96m19.847s
user    22m3.314s
sys     52m32.272s
$ time gzip sparse-1
real    200m18.714s
user    164m33.835s
sys     10m39.971s
$ ls -lha sparse-1*
-rw-rw-r-- 1 user1 user1 1018M 2012-11-03 11:17 sparse-1.gz
-rw-rw-r-- 1 user1 user1   10K 2012-11-06 23:13 sparse-1.tar
$ time rsync --sparse sparse-1 sparse-1-copy
real    124m46.321s
user    107m15.084s
sys     83m8.323s
$ du -hs sparse-1-copy
4.0K    sparse-1-copy
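Timings like these are what you would expect from a tool that has to read every byte to find the zeros, while the 3.6.5 runs show how fast it gets once the tools can ask the kernel where the data is. Below is a minimal sketch of a copy that does ask, assuming Linux: it sizes the destination with `ftruncate`, which leaves it as one big hole, then copies only the data regions. `sparse_copy` and `copy_data_regions` are hypothetical names; GNU `cp` and `tar` use their own, more involved logic.

```c
#define _GNU_SOURCE            /* glibc only exposes SEEK_DATA/SEEK_HOLE with this */
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SEEK_DATA              /* Linux values, in case the headers hide them */
#define SEEK_DATA 3
#endif
#ifndef SEEK_HOLE
#define SEEK_HOLE 4
#endif

/* Copy only the data regions of `in` to the same offsets in `out`. */
static int copy_data_regions(int in, int out)
{
    char buf[1 << 16];
    off_t off = 0;
    for (;;) {
        off_t data = lseek(in, off, SEEK_DATA);
        if (data == -1)
            return errno == ENXIO ? 0 : -1;   /* ENXIO: only a hole is left */
        off_t hole = lseek(in, data, SEEK_HOLE);
        if (hole == -1)
            return -1;
        for (off = data; off < hole;) {       /* copy bytes [data, hole) */
            off_t left = hole - off;
            size_t want = left < (off_t)sizeof buf ? (size_t)left : sizeof buf;
            ssize_t r = pread(in, buf, want, off);
            if (r <= 0) {
                if (r == 0)
                    errno = EIO;              /* source shrank while copying */
                return -1;
            }
            ssize_t w = pwrite(out, buf, (size_t)r, off);
            if (w != r) {
                if (w >= 0)
                    errno = EIO;              /* treat a short write as failure */
                return -1;
            }
            off += r;
        }
    }
}

/* Copy `src` to `dst`, leaving holes as holes. Returns 0, or -1 with errno set. */
static int sparse_copy(const char *src, const char *dst)
{
    int in = open(src, O_RDONLY);
    if (in == -1)
        return -1;
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out == -1) {
        int open_err = errno;
        close(in);
        errno = open_err;
        return -1;
    }
    struct stat st;
    int rc = -1;
    /* Setting the size first makes dst one big hole; pwrite fills in the data. */
    if (fstat(in, &st) == 0 && ftruncate(out, st.st_size) == 0)
        rc = copy_data_regions(in, out);
    int saved = errno;
    close(in);
    close(out);                               /* close errors ignored in this sketch */
    errno = saved;
    return rc;
}
```

With the files above, `sparse_copy("pmem-1", "pmem-1-copy")` would write only the data blocks, so the copy stays sparse and its cost depends on the amount of data rather than the apparent size.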

Comments and discussions

Well, I can run some virtual machines.

Here is another crazy thing I have noticed:

dd if=/dev/zero of=xen-guest.img bs=1 count=0 seek=10G
dd if=/dev/zero of=xen-guest.img bs=1 count=1 seek=10G

Notice the difference in the “count” parameter. They will both produce an empty sparse file.
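In C terms, the two commands differ roughly as follows. This is a sketch assuming GNU `dd`, which truncates the output file at the seek offset unless `conv=notrunc` is given; `make_hole_only` and `make_hole_then_byte` are hypothetical names, and they mirror the effect on a fresh file, not `dd`'s internals. With `count=0` nothing is written and the file is only extended; with `count=1`, one zero byte is really written at the seek offset.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Like `dd if=/dev/zero of=PATH bs=1 count=0 seek=SIZE` on a new file:
 * nothing is written, the file is only extended, so it is one big hole. */
static int make_hole_only(const char *path, off_t size)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return -1;
    int rc = ftruncate(fd, size);
    close(fd);
    return rc;
}

/* Like `dd if=/dev/zero of=PATH bs=1 count=1 seek=SIZE` on a new file:
 * one zero byte is written at offset SIZE, so the file is SIZE + 1 bytes
 * long and its last block holds real data. */
static int make_hole_then_byte(const char *path, off_t size)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return -1;
    ssize_t w = pwrite(fd, "", 1, size);   /* "" supplies a single NUL byte */
    close(fd);
    return w == 1 ? 0 : -1;
}
```

The original commands correspond to a size of `(off_t)10 << 30`, since GNU `dd` reads the `G` suffix as 1024^3.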

However, only the one with “count=1” will be fast with bsdtar. The other one will take ages.

By the way, I also noticed that bsdtar 3.0.3 does not support this at all. You need at least 3.0.4 for it to work.

I find it very strange that it’s so hard to find information about this online.

With clouds popping up everywhere, you would think that people would care about copying 200GB sparse files in minutes instead of hours.

This observation is interesting. There are some differences:

$ dd if=/dev/zero of=xen-guest.img bs=1 count=0 seek=10G
0+0 records in
0+0 records out
0 bytes (0 B) copied, 1.2637e-05 s, 0.0 kB/s
$ dd if=/dev/zero of=xen-guest2.img bs=1 count=1 seek=10G
1+0 records in
1+0 records out
1 byte (1 B) copied, 4.2406e-05 s, 23.6 kB/s

With count=0, zero bytes are copied; with count=1, one byte is copied. So xen-guest2.img is 1 byte larger than xen-guest.img, and xen-guest.img contains no data at all.
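One way to see why the two files might be treated differently is to ask where their first data byte is. Below is a minimal sketch, assuming Linux; `first_data_offset` is a hypothetical name. On a file system that tracks holes, `SEEK_DATA` from offset 0 fails with `ENXIO` for xen-guest.img, because the file has no data at all, while for xen-guest2.img it lands on the block holding the single written byte. A tool that mistook that first `ENXIO` for "not supported" would end up reading the whole file; this is only a hypothesis, not something checked against bsdtar's code.

```c
#define _GNU_SOURCE            /* glibc only exposes SEEK_DATA with this */
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SEEK_DATA              /* Linux value, in case the headers hide it */
#define SEEK_DATA 3
#endif

/* Offset of the first data byte in `path`; -1 if the file has no data at
 * all (SEEK_DATA fails with ENXIO), -2 on any other error. */
static off_t first_data_offset(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -2;
    off_t d = lseek(fd, 0, SEEK_DATA);
    int err = errno;                       /* save before close() can change it */
    close(fd);
    if (d == -1)
        return err == ENXIO ? -1 : -2;
    return d;
}
```

On a file system without hole support, both files would report offset 0, since the kernel then treats the whole file as data.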

My suspicion is that either bsdtar was not designed to handle a file that contains no data at all, or there is a bug there.

Translated from:

Reposted from: http://qplwd.baihongyu.com/
