[buildbot, proof-of-concept (*2)/testing, #2129 + new (tracepoint multilist) #3115], <feedback, comments> #3216


kernelOfTruth
Contributor

The goal of this pull request, compared to #3189 and #3190 (their tests are working, so I don't want to change them too much),

is to start from an up-to-date base (zfs master from March 20th, the latest)

with clean, logically split commits, so that any problems in this combination/porting are easily and clearly visible right from the start and no long-standing issues or regressions are introduced by it

Overview/contents:

  1. zfs master

  2. ABD: linear/scatter dual typed buffer for ARC, ABD: https://github.com/kernelOfTruth/zfs/tree/tuxoko_zfs/abd_17.02.2015 (current state)

  3. Illumos - 5497 lock contention, Illumos - 5408 cache devices requiring lots of RAM, Illumos 5369 - arc flags enum and some additional minor changes: https://github.com/kernelOfTruth/zfs/tree/dweeezil_zfs/lock-contention-on-arcs_mtx_23.03.2015 (current state, just updated)

I'm pushing this early to get the buildbots running and to see early on whether the fixes and changes work

edit:

personal porting notes in HTML, exported from tomboy:

http://pastebin.com/A1sMCGXG

The difference in this approach from #3189 and #3190 was that

Revert "Allow arc_evict_ghost() to only evict meta data"
dweeezil@730d690

was omitted

and that "Fix arc_adjust_meta() behavior"
bc88866

thus the reverts of both were practically done in commit "5497 lock contention on arcs_mtx" kernelOfTruth@2130841

Previously I had assumed that there would be at least some changes that would survive in the new code - but I was wrong - as pointed out in #3115 (comment) .

Obviously: longer observation beforehand == less work later, less clutter

This made porting more difficult, but still feasible.

The pull-requests are organized on a per-commit basis, which should give a better overview than a single huge cumulative "blob"

Not sure how "mergeable" the code is - the individual commits should be logical, working units by themselves, as pointed out by @behlendorf

Each commit message has some details on the merge conflicts (due to #2119 and/or #3115 ) added

There's still the warning from #3115 , dweeezil@7168285

multilist.c:31:1: warning: ‘multilist_d2l’ defined but not used [-Wunused-function]
 multilist_d2l(multilist_t *ml, void *obj)

The buildbot issues with undefined PAGE_SHIFT have been dealt with in kernelOfTruth@e19172f . I will have to test it in the other proof-of-concept (*1) to see if that gets the green light for the other failing buildbots

Not sure why some commits here are still "queued for automatic testing" even though a long time has passed

edit:

quick link to the tree: https://github.com/kernelOfTruth/zfs/commits/zfs_master_20.03.2015_2129+3115_WIP_clean

Starting from linux-2.6.37, {kmap,kunmap}_atomic takes 1 argument instead of 2.

Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
zfsonlinux currently uses vmalloc-backed slab for ARC buffers. There are some
major problems with this approach. One is that 32-bit systems have only a
small amount of vmalloc space. Another is that fragmentation in the slab will
easily trigger OOM on a busy system.

With ABD, we use a scatterlist to allocate data buffers. With this approach we
can allocate in HIGHMEM, which alleviates vmalloc space pressure on 32-bit.
Also, we don't have to rely on slab, so there's no fragmentation issue.

But for metadata buffers, we still use linear buffers from slab. The reason for
this is that there are a lot of *_phys pointers that point directly to metadata
buffers, so it's impractical to change all that code.

Currently, ABD is not enabled and its API will treat them as normal buffers.
We will enable it once all relevant code is modified to use the API.

Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
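
To make the linear/scatter idea in the commit message above more concrete, here is a minimal userland sketch of a dual-typed buffer. It is not the ABD implementation: every `sk_*` name, the 4096-byte chunk size and the callback signature are assumptions made for illustration, and plain `calloc` stands in for slab and page allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define	SK_CHUNK	4096	/* hypothetical chunk size, stands in for a page */

/* Hypothetical dual-typed buffer: one linear region or a list of chunks. */
typedef struct sk_abd {
	int	is_linear;
	size_t	size;
	size_t	nchunks;	/* always 1 for a linear buffer */
	char	**chunks;	/* chunks[0] is the whole buffer when linear */
} sk_abd_t;

/* Callback shape for sk_abd_iterate(); return non-zero to stop early. */
typedef int (*sk_iter_fn)(const void *buf, size_t len, void *priv);

static void *
sk_zalloc(size_t n)
{
	void *p = calloc(1, n);
	if (p == NULL)
		abort();	/* sketch only: no graceful error path */
	return (p);
}

sk_abd_t *
sk_abd_alloc(size_t size, int linear)
{
	sk_abd_t *abd = sk_zalloc(sizeof (*abd));

	assert(size > 0);
	abd->is_linear = linear;
	abd->size = size;
	abd->nchunks = linear ? 1 : (size + SK_CHUNK - 1) / SK_CHUNK;
	abd->chunks = sk_zalloc(abd->nchunks * sizeof (char *));
	for (size_t i = 0; i < abd->nchunks; i++)
		abd->chunks[i] = sk_zalloc(linear ? size : SK_CHUNK);
	return (abd);
}

void
sk_abd_free(sk_abd_t *abd)
{
	for (size_t i = 0; i < abd->nchunks; i++)
		free(abd->chunks[i]);
	free(abd->chunks);
	free(abd);
}

/* Valid bytes in segment i: full chunks except possibly the last one. */
static size_t
sk_seg_len(const sk_abd_t *abd, size_t i)
{
	size_t off = i * SK_CHUNK;

	if (abd->is_linear)
		return (abd->size);
	return (abd->size - off < SK_CHUNK ? abd->size - off : SK_CHUNK);
}

/* Copy a plain buffer into the abd, whatever its layout is. */
void
sk_abd_copy_from_buf(sk_abd_t *abd, const void *src, size_t len)
{
	const char *p = src;

	assert(len <= abd->size);
	for (size_t i = 0; i < abd->nchunks && len > 0; i++) {
		size_t n = sk_seg_len(abd, i);
		if (n > len)
			n = len;
		memcpy(abd->chunks[i], p, n);
		p += n;
		len -= n;
	}
}

/* Call fn once per contiguous segment, so callers never see the layout. */
int
sk_abd_iterate(const sk_abd_t *abd, sk_iter_fn fn, void *priv)
{
	for (size_t i = 0; i < abd->nchunks; i++) {
		int rc = fn(abd->chunks[i], sk_seg_len(abd, i), priv);
		if (rc != 0)
			return (rc);
	}
	return (0);
}

/* Example callback: running byte sum kept in *priv. */
int
sk_sum_cb(const void *buf, size_t len, void *priv)
{
	unsigned long *sum = priv;
	const unsigned char *p = buf;

	for (size_t i = 0; i < len; i++)
		*sum += p[i];
	return (0);
}
```

The point of the iterate-with-callback shape is that code such as checksumming can be written once and work on both buffer types, which is presumably why the later commits in this stack convert callers to an API instead of touching `b_data` directly.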
Modify/Add incremental fletcher function prototype to match abd_iterate_rfunc
callback type. Also, reduce duplicated code a bit in zfs_fletcher.c.

Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
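
As a rough illustration of what "incremental" means in the commit message above: a Fletcher-style checksum can carry its four running sums between calls, so it can be fed one buffer segment at a time from an iterate callback. The sketch below assumes a callback signature of `(buf, len, private)`; the actual `abd_iterate_rfunc` type is not shown in this thread, and all `sk_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical checksum state: four 64-bit running sums. */
typedef struct sk_cksum {
	uint64_t word[4];
} sk_cksum_t;

/*
 * Fold len bytes (a multiple of 4) into the running sums in *priv.
 * Words are read in native byte order. Returns 0 so that an iterator
 * would keep going.
 */
int
sk_fletcher_4_incremental(const void *buf, size_t len, void *priv)
{
	sk_cksum_t *zc = priv;
	const unsigned char *p = buf;
	uint64_t a = zc->word[0], b = zc->word[1];
	uint64_t c = zc->word[2], d = zc->word[3];

	assert(len % sizeof (uint32_t) == 0);
	for (size_t i = 0; i < len; i += sizeof (uint32_t)) {
		uint32_t w;

		memcpy(&w, p + i, sizeof (w));	/* avoids alignment issues */
		a += w;
		b += a;
		c += b;
		d += c;
	}
	zc->word[0] = a;
	zc->word[1] = b;
	zc->word[2] = c;
	zc->word[3] = d;
	return (0);
}
```

Because the state lives in the caller-supplied struct, checksumming a buffer in one call or in several consecutive calls gives the same result, as long as each piece is a multiple of four bytes.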
1. Use abd_t in arc_buf_t->b_data, dmu_buf_t->db_data, zio_t->io_data and
zio_transform_t->zt_orig_data
2. zio_* function take abd_t for data

Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
1. Add checksum function for abd_t
2. Use abd_t version checksum function in zio_checksum_table
3. Make zio_checksum_compute and zio_checksum_error handle abd_t

Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
…d zil.c

Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Use ABD API on related pointers and functions.(b_data, db_data, zio_*(), etc.)

Suggested-by: DHE <git@dehacked.net>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
@kernelOfTruth kernelOfTruth force-pushed the zfs_master_20.03.2015_2129+3115_WIP_clean branch 2 times, most recently from bb93410 to e19172f Compare March 23, 2015 21:58
pad[PAGE_SIZE] to pad[4096]

(how about archs other than x86, x86_64 ?)
@kernelOfTruth kernelOfTruth changed the title [buildbot, proof-of-concept, #2129 + new (tracepoint multilist) #3115], WIP WIP, [buildbot, proof-of-concept, #2129 + new (tracepoint multilist) #3115] Mar 23, 2015
This reverts commit 037763e.

XXX - expand this comment as to why we're reverting it.

conflicts due to openzfs#2129, in:

static void arc_buf_free_on_write
(abd_t)

static void arc_buf_l2_cdata_free
(related to arc_buf_free_on_write(l2hdr->b_tmp_cdata, hdr->b_size, abd_free); )

l2arc_release_cdata_buf
(abd_free instead of zio_data_buf_free)
This reverts commit ecf3d9b.

Conflicts:
	module/zfs/arc.c
	module/zfs/ddt.c
	module/zfs/spa_misc.c
@kernelOfTruth kernelOfTruth force-pushed the zfs_master_20.03.2015_2129+3115_WIP_clean branch from cd43e4a to 115a8c2 Compare March 23, 2015 23:40
5369 arc flags should be an enum
5370 consistent arc_buf_hdr_t naming scheme
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Alex Reece <alex.reece@delphix.com>
Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed by: Richard Elling <richard.elling@richardelling.com>
Approved by: Richard Lowe <richlowe@richlowe.net>

conflicts due to openzfs#2129, in:

arc.c

static void arc_evict_ghost :
(with additional argument "arc_buf_contents_t type" since
Revert "Allow arc_evict_ghost() to only evict meta data" is left out)

l2arc_release_cdata_buf(arc_buf_hdr_t *ab) :
zio_data_buf_free(l2hdr->b_tmp_cdata, ab->b_size);
is changed to
abd_free(l2hdr->b_tmp_cdata, hdr->b_size);

fix commit 989fd51
"Change ASSERT(!"...") to  cmn_err(CE_PANIC, ...)"
5408 managing ZFS cache devices requires lots of RAM
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Don Brady <dev.fs.zfs@gmail.com>
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
Approved by: Garrett D'Amore <garrett@damore.org>
Ported by: Tim Chase <tim@chase2k.com>

Porting notes:

Due to the restructuring of the ARC-related structures, this
patch conflicts with at least the following existing ZoL commits:

    6e1d727
    Fix inaccurate arcstat_l2_hdr_size calculations

        The ARC_SPACE_HDRS constant no longer exists and has been
        somewhat equivalently replaced by HDR_L2ONLY_SIZE.

    e0b0ca9
    Add visibility in to cached dbufs

        The new layering of l{1,2}arc_buf_hdr_t within the arc_buf_hdr
        struct requires additional structure member names to be used
        when referencing the inner items.  Also, the presence of L1 or L2
        inner member is indicated by flags using the new HDR_HAS_L{1,2}HDR
        macros.
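
The layering described in these porting notes can be pictured with a small userland sketch. The struct layout, flag values and `sk_*`/`SK_*` names below are assumptions for illustration only and are much simpler than the real `arc_buf_hdr_t`; the idea is just that inner members are reached through `b_l1hdr`/`b_l2hdr`, that flags say which inner part is valid, and that an L2-only header can be smaller because the L1 part sits at the end.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits, modelled on the HDR_HAS_L{1,2}HDR idea. */
enum {
	SK_FLAG_HAS_L1HDR = 1 << 0,
	SK_FLAG_HAS_L2HDR = 1 << 1
};

typedef struct sk_l2hdr {
	uint64_t	b_daddr;	/* address on the cache device */
} sk_l2hdr_t;

typedef struct sk_l1hdr {
	void	*b_buf;		/* in-memory data */
	int	b_state;	/* which ARC list the header is on */
} sk_l1hdr_t;

typedef struct sk_hdr {
	unsigned	b_flags;
	size_t		b_size;
	sk_l2hdr_t	b_l2hdr;
	sk_l1hdr_t	b_l1hdr;	/* last, so L2-only headers can omit it */
} sk_hdr_t;

#define	SK_HDR_HAS_L1HDR(h)	(((h)->b_flags & SK_FLAG_HAS_L1HDR) != 0)
#define	SK_HDR_HAS_L2HDR(h)	(((h)->b_flags & SK_FLAG_HAS_L2HDR) != 0)

/* Bytes needed for a header that only tracks an L2ARC copy. */
#define	SK_HDR_L2ONLY_SIZE	offsetof(sk_hdr_t, b_l1hdr)

/* Accessors check the flag before touching the inner struct. */
int
sk_hdr_state(const sk_hdr_t *hdr)
{
	assert(SK_HDR_HAS_L1HDR(hdr));
	return (hdr->b_l1hdr.b_state);
}

uint64_t
sk_hdr_daddr(const sk_hdr_t *hdr)
{
	assert(SK_HDR_HAS_L2HDR(hdr));
	return (hdr->b_l2hdr.b_daddr);
}
```

This is also why so many of the conflict notes in this commit are mechanical renames of the form `hdr->b_buf` to `hdr->b_l1hdr.b_buf`.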

conflicts due to openzfs#2129 solely and/or in combination with these changes, in:

cmd/ztest/ztest.c
@ztest_dmu_read_write_zcopy
abd_copy_from_buf(bigbuf_arcbufs[j]->b_data,
				    (caddr_t)bigbuf + (off - bigoff),
				    chunksize);

stays, only if-statement above is expanded to:
if (i != 5 || chunksize < (SPA_MINBLOCKSIZE * 2)) {

module/zfs/arc.c

@l2arc_decompress_zio
bzero(hdr->b_buf->b_data, hdr->b_size);
to
abd_zero(hdr->b_buf->b_data, hdr->b_size);

moved:
struct l2arc_buf_hdr {
to
typedef struct l2arc_buf_hdr {

keeping in mind for later:

/* temporary buffer holder for in-flight compressed data */
	abd_t			*b_tmp_cdata;

@l2arc_release_cdata_buf(arc_buf_hdr_t *hdr)
abd_free(l2hdr->b_tmp_cdata, hdr->b_size);
to
zio_data_buf_free(l2hdr->b_tmp_cdata, hdr->b_size);
to
abd_free(hdr->b_l1hdr.b_tmp_cdata,
		    hdr->b_size);

@l2arc_decompress_zio(zio_t *zio, arc_buf_hdr_t *hdr, enum zio_compress c)
bzero(hdr->b_buf->b_data, hdr->b_size);
to
bzero(hdr->b_l1hdr.b_buf->b_data, hdr->b_size)
abd_zero(hdr->b_buf->b_data, hdr->b_size);
to
abd_zero(hdr->b_l1hdr.b_buf->b_data, hdr->b_size);

line below that:

zio->io_data = zio->io_orig_data = hdr->b_buf->b_data;
to
zio->io_data = zio->io_orig_data = hdr->b_l1hdr.b_buf->b_data;

merge conflict in
@arc_cksum_equal(arc_buf_t *buf)
fletcher_2_native(buf->b_data, buf->b_hdr->b_size, &zc);
to
abd_fletcher_2_native(buf->b_data, buf->b_hdr->b_size, &zc);
(no change)

@arc_access(arc_buf_hdr_t *hdr, kmutex_t *hash_lock)
cmn_err(CE_PANIC, "invalid arc state 0x%p", hdr->b_state);
to
cmn_err(CE_PANIC, "invalid arc state 0x%p", hdr->b_l1hdr.b_state);

+

fixing >80 lines problem

in include/sys/arc_impl.h

void			*b_tmp_cdata;
to
abd_t			*b_tmp_cdata;
@kernelOfTruth kernelOfTruth changed the title WIP, [buildbot, proof-of-concept, #2129 + new (tracepoint multilist) #3115] WIP, [buildbot, testing, #2129 + new (tracepoint multilist) #3115] Mar 27, 2015
@dweeezil
Contributor

dweeezil commented Apr 6, 2015

@kernelOfTruth Please see my recent comments in #3115. I don't think you should have 2130841 (from 7168285 or wherever) in your testing stack for the time being.

That said, I do hope to get the "lock contention on arcs_mtx" patch in working order very soon.

@kernelOfTruth
Contributor Author

@dweeezil thank you very much for the heads up 👍

I should have made my explanation in the first comment clearer: I've made sure that bc88866 (Fix arc_adjust_meta() behavior) isn't included - only the way this is handled differs from #3189 & #3190

The difference in this approach from #3189 and #3190 was that

Revert "Allow arc_evict_ghost() to only evict meta data"
dweeezil/zfs@730d690

was omitted

and that "Fix arc_adjust_meta() behavior"
bc88866

thus the reverts of both were practically done in commit "5497 lock contention on arcs_mtx" kernelOfTruth/zfs@2130841

Previously I had assumed that there would be at least some changes that would survive in the new code - but I was wrong - as pointed out in #3115 (comment) .

so everything should be in order

@sempervictus
Contributor

Threw this @ SCST-operated iSCSI hosts and the crashes are upon us:

[Wed Apr 15 02:44:48 2015] INFO: Slab 0xffffea001a262600 objects=2 used=1 fp=0xffff88068989c000 flags=0x200000000004080
[Wed Apr 15 02:44:48 2015] CPU: 3 PID: 19794 Comm: rmmod Tainted: P    B      OE  3.17.6-sv-i7 #sv
[Wed Apr 15 02:44:48 2015] Hardware name: Supermicro X7DB8/X7DB8, BIOS 2.1a 12/20/2008
[Wed Apr 15 02:44:48 2015]  ffff88085399eb00 ffff8800635ebcb0 ffffffff8278e1f1 ffffea001a262600
[Wed Apr 15 02:44:48 2015]  ffff8800635ebd88 ffffffff8219e364 ffffffff00000020 ffff8800635ebd98
[Wed Apr 15 02:44:48 2015]  ffff8800635ebd48 656a624f82ec4900 616d657220737463 6e6920676e696e69
[Wed Apr 15 02:44:48 2015] Call Trace:
[Wed Apr 15 02:44:48 2015]  [<ffffffff8278e1f1>] dump_stack+0x45/0x56
[Wed Apr 15 02:44:48 2015]  [<ffffffff8219e364>] slab_err+0xb4/0xe0
[Wed Apr 15 02:44:48 2015]  [<ffffffff82787831>] ? printk+0x54/0x56
[Wed Apr 15 02:44:48 2015]  [<ffffffff821a043f>] ? __kmalloc+0x15f/0x190
[Wed Apr 15 02:44:48 2015]  [<ffffffff821a20b4>] ? __kmem_cache_shutdown+0x114/0x2d0
[Wed Apr 15 02:44:48 2015]  [<ffffffff821a20d5>] __kmem_cache_shutdown+0x135/0x2d0
[Wed Apr 15 02:44:48 2015]  [<ffffffff821657ac>] kmem_cache_destroy+0x2c/0xc0
[Wed Apr 15 02:44:48 2015]  [<ffffffffc066f64d>] spl_kmem_cache_destroy+0x11d/0x1f0 [spl]
[Wed Apr 15 02:44:48 2015]  [<ffffffffc066e942>] ? spl_kmem_free+0x32/0x50 [spl]
[Wed Apr 15 02:44:48 2015]  [<ffffffffc0dfad0a>] lz4_fini+0x1a/0x30 [zfs]
[Wed Apr 15 02:44:48 2015]  [<ffffffffc0e5fa67>] zio_fini+0x97/0xa0 [zfs]
[Wed Apr 15 02:44:48 2015]  [<ffffffffc0e15a9f>] spa_fini+0x2f/0x130 [zfs]
[Wed Apr 15 02:44:48 2015]  [<ffffffffc0e42e68>] _fini+0x78/0x110 [zfs]
[Wed Apr 15 02:44:48 2015]  [<ffffffffc0e42f2f>] spl__fini+0xf/0x30 [zfs]
[Wed Apr 15 02:44:48 2015]  [<ffffffff820dff36>] SyS_delete_module+0x146/0x1d0
[Wed Apr 15 02:44:48 2015]  [<ffffffff82013029>] ? do_notify_resume+0x59/0x80
[Wed Apr 15 02:44:48 2015]  [<ffffffff827970ed>] system_call_fastpath+0x1a/0x1f
[Wed Apr 15 02:44:48 2015] INFO: Object 0xffff880689898000 @offset=0

With a side of:

[Thu Apr 16 11:31:45 2015] Call Trace:
[Thu Apr 16 11:31:45 2015]  [<ffffffffc07fed0b>] vdev_mirror_map_alloc+0x24b/0x340 [zfs]
[Thu Apr 16 11:31:45 2015]  [<ffffffffc05f9a87>] ? spl_kmem_alloc+0xe7/0x1b0 [spl]
[Thu Apr 16 11:31:45 2015]  [<ffffffffc05f9a87>] ? spl_kmem_alloc+0xe7/0x1b0 [spl]
[Thu Apr 16 11:31:45 2015]  [<ffffffffc07ff26e>] vdev_mirror_io_start+0x1e/0x1a0 [zfs]
[Thu Apr 16 11:31:45 2015]  [<ffffffffc07ec257>] ? spa_config_enter+0xf7/0x160 [zfs]
[Thu Apr 16 11:31:45 2015]  [<ffffffffc083c867>] zio_vdev_io_start+0x237/0x300 [zfs]
[Thu Apr 16 11:31:45 2015]  [<ffffffffc083ff76>] zio_nowait+0xc6/0x1b0 [zfs]
[Thu Apr 16 11:31:45 2015]  [<ffffffffc078dc0d>] arc_read+0x63d/0xb60 [zfs]
[Thu Apr 16 11:31:45 2015]  [<ffffffffc0797733>] dbuf_prefetch+0x1e3/0x320 [zfs]
[Thu Apr 16 11:31:45 2015]  [<ffffffffc07aeaa0>] dmu_zfetch_dofetch.isra.5+0x170/0x1d0 [zfs]
[Thu Apr 16 11:31:45 2015]  [<ffffffffc07af168>] dmu_zfetch_find+0x668/0x850 [zfs]
[Thu Apr 16 11:31:45 2015]  [<ffffffff82794be2>] ? mutex_lock+0x12/0x2f
[Thu Apr 16 11:31:45 2015]  [<ffffffffc07af68d>] dmu_zfetch+0xad/0x900 [zfs]
[Thu Apr 16 11:31:45 2015]  [<ffffffffc07967ef>] dbuf_read+0x41f/0xb20 [zfs]
[Thu Apr 16 11:31:45 2015]  [<ffffffffc079f30b>] dmu_buf_hold_array_by_dnode+0x12b/0x5a0 [zfs]
[Thu Apr 16 11:31:45 2015]  [<ffffffffc079f85d>] dmu_buf_hold_array+0x5d/0x80 [zfs]
[Thu Apr 16 11:31:45 2015]  [<ffffffffc07a0e42>] dmu_read_req+0x52/0x120 [zfs]
[Thu Apr 16 11:31:45 2015]  [<ffffffffc08229dd>] ? zfs_range_lock+0x22d/0x640 [zfs]
[Thu Apr 16 11:31:45 2015]  [<ffffffffc084afb9>] zvol_read+0x79/0xc0 [zfs]
[Thu Apr 16 11:31:45 2015]  [<ffffffffc05fd1b0>] taskq_thread+0x1b0/0x360 [spl]
[Thu Apr 16 11:31:45 2015]  [<ffffffff8208fad0>] ? wake_up_process+0x40/0x40
[Thu Apr 16 11:31:45 2015]  [<ffffffffc05fd000>] ? taskq_cancel_id+0x120/0x120 [spl]
[Thu Apr 16 11:31:45 2015]  [<ffffffff82087822>] kthread+0xd2/0xf0
[Thu Apr 16 11:31:45 2015]  [<ffffffff82087750>] ? kthread_create_on_node+0x180/0x180
[Thu Apr 16 11:31:45 2015]  [<ffffffff8279703c>] ret_from_fork+0x7c/0xb0
[Thu Apr 16 11:31:45 2015]  [<ffffffff82087750>] ? kthread_create_on_node+0x180/0x180
[Thu Apr 16 11:31:45 2015] Code: 00 00 66 66 66 66 90 55 be 08 00 00 00 48 89 e5 53 48 89 fb 48 8d 7d f0 48 83 ec 08 e8 01 f4 cb c1 48 8b 45 f0 31 d2 48 83 c4 08 <48> f7 f3 5b 48 89 d0 5d c3 66 66 66 66 66 2e 0f 1f 84 00 00 00 
[Thu Apr 16 11:31:45 2015] RIP  [<ffffffffc07ed0e9>] spa_get_random+0x29/0x40 [zfs]
[Thu Apr 16 11:31:45 2015]  RSP <ffff8806e7533810>

Context: 8 disk Z2 VDEV with two 250GB SSDs partitioned @ 80/20 with the 20's being a SLOG mirror and the 80's being an L2ARC span. On the bright side: previously the system showed IOWait @ 50-70% after a few days of load due to the L2ARC evictions; now it works just fine until the crash.

kernelOfTruth and others added 2 commits May 1, 2015 01:20
…3115_WIP_clean

integrated changes from the (old, archived) openzfs#2129 pull-request

into "Fix misuse of input argument in traverse_visitbp"

ecfb0b5
@kernelOfTruth
Contributor Author

Merging the "zfs_master_28.04.2015" branch - so this pull request is now up to the state of the 0.6.4 (or unofficial 0.6.4.1) tag, with stability and performance enhancements,

adding one additional commit from #3349 which should address a hardlock ( superseded by #3367 - only the commit message got updated ).

Those who are using this pull request for heavy testing, throughput, production, etc.:

please wait until the buildbots give the green light & then give it a try.

During the next days/weeks I might integrate the currently outstanding commits from upstream.

The latest ABD patch stack integrates support for metadata in addition to data, and the changes from #3115 add somewhat similar changes, which makes things more complex for me - so I'm somewhat stuck at the moment on the new pull request ( #3360 ) and am therefore focusing more on this one.

Thanks

… devices

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Approved by: Dan McDonald <danmcd@omniti.com>

additional changes & notes for ZFSOnLinux:

changes related to

@struct l2arc_dev {

are in include/sys/arc_impl.h

changes for module/zfs/arc.c

removed (dupe) superfluous comment from

@l2arc_evict(l2arc_dev_t *dev, uint64_t distance, boolean_t all)

 			/*
 			 * Tell ARC this no longer exists in L2ARC.
 			 */

@arc_hdr_destroy(arc_buf_hdr_t *hdr)

		arc_space_return(HDR_L2ONLY_SIZE, ARC_SPACE_L2HDRS);

arc_space_return calls also exist in dweeezil@ab37b29
(openzfs#3115 pull-request)

thus neither uncommenting nor removing it, since that might hamper functionality or data integrity

comments ?
@kernelOfTruth
Contributor Author

hm,

arc.c:1911:2: error: implicit declaration of function ‘IMPLY’ [-Werror=implicit-function-declaration]
  IMPLY(l2hdr->b_daddr == L2ARC_ADDR_UNSET, HDR_L2_WRITING(hdr));

https://github.com/illumos/illumos-gate/blob/20afa66e72e7c210ef1f9053d4bc8f5b60d1eeed/usr/src/uts/common/sys/debug.h#L73
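
For context: as far as I can tell from the linked illumos `debug.h`, `IMPLY(A, B)` is a debug-only assertion of the form "if A then B", i.e. it fails only when A is true and B is false. A minimal userland stand-in could look like the sketch below; `SK_IMPLY`, `sk_implies` and the example helper are hypothetical names, and plain `assert` replaces the kernel's assertion machinery, so this is not a drop-in port.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in: "A implies B" is the same as "(not A) or B". */
#define	SK_IMPLY(A, B)	assert(!(A) || (B))

/* The same logic as a function, handy for checking the truth table. */
bool
sk_implies(bool a, bool b)
{
	return (!a || b);
}

#define	SK_L2ARC_ADDR_UNSET	UINT64_MAX	/* hypothetical sentinel value */

/*
 * Mirrors the shape of the failing line: a header whose device address
 * is still unset must be one that is currently being written.
 */
void
sk_check_l2hdr(uint64_t daddr, bool l2_writing)
{
	SK_IMPLY(daddr == SK_L2ARC_ADDR_UNSET, l2_writing);
}
```

Since the check has no side effects, leaving it out changes no behaviour; it only drops a sanity check that would otherwise fire in debug builds.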

@sempervictus
Contributor

I saw the same call to IMPLY when going over the patch on their side, nice catch finding it. Should this be ported as well, or redacted in the patch?

EDIT - by the way, in order to test this with recent changes in master, you might want to merge tuxoko's abd_next branch which apparently works with large blocks now (i just built with it instead of 2129 or this PR).

@kernelOfTruth
Contributor Author

@sempervictus I'd say that's up to @behlendorf and the other devs in charge of ZFSOnLinux to decide if this needs to be introduced for debug purposes

this support goes way back to 2010 (partly even 2009 and before) and illumos/illumos-gate@56f3320 - which would mean quite a few changes

The current question is whether it's safe to comment out that line - according to the comments in the code, it appears so,

but better safe than sorry

@kernelOfTruth
Contributor Author

it compiles - but:

                 from /var/tmp/portage/sys-fs/zfs-kmod-9999-r1/work/zfs-kmod-9999/module/zfs/../../module/zfs/zvol.c:38:
/var/tmp/portage/sys-fs/zfs-kmod-9999-r1/work/zfs-kmod-9999/module/zfs/../../module/zfs/zvol.c: In function ‘zvol_request’:
include/linux/blkdev.h:619:26: warning: switch condition has boolean value [-Wswitch-bool]
 #define rq_data_dir(rq)  (((rq)->cmd_flags & 1) != 0)
                          ^
/var/tmp/portage/sys-fs/zfs-kmod-9999-r1/work/zfs-kmod-9999/module/zfs/../../module/zfs/zvol.c:767:11: note: in expansion of macro ‘rq_data_dir’
   switch (rq_data_dir(req)) {
           ^

not really sure how to handle that right now - since it implies that it would need a change in the current code ...

http://stackoverflow.com/questions/26411482/switch-condition-has-boolean-value

it appears to be more of a benign warning and related to coding style rather than something serious
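
A small userland sketch of why the warning shows up and two ways it is usually avoided. The macro below copies the shape of `rq_data_dir` from the warning text; everything else (`sk_*` names, the string return values) is made up for illustration and is not the zvol code. The GCC documentation for `-Wswitch-bool` mentions casting the controlling expression to a non-boolean type as a way to suppress it; an `if`/`else` avoids the question entirely.

```c
#include <assert.h>

#define	SK_READ		0
#define	SK_WRITE	1

/* Same shape as the macro in the warning: the result is a boolean test. */
#define	sk_rq_data_dir(flags)	(((flags) & 1) != 0)

/* Variant 1: branch on the boolean directly. */
const char *
sk_dir_name_if(unsigned flags)
{
	if (sk_rq_data_dir(flags))
		return ("write");
	return ("read");
}

/* Variant 2: keep the switch, but make the controlling expression an int. */
const char *
sk_dir_name_switch(unsigned flags)
{
	switch ((int)sk_rq_data_dir(flags)) {
	case SK_WRITE:
		return ("write");
	case SK_READ:
	default:
		return ("read");
	}
}

/* Both variants must agree for either direction bit. */
void
sk_selfcheck(void)
{
	assert(sk_dir_name_if(0) == sk_dir_name_switch(0));
	assert(sk_dir_name_if(1) == sk_dir_name_switch(1));
}
```

Either way the generated behaviour is the same, which fits the reading that the warning is about style rather than a functional bug.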

edit:

Pushing the commit which comments out the line discussed above

Let's see how it fares with xfstest and zfsstress

@arc_hdr_l2hdr_destroy(arc_buf_hdr_t *hdr)

line 1912

//	IMPLY(l2hdr->b_daddr == L2ARC_ADDR_UNSET, HDR_L2_WRITING(hdr));
@kernelOfTruth
Contributor Author

@sempervictus thanks for the heads-up - I will use his abd-next probably within the next few days 👍

As written in #3360, I'm rather stuck with the two metadata-handling ways of ABD & the changes from #3115 - so I'll leave that to the experts for now (also due to personal time constraints)

@kernelOfTruth kernelOfTruth force-pushed the zfs_master_20.03.2015_2129+3115_WIP_clean branch from 4a1ec31 to d7e1fd0 Compare May 19, 2015 01:38
@kernelOfTruth
Contributor Author

Now contains the fix for "l2arc space accounting mismatch"

Use write_psize instead of write_asize when doing vdev_space_update.
Without this change the accounting of L2ARC usage would be wrong and
give 16EB free space because the number became negative and overflows.
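
The "16EB" symptom is what unsigned 64-bit arithmetic produces when more bytes are subtracted from a counter than were ever added to it. The sketch below is a generic illustration with made-up sizes and hypothetical `sk_*` helpers; it does not reproduce the actual `vdev_space_update` call or say which of the two sizes was larger in the real code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical space counter helpers. */
uint64_t
sk_space_add(uint64_t alloc, uint64_t bytes)
{
	return (alloc + bytes);
}

/* Unsigned subtraction wraps modulo 2^64 instead of going negative. */
uint64_t
sk_space_sub(uint64_t alloc, uint64_t bytes)
{
	return (alloc - bytes);
}

/* Same as above, but a debug build would catch the mismatch early. */
uint64_t
sk_space_sub_checked(uint64_t alloc, uint64_t bytes)
{
	assert(bytes <= alloc);
	return (alloc - bytes);
}

/* Convert bytes to EiB (2^60 bytes) for display. */
double
sk_bytes_to_eib(uint64_t bytes)
{
	return ((double)bytes / (double)(UINT64_C(1) << 60));
}
```

For example, recording 3584 bytes on write and releasing 4096 bytes later leaves the counter at 2^64 - 512, which displays as roughly 16 EiB. Using the same size on both sides of the accounting keeps the counter at zero.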

@kernelOfTruth kernelOfTruth force-pushed the zfs_master_20.03.2015_2129+3115_WIP_clean branch 2 times, most recently from e3036cd to 84029b2 Compare May 21, 2015 23:41
kernelOfTruth pushed a commit to kernelOfTruth/zfs that referenced this pull request May 21, 2015
adapted to openzfs#3216,

adaptation to openzfs#2129 in
@ l2arc_compress_buf(l2arc_buf_hdr_t *l2hdr)

 		/*
 		 * Compression succeeded, we'll keep the cdata around for
 		 * writing and release it afterwards.
 		 */
+		if (rounded > csize) {
+			bzero((char *)cdata + csize, rounded - csize);
+			csize = rounded;
+		}

to

		/*
		 * Compression succeeded, we'll keep the cdata around for
		 * writing and release it afterwards.
		 */
		if (rounded > csize) {
			abd_zero_off(cdata, rounded - csize, csize);
			csize = rounded;
		}

ZFSonLinux:
openzfs#3114
openzfs#3400
openzfs#3433
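
The `l2arc_compress_buf` adaptation above zero-fills the gap between the compressed size and the rounded-up size. A userland sketch of the same step on a plain byte array follows; `sk_zero_off` only mimics the `(buffer, size, offset)` argument order seen in the `abd_zero_off` call quoted above, and the other names and the power-of-two alignment are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Round x up to a multiple of align, which must be a power of two. */
size_t
sk_roundup(size_t x, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return ((x + align - 1) & ~(align - 1));
}

/* Zero `size` bytes starting at byte offset `off` within buf. */
void
sk_zero_off(void *buf, size_t size, size_t off)
{
	memset((char *)buf + off, 0, size);
}

/*
 * Pad compressed data up to the alignment and return the new size.
 * The caller must make sure the buffer has room for the rounded size.
 */
size_t
sk_pad_compressed(void *cdata, size_t csize, size_t align)
{
	size_t rounded = sk_roundup(csize, align);

	if (rounded > csize) {
		sk_zero_off(cdata, rounded - csize, csize);
		csize = rounded;
	}
	return (csize);
}
```

Note the order in the call: the length of the gap comes first and the starting offset second, which is easy to swap by accident when converting from the `bzero((char *)cdata + csize, rounded - csize)` form.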
@kernelOfTruth
Contributor Author

Now contains the patch from #3433 l2arc-write-target-size.diff

@kernelOfTruth kernelOfTruth force-pushed the zfs_master_20.03.2015_2129+3115_WIP_clean branch from 84029b2 to db01007 Compare June 8, 2015 00:42
@kernelOfTruth
Contributor Author

old branch with l2arc-write-target-size.diff from #3433 can be found at:

zfs_master_20.03.2015_2129+3115_WIP_clean+Illumos-5701+3433

Pushing (more correct) L2ARC fixes from #3451 , meanwhile waiting for upstream review:

https://reviews.csiden.org/r/112/

@kernelOfTruth kernelOfTruth force-pushed the zfs_master_20.03.2015_2129+3115_WIP_clean branch from db01007 to a7683f5 Compare June 8, 2015 00:51
…to be written to l2arc device

If we don't account for that, then we might end up overwriting disk
area of buffers that have not been evicted yet, because l2arc_evict
operates in terms of disk addresses.

The discrepancy between the write size calculation and the actual increment
to l2ad_hand was introduced in
commit e14bb3258d05c1b1077e2db7cf77088924e56919

Also, consistently use asize / a_sz for the allocated size, psize / p_sz
for the physical size.  Where the latter accounts for possible size
reduction because of compression, whereas the former accounts for possible
size expansion because of alignment requirements.

The code still assumes that either underlying storage subsystems or
hardware is able to do read-modify-write when an L2ARC buffer size is
not a multiple of a disk's block size.  This is true for 4KB sector disks
that provide 512B sector emulation, but may not be true in general.
In other words, we currently do not have any code to make sure that
an L2ARC buffer, whether compressed or not, which is used for physical I/O
has a suitable size.

modified to account for the changes of

openzfs#2129 (ABD) ,

openzfs#3115 (Illumos - 5408 managing ZFS cache devices requires lots of RAM)

and

Illumos - 5701 zpool list reports incorrect "alloc" value for cache devices
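
To illustrate the asize/psize distinction in the commit message above: the physical size is what is left after compression, the allocated size is that value rounded up to the device's block size, and the write position has to advance by the allocated size or consecutive buffers would overlap on disk. The sketch below is a simplified model with hypothetical `sk_*` names and a power-of-two block size given as a shift; it is not the L2ARC write path.

```c
#include <assert.h>
#include <stdint.h>

/* Round a physical size up to the device block size (1 << ashift). */
uint64_t
sk_psize_to_asize(uint64_t psize, unsigned ashift)
{
	uint64_t align;

	assert(ashift < 64);
	align = UINT64_C(1) << ashift;
	return ((psize + align - 1) & ~(align - 1));
}

/* Hypothetical cache device: a write hand moving through [0, end). */
typedef struct sk_l2dev {
	uint64_t	hand;	/* next address to write at */
	uint64_t	end;	/* first address past the usable area */
} sk_l2dev_t;

/*
 * "Write" a buffer of psize bytes: return its start address and advance
 * the hand by the allocated size, not by the physical size.
 */
uint64_t
sk_l2dev_write(sk_l2dev_t *dev, uint64_t psize, unsigned ashift)
{
	uint64_t start = dev->hand;
	uint64_t asize = sk_psize_to_asize(psize, ashift);

	assert(start + asize <= dev->end);
	dev->hand += asize;
	return (start);
}
```

With 4 KiB blocks, a 3000-byte buffer occupies 4096 bytes and a 5000-byte buffer occupies 8192 bytes, so the second buffer starts at 4096 and the hand ends at 12288. Any size estimate made before writing needs to use the same rounded values, otherwise the eviction distance and the actual hand movement drift apart.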
@kernelOfTruth
Contributor Author

closing this one, too

waiting now for a refreshed ABD patch stack

The respective added patches to resolve the issues regarding l2arc space accounting are to be found at:

#3451 (superseded by #3491 )
#3491
#3498

Discussion is at:

#3400

Thanks
