hadd issue when using parallelization together with indirect file #14910

mlizzo · 2024-03-07T11:46:46Z

Check duplicate issues.

Checked for duplicates

Description

Dear ROOT experts,

It seems that when running the hadd command, the code parallelization is not triggered (e.g. -j 10) if target files are stored in a .txt file ("indirect file")

Reproducer

Input files and .txt file are stored here, I am sorry but I wasn't able to upload root files to git. The issue can be reproduced by running:
hadd -j 10 mkShapes__RDF_2018_v9_emu_ttHMVA.root @doHadd.txt
The initial output says:

Parallelizing with 10 processes
hadd Target file: mkShapes__RDF_2018_v9_emu_ttHMVA.root
hadd compression setting for all output: 1
Each process should handle at least 3 files for efficiency. Setting the number of processes to: 1

The last line is not expected when going to parallel mode, it looks like the script was expecting more arguments from the command line, but target files are stored in the doHadd.txt file

ROOT version

   ------------------------------------------------------------------
  | Welcome to ROOT 6.28/00                        https://root.cern |
  | (c) 1995-2022, The ROOT Team; conception: R. Brun, F. Rademakers |
  | Built for linuxx8664gcc on Feb 03 2023, 14:50:41                 |
  | From tags/v6-28-00@v6-28-00                                      |
  | With g++ (GCC) 11.3.0                                            |
  | Try '.help'/'.?', '.demo', '.license', '.credits', '.quit'/'.q'  |
   ------------------------------------------------------------------

Installation method

/cvmfs/sft.cern.ch/lcg/views/LCG_103/x86_64-centos9-gcc11-opt/bin/root

Operating system

Linux

Additional context

No response

The text was updated successfully, but these errors were encountered:

Fixes root-project#14910

ferdymercury · 2024-03-07T18:43:16Z

Feel free to try this patch: #14913

mlizzo · 2024-03-08T09:43:35Z

Hi @ferdymercury , thank you for your prompt answer. Maybe I'm doing something wrong, but I wasn't able to get the target file. The patch seems to correctly trigger the parallelization execution:

Parallelizing with 10 processes. hadd Target file: /afs/cern.ch/work/m/mlizzo/mkShapesRDF_el9/PlotsConfigurationsRun3/VBS_OS_pol/ttHMVA_Full2018_v9/rootFiles__RDF_2018_v9_emu_ttHMVA/mkShapes__RDF_2018_v9_emu_ttHMVA.root hadd compression setting for all output: 101 // compression level is different, don't know why

but then I get:

hadd Target path: /tmp/mlizzo/partial0_088bf922-dd2d-11ee-b92d-3f4eb9bcbeef.root:/
hadd Target path: /tmp/mlizzo/partial0_088bf922-dd2d-11ee-b92d-3f4eb9bcbeef.root:/VBS_2j_em
hadd Target path: /tmp/mlizzo/partial0_088bf922-dd2d-11ee-b92d-3f4eb9bcbeef.root:/VBS_2j_em/events
hadd Target path: /tmp/mlizzo/partial0_088bf922-dd2d-11ee-b92d-3f4eb9bcbeef.root:/VBS_2j_em/ttHMVASF_LepWP
hadd Target path: /tmp/mlizzo/partial0_088bf922-dd2d-11ee-b92d-3f4eb9bcbeef.root:/top_2j_em
hadd Target path: /tmp/mlizzo/partial0_088bf922-dd2d-11ee-b92d-3f4eb9bcbeef.root:/top_2j_em/events
hadd Target path: /tmp/mlizzo/partial0_088bf922-dd2d-11ee-b92d-3f4eb9bcbeef.root:/top_2j_em/ttHMVASF_LepWP
hadd failed at the parallel stage

Were you able to properly merge root files? I compiled the hadd.cxx macro with the following command:
c++ hadd.cxx -o hadd $(root-config --cflags --libs)

ferdymercury · 2024-03-08T10:05:06Z

wrt compression level, did you check the flags to set the compression?
see:
hadd --help

mlizzo · 2024-03-08T10:11:45Z

Unfortunately I had to comment out #include haddCommandLineOptionsHelp.h and the corresponding print, I don't know why that header could not be found - the "standard" hadd executable is the one installed in lxplus (CERN) and to be honest I have no idea how that is compiled. In any case, I'm using default options, I just noticed that it changed (101) with respect to what was set before (1), can this play a role? Did you manage to merge input files eventually? Thank you

ferdymercury · 2024-03-08T10:24:46Z

If you do not specify anything, you will get:

/// Use the compile-time default setting
         kUseCompiledDefault = 101,

Here the full help:

usage: hadd [-a] [-f] [-f[0-9]] [-fk] [-ff] [-k] [-O] [-T] [-v V] [-j J]
            [-dbg] [-d D] [-n N] [-cachesize CACHESIZE]
            [-experimental-io-features EXPERIMENTAL_IO_FEATURES]
            TARGET SOURCES

This program will add histograms, trees and other objects from a list
of ROOT files and write them to a target ROOT file. The target file is
newly created and must not exist, or if -f ("force") is given, must
not be one of the source files.

OPTIONS:
  -a                                   Append to the output
  -f                                   Force overwriting of output file
  -f[0-9]                              Gives the ability to specify the compression level of the target file.
                                       Default is 1 (kDefaultZLIB), 0 is uncompressed, 9 is maximum
                                       compression (see TFile::TFile documentation). You can also specify the
                                       full compresion algorithm, e.g. -f206
  -fk                                  Sets the target file to contain the baskets with the same compression
                                       as the input files (unless -O is specified). Compresses the meta data
                                       using the compression level specified in the first input or the
                                       compression setting after fk (for example 206 when using -fk206)
  -ff                                  The compression level use is the one specified in the first input
  -k                                   Skip corrupt or non-existent files, do not exit
  -O                                   Re-optimize basket size when merging TTree
  -T                                   Do not merge Trees
  -v                                   Explicitly set the verbosity level: 0 request no output, 99 is the
                                       default
  -j                                   Parallelize the execution in 'J' processes. If the number of processes
                                       is not specified, use the system maximum.
  -dbg                                 Enable verbosity. If -j was specified, do not not delete partial files
                                       stored inside working directory.
  -d                                   Carry out the partial multiprocess execution in the specified
                                       directory
  -n                                   Open at most 'N' files at once (use 0 to request to use the system
                                       maximum)
  -cachesize                           Resize the prefetching cache use to speed up I/O operations (use 0 to
                                       disable)
  -experimental-io-features            Used with an argument provided, enables the corresponding experimental
                                       feature for output trees. See ROOT::Experimental::EIOFeatures
  TARGET                               Target file
  SOURCES                              Source files

If TARGET and SOURCES have different compression settings a slower
method is used. For options that takes a size as argument, a decimal
number of bytes is expected. If the number ends with a ``k'', ``m'',
``g'', etc., the number is multiplied by 1000 (1K), 1000000 (1MB),
1000000000 (1G), etc. If this prefix is followed by i, the number is
multiplied by the traditional 1024 (1KiB), 1048576 (1MiB), 1073741824
(1GiB), etc. The prefix can be optionally followed by B whose casing
is ignored, eg. 1k, 1K, 1Kb and 1KB are the same.

No, I also get a "failed" error.

mlizzo · 2024-03-08T11:06:20Z

Hi @ferdymercury , I think I have found what is causing the error. I believe that here the argc argument must be replaced with filesToProcess when an indirect file is present. I made a quick test and it worked fine, but maybe it's better if you experts could take a closer look into that

ferdymercury · 2024-03-08T12:44:17Z

I just edited the patch, try again ;)

mlizzo · 2024-03-08T14:47:15Z

It seems to work properly now, thanks! For which ROOT versions will this fix be available eventually? I was wondering if this modification could be pushed to the LCG release in /cvmfs/sft.cern.ch as well, but I don't know if this is feasible

ferdymercury · 2024-03-08T15:01:47Z

I guess that's a question for @dpiparo or @vepadulano

vepadulano · 2024-03-13T04:49:15Z

As soon as the PR with the patch is merged in master, you will find it in the LCG release of the following day (i.e. at /cvmfs/sft.cern.ch/lcg/views/dev3/...). Afterwards, you will find it in the next ROOT release 6.32 (a couple months from now).

Fixes root-project#14910 [skip-ci] document behavior of indirectFileCount

…14913) Fixes #14910 The full list of input files to merge is now created beforehand by accumulating both the direct and indirect ones (e.g. those found listed in file(s) signaled by the `@` syntax at the command line). Afterwards, the list is used either in the sequential or parallel merge depending on user input. --------- Co-authored-by: Philippe Canal <pcanal@fnal.gov> Co-authored-by: Vincenzo Eduardo Padulano <vincenzo.eduardo.padulano@cern.ch>

…oot-project#14913) Fixes root-project#14910 The full list of input files to merge is now created beforehand by accumulating both the direct and indirect ones (e.g. those found listed in file(s) signaled by the `@` syntax at the command line). Afterwards, the list is used either in the sequential or parallel merge depending on user input. --------- Co-authored-by: Philippe Canal <pcanal@fnal.gov> Co-authored-by: Vincenzo Eduardo Padulano <vincenzo.eduardo.padulano@cern.ch>

…14913) Fixes #14910 The full list of input files to merge is now created beforehand by accumulating both the direct and indirect ones (e.g. those found listed in file(s) signaled by the `@` syntax at the command line). Afterwards, the list is used either in the sequential or parallel merge depending on user input. --------- Co-authored-by: Philippe Canal <pcanal@fnal.gov> Co-authored-by: Vincenzo Eduardo Padulano <vincenzo.eduardo.padulano@cern.ch>

…oot-project#14913) Fixes root-project#14910 The full list of input files to merge is now created beforehand by accumulating both the direct and indirect ones (e.g. those found listed in file(s) signaled by the `@` syntax at the command line). Afterwards, the list is used either in the sequential or parallel merge depending on user input. --------- Co-authored-by: Philippe Canal <pcanal@fnal.gov> Co-authored-by: Vincenzo Eduardo Padulano <vincenzo.eduardo.padulano@cern.ch>

mlizzo added the bug label Mar 7, 2024

dpiparo assigned vepadulano Mar 7, 2024

ferdymercury added a commit to ferdymercury/root that referenced this issue Mar 7, 2024

[hadd] count the number of subfiles in indirect file for multiprocessing

190a70f

Fixes root-project#14910

ferdymercury mentioned this issue Mar 7, 2024

[hadd] fix processing of indirect file when multiprocess is enabled #14913

Merged

2 tasks

dpiparo assigned ferdymercury Mar 7, 2024

ferdymercury mentioned this issue Mar 13, 2024

[roottest] Ideas for new tests #14319

Open

15 tasks

ferdymercury added a commit to ferdymercury/root that referenced this issue Apr 6, 2024

[hadd] count the number of subfiles in indirect file for multiprocessing

7f6efaa

Fixes root-project#14910 [skip-ci] document behavior of indirectFileCount

ferdymercury added a commit to ferdymercury/root that referenced this issue Apr 9, 2024

[hadd] count the number of subfiles in indirect file for multiprocessing

0065f51

Fixes root-project#14910 [skip-ci] document behavior of indirectFileCount

vepadulano closed this as completed in #14913 Apr 10, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

hadd issue when using parallelization together with indirect file #14910

hadd issue when using parallelization together with indirect file #14910

mlizzo commented Mar 7, 2024 •

edited by ferdymercury

Loading

ferdymercury commented Mar 7, 2024

mlizzo commented Mar 8, 2024 •

edited

Loading

ferdymercury commented Mar 8, 2024

mlizzo commented Mar 8, 2024

ferdymercury commented Mar 8, 2024 •

edited

Loading

mlizzo commented Mar 8, 2024

ferdymercury commented Mar 8, 2024

mlizzo commented Mar 8, 2024

ferdymercury commented Mar 8, 2024

vepadulano commented Mar 13, 2024

hadd issue when using parallelization together with indirect file #14910

hadd issue when using parallelization together with indirect file #14910

Comments

mlizzo commented Mar 7, 2024 • edited by ferdymercury Loading

Check duplicate issues.

Description

Reproducer

ROOT version

Installation method

Operating system

Additional context

ferdymercury commented Mar 7, 2024

mlizzo commented Mar 8, 2024 • edited Loading

ferdymercury commented Mar 8, 2024

mlizzo commented Mar 8, 2024

ferdymercury commented Mar 8, 2024 • edited Loading

mlizzo commented Mar 8, 2024

ferdymercury commented Mar 8, 2024

mlizzo commented Mar 8, 2024

ferdymercury commented Mar 8, 2024

vepadulano commented Mar 13, 2024

mlizzo commented Mar 7, 2024 •

edited by ferdymercury

Loading

mlizzo commented Mar 8, 2024 •

edited

Loading

ferdymercury commented Mar 8, 2024 •

edited

Loading