From 7d271e9e247f95047779b05d58f4905b6edb7ca8 Mon Sep 17 00:00:00 2001
From: Vlad Khorsun
Date: Thu, 16 Jun 2022 18:48:10 +0300
Subject: [PATCH] Documentation.

---
 builds/install/misc/firebird.conf | 12 +++-
 doc/README.gbak                   | 97 ++++++++++++++++++++++++++++++-
 doc/README.parallel_features      | 80 +++++++++++++++++++++++++
 3 files changed, 186 insertions(+), 3 deletions(-)
 create mode 100644 doc/README.parallel_features

diff --git a/builds/install/misc/firebird.conf b/builds/install/misc/firebird.conf
index efd77085d1..6b4feb8ba3 100644
--- a/builds/install/misc/firebird.conf
+++ b/builds/install/misc/firebird.conf
@@ -1062,16 +1062,24 @@
 # ============================
 
 #
-# Limit number of parallel workers for the single task. Per-process.
+# Limits the total number of parallel workers that can be created within a
+# single Firebird process for each attached database.
+# Note that workers are counted for each attached database independently.
 # Valid values are from 1 (no parallelism) to 64. All other values
 # silently ignored and default value of 1 is used.
+# Per-process.
+#
+# Type: integer
 #
 #MaxParallelWorkers = 1
 
 #
-# Default number of parallel workers for the single task. Per-process.
+# Default number of parallel workers for a single task.
 # Valid values are from 1 (no parallelism) to MaxParallelWorkers (above).
 # Values less than 1 is silently ignored and default value of 1 is used.
+# Per-process.
+#
+# Type: integer
 #
 #ParallelWorkers = 1
 
diff --git a/doc/README.gbak b/doc/README.gbak
index 941389dd35..45608b2ae4 100644
--- a/doc/README.gbak
+++ b/doc/README.gbak
@@ -1,4 +1,9 @@
-In Firebird 4.0 a new switch was added to gbak: -INCLUDE(_DATA).
+gbak enhancements in Firebird v4.
+---------------------------------
+
+A new switch was added to gbak: -INCLUDE(_DATA).
+
+Author: Dimitry Sibiryakov
 
 It takes one parameter which is "similar like" pattern matching table names
 in a case-insensitive way.
@@ -17,3 +22,93 @@ a table is following:
 | MATCH     | excluded   | excluded   | excluded   |
 | NOT MATCH | included   | included   | excluded   |
 +-----------+------------+------------+------------+
+
+
+
+gbak enhancements in Firebird v5.
+---------------------------------
+
+1. Parallel execution.
+
+Author: Vladyslav Khorsun
+
+a) gbak backup
+
+Backup can read source database tables using multiple threads in parallel.
+
+New switch
+-PAR(ALLEL) parallel workers
+
+sets the number of workers used for the backup process. Default is 1. Every
+additional worker creates its own thread and its own new connection, used to
+read data in parallel with other workers. All worker connections share the same
+database snapshot, so that all of them see consistent data. Workers are created
+and managed by gbak itself. Note that metadata is still read by a single thread.
+
+b) gbak restore
+
+Restore can put data into user tables using multiple threads in parallel.
+
+New switch
+-PAR(ALLEL) parallel workers
+
+sets the number of workers used for the restore process. Default is 1. Every
+additional worker creates its own thread and its own new connection, used to
+load data in parallel with other workers. Metadata is still created by a single
+thread. Also, the "main" connection uses the DPB tag isc_dpb_parallel_workers to
+pass the value of the -PARALLEL switch to the engine, which lets the engine
+build indices in parallel. If the -PARALLEL switch is not used, gbak loads data
+using a single thread and does not set the DPB tag isc_dpb_parallel_workers. In
+this case the engine uses the value of the ParallelWorkers setting when building
+indices, i.e. this phase may still be run in parallel by the engine itself. To
+fully avoid parallel operations when restoring a database, use -PARALLEL 1.
+
+  Note that gbak itself does not use firebird.conf, so the ParallelWorkers
+setting does not affect its own operations.
+
+
+Examples.
+
+  In firebird.conf, set ParallelWorkers = 4 and MaxParallelWorkers = 8, then
+restart the Firebird server.
+
+a) backup using 2 parallel workers
+
+  gbak -b -parallel 2 
+
+  Here gbak will read user data using 2 connections and 2 threads.
+
+
+b) restore using 2 parallel workers
+
+  gbak -r -parallel 2 
+
+  Here gbak will load user data using 2 connections and 2 threads. Also, the
+engine will build indices using 2 connections and 2 threads.
+
+c) restore using no parallel workers but let the engine decide how many workers
+should be used to build indices
+
+  gbak -r 
+
+  Here gbak will load user data using a single connection. The engine will build
+indices using 4 connections and 4 threads, as set by ParallelWorkers.
+
+d) restore using no parallel workers and do not allow the engine to build
+indices in parallel
+
+  gbak -r -par 1 
+
+
+2. Direct IO for backup files.
+
+New switch
+-D(IRECT_IO) direct IO for backup file(s)
+
+instructs gbak to open\create backup file(s) in direct IO (or unbuffered) mode.
+This avoids consuming file system cache memory for backup files. Usually a
+backup is read (by restore) or written (by backup) just once, so there is little
+benefit from caching its contents. Performance should not suffer, as gbak uses
+sequential IO with relatively big chunks.
+  Direct IO mode is silently ignored if the backup file is redirected to
+standard input\output, i.e. if "stdin"\"stdout" is used as the backup file name.
diff --git a/doc/README.parallel_features b/doc/README.parallel_features
new file mode 100644
index 0000000000..c3dcb47695
--- /dev/null
+++ b/doc/README.parallel_features
@@ -0,0 +1,80 @@
+Firebird engine parallel features in v5.
+----------------------------------------
+
+Author: Vladyslav Khorsun
+
+
+  The Firebird engine can now execute some tasks using multiple threads in
+parallel. Currently parallel execution is implemented for the sweep and the
+index creation tasks. Parallel execution is supported for both auto- and manual
+sweep.
+
+  To handle the same task with multiple threads, the engine runs additional
+worker threads and creates internal worker attachments.
+By default, parallel execution is not enabled. There are two ways to enable
+parallelism in a user attachment:
+- set number of parallel workers in DPB using new tag isc_dpb_parallel_workers,
+- set default number of workers using new firebird.conf setting ParallelWorkers.
+
+  The gfix utility has a new command-line switch, -parallel, that sets the
+number of parallel workers for the sweep task. For example:
+
+  gfix -sweep -parallel 4 
+
+will run sweep on the given database and ask the engine to use 4 workers. gfix
+uses the DPB tag isc_dpb_parallel_workers when it attaches to the database, if
+the -parallel switch is present.
+
+  The new firebird.conf setting ParallelWorkers sets the default number of
+parallel workers that any user attachment can use when running a parallelizable
+task. The default value is 1, which means no additional parallel workers. A
+value in the DPB has higher priority than the setting in firebird.conf.
+
+  To control the number of additional workers that the engine can create, there
+are two new settings in firebird.conf:
+- ParallelWorkers - sets the default number of parallel workers used by user
+  attachments.
+  An attachment can override it using the tag isc_dpb_parallel_workers in DPB.
+- MaxParallelWorkers - limits the number of simultaneously used workers for a
+  given database and Firebird process.
+
+  Internal worker attachments are created and managed by the engine itself.
+The engine maintains per-database pools of worker attachments. The number of
+items in each pool is limited by the value of the MaxParallelWorkers setting.
+The pools are created by each Firebird process independently.
+
+  In the Super Server architecture worker attachments are implemented as
+lightweight system attachments, while in Classic and Super Classic they look
+like usual user attachments. All worker attachments are embedded into the
+server process that creates them, so in Classic architectures there are no
+additional server processes. Worker attachments appear in the monitoring tables.
+An idle worker attachment is destroyed after 60 seconds of inactivity. Also, in
+Classic architectures worker attachments are destroyed immediately after the
+last user connection detaches from the database.
+
+
+Examples:
+
+  In firebird.conf, set ParallelWorkers = 4 and MaxParallelWorkers = 8, then
+restart the Firebird server.
+
+a) Connect to the test database without isc_dpb_parallel_workers in the DPB and
+execute a "CREATE INDEX ..." SQL statement. On commit the index is actually
+created, and the engine uses 3 additional worker attachments. In total, 4
+attachments in 4 threads work on index creation.
+
+b) Ensure auto-sweep is enabled for the test database. When auto-sweep runs on
+that database, it will also use 3 additional workers (and run within 4 threads).
+
+c) More than one task at a time can be parallelized: make 2 attachments and
+execute "CREATE INDEX ..." in each of them (of course, the indices to be built
+should be different). Each index will be created using 4 attachments (1 user
+and 3 worker) and 4 threads.
+
+d) Run gfix -sweep without specifying the -parallel switch: sweep will run
+using 4 attachments in 4 threads.
+
+e) Run gfix -sweep -parallel 2: sweep will run using 2 attachments in
+2 threads. This shows that the value in the DPB tag isc_dpb_parallel_workers
+overrides the value of the ParallelWorkers setting.
+