Christian Ebner [Wed, 12 Jun 2024 13:17:12 +0000 (15:17 +0200)]
accessor: adapt and restrict contents access
Add checks for split variant inputs when accessing the payload
contents via the accessor instance. Both cases, accessing via the
safe `contents` method and via the previously unsafe
`open_contents_at_range` call are covered.
Reduce possible misuse by wrapping the current plain content range
into an opaque `ContentRange` type with an additional optional
payload reference field to check consistency between the payload
reference encoded in the metadata archive and the payload header
found in the payload data archive.
Because of the additional type wrapping and the payload header check,
the `open_contents_at_range` is considered safe now, dropping the
previously unsafe implementation.
The corresponding interfaces have been adapted accordingly.
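The wrapping described above could look roughly like the following
sketch. It is not the crate's actual definition: the field names, the
`new` constructor, the `check_header` method and the choice to compare
only the payload size are assumptions made for illustration.

```rust
use std::ops::Range;

/// Payload reference as found in the metadata archive (assumed layout).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct PayloadRef {
    pub offset: u64,
    pub size: u64,
}

/// Opaque content range. The fields are private, so callers cannot
/// fabricate an arbitrary range and hand it to the accessor.
#[derive(Clone, Debug)]
pub struct ContentRange {
    content: Range<u64>,
    payload_ref: Option<PayloadRef>,
}

impl ContentRange {
    /// Hypothetical constructor; a real crate would likely keep it private.
    pub fn new(content: Range<u64>, payload_ref: Option<PayloadRef>) -> Self {
        ContentRange { content, payload_ref }
    }

    /// Compare the size recorded in the payload header with the size stored
    /// in the payload reference. Without a reference (non-split archive)
    /// there is nothing to compare and the range is returned as-is.
    pub fn check_header(&self, header_size: u64) -> Result<Range<u64>, String> {
        match self.payload_ref {
            Some(r) if r.size != header_size => Err(format!(
                "payload size mismatch: reference {} != header {}",
                r.size, header_size
            )),
            _ => Ok(self.content.clone()),
        }
    }
}
```

With this shape, opening the contents can stay a safe call because the
range is only handed out after the check has passed.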
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
Christian Ebner [Wed, 12 Jun 2024 08:23:58 +0000 (10:23 +0200)]
decoder: move payload header check for split input
The payload entries in the payload output for split pxar archives are
separated by payload headers, which allow consistency checks of the
payload references encoded in the metadata archive.
Currently, this consistency check is performed right after reading the
entry in the metadata archive, which however has the downside that the
payload has to be fetched and decoded just for this consistency check.
This greatly impacts performance when accessing a metadata archive
with attached payload input reader, e.g. in the fuse implementation to
mount pxar archives, being especially severe when accessed over the
network in combination with a remote chunk reader as the Proxmox
Backup Server does.
Therefore, move this check to the contents reader instantiation
instead and add an additional flag to the decoder's `InPayload` state.
Getting the decoder now needs to be async and the method must return
an error when the check fails.
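A minimal sketch of the deferred check, using a mock decoder. The
`header_checked` flag name, the `Decoder` struct and the in-memory list
of payload headers are hypothetical and only illustrate that reading a
metadata entry no longer touches the payload.

```rust
#[derive(Debug, PartialEq)]
enum State {
    Default,
    InPayload { offset: u64, size: u64, header_checked: bool },
}

struct Decoder {
    state: State,
    /// Mock payload archive: (offset, size) of each payload header.
    payload_headers: Vec<(u64, u64)>,
    /// Counts how often the payload archive had to be consulted.
    payload_fetches: usize,
}

impl Decoder {
    /// Reading a metadata entry only records the reference.
    fn read_entry(&mut self, offset: u64, size: u64) {
        self.state = State::InPayload { offset, size, header_checked: false };
    }

    /// The header is verified when the contents are requested, and only once.
    fn contents(&mut self) -> Result<(u64, u64), String> {
        match &mut self.state {
            State::InPayload { offset, size, header_checked } => {
                if !*header_checked {
                    self.payload_fetches += 1;
                    let found = self
                        .payload_headers
                        .iter()
                        .any(|&(o, s)| o == *offset && s == *size);
                    if !found {
                        return Err(format!("payload header mismatch at offset {}", *offset));
                    }
                    *header_checked = true;
                }
                Ok((*offset, *size))
            }
            State::Default => Err("decoder is not in payload state".to_string()),
        }
    }
}
```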
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
Christian Ebner [Fri, 22 Mar 2024 14:08:30 +0000 (15:08 +0100)]
format/encoder/decoder: new pxar entry type `Prelude`
Introduces a new pxar format entry type `Prelude` and the associated
encoder and decoder methods.
A prelude starts with header marker `PXAR_PRELUDE` followed by raw
byte content, used to store additional metadata associated with the
pxar archive, e.g. command line arguments passed on archive creation.
The prelude's content has no fixed encoding format but is stored as
a raw, arbitrary byte slice. A prelude entry is encoded right after
a pxar format version entry, both being encoded in the metadata
archive in case of an archive with dedicated payload output.
The prelude is not backwards compatible to pxar format version 1.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
Christian Ebner [Fri, 22 Mar 2024 11:13:17 +0000 (12:13 +0100)]
format/encoder/decoder: new pxar entry type `Version`
Introduces a new pxar format entry type `Version` and the associated
encoder and decoder methods. The format version entry is only allowed
once, as the first entry of the pxar archive, marked with a
`PXAR_FORMAT_VERSION` header followed by the encoded version number.
If not present, the default format version 1 is assumed as encoding
format for the archive.
The entry allows early detection of incompatibility with an encoded
archive, so a reader can bail or switch mode based on the encountered
version.
The format version entry is not backwards compatible to pxar format
version 1.
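The version rules above can be sketched as follows. The `Entry` enum,
the `format_version` helper and the `MAX_SUPPORTED_VERSION` value are
assumptions for illustration, not the crate's API.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Entry {
    Version(u64),
    Other,
}

/// Assumed highest version this hypothetical reader understands.
const MAX_SUPPORTED_VERSION: u64 = 2;

/// Determine the format version from the decoded entry sequence.
fn format_version(entries: &[Entry]) -> Result<u64, String> {
    // No version entry up front means the default format version 1.
    let version = match entries.first() {
        Some(Entry::Version(v)) => *v,
        _ => 1,
    };
    // The version entry is only allowed once, as the very first entry.
    let misplaced = entries
        .iter()
        .skip(1)
        .any(|e| matches!(e, Entry::Version(_)));
    if misplaced {
        return Err("format version entry must be the first entry".to_string());
    }
    // Bail early on versions this reader cannot handle.
    if version == 0 || version > MAX_SUPPORTED_VERSION {
        return Err(format!("unsupported format version {}", version));
    }
    Ok(version)
}
```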
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
Christian Ebner [Mon, 26 Feb 2024 11:08:30 +0000 (12:08 +0100)]
encoder/format: finish payload stream with marker
Mark the end of the optional payload stream. This makes sure that at
least some bytes are written to the stream (as empty archives are not
allowed by the Proxmox Backup Server) and that any injected chunks
have to be consumed.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
Christian Ebner [Tue, 20 Feb 2024 13:07:14 +0000 (14:07 +0100)]
encoder: add payload advance capability
Allow advancing the payload writer position by a given size.
This is used to update the encoder's payload input position when
injecting reused chunks for files with unchanged metadata.
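A minimal sketch of a position-tracking payload writer with an advance
operation. The `PayloadWriter` type and its method names are
hypothetical; only the idea of moving the position without writing
bytes is taken from the description above.

```rust
use std::io::{self, Write};

struct PayloadWriter<W: Write> {
    inner: W,
    position: u64,
}

impl<W: Write> PayloadWriter<W> {
    fn new(inner: W) -> Self {
        PayloadWriter { inner, position: 0 }
    }

    /// Write payload bytes and account for them in the position.
    fn write_payload(&mut self, data: &[u8]) -> io::Result<()> {
        self.inner.write_all(data)?;
        self.position += data.len() as u64;
        Ok(())
    }

    /// Skip ahead without writing, e.g. after reused chunks were injected
    /// into the stream by someone else.
    fn advance(&mut self, size: u64) {
        self.position += size;
    }

    /// Current offset in the payload stream.
    fn position(&self) -> u64 {
        self.position
    }
}
```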
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
Christian Ebner [Wed, 13 Mar 2024 08:00:21 +0000 (09:00 +0100)]
encoder: add payload position capability
Allow reading the current payload offset from the dedicated payload
input stream. This is required to get the current offset for calculation
of forced boundaries in the proxmox-backup-client, when injecting reused
payload chunks into the payload stream.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
Christian Ebner [Mon, 29 Apr 2024 08:41:18 +0000 (10:41 +0200)]
decoder: set payload input range when decoding via accessor
When accessing the file contents via the sequential file restore,
the range of the payload contents cannot be inferred a priori but needs
to be calculated based on the payload references encountered during
decoding.
Extending the `SeqRead` trait by the method `update_range` allows to
set the range in the payload reader instance by implementing the
method for `SeqReadAtAdapter`, thereby setting the correct content
range to be accessed.
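The mechanism can be sketched with a simplified, synchronous stand-in
for the trait. The trait below is not the crate's `SeqRead` (which is
poll based); the `read` signature, the no-op default of `update_range`
and the slice-backed adapter are assumptions for illustration.

```rust
use std::ops::Range;

trait SeqRead {
    fn read(&mut self, buf: &mut [u8]) -> usize;

    /// Assumed default: readers that have no notion of a range ignore it.
    fn update_range(&mut self, _range: Range<u64>) -> Result<(), String> {
        Ok(())
    }
}

/// Sequential view onto a random-access source, limited to `range`.
struct SeqReadAtAdapter<'a> {
    data: &'a [u8],
    range: Range<u64>,
}

impl<'a> SeqRead for SeqReadAtAdapter<'a> {
    fn read(&mut self, buf: &mut [u8]) -> usize {
        let start = self.range.start as usize;
        let end = (self.range.end as usize).min(self.data.len());
        if start >= end {
            return 0;
        }
        let n = buf.len().min(end - start);
        buf[..n].copy_from_slice(&self.data[start..start + n]);
        self.range.start += n as u64;
        n
    }

    /// Set the content range once a payload reference was decoded.
    fn update_range(&mut self, range: Range<u64>) -> Result<(), String> {
        if range.start > range.end {
            return Err("invalid payload range".to_string());
        }
        self.range = range;
        Ok(())
    }
}
```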
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
Christian Ebner [Tue, 12 Mar 2024 14:06:44 +0000 (15:06 +0100)]
decoder/accessor: allow for split input stream variant
When a pxar archive was encoded using the split stream output
variant, access to the payload of regular files has to be redirected
to the corresponding dedicated input.
Allow passing the split input variant to the decoder and accessor
instances to handle the split streams accordingly and decode split
stream archives.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
Christian Ebner [Thu, 1 Feb 2024 08:47:18 +0000 (09:47 +0100)]
encoder: allow split output writer for archive creation
During regular pxar archive encoding, the payload of regular files is
written as part of the archive.
This patch introduces functionality to instead attach a writer variant
with a split payload writer instance to redirect the payload to a
different output.
The separation of data and metadata streams allows for efficient
reuse of payload data by referencing the payload writer byte offset,
without having to reencode it.
Whenever the payload of regular files is redirected to a dedicated
output writer, encode a payload reference header followed by the
required data to locate the data, instead of adding the regular payload
header followed by the encoded payload to the archive.
This is in preparation for reusing payload chunks for unchanged files
of backups created via the proxmox-backup-client.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
Christian Ebner [Thu, 1 Feb 2024 11:51:56 +0000 (12:51 +0100)]
format/examples: add header type `PXAR_PAYLOAD_REF`
Introduces the header type `PXAR_PAYLOAD_REF` to mark regular file
entry payloads, not encoded within the regular pxar archive but
rather redirected to a dedicated payload output writer.
It therefore substitutes the `PXAR_PAYLOAD` header type for these
entries.
The header marks the start and size for a `PayloadRef` typed object
in the archive, storing the offset to the payload header offset in the
payload stream of the dedicated payload output as well as the payload
size.
The `PayloadRef` provides the means to store, serialize and
deserialize the entry.
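A sketch of such a reference object with a fixed-size byte
representation. The field names, the 16-byte layout and the
little-endian byte order are assumptions here, as are the `to_bytes`
and `from_bytes` helper names.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct PayloadRef {
    /// Offset of the payload header in the payload stream.
    offset: u64,
    /// Size of the payload.
    size: u64,
}

impl PayloadRef {
    /// Serialize as two little-endian u64 values (assumed layout).
    fn to_bytes(self) -> [u8; 16] {
        let mut out = [0u8; 16];
        out[..8].copy_from_slice(&self.offset.to_le_bytes());
        out[8..].copy_from_slice(&self.size.to_le_bytes());
        out
    }

    /// Deserialize, rejecting input of the wrong length.
    fn from_bytes(data: &[u8]) -> Option<Self> {
        if data.len() != 16 {
            return None;
        }
        let mut offset = [0u8; 8];
        let mut size = [0u8; 8];
        offset.copy_from_slice(&data[..8]);
        size.copy_from_slice(&data[8..]);
        Some(PayloadRef {
            offset: u64::from_le_bytes(offset),
            size: u64::from_le_bytes(size),
        })
    }
}
```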
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
Christian Ebner [Tue, 19 Mar 2024 13:51:08 +0000 (14:51 +0100)]
encoder: move to stack based state tracking
In preparation for the proxmox-backup-client look-ahead caching,
where passing around different encoder instances with internal
references is not feasible.
Instead of creating a new encoder instance for each directory level
and keeping references to the parent state, use an internal stack.
Adds additional helper functions to solve borrow issues, when both
the state and writers have to be accessed by a mutable reference.
This is a breaking change in the pxar library API.
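The stack-based approach can be sketched as below. All names
(`EncoderState`, `output_state`, the string-based output) are
hypothetical; the sketch only shows one encoder owning a stack of
per-directory states, plus a helper that hands out the output and the
current state as two mutable references at once.

```rust
struct EncoderState {
    path: String,
    entries: usize,
}

struct Encoder {
    output: Vec<String>,
    /// One state per open directory level, root at the bottom.
    state: Vec<EncoderState>,
}

impl Encoder {
    fn new() -> Self {
        Encoder {
            output: Vec::new(),
            state: vec![EncoderState { path: String::new(), entries: 0 }],
        }
    }

    /// Helper borrowing two distinct fields mutably at the same time.
    fn output_state(&mut self) -> Result<(&mut Vec<String>, &mut EncoderState), String> {
        let state = self
            .state
            .last_mut()
            .ok_or_else(|| "encoder state stack underflow".to_string())?;
        Ok((&mut self.output, state))
    }

    fn add_file(&mut self, name: &str) -> Result<(), String> {
        let (output, state) = self.output_state()?;
        state.entries += 1;
        output.push(format!("{}/{}", state.path, name));
        Ok(())
    }

    /// Push a new level instead of returning a new encoder instance.
    fn create_directory(&mut self, name: &str) -> Result<(), String> {
        let path = {
            let (_, parent) = self.output_state()?;
            parent.entries += 1;
            format!("{}/{}", parent.path, name)
        };
        self.state.push(EncoderState { path, entries: 0 });
        Ok(())
    }

    /// Close the current level and return its number of entries.
    fn finish(&mut self) -> Result<usize, String> {
        self.state
            .pop()
            .map(|s| s.entries)
            .ok_or_else(|| "encoder state stack underflow".to_string())
    }
}
```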
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
Christian Ebner [Thu, 23 May 2024 06:54:58 +0000 (08:54 +0200)]
lib: add type for input/output variant differentiation
Introduce an enum which stores the two possible variants of inputs or
outputs to be passed to encoder and decoder/accessor instances,
depending on whether to read/write a fully self-contained pxar archive
or to split off the payload stream into a separate input/output.
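Such an enum might look like the following sketch. The type name, the
variant names and the accessor methods are assumptions and may differ
from what the library actually exposes.

```rust
/// Either a single self-contained archive stream, or a metadata stream
/// together with a separate payload stream.
#[derive(Debug, Clone, PartialEq)]
pub enum PxarVariant<T> {
    Unified(T),
    Split(T, T),
}

impl<T> PxarVariant<T> {
    /// The archive (metadata) stream, present in both variants.
    pub fn archive(&self) -> &T {
        match self {
            PxarVariant::Unified(archive) => archive,
            PxarVariant::Split(archive, _) => archive,
        }
    }

    /// The dedicated payload stream, only present for split archives.
    pub fn payload(&self) -> Option<&T> {
        match self {
            PxarVariant::Unified(_) => None,
            PxarVariant::Split(_, payload) => Some(payload),
        }
    }
}
```

Keeping both streams in one generic type lets the same enum carry
writers for the encoder and readers for the decoder/accessor.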
Co-authored-by: Dominik Csapak <d.csapak@proxmox.com>
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
Max Carrara [Fri, 21 Jul 2023 15:58:19 +0000 (17:58 +0200)]
decoder: aio: Make `TokioReader` public
This exposes `decoder::aio::TokioReader<T>` in a similar manner to
`decoder::sync::StandardReader<T>`, which is necessary if one wants
to remain generic over `T: tokio::io::AsyncRead`, e.g.:
Stefan Reiter [Wed, 31 Mar 2021 10:21:43 +0000 (12:21 +0200)]
decoder/aio: add contents() and content_size() calls
Returns a decoder::Contents without a wrapper type, since in this case
we don't want to hide the SeqRead implementation (as done in
decoder::sync). For convenience, also implement AsyncRead if "tokio-io"
is enabled.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Stefan Reiter [Tue, 9 Feb 2021 12:03:47 +0000 (13:03 +0100)]
make aio::Encoder actually behave with async
To really use the encoder with async/await, it needs to support
SeqWrite implementations that are Send. This requires changing a whole
bunch of '&mut dyn SeqWrite' trait objects to take a 'T: SeqWrite'
generic parameter directly instead. Most of this is quite
straightforward, though it incurs a lot of churn (FileImpl needs a
generic parameter now, for example).
The trickiest part is returning a new Encoder instance in
create_directory, as the trait object trick with
SeqWrite::as_trait_object doesn't work if SeqWrite is implemented for
generic '&mut S'.
Instead, work with the generic object directly, and express the
owned/borrowed state in the Encoder (to avoid nested borrowing) as an
enum EncoderOutput.
Extend the aio test to ensure the Encoder is now actually usable.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
unfortunately, futures::io::AsyncRead and tokio::io::AsyncRead no longer
share a do_poll_read signature, so we need to adapt one to the other
(and also no longer generate some wrapper implementations via macro).