Google Cloud Storage C++ Client 2.13.0
A C++ Client Library for Google Cloud Storage
object_write_stream.h
1// Copyright 2021 Google LLC
2//
3// Licensed under the Apache License, Version 2.0 (the "License");
4// you may not use this file except in compliance with the License.
5// You may obtain a copy of the License at
6//
7// https://www.apache.org/licenses/LICENSE-2.0
8//
9// Unless required by applicable law or agreed to in writing, software
10// distributed under the License is distributed on an "AS IS" BASIS,
11// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12// See the License for the specific language governing permissions and
13// limitations under the License.
14
15#ifndef GOOGLE_CLOUD_CPP_GOOGLE_CLOUD_STORAGE_OBJECT_WRITE_STREAM_H
16#define GOOGLE_CLOUD_CPP_GOOGLE_CLOUD_STORAGE_OBJECT_WRITE_STREAM_H
17
18#include "google/cloud/storage/headers_map.h"
19#include "google/cloud/storage/internal/object_write_streambuf.h"
20#include "google/cloud/storage/version.h"
21#include <memory>
22#include <ostream>
23#include <string>
24
25namespace google {
26namespace cloud {
27namespace storage {
28GOOGLE_CLOUD_CPP_INLINE_NAMESPACE_BEGIN
29
30/**
31 * Defines a `std::basic_ostream<char>` to write to a GCS object.
32 *
33 * This class is used to upload objects to GCS. It can handle objects of any
34 * size, but keep the following considerations in mind:
35 *
36 * - This API is designed for applications that need to stream the object
37 * payload. If you have the payload as one large buffer consider using
38 * `Client::InsertObject()`; it is simpler and faster in most cases.
39 * - This API can be used to perform unformatted I/O, as well as formatted I/O
40 * using the familiar `operator<<` APIs.
41 * - Note that formatted I/O typically implies some form of buffering and data
42 * copying.
43 * - For best performance, consider using the [.write()][cpp-reference-write]
44 * member function.
45 * - GCS expects to receive data in multiples of the *upload quantum* (256KiB).
46 * Sending a buffer that is not a multiple of this quantum terminates the
47 * upload.
48 * - Consequently, this class must maintain an internal buffer before sending
49 * the data to the service.
50 * - Understanding how this buffer is used is important to get the best
51 * possible performance.
52 * - When using unformatted I/O, try to size your data in multiples of the
53 * upload quantum, as this often results in better performance.
54 *
55 * The maximum size of this internal buffer is configured using
56 * `UploadBufferSizeOption`. As with all options, this can be set when the
57 * @ref Client object is created. The current default value is 8 MiB, but this
58 * default value can change. If the size of this buffer is important for your
59 * application please set the value explicitly. You can also provide an override
60 * when calling `Client::WriteObject()`. Note that this setting is expressed in
61 * bytes, but it is always rounded (up) to a multiple of the upload quantum.
62 *
63 * #### Unformatted I/O
64 *
65 * On a `.write()` call this class attempts to send the data immediately, that
66 * is, without copying it to the internal buffer. If the previously buffered
67 * data plus the data provided in the `.write()` call exceed the maximum size
68 * of the internal buffer, then the largest amount of data that is a multiple
69 * of the upload quantum is flushed. Any data in excess of that multiple is
70 * buffered for the next upload.
71 *
72 * These examples may clarify how this works:
73 * -# Consider a fresh `ObjectWriteStream`, configured to buffer at most
74 * 256KiB. Assume this stream receives a `.write()` call with 257 KiB of
75 * data. The first 256 KiB are immediately sent and the remaining 1 KiB is
76 * buffered for a future upload.
77 * - If the same stream receives another `.write()` call with 256 KiB then
78 * it will send the buffered 1 KiB of data and the first 255 KiB from the
79 * new buffer. The last 1 KiB is buffered for a future upload.
80 * -# Consider a fresh `ObjectWriteStream`, configured to buffer at most
81 * 256KiB. If this stream receives a `.write()` call with 4 MiB of data the
82 * data is sent immediately. No data is buffered, as the data size is a
83 * multiple of the upload quantum.
84 * -# Consider a stream configured to buffer 512 KiB before flushing.
85 * Assume this stream has 256 KiB of data in its buffer from previous
86 * buffered I/O. If this stream receives a `.write()` call with 1024 KiB
87 * then both the 256 KiB and the 1024 KiB of data are flushed.
88 *
89 * #### Formatted I/O
90 *
91 * When performing formatted I/O, typically used via `operator<<`, this class
92 * will buffer data based on the @ref UploadBufferSizeOption setting.
93 *
94 * #### Recommendations
95 *
96 * For the best upload performance we recommend using *exclusively* the
97 * unformatted I/O API. Furthermore, we recommend that applications use data in
98 * multiples of the upload quantum in all calls to `.write()`. Larger buffers
99 * result in better performance. Our [empirical results][github-issue-2657] show
100 * that these improvements taper off at around 32 MiB.
101 *
102 * If you are planning to use unformatted I/O, and you are already planning to
103 * provide large buffers in the `.write()` calls, then there is no need to
104 * configure a large value for `UploadBufferSizeOption`. As described above,
105 * calling `.write()` with more data than the `UploadBufferSizeOption`
106 * immediately flushes the data and only leaves any non-multiple of 256 KiB in
107 * the internal buffer.
108 *
109 * #### Suspending Uploads
110 *
111 * As is customary in C++, the destructor of this class finalizes the upload.
112 * If you want to prevent the class from finalizing an upload, use the
113 * `Suspend()` function.
114 *
115 * #### Examples
116 *
117 * @par Starting a resumable upload.
118 * @snippet storage_object_resumable_write_samples.cc start resumable upload
119 *
120 * @par Resuming a resumable upload.
121 * @snippet storage_object_resumable_write_samples.cc resume resumable upload
122 *
123 * [cpp-reference-put]: https://en.cppreference.com/w/cpp/io/basic_ostream/put
124 *
125 * [cpp-reference-write]:
126 * https://en.cppreference.com/w/cpp/io/basic_ostream/write
127 *
128 * [github-issue-2657]:
129 * https://github.com/googleapis/google-cloud-cpp/issues/2657
130 */
131class ObjectWriteStream : public std::basic_ostream<char> {
132 public:
133 /**
134 * Creates a stream not associated with any buffer.
135 *
136 * Attempts to use this stream will result in failures.
137 */
138 ObjectWriteStream();
139
140 /**
141 * Creates a stream associated with the given request.
142 *
143 * Writing to the stream will result in HTTP requests to upload more data
144 * to the GCS object.
145 *
146 * @param buf an initialized ObjectWriteStreambuf to upload the data.
147 */
148 explicit ObjectWriteStream(
149 std::unique_ptr<internal::ObjectWriteStreambuf> buf);
150
151 ObjectWriteStream(ObjectWriteStream&& rhs) noexcept;
152
153 ObjectWriteStream& operator=(ObjectWriteStream&& rhs) noexcept {
154 ObjectWriteStream tmp(std::move(rhs));
155 swap(tmp);
156 return *this;
157 }
158
159 void swap(ObjectWriteStream& rhs) {
160 basic_ostream<char>::swap(rhs);
161 std::swap(buf_, rhs.buf_);
162 rhs.set_rdbuf(rhs.buf_.get());
163 set_rdbuf(buf_.get());
164 std::swap(metadata_, rhs.metadata_);
165 std::swap(headers_, rhs.headers_);
166 std::swap(payload_, rhs.payload_);
167 }
168
169 ObjectWriteStream(ObjectWriteStream const&) = delete;
170 ObjectWriteStream& operator=(ObjectWriteStream const&) = delete;
171
172 /// Closes the stream (if necessary).
173 ~ObjectWriteStream() override;
174
175 /**
176 * Return true if the stream is open to write more data.
177 *
178 * @note
179 * write streams can be "born closed" when created using a previously
180 * finalized upload session. Applications that restore a previous session
181 * should check the state, for example:
182 *
183 * @code
184 * auto stream = client.WriteObject(...,
185 * gcs::RestoreResumableUploadSession(session_id));
186 * if (!stream.IsOpen() && stream.metadata().ok()) {
187 * std::cout << "Yay! The upload was finalized previously.\n";
188 * return;
189 * }
190 * @endcode
191 */
192 bool IsOpen() const { return buf_ != nullptr && buf_->IsOpen(); }
193
194 /**
195 * Close the stream, finalizing the upload.
196 *
197 * Closing a stream completes an upload and creates the uploaded object. On
198 * failure it sets the `badbit` of the stream.
199 *
200 * The metadata of the uploaded object, or a detailed error status, is
201 * accessible via the `metadata()` member function. Note that the metadata may
202 * be empty if the application creates a stream with the `Fields("")`
203 * parameter; applications cannot assume that all fields in the metadata are
204 * filled on success.
205 *
206 * @throws std::ios_base::failure if the application has enabled the
207 * exception mask.
208 */
209 void Close();
210
211 ///@{
212 /**
213 * Access the upload results.
214 *
215 * Note that calling these member functions before `Close()` is undefined
216 * behavior.
217 */
218 StatusOr<ObjectMetadata> const& metadata() const& { return metadata_; }
219 StatusOr<ObjectMetadata>&& metadata() && { return std::move(metadata_); }
220
221 /**
222 * The received CRC32C checksum and the MD5 hash values as reported by GCS.
223 *
224 * When the upload is finalized (via `Close()`) the GCS server reports the
225 * CRC32C checksum and, if the object is not a composite object, the MD5 hash
226 * of the uploaded data. This class compares the reported hashes against
227 * locally computed hash values, and reports an error if they do not match.
228 *
229 * The values are reported as comma separated `tag=value` pairs, e.g.
230 * `crc32c=AAAAAA==,md5=1B2M2Y8AsgTpgAmY7PhCfg==`. The format of this string
231 * is subject to change without notice; the values are provided for
232 * informational purposes only.
233 *
234 * @see https://cloud.google.com/storage/docs/hashes-etags for more
235 * information on checksums and hashes in GCS.
236 */
237 std::string const& received_hash() const { return buf_->received_hash(); }
238
239 /**
240 * The locally computed checksum and hashes, as a string.
241 *
242 * This object computes the CRC32C checksum and MD5 hash of the uploaded data.
243 * There are several cases where these values may be empty or irrelevant, for
244 * example:
245 * - When performing resumable uploads the stream may not have had access to
246 * the full data.
247 * - The application may disable the CRC32C and/or the MD5 hash computation.
248 *
249 * The string has the same format as the value returned by `received_hash()`.
250 * Note that the format of this string is also subject to change without
251 * notice.
252 *
253 * @see https://cloud.google.com/storage/docs/hashes-etags for more
254 * information on checksums and hashes in GCS.
255 */
256 std::string const& computed_hash() const { return buf_->computed_hash(); }
257
258 /**
259 * The headers (if any) returned by the service. For debugging only.
260 *
261 * @warning The contents of these headers may change without notice. Unless
262 * documented in the API, headers may be removed or added by the service.
263 * Furthermore, the headers may change from one version of the library to
264 * the next, as we find more (or different) opportunities for
265 * optimization.
266 */
267 HeadersMap const& headers() const { return headers_; }
268
269 /// The returned payload as a raw string, for debugging only.
270 std::string const& payload() const { return payload_; }
271 ///@}
272
273 /**
274 * Returns the resumable upload session id for this upload.
275 *
276 * Note that this is an empty string for uploads that do not use resumable
277 * upload session ids. `Client::WriteObject()` enables resumable uploads based
278 * on the options set by the application.
279 */
280 std::string const& resumable_session_id() const {
281 return buf_->resumable_session_id();
282 }
283
284 /**
285 * Returns the next expected byte.
286 *
287 * For non-resumable uploads this is always zero. Applications that use
288 * resumable uploads can use this value to resend any data not committed in
289 * GCS.
290 */
291 std::uint64_t next_expected_byte() const {
292 return buf_->next_expected_byte();
293 }
294
295 /**
296 * Suspends an upload.
297 *
298 * This is a destructive operation. Using this object after calling this
299 * function results in undefined behavior. Applications should copy any
300 * necessary state (such as the value `resumable_session_id()`) before calling
301 * this function.
302 *
303 * @snippet storage_object_resumable_write_samples.cc suspend resumable upload
304 */
305 void Suspend() &&;
306
307 /**
308 * Returns the status of partial errors.
309 *
310 * Applications may write multiple times before closing the stream. This
311 * function reports the status of the upload so far, even before the stream
312 * is closed.
313 *
314 * This differs from `metadata()`, whose behavior is undefined when called
315 * before `Close()`.
316 */
317 Status last_status() const { return buf_->last_status(); }
318
319 private:
320 /**
321 * Closes the underlying object write stream.
322 */
323 void CloseBuf();
324
325 std::unique_ptr<internal::ObjectWriteStreambuf> buf_;
326 StatusOr<ObjectMetadata> metadata_;
327 std::multimap<std::string, std::string> headers_;
328 std::string payload_;
329};
330
331GOOGLE_CLOUD_CPP_INLINE_NAMESPACE_END
332} // namespace storage
333} // namespace cloud
334} // namespace google
335
336#endif // GOOGLE_CLOUD_CPP_GOOGLE_CLOUD_STORAGE_OBJECT_WRITE_STREAM_H
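
The "Unformatted I/O" notes in the class comment describe how a `.write()` call splits data between an immediate flush and the internal buffer. The sketch below models only that arithmetic, so the three documented examples can be checked by hand. It does not call the library. Every name in it (`kUploadQuantum`, `WritePlan`, `RoundUpToQuantum`, `PlanWrite`, `CheckDocumentedExamples`) is hypothetical, and the handling of a total exactly equal to the buffer limit is an assumption, since the comment only covers totals larger than the limit.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the buffering rules described in the class comment.
// All sizes are in bytes. None of these names exist in the library.
std::uint64_t const kKiB = 1024;
std::uint64_t const kUploadQuantum = 256 * kKiB;

struct WritePlan {
  std::uint64_t flushed;   // bytes sent to the service by this call
  std::uint64_t buffered;  // bytes held in the internal buffer afterwards
};

// The configured buffer size is rounded up to a multiple of the quantum.
inline std::uint64_t RoundUpToQuantum(std::uint64_t size) {
  return ((size + kUploadQuantum - 1) / kUploadQuantum) * kUploadQuantum;
}

// Models a single `.write()` call of `incoming` bytes on a stream that
// already holds `already_buffered` bytes.
inline WritePlan PlanWrite(std::uint64_t already_buffered,
                           std::uint64_t incoming, std::uint64_t max_buffer) {
  std::uint64_t const limit = RoundUpToQuantum(max_buffer);
  std::uint64_t const total = already_buffered + incoming;
  // Assumption: a total that does not exceed the limit stays buffered.
  if (total <= limit) return {0, total};
  // Flush the largest multiple of the quantum and keep the remainder.
  std::uint64_t const flushed = (total / kUploadQuantum) * kUploadQuantum;
  return {flushed, total - flushed};
}

// Replays the three examples from the class comment.
inline void CheckDocumentedExamples() {
  // Example 1: 257 KiB into a fresh stream with a 256 KiB buffer.
  WritePlan p = PlanWrite(0, 257 * kKiB, 256 * kKiB);
  assert(p.flushed == 256 * kKiB && p.buffered == 1 * kKiB);
  // Example 1, follow-up: another 256 KiB on top of the buffered 1 KiB.
  p = PlanWrite(p.buffered, 256 * kKiB, 256 * kKiB);
  assert(p.flushed == 256 * kKiB && p.buffered == 1 * kKiB);
  // Example 2: 4 MiB into a fresh stream with a 256 KiB buffer.
  p = PlanWrite(0, 4096 * kKiB, 256 * kKiB);
  assert(p.flushed == 4096 * kKiB && p.buffered == 0);
  // Example 3: 1024 KiB on top of 256 KiB buffered, with a 512 KiB buffer.
  p = PlanWrite(256 * kKiB, 1024 * kKiB, 512 * kKiB);
  assert(p.flushed == 1280 * kKiB && p.buffered == 0);
}
```

Calling `CheckDocumentedExamples()` from a test or from `main()` replays the documented cases. The real stream may behave differently at the boundaries, so treat this as an aid for choosing write sizes rather than a description of the implementation.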
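
The "Recommendations" notes in the class comment suggest using `.write()` exclusively, with sizes that are multiples of the upload quantum. Because `ObjectWriteStream` derives from `std::basic_ostream<char>`, a helper written against `std::ostream&` can be tried out on any standard stream first. The sketch below is one possible way to do that. `WriteInQuantumMultiples` and `DemoWithStringStream` are hypothetical names, and the `std::ostringstream` only stands in for an upload stream.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <ios>
#include <ostream>
#include <sstream>
#include <string>

// The upload quantum stated in the class comment (256 KiB).
std::size_t const kUploadQuantum = 256 * 1024;

// Hypothetical helper: writes `data` to `os` using unformatted I/O, sending
// `quanta_per_write` upload quanta per `.write()` call. Only the final call
// may be smaller. Returns the number of `.write()` calls issued and stops
// early if the stream reports an error.
inline std::size_t WriteInQuantumMultiples(std::ostream& os,
                                           std::string const& data,
                                           std::size_t quanta_per_write) {
  assert(quanta_per_write > 0);
  std::size_t const chunk = quanta_per_write * kUploadQuantum;
  std::size_t calls = 0;
  for (std::size_t offset = 0; offset < data.size() && os.good();
       offset += chunk) {
    std::size_t const n = std::min(chunk, data.size() - offset);
    os.write(data.data() + offset, static_cast<std::streamsize>(n));
    ++calls;
  }
  return calls;
}

// Exercises the helper with a string stream standing in for the upload.
inline bool DemoWithStringStream() {
  std::ostringstream sink;
  std::string const payload(kUploadQuantum + 10, 'x');
  std::size_t const calls = WriteInQuantumMultiples(sink, payload, 1);
  return calls == 2 && sink.str() == payload;
}
```

With a real upload the caller would still call `Close()` afterwards and inspect `metadata()`, as the member documentation describes. Only the chunking loop is shown here.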