Google Cloud Storage C++ Client  1.42.0
A C++ Client Library for Google Cloud Storage
object_write_stream.h
// Copyright 2021 Google LLC
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     https://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

#ifndef GOOGLE_CLOUD_CPP_GOOGLE_CLOUD_STORAGE_OBJECT_WRITE_STREAM_H
#define GOOGLE_CLOUD_CPP_GOOGLE_CLOUD_STORAGE_OBJECT_WRITE_STREAM_H

#include "google/cloud/storage/headers_map.h"
#include "google/cloud/storage/internal/object_write_streambuf.h"
#include "google/cloud/storage/version.h"
#include <memory>
#include <ostream>
#include <string>

namespace google {
namespace cloud {
namespace storage {

/**
 * Defines a `std::basic_ostream<char>` to write to a GCS object.
 *
 * This class is used to upload objects to GCS. It can handle objects of any
 * size, but keep the following considerations in mind:
 *
 * * This API is designed for applications that need to stream the object
 *   payload. If you have the payload as one large buffer, consider using
 *   `storage::Client::InsertObject()`; it is simpler and faster in most cases.
 * * This API can be used to perform unformatted I/O, as well as formatted I/O
 *   using the familiar `operator<<` APIs. Note that formatted I/O typically
 *   implies some form of buffering and data copying. For best performance,
 *   consider using the [.write()][cpp-reference-write] member function.
 * * GCS expects to receive data in multiples of the *upload quantum*
 *   (256 KiB). Sending a buffer that is not a multiple of this quantum
 *   terminates the upload. This constrains the implementation of buffered and
 *   unbuffered I/O as described below.
 *
 * @par Unformatted I/O
 * On a `.write()` call this class attempts to send the data immediately; this
 * is the unbuffered API after all. If the combined size of any previously
 * buffered data and the data provided in the `.write()` call exceeds an upload
 * quantum, the class sends data immediately. Any data beyond the largest
 * multiple of the upload quantum is buffered for the next upload.
 *
 * These examples may clarify how this works:
 * -# Consider a fresh `ObjectWriteStream` that receives a `.write()` call
 *    with 257 KiB of data. The first 256 KiB are immediately sent and the
 *    remaining 1 KiB is buffered for a future upload.
 * -# If the same stream receives another `.write()` call with 256 KiB then it
 *    will send the buffered 1 KiB of data and the first 255 KiB from the new
 *    buffer. The last 1 KiB is buffered for a future upload.
 * -# Consider a fresh `ObjectWriteStream` that receives a `.write()` call
 *    with 4 MiB of data. This data is sent immediately, and no data is
 *    buffered.
 * -# Consider a stream with a 256 KiB buffer from previous buffered I/O (see
 *    below to understand how this might happen). If this stream receives a
 *    `.write()` call with 1024 KiB then both the 256 KiB and the 1024 KiB of
 *    data are uploaded immediately.
 *
 * @par Formatted I/O
 * When performing formatted I/O, typically used via `operator<<`, this class
 * will buffer data based on the `ClientOptions::upload_buffer_size()` setting.
 * Note that this setting is expressed in bytes, but it is always rounded up
 * to a multiple of the upload quantum.
 *
 * @par Recommendations
 * For the best upload performance, we recommend using *exclusively* the
 * unbuffered I/O API. Furthermore, we recommend that applications pass data
 * in multiples of the upload quantum in all calls to `.write()`. Larger
 * buffers result in better performance. Note that our
 * [empirical results][github-issue-2657] show that these improvements taper
 * off at around 32 MiB.
 *
 * @par Suspending Uploads
 * Note that, as is customary in C++, the destructor of this class finalizes
 * the upload. If you want to prevent the class from finalizing an upload, use
 * the `Suspend()` function.
 *
 * @par Example: starting a resumable upload.
 * @snippet storage_object_resumable_write_samples.cc start resumable upload
 *
 * @par Example: resuming a resumable upload.
 * @snippet storage_object_resumable_write_samples.cc resume resumable upload
 *
 * [cpp-reference-put]: https://en.cppreference.com/w/cpp/io/basic_ostream/put
 *
 * [cpp-reference-write]:
 * https://en.cppreference.com/w/cpp/io/basic_ostream/write
 *
 * [github-issue-2657]:
 * https://github.com/googleapis/google-cloud-cpp/issues/2657
 */
class ObjectWriteStream : public std::basic_ostream<char> {
 public:
  /**
   * Creates a stream not associated with any buffer.
   *
   * Attempts to use this stream will result in failures.
   */
  ObjectWriteStream();

  /**
   * Creates a stream associated with the given request.
   *
   * Data written to the stream is uploaded to the GCS object via HTTP
   * requests.
   *
   * @param buf an initialized ObjectWriteStreambuf to upload the data.
   */
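  explicit ObjectWriteStream(
      std::unique_ptr<internal::ObjectWriteStreambuf> buf);

  ObjectWriteStream(ObjectWriteStream&& rhs) noexcept;

  ObjectWriteStream& operator=(ObjectWriteStream&& rhs) noexcept {
    ObjectWriteStream tmp(std::move(rhs));
    swap(tmp);
    return *this;
  }
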
  void swap(ObjectWriteStream& rhs) {
    basic_ostream<char>::swap(rhs);
    std::swap(buf_, rhs.buf_);
    rhs.set_rdbuf(rhs.buf_.get());
    set_rdbuf(buf_.get());
    std::swap(metadata_, rhs.metadata_);
    std::swap(headers_, rhs.headers_);
    std::swap(payload_, rhs.payload_);
  }

  /// Closes the stream (if necessary).
  ~ObjectWriteStream() override;

  /**
   * Returns true if the stream is open to write more data.
   *
   * @note
   * Write streams can be "born closed" when created using a previously
   * finalized upload session. Applications that restore a previous session
   * should check the state, for example:
   *
   * @code
   * auto stream = client.WriteObject(...,
   *     gcs::RestoreResumableUploadSession(session_id));
   * if (!stream.IsOpen() && stream.metadata().ok()) {
   *   std::cout << "Yay! The upload was finalized previously.\n";
   *   return;
   * }
   * @endcode
   */
  bool IsOpen() const { return buf_ != nullptr && buf_->IsOpen(); }

  /**
   * Closes the stream, finalizing the upload.
   *
   * Closing a stream completes an upload and creates the uploaded object. On
   * failure it sets the `badbit` of the stream.
   *
   * The metadata of the uploaded object, or a detailed error status, is
   * accessible via the `metadata()` member function. Note that the metadata may
   * be empty if the application creates a stream with the `Fields("")`
   * parameter; applications cannot assume that all fields in the metadata are
   * filled on success.
   *
   * @throws If the application has enabled the exception mask, this function
   *     may throw `std::ios_base::failure`.
   */
  void Close();

  //@{
  /**
   * Access the upload results.
   *
   * Note that calling these member functions before `Close()` is undefined
   * behavior.
   */
  StatusOr<ObjectMetadata> const& metadata() const& { return metadata_; }
  StatusOr<ObjectMetadata>&& metadata() && { return std::move(metadata_); }

  /**
   * The received CRC32C checksum and the MD5 hash values as reported by GCS.
   *
   * When the upload is finalized (via `Close()`) the GCS server reports the
   * CRC32C checksum and, if the object is not a composite object, the MD5 hash
   * of the uploaded data. This class compares the reported hashes against
   * locally computed hash values, and reports an error if they do not match.
   *
   * The values are reported as comma-separated `tag=value` pairs, e.g.
   * `crc32c=AAAAAA==,md5=1B2M2Y8AsgTpgAmY7PhCfg==`. The format of this string
   * is subject to change without notice; it is provided for informational
   * purposes only.
   *
   * @see https://cloud.google.com/storage/docs/hashes-etags for more
   *     information on checksums and hashes in GCS.
   */
  std::string const& received_hash() const { return buf_->received_hash(); }

  /**
   * The locally computed checksum and hashes, as a string.
   *
   * This object computes the CRC32C checksum and MD5 hash of the uploaded data.
   * There are several cases where these values may be empty or irrelevant, for
   * example:
   * - When performing resumable uploads the stream may not have had access to
   *   the full data.
   * - The application may disable the CRC32C and/or the MD5 hash computation.
   *
   * The string has the same format as the value returned by `received_hash()`.
   * Note that the format of this string is also subject to change without
   * notice.
   *
   * @see https://cloud.google.com/storage/docs/hashes-etags for more
   *     information on checksums and hashes in GCS.
   */
  std::string const& computed_hash() const { return buf_->computed_hash(); }

  /**
   * The headers (if any) returned by the service. For debugging only.
   *
   * @warning The contents of these headers may change without notice. Unless
   *     documented in the API, headers may be removed or added by the service.
   *     Also note that the client library uses both the XML and JSON API,
   *     choosing between them based on the feature set (some functionality is
   *     only available through the JSON API), and performance. Consequently,
   *     the headers may be different on requests using different features.
   *     Likewise, the headers may change from one version of the library to
   *     the next, as we find more (or different) opportunities for
   *     optimization.
   */
  HeadersMap const& headers() const { return headers_; }

  /// The returned payload as a raw string, for debugging only.
  std::string const& payload() const { return payload_; }
  //@}

  /**
   * Returns the resumable upload session id for this upload.
   *
   * Note that this is an empty string for uploads that do not use resumable
   * upload session ids. `Client::WriteObject()` enables resumable uploads based
   * on the options set by the application.
   */
  std::string const& resumable_session_id() const {
    return buf_->resumable_session_id();
  }

  /**
   * Returns the next expected byte.
   *
   * For non-resumable uploads this is always zero. Applications that use
   * resumable uploads can use this value to resend any data not yet committed
   * by GCS.
   */
  std::uint64_t next_expected_byte() const {
    return buf_->next_expected_byte();
  }

  /**
   * Suspends an upload.
   *
   * This is a destructive operation. Using this object after calling this
   * function results in undefined behavior. Applications should copy any
   * necessary state (such as the value of `resumable_session_id()`) before
   * calling this function.
   *
   * @snippet storage_object_resumable_write_samples.cc suspend resumable upload
   */
  void Suspend() &&;

  /**
   * Returns the status of partial errors.
   *
   * Applications may write multiple times before closing the stream; this
   * function reports the status of those writes before the stream is closed.
   *
   * This function differs from `metadata()`: calling `metadata()` before
   * `Close()` is undefined behavior.
   */
  Status last_status() const { return buf_->last_status(); }

 private:
  /**
   * Closes the underlying object write stream.
   */
  void CloseBuf();

  std::unique_ptr<internal::ObjectWriteStreambuf> buf_;
  StatusOr<ObjectMetadata> metadata_;
  std::multimap<std::string, std::string> headers_;
  std::string payload_;
};

}  // namespace storage
}  // namespace cloud
}  // namespace google

#endif  // GOOGLE_CLOUD_CPP_GOOGLE_CLOUD_STORAGE_OBJECT_WRITE_STREAM_H
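The Unformatted I/O rules in the class comment reduce to simple byte accounting. On each `.write()` call, the largest whole multiple of the upload quantum in (buffered bytes + new bytes) is sent, and the remainder stays buffered. The sketch below models only that arithmetic. It is a simplified reading of the documentation, not the library's implementation, and it performs no I/O. `QuantumModel` and `WriteResult` are hypothetical names, and the model assumes that data is sent as soon as one full 256 KiB quantum is available.

```cpp
#include <cstdint>

// Hypothetical, simplified model of the documented unformatted I/O rules.
// It only tracks byte counts; it does not talk to GCS.
constexpr std::uint64_t kKiB = 1024;
constexpr std::uint64_t kUploadQuantum = 256 * kKiB;  // as documented

struct WriteResult {
  std::uint64_t sent;      // bytes sent immediately by this write()
  std::uint64_t buffered;  // bytes kept for a future upload
};

class QuantumModel {
 public:
  explicit QuantumModel(std::uint64_t already_buffered = 0)
      : buffered_(already_buffered) {}

  // Models one .write() call of `n` bytes.
  WriteResult Write(std::uint64_t n) {
    std::uint64_t const total = buffered_ + n;
    // Only a whole number of quanta can be sent; the rest stays buffered.
    std::uint64_t const sent = (total / kUploadQuantum) * kUploadQuantum;
    buffered_ = total - sent;
    return WriteResult{sent, buffered_};
  }

 private:
  std::uint64_t buffered_;
};
```

Replaying the four numbered examples gives these results:

- First write: the model sends 256 KiB and keeps 1 KiB.
- Second write: it again sends 256 KiB and keeps 1 KiB.
- Third write: it sends all 4 MiB.
- Fourth write: it sends 1280 KiB (the 256 KiB buffer plus the new 1024 KiB).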
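The Formatted I/O paragraph states that `ClientOptions::upload_buffer_size()` is always rounded up to the upload quantum. Assuming this means "the next multiple of 256 KiB", the rounding is a ceiling division. `RoundUpToQuantum` is a hypothetical helper for illustration, not a library function. The sketch does not attempt to model how the library treats a size of zero.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kUploadQuantum = 256 * 1024;  // 256 KiB

// Hypothetical helper: rounds `bytes` up to the next multiple of the upload
// quantum using integer ceiling division.
std::size_t RoundUpToQuantum(std::size_t bytes) {
  assert(bytes > 0);  // zero-sized buffers are outside this sketch
  return ((bytes + kUploadQuantum - 1) / kUploadQuantum) * kUploadQuantum;
}
```

Under this assumption, a requested size of 300 KiB becomes a 512 KiB buffer, while an exact multiple such as 256 KiB is left unchanged.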
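The move assignment operator and `swap()` in the class follow the move-and-swap idiom. One subtle step: `std::basic_ostream::swap()` exchanges the stream state but not the `rdbuf()` pointers. So after swapping the owned buffers, each stream must point itself at its new buffer with `set_rdbuf()`. The sketch below reproduces that pattern with a hypothetical `OwningStream` that owns a `std::stringbuf`. The move constructor shown is an assumption, because this header does not show how `ObjectWriteStream`'s move constructor is defined. None of the upload logic is modelled.

```cpp
#include <memory>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical stream that owns its buffer, mirroring the ownership and
// swap pattern (not the upload logic) of ObjectWriteStream.
class OwningStream : public std::ostream {
 public:
  OwningStream() : std::ostream(nullptr) {}  // no buffer: badbit is set
  explicit OwningStream(std::unique_ptr<std::stringbuf> buf)
      : std::ostream(buf.get()), buf_(std::move(buf)) {}

  // basic_ostream's move constructor leaves rdbuf() null in *this and
  // unchanged in rhs, so both pointers are fixed up explicitly.
  OwningStream(OwningStream&& rhs) noexcept
      : std::ostream(std::move(rhs)), buf_(std::move(rhs.buf_)) {
    set_rdbuf(buf_.get());
    rhs.set_rdbuf(nullptr);
  }

  // Move-and-swap, as in ObjectWriteStream::operator=.
  OwningStream& operator=(OwningStream&& rhs) noexcept {
    OwningStream tmp(std::move(rhs));
    swap(tmp);
    return *this;
  }

  void swap(OwningStream& rhs) {
    std::ostream::swap(rhs);  // swaps state and flags, but not rdbuf()
    std::swap(buf_, rhs.buf_);
    rhs.set_rdbuf(rhs.buf_.get());
    set_rdbuf(buf_.get());
  }

  std::string str() const { return buf_ ? buf_->str() : std::string(); }

 private:
  std::unique_ptr<std::stringbuf> buf_;
};
```

Without the two `set_rdbuf()` calls in `swap()`, each stream would keep writing into the buffer now owned by the other object. After a move assignment, the moved-from stream would point at a buffer it no longer owns.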
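`received_hash()` and `computed_hash()` use the comma-separated `tag=value` format shown in their documentation. Since that format may change without notice, application logic should not depend on it. For debugging, though, it can help to split the strings and list which values disagree. The following is a hypothetical sketch; `ParseHashString` and `MismatchedHashes` are illustrative names, not library functions. Base64 values end in `=` padding, so each pair is split at its first `=` only. Tags missing from either string are ignored, because `computed_hash()` may legitimately be empty (for example, after resuming an upload).

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical debugging helper: splits "tag=value,tag=value" into a map.
// Only the first '=' separates the tag, since base64 values end in '='.
std::map<std::string, std::string> ParseHashString(std::string const& s) {
  std::map<std::string, std::string> result;
  std::string::size_type start = 0;
  while (start < s.size()) {
    auto end = s.find(',', start);
    if (end == std::string::npos) end = s.size();
    std::string const pair = s.substr(start, end - start);
    auto const eq = pair.find('=');
    if (eq != std::string::npos) {
      result[pair.substr(0, eq)] = pair.substr(eq + 1);
    }
    start = end + 1;
  }
  return result;
}

// Returns the tags present in both strings whose values differ.
std::vector<std::string> MismatchedHashes(std::string const& received,
                                          std::string const& computed) {
  auto const r = ParseHashString(received);
  auto const c = ParseHashString(computed);
  std::vector<std::string> mismatched;
  for (auto const& kv : r) {
    auto const it = c.find(kv.first);
    if (it != c.end() && it->second != kv.second) {
      mismatched.push_back(kv.first);
    }
  }
  return mismatched;
}
```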