Google Cloud Storage C++ Client  1.34.0
A C++ Client Library for Google Cloud Storage
object_write_stream.h
// Copyright 2021 Google LLC
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

#ifndef GOOGLE_CLOUD_CPP_GOOGLE_CLOUD_STORAGE_OBJECT_WRITE_STREAM_H
#define GOOGLE_CLOUD_CPP_GOOGLE_CLOUD_STORAGE_OBJECT_WRITE_STREAM_H

#include "google/cloud/storage/internal/object_write_streambuf.h"
#include "google/cloud/storage/version.h"
#include <map>
#include <memory>
#include <ostream>
#include <string>

namespace google {
namespace cloud {
namespace storage {

/// Represents the headers returned in a streaming upload or download operation.
using HeadersMap = std::multimap<std::string, std::string>;

/**
 * Defines a `std::basic_ostream<char>` to write to a GCS object.
 *
 * This class is used to upload objects to GCS. It can handle objects of any
 * size, but keep the following considerations in mind:
 *
 * * This API is designed for applications that need to stream the object
 *   payload. If you have the payload as one large buffer, consider using
 *   `storage::Client::InsertObject()`; it is simpler and faster in most cases.
 * * This API can be used to perform unformatted I/O, as well as formatted I/O
 *   using the familiar `operator<<` APIs. Note that formatted I/O typically
 *   implies some form of buffering and data copying. For best performance,
 *   consider using the [.write()][cpp-reference-write] member function.
 * * GCS expects to receive data in multiples of the *upload quantum* (256KiB).
 *   Sending a buffer that is not a multiple of this quantum terminates the
 *   upload. This constrains the implementation of buffered and unbuffered I/O
 *   as described below.
 *
 * @par Unformatted I/O
 * On a `.write()` call this class attempts to send the data immediately; this
 * is the unbuffered API after all. If the previously buffered data and the
 * data provided in the `.write()` call together are larger than an upload
 * quantum, the class sends data immediately. Any data in excess of a multiple
 * of the upload quantum is buffered for the next upload.
 *
 * These examples may clarify how this works:
 * -# Consider a fresh `ObjectWriteStream` that receives a `.write()` call
 *    with 257 KiB of data. The first 256 KiB are immediately sent and the
 *    remaining 1 KiB is buffered for a future upload.
 * -# If the same stream receives another `.write()` call with 256 KiB then it
 *    will send the buffered 1 KiB of data and the first 255 KiB from the new
 *    buffer. The last 1 KiB is buffered for a future upload.
 * -# Consider a fresh `ObjectWriteStream` that receives a `.write()` call
 *    with 4 MiB of data. This data is sent immediately, and no data is
 *    buffered.
 * -# Consider a stream with a 256 KiB buffer from previous buffered I/O (see
 *    below to understand how this might happen). If this stream receives a
 *    `.write()` call with 1024 KiB then both the 256 KiB and the 1024 KiB of
 *    data are uploaded immediately.
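 *
 * A minimal sketch of the arithmetic in these examples. `SplitForUpload()`
 * is a hypothetical helper, not part of this library; it models only the
 * byte counts, not the upload itself, and it assumes that a total of exactly
 * one quantum is sent rather than buffered:
 *
```cpp
#include <cstddef>
#include <utility>

// Assumed from the description above: the upload quantum is 256 KiB.
constexpr std::size_t kUploadQuantum = 256 * 1024;

// Hypothetical helper: given the bytes already buffered and the bytes in a
// new `.write()` call, returns {bytes sent immediately, bytes left buffered}.
// Whole quanta are sent; the remainder waits for the next upload.
inline std::pair<std::size_t, std::size_t> SplitForUpload(
    std::size_t buffered, std::size_t incoming) {
  std::size_t const total = buffered + incoming;
  std::size_t const leftover = total % kUploadQuantum;
  return {total - leftover, leftover};
}
```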
 *
 * @par Formatted I/O
 * When performing formatted I/O, typically used via `operator<<`, this class
 * will buffer data based on the `ClientOptions::upload_buffer_size()` setting.
 * Note that this setting is expressed in bytes, but it is always rounded (up)
 * to a multiple of the upload quantum.
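 *
 * As a sketch of that rounding, assuming the 256 KiB quantum described above.
 * `RoundUpToQuantum()` is a hypothetical helper, not a function in this
 * library:
 *
```cpp
#include <cstddef>

constexpr std::size_t kUploadQuantum = 256 * 1024;  // 256 KiB, as stated above

// Hypothetical helper: rounds `size` (in bytes) up to the next multiple of
// the upload quantum. Sizes that are already a multiple are left unchanged.
constexpr std::size_t RoundUpToQuantum(std::size_t size) {
  return ((size + kUploadQuantum - 1) / kUploadQuantum) * kUploadQuantum;
}

static_assert(RoundUpToQuantum(1) == kUploadQuantum, "rounds up");
static_assert(RoundUpToQuantum(kUploadQuantum) == kUploadQuantum,
              "exact multiples are unchanged");
```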
 *
 * @par Recommendations
 * For best performance uploading data we recommend using *exclusively* the
 * unbuffered I/O API. Furthermore, we recommend that applications use data in
 * multiples of the upload quantum in all calls to `.write()`. Larger buffers
 * result in better performance. Note that our
 * [empirical results][github-issue-2657] show that these improvements taper
 * off around 32MiB or so.
 *
 * @par Suspending Uploads
 * Note that, as is customary in C++, the destructor of this class finalizes
 * the upload. If you want to prevent the class from finalizing an upload, use
 * the `Suspend()` function.
 *
 * @par Example: starting a resumable upload.
 * @snippet storage_object_resumable_write_samples.cc start resumable upload
 *
 * @par Example: resuming a resumable upload.
 * @snippet storage_object_resumable_write_samples.cc resume resumable upload
 *
 * [cpp-reference-put]: https://en.cppreference.com/w/cpp/io/basic_ostream/put
 *
 * [cpp-reference-write]:
 * https://en.cppreference.com/w/cpp/io/basic_ostream/write
 *
 * [github-issue-2657]:
 * https://github.com/googleapis/google-cloud-cpp/issues/2657
 */
class ObjectWriteStream : public std::basic_ostream<char> {
 public:
  /**
   * Creates a stream not associated with any buffer.
   *
   * Attempts to use this stream will result in failures.
   */
  ObjectWriteStream();

  /**
   * Creates a stream associated with the given request.
   *
   * Writing to the stream will result in HTTP requests to upload the data
   * to the GCS object.
   *
   * @param buf an initialized ObjectWriteStreambuf to upload the data.
   */
  explicit ObjectWriteStream(
      std::unique_ptr<internal::ObjectWriteStreambuf> buf);

  ObjectWriteStream(ObjectWriteStream&& rhs) noexcept;

  ObjectWriteStream& operator=(ObjectWriteStream&& rhs) noexcept {
    ObjectWriteStream tmp(std::move(rhs));
    swap(tmp);
    return *this;
  }

  void swap(ObjectWriteStream& rhs) {
    basic_ostream<char>::swap(rhs);
    std::swap(buf_, rhs.buf_);
    rhs.set_rdbuf(rhs.buf_.get());
    set_rdbuf(buf_.get());
    std::swap(metadata_, rhs.metadata_);
    std::swap(headers_, rhs.headers_);
    std::swap(payload_, rhs.payload_);
  }

  /// Closes the stream (if necessary).
  ~ObjectWriteStream() override;

  /**
   * Returns true if the stream is open to write more data.
   *
   * @note
   * Write streams can be "born closed" when created using a previously
   * finalized upload session. Applications that restore a previous session
   * should check the state, for example:
   *
   * @code
   * auto stream = client.WriteObject(...,
   *     gcs::RestoreResumableUploadSession(session_id));
   * if (!stream.IsOpen() && stream.metadata().ok()) {
   *   std::cout << "Yay! The upload was finalized previously.\n";
   *   return;
   * }
   * @endcode
   */
  bool IsOpen() const { return buf_ != nullptr && buf_->IsOpen(); }

  /**
   * Close the stream, finalizing the upload.
   *
   * Closing a stream completes an upload and creates the uploaded object. On
   * failure it sets the `badbit` of the stream.
   *
   * The metadata of the uploaded object, or a detailed error status, is
   * accessible via the `metadata()` member function. Note that the metadata may
   * be empty if the application creates a stream with the `Fields("")`
   * parameter; applications cannot assume that all fields in the metadata are
   * filled on success.
   *
   * @throws std::ios_base::failure on failure, if the application has enabled
   *     exceptions via the exception mask of the stream.
   */
  void Close();

  //@{
  /**
   * Access the upload results.
   *
   * Note that calling these member functions before `Close()` is undefined
   * behavior.
   */
  StatusOr<ObjectMetadata> const& metadata() const& { return metadata_; }
  StatusOr<ObjectMetadata>&& metadata() && { return std::move(metadata_); }

  /**
   * The received CRC32C checksum and the MD5 hash values as reported by GCS.
   *
   * When the upload is finalized (via `Close()`) the GCS server reports the
   * CRC32C checksum and, if the object is not a composite object, the MD5 hash
   * of the uploaded data. This class compares the reported hashes against
   * locally computed hash values, and reports an error if they do not match.
   *
   * The values are reported as comma separated `tag=value` pairs, e.g.
   * `crc32c=AAAAAA==,md5=1B2M2Y8AsgTpgAmY7PhCfg==`. The format of this string
   * is subject to change without notice; the values are provided for
   * informational purposes only.
   *
   * @see https://cloud.google.com/storage/docs/hashes-etags for more
   *     information on checksums and hashes in GCS.
   */
  std::string const& received_hash() const { return buf_->received_hash(); }

  /**
   * The locally computed checksum and hashes, as a string.
   *
   * This object computes the CRC32C checksum and MD5 hash of the uploaded data.
   * There are several cases where these values may be empty or irrelevant, for
   * example:
   *   - When performing resumable uploads the stream may not have had access to
   *     the full data.
   *   - The application may disable the CRC32C and/or the MD5 hash computation.
   *
   * The string has the same format as the value returned by `received_hash()`.
   * Note that the format of this string is also subject to change without
   * notice.
   *
   * @see https://cloud.google.com/storage/docs/hashes-etags for more
   *     information on checksums and hashes in GCS.
   */
  std::string const& computed_hash() const { return buf_->computed_hash(); }

  /**
   * The headers (if any) returned by the service. For debugging only.
   *
   * @warning The contents of these headers may change without notice. Unless
   *     documented in the API, headers may be removed or added by the service.
   *     Also note that the client library uses both the XML and JSON APIs,
   *     choosing between them based on the feature set (some functionality is
   *     only available through the JSON API), and performance. Consequently,
   *     the headers may be different on requests using different features.
   *     Likewise, the headers may change from one version of the library to the
   *     next, as we find more (or different) opportunities for optimization.
   */
  HeadersMap const& headers() const { return headers_; }

  /// The returned payload as a raw string, for debugging only.
  std::string const& payload() const { return payload_; }
  //@}

  /**
   * Returns the resumable upload session id for this upload.
   *
   * Note that this is an empty string for uploads that do not use resumable
   * upload session ids. `Client::WriteObject()` enables resumable uploads based
   * on the options set by the application.
   */
  std::string const& resumable_session_id() const {
    return buf_->resumable_session_id();
  }

  /**
   * Returns the next expected byte.
   *
   * For non-resumable uploads this is always zero. Applications that use
   * resumable uploads can use this value to resend any data not committed to
   * GCS.
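   *
   * As a sketch of that use, assuming the application still holds the full
   * payload in memory. `RemainingPayload()` is a hypothetical helper, not
   * part of this library:
   *
```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: returns the part of `payload` that starts at
// `next_expected_byte`, that is, the bytes not yet committed by the service.
// Returns an empty string when the offset is at or past the end.
inline std::string RemainingPayload(std::string const& payload,
                                    std::uint64_t next_expected_byte) {
  if (next_expected_byte >= payload.size()) return std::string{};
  return payload.substr(
      static_cast<std::string::size_type>(next_expected_byte));
}
```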
   */
  std::uint64_t next_expected_byte() const {
    return buf_->next_expected_byte();
  }

  /**
   * Suspends an upload.
   *
   * This is a destructive operation. Using this object after calling this
   * function results in undefined behavior. Applications should copy any
   * necessary state (such as the value of `resumable_session_id()`) before
   * calling this function.
   *
   * @snippet storage_object_resumable_write_samples.cc suspend resumable upload
   */
  void Suspend() &&;

  /**
   * Returns the status of partial errors.
   *
   * Applications may write multiple times before closing the stream; this
   * function reports the status of the upload before the stream is closed.
   *
   * This function differs from `metadata()` because calling `metadata()`
   * before `Close()` is undefined behavior.
   */
  Status last_status() const { return buf_->last_status(); }

 private:
  /**
   * Closes the underlying object write stream.
   */
  void CloseBuf();

  std::unique_ptr<internal::ObjectWriteStreambuf> buf_;
  StatusOr<ObjectMetadata> metadata_;
  std::multimap<std::string, std::string> headers_;
  std::string payload_;
};

}  // namespace storage
}  // namespace cloud
}  // namespace google

#endif  // GOOGLE_CLOUD_CPP_GOOGLE_CLOUD_STORAGE_OBJECT_WRITE_STREAM_H