public class PercentEscaper extends UnicodeEscaper
A UnicodeEscaper that escapes some set of Java characters using the URI percent encoding
scheme. The set of safe characters (those which remain unescaped) is specified on
construction.
For details on escaping URIs for use in web pages, see RFC 3986, section 2.4 and appendix A.
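When encoding a String, the following rules apply:
- The alphanumeric characters "a" through "z", "A" through "Z", and "0" through "9" remain the same.
- Any additionally specified safe characters remain the same.
- If plusForSpace is true, the space character " " is converted into a plus sign "+".
- All other characters are converted into one or more bytes using UTF-8 encoding, and each byte is written as "%XY", where "XY" is its two-digit, uppercase, hexadecimal value.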
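A minimal sketch of these rules may help make them concrete. It assumes that ASCII alphanumerics and the caller-supplied safe characters pass through unchanged, and that everything else is UTF-8 encoded and written as uppercase "%XY" triplets. The class name PercentEncodingRulesSketch and its encode method are hypothetical; this is not the library's implementation, and unlike the escaper it does not reject unpaired surrogates.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-alone sketch; it is not the library's PercentEscaper.
final class PercentEncodingRulesSketch {
  private static final char[] UPPER_HEX = "0123456789ABCDEF".toCharArray();

  // Encodes input, leaving alphanumerics and safeChars untouched.
  static String encode(String input, String safeChars, boolean plusForSpace) {
    StringBuilder out = new StringBuilder(input.length());
    for (int i = 0; i < input.length(); ) {
      int cp = input.codePointAt(i);
      i += Character.charCount(cp);
      if (isAsciiAlphanumeric(cp) || safeChars.indexOf(cp) >= 0) {
        // Alphanumerics and additional safe characters remain the same.
        out.appendCodePoint(cp);
      } else if (plusForSpace && cp == ' ') {
        // Optional form-style handling of the space character.
        out.append('+');
      } else {
        // Everything else: UTF-8 bytes, each written as %XY in uppercase hex.
        byte[] utf8 = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
        for (byte b : utf8) {
          out.append('%').append(UPPER_HEX[(b >> 4) & 0xF]).append(UPPER_HEX[b & 0xF]);
        }
      }
    }
    return out.toString();
  }

  private static boolean isAsciiAlphanumeric(int cp) {
    return (cp >= '0' && cp <= '9') || (cp >= 'A' && cp <= 'Z') || (cp >= 'a' && cp <= 'z');
  }

  public static void main(String[] args) {
    System.out.println(encode("a b/\u00e9", "-_.~", false)); // a%20b%2F%C3%A9
    System.out.println(encode("a b", "-_.~", true)); // a+b
  }
}
```

The only difference between the two calls in main is the treatment of the space character: %20 without plusForSpace, "+" with it.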
RFC 3986 defines the set of unreserved characters as the ASCII letters and digits plus "-", "_", "~", and ".". It goes on to state:
URIs that differ in the replacement of an unreserved character with
its corresponding percent-encoded US-ASCII octet are equivalent: they
identify the same resource. However, URI comparison implementations
do not always perform normalization prior to comparison (see Section
6). For consistency, percent-encoded octets in the ranges of ALPHA
(%41-%5A and %61-%7A), DIGIT (%30-%39), hyphen (%2D), period (%2E),
underscore (%5F), or tilde (%7E) should not be created by URI
producers and, when found in a URI, should be decoded to their
corresponding unreserved characters by URI normalizers.
Note: This escaper produces uppercase hexadecimal sequences. From RFC 3986:
"URI producers and normalizers should use uppercase hexadecimal digits for all
percent-encodings."
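The normalization described in the quoted RFC text can be sketched as follows: percent-encoded octets that correspond to unreserved characters are decoded, and the hexadecimal digits of all remaining escapes are uppercased. The class UnreservedNormalizerSketch and its methods are hypothetical names used for illustration; PercentEscaper itself only produces escapes and does not offer this operation.

```java
// Hypothetical sketch of the normalization RFC 3986 describes; not part of PercentEscaper.
final class UnreservedNormalizerSketch {
  private static final char[] UPPER_HEX = "0123456789ABCDEF".toCharArray();

  // ALPHA / DIGIT / "-" / "." / "_" / "~"
  static boolean isUnreserved(int c) {
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9')
        || c == '-' || c == '.' || c == '_' || c == '~';
  }

  // Returns the value of an ASCII hex digit, or -1 if c is not one.
  static int hexValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
  }

  static String normalize(String uri) {
    StringBuilder out = new StringBuilder(uri.length());
    int i = 0;
    while (i < uri.length()) {
      char c = uri.charAt(i);
      if (c == '%' && i + 2 < uri.length()) {
        int hi = hexValue(uri.charAt(i + 1));
        int lo = hexValue(uri.charAt(i + 2));
        if (hi >= 0 && lo >= 0) {
          int octet = (hi << 4) | lo;
          if (isUnreserved(octet)) {
            out.append((char) octet); // decode to the unreserved character
          } else {
            out.append('%').append(UPPER_HEX[hi]).append(UPPER_HEX[lo]); // uppercase the hex
          }
          i += 3;
          continue;
        }
      }
      out.append(c); // literal character or malformed escape: copy unchanged
      i++;
    }
    return out.toString();
  }

  public static void main(String[] args) {
    System.out.println(normalize("%7Euser/%41bc%2fd")); // ~user/Abc%2Fd
  }
}
```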
Modifier and Type | Field and Description |
---|---|
static String | SAFE_PLUS_RESERVED_CHARS_URLENCODER: Contains the safe characters plus all reserved characters. |
static String | SAFECHARS_URLENCODER: A string of safe characters that mimics the behavior of URLEncoder. |
static String | SAFEPATHCHARS_URLENCODER: A string of characters that do not need to be encoded when used in URI path segments, as specified in RFC 3986. |
static String | SAFEQUERYSTRINGCHARS_URLENCODER: A string of characters that do not need to be encoded when used in URI query strings, as specified in RFC 3986. |
static String | SAFEUSERINFOCHARS_URLENCODER: A string of characters that do not need to be encoded when used in the URI user info part, as specified in RFC 3986. |
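To illustrate the JDK behavior that SAFECHARS_URLENCODER is described as mimicking, the following sketch calls java.net.URLEncoder directly. It does not use PercentEscaper or quote the constant's value; the class name UrlEncoderBaselineSketch and the helper formEncode are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical sketch showing which characters java.net.URLEncoder leaves unescaped.
final class UrlEncoderBaselineSketch {
  static String formEncode(String s) {
    try {
      return URLEncoder.encode(s, "UTF-8");
    } catch (UnsupportedEncodingException e) {
      // UTF-8 support is required of every Java platform implementation.
      throw new AssertionError("UTF-8 is always supported", e);
    }
  }

  public static void main(String[] args) {
    System.out.println(formEncode("-_.*")); // -_.* (left unescaped by URLEncoder)
    System.out.println(formEncode("a b~c/d")); // a+b%7Ec%2Fd
  }
}
```

Note that URLEncoder turns the space into "+", which matches an escaper constructed with plusForSpace set to true rather than the single-argument constructor.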
Constructor and Description |
---|
PercentEscaper(String safeChars): Constructs a URI escaper with the specified safe characters. |
PercentEscaper(String safeChars, boolean plusForSpace): Deprecated. Use PercentEscaper(String safeChars) instead, which is the same as invoking this constructor with plusForSpace set to false. Escaping spaces as plus signs does not conform to the URI specification. |
Modifier and Type | Method and Description |
---|---|
protected char[] | escape(int cp): Escapes the given Unicode code point in UTF-8. |
String | escape(String s): Returns the escaped form of a given literal string. |
protected int | nextEscapeIndex(CharSequence csq, int index, int end): Scans a sub-sequence of characters from a given CharSequence, returning the index of the next character that requires escaping. |
Methods inherited from class UnicodeEscaper: codePointAt, escapeSlow
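public static final String SAFECHARS_URLENCODER
A string of safe characters that mimics the behavior of URLEncoder.
public static final String SAFEPATHCHARS_URLENCODER
A string of characters that do not need to be encoded when used in URI path segments, as specified in RFC 3986.
public static final String SAFE_PLUS_RESERVED_CHARS_URLENCODER
Contains the safe characters plus all reserved characters.
public static final String SAFEUSERINFOCHARS_URLENCODER
A string of characters that do not need to be encoded when used in the URI user info part, as specified in RFC 3986.
public static final String SAFEQUERYSTRINGCHARS_URLENCODER
A string of characters that do not need to be encoded when used in URI query strings, as specified in RFC 3986.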
public PercentEscaper(String safeChars)
Constructs a URI escaper with the specified safe characters.
Parameters:
safeChars - a non-null string specifying additional safe characters for this escaper (the ranges 0..9, a..z and A..Z are always safe and should not be specified here)
Throws:
IllegalArgumentException - if any of the parameters are invalid
@Deprecated public PercentEscaper(String safeChars, boolean plusForSpace)
Deprecated. Use PercentEscaper(String safeChars) instead, which is the same as invoking this constructor with plusForSpace set to false. Escaping spaces as plus signs does not conform to the URI specification.
Constructs a URI escaper with the specified safe characters and optional handling of the space character, which can be escaped as + instead of %20.
Parameters:
safeChars - a non-null string specifying additional safe characters for this escaper. The ranges 0..9, a..z and A..Z are always safe and should not be specified here.
plusForSpace - true if ASCII space should be escaped to + rather than %20
Throws:
IllegalArgumentException - if safeChars includes characters that are always safe or characters that must always be escaped
protected int nextEscapeIndex(CharSequence csq, int index, int end)
Description copied from class: UnicodeEscaper
Scans a sub-sequence of characters from a given CharSequence, returning the index of
the next character that requires escaping.
Note: When implementing an escaper, it is a good idea to override this method for
efficiency. The base class implementation determines successive Unicode code points and invokes
UnicodeEscaper.escape(int) for each of them. If the semantics of your escaper are such that code
points in the supplementary range are either all escaped or all unescaped, this method can be
implemented more efficiently using CharSequence.charAt(int).
Note however that if your escaper does not escape characters in the supplementary range, you should either continue to validate the correctness of any surrogate characters encountered or provide a clear warning to users that your escaper does not validate its input.
See PercentEscaper for an example.
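As an illustration of the charAt-based approach the note describes, here is a minimal sketch. It assumes every char outside a small ASCII lookup table needs escaping, so supplementary code points are always escaped and no code point decoding is needed during the scan. The class NextEscapeIndexSketch, the SAFE table, and the choice of "-_.~" as extra safe characters are hypothetical and not taken from the library source.

```java
// Hypothetical sketch of a charAt-based scan; names are illustrative, not the library's.
final class NextEscapeIndexSketch {
  // Lookup table indexed by char value; true means the char is left unescaped.
  private static final boolean[] SAFE = buildSafeTable("-_.~");

  private static boolean[] buildSafeTable(String extraSafeChars) {
    boolean[] safe = new boolean[128];
    for (char c = '0'; c <= '9'; c++) safe[c] = true;
    for (char c = 'A'; c <= 'Z'; c++) safe[c] = true;
    for (char c = 'a'; c <= 'z'; c++) safe[c] = true;
    // Assumption: every extra safe character is ASCII (below 128).
    for (char c : extraSafeChars.toCharArray()) safe[c] = true;
    return safe;
  }

  // Returns the index of the first char in [index, end) that needs escaping, or end if none.
  static int nextEscapeIndex(CharSequence csq, int index, int end) {
    for (; index < end; index++) {
      char c = csq.charAt(index);
      if (c >= SAFE.length || !SAFE[c]) {
        break; // non-ASCII chars, including surrogates, always need escaping here
      }
    }
    return index;
  }

  public static void main(String[] args) {
    System.out.println(nextEscapeIndex("abc def", 0, 7)); // 3
    System.out.println(nextEscapeIndex("abc", 0, 3)); // 3
  }
}
```

Because surrogates are never treated as safe in this sketch, validating them can be left to the code that performs the actual escaping.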
Overrides:
nextEscapeIndex in class UnicodeEscaper
Parameters:
csq - a sequence of characters
index - the index of the first character to be scanned
end - the index immediately after the last character to be scanned
public String escape(String s)
Description copied from class: UnicodeEscaper
Returns the escaped form of a given literal string.
If you are escaping input in arbitrary successive chunks, then it is not generally safe to
use this method. If an input string ends with an unmatched high surrogate character, then this
method will throw IllegalArgumentException. You should ensure your input is valid UTF-16 before calling this method.
Overrides:
escape in class UnicodeEscaper
Parameters:
s - the literal string to be escaped
Returns:
the escaped form of the string
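One way to follow the advice about valid UTF-16 is to scan for unpaired surrogates before calling escape(String). The helper below is a hypothetical sketch and not part of the library; it reports the first offending index so a caller that splits input into chunks can avoid cutting between the two halves of a surrogate pair.

```java
// Hypothetical helper for checking UTF-16 well-formedness before escaping.
final class Utf16ValidationSketch {
  // Returns the index of the first unpaired surrogate in s, or -1 if there is none.
  static int firstUnpairedSurrogate(CharSequence s) {
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      if (Character.isHighSurrogate(c)) {
        if (i + 1 < s.length() && Character.isLowSurrogate(s.charAt(i + 1))) {
          i++; // skip the low half of a valid pair
        } else {
          return i; // high surrogate at end of input or without a low surrogate
        }
      } else if (Character.isLowSurrogate(c)) {
        return i; // low surrogate with no preceding high surrogate
      }
    }
    return -1;
  }

  public static void main(String[] args) {
    System.out.println(firstUnpairedSurrogate("ok")); // -1
    System.out.println(firstUnpairedSurrogate("a\uD83D")); // 1
  }
}
```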
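protected char[] escape(int cp)
Escapes the given Unicode code point in UTF-8.
Specified by:
escape in class UnicodeEscaper
Parameters:
cp - the Unicode code point to escape if necessary
Returns:
the replacement characters, or null if no escaping was needed
Copyright © 2011–2020 Google. All rights reserved.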