public abstract class CSet extends Object implements Comparable<CSet>
Characters are Java characters, i.e. 16-bit unsigned integers. Character sets are mostly encoded as an ordered list of character intervals, with some special values for degenerate cases, and are thus optimized for space rather than for lookup speed.
The special value 0xFFFF is reserved to denote end of file. It is not a valid Unicode character anyway.
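To make the interval encoding concrete, here is a minimal sketch, not this class's actual implementation. A hypothetical `IntervalSetSketch` stores sorted, disjoint, inclusive intervals in a flat `char[]` and answers membership by binary search. The class name, the layout and the lookup strategy are all assumptions made for illustration; only the 0xFFFF end-of-file convention comes from the description above.

```java
/** Hypothetical model of an interval-encoded character set; not the real CSet. */
final class IntervalSetSketch {
    /** End-of-file marker, as reserved in the description above. */
    static final char EOF = '\uFFFF';

    // Flattened bounds: [first0, last0, first1, last1, ...], sorted, disjoint, inclusive.
    private final char[] bounds;

    IntervalSetSketch(char... bounds) {
        if (bounds.length % 2 != 0)
            throw new IllegalArgumentException("bounds must come in pairs");
        this.bounds = bounds.clone();
    }

    /** Binary search over the intervals (an assumed lookup strategy). */
    boolean contains(char ch) {
        int lo = 0, hi = bounds.length / 2 - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (ch < bounds[2 * mid]) hi = mid - 1;
            else if (ch > bounds[2 * mid + 1]) lo = mid + 1;
            else return true;
        }
        return false;
    }

    /** Number of characters in the set, summed interval by interval. */
    int cardinal() {
        int n = 0;
        for (int i = 0; i < bounds.length; i += 2)
            n += bounds[i + 1] - bounds[i] + 1;
        return n;
    }

    public static void main(String[] args) {
        // The set [0-9a-z] needs four chars of storage for 36 members.
        IntervalSetSketch alnum = new IntervalSetSketch('0', '9', 'a', 'z');
        System.out.println(alnum.contains('k'));   // true
        System.out.println(alnum.contains('A'));   // false
        System.out.println(alnum.cardinal());      // 36
    }
}
```

The storage cost depends on the number of intervals rather than on the number of members, which is the space trade-off the description refers to.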
Instances of this class are immutable.
They implement the Comparable interface via a total ordering based on the lexicographic comparison of the character set, with the empty character set being a conventional minimum. It is not a particularly meaningful total order but it is used to guarantee a deterministic traversal order in CSet-indexed maps. A more natural, yet partial, order based on character set inclusion can be retrieved via the method included(CSet, CSet).
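The difference between the two orders can be illustrated with a small sketch. It models character sets as `TreeSet<Character>` and assumes that "lexicographic comparison" means comparing the sorted members pairwise, with a strict prefix ranking first; the comparison actually used by `compareTo` may differ. `LEXICOGRAPHIC`, `included` and `of` are hypothetical helpers, not part of this class.

```java
import java.util.Comparator;
import java.util.Iterator;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical model contrasting a lexicographic total order with inclusion. */
final class CSetOrderSketch {
    /** Assumed total order: compare sorted members pairwise; a strict prefix comes first. */
    static final Comparator<TreeSet<Character>> LEXICOGRAPHIC = (a, b) -> {
        Iterator<Character> ia = a.iterator(), ib = b.iterator();
        while (ia.hasNext() && ib.hasNext()) {
            int c = Character.compare(ia.next(), ib.next());
            if (c != 0) return c;
        }
        // The empty set is a prefix of every set, hence the minimum.
        return Boolean.compare(ia.hasNext(), ib.hasNext());
    };

    /** Partial order: a is included in b when every member of a is in b. */
    static boolean included(TreeSet<Character> a, TreeSet<Character> b) {
        return b.containsAll(a);
    }

    static TreeSet<Character> of(String members) {
        TreeSet<Character> s = new TreeSet<>();
        for (char c : members.toCharArray()) s.add(c);
        return s;
    }

    public static void main(String[] args) {
        TreeSet<Character> ab = of("ab"), bc = of("bc");
        // Neither set includes the other, yet the total order still ranks them.
        System.out.println(included(ab, bc) || included(bc, ab)); // false
        System.out.println(LEXICOGRAPHIC.compare(ab, bc) < 0);    // true

        // A map keyed with the total order iterates in the same order on every run.
        TreeMap<TreeSet<Character>, String> map = new TreeMap<>(LEXICOGRAPHIC);
        map.put(bc, "bc");
        map.put(of(""), "empty");
        map.put(ab, "ab");
        System.out.println(map.values()); // [empty, ab, bc]
    }
}
```

Inclusion leaves the sets {a, b} and {b, c} incomparable, so it cannot drive a sorted map on its own; the total order can, whatever the insertion order.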
Modifier and Type | Class and Description |
---|---|
static class | CSet.Gen - Generates random character sets based on a probability configuration and a random number generator |
Modifier and Type | Field and Description |
---|---|
static CSet | ALL - The set of all possible characters, including the special end-of-file marker. |
static CSet | ALL_BUT_EOF - Like ALL but does not contain the special end-of-input marker |
static CSet | EMPTY - The empty character set |
static CSet | EOF - Special character set used to denote end-of-file |
Constructor and Description |
---|
CSet() |
Modifier and Type | Method and Description |
---|---|
abstract int | cardinal() |
static CSet | chars(char... chars) |
static String | charToString(char ch, boolean javaSafe) |
int | compareTo(@Nullable CSet cs) |
static CSet | complement(CSet cs) |
abstract boolean | contains(char ch) |
static CSet | diff(CSet cs1, CSet cs2) |
boolean | equals(@Nullable Object o) |
static boolean | equivalent(CSet cs1, CSet cs2) |
void | forEach(Consumer<? super Character> f) - Applies the function f to all characters in the character set |
void | forEachInterval(BiConsumer<? super Character,? super Character> f) - Applies the function f to all intervals in the character set. |
static CSet.Gen | generator() |
abstract int | hashCode() |
static boolean | included(CSet cs1, CSet cs2) |
static CSet | inter(CSet cs1, CSet cs2) - Returns the intersection of the two given character sets |
static CSet | interval(char first, char last) |
abstract boolean | isEmpty() |
static CSet | singleton(char c) |
abstract String | toString() |
static CSet | union(CSet... charSets) |
static CSet | union(CSet cs1, CSet cs2) - Returns the union of the two given character sets |
static Iterable<Character> | witnesses(CSet cset) |
public static final CSet ALL
Warning: also contains character values which do not correspond to a valid Unicode character.
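As a quick illustration of this warning, the standard `java.lang.Character` predicates can tell which `char` values do not stand for a Unicode character on their own. The helper name `isStandaloneUnicodeChar` is hypothetical, and treating unassigned code points and lone surrogates as "not valid" is an assumption about what the warning covers.

```java
/** Hypothetical helper illustrating char values that are not valid Unicode characters. */
final class CharValiditySketch {
    /** True if ch is an assigned code point and not half of a surrogate pair. */
    static boolean isStandaloneUnicodeChar(char ch) {
        return Character.isDefined(ch) && !Character.isSurrogate(ch);
    }

    public static void main(String[] args) {
        System.out.println(isStandaloneUnicodeChar('a'));       // true
        System.out.println(isStandaloneUnicodeChar('\uD800'));  // false: high surrogate
        System.out.println(isStandaloneUnicodeChar('\uFFFF'));  // false: noncharacter
    }
}
```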
public static final CSet EMPTY
public static final CSet EOF
public abstract boolean isEmpty()
Returns: true if this character set is empty
public abstract boolean contains(char ch)
Parameters: ch
Returns: whether this character set contains ch
public abstract int cardinal()
public final int compareTo(@Nullable CSet cs)
Specified by: compareTo in interface Comparable<CSet>
public static String charToString(char ch, boolean javaSafe)
Parameters:
ch
javaSafe - should be true if the result string may end up in Java sources
Returns: a string representation of ch, either using the character itself if appropriate, or the '\uxxxx' form. If javaSafe is true, the '\uxxxx' form is not used because it would be unescaped by a Java compiler, and the prefix '0x' is used instead
public static CSet singleton(char c)
Parameters: c
Returns: the character set containing only c
public static CSet interval(char first, char last)
Parameters: first, last
Returns: the character set containing the characters from first to last, inclusive
public void forEach(Consumer<? super Character> f)
Applies the function f to all characters in the character set
Parameters: f
public void forEachInterval(BiConsumer<? super Character,? super Character> f)
Applies the function f to all intervals in the character set. The intervals are guaranteed to be normalized in the following fashion:
Parameters: f
public static CSet union(CSet cs1, CSet cs2)
Returns the union of the two given character sets
public static CSet union(CSet... charSets)
Parameters: charSets
public static CSet chars(char... chars)
Parameters: chars
public static CSet inter(CSet cs1, CSet cs2)
Returns the intersection of the two given character sets
public static CSet diff(CSet cs1, CSet cs2)
Parameters: cs1, cs2
Returns: the set of characters in cs1 except those that are in cs2
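The union, intersection and difference operations above behave like ordinary set algebra. As a rough sketch under assumptions, the following models character sets with `java.util.BitSet` (one bit per `char` value) rather than with the interval encoding described for this class. The helpers `of`, `union`, `inter` and `diff` only mirror the names of the static methods; they say nothing about how those methods are implemented.

```java
import java.util.BitSet;

/** Hypothetical BitSet-backed model of union, inter and diff; not the real CSet. */
final class CSetAlgebraSketch {
    static BitSet of(String members) {
        BitSet s = new BitSet();
        for (char c : members.toCharArray()) s.set(c);
        return s;
    }

    // Each operation works on a copy of its first argument, so inputs are never mutated.
    static BitSet union(BitSet a, BitSet b) {
        BitSet r = (BitSet) a.clone();
        r.or(b);
        return r;
    }

    static BitSet inter(BitSet a, BitSet b) {
        BitSet r = (BitSet) a.clone();
        r.and(b);
        return r;
    }

    static BitSet diff(BitSet a, BitSet b) {
        BitSet r = (BitSet) a.clone();
        r.andNot(b);
        return r;
    }

    public static void main(String[] args) {
        BitSet abc = of("abc"), bcd = of("bcd");
        System.out.println(union(abc, bcd).cardinality()); // 4
        System.out.println(inter(abc, bcd).cardinality()); // 2
        System.out.println(diff(abc, bcd).get('a'));       // true
    }
}
```

Returning a fresh set from every operation mirrors the immutability stated for instances of this class.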
public static CSet complement(CSet cs)
Parameters: cs
Returns: the complement of cs, excluding the end-of-input marker
public static boolean equivalent(CSet cs1, CSet cs2)
Parameters: cs1, cs2
Returns: true if and only if the two given character sets are equivalent
public static boolean included(CSet cs1, CSet cs2)
Parameters: cs1, cs2
Returns: true if and only if the character set cs1 is included in cs2
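One way to picture the inclusion and equivalence predicates: inclusion can be checked on sorted interval lists in a single forward pass, and equivalence can be read as inclusion in both directions. The sketch below assumes each set is an array of sorted, disjoint, non-adjacent inclusive `{first, last}` pairs. This layout and the helper names are assumptions for illustration, not the representation or algorithm used by this class.

```java
/** Hypothetical inclusion test over interval-encoded sets; not the real CSet. */
final class InclusionSketch {
    /**
     * Each set is an array of {first, last} pairs: sorted, disjoint and
     * non-adjacent, so an interval of a must fit inside one interval of b.
     */
    static boolean included(char[][] a, char[][] b) {
        int j = 0;
        for (char[] ia : a) {
            // Skip intervals of b that end before ia starts.
            while (j < b.length && b[j][1] < ia[0]) j++;
            if (j == b.length || b[j][0] > ia[0] || b[j][1] < ia[1]) return false;
        }
        return true;
    }

    /** Two sets are equivalent when each is included in the other. */
    static boolean equivalent(char[][] a, char[][] b) {
        return included(a, b) && included(b, a);
    }

    public static void main(String[] args) {
        char[][] digits = { {'0', '9'} };
        char[][] alnum  = { {'0', '9'}, {'a', 'z'} };
        System.out.println(included(digits, alnum));   // true
        System.out.println(included(alnum, digits));   // false
        System.out.println(equivalent(alnum, alnum));  // true
    }
}
```

The non-adjacency assumption matters: if `b` were allowed to split a range into touching pieces such as `{'0','4'}` and `{'5','9'}`, the single-interval check would wrongly reject `{'0','9'}`.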
public static CSet.Gen generator()