[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[cp-patches] Patch: FYI: more 1.5 updates
From: |
Tom Tromey |
Subject: |
[cp-patches] Patch: FYI: more 1.5 updates |
Date: |
16 Sep 2005 13:07:50 -0600 |
I'm checking this in on the trunk.
This adds some 1.5 methods to Character and String, and updates a few
of the 1.5 methods in StringBuffer.
Tom
Index: ChangeLog
from Tom Tromey <address@hidden>
* java/lang/Character.java (MIN_SURROGATE, MAX_SURROGATE): New
constants.
(isHighSurrogate): New method.
(isLowSurrogate): Likewise.
(isSurrogatePair): Likewise.
(toCodePoint): Likewise.
(codePointAt): Likewise.
(codePointBefore): Likewise.
* java/lang/StringBuffer.java (codePointCount): Check bounds.
(codePointAt): Rewrote.
(codePointBefore): Likewise.
* java/lang/String.java (codePointAt): New method.
(codePointBefore): Likewise.
(codePointCount): Likewise.
(contentEquals): New overload.
Index: java/lang/Character.java
===================================================================
RCS file: /cvsroot/classpath/classpath/java/lang/Character.java,v
retrieving revision 1.38
diff -u -r1.38 Character.java
--- java/lang/Character.java 14 Sep 2005 15:14:15 -0000 1.38
+++ java/lang/Character.java 16 Sep 2005 19:11:34 -0000
@@ -1508,6 +1508,20 @@
public static final char MAX_LOW_SURROGATE = '\udfff';
/**
+ * Minimum surrogate code in UTF-16 encoding.
+ *
+ * @since 1.5
+ */
+ public static final char MIN_SURROGATE = MIN_HIGH_SURROGATE;
+
+ /**
+ * Maximum low surrogate code in UTF-16 encoding.
+ *
+ * @since 1.5
+ */
+ public static final char MAX_SURROGATE = MAX_LOW_SURROGATE;
+
+ /**
* Grabs an attribute offset from the Unicode attribute database. The lower
* 5 bits are the character type, the next 2 bits are flags, and the top
* 9 bits are the offset into the attribute tables.
@@ -2414,5 +2428,211 @@
public static boolean isValidCodePoint(int codePoint)
{
return codePoint >= MIN_CODE_POINT && codePoint <= MAX_CODE_POINT;
+ }
+
+ /**
+ * Return true if the given character is a high surrogate.
+ * @param ch the character
+ * @return true if the character is a high surrogate character
+ *
+ * @since 1.5
+ */
+ public static boolean isHighSurrogate(char ch)
+ {
+ return ch >= MIN_HIGH_SURROGATE && ch <= MAX_HIGH_SURROGATE;
+ }
+
+ /**
+ * Return true if the given character is a low surrogate.
+ * @param ch the character
+ * @return true if the character is a low surrogate character
+ *
+ * @since 1.5
+ */
+ public static boolean isLowSurrogate(char ch)
+ {
+ return ch >= MIN_LOW_SURROGATE && ch <= MAX_LOW_SURROGATE;
+ }
+
+ /**
+ * Return true if the given characters compose a surrogate pair.
+ * This is true if the first character is a high surrogate and the
+ * second character is a low surrogate.
+ * @param ch1 the first character
+ * @param ch2 the first character
+ * @return true if the characters compose a surrogate pair
+ *
+ * @since 1.5
+ */
+ public static boolean isSurrogatePair(char ch1, char ch2)
+ {
+ return isHighSurrogate(ch1) && isLowSurrogate(ch2);
+ }
+
+ /**
+ * Given a valid surrogate pair, this returns the corresponding
+ * code point.
+ * @param high the high character of the pair
+ * @param low the low character of the pair
+ * @return the corresponding code point
+ *
+ * @since 1.5
+ */
+ public static int toCodePoint(char high, char low)
+ {
+ return ((high - MIN_HIGH_SURROGATE) << 10) + (low - MIN_LOW_SURROGATE);
+ }
+
+ /**
+ * Get the code point at the specified index in the CharSequence.
+ * This is like CharSequence#charAt(int), but if the character is
+ * the start of a surrogate pair, and there is a following
+ * character, and this character completes the pair, then the
+ * corresponding supplementary code point is returned. Otherwise,
+ * the character at the index is returned.
+ *
+ * @param sequence the CharSequence
+ * @param index the index of the codepoint to get, starting at 0
+ * @return the codepoint at the specified index
+ * @throws IndexOutOfBoundsException if index is negative or >= length()
+ * @since 1.5
+ */
+ public static int codePointAt(CharSequence sequence, int index)
+ {
+ int len = sequence.length();
+ if (index < 0 || index >= len)
+ throw new IndexOutOfBoundsException();
+ char high = sequence.charAt(index);
+ if (! isHighSurrogate(high) || ++index >= len)
+ return high;
+ char low = sequence.charAt(index);
+ if (! isLowSurrogate(low))
+ return high;
+ return toCodePoint(high, low);
+ }
+
+ /**
+ * Get the code point at the specified index in the CharSequence.
+ * If the character is the start of a surrogate pair, and there is a
+ * following character, and this character completes the pair, then
+ * the corresponding supplementary code point is returned.
+ * Otherwise, the character at the index is returned.
+ *
+ * @param chars the character array in which to look
+ * @param index the index of the codepoint to get, starting at 0
+ * @return the codepoint at the specified index
+ * @throws IndexOutOfBoundsException if index is negative or >= length()
+ * @since 1.5
+ */
+ public static int codePointAt(char[] chars, int index)
+ {
+ return codePointAt(chars, index, chars.length);
+ }
+
+ /**
+ * Get the code point at the specified index in the CharSequence.
+ * If the character is the start of a surrogate pair, and there is a
+ * following character within the specified range, and this
+ * character completes the pair, then the corresponding
+ * supplementary code point is returned. Otherwise, the character
+ * at the index is returned.
+ *
+ * @param chars the character array in which to look
+ * @param index the index of the codepoint to get, starting at 0
+ * @param limit the limit past which characters should not be examined
+ * @return the codepoint at the specified index
+ * @throws IndexOutOfBoundsException if index is negative or >=
+ * limit, or if limit is negative or >= the length of the array
+ * @since 1.5
+ */
+ public static int codePointAt(char[] chars, int index, int limit)
+ {
+ if (index < 0 || index >= limit || limit < 0 || limit >= chars.length)
+ throw new IndexOutOfBoundsException();
+ char high = chars[index];
+ if (! isHighSurrogate(high) || ++index >= limit)
+ return high;
+ char low = chars[index];
+ if (! isLowSurrogate(low))
+ return high;
+ return toCodePoint(high, low);
+ }
+
+ /**
+ * Get the code point before the specified index. This is like
+ * #codePointAt(char[], int), but checks the characters at
+ * <code>index-1</code> and <code>index-2</code> to see if they form
+ * a supplementary code point. If they do not, the character at
+ * <code>index-1</code> is returned.
+ *
+ * @param chars the character array
+ * @param index the index just past the codepoint to get, starting at 0
+ * @return the codepoint at the specified index
+ * @throws IndexOutOfBoundsException if index is negative or >= length()
+ * @since 1.5
+ */
+ public static int codePointBefore(char[] chars, int index)
+ {
+ return codePointBefore(chars, index, 1);
+ }
+
+ /**
+ * Get the code point before the specified index. This is like
+ * #codePointAt(char[], int), but checks the characters at
+ * <code>index-1</code> and <code>index-2</code> to see if they form
+ * a supplementary code point. If they do not, the character at
+ * <code>index-1</code> is returned. The start parameter is used to
+ * limit the range of the array which may be examined.
+ *
+ * @param chars the character array
+ * @param index the index just past the codepoint to get, starting at 0
+ * @param start the index before which characters should not be examined
+ * @return the codepoint at the specified index
+ * @throws IndexOutOfBoundsException if index is > start or >
+ * the length of the array, or if limit is negative or >= the
+ * length of the array
+ * @since 1.5
+ */
+ public static int codePointBefore(char[] chars, int index, int start)
+ {
+ if (index < start || index > chars.length
+ || start < 0 || start >= chars.length)
+ throw new IndexOutOfBoundsException();
+ --index;
+ char low = chars[index];
+ if (! isLowSurrogate(low) || --index < start)
+ return low;
+ char high = chars[index];
+ if (! isHighSurrogate(high))
+ return low;
+ return toCodePoint(high, low);
+ }
+
+ /**
+ * Get the code point before the specified index. This is like
+ * #codePointAt(CharSequence, int), but checks the characters at
+ * <code>index-1</code> and <code>index-2</code> to see if they form
+ * a supplementary code point. If they do not, the character at
+ * <code>index-1</code> is returned.
+ *
+ * @param sequence the CharSequence
+ * @param index the index just past the codepoint to get, starting at 0
+ * @return the codepoint at the specified index
+ * @throws IndexOutOfBoundsException if index is negative or >= length()
+ * @since 1.5
+ */
+ public static int codePointBefore(CharSequence sequence, int index)
+ {
+ int len = sequence.length();
+ if (index < 1 || index > len)
+ throw new IndexOutOfBoundsException();
+ --index;
+ char low = sequence.charAt(index);
+ if (! isLowSurrogate(low) || --index < 0)
+ return low;
+ char high = sequence.charAt(index);
+ if (! isHighSurrogate(high))
+ return low;
+ return toCodePoint(high, low);
}
} // class Character
Index: java/lang/String.java
===================================================================
RCS file: /cvsroot/classpath/classpath/java/lang/String.java,v
retrieving revision 1.71
diff -u -r1.71 String.java
--- java/lang/String.java 16 Sep 2005 15:59:02 -0000 1.71
+++ java/lang/String.java 16 Sep 2005 19:11:34 -0000
@@ -554,6 +554,40 @@
}
/**
+ * Get the code point at the specified index. This is like #charAt(int),
+ * but if the character is the start of a surrogate pair, and the
+ * following character completes the pair, then the corresponding
+ * supplementary code point is returned.
+ * @param index the index of the codepoint to get, starting at 0
+ * @return the codepoint at the specified index
+ * @throws IndexOutOfBoundsException if index is negative or >= length()
+ * @since 1.5
+ */
+ public synchronized int codePointAt(int index)
+ {
+ // Use the CharSequence overload as we get better range checking
+ // this way.
+ return Character.codePointAt(this, index);
+ }
+
+ /**
+ * Get the code point before the specified index. This is like
+ * #codePointAt(int), but checks the characters at <code>index-1</code> and
+ * <code>index-2</code> to see if they form a supplementary code point.
+ * @param index the index just past the codepoint to get, starting at 0
+ * @return the codepoint at the specified index
+ * @throws IndexOutOfBoundsException if index is negative or >= length()
+ * (while unspecified, this is a StringIndexOutOfBoundsException)
+ * @since 1.5
+ */
+ public synchronized int codePointBefore(int index)
+ {
+ // Use the CharSequence overload as we get better range checking
+ // this way.
+ return Character.codePointBefore(this, index);
+ }
+
+ /**
* Copies characters from this String starting at a specified start index,
* ending at a specified stop index, to a character array starting at
* a specified destination begin index.
@@ -731,6 +765,26 @@
}
/**
+ * Compares the given CharSequence to this String. This is true if
+ * the CharSequence has the same content as this String at this
+ * moment.
+ *
+ * @param seq the CharSequence to compare to
+ * @return true if CharSequence has the same character sequence
+ * @throws NullPointerException if the given CharSequence is null
+ * @since 1.5
+ */
+ public boolean contentEquals(CharSequence seq)
+ {
+ if (seq.length() != count)
+ return false;
+ for (int i = 0; i < count; ++i)
+ if (value[offset + i] != seq.charAt(i))
+ return false;
+ return true;
+ }
+
+ /**
* Compares a String to this String, ignoring case. This does not handle
* multi-character capitalization exceptions; instead the comparison is
* made on a character-by-character basis, and is true if:<br><ul>
@@ -1679,6 +1733,49 @@
public String intern()
{
return VMString.intern(this);
+ }
+
+ /**
+ * Return the number of code points between two indices in the
+ * <code>StringBuffer</code>. An unpaired surrogate counts as a
+ * code point for this purpose. Characters outside the indicated
+ * range are not examined, even if the range ends in the middle of a
+ * surrogate pair.
+ *
+ * @param start the starting index
+ * @param end one past the ending index
+ * @return the number of code points
+ * @since 1.5
+ */
+ public synchronized int codePointCount(int start, int end)
+ {
+ if (start < 0 || end >= count || start > end)
+ throw new StringIndexOutOfBoundsException();
+
+ start += offset;
+ end += offset;
+ int count = 0;
+ while (start < end)
+ {
+ char base = value[start];
+ if (base < Character.MIN_HIGH_SURROGATE
+ || base > Character.MAX_HIGH_SURROGATE
+ || start == end
+ || start == count
+ || value[start + 1] < Character.MIN_LOW_SURROGATE
+ || value[start + 1] > Character.MAX_LOW_SURROGATE)
+ {
+ // Nothing.
+ }
+ else
+ {
+ // Surrogate pair.
+ ++start;
+ }
+ ++start;
+ ++count;
+ }
+ return count;
}
/**
Index: java/lang/StringBuffer.java
===================================================================
RCS file: /cvsroot/classpath/classpath/java/lang/StringBuffer.java,v
retrieving revision 1.32
diff -u -r1.32 StringBuffer.java
--- java/lang/StringBuffer.java 14 Sep 2005 15:20:42 -0000 1.32
+++ java/lang/StringBuffer.java 16 Sep 2005 19:11:37 -0000
@@ -252,7 +252,6 @@
* @param index the index of the character to get, starting at 0
* @return the character at the specified index
* @throws IndexOutOfBoundsException if index is negative or >= length()
- * (while unspecified, this is a StringIndexOutOfBoundsException)
*/
public synchronized char charAt(int index)
{
@@ -269,22 +268,11 @@
* @param index the index of the codepoint to get, starting at 0
* @return the codepoint at the specified index
* @throws IndexOutOfBoundsException if index is negative or >= length()
- * (while unspecified, this is a StringIndexOutOfBoundsException)
* @since 1.5
*/
public synchronized int codePointAt(int index)
{
- if (index < 0 || index >= count)
- throw new StringIndexOutOfBoundsException(index);
- char base = value[index];
- if (base < Character.MIN_HIGH_SURROGATE
- || base > Character.MAX_HIGH_SURROGATE
- || index == count
- || value[index + 1] < Character.MIN_LOW_SURROGATE
- || value[index + 1] > Character.MAX_LOW_SURROGATE)
- return base;
- return (((base - Character.MIN_HIGH_SURROGATE) << 10)
- + (value[index + 1] - Character.MIN_LOW_SURROGATE));
+ return Character.codePointAt(value, index, count);
}
/**
@@ -294,23 +282,15 @@
* @param index the index just past the codepoint to get, starting at 0
* @return the codepoint at the specified index
* @throws IndexOutOfBoundsException if index is negative or >= length()
- * (while unspecified, this is a StringIndexOutOfBoundsException)
* @since 1.5
*/
public synchronized int codePointBefore(int index)
{
- --index;
- if (index < 0 || index >= count)
- throw new StringIndexOutOfBoundsException(index);
- char base = value[index];
- if (base < Character.MIN_LOW_SURROGATE
- || base > Character.MAX_LOW_SURROGATE
- || index == 0
- || value[index - 1] < Character.MIN_HIGH_SURROGATE
- || value[index - 1] > Character.MAX_HIGH_SURROGATE)
- return base;
- return (((value[index - 1] - Character.MIN_HIGH_SURROGATE) << 10)
- + (base - Character.MIN_LOW_SURROGATE));
+ // Character.codePointBefore() doesn't perform this check. We
+ // could use the CharSequence overload, but this is just as easy.
+ if (index >= count)
+ throw new IndexOutOfBoundsException();
+ return Character.codePointBefore(value, index, 1);
}
/**
@@ -1093,6 +1073,9 @@
*/
public synchronized int codePointCount(int start, int end)
{
+ if (start < 0 || end >= count || start > end)
+ throw new StringIndexOutOfBoundsException();
+
int count = 0;
while (start < end)
{
[Prev in Thread] |
Current Thread |
[Next in Thread] |
- [cp-patches] Patch: FYI: more 1.5 updates,
Tom Tromey <=