[cp-patches] Patch: FYI: more 1.5 updates


From: Tom Tromey
Subject: [cp-patches] Patch: FYI: more 1.5 updates
Date: 16 Sep 2005 13:07:50 -0600

I'm checking this in on the trunk.

This adds some 1.5 methods to Character and String, and updates a few
of the 1.5 methods in StringBuffer.
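
For readers unfamiliar with the 1.5 code point methods, here is a minimal sketch of how they are expected to behave. It runs against the standard java.lang API rather than the patched Classpath sources, and the class name SurrogateDemo and the helper combine() are hypothetical names used only for illustration.

```java
public class SurrogateDemo {
    // Hypothetical helper (not part of the patch): combine a surrogate
    // pair by hand.  The 0x10000 term is needed because supplementary
    // code points start just above the Basic Multilingual Plane.
    static int combine(char high, char low) {
        return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
    }

    public static void main(String[] args) {
        // 'a', then U+1D11E encoded as the pair D834 DD1E, then 'b'.
        String s = "a\uD834\uDD1Eb";

        // The two middle chars form a valid pair: prints "true".
        System.out.println(Character.isSurrogatePair(s.charAt(1), s.charAt(2)));
        // Manual arithmetic and the library agree: both print "1d11e".
        System.out.println(Integer.toHexString(combine(s.charAt(1), s.charAt(2))));
        System.out.println(Integer.toHexString(s.codePointAt(1)));
        // Starting on the low half yields the lone surrogate: prints "dd1e".
        System.out.println(Integer.toHexString(s.codePointAt(2)));
        // Looking backwards from index 3 finds the whole pair: prints "1d11e".
        System.out.println(Integer.toHexString(s.codePointBefore(3)));
        // contentEquals(CharSequence) compares contents: prints "true".
        System.out.println(s.contentEquals(new StringBuilder(s)));
    }
}
```

The index arguments are still char indices, not code point indices, so index 2 above lands in the middle of the pair.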

Tom

Index: ChangeLog
from  Tom Tromey  <address@hidden>

        * java/lang/Character.java (MIN_SURROGATE, MAX_SURROGATE): New
        constants.
        (isHighSurrogate): New method.
        (isLowSurrogate): Likewise.
        (isSurrogatePair): Likewise.
        (toCodePoint): Likewise.
        (codePointAt): Likewise.
        (codePointBefore): Likewise.
        * java/lang/StringBuffer.java (codePointCount): Check bounds.
        (codePointAt): Rewrote.
        (codePointBefore): Likewise.
        * java/lang/String.java (codePointAt): New method.
        (codePointBefore): Likewise.
        (codePointCount): Likewise.
        (contentEquals): New overload.

Index: java/lang/Character.java
===================================================================
RCS file: /cvsroot/classpath/classpath/java/lang/Character.java,v
retrieving revision 1.38
diff -u -r1.38 Character.java
--- java/lang/Character.java 14 Sep 2005 15:14:15 -0000 1.38
+++ java/lang/Character.java 16 Sep 2005 19:11:34 -0000
@@ -1508,6 +1508,20 @@
   public static final char MAX_LOW_SURROGATE = '\udfff';
 
   /**
+   * Minimum surrogate code in UTF-16 encoding.
+   *
+   * @since 1.5
+   */
+  public static final char MIN_SURROGATE = MIN_HIGH_SURROGATE;
+
+  /**
+   * Maximum surrogate code in UTF-16 encoding.
+   *
+   * @since 1.5
+   */
+  public static final char MAX_SURROGATE = MAX_LOW_SURROGATE;
+
+  /**
    * Grabs an attribute offset from the Unicode attribute database. The lower
    * 5 bits are the character type, the next 2 bits are flags, and the top
    * 9 bits are the offset into the attribute tables.
@@ -2414,5 +2428,211 @@
   public static boolean isValidCodePoint(int codePoint)
   {
     return codePoint >= MIN_CODE_POINT && codePoint <= MAX_CODE_POINT;
+  }
+
+  /**
+   * Return true if the given character is a high surrogate.
+   * @param ch the character
+   * @return true if the character is a high surrogate character
+   *
+   * @since 1.5
+   */
+  public static boolean isHighSurrogate(char ch)
+  {
+    return ch >= MIN_HIGH_SURROGATE && ch <= MAX_HIGH_SURROGATE;
+  }
+
+  /**
+   * Return true if the given character is a low surrogate.
+   * @param ch the character
+   * @return true if the character is a low surrogate character
+   *
+   * @since 1.5
+   */
+  public static boolean isLowSurrogate(char ch)
+  {
+    return ch >= MIN_LOW_SURROGATE && ch <= MAX_LOW_SURROGATE;
+  }
+
+  /**
+   * Return true if the given characters compose a surrogate pair.
+   * This is true if the first character is a high surrogate and the
+   * second character is a low surrogate.
+   * @param ch1 the first character
+   * @param ch2 the second character
+   * @return true if the characters compose a surrogate pair
+   *
+   * @since 1.5
+   */
+  public static boolean isSurrogatePair(char ch1, char ch2)
+  {
+    return isHighSurrogate(ch1) && isLowSurrogate(ch2);
+  }
+
+  /**
+   * Given a valid surrogate pair, return the corresponding code point.
+   * @param high the high character of the pair
+   * @param low the low character of the pair
+   * @return the corresponding code point
+   *
+   * @since 1.5
+   */
+  public static int toCodePoint(char high, char low)
+  {
+    return (((high - MIN_HIGH_SURROGATE) << 10)
+            + (low - MIN_LOW_SURROGATE) + 0x10000);
+  }
+
+  /**
+   * Get the code point at the specified index in the CharSequence.
+   * This is like CharSequence#charAt(int), but if the character is
+   * the start of a surrogate pair, and there is a following
+   * character, and this character completes the pair, then the
+   * corresponding supplementary code point is returned.  Otherwise,
+   * the character at the index is returned.
+   *
+   * @param sequence the CharSequence
+   * @param index the index of the codepoint to get, starting at 0
+   * @return the codepoint at the specified index
+   * @throws IndexOutOfBoundsException if index is negative or &gt;= length()
+   * @since 1.5
+   */
+  public static int codePointAt(CharSequence sequence, int index)
+  {
+    int len = sequence.length();
+    if (index < 0 || index >= len)
+      throw new IndexOutOfBoundsException();
+    char high = sequence.charAt(index);
+    if (! isHighSurrogate(high) || ++index >= len)
+      return high;
+    char low = sequence.charAt(index);
+    if (! isLowSurrogate(low))
+      return high;
+    return toCodePoint(high, low);
+  }
+
+  /**
+   * Get the code point at the specified index in the character array.
+   * If the character is the start of a surrogate pair, and there is a
+   * following character, and this character completes the pair, then
+   * the corresponding supplementary code point is returned.
+   * Otherwise, the character at the index is returned.
+   *
+   * @param chars the character array in which to look
+   * @param index the index of the codepoint to get, starting at 0
+   * @return the codepoint at the specified index
+   * @throws IndexOutOfBoundsException if index is negative or &gt;= chars.length
+   * @since 1.5
+   */
+  public static int codePointAt(char[] chars, int index)
+  {
+    return codePointAt(chars, index, chars.length);
+  }
+
+  /**
+   * Get the code point at the specified index in the character array.
+   * If the character is the start of a surrogate pair, and there is a
+   * following character within the specified range, and this
+   * character completes the pair, then the corresponding
+   * supplementary code point is returned.  Otherwise, the character
+   * at the index is returned.
+   *
+   * @param chars the character array in which to look
+   * @param index the index of the codepoint to get, starting at 0
+   * @param limit the limit past which characters should not be examined
+   * @return the codepoint at the specified index
+   * @throws IndexOutOfBoundsException if index is negative or &gt;=
+   * limit, or if limit is negative or &gt; the length of the array
+   * @since 1.5
+   */
+  public static int codePointAt(char[] chars, int index, int limit)
+  {
+    if (index < 0 || index >= limit || limit < 0 || limit > chars.length)
+      throw new IndexOutOfBoundsException();
+    char high = chars[index];
+    if (! isHighSurrogate(high) || ++index >= limit)
+      return high;
+    char low = chars[index];
+    if (! isLowSurrogate(low))
+      return high;
+    return toCodePoint(high, low);
+  }
+
+  /**
+   * Get the code point before the specified index.  This is like
+   * #codePointAt(char[], int), but checks the characters at
+   * <code>index-1</code> and <code>index-2</code> to see if they form
+   * a supplementary code point.  If they do not, the character at
+   * <code>index-1</code> is returned.
+   *
+   * @param chars the character array
+   * @param index the index just past the codepoint to get, starting at 0
+   * @return the codepoint before the specified index
+   * @throws IndexOutOfBoundsException if index is &lt; 1 or &gt; chars.length
+   * @since 1.5
+   */
+  public static int codePointBefore(char[] chars, int index)
+  {
+    return codePointBefore(chars, index, 0);
+  }
+
+  /**
+   * Get the code point before the specified index.  This is like
+   * #codePointAt(char[], int), but checks the characters at
+   * <code>index-1</code> and <code>index-2</code> to see if they form
+   * a supplementary code point.  If they do not, the character at
+   * <code>index-1</code> is returned.  The start parameter is used to
+   * limit the range of the array which may be examined.
+   *
+   * @param chars the character array
+   * @param index the index just past the codepoint to get, starting at 0
+   * @param start the index before which characters should not be examined
+   * @return the codepoint before the specified index
+   * @throws IndexOutOfBoundsException if index is &lt;= start or &gt;
+   * the length of the array, or if start is negative or &gt;= the
+   * length of the array
+   * @since 1.5
+   */
+  public static int codePointBefore(char[] chars, int index, int start)
+  {
+    if (index <= start || index > chars.length
+       || start < 0 || start >= chars.length)
+      throw new IndexOutOfBoundsException();
+    --index;
+    char low = chars[index];
+    if (! isLowSurrogate(low) || --index < start)
+      return low;
+    char high = chars[index];
+    if (! isHighSurrogate(high))
+      return low;
+    return toCodePoint(high, low);
+  }
+
+  /**
+   * Get the code point before the specified index.  This is like
+   * #codePointAt(CharSequence, int), but checks the characters at
+   * <code>index-1</code> and <code>index-2</code> to see if they form
+   * a supplementary code point.  If they do not, the character at
+   * <code>index-1</code> is returned.
+   *
+   * @param sequence the CharSequence
+   * @param index the index just past the codepoint to get, starting at 0
+   * @return the codepoint before the specified index
+   * @throws IndexOutOfBoundsException if index is &lt; 1 or &gt; length()
+   * @since 1.5
+   */
+  public static int codePointBefore(CharSequence sequence, int index)
+  {
+    int len = sequence.length();
+    if (index < 1 || index > len)
+      throw new IndexOutOfBoundsException();
+    --index;
+    char low = sequence.charAt(index);
+    if (! isLowSurrogate(low) || --index < 0)
+      return low;
+    char high = sequence.charAt(index);
+    if (! isHighSurrogate(high))
+      return low;
+    return toCodePoint(high, low);
   }
 } // class Character
Index: java/lang/String.java
===================================================================
RCS file: /cvsroot/classpath/classpath/java/lang/String.java,v
retrieving revision 1.71
diff -u -r1.71 String.java
--- java/lang/String.java 16 Sep 2005 15:59:02 -0000 1.71
+++ java/lang/String.java 16 Sep 2005 19:11:34 -0000
@@ -554,6 +554,40 @@
   }
 
   /**
+   * Get the code point at the specified index.  This is like #charAt(int),
+   * but if the character is the start of a surrogate pair, and the
+   * following character completes the pair, then the corresponding
+   * supplementary code point is returned.
+   * @param index the index of the codepoint to get, starting at 0
+   * @return the codepoint at the specified index
+   * @throws IndexOutOfBoundsException if index is negative or &gt;= length()
+   * @since 1.5
+   */
+  public synchronized int codePointAt(int index)
+  {
+    // Use the CharSequence overload as we get better range checking
+    // this way.
+    return Character.codePointAt(this, index);
+  }
+
+  /**
+   * Get the code point before the specified index.  This is like
+   * #codePointAt(int), but checks the characters at <code>index-1</code> and
+   * <code>index-2</code> to see if they form a supplementary code point.
+   * @param index the index just past the codepoint to get, starting at 0
+   * @return the codepoint before the specified index
+   * @throws IndexOutOfBoundsException if index is less than 1 or &gt;
+   *         length()
+   * @since 1.5
+   */
+  public synchronized int codePointBefore(int index)
+  {
+    // Use the CharSequence overload as we get better range checking
+    // this way.
+    return Character.codePointBefore(this, index);
+  }
+
+  /**
    * Copies characters from this String starting at a specified start index,
    * ending at a specified stop index, to a character array starting at
    * a specified destination begin index.
@@ -731,6 +765,26 @@
   }
 
   /**
+   * Compares the given CharSequence to this String. This is true if
+   * the CharSequence has the same content as this String at this
+   * moment.
+   *
+   * @param seq the CharSequence to compare to
+   * @return true if CharSequence has the same character sequence
+   * @throws NullPointerException if the given CharSequence is null
+   * @since 1.5
+   */
+  public boolean contentEquals(CharSequence seq)
+  {
+    if (seq.length() != count)
+      return false;
+    for (int i = 0; i < count; ++i)
+      if (value[offset + i] != seq.charAt(i))
+       return false;
+    return true;
+  }
+
+  /**
    * Compares a String to this String, ignoring case. This does not handle
    * multi-character capitalization exceptions; instead the comparison is
    * made on a character-by-character basis, and is true if:<br><ul>
@@ -1679,6 +1733,49 @@
   public String intern()
   {
     return VMString.intern(this);
+  }
+
+  /**
+   * Return the number of code points between two indices in the
+   * <code>String</code>.  An unpaired surrogate counts as a
+   * code point for this purpose.  Characters outside the indicated
+   * range are not examined, even if the range ends in the middle of a
+   * surrogate pair.
+   *
+   * @param start the starting index
+   * @param end one past the ending index
+   * @return the number of code points
+   * @since 1.5
+   */
+  public synchronized int codePointCount(int start, int end)
+  {
+    if (start < 0 || end > count || start > end)
+      throw new StringIndexOutOfBoundsException();
+
+    start += offset;
+    end += offset;
+    int count = 0;
+    while (start < end)
+      {
+       char base = value[start];
+       if (base < Character.MIN_HIGH_SURROGATE
+           || base > Character.MAX_HIGH_SURROGATE
+           // Never examine a character past the end of the range.
+           || start + 1 >= end
+           || value[start + 1] < Character.MIN_LOW_SURROGATE
+           || value[start + 1] > Character.MAX_LOW_SURROGATE)
+         {
+           // Nothing.
+         }
+       else
+         {
+           // Surrogate pair.
+           ++start;
+         }
+       ++start;
+       ++count;
+      }
+    return count;
   }
 
   /**
Index: java/lang/StringBuffer.java
===================================================================
RCS file: /cvsroot/classpath/classpath/java/lang/StringBuffer.java,v
retrieving revision 1.32
diff -u -r1.32 StringBuffer.java
--- java/lang/StringBuffer.java 14 Sep 2005 15:20:42 -0000 1.32
+++ java/lang/StringBuffer.java 16 Sep 2005 19:11:37 -0000
@@ -252,7 +252,6 @@
    * @param index the index of the character to get, starting at 0
    * @return the character at the specified index
    * @throws IndexOutOfBoundsException if index is negative or &gt;= length()
-   *         (while unspecified, this is a StringIndexOutOfBoundsException)
    */
   public synchronized char charAt(int index)
   {
@@ -269,22 +268,11 @@
    * @param index the index of the codepoint to get, starting at 0
    * @return the codepoint at the specified index
    * @throws IndexOutOfBoundsException if index is negative or &gt;= length()
-   *         (while unspecified, this is a StringIndexOutOfBoundsException)
    * @since 1.5
    */
   public synchronized int codePointAt(int index)
   {
-    if (index < 0 || index >= count)
-      throw new StringIndexOutOfBoundsException(index);
-    char base = value[index];
-    if (base < Character.MIN_HIGH_SURROGATE
-       || base > Character.MAX_HIGH_SURROGATE
-       || index == count
-       || value[index + 1] < Character.MIN_LOW_SURROGATE
-       || value[index + 1] > Character.MAX_LOW_SURROGATE)
-      return base;
-    return (((base - Character.MIN_HIGH_SURROGATE) << 10)
-           + (value[index + 1] - Character.MIN_LOW_SURROGATE));
+    return Character.codePointAt(value, index, count);
   }
 
   /**
@@ -294,23 +282,15 @@
    * @param index the index just past the codepoint to get, starting at 0
    * @return the codepoint at the specified index
    * @throws IndexOutOfBoundsException if index is negative or &gt;= length()
-   *         (while unspecified, this is a StringIndexOutOfBoundsException)
    * @since 1.5
    */
   public synchronized int codePointBefore(int index)
   {
-    --index;
-    if (index < 0 || index >= count)
-      throw new StringIndexOutOfBoundsException(index);
-    char base = value[index];
-    if (base < Character.MIN_LOW_SURROGATE
-       || base > Character.MAX_LOW_SURROGATE
-       || index == 0
-       || value[index - 1] < Character.MIN_HIGH_SURROGATE
-       || value[index - 1] > Character.MAX_HIGH_SURROGATE)
-      return base;
-    return (((value[index - 1] - Character.MIN_HIGH_SURROGATE) << 10)
-           + (base - Character.MIN_LOW_SURROGATE));
+    // Character.codePointBefore() doesn't perform this check.  We
+    // could use the CharSequence overload, but this is just as easy.
+    if (index > count)
+      throw new IndexOutOfBoundsException();
+    return Character.codePointBefore(value, index, 0);
   }
 
   /**
@@ -1093,6 +1073,9 @@
    */
   public synchronized int codePointCount(int start, int end)
   {
+    if (start < 0 || end > count || start > end)
+      throw new StringIndexOutOfBoundsException();
+
     int count = 0;
     while (start < end)
       {
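
As a footnote to the codePointCount changes, here is a rough sketch of the counting rule the javadoc describes: an unpaired surrogate counts as one code point, and nothing past the end of the range is examined. The class CodePointCountDemo and the method countCodePoints() are hypothetical names, and the code uses the standard java.lang API rather than the Classpath internals (value, offset, count).

```java
public class CodePointCountDemo {
    // Hypothetical helper: count the code points in [start, end) of seq.
    static int countCodePoints(CharSequence seq, int start, int end) {
        if (start < 0 || end > seq.length() || start > end)
            throw new IndexOutOfBoundsException();
        int n = 0;
        while (start < end) {
            char c = seq.charAt(start++);
            // Consume the low half only when it lies inside the range.
            if (Character.isHighSurrogate(c) && start < end
                && Character.isLowSurrogate(seq.charAt(start)))
                ++start;
            ++n;
        }
        return n;
    }

    public static void main(String[] args) {
        // 'a', then U+1D11E as a surrogate pair, then 'b'.
        String s = "a\uD834\uDD1Eb";
        // The pair counts once: prints "3".
        System.out.println(countCodePoints(s, 0, 4));
        // The range ends mid-pair, so the high half counts alone: prints "2".
        System.out.println(countCodePoints(s, 0, 2));
        // Agrees with the library method: prints "true".
        System.out.println(countCodePoints(s, 0, 4) == s.codePointCount(0, 4));
    }
}
```

Note that end equal to the length is a valid argument here, since end is one past the last char examined.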



