|
Because Internet protocols such as XML and HTML are text-based, software programs
spend a considerable amount of time converting text to numbers (integers and floats) and vice versa. To
optimize performance, Eyal Eliahu Alaluf, CTO of Mainsoft, developed high-performance
algorithms for integers and real numbers that deliver up to 2.6x the number of conversions
per second in .NET, and typically deliver 3x more conversions per second than the
equivalent Java APIs.
Background
Mainsoft for Java EE software implements the .NET 2.0 APIs to format primitive types
on top of the Java VM. To improve the performance of C# and Visual Basic server
applications running on Java using Mainsoft software, we recently conducted a comparative
analysis of the .NET and Java APIs that convert strings to integers and to real
numbers, and we tested their performance. Then, we designed two algorithms that
format integers and real numbers significantly faster than Microsoft's implementation
of the .NET APIs, and with one exception, our algorithms also deliver 3x as many
conversions per second as the Java APIs.
What follows are the results of our performance tests and a description of the algorithms
we developed. The algorithms are included in the Mainsoft for Java EE v 2.x release
and will be included in future Mono releases.
Performance testing: .NET and Java formatting APIs
Both .NET and Java provide a set of APIs that format primitive types into strings
and vice versa. While their semantics differ, both define format types (decimal
format, scientific format, fixed point format, etc.) and the required precision,
and they support locale-dependent information, such as which negative sign and which
decimal point to use.
During Q4 2007, Mainsoft conducted a comparative analysis and performance testing
of the .NET and Java APIs' strings-to-integers and strings-to-numbers conversion
rates (see the table below). We used a 1.8 GHz Pentium M with 2GB RAM to test the
APIs, and we used a Sun J2SE 6.0 server virtual machine to measure J2SE 6.0 performance.
|
Conversion |
.NET 2.0 Conversions*
(000's per second)
|
J2SE 6.0 Conversions
(000's per second)
|
|
|
Integers
|
Integers using default formatting (e.g. 12345) |
|
3722 |
11557 |
|
Integers using currency formatting (e.g. $12,345) |
|
2250 |
1174 |
|
|
Real numbers
|
Doubles using default formatting (e.g. 0.12345) |
|
1871 |
1036 |
|
Small doubles using default formatting (e. g. 1.2345E-200) |
|
1633 |
142 |
|
Large doubles using default formatting (e. g. 1.2345E+200) |
|
1583 |
165 |
*The implementation of the APIs to format strings-to-integers and strings-to-real-numbers
is identical across .NET 2.0, .NET 3.0, and .NET 3.5.
The results indicate Java delivers 3x as many conversions per second as .NET in
the simplest case, which uses default formatting for integers. As soon as you add
culture-specific formatting, such as currencies, Java performance falls by a factor
of ten.
When converting strings to real numbers, implementation of the .NET APIs outperforms
the Java APIs. The performance differential between .NET and Java is even more pronounced
when you consider conversion speeds for very large and very small double values.
Mainsoft's integer formatting algorithm
Overview
Mainsoft's integer formatting algorithm follows a 3-step process that is dictated
by the semantics of the .NET APIs:
-
The format is parsed to retrieve the type of format and the precision.
-
The number is converted to a list of digits.
-
The list of digits is converted to a string by applying the logic of the specific
format and the culture-specific information.
We improve the conversion performance by:
- Formatting digits representation
using bitwise operations, which CPU registers are optimized to use, rather than
array traversals.
- Using thread-specific storage
to minimize memory allocations.
Digits representation
The conventional representation of digits uses a byte array in which each digit
occupies one cell. Allocating a byte array and transforming an integer is a resource-intensive
operation when you consider the extremely high demands placed on this algorithm.
Rather than using a byte array, Mainsoft's algorithm encodes the digits list using
binary-coded
decimals (BCD), which represents the number 12345 as the number 0x12345
(which has a decimal value of 74565). Then, the algorithm stores the digits BCD
values within four integer fields. This representation works for up to 32 digits,
which accommodates the .NET decimal data type.
Using BCD, digit manipulations and conversions to characters are performed with
simple and fast bitwise operations. In addition, the operation of converting an
integer to its BCD representation is an efficient and simple arithmetic operation.
Converting an integer to a binary-coded decimal
The BCD conversion algorithm minimizes the number of the comparatively slow operations
of division and remainders for 8-64 bits signed and unsigned integers.
- The number is divided iteratively
by 100,000,000.
- The remainder of the number
from 100,000,000 is divided again by 10,000.
- The remainder of the last
operation is converted to BCD using a pre-calculated cache.
- The divisor of the last operation
is also smaller than 10,000, and is converted to BCD using the same pre-calculated
cache.
- The two BCD results are stored
in the lower 16 bits and upper 16 bits of the first (or second, or third, according
to the current iteration) BCD integer fields.
Minimizing memory allocations
Mainsoft's integer formatting algorithm uses a class to maintain the data about
the formatted number. The class contains the format, precision, the four BCD fields,
and the character array used to create the resulting string.
Allocating memory for this conversion class, and allocating the character array,
carries a significant overhead compared to the required performance. This issue
is typical in high throughput operations, where the performance degradation caused
by memory allocations can be significant.
To dramatically reduce the use of memory allocations, Mainsoft's algorithm places
the conversion class as a member of the current thread class, allowing both the
class and character array to be reused for different numbers. The algorithm also
allows for further optimizing the lookup of the culture-specific formatting information
by having the thread update the class whenever the thread culture information changes.
Mainsoft's real numbers formatting algorithm
Real numbers in both .NET and Java are defined by the IEEE 754 floating point standard.
Mainsoft's algorithm for implementing the .NET APIs:
-
Converts a double precision floating point into a 64-bit integer and a power of
ten using a very efficient O(1) conversion formula.
-
Applies the integer formatting algorithm to the 64-bit integer, taking into account
where the decimal point should be placed according to the given power of ten.
We'll use an example to demonstrate the issues and efficiencies of the O(1) conversion
formula.
Converting 0.12345
Consider the real number 0.12345. Its double precision representation is 8895509983982204
* 2 ** -56. In order to transform this representation into 0.12345, the algorithm
pre-calculates 2 ** -56 as 1387778780781445675 * 10 ** -35. The result of the multiplication
is 12345000000000000412733332592767700 * 10 ** -35. When we convert the result into
digits, round the number, and place the decimal point in the correct place, we get
the desired 0.12345.
In this above calculation, converting the result of the multiplication into digits
is a resource-intensive operation, since the result (in this case) has 35 digits
while the required precision is 17 digits. We achieve this by:
- Pre-calculating 2 ** -56
as 2560000000000000000 * 2 ** -64 * 10 ** -16. The result is 22772505558994442240000000000000000
* 2 ** -64 * 10 ** -16.
- Shifting the number by 64
bits, we get 1234500000000000 * 10 ** -16.
- Placing the decimal point
at the right place to produce the desired result: 0.12345.
The conversion algorithm
The IEEE 754 standard divides the bits of a double precision floating point into
a sign S, and exponent E, and a fraction F. The real number value is (-1) ** S *
2 ** (E – 1075) * (1F), where 1F is intended to represent the binary number created
by prefixing F with an implicit leading 1. We'll ignore the special cases E=0 and
E=2047, to simplify matters.
The algorithm pre-calculates the values of 2 ** (E – 1075) for the valid range of
E (i.e. [0..2047]) using I * 2 ** -64 * 10 ** T where I is a 64-bit integer and
T is a 32-bit integer. The resulting multiplication is ((1F * I) * 2 ** -64) * 10
** T. This formula produces a 64-bit integer by taking the upper 64 bits of 1F *
I and a Ten's power (T).
In order to preserve the desired precision, the algorithm is required to ensure
that the 64-bit integer has exactly 17 decimal digits. The algorithm does this by
multiplying 1F * I by 10 until its upper 64 bits have 17 decimal digits. Note that
by the same requirement of preserving precision, I must always be bigger then 2
** 64 / 10.
Performance testing
We measured the performance of Mainsoft's implementation of the .NET APIs using
the same hardware and Sun J2SE 6.0 server virtual machine that was used to measure
conversion performance of J2SE 6.0. The results are listed in the right-hand column,
in the table below.
|
Conversion |
.NET 2.0
(000's per second)
|
J2SE 6.0
(000's per second)
|
Mainsoft
(000's per second)
|
|
|
Integers
|
Integers using default formatting (e.g. 12345) |
|
3722 |
11557 |
9714 |
|
Integers using currency formatting (eg $12,345) |
|
2250 |
1174 |
3027 |
|
|
Real numbers
|
Doubles using default formatting (e.g. 0.12345) |
|
1871 |
1036 |
2717 |
|
Small doubles using default formatting 1.2345E-200 |
|
1633 |
142 |
2324 |
|
Large doubles using default formatting 1.2345E+200 |
|
1583 |
165 |
2307 |
Mainsoft's implementation of the .NET APIs significantly out-performs the Microsoft
.NET APIs implementation in all cases. And, apart from the specific case of converting
string to integers using default formatting, which uses a specialized Java algorithm,
Mainsoft's algorithms deliver 3 to 15 times the number of conversions as the Java
APIs.
Conclusion
I invite you to review our implementation—posted at NumberFormatter.cs and NumberFormatter.jvm.cs—and provide feedback and
questions. I'd also welcome any questions you may have regarding Mainsoft's commitment
to deliver equivalent performance of C# and Visual Basic applications running on
the Java VM. Please provide your comments in our
developer forums.
I look forward to hearing from you!
|