Numerical Precision

1. Analyze the floating point format

The IEEE Standard for Binary Floating-Point Arithmetic (IEEE 754) is the most widely used standard for floating-point computation and is followed by many CPU and FPU implementations. The standard defines formats for representing floating-point numbers and special values, together with a set of floating-point operations that operate on these values. It also specifies four rounding modes and five exceptions (Michael L. Overton).

2. How floating point numbers are stored in memory

An IEEE-754 float (4 bytes) or double (8 bytes) has three components (there is also an analogous 96-bit extended-precision format under IEEE-854): a sign bit telling whether the number is positive or negative, an exponent giving its order of magnitude, and a mantissa specifying the actual digits of the number. Using single-precision floats as an example, here is the bit layout:

    seeeeeeeemmmmmmmmmmmmmmmmmmmmmmm    meaning
    31                             0    bit #

    s = sign bit, e = exponent, m = mantissa

In the internal representation, there is 1 bit for the sign (S), 8 bits for the exponent (E), and 23 bits for the mantissa (M).
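The three fields can be pulled out of a real value with a few shifts and masks. The following is a minimal sketch in Python, using the standard `struct` module to obtain the 32-bit pattern; the function name `float32_fields` is an illustrative choice, not something defined by the standard.

```python
import struct

def float32_fields(x):
    """Return the (sign, exponent, mantissa) bit fields of x as a single-precision float."""
    # Pack x as a 4-byte big-endian float, then reread the same bytes as an unsigned integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                # bit 31
    exponent = (bits >> 23) & 0xFF   # bits 30..23, stored with a bias of 127
    mantissa = bits & 0x7FFFFF       # bits 22..0, the fraction without the implicit leading 1
    return sign, exponent, mantissa

sign, exponent, mantissa = float32_fields(-6.5)
print(sign, exponent, hex(mantissa))  # 1 129 0x500000
```

Here -6.5 is -1.625 × 2^2, so the stored exponent is 2 + 127 = 129 and the mantissa field holds the fractional part 0.625 scaled by 2^23.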

The number is stored as follows, with high memory to the right:

    Byte 0    Byte 1    Byte 2    Byte 3
    00000000  11111100  22221111  33222222
    76543210  54321098  32109876  10987654
    FFFFFFFF  FFFFFFFF  EFFFFFFF  SEEEEEEE

The two rows of digits give each bit’s number, read vertically (tens digit above units digit); S is the sign bit, E an exponent bit, and F a fraction (mantissa) bit.

3. The difficulty of manipulating and using floating point numbers in C calculations

There are two reasons why a real number might not be exactly representable as a floating-point number. The most common situation is illustrated by the decimal number 0.1. Although it has a finite decimal representation, in binary it has an infinite repeating representation. Thus when the base β = 2, the number 0.1 lies strictly between two floating-point numbers and is exactly represented by neither of them (Cleve Moler). The less common situation is a number that is out of range, that is, one whose absolute value is too large or too small for the format.

Floating-point representations are not necessarily unique. For example, both 0.01 × 10^1 and 1.00 × 10^-1 represent 0.1. If the leading digit is nonzero, then the representation is said to be normalized. The floating-point number 1.00 × 10^-1 is normalized, while 0.01 × 10^1 is not.

4. The format used to store numbers using the binary coded decimal format

Binary-coded decimal, or BCD, is a method of using binary digits to represent the decimal digits 0 through 9.

A decimal digit is represented by four binary digits (Raymond Filiatreault), as shown below:

    Decimal digit    BCD (8 4 2 1)
    0                0000
    1                0001
    2                0010
    3                0011
    4                0100
    5                0101
    6                0110
    7                0111
    8                1000
    9                1001

As most computers store data in 8-bit bytes, it is possible to use one of the following methods to encode a BCD number:

• Uncompressed: each numeral is encoded into one byte, with four bits representing the numeral and the remaining bits having no significance.
• Packed: two numerals are encoded into a single byte, with one numeral in the least significant nibble (bits 0-3) and the other numeral in the most significant nibble (bits 4-7).
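The packed encoding can be sketched in a few lines of Python. The helper names `to_packed_bcd` and `from_packed_bcd` are hypothetical, and the sketch assumes non-negative integers stored with the most significant digits in the first byte; real BCD hardware and file formats may order the bytes differently.

```python
def to_packed_bcd(n):
    """Encode a non-negative integer as packed BCD, two decimal digits per byte."""
    if n < 0:
        raise ValueError("this sketch handles non-negative integers only")
    digits = str(n)
    if len(digits) % 2:
        digits = "0" + digits  # pad to a whole number of bytes
    # High nibble holds the first digit of each pair, low nibble the second.
    return bytes((int(digits[i]) << 4) | int(digits[i + 1])
                 for i in range(0, len(digits), 2))

def from_packed_bcd(data):
    """Decode packed BCD bytes back into an integer."""
    value = 0
    for byte in data:
        high, low = byte >> 4, byte & 0x0F
        if high > 9 or low > 9:
            raise ValueError("a nibble outside 0-9 is not valid BCD")
        value = value * 100 + high * 10 + low
    return value

encoded = to_packed_bcd(1234)
print(encoded.hex())             # 1234
print(from_packed_bcd(encoded))  # 1234
```

Note that the hexadecimal dump of the encoded bytes reads the same as the decimal number, since each nibble holds exactly one decimal digit.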

5. Compare and contrast the BCD format to the floating point format

Although the integer and floating point formats cover most of the numeric needs of an average program, there are some special cases where other numeric representations are convenient. In this section we’ll discuss the Binary Coded Decimal (BCD) format since the 80x86 CPU provides a small amount of hardware support for this data representation. BCD values are a sequence of nibbles with each nibble representing a value in the range zero through nine. Of course you can represent values in the range 0..15 using a nibble; the BCD format, however, uses only 10 of the possible 16 different values for each nibble. Each nibble in a BCD value represents a single decimal digit. Therefore, with a single byte (i.e., two digits) we can represent values containing two decimal digits, or values in the range 0..99. With a word, we can represent values having four decimal digits, or values in the range 0..9999. Likewise, with a double word we can represent values with up to eight decimal digits (since there are eight nibbles in a double word value).

6. The BCD format isn’t memory efficient

As you can see, BCD storage isn’t particularly memory efficient. For example, an eight-bit BCD variable can represent values in the range 0..99 while that same eight bits, when holding a binary value, can represent values in the range 0..255. Likewise, a 16-bit binary value can represent values in the range 0..65535 while a 16-bit BCD value can only represent about 1/6 of those values (0..9999). Inefficient storage isn’t the only problem. BCD calculations tend to be slower than binary calculations.
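The storage comparison is simple arithmetic: n bits hold 2^n distinct binary values but only 10^(n/4) packed BCD values. A small Python sketch of that calculation follows; the function names are illustrative only.

```python
def binary_values(bits):
    """Number of distinct unsigned binary values that fit in the given number of bits."""
    return 2 ** bits

def bcd_values(bits):
    """Number of distinct packed BCD values: one decimal digit per 4-bit nibble."""
    return 10 ** (bits // 4)

for bits in (8, 16, 32):
    share = bcd_values(bits) / binary_values(bits)
    print(f"{bits:2d} bits: binary 0..{binary_values(bits) - 1}, "
          f"BCD 0..{bcd_values(bits) - 1}, share {share:.3f}")
```

For 16 bits the share is 10000 / 65536, or roughly 0.15, which is the "about 1/6" quoted above; the gap widens as the word size grows.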

At this point, you’re probably wondering why anyone would ever use the BCD format. The BCD format does have two saving graces: it’s very easy to convert BCD values between the internal numeric representation and their string representation; also, it’s much easier to encode multi-digit decimal values in hardware (e.g., using a “thumb wheel” or dial) using BCD than it is using binary. For these two reasons, you’re likely to see people using BCD in embedded systems (e.g., toaster ovens and alarm clocks) but rarely in general purpose computer software.

7. Floating point numbers and rounding errors

Squeezing infinitely many real numbers into a finite number of bits requires an approximate representation.

Although there are infinitely many integers, in most programs the result of integer computations can be stored in 32 bits. In contrast, given any fixed number of bits, most calculations with real numbers will produce quantities that cannot be exactly represented using that many bits. Therefore the result of a floating-point calculation must often be rounded in order to fit back into its finite representation. This rounding error is the characteristic feature of floating-point computation.
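Rounding error is easy to observe directly. The sketch below uses Python, whose `float` type is an IEEE 754 double on common platforms (an assumption about the platform, not something stated in the text above); `Fraction` exposes the exact value that is actually stored.

```python
import math
from fractions import Fraction

total = 0.1 + 0.2
stored_tenth = Fraction(0.1)   # exact rational value of the double nearest to 0.1

print(total == 0.3)                     # False: each operand and the sum were rounded
print(stored_tenth == Fraction(1, 10))  # False: 0.1 itself is not stored exactly
print(math.isclose(total, 0.3))         # True: compare with a tolerance instead
```

Because of this, comparing floating-point results with a tolerance is usually safer than testing them for exact equality.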

Since most floating-point calculations have rounding error anyway, does it matter if the basic arithmetic operations introduce a little bit more rounding error than necessary?

8. Differentiate between the exponential format and the floating point format

Exponential notation is a method of simplifying the writing and handling of very large or very small numbers. In exponential notation, a number usually is expressed as a coefficient between one and ten times an integral power of ten, the exponent (Donald E. Simanek). Here are some examples:

    12E7 = 120000000      /* Displays “1” */
    12E-5 = 0.00012       /* Displays “1” */
    -12e4 = -120000       /* Displays “1” */
    0e123 = 0e456         /* Displays “1” */
    0e123 == 0e456        /* Displays “0” */

Each line is a comparison that displays 1 when it is true and 0 when it is false. The first four compare numeric values; the last uses the strict operator ==, which compares the two operands as written, so the two different spellings of zero do not match.

The results of calculations are returned in either conventional or exponential form, depending on the setting of numeric digits. If the number of places needed before the decimal point exceeds digits, or the number of places after the point exceeds twice digits, the exponential form is used. The exponential form the language processor generates always has a sign following the E to improve readability. If the exponent is 0, the exponential part is omitted; that is, an exponential part of E+0 is not generated.
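A rough analogue of these comparisons can be tried with Python’s standard `decimal` module. This is only an illustration: Python is not the language processor described above, and its rule for switching to exponential form differs from the digits-based rule in the text. Comparing `Decimal` values plays the role of the numeric comparison, and comparing their string forms plays the role of the strict one.

```python
from decimal import Decimal

# Numeric comparison: exponential and conventional spellings denote the same value.
print(Decimal("12E7") == Decimal("120000000"))  # True
print(Decimal("12E-5") == Decimal("0.00012"))   # True
print(Decimal("-12e4") == Decimal("-120000"))   # True
print(Decimal("0e123") == Decimal("0e456"))     # True

# Comparing the written forms instead: the two zeros are spelled differently.
print(str(Decimal("0e123")) == str(Decimal("0e456")))  # False

# The generated exponential form carries an explicit sign after the E.
large = str(Decimal("12E7"))
print(large)  # 1.2E+8
```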