Primitive Types

Boolean

Boolean is a fundamental primitive type in computer science, representing a logical value that can be either true or false. It is used extensively in logical operations and conditional statements.

Application:
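
As a brief, hedged illustration, here is a minimal C sketch (C99 `<stdbool.h>`; names and values are illustrative) of a Boolean driving a conditional:

```c
#include <stdbool.h>
#include <stdio.h>

int main(void) {
    int hour = 10;
    bool door_open = (hour >= 9 && hour < 17); /* result of a logical test */

    if (door_open)                 /* Boolean value selects the branch */
        printf("open\n");
    else
        printf("closed\n");
    return 0;
}
```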

Character

Character is a primitive type representing a single character, encompassing letters, digits, punctuation marks, whitespaces, and control characters.

Application:
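
A minimal C sketch (names are illustrative) showing a character stored, printed, classified, and transformed:

```c
#include <ctype.h>
#include <stdio.h>

int main(void) {
    char c = 'A';                             /* a single character */
    printf("%c %d\n", c, c);                  /* as a glyph and as its code */
    printf("%c\n", tolower((unsigned char)c)); /* transformed per character */
    if (!isdigit((unsigned char)c))
        printf("'%c' is not a digit\n", c);
    return 0;
}
```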

Floating-Point

Floating-point is a representation of a finite subset of the rationals, consisting of a significand scaled by an exponent of a fixed base. It includes the single-precision and double-precision IEEE 754 floats, among others.

Application:
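
A minimal C sketch (the physical constants are just example magnitudes) showing how a single floating-point type spans very different scales:

```c
#include <stdio.h>

int main(void) {
    double planck   = 6.62607015e-34;  /* very small magnitude */
    double avogadro = 6.02214076e23;   /* very large magnitude */
    double mean     = (1.5 + 2.25 + 3.75) / 3.0;

    printf("%g %g %g\n", planck, avogadro, mean);
    return 0;
}
```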

Fixed-Point

Fixed-point is a representation of rational numbers that reserves a fixed number of digits for the fractional part.

Advantages:
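
As a hedged sketch of one advantage, exact control over monetary rounding using integer arithmetic only (values are stored as hundredths; names are illustrative):

```c
#include <stdint.h>
#include <stdio.h>

int main(void) {
    int64_t price = 1999;             /* 19.99 stored as 1999 hundredths */
    int64_t tax   = price * 8 / 100;  /* 8% tax, integers only */
    int64_t total = price + tax;

    printf("%lld.%02lld\n",
           (long long)(total / 100), (long long)(total % 100));  /* 21.58 */
    return 0;
}
```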

Integer

Integer is a primitive type that provides a direct representation of either the integers or the non-negative integers.

Application:
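
A minimal C sketch (names are illustrative) of integers as counters and as bounded machine types:

```c
#include <limits.h>
#include <stdio.h>

int main(void) {
    unsigned int sum = 0;             /* non-negative integers only */
    for (int i = 1; i <= 10; i++)     /* loop counting and indexing */
        sum += i;

    printf("sum = %u\n", sum);                        /* 55 */
    printf("int range: %d..%d\n", INT_MIN, INT_MAX);  /* machine-dependent */
    return 0;
}
```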

Reference

Reference is a value that refers to another value, or possibly to itself. It is often loosely called a pointer or handle, although those terms name particular implementations rather than the abstract concept.

Implementation and Application:
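
As a sketch of the most common implementation, a reference realized as a C pointer (variable names are illustrative):

```c
#include <stdio.h>

int main(void) {
    int data = 42;
    int *ref = &data;      /* ref's value is the address of data */

    *ref = 7;              /* dereferencing: reach data through ref */
    printf("%d\n", data);  /* 7: data was changed indirectly */
    return 0;
}
```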

Symbol

Symbol is a primitive type representing a unique identifier.

Application:
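
C has no built-in symbol type, so as a hedged stand-in, here is a minimal string-interning sketch (the table size and names are illustrative; `strdup` is POSIX/C23). Each distinct name is stored once, so symbols compare by identity:

```c
#include <stdio.h>
#include <string.h>

static const char *table[64];
static int count;

static const char *intern(const char *name) {
    for (int i = 0; i < count; i++)
        if (strcmp(table[i], name) == 0)
            return table[i];              /* already interned: reuse it */
    return table[count++] = strdup(name); /* first occurrence */
}

int main(void) {
    const char *a = intern("red");
    const char *b = intern("red");
    printf("%d\n", a == b);   /* 1: one identity per distinct symbol */
    return 0;
}
```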

Enumerated Type

Enumerated type is a primitive type that represents a set of symbols. It allows the programmer to define a type with a set of named values, each of which represents a unique symbolic constant.

Application:
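
A minimal C sketch (the playing-card suits reappear as an example later in this document; names are illustrative):

```c
#include <stdio.h>

enum suit { CLUB, DIAMOND, HEART, SPADE };  /* four named constants */

int main(void) {
    enum suit s = HEART;
    switch (s) {                /* compilers can warn on missing cases */
    case CLUB:    printf("club\n");    break;
    case DIAMOND: printf("diamond\n"); break;
    case HEART:   printf("heart\n");   break;
    case SPADE:   printf("spade\n");   break;
    }
    return 0;
}
```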

Character in Computer Terminology

In computing and telecommunications, a character is a unit of information corresponding to a grapheme, symbol, or grapheme-like unit in the written form of a natural language. Characters include letters, digits, punctuation marks, whitespace, and control characters for text formatting, and they are often combined into strings. Historically, the term character denoted a specific number of bits, with 5-bit and 6-bit character codes among the options. Modern systems encode characters as variable-length sequences of fixed-size units, as in the Unicode encoding UTF-8.

Encoding

Computers represent characters using character encodings like ASCII and UTF-8, mapping characters to integers or bit sequences. Morse code represents characters using electrical impulses.

Terminology

The term glyph describes the visual appearance of a character. With Unicode, a character is viewed as a unit of information independent of its visual manifestation. Unicode differentiates between abstract characters, graphemes, and glyphs.

Combining Character

Unicode also addresses combining characters: a character such as 'ï' can be coded either as a single precomposed character or as a base character followed by a combining diacritic.
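
The two codings can be compared directly. A minimal C sketch, writing the UTF-8 bytes explicitly (the byte values follow from the code points named above):

```c
#include <stdio.h>
#include <string.h>

int main(void) {
    const char *precomposed = "\xC3\xAF";  /* U+00EF, one code point    */
    const char *decomposed  = "i\xCC\x88"; /* U+0069 + U+0308 combining */

    /* Both render the same grapheme, but the byte strings differ,
     * so a naive byte comparison reports them unequal (prints 0): */
    printf("%d\n", strcmp(precomposed, decomposed) == 0);
    return 0;
}
```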

char in the C Programming Language

In C, a char is a data type with a size of one byte, typically 8 bits. It holds any member of the basic execution character set. In newer standards, char is required to hold UTF-8 code units. Unicode code points may require more than one byte.
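
A small C sketch of these points (`sizeof(char)` is 1 by definition; the two-byte UTF-8 sequence is written with explicit byte escapes):

```c
#include <stdio.h>
#include <string.h>

int main(void) {
    printf("%zu\n", sizeof(char));     /* always 1 */

    const char *s = "\xC3\xA9";        /* "é" (U+00E9) encoded in UTF-8 */
    printf("%zu bytes\n", strlen(s));  /* 2: one code point, two chars */
    return 0;
}
```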

Floating-Point Arithmetic in Computing

In computing, floating-point arithmetic (FP) represents subsets of real numbers using an integer with a fixed precision, known as the significand, scaled by an integer exponent of a fixed base. Numbers of this form are called floating-point numbers. Commonly, base two (binary) and base ten (decimal floating point) are used. Floating-point arithmetic operations approximate real number operations by rounding results to nearby floating-point numbers. The term "floating point" refers to the radix point's ability to "float" anywhere among the significant digits, indicated by the exponent, similar to scientific notation. Floating-point systems accommodate numbers of varying orders of magnitude, making them suitable for very small and very large real numbers, such as those in astronomy or atomic scales.

Overview of Floating-Point Numbers

IEEE 754 Standard

Over the years, various floating-point representations have been used. In 1985, the IEEE 754 Standard for Floating-Point Arithmetic was established, defining the most commonly encountered representations.

Floating-Point Unit (FPU)

A floating-point unit (FPU) is a part of a computer system designed for operations on floating-point numbers. The speed of floating-point operations, measured in FLOPS, is crucial for applications involving intensive mathematical calculations.

Number Representation Mechanisms

There are different mechanisms for representing numbers, including standard mathematical notation, fixed-point systems, and scientific notation. Floating-point representation is similar to scientific notation, with a scaling factor indicated separately at the end of the number.

Alternatives to Floating-Point Numbers

While floating-point representation is prevalent, there are alternatives such as fixed-point representation, logarithmic number systems (LNSs), tapered floating-point representation, and rational arithmetic. Interval arithmetic allows representing numbers as intervals, providing guaranteed bounds on results.

Floating-Point Numbers and IEEE 754 Standard

A floating-point number consists of two fixed-point components: the significand and the exponent. The range of representable values depends linearly on the range of the significand and exponentially on the range of the exponent.
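
In symbols (a standard formulation, stated here for concreteness): a floating-point value with significand $s$, base $B$, and exponent $e$ denotes the number $s \times B^{e}$; for example, $1.2345 = 12345 \times 10^{-4}$.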

Representation Details

On a typical computer system, a double-precision (64-bit) binary floating-point number has a 53-bit significand (coefficient), an 11-bit exponent, and 1 sign bit. The positive normal floating-point numbers in this format range from $2^{-1022}$ to just under $2^{1024}$. The number of normal floating-point numbers in a system with base $B$, precision $P$, smallest exponent $L$, and largest exponent $U$ is $2(B-1)B^{P-1}(U-L+1)$.

Special Values

There is a smallest positive normal floating-point number (Underflow level) given by $UFL = B^{L}$. The largest floating-point number (Overflow level) is given by $OFL = (1-B^{-P})(B^{U+1})$. There are representable values strictly between $-UFL$ and $UFL$, including positive and negative zeros, as well as subnormal numbers.
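
As a worked instance of these formulas, assuming the binary64 parameters quoted above ($B = 2$, $P = 53$, $L = -1022$, $U = 1023$): $UFL = 2^{-1022} \approx 2.2 \times 10^{-308}$, $OFL = (1 - 2^{-53}) \cdot 2^{1024} \approx 1.8 \times 10^{308}$, and the count of normal numbers is $2 \cdot 1 \cdot 2^{52} \cdot 2046$.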

IEEE 754 Standard

The IEEE 754 Standard, established in 1985 and revised in 2008 and again in 2019, standardized the computer representation of binary floating-point numbers. It includes formats such as single precision (binary32), double precision (binary64), and double extended.

Floating-Point Formats

IEEE 754 formats include 16-bit, 32-bit, 64-bit, 128-bit, and 256-bit representations. Notable formats include binary16, binary32, binary64, binary128, and binary256.

Internal Representation

Floating-point numbers are packed into a computer datum with a sign bit, an exponent field, and a significand. The exponent is stored as an unsigned number with a fixed bias added to it. The IEEE binary interchange formats use a hidden or implicit bit, resulting in 24 bits of precision for single precision, 53 for double precision, and 113 for quad precision.
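
These fields can be unpacked directly. A minimal C sketch for binary64 (assumes IEEE 754 doubles; the bits are reinterpreted via `memcpy`, and the exponent bias is 1023):

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    double x = 6.5;                  /* 1.101 (binary) * 2^2 */
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);  /* reinterpret, no conversion */

    uint64_t sign     = bits >> 63;                /* 1 bit   */
    uint64_t exponent = (bits >> 52) & 0x7FF;      /* 11 bits, biased */
    uint64_t fraction = bits & 0xFFFFFFFFFFFFFULL; /* 52 bits, hidden
                                                      leading 1 not stored */

    printf("sign=%llu exp=%lld frac=0x%llx\n",
           (unsigned long long)sign,
           (long long)exponent - 1023,             /* remove the bias */
           (unsigned long long)fraction);
    return 0;
}
```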

Comparison and Special Values

Comparison of floating-point numbers follows IEEE standard rules, with positive and negative zeros considered equal, and every NaN comparing unequal to every value, including itself. Special values include positive infinity, negative infinity, negative zero, and NaNs.
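
A brief C check of these rules (IEEE 754 semantics assumed):

```c
#include <math.h>
#include <stdio.h>

int main(void) {
    double nan = NAN, pz = 0.0, nz = -0.0;

    printf("%d\n", nan == nan);            /* 0: NaN unequal even to itself */
    printf("%d\n", pz == nz);              /* 1: +0 and -0 compare equal */
    printf("%d\n", isinf(1.0 / pz) != 0);  /* 1: finite / +0 is infinity */
    return 0;
}
```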

Floating-Point Numbers: Characteristics and Conversions

By their nature, all numbers expressed in floating-point format are rational numbers with a terminating expansion in the relevant base (a terminating decimal expansion in base 10, a terminating binary expansion in base 2). Irrational numbers such as $\pi$ or $\sqrt{2}$, along with non-terminating rationals, must be approximated. The precision, specified in digits or bits, also limits which rationals can be represented exactly: the decimal number 123456789, for instance, cannot be represented exactly with only eight decimal digits of precision and is rounded to one of the straddling representable values.

When a number is written in a format not natively supported by a computer's floating-point implementation, a conversion is required. If the number has an exact representation in the floating-point format, the conversion is exact; otherwise a nearby floating-point number must be chosen, producing a rounded value.

Whether the expansion of a rational number terminates depends on the base. In base 10, 1/2 has a terminating expansion (0.5), while 1/3 does not (0.333...). In base 2, only rationals whose denominators are powers of 2 terminate; any rational whose denominator has a prime factor other than 2 has an infinite binary expansion. Consequently, seemingly concise decimal numbers may need to be approximated when converted to binary floating-point. The decimal number 0.1, for example, is not representable in binary floating-point of any finite precision, because its exact binary expansion repeats the pattern "1100" endlessly. Similarly, the real number $\pi$ has an infinite binary expansion; rounded to a precision of 24 bits, its floating-point approximation deviates from the true value by about 0.03 parts per million.

In floating-point arithmetic, the arithmetical difference between two consecutive representable floating-point numbers with the same exponent is called a unit in the last place (ULP). For example, in base 2 with single precision, one ULP is exactly $2^{-23}$, or about $10^{-7}$.

Rounding is required whenever the exact result of a floating-point operation would need more digits than the significand provides. IEEE 754 mandates correct rounding: the result must be as if computed with infinitely precise arithmetic and then rounded. The default method rounds to the nearest representable value, with ties broken by rounding to even, and IEEE 754 requires this consistent rounding across all fundamental algebraic operations. Other rounding modes are available, including rounding toward positive or negative infinity and truncation; these alternative modes are useful for bounding errors and diagnosing numerical instability.

Converting between decimal and binary floating-point formats, especially with the minimal number of digits, is a notably hard problem. Algorithms such as Dragon4, dtoa.c, Grisu3, Errol3, Ryū, and Schubfach have been introduced to produce accurate and minimal output, and accurate parsers for reading a decimal string back into a binary floating-point value have emerged in later work.
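
The 0.1 and ULP points above can be observed directly; a small C sketch (IEEE 754 doubles assumed):

```c
#include <float.h>
#include <math.h>
#include <stdio.h>

int main(void) {
    /* 0.1 has no finite binary expansion; the stored value is the
     * nearest representable double, visible with enough digits: */
    printf("%.17g\n", 0.1);              /* 0.10000000000000001 */

    /* One unit in the last place (ULP) just above 1.0: */
    double next = nextafter(1.0, 2.0);   /* smallest double above 1.0 */
    printf("%g\n", next - 1.0);          /* 2^-52, i.e. DBL_EPSILON */
    printf("%d\n", next - 1.0 == DBL_EPSILON);  /* 1 */
    return 0;
}
```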

Floating-Point Numbers: Exception Handling and Accuracy Issues

Floating-point computation in a computer can encounter three types of problems:
  1. Undefined Operations: Such as ∞/∞ or division by zero.
  2. Unsupported Operations: Operations like calculating the square root of −1 or the inverse sine of 2, not supported by the specific format.
  3. Representation Issues: Results impossible to represent due to an exponent too large or too small, leading to overflow, underflow, or denormalization.
Before the IEEE standard, these conditions would often terminate the program or trigger traps. IEEE 754 introduced exception handling, recording arithmetic exceptions in "sticky" status flag bits. These sticky flags persist until explicitly reset, allowing delayed testing of exceptional conditions after a full floating-point expression or subroutine.
IEEE 754 specifies five arithmetic exceptions recorded in status flags:
  1. Inexact: Rounded value differs from the exact result.
  2. Underflow: Rounded value is tiny and inexact.
  3. Overflow: Absolute value of the rounded value is too large to be represented.
  4. Divide-by-Zero: Result is infinite given finite operands.
  5. Invalid: Real-valued result cannot be returned, e.g., sqrt(−1) or 0/0, resulting in a quiet NaN.
Default return values are designed to provide correct results in most cases. For instance, divide-by-zero returns infinity, and inexact returns a correctly rounded result. Overflow and invalid exceptions may not necessarily represent errors but situations like evaluating functions outside their domain.
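
The sticky flags can be inspected from C through `<fenv.h>`; a minimal sketch (the `FENV_ACCESS` pragma is the standard way to request flag access, though not every compiler honors it):

```c
#include <fenv.h>
#include <stdio.h>

#pragma STDC FENV_ACCESS ON

int main(void) {
    feclearexcept(FE_ALL_EXCEPT);

    volatile double zero = 0.0;      /* volatile blocks constant folding */
    volatile double r = 1.0 / zero;  /* raises divide-by-zero */
    (void)r;

    if (fetestexcept(FE_DIVBYZERO))
        printf("divide-by-zero flag set\n");  /* sticky until cleared */
    if (!fetestexcept(FE_INVALID))
        printf("invalid flag clear\n");
    return 0;
}
```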

Accuracy Problems

Floating-point numbers' inability to represent all real numbers accurately, coupled with imprecise arithmetic operations, leads to several challenges. For example, the decimal numbers 0.1 and 0.01 cannot be represented exactly in binary floating-point numbers. Accuracy issues also arise from the non-associativity and non-distributivity of floating-point addition and multiplication. Subtraction of nearly equal operands can result in significant loss of accuracy, a common and serious problem. Additionally, conversions to integers and testing for safe division or equality are problematic.
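
A brief C illustration of the non-associativity and cancellation just described (IEEE 754 doubles assumed):

```c
#include <stdio.h>

int main(void) {
    /* Addition is not associative: grouping changes the result. */
    double x = (0.1 + 0.2) + 0.3;
    double y = 0.1 + (0.2 + 0.3);
    printf("%d\n", x == y);              /* 0 */
    printf("%.17g\n%.17g\n", x, y);

    /* Near-equal magnitudes swallow the small addend entirely: */
    double big = 1.0e16;
    printf("%g\n", (big + 1.0) - big);   /* 0 or 2, never the true 1 */
    return 0;
}
```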

Incidents

Notably, on February 25, 1991, a loss of significance in a MIM-104 Patriot missile battery contributed to the failure to intercept an incoming Scud missile in Dhahran, Saudi Arabia, resulting in the deaths of 28 soldiers.

Machine Precision and Backward Error Analysis

Machine precision, denoted $\varepsilon_{\text{mach}}$, characterizes the accuracy of a floating-point system and is central to backward error analysis: it bounds the relative error in representing any non-zero real number within the normalized range. Backward error analysis, popularized by James H. Wilkinson, is used to establish that an algorithm implementing a numerical function is numerically stable, by showing that the calculated result, though not exactly correct due to roundoff errors, is the exact solution to a nearby problem with slightly perturbed input data.
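
Stated as a formula (a standard definition, added for concreteness): for any non-zero real $x$ in the normalized range, the rounded value $\mathrm{fl}(x)$ satisfies $|\mathrm{fl}(x) - x| / |x| \le \varepsilon_{\text{mach}}$, where under round-to-nearest $\varepsilon_{\text{mach}} = \frac{1}{2}B^{1-P}$ (about $1.1 \times 10^{-16}$ for binary64).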

Fixed-Point in Computing

In computing, fixed-point is an approach for representing fractional (non-integer) numbers by storing a fixed number of digits of their fractional part. It contrasts with the more complex and computationally demanding floating-point representation. Fixed-point representation is often used in specific scenarios: on low-cost embedded microprocessors; in applications requiring high speed and/or low power consumption (e.g., image, video, and digital signal processing); or when it aligns more naturally with the problem at hand.

In fixed-point representation, the fractional part is expressed in the same number base as the integer part, commonly decimal or binary. The choice of scaling factor, which determines what the stored value is implicitly multiplied by, depends on the precision needed and the range of values to be stored. Scaling factors are usually powers of the base used for the internal integer representation, but occasionally other factors, such as powers of 10, are chosen for human convenience.

Fixed-point computations can offer advantages in speed, hardware efficiency, and portability over floating-point operations. Many embedded processors lack a floating-point unit (FPU), making fixed-point arithmetic a practical choice, and fixed-point computations are often more predictable and portable across different systems.

Applications of fixed-point representation include storing monetary values, thanks to its straightforward handling of rounding rules; real-time computing in mathematically intensive tasks such as flight simulation; custom-made microprocessors in DSP applications; and instruments such as electricity meters and digital clocks, where compensating for introduced errors is crucial.

Operations in fixed-point arithmetic include addition, subtraction, multiplication, division, and scaling conversion. Hardware support for fixed-point arithmetic is not always explicit, but processors with binary arithmetic typically offer fast bit-shift instructions that can be used for scaling.

While some older programming languages provide explicit support for fixed-point types, more modern languages tend to rely on floating-point processors, although there have been efforts to extend languages such as C with fixed-point data types for embedded applications. Relational databases and SQL notation commonly support fixed-point decimal arithmetic and storage. In summary, fixed-point representation, with its fixed scaling factors, serves specific needs in computing, providing efficiency and predictability in a range of applications.
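
As an illustration of scaling, here is a minimal C sketch of a hypothetical Q16.16 fixed-point type (16 integer bits, 16 fractional bits; the format choice and helper names are illustrative):

```c
#include <stdint.h>
#include <stdio.h>

typedef int32_t q16_16;              /* value x stored as x * 2^16 */
#define Q_ONE (1 << 16)

static q16_16 q_mul(q16_16 a, q16_16 b) {
    /* Widen before multiplying, then shift the extra scale back out. */
    return (q16_16)(((int64_t)a * b) >> 16);
}

int main(void) {
    q16_16 a = (q16_16)(1.5  * Q_ONE);   /* 1.5  ->  98304 */
    q16_16 b = (q16_16)(2.25 * Q_ONE);   /* 2.25 -> 147456 */
    q16_16 c = q_mul(a, b);              /* 3.375 */

    printf("%f\n", c / (double)Q_ONE);   /* 3.375000 */
    return 0;
}
```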

Notations

Various notations have been used to specify fixed-point formats. The most common is the Q notation, in which Qm.n denotes a format with m integer bits and n fractional bits (conventions differ on whether the sign bit is counted among the integer bits).

Integer in Computing

In computer science, an integer is a datum of integral data type, representing a range of mathematical integers. Integral data types vary in size and may or may not allow negative values. Integers are commonly represented as binary digits (bits) in computer memory, and the grouping size varies between computers. Computer hardware provides ways to represent processor registers or memory addresses as integers.

Value and Representation

The value of an integral type is the corresponding mathematical integer. Integral types may be unsigned (representing non-negative integers) or signed (capable of representing negative integers). Programming languages allow specifying integer values in source code using digits, with optional prefixes like + or -. Some languages support alternative notations, such as hexadecimal or octal, and may permit digit group separators. The internal representation of an integer in computer memory has minimum and maximum possible values. Positive integers are commonly represented as binary strings, and the width or precision of an integral type is the number of bits in its representation. Different encoding schemes, like binary-coded decimal or Gray code, may also be used. Four well-known representations of signed numbers in binary systems are two's complement, offset binary, sign-magnitude, and ones' complement. Two's complement is the most common, providing a one-to-one correspondence between representations and values, without separate +0 and -0.
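
A small C sketch of the two's complement facts above (fixed-width types from `<stdint.h>`; conversion to unsigned is defined to be modular, which matches the two's complement bit pattern):

```c
#include <stdint.h>
#include <stdio.h>

int main(void) {
    int8_t  x    = -1;
    uint8_t bits = (uint8_t)x;       /* same 8 bits, reinterpreted */
    printf("%u\n", (unsigned)bits);  /* 255: all bits set */

    /* Two's complement negation: invert the bits and add one. */
    int8_t five = 5;
    int8_t neg  = (int8_t)(~five + 1);
    printf("%d\n", neg);             /* -5 */
    return 0;
}
```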

Language Specifics

Integer sizes in computer languages may be machine-independent or vary depending on the underlying processor word size. Not all languages define variables of all integer sizes, and sizes may differ between languages or processors. Some older architectures used decimal representations (BCD), and modern CPUs may offer limited support for decimal integers. Decimal integers can have fixed or variable-length representations, and their support depends on the architecture.

Common Integral Data Types

Different CPUs support different integral data types. High-level programming languages offer more possibilities, including a 'double width' integral type with twice the bits of the largest hardware-supported type. Languages may also support bit-field types and range types. Some languages, such as Lisp, Smalltalk, REXX, Haskell, Python, and Raku, support arbitrary-precision integers (bignums); others lack top-level support but provide libraries for large numbers (e.g., Java's BigInteger or Perl's "bigint").

A Boolean or flag type represents only two values, 0 and 1, identified with false and true; it can be stored in memory using a single bit or, for convenience, a full byte. A four-bit quantity is a nibble (or nybble), corresponding to one hexadecimal digit and holding one digit or a sign code in binary-coded decimal.
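
As a sketch of a bit-field type in C (the field widths and names are illustrative; the exact memory layout is implementation-defined):

```c
#include <stdio.h>

struct packed_date {        /* several small integers share one word */
    unsigned day   : 5;     /* 0..31   */
    unsigned month : 4;     /* 0..15   */
    unsigned year  : 12;    /* 0..4095 */
};

int main(void) {
    struct packed_date d = { 25, 2, 1991 };
    printf("%u-%02u-%02u\n",
           (unsigned)d.year, (unsigned)d.month, (unsigned)d.day);
    printf("%zu bytes\n", sizeof d);    /* typically 4 */
    return 0;
}
```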

Reference

In computer programming, a reference is a value that allows a program to indirectly access specific data, such as a variable's value or a record, stored in the computer's memory or on another storage device. The reference points to the data, and the process of accessing the data through the reference is called dereferencing. Importantly, a reference is distinct from the data it refers to.

A reference is an abstract data type and can be implemented in various ways. Typically it refers to data stored in memory, with its internal value being the memory address of the data, effectively making it a pointer. Other implementations include an offset between the data's address and a fixed "base" address, an index or identifier used for array or table lookup, an operating system handle, a physical address on a storage device, or a network address such as a URL.

Formally, a reference R is a value that supports an operation dereference(R) yielding a value. References are often typed, specifying the type of values they can point to; an interface in a programming language may, for instance, define a generic Reference with operations like value(). In addition to dereferencing, references may support an assignment operation store(R, x), in which case they behave as abstract variables.

References find widespread use in programming, especially for efficiently passing large or mutable data as arguments to procedures, and for sharing such data among various parts of a program. They allow flexibility in where objects are stored, how they are allocated, and how they are passed between areas of code, as long as one has access to a reference pointing to the data.

Despite their utility, references can introduce complexity through issues such as dangling and wild references, and the topology of data with references forms a directed graph whose analysis can be involved. Nevertheless, references are simpler to analyze than pointers, largely because there is no pointer arithmetic.

Examples of references include pointers, the most fundamental and powerful kind, which require a good understanding of memory architecture to avoid problems like dangling and wild pointers. Smart pointers act like pointers but provide controlled access through specific methods, reducing some of the risks of raw pointers. Handles are abstract references; file handles are a common example, abstracting file content and possibly representing both the file itself and a specific position within it. In distributed computing, a reference may contain more than an address, including embedded specifications of network protocols and encoding methods; a WSDL description of a remote web service, for instance, serves as a reference containing instructions on how to locate and bind to the service.

References can be modeled as functions from keys to data objects, where null represents a key that refers to nothing meaningful. An alternative representation is a reachability graph, a directed graph useful in garbage collection for distinguishing accessible from inaccessible objects.

References also figure in the choice between internal and external storage. Internal storage places the contents of smaller objects inside larger objects, promoting efficiency but with space and time costs; external storage allocates smaller objects independently, with the larger object storing references to them. The choice between the two depends on factors such as recursive data structures, limited space, varying object sizes, and adaptability to new requirements. Some programming languages, such as Java, Smalltalk, Python, and Scheme, uniformly use references to access all objects.

Enumerated Type

In computer programming, an enumerated type (also called enumeration, enum, or factor in the R programming language, and a categorical variable in statistics) is a data type consisting of a set of named values, known as elements, members, enumerals, or enumerators of the type. The enumerator names usually act as identifiers that function as constants in the language. An enumerated type can be viewed as a degenerate tagged union of a unit type. A variable declared with an enumerated type can be assigned any of the enumerators as its value. The values of an enumerated type are distinct and can be compared and assigned, but their concrete representation in the computer's memory is not specified by the programmer; compilers and interpreters may represent them arbitrarily.

For example, the four suits in a deck of playing cards may be represented by enumerators named Club, Diamond, Heart, and Spade, belonging to an enumerated type named suit. If a variable V is declared with suit as its data type, any of those four values can be assigned to it. Although enumerators are usually distinct, some languages allow the same enumerator to be listed more than once in a type's declaration. The names of enumerators need not be semantically complete or compatible: an enumerated type named color may consist of enumerators such as Red, Green, Zebra, Missing, and Bacon. Some languages define an ordering of the enumerators, while others leave them unordered; in certain cases an implicit ordering arises from the compiler representing enumerators as integers.

Certain enumerated types may be intrinsic to the language. The Boolean type, for instance, is often a predefined enumeration of the values False and True, and a unit type consisting of a single value may be defined to represent null. Many languages also let users define their own enumerated types. Values and variables of an enumerated type are usually implemented using an integer type as the underlying representation, and in some languages, especially systems programming languages, users can specify the bit combinations to be used for each enumerator, allowing sets of enumerators to be represented efficiently as fixed-length bit strings. In type theory, enumerated types are often regarded as tagged unions of unit types, and they can be represented by the natural numbers.
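
A minimal C sketch of the bit-combination idea just mentioned (names are illustrative): enumerators are given power-of-two values so that a set of them fits in one integer as a fixed-length bit string:

```c
#include <stdio.h>

enum permission {
    PERM_READ  = 1 << 0,
    PERM_WRITE = 1 << 1,
    PERM_EXEC  = 1 << 2
};

int main(void) {
    unsigned flags = PERM_READ | PERM_WRITE;  /* the set {read, write} */

    if (flags & PERM_WRITE)
        printf("writable\n");
    if (!(flags & PERM_EXEC))
        printf("not executable\n");
    return 0;
}
```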

Rationale:

In the early days of programming languages, enumerated types were not always available. If a programmer wanted a variable, say `myColor`, to have a value of red, they would declare a variable `red`, assign it some arbitrary value (usually an integer constant), and then assign `red` to `myColor`. Other techniques assigned arbitrary values to strings containing the enumerators' names. These arbitrary values were often called magic numbers because there was usually no explanation of how the numbers were obtained or whether their actual values were significant. Such magic numbers can make source code harder for others to comprehend and maintain.

Enumerated types, by contrast, make the code more self-documenting. Depending on the language, the compiler can automatically assign default values to the enumerators, hiding unnecessary detail from the programmer; these values might not even be visible to the programmer (see information hiding). Enumerated types can also prevent a programmer from writing illogical code, such as performing mathematical operations on the values of the enumerators. If the value of a variable holding an enumerator is printed, some programming languages print the name of the enumerator rather than its underlying numerical value. A further advantage is that enumerated types allow compilers to enforce semantic correctness: `myColor = TRIANGLE` can be forbidden while `myColor = RED` is accepted, even if `TRIANGLE` and `RED` are both internally represented as 1.

Conceptually, an enumerated type is similar to a list of nominals (numeric codes), in which each possible value of the type is assigned a distinctive natural number; the enumerated type is a concrete implementation of this notion. When order is meaningful and/or used for comparison, an enumerated type becomes an ordinal type.
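
A small C contrast of the two styles (illustrative names):

```c
#include <stdio.h>

/* Magic number: the intent behind the value 1 is invisible.
 *     int my_color = 1;
 * Enumeration: self-documenting, with values assigned by the compiler. */
enum color { RED, GREEN, BLUE };

int main(void) {
    enum color my_color = RED;       /* clearly a color, not a bare 1 */
    printf("%d\n", my_color == RED); /* 1 */
    return 0;
}
```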

Conventions:

Programming languages often have their own styles and naming conventions. The variable assigned to an enumeration is usually a noun in singular form, and it frequently follows either a PascalCase or an uppercase convention; lowercase and other conventions are seen less frequently.