Search This Blog

Thursday, April 15, 2010

checksum from wikipedia for Itn 155 class, chapter 7

http://en.wikipedia.org/wiki/Checksum[close]
Checksum
From Wikipedia, the free encyclopedia
Jump to: navigation, search
This article does not cite any references or sources.
Please help improve this article by adding citations to reliable sources. Unsourced material may be challenged and removed. (January 2008)
Effect of a typical checksum function (the Unix cksum utility).

A checksum or hash sum is a fixed-size datum computed from an arbitrary block of digital data for the purpose of detecting accidental errors that may have been introduced during its transmission or storage. The integrity of the data can be checked at any later time by recomputing the checksum and comparing it with the stored one. If the checksums do not match, the data was almost certainly altered (either intentionally or unintentionally).

The procedure that yields the checksum from the data is called a checksum function or checksum algorithm. A good checksum algorithm will yield a different result with high probability when the data is accidentally corrupted; if the checksums match, the data is very likely to be free of accidental errors.

Checksum functions are related to hash functions, fingerprints, randomization functions, and cryptographic hash functions. However, each of those concepts has different applications and therefore different design goals. Check digits and parity bits are special cases of checksums, appropriate for small blocks of data (such as Social Security numbers, bank account numbers, computer words, single bytes, etc.). Some error-correcting codes are based on special checksums that not only detect common errors but also allow the original data to be recovered in certain cases.
Contents
[hide]

* 1 Applications
* 2 Checksum algorithms
o 2.1 Parity byte or parity word
o 2.2 Modular sum
o 2.3 Position-dependent checksums
o 2.4 General considerations
* 3 Checksum tools
* 4 See also

[edit] Applications

The goal of checksum algorithms is to detect limited and accidental modification such as corruption to stored data or errors in a communication channel. They are not designed to detect intentional corruption by a malicious agent, nor are they reliable in detecting mass changes by an authorized agent. Indeed, many checksum algorithms can be easily inverted, in the sense that one can easily modify the data so as to preserve its checksum. To guard against malicious changes one should use a cryptographic hash function. A checksum algorithm is ultimately a mapping function where the domain of all the possible input values is mapped to the fixed and usually much smaller range of possible checksum values. This is necessarily a hugely many-to-one mapping so there will always be many different data inputs that compute to the same checksum. If there are 5000 instances of "Monday" in the input, a checksum is usually likely to catch a change of one of them to "Munday", but if all 5000 are changed to "Tuesday" it is quite possible for a given checksum algorithm to compute the same check sum for the second comprehensively altered input.
[edit] Checksum algorithms
[edit] Parity byte or parity word

The simplest checksum algorithm is the so-called longitudinal parity check, which breaks the data into "words" with a fixed number n of bits, and then computes the exclusive or of all those words. The result is appended to the message as an extra word. To check the integrity of a message, the receiver computes the exclusive or of all its words, including the checksum; if the result is not a word with n zeros, the receiver knows that a transmission error occurred.

With this checksum, any transmission error that flips a single bit of the message, or an odd number of bits, will be detected as an incorrect checksum. However, an error that affects two bits will not be detected if those bits lie at the same position in two distinct words. If the affected bits are independently chosen at random, the probability of a two-bit error being undetected is 1/n.
[edit] Modular sum

A variant of the previous algorithm is to add all the "words" as unsigned binary numbers, discarding any overflow bits, and append the two's complement of the total as the checksum. To validate a message, the receiver adds all the words in the same manner, including the checksum; if the result is not a word full of zeros, an error must have occurred. This variant too detects any single-bit error, but the probability that a two-bit error will go undetected is a little less than 1/n.
[edit] Position-dependent checksums

The simple checksums described above fail to detect some common errors that affect many bits at once, such as changing the order of data words, or inserting or deleting words with all bits set to zero. The checksum algorithms that are most used in practice, such as Fletcher's checksum, Adler-32, and cyclic redundancy checks (CRCs), address these weaknesses by considering not only the value of each word but also its position in the sequence. This feature generally increases the cost of computing the checksum.
[edit] General considerations

A message that is m bits long can be viewed as a corner of the m-dimensional hypercube. The effect of a checksum algorithm that yields an n-bit checksum is to map each m-bit message to a corner of a larger hypercube, with dimension m+n. The 2m+n corners of this hypercube represent all possible received messages. The valid received messages (those that have the correct checksum) comprise a smaller set, with only 2m corners.

A single-bit transmission error then corresponds to a displacement from a valid corner (the correct message and checksum) to one of the m adjacent corners. An error that affects k bits moves the message to a corner that is k steps removed from its correct corner. The goal of a good checksum algorithm is to spread the valid corners as far from each other as possible, so as to increase the likelihood that "typical" transmission errors will end up in an invalid corner.
[edit] Checksum tools

* cksum, a Unix command that generates both a 32-bit CRC and a byte count for any given input file.
* md5, a Unix command that generates an MD5 sum (commonly used to verify .iso files)
* jdigest, a Java GUI tool that generates and checks MD5 and SHA sums

[edit] See also

* Check digit
* File verification
* Hamming code
* List of hash functions
* Luhn algorithm
* Parity bit
* Frame check sequence
* Bit rot
* ZFS A file system that performs automatic file integrity checking using checksums

Retrieved from "http://en.wikipedia.org/wiki/Checksum"
Categories: Checksum algorithms
Hidden categories: Articles lacking sources from January 2008 | All articles lacking sources
Views

* Article
* Discussion
* Edit this page
* History

Personal tools

* Try Beta
* Log in / create account

Navigation

* Main page
* Contents
* Featured content
* Current events
* Random article

Search

Interaction

* About Wikipedia
* Community portal
* Recent changes
* Contact Wikipedia
* Donate to Wikipedia
* Help

Toolbox

* What links here
* Related changes
* Upload file
* Special pages
* Printable version
* Permanent link
* Cite this page

Languages

* Afrikaans
* العربية
* Български
* Català
* Česky
* Dansk
* Deutsch
* Eesti
* Español
* فارسی
* Français
* 한국어
* Íslenska
* Italiano
* עברית
* Nederlands
* 日本語
* ‪Norsk (bokmål)‬
* Polski
* Português
* Русский
* Simple English
* Slovenčina
* Suomi
* Svenska
* Tiếng Việt
* 中文

Powered by MediaWiki
Wikimedia Foundation

* This page was last modified on 15 April 2010 at 15:03.
* Text is available under the Creative Commons Attribution-ShareAlike License; additional terms may apply. See Terms of Use for details.
Wikipedia® is a registered trademark of the Wikimedia Foundation, Inc., a non-profit organization.
* Contact us
* Privacy policy
* About Wikipedia
* Disclaimers

No comments:

Post a Comment