Tuesday, July 6, 2010

Programming Language

A programming language is an artificial language designed to express computations that can be performed by a machine, particularly a computer. Programming languages can be used to create programs that control the behavior of a machine, to express algorithms precisely, or as a mode of human communication.

Many programming languages have some form of written specification of their syntax (form) and semantics (meaning). Some languages are defined by a specification document. For example, the C programming language is specified by an ISO Standard. Other languages, such as Perl, have a dominant implementation that is used as a reference.

The earliest programming languages predate the invention of the computer, and were used to direct the behavior of machines such as Jacquard looms and player pianos. Thousands of different programming languages have been created, mainly in the computer field, with many more being created every year. Most programming languages describe computation in an imperative style, i.e., as a sequence of commands, although some languages, such as those that support functional programming or logic programming, use alternative forms of description.


Definition


A programming language is a notation for writing programs, which are specifications of a computation or algorithm. Some, but not all, authors restrict the term "programming language" to those languages that can express all possible algorithms. Traits often considered important for what constitutes a programming language include:

* Function and target: A computer programming language is a language used to write computer programs, which involve a computer performing some kind of computation or algorithm and possibly controlling external devices such as printers, disk drives, robots, and so on. For example, PostScript programs are frequently created by another program to control a computer printer or display. More generally, a programming language may describe computation on some, possibly abstract, machine. It is generally accepted that a complete specification for a programming language includes a description, possibly idealized, of a machine or processor for that language.[6] In most practical contexts, a programming language involves a computer; consequently programming languages are usually defined and studied this way. Programming languages differ from natural languages in that natural languages are only used for interaction between people, while programming languages also allow humans to communicate instructions to machines.

* Abstractions: Programming languages usually contain abstractions for defining and manipulating data structures or controlling the flow of execution. The practical necessity that a programming language support adequate abstractions is expressed by the abstraction principle; this principle is sometimes formulated as a recommendation to the programmer to make proper use of such abstractions.

* Expressive power: The theory of computation classifies languages by the computations they are capable of expressing. All Turing complete languages can implement the same set of algorithms. ANSI/ISO SQL and Charity are examples of languages that are not Turing complete, yet often called programming languages.
Markup languages like XML, HTML, or troff, which define structured data, are not generally considered programming languages. Programming languages may, however, share syntax with markup languages if a computational semantics is defined. XSLT, for example, is a Turing complete XML dialect. Moreover, LaTeX, which is mostly used for structuring documents, also contains a Turing complete subset.

The term computer language is sometimes used interchangeably with programming language. However, the usage of both terms varies among authors, including the exact scope of each. One usage describes programming languages as a subset of computer languages.[21] In this vein, languages used in computing that have a different goal than expressing computer programs are generically designated computer languages. For instance, markup languages are sometimes referred to as computer languages to emphasize that they are not meant to be used for programming.[22] Another usage regards programming languages as theoretical constructs for programming abstract machines, and computer languages as the subset thereof that runs on physical computers, which have finite hardware resources.[23] John C. Reynolds emphasizes that formal specification languages are just as much programming languages as are the languages intended for execution. He also argues that textual and even graphical input formats that affect the behavior of a computer are programming languages, despite the fact that they are commonly not Turing-complete, and remarks that ignorance of programming language concepts is the reason for many flaws in input formats.[24]



Machine language

For the first machines in the 1940s, programmers had no choice but to write in the sequences of digits that the computer executed. For example, assume we want to compute the absolute value of A + B − C, where A is the value at machine address 3012, B is the value at address 3013, and C is the value at address 3014, and then store this value at address 3015.

It should be clear that programming in this manner is difficult and fraught with errors. Explicit memory locations must be written, and it is not always obvious if simple errors are present. For example, at location 02347, writing 101… instead of 111… would compute |A + B + C| rather than what was desired. This is not easy to detect.

Assembly language

Since each component of a program stands for an object that the programmer understands, using its name rather than numbers should make it easier to program. By naming all locations with easy-to-remember names, and by using symbolic names for machine instructions, some of the difficulties of machine programming can be eliminated. A relatively simple program called an assembler converts this symbolic notation into an equivalent machine language program.

The symbolic nature of assembly language greatly eased the programmer's burden, but programs were still very hard to write. Mistakes were still common. Programmers were forced to think in terms of the computer's architecture rather than in the domain of the problem being solved.

High-level language

The first high-level programming languages were developed in the late 1950s. The idea was that if we wanted to compute |A + B − C| and store the result in a memory location called D, all we had to do was write D = |A + B − C| and let a computer program, the compiler, convert that into the sequences of numbers that the computer could execute. FORTRAN (an acronym for Formula Translation) was the first major language in this period.

FORTRAN statements were patterned after mathematical notation. In mathematics the = symbol implies that both sides of the equation have the same value. However, in FORTRAN and some other languages, the equal sign is known as the assignment operator. The action carried out by the computer when it encounters this operator is, “Make the variable named on the left of the equal sign have the same value as the expression on the right.” Because of this, in some early languages the statement would have been written as |A + B − C| → D to imply movement or change, but the use of → as an assignment operator has all but disappeared.

The compiler for FORTRAN converts that arithmetic statement into an equivalent machine language sequence. In this case, we did not care what addresses the compiler used for the instructions or data, as long as we could associate the names A, B, C, and D with the data values we were interested in.
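
As a rough sketch of the same idea in a modern high-level language (C++ here rather than FORTRAN; the sample values are illustrative), the whole computation becomes a single assignment:

#include <cstdlib>    // std::abs
#include <iostream>

int main() {
    int A = 7, B = 2, C = 15;        // sample values; the compiler chooses their addresses
    int D = std::abs(A + B - C);     // the high-level form of D = |A + B - C|
    std::cout << D << '\n';          // prints 6
}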

Structure of programming languages

Programs written in a programming language contain three basic components: (1) a mechanism for declaring data objects to contain the information used by the program; (2) data operations that provide for transforming one data object into another; (3) an execution sequence that determines how execution proceeds from start to finish.

Data declarations

Data objects can be constants or variables. A constant always has a specific value. Thus the constant 42 always has the integer value of forty-two and can never have another value. On the other hand, we can define variables with symbolic names. The declaration of variable A as an integer informs the compiler that A should be given a memory location, much like the way A was given the machine address 3012 in the earlier machine-language example. The program is given the option of changing the value stored at this memory location as the program executes.

Each data object is defined to be of a specific type. The type of a data object is the set of values the object may have. Types can generally be scalar or aggregate. An object declared to be a scalar object is not divisible into smaller components, and generally it represents the basic data types executable on the physical computer. In a data declaration, each data object is given a name and a type. The compiler will choose what machine location to assign for the declared name.
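
As a rough illustration of such declarations (sketched here in C++; the names and values are arbitrary):

#include <string>

int main() {
    const int answer = 42;       // a constant: it always has the value forty-two
    int a = 0;                   // a scalar variable; the compiler chooses its memory location
    double rate = 7.5;           // a scalar of a different (floating-point) type
    std::string name = "Ada";    // an aggregate type built up from characters
    return 0;
}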

Data operations

Data operations provide for setting the values into the locations allocated for each declared data variable. In general this is accomplished by a three-step process: a set of operators is defined for transforming the value of each data object, an expression is written for performing several such operations, and an assignment is made to change the value of some data object.

For each data type, languages define a set of operations on objects of that type. For the arithmetic types, there are the usual operations of addition, subtraction, multiplication, and division. Other operations may include exponentiation (raising to a power), as well as simple functions such as modulo, or remainder (when dividing one integer by another). There may be other operations involving the internal binary format of the data, such as bitwise and, or, exclusive or, and not. Usually there are relational operations (for example, equal, not equal, greater than, less than) whose result is a boolean value of true or false. There is no limit to the number of operations allowed, except that the programming language designer has to weigh the simplicity and smallness of the language definition against the ease of using the language.
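
A brief C++ sketch of the kinds of operators just described, with illustrative values:

#include <iostream>

int main() {
    int a = 14, b = 5;
    int  sum  = a + b;     // arithmetic: addition
    int  rem  = a % b;     // modulo, the remainder after integer division
    int  bits = a & b;     // bitwise and, acting on the internal binary format
    bool test = a > b;     // relational: the result is a boolean value
    std::cout << sum << ' ' << rem << ' ' << bits << ' ' << test << '\n';   // prints 19 4 4 1
}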

Execution sequence

The purpose of a program is to manipulate some data in order to produce an answer. While the data operations provide for this manipulation, there must be a mechanism for deciding which expressions to execute in order to generate the desired answer. That is, an algorithm must trace a path through a series of expressions in order to arrive at an answer. Programming languages have developed three forms of execution sequencing: (1) control structures for determining execution sequencing within a procedure; (2) interprocedural communication between procedures; and (3) inheritance, or the automatic passing of information between two procedures.

Corrado Böhm and Giuseppe Jacopini showed in 1966 that a programming language needs only three basic statements for control structures: an assignment statement, an IF statement, and a looping construct. Anything else can simplify programming a solution, but is not necessary. If we add an input and an output statement, we have all that we need for a programming language. Languages execute statements sequentially, with the following variations to this rule.

IF statement. Most languages include the IF statement. In the IF-THEN statement, an expression is evaluated, and if its value is true, the statement in the THEN part is executed next. If the value is false, the statement after the IF statement is the next one to execute. The IF-THEN-ELSE statement is similar, except that separate statements are given to execute in the true and false cases. After executing either the THEN or the ELSE part, the statement following the IF statement is the next one to execute.

The usual looping constructs are the WHILE statement and the REPEAT statement. Although only one is necessary, languages usually have both.
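
The three basic statements identified by Böhm and Jacopini can be sketched in a few lines of C++ (the factorial computation is only an example):

#include <iostream>

int main() {
    int n = 5;
    int factorial = 1;                     // assignment

    if (n < 0) {                           // IF-THEN-ELSE: decide which statement executes next
        std::cout << "undefined\n";
    } else {
        while (n > 1) {                    // WHILE loop: repeat until the condition becomes false
            factorial = factorial * n;
            n = n - 1;
        }
        std::cout << factorial << '\n';    // prints 120
    }
    return 0;                              // after either branch, this statement following the IF executes
}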

Inheritance is the third major form of execution sequencing. In this case, information is passed automatically between program segments. This is the basis for the models used in the object-oriented languages C++ and Java.

Inheritance involves the concept of a class object. There are integer class objects, string class objects, file class objects, and so forth. Data objects are instances of these class objects. Objects inherit the properties of the objects from which they were created. Thus, if an integer object were designed with the methods (that is, functions) of addition and subtraction, each instance of an integer object would inherit those same functions. One would only need to develop these operations once and then the functionality would pass on to the derived object.

All objects are derived from one master class called Object. Object is the parent class of classes such as magnitude, collection, and stream. Magnitude is the parent of objects that have values, such as numbers, characters, and dates. Collections can be ordered collections, such as arrays, or unordered collections, such as sets. Streams are the parent objects of files. From this structure an entire class hierarchy can be developed.

If we develop a method for one class (for example, a print method for Object), then this method is inherited by all objects derived from that class. Therefore, it is not always necessary to define new functionality. If we create a new class of integer that, for example, represents the number of days in a year (from 1 to 366), then this new integer-like object will inherit all of the properties of integers, including the methods to add, subtract, and print values. It is this concept that has been built into C++, Java, and current object-oriented languages.
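
A small C++ sketch of this idea, using hypothetical class names (a Counter base class and a derived DayOfYear class), shows methods being written once and inherited:

#include <iostream>

// A simple base class with one piece of data and two methods.
class Counter {
public:
    explicit Counter(int v) : value(v) {}
    void add(int n) { value += n; }                      // developed once here...
    void print() const { std::cout << value << '\n'; }   // ...and here
protected:
    int value;
};

// An integer-like class (for example, the day of the year, 1 to 366).
// It inherits add() and print() from Counter without redefining them.
class DayOfYear : public Counter {
public:
    explicit DayOfYear(int day) : Counter(day) {}
};

int main() {
    DayOfYear d(41);
    d.add(1);       // inherited from Counter
    d.print();      // inherited from Counter; prints 42
}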

Once we build concepts around a class definition, we have a separate, self-contained package of functions. We are able to sell that package as new functionality that users may be willing to pay for rather than develop themselves. This leads to an economic model where companies can build add-ons for existing software, each add-on consisting of a set of class definitions that inherits from the parent classes. See also Object-oriented programming.

Current programming language models

C was developed at AT&T Bell Laboratories during the early 1970s. At the time, Ken Thompson was developing the UNIX operating system. Rather than writing the system in machine or assembly language, he wanted a high-level language. See also Operating system.

C has a structure like FORTRAN. A C program consists of several procedures, each consisting of statements such as IF, WHILE, and FOR. However, since the goal was to develop operating systems, a primary focus of C was to include operations that allow the programmer access to the underlying hardware of the computer. C includes a large number of operators to manipulate machine-level data in the computer, and it depends heavily on pointer (reference) variables, so that C programs are able to manipulate the addressing hardware of the machine.
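
The low-level access described above can be hinted at with a short, hedged C++ sketch; the variable names are illustrative, and the same pointer operations exist in C:

#include <iostream>

int main() {
    int word = 0x1234;
    int *p = &word;                  // a pointer variable holds the machine address of the data
    *p |= 0x0001;                    // change the value by going through its address
    std::cout << p << " holds " << std::hex << word << '\n';   // e.g. 0x7ffc... holds 1235
}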

C++ was developed in the early 1980s as an extension to C by Bjarne Stroustrup at AT&T Bell Labs. Each C++ class would include a record declaration as well as a set of associated functions. In addition, an inheritance mechanism was included in order to provide for a class hierarchy for any program.

By the early 1990s, the World Wide Web was becoming a significant force in the computing community, and web browsers were becoming ubiquitous. However, for security reasons, the browser was designed with the limitation that it could not affect the disk storage of the machine it was running on. All computations that a web page performed were carried out on the web server accessed by web address (its Uniform Resource Locator, or URL). That was to prevent web pages from installing viruses on user machines or inadvertently (or intentionally) destroying the disk storage of the user.

Java bears a strong similarity to C++, but has eliminated many of the problems of C++. The three major features addressed by Java are:

1. There are no reference variables, thus no way to explicitly reference specific memory locations. Storage is still allocated by creating new class objects, but this is implicit in the language, not explicit.
2. There is no procedure call statement; instead, one invokes a procedure using the member-of-class operation. A call to CreateAddress for class address would be encoded as address.CreateAddress( ), as sketched after this list.
3. A large class library exists for creating web-based objects.
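
The member-of-class call in point 2 can be sketched as follows (shown in C++ for consistency with the earlier examples; the Address class and its CreateAddress method are hypothetical names taken from the text):

#include <iostream>
#include <string>

class Address {
public:
    void CreateAddress() {             // a member function invoked through an object
        street = "unknown";
        std::cout << "address created\n";
    }
private:
    std::string street;
};

int main() {
    Address address;             // storage comes from creating a class object
    address.CreateAddress();     // the member-of-class invocation described above
}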

The Java bytecodes (called applets) are transmitted from the web server to the client's machine and then executed there. This saves transmission time, since the executing applet runs on the user's machine once it is downloaded, and it frees machine time on the server so it can process more web “hits” effectively. See also Client-server system.

Visual Basic, first released in 1991, grew out of Microsoft's GW-BASIC product of the 1980s. The language was organized around a series of events. Each time an event happened (for example, a mouse click or pulling down a menu), the program would respond with a procedure associated with that event. Execution happens in an asynchronous manner.

Although Prolog development began in 1970, its use did not spread until the 1980s. Prolog represents a very different model of program execution; it depends on the resolution principle and the satisfaction of Horn clauses, following the work of Robert A. Kowalski at the University of Edinburgh. That is, a Prolog statement is of the form p :- q, r, which means p is true if both q and r are true.

A Prolog program consists of a series of Horn clauses, each being a sequence of relations concerning data in a database. Execution proceeds sequentially through these clauses. Each relation can invoke another Horn clause to be satisfied. Evaluation of a relation is similar to returning a procedure value in imperative languages such as C or C++.

Unlike the other languages mentioned, Prolog is not a complete language. That means there are algorithms that cannot be programmed in Prolog. However, for problems that are amenable to searching large databases, Prolog is an efficient mechanism for describing those algorithms. See also Software engineering.

2nd Description

A language used to write instructions for the computer. It lets the programmer express data processing in a symbolic manner without regard to machine-specific details.

From Source Code to Machine Language

The statements that are written by the programmer are called "source language," and they are translated into the computer's "machine language" by programs called "assemblers," "compilers" and "interpreters." For example, when a programmer writes MULTIPLY HOURS TIMES RATE, the verb MULTIPLY must be turned into a code that means multiply, and the nouns HOURS and RATE must be turned into memory locations where those items of data are actually located.
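
A one-line C++ analogue of that statement (the names hours, rate, and pay, and the values, are illustrative) shows what the translator has to map to machine instructions and memory locations:

#include <iostream>

int main() {
    double hours = 38.5, rate = 12.0;
    double pay = hours * rate;       // the * becomes a multiply instruction; hours, rate,
                                     // and pay become memory locations chosen by the compiler
    std::cout << pay << '\n';        // prints 462
}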

Grammar and Syntax

Like human languages, each programming language has its own grammar and syntax. There are many dialects of the same language, and each dialect requires its own translation system. Standards have been set by ANSI for many programming languages, and ANSI-standard languages are dialect free. However, it can take years for new features to be included in ANSI standards, and new dialects inevitably spring up as a result.

Low Level and High Level

Programming languages fall into two categories: low-level assembly languages and high-level languages. Assembly languages are available for each CPU family, and each assembly instruction is translated into one machine instruction by the assembler program. With high-level languages, a programming statement may be translated into one or several machine instructions by the compiler.

Following is a brief summary of the major high-level languages. Look up each one for more details. For a list of high-level programming languages designed for client/server development, see client/server development system.

ActionScript

Programming language for Flash programs. See Flash and ActionScript.

Ada

Comprehensive, Pascal-based language used by the Department of Defense. See Ada.

ALGOL

International language for expressing algorithms. See ALGOL.

APL

Used for statistics and mathematical matrices. Requires special keyboard symbols. See APL.

BASIC

Developed as a timesharing language in the 1960s. It has been widely used in microcomputer programming in the past, and various dialects of BASIC have been incorporated into many different applications. Microsoft's Visual Basic is widely used. See BASIC and Visual Basic.

C

Developed in the 1970s at AT&T. Widely used to develop commercial applications. Unix is written in C. See C.

C++

Object-oriented version of C that is popular because it combines object-oriented capability with traditional C programming syntax. See C++.

C#

Pronounced "C-sharp." A Microsoft .NET language based on C++ with elements from Visual Basic and Java. See .NET.

COBOL

Developed in the 1960s. Widely used for mini and mainframe programming. See COBOL.

dBASE

Used to be widely used in business applications, but FoxPro (Microsoft's dBASE) has survived the longest. See Visual FoxPro, FoxBase, Clipper and Quicksilver.

F#

Pronounced "F-sharp." A Microsoft .NET scripting language based on ML. See F#.

FORTH

Developed in the 1960s, FORTH has been used in process control and game applications. See FORTH.

FORTRAN

Developed in 1954 by IBM, it was the first major scientific programming language and continues to be widely used. Some commercial applications have been developed in FORTRAN. See FORTRAN.

Java

The programming language developed by Sun and repositioned for Web use. It is widely used on the server side, although client applications are increasingly used. See Java.

JavaScript

The de facto scripting language on the Web. JavaScript is embedded into millions of HTML pages. See JavaScript.

JScript

Microsoft's version of JavaScript. Used in ASP programs. See JScript.

LISP

Developed in 1960. Used for AI applications. Its syntax is very different from that of other languages. See LISP.

Logo

Developed in the 1960s, it was noted for its ease of use and "turtle graphics" drawing functions. See Logo.

M

Originally MUMPS (Massachusetts General Hospital Utility Multi-Programming System), it includes its own database. It is widely used in medical applications. See M.

Modula-2

Enhanced version of Pascal introduced in 1979. See Modula-2.

Pascal

Originally an academic language developed in the 1970s. Borland commercialized it with its Turbo Pascal. See Pascal.

Perl

A scripting language widely used on the Web to write CGI scripts. See Perl.

Prolog

Developed in France in 1973. Used throughout Europe and Japan for AI applications. See Prolog.

Python

A scripting language used for system utilities and Internet scripts. Developed in Amsterdam by Guido van Rossum. See Python.

REXX

Runs on IBM mainframes and OS/2. Used as a general-purpose macro language. See REXX.

VBScript

Subset of Visual Basic used on the Web similar to JavaScript. See VBScript.

Visual Basic

Version of BASIC for Windows programming from Microsoft that has been widely used. See Visual Basic.

Web Languages

Languages such as JavaScript, JScript and Perl, along with the CGI interface, are used to automate Web pages as well as link them to other applications running on servers.

Millions of Languages!

Programmers must use standard names for the instruction verbs (add, compare, etc.) in the language they use. In addition, a company generally uses standardized names for the data elements in its databases. However, programmers typically "make up" names for all the functions (subroutines) in the program. Since programmers are loath to document their code, the readability of the names chosen for these routines is critical.

In a single program, the programmer could make up hundreds of function names as well as names for data structures that hold fixed sums, predefined tables and display messages.
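
A small, hypothetical C++ comparison makes the point; both functions compute the same thing, but only one name documents itself:

// Cryptic: the reader must trace the logic to discover what x2 does.
int x2(int a, int b) { return a * b / 100; }

// Descriptive: the name explains the routine even without comments.
int percentOf(int amount, int percent) { return amount * percent / 100; }

int main() {
    return percentOf(200, 15) - x2(15, 200);   // both calls compute 30, so this returns 0
}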

Just Make It Up!

Unless rigid naming conventions are enforced or pair programming is used, whereby one person looks over the shoulder of the other, programmers can make up names that make no sense whatsoever. Little understood by non-programmers, this is the bane of many professionals when they have to modify someone else's program. Debugging another person's code is very difficult if the names are cryptic and there are few comments, which is often the case. It often requires tracing the logic one statement at a time.

In fact, if programmers are not attentive to naming things clearly, they can have a miserable time reading their own code later on. See pair programming, programmer, to the recruiter and naming fiascos.