JavaScript array “extras” in detail

By Dmitry Soshnikov

Table of contents:

  1. Introduction
  2. Browser support
  3. Theory and rationale
  4. Arrays: extra processing
    1. forEach
    2. map
    3. filter
    4. some
    5. every
    6. indexOf
    7. lastIndexOf
    8. reduce
    9. reduceRight
  5. Generic nature
  6. Summary
  7. Further reading

Introduction

In this article we’ll look at the functionality made available by the new methods of array objects standardized in ECMA-262 5th edition (aka ES5). Most of the methods discussed below are higher-order (we’ll clarify this term shortly) and related to functional programming. In addition, most of them have been available in various JavaScript implementations since version 1.6 (SpiderMonkey), although they were only standardized in ES5.

Unless stated otherwise, all the methods discussed below were introduced in JavaScript 1.6.

Note: You can probably learn a lot from this article whether you are an expert in JavaScript, or a comparative novice.

Browser support

At the time of writing “Array extras” (which are actually standardized methods, rather than extras) are supported by the new versions of all major browsers. Unless stated otherwise, all the discussed methods can be safely used in:

  • Opera 11+
  • Firefox 3.6+
  • Safari 5+
  • Chrome 8+
  • Internet Explorer 9+

If we have to support older browsers we can always implement our own versions of required methods by extending the Array.prototype object, for example:

// for old browsers

if (typeof Array.prototype.forEach != "function") {
  Array.prototype.forEach = function () {
    /* own implementation */
  };
}
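
To give an idea of what such an own implementation might contain, here is a simplified sketch. It is not a fully spec-compliant version, and the name forEachFallback is an assumption made for illustration only. It passes the item, the index and the object to the callback and skips missing items:

```javascript
// Simplified fallback for forEach (a sketch, not a complete
// implementation of the ES5 algorithm).
function forEachFallback(callback, thisObject) {
  if (typeof callback != "function") {
    throw new TypeError(callback + " is not a function");
  }
  var object = Object(this);
  var length = object.length >>> 0; // coerce length to an unsigned integer
  for (var k = 0; k < length; k++) {
    if (k in object) { // skip "holes": only existing items are visited
      callback.call(thisObject, object[k], k, object);
    }
  }
}

// install the fallback only if the native method is missing
if (typeof Array.prototype.forEach != "function") {
  Array.prototype.forEach = forEachFallback;
}

// try the fallback directly on an array with a missing item
var visited = [];
forEachFallback.call([1, , 3], function (item, index) {
  visited.push(index + ":" + item);
});

console.log(visited.join(" ")); // "0:1 2:3"
```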

With the introductory information out of the way, we’ll start our exploration of array extras by looking at the theory and practical rationale behind the methods.

Theory and rationale

Every new generation of a programming language arrives with newer and higher abstractions. These new abstractions make our development (and the perception of programs in general) easier and allow us to control complex structures in simpler ways. Consider e.g. the following two functions:

// sum of numbers in range

function getSum(from, to) {
  var result = 0;
  for (var k = from; k < to; k++) {
    result += k;
  }
  return result;
}

var sum = getSum(1, 10); // 45

// sum of squares of numbers in range

function getSumOfSquares(from, to) {
  var result = 0;
  for (var k = from; k < to; k++) {
    result += k * k;
  }
  return result;
}

var sumOfSquares = getSumOfSquares(1, 10); // 285

In the first function we loop through the numbers in the required range and collect the sum of the numbers. In the second function we do the same, but collect the squares of the numbers. What if we wanted to provide a function that calculates the sum of cubes of numbers for example, or any possible transformation?

Obviously, we have almost identical code structures in the two previous examples: In a well-designed system we’ll want to reuse the common parts. This is called code reuse in computer science. Generally, it may appear in several aspects (for example, in OOP we can reuse code from ancestor classes in descendant classes).

The part that differs between the above two functions (the exact action applied to the current number k) can be encapsulated into a function. In this way we can separate the common (often boring) part of the processing (the for loop in this case) from the transformation applied to each element. With this approach, the transformation can be passed as an argument to our common function, like so:

function genericSum(handler, from, to) {
  var result = 0;
  for (var k = from; k < to; k++) {
    result += handler(k);
  }
  return result;
}

Here every subsequent summed value is presented not simply as the current number, but as a result of the transformation (provided by the function handler) made on the number. That is, we get the ability to parameterize the handling of every element in the sequence.

It’s a very powerful abstraction, which allows us to have for example just a sum (where the handler function simply returns the number):

var sum = genericSum(function (k) { return k; }, 1, 10); // 45

or the sum of squares:

var sumOfSquares = genericSum(function (k) { return k * k; }, 1, 10); // 285

or even sum of cubes:

var sumOfCubes = genericSum(function (k) { return k * k * k; }, 1, 10); // 2025

And all this using only one function: genericSum.

Functions that accept other functions as arguments (as is the case with our genericSum function) are called higher-order functions (HOF). And functions that can be passed as arguments are called first-class functions.

Having this combination of higher-order and first-class functions in JavaScript allows us to create very expressive and highly-abstracted constructions, which help us to solve complex tasks in an easier manner, conveniently reusing the code.

That covers the theory. Let’s see what we can do in practice.

Arrays: extra processing

The pattern described above gives us an almost unlimited number of ways to carry out generic processing of arrays. As we said above, all the boring details of applying this processing are hidden from us. Instead of repeating for k ... length every time, we concentrate on the task itself, leaving the non-interesting (lower-abstracted) details behind the scenes. JavaScript has several HOFs for parametrized array processing. They are all available on the Array.prototype object and therefore available on every array instance. Let’s consider these methods.

forEach

The method you’ll encounter most frequently is forEach, which corresponds to parametrized looping over an array. It simply applies a function to each element of the array; only existing elements are visited and handled. For example:

[1, 2, 3, 4].forEach(alert);

Here the function passed in as an argument (alert in this case) is applied to each item in the array. So what is the difference between this and an ordinary for...length loop such as:

var array = [1, 2, 3, 4];

for (var k = 0, length = array.length; k < length; k++) {
  alert(array[k]);
}

Here we need an additional variable, array, to refer to the array, and another variable, k, for the loop counter. The code itself also becomes longer because we repeat the for...length loop over and over again. We could of course use another form of iteration (e.g. while) and wrap the code in a function (thereby hiding the helper variables and not polluting the global scope), but this is obviously less abstract than the forEach approach.

If we replace the action function with for example console.log, we get another interesting result:

[1, 2, 3, 4].forEach(console.log);

// Result:

// 1, 0, [1, 2, 3, 4]
// 2, 1, [1, 2, 3, 4]
// 3, 2, [1, 2, 3, 4]
// 4, 3, [1, 2, 3, 4]

The debugging function console.log (available, for example, in Opera Dragonfly or Firebug) can accept any number of arguments: here forEach passes three arguments to every call of console.log.

It’s not hard to see that these arguments are: the current item, the index of the item, and the array itself. We can provide any function of three arguments and perform required actions with these arguments:

var sum = 0;

[1, 2, 3, 4].forEach(function (item, index, array) {
  console.log(array[index] == item); // true
  sum += item;
});

alert(sum); // 10

Thus we get the first generic higher-order method of arrays, whose signature is defined as:

array.forEach(callback[, thisObject])

The first argument is already known to us: it’s a function of three arguments, which is applied to each item. The second argument is a context object (or a this value), which will be used as the value of this in the code of the applied function. It can be useful, for example, when we want to use a method of an object as a processing function:

var database = {

  users: ["Dmitry", "John", "David"],

  sendEmail: function (user) {
    if (this.isValidUser(user)) {
      /* sending message */
    }
  },

  isValidUser: function (user) {
    /* some checks */
  }

};

// send an email to every user

database.users.forEach(  // for each user in database
  database.sendEmail,    // send email
  database               // using context (this) as database
);

Let’s discuss what is going on here. Inside the sendEmail function the this value is set to the database object, so this.isValidUser refers to the required function. If we didn’t pass this second argument, the this value would be set to the global object (window in browsers), or to undefined in strict mode.
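
The effect of the context argument can be seen in a small self-contained sketch. The counter object below is hypothetical and used only for illustration; its add method is written in strict mode, so that a missing context leaves this as undefined instead of the global object:

```javascript
var counter = {
  total: 0,
  add: function (value) {
    "use strict"; // without a context, `this` is undefined here
    this.total += value;
  }
};

// with the context object: `this` inside add is counter
[1, 2, 3].forEach(counter.add, counter);
console.log(counter.total); // 6

// without the context object: accessing this.total fails
var failed = false;
try {
  [1, 2, 3].forEach(counter.add);
} catch (e) {
  failed = e instanceof TypeError;
}
console.log(failed); // true
```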

Let’s show again that only existing items are handled (i.e. “holes” are not included in the process):

var array = [1, 2, 3];

delete array[1]; // remove 2
alert(array); // "1,,3"

alert(array.length); // but the length is still 3

array.forEach(alert); // alerts only 1 and 3

map

Sometimes we might want to get the transformation or the mapping of the original array. JavaScript provides a HOF for that too: map. Its signature is as follows:

array.map(callback[, thisObject])

This method also applies the callback function to each element of an array (again, with the this value set to the optional context object, and only for existing items). In addition, it returns the transformed (mapped) array as its result. Take a look at this example:

var data = [1, 2, 3, 4];

var arrayOfSquares = data.map(function (item) {
  return item * item;
});

alert(arrayOfSquares); // 1, 4, 9, 16

In practice we may use this technique to get any transformation of a list. For example, if we have a list of user objects, we can get the list of their email addresses:

var users = [
  {name: "Dmitry", "email": "dmitry@email.com"},
  {name: "John",   "email": "john@email.com"},
  {name: "David",  "email": "david@email.de"},
  // etc
];

var emails = users.map(function (user) { return user.email; });

alert(emails); // ["dmitry@email.com", "john@email.com", "david@email.de"]
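
As with forEach, the callback of map receives three arguments: the item, its index and the array. The following sketch (with names chosen only for illustration) uses the index, and also shows why passing a ready-made function that interprets its second argument, such as parseInt, can give surprising results:

```javascript
var names = ["Dmitry", "John", "David"];

// use the second argument (the index) to number the items
var numbered = names.map(function (name, index) {
  return (index + 1) + ". " + name;
});

console.log(numbered); // ["1. Dmitry", "2. John", "3. David"]

// parseInt treats the index as a radix: parseInt("2", 1), parseInt("3", 2)
var parsed = ["1", "2", "3"].map(parseInt);
console.log(parsed); // [1, NaN, NaN]

// Number ignores the extra arguments
var numbers = ["1", "2", "3"].map(Number);
console.log(numbers); // [1, 2, 3]
```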

filter

Instead of the basic mapped result, we may want to only get certain entries that satisfy a certain condition, for example ones that have an email address that starts with "d". We can create a filter for exactly this kind of purpose, which will exclude items that don't pass our conditions. The filter method can be used to do this quickly and easily.

The signature is quite similar to that of map:

array.filter(callback[, thisObject])

The callback function of the filter should return a boolean value (true or false). true means that the item passes the filter, and false means that the item shouldn’t be included in the result set.

Considering the previous example, we can select a subset of users or emails, for example only emails that are registered in the com domain:

var comEmails = users // from users ...

  // get emails ...
  .map(function (user) { return user.email; })

  // and remove non-needed, leaving only "com"-emails
  .filter(function (email) { return /com$/.test(email); });

alert(comEmails); // ["dmitry@email.com", "john@email.com"]

Note how we used the chained pattern of method invocations. This is quite a normal practice in JavaScript — the map method returns an array so we can then directly call the next method of the array, i.e. filter. In the latter we use a regular expression to check whether passed email addresses end with the com string (the $ sign means “the end of the testing string”). Note also that in the former case we accept the user object, whereas in the second case we already have the email string available.

It’s not hard to see that, even though we have highly-abstracted handling here, we nevertheless have an inefficient operation. Indeed, we go through the whole array twice. It would be great if we could do all the needed checks and mapping in one pass. There is some syntactic sugar that combines map and filter, called array comprehensions. At the time of writing it’s implemented only in Firefox, as a non-standard extension, but it is still worth covering here:

var comEmails = [user.email for each (user in users) if (/com$/.test(user.email)) ];

alert(comEmails); // ["dmitry@email.com", "john@email.com"]

The code snippet above basically says “build an array of user emails if the email ends with the string com”. If we don’t have array comprehensions available, we can always fall back to the simple for enumeration:

var comEmails = [];
var email;

for (var k = 0, length = users.length; k < length; k++) {
  email = users[k].email;
  if (/com$/.test(email)) {
    comEmails.push(email);
  }
}

alert(comEmails); // ["dmitry@email.com", "john@email.com"]

The choice is ours. Sometimes it is convenient (for example when the operation of transformation is not known in advance) to use array processing HOFs, and sometimes it’s more useful and efficient to use an old-school way.
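
A middle way is also possible. The reduce method, covered later in this article, lets us do the check and the mapping in a single pass while staying with the HOF style. This is only a sketch of one possible approach; it repeats the users data from above so that it can be run on its own:

```javascript
var users = [
  {name: "Dmitry", email: "dmitry@email.com"},
  {name: "John",   email: "john@email.com"},
  {name: "David",  email: "david@email.de"}
];

// one pass: test the email and collect it if it passes
var comEmails = users.reduce(function (result, user) {
  if (/com$/.test(user.email)) {
    result.push(user.email);
  }
  return result; // the accumulated array is passed to the next call
}, []);

console.log(comEmails); // ["dmitry@email.com", "john@email.com"]
```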

Note: One thing to note in the above example is that we left several “garbage” variables behind after our actions had completed: k, length and email. This wouldn’t happen if we used forEach or an array comprehension, so we should consider this too when making our choice.

some

Often we want to know whether some or all items of a collection satisfy a specified condition. JavaScript provides two array methods allowing us to create easy solutions to such problems: some and every. We’ll tackle every in the next section; we’ll look at some first:

array.some(callback[, thisObject])

Like the previous methods, some accepts a callback function of three arguments and an optional context object. However, the result of the some function is boolean. It returns true if some (that is, at least one) of the items satisfies the condition. The condition is determined by the callback function, which should also return a boolean result.

For example, we might be storing test results, and want to test whether some user’s scores are higher than a certain threshold:

var scores = [5, 8, 3, 10];
var current = 7;

function higherThanCurrent(score) {
  return score > current;
}

if (scores.some(higherThanCurrent)) {
  alert("Accepted");
}

This code gives the result "Accepted", since the some method finds that the second element of the scores array (value 8) is higher than current (value 7). Processing stops at that point, and true is returned.

This technique can be used for performing a complex search (i.e. meeting several conditions at once) of the first found element in the array:

var found = null;

var points = [
  {x: 10, y: 20},
  {x: 15, y: 53},
  {x: 17, y: 72}
];

points.some(function (point) {

  if (point.x > 10 && point.y < 60) {
    found = point; // found
    return true;
  }

  return false;

});

if (found) {
  alert("Found: " + found.x + ", " + found.y); // Found: 15, 53
}

We could also use forEach for searching; however, forEach wouldn’t stop at the first found element: we’d need to throw a special exception to exit from it.
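
For comparison, here is roughly what the forEach variant with a special exception could look like. The BreakSignal object is a hypothetical marker introduced just for this sketch; it is not a built-in:

```javascript
var points = [
  {x: 10, y: 20},
  {x: 15, y: 53},
  {x: 17, y: 72}
];

var BreakSignal = {}; // hypothetical marker used only to leave the loop
var found = null;
var visited = 0;

try {
  points.forEach(function (point) {
    visited++;
    if (point.x > 10 && point.y < 60) {
      found = point;
      throw BreakSignal; // the only way to stop forEach early
    }
  });
} catch (e) {
  if (e !== BreakSignal) {
    throw e; // rethrow real errors
  }
}

console.log(found.x + ", " + found.y); // "15, 53"
console.log(visited); // 2 (the third point was never visited)
```

The version with some is clearly shorter and easier to read.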

We’ll look at testing an element using just the === operator to provide a simple search below.

every

By contrast, if we instead want to test whether all the scores are higher than the threshold, we can use the every method, which looks like this:

array.every(callback[, thisObject])

Our updated example looks like so:

if (scores.every(higherThanCurrent)) {
  alert("Accepted");
} else {
  alert("Not all scores are higher than " + current);
}

// change our value to 2
current = 2;

// now it’s OK
alert(scores.every(higherThanCurrent)); // true
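
Two details of every and some are worth checking for ourselves: both stop as soon as the result is known, and they give opposite answers for an empty array. A small sketch (the variable names are only illustrative):

```javascript
var calls = 0;

var allHigh = [5, 8, 3, 10].every(function (score) {
  calls++;
  return score > 4;
});

console.log(allHigh); // false
console.log(calls);   // 3 (processing stopped at the value 3)

function alwaysFalse() {
  return false;
}

// with no items there is nothing that can fail the test ...
var everyOnEmpty = [].every(alwaysFalse);
console.log(everyOnEmpty); // true

// ... and nothing that can pass it
var someOnEmpty = [].some(alwaysFalse);
console.log(someOnEmpty); // false
```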

indexOf

Another frequent task we’ll run into is testing whether an element is present in a collection. There are two convenient methods to do this: indexOf and lastIndexOf, which simply search for an element, testing it with the strict equality (===) operator. The indexOf definition is as follows:

array.indexOf(searchElement[, fromIndex])

This method returns the integer index of the searched element in the list. If the item is not found, the value -1 is returned. The fromIndex parameter is optional: if it is passed, the search starts from this index. If omitted, the default value 0 is used (i.e. the whole array is searched):

var data = [2, 5, 7, 3, 5];

alert(data.indexOf(5)); // 1
alert(data.indexOf(5, 3)); // 4 (start search from 3 index)

alert(data.indexOf(4)); // -1 (not found)
alert(data.indexOf("5")); // -1 (also not found since 5 !== "5")
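
The fromIndex parameter makes it easy to collect every position of an element. The helper indexesOf below is a hypothetical function written for this sketch. The last lines show one consequence of the strict equality test: NaN is never found, because NaN !== NaN.

```javascript
// collect all indices at which element occurs in array
function indexesOf(array, element) {
  var result = [];
  var index = array.indexOf(element);
  while (index != -1) {
    result.push(index);
    index = array.indexOf(element, index + 1); // continue after the match
  }
  return result;
}

var data = [2, 5, 7, 3, 5];

console.log(indexesOf(data, 5)); // [1, 4]
console.log(indexesOf(data, 4)); // []

// strict equality means NaN cannot be found
var nanIndex = [NaN].indexOf(NaN);
console.log(nanIndex); // -1
```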

lastIndexOf

lastIndexOf is very similar to indexOf, except that it searches the element starting from the end of the array. The lastIndexOf definition is as follows:

array.lastIndexOf(searchElement[, fromIndex])

The fromIndex parameter is again optional; the default value for it is the array length - 1:

var data = [2, 5, 7, 3, 5];

alert(data.lastIndexOf(5)); // 4
alert(data.lastIndexOf(5, 3)); // 1 (start search from 3 index)

if (data.indexOf(4) == -1) {
  alert("4 is not found");
}
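
Used together, indexOf and lastIndexOf give a quick way to test whether a value occurs exactly once: the first and the last found positions must be the same. A minimal sketch, with occursOnce being a name made up for the example:

```javascript
// true if element is present in array exactly once
function occursOnce(array, element) {
  var first = array.indexOf(element);
  return first != -1 && first == array.lastIndexOf(element);
}

var data = [2, 5, 7, 3, 5];

console.log(occursOnce(data, 7)); // true
console.log(occursOnce(data, 5)); // false (found at 1 and 4)
console.log(occursOnce(data, 4)); // false (not found at all)
```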

reduce

The last two new methods we’ll discuss allow us to reduce an array into a single value: they are reduce and reduceRight. The former processes the array from the beginning, while the latter processes it from the end. These methods were introduced into JavaScript later than the others, in version 1.8.

We’ll discuss reduce first, and then go on to reduceRight in the next section. reduce has the following definition:

array.reduce(callback[, initialValue])

The callback function accepts four arguments: previous value, current value, index, and again the array itself. The initialValue parameter is optional and, if omitted, is set to the first element of the array. Consider the following example:

// reduce the array to sum of elements

var sum = [1, 2, 3, 4].reduce(function (previous, current, index, array) {
  return previous + current;
});

alert(sum); // 10

Here we go through the array elements and get:

  1. The previous value of every callback, which initially is the first element since it’s equal to the default initialValue
  2. The current value of every callback, which at first call is 2
  3. The two last arguments — index and array

We then return the sum of our previous and current values, which becomes the previous value of the next iteration, and the current value is set to the next element, i.e. to 3. The process loops until the end of the array:

// initial set
previous = initialValue = 1, current = 2

// first iteration
previous = (1 + 2) =  3, current = 3

// second iteration
previous = (3 + 3) =  6, current = 4

// third iteration
previous = (6 + 4) =  10, current = undefined (exit)
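
To see how the optional initialValue changes the picture, we can run a few variations. In this sketch (names are illustrative) we record the indices the callback is called with, and also try an empty array, for which reduce throws a TypeError if no initial value is given:

```javascript
function add(previous, current) {
  return previous + current;
}

var withInitial = [1, 2, 3, 4].reduce(add, 10);
console.log(withInitial); // 20

// without initialValue the first call already gets index 1
var indexes = [];
[1, 2, 3, 4].reduce(function (previous, current, index) {
  indexes.push(index);
  return previous + current;
});
console.log(indexes); // [1, 2, 3]

// with initialValue every item is visited, starting from index 0
var indexesWithInitial = [];
[1, 2, 3, 4].reduce(function (previous, current, index) {
  indexesWithInitial.push(index);
  return previous + current;
}, 0);
console.log(indexesWithInitial); // [0, 1, 2, 3]

// an empty array needs an initial value
var emptySum = [].reduce(add, 0);
console.log(emptySum); // 0

var emptyFailed = false;
try {
  [].reduce(add);
} catch (e) {
  emptyFailed = e instanceof TypeError;
}
console.log(emptyFailed); // true
```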

The resulting value is not required to be a primitive value. With reduce we can, for example, transform two-dimensional arrays into flat vectors:

var matrix = [
  [1, 2],
  [3, 4],
  [5, 6]
];

alert(matrix[0][1]); // 2

// and now get the flattened array

var flatten = matrix.reduce(function (previous, current) {
  return previous.concat(current);
});

alert(flatten); // [1, 2, 3, 4, 5, 6]

reduceRight

The definition of reduceRight is as follows:

array.reduceRight(callback[, initialValue])

This function works in the same way as reduce, except that it processes an array from the end. Let’s have a look at an example:

var data = [1, 2, 3, 4];

var specialDiff = data.reduceRight(function (previous, current, index) {

  if (index == 0) {
    return previous + current;
  }

  return previous - current;

});

alert(specialDiff); // 0

This results in a value of zero. I’m going to leave the explanation for you as an exercise: draw every step of the process, like we did in the previous example.
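
If the direction of processing is hard to picture, a simpler case may help before drawing the steps. The sketch below (with illustrative names) joins the same array of strings with reduce and with reduceRight, and records the indices that reduceRight passes to the callback:

```javascript
var letters = ["a", "b", "c"];

function join(previous, current) {
  return previous + current;
}

var fromLeft = letters.reduce(join);
console.log(fromLeft); // "abc"

var fromRight = letters.reduceRight(join);
console.log(fromRight); // "cba"

// the last item becomes the initial value, so the first call gets index 1
var indexes = [];
letters.reduceRight(function (previous, current, index) {
  indexes.push(index);
  return previous + current;
});
console.log(indexes); // [1, 0]
```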

Generic nature

One of the biggest advantages of the array methods discussed in this article is the fact that they are all generic with respect to the objects on which they operate. In other words, it’s not required that the object to process should be an array. The object just needs the length property, and numeric indices.

This means we can reuse the functionality of arrays, applying it to other kinds of objects, for example strings:

// get the reference to the map method

var map = Array.prototype.map;

// and call it for a string

var hello = map.call("hello world", function (char) {
  return char + "*";
});

alert(hello.join("")); // "h*e*l*l*o* *w*o*r*l*d*"

Here we apply the map function to the "hello world" string and get the result as an array (map always returns a new array, whatever kind of object it is applied to). We then convert the array into another string: "h*e*l*l*o* *w*o*r*l*d*". This is of course only one solution, included to show the generic nature of the methods: we could instead solve this using regular expressions, or with a combination of split and join functions.
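
The same works for any object of our own, as long as it has a length property and numeric indices. The arrayLike object in this sketch is hypothetical and exists only to demonstrate the point:

```javascript
// not an array: just numeric indices and a length
var arrayLike = {0: "a", 1: "b", 2: "c", length: 3};

var upper = Array.prototype.map.call(arrayLike, function (item) {
  return item.toUpperCase();
});

console.log(upper); // ["A", "B", "C"]
console.log(Array.isArray(upper)); // true (the result is a real array)
console.log(Array.isArray(arrayLike)); // false

var joined = Array.prototype.join.call(arrayLike, "-");
console.log(joined); // "a-b-c"
```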

This approach can work in the opposite way too — here’s how we can reuse a string method to handle an array:

// reuse "toUpperCase" method
var toUpperCase = String.prototype.toUpperCase;

var upper = toUpperCase.apply(["foo", "bar"]).split(",");

alert(upper); // ["FOO", "BAR"]

In Firefox these generic methods are duplicated for constructors as well, as a non-standard extension: this provides an even more convenient generic application:

// reuse array's "map" method for a string
Array.map("foo", String.toUpperCase).join(""); // "FOO"

// reuse string's "toLowerCase" method for an array
String.toLowerCase(["F", "O", "O"]).split(","); // ["f", "o", "o"]

Another example: the arguments object isn’t an array and doesn’t have such methods available intrinsically. However, it has a length property and numeric indices, so here’s a better way to handle passed arguments:

// function "foo" accepts
// any number of arguments

function foo(/* arguments */) {

  var every = Array.prototype.every;

  var allNumbers = every.call(arguments, function (arg) {
    return typeof arg == "number";
  });

  if (!allNumbers) {
    throw "Some argument is not a number";
  }

  /* further handling */

}

foo(1, 2, 3); // OK
foo(1, 2, "3"); // Error
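
When several array methods are needed, it can be handier to convert arguments into a real array first. One common way, sketched below, is to reuse the generic slice method; the function sumAll is a made-up example that skips non-numeric arguments instead of throwing:

```javascript
function sumAll(/* arguments */) {

  // slice without parameters copies all items into a real array
  var args = Array.prototype.slice.call(arguments);

  return args
    .filter(function (arg) { return typeof arg == "number"; })
    .reduce(function (previous, current) { return previous + current; }, 0);

}

console.log(sumAll(1, 2, "3", 4)); // 7
console.log(sumAll()); // 0
```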

We can also call a method of an array for the DOM nodes collection, even though the DOM NodeList collection is not an array and has none of the discussed methods natively:

var paragraphs = document.querySelectorAll("p");

[].forEach.call(paragraphs, console.log);

In this example we select all paragraphs in the document and then log every paragraph to the console. Note how we reuse the forEach method of the array — the empty array is created only to get the reference to the forEach method.

We can use all the other array methods discussed in this article in exactly the same way.

Summary

Higher-order methods of arrays provide convenient and elegant ways to process different collections. Having the ability to parametrize the action applied on the element of a sequence increases the abstraction and therefore makes our code cleaner and shorter, with easier handling of complex structures.

At the same time, sometimes it can be more efficient to fall back to a lower abstraction level. Bear in mind that every new abstraction level can bring some kind of performance penalty, in exchange for providing more convenient ways of programming. We can use different programming styles depending on our situation and needs.

Further reading

ECMA-262-5:

Other useful articles:

Dmitry is a programmer and researcher in computer science. His main specialization is JavaScript, and he is an ECMAScript theorist. He is also interested in other object-oriented and functional languages, such as Ruby, Python, Java and Erlang. As a hobby he writes music.

Currently Dmitry practices the Erlang and JavaScript programming languages. Besides this practice, he specializes in analytical and theoretical articles on JavaScript, describing the fundamentals and the core design of the language. He is the author of the "ECMAScript in detail" (http://dmitrysoshnikov.com) series, a detailed description of the ECMA-262 specification.

Languages:

- JavaScript
- CoffeeScript
- Erlang
- Python
- Ruby
- PHP
- Lua


This article is licensed under a Creative Commons Attribution 3.0 Unported license.
