If you want an idea of where embedded processors are headed, take a look at Texas Instruments' OMAP 3 architecture for the mobile phone market, which was announced last February at the 3GSM World Congress in Barcelona. OMAP 3 devices will store 12-megapixel images with less than a second between shots, handle high-definition video with support for S-video output to a monitor or projector, and support ever more sophisticated video games. They will also, presumably, handle voice.
Max Baron, a principal analyst for the research firm In-Stat, says that the OMAP 3 architecture is a bellwether for the embedded market because "the mobile phone market represents the largest single market for multicore embedded processors, particularly for higher-end cell phones offering video and other advanced data applications." He calls the architecture "staggering."
The OMAP family has been around since 2001, with the first OMAP 2 processor going into production last year. Like other processor companies, TI announces architectures well ahead of "getting to silicon," and the first OMAP 3 chip, the OMAP 3430, will debut in the second half of 2007. "The leap forward is to the next generation ARM core: the ARM Cortex-A8," says Robert Tolbert, TI's product marketing manager for OMAP. "We've also improved the imaging video and audio accelerator, the IVA 2+, so that it supports HD playback and DVD quality camcording."
So while the new generation doesn't represent a fundamentally new architecture, it does demonstrate how embedded processors are becoming complete systems on a chip. And like many new embedded processor architectures, OMAP 3 is multicore, presenting the functionality of multiple CPUs within the form factor of a single chip. Again, this is nothing new: Freescale Semiconductor's PowerQUICC, which has been around for a decade, combines a special-purpose communications RISC engine with a general-purpose PowerPC processor. The former handles low-level protocols; the latter runs application code. But multicore architectures are growing more capable, with more cores and more transistors packed within each core.
As for the OMAP 3, it has three cores. The first is the general-purpose ARM processor, essentially a CPU, hosting the operating system and graphical user interface, among other functions. The second is a digital signal processor that accelerates audio and video. The third is a 2D/
"We were multicore when we introduced OMAP, we're multicore today, and will be multicore tomorrow," says Tolbert. The main advantage is that specialized processors use fewer cycles to get specialized tasks done, which in turn leads to relatively lower power consumption, a goal of virtually every embedded application.
To imagine how the three cores play out on an actual mobile phone, it helps to be Japanese: most Americans and their gadgets haven't gotten to this level of complexity. (Most DoCoMo FOMA handsets use an OMAP processor, and Japanese companies were the first to adopt OMAP 2.) Tolbert suggests a scenario in which you are watching a video on the handset while synchronizing your email over the cellular network, and then an incoming call arrives. "When your call comes in, your video session doesn't completely drop because your accelerator isn't overburdened doing modem applications or email synchronizations: it is only doing video." While it is tempting to think of this device as a tiny laptop replacement, Tolbert doesn't quite go that far: the processing power is comparable only to a Pentium II and local memory storage is lacking. On the other hand, you can plug in a keyboard and monitor and use your phone to drive a PowerPoint presentation.
Multicore embedded processing comes in two flavors: asymmetrical, or "heterogeneous," in which each core is designed for a different function, and symmetrical, in which the cores are identical and work on a task in parallel. In-Stat's Max Baron says that the asymmetrical processing represented by the OMAP family is the trend for consumer electronics applications, in which ever more features are crammed into ever smaller devices. "Each core is doing more specialized jobs, but doing them with increasing efficiency," he says. This trend, in turn, has resulted in hundreds of companies producing multicore chips. Many are licensing the ARM or MIPS architecture, or the Power architecture from IBM. Others have designed their own core in-house.
"There used to be one solution to a given problem; now there are many, with multiple vendors offering slightly different variations on each," Baron says. "And it's significantly easier today than 10 years ago for a company to create its own solution, with tools speeding up the process." The results can be dramatic, with the intelligence for an entire mobile phone being put on a single multicore chip. "That means these companies are moving from being merely chip suppliers to suppliers of entire systems." Just as there are hundreds of kinds of cell phones, there will be hundreds of cell phones-on-a-chip.
Baron points out that while high-end, high-functioning multicore systems are getting most of the attention, lower-end processors are finding their own niche in a burgeoning market for low-cost hardware. "Before Christmas, I bought a DVD player for just $14. Nobody claims this is a high-performance unit, but for many people, that doesn't matter. A lot of people are not very sensitive to sound and image quality. So if you can get away with a codec that requires less processing power, you can use less costly embedded processors. We are now seeing multiple price points for multicore chips, each with its own customer segment."
Asymmetric processing in embedded applications mirrors the kind of off-loading seen on PCs and servers, where dedicated processors for the disk drives and video card take on tasks that would otherwise be handled by the CPU. "Such heterogeneous multiprocessor systems are also common in many embedded applications," says Dan Bouvier, director of advanced processor architecture at Freescale Semiconductor's Networking and Computing Systems Group, in an email exchange. "So the first trend is a continuation of mixing a general-purpose processing function, such as a PowerPC core, with more application-specific offload processors."
By contrast, a processor used for symmetrical applications integrates two or more general-purpose processors, "which maintains software partitioning but still achieves the cost-advantages of system density," says Bouvier. He notes that in embedded applications, traditional single-core instruction-level parallelism (ILP) has run its course, replaced by thread-level parallelism (TLP), in which multiple processors run multiple threads. The programming challenge depends on the nature of the software. Some applications are inherently parallel, "such that they can readily take advantage of multiple cores. Other applications are not so easy to convert. Luckily, the art of programming parallel processors is not new. What is new is packaging the techniques in such a way that lots of legacy code can be adapted to the newer programming paradigms and end up with a net performance advantage."
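To make the idea of an "inherently parallel" application concrete, here is a minimal sketch in Python. It is an illustration only, not code from Freescale or any vendor: the function names, the sample frames, and the choice of four workers are all assumptions. Each frame can be processed without reference to any other, so the work can be handed to a pool of threads with no change to the per-frame logic. (In standard CPython, threads show the program structure rather than a real speedup for compute-bound work.)

```python
from concurrent.futures import ThreadPoolExecutor


def scale_frame(frame, gain):
    """Independent per-frame work: no frame depends on any other frame."""
    return [sample * gain for sample in frame]


def process_serial(frames, gain):
    """Single-threaded baseline: one frame after another."""
    return [scale_frame(frame, gain) for frame in frames]


def process_threaded(frames, gain, workers=4):
    """Thread-level version: frames are handed out to a pool of workers.

    `workers` is a hypothetical core count; map() returns results in
    the same order as the input frames.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda frame: scale_frame(frame, gain), frames))


# Hypothetical input: four short frames of samples.
frames = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
```

Calling process_threaded(frames, 2) gives the same result as process_serial(frames, 2). Code whose steps depend on one another's results would not split this cleanly, which is the harder conversion Bouvier describes.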
Such symmetrical processing is represented by Freescale's MSC812x family. Here, a single multicore DSP can replace multiple discrete DSPs on a board, "thus saving system costs, power dissipation and size, and enabling OEMs to increase channel densities or processing performance in a given system," says Freescale's Barry Stern, marketing manager for multicore DSP products, in an email exchange. "Single silicon die is usually more cost-effective than multiple discrete dies because of the silicon overhead and package and ease of software development."
The MSC812x family embeds four DSP cores on a single silicon die, running at 500MHz and sharing the same packet and external interfaces, as well as both internal memory for application programs and external memory, with all memory available to all cores. Applications include VoIP and wireless infrastructure, as well as IP-based multimedia services. Parallelization for voice and video applications is done through a single instance of the code running in parallel on all four cores, without the need for load balancing. A single development environment allows the programmer to synchronize the debugging of all four cores simultaneously. "The customer only needs to take care of the shared resources of the device, such as memories and external interfaces, and for the tools that are built into the device, such as semaphores, core-to-core communications, multichannel DMA, multiple buffer queues, and instruction cache per core," Stern says.
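The pattern Stern describes, one copy of the code running on every core with a semaphore guarding whatever is shared, can be sketched roughly as follows. This is a hedged illustration in Python, not MSC812x code: threads stand in for the DSP cores, a plain list stands in for shared memory, and every name here is hypothetical.

```python
import threading

NUM_CORES = 4  # four workers, mirroring the four DSP cores; threads stand in for cores

shared_output = []                     # stand-in for shared memory or an external interface
output_guard = threading.Semaphore(1)  # binary semaphore protecting the shared resource


def core_main(core_id, channels):
    """The same code runs on every 'core'; only the channel subset differs."""
    # Static partition: core N takes channels N, N+4, N+8, ... so no
    # run-time load balancing is needed.
    for channel in channels[core_id::NUM_CORES]:
        result = (channel, core_id)    # placeholder for real per-channel processing
        with output_guard:             # only one core touches the shared list at a time
            shared_output.append(result)


def run_all(channels):
    """Start one worker per core, wait for all of them, return sorted results."""
    shared_output.clear()
    workers = [
        threading.Thread(target=core_main, args=(core_id, channels))
        for core_id in range(NUM_CORES)
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return sorted(shared_output)
```

With eight channels, run_all(list(range(8))) assigns channels 0 and 4 to the first worker, 1 and 5 to the second, and so on. The only coordination the programmer writes is the guarded access to the shared list.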
Among the companies playing in the embedded processor space are relative newcomers like venture-funded P.
Founded in July 2003, P.
Mark Hayter, P.
At the other end of the spectrum, the Intel Core Duo dual-core processor has gotten some traction in industrial control applications, where robotic equipment is increasingly being linked over the Internet. Intel and Apple jointly announced the technology last January, and followed up at the Embedded World conference by announcing "extended lifecycle support" to accommodate the harsh operating conditions of a factory floor, as well as a number of board-level products from third-party integrators. In many cases, these embedded platforms are running a real-time operating system on one core and a conventional OS on the other. Intel supports the Core Duo with the Mobile Intel 945GM Express Chipset, which provides enhanced graphics, I/
"The challenge for many of these applications is that the nodes are getting connected over the Internet," says Phil Ames, Intel's embedded marketing manager. "For example, if equipment goes down on the manufacturing floor, that once required deploying someone to the spot with diagnostic equipment." The trend now is to make the fix remotely. "I can be sitting in my office in Phoenix and monitor equipment in plants around the world. I can do the diagnostics, upload firmware, reset systems, and change the OS." At Embedded World, Intel demonstrated how a second core running some flavor of Windows or Linux could run in the background to handle the firewall and virus scanning, as well as data backup and encryption, and send the monitoring information back upstream. "Priority goes to the actual application running on the real-time OS."
Intel is also seeing another use for dual-core in which one core acts as a fail-over duplicate of the other. Both cores run the same operating system and execute the same instructions. If one core fails, the other takes over, presumably skipping over the software glitch that triggered the crash in the first place.
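The fail-over idea can be sketched in a few lines of Python. This is a simplified illustration, not Intel's mechanism: two callables stand in for the two cores, a raised exception stands in for a core failure, and all names are hypothetical. Unlike the scenario above, the standby here simply re-runs the item that brought down the primary.

```python
def run_redundant(primary, standby, inputs):
    """Feed each input to the active unit; switch to the standby on failure.

    Returns the list of outputs and a flag saying whether fail-over happened.
    """
    active, failed_over = primary, False
    outputs = []
    for item in inputs:
        try:
            outputs.append(active(item))
        except Exception:
            if failed_over:
                raise  # the standby has failed too; nothing is left to fall back on
            active, failed_over = standby, True
            outputs.append(active(item))  # standby re-runs the item that crashed the primary
    return outputs, failed_over


def make_unit(crash_on=None):
    """Build a hypothetical processing unit that fails on one chosen input."""
    def unit(value):
        if value == crash_on:
            raise RuntimeError("simulated core failure")
        return value * value
    return unit
```

For example, run_redundant(make_unit(crash_on=3), make_unit(), [1, 2, 3, 4]) completes all four items even though the primary fails on the third, and it reports that a fail-over took place.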
Ames says that dual-core processors can host almost twice the number of simultaneous applications as a comparable uniprocessor, with the same heat dissipation. In a demonstration, the company ran two instances of a compute-intensive application that saturated a Pentium M processor, and four of them on the Intel Core Duo processor, doubling the performance within the same "thermal envelope." "It stems all the way back to Moore's Law: roughly every 24 months, we can essentially double the number of cores," Ames says. Doing so dramatically increases the performance vector over a comparable single-core processor, while keeping power dissipation in check.
Meanwhile, back on the server/
QNX's Robert Craig says that, as far as he knows, the company makes the only real-time operating system that can run symmetrically. That ability has helped give the company a jump start in multicore development, with the QNX Neutrino MultiCore Technology Development Kit winning top honors at this year's Embedded World in Nuremberg. The company's tools are aimed at symmetrical multiprocessing development, which Craig says is often a better, but less familiar, alternative to the asymmetrical model. He made his case by phone from QNX headquarters in Ottawa.
Symmetric and asymmetric processing
Room for startups
Intel dual-core: OS + RTOS
Sidebar: An interview with Robert Craig, senior developer, QNX's operating systems group