Mile High Milestone: Tegra K1 “Denver” Will Be First 64-bit ARM Processor for Android

by Nick Stam

Our 32-bit Tegra K1 mobile processor has been racking up praise for bringing amazing performance and true console-quality graphics to the mobile space.

It “handily beats every other ARM SoC” in GPU performance benchmarks, according to Anandtech. And “the GPU performance is what stands out with the Tegra K1, nothing else on the market today is really able to get even close,” according to PC Perspective.

Now, eight months after unveiling Tegra K1’s 32-bit version, we’re providing further architectural details of the chip’s 64-bit version at HOT CHIPS, a technical conference on high-performance chips.

You can get more technical details here; below is a general view of what we presented:

This new version of Tegra K1 pairs our 192-core Kepler architecture-based GPU with our own custom-designed, 64-bit, dual-core “Project Denver” CPU, which is fully ARMv8 architecture compatible. Further, the 64-bit chip is fully pin-compatible with the 32-bit Tegra K1, for ease of implementation and faster time to market.

With its exceptional performance and superior energy efficiency, the 64-bit Tegra K1 is the world’s first 64-bit ARM processor for Android, and completely outpaces other ARM-based mobile processors.

Tegra K1 Denver

Highest Single-Core CPU Throughput

Denver is designed for the highest single-core CPU throughput, and also delivers industry-leading dual-core performance. Each of the two Denver cores implements a 7-way superscalar microarchitecture (up to 7 concurrent micro-ops can be executed per clock), and includes a 128KB 4-way L1 instruction cache, a 64KB 4-way L1 data cache, and a 2MB 16-way L2 cache, which services both cores.

Denver implements an innovative process called Dynamic Code Optimization, which optimizes frequently used software routines at runtime into dense, highly tuned microcode-equivalent routines. These are stored in a dedicated, 128MB main-memory-based optimization cache. Once read into the instruction cache, the optimized micro-ops are re-fetched and executed from there for as long as they are needed and capacity allows.

Effectively, this reduces the need to re-optimize the software routines. Instead of using hardware to extract the instruction-level parallelism (ILP) inherent in the code, Denver extracts the ILP once via software techniques, and then executes those routines repeatedly, thus amortizing the cost of ILP extraction over the many execution instances.

As part of the Dynamic Code Optimization process, Denver looks across a window of hundreds of instructions and unrolls loops, renames registers, removes unused instructions, and reorders the code in various ways for optimal speed. This effectively doubles the performance of the base-level hardware through the conversion of ARM code to highly optimized microcode routines and increases the execution energy efficiency.
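As a toy illustration of what one such pass does, here is a sketch of dead-code elimination over a tiny three-address intermediate representation. This is purely for exposition: Denver’s real optimizer works on ARM instructions and native micro-ops, and its internal formats are not public.

```python
# Toy sketch of one Dynamic Code Optimization pass: dead-code elimination
# over a tiny three-address IR. Purely illustrative: Denver's real optimizer
# works on ARM instructions and native micro-ops, and its internal formats
# are not public.

def eliminate_dead_code(instrs, live_out):
    """Drop instructions whose result is never read.

    instrs: list of (dest, op, operands) tuples in program order.
    live_out: set of registers still live after the block.
    """
    live = set(live_out)
    kept = []
    for dest, op, operands in reversed(instrs):
        if dest in live:
            kept.append((dest, op, operands))
            live.discard(dest)
            live.update(operands)   # the operands must now stay live
        # else: the result is unused, so the instruction is dropped
    return list(reversed(kept))

block = [
    ("r1", "add", ("r2", "r3")),
    ("r4", "mul", ("r1", "r1")),
    ("r5", "add", ("r2", "r2")),  # r5 is never read: dead
]
print(eliminate_dead_code(block, live_out={"r4"}))
```

Register renaming, loop unrolling and reordering work on the same principle: rewrite the routine once, ahead of repeated execution, so the hardware never has to rediscover these facts on the fly.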

The slight overhead of the dynamic optimization process is outweighed by the performance gains of already having optimized code ready to execute. In cases where code may not be frequently reused, Denver can process those ARM instructions directly without going through the dynamic optimization process, delivering the best of both worlds!
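The hot/cold split described above can be sketched as a simple counter-plus-cache scheme. Everything here (the class, the threshold, the addresses) is a hypothetical illustration of the general technique, not Denver’s actual mechanism or parameters.

```python
# Minimal sketch of the hot/cold split described above: cold routines are
# executed directly, and once a routine's execution count crosses a threshold
# it is optimized once and the optimized version is reused from a cache.
# The class, threshold, and addresses are hypothetical illustrations of the
# general technique, not Denver's actual mechanism or parameters.

HOT_THRESHOLD = 10  # illustrative; real designs tune this value carefully

class DynamicOptimizer:
    def __init__(self):
        self.exec_counts = {}         # routine address -> times executed
        self.optimization_cache = {}  # routine address -> optimized routine

    def optimize(self, routine):
        # Stand-in for unrolling, register renaming, dead-code removal,
        # and reordering; here it just returns the routine unchanged.
        return routine

    def run(self, addr, routine):
        cached = self.optimization_cache.get(addr)
        if cached is not None:
            return cached()           # fast path: optimized copy exists
        self.exec_counts[addr] = self.exec_counts.get(addr, 0) + 1
        if self.exec_counts[addr] >= HOT_THRESHOLD:
            # Pay the optimization cost once; it is amortized over
            # every later execution of this routine.
            self.optimization_cache[addr] = self.optimize(routine)
            return self.optimization_cache[addr]()
        return routine()              # cold path: execute directly
```

For example, calling `run(0x1000, routine)` fifteen times executes the routine directly nine times, optimizes and caches it on the tenth call, and serves the cached version thereafter.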

Dynamic Code Optimization works with all standard ARM-based applications, requiring no customization from developers, and without added power consumption versus other ARM mobile processors. That’s because the 7-wide superscalar design allows faster throughput than would otherwise be possible at the same clock speed.

NVIDIA Tegra K1 64-bit Denver CPU

Denver’s remarkable design delivers great performance for both single- and multi-threaded applications, as well as multitasking scenarios. The dual-CPU cores can attain significantly higher performance than existing four- to eight-core mobile CPUs on most mobile workloads.

Denver also features new low-latency power-state transitions, in addition to extensive power-gating and dynamic voltage and clock scaling based on workloads. Combining Dynamic Code Optimization, the 7-way superscalar design and efficient power usage, Denver’s performance will rival some mainstream PC-class CPUs at significantly reduced power consumption.

This means that future mobile devices using our 64-bit Tegra K1 chip can offer PC-class performance for standard apps, extended battery life and the best web browsing experience – all while opening new possibilities for gaming, content creation and enterprise apps.

Look for some amazing mobile devices based on the 64-bit Tegra K1 from our partners later this year. And for hard-core Android fans, take note that we’re already developing the next version of Android – “L” – on the 64-bit Tegra K1.

Comments

  • Fleeced_Again

    Looks interesting. Sounds like the perfect SoC for a Shield console to compete with Apple TV and Roku.

  • Tim Glaser

    Google already said that the Android TV box coming later this year will ship with this SoC.

  • Montisaquadeis

    You sure it’s going to be the Denver core? The boxes they sent out at the dev conference were all the 32-bit K1s. Hmm, wonder if the Jetson dev board will get an update to the Denver core.

  • Tim Glaser

    I believe the boxes they sent out were not even 32-bit K1s. I think they were T4’s? Not 100% on that. But I’m fairly certain that the Android TV demo and Android Auto demo all took place on 64-bit K1 stuff.

  • Chase Leonard

    Why wasn’t this chip put into the Shield tablet? Why does the 32-bit chip even exist?

  • Montisaquadeis

    Ok, after looking, they are T4s. Could’ve sworn I read K1s. Might be getting them confused with the Jetson dev board, which is definitely 32-bit K1.

  • Anyone

    This dynamic recompilation stuff looks a bit like the old Transmeta Crusoe CPU.

    This is indeed a bold move from nVidia to try this instead of a ‘traditional’ OoO core.

    The memory subsystem, cache hierarchy stuff is something to look at, as such design
    can perform brilliantly on small benchmarks, but be crushed on large, complex code.

  • Montisaquadeis

    Thinking 32-bit is a stopgap while they wait for Google to get 64-bit into Android.

  • Grahaman27

    The Denver chip will use in-order control logic… Mind blown. I applaud Nvidia engineers!

  • Sloff1155

    Will we see a new Shield with this 64 bit chip ?????

  • obiwantoby

    How about this in a Surface 3 or 4? Would love to have Windows with this …

  • kron123456789

    So, it looks like Nvidia started the number-of-CPU-cores race and now intends to finish it))

  • DrewNusser

    Pretty good stopgap if you look at the benchmarks! Nothing else is catching up to it any time soon. Nvidia is killing it!

  • Muthuraj Krishnasamy

    “hard-core Android fans, take note that we’re already developing the next version of Android – “L” – on the 64-bit Tegra K1.”

    Does this mean that a Nexus device with the 64-bit Tegra K1 Denver is on the way?

  • kron123456789

    But that CPU isn’t as power efficient as the K1’s GPU. Mostly because of the CPU’s power consumption, the Nvidia Shield Tablet can only get 2.5-3 hours of battery life playing Trine 2.

  • DrewNusser

    Definitely. Don’t get me wrong – I’m stoked about the 64-bit K1, but the 32-bit one already destroys everything else on the market. I doubt it will have any competition (as far as pure power) for at least another year, and the 64-bit is just gonna be even that much better. I wonder if Nvidia has a device planned for later this year with the new chip.

  • Montisaquadeis

    Microsoft has basically killed Windows RT, so no, you won’t see a Windows tablet running this.

  • Gilrond

    Shield with this SoC running Linux with native Wayland support would be a blast.

    What’s going on with handset Nvidia SoCs by the way?

  • Nikhil Subramaniam

    Could you please explain how this is the world’s first 64-bit ARM processor for Android? MediaTek and Qualcomm have also announced their 64-bit SoCs.

  • renz

    We’re lucky to have Denver on the current Tegra. Initially it wasn’t expected until the sixth-gen Tegra (K1 is gen 5).

  • kron123456789

    But they’re not producing them yet. It’s only an announcement.

  • Riccardo Robecchi

    The fact that they killed the current Windows-on-ARM product does not mean they killed the project altogether. The fact is they’re probably merging products (Windows, Windows RT, Windows Phone) into one single product capable of running on different platforms; this simplifies things, and hence there is no need to continue the Windows RT madness. It’s been a placeholder for “true” Windows on ARM tablets all along.

  • Slacker

    In other words, it is an in-order VLIW core with a dynamic recompiler ARM emulator in firmware. Like the old “x86” chips from Transmeta. Interesting design choice, we’ll see how well it works in practice.

  • Tjaldid

    h.265 or bust

  • deltatux

    Should have launched the Shield Tablet with this SoC instead of the Cortex A15.

  • Robyn Kiriko Takami

    I hope so!

  • Bradley Groot

    Will this be enough to run Dungeon Keeper?

  • Force Majeure

    Yay!! Finally 64-bit ARM SoC with an awesome GPU and good CPU performance. Can’t wait to see this in any products. Exciting, to say the least.

  • Fleeced_Again

    If Denver is put on a video card, would it be accessed through the graphics driver? Would it use its native instruction set or still use ARM?

  • Force Majeure

    If that were the case, that would be excellent!!!

  • Force Majeure

    This SoC on a board, tablet or anything with either Debian or Ubuntu… What a mini-Linux machine that would be. Yes indeed, things are really improving as of late, for 64-bit that is.

  • Brian Caulfield

    Can’t comment on rumors and speculation.

  • Ketul Patel

    Can this handle 4K h.265 encode?

  • d3v15 4dv0c4t3

    “Only? 2.5-3 hours”.

    Ok.. How many hours of Trine 2 do you get on Adreno 300 devices?

  • Romeo

    What’s the power consumption of those two cores when running at 2.5 GHz? Watts only, please.

  • Stachura5

    So the 64-bit version of K1 will have only 2 cores? If yes, then I’m not even mad about it.

  • Romeo

    I doubt it will last for another month in the performance war, but we will see.

    The big problem is that this is not a performance war, this is an efficiency war. And considering the K1’s big appetite compared to the competition, it’s not such an efficient SoC at all.

  • Jane Archer

    World’s worst approach. The ‘Transmeta’ solution has been re-invented again and again, long before Transmeta existed and long after. In every case, the method proved to be a disaster.

    The ‘problems’ it solves aren’t really problems at all. And the problems introduced by this approach are ruinous.

    1) mediocre code is where this approach excels, but mediocre code is found in mediocre applications where good performance is hardly an issue.

    2) well written, well-optimised code is ruined by the dynamic runtime ‘recompile’ approach. Optimised code is based on expectations about how the underlying CPU hardware works. All these expectations are missing when an ‘alien’ CPU disguises itself as a native one.

    3) Denver clearly ’emulates’ NEON, which is the worst idea in the world, and resets the Tegra project back to the bad old days of Tegra 2 (the only significant SoC at that time lacking both NEON and H.264 decode hardware).

    4) Eliminating the bugs in a runtime software based dynamic recompile system is impossible. Worse, just as one sees with the GPU drivers from Nvidia and AMD, the software engine will experience regular ‘updates’ so it can be optimised to whatever app is ‘flavour of the month’ for benchmark purposes.

    5) The ‘granularity’ of a dynamic recompile process is just hideous. ‘Juddery’, unpredictable rates of code and data flow become commonplace.

    6) the use of dynamic recompile designs happens for one reason only: hardware patents. Companies following this approach seek to design the hardware of the CPU using ancient methods for which the patents have long since expired, or are very cheap to license. An attempt is made to move expensive (or unavailable) state-of-the-art patented hardware methods into the software system.

    7) the significance of point 6 is nullified by the fact that one can simply license a state-of-the-art 64-bit ARM design from ARM itself. This has a cost, of course, but relatively speaking that cost is very low. Of course, Project Denver was originally created to allow Nvidia to produce an x86-compatible CPU, an option Nvidia gave up when Intel threatened them over many other core aspects of Nvidia’s output. So Project Denver was kicking around at Nvidia, having swallowed a vast amount of their money already. Even so, choosing to keep Project Denver alive would be as idiotic as Intel still trying to market Larrabee (the world’s most expensive chip design failure) as a GPU, when better/cheaper options exist like licensing the core design from ARM.

    One cannot deny it is ‘fun’ to see another weird CPU design hit the market, but Nvidia was finally starting to see real traction in its Tegra project, and (in my opinion) needs this pointless distraction like a bullet in the head. The real war is between Intel and ARM, and even AMD, with far more sane advanced CPU designs than ‘Denver’, comes up tragically short against these two market leaders. ARM’s latest A15 design has proven to be a big success in the first K1- a bit of a hint about sticking to one’s strengths (GPU design) and letting ARM handle its areas of hardware superiority.

  • vita09

    any design wins yet?

  • shaun walsh

    Look up HTC Volantis

  • shaun walsh

    There are areas where I think the 32b will excel. Plus the Denver core will probably consume more power

  • shaun walsh

    They redid that benchmark capping the fps at 60-45-30-26.. All of which increased battery life substantially

  • kron123456789

    I know that Trine 2 won’t even start on Adreno devices. But that’s not the point. The point is that the K1’s CPU requires too much energy, even if it’s the most powerful ARM CPU.

  • kron123456789

    Wait, what? Trine 2 runs at only 30fps max.

  • kron123456789

    In the Android market it will last at least to the end of the year, because there will be only Snapdragon 805 devices (S810 devices will show up only in 2015). And unlike the Shield Tablet, they are gonna have displays with insane 2560×1440(1600) resolution. Don’t know what GPU Apple is going to put in their A8 chip, though.

  • stucrmnx120fshwf

    Well, I can comment on rumors and speculation. The Nexus 8 is said to use K1 Denver; the Nexus 6 is said in the latest rumors to be 32-bit. Personally I want to see UHD to avoid interim steps, making devices commodities, which is bad for OEMs and chip makers. I don’t want Apple to beat Android on resolution again. Tegra 4 could run UHD; K1 has more than twice the GPU cores and runs in 64-bit. From my $300 QHD Nexus 10, a 22-month-old tablet: if QHD in a 5.5″ phablet, then UHD in an 8″ tablet (probably not the Nexus 8). My UHD TV is $400 now; my $100 UHD Android TV box worked for a while, then the cheapie broke down. Project Butter and Jelly Bean 1 and 2 showed on the original Nexus 7 that good hardware could be reliable and fast with the right software. Congratulations Nvidia on getting it: that minimal skin runs a good device in the Shield Tablet.

  • Brian Caulfield

    Can’t comment on rumors, speculation, and unannounced stuff.

  • André Alécio

    After Tegra K1 Denver, will Nvidia launch a new tablet with it? Something like an Nvidia Shield 2, as I was told the Shield Tablet is not a sequel.

  • Montisaquadeis

    I don’t see why they won’t put a Shield out with this version of the K1, but they might be getting away from that form factor to focus on the tablet-and-separate-controller idea.

  • obiwantoby

    Windows has to continue to run on ARM. ARM is continuing to push the most innovative small form factor PCs.

    People need to get over legacy Windows support. No one needs old apps that do not scale well (DPI). I for one look forward to some advanced Tegra device with Windows on board. Android still has that classic lag. It just never feels as smooth to use as Windows or iOS.

  • A Popov

    I agree that GPU is really better than anything out on market nowadays
    (just take a look at PC-grade graphics of Trine2 and UE4 demos, wow now
    that’s really a huge leap forward).

    But on the other side, this article is full of phrases like “great performance,” “significantly higher performance than other 8-core ARM chips” and “PC-class performance,” and comments about emulation of NEON. I’ve read the white paper and it *has no* mention of NEON or SIMD in it. Nvidia, you need some real-life CPU (not GPU, we know it is good) benchmarks. Run a LINPACK benchmark (and some other tests, please) on it and prove that your CPU is better than its 8-core rivals.

  • A Popov

    There’s no 64-bit Android for ARM. Yet.

  • Riccardo Robecchi

    Yes, I agree. I think that Windows HAS to run on ARM, but it’ll do that with some key differences than how it is doing it now (BTW, is this tense’s grammar right?).
    I think Microsoft is moving Windows Phone on tablets (or Windows on phones, if you please) as it should have done since the beginning. We’ll see the true convergence of platforms with one software that rules them all. And if they want to do that and support current tablet hardware, they have to support Tegra – so I think we’ll see Tegra tablets in the future.
    As for the legacy Windows support, I think Microsoft got it wrong all the time. In my opinion, they had to port Windows Phone on tablets or create a drastically different Windows-on-ARM product with no desktop and tablet-only apps. Touch Office support had to be a killer app, but we see none even today. If they put all the puzzle pieces together, they’ll definitely have the ultimate platform out there.

  • vasras

    That’s the only thing that matters now. nVidia screwed it up before…

  • shaun walsh

    Oh, completely zoomed over the Trine 2 part. I was thinking of the T-Rex bench they did that only lasted 2.5 hours but had stupid-high frames. They went back and capped the fps and battery life surpassed the iPad.

  • shaun walsh

    Where do you see trine 2 test with battery life?

  • kron123456789

    Definitely not here. You know, there is one useful thing called the Internet. You can find anything there.

  • shaun walsh

    I found it. It was actually 3 hours

  • shaun walsh

    How is it not efficient? Perf/watt, it kills the A7 for GPU. Now, the A15s aren’t efficient, but hey, they aren’t Intel.

  • Drew Forester

    I’d love to see something like this powering Ubuntu Touch devices. Even though that OS will probably never actually make it to market. But it’s nice to dream.

  • Brian Caulfield

    Thanks for the feedback!

  • fteoOpty64

    Congrats NV. This achievement is going to be significant for the industry and will surely make Intel very nervous! I am sure NV is doing serious R&D on multi-socket chipsets in order to scale Denver-type SoCs to many distributed nodes forming a single system or many virtual smaller systems. Great for automotive and other vehicular applications for sure. Best of luck rolling out this chip into products so the world can enjoy the benefits and develop software to take more advantage of this architecture. OpenCL is great and should be pushed further…

  • fteoOpty64

    @Slacker: Dude, there is nothing VLIW in the architecture. It is all standard ARM architecture, just pre-code optimisation by the other CPU to populate the optimization cache. No code-morphing either, but some intelligent re-ordering done pre-execution. It looks like an alternative to an OoO design without the complex transistor/power overhead, using another core for pre-processing of code in an optimised way. Of course, there would be a penalty if re-ordering goes wrong, but if that is rare enough (like 0.001%), then it is a great trade-off. I am sure they simulated this design very thoroughly on their supercomputers, so they know it works well.

    This architecture will throw some benchmarks into complete twists, so we shall see that soon as products roll out and benchmarks get done in n ways by thousands of people.

  • fteoOpty64

    Just because NV has not designed a CPU before does not mean they cannot! Look at the Apple A7, which is a jolly great CPU yet consumes just a little more than the outgoing ARM equivalent. You got it wrong about the Transmeta-type implementation; there is none in the Denver design. Someone seems to have thrown in that old architecture and started talking VLIW. There is no VLIW anywhere in this design. Go look for it.
    The reality is the proof of the pudding is in real-world benchmarking, and we shall see soon; surely NV has done that before and knows. They will have the last laugh here, I am sure!
    Preliminary benchmarks show Denver to be almost twice as fast as the S800!!

  • fteoOpty64

    If you think it is possible to merge Win8.1 with WP then you are sorely mistaken about how the OS works and how the apps work in their framework. It is just not possible due to the different CPU architectures, x86 vs ARM. WinRT was crippled at birth and remained so, which led to its demise. Had MS “opened” up RT, the story might have been different. As such, due to the Bay Trail CPU and low-power Haswell chips, the Win tablets are running pure x86. The “RT madness” might not have happened had MS made RT the same as Pro but opened up RT for free 3rd-party installable programs. They LOCKED it up to be almost useless!
    With Android and iOS being so strong, there is NOT going to be any more “Windows-on-ARM”. That boat sank a long time ago, sorry.

  • fteoOpty64

    You can extend battery life by capping the framerate to 30 fps, among other things. I was told it is possible to get 4-plus hours of gameplay rather than 2.5 hrs unrestricted. Besides, when playing games you are likely to hold the controller, so the tablet can be plugged in on a table or some surface.

  • fteoOpty64

    Would love to see a Shield Tablet 10 with a 10-inch 2560×1600 screen, bundled with two WiFi controllers for dual-play kids!

  • fteoOpty64

    Considering the Maxwell architecture has been on the market for some time, NV can crank up the Maxwell version of the GPU and name it M2. They can then churn out a “large-tablet” version of Denver with 2 Maxwell SMX cores. How’s that?

  • fteoOpty64

    Actually, for a large tablet (i.e. 10 or 12 inches), full Ubuntu with a virtualized Android together would be really cool. Full Ubuntu is for keyboard-and-mouse people: you get into a terminal and IDE to do some serious work, while Android is more for web and social media stuff. Audio can be backgrounded by either. Imagine a work machine with your own VPNed Android virtualized to your home server!

  • fteoOpty64

    Dude, it is cheaper if you just buy a GT 750 card today! That is faster than K1, and it eats 55 watts TDP max. Costs around 100 bucks or less.

  • haakonks

    Wow! That’s a lot of sweeping statements!
    Let me just answer your point number 3. According to the slides from Nvidia’s presentation at HOT CHIPS, Denver supports NEON, with 2 x 128-bit units (FP0 and FP1 in the execution pipeline).

  • Nicolai Behmann

    Are you planning an update of the Jetson TK1 dev board to a 64-bit Denver TK1 version, maybe even this year?

  • Drew Forester

    You just brought a tear to my eye.

    I have to laugh about the K1 powering Acer’s new Chromebook. Maybe the other ARM Chromebooks aren’t quite up to par, but the incredible versatility and power of the K1 just seems a bit much for a glorified web browser. That said, if a K1 showed up in a Firefox OS device, I would still be all over it, but mostly because I just like Firefox OS as an open source concept.

    But Ubuntu Touch would rock on this chip. I mean… holy crap. Yeah.

    If NVIDIA isn’t already part of Canonical’s little cabal, they should be.

  • Mexor

    So you expect Apple to release a GPU better than the K1 in September? I imagine when playing Trine 2 it is the GPU performance and GPU efficiency that plays the vastly bigger role, isn’t it? I have a feeling if people want a jump in efficiency with equal or greater GPU performance and ability they will have to wait for NVidia’s Erista next year.

    The issue for NVidia is that GPU performance is not at this time as important for tablets as it is for desktops. They are trying to help drive a seemingly inevitable change along that direction with their Shield line, but it also seems like they are trying to differentiate their CPU as well with Denver. If they can do that they would be expected to be the dominant player in the Android tablet space, replacing Qualcomm there. But that remains to be seen.

    However I don’t see how you can expect these other companies to suddenly match NVidia’s graphics prowess. Consider this: if they do match it, they will do it coming from an entirely different direction. Nvidia successfully brought their bread-and-butter PC GPU technology and expertise to the mobile space which caused sudden jumps in GPU performance and perf/watt in mobile. Since the other players don’t have PC GPU technology and expertise they would need to make similar jumps in performance and perf/watt through another avenue at almost exactly the same time. Seems unlikely to me.

  • The Calm Critic

    They’d be clinically insane if there’s not gonna be any. At the very least you gotta have one reference Denver-driven Shield device for devs, tighten that sh*t up, and then profit from the mass market afterwards.

    ..or so I hope…

  • robjl

    Will there be a developer board for the 64-bit Tegra K1? I would be very interested to get one.

  • nobodyspecial

    I’d rather see people start using game benchmarks (meaning ACTUAL GAMES) than this crap. What does Linpack prove to me compared to what I’ll actually be using a device like this for? I want to know how good it is at stuff I’ll ACTUALLY DO on tablets/phones etc. I’m not sure why mobile games haven’t started building in benchmarking like they usually do on PCs. It would certainly be closer to what we are really doing than most of this synthetic crap they benchmark today.

    Linpack is fairly useless in this usage scenario and I’d hope nobody in tablets/phones would be optimizing for that stuff. It’s a waste of time and resources when they should be concentrating on optimizing gaming, because that’s where 90% of the revenue from Google Play is coming from (and 80+% at the Apple store, and 65% at Amazon). It’s the games, not Linpack, that matters 😉

  • Michael Gainor

    Why is it not quad core?

  • Rich- Don’

    mini-hdmi on phone!

  • Kane

    No. Your playback speed will be too fast.

  • juanrga

    It has been a long wait for an Nvidia CPU, and finally it is a very interesting product. I have been reading the whitepaper and it mentions that DCO could optimize over a 100-instruction window. Does this mean that Denver could be classified as a KIP (Kilo-Instruction Processor)?

    The rumour is that this is 256-bit VLIW hardware _at the metal level_, but someone here rejected that. If this is not a VLIW design, I have serious difficulties understanding how Nvidia could design a 7-wide core when the rest of academia/industry has difficulty going above 4-wide, due to the quadratic complexity law for superscalar.