They should just call it what it is: a "clearance sale", instead of this wishy-washy "SE" nomenclature to make it sound like something new ;) What's next, GTX 950 collector's edition?
Well sure, I'm sure they want to salvage the cameras and USB cables and include them in TX2 kits.
Everything here screams "we are trying to sell off stuff", so just say so. It's called a "clearance". It's totally fine to call it that, and as an engineer (their target market) I would respect them more for being direct and straightforward in their marketing.
As a customer I would prefer that new names be given to new designs, not to existing designs stripped of their salvageable components. Trying to pull off this sales nonsense just reduces my respect for the NVIDIA brand by a notch.
If they came up with a redesigned TX1 module that consumes half the power, or a kit with a freshly-designed compact carrier board, or some other novel hardware features, then sure, call it the "TX1 SE" or "TX1s" or whatever they want.
1. $200 for the TX1 is a much lower entry point for GPUs than $600 for the TX2
2. According to Nvidia (see comparison here: https://devblogs.nvidia.com/parallelforall/jetson-tx2-delive...), the TX2 offers about twice the performance of the TX1, but since the TX2 costs about three times as much, I think you get more "bang for your buck" with this offer
Performance is not the only point of consideration.
There are probably features on the TX2 that are not found on the TX1, though I haven't actually looked at the difference.
For instance, IIRC, the hardware accelerated jpeg decoding is pretty limited on the TK1, compared to the TX1/TX2. Basically you can decode and display (m)jpeg really fast, but you have little control over the output pixel format. It's a black box hidden in Nvidia's special hardware. If you try to programmatically get at the pixel data, you'll find that the pointer is actually DMA-mapped, which means it's extremely slow to memcpy from, which defeats the purpose of hw acceleration.
I don't get your claim about DMA memory being slow to memcpy from. DMA memory isn't special, apart from being non-relocatable and contiguous within an allocation. Things would be much worse if the copy to main memory wasn't done with DMA.
The explanation I got (from somebody else a while back) is this: DMA is only fast if it's accessed in large blocks, but memcpy is not designed to work that way.
The hw accelerated jpeg decoding on the Jetson TK1 is backed by something called NVMM. I suspect it's a small chunk of memory that is closer to the special hw decoder, not part of the main 2 GiB RAM. When you decode jpeg, you get a DMA-mapped pointer to it.
When you memcpy from it, memcpy tries to copy the data one byte at a time (or in very small chunks, if vectorized). So what happens is that in order to copy data from a large block of DMA-mapped memory, memcpy makes many requests for single bytes or very small chunks, but DMA ends up fetching and staging the same large block multiple times for memcpy to read.
The same problem applies to any other operation that reads from the DMA-mapped memory in very small chunks. There is also no way to make the DMA-mapped pixel data directly available to a CUDA context, which is what Jetson board users would probably care about.
I'm not usually a system programmer so my explanation may be a bit fuzzy or inaccurate, but that's the gist of it.
DMA isn't something you access. Direct Memory Access means that an external device can issue reads and writes to main system memory (you call such a transaction a "DMA transfer"). What you access is entirely normal memory.
A DMA transfer is done entirely by the external device, without any intervention by you or the CPU (apart from potentially a cache invalidation before/after, depending on platform). From the point of view of the CPU (and therefore you), data just magically appears. You don't need to memcpy anything anywhere, unless you need to make sure the external device doesn't accidentally overwrite things (who knows, it might be stupid). You can read and write directly to wherever it appeared as if it was any other memory. Caching still applies, although a fresh DMA transfer will of course be a cache miss.
There is a little bit of magic related to memory regions used for DMA transfers, but that is invisible to the end-user. 'memcpy' does not discriminate, and unless you're copying a single byte, memcpy will never operate on byte-sized chunks.
Source: I do driver development for devices that DMA at extremely high rates (we're desperately waiting for PCIe 4.0 to become normal; PCIe 3.0 x16's 126 Gb/s is way too slow for us).
Apart from the fact that the cache needs to be invalidated before/after a DMA transfer to ensure cache coherency (automatically or not, depends on platform), DMA allocations have no problem with caches.
DMA allocations have no problems with caches, but in some situations you can get a performance boost by marking as write-combined if the CPU only writes to the memory once and then hands it off. This results in less cache being occupied with useless data and can be beneficial due to coalesced writes, but will result in horrifically slow CPU reads from the region.
I am suggesting that the author conflated "DMA allocation" with "write-combined allocation".
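For what it's worth, the Linux kernel API keeps that distinction explicit. Here's a rough, non-runnable driver-side sketch; `dev`, `phys_base`, and `REGION_SIZE` are made-up placeholders, not real Jetson symbols:

```c
#include <linux/dma-mapping.h>
#include <linux/io.h>

static int example_setup(struct device *dev, phys_addr_t phys_base)
{
    dma_addr_t bus_addr;

    /* Coherent DMA buffer: the device DMAs into it, and afterwards the
     * CPU reads it back at normal (cached, coherent) speed. */
    void *buf = dma_alloc_coherent(dev, REGION_SIZE, &bus_addr, GFP_KERNEL);
    if (!buf)
        return -ENOMEM;

    /* Write-combined mapping: CPU writes are coalesced into bursts
     * (great for streaming data out), but CPU reads bypass the cache
     * and are dramatically slower, plausibly what the decoder output
     * discussed above actually was. */
    void __iomem *wc = ioremap_wc(phys_base, REGION_SIZE);
    if (!wc) {
        dma_free_coherent(dev, REGION_SIZE, buf, bus_addr);
        return -ENOMEM;
    }

    /* ... use buf and wc ... */
    return 0;
}
```

So "DMA-able" and "slow to read from the CPU" are orthogonal properties; only the write-combined mapping has the read penalty.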
Ooo, so shiny, must collect! On a more serious note, perhaps borrowing AMD's strategy of bundling a few games would make for a more effective clearance strategy, instead of rebranding and reducing prices so much.
I guess they should pick one of the plethora of games targeting ARM-based embedded hardware running non-Android Linux? That will definitely make it sell better to their target market of deep learning engineers.
That's actually quite nice. When I benchmarked it for my application, without using any GPU stuff, it was a little more powerful than my current MacBook. Arguably the 2016 MacBook is underpowered, but I still found the TX1 quite nice.
I guess it's fortunate that it's US and Canada only, otherwise I'd probably buy one and not use it. :)
I really liked the X1 SoC. Nvidia did a fantastic job targeting a wide range: the Pixel C tablet used it, as did the Shield TV and even the Nintendo Switch. It was far and away the most powerful mobile SoC upon release, and it was great to have an alternative to Qualcomm (Nvidia's OpenGL drivers smoke the qcom garbage).
It's such a shame that Nvidia is no longer targeting the mobile market with their SoCs. I play games/emulators on my Shield TV, as well as H.265 4K HDR content, and it's great, but I'd gladly buy another Shield TV using the X2 (to better handle GameCube/Dreamcast emulators). Unfortunately Nvidia doesn't seem to have a follow-up SoC designed for the 'mobile' market. They are going after the higher-margin (less TDP-sensitive) AI/computer vision market.
This is a pretty decent device. I played around with these when I was working on automotive stuff at a previous job, and I've been toying with the idea of getting one to replace my current Kodi box.
It runs QNX at least, and Linux/Wayland; no idea about possible Xorg drivers...
The Jetson makes sense as many things, but it's serious overkill to be a Kodi box. It's a nice development kit that includes CUDA, and for some people it's even a nice and interesting desktop.
For Kodi and media purposes, however, it makes far more sense to buy either (1) the Nvidia Shield TV, which uses the same chipset and also costs $200, if you're looking for a media player and don't care about running your own software, or (2) the Odroid C2, which is one third the cost of this, is a much smaller device (and probably more power efficient), will soon run mainline Linux (nightlies are out already), and plays H.265 and 4K just fine!
When are these things going to become SBBR compliant? I am not even going as far as asking for an XHCI controller that doesn't require a blob to act as an XHCI controller (although this could well be hidden by firmware), but adopting at least UEFI (even without ACPI!) would be a really great start.
Yes, along with the 2nd and now 3rd generation of ARM server solutions from the known players... there's just no excuse for nVidia to make the TX1/TX2 software system so closed and non-compliant.
They are [somewhat] popular for embedded computer vision and machine learning development. Think developing algorithms for autonomous cars or similar as an example. Stuff where you need an embedded platform, but need a powerful GPU for computational purposes.
That combination is somewhat rare, especially in an affordable, ready-to-go format available in small quantities that developers can use to play around.
I use it for robotics research, and it's pretty fantastic. A few years ago it was common to see people literally strapping laptops onto robots, but this device is just as powerful with a much better form factor.
"Are you a member of the NVIDIA Developer Program in the US or Canada? If so, you’re eligible for an exclusive developer discount on the NVIDIA® Jetson™ TX1 Developer Kit SE"
Is it easy to join the developer program? It seems like not everyone gets this discount.