Hi,
Yesterday I had to dump the NOR flash from a target over JTAG. It took about 8 hours to dump 64 MB of flash. That's pretty long!
When looking at the JTAG bus, I noticed that it is currently the limiting factor (with the 'fast' UART mode, the UART keeps up, and it's the bit-banging that slows things down).
The JTAG clock is around 100-125 kHz right now (with a very uneven duty cycle). I think that with an optimized loop we could bring that to 500-600 kHz, which should provide a pretty nice speedup.
I haven't programmed a PIC in a long time (not since the 16F84 was a new product), but I tried estimating what could be done from a quick look at the datasheet. Below is what I came up with (probably buggy, but it's just to estimate the cycle count):
Basically there would be several 'layers' of loops, so the fast path never has to test the UART or handle bit counts that aren't a multiple of 16. One very fast function would do just the bit-banging for a given number of 16-bit words, and an outer loop in C would handle the leftover bits at the end and the points where UART flushes are needed.
[tt:]params:
W0 inBuffer
W1 outBuffer
W2 bits
# Main full 16 bits loop
# ----------------------
LSR W2, #4, W3 # W3 = W2 >> 4;
_loop_f16: # do {
# Fetch & organize 2x16 bits
MOV [W0++], W5 # W5 = *((uint16_t*)W0++);
MOV [W0++], W6 # W6 = *((uint16_t*)W0++);
# W5 has one byte TMS and one byte TDO
# W6 has one byte TMS and one byte TDO
# -> We must reorganize that!
SWAP W6 # W6 = swap_bytes(W6);
XOR W5, W6, W7 # W7 = W5 ^ W6;
AND #0xff, W7 # W7 &= 0xff;
XOR W5, W7, W5 # W5 = W5 ^ W7;
XOR W6, W7, W6 # W6 = W6 ^ W7;
SWAP W6 # W6 = swap_bytes(W6);
# Clear destination register
CLR W7
# Inner loop
MOV #15, W4
_loop_f16_inner:
# Clear TCK
BCLR IOPOR, OOCD_TCK
# Set TDO & TMS
MOV IOPOR, W8
BTST.Z W5, W4
BSW.Z W8, OOCD_TDO
BTST.Z W6, W4
BSW.Z W8, OOCD_TMS
MOV W8, IOPOR
# Set TCK
BSET IOPOR, OOCD_TCK
# Read TDI
BTST IOPOR, OOCD_TDI
BSW.Z W7, W4
# Loop condition
DEC W4, W4
BRA NN, _loop_f16_inner # loop while W4 >= 0, so bit 0 is shifted too
# Store result
MOV W7, [W1++]
# Loop condition
DEC W3, W3
BRA NZ, _loop_f16 # } while (--W3);
# FIXME: Do the rest 0..15 bits[/tt:]
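For anyone who finds the SWAP/XOR byte shuffle in the listing hard to follow, here is a small C model of just that step. It is only a sketch: `swap_bytes` and `regroup_bytes` are made-up names, and the post does not say which byte of each word is TMS and which is TDO, so the model only shows that the high bytes of both words end up together in one register and the low bytes in the other.

```c
#include <stdint.h>

/* Swap the two bytes of a 16-bit word (what the SWAP instruction does). */
static uint16_t swap_bytes(uint16_t w)
{
    return (uint16_t)((w << 8) | (w >> 8));
}

/* Regroup two input words so that *hi collects the high bytes of a and b,
 * and *lo collects their low bytes, using the same XOR exchange as the
 * assembly listing (hypothetical helper, for illustration only). */
static void regroup_bytes(uint16_t a, uint16_t b, uint16_t *hi, uint16_t *lo)
{
    uint16_t t;

    b = swap_bytes(b);                  /* b = {b.low, b.high}            */
    t = (uint16_t)((a ^ b) & 0x00ffu);  /* XOR of the two low bytes       */
    a ^= t;                             /* a = {a.high, b.high}           */
    b ^= t;                             /* b = {b.low, a.low}             */
    *hi = a;
    *lo = swap_bytes(b);                /* {a.low, b.low}                 */
}
```

For example, regrouping 0x1122 and 0x3344 gives 0x1133 and 0x2244, so each output word then holds 16 consecutive bits of a single signal.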
This would probably help, but you have to take into account that you are not always going to get 2 bytes of TDO/TMS. The odd counts would have to be worked out. But it might be doable for someone who speaks PIC24 assembly. :)
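A rough sketch of how the odd counts could be separated from the fast path, assuming the caller knows the total bit count up front. The names `shift_plan_t` and `plan_shift` are hypothetical and not part of the firmware; the idea is just that the full 16-bit words go to the assembly loop and the 0..15 leftover bits go to a slower C loop.

```c
#include <stdint.h>

/* Hypothetical split of one shift request into a fast and a slow part. */
typedef struct {
    uint16_t full_words; /* complete 16-bit groups for the fast loop */
    uint16_t tail_bits;  /* 0..15 leftover bits for the slow path    */
} shift_plan_t;

static shift_plan_t plan_shift(uint16_t bits)
{
    shift_plan_t p;

    p.full_words = (uint16_t)(bits >> 4);  /* bits / 16 */
    p.tail_bits  = (uint16_t)(bits & 15u); /* bits % 16 */
    return p;
}
```

A 37-bit shift would then be 2 full words through the fast loop plus 5 bits handled separately.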
An even better speedup would be to transmit 8 bits of TDO/TDI using the SPI engine already inside the PIC. TMS is usually 8 bits long (maybe less) and could be bit-banged. This would also need some rework of the protocol (adding higher-level commands) and a major change in the client software (OpenOCD). One issue that comes to mind is space: there is not enough room in the BP to add such a state machine.
OK, I've refined my code a little and tried to compile it (I haven't tested it in hardware yet).
My current hope is to achieve ~700 kHz while using only about 1/3 of the program space previously used by the tap shift method (currently 288 bytes, going down to about 90 bytes).
I could push it up to 1 MHz (by unrolling the inner loop), but then the program space would be a little much for my taste.
Using the SPI hardware could indeed provide a lot of speedup, but it takes a lot of hacks to sync TMS and to make sure you don't generate glitches when going from SPI mode to normal I/O mode (if the bit count is not a multiple of 8 or so). Too bad the SPI module isn't more flexible.
I think you don't understand the purpose of TMS well :) But it's OK.
TMS is only there to move around the JTAG state machine. Once in the correct state, you shift data on TDI/TDO (and TMS stays in the same state). But there are some more things to work around when the SPI module is used, like data that is not 8 or 16 bits long.
If you could test the TAP_shift assembly function, it would be nice to have such a speedup :)
I understand exactly what TMS is used for.
When I say 'sync it', I mean toggle it when needed. If you give control of CLK / TDI / TDO to the SPI module, you can no longer toggle TMS when you need to (during a state change, which is not a rare event). And if the number of bits to shift isn't compatible with SPI, you'd have to switch those pins back and forth between SPI and GPIO functions. Basically a big mess.
Exactly. But then you need to move the state machine from OpenOCD to the BP. And there is no space left.
I would stick to the Assembly for a while :)
You could make a frontend for the openOCD binmode ;P
It is possible to sacrifice another protocol (for example, there is currently a PIC programming protocol, but it is not linked in). Everyone could make their own build with their favorite protocols.
I have a working loop and it outputs bits at ~ 1.33 MHz, which is faster than the UART can provide them anyway.
I still need to clean up some stuff in the serial but it's going well so far.
Is there a procedure to post patches?
I have to change some stuff in the linker script to put the terminal buffer at the end of the memory, so that all small variables are in the first 8192 bytes of memory (which ASM can access more easily and faster).
You can post a patch here, we'll apply it and make a nightly for people to test.
Here are the 4 patches.
The loop is at 1MHz exactly. You can pull it to 1.33 MHz by dropping the support for the 'delay' by undefining USE_DELAY in the .S file. But there is not much point because the bitbanging loop is not the limiting factor anymore.
Even though the loop is now ~8-10x faster than before, the transfers haven't sped up that much, which is disappointing. It's only a ~2x speedup (from 3.5 kB/s to 7 kB/s in download and from 9.5 kB/s to 19 kB/s in upload).
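As a back-of-the-envelope check on those numbers, the usual Amdahl's law relation can be turned around to estimate how much of the original transfer time was spent in the bit-bang loop. This is only a sketch: `loop_time_fraction` is a made-up helper, and it assumes everything outside the loop takes exactly as long as before, which may not hold since the serial handling was also touched.

```c
/* If a fraction p of the old time was spent in the loop, and the loop got
 * `loop_speedup` times faster, the new time is (1 - p) + p / loop_speedup
 * of the old time. Solving that for p, given the measured overall speedup:
 *   p = (1 - 1/overall_speedup) / (1 - 1/loop_speedup)
 * Assumes the time spent outside the loop did not change. */
static double loop_time_fraction(double loop_speedup, double overall_speedup)
{
    return (1.0 - 1.0 / overall_speedup) / (1.0 - 1.0 / loop_speedup);
}
```

With a 2x overall speedup, an 8x faster loop corresponds to about 57 % of the old time being bit-banging, and a 10x faster loop to about 56 %. Under that assumption the rest was already spent elsewhere.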
The two main limiting factors are :
* The UART speed: Currently, while in a burst you can _clearly_ see the bitbanging for 16 bits, then a gap where it's waiting for the next 4 bytes ... If we could somehow make the UART go faster ... 2MBps would be nice.
* Space between commands: Open OCD has some giant hole where no data is xfered between commands. These take up about 50 % of the time. If we could somehow improve the openocd driver, this could speed up by another 1.5 -> 2 times.
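To put rough numbers on the UART point, here is a small timing sketch. The helper names are invented, 8N1 framing (10 bit times per byte) is an assumption, and the baud rate is a parameter because the thread does not state the actual figure.

```c
#include <stdint.h>

/* Time to move `bytes` bytes over a UART, in nanoseconds.
 * Assumes 8N1 framing: 1 start + 8 data + 1 stop = 10 bit times per byte. */
static uint64_t uart_time_ns(uint32_t bytes, uint32_t baud)
{
    return (uint64_t)bytes * 10u * 1000000000ull / baud;
}

/* Time to clock `bits` JTAG bits at a given TCK frequency, in nanoseconds. */
static uint64_t bitbang_time_ns(uint32_t bits, uint32_t tck_hz)
{
    return (uint64_t)bits * 1000000000ull / tck_hz;
}
```

At 1 MHz the loop shifts 16 bits in 16 µs, while the 4 bytes feeding it would take 40 µs at an assumed 1 Mbaud and still 20 µs at 2 Mbaud, so under these assumptions the UART stays the slower side of each burst.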
[quote author="tnt"]
The two main limiting factors are :
* The UART speed: Currently, while in a burst you can _clearly_ see the bitbanging for 16 bits, then a gap where it's waiting for the next 4 bytes ... If we could somehow make the UART go faster ... 2MBps would be nice.[/quote]
One way to break the UART speed barrier is to redesign the Bus Pirate without the FTDI chip, and instead use a PIC with native USB support. However, I've already floated this idea, and learned that there are historical reasons why the BP was based on FTDI solutions, as well as issues with the firmware to support USB. With the right coding skills, though, you could potentially reach maximum USB speeds.
Of course, now I wonder whether you meant 2 Mbps or 2 MB/s. USB operates at 12 Mbps, and the maximum achievable byte rate is really only 1 MB/s (not 1.5 MB/s, as you might calculate, due to overhead). Whether you have room to speed things up depends largely on the limits of Full Speed USB.
[quote author="tnt"]
* Space between commands: Open OCD has some giant hole where no data is xfered between commands. These take up about 50 % of the time. If we could somehow improve the openocd driver, this could speed up by another 1.5 -> 2 times.
[/quote]
This might only be solvable via FPGA "programming" - i.e. the Openbench Logic Sniffer might be a better platform if you want to develop "faster" JTAG. On the other hand, perhaps some smart interrupt programming would help. It's too bad the PIC does not have peripheral DMA, because that could possibly eliminate the holes. Caveat: I am not completely familiar with the JTAG protocol, so I cannot be certain that there is a way to avoid holes in data if the protocol has overhead.
Thanks, I will apply these today and get them in SVN.
There is a native USB version of the Bus Pirate in the works. Demand for features that need more speed makes it unavoidable. I'll post the boards in the forum for comments in a few days. There's a thread somewhere discussing an open source USB-ACM (virtual serial port) driver for the PIC; the lack of one is the main impediment to the project. The Bus Pirate is mostly public domain source, and there's currently no great driver solution that we can distribute with the project (Microchip's USB stack does not have a truly open license).
I couldn't apply the patches with SVN; it just shows nothing to do (I'm far from an expert). Is there a trick? I thought it might be line endings, so I converted them, but it didn't help. You could also .zip and attach the files, or I can give you SVN access to commit the changes yourself (you'll need a Google-registered email; I tried your current email but Google didn't know it).
The code is committed. Feel free to test and report :)
Thanks for the commit.
I compiled for v2go/v3 and v1a, nightlies are here:
http://the-bus-pirate.googlecode.com/svn/trunk/firmware/v5-nightly/BPv3&v2go/BPv3-Firmware-v5.2.hex
I am confused, so maybe someone can shed some light ... tnt made a few patches for JTAG and ian made a nightly build, but there's no JTAG mode in the 5.2 firmware at all?! What does one need to do to make JTAG mode work (other than fall back to 4.x)?
the code states:
...
#define BP_USE_1WIRE
#define BP_USE_HWUART //hardware uart (now also MIDI)
#define BP_USE_I2C
#define BP_USE_I2C_HW
#define BP_USE_HWSPI //hardware spi
#define BP_USE_RAW2WIRE
#define BP_USE_RAW3WIRE
#define BP_USE_PCATKB
//#define BP_USE_LCD // include HD44780 LCD library
//#define BP_USE_PIC
..
don't see how to turn on jtag here :(
Currently the user terminal JTAG mode is removed from v5. These patches are for the OpenOCD JTAG support - the Bus Pirate supports the open source OpenOCD JTAG debugger from the binary interface mode: http://openocd.berlios.de/web/
Did you use the low-level JTAG user mode library? We're trying to give it an update - what features have you used in the past, and how could it be tweaked to work best for the things you do?
[quote author="ian"]
the Bus Pirate supports the open source OpenOCD JTAG debugger from the binary interface mode: http://openocd.berlios.de/web/
[/quote]
Ah, so OpenOCD bit-bangs it. Thanks for the explanation.
[quote author="ian"]
Did you use the low-level JTAG user mode library? We're trying to give it an update - what features have you used in the past, and how could it be tweaked to work best for the things you do?
[/quote]
Nope, I just tried it to see if it works. I have not used it at all. I just started to work with some CPLDs (CoolRunner-II) and FPGAs (Spartan-3AN), so I'm not too experienced with the whole JTAG chain thing :( But it looks like I will have to use it more and more, so the BP might be useful in that regard.