The loop now runs at exactly 1 MHz. You can push it to 1.33 MHz by dropping support for the 'delay' (undefine USE_DELAY in the .S file), but there is not much point because the bitbanging loop is no longer the limiting factor.
Even though the loop is now ~8-10x faster than before, the transfers haven't sped up that much, which is disappointing: only a ~2x speedup (from 3.5 kB/s to 7 kB/s in download and from 9.5 kB/s to 19 kB/s in upload).
The two main limiting factors are:
* The UART speed: currently, during a burst, you can _clearly_ see the bitbanging for 16 bits, then a gap where it's waiting for the next 4 bytes ... If we could somehow make the UART go faster ... 2MBps would be nice.
* Space between commands: OpenOCD has some giant holes where no data is transferred between commands. These take up about 50 % of the time. If we could somehow improve the OpenOCD driver, this could speed things up by another 1.5 to 2 times.
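A rough sanity check of that last estimate, assuming the gaps really are about 50 % of the total time: Amdahl's law gives the overall gain from shrinking only that part. The helper names below are made up for illustration and are not part of any firmware or driver.

```c
/* Overall speedup when a fraction f of the total time is made s times
 * faster (Amdahl's law). Hypothetical helper, for estimation only. */
static double overall_speedup(double f, double s)
{
    return 1.0 / ((1.0 - f) + f / s);
}

/* Upper bound when that fraction is removed entirely (s -> infinity). */
static double speedup_limit(double f)
{
    return 1.0 / (1.0 - f);
}
```

With f = 0.5, cutting the gaps to a third of their length gives 1.5x overall, and removing them completely gives 2x, which is where the 1.5 to 2 times range comes from.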
I have a working loop and it outputs bits at ~ 1.33 MHz, which is faster than the UART can provide them anyway. I still need to clean up some stuff in the serial but it's going well so far.
Is there a procedure to post patches ?
I have to change some stuff in the linker script to put the terminal buffer at the end of the memory, so that all small variables are in the first 8192 bytes of memory (which ASM can access more easily and faster).
When I say 'sync it', I mean toggle it when needed. If you give control of CLK / TDI / TDO to the SPI module, you can no longer toggle TMS when you need to (during a state change, and that's not a rare event). And if the number of bits to shift isn't compatible with SPI, you'd have to switch those pins' function back and forth between SPI and GPIO. Basically a big mess.
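To illustrate why plain GPIO bit-banging avoids that mess, here is a minimal host-side sketch, not the actual firmware. The pin struct and the stand-in target (a 1-bit shift register) are assumptions made up for the example; on the real hardware the fields would be GPIO latch bits.

```c
#include <stdint.h>

/* Hypothetical pin model; on real hardware these would be GPIO latch bits. */
typedef struct {
    uint8_t tck, tms, tdi;  /* driven by the adapter */
    uint8_t tdo;            /* driven by the target */
    uint8_t reg;            /* stand-in target: a 1-bit shift register */
} jtag_pins_t;

/* Stand-in target: captures TDI on the rising edge of TCK ... */
static void target_rising(jtag_pins_t *p)  { p->reg = p->tdi; }
/* ... and updates TDO on the falling edge. */
static void target_falling(jtag_pins_t *p) { p->tdo = p->reg; }

/* Shift nbits (1..16), LSB first. TMS gets its own value on every clock,
 * and nbits does not have to be a multiple of 8. */
static uint16_t jtag_shift(jtag_pins_t *p, uint16_t tdi_bits,
                           uint16_t tms_bits, unsigned nbits)
{
    uint16_t tdo_bits = 0;

    for (unsigned i = 0; i < nbits; i++) {
        p->tdi = (uint8_t)((tdi_bits >> i) & 1u);
        p->tms = (uint8_t)((tms_bits >> i) & 1u);
        tdo_bits |= (uint16_t)((unsigned)(p->tdo & 1u) << i); /* sample TDO */
        p->tck = 1;
        target_rising(p);
        p->tck = 0;
        target_falling(p);
    }
    return tdo_bits;
}
```

For example, a 5-bit shift with TMS raised only on the last bit (the usual way to leave a shift state) is a single call: jtag_shift(&pins, 0x0B, 0x10, 5). With an SPI peripheral owning the clock and data pins, neither the per-bit TMS value nor the odd bit count fits.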
OK, I've refined my code a little and tried to compile it (I haven't tested it in hardware yet).
My current hope is to achieve ~700 kHz and use only about 1/3 of the program space previously used by the tap shift method (currently 288 bytes, going down to about 90 bytes). I could push it up to 1 MHz (by unrolling the inner loop), but then the program space would be a little much for my taste.
Using the SPI hardware could indeed provide a lot of speedup, but it takes a lot of hacks to sync TMS and to make sure you don't generate glitches when going from SPI mode to normal IO mode (if the transfer is not a multiple of 8 bits or so). Too bad the SPI module isn't more flexible.
I was indeed using pirateloader. With the export, I get a hex file that's the same length as the official one. I'll try flashing it tonight to see if it works.
I was wondering if there was something special needed to rebuild the firmware ?
I downloaded MPLAB and the C compiler evaluation version and installed them. Then I checked out the SVN repository, loaded the project file in source/ and built it. Then I tried flashing the resulting output/busPirate.hex, but the loader doesn't even want to load the .hex file ...
Yesterday I had to dump the NOR from a target with JTAG. It took about 8 hours to dump 64 MB of flash. That's pretty long!
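To put a number on that, here is the average throughput those figures imply. The helper name is made up for illustration, and reading 64 MB as 64 MiB is an assumption.

```c
#include <stdint.h>

/* Average throughput in bytes per second (hypothetical helper). */
static double bytes_per_second(uint64_t bytes, double seconds)
{
    return (double)bytes / seconds;
}
```

Calling bytes_per_second(64ull * 1024 * 1024, 8.0 * 3600.0) works out to roughly 2.3 kB/s averaged over the whole dump (about 2.2 kB/s if the 64 MB is decimal).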
When looking at the JTAG bus, I noticed that it is currently the limiting factor (with the 'fast' UART mode, the UART keeps up but is slowed down by the bit-banging).
The JTAG bus is around 100-125 kHz right now (with a very uneven duty cycle). I think that by using an optimized loop, we could bring that to 500-600 kHz, which should provide a pretty nice speedup.
I haven't programmed PIC in a long time ... (back when the 16f84 was a new product), but I tried estimating what could be done by a quick look at the datasheet. Below is what I came up with (probably buggy but that's just to estimate the cycle count) :
Basically there would be several 'layers' of loops, to avoid doing the UART test or handling bit counts != 16 in the fast path: one very fast function would do just the bit-banging for a given number of 16-bit words, and an outer loop in C would handle the 'end' and the UART flushes when needed.
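The layering could look roughly like the sketch below. This is only an illustration of the structure: clock_one_bit is a hypothetical stand-in that counts clocks instead of toggling pins, and in the real thing the inner function would be the hand-written assembly loop.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the real pin toggling: it only counts clocks. */
static unsigned long clocks_issued;

static void clock_one_bit(unsigned tdi, unsigned tms)
{
    (void)tdi;
    (void)tms;
    clocks_issued++;
}

/* Inner layer: whole 16-bit words only, no UART test, no bit-count test.
 * This is the part that would be written in assembly. */
static void shift_words(const uint16_t *tdi, const uint16_t *tms, size_t nwords)
{
    for (size_t w = 0; w < nwords; w++)
        for (unsigned b = 0; b < 16; b++)
            clock_one_bit((tdi[w] >> b) & 1u, (tms[w] >> b) & 1u);
}

/* Outer layer in C: full words go to the fast path, the tail is handled here. */
static void shift_bits(const uint16_t *tdi, const uint16_t *tms, size_t nbits)
{
    size_t full = nbits / 16;
    unsigned tail = (unsigned)(nbits % 16);

    shift_words(tdi, tms, full);
    for (unsigned b = 0; b < tail; b++)
        clock_one_bit((tdi[full] >> b) & 1u, (tms[full] >> b) & 1u);
    /* A UART flush, when needed, would also go at this level. */
}
```

A 37-bit shift, for instance, would send two full words through the fast path and leave 5 bits for the slower outer loop, so the per-bit overhead is only paid on the tail.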
I tested it but it doesn't work. Same problem as a standard v5.0
Another piece of information: depending on whether I put the Bus Pirate in 'normal' or 'fast' speed, the error changes slightly. The weird swap at the beginning is exactly the same, but there is no swap at the end.
Am I the only one with this problem ? Do some people use v5 successfully with openocd ? Because here, the problem is not random, it happens each time and exactly the same way.
Attached are the full 'debug_level 3' output of OpenOCD as well as logic analyzer logs in VCD format (GtkWave) for the three firmware versions.
Here's an extract where you see the DR scan for IDCODE and you clearly see the problem :
I just tried OpenOCD with the v5 firmware (BPv3-Firmware-v5.0.hex) and it doesn't work (not even the IDCODE). Looking at it with a logic analyzer shows that the commands sent to the JTAG bus are correct and the TDO signal is correct on the bus, but it does _not_ match the bits received by OpenOCD.
With the exact same setup, just reflashing to BPv3-Firmware-v4.2.OpenOCD.hex makes things work fine.