My next big project will probably be this idea; I'm mainly curious whether anyone else here is interested in the concept.
The basic idea is a modular, very high quality music player for your home.
The structure will basically be a Cortex-M3 with an I2S interface to your choice of DAC backend.
The M3 micro will have a bunch of SDRAM available as scratch decoding space, directly accessible in the memory map. For storage it will use FAT on a USB mass storage device, say a 2.5" hard drive.
The uC I think I will use is an LM3S9B90, but I might also use an LM3S9790 for this project.
These have built-in Ethernet, but given the 12 Mbit/s rate of Full-Speed USB to the mass storage, I don't know if that is really the best transfer means; a pull-the-disc, add-files scheme is probably better.
The project may have a daughter board for the user interface/display to let people choose the display they want to use. This way if someone wants to do touch they could or what have you. My inclination is for VFD + a few buttons and a rotary encoder.
My codec priorities are:
WAV
FLAC
OGG
anything else
I don't think there will be MP3 support, for licensing reasons; some sort of computer-side transcode might be the best workaround for large MP3 collections. But as the point of this is HiFi, hopefully the sources are better than average MP3s anyway...
The first DAC backend I have planned is WM8740 based.
Update:
There is now a documentation page on my wiki for this: http://teholabs.com/docs/openhifi:overview
I don't know much about filesystems (how hard is it to code support?), but ext2/3/4 support would be nice, because most Linux distributions use it by default. And if it is possible, network-attached storage would be really nice :)
I could be potential buyer, if this comes to Seeed or something :)
Edit: And remote control via IR / Ethernet / Bluetooth or something would also be really nice :)
What advantage would ext provide?
A journaling filesystem of any kind is totally out of the question, and writing a filesystem from scratch is actually quite hard. A BSD-style FAT implementation already exists from Chan (FatFs).
Remotes are totally possible.
Great idea.
You might consider UPnP, since many existing network audio streaming hardware devices use this protocol, and I assume it's easier to implement than a filesystem.
WAV and FLAC are certainly required for a high grade player. These are the formats being sold now (beyond MP3 and AAC, of course). WAV and AIFF are so similar that you might as well support AIFF, too. CAF is a new, open standard from Apple which breaks the 32-bit barrier and fixes many of the problems of the existing formats, making it very useful for 24-bit 192 kHz audio. Maybe that sounds like RF64, but CAF does have a few unique advantages (more for recording than playback, though). If you get WAV and FLAC working, I'd be interested in adding support for CAF and certainly AIFF.
Personally, I would be most interested in designing a high-end FireWire player, although I admit that Ethernet would be a nice option. FireWire allows both isochronous streaming via guaranteed bandwidth and basic file access, but FireWire would obviously be expensive to develop. FireWire is a fairly robust interface, and I believe that people have wired entire stadium sound systems with FireWire (although I assume they need some sort of repeater for the long spans).
Maybe some sort of modular PCB with two or three interfacing options (FW, EN, USB) that would only be populated as users desire. Ethernet is easier and connects to more devices, but is not optimized for isochronous streaming in any way. FireWire is much less common, but is extremely optimized for audio. I'm not sure how many unique parts are required by FireWire, so I guess more information would be needed to know whether it's a sane option to offer.
Another personal interest would be to base the platform on a TMS320 chip instead of ARM, so really fast DSP could be performed as an option. The downside of a DSP is that interfaces like Ethernet might be more difficult to implement without a second processor on board for more general-purpose programming. Sometimes '8-bit' programming on a DSP can be a nightmare.
Thanks for your input RSDIO.
Yeah, the design is in no way final. I am leaning toward some ARM as I have all the tools for it. I totally think it should be as modular as possible. The thing that got me thinking of trying to build a single board that did the filesystem/decode is that there are all these audiophiles hacking other things to get I2S output for their home-built DAC boards, mostly over at Hydrogen Audio and diyAudio.
FLAC is fixed point, for what it is worth, so it should be okay on ARM. FLAC is my huge priority (because my music archive is FLAC, and it supports gapless audio for classical audiophiles like my father). The sub-goal of the project, when I get to it (and it might not be till summer, honestly), is to see if I can squeeze enough performance out of it to do the decode. There would be plenty of memory in the approach I suggest, but it is only readable at 50 MHz.
In my opinion, this is a perfect application for an embedded Linux system. With Linux as a basis one could avoid reinventing the wheel regarding all the network interfacing and filesystem stuff; maybe we could even use mplayer for audio decoding.
I've played around with an AVR32 board, the NGW100. The board itself is cheap and runs Linux. The AP7000 has plenty of power for audio decoding, the synchronous serial comm controllers can be used for I2S output, it even has a complete LCD controller (I've used it with a 320*240 TFT). One drawback is the missing USB host controller.
Jan
[quote author="JanW"]In my opinion, this is a perfect application for an embedded Linux system.
Jan[/quote]
I definitely see advantages to using a Linux base for this type of project. However, in the end you still have to port the codecs of choice to the system, and I don't know how much easier Linux would make that. Plus you will still need hardware-specific drivers for I2S I/O, etc.
For a simple design, I am not sure Linux therefore adds much other than overhead. What explicit advantages, other than less effort on software porting, do you see?
I sort of don't want/believe in a throw-hardware-at-it approach. I don't know if anyone else will agree with me, but to keep the cost as low as possible I want the design to use the least hardware possible. If I can do a FLAC decoder in 96 KB of SRAM, I am going to do that!
I really want to know what advantages you see to using Linux here, mostly because I am not a Linux jock and I truthfully don't know how much overhead there would be. I don't see an 80 MHz M3 as having a ton of power for this application.
I agree with 'brian' on this one. Linux is a general-purpose system and does not seem to offer much of anything here. Network interfacing and filesystem support are portable on Linux mostly because they're C based, although there are probably also some Linux-specific threading APIs in the networking stack, or at least I assume that's a fair summary. Audio systems and other signal-processing systems usually benefit from an RTOS, which also has threading APIs, but an RTOS is usually tighter and performs better on timing-critical systems than a full-blown general-purpose OS.
Texas Instruments offers an open-source RTOS for their TMS320, although I have not used it yet. I looked over the design, and it seems quite tailored for event-based programming. I'm not sure how difficult it would be to port networking code to an RTOS, but the filesystem code should be straight C without much dependency on OS calls other than I/O.
[quote author="brian"]I am leaning toward some ARM as I have all the tools for it. I totally think it should be as modular as possible. The thing that got me thinking of trying to build a single board that did the file system/decode is that there are all these audiophiles that hack other things to get I2S output for their home built DAC boards. Mostly over at hydrogen audio, and diyAudio.
FLAC is fixed point for what it is worth so it should be okay on ARM. FLAC is my huge priority (because my music archive is FLAC and it supports gap-less audio for classical audiophiles like my father). The sub goal of the project when I get to it (and it might not be till summer honestly), is to see if I can squeeze enough performance out of it to do the decode. There would be plenty of memory in the approach I suggest, but it is only readable at 50 MHz.[/quote]50 MHz is still 5 times the highest bit rate, right?
I'm not sure what to suggest. A DSP chip would handle the FLAC decoding and I2S most efficiently, but would probably make the networking and/or filesystem more difficult. An ARM would probably make most of the coding easier, but might not handle decoding with the same efficiency. I guess so long as the ARM could handle DMA to I2S with clock rates high enough to support 24/192, then you have a fine platform.
I've been exposed to the edges of hydrogen audio and diyAudio. In fact, I recently started reading "Designing Audio Power Amplifiers" by Bob Cordell, who is fairly prominent on diyAudio. I've also been interested in building one of the current-to-voltage DAC circuits from Nelson Pass, since it seems very similar to a current-output (high-speed) DAC that I am designing with now.
What I would be concerned about is clocking. The best-sounding DAC designs have a local, low-jitter clock on the same board, to avoid the problems of clock degradation when connecting multiple boards. Based on that bit of knowledge from the community, I wonder how clocking is handled with these I2S boards for external DACs. Ideally, the DAC should send the clock to the main board (ARM) and drive the timing for the whole system (including ethernet timing). Is it possible with I2S for the ARM to be a slave?
I agree with you on jitter. The things that matter most for jitter are interfaces like SPDIF where, as with a UART, there is no separate clock channel. The PLL jitter is still relevant, but if you use a ribbon interconnect I see no reason a low-jitter clock has to be local to the DAC; it can be a master crystal on the decode board. Regular quartz has rather low jitter in the scheme of things, really...
Some reasons I think it would be easier to do with ARM have to do with branches of OGG decoding for ARM. WAV should be pretty easy to do; that should prove all the hardware works and sounds good, along with the filesystem interface, etc. FLAC is next most important, as you could transcode anything to FLAC on 'import'. Let's face it: if this is hard-disc based, with a USB host, you can have 2 TB on this thing if you want, so the size of the music library is a minor concern. The complexity of the FLAC decoder depends on whether you want to support all block sizes. The reference encoder is pretty much the only one used in practice, and as I recall it uses a fixed block size less than the maximum.
I don't feel any modular design will make the really hard-core people happy, but it probably will exceed my ear and equipment. I want people to hear what clear music sounds like in these days of MP3s and iPods that sound horrific; we seem to be going lower-fi all the time even as we gain more bandwidth and storage space.
That said, I think Philips/Sony did an amazing job with Red Book Audio. Really, other than 1-bit oversampling tricks at fast rates to reduce the effective noise floor, it is a great standard; a well-mastered CD is still very, very good. Too bad about the trends in mastering that reduce dynamic range for the sake of volume...
On the ARM/DSP thing, most of the chips that Rockbox is ported to are ARM7TDMI based, but I bet they have some DSP features on them.
A FreeRTOS port to the suggested micro should be doable, as SafeRTOS is ported to its sister chip.
Page 816 of the LM3S9B92 datasheet (the 9B90 is the same, but it may be a different page) says:
"Configurable sample size from 8 to 32 bits"
The tables starting on page 822 give values for sample rates up to 192 kHz.
As far as your question about slave mode: I think it is possible, but I'd have to read about it again. It might not help the jitter, as the ARM would still transfer on one of its own clock edges after sensing the external master clock, assuming the output FIFO is not asynchronously triggered; if it is asynchronously triggered, it would be perfect.
Edit:
"SCLK Master/Slave Source of serial bit clock (I2S0TXSCK) and Word Select (I2S0TXWS)." So probably possible but does not answer synchronicity of it.
True. SPDIF is the worst possible scenario for clocking. HDMI is probably the same. I am not familiar with all the ways that clocking designs can compromise the audio, but I will concede that you are probably right that the clock does not literally need to be local to the DAC for low-jitter performance. So long as the decoder and converter boards can share a chassis, ground reference, and ribbon (or FFC) interconnect, then you probably have a great basic design that will allow plenty of variation.
I seem to recall that FLAC was ported to ARM, but I cannot remember. There was at least one engineer who ported FLAC from Intel/PPC to something embedded like ARM and submitted the source back to the open project. If you're still having trouble deciding on the processor (ARM, DSP, other), then perhaps a good starting point would be to see what FLAC has already been ported to in the embedded world, since that could make your job a lot easier. By the way, I contributed to the FLAC project, but that was before I started programming the TMS320 DSP, so I cannot recall whether there is any overlap.
If you want a reasonable alternative to MP3 (as I do), then FLAC and WAV are the prominent choices. There are so many web sites offering FLAC and WAV that I don't think you can ignore them these days. FLAC merely makes the download go about twice as fast without sacrificing quality. Supporting FLAC on the player just makes it easier to manage archives and file sharing. Even though we all have lots of disk space, there's still the hassle of conversion when you end up buying a lot of music in FLAC and simply want to listen to it without decoding to another file first. For me, I buy things online and burn a backup so that I don't lose my purchase in a computer crash. From that point on, it's really more trouble to deal with AIFF or WAV for iTunes and FLAC for the original and the archives. It would be much more convenient to have a player that could simply handle FLAC. Anyway, I'm sure I don't have to sell you on the idea.
WAV doesn't really even require a library to decode. If you're in a hurry, a quick bit of hacked code could interpret the RIFF chunks well enough to play the audio samples. Of course, it would be great to support more formats like AIFF and CAF, using a proper, generic API, but that sort of fancy solution can indeed wait until after the proof-of-concept, as you've already said. I have nothing against OGG, per se, but the format does not seem nearly as popular as FLAC or even AIFF. It would seem to be the lowest priority. I might have a hard time justifying my belief that CAF should even be higher priority than OGG, but my evidence is that recording software from reliable vendors has support for CAF, but nothing I've seen supports OGG.
I agree that Red Book Audio, i.e., 16-bit 44.1 kHz, is quite good. SACD is more of a tradeoff than an improvement, and represents Sony's attempt to renew their patent rather than an actual attempt to take quality to a new level. I won't get into any more of the details distinguishing DSD from PCM, but I will say that I have no specific complaints about 16/44.1 as an 'improvement' compared to a simple MP3 player. You really can hear the difference, even when you're not trying, at least when the playback equipment is good enough.
Overall, my recommendation would be to design an interconnect (ribbon or FFC) with the appropriate signalling to support stereo 24-bit 192 kHz audio. If it's not too much additional trouble, then support for 8-channel (7.1 surround) 24-bit 192 kHz audio would be cool, but maybe that could just as easily be done with 4 (7.1) or 3 (5.1) stereo interconnects. Once you standardize on an interconnect that can handle 24/192 without clocking issues in the hardware specification, then you have a modular system where either the DAC or the decoder can be upgraded as the budget permits.
Even though most people would only use stereo 16/44.1 capabilities, it would be really cool to be able to swap out the DAC and bump up to 24/192, or even just 24/96 or 24/48. At some point, your original ARM platform decoder might run out of processing power for higher sample rates, but with a solid interconnect standard, it would be much less expensive to design a new decoder board that connects to all existing DAC boards.
Maybe it's a little silly to think about designing a hardware interconnect that goes way beyond your satisfactory 16/44.1 starting point, but I have a hunch that it wouldn't really increase the cost to be compatible with 24/192 and multiple channels. Compatibility like this might make it possible for me to design a TMS320 decoder platform that is compatible with your ARM decoder platform, and both would benefit from connecting with the same DAC boards.
By the way, if hydrogen audio or diyAudio have covered the specific topic of low-jitter clocked I2S interconnects, then please point me to the juicy links. I sincerely doubt that I'm the first to suggest a universal interconnect for DIY digital audio processing and conversion, so I'm more than willing to learn from those who've already gotten a head start. It seems like you've had maybe more exposure on those sites, so that's why I'm asking for links.
My previous reply was from the high-level viewpoint and covered a lot of subjects but not a lot of details. Feel free to disregard any of the sidetracks like SACD and just comment on whatever strikes your interest. But I would also be interested in discussing very detailed ideas like the exact pinouts of the interconnect, the type of connectors, the signals, et cetera. I'm quite familiar with FLAC and embedded programming, but I just haven't done both in the same project yet. So far, I've only written CoreAudio FLAC playback software, and written embedded and DSP firmware that sadly did not include FLAC.
I think that starting with the interconnect specification will make the choice of processor easier, so I'd be keen to start with the interconnect design.
RSDIO:
Some points: I2S in most versions is for stereo; there are several alignments for L/R, etc., but mostly it is used to send stereo.
I am a big supporter of headphones, and I feel like Red Book is the audio format I want to give priority to, so 16-bit 44.1 kHz (not 48 kHz) should be given priority for accuracy in the layout. (So I am not sure about 7.1 or anything; for music I don't know if I see any point at all, and I promise this is a music-only device.)
As far as phase noise goes, it really depends on how the VCO in the PLL is done on the chip. As I said the quartz itself is very low phase noise.
I will certainly check into the slave type I2S in detail before committing to a design, if that is indeed totally fine, I probably won't worry much about the clock on the main board.
The interconnect is going to be an IDC ribbon for sure. All of my application specific boards will use ribbons. Power supplies in the end will be a big deal for this. There will have to be a dirty 5V and a clean 5V/3.3V/+/- 12V. The dirty will have to do bus power.
I think the power supply will be modular also, I looked into this a lot. A linear supply for the USB host power is pretty much out of the question cost wise. I may want to use a hybrid switched + linear approach.
So you know, I won't personally code anything to help support an Apple-defined standard. They are the most closed and controlled system of all, actively stopping people who want to make things better, and I don't want to help such an evil company in even a tangential way. I might change my mind if they donate something brilliant in a totally strings-free way. That doesn't mean someone else can't contribute it, though. The code will probably be released under a permissive license, but I haven't totally decided on that.
I have in my lab space 3 LM3S9B90-C1 chips so my inclination is to attempt it with them. I will be happy to sketch out my plan on a schematic when I get to it but honestly it may be awhile. I have to take my A-Exam for my PhD soon, and a conference and a paper. Since I am a grad student I am quite poor, so I hope someone buys my ARM board (Eridani)!
I want to build a great sounding FLAC player for myself so one way or another I will do it.
BTW, where do you buy FLAC from? I would love to know, most places I get tracks from are high bit rate VBR MP3 (ick).
The ideal system would have HS USB support. The chip that Cumby is based on, in BGA, would have everything you would want for a Linux-based approach and is quite cheap, but that looks like a Linux-only style of approach. I want to do it bare bones, even if I have to write ASM optimizations, etc.
In the end this project is a classic trade-off: integrate and make it cheaper, or make it modular, more flexible, but expensive. I feel like if the price can stay under 250 for a nice one and 80 for an OK one (excluding the hard drive), then it would be okay.
We def should talk a lot more... If you see my coding style you might just laugh though, I am a "real programmer" but I am certainly not a professional one if you take my meaning.
[quote author="brian"]I2S in most versions is for stereo there are several alignments for L/R etc but mostly it is used to send stereo.[/quote]Thanks for explaining this. It seems to fit the way the DAC chips are designed, too. Although there are single chips to handle 5.1, all of the high-end multichannel systems that I have seen are actually designed with multiple stereo chips. So, the best design would probably still need stereo links no matter what.
[quote author="brian"]I am a big supporter of headphones and I feel like Red Book is the audio format I want to give priority to so 16-bit 44.1 KHz (not 48 KHz) should be given accuracy performance in the layout. (So I am not sure about 7.1 or anything, for Music I don't know if I see any point at all and I promise this is a music only device).[/quote]
My comments about higher sample rates are geared towards the interconnect, not for the initial DAC board layout (other than how the layout might be affected by the connector). Also, I don't think that layout would really be affected by the choice of 44.1 kHz, 48 kHz, or the higher rates.
As for Music, I do have many surround releases. I have DTS Music Disc and computer files in surround, and I can burn DVD-Audio but I don't actually have a DVD-Audio player. I have also dabbled with recording in surround. Basically, any time I think about designing a two-channel system, I try to also think about designing for surround so that the systems can be compatible and share components. As for the experience of listening to Music in surround, it's a fun diversion; I'm not totally convinced it isn't a second passing fad (after quad); but I'm also not in a hurry to dismiss it when compatibility can usually be quite easy.
[quote author="brian"]As far as phase noise goes, it really depends on how the VCO in the PLL is done on the chip. As I said the quartz itself is very low phase noise.[/quote]
Correct me if I am wrong, but the ultimate clock is self-referenced, and thus you would not even need a PLL. The PLL is used to synchronize to an external clock, but if you shun SPDIF and HDMI then you can skip the PLL/VCO entirely and just use a clock that is generated within the system. This is a perfectly workable system with FireWire or UPnP links, because the media server just provides the data on demand.
[quote author="brian"]I will certainly check into the slave type I2S in detail before committing to a design, if that is indeed totally fine, I probably won't worry much about the clock on the main board.[/quote]
Please do keep that in mind, in case it looks like a good choice. Not to contradict myself too severely, but you may be right that the decoder could easily be the master clock with the DAC board working just fine as a slave. Provided that the clock jitter does not increase over the interconnect between the decoder and converter boards, then perhaps the DAC boards would end up being cheaper if the decoder was always the master. Also, if surround is available as a configuration option, then you could piece together a 5.1 system with three stereo DAC boards and one master clock on the decoder board. I like the idea of reusing the same boards to build lots of configuration variations.
[quote author="brian"]The interconnect is going to be an IDC ribbon for sure. All of my application specific boards will use ribbons. Power supplies in the end will be a big deal for this. There will have to be a dirty 5V and a clean 5V/3.3V/+/- 12V. The dirty will have to do bus power.[/quote]
So, that's at least 6 lines for power. Then you have a minimum of 3 to 5 lines for the I2S (I'm looking at Wikipedia). Seems like a minimum of 10-pin, but here's where I suggest designing in support for surround. I am assuming that an 8-channel (7.1 surround) system could share bit clock, word clock, and master clock for all channels. Then, you really only need one extra data line for each stereo pair. So, 4 data lines would cover decent surround. You could thus come up with a standard 14-pin or 16-pin ribbon that works with anything from stereo CD to full 7.1 surround at 24/192, all on compatible interconnects. Maybe I'm dreaming, but I would love to be involved in designing something like that where you could build just about anything you want for Music or Movies by reusing inexpensive building blocks that all work together.
[quote author="brian"]I think the power supply will be modular also, I looked into this a lot. A linear supply for the USB host power is pretty much out of the question cost wise. I may want to use a hybrid switched + linear approach.[/quote]
Modular power is a really good idea. This probably makes the decoder and converter boards cheaper, and allows people to just buy a switching supply rather than build their own. I suppose the only question is whether there is a standard connector for the voltages you'll need. Also, with modular power, it seems that the digital audio interconnect might only need data, clock, and ground. That's assuming the power would be provided on an independent connector, direct from the supply (rather than have the decoder board power the converter board, or vice versa).
[quote author="brian"]So you know I won't personally code anything to help support an Apple defined standard, they are the most closed and controlled system of all, actively stopping people who want to make things better, I don't want to help such an evil company in even a tangential way. I might change my mind if they donate something brilliant in a totally strings free way. Now that doesn't mean someone else can't contribute it though. Probably the code will be done via a permissive license, but I haven't totally decided on that.[/quote]
Apple seems to be all over the place. It's really annoying that they will not support FLAC on OSX or iOS. However, Steve Jobs seems to be taking a stand against proprietary formats like Blu-ray (BD), because he believes in open standards. As for CAF, I'm resigned to writing that myself, so you needn't worry about the issue. The funny thing is that Apple actually provides very minimal API support for CAF - nothing but basic read and write without all the meta data support. But they have fully documented the CAF format openly, such that I know I will end up eventually writing my own CAF library (unless sox or sndlib supports it). I have read the CAF Specification a couple of times, most recently for a paid client, and I do not recall any strings attached. However, I was looking for features, not legal ramifications.
[quote author="brian"]I have in my lab space 3 LM3S9B90-C1 chips so my inclination is to attempt it with them. I will be happy to sketch out my plan on a schematic when I get to it but honestly it may be awhile.[/quote]
No hurry. I, too, have ideas along these lines, and I have to move slowly because my clients get higher priority than my dream projects (although I'm lucky that one of my clients is my dream project!). As I mentioned, you're probably not the first to use ARM for FLAC, so hopefully there is some documentation out there related to optimizing for ARM.
[quote author="brian"]I want to build a great sounding FLAC player for myself so one way or another I will do it.[/quote]
Just for reference about what I've been evaluating, check out the Linn Akurate DS. This is a Digital Stream Player that interfaces over an Ethernet network to a UPnP server and supports many formats including FLAC. They also generate their own clock local to the DS, so a PLL is not integral to the design. I just met the son of the owner and founder of Linn, and he's both very technical (he's an engineer) and very musically oriented. The Akurate DS has balanced outputs and an excellent DAC, but it's priced so high that I doubt I will ever own one. But that doesn't mean I think I can't design something just as good. In other words, I think we may have very similar goals.
I do already own the Sound Devices 702, which supports FLAC recording and playback, and even has a nice headphone preamp. But for some reason I'd rather leave that thing untethered so I can use it in portable situations as it was intended. For home use, I'd rather build myself something that can be 'permanently' installed.
[quote author="brian"]BTW, where do you buy FLAC from? I would love to know, most places I get tracks from are high bit rate VBR MP3 (ick).[/quote]
LINN Records (http://www.linnrecords.com/) for classical and folk, HD Tracks (http://www.hdtracks.com/) for jazz, and individual artists like Trent Reznor of Nine Inch Nails, but primarily WARP Records (http://warp.net/) and Bleep.com (http://www.bleep.com/) for the music I listen to the most. I have purchased countless FLAC files from Bleep.com, and since I was an early adopter, I've seen them go through phases where FLAC waned slightly; now it's actually coming back again. I have actually made purchases from all these listed sites, and I'm sure there are more.
[quote author="brian"]The ideal system would have HS USB support. The chip that Cumby is based on in BGA would have everything you would want for a Linux based approach and is quite cheap, but it looks a Linux only style approach. I want to do it bare bones, if I have to writing ASM optimization, etc, etc.[/quote]
I agree with your inclination to avoid Linux, and I totally understand the desire to use ASM if necessary. Personally, I would not bother with High Speed USB because I would prefer FireWire, but I'm not even going to try to convince you to join me in designing a FireWire decoder. There's also the issue that High Speed USB-Audio drivers do not really work all that well. In all actuality, an Ethernet system might be easier than High Speed USB. If anything, I think it would be great to set some interconnect standards so that enthusiasts could mix and match decoders and converters.
[quote author="brian"]In the end this project is a classic integrate and make it cheaper, modular make it more flexible but expensive idea. I feel like if the price can stay under 250 for a nice one and 80 for an OK one (excluding hard drive) then it would be okay.[/quote]
I think that with a little careful planning up front, it could be quite easy to set a standard so that cheap boards and medium-priced boards would all connect properly. It seems ideal if people could mix a nice DAC with a cheap decoder, or a cheap DAC with a fancy surround decoder, or go all cheap, or even all high-end.
As for keeping costs down, I've been thinking all along that it should be possible to avoid SPDIF, avoid HDMI, avoid any kind of PLL, avoid SRC (Sample Rate Conversion) in real time (unless the ARM can handle it without adding cost), and thus skip a lot of the chips and expensive circuits that usually drive up costs. Sure, some features might cost more, but I think that designing for flexibility does not necessarily make the product cost even $1 more. The only cost is spending a little extra time thinking about alternative uses for the product and whether a few extra pins on a connector might facilitate a compatible interconnect. Besides, the more generic the standard, the more people will be interested in buying the boards, and that brings the price down.
I'm not an audio guru, but if you avoid SPDIF and HDMI, is there any digital output format/connector left in the field? If I were to buy one of these players, I'd need digital output or really expensive active speakers or a headphone amplifier or... I know I'm not even a high-end or 'hifi' user, but I need good quality sometimes, not always. My wife listens to the radio at home, so I'm somewhat used to poor quality music (both technically and as in 'music').
What I'm trying to say:
Sometimes I listen to my FLAC collection with good headphones (AKG K701), and sometimes it's just a much better radio replacement via not-so-high-end speakers.
My amplifier has many digital inputs and some analog, but *if* this nice project had only analog output, I could hardly ever listen to my FLACs as background music. So if it isn't really hard, doesn't give poor quality, and doesn't make it expensive, could you please have one digital out also? Thanks :)
ASDF: It would have a DAC board, and that board can have a SPDIF-enabled CODEC on it in place of a high-end stereo DAC output. The point of the project is high-quality analog output, better than the converter in your amplifier.
If all you want is SPDIF you could get a board like this:
http://www.twistedpearaudio.com/digital/wm8804.aspx (http://www.twistedpearaudio.com/digital/wm8804.aspx)
I would put it together myself to save money... That is one of the few stand-alone I2S to SPDIF transceivers right now. Though now that I think of it, that board might only be set up as a receiver and not as an output, but I think the chip does both directions; you'd have to make your own PCB in that case.
rsdio:
Tons of interesting stuff in there. Let me cover some points, though, without quoting; just let me know if it is confusing.
USB audio does have problems, but this isn't a USB audio device so those problems aren't there. FireWire isn't really supported outside Apple anymore and USB is, so definitely USB for any mass storage.
On the jitter/PLL thing, what I meant was that in most of these micros the I2S interface clocking goes:
XTAL -> PLL -> counter -> Clock out
So the XTAL jitter isn't bad; the PLL is the thing that gives most of the jitter, I think. I wrote the TI people on their forum about jitter on the chips I have. I think the jitter could be an issue, but the more I think of it, why would they have a FIFO that has to be synchronous to the PLL clock in slave mode? So I bet slave mode will solve it.
Can I ask what you do for a living? (Like what these projects are?, PM me if you don't want to talk in public, although you don't have to say at all... I just am curious now).
You might be interested in some of the DAC boards at: http://www.twistedpearaudio.com/ (http://www.twistedpearaudio.com/)
Their stuff is quasi-open. Buffalo-II seems a bit overkill to me, but that's me. A WM8740 board isn't that pricey; you can even do bridged stereo to get 3 extra dB, wooooo. Voltage out is easier than I/V stages too...
My basic plan is to design a board that could work as the core for this project but also could be used for things analogous to Ian's web platform only also having a USB host on it as well. The I2S stuff would get special treatment on the Dev board.
You can actually see this on the wiki on my website: http://teholabs.com/docs/procyon:overview (http://teholabs.com/docs/procyon:overview)
When I was doing parts lists for this design I found out it is cheaper to buy a DIMM and desolder the RAM than to get the RAM from Digikey/Mouser/Newark/Arrow. I don't think I will do that though...
I hope to have a design you and others can look at, schematic-wise, maybe late April if I am honest. Maybe before, if I suddenly find free time.
Unless I'm mistaken, we're talking about an inexpensive design with a basic concept of two boards: a decoder board and a DAC board. The interface that I keep talking about is an internal interface between the two boards, not really an external interface. By standardizing on the internal interface, people can spend as much money as they want on the decoder, completely independently of how much they spend on the converter.
Another important aspect is that digital output is very easy to include as an optional feature on the 'DAC' board a.k.a. converter board. The decoder would still interface to the converter via the custom interface that I am describing, but it would be a simple matter to put SPDIF, headphone output, analog output, or whatever you want on the converter board. Expensive converter boards might have all of these output options, cheaper boards would have fewer.
@asdf
In your case, a very inexpensive converter board design might have nothing but SPDIF outputs! That's going to cost far less than any converter board with DAC chips. This is actually my point: With a common interface between decoder and converter, you're not stuck paying for expensive features that you'll never use.
By the way, asdf, you might be surprised to learn that a DIY music player like brian has described might actually sound better through analog outputs than if you use the digital inputs of your existing home stereo. But rather than get into the details of why a DIY analog output would be better than your digital inputs, I will just reiterate that the designs I'm talking about do not preclude having digital output on some of the converter board variations.
To clarify, I'm only talking about avoiding SPDIF *inputs* and HDMI *inputs* - which would necessarily be on the decoder board. The system will be less expensive if the input options are focused on EN/FW/USB (Ethernet, FireWire, USB) - note that these are all bidirectional interfaces, where SPDIF and HDMI are unidirectional. The question of *outputs* is actually independent, and any option could easily be added to the converter boards without changing the rest of the design in any way.
EDIT: Mass storage is an equally viable option along with EN/FW/USB, because it is inherently bidirectional.
You have it correct, rsdio. I should at least draw a block diagram for people this weekend :-)
[quote author="brian"]USB audio does have a problem, but this isn't a USB audio device so those problems aren't there. Firewire isn't supported really outside Apple anymore and USB is so def USB for any mass storage.[/quote]I agree that if you are going to have mass storage, then it should be USB. I was actually thinking that mass storage would be quite optional, whereas you absolutely need some kind of input for the data. For audio streaming purposes, I'm thinking EN/FW/USB, even though I don't really see any advantage to FW for mass storage. If you start with Ethernet streaming via UPnP, you could feasibly start with a design that has no mass storage at all.
[quote author="brian"]On the jitter/PLL thing, what I meant was the I2S interface in most of these micros it goes:
XTAL -> PLL -> counter -> Clock out
So the XTAL jitter isn't bad, the PLL is the thing that give most of the jitter I think. I wrote the Ti people on their forum about jitter on the chips I have. I think the jitter could be an issue but the more I think of it why have a FIFO that has to be synchronous to the PLL clock in slave mode, so I bet slave mode will solve it.[/quote]
Synchronization is only necessary when the audio comes into the system via a unidirectional protocol like SPDIF, HDMI, or some of the brain-dead variations of USB (and even FW). However, you can completely eliminate the need for synchronization if you design around a bidirectional protocol like Ethernet streaming, or the sane variations of FireWire and USB, i.e., where the data flow is isochronous.
I'm fairly certain that you can eliminate the PLL and thus eliminate all of the jitter. The only thing you give up is SPDIF (and HDMI) input and some of the cheaper USB-Audio implementations. The only question is whether the DAC board (converter) is master clock or the ARM board (decoder) is master. I have a hunch that it would be easiest and cheapest to make the decoder board have a master crystal, and the only reason this wouldn't work is if the ribbon between decoder and converter adds jitter somehow. I doubt that it would be anywhere near as bad as a design with a PLL.
Note: Many digital audio chips have a PLL built in, so the challenge is to find out how to set up the clocking for those chips such that it bypasses the PLL. On the other hand, many pure DAC chips do not have a PLL, and so it's really easy to avoid jitter with those designs. I think that the PLL designs are found mostly on the high-level chips that handle SPDIF or USB, whereas if you're doing the high-level stuff on an ARM anyway, then you might not even need anything but low-level DAC chips which have no PLL.
[quote author="brian"]Can I ask what you do for a living? (Like what these projects are?, PM me if you don't want to talk in public, although you don't have to say at all... I just am curious now).[/quote]
I earned a BSEE, spending 4.5 years at NCSU, but then I got a job in software. Even though I quit being an employee long ago, and have focused on independent consulting ever since, most of my early career has still been software and firmware. Starting about four or five years ago, though, I began working professionally on designing circuits, PCB layout, and electronic product design with a focus on selecting parts for high features and affordable manufacturing. I continue to do firmware and software in addition to hardware. I have always tried to keep my career focused on sound and music, so I have dabbled in live recording, mixing, mastering, and writing audio plugins. I'm also a mid-level to high-level audiophile freak. I always get compliments on my stereo sound system, even from people who don't know about the audiophile world.
[quote author="brian"]My basic plan is to design a board that could work as the core for this project but also could be used for things analogous to Ian's web platform only also having a USB host on it as well. The I2S stuff would get special treatment on the Dev board.[/quote]
Ah, that's a very interesting niche. Your 'decoder' would then cost a bit more than a bare bones variation, and would be more likely to have mass storage. I think there would be some benefit to coordinating on a common interface between decoder and converter (as I keep saying), although perhaps there are already some standards out there in the open hardware world. Thanks for the links!
[quote author="brian"]When I was doing parts lists for this design I found out is is cheaper to buy DIMM and desolder the RAM than get the RAM from Digikey/Mouser/Newark/Arrow. I don't think I will do that though...[/quote]
I am mostly focused on embedded designs with SRAM, so I have not experienced DRAM pricing issues. For the decoder that I envision, massive amounts of DRAM are not necessary. But I can see why your overall platform might need lots of memory. Maybe someone else in the Dangerous Prototypes community would be better suited to advising about DIMM and DRAM issues.
[quote author="brian"]I hope to have a design you and others can look at schematic wise maybe late April if I am honest. Maybe before if I suddenly find free time.[/quote]
Great! I'll certainly take a look when it's ready.
Meanwhile, I'll probably continue some research, including following the links you provided, and see if I can figure out a decent proposal for a common interface. Maybe you won't like my ideas, and maybe there's already a great standard out there, but if the timing is right then maybe we can come up with a powerful interface.
For me, I want to have it in a room without computers; given the cost of WiFi (unless I do that via a USB host on some awful networking stack), that means I need to use mass storage. But I will totally support streaming when I can get the time to do it.
I sketched the 4 concept boards and made a wikipage:
http://teholabs.com/docs/openhifi:overview (http://teholabs.com/docs/openhifi:overview)
Given your business you clearly know a lot about this area. My concept is just a concept. The price of SDRAM is pretty low, <4 dollars for 64 MB I think it was. The FLAC decoder *might* fit in 96 KB, but it would be tight (at least the reference decoder; the memory requirement as I recall was because of the block size, but I haven't read the FLAC spec in months). Extra memory could be used to buffer files. The max throughput over USB FS I have gotten with Chan's FatFS is about 750 KB/s, so all bitrates for this system will need to be less than that.
[quote author="brian"]Price of SDRAM is pretty low <4 dollars for 64 MB I think it was. The FLAC decoder *might* fit in 96 KB but it would be tight (at least the reference decoder, the memory requirement as I recall was because of the block size, but I haven't read the FLAC spec in months), extra memory could be used to buffer files. The max throughput over USB FS I have gotten with Chan's FatFS is about 750 KB/s so all bitrates for this system will need to be less than that.[/quote]
Embedded systems usually segregate code memory from data RAM. A FLAC decoder could easily require 96 KB of code memory, but I don't really know what its RAM footprint would be. I can say one thing for sure: FLAC was designed with embedded systems in mind. I cannot imagine that it would require 64 MB of RAM. I am not familiar with the ARM memory setup; I'm sure it varies from chip to chip. An example PIC has 128 KB of Flash code memory and 4 KB of internal SRAM. That's probably too small for FLAC, but an ARM chip probably has more memory unless it is designed purely for external memory. Keep in mind that FLAC probably only needs kilobytes of memory, not megabytes.
A PIC32 might do it; the big brother of them all has 128 KB of RAM, but I think there are ARMs with more.
I'm a follower of the diyAudio forum too, and I like HiFi a lot, so I will follow this topic closely. But as a simple student I'm a bit behind all your knowledge; I can understand everything, but I can't help with, for example, the design of a low-noise DAC, or even code a CODEC reader.
[quote author="rsdio"]
Embedded systems usually segregate code memory from data RAM. A FLAC decoder could easily require 96 KB of code memory, but I don't really know what it's RAM footprint would be. [/quote]
It is a flat memory space; you can use the whole SDRAM just like internal SRAM, from my understanding of the EPI interface... It just shows up in the memory map at a different address. I know for sure you can execute out of it, though it is slower. Though I doubt a bootloader approach would be needed, given 256 KB of flash for code.
http://www.hydrogenaudio.org/forums/lof ... 68766.html (http://www.hydrogenaudio.org/forums/lofiversion/index.php/t68766.html)
If you want to support any block size in the standard, it looks like you need 2 MB of space to load a full block. That said, all the FLAC I have seen uses smaller block sizes, as mentioned in this thread.
As jcoalson says at the end, you could reduce the memory footprint to the frame buffer size, about 18 KB, if you rewrite the library.
The open questions are: do I want to spend the time rewriting libFLAC, and do I want to support the max block size?
[quote author="senso"]A PIC32 might do it, the big brother of all as 128Kb of RAM, but I think that there are ARM's with more.
I'm a follower of the diyAudio forum to, and I like HiFi a lot, so I will follow this topic closelly, but as a simple student I'm a bit behind all your knowledge, I can understand everything, but I cant help for example in the design of a low noise DAC or even code a CODEC reader.[/quote]
There actually is a portable player on PIC32 that does FLAC (I think), or maybe it was just OGG, using a PIC32 with 128 KB.
The largest SRAM I could find on a chip with a good I2S was 96KB (LM3S9B9x).
Is PIC32 programmed with regular JTAG?
[quote author="brian"]If you want to support any block in the standard it looks like you need 2MB of space to load a full block. That said all the FLAC I have seen is in the less block size as mentioned in this thread.
As jcoalson says at the end you could reduce the memory footprint to the frame buffer size if you rewrite the library to 18 KB.
The open questions are do I want to spend the time rewriting libFlac and do I want to support the max block size.[/quote]
The default block sizes for FLAC are 1.125 KB and 4 KB. I would be surprised if any commercial FLAC files had more than 4 KB blocks, because that would require the vendor to use the command-line version of flac with a very customized set of options. I suspect that most web sites just use a GUI front end for FLAC and never bother with the command line, much less creating a custom option set. The format itself does indeed support quite a lot of flexibility, including 8-channel surround, but I don't think you need to support the maximums.
It looks like you're already researching FLAC directly, so I'm hoping you'll find others that have modified libFLAC to compile for an embedded processor. As I mentioned earlier in this lengthy thread, I remember on the FLAC dev mailing list that someone contributed embedded optimizations, and they were probably for ARM. If not, then your easiest choice might be to use whatever embedded processor already has open-source optimizations in the standard FLAC distribution (unless this was all a pleasant dream that I had).
I did do a lot of research already; I mean I looked and read a lot to figure out how big the total SRAM footprint would be. I had STM32F103 development going and killed it because the connectivity line didn't have enough RAM to be sure I could fit the decoders in. Then I found the Luminary parts, and frankly they are a *LOT* easier to work with than the STM32 stuff, which pretty much had only CMSIS, and that makes me want to cry (I feel it is pretty obfuscated code).
There is an ARM ASM branch for OGG. I would guess if there is embedded anything other than a DSP code for FLAC it would be on ARM also.
Oh, back on the PLL jitter thing: there has to be a PLL used somewhere unless you want only integer divisions of the master clock. Supporting 44.1 and 48 kHz at once makes that hard (I think).
[quote author="brian"]Oh back on the PLL jitter thing, there has to be a PLL used somewhere unless you want an only integers of the master clock. 44.1 and 48 KHz support at once makes that hard (I think).[/quote]
Most audio boards seem to have two crystals.
Interesting... can you give an example? Do you know what frequencies? Most of the DACs I have seen, at least all the TI and Wolfson ones I looked at, seem to have only one clock input.
Everyone seems to use 24.576 MHz and 22.5792 MHz crystals. Those are power-of-two multiples of the desired sample rates, so a DPLL is not needed; simple clock dividers suffice.
However, I have seen designs with only a 24.576 MHz crystal, which were initially most accurate at 48 kHz multiples and less accurate at 44.1 kHz, but then some hardware upgrades improved things by fitting both crystals. By the way, this upgrade attached at the point where the 'decoder' and 'converter' boards connected.
There are designs that I looked at today which use an FPGA that has clock multipliers and clock dividers. When you multiply a clock to a higher frequency, you need a DPLL. This may explain why some rates are less accurate unless you have two crystals.
My question is how to switch between clocks, although I suppose some chips have multiple crystal I/O pins, so that might help.
If you have the ability to take a clock from a single pin, I can see how that would work well... particularly if these clocks are the master for the DAC and the processor/uC is a slave with its own third clock.
Then the uC could switch clocks though some buffer/CPLD arrangement.
In any case that is quite interesting. I have mostly seen 12.288 MHz for audio.
Ti has answered my question about I2S clocking:
"The FIFOs are dual port, "asynchronous" between the system clock and the I2S SCLK domains. The transmit/receive serial shifting is handled in the I2S SCLK domain. "
http://e2e.ti.com/support/microcontroll ... spx#330116 (http://e2e.ti.com/support/microcontrollers/stellaris_arm_cortex-m3_microcontroller/f/471/p/94384/330116.aspx#330116)
So that looks good for ultra low jitter support, from the DAC board.
So, you just need a way to generate SCLK outside the ARM, and that would be the master for both DAC and ARM.
An idea that I just presented on diyAudio was to make the interconnect have the option of clock loopback, so that either the decoder or converter board could supply master clock. If the converter board is master, then it ignores the clock from the decoder board and sends its local clock back to the decoder. If the converter board does not have a clock, then it can connect the incoming decoder clock via a short trace to the return clock. The decoder board might have a simple clock based on the system clock, and that would be good enough unless the converter board has a better clock. Basically, plugging in a converter board would effectively switch the clock between boards without needing a jumper, but the problem with this idea is that the clock still traverses the interconnect.
Others have been talking about LVDS for I2S interconnects, and still others mention IsoLoop GMR devices for isolation. I'm wondering if there is a way to standardize on an interconnect pinout that makes these options possible without requiring them. LVDS could be implemented by just grounding the negative pin on single-ended transmitters or ignoring the negative pin on receivers, but if both transmitter and receiver are differential then the signals would have extra protection from noise or jitter.
[quote author="rsdio"]So, you just need a way to generate SCLK outside the ARM, and that would be the master for both DAC and ARM.[/quote]
For the implementation I am considering, it is more like: the SCLK of the I2S can be generated by the ARM PLL or from anywhere else, and then the ARM is the slave.
Both the ARM and DAC I wish to use are single ended, for what it is worth.
As far as loopback/detect: it seems like just having an extra pin from the DAC board is the easiest solution to detect who should give the clock. If it is high, then the ARM should be a slave.
I don't mind interoperability with other designs but not if it increases cost.
[quote author="brian"]For the implementation I am considering more like: SCLK of the I2S can be generated by the ARM PLL or from anywhere else and then the ARM is the slave.[/quote]Any design considered should certainly support the simplest case of having the ARM generate the master clock. In all likelihood, this could be good enough for most people, and might even be better than most SPDIF-clocked DACs. My goal is to come up with a design that supports the cheapest and simplest options without locking the design out of any better options that may come along.
[quote author="brian"]Both the ARM and DAC I wish to use are single ended for what is is worth.[/quote]
What I am suggesting is that high-end designs would add LVDS or some other kind of differential transceiver chips to improve isolation and signal quality between the decoder and converter boards. I am not aware of any DAC chips with differential clock inputs, because they're designed to take signals which are local to the board. Whenever you send signals between separate boards, though, you want to consider whether certain techniques can improve the signal quality, especially if jitter or clock edge ringing could pose a problem. In other words, a high-end DAC board might add isolation between the interconnect and the DAC chip, and later there might be improved decoder (ARM) boards which do more signal conditioning between the ARM and the interconnect.
My idea blends a couple of common things I've seen. On the one hand, it seems quite standard for single-ended ribbon connectors to place a ground line between every signal. On the other hand, differential signals would pair the + and - versions of each signal. My thought is to combine both into a common pinout. The connector could treat the - pin as allowing either ground or the - signal, while the + pin always carries the non-inverted signal. This whole concept is potentially unneeded, but since I've already been reading on diyAudio where people suggest LVDS for I2S between boards, it sure seems like it's worth designing the option for LVDS in a way that doesn't require boards to be changed later if LVDS becomes demonstrably beneficial. Obviously, the standard wouldn't work in differential mode unless both sides are using LVDS, but I think it's possible to make the connection compatible between differential and single-ended, thus allowing more experimentation and more compatibility between DIY boards.
[quote author="brian"]As far as loopback/detect. It seems like just having a extra pin from the DAC board is the easiest solution to detect who should give the clock. If it is high then the ARM should be a slave.[/quote]
That's a decent suggestion, but I think it costs more than what I'm suggesting. With your idea, the voltage on that pin would need to drive a 2-to-1 mux chip to select between two clock sources, and you would still need another pin for the DAC board to send its clock to the ARM board as a potential master clock selection. So, your idea requires 3 pins (ARM clock to DAC board, DAC clock to ARM board, and the select voltage) plus a mux chip (unless the ARM has multiple clock input pins and can internally mux them).
My idea only requires 2 pins, the same ARM clock and DAC clock. Basically, the DAC board would have the equivalent of a jumper. The ARM should always slave to the clock pin coming from the DAC board. If a trace on the DAC board connects the two clock pins, then the ARM clock will technically slave to the very same clock that it is generating. But if the DAC board does not short the pins, then the DAC board must generate a clock (either ignoring the ARM clock or using a PLL to sync) to send to the ARM.
I hope this makes some sense. I could draw a diagram if it will help.
[quote author="brian"]I don't mind interoperability with other designs but not if it increases cost.[/quote]
Agreed. I am always seeking the lowest part count, and lowest cost parts. But I have also learned that a reasonable amount of advance planning and design work can allow you to reuse initial, cheaper designs that interoperate with more advanced designs as you learn more and make improvements. So far, I don't think any of my suggestions require additional expense in the least. About the only potential cost increase I'm suggesting is possibly a greater pin count on the decoder (ARM) to converter (DAC) interconnect, but if you start with the typical "ground pin between each signal" convention that many people use for ribbons, then my suggestions don't even require more pins.
Here's a proposal for an interconnect:
01 GND
02 DACLK+
03 DACLK-
04 MCLK+
05 MCLK-
06 BCLK+
07 BCLK-
08 WCLK+
09 WCLK-
10 DAT0+
11 DAT0-
12 DAT1+
13 DAT1-
14 DAT2+
15 DAT2-
16 GND
DACLK is either (a) generated on the DAC board, or (b) cheaper DAC boards can just loop MCLK back to DACLK with short traces (this is partially inspired by the JTAG standard, which is flexible about where its master TCLK comes from)
MCLK is generated on the ARM board, but may not be used by all DAC configurations.
BCLK is the bit clock
WCLK is the word clock
DAT0 is the output data from ARM to DAC
DAT1 is optional input data from DAC to ARM, for SPI DAC chips which need bidirectional communication for GPIO or other uses
DAT2 is a spare output data in case it's useful to have a digital controlled potentiometer on the DAC board that is separate from what can be done with GPIO on the DAC chip itself
Thus, you have a 16-pin ribbon, or even a 14-pin if you skip the DAT2 signals. Since DAT2 is at the highest-numbered pins, systems would still be compatible.
All pins are considered outputs from the ARM board, except for DACLK and DAT1, which are inputs to the ARM board.
All signals are single-ended and differential compatible. Single-ended boards just use the + pin of the signal, but ground the - pin for output signals while leaving the - pin floating for input signals. Differential boards will use transceiver chips to drive both + and - pins for output signals, and also to read both + and - pins for input signals. Single-ended boards would have no added costs, because no transceiver chips are needed, and the 'extra' - pins just end up being nice little ground lines between each digital line. Differential boards may or may not perform better with regard to jitter and signal edges, but there is certainly no added expense compared to other LVDS designs for high-quality I2S interconnects.
P.S.
I was tempted to put 4 sets of output data lines to handle 8 channels of audio for 7.1 surround, but that would require a 20-pin ribbon that might be too much trouble. That would keep the total pin count for surround to a minimum by sharing clock lines as much as possible. On second thought, now that I look at this in more detail, maybe it makes more sense for surround systems to use four separate connectors rather than combining all channels on one connector. With the flexible master clock options, it should be easy enough for surround systems to just use multiple 14-pin or 16-pin connectors, scaling nicely from stereo to quad to 5.1 to 7.1, and only really affecting the decoder board (which will probably need more processing power for more channels anyway). The nice thing about compatible interconnects is that you could build a 7.1 system with a single master clock by just plugging in 4 identical, cheap stereo DAC boards.
Let me cover a few bits here:
First, I wasn't clear enough on the single-pin select: the I2S is a 4-wire interface, one wire of which is the master clock. If the 5th pin is high, MCLK is an input; if the 5th pin is low, it is an output. It is one extra pin.
To be clear the bus I was proposing is:
[Master CLK; Tx CLK ; DATA; WORD Select; CLK Select; GND]
The only place you might need mux-like things is if you want multiple clocks on a DAC board.
The current design concept for the processing board requires only 4 ICs and some passives.
On LVDS, here is my question: why? Differential signals have two advantages: noise immunity and speed. We are using 3.3 V LVCMOS the noise margin is large and the speed will be more than fast enough I think, so why make it LVDS? For an analog signal I would fully support a differential output.
Do you know of any chips that use LVDS for I2S? I don't at this time.
I get your loopback idea now...
The CLK traversing the interface isn't any issue that I can think of. You can't get jitter (phase noise) from that; other types of noise, slightly. We aren't talking about an RF system here though; this interconnect has huge noise immunity already...
However, the loopback idea won't work on any ARM I know of because of how the hardware is designed. MCLK is either an input or an output; it isn't 2 pins. And there is no synchronous way to generate the MCLK elsewhere and always have it as a slave; it needs to 'know' to switch modes.
Let me add, though, that I don't mind adding pins to the ribbon. MCLK GND TCLK GND DATA GND WS GND CLK Select GND would be fine also. But to add LVDS to any ARM I can think of would add a chip. If you want the option of replacing the processing board with something else and just want all the ribbons for the 'standard DAC' boards to be the same, that's fine; I don't mind adding GNDs. What I don't get is why LVDS would be needed at all...
Simplified Proposed Processing/DAC I2S connection:
PIN Function (alt function)
01 MCLK (MCLK+)
02 GND (MCLK-)
03 BCLK (BCLK+)
04 GND (BCLK-)
05 WCLK (WCLK+)
06 GND (WCLK-)
07 DATA (DAT0+)
08 GND (DAT0-)
09 Master/Slave CLK Select
10 GND (Free)
10-pin IDC cables are very common/cheap
MCLK is an output from the processing board if CLK select is low, an input if high.
There will be another header on the processing board that has things like SPI on it. If the board needs SPI control, the software will need customization anyway, so it isn't a general-purpose interconnect and therefore I think it should be separate.
[quote author="brian"]Simplified Proposed Processing/DAC I2S connection[/quote]
Perfect!
(personally, I might change the order of the pins, but that's a non-technical aesthetic)
[quote author="brian"]On LVDS, here is my question: why? Differential signals have two advantages: noise immunity and speed. We are using 3.3 V LVCMOS the noise margin is large and the speed will be more than fast enough I think, so why make it LVDS? For an analog signal I would fully support a differential output.[/quote]Frankly, I only thought about LVDS because it was depicted on diyAudio as a good solution for I2S to avoid jitter. I don't know yet whether it would offer an improvement or if it would be necessary, but your latest interconnect would seem to support it so why not keep it on the back-burner as a potential future enhancement? I certainly agree that there is no need to start out with it, but it's nice to be prepared in case there is a need. Apparently, ground bounce and other issues can cause problems with the edges of the clock, and thus affect timing (i.e. causing jitter), where LVDS avoids the ground bounce because each signal has a private reference. On the other hand, some of the diyAudio folks seem to have solved this by tuning their traces via series resistance. Having all options available without redesigning the interconnect is my goal.
[quote author="brian"]Do you know of any chips that use LVDS for I2S? I don't at this time.[/quote]
No, this is purely a design consideration for signalling between boards, so it is separate from the I2S interface on individual chips. Generally, I've read that when analog signals need to be moved between boards, it's often better to convert to digital before sending the signal to another board, even if the other board is just going to convert directly back to analog. I2S is already digital, so it doesn't make much sense to use anything fancier than straight digital, except for the one fact that the timing of the clocks is critical for the DAC. For pure data transfer, single-ended should be enough, but for precise clock edges and timing, I can see how LVDS might help. By the way, now that I think about it some more, the differential signalling might not be needed for the data lines; it may really only help for the clock lines. But, since putting a ground trace between signals is a good idea anyway, it doesn't hurt to be prepared for differential as an option on every signal.
[quote author="brian"]However, the loopback idea won't work on any ARM I know of because of how hardware is designed. MCLK is either an input or an output it isn't 2 pins. And there is no synchronous way to generate the MCLK elsewhere and always have it as a slave. It needs to 'know' to switch modes.[/quote]
You're absolutely right. That's what I get for jumping ahead with a proposal without looking into all of the details. In fact, MCLK probably works this way for most CPUs. About the only exception I know of is the TMS320 McBSP, which is a 6-pin SPI superset with individual transmit clock and receive clock. I happen to be designing with the TMS320VC5506 on my current project, so perhaps I was suffering from tunnel vision because what I proposed originally would be easy with the McBSP or McASP.
On that note, though, I think it might be good to look over the data sheets for the popular DAC chips in use in the DIY community, in case any of them require the return data line. I know that serial ADCs require two-way SPI traffic, because you have to set modes and read samples. A DAC can usually get by with one-way traffic because you can set modes and write samples without ever reading anything back. But it wouldn't hurt to do a quick scan of what's out there to see whether one or more common DAC chips somehow requires both input and output data. The only problem is that adding the return data would bump the 10-pin to 14-pin, because I don't think 12-pin is very common.
[quote author="brian"]Let me add though I don't mind adding pins to the ribbon. MCLK GND TCLK GND DATA GND WS GND CLK Select GND would be fine also. But to add LVDS to any ARM I can think of would add a chip. If you want an option of replacing the processing board with something else and just want all the ribbons for the 'standard DAC' boards to be the same that's fine I don't mind adding GNDs. What I don't get is why LVDS would be needed at all...[/quote]
Yes, LVDS would require an extra chip. I consider it optional. As I mentioned above, I saw some comments on diyAudio about how LVDS would be ideal. It seems at least reasonable to assume that it could improve jitter by removing ground bounce or other inter-board signalling problems. By the way, the diyAudio forum also seems to speak highly of the NVE Corp. IsoLoop GMR isolation chips. Those are also possibly overkill, but adding them would not require any revision of the pinout, so there's no need to design around it. Comparing the two, LVDS would require an extra chip on both boards, but the GMR isolation would probably work if it were just on one board. Also, I'm not really clear whether both LVDS and GMR would be useful, or if it's a one-or-the-other situation.
[quote author="rsdio"][quote author="brian"]Simplified Proposed Processing/DAC I2S connection[/quote]
Perfect!
(personally, I might change the order of the pins, but that's a non-technical aesthetic)[/quote]
I think details like the pin order might be changed based on layout considerations, although if I want to make this truly modular it might be better to forgo that kind of consideration in favor of some logical arrangement. Beyond that I see no reason for any particular order.
On the LVDS stuff, the ground bounce is interesting, but if you do your grounding well it shouldn't happen. You would have to inject quite a large current to move the GND reference... it is possible, but in addition that current injection into GND would have to vary somewhere else to create jitter. Again, it is possible but seems improbable.
I have 2 projects in front of this one to try to get some traffic for teho Labs, but hopefully there will be something physical in a few months.
User interface specification:
I am leaning toward the idea of using UART transport for the interface between the display/control board and the processing board.
The main question is whether the interface should be packet based or escape-sequence based. A plain-text interface would make control from a serial terminal easier. A packet-based, OP-CODE-like command language would be more compact.
The choices are really SPI/I2C/UART; the freedom from baud-rate matching in SPI is its most appealing feature, but SPI bandwidth isn't really needed here and not everything has an SPI. I2C is probably the least likely as it is the least supported (? <- just my feeling, may be off) by common controllers.
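To make the packet option concrete, here is a minimal sketch of what a framed command could look like. The opcodes and layout are purely hypothetical, not a proposed final spec:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes for the display <-> processing UART link */
enum {
    OP_PLAY        = 0x01,
    OP_STOP        = 0x02,
    OP_TRACK_TITLE = 0x10,   /* payload: text for the display */
};

/* Frame layout: [opcode][len][payload...][checksum], where the
   checksum is the 8-bit sum of everything before it, so a dropped
   or corrupted byte is usually caught on the other side. */
size_t packet_build(uint8_t *out, uint8_t op,
                    const uint8_t *payload, uint8_t len)
{
    uint8_t sum = (uint8_t)(op + len);
    out[0] = op;
    out[1] = len;
    for (uint8_t i = 0; i < len; i++) {
        out[2 + i] = payload[i];
        sum = (uint8_t)(sum + payload[i]);
    }
    out[2 + len] = sum;
    return (size_t)len + 3;
}
```

An escape-sequence (plain text) variant would replace this with printable commands like "PLAY\n", which a serial terminal could type directly; the packet form wins on compactness and parse simplicity.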
There is also a specification for track information that needs definition. Meta tags support a lot more information than I think is practical to have in the database. The question is what fields are important to preserve?
Performer (Artist/Band/Orchestra/Etc)
Work Title (Album)
Track Title (Song Title/Movement)
These would seem to be the minimum to store. What else? Release date? These fields should be limited to what people would conceivably most often want to sort/search by. Perhaps label is relevant for Classical people who might just recall it was a Naxos release of Ives' whatever.
I would like to make the specification pretty flexible but still very compact. I will have to define data structures pretty early on. My feeling is some sort of linear array with multiple linked lists of indexes is most likely. It seems like it could get quite messy and probably requires more thought than I have given it. Other than the decode stuff this is certainly the most critical software bit. Suggestions welcome.
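As a starting point for discussion, here is one way to realize the "linear array with multiple linked lists of indexes" idea in C. Field sizes and names are guesses, not a committed format:

```c
#include <stdint.h>

#define IDX_NONE 0xFFFFu   /* end-of-chain marker */

/* One record per track in a flat array. The link fields hold array
   indexes rather than pointers, so the whole table can be built in
   SDRAM, dumped to disk, and reloaded without pointer fix-ups. */
typedef struct {
    char     performer[64];   /* Artist / Band / Orchestra */
    char     work[64];        /* Album title */
    char     title[64];       /* Track title / movement */
    uint32_t file_start;      /* locate the file on the FAT volume */
    uint16_t next_by_artist;  /* next record in performer sort order */
    uint16_t next_by_work;    /* next record in album sort order */
} track_rec_t;

/* Walking next_by_artist from head_by_artist visits the library in
   performer order without re-sorting at browse time. */
typedef struct {
    uint16_t count;
    uint16_t head_by_artist;
    uint16_t head_by_work;
} library_t;
```

Each extra sort/search key (release date, label) costs one uint16_t per record plus one head index, which keeps "flexible but compact" honest.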
Hmm, doesn't a high-level, modular display interface add to the cost by requiring more chips? It seems like you would need a second uC on the display board to interpret your custom commands. Wouldn't it be cheaper to just directly connect a raw display to the main board via SPI and have the firmware talk directly to the display? Granted, hard-coding the display like this would mean that DIY folks couldn't easily swap out a different display, but with open source firmware they could easily make the needed changes.
The other half of the question is user input - and I assume that could be implemented via direct I/O pins on the main board and a few buttons.
P.S. I realize it might seem silly for me to suggest an elaborate and expandable interconnect for the DAC board and then suggest a very hard-coded and specific interconnect for the display and input buttons, but I can't think of a universal interface that doesn't require extra parts.
Yes it does add to the cost, but this is a cost well worth it. Some people will want a touch screen; some will be okay with a simple LCD display. It may be possible to support something rudimentary on the main board like a HD44780 display and rotary encoder for the extremely cost-sensitive folks. A secondary micro to handle these functions need not even use a crystal or lots of bypass caps. A 2 dollar attiny2313 would probably work for many display choices. Nothing will prevent people from using the main board to support displays, but I think the modular system should define a separate interface so that some people can just buy a board knowing it will be a black box and not have to code for ARM stuff at all or wade through a lot of code; they can just use a PIC/AVR or whatever they know/like. The reference amplifier board is going to cost a lot more :-)
So in summary my goal is to define a modular system but nothing should prevent people from making it less modular by writing more software.
[quote author="brian"]I think details like the pin order might be changed based on some layout consideration, although if I want to make this truly modular it might be better to forgo that kind of consideration in terms of some sort of logical arrangement but I see no reason for any order.[/quote]I have a habit of making pin 1 be GND for all connectors. It's nice to have the pin 1 arrow pointing to a known signal, too. But, I see that you placed GND on pin 10, so that's just as good.
I have the urge (call it OCD) to place MCLK next to Master/Slave CLK Select, since one controls the other. Apart from that, the order would be the same.
01 GND (single-ended reference)
02 Master/Slave CLK Select
03 MCLK+
04 MCLK-
05 BCLK+
06 BCLK-
07 WCLK+
08 WCLK-
09 DATA+
10 DATA-
Note that if the converter board is single-ended, then all the (-) pins should be left floating, to avoid stressing an LVDS driver on the decoder board. Instead, the reference GND would be used for all signals. Conversely, if the decoder board is single-ended, then all the (-) pins should be grounded, to provide proper referencing for an LVDS driver on the converter board. Connections for differential mode should be obvious.
I may have them GND through a cuttable/solder jumper on a single-ended board, or a physical jumper if that is a problem.
Why would having the neg terminals GND "stress" a differential input? (I agree that in the input diff pair +/- x biases the pair differently than +2x/0) You said use the single GND for ref for all? I am afraid I don't follow how that would work if it is assuming LVDS on the decoder and sees a floating pin on each neg input.
I think you can get by without any jumpers or 'bowtie' pads if you're willing to design a board to be just single-ended or just differential. I guess there might be an advantage to designing the board for differential but make it work if you leave off the differential driver/receiver, in which case the same PCB could be reused for both variations and be configured by jumpers, but I was thinking that folks interested in differential would have vastly different ideas for the other parts of the circuit too, so there might not be any point to having jumpers. In other words, the converter boards with really high-end DAC chips would always be designed for differential, but cheaper converter boards would always be designed single-ended to reduce parts counts and cost. Make sense? I realize I may be missing something, so feel free to point out anything.
Stress is like this: If a differential driver is sending a '0', then '+' will be low and '-' will be high (3.3V?). But if a single-ended receiver has a trace connecting ground to every '-' pin, then the differential driver will be attempting to push 3.3V onto a grounded trace (although the ribbon will be in between with some amount of resistance). Therefore, the only solution seems to be that single-ended receivers should leave the '-' pin unconnected (floating). But the single-ended receiver still needs some reference for the '+' pin, so that has to be the common GND.
Example:
LVDS decoder -> single-ended converter; all pins are connected on the decoder, but the converter leaves BCLK- WCLK- and DATA- floating.
Single-ended decoder -> LVDS converter; BCLK- WCLK- and DATA- are grounded because they're transmitting, and all pins are connected on the converter, too.
The tricky part is MCLK-, because that depends upon whether the converter is master or slave. If the converter is master, then MCLK- will always be connected, either to the LVDS driver for differential or GND for single-ended. If the converter is slave, then MCLK- should float on single-ended boards or connect to the LVDS receiver chip if present. So far, it's easy on the converter board. The real problem is the decoder board, which has to work with both master and slave converter boards, and both single-ended and differential converter boards. Now that I think about it, this might require a jumper on the decoder board, but just for MCLK- (not any of the other pins which have a dedicated direction). You may already have thought of this.
I really should check into LVDS and other differential chip solutions. I am assuming that a differential input will accept GND on its '-' input and still properly track the single-ended signal on the '+' input, but I am also sure there are some differential chips out there that need an actual negative or positive voltage on '-' (i.e., not zero, because that would be inconclusive). It's certainly possible to make such a single-ended-compatible differential receiver using discrete op-amp circuits, but that might be ridiculously complicated, so I am hoping there is a compatible driver chip. I don't think that EIA-485 (née RS-485) will work, because I seem to recall that it always outputs either positive or negative voltage, never ground, and further I recall that it might not receive '+' vs. GND very well (not a strong enough indication).
By the way, I realize that the voltage should be part of the standard, too, otherwise it wouldn't be very compatible across multiple boards. What's the most common DAC digital I/O voltage these days? 3.3V? Are there still any significant 5V DAC chips? Can the 5V DACs just be handled with extra chips to interface from a 3.3V interconnect to 5V on the rest of the board? Are there many DAC chips with less than 3.3V logic I/O? Most processor chips seem to have 3.3V support for I/O, so hopefully that won't pose a problem. Even the processors that run on 1.2V internally use 3.3V for I/O. My inclination would be to make the most common voltage the default, so that the lowest parts count will be the most common. Boards with chips that work on other voltages would have to convert the voltage at the connector.
I don't think you understood what I meant about the GNDs etc., but I think we should put off any discussion of these tiny details until I actually have time to lay stuff out.
I am much more interested in my query about library information, as that part of the system is very unconstrained by cost etc.
I will say this though: LVDS and LVCMOS/LVTTL signaling aren't really compatible if you read the specs; both boards need LVDS or single-ended unless you want a complex mess. Supporting both, even the option of both on every board, is likely to be burdensome. LVDS support would cost 3 dollars per board (6 for the system) roughly.
[quote author="brian"]I will say this though: LVDS and LVCMOS/LVTTL signaling aren't really compatible if you read the specs; both boards need LVDS or single-ended unless you want a complex mess. Supporting both, even the option of both on every board, is likely to be burdensome. LVDS support would cost 3 dollars per board (6 for the system) roughly.[/quote]
I only threw in LVDS as a specific because it was mentioned on diyAudio, and it seems like a convenient shorthand. I am actually thinking in terms of differential signalling versus single-ended. As I mentioned, at some point we'll have to talk about specific chips and see if they're compatible with positive-only single-ended signals. Or, more precisely, since you're probably not terribly interested in this, I'll have to investigate a reasonable solution and document whatever I find here. At a high level, the connector pin-out at least appears to be compatible with differential, and that's enough to satisfy me for now.
Thanks for the feedback.
As I learn more about ARM, I might be able to help out with your other tasks, but for now I hope that others here in the Dangerous Prototypes community will jump in.
I appreciate your input and I will do my best to support as many customizations as possible. Differential signaling may be of use if people need very long interconnects between the decoder and DAC. I will look into it more at a later date.
Another assumption I am making is that WCLK and BCLK can be derived from MCLK no matter who is Master. Is that true? If not, then the direction of the other clock pins needs to reverse according to master/slave select as well.
WCLK and BCLK are (I think) always created by the sender, be it slave or master. They are simply synced to the master clock.
In other news, the RockBox decoder for FLAC looks pretty portable. Although it only supports a subset of the full FLAC (standard reference encoder blocksizes etc).
The reference decoder API for libFLAC is woefully complex. I will probably do some more work trying to port libFLAC before I commit because I like BSD a lot more than LGPL but so far the smallest size I have gotten for a FLAC->WAV converter is 600 kB!!! vs 16 kB for the FFMPEG based RockBox version.
I am sure libFLAC can be made smaller, but there was also a comment on the RockBox mailing list that after they switched to the FFMPEG-based decoder things were much faster. FFMPEG is also more actively developed and is backported to RockBox on occasion, whereas there is essentially no development now on FLAC/libFLAC (as far as I know).
I am actually using an ancient branch of RockBox for study because some of the code contributed by the original author of the FLAC decoder was hacked in such a way to make it much more specific to RockBox whereas the original decoder was more portable.
It isn't that large a code base; I will just go through it function by function and update it if needed.
I have never tried porting FLAC, but are you leaving out libFLAC++? Seems like you might get an oversized image if you include both libFLAC and libFLAC++. libFLAC++ is really just a wrapper on top of libFLAC ... you don't need libFLAC++ (you may already know this, but others have certainly been confused). Have you looked at the assembly output to see where the big chunks of code are coming from? What about debugging printf() statements? Those seem like unnecessary baggage, and I assume there is some kind of #define to remove them. Admittedly, I'm doing a lot of hand-waving here without having looked into the source in years, especially not with a mind for embedded targets, but maybe these comments will help you.
Active development is kinda meaningless when the main FLAC sources are from the original author and represent the official specification. Why would anything need to change? To use a Steve Jobs-ism: It Just Works. The fact that FFMPEG is actively under development just tells me that they're still catching up to a very stationary target. Personally, I prefer projects that are not actively under development, because I do not like downloading updates, changing code, or re-compiling repeatedly.
No, I didn't use the C++ wrapper. You can find lots of comments on how big libFLAC is if you dig on google (mostly mail list stuff). There are functions that handle every part of the spec, of course, 90% of which you don't need. The makefile environment is *huge* and there is essentially no documentation for #defines to remove code.
The closest thing you get is "you can prune it by editing the automake file"... well, yes, of course, if you knew what was safe to cut, but you don't. Moreover: automake! I never use that for bare-metal stuff... It was certainly written with OS media players in mind, not embedded (the codec was for everything, but I don't feel the library was targeted at embedded at all, though as I note the docs do mention you could prune the .am files).
You are correct that there is nothing that needs to be improved to make libFLAC work, and for all PC-like targets any optimization would be meaningless. There are ASM optimizations for x86 in it but none for ARM. On the other hand FFMPEG and RockBox both have ARM optimizations.
Anyway....
The good news is that the RockBox decoder is pretty good. It supports up to 24-bit in theory, though I have only tested with 16-bit 44.1 kHz stereo files...
The decoder as written requires 32-bit * frameLength * channels + max(FrameSize) + a huge file buffer. This does not fit in the 96 kB of SRAM on the target micro; however, with some minor changes the huge file buffer can be made as small as you want, with a hit to performance.
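For a sense of scale, assuming the reference encoder's default block size of 4096 samples (the streamable subset permits up to 4608 below 48 kHz), the 32-bit intermediate buffer alone for one stereo frame works out to:

```c
#include <stdint.h>

/* Decoded-sample workspace for one FLAC frame: 32-bit intermediates
   for every sample of every channel in the block. */
enum {
    BLOCK_SAMPLES = 4096,  /* reference encoder default blocksize */
    CHANNELS      = 2,
    DECODE_BUF_BYTES = BLOCK_SAMPLES * CHANNELS * (int)sizeof(int32_t),
};
/* 4096 * 2 * 4 = 32768 bytes -- a third of the 96 kB SRAM is gone
   before counting the compressed-frame buffer, the file buffer, the
   FAT code and stacks, which is why the file buffer must shrink. */
```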
The upshot to this is I have working ARM FLAC code done already. It reads a .flac and spits out a .wav both over USB mass storage. I believe the main bottleneck is write performance and file system performance in general which I will check again later...
Over USB FS (12 Mbit) I recall the read throughput was about 720 kB/s. Writing to flash tends to be slower even if it isn't really saturating the card: because you are writing individual sectors that are smaller than the block size of the flash, it is slower. So it might only write at 400 kB/s or less... Add to that that you get basically 50% of the throughput because there is only 1 pair of differential data lines, and you start to see the issue....
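As a sanity check on that 720 kB/s figure: the USB 2.0 spec allows at most 19 maximum-size (64-byte) bulk packets per 1 ms full-speed frame, so the theoretical ceiling is:

```c
/* Full-speed bulk throughput ceiling (USB 2.0, 64-byte packets) */
enum {
    FS_PACKETS_PER_FRAME = 19,   /* max bulk packets per 1 ms frame */
    FS_PACKET_BYTES      = 64,
    FS_FRAMES_PER_SEC    = 1000,
    FS_BULK_MAX_BPS = FS_PACKETS_PER_FRAME * FS_PACKET_BYTES
                      * FS_FRAMES_PER_SEC,
};
/* = 1,216,000 bytes/s before protocol and filesystem overhead, so a
   measured 720 kB/s read is believable, and 16-bit 44.1 kHz stereo
   playback (176,400 bytes/s) fits comfortably under it. */
```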
Discarding the decoded data the task runs at about 2x realtime and with a very fast USB flash it goes at about 75-80% of real time for read+write back.
I haven't added the ARM optimizations as yet, but if it is limited by the file system as I think, this might not make much difference. Given that the WAV is larger and is the "write" part of the operation, the slowness is not surprising. I should probably try it with a USB hard disk.
Anyway, since this is pretty unoptimized and you don't actually need to do a filesystem write during decode, I see no reason why this CPU won't be more than enough to do at least 16-bit stereo FLAC, which is my most important target. This totally avoids having to pay for a $$$ coprocessor chip (VS1053 or otherwise).
While the decoder will fit in the SRAM in the CPU, it does not leave a ton for other purposes (probably less than 10 k for everything else when the file system is taken into account as well). So external memory might still be a good idea... My development board certainly will have the RAM. If it looks like the openHiFi can run without extra memory, I will lay out the board in such a way that those pins can be used if the SDRAM is not placed. (It is a *lot* of pins.)
[quote author="brian"]You can find lots of comments on how big libFLAC is if you dig on google (mostly mail list stuff). There are functions that handle every part of the spec of course 90% of them you don't need. The makefile environment is *huge* and there is essentially no documentation for #defines to remove code.[/quote]
./configure --help
[quote author="brian"]There are ASM optimizations for x86 in it but none for ARM. On the other hand FFMPEG and RockBox both have ARM optimizations.[/quote]
Eric Wong ported to ARM7TDMI (http://git.bogomips.org/cgit/flac-arm-1.1.3.git/) in March, 2009, although I haven't checked to see if that link is still valid. His must be the reference to ARM that I remember.
In April of 2010, Josh Coalson claimed that FLAC had already been built for ARM in many places, and that he built a version for nslu from the original sources with no modifications. You might want to try joining the FLAC Developer mailing list if you're running into problems.
[quote author="brian"]The decoder as written requires 32-bit * frameLength * channels + max(FrameSize) + a huge File buffer. This does not fit in the 96 kB of SRAM on the target micro however, some minor changes and huge File buffer can be made as small as you want with a hit to performance.[/quote]
Why do you need a huge file buffer? You should use the streaming protocol and convert one block at a time before passing it off to the DAC.
[quote author="brian"]Discarding the decoded data the task runs at about 2x realtime and with a very fast USB flash it goes at about 75-80% of real time for read+write back.[/quote]
That's probably a decent test for prototyping purposes, but it sure seems like the wrong approach for a FLAC player.
I will def look at that guy's ARM7TDMI port as that is close to Cortex M3...
[quote author="rsdio"][quote author="brian"]Discarding the decoded data the task runs at about 2x realtime and with a very fast USB flash it goes at about 75-80% of real time for read+write back.[/quote]
That's probably a decent test for prototyping purposes, but it sure seems like the wrong approach for a FLAC player.[/quote]
I thought that was rather self evident, that this was just a test...
A stream approach can also work, but all it saves you is the frame buffer, and it makes things much slower because you transfer each subframe, which is smaller. For flash-based storage that will be much slower.
The buffer size is only the size of a frame plus how big it will be once decoded. The smallest buffer would be the size of the smallest subframe plus its decoded size. Both libFLAC and FFMPEG/RockBox require substantial rewrites, I think, to go to just subframe buffers. Again, it will be much slower that way.
I am sure you read this at some point since I am pretty sure that is you writing:
http://www.mail-archive.com/flac-dev@xi ... 01047.html (http://www.mail-archive.com/flac-dev@xiph.org/msg01047.html)
But really Dave is quite right. They used libFLAC and abandoned it, that should tell you something.
Believe me if I can make things work nicely with libFLAC I will because of my personal preference for permissive licenses over copyleft for most things.
PS: (./configure --help) yes I already read that...
PPS: This is the sort of chip that is typically used in a portable player
http://www.austriamicrosystems.com/eng/ ... ers/AS3525 (http://www.austriamicrosystems.com/eng/Products/Mobile-Entertainment/Analog-Integrated-Microcontrollers/AS3525)
320 kB SRAM... look at how poor the DAC is though :-) Not only that, it seems to use S/PDIF internally. This is the chip in some of the newer Sansa players.
FLAC speed @ 80 MHz Cortex M3
Flash ~ r/w time 1:24 (write ~ 60s; read ~ 30s)
Decode time to .WAV/dump to flash ~ 2:44
File length 4:03 (16-bit stereo 44.1 kHz)
Nominally 33% of CLK to decode -> 26.6 MHz for realtime
With file read overhead 45% of clock
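Rederiving those figures from the measured times above (the mm:ss rounding accounts for the small gap from the 26.6 MHz quoted):

```c
/* Decode-budget arithmetic from the measured times */
enum {
    T_TOTAL_S = 164,  /* 2:44 FLAC -> WAV including flash write */
    T_IO_S    =  84,  /* 1:24 raw flash read/write alone        */
    T_TRACK_S = 243,  /* 4:03 of 16-bit 44.1 kHz stereo audio   */
    CPU_MHZ   =  80,
    DECODE_S  = T_TOTAL_S - T_IO_S,              /* ~80 s CPU-bound  */
    MIN_MHZ   = CPU_MHZ * DECODE_S / T_TRACK_S,  /* ~26 MHz realtime */
};
```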
http://www.rockbox.org/wiki/CodecPerfor ... RM7TDMI_41 (http://www.rockbox.org/wiki/CodecPerformanceComparison#Sansa_e200_40ARM7TDMI_41)
The second (Sourcery G++ Lite 2009q3-67) table suggests that ~ 13.5 MHz should be achievable with ASM code included.
The ASM code needs porting to Cortex-M3 (thumb2 only).
The performance listed above is already probably good enough actually... The RockBox people care about CPU overhead because of the impact on battery life. Here all that matters is enough time to service other IRQs.
[quote author="brian"]The ASM code needs porting to Cortex-M3 (thumb2 only).[/quote]I've only just started learning about ARM, but isn't THUMB2 a superset of THUMB? ... or is it that the ARM instructions are missing from Cortex-M3? It's a little confusing when some ARM chips support ARM, THUMB, and Jazelle...
Thumb is a simplified ARM instruction set; Thumb-2 is a superset of Thumb. The code appears to be written for ARMv4/ARMv5-based cores like the ARM9 (I know it is confusing how they are numbered).
It would have to work on the ARM7TDMI (the T stands for Thumb), but this core is obsolete, though still used a lot (all the .NET boards you see for enthusiasts are ARM7TDMI). Most of the errors that came up were due to certain kinds of loads not being allowed; I will have to double check whether it is calling instructions that really don't exist... There might be equivalents in Thumb-2.
You can read some about Thumb2 here:
http://infocenter.arm.com/help/index.js ... 02s01.html (http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0338g/ch01s02s01.html)
There are also a lot of conditional instructions used in the ASM code in the library, which aren't allowed if you are compiling to Thumb. I would have to use explicit conditions. Pretty much it will have to be rewritten, but the old version is a good guide on how to do so.
The functional if non-optimal code base I have now is a reference to debug against and test improvements, which I can then roll together.
I am now talking to the RockBox people also to see if I can make the hardware easier for someone over there to port it to this platform. I myself am not likely to do that amount of coding, as what I need is not a RockBox, but not having that OSS tied to closed-source hardware could be good for everyone. Though they are focused on portable audio, I suspect those willing to work hard to get FLAC on their X portable might be the type that would want an openHiFi also (if they don't already have some special-purpose computer for it).
Edit: This might be a better place to start to learn about the differences between thumb and regular ARM
http://www.arm.com/products/processors/ ... ctures.php (http://www.arm.com/products/processors/technologies/instruction-set-architectures.php)
I actually am learning these details myself now also... I haven't used anything but Cortex-Ms, which all use Thumb. The M3 is Thumb-2 only, meaning it can't run regular ARM instructions. This is why there will have to be a port to make it go.
The basic idea with Thumb/Thumb2 is to reduce code size so that ARM can kill off 8/16 bit computers. The full ARM instruction set is the most robust (mostly) but if you are mostly working with bytes it has more overhead, thus slower.
[quote author="brian"]I am now talking to the RockBox people also to see if I can make the hardware easier for someone over there to port it to this platform. I myself am not likely to do that amount of coding, as what I need is not a RockBox but not having that OSS tied to close source hardware could be good for everyone. Though they are focused on portable audio, I suspect those willing to work hard to get FLAC on their X portable might be the type that would want a openHiFi also (if they don't already have some special purpose computer for it).[/quote]Right; rather than port RockBox to the processor you first selected, why not just select another ARM chip that will run the RockBox FLAC code as is? Is there a big difference between M3 and A3, for example? I think we may have covered this on the forum before (sorry).
[quote author="brian"]I actually am learning these details myself now also... I haven't used anything but cortex Ms which all use Thumb. M3 is Thumb2 only meaning it can't run regular ARM instructions. This is why there will have to be a port to make it go.[/quote]
I did not realize there were THUMB2-only ARM chips that did not run regular ARM instructions, but your original message did imply that.
I have been investigating the iPhone lineage, which started on an ARM11 chip that runs ARMv6 code. They're now on ARMv7, and the build environment builds separate code images for v6 and v7. The impression that I was starting to get is that everything is backwards compatible, but that's obviously the wrong impression. Folks were already pointing out that the old ARMv6 iPhones support VFPv2, and the new ARMv7 iPhones support NEON, but not every ARMv6/v7 chip supports those extensions, so there are clearly groups of added features which are not present on every ARM chip. I just ended up with the mistaken impression that ARM always implements the ARM instruction set, although in hindsight I guess it makes sense that removing the original 32-bit ARM instructions could save some resources and lower the cost.
The basic idea with Thumb/Thumb-2 is to reduce code size so that ARM can kill off 8/16-bit computers. The full ARM instruction set is the most capable (mostly), but if you are mostly working with bytes it has more overhead and is thus slower.
Well, I don't know about killing off 8/16 computers!
Basically, it seems that ARM was designed as a more-or-less 'pure' 32-bit RISC processor without consideration for embedded applications. However, once ARM discovered that they were exploding in the embedded market, they realized that it would be smart to cater to the limitations of embedded systems. Breaking from the pure 32-bit model is beneficial when you have limited memory resources, especially if your bus is limited to 16-bit or even 8-bit. Basically, ARM started out more efficient than most general processors, and lately they've been tweaking it to be even more efficient for the embedded applications where it is being used more and more.
Is there a big difference between A and M: yes. Read the Wikipedia page on ARM at the very minimum.
NEON is like 3DNow or SSE1/2/3 were for x86.
As for getting a chip that just runs the RockBox ASM code: ARM9s would, but they mostly rely on external flash for booting. The ARM7TDMI probably would, but it is dead and probably not a great idea for new designs...
The other general issue is that very, very few chips that aren't BGAs have I2S on them.
As I said, it is already fast enough :-) I will port it when I have the time. Even if I wanted a full music player, I would think only AAC, MP3, FLAC, OGG, and WAV are really relevant. I guess some people have things in WMA also.
Thumb basically has 16-bit overhead. If you can get a 32-bit ALU at 16-bit cost, why wouldn't you use it? The Cortex M0 is targeted at the 8/16 market. Just look at the "kill your old micro, get a free M0" promotion that LPC is running right now...
Here is a press release stating exactly what I said: http://www.nxp.com/news/content/file_1642.html
$0.65 in volume... Yes, an ultra-tiny PIC/AVR with no SRAM and 6 pins is still cheaper in volume, but we are talking about LDO-type volume cost here... There is a point where it almost does not matter except at ultra-high volume.
Windows 8 will be released on ARM... The only real question is where the line between x86 and ARM will end up in complexity. There are perhaps 2 semiconductor process nodes left before CMOS is no longer worth shrinking. At that point Intel's advantage of manufacturing smaller first will be gone. ARM might become ubiquitous (It already is really), but x86 software legacy in business at the very least will maintain it for decades. It will be interesting to see what Intel is like as a company after 2020.
Let me add there would be a ton of options if I could do BGAs, but I find fine-pitch xQFP hard enough... I can't imagine doing a BGA.
E.g., the SAM9261 would be a great chip to use, but it's only available in BGA.
The Cortex M4 is pretty much perfect for this application I think, but the question is who will release them and when. NXP is first in line I think: the LPC4327 is quite nice, but it is still sampling and not in production (who knows what it costs...).
http://www.electronicsweekly.com/Articles/2011/02/24/50561/cortex-m4-is-a-big-number-for-arm-and-mcu-suppliers.htm
Very interesting. I would expect 6 months minimum before tools and channel availability for anything announced now. This sort of thing will probably have to wait for Rev 2. Code other than peripheral code would be comparable if these turn out to be awesome, since the M4, like the M3, is Thumb-2. It will cost more though... So if it can be done on the M3, that is still the lowest-cost game in town. ARM9 and M4 and all of the A/R parts will be more money for the same amount of stuff.
There are two companies with M4s detailed: Freescale and NXP... NXP has an M0+M4 combo while Freescale has just a single M4. TI has M4s in an A15 SoC, but nothing standalone. If TI releases libraries as nice as the M3's for an M4 implementation more like Freescale's, that would be amazingly nice to use for any audio application.
The problem is that outside of Keil and maybe IAR, tool support will be slow to come.
[quote author="brian"]The other general issue is that very very view chips have I2S on them that aren't BGAs.[/quote]There's the rub, then. I2S and non-BGA are the constraining factors, I'd agree.
[quote author="brian"]Thumb basically has 16-bit overhead. If you can get a 32-bit ALU at 16-bit cost, why wouldn't you use it? The Cortex M0 is targeted at the 8/16 market. Just look at the "kill your old micro, get a free M0" promotion that LPC is running right now...[/quote]
My interpretation is that a 32-bit CPU with a 16-bit or 8-bit memory interface is not very efficient because it takes at least a couple of memory cycles for every instruction. 32-bit instructions are great if you have 32-bit memory, but you're probably stuck with BGA for 32-bit memory. It's the external memory interface, not the ALU, that determines the efficiency of the instruction set. So, Thumb goes hand in hand with the other realities of embedded systems. If ARM had been able to predict way back in 1983 that they would take off in the embedded world (perhaps on the way to dominating the desktop), then they might have started with Thumb from the beginning.
[quote author="brian"]Windows 8 will be released on ARM... The only real question is where the line between x86 and ARM will end up in complexity. There are perhaps 2 semiconductor process nodes left before CMOS is no longer worth shrinking. At that point Intel's advantage of manufacturing smaller first will be gone. ARM might become ubiquitous (It already is really), but x86 software legacy in business at the very least will maintain it for decades. It will be interesting to see what Intel is like as a company after 2020.[/quote]
Don't start counting your "Windows 8 on ARM" eggs just yet. Windows is complete dog crap, and so are the programs that share that ecosystem. I don't want to go off on too much of a tangent from the topic of designing a hardware music player, but I want to share some of my experience. Way back in the early nineties, IBM had desktop PowerPC computers while I was cutting my teeth on the NeXTdimension (with Motorola 56000 DSP sharing the board). Around that time, I looked into a contracting opportunity to port Windows software like Word and Excel to the PowerPC. I decided to pass because of the nature of the contracting offer, but I kept an eye on the project. After approximately two years of effort, they finally gave up without releasing anything. Windows and its prominent programs are not written to be portable at all. I suppose some of the bad habits may have been broken since that time, especially since Microsoft was keen on running on PowerPC, but do you see Windows on anything but x86 and x86 clones? Right now I say Microsoft is just blowing a lot of hot air and making wishful statements. Do you remember how many times they promised that the next version of Windows would no longer be based on DOS? They just kept postponing that upgrade, and they'll do the same after they discover that they can't get Windows to run on ARM.
As a comparison, NeXTSTEP supported as many as 5 or 6 processor platforms over time, and a huge graphics program like PageMaker was ported from one processor to another in only 2 weeks. Not only was the porting task not abandoned in frustration, but the product shipped and made money. This shows how much the OS and the programming environment and community fosters portability. OSX inherited NeXTSTEP, and that's why Apple has been able to transition from PowerPC to Intel to ARM and actually ship products. In fact, until Snow Leopard, both PowerPC and Intel were supported. Theoretically, the ability to host 6 or 7 different processor types on the same OS is still there. Apple will ship OSX on ARM before Microsoft ships Windows on ARM. I guess Apple already is doing this, if you count iOS as similar, but even if they wanted to move their A4 chip to the desktop, they'd have a much easier time than Microsoft.
I think you should read about Thumb more. My understanding is it is the exact opposite of what you think. The instructions are 16-bit and the execution units and buses are 32-bit. By making the instructions smaller you compact the code, which gives it the ability to fit programs into an 8/16-bit kind of size... Most of the die is flash, after all...
I don't generally think it is a good practice to make blanket statements like "all programs that run on Windows are dog crap". I can't say I agree with much you said, but I will leave it at that. I don't think you would listen to anything I would say, and it would go even more off topic. I will say one thing though: Windows NT ran on Alpha, and Windows Server runs on IA-64.
[quote author="brian"]I think you should read about Thumb more. My understanding is it is the exact backwards of what you think. The instructions are 16-bit and the execution and buses are 32-bit. By making the instructions smaller you compact the code which gives it the ability to fit programs into a 8/16 bit kind of size... Most of the die size being flash and all...[/quote]You're talking about internal execution and buses, which are surely 32-bit. The EPI (External Peripheral Interface) can operate in 8-bit, 16-bit, or 32-bit mode. Looks like SDRAM is limited to 16-bit. Host Bus is limited to 8-bit/16-bit. Looks like the only thing 32-bit is the GPIO for interfacing to CPLD or FPGA. This is all from the Texas Instruments LM3S9B90 data sheet. Basically, GPIO defaults to something other than EPI, and if you want external memory you have to trade GPIO for EPI. The less you use for external memory, the more GPIO you have remaining for other uses.
But, you're right that Thumb is mostly about making the code smaller. This is even though the internal Flash memory seems to be 32-bit. I think here it's more of a loose correlation between limited pins, limited external bus widths, limited memory sizes, and smaller instruction widths. I don't think that's exactly backwards from what I said, but it's probably not as directly tied as I implied.
I thought we were talking about the Thumb/Cortex M in general.
If you read the manuals on M3:
http://infocenter.arm.com/help/topic/com.arm.doc.ddi0337e/Babgahha.html
You will see it is, as I said, a 32-bit bus. Luminary simply chose to support a narrower bus to save pins/reduce package size, I expect.
Thumb itself is just an ISA simplification over regular ARM code to keep code compact in flash. It probably simplifies the decode front end on the CPU also, hence the Thumb-2-only hardware designs in the M series.
As you suggested, I revisited the Wikipedia article on the ARM architecture. Under Thumb, it states:
[quote]In situations where the memory port or bus width is constrained to less than 32 bits, the shorter Thumb opcodes allow increased performance compared with 32-bit ARM code, as less program code may need to be loaded into the processor over the constrained memory bandwidth.[/quote]
That's basically all I was trying to say, but I emphasized it a bit too much.
[quote author="brian"]Luminary simply choose to support a narrower bus to save pins/reduce package size I expect.[/quote]My point is that even when the chip does have the pin count to make a 32-bit bus possible, practical embedded designs often benefit from reducing the bus width anyway, because it reduces the size of the board, reduces the chip count, and/or allows cheaper chips to be selected for external memory. In other words, a general rule of embedded design is that narrower buses have benefits, and since ARM is catering to the embedded market it makes sense to use Thumb. Although Thumb is primarily concerned with reducing the code size regardless of bus width, that still goes hand-in-hand with working efficiently on narrower buses.
[quote author="brian"]Thumb itself is just an ISA simplification over regular ARM code to keep code compact in flash. It probably simplifies the decode front end on the CPU also, hence the Thumb-2-only hardware designs in the M series.[/quote]
I disagree with this statement, if I understand what you wrote. ARM was originally designed after reading some of the early RISC papers. One general attribute of RISC is a rather wide instruction word so that every part of the control system has a dedicated bit in each opcode. In other words, the ideal RISC removes the instruction decoding process completely, by allocating instruction bits to directly control the execution unit. Thumb adds a layer, so it is the opposite of simplification. e.g., Thumb only allows some instructions to be conditional, not all of them, because there are no longer dedicated condition bits in the opcode. Only Thumb instructions need to be decoded, ARM instructions should not need to be decoded.
I suppose in one respect, you could call Thumb-only a simplification compared to an ARM+Thumb implementation, because the combination requires additional circuitry to allow the original ARM instructions to directly control the execution unit when running in ARM mode. In other words, if you need Thumb, but you don't need ARM, then Thumb-only is a simplification compared to ARM+Thumb. However, if you compare ARM-only to Thumb-only, then it seems that the only possibility is that ARM-only is simpler than Thumb-only.
Getting back to the topic and your challenges, I sure don't envy your task of finding a chip that has all of the necessary hardware features (I2S, non-BGA) plus has support for the existing FLAC code so that you don't have to port ARM code to Thumb-only.
I think I get most of your points. Thanks for making them more clear.
Thumb-2 isn't Thumb, just so you know. I think the peripheral bus data width for the M3 is 32-bit, based on the ARM documentation I referenced.
In any case, to the main point: USB Host + I2S + lots of SRAM + not BGA was the task. I looked and looked and looked before I started anything. I had an STM32-based version of this project going; the Connectivity line was the only line that had USB Host, its RAM capped at 48 KB I think, and there was no external memory bus that would map into the CPU address space. The Performance line would do that, but not the Connectivity line. The top of the Performance line (the XL series) had I2S and more RAM but no USB host!
Atmel had nothing with I2S except ARM9s, which were expensive and no longer in production except in BGA versions. Freescale had the i.MX233, which seemed perfect for this, until you read the docs carefully. The TQFN doesn't have the I2S on it, just the BGA!!!! Grrrr. (Plus the i.MX233, by the looks of it, is really designed to run an OS, not for bare-metal development.)
It was at this time I found the Luminary parts. I had no idea they existed because NXP (LPC) and STM32 ARM chips dominate the enthusiast market and the places I looked for development boards (Olimex etc.), with a smattering of AD and Atmel boards. I forget how I even found them. It might have been from looking at the board support in OpenOCD, actually, because I figured I would need that to do development.
I got an LM3S9B92 board and was amazed by how much simpler it was to write for than the STM32. It had Ethernet, USB host, I2S, and external memory if you couldn't do it in 96 KB. In short, it had everything I was looking for. For now I am using that board to do development; I think the lack of ASM optimization for FLAC is a bump in the road. It will just be more fun. All of the RockBox codecs are done as best they can be in C, and then if someone wants to write ASM for some chip it is added (ColdFire and ARM are the only two I have seen).
I expect M4 based media players to be released. This may mean there will be ports of other codecs to Thumb2 within the year...
[quote author="brian"]Thumb-2 isn't Thumb so you know. I think the peripheral bus datawidth for M3 is 32-bit based on the ARM documentations I referenced.[/quote]Thumb-2 is an extension of Thumb, not an unrelated instruction set, so I was just being lazy and typing fewer characters. As for bus width, it's great that the chips all have 32 data bit pins, but there will always be an advantage in embedded systems to have compatibility with 16-bit and 8-bit memory - fewer chips, cheaper chips, less board space, fewer placement costs, etc. My point all along is that 16-bit opcodes, while primarily reducing total code size, also make it more reasonable to deal with embedded limitations efficiently. I think the Wikipedia articles support everything I've been saying.
[quote author="brian"]In any case, to the main point: USB Host + I2S + lots of SRAM + not BGA was the task.[/quote]
Thanks for the reminder of the complete list of requirements. I don't know what the exact threshold for 'lots' might be, but I can certainly see how everything you listed is necessary. Although it's possible to add a second chip for the USB Host feature, it sure would be a pain to write two different firmwares. Way better to have one chip that does everything.
I'm currently working on a custom TMS320VC5506 board that I designed around this chip, because it has USB and DSP capabilities. I could have thrown a cheap PIC on the board with a high-speed serial link to a non-USB DSP, but then I'd be dealing with developing two sets of firmware, plus the added issues of timing between two chips. Despite the fact that USB is awkward on a 16-bit DSP, it's probably still less trouble than separate chips.
[quote author="brian"]... I think the lack of ASM optimization for FLAC is a bump in the road. It will just be more fun.[/quote]
Right on, man! That's the attitude. I'm sure you'll learn a great deal about ARM and FLAC, and then I'll be jealous. Both are technologies that I'd like to learn in depth. My FLAC coding has only been above the library level, not the interior.
Lots of RAM means enough to buffer a frame of any format I want to support.
The Cortex M4 would give you some DSP stuff plus, typically, HS USB for data. The only chips in distribution channels are Freescale parts at the moment though (BGA versions). In a year they probably will be commonplace. I think people are realizing there are lots of SoC-like applications that don't require the full power of what is now in smartphones and are more cost sensitive. To me it looks like LPC's designs recognize this fact. I will be very interested to see what TI and Atmel do with it.
Optional things I was also looking for in a chip: Ethernet/IP and an external RAM interface. I can imagine lots of things you would want internet access for. Having effectively unlimited, if slightly slower, memory is nice :-)
Did you see this section in the FLAC documentation, 1.2.1?
[quote]Embedded Developers

libFLAC has grown larger over time as more functionality has been included, but much of it may be unnecessary for a particular embedded implementation. Unused parts may be pruned by some simple editing of src/libFLAC/Makefile.am. In general, the decoders, encoders, and metadata interface are all independent from each other.

It is easiest to just describe the dependencies:

- All modules depend on the Format module.
- The decoders and encoders depend on the bitbuffer.
- The decoder is independent of the encoder. The encoder uses the decoder because of the verify feature, but this can be removed if not needed.
- Parts of the metadata interface require the stream decoder (but not the encoder).
- Ogg support is selectable through the compile time macro FLAC__HAS_OGG.

For example, if your application only requires the stream decoder, no encoder, and no metadata interface, you can remove the stream encoder and the metadata interface, which will greatly reduce the size of the library.

Also, there are several places in the libFLAC code with comments marked with "OPT:" where a #define can be changed to enable code that might be faster on a specific platform. Experimenting with these can yield faster binaries.[/quote]
Yes I did, I refer to this section in my post at Fri Feb 25, 2011 12:35 pm.
The automake file is undocumented except for inline comments, all of which I read.
Stripping OGG support reduced the library by only ~20 KB out of, I think, 300 K, with a full decoder weighing in at 700+ KB even without encoder support. Basically I spent many hours trying to strip it down and got about 40 K out of 700 for a functional decoder. I presume that half of the size comes from the linked math library implementing functions like log. In FFMPEG, log is done via a look-up table, which is much more efficient in both code and memory for the number sizes used.
The RockBox FFMPEG decoder: ~20 KB library; it is 10-20x smaller. The full program with file system and decoder is < 32 KB, which is less than 1/10 the size of just the libFLAC library. I am sure there is a way to strip libFLAC down more, but I was tired of wasting my time on it.
And let me assure you it is a total waste of time to use libFLAC. Why? Because FLAC was one of the main reasons a lot of people cared about RockBox, and the decoder was added in 2005. If you look at the change log and bugs you will see there is basically only one, which was 24 bit compatibility with FLAC 1.2.1 in 2007!
I frankly don't see why you seem hung up on using libFLAC, when there is a much better solution for an embedded target in existence...
The *only* reason I will make any attempt to use libFLAC is to avoid LGPL; I'd rather use BSD, but that is the only reason, and right now that reason isn't worth any of my time, as I can link to LGPL just fine and I have made no code changes to the decoder itself.
[quote author="brian"]I frankly don't see why you seem hung up on using libFLAC, when there is a much better solution for an embedded target in existence...[/quote]
I'm not hung up on using libFLAC, it just seems that way to you.
I'm really only interested in making sure you have more options open rather than fewer. I happened to be writing a utility to use libFLAC and had to dig into the documentation and sources to get some answers, and I noticed a few things that I wanted to pass on in case you missed them.
After seeing the documentation about the OPT comments, I randomly came across an example in the libFLAC source. Unfortunately, the code uses an #if 1 ... #else ... #endif construct which cannot be controlled from the command line. This sort of coding requires the source itself to be altered, rather than allowing the compiler command line to select different behavior. Now that I've seen such an example, I must say that I am a bit disappointed. It's nice to have an // OPT: comment in the source explaining the two code paths, but it would have been nice to see a little more effort so that this could be selected without diverging the source. This last bit is something that I discovered after my last post, so the situation is less ideal than it appeared when I read the documentation.
I'll keep listening to hear your progress, and I certainly wish you luck whichever direction(s) you take with the FLAC support.
Thanks RSDIO. You just kept bringing it up :-)
I am pretty happy with how things are progressing. Having decode working in less than real time already, I am definitely going to see this project through to completion at some point. It probably won't get a board until summer though.
I have gotten back to this project. Some info and a video are here:
http://teholabs.com/2011/12/openhifi-update/
I read through the last few pages of processor discussion. Given the dates on this thread, a lot has changed in product offerings since. I would check out the STM32F4 series. It has an ARM Cortex-M4F at 168 MHz. An M4F is basically an M3 with a single-cycle multiply-accumulate instruction, some arithmetic SIMD, and a single-precision FPU. It would be perfect for any codec algorithm. And as a nice bonus, the F4 series also has 2 I2S interfaces, high-speed USB host, 8-bit eMMC or 4-bit SDIO @ 50 MHz, Ethernet MII/RMII, 1 MB on-die flash, 192 KB on-die SRAM, and comes in an LQFP-64 with most of those features! LQFP-100 if you want external memory and/or Ethernet; LQFP-144 if you need both. I've been searching for an MCU with that feature set (plus CAN) for a while. I call it my 'holy grail' MCU. :)
I have the STM32F4 Discovery board and have not had a good chance to start playing with it yet, but its impressive feature set has pushed it to the top of the list for a future product.
I need to look into finding open-source libraries for the ADC/DAC and USB.
If there is an Eagle library available for it, that will save me some time.
The ST standard peripheral library compiles with gcc and supports both. They even provide many many USB class drivers for both host access and peripheral emulation. They also provide a DSP library that includes many common functions - IIR & FIR filters, sliding dot products, a few transforms.
Another nice point is that nearly all of the STM32 line share common or at least compatible pin-outs/muxes. So if you need a larger or smaller part than what you have selected, you can use the same boards.
ST really has a winner in the STM32 F1, F2, and F4 series.
Alan,
I am well aware of TI's, Freescale's, and ST's Cortex M4 chips. There is no reason to rewrite a working project from scratch because something else was released. The STM32F4 lacks a dynamic memory controller and does not have an internal Ethernet PHY, unlike the chip selected; this lowers the cost of the chip selected versus the STM32F4. While the larger 192 KB SRAM may allow most decompression schemes to run with buffers that just fit in the SRAM, most media players, if you take them apart, use large SDRAM/DDR buffers. There would be nothing wrong with selecting the STM32F4 if you wanted to, but as for me, I like the LM3S chips much more than the STM32 and have written code for both. My productivity writing code for LM3S is about 2-4 times what it was writing code for the STM32F103.
[quote author="alanh"]The ST standard peripheral library compiles with gcc and supports both. ...[/quote]
My memories of the ST Std Peripheral library are that it was more painful to use than just writing code from scratch. I really did not like use of pointer indirection for everything. It made the examples a lot less clear and readable. I just checked that they haven't improved the code base by downloading the latest version and it is still written in the same way.
I remember there was an open-source project to rewrite the library (here it is: http://www.hermann-uwe.de/blog/libopenstm32-a-free-software-firmware-library-for-stm32-arm-cortex-m3-microcontrollers). It had just started the last time I used the STM32.
If you think the STM32 libraries are great, I would recommend you download StellarisWare (TI) and just look at the board examples. There isn't really any comparison IMHO, not in terms of topics or usability. Most everything compiles on GCC because CodeSourcery and CrossWorks are both GCC based, FWIW.
I never said the std peripheral library was great, just that it exists.
STM32F2/F4's have a 16-bit data/26-bit address parallel bus that can be used to interface SRAM. I wasn't aware Stellaris had a family member with an on-board SDR/DDR DRAM controller. My mistake. I also wasn't aware of any audio codec that required several megabytes of RAM for frame level decode; and I've written most. Again my mistake.
You are correct. A new product offering is no reason to change an existing design. However, an existing design is also no reason to ignore a new product offering. I was just pointing out its existence and its feature set, since it had barely been announced when the discussion of processor choice a couple pages back was progressing. Just trying to help, not offend. High-speed USB was mentioned. The eMMC port on the STM32F2/F4 could also be used for an SD card in more than just SPI mode. M3/M4 code itself is portable, as is the toolchain. The only thing that would need to change is an abstraction layer, basically around storage, I2S, and basic UART/GPIOs, and you could easily switch to the next great M3/M4 from TI, NXP, or ST that is just around the corner. Seemed logical to me.
Sorry Alan. I think the STM32F4 chips would indeed be a reasonable choice today. Certainly one of the main target applications of Cortex M4 is multimedia. I think I overall like the Freescale chips a bit more than ST's but that's based on reading the white papers.
All I was trying to curtail was a protracted discussion on what chips to use. The firmware I will write is going to be for LM3S9xxx chips. It will be open source in the end so people are welcome to port it to something else though like M4 although the DSP stuff wouldn't be directly used. As far as size of buffers, it largely would depend on the CODEC and blocksize. I think FLAC to support the full largest block size may be something like 2 MB though don't quote me on that. The reference encoder for red book audio though uses only 10s of KB blocks (which is still quite big) in a minimal implementation. I am pretty pleased I am able to do the core functions at only 50 MHz, with lots of CPU cycles free.
I'm currently doing h.264 HD decode (BP) on a 50 MHz M3 with PLD accelerated iDCT. Next to that, audio is easy :)
The really nice thing about C-M3/M4 is the simplicity of everything. ISRs with auto-push, initial stack from the reset vector, unified address space, etc. If you target M3 or M4 and pay attention to separating out MCU specific features, it would be trivial to port the code to the thousands of variants. I look forward to seeing a finished project. Sounds like a good idea. If the DAC module interface is organized well, I'm tempted to make an HDMI transmitter board for it.
iDCT is the heavy lifting, still that is pretty awesome! I wonder how much CPU Ogg will take (Tremor port probably) as it has DCT in it as I recall.
[quote author="brian"]My memories of the ST Std Peripheral library are that it was more painful to use than just writing code from scratch.[/quote]
For almost two years now, the original library has been replaced with CMSIS, the ARM standard for ALL ARM products. Using this library you can compile your program for any ARM controller by changing only a few headers. It's a big improvement over other processors; it's easy to implement and has standard libraries for all the devices inside the controller. You need to remember that the M3 is a microcontroller, not a microprocessor, and it's optimized for this task.
[quote author="brian"]I really did not like use of pointer indirection for everything.[/quote]
Why??? It's the key to the power of several languages and a must for efficient class implementation. Pointer indirection is fast and optimal, and the compiler does the work for you to produce great, compact code.
At this point, use the microcontroller that you like and feel comfortable with, program in the language that you prefer, and without doubt you will finish in less time than if you used a fancy new controller that you don't like. I moved everything to ARM because there are a lot of compatible devices and I can select the best price/performance ratio.
I love pointers, but I don't like CMSIS. The intermediate structured fields make it very hard at times to just read code and know what registers are being changed. Which in turn makes understanding the chip on the datasheet harder. Nor is creating structures to simply set registers efficient. CMSIS is indeed designed for portability. All major vendors offer libraries for it but I find it clunky to use compared with programming directly in ARM, AVR, MSP430, etc.
The idea that CMSIS makes it trivial to port code across any micro (even any CM3) is absurd though. Every chip has different peripheral limitations. Even if two chips both have SPI ports, one may run them at 25 MHz and another at 50 MHz; if your code needed the latter, a port isn't just changing a header (of course you might not select such a chip, but there are other clock-tree nuances that can differ between chips, etc.). Portability without an OS is very hard at the hardware level. Embedded coding for me is about squeezing the most out of a given chip, which is quite fun but typically requires very hardware-specific code.
I would rather not debate ARM, my coding preferences or my chip selection in this thread at length though. OpenHiFi is on LM3S9xxx currently based on my Procyon dev board. It uses the very powerful StellarisWare libraries to make the code clean and easy to learn. I have done most of the code cleanup and it will be published soon, in a very early alpha form.
As promised, I have published cleaned up source files for openHiFi:
https://github.com/teholabs/openHiFi
It takes only "ls" and "p <filepath>" as commands at this time, and only plays 16-bit 44.1 kHz audio files to I2S with a 256x MCLK from the internal PLL.
Also I have published the pretty minor port of ffmpeg's FLAC decoder as required by LGPL:
https://github.com/teholabs/libffmpegFLAC
I don't remember changing any code in it, actually, just getting it to compile/run on the Cortex M3.
My code is MIT licensed, StellarisWare has the Ti no-GPL license, ffmpeg is linked to has LGPL, and Chan's software is MIT-like licensed.
I have made another update video:
http://teholabs.com/2011/12/openhifi-update-2/
More to come.