http://dev.dangerousprototypes.com/parts
Search GitHub hardware projects by component to find examples and references for your next project. Big schematic and PCB previews mean quick and easy browsing without opening a CAD program.
TomKeddie first mentioned this idea at Hacker Camp Shenzhen, and later in the forum and on WeChat. Tom generously shared his scraping/search method. Eagle 6+ files are XML, so we can find them on GitHub by searching for the "eagle SYSTEM" tag in files with "extension:sch". That returns more than the maximum 100 pages of results, so we filter by file size and step through it in 500-byte windows, e.g. "size:1001..1500". We use the normal user search interface, parse the HTML results, and grab all URLs ending with .sch. While GitHub has an API, that API doesn't give access to code search without specifying a repository by name (probably so people don't do what we did...).
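For anyone who wants to try the same thing, here is a rough Python sketch of the size-window search described above. It is not the script we actually ran. The helper names, the `type` and `p` URL parameters, and the link regex are all assumptions for illustration, and the markup of GitHub's result pages may differ from what the regex expects.

```python
import re
from urllib.parse import urlencode

# The normal web search page, not the API.
SEARCH_URL = "https://github.com/search"


def size_windows(start, stop, step=500):
    """Yield (low, high) byte ranges such as (1001, 1500), (1501, 2000), ..."""
    low = start
    while low <= stop:
        high = min(low + step - 1, stop)
        yield low, high
        low = high + 1


def search_url(low, high, page=1):
    """Build a code-search URL for Eagle XML schematics in one size window."""
    query = f'"eagle SYSTEM" extension:sch size:{low}..{high}'
    # "type" and "p" are assumed query parameters for code search and paging.
    return SEARCH_URL + "?" + urlencode({"q": query, "type": "Code", "p": page})


# Assumed pattern: any href in the result page that ends with .sch.
SCH_LINK = re.compile(r'href="([^"]+\.sch)"', re.IGNORECASE)


def sch_links(html):
    """Return the unique .sch links in a results page, in order of appearance."""
    seen = []
    for url in SCH_LINK.findall(html):
        if url not in seen:
            seen.append(url)
    return seen
```

A real run would fetch each URL from `search_url` for every window and page, pause between requests, and feed the returned HTML to `sch_links`.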
There are a number of limitations. Only Eagle projects are indexed; a search for KiCad files gave us little reason to expand. Files from before Eagle 6.0 are not XML and are not included. GitHub search only indexes code files up to ~390K, so larger files are not included. If GitHub improves access via the API, we will expand the index.
Scraping all of the files over terrible Chinese internet took about a week. Searches were performed at least 32 seconds apart. Some files were incomplete (lacking a closing tag). Some files had an extra '>>', usually a merge error tag. Some files were just corrupt. All of these caused Eagle to hang until the window was manually closed. Working with Eagle from the command line has been most unpleasant.
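One way to avoid the hangs would be to screen each download before handing it to Eagle. The sketch below is a suggestion, not something we ran. The function name is made up, and it assumes the stray '>>' comes from git-style conflict markers (`<<<<<<<` / `>>>>>>>`) at the start of a line.

```python
import xml.etree.ElementTree as ET


def check_eagle_xml(data):
    """Return None if data (bytes) looks like a complete Eagle XML file,
    otherwise a short string describing the problem."""
    # Assumption: merge leftovers are git conflict markers at line start.
    # A stray '>' is legal inside XML text, so the parser alone may miss them.
    for line in data.splitlines():
        if line.startswith((b"<<<<<<<", b">>>>>>>")):
            return "merge conflict marker"
    try:
        root = ET.fromstring(data)
    except ET.ParseError as err:
        # Truncated files (no closing tag) and corrupt files end up here.
        return f"not well-formed XML: {err}"
    if root.tag != "eagle":
        return f"unexpected root element <{root.tag}>"
    return None
```

Files that return a reason string can be skipped or logged instead of being opened in Eagle.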
Very cool!
We use the normal user search interface, parse the HTML results
You got a bit of code to share for that? Would be interested to try and verify the lack of KiCad files and maybe try a Gerber search as well.
EDIT: For anyone else wondering about this, it is pretty easy to search GitHub for Eagle and KiCad projects by putting "language:Eagle" or "language:KiCad" in the main GitHub search bar. Eagle currently returns 5,136 projects and KiCad 1,564.
I did my own GitHub scraping and found that 'language:eagle' helps a bit, but it also returns several KiCad repositories/files. Is this an error in GitHub, or something the repository owner does?
I don't know how often the language setting is wrong, still collecting data :) However, some preliminary stats:
3800 repositories gave 7000 Eagle schematic files processed, with about 90000 components used, consisting of about 1 symbol and about 12 packages.
I wrote the scraping script that fetches the files from GitHub. It uses the extension and part of the content (header) to choose which .sch files to fetch, then tries to find a matching .brd file.
It doesn't use the API because I couldn't do the search I wanted using the API.
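Roughly, the selection and matching steps look like the sketch below. This is a simplified illustration rather than the script itself. The function names are made up, and it assumes the board file sits next to the schematic with the same base name.

```python
from pathlib import PurePosixPath


def looks_like_eagle_sch(path, head):
    """True if the file name ends in .sch and the first bytes carry the
    "eagle SYSTEM" marker found in Eagle 6+ XML files."""
    return path.lower().endswith(".sch") and b"eagle SYSTEM" in head


def matching_board(sch_path, repo_paths):
    """Return the .brd path next to sch_path with the same base name, or None."""
    wanted = str(PurePosixPath(sch_path).with_suffix(".brd")).lower()
    for candidate in repo_paths:
        # Compare case-insensitively, since some repositories use .SCH/.BRD.
        if str(PurePosixPath(candidate)).lower() == wanted:
            return candidate
    return None
```

For example, `matching_board("hw/main.sch", ["README.md", "hw/main.brd"])` picks `hw/main.brd`, and a schematic without a board next to it gives `None`.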