From honeydew.srv.cs.cmu.edu!bb3.andrew.cmu.edu!andrew.cmu.edu!postman+ Fri Sep 24 15:41:50 EDT 1993
Article: 12480 of comp.ai.neural-nets
Xref: honeydew.srv.cs.cmu.edu comp.ai.neural-nets:12480
Path: honeydew.srv.cs.cmu.edu!bb3.andrew.cmu.edu!andrew.cmu.edu!postman+
From: Matthew.White@cs.cmu.edu
Newsgroups: comp.ai.neural-nets
Subject: CMU Learning Benchmark Database Updated
Date: Fri, 24 Sep 1993 03:15:48 -0400
Organization: Carnegie Mellon, Pittsburgh, PA
Lines: 82
NNTP-Posting-Host: po5.andrew.cmu.edu

The CMU Learning Benchmark Archive has been updated.

As you may know, the data sets in this collection have in the past come in
varying formats, so code had to be written to parse each one.  This was a
waste of everybody's time.  The old data sets have now been replaced with
data sets in a standardized format.  Every benchmark now consists of a file
describing the benchmark and a second file that is either the data set itself
(.data) or a program that generates the data set (.c).

Data sets currently available:

  nettalk       Pronunciation of English words.
  parity        N-input parity.
  protein       Prediction of secondary structure of proteins.
  sonar         Classification of sonar signals.
  two-spirals   Discrimination between the two arms of a twin-spiral pattern.
  vowel         Speaker-independent recognition of vowels.
  xor           Traditional xor.

Accompanying the new data file format are a file describing the format and a
C library for parsing it.  In addition, the C version of the
Cascade-Correlation simulator has been rewritten to use the new file format.
Both the parsing code and the Cascade-Correlation code are distributed as
compressed shell archives and should compile with any ANSI/ISO-compatible C
compiler.

Code currently available:

  nevprop1.16.shar   A user-friendly version of Quickprop.
  cascor1a.shar      The re-engineered version of the Cascade-Correlation
                     algorithm.
  parse1.shar        C code for parsing the new data set format.

Data sets and code are available via anonymous FTP.
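Because a benchmark may be supplied as a .c generator rather than a .data
file, it may help to see how small such a generator can be.  The following is
a minimal C sketch for N-input parity, given for illustration only: it
enumerates all 2^N binary input patterns and labels each one 1 if it contains
an odd number of ones, 0 otherwise.  The function names parity_target and
write_parity_set and the plain one-pattern-per-line output are hypothetical;
this is not the archive's own generator and it does not write the CMU data
file format.

```c
#include <assert.h>
#include <stdio.h>

/* Target for one n-input parity pattern: 1 if an odd number of the
 * low n bits of `pattern` are set, 0 otherwise. */
int parity_target(unsigned long pattern, int n)
{
    int ones = 0;
    for (int bit = 0; bit < n; bit++)
        ones += (int)((pattern >> bit) & 1UL);
    return ones % 2;
}

/* Write every n-input parity pattern to `out`, one per line: the n input
 * bits (most significant first), then the target.  This layout is a
 * hypothetical stand-in, NOT the CMU benchmark data file format.
 * Returns the number of patterns written, i.e. 2^n. */
unsigned long write_parity_set(FILE *out, int n)
{
    assert(n > 0 && n < 32);            /* keep 2^n within unsigned long */
    unsigned long count = 1UL << n;
    for (unsigned long p = 0; p < count; p++) {
        for (int bit = n - 1; bit >= 0; bit--)
            fprintf(out, "%lu ", (p >> bit) & 1UL);
        fprintf(out, "%d\n", parity_target(p, n));
    }
    return count;
}
```

Called as write_parity_set(stdout, 2), the sketch writes the four lines
"0 0 0", "0 1 1", "1 0 1" and "1 1 0", so the traditional xor benchmark is
simply the N = 2 case of parity.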
Instructions for FTP access are given at the end of this message.

If you have difficulties with either the data sets or the programs, please
send mail to neural-bench@cs.cmu.edu.  Comments and suggestions should also
be sent to that address.  Let me urge you not to hold back questions, as they
are our single best way to spot where our methods need improvement.

If you would like to submit a data set to the CMU Learning Benchmark Archive,
send email to neural-bench@cs.cmu.edu.  All data sets should be in the CMU
data file format; if you have difficulty converting your data file, contact
us for assistance.

Matt White
Maintainer, CMU Learning Benchmark Archive

-------------------------------------------------------------------------------

Directions for FTPing data sets:

For people whose systems support AFS, the files can be accessed directly in
the directory "/afs/cs.cmu.edu/project/connect/bench".

For people accessing these files via FTP:

1. Open an FTP connection from wherever you are to the machine
   "ftp.cs.cmu.edu".  Its internet address is 128.2.206.173, for those who
   need it.

2. Log in as user "anonymous" with your own internet address as the
   password.  You may see an error message saying "filenames may not have
   /.. in them" or something similar.  Just ignore it.

3. Change the remote directory to "/afs/cs/project/connect/bench".  NOTE: you
   must do this in a single atomic operation, because some of the parent
   directories on this path are not accessible to outside users.

4. At this point the "dir" command in FTP should list the files in this
   directory.  Use get or mget to fetch the ones you want.  If you want a
   compressed file (suffix .Z), be sure to give the "binary" command before
   doing the "get".  (Some versions of FTP use different names for these
   operations; consult your local system maintainer if you have trouble.)

5. The directory "/afs/cs/project/connect/code" contains public-domain
   programs implementing the Quickprop and Cascade-Correlation algorithms,
   among other things.  Access it in the same way.