CS 352 Spring 2006
Programming Assignment 1
Due Date:
5 Mar, 2006 5:00pm
Sections
Updates
-
[
28-02-2006
] Added a Sakai page for this class. If you do not see this class when you log in to Sakai, please email romoore _at_ cs.rutgers.edu. All further questions about the project should be directed to the forum.
-
[
26-02-2006
] Fixed a formatting error in the Submitting the Project section. Added FAQ section and List of Sections(above).
-
[
25-02-2006
] There have been a few issues with the seed you have to connect to. Specifically, the port may be different than 63334. This issue is in the process of being resolved, but until then just search for the peer at 172.23.26.98 and use the port given by the tracker.
-
[
17-02-2006
] Posted Bencoder.java and BencoderTester.java. Fixed a bug in TorrentFileHandler.java that would cause a crash when parsing Lists.
-
[
15-02-2006
] Changed the listed port for the peer to 63334, which was previously incorrectly listed as 6969. Updated the TorrentFileHandler.java file to be more robust with more error checking.
-
[
14-02-2006
] Posted the
.torrent file
file for this project.
-
[
14-02-2006
] TorrentFileHandler.java now correctly hashes the 'info' section of the .torrent file. Commented out the 'package' commands in the .java files. This was causing them to fail to compile on the cereal machines.
Before jumping into the project I would like to mention one thing. Most of the time your first attempt at a software project is either too slow, too buggy, or is difficult to maintain/expand. Be prepared to throw away the first versions of most of your code. You can even plan to have them as throw-aways. You don't waste time doing this because you learn where you can make improvements in your final version.
Theory of Operation
In this project, you will build a simple BitTorrent(BT) client which will serve as the foundation for a fully-featured
BT client you will build in later assignments. For project 1, your client will load a
.torrent file
, interface with a tracker
and a single peer and will download a single file(Jpeg) from that peer. When the file is finished downloading, you will save it
to the hard disk. All communication will be done over TCP.
The file your program will read in is a .torrent file containing metadata for the file to be downloaded. You will be supplied
with
a class
that will take in a String containing the file name (in the local directory) or the complete file path and will
return a
TorrentFile
object containing all the necessary information. An
example
of usage will be included with the class.
Assignment Specifics
Your assignment should basically do the following:
-
Take as a command-line argument the name of the .torrent file to be loaded and the name of the file to save the data to.
For example:
java mybtclient somefile.torrent picture.jpg
-
Pass the name of the
.torrent file
to the
TorrentFileHandler
class as a String and store the returned
TorrentFile
object.
-
Open a TCP socket on the local machine and send an HTTP_GET request to the tracker at the IP address and port specified by
the TorrentFile object.
-
Capture the response from the tracker and decode it in order to get the list of peers. From this list of peers,
use only the peer at IP address 172.23.26.98
. You must extract this IP from the list, hard-coding it is not acceptable.
-
Open a TCP socket on the local machine and contact the peer using the BT peer protocol and request a piece of the file.
-
Download the piece of the file and verify its SHA-1 hash against the hash stored in the TorrentFile object. The first time you
begin the download, you need to contact the tracker and let it know you are starting download.
-
After a piece is downloaded and verified, the peer is notified that you have completed the piece.
-
Repeat steps 5-7 (using the same TCP connection) for the rest of the file.
-
When the file is finished, you must contact the tracker and send it the
completed
event and properly close all TCP connections
-
Save the file to the hard disk according to the second command-line argument.
Files Available
The files below are intended to help you concentrate on the networking portion of the project and not have to worry about how to bencode
or "unbencode" the torrent file and some of the communication.
-
Kinkakuji - MainTemple3.JPG.torrent
- The torrent file containing metdata about the file to be downloaded.
-
TorrentFile.java
- Stores the data from a .torrent file in an easy-to-access format.
-
TorrentFileHandler.java
- Accepts a String corresponding to a filename for a .torrent file and returns a TorrentFile object containing the unencoded
data stored in the file.
-
TorrentFileHandlerTester.java
- An example of how to use the TorrentFileHandler and TorrentFile classes.
-
Bencoder.java
- Accepts Strings, Integers, Vectors (Lists), and HashMaps (Dictionaries) and returns them as bencoded byte arrays. Also accepts
bencoded byte arrays of Strings, Integers, Lists, and Dictionaries and returns them as Strings, Integers, Vectors and HashMaps.
-
BencoderTester.java
- An example of how to use the Bencoder class.
Programming Style
Writing maintainable code is just as important as writing correct code. Consequently, you will be graded on your style.
Below is a list of suggestions to
make your code easier to understand and maintain:
-
Conditions - All of your methods must have clearly specified preconditions and postconditions.
A precondition is what you assume to be true
before
your method is called, and a postcondition is what you can show to be true
after
your method has returned. Trivial preconditions (such as "int a is an integer") should be omitted.
The best place to put the pre and postconditions is in a comment before the first line of each method.
Example:
/*
Precondition: b is non-zero.
Postcondition: returns a / b.
*/
double divide(double a, double b)
{
return (a / b);
}
-
Comments - Comments for variables and for sections of code that does not have an obvious result. Please do not comment
obvious statements or sections of code. For example,
int i = 0; //i is an integer and the initial value is 0
is not useful and unnecessarily clutters the code.
-
Naming - Names of classes, methods, and variables should be descriptive. For example, if a class has a
field containing its hash value, naming the field
hash_value
is much better than naming it
hv
.
-
Exceptions - Catch exceptions and do something useful with them. For example, print a statement describing what went wrong
to System.error.
-
Easy to understand loops/recursion. When possible, it's best to avoid
break
statements whenever possible. When using recursion, tail-recursion is often preferable because "smart" compilers can
often translate it into a loop.
-
Efficiency - Avoid being terribly inefficient. For example, if you have to sort 10M objects, it's better to sort them with a
O(n*lg(n))
algorithm than a
O(n2)
algorithm.
-
Encapsulation - Encapsulation is a useful feature of object-oriented languages, and so you should try to write classes that expose
as little as possible. This way you can change the implementation of a class without affecting the rest of your program.
Bencoding (Pronounced "Bee Encoding")
Bencoding a method of encoding binary data. Tracker responses, and interpeer communication will be bencoded. Below is how
data types are bencoded according to the BT protocol. [The following list is taken from
http://www.bittorrent.com/protocol.html
]
-
Strings are length-prefixed base ten followed by a colon and the string. They are encoded in UTF-8.
For example 4:spam corresponds to 'spam'.
-
Integers are represented by an 'i' followed by the number in ASCII base 10 followed by an 'e'. For example i3e corresponds
to 3 and i-3e corresponds to -3. Integers have no size limitation. i-0e is invalid. All encodings with a leading zero,
such as i03e, are invalid, other than i0e, which of course corresponds to 0.
-
Lists are encoded as an 'l' followed by their elements (also bencoded) followed by an 'e'. For example, l4:spam4:egse
corresponds to ['spam', 'eggs'].
-
Dictionaries are encoded as a 'd' followed by a list of alternating keys and their corresponding values followed by
an 'e'. For example, d3:cow3:moo4:spam4:eggse corresponds to {'cow':'moo', 'spam':'eggs'} and d4:spaml1:a1:bee
corresponds to {'spam': ['a', 'b']}. Keys must be strings and appear in sorted order (sorted as raw strings, not
alphanumerics).
Communication With the Tracker
Your client must take the information supplied by the TorrentFile object and use it to communicate with the tracker. The tracker's
IP address and port number will be given to you by the TorrentFile object, and your program must then contact the tracker.
Your program will send an HTTP_GET request to the tracker with the following key/value pairs. Note that these are NOT bencoded, but must be properly escaped [this list is taken from
http://www.bittorrent.com/protocol.html
]:
-
info_hash
- The 20 byte(160-bit) SHA1 hash of the bencoded form of the info value from the metainfo file. This value will almost
certainly have to be escaped.
-
peer_id
- A string of length 20 which this downloader uses as its id. Each downloader generates its own id at random at the start
of a new download. This value will almost certainly have to be escaped.
-
ip
- An optional parameter giving the IP (or DNS name) which this peer is at.
-
port
- The port number this peer is listening on. Common behavior is for a downloader to try to listen on port 6881 and if that
port is taken try 6882, then 6883, etc. and give up after 6889.
-
uploaded
- The total amount uploaded so far, encoded in base ten ascii.
-
downloaded
- The total amount downloaded so far, encoded in base ten ascii.
-
left
- The number of bytes this peer still has to downloaded, encoded in base ten ascii.
This key is important
- If you do not specify how much you have left to download, the tracker assumes you are a seed and will not return any seeds in the peer list.
-
event
- This is an optional key which maps to
started
,
completed
, or
stopped
(or
empty
, which is the same as not being present. If not present, this is one of the announcements done at regular intervals.
An announcement using
started
is sent when a download first begins, and one using
completed
is sent when the download is complete. No
completed
is sent if the file was complete when started. Downloaders send an announcement using
stopped
when they cease downloading.
The response from the tracker is a bencoded dictionary and contains two keys:
-
interval
- Maps to the number of seconds the downloader should wait between regular rerequests.
-
peers
- Maps to a list of dictionaries corresponding to peers, each of which contains the keys
peer id
,
ip
, and
port
, which map to the peer's self-selected ID, IP address or dns name as a string, and port number, respectively.
Communicating With the Peer
Handshaking between peers begins with character nineteen (decimal) followed by the string 'BitTorrent protocol'. After the fixed
headers are 8 reserved bytes which are set to 0. Next is the 20-byte SHA-1 hash of the bencoded form of the info value from the
metainfo (.torrent) file. The next 20-bytes are the peer id generated by the client. The info_hash should be the same as sent to the
tracker, and the peer_id is the same as sent to the tracker. If the info_hash is different between two peers, then the connection is
dropped.
-
All integers are encoded as 4-bytes big-endian.
-
The peer_id should simply be random numbers.
After the handshake, messages between peers take the form of
<length prefix><message ID><payload>
, where length prefix is a 4-byte
big-endian value and message ID is a single decimal character. The payload depends on the message.
Please consult either of the BT-related resources for detailed information describing these messages.
Below is a list of messages that need to be implemented in the project.
-
keep-alive:
<length prefix>
is 0000. There is no message ID and no payload.
These should be sent around once every 2 minutes to prevent peers from closing connections.
-
interested:
<length prefix>
is 0001 and message ID is 2. There is no payload.
-
uninterested:
<length prefix>
is 0001 and message ID is 3. There is no payload.
-
have:
<length prefix>
is 0005 and message ID is 4. The payload is a zero-based index of the piece that has just been
downloaded and verified.
-
request:
<length prefix>
is 0013 and message ID is 6. The payload is as follows:
<index><begin><length>
Where
<index>
is an integer specifying the zero-based piece index,
<begin>
is an integer specifying the zero-based
byte offset within the piece, and
<length>
is the integer specifying the requested length.
<length>
is typically
2^14 (16384) bytes. A smaller piece should only be used if the piece length is not divisible by 16384. A peer may
close the connection if a block larger than 2^14 bytes is requested.
-
piece:
<length prefix>
is 0009+X and message ID is 7. The payload is as follows:
<index><begin><block>
Where
<index>
is an integer specifying the zero-based piece index,
<begin>
is an integer specifying the
zero-based byte offset within the piece, and
<block>
which is a block of data, and is a subset of the piece specified by
<index>
.
The Write-Up
-
Include your name as it appears on the roster and your student ID.
-
A high-level description of how your program works. Specifically, describe the high-level interactions between the classes.
This is a good way to make sure that your program doesn't have any classes depending on the implementations of other classes.
If you know how to use LaTeX, this should be fairly easy to diagram. Also, OpenOffice.org's Draw program has an Export to PDF
feature that is very simple to use.
-
A brief (about 1 paragraph) description of each class in the program.
-
Optionally, you may include a section on feedback about the program. Please only provide constructive/useful comments or
insights. What parts of the project were the most challenging and which were the easiest? Did you have trouble finding
resources for the project? Were any parts confusing to you?
Submitting the Project
The project should be submitted through the Computer Science
Handin Server.
-
Make sure your name is in comments at the top of every file submitted.
-
The "main" starting method should be in a file called RUBTClient.java.
-
The write-up in HTML or PDF format saved as a writeup.<html/pdf>.
-
All files should be submitted either as a .zip or .tar file. This file should be named <NetID>.<zip/tar>, where
<NetID> is your eden NetID.
Grading
-
Programming Style: 15%
-
Correctly setting up TCP sockets/connections: 15%
-
Correctly interfacing with the tracker: 20%
-
Correctly interfacing with the peer: 20%
-
Correctly downloading and verifying the file: 20%
-
Write-up 10%
Resources
It is
strongly
recommended that you bookmark or download the
Sun Java 1.5.0 API
, as well as read the following pages:
Please email romoore _at_ cs _dot_ rutgers _dot_ edu concerning any questions or problems with this page.
Please visit the Sakai website for questions concerning this project.
Frequently Asked Questions
I've been getting a lot of similar questions lately, so I'll post general forms of them as well as advice on how to solve the underlying problems. As a bit of general advice, if you aren't sure what a packet is supposed to look like, you can always fire up a working client (the mainline BitTorrent client, for example) and then use a network packet sniffing tool like Ethereal to capture the traffic and analyze it.
-
I can't connect to the tracker.
I've tried creating a Socket and sending the bytes through myself by grabbing the InputStream and OutputStream from the Socket, but I can't get a response from the tracker.
Contacting the tracker is a simple HTTP GET request. This type of operation is done by every web browser today and even some smaller programs. As a result, there exists a class
java.net.URL
that can easily handle this type of request. As I mentioned in the presentation - whatever you're trying to do in Java has probably already been done a thousand times, so it's usually faster to find an existing implementation than to write one yourself.