CS 352 Spring 2006
Programming Assignment 2
Due Date:
29 Mar, 2006 11:59pm (Extended (3))
Sections
Updates
-
2006-03-16
Your client should listen on the port given to the tracker for incoming connections. You are only required to accept incoming connections from the IPs listed below, but if you feel your client is robust enough, you may accept connections from any peer that contacts your client via the protocol.
-
2006-03-10
- 4 of the Cereal lab computers are now running BT clients. They are programmed to only upload/download at 5KB/s, and after 1 hour they will restart with no data. This should help you test your uploading to other peers. The machines being used are alphabits (128.6.13.147), frostedflakes (128.6.13.149), trix (128.6.13.139) and kix (128.6.13.152). Please make sure your code only uploads/downloads to/from these peers so that you won't encounter any bugs resulting from other students' clients.
Before jumping into the project I would like to mention one thing.
Most of the time your first attempt at a software project is either too
slow, too buggy, or is difficult to maintain/expand. Be prepared to
throw away the first versions of most of your code. You can even plan
to have them as throw-aways. You don't waste time doing this because
you learn where you can make improvements in your final version.
Theory of Operation
The second project will expand upon what you did for the first. Your client must now open a .torrent file, contact the tracker as well as
multiple peers and be able to upload and download to the peers simultaneously. This means you will have to manage the state of several connections
and work with multiple threads. Your client should also accept incoming connections from the peers at IP addresses listed above.
If your code for project 1 was well thought-out, it should be easy to expand your program for multiple downloads, and uploading should not be
much more difficult than listening for more types of packets and sending new ones. You will not have to worry about choking any peers, but like
the first project, you will have to make sure to unchoke them and keep them up to date on what pieces your client has downloaded.
Assignment Specifics
Your assignment should basically do the following:
-
Take as a command-line argument the name of the .torrent file to be loaded and the name of the file to save the data to.
For example:
java mybtclient somefile.torrent picture.jpg
-
Pass the name of the .torrent file to the
TorrentFileHandler
class as a String and store the returned
TorrentFile
object.
-
Open a TCP socket on the local machine and send an HTTP GET request to the tracker at the IP address and port specified by
the TorrentFile object.
-
Capture the response from the tracker and decode it in order to get the list of peers. From this list of peers,
use only a pre-selected list of peers to be specified
. You must extract these IP from the list, hard-coding it is not acceptable, except hard-coding the comparison.
-
Open TCP connections to the peers and be able to download simultaneously from several at one time.
-
Download one or more pieces of the file and verify its SHA-1 hash against the hash stored in the TorrentFile object.
-
After a piece is downloaded and verified, the peers are notified that you have completed the piece.
- Other peers should be able to request pieces from your client that you have verified, and your client should send those pieces to the peers.
-
Repeat steps 5-8 (using the same TCP connections) for the rest of the file. Make sure your client has uploaded at least 10% of the file size before closing your program.
-
When the file is finished, you must contact the tracker and send it the
completed
event and properly close all connections.
In addition to the above basic run-through of your program's execution, your program should be able to detect input from the user at any point
(you can decide what the input should be), and allow the user to close the program, suspending download of the file. The user should be able
to restart the program at a later time and resume download. Any non-verified pieces/blocks should be discarded before suspending, and the program
should also save how much it has uploaded and downloaded for that torrent.
How you choose to implement saving the state of your program is up to you, but a good start is using the Serializable interface for many
Java objects. If you do not use an object that implements the Serializable interface, you must write your own method for storing your data to disk
and restoring it upon start-up.
Files Available
The files below are intended to help you concentrate
on the networking portion of the project and not have to worry about
how to bencode or "unbencode" the torrent file and some of the
communication.
-
Exploration_2005-08-31.MP3.torrent
- The torrent file for Project 2.
-
TorrentFile.java
- Stores the data from a .torrent file in an easy-to-access format.
-
TorrentFileHandler.java
- Accepts a String corresponding to a filename for a .torrent file and returns a TorrentFile object containing the unencoded
data stored in the file.
-
TorrentFileHandlerTester.java
- An example of how to use the TorrentFileHandler and TorrentFile classes.
-
Bencoder.java
- Accepts Strings, Integers, Vectors (Lists), and HashMaps
(Dictionaries) and returns them as bencoded byte arrays. Also accepts
bencoded byte arrays of Strings, Integers, Lists, and Dictionaries and
returns them as Strings, Integers, Vectors and HashMaps.
-
BencoderTester.java
- An example of how to use the Bencoder class.
Programming Style
Writing maintainable code is just as important as writing correct code. Consequently, you will be graded on your style.
Below is a list of suggestions to
make your code easier to understand and maintain:
-
Conditions - All of your methods must have clearly specified preconditions and postconditions.
A precondition is what you assume to be true
before
your method is called, and a postcondition is what you can show to be true
after
your method has returned. Trivial preconditions (such as "int a is an integer") should be omitted.
The best place to put the pre and postconditions is in a comment before the first line of each method.
Example:
/*
Precondition: b is non-zero.
Postcondition: returns a / b.
*/
double divide(double a, double b)
{
return (a / b);
}
-
Comments - Comments for variables and for sections of code that does not have an obvious purpose. Please do not comment
obvious statements or sections of code. For example,
int i = 0; //i is an integer and the initial value is 0
is not useful and unnecessarily clutters the code.
-
Naming - Names of classes, methods, and variables should be descriptive. For example, if a class has a
field containing its hash value, naming the field
hash_value
is much better than naming it
hv
.
-
Exceptions - Catch exceptions and do something useful with them. For example, print a statement describing what went wrong
to System.err.
-
Easy to understand loops/recursion. When possible, it's best to avoid
break
statements whenever possible. When using recursion, tail-recursion is often preferable because "smart" compilers can
often translate it into a loop. Return statements within an if/else clause should be avoided whenever possible.
-
Efficiency - Avoid being terribly inefficient. For example, if you have to sort 10M objects, it's better to sort them with a
O(n*lg(n))
algorithm than a
O(n2)
algorithm.
-
Encapsulation/Objects - Encapsulation is a useful feature of
object-oriented languages, and because you are writing your program in an OO language, your program must separate the functionality into multiple classes. This way you can change the
implementation of a class without affecting the rest of your program. An example of separation would be to have separate FileHandler, PeerInterface, and TrackerInterface classes to manipulate files, interface with peers and interface with the tracker, respectively. If you do not separate your program into functional units, you will lose points on the Style portion of the grading.
Bencoding (Pronounced "Bee Encoding")
Bencoding a method of encoding binary data. Tracker responses, and interpeer communication will be bencoded. Below is how
data types are bencoded according to the BT protocol. [The following list is taken from
http://www.bittorrent.com/protocol.html
]
-
Strings are length-prefixed base ten followed by a colon and the string. They are encoded in UTF-8.
For example 4:spam corresponds to 'spam'.
-
Integers are represented by an 'i' followed by the number in ASCII base 10 followed by an 'e'. For example i3e corresponds
to 3 and i-3e corresponds to -3. Integers have no size limitation. i-0e is invalid. All encodings with a leading zero,
such as i03e, are invalid, other than i0e, which of course corresponds to 0.
-
Lists are encoded as an 'l' followed by their elements (also bencoded) followed by an 'e'. For example, l4:spam4:egse
corresponds to ['spam', 'eggs'].
-
Dictionaries are encoded as a 'd' followed by a list of alternating keys and their corresponding values followed by
an 'e'. For example, d3:cow3:moo4:spam4:eggse corresponds to {'cow':'moo', 'spam':'eggs'} and d4:spaml1:a1:bee
corresponds to {'spam': ['a', 'b']}. Keys must be strings and appear in sorted order (sorted as raw strings, not
alphanumerics).
Communication With the Tracker
Your client must take the information supplied by
the TorrentFile object and use it to communicate with the tracker. The
tracker's IP address and port number will be given to you by the
TorrentFile object, and your program must then contact the tracker.
Your program will send an HTTP GET request to the tracker with the
following key/value pairs. Note that these are NOT bencoded, but must
be properly escaped [this list is taken from
http://www.bittorrent.com/protocol.html
]:
-
info_hash
- The 20 byte(160-bit) SHA1 hash of the bencoded form of the info value from the metainfo file. This value will almost
certainly have to be escaped.
-
peer_id
- A string of length 20 which this downloader uses as its id. Each downloader generates its own id at random at the start
of a new download. This value will almost certainly have to be escaped.
-
ip
- An optional parameter giving the IP (or DNS name) which this peer is at.
-
port
- The port number this peer is listening on. Common behavior is for a downloader to try to listen on port 6881 and if that
port is taken try 6882, then 6883, etc. and give up after 6889.
-
uploaded
- The total amount uploaded so far, encoded in base ten ascii.
-
downloaded
- The total amount downloaded so far, encoded in base ten ascii.
-
left
- The number of bytes this peer still has to downloaded, encoded in base ten ascii.
This key is important
- If you do not specify how much you have left to download, the tracker
assumes you are a seed and will not return any seeds in the peer list.
-
event
- This is an optional key which maps to
started
,
completed
, or
stopped
(or
empty
, which is the same as not being present. If not present, this is one of the announcements done at regular intervals.
An announcement using
started
is sent when a download first begins, and one using
completed
is sent when the download is complete. No
completed
is sent if the file was complete when started. Downloaders send an announcement using
stopped
when they cease downloading.
The response from the tracker is a bencoded dictionary and contains two keys:
-
interval
- Maps to the number of seconds the downloader should wait between regular rerequests.
-
peers
- Maps to a list of dictionaries corresponding to peers, each of which contains the keys
peer id
,
ip
, and
port
, which map to the peer's self-selected ID, IP address or dns name as a string, and port number, respectively.
In addition to what your program did for the last project, your client should also periodically update its status to the tracker. The update
period should be no less than the interval returned by the tracker. This may change during execution, so it should be
updated each time the tracker is contacted.
Communicating With the Peer
Handshaking between peers begins with character
nineteen (decimal) followed by the string 'BitTorrent protocol'. After
the fixed headers are 8 reserved bytes which are set to 0. Next is the
20-byte SHA-1 hash of the bencoded form of the info value from the
metainfo (.torrent) file. The next 20-bytes are the peer id generated
by the client. The info_hash should be the same as sent to the tracker,
and the peer_id is the same as sent to the tracker. If the info_hash is
different between two peers, then the connection is dropped.
-
All integers are encoded as 4-bytes big-endian. e.g. 1,234 should be encoded as (hex) 00
-
The peer_id should simply be random numbers.
After the handshake, messages between peers take the form of
<length prefix><message ID><payload>
, where length prefix is a 4-byte
big-endian value and message ID is a single decimal character. The payload depends on the message.
Please consult either of the BT-related resources for detailed information describing these messages.
Below is a list of messages that need to be implemented in the project.
-
keep-alive:
<length prefix>
is 0. There is no message ID and no payload.
These should be sent around once every 2 minutes to prevent peers from closing connections. These only need to be sent if no other packets are sent within a 2-minute interval.
- choke: <length prefix> is 1 and message ID is 1. There is no payload.
- unchoke: <length prefix> is 1 and the message ID is 2. There is no payload.
-
interested:
<length prefix>
is 1 and message ID is 2. There is no payload.
-
uninterested:
<length prefix>
is 1 and message ID is 3. There is no payload.
-
have:
<length prefix>
is 5 and message ID is 4. The payload is a zero-based index of the piece that has just been
downloaded and verified.
-
request:
<length prefix>
is 13 and message ID is 6. The payload is as follows:
<index><begin><length>
Where
<index>
is an integer specifying the zero-based piece index,
<begin>
is an integer specifying the zero-based
byte offset within the piece, and
<length>
is the integer specifying the requested length.
<length>
is typically
2^14 (16384) bytes. A smaller piece should only be used if the piece length is not divisible by 16384. A peer may
close the connection if a block larger than 2^14 bytes is requested.
-
piece:
<length prefix>
is 9+X and message ID is 7. The payload is as follows:
<index><begin><block>
Where
<index>
is an integer specifying the zero-based piece index,
<begin>
is an integer specifying the
zero-based byte offset within the piece, and
<block>
which is a block of data, and is a subset of the piece specified by
<index>
.
During the first project, a lot of people were confused about the order of messages after the handshake. Below is an example of what would
take place between two peers setting-up a connection and starting sharing.
-
The local host opens a TCP Socket to the remote peer and sends the handshake packet. The local host then listens for the remote peer to respond.
-
Upon receiving the handshake packet and verifying the info_hash, the remote peer responds with a similar handshake packet (except with its peer_id).
The remote peer then listens for the local host to send a bitfield or other packet. The remote host can send a bitfield packet to the local host at this time.
-
Upon receiving the handshake and verifying the info_hash, the local host then (optionally) sends a bitfield packet which tells the remote peer
which pieces it has downloaded and verified so far.
-
If the local host is interested in what the remote peer has downloaded, then it sends an interested packet, otherwise it sends an uninterested packet.
If the remote peer is interested in what the local host has downloaded, then it sends an interested packet, otherwise it sends an uninterested packet.
-
When the local host, or the remote peer, is ready to download/upload to the other, it will send an unchoke packet.
Please note that clients will, and your client should, ignore any request packets received while a remote peer is choked. A client should only
upload to another client if the connection is unchoked AND the remote peer is interested. This means that a remote peer will not reply to the
local hosts's request packets unless you have expressed interest AND the remote peer has sent an unchoke packet. If the remote peer sends a choke packet
during data transfer, any outstanding requests will be discarded and unanswered - they should be re-requested after the next unchoke packet.
The Write-Up
-
Include your name as it appears on the roster and your student ID.
-
A high-level written description of how your program works. Specifically, describe the high-level interactions between the classes.
This is a good way to make sure that your program doesn't have any classes depending on the implementations of other classes.
You may include diagrams or illustrations if you feel it would aid your explanation or if you have a hard time explaining how your
program works. However, you MUST include some written description of your program. If you know how to use LaTeX, this should be fairly
easy to diagram. Also, OpenOffice.org's Draw program has an Export to PDF feature that is very simple to use. This section should be
around 100 words, or 2 paragraphs.
-
A brief (about 50 words or 1 paragraph) description of each class in the program. Describe what it does in general, the main public methods it
provides, and any public fields it exposes. Note that classes should generally NOT have public fields unless they are Static (e.g. constants).
-
Optionally, you may include a section on feedback about the program. Please only provide constructive/useful comments or
insights. What parts of the project were the most challenging and which were the easiest? Did you have trouble finding
resources for the project? Were any parts confusing to you? Including feedback will help shape the direction of the next project as well
as help in pointing-out trouble spots to look at during grading.
Submitting the Project
The project should be submitted through the Computer Science
Handin Server.
-
Make sure your name is in comments at the top of every file submitted.
-
The "main" starting method should be in a file called RUBTClient.java.
-
The write-up in RTF, HTML or PDF format saved as a writeup.<html/pdf/rtf>. Please do not submit a write-up in any other formats
-
All files should be submitted either as a .zip or .tar file. This file should be named "<NetID>.<zip/tar>", where
<NetID> is your eden NetID. If you are submitting as a group, only one person needs to submit, but the file should be named with both NetIDs, e.g. "<NetID1>.<NetID2>.<zip/tar>"
-
Late submissions will not be accepted. If you have not completed the project, submit what you have and you will be graded for partial credit.
Grading
-
Programming Style: 15%
-
Correctly interfacing with the tracker (retrieving peer list, periodic updates, after completing download): 10%
-
Correctly interfacing with the peers (handshaking, maintaining state, keep-alive): 10%
- Correctly downloading from at least 2 peers simultaneously: 20%
- Correctly uploading to at least 2 peers simultaneously: 20%
-
Correctly downloading and verifying the file: 15%
-
Write-up 10%
Resources
It is
strongly
recommended that you bookmark or download the
Sun Java 1.5.0 API
, as well as read the following pages:
Please email romoore _at_ cs _dot_ rutgers _dot_ edu concerning any questions or problems with this page.
Please visit the Sakai website for questions concerning this project.
Frequently Asked Questions
I've been getting a lot of similar questions lately, so
I'll post general forms of them as well as advice on how to solve the
underlying problems. As a bit of general advice, if you aren't sure
what a packet is supposed to look like, you can always fire up a
working client (the mainline BitTorrent client, for example) and then
use a network packet sniffing tool like Ethereal to capture the traffic
and analyze it.