Packets in an audio stream can be distorted relative to one another as they traverse a packet-switched network. This distortion can be attributed mainly to queues in the routers between the source and the destination. The queues can consist of packets either from our own flow or from other flows. The contribution of this work is a Markov model for the time delay variation of packet audio in this scenario. Our model is extensible, and we show this by including sender silence suppression and packet loss in the model. By comparing the model to wide area traffic traces, we show that it is possible to generate an audio arrival process similar to those created by real conditions. This is done by comparing the probability density functions of our model to those of the real captured data.