All good (TCP) conversations start with a handshake. (Or an elbow bump in 2020.)
The HTTP wiki page briefly mentions this:

a connection is initiated via TCP to the web server (SYN; SYN,ACK; ACK).

Check the http.cap sample capture or the first minute of this video. Without an open TCP connection, nothing proceeds to HTTP: if the server answers the SYN with a reset (RST) instead of SYN,ACK, the browser shows ERR_CONNECTION_REFUSED.
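If it helps to see the ordering outside a browser, here is a minimal Python sketch. The function name try_http_get and its return strings are made up for illustration; only the standard socket module is used. The point is that the HTTP request is written only after connect has returned, which is after the three-way handshake has completed.

```python
import socket


def try_http_get(host: str, port: int, timeout: float = 2.0) -> str:
    """Hypothetical helper: open a TCP connection first, then send HTTP.

    Returns the first line of the response, or a short string describing
    why the connection never got as far as HTTP.
    """
    try:
        # create_connection() returns only once SYN; SYN,ACK; ACK is done.
        with socket.create_connection((host, port), timeout=timeout) as sock:
            request = (
                f"GET / HTTP/1.1\r\n"
                f"Host: {host}\r\n"
                f"Connection: close\r\n\r\n"
            )
            # The HTTP bytes only go out on an already established connection.
            sock.sendall(request.encode("ascii"))
            first_bytes = sock.recv(1024)
            # Keep just the status line, e.g. "HTTP/1.1 200 OK".
            return first_bytes.split(b"\r\n", 1)[0].decode("ascii", "replace")
    except ConnectionRefusedError:
        # The peer answered the SYN with RST: nothing is listening there.
        return "connection refused"
    except socket.timeout:
        # No answer at all within the timeout (e.g. packets silently dropped).
        return "timed out"
```

Running this against a port with nothing listening while capturing should show a SYN answered by RST and no HTTP at all; the display filter tcp.flags.syn == 1 narrows the capture to the handshake packets.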

For the second question, on capturing, it might help to draw a diagram of your network (see the wired network examples).
Assuming your phone has a wireless connection, look at WLAN capture and Monitor mode. How well that works is very much YMMV, depending on the OS you're capturing on.