Having finished yet another Bash script, I realized that it all should have been done completely differently, and yet it worked. I want to show you what obscenities and kludges I wrote to solve the problem while still lacking a wagonload of knowledge. In other words, a caricature of programming.
Task
I needed something that would:
- Print the set of rhymes for a word, excluding square rhymes
- Intersect the sets of rhymes for two words
What for? Just because.
For those who don't know: a square rhyme (colloquially, a square) is a pair of words that share their last two letters in spelling, which is (often the only thing) what makes them rhyme. For example, "roses" and "frost", or "tire" and "car"; in the original Russian each pair shares its ending. Squares are frowned upon in modern versification for being primitive.
Solution
The simplest solution, it seemed to me, was to write a Bash script on top of an existing rhyme generator, HOST, which picks rhymes primarily by sound rather than by spelling. Why HOST? Because if I give the site's real name, people will call it advertising. Why not just keep using the site? First, despite its advantage of picking rhymes by sound, it often returns squares. Second, you still have to use your own brain, spend time switching between tabs, and strain to remember which words repeat across the lists when looking for rhymes to two words at once.
Getting Strong Rhymes
What do I know? I know about the wget utility, which downloads the page at a given URL. So we make the request and get the HTML page in a file named after the word we are rhyming. For example, let's look up the word "здесь" ("here"):
wget https://HOST/rifma/здесь
But I only need the list of words, so how do I get rid of everything else? Looking at the page, we see that the list of words is laid out, strange as it may seem, as a list, with the words inside <li> </li> tags. Well, we have the wonderful sed utility, so we write:
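# Write to a temporary file first: redirecting straight into $word would truncate it before it is read
grep '<li>' "$word" | sed -e 's%<li>%%' -e 's%</li>%%' -e 's/ //g' -e '/^$/d' > "$word.tmp" && mv "$word.tmp" "$word"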
First, from the file named after the word, we select the lines that contain the <li> tag, which gives us a bunch of empty tags plus the lines with words. We remove the opening tag and its closing counterpart; percent signs are used as delimiters instead of slashes because the </li> tag already contains a slash, which would confuse sed. With percent signs everything is fine. Then we remove all spaces and delete the empty lines. Voila, a clean list of words.
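The same filter can be sketched as a function that reads the page on standard input, so the result never has to be written back into the file it is being read from. The function name extract_words is made up for illustration; the sed expressions are the ones described above.

```shell
#!/usr/bin/env bash
# extract_words (hypothetical name): read an HTML page on stdin and print
# the contents of its <li> items, one word per line.
extract_words() {
    grep '<li>' |
        sed -e 's%<li>%%' -e 's%</li>%%' -e 's/ //g' -e '/^$/d'
}
```

Assuming wget's -q (quiet) and -O - (write to stdout) options, the download and the cleanup could then be joined in one pipeline: wget -q -O - "https://HOST/rifma/$word" | extract_words > "$word"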
To drop the words that rhyme only because of their last letters, we take the last two letters of the original word and clean the list:
squad=${word: -2}    # the last two letters of the original word
sed -e "/.$squad\$/d" "$word" > "$word.tmp" && mv "$word.tmp" "$word"
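As a small worked example, the square filter can be written as a function that takes the list on standard input. This is only a sketch: the name drop_squares is hypothetical, and the suffix is cut off with sed instead of Bash substring expansion so that the function also runs under a plain sh. For Cyrillic words it assumes a UTF-8 locale, otherwise "two letters" turns into "two bytes".

```shell
#!/usr/bin/env bash
# drop_squares WORD (hypothetical name): read candidate rhymes from stdin,
# one per line, and print only those that do NOT end in the last two
# letters of WORD.
drop_squares() {
    # Keep only the last two characters of the word (the whole word if shorter).
    suffix=$(printf '%s\n' "$1" | sed 's/.*\(..\)$/\1/')
    # grep exits with 1 when nothing is left; that is not an error here.
    grep -v -- "${suffix}\$" || true
}
```

For instance, feeding it the lines station, ocean, motion and fashion with the word nation leaves only ocean, because the other three end in "on".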
We look, we try, everything works ... wait, where is the list for the word "play"? And for "I'm coming"? The file is empty! That is because these words are verbs, and we know what happens to people who rhyme verbs with verbs. A verb rhyme is even worse than a square one, because verbs are the most numerous words in Russian and they all share the same endings, which is why none of them survived the ending check.
But let's not be hasty. Every word has not only rhymes but also assonances, which sometimes sound much better than a rhyme; that is why they are called assonances (French assonance, from Latin assono, "I sound in harmony").
Getting Assonances
Here the fun begins: assonances are not served at a separate URL. They appear on the same page when a script runs, sends an HTTP request and receives a response. How do I tell wget to press a button? I can't. Sad.
Noticing that the URL in the address bar did change somehow, I copied what was there after switching to assonances and pasted it into a new browser tab: strong rhymes opened. Not what I wanted.
In principle, I thought, the server should not care whether the request is sent by a running script or typed by a person. Right? Who knows, let's check.
Where do I send it? What do I send? An HTTP request to the server's IP, something like GET ..., then something like HTTP/1.1 ... I need to see what the browser sends and where. Install Wireshark and look at the traffic:
0040 37 5d a3 84 27 e7 fb 13 6d 93 ed cd 56 04 9d 82 7]£.'çû.m.íÍV...
0050 32 7c fb 67 46 71 dd 36 4d 42 3d f3 62 1b e0 ad 2|ûgFqÝ6MB=ób.à.
0060 ef 87 be 05 6a f9 e1 01 41 fc 25 5b c0 77 d3 94 ï.¾.jùá.Aü%[ÀwÓ.
Um ... what? Oh right, we have HTTPS. What now? Stage a MITM attack on myself? Ideally, the victim will help us herself.
Eventually, having thought to dig into the browser itself, I found both the request and its recipient. Let's go:
Dialog with the terminal
telnet IP PORT
Trying IP...
Connected to IP.
Escape character is '^]'.
GET /rifma/%D0%BC%D0%B0%D1%82%D1%8C?mode=block&type=asn HTTP/1.1
Host: HOST
Accept-Language: en-US,en;q=0.5
X-Requested-With: XMLHttpRequest
Connection: close

HTTP/1.1 400 Bad Request
Server: nginx/1.8.0
Date: Sun, 03 Nov 2019 20:06:59 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 270
Connection: close

<html>
<head><title>400 The plain HTTP request was sent to HTTPS port</title></head>
<body bgcolor="white">
<center><h1>400 Bad Request</h1></center>
<center>The plain HTTP request was sent to HTTPS port</center>
<hr><center>nginx/1.8.0</center>
</body>
</html>
Connection closed by foreign host.
Ha. Heh. Indeed, what did I expect, sending a bare HTTP request to an HTTPS port? Do I have to encrypt it myself now? All that fuss with RSA keys, then with SHA256 ... But why bother, when OpenSSL exists for exactly this. Well, we already know what to send; just drop the Referer and Cookie fields first, I doubt they matter much:
Dialog with the terminal
openssl s_client -connect IP:PORT
{ , }
GET /rifma/%D0%B7%D0%B4%D0%B5%D1%81%D1%8C?mode=block&type=asn HTTP/1.1
Host: HOST
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:70.0) Gecko/20100101 Firefox/70.0
Accept: text/javascript,text/html,application/xml,text/xml,*/*
Accept-Language: ru-RU,ru;q=0.8,en-US;q=0.5,en;q=0.3
Accept-Encoding: gzip, deflate, br
X-Requested-With: XMLHttpRequest
Connection: keep-alive

HTTP/1.1 200 OK
Content-Type: text/html;charset=utf-8
Transfer-Encoding: chunked
Connection: keep-alive
Vary: Accept-Encoding
Status: 200 OK
Date: Sun, 03 Nov 2019 20:34:33 GMT
Set-Cookie: COOKIE
X-Powered-By: Phusion Passenger 5.0.16
Server: nginx/1.8.0 + Phusion Passenger 5.0.16
Expires: Thu, 01 Jan 1970 00:00:01 GMT
Cache-Control: no-cache
Strict-Transport-Security: max-age=31536000
Content-Security-Policy: block-all-mixed-content
Content-Encoding: gzip
Is that the server swearing at me in its own language? Well, at least it answered 200 OK, so the cookies and the referrer do not matter. The body is gzip-compressed, and only the ASCII characters survive copying, so the Accept-Encoding line can be removed as well. Everything is fine: we get an HTML document, now with assonances. But two questions remain: how do I start OpenSSL from a script and feed it data? And how do I read the output if, after receiving the response, we are still sitting in OpenSSL's "shell"? I might come up with something for the second, but for the first ...
Luckily there is Habr, where I read about the expect utility, which automates interaction with programs that wait for human input. Even more attractive is the autoexpect command, which generates an expect script from your actions. So: run it, do everything by hand, and here is the finished script. Only it is huge, because OpenSSL prints certificates and keys, and expect waits for all of that to be printed. Do we need it? No. We throw out the first prompt, leaving only the final line break '\r'. We also drop the User-Agent and Accept fields from the request, since they do not affect anything. Now run it. The script executes, but where is the coveted HTML document? expect ate it. To make it spit the document out, put:
set results $expect_out(buffer)
before the end of the script: this way the output captured by the executed expect command is stored and displayed. The result looks something like this:
The expect script
#!/usr/bin/expect -f
set timeout -1
spawn openssl s_client -connect IP:PORT
match_max 100000
expect -exact "
---\r
"
send -- "GET /rifma/%d0%b7%d0%b4%d0%b5%d1%81%d1%8c?mode=block&type=asn HTTP/1.1\rHost: HOST\rAccept-Language: en-US,en;q=0.5\rX-Requested-With: XMLHttpRequest\rConnection: close"
expect -exact "GET /rifma/%d0%b7%d0%b4%d0%b5%d1%81%d1%8c?mode=block&type=asn HTTP/1.1\r
Host: HOST\r
Accept-Language: en-US,en;q=0.5\r
X-Requested-With: XMLHttpRequest\r
Connection: close"
send -- "\r"
set results $expect_out(buffer)
expect -exact "\r
"
send -- "\r"
expect eof
But that's not all! As you can see, in every example the request URL was static, yet the URL is exactly what determines which word the assonances are shown for. So it turns out we would forever be searching for the word "%d0%b7%d0%b4%d0%b5%d1%81%d1%8c" in ASCII, or "здесь" ("here") in UTF-8. What to do? Simply generate a new script every time, of course, friends! Only not with autoexpect but with echo, because nothing changes except the word. And long live the next problem: how do we sensibly convert a Cyrillic word into URL format? There does not seem to be anything ready-made for the terminal either. Well, never mind, can we do it ourselves? We can:
Look what I can do!
function furl {
# Replace every Cyrillic letter of $word with its percent-encoded UTF-8 bytes
furl=$(echo "$word" | sed '
s:А:%d0%90:g;s:Б:%d0%91:g;s:В:%d0%92:g;s:Г:%d0%93:g;s:Д:%d0%94:g;s:Е:%d0%95:g;s:Ж:%d0%96:g;s:З:%d0%97:g
s:И:%d0%98:g;s:Й:%d0%99:g;s:К:%d0%9a:g;s:Л:%d0%9b:g;s:М:%d0%9c:g;s:Н:%d0%9d:g;s:О:%d0%9e:g;s:П:%d0%9f:g
s:Р:%d0%a0:g;s:С:%d0%a1:g;s:Т:%d0%a2:g;s:У:%d0%a3:g;s:Ф:%d0%a4:g;s:Х:%d0%a5:g;s:Ц:%d0%a6:g;s:Ч:%d0%a7:g
s:Ш:%d0%a8:g;s:Щ:%d0%a9:g;s:Ъ:%d0%aa:g;s:Ы:%d0%ab:g;s:Ь:%d0%ac:g;s:Э:%d0%ad:g;s:Ю:%d0%ae:g;s:Я:%d0%af:g
s:а:%d0%b0:g;s:б:%d0%b1:g;s:в:%d0%b2:g;s:г:%d0%b3:g;s:д:%d0%b4:g;s:е:%d0%b5:g;s:ж:%d0%b6:g;s:з:%d0%b7:g
s:и:%d0%b8:g;s:й:%d0%b9:g;s:к:%d0%ba:g;s:л:%d0%bb:g;s:м:%d0%bc:g;s:н:%d0%bd:g;s:о:%d0%be:g;s:п:%d0%bf:g
s:р:%d1%80:g;s:с:%d1%81:g;s:т:%d1%82:g;s:у:%d1%83:g;s:ф:%d1%84:g;s:х:%d1%85:g;s:ц:%d1%86:g;s:ч:%d1%87:g
s:ш:%d1%88:g;s:щ:%d1%89:g;s:ъ:%d1%8a:g;s:ы:%d1%8b:g;s:ь:%d1%8c:g;s:э:%d1%8d:g;s:ю:%d1%8e:g;s:я:%d1%8f:g
s:ё:%d1%91:g;s:Ё:%d0%81:g')
}
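The substitution table can also be sidestepped by encoding bytes rather than letters. Below is a minimal sketch, assuming the standard od, tr and sed utilities; the function name urlencode is made up for illustration and is not part of the original script. Note that it percent-encodes every byte, plain ASCII letters included, which is still a valid encoding but longer than strictly necessary.

```shell
#!/usr/bin/env bash
# urlencode STRING (hypothetical name): print STRING with every byte
# written as %xx, whatever alphabet it is in.
urlencode() {
    # od dumps the bytes as hex pairs; tr strips the spacing between them.
    hex=$(printf '%s' "$1" | od -An -v -tx1 | tr -d ' \t\n')
    # Put a percent sign in front of every pair of hex digits.
    printf '%s\n' "$hex" | sed 's/../%&/g'
}
```

Since it works on bytes, the function does not care whether the word is Cyrillic, and nothing has to be added to it for letters the table forgot.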
All in all, we have a script that converts the word into ASCII text and generates another script, which requests the page with assonances from the server via OpenSSL. Then we redirect the output of that second script into a file and, as before, run it through the "filters" that strip the excess and the squares, and append the result to the word's file.
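The "generate a fresh script for every word" step might be sketched with a here-document instead of a chain of echo calls. Everything here is an assumption made for illustration: the function name gen_expect, its three arguments, and the trimmed template, which has not been run against a real server. The first argument is expected to be the already percent-encoded word.

```shell
#!/usr/bin/env bash
# gen_expect ENCODED_WORD HOST PORT (hypothetical helper): print an expect
# script that requests the assonances for one word.
# The body runs in a subshell, so its variables do not leak out.
gen_expect() (
    encoded=$1 host=$2 port=$3
    # In an unquoted here-document \r stays as written, while \$ is needed
    # to keep expect's own variable from being expanded by the shell.
    cat <<EOF
#!/usr/bin/expect -f
set timeout -1
spawn openssl s_client -connect $host:$port
match_max 100000
expect -exact "---\r\n"
send -- "GET /rifma/$encoded?mode=block&type=asn HTTP/1.1\rHost: $host\rAccept-Language: en-US,en;q=0.5\rX-Requested-With: XMLHttpRequest\rConnection: close"
send -- "\r"
set results \$expect_out(buffer)
send -- "\r"
expect eof
EOF
)
```

The generated text can be saved to a file and handed to expect -f, with its output redirected into the file that the filters then clean up.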
Intersection of sets. Summary
Actually, this is the part that causes the least trouble. We run the procedures above for two words, then compare every word of one list with every word of the other, and print each match. Now we have a script that takes two words as input and prints the list of words that rhyme with both, assonances included, and all this without manually switching between four tabs and memorizing words "by eye": everything is collected, recorded and discarded automatically. Perfect.
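The "compare each word with each" step does not need an explicit double loop; a sketch with sort and grep, assuming both lists are already in files with one word per line (the function name intersect is hypothetical):

```shell
#!/usr/bin/env bash
# intersect FILE1 FILE2 (hypothetical name): print the words present in
# both files, one per line, sorted and without duplicates.
# Exit status is 1 when the lists have nothing in common (grep's convention).
intersect() {
    # -F: fixed strings, not regexes; -x: whole-line matches only;
    # -f FILE2: take the patterns from the second list.
    LC_ALL=C sort -u "$1" | grep -F -x -f "$2"
}
```

The -x flag matters: without it a short word from the second list would also "match" every longer word that merely contains it.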
The point of this post was to show that if a person needs something, they will do it anyway. Very inefficiently, crookedly, horribly, but it will work.