<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Python on DonDoIT</title><link>/tags/python/</link><description>Recent content in Python on DonDoIT</description><generator>Hugo -- gohugo.io</generator><language>en-us</language><lastBuildDate>Thu, 13 Feb 2025 21:42:52 +0200</lastBuildDate><atom:link href="/tags/python/index.xml" rel="self" type="application/rss+xml"/><item><title>Python_async</title><link>/posts/python/python_async/</link><pubDate>Thu, 13 Feb 2025 21:42:52 +0200</pubDate><guid>/posts/python/python_async/</guid><description>&lt;h1 id="1-is-it-still-single-threaded-when-using-python-asyncawait"&gt;1. Is it still single threaded when using python async/await?&lt;/h1&gt;
&lt;p&gt;Yes!&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;When an &lt;code&gt;async&lt;/code&gt; function, e.g. one named &lt;code&gt;task1&lt;/code&gt;, runs and hits the &lt;code&gt;await&lt;/code&gt; keyword, the event loop &lt;strong&gt;pauses&lt;/strong&gt; the execution of &lt;code&gt;task1&lt;/code&gt; and delegates the work to:
&lt;ul&gt;
&lt;li&gt;OS Kernel (for I/O)&lt;/li&gt;
&lt;li&gt;System timers (for sleep)&lt;/li&gt;
&lt;li&gt;Thread or process pool (for blocking or CPU-bound work, if used explicitly)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;After that, the event loop continues and picks up another &lt;code&gt;async&lt;/code&gt; function from its queue to execute.&lt;/li&gt;
&lt;li&gt;While the event loop is executing other tasks, the OS may finish the delegated job and hand back the result. The result is then put into a &amp;ldquo;result queue&amp;rdquo;.&lt;/li&gt;
&lt;li&gt;When all the &lt;code&gt;async&lt;/code&gt; tasks in the event loop&amp;rsquo;s queue have been executed, the event loop picks up the results from the result queue on a first-come, first-served basis.&lt;/li&gt;
&lt;li&gt;The event loop then resumes the task whose result was picked up first and runs it until it hits another &lt;code&gt;await&lt;/code&gt; or finishes.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Async functions (&lt;code&gt;async def&lt;/code&gt;) run in the same thread&lt;/strong&gt; and do not block the event loop as long as they &lt;code&gt;await&lt;/code&gt; their slow operations instead of calling blocking code directly.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Regular (&lt;code&gt;def&lt;/code&gt;) functions block the event loop&lt;/strong&gt; unless explicitly run in a separate thread or process.&lt;/li&gt;
&lt;/ul&gt;
&lt;h1 id="2-what-kind-of-tasks-would-be-best-to-use-asyncawait"&gt;2. What kind of tasks would be best to use &lt;code&gt;async/await&lt;/code&gt;?&lt;/h1&gt;
&lt;ul&gt;
&lt;li&gt;Tasks that spend most of their time waiting rather than computing: I/O, sleep, network requests, file reads.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Best practice:&lt;/strong&gt; Keep async functions fully async and offload blocking operations to threads or processes when necessary.&lt;/li&gt;
&lt;/ul&gt;
&lt;h1 id="3-prevent-blocking"&gt;3. Prevent blocking?&lt;/h1&gt;
&lt;p&gt;To prevent blocking, use:&lt;/p&gt;</description><content>&lt;h1 id="1-is-it-still-single-threaded-when-using-python-asyncawait"&gt;1. Is it still single threaded when using python async/await?&lt;/h1&gt;
&lt;p&gt;Yes!&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;When an &lt;code&gt;async&lt;/code&gt; function, e.g. one named &lt;code&gt;task1&lt;/code&gt;, runs and hits the &lt;code&gt;await&lt;/code&gt; keyword, the event loop &lt;strong&gt;pauses&lt;/strong&gt; the execution of &lt;code&gt;task1&lt;/code&gt; and delegates the work to:
&lt;ul&gt;
&lt;li&gt;OS Kernel (for I/O)&lt;/li&gt;
&lt;li&gt;System timers (for sleep)&lt;/li&gt;
&lt;li&gt;Thread or process pool (for blocking or CPU-bound work, if used explicitly)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;After that, the event loop continues and picks up another &lt;code&gt;async&lt;/code&gt; function from its queue to execute.&lt;/li&gt;
&lt;li&gt;While the event loop is executing other tasks, the OS may finish the delegated job and hand back the result. The result is then put into a &amp;ldquo;result queue&amp;rdquo;.&lt;/li&gt;
&lt;li&gt;When all the &lt;code&gt;async&lt;/code&gt; tasks in the event loop&amp;rsquo;s queue have been executed, the event loop picks up the results from the result queue on a first-come, first-served basis.&lt;/li&gt;
&lt;li&gt;The event loop then resumes the task whose result was picked up first and runs it until it hits another &lt;code&gt;await&lt;/code&gt; or finishes.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Async functions (&lt;code&gt;async def&lt;/code&gt;) run in the same thread&lt;/strong&gt; and do not block the event loop as long as they &lt;code&gt;await&lt;/code&gt; their slow operations instead of calling blocking code directly.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Regular (&lt;code&gt;def&lt;/code&gt;) functions block the event loop&lt;/strong&gt; unless explicitly run in a separate thread or process.&lt;/li&gt;
&lt;/ul&gt;
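&lt;p&gt;The steps above can be checked with a minimal sketch. The names &lt;code&gt;worker&lt;/code&gt;, &lt;code&gt;main&lt;/code&gt; and &lt;code&gt;log&lt;/code&gt; are made up for this illustration; the sketch only records which thread runs each step and in what order the steps happen.&lt;/p&gt;

```python
import asyncio
import threading


async def worker(name, delay, log):
    # Record which thread runs this coroutine before and after the await.
    log.append((name, "start", threading.get_ident()))
    await asyncio.sleep(delay)  # hands control back to the event loop
    log.append((name, "end", threading.get_ident()))


async def main():
    log = []
    # Both workers are scheduled on the same event loop and overlap in time.
    await asyncio.gather(worker("task1", 0.2, log), worker("task2", 0.1, log))
    return log


log = asyncio.run(main())
thread_ids = {thread_id for _, _, thread_id in log}
steps = [(name, step) for name, step, _ in log]
print(len(thread_ids))  # 1: every step ran in the same thread
print(steps)
```

&lt;p&gt;&lt;code&gt;task2&lt;/code&gt; finishes before &lt;code&gt;task1&lt;/code&gt; even though it started later, because the event loop resumes whichever task becomes ready first.&lt;/p&gt;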
&lt;h1 id="2-what-kind-of-tasks-would-be-best-to-use-asyncawait"&gt;2. What kind of tasks would be best to use &lt;code&gt;async/await&lt;/code&gt;?&lt;/h1&gt;
&lt;ul&gt;
&lt;li&gt;Tasks that spend most of their time waiting rather than computing: I/O, sleep, network requests, file reads.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Best practice:&lt;/strong&gt; Keep async functions fully async and offload blocking operations to threads or processes when necessary.&lt;/li&gt;
&lt;/ul&gt;
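&lt;p&gt;A minimal sketch of the best practice above, assuming Python 3.9 or newer (where &lt;code&gt;asyncio.to_thread()&lt;/code&gt; is available). &lt;code&gt;blocking_read&lt;/code&gt; and &lt;code&gt;heartbeat&lt;/code&gt; are hypothetical stand-ins: the first simulates a blocking call with &lt;code&gt;time.sleep()&lt;/code&gt;, the second is a coroutine that keeps ticking while the blocking call runs in a worker thread.&lt;/p&gt;

```python
import asyncio
import time


def blocking_read():
    # Stand-in for a blocking call such as a slow file or database read.
    time.sleep(0.2)
    return "data"


async def heartbeat(ticks):
    # Appends a timestamp three times, yielding to the event loop in between.
    for _ in range(3):
        ticks.append(time.monotonic())
        await asyncio.sleep(0.05)


async def main():
    ticks = []
    result, _ = await asyncio.gather(
        asyncio.to_thread(blocking_read),  # runs in a worker thread
        heartbeat(ticks),
    )
    return result, len(ticks)


result, tick_count = asyncio.run(main())
print(result, tick_count)  # data 3
```

&lt;p&gt;Calling &lt;code&gt;blocking_read()&lt;/code&gt; directly inside a coroutine would instead freeze the event loop for the whole duration of the call.&lt;/p&gt;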
&lt;h1 id="3-prevent-blocking"&gt;3. Prevent blocking?&lt;/h1&gt;
&lt;p&gt;To prevent blocking, use:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;asyncio.to_thread()&lt;/code&gt;&lt;/strong&gt; → Runs a regular function in a separate thread (good for blocking I/O).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;loop.run_in_executor()&lt;/code&gt; with &lt;code&gt;ProcessPoolExecutor&lt;/code&gt;&lt;/strong&gt; → Runs CPU-heavy tasks in a separate process (avoids GIL limitations).&lt;/li&gt;
&lt;/ul&gt;</content></item><item><title>Regex in Python (part 3)</title><link>/posts/python/regex3/</link><pubDate>Thu, 28 Nov 2024 19:16:14 +0200</pubDate><guid>/posts/python/regex3/</guid><description>&lt;h1 id="find-expression-containing-numbers-and-symbols-in-a-specific-format"&gt;Find expression containing numbers and symbols in a specific format&lt;/h1&gt;
&lt;p&gt;Assuming that we have this piece of text that contains an IPv4 address that we want to extract.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;import&lt;/span&gt; re
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;You&amp;#39;ve recently logged in from an IP address 111.222.211.122&amp;#34;&lt;/span&gt;
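&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;IPv4 addresses range from 0.0.0.0 to 255.255.255.255, so each of the four parts has at most three digits. Every part of the address in our text has exactly three digits, so as a first attempt we can search with the following regex pattern:&lt;/p&gt;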
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pattern &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;r&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;\d\d\d.\d\d\d.\d\d\d.\d\d\d&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The result of this will be&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; print(re&lt;span style="color:#f92672"&gt;.&lt;/span&gt;findall(pattern, text))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[&lt;span style="color:#e6db74"&gt;&amp;#39;111.222.211.122&amp;#39;&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;However, if the text now has something extra like this:&lt;/p&gt;</description><content>&lt;h1 id="find-expression-containing-numbers-and-symbols-in-a-specific-format"&gt;Find expression containing numbers and symbols in a specific format&lt;/h1&gt;
&lt;p&gt;Assuming that we have this piece of text that contains an IPv4 address that we want to extract.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;import&lt;/span&gt; re
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;You&amp;#39;ve recently logged in from an IP address 111.222.211.122&amp;#34;&lt;/span&gt;
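&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;IPv4 addresses range from 0.0.0.0 to 255.255.255.255, so each of the four parts has at most three digits. Every part of the address in our text has exactly three digits, so as a first attempt we can search with the following regex pattern:&lt;/p&gt;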
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pattern &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;r&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;\d\d\d.\d\d\d.\d\d\d.\d\d\d&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The result of this will be&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; print(re&lt;span style="color:#f92672"&gt;.&lt;/span&gt;findall(pattern, text))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[&lt;span style="color:#e6db74"&gt;&amp;#39;111.222.211.122&amp;#39;&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;However, if the text now has something extra like this:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;You&amp;#39;ve recently logged in from an IP address 111.222.211.122, and something weird like this 123123123123122&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Now our search result will be &lt;code&gt;['111.222.211.122', '123123123123122']&lt;/code&gt;. This is because:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;\d\d\d&lt;/code&gt; matches any three consecutive digits.&lt;/li&gt;
&lt;li&gt;An unescaped &lt;code&gt;.&lt;/code&gt; matches any character (except a newline by default), so &lt;code&gt;1231&lt;/code&gt; matches the pattern &lt;code&gt;\d\d\d.&lt;/code&gt;. The same goes for &lt;code&gt;123!&lt;/code&gt; or &lt;code&gt;123@&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If we want to specifically match the dot &lt;code&gt;.&lt;/code&gt;, add a backslash &lt;code&gt;\&lt;/code&gt; in front of the dot. It&amp;rsquo;s going to be like this: &lt;code&gt;\d\d\d\.&lt;/code&gt;&lt;/p&gt;
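&lt;p&gt;As a quick check of the difference, the sketch below runs both versions of the pattern on the longer text. The variable names &lt;code&gt;loose_pattern&lt;/code&gt; and &lt;code&gt;strict_pattern&lt;/code&gt; are only illustrative.&lt;/p&gt;

```python
import re

text = (
    "You've recently logged in from an IP address 111.222.211.122, "
    "and something weird like this 123123123123122"
)

# Unescaped dots match any character, so the digit-only run is also returned.
loose_pattern = r"\d\d\d.\d\d\d.\d\d\d.\d\d\d"
# Escaped dots match only a literal ".".
strict_pattern = r"\d\d\d\.\d\d\d\.\d\d\d\.\d\d\d"

loose_matches = re.findall(loose_pattern, text)
strict_matches = re.findall(strict_pattern, text)
print(loose_matches)   # ['111.222.211.122', '123123123123122']
print(strict_matches)  # ['111.222.211.122']
```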
&lt;p&gt;&lt;strong&gt;Extra tips:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The pattern &lt;code&gt;\d+&lt;/code&gt; matches a run of one or more digits, i.e. a number of any length.&lt;/li&gt;
&lt;/ul&gt;</content></item><item><title>Regex in Python (part 2)</title><link>/posts/python/regex2/</link><pubDate>Thu, 21 Nov 2024 18:47:19 +0200</pubDate><guid>/posts/python/regex2/</guid><description>&lt;h1 id="find-words-of-specifc-length-starting-with-specific-letter"&gt;Find words of specific length starting with specific letter&lt;/h1&gt;
&lt;p&gt;Assume we want to search for all the 2-character words that start with an &lt;code&gt;i&lt;/code&gt; and end with one of, e.g., &lt;code&gt;s, t, o, n, l&lt;/code&gt;.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;import&lt;/span&gt; re
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;I live in Finland and the cold is killing me&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pattern &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;i[stonl]&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;matches &lt;span style="color:#f92672"&gt;=&lt;/span&gt; re&lt;span style="color:#f92672"&gt;.&lt;/span&gt;findall(pattern, text)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;When printing the result &lt;code&gt;matches&lt;/code&gt;, you&amp;rsquo;ll get &lt;code&gt;['in', 'in', 'is', 'il', 'in']&lt;/code&gt;, which come from&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;I live [in] F[in]land and the cold [is] k[il]l[in]g me
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;This pattern finds not only whole words but also any substring of a word that matches the pattern.&lt;/p&gt;</description><content>&lt;h1 id="find-words-of-specifc-length-starting-with-specific-letter"&gt;Find words of specific length starting with specific letter&lt;/h1&gt;
&lt;p&gt;Assume we want to search for all the 2-character words that start with an &lt;code&gt;i&lt;/code&gt; and end with one of, e.g., &lt;code&gt;s, t, o, n, l&lt;/code&gt;.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;import&lt;/span&gt; re
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;I live in Finland and the cold is killing me&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pattern &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;i[stonl]&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;matches &lt;span style="color:#f92672"&gt;=&lt;/span&gt; re&lt;span style="color:#f92672"&gt;.&lt;/span&gt;findall(pattern, text)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;When printing the result &lt;code&gt;matches&lt;/code&gt;, you&amp;rsquo;ll get &lt;code&gt;['in', 'in', 'is', 'il', 'in']&lt;/code&gt;, which come from&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;I live [in] F[in]land and the cold [is] k[il]l[in]g me
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;This pattern finds not only whole words but also any substring of a word that matches the pattern.&lt;/p&gt;
&lt;h2 id="with-"&gt;With &lt;code&gt;^&lt;/code&gt;&lt;/h2&gt;
&lt;p&gt;When we modify our pattern a bit by adding &lt;code&gt;^&lt;/code&gt; so that it would look like this&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pattern &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;^i[stonl]&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Now &lt;code&gt;matches&lt;/code&gt; will be &lt;code&gt;[]&lt;/code&gt;. This is because &lt;code&gt;^&lt;/code&gt; anchors the match at the beginning of the text: the match, whether a whole word or a substring of a word, must start at the very first character. In this case a lowercase &lt;code&gt;i&lt;/code&gt; would have to be the first character of our text, but the text starts with an uppercase &lt;code&gt;I&lt;/code&gt; followed by a space.&lt;/p&gt;
&lt;p&gt;How about &lt;code&gt;&amp;quot;^i[stonl][nm]&amp;quot;&lt;/code&gt;? This pattern matches three characters at the very beginning of the text: &lt;code&gt;i&lt;/code&gt;, followed by one of the characters &lt;code&gt;s, t, o, n, l&lt;/code&gt;, followed by either &lt;code&gt;n&lt;/code&gt; or &lt;code&gt;m&lt;/code&gt;.&lt;/p&gt;
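&lt;p&gt;A small sketch of how &lt;code&gt;^&lt;/code&gt; behaves. The two shorter input strings are made up for this illustration and do not come from the example above.&lt;/p&gt;

```python
import re

text = "I live in Finland and the cold is killing me"

# The text starts with an uppercase "I" and a space, so nothing matches.
no_match = re.findall(r"^i[stonl]", text)
# Here the text itself starts with "in".
two_chars = re.findall(r"^i[stonl]", "in Finland")
# Three characters at the start: "i", then one of "stonl", then "n" or "m".
three_chars = re.findall(r"^i[stonl][nm]", "inn by the sea")

print(no_match)     # []
print(two_chars)    # ['in']
print(three_chars)  # ['inn']
```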
&lt;p&gt;&lt;strong&gt;Extra tips:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
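&lt;li&gt;We can use &lt;code&gt;$&lt;/code&gt; to anchor the pattern at the end of the text (or at the end of each line when &lt;code&gt;re.MULTILINE&lt;/code&gt; is set). For example, &lt;code&gt;r&amp;quot;me$&amp;quot;&lt;/code&gt; matches &lt;code&gt;me&lt;/code&gt; only when it is the last thing in the text, whether as a whole word or as the ending of a longer word.&lt;/li&gt;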
&lt;li&gt;If we want to specifically search for an independent word, use &lt;code&gt;\b&lt;/code&gt;. For example:
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;This is example0 and example1&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; pattern &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;r&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;\bexample[01]?\b&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; print(re&lt;span style="color:#f92672"&gt;.&lt;/span&gt;findall(pattern, text))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[&lt;span style="color:#e6db74"&gt;&amp;#39;example0&amp;#39;&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;&amp;#39;example1&amp;#39;&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/li&gt;
&lt;/ul&gt;
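&lt;p&gt;Both tips can be tried on the sentence from the start of this post. This is only a sketch: the variable names and the two-line string &lt;code&gt;lines&lt;/code&gt; are invented for the illustration.&lt;/p&gt;

```python
import re

text = "I live in Finland and the cold is killing me"

# \b on both sides keeps only standalone two-character words.
whole_words = re.findall(r"\bi[stonl]\b", text)
print(whole_words)  # ['in', 'is']

# $ anchors the match at the end of the text.
at_end = re.findall(r"me$", text)
not_at_end = re.findall(r"me$", text + " today")
print(at_end)      # ['me']
print(not_at_end)  # []

# With re.MULTILINE, $ also matches at the end of every line.
lines = "call me\nremind me"
last_line_only = re.findall(r"me$", lines)
every_line = re.findall(r"me$", lines, re.MULTILINE)
print(last_line_only)  # ['me']
print(every_line)      # ['me', 'me']
```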
&lt;p&gt;Hope this is helpful &amp;#x1f60a;.&lt;/p&gt;</content></item><item><title>Regex in Python (part 1)</title><link>/posts/python/regex/</link><pubDate>Thu, 21 Nov 2024 00:05:47 +0200</pubDate><guid>/posts/python/regex/</guid><description>&lt;h1 id="search-for-string-in-text"&gt;Search for string in text&lt;/h1&gt;
&lt;p&gt;Assume that we have the text &lt;code&gt;the quick brown fox jumped over the lazy dog&lt;/code&gt; and want to search for, e.g., &lt;code&gt;quick&lt;/code&gt; in it.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;import&lt;/span&gt; re
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;the quick brown fox jumped over the lazy dog&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;match&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; re&lt;span style="color:#f92672"&gt;.&lt;/span&gt;search(&lt;span style="color:#e6db74"&gt;&amp;#34;quick&amp;#34;&lt;/span&gt;, text)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;As said in &lt;code&gt;.search()&lt;/code&gt; &lt;a href="https://docs.python.org/3/library/re.html#re.Pattern.search"&gt;documentation&lt;/a&gt;, this method will look for the first location where it finds a match, and returns a &lt;a href="https://docs.python.org/3/library/re.html#re.Match"&gt;&lt;code&gt;re.Match&lt;/code&gt;&lt;/a&gt; object if found, otherwise returns &lt;code&gt;None&lt;/code&gt;.&lt;/p&gt;</description><content>&lt;h1 id="search-for-string-in-text"&gt;Search for string in text&lt;/h1&gt;
&lt;p&gt;Assume that we have the text &lt;code&gt;the quick brown fox jumped over the lazy dog&lt;/code&gt; and want to search for, e.g., &lt;code&gt;quick&lt;/code&gt; in it.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;import&lt;/span&gt; re
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;the quick brown fox jumped over the lazy dog&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;match&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; re&lt;span style="color:#f92672"&gt;.&lt;/span&gt;search(&lt;span style="color:#e6db74"&gt;&amp;#34;quick&amp;#34;&lt;/span&gt;, text)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;As said in &lt;code&gt;.search()&lt;/code&gt; &lt;a href="https://docs.python.org/3/library/re.html#re.Pattern.search"&gt;documentation&lt;/a&gt;, this method will look for the first location where it finds a match, and returns a &lt;a href="https://docs.python.org/3/library/re.html#re.Match"&gt;&lt;code&gt;re.Match&lt;/code&gt;&lt;/a&gt; object if found, otherwise returns &lt;code&gt;None&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;If we &lt;code&gt;print(match)&lt;/code&gt;, we&amp;rsquo;ll see &lt;code&gt;&amp;lt;re.Match object; span=(4, 9), match='quick'&amp;gt;&lt;/code&gt;, which indicates that the matching string starts at index &lt;code&gt;4&lt;/code&gt; and ends at index &lt;code&gt;9&lt;/code&gt; (exclusive).&lt;/p&gt;
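&lt;p&gt;The span can be used directly as slice indices. The sketch below also shows the &lt;code&gt;None&lt;/code&gt; case; the word &lt;code&gt;cat&lt;/code&gt; is just an arbitrary string that does not occur in the text.&lt;/p&gt;

```python
import re

text = "the quick brown fox jumped over the lazy dog"
match = re.search("quick", text)

# span() returns (start, end); end is exclusive, like a slice.
start, end = match.span()
print(start, end)       # 4 9
print(text[start:end])  # quick

# No match at all: re.search returns None instead of a Match object.
missing = re.search("cat", text)
print(missing)  # None
```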
&lt;p&gt;To get the matched value that the &lt;code&gt;re.Match&lt;/code&gt; object is holding, we can simply use&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;match&lt;/span&gt;&lt;span style="color:#f92672"&gt;.&lt;/span&gt;group()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h1 id="find-characters-by-type"&gt;Find characters by type&lt;/h1&gt;
&lt;p&gt;Assuming we&amp;rsquo;re now working with a slightly different bit of text from the example above&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;import&lt;/span&gt; re
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;the quick brown fox jumped over the lazy dog 1234567890 !@#$%^&amp;amp;*()_&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="find-alphanumeric-characters"&gt;Find alphanumeric characters&lt;/h2&gt;
&lt;p&gt;To find all the word characters, we can use regex expression &lt;a href="https://learn.microsoft.com/en-us/dotnet/standard/base-types/character-classes-in-regular-expressions#word-character-w"&gt;&lt;code&gt;\w&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;characters &lt;span style="color:#f92672"&gt;=&lt;/span&gt; re&lt;span style="color:#f92672"&gt;.&lt;/span&gt;findall(&lt;span style="color:#e6db74"&gt;r&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;\w&amp;#34;&lt;/span&gt;, text)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;When printing the result &lt;code&gt;characters&lt;/code&gt;, we&amp;rsquo;ll get all the word characters in the text split into a list. The symbols &lt;code&gt;!@#$%^&amp;amp;*()&lt;/code&gt; and the spaces won&amp;rsquo;t be returned, as they are not considered word characters; the &lt;strong&gt;exception&lt;/strong&gt; is &lt;code&gt;_&lt;/code&gt;, which counts as a word character.&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;[&amp;#39;t&amp;#39;,&amp;#39;h&amp;#39;,&amp;#39;e&amp;#39;,&amp;#39;q&amp;#39;,&amp;#39;u&amp;#39;,&amp;#39;i&amp;#39;,&amp;#39;c&amp;#39;,&amp;#39;k&amp;#39;,&amp;#39;b&amp;#39;,&amp;#39;r&amp;#39;,&amp;#39;o&amp;#39;,&amp;#39;w&amp;#39;,&amp;#39;n&amp;#39;,&amp;#39;f&amp;#39;,&amp;#39;o&amp;#39;,&amp;#39;x&amp;#39;,&amp;#39;j&amp;#39;,&amp;#39;u&amp;#39;,&amp;#39;m&amp;#39;,&amp;#39;p&amp;#39;,&amp;#39;e&amp;#39;,&amp;#39;d&amp;#39;,&amp;#39;o&amp;#39;,&amp;#39;v&amp;#39;,&amp;#39;e&amp;#39;,&amp;#39;r&amp;#39;,&amp;#39;t&amp;#39;,&amp;#39;h&amp;#39;,&amp;#39;e&amp;#39;,&amp;#39;l&amp;#39;,&amp;#39;a&amp;#39;,&amp;#39;z&amp;#39;,&amp;#39;y&amp;#39;,&amp;#39;d&amp;#39;,&amp;#39;o&amp;#39;,&amp;#39;g&amp;#39;,&amp;#39;1&amp;#39;,&amp;#39;2&amp;#39;,&amp;#39;3&amp;#39;,&amp;#39;4&amp;#39;,&amp;#39;5&amp;#39;,&amp;#39;6&amp;#39;,&amp;#39;7&amp;#39;,&amp;#39;8&amp;#39;,&amp;#39;9&amp;#39;,&amp;#39;0&amp;#39;,&amp;#39;_&amp;#39;]
&lt;/code&gt;&lt;/pre&gt;&lt;h2 id="find-any-characters"&gt;Find any characters&lt;/h2&gt;
&lt;p&gt;To find any character, whether or not it&amp;rsquo;s a word character, use &lt;a href="https://learn.microsoft.com/en-us/dotnet/standard/base-types/character-classes-in-regular-expressions#any-character-"&gt;&lt;code&gt;.&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;any_characters &lt;span style="color:#f92672"&gt;=&lt;/span&gt; re&lt;span style="color:#f92672"&gt;.&lt;/span&gt;findall(&lt;span style="color:#e6db74"&gt;&amp;#34;.&amp;#34;&lt;/span&gt;, text)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt; that now the result also contains whitespaces &lt;code&gt;' '&lt;/code&gt;&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[&lt;span style="color:#e6db74"&gt;&amp;#39;t&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;h&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;e&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39; &amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;q&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;u&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;i&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;c&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;k&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39; &amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;b&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;r&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;o&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;w&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;n&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39; &amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;f&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;o&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;x&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39; &amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;j&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;u&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;m&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;p&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;e&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;d&amp;#39;&lt;/span&gt;,&lt;span 
style="color:#e6db74"&gt;&amp;#39; &amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;o&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;v&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;e&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;r&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39; &amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;t&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;h&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;e&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39; &amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;l&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;a&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;z&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;y&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39; &amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;d&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;o&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;g&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39; &amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;1&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;2&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;3&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;4&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;5&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;6&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;7&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;8&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;9&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;0&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39; &amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;!&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;@&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;#&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;$&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;%&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;^&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;&amp;amp;&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;*&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;(&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;)&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;_&amp;#39;&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="find-non-word-characters"&gt;Find non-word characters&lt;/h2&gt;
&lt;p&gt;Opposite to &lt;code&gt;\w&lt;/code&gt;, we have &lt;a href="https://learn.microsoft.com/en-us/dotnet/standard/base-types/character-classes-in-regular-expressions#non-word-character-w"&gt;&lt;code&gt;\W&lt;/code&gt; (uppercase)&lt;/a&gt; that we can use to find all non-word characters&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;non_word_characters &lt;span style="color:#f92672"&gt;=&lt;/span&gt; re&lt;span style="color:#f92672"&gt;.&lt;/span&gt;findall(&lt;span style="color:#e6db74"&gt;r&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;\W&amp;#34;&lt;/span&gt;, text)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The result now contains only whitespace and symbol characters:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;[&amp;#39; &amp;#39;, &amp;#39; &amp;#39;, &amp;#39; &amp;#39;, &amp;#39; &amp;#39;, &amp;#39; &amp;#39;, &amp;#39; &amp;#39;, &amp;#39; &amp;#39;, &amp;#39; &amp;#39;, &amp;#39; &amp;#39;, &amp;#39; &amp;#39;, &amp;#39;!&amp;#39;, &amp;#39;@&amp;#39;, &amp;#39;#&amp;#39;, &amp;#39;$&amp;#39;, &amp;#39;%&amp;#39;, &amp;#39;^&amp;#39;, &amp;#39;&amp;amp;&amp;#39;, &amp;#39;*&amp;#39;, &amp;#39;(&amp;#39;, &amp;#39;)&amp;#39;]
&lt;/code&gt;&lt;/pre&gt;</content></item></channel></rss>