I have written a simple PHP script using the simple_html_dom.php class to fetch some information about movies from a website. It has one foreach loop nested inside another. When I try to use the movie name inside the inner foreach loop, I get the same movie name for every item. What I want is the matching movie name for each item. The problem is with the $movie variable.

(When I echo the $movie variable in the outer loop I get the correct result, but I want each movie name to end up inside the YouTube link that is built in the inner loop.)

<?php
include("simple_html_dom.php");
    
$tpb = 'https://tpb.party/search/2020/1/99/200';
$html = file_get_html(html_entity_decode($tpb));
    
foreach($html->find('tr.header') as $header) {
    $header->outertext = '';
}
        
foreach($html->find('td') as $bottom) {
    if ($bottom->colspan == '9') {
        $bottom->outertext = '';
    }
}
        
foreach($html->find('td.vertTh') as $vert) {
    $vert->outertext = '';
}   
    
foreach($html->find("div.detName") as $movie) {
    $movie = $movie->plaintext;
    echo $movie;    // Works: displays each of the movie titles
    
    foreach($html->find('img') as $img) {
    
        if ($img->outertext == '<img src="https://tpb.party/static/img/11x11p.png" height="11" width="11">') {
            $img->outertext = '&nbsp;&nbsp;<a href="https://www.youtube.com/results?search_query='. $movie /* Doesn't work, only displays one title, not one each of the 30*/ .'" target="_blank"><img src="img/youtube.png" alt="Trailer" title="Trailer" style="width:19px;" width="19" height="18" border="0"></a>';
        }
    }
}   
    
$html->save();
foreach($html->find("table") as $title) {
    echo $title->outertext . '<br>';
}
?>

ORIGINAL SOURCE:

<td>
  <div class="detName"> <a href="https://tpb.party/torrent/37614340/The.Mandalorian.S02E05.Chapter.13.The.Jedi.2020.1080p.WEB-DL.X26" class="detLink" title="Details for The.Mandalorian.S02E05.Chapter.13.The.Jedi.2020.1080p.WEB-DL.X26">The.Mandalorian.S02E05.Chapter.13.The.Jedi.2020.1080p.WEB-DL.X26</a>
  </div>
  <a href="magnet:?xt=urn:btih:4AEE012597EBEA65840A96F62CEBE9926F8ECE5D&dn=The.Mandalorian.S02E05.Chapter.13.The.Jedi.2020.1080p.WEB-DL.X26&tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A6969%2Fannounce&tr=udp%3A%2F%2F9.rarbg.to%3A2920%2Fannounce&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337&tr=udp%3A%2F%2Ftracker.internetwarriors.net%3A1337%2Fannounce&tr=udp%3A%2F%2Ftracker.leechers-paradise.org%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.pirateparty.gr%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.cyberia.is%3A6969%2Fannounce"
    title="Download this torrent using magnet"><img src="https://tpb.party/static/img/icon-magnet.gif" alt="Magnet link" height="12" width="12"></a>
  <a href="https://tpb.party/user/sotnikam/"><img src="https://tpb.party/static/img/vip.gif" alt="VIP" title="VIP" style="width:11px;" border="0" height="11" width="11"></a><img src="https://tpb.party/static/img/11x11p.png" height="11" width="11">
  <font class="detDesc">Uploaded 11-27&nbsp;10:12, Size 2.71&nbsp;GiB, ULed by <a class="detDesc" href="https://tpb.party/user/sotnikam/" title="Browse sotnikam">sotnikam</a> </font>
</td>

CURRENT OUTPUT:

This is the HTML after the IMG elements have been replaced. The problem is that the YouTube link is the same for ALL elements, when it should be unique for each element, like the movie titles:

<td>
  <div class="detName"> <a href="https://tpb.party/torrent/37614340/The.Mandalorian.S02E05.Chapter.13.The.Jedi.2020.1080p.WEB-DL.X26" class="detLink" title="Details for The.Mandalorian.S02E05.Chapter.13.The.Jedi.2020.1080p.WEB-DL.X26">The.Mandalorian.S02E05.Chapter.13.The.Jedi.2020.1080p.WEB-DL.X26</a>
  </div>
  <a href="magnet:?xt=urn:btih:4AEE012597EBEA65840A96F62CEBE9926F8ECE5D&dn=The.Mandalorian.S02E05.Chapter.13.The.Jedi.2020.1080p.WEB-DL.X26&tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A6969%2Fannounce&tr=udp%3A%2F%2F9.rarbg.to%3A2920%2Fannounce&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337&tr=udp%3A%2F%2Ftracker.internetwarriors.net%3A1337%2Fannounce&tr=udp%3A%2F%2Ftracker.leechers-paradise.org%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.pirateparty.gr%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.cyberia.is%3A6969%2Fannounce"
    title="Download this torrent using magnet"><img src="https://tpb.party/static/img/icon-magnet.gif" alt="Magnet link" height="12" width="12"></a>
  <a href="https://tpb.party/user/sotnikam/"><img src="https://tpb.party/static/img/vip.gif" alt="VIP" title="VIP" style="width:11px;" border="0" height="11" width="11"></a>&nbsp;&nbsp;
  <a href="https://www.youtube.com/results?search_query=            The.Mandalorian.S02E05.Chapter.13.The.Jedi.2020.1080p.WEB-DL.X26  " target="_blank"><img src="img/youtube.png" alt="Trailer" title="Trailer" style="width:19px;" width="19" height="18" border="0"></a>
  <font class="detDesc">Uploaded 11-27&nbsp;10:12, Size 2.71&nbsp;GiB, ULed by <a class="detDesc" href="https://tpb.party/user/sotnikam/" title="Browse sotnikam">sotnikam</a> </font>
</td>

  • what's the size of $html->find('img')? Commented Dec 16, 2020 at 23:46
  • @RafaelDouradoD It fetches 30 items and displays them, 30 titles with images. Commented Dec 16, 2020 at 23:50
  • The foreach($html->find('img') as $img) loop is replacing all the images in the page for each movie. So for the first movie it will replace <img ...> with <a ...search_query=movie1><img ...></a>, then the next movie will replace that with <a ... search_query=movie1><a ... search_query=movie2><img ...></a></a>. Each iteration will nest it another time. Commented Dec 17, 2020 at 0:05
  • I suspect you only want to replace the images in the same DIV, not all the images on the whole page. Commented Dec 17, 2020 at 0:06
  • @Barmar This is what it looks like: i.postimg.cc/hvPvRzJM/tpbscr.png I'm basically replacing all the empty images with the youtube icon. The idea is that when you click the youtube icon you get sent to youtube.com/results?search_query=MOVIE+Trailer Commented Dec 17, 2020 at 0:17

2 Answers

The image you want is nested in one of the siblings of the detName DIV. So you can search for it by searching within the parent element.

Since find() allows more complex CSS selectors, you can search specifically for the image you want, rather than looping through all the images.

foreach($html->find("div.detName") as $movieDiv) {
    $movie = $movieDiv->plaintext;
    echo $movie;    //Works Okey, it displays each of the movietitles
    
    $img = $movieDiv->parent()->find('img[src="https://tpb.party/static/img/11x11p.png"]', 0);
    if ($img) {
        $img->outertext = '&nbsp;&nbsp;<a href="https://www.youtube.com/results?search_query='. $movie .'" target="_blank"><img src="img/youtube.png" alt="Trailer" title="Trailer" style="width:19px;" width="19" height="18" border="0"></a>';
    }
}
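
The current output in the question shows the title being pasted into the URL with its surrounding whitespace, and a comment mentions wanting "MOVIE+Trailer" as the query. A minimal sketch of a helper that cleans and encodes the title before it goes into the href could look like this. The function name `youtube_search_url`, the default `trailer` suffix, and the dot-to-space replacement are assumptions for illustration, not part of simple_html_dom or the original script.

```php
<?php
// Hypothetical helper (an assumption, not part of simple_html_dom):
// builds a YouTube search URL from a scraped title.
function youtube_search_url(string $title, string $suffix = 'trailer'): string
{
    // plaintext taken from the DOM can carry leading/trailing whitespace
    $query = trim($title);

    // Assumption: release names use dots as separators, so turn them into spaces
    $query = str_replace('.', ' ', $query);

    // Optionally append a search term such as "trailer"
    if ($suffix !== '') {
        $query .= ' ' . $suffix;
    }

    // urlencode() turns spaces into "+" and escapes reserved characters
    return 'https://www.youtube.com/results?search_query=' . urlencode($query);
}

echo youtube_search_url("   Pinocchio.2020  "), PHP_EOL;
// prints https://www.youtube.com/results?search_query=Pinocchio+2020+trailer
```

In the loop above, the href would then be built with `youtube_search_url($movie)` instead of concatenating `$movie` directly.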

1 Comment

Thank you @Barmar, I will try this when I get home from work.
Ideally, you should just pull out the data (changing it if need be), then build your own table from that.

<?php
include("simple_html_dom.php");

$tpb = 'https://tpb.party/search/2020/1/99/200';
$html = file_get_html($tpb);

function remove_junk($movie_name) {
    // you get the idea.. maybe a db or further stripping
    // str_replace() applies the patterns in order, so longer ones go first
    return str_replace([
        '.1080p.WEB-DL.X26',
        'WEB-DL.X26',
        'GalaxyRG',
        '0.HDRip.XviD.AC3-EVO[TGx]',
        '.720p.BluRay.800MB.x264-'
    ], '', $movie_name);
}

$movies = [];
foreach($html->getElementById("searchResult")->find('tr') as $tr) {
    $td = $tr->find('td');

    // buggy simple_html_dom doesn't see tbody
    if ($tr->parent->tag === 'table' && isset($td[1])) {

        $name = trim($td[1]->find('.detName', 0)->plaintext);

        $links = [];
        foreach ($td[1]->find('a') as $link) {
            $links[] = $link->href;
        }

        $info = $td[1]->find('.detDesc', 0)->plaintext;
        $info = explode(', ', $info);

        $uploaded = trim(str_replace(['Uploaded', '&nbsp;'], ' ', $info[0]));
        $size = trim(str_replace(['Size', '&nbsp;'], ' ', $info[1]));
        $ULed = trim(str_replace(['ULed by'], ' ', $info[2]));

        $movies[] = [
            'name' => $name,
            'links' => [
                'site' => $links[0],
                'magnet' => $links[1],
                'youtube' => 'https://www.youtube.com/results?search_query='.urlencode(remove_junk($name))
            ],
            'uploaded' => $uploaded,
            'size' => $size,
            'ULed' => [
                'user' => $ULed,
                'link' => $links[3]
            ],
            'seeds' => trim($td[2]->plaintext),
            'leecher' => trim($td[3]->plaintext)
        ];
    }
}  

print_r($movies);

This would yield an array with the following structure:

Array (
    ... snip
    [30] => Array
        (
            [name] => Pinocchio.2020.720p.WEBRip.800MB.x264-GalaxyRG
            [links] => Array
                (
                    [site] => https://tpb.party/torrent/38038881/Pinocchio.2020.720p.WEBRip.800MB.x264-GalaxyRG
                    [magnet] => magnet:?xt=urn:btih:BF16ACE87DABF2300253B7EDB7600B1BAB3EE02A&dn=Pinocchio.2020.720p.WEBRip.800MB.x264-GalaxyRG&tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A6969%2Fannounce&tr=udp%3A%2F%2F9.rarbg.to%3A2920%2Fannounce&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337&tr=udp%3A%2F%2Ftracker.internetwarriors.net%3A1337%2Fannounce&tr=udp%3A%2F%2Ftracker.leechers-paradise.org%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.pirateparty.gr%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.cyberia.is%3A6969%2Fannounce
                    [youtube] => https://www.youtube.com/results?search_query=Pinocchio.2020
                )

            [uploaded] => 12-07 01:51
            [size] => 798.15 MiB
            [ULed] => Array
                (
                    [user] => sotnikam
                    [link] => https://tpb.party/user/sotnikam/
                )

            [seeds] => 351
            [leecher] => 57
        )

)

You can then loop over this array to build your own styled table, YouTube link included. It would be better still to run the scrape as a separate task that stores the resulting data in a database, and have the page query that instead. That way you're not scraping the site on every request, and you can detect a change in the source before it shows up as a broken page.
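
As a rough sketch of that last step, the loop below turns an array of that shape into an HTML table. The function name `render_movie_table`, the chosen columns, and the single sample row (copied from the output above) are assumptions for illustration; every value is passed through `htmlspecialchars()` because scraped text should not be trusted as markup.

```php
<?php
// Sample data shaped like one entry of the $movies array shown above
$movies = [[
    'name'  => 'Pinocchio.2020.720p.WEBRip.800MB.x264-GalaxyRG',
    'links' => [
        'site'    => 'https://tpb.party/torrent/38038881/Pinocchio.2020.720p.WEBRip.800MB.x264-GalaxyRG',
        'youtube' => 'https://www.youtube.com/results?search_query=Pinocchio.2020',
    ],
    'size'    => '798.15 MiB',
    'seeds'   => '351',
    'leecher' => '57',
]];

// Hypothetical helper: escapes a scraped value for safe output in HTML
function h(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

// Hypothetical helper: renders one table row per movie
function render_movie_table(array $movies): string
{
    $rows = '';
    foreach ($movies as $m) {
        $rows .= '<tr>'
            . '<td><a href="' . h($m['links']['site']) . '">' . h($m['name']) . '</a></td>'
            . '<td><a href="' . h($m['links']['youtube']) . '" target="_blank">Trailer</a></td>'
            . '<td>' . h($m['size']) . '</td>'
            . '<td>' . h($m['seeds']) . '</td>'
            . '<td>' . h($m['leecher']) . '</td>'
            . "</tr>\n";
    }

    return "<table>\n"
        . "<tr><th>Name</th><th>Trailer</th><th>Size</th><th>Seeds</th><th>Leechers</th></tr>\n"
        . $rows
        . "</table>\n";
}

echo render_movie_table($movies);
```

Because each row is built from its own array entry, the trailer link always belongs to the movie in the same row, which avoids the shared-variable problem from the question.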

1 Comment

Thank you @Lawrence Cherone for your solution as well. I will definitely have use for this, especially the remove_junk function that you made.
